Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

Light support for postgres expression indexes #13684

Open
wants to merge 6 commits into from

9 participants

@jbaudanza

This PR allows you to generate indexes that are wrapped in a SQL function. For example:

add_index :users, :email, unique: true, function: 'lower'

will generate:

CREATE UNIQUE INDEX index_users_on_lower_email_index ON users (LOWER(email))

This allows you to index for the common postgres use case:

user = User.where('LOWER(email) = LOWER(?)', params[:email]).first
if user && user.authenticate(params[:password])
...

The schema dumper will currently only dump functions on single column indexes, but I can extend to support composite indexes if there is interest.

This PR only allows for simple expressions such as lower(email). To use more complex expressions, it's probably better to use the SQL schema dumper.

EDIT

I've updated the API to support functions on multiple columns. For example:

add_index :users, [:first_name, :last_name, :active], functions: ['lower', 'lower', nil]
activerecord/lib/active_record/schema_dumper.rb
@@ -198,11 +198,10 @@ def indexes(table, stream)
index_orders = (index.orders || {})
statement_parts << ('order: ' + index.orders.inspect) unless index_orders.empty?
- statement_parts << ('where: ' + index.where.inspect) if index.where
-
- statement_parts << ('using: ' + index.using.inspect) if index.using
-
- statement_parts << ('type: ' + index.type.inspect) if index.type
+ %w(where using type function).each do |attr|
+ value = index.send(attr)

can we use public_send here.

Sounds good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@hgmnz

If I may, I would like to suggest a more flexible syntax that allows developers to really make use of expression indexes. How about instead of

add_index :users, :email, unique: true, function: 'lower'

we had

add_index :users, 'lower(email)', unique: true

I think that's clearer (know exactly what you're indexing) and more flexible (multi-column, non-function indexes). For example:

add_index :users, "first_name ' || ' last_name"
@jbaudanza

@hgmnz I went back and forth on this myself. Ideally, I would love a way to create arbitrary expression indexes in postgres.

I settled on this simpler syntax because I wanted the feature to degrade gracefully when you use a database that doesn't support expression indexes.

For example, say you are using sqlite in development and psql in production. This isn't my cup of tea, but I know a lot of people do this.

In your schema.rb:

add_index :users, :email, unique: true, function: 'lower'

When running this on sqlite, it will skip the lower part and create a normal unique index on users.email. Since indexes are case insensitive on sqlite, it will behave the roughly the same as your production psql database.

The ActiveRecord index creation API is pretty tightly coupled with the concept of creating indexes on lists of columns. Trying to inject arbitrary expressions instead of columns is going to be awkward.

I'm willing to be convinced otherwise though, because this would be a great feature to have.

@senny senny added the PostgreSQL label
@senny senny modified the milestone: 4.1.0, 4.2.0
...d/connection_adapters/postgresql/schema_statements.rb
((19 lines not shown))
- unless column_names.empty?
+ if index_supported && !column_names.empty?
@tenderlove Owner

Change this to column_names.any? please.

Changed in e377d06

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@tenderlove
Owner

Looks good! Change that one thing I commented on and I'll merge.

@tenderlove
Owner

When running this on sqlite, it will skip the lower part and create a normal unique index on users.email. Since indexes are case insensitive on sqlite, it will behave the roughly the same as your production psql database.

This seems really bad. I'd rather the database blows up if we try to use a function that doesn't actually work on that database.

@matthewd
Collaborator

I think I'd prefer something closer to @hgmnz's suggestion. The parameter should probably be more explicit that it's raw SQL though:

add_index :users, Arel.sql('lower(email)'), unique: true

The above does seem likely to pose an issue when generating the index name... maybe we just refuse to try, and insist one is provided?

I'm making no claim as to how easy this would be :smile:

@jbaudanza

@tenderlove This PR follows the same pattern as the psql partial index support. For example:

add_index :widgets, :sprockets_id, where: '!disabled'

The above statement will create a partial index in psql, but the sqlite driver will silently drop the where clause. I think this graceful degradation is OK because it doesn't change the application functionality, only its performance. The same is true of the expression index functionality in this PR.

To me, the ruby schema dumper should "just work" for any database environment. If you want platform specific, non-portable features, you have the SQL schema dumper available to you instead.

If we decide to move away from the graceful degradation behavior, then it probably makes more sense to rewrite this to use the syntax that @hgmnz suggests.

@matthewd
Collaborator

Well, ending up with a full index when you expected a partial one might take up a bunch more space, but the queries that wanted it will (generally) still be able to use it.

That's not true for an expression index that's suddenly not indexing the intended expression.

More generally, I'm not sure how strongly we can endorse the idea that it's "working" if the schema can't round-trip.


I think it seems pretty reasonable that if you start using DB-specific features, you're going to end up with a DB-specific schema file -- even before you switch to the SQL dump. Consider the PG-specific column types, for example.

It seems more confusing to find that your PG-specific schema file will load on sqlite3 (because the feature is ignored), yet fails on Oracle (say), because the feature's supported, but that particular function doesn't exist / has a different name.

Similarly, what happens if sqlite3 gains the ability to index expressions?

@jbaudanza

@matthewd That's a good point re: PG-specific datatypes. Now I'm starting to sway the other way.

On an implementation note, there is a lot of code that assumes that the second parameter to add_index is an array of column names. This might take a bit of work to change. It would be good to get some input on the proposed syntax from a maintainer before I dig too deep into a refactoring.

For some background, I'm trying to cover the case where an application needs to do a case insensitive query on a username or email address, which I think is something that nearly every Rails app running on postgres will have to do. I tried to optimize the API for that use-case.

@rafaelfranca rafaelfranca modified the milestone: 4.2.0, 5.0.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Jan 11, 2014
  1. @jbaudanza
  2. @jbaudanza
Commits on Jan 12, 2014
  1. @jbaudanza
  2. @jbaudanza
Commits on Jan 16, 2014
  1. @jbaudanza
Commits on Mar 28, 2014
  1. @jbaudanza

    simplify boolean logic

    jbaudanza authored
This page is out of date. Refresh to see the latest.
View
2  activerecord/lib/active_record/connection_adapters/abstract/schema_definitions.rb
@@ -8,7 +8,7 @@ module ConnectionAdapters #:nodoc:
# Abstract representation of an index definition on a table. Instances of
# this type are typically created and returned by methods in database
# adapters. e.g. ActiveRecord::ConnectionAdapters::AbstractMysqlAdapter#indexes
- class IndexDefinition < Struct.new(:table, :name, :unique, :columns, :lengths, :orders, :where, :type, :using) #:nodoc:
+ class IndexDefinition < Struct.new(:table, :name, :unique, :columns, :lengths, :orders, :where, :type, :using, :functions) #:nodoc:
end
# Abstract representation of a column definition. Instances of this type
View
20 activerecord/lib/active_record/connection_adapters/abstract/schema_statements.rb
@@ -763,7 +763,23 @@ def quoted_columns_for_index(column_names, options = {})
option_strings = add_index_sort_order(option_strings, column_names, options)
end
- column_names.map {|name| quote_column_name(name) + option_strings[name]}
+ if options[:functions]
+ functions = Array(options[:functions])
+ if functions.length != column_names.length
+ raise(ArgumentError,
+ "The number of functions must match the number of column names")
+ end
+ end
+
+ column_names.each_with_index.map do |name, index|
+ quoted = quote_column_name(name)
+
+ if supports_index_functions? && functions
+ quoted = "#{functions[index]}(#{quoted})"
+ end
+
+ quoted + option_strings[name]
+ end
end
def options_include_default?(options)
@@ -774,7 +790,7 @@ def add_index_options(table_name, column_name, options = {})
column_names = Array(column_name)
index_name = index_name(table_name, column: column_names)
- options.assert_valid_keys(:unique, :order, :name, :where, :length, :internal, :using, :algorithm, :type)
+ options.assert_valid_keys(:unique, :order, :name, :where, :length, :internal, :using, :algorithm, :type, :functions)
index_type = options[:unique] ? "UNIQUE" : ""
index_type = options[:type].to_s if options.key?(:type)
View
5 activerecord/lib/active_record/connection_adapters/abstract_adapter.rb
@@ -202,6 +202,11 @@ def supports_partial_index?
false
end
+ # Can this adapter apply a SQL function to values before they are indexed?
+ def supports_index_functions?
+ false
+ end
+
# Does this adapter support explain? As of this writing sqlite3,
# mysql2, and postgresql are the only ones that do.
def supports_explain?
View
72 activerecord/lib/active_record/connection_adapters/postgresql/schema_statements.rb
@@ -128,15 +128,18 @@ def schema_exists?(name)
# Returns an array of indexes for the given table.
def indexes(table_name, name = nil)
- result = query(<<-SQL, 'SCHEMA')
- SELECT distinct i.relname, d.indisunique, d.indkey, pg_get_indexdef(d.indexrelid), t.oid
- FROM pg_class t
- INNER JOIN pg_index d ON t.oid = d.indrelid
- INNER JOIN pg_class i ON d.indexrelid = i.oid
- WHERE i.relkind = 'i'
- AND d.indisprimary = 'f'
- AND t.relname = '#{table_name}'
- AND i.relnamespace IN (SELECT oid FROM pg_namespace WHERE nspname = ANY (current_schemas(false)) )
+ result = query(<<-SQL, 'SCHEMA')
+ SELECT distinct i.relname, d.indisunique, d.indkey,
+ pg_get_expr(d.indexprs, t.oid), pg_get_expr(d.indpred, t.oid),
+ pg_get_indexdef(d.indexrelid), t.oid, m.amname, m.amcanorder
+ FROM pg_class t
+ INNER JOIN pg_index d ON t.oid = d.indrelid
+ INNER JOIN pg_class i ON d.indexrelid = i.oid
+ INNER JOIN pg_am m ON i.relam = m.oid
+ WHERE i.relkind = 'i'
+ AND d.indisprimary = 'f'
+ AND t.relname = '#{table_name}'
+ AND i.relnamespace IN (SELECT oid FROM pg_namespace WHERE nspname = ANY (current_schemas(false)) )
ORDER BY i.relname
SQL
@@ -144,8 +147,34 @@ def indexes(table_name, name = nil)
index_name = row[0]
unique = row[1] == 't'
indkey = row[2].split(" ")
- inddef = row[3]
- oid = row[4]
+ expression = row[3]
+ where = row[4]
+ inddef = row[5]
+ oid = row[6]
+ using = row[7].to_sym
+
+ index_supported = true
+
+ function_names = []
+ function_columns = []
+
+ if expression
+ expression.split(',').each do |part|
+ part.strip!
+
+ # Matches "lower((column)::text)" or "lower(column)"
+ if part =~ /\A(\w+)\(\((\w+)\)::[\w\s]+\)\Z/ ||
+ part =~ /\A(\w+)\((\w+)\)\Z/
+
+ function_names << $~[1]
+ function_columns << $~[2]
+ else
+ # Complex expression indexes aren't supported
+ index_supported = false
+ break
+ end
+ end
+ end
columns = Hash[query(<<-SQL, "SCHEMA")]
SELECT a.attnum, a.attname
@@ -154,16 +183,27 @@ def indexes(table_name, name = nil)
AND a.attnum IN (#{indkey.join(",")})
SQL
- column_names = columns.values_at(*indkey).compact
+ column_names = []
+ functions = []
+
+ indkey.each do |i|
+ if i == '0'
+ functions << function_names.shift
+ column_names << function_columns.shift
+ else
+ functions << nil
+ column_names << columns[i]
+ end
+ end
+
+ functions = nil if functions.compact.empty?
- unless column_names.empty?
+ if index_supported && column_names.any?
# add info on sort order for columns (only desc order is explicitly specified, asc is the default)
desc_order_columns = inddef.scan(/(\w+) DESC/).flatten
orders = desc_order_columns.any? ? Hash[desc_order_columns.map {|order_column| [order_column, :desc]}] : {}
- where = inddef.scan(/WHERE (.+)$/).flatten[0]
- using = inddef.scan(/USING (.+?) /).flatten[0].to_sym
- IndexDefinition.new(table_name, index_name, unique, column_names, [], orders, where, nil, using)
+ IndexDefinition.new(table_name, index_name, unique, column_names, [], orders, where, nil, using, functions)
end
end.compact
end
View
4 activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb
@@ -483,6 +483,10 @@ def supports_partial_index?
true
end
+ def supports_index_functions?
+ true
+ end
+
def supports_transaction_isolation?
true
end
View
9 activerecord/lib/active_record/schema_dumper.rb
@@ -198,11 +198,10 @@ def indexes(table, stream)
index_orders = (index.orders || {})
statement_parts << ('order: ' + index.orders.inspect) unless index_orders.empty?
- statement_parts << ('where: ' + index.where.inspect) if index.where
-
- statement_parts << ('using: ' + index.using.inspect) if index.using
-
- statement_parts << ('type: ' + index.type.inspect) if index.type
+ %w(where using type functions).each do |attr|
+ value = index.public_send(attr)
+ statement_parts << ("#{attr}: " + value.inspect) if value
+ end
' ' + statement_parts.join(', ')
end
View
28 activerecord/test/cases/adapters/postgresql/schema_test.rb
@@ -12,19 +12,24 @@ class SchemaTest < ActiveRecord::TestCase
INDEX_C_NAME = 'c_index_full_text_search'
INDEX_D_NAME = 'd_index_things_on_description_desc'
INDEX_E_NAME = 'e_index_things_on_name_vector'
+ INDEX_F_NAME = 'f_index_lower_name_on_things'
+ INDEX_G_NAME = 'g_index_composite_index_on_things'
INDEX_A_COLUMN = 'name'
INDEX_B_COLUMN_S1 = 'email'
INDEX_B_COLUMN_S2 = 'moment'
INDEX_C_COLUMN = %q{(to_tsvector('english', coalesce(things.name, '')))}
INDEX_D_COLUMN = 'description'
INDEX_E_COLUMN = 'name_vector'
+ INDEX_F_COLUMN = 'LOWER(name)'
+ INDEX_G_COLUMN = 'UPPER(name),active,LOWER(email)'
COLUMNS = [
'id integer',
'name character varying(50)',
'email character varying(50)',
'description character varying(100)',
'name_vector tsvector',
- 'moment timestamp without time zone default now()'
+ 'moment timestamp without time zone default now()',
+ 'active boolean default true'
]
PK_TABLE_NAME = 'table_with_pk'
UNMATCHED_SEQUENCE_NAME = 'unmatched_primary_key_default_value_seq'
@@ -56,8 +61,8 @@ def setup
@connection.execute "CREATE TABLE #{SCHEMA_NAME}.\"#{TABLE_NAME}.table\" (#{COLUMNS.join(',')})"
@connection.execute "CREATE TABLE #{SCHEMA_NAME}.\"#{CAPITALIZED_TABLE_NAME}\" (#{COLUMNS.join(',')})"
@connection.execute "CREATE SCHEMA #{SCHEMA2_NAME} CREATE TABLE #{TABLE_NAME} (#{COLUMNS.join(',')})"
- @connection.execute "CREATE INDEX #{INDEX_A_NAME} ON #{SCHEMA_NAME}.#{TABLE_NAME} USING btree (#{INDEX_A_COLUMN});"
- @connection.execute "CREATE INDEX #{INDEX_A_NAME} ON #{SCHEMA2_NAME}.#{TABLE_NAME} USING btree (#{INDEX_A_COLUMN});"
+ @connection.execute "CREATE INDEX #{INDEX_A_NAME} ON #{SCHEMA_NAME}.#{TABLE_NAME} USING btree (#{INDEX_A_COLUMN}) WHERE active;"
+ @connection.execute "CREATE INDEX #{INDEX_A_NAME} ON #{SCHEMA2_NAME}.#{TABLE_NAME} USING btree (#{INDEX_A_COLUMN}) WHERE active;"
@connection.execute "CREATE INDEX #{INDEX_B_NAME} ON #{SCHEMA_NAME}.#{TABLE_NAME} USING btree (#{INDEX_B_COLUMN_S1});"
@connection.execute "CREATE INDEX #{INDEX_B_NAME} ON #{SCHEMA2_NAME}.#{TABLE_NAME} USING btree (#{INDEX_B_COLUMN_S2});"
@connection.execute "CREATE INDEX #{INDEX_C_NAME} ON #{SCHEMA_NAME}.#{TABLE_NAME} USING gin (#{INDEX_C_COLUMN});"
@@ -66,6 +71,10 @@ def setup
@connection.execute "CREATE INDEX #{INDEX_D_NAME} ON #{SCHEMA2_NAME}.#{TABLE_NAME} USING btree (#{INDEX_D_COLUMN} DESC);"
@connection.execute "CREATE INDEX #{INDEX_E_NAME} ON #{SCHEMA_NAME}.#{TABLE_NAME} USING gin (#{INDEX_E_COLUMN});"
@connection.execute "CREATE INDEX #{INDEX_E_NAME} ON #{SCHEMA2_NAME}.#{TABLE_NAME} USING gin (#{INDEX_E_COLUMN});"
+ @connection.execute "CREATE INDEX #{INDEX_F_NAME} ON #{SCHEMA_NAME}.#{TABLE_NAME} USING btree (#{INDEX_F_COLUMN});"
+ @connection.execute "CREATE INDEX #{INDEX_F_NAME} ON #{SCHEMA2_NAME}.#{TABLE_NAME} USING btree (#{INDEX_F_COLUMN});"
+ @connection.execute "CREATE INDEX #{INDEX_G_NAME} ON #{SCHEMA_NAME}.#{TABLE_NAME} USING btree (#{INDEX_G_COLUMN});"
+ @connection.execute "CREATE INDEX #{INDEX_G_NAME} ON #{SCHEMA2_NAME}.#{TABLE_NAME} USING btree (#{INDEX_G_COLUMN});"
@connection.execute "CREATE TABLE #{SCHEMA_NAME}.#{PK_TABLE_NAME} (id serial primary key)"
@connection.execute "CREATE SEQUENCE #{SCHEMA_NAME}.#{UNMATCHED_SEQUENCE_NAME}"
@connection.execute "CREATE TABLE #{SCHEMA_NAME}.#{UNMATCHED_PK_TABLE_NAME} (id integer NOT NULL DEFAULT nextval('#{SCHEMA_NAME}.#{UNMATCHED_SEQUENCE_NAME}'::regclass), CONSTRAINT unmatched_pkey PRIMARY KEY (id))"
@@ -352,13 +361,23 @@ def with_schema_search_path(schema_search_path)
def do_dump_index_tests_for_schema(this_schema_name, first_index_column_name, second_index_column_name, third_index_column_name, fourth_index_column_name)
with_schema_search_path(this_schema_name) do
indexes = @connection.indexes(TABLE_NAME).sort_by {|i| i.name}
- assert_equal 4,indexes.size
+ assert_equal 6,indexes.size
do_dump_index_assertions_for_one_index(indexes[0], INDEX_A_NAME, first_index_column_name)
do_dump_index_assertions_for_one_index(indexes[1], INDEX_B_NAME, second_index_column_name)
do_dump_index_assertions_for_one_index(indexes[2], INDEX_D_NAME, third_index_column_name)
do_dump_index_assertions_for_one_index(indexes[3], INDEX_E_NAME, fourth_index_column_name)
+ assert_equal 'active', indexes[0].where
+
+ assert_equal INDEX_F_NAME, indexes[4].name
+ assert_equal ['name'], indexes[4].columns
+ assert_equal ['lower'], indexes[4].functions
+
+ assert_equal INDEX_G_NAME, indexes[5].name
+ assert_equal ['name', 'active', 'email'], indexes[5].columns
+ assert_equal ['upper', nil, 'lower'], indexes[5].functions
+
indexes.select{|i| i.name != INDEX_E_NAME}.each do |index|
assert_equal :btree, index.using
end
@@ -372,5 +391,6 @@ def do_dump_index_assertions_for_one_index(this_index, this_index_name, this_ind
assert_equal 1, this_index.columns.size
assert_equal this_index_column, this_index.columns[0]
assert_equal this_index_name, this_index.name
+ assert_nil this_index.functions
end
end
View
24 activerecord/test/cases/migration/index_test.rb
@@ -193,6 +193,30 @@ def test_add_partial_index
connection.remove_index("testings", "last_name")
assert !connection.index_exists?("testings", "last_name")
end
+
+ def test_index_functions
+ connection.add_index("testings", "last_name", :functions => "lower")
+ assert connection.index_exists?("testings", "last_name")
+
+ connection.add_index("testings",
+ ["first_name", "last_name", "administrator"],
+ name: 'test_composite_index',
+ functions: ["lower", "upper", nil])
+
+ index = connection.indexes("testings").find do |i|
+ i.columns == ["first_name", "last_name", "administrator"]
+ end
+ assert_not_nil index
+ assert_equal ["lower", "upper", nil], index.functions
+
+ assert_raise ArgumentError do
+ # pass in an incorrect number of functions
+ connection.add_index :testings, :foo, unique: true, :functions => ["lower", "upper"]
+ end
+
+ connection.remove_index("testings", "last_name")
+ assert !connection.index_exists?("testings", "last_name")
+ end
end
private
View
11 activerecord/test/cases/schema_dumper_test.rb
@@ -195,6 +195,17 @@ def test_schema_dumps_partial_indices
end
end
+ def test_schema_dumps_index_functions
+ index_definition = standard_dump.split(/\n/).grep(/add_index.*company_lower_name_index/).first.strip
+ if current_adapter?(:PostgreSQLAdapter)
+ assert_equal 'add_index "companies", ["name"], name: "company_lower_name_index", using: :btree, functions: ["lower"]', index_definition
+ elsif current_adapter?(:MysqlAdapter) || current_adapter?(:Mysql2Adapter)
+ assert_equal 'add_index "companies", ["name"], name: "company_lower_name_index", using: :btree', index_definition
+ else
+ assert_equal 'add_index "companies", ["name"], name: "company_lower_name_index"', index_definition
+ end
+ end
+
def test_schema_dump_should_honor_nonstandard_primary_keys
output = standard_dump
match = output.match(%r{create_table "movies"(.*)do})
View
1  activerecord/test/schema/schema.rb
@@ -189,6 +189,7 @@ def create_table(*args, &block)
add_index :companies, [:firm_id, :type, :rating], name: "company_index"
add_index :companies, [:firm_id, :type], name: "company_partial_index", where: "rating > 10"
add_index :companies, :name, name: 'company_name_index', using: :btree
+ add_index :companies, :name, name: 'company_lower_name_index', using: :btree, functions: 'lower'
create_table :vegetables, force: true do |t|
t.string :name
Something went wrong with that request. Please try again.