Add support for PostgreSQL operator classes to add_index #19090
There's similar work at #18499 |
Is there a reason simply passing a string to |
It's great to see that Rails is improving in this area. I can see that my PR is smaller and more focused than the other PR. @sgrif If you do add_index(:users, :name, :using => 'gist (name gist_trgm_ops)') then the query is CREATE INDEX "index_users_on_name" ON "users" USING gist (name gist_trgm_ops) ("name" ) Please notice that the columns specified as the second argument to |
That seems like a bug. I'd rather just make sure it's possible to pass a string to using properly |
@sgrif that may be an option. There are some issues though. If we pass @sgrif Please tell me whether |
Yes, that is the syntax I had in mind. We can probably just drop the automatic arguments if you pass a string to using. |
What is nice about your approach is that it is simpler. I can see one more issue. If I pass I don't know what is in greater alignment with core values of Rails: the ability to switch the database without breaking migrations or |
In this case, the simpler syntax and implementation. |
|
Sorry, I was imprecise. My question was: do we need the ability to switch a database to something other than PostgreSQL and still be able to run migrations or If the answer is No, we don't need that then your approach will be way simpler. |
No, we don't need that. |
Great! That simplifies a lot. I also came up with another syntax for this: # Allow opclasses to be specified in column_name.
# Currently the whole string is quoted and treated as a column name.
add_index :users, 'name gist_trgm_ops', using: :gist
# or expect columns and opclasses to appear in using: when
# it's specified as a string (no column_name in this case)
add_index :users, using: 'gist (name gist_trgm_ops)' The latter is what you suggested @sgrif, right? I think the first syntax is more compatible with what we currently have. The downside is that it breaks compatibility for people who use spaces in column names (are there any? 😄) Which do you think is better? |
I think it should be:
Simply so we can continue to have the column name for index naming purposes.. |
What if An option is to test What do you think? |
I updated the PR with Sean's suggestions. I'd love to hear your feedback! |
@sgrif this seems like quite a perversion of the existing call syntax to me. 😕 Not to mention the danger of people supplying different column names in the two places... |
@matthewd what would you like to see? |
@matthewd, @sgrif, @cristianbica is there a chance we can make some progress on this? Thanks! |
I guess the spelling I'd consider to be most reflective of the underlying PostgreSQL syntax would be: add_index :users, [[:name, :gist_trgm_ops]], using: :gist
# or perhaps a more extensible:
add_index :users, [[:name, opclass: :gist_trgm_ops]], using: :gist .. which isn't particularly pretty... but may still be an improvement over how we currently do things? Otherwise, again in line with Note that even if we adopted my above suggestion of You seem to have done a slightly-too-good job of revising history here, so I can't actually see whether any of the above resembles how you had it before @sgrif suggested the current form. But I do feel that conflating the |
@matthewd, I reverted the previous version of the code and the pull request message. The usage I implemented looked like: add_index :users, :name, using: :gist, opclasses: {name: :gist_trgm_ops} So this is in line with |
@grn I'm having this same issue but with Example: CREATE INDEX widget_name_search_idx ON widgets USING gin(to_tsvector('english', name)) The problem is the same here since you can pass a string to add_index :widgets, :name, name: 'widget_name_search_idx', using: "gin(to_tsvector('english', name))" |
I have also run into this using the to_tsvector function. I think that it would be good to allow for indexes to be created on any expression. If we are going to allow any expression to be indexed, then I think that the syntax add_index :users, :name, using: "gist (name gist_trgm_ops)" has a few issues. First, there may be times when you want to index an expression without using a custom index type, like an index on a LOWER function. Here, you would have to specify the index type even though you are not changing it from the default, i.e. add_index :users, :name, using: "btree (LOWER(name))" instead of add_index :users, "LOWER(name)" Second, in the case where you want to index multiple columns, you would have to repeat all of the columns in the using clause, i.e. add_index :users, :organisation_id, :name, using: "btree (organisation_id, LOWER(name))" instead of add_index :users, :organisation_id, "LOWER(name)", using: :btree Third, it might be required to have multiple indexes on the same column with different functions/operator classes, but if the name is generated using only the column name then the names will conflict. For example, you might need to have indexes like: add_index :users, :name, using: "gist (name gist_trgm_ops)"
add_index :users, :name To support both equality and similarity searches, but both indexes would by default have the same name. I think that it would be better to either require the name to be specified if there is an expression, or automatically generate the name based on the whole expression instead of just the column. add_index :users, :name #=> creates index "index_users_on_name"
add_index :users, "name gist_trgm_ops", using: :gist #=> creates index "index_users_on_name_gist_trgm_ops"
add_index :users, "LOWER(name)" #=> creates index "index_users_on_lower_name" |
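The naming idea in the comment above can be sketched in plain Ruby. This is only an illustration: `sketch_index_name` is a hypothetical helper, not a Rails method, and the slug rules (downcase, replace anything that is not a letter, digit or underscore) are an assumption chosen to reproduce the three names listed above. The `_and_` separator mirrors how Rails joins multiple column names in default index names.

```ruby
# Hypothetical helper, not part of Rails: derives an index name from the
# whole column expression instead of only the column name.
def sketch_index_name(table, *expressions)
  parts = expressions.map do |expression|
    # Collapse spaces, parentheses and other punctuation into underscores.
    slug = expression.to_s.downcase.gsub(/[^a-z0-9_]+/, "_")
    # Trim underscores left over at either end, e.g. from a closing paren.
    slug.gsub(/\A_+|_+\z/, "")
  end
  "index_#{table}_on_#{parts.join('_and_')}"
end

puts sketch_index_name(:users, :name)                # index_users_on_name
puts sketch_index_name(:users, "name gist_trgm_ops") # index_users_on_name_gist_trgm_ops
puts sketch_index_name(:users, "LOWER(name)")        # index_users_on_lower_name
```

With a rule like this, a plain index and an opclass index on the same column no longer collide by default, at the cost of longer generated names.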
I agree... but that sounds more like #13684; while they're written close together in the SQL, I think opclasses are ultimately unrelated. |
If it's of any use, I just backported these changes to GitLab (see https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2987 for the exact changes) and they work like a charm. If anybody wants to backport these as well they can dump the following code somewhere in their Rails application (e.g. an initializer): https://gist.github.com/YorickPeterse/00a4364ec11e3b63c2c3 |
Oops.. I'd left this to give a chance for second opinions, but then failed to come back to it. Sorry @gregnavis 😟 I like this implementation. From a quick scroll through to reacquaint myself, I've spotted:
The I haven't looked at how bad the conflicts are after having neglected this for so long 😕 |
No problem, @matthewd! I know you're super-busy with other stuff. I'll try to address these issues and rebase it on top of |
Thanks for the reminder!
One little thing, and this looks ready to go! 🚀
@@ -391,6 +391,25 @@ def default_index_type?(index) # :nodoc:

  private

    def add_index_opclass(column_names, options = {})
      opclass = case options[:opclass]
      when String
I think this misses Symbol?
Seems like it might read more easily if it were flipped a bit, making this the `else` clause -- then there's no need for a separate empty-hash branch.
- index_name, index_type, index_columns, index_options, index_algorithm, index_using, comment = add_index_options(table_name, column_name, options)
- execute("CREATE #{index_type} INDEX #{index_algorithm} #{quote_column_name(index_name)} ON #{quote_table_name(table_name)} #{index_using} (#{index_columns})#{index_options}").tap do
+ index_name, index_type, index_columns_and_opclasses, index_options, index_algorithm, index_using, comment = add_index_options(table_name, column_name, options)
+ execute("CREATE #{index_type} INDEX #{index_algorithm} #{quote_column_name(index_name)} ON #{quote_table_name(table_name)} #{index_using} (#{index_columns_and_opclasses})#{index_options}").tap do
Is changing the `index_columns` name necessary? Actually, this also includes index sort orders.
I wanted the name to reflect the content. I don't have a strong opinion here so I can revert if you think it's unnecessary.
@@ -494,7 +494,39 @@ def test_dump_foreign_key_targeting_different_schema
  end
end

class DefaultsUsingMultipleSchemasAndDomainTest < ActiveRecord::PostgreSQLTestCase
class SchemaIndexOpclassTest < ActiveRecord::TestCase
s/ActiveRecord::TestCase/ActiveRecord::PostgreSQLTestCase/
  end
end

class DefaultsUsingMultipleSchemasAndDomainTest < ActiveSupport::TestCase
s/ActiveSupport::TestCase/ActiveRecord::PostgreSQLTestCase/
There's a CodeClimate error and it seems it prefers: def add_index_opclass(column_names, options = {})
opclass = if options[:opclass].is_a?(Hash)
options[:opclass].symbolize_keys
else
Hash.new { |hash, column| hash[column] = options[:opclass].to_s }
end
# ...
end to def add_index_opclass(column_names, options = {})
opclass = if options[:opclass].is_a?(Hash)
options[:opclass].symbolize_keys
else
Hash.new { |hash, column| hash[column] = options[:opclass].to_s }
end
# ...
end Is that style really preferred? If so, I'll update the PR. |
I believe it will acquiesce to: def add_index_opclass(column_names, options = {})
opclass =
if options[:opclass].is_a?(Hash)
options[:opclass].symbolize_keys
else
Hash.new { |hash, column| hash[column] = options[:opclass].to_s }
end
# ...
end |
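For readers unfamiliar with the `Hash.new` block form used above, here is a minimal standalone sketch of the same normalization. `normalize_opclass` is a hypothetical name, and `transform_keys(&:to_sym)` stands in for ActiveSupport's `symbolize_keys` so the snippet runs without Rails; it is not the code merged in the PR.

```ruby
# Hypothetical standalone version of the opclass normalization.
# A Hash maps individual columns to operator classes; any other value
# (String, Symbol, nil) becomes the default for every column looked up.
def normalize_opclass(opclass)
  if opclass.is_a?(Hash)
    opclass.transform_keys(&:to_sym)
  else
    # The block runs on a missing key and stores the stringified default.
    Hash.new { |hash, column| hash[column] = opclass.to_s }
  end
end

per_column = normalize_opclass("name" => :gist_trgm_ops)
p per_column[:name]   # :gist_trgm_ops
p per_column[:email]  # nil, no opclass for this column

for_all = normalize_opclass(:gist_trgm_ops)
p for_all[:name]      # "gist_trgm_ops"

none = normalize_opclass(nil)
p none[:name]         # "", since nil.to_s is an empty string
```

Putting the non-Hash case in the `else` branch is what lets a Symbol, a String and a missing option share one code path, which is the point raised in the review comment about Symbol being missed.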
  super
end

# See http://www.postgresql.org/docs/current/static/errcodes-appendix.html
Tiny mismerge: we've lost an s
Add support for specifying non-default operator classes in PostgreSQL indexes. An example CREATE INDEX query that becomes possible is: CREATE INDEX users_name ON users USING gist (name gist_trgm_ops); Previously it was possible to specify the `gist` index but not the custom operator class. The `add_index` call for the above query is: add_index :users, :name, using: :gist, opclasses: {name: :gist_trgm_ops}
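To make the mapping in this commit message concrete, here is a rough plain-Ruby sketch of how the `add_index` arguments could line up with the generated statement. `sketch_create_index_sql` is a hypothetical function, not the adapter code; it skips identifier quoting and builds the index name as `<table>_<columns>` purely to match the `users_name` example above.

```ruby
# Hypothetical illustration only: assembles a CREATE INDEX statement from
# the same pieces that the add_index call above passes in.
def sketch_create_index_sql(table, columns, using:, opclasses: {})
  column_names = Array(columns)
  column_list = column_names.map do |column|
    # Append the operator class only when one is given for this column.
    [column.to_s, opclasses[column.to_sym]].compact.join(" ")
  end.join(", ")
  index_name = "#{table}_#{column_names.join('_')}"
  "CREATE INDEX #{index_name} ON #{table} USING #{using} (#{column_list});"
end

puts sketch_create_index_sql(:users, :name, using: :gist,
                             opclasses: { name: :gist_trgm_ops })
# CREATE INDEX users_name ON users USING gist (name gist_trgm_ops);
```

Columns without an entry in `opclasses` are emitted bare, so the default operator class still applies to them.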
Eagle eye! I updated the PR. Please take another look, @matthewd. |
🎉 Sorry it took so long... and it got dropped so many times along the way 😞 And thanks for persisting -- I'm really glad to have this in. Great work! ❤️ |
Woohoo! 🚀 Thank you @matthewd and everyone else for making it happen. |
Use case
I needed to use trigrams when `SELECT`-ing from a table. I want to use `schema.rb`. Unfortunately, it wasn't possible to create an appropriate index. The required query is:
The `gist_trgm_ops` after `name` is the operator class to use when using the index. Currently it's possible to specify `... USING gist (name)` but there's no way of adding the operator class after `name`.
PostgreSQL is the only affected database.
Solution
Operator classes can be explicitly specified in `add_index` as:
Changes
`opclass` to `IndexDefinition` and made it a valid `add_index` option
`opclass` to `SchemaDumper`
Issues
Below are the issues I ran into. For each, I present my decision and the rationale for it. Any feedback is welcome! Hopefully some improvement is possible.
Syntax
I wasn't sure what's the best syntax. I considered
but it places PostgreSQL-specific data where the user might not expect it.
I decided to use a new option as this makes the implementation very simple and makes the opclasses used explicit. The tradeoff is that the column names must be specified twice.
Extraneous whitespace (resolved)
There's always a space after a column name used in the index, even when no operator class is specified. For example, `... USING gist (name)` is turned into `... USING gist (name )`. I decided that this makes the code simpler at the expense of a tiny ugliness in the test suite. Additionally, multiple spaces already appear in some statements, e.g. `CREATE INDEX` when `UNIQUE` is not present.
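One way the trailing space could be avoided, shown as a small sketch rather than the change that resolved this issue: build each column fragment from its non-empty parts and join them, so the separator only appears when an operator class exists. `quoted_column_with_opclass` is a hypothetical name and the double-quote quoting is simplified.

```ruby
# Hypothetical helper: joins the quoted column and its operator class,
# skipping the operator class (and the space) when none is given.
def quoted_column_with_opclass(column, opclass = nil)
  [%("#{column}"), opclass.to_s].reject(&:empty?).join(" ")
end

puts quoted_column_with_opclass(:name)                 # "name"
puts quoted_column_with_opclass(:name, :gist_trgm_ops) # "name" gist_trgm_ops
```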