Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for PostgreSQL operator classes to add_index #19090

Conversation

@gregnavis
Copy link
Contributor

gregnavis commented Feb 26, 2015

Use case

I needed to use trigrams when SELECT-ing from a table. I want to use schema.rb. Unfortunately this wasn't possible to create an appropriate index. The required query is:

CREATE INDEX users_name ON users USING gist (name gist_trgm_ops);

The gist_trgm_ops after name is the operator class to use when using the index. Currently it's possible to specify ... USING gist (name) but there's no way of adding the operator class after name.

PostgreSQL is the only affected database. Other databases are not affected.

Solution

Operator classes can be explicitly specified in add_index as:

add_index :users, :name, using: :gist, opclass: :gist_trgm_ops

Changes

  • added opclass to IndexDefinition and made it a valid add_index option
  • added support for opclass to SchemaDumper
  • test cases for the changes above

Issues

Below are issues I run into. I present my decision and a rationale for it. Any feedback is welcome! Hopefully some improvement is possible.

Syntax

I wasn't sure what's the best syntax. I considered

add_index :users, {name: :gist_trgm_ops}, using: :gist

but it places PostgreSQL-specific data where the user might not expect it.

I decided to use a new option as this makes the implementation very simple and makes the opclasses used explicit. The tradeoff is that the column names must be specified twice.

Extraneous whitespace (resolved)

There's always a space after a column name used in the index even when no operator class is specified. For example ... USING gist (name) is turned into ... USING gist (name ). I decided that this makes the code simpler at the expense of a tiny ugliness in the test suite. Additionally multiple spaces already appear in some statements, e.g. CREATE INDEX when UNIQE is not present.

@cristianbica
Copy link
Member

cristianbica commented Feb 26, 2015

There's a similar work at #18499

@sgrif
Copy link
Contributor

sgrif commented Feb 26, 2015

Is there a reason simply passing a string to :using is insufficient for this?

@gregnavis
Copy link
Contributor Author

gregnavis commented Feb 26, 2015

It's great to see that Rails is improving in this area. I can see that my PR is smaller and more focused than the other PR.

@sgrif If you do

add_index(:users, :name, :using => 'gist (name gist_trgm_ops)')

then the query is

CREATE  INDEX  "index_users_on_name" ON "users" USING gist (name gist_trgm_ops) ("name" )

Please notice that the columns specified as the second argument to add_index are listed after USING. The result is invalid syntax. Or did you have something else in mind?

@sgrif
Copy link
Contributor

sgrif commented Feb 26, 2015

That seems like a bug. I'd rather just make sure it's possible to pass a string to using properly

@gregnavis
Copy link
Contributor Author

gregnavis commented Feb 26, 2015

@sgrif that may be an option. There are some issues though. If we pass using: :gist then we must use the columns specified as the second argument to add_index. When we pass using: 'gist(name gist_trgm_ops)' then we should ignore the columns specified in the second argument because we can pass something entirely different here.

@sgrif Please tell me whether using: 'gist(name gist_trgm_ops)' is the syntax you had in mind.

@sgrif
Copy link
Contributor

sgrif commented Feb 26, 2015

Yes, that is the syntax I had in mind. We can probably just drop the automatic arguments if you pass a string to using.

@gregnavis
Copy link
Contributor Author

gregnavis commented Feb 26, 2015

What is nice about your approach is that it simpler. I can see one more issue. If I pass using: 'gist(name gist_trgm_ops)' and don't pass the array of columns (I assume this is what you meant by automatic arguments) then the migration or schema.rb will break after switching to a database other than PostgreSQL.

I don't know what is in greater alignment with core values of Rails: the ability to switch the database without breaking migrations or schema.rb or a simpler syntax and implementation. Could you give a hint?

@sgrif
Copy link
Contributor

sgrif commented Feb 26, 2015

In this case, the simpler syntax and implementation.

@sgrif
Copy link
Contributor

sgrif commented Feb 26, 2015

using: :gist already isn't portable.

@gregnavis
Copy link
Contributor Author

gregnavis commented Feb 26, 2015

Sorry, I was imprecise. My question was: do we need the ability to switch a database to something other than PostgreSQL and still be able to run migrations or schema.rb albeit with a different result (e.g. an ordinary index). It's a form of partial portability because everything will work but you won't get the features not supported by the other database.

If the answer is No, we don't need that then your approach will be way simpler.

@sgrif
Copy link
Contributor

sgrif commented Feb 26, 2015

No, we don't need that.

@gregnavis
Copy link
Contributor Author

gregnavis commented Feb 27, 2015

Great! That simplifies a lot.

I also came up with another syntax for this:

# Allow opclasses to be specified in column_name.
# Currently the whole string is quoted and treated as a column name.
add_index :users, 'name gist_trgm_ops', using: :gist

# or expect columns and opclasses to appear in using: when
# it's specified as a string (no column_name in this case)
add_index :users, using: 'gist (name gist_trgm_ops)'

The latter is what you suggested @sgrif, right?

I think the first syntax is more compatible with what we currently have. The downside is that it breaks compatibility for people who use spaces in column names (are there any? 😄)

Which do you think is better?

@sgrif
Copy link
Contributor

sgrif commented Feb 27, 2015

I think it should be:

add_index :users, :name, using: "gist (name gist_trgm_ops)"

Simply so we can continue to have the column name for index naming purposes..

@gregnavis
Copy link
Contributor Author

gregnavis commented Feb 28, 2015

What if using: is a string, e.g. "gist"? This happens in one of the tests.

An option is to test %w(gin gist hash btree).include?(options[:using].downcase). If so, then do what the current code does (i.e. index columns specified in column_name). Otherwise insert options[:using] into the query without inserting column_name (which would be used only to name the index).

What do you think?

@gregnavis gregnavis force-pushed the gregnavis:support-postgresql-operator-classes-in-indexes branch from 01b3518 to 4bd4a4f Mar 2, 2015
@gregnavis
Copy link
Contributor Author

gregnavis commented Mar 2, 2015

I updated the PR with Sean's suggestions. I'd love to hear your feedback!

@matthewd
Copy link
Member

matthewd commented Mar 2, 2015

@sgrif this seems like quite a perversion of the existing call syntax to me. 😕

Not to mention the danger of people supplying different column names in the two places...

@sgrif
Copy link
Contributor

sgrif commented Mar 2, 2015

@matthewd what would you like to see?

@gregnavis
Copy link
Contributor Author

gregnavis commented May 25, 2015

@matthewd, @sgrif, @cristianbica is there a chance we can make some progress on this? Thanks!

@matthewd
Copy link
Member

matthewd commented May 25, 2015

I guess the spelling I'd consider to be most reflective of the underlying PostgreSQL syntax would be:

add_index :users, [[:name, :gist_trgm_ops]], using: :gist

# or perhaps a more extensible:
add_index :users, [[:name, opclass: :gist_trgm_ops]], using: :gist

.. which isn't particularly pretty... but may still be an improvement over how we currently do things?

Otherwise, again in line with :order and :length, the consistent-with-precedent approach would be to add a top level :opclass option, which can either be a string (applies to all columns) or a hash (keys are column names).

Note that even if we adopted my above suggestion of [column, options] pairs, we could still support a top-level option as applying to all the columns -- meaning you could ignore that syntax for all the more common single-column / consistent-opclass indexes.


You seem to have done a slightly-too-good job of revising history here, so I can't actually see whether any of the above resembles how you had it before @sgrif suggested the current form.

But I do feel that conflating the USING parameter with the index column list would be an error: they are no more related than are the table name and the column list.

@gregnavis gregnavis force-pushed the gregnavis:support-postgresql-operator-classes-in-indexes branch from 4bd4a4f to 01b3518 May 25, 2015
@gregnavis
Copy link
Contributor Author

gregnavis commented May 25, 2015

@matthewd, I reverted the previous version of the code and the pull request message. The usage I implemented looked like:

add_index :users, :name, using: :gist, opclasses: {name: :gist_trgm_ops}

So this is in line with :order and :length (except I should change :opclasses to :opclass for consistency). How should I continue from the code that is currently in this PR?

@swalkinshaw
Copy link
Contributor

swalkinshaw commented Aug 13, 2015

@grn I'm having this same issue but with to_tsvector. Could this PR be more generic to support functions as well? Having the opclasses option limits this to your use case.

Example:

CREATE INDEX widget_name_search_idx ON widgets USING gin(to_tsvector('english', name))

The problem is the same here since you can pass a string to using but Rails still adds the column name at the end.

add_index :widgets, :name, name: 'widget_name_search_idx', using: "gin(to_tsvector('english', name))"
@lsylvester
Copy link
Contributor

lsylvester commented Aug 27, 2015

I also have run into this using the to_tsvector function. I think that it would be good to allow for indexes to be created on eny expression.

If we are going to allow any expression to be indexed, then I think that the syntax

add_index :users, :name, using: "gist (name gist_trgm_ops)"

has a few of issues.

First, there may be times when you want to index an expression without using a custom index type, like an index on a LOWER function. Here, you would to specify the index type even though you are not changing it from the default. ie.

add_index :users, :name, using: "btree (LOWER(name))"

instead of

add_index :users, "LOWER(name)"

Second, in the case where you want to index multiple columns, you would have to repeat all of the columns in the using clause. ie.

add_index :users, :organisation_id, :name, using: "btree (organisation_id, LOWER(name))"

instead of

add_index :users, :organisation_id, "LOWER(name)", using: :btree

Third, it might be required to have multiple indexes on the same column with different functions/operator classes, but if the name is generated using only the column name then the names will conflict.

For example, you might need to have indexes like:

add_index :users, :name, using: "gist (name gist_trgm_ops)"
add_index :users, :name

To support both equality and similarity searches, but both indexes would by default have the same name.

I think that it would be better to either require the name to be specified if there is an expression, or automatically generate the name based on the whole expression instead of just the column.

add_index :users, :name                               #=> creates index "index_users_on_name"
add_index :users, "name gist_trgm_ops", using: :gist  #=> creates index "index_users_on_name_gist_trgm_ops"
add_index :users, "LOWER(name)"                       #=> creates index "index_users_on_lower_name"
@matthewd
Copy link
Member

matthewd commented Aug 27, 2015

I think that it would be good to allow for indexes to be created on any expression

I agree... but that sounds more like #13684; while they're written close together in the SQL, I think opclasses are ultimately unrelated.

@YorickPeterse
Copy link

YorickPeterse commented Mar 3, 2016

If it's of any use, I just backported these changes to GitLab (see https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2987 for the exact changes) and they work like a charm. If anybody wants to backport these as well they can dump the following code somewhere in their Rails application (e.g. an initializer): https://gist.github.com/YorickPeterse/00a4364ec11e3b63c2c3

@gregnavis
Copy link
Contributor Author

gregnavis commented Mar 24, 2017

No problem, @matthewd! I know you're super-busy with other stuff. I'll try to address these issues and rebase it on top of master next week. Stay tuned!

@maclover7 maclover7 added needs work and removed needs feedback labels Mar 24, 2017
@gregnavis gregnavis force-pushed the gregnavis:support-postgresql-operator-classes-in-indexes branch 2 times, most recently May 13, 2017
@gregnavis
Copy link
Contributor Author

gregnavis commented May 13, 2017

@matthewd I rebased the branch (the conflicts weren't that bad) and address the issues you mentioned.

@matthewd @jeremy @sgrif - please review and let me know if anything else should be done before merging

@gregnavis gregnavis force-pushed the gregnavis:support-postgresql-operator-classes-in-indexes branch May 13, 2017
@gregnavis
Copy link
Contributor Author

gregnavis commented Jul 7, 2017

@matthewd @jeremy @sgrif I'm floating this to the top of your inboxes. If there's anything I could do to make the PR better please let me know.

Copy link
Member

matthewd left a comment

Thanks for the reminder!

One little thing, and this looks ready to go! 🚀

activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb Outdated
@@ -391,6 +391,25 @@ def default_index_type?(index) # :nodoc:

private

def add_index_opclass(column_names, options = {})
opclass = case options[:opclass]
when String

This comment has been minimized.

Copy link
@matthewd

matthewd Jul 8, 2017

Member

I think this misses Symbol?

Seems like it might read more easily if it were flipped a bit, making this the else clause -- then there's no need for a separate empty-hash branch.

index_name, index_type, index_columns, index_options, index_algorithm, index_using, comment = add_index_options(table_name, column_name, options)
execute("CREATE #{index_type} INDEX #{index_algorithm} #{quote_column_name(index_name)} ON #{quote_table_name(table_name)} #{index_using} (#{index_columns})#{index_options}").tap do
index_name, index_type, index_columns_and_opclasses, index_options, index_algorithm, index_using, comment = add_index_options(table_name, column_name, options)
execute("CREATE #{index_type} INDEX #{index_algorithm} #{quote_column_name(index_name)} ON #{quote_table_name(table_name)} #{index_using} (#{index_columns_and_opclasses})#{index_options}").tap do

This comment has been minimized.

Copy link
@kamipo

kamipo Jul 8, 2017

Member

Changing the index_columns name is necessary? Actually this also includes index sort orders.

This comment has been minimized.

Copy link
@gregnavis

gregnavis Jul 10, 2017

Author Contributor

I wanted the name to reflect the content. I don't have a strong opinion here so I can revert if you think it's unnecessary.

activerecord/test/cases/adapters/postgresql/schema_test.rb Outdated
@@ -494,7 +494,39 @@ def test_dump_foreign_key_targeting_different_schema
end
end

class DefaultsUsingMultipleSchemasAndDomainTest < ActiveRecord::PostgreSQLTestCase
class SchemaIndexOpclassTest < ActiveRecord::TestCase

This comment has been minimized.

Copy link
@kamipo

kamipo Jul 8, 2017

Member

s/ActiveRecord::TestCase/ActiveRecord::PostgreSQLTestCase/

activerecord/test/cases/adapters/postgresql/schema_test.rb Outdated
end
end

class DefaultsUsingMultipleSchemasAndDomainTest < ActiveSupport::TestCase

This comment has been minimized.

Copy link
@kamipo

kamipo Jul 8, 2017

Member

s/ActiveSupport::TestCase/ActiveRecord::PostgreSQLTestCase/

@gregnavis gregnavis force-pushed the gregnavis:support-postgresql-operator-classes-in-indexes branch 2 times, most recently Jul 10, 2017
@gregnavis
Copy link
Contributor Author

gregnavis commented Jul 11, 2017

@matthewd @kamipo thanks for feedback! I updated the PR as per your suggestions. It seems the build is failing for reasons unrelated to my changes.

@gregnavis gregnavis force-pushed the gregnavis:support-postgresql-operator-classes-in-indexes branch Nov 29, 2017
@gregnavis
Copy link
Contributor Author

gregnavis commented Nov 29, 2017

@matthewd @kamipo I just rebase the PR to the recent master. Please let me know whether there's anything I could do to help to merge it.

@gregnavis
Copy link
Contributor Author

gregnavis commented Nov 29, 2017

There's a CodeClimate error and it seems it prefers:

        def add_index_opclass(column_names, options = {})
          opclass = if options[:opclass].is_a?(Hash)
            options[:opclass].symbolize_keys
          else
            Hash.new { |hash, column| hash[column] = options[:opclass].to_s }
          end
          # ...
        end

to

        def add_index_opclass(column_names, options = {})
          opclass = if options[:opclass].is_a?(Hash)
                      options[:opclass].symbolize_keys
                    else
                      Hash.new { |hash, column| hash[column] = options[:opclass].to_s }
                    end
          # ...
        end

Is that style really preferred? If so, I'll update the PR.

@matthewd
Copy link
Member

matthewd commented Nov 30, 2017

I believe it will acquiesce to:

      def add_index_opclass(column_names, options = {})
        opclass =
          if options[:opclass].is_a?(Hash)
            options[:opclass].symbolize_keys
          else
            Hash.new { |hash, column| hash[column] = options[:opclass].to_s }
          end
        # ...
      end
activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb Outdated
super
end

# See http://www.postgresql.org/docs/current/static/errcodes-appendix.html

This comment has been minimized.

Copy link
@matthewd

matthewd Nov 30, 2017

Member

Tiny mismerge: we've lost an s

Add support for specifying non-default operator classes in PostgreSQL
indexes. An example CREATE INDEX query that becomes possible is:

    CREATE INDEX users_name ON users USING gist (name gist_trgm_ops);

Previously it was possible to specify the `gist` index but not the
custom operator class. The `add_index` call for the above query is:

    add_index :users, :name, using: :gist, opclasses: {name: :gist_trgm_ops}
@gregnavis gregnavis force-pushed the gregnavis:support-postgresql-operator-classes-in-indexes branch to 1dca75c Nov 30, 2017
@gregnavis
Copy link
Contributor Author

gregnavis commented Nov 30, 2017

Eagle eye! I updated the PR. Please take another look, @matthewd.

@matthewd matthewd merged commit 8e7b9e2 into rails:master Dec 1, 2017
2 checks passed
2 checks passed
codeclimate All good!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@matthewd
Copy link
Member

matthewd commented Dec 1, 2017

🎉

Sorry it took so long... and it got dropped so many times along the way 😞

And thanks for persisting -- I'm really glad to have this in.

Great work! ❤️

@gregnavis
Copy link
Contributor Author

gregnavis commented Dec 1, 2017

Woohoo! 🚀 Thank you @matthewd and everyone else for making it happen.

@gregnavis gregnavis deleted the gregnavis:support-postgresql-operator-classes-in-indexes branch Jan 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

You can’t perform that action at this time.