Add support for PostgreSQL operator classes to add_index #19090

gregnavis · 2015-02-26T12:25:26Z

Use case

I needed to use trigrams when SELECT-ing from a table, and I want to keep using schema.rb. Unfortunately, it wasn't possible to create an appropriate index. The required query is:

CREATE INDEX users_name ON users USING gist (name gist_trgm_ops);

The gist_trgm_ops after name is the operator class to use when using the index. Currently it's possible to specify ... USING gist (name) but there's no way of adding the operator class after name.

PostgreSQL is the only affected database.

Solution

Operator classes can be explicitly specified in add_index as:

add_index :users, :name, using: :gist, opclass: :gist_trgm_ops
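
As a rough illustration of the SQL this option is meant to produce, here is a minimal, Rails-free sketch. The helper `build_create_index_sql` is hypothetical and not part of Active Record, the quoting is simplified, and the index name follows the usual `index_<table>_on_<columns>` convention.

```ruby
# Hypothetical helper, not part of Active Record: builds the CREATE INDEX
# statement that an `opclass:` option is meant to produce.
def build_create_index_sql(table, columns, using: nil, opclass: nil)
  columns = Array(columns)
  column_list = columns.map { |column|
    quoted = %("#{column}")
    # Append the operator class after the column only when one is given.
    opclass ? "#{quoted} #{opclass}" : quoted
  }.join(", ")

  index_name = "index_#{table}_on_#{columns.join('_and_')}"
  sql = %(CREATE INDEX "#{index_name}" ON "#{table}")
  sql += " USING #{using}" if using
  "#{sql} (#{column_list})"
end

puts build_create_index_sql(:users, :name, using: :gist, opclass: :gist_trgm_ops)
# prints: CREATE INDEX "index_users_on_name" ON "users" USING gist ("name" gist_trgm_ops)
```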

Changes

added opclass to IndexDefinition and made it a valid add_index option
added support for opclass to SchemaDumper
test cases for the changes above

Issues

Below are the issues I ran into. I present my decision and the rationale for each. Any feedback is welcome! Hopefully some improvement is possible.

Syntax

I wasn't sure what the best syntax is. I considered

add_index :users, {name: :gist_trgm_ops}, using: :gist

but it places PostgreSQL-specific data where the user might not expect it.

I decided to use a new option as this makes the implementation very simple and makes the opclasses used explicit. The tradeoff is that the column names must be specified twice.

Extraneous whitespace (resolved)

There's always a space after a column name used in the index, even when no operator class is specified. For example, ... USING gist (name) is turned into ... USING gist (name ). I decided that this makes the code simpler at the expense of a tiny ugliness in the test suite. Additionally, multiple spaces already appear in some statements, e.g. CREATE INDEX when UNIQUE is not present.

cristianbica · 2015-02-26T13:39:35Z

There's similar work at #18499

sgrif · 2015-02-26T14:29:00Z

Is there a reason simply passing a string to :using is insufficient for this?

gregnavis · 2015-02-26T15:00:39Z

It's great to see that Rails is improving in this area. I can see that my PR is smaller and more focused than the other PR.

@sgrif If you do

add_index(:users, :name, :using => 'gist (name gist_trgm_ops)')

then the query is

CREATE  INDEX  "index_users_on_name" ON "users" USING gist (name gist_trgm_ops) ("name" )

Please notice that the columns specified as the second argument to add_index are listed after USING. The result is invalid syntax. Or did you have something else in mind?
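
The ordering problem described above can be reproduced with a simplified stand-in for the statement template. This is a sketch and not the Rails implementation; `create_index_sql` and its keyword arguments are made up for illustration.

```ruby
# Simplified stand-in for the statement template; not the actual Rails code.
# The USING text is interpolated before the generated column list.
def create_index_sql(index_name:, table:, using:, columns:)
  index_using = "USING #{using}"
  index_columns = columns.map { |column| %("#{column}") }.join(", ")
  %(CREATE INDEX "#{index_name}" ON "#{table}" #{index_using} (#{index_columns}))
end

puts create_index_sql(
  index_name: "index_users_on_name",
  table: "users",
  using: "gist (name gist_trgm_ops)",
  columns: [:name]
)
# prints: CREATE INDEX "index_users_on_name" ON "users" USING gist (name gist_trgm_ops) ("name")
```

The second parenthesised list is what makes the statement invalid.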

sgrif · 2015-02-26T15:03:45Z

That seems like a bug. I'd rather just make sure it's possible to pass a string to using properly

gregnavis · 2015-02-26T15:35:33Z

@sgrif that may be an option. There are some issues though. If we pass using: :gist then we must use the columns specified as the second argument to add_index. When we pass using: 'gist(name gist_trgm_ops)' then we should ignore the columns specified in the second argument because we can pass something entirely different here.

@sgrif Please tell me whether using: 'gist(name gist_trgm_ops)' is the syntax you had in mind.

sgrif · 2015-02-26T15:44:05Z

Yes, that is the syntax I had in mind. We can probably just drop the automatic arguments if you pass a string to using.

gregnavis · 2015-02-26T15:59:28Z

What is nice about your approach is that it is simpler. I can see one more issue. If I pass using: 'gist(name gist_trgm_ops)' and don't pass the array of columns (I assume this is what you meant by automatic arguments) then the migration or schema.rb will break after switching to a database other than PostgreSQL.

I don't know what is in greater alignment with core values of Rails: the ability to switch the database without breaking migrations or schema.rb or a simpler syntax and implementation. Could you give a hint?

sgrif · 2015-02-26T16:00:26Z

In this case, the simpler syntax and implementation.

sgrif · 2015-02-26T16:00:37Z

using: :gist already isn't portable.

gregnavis · 2015-02-26T16:12:32Z

Sorry, I was imprecise. My question was: do we need the ability to switch to a database other than PostgreSQL and still be able to run migrations or schema.rb, albeit with a different result (e.g. an ordinary index)? It's a form of partial portability: everything will work, but you won't get the features the other database doesn't support.

If the answer is "No, we don't need that", then your approach will be way simpler.

sgrif · 2015-02-26T17:52:04Z

No, we don't need that.

gregnavis · 2015-02-27T13:45:40Z

Great! That simplifies a lot.

I also came up with another syntax for this:

# Allow opclasses to be specified in column_name.
# Currently the whole string is quoted and treated as a column name.
add_index :users, 'name gist_trgm_ops', using: :gist

# or expect columns and opclasses to appear in using: when
# it's specified as a string (no column_name in this case)
add_index :users, using: 'gist (name gist_trgm_ops)'

The latter is what you suggested @sgrif, right?

I think the first syntax is more compatible with what we currently have. The downside is that it breaks compatibility for people who use spaces in column names (are there any? 😄)

Which do you think is better?

sgrif · 2015-02-27T22:07:37Z

I think it should be:

add_index :users, :name, using: "gist (name gist_trgm_ops)"

Simply so we can continue to have the column name for index naming purposes.

gregnavis · 2015-02-28T11:35:34Z

What if using: is a string, e.g. "gist"? This happens in one of the tests.

An option is to test %w(gin gist hash btree).include?(options[:using].downcase). If so, then do what the current code does (i.e. index columns specified in column_name). Otherwise insert options[:using] into the query without inserting column_name (which would be used only to name the index).
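
The check proposed above could look roughly like the following standalone sketch. The helper `using_clause` is hypothetical, and the list of index methods is the one named in the comment, not an exhaustive list of what PostgreSQL supports.

```ruby
# Hypothetical helper: returns the text to place after the table name.
def using_clause(using, quoted_columns)
  # Index methods named in the comment above; treated here as the plain ones.
  known_methods = %w(gin gist hash btree)

  if known_methods.include?(using.to_s.downcase)
    # Plain method name: keep indexing the columns given in column_name.
    "USING #{using} (#{quoted_columns.join(', ')})"
  else
    # Anything else is inserted verbatim; column_name only names the index.
    "USING #{using}"
  end
end

puts using_clause("gist", ['"name"'])
# prints: USING gist ("name")
puts using_clause("gist (name gist_trgm_ops)", ['"name"'])
# prints: USING gist (name gist_trgm_ops)
```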

What do you think?

gregnavis · 2015-03-02T13:00:59Z

I updated the PR with Sean's suggestions. I'd love to hear your feedback!

matthewd · 2015-03-02T13:35:22Z

@sgrif this seems like quite a perversion of the existing call syntax to me. 😕

Not to mention the danger of people supplying different column names in the two places...

sgrif · 2015-03-02T14:03:33Z

@matthewd what would you like to see?

gregnavis · 2015-05-25T14:09:12Z

@matthewd, @sgrif, @cristianbica is there a chance we can make some progress on this? Thanks!

matthewd · 2015-05-25T14:58:13Z

I guess the spelling I'd consider to be most reflective of the underlying PostgreSQL syntax would be:

add_index :users, [[:name, :gist_trgm_ops]], using: :gist

# or perhaps a more extensible:
add_index :users, [[:name, opclass: :gist_trgm_ops]], using: :gist

.. which isn't particularly pretty... but may still be an improvement over how we currently do things?

Otherwise, again in line with :order and :length, the consistent-with-precedent approach would be to add a top level :opclass option, which can either be a string (applies to all columns) or a hash (keys are column names).

Note that even if we adopted my above suggestion of [column, options] pairs, we could still support a top-level option as applying to all the columns -- meaning you could ignore that syntax for all the more common single-column / consistent-opclass indexes.

You seem to have done a slightly-too-good job of revising history here, so I can't actually see whether any of the above resembles how you had it before @sgrif suggested the current form.

But I do feel that conflating the USING parameter with the index column list would be an error: they are no more related than are the table name and the column list.

gregnavis · 2015-05-25T19:52:06Z

@matthewd, I reverted the previous version of the code and the pull request message. The usage I implemented looked like:

add_index :users, :name, using: :gist, opclasses: {name: :gist_trgm_ops}

So this is in line with :order and :length (except I should change :opclasses to :opclass for consistency). How should I continue from the code that is currently in this PR?

swalkinshaw · 2015-08-13T19:44:15Z

@grn I'm having this same issue but with to_tsvector. Could this PR be more generic to support functions as well? Having the opclasses option limits this to your use case.

Example:

CREATE INDEX widget_name_search_idx ON widgets USING gin(to_tsvector('english', name))

The problem is the same here since you can pass a string to using but Rails still adds the column name at the end.

add_index :widgets, :name, name: 'widget_name_search_idx', using: "gin(to_tsvector('english', name))"

lsylvester · 2015-08-27T03:48:20Z

I have also run into this using the to_tsvector function. I think that it would be good to allow for indexes to be created on any expression.

If we are going to allow any expression to be indexed, then I think that the syntax

add_index :users, :name, using: "gist (name gist_trgm_ops)"

has a few issues.

First, there may be times when you want to index an expression without using a custom index type, like an index on a LOWER function. Here, you would have to specify the index type even though you are not changing it from the default, i.e.

add_index :users, :name, using: "btree (LOWER(name))"

instead of

add_index :users, "LOWER(name)"

Second, in the case where you want to index multiple columns, you would have to repeat all of the columns in the using clause, i.e.

add_index :users, :organisation_id, :name, using: "btree (organisation_id, LOWER(name))"

instead of

add_index :users, :organisation_id, "LOWER(name)", using: :btree

Third, it might be required to have multiple indexes on the same column with different functions/operator classes, but if the name is generated using only the column name then the names will conflict.

For example, you might need to have indexes like:

add_index :users, :name, using: "gist (name gist_trgm_ops)"
add_index :users, :name

This would support both equality and similarity searches, but both indexes would have the same name by default.

I think that it would be better to either require the name to be specified if there is an expression, or automatically generate the name based on the whole expression instead of just the column.

add_index :users, :name                               #=> creates index "index_users_on_name"
add_index :users, "name gist_trgm_ops", using: :gist  #=> creates index "index_users_on_name_gist_trgm_ops"
add_index :users, "LOWER(name)"                       #=> creates index "index_users_on_lower_name"
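
One possible way to derive names like the ones listed above is sketched below. `expression_index_name` is a hypothetical helper, not an existing Rails method, and the normalisation rule (lowercase, then collapse anything that is not a letter, digit or underscore into a single underscore) is an assumption chosen to match those three examples.

```ruby
# Hypothetical naming helper: derives an index name from the whole expression
# rather than from the column name alone.
def expression_index_name(table, *expressions)
  parts = expressions.map do |expression|
    expression.to_s.downcase
              .gsub(/[^a-z0-9_]+/, "_") # collapse punctuation and spaces
              .gsub(/\A_+|_+\z/, "")    # trim leading and trailing underscores
  end
  "index_#{table}_on_#{parts.join('_and_')}"
end

puts expression_index_name(:users, :name)                # index_users_on_name
puts expression_index_name(:users, "name gist_trgm_ops") # index_users_on_name_gist_trgm_ops
puts expression_index_name(:users, "LOWER(name)")        # index_users_on_lower_name
```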

matthewd · 2015-08-27T20:25:39Z

"I think that it would be good to allow for indexes to be created on any expression"

I agree... but that sounds more like #13684; while they're written close together in the SQL, I think opclasses are ultimately unrelated.

yorickpeterse · 2016-03-03T11:39:47Z

If it's of any use, I just backported these changes to GitLab (see https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2987 for the exact changes) and they work like a charm. If anybody wants to backport these as well they can dump the following code somewhere in their Rails application (e.g. an initializer): https://gist.github.com/YorickPeterse/00a4364ec11e3b63c2c3

matthewd · 2017-03-24T06:23:35Z

Oops.. I'd left this to give a chance for second opinions, but then failed to come back to it. Sorry @gregnavis 😟

I like this implementation.

From a quick scroll through to reacquaint myself, I've spotted:

rename opclasses to opclass as you mentioned;
even though we're already far from perfect on this query construction, it's probably worth avoiding the space when no opclass is set;
opclass can be a non-hash, in which case that value applies to all columns (and the dumper should presumably take advantage of that, especially for the special-but-common case of a single-column index).
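
The last two points can be sketched in plain Ruby as follows. This is only an approximation under stated assumptions: `normalize_opclass` and `columns_with_opclass` are hypothetical names, `transform_keys(&:to_sym)` stands in for ActiveSupport's `symbolize_keys`, and quoting is simplified.

```ruby
# Accepts either a Hash (per-column opclasses) or a single value that
# applies to every column.
def normalize_opclass(option)
  if option.is_a?(Hash)
    option.transform_keys(&:to_sym)
  else
    # Any column looked up gets the same opclass, stored on first access.
    Hash.new { |hash, column| hash[column] = option.to_s }
  end
end

# Appends the opclass to each quoted column without leaving a trailing
# space when no opclass is set for that column.
def columns_with_opclass(column_names, option)
  opclass = normalize_opclass(option)
  column_names.map do |name|
    value = opclass[name.to_sym].to_s
    value.empty? ? %("#{name}") : %("#{name}" #{value})
  end.join(", ")
end

puts columns_with_opclass([:name], :gist_trgm_ops)
# prints: "name" gist_trgm_ops
puts columns_with_opclass([:name, :email], { "name" => :gist_trgm_ops })
# prints: "name" gist_trgm_ops, "email"
puts columns_with_opclass([:name], nil)
# prints: "name"
```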

The inddef parsing is taking some liberties in assuming things about the names (e.g., that neither the column nor opclass name contains a space), but it looks like desc_order_columns, for example, is already being similarly presumptuous. So it seems fine to call that someone else's future problem.

I haven't looked at how bad the conflicts are after having neglected this for so long 😕

gregnavis · 2017-03-24T10:36:58Z

No problem, @matthewd! I know you're super-busy with other stuff. I'll try to address these issues and rebase it on top of master next week. Stay tuned!

gregnavis · 2017-05-13T02:32:07Z

@matthewd I rebased the branch (the conflicts weren't that bad) and addressed the issues you mentioned.

@matthewd @jeremy @sgrif - please review and let me know if anything else should be done before merging

gregnavis · 2017-07-07T05:24:50Z

@matthewd @jeremy @sgrif I'm floating this to the top of your inboxes. If there's anything I could do to make the PR better please let me know.

matthewd

Thanks for the reminder!

One little thing, and this looks ready to go! 🚀

matthewd · 2017-07-08T20:40:46Z

activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb

@@ -391,6 +391,25 @@ def default_index_type?(index) # :nodoc:

      private

+        def add_index_opclass(column_names, options = {})
+          opclass = case options[:opclass]
+                    when String


I think this misses Symbol?

Seems like it might read more easily if it were flipped a bit, making this the else clause -- then there's no need for a separate empty-hash branch.
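
A small standalone sketch of both remarks follows. The method names are made up for illustration, and the flipped form is one reading of the suggestion rather than the code that was eventually merged.

```ruby
# `case ... when String` relies on String === value, which is false for a
# Symbol, so a Symbol opclass would fall through to the else branch.
def matches_string_branch?(value)
  case value
  when String then true
  else false
  end
end

# Flipped form: only Hash is special; everything else goes through to_s,
# which also covers nil (no opclass) without a separate empty-hash branch.
def opclass_for(column, option)
  option.is_a?(Hash) ? option[column].to_s : option.to_s
end

puts matches_string_branch?("gist_trgm_ops") # true
puts matches_string_branch?(:gist_trgm_ops)  # false
puts opclass_for(:name, :gist_trgm_ops)      # gist_trgm_ops
```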

kamipo · 2017-07-08T20:56:27Z

activerecord/lib/active_record/connection_adapters/postgresql/schema_statements.rb

-          index_name, index_type, index_columns, index_options, index_algorithm, index_using, comment = add_index_options(table_name, column_name, options)
-          execute("CREATE #{index_type} INDEX #{index_algorithm} #{quote_column_name(index_name)} ON #{quote_table_name(table_name)} #{index_using} (#{index_columns})#{index_options}").tap do
+          index_name, index_type, index_columns_and_opclasses, index_options, index_algorithm, index_using, comment = add_index_options(table_name, column_name, options)
+          execute("CREATE #{index_type} INDEX #{index_algorithm} #{quote_column_name(index_name)} ON #{quote_table_name(table_name)} #{index_using} (#{index_columns_and_opclasses})#{index_options}").tap do


Is changing the index_columns name necessary? It actually also includes index sort orders.

gregnavis · 2017-07-10

I wanted the name to reflect the content. I don't have a strong opinion here so I can revert if you think it's unnecessary.

kamipo · 2017-07-08T21:04:08Z

activerecord/test/cases/adapters/postgresql/schema_test.rb

@@ -494,7 +494,39 @@ def test_dump_foreign_key_targeting_different_schema
  end
 end

-class DefaultsUsingMultipleSchemasAndDomainTest < ActiveRecord::PostgreSQLTestCase
+class SchemaIndexOpclassTest < ActiveRecord::TestCase


s/ActiveRecord::TestCase/ActiveRecord::PostgreSQLTestCase/

kamipo · 2017-07-08T21:04:37Z

activerecord/test/cases/adapters/postgresql/schema_test.rb

+  end
+end
+
+class DefaultsUsingMultipleSchemasAndDomainTest < ActiveSupport::TestCase


s/ActiveSupport::TestCase/ActiveRecord::PostgreSQLTestCase/

gregnavis · 2017-07-11T05:15:39Z

@matthewd @kamipo thanks for feedback! I updated the PR as per your suggestions. It seems the build is failing for reasons unrelated to my changes.

gregnavis · 2017-11-29T21:47:28Z

@matthewd @kamipo I just rebased the PR onto the recent master. Please let me know whether there's anything I could do to help get it merged.

gregnavis · 2017-11-29T21:55:12Z

There's a CodeClimate error and it seems it prefers:

        def add_index_opclass(column_names, options = {})
          opclass = if options[:opclass].is_a?(Hash)
            options[:opclass].symbolize_keys
          else
            Hash.new { |hash, column| hash[column] = options[:opclass].to_s }
          end
          # ...
        end

to

        def add_index_opclass(column_names, options = {})
          opclass = if options[:opclass].is_a?(Hash)
                      options[:opclass].symbolize_keys
                    else
                      Hash.new { |hash, column| hash[column] = options[:opclass].to_s }
                    end
          # ...
        end

Is that style really preferred? If so, I'll update the PR.

matthewd · 2017-11-30T02:06:06Z

I believe it will acquiesce to:

      def add_index_opclass(column_names, options = {})
        opclass =
          if options[:opclass].is_a?(Hash)
            options[:opclass].symbolize_keys
          else
            Hash.new { |hash, column| hash[column] = options[:opclass].to_s }
          end
        # ...
      end

matthewd · 2017-11-30T02:06:07Z

activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb

+          super
+        end
+
+        # See http://www.postgresql.org/docs/current/static/errcodes-appendix.html


Tiny mismerge: we've lost an s

Add support for specifying non-default operator classes in PostgreSQL indexes. An example CREATE INDEX query that becomes possible is:

CREATE INDEX users_name ON users USING gist (name gist_trgm_ops);

Previously it was possible to specify the `gist` index but not the custom operator class. The `add_index` call for the above query is:

add_index :users, :name, using: :gist, opclasses: {name: :gist_trgm_ops}

gregnavis · 2017-11-30T10:12:33Z

Eagle eye! I updated the PR. Please take another look, @matthewd.

matthewd · 2017-12-01T10:23:14Z

🎉

Sorry it took so long... and it got dropped so many times along the way 😞

And thanks for persisting -- I'm really glad to have this in.

Great work! ❤️

gregnavis · 2017-12-01T10:32:30Z

Woohoo! 🚀 Thank you @matthewd and everyone else for making it happen.

carlosantoniodasilva added the activerecord label Mar 2, 2015

gregnavis force-pushed the support-postgresql-operator-classes-in-indexes branch from 01b3518 to 4bd4a4f Compare March 2, 2015 12:55

matthewd mentioned this pull request Mar 2, 2015

Support for :add_index with json-columns/attributes? [PostgreSQL] #19179

Closed

gregnavis force-pushed the support-postgresql-operator-classes-in-indexes branch from 4bd4a4f to 01b3518 Compare May 25, 2015 19:31

matthewd mentioned this pull request Jun 9, 2015

Migration of trigram search #20495

Closed

rails-bot assigned sgrif Oct 20, 2015

maclover7 added needs work and removed needs feedback labels Mar 24, 2017

gregnavis force-pushed the support-postgresql-operator-classes-in-indexes branch 2 times, most recently from 77cf54d to 7eb4554 Compare May 13, 2017 02:28

gregnavis force-pushed the support-postgresql-operator-classes-in-indexes branch from 7eb4554 to 3a0ee0e Compare May 13, 2017 02:39

ghost mentioned this pull request Jun 5, 2017

[5.5] Capability to specify index columns modifiers laravel/framework#19476

Closed

matthewd reviewed Jul 8, 2017

kamipo reviewed Jul 8, 2017

gregnavis force-pushed the support-postgresql-operator-classes-in-indexes branch 2 times, most recently from 10508bb to 5c68d97 Compare July 11, 2017 03:20

gregnavis force-pushed the support-postgresql-operator-classes-in-indexes branch from 5c68d97 to d278636 Compare November 29, 2017 21:46

matthewd reviewed Nov 30, 2017

gregnavis force-pushed the support-postgresql-operator-classes-in-indexes branch from d278636 to 1dca75c Compare November 30, 2017 09:47

matthewd merged commit 8e7b9e2 into rails:master Dec 1, 2017

gregnavis deleted the support-postgresql-operator-classes-in-indexes branch January 3, 2018 20:34

kamipo mentioned this pull request Oct 20, 2018

Support more PostgreSQL and SQLite3 index options #18499

Closed

matthewd mentioned this pull request Mar 29, 2019

Make SchemaCache loading faster #35785

Closed
