Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for PostgreSQL operator classes to add_index #19090

Conversation

gregnavis
Copy link
Contributor

@gregnavis gregnavis commented Feb 26, 2015

Use case

I needed to use trigrams when SELECT-ing from a table. I want to use schema.rb. Unfortunately this wasn't possible to create an appropriate index. The required query is:

CREATE INDEX users_name ON users USING gist (name gist_trgm_ops);

The gist_trgm_ops after name is the operator class to use when using the index. Currently it's possible to specify ... USING gist (name) but there's no way of adding the operator class after name.

PostgreSQL is the only affected database. Other databases are not affected.

Solution

Operator classes can be explicitly specified in add_index as:

add_index :users, :name, using: :gist, opclass: :gist_trgm_ops

Changes

  • added opclass to IndexDefinition and made it a valid add_index option
  • added support for opclass to SchemaDumper
  • test cases for the changes above

Issues

Below are issues I run into. I present my decision and a rationale for it. Any feedback is welcome! Hopefully some improvement is possible.

Syntax

I wasn't sure what's the best syntax. I considered

add_index :users, {name: :gist_trgm_ops}, using: :gist

but it places PostgreSQL-specific data where the user might not expect it.

I decided to use a new option as this makes the implementation very simple and makes the opclasses used explicit. The tradeoff is that the column names must be specified twice.

Extraneous whitespace (resolved)

There's always a space after a column name used in the index even when no operator class is specified. For example ... USING gist (name) is turned into ... USING gist (name ). I decided that this makes the code simpler at the expense of a tiny ugliness in the test suite. Additionally multiple spaces already appear in some statements, e.g. CREATE INDEX when UNIQE is not present.

@cristianbica
Copy link
Member

There's a similar work at #18499

@sgrif
Copy link
Contributor

sgrif commented Feb 26, 2015

Is there a reason simply passing a string to :using is insufficient for this?

@gregnavis
Copy link
Contributor Author

It's great to see that Rails is improving in this area. I can see that my PR is smaller and more focused than the other PR.

@sgrif If you do

add_index(:users, :name, :using => 'gist (name gist_trgm_ops)')

then the query is

CREATE  INDEX  "index_users_on_name" ON "users" USING gist (name gist_trgm_ops) ("name" )

Please notice that the columns specified as the second argument to add_index are listed after USING. The result is invalid syntax. Or did you have something else in mind?

@sgrif
Copy link
Contributor

sgrif commented Feb 26, 2015

That seems like a bug. I'd rather just make sure it's possible to pass a string to using properly

@gregnavis
Copy link
Contributor Author

@sgrif that may be an option. There are some issues though. If we pass using: :gist then we must use the columns specified as the second argument to add_index. When we pass using: 'gist(name gist_trgm_ops)' then we should ignore the columns specified in the second argument because we can pass something entirely different here.

@sgrif Please tell me whether using: 'gist(name gist_trgm_ops)' is the syntax you had in mind.

@sgrif
Copy link
Contributor

sgrif commented Feb 26, 2015

Yes, that is the syntax I had in mind. We can probably just drop the automatic arguments if you pass a string to using.

@gregnavis
Copy link
Contributor Author

What is nice about your approach is that it simpler. I can see one more issue. If I pass using: 'gist(name gist_trgm_ops)' and don't pass the array of columns (I assume this is what you meant by automatic arguments) then the migration or schema.rb will break after switching to a database other than PostgreSQL.

I don't know what is in greater alignment with core values of Rails: the ability to switch the database without breaking migrations or schema.rb or a simpler syntax and implementation. Could you give a hint?

@sgrif
Copy link
Contributor

sgrif commented Feb 26, 2015

In this case, the simpler syntax and implementation.

@sgrif
Copy link
Contributor

sgrif commented Feb 26, 2015

using: :gist already isn't portable.

@gregnavis
Copy link
Contributor Author

Sorry, I was imprecise. My question was: do we need the ability to switch a database to something other than PostgreSQL and still be able to run migrations or schema.rb albeit with a different result (e.g. an ordinary index). It's a form of partial portability because everything will work but you won't get the features not supported by the other database.

If the answer is No, we don't need that then your approach will be way simpler.

@sgrif
Copy link
Contributor

sgrif commented Feb 26, 2015

No, we don't need that.

@gregnavis
Copy link
Contributor Author

Great! That simplifies a lot.

I also came up with another syntax for this:

# Allow opclasses to be specified in column_name.
# Currently the whole string is quoted and treated as a column name.
add_index :users, 'name gist_trgm_ops', using: :gist

# or expect columns and opclasses to appear in using: when
# it's specified as a string (no column_name in this case)
add_index :users, using: 'gist (name gist_trgm_ops)'

The latter is what you suggested @sgrif, right?

I think the first syntax is more compatible with what we currently have. The downside is that it breaks compatibility for people who use spaces in column names (are there any? 😄)

Which do you think is better?

@sgrif
Copy link
Contributor

sgrif commented Feb 27, 2015

I think it should be:

add_index :users, :name, using: "gist (name gist_trgm_ops)"

Simply so we can continue to have the column name for index naming purposes..

@gregnavis
Copy link
Contributor Author

What if using: is a string, e.g. "gist"? This happens in one of the tests.

An option is to test %w(gin gist hash btree).include?(options[:using].downcase). If so, then do what the current code does (i.e. index columns specified in column_name). Otherwise insert options[:using] into the query without inserting column_name (which would be used only to name the index).

What do you think?

@gregnavis gregnavis force-pushed the support-postgresql-operator-classes-in-indexes branch from 01b3518 to 4bd4a4f Compare March 2, 2015 12:55
@gregnavis
Copy link
Contributor Author

I updated the PR with Sean's suggestions. I'd love to hear your feedback!

@matthewd
Copy link
Member

matthewd commented Mar 2, 2015

@sgrif this seems like quite a perversion of the existing call syntax to me. 😕

Not to mention the danger of people supplying different column names in the two places...

@sgrif
Copy link
Contributor

sgrif commented Mar 2, 2015

@matthewd what would you like to see?

@gregnavis
Copy link
Contributor Author

@matthewd, @sgrif, @cristianbica is there a chance we can make some progress on this? Thanks!

@matthewd
Copy link
Member

I guess the spelling I'd consider to be most reflective of the underlying PostgreSQL syntax would be:

add_index :users, [[:name, :gist_trgm_ops]], using: :gist

# or perhaps a more extensible:
add_index :users, [[:name, opclass: :gist_trgm_ops]], using: :gist

.. which isn't particularly pretty... but may still be an improvement over how we currently do things?

Otherwise, again in line with :order and :length, the consistent-with-precedent approach would be to add a top level :opclass option, which can either be a string (applies to all columns) or a hash (keys are column names).

Note that even if we adopted my above suggestion of [column, options] pairs, we could still support a top-level option as applying to all the columns -- meaning you could ignore that syntax for all the more common single-column / consistent-opclass indexes.


You seem to have done a slightly-too-good job of revising history here, so I can't actually see whether any of the above resembles how you had it before @sgrif suggested the current form.

But I do feel that conflating the USING parameter with the index column list would be an error: they are no more related than are the table name and the column list.

@gregnavis gregnavis force-pushed the support-postgresql-operator-classes-in-indexes branch from 4bd4a4f to 01b3518 Compare May 25, 2015 19:31
@gregnavis
Copy link
Contributor Author

@matthewd, I reverted the previous version of the code and the pull request message. The usage I implemented looked like:

add_index :users, :name, using: :gist, opclasses: {name: :gist_trgm_ops}

So this is in line with :order and :length (except I should change :opclasses to :opclass for consistency). How should I continue from the code that is currently in this PR?

@swalkinshaw
Copy link
Contributor

@grn I'm having this same issue but with to_tsvector. Could this PR be more generic to support functions as well? Having the opclasses option limits this to your use case.

Example:

CREATE INDEX widget_name_search_idx ON widgets USING gin(to_tsvector('english', name))

The problem is the same here since you can pass a string to using but Rails still adds the column name at the end.

add_index :widgets, :name, name: 'widget_name_search_idx', using: "gin(to_tsvector('english', name))"

@lsylvester
Copy link
Contributor

I also have run into this using the to_tsvector function. I think that it would be good to allow for indexes to be created on eny expression.

If we are going to allow any expression to be indexed, then I think that the syntax

add_index :users, :name, using: "gist (name gist_trgm_ops)"

has a few of issues.

First, there may be times when you want to index an expression without using a custom index type, like an index on a LOWER function. Here, you would to specify the index type even though you are not changing it from the default. ie.

add_index :users, :name, using: "btree (LOWER(name))"

instead of

add_index :users, "LOWER(name)"

Second, in the case where you want to index multiple columns, you would have to repeat all of the columns in the using clause. ie.

add_index :users, :organisation_id, :name, using: "btree (organisation_id, LOWER(name))"

instead of

add_index :users, :organisation_id, "LOWER(name)", using: :btree

Third, it might be required to have multiple indexes on the same column with different functions/operator classes, but if the name is generated using only the column name then the names will conflict.

For example, you might need to have indexes like:

add_index :users, :name, using: "gist (name gist_trgm_ops)"
add_index :users, :name

To support both equality and similarity searches, but both indexes would by default have the same name.

I think that it would be better to either require the name to be specified if there is an expression, or automatically generate the name based on the whole expression instead of just the column.

add_index :users, :name                               #=> creates index "index_users_on_name"
add_index :users, "name gist_trgm_ops", using: :gist  #=> creates index "index_users_on_name_gist_trgm_ops"
add_index :users, "LOWER(name)"                       #=> creates index "index_users_on_lower_name"

@matthewd
Copy link
Member

I think that it would be good to allow for indexes to be created on any expression

I agree... but that sounds more like #13684; while they're written close together in the SQL, I think opclasses are ultimately unrelated.

@yorickpeterse
Copy link

If it's of any use, I just backported these changes to GitLab (see https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2987 for the exact changes) and they work like a charm. If anybody wants to backport these as well they can dump the following code somewhere in their Rails application (e.g. an initializer): https://gist.github.com/YorickPeterse/00a4364ec11e3b63c2c3

@matthewd
Copy link
Member

Oops.. I'd left this to give a chance for second opinions, but then failed to come back to it. Sorry @gregnavis 😟

I like this implementation.

From a quick scroll through to reacquaint myself, I've spotted:

  • rename opclasses to opclass as you mentioned;
  • even though we're already far from perfect on this query construction, it's probably worth avoiding the space when no opclass is set;
  • opclass can be a non-hash, in which case that value applies to all columns (and the dumper should presumably take advantage of that, especially for the special-but-common case of a single-column index).

The inddef parsing is taking some liberties in assuming things about the names (e.g., that neither the column nor opclass name contains a space), but it looks like desc_order_columns, for example, is already being similarly presumptuous. So it seems fine to call that someone else's future problem.

I haven't looked at how bad the conflicts are after having neglected this for so long 😕

@gregnavis
Copy link
Contributor Author

No problem, @matthewd! I know you're super-busy with other stuff. I'll try to address these issues and rebase it on top of master next week. Stay tuned!

@gregnavis gregnavis force-pushed the support-postgresql-operator-classes-in-indexes branch 2 times, most recently from 77cf54d to 7eb4554 Compare May 13, 2017 02:28
@gregnavis
Copy link
Contributor Author

@matthewd I rebased the branch (the conflicts weren't that bad) and address the issues you mentioned.

@matthewd @jeremy @sgrif - please review and let me know if anything else should be done before merging

@gregnavis
Copy link
Contributor Author

@matthewd @jeremy @sgrif I'm floating this to the top of your inboxes. If there's anything I could do to make the PR better please let me know.

Copy link
Member

@matthewd matthewd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder!

One little thing, and this looks ready to go! 🚀

@@ -391,6 +391,25 @@ def default_index_type?(index) # :nodoc:

private

def add_index_opclass(column_names, options = {})
opclass = case options[:opclass]
when String
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this misses Symbol?

Seems like it might read more easily if it were flipped a bit, making this the else clause -- then there's no need for a separate empty-hash branch.

index_name, index_type, index_columns, index_options, index_algorithm, index_using, comment = add_index_options(table_name, column_name, options)
execute("CREATE #{index_type} INDEX #{index_algorithm} #{quote_column_name(index_name)} ON #{quote_table_name(table_name)} #{index_using} (#{index_columns})#{index_options}").tap do
index_name, index_type, index_columns_and_opclasses, index_options, index_algorithm, index_using, comment = add_index_options(table_name, column_name, options)
execute("CREATE #{index_type} INDEX #{index_algorithm} #{quote_column_name(index_name)} ON #{quote_table_name(table_name)} #{index_using} (#{index_columns_and_opclasses})#{index_options}").tap do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the index_columns name is necessary? Actually this also includes index sort orders.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted the name to reflect the content. I don't have a strong opinion here so I can revert if you think it's unnecessary.

@@ -494,7 +494,39 @@ def test_dump_foreign_key_targeting_different_schema
end
end

class DefaultsUsingMultipleSchemasAndDomainTest < ActiveRecord::PostgreSQLTestCase
class SchemaIndexOpclassTest < ActiveRecord::TestCase
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/ActiveRecord::TestCase/ActiveRecord::PostgreSQLTestCase/

end
end

class DefaultsUsingMultipleSchemasAndDomainTest < ActiveSupport::TestCase
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/ActiveSupport::TestCase/ActiveRecord::PostgreSQLTestCase/

@gregnavis gregnavis force-pushed the support-postgresql-operator-classes-in-indexes branch 2 times, most recently from 10508bb to 5c68d97 Compare July 11, 2017 03:20
@gregnavis
Copy link
Contributor Author

@matthewd @kamipo thanks for feedback! I updated the PR as per your suggestions. It seems the build is failing for reasons unrelated to my changes.

@gregnavis gregnavis force-pushed the support-postgresql-operator-classes-in-indexes branch from 5c68d97 to d278636 Compare November 29, 2017 21:46
@gregnavis
Copy link
Contributor Author

@matthewd @kamipo I just rebase the PR to the recent master. Please let me know whether there's anything I could do to help to merge it.

@gregnavis
Copy link
Contributor Author

There's a CodeClimate error and it seems it prefers:

        def add_index_opclass(column_names, options = {})
          opclass = if options[:opclass].is_a?(Hash)
            options[:opclass].symbolize_keys
          else
            Hash.new { |hash, column| hash[column] = options[:opclass].to_s }
          end
          # ...
        end

to

        def add_index_opclass(column_names, options = {})
          opclass = if options[:opclass].is_a?(Hash)
                      options[:opclass].symbolize_keys
                    else
                      Hash.new { |hash, column| hash[column] = options[:opclass].to_s }
                    end
          # ...
        end

Is that style really preferred? If so, I'll update the PR.

@matthewd
Copy link
Member

I believe it will acquiesce to:

      def add_index_opclass(column_names, options = {})
        opclass =
          if options[:opclass].is_a?(Hash)
            options[:opclass].symbolize_keys
          else
            Hash.new { |hash, column| hash[column] = options[:opclass].to_s }
          end
        # ...
      end

super
end

# See http://www.postgresql.org/docs/current/static/errcodes-appendix.html
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny mismerge: we've lost an s

Add support for specifying non-default operator classes in PostgreSQL
indexes. An example CREATE INDEX query that becomes possible is:

    CREATE INDEX users_name ON users USING gist (name gist_trgm_ops);

Previously it was possible to specify the `gist` index but not the
custom operator class. The `add_index` call for the above query is:

    add_index :users, :name, using: :gist, opclasses: {name: :gist_trgm_ops}
@gregnavis gregnavis force-pushed the support-postgresql-operator-classes-in-indexes branch from d278636 to 1dca75c Compare November 30, 2017 09:47
@gregnavis
Copy link
Contributor Author

Eagle eye! I updated the PR. Please take another look, @matthewd.

@matthewd matthewd merged commit 8e7b9e2 into rails:master Dec 1, 2017
@matthewd
Copy link
Member

matthewd commented Dec 1, 2017

🎉

Sorry it took so long... and it got dropped so many times along the way 😞

And thanks for persisting -- I'm really glad to have this in.

Great work! ❤️

@gregnavis
Copy link
Contributor Author

Woohoo! 🚀 Thank you @matthewd and everyone else for making it happen.

@gregnavis gregnavis deleted the support-postgresql-operator-classes-in-indexes branch January 3, 2018 20:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.