Fix Rails generated index name being too long #47753

mscoutermarsh · 2023-03-23T19:42:39Z

Motivation / Background

Frequently I find myself hitting my database's "is too long" error when adding a compound index.

When this happens, I need to manually set a shorter name to run the migration.

Detail

This updates the index name generation to always create a valid index name if one is not passed by the user.

I set the limit to 62 to ensure it works for the default configurations of sqlite, mysql & postgres.

MySQL: 64
Postgres: 63
Sqlite: 62

I considered using index_name_length. But that value could change, resulting in inconsistent index names when migrations are run.

Checklist

Before submitting the PR make sure the following are checked:

This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
Tests are added or updated if you fix a bug or add a feature.
CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

mscoutermarsh · 2023-03-23T20:01:44Z

I believe the test failures are unrelated/flakes. 😄

p8 · 2023-03-23T20:23:26Z

I'm wondering if we should always show the name for the generated migration (or maybe only when the default is too long?).
This would allow changing the shortened name and maybe making it less magical?

andrewculver · 2023-03-23T21:55:12Z

One really nice thing about this is even if the user doesn't like the shorter auto-generated index name, they don't have to go search for the syntax to specify the index name in the migration file... it's been added for them! All they have to do is name the index whatever they want. 👌 It's a huge improvement on what happens today (searching the error on Google, reading StackOverflow, figuring out the appropriate option for specifying an index name, coming up with an index name.)

mscoutermarsh · 2023-03-23T23:05:55Z

Love that idea. I could look into updating the generator to add name to the add_index calls. Not sure if that makes sense to include in this PR, or separately?

activerecord/lib/active_record/connection_adapters/abstract/schema_statements.rb

byroot · 2023-03-24T08:01:47Z

👍 on principle.

zzak · 2023-03-24T09:15:41Z

I think the limit (on postgres at least) is 63 bytes, not characters. 🤔

Isn't this also a global limit for all columns? If so, shouldn't we try to enforce that everywhere?

p8 · 2023-03-24T09:20:16Z

Should this check for existence of index names?
The following would be shortened to the same names:

"index_on_attribute1_with_a_very_long_name_attribute2_with_a_very_long_name_some_attribute"[..62]
=> "index_on_attribute1_with_a_very_long_name_attribute2_with_a_ver"
"index_on_attribute1_with_a_very_long_name_attribute2_with_a_very_long_name_another_attribute"[..62]
=> "index_on_attribute1_with_a_very_long_name_attribute2_with_a_ver"

Or maybe the attribute names should be shortened?
"index_on_attribute1_attribute2_another_attribute"
On the other hand this might be something for the developer to decide.

mscoutermarsh · 2023-03-24T13:12:06Z

@zzak I just did some reading on the postgres limit and pushed a commit to swap the check to bytes, thank you! It's now 62 bytes which should safely satisfy all 3 databases.

mscoutermarsh · 2023-03-24T13:23:51Z

Should this check for existence of index names?

@p8 Thinking about this scenario. One potential downside of checking for index names would be: It could make the index name generator non-deterministic? If for example, a developer manually added an index to their database that isn't captured in the migration files.

Might be safer to let the migration fail in that case and have them manually update it.

byroot · 2023-03-24T13:25:47Z

Yes agreed. The index name shouldn't be derived from that database state, otherwise running migration in a different order may cause different index names.

Fryguy · 2023-03-25T14:12:50Z

Reminds me of this old discussion from 2011! 😮

#1993

rafaelfranca · 2023-03-25T15:11:10Z

I think adding new heuristics is ok, but maybe we should also take the opportunity to improve what the error message is? That way there is no need to search on google, just tell users what they need to do. This would help people to for example know what to do if the new heuristics generate an index that clashes with another one.

matthewd · 2023-03-25T16:21:42Z

Unlike in 2011 😅, we have migration versioning, so this is safe to do without changing behaviour of existing migrations. (Please use migration versioning to maintain existing behaviour for old migrations.)

If we're going to automatically hard-truncate (and immediately drop the table name, at huge risk of moving from "name too long" to "name not unique"), what if we start with the nice version, but then jump to e.g. "ix_#{truncated}_#{hash}", a la FKs? https://github.com/rails/rails/blob/15c53ac37188e0ef45e0e4bf981b82ad32ae3569/activerecord/lib/active_record/connection_adapters/abstract/schema_statements.rb#L1676-L1678

We've previously chosen not to apply the same "meaningless mess" naming convention to indexes because it's useful to know what they're doing in EXPLAIN / hints / etc., but if it's acceptable to chop some detail in general, trading a couple of characters for automatic uniqueness seems worthwhile?

mscoutermarsh · 2023-03-25T18:56:25Z

@matthewd Nice, I'm playing around with that idea a bit now.

This looks pretty good: index_on_foo_bar_first_name_last_name_very_long_col_f807dfa58f. The "mess" does seem worth the benefit of uniqueness. I'll push up code soon.

mscoutermarsh · 2023-03-25T19:34:43Z

Just pushed a commit to handle the uniqueness case 😄 . Thanks @matthewd

matthewd · 2023-03-26T15:51:25Z

So I was thinking that we'd immediately employ the digest as soon as we remove the table name. On Postgres, index names are database-global, so indexes without a table qualification are likely to clash on overlapping column names even without string truncation getting involved.

mscoutermarsh · 2023-03-26T17:56:58Z

Updated to include the hash as soon as the table name is removed. Will now be unique database wide.

byroot · 2023-03-26T17:59:49Z

Please use migration versioning to maintain existing behaviour for old migrations.

@mscoutermarsh can you have a look at migration versioning? https://github.com/rails/rails/blob/89bd41201a060e8e6ee39591430d136e9e012c34/activerecord/lib/active_record/migration/compatibility.rb

This new behavior should only apply to migrations generated in 7.1.

mscoutermarsh · 2023-03-26T22:42:25Z

Thanks @byroot. Pushed an update, I think I have the migration versioning right. Tried to copy the patterns already there.

byroot

The versioning looks good to me.

As for the formats, IMO we should try to original format, and if it's too long directly digest it. Adding more intermediate formats would just be more hard to predict the behavior.

Also once you think you are done, please squash your commits.

byroot · 2023-03-27T08:37:52Z

activerecord/lib/active_record/connection_adapters/abstract/schema_statements.rb

+          # Remove _and_ between columns
+          name = "index_#{table_name}_on_#{Array(column) * '_'}"
+          return name if name.bytesize <= MAX_GENERATED_INDEX_NAME_BYTES


I wonder if this "semi-fallback" is really worth it. The behavior may be more easy to understand by users if we limit ourselves to just two formats.

byroot · 2023-03-27T08:39:03Z

activerecord/lib/active_record/connection_adapters/abstract/schema_statements.rb

+          # Remove table_name, add hash for uniqueness
+          hashed_identifier = OpenSSL::Digest::SHA256.hexdigest(name).first(10)
+
+          name = "index_on_#{Array(column) * '_'}"


Suggested change

name = "index_on_#{Array(column) * '_'}"

name = ""index_#{table_name}_on_#{Array(column) * '_'}"

If we're gonna truncate anyway, I'd say the table name is the part that is the more relevant to try to include?

IMO in most interesting contexts where you see indexes mentioned, they're inherently table-scoped: if you're looking at which index is being used to scan the users table (or writing a hint to force it), the part that distinguishes this index from its peers is which columns it covers.

Right, but then I'd rather never include the table name?

I dunno, I don't like the idea of multiplying the patterns that can be use. I'm OK with having one nice human readable one, and one fallback one, but having two "human readable" ones seems wrong to me.

Not a strong opinion though.

Yeah, my opinion is that we have exactly two patterns: the "there's plenty of room" historical default that includes table + columns, and the "rest of the time" fallback that only lists columns plus a digest (protecting against inter-table column collisions, plus the edge case of collisions caused by truncation).

(I'm also still tempted to give it a different shorter prefix like ix_, both to emphasize that this is a Mode Two name, and to conserve name chars given they're apparently not plentiful... also not strong on that though.)

Thanks! I pushed an update based on this convo. I like the idea of keeping it simple with only 2 versions.

Here's how it works now:
If the original version is too long, we immediately fallback to the "short version"

short format example:

ix_on_foo_bar_first_name_last_name_administrator_5939248142

Fryguy · 2023-03-27T15:07:07Z

activerecord/lib/active_record/connection_adapters/abstract/schema_statements.rb

@@ -1510,6 +1530,25 @@ def valid_primary_key_options # :nodoc:
      end

      private
+        MAX_GENERATED_INDEX_NAME_BYTES = 62


Perhaps not for this PR, but is it possible to ask the adapter what the max length is? Different DBs have different restrictions, so it makes sense that each adapter declare the max length.

I was looking into that. I believe the risk here is that if that value does change for the DB, then running past migrations could result in different names.

That does feel pretty unlikely to happen. But mitigating that possibility is the main benefit here of hard coding it.

Yeah we definitely don't want it to be per-database dynamic, but it does seem reasonable to make it static per DB adapter class. While I don't particularly care to claw back the extra one or two characters (per your analysis of consistent max lengths across the platforms we support), it does maybe make sense to accomodate other platforms that might have notably different (especially shorter) lengths.

Just to write the words: I don't think it's worth considering situations where the max length invalidates our assumptions around fixed-length parts of our generated name. But apparently Oracle's max length was 30 not terribly long ago, and now it and MSSQL are both at 128 -- which is high enough to legitimately raise an eyebrow if we start squishing to fit an imagined lower limit.

Making it easier for a database adapter to vary the maximum, either dynamically or merely across versions, does create more possibility for problems that a totally static value would avoid... but realistically I think there are plenty of other ways a custom adapter can create misadventure, so it's probably fine?

Fryguy · 2023-03-27T15:09:30Z

activerecord/lib/active_record/connection_adapters/abstract/schema_statements.rb

+          hashed_identifier = OpenSSL::Digest::SHA256.hexdigest(name).first(10)
+
+          name = "index_on_#{Array(column) * '_'}"
+          short_name = name.mb_chars.limit(MAX_GENERATED_INDEX_NAME_BYTES - 11).to_s.chomp("_")


Would it be possible to move this 11 into a constant with a comment (or perhaps also the 10 above)? A future reader may not understand where this magic number comes from or that it's related to the hash length plus underscore.

For sure, thanks!

matthewd · 2023-03-27T15:22:23Z

activerecord/lib/active_record/migration/compatibility.rb

@@ -54,6 +54,10 @@ def add_column(table_name, column_name, type, **options)
          super
        end

+        def add_index(table_name, column_name, **options)


I haven't re-read enough context to check, but can this be def generate_index_name(table_name, column) with the body from legacy_index_name?

"Hidden" options like this are an available fallback when all other options fail, but they should be a last resort: they push the legacy behaviour into the modern implementation instead of isolating it in the compatibility handling -- plus it introduces a singular "legacy" state, losing the version-specific scoping we start with.

That would be much nicer! I originally tried that approach. It might be possible but I was struggling to find a way to make it work.

The challenge was, it looks to me like the migrator will call the overridden method if available (such as add_index). If not, it falls back to method_missing.

When I tried overriding generate_index_name from Compatibility, it was never being called. I believe because the migrator is never directly calling it.

Possible idea:
Maybe I could do something like...

# in compatibility def add_index(...) options[:name] = generate_index_name(...) if options[:name].nil? super end

Then it would be setting the name explicitly using the "legacy" method as if that is what the user passed in.

I can play around with this more later, just sharing the context and wondering your opinion. 😄 Thanks!

Ah, yeah, that makes sense.

Yeah, that :name approach sounds viable 👍

Got it working! No more hidden option.

byroot · 2023-03-28T08:34:00Z

Looking good. I think the last outstanding thing is whether to move that constant into adapters.

I think we can just define

def max_index_name_size
  62
end

And call that.

mscoutermarsh · 2023-03-28T12:26:59Z

Thanks @byroot! Just pushed an update with that change.

I'm guessing when people implement a custom adapter, they could now override the value if they wanted?

byroot · 2023-03-28T12:29:28Z

I'm guessing when people implement a custom adapter, they could now override the value if they wanted?

Exactly.

Now everything looks good to me. I'll wait a bit to see if Matthew has any other concerns, but if not I'll merge soon.

Thanks for the feature.

byroot · 2023-03-28T21:24:26Z

@mscoutermarsh can you squash to avoid that Merge commit please? We try to keep a clean history.

This updates the index name generation to always create a valid index name if one is not passed by the user. Set the limit to 62 bytes to ensure it works for the default configurations of Sqlite, mysql & postgres. MySQL: 64 Postgres: 63 Sqlite: 62 When over the limit, we fallback to a "short format" that includes a hash to guarantee uniqueness in the generated index name.

mscoutermarsh · 2023-03-28T21:36:17Z

@byroot squashed ✅

byroot · 2023-03-29T07:24:59Z

I just chatted with Matthew, he's 👍

Followup: #47753

byroot · 2023-03-29T07:29:24Z

NB: I changed the ix_ prefix for idx_: 59f2b06

mscoutermarsh · 2023-03-29T12:26:06Z

Thanks everyone for your help on this! ❤️

cheshire137 · 2023-03-30T01:16:31Z

Thanks Mike! This is going to be very helpful.

matthewd · 2023-04-03T18:46:19Z

I think we need compatibility handling on remove_index too 🤔

mscoutermarsh · 2023-04-03T18:56:18Z

@matthewd I can look into it! 😄

mscoutermarsh · 2023-04-04T13:58:43Z

@matthewd I looked into this and found it already works without changes.

Check this out:

rails/activerecord/lib/active_record/connection_adapters/abstract/schema_statements.rb

Lines 1581 to 1583 in c7f06b5

    
           if column_names.present? && !(options.key?(:name) && expression_column_name?(column_names)) 
        
             checks << lambda { |i| index_name(table_name, i.columns) == index_name(table_name, column_names) } 
        
           end

Took me quite a while to understand what is happening here.

The remove_index code does not use directly use index name generation to determine which index name to remove. It instead checks if based on the table/column name, the generated index name is the same. If so, then it uses the name of the index set on those columns already.

Found a test that covers this behavior as well:

rails/activerecord/test/cases/migration/index_test.rb

Lines 204 to 213 in c7f06b5

    
           def test_remove_named_index 
        
             connection.add_index :testings, :foo, name: "index_testings_on_custom_index_name" 
        
             assert connection.index_exists?(:testings, :foo) 
        
             assert_raise(ArgumentError) { connection.remove_index(:testings, "custom_index_name") } 
        
             connection.remove_index :testings, :foo 
        
             assert_not connection.index_exists?(:testings, :foo) 
        
           end

Previously, if you were using a pre-7.1 migration and you were adding an index using `create_table`, the index would be named using the new index naming functionality introduced in rails#47753.

ActiveRecord added max_index_name_size in rails/rails#47753 and picked a default of 62. Oracle 12.2 and greater supports 128 and the difference broke spec/active_record/connection_adapters/oracle_enhanced/schema_statements_spec.rb:330 since our test index has a bytesize of 64, too long for active record's defaults which meant the call to super in index_name was returning a shortened version. The fix is to define max_index_name_size in oracle enhanced with the 128 limit. Althought active record defines it in SchemaStatements, I put it in OracleEnhancedAdapter were we define other methods based on the max_identifier_length.

rails-bot bot added the activerecord label Mar 23, 2023

byroot reviewed Mar 24, 2023

View reviewed changes

activerecord/lib/active_record/connection_adapters/abstract/schema_statements.rb Outdated Show resolved Hide resolved

mscoutermarsh force-pushed the auto-gen-index-name branch from 4ab6491 to 542bcaf Compare March 25, 2023 19:13

mscoutermarsh force-pushed the auto-gen-index-name branch 2 times, most recently from b1d9ce9 to 1e4139c Compare March 26, 2023 21:31

byroot reviewed Mar 27, 2023

View reviewed changes

Fryguy reviewed Mar 27, 2023

View reviewed changes

matthewd reviewed Mar 27, 2023

View reviewed changes

mscoutermarsh force-pushed the auto-gen-index-name branch 2 times, most recently from e5f7cea to 7847f15 Compare March 27, 2023 15:29

mscoutermarsh force-pushed the auto-gen-index-name branch from 98fb1b8 to 93a0602 Compare March 28, 2023 12:23

mscoutermarsh force-pushed the auto-gen-index-name branch from d45f8c1 to 3682aa1 Compare March 28, 2023 21:35

byroot merged commit c792c71 into rails:main Mar 29, 2023

byroot added a commit that referenced this pull request Mar 29, 2023

Use idx_ instead of ix_ for index prefix

59f2b06

Followup: #47753

gazayas mentioned this pull request Apr 3, 2023

Remove unneeded script to trim index name bullet-train-co/bullet_train#718

Merged

aharpole mentioned this pull request Apr 4, 2023

Ensure pre-7.1 migrations use legacy index names when using create_table #47863

Merged

4 tasks

rkrage mentioned this pull request Sep 17, 2023

Ensure that generated index names are <= 63 bytes braintree/pg_ha_migrations#89

Closed

p8 mentioned this pull request Oct 11, 2023

Make the table name too long error message more explicit for Postgres #49558

Closed

rkrage mentioned this pull request Nov 20, 2023

Generate index names less than 63 bytes braintree/pg_ha_migrations#94

Merged

r7kamura mentioned this pull request Jan 22, 2024

ActiveRecord::Migration[7.0]'s rename_table uses 7.1's new truncated index name format #50833

Closed

fatkodima mentioned this pull request Jan 22, 2024

Ensure pre-7.1 migrations use legacy index names when using rename_table #50837

Merged

	name = "index_on_#{Array(column) * '_'}"
	name = ""index_#{table_name}_on_#{Array(column) * '_'}"

Fix Rails generated index name being too long #47753

Fix Rails generated index name being too long #47753

Conversation

mscoutermarsh commented Mar 23, 2023 • edited Loading

Motivation / Background

Detail

Checklist

mscoutermarsh commented Mar 23, 2023

p8 commented Mar 23, 2023

andrewculver commented Mar 23, 2023 • edited Loading

mscoutermarsh commented Mar 23, 2023

byroot commented Mar 24, 2023

zzak commented Mar 24, 2023 • edited Loading

p8 commented Mar 24, 2023 • edited Loading

mscoutermarsh commented Mar 24, 2023

mscoutermarsh commented Mar 24, 2023

byroot commented Mar 24, 2023

Fryguy commented Mar 25, 2023

rafaelfranca commented Mar 25, 2023

matthewd commented Mar 25, 2023

mscoutermarsh commented Mar 25, 2023

mscoutermarsh commented Mar 25, 2023

matthewd commented Mar 26, 2023

mscoutermarsh commented Mar 26, 2023

byroot commented Mar 26, 2023

mscoutermarsh commented Mar 26, 2023

byroot left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

byroot commented Mar 28, 2023

mscoutermarsh commented Mar 28, 2023

byroot commented Mar 28, 2023

byroot commented Mar 28, 2023

mscoutermarsh commented Mar 28, 2023

byroot commented Mar 29, 2023

byroot commented Mar 29, 2023

mscoutermarsh commented Mar 29, 2023

cheshire137 commented Mar 30, 2023

matthewd commented Apr 3, 2023

mscoutermarsh commented Apr 3, 2023

mscoutermarsh commented Apr 4, 2023

mscoutermarsh commented Mar 23, 2023 •

edited

Loading

andrewculver commented Mar 23, 2023 •

edited

Loading

zzak commented Mar 24, 2023 •

edited

Loading

p8 commented Mar 24, 2023 •

edited

Loading