Fix Rails generated index name being too long #47753
I believe the test failures are unrelated/flakes. 😄
I'm wondering if we should always show the name for the generated migration (or maybe only when the default is too long?).
One really nice thing about this is even if the user doesn't like the shorter auto-generated index name, they don't have to go search for the syntax to specify the index name in the migration file... it's been added for them! All they have to do is name the index whatever they want. 👌 It's a huge improvement on what happens today (searching the error on Google, reading StackOverflow, figuring out the appropriate option for specifying an index name, coming up with an index name).
Love that idea. I could look into updating the generator to add
activerecord/lib/active_record/connection_adapters/abstract/schema_statements.rb
👍 on principle.
I think the limit (on postgres at least) is 63 bytes, not characters. 🤔 Isn't this also a global limit for all columns? If so, shouldn't we try to enforce that everywhere?
Should this check for existence of index names?
"index_on_attribute1_with_a_very_long_name_attribute2_with_a_very_long_name_some_attribute"[..62]
=> "index_on_attribute1_with_a_very_long_name_attribute2_with_a_ver"
"index_on_attribute1_with_a_very_long_name_attribute2_with_a_very_long_name_another_attribute"[..62]
=> "index_on_attribute1_with_a_very_long_name_attribute2_with_a_ver"
Or maybe the attribute names should be shortened?
@zzak I just did some reading on the postgres limit and pushed a commit to swap the check to bytes, thank you! It's now 62 bytes which should safely satisfy all 3 databases.
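To illustrate the bytes-versus-characters point, here is a minimal Ruby sketch. The constant `MAX_BYTES` and the helper `fits_in_limit?` are hypothetical names made up for this example; only the 62-byte figure comes from the thread.

```ruby
# Illustrative only: MAX_BYTES mirrors the 62-byte limit discussed in this thread.
MAX_BYTES = 62

ascii_name    = "index_users_on_email"
accented_name = "index_utilisateurs_on_prénom"

puts ascii_name.length      # number of characters
puts ascii_name.bytesize    # same as length for pure ASCII
puts accented_name.length
puts accented_name.bytesize # one more than length: "é" takes 2 bytes in UTF-8

# Hypothetical helper: compare the byte size, not the character count.
def fits_in_limit?(name, limit = MAX_BYTES)
  name.bytesize <= limit
end
```

A name made of 32 two-byte characters would pass a character-based check against 62 but fail the byte-based one, which is the case the switch to bytes guards against.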
@p8 Thinking about this scenario. One potential downside of checking for index names would be: It could make the index name generator non-deterministic? If for example, a developer manually added an index to their database that isn't captured in the migration files. Might be safer to let the migration fail in that case and have them manually update it.
Yes agreed. The index name shouldn't be derived from the database state, otherwise running migrations in a different order may cause different index names.
Reminds me of this old discussion from 2011! 😮
I think adding new heuristics is ok, but maybe we should also take the opportunity to improve the error message? That way there is no need to search on Google; we just tell users what they need to do. This would help people know, for example, what to do if the new heuristics generate an index name that clashes with another one.
Unlike in 2011 😅, we have migration versioning, so this is safe to do without changing behaviour of existing migrations. (Please use migration versioning to maintain existing behaviour for old migrations.) If we're going to automatically hard-truncate (and immediately drop the table name, at huge risk of moving from "name too long" to "name not unique"), what if we start with the nice version, but then jump to e.g. We've previously chosen not to apply the same "meaningless mess" naming convention to indexes because it's useful to know what they're doing in EXPLAIN / hints / etc., but if it's acceptable to chop some detail in general, trading a couple of characters for automatic uniqueness seems worthwhile?
@matthewd Nice, I'm playing around with that idea a bit now. This looks pretty good:
4ab6491 to 542bcaf
Just pushed a commit to handle the uniqueness case 😄. Thanks @matthewd
So I was thinking that we'd immediately employ the digest as soon as we remove the table name. On Postgres, index names are database-global, so indexes without a table qualification are likely to clash on overlapping column names even without string truncation getting involved.
Updated to include the hash as soon as the table name is removed. Will now be unique database wide.
@mscoutermarsh can you have a look at migration versioning? https://github.com/rails/rails/blob/89bd41201a060e8e6ee39591430d136e9e012c34/activerecord/lib/active_record/migration/compatibility.rb This new behavior should only apply to migrations generated in 7.1.
b1d9ce9 to 1e4139c
Thanks @byroot. Pushed an update, I think I have the migration versioning right. Tried to copy the patterns already there.
The versioning looks good to me.
As for the formats, IMO we should try the original format, and if it's too long directly digest it. Adding more intermediate formats would just make the behavior harder to predict.
Also once you think you are done, please squash your commits.
# Remove _and_ between columns
name = "index_#{table_name}_on_#{Array(column) * '_'}"
return name if name.bytesize <= MAX_GENERATED_INDEX_NAME_BYTES
I wonder if this "semi-fallback" is really worth it. The behavior may be easier for users to understand if we limit ourselves to just two formats.
# Remove table_name, add hash for uniqueness
hashed_identifier = OpenSSL::Digest::SHA256.hexdigest(name).first(10)

name = "index_on_#{Array(column) * '_'}"
- name = "index_on_#{Array(column) * '_'}"
+ name = "index_#{table_name}_on_#{Array(column) * '_'}"
If we're gonna truncate anyway, I'd say the table name is the part that is the more relevant to try to include?
IMO in most interesting contexts where you see indexes mentioned, they're inherently table-scoped: if you're looking at which index is being used to scan the `users` table (or writing a hint to force it), the part that distinguishes this index from its peers is which columns it covers.
Right, but then I'd rather never include the table name?
I dunno, I don't like the idea of multiplying the patterns that can be used. I'm OK with having one nice human readable one, and one fallback one, but having two "human readable" ones seems wrong to me.
Not a strong opinion though.
Yeah, my opinion is that we have exactly two patterns: the "there's plenty of room" historical default that includes table + columns, and the "rest of the time" fallback that only lists columns plus a digest (protecting against inter-table column collisions, plus the edge case of collisions caused by truncation).
(I'm also still tempted to give it a different shorter prefix like `ix_`, both to emphasize that this is a Mode Two name, and to conserve name chars given they're apparently not plentiful... also not strong on that though.)
Thanks! I pushed an update based on this convo. I like the idea of keeping it simple with only 2 versions.
Here's how it works now:
If the original version is too long, we immediately fall back to the "short version".
short format example:
ix_on_foo_bar_first_name_last_name_administrator_5939248142
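The two-format scheme described above can be sketched in plain Ruby as follows. This is an illustration under stated assumptions, not the code from the PR: `sketch_index_name` and the `SKETCH_*` constants are hypothetical names, the `ix_on_` prefix follows the example in this comment, and `byteslice` plus `scrub` stand in for the ActiveSupport `mb_chars.limit` call used in the diff.

```ruby
require "openssl"

# Hypothetical constants for this sketch; the thread uses 62 bytes and a 10-char hash.
SKETCH_MAX_BYTES   = 62
SKETCH_HASH_LENGTH = 10

def sketch_index_name(table_name, columns)
  columns = Array(columns)

  # Format one: the historical default, used whenever it fits.
  long_name = "index_#{table_name}_on_#{columns.join('_and_')}"
  return long_name if long_name.bytesize <= SKETCH_MAX_BYTES

  # Format two: columns only, plus a digest of the full name so that
  # equal column lists on different tables still produce different names.
  digest = OpenSSL::Digest::SHA256.hexdigest(long_name)[0, SKETCH_HASH_LENGTH]
  budget = SKETCH_MAX_BYTES - SKETCH_HASH_LENGTH - 1 # 1 byte for the joining "_"

  # byteslice may cut a multibyte character in half; scrub drops the broken bytes.
  short = "ix_on_#{columns.join('_')}".byteslice(0, budget).scrub("").chomp("_")
  "#{short}_#{digest}"
end

puts sketch_index_name(:users, :email)
puts sketch_index_name(:people, %w[attribute1_with_a_very_long_name attribute2_with_a_very_long_name])
```

Because the digest is derived only from the table and column names, the result does not depend on database state, which keeps the generator deterministic across migration runs.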
@@ -1510,6 +1530,25 @@ def valid_primary_key_options # :nodoc:
end

private
MAX_GENERATED_INDEX_NAME_BYTES = 62
Perhaps not for this PR, but is it possible to ask the adapter what the max length is? Different DBs have different restrictions, so it makes sense that each adapter declare the max length.
I was looking into that. I believe the risk here is that if that value does change for the DB, then running past migrations could result in different names.
That does feel pretty unlikely to happen. But mitigating that possibility is the main benefit here of hard coding it.
Yeah we definitely don't want it to be per-database dynamic, but it does seem reasonable to make it static per DB adapter class. While I don't particularly care to claw back the extra one or two characters (per your analysis of consistent max lengths across the platforms we support), it does maybe make sense to accommodate other platforms that might have notably different (especially shorter) lengths.
Just to write the words: I don't think it's worth considering situations where the max length invalidates our assumptions around fixed-length parts of our generated name. But apparently Oracle's max length was 30 not terribly long ago, and now it and MSSQL are both at 128 -- which is high enough to legitimately raise an eyebrow if we start squishing to fit an imagined lower limit.
Making it easier for a database adapter to vary the maximum, either dynamically or merely across versions, does create more possibility for problems that a totally static value would avoid... but realistically I think there are plenty of other ways a custom adapter can create misadventure, so it's probably fine?
hashed_identifier = OpenSSL::Digest::SHA256.hexdigest(name).first(10)

name = "index_on_#{Array(column) * '_'}"
short_name = name.mb_chars.limit(MAX_GENERATED_INDEX_NAME_BYTES - 11).to_s.chomp("_")
Would it be possible to move this `11` into a constant with a comment (or perhaps also the `10` above)? A future reader may not understand where this magic number comes from or that it's related to the hash length plus underscore.
For sure, thanks!
@@ -54,6 +54,10 @@ def add_column(table_name, column_name, type, **options)
super
end

def add_index(table_name, column_name, **options)
I haven't re-read enough context to check, but can this be `def generate_index_name(table_name, column)` with the body from `legacy_index_name`?
"Hidden" options like this are an available fallback when all other options fail, but they should be a last resort: they push the legacy behaviour into the modern implementation instead of isolating it in the compatibility handling -- plus it introduces a singular "legacy" state, losing the version-specific scoping we start with.
That would be much nicer! I originally tried that approach. It might be possible but I was struggling to find a way to make it work.
The challenge was, it looks to me like the migrator will call the overridden method if available (such as `add_index`). If not, it falls back to method_missing.
When I tried overriding `generate_index_name` from Compatibility, it was never being called. I believe because the migrator is never directly calling it.
Possible idea:
Maybe I could do something like...
# in compatibility
def add_index(...)
options[:name] = generate_index_name(...) if options[:name].nil?
super
end
Then it would be setting the name explicitly using the "legacy" method as if that is what the user passed in.
I can play around with this more later, just sharing the context and wondering your opinion. 😄 Thanks!
Ah, yeah, that makes sense.
Yeah, that `:name` approach sounds viable 👍
Got it working! No more hidden option.
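For readers following along, here is a self-contained sketch of the `:name` approach discussed above. It is an assumption-laden toy, not the Rails implementation: `SketchSchema` and `SketchLegacyCompat` are hypothetical stand-ins for the schema statements and the compatibility layer, and `add_index` simply returns the chosen name instead of touching a database.

```ruby
# Hypothetical stand-in for the modern schema statements.
class SketchSchema
  def add_index(table_name, column_name, **options)
    # Use an explicit name if given, otherwise generate one.
    options[:name] || generate_index_name(table_name, column_name)
  end

  private

  # Placeholder for the new naming scheme (details omitted).
  def generate_index_name(_table_name, column_name)
    "ix_on_#{Array(column_name).join('_')}"
  end
end

# Hypothetical stand-in for a version-specific compatibility module.
module SketchLegacyCompat
  def add_index(table_name, column_name, **options)
    # Set the legacy name explicitly, as if the user had passed it in.
    options[:name] = legacy_index_name(table_name, column_name) if options[:name].nil?
    super
  end

  private

  def legacy_index_name(table_name, column_name)
    "index_#{table_name}_on_#{Array(column_name).join('_and_')}"
  end
end

modern = SketchSchema.new
legacy = SketchSchema.new.extend(SketchLegacyCompat)

puts modern.add_index(:users, [:first, :last])
puts legacy.add_index(:users, [:first, :last])
```

The point of this shape is that the legacy behaviour lives entirely in the compatibility module, so the modern implementation needs no hidden option.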
e5f7cea to 7847f15
Looking good. I think the last outstanding thing is whether to move that constant into adapters. I think we can just define
def max_index_name_size
  62
end
And call that.
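A rough sketch of how a per-adapter override of that method could look. The class names `SketchAdapter` and `SketchWideAdapter` and the helper `index_name_fits?` are hypothetical; only `max_index_name_size`, the default of 62, and the 128 figure mentioned for Oracle and MSSQL come from this thread.

```ruby
# Hypothetical base adapter: a static default shared by the built-in databases.
class SketchAdapter
  def max_index_name_size
    62
  end

  # Hypothetical helper showing where the limit would be consulted.
  def index_name_fits?(name)
    name.bytesize <= max_index_name_size
  end
end

# Hypothetical custom adapter for a database with a larger limit.
class SketchWideAdapter < SketchAdapter
  def max_index_name_size
    128
  end
end

name = "index_" + "x" * 58 # 64 bytes
puts SketchAdapter.new.index_name_fits?(name)
puts SketchWideAdapter.new.index_name_fits?(name)
```

Since the value is static per adapter class rather than read from the live database, re-running old migrations against the same adapter yields the same names.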
98fb1b8 to 93a0602
Thanks @byroot! Just pushed an update with that change. I'm guessing when people implement a custom adapter, they could now override the value if they wanted?
Exactly. Now everything looks good to me. I'll wait a bit to see if Matthew has any other concerns, but if not I'll merge soon. Thanks for the feature.
@mscoutermarsh can you squash to avoid that
This updates the index name generation to always create a valid index name if one is not passed by the user.
Set the limit to 62 bytes to ensure it works for the default configurations of Sqlite, mysql & postgres.
MySQL: 64
Postgres: 63
Sqlite: 62
When over the limit, we fall back to a "short format" that includes a hash to guarantee uniqueness in the generated index name.
d45f8c1 to 3682aa1
@byroot squashed ✅
I just chatted with Matthew, he's 👍
NB: I changed the
Thanks everyone for your help on this! ❤️
Thanks Mike! This is going to be very helpful.
I think we need compatibility handling on
@matthewd I can look into it! 😄
@matthewd I looked into this and found it already works without changes. Check this out: rails/activerecord/lib/active_record/connection_adapters/abstract/schema_statements.rb Lines 1581 to 1583 in c7f06b5
Took me quite a while to understand what is happening here. The remove_index code does not directly use index name generation to determine which index to remove. Instead, it checks whether, based on the table/column names, the generated index name is the same. If so, it uses the name of the index already set on those columns. I found a test that covers this behavior as well: rails/activerecord/test/cases/migration/index_test.rb Lines 204 to 213 in c7f06b5
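As a loose illustration of the idea (a deliberately simplified toy, not the referenced Rails code), removal can resolve the index from the existing index definitions by column list and then reuse whatever name that index actually has. `SketchIndex` and `sketch_index_name_for_removal` are hypothetical names, and the matching rule here is an assumption that ignores the other options the real lookup handles.

```ruby
# Hypothetical record of an index that already exists on a table.
SketchIndex = Struct.new(:name, :columns)

# Hypothetical lookup: find the existing index by its columns and return its
# real name, whichever naming scheme created it.
def sketch_index_name_for_removal(existing_indexes, columns)
  wanted = Array(columns).map(&:to_s)
  matches = existing_indexes.select { |index| index.columns == wanted }

  raise ArgumentError, "no index found on #{wanted.inspect}" if matches.empty?
  raise ArgumentError, "multiple indexes found on #{wanted.inspect}" if matches.size > 1

  matches.first.name
end

existing = [
  SketchIndex.new("index_users_on_email", ["email"]),        # old-style name
  SketchIndex.new("ix_on_first_last_0123456789", %w[first last]) # new-style name
]

puts sketch_index_name_for_removal(existing, :email)
puts sketch_index_name_for_removal(existing, [:first, :last])
```

Under this kind of lookup, an index created under either naming scheme can be removed by its columns without regenerating its name.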
Previously, if you were using a pre-7.1 migration and you were adding an index using `create_table`, the index would be named using the new index naming functionality introduced in rails#47753.
ActiveRecord added max_index_name_size in rails/rails#47753 and picked a default of 62. Oracle 12.2 and greater supports 128, and the difference broke spec/active_record/connection_adapters/oracle_enhanced/schema_statements_spec.rb:330: our test index has a bytesize of 64, too long for Active Record's default, which meant the call to super in index_name was returning a shortened version. The fix is to define max_index_name_size in oracle enhanced with the 128 limit. Although Active Record defines it in SchemaStatements, I put it in OracleEnhancedAdapter, where we define other methods based on the max_identifier_length.
Motivation / Background
Frequently I find myself hitting my database's "is too long" error when adding a compound index.
When this happens, I need to manually set a shorter name to run the migration.
Detail
This updates the index name generation to always create a valid index name if one is not passed by the user.
I set the limit to 62 to ensure it works for the default configurations of sqlite, mysql & postgres.
MySQL: 64
Postgres: 63
Sqlite: 62
I considered using `index_name_length`, but that value could change, resulting in inconsistent index names when migrations are run.