make it a little harder to mess up new db schema migrations #5274
For the record, I've confirmed that if you make only this change to "main" (where the current version is 43):
and run the test suite, after 25 minutes, 16 tests (the max that nextest runs concurrently on this machine?) appear hung: they've been running for 24+ minutes. I think when I dug into this previously, they were waiting for the db version to catch up to Nexus's (which would never happen). Similarly, from the same commit, I undid that diff and applied this:
with the same result: after 17 minutes, I had 16 tests that had been hung for 16 minutes. This makes sense -- again, Nexus would be waiting for a catch-up that's never going to happen. I don't expect this to be changed by this PR.
To test the merge/conflict behavior, I did this:
Here's what the merge looks like. Starting from this PR's branch:
Create the simulated "main" branch:
Merge in one branch:
Merge in the other:
Good! Here's what the conflict looks like (I have merge.conflictstyle=diff3 so it looks a little different than the default):
The hand-merged result looks like this:
You also need to remember to update `SCHEMA_VERSION` above and `dbinit.sql` (which is annoying, but tests will at least not pass if you forget either or both of these). All three of these branches are on GitHub if folks want to look or try this out.
Thanks for this, I hope it makes the process a little less conflict prone!
schema/crdb/README.adoc
Outdated
these `schema/crdb/NAME/upN.sql` for as many `N` as you need, staring with
`N=1`.
Suggested change:
- these `schema/crdb/NAME/upN.sql` for as many `N` as you need, staring with
- `N=1`.
+ these `schema/crdb/NAME/upN.sql` for as many `N` as you need, starting with
+ `N=1`.
* Update `schema/crdb/dbinit.sql`:
** Update the SQL statements to match what the database should look like
after your up*.sql files are applied.
Practically, I do this first, and save my schema changes for later. I wonder if we should recommend this first? "All tests except the schema tests" should pass if you only change `dbinit.sql`, and don't change the schema version.
I think this is a good idea and I looked at it but it makes the instructions more complicated because up front you update the schema parts of dbinit.sql and later you go back to it. Reading it straight through it reads like "why did you tell me to go back to the same file twice"? I'd welcome edits here but otherwise I'm just going to leave it.
I also think this will be less necessary now. I did the same as you because I was worried about dealing with merges, but now that's pretty easy so I will probably just go ahead and do the migrations earlier than I used to.
// | leaving the first copy as an example for the next person.
// v
// KnownVersion::new(next_int, "unique-dirname-with-the-sql-files"),
KnownVersion::new(45, "migration-demo"),
Thanks for doing this in a way that encourages "only bumping major version" for now!
As a super nitpicky nitpick, can we change the name from `migration-demo` to something like `first-named-migration`? At first glance, I thought it was test data, but I realize it actually is a durable long-lived migration.
I'll second the super nitpicky nitpick; the name threw me off also. (I was wondering if it was a demo that should be removed before this lands.)
/// All versions have an associated SemVer. We only use the major number.
semver: SemverVersion,
"We only use the major number"
This isn't strictly true, we do keep them ordered by minor/patch versions too, we just can't ensure backwards compatibility and recommend only bumping the major number.
I only bring this up because there already are legacy versions that only differ by minor versions.
Sorry, what I meant was not "only the major number is ever non-zero" but rather "we only use the major number to determine compatibility". I've clarified this comment.
/// from the previous version to this version.
fn new(major: u64, relative_path: &str) -> KnownVersion {
    let semver = SemverVersion::new(major, 0, 0);
    KnownVersion { semver, relative_path: relative_path.to_owned() }
I'm tempted to propose that we make the relative path for these `{semver}-{name}` so that the directory also remains sorted. But I don't feel super strongly about this.
I thought about this but it makes the instructions for merging more complicated (you have to `git mv` your directory as part of resolving a conflict, and that wouldn't necessarily be obvious) and I'm not sure it helps that much because they don't appear in sorted order in `ls(1)` output and on GitHub anyway (because it's a lexicographic sort, not a semver-aware sort).
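To illustrate the sorting point, here is a minimal sketch. The directory names are hypothetical (they follow the proposed `{semver}-{name}` shape but are not taken from the repository), and `lexicographic` and `by_major` are helper names made up for this example. It compares a plain string sort, as a directory listing would use, with a sort on the parsed major number.

```rust
/// Sorts names the way a plain directory listing would: as byte strings.
fn lexicographic(mut names: Vec<&str>) -> Vec<&str> {
    names.sort();
    names
}

/// Sorts names by the leading major number (the text before the first '.').
/// Names without a parseable major number sort last.
fn by_major(mut names: Vec<&str>) -> Vec<&str> {
    names.sort_by_key(|name| {
        name.split('.')
            .next()
            .and_then(|major| major.parse::<u64>().ok())
            .unwrap_or(u64::MAX)
    });
    names
}

/// Hypothetical directory names, used only for illustration.
fn sample_dirs() -> Vec<&'static str> {
    vec!["9.0.0-drop-table", "10.0.0-add-column", "46.0.0-add-index"]
}

fn main() {
    // The string sort puts "10.0.0-..." and "46.0.0-..." before "9.0.0-...".
    println!("{:?}", lexicographic(sample_dirs()));
    // Sorting on the parsed major number gives 9, 10, 46.
    println!("{:?}", by_major(sample_dirs()));
}
```

So a version prefix would only keep the directory in order if the numbers were zero-padded or the listing tool understood versions.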
let target_versions: Vec<&SchemaVersion> = all_versions
    .versions_range((
        Bound::Excluded(&found_version),
        Bound::Included(&desired_version),
Tiny nitpick and I realize this didn't change, but do you think this would be clearer as `.versions_range(&found_version..=&desired_version)`? I don't usually see explicit `Bound` tuples.
Here's the heads-up mail I plan to send about this. If you do not actively work on Omicron, you can ignore this mail. This pull request:
changes the way developers make changes to the Omicron (CockroachDB) database schema. The official instructions have been updated:
The main change to how you change the database schema is:
The goals are that:
This is all in the new instructions. Please reach out if you have any trouble!
// | leaving the first copy as an example for the next person.
// v
// KnownVersion::new(next_int, "unique-dirname-with-the-sql-files"),
KnownVersion::new(46, "first-named-migration"),
Came up in #oxide-update today, but some schema changes will require multiple versions to be correct. (Came up for #5032 earlier, and also for #5287 which requires three migrations.) It would be nice to be able to express an ordered list of versions this way, maybe within a single directory have a bunch of subdirectories.
(Though we think we can simplify this constraint and not require that, by changing the schema version after each file rather than each directory.)
> (Though we think we can simplify this constraint and not require that, by changing the schema version after each file rather than each directory.)
@jgallagher is going to be doing this.
Sorry I didn't quite follow the problem. @sunshowers Is this PR blocked on that work? Does it make this situation worse?
> Came up in #oxide-update today, but some schema changes will require multiple versions to be correct. (Came up for #5032 earlier, and also for #5287 which requires three migrations.) It would be nice to be able to express an ordered list of versions this way, maybe within a single directory have a bunch of subdirectories.
For a PR that needs to land more than one separate schema update, is there any reason not to just define two new `KnownVersion` values with separate major numbers? i.e., why does this use case not fit into the infrastructure we already have / what's added in this PR?
let target_versions: Vec<&SchemaVersion> = all_versions
    .versions_range(&found_version..=&desired_version)
    .collect();
I know that I'm really late on this, but after rebasing on top of this locally -- this snippet changes the behavior of updates, by changing the bounds.
- Before, the "current version" was excluded from the set of target versions (`Bound::Excluded(&current_version)`).
- Now, it's included (in the `&found_version..` syntax).
This means that after this PR has merged, the "current" version is always re-applied during schema migrations, even if it has been fully applied before.
Oops! That's definitely my bad; should've been `..` instead of `..=`.
I don't think that's exactly right, I think `..` is "start inclusive, end exclusive". I believe we actually want "start exclusive, end inclusive", which doesn't have a short-hand syntax. That's why I was using `std::ops::Bound::Excluded` explicitly before.
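A minimal sketch of the difference, using a plain `BTreeMap` from the standard library rather than the PR's actual `versions_range` method. The map contents and the helper names `pending_versions`, `inclusive_versions`, and `sample_versions` are assumptions made up for this example.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Versions still to apply: everything after `found` (exclusive) up to and
/// including `desired`. There is no range shorthand for this shape, so the
/// bounds are spelled out as a tuple.
fn pending_versions(all: &BTreeMap<u64, &'static str>, found: u64, desired: u64) -> Vec<u64> {
    all.range((Bound::Excluded(found), Bound::Included(desired)))
        .map(|(version, _)| *version)
        .collect()
}

/// The `found..=desired` shorthand is inclusive at both ends, so the version
/// that was already found is returned again.
fn inclusive_versions(all: &BTreeMap<u64, &'static str>, found: u64, desired: u64) -> Vec<u64> {
    all.range(found..=desired)
        .map(|(version, _)| *version)
        .collect()
}

/// Hypothetical version table, used only for illustration.
fn sample_versions() -> BTreeMap<u64, &'static str> {
    BTreeMap::from([(43, "a"), (44, "b"), (45, "c"), (46, "d")])
}

fn main() {
    let all = sample_versions();
    println!("{:?}", pending_versions(&all, 44, 46)); // [45, 46]
    println!("{:?}", inclusive_versions(&all, 44, 46)); // [44, 45, 46]
}
```

With the explicit bounds, a database that is already at the desired version gets an empty list of pending versions, whereas the `..=` form would hand back the current version to be applied again.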
Ooof. Doubly sorry then. :( Do you want to fix this on #5293 or should I open a fix just for this?
I'm patching it in 5293, so no need for something else unless the urgency seems like it should be higher
This is something we've talked about for a while. Many ideas have been floated but as far as I know nobody's implemented any of them. Here's a time-boxed attempt to make a tangible improvement. It doesn't solve all the problems.
My goals here were:
- … `SCHEMA_VERSION` in Rust or the schema version in `db_metadata`. This was already true via `dbinit_version_matches_version_known_to_nexus`. Unfortunately I've found that if you get this wrong, many tests wind up hanging (because Nexus is sitting waiting for the database to be updated), so you'd have to be a little lucky to notice this one explicit failure. We could have a Nexus config option that causes it to explode if the version doesn't match and then use that only in the test suite to try to catch this better, but I decided to punt on this in this PR.
The approach I took:
- … `KNOWN_VERSIONS`. This means if two changes attempt to add the same item at the same spot in the array, they should cause a git conflict. (I still want to test this.) It is conceivable that incorrect changes would not generate a conflict (e.g., if somebody added their version in the wrong spot in the array) but I've added a lot of tests to verify that the array looks like it should (i.e., the versions can't be out of order, there can't be any gaps, etc.).
- … `KNOWN_VERSIONS`. The developer's expected to create their own unique string for this (e.g., `drop-table-services`). This way, if you do wind up conflicting with somebody, all you have to do is fix up the `KNOWN_VERSIONS` table.
- … `KnownVersion::new`, you have to provide a directory name and the test suite makes sure it does not contain the version number in it).
- … `KnownVersion::legacy()` constructor for those and you should get a test failure if you try to use that for any new migrations.)
I'm really interested to know if people feel like this would be a net improvement. I know there were a lot of other ideas floated (and I'd be just as happy if someone implemented one of those instead!) but I'm trying to avoid letting the perfect be the enemy of the good.
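The kind of array checks described above could look roughly like the sketch below. This is not the PR's test code. The struct is a simplified stand-in for `KnownVersion`, the function name `check_known_versions` is hypothetical, the newest-first ordering is an assumption, and the "directory name must not contain the version number" rule is approximated with a simple substring check.

```rust
use std::collections::HashSet;

/// Simplified stand-in for the PR's `KnownVersion` (assumption: only the
/// major number and the directory name matter for these checks).
struct KnownVersion {
    major: u64,
    relative_path: &'static str,
}

/// Checks a newest-first list of versions for gaps, misordering, duplicate
/// directory names, and directory names that embed their version number.
fn check_known_versions(versions: &[KnownVersion]) -> Result<(), String> {
    // Each entry must be exactly one major version above the entry after it.
    for pair in versions.windows(2) {
        let (newer, older) = (&pair[0], &pair[1]);
        if newer.major != older.major + 1 {
            return Err(format!(
                "version {} must directly follow version {}",
                newer.major, older.major
            ));
        }
    }
    let mut seen = HashSet::new();
    for version in versions {
        // Directory names must be unique across all versions.
        if !seen.insert(version.relative_path) {
            return Err(format!("duplicate directory {:?}", version.relative_path));
        }
        // Approximation: reject names containing the major number as text.
        if version.relative_path.contains(version.major.to_string().as_str()) {
            return Err(format!(
                "directory {:?} contains its version number",
                version.relative_path
            ));
        }
    }
    Ok(())
}

fn main() {
    // Hypothetical entries, newest first.
    let versions = [
        KnownVersion { major: 46, relative_path: "drop-table-services" },
        KnownVersion { major: 45, relative_path: "first-named-migration" },
    ];
    println!("{:?}", check_known_versions(&versions));
}
```

Checks like these catch an entry added in the wrong spot even when git merges the two changes without a conflict.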
Edit: there's an example migration in 89416f2.