New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiple heads / branch resolution support #167

Closed
sqlalchemy-bot opened this Issue Jan 20, 2014 · 18 comments

Comments

Projects
None yet
1 participant
@sqlalchemy-bot

sqlalchemy-bot commented Jan 20, 2014

Migrated issue, originally created by Michael Bayer (@zzzeek)

this is a continuation of issue #114, which introduces the problem of merging revision branches that are also applied to target databases already. In that issue, it took me a really long time to get my sluggish mind around it and even after I "got it" I think my thinking was still muddy. Perhaps it still is now, but I'd like to re-establish this as a feature add with my latest thinking on it.

I've spent many months thinking about #114 and what always disturbed me was how to cleanly address my notion that to support a DAG (directed acyclic graph), we need to store every node that's been invoked in the versions table. Doing so would mean that all existing alembic version tables would need to be "upgraded", having every version that's been invoked be written out to the table. This in itself is extremely difficult to manage, especially for shops using flat SQL files to maintain versions on production databases which would need some special migration script that has a huge series of INSERT statements. And then, once the version table has been populated in this new way, now you can't use an older version of Alembic anymore either. And, you can't even easily tell what the "head" version is anymore without reconciling the version table with the structure of the version scripts themselves.

I can only assume that the use case I was thinking was needed, is the one that we talk about in the documentation; that of knitting together two branches into a single line:

A -> B -> Cb1 -> Db1 -> Eb1

A -> B -> Cb2 -> Db2

becomes

A -> B -> Cb1 -> Db1 -> Eb1 -> Cb2 -> Db2

that is, we knitted 'b1' and 'b2' into a single line - this is how we say to do it in the "Working with Branches" documentation (e.g. "splice"). So for alembic to figure out on databases that previously only had "b2" applied that "b1" also needs to be applied, it needs to examine the version table for every version and find those that aren't present.

But this really shouldn't be necessary. My thinking above is still missing the fact that as we support a DAG, we support multiple heads, which can be reconciled in the traditional way, using a branch point. For Alembic to determine which migrations are necessary in order to reconcile multiple heads only needs to store just what any other VCS stores, the heads.

The use case in the docs of "splicing" two branch points into a single line just wouldn't be necessary anymore. And in fact the manual editing of files wouldn't be needed anymore. The ScriptDirectory object already knows how to recognize multiple heads. All we need to do is have the migration system track these multiple heads as it proceeds. It also makes trivial the notion of an "alembic merge" command, which does nothing more than create a migration file that declares all the current heads as its dependencies. The only need for this file is to keep the migration stream clean; it would not be at all necessary and the system could be used in the "pure DAG" format suggested in #114 as an alternative mode of operation.

Suppose we start with a migration series like the above, with two heads:

A -> B -> Cb1 -> Db1 -> Eb1
       -> Cb2 -> Db2

A target database which is at version Db1 would proceed as follows:

  1. Identify Db1 from alembic_version as the current "head".

  2. Identify Eb1 and Db2 as the current target heads.

  3. As Db1 is the actual head invoked, we can determine that the full set of migration files that have been run is A, B, Cb1, Db1. We automatically know that Cb2 and Db2 also need to be run.

  4. We invoke Eb1, and upgrade this particular branch point in the version table. The current SQL for this is UPDATE alembic_version SET version='Eb1'. All we have to do is change this SQL to read UPDATE alembic_version SET version='Eb1' WHERE version='Db1'. That is, our UPDATE statements are now always local to a single head, and other heads in the version table remain unaffected.

  5. We then attend to b2. We invoke Cb2, and upgrade this particular branch point in the version table. We know that this branch isn't present, so we invoke INSERT INTO alembic_version VALUES ('Cb2').

  6. We continue on b2 and invoke Db2. We upgrade in alembic version via UPDATE alembic_version SET version='Db2' WHERE version='Cb2'.

  7. We now have two heads in the alembic_version table, and we are done.

Now, most users probably don't want to work in this mode, where they just maintain two heads like that. These users will want to create a "merge point", like this:

A -> B -> Cb1 -> Db1 -> Eb1  -> Eb1Db2_merge
       -> Cb2 -> Db2         ->

Alembic can by default deliver its usual message, "version history has two heads" (or three, or whatever), and suggest running alembic merge to create a merge point. The command just creates a new version file corresponding above to Eb1Db2_merge.

The process for taking our database from version Db1 to Eb1Db2_merge looks like this:

  1. Identify Db1 from alembic_version as the current "head".

  2. Identify Eb1Db2_merge as the current target head.

  3. Upon detection that Eb1Db2_merge has multiple parents, we again move to handle multiple heads at this point, and we go back to step 2 from the previous list of instructions.

  4. Upon completion of the "multiple heads" use case, we now have Eb1 and Db2 in alembic_version. We choose a single one of the heads as the UPDATE target and delete the rest: DELETE FROM alembic_version WHERE version='Db2'; UPDATE alembic_version SET version='Eb1Db2_merge' WHERE version='Eb1'.

  5. We're back to a single head.

Using this system, we now no longer have any upgrade path for users that upgrade to an Alembic that supports this feature; their alembic_version table is already in the correct format. These users can, if they choose, immediately start working with multiple heads and/or merge points, or not; the data written to alembic_version doesn't change.

The alembic revision command will behave as follows:

  1. When run on a series of migration files that only have a single head, the behavior is as it is now; a new file is created with the parent referring to this single head.

  2. When run on a series of files that have multiple heads, currently we deliver an error about "only a single head supported; manually edit the files". This error will be altered to read: "multiple heads are present. Please merge these using alembic merge, or specify an explicit parent revision to work with multiple heads".

  3. The user can either specify an explicit parent revision, as in alembic revision -p <parent_revision>, which will make this new file refer to that parent alone, or they can run alembic merge, which will create a do-nothing "merge point" which specifies all current heads as parents. This "merge point" file can of course be edited to refer only to specific heads if desired. The "merge point" reconciles the multiple heads back into one version, and invoking alembic revision again would use this merge point as the parent.

  4. the -p / --parent option to alembic revision must validate that the given parent version is in fact a head.

@sqlalchemy-bot

This comment has been minimized.

sqlalchemy-bot commented Jan 20, 2014

Changes by Michael Bayer (@zzzeek):

  • edited description
1 similar comment
@sqlalchemy-bot

This comment has been minimized.

sqlalchemy-bot commented Jan 20, 2014

Changes by Michael Bayer (@zzzeek):

  • edited description
@sqlalchemy-bot

This comment has been minimized.

sqlalchemy-bot commented Jan 20, 2014

Michael Bayer (@zzzeek) wrote:

Issue #114 was marked as a duplicate of this issue.

@sqlalchemy-bot

This comment has been minimized.

sqlalchemy-bot commented Mar 10, 2014

Changes by Michael Bayer (@zzzeek):

  • set milestone to "tier 1"
@sqlalchemy-bot

This comment has been minimized.

sqlalchemy-bot commented Oct 29, 2014

Dimitrios Theodorou (@dtheodor) wrote:

Has any work started on this or are you open to contributions?

@sqlalchemy-bot

This comment has been minimized.

sqlalchemy-bot commented Oct 29, 2014

Michael Bayer (@zzzeek) wrote:

I haven't started this one yet. This one, and the SQlite migrations which I have started, are the current priority. If you want to show me some patches towards making a pull request that would be great! I don't think I'm starting any new development for a couple of weeks.

@sqlalchemy-bot

This comment has been minimized.

sqlalchemy-bot commented Nov 11, 2014

Michael Bayer (@zzzeek) wrote:

I'm starting to look into this now.

@sqlalchemy-bot

This comment has been minimized.

sqlalchemy-bot commented Nov 11, 2014

Dimitrios Theodorou (@dtheodor) wrote:

Ok, nothing from my side.

@sqlalchemy-bot

This comment has been minimized.

sqlalchemy-bot commented Nov 20, 2014

Michael Bayer (@zzzeek) wrote:

  • The "multiple heads / branches" feature has now landed. This is
    by far the most significant change Alembic has seen since its inception;
    while the workflow of most commands hasn't changed, and the format
    of version files and the alembic_version table are unchanged as well,
    a new suite of features opens up in the case where multiple version
    files refer to the same parent, or to the "base". Merging of
    branches, operating across distinct named heads, and multiple
    independent bases are now all supported. The feature incurs radical
    changes to the internals of versioning and traversal, and should be
    treated as "beta mode" for the next several subsequent releases
    within 0.7.
    fixes #167

5c747a0

@sqlalchemy-bot

This comment has been minimized.

sqlalchemy-bot commented Nov 20, 2014

Changes by Michael Bayer (@zzzeek):

  • changed status to closed
@sqlalchemy-bot

This comment has been minimized.

sqlalchemy-bot commented Nov 20, 2014

Nicolas Dietrich (@nidi) wrote:

Awesome! Looking forward to use it, thank you very much!!

@sqlalchemy-bot

This comment has been minimized.

sqlalchemy-bot commented Nov 21, 2014

MichaelE (@miracle2k) wrote:

Huge thanks from me, too.

@sqlalchemy-bot

This comment has been minimized.

sqlalchemy-bot commented Nov 21, 2014

Michael Bayer (@zzzeek) wrote:

here's the docs, any comments on usage etc., now's the time, hoping for a sunday night release: http://alembic.readthedocs.org/en/latest/tutorial.html#working-with-branches. The basic branch-and-merge case is pretty uncontroversial, it's more the long-lived branch support where things get weird.

@sqlalchemy-bot

This comment has been minimized.

sqlalchemy-bot commented Nov 21, 2014

Byron Young wrote:

Yes! We're really excited to have this feature. Thanks!

@sqlalchemy-bot

This comment has been minimized.

sqlalchemy-bot commented Nov 21, 2014

Tobias Bengfort wrote:

Sounds great! I just read the documentation and it is pretty extensive. Just one thing I noticed: If I use a branch label and there is a branch point later in that branch, what does branchname@head point to? Is there something like branchname@heads?

Example:

                                   --- 29f8
                                  /
<base> --> 3782 -----> 109e ---->
         (networking)             \
                                   --- 1975

For example, I would expect networking@head to give an error and networking@heads to point to 29f8 and 1975.

@sqlalchemy-bot

This comment has been minimized.

sqlalchemy-bot commented Nov 21, 2014

Michael Bayer (@zzzeek) wrote:

@ximmm sure, here's a new section I added for that: http://alembic.readthedocs.org/en/latest/tutorial.html#more-label-syntaxes

@sqlalchemy-bot

This comment has been minimized.

sqlalchemy-bot commented Nov 23, 2014

Nicolas Dietrich (@nidi) wrote:

As a consequence of the branching feature, the stamp semantics probably needs to be adapted, i.e. allow stamping to multiple revisions, or alternatively allow unstamping.

@sqlalchemy-bot

This comment has been minimized.

sqlalchemy-bot commented Nov 23, 2014

Michael Bayer (@zzzeek) wrote:

@NiDi - stamp has been adjusted. It does something a lot more complicated now which is that it has to calculate which branches logically need to be at the revision you're giving it, and adjust the whole table accordingly. but it works, try it out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment