Multiple versions directories #124

Closed
sqlalchemy-bot opened this Issue May 13, 2013 · 36 comments

sqlalchemy-bot commented May 13, 2013

Migrated issue, originally created by Wichert Akkerman (@wichert)

I have a collection of packages, each of which imports one or more models that may be used in an application. Each package can require its own migrations, but may also depend on migrations from other packages it depends on. I am looking at what it would take to use alembic in this type of environment. I see two basic approaches: manually importing versions, or allowing versions in multiple places.

Manual importing of versions

One option is to turn the versions directory into a package and import the downgrade and upgrade functions defined there in a version definition of another package. This does not require any changes in alembic:

from other.package.versions.v3_0 import downgrade as downgrade_other
from other.package.versions.v3_0 import upgrade as upgrade_other

def upgrade():
    upgrade_other()
    # Upgrade logic for this package

def downgrade():
    downgrade_other()
    # Downgrade logic for this package

This works, but quickly becomes problematic as soon as the dependencies are not completely trivial.

Versions in multiple places

Alembic has an undocumented option allowing you to use a dotted path as version location. It should be easy to extend this to allow multiple version directories.

[alembic]
script_location =
    other.package:upgrade
    other.package2:upgrade
    my_package:upgrade

You can then treat each package as a branch to make sure all migration dependencies are handled correctly.

If you think this approach is workable I can probably whip up a pull request for it.

sqlalchemy-bot commented May 13, 2013

Michael Bayer (@zzzeek) wrote:

i use multiple sets of versions by just putting multiple sections in my .ini file and referring to them using the "--name" option:

-n NAME, --name NAME  Name of section in .ini file to use for Alembic config

so that is:

[alembic_app1]
script_location = foo

[alembic_app2]
script_location = bar

but these are separate series of migration scripts. I can't tell in your use case how the versions across these different packages intend to be organized into a line, or if they all run as separate series of migrations.

sqlalchemy-bot commented May 13, 2013

Wichert Akkerman (@wichert) wrote:

I tend to have a master-thing (generally a package with admin tools) that pulls in everything needed for a deployment, and I was considering having a kind of master-alembic versioning there that pulls in its dependencies as branches and provides a single head.

Your approach is nice and I completely overlooked that option. It has one small downside: in some cases I must upgrade pkg1 before I can upgrade pkg2. One example where I need this is if pkg1 defines some abstract models that are used by pkg2, and the migrations for pkg1 must have run before I can migrate pkg2.

sqlalchemy-bot commented May 13, 2013

Michael Bayer (@zzzeek) wrote:

Well what I'm doing is manually running each, if you need pkg1/pkg2 in some order, whatever it is in your setup that runs each "app" would be dealing with that. not sure how you'd want to resolve those dependencies under normal circumstances (do they change depending on what version each app is at?)
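
As a rough illustration of "whatever runs each app deals with the order", a small driver script could sort the .ini sections by dependency and then build one `alembic -n SECTION upgrade head` command per section. This is only a sketch: the section names come from the example above, the dependency mapping is hypothetical, and it assumes each section tracks its own version independently.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical mapping: ini section -> sections that must be upgraded first.
SECTION_DEPENDENCIES = {
    "alembic_app1": set(),
    "alembic_app2": {"alembic_app1"},
}


def upgrade_commands(dependencies):
    """Return one argv list per section, with dependencies listed first."""
    order = TopologicalSorter(dependencies).static_order()
    return [["alembic", "-n", section, "upgrade", "head"] for section in order]


commands = upgrade_commands(SECTION_DEPENDENCIES)
for argv in commands:
    # Each argv could be passed to subprocess.run(argv, check=True).
    print(" ".join(argv))
```

The sketch only fixes the order between sections; it says nothing about dependencies that change from one revision to the next.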

sqlalchemy-bot commented May 14, 2013

Wichert Akkerman (@wichert) wrote:

I just tested your multiple-sections approach but it does not work for me. If I run the migrations for app1 first and then try to run the migration from app2 I get this:

$ bin/alembic -n alembic_app2 upgrade head
INFO  [alembic.migration] Context impl PostgresqlImpl.
INFO  [alembic.migration] Will assume transactional DDL.
  No such revision '3d03d22f5ca'

Which is understandable since that revision is from app1.
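
The failure can be reproduced with a toy model, assuming both sections point at the same database and therefore share one version table. The sketch below is not Alembic code: it only mimics a single-row `alembic_version` table with sqlite3, and the app2 revision id and the `alembic_version_app2` table name are made up for illustration. In a real setup the table name would be chosen in env.py, for example through the `version_table` argument to `context.configure()`.

```python
import sqlite3


def stamp(conn, table, revision):
    """Record `revision` as the single current version stored in `table`."""
    # Table names are trusted constants in this toy example.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (version_num TEXT NOT NULL)")
    conn.execute(f"DELETE FROM {table}")
    conn.execute(f"INSERT INTO {table} (version_num) VALUES (?)", (revision,))


def current_revision(conn, table, known_revisions):
    """Return the stamped revision, failing if this script set does not know it."""
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (version_num TEXT NOT NULL)")
    row = conn.execute(f"SELECT version_num FROM {table}").fetchone()
    if row is not None and row[0] not in known_revisions:
        raise LookupError(f"No such revision {row[0]!r}")
    return row[0] if row else None


APP2_REVISIONS = {"hypothetical_app2_rev"}  # made-up id for illustration

conn = sqlite3.connect(":memory:")
stamp(conn, "alembic_version", "3d03d22f5ca")  # app1 was upgraded first

try:
    current_revision(conn, "alembic_version", APP2_REVISIONS)
    shared_table_ok = True
except LookupError:
    shared_table_ok = False  # app2 cannot resolve app1's revision

# With one table per series, app2 simply sees that it has no version yet.
app2_current = current_revision(conn, "alembic_version_app2", APP2_REVISIONS)
```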

sqlalchemy-bot commented May 14, 2013

Michael Bayer (@zzzeek) wrote:

OK, trying to seek the answer from my question, it seems like you are dealing with just one revision chain, the only difference being that the files themselves are spread out among multiple directories.

So "manual importing of versions", this is a problem "as soon as dependencies are not completely trivial", meaning, you're concerned about import resolution order issues ? or just being able to locate all the various locations that have version files? How is "multiple script location" any different, seems like in both cases, you have to list out locations?

sqlalchemy-bot commented May 14, 2013

Wichert Akkerman (@wichert) wrote:

I'll try to describe a scenario I am working with now.

I have a package with basic models (lets call it my.core) which pretty much everything depends on. The majority of those are abstract classes. A recent major refactoring required changes in many of the defined models.

I have an admin application that builds directly on top of my.core and does some fairly custom things.

I have a utility package which implements the standard models for public websites. We'll call this my.site. This extends the basic models from my.core and adds various extra models. This needed its own migration logic that must be run after the my.core migrations have been run.

I have a number of different public websites (custA.site, custB.site, etc.) that build on top of my.site. These may occasionally define a few models themselves for specific features of a site.

That means the basic dependency chain looks like this:

        my.core
        /    \
       /      \
 my.admin    my.site
            /  |   \
           /   |    \
          /    |     \
         /     |      \
custA.site custB.site  custC.site

The dependency chain of migrations can look like this (ignoring the cust* packages for simplicity):

my.core v1 
    |     \
    |      \
    |    my.site v1
    |       |
    |    my.site v2
    |        |
my.core v2   |
       \     |
        \    |
         my.site v3

What I would like to be able to do for any deployed site is to run all needed migrations with one command. So upgrading custA.site should run the migrations for my.core, my.site and custA.site. If I have an admin site for that customer deployed as well, running the migrations for my.admin should be able to see that my.core has already been updated and only run the migrations for my.admin itself.

If that is not possible, upgrading my.core, my.admin and custX.site separately seems like the best option.
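
To make the "run everything needed with one command" idea concrete, the migration diagram above can be written down as a small dependency graph and sorted. This is a sketch in plain Python, not something Alembic offered at the time of this comment; the dictionary just transcribes the diagram, and `pending_revisions` is a hypothetical helper.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each revision maps to the revisions that must be applied before it,
# following the migration diagram above.
REVISION_PARENTS = {
    "my.core v1": set(),
    "my.core v2": {"my.core v1"},
    "my.site v1": {"my.core v1"},
    "my.site v2": {"my.site v1"},
    "my.site v3": {"my.site v2", "my.core v2"},
}


def pending_revisions(parents, applied):
    """Return unapplied revisions in an order that respects every dependency."""
    order = TopologicalSorter(parents).static_order()
    return [rev for rev in order if rev not in applied]


# my.core is already up to date, so only the my.site chain remains.
plan = pending_revisions(REVISION_PARENTS, applied={"my.core v1", "my.core v2"})
print(plan)
```

The same function covers the admin case: anything already recorded as applied is skipped, so only the remaining package-specific revisions run.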

sqlalchemy-bot commented May 14, 2013

Michael Bayer (@zzzeek) wrote:

do custA.site, custB.site, my.core etc. have their own alembic_version tables and version ID? Or is there just one version ID that all of these migrations participate within?

sqlalchemy-bot commented May 14, 2013

Wichert Akkerman (@wichert) wrote:

I'm hoping we can figure out in this ticket what that will look like. Currently my.core and my.site have an alembic setup (a basic alembic init skeleton with a couple of versions), but as soon as I added alembic to my.site things broke because alembic got confused when it found versions from my.core in the alembic_version table.

From an end-user point of view only my.admin and the custX.site packages need a full alembic setup since those are the things you deploy and upgrade. The fact that they use my.core and my.site internally would ideally be an implementation detail that isn't noticeable.

sqlalchemy-bot commented May 14, 2013

Michael Bayer (@zzzeek) wrote:

OK, the answer is, "you don't know yet". So then I'll point you to a related issue, which is a big change to Alembic that would turn the versioning scheme from a linked list into a directed acyclic graph. The last two comments lay out how this would work, it's #114.

with that issue we'd be talking about a much more open paradigm for how alembic works, and since your issue here is a dependency tree, it seems like your needs here should be rolled into that one (maybe both tickets should just be replaced by a whole new one).

sqlalchemy-bot commented May 14, 2013

Wichert Akkerman (@wichert) wrote:

The DAG approach from #114 seems very relevant. Combined with the ability to read versions from multiple locations that should solve my use case.

sqlalchemy-bot commented May 16, 2013

Michael Bayer (@zzzeek) wrote:

OK, so I've moved the whole thing to git. does this make contributing to #114 any more interesting for you ? :)

sqlalchemy-bot commented Jun 7, 2013

Changes by Michael Bayer (@zzzeek):

  • added labels: versioning model

sqlalchemy-bot commented Sep 6, 2013

MichaelE (@miracle2k) wrote:

I need this also. In my case, I am not interested in complex DAG setups - just each instance of my app having a single alembic_version table, but working with a different set of tables (and thus migrations).

So I need to spread the migration files across multiple directories. I'd like to implement this.

I can see two options:

  1. Make the "script_location" key accept multiple directories, separated by a semicolon, or colon (too unixy?).
  2. Have a new version_dirs key to manually specify the version directories to use (default behaviour if not set).

Initially my preference was (1), because I thought it best not to introduce a new concept/setting.

However, upon reflecting further, I don't think the "concept" of the "script_location" makes sense if there are multiple of them (within one project/dataset).
Certainly I don't want to copy the env.py, script.py.mako etc. files all across. Should it behave like a search path? When migrating from a revision in location A to a revision in location B, which env.py file would be used? Seems way too complex.

Adjusting ScriptCommand to search multiple version_dirs seems straightforward. Any thoughts on this before I try to implement it?
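
A minimal sketch of what "search multiple version_dirs" could mean, written as standalone Python rather than as a patch to Alembic. The function name `load_revisions` is hypothetical, and the regular expression is a simplification: it only recognises a plain `revision = '...'` assignment at the start of a line, whereas Alembic itself imports each revision module.

```python
import re
from pathlib import Path

# Matches a top-level `revision = 'abc123'` line (but not `down_revision`).
_REVISION_RE = re.compile(r"^revision\s*=\s*['\"]([^'\"]+)['\"]", re.MULTILINE)


def load_revisions(version_dirs):
    """Map revision id -> file path across several version directories."""
    revisions = {}
    for directory in version_dirs:
        for path in sorted(Path(directory).glob("*.py")):
            match = _REVISION_RE.search(path.read_text())
            if match is None:
                continue  # not a revision file, e.g. __init__.py
            rev_id = match.group(1)
            if rev_id in revisions:
                raise ValueError(
                    f"revision {rev_id!r} found in both "
                    f"{revisions[rev_id]} and {path}"
                )
            revisions[rev_id] = path
    return revisions
```

Once the files are collected this way, the rest of the versioning logic can treat them as if they lived in one directory; only the question of where a new revision file gets written still needs an answer.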

sqlalchemy-bot commented Sep 6, 2013

MichaelE (@miracle2k) wrote:

I also wonder if this should be exposed more fully to the user by allowing each version dir to have an alias, such that the "revision" command can have an argument to select the directory to write to.

sqlalchemy-bot commented Sep 6, 2013

Michael Bayer (@zzzeek) wrote:

the use case of separate version directories that are fully independent is already available, as I mentioned up above, using the --name option. There's no need to copy env.py across, build one common env.py within your application's main area and the local env.py's become stubs which just import from that one. I use this approach in my current work project.

though im not sure what you mean by "single alembic_version" table. if you have different collections of migration scripts, they need to be on independent version tables.

for this ticket, so far the only change im seeing is the very dramatic refactoring of #114. it's unlikely anything will happen there until i get sqlalchemy 0.9 out and can return to working on alembic for a bit (and then, autogenerate may still be the first priority).

sqlalchemy-bot commented Sep 7, 2013

MichaelE (@miracle2k) wrote:

I have two customer deployments A and B. Both use a webapp library that has separate modules that they can pick and choose from.

If a new module gets activated for customer A, the respective tables need to be created. This needs to be able to happen a) without user intervention and b) in so far as that a user is involved, it should be easy. Manually dealing with 10 different --name options isn't a good option.

(although I could maybe see an interface that would present the user a list of "migration modules" and their respective status, and allow to migrate --all)

But the way I really wanted to solve this was spread out the migrations files over all the modules, and construct an alembic environment of all the migration directories of activated folders. Activating a new module would cause new migration files to become available.

I.e. I was thinking Django/South "apps".

I realize now that this might not be a good option here, with alembic revisions always needing a down_revision (i.e. I would see the "only a single head supported so far" message).

Unfortunately, I don't think using separate ScriptEnvironment's would help me either, since they would require separate databases/alembic_version tables - seeing that the table only stores the revision id.

sqlalchemy-bot commented Sep 7, 2013

Michael Bayer (@zzzeek) wrote:

if your migrations are occurring without user intervention, then the user isn't running the "alembic" command either, you'd be running alembic programmatically in any case. Plenty of support for that.

if you're looking for a way to spread out the migration files all over the place, that's really not the issue. the issue is how many different versioning lineages do you have. I see you're saying that you don't want separate versioning tables, but I don't understand that. If these are separate apps with their own set of tables and everything, you'd definitely want them to be on independent versioning lineages.

Otherwise, what happens when you say this:

alembic revision -m "change something"

where does the new file go? What change is made to Script._rev_path for that?

I mean, if you really want it to be as simple as, when Script goes to read versions/ , it instead reads all the files from any number of directories all over the place, and nothing else, those files are otherwise just as though they are all in that one directory, I can see some very non-invasive changes to Script that would allow that. I'd want you to be able to do that. I just don't want Alembic's code at all cluttered with this idea directly, I'd prefer if there were some extension system that lets you plug this in.

sqlalchemy-bot commented Sep 7, 2013

MichaelE (@miracle2k) wrote:

Spreading the files across multiple directories was my original intention, but you are correct, the lack of a good way to support different versioning lineages is the real issue.

Are you familiar with how South does migrations for Django apps? Each app has its own models, migrations and lineages, but they are allowed to reference each other (http://south.readthedocs.org/en/latest/dependencies.html). That is, the data structure allows for app A to have a different head than app B, though a simple migrate command will bring all apps up to date.

This is really my use case. Add a new app to your Django project down the line, South will handle this just fine (except I am not using Django).

The thing is that because the alembic_version table is designed to only store a single row, even if I were to use --name, each such configuration would need to use a separate database (or have alembic_version_foo, alembic_version_bar etc. tables, which would be terrible once you have 10 of them).

In other words, --name seems to be more of a way to work with multiple distinct configurations/"projects" from a single ini file, not a way to work with separate lineages within a single database/project.

This seems to be an entirely separate issue now, and I might open a ticket for it.

sqlalchemy-bot commented Sep 7, 2013

Michael Bayer (@zzzeek) wrote:

yeah the approach we have for that is in #114. instead of a single lineage, migration files can basically be dependent on any subset of files and the whole thing is just a big dependency tree. alembic_version changes, all of that. it should handle everything you're talking about.

sqlalchemy-bot commented Mar 10, 2014

Michael Bayer (@zzzeek) wrote:

the newer version of #114 is re-stated in #167. So we'll have the ability to have any number of lineages arranged into an arbitrary dependency tree. now on this issue. do we need something here? do we need alembic to fan out among more than one "versions" directory or is that part of it something folks here have implemented separately?

sqlalchemy-bot commented Mar 10, 2014

MichaelE (@miracle2k) wrote:

I will need the ability to have the same CLI interface operate on multiple "versions" directories, each containing their own lineage (as said above, the comparison is what Django/South does with separate but potentially inter-dependent "apps"). While I think it would be generally useful to have this, I'm happy to implement it myself on top of alembic as well.

The only thing I would say is that when in the past I've tried to customize alembic, I've had the problem that each command is a completely separate entrypoint, creating its own ScriptDirectory instance, so you have to basically monkey-patch that.

sqlalchemy-bot commented Mar 10, 2014

Michael Bayer (@zzzeek) wrote:

Well I know we talked about the --name argument for truly separate lineages (e.g. different databases), to maintain multiple lineages in one database, Alembic could either have multiple version directories or you could use the #167 feature. I think the #167 feature is the best route here as I kind of believe django's "apps" thing is a myth (e.g. "apps" are always inter-related).

Trying to parse "each command is a completely separate entrypoint" means, I guess that they create ScriptDirectory internally, OK well the issue there is different commands need to do different things, so not all of them need a scriptdirectory. some need an EnvironmentContext, others don't, or conditionally, etc. there's ways to reorganize this so that resources are collected but it's not clear to me if this would include just ScriptDirectory or also EnvironmentContext as well, e.g. how this would be used matters. In this case why does it matter if each command uses a new ScriptDirectory?

sqlalchemy-bot commented Mar 10, 2014

MichaelE (@miracle2k) wrote:

Yes, I am assuming I will be able to work based off of #167. Each app is a separate branch/lineage. In case #167 does not support multiple HEADs without a common ancestor, I can always use an empty dummy migration as the initial node that all "apps" will branch off from:

Dummy
     -> a_app_m1 -> a_app_m2 -> a_app_m3
     -> b_app_m1 -> b_app_m2

The other thing with south apps is dependencies between them, as you said, and I suppose to do that a graph like the following would need to be supported:

a1 -> a2 -> a3
             ^
             |
      b1 -> b2 -> b3

That is, the b2 migration (from the b app) declares both b1 and a3 to be parents. I'm not sure if your current approach of storing only the heads can still support this, but in any case, supporting such dependencies is not that important to me.
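
The two graphs above can be checked with a few lines of Python. This is a sketch under one assumption: the second diagram is read as b2 having the parents b1 and a3. A "head" here is any revision that no other revision names as a parent; `find_heads` is a hypothetical helper, not an Alembic function.

```python
def find_heads(parents):
    """Return, sorted, the revisions that no other revision lists as a parent."""
    referenced = {p for plist in parents.values() for p in plist}
    return sorted(rev for rev in parents if rev not in referenced)


# First diagram: two independent apps branching off an empty dummy migration.
independent_apps = {
    "dummy": [],
    "a_app_m1": ["dummy"],
    "a_app_m2": ["a_app_m1"],
    "a_app_m3": ["a_app_m2"],
    "b_app_m1": ["dummy"],
    "b_app_m2": ["b_app_m1"],
}

# Second diagram, read as: b2 depends on both b1 and a3.
cross_dependency = {
    "a1": [],
    "a2": ["a1"],
    "a3": ["a2"],
    "b1": [],
    "b2": ["b1", "a3"],
    "b3": ["b2"],
}

print(find_heads(independent_apps))  # ['a_app_m3', 'b_app_m2']
print(find_heads(cross_dependency))  # ['b3']
```

Under that reading, a3 stops being a head as soon as b2 depends on it, so a scheme that stores only heads would record b3 alone for a fully upgraded database.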

So I believe #167 works well for me. I just need to split the migration files across multiple directories (each app has its own).

You indicated before that you might not want this in the core, and that's ok, because as you said, I could collect them from multiple folders, and the graph/migration logic doesn't care.

This is really a question relating to the UI part of alembic then. Alembic could probably make it a bit easier to support that sort of customization without replacing the whole command line. For example, I could subclass ScriptDirectory to pull migration files from multiple directories, but then I'd have to monkey patch my subclass, or reimplement every command.

For creating new migration files, e.g. the revision command, I have to pick a single folder, obviously, and alembic could provide me with a way, via my ScriptDirectory subclass for example, to decide where the file should be written to.

So my thinking here is I should be able to support my scenario based on #167 alone and maybe some small changes to how the CLI part internally is put together to provide better hooks.

sqlalchemy-bot commented Mar 10, 2014

Michael Bayer (@zzzeek) wrote:

I don't have too much issue with ScriptDirectory supporting multiple version directories natively. But we just need to figure out what ScriptDirectory.generate_revision() looks like. I guess we add an optional keyword argument for "directory=" that takes in the name of a non-default directory. Other than that we only refer to the "versions" directory when we iterate through files and that's easy enough to expand out into multiple directories.

sqlalchemy-bot commented Mar 10, 2014

Michael Bayer (@zzzeek) wrote:

ScriptDirectory.from_config() could just attach itself to the Config, and that's the same ScriptDirectory you get back each time. So that would be a way to share that state into commands.

sqlalchemy-bot commented Mar 10, 2014

Wichert Akkerman (@wichert) wrote:

I haven't read #167 yet, but I can tell you that multiple versions directories is essential for my needs. Pretty much all my projects are built around packages where each package defines part of the overall model (for example image handling, or shop article logic) and will have its own migrations for its own models. Being able to have a separate HEAD for each package would be ideal, but I can live with having a central app depend on the current HEAD for each package.

sqlalchemy-bot commented Mar 10, 2014

Michael Bayer (@zzzeek) wrote:

obviously this is a use case but the workflow needs to be worked out. as i said I just use --name, but that's not enough automation here, so let me try to lay out what it seems like this would be:

  1. assume there might be only one database and schema, and multiple lineages live there.

  2. we do alembic init. This will create the usual env.py + a versions directory:

     alembic init ./somedirectory
    
  3. developer creates additional "lineages", by modifying alembic.ini (or whatever) something like this:

     [alembic]
     # path to migration scripts
     script_location = somedirectory
    
     # additional lineages
     lineages = auth images shop
    
     lineage_shop_location = model/shop/versions
     lineage_auth_location = model/auth/versions
     lineage_images_location = model/images/versions
    
  4. when "lineages" is present, alembic commands now accept (or require?) --lineage when it is needed:

     alembic revision -m "add thing" --autogenerate --lineage auth
    
  5. revision then generates the version file in model/auth/versions. Files which are placed in a specific "lineage" only depend on the other scripts that are in that lineage. In this case, ScriptDirectory, when it does generate_revision(), will be interpreting this "--lineage" argument similarly to how it will interpret the -p argument in #167 (e.g. -p tells it which head it should be looking at) - when it looks for heads, it will filter them out to that head which is within that lineage. (there still can be multiple heads within a lineage though so we still need to fall through to the usual error reporting if -p is not present).

  6. alembic upgrade/downgrade will need --lineage specified when a symbolic name such as "base" or "head" is used, or when relative numbers are used:

     alembic upgrade --lineage auth head
     alembic upgrade --lineage images +1
    

But if a specific rev is given, then we know the specific revision tree to resolve towards:

   alembic upgrade dfa64783

If this description works for both cases described here, then I'll be pretty confident this is a good spec.
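
The proposal can be sketched in Python to check that it hangs together. Note that `lineages` and `lineage_<name>_location` are option names from the draft spec in this comment, not settings that Alembic is known to ship, and both helper functions below are hypothetical.

```python
from configparser import ConfigParser

# The ini fragment from step 3 of the proposal above.
PROPOSED_INI = """
[alembic]
script_location = somedirectory
lineages = auth images shop
lineage_shop_location = model/shop/versions
lineage_auth_location = model/auth/versions
lineage_images_location = model/images/versions
"""


def lineage_locations(ini_text, section="alembic"):
    """Map each declared lineage name to its versions directory."""
    parser = ConfigParser()
    parser.read_string(ini_text)
    options = parser[section]
    names = options.get("lineages", "").split()
    return {name: options[f"lineage_{name}_location"] for name in names}


def needs_lineage(target):
    """Symbolic ("head", "base") and relative ("+1") targets are ambiguous
    across lineages; a concrete revision id identifies its own lineage."""
    return target in ("head", "base") or target.startswith(("+", "-"))


locations = lineage_locations(PROPOSED_INI)
print(locations["auth"])          # model/auth/versions
print(needs_lineage("head"))      # True
print(needs_lineage("dfa64783"))  # False
```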

I'll move this to the new "tier1" milestone. The medium-term plan is that I hope to secure funding to go through the large "tier1" items for some period within the spring/summer. This funding may be through specific entities or maybe through something like a kickstarter.

sqlalchemy-bot commented Mar 10, 2014

Changes by Michael Bayer (@zzzeek):

  • set milestone to "tier 1"

sqlalchemy-bot commented Nov 22, 2014

Michael Bayer (@zzzeek) wrote:

now that #167 is complete this should be much simpler. We have what we need for what is described here as "lineages", in the form of "branch labels". As the new system looks at the total set of migration files as one big graph, all we need here is to change ScriptDirectory to include a config for the name "versions", enabling ScriptDirectory._load_revision to walk across multiple directories. However, this approach still has just the one "dir", one env.py, and all of that. If all of these revisions are part of one big system sharing one alembic_version table, then I don't see how we could have multiple env.py's going on; these scripts could refer to each other.

sqlalchemy-bot commented Nov 22, 2014

Michael Bayer (@zzzeek) wrote:

  • In conjunction with support for multiple independent bases, the
    specific version directories are now also configurable to include
    multiple, user-defined directories. When multiple directories exist,
    the creation of a revision file with no down revision requires
    that the starting directory is indicated; the creation of subsequent
    revisions along that lineage will then automatically use that
    directory for new files.
    fixes #124

3be8c22
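
The directory rule in that commit message can be modelled in a few lines. This is a toy restatement of the rule as described, not the code from the commit; the function and parameter names (`directory_for_new_revision`, `parent_dir`, `version_dirs`, `start_dir`) are invented for illustration.

```python
def directory_for_new_revision(parent_dir, version_dirs, start_dir=None):
    """Pick the directory in which a new revision file should be written.

    parent_dir   -- directory holding the down revision's file, or None when
                    the new revision has no down revision (a new base)
    version_dirs -- all configured versions directories
    start_dir    -- directory chosen explicitly by the user, if any
    """
    if parent_dir is not None:
        # Later revisions along a lineage follow the file they build on.
        return parent_dir
    if start_dir is not None:
        if start_dir not in version_dirs:
            raise ValueError(f"{start_dir!r} is not a configured versions directory")
        return start_dir
    if len(version_dirs) == 1:
        # Only one place to put it, so nothing is ambiguous.
        return version_dirs[0]
    raise ValueError(
        "a revision with no down revision needs an explicit starting directory"
    )
```

For example, with `["model/auth/versions", "model/shop/versions"]` configured, a new base revision has to name one of the two, while its successors land next to it without further input.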

sqlalchemy-bot commented Nov 22, 2014

Changes by Michael Bayer (@zzzeek):

  • changed status to closed

sqlalchemy-bot commented Nov 22, 2014

Michael Bayer (@zzzeek) wrote:

alrighty, can we please all carefully read http://alembic.readthedocs.org/en/latest/branches.html#working-with-multiple-bases and go through the whole thing, I've added additional features to cover being able to have clean dependencies between revision streams as well. It's a lot of new stuff, and reading it, the concept seems really deep to me, if that's any kind of excuse for why it took me a couple of years to get my head around this. I think this will work. It seems very cool. But it's at the edge of my abilities, so...well check it out.

sqlalchemy-bot commented Jan 9, 2015

MichaelE (@miracle2k) wrote:

I've now integrated this into our projects, and I want to say that this was a truly excellent piece of work. It's the perfect solution, certainly for our problem, and the documentation is so good I had it working in no time.

sqlalchemy-bot commented Jan 9, 2015

Michael Bayer (@zzzeek) wrote:

that's amazing. This was one of the most intense bits of coding I've done in years, after thinking about it for a year I did two solid weeks on this system. I was fortunate to be able to take that time. thanks for the feedback!

sqlalchemy-bot commented Apr 12, 2018

Nam VU wrote:

Hi Michael Bayer,

This is great work! I have a question: when version_locations is defined, the value defined in script_location will be ignored, won't it?

sqlalchemy-bot commented Apr 12, 2018

Michael Bayer (@zzzeek) wrote:

well the default version directory is /versions, so if you use version_locations then "versions" isn't used. but you still have your env.py in .
