Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does alembic plan to add data migrations? #620

Closed
abetkin opened this issue Nov 11, 2019 · 6 comments
Closed

Does alembic plan to add data migrations? #620

abetkin opened this issue Nov 11, 2019 · 6 comments
Labels
question usage and API questions

Comments

@abetkin
Copy link

abetkin commented Nov 11, 2019

These ones: https://docs.djangoproject.com/en/2.2/topics/migrations/#data-migrations

I understand that it is not an easy thing to add, but probably out-of-the-box way to run data migrations in "database reflection" mode?

Also, the docs say nothing about data migrations, as if they didn't exist)
Sorry if it sounds pushy, but I really admire sqlalchemy in so many aspects

@zzzeek
Copy link
Member

zzzeek commented Nov 11, 2019

I was thinking django "data migrations" are some super slick system built on top of their admin system that write themselves automatically, but this does not seem to be the case. it looks like it's just a separate stream of normal migration files and they are written manually. I don't see anything about this feature that implies they are for "data" other than the name.

I haven't looked closely enough to see how they run them but usually a small data migration is just something you'd put in a normal Alembic version file. It's not clear if Django has these organized into a separate stream, you can certainly do that in Alembic also but I would think that data migrations would have dependencies on schema migrations and would just be inline.

at the moment i have no concept of what this "feature" would be. It looks like at the moment it would be a docuemntation section with a bit of discussion about how to run INSERT/UPDATE/DELETE statements in a migration file. People do this already and there is also the https://alembic.sqlalchemy.org/en/latest/ops.html?highlight=bulk_insert#alembic.operations.Operations.bulk_insert operation which is pretty popular.

Please specify exactly what API additions you are seeking. thanks!

@zzzeek zzzeek added the question usage and API questions label Nov 11, 2019
@abetkin
Copy link
Author

abetkin commented Nov 11, 2019

Hi @zzzeek, thanks for your reply!

Briefly, you don't have the table definitions inside a data migration, so you are very limited in what parts of sqlalchemy you can use.

Speaking of your last link (bulk_insert): there you have to define an ad-hoc table, despite the fact that it probably has already been defined somewhere in models.py. But you cannot use that definition since there probably were a lot of schema changes after that bulk insert.

django's data migrations solve that problem: they can reconstruct the current state of the tables before a migration. The operations for schema migrations are composable: you can "add" them together to produce a model. For example, creating an empty table + adding a varchar column + extending its length will result in a table with one varchar column with extended length. AFAIK alembic operations are just functions that produce SQL, and cannot be "added" together, am I right?

@zzzeek
Copy link
Member

zzzeek commented Nov 11, 2019

Hi @zzzeek, thanks for your reply!

Briefly, you don't have the table definitions inside a data migration, so you are very limited in what parts of sqlalchemy you can use.

it is quite easy to either use the table definitions that are part of one's model, although this is not necessarily "safe" depending on what you're doing since if that model changes the older migration won't work anymore, or to simply define the ad-hoc tables (typically using lowercase table/column which don't require as much verbosity) that you need in order to do the SQL operations you want. It seems like the cookbook is lacking an example similar to the one given for bulk_insert for this so that can be added. so that's documentation.

You can also of course use table reflection if you don't want to define the tables in an ad-hoc way. This is very common as well, it just isn't supported for "offline" migrations that use --sql.

Speaking of your last link (bulk_insert): there you have to define an ad-hoc table, despite the fact that it probably has already been defined somewhere in models.py.
But you cannot use that definition since there probably were a lot of schema changes after that bulk insert.

that's not necessarily true in all cases as the op.create_table() operation returns for you the table() object which can be used for data migrations.

django's data migrations solve that problem: they can reconstruct the current state of the tables before a migration.

Yes this is because Django has a radically different approach to generating migrations which does not rely on table introspection, they take advantage of the fact that Django models are all within a narrow range of functionality that works across all backends so they are able to recreate the database schema as a logical object in memory for each migration; when you are sitting on a particular Django revision, they have a whole Django ORM model ready to go based on that state. This is not how SQLAlchemy works at all, there is no ORM model implied by a Core schema model unless you use the automap extension which will only deliver a certain kind of ORM model. Additionally, Alembic rejected the approach of its predecessor SQLAlchemy-migrate which attempts to keep Table() / Column() objects around as an operation proceeds; the end result of SQLAlchemy migrate, since it didn't build up these Table objects as a product of the whole migration stream, meant that migration files were full of table reflection operations that are slow, verbose to work with, as well as non-portable to offline migrations.

So basically if Django has the models ready to go in each migration, it's leveraging automation that is very specific to Django and does not exist in Alembic, nor could it without completely changing Alembic's core mechanism and way of working.

Instead, for access to the current state of the database, Alembic continues to rely upon table reflection. It just doesn't need to run reflection as the migration operations proceed, as was the case with SQLAlchemy-Migrate in practice.

However, you certainly can reflect the Table and MetaData structures you need in a migration file in order to do data migrations, as long as you don't require support for "offline" mode. As mentioned above, this can be documented.

The operations for schema migrations are composable: you can "add" them together to produce a model. For example, creating an empty table + adding a varchar column + extending its length will result in a table with one varchar column with extended length. AFAIK alembic operations are just functions that produce SQL, and cannot be "added" together, am I right?

pretty much yes. Django has a much nicer system which is not something Alembic can reproduce, unfortunately. This is not to say someone couldn't produce a SQLAlchemy migrations tool that works in this way but that's not something that can happen in Alembic for both architectural reasons as well as the fact that Alembic has very few development resources.

@abetkin
Copy link
Author

abetkin commented Nov 12, 2019

Thank you. I think, it makes sense to say something in the docs about it, but that is up to you entirely. Maybe that's not the first thing needed by sqlalchemy community indeed. Still, I think, you're overestimating the complexity of the design of django migrations, and the assumptions needed for them to work (I've digged into that code deeply enough). The thing I always liked about sqlalchemy - alembic also falls in this category - that it just happens to be extensible without being intended to be so. Thanks a lot for great work!

@zzzeek
Copy link
Member

zzzeek commented Nov 12, 2019

Thank you. I think, it makes sense to say something in the docs about it, but that is up to you entirely. Maybe that's not the first thing needed by sqlalchemy community indeed. Still, I think, you're overestimating the complexity of the design of django migrations, and the assumptions needed for them to work (I've digged into that code deeply enough).

After 15 years of doing this I have found that overestimating the complexity of something never happens. things are always much more complex than they seem. I spend huge amounts of time maintaining use cases for features in SQLAlchemy and Alembic that all seemed incredibly simple to start with. For Django's feature, IIUC a single developer actually got a $20000 kickstarter in order to build the new architecture so this does not seem like it was a quick hit.

The thing I always liked about sqlalchemy - alembic also falls in this category - that it just happens to be extensible without being intended to be so. Thanks a lot for great work!

thanks, although Alembic started out not being very extensible and there were multiple large refactorings ( #302 #306 ) in order to add extensibility that was widely requested.

@zzzeek
Copy link
Member

zzzeek commented Feb 12, 2020

closing this for now.

@zzzeek zzzeek closed this as completed Feb 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question usage and API questions
Projects
None yet
Development

No branches or pull requests

2 participants