Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify column position for add column in batch mode #640

Closed
ms32035 opened this issue Dec 23, 2019 · 11 comments
Closed

Specify column position for add column in batch mode #640

ms32035 opened this issue Dec 23, 2019 · 11 comments
Labels
batch migrations feature motivated volunteers requested

Comments

@ms32035
Copy link
Contributor

@ms32035 ms32035 commented Dec 23, 2019

I'd like to hear your opinion on possibility of setting the position of a column added in a batch migration (something like FIRST or AFTER <COLUMN> in mysql). It could work through an extra parameter in batch add_column

with op.batch_alter_table("some_table") as batch_op:
    batch_op.add_column(Column('foo', Integer), before_column="bar")

My initial estimate is that changes would be required here: https://github.com/sqlalchemy/alembic/blob/master/alembic/operations/batch.py#L374 and in OrderedDict in sqlalchemy to support inserts

@zzzeek zzzeek added use case mysql labels Dec 23, 2019
@zzzeek
Copy link
Member

@zzzeek zzzeek commented Dec 23, 2019

Hi there -

I re-read this a few times and it's not clear if you're looking for MySQL's directive, which is easy to add, or you are looking for a general utility to rewrite the columns of a table in a different order specific to batch. the former is easy, the latter I don't know if i'd want to add that and it seems like more of a recipe thing.

@ms32035
Copy link
Contributor Author

@ms32035 ms32035 commented Dec 23, 2019

Closer to the latter. MySQL is the only db that support this syntax and has an option to specify column position. In other dbs you have to work this around with pretty much the same pattern what the batch add_column does (create a temp table, insert data, swap tables). The only difference is that the new column is not appended at the end but inserted at a specific position. I'm not looking for a general column reordering tool, though I guess it might be possible to use the proposed functionality for that purpose too.

@zzzeek zzzeek removed the mysql label Dec 23, 2019
@zzzeek
Copy link
Member

@zzzeek zzzeek commented Dec 23, 2019

Here's the way I'd want to do it, however some internal mechanics of the insert from select are causing it to fail and I don't have the resources to figure out why right now. If someone has the resources to work on this, the general idea is to use topological sort and to reorder the columns when we're ready to make the table.

diff --git a/alembic/operations/base.py b/alembic/operations/base.py
index a4adf15..fadd554 100644
--- a/alembic/operations/base.py
+++ b/alembic/operations/base.py
@@ -177,6 +177,7 @@ class Operations(util.ModuleClsProxy):
         table_name,
         schema=None,
         recreate="auto",
+        partial_reordering=None,
         copy_from=None,
         table_args=(),
         table_kwargs=util.immutabledict(),
@@ -319,6 +320,7 @@ class Operations(util.ModuleClsProxy):
             reflect_args,
             reflect_kwargs,
             naming_convention,
+            partial_reordering
         )
         batch_op = BatchOperations(self.migration_context, impl=impl)
         yield batch_op
diff --git a/alembic/operations/batch.py b/alembic/operations/batch.py
index 42db905..da92bef 100644
--- a/alembic/operations/batch.py
+++ b/alembic/operations/batch.py
@@ -11,6 +11,7 @@ from sqlalchemy import Table
 from sqlalchemy import types as sqltypes
 from sqlalchemy.events import SchemaEventTarget
 from sqlalchemy.util import OrderedDict
+from sqlalchemy.util import topological
 
 from ..util.sqla_compat import _columns_for_constraint
 from ..util.sqla_compat import _fk_is_self_referential
@@ -31,6 +32,7 @@ class BatchOperationsImpl(object):
         reflect_args,
         reflect_kwargs,
         naming_convention,
+        partial_reordering,
     ):
         self.operations = operations
         self.table_name = table_name
@@ -52,6 +54,7 @@ class BatchOperationsImpl(object):
             ("column_reflect", operations.impl.autogen_column_reflect)
         )
         self.naming_convention = naming_convention
+        self.partial_reordering = partial_reordering
         self.batch = []
 
     @property
@@ -99,7 +102,11 @@ class BatchOperationsImpl(object):
                 reflected = True
 
             batch_impl = ApplyBatchImpl(
-                existing_table, self.table_args, self.table_kwargs, reflected
+                existing_table,
+                self.table_args,
+                self.table_kwargs,
+                reflected,
+                partial_reordering=self.partial_reordering,
             )
             for opname, arg, kw in self.batch:
                 fn = getattr(batch_impl, opname)
@@ -139,12 +146,16 @@ class BatchOperationsImpl(object):
 
 
 class ApplyBatchImpl(object):
-    def __init__(self, table, table_args, table_kwargs, reflected):
+    def __init__(
+        self, table, table_args, table_kwargs, reflected, partial_reordering=()
+    ):
         self.table = table  # this is a Table object
         self.table_args = table_args
         self.table_kwargs = table_kwargs
         self.temp_table_name = self._calc_temp_name(table.name)
         self.new_table = None
+
+        self.partial_reordering = partial_reordering  # list of tuples
         self.column_transfers = OrderedDict(
             (c.name, {"expr": c}) for c in self.table.c
         )
@@ -188,12 +199,37 @@ class ApplyBatchImpl(object):
         for k in self.table.kwargs:
             self.table_kwargs.setdefault(k, self.table.kwargs[k])
 
+    def _adjust_self_columns_for_partial_reordering(self):
+        pairs = set()
+        for tuple_ in self.partial_reordering:
+            for index, elem in enumerate(tuple_):
+                if index > 0:
+                    pairs.add((tuple_[index - 1], elem))
+
+        col_by_idx = list(self.columns)
+
+        sorted_ = topological.sort(pairs, col_by_idx, deterministic_order=True)
+        self.columns = OrderedDict(
+            (k, self.columns[k]) for k in sorted_
+        )
+        self.dest_column_transfers = self.column_transfers
+
+        # TODO: breaks things, dont know why
+        #OrderedDict(
+        #    (k, self.column_transfers[k]) for k in sorted_
+        #)
+
     def _transfer_elements_to_new_table(self):
         assert self.new_table is None, "Can only create new table once"
 
         m = MetaData()
         schema = self.table.schema
 
+        if self.partial_reordering:
+            self._adjust_self_columns_for_partial_reordering()
+        else:
+            self.dest_column_transfers = self.column_transfers
+
         self.new_table = new_table = Table(
             self.temp_table_name,
             m,
@@ -291,7 +327,7 @@ class ApplyBatchImpl(object):
                 self.new_table.insert(inline=True).from_select(
                     list(
                         k
-                        for k, transfer in self.column_transfers.items()
+                        for k, transfer in self.dest_column_transfers.items()
                         if "expr" in transfer
                     ),
                     select(
@@ -371,11 +407,13 @@ class ApplyBatchImpl(object):
         if autoincrement is not None:
             existing.autoincrement = bool(autoincrement)
 
-    def add_column(self, table_name, column, **kw):
+    def add_column(self, table_name, column, partial_ordering=None, **kw):
         # we copy the column because operations.add_column()
         # gives us a Column that is part of a Table already.
         self.columns[column.name] = column.copy(schema=self.table.schema)
         self.column_transfers[column.name] = {}
+        if partial_ordering:
+            self.partial_reordering += (partial_ordering,)
 
     def drop_column(self, table_name, column, **kw):
         if column.name in self.table.primary_key.columns:
diff --git a/tests/test_batch.py b/tests/test_batch.py
index 1d76841..0eab512 100644
--- a/tests/test_batch.py
+++ b/tests/test_batch.py
@@ -41,7 +41,7 @@ class BatchApplyTest(TestBase):
     def setUp(self):
         self.op = Operations(mock.Mock(opts={}))
 
-    def _simple_fixture(self, table_args=(), table_kwargs={}):
+    def _simple_fixture(self, table_args=(), table_kwargs={}, **kw):
         m = MetaData()
         t = Table(
             "tname",
@@ -50,7 +50,7 @@ class BatchApplyTest(TestBase):
             Column("x", String(10)),
             Column("y", Integer),
         )
-        return ApplyBatchImpl(t, table_args, table_kwargs, False)
+        return ApplyBatchImpl(t, table_args, table_kwargs, False, **kw)
 
     def _uq_fixture(self, table_args=(), table_kwargs={}):
         m = MetaData()
@@ -466,6 +466,22 @@ class BatchApplyTest(TestBase):
         new_table = self._assert_impl(impl, colnames=["id", "x", "y", "g"])
         eq_(new_table.c.g.name, "g")
 
+    def test_add_col_partial_ordering(self):
+        impl = self._simple_fixture()
+        col = Column("g", Integer)
+        # operations.add_column produces a table
+        t = self.op.schema_obj.table("tname", col)  # noqa
+        impl.add_column("tname", col, partial_ordering=("x", "g", "y"))
+        new_table = self._assert_impl(impl, colnames=["id", "x", "g", "y"])
+        eq_(new_table.c.g.name, "g")
+
+    # TODO: doesn't work, don't know why
+    def test_partial_reordering(self):
+        # MARKMARK
+        impl = self._simple_fixture(partial_reordering=[("x", "id", "y")])
+        new_table = self._assert_impl(impl, colnames=["x", "id", "y"])
+        eq_(new_table.c.g.name, "g")
+
     def test_add_server_default(self):
         impl = self._simple_fixture()
         impl.alter_column("tname", "y", server_default="10")

@zzzeek zzzeek added feature batch migrations motivated volunteers requested and removed use case labels Dec 23, 2019
@ms32035
Copy link
Contributor Author

@ms32035 ms32035 commented Dec 23, 2019

Not really sure how this commented out piece looked like

+        # TODO: breaks things, dont know why
+        #OrderedDict(
+        #    (k, self.column_transfers[k]) for k in sorted_
+        #)

but my guess is that when trying to set column_transfer you might have iterated over sorted_ generator for the second time

@zzzeek
Copy link
Member

@zzzeek zzzeek commented Dec 24, 2019

yes that was an assignment.

the batch.py module is a little tricky, I don't work on it much and usually have to re-familiarize for awhile when something needs to change. Also the general subject of dependency sorting requires some getting used to. I'm not totally sure it's feasible how I'm proposing the feature; one can specify any combination of columns that suggests part of the ordering, but then this might imply that it's not possible to maintain the order of the rest of the columns, so not really sure. it is pretty straightforward to get it to work for new columns being added in however. I can look at this more directly after the holidays.

@zzzeek
Copy link
Member

@zzzeek zzzeek commented Dec 24, 2019

specifically it was this:

self.dest_column_transfers =  OrderedDict(
            (k, self.column_transfers[k]) for k in sorted_
       )

this is based on the guess that the "insert from select" needs the column ordering for the old and new tables at the same time but I'm not sure if that's correct.

@ms32035
Copy link
Contributor Author

@ms32035 ms32035 commented Dec 24, 2019

This is where it fails

 -  'INSERT INTO _alembic_tmp_tname (id, x, y) SELECT tname.id, tname.x, tname.y '
  ?                                       ---                    ---------
  +  'INSERT INTO _alembic_tmp_tname (x, id, y) SELECT tname.x, tname.id, tname.y '

but it my opinion you don't need the self.dest_column_transfers object at all as both queries are fully valid SQL and do exactly the same, just the test needs fixing

@ms32035
Copy link
Contributor Author

@ms32035 ms32035 commented Jan 6, 2020

@zzzeek did you have a chance to think about my last comment?

@zzzeek
Copy link
Member

@zzzeek zzzeek commented Jan 6, 2020

not really I was hoping someone had time to look into making a working version of this

@ms32035
Copy link
Contributor Author

@ms32035 ms32035 commented Jan 23, 2020

@zzzeek looks like I've fixed the tests (#646) please have a look if this is still reasonable and let me know what else needs to be updated

@sqla-tester
Copy link
Collaborator

@sqla-tester sqla-tester commented Jan 31, 2020

Marcin Szymanski has proposed a fix for this issue in the master branch:

Support explicit column ordering in batch mode https://gerrit.sqlalchemy.org/1698

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
batch migrations feature motivated volunteers requested
Projects
None yet
Development

No branches or pull requests

3 participants