table.transform() method for advanced alter table #114
Comments
Idea: use a chained API to define a complex transition and then execute it all at once. For example:
    db["mytable"].transform().rename("col1", "col_1") \
        .change_type("col1", float) \
        .execute()
Ideally this would all happen in a single transaction, such that other processes talking to the database would not see any inconsistent state while the table copy was taking place. Need to confirm that this is possible. Also refs transactions thoughts in #121.
I'm not so keen on that chained API - it's pretty complicated. Here's an idea for a much simpler interface. Essentially it lets you say "take table X and migrate its contents to a new table with this structure - then atomically rename the tables to switch them":
    db["mytable"].migrate_table({"id": int, "name": str}, pk="id")
Reference: sqlite-utils/sqlite_utils/db.py, lines 615 to 625 in a236a6b
Alternative possible names:
I'm torn between
Since neither the term "transform" nor "migrate" is used in the codebase at the moment, I think I'll go with
Don't forget this step:
Thinking about the method signature:
    def transform_table(
        self,
        columns,
        pk=None,
        foreign_keys=None,
        column_order=None,
        not_null=None,
        defaults=None,
        hash_id=None,
        extracts=None,
    ):
This requires the caller to provide the exact set of columns for the new table. It would be useful if this was optional - if you could omit the columns and have it automatically use the previous columns. This would let you change things like the primary key or the column order using the other arguments.
Even better: allow column renaming using an optional rename argument:
    db["dogs"].transform_table(rename={"name": "dog_name"})
I can have a convenient
Work in progress: not quite right yet, I need smarter logic for how renamed columns are reflected in the generated
    import os

    def transform_table(
        self,
        columns=None,
        rename=None,
        change_type=None,
        pk=None,
        foreign_keys=None,
        column_order=None,
        not_null=None,
        defaults=None,
        hash_id=None,
        extracts=None,
    ):
        assert self.exists(), "Cannot transform a table that doesn't exist yet"
        # Treat a missing rename / change_type as empty so the .get() calls below are safe
        rename = rename or {}
        change_type = change_type or {}
        columns = columns or self.columns_dict
        # Apply type changes and renames, both keyed by the old column name
        columns = {
            rename.get(key, key): change_type.get(key, value)
            for key, value in columns.items()
        }
        new_table_name = "{}_new_{}".format(self.name, os.urandom(6).hex())
        previous_columns = set(self.columns_dict.keys())
        with self.db.conn:
            new_table = self.db.create_table(
                new_table_name,
                columns,
                pk=pk,
                foreign_keys=foreign_keys,
                column_order=column_order,
                not_null=not_null,
                defaults=defaults,
                hash_id=hash_id,
                extracts=extracts,
            )
            # Copy across data - but only for columns that exist in both.
            # Known gap: a renamed column drops out of this intersection,
            # so its data is not copied yet.
            new_columns = set(columns.keys())
            columns_to_copy = new_columns.intersection(previous_columns)
            copy_sql = "INSERT INTO [{new_table}] ({new_cols}) SELECT {old_cols} FROM [{old_table}]".format(
                new_table=new_table_name,
                old_table=self.name,
                old_cols=", ".join("[{}]".format(col) for col in columns_to_copy),
                new_cols=", ".join("[{}]".format(rename.get(col, col)) for col in columns_to_copy),
            )
            self.db.conn.execute(copy_sql)
            # Drop the old table
            self.db.conn.execute("DROP TABLE [{}]".format(self.name))
            # Rename the new one
            self.db.conn.execute(
                "ALTER TABLE [{}] RENAME TO [{}]".format(new_table_name, self.name)
            )
        return self
According to https://www.sqlite.org/lang_altertable.html#making_other_kinds_of_table_schema_changes the hardest bits to consider are how to deal with existing foreign key relationships, triggers and views. I'm OK leaving views as an exercise for the caller - many of these transformations may not need any view changes at all. Foreign key relationships are important: it should handle these automatically as effectively as possible. Likewise trigger changes: need to think about what this means.
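A rough sketch of how the affected objects could be discovered before a rebuild, using only the standard `sqlite3` module. The `dependent_objects` helper and the example schema are hypothetical, and the view detection is a crude text match rather than real SQL parsing.

```python
import sqlite3


def dependent_objects(conn, table):
    """Hypothetical helper: find objects that a table rebuild may break."""
    triggers = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'trigger' AND tbl_name = ?",
            (table,),
        )
    ]
    # Views are not linked to their source tables in sqlite_master,
    # so this falls back to a crude text match on the view's SQL.
    views = [
        name
        for name, sql in conn.execute(
            "SELECT name, sql FROM sqlite_master WHERE type = 'view'"
        )
        if table.lower() in sql.lower()
    ]
    # Incoming foreign keys: scan every table's foreign_key_list.
    tables = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    incoming = []
    for other in tables:
        for fk in conn.execute("PRAGMA foreign_key_list([{}])".format(other)):
            # fk is (id, seq, table, from, to, on_update, on_delete, match)
            if fk[2] == table:
                incoming.append((other, fk[3], fk[4]))
    return {"triggers": triggers, "views": views, "incoming_fks": incoming}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE reviews (id INTEGER PRIMARY KEY, book_id INTEGER REFERENCES books(id))"
)
conn.execute("CREATE VIEW book_titles AS SELECT title FROM books")
conn.execute(
    "CREATE TRIGGER books_ai AFTER INSERT ON books "
    "BEGIN UPDATE books SET title = trim(title) WHERE id = new.id; END"
)
found = dependent_objects(conn, "books")
```

The trigger and view SQL stored in `sqlite_master` could then be replayed after the rename, which is roughly what the SQLite documentation suggests.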
I'm going to add a second method that returns the generated SQL instead of executing it. Advanced callers can use this to include their own additional steps in the same transaction - e.g. recreating views or triggers. More importantly it gives me a useful hook for writing some unit tests against the generated SQL.
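To illustrate the idea of separating SQL generation from execution, here is a minimal sketch. The function name `build_transform_sql`, its parameters and the `_new` suffix are all made up for this example and are not the sqlite-utils API.

```python
import sqlite3


def build_transform_sql(table, new_columns, copy_columns):
    """Hypothetical helper: return the statements for a table rebuild.

    new_columns maps new column name -> SQL type.
    copy_columns maps old column name -> new column name.
    """
    tmp = table + "_new"
    defs = ", ".join("[{}] {}".format(n, t) for n, t in new_columns.items())
    old_cols = ", ".join("[{}]".format(old) for old in copy_columns)
    new_cols = ", ".join("[{}]".format(new) for new in copy_columns.values())
    return [
        "CREATE TABLE [{}] ({})".format(tmp, defs),
        "INSERT INTO [{}] ({}) SELECT {} FROM [{}]".format(tmp, new_cols, old_cols, table),
        "DROP TABLE [{}]".format(table),
        "ALTER TABLE [{}] RENAME TO [{}]".format(tmp, table),
    ]


statements = build_transform_sql(
    "dogs",
    {"id": "INTEGER", "dog_name": "TEXT"},
    {"id": "id", "name": "dog_name"},
)

# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN / COMMIT below control the transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE dogs (id INTEGER, name TEXT)")
conn.execute("INSERT INTO dogs VALUES (1, 'Cleo')")
conn.execute("BEGIN")
# The caller adds its own step (a view) to the same transaction.
for sql in statements + ["CREATE VIEW dog_names AS SELECT dog_name FROM dogs"]:
    conn.execute(sql)
conn.execute("COMMIT")
names = [row[0] for row in conn.execute("SELECT dog_name FROM dog_names")]
```

Because the statements are plain strings, a unit test can compare them directly without touching a database.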
Work in progress in
I've decided to call this
I'm rethinking the API design now. Maybe it could look like this:
To change the type of the author_id column to an integer:
    books.transform({"author_id": int})
This would leave the existing columns alone, but would change the type of this column.
To rename author_id to author_identifier:
    books.transform(rename={"author_id": "author_identifier"})
To drop a column:
    books.transform(drop=["author_id"])
Since the parameters all operate on columns they don't need to be called
I'm going to sketch out a prototype of this new API design in that branch.
For FTS tables associated with the table that is being transformed, should I automatically drop the old FTS table and recreate it against the new one, or will it just magically continue to work after the table is renamed?
I may need to do something special for
To expand on what that first argument - the
Any columns omitted from the Any new columns are added (at the end of the table):
Any columns that have their type changed will have their type changed:
Should I also re-order columns if the order doesn't match? I think so. Open question as to what happens to columns that aren't mentioned at all in the dictionary though - what order should they go in?
If you want to both change the type of a column AND rename it in the same operation, how would you do that? I think like this:
    table.transform({"age": int}, rename={"age": "dog_age"})
So any rename logic is applied at the end, after the type transformation or re-ordering logic.
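The ordering rule (types first, renames last, both keyed by the old column name) can be sketched as a small pure function. `plan_columns` is a hypothetical name used only for illustration; it also assumes that unmentioned columns keep their position and that new columns go at the end.

```python
def plan_columns(existing, types=None, rename=None, drop=None):
    """Hypothetical helper: work out the new {column: type} mapping.

    Type changes and drops are looked up by the OLD column name;
    the rename is applied last.
    """
    types = types or {}
    rename = rename or {}
    drop = set(drop or [])
    planned = {}
    for name, col_type in existing.items():
        if name in drop:
            continue
        planned[rename.get(name, name)] = types.get(name, col_type)
    # Columns in `types` that do not exist yet are new: add them at the end.
    for name, col_type in types.items():
        if name not in existing:
            planned[rename.get(name, name)] = col_type
    return planned


existing = {"id": int, "name": str, "age": str}
planned = plan_columns(existing, {"age": int}, rename={"age": "dog_age"})
```

Here `planned` keeps `id` and `name` untouched and ends with `dog_age` typed as `int`.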
Since I have a
Does it make sense to support the pk= argument? If the user requests a primary key that doesn't make sense, I think an integrity error will be raised when the SQL is being executed, which should hopefully cancel the transaction and raise an error. Need to check that this is what happens.
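One way to check this behaviour with the standard `sqlite3` module is sketched below. It assumes explicit `BEGIN` / `ROLLBACK` on an autocommit connection rather than whatever transaction handling sqlite-utils ends up using, and `rebuild_with_pk` is a made-up helper.

```python
import sqlite3

# isolation_level=None: autocommit mode, transactions are managed by hand.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE dogs (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO dogs VALUES (?, ?)",
    [(1, "Cleo"), (2, "Pancakes"), (3, "Cleo")],
)


def rebuild_with_pk(conn, pk):
    """Hypothetical helper: rebuild dogs with `pk` as the primary key."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "CREATE TABLE dogs_new (id INTEGER, name TEXT, PRIMARY KEY ([{}]))".format(pk)
        )
        conn.execute("INSERT INTO dogs_new SELECT id, name FROM dogs")
        conn.execute("DROP TABLE dogs")
        conn.execute("ALTER TABLE dogs_new RENAME TO dogs")
        conn.execute("COMMIT")
    except sqlite3.IntegrityError:
        # Undo everything, including the CREATE TABLE
        conn.execute("ROLLBACK")
        raise


try:
    # "name" contains a duplicate, so it cannot be a primary key
    rebuild_with_pk(conn, "name")
    failed = False
except sqlite3.IntegrityError:
    failed = True

tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
count = conn.execute("SELECT count(*) FROM dogs").fetchone()[0]
```

After the failed attempt the original table is still in place and the temporary table is gone, since the rollback also undoes the `CREATE TABLE`.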
A test that confirms that this mechanism can turn a
I think the fiddliest part of the implementation here is code that takes the existing
This logic probably also needs to return a structure that can be used to build the
The reason I'm working on this now is that I'd like to support many more options for data cleanup in the Datasette ecosystem - so being able to do things like convert the type of existing columns becomes increasingly important.
SQLite's ALTER TABLE can only do the following: rename a table, rename a column, or add a new column to an existing table.
Notably, it cannot drop columns - so tricks like "add a float version of this text column, populate it, then drop the old one and rename" won't work.
The docs here https://www.sqlite.org/lang_altertable.html#making_other_kinds_of_table_schema_changes describe a way of implementing full alters safely within a transaction, but it's fiddly.
It would be great if sqlite-utils provided an abstraction to help make these kinds of changes safely.
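For reference, the core of the create, copy, drop and rename pattern looks roughly like this with the standard `sqlite3` module. This is a simplified sketch: `change_column_type` is a hypothetical helper that handles only single-column primary keys and ignores NOT NULL constraints, defaults, indexes, triggers, views and foreign keys.

```python
import sqlite3


def change_column_type(conn, table, column, new_type):
    """Hypothetical helper: rebuild `table` with `column` declared as `new_type`."""
    info = conn.execute("PRAGMA table_info([{}])".format(table)).fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk)
    defs = []
    for _cid, name, col_type, _notnull, _default, pk in info:
        declared = new_type if name == column else col_type
        defs.append("[{}] {}{}".format(name, declared, " PRIMARY KEY" if pk else ""))
    col_list = ", ".join("[{}]".format(row[1]) for row in info)
    tmp = "{}_new".format(table)
    conn.execute("BEGIN")
    try:
        conn.execute("CREATE TABLE [{}] ({})".format(tmp, ", ".join(defs)))
        conn.execute(
            "INSERT INTO [{}] ({}) SELECT {} FROM [{}]".format(tmp, col_list, col_list, table)
        )
        conn.execute("DROP TABLE [{}]".format(table))
        conn.execute("ALTER TABLE [{}] RENAME TO [{}]".format(tmp, table))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


# Autocommit mode so the explicit BEGIN / COMMIT above define the transaction
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE dogs (id INTEGER PRIMARY KEY, name TEXT, age TEXT)")
conn.execute("INSERT INTO dogs VALUES (1, 'Cleo', '5')")
change_column_type(conn, "dogs", "age", "INTEGER")
row = conn.execute("SELECT id, name, age, typeof(age) FROM dogs").fetchone()
schema = conn.execute("SELECT sql FROM sqlite_master WHERE name = 'dogs'").fetchone()[0]
```

The text value `'5'` is converted to an integer during the copy because the new column has INTEGER affinity.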