New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limitations of declarative approach #5

Closed
shlomi-noach opened this Issue May 15, 2017 · 3 comments

Comments

Projects
None yet
2 participants
@shlomi-noach

shlomi-noach commented May 15, 2017

skeema works in a declarative approach, where the user specifies the desired table structure, as opposed to the alter statement which gets there. skeema itself infers the statement that would modify the table schema from version x to version y.

Illustrated below are two use cases the declarative approach cannot handle, and in fact requires an added insight from the user, namely the very alter statement.

RENAME (CHANGE) COLUMN

Consider this tests from gh-ost:

create table gh_ost_test (
  id int auto_increment,
  c1 int not null,
  c2 int not null,
  primary key (id)
) auto_increment=1;

insert into gh_ost_test ...
alter table gh_ost_test change column c2 c3 int not null

Resulting table:

create table gh_ost_test (
  id int auto_increment,
  c1 int not null,
  c3 int not null,
  primary key (id)
) auto_increment=1;

What would skeema infer? It would infer a DROP COLUMN c2, ADD COLUMN c3 INT NOT NULL.
However that is incorrect and will lose data. The values of c2 must be transferred to c3.

2. DROP+ADD COLUMN

This is the opposite for the RENAME example.

Consider this gh-ost test:

create table gh_ost_test (
  id int auto_increment,
  c1 int null,
  c2 int not null,
  primary key (id)
) auto_increment=1;
alter table gh_ost_test drop column c1, add column c1 int not null default 47

Resulting table:

create table gh_ost_test (
  id int auto_increment,
  c1 int null,
  c2 int not null default 47,
  primary key (id)
) auto_increment=1;

What would skeema infer? It would infer changing of the default, and would keep the values for c2, where it should really drop c2, discarding existing values, recreating c2 with the value 47.


I would suggest at first add these two in Unsupported for ALTERs so that the issue is known.

Otherwise I would recommend a potential add-on of the original alter statement. If exists, skeema would be able to parse it (feel free to use parser.go) and apply the extra insight.

@evanelias

This comment has been minimized.

Show comment
Hide comment
@evanelias

evanelias May 16, 2017

Member

Good point -- lack of rename support is mentioned in passing in a couple docs (examples.md and faq.md) but really should be mentioned explicitly in the Unsupported section. I'll update that momentarily.

In terms of how Skeema currently handles these two cases:

The first case would indeed result in an ALTER doing a DROP COLUMN followed by an ADD COLUMN. But Skeema automatically flags all destructive operations as "unsafe", and will flag them as an error (skipping them and any other alters on same table) unless --allow-unsafe is used. Technically this operation is safe on an empty table, so safe-below-size=1 in config file can also be used to allow such an operation on tables with 0 rows without requiring --allow-unsafe.

Similar situation with renaming tables btw.

The second situation would indeed generate an ALTER changing the default. I'd argue this situation is contrived -- I haven't ever encountered a real situation where a user wanted to remove a column, and then replace it with another column of same name, but without wanting the old data to be present. If a user wants to remove the data from a column, they should use DML for this purpose, not DDL.

Anyway, re: the first situation / renames, current work-around is to do the rename operations on database(s) manually (outside of Skeema), and then use skeema pull to update the filesystem representation to match the out-of-band operations.

In terms of future support for renames: Although http://github.com/skeema/tengo could easily be used to detect the situation where only a name of a col or table changed (simply by comparing the other fields of the relevant Go structs), I'd rather require users to be explicit about wanting renames. I have some thoughts on how this could work, involving table and column comments. Interestingly, Luke from FB independently had the same thought in how to eventually support renames in fb-osc.py (which doesn't support renames yet either, as FB historically has disallowed renames in prod.)

DBAs at several other companies have indicated similar no-renames-in-prod policies, so I'm hesitant to prioritize this too highly. As you're undoubtedly aware, renames present a deploy-order problem at scale: it's impossible to deploy application code at the exact same time as a prod column or table rename in the database takes effect.

Slightly higher priority is the notion of rename-before-dropping: an option to replace drops with renames automatically. For example, if you want to drop column foo, you could configure Skeema to first just rename it to _dropped_foo, and then actually drop it through a different invocation. This can be a safer approach than all-or-nothing drops.

Member

evanelias commented May 16, 2017

Good point -- lack of rename support is mentioned in passing in a couple docs (examples.md and faq.md) but really should be mentioned explicitly in the Unsupported section. I'll update that momentarily.

In terms of how Skeema currently handles these two cases:

The first case would indeed result in an ALTER doing a DROP COLUMN followed by an ADD COLUMN. But Skeema automatically flags all destructive operations as "unsafe", and will flag them as an error (skipping them and any other alters on same table) unless --allow-unsafe is used. Technically this operation is safe on an empty table, so safe-below-size=1 in config file can also be used to allow such an operation on tables with 0 rows without requiring --allow-unsafe.

Similar situation with renaming tables btw.

The second situation would indeed generate an ALTER changing the default. I'd argue this situation is contrived -- I haven't ever encountered a real situation where a user wanted to remove a column, and then replace it with another column of same name, but without wanting the old data to be present. If a user wants to remove the data from a column, they should use DML for this purpose, not DDL.

Anyway, re: the first situation / renames, current work-around is to do the rename operations on database(s) manually (outside of Skeema), and then use skeema pull to update the filesystem representation to match the out-of-band operations.

In terms of future support for renames: Although http://github.com/skeema/tengo could easily be used to detect the situation where only a name of a col or table changed (simply by comparing the other fields of the relevant Go structs), I'd rather require users to be explicit about wanting renames. I have some thoughts on how this could work, involving table and column comments. Interestingly, Luke from FB independently had the same thought in how to eventually support renames in fb-osc.py (which doesn't support renames yet either, as FB historically has disallowed renames in prod.)

DBAs at several other companies have indicated similar no-renames-in-prod policies, so I'm hesitant to prioritize this too highly. As you're undoubtedly aware, renames present a deploy-order problem at scale: it's impossible to deploy application code at the exact same time as a prod column or table rename in the database takes effect.

Slightly higher priority is the notion of rename-before-dropping: an option to replace drops with renames automatically. For example, if you want to drop column foo, you could configure Skeema to first just rename it to _dropped_foo, and then actually drop it through a different invocation. This can be a safer approach than all-or-nothing drops.

@evanelias evanelias closed this in c6d61d0 May 16, 2017

@shlomi-noach

This comment has been minimized.

Show comment
Hide comment
@shlomi-noach

shlomi-noach May 16, 2017

safe-below-size=1

nice!

I haven't ever encountered a real situation where a user wanted to remove a column, and then replace it with another column of same name, but without wanting the old data to be present. If a user wants to

Neither have I, until github/gh-ost#384. I agree it's a rare case.

I also agree on the rename deploy order problem.

shlomi-noach commented May 16, 2017

safe-below-size=1

nice!

I haven't ever encountered a real situation where a user wanted to remove a column, and then replace it with another column of same name, but without wanting the old data to be present. If a user wants to

Neither have I, until github/gh-ost#384. I agree it's a rare case.

I also agree on the rename deploy order problem.

@evanelias

This comment has been minimized.

Show comment
Hide comment
@evanelias

evanelias May 16, 2017

Member

Closed this with the doc update, but opened new issue #6 to track the FR for better rename support.

Member

evanelias commented May 16, 2017

Closed this with the doc update, but opened new issue #6 to track the FR for better rename support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment