Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Testing and clarifying shared unique key requirement #358

Merged
merged 6 commits into from Jan 22, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
13 changes: 7 additions & 6 deletions doc/requirements-and-limitations.md
Expand Up @@ -30,11 +30,12 @@ The `SUPER` privilege is required for `STOP SLAVE`, `START SLAVE` operations. Th

- MySQL 5.7 `JSON` columns are not supported. They are likely to be supported shortly.

- The two _before_ & _after_ tables must share some `UNIQUE KEY`. Such key would be used by `gh-ost` to iterate the table.
- As an example, if your table has a single `UNIQUE KEY` and no `PRIMARY KEY`, and you wish to replace it with a `PRIMARY KEY`, you will need two migrations: one to add the `PRIMARY KEY` (this migration will use the existing `UNIQUE KEY`), another to drop the now redundant `UNIQUE KEY` (this migration will use the `PRIMARY KEY`).

- The chosen migration key must not include columns with `NULL` values.
- `gh-ost` will do its best to pick a migration key with non-nullable columns. It will by default refuse a migration where the only possible `UNIQUE KEY` includes nullable-columns. You may override this refusal via `--allow-nullable-unique-key` but **you must** be sure there are no actual `NULL` values in those columns. Such `NULL` values would cause a data integrity problem and potentially a corrupted migration.
- The two _before_ & _after_ tables must share a `PRIMARY KEY` or other `UNIQUE KEY`. This key will be used by `gh-ost` to iterate through the table rows when copying. [Read more](shared-key.md)
- The migration key must not include columns with NULL values. This means either:
1. The columns are `NOT NULL`, or
2. The columns are nullable but don't contain any NULL values.
- by default, `gh-ost` will not run if the only `UNIQUE KEY` includes nullable columns.
- You may override this via `--allow-nullable-unique-key` but make sure there are no actual `NULL` values in those columns. Existing NULL values can't guarantee data integrity on the migrated table.

- It is not allowed to migrate a table where another table exists with same name and different upper/lower case.
- For example, you may not migrate `MyTable` if another table called `MYtable` exists in the same schema.
Expand All @@ -48,4 +49,4 @@ The `SUPER` privilege is required for `STOP SLAVE`, `START SLAVE` operations. Th

- If you have en `enum` field as part of your migration key (typically the `PRIMARY KEY`), migration performance will be degraded and potentially bad. [Read more](https://github.com/github/gh-ost/pull/277#issuecomment-254811520)

- Migrating a `FEDERATED` table is unsupported and is irrelevant to the problem `gh-ost` tackles.
- Migrating a `FEDERATED` table is unsupported and is irrelevant to the problem `gh-ost` tackles.
68 changes: 68 additions & 0 deletions doc/shared-key.md
@@ -0,0 +1,68 @@
# Shared key

A requirement for a migration to run is that the two _before_ and _after_ tables have a shared unique key. This is to elaborate and illustrate on the matter.

### Introduction

Consider a classic, simple migration. The table is any normal:

```
CREATE TABLE tbl (
id bigint unsigned not null auto_increment,
data varchar(255),
more_data int,
PRIMARY KEY(id)
)
```

And the migration is a simple `add column ts timestamp`.

In such migration there is no change in indexes, and in particular no change to any unique key, and specifically no change to the `PRIMARY KEY`. To run this migration, `gh-ost` would iterate the `tbl` table using the primary key, copy rows from `tbl` to the _ghost_ table `_tbl_gho` by order of `id`, and then apply binlog events onto `_tbl_gho`.

Applying the binlog events assumes the existence of a shared unique key. For example, an `UPDATE` statement in the binary log translate to a `REPLACE` statement which `gh-ost` applies to the _ghost_ table. Such statement expects to add or replace an existing row based on given row data. In particular, it would _replace_ an existing row if a unique key violation is met.

So `gh-ost` correlates `tbl` and `_tbl_gho` rows using a unique key. In the above example that would be the `PRIMARY KEY`.

### Rules

There must be a shared set of not-null columns for which there is a unique constraint in both the original table and the migration (_ghost_) table.

### Interpreting the rules

The same columns must be covered by a unique key in both tables. This doesn't have to be the `PRIMARY KEY`. This doesn't have to be a key of the same name.

Upon migration, `gh-ost` inspects both the original and _ghost_ table and attempts to find at least one such unique key (or rather, a set of columns) that is shared between the two. Typically this would just be the `PRIMARY KEY`, but sometimes you may change the `PRIMARY KEY` itself, in which case `gh-ost` will look for other options.

`gh-ost` expects unique keys where no `NULL` values are found, i.e. all columns covered by the unique key are defined as `NOT NULL`. This is implicitly true for `PRIMARY KEY`s. If no such key can be found, `gh-ost` bails out. In the event there is no such key, but you happen to _know_ your columns have no `NULL` values even though they're `NULL`-able, you may take responsibility and pass the `--allow-nullable-unique-key`. The migration will run well as long as no `NULL` values are found in the unique key's columns. Any actual `NULL`s may corrupt the migration.

### Examples: allowed and not allowed

```
create table some_table (
id int auto_increment,
ts timestamp,
name varchar(128) not null,
owner_id int not null,
loc_id int,
primary key(id),
unique key name_uidx(name)
)
```

Following are examples of migrations that are _good to run_:

- `add column i int`
- `add key owner_idx(owner_id)`
- `add unique key owner_name_idx(owner_id, name)` - though you need to make sure to not write conflicting rows while this migration runs
- `drop key name_uidx` - `primary key` is shared between the tables
- `drop primary key, add primary key(owner_id, loc_id)` - `name_uidx` is shared between the tables and is used for migration
- `change id bigint unsigned` - the `'primary key` is used. The change of type still makes the `primary key` workable.
- `drop primary key, drop key name_uidx, create primary key(name), create unique key id_uidx(id)` - swapping the two keys. `gh-ost` is still happy because `id` is still unique in both tables. So is `name`.


Following are examples of migrations that _cannot run_:

- `drop primary key, drop key name_uidx` - no unique key to _ghost_ table, so clearly cannot run
- `drop primary key, drop key name_uidx, create primary key(name, owner_id)` - no shared columns to both tables. Even though `name` exists in the _ghost_ table's `primary key`, it is only part of the key and in itself does not guarantee uniqueness in the _ghost_ table.

Also, you cannot run a migration on a table that doesn't have some form of `unique key` in the first place, such as `some_table (id int, ts timestamp)`
22 changes: 22 additions & 0 deletions localtests/fail-drop-pk/create.sql
@@ -0,0 +1,22 @@
drop table if exists gh_ost_test;
create table gh_ost_test (
id int auto_increment,
i int not null,
ts timestamp,
primary key(id)
) auto_increment=1;

drop event if exists gh_ost_test;
delimiter ;;
create event gh_ost_test
on schedule every 1 second
starts current_timestamp
ends current_timestamp + interval 60 second
on completion not preserve
enable
do
begin
insert into gh_ost_test values (null, 11, now());
insert into gh_ost_test values (null, 13, now());
insert into gh_ost_test values (null, 17, now());
end ;;
1 change: 1 addition & 0 deletions localtests/fail-drop-pk/expect_failure
@@ -0,0 +1 @@
No PRIMARY nor UNIQUE key found in table
1 change: 1 addition & 0 deletions localtests/fail-drop-pk/extra_args
@@ -0,0 +1 @@
--alter="change id id int, drop primary key"
22 changes: 22 additions & 0 deletions localtests/fail-no-shared-uk/create.sql
@@ -0,0 +1,22 @@
drop table if exists gh_ost_test;
create table gh_ost_test (
id int auto_increment,
i int not null,
ts timestamp,
primary key(id)
) auto_increment=1;

drop event if exists gh_ost_test;
delimiter ;;
create event gh_ost_test
on schedule every 1 second
starts current_timestamp
ends current_timestamp + interval 60 second
on completion not preserve
enable
do
begin
insert into gh_ost_test values (null, 11, now());
insert into gh_ost_test values (null, 13, now());
insert into gh_ost_test values (null, 17, now());
end ;;
1 change: 1 addition & 0 deletions localtests/fail-no-shared-uk/expect_failure
@@ -0,0 +1 @@
No shared unique key can be found after ALTER
1 change: 1 addition & 0 deletions localtests/fail-no-shared-uk/extra_args
@@ -0,0 +1 @@
--alter="drop primary key, add primary key (id, i)"
24 changes: 24 additions & 0 deletions localtests/swap-pk-uk/create.sql
@@ -0,0 +1,24 @@
drop table if exists gh_ost_test;
create table gh_ost_test (
id bigint,
i int not null,
ts timestamp(6),
primary key(id),
unique key its_uidx(i, ts)
) ;

drop event if exists gh_ost_test;
delimiter ;;
create event gh_ost_test
on schedule every 1 second
starts current_timestamp
ends current_timestamp + interval 60 second
on completion not preserve
enable
do
begin
insert into gh_ost_test values ((unix_timestamp() << 2) + 0, 11, now(6));
insert into gh_ost_test values ((unix_timestamp() << 2) + 1, 13, now(6));
insert into gh_ost_test values ((unix_timestamp() << 2) + 2, 17, now(6));
insert into gh_ost_test values ((unix_timestamp() << 2) + 3, 19, now(6));
end ;;
1 change: 1 addition & 0 deletions localtests/swap-pk-uk/extra_args
@@ -0,0 +1 @@
--alter="drop primary key, drop key its_uidx, add primary key (i, ts), add unique key id_uidx(id)"
1 change: 1 addition & 0 deletions localtests/swap-pk-uk/order_by
@@ -0,0 +1 @@
id
24 changes: 24 additions & 0 deletions localtests/swap-uk-uk/create.sql
@@ -0,0 +1,24 @@
drop table if exists gh_ost_test;
create table gh_ost_test (
id bigint,
i int not null,
ts timestamp(6),
unique key id_uidx(id),
unique key its_uidx(i, ts)
) ;

drop event if exists gh_ost_test;
delimiter ;;
create event gh_ost_test
on schedule every 1 second
starts current_timestamp
ends current_timestamp + interval 60 second
on completion not preserve
enable
do
begin
insert into gh_ost_test values ((unix_timestamp() << 2) + 0, 11, now(6));
insert into gh_ost_test values ((unix_timestamp() << 2) + 1, 13, now(6));
insert into gh_ost_test values ((unix_timestamp() << 2) + 2, 17, now(6));
insert into gh_ost_test values ((unix_timestamp() << 2) + 3, 19, now(6));
end ;;
1 change: 1 addition & 0 deletions localtests/swap-uk-uk/extra_args
@@ -0,0 +1 @@
--alter="drop key id_uidx, drop key its_uidx, add unique key its2_uidx(i, ts), add unique key id2_uidx(id)"
1 change: 1 addition & 0 deletions localtests/swap-uk-uk/order_by
@@ -0,0 +1 @@
id
22 changes: 22 additions & 0 deletions localtests/swap-uk/create.sql
@@ -0,0 +1,22 @@
drop table if exists gh_ost_test;
create table gh_ost_test (
id int auto_increment,
i int not null,
ts timestamp,
primary key(id)
) auto_increment=1;

drop event if exists gh_ost_test;
delimiter ;;
create event gh_ost_test
on schedule every 1 second
starts current_timestamp
ends current_timestamp + interval 60 second
on completion not preserve
enable
do
begin
insert into gh_ost_test values (null, 11, now());
insert into gh_ost_test values (null, 13, now());
insert into gh_ost_test values (null, 17, now());
end ;;
1 change: 1 addition & 0 deletions localtests/swap-uk/extra_args
@@ -0,0 +1 @@
--alter="drop primary key, add unique key(id)"
10 changes: 7 additions & 3 deletions localtests/test.sh
Expand Up @@ -49,7 +49,7 @@ test_single() {
echo -n "Testing: $test_name"

echo_dot
gh-ost-test-mysql-replica -e "start slave"
gh-ost-test-mysql-replica -e "stop slave; start slave; do sleep(1)"
echo_dot
gh-ost-test-mysql-master --default-character-set=utf8mb4 test < $tests_path/$test_name/create.sql

Expand All @@ -59,12 +59,16 @@ test_single() {
fi
orig_columns="*"
ghost_columns="*"
order_by=""
if [ -f $tests_path/$test_name/orig_columns ] ; then
orig_columns=$(cat $tests_path/$test_name/orig_columns)
fi
if [ -f $tests_path/$test_name/ghost_columns ] ; then
ghost_columns=$(cat $tests_path/$test_name/ghost_columns)
fi
if [ -f $tests_path/$test_name/order_by ] ; then
order_by="order by $(cat $tests_path/$test_name/order_by)"
fi
# graceful sleep for replica to catch up
echo_dot
sleep 1
Expand Down Expand Up @@ -129,8 +133,8 @@ test_single() {
fi

echo_dot
orig_checksum=$(gh-ost-test-mysql-replica --default-character-set=utf8mb4 test -e "select ${orig_columns} from gh_ost_test" -ss | md5sum)
ghost_checksum=$(gh-ost-test-mysql-replica --default-character-set=utf8mb4 test -e "select ${ghost_columns} from _gh_ost_test_gho" -ss | md5sum)
orig_checksum=$(gh-ost-test-mysql-replica --default-character-set=utf8mb4 test -e "select ${orig_columns} from gh_ost_test ${order_by}" -ss | md5sum)
ghost_checksum=$(gh-ost-test-mysql-replica --default-character-set=utf8mb4 test -e "select ${ghost_columns} from _gh_ost_test_gho ${order_by}" -ss | md5sum)

if [ "$orig_checksum" != "$ghost_checksum" ] ; then
echo "ERROR $test_name: checksum mismatch"
Expand Down