Support `binlog_row_image=MINIMAL` #230

Open
shlomi-noach opened this Issue Sep 12, 2016 · 7 comments

Comments

Projects
None yet
3 participants
@shlomi-noach
Collaborator

shlomi-noach commented Sep 12, 2016

Right now gh-ost only supports binlog_row_image=FULL

@shlomi-noach shlomi-noach self-assigned this Sep 12, 2016

@ggunson

This comment has been minimized.

Show comment
Hide comment
@ggunson

ggunson Sep 13, 2016

Contributor

Given the issues we saw with #3, is that possible without having to resort to insert...selects? We were seeing some weird behaviour with tables that had primary keys as well as unique keys.

Contributor

ggunson commented Sep 13, 2016

Given the issues we saw with #3, is that possible without having to resort to insert...selects? We were seeing some weird behaviour with tables that had primary keys as well as unique keys.

@shlomi-noach

This comment has been minimized.

Show comment
Hide comment
@shlomi-noach

shlomi-noach Sep 13, 2016

Collaborator

I believe gh-ost should be fine updating a PK value reading it from the binary logs. local tests now give us a good platform to confirm that.

Collaborator

shlomi-noach commented Sep 13, 2016

I believe gh-ost should be fine updating a PK value reading it from the binary logs. local tests now give us a good platform to confirm that.

@shlomi-noach

This comment has been minimized.

Show comment
Hide comment
@shlomi-noach

shlomi-noach Sep 13, 2016

Collaborator

Unsure yet how to differentiate minimal-NULLs and real-NULLs. Looking at:

create table tnull(id int auto_increment primary key, c1null int default null, c2null int default null, c3 int not null default 0);
insert into tnull values (null, 4, 4, 4);
update tnull set c1null=44 where c3=4;

go-mysql output:

=== XIDEvent ===
Date: 2016-09-13 07:55:11
Log position: 33349
Event size: 31
XID: 482593

=== QueryEvent ===
Date: 2016-09-13 07:55:11
Log position: 33421
Event size: 72
Slave proxy ID: 199978
Execution time: 0
Error code: 0
Schema: test
Query: BEGIN

=== TableMapEvent ===
Date: 2016-09-13 07:55:11
Log position: 33472
Event size: 51
TableID: 927
TableID size: 6
Flags: 1
Schema: test
Table: tnull
Column count: 4
Column type:
00000000  03 03 03 03                                       |....|
NULL bitmap:
00000000  06                                                |.|

=== UpdateRowsEventV2 ===
Date: 2016-09-13 07:55:11
Log position: 33518
Event size: 46
TableID: 927
Flags: 1
Column count: 4
Values:
--
0:4
1:<nil>
2:<nil>
3:<nil>
--
0:<nil>
1:44
2:<nil>
3:<nil>

what in the above suggests that c2null was kept unchanged as opposed to being set as null?

Followed by

update tnull set c2null=46 where c3=4;

I see:

=== UpdateRowsEventV2 ===
Date: 2016-09-13 08:10:40
Log position: 33918
Event size: 46
TableID: 927
Flags: 1
Column count: 4
Values:
--
0:4
1:<nil>
2:<nil>
3:<nil>
--
0:<nil>
1:<nil>
2:46
3:<nil>

so how do I now know that c1null was unchanged from 44 as opposed to set-to-null?

Collaborator

shlomi-noach commented Sep 13, 2016

Unsure yet how to differentiate minimal-NULLs and real-NULLs. Looking at:

create table tnull(id int auto_increment primary key, c1null int default null, c2null int default null, c3 int not null default 0);
insert into tnull values (null, 4, 4, 4);
update tnull set c1null=44 where c3=4;

go-mysql output:

=== XIDEvent ===
Date: 2016-09-13 07:55:11
Log position: 33349
Event size: 31
XID: 482593

=== QueryEvent ===
Date: 2016-09-13 07:55:11
Log position: 33421
Event size: 72
Slave proxy ID: 199978
Execution time: 0
Error code: 0
Schema: test
Query: BEGIN

=== TableMapEvent ===
Date: 2016-09-13 07:55:11
Log position: 33472
Event size: 51
TableID: 927
TableID size: 6
Flags: 1
Schema: test
Table: tnull
Column count: 4
Column type:
00000000  03 03 03 03                                       |....|
NULL bitmap:
00000000  06                                                |.|

=== UpdateRowsEventV2 ===
Date: 2016-09-13 07:55:11
Log position: 33518
Event size: 46
TableID: 927
Flags: 1
Column count: 4
Values:
--
0:4
1:<nil>
2:<nil>
3:<nil>
--
0:<nil>
1:44
2:<nil>
3:<nil>

what in the above suggests that c2null was kept unchanged as opposed to being set as null?

Followed by

update tnull set c2null=46 where c3=4;

I see:

=== UpdateRowsEventV2 ===
Date: 2016-09-13 08:10:40
Log position: 33918
Event size: 46
TableID: 927
Flags: 1
Column count: 4
Values:
--
0:4
1:<nil>
2:<nil>
3:<nil>
--
0:<nil>
1:<nil>
2:46
3:<nil>

so how do I now know that c1null was unchanged from 44 as opposed to set-to-null?

@shlomi-noach

This comment has been minimized.

Show comment
Hide comment
@shlomi-noach

shlomi-noach Nov 18, 2016

Collaborator

@sjmudd brought to my attention the matter of non-default columns, which would block us from supporting binlog_row_image=MINIMAL, and then I realized the problem is even bigger. To elaborate:

Say we just began the migration. Our ghost table is empty or nearly empty. An UPDATE executes on the original table, on a row that is not copied yet to the ghost table. That UPDATE sets value to some col1, but does not change values for column col2.

gh-ost needs to create this row on the ghost table; however, how can it do it? What values will it place in col2?

What we could do is: once we get the PK value from the binary log, we read the entire row from the original table and write it onto the ghost table. This would either be a

  • REPLACE INTO _tbl_gho (columns...) SELECT columns... FROM tbl WHERE <pk-values>
  • or read into app then write back

The first binds us to same-server migrations (and at this time we are already bounded in similar way by row-copy); the latter allows us to run cross-server migrations.

Collaborator

shlomi-noach commented Nov 18, 2016

@sjmudd brought to my attention the matter of non-default columns, which would block us from supporting binlog_row_image=MINIMAL, and then I realized the problem is even bigger. To elaborate:

Say we just began the migration. Our ghost table is empty or nearly empty. An UPDATE executes on the original table, on a row that is not copied yet to the ghost table. That UPDATE sets value to some col1, but does not change values for column col2.

gh-ost needs to create this row on the ghost table; however, how can it do it? What values will it place in col2?

What we could do is: once we get the PK value from the binary log, we read the entire row from the original table and write it onto the ghost table. This would either be a

  • REPLACE INTO _tbl_gho (columns...) SELECT columns... FROM tbl WHERE <pk-values>
  • or read into app then write back

The first binds us to same-server migrations (and at this time we are already bounded in similar way by row-copy); the latter allows us to run cross-server migrations.

@ggunson

This comment has been minimized.

Show comment
Hide comment
@ggunson

ggunson Nov 18, 2016

Contributor

Is it possible to have it use the replace into...select or select-then-replace-via-app until the row copy is complete (assuming it's set to postpone cut-over) and then revert to the default method from then on? That is assuming that this copying method is going to be impacting performance enough to warrant trying to avoid it.

Contributor

ggunson commented Nov 18, 2016

Is it possible to have it use the replace into...select or select-then-replace-via-app until the row copy is complete (assuming it's set to postpone cut-over) and then revert to the default method from then on? That is assuming that this copying method is going to be impacting performance enough to warrant trying to avoid it.

@jonahberquist

This comment has been minimized.

Show comment
Hide comment
@jonahberquist

jonahberquist Nov 18, 2016

Contributor

Say we just began the migration. Our ghost table is empty or nearly empty. An UPDATE executes on the original table, on a row that is not copied yet to the ghost table. That UPDATE sets value to some col1, but does not change values for column col2.

gh-ost needs to create this row on the ghost table

I don't think this is correct. If the row has not been copied to the ghost table when an update comes in, gh-ost can just ignore the update. The row will have the updated values when it gets copied later in the process.

Contributor

jonahberquist commented Nov 18, 2016

Say we just began the migration. Our ghost table is empty or nearly empty. An UPDATE executes on the original table, on a row that is not copied yet to the ghost table. That UPDATE sets value to some col1, but does not change values for column col2.

gh-ost needs to create this row on the ghost table

I don't think this is correct. If the row has not been copied to the ghost table when an update comes in, gh-ost can just ignore the update. The row will have the updated values when it gets copied later in the process.

@shlomi-noach

This comment has been minimized.

Show comment
Hide comment
@shlomi-noach

shlomi-noach Nov 21, 2016

Collaborator

I don't think this is correct. If the row has not been copied to the ghost table when an update comes in, gh-ost can just ignore the update. The row will have the updated values when it gets copied later in the process.

You are right. But this also complicates matters a bit: because the update would nonetheless generate an error; which means gh-ost would need to verify that this update runs on a row that hasn't been iterated yet.

That is not trivial but also not too difficult. The check would need to be delegated to MySQL, because gh-ost itself cannot compare PK range values (consider collations for example).

This implies issuing a select before each update operation.

Collaborator

shlomi-noach commented Nov 21, 2016

I don't think this is correct. If the row has not been copied to the ghost table when an update comes in, gh-ost can just ignore the update. The row will have the updated values when it gets copied later in the process.

You are right. But this also complicates matters a bit: because the update would nonetheless generate an error; which means gh-ost would need to verify that this update runs on a row that hasn't been iterated yet.

That is not trivial but also not too difficult. The check would need to be delegated to MySQL, because gh-ost itself cannot compare PK range values (consider collations for example).

This implies issuing a select before each update operation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment