Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-ost hang with statement based binlogs #258

Closed
igroene opened this issue Oct 4, 2016 · 18 comments
Closed

gh-ost hang with statement based binlogs #258

igroene opened this issue Oct 4, 2016 · 18 comments
Assignees
Labels

Comments

@igroene
Copy link

igroene commented Oct 4, 2016

Running gh-ost with these 3 options:

--assume-master-host=the-master-IP 
--tungsten 
--allow-on-master

on a setup where the master is using statement based binlogs, results on gh-ost hanging in this phase:

2016-10-03 15:06:05 INFO starting gh-ost 1.0.20
2016-10-03 15:06:05 INFO Migrating `test`.`test_table`
./gh-ost \
2016-10-03 15:06:05 INFO connection validated on 10.1.1.52:3306
2016-10-03 15:06:05 INFO User has SUPER, REPLICATION SLAVE privileges, and has ALL privileges on `test`.*
2016-10-03 15:06:05 INFO binary logs validated on 10.1.1.52:3306
2016-10-03 15:06:05 INFO Restarting replication on 10.1.1.52:3306 to make sure binlog settings apply to replication thread
2016-10-03 15:06:05 INFO Table found. Engine=InnoDB
2016-10-03 15:06:05 INFO Estimated number of rows via EXPLAIN: 1108
2016-10-03 15:06:05 INFO binary logs updates validated on 10.1.1.51:3306
2016-10-03 15:06:05 INFO Master found to be db1:3306
2016-10-03 15:06:05 INFO connection validated on 10.1.1.51:3306
2016-10-03 15:06:05 INFO Registering replica at 10.1.1.51:3306
2016-10-03 15:06:05 INFO Connecting binlog streamer at db0-bin.000002:1905053
2016-10-03 15:06:05 INFO connection validated on 10.1.1.51:3306
2016-10-03 15:06:05 INFO rotate to next log name: db0-bin.000002
2016-10-03 15:06:05 INFO connection validated on 10.1.1.51:3306
2016-10-03 15:06:05 INFO Droppping table `test`.`_test_table_gho`
2016-10-03 15:06:05 INFO Table dropped
2016-10-03 15:06:05 INFO Droppping table `test`.`_test_table_del`
2016-10-03 15:06:05 INFO Table dropped
2016-10-03 15:06:05 INFO Droppping table `test`.`_test_table_ghc`
2016-10-03 15:06:05 INFO Table dropped
2016-10-03 15:06:05 INFO Creating changelog table `test`.`_test_table_ghc`
2016-10-03 15:06:05 INFO Changelog table created
2016-10-03 15:06:05 INFO Creating ghost table `test`.`_test_table_gho`
2016-10-03 15:06:05 INFO Ghost table created
2016-10-03 15:06:05 INFO Altering ghost table `test`.`_test_table_gho`
2016-10-03 15:06:05 INFO Ghost table altered
2016-10-03 15:06:05 INFO Waiting for tables to be in place
@shlomi-noach shlomi-noach self-assigned this Oct 4, 2016
@shlomi-noach
Copy link
Contributor

@groenewold-pythian can you please paste the entire command you are executing? What is the --host value? I can't figure out if you're actually connecting to master or not.

@shlomi-noach
Copy link
Contributor

Also please paste output with --debug flag

@igroene
Copy link
Author

igroene commented Oct 6, 2016

Master is 10.1.1.51 and slave is 10.1.1.52:

./gh-ost \
--assume-master-host=10.1.1.51 \
--allow-on-master \
--tungsten \
--max-load=Threads_running=25 \
--critical-load=Threads_running=1000 \
--chunk-size=1000 \
--max-lag-millis=1500 \
--user="ghost" \
--password="ghost" \
--host=10.1.1.52 \
--database="test" \
--table="incoming_file_job" \
--verbose \
--alter="engine=innodb" \
--cut-over=default \
--default-retries=120 \
--panic-flag-file=/tmp/ghost.panic.flag \
--postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
--initially-drop-ghost-table \
--initially-drop-old-table \
--exact-rowcount \
--execute \
--debug
2016-10-06 11:52:58 INFO starting gh-ost 1.0.20
2016-10-06 11:52:58 INFO Migrating `test`.`incoming_file_job`
2016-10-06 11:52:58 INFO connection validated on 10.1.1.52:3306
2016-10-06 11:52:58 INFO User has SUPER, REPLICATION SLAVE privileges, and has ALL privileges on `test`.*
2016-10-06 11:52:58 INFO binary logs validated on 10.1.1.52:3306
2016-10-06 11:52:58 INFO Restarting replication on 10.1.1.52:3306 to make sure binlog settings apply to replication thread
2016-10-06 11:52:58 INFO Table found. Engine=InnoDB
2016-10-06 11:52:58 DEBUG Estimated number of rows via STATUS: 1123
2016-10-06 11:52:58 DEBUG Validated no foreign keys exist on table
2016-10-06 11:52:58 DEBUG Validated no triggers exist on table
2016-10-06 11:52:58 INFO Estimated number of rows via EXPLAIN: 1123
2016-10-06 11:52:58 DEBUG Potential unique keys in incoming_file_job: [PRIMARY (auto_increment): [job_id]; has nullable: false path: [path]; has nullable: true]
2016-10-06 11:52:58 DEBUG Looking for master on 10.1.1.52:3306
2016-10-06 11:52:58 INFO binary logs updates validated on 10.1.1.51:3306
2016-10-06 11:52:58 INFO Master found to be db1:3306
2016-10-06 11:52:58 INFO connection validated on 10.1.1.51:3306
2016-10-06 11:52:58 DEBUG Streamer binlog coordinates: db0-bin.000002:4547978
2016-10-06 11:52:58 INFO Registering replica at 10.1.1.51:3306
2016-10-06 11:52:58 INFO Connecting binlog streamer at db0-bin.000002:4547978
2016-10-06 11:52:58 DEBUG Beginning streaming
2016-10-06 11:52:58 INFO rotate to next log name: db0-bin.000002
2016-10-06 11:52:58 INFO connection validated on 10.1.1.51:3306
2016-10-06 11:52:58 INFO connection validated on 10.1.1.51:3306
2016-10-06 11:52:58 INFO Droppping table `test`.`_incoming_file_job_gho`
2016-10-06 11:52:58 INFO Table dropped
2016-10-06 11:52:58 INFO Droppping table `test`.`_incoming_file_job_del`
2016-10-06 11:52:58 INFO Table dropped
2016-10-06 11:52:58 INFO Droppping table `test`.`_incoming_file_job_ghc`
2016-10-06 11:52:58 INFO Table dropped
2016-10-06 11:52:58 INFO Creating changelog table `test`.`_incoming_file_job_ghc`
2016-10-06 11:52:58 INFO Changelog table created
2016-10-06 11:52:58 INFO Creating ghost table `test`.`_incoming_file_job_gho`
2016-10-06 11:52:58 INFO Ghost table created
2016-10-06 11:52:58 INFO Altering ghost table `test`.`_incoming_file_job_gho`
2016-10-06 11:52:58 DEBUG ALTER statement: alter /* gh-ost */ table `test`.`_incoming_file_job_gho` engine=innodb
2016-10-06 11:52:58 INFO Ghost table altered
2016-10-06 11:52:58 INFO Waiting for tables to be in place

@shlomi-noach
Copy link
Contributor

on a setup where the master is using statement based binlogs, results on gh-ost hanging in this phase

I don't think this is the root cause. I'm running on this setup and do not reproduce the problem.
Is the replica using log-slave-updates? can you see that it generates binary logs?

@shlomi-noach
Copy link
Contributor

Or, is the replication running at all?

@igroene
Copy link
Author

igroene commented Oct 6, 2016

Replication is indeed running. FWIW this works properly if I manually switch the master to row based binlogs.
So for command:

./gh-ost \
--assume-master-host=10.1.1.51 \
--allow-on-master \
--tungsten \
--max-load=Threads_running=25 \
--critical-load=Threads_running=1000 \
--chunk-size=1000 \
--max-lag-millis=1500 \
--user="ghost" \
--password="ghost" \
--host=10.1.1.52 \
--database="test" \
--table="t" \
--verbose \
--alter="engine=innodb" \
--cut-over=default \
--default-retries=120 \
--panic-flag-file=/tmp/ghost.panic.flag \
--postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
--initially-drop-ghost-table \
--initially-drop-old-table \
--exact-rowcount \
--execute \
--debug

the output is:

2016-10-06 16:16:23 INFO starting gh-ost 1.0.20
2016-10-06 16:16:23 INFO Migrating `test`.`t`
2016-10-06 16:16:23 INFO connection validated on 10.1.1.52:3306
2016-10-06 16:16:23 INFO User has SUPER, REPLICATION SLAVE privileges, and has ALL privileges on `test`.*
2016-10-06 16:16:23 INFO binary logs validated on 10.1.1.52:3306
2016-10-06 16:16:23 INFO Restarting replication on 10.1.1.52:3306 to make sure binlog settings apply to replication thread
2016-10-06 16:16:24 INFO Table found. Engine=InnoDB
2016-10-06 16:16:24 DEBUG Estimated number of rows via STATUS: 2
2016-10-06 16:16:24 DEBUG Validated no foreign keys exist on table
2016-10-06 16:16:24 DEBUG Validated no triggers exist on table
2016-10-06 16:16:24 INFO Estimated number of rows via EXPLAIN: 2
2016-10-06 16:16:24 DEBUG Potential unique keys in t: [PRIMARY (auto_increment): [a]; has nullable: false]
2016-10-06 16:16:24 DEBUG Looking for master on 10.1.1.52:3306
2016-10-06 16:16:24 INFO binary logs updates validated on 10.1.1.51:3306
2016-10-06 16:16:24 INFO Master found to be db1:3306
2016-10-06 16:16:24 INFO connection validated on 10.1.1.51:3306
2016-10-06 16:16:24 DEBUG Streamer binlog coordinates: db0-bin.000002:30573579
2016-10-06 16:16:24 INFO Registering replica at 10.1.1.51:3306
2016-10-06 16:16:24 INFO Connecting binlog streamer at db0-bin.000002:30573579
2016-10-06 16:16:24 INFO connection validated on 10.1.1.51:3306
2016-10-06 16:16:24 DEBUG Beginning streaming
2016-10-06 16:16:24 INFO rotate to next log name: db0-bin.000002
2016-10-06 16:16:24 INFO connection validated on 10.1.1.51:3306
2016-10-06 16:16:24 INFO Droppping table `test`.`_t_gho`
2016-10-06 16:16:24 INFO Table dropped
2016-10-06 16:16:24 INFO Droppping table `test`.`_t_del`
2016-10-06 16:16:24 INFO Table dropped
2016-10-06 16:16:24 INFO Droppping table `test`.`_t_ghc`
2016-10-06 16:16:24 INFO Table dropped
2016-10-06 16:16:24 INFO Creating changelog table `test`.`_t_ghc`
2016-10-06 16:16:24 INFO Changelog table created
2016-10-06 16:16:24 INFO Creating ghost table `test`.`_t_gho`
2016-10-06 16:16:24 INFO Ghost table created
2016-10-06 16:16:24 INFO Altering ghost table `test`.`_t_gho`
2016-10-06 16:16:24 DEBUG ALTER statement: alter /* gh-ost */ table `test`.`_t_gho` engine=innodb
2016-10-06 16:16:24 INFO Ghost table altered
2016-10-06 16:16:24 INFO Waiting for tables to be in place

And here is the contents of the slave's binlog:

+----------------+------+-------------+-----------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Log_name       | Pos  | Event_type  | Server_id | End_log_pos | Info                                                                                                                                                                                                                                                                                                                                             |
+----------------+------+-------------+-----------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| db1-bin.000001 |    4 | Format_desc |         2 |         107 | Server ver: 5.5.46-37.6-log, Binlog ver: 4                                                                                                                                                                                                                                                                                                       |
| db1-bin.000001 |  107 | Query       |         2 |         225 | use `test`; DROP TABLE IF EXISTS `_t_gho` /* generated by server */                                                                                                                                                                                                                                                                              |
| db1-bin.000001 |  225 | Query       |         2 |         293 | BEGIN                                                                                                                                                                                                                                                                                                                                            |
| db1-bin.000001 |  293 | Table_map   |         2 |         386 | table_id: 63 (tungsten_compositeprod.trep_commit_seqno)                                                                                                                                                                                                                                                                                          |
| db1-bin.000001 |  386 | Update_rows |         2 |         594 | table_id: 63 flags: STMT_END_F                                                                                                                                                                                                                                                                                                                   |
| db1-bin.000001 |  594 | Xid         |         2 |         621 | COMMIT /* xid=11730499 */                                                                                                                                                                                                                                                                                                                        |
| db1-bin.000001 |  621 | Query       |         2 |         739 | use `test`; DROP TABLE IF EXISTS `_t_del` /* generated by server */                                                                                                                                                                                                                                                                              |
| db1-bin.000001 |  739 | Query       |         2 |         807 | BEGIN                                                                                                                                                                                                                                                                                                                                            |
| db1-bin.000001 |  807 | Table_map   |         2 |         900 | table_id: 63 (tungsten_compositeprod.trep_commit_seqno)                                                                                                                                                                                                                                                                                          |
| db1-bin.000001 |  900 | Update_rows |         2 |        1108 | table_id: 63 flags: STMT_END_F                                                                                                                                                                                                                                                                                                                   |
| db1-bin.000001 | 1108 | Xid         |         2 |        1135 | COMMIT /* xid=11730506 */                                                                                                                                                                                                                                                                                                                        |
| db1-bin.000001 | 1135 | Query       |         2 |        1253 | use `test`; DROP TABLE IF EXISTS `_t_ghc` /* generated by server */                                                                                                                                                                                                                                                                              |
| db1-bin.000001 | 1253 | Query       |         2 |        1321 | BEGIN                                                                                                                                                                                                                                                                                                                                            |
| db1-bin.000001 | 1321 | Table_map   |         2 |        1414 | table_id: 63 (tungsten_compositeprod.trep_commit_seqno)                                                                                                                                                                                                                                                                                          |
| db1-bin.000001 | 1414 | Update_rows |         2 |        1622 | table_id: 63 flags: STMT_END_F                                                                                                                                                                                                                                                                                                                   |
| db1-bin.000001 | 1622 | Xid         |         2 |        1649 | COMMIT /* xid=11730513 */                                                                                                                                                                                                                                                                                                                        |
| db1-bin.000001 | 1649 | Query       |         2 |        2036 | use `test`; create /* gh-ost */ table `test`.`_t_ghc` (
            id bigint auto_increment,
            last_update timestamp not null DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
            hint varchar(64) charset ascii not null,
            value varchar(255) charset ascii not null,
            primary key(id),
            unique key hint_uidx(hint)
        ) auto_increment=256 |
| db1-bin.000001 | 2036 | Query       |         2 |        2104 | BEGIN                                                                                                                                                                                                                                                                                                                                            |
| db1-bin.000001 | 2104 | Table_map   |         2 |        2197 | table_id: 63 (tungsten_compositeprod.trep_commit_seqno)                                                                                                                                                                                                                                                                                          |
| db1-bin.000001 | 2197 | Update_rows |         2 |        2405 | table_id: 63 flags: STMT_END_F                                                                                                                                                                                                                                                                                                                   |
| db1-bin.000001 | 2405 | Xid         |         2 |        2432 | COMMIT /* xid=11730520 */                                                                                                                                                                                                                                                                                                                        |
| db1-bin.000001 | 2432 | Query       |         2 |        2552 | use `test`; create /* gh-ost */ table `test`.`_t_gho` like `test`.`t`                                                                                                                                                                                                                                                                            |
| db1-bin.000001 | 2552 | Query       |         2 |        2620 | BEGIN                                                                                                                                                                                                                                                                                                                                            |
| db1-bin.000001 | 2620 | Table_map   |         2 |        2713 | table_id: 63 (tungsten_compositeprod.trep_commit_seqno)                                                                                                                                                                                                                                                                                          |
| db1-bin.000001 | 2713 | Update_rows |         2 |        2921 | table_id: 63 flags: STMT_END_F                                                                                                                                                                                                                                                                                                                   |
| db1-bin.000001 | 2921 | Xid         |         2 |        2948 | COMMIT /* xid=11730527 */                                                                                                                                                                                                                                                                                                                        |
| db1-bin.000001 | 2948 | Query       |         2 |        3065 | use `test`; alter /* gh-ost */ table `test`.`_t_gho` engine=innodb                                                                                                                                                                                                                                                                               |
| db1-bin.000001 | 3065 | Query       |         2 |        3133 | BEGIN                                                                                                                                                                                                                                                                                                                                            |
| db1-bin.000001 | 3133 | Table_map   |         2 |        3226 | table_id: 63 (tungsten_compositeprod.trep_commit_seqno)                                                                                                                                                                                                                                                                                          |
| db1-bin.000001 | 3226 | Update_rows |         2 |        3434 | table_id: 63 flags: STMT_END_F                                                                                                                                                                                                                                                                                                                   |
| db1-bin.000001 | 3434 | Xid         |         2 |        3461 | COMMIT /* xid=11730534 */                                                                                                                                                                                                                                                                                                                        |
| db1-bin.000001 | 3461 | Query       |         2 |        3529 | BEGIN                                                                                                                                                                                                                                                                                                                                            |
| db1-bin.000001 | 3529 | Table_map   |         2 |        3581 | table_id: 342 (test._t_ghc)                                                                                                                                                                                                                                                                                                                      |
| db1-bin.000001 | 3581 | Write_rows  |         2 |        3643 | table_id: 342 flags: STMT_END_F                                                                                                                                                                                                                                                                                                                  |
| db1-bin.000001 | 3643 | Table_map   |         2 |        3736 | table_id: 63 (tungsten_compositeprod.trep_commit_seqno)                                                                                                                                                                                                                                                                                          |
| db1-bin.000001 | 3736 | Update_rows |         2 |        3944 | table_id: 63 flags: STMT_END_F                                                                                                                                                                                                                                                                                                                   |
| db1-bin.000001 | 3944 | Xid         |         2 |        3971 | COMMIT /* xid=11730540 */                                                                                                                                                                                                                                                                                                                        |
| db1-bin.000001 | 3971 | Query       |         2 |        4039 | BEGIN                                                                                                                                                                                                                                                                                                                                            |
| db1-bin.000001 | 4039 | Table_map   |         2 |        4091 | table_id: 342 (test._t_ghc)                                                                                                                                                                                                                                                                                                                      |
| db1-bin.000001 | 4091 | Write_rows  |         2 |        4176 | table_id: 342 flags: STMT_END_F                                                                                                                                                                                                                                                                                                                  |
| db1-bin.000001 | 4176 | Table_map   |         2 |        4269 | table_id: 63 (tungsten_compositeprod.trep_commit_seqno)                                                                                                                                                                                                                                                                                          |
| db1-bin.000001 | 4269 | Update_rows |         2 |        4477 | table_id: 63 flags: STMT_END_F                                                                                                                                                                                                                                                                                                                   |
| db1-bin.000001 | 4477 | Xid         |         2 |        4504 | COMMIT /* xid=11730548 */                                                                                                                                                                                                                                                                                                                        |
| db1-bin.000001 | 4504 | Query       |         2 |        4572 | BEGIN                                                                                                                                                                                                                                                                                                                                            |
| db1-bin.000001 | 4572 | Table_map   |         2 |        4624 | table_id: 342 (test._t_ghc)                                                                                                                                                                                                                                                                                                                      |
| db1-bin.000001 | 4624 | Write_rows  |         2 |        4707 | table_id: 342 flags: STMT_END_F                                                                                                                                                                                                                                                                                                                  |
| db1-bin.000001 | 4707 | Table_map   |         2 |        4800 | table_id: 63 (tungsten_compositeprod.trep_commit_seqno)                                                                                                                                                                                                                                                                                          |
| db1-bin.000001 | 4800 | Update_rows |         2 |        5008 | table_id: 63 flags: STMT_END_F                                                                                                                                                                                                                                                                                                                   |
| db1-bin.000001 | 5008 | Xid         |         2 |        5035 | COMMIT /* xid=11730555 */                                                                                                                                                                                                                                                                                                                        |
| db1-bin.000001 | 5035 | Query       |         2 |        5103 | BEGIN                                                                                                                                                                                                                                                                                                                                            |
| db1-bin.000001 | 5103 | Table_map   |         2 |        5155 | table_id: 342 (test._t_ghc)                                                                                                                                                                                                                                                                                                                      |
| db1-bin.000001 | 5155 | Update_rows |         2 |        5293 | table_id: 342 flags: STMT_END_F                                                                                                                                                                                                                                                                                                                  |
| db1-bin.000001 | 5293 | Table_map   |         2 |        5386 | table_id: 63 (tungsten_compositeprod.trep_commit_seqno)                                                                                                                                                                                                                                                                                          |
| db1-bin.000001 | 5386 | Update_rows |         2 |        5594 | table_id: 63 flags: STMT_END_F                                                                                                                                                                                                                                                                                                                   |
| db1-bin.000001 | 5594 | Xid         |         2 |        5621 | COMMIT /* xid=11730562 */                                                                                                                                                                                                                                                                                                                        |
| db1-bin.000001 | 5621 | Query       |         2 |        5689 | BEGIN                                                                                                                                                                                                                                                                                                                                            |
| db1-bin.000001 | 5689 | Table_map   |         2 |        5741 | table_id: 342 (test._t_ghc)                                                                                                                                                                                                                                                                                                                      |
| db1-bin.000001 | 5741 | Update_rows |         2 |        5879 | table_id: 342 flags: STMT_END_F                                                                                                                                                                                                                                                                                                                  |
| db1-bin.000001 | 5879 | Table_map   |         2 |        5972 | table_id: 63 (tungsten_compositeprod.trep_commit_seqno)                                                                                                                                                                                                                                                                                          |
| db1-bin.000001 | 5972 | Update_rows |         2 |        6180 | table_id: 63 flags: STMT_END_F                                                                                                                                                                                                                                                                                                                   |
| db1-bin.000001 | 6180 | Xid         |         2 |        6207 | COMMIT /* xid=11730571 */                                                                                                                                                                                                                                                                                                                        |
+----------------+------+-------------+-----------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
61 rows in set (0.00 sec)

@igroene
Copy link
Author

igroene commented Oct 6, 2016

I noticed gh-ost is trying to connect to the master to get the events even though I specified host=10.1.1.52 which is the slave:

mysql> select @@hostname;
+------------+
| @@hostname |
+------------+
| db0        |
+------------+
1 row in set (0.00 sec)

mysql> show slave hosts;
+-----------+------+------+-----------+
| Server_id | Host | Port | Master_id |
+-----------+------+------+-----------+
|     99999 | db1  | 3306 |         1 |
+-----------+------+------+-----------+
1 row in set (0.00 sec)

@shlomi-noach
Copy link
Contributor

Thank you for all the input, very useful

@shlomi-noach
Copy link
Contributor

This is the reason:

| db1-bin.000001 | 3529 | Table_map   |         2 |        3581 | table_id: 342 (test._t_ghc)                                                                                                                                                                                                                                                                                                                      |
| db1-bin.000001 | 3581 | Write_rows  |         2 |        3643 | table_id: 342 flags: STMT_END_F                                                                                                                                                                                                                                                                                                                  |
| db1-bin.000001 | 3643 | Table_map   |         2 |        3736 | table_id: 63 (tungsten_compositeprod.trep_commit_seqno)                                                                                                                                                                                                                                                                                          |
| db1-bin.000001 | 3736 | Update_rows |         2 |        3944 | table_id: 63 flags: STMT_END_F                                                                                                                                                                                                                                                                                                                   |

There's a change on _t_ghc that gh-ost is expecting. How does it know the change is on _t_ghc? It looks for the preceeding Table_map event, which says table_id: 342 (test._t_ghc).

But then tungsten comes along and ruins the day saying table_id: 63 (tungsten_compositeprod.trep_commit_seqno)

and so gh-ost understands that the Update_rows command actually relates to tungsten_compositeprod.trep_commit_seqno table.

How to fix?

Remember when you were asking to replace --tungsten with something more generic? We just hit the exact reason for not making it more generic. To handle this issue I will have to magically skip tungsten specific queries. I will need your help in recognizing them.

Since I do not work with tungsten, I'm not sure how the binlog events look in general. I understand tungsten has to do its magic; how to other binlog events look for other tables? Is there always a tungsten_compositeprod.trep_commit_seqno table involved? Is it always the same table?

Please provide any information you may find useful.

@igroene
Copy link
Author

igroene commented Oct 7, 2016

Mmm in that case shouldn't it fail every time with tungsten? so far it works well if the master is using row based binlogs.
My understanding is Tungsten will add its own GTID to each transaction so you will see an update to trep_commit_seqno for each transaction, but it is a different event in the binlog so shouldn't affect gh-ost right?

I did a couple more tests and I believe this is what's going on:

  1. I start with master and slave using statement based binlogs
  2. I run the following command (note I am specifying host=10.1.1.52 which is the slave)
./gh-ost \
--assume-master-host=10.1.1.51 \
--allow-on-master \
--tungsten \
--max-load=Threads_running=25 \
--critical-load=Threads_running=1000 \
--chunk-size=1000 \
--max-lag-millis=1500 \
--user="ghost" \
--password="ghost" \
--host=10.1.1.52 \
--database="test" \
--table="t" \
--verbose \
--alter="engine=innodb" \
--cut-over=default \
--default-retries=120 \
--panic-flag-file=/tmp/ghost.panic.flag \
--postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
--initially-drop-ghost-table \
--initially-drop-old-table \
--exact-rowcount \
--execute \
--debug

this errors out as expected:

2016-10-07 12:42:33 FATAL You must be using ROW binlog format. I can switch it for you, provided --switch-to-rbr and that 10.1.1.52:3306 doesn't have replicas
  1. I now run the same command adding --switch-to-rbr (which doesn't really make sense in a tungsten setup since gh-ost is unable to restart tungsten replicator) now hangs as described:
2016-10-07 12:43:34 INFO Waiting for tables to be in place
  1. At this point, I can see gh-ost has switched slave 10.1.1.52 to row based binlogs, while gh-ost didn't make any change to the master 10.1.1.51 binlog setting

  2. I also see at this point that gh-ost has attached itself as a slave to the master 10.1.1.51 even though I specified host=10.1.1.52

My conclusion is gh-ost is hanging because it is expecting a row based event from the master that of course will not come since the master is still using statement binlogs.

Sorry about the length of this, please let me know if it makes sense

@shlomi-noach
Copy link
Contributor

Sorry about the length of this, please let me know if it makes sense

I wish all Issue submissions were insightful as yours!
Thank you, it seems like your diagnostics have figured it out. I'm looking further into. Not running a tungsten setup myself, so I'm simulating this in my brain. But thanks to this diagnosis I believe the problem is nailed.

@shlomi-noach
Copy link
Contributor

@groenewold-pythian can you please test #263? There was a bug with --assume-master-host that did not fully apply the hostname where it needed to. I suspect your problem may be the direct result of that.

I am not sure #263 will fix your problem, but it should at least provide with concise output.

Also, with this you should be able to discard the --allow-on-master flag. It shouldn't really be there.

@igroene
Copy link
Author

igroene commented Oct 10, 2016

I have compiled the branch from #263 and running it gives the following:

./gh-ost \
--assume-master-host=10.1.1.51 \
--tungsten \
--max-load=Threads_running=25 \
--critical-load=Threads_running=1000 \
--chunk-size=1000 \
--max-lag-millis=1500 \
--user="ghost" \
--password="ghost" \
--host=10.1.1.52 \
--database="test" \
--table="t" \
--verbose \
--alter="engine=innodb" \
--cut-over=default \
--default-retries=120 \
--panic-flag-file=/tmp/ghost.panic.flag \
--postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
--initially-drop-ghost-table \
--initially-drop-old-table \
--exact-rowcount \
--execute \
--debug

2016-10-10 12:12:29 INFO starting gh-ost 1.0.20
2016-10-10 12:12:29 INFO Migrating `test`.`t`
2016-10-10 12:12:29 INFO connection validated on 10.1.1.52:3306
2016-10-10 12:12:29 INFO User has SUPER, REPLICATION SLAVE privileges, and has ALL privileges on `test`.*
2016-10-10 12:12:29 INFO binary logs validated on 10.1.1.52:3306
2016-10-10 12:12:29 INFO Restarting replication on 10.1.1.52:3306 to make sure binlog settings apply to replication thread
2016-10-10 12:12:29 INFO Table found. Engine=InnoDB
2016-10-10 12:12:29 DEBUG Estimated number of rows via STATUS: 2
2016-10-10 12:12:30 DEBUG Validated no foreign keys exist on table
2016-10-10 12:12:30 DEBUG Validated no triggers exist on table
2016-10-10 12:12:30 INFO Estimated number of rows via EXPLAIN: 2
2016-10-10 12:12:30 DEBUG Potential unique keys in t: [PRIMARY (auto_increment): [a]; has nullable: false]
2016-10-10 12:12:30 DEBUG Looking for master on 10.1.1.52:3306
2016-10-10 12:12:30 FATAL It seems like this migration attempt to run directly on master. Preferably it would be executed on a replica (and this reduces load from the master). To proceed please provide --allow-on-master

@shlomi-noach
Copy link
Contributor

@groenewold-pythian thank you for your continued patience on this. I've further pushed changes to #263. Would you kindly checkout this branch again and test it?

@igroene
Copy link
Author

igroene commented Oct 11, 2016

Works! only caveat is if you specify --host=master and --switch-to-rbr in addition to --allow-on-master the binlog format will be changed on the master itself.

@shlomi-noach
Copy link
Contributor

only caveat is if you specify --host=master and --switch-to-rbr in addition to --allow-on-master the binlog format will be changed on the master itself.

That makes sense to me. If --allow-on-master then migration runs on master, which implies --switch-to-rbr works on master.

gh-ost does check that the server does not have any replicas, and in the tungsten case it obviously cannot find such replicas. I'm not sure what would be the ideal fix for that.

If you still see this as a concern, please open a new issue.

@igroene
Copy link
Author

igroene commented Oct 11, 2016

I believe that is fine. Thanks again!

@shlomi-noach
Copy link
Contributor

#263 closes this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants