Skip to content
This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

task pause spent a lot of time #1845

Closed
GMHDBJD opened this issue Jul 7, 2021 · 2 comments · Fixed by #1954
Closed

task pause spent a lot of time #1845

GMHDBJD opened this issue Jul 7, 2021 · 2 comments · Fixed by #1954
Labels
affected-v2.0.2 this issue/BUG affects v2.0.2 affected-v2.0.3 this issue/BUG affects v2.0.3 affected-v2.0.4 this issue/BUG affects v2.0.4 affected-v2.0.5 this issue/BUG affects v2.0.5 severity/minor type/bug This issue is a bug report

Comments

@GMHDBJD
Copy link
Collaborator

GMHDBJD commented Jul 7, 2021

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do? If possible, provide a recipe for reproducing the error.
    upstream database connection error(master-slave switch/network connection error)
  2. What did you expect to see?
    task pause quickly
  3. What did you see instead?
[2021/07/07 15:09:36.460 +08:00] [INFO] [syncer.go:1494] ["deadline exceeded when fetching binlog event"] [task=test] [unit="binlog replication"]
[2021/07/07 15:09:36.460 +08:00] [ERROR] [streamer_controller.go:259] ["meet error when get binlog event"] [task=test] [unit="binlog replication"] [error="io.ReadFull(header) failed. err read tcp 172.16.5.177:35038->172.16.4.249:13307: i/o timeout: connection was bad"] [errorVerbose="connection was bad\ngithub.com/go-mysql-org/go-mysql/mysql.init\n\t/go/pkg/mod/github.com/go-mysql-org/go-mysql@v1.1.3-0.20210519094347-5ec8d511f5be/mysql/error.go:10\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:5222\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:5217\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:5217\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:5217\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:190\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357\nio.ReadFull(header) failed. err read tcp 172.16.5.177:35038->172.16.4.249:13307: i/o timeout\ngithub.com/go-mysql-org/go-mysql/packet.(*Conn).ReadPacketTo\n\t/go/pkg/mod/github.com/go-mysql-org/go-mysql@v1.1.3-0.20210519094347-5ec8d511f5be/packet/conn.go:133\ngithub.com/go-mysql-org/go-mysql/packet.(*Conn).ReadPacketReuseMem\n\t/go/pkg/mod/github.com/go-mysql-org/go-mysql@v1.1.3-0.20210519094347-5ec8d511f5be/packet/conn.go:98\ngithub.com/go-mysql-org/go-mysql/packet.(*Conn).ReadPacket\n\t/go/pkg/mod/github.com/go-mysql-org/go-mysql@v1.1.3-0.20210519094347-5ec8d511f5be/packet/conn.go:90\ngithub.com/go-mysql-org/go-mysql/replication.(*BinlogSyncer).onStream\n\t/go/pkg/mod/github.com/go-mysql-org/go-mysql@v1.1.3-0.20210519094347-5ec8d511f5be/replication/binlogsyncer.go:662\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"]
[2021/07/07 15:09:36.460 +08:00] [ERROR] [syncer.go:1507] ["fail to fetch binlog"] [task=test] [unit="binlog replication"] [error="io.ReadFull(header) failed. err read tcp 172.16.5.177:35038->172.16.4.249:13307: i/o timeout: connection was bad"]
[2021/07/07 15:09:36.501 +08:00] [INFO] [syncer.go:1003] ["flushed checkpoint"] [task=test] [unit="binlog replication"] [checkpoint="position: (bin.000001, 125990755), gtid-set: 166e4dcb-d3fa-11eb-8670-e6de94bb3bc7:1-48508,7e49c814-dae6-11eb-b9f2-22e4d0b6856e:1-19533(flushed position: (bin.000001, 125990755), gtid-set: 166e4dcb-d3fa-11eb-8670-e6de94bb3bc7:1-48508,7e49c814-dae6-11eb-b9f2-22e4d0b6856e:1-19533)"]
[2021/07/07 15:10:04.886 +08:00] [ERROR] [syncer.go:2622] ["fail to estimate unreplicated binlog size"] [task=test] [unit="binlog replication"] [error="[code=10003:class=database:scope=not-set:level=high], Message: database driver, RawCause: invalid connection, Workaround: Please check the database connection, then use `pause-task` to stop the task and then use `resume-task` to resume the task."] [errorVerbose="[code=10003:class=database:scope=not-set:level=high], Message: database driver, RawCause: invalid connection, Workaround: Please check the database connection, then use `pause-task` to stop the task and then use `resume-task` to resume the task.\ngithub.com/pingcap/dm/pkg/terror.(*Error).Delegate\n\t/home/jenkins/agent/workspace/build_dm_multi_branch_v2.0.4/go/src/github.com/pingcap/dm/pkg/terror/terror.go:309\ngithub.com/pingcap/dm/pkg/terror.DBErrorAdaptArgs\n\t/home/jenkins/agent/workspace/build_dm_multi_branch_v2.0.4/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:37\ngithub.com/pingcap/dm/pkg/terror.DBErrorAdapt\n\t/home/jenkins/agent/workspace/build_dm_multi_branch_v2.0.4/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:46\ngithub.com/pingcap/dm/syncer.getBinaryLogs\n\t/home/jenkins/agent/workspace/build_dm_multi_branch_v2.0.4/go/src/github.com/pingcap/dm/syncer/db.go:345\ngithub.com/pingcap/dm/syncer.countBinaryLogsSize\n\t/home/jenkins/agent/workspace/build_dm_multi_branch_v2.0.4/go/src/github.com/pingcap/dm/syncer/db.go:319\ngithub.com/pingcap/dm/syncer.(*UpStreamConn).countBinaryLogsSize\n\t/home/jenkins/agent/workspace/build_dm_multi_branch_v2.0.4/go/src/github.com/pingcap/dm/syncer/db.go:112\ngithub.com/pingcap/dm/syncer.(*Syncer).printStatus\n\t/home/jenkins/agent/workspace/build_dm_multi_branch_v2.0.4/go/src/github.com/pingcap/dm/syncer/syncer.go:2619\ngithub.com/pingcap/dm/syncer.(*Syncer).Run.func7\n\t/home/jenkins/agent/workspace/build_dm_multi_branch_v2.0.4/go/src/github.com/pingcap/dm/syncer/syncer.go:1377\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"]
[2021/07/07 15:10:04.887 +08:00] [ERROR] [syncer.go:2643] ["fail to get master status"] [task=test] [unit="binlog replication"] [error="[code=10001:class=database:scope=not-set:level=high], Message: database driver error, RawCause: context canceled, Workaround: Please check the database connection and the database config in configuration file."]
[2021/07/07 15:10:04.887 +08:00] [INFO] [syncer.go:2654] ["binlog replication status"] [task=test] [unit="binlog replication"] [total_events=136951] [total_tps=163] [tps=27] [master_position="(, 0)"] [master_gtid=NULL] [checkpoint="position: (bin.000001, 125990755), gtid-set: 166e4dcb-d3fa-11eb-8670-e6de94bb3bc7:1-48508,7e49c814-dae6-11eb-b9f2-22e4d0b6856e:1-19533(flushed position: (bin.000001, 125990755), gtid-set: 166e4dcb-d3fa-11eb-8670-e6de94bb3bc7:1-48508,7e49c814-dae6-11eb-b9f2-22e4d0b6856e:1-19533)"]
[2021/07/07 15:10:04.887 +08:00] [ERROR] [syncer.go:2643] ["fail to get master status"] [task=test] [unit="binlog replication"] [error="[code=10001:class=database:scope=not-set:level=high], Message: database driver error, RawCause: context canceled, Workaround: Please check the database connection and the database config in configuration file."]
[2021/07/07 15:10:04.887 +08:00] [INFO] [syncer.go:2654] ["binlog replication status"] [task=test] [unit="binlog replication"] [total_events=136951] [total_tps=0] [tps=0] [master_position="(, 0)"] [master_gtid=NULL] [checkpoint="position: (bin.000001, 125990755), gtid-set: 166e4dcb-d3fa-11eb-8670-e6de94bb3bc7:1-48508,7e49c814-dae6-11eb-b9f2-22e4d0b6856e:1-19533(flushed position: (bin.000001, 125990755), gtid-set: 166e4dcb-d3fa-11eb-8670-e6de94bb3bc7:1-48508,7e49c814-dae6-11eb-b9f2-22e4d0b6856e:1-19533)"]
[2021/07/07 15:10:04.887 +08:00] [INFO] [syncer.go:2596] ["print status routine exits"] [task=test] [unit="binlog replication"] [error="context canceled"]
[2021/07/07 15:10:04.891 +08:00] [ERROR] [streamer_controller.go:326] ["fail to kill last connection"] [task=test] [unit="binlog replication"] ["connection ID"=175] [error="[code=10001:class=database:scope=not-set:level=high], Message: database driver error, RawCause: Error 1094: Unknown thread id: 175, Workaround: Please check the database connection and the database config in configuration file."]
[2021/07/07 15:10:04.893 +08:00] [INFO] [subtask.go:292] ["unit process returned"] [subtask=test] [unit=Sync] [stage=Paused] [status="{\"totalEvents\":136951,\"syncerBinlog\":\"(bin.000001, 125990755)\",\"syncerBinlogGtid\":\"166e4dcb-d3fa-11eb-8670-e6de94bb3bc7:1-48508,7e49c814-dae6-11eb-b9f2-22e4d0b6856e:1-19533\",\"binlogType\":\"remote\"}"]
  1. Versions of the cluster

    • DM version (run dmctl -V or dm-worker -V or dm-master -V):

       v2.0.4
      
  2. Root cause
    use ctx.Background() in countBinaryLogsSize

@GMHDBJD GMHDBJD added the type/bug This issue is a bug report label Jul 7, 2021
@GMHDBJD GMHDBJD changed the title task pause spent more than 30s task pause a lot of time Jul 7, 2021
@GMHDBJD GMHDBJD changed the title task pause a lot of time task pause spent a lot of time Jul 7, 2021
@lance6716 lance6716 added affected-v2.0.2 this issue/BUG affects v2.0.2 affected-v2.0.3 this issue/BUG affects v2.0.3 affected-v2.0.4 this issue/BUG affects v2.0.4 severity/minor labels Jul 7, 2021
@GMHDBJD GMHDBJD added the affected-v2.0.5 this issue/BUG affects v2.0.5 label Jul 30, 2021
@okJiang
Copy link
Member

okJiang commented Aug 3, 2021

At first glance, only need to replace Query with QueryContext?

Like this

rows, err := db.QueryContext(ctx, `SHOW SLAVE HOSTS`)

@okJiang
Copy link
Member

okJiang commented Aug 3, 2021

And seems no other place use Query function in DM.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
affected-v2.0.2 this issue/BUG affects v2.0.2 affected-v2.0.3 this issue/BUG affects v2.0.3 affected-v2.0.4 this issue/BUG affects v2.0.4 affected-v2.0.5 this issue/BUG affects v2.0.5 severity/minor type/bug This issue is a bug report
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants