This repository has been archived by the owner on Aug 21, 2023. It is now read-only.

avoid get blocked in dumping when mysql connection is broken #190

Merged
merged 15 commits on Nov 13, 2020

Conversation

lichunzhu
Contributor

@lichunzhu lichunzhu commented Nov 6, 2020

What problem does this PR solve?

Fix #181

Dumpling may get blocked fetching data from the database server when the connection to the server is closed for some reason.

Here is the goroutine information:

goroutine 265 [IO wait]:
internal/poll.runtime_pollWait(0x8c159d8, 0x72, 0xffffffffffffffff)
	/usr/local/go/src/runtime/netpoll.go:184 +0x55
internal/poll.(*pollDesc).wait(0xc00064c698, 0x72, 0xf00, 0xfc8, 0xffffffffffffffff)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc00064c680, 0xc000820038, 0xfc8, 0xfc8, 0x0, 0x0, 0x0)
	/usr/local/go/src/internal/poll/fd_unix.go:169 +0x22b
net.(*netFD).Read(0xc00064c680, 0xc000820038, 0xfc8, 0xfc8, 0xc000022600, 0xc000897000, 0x64)
	/usr/local/go/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc000a6e0a0, 0xc000820038, 0xfc8, 0xfc8, 0x0, 0x0, 0x0)
	/usr/local/go/src/net/net.go:184 +0x68
github.com/go-sql-driver/mysql.(*buffer).fill(0xc000213e60, 0xc6, 0x0, 0x0)
	/Users/chauncy/code/goPath/pkg/mod/github.com/go-sql-driver/mysql@v1.5.0/buffer.go:90 +0x13c
github.com/go-sql-driver/mysql.(*buffer).readNext(0xc000213e60, 0xc6, 0xc000820fc4, 0x4, 0x3c, 0x0, 0x0)
	/Users/chauncy/code/goPath/pkg/mod/github.com/go-sql-driver/mysql@v1.5.0/buffer.go:119 +0x9c
github.com/go-sql-driver/mysql.(*mysqlConn).readPacket(0xc000213e60, 0x5668bc1, 0x711b3a7, 0x53, 0xc0, 0x71121e8)
	/Users/chauncy/code/goPath/pkg/mod/github.com/go-sql-driver/mysql@v1.5.0/packets.go:67 +0x120
github.com/go-sql-driver/mysql.(*textRows).readRow(0xc00083e780, 0xc001801dc0, 0x4, 0x4, 0x53, 0xc0)
	/Users/chauncy/code/goPath/pkg/mod/github.com/go-sql-driver/mysql@v1.5.0/packets.go:742 +0x55
github.com/go-sql-driver/mysql.(*textRows).Next(0xc00083e780, 0xc001801dc0, 0x4, 0x4, 0x1, 0x0)
	/Users/chauncy/code/goPath/pkg/mod/github.com/go-sql-driver/mysql@v1.5.0/rows.go:220 +0x86
database/sql.(*Rows).nextLocked(0xc00064c780, 0x4020000)
	/usr/local/go/src/database/sql/sql.go:2767 +0xd5
database/sql.(*Rows).Next.func1()
	/usr/local/go/src/database/sql/sql.go:2745 +0x3c
database/sql.withLock(0x5fecc20, 0xc00064c7b0, 0xc000ac70e8)
	/usr/local/go/src/database/sql/sql.go:3184 +0x6d
database/sql.(*Rows).Next(0xc00064c780, 0xc0009940c0)
	/usr/local/go/src/database/sql/sql.go:2744 +0x87
github.com/pingcap/dumpling/v4/export.(*rowIter).Next(0xc000945440)
	/Users/chauncy/code/goPath/src/github.com/pingcap/dumpling/v4/export/ir_impl.go:48 +0x2e
github.com/pingcap/dumpling/v4/export.WriteInsert(0x600a920, 0xc000904980, 0x602eb80, 0xc0008e0140, 0x5fe9b20, 0xc001801e00, 0x0, 0xf4240, 0x0, 0x0)
	/Users/chauncy/code/goPath/src/github.com/pingcap/dumpling/v4/export/writer_util.go:193 +0x7a2
github.com/pingcap/dumpling/v4/export.SQLWriter.WriteTableData(0xc0000fe000, 0x600a920, 0xc000904980, 0x602eb80, 0xc0008e0140, 0x0, 0x0)
	/Users/chauncy/code/goPath/src/github.com/pingcap/dumpling/v4/export/writer.go:86 +0x393
github.com/pingcap/dumpling/v4/export.dumpDatabases.func1.1(0x0, 0x0)
	/Users/chauncy/code/goPath/src/github.com/pingcap/dumpling/v4/export/dump.go:276 +0x851
github.com/pingcap/br/pkg/utils.WithRetry(0x600a920, 0xc000904980, 0xc000ac7f00, 0x5fe9b60, 0xc0000524e0, 0x0, 0x0)
	/Users/chauncy/code/goPath/pkg/mod/github.com/pingcap/br@v0.0.0-20201027124415-c2ed897feada/pkg/utils/retry.go:34 +0x7c
github.com/pingcap/dumpling/v4/export.dumpDatabases.func1(0xc0009b7f68, 0x0)
	/Users/chauncy/code/goPath/src/github.com/pingcap/dumpling/v4/export/dump.go:266 +0x16d
golang.org/x/sync/errgroup.(*Group).Go.func1(0xc000944b10, 0xc000905a40)
	/Users/chauncy/code/goPath/pkg/mod/golang.org/x/sync@v0.0.0-20200625203802-6e8e738ad208/errgroup/errgroup.go:57 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/Users/chauncy/code/goPath/pkg/mod/golang.org/x/sync@v0.0.0-20200625203802-6e8e738ad208/errgroup/errgroup.go:54 +0x66

What is changed and how it works?

  1. Add readTimeout to the DSN. The default value is 15m, and a hidden configuration option read-timeout is added.
  2. Add an extraConn to connectionPool to avoid recreating connections when writing metadata, etc.
  3. Add a context.CancelFunc to tableIR. When the dump fails, the MySQL connection would otherwise run the readUntilEOF function to drain the remaining buffer, which can waste a lot of time. For the AWS client, the retry logic is in its own functions, so we needn't wait for this and should quit directly.
  4. When the dump fails, don't write metadata, to save time.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
    Try to kill the MySQL socket connection while dumping. Dumpling on the master branch may get blocked, but dumpling built from this branch won't.

Related changes

  • Need to cherry-pick to the release branch

Release note

  • fix the problem that dumpling may get blocked when its connection to the database server is closed

@lichunzhu
Contributor Author

Do we need an extra MySQL connection?

dsn := fmt.Sprintf("%s:%s@tcp(%s:%d)/%s?charset=utf8mb4", conf.User, conf.Password, conf.Host, conf.Port, db)
// maxAllowedPacket=0 can be used to automatically fetch the max_allowed_packet variable from server on every connection.
// https://github.com/go-sql-driver/mysql#maxallowedpacket
dsn := fmt.Sprintf("%s:%s@tcp(%s:%d)/%s?charset=utf8mb4&readTimeout=30s&writeTimeout=30s&interpolateParams=true&maxAllowedPacket=0",


Is there any chance of MySQL taking longer than 30 seconds to start sending data when the chunk size is large? Probably not, right? Since you query by primary key?

Contributor Author


Good point! How long do you think is appropriate? Or are there any other suggestions?

Contributor Author


How about keeping it at 30s and making it configurable?


I was genuinely curious whether it can take over 30 seconds to start sending data when you are querying directly by primary key. Since it's the time to first byte, that seems highly unlikely, so 30 seconds seems fine. I'm not sure, though. Configurable sounds safest.

Contributor Author


I have raised this variable to 15m and made it configurable.

v4/export/config.go Outdated Show resolved Hide resolved
@lichunzhu lichunzhu changed the title refine dumpling database connections avoid get blocked in dumping when mysql connection is broken Nov 12, 2020
Collaborator

@kennytm kennytm left a comment


LGTM

@lichunzhu lichunzhu merged commit 552470f into pingcap:master Nov 13, 2020
@lichunzhu lichunzhu deleted the refineDumpConn branch November 13, 2020 02:23
tisonkun pushed a commit to tisonkun/dumpling that referenced this pull request Oct 20, 2021
…#190)

* add extra conn to avoid get blocked at concurrently dumping

* add readTimeout/writeTimeout to avoid get blocked at fetching data from database server

* avoid wasting at finishing metadata when dumping is already failed

* add read-timeout parameter and mark it hidden
tisonkun pushed a commit to tisonkun/tidb that referenced this pull request Oct 20, 2021
…/dumpling#190)

* add extra conn to avoid get blocked at concurrently dumping

* add readTimeout/writeTimeout to avoid get blocked at fetching data from database server

* avoid wasting at finishing metadata when dumping is already failed

* add read-timeout parameter and mark it hidden
Successfully merging this pull request may close these issues.

Dumpling hangs on Aurora backup