Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TiFlash crash with high pressure write and read workload with conprof #4172

Closed
lilinghai opened this issue Mar 2, 2022 · 6 comments
Closed

Comments

@lilinghai
Copy link

lilinghai commented Mar 2, 2022

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Run a high pressure write and read workload with tiflash cpu continue profiling

[2022/03/01 22:52:26.860 +08:00] [ERROR] [<unknown>] [BaseDaemon:########################################] [thread_id=29592]
[2022/03/01 22:52:26.860 +08:00] [ERROR] [<unknown>] ["BaseDaemon:(from thread 29591) Received signal Segmentation fault (11)."] [thread_id=29592]
[2022/03/01 22:52:26.860 +08:00] [ERROR] [<unknown>] ["BaseDaemon:Address: 0x66000"] [thread_id=29592]
[2022/03/01 22:52:26.860 +08:00] [ERROR] [<unknown>] ["BaseDaemon:Access: read."] [thread_id=29592]
[2022/03/01 22:52:26.860 +08:00] [ERROR] [<unknown>] ["BaseDaemon:Address not mapped to object."] [thread_id=29592]
[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:0. /tiflash/tiflash() [0x4ec1bb1]"] [thread_id=29592]
[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:1. /lib64/libpthread.so.0(+0xf630) [0x7fad7c48a630]"] [thread_id=29592]
[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:2. /lib64/libgcc_s.so.1(+0xf0f8) [0x7fad7c2740f8]"] [thread_id=29592]
[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:3. /lib64/libgcc_s.so.1(_Unwind_Backtrace+0x69) [0x7fad7c274ff9]"] [thread_id=29592]
[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:4. /tiflash/libtiflash_proxy.so(perf_signal_handler+0x11a) [0x7fad7dc8ef4a]"] [thread_id=29592]
[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:5. /lib64/libpthread.so.0(+0xf630) [0x7fad7c48a630]"] [thread_id=29592]
[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:6. /tiflash/libtiflash_proxy.so(+0x1326741) [0x7fad7e0cb741]"] [thread_id=29592]

2. What did you expect to see? (Required)

3. What did you see instead (Required)

4. What is your TiFlash version? (Required)

master
[type=tiflash] [version=5.5.0-nightly] [git_hash=8a0d0007c4d5c9630fcd0ca11784301466f22070]
[type=pd] [version=5.5.0-nightly] [git_hash=a4b1968c3fe64c555604ca72526a2e6d028b1423]
[type=tikv] [version=5.4.0-alpha] [git_hash=d7ae087ea23e6260f7e8f521b25dc8619eebddcd]
[type=tidb] [version=5.5.0-nightly] [git_hash=f9a324829b71a24141d5326ddc0026f6b12125c3]

@lilinghai lilinghai added the type/bug Issue for bug label Mar 2, 2022
@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Mar 2, 2022

截屏2022-03-02 下午5 23 10

@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Mar 2, 2022

ERROR logs:

150833:[2022/03/01 22:30:41.661 +08:00] [ERROR] [<unknown>] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc, line number: 2882, log msg : ipv4:10.233.119.218:20160: Keepalive watchdog fired. Closing transport."] [thread_id=6]
267428:[2022/03/01 22:32:47.912 +08:00] [ERROR] [<unknown>] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc, line number: 2882, log msg : ipv4:10.233.113.61:20160: Keepalive watchdog fired. Closing transport."] [thread_id=5]
455261:[2022/03/01 22:35:53.514 +08:00] [ERROR] [<unknown>] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc, line number: 2882, log msg : ipv4:10.233.113.61:20160: Keepalive watchdog fired. Closing transport."] [thread_id=7310]
455275:[2022/03/01 22:35:53.611 +08:00] [ERROR] [<unknown>] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc, line number: 2882, log msg : ipv4:10.233.78.90:3930: Keepalive watchdog fired. Closing transport."] [thread_id=6]
455638:[2022/03/01 22:35:56.661 +08:00] [ERROR] [<unknown>] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc, line number: 2882, log msg : ipv4:10.233.84.130:20160: Keepalive watchdog fired. Closing transport."] [thread_id=5]
551093:[2022/03/01 22:37:41.661 +08:00] [ERROR] [<unknown>] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc, line number: 2882, log msg : ipv4:10.233.119.218:20160: Keepalive watchdog fired. Closing transport."] [thread_id=7310]
764906:[2022/03/01 22:41:11.664 +08:00] [ERROR] [<unknown>] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc, line number: 2882, log msg : ipv4:10.233.113.61:20160: Keepalive watchdog fired. Closing transport."] [thread_id=7310]
865003:[2022/03/01 22:42:53.404 +08:00] [ERROR] [<unknown>] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc, line number: 2882, log msg : ipv4:10.233.78.90:3930: Keepalive watchdog fired. Closing transport."] [thread_id=6]
969075:[2022/03/01 22:44:38.580 +08:00] [ERROR] [<unknown>] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc, line number: 2882, log msg : ipv4:10.233.119.218:20160: Keepalive watchdog fired. Closing transport."] [thread_id=7310]
1061828:[2022/03/01 22:46:23.361 +08:00] [ERROR] [<unknown>] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc, line number: 2882, log msg : ipv4:10.233.78.90:3930: Keepalive watchdog fired. Closing transport."] [thread_id=5]
1215860:[2022/03/01 22:49:11.665 +08:00] [ERROR] [<unknown>] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc, line number: 2882, log msg : ipv4:10.233.119.218:20160: Keepalive watchdog fired. Closing transport."] [thread_id=5]
1270655:[2022/03/01 22:50:22.433 +08:00] [ERROR] [<unknown>] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc, line number: 2882, log msg : ipv4:10.233.78.90:3930: Keepalive watchdog fired. Closing transport."] [thread_id=5]
1306962:[2022/03/01 22:50:53.047 +08:00] [ERROR] [<unknown>] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc, line number: 2882, log msg : ipv4:10.233.119.218:20160: Keepalive watchdog fired. Closing transport."] [thread_id=7310]
1359767:[2022/03/01 22:51:42.781 +08:00] [ERROR] [<unknown>] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc, line number: 2882, log msg : ipv4:10.233.78.90:3930: Keepalive watchdog fired. Closing transport."] [thread_id=6]
1360123:[2022/03/01 22:51:46.666 +08:00] [ERROR] [<unknown>] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc, line number: 2882, log msg : ipv4:10.233.84.130:20160: Keepalive watchdog fired. Closing transport."] [thread_id=7310]
1388439:[2022/03/01 22:52:26.860 +08:00] [ERROR] [<unknown>] [BaseDaemon:########################################] [thread_id=29592]
1388440:[2022/03/01 22:52:26.860 +08:00] [ERROR] [<unknown>] ["BaseDaemon:(from thread 29591) Received signal Segmentation fault (11)."] [thread_id=29592]
1388441:[2022/03/01 22:52:26.860 +08:00] [ERROR] [<unknown>] ["BaseDaemon:Address: 0x66000"] [thread_id=29592]
1388442:[2022/03/01 22:52:26.860 +08:00] [ERROR] [<unknown>] ["BaseDaemon:Access: read."] [thread_id=29592]
1388443:[2022/03/01 22:52:26.860 +08:00] [ERROR] [<unknown>] ["BaseDaemon:Address not mapped to object."] [thread_id=29592]
1388445:[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:0. /tiflash/tiflash() [0x4ec1bb1]"] [thread_id=29592]
1388446:[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:1. /lib64/libpthread.so.0(+0xf630) [0x7fad7c48a630]"] [thread_id=29592]
1388447:[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:2. /lib64/libgcc_s.so.1(+0xf0f8) [0x7fad7c2740f8]"] [thread_id=29592]
1388448:[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:3. /lib64/libgcc_s.so.1(_Unwind_Backtrace+0x69) [0x7fad7c274ff9]"] [thread_id=29592]
1388449:[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:4. /tiflash/libtiflash_proxy.so(perf_signal_handler+0x11a) [0x7fad7dc8ef4a]"] [thread_id=29592]
1388450:[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:5. /lib64/libpthread.so.0(+0xf630) [0x7fad7c48a630]"] [thread_id=29592]
1388451:[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:6. /tiflash/libtiflash_proxy.so(+0x1326741) [0x7fad7e0cb741]"] [thread_id=29592]

@YangKeao
Copy link
Member

YangKeao commented Mar 4, 2022

The first three frames were a little wired, we don't know their symbol:

1388445:[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:0. /tiflash/tiflash() [0x4ec1bb1]"] [thread_id=29592]
1388446:[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:1. /lib64/libpthread.so.0(+0xf630) [0x7fad7c48a630]"] [thread_id=29592]
1388447:[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:2. /lib64/libgcc_s.so.1(+0xf0f8) [0x7fad7c2740f8]"] [thread_id=29592]

Could you upload the libpthread.so.0 and the libgcc_s.so.1, so that I could guess the relate functions 🤔 (but not so sure).

@breezewish
Copy link
Member

@crazycs520 Please follow up this thread, disabling the TiFlash conprof before Conprof is released.

@flowbehappy flowbehappy changed the title TiFlash crash with zto workload TiFlash crash with high pressure write and read workload\ Mar 4, 2022
@flowbehappy flowbehappy changed the title TiFlash crash with high pressure write and read workload\ TiFlash crash with high pressure write and read workload Mar 4, 2022
@SchrodingerZhu
Copy link
Contributor

@YangKeao

The first three frames were a little wired, we don't know their symbol:

1388445:[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:0. /tiflash/tiflash() [0x4ec1bb1]"] [thread_id=29592]
1388446:[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:1. /lib64/libpthread.so.0(+0xf630) [0x7fad7c48a630]"] [thread_id=29592]
1388447:[2022/03/01 22:52:26.862 +08:00] [ERROR] [<unknown>] ["BaseDaemon:2. /lib64/libgcc_s.so.1(+0xf0f8) [0x7fad7c2740f8]"] [thread_id=29592]

The stacktrace is generated at TiFlash side. The last several frames should be related to TiFlash's own signal handling. _Unwind_Backtrace resulted in segfault, so the signal is then transferred to TiFlash's segfault handler where the stacktrace context is collected. Hence, you can see 0. /tiflash/tiflash() as the top frame.

@flowbehappy flowbehappy changed the title TiFlash crash with high pressure write and read workload TiFlash crash with high pressure write and read workload with conprof Apr 21, 2022
@mornyx
Copy link

mornyx commented May 18, 2022

Should be fixed due to #4871

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants