Segmentation fault and core dump during restart and resharding #9609
Another segmentation fault and core dump happened later on, during the enospc nemesis on node 10:
Reproduced in the next run of the 50gb longevity test as well:
2021-11-14 13:08:37.761 <2021-11-14 13:07:09.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=23badec6-5f41-4e21-bc76-e6cfe6ff55c7: type=SEGMENTATION regex=segmentation line_number=670467 node=Node longevity-tls-50gb-3d-master-db-node-1e8c5caf-7 [13.53.182.234 | 10.0.2.237] (seed: False)
Installation details
Kernel version:
5.4.0-1035-aws
Scylla version (or git commit hash):
4.6.dev-0.20211101.9e8fc6358 with build-id 62d8c292f9ddc15e61ebd6cd609ac91b80e1532a
Cluster size: 6 nodes (i3.4xlarge)
Scylla running with shards number (live nodes):
longevity-tls-50gb-3d-master-db-node-c5e416b4-1 (13.51.174.230 | 10.0.3.7): 14 shards
longevity-tls-50gb-3d-master-db-node-c5e416b4-3 (13.53.106.2 | 10.0.3.237): 14 shards
longevity-tls-50gb-3d-master-db-node-c5e416b4-5 (13.51.64.197 | 10.0.2.160): 14 shards
longevity-tls-50gb-3d-master-db-node-c5e416b4-7 (13.49.75.142 | 10.0.2.234): 14 shards
longevity-tls-50gb-3d-master-db-node-c5e416b4-11 (13.51.48.138 | 10.0.0.54): 14 shards
longevity-tls-50gb-3d-master-db-node-c5e416b4-12 (13.48.70.142 | 10.0.3.91): 14 shards
Scylla running with shards number (terminated nodes):
longevity-tls-50gb-3d-master-db-node-c5e416b4-6 (16.170.157.20 | 10.0.3.19): 14 shards
longevity-tls-50gb-3d-master-db-node-c5e416b4-2 (13.49.72.45 | 10.0.3.58): 14 shards
longevity-tls-50gb-3d-master-db-node-c5e416b4-4 (13.49.76.98 | 10.0.3.158): 14 shards
longevity-tls-50gb-3d-master-db-node-c5e416b4-8 (13.53.36.23 | 10.0.3.76): 14 shards
longevity-tls-50gb-3d-master-db-node-c5e416b4-10 (13.51.157.60 | 10.0.0.137): 14 shards
longevity-tls-50gb-3d-master-db-node-c5e416b4-9 (13.51.86.152 | 10.0.3.213): 14 shards
OS (RHEL/CentOS/Ubuntu/AWS AMI):
ami-01ab7ab1a92d26fed
(aws: eu-north-1)
Test:
longevity-50gb-3days
Test name:
longevity_test.LongevityTest.test_custom_time
Test config file(s):
Issue description
====================================
Scenario:
Run "restart Scylla with resharding" on node 2.
Node 2 hit a segmentation fault and dumped core.
Error details:
Download the core dump with:
gsutil cp gs://upload.scylladb.com/core.scylla.113.4136e1ecff8847b3970f04a50bd543ec.12341.1636099593000000000000/core.scylla.113.4136e1ecff8847b3970f04a50bd543ec.12341.1636099593000000000000.gz .
gunzip /var/lib/systemd/coredump/core.scylla.113.4136e1ecff8847b3970f04a50bd543ec.12341.1636099593000000000000.gz
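The two commands above can be combined into a short dry-run sketch that derives the local file names and prints the commands to run (the gdb step and the scylla binary path are illustrative assumptions, not part of the original instructions; a matching build-id binary is required to get usable backtraces):

```shell
# Dry-run sketch: print the commands to fetch, unpack, and inspect the core dump.
# Assumes gsutil is installed and authenticated on the machine where you run them.
OBJ="gs://upload.scylladb.com/core.scylla.113.4136e1ecff8847b3970f04a50bd543ec.12341.1636099593000000000000/core.scylla.113.4136e1ecff8847b3970f04a50bd543ec.12341.1636099593000000000000.gz"
GZ="$(basename "$OBJ")"   # local .gz file name after `gsutil cp OBJ .`
CORE="${GZ%.gz}"          # uncompressed core file name produced by gunzip

echo "gsutil cp $OBJ ."
echo "gunzip $GZ"
# Hypothetical inspection step: /path/to/scylla must match the dump's build-id.
echo "gdb /path/to/scylla $CORE -ex 'thread apply all bt' -ex quit"
```

Note that gunzip strips the `.gz` suffix in place, so the core file to pass to a debugger is the object name without its `.gz` extension.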
====================================
Restore Monitor Stack command:
$ hydra investigate show-monitor c5e416b4-63c6-458e-b8d2-87bef019c6fc
Show all stored logs command:
$ hydra investigate show-logs c5e416b4-63c6-458e-b8d2-87bef019c6fc
Test id:
c5e416b4-63c6-458e-b8d2-87bef019c6fc
Logs:
critical - https://cloudius-jenkins-test.s3.amazonaws.com/c5e416b4-63c6-458e-b8d2-87bef019c6fc/20211107_062411/critical.log.tar.gz
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/c5e416b4-63c6-458e-b8d2-87bef019c6fc/20211107_062411/db-cluster-c5e416b4.tar.gz
debug - https://cloudius-jenkins-test.s3.amazonaws.com/c5e416b4-63c6-458e-b8d2-87bef019c6fc/20211107_062411/debug.log.tar.gz
email_data - https://cloudius-jenkins-test.s3.amazonaws.com/c5e416b4-63c6-458e-b8d2-87bef019c6fc/20211107_062411/email_data.json.tar.gz
error - https://cloudius-jenkins-test.s3.amazonaws.com/c5e416b4-63c6-458e-b8d2-87bef019c6fc/20211107_062411/error.log.tar.gz
event - https://cloudius-jenkins-test.s3.amazonaws.com/c5e416b4-63c6-458e-b8d2-87bef019c6fc/20211107_062411/events.log.tar.gz
left_processes - https://cloudius-jenkins-test.s3.amazonaws.com/c5e416b4-63c6-458e-b8d2-87bef019c6fc/20211107_062411/left_processes.log.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/c5e416b4-63c6-458e-b8d2-87bef019c6fc/20211107_062411/loader-set-c5e416b4.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/c5e416b4-63c6-458e-b8d2-87bef019c6fc/20211107_062411/monitor-set-c5e416b4.tar.gz
normal - https://cloudius-jenkins-test.s3.amazonaws.com/c5e416b4-63c6-458e-b8d2-87bef019c6fc/20211107_062411/normal.log.tar.gz
output - https://cloudius-jenkins-test.s3.amazonaws.com/c5e416b4-63c6-458e-b8d2-87bef019c6fc/20211107_062411/output.log.tar.gz
event - https://cloudius-jenkins-test.s3.amazonaws.com/c5e416b4-63c6-458e-b8d2-87bef019c6fc/20211107_062411/raw_events.log.tar.gz
summary - https://cloudius-jenkins-test.s3.amazonaws.com/c5e416b4-63c6-458e-b8d2-87bef019c6fc/20211107_062411/summary.log.tar.gz
warning - https://cloudius-jenkins-test.s3.amazonaws.com/c5e416b4-63c6-458e-b8d2-87bef019c6fc/20211107_062411/warning.log.tar.gz
Jenkins job URL