Validator crashes node with "Unexpected mutation fragment" on range tombstone change of clustering row #10553
Comments
We need a reproducer with a system left up after this occurs; it's a corruption. https://jenkins.scylladb.com/job/scylla-staging/job/fruch/job/longevity-harry-2h-test/7/
Here's the job (set to keep all the instances); it takes ~1.5 hours to reach the failure.
The crashes have started, for whoever wants to take a look: http://13.49.225.42:3000/d/alternator-master/longevity-harry-2h-test-scylla-per-server-metrics-nemesis-master?orgId=1
I've killed the cluster in staging (since it seems no one is looking at it). Here's a link to start the reproducer when needed (it takes ~1.5h to get to the coredump point):
@mikolajsieluzycki please look into this.
@bhalevy Seems like exactly the same error.
It would help to have an AMI or RPMs with this fix. @benipeled, does the PR build create RPMs? Are they uploaded to S3? Also, can you point @mikolajsieluzycki to the jobs that would build RPMs or AMIs from a fork for him?
The CI job doesn't archive RPMs, only logs (build & tests). BYO can be used for building an RPM and AMI from a fork: https://jenkins.scylladb.com/view/master/job/scylla-master/job/byo/job/byo_build_tests_dtest/
@mikolajsieluzycki / @fruch can we close this issue with #10643?
Waiting for https://jenkins.scylladb.com/view/master/job/scylla-master/job/reproducers/job/longevity-harry-2h-test/lastBuild/console to finish (hopefully I kicked it off correctly). According to the description the error should show up after 1.5h; it's been over 2h since the start, so I'm cautiously optimistic.
The test finished successfully on master; I think this can be closed.
@fruch please consider closing this issue as per the above.
If that test passed with master, then yes, closing this one.
Installation details
Kernel version: 5.13.0-1022-aws
Scylla version (or git commit hash): 5.1.dev-0.20220504.b26a3da584cc with build-id ab2a33a30756c1513f4c516cd272291e75acec0e
Cluster size: 6 nodes (i3.large)
Scylla running with shards number (live nodes):
longevity-harry-2h-fix-cass-db-node-eddd82cc-1 (16.171.62.87 | 10.0.3.241): 2 shards
longevity-harry-2h-fix-cass-db-node-eddd82cc-2 (13.53.37.177 | 10.0.1.86): 2 shards
longevity-harry-2h-fix-cass-db-node-eddd82cc-3 (13.48.26.58 | 10.0.3.223): 2 shards
longevity-harry-2h-fix-cass-db-node-eddd82cc-4 (13.48.71.161 | 10.0.3.236): 2 shards
longevity-harry-2h-fix-cass-db-node-eddd82cc-5 (16.16.27.153 | 10.0.1.98): 2 shards
longevity-harry-2h-fix-cass-db-node-eddd82cc-6 (13.48.1.47 | 10.0.3.109): 2 shards
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0f0e4c1a732cd9815 (aws: eu-north-1)
Test: longevity-harry-2h-test
Test name: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Issue description
While running cassandra-harry (a new test for SCT), after ~1 hour of running it fails with the failure/abort in the title ("Unexpected mutation fragment"). It's 100% reproducible: it failed 3 times in a row with exactly the same failure.
Ops made by cassandra-harry
Example of queries used to insert/update data:
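The actual harry-generated statements weren't preserved here (the full operation log is linked below), but as a rough sketch, with a hypothetical schema and values, the workload mixes timestamped inserts/updates with clustering-range deletes; the range delete is what writes the range tombstone implicated in the crash:

```cql
-- Hypothetical schema and values for illustration only; the real
-- statements are in operation.log.tar.gz linked below.
INSERT INTO harry.table_1 (pk, ck1, ck2, v1)
VALUES (1, 10, 100, 'x') USING TIMESTAMP 1652290000000000;

UPDATE harry.table_1 USING TIMESTAMP 1652290000000001
SET v1 = 'y' WHERE pk = 1 AND ck1 = 10 AND ck2 = 100;

-- A clustering-range delete like this produces a range tombstone,
-- the mutation fragment type named in the abort message.
DELETE FROM harry.table_1 USING TIMESTAMP 1652290000000002
WHERE pk = 1 AND ck1 = 10 AND ck2 > 50 AND ck2 < 150;
```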
Full log of all the operations cassandra-harry was doing (~22 GB deflated):
https://cloudius-jenkins-test.s3.amazonaws.com/77d9f946-9eff-455e-ba63-e4211ff9d8e0/20220511_183754/operation.log.tar.gz
Coredump:
Restore Monitor Stack command:
$ hydra investigate show-monitor eddd82cc-d745-4a4f-afc2-d8ab979c84aa
Restore monitor on AWS instance using Jenkins job
Show all stored logs command:
$ hydra investigate show-logs eddd82cc-d745-4a4f-afc2-d8ab979c84aa
Test id: eddd82cc-d745-4a4f-afc2-d8ab979c84aa
Logs
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/eddd82cc-d745-4a4f-afc2-d8ab979c84aa/20220511_235515/grafana-screenshot-longevity-harry-2h-test-scylla-per-server-metrics-nemesis-20220511_235639-longevity-harry-2h-fix-cass-monitor-node-eddd82cc-1.png
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/eddd82cc-d745-4a4f-afc2-d8ab979c84aa/20220511_235515/grafana-screenshot-overview-20220511_235515-longevity-harry-2h-fix-cass-monitor-node-eddd82cc-1.png
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/eddd82cc-d745-4a4f-afc2-d8ab979c84aa/20220512_000756/db-cluster-eddd82cc.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/eddd82cc-d745-4a4f-afc2-d8ab979c84aa/20220512_000756/loader-set-eddd82cc.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/eddd82cc-d745-4a4f-afc2-d8ab979c84aa/20220512_000756/monitor-set-eddd82cc.tar.gz
sct - https://cloudius-jenkins-test.s3.amazonaws.com/eddd82cc-d745-4a4f-afc2-d8ab979c84aa/20220512_000756/sct-runner-eddd82cc.tar.gz
Jenkins job URL