
Unable to Compact table and compaction causing crash loop. #8071

Closed
1 task done
zooptwopointone opened this issue Feb 11, 2021 · 17 comments
Comments

@zooptwopointone

  • I have read the disclaimer above, and I am reporting a suspected malfunction in Scylla.

Installation details
Scylla version 4.2.1-0.20201108.4fb8ebccff with build-id d19a95fd85a8e7df2928ccd8c729941f263e8adc
Cluster size: 4 Datacenters (3 x 3 nodes 1 x 7 nodes)
OS (RHEL/CentOS/Ubuntu/AWS AMI): 18.04.5 LTS (GNU/Linux 4.15.0-128-generic x86_64)

Platform (physical/VM/cloud instance type/docker): Physical
Hardware: sockets=2, cores=8, hyperthreads=32, memory=128 GB
Disks (SSD/HDD, count): 8 SSD

I have uploaded a section of syslog here: 815ea449-6ab3-4c88-9914-1a0e14162d6f
These are the logs after setting nodetool disableautocompaction for the table, then running a forced compaction for that table about 10 minutes later, to make sure no pending retry compactions would run in between.

I am posting this because the same results happen with the automatically scheduled compactions.

Some history on this table: it has about 3 billion rows. The table design is:
CREATE TABLE lnp.lrn (
country_code text,
did text,
version int,
common_name text,
company text,
country text,
dst text,
last_updated timestamp,
latitude text,
longitude text,
lrn text,
ocn text,
rc text,
state text,
type text,
tz text,
zip text,
zip2 text,
zip3 text,
zip4 text,
PRIMARY KEY ((country_code, did, version))
)

For this table I started a job to clean out some old unused data, related to the version column, which would have deleted two thirds of the data, roughly 2 billion rows. This eventually started causing problems with the cluster: 2 servers got into a crash loop. Sometimes they would start and stay up for about 30 minutes, only to crash again. Once I figured out I could disable auto compaction, that cleared up all the exceptions and stopped the crashes. So I assume this is all caused by the number of tombstones the deletes created. I modified gc_grace_seconds down to 2 hours, and that is where I was attempting the major compactions, to see if they could clear some of the tombstones. But it seems the larger sstable files are just something it is unable to do anything with.
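For reference, the gc_grace_seconds change described above is a one-line schema alteration; a sketch (the 7200-second value matches the table options shown later in this thread):

```cql
-- Lower the tombstone GC grace period to 2 hours (7200 s); tombstones
-- older than this become eligible for purging during compaction.
ALTER TABLE lnp.lrn WITH gc_grace_seconds = 7200;
```

Note the trade-off: any repair must now complete within the shorter window, or deleted data can resurrect (which, per a later comment in this thread, is acceptable in this case).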

I was trying to find out whether there is any kind of offline compaction option, or any option to extend the reactor shard timeout on the node so it could just get the job done.
I found something about it (#4559 #2689), but didn't understand how to make the config changes.

I'll upload more tracebacks etc. from the time of the problem, in hopes you might find other bugs that could be fixed, though it might be a bit of a jumble of logs. Will add it in an additional comment.

@zooptwopointone
Author

Other log info f727da6c-f4b1-43d5-bdbd-4cfdf354e824

@raphaelsc
Member

The log shows you ran out of memory (OOM). Reading from a table with a huge number of tombstones also contributes massively to OOM (#7689). I think compaction is also potentially accumulating lots of tombstones in memory until the partition ends. If that's the case, we should probably deoverlap them to help with this. @denesb

Which compaction strategy are you using? Perhaps tombstone compaction may help with this. It will not solve the problem completely, but it will purge tombstones whose shadowed data is already gone, which may increase the chances of success for the next procedures.
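With SizeTieredCompactionStrategy (which this table turns out to use), single-sstable tombstone compaction is tuned through the strategy's sub-options. A sketch, assuming the standard Cassandra/Scylla option names, with illustrative values:

```cql
-- Consider an sstable for a tombstone-purging compaction once roughly 10%
-- of it is estimated to be droppable tombstones (the default is 20%), and
-- recheck a given sstable at most once per day (86400 s).
ALTER TABLE lnp.lrn WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'tombstone_threshold': '0.1',
    'tombstone_compaction_interval': '86400'
};
```

Even so, a tombstone can only be purged once it is older than gc_grace_seconds and the data it shadows is not present in any overlapping sstable.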

@zooptwopointone
Author

zooptwopointone commented Feb 11, 2021

Ah sorry, the default compaction strategy.

FYI, the server has 64 GB of RAM. I started it reserving 5G for the OS (while the system was in a crash loop).

here is the rest of the table info:
WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
AND comment = ''
AND compaction = {'class': 'SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 7200
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';

@zooptwopointone
Author

I have uploaded a listing of one of the directories for that table that came up in a log.

6b00b903-6d39-4a93-bcd0-e1c9c7aa89c8

@zooptwopointone
Author

Follow-up question on this issue: how should deletes like this be handled? This table will commonly have all of its data replaced every so often, and I have other tables with TTLs that recently started showing some of the same issues. I know setting a lower gc_grace_seconds and compacting is the thing to do, but given the problem I ran into, how can I get through deleting the data without causing an outage? Do I need to delete only a chunk of data at a time and make sure to manually compact it away before continuing? I'm really just looking for a workaround for deleting data when I know it is going to create a large number of tombstones (and in this case I technically don't care about resurrection).

@slivne
Contributor

slivne commented Feb 14, 2021

Based on the schema, partitions are not multiple rows ((country_code, did, version) is the full primary key), so why do we need to accumulate anything? We can "simply" write out a partition or tombstone every time.

@raphaelsc ?

@zooptwopointone
Author

I really would like some help on this, as it has been quite a while now. I have upgraded the cluster; it is running a 4.3 version. If I allow auto compaction of this table, it just crashes the server. If I run a manual compaction, it tries to compact all partitions, or maybe all shards, at the same time, causing more problems. I can provide sstable files or whatever you need, but I would think this is not an isolated issue for anyone who has a table with large amounts of deletes. It seems like compaction should have a concurrency option, or for larger partitions maybe avoid loading them into memory? I am not sure what it is doing, but I get a wall of exceptions about being unable to allocate when it tries to compact. Please let me know what I can provide.

@zooptwopointone
Author

Cluster is now running Scylla version 4.3.2-0.20210301.5cdc1fa66 with build-id 3a7f9a6b65ec73bba0a38f3f349a91c49129c598 starting ...

@raphaelsc
Member

Cluster is now running Scylla version 4.3.2-0.20210301.5cdc1fa66 with build-id 3a7f9a6b65ec73bba0a38f3f349a91c49129c598 starting ...

I think you may be affected by a known bug where OOM happens if the sstable parser bumps into a large run of tombstones.

Could you please run the following script of mine against the table which causes this issue?
Find it here: https://gist.githubusercontent.com/raphaelsc/eaf0ae9352a362df91d1c82f1fff3579/raw/a875ab33d7d6b8ca2341810d2b4a36808da973da/describe_sstable_layout.py

Usage: describe_sstable_layout.py /path/to/table/dir smp_count
where smp_count is equal to the number of CPUs made available to Scylla (can be found in /proc/scylla_pid/cmdline).

@raphaelsc
Member

@zooptwopointone, could you please share the CQL statement used to perform the deletion?

@zooptwopointone
Author

--- SHARD #0 ---
[Run 1629847460834668108110966181638552143826: size: 54.1GiB, partitions: 392833345 rows: 386598614, tombstones: 6239318
{ md-118824-big-Scylla.db: size: 54.1GiB, partitions: 392833345, rows: 386598614, tombstones: 6239318 }
]
[Run 1290312017309147348211643449084511099976: size: 10.2GiB, partitions: 73624891 rows: 73542155, tombstones: 83714
{ md-122432-big-Scylla.db: size: 10.2GiB, partitions: 73624891, rows: 73542155, tombstones: 83714 }
]
[Run 130694075447709078939385396070608123114: size: 6.4GiB, partitions: 58633920 rows: 43543487, tombstones: 15147474
{ md-122424-big-Scylla.db: size: 6.4GiB, partitions: 58633920, rows: 43543487, tombstones: 15147474 }
]
[Run 734606037436270548311064185592928655076: size: 388.2MiB, partitions: 2845192 rows: 2788485, tombstones: 89858
{ md-122328-big-Scylla.db: size: 388.2MiB, partitions: 2845192, rows: 2788485, tombstones: 89858 }
]
[Run 1158624079986172213210863975320215466456: size: 59.8MiB, partitions: 487386 rows: 463689, tombstones: 24081
{ md-122560-big-Scylla.db: size: 59.8MiB, partitions: 487386, rows: 463689, tombstones: 24081 }
]
[Run 880943740974152139412368994826852438203: size: 21.4MiB, partitions: 175842 rows: 158919, tombstones: 17428
{ md-122608-big-Scylla.db: size: 21.4MiB, partitions: 175842, rows: 158919, tombstones: 17428 }
]
[Run 1443203490893098462511277349266388389578: size: 20.2MiB, partitions: 165242 rows: 147060, tombstones: 20333
{ md-122624-big-Scylla.db: size: 20.2MiB, partitions: 165242, rows: 147060, tombstones: 20333 }
]
[Run 1459276387163290339312301942466109385057: size: 255.4KiB, partitions: 2075 rows: 2075, tombstones: 0
{ md-122616-big-Scylla.db: size: 255.4KiB, partitions: 2075, rows: 2075, tombstones: 0 }
]
[Run 672208818542177701513405421118730538828: size: 228.0B, partitions: 1 rows: 1, tombstones: 0
{ md-122600-big-Scylla.db: size: 228.0B, partitions: 1, rows: 1, tombstones: 0 }
]
--- SHARD #1 ---
[Run 1277063588840428724011428495406058663061: size: 64.9GiB, partitions: 461784728 rows: 461770434, tombstones: 18921
{ md-122457-big-Scylla.db: size: 64.9GiB, partitions: 461784728, rows: 461770434, tombstones: 18921 }
]
[Run 175416787535687080410886451533240324530: size: 3.8GiB, partitions: 28295382 rows: 28095787, tombstones: 284249
{ md-122425-big-Scylla.db: size: 3.8GiB, partitions: 28295382, rows: 28095787, tombstones: 284249 }
]
[Run 331169729471619192812997451582810458081: size: 779.2MiB, partitions: 5716040 rows: 5465464, tombstones: 267107
{ md-122441-big-Scylla.db: size: 779.2MiB, partitions: 5716040, rows: 5465464, tombstones: 267107 }
]
[Run 56095715416803038010695744514025144608: size: 140.5MiB, partitions: 1659646 rows: 1002818, tombstones: 657549
{ mc-115697-big-Scylla.db: size: 140.5MiB, partitions: 1659646, rows: 1002818, tombstones: 657549 }
]
[Run 386296719664967052013280871107876300802: size: 17.0MiB, partitions: 144547 rows: 128360, tombstones: 16282
{ md-122545-big-Scylla.db: size: 17.0MiB, partitions: 144547, rows: 128360, tombstones: 16282 }
]
[Run 1722354514646081688812629428753267476915: size: 11.1MiB, partitions: 89305 rows: 85964, tombstones: 3341
{ md-122681-big-Scylla.db: size: 11.1MiB, partitions: 89305, rows: 85964, tombstones: 3341 }
]
[Run 1493016536561590354313180760630784604263: size: 10.9MiB, partitions: 93943 rows: 81917, tombstones: 12026
{ md-122609-big-Scylla.db: size: 10.9MiB, partitions: 93943, rows: 81917, tombstones: 12026 }
]
[Run 1294266537394728469013766486273626577892: size: 9.9MiB, partitions: 81760 rows: 76605, tombstones: 5155
{ md-122633-big-Scylla.db: size: 9.9MiB, partitions: 81760, rows: 76605, tombstones: 5155 }
]
[Run 1618233076837399224711668363939494948562: size: 8.3MiB, partitions: 74312 rows: 59413, tombstones: 16643
{ md-122657-big-Scylla.db: size: 8.3MiB, partitions: 74312, rows: 59413, tombstones: 16643 }
]
[Run 286692062231522802312177893689839259363: size: 180.3KiB, partitions: 1455 rows: 1455, tombstones: 0
{ md-122641-big-Scylla.db: size: 180.3KiB, partitions: 1455, rows: 1455, tombstones: 0 }
]
[Run 1726426611370591960912553415087895405180: size: 126.9KiB, partitions: 1635 rows: 1635, tombstones: 0
{ md-122673-big-Scylla.db: size: 126.9KiB, partitions: 1635, rows: 1635, tombstones: 0 }
]
[Run 773302378121070318613355255736626027921: size: 116.8KiB, partitions: 991 rows: 991, tombstones: 0
{ md-122665-big-Scylla.db: size: 116.8KiB, partitions: 991, rows: 991, tombstones: 0 }
]
[Run 1811919085557620876910949609244711410523: size: 108.6KiB, partitions: 914 rows: 914, tombstones: 0
{ md-122617-big-Scylla.db: size: 108.6KiB, partitions: 914, rows: 914, tombstones: 0 }
]
[Run 151022660946528080412881549664920440847: size: 63.2KiB, partitions: 536 rows: 536, tombstones: 0
{ md-122689-big-Scylla.db: size: 63.2KiB, partitions: 536, rows: 536, tombstones: 0 }
]
[Run 45778799931085705910044935850456080249: size: 47.0KiB, partitions: 405 rows: 405, tombstones: 0
{ md-122649-big-Scylla.db: size: 47.0KiB, partitions: 405, rows: 405, tombstones: 0 }
]
[Run 262311877432365953911304446589544863918: size: 29.2KiB, partitions: 250 rows: 250, tombstones: 0
{ md-122625-big-Scylla.db: size: 29.2KiB, partitions: 250, rows: 250, tombstones: 0 }
]
--- SHARD #2 ---
[Run 974826215019598916311009276546051286313: size: 49.5GiB, partitions: 356394692 rows: 353717637, tombstones: 2684793
{ md-118802-big-Scylla.db: size: 49.5GiB, partitions: 356394692, rows: 353717637, tombstones: 2684793 }
]
[Run 1278893604064882179713197834845721799750: size: 20.0GiB, partitions: 152377738 rows: 141234168, tombstones: 11890344
{ md-122450-big-Scylla.db: size: 20.0GiB, partitions: 152377738, rows: 141234168, tombstones: 11890344 }
]
[Run 24978569774357709579480514948749616244: size: 874.4MiB, partitions: 6466958 rows: 6466958, tombstones: 0
{ md-122418-big-Scylla.db: size: 874.4MiB, partitions: 6466958, rows: 6466958, tombstones: 0 }
]
[Run 154724143849812323009567816970023365520: size: 335.2MiB, partitions: 2510816 rows: 2465097, tombstones: 60557
{ md-122330-big-Scylla.db: size: 335.2MiB, partitions: 2510816, rows: 2465097, tombstones: 60557 }
]
[Run 276977529414571019212076134113681128094: size: 59.7MiB, partitions: 482612 rows: 463951, tombstones: 18955
{ md-122562-big-Scylla.db: size: 59.7MiB, partitions: 482612, rows: 463951, tombstones: 18955 }
]
[Run 23569494802807330810010957287725869811: size: 18.7MiB, partitions: 153397 rows: 137618, tombstones: 16227
{ md-122610-big-Scylla.db: size: 18.7MiB, partitions: 153397, rows: 137618, tombstones: 16227 }
]
[Run 1611892644381683078911593944035802040185: size: 11.5MiB, partitions: 99609 rows: 83879, tombstones: 17742
{ md-122618-big-Scylla.db: size: 11.5MiB, partitions: 99609, rows: 83879, tombstones: 17742 }
]
[Run 871381078574067297912004265057394635993: size: 202.0B, partitions: 1 rows: 1, tombstones: 0
{ md-122602-big-Scylla.db: size: 202.0B, partitions: 1, rows: 1, tombstones: 0 }
]
--- SHARD #3 ---
[Run 1060201772521877016610163875691714161382: size: 53.3GiB, partitions: 389320526 rows: 379689407, tombstones: 9634772
{ md-118827-big-Scylla.db: size: 53.3GiB, partitions: 389320526, rows: 379689407, tombstones: 9634772 }
]
[Run 119030255465958796639538115460825872252: size: 13.0GiB, partitions: 93986885 rows: 93867270, tombstones: 121377
{ md-122435-big-Scylla.db: size: 13.0GiB, partitions: 93986885, rows: 93867270, tombstones: 121377 }
]
[Run 350569952411269800113268037015753766662: size: 3.1GiB, partitions: 31516866 rows: 20609456, tombstones: 10929995
{ md-122427-big-Scylla.db: size: 3.1GiB, partitions: 31516866, rows: 20609456, tombstones: 10929995 }
]
[Run 61657488541360513011939592561420558277: size: 385.9MiB, partitions: 2887396 rows: 2829094, tombstones: 81256
{ md-122331-big-Scylla.db: size: 385.9MiB, partitions: 2887396, rows: 2829094, tombstones: 81256 }
]
[Run 131407046541921655509304353516430445771: size: 59.6MiB, partitions: 485033 rows: 461984, tombstones: 23474
{ md-122571-big-Scylla.db: size: 59.6MiB, partitions: 485033, rows: 461984, tombstones: 23474 }
]
[Run 1568469922774222818810848015455704040225: size: 21.5MiB, partitions: 176103 rows: 159608, tombstones: 16999
{ md-122611-big-Scylla.db: size: 21.5MiB, partitions: 176103, rows: 159608, tombstones: 16999 }
]
[Run 1616926466819058923310502633603777947849: size: 8.7MiB, partitions: 77302 rows: 62546, tombstones: 16536
{ md-122627-big-Scylla.db: size: 8.7MiB, partitions: 77302, rows: 62546, tombstones: 16536 }
]
[Run 662302870301201435813295348556723513737: size: 145.5KiB, partitions: 1151 rows: 1151, tombstones: 0
{ md-122619-big-Scylla.db: size: 145.5KiB, partitions: 1151, rows: 1151, tombstones: 0 }
]
[Run 842400839984442600712396073374902555596: size: 234.0B, partitions: 1 rows: 1, tombstones: 0
{ md-122603-big-Scylla.db: size: 234.0B, partitions: 1, rows: 1, tombstones: 0 }
]
--- SHARD #4 ---
[Run 160614278061291778769841160962617517063: size: 53.8GiB, partitions: 392847737 rows: 384196722, tombstones: 8654201
{ md-118828-big-Scylla.db: size: 53.8GiB, partitions: 392847737, rows: 384196722, tombstones: 8654201 }
]
[Run 3802468929347831111689767440669092148: size: 12.2GiB, partitions: 100043714 rows: 85083309, tombstones: 15006132
{ md-122460-big-Scylla.db: size: 12.2GiB, partitions: 100043714, rows: 85083309, tombstones: 15006132 }
]
[Run 1799254720075289763713158740901750550417: size: 3.7GiB, partitions: 27055781 rows: 26998991, tombstones: 57955
{ md-122476-big-Scylla.db: size: 3.7GiB, partitions: 27055781, rows: 26998991, tombstones: 57955 }
]
[Run 1655395737262013201913183524533408529892: size: 843.2MiB, partitions: 6059693 rows: 6014664, tombstones: 45725
{ md-122468-big-Scylla.db: size: 843.2MiB, partitions: 6059693, rows: 6014664, tombstones: 45725 }
]
[Run 184062633313221027411263695479710293607: size: 21.5MiB, partitions: 176524 rows: 159724, tombstones: 17299
{ md-122612-big-Scylla.db: size: 21.5MiB, partitions: 176524, rows: 159724, tombstones: 17299 }
]
[Run 179735585680163650529279731002620736006: size: 16.7MiB, partitions: 142223 rows: 126063, tombstones: 16242
{ md-122588-big-Scylla.db: size: 16.7MiB, partitions: 142223, rows: 126063, tombstones: 16242 }
]
[Run 858394272906538288811855983606047607545: size: 8.5MiB, partitions: 76017 rows: 61221, tombstones: 16651
{ md-122628-big-Scylla.db: size: 8.5MiB, partitions: 76017, rows: 61221, tombstones: 16651 }
]
[Run 20147477081255618089821557565619670380: size: 136.0KiB, partitions: 1071 rows: 1071, tombstones: 0
{ md-122620-big-Scylla.db: size: 136.0KiB, partitions: 1071, rows: 1071, tombstones: 0 }
]
--- SHARD #5 ---
[Run 1026131391547475735610889254333553910854: size: 43.0GiB, partitions: 328599429 rows: 300228259, tombstones: 28471747
{ md-122517-big-Scylla.db: size: 43.0GiB, partitions: 328599429, rows: 300228259, tombstones: 28471747 }
]
[Run 340805835634253831911264487287925899633: size: 27.5GiB, partitions: 203857390 rows: 203789179, tombstones: 135321
{ mc-108029-big-Scylla.db: size: 27.5GiB, partitions: 203857390, rows: 203789179, tombstones: 135321 }
]
[Run 792710201618623986313788736177107024715: size: 3.6GiB, partitions: 26593644 rows: 26429639, tombstones: 195138
{ md-122469-big-Scylla.db: size: 3.6GiB, partitions: 26593644, rows: 26429639, tombstones: 195138 }
]
[Run 1135998489964191771310817941034968576013: size: 762.6MiB, partitions: 5625219 rows: 5375809, tombstones: 282531
{ md-122501-big-Scylla.db: size: 762.6MiB, partitions: 5625219, rows: 5375809, tombstones: 282531 }
]
[Run 78511541735275097811692931083218904705: size: 143.6MiB, partitions: 8109944 rows: 0, tombstones: 8109944
{ mc-115565-big-Scylla.db: size: 143.6MiB, partitions: 8109944, rows: 0, tombstones: 8109944 }
]
[Run 63909358709262440769742108126426970885: size: 135.7MiB, partitions: 1381223 rows: 1004787, tombstones: 377181
{ mc-115701-big-Scylla.db: size: 135.7MiB, partitions: 1381223, rows: 1004787, tombstones: 377181 }
]
[Run 109011049374624163379276931418253317824: size: 100.4MiB, partitions: 5695068 rows: 0, tombstones: 5695068
{ mc-115253-big-Scylla.db: size: 100.4MiB, partitions: 5695068, rows: 0, tombstones: 5695068 }
]
[Run 457901817595037431011606466205707679511: size: 95.1MiB, partitions: 5399639 rows: 0, tombstones: 5399639
{ mc-115309-big-Scylla.db: size: 95.1MiB, partitions: 5399639, rows: 0, tombstones: 5399639 }
]
[Run 1138824249319727483112372891356088238923: size: 94.0MiB, partitions: 5325093 rows: 0, tombstones: 5325093
{ mc-115557-big-Scylla.db: size: 94.0MiB, partitions: 5325093, rows: 0, tombstones: 5325093 }
]
[Run 1280855350466905875113169827237065166642: size: 21.5MiB, partitions: 176600 rows: 159969, tombstones: 17124
{ md-122613-big-Scylla.db: size: 21.5MiB, partitions: 176600, rows: 159969, tombstones: 17124 }
]
[Run 831980833457079983212436885535639972254: size: 16.9MiB, partitions: 144536 rows: 127642, tombstones: 18616
{ md-122493-big-Scylla.db: size: 16.9MiB, partitions: 144536, rows: 127642, tombstones: 18616 }
]
[Run 190183552194604567011541961618862021216: size: 11.5MiB, partitions: 92022 rows: 88611, tombstones: 3426
{ md-122637-big-Scylla.db: size: 11.5MiB, partitions: 92022, rows: 88611, tombstones: 3426 }
]
[Run 1079113611789443220111763425665807191692: size: 8.5MiB, partitions: 75339 rows: 60709, tombstones: 16706
{ md-122629-big-Scylla.db: size: 8.5MiB, partitions: 75339, rows: 60709, tombstones: 16706 }
]
[Run 867615195404437113011711845414948803335: size: 63.1KiB, partitions: 498 rows: 498, tombstones: 0
{ md-122621-big-Scylla.db: size: 63.1KiB, partitions: 498, rows: 498, tombstones: 0 }
]
[Run 1386415308016387916611649855159835193028: size: 48.3KiB, partitions: 416 rows: 416, tombstones: 0
{ md-122541-big-Scylla.db: size: 48.3KiB, partitions: 416, rows: 416, tombstones: 0 }
]
--- SHARD #6 ---
[Run 82770852636385145929375125739809497863: size: 49.9GiB, partitions: 357731794 rows: 357491766, tombstones: 302327
{ md-118982-big-Scylla.db: size: 49.9GiB, partitions: 357731794, rows: 357491766, tombstones: 302327 }
]
[Run 218771069468936160013083232191152336649: size: 12.9GiB, partitions: 95964231 rows: 91688964, tombstones: 4283042
{ md-122486-big-Scylla.db: size: 12.9GiB, partitions: 95964231, rows: 91688964, tombstones: 4283042 }
]
[Run 644888861850837206013308089896356299297: size: 3.9GiB, partitions: 28867329 rows: 28679004, tombstones: 215933
{ md-122462-big-Scylla.db: size: 3.9GiB, partitions: 28867329, rows: 28679004, tombstones: 215933 }
]
[Run 1598647542322300720411725484015543773939: size: 3.9GiB, partitions: 29795197 rows: 29786821, tombstones: 1537684
{ mc-113542-big-Scylla.db: size: 3.9GiB, partitions: 29795197, rows: 29786821, tombstones: 1537684 }
]
[Run 654361819390877801911677813079497369977: size: 430.5MiB, partitions: 3241410 rows: 3058310, tombstones: 630565
{ md-118694-big-Scylla.db: size: 430.5MiB, partitions: 3241410, rows: 3058310, tombstones: 630565 }
]
[Run 153627202312687655209625240275481956854: size: 272.0MiB, partitions: 2041546 rows: 1960478, tombstones: 214625
{ md-122334-big-Scylla.db: size: 272.0MiB, partitions: 2041546, rows: 1960478, tombstones: 214625 }
]
[Run 1789567427245521383310893406059017239520: size: 178.1MiB, partitions: 8036853 rows: 451326, tombstones: 7585527
{ mc-115270-big-Scylla.db: size: 178.1MiB, partitions: 8036853, rows: 451326, tombstones: 7585527 }
]
[Run 1766533142143095712412530623927498987320: size: 138.4MiB, partitions: 1539984 rows: 1005131, tombstones: 535535
{ mc-115702-big-Scylla.db: size: 138.4MiB, partitions: 1539984, rows: 1005131, tombstones: 535535 }
]
[Run 17357451112741132029745538287695149616: size: 109.8MiB, partitions: 6214431 rows: 0, tombstones: 6214431
{ mc-115566-big-Scylla.db: size: 109.8MiB, partitions: 6214431, rows: 0, tombstones: 6214431 }
]
[Run 135203240785798962789267429734225411458: size: 80.4MiB, partitions: 4556929 rows: 0, tombstones: 4556929
{ mc-115606-big-Scylla.db: size: 80.4MiB, partitions: 4556929, rows: 0, tombstones: 4556929 }
]
[Run 438985362338717728711263027508299083493: size: 74.8MiB, partitions: 4223100 rows: 0, tombstones: 4223100
{ mc-115518-big-Scylla.db: size: 74.8MiB, partitions: 4223100, rows: 0, tombstones: 4223100 }
]
[Run 182036368226112182649280405687788549728: size: 34.1MiB, partitions: 287976 rows: 273093, tombstones: 15033
{ md-122550-big-Scylla.db: size: 34.1MiB, partitions: 287976, rows: 273093, tombstones: 15033 }
]
[Run 1149919981298399882411628627400120260579: size: 23.1MiB, partitions: 190899 rows: 174370, tombstones: 18781
{ md-122542-big-Scylla.db: size: 23.1MiB, partitions: 190899, rows: 174370, tombstones: 18781 }
]
[Run 922122591506938570913589570143023849474: size: 20.3MiB, partitions: 165804 rows: 147639, tombstones: 20078
{ md-122630-big-Scylla.db: size: 20.3MiB, partitions: 165804, rows: 147639, tombstones: 20078 }
]
[Run 1097961759324158362712220759904315272889: size: 11.2MiB, partitions: 95844 rows: 83710, tombstones: 12134
{ md-122614-big-Scylla.db: size: 11.2MiB, partitions: 95844, rows: 83710, tombstones: 12134 }
]
[Run 881989775792369206011284340047525599637: size: 10.1MiB, partitions: 83746 rows: 78504, tombstones: 5242
{ md-122622-big-Scylla.db: size: 10.1MiB, partitions: 83746, rows: 78504, tombstones: 5242 }
]
[Run 988530393258906792510850441020384038766: size: 1.7MiB, partitions: 16742 rows: 15920, tombstones: 822
{ md-122558-big-Scylla.db: size: 1.7MiB, partitions: 16742, rows: 15920, tombstones: 822 }
]
[Run 1260565006240266756712602117119660707128: size: 57.1KiB, partitions: 484 rows: 484, tombstones: 0
{ md-122638-big-Scylla.db: size: 57.1KiB, partitions: 484, rows: 484, tombstones: 0 }
]
[Run 139342745429733055849432920698140031253: size: 210.0B, partitions: 1 rows: 1, tombstones: 0
{ md-122598-big-Scylla.db: size: 210.0B, partitions: 1, rows: 1, tombstones: 0 }
]
--- SHARD #7 ---
[Run 599391261637517562012695267731249051667: size: 53.2GiB, partitions: 380377250 rows: 380370782, tombstones: 10548
{ md-118807-big-Scylla.db: size: 53.2GiB, partitions: 380377250, rows: 380370782, tombstones: 10548 }
]
[Run 171637467946705419459264470375780772165: size: 11.6GiB, partitions: 83660506 rows: 83577672, tombstones: 84090
{ md-122439-big-Scylla.db: size: 11.6GiB, partitions: 83660506, rows: 83577672, tombstones: 84090 }
]
[Run 573971111935884056011341319850194202116: size: 3.8GiB, partitions: 28037120 rows: 27835848, tombstones: 237227
{ md-122431-big-Scylla.db: size: 3.8GiB, partitions: 28037120, rows: 27835848, tombstones: 237227 }
]
[Run 695107957074278903813594906105666389453: size: 754.8MiB, partitions: 5583915 rows: 5294255, tombstones: 308016
{ md-122447-big-Scylla.db: size: 754.8MiB, partitions: 5583915, rows: 5294255, tombstones: 308016 }
]
[Run 246922541930127643810225761733909824052: size: 149.1MiB, partitions: 2141925 rows: 1005014, tombstones: 1137640
{ mc-115703-big-Scylla.db: size: 149.1MiB, partitions: 2141925, rows: 1005014, tombstones: 1137640 }
]
[Run 27959908695454682249582922795285225876: size: 59.8MiB, partitions: 487030 rows: 463247, tombstones: 24197
{ md-122567-big-Scylla.db: size: 59.8MiB, partitions: 487030, rows: 463247, tombstones: 24197 }
]
[Run 878745536757840458411882101846464459105: size: 20.9MiB, partitions: 171736 rows: 155177, tombstones: 17010
{ md-122615-big-Scylla.db: size: 20.9MiB, partitions: 171736, rows: 155177, tombstones: 17010 }
]
[Run 1201660877139006425910449069359591752717: size: 9.3MiB, partitions: 81572 rows: 66344, tombstones: 17079
{ md-122623-big-Scylla.db: size: 9.3MiB, partitions: 81572, rows: 66344, tombstones: 17079 }
]
[Run 736450673856743386813473352236641299689: size: 313.0B, partitions: 2 rows: 2, tombstones: 0
{ md-122607-big-Scylla.db: size: 313.0B, partitions: 2, rows: 2, tombstones: 0 }
]

@zooptwopointone
Author

zooptwopointone commented May 14, 2021

The delete statement is just a standard delete on the partition key, as the full primary key is the partition key.
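Given the schema earlier in the thread, such a delete targets one partition at a time; a sketch, with hypothetical key values:

```cql
-- Full-partition delete: the WHERE clause names the entire partition key
-- (country_code, did, version), producing one partition tombstone.
DELETE FROM lnp.lrn WHERE country_code = 'US' AND did = '5551234567' AND version = 2;
```

Since primary key and partition key coincide here, deleting ~2 billion rows this way produces on the order of 2 billion partition tombstones, matching the all-tombstone sstables visible in the shard listing above.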

@zooptwopointone
Author

Also FYI, I have not yet deleted all the data I need to delete; probably only 20% or so has been deleted.

@raphaelsc
Member

This is very likely an OOM bug (already fixed in master) that occurs when compaction processes a large run of partition tombstones in the input sstables.

@raphaelsc
Member

Sending a patch to fix the bug in older branches...

avikivity pushed a commit that referenced this issue May 20, 2021
… large run of partition tombstones

mp_row_consumer will not stop consuming large run of partition
tombstones, until a live row is found which will allow the consumer
to stop proceeding. So partition tombstones, from a large run, are
all accumulated in memory, leading to OOM and stalls.
The fix is about stopping the consumer if buffer is full, to allow
the produced fragments to be consumed by sstable writer.

Fixes #8071.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210514202640.346594-1-raphaelsc@scylladb.com>


Upstream fix: db4b921
avikivity pushed a commit that referenced this issue May 20, 2021
… large run of partition tombstones

(cherry picked from commit 2b29568)

Upstream fix: db4b921
@tzach
Contributor

tzach commented Jun 10, 2021

@raphaelsc can we close this issue?

@raphaelsc
Member

I think so. @zooptwopointone, please upgrade to a 4.3 or 4.4 version once a minor release containing the fix is available. Closing this...
