Unable to Compact table and compaction causing crash loop. #8071
Other log info f727da6c-f4b1-43d5-bdbd-4cfdf354e824
The log shows you ran out of memory (OOM). Reading from a table with a huge number of tombstones also contributes massively to OOM (#7689). I think compaction is also potentially accumulating lots of tombstones in memory until the partition ends. If that's the case, we should probably deoverlap them to help with this. @denesb which compaction strategy are you using? Perhaps tombstone compaction may help with this; it will not solve the problem completely, but it will purge tombstones whose shadowed data is already gone, which may increase the chances of success for the next procedures.
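For reference, turning on the tombstone-driven compaction knobs would look roughly like the sketch below; the sub-option names follow the usual STCS options (tombstone_threshold, tombstone_compaction_interval) and the values are only illustrative, not a recommendation:
ALTER TABLE lnp.lrn WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'tombstone_threshold': '0.3',            -- illustrative: consider an sstable once ~30% of it is droppable tombstones
    'tombstone_compaction_interval': '3600'  -- illustrative: at most one such single-sstable compaction per hour per sstable
};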
Ah sorry, default compaction strategy. FYI the server has 64 GB of RAM; I started it with 5 GB reserved for the OS (while the system was in a loop). Here is the rest of the table info:
I have uploaded a listing of one of the directories that came up in the log for that table. 6b00b903-6d39-4a93-bcd0-e1c9c7aa89c8
A follow-up to this issue is how to handle deletes like this. This table will commonly have all of its data replaced every so often. I have other tables as well that use TTLs, which seem to have recently started having some of the same issues. I know setting a lower gc_grace_seconds and compacting is a thing to do, but given that I ran into this problem, I can't get through deleting data without causing an outage. Do I need to keep track of only deleting some chunk of data and making sure to manually compact it out before continuing? Really just looking for a workaround for getting things deleted where I know it is going to cause a large number of tombstones. (And in this case I technically don't care about resurrection.)
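For concreteness, the kind of workaround described above would be roughly the sketch below; the 2-hour gc_grace_seconds value and the chunk-then-compact pacing are assumptions for illustration, not a confirmed procedure:
ALTER TABLE lnp.lrn WITH gc_grace_seconds = 7200;  -- assumption: shorten the grace period so tombstones become purgeable sooner
-- ...delete one bounded chunk of partitions...
-- ...then force a per-table compaction (e.g. nodetool compact lnp lrn) before deleting the next chunk...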
Based on the schema, partitions are not multiple rows.
I would really like some help on this, as it has been quite a while now. I have upgraded the cluster; it is running a 4.3 version. If I allow the auto compaction of this table it just crashes the server. If I run a manual compaction it tries to compact all partitions, or maybe all shards, at the same time, causing more problems. I can provide sstable files or whatever you need, but I would think this is not an isolated issue for anyone who has a table with large amounts of deletes. It would seem that maybe compaction should have a concurrency option, or for larger partitions maybe not load them into memory? I am not sure what it is doing, but I get a wall of exceptions about being unable to alloc when it is trying to compact. Please let me know what I can provide.
Cluster is now running Scylla version 4.3.2-0.20210301.5cdc1fa66 with build-id 3a7f9a6b65ec73bba0a38f3f349a91c49129c598 starting ...
I think you may be affected by a known bug where OOM happens if the sstable parser bumps into a large run of tombstones. Could you please run the following script of mine against the table which causes this issue?
@zooptwopointone could you please share the CQL statement used to perform the deletion?
--- SHARD #0 ---
The delete statement is just a standard delete on the partition key, as the full primary key is the partition key.
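I.e., presumably something of this shape, given that the full composite partition key is (country_code, did, version):
DELETE FROM lnp.lrn WHERE country_code = ? AND did = ? AND version = ?;  -- placeholders stand in for the actual key values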
Also, FYI, I have not deleted all the data I need to delete; probably 20% or so might have been deleted.
This is very likely an OOM bug (already fixed in master) when processing a large run of partition tombstones inside the input sstables in compaction. |
Sending a patch to fix the bug in older branches... |
… large run of partition tombstones
mp_row_consumer will not stop consuming large run of partition tombstones, until a live row is found which will allow the consumer to stop proceeding. So partition tombstones, from a large run, are all accumulated in memory, leading to OOM and stalls. The fix is about stopping the consumer if buffer is full, to allow the produced fragments to be consumed by sstable writer. Fixes #8071.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210514202640.346594-1-raphaelsc@scylladb.com>
Upstream fix: db4b921
(cherry picked from commit 2b29568)
@raphaelsc can we close this issue?
I think so. @zooptwopointone please upgrade to a 4.3 or 4.4 version once a minor release containing the fix is available. Closing this...
Installation details
Scylla version 4.2.1-0.20201108.4fb8ebccff with build-id d19a95fd85a8e7df2928ccd8c729941f263e8adc
Cluster size: 4 Datacenters (3 x 3 nodes 1 x 7 nodes)
OS (RHEL/CentOS/Ubuntu/AWS AMI): Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-128-generic x86_64)
Platform (physical/VM/cloud instance type/docker): Physical
Hardware: sockets=2, cores=8, hyperthreading=32, memory=128 GB
Disks (SSD/HDD, count): 8 SSD
I have uploaded a section of syslog here: 815ea449-6ab3-4c88-9914-1a0e14162d6f
These are the logs from after setting
nodetool disableautocompaction
for the table, and then running a forced compaction for that table after about 10 minutes, to make sure that any waiting retry compactions were not going to run. I am posting this because the same results happen from the auto-running compactions.
So some history of what this is. This table has about 3 billion rows. Table design is like this:
CREATE TABLE lnp.lrn (
country_code text,
did text,
version int,
common_name text,
company text,
country text,
dst text,
last_updated timestamp,
latitude text,
longitude text,
lrn text,
ocn text,
rc text,
state text,
type text,
tz text,
zip text,
zip2 text,
zip3 text,
zip4 text,
PRIMARY KEY ((country_code, did, version))
);
So for this table I started a job to clean out some old unused data, related to the version column, which would have deleted two thirds of the data, roughly 2 billion rows. This eventually started causing problems with the cluster. Two servers started getting into a crash loop. Sometimes they would start and stay up for about 30 minutes, only to crash again. Once I figured out that I could disable the auto compaction, that cleared up all the exceptions and stopped the systems from crashing. So I would assume this is all because of the number of tombstones this would have created. I did modify the gc_grace_seconds to 2 hours, and this is when I was attempting the major compactions to see if they could clear up some of them. But it would seem it is just unable to do anything with the larger sstable files.
I was trying to search around for any type of offline compaction option, or any type of option to extend the reactor shard timeout on the node so it could just get the job completed.
I found something about it, but didn't understand how to do the config changes. #4559 #2689
I'll upload more tracebacks etc. from during the time of the problem in hopes you might find other bugs that could be fixed, but it might be a bit of a jumble of logs. Will add it in an additional comment.