cloud_storage: holding open files prevents filesystem reclaiming space (full cache drive) #3334
Comments
Maybe we're putting too much pressure on the filesystem by keeping the random prefix here. I think it's safe to just remove it.
Deleting directories didn't free any blocks, but killing redpanda released all of them, bringing `df` roughly back in line with `du`, so it looks like redpanda was failing to close some files.
Maybe there were a lot of .part files? It looks like there were only 39 completed downloads but more than 100K files total. Those other files could only be .part files, which are incomplete downloads.
.part files would have shown up in the file listing. Because the cache cleanup function runs independently, it makes sense that nothing is poking things like the reader cache to close their open handles, although the sheer number was surprising (a 750GB filesystem filled up with a mixture of 1GB and 64MB segments).
Got it, so we open files without closing them properly in some cases, and they get deleted from the cache but the inodes and data blocks are not freed.
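For context, this is standard POSIX behaviour rather than anything redpanda-specific. A minimal sketch (hypothetical file name, not redpanda code) showing how an unlinked-but-still-open file keeps its blocks allocated, which is exactly the `du`/`df` divergence seen here:

```cpp
// Minimal sketch, not redpanda code: unlinking a file removes its directory
// entry (so `du` no longer counts it), but the kernel keeps the inode and
// data blocks allocated until the last open descriptor is closed (so `df`
// still reports the space as used).
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <vector>

int main() {
    int fd = ::open("segment.bin", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { std::perror("open"); return 1; }

    std::vector<char> payload(1 << 20, 'x'); // 1 MiB of data
    if (::write(fd, payload.data(), payload.size()) < 0) { std::perror("write"); }
    ::fsync(fd);

    ::unlink("segment.bin"); // directory entry gone: du drops, df does not
    std::puts("file unlinked but descriptor still open; compare du vs df now");
    ::sleep(30);             // window to observe the divergence from a shell

    ::close(fd);             // only now can the filesystem reclaim the blocks
    return 0;
}
```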
From my test:

But it contains info about an old topic with a 10 MB segment size.
The find output is larger in the second case because it prints all subdirectories, and we don't delete directories from the cache.
Another interesting thing is that the file names are different, since we added the term to the name, but the cache is still able to find and delete them anyway.
On 21.11.3-si-beta5, I can reproduce the disk space leak (i.e. divergence between `du` and `df`). I'm going to work on a direct automated test of this, with whatever instrumentation we need to check it directly without waiting for large clusters to run for some time.
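A hedged sketch of what such instrumentation could look like (not the actual test; the cache path and the 2x threshold are illustrative assumptions): compare the bytes still reachable through directory entries against what the filesystem reports as used, which exposes the leak directly.

```cpp
// Hedged sketch, not the actual test: sum the sizes of everything still linked
// under the cache directory (roughly what `du` reports) and compare it against
// the space the filesystem says is consumed (roughly what `df` reports). A
// large gap means blocks are being held by deleted-but-still-open files.
#include <cstdint>
#include <cstdio>
#include <filesystem>
#include <iostream>
#include <sys/statvfs.h>

namespace fs = std::filesystem;

int main(int argc, char** argv) {
    const fs::path cache_dir =
      argc > 1 ? fs::path(argv[1]) : fs::path("/var/lib/redpanda/cloud_storage_cache");

    // "du"-like view: bytes reachable through directory entries.
    std::uintmax_t linked_bytes = 0;
    for (const auto& entry : fs::recursive_directory_iterator(cache_dir)) {
        if (entry.is_regular_file()) {
            linked_bytes += entry.file_size();
        }
    }

    // "df"-like view: bytes consumed on the filesystem backing the cache
    // (assumes the cache lives on its own drive, as in this report).
    struct statvfs vfs{};
    if (::statvfs(cache_dir.c_str(), &vfs) != 0) {
        std::perror("statvfs");
        return 1;
    }
    const std::uintmax_t used_bytes =
      static_cast<std::uintmax_t>(vfs.f_blocks - vfs.f_bfree) * vfs.f_frsize;

    std::cout << "linked (du-like): " << linked_bytes << " bytes\n"
              << "used   (df-like): " << used_bytes << " bytes\n";

    // The failure mode in this issue is a divergence of hundreds of GB, so a
    // simple 2x threshold is enough to flag it.
    return used_bytes > linked_bytes * 2 ? 2 : 0;
}
```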
How big is the difference between the df and du numbers? Currently, cache eviction can remove the segment which is being used at the moment. Maybe this is what is happening here.
> On Tue, Jan 4, 2022, 16:29 John Spray wrote: Reopened #3334.
The garbage collection of remote_segment objects was conditional on the reference count in the segment's sharded_ptr being 1 (i.e. nothing else is using it). This is not true if there are entries in the `readers` attribute of materialized_segment_state, because readers hold a reference to the segment. Deal with this by proactively evicting readers from any materialized_segment_state objects whose atime makes them eligible for removal.

Fixes: redpanda-data#3334
Signed-off-by: John Spray <jcs@vectorized.io>
(cherry picked from commit 42945cb)
Backporting to 21.11.x here:
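A hedged sketch of the approach the commit message describes; the type and member names below are illustrative placeholders, not the actual redpanda source.

```cpp
// Illustrative sketch of the eviction logic described in the commit message;
// type names and members are placeholders, not the real redpanda classes.
// Previously a materialized segment was only offloaded when its shared_ptr
// use_count() was 1, but parked readers kept extra references alive, so the
// segment (and its open cache file) was never released.
#include <chrono>
#include <list>
#include <memory>

using cache_clock = std::chrono::steady_clock;

struct remote_segment {
    void stop() { /* close the underlying cache file handle */ }
};

struct segment_reader {
    std::shared_ptr<remote_segment> segment; // readers pin the segment
};

struct materialized_segment_state {
    std::shared_ptr<remote_segment> segment;
    std::list<std::unique_ptr<segment_reader>> readers;
    cache_clock::time_point atime; // last access time
};

void gc_stale_segments(
  std::list<materialized_segment_state>& materialized,
  cache_clock::duration max_idle) {
    const auto now = cache_clock::now();
    for (auto it = materialized.begin(); it != materialized.end();) {
        auto& st = *it;
        if (now - st.atime > max_idle) {
            // The fix: proactively evict readers from stale entries so they
            // stop holding references to the segment.
            st.readers.clear();
            // With the readers gone we hold the last reference and can close
            // the segment, letting the filesystem reclaim its blocks.
            if (st.segment.use_count() == 1) {
                st.segment->stop();
                it = materialized.erase(it);
                continue;
            }
        }
        ++it;
    }
}
```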
This system had been subject to a bunch of previous work, but the workload running at the time of failure was 64 random readers trying to read from a topic with 1GB segments. This was intended to check whether a 50GB cache limit would be enforced against 64GB of concurrent IO, but it ended up surfacing a separate issue.
The cache XFS filesystem is full, but listing out all the files and summing their sizes gives a much smaller total:
The limit is set to 50GB:
The filesystem is in a state where `du` and `df` disagree:

There are far more tiny files/dirs than there are actual log segments in the cache:
To its credit, redpanda isn't crashing, but it is permanently failing to fulfil read requests: