
Badger Compaction Issue #90

Closed · acaire opened this issue Dec 11, 2019 · 5 comments
Labels: bug (Something isn't working)

@acaire (Contributor) commented Dec 11, 2019

Hi, there's an issue with Sloop: once all of maxDiskDb is consumed and a compaction runs, the Sloop graphs disappear (and subsequent writes are no longer made to the DB).

I'm able to consistently reproduce this by installing the latest helm chart without any overrides, except adding --max-disk-mb=1 to the command line.

It seems to occur after these events:

I1211 01:51:02.648860       1 storemanager.go:171] Start cleaning up because current file size: 14073983 exceeds file size: 1048576
badger 2019/12/11 01:51:02 INFO: Writes flushed. Stopping compactions now...
badger 2019/12/11 01:51:02 DEBUG: Flushing memtable
badger 2019/12/11 01:51:02 DEBUG: Storing value log head: {Fid:0 Len:32 Offset:13236655}
badger 2019/12/11 01:51:02 INFO: Got compaction priority: {level:0 score:1.74 dropPrefix:[47 119 97 116 99 104 47 48 48 49 53 55 54 48 50 54 48 48 48]}
badger 2019/12/11 01:51:02 INFO: Running for level: 0
badger 2019/12/11 01:51:02 DEBUG: LOG Compact. Added 212 keys. Skipped 114 keys. Iteration took: 466.315µs
badger 2019/12/11 01:51:02 DEBUG: Discard stats: map[0:205537]
badger 2019/12/11 01:51:02 INFO: LOG Compact 0->1, del 2 tables, add 1 tables, took 11.140577ms
badger 2019/12/11 01:51:02 INFO: Compaction for level: 0 DONE
badger 2019/12/11 01:51:02 DEBUG: LOG Compact. Added 205 keys. Skipped 7 keys. Iteration took: 444.281µs
badger 2019/12/11 01:51:02 DEBUG: Discard stats: map[0:7658]
badger 2019/12/11 01:51:02 INFO: LOG Compact 1->1, del 1 tables, add 1 tables, took 10.030778ms
badger 2019/12/11 01:51:02 INFO: DropPrefix done
badger 2019/12/11 01:51:02 INFO: Resuming writes
badger 2019/12/11 01:51:02 INFO: Writes flushed. Stopping compactions now...
badger 2019/12/11 01:51:02 DEBUG: LOG Compact. Added 201 keys. Skipped 4 keys. Iteration took: 261.251µs
badger 2019/12/11 01:51:02 DEBUG: Discard stats: map[0:582]
badger 2019/12/11 01:51:02 INFO: LOG Compact 1->1, del 1 tables, add 1 tables, took 8.278415ms
badger 2019/12/11 01:51:02 INFO: DropPrefix done
badger 2019/12/11 01:51:02 INFO: Resuming writes
badger 2019/12/11 01:51:02 INFO: Writes flushed. Stopping compactions now...
badger 2019/12/11 01:51:02 DEBUG: LOG Compact. Added 201 keys. Skipped 0 keys. Iteration took: 277.819µs
badger 2019/12/11 01:51:02 DEBUG: Discard stats: map[]
badger 2019/12/11 01:51:02 INFO: LOG Compact 1->1, del 1 tables, add 1 tables, took 12.260061ms
badger 2019/12/11 01:51:02 INFO: DropPrefix done
badger 2019/12/11 01:51:02 INFO: Resuming writes

[screenshot attached]

@DuncanSmith1126 (Contributor) commented:

Thanks - we're looking at this internally as well. I will do some investigation and get back to you.

DuncanSmith1126 pinned this issue Dec 11, 2019
DuncanSmith1126 added the bug label Dec 11, 2019
DuncanSmith1126 self-assigned this Dec 11, 2019
@DuncanSmith1126 (Contributor) commented:

Local repro was as easy as you described. Treating this as a high-priority issue.

@DuncanSmith1126 (Contributor) commented:

I'm hopeful that an updated badger version fixes this. The issue referenced in dgraph-io/badger#1062 was fixed in the latest release. Unfortunately, upgrading is a data-store breaking change, but better now than later down the line.

Interestingly, this won't actually fix the max-disk-mb=1 method of reproducing the issue. What's happening there is that our compaction runs in a tight loop, since it doesn't have the disk space to store the current state of all the k8s resources. We should have a better (really, any at all) error message in the UI declaring that you need to allocate at least enough disk space to store the current state. (That would be a good first issue!)
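
As a rough illustration of the suggested guard (a sketch only; Options, MaxDiskMb, and estimateCurrentStateMb are hypothetical names, not Sloop's actual code), something like this could fail fast or drive a UI warning when the budget clearly can't hold the current state:

```go
package main

import "fmt"

// Options stands in for Sloop's flag-backed config; the field name is illustrative.
type Options struct {
	MaxDiskMb int
}

// estimateCurrentStateMb is a hypothetical helper that would estimate how much
// disk the current state of all watched k8s resources needs.
func estimateCurrentStateMb() int { return 50 }

// checkDiskBudget sketches the suggested check: surface an error (or UI message)
// when max-disk-mb cannot even hold the current cluster state, instead of
// silently compacting in a tight loop.
func checkDiskBudget(opts Options) error {
	if needed := estimateCurrentStateMb(); opts.MaxDiskMb < needed {
		return fmt.Errorf("max-disk-mb=%d is below the ~%d MB needed for the current cluster state", opts.MaxDiskMb, needed)
	}
	return nil
}

func main() {
	if err := checkDiskBudget(Options{MaxDiskMb: 1}); err != nil {
		fmt.Println("warning:", err)
	}
}
```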

The upgrade to badger v2 should fix the compaction oom crash, though. I'm still testing.
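
For anyone following along: badger v2 lives under a new Go module path, which is part of why the upgrade breaks existing data stores. A minimal sketch of opening a v2 store (the data path is just an example, not Sloop's actual location):

```go
package main

import (
	badger "github.com/dgraph-io/badger/v2" // v1 imported plain "github.com/dgraph-io/badger"
)

func main() {
	// v2 cannot read data files written by v1, which is why existing Sloop data
	// has to be wiped after the upgrade.
	db, err := badger.Open(badger.DefaultOptions("/data/sloop"))
	if err != nil {
		panic(err)
	}
	defer db.Close()
}
```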

@thomashargrove (Contributor) commented:

I've been looking into this a bit. Sloop has a background job that detects when there is too much data on disk and cleans up old keys with Badger's DropPrefix. It appears that this is not resulting in any actual cleanup on disk, so it goes into a fairly tight loop. While DropPrefix is running, all reads and writes to Badger are blocked. Still researching a fix.
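
To make the shape of the problem concrete, here is a stripped-down sketch of that kind of background cleanup loop, assuming badger v2 (this is not Sloop's actual storemanager code; the interval and prefix selection are made up):

```go
package sketch

import (
	"log"
	"time"

	badger "github.com/dgraph-io/badger/v2"
)

// cleanupLoop periodically checks how much Badger is using on disk and drops
// the oldest key prefix when the budget is exceeded. DropPrefix blocks all
// reads and writes while it runs, so if dropping a prefix never reclaims real
// disk space, this degenerates into the tight loop described above.
func cleanupLoop(db *badger.DB, maxBytes int64, oldestPrefix func() []byte) {
	for range time.Tick(30 * time.Second) {
		lsm, vlog := db.Size() // approximate on-disk size of LSM tree + value log
		if lsm+vlog <= maxBytes {
			continue
		}
		if err := db.DropPrefix(oldestPrefix()); err != nil {
			log.Printf("cleanup failed: %v", err)
		}
	}
}
```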

@thomashargrove (Contributor) commented:

Give the latest build a try. It includes the badger v2 upgrade, so you will unfortunately need to wipe your old data. But supposedly badger is going to stop making breaking changes.
