
Badger Compaction Issue #90

Closed · acaire opened this issue Dec 11, 2019 · 5 comments
Labels: bug (Something isn't working)

@acaire (Contributor) commented Dec 11, 2019

Hi, there's an issue with Sloop: once all of maxDiskDb is consumed and a compaction runs, the Sloop graphs disappear (and subsequent writes are no longer made to the DB).

I'm able to consistently reproduce this by installing the latest helm chart without any overrides, except adding --max-disk-mb=1 to the command line.

It seems to occur after these events:

I1211 01:51:02.648860       1 storemanager.go:171] Start cleaning up because current file size: 14073983 exceeds file size: 1048576
badger 2019/12/11 01:51:02 INFO: Writes flushed. Stopping compactions now...
badger 2019/12/11 01:51:02 DEBUG: Flushing memtable
badger 2019/12/11 01:51:02 DEBUG: Storing value log head: {Fid:0 Len:32 Offset:13236655}
badger 2019/12/11 01:51:02 INFO: Got compaction priority: {level:0 score:1.74 dropPrefix:[47 119 97 116 99 104 47 48 48 49 53 55 54 48 50 54 48 48 48]}
badger 2019/12/11 01:51:02 INFO: Running for level: 0
badger 2019/12/11 01:51:02 DEBUG: LOG Compact. Added 212 keys. Skipped 114 keys. Iteration took: 466.315µs
badger 2019/12/11 01:51:02 DEBUG: Discard stats: map[0:205537]
badger 2019/12/11 01:51:02 INFO: LOG Compact 0->1, del 2 tables, add 1 tables, took 11.140577ms
badger 2019/12/11 01:51:02 INFO: Compaction for level: 0 DONE
badger 2019/12/11 01:51:02 DEBUG: LOG Compact. Added 205 keys. Skipped 7 keys. Iteration took: 444.281µs
badger 2019/12/11 01:51:02 DEBUG: Discard stats: map[0:7658]
badger 2019/12/11 01:51:02 INFO: LOG Compact 1->1, del 1 tables, add 1 tables, took 10.030778ms
badger 2019/12/11 01:51:02 INFO: DropPrefix done
badger 2019/12/11 01:51:02 INFO: Resuming writes
badger 2019/12/11 01:51:02 INFO: Writes flushed. Stopping compactions now...
badger 2019/12/11 01:51:02 DEBUG: LOG Compact. Added 201 keys. Skipped 4 keys. Iteration took: 261.251µs
badger 2019/12/11 01:51:02 DEBUG: Discard stats: map[0:582]
badger 2019/12/11 01:51:02 INFO: LOG Compact 1->1, del 1 tables, add 1 tables, took 8.278415ms
badger 2019/12/11 01:51:02 INFO: DropPrefix done
badger 2019/12/11 01:51:02 INFO: Resuming writes
badger 2019/12/11 01:51:02 INFO: Writes flushed. Stopping compactions now...
badger 2019/12/11 01:51:02 DEBUG: LOG Compact. Added 201 keys. Skipped 0 keys. Iteration took: 277.819µs
badger 2019/12/11 01:51:02 DEBUG: Discard stats: map[]
badger 2019/12/11 01:51:02 INFO: LOG Compact 1->1, del 1 tables, add 1 tables, took 12.260061ms
badger 2019/12/11 01:51:02 INFO: DropPrefix done
badger 2019/12/11 01:51:02 INFO: Resuming writes

[screenshot attached]

@DuncanSmith1126 (Contributor) commented:

Thanks - we're looking at this internally as well. I will do some investigation and get back to you.

DuncanSmith1126 pinned this issue Dec 11, 2019
DuncanSmith1126 added the bug label Dec 11, 2019
DuncanSmith1126 self-assigned this Dec 11, 2019
@DuncanSmith1126 (Contributor) commented:

Local repro was as easy as you described. Treating this as a high-priority issue.

@DuncanSmith1126 (Contributor) commented:

I'm hopeful that an updated badger version fixes this. The issue referenced in dgraph-io/badger#1062 was fixed in the latest release. Unfortunately, upgrading is a data-store breaking change, but better now than later down the line.

Interestingly, this won't actually fix the max-disk-mb=1 method of reproducing the issue. What's happening there is that our compaction runs in a tight loop, since it doesn't have the disk space to store the current state of all the k8s resources. We should have a better (really, any at all) error message in the UI declaring that you need to allocate at least enough disk space to store the current state. (That would be a good first issue!)
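
As a rough illustration of the suggested guard (a sketch only; Options, MaxDiskMb, and estimateCurrentStateMb are hypothetical names, not Sloop's actual code), something like this could fail fast or drive a UI warning when the budget clearly can't hold the current state:

```go
package main

import "fmt"

// Options stands in for Sloop's flag-backed config; the field name is illustrative.
type Options struct {
	MaxDiskMb int
}

// estimateCurrentStateMb is a hypothetical helper that would estimate how much
// disk the current state of all watched k8s resources needs.
func estimateCurrentStateMb() int { return 50 }

// checkDiskBudget sketches the suggested check: surface an error (or UI message)
// when max-disk-mb cannot even hold the current cluster state, instead of
// silently compacting in a tight loop.
func checkDiskBudget(opts Options) error {
	if needed := estimateCurrentStateMb(); opts.MaxDiskMb < needed {
		return fmt.Errorf("max-disk-mb=%d is below the ~%d MB needed for the current cluster state", opts.MaxDiskMb, needed)
	}
	return nil
}

func main() {
	if err := checkDiskBudget(Options{MaxDiskMb: 1}); err != nil {
		fmt.Println("warning:", err)
	}
}
```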

The upgrade to badger v2 should fix the compaction oom crash, though. I'm still testing.
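
For anyone following along: badger v2 lives under a new Go module path, which is part of why the upgrade breaks existing data stores. A minimal sketch of opening a v2 store (the data path is just an example, not Sloop's actual location):

```go
package main

import (
	badger "github.com/dgraph-io/badger/v2" // v1 imported plain "github.com/dgraph-io/badger"
)

func main() {
	// v2 cannot read data files written by v1, which is why existing Sloop data
	// has to be wiped after the upgrade.
	db, err := badger.Open(badger.DefaultOptions("/data/sloop"))
	if err != nil {
		panic(err)
	}
	defer db.Close()
}
```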

@thomashargrove (Contributor) commented:

I've been looking into this a bit. Sloop has a background job that detects when there is too much data on disk and cleans up old keys with Badger's DropPrefix. It appears that this is not resulting in any actual cleanup on disk, so it goes into a fairly tight loop. While DropPrefix is running, all reads and writes to Badger are blocked. Still researching a fix.
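
To make the shape of the problem concrete, here is a stripped-down sketch of that kind of background cleanup loop, assuming badger v2 (this is not Sloop's actual storemanager code; the interval and prefix selection are made up):

```go
package sketch

import (
	"log"
	"time"

	badger "github.com/dgraph-io/badger/v2"
)

// cleanupLoop periodically checks how much Badger is using on disk and drops
// the oldest key prefix when the budget is exceeded. DropPrefix blocks all
// reads and writes while it runs, so if dropping a prefix never reclaims real
// disk space, this degenerates into the tight loop described above.
func cleanupLoop(db *badger.DB, maxBytes int64, oldestPrefix func() []byte) {
	for range time.Tick(30 * time.Second) {
		lsm, vlog := db.Size() // approximate on-disk size of LSM tree + value log
		if lsm+vlog <= maxBytes {
			continue
		}
		if err := db.DropPrefix(oldestPrefix()); err != nil {
			log.Printf("cleanup failed: %v", err)
		}
	}
}
```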

@thomashargrove (Contributor) commented:

Give the latest build a try. It includes the badger v2 upgrade, so you will unfortunately need to wipe your old data. But supposedly badger is going to stop making breaking changes.
