
Leftover .ldb file #593

Open
pwnall opened this issue May 31, 2018 · 10 comments

@pwnall
Member

pwnall commented May 31, 2018

While validating the design for a feature on top of leveldb, we've been stress-testing a database using a workload consisting of Put() and Delete(). We consistently see a .ldb file being left behind, so it seems like compaction doesn't delete all the old files.

This may be related to #573, which also suggests a potential accounting problem in compaction.

@pwnall pwnall self-assigned this May 31, 2018
@dmurph

dmurph commented May 31, 2018

Yep. More specifically, I'm putting batches of data, keeping each batch alive temporarily, and then deleting it:

  1. Put batch 1
  2. Put batch 2
  3. Put batch 3, delete batch 1
  4. Put batch 4, delete batch 2
  5. etc.

The first table (.ldb) file should be full of data that is deleted soon afterward, but it sticks around forever.
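
A minimal sketch of that workload pattern against the leveldb C++ API (illustrative only: the database path, key names, value sizes, and batch counts are assumptions, not the actual test code):

```
#include <string>

#include "leveldb/db.h"
#include "leveldb/write_batch.h"

int main() {
  leveldb::DB* db = nullptr;
  leveldb::Options options;
  options.create_if_missing = true;
  leveldb::Status status =
      leveldb::DB::Open(options, "/tmp/leftover_ldb_test", &db);
  if (!status.ok()) return 1;

  const int kBatches = 100;
  const int kKeysPerBatch = 1000;
  for (int batch = 0; batch < kBatches; ++batch) {
    leveldb::WriteBatch updates;
    // Put the current batch.
    for (int i = 0; i < kKeysPerBatch; ++i) {
      std::string key =
          "batch-" + std::to_string(batch) + "-" + std::to_string(i);
      updates.Put(key, std::string(100, 'x'));
    }
    // Delete the batch written two iterations ago, so only a sliding window
    // of two batches is live at any time.
    if (batch >= 2) {
      for (int i = 0; i < kKeysPerBatch; ++i) {
        std::string key =
            "batch-" + std::to_string(batch - 2) + "-" + std::to_string(i);
        updates.Delete(key);
      }
    }
    db->Write(leveldb::WriteOptions(), &updates);
  }

  delete db;  // Close the database.
  return 0;
}
```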

@pedrinimm

Hi guys,

Any update on this issue? I am facing the same problem.
In my case I have a stream of incoming data that is converted into a struct, saved while it is processed, and then removed.

Same as @dmurph, but after a while the number of .ldb files just grows and grows, and none of them are in use. I know they are unused because I wrote a small binary that only runs CompactRange over the whole key range, and afterwards they all get removed.
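
For reference, a minimal sketch of that kind of helper binary (the argument handling and error reporting are my own, not the actual tool). Passing null for both bounds of CompactRange compacts the entire database, after which obsolete table files are removed:

```
#include <iostream>

#include "leveldb/db.h"

int main(int argc, char** argv) {
  if (argc != 2) {
    std::cerr << "usage: " << argv[0] << " <db-path>\n";
    return 1;
  }
  leveldb::DB* db = nullptr;
  leveldb::Options options;
  leveldb::Status status = leveldb::DB::Open(options, argv[1], &db);
  if (!status.ok()) {
    std::cerr << status.ToString() << "\n";
    return 1;
  }
  // nullptr/nullptr means "before all keys" to "after all keys",
  // i.e. compact the whole key range.
  db->CompactRange(nullptr, nullptr);
  delete db;
  return 0;
}
```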

@cmumford
Contributor

@pwnall Does closing and reopening the database delete the leftover .ldb files?
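
For concreteness, a sketch of what that close-and-reopen check looks like with the C++ API (the path and the workload placeholder are assumptions):

```
#include <string>

#include "leveldb/db.h"

void CloseAndReopen() {
  const std::string path = "/tmp/leftover_ldb_test";  // placeholder path
  leveldb::Options options;
  options.create_if_missing = true;
  leveldb::DB* db = nullptr;
  leveldb::DB::Open(options, path, &db);
  // ... run the Put()/Delete() workload here ...
  delete db;  // Close; any unflushed updates remain in the write-ahead log.

  // Reopen: recovery runs, and obsolete table files should be removed here.
  leveldb::DB::Open(options, path, &db);
  delete db;
}
```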

@pwnall
Member Author

pwnall commented Sep 16, 2018

@dmurph Can you please dust off your experiment's code and answer @cmumford's question above?

@dmurph

dmurph commented Sep 17, 2018

@pedrinimm That might be a different issue. Can you give a sense of the range of growth you see? Or is it just the 000005.ldb file that sticks around?

@cmumford I will investigate; I'm not sure. But I do see this in my Chrome localStorage and IndexedDB leveldb directories.

@dmurph

dmurph commented Sep 17, 2018

@cmumford Yes - I just re-ran my test with a persistent directory instead of a new one, and it keeps the 000005.ldb file around even after the second run of the test.

@dmurph

dmurph commented Sep 17, 2018

Updated the test to use a persistent folder: https://chromium-review.googlesource.com/c/chromium/src/+/1080339

@nysenthil

nysenthil commented Mar 20, 2019

My team is seeing the same issue in one of our large enterprise customer projects. Old .ldb files are being left behind instead of being deleted, and the disk eventually fills up after about two days. Is there a fix, a workaround, or any advice from the LevelDB team to avoid this critical problem?

Thank you.

@cmumford
Contributor

@nysenthil check that your code doesn't have any unreleased snapshots or iterators. Also check the log (*.log) that your Env may be creating for evidence of problems. This bug is only for a single *.ldb file being left behind - I'm unaware of a bug that fails to delete many *.ldb files.
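
To illustrate the first point, here is a sketch of the pairing that has to hold for every snapshot and iterator (the function below is illustrative, not code from this project):

```
#include "leveldb/db.h"

void ScanAtSnapshot(leveldb::DB* db) {
  const leveldb::Snapshot* snapshot = db->GetSnapshot();
  leveldb::ReadOptions read_options;
  read_options.snapshot = snapshot;

  leveldb::Iterator* it = db->NewIterator(read_options);
  for (it->SeekToFirst(); it->Valid(); it->Next()) {
    // ... inspect it->key() / it->value() ...
  }

  // Both releases are required: a leaked iterator pins old table files, and a
  // leaked snapshot keeps data alive that compaction would otherwise drop.
  delete it;
  db->ReleaseSnapshot(snapshot);
}
```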

@nysenthil

Hi Chris,
The problem I described above is more closely related to this open issue about old .ldb files not being purged: Disk space used by the DB grows without bounds #603

maochongxin pushed a commit to maochongxin/leveldb that referenced this issue Jul 21, 2022
…n harder. (google#593)
