About Deletion marker #655

Closed
hyunsooda opened this issue Feb 9, 2019 · 2 comments

@hyunsooda
I am a Korean student. May I ask a few questions, if you are not busy?

I know that leveldb's delete operation is implemented with a deletion marker. If the user deletes data many times, a lot of space will be wasted, so leveldb has a compaction algorithm.

  1. When does leveldb perform compaction to reclaim the wasted space?
  2. By any chance, do other operations wait during compaction, as in a mark-and-sweep algorithm?

Thank you.

@adam-azarchs
Contributor

Mark/sweep as you'd see in a garbage collector is way more complicated than what happens in leveldb. A simplified version of how leveldb works (ignoring some of the implementation details for the youngest generation and optimizations like bloom filters and key sharding) is that there's a stack of files representing (more or less) snapshots of the database. When you go to look up a key, you first look in the top file in the stack. If you find the key, or a deletion marker for it, you stop there. Otherwise you keep going down the stack until you either find the key or reach the bottom of the stack. So for example imagine this stack of tables:

key:    a     b     c
v0:     1     2    del
v1:           3     4
v2:           4     5

then what you would see iterating through the database would be {a: 1, b: 2}. If we edit or delete a key, we just make the change to the in-memory table in v0. Removing a could be done without a deletion marker. However, removing b would require a marker to keep a reader from falling back to v1. Once there's enough data in v0, it gets frozen and dumped to disk, all of the "levels" get incremented, and we start writing to a new v0.
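To make that lookup order concrete, here's a minimal sketch. The types and names are hypothetical, not taken from leveldb's actual sources: each level is modeled as a plain map, and a deletion marker is an empty optional.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// One "level" of the stack: key -> value, where an empty optional
// stands in for a deletion marker (tombstone).
using Level = std::map<std::string, std::optional<std::string>>;

// Search from the newest level (front) to the oldest (back). The
// first entry found for the key wins: a real value is returned, and
// a tombstone ends the search as "not found".
std::optional<std::string> Lookup(const std::vector<Level>& stack,
                                  const std::string& key) {
  for (const Level& level : stack) {
    auto it = level.find(key);
    if (it != level.end()) {
      return it->second;  // value, or nullopt for a deletion marker
    }
  }
  return std::nullopt;  // the key was never written at all
}
```

With the stack above, Lookup would return 1 for a, 2 for b, and "not found" for c, because the tombstone in v0 stops the search before it reaches v1.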

Compaction is basically removing the redundant data in lower levels of the stack. So if we were to compact v1 and v2 together above, it would basically just remove v2 since both of its values were overwritten in v1. If we compacted all 3 layers together after making a new v0, we'd have

key:    a     b     c
v0:
v1:     1     2

Because the lower levels of the stack are logically immutable, compaction can happen in the background and then the stack of tables can be atomically replaced by the new, single, compacted table.
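A toy version of that merge, reusing the hypothetical Level type from the lookup sketch above: entries from the newer level shadow the older one's, and tombstones can be dropped once the result becomes the bottom of the stack, since there is nothing left for a reader to fall back to.

```cpp
#include <map>
#include <optional>
#include <string>

// Same hypothetical representation as in the lookup sketch.
using Level = std::map<std::string, std::optional<std::string>>;

// Merge a newer level into an older one. If the merged output is
// the new bottom of the stack, the deletion markers themselves can
// be discarded.
Level Compact(const Level& newer, const Level& older, bool is_bottom) {
  Level merged = older;
  for (const auto& entry : newer) {
    merged[entry.first] = entry.second;  // newer value (or tombstone) wins
  }
  if (is_bottom) {
    for (auto it = merged.begin(); it != merged.end();) {
      if (!it->second.has_value()) {
        it = merged.erase(it);  // tombstone no longer needed
      } else {
        ++it;
      }
    }
  }
  return merged;
}
```

Running Compact over v1 and v2 from the first example reproduces the result described above: v2's values for b and c are shadowed, so the merge is just v1. Compacting all three layers with is_bottom set leaves only {a: 1, b: 2}.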

Some of the configuration options control when this happens. Basically, once a level grows above a certain size, it gets compacted with the level under it. Compaction can also be triggered manually for a key range, for example if you know you're not going to be writing to that range for a while but are going to be reading from it a lot (since compaction avoids the need to search multiple levels for your key). As the documentation states, however, you're probably better off leaving compaction up to the built-in logic.
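The manual trigger in leveldb's C++ API is DB::CompactRange. A minimal usage sketch (the database path and key range here are made up for illustration):

```cpp
#include <cassert>
#include "leveldb/db.h"

int main() {
  leveldb::DB* db;
  leveldb::Options options;
  options.create_if_missing = true;
  leveldb::Status status = leveldb::DB::Open(options, "/tmp/testdb", &db);
  assert(status.ok());

  // Compact only the keys in a hypothetical range...
  leveldb::Slice begin("user:0000");
  leveldb::Slice end("user:9999");
  db->CompactRange(&begin, &end);

  // ...or pass null pointers to compact the whole database.
  db->CompactRange(nullptr, nullptr);

  delete db;
  return 0;
}
```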

@cmumford
Contributor

Thx for the response @adam-azarchs. I think this is answered, but please reopen if not.
