
Sparse file #30

Open
arj03 opened this issue Mar 12, 2021 · 12 comments

arj03 commented Mar 12, 2021

Consider how we can use sparse file support in the OS to better support delete.

staltz self-assigned this Apr 12, 2022
staltz commented Apr 12, 2022

I'm studying the viability of making the log sparse, and here's what I learned:

From the fallocate man page (a Linux command which, among other things, can convert a file to a sparse file in place):

> Detect and dig holes. This makes the file sparse in-place, without using extra disk space. **The minimum size of the hole depends on filesystem I/O block size (usually 4096 bytes).**

Bold is my own emphasis. I think this means that not every deleted record becomes a "sparse file hole": the record has to be at least 4096 bytes, or two or more consecutive records have to be deleted.

But we have a problem: because each record is `<dataLength: UInt16LE><dataBuf: Arbitrary Bytes>`, the dataBuf can be zeroed, but the dataLength will still contain non-zero bytes. So maybe we should "merge" consecutive deleted records so that only the first one keeps a dataLength, creating a larger region of pure zero bytes and making it more likely that the holes reach 4096+ bytes.
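Roughly what I have in mind by "merging", as a sketch (Python, purely illustrative; the in-block record `offsets` list and the "deleted means zero-filled dataBuf" convention are my assumptions, not the actual log code):

```python
import struct

HEADER_SIZE = 2  # dataLength: UInt16LE

def merge_deleted_runs(block: bytearray, offsets: list[int]) -> None:
    """Merge runs of consecutive deleted records in `block`, in place.

    A record whose dataBuf is all zero bytes is treated as deleted here.
    After merging, only the first record of a run keeps a dataLength
    (now covering the whole run); the interior headers are zeroed, so
    the run becomes one long stretch of zero bytes after the header.
    """
    i = 0
    while i < len(offsets):
        start = offsets[i]
        length = struct.unpack_from('<H', block, start)[0]
        data = block[start + HEADER_SIZE : start + HEADER_SIZE + length]
        if any(data):            # live record, skip it
            i += 1
            continue
        # extend the run over every following deleted record
        j = i + 1
        end = start + HEADER_SIZE + length
        while j < len(offsets):
            nxt = offsets[j]
            nlen = struct.unpack_from('<H', block, nxt)[0]
            ndata = block[nxt + HEADER_SIZE : nxt + HEADER_SIZE + nlen]
            if any(ndata):
                break
            block[nxt : nxt + HEADER_SIZE] = b'\x00\x00'   # erase inner header
            end = nxt + HEADER_SIZE + nlen
            j += 1
        # the first header now spans the whole run (fits in UInt16 for 64 KiB blocks)
        struct.pack_into('<H', block, start, end - (start + HEADER_SIZE))
        i = j
```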

cc @arj03


arj03 commented Apr 12, 2022

Oh right. There is also the block boundary in AAOC that you can't get around. It would be interesting to write a small script that analyzes your delete tests and calculates how many of the holes, when combined, are larger than 4k.
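Something along these lines, as a rough sketch (Python; it assumes deleted records show up as runs of zero bytes in the log, which may not match every on-disk detail):

```python
# Rough analysis sketch; assumptions: deleted records are runs of zero bytes,
# and 4 KiB is the minimum hole size. Alignment to 4 KiB page boundaries is
# ignored here, so the result is an upper bound.
import sys

HOLE_MIN = 4096

def count_big_zero_runs(path: str) -> None:
    runs = []   # lengths of maximal runs of zero bytes
    run = 0
    with open(path, 'rb') as f:
        while chunk := f.read(1 << 20):
            for byte in chunk:
                if byte == 0:
                    run += 1
                elif run:
                    runs.append(run)
                    run = 0
    if run:
        runs.append(run)
    big = [r for r in runs if r >= HOLE_MIN]
    print(f'{len(runs)} zero runs in total, {len(big)} are >= {HOLE_MIN} bytes')
    print(f'bytes in those big runs: {sum(big)}')

if __name__ == '__main__':
    count_big_zero_runs(sys.argv[1])
```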

staltz commented Apr 12, 2022

Yeah, I'm currently doing something like that. I took my production log and I'm deleting 80% of the records. Then I'm going to make it a sparse file and see whether physicalSize / logicalSize is roughly 20%.

If it's not great, then I'll try to merge holes and see what happens.
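For reference, the measurement itself can be as simple as this (sketch; assumes Linux with util-linux's fallocate, and that st_blocks is reported in 512-byte units):

```python
# Punch holes into the aligned all-zero regions, then compare
# the apparent file size with the bytes actually allocated on disk.
import os
import subprocess
import sys

def sparsify_and_measure(path: str) -> None:
    subprocess.run(['fallocate', '--dig-holes', path], check=True)
    st = os.stat(path)
    logical = st.st_size            # apparent length of the file
    physical = st.st_blocks * 512   # bytes actually allocated on disk
    print(f'logical  = {logical}')
    print(f'physical = {physical}')
    print(f'physical / logical = {physical / logical:.1%}')

if __name__ == '__main__':
    sparsify_and_measure(sys.argv[1])
```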

staltz commented Apr 12, 2022

Yeah, physicalSize / logicalSize turned out to be 99%, which is very bad: it means only 1% was gained in space efficiency, while we expected 80%. 😅

Will work on an experiment to merge holes.

staltz commented Apr 13, 2022

Alright, I finished a proof of concept for "hole merging", and here are the results.

Deleted 80% of the records and merged all the consecutive holes together; the physicalSize / logicalSize was 74%. Better than 99%, but we expected it to be 20%.

I wish I could know/debug the sparse FS and discover the "min sparse hole size"...
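One way to at least check the filesystem I/O block size the man page refers to (whether it equals the minimum hole size is my assumption):

```python
# '/path/to/log' is a placeholder for any path on the filesystem in question.
import os
print(os.statvfs('/path/to/log').f_bsize)   # typically 4096 on ext4
```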

arj03 commented Apr 13, 2022

Hmm yeah, I guess if the deleted records are votes, you need quite a few of them in a row to make a sparse hole.

staltz commented Apr 13, 2022

But I also opened up the log in a hex editor and I could see a lot of huge holes. Deleting 80% of the records at random spots means that the remaining records are actually spaced very far apart.

staltz commented Apr 13, 2022

I found a C program, https://codeberg.org/da/sparseseek, and I'm using it to debug how many sparse FS holes there are and where they start/end.
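For reference, the same enumeration can be done with lseek's SEEK_HOLE / SEEK_DATA, which is presumably what that C program uses. A Python sketch (Linux):

```python
# Enumerate the holes of a sparse file via SEEK_HOLE / SEEK_DATA.
import os
import sys

def list_holes(path: str) -> None:
    fd = os.open(path, os.O_RDONLY)
    try:
        end = os.lseek(fd, 0, os.SEEK_END)
        pos = 0
        while pos < end:
            hole = os.lseek(fd, pos, os.SEEK_HOLE)
            if hole >= end:                  # only the implicit hole at EOF remains
                break
            try:
                data = os.lseek(fd, hole, os.SEEK_DATA)
            except OSError:                  # ENXIO: the hole runs to end of file
                data = end
            print(f'Hole: {hole:x} -- {data:x}')
            pos = data
    finally:
        os.close(fd)

if __name__ == '__main__':
    list_holes(sys.argv[1])
```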

staltz commented Apr 13, 2022

I think I know what's going on. Here is how the holes look (hex addresses, start -- end):

Hole: 8000 -- 9000
Hole: a000 -- e000
Hole: 22000 -- 23000
Hole: 26000 -- 29000
Hole: 2b000 -- 2c000
Hole: 2e000 -- 2f000
Hole: 34000 -- 36000
Hole: 3a000 -- 3b000
...

Notice that each hole starts and ends at a multiple of 0x1000. I took a look at the log in hex, and indeed there are lots of deleted regions that don't become sparse FS holes just because they fall a bit short (e.g. from 0x5000 to 0x5EEF).
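To illustrate the alignment issue: a zeroed region only yields a hole for the 4 KiB pages it fully covers (sketch; the 0x1000 granularity is taken from the observed hole boundaries above):

```python
PAGE = 0x1000

def hole_portion(start: int, end: int) -> int:
    """Bytes of the zeroed range [start, end) that can actually be punched out."""
    aligned_start = (start + PAGE - 1) // PAGE * PAGE   # round start up
    aligned_end = end // PAGE * PAGE                    # round end down
    return max(0, aligned_end - aligned_start)

print(hex(hole_portion(0x5000, 0x5EEF)))   # 0x0 -> falls just short, no hole
print(hex(hole_portion(0xa000, 0xe000)))   # 0x4000 -> a 16 KiB hole, like the second one above
```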

staltz commented Apr 13, 2022

I then did another experiment where the first 40% of records at the beginning of the log are deleted, i.e. all of them are consecutive and should make one huge hole.

The physicalSize / logicalSize was 62.4%, while we expected 60%. So I thought: where does this extra 2.4% come from?

The holes look like this:

Hole: 1000 -- 10000
Hole: 11000 -- 20000
Hole: 21000 -- 30000
Hole: 31000 -- 40000
Hole: 41000 -- 50000
Hole: 51000 -- 60000
...

Which means that there must be some data between 0x10000 and 0x11000, and yes there is: it's the dataLength at the start of a block. That's because we can't merge deletes across the block boundary.
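A back-of-the-envelope check of that 2.4% (my assumptions: 64 KiB blocks, judging from the 0x10000-spaced holes, 4 KiB hole granularity, and one retained page per block in the deleted region):

```python
block = 0x10000          # 64 KiB block, matching the 0x10000-spaced holes
page = 0x1000            # 4 KiB hole granularity
deleted_fraction = 0.40
overhead = deleted_fraction * page / block   # one retained page per deleted block
print(f'{overhead:.1%}')                     # 2.5%, close to the observed 2.4%
```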

staltz commented Apr 13, 2022

I considered that maybe we could change the "End of Block" marker to 0xFFFF and use 0x0000 at the start of a block to mean "this block is empty, please skip to the next block". But in the case of 80% of records deleted at random spots around the log, there are actually very few blocks (I couldn't find any) that are entirely empty, so it wouldn't help that case.

I guess I'm ready to conclude that sparse files would help us only in specific cases (where deleted records are heavily concentrated in one region of the log); in the average case the sparse file approach frees just 5%~25% of space. I was hoping for an order of magnitude more, something like 90%.

arj03 commented Apr 13, 2022

Yeah, sadly sometimes things don't work out exactly as planned. At least what you did was a good exploration of this idea, so that in the future we can safely say this was tried and we know how it works.

staltz removed their assignment Apr 13, 2022