
Sparse file #30

Open
arj03 opened this issue Mar 12, 2021 · 12 comments

arj03 commented Mar 12, 2021

Consider how we can use sparse file support in the OS to better support delete.

staltz self-assigned this Apr 12, 2022
staltz commented Apr 12, 2022

I'm studying the viability of making the log sparse, and here's what I learned:

From the fallocate man page (a Linux command which, among other things, can convert a file to a sparse file in place):

> Detect and dig holes. This makes the file sparse in-place, without using extra disk space. **The minimum size of the hole depends on filesystem I/O block size (usually 4096 bytes).**

Bold is my own emphasis. I think this means that not every deleted record becomes a "sparse file hole": the record has to be at least 4096 bytes, or two or more consecutive records have to be deleted.

But we have a problem: because each record is `<dataLength: UInt16LE><dataBuf: Arbitrary Bytes>`, the dataBuf can be zeroed, but the dataLength will still contain non-zero bytes. So maybe we should "merge" consecutive deleted records so that only the first one keeps a dataLength, creating a larger region of pure zero bytes and making it more likely that the holes reach 4096+ bytes.
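Roughly what I have in mind by "merging", as a sketch (Python, purely illustrative; the in-block record `offsets` list and the "deleted means zero-filled dataBuf" convention are my assumptions, not the actual log code):

```python
import struct

HEADER_SIZE = 2  # dataLength: UInt16LE

def merge_deleted_runs(block: bytearray, offsets: list[int]) -> None:
    """Merge runs of consecutive deleted records in `block`, in place.

    A record whose dataBuf is all zero bytes is treated as deleted here.
    After merging, only the first record of a run keeps a dataLength
    (now covering the whole run); the interior headers are zeroed, so
    the run becomes one long stretch of zero bytes after the header.
    """
    i = 0
    while i < len(offsets):
        start = offsets[i]
        length = struct.unpack_from('<H', block, start)[0]
        data = block[start + HEADER_SIZE : start + HEADER_SIZE + length]
        if any(data):            # live record, skip it
            i += 1
            continue
        # extend the run over every following deleted record
        j = i + 1
        end = start + HEADER_SIZE + length
        while j < len(offsets):
            nxt = offsets[j]
            nlen = struct.unpack_from('<H', block, nxt)[0]
            ndata = block[nxt + HEADER_SIZE : nxt + HEADER_SIZE + nlen]
            if any(ndata):
                break
            block[nxt : nxt + HEADER_SIZE] = b'\x00\x00'   # erase inner header
            end = nxt + HEADER_SIZE + nlen
            j += 1
        # the first header now spans the whole run (fits in UInt16 for 64 KiB blocks)
        struct.pack_into('<H', block, start, end - (start + HEADER_SIZE))
        i = j
```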

cc @arj03


arj03 commented Apr 12, 2022

Oh right. There is also the block boundary in AAOC that you can't get around. It would be interesting to write a small script that analyzes your delete tests and calculates how many of the holes, when combined, are larger than 4k.
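Something along these lines, as a rough sketch (Python; it assumes deleted records show up as runs of zero bytes in the log, which may not match every on-disk detail):

```python
# Rough analysis sketch; assumptions: deleted records are runs of zero bytes,
# and 4 KiB is the minimum hole size. Alignment to 4 KiB page boundaries is
# ignored here, so the result is an upper bound.
import sys

HOLE_MIN = 4096

def count_big_zero_runs(path: str) -> None:
    runs = []   # lengths of maximal runs of zero bytes
    run = 0
    with open(path, 'rb') as f:
        while chunk := f.read(1 << 20):
            for byte in chunk:
                if byte == 0:
                    run += 1
                elif run:
                    runs.append(run)
                    run = 0
    if run:
        runs.append(run)
    big = [r for r in runs if r >= HOLE_MIN]
    print(f'{len(runs)} zero runs in total, {len(big)} are >= {HOLE_MIN} bytes')
    print(f'bytes in those big runs: {sum(big)}')

if __name__ == '__main__':
    count_big_zero_runs(sys.argv[1])
```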

staltz commented Apr 12, 2022

Yeah, I'm currently doing something like that. I took my production log and I'm deleting 80% of the records. Then I'm going to make it a sparse file and see whether physicalSize / logicalSize is roughly 20%.

If it's not great, then I'll try to merge holes and see what happens.
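For reference, the measurement itself can be as simple as this (sketch; assumes Linux with util-linux's fallocate, and that st_blocks is reported in 512-byte units):

```python
# Punch holes into the aligned all-zero regions, then compare
# the apparent file size with the bytes actually allocated on disk.
import os
import subprocess
import sys

def sparsify_and_measure(path: str) -> None:
    subprocess.run(['fallocate', '--dig-holes', path], check=True)
    st = os.stat(path)
    logical = st.st_size            # apparent length of the file
    physical = st.st_blocks * 512   # bytes actually allocated on disk
    print(f'logical  = {logical}')
    print(f'physical = {physical}')
    print(f'physical / logical = {physical / logical:.1%}')

if __name__ == '__main__':
    sparsify_and_measure(sys.argv[1])
```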

staltz commented Apr 12, 2022

Yeah, physicalSize / logicalSize turned out to be 99%, which is very bad: it means only 1% was gained in space efficiency, while we expected 80%. 😅

Will work on an experiment to merge holes.

staltz commented Apr 13, 2022

Alright, I finished a proof of concept for "hole merging", and here are the results.

Deleted 80% of the records and merged all the consecutive holes together; the physicalSize / logicalSize was 74%. Better than 99%, but we expected it to be 20%.

I wish I could know/debug the sparse FS and discover the "min sparse hole size"...
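One way to at least check the filesystem I/O block size the man page refers to (whether it equals the minimum hole size is my assumption):

```python
# '/path/to/log' is a placeholder for any path on the filesystem in question.
import os
print(os.statvfs('/path/to/log').f_bsize)   # typically 4096 on ext4
```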

arj03 commented Apr 13, 2022

Hmm yeah, I guess if the deleted records are votes, you need quite a few of them in a row to make a sparse hole.

staltz commented Apr 13, 2022

But I also opened up the log in a hex editor and I could see a lot of huge holes. Deleting 80% of the records at random spots means that the remaining records are actually spaced very far apart.

staltz commented Apr 13, 2022

I found a C program, https://codeberg.org/da/sparseseek, and I'm using it to debug how many sparse FS holes there are and where they start/end.
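For reference, the same enumeration can be done with lseek's SEEK_HOLE / SEEK_DATA, which is presumably what that C program uses. A Python sketch (Linux):

```python
# Enumerate the holes of a sparse file via SEEK_HOLE / SEEK_DATA.
import os
import sys

def list_holes(path: str) -> None:
    fd = os.open(path, os.O_RDONLY)
    try:
        end = os.lseek(fd, 0, os.SEEK_END)
        pos = 0
        while pos < end:
            hole = os.lseek(fd, pos, os.SEEK_HOLE)
            if hole >= end:                  # only the implicit hole at EOF remains
                break
            try:
                data = os.lseek(fd, hole, os.SEEK_DATA)
            except OSError:                  # ENXIO: the hole runs to end of file
                data = end
            print(f'Hole: {hole:x} -- {data:x}')
            pos = data
    finally:
        os.close(fd)

if __name__ == '__main__':
    list_holes(sys.argv[1])
```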

staltz commented Apr 13, 2022

I think I know what's going on. Here is how the holes look (hex addresses, start -- end):

Hole: 8000 -- 9000
Hole: a000 -- e000
Hole: 22000 -- 23000
Hole: 26000 -- 29000
Hole: 2b000 -- 2c000
Hole: 2e000 -- 2f000
Hole: 34000 -- 36000
Hole: 3a000 -- 3b000
...

Notice that each hole starts and ends at a multiple of 0x1000. I took a look at the log in hex, and indeed there are lots of deleted regions that don't become sparse FS holes just because they fall a bit short (e.g. from 0x5000 to 0x5EEF).
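To illustrate the alignment issue: a zeroed region only yields a hole for the 4 KiB pages it fully covers (sketch; the 0x1000 granularity is taken from the observed hole boundaries above):

```python
PAGE = 0x1000

def hole_portion(start: int, end: int) -> int:
    """Bytes of the zeroed range [start, end) that can actually be punched out."""
    aligned_start = (start + PAGE - 1) // PAGE * PAGE   # round start up
    aligned_end = end // PAGE * PAGE                    # round end down
    return max(0, aligned_end - aligned_start)

print(hex(hole_portion(0x5000, 0x5EEF)))   # 0x0 -> falls just short, no hole
print(hex(hole_portion(0xa000, 0xe000)))   # 0x4000 -> a 16 KiB hole, like the second one above
```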

staltz commented Apr 13, 2022

I then did another experiment where the first 40% of records at the beginning of the log are deleted, i.e. all of them are consecutive and should make one huge hole.

The physicalSize / logicalSize was 62.4%, while we expected 60%. So I thought: where does this extra 2.4% come from?

The holes look like this:

Hole: 1000 -- 10000
Hole: 11000 -- 20000
Hole: 21000 -- 30000
Hole: 31000 -- 40000
Hole: 41000 -- 50000
Hole: 51000 -- 60000
...

Which means that there must be some data between 0x10000 and 0x11000, and yes there is: it's the dataLength at the start of a block. That's because we can't merge deletes across the block boundary.
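A back-of-the-envelope check of that 2.4% (my assumptions: 64 KiB blocks, judging from the 0x10000-spaced holes, 4 KiB hole granularity, and one retained page per block in the deleted region):

```python
block = 0x10000          # 64 KiB block, matching the 0x10000-spaced holes
page = 0x1000            # 4 KiB hole granularity
deleted_fraction = 0.40
overhead = deleted_fraction * page / block   # one retained page per deleted block
print(f'{overhead:.1%}')                     # 2.5%, close to the observed 2.4%
```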

staltz commented Apr 13, 2022

I considered that maybe we could change the "End of Block" marker to 0xFFFF and use 0x0000 at the start of a block to mean "this block is empty, please skip to the next block". But in the case of 80% of records deleted at random spots around the log, there are actually very few blocks (I couldn't find any) that are entirely empty, so it wouldn't help that case.

I guess I'm ready to conclude that sparse files would help us only in specific cases (where deleted records are heavily concentrated in one region of the log); in the average case the sparse file approach frees just 5%~25% of space. I was hoping for an order of magnitude more, something like 90%.

arj03 commented Apr 13, 2022

Yeah, sadly sometimes things don't work out exactly as planned. At least what you did was a good exploration of this idea, so that in the future we can safely say this was tried and we know how it works.

staltz removed their assignment Apr 13, 2022