Allow larger files #6

Open
killercup opened this issue Oct 22, 2018 · 2 comments

Comments

@killercup
Owner

commented Oct 22, 2018

As of #1 and its implementation in #2, each u64 value that the index gives us is split in two: 32 bits for the position in the archive and 32 bits for the length of the file. This means a 4GB limit for the archive as a whole and a 4GB maximum size per file. Here are some ways to change these limits.
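For reference, a minimal sketch of the current 32/32 split. The function names are hypothetical, and putting the position in the high half is an assumption; the actual implementation in #2 may order the halves differently.

```rust
/// Pack a 32-bit archive position and a 32-bit file length into one u64.
/// Assumption: the position occupies the high 32 bits.
fn pack(position: u32, len: u32) -> u64 {
    (u64::from(position) << 32) | u64::from(len)
}

/// Split an index value back into (position, length).
fn unpack(value: u64) -> (u32, u32) {
    // The `as u32` cast on `value` deliberately keeps only the low 32 bits.
    ((value >> 32) as u32, value as u32)
}
```

Both limits fall out of the field widths: neither the position nor the length can exceed `u32::MAX`, hence 4GB each.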

Increase addressable archive size bye enforce write alignment

If we align the start of each file written to the archive to a multiple of 2^n bytes, we get n more bits to use for addressing. For example, if we align files so that their addresses end in …0000 (binary), we can shift all addresses by 4 bits, yielding 2^4=16 times the addressable archive size. This of course introduces zeroed gaps in the archive files.
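A rough sketch of how the alignment trick could look, assuming n=4 and a 32-bit position field. All names here are made up for illustration and are not part of the existing code.

```rust
/// Number of low address bits that alignment guarantees to be zero (n = 4).
const ALIGN_BITS: u32 = 4;
const ALIGN: u64 = 1 << ALIGN_BITS;

/// Round a write position up to the next multiple of ALIGN.
/// The bytes skipped over are the zeroed gaps mentioned above.
/// Note: overflows for positions within ALIGN of u64::MAX.
fn align_up(pos: u64) -> u64 {
    (pos + ALIGN - 1) & !(ALIGN - 1)
}

/// Squeeze an aligned byte position into 32 bits by dropping the zero bits.
/// Returns None if the position is unaligned or still does not fit.
fn encode_position(pos: u64) -> Option<u32> {
    if pos % ALIGN != 0 {
        return None;
    }
    let shifted = pos >> ALIGN_BITS;
    if shifted > u64::from(u32::MAX) {
        None
    } else {
        Some(shifted as u32)
    }
}

/// Recover the real byte position from the stored 32-bit value.
fn decode_position(stored: u32) -> u64 {
    u64::from(stored) << ALIGN_BITS
}
```

With these constants the largest addressable position is `u32::MAX * 16`, so the archive can grow to roughly 64GB while the index values stay the same size.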

Using more bits for addressing

Instead of splitting the 64-bit integer into two 32-bit integers, we might split it into, for example, 40 bits and 24 bits -- shifting the limits to 1TB archives containing files of up to 16MB. This should work very well for the rustdoc use case.

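This can of course be combined with the alignment option described above. For example, 36 address bits plus 4 bits from alignment yield (2^36)*(2^4)=1TB archive files, containing 16-byte-aligned files (2^4=16, not 4) of up to 2^28 bytes=256MB each (about 268 million bytes).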
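To make the two variants above easier to compare, here is a sketch with the field widths as parameters. `Layout` and its methods are hypothetical names; the position is again assumed to sit in the high bits.

```rust
/// Hypothetical description of how the 64 bits are divided.
/// Assumes 0 < position_bits < 64 so that none of the shifts overflow.
#[derive(Clone, Copy)]
struct Layout {
    /// Bits used for the (shifted) archive position.
    position_bits: u32,
    /// Files start at multiples of 2^align_bits bytes.
    align_bits: u32,
}

impl Layout {
    /// Bits left over for the file length.
    fn len_bits(self) -> u32 {
        64 - self.position_bits
    }

    /// Pack a byte position and a length; None if either does not fit
    /// or the position is not aligned.
    fn pack(self, pos: u64, len: u64) -> Option<u64> {
        let align_mask = (1u64 << self.align_bits) - 1;
        if pos & align_mask != 0 {
            return None;
        }
        let stored = pos >> self.align_bits;
        if stored >> self.position_bits != 0 || len >> self.len_bits() != 0 {
            return None;
        }
        Some((stored << self.len_bits()) | len)
    }

    /// Recover (byte position, length) from a packed value.
    fn unpack(self, value: u64) -> (u64, u64) {
        let len_mask = (1u64 << self.len_bits()) - 1;
        let pos = (value >> self.len_bits()) << self.align_bits;
        (pos, value & len_mask)
    }
}
```

`Layout { position_bits: 40, align_bits: 0 }` corresponds to the plain 40/24 split, and `Layout { position_bits: 36, align_bits: 4 }` to the combined variant with 16-byte alignment.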

Patching fst to allow other value types

The fst docs mention that in the future, it should be possible to map to something other than a u64. We can make that future happen. This seems to be the most complicated and time-consuming of all the options, though :)


Please correct my math, it's late and I had a few beers.

@coder543


commented Nov 12, 2018

There is another solution not mentioned here: simply store the address and length separately. When writing the file contents into the archive, prepend the contents with an 8 byte integer representing the length.

When fst hands you back the u64, let that u64 just represent the absolute byte position of the file content in the archive. Seek to that position in the file, read the 8 bytes representing the length, then read the content based on the length that you just extracted.
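A minimal sketch of this length-prefix scheme using only the standard library. The function names are hypothetical, and little-endian encoding of the length is an assumption, since the comment does not specify a byte order.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Append `content` to the archive, preceded by its length as an 8-byte
/// little-endian integer. Returns the position to store in the index.
fn append_file<W: Write + Seek>(archive: &mut W, content: &[u8]) -> io::Result<u64> {
    let pos = archive.seek(SeekFrom::End(0))?;
    archive.write_all(&(content.len() as u64).to_le_bytes())?;
    archive.write_all(content)?;
    Ok(pos)
}

/// Read the file whose length prefix starts at `pos`.
fn read_file<R: Read + Seek>(archive: &mut R, pos: u64) -> io::Result<Vec<u8>> {
    archive.seek(SeekFrom::Start(pos))?;
    let mut len_bytes = [0u8; 8];
    archive.read_exact(&mut len_bytes)?;
    let len = u64::from_le_bytes(len_bytes);

    // `take` caps the read at `len` bytes without pre-allocating a buffer
    // of a size taken from possibly corrupt data.
    let mut content = Vec::new();
    archive.by_ref().take(len).read_to_end(&mut content)?;
    if content.len() as u64 != len {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "archive ended before the full file was read",
        ));
    }
    Ok(content)
}
```

With this layout the whole u64 is available for addressing, and the per-file limit becomes whatever fits in the 8-byte prefix, at the cost of 8 extra bytes per file and one extra small read per lookup.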

Edit: also, you have a typo, just letting you know!

size bye enforce

should be

size by enforcing

@killercup

Owner Author

commented Nov 12, 2018
