Description
As of #1 and its implementation in #2, the u64 values that the index gives us are split into two halves: 32 bits for the position in the archive and 32 bits for the length of the file. This means the archive can be at most 4GB in total, and each file at most 4GB. Here are some ways to change these limits.
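Here is a minimal sketch of the current 32/32 layout. It assumes the offset sits in the high 32 bits and the length in the low 32 bits, since the issue does not say which half is which. `pack` and `unpack` are hypothetical helper names, not part of fst or this crate:

```rust
/// Hypothetical helper: combine an (offset, length) pair into the single
/// u64 value stored in the fst index. Assumes offset = high 32 bits,
/// length = low 32 bits.
pub fn pack(offset: u32, len: u32) -> u64 {
    ((offset as u64) << 32) | len as u64
}

/// Reverse of `pack`: the high half is the offset, the low half the length.
/// The `as u32` cast keeps only the low 32 bits.
pub fn unpack(value: u64) -> (u32, u32) {
    ((value >> 32) as u32, value as u32)
}
```

Because both halves are `u32`, neither an offset nor a length can exceed `u32::MAX`, which is where the two 4GB limits come from.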
Increase addressable archive size by enforcing write alignment
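If we align the start of each file written to the archive to a multiple of 2^n bytes, the lowest n bits of every file address are always zero. We can drop them and get n more bits to use for addressing. For example, if we align files so that their address always ends in …0000 (binary), we can shift all addresses right by 4 bits. That yields 2^4 = 16 times the addressable archive size (64GB instead of 4GB). This of course introduces zero-filled gaps (padding) between files in the archive.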
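The alignment trick could look roughly like this. The sketch assumes 16-byte alignment (n = 4) and a 32-bit offset field. `ALIGN_BITS`, `padding_for`, `encode_offset` and `decode_offset` are hypothetical names used only for illustration:

```rust
/// Hypothetical: files start on 2^ALIGN_BITS-byte boundaries (n = 4 here).
pub const ALIGN_BITS: u32 = 4;
pub const ALIGN: u64 = 1 << ALIGN_BITS; // 16 bytes

/// Number of zero bytes the writer must append so the next file
/// starts on an aligned offset (0 if already aligned).
pub fn padding_for(current_len: u64) -> u64 {
    (ALIGN - current_len % ALIGN) % ALIGN
}

/// Store an aligned byte offset in 32 bits by dropping its always-zero
/// low bits. Returns None if the offset is unaligned or still too large.
pub fn encode_offset(offset: u64) -> Option<u32> {
    if offset % ALIGN != 0 {
        return None;
    }
    let shifted = offset >> ALIGN_BITS;
    if shifted > u32::MAX as u64 {
        return None;
    }
    Some(shifted as u32)
}

/// Recover the real byte offset from the stored 32-bit value.
pub fn decode_offset(stored: u32) -> u64 {
    (stored as u64) << ALIGN_BITS
}
```

With n = 4, the largest storable offset decodes to 2^36 - 16, so the archive can grow to 2^36 bytes (64GB). The writer would call `padding_for` before appending each file and write that many zero bytes.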
Using more bits for addressing
Instead of splitting the 64-bit integer into two 32-bit integers, we might as well split it differently, for example into 40 and 24 bits. That shifts the limits to 1TB archives (2^40 bytes) containing files of up to 16MB (2^24 bytes). This should work very well for the rustdoc use case.
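Here is a sketch of a 40/24 split. It assumes the offset occupies the high 40 bits and the length the low 24 bits. The constants and function names are hypothetical, and `pack` returns `None` instead of silently truncating values that do not fit:

```rust
/// Hypothetical layout: low LEN_BITS = file length, high bits = offset.
pub const LEN_BITS: u32 = 24; // lengths up to 2^24 - 1 bytes (just under 16MB)
pub const OFFSET_BITS: u32 = 64 - LEN_BITS; // 40 bits: offsets below 2^40 (1TB)
const LEN_MASK: u64 = (1 << LEN_BITS) - 1;

/// Pack offset and length into one u64, or None if either does not fit.
pub fn pack(offset: u64, len: u64) -> Option<u64> {
    if len > LEN_MASK || (offset >> OFFSET_BITS) != 0 {
        return None;
    }
    Some((offset << LEN_BITS) | len)
}

/// Split a stored value back into (offset, length).
pub fn unpack(value: u64) -> (u64, u64) {
    (value >> LEN_BITS, value & LEN_MASK)
}
```

Changing `LEN_BITS` is all it takes to move the boundary. Packing offset `(1 << 40) - 1` with length `(1 << 24) - 1` fills every bit and yields `u64::MAX`.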
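This can of course be combined with the alignment option described above. With a 36/28 split and a 4-bit shift (files aligned to 2^4 = 16 bytes), we get archives of up to (2^36)*(2^4) = 2^40 bytes = 1TB, containing files of up to 2^28 bytes = 256MB (268,435,456 bytes).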
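Combining both ideas, here is a sketch of the 36/28 split with files aligned to 2^4 = 16 bytes. The names are hypothetical, and the high/low bit placement is again an assumption:

```rust
/// Hypothetical combined layout: 36 bits of shifted offset, 28 bits of length.
pub const ALIGN_BITS: u32 = 4; // offsets are multiples of 2^4 = 16 bytes
pub const LEN_BITS: u32 = 28; // lengths up to 2^28 - 1 bytes (just under 256MB)
pub const OFFSET_BITS: u32 = 64 - LEN_BITS; // 36 stored offset bits
const ALIGN_MASK: u64 = (1 << ALIGN_BITS) - 1;
const LEN_MASK: u64 = (1 << LEN_BITS) - 1;

/// Archive size the scheme can address: 2^(36 + 4) = 2^40 bytes (1TB).
pub const MAX_ARCHIVE_BYTES: u64 = 1 << (OFFSET_BITS + ALIGN_BITS);

/// Pack an aligned offset and a length, or None if either does not fit.
pub fn pack(offset: u64, len: u64) -> Option<u64> {
    let aligned = (offset & ALIGN_MASK) == 0;
    if !aligned || offset >= MAX_ARCHIVE_BYTES || len > LEN_MASK {
        return None;
    }
    // Drop the always-zero low bits, then make room for the length.
    Some(((offset >> ALIGN_BITS) << LEN_BITS) | len)
}

/// Split a stored value back into (byte offset, length).
pub fn unpack(value: u64) -> (u64, u64) {
    ((value >> LEN_BITS) << ALIGN_BITS, value & LEN_MASK)
}
```

`MAX_ARCHIVE_BYTES` works out to 2^40. The largest value, `u64::MAX`, unpacks to offset 2^40 - 16 and length 2^28 - 1.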
Patching fst to allow other value types
The fst docs mention that in the future it should be possible to map keys to something other than a u64. We could make that future happen. This seems to be the most complicated and time-consuming of all the options, though :)
Please correct my math; it's late and I've had a few beers.