IPFS Repo - BlockStore directory sharding #41
0.4.0 should respect whatever is defined in the spec, and if the spec is missing something, now is a good time to add it, so that we can all trust the spec when creating our implementations :) @whyrusleeping might have some thoughts though :) |
In the Clojure blocks library the |
Oh! That's relevant to my interests, @greglook. |
flatfs in go-ipfs ended up deviating from the spec. It's my fault for not pressing for it -- or not updating the spec. I think we can migrate go-ipfs to follow the 3 tiers. (ls-ing huge directories is annoying anyway; not all filesystems handle it well.) |
@tv42 was there a strong reason you opted for single tiering instead of a larger fanout? I recall you mentioning it might be slower (more dirs to traverse?), but this likely varies by fs? |
For filesystem-based stores with good performance you'll probably want to use something like Camlistore's diskpacked store (or the logical version, blobpacked) anyway. |
@jbenet It really comes down to this: a single-level split is easier to understand and easier to program. A typical modern Linux FS performs well all the way up to hundreds of thousands or even a few million files in one directory, and the current setting of a 4-byte fanout (the actual amount of entropy depends on whether the slash prefix gets stripped or not; that's been changed by others enough that I'm no longer clear on it) ought to work well enough up to tens of terabytes, where something else becomes a problem first. I made the split size configurable, so people can fiddle with that if needed.

A multi-level split only makes sense if the one-level split would leave the top-level dir with too many entries (once again, I'd expect >> 100k). By that time I expect you'll have other problems: each directory would then contain at least 100k items, putting total storage easily over 100k * 100k * 256kB = 2PB. Personally, I've only seen affordable large storage work well in JBOD mode, so the amount of data in a single flatfs is likely in the low tens of TB for the near future. That's < 100M objects tops at 256kB per object; even a 256-way split brings that down to ~400k objects per dir, which I expect to show no significant deterioration in performance (i.e. even that split is wide enough). You'll suffer more from FS inode record-keeping overhead than from the single level of sharding.

@greglook That's yet another variation of what I've been calling arena storage. The decision to go with flatfs came down to the combo: 1) it's simple, 2) we can get it going fast. I agree arena storage can smoke it in performance, mostly because, done right, it can manage disk syncs better. |
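The capacity estimate in the comment above can be checked with a few lines of arithmetic. All figures here are the comment's own assumptions (256 kB average object size, ~100k entries as the comfort limit per directory), not measurements:

```go
package main

import "fmt"

// Assumed average object size from the discussion above, in bytes.
const objSize = 256 * 1024

// objectsPerDir estimates how many entries a single-level split of the
// given fanout leaves per shard directory for a store of totalBytes.
func objectsPerDir(totalBytes, fanout float64) float64 {
	return totalBytes / objSize / fanout
}

func main() {
	// Multi-level only pays off once the top dir itself exceeds ~100k
	// entries; with ~100k objects in each of those dirs, the store holds:
	fmt.Printf("break-even size: %.1f PB\n", 100e3*100e3*objSize/1e15)

	// A low-tens-of-TB flatfs (say 25 TB) at 256 kB per object:
	fmt.Printf("objects in 25 TB: %.0fM\n", 25e12/objSize/1e6)

	// ...which a 256-way single-level split spreads out to roughly:
	fmt.Printf("per dir, 256-way split: %.0fk\n", objectsPerDir(25e12, 256)/1e3)
}
```

This reproduces the ~2PB break-even and the ~400k-objects-per-directory figures, supporting the conclusion that a single-level split is wide enough at realistic store sizes.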
As for the spec:
|
See ipfs/specs#41 for context.

I'm trying to implement fs-repo for js, and I'm not sure the spec reflects the current go implementation. This is how the blocks are stored right now on my machine (0.3.8-dev), while in the spec directory partitioning looks like: