Add support for flatfs by default #55
The IPFS project always used badger-v1, and we are using badger-v4 here. One motivation (apart from it being a datastore that works very well, imho) is that you can set a TTL on values and use that as GC, but we haven't tried it. That GC strategy could be as bad as what is implemented now (list random keys and delete them), but it may be better, since entries that expire together were probably written around the same time and so are mostly localized to the same tables. I think in general there's confusion about how GC works in databases like badger, which need recompaction of sstables and, depending on how the data is laid out on disk and on the settings, may not be able to reclaim much space. That doesn't mean it's bad per se. Using flatfs has gotchas too: for example, you can shoot yourself in the foot very easily by using it on an ext4 filesystem.
Co-authored-by: Adin Schmahmann <adin.schmahmann@gmail.com>
I think any datastore is always going to have pros and cons. But I also think that someone is far more likely to run out of disk before experiencing ext4-related issues. The overall goal here is to introduce production-ready-yet-safe defaults. If you spin up a rainbow instance today and immediately download 100 GB of content, triggering GC will do nothing. That's a big problem, and it is primarily what I'm trying to solve here.
@acejam I don't have permission to push to your branch, but I added a commit that adds configurability for flatfs+badger, defaults to flatfs, and adds some docs in https://github.com/ipfs/rainbow/tree/feat/configurable-blockstores. I don't think that we're necessarily in a position to be saying one datastore is good/bad when AFAICT they all have tradeoffs and we need some time to explore what makes sense for rainbow. Having If you'd like to cherry-pick my changes onto your branch or give me permission to push, that's fine. If you're not convinced by this approach and want to discuss, we can do that here or I can open an alternative PR (whatever works for you).
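As a rough illustration of "configurable, defaulting to flatfs", the selection logic could look something like the Go sketch below. This is not the code from the linked branch: `blockstoreKind`, `parseBlockstoreKind`, and the `EXAMPLE_BLOCKSTORE` environment variable are hypothetical names chosen for the example, and the real option name in rainbow may differ.

```go
package main

import (
	"fmt"
	"os"
)

// blockstoreKind names a supported backend (hypothetical type).
type blockstoreKind string

const (
	kindFlatfs blockstoreKind = "flatfs"
	kindBadger blockstoreKind = "badger"
)

// parseBlockstoreKind maps a user-supplied setting to a backend.
// An empty setting falls back to flatfs, the proposed default.
func parseBlockstoreKind(s string) (blockstoreKind, error) {
	switch s {
	case "", "flatfs":
		return kindFlatfs, nil
	case "badger":
		return kindBadger, nil
	default:
		return "", fmt.Errorf("unknown blockstore %q (want flatfs or badger)", s)
	}
}

func main() {
	// EXAMPLE_BLOCKSTORE is an assumed variable name for this sketch only.
	kind, err := parseBlockstoreKind(os.Getenv("EXAMPLE_BLOCKSTORE"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("using blockstore:", kind)
}
```

Rejecting unknown values instead of silently falling back keeps a typo from quietly selecting a different datastore than the operator intended.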
Thanks @aschmahmann. This approach works for me. I cherry-picked your commit onto my branch. Also for clarity, I am not saying that badger itself is "bad" per se. My statement here is that its current GC implementation does not work very well, and does not function as most would expect it to. These issues have been well documented over several years across both IPFS and dgraph-related repos. If we're able to get badger GC working, I'm happy to revisit using badger, at least for my own use cases.
LGTM, will merge when CI passes and we can do any followups needed in other PRs.
The IPFS project has been historically plagued by BadgerDB-related GC issues. It's time to finally replace badger with flatfs.