Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for flatfs by default #55

Merged
merged 6 commits into from
Dec 22, 2023
Merged

Conversation

acejam
Copy link
Contributor

@acejam acejam commented Dec 21, 2023

The IPFS project has been historically plagued by BadgerDB-related GC issues. It's time to finally replace badger with flatfs.

README.md Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
setup.go Outdated Show resolved Hide resolved
@hsanjuan
Copy link
Contributor

The IPFS project has been historically plagued by BadgerDB-related GC issues

The IPFS project always used badger-v1, and we are using badger-v4 here. One motivation (apart from being a datastore that works very well imho), is that you can set TTL on values, and use that as GC, but we haven't tried. That GC strategy would be as bad as what is implemented now (list random keys and delete them), but maybe better (probably things that are expiring are mostly localized to the same tables as they must have been written around the same time).

I think in general there's confusion how GC works in databases like badger, which need recompaction of sstables and depending how the data has been structured on disk may not be able to reclaim a lot of space depending on the settings, but that doesn't mean it's bad per se. Using flatfs has gotchas. I.e. you can shoot yourself in the foot very easily by using it on an ext4 filesystem.

acejam and others added 2 commits December 21, 2023 16:31
Co-authored-by: Adin Schmahmann <adin.schmahmann@gmail.com>
Co-authored-by: Adin Schmahmann <adin.schmahmann@gmail.com>
@acejam
Copy link
Contributor Author

acejam commented Dec 21, 2023

I think in general there's confusion how GC works in databases like badger, which need recompaction of sstables and depending how the data has been structured on disk may not be able to reclaim a lot of space depending on the settings, but that doesn't mean it's bad per se. Using flatfs has gotchas. I.e. you can shoot yourself in the foot very easily by using it on an ext4 filesystem.

I think any datastore is always going to have pros and cons. But I also think that someone is far more likely to run out of disk before experiencing ext4-related issues.

The overall goal here is to introduce production-ready-yet-safe defaults. If you spin up a rainbow instance today and immediately download 100 GB of content, triggering GC will do nothing. That's a big problem, and is what I'm trying to primarily solve for here.

@aschmahmann
Copy link
Contributor

@acejam I don't have permission to push to your branch, but I added a commit that adds configurability for flatfs+badger, defaults to flatfs, and adds some docs in https://github.com/ipfs/rainbow/tree/feat/configurable-blockstores.

I don't think that we're necessarily in a position to be saying one datastore is good/bad when AFAICT they all have tradeoffs and we need some time to explore what makes sense for rainbow. Having FlatFS be the default for now seems like a safe+reasonable option for now though.

If you'd like to cherry pick my changes onto your branch or give me permission to push that's fine. If you're not convinced with this approach and want to discuss we can do that here or I can open an alternative PR (whatever works for you).

@acejam
Copy link
Contributor Author

acejam commented Dec 22, 2023

@acejam I don't have permission to push to your branch, but I added a commit that adds configurability for flatfs+badger, defaults to flatfs, and adds some docs in https://github.com/ipfs/rainbow/tree/feat/configurable-blockstores.

I don't think that we're necessarily in a position to be saying one datastore is good/bad when AFAICT they all have tradeoffs and we need some time to explore what makes sense for rainbow. Having FlatFS be the default for now seems like a safe+reasonable option for now though.

If you'd like to cherry pick my changes onto your branch or give me permission to push that's fine. If you're not convinced with this approach and want to discuss we can do that here or I can open an alternative PR (whatever works for you).

Thanks @aschmahmann. This approach works for me. I cherry-picked your commit onto my branch.

Also for clarity, I am not saying that badger itself is "bad" per se. My statement here is that it's current GC implementation does not work very well, and does not function as most would expect it to. These issues have been well documented over several years across both IPFS and dgraph related repos.

If we're able to get badger GC working, I'm happy to revisit using badger, at least for my own use cases.

@acejam acejam changed the title Replace badger with flatfs Add support for flatfs by default Dec 22, 2023
Copy link
Contributor

@aschmahmann aschmahmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, will merge when CI passes and we can do any followups needed in other PRs.

@aschmahmann aschmahmann merged commit 5067f27 into ipfs:main Dec 22, 2023
10 checks passed
@acejam acejam deleted the flatfs-v2 branch December 23, 2023 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants