Skip to content
This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

Replace LevelDB with BadgerDB #2141

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Replace LevelDB with BadgerDB #2141

wants to merge 2 commits into from

Conversation

jmozah
Copy link
Collaborator

@jmozah jmozah commented Mar 23, 2020

This PR was created to see if badger is a viable alternative to leveldb and forky. Benchmarks show that it performs at par with current forky implementation if the "SyncWrites" flag is set to false (which is how forky is implemented).

This PR replaces the entire DB, which means both the chunk store and the state store are replaced with badgerDB. The flag "ValueThreshold" is set to 1K, that means any value which is less then 1K will be stored with the key (which is good for state store and indexes). Chunk store values which are 4K will be stored in a separate valueLog just like the forky design.

Right now badger 1.6 version is used as there is a regression in 2.0 version which make read expensive. Once this is fixed we ca start using version 2.0. Version 2.o also has a mem only option which can be used for testing.

@jmozah jmozah requested review from janos, zelig and acud March 23, 2020 08:20
@jmozah jmozah self-assigned this Mar 23, 2020
@jmozah jmozah changed the title Badger Replace LevelDB with BadgerDB Mar 23, 2020
@jmozah jmozah changed the title Replace LevelDB with BadgerDB Replace LevelDB/Forky with BadgerDB Mar 23, 2020
@janos
Copy link
Member

janos commented Mar 23, 2020

Thanks @jmozah. This PR has nothing to do with the Forky (fcds) it replaces leveldb completely. I thought that the goal would be to replace fcds.Store, at least to measure and compare performance and scalability on multiple cores. Could you provide some performance comparisons between fcds-teenage-mutants (or fcds) branch and this branch?

Have in mind that these changes completely removes everything done in forky, not just fcds.Store file storing approach. New migrations (manual or automatic) for both localstore and all state stores have to be implemented.

@janos janos changed the title Replace LevelDB/Forky with BadgerDB Replace LevelDBwith BadgerDB Mar 23, 2020
@janos janos changed the title Replace LevelDBwith BadgerDB Replace LevelDB with BadgerDB Mar 23, 2020
@jmozah
Copy link
Collaborator Author

jmozah commented Mar 23, 2020

@janos There are two commits in this PR. The first one has badger only in fcds. The second one replaces the entire leveldb with badger. I reasoning to go for the second was to avoid having two DB implementations. For that matter i am also not very sure which is best now .. given that we might want to move to bee.

What do you suggest, i just use badger for fcds and benchmark against the forky branch? is that enough?

@janos
Copy link
Member

janos commented Mar 23, 2020

What do you suggest, i just use badger for fcds and benchmark against the forky branch? is that enough?

My impression was that fcds benchmarks will be done first to measure the difference between using plain files and badger for chunk data store.

It would be great to have badger for everything, but that is a greater area to cover for validation and justification for such large PR. Beside that, a reason to use an older 1.6 version and not the latest stable 2.0 is a bit worrying.

@jmozah
Copy link
Collaborator Author

jmozah commented Mar 24, 2020

My impression was that fcds benchmarks will be done first to measure the difference between using plain files and badger for chunk data store.

Ok. WIll create a new PR with only that and will benchmark that.

Beside that, a reason to use an older 1.6 version and not the latest stable 2.0 is a bit worrying.

Why?. They have a bug in the new version. it was reported only few days back and they are fixing it. There is even a patch for that. Unil a minor release comes out i decided to use 1.6 for benchmark.

@janos
Copy link
Member

janos commented Mar 24, 2020

Why?. They have a bug in the new version. it was reported only few days back and they are fixing it. There is even a patch for that. Unil a minor release comes out i decided to use 1.6 for benchmark.

Because it would be very nice from you to provide more information, like an issue you just mentioned and what is the expected resolution. This would be very helpful for reviewers.

@jmozah
Copy link
Collaborator Author

jmozah commented Mar 25, 2020

like an issue you just mentioned and what is the expected resolution.

This is the issues which is affecting version 2.0 in badger.
dgraph-io/badger#1248 (fixed)
dgraph-io/badger#1254 (not fixed yet)

Thats why i used a older release.
BTW 2.0 is supposed to be faster than 1.6. and have lot more features like compression etc.

@janos
Copy link
Member

janos commented Mar 25, 2020

Thanks. We should watch for the remaining issue.

@jarifibrahim
Copy link

jarifibrahim commented Mar 26, 2020

Hey @jmozah @janos, I'm the maintainer of Badger DB. The read performance issue has been fixed in master and we will be releasing v2.0.3 soon (it was supposed to be released yesterday). As for the issue dgraph-io/badger#1254 , it is open because the user is seeing some bug in the vlog file generation and we haven't been able to reproduce the failure.

Feel free to raise issues on badger if you find any other performance issues.

Thank you for using BadgerDB :)

@jmozah
Copy link
Collaborator Author

jmozah commented Mar 27, 2020

@jarifibrahim Thanks for the information. We are keeping a watch on the bug and will upgrade to 2.0.x as soon as the issue is fixed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants