GitHub - robaho/cpp_leveldb: A port of the ultra-high-performance robaho key-value database using modern C++.

summary

This is a C++ port of my ultra low-latency keydb key/value database.

background

The main motivation for this project was to familiarize myself with current C++ development, specifically C++20.

The project is a bit of an "inception", in that, my original keydb was written in Go, then I did a Java port, then I discovered the Google leveldb project - which is written in C++. I changed the keydb api to mimic the Google leveldb api - and renamed them leveldb and jleveldb.

Now with this C++ port, I have written the same library from scratch in three languages (Java code,Go code)- along with an independent version developed by Google. My library focus was on api simplicity, maintainability and performance. The three versions are written using the same design, but are idiomatic within the language.

leveldb only supports multi-threaded access. There is a client/server module to expose cpp_leveldb for multi-process access available at cpp_leveldbr

performance

Google's leveldb includes a benchmarking tool dbbench. I have replicated this in each of the libraries. All tests are run on the same Intel iMac under OSX. All timings are in microseconds / operation. The methodology is that for each platform, db_bench was run multiple times, and the lowest timing for each task was recorded. Because of the short duration of the test, and the tasks themselves, they are subject to OS background interference.

Test	Go	Go PGO²	Java	Java GraalVM¹	C++	C++ PGO²	Google
write no-sync	4.25	4.23	4.86	5.51	4.65	4.36	4.60
write sync³	45	46	46	50	10822	10888	10023
write batch	1.07	1.06	1.06	1.72	1.19	1.02	1.94
write overwrite	4.37	4.30	4.65	5.60	4.65	4.43	8.20
read random	2.44	2.49	3.79	8.07	5.93	4.61	4.74
read sequential	0.36	0.31	0.35	0.51	0.53	0.42	0.15
read random compact	2.44	2.40	3.62	7.55	3.71	2.68	2.13
read sequential compact	0.09	0.09	0.07	0.14	0.06	0.04	0.12

¹ Profile guided optimizations are not enabled since they require the enterprise version.

² Compiled with profile guided optimizations.

³ Both C++ and Google versions use FULLFSYNC on OSX for improved safety.

learnings

Modern C++ is a joy in many ways but it isn't perfect.

pros

Approaches garbage collection memory safety by using shared_ptr, unique_ptr, etc.
Tools like ASan make tracking down memory issues/leaks fairly straightforward.
Many easy to use profilers available. I used Samply during development.
Many modern debuggers and IDEs available - supporting syntax highlighting, code completion and popup api documentation.
Excellent support for unit tests using facilities like Boost.
Performance can be outstanding.

cons

shared_ptr is slow, making it only suitable for top-level objects, then you must use raw pointers internally.
Dynamic memory management (e.g. malloc/free) is very slow. This necessitates "manual" memory management, e.g. buffer re-use, arena allocators, etc. The initial "clean" C++ version was nearly 3x slower than the final - much, much slower than the Go or Java versions. In order to achieve the desired performance, you end up creating "restrictive/bug-prone" apis. For instance, the Google API docs state:

Caller should delete the iterator when it is no longer needed. The returned iterator should be deleted before this db is deleted.

and

Return the value for the current entry. The underlying storage for the returned slice is valid only until the next modification of the iterator.

Which works, but it does make the api more fragile (i.e. error prone). Often the inefficiency is simply pushed farther (e.g. you can't store the returned iterator value in a map without a copy).
Compilation times are extremely slow, especially when using higher optimization levels. Since C++ does not use a context-free grammar everything must be recompiled almost always.
Writing lock-free data structures is much easier in a GC language due to the ABA problem. std::atomic helps but using a well-tested library like Boost is probably safer. Still, I was able to implement the concurrent skip list in a lock-free manner.
Cryptic error messages. Missing a single character can lead to a 100 line error across multiple files. Same underlying cause as the slow compile times.
Having to work with both header and implementation files is a pain.

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
include		include
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
bulkwrite_test.cpp		bulkwrite_test.cpp
compile_flags.txt		compile_flags.txt
cpptest.cpp		cpptest.cpp
database.cpp		database.cpp
database_test.cpp		database_test.cpp
databaseops.cpp		databaseops.cpp
dbbench_test.cpp		dbbench_test.cpp
default.profdata		default.profdata
diskio.cpp		diskio.cpp
disksegment.cpp		disksegment.cpp
disksegment_test.cpp		disksegment_test.cpp
lockfile_test.cpp		lockfile_test.cpp
logfile.cpp		logfile.cpp
logfile_test.cpp		logfile_test.cpp
memorysegment.cpp		memorysegment.cpp
memorysegment_test.cpp		memorysegment_test.cpp
merger.cpp		merger.cpp
merger_test.cpp		merger_test.cpp
multisegment_test.cpp		multisegment_test.cpp
profile.json		profile.json
randomread_test.cpp		randomread_test.cpp
seqread_test.cpp		seqread_test.cpp
skiplist_main.cpp		skiplist_main.cpp
skiplist_test.cpp		skiplist_test.cpp
snapshot.cpp		snapshot.cpp
snapshot_test.cpp		snapshot_test.cpp
test_helpers.h		test_helpers.h
testfile		testfile
timings.txt		timings.txt
waitgroup_test.cpp		waitgroup_test.cpp
writebatch_test.cpp		writebatch_test.cpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

summary

background

performance

learnings

pros

cons

About

Releases

Packages

Languages

robaho/cpp_leveldb

Folders and files

Latest commit

History

Repository files navigation

summary

background

performance

learnings

pros

cons

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages