Improve Transactional Performance #251

Open
1 of 7 tasks
ecton opened this issue May 10, 2022 · 0 comments
Labels
blocked This issue can't be resolved due to a dependency

ecton commented May 10, 2022

Changes in Nebari v0.5.3 that fixed actual ACID compliance have slowed BonsaiDb significantly. Since then, I have been working on a new storage layer that will improve BonsaiDb's transactional performance. Here is an overview of the timeline:

  • May 2022: Discovered File::flush doesn't call fsync, and learned about tmpfs. In short, once I called the correct method to make fsync happen, design decisions I had made early on caused BonsaiDb/Nebari's transactional writes to be quite slow. For light applications, the speed would still have been perfectly acceptable, but under any significant write load, the database would become a bottleneck much sooner than PostgreSQL or SQLite would.

  • May 2022: Updated Nebari with new transaction batching. This change significantly improved performance, but there were still two fsync operations per transaction. The only way I could see to improve things further was to change my approach to how data was stored.

  • June 2022: While trying to measure and understand the performance of various file synchronization mechanisms, I discovered that SQLite on macOS isn't actually ACID compliant.

  • July 2022: I wrote an overview of my goals for Sediment, a storage layer I plan to place beneath Nebari, which BonsaiDb uses for the underlying database implementation.

  • August 2022: I got Sediment to the point of benchmarking, and I felt pretty good about its overall performance relative to other embedded stores. However, while preparing a new blog post, I ran the same benchmark against PostgreSQL and discovered that PostgreSQL outperformed them all. Why? It turns out write-ahead logging is the fastest way to get incoming writes to disk.

  • September 2022: I wrote my own WAL implementation, inspired in part by sharded-log. Because I again used a new benchmarking implementation, I lost track of the performance of PostgreSQL. I knew I was outperforming sharded-log in my particular benchmark suite, and it was that progress that made me start a new blog post to let anyone following the BonsaiDb blog know what was going on.

    While writing that post, I realized I needed to compare it against PostgreSQL. What I found shocked me -- PostgreSQL's much simpler single-writer-at-a-time WAL design outperformed my implementation and sharded-log significantly, even with a large number of threads all competing to write at the same time. I scrapped the blog post and began rewriting my implementation to be inspired by PostgreSQL instead.

  • October 2022: I finished my rewrite of OkayWAL, and I saw the mountain of work ahead of me to get everything tied back together. I was a bit burned out, and I needed a break.

  • December 2022: I began a rewrite of Sediment due to changing some of my goals with the format. Because the integration of a WAL made Sediment no longer a single-file database, I decided to simplify how Sediment works by utilizing multiple files.

  • January 2023: I published the first release of OkayWAL, and wrote a blog post introducing it. I also completed the Sediment rewrite that utilizes OkayWAL, and I'm very proud to have reached nearly 95% line coverage. Even better, the performance is looking promising.
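
The File::flush point from the first timeline entry can be illustrated with a minimal sketch. It assumes a plain `std::fs::File`; the helper name `write_durably` is hypothetical and is not part of BonsaiDb or Nebari. As far as I know, `flush` on a `std::fs::File` does nothing in current standard library implementations because there is no userspace buffer to drain, while `sync_all` is the call that asks the operating system to persist the data.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: writes `data` to `path` and returns only after the
/// operating system has been asked to persist the file to storage.
fn write_durably(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .create(true)
        .write(true)
        .truncate(true)
        .open(path)?;
    file.write_all(data)?;

    // For std::fs::File there is no userspace buffer, so flush() does not
    // request an fsync. It is kept here only to show the contrast.
    file.flush()?;

    // sync_all() is the call that requests that both data and metadata
    // reach the storage device (an fsync-style operation).
    file.sync_all()
}
```

Calling `write_durably(&path, b"hello")` pays the synchronization cost once per call, and that is the cost the later timeline entries try to amortize.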

This ticket began as a look into improving the view indexing system, but because of the storage rewrite, it has morphed into an overall combined "refactor all the storage changes needed into one big release" ticket. The current to-do list includes:

The net result of these changes should look like this:

  • BonsaiDb's transactional insert performance will be competitive with PostgreSQL, thanks to utilizing a similarly-designed WAL.
  • BonsaiDb's databases will no longer require a compact operation to reclaim unused disk space, thanks to Nebari's new underlying format Sediment.
  • BonsaiDb's view system will require fewer internal trees and offer more efficient reduce operations.
  • BonsaiDb's document storage will be designed for more efficient storage and much faster revision history retrieval.
  • BonsaiDb will be better equipped if the storage format needs to change again in the future.
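
As a rough illustration of why a single-writer WAL helps transactional inserts, here is a minimal sketch under stated assumptions. `SimpleWal`, `append_batch`, and the length-prefixed record layout are all hypothetical; this is not OkayWAL's API or on-disk format, and it is not PostgreSQL's design. It only shows the general idea of one writer at a time appending a batch of entries and paying for a single sync per batch.

```rust
use std::convert::TryFrom;
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;
use std::sync::Mutex;

/// Hypothetical minimal log: one writer at a time, one sync per batch.
struct SimpleWal {
    file: Mutex<File>,
}

impl SimpleWal {
    /// Opens (or creates) the log file in append mode.
    fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Self { file: Mutex::new(file) })
    }

    /// Appends every entry in `batch` as a little-endian u32 length followed
    /// by the entry bytes, then syncs once for the whole batch.
    fn append_batch(&self, batch: &[&[u8]]) -> io::Result<()> {
        // The mutex enforces the single-writer-at-a-time rule.
        let mut file = self.file.lock().expect("log mutex poisoned");

        // Build the batch in memory so it reaches the file in one write.
        let mut buffer = Vec::new();
        for entry in batch {
            let len = u32::try_from(entry.len())
                .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "entry too large"))?;
            buffer.extend_from_slice(&len.to_le_bytes());
            buffer.extend_from_slice(entry);
        }
        file.write_all(&buffer)?;

        // One synchronization call covers every entry in the batch.
        file.sync_data()
    }
}
```

With this shape, several pending transactions can share one `append_batch` call, so the number of sync operations grows with the number of batches rather than the number of transactions.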

@ecton added the "blocked" label May 10, 2022
@ecton added this to the v0.5.0 milestone May 10, 2022
@ecton changed the title from "Road to v0.5.0" to "Road to Faster Transactional Performance" Feb 27, 2023
@ecton modified the milestones: v0.5.0, v1.0, Performance Refactor Feb 28, 2023
@ecton changed the title from "Road to Faster Transactional Performance" to "Improve Transactional Performance" Feb 28, 2023
@ecton pinned this issue Feb 28, 2023