
Look into replacing sled #294

Closed
ShadowJonathan opened this issue Jul 14, 2021 · 34 comments
Labels
enhancement New feature or request

Comments

@ShadowJonathan
Contributor

heed is a high-level wrapper around LMDB. I think it'd be interesting to figure out whether it has better performance, compatibility, and caching possibilities compared to sled (at the moment), and/or to allow these database backends to be swapped interchangeably.

@poljar
Contributor

poljar commented Jul 27, 2021

Such an abstraction exists: pub trait StateStore: AsyncTraitDeps. You can implement a custom store and use it.

As of now, there aren't any showstoppers that would make me move away from sled, and I don't have the bandwidth to support multiple store implementations.
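To illustrate the "pluggable backend" idea, here is a minimal sketch. Note that the real StateStore trait has many async methods; SimpleStateStore and MemoryStore below are hypothetical stand-ins, only meant to show how a custom store can be swapped in behind a trait.

```rust
use std::collections::HashMap;

// Hypothetical, heavily simplified stand-in for the real StateStore trait.
trait SimpleStateStore {
    fn save(&mut self, key: &str, value: Vec<u8>);
    fn load(&self, key: &str) -> Option<&[u8]>;
}

// An in-memory backend; a sled- or LMDB-backed store would implement the
// same trait and could be swapped in without touching calling code.
struct MemoryStore {
    map: HashMap<String, Vec<u8>>,
}

impl SimpleStateStore for MemoryStore {
    fn save(&mut self, key: &str, value: Vec<u8>) {
        self.map.insert(key.to_owned(), value);
    }
    fn load(&self, key: &str) -> Option<&[u8]> {
        self.map.get(key).map(|v| v.as_slice())
    }
}

fn main() {
    let mut store = MemoryStore { map: HashMap::new() };
    store.save("account_data", b"{}".to_vec());
    assert_eq!(store.load("account_data"), Some(&b"{}"[..]));
}
```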

@poljar poljar changed the title from "Looking into replacing sled with heed" to "Looking into replacing sled" on Mar 10, 2022
@jplatte
Collaborator

jplatte commented Mar 10, 2022

Other options mentioned on Matrix today:

Both are written in Rust, so they presumably don't need a C toolchain, which could give us easy cross-compilation once olm is also replaced by vodozemac (which should happen pretty soon!).

@poljar
Contributor

poljar commented Mar 10, 2022

Another one:

It has multi-process support, which might be relevant for the iOS use case, where two sync loops are currently going on: one for the notifications and one for the main process.

@gnunicorn
Contributor

note about BonsaiDB, from their docs:

However, the underlying storage layer that BonsaiDb is built upon, sled, does not support multiple processes writing its data simultaneously.

@jplatte jplatte added the enhancement (New feature or request) label on Apr 8, 2022
@gnunicorn
Contributor

@gnunicorn
Contributor

don't forget about https://github.com/paritytech/parity-db

@bnjbvr
Member

bnjbvr commented Oct 1, 2022

A new contender recently appeared: https://github.com/cberner/redb/

Pure Rust; the README.md claims it's almost as fast as LMDB.

@jplatte
Collaborator

jplatte commented Oct 1, 2022

Unfortunately it is mmap-based, so fundamentally incompatible with accessing the DB from multiple processes at the same time, which seems like it is going to be a hard requirement for whatever we replace sled with.

@jplatte jplatte changed the title from "Looking into replacing sled" to "Look into replacing sled" on Oct 4, 2022
@gnunicorn
Contributor

I stand corrected on BonsaiDb: since 0.1.0 it's been using nebari, not sled:

The underlying dependency on sled has been changed for an in-house storage implementation nebari.

It's not clear what that means for multi-process support though (I doubt it has that).

@P-E-Meunier

P-E-Meunier commented Dec 10, 2022

I'm the author of Sanakirja and reached this thread while reading about news in Matrix. If you need any help let me know. See our benchmarks: https://pijul.org/posts/2021-02-06-rethinking-sanakirja/

@jplatte
Collaborator

jplatte commented Dec 12, 2022

@P-E-Meunier Thanks for the offer! I have started work on a crypto-store based on Sanakirja, but it's not clear to me how I would use arbitrarily-sized (byte)strings in there, both for keys and values. I got the basic setup working and thought [u8] key / value would work, but then hit the 510 byte limit in impl UnsizedStorable for [u8]. I started looking into how Pijul uses sanakirja, but didn't get very far yet. The current state of my work can be found here.

@P-E-Meunier

This isn't a hard limit. Sanakirja is a framework rather than a database; its main duty is to manage memory (including optional reference counting, if you need to fork your data structures) in an mmapped file, in a fully transactional way.

You can write many different data structures on top of that; B trees are just one particular data structure, and I've worked on others (ropes, for example). It mostly depends on what operations you want, but one workaround for the 510-byte limit is to create a new type of pages for storing large chunks of contiguous data, and reference these pages in the tree. The only tricky part is that the allocator now needs to return contiguous pages, but that isn't hard.

If you don't need your values to be contiguous in memory (or can afford to copy them rather than work with pointers), things will be easier. One significant advantage you'll gain if you can use the current allocator is that it can be implemented on top of more exotic backends: Pijul uses the same code when working on-disk and in zstd-compressed files. You could also potentially use other backends (such as replication/HA tools).
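The workaround described above can be sketched conceptually. The types below are not Sanakirja's actual API, just an illustration of the idea: values at or below an inline limit live directly in the tree node, while larger ones go to separately allocated contiguous "pages" that the node merely references.

```rust
// Conceptual sketch (not Sanakirja's actual types) of inline vs.
// out-of-line value storage.
const INLINE_LIMIT: usize = 510;

enum Value {
    Inline(Vec<u8>),     // small: stored on the B-tree page itself
    Overflow(usize),     // large: index of a contiguously allocated page
}

struct Store {
    pages: Vec<Vec<u8>>, // stand-in for the contiguous-page allocator
}

impl Store {
    fn write(&mut self, bytes: &[u8]) -> Value {
        if bytes.len() <= INLINE_LIMIT {
            Value::Inline(bytes.to_vec())
        } else {
            self.pages.push(bytes.to_vec());
            Value::Overflow(self.pages.len() - 1)
        }
    }

    fn read<'a>(&'a self, v: &'a Value) -> &'a [u8] {
        match v {
            Value::Inline(b) => b,
            Value::Overflow(i) => &self.pages[*i],
        }
    }
}

fn main() {
    let mut store = Store { pages: Vec::new() };
    let small = store.write(b"short");
    let big = store.write(&vec![0u8; 5000]);
    assert!(matches!(&small, Value::Inline(_)));
    assert!(matches!(&big, Value::Overflow(_)));
    assert_eq!(store.read(&big).len(), 5000);
}
```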

Just tell me what you need, I may be able to implement it if it isn't too complicated.

I started looking into how Pijul uses sanakirja, but didn't get very far yet.

This is at the same time a good and a terrible idea: a good one since it is possibly the most "real-world" use of Sanakirja currently deployed, and a terrible one because at the time of starting Pijul, GATs hadn't even started in unstable Rust. Therefore, I decided to do the compiler's job of monomorphising manually, using macros. But then that gave me too much freedom to express type constraints that made "repos-in-zstd" possible, and these constraints can't be expressed in Rust's type system without the manual monomorphisation.

So, Pijul's way is also possibly the most convoluted way to use Sanakirja.

@P-E-Meunier

Looking at your code: Sanakirja can handle concurrent readers and writers, so you don't need to drop your reader before starting a writer. The exact constraint is much more flexible than this, but a first approximation is that you need to drop the readers before committing a writer.

@jplatte
Collaborator

jplatte commented Dec 12, 2022

The biggest concern for us right now is to get something that works. I'm not at all experienced with the low-level details of databases, nor do I want to optimize things a lot right now, except that we need to be able to open the DB without immediately running into the memory restrictions for background processes on iOS. We have sled as an existing backend which runs into those limits, which is one reason we are now spending more time on trying out replacements.

If you don't need your values to be contiguous in memory (or can afford to copy them rather than work with pointers), things will be easier.

I think we can totally afford to copy rather than working with pointers. I guess the alternative would be an io::Read implementing type, but due to serde-rs/json#160, that will be much slower, and I assume it won't be easier to implement either, right?

you don't need to drop your reader before starting a writer

But it reduces contention to drop the reader earlier, right?

@P-E-Meunier

we need to be able to open the DB without immediately running out of the memory restrictions for background processes on iOS

One of my goals was actually to ensure that the database was always at least half full before growing the file, so this could work, depending on how iOS implements mmap (Sanakirja uses multiple maps to avoid this issue).

I guess the alternative would be an io::Read implementing type, but due to serde-rs/json#160, that will be much slower

That could actually be quite a bit easier, since it doesn't need a new version of the AllocPage trait. But I've wanted to implement contiguous allocations for a long time, so maybe the time has finally come!

But it reduces contention to drop the reader earlier, right?

The only hard contention is on the writer's mutex. The readers count is an atomic variable, so depending on the architecture it might result in some contention, but probably nothing significant.

Avoiding concurrency inside a thread would make your code easier to reason about, though.
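The synchronization scheme described above can be sketched as follows. This is an assumed illustration of the pattern, not Sanakirja's actual code: readers only bump an atomic counter (cheap, mostly uncontended), while writers serialize on a mutex.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// Sketch of "atomic reader count + writer mutex" synchronization.
struct EnvSync {
    readers: AtomicUsize, // active reader count
    writer: Mutex<()>,    // the only hard contention point
}

fn main() {
    let env = Arc::new(EnvSync {
        readers: AtomicUsize::new(0),
        writer: Mutex::new(()),
    });

    let mut handles = Vec::new();
    for _ in 0..4 {
        let env = Arc::clone(&env);
        handles.push(thread::spawn(move || {
            // A read transaction registers and deregisters itself.
            env.readers.fetch_add(1, Ordering::SeqCst);
            // ... read ...
            env.readers.fetch_sub(1, Ordering::SeqCst);
        }));
    }

    {
        // A write transaction holds the writer mutex for its duration;
        // it never blocks readers from registering.
        let _guard = env.writer.lock().unwrap();
        // ... write ...
    }

    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(env.readers.load(Ordering::SeqCst), 0);
}
```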

@jplatte
Collaborator

jplatte commented Dec 12, 2022

I guess the alternative would be an io::Read implementing type, but due to serde-rs/json#160, that will be much slower

That could be quite easier, actually, since it doesn't need a new version of the AllocPage trait.

In that case, could you give me some hints as to how to get started with that?

@P-E-Meunier

Well, let me try something first ;-) I believe this can be solved during the end of my lunch break.

@P-E-Meunier

Ok, I just published the allocator (Sanakirja 1.3), although I don't think it will be very useful for now; it seems I'll have to think a bit more about the best way to use it. I tried various things without success.

The main issue is that if you're storing bytestrings of widely different sizes, you could end up wasting disk space. I tried to mitigate that, but didn't find a convincing way. I'll come back to it in the next few days.

@jplatte
Collaborator

jplatte commented Dec 12, 2022

Sorry, I don't understand – what changed exactly?

@P-E-Meunier

The AllocPage trait has a new method, alloc_contiguous, which MutTxn implements. I tested it on small examples, it seems to work ok. Now the only thing missing is the type of large values using that allocator.

@jplatte
Collaborator

jplatte commented Dec 13, 2022

Now the only thing missing is the type of large values using that allocator.

So is that something I can do myself easily enough without understanding how sanakirja works in depth?

@P-E-Meunier

No, but I looked at it again last night and I believe I found a way. I'll keep you posted.

@P-E-Meunier

P-E-Meunier commented Dec 13, 2022

Alright, using Sanakirja 1.3.1 you can do the following:

use sanakirja::*;

fn main() {
    // Create an anonymous (non-persistent) environment with one root.
    let env = Env::new_anon(409600000, 1).unwrap();
    let mut txn = Env::mut_txn_begin(&env).unwrap();

    // A B-tree with u64 keys and unsized (Slice) values.
    let mut db: btree::UDb<u64, Slice> = btree::create_db_(&mut txn).unwrap();
    btree::put(&mut txn, &mut db, &0, &(&b"blabal bilbli"[..]).into()).unwrap();

    // A value well above the 510-byte inline limit.
    let v = vec![b'a'; 5000];
    btree::put(&mut txn, &mut db, &1, &(&v[..]).into()).unwrap();

    // Store the Db pointer in root 0 so it isn't leaked, then commit.
    txn.set_root(0, db.db);
    txn.commit().unwrap();

    // Re-open the Db from root 0 in a new transaction and read both values back.
    let mut txn = Env::mut_txn_begin(&env).unwrap();
    let mut db: btree::UDb<u64, Slice> = btree::UDb::from_page(txn.root(0).unwrap());
    println!("{:?}", btree::get(&txn, &mut db, &0, None).unwrap().unwrap().1.as_bytes(&txn));
    let bb = btree::get(&txn, &mut db, &1, None).unwrap().unwrap().1.as_bytes(&txn).unwrap();
    println!("{:?} {:?}", bb.len(), bb.iter().all(|c| *c == b'a'));

    txn.set_root(0, db.db);
    txn.commit().unwrap();
}

Some words of caution:

  • Sanakirja is relatively low-level, as you can see. The main advantage is that you can store complex Rust types as raw bytes, and write well-typed wrappers that don't copy any byte.
  • In particular, you should have at least some understanding of what the pointers are. A Db, for example, is a pointer to an allocated page. If you don't retain it, you'll leak memory. If you store it twice (without forking it), you'll have a use-after-free.
  • The roots may store any kind of value, but usually they store Db pointers, as with set_root in the example above.
  • As for the new Slice type: it points to another page if the value is large enough, and stores the value inline on the page otherwise. In principle, btree::del takes care of deallocating the values.

If you're unsure, you can ask me. The Rust typesystem isn't sufficiently expressive to provide a safe interface to Sanakirja, which means that the rules aren't that simple.

@jplatte
Collaborator

jplatte commented Dec 13, 2022

  • In particular, you should have at least some understanding of what the pointers are. A Db, for example, is a pointer to an allocated page. If you don't retain it, you'll leak memory. If you store it twice (without forking it), you'll have a use-after-free.

So you mean in the example above,

  • If the call txn.set_root(0, db.db); is removed, memory is leaked
  • If the call is kept but I also add txn.set_root(1, db.db); afterwards, that is UB

?

@P-E-Meunier

Yes. About set_root, note that Sanakirja allocates memory atomically: all the allocations from a MutTxn are done atomically at commit time.

So, while using a Db from one MutTxn in another transaction might "work", I still recommend that you store it in the transaction (possibly in a root), and read it back in the other transaction, to benefit from these guarantees.

For example, if you follow this advice, and some failure happens after commit (my usual hypothesis is power failure), no memory will be leaked and no reference will be left hanging.

@jplatte
Collaborator

jplatte commented Dec 13, 2022

Well, UB using functions that are not marked unsafe is not something I expected. Is it just set_root or are there other parts of the API that aren't actually safe?

@P-E-Meunier

Actually, since Sanakirja manipulates pointers to memory not managed by Rust, nothing is really safe, and the borrow checker is unfortunately not sufficient to check everything. Other libraries suffer from this as well: for example in LMDB, mutably borrowing two different tables is probably safe, but tables should borrow the transaction mutably to ensure they aren't reused after a commit.

But if you were using a simpler library before (i.e. one not generic in the types), writing a safe wrapper should be really easy.
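The "safe wrapper" idea might look like the sketch below. This is a pattern illustration, not the real Sanakirja or matrix-sdk API: a plain BTreeMap stands in for the unsafe low-level store, and a small, non-generic, typed table (SessionTable here, a hypothetical name) owns the key encoding so callers never touch raw bytes, roots, or pointers.

```rust
use std::collections::BTreeMap;

// Untyped byte-oriented store; a stand-in for the unsafe low-level API.
struct RawStore {
    inner: BTreeMap<Vec<u8>, Vec<u8>>,
}

// Typed, non-generic wrapper: the only way callers touch the raw store.
struct SessionTable<'a> {
    raw: &'a mut RawStore,
}

impl<'a> SessionTable<'a> {
    fn insert(&mut self, user_id: &str, session: &[u8]) {
        // The wrapper owns the key encoding; callers never see raw keys.
        let key = format!("session/{user_id}").into_bytes();
        self.raw.inner.insert(key, session.to_vec());
    }

    fn get(&self, user_id: &str) -> Option<&[u8]> {
        let key = format!("session/{user_id}").into_bytes();
        self.raw.inner.get(&key).map(|v| v.as_slice())
    }
}

fn main() {
    let mut raw = RawStore { inner: BTreeMap::new() };
    let mut table = SessionTable { raw: &mut raw };
    table.insert("@alice:example.org", b"pickled-session");
    assert_eq!(table.get("@alice:example.org"), Some(&b"pickled-session"[..]));
}
```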

@jplatte
Collaborator

jplatte commented Dec 13, 2022

In that case I don't have the confidence needed to continue with the sanakirja crypto store prototype. Long-term, we might be interested in building a higher-level safe API on top of a version of sanakirja with anything that can cause UB marked unsafe + documented safety preconditions, but right now we're probably going to try getting sqlite to work.

@P-E-Meunier

No problem. If you know what you want, I can give you a safe wrapper. How many tables? Do you need nested datastructures (a Db where the keys and/or values are also Db)?

@gnunicorn
Contributor

gnunicorn commented Mar 14, 2023

@jplatte
Collaborator

jplatte commented May 15, 2023

The default store backend is sqlite now, and we will likely kill sled entirely soon enough.

@Hywan
Member

Hywan commented Sep 20, 2023

And sled has been killed, RIP. I'm closing this issue :-).

@Hywan Hywan closed this as completed Sep 20, 2023
@tshepang

hm, there was a(n alpha) release just last week

@poljar
Contributor

poljar commented Sep 20, 2023

Indeed there was. Though what @Hywan meant is that the sled-based store implementation has been removed from the repo.

Nowadays we have slightly different requirements, so bringing the sled store back is very unlikely.
