Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

determine how to access multiple stores within a single transaction #46

Closed
mykmelez opened this issue Jun 21, 2018 · 6 comments · Fixed by #62
Closed

determine how to access multiple stores within a single transaction #46

mykmelez opened this issue Jun 21, 2018 · 6 comments · Fixed by #62

Comments

@mykmelez
Copy link
Contributor

Over in #42, I noted that "it's unclear if/how it's possible to open multiple stores within a single transaction, which LMDB itself supports," and @ncloudioj responded:

The current rkv::Store abstraction doesn't support that because it wraps the transaction into the Reader/Writer. To support multi-store reads/writes, it needs to take the transaction out from store, perhaps something like,

let txn = rkv.write();
let store_foo = rkv.create_or_open("foo");
let store_bar = rkv.create_or_open("bar");
store_foo.write(txn, "key0", "value0");
store_bar.write(txn, "key1", "value1");
txn.commit();

The downside is that users can't use the Reader/Writer any more. Another potential approach, which reuses the design of Writer/Reader, is introduce a MultiStore so that multiple stores could be get_or_created at the same time in a single transaction. Its read/write API will be slightly different,

let store_names = vec!["foo", "bar", "baz"];
let mega_store = rkv.create_or_open(&store_names);
let writer = mega_store.write();
writer.write("foo", "key0", "value0"); // it takes a store name here
writer.write("bar", "key1", "value1");
writer.commit();

This is a tough problem. The latter approach feels a bit more intuitive and is also likely to be more compact, provided store names are short; whereas the former grows a line for each store involved in the transaction.

On the other hand, the former approach has the advantage of being more strongly typed, because stores are referenced by handle after creation, so it isn't possible to compile code that opens the "foo" and "bar" stores and then writes to the "baz" store; whereas the latter approach will happily compile that code (and then fail at runtime).

Also note #29, although #28 (comment) suggests that I didn't actually understand LMDB database handles when I filed it, and it's the wrong thing to do.

I'm also puzzling over the requirement that LMDB database handles be opened with reference to a specific transaction but can then be reused by any other transaction, as described in the docs for mdb_dbi_open, which additionally notes:

The database handle will be private to the current transaction until the transaction is successfully committed. If the transaction is aborted the handle will be closed automatically. After a successful commit the handle will reside in the shared environment, and may be used by other transactions.

This function must not be called from multiple concurrent transactions in the same process. A transaction that uses this function must finish (either commit or abort) before any other transaction in the process may use this function.

However, lmdb-rs appears to manage those constraints by acquiring a mutex and creating/committing a throwaway transaction in Environment::create_db, so that shouldn't be an issue.

Regardless, from browsing the LMDB docs, it seems like the intent is for handles to stores to be long-lived, so perhaps the former approach is better, even though it doesn't let you use Reader/Writer, as it requires you to explicitly create the handles, which also enables you to reuse them.

Or perhaps even better is a related approach in which Rkv::write returns a non-store-specific Writer rather than an lmdb::RwTransaction (ditto for Rkv::read; it returns a Reader), and it has a put method that takes a store handle rather than a store name, i.e. something like:

let store_foo = rkv.create_or_open("foo");
let store_bar = rkv.create_or_open("bar");
let writer = rkv.write();
writer.put(store_foo, "key0", "value0");
writer.put(store_bar, "key1", "value1");
writer.commit();
// store_foo and store_bar can be reused to read
let reader = rkv.read();
reader.get(store_foo, "key0");
reader.get(store_bar, "key1");

(This has the added advantage that we no longer return the low-level RoTransaction and RwTransaction lmdb-rs structs from rkv methods.)

@ncloudioj What do you think?

@ncloudioj
Copy link
Member

I like this idea. This essentially unifies the rkv's API no matter how many stores involved in the transactions. Plus, it also maintains the Reader/Writer abstraction on top of the raw transactions.

@ncloudioj
Copy link
Member

ncloudioj commented Jul 19, 2018

Had another thought on this while I was implementing this feature. Can we just add some extra APIs to support the multi-store read/write on top of the current Store::Reader/Writer? For example,

let store = rkv.create_or_open("foo");
let writer = store.write();
writer.put("key", "value"); // write to itself

let store_bar = kv.create_or_open("bar");
writer.put_to(&store_bar, "key", "value"); // write to an another store with the same writer
writer.commit();

// The same could apply to the reader as well. 

I believe this way is better than introduce another reader/writer (say multi-store reader/writer) by following reasons:

  1. Having reader/writer spawned from store is more intuitive than doing so from rkv, even though the latter is for multi-store reads/writes
  2. There is only one kind of reader/writer regardless of how many stores you want to access. We can document it as it allows you to create a reader/writer from any store, then attach to multiple stores on-demand with the same reader/writer
  3. Easier to implement, my first try of the implementation ended up with quite a bit duplicate code, after all they both just call the txn.get/set with the underlying transaction.
  4. Another benefit of augmenting the existing reader&writer is that we can even easily extend the iterators to the multi-store cases, for instance:
// assuming we've created two dupsort stores

let reader = store1.read(&k).unwrap();
let mut iter = reader.iter_start().unwrap();
while let Some((k, v)) = iter.next() {
    ...
}

// unfortunately, Rust doesn't support optional argument, otherwise we can even further simplify it.
let mut iter2 = reader.iter_start_from_store(&store2).unwrap();
while let Some((k, v)) = iter2.next() {
    ...
}

@mykmelez Thoughts?

@mykmelez
Copy link
Contributor Author

@ncloudioj While I was thinking about your latest proposal here and in #58, I realized that there's actually already an implementation of a multi-store reader in the codebase. The existing implementation of Store in readwrite.rs has this Store::get() method that takes an RoTransaction and a key:

pub fn get<'tx, T: Transaction>(&self, tx: &'tx T, k: K) -> Result<Option<Value<'tx>>, StoreError> {
    let bytes = tx.get(self.db, &k.as_ref());
    read_transform(bytes)
}

And many of the tests in env.rs actually use this implementation rather than calling Reader::get(), like this trio of Store::get() calls (in which r is the Reader, k is the Rkv, and sk is the Store):

let r = &k.read().unwrap();
assert_eq!(sk.get(r, "foo").expect("read"), None);
assert_eq!(sk.get(r, "bar").expect("read"), None);
assert_eq!(sk.get(r, "baz").expect("read"), None);

That leads me to think that your original suggestion of factoring the transaction out from the store would actually be the most natural way to support multi-store write, since it would be consistent with the existing implementation of multi-store read.

It should also be fairly straightforward to implement, with minimal duplication of code. We'd need to add a Store.put() method that takes a transaction, a key and a value; and perhaps we should also generalize the Store.get() method to take a generic Transaction rather than only a RoTransaction, so that you can call Store::get() with a RwTransaction just like you can call Writer::get().

Another way of thinking about this approach—and explaining it to API consumers—is that Store::get/put() is the general API for reading from and writing to arbitrary numbers of stores, which works in all cases; while Reader/Writer::get/put() is a simplification for the common case where you're only accessing a single store.

Of course we'd also want to add examples of the two APIs. And I might also refactor the existing tests to use single-store readers when reading from a single store, to make it more obvious that you can use the simpler API in those cases. (Or perhaps show both APIs for the single-store case and then explain how the more general API handles the multi-store case.)

What do you think?

@ncloudioj
Copy link
Member

Yep, I also noticed that API, and was gonna ask you about whether or not to get rid of it because apparently it overlaps with Store:Reader. If so, even further, shall we also consider hiding the raw lmdb::RoTransaction/RwTransaction behind the Store::Reader/Writer for the same reason? That being said, I'd suggest to expose only one API other than two, either Store::Reader/Writer or Store::get/put/delete would be fine. Let me try to make sense of my point as follows.

From the consumer's point of view, the common case is that they want to write and read from a single store, given two sets of APIs, they may wonder which one should be used? Any performance difference between them? They also need to be informed that not only should they avoid creating two RwTransactions in the same thread, but also not to interleave the use of RwTransactions and Writers, because they're essentially the same. To get that done right, it requires quite a bit clarification from us as well as quite a bit understanding from the consumers, which doesn't sound ergonomic in general.

Another potential problem of having two sets of APIs is for LMDB cursors, those iterators defined in Store::Reader is a great abstraction IMHO, if we were to provide the similar iterators for Store, it'd be not easy to avoid API bloat.

I'm more inclined to sticking with the current design of Store, Writer/Reader, and Iterator, and adding multi-store read/wrtie on top of that, but I could be convinced otherwise. I do have both implementations in my repo 😉. Waiting for your call @mykmelez

@mykmelez
Copy link
Contributor Author

Yep, I also noticed that API, and was gonna ask you about whether or not to get rid of it because apparently it overlaps with Store:Reader. If so, even further, shall we also consider hiding the raw lmdb::RoTransaction/RwTransaction behind the Store::Reader/Writer for the same reason?

Yes, we should hide lmdb::RoTransaction/RwTransaction, whether or not we get rid of that API; since rkv, as an abstraction over LMDB, should expose its own types for common operations.

That being said, I'd suggest to expose only one API other than two, either Store::Reader/Writer or Store::get/put/delete would be fine. Let me try to make sense of my point as follows.

I'm open to consolidating to a single API set; or at least being very cautious, deliberate, and intentional about API duplication.

From the consumer's point of view, the common case is that they want to write and read from a single store, given two sets of APIs, they may wonder which one should be used? Any performance difference between them? They also need to be informed that not only should they avoid creating two RwTransactions in the same thread, but also not to interleave the use of RwTransactions and Writers, because they're essentially the same. To get that done right, it requires quite a bit clarification from us as well as quite a bit understanding from the consumers, which doesn't sound ergonomic in general.

I agree that offering a choice of APIs adds complexity, and the existence of both RwTransaction and Writer is confusing. On the other hand, layering multi-store capabilities onto an API optimized for the single-store case also creates the potential for complexity and confusion.

I'm particularly concerned that methods like Store::write() -> Writer and Writer::put_to(Store) would lead to confusion over the relationship of Writers to Stores. I'm also concerned about methods like Reader::iter_from_in(Key, Store) being difficult to parse, since one has to associate both the "from" in the name with the Key parameter and the "in" in the name with the Store parameter.

Another potential problem of having two sets of APIs is for LMDB cursors, those iterators defined in Store::Reader is a great abstraction IMHO, if we were to provide the similar iterators for Store, it'd be not easy to avoid API bloat.

Indeed, it does result in API duplication to hang iterators off both Reader and Store. That can be reasonable, if consumers will expect the API in both places (f.e. the From and Into traits in std::convert). Nevertheless, I agree that we should be cautious about introducing API duplication.

I'm more inclined to sticking with the current design of Store, Writer/Reader, and Iterator, and adding multi-store read/wrtie on top of that, but I could be convinced otherwise. I do have both implementations in my repo 😉. Waiting for your call @mykmelez

I suspect that we'll run into fewer problems by exposing a more straightforward translation of the LMDB API—with added simplifications for common cases—than if we expose a simplified abstraction of the LMDB API and then add complexity for uncommon cases.

Thus I would maintain the relationship between the environment, transactions, and stores but wrap the lmdb::RwTransaction/RoTransaction types with rkv types like rkv::Writer/Reader.

The API model would then be either Store::get/put/delete-centric:

let store_foo: Store = rkv.create_or_open("foo");
let store_bar: Store = rkv.create_or_open("bar");
let writer: Writer = rkv.write();
store_foo.put(writer, "key0", "value0");
store_bar.put(writer, "key1", "value1");
writer.commit();
let reader: Reader = rkv.read();
store_foo.iter(&reader);
store_bar.iter_from(&reader, "key0");

Or it would be Rkv::Writer/Reader-centric:

let store_foo: Store = rkv.create_or_open("foo");
let store_bar: Store = rkv.create_or_open("bar");
let writer: Writer = rkv.write();
writer.put(&store_foo, "key0", "value0");
writer.put(&store_bar, "key1", "value1");
writer.commit();
let reader: Reader = rkv.read();
reader.iter(&store_foo);
reader.iter_from(&store_bar, "key0");

But it wouldn't be Store::Writer/Reader-centric, although we might decide to introduce simplifications for common cases, deciding on a case-by-case basis whether the benefits of a simpler—but redundant—API is worth the API/implementation complexity of having more than one way to do things.

For example, notwithstanding my earlier argument against hiding transactions, I can imagine eventually deciding to expose transactionless APIs for setting/getting values, for cases where the consumer knows that they're only going to set/get a single value at any one time:

let store_foo: Store = rkv.create_or_open("foo");
store_foo.put("key0", "value0");

(For this reason, and because it's more consistent with the underlying LMDB API, it probably makes more sense to employ the Rkv::Writer/Reader-centric approach for the general API.)

@ncloudioj
Copy link
Member

Sounds good, let's go with Rkv::Writer/Reader-centric then. Will polish my patch accordingly, and re-submit the PR soon.

Thanks for your insights!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants