-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
determine how to access multiple stores within a single transaction #46
Comments
I like this idea. This essentially unifies the rkv's API no matter how many stores involved in the transactions. Plus, it also maintains the Reader/Writer abstraction on top of the raw transactions. |
Had another thought on this while I was implementing this feature. Can we just add some extra APIs to support the multi-store read/write on top of the current let store = rkv.create_or_open("foo");
let writer = store.write();
writer.put("key", "value"); // write to itself
let store_bar = kv.create_or_open("bar");
writer.put_to(&store_bar, "key", "value"); // write to an another store with the same writer
writer.commit();
// The same could apply to the reader as well. I believe this way is better than introduce another reader/writer (say multi-store reader/writer) by following reasons:
// assuming we've created two dupsort stores
let reader = store1.read(&k).unwrap();
let mut iter = reader.iter_start().unwrap();
while let Some((k, v)) = iter.next() {
...
}
// unfortunately, Rust doesn't support optional argument, otherwise we can even further simplify it.
let mut iter2 = reader.iter_start_from_store(&store2).unwrap();
while let Some((k, v)) = iter2.next() {
...
} @mykmelez Thoughts? |
@ncloudioj While I was thinking about your latest proposal here and in #58, I realized that there's actually already an implementation of a multi-store reader in the codebase. The existing implementation of Store in readwrite.rs has this Store::get() method that takes an RoTransaction and a key:
And many of the tests in env.rs actually use this implementation rather than calling Reader::get(), like this trio of Store::get() calls (in which
That leads me to think that your original suggestion of factoring the transaction out from the store would actually be the most natural way to support multi-store write, since it would be consistent with the existing implementation of multi-store read. It should also be fairly straightforward to implement, with minimal duplication of code. We'd need to add a Store.put() method that takes a transaction, a key and a value; and perhaps we should also generalize the Store.get() method to take a generic Transaction rather than only a RoTransaction, so that you can call Store::get() with a RwTransaction just like you can call Writer::get(). Another way of thinking about this approach—and explaining it to API consumers—is that Store::get/put() is the general API for reading from and writing to arbitrary numbers of stores, which works in all cases; while Reader/Writer::get/put() is a simplification for the common case where you're only accessing a single store. Of course we'd also want to add examples of the two APIs. And I might also refactor the existing tests to use single-store readers when reading from a single store, to make it more obvious that you can use the simpler API in those cases. (Or perhaps show both APIs for the single-store case and then explain how the more general API handles the multi-store case.) What do you think? |
Yep, I also noticed that API, and was gonna ask you about whether or not to get rid of it because apparently it overlaps with From the consumer's point of view, the common case is that they want to write and read from a single store, given two sets of APIs, they may wonder which one should be used? Any performance difference between them? They also need to be informed that not only should they avoid creating two RwTransactions in the same thread, but also not to interleave the use of RwTransactions and Writers, because they're essentially the same. To get that done right, it requires quite a bit clarification from us as well as quite a bit understanding from the consumers, which doesn't sound ergonomic in general. Another potential problem of having two sets of APIs is for LMDB cursors, those iterators defined in I'm more inclined to sticking with the current design of |
Yes, we should hide lmdb::RoTransaction/RwTransaction, whether or not we get rid of that API; since rkv, as an abstraction over LMDB, should expose its own types for common operations.
I'm open to consolidating to a single API set; or at least being very cautious, deliberate, and intentional about API duplication.
I agree that offering a choice of APIs adds complexity, and the existence of both RwTransaction and Writer is confusing. On the other hand, layering multi-store capabilities onto an API optimized for the single-store case also creates the potential for complexity and confusion. I'm particularly concerned that methods like Store::write() -> Writer and Writer::put_to(Store) would lead to confusion over the relationship of Writers to Stores. I'm also concerned about methods like Reader::iter_from_in(Key, Store) being difficult to parse, since one has to associate both the "from" in the name with the Key parameter and the "in" in the name with the Store parameter.
Indeed, it does result in API duplication to hang iterators off both Reader and Store. That can be reasonable, if consumers will expect the API in both places (f.e. the From and Into traits in std::convert). Nevertheless, I agree that we should be cautious about introducing API duplication.
I suspect that we'll run into fewer problems by exposing a more straightforward translation of the LMDB API—with added simplifications for common cases—than if we expose a simplified abstraction of the LMDB API and then add complexity for uncommon cases. Thus I would maintain the relationship between the environment, transactions, and stores but wrap the lmdb::RwTransaction/RoTransaction types with rkv types like rkv::Writer/Reader. The API model would then be either Store::get/put/delete-centric: let store_foo: Store = rkv.create_or_open("foo");
let store_bar: Store = rkv.create_or_open("bar");
let writer: Writer = rkv.write();
store_foo.put(writer, "key0", "value0");
store_bar.put(writer, "key1", "value1");
writer.commit();
let reader: Reader = rkv.read();
store_foo.iter(&reader);
store_bar.iter_from(&reader, "key0"); Or it would be Rkv::Writer/Reader-centric: let store_foo: Store = rkv.create_or_open("foo");
let store_bar: Store = rkv.create_or_open("bar");
let writer: Writer = rkv.write();
writer.put(&store_foo, "key0", "value0");
writer.put(&store_bar, "key1", "value1");
writer.commit();
let reader: Reader = rkv.read();
reader.iter(&store_foo);
reader.iter_from(&store_bar, "key0"); But it wouldn't be Store::Writer/Reader-centric, although we might decide to introduce simplifications for common cases, deciding on a case-by-case basis whether the benefits of a simpler—but redundant—API is worth the API/implementation complexity of having more than one way to do things. For example, notwithstanding my earlier argument against hiding transactions, I can imagine eventually deciding to expose transactionless APIs for setting/getting values, for cases where the consumer knows that they're only going to set/get a single value at any one time: let store_foo: Store = rkv.create_or_open("foo");
store_foo.put("key0", "value0"); (For this reason, and because it's more consistent with the underlying LMDB API, it probably makes more sense to employ the Rkv::Writer/Reader-centric approach for the general API.) |
Sounds good, let's go with Rkv::Writer/Reader-centric then. Will polish my patch accordingly, and re-submit the PR soon. Thanks for your insights! |
Over in #42, I noted that "it's unclear if/how it's possible to open multiple stores within a single transaction, which LMDB itself supports," and @ncloudioj responded:
This is a tough problem. The latter approach feels a bit more intuitive and is also likely to be more compact, provided store names are short; whereas the former grows a line for each store involved in the transaction.
On the other hand, the former approach has the advantage of being more strongly typed, because stores are referenced by handle after creation, so it isn't possible to compile code that opens the "foo" and "bar" stores and then writes to the "baz" store; whereas the latter approach will happily compile that code (and then fail at runtime).
Also note #29, although #28 (comment) suggests that I didn't actually understand LMDB database handles when I filed it, and it's the wrong thing to do.
I'm also puzzling over the requirement that LMDB database handles be opened with reference to a specific transaction but can then be reused by any other transaction, as described in the docs for mdb_dbi_open, which additionally notes:
However, lmdb-rs appears to manage those constraints by acquiring a mutex and creating/committing a throwaway transaction in Environment::create_db, so that shouldn't be an issue.
Regardless, from browsing the LMDB docs, it seems like the intent is for handles to stores to be long-lived, so perhaps the former approach is better, even though it doesn't let you use Reader/Writer, as it requires you to explicitly create the handles, which also enables you to reuse them.
Or perhaps even better is a related approach in which
Rkv::write
returns a non-store-specificWriter
rather than anlmdb::RwTransaction
(ditto forRkv::read
; it returns aReader
), and it has aput
method that takes a store handle rather than a store name, i.e. something like:(This has the added advantage that we no longer return the low-level RoTransaction and RwTransaction lmdb-rs structs from rkv methods.)
@ncloudioj What do you think?
The text was updated successfully, but these errors were encountered: