Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of a monadic cursor API. #1

Merged
merged 6 commits into from
Nov 22, 2022

Conversation

jorisdral
Copy link

@jorisdral jorisdral commented Oct 17, 2022

While working on IntersectMBO/ouroboros-network#3954, it became apparent that we need better access to cursors in order to solve a bug with LMDB range reads. This bug resulted from using the forEach function to perform reads of ranges of keys. However, the forEach function reads a full database, whereas we are only interested in a range of keys in a database.

This PR provides a small, monadic API to cursors. The API hides the lower-level details of using cursors behind a state-like monad. For example, the getc function is similar to the get function for State-monads. A specialised version of a custom foldM is used to read a range of keys. Note that:

  • A cursor can be run in IO, or as a Transaction. The preferred way is to run the cursor as a transaction, because it akin to running the lower-level cursor actions in IO using bracket: the cursor is opened before running all the cursor operations, and it is closed when those have finished.
  • We do not use the Serialise type class to serialise/deserialise keys and values. Instead, we use a record-of-functions type called Codec. We reimplement the serialise and deserialise functions from the Serialise library to fit this style.
  • The API is not complete: it is currently only tailored to performing single-point reads and range reads. Extending the API to cover the full cursor functionality should be relatively straightforward.

As an additional change, I've updated the .gitignore file to include the default entries from https://github.com/github/gitignore/blob/main/Haskell.gitignore. I have also added a cabal.project file.

Related PR: input-output-hk/haskell-lmdb#4

Updates

#1 (comment)

@jorisdral jorisdral added the enhancement New feature or request label Oct 17, 2022
@jorisdral jorisdral self-assigned this Oct 17, 2022
@jorisdral jorisdral force-pushed the jdral/monadic-cursor-interface branch from d4dff00 to 78d8c59 Compare October 17, 2022 14:43
Copy link
Collaborator

@jasagredo jasagredo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. However, shouldn't we delete these functions as we are redefining them here?

And why would one ever run the cursor in IO directly as opposed to running it in a Transaction?

cabal.project Outdated Show resolved Hide resolved
Copy link
Collaborator

@dnadales dnadales left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Some comments:

  • Why not submitting this to the project we forked from? We might get more feedback.
  • Why do we need to depend on CBOR, as opposed to (de)serializing in the way
    lmdb-simple used to do it?
  • I think we should add tests, and optionally some micro-benchmarks.
  • It'd be useful to add some (pseudo)doctests that illustrate how the new functions can be used.

However, the forEach function reads a full database, whereas we are only interested in a range of keys in a database.

Is this the only reason why we're introducing this change?

src/Database/LMDB/Simple/Codec.hs Outdated Show resolved Hide resolved
lmdb-simple.cabal Outdated Show resolved Hide resolved
src/Database/LMDB/Simple/Codec.hs Outdated Show resolved Hide resolved
src/Database/LMDB/Simple/Cursor.hs Show resolved Hide resolved
src/Database/LMDB/Simple/Cursor.hs Show resolved Hide resolved
src/Database/LMDB/Simple/Cursor.hs Outdated Show resolved Hide resolved
src/Database/LMDB/Simple/Cursor.hs Show resolved Hide resolved
src/Database/LMDB/Simple/Cursor.hs Outdated Show resolved Hide resolved
src/Database/LMDB/Simple/Cursor.hs Show resolved Hide resolved
src/Database/LMDB/Simple/Cursor.hs Outdated Show resolved Hide resolved
@jorisdral
Copy link
Author

Looks good to me. However, shouldn't we delete these functions as we are redefining them here?

I tried as you suggested, but some functions in the Extra module still depend on the forEach versions from the Internal module. In turn, these functions in the Extra module have Serialise constraints (which I'm trying to avoid), so I can't remove them as of yet.

And why would one ever run the cursor in IO directly as opposed to running it in a Transaction?

If one does not want the bracketing behaviour because they want to provide/reuse their own cursor pointer/kPtr/vPtr, they could use runCursorM instead of runCursorAsTransaction

@jorisdral
Copy link
Author

jorisdral commented Oct 18, 2022

Nice. Some comments:

* Why not submitting this to the project we forked from? We might get more feedback.

Good idea, though I would probably expand the API to cover the full cursor functionality if we were to do this. One obstacle is that the original package does not seem to be maintained anymore..

* Why do we need to depend on CBOR, as opposed to (de)serializing in the way `lmdb-simple` used to do it?

consensus does not define Serialise instances for the types of values we want to serialise/deserialise, but provides explicit codec dictionaries instead. Another solution would have been to provide Serialise instances in consensus, but I am not sure if that is possible. Currently, consensus also contains redefinitions of lmdb-simple functions that use explicit codecs, so we were already doing a similar thing of redefining functions with explicit codecs (though not in the lmdb-simple package)

* I think we should add tests, and optionally some micro-benchmarks.
* It'd be useful to add some (pseudo)doctests that illustrate how the new functions can be used.

I'll look into this

However, the forEach function reads a full database, whereas we are only interested in a range of keys in a database.

Is this the only reason why we're introducing this change?

The primary motivation was to define a cursor fold that folds only over a range and not the full database. However, I did think it would be useful to define a higher-level API for cursors (instead of just writing a lower-level fold without the monadic API)

@jorisdral jorisdral force-pushed the jdral/monadic-cursor-interface branch 5 times, most recently from 5001340 to 83f8415 Compare October 19, 2022 13:07
@dnadales
Copy link
Collaborator

Why do we need to depend on CBOR, as opposed to (de)serializing in the way lmdb-simple used to do it?

consensus does not define Serialise instances for the types of values we want to serialise/deserialise, but provides explicit codec dictionaries instead. Another solution would have been to provide Serialise instances in consensus, but I am not sure if that is possible. Currently, consensus also contains redefinitions of lmdb-simple functions that use explicit codecs, so we were already doing a similar thing of redefining functions with explicit codecs (though not in the lmdb-simple package)

I don't think we want to provide Serialise instances in consensus. What confuses me is the fact that we were able to use this library without having to rely on CBOR, and I was wondering why we can't do that anymore.

@jorisdral
Copy link
Author

jorisdral commented Oct 20, 2022

I don't think we want to provide Serialise instances in consensus. What confuses me is the fact that we were able to use this library without having to rely on CBOR, and I was wondering why we can't do that anymore.

We're relying on cborg in the HD.LMDB module to redefine a number of definitions from lmdb-simple that have Serialise constraints. For example, we're using custom get and put functions that use explicit CodecMKs:

@dnadales
Copy link
Collaborator

We're relying on cborg in the HD.LMDB module to redefine a number of definitions from lmdb-simple that have Serialise constraints.

So can we keep on using this approach? I'm afraid we're breaking the uniformity of the LMDB interface. It seems to me we either change all the functions to use cborg or we change none.

That said I don't oppose merging this PR as is.

@jorisdral
Copy link
Author

jorisdral commented Oct 24, 2022

We're relying on cborg in the HD.LMDB module to redefine a number of definitions from lmdb-simple that have Serialise constraints.

So can we keep on using this approach? I'm afraid we're breaking the uniformity of the LMDB interface. It seems to me we either change all the functions to use cborg or we change none.

That said I don't oppose merging this PR as is.

I have some ideas on how to unify the serialise and cborg interfaces, but I'll create a separate issue for it because the scope of unifying the two interfaces is package-wide. I will also create a separate issue for writing tests/micro-benchmarks for the cursor API

@jorisdral jorisdral force-pushed the jdral/monadic-cursor-interface branch from 122392a to a410982 Compare November 7, 2022 09:26
@jorisdral jorisdral force-pushed the jdral/monadic-cursor-interface branch from 823b5ba to 04da2fc Compare November 10, 2022 15:32
@jorisdral
Copy link
Author

Update: since the last review round, I've focused on three main improvements, and they should be nearly ready for another review round.

  • Unify the cborg- and serialise-based interfaces: we have decided to remove the cborg-based interface that the cursor API previously had, and make it serialise-based instead. This means there is just one interface now in the package, paving the way for possibly merging the API into the original package that we forked from. This does mean however that we will have to write boilerplate code in ouroboros-consensus because it only has a cborg-based interface, which is largely incompatible with the serialise-based interface.
  • Write property tests for the cursor API: the cursor API is now tested using quickcheck-lockstep state machine tests.
  • Include micro-benchmarks for the cursor API: we included a micro-benchmark for the cgetMany function.

@jorisdral jorisdral force-pushed the jdral/monadic-cursor-interface branch from ed3bf4a to 9f7f956 Compare November 11, 2022 10:54
* Extended the API with `put` and `del` operations.
* Most of the ops/flags are now handled by the API. Note: some ops and
  flags require databases to be opened with specific flags like
  `MDB_DUPSORT`. These ops and flags are not yet supported.
* `cborg`-based API is now `serialise`-based.
* Cursor operations are now generalised over any monad instead of just
  `CursorM`.
@jorisdral jorisdral force-pushed the jdral/monadic-cursor-interface branch from 3220b9d to 7544c98 Compare November 11, 2022 12:32
-> LockstepAction (CursorState k v) a
-> ModelValue (CursorState k v) a
-> [Tag]
tagCursorAction _ _ _ = []
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests have no tags yet. Suggestions for sensible tags are welcome! One minimal example I can think of is a simple "round trip": putting, getting and deleting a key.

@jasagredo jasagredo self-requested a review November 17, 2022 09:20
bench/Bench/Utils.hs Outdated Show resolved Hide resolved
bench/Bench/Utils.hs Outdated Show resolved Hide resolved
bench/Bench/Utils.hs Show resolved Hide resolved
bench/Bench/Utils.hs Show resolved Hide resolved
bench/Bench/Utils.hs Show resolved Hide resolved

main :: IO ()
main = defaultMain $ testGroup "test-cursors" [
testGroup "Test" [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding my "bench/Bench" question, here we see that we only have one "test" level. Whatever we pick maybe we should be consistent.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/Database/LMDB/Simple/Internal.hs Show resolved Hide resolved
test-cursors/Test/Database/LMDB/Simple/Cursor/Lockstep.hs Outdated Show resolved Hide resolved
src/Database/LMDB/Simple/Cursor.hs Outdated Show resolved Hide resolved
@jorisdral jorisdral merged commit 6e9f1b9 into master Nov 22, 2022
@jorisdral jorisdral deleted the jdral/monadic-cursor-interface branch November 22, 2022 14:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
No open projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

3 participants