Konserve provides a minimalistic user interface that can be understood as a lens into key-value stores. Its main purpose is portability to as many backends as possible. This document describes how to implement such backends. Feel free to ignore parts of it to get something running quickly, but be aware that we can only support backends that adhere to this document.
- understanding of the backend store you want to integrate
- basic understanding of core.async
- understanding of the protocols
- understanding of how to do error handling
- understanding of how to handle migrations
Advanced Clojure developers who need to durably store data with a flexible interface, e.g. users of Datahike or replikativ, or developers who want to retain portability of their storage backend in general.
Datomic provides a set of storage backends for its persistent indices. There are also JavaScript libraries that provide a similar interface, but without the lens-like interface, and practically none of them are cross-platform between the JVM and JavaScript.
The boundary between the user-facing APIs of konserve and the backends is defined in `protocols.cljc`. The protocols `PEDNAsyncKeyValueStore`, `PBinaryAsyncKeyValueStore` and `PKeyIterable` need to be implemented by each backend. Each protocol method implementation has to return a go channel. Only the operations `-update-in`, `-assoc-in`, `-dissoc` and `-bassoc` can mutate the state of the durable medium.
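As a sketch only, a backend implementing these protocols might start from a skeleton like the following. The record fields, the stub bodies and the exact method signatures are assumptions here; check `protocols.cljc` in your konserve version before implementing, since signatures have changed between releases.

```clojure
(ns my-backend.core
  "Hypothetical skeleton of a konserve backend. Method names follow
   protocols.cljc, but exact signatures can differ between konserve
   versions, so verify them against your version."
  (:require [clojure.core.async :refer [go]]
            [konserve.protocols :refer [PEDNAsyncKeyValueStore
                                        PBinaryAsyncKeyValueStore
                                        PKeyIterable]]))

(defrecord MyStore [client serializer read-handlers write-handlers locks]
  PEDNAsyncKeyValueStore
  (-exists? [_ key]
    (go false))                 ;; check presence in the backend
  (-get [_ key]
    (go nil))                   ;; read and deserialize the value part
  (-get-meta [_ key]
    (go nil))                   ;; read only the metadata part
  (-update-in [_ key-vec meta-up-fn up-fn args]
    (go nil))                   ;; swap metadata and value atomically
  (-assoc-in [_ key-vec meta-up-fn val]
    (go nil))                   ;; special case of -update-in
  (-dissoc [_ key]
    (go nil))                   ;; remove metadata and value together

  PBinaryAsyncKeyValueStore
  (-bget [_ key locked-cb]
    (go nil))                   ;; pass an input stream to locked-cb
  (-bassoc [_ key meta-up-fn input]
    (go nil))                   ;; store a binary blob

  PKeyIterable
  (-keys [_]
    (go #{})))                  ;; return the set of metadata maps
```

Every method returns a go channel immediately; the actual backend work happens inside the body that the channel eventually delivers.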
Konserve internally uses metadata that is kept separate from the operations on the values. This metadata is used, for example, to track the time when values are mutated, a functionality needed by the concurrent garbage collector. The metadata itself is an `edn` map. The protocol fixes no size limit for metadata; if you cannot implement your backend without a size limit, please document it. Metadata can also be used by third parties to track important contextual information such as user access control rights or even edit histories, so supporting at least a megabyte in size should be a future-proof limit. The `get-meta` protocol method reads only the metadata part of the respective key and therefore should not fetch the value. It is used by the garbage collector and potentially other monitoring processes that would otherwise read all data regularly.
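Since the schema is not yet documented, the following is an illustration only; the field names are assumptions, not the actual konserve schema.

```clojure
;; Illustration only: a plausible shape for a metadata edn map.
;; Field names here are made up for the example; the real schema
;; is defined by konserve itself.
{:key        :user-42                             ; the key this metadata belongs to
 :type       :edn                                 ; e.g. :edn or :binary
 :last-write #inst "2021-03-01T12:00:00.000-00:00"} ; used by the garbage collector
```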
TODO
document schema.
Konserve provides ACID guarantees for each key to its users:

- Atomicity: Your write operation should either completely succeed or fail. Typically this is provided by the underlying store, but you might need to first write a new value and then do an atomic swap, as is done in the filestore with an atomic file renaming. The metadata and the value need to be updated in the same atomic transaction.
- Consistency: The underlying store should provide consistent views on the data. This is typically not a property you have to worry about, but it is a good idea to point your users to the consistency guarantees provided. A reasonable backend should at least provide read-committed semantics.
- Isolation: The `go-locked` macro guarantees that there are no concurrent state mutations on individual keys. This locking mechanism only holds inside the memory context of a single JVM. If you expect multiple JVM processes to operate on one backend, you have to make sure that crashing processes do not leave broken values behind. Usually this is provided by the underlying distributed storage.
- Durability: All data must have been written when the go channel you return yields a value, i.e. everything needs to be transmitted to the backend. This guarantee depends on the guarantees of the backend, and you must clearly document it for your users. It is a good idea to provide configuration options for the backend if you see fit. The filestore, for instance, provides a flag to turn off sync'ing.
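As a sketch of the isolation guarantee, a per-key locked update could look like the following. The namespace of `go-locked` and its arity may differ between konserve versions, and the two backend helpers are hypothetical placeholders.

```clojure
;; Sketch only: `go-locked` location/arity may vary by konserve
;; version; `read-from-backend` and `write-to-backend!` are
;; hypothetical non-blocking helpers of your backend.
(require '[konserve.utils :refer [go-locked]])

(defn update-value
  "Serializes all mutations of `key` within this JVM. Cross-process
   isolation is the backend's own responsibility."
  [store key f]
  (go-locked store key
    (let [old (read-from-backend store key)
          new (f old)]
      (write-to-backend! store key new)
      [old new])))
```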
All internal errors must be caught and returned as a throwable object on the return channel. We provide helper macros for this pattern in `utils.cljc`.
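A minimal sketch of this error-handling pattern, assuming a hypothetical `fetch!` backend call: the error is not thrown across the channel but returned as a value, which the caller can check with `(instance? Throwable result)`.

```clojure
(require '[clojure.core.async :refer [go]])

;; `fetch!` is a hypothetical non-blocking backend call.
(defn -get-sketch [store key]
  (go
    (try
      (fetch! store key)
      ;; Return the throwable as the channel value instead of
      ;; letting it escape the go block.
      (catch Exception e
        e))))
```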
Be aware that you must not use blocking IO operations in go-routines. The easiest solution is to spawn threads with `clojure.core.async/thread`, but ideally you should use asynchronous IO operations to provide maximum scalability.
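A sketch of the thread-based variant, with a hypothetical `blocking-read` standing in for your store's blocking call: `thread` runs the body on a dedicated thread and returns a channel that receives the result, so the go-routine thread pool is never blocked.

```clojure
(require '[clojure.core.async :refer [thread]])

;; `blocking-read` is a hypothetical blocking backend call.
(defn read-blob [store key]
  (thread
    (try
      (blocking-read store key)
      ;; Keep the error-as-value convention on this channel too.
      (catch Exception e
        e))))
```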
Konserve provides the protocol `PStoreSerializer` with a `-serialize` and a `-deserialize` method. You need to use these methods to serialize the `edn` values, as they are supposed to be configurable by the user.
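As a sketch of how a backend might route values through the configured serializer, using in-memory byte streams. The argument order shown here is an assumption; verify it against `PStoreSerializer` in `protocols.cljc` for your konserve version.

```clojure
;; Sketch only: check the exact -serialize/-deserialize argument
;; order in protocols.cljc. `serializer`, `read-handlers` and
;; `write-handlers` come from the store record's configuration.
(require '[konserve.protocols :refer [-serialize -deserialize]])

(defn value->bytes [serializer write-handlers value]
  (let [baos (java.io.ByteArrayOutputStream.)]
    (-serialize serializer baos write-handlers value)
    (.toByteArray baos)))

(defn bytes->value [serializer read-handlers ^bytes bs]
  (-deserialize serializer read-handlers
                (java.io.ByteArrayInputStream. bs)))
```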
Sometimes the chosen internal representation of a store turns out to be insufficient as it was for the addition of metadata support as described in this document. In this unfortunate situation a migration of the existing durable data becomes necessary. Migrations have the following requirements:
- They must not lose data, including on concurrent garbage collection.
- They should work without user intervention.
- They should work incrementally, upgrade each key on access, allowing version upgrades in production.
- They can break old library versions running on the same store.
To determine the version of an old key we cannot simply read it, since we do not yet know its format version. Therefore a dedicated inspection function is needed to determine the version of the stored format. You can decide to store the version explicitly if you think this is beneficial.
TODO
Given the version of a stored key and the current version of the code base, we provide a function determining the upgrade path. You then need to provide an upgrade function between each pair of subsequent versions and apply those along the path provided. The migration will be triggered automatically on reads.
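The chaining of upgrade functions along a path can be sketched as follows. The map of upgrade functions and the example transformations are made up for illustration; only the reduce-over-adjacent-versions shape is the point.

```clojure
;; Hypothetical per-version upgrade functions; the transformations
;; shown are invented examples, not konserve's actual migrations.
(def upgrade-fns
  {[1 2] (fn [old] (assoc old :meta {}))
   [2 3] (fn [old] (update old :meta assoc :type :edn))})

(defn migrate
  "Applies the upgrade functions along `path`, e.g. [1 2 3] to
   bring a version-1 value up to version 3."
  [value path]
  (reduce (fn [v [from to]] ((upgrade-fns [from to]) v))
          value
          (partition 2 1 path)))
```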
The figure illustrates the different paths that are taken by read or update operations. `io-operation`, `read-file` and `update-file` are functions in the filestore namespace, while each phase dispatches internally on a context describing the top-level IO operation, e.g. whether reading or writing is necessary. This explicit contextual environment is not strictly necessary, but it reduces code duplication. The filestore uses `core.async` internally, but you can also use callback APIs and just put the value on a channel in the end, e.g. through promises. Please take care to return all errors though.
The filestore uses the following memory layout:
- 8 bytes for metadata size
- serialized metadata
- serialized or binary value
Storing the metadata size is necessary to allow reading only the metadata (and also to skip it). You can also store the metadata separately if your store allows atomic transactions over both objects, e.g. by using two columns in a SQL database.
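The layout above can be sketched with `java.nio.ByteBuffer`: an 8-byte long carrying the metadata size, followed by the metadata bytes and then the value bytes. This is an illustration of the framing, not the filestore's actual implementation.

```clojure
(import '[java.nio ByteBuffer])

(defn frame
  "Prefixes the serialized metadata with its size so a reader can
   fetch only the metadata, or skip straight to the value."
  [^bytes meta-bytes ^bytes value-bytes]
  (let [bb (ByteBuffer/allocate (+ 8 (alength meta-bytes)
                                    (alength value-bytes)))]
    (.putLong bb (alength meta-bytes))  ; 8 bytes: metadata size
    (.put bb meta-bytes)                ; serialized metadata
    (.put bb value-bytes)               ; serialized or binary value
    (.array bb)))

(defn read-meta-size [^bytes frame-bytes]
  (.getLong (ByteBuffer/wrap frame-bytes)))
```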
TODO
We provide a standard test suite that your store has to satisfy to be compliant.
- find a better visual language if possible
- integrate logging
- monitoring, e.g. of cache sizes, migration processes, performance …?
- document test suite
- generalize core.async macros and move to superv.async