Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revised Ordered Key-Value Store, with sample Scheme over POSIX implementation, and abstractions #1850

Open
16 of 50 tasks
amirouche opened this issue Jul 20, 2023 · 0 comments
Open
16 of 50 tasks

Comments

@amirouche
Copy link
Member

The current approach where everything is stored in memory, if not larger than memory, the database can use a lot of memory; that is not always possible because okvs needs to share memory with other systems.

With an on disk page-based storage layer, called book, it is possible to have a bigger than memory database. The gist of the idea: make it possible to load one page at a time, one piece of the database at a time, spreading the memory usage over time.

  • book is made of several page of fixed length
  • the length of pages is fixed so that it is easier to re-use space because it simplifies defragmentation;
  • On top of the abstraction book, and using one or more page, there is a concept of chapter, where page identifiers of a chapter are not monotonically increasing, nor sorted e.g., a chapter may be read starting at 10, then 3, followed by 7... 42, etc...
  • Each chapter has its structure, it is not necessarily a linked list;
  • A primordial chapter is the chapter called 'recycling' that stores the identifiers of the pages that were once useful, and can now be re-used;
  • Another chapter may be a write-ahead-log;
  • There are also chapters called level

A level will be an instance of lbst that is serialized until a b+tree is implemented, and immutable as a chapter in a book. A level will be a double linked list of of key-value pairs, with a trie index.

At any time, there is $1 + n$ structure. The first level is stored in memory. When that level reaches the byte size limit, e.g. 10 MB, it is serialized to disk as the most recent level. The n levels are sorted from most recent (newest), to least recent (oldest).

An operation inside an lbst looks for the relevant key, traversing one tree. In the new architecture, there are several sorted trees, a forest of trees: one is in memory, on disk the newest level, to the oldest level. To query for a key, one needs to consider the $1 + n$ trees; hence the time complexity is $O(n * log(m))$ where $m$ is the size of the database including deleted keys. It is possible to keep $n$ under control, and reduce the impact of deleted keys using a merge operation e.g. merging the two levels $n - 1$ and $n - 2$, to produce one, and new $n - 2$ level replacing both previous $n - 2$, and $n -1$. The algorithm used to query simultaneously $1 + n$ trees is called the "zigzag forest traversal" algorithm (it is not the order traversal zigzag algorithm for one tree).

Issues

  • None

Requirements

  • a write transaction does not block read-only transactions

NON TODO

  • ❌ several writers

TODO

high priority

  • entangle-parallel-apply
  • forbid the use of some posix func ala seek
  • entangle-file-open
  • entangle-file-close
  • entangle-file-bytes
  • entangle-file-pread filepath offset length
  • entangle-file-pwrite filepath offset bytevector
  • entangle-file-sync filepath offset length
  • add min, max key bounds to lbst;
  • levels forest zigzag merge traversal:
    • okvs-merge
    • okvs-query
      • okvs-ref forest key
      • okvs-query forest key other
    • okvs-set! forest key value
    • okvs-clear!
      • okvs-clear! forest key
      • okvs-clear forest key other
  • implement book;
    • book-push (recycler)
    • book-pop (recycler)
    • book-set! (page)
    • book-ref (page)
    • level:
      • start, end bounds
      • bytevector trie with search
      • double linked list sorted key-value
      • bloom filter
    • write ahead log
    • memory tree: lbst
    • memory tree to new level
    • compaction as a b+tree;
  • sqlite3 okvs impl, check, and benchmark
  • lsm1 okvs impl, check, and benchmark
  • page cache
  • nstore
  • vnstore
  • eavstore
  • eavtstore
  • bitstore
  • istore
  • bbkh
  • xzstore
  • rstore
  • gstore
  • hstore
  • entangle-thread-poll
  • byter-write: add prefix bytevector argument
  • byter-read: add prefix bytevector-length argument
@amirouche amirouche mentioned this issue Jul 20, 2023
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant