Refactor Documents and Views to better utilize Nebari #250

Draft: wants to merge 10 commits into main

ecton commented May 9, 2022

Closes #76.
Closes #225.

The primary goal of this PR is to improve the speed of view indexing (See #251
for more info) by tackling #76 in such a way that it can be executed safely
without fsync.

Now that work has been done, the goals are slightly different:

  • Reduce Views from 3 Trees to 1 by making the view indexing system sequence
    based rather than invalidated-keys based.
  • Allow lazy views to execute without fsync, and eager views to execute fully
    synchronized in their transaction (although it might still be safe to be
    fsync-less in the transaction context, but more thought needs to be done in
    that direction).

Document Storage:

Documents are no longer serialized in a wrapper document type. Instead,
the documents tree is now a versioned tree with an embedded index that
stores the document's hash. The Revision's id is now the versioned
tree's sequence_id.

This means that instead of simply pulling a document out of the database
and deserializing it, we must pull the value and index out for a key and
combine it with the key to create our document.
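As a rough illustration of the reconstruction step, here is a minimal Rust sketch. The `Revision`, `Document`, and `StoredEntry` types below are hypothetical stand-ins, not BonsaiDb's or Nebari's actual definitions; they only show how the key, the sequence id, and the hash from the embedded index combine into one document.

```rust
/// Hypothetical revision: the id is the tree's sequence id, and the hash
/// comes from the embedded index rather than from a serialized wrapper.
#[derive(Debug, Clone, PartialEq)]
pub struct Revision {
    pub id: u64,
    pub hash: [u8; 32],
}

/// Hypothetical document shape assembled on read.
#[derive(Debug, Clone, PartialEq)]
pub struct Document {
    pub id: u64,
    pub revision: Revision,
    pub contents: Vec<u8>,
}

/// Hypothetical shape of what a versioned-tree lookup returns for one key.
pub struct StoredEntry {
    pub sequence_id: u64,
    pub embedded_hash: [u8; 32],
    pub value: Vec<u8>,
}

/// Combine the key, the sequence id, the embedded index, and the raw value.
/// Nothing is deserialized from a wrapper document type.
pub fn document_from_entry(key: u64, entry: StoredEntry) -> Document {
    Document {
        id: key,
        revision: Revision {
            id: entry.sequence_id,
            hash: entry.embedded_hash,
        },
        contents: entry.value,
    }
}
```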

The other major change is introduced by the constraints of working
within Nebari's modification system. Because we don't have access to the
index for a key we're about to set, most of the logic for creating the
OperationResult has been moved outside of the CompareSwap operation.
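A minimal sketch of that split, using a toy in-memory tree rather than Nebari. `ToyTree`, `Decision`, and `update` are hypothetical names. The compare-swap callback only sees the current value and decides whether to write; the `OperationResult` is assembled afterwards, once the new sequence id exists.

```rust
use std::collections::BTreeMap;

/// What the compare-swap callback decides for a key (hypothetical).
#[derive(Debug, PartialEq)]
pub enum Decision {
    Set(Vec<u8>),
    Skip,
}

#[derive(Debug, PartialEq)]
pub enum OperationResult {
    Updated { id: u64, revision: u64 },
    Conflict { id: u64 },
}

/// Toy versioned tree: key -> (sequence id, value).
#[derive(Default)]
pub struct ToyTree {
    next_sequence: u64,
    entries: BTreeMap<u64, (u64, Vec<u8>)>,
}

impl ToyTree {
    /// The callback only receives the current value, modeling the constraint
    /// that the index for the key being set is not available inside it.
    pub fn compare_swap<F>(&mut self, key: u64, decide: F) -> bool
    where
        F: FnOnce(Option<&[u8]>) -> Decision,
    {
        let current = self.entries.get(&key).map(|(_, v)| v.as_slice());
        match decide(current) {
            Decision::Set(value) => {
                self.next_sequence += 1;
                self.entries.insert(key, (self.next_sequence, value));
                true
            }
            Decision::Skip => false,
        }
    }

    pub fn sequence_of(&self, key: u64) -> Option<u64> {
        self.entries.get(&key).map(|(seq, _)| *seq)
    }
}

/// Build the result after the swap, when the new sequence id can be read.
pub fn update(tree: &mut ToyTree, key: u64, expected: &[u8], new: Vec<u8>) -> OperationResult {
    let written = tree.compare_swap(key, |current| {
        if current == Some(expected) {
            Decision::Set(new)
        } else {
            Decision::Skip
        }
    });
    match (written, tree.sequence_of(key)) {
        (true, Some(revision)) => OperationResult::Updated { id: key, revision },
        _ => OperationResult::Conflict { id: key },
    }
}
```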

View Storage:

Views have been refactored to store the reduced value in Nebari through
use of an embedded index. Instead of storing the entire ViewEntry
structure in the view, we now only store the serialized
`Vec<EntryMapping>`. The major change here is that Nebari will now
reduce the stored index via the new `ViewIndexer`. reduce/reduce_grouped
have not yet been changed to use Nebari's native reduce function, but
that function is the inspiration for these changes.

When retrieving a view entry, we reconstruct the ViewEntry using the
stored index to maintain compatibility with the existing code that
worked with the ViewEntry structure.
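To make the storage split concrete, here is a hypothetical sketch: the tree value holds only the mappings, the embedded index carries the reduced value, an indexer re-reduces child indexes the way interior B+Tree nodes would, and a `ViewEntry` is rebuilt on read. `ViewIndexer`, `ViewIndex`, and `SumIndexer` here are illustrative assumptions, not the actual trait or types from this PR, and a sum stands in for a real view's reduce function.

```rust
/// Hypothetical mapping stored as the tree value.
#[derive(Debug, Clone, PartialEq)]
pub struct EntryMapping {
    pub source: u64, // id of the document that emitted this mapping
    pub value: i64,  // mapped value (an integer, for this sketch)
}

/// Hypothetical embedded index kept alongside each key.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ViewIndex {
    pub count: u64,
    pub reduced: i64,
}

/// Hypothetical indexer: builds a leaf index from a key's mappings and
/// re-reduces child indexes for interior nodes.
pub trait ViewIndexer {
    fn index(&self, mappings: &[EntryMapping]) -> ViewIndex;
    fn rereduce(&self, children: &[ViewIndex]) -> ViewIndex;
}

/// Reduces by summing the mapped values.
pub struct SumIndexer;

impl ViewIndexer for SumIndexer {
    fn index(&self, mappings: &[EntryMapping]) -> ViewIndex {
        ViewIndex {
            count: mappings.len() as u64,
            reduced: mappings.iter().map(|m| m.value).sum(),
        }
    }

    fn rereduce(&self, children: &[ViewIndex]) -> ViewIndex {
        ViewIndex {
            count: children.iter().map(|c| c.count).sum(),
            reduced: children.iter().map(|c| c.reduced).sum(),
        }
    }
}

/// The compatibility shape existing code expects.
#[derive(Debug, PartialEq)]
pub struct ViewEntry {
    pub key: Vec<u8>,
    pub mappings: Vec<EntryMapping>,
    pub reduced_value: i64,
}

/// Rebuild a ViewEntry from the key, the stored mappings, and the index.
pub fn view_entry(key: Vec<u8>, mappings: Vec<EntryMapping>, index: ViewIndex) -> ViewEntry {
    ViewEntry { key, mappings, reduced_value: index.reduced }
}
```

Because interior indexes are re-reductions of their children, a reduce over a key range could in principle be answered from interior nodes without visiting every leaf, which is the motivation for eventually using Nebari's native reduce.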

There are a lot of remaining tasks:

  • Update reduce/reduce_grouped() to use Nebari's built-in reduction.
  • Remove the invalidated_entries map and make the view mapper sequence
    based.
  • Embed the DocumentMap tree in the ViewEntries tree by creating a
    custom Root.
  • Once all the above are done, when the view indexer is running outside
    of a transaction (lazy views), the view can be persisted without fsync
    and be 100% safe to use due to the append-only file format.
  • Figure out if we want a new PR for the version migration work or to write it here.

This also includes one small change: documents are now queried inside
the transaction. This should have no actual effect, since the integrity
scanner must now run for all views before a document transaction is
applied.
ecton added a commit to khonsulabs/nebari that referenced this pull request May 11, 2022
While working on khonsulabs/bonsaidb#225, I had to do major updates on
how documents are stored. In the new storage scheme, the key in Nebari
is the DocumentId, Revision::id is Nebari's SequenceId, and
Revision::hash is indexed and stored as an embedded index. This means
that retrieving a document needs access to the index to fully construct
the Document record -- hence these new APIs.

Refs: khonsulabs/bonsaidb#250
Both khonsulabs#76 and khonsulabs#225 ended up being heavily intertwined. This is not
yet in its final form, but it's complete enough that unit tests are
passing (aside from backwards compatibility ones).

ecton changed the title "Refactor Views to use Nebari" to "Refactor Documents and Views to better utilize Nebari" May 11, 2022
ecton added 2 commits May 12, 2022 10:58
This commit removes the invalidated entries tree, and uses the sequence
index of the documents tree to drive the indexing. The mapping operation
is batched and performed in such a way that if new data is added to the
documents tree while the operation is being performed, the indexing is
performed using the sequence data at the time of the mapping job being
kicked off.

This guarantee allows us to track the latest indexed sequence ID in the
ViewEntries embedded index. Each map job begins from the ViewEntries
tree's latest sequence ID + 1.
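The batching rule can be sketched as follows, assuming a simplified in-memory model where documents are keyed by sequence id. `DocumentsBySequence`, `ViewState`, and `run_map_job` are hypothetical. The job snapshots the latest sequence id when it starts, maps everything after the view's last indexed sequence up to that snapshot, and records the snapshot as the new high-water mark.

```rust
use std::collections::BTreeMap;

/// Hypothetical model: sequence id -> (document id, contents).
pub struct DocumentsBySequence {
    pub entries: BTreeMap<u64, (u64, String)>,
}

/// Hypothetical view state tracking the last sequence id it indexed.
pub struct ViewState {
    pub last_indexed: u64,
    pub mapped: Vec<u64>, // document ids mapped so far, in order
}

/// Map everything after `last_indexed` up to the snapshot taken at start.
/// Anything written after the snapshot is left for the next job.
pub fn run_map_job(docs: &DocumentsBySequence, view: &mut ViewState) {
    let snapshot = match docs.entries.keys().next_back() {
        Some(&latest) => latest,
        None => return,
    };
    if snapshot <= view.last_indexed {
        return; // already up-to-date
    }
    for (_seq, (doc_id, _contents)) in docs.entries.range(view.last_indexed + 1..=snapshot) {
        view.mapped.push(*doc_id);
    }
    view.last_indexed = snapshot;
}
```

In this model, a document rewritten after one job finishes simply gets a new sequence id and is mapped again by the next job, so no separate record of invalidated keys is needed.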
This was a weird one to debug, as it only showed up on the
simultaneous-connections test. Yet, the bug was unrelated to
multiprocessing.

Eager views are meant to always be up-to-date. This contract was broken
when multiprocessing was involved because of a logic bug: the index
returned from TransactionTree::remove is the existing index, which means
its sequence id is that of the removed entry, not the newly written
sequence (document entries aren't actually removed, to preserve
history).

The fix is to retrieve the new sequence value and map it instead. This
ensures we're actually mapping the deleted version of the entry.

The reason this didn't cause issues outside of multithreading is most
tests are written without specifying an access policy, which means all
the queries are AccessPolicy::UpdateBefore. This meant that the
preparation step before each query would still index the entry, since the
view wasn't actually up-to-date.
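A toy model of the bug and the fix, under the assumption (taken from the description above) that a removal writes a new sequence entry while `remove` reports the previous entry's index. `VersionedTree` and its methods are hypothetical and do not reflect Nebari's real `TransactionTree` API.

```rust
use std::collections::BTreeMap;

/// Hypothetical versioned tree: every write, including a removal, appends
/// a new sequence entry. A `None` value marks a deletion.
#[derive(Default)]
pub struct VersionedTree {
    next_sequence: u64,
    latest: BTreeMap<u64, u64>,                    // key -> latest sequence id
    history: BTreeMap<u64, (u64, Option<String>)>, // sequence -> (key, value)
}

impl VersionedTree {
    pub fn set(&mut self, key: u64, value: &str) -> u64 {
        self.write(key, Some(value.to_string()))
    }

    /// Returns the sequence id of the *previous* entry, modeling the
    /// "existing index" behavior described above.
    pub fn remove(&mut self, key: u64) -> Option<u64> {
        let previous = self.latest.get(&key).copied();
        self.write(key, None);
        previous
    }

    pub fn latest_sequence(&self, key: u64) -> Option<u64> {
        self.latest.get(&key).copied()
    }

    pub fn value_at(&self, sequence: u64) -> Option<&Option<String>> {
        self.history.get(&sequence).map(|(_, v)| v)
    }

    fn write(&mut self, key: u64, value: Option<String>) -> u64 {
        self.next_sequence += 1;
        self.latest.insert(key, self.next_sequence);
        self.history.insert(self.next_sequence, (key, value));
        self.next_sequence
    }
}
```

Mapping the sequence returned by `remove` sees the old, still-present contents; mapping the newly written sequence sees the deletion, which is what an eager view needs.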
ecton added a commit to khonsulabs/nebari that referenced this pull request May 15, 2022
This largely enables external Roots to be written. As part of
khonsulabs/bonsaidb#250, I am moving the document map into the view
entries tree, which requires using two roots like the Versioned root.
The biggest limiting factor was a lot of unexported functionality and
types.

This set of changes also removes some of the &mut requirements for some
of the closures.

The last major change is adding the Value associated type to Root, which
allows a tree to use something other than ArcBytes on its public API
surface. The two built-in trees will continue to use ArcBytes, but the
ViewEntries tree in BonsaiDb will be using a custom type to prevent
extra deserialization.
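A simplified sketch of what an associated `Value` type on a root trait enables. This `Root` trait is a hypothetical reduction with a single method; Nebari's real `Root` trait has a different and much larger set of items. `Arc<[u8]>` stands in for `ArcBytes`, and the packed little-endian `u32` format is invented for the example.

```rust
use std::sync::Arc;

/// Hypothetical, heavily simplified root trait: only the associated
/// `Value` type and a decode step are modeled.
pub trait Root {
    type Value;
    fn decode(bytes: &[u8]) -> Self::Value;
}

/// Built-in-style root exposing raw bytes (stand-in for ArcBytes).
pub struct BytesRoot;

impl Root for BytesRoot {
    type Value = Arc<[u8]>;
    fn decode(bytes: &[u8]) -> Self::Value {
        Arc::from(bytes)
    }
}

/// Custom root handing out an already-decoded type, so callers never
/// deserialize the same bytes a second time.
#[derive(Debug, PartialEq)]
pub struct Mappings(pub Vec<u32>);

pub struct ViewEntriesRoot;

impl Root for ViewEntriesRoot {
    type Value = Mappings;
    fn decode(bytes: &[u8]) -> Self::Value {
        // Sketch format: little-endian u32s packed back to back.
        Mappings(
            bytes
                .chunks_exact(4)
                .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
                .collect(),
        )
    }
}

/// Generic read path: the public return type depends on the root.
pub fn get<R: Root>(stored: &[u8]) -> R::Value {
    R::decode(stored)
}
```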
This change turns ViewEntries into a new Root implementor for Nebari
that stores the view entries in one B+Tree, and stores the document map
in another B+Tree.

This pull request does not yet add the ability to query from the
document map. Once that is implemented, I can remove the external
document map tree which will conclude the final format changes.
ecton mentioned this pull request May 29, 2022
This removes the document_map tree, and stores it inline in a new custom
Nebari Root. This custom tree supports querying what keys a document id
emitted as well as what mappings were emitted for any given key.
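The two-tree layout and both queries can be modeled with standard `BTreeMap`s. This is a hypothetical in-memory sketch of the idea, not the custom Nebari root itself: one map goes from view key to the mappings emitted for it, the other from document id to the keys that document emitted, and `update_document` keeps them consistent when a document is re-mapped.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical model of a view-entries root holding two ordered trees.
#[derive(Default)]
pub struct ViewEntries {
    entries: BTreeMap<String, BTreeMap<u64, i64>>, // key -> (doc id -> value)
    document_map: BTreeMap<u64, BTreeSet<String>>, // doc id -> emitted keys
}

impl ViewEntries {
    /// Replace everything a document emitted, keeping both trees in sync.
    pub fn update_document(&mut self, doc: u64, emitted: &[(&str, i64)]) {
        // Use the document map to find and clear the old mappings.
        if let Some(old_keys) = self.document_map.remove(&doc) {
            for key in old_keys {
                if let Some(mappings) = self.entries.get_mut(&key) {
                    mappings.remove(&doc);
                    if mappings.is_empty() {
                        self.entries.remove(&key);
                    }
                }
            }
        }
        let mut keys = BTreeSet::new();
        for (key, value) in emitted {
            self.entries.entry(key.to_string()).or_default().insert(doc, *value);
            keys.insert(key.to_string());
        }
        if !keys.is_empty() {
            self.document_map.insert(doc, keys);
        }
    }

    /// Which keys did this document emit?
    pub fn keys_for_document(&self, doc: u64) -> Vec<&str> {
        self.document_map
            .get(&doc)
            .map(|keys| keys.iter().map(String::as_str).collect())
            .unwrap_or_default()
    }

    /// Which mappings were emitted for this key?
    pub fn mappings_for_key(&self, key: &str) -> Vec<(u64, i64)> {
        self.entries
            .get(key)
            .map(|m| m.iter().map(|(d, v)| (*d, *v)).collect())
            .unwrap_or_default()
    }
}
```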

This branch also contains several other changes:

The integrity scanner can spawn a mapping job, and that mapping job must
use transactions if the view is eager. This set of changes addresses
that, but the fix is also lumped in with a refactor from easy_parallel
to rayon.

While rayon is a heavier dependency, I was noticing a *lot* of traffic in
profiles from spinning up new threads. Rayon uses a persistent thread
pool for work, and by embracing it here, we can start using it in other
locations as well.
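For readers unfamiliar with the distinction, here is a std-only sketch of a persistent worker pool: threads are spawned once and reused for every job, rather than spawned per batch. This is not rayon's implementation (rayon uses work stealing); `WorkerPool` is a hypothetical minimal example of the same reuse idea.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Minimal persistent pool: threads are created once and then reused.
pub struct WorkerPool {
    sender: Option<Sender<Job>>,
    workers: Vec<JoinHandle<()>>,
}

impl WorkerPool {
    pub fn new(size: usize) -> Self {
        let (sender, receiver) = channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers: Vec<JoinHandle<()>> = (0..size)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock guard is dropped at the end of this statement,
                    // so the job below runs without holding the lock.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // channel closed and drained
                    }
                })
            })
            .collect();
        WorkerPool { sender: Some(sender), workers }
    }

    pub fn execute(&self, job: impl FnOnce() + Send + 'static) {
        if let Some(sender) = &self.sender {
            sender.send(Box::new(job)).expect("workers are alive");
        }
    }
}

impl Drop for WorkerPool {
    fn drop(&mut self) {
        // Dropping the sender lets workers finish queued jobs, then exit.
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            worker.join().unwrap();
        }
    }
}
```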

ecton commented May 29, 2022

I've been starting work on a new file format that is my best theorycraft at something that could sit beneath Nebari: https://github.com/khonsulabs/sediment. At its core is the idea that while an fsync is happening, other transactions can proceed with updating the database and then be confirmed together by a batched sync. Each thread's fsync would still take about the normal time for a sync on average, but transactions could now be batched.
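This batching idea is commonly called group commit. A minimal bookkeeping sketch, with no real I/O and hypothetical names (`GroupCommit`, `start_sync`, `finish_sync`), shows how transactions written while one sync is in flight can all be confirmed by the next sync.

```rust
/// Hypothetical group-commit bookkeeping (no real fsync is performed).
#[derive(Default)]
pub struct GroupCommit {
    pending: Vec<u64>,       // written, waiting for a sync
    in_flight: Vec<u64>,     // covered by the sync currently running
    pub confirmed: Vec<u64>, // durable
    pub syncs: usize,        // number of syncs issued
}

impl GroupCommit {
    /// A transaction finished writing and now waits for durability.
    pub fn write(&mut self, tx: u64) {
        self.pending.push(tx);
    }

    /// Start a sync if none is running; it covers everything pending now.
    pub fn start_sync(&mut self) -> bool {
        if !self.in_flight.is_empty() || self.pending.is_empty() {
            return false;
        }
        std::mem::swap(&mut self.pending, &mut self.in_flight);
        self.syncs += 1;
        true
    }

    /// The running sync completed: its transactions are now durable.
    pub fn finish_sync(&mut self) {
        self.confirmed.append(&mut self.in_flight);
    }
}
```

In a run where one transaction starts a sync and three more are written before it finishes, all four are confirmed with two syncs; in a real system each writer would block until the sync covering its transaction completes.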

That core idea is actually somewhat compatible with the append-only format, except that only one writer can be modifying the tree at any given moment. I attempted to bring this idea into Nebari without the new project today, but I ran into another issue that Sediment wouldn't suffer from: multi-file synchronizations.

The reason my work today didn't do much is that each tree file is still being synced for each write. I don't have a good way to batch these operations at the moment, but it's one of the things Sediment aims to solve. I may come up with an idea in the meantime and try again, but the more I think about Sediment, the more hopeful I am that it will be significantly better than an append-only format, so I probably still want to get there anyway.

ecton added 4 commits May 30, 2022 13:29
This is meant to be an atomic operation, and is implemented in SQL as a
single query.

The collection sequence tracking I introduced as part of the
sequence-based-mapping refactor was done incorrectly -- the sequence IDs
can't be published to shared state until the transaction is confirmed.

The edge case was that a lazy view could start mapping while a
collection had a pending transaction being applied. The collection's
sequence could report a higher number than the database would return via
a query because the transaction had not been written yet.

This was partially a Nebari bug as well -- Tree::current_transaction_id
was implemented incorrectly, while TransactionTree/TreeFile were
correct.
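A hypothetical sketch of the corrected rule: sequence ids are reserved when a transaction starts applying, but the value visible to readers such as lazy views only advances on confirmation. `CollectionSequence` and its methods are illustrative names, not BonsaiDb's code, and the sketch assumes a single writer at a time.

```rust
/// Hypothetical per-collection sequence tracking.
#[derive(Default)]
pub struct CollectionSequence {
    reserved: u64,  // highest id handed to any transaction, committed or not
    published: u64, // highest id visible to readers such as lazy views
}

impl CollectionSequence {
    /// Reserve `writes` ids for a transaction; returns (first, last).
    pub fn begin(&mut self, writes: u64) -> (u64, u64) {
        let start = self.reserved + 1;
        self.reserved += writes;
        (start, self.reserved)
    }

    /// Only a confirmed transaction advances the shared value.
    pub fn confirm(&mut self, last: u64) {
        self.published = self.published.max(last);
    }

    /// What a lazy view should treat as the latest sequence.
    pub fn latest_for_readers(&self) -> u64 {
        self.published
    }
}
```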
This isn't completely functional, but I was ready to merge in the clippy
fixes from main. Still, only 2 tests that are expected to work are
currently broken in bonsaidb-local.
Successfully merging this pull request may close these issues:

  • Refactor Document Revision to be Sequence based
  • Refactor Views to utilize Nebari's B-Tree