Unbounded `HistoryTreeNode` field #112

slawlor · 2021-12-13T13:29:27Z

HistoryTreeNode has an epochs field that is used to identify the epochs which an update took place.
As it stands now, the field can grow to an unbounded size. This will pose an issue when the corresponding DB record hits either soft or hard limits imposed by the DB schema or architecture.

ESTIMATED LIMITS

Epochs are u64 = 8 bytes.

For mysql, with the current VARBINARY(2000) definition, 2KB / 8B = ~ 250 values (we could always raise the max value in the definition, this is just for illustration)
For some alternate key-value db (KVDB) with say 100KB recommended soft limit for record size, ignoring other overhead, 100KB / 8B = ~ 12500 values

SOLUTION?

The associated epochs for a HistoryTreeNode can be obtained directly from the HistoryNodeState table given the node label, since each update corresponds to a HistoryNodeState.

For mysql, this can be done through one call with a join, or a separate, concurrent call to the DB.
For the alternate KVDB, one call is also sufficient. Instead of a "get" operation to fetch a singular HistoryTreeNode record, we can use a "scan" operation to fetch both the HistoryTreeNode record AND associated HistoryNodeState records at one go, provided we have a good sharding setup.

Accumulating the data from separate tables will incur some computation time on the database server and we will need to profile and assess the impact. However, I think this is necessary to avoid the unbounded record size issue. Are there any other potential solutions I've missed?

slawlor · 2021-12-13T13:31:10Z

Exporting external group conversation

@kevinlewi
Good catch. The epoch vector will indeed need to be removed. However, how would we efficiently look up the HistoryNodeState table for the appropriate epoch? I believe we will perhaps need each HistoryNodeState to have a "pointer" to the previous valid epoch for the node, right? Or are you thinking of a more efficient way to do this?
Btw, at some point it would also be great to create public-facing issues on the github to track these, even just as a placeholder. Thanks for the finds!!

@afterdusk
Hmm, when an epoch is added to the epochs field, does it not remain perpetually? My impression was that all epochs where a HistoryNodeState exists for the node are valid epochs.
Based on this assumption, the following naive query can be run in parallel to collect the epochs where a state exists for a node:

SELECT a.epoch
FROM HistoryNodeStateTable a
WHERE a.label_len = <label_len> AND a.label_val = <label_val>

(In hindsight, a join to combine this with the query to HistoryTreeNodeTable wouldn’t be feasible, so two queries are needed.)
Also thank you for pointing out GitHub issues, I’ll post there for discussions like these in future!
Edit: To be clear, I'm proposing to remove the epochs field from the database table for HistoryTreeNode, but not the struct definition. When retrieving a HistoryTreeNode from storage, we will populate the epochs field with a query to the table storing HistoryNodeState 🙂

slawlor · 2021-12-13T13:34:41Z

My 2 cents.

I've actually been thinking about this before as a place to optimize. For a lookup proof & a publish, we only need the most recent state of a HistoryTreeNode correct? (@kevinlewi confirm plz :) )

What if we change HistoryTreeNode to contain the "latest" state as a part of the struct. That way at the data-layer it's constant in size, and it actually saves us a 2nd get operation generally since we (most of the time) only need the latest state. However when we write, we'll also want to push and changed states to their own table. That way for a key-history request, we can still retrieve the information (via JOIN or whatever). It might be worth changing the storage contract to support something like this (instead of just get/set generics).

…jority of operations only require the birth and current epochs. (Only audit & key history require the full list of epochs a node was mutated in) Resolves #112

kevinlewi · 2021-12-13T22:09:32Z

For a lookup proof & a publish, we only need the most recent state of a HistoryTreeNode correct?

Yes, @Jasleen1 to confirm.

However, I think the intermediate states are needed for the append-only / key history consistency proofs. And we still need the information about which epochs a node actually gets changed in for these proofs. Otherwise, we could end up having to search for all epochs between the birth and current epochs for changes -- which may not be very efficient...

Edit: left a comment here (https://github.com/novifinancial/akd/pull/113/files/9367f0ebc7dde560dfaf47491b1b32a5bb2eff51#diff-be3234553a30392a01a72481f1676c8338a439bc2a15a8a9f00f7923d1ef9353) which further elaborates on this

slawlor · 2021-12-14T14:15:15Z

Yeah @kevinlewi so in MySQL it's relatively straightforward with

SELECT `epoch` 
FROM `states` 
WHERE 
  `label_len` = :len 
  AND `label_val` = :val 
  AND `epoch` <= :epoch 
ORDER BY `epoch` DESC 
LIMIT 1

which is searching over an indexed column space with "retrieve the first epoch which is <= :epoch with a specific label" In-memory, yes it's slower, but I did a bit of a nifty trick to not scan the entire "memory db". Construct an array from

0...=epoch, map it to the NodeStateKey with the provided specific label, and retrieve those batch of nodes. Any epochs where an update didn't occur will simply return a not-found, so after we just take the largest epoch.

slawlor · 2021-12-14T14:16:08Z

In MySQL, this guarantees that the query is going to return a single u64 which at least saves us on the data transport (we aren't loading an entire node, or state or something, and wasting time serializing)

* Remove the epochs vector from the history tree node since the vast majority of operations only require the birth and current epochs. (Only audit & key history require the full list of epochs a node was mutated in) Resolves #112 * Add mysql limit to limit select as small as possible * Version bump to 0.3.6 Co-authored-by: Sean Lawlor <seanlawlor@fb.com>

slawlor added enhancement New feature or request optimization labels Dec 13, 2021

slawlor assigned afterdusk Dec 13, 2021

slawlor mentioned this issue Dec 13, 2021

Remove the epochs vector from HistoryTreeNode #113

Merged

slawlor closed this as completed in #113 Dec 14, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unbounded `HistoryTreeNode` field #112

Unbounded `HistoryTreeNode` field #112

slawlor commented Dec 13, 2021

slawlor commented Dec 13, 2021 •

edited

Loading

slawlor commented Dec 13, 2021

kevinlewi commented Dec 13, 2021 •

edited

Loading

slawlor commented Dec 14, 2021

slawlor commented Dec 14, 2021

Unbounded HistoryTreeNode field #112

Unbounded HistoryTreeNode field #112

Comments

slawlor commented Dec 13, 2021

ESTIMATED LIMITS

SOLUTION?

slawlor commented Dec 13, 2021 • edited Loading

Exporting external group conversation

slawlor commented Dec 13, 2021

kevinlewi commented Dec 13, 2021 • edited Loading

slawlor commented Dec 14, 2021

slawlor commented Dec 14, 2021

Unbounded `HistoryTreeNode` field #112

Unbounded `HistoryTreeNode` field #112

slawlor commented Dec 13, 2021 •

edited

Loading

kevinlewi commented Dec 13, 2021 •

edited

Loading