Cache (expansion?) parents #455
You can run the zigzag benchmark 'example' to find out how long replication takes for comparison purposes. First, make sure the examples are built (for release):
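(Assuming the standard cargo workflow for this repo's examples; the exact invocation may differ:)

```sh
# Build every example in release mode.
cargo build --release --examples
```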
Then you can just run replication:
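(The example name `zigzag` comes from the text above; the `--size` flag and its value are illustrative, check the example's `--help` for the real options:)

```sh
# Replicate some data and print timing stats; flags are illustrative.
./target/release/examples/zigzag --size 10240
```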
The output most important to you will be the replication time stats. You might want to play with different data sizes.
@porcuquine What would be the easiest way to put these in a CircleCI job? (In my previous performance tests I could just call a …)
I think you can just include the command(s) above (or one like it) where you would have put the …
Also, I think this is exactly what …
Yes, my question was aiming mostly at finding the correct …
Ok.
From the discussion in #459 I think I have extended the scope way beyond what was actually needed. @porcuquine, to have a concrete starting point, could you suggest an API that would align with the needs here? It doesn't have to be the final version, just something to get me started; that is, which function (that caches some product of the current computations) should be added, and where?
@schomatis Here's a guess, along with my reasoning so you can sanity check it and tease out more details as needed.

Assumptions: …

Recommendation: …
Conceptually, for the first step, we just want to cache the calls to `expanded_parents`.

Please bear in mind that the end goal of this and other future work will be to understand and optimize for potentially configurable time-space tradeoffs. So the best solutions will be both fastest and use the least memory for that speed (with the option to use disk where appropriate). I don't think you should worry about all the details of what 'best' will look like here yet, though. I mention it just so you can make decisions in a larger context.
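For illustration only, a minimal sketch of that idea; the struct and method names here are made up, not the actual API:

```rust
use std::collections::HashMap;

// Illustrative only: memoize the expensive expanded-parents computation,
// keyed by node index. One cache per direction (forward/reversed) would
// suffice, since the parents are identical across layers sharing a direction.
struct ExpandedParentsCache {
    cache: HashMap<usize, Vec<usize>>,
}

impl ExpandedParentsCache {
    fn new() -> Self {
        ExpandedParentsCache { cache: HashMap::new() }
    }

    // `compute` stands in for the real (Feistel-based) calculation.
    fn expanded_parents<F>(&mut self, node: usize, compute: F) -> &[usize]
    where
        F: FnOnce(usize) -> Vec<usize>,
    {
        self.cache.entry(node).or_insert_with(|| compute(node))
    }
}

fn main() {
    let mut cache = ExpandedParentsCache::new();
    // First call computes and stores; the repeated call is a pure lookup.
    let first = cache.expanded_parents(42, |n| vec![n + 1, n + 2]).to_vec();
    let second = cache.expanded_parents(42, |_| unreachable!()).to_vec();
    assert_eq!(first, second);
}
```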
@porcuquine I'm going to write up my conclusions about cache usage (from reading and playing with the code) and some design considerations, but first I need to ask: all my analysis is based on the ZigZag example scenario. Is there any other workflow I should consider that would expand my perspective on how the ZigZag graph is used?
@porcuquine Assuming the answer was yes to the previous question, I updated the issue description with my current understanding of the cache strategy we should use. Feel free to edit it in place with things you are sure about, and let's leave the rest of this issue thread to discuss things we're not sure of and would need to give some more thought to. My aim is to have a central document that clarifies exactly what we are trying to accomplish and how.
Implemented in #465. |
Cache strategy design
We need to have a very clear idea of how `parents` is used to be able to design the cache in an efficient manner, especially since there is a memory constraint and we just can't cache everything all the time. We also need to be able to support a full-cache strategy, which will use a lot of memory, but that should be optional. A framework for these space-vs-time optimization configurations should be designed in a separate issue. (TODO: create that issue.)
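As a rough sketch of what such a configuration could look like (the enum and its variants are hypothetical, not existing code):

```rust
// Hypothetical knob for the space-vs-time tradeoff discussed above;
// this enum does not exist in the repo, it is just a sketch.
#[allow(dead_code)]
enum ParentsCacheStrategy {
    // No caching: recompute parents on every request.
    None,
    // Cache the parents of the first `nodes` nodes in fixed memory.
    Fixed { nodes: usize },
    // Cache every node's parents: fastest, but the most memory.
    Full,
}

fn main() {
    // A conservative default, with Full available as an opt-in.
    let _strategy = ParentsCacheStrategy::Fixed { nodes: 1 << 20 };
}
```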
What follows is my current understanding, based on what I've seen of the code in the ZigZag example.

The same ZigZag graph seems to be used throughout the code: not the same `ZigZagGraph` structure, but rather its "reversed" (`zigzag()`'ed) version passed from one layer to another. Specifically, the same `base_graph` and `expansion_degree` are maintained, which results in always the same parents for each separate direction (`forward`/`reversed`).

The `parents` can actually be split into the parents of the base graph and those of the ZigZag extension. For now this concentrates on the ZigZag extension (`expanded_parents`, although for simplicity `parents` will be used in this text to mean the subgroup of parents we care to cache), because that computation is more expensive than the base one. We won't cache or analyze the underlying Feistel computations (anything below `correspondent`), which are basically a series of sequential hash functions: caching a particular hash result seems of little value, since it is useful only for the computation of the current node (the chances of it being reused are very low), and for roughly the same amount of space we can just cache the actual result (a parent index).

The parents of this "single" (in the sense described above) ZigZag are requested in 3 places:
1. During the replication phase, in `transform_and_replicate_layers`.
2. During proof generation, in `DrgPoRep::prove`.
3. During verification, in `DrgPoRep::verify`.

I'm not sure, but I'm assuming that steps 1-2 would be called separately from step 3 and wouldn't share the cache (that is, that another process would be the one verifying under normal, not test, circumstances).
There are two different profiles for the `parents` use: replication (1) and prove/verify (2/3). From the initial write-up I'm assuming that we want to optimize step 1 and don't care much about 2-3 (which use `parents` orders of magnitude less than step 1).

Step 1: replication.
This requests the parents in sequential order for each layer (in the corresponding direction), e.g., layer-0: `parents(0); parents(1); parents(2); [...] parents(N-1);`, then layer-1: `parents(N-1); parents(N-2); parents(N-3); [...] parents(0);`, then layer-2 the same as layer-0, layer-3 the same as layer-1, and so on. Each layer is processed sequentially (even with `generate_merkle_trees_in_parallel` set, which just parallelizes the Merkle tree generation in `merkle_tree()`).
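A toy sketch of that access pattern (everything here is illustrative; `parents` stands in for the real graph call):

```rust
// Illustrative stand-in for the real parents computation.
fn parents(_layer: usize, node: usize) -> Vec<usize> {
    vec![node]
}

fn replicate_layers(n: usize, layers: usize) {
    for layer in 0..layers {
        // Each layer requests every node's parents exactly once; even
        // layers walk forward, odd layers walk the reversed direction.
        let nodes: Box<dyn Iterator<Item = usize>> = if layer % 2 == 0 {
            Box::new(0..n)
        } else {
            Box::new((0..n).rev())
        };
        for node in nodes {
            let _ = parents(layer, node); // one lookup per node per layer
        }
    }
}

fn main() {
    replicate_layers(8, 4);
}
```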
So the replication calls `parents` for every node once in each layer, in sequential order (sequential here referring not to calling the parents in an orderly fashion like 0, 1, 2, 3, which has no impact, but to the fact that every layer calls all the parents once, finishes, and then another layer repeats it).

Based on that, the cache strategy implemented in PR #465 (and roughly delineated in this comment) doesn't aim at an LRU cache; instead it just covers as much of the `parents` search space as possible with a fixed amount of allocated memory.
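A minimal sketch of that fixed-coverage idea (all names here are hypothetical; the actual PR #465 implementation will differ): pre-compute the parents of the first `cache_entries` nodes into a flat buffer and fall through to direct computation past that boundary.

```rust
// Illustrative only: cache the parents of the first `cache_entries`
// nodes in a flat pre-allocated vector; anything past that boundary
// falls through to direct computation. Memory use is fixed
// (cache_entries * degree indices) regardless of graph size.
struct FixedParentsCache {
    degree: usize,
    // Parents of node i live at [i * degree, (i + 1) * degree).
    entries: Vec<u32>,
}

impl FixedParentsCache {
    fn new<F>(cache_entries: usize, degree: usize, compute: F) -> Self
    where
        F: Fn(usize, &mut [u32]),
    {
        let mut entries = vec![0u32; cache_entries * degree];
        for node in 0..cache_entries {
            compute(node, &mut entries[node * degree..(node + 1) * degree]);
        }
        FixedParentsCache { degree, entries }
    }

    fn parents<F>(&self, node: usize, out: &mut [u32], compute: F)
    where
        F: Fn(usize, &mut [u32]),
    {
        let cached = self.entries.len() / self.degree;
        if node < cached {
            out.copy_from_slice(&self.entries[node * self.degree..(node + 1) * self.degree]);
        } else {
            compute(node, out); // outside the covered range: compute directly
        }
    }
}

fn main() {
    // Toy "computation": parent j of node i is i + j + 1 (illustrative).
    let compute = |node: usize, out: &mut [u32]| {
        for (j, p) in out.iter_mut().enumerate() {
            *p = (node + j + 1) as u32;
        }
    };
    let cache = FixedParentsCache::new(1024, 8, compute);
    let mut out = [0u32; 8];
    cache.parents(7, &mut out, compute); // served from the cache
    cache.parents(2048, &mut out, compute); // computed on the fly
}
```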
The only consideration against this fixed cache would be if layers were not processed sequentially (as described in #465 (comment)): if we knew (and could coordinate) that many layers would be calling `parents` for the same group of nodes "more or less at the same time", then an LRU could be considered. This (I think) wouldn't be trivial to implement and would require a high degree of coupling: we would need to coordinate, say, all even (forward) layers to request `parents` for the same group of nodes at once (a group that could fit in a determined cache size) before allowing them to move on to another part of the search space (clearing the previous cache, as we would have a strong certainty that those values wouldn't be requested again).

Original write-up
Background:

- … `expanded_parents` calls.
- … `parents` calculation.
- … `parents` depends on the marginal time savings of doing so (may be low).

Start by looking at `zigzag_graph.rs`, where `feistel` is called:

https://github.com/filecoin-project/rust-proofs/blob/master/storage-proofs/src/zigzag_graph.rs#L165-L178
Either `permute` or `invert_permute` will be called `expansion_degree` times on every index (node) at every layer.
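To see why caching these pays off, a back-of-the-envelope count (the numbers below are purely illustrative): since the expanded parents are identical for every layer sharing a direction, a per-direction cache replaces the per-layer factor with a factor of 2.

```rust
fn main() {
    // Illustrative parameters only (not the real defaults).
    let nodes: u64 = 1 << 20; // graph size
    let expansion_degree: u64 = 8; // expanded parents per node
    let layers: u64 = 10;

    // Without caching: every layer recomputes every node's expanded parents.
    let uncached = nodes * expansion_degree * layers;

    // With a per-direction cache: each direction computes them only once.
    let cached = nodes * expansion_degree * 2;

    // Prints 83886080 vs 16777216: the cache removes a factor of layers/2.
    println!("feistel calls uncached: {}, cached: {}", uncached, cached);
}
```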
Also look at `parents` and `expanded_parents` in the same file.

This may need to be fleshed out or better specified, but the above should be enough background for you to begin digging in and understanding the problem.