
Cache (expansion?) parents #455

Closed
porcuquine opened this issue Jan 24, 2019 · 11 comments
porcuquine commented Jan 24, 2019

Cache strategy design

We need a very clear idea of how parents is used in order to design the cache efficiently, especially since there is a memory constraint and we can't simply cache everything all the time.
We also need to support an optional full-cache strategy, which will use a lot of memory. A framework for these space-vs-time configuration options should be designed as a separate issue. (TODO: create that issue.)

What follows is my current understanding, based on what I've seen from running the ZigZag example:

./target/release/examples/zigzag --size 1024 --no-bench

The same ZigZag graph seems to be used throughout the code: not the same ZigZagGraph structure, but its "reversed" (zigzag()'ed) version, passed from one layer to the next. Specifically, the same base_graph and expansion_degree are maintained, which means the parents are always the same for each direction (forward/reversed).

The parents can actually be split into the parents of the base graph and those of the ZigZag extension (expanded_parents). For now this analysis concentrates on the ZigZag extension (although for simplicity, parents will be used in this text to mean the subset of parents we care to cache), because the underlying computation is more expensive than the base one. We won't cache or analyze the underlying Feistel computations themselves (anything below correspondent), which are basically a series of sequential hash functions: caching a particular hash result seems of little value, since it is useful only for the computation of the current node (the chances of it being reused are very low), and for roughly the same amount of space we can just cache the actual result (a parent index).
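To make the "series of sequential hash functions" concrete, here is a toy two-branch Feistel network over 64-bit node indices. The round function here is an arbitrary mixing function, not the project's actual Feistel implementation; the sketch only illustrates why each index's permutation is a chain of sequential rounds, why intermediate round results are useless outside the current index, and why the forward permutation is invertible.

```rust
// Toy 2-branch Feistel permutation over u64 indices (split into two u32
// halves). round_fn is a placeholder mixing function, NOT the real hash.
fn round_fn(right: u32, key: u32) -> u32 {
    right.wrapping_mul(0x9E37_79B9).wrapping_add(key).rotate_left(13)
}

fn permute(index: u64, keys: &[u32]) -> u64 {
    let (mut left, mut right) = ((index >> 32) as u32, index as u32);
    for &key in keys {
        // Each round depends on the previous one: no sharing across indices.
        let new_right = left ^ round_fn(right, key);
        left = right;
        right = new_right;
    }
    ((left as u64) << 32) | right as u64
}

fn invert_permute(index: u64, keys: &[u32]) -> u64 {
    let (mut left, mut right) = ((index >> 32) as u32, index as u32);
    for &key in keys.iter().rev() {
        // Undo one forward round: the Feistel structure makes this exact.
        let new_left = right ^ round_fn(left, key);
        right = left;
        left = new_left;
    }
    ((left as u64) << 32) | right as u64
}

fn main() {
    let keys = [1, 2, 3, 4];
    for i in 0..1000u64 {
        assert_eq!(invert_permute(permute(i, &keys), &keys), i);
    }
    println!("feistel round-trip ok");
}
```

Because the final output is just another index, caching that index directly costs about the same space as caching any intermediate hash, which is the argument above for caching only the end result.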

The parents of this "single" (in the sense described above) ZigZag are requested in 3 places:

  1. During the replication phase, in transform_and_replicate_layers.

  2. During the prove generation in DrgPoRep::prove.

  3. During the verification in DrgPoRep::verify.

I'm not sure, but I'm assuming that steps 1-2 would be called separately from step 3 and wouldn't share a cache (that is, that under normal, non-test circumstances another process would do the verification).

There are two different usage profiles for parents: replication (1) and prove/verify (2/3). From the initial write-up I'm assuming that we want to optimize step 1 and don't care about steps 2-3 (which call parents orders of magnitude fewer times than step 1).

Step 1: replication.

This requests the parents in sequential order for each layer (in the corresponding direction), e.g., layer-0: parents(0); parents(1); parents(2); [...] parents(N-1);, then layer-1: parents(N-1); parents(N-2); parents(N-3); [...] parents(0);, then layer-2 the same as layer-0, layer-3 the same as layer-1, and so on. The layers themselves are processed sequentially (even with generate_merkle_trees_in_parallel set, which only parallelizes the Merkle tree generation in merkle_tree()).

So replication calls parents for every node once in each layer, in sequential order (sequential here meaning not that the parents are requested in an orderly fashion like 0, 1, 2, 3, which has no impact, but that each layer requests all of its parents once, finishes, and only then does the next layer repeat the process).

Based on that, the cache strategy implemented in PR #465 (and roughly outlined in this comment) doesn't aim to be an LRU cache; instead it just covers as much of the parents search space as possible with a fixed-size memory allocation.

The only argument against this fixed cache would arise if layers were not processed sequentially (as described in #465 (comment)): if we knew (and could coordinate) that many layers would request the parents of the same group of nodes at more or less the same time, then an LRU cache could be considered. This (I think) wouldn't be trivial to implement and would require a high degree of coupling: we would need to coordinate, say, all even (forward) layers to request parents for the same group of nodes at once (a group that could fit in a given cache size) before allowing them to move on to another part of the search space (clearing the previous cache, since we would have strong certainty that those values wouldn't be requested again).
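Since each layer is a full sequential sweep over every node, the fixed-coverage idea can be sketched as below. ParentCache, its budget, and the compute closure are hypothetical names for illustration, not the actual PR #465 API: parents for indices inside the budget are computed once and kept forever, everything else is recomputed on every request, and no eviction logic is needed.

```rust
// Sketch of a fixed-coverage parents cache (hypothetical API).
struct ParentCache {
    cached: Vec<Option<Vec<u32>>>, // one slot per node inside the budget
}

impl ParentCache {
    fn new(budget_nodes: usize) -> Self {
        ParentCache { cached: vec![None; budget_nodes] }
    }

    fn parents<F: Fn(usize) -> Vec<u32>>(&mut self, node: usize, compute: F) -> Vec<u32> {
        match self.cached.get_mut(node) {
            // Inside the budget: compute lazily on first use, then reuse.
            Some(slot) => slot.get_or_insert_with(|| compute(node)).clone(),
            // Outside the fixed budget: always recompute.
            None => compute(node),
        }
    }
}

fn main() {
    let mut cache = ParentCache::new(2);
    let calls = std::cell::Cell::new(0usize);
    let compute = |n: usize| {
        calls.set(calls.get() + 1);
        vec![n as u32] // stand-in for the real (expensive) parents computation
    };
    // Two sweeps over 4 nodes: nodes 0-1 are computed once, 2-3 every sweep.
    for _layer in 0..2 {
        for node in 0..4 {
            let _ = cache.parents(node, &compute);
        }
    }
    assert_eq!(calls.get(), 6); // 2 cached computations + 4 recomputations
    println!("compute calls: {}", calls.get());
}
```

The point of the sketch is that, under a strictly layer-by-layer access pattern, a prefix-coverage cache like this gets the same hit rate as an LRU of equal size, without any coordination between layers.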


Original write-up

Background:

  • ZigZag replication has many (10 currently) layers.
  • A graph is reversed at each layer, but is only calculated on-demand.
  • There are two different parent calculations.
  • One of these, the ‘expansion’ parents, uses feistel and is expensive.
  • We can cache this and reuse everywhere.
  • The actual expansion parents alternate (as layers reverse).
  • The underlying feistel permutation is what can be cached.
  • This is a permutation of indexes, so for space efficiency could be packed appropriately.
  • (Data is indexed by 32-byte chunks.)
  • It probably makes most sense just to cache the expanded_parents calls.
  • It might be valuable to also cache the whole parents calculation.
  • There is a time-space tradeoff that should be explored before deciding whether this makes sense in general.
  • Whether or not we cache parents depends on the marginal time savings of doing so (may be low).
  • It would be worth knowing the savings, if it is cheap to learn.
  • I mention all this because the right design would make it easy to select no caching, or caching of either one or both calculations as a configuration.
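The last bullet (selecting no caching, or caching of either or both calculations, as a configuration) could take a shape like the following hypothetical enum; the type and method names are illustrative, not anything existing in the repo.

```rust
// Hypothetical configuration knob for which calculations get cached.
#[derive(Clone, Copy, Debug, PartialEq)]
enum CacheStrategy {
    None,        // no caching at all
    FeistelOnly, // cache only the Feistel permutation results
    ParentsOnly, // cache only the full parents calculation
    Both,        // cache both
}

impl CacheStrategy {
    fn cache_feistel(self) -> bool {
        matches!(self, CacheStrategy::FeistelOnly | CacheStrategy::Both)
    }
    fn cache_parents(self) -> bool {
        matches!(self, CacheStrategy::ParentsOnly | CacheStrategy::Both)
    }
}

fn main() {
    assert!(CacheStrategy::Both.cache_feistel() && CacheStrategy::Both.cache_parents());
    assert!(!CacheStrategy::None.cache_feistel());
    assert!(CacheStrategy::FeistelOnly.cache_feistel());
    assert!(!CacheStrategy::FeistelOnly.cache_parents());
    println!("strategy flags ok");
}
```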

Start by looking at zigzag_graph.rs, where feistel is called:
https://github.com/filecoin-project/rust-proofs/blob/master/storage-proofs/src/zigzag_graph.rs#L165-L178

The call to permute OR invert_permute will be called expansion_degree times on every index (node) at every layer.
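Since the permutation is called expansion_degree times on every index at every layer, the simplest gain is to memoize it. The wrapper below is a hypothetical sketch (not the repo's code), assuming only a permutation function with the shape of permute in zigzag_graph.rs:

```rust
use std::collections::HashMap;

// Hypothetical memoization wrapper around a permutation function.
// Each distinct index is computed once; repeated calls hit the map.
struct MemoPermute<F: Fn(u64) -> u64> {
    inner: F,
    cache: HashMap<u64, u64>,
}

impl<F: Fn(u64) -> u64> MemoPermute<F> {
    fn new(inner: F) -> Self {
        MemoPermute { inner, cache: HashMap::new() }
    }

    fn call(&mut self, index: u64) -> u64 {
        let inner = &self.inner; // split borrow: cache is borrowed mutably below
        *self.cache.entry(index).or_insert_with(|| inner(index))
    }
}

fn main() {
    let computed = std::cell::Cell::new(0u32);
    let mut permute = MemoPermute::new(|i| {
        computed.set(computed.get() + 1);
        i.rotate_left(7) ^ 0xDEAD_BEEF // stand-in for the real Feistel permutation
    });
    for _ in 0..3 {
        assert_eq!(permute.call(42), permute.call(42));
    }
    assert_eq!(computed.get(), 1); // the underlying function ran only once
    println!("memoized permute ok");
}
```

A packed Vec (as suggested in the background bullets) would be more space-efficient than a HashMap once the cache is dense; the map is used here only to keep the sketch short.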

Also look at parents and expanded_parents in the same file.

This may need to be fleshed out or better specified, but the above should be enough background for you to begin digging in and understanding the problem.

@porcuquine

You can run the zigzag benchmark 'example' to find out how long replication takes for comparison purposes.

First, make sure the examples are built (for release):

> cargo build --all --release --examples

Then you can just run replication:

> ./target/release/examples/zigzag --size 1024 --no-bench

The output most important to you will be the replication time stats:

Jan 25 10:12:40.869 INFO replication_time: 2.271130827s, target: stats, place: filecoin-proofs/examples/zigzag.rs:170 zigzag, root: filecoin-proofs
Jan 25 10:12:40.870 INFO replication_time/byte: 2.165µs, target: stats, place: filecoin-proofs/examples/zigzag.rs:171 zigzag, root: filecoin-proofs
Jan 25 10:12:40.870 INFO replication_time/GiB: 2325.637966847s, target: stats, place: filecoin-proofs/examples/zigzag.rs:176 zigzag, root: filecoin-proofs

You might want to play with different data sizes.

@schomatis

First, make sure the examples are built (for release):

> cargo build --all --release --examples

Then you can just run replication:

> ./target/release/examples/zigzag --size 1024 --no-bench

@porcuquine What would be the easiest way to put these in a CircleCI job? (In my previous performance tests I could just call a cargo command, but I'm not sure how to access the built binary.)

@porcuquine

I think you can just include the command(s) above (or ones like them) where you would have put the cargo command. You can put arbitrary shell commands into config.yml.

@porcuquine

Also, I think this is exactly what rust-proofs-infra is for, so please feel free to use it. @eefahy or @mburns can help ensure you have access to the results. You should have the ability to create a PR there, which will trigger a bench run. Subsequent pushes will run it again, so be judicious. I recommend creating a bencher configuration which only runs what you need.

@schomatis

You can put arbitrary shell commands into config.yml.

Yes, my question was mostly aimed at finding the correct ./target/release/examples/zigzag path, but I'll just try this as is and ask for help if it doesn't work.

@schomatis

schomatis commented Jan 29, 2019

Also, I think this is exactly what rust-proofs-infra is for, so please feel free to use. @eefahy or @mburns can help ensure you have access to the results. You should have ability to create a PR there, which will trigger a bench run. Subsequent pushes will run again, so be judicious. I recommend creating a bencher configuration which only runs what you need.

Ok.

@schomatis

schomatis commented Feb 2, 2019

From the discussion in #459 I think I have extended the scope way outside of what was actually needed.

@porcuquine, to give me a concrete starting point, could you suggest an API that would align with the needs here (it doesn't have to be the final version, just something to get me started)? That is, which function (caching some product of the current computations) should be added, and where?

@porcuquine

@schomatis Here's a guess, along with my reasoning so you can sanity check it and tease out more details as needed.

Assumptions:

  • We need to at least cache the Feistel computations.
  • We may want to cache the parents calculations eventually (as an option).
  • We want to share caches between equivalent instances.

Recommendation:

  • At initialization time, use the values passed to ZigZagGraph::new() to acquire a reference and populate a new field with any extant cache.
  • If none exists, create a new cache instance and lazily fill it when running the first time.
  • Remember that the value of expansion_parents() depends on the layer (even or odd).

Conceptually, for the first step, we just want to cache the calls to expansion_parents. However, since the count of parents will vary by layer, it may be more space-efficient (at the cost of some recalculation) to cache just the Feistel calls.
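The "share caches between equivalent instances" recommendation could be sketched with a registry keyed by the parameters passed to ZigZagGraph::new(). All names below (CacheKey, ParentsCache, shared_cache) are hypothetical, and the key is a stand-in for whatever parameters actually determine cache equivalence:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical registry: instances constructed with the same parameters
// receive the same reference-counted parents cache.
type CacheKey = (usize, usize); // e.g. (nodes, expansion_degree); illustrative
type ParentsCache = Mutex<HashMap<u64, Vec<u32>>>;

fn shared_cache(
    registry: &Mutex<HashMap<CacheKey, Arc<ParentsCache>>>,
    key: CacheKey,
) -> Arc<ParentsCache> {
    let mut reg = registry.lock().unwrap();
    reg.entry(key)
        // First instance with these parameters creates the cache...
        .or_insert_with(|| Arc::new(Mutex::new(HashMap::new())))
        // ...and every later equivalent instance gets a clone of the Arc.
        .clone()
}

fn main() {
    let registry = Mutex::new(HashMap::new());
    let a = shared_cache(&registry, (1024, 8));
    let b = shared_cache(&registry, (1024, 8)); // same params: same cache
    let c = shared_cache(&registry, (2048, 8)); // different params: new cache
    assert!(Arc::ptr_eq(&a, &b));
    assert!(!Arc::ptr_eq(&a, &c));
    println!("shared cache ok");
}
```

This matches the recommendation that the constructor should acquire any extant cache at initialization time and only create a fresh one (filled lazily) when none exists.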

Please bear in mind that the end goal of this and other future work will be to understand and optimize for potentially configurable time-space tradeoffs. So the best solutions will be both the fastest and the ones using the least memory to achieve that speed (with the option to use disk where appropriate). I don't think you should worry yet about all the details of what 'best' will look like here, though. I mention it just so you can make decisions in a larger context.

@porcuquine porcuquine mentioned this issue Feb 5, 2019
@schomatis

@porcuquine I'm going to write up my conclusions about cache usage (from reading and playing with the code) and some design considerations, but first I need to ask: all my analysis is based on the

./target/release/examples/zigzag

scenario. Is there any other workflow I should consider that would expand my perspective on how the ZigZag graph is used?

@schomatis

@porcuquine Assuming the answer to the previous question was yes, I updated the issue description with my current understanding of the cache strategy we should use. Feel free to edit it in place with anything you are sure about, and let's leave the rest of this issue thread to discuss the things we're not sure of and need to give more thought to. My aim is to have a central document that clarifies exactly what we are trying to accomplish and how.

@schomatis

Implemented in #465.
