disk-backed merkle trees #481
Comments
WIP: Making two docker images,
As discussed offline, here's some more detail about how I think this should probably work.
@porcuquine Some questions (Q) and observations (O, if you agree you don't need to answer those) about the previous design (D) considerations:
rust-fil-proofs/src/writer/memmap.rs Lines 9 to 32 in d5bbcd7
(But instead of wrapping
@porcuquine A few follow-ups:
Regarding Q8 or O8 above, I'm not sure if that is a question or observation. If question, I don't follow. If observation, carry on.
Not sure if I completely follow the path solution but I have enough to move forward and we can revisit it over a concrete implementation.
The solution to back merkle trees with
(Of all the
@schomatis @porcuquine I believe most of this work is done, can we close this issue then?
Not sure what's the full scope of this issue, I'll defer to @porcuquine on this one.
I'm not completely up-to-date on what has been shipped, but I believe the spirit of this issue has been fulfilled. My only question is around how 'finished' the work is. If we can now replicate with less memory than before by flipping a configuration switch, then I think this work is done. If not, then there are some hanging threads which should be accounted for. In that case, I'm still fine with closing this, as long as we have follow-up issues covering whatever remains. If the needs are unclear, let's discuss -- mostly just need to make sure the work is fully integrated and usable.
I think then we can wrap this up once #655 is finished (I'll mark that issue as a closer for this one) when we'll be able to replicate using a max RSS of 3 sector sizes (lowering it from 5, or even more, originally).
Original write-up
Description
We devote a lot of memory to holding onto merkle trees during replication. We should eliminate this requirement, which limits the sector size we can replicate. The trees should live on disk; they don't need to be in memory between when they are generated and when they are used to create the proof. The root of each tree will be needed to calculate comm_r_star, so it should probably be extracted and held before flushing to disk, though.
Whether the best solution is to construct the tree in mmapped data, or to just dump and load it from a file is an expedient question for now.
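To make the "dump and load it from a file" option concrete, here is a minimal sketch using only the standard library. It is not the project's implementation: `toy_hash`, `build_tree`, `StoredTree`, `flush_tree` and `load_tree` are hypothetical names, the hash is a non-cryptographic placeholder, and the flat "leaves first, root last" layout is an assumption made for illustration. The point is only that the root is copied out before the node buffer is written and dropped, so it stays available for comm_r_star while the rest of the tree lives on disk.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Read, Write};
use std::path::{Path, PathBuf};

const NODE_SIZE: usize = 32;
type Node = [u8; NODE_SIZE];

/// Placeholder for the real hash function; NOT cryptographic (assumption).
fn toy_hash(left: &Node, right: &Node) -> Node {
    let mut out = [0u8; NODE_SIZE];
    for i in 0..NODE_SIZE {
        out[i] = left[i].rotate_left(1) ^ right[i].wrapping_add(i as u8);
    }
    out
}

/// Builds a binary tree over `leaves` (count must be a power of two).
/// Nodes are stored flat: all leaves first, then each level up, root last.
fn build_tree(leaves: &[Node]) -> Vec<Node> {
    assert!(leaves.len().is_power_of_two());
    let mut nodes = leaves.to_vec();
    let mut start = 0; // index of the first node of the current level
    let mut width = leaves.len(); // number of nodes in the current level
    while width > 1 {
        for i in (0..width).step_by(2) {
            let parent = toy_hash(&nodes[start + i], &nodes[start + i + 1]);
            nodes.push(parent);
        }
        start += width;
        width /= 2;
    }
    nodes
}

/// What stays in memory once the tree has been flushed: root plus location.
struct StoredTree {
    root: Node,
    path: PathBuf,
}

/// Extracts the root, writes every node to `path`, then drops the buffer.
fn flush_tree(nodes: Vec<Node>, path: &Path) -> io::Result<StoredTree> {
    let root = *nodes.last().expect("tree has at least one node");
    let mut writer = BufWriter::new(File::create(path)?);
    for node in &nodes {
        writer.write_all(node)?;
    }
    writer.flush()?;
    // `nodes` goes out of scope here, releasing the in-memory tree.
    Ok(StoredTree { root, path: path.to_path_buf() })
}

/// Reads the whole tree back, e.g. right before it is needed for proving.
fn load_tree(stored: &StoredTree) -> io::Result<Vec<Node>> {
    let mut bytes = Vec::new();
    File::open(&stored.path)?.read_to_end(&mut bytes)?;
    let nodes = bytes
        .chunks_exact(NODE_SIZE)
        .map(|chunk| {
            let mut node = [0u8; NODE_SIZE];
            node.copy_from_slice(chunk);
            node
        })
        .collect();
    Ok(nodes)
}
```

In this sketch a caller would keep one `StoredTree` per layer between replication and proving, and call `load_tree` only for the layer currently being proved. An mmap-based variant would replace `load_tree` with a mapping over the same file.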
Acceptance criteria
Ideally, we would see memory profiling evidence before and after.
Alternately, empirical evidence of replicating a sector size which fails for lack of memory before then succeeds after would do.
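As a lightweight complement to a full heap profile, a before/after comparison could also be based on the process's peak resident set size. The sketch below is an assumption-laden illustration, not what the issue's work log used (that was gperftools): it is Linux-only, reads the `VmHWM` field from `/proc/self/status`, and the function names `parse_peak_rss_kb` and `peak_rss_kb` are hypothetical.

```rust
use std::fs;

/// Extracts the `VmHWM` value (peak resident set size, in kB) from the
/// text of a Linux `/proc/<pid>/status` file. Returns `None` if absent.
fn parse_peak_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmHWM:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|value| value.parse().ok())
}

/// Peak RSS of the current process in kB (Linux only; `None` elsewhere
/// or if the file cannot be read).
fn peak_rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    parse_peak_rss_kb(&status)
}
```

Printing `peak_rss_kb()` at the end of a replication run, once with in-memory trees and once with disk-backed trees, would give a rough number to compare against the sector size.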
Risks + pitfalls
As we consider generalizing our approach to (partial and complete) merkle tree caching, and to sharing the mechanism between PoRep and PoSt, the design space becomes somewhat dense. Don't get caught up in trying to solve future problems, but do bear in mind the design trajectory, to the extent you're familiar with it.
Where to begin
Merkle tree generation: https://github.com/filecoin-project/rust-proofs/blob/master/storage-proofs/src/drgraph.rs#L26-L28
We have all the trees together here: https://github.com/filecoin-project/rust-proofs/blob/master/storage-proofs/src/layered_drgporep.rs#L427-L444
We use all the trees when proving here: https://github.com/filecoin-project/rust-proofs/blob/master/storage-proofs/src/layered_drgporep.rs#L237-L240
What we hold onto and pass around in aux should not require a full merkle-tree's worth of memory, except when the relevant layer's tree is being used to generate the proof: https://github.com/filecoin-project/rust-proofs/blob/master/storage-proofs/src/layered_drgporep.rs#L259-L261
Work Log
Most of the current work is devoted to setting up the tools to automate the memory profiling of the zigzag example (the application of the mmap itself doesn't seem to present much of a problem).
Memory profiling
Building a full profiler script that would allow me to use gperftools with the zigzag example. The PR #487 adds the heap profiler (0004889) already supported in rust-gperftools. This is already working (although I still haven't studied the profile output in depth).
Trying to automate the previous profile mechanism into rust-proofs-infra to have a reliable working environment on which to run the benchmarks: https://github.com/filecoin-project/rust-proofs-infra/issues/17.
mmap
Looking at d5bbcd7 as an example. The file has been removed (even though other examples of it exist throughout the code). I'm not sure if it's worth abstracting it again; I've been told @laser might be working on it, so for the moment I'll just try something simple like what's already done in rust-fil-proofs/filecoin-proofs/examples/drgporep-vanilla-disk.rs (lines 36 to 47 in 90bb044).
So what I'm implementing at the moment looks like ac21c56.
TODO
Closed [WIP] measure memory usage with gperftools #487 in favor of decoupling gperftools support in examples: zigzag: add heap profiler #509. Update references in "Memory profiling" section.
Merge conclusions from design discussion in sub-thread disk-backed merkle trees #481 (comment).
Merge Docker images disk-backed merkle trees #481 (comment).