Integrate, benchmark, and optimize the nv21 SectorInfo migration #11071

Closed
7 tasks done
arajasek opened this issue Jul 12, 2023 · 2 comments · Fixed by #11149

arajasek commented Jul 12, 2023

Background

State migrations are events that occur during Filecoin network upgrades. At minimum, they involve iterating over all entries in the Filecoin state tree. Simpler migrations require updating actor code CIDs, while complex migrations might involve processing other parts of state such as deals, sector infos, submitted proofs, etc.

In particular, we have twice performed migrations over all SectorInfos. These are slow, and so rely heavily on previous work ("pre-migrations") in order to complete the migration in ~sub-epoch time. Migrations that take longer than a single epoch result in null rounds on the blockchain.
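To make the cost of a slow migration concrete, here is a rough sketch that estimates how many whole epochs elapse while a migration runs. It assumes a 30-second epoch (the Filecoin mainnet value); `epochsSpanned` and `mustEpochs` are hypothetical helper names, and the result is only an approximation of the number of null rounds.

```go
package main

import (
	"fmt"
	"time"
)

// epochDuration is an assumption: one epoch every 30 seconds (mainnet).
const epochDuration = 30 * time.Second

// epochsSpanned returns how many whole epochs fit inside the migration
// duration. It is a rough estimate of null rounds, not an exact count.
func epochsSpanned(migration time.Duration) int {
	if migration <= 0 {
		return 0
	}
	return int(migration / epochDuration)
}

// mustEpochs parses a Go duration string and panics on malformed input.
func mustEpochs(s string) int {
	d, err := time.ParseDuration(s)
	if err != nil {
		panic(err)
	}
	return epochsSpanned(d)
}

func main() {
	for _, s := range []string{"15s", "45s", "12m"} {
		fmt.Printf("%s spans %d whole epoch(s)\n", s, mustEpochs(s))
	}
}
```

A migration that finishes in under 30 seconds spans zero whole epochs under this estimate, which is the "sub-epoch" target described above.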

User Story

I, someone running a Lotus node, want to be able to sync through network upgrades that include state migrations with minimal disruption. This means that the migration runs in < 1 min for SPs with good hardware, < 5 mins for most users, and < 2 hours in all cases (including archival nodes). Additionally, the node's memory usage during the migration period should not cause the node to fall out of sync.

Technical Breakdown

With an eye on the nv21 migration, we need to hook up the migration (already written) to the existing benchmarking tool and performance-test it. We should then investigate various performance ideas.

Tasks

jennijuju commented

I think the first two tasks are already done in https://github.com/filecoin-project/lotus/tree/feat/nv21-skeleton

snissn self-assigned this Aug 1, 2023
jennijuju assigned ZenGround0 and unassigned arajasek Aug 1, 2023
jennijuju linked a pull request Aug 9, 2023 that will close this issue

snissn commented Aug 11, 2023

This has been benchmarked, and the AMT diffing optimization has been verified to be correct: the optimized and unoptimized runs produce the same state root CIDs. It also meets the migration timing requirement.
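The idea of reusing pre-migration output and re-migrating only what changed can be sketched as follows. This is a simplified illustration, not the go-state-types implementation: plain Go maps stand in for the sector AMTs, and `sectorMap`, `migrateSector`, `migrateFull`, and `migrateWithCache` are hypothetical names. The final check mirrors the verification described above, where the cached path must produce the same result as a full migration.

```go
package main

import "fmt"

// sectorMap stands in for a sectors AMT: sector number -> serialized info.
type sectorMap map[uint64]string

// migrateSector is a hypothetical per-entry migration.
func migrateSector(old string) string { return "migrated(" + old + ")" }

// migrateFull migrates every entry with no cache.
func migrateFull(in sectorMap) sectorMap {
	out := make(sectorMap, len(in))
	for k, v := range in {
		out[k] = migrateSector(v)
	}
	return out
}

// migrateWithCache reuses cachedOut (the pre-migration output for base) for
// entries unchanged between base and cur, and re-migrates only added or
// modified entries. Entries removed from cur are dropped. It also returns
// how many entries had to be re-migrated.
func migrateWithCache(base, cachedOut, cur sectorMap) (sectorMap, int) {
	out := make(sectorMap, len(cur))
	remigrated := 0
	for k, v := range cur {
		if bv, ok := base[k]; ok && bv == v {
			if cv, ok := cachedOut[k]; ok {
				out[k] = cv
				continue
			}
		}
		out[k] = migrateSector(v)
		remigrated++
	}
	return out, remigrated
}

// equal reports whether two sector maps hold identical entries.
func equal(a, b sectorMap) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

func main() {
	base := sectorMap{1: "a", 2: "b", 3: "c"}
	cachedOut := migrateFull(base) // the pre-migration step
	cur := sectorMap{1: "a", 2: "b2", 4: "d"}

	got, n := migrateWithCache(base, cachedOut, cur)
	fmt.Println("re-migrated entries:", n)
	fmt.Println("matches full migration:", equal(got, migrateFull(cur)))
}
```

Note that this sketch still walks every entry of the current map; a real AMT diff can presumably skip whole unchanged subtrees by comparing node CIDs, which is where most of the savings would come from.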

REMINDER: If you are running this, you likely want to ALSO run the continuity testing tool!
manifest cid: bafy2bzacedowjygm2fy5yryyyjcww6znqngoz4rildgouuj6uk35z7a2u5gmc
migration height  3104159
old cid  bafy2bzacebap5x7ow7xwubdnqfodveuksbp4k63oe7f6ukl752cm3vtp5m7fo
new cid  bafy2bzaced3m4tbxhq5fcybisppaiafbnguwecjxhkn46jds7y75yimrfc2vq
completed round actual (without cache), took  11m49.470263157s
manifest cid: bafy2bzacedowjygm2fy5yryyyjcww6znqngoz4rildgouuj6uk35z7a2u5gmc
completed premigration, took  10m20.103971889s
manifest cid: bafy2bzacedowjygm2fy5yryyyjcww6znqngoz4rildgouuj6uk35z7a2u5gmc
completed round actual (with cache), took  15.86165313s
Ec2 Instance Type - i4i.4xlarge
lotus branch - feat/nv21-migrations-optimized-tree
worker count - num-cpus

Below are the results without the optimizations in go-state-types:

REMINDER: If you are running this, you likely want to ALSO run the continuity testing tool!
manifest cid: bafy2bzacedowjygm2fy5yryyyjcww6znqngoz4rildgouuj6uk35z7a2u5gmc
migration height  3104159
old cid  bafy2bzacebap5x7ow7xwubdnqfodveuksbp4k63oe7f6ukl752cm3vtp5m7fo
new cid  bafy2bzaced3m4tbxhq5fcybisppaiafbnguwecjxhkn46jds7y75yimrfc2vq
completed round actual (without cache), took  16m12.368587429s
manifest cid: bafy2bzacedowjygm2fy5yryyyjcww6znqngoz4rildgouuj6uk35z7a2u5gmc
completed premigration, took  15m33.544663821s
manifest cid: bafy2bzacedowjygm2fy5yryyyjcww6znqngoz4rildgouuj6uk35z7a2u5gmc
completed round actual (with cache), took  1m43.923645056s
Ec2 Instance Type - i4i.4xlarge
lotus branch - feat/nv21-migrations-perf
worker count - num-cpus
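
For a quick comparison of the two runs, the timings from the logs above can be parsed as Go durations and turned into speedup ratios. This is a small sketch; `speedup` is a hypothetical helper, and the duration strings are copied verbatim from the two logs.

```go
package main

import (
	"fmt"
	"time"
)

// speedup returns baseline/optimized for two Go duration strings.
func speedup(baseline, optimized string) (float64, error) {
	b, err := time.ParseDuration(baseline)
	if err != nil {
		return 0, err
	}
	o, err := time.ParseDuration(optimized)
	if err != nil {
		return 0, err
	}
	if o <= 0 {
		return 0, fmt.Errorf("optimized duration must be positive, got %v", o)
	}
	return b.Seconds() / o.Seconds(), nil
}

func main() {
	// baseline: feat/nv21-migrations-perf
	// optimized: feat/nv21-migrations-optimized-tree
	rows := []struct{ name, baseline, optimized string }{
		{"round (without cache)", "16m12.368587429s", "11m49.470263157s"},
		{"premigration", "15m33.544663821s", "10m20.103971889s"},
		{"round (with cache)", "1m43.923645056s", "15.86165313s"},
	}
	for _, r := range rows {
		s, err := speedup(r.baseline, r.optimized)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-22s %.2fx faster\n", r.name, s)
	}
}
```

From these numbers, the optimized branch is about 1.4x faster for the uncached round, about 1.5x faster for the premigration, and about 6.5x faster for the cached round, which drops from over a minute to under 16 seconds.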
