feat: chain: light-weight patch to fix calibrationnet again by removing move_partitions from built-in actors #11409

Merged
merged 13 commits into from
Nov 15, 2023

Conversation

Contributor

@aarshkshah1992 aarshkshah1992 commented Nov 10, 2023

TODO

Related Issues

The calibration test network upgraded to nv21 / actors v12 2 weeks ago. Further testing on the testnet revealed a minor bug in a new feature that has since been patched: filecoin-project/builtin-actors#1455.

We then patched the testnet with the fix but ran into another confirmed bug related to the implementation of FIP-0070 at filecoin-project/builtin-actors#1326.

For now, we propose that we completely revert FIP-0070 from the testnet and NV21. See the corresponding built-in Actors PR at filecoin-project/builtin-actors#1481. Unfortunately, this is consensus-critical, and so needs a coordinated upgrade on the testnet in order to land the new code. This PR proposes a relatively easy way to achieve this and is based on #11363.

Proposed Changes

  • Create a new UpgradeWatermelonFix2Height upgrade that exists ONLY for calibrationnet (disabled on all other networks)
  • At that height, upgrade all miner actors to the new, fixed miner actor CID that does not have FIP-0070 changes
  • Include the patched actors bundle as the v12 bundle for calibrationnet
  • Also, explicitly add the "previous buggy" actors bundle to v12.tar.zst
  • Also, explicitly add the "previous buggy" miner actor info to actorMeta
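
The second bullet, repointing every miner actor at the fixed code CID, can be sketched roughly as below. This is a simplified model, not the Lotus migration code: the state tree is a plain map, CIDs are plain strings, and `Actor` and `swapMinerCode` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// Actor is a simplified stand-in for an on-chain actor record.
type Actor struct {
	Code string // code CID, modelled as a string
	Head string // state root CID; this patch leaves it untouched
}

// swapMinerCode repoints every actor whose code is oldMiner at newMiner
// and returns how many actors were changed. Actor state is not migrated.
func swapMinerCode(tree map[string]Actor, oldMiner, newMiner string) int {
	changed := 0
	for addr, a := range tree {
		if a.Code != oldMiner {
			continue // not a (buggy) miner actor
		}
		a.Code = newMiner
		tree[addr] = a // overwriting an existing key during range is safe
		changed++
	}
	return changed
}

func main() {
	// Placeholder CIDs and addresses, for illustration only.
	tree := map[string]Actor{
		"t01000": {Code: "miner-buggy", Head: "head-a"},
		"t01001": {Code: "miner-buggy", Head: "head-b"},
		"t099":   {Code: "account", Head: "head-c"},
	}
	n := swapMinerCode(tree, "miner-buggy", "miner-fixed")
	fmt.Println("patched actors:", n)
}
```

Only the code CID changes in this sketch, which matches the "light-weight" intent of the proposal: no state migration, just a swap to the miner code without the FIP-0070 changes.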

Additional Info

  • Needs to be tested, we'll do that by:
    • simulating this on calibnet
    • actually running this on a butterflynet
  • Could also consider:
    • Doing a "full" upgrade on calibnet with new network / actors version, etc.
    • Leaving calibnet as-is for now, it'll get patched in the next upgrade.

Checklist

Before you mark the PR ready for review, please make sure that:

  • Commits have a clear commit message.
  • PR title is in the form of <PR type>: <area>: <change being made>
    • example: fix: mempool: Introduce a cache for valid signatures
    • PR type: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, test
    • area, e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps
  • If the PR affects users (e.g., new feature, bug fix, system requirements change), update the CHANGELOG.md and add details to the UNRELEASED section.
  • New features have usage guidelines and / or documentation updates in
  • Tests exist for new functionality or change in behavior
  • CI is green

@aarshkshah1992 aarshkshah1992 requested a review from a team as a code owner November 10, 2023 11:21
@aarshkshah1992 aarshkshah1992 changed the title [WIP] Upgrade calibnet by removing move_partitions from miner actor in v12 [WIP] feat: chain: light-weight patch to fix calibrationnet again by removing move_partitions from built-in actors Nov 10, 2023
Contributor

@arajasek arajasek left a comment

@aarshkshah1992 Thanks a lot for this, it looks great! There's really only one major change needed: there's one actor whose state needs to be updated, which is the system actor. This actor contains the manifest entry in its state, and so needs to be updated in the patch migration.

The reason you missed this is because we actually missed this in the original patch :D We fixed this on the release branches, but hadn't done so on master -- I just did so by landing this forward-port PR.

If you rebase this PR onto master, you'll see the necessary change. I suspect you'll have to modify the buildUpgradeActorsV12MinerFix method to also take the manifest CID as input, but you can take a stab at it.

Apart from that, I want to squint a little closer at how the bundles & manifests are being loaded, but that can be a second pass.
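
The system actor point above can be illustrated with a small sketch. It assumes a simplified system actor state holding a single manifest reference as a string; `SystemState` and `patchSystemState` are hypothetical names, and the real state layout and migration signature in Lotus may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// SystemState models the one field this patch cares about: the system
// actor's reference to the builtin-actors manifest (a CID, here a string).
type SystemState struct {
	BuiltinActors string
}

// patchSystemState returns a copy of the state that references the new
// manifest. It refuses to run if the state does not reference the
// manifest the migration expects to replace.
func patchSystemState(st SystemState, oldManifest, newManifest string) (SystemState, error) {
	if st.BuiltinActors != oldManifest {
		return st, errors.New("system actor does not reference the expected old manifest")
	}
	st.BuiltinActors = newManifest
	return st, nil
}

func main() {
	// Placeholder manifest identifiers, for illustration only.
	st := SystemState{BuiltinActors: "manifest-buggy"}
	patched, err := patchSystemState(st, "manifest-buggy", "manifest-fixed")
	fmt.Println(patched.BuiltinActors, err)
}
```

If only the miner code CIDs were swapped, the system actor would keep pointing at the old manifest, which is the omission described in the comment.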

@@ -190,9 +191,10 @@ func readEmbeddedBuiltinActorsMetadata(bundle string) ([]*BuiltinActorsMetadata,
 return nil, xerrors.Errorf("error loading builtin actors bundle: %w", err)
 }
 
-// The following manifest cid existed temporarily on the calibnet testnet
+// The following manifest cids existed temporarily on the calibnet testnet
 // We include it in our builtin bundle, but intentionally omit from metadata
Contributor

Suggested change
-// We include it in our builtin bundle, but intentionally omit from metadata
+// We include them in our builtin bundle, but intentionally omit from metadata

@@ -70,6 +70,9 @@ var UpgradeWatermelonHeight = abi.ChainEpoch(200)
 // This fix upgrade only ran on calibrationnet
 const UpgradeWatermelonFixHeight = -100
+
+// This fix upgrade only ran on calibrationnet
+const UpgradeWatermelonFix2Height = -101
Contributor

Let's rename this (and the above) fields to UpgradeCalibrationWatermelonFix2?

Contributor Author

Sorry about missing these!
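
The negative heights in the hunk above read as "this upgrade never runs on this network". A minimal sketch of that gating follows, assuming (this thread does not confirm it) that the upgrade schedule skips entries with a negative height. `Upgrade` and `enabledUpgrades` are illustrative names, not Lotus types; the heights are the ones shown in the diff.

```go
package main

import "fmt"

// Upgrade is a simplified schedule entry: a name and an activation epoch.
type Upgrade struct {
	Name   string
	Height int64
}

// enabledUpgrades drops entries with a negative height, treating them as
// disabled on this network (an assumption made for this sketch).
func enabledUpgrades(all []Upgrade) []Upgrade {
	var out []Upgrade
	for _, u := range all {
		if u.Height < 0 {
			continue // disabled here, e.g. a calibrationnet-only fix
		}
		out = append(out, u)
	}
	return out
}

func main() {
	schedule := []Upgrade{
		{Name: "UpgradeWatermelonHeight", Height: 200},
		{Name: "UpgradeWatermelonFixHeight", Height: -100},
		{Name: "UpgradeWatermelonFix2Height", Height: -101},
	}
	for _, u := range enabledUpgrades(schedule) {
		fmt.Println(u.Name, u.Height)
	}
}
```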

@@ -85,6 +85,10 @@ const UpgradeWatermelonHeight = 1013134
 // 2023-11-07T13:00:00Z
 const UpgradeWatermelonFixHeight = 1070494
+
+// 2023-11-07T13:00:00Z
+// TODO INSERT VALUE HERE ONCE DECIDED
Contributor

Per @jennijuju Thursday the 16th is a good day to aim for.

@aarshkshah1992
Contributor Author

@arajasek Pulled in master and addressed your review. Note that there are some TODOs we'll have to address before using this (enumerated in the PR description).

}

// now confirm we have the one we're migrating to
if haveManifest, err := stateStore.Has(ctx, newManifestCID); err != nil {
Contributor Author

How does the newManifestCID get into the state store?

Contributor

Ah, I have a bug here -- I need to load calibnetv12BuggyBundleSuffix2, pushing a commit to fix this. Thank you!

With that fix, the two blocks above should do this:

  • bundle.LoadBundles(ctx, stateStore, actorstypes.Version12) loads the "correct" bundle, cuz that's what's hard-coded as the bundle to use for v12 actors
  • build.GetEmbeddedBuiltinActorsBundle(actorstypes.Version12, calibnetv12BuggyBundleSuffix2), followed by a call to LoadBundle should load the second buggy bundle (the one that's currently live)

if err := bundle.LoadBundles(ctx, stateStore, actorstypes.Version12); err != nil {
return cid.Undef, xerrors.Errorf("failed to load manifest bundle: %w", err)
}

// this loads the second buggy bundle, for UpgradeWatermelonFixHeight
_, ok := build.GetEmbeddedBuiltinActorsBundle(actorstypes.Version12, calibnetv12BuggyBundleSuffix2)
Contributor Author

So in this line, we're ensuring that we have the "second buggy bundle", as that is what we will upgrade to during the first upgrade epoch?

Contributor

Yes (but I'm missing a step), see above.
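
The bundle-loading question in this thread comes down to one invariant: every manifest the migration touches must already be in the state store before the migration checks for it. A toy sketch of that invariant is shown below. The `store`, `bundle`, `loadBundle`, and `precheck` names are hypothetical, and the store is a plain set of strings rather than a real blockstore.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a toy blockstore: the set of CIDs it currently holds.
type store map[string]bool

// Has reports whether the store holds the given CID.
func (s store) Has(c string) bool { return s[c] }

// bundle is a simplified actors bundle: a manifest CID plus code CIDs.
type bundle struct {
	Manifest string
	Codes    []string
}

// loadBundle writes the manifest and every code CID into the store.
func loadBundle(s store, b bundle) {
	s[b.Manifest] = true
	for _, c := range b.Codes {
		s[c] = true
	}
}

// precheck fails unless both the manifest being replaced and the one
// being migrated to are already present in the store.
func precheck(s store, oldManifest, newManifest string) error {
	if !s.Has(oldManifest) {
		return errors.New("old manifest not in state store")
	}
	if !s.Has(newManifest) {
		return errors.New("new manifest not in state store")
	}
	return nil
}

func main() {
	s := store{}
	// Placeholder CIDs: one "fixed" bundle and one "buggy" bundle.
	loadBundle(s, bundle{Manifest: "manifest-fixed", Codes: []string{"miner-fixed"}})
	fmt.Println(precheck(s, "manifest-buggy", "manifest-fixed")) // buggy bundle not loaded yet
	loadBundle(s, bundle{Manifest: "manifest-buggy", Codes: []string{"miner-buggy"}})
	fmt.Println(precheck(s, "manifest-buggy", "manifest-fixed"))
}
```

In this model, forgetting to load one of the two bundles shows up as a failed precheck rather than as a missing block in the middle of the migration.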

Contributor Author

@aarshkshah1992 aarshkshah1992 left a comment

just some questions

@aarshkshah1992 aarshkshah1992 changed the title [WIP] feat: chain: light-weight patch to fix calibrationnet again by removing move_partitions from built-in actors feat: chain: light-weight patch to fix calibrationnet again by removing move_partitions from built-in actors Nov 14, 2023
@aarshkshah1992
Contributor Author

lgtm.

@aarshkshah1992
Contributor Author

But would love a full review from @Stebalien too.

Member

@Stebalien Stebalien left a comment

There's nothing obviously wrong, but we should really test it. Would it be possible to test this on a devnet? We'd likely have to swap out the CIDs and all the "is this calibrationnet" checks, but that seems doable and it should let us actually go over the upgrade.

@jennijuju
Member

> There's nothing obviously wrong, but we should really test it. Would it be possible to test this on a devnet? We'd likely have to swap out the CIDs and all the "is this calibrationnet" checks, but that seems doable and it should let us actually go over the upgrade.

I believe we have the 2k mocks here https://github.com/filecoin-project/lotus/tree/asr/devnet - need to double confirm if we have run the testing

@Stebalien
Member

Ah! Awesome.

@rjan90
Contributor

rjan90 commented Nov 15, 2023

> I believe we have the 2k mocks here https://github.com/filecoin-project/lotus/tree/asr/devnet - need to double confirm if we have run the testing

I tested the asr/devnet branch on a devnet, and the upgrades went fine locally:

At epoch 8:

Chain: [sync behind! (4m46s behind)] [basefee 39.276 pFIL] [epoch 8]
-- 
Network Version: 20
Actor Version: 11
Manifest CID: bafy2bzaceay35go4xbjb45km6o46e5bib3bi46panhovcbedrynzwmm3drr4i

Actor             CID
account           bafk2bzacecf2pprkbdlpm4e2xz3ufunxtgrgyh2ie3stuqiyhibsvdze7kvri
cron              bafk2bzaceasr5d2skowvzv5mzsyak6waqrgc46ewj6rzbapkfi5woom6n6bwa
datacap           bafk2bzaceaqd77gptubupda7rp7daxkxbkzwc253dxhiyoezxvj2tljmkgpny
eam               bafk2bzacedve6p4ye6zxydjbfs4ode5r2equ7rqzpyltujsq2lu6wyxnijfx4
ethaccount        bafk2bzacea25xfsxwew3h2crer6jlb4c5vwu2gtch2jh73ocuxjhupenyrugy
evm               bafk2bzacece5hivtkmi757lyfahgti7xuqgofodb2u65pxgf6oizfwiiwlcsi
init              bafk2bzacecxnr5y7qifzdqqiwfbjxv2yr7lbkcyu3e2mf5zjdncteupxdlquu
multisig          bafk2bzaceayap4k4u3lbysaeeixct5fvhmafy3fa5eagvdpk3i4a7ubfdpobe
paymentchannel    bafk2bzaceafgrz5wepbein35gie7rnsu7zttxvgllgdneuefmmy4j5izydtza
placeholder       bafk2bzacedfvut2myeleyq67fljcrw4kkmn5pb5dpyozovj7jpoez5irnc3ro
reward            bafk2bzacedwbtfqlx47fdkxjrb5mwiatheci44x3zkpx33smybc2cme23ymuo
storagemarket     bafk2bzaceaj74fmooaf3gj3ebwon64ky7hhdh7kytdr3agclqfrqzmpzykh7g
storageminer      bafk2bzacedb7bokkzzs7hnbhivp74pgcpermuy7j6b3ncodylksukkxtnn7ze
storagepower      bafk2bzacedilnkegizkxz3nuutib4d4wwlk4bkla22loepia2h53yf4hysmq6
system            bafk2bzacedpyoncjbl4oxkjm5e77ngvpy2xfajjc4myfsv2vltvzxioattlu2
verifiedregistry  bafk2bzacebdqi5tr5pjnem5nylg2zbqcugvi7oxi35bhnrfudx4y4ufhlit2k

After the first upgrade epoch (UpgradeWatermelonFixHeight = 20)

lotus state get-actor t01000
Address:	t01000
Balance:	848.629437295097244884 FIL
Nonce:		0
Code:		bafk2bzaceckqrzomdnfb35byrhabrmmapxplj66cv3efw7u62qswjaqsuxah4 (fil/12/storageminer)
Head:		bafy2bzacebway52q4jwwjeyf5u5ncwmbbs6a4gnalsxviwutkr33mm74nz3rs

After the second upgrade epoch (UpgradeWatermelonFix2Height = 25)

lotus state get-actor t01000
Address:	t01000
Balance:	1051.719945011572714954 FIL
Nonce:		0
Code:		bafk2bzacecs262232b3awcrilyzpdketeayyqzzwgoavtxilgjvayrz55ovk4 (fil/12/storageminer)
Head:		bafy2bzacea3ccofag72dfbebjy45bgfewc2w3i5ruiyjxho7bt3ezc6uewk6e

lotus state actor-cids
Network Version: 21
Actor Version: 12
Manifest CID: bafy2bzaceasjdukhhyjbegpli247vbf5h64f7uvxhhebdihuqsj2mwisdwa6o

Actor             CID
account           bafk2bzacedki4apynvdxxuoigmqkgaktgy2erjftoxqxqaklnelgveyaqknfu
cron              bafk2bzacebjpczf7qtcisy3zdp3sqoohxe75tgupmdo5dr26vh7orzrsjn3b2
datacap           bafk2bzacecz4esatk7gizdc7yvl6soigkelhix7izbc75q6eqtb7gjzavpcqc
eam               bafk2bzacebhtpd5mxfyovi7fgsfj62nhtmh4t5guob4sgq73ymgsk7473ltig
ethaccount        bafk2bzacebvdbbw5ag4qnxd7cif5mtakrw4wzv63diwl7awta5plaidfay4vg
evm               bafk2bzacebb7vrhprnshn52bzfmypjdpcrcfecapk232a6gapk3kghu2mp67q
init              bafk2bzaceaw4iouukgqxmwukfpt3sakdvsu75ftjvw47swnwtdftz5oszbt4w
multisig          bafk2bzaceahyjwf6re4mnuwhopglo3qzh6aboluboncpijm7vuiz3u4bkazho
paymentchannel    bafk2bzaceaupjw3djghaqw3g3hd4tw7uuas3njkszgzx2fhmgqh5eh4e6q2by
placeholder       bafk2bzacedfvut2myeleyq67fljcrw4kkmn5pb5dpyozovj7jpoez5irnc3ro
reward            bafk2bzacebzso6xkjxdscbpncw7el2d4hap6lfkgwqzrbc76lzp33vkwk6obc
storagemarket     bafk2bzacebzg74vyk3gzbhnz4zviwvxblyar574mtd6ayognmsvlkriejmunu
storageminer      bafk2bzacecs262232b3awcrilyzpdketeayyqzzwgoavtxilgjvayrz55ovk4
storagepower      bafk2bzacebbtj2m2ajawfuzxqz5nmdep7xevjo2qfjqa5tx3vr5m6qojolya4
system            bafk2bzacecnau5wddulbsvwn75tc3w75jrlvkybgrlxs4ngonqab6xq3eowvg
verifiedregistry  bafk2bzacec37mddea65nvh4htsagtryfa3sq6i67utcupslyhzbhjhoy6hopa
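
One check that can be read off the output above is that the miner's code CID after the second upgrade matches the `storageminer` entry of the new manifest. The sketch below automates that comparison. It assumes the two-column "Actor / CID" layout shown above; `parseActorCIDs` is a hypothetical helper, not part of the Lotus CLI.

```go
package main

import (
	"fmt"
	"strings"
)

// parseActorCIDs turns the "Actor  CID" table printed above into a map
// from actor name to code CID. Header and metadata lines are skipped.
func parseActorCIDs(out string) map[string]string {
	m := make(map[string]string)
	for _, line := range strings.Split(out, "\n") {
		f := strings.Fields(line)
		// Keep only two-column rows whose second column looks like a code CID.
		if len(f) == 2 && strings.HasPrefix(f[1], "bafk2bz") {
			m[f[0]] = f[1]
		}
	}
	return m
}

func main() {
	// Excerpt of the `lotus state actor-cids` output quoted above.
	table := `Network Version: 21
Actor Version: 12

Actor             CID
storageminer      bafk2bzacecs262232b3awcrilyzpdketeayyqzzwgoavtxilgjvayrz55ovk4
system            bafk2bzacecnau5wddulbsvwn75tc3w75jrlvkybgrlxs4ngonqab6xq3eowvg`

	// Code CID reported by `lotus state get-actor t01000` after the second upgrade.
	minerCode := "bafk2bzacecs262232b3awcrilyzpdketeayyqzzwgoavtxilgjvayrz55ovk4"

	cids := parseActorCIDs(table)
	fmt.Println("miner matches manifest:", cids["storageminer"] == minerCode)
}
```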

Contributor

@arajasek arajasek left a comment

Need to unset / update the mainnet upgrade epoch, but otherwise LGTM.

-// TODO INSERT VALUE HERE ONCE DECIDED
-const UpgradeWatermelonFix2Height = -1
+// 2023-11-21T13:00:00Z
+const UpgradeWatermelonFix2Height = 1108174
Contributor

LGTM

@arajasek arajasek merged commit 808a6e9 into master Nov 15, 2023
87 checks passed
@arajasek arajasek deleted the fix/calibnet-remove-move-partitions branch November 15, 2023 18:07
arajasek added a commit that referenced this pull request Nov 15, 2023
…ng move_partitions from built-in actors (#11409)

* upgrade calibnet by removing move_partitions from miner actor in actor v12

* cids for buggy bundles

* revert changes to v12 tar

* upgrade system actor state

* update based on manifest

* nit: clean up some comments

* chore: rename param to oldBuggyMinerCID

* refactor, ensure both buggy bundles are loaded

* update to actors v12.0.0-rc.3

* fix: load second buggy bundle for UpgradeWatermelonFixHeight

* add calibration fix2 upgrade epcoh

* update mainnet upgrade epoch

---------

Co-authored-by: Aayush <arajasek94@gmail.com>
Co-authored-by: jennijuju <jiayingw703@gmail.com>
5 participants