
Snap-up sector stuck in "UpdateActivating" state #8415

Open

SBudo opened this issue Mar 31, 2022 · 21 comments

SBudo commented Mar 31, 2022

Checklist

  • This is not a security-related bug/issue. If it is, please follow the security policy.
  • This is not a question or a support request. If you have any lotus related questions, please ask in the lotus forum.
  • This is not a new feature request. If it is, please file a feature request instead.
  • This is not an enhancement request. If it is, please file an improvement suggestion instead.
  • I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.
  • I am running the latest release, or the most recent RC (release candidate) for the upcoming release, or the dev branch (master), or have an issue updating to any of these.
  • I did not make any code changes to lotus.

Lotus component

  • lotus daemon - chain sync
  • lotus miner - mining and block production
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt)
  • lotus miner/market - storage deal
  • lotus miner/market - retrieval deal
  • lotus miner/market - data transfer
  • lotus client
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

Lotus Version

Lotus version:
Daemon:  1.15.1-rc3+mainnet+git.003df89ed.dirty+api1.5.0
Local: lotus version 1.15.1-rc3+mainnet+git.003df89ed.dirty

lotus-miner version:
Daemon:  1.15.1-rc3+mainnet+git.003df89ed.dirty+api1.5.0
Local: lotus-miner version 1.15.1-rc3+mainnet+git.003df89ed.dirty

Describe the Bug

We experienced a lotus node failure while a sector (sector 155) was in the "UpdateActivating" state. While the node and miner are both back up and running, the sector has now been stuck in that state for over 72 hours.

Timeline of the event:

  • Sector 155 is "snapped up" on 28/03 @ 16:18
  • Sector 155 receives a deal on 28/03 @ 22:38
  • Sector 155 is finalised (and deal activated) on 28/03 @ 23:23
  • Our lotus node stops syncing with the chain on 29/03 @ 05:22
  • Miner is stopped on 29/03 @ 09:13
  • Snapshot "minimal_finality_stateroots_1672800_2022-03-28_18-00-00.car" is imported on 29/03 @ 09:48
  • Lotus node is synced to the chain on 29/03 @ 10:40
  • miner is restarted on 29/03 @ 10:41

Sector 155 has been stuck in "UpdateActivating" since.

Note that, during the lotus node and miner downtime, all WindowPoSt submissions (4 partitions) were missed (from 7:54am to 9:24am).

Lotus and miner logs.zip

Logging Information

See logs attached for both miner and node

Repo Steps

see above.

LexLuthr (Contributor) commented Mar 31, 2022

Quoting the author from slack

Got some important additional information....
We had a lotus node outage in the early morning of the 29th, but what I didn't know was that the team pruned it and imported a fresh snapshot! (sorry about this, I thought the node just got behind and "caught up" later, didn't think the team would have imported a new snapshot in the process and not let me know... grrrrrrr :face_with_symbols_on_mouth:).
Could this be why the miner missed the update from the chain?
The steps would have been:
sector 155 is finalised at 23:23 on the 28th
at around 5am on the 29th, the lotus node gets behind on the chain (stops processing blocks)
the team decided to remove the chain and import a snapshot at 10am on the 29th.
node is synced by about noon and normal operations resume, but sector 155 is now still in UpdateActivating

I could not find anything in the logs. But after looking through the code, I have a theory.

return m.events.ChainAt(func(context.Context, TipSetToken, abi.ChainEpoch) error {
actually sends a SectorUpdateActive event at a targetHeight.

Which is predetermined by

targetHeight := mw.Height + lb + InteractivePoRepConfidence

But if we look at the ChainAt doc comment

// ChainAt invokes the specified `HeightHandler` when the chain reaches the

and consider the events that happened with the chain pruning, the miner would not have been able to determine its height and compare it to targetHeight. I think this is where it got stuck.

If the context was cancelled (a timeout, I suspect) before the event handler fired, the state would persist and nothing further would happen. In our case, 900 epochs amounts to 7.5 hours, which falls right in the middle of the node issue. Now, if this theory is correct and "UpdateActivating" is idempotent, then firing it again should trigger the next chain of events. I will put all this on the issue and wait for someone from the core team to give their thoughts.

Again, this is just a theory based on my understanding of the code and could entirely be wrong.
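
For readers trying to follow the theory, this is roughly the code path in question, assembled from the fragments quoted above (abridged from storage-sealing/states_replica_update.go of that era; treat it as a sketch, not the exact source):

func (m *Sealing) handleUpdateActivating(ctx statemachine.Context, sector SectorInfo) error {
    // 1. look up the replica-update message; after a chain prune this is where
    //    "failed to load message: blockstore: block not found" comes from
    mw, err := m.Api.StateWaitMsg(ctx.Context(), *sector.ReplicaUpdateMessage, build.MessageConfidence, api.LookbackNoLimit, true)
    if err != nil {
        return err
    }

    ts, err := m.Api.ChainHead(ctx.Context())
    if err != nil {
        return err
    }
    nv, err := m.Api.StateNetworkVersion(ctx.Context(), ts.Key())
    if err != nil {
        return err
    }
    lb := policy.GetWinningPoStSectorSetLookback(nv)

    // 2. SectorUpdateActive is only sent once the chain reaches targetHeight;
    //    if this registration is lost (restart, cancelled context), nothing
    //    re-arms it and the sector sits in UpdateActivating
    targetHeight := mw.Height + lb + InteractivePoRepConfidence

    return m.events.ChainAt(func(context.Context, TipSetToken, abi.ChainEpoch) error {
        return ctx.Send(SectorUpdateActive{})
    }, func(ctx context.Context, tok TipSetToken) error {
        log.Warn("revert in handleUpdateActivating")
        return nil
    }, InteractivePoRepConfidence, targetHeight)
}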

@Reiers Reiers added need/analysis Hint: Needs Analysis SnapDeals and removed need/triage labels Mar 31, 2022
@Reiers Reiers removed the need/analysis Hint: Needs Analysis label Apr 21, 2022

Reiers commented Apr 21, 2022

Hi @SBudo

Thanks for the report.
After 1000 SnapDeals, I'm unable to reproduce the issue on the current Lotus version.

Is the sector still stuck in that state?

@Reiers Reiers added the need/author-input Hint: Needs Author Input label Apr 21, 2022
github-actions commented

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 24 hours.

github-actions commented

This issue was closed because it is missing author input.

SBudo (Author) commented May 5, 2022

@Reiers
Sorry, missed the notification :-(
Yes, sector is still stuck.
I did another prune of the chain, and now I'm getting these errors in my miner log:
2022-05-05T00:00:02.636+1000 ERROR sectors storage-sealing/states_replica_update.go:267 error in handleUpdateActivating {"error": "failed to load message: blockstore: block not found"}

Considering just forcing the sector status to "Proving".
If that doesn't work, then I'll terminate it.


Reiers commented May 5, 2022

@SBudo
Let me know if it happens again and I will re-open - unable to repro here and have tried for some time.

Andrew09if commented

I have the same error with 2 sectors. Deals are active.
Forcing to "Proving" state unsucessfull.

2022-05-06T18:02:22.086Z INFO markets loggers/loggers.go:20 storage provider event {"name": "ProviderEventDealActivationFailed", "proposal CID": "bafyreigooen5jybdm2z5prqn4udg2hjpspvzcnnxytnydptokpaw6qtwku", "state": "StorageDealFailing", "message": "error activating deal: failed to set up called handler: called check error (h: 1785124): failed to look up deal on chain: looking for publish deal message bafy2bzacea2blcv2boaphl7f2ffpbwsvtdc42qz47grcpf6fh4jficcbr6ivs: search msg failed: failed to load message: blockstore: block not found"}
2022-05-06T18:02:22.088Z WARN providerstates providerstates/provider_states.go:595 deal bafyreigooen5jybdm2z5prqn4udg2hjpspvzcnnxytnydptokpaw6qtwku failed: error activating deal: failed to set up called handler: called check error (h: 1785124): failed to look up deal on chain: looking for publish deal message bafy2bzacea2blcv2boaphl7f2ffpbwsvtdc42qz47grcpf6fh4jficcbr6ivs: search msg failed: failed to load message: blockstore: block not found
2022-05-06T18:02:22.138Z INFO markets loggers/loggers.go:20 storage provider event {"name": "ProviderEventFailed", "proposal CID": "bafyreigooen5jybdm2z5prqn4udg2hjpspvzcnnxytnydptokpaw6qtwku", "state": "StorageDealError", "message": "error activating deal: failed to set up called handler: called check error (h: 1785124): failed to look up deal on chain: looking for publish deal message bafy2bzacea2blcv2boaphl7f2ffpbwsvtdc42qz47grcpf6fh4jficcbr6ivs: search msg failed: failed to load message: blockstore: block not found"}


Reiers commented May 12, 2022

I'm seeing the same thing here. Re-opening.

@Reiers Reiers reopened this May 12, 2022
@Reiers Reiers removed the need/author-input Hint: Needs Author Input label May 12, 2022

SBudo commented May 13, 2022

I did another prune of the chain. After restarting the node and miner with a new imported snapshot, I started to get this error in the log:
2022-05-13T15:37:23.386+1000 ERROR sectors storage-sealing/states_replica_update.go:267 error in handleUpdateActivating {"error": "failed to load message: blockstore: block not found"}

Also tried to force the sector status to "Proving", but it isn't applied (no errors when trying; the status just doesn't change).

@jennijuju jennijuju added P3 P3: Might get resolved area/ux Area: UX and removed kind/stale labels May 20, 2022

donkabat commented Sep 7, 2022

lotus-miner version 1.17.1-rc3+mainnet+git.079dab0d7
lotus-miner info => UpdateActivating: 15

lotus-miner logs every minute:
2022-09-07T04:43:06.119Z ERROR sectors pipeline/states_replica_update.go:272 error in handleUpdateActivating {"error": "failed to load message: ipld: could not find bafy2bzaceaqhwkak7l6ccu7nyvzon6w6bxqzufcptsh4dwdygwljpxy5gemjm"}
2022-09-07T04:43:06.119Z ERROR sectors pipeline/states_replica_update.go:272 error in handleUpdateActivating {"error": "failed to load message: ipld: could not find bafy2bzacecca4a546la5p3jjjz5jax3dwfwgtzzvtjcvpioq3clgpaesjm366"}
2022-09-07T04:43:06.119Z ERROR sectors pipeline/states_replica_update.go:272 error in handleUpdateActivating {"error": "failed to load message: ipld: could not find bafy2bzacedjk2vtxucvkxrb5o6w6vwxmjsqj3nwjtc2e2furcvyhsahrwivla"}
2022-09-07T04:43:06.119Z ERROR sectors pipeline/states_replica_update.go:272 error in handleUpdateActivating {"error": "failed to load message: ipld: could not find bafy2bzacebddqksfd6elgswb46wc5ecumwvzbvw7nqshc63m63yedy3klmiri"}
2022-09-07T04:43:06.120Z ERROR sectors pipeline/states_replica_update.go:272 error in handleUpdateActivating {"error": "failed to load message: ipld: could not find bafy2bzacea4li5wotzhyqudj4gkou3wx4pbnfk26hvjsbkowndkys5mot4liw"}
2022-09-07T04:43:06.121Z ERROR sectors pipeline/states_replica_update.go:272 error in handleUpdateActivating {"error": "failed to load message: ipld: could not find bafy2bzacedo5itdetxvggst2luj2rz7e3offujezjct6lkwpfetcnl3lga4c2"}
2022-09-07T04:43:06.121Z ERROR sectors pipeline/states_replica_update.go:272 error in handleUpdateActivating {"error": "failed to load message: ipld: could not find bafy2bzacedquka2jtztrty3w3eo2tqrzgyv7zcrvs2qpdpd7jgemciriv5v3y"}
2022-09-07T04:43:06.121Z ERROR sectors pipeline/states_replica_update.go:272 error in handleUpdateActivating {"error": "failed to load message: ipld: could not find bafy2bzaced2rzfk5hwxcicllr4y47a353hjilzhiqzfwdr5wdsaozocor3tam"}
2022-09-07T04:43:06.121Z ERROR sectors pipeline/states_replica_update.go:272 error in handleUpdateActivating {"error": "failed to load message: ipld: could not find bafy2bzaceayqm4xd7cavrfkbwr4ia2ks7f5lpestmrszjnun4jb5j35zibazg"}
2022-09-07T04:43:06.121Z ERROR sectors pipeline/states_replica_update.go:272 error in handleUpdateActivating {"error": "failed to load message: ipld: could not find bafy2bzaceciykq7dxzwfqwj6srkeni2n2d6rzpcql6gjvt4pk3nx6haofilt6"}
2022-09-07T04:43:06.121Z ERROR sectors pipeline/states_replica_update.go:272 error in handleUpdateActivating {"error": "failed to load message: ipld: could not find bafy2bzaceanrqkszvpyge5mnnu7or64zxghxfjnnj6yl46tusvuv5hlgiyvzs"}
2022-09-07T04:43:06.122Z ERROR sectors pipeline/states_replica_update.go:272 error in handleUpdateActivating {"error": "failed to load message: ipld: could not find bafy2bzacebkfqqvbnodfspl5bdpzaul2fxyjjczk55zj2sou46ed2w4ydzixs"}
2022-09-07T04:43:06.122Z ERROR sectors pipeline/states_replica_update.go:272 error in handleUpdateActivating {"error": "failed to load message: ipld: could not find bafy2bzacedhceaqulxixne4ypgny2jys62mg6ivg5qmtu6sghm2flv26pqq6s"}
2022-09-07T04:43:06.142Z ERROR sectors pipeline/states_replica_update.go:272 error in handleUpdateActivating {"error": "failed to load message: ipld: could not find bafy2bzacec2j4uffdxgqawdrgbryego7onr4f4ynzmovv7sv3dcu4jr3sd5wq"}

example sector:

lotus-miner sectors status --log 38217
SectorID:       38217
Status:         UpdateActivating
CIDcommD:       baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq
CIDcommR:       bagboea4b5abcblbpmvzofke45ut5yngjmsrdnjavpjljpaekqwbnrgvy7ytpbcds
Ticket:         c482791a5677719e6c0a2c1a77877a525af9e65aac7e0cb52afb77048b7f9ac7
TicketH:        2102984
Seed:           84abc2811a65f8a2453e861c4eeadeb491d9f3b21fe226484dda75a37e54a001
SeedH:          2104378
Precommit:      bafy2bzaceau2jh6btemwpwzpbl772gyc5zseldimtj6gnqygtvqalk5uug3wc
Commit:         bafy2bzaced7sgliopm2lvf43edsmi27tdqjrrzjcfad2licb3kx6p7i7e7372
Deals:          [9791523]
Retries:        0
--------
Event Log:
0.      2022-08-25 10:21:17 +0000 UTC:  [event;sealing.SectorStartCC]   {"User":{"ID":38217,"SectorType":8}}
1.      2022-08-25 10:22:01 +0000 UTC:  [event;sealing.SectorPacked]    {"User":{"FillerPieces":[{"Size":34359738368,"PieceCID":{"/":"baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq"}}]}}
2.      2022-08-25 10:22:01 +0000 UTC:  [event;sealing.SectorTicket]    {"User":{"TicketValue":"xIJ5GlZ3cZ5sCiwad4d6Ulr55lqsfgy1Kvt3BIt/msc=","TicketEpoch":2102984}}
3.      2022-08-25 13:04:41 +0000 UTC:  [event;sealing.SectorPreCommit1]        {"User":{"PreCommit1Out":"eyJfbG90dXNfU2VhbFJhbmRvbW5lc3MiOiJ4SUo1R2xaM2NaNXNDaXdhZDRkNlVscjU1bHFzZmd5MUt2dDNCSXQvbXNjPSIsImNvbW1fZCI6WzcsMTI2LDk1LDIyMiw1MywxOTcsMTAsMTQ3LDMsMTY1LDgwLDksMjI3LDczLDEzOCw3OCwxOTAsMjIzLDI0MywxNTYsNjYsMTgzLDE2LDE4Myw0OCwyMTYsMjM2LDEyMiwxOTksMTc1LDE2Niw2Ml0sImNvbmZpZyI6eyJpZCI6InRyZWUtZCIsInBhdGgiOiIvbW50L21kMC9jYWNoZS9zLXQwMTEyNzY3OC0zODIxNyIsInJvd3NfdG9fZGlzY2FyZCI6Nywic2l6ZSI6MjE0NzQ4MzY0N30sImxhYmVscyI6eyJTdGFja2VkRHJnMzJHaUJWMSI6eyJfaCI6bnVsbCwibGFiZWxzIjpbeyJpZCI6ImxheWVyLTEiLCJwYXRoIjoiL21udC9tZDAvY2FjaGUvcy10MDExMjc2NzgtMzgyMTciLCJyb3dzX3RvX2Rpc2NhcmQiOjcsInNpemUiOjEwNzM3NDE4MjR9LHsiaWQiOiJsYXllci0yIiwicGF0aCI6Ii9tbnQvbWQwL2NhY2hlL3MtdDAxMTI3Njc4LTM4MjE3Iiwicm93c190b19kaXNjYXJkIjo3LCJzaXplIjoxMDczNzQxODI0fSx7ImlkIjoibGF5ZXItMyIsInBhdGgiOiIvbW50L21kMC9jYWNoZS9zLXQwMTEyNzY3OC0zODIxNyIsInJvd3NfdG9fZGlzY2FyZCI6Nywic2l6ZSI6MTA3Mzc0MTgyNH0seyJpZCI6ImxheWVyLTQiLCJwYXRoIjoiL21udC9tZDAvY2FjaGUvcy10MDExMjc2NzgtMzgyMTciLCJyb3dzX3RvX2Rpc2NhcmQiOjcsInNpemUiOjEwNzM3NDE4MjR9LHsiaWQiOiJsYXllci01IiwicGF0aCI6Ii9tbnQvbWQwL2NhY2hlL3MtdDAxMTI3Njc4LTM4MjE3Iiwicm93c190b19kaXNjYXJkIjo3LCJzaXplIjoxMDczNzQxODI0fSx7ImlkIjoibGF5ZXItNiIsInBhdGgiOiIvbW50L21kMC9jYWNoZS9zLXQwMTEyNzY3OC0zODIxNyIsInJvd3NfdG9fZGlzY2FyZCI6Nywic2l6ZSI6MTA3Mzc0MTgyNH0seyJpZCI6ImxheWVyLTciLCJwYXRoIjoiL21udC9tZDAvY2FjaGUvcy10MDExMjc2NzgtMzgyMTciLCJyb3dzX3RvX2Rpc2NhcmQiOjcsInNpemUiOjEwNzM3NDE4MjR9LHsiaWQiOiJsYXllci04IiwicGF0aCI6Ii9tbnQvbWQwL2NhY2hlL3MtdDAxMTI3Njc4LTM4MjE3Iiwicm93c190b19kaXNjYXJkIjo3LCJzaXplIjoxMDczNzQxODI0fSx7ImlkIjoibGF5ZXItOSIsInBhdGgiOiIvbW50L21kMC9jYWNoZS9zLXQwMTEyNzY3OC0zODIxNyIsInJvd3NfdG9fZGlzY2FyZCI6Nywic2l6ZSI6MTA3Mzc0MTgyNH0seyJpZCI6ImxheWVyLTEwIiwicGF0aCI6Ii9tbnQvbWQwL2NhY2hlL3MtdDAxMTI3Njc4LTM4MjE3Iiwicm93c190b19kaXNjYXJkIjo3LCJzaXplIjoxMDczNzQxODI0fSx7ImlkIjoibGF5ZXItMTEiLCJwYXRoIjoiL21udC9tZDAvY2FjaGUvcy10MDExMjc2NzgtMzgyMTciLCJyb3dzX3RvX2Rpc2NhcmQiOjcsInNpemUiOjEwNzM3NDE4MjR9XX19LCJyZWdpc3RlcmVkX3Byb29mIjoiU3RhY2tlZERyZzMyR2lCVjFfMSJ9"}}
4.      2022-08-25 13:13:31 +0000 UTC:  [event;sealing.SectorPreCommit2]        {"User":{"Sealed":{"/":"bagboea4b5abcblbpmvzofke45ut5yngjmsrdnjavpjljpaekqwbnrgvy7ytpbcds"},"Unsealed":{"/":"baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq"}}}
5.      2022-08-25 13:13:36 +0000 UTC:  [event;sealing.SectorPreCommitted]      {"User":{"Message":{"/":"bafy2bzaceau2jh6btemwpwzpbl772gyc5zseldimtj6gnqygtvqalk5uug3wc"},"PreCommitDeposit":"10337498238119200","PreCommitInfo":{"SealProof":8,"SectorNumber":38217,"SealedCID":{"/":"bagboea4b5abcblbpmvzofke45ut5yngjmsrdnjavpjljpaekqwbnrgvy7ytpbcds"},"SealRandEpoch":2102984,"DealIDs":[],"Expiration":3653667,"ReplaceCapacity":false,"ReplaceSectorDeadline":0,"ReplaceSectorPartition":0,"ReplaceSectorNumber":0}}}
6.      2022-08-25 13:17:00 +0000 UTC:  [event;sealing.SectorPreCommitLanded]   {"User":{"TipSet":[{"/":"bafy2bzaceaqmfieruvolfyeo5ilkkink5n7vfy75x5pobg6774jcdqfova56c"},{"/":"bafy2bzaceckm4ijdmw7fzmxbicnvvawe7nouwcdqshy6ubdhsnl6vigmn64uo"},{"/":"bafy2bzacebxpsanogym4b7afvc3b5smcm7b6ashmzeqs7vqwmjiswnva5lqbq"},{"/":"bafy2bzacec3l3w7fdy2hl3p6hu4og4vj4xa2535uffeeoulqmslp4jwt2lsa6"},{"/":"bafy2bzacecacyofdnr7yrmlvo46nu2dy6qazkiwdq6bag2tbmjvq3z3dgy3to"},{"/":"bafy2bzacecc3cdjhcmvtqq4mzylvuovjqfuuzbsygs4s7rs3pvrbnxqm47dpu"}]}}
7.      2022-08-25 14:32:00 +0000 UTC:  [event;sealing.SectorSeedReady] {"User":{"SeedValue":"hKvCgRpl+KJFPoYcTuretJHZ87If4iZITdp1o35UoAE=","SeedEpoch":2104378}}
8.      2022-08-25 14:43:14 +0000 UTC:  [event;sealing.SectorProofReady]        {"User":{"Proof":"j4YInXWlIPjC4QHqpjz5aPZpliJyqg2DTd2cyUV8gITlr27xDyxeMs11F+nFt2I1kOjoekTlRDeLX4T+4AYyPfWJHLyPTKLjrV+NBUMVs8w60eaSd6vE2DCTGdG+GGvNEp6/wI4GDU7R7otIAKIWJxOUHUYap16cucA7H+JiE6ZZAL9yjGxx3dTac4bTNjrqjFIMFHJkZW5ImsaKMHE28q3AhwVaNMVXVJ+Vyt7CqQY80Wm61Es9DSULVxKbsWc8k5tSl/K5VCjhzBLzcn08kUeuM2rZzTPbTBS6niDQQn+aFVwY7oay739u8Ttn8K9zqG7AwBUyVzPnlLhq+d1inVUy5B0ytkJoAm9ws5Y2Yz5BcvvAz9N5hV6QNlHuVWOaEbdv24F09fKzj58nAXAPaPrCfVeWtHxApyVYsYVfitDmkQNQBq6psNx48US5dBOsiZU694Liyc3QulSVLa9Vq/sE545Y3CFMh28zDFVRolkx+quOXtzX7DYzAZ4DXC8VkjEs8sy35/9ZwYfsCK/6HvGhIft7X2xJ1jHUHxcLVnsRx7DxCM77G7o36NHv5I8foaaa8awyTyGMbbJaekdNIHM7dGtP2XZo6zxQgW85g8pows3MZBlmfae4KllENiiZCpuvL1TnduAnTVZmJt4wGTdaRtCg+4ZUlPC6bYJFYFroHKqXy6I7y1TBxGk6evlEpKa2ZxhYWtioWpDhTB4/DHKoqs/PnvfHsQfvRZnSq6fkxPv0NprmBHCl0blTDVY2kUey/NGdoRh9aFvQidUKSvK399UemJVgDh0aucnprrj9qYfT5ghnuk5hZUmxUGrroE90dHSH5aCqWT+voeJwYwCSjmVyRPILuCnCyVvUVjBx0LsKSj+ghmYSQ2CPhOhtE2dHmbUVibSYdUT3bIQKCYpWKO1nYrmSaLkZfVeLLTNlGNHC3GTQ+Ge72SRBAH2ah2d+pYAqErIHWzjKDmdEqU0FYRlSZDf3qv0cFkUwZ14jSBl+8vOaB5bZSUX/PTbXjfU5QWMp+aUB53j6g8hkUMZvZd4Y5c/YSqAASbeprVUAzUJf+Y7q/Jf+IwsF3jUXgla5BTYHFYh+Yc3CSKqACI0hiDjNE1WEhgD0Y3UII7snTsvBLSSWcm77UPUSoTIwGJhoacKI6TuDx26d5E26aUqP4Pu7f8bL/PJOssFPbeoGCzIFp1caeYyOtkyZ5EnmrX2tGN95lOg1vUefCjRN747TfxyIP2fKDoVC0c5SpoDk+ja6JJYbNe5x/rHiUVTNlPx95yOTbyqPWrFraTyoxjExBndl2PVDfq6pGg9xxgwFa81tFss+QpHsll9ShkwguVa6uV0l2aToXplscvQJ3WUFIEBEtpsFuGTROoOTCFBPRkundhcNp+7a9GKgQs/7FAOnQIoitpoo58b+Xlr2JYfJSNHR7B9qdG2SjtOSV1lB1Gh9BPEV4x2iS5eyKnmXk7Njx+2tBZg3szem9iEDTdMb8v/laRSdVq6/K+ZivRckdOTKvTQji4qOWzPuLXrKt97W6BBDxeknuqU7azaesLIxu0QTrIRT75F9Kxl6gLd5aWosP9xNVULkd5fp+toalWN5PHOxAIkkX91COAXKhaG1tUVKmcpEth1+9xE7wfgXFvyORuGFHYs2CB/Els1fEufZ1O4HC2lJJpO+z/nIgb9SnvP/XFiTnWoTLnpknmx8ZpuV5dih30a9VM82EbachZ/WNRi8qdNT2InhU7V96IqQG7OoDDEV7qqRf7AzHA0SCIvcE1kyPoZTq+dc3j1tpYqOsXT5NSdQMh9LQF/4RTVL1Yqzml6gSczB5e0RmEUp2RfpK7A30pW+H9mfph5Eh7QjwhKYIid+1JuXDkZHtlEZiZj9++HeXD3AunzDp/K/wQiaua7sSpoSZ1uMRtCdBNwvrkX6GBMWGDJVocg1lLN/0AlrWP9Q2GlBf9aRSqRpUN1WFAj6KK4ZOwloevsIi+Hi5uNkTNjbsNh/62FUHGwwm61dRKBio+mH9n5lTdGs8MQBew3NI4/F4lXdp3ZSjfchHbKZmQzzqgUIN2tIFRhNVTPvePT7DhNhNlS0Yy+uS8s0XofUU3nXeE1U4RNYgT8kCTwSO28Sq8l9VqJTTqiaWxtTOE8hbRJNPI8birFmACy7lqxyMWB73YKYNJGkALgUpId6XaC44cG0UhskskMO1Km4SC8Y9ZoVScMB0Fv4wq80qauSJNDRpiMQskNgt8bYKg+T5w2AgmK/6R+lYWOM0b9YOKHiKPUm638umvcd2Rd6UkW9ib84bH3mKOYgoEDtHSBX17IVI84kaIT7czlEoqWGap6rDhPgf9gOJs6ZsjihX9weyu14b7PUacyVru+tIvItXkGWGq6Kkmt2XZ0O8dF7l26b3X7S5DApTCfBkAEZzT1iePHzYl0wyyKxBjbdIg2vbWzk5ABPfB7tgWzOn6CyStAw9Oe8BLXbToZIJdjKvCiLj7HOOie4gTqvpH2T1Xh3rTFpz3CHBo5WVhwxE8irUxb4h4kMATLH7zVOPV2gmFiA+rtkLsUitE6V"}}
9.      2022-08-25 14:47:05 +0000 UTC:  [event;sealing.SectorFinalizedAvailable]        {"User":{}}
10.     2022-08-25 14:47:12 +0000 UTC:  [event;sealing.SectorCommitSubmitted]   {"User":{"Message":{"/":"bafy2bzaced7sgliopm2lvf43edsmi27tdqjrrzjcfad2licb3kx6p7i7e7372"}}}
11.     2022-08-25 14:50:59 +0000 UTC:  [event;sealing.SectorProving]   {"User":{}}
12.     2022-08-25 14:51:02 +0000 UTC:  [event;sealing.SectorFinalizedAvailable]        {"User":{}}
13.     2022-09-02 05:57:59 +0000 UTC:  [event;sealing.SectorStartCCUpdate]     {"User":{}}
14.     2022-09-02 05:57:59 +0000 UTC:  [event;sealing.SectorAddPiece]  {"User":{}}
15.     2022-09-02 06:03:17 +0000 UTC:  [event;sealing.SectorPieceAdded]        {"User":{"NewPieces":[{"Piece":{"Size":34359738368,"PieceCID":{"/":"baga6ea4seaqmyw4tbhm4vnvbqoubl2mx2cd2infeegmeuchpsurnl4pqg75i4dq"}},"DealInfo":{"PublishCid":{"/":"bafy2bzaced57qnsu2vsnw72tghk7zvbqojcdhe4w4zkufnheslpylygn74ma2"},"DealID":9791523,"DealProposal":{"PieceCID":{"/":"baga6ea4seaqmyw4tbhm4vnvbqoubl2mx2cd2infeegmeuchpsurnl4pqg75i4dq"},"PieceSize":34359738368,"VerifiedDeal":true,"Client":"f15cps7yo2x4fosvp45opfihk3a4wg2qukmwyet7a","Provider":"f01127678","Label":"Qma3hUfRBKFS4dZky2j8iJegTZnNJ9v7SKDR6c29BJ6yfG","StartEpoch":2138236,"EndEpoch":3653574,"StoragePricePerEpoch":"0","ProviderCollateral":"6780624814919023","ClientCollateral":"0"},"DealSchedule":{"StartEpoch":2138236,"EndEpoch":3653574},"KeepUnsealed":true}}]}}
16.     2022-09-02 06:03:17 +0000 UTC:  [event;sealing.SectorStartPacking]      {"User":{}}
17.     2022-09-02 06:03:17 +0000 UTC:  [event;sealing.SectorPacked]    {"User":{"FillerPieces":null}}
18.     2022-09-02 06:08:41 +0000 UTC:  [event;sealing.SectorReplicaUpdate]     {"User":{"Out":{"NewSealed":{"/":"bagboea4b5abcbxq4cmjqjjxjq6eh33hv6tlti3djxmslpyowekqy6wnvek4cwxto"},"NewUnsealed":{"/":"baga6ea4seaqmyw4tbhm4vnvbqoubl2mx2cd2infeegmeuchpsurnl4pqg75i4dq"}}}}
19.     2022-09-02 06:26:09 +0000 UTC:  [event;sealing.SectorProveReplicaUpdate]        {"User":{"Proof":"kkac4WS6YV2pLRJYeYUtRXNjPYiez9p7nSIxcznG9cDqW9A6GBuch1jjdxgSZAEysJTtr5tayviSSL1PCMWk69I1tNi40kc8TdoO6qYUYGnK0HyeL+loen2/JTTLQbnGCZ7kPMB/SYU6GzD3kixxrQ0qxvU876HEwdyDfHN+wqWkaJ3/DyighvfnyijIfoX5pII9ryYOcfHGhYR73FoQbCm5QhiHd/YdHlK+CbFXaTRFZYtWQf9CfMXG4CPew391qArFPo7VLXndrdPG5oL5m9zKsPklQ5kSeZLwkQ1BWm/qBH7RYxRVrLk7sVLVIVUArrpEdL1rAOIZgzoz6QvdX+k2RlQbYgF5x8dUQRk8nBvcKVGY7VnnncfqP4Snb4tkGBqRcf+J02NGigHXNDe3z8sWSSrQApAC6DhdH8Nc02tTpZ36SDYKSOKxll2wUu/dpDqK+W2TdLxGp/yaZHFMK+jAtCdQ0qGXu7FzTtg/mUcE73nKyClJ9nDVRzahRbuGgahiQjepxWFjZCQJEXGTL7HBQEB2oSgghG98yMxiCE4T+jbLV6tqEujD1MWDSjt5p5s092KCAF58kxFWt3KNABl+03ldwHGDmvrlFqitQxdRv1/fehvl4XFtuFUeMN05FjzRC3u31tBScOP8NAuayWvZ1qsDeoqyDz9KEghI2xGslyjZI2Cf4KuOg97kwRT+qkHsE6foa4ZKf3OrvaujWC37Nm1EdS7DHXW7UGtIHe8KzGlWKyHpLnwLPClu3K9rj19cNDQNuh04g6AYy/X0ygMRBFlfj1WxszECdIr1slXXS9Kcu2Xbtacgq34cPgO3kKH7B32g1Rv1OW7XxdA3+/c8Ta6qu9F3l8UogYyXJ/5N66wWRk6bpCGlvDSwVnBEBIN3EwXXOtNwghRz/lehshddwnf/VNvhXEig+R2LZ7Ez0oR/W7rO0Y7v7bSmLwjkiAoQaNHm1O4PkhTbN/Xmi+DnfwOz9Dg/cNbRE51U00qG61zx6gzl5chOyXwZG42KkmPaWfkDhU3uNqB7sOn3mpcSWj4gmGV9c+5/Fk6SmiCDHhrFbohTfScthSQyobKEmOIFTDMr32Ul8ACNxmCSNmUFyOYODNGRnmqzxOTl5/bBa3NIcyRftZg0dUYSLU6UAqyHdiQldYTkHP1IgDowIBTfmauBYZr/+FVmoQjLeuCtcQk0gcO6+NmGmL4/BR33irzFLKFUceblANR+jEpZAmHMNK0FVlNjgihLbhhGNiHb9WGDYZu3uyRNb/ikQo0TgVILQ1xe/0Y82wXO/3ThCuFcz8HaJWXogIv/fBsLWzzIgXlkGn2Ye/1itFle55PopGgqZSM85v1xYQCtNtJq7czS2tOcZnDOsvU4ATM8X1QM1hcDCgOSyH6bJlEiL4bMEY3ctPWKjHPAhdVLR4xoHDkn7a/o5PQJj0R38Plo5myrscwcA4eIvxnASnvMhZXHlKB+JEoTzTz+zpdVc7aPeHPbu0MxrN2qFiduXxkw/7x7Es+FAVn9OQZLNuy0anjKkNqaibhqE1x2QeHGqN18UbKc6F4d77M7csg9qyNdjFx835LFl0XVoIoh6W2OY4LutWdHNTPVbRL5D6PWq/aVbwm3jwD571EMEudeqd3oaWlHoCGaBrdPf/vSugzla4ihE3l4Xi6g8o3WRf1EoEBrtj5KwM5PL2IF7Sx1YG7M5H8lnHOhbuurOgLI1WC4DVQNsOJ343ZCZ2uoAyVvRdAe6IPYA/JgujzYjDxkbHrEg+RXpPij3vQj/nHxG/crAZItq67ubHS1y4XMO0+zGuWSljFiBIZVcfD2CgarneAZmNc1xCJMsZvUv6lZmC8hNDMfincMcgx0v82FgLhErh8LKJHLw/cq6q003I8rz/ku2U0ZsfP1An/wedy5m+/oflYtEj/A04Fb6SOIq/NhlYOS3mgJqI804HuM64RIR7bcmnk2C4Sgs3kr2qqRwe+LMxgIo2SJKqKqZ738Y7+B3/gdcfNlfhoRYNmNKHB31OOMlwed/PdN5GXT48jGWhJEYX4UpJPObVpSpPTE1bSEyX6eNpOtY+feoDMTQdxyIMjDXceMzyPbfg4Bfw0H2WmbOBDkg25jEIv4mqaZRvOHo3gDuANCYmfrEbdhjJJKgZ93prKur8OFQyRZQeJSW52JL44MDdA8xzm3ALLmB9+YK0vv1Z1Sya0z5v+CkES9as8vYgo8hO6YPce2jb/wIzUFQ6uylqGDzv7pD5QR9vV05Uby/5xzxiTctLMcx9vno5n6bm2XHzh0IrDySQPhNSSEJqsVlQHYsi/32GKQPCADCTgCZ0iDwkMFoeRjb68fSXdhyHnmHykWDSPKJHmncREpii1eth39MqZMqjPI7YKfyllQHnb7pY17xh90C+402adHpa2WnAZnB+825z8AqiEseEcEGQW9myEuoXFmcx/pB5ktuqwUh3IubccpEPMIq2Tl6ZdCYzHmznn8Yht4XLWkkqK1gNzvhCYwyMoyuHiUM10OZF1uvJlLPdNfEGs1nZ/V3AiK/WSFYS/OGss/qU/J82CHjTYMBeyDFR6QvBYw4WLQ82Glol+icJuYeCFxgoBitDUSGfJHBdAOycCoiihR0IiAry4rscQS6uOLqNV3CBzaebHTyLZAa5SJpYYBx8ranN2sTvw9Rlxs8qBuui8OAzowDOepfpJhl+4kVRKORFAsA77RJuPAguiHrIFHYMOvdTTnjRiUyFKyyfIs+GqEwslloRGYk7j+BHjpMeu/Qtaa0TO7so8Y1+r+u+5d+2jahebNnYB6ShSWK1bYhWf47KYdsrqry+g9MZAWH/z4LkoOE9l1U4kDPRhJ1IaEugTuir6L5voLVFgcqWR/U8KTKc8ckJxIsvGHEPPxPxHzPk7qIyrk+C9N9h55OXYka1BmG6BLjBMwpad4TkBHBqUiDNxYF8Z25wwd3fTKBevEt/mz4Ouu0E4RRtLc57cB9INTmUW4pf0ozc1li6TROF5+tTyRiBRpwD1f3J2Xjbo/G1PjuHwSMOTP8EAb5eBOOcHBdDVxniZA6O5UN6wBMBjZvdM1l8YbIcLIIXn8ga8sG82cVuvWJtt4sqYN2EKQNEiQ0ZZ0un2zUtvmwuPCUj1bhunTkSLsP8Q4n72d1M84FiJ8J+SB1gbZwJ+5wo2pWI05hEN52KW91RadrPZvU+vyh4twDTvYAcKz6oTesVk+3OWakhsDBOyaWsRnvZNxb3Oh4Pq8mhj4rVjKf/i5xArkpsYRuGDC4XRIPoezdS58vJwo7a0mGEUKMvZIvxkUDDIZBL0fTSytezn8vjX7H/sGkN4FjIBdl610WCaNlTLLC2fjo9TNCF0OQoTlgXBb2qh4ws3gJfqGWLlNwau2B5C+DzHCkpWB3D60vEgW3BbQuhKby80HpbHYFU7twAjzgWrfR7iqx8IDGfrH8hRUT
semzaFBGedlyDPr1BZbZ+V4bb/HW6XeNtdUtXfNy4qdGx9hLiPQpPc3jxziju0RSvKahg+qoPQIRESnAkY4QmjYAIG5GdDqlNNSc7/vMkASnLmn5aPuA/Ut1feaTs+fb/zoCL3HrD601sBhzTboMZYYzHsym9ULKzmuM2jCn8yzV0V1Hmw1+aq90KxeXGC7yu3ZKqXmpDoQQ7pycrwgFXOPN6ztEnipruK1/vsmtnvqGe8EN944jLYjCWISEqqDG1Fn5GBNCXekivhC3Zh9JbJYkP/nsMw5AX+x+gNsFKuVVKjlI4qbIldQaU+6wex7z8L1qhVlluEdf6y+p4x7K82OQNv7EtOU38K71SmH7G7WNjL/VVs+Fs/kMfErbIG5qIXaGSTZjAtg9+ToqWbPKs7ibOjNc6YrpsNMyHCkmkF8jbrrj0QNr6vh3Kj75WyV7wiX1pt5kAZ8Udg7xLbhKC4SV+tpjROh/NLEhUAR7BafNtNLsVNhXwH15aM5nRP/VidS1x3xA6rrhgg2rkxdpLfkTxhyjO/7J0AdrbNCBdXrM6MxuUz6R4yZXzByxhS3/5dnavRdj0FkoXyO/TsUk2d2tcwz6dtmAokIWVt942bmRsbetuX5mpRcwXnA7KKos7laAWf3"}}
20.     2022-09-02 06:30:41 +0000 UTC:  [event;sealing.SectorFinalized] {"User":{}}
21.     2022-09-02 06:30:41 +0000 UTC:  [event;sealing.SectorReplicaUpdateSubmitted]    {"User":{"Message":{"/":"bafy2bzacebddqksfd6elgswb46wc5ecumwvzbvw7nqshc63m63yedy3klmiri"}}}
22.     2022-09-02 06:35:37 +0000 UTC:  [event;sealing.SectorReplicaUpdateLanded]       {"User":{}}


vt-everstake commented Sep 8, 2022

Hi all. Having the same problem with 2 sectors. They are stuck in the "UpdateActivating" status. Restarting lotus-miner did not help.
Screenshot 2022-09-08 at 14 12 26
sector_status_58.txt
sector_status_81.txt

donkabat commented

lotus-miner version 1.17.1+mainnet+git.8db6a939c
lotus-miner info => UpdateActivating: 34

What is more, Sealed files are not removed. When are you gonna fix this issue?


Reiers commented Sep 13, 2022

@donkabat This issue is not at the top of the list, since we are still getting the power for sectors in UpdateActivating; the issue is just annoying to look at, and you need to do some manual work (deleting Sealed from workers).

If you have an extra server, and really want them gone:

  1. Download the full chain snapshot (containing all messages)
  2. export LOTUS_ENABLE_CHAINSTORE_FALLBACK=1
  3. lotus daemon
  4. Wait until fully synced
  5. Export/import your addresses to the node
  6. Switch to that node for 24h; the epoch the sectors are waiting for should be there, and they should then go to Proving (sketched below)
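
Concretely, steps 1 through 4 might look like this (a sketch; the snapshot filename is a placeholder for whatever full-chain snapshot source you use):

# 1. import a full chain snapshot (one that contains all messages)
lotus daemon --import-snapshot full_chain_snapshot.car --halt-after-import
# 2./3. start the daemon with the chainstore fallback enabled
export LOTUS_ENABLE_CHAINSTORE_FALLBACK=1
lotus daemon
# 4. block until fully synced
lotus sync wait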

Sealed files are not removed. When are you gonna fix this issue?

  • This tends to happen only if a worker drops, or if the sealing pipeline is so full that sectors need to retry and be re-scheduled. Or if you prune the chain; Sealed is only removed once 900 epochs have passed and the sector goes to Proving.
    So the issues are somewhat tied together.

mtelka (Contributor) commented Sep 13, 2022

Isn't it possible to somehow ask or force lotus daemon (full node) to download the single missing message from the network?

donkabat commented

@Reiers thanks that was the least you could do.

arajasek (Contributor) commented

Thanks for the great report and conversation everyone! I wanna state my understanding of the problem, and what I would suggest.

Understanding of the Problem

  • Sectors were in the UpdateActivating stage, waiting for the message confirmation
  • The miner missed the confirmation event because the node / miner were down
  • When the miner was restarted with a pruned node, it stayed stuck in the UpdateActivating stage
  • Eventually the node was pruned further, and the error switched to the missing block error (where the block is the PRU message CID)

Why it's hard to fix

Isn't it possible to somehow ask or force lotus daemon (full node) to download the single missing message from the network?

  • It's possible to do so, yeah, though I don't think the miner has the ability to ask for it yet (wouldn't be too hard to add).
  • The problem however is that even if you had the message, you wouldn't be able to confirm that it landed successfully on chain, because doing so needs the state tree where it was applied
  • So essentially, if you're missing the state, it's gonna be impossible to confirm

What I'd propose

Based on above, I don't think it's easy to solve this edge-case in the general case. Getting missing state trees is...hard.

However, if users are very confident that the message did in fact land on chain (eg. by checking on a dashboard), I would suggest setting the sector state to ReleaseSectorKey. This is the next state in the FSM, and should lead to the sector moving along to Proving gracefully.

Of course when forcing state, it's always a little unpredictable / error-prone, but I think that's the best thing to do.
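
For reference, forcing the state would be done with the sector update-state command, something like the following (the exact flags are from memory, so check `lotus-miner sectors update-state --help` on your build; note it refuses to act without --really-do-it):

lotus-miner sectors update-state --really-do-it 155 ReleaseSectorKey
lotus-miner sectors status 155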

mtelka (Contributor) commented Nov 23, 2022

I solved the problem with one single such sector by modifying the handleUpdateActivating function in storage/pipeline/states_replica_update.go like this:

                mw, err := m.Api.StateWaitMsg(ctx.Context(), *sector.ReplicaUpdateMessage, build.MessageConfidence, api.LookbackNoLimit, true)
//              if err != nil {
//                      return err
//              }
                _ = mw // mw.Height is no longer used below; this keeps the compiler happy

                ts, err := m.Api.ChainHead(ctx.Context())
                if err != nil {
                        return err
                }

                nv, err := m.Api.StateNetworkVersion(ctx.Context(), ts.Key())
                if err != nil {
                        return err
                }

                lb := policy.GetWinningPoStSectorSetLookback(nv)

//              targetHeight := mw.Height + lb + InteractivePoRepConfidence
                targetHeight := 1904621 + lb + InteractivePoRepConfidence

Then compile, deploy the new binary, restart the miner, wait a few minutes until the sector moves to the next state, redeploy the old binary, and restart the miner.
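
A note for anyone adapting this patch: 1904621 stands in for mw.Height (the epoch at which the replica-update message landed) and is specific to mtelka's sector, so substitute a value for your own. If all you have left is the SectorReplicaUpdateLanded timestamp from `lotus-miner sectors status --log`, you can derive an approximate mainnet epoch from it; this sketch assumes mainnet genesis at 2020-08-24 22:00:00 UTC with 30-second epochs, and the event-log timestamp is when the miner processed the event, so treat the result as approximate:

package main

import (
    "fmt"
    "time"
)

// Filecoin mainnet genesis: 2020-08-24 22:00:00 UTC; one epoch every 30 seconds
const genesisUnix = 1598306400
const epochSeconds = 30

func epochAt(t time.Time) int64 {
    return (t.Unix() - genesisUnix) / epochSeconds
}

func main() {
    // e.g. a SectorReplicaUpdateLanded timestamp from the sector event log
    landed := time.Date(2022, time.September, 2, 6, 35, 37, 0, time.UTC)
    fmt.Println(epochAt(landed)) // -> 2126471
}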

arajasek (Contributor) commented

@mtelka Yeah, that would definitely do it too!

@Reiers Reiers added won't fix and removed P3 P3: Might get resolved labels Nov 27, 2022

hyunmoon commented Mar 21, 2023

Thank you @mtelka, I searched everywhere and tried everything I could think of, but yours was the only solution that finally worked.
I had a few dozen sectors stuck in the UpdateActivating state (ipld not found), and I suspect they've been crashing my lotus-miner process once a day.

22.	2023-01-30 11:49:02 +0900 KST:	[event;sealing.SectorFinalized]	{"User":{}}
23.	2023-01-30 11:49:08 +0900 KST:	[event;sealing.SectorReplicaUpdateSubmitted]	{"User":{"Message":{"/":"bafy2bzaced62p6denbzsmo3ndhaq5d6prldmvq5fham2i3bd5h2xm74c6fhhe"}}}
24.	2023-01-30 11:53:00 +0900 KST:	[event;sealing.SectorReplicaUpdateLanded]	{"User":{}}
25.	2023-03-21 21:39:58 +0900 KST:	[event;sealing.SectorForceState]	{"User":{"State":"Proving"}}
26.	2023-03-21 21:39:58 +0900 KST:	[event;sealing.SectorForceState]	{"User":{"State":"ReleaseSectorKey"}}
27.	2023-03-21 21:39:58 +0900 KST:	[event;sealing.SectorUpdateActive]	{"User":{}}
28.	2023-03-21 21:39:58 +0900 KST:	[event;sealing.SectorForceState]	{"User":{"State":"ReleaseSectorKey"}}
29.	2023-03-21 21:39:58 +0900 KST:	[event;sealing.SectorUpdateActive]	{"User":{}}
30.	2023-03-21 21:39:58 +0900 KST:	[event;sealing.SectorUpdateActive]	{"User":{}}
31.	2023-03-21 21:39:58 +0900 KST:	[event;sealing.SectorKeyReleased]	{"User":{}}
32.	2023-03-21 21:39:58 +0900 KST:	[error;*xerrors.wrapError]	state machine error: running planner for state ReleaseSectorKey failed: planner for state ReleaseSectorKey received unexpected event sealing.SectorUpdateActive ({User:{}})
33.	2023-03-21 21:39:58 +0900 KST:	[event;sealing.SectorKeyReleased]	{"User":{}}
34.	2023-03-21 21:44:20 +0900 KST:	[event;sealing.SectorRemove]	{"User":{}}
35.	2023-03-21 21:44:20 +0900 KST:	[event;sealing.SectorRemoved]	{"User":{}}
36.	2023-03-21 21:44:20 +0900 KST:	[error;<nil>]	state machine error: %!s(<nil>)

And here's the states_replica_update.go code I used:

package sealing

import (
    "bytes"
    "context"

    "golang.org/x/xerrors"

    "github.com/filecoin-project/go-state-types/abi"
    "github.com/filecoin-project/go-state-types/big"
    "github.com/filecoin-project/go-state-types/builtin"
    "github.com/filecoin-project/go-state-types/builtin/v9/miner"
    "github.com/filecoin-project/go-state-types/exitcode"
    "github.com/filecoin-project/go-statemachine"

    "github.com/filecoin-project/lotus/api"
    "github.com/filecoin-project/lotus/build"
    "github.com/filecoin-project/lotus/chain/actors/policy"
    "github.com/filecoin-project/lotus/chain/types"
)

func (m *Sealing) handleReplicaUpdate(ctx statemachine.Context, sector SectorInfo) error {
    // if the sector ended up not having any deals, abort the upgrade
    if !sector.hasDeals() {
        return ctx.Send(SectorAbortUpgrade{xerrors.New("sector had no deals")})
    }

    if err := checkPieces(ctx.Context(), m.maddr, sector.SectorNumber, sector.Pieces, m.Api, true); err != nil { // Sanity check state
        return handleErrors(ctx, err, sector)
    }
    out, err := m.sealer.ReplicaUpdate(sector.sealingCtx(ctx.Context()), m.minerSector(sector.SectorType, sector.SectorNumber), sector.pieceInfos())
    if err != nil {
        return ctx.Send(SectorUpdateReplicaFailed{xerrors.Errorf("replica update failed: %w", err)})
    }
    return ctx.Send(SectorReplicaUpdate{Out: out})
}

func (m *Sealing) handleProveReplicaUpdate(ctx statemachine.Context, sector SectorInfo) error {
    if sector.UpdateSealed == nil || sector.UpdateUnsealed == nil {
        return xerrors.Errorf("invalid sector %d with nil UpdateSealed or UpdateUnsealed output", sector.SectorNumber)
    }
    if sector.CommR == nil {
        return xerrors.Errorf("invalid sector %d with nil CommR", sector.SectorNumber)
    }
    // Abort upgrade for sectors that went faulty since being marked for upgrade
    ts, err := m.Api.ChainHead(ctx.Context())
    if err != nil {
        log.Errorf("handleProveReplicaUpdate: api error, not proceeding: %+v", err)
        return nil
    }
    active, err := m.sectorActive(ctx.Context(), ts.Key(), sector.SectorNumber)
    if err != nil {
        log.Errorf("sector active check: api error, not proceeding: %+v", err)
        return nil
    }
    if !active {
        err := xerrors.Errorf("sector marked for upgrade %d no longer active, aborting upgrade", sector.SectorNumber)
        log.Errorf(err.Error())
        return ctx.Send(SectorAbortUpgrade{err})
    }

    vanillaProofs, err := m.sealer.ProveReplicaUpdate1(sector.sealingCtx(ctx.Context()), m.minerSector(sector.SectorType, sector.SectorNumber), *sector.CommR, *sector.UpdateSealed, *sector.UpdateUnsealed)
    if err != nil {
        return ctx.Send(SectorProveReplicaUpdateFailed{xerrors.Errorf("prove replica update (1) failed: %w", err)})
    }

    if err := checkPieces(ctx.Context(), m.maddr, sector.SectorNumber, sector.Pieces, m.Api, true); err != nil { // Sanity check state
        return handleErrors(ctx, err, sector)
    }

    proof, err := m.sealer.ProveReplicaUpdate2(sector.sealingCtx(ctx.Context()), m.minerSector(sector.SectorType, sector.SectorNumber), *sector.CommR, *sector.UpdateSealed, *sector.UpdateUnsealed, vanillaProofs)
    if err != nil {
        return ctx.Send(SectorProveReplicaUpdateFailed{xerrors.Errorf("prove replica update (2) failed: %w", err)})
    }
    return ctx.Send(SectorProveReplicaUpdate{Proof: proof})
}

func (m *Sealing) handleSubmitReplicaUpdate(ctx statemachine.Context, sector SectorInfo) error {
    ts, err := m.Api.ChainHead(ctx.Context())
    if err != nil {
        log.Errorf("handleSubmitReplicaUpdate: api error, not proceeding: %+v", err)
        return nil
    }

    if err := checkReplicaUpdate(ctx.Context(), m.maddr, sector, ts.Key(), m.Api); err != nil {
        return ctx.Send(SectorSubmitReplicaUpdateFailed{})
    }

    sl, err := m.Api.StateSectorPartition(ctx.Context(), m.maddr, sector.SectorNumber, ts.Key())
    if err != nil {
        log.Errorf("handleSubmitReplicaUpdate: api error, not proceeding: %+v", err)
        return nil
    }

    dlinfo, err := m.Api.StateMinerProvingDeadline(ctx.Context(), m.maddr, ts.Key())
    if err != nil {
        log.Errorf("handleSubmitReplicaUpdate: api error, not proceeding: %+v", err)
    }
    // if the sector's deadline is immutable, wait in a non-error state;
    // a sector's deadline is immutable if it is the current deadline or the next deadline
    if sl.Deadline == dlinfo.Index || (dlinfo.Index+1)%dlinfo.WPoStPeriodDeadlines == sl.Deadline {
        return ctx.Send(SectorDeadlineImmutable{})
    }

    updateProof, err := sector.SectorType.RegisteredUpdateProof()
    if err != nil {
        log.Errorf("failed to get update proof type from seal proof: %+v", err)
        return ctx.Send(SectorSubmitReplicaUpdateFailed{})
    }
    enc := new(bytes.Buffer)
    params := &miner.ProveReplicaUpdatesParams{
        Updates: []miner.ReplicaUpdate{
            {
                SectorID:           sector.SectorNumber,
                Deadline:           sl.Deadline,
                Partition:          sl.Partition,
                NewSealedSectorCID: *sector.UpdateSealed,
                Deals:              sector.dealIDs(),
                UpdateProofType:    updateProof,
                ReplicaProof:       sector.ReplicaUpdateProof,
            },
        },
    }
    if err := params.MarshalCBOR(enc); err != nil {
        log.Errorf("failed to serialize update replica params: %+v", err)
        return ctx.Send(SectorSubmitReplicaUpdateFailed{})
    }

    cfg, err := m.getConfig()
    if err != nil {
        return xerrors.Errorf("getting config: %w", err)
    }

    onChainInfo, err := m.Api.StateSectorGetInfo(ctx.Context(), m.maddr, sector.SectorNumber, ts.Key())
    if err != nil {
        log.Errorf("handleSubmitReplicaUpdate: api error, not proceeding: %+v", err)
        return nil
    }
    if onChainInfo == nil {
        return xerrors.Errorf("sector not found %d", sector.SectorNumber)
    }

    sp, err := m.currentSealProof(ctx.Context())
    if err != nil {
        log.Errorf("sealer failed to return current seal proof, not proceeding: %+v", err)
        return nil
    }
    virtualPCI := miner.SectorPreCommitInfo{
        SealProof:    sp,
        SectorNumber: sector.SectorNumber,
        SealedCID:    *sector.UpdateSealed,
        //SealRandEpoch: 0,
        DealIDs:    sector.dealIDs(),
        Expiration: onChainInfo.Expiration,
        //ReplaceCapacity: false,
        //ReplaceSectorDeadline: 0,
        //ReplaceSectorPartition: 0,
        //ReplaceSectorNumber: 0,
    }

    collateral, err := m.Api.StateMinerInitialPledgeCollateral(ctx.Context(), m.maddr, virtualPCI, ts.Key())
    if err != nil {
        return xerrors.Errorf("getting initial pledge collateral: %w", err)
    }

    collateral = big.Sub(collateral, onChainInfo.InitialPledge)
    if collateral.LessThan(big.Zero()) {
        collateral = big.Zero()
    }

    collateral, err = collateralSendAmount(ctx.Context(), m.Api, m.maddr, cfg, collateral)
    if err != nil {
        log.Errorf("collateral send amount failed, not proceeding: %+v", err)
        return nil
    }

    goodFunds := big.Add(collateral, big.Int(m.feeCfg.MaxCommitGasFee))

    mi, err := m.Api.StateMinerInfo(ctx.Context(), m.maddr, ts.Key())
    if err != nil {
        log.Errorf("handleSubmitReplicaUpdate: api error, not proceeding: %+v", err)
        return nil
    }

    from, _, err := m.addrSel.AddressFor(ctx.Context(), m.Api, mi, api.CommitAddr, goodFunds, collateral)
    if err != nil {
        log.Errorf("no good address to send replica update message from: %+v", err)
        return ctx.Send(SectorSubmitReplicaUpdateFailed{})
    }
    mcid, err := sendMsg(ctx.Context(), m.Api, from, m.maddr, builtin.MethodsMiner.ProveReplicaUpdates, collateral, big.Int(m.feeCfg.MaxCommitGasFee), enc.Bytes())
    if err != nil {
        log.Errorf("handleSubmitReplicaUpdate: error sending message: %+v", err)
        return ctx.Send(SectorSubmitReplicaUpdateFailed{})
    }

    return ctx.Send(SectorReplicaUpdateSubmitted{Message: mcid})
}

func (m *Sealing) handleWaitMutable(ctx statemachine.Context, sector SectorInfo) error {
    immutable := true
    for immutable {
        ts, err := m.Api.ChainHead(ctx.Context())
        if err != nil {
            log.Errorf("handleWaitMutable: api error, not proceeding: %+v", err)
            return nil
        }

        sl, err := m.Api.StateSectorPartition(ctx.Context(), m.maddr, sector.SectorNumber, ts.Key())
        if err != nil {
            log.Errorf("handleWaitMutable: api error, not proceeding: %+v", err)
            return nil
        }

        dlinfo, err := m.Api.StateMinerProvingDeadline(ctx.Context(), m.maddr, ts.Key())
        if err != nil {
            log.Errorf("handleWaitMutable: api error, not proceeding: %+v", err)
            return nil
        }

        sectorDeadlineOpen := sl.Deadline == dlinfo.Index
        sectorDeadlineNext := (dlinfo.Index+1)%dlinfo.WPoStPeriodDeadlines == sl.Deadline
        immutable = sectorDeadlineOpen || sectorDeadlineNext

        // Sleep for immutable epochs
        if immutable {
            dlineEpochsRemaining := dlinfo.NextOpen() - ts.Height()
            var targetEpoch abi.ChainEpoch
            if sectorDeadlineOpen {
                // sleep for remainder of deadline
                targetEpoch = ts.Height() + dlineEpochsRemaining
            } else {
                // sleep for remainder of deadline and next one
                targetEpoch = ts.Height() + dlineEpochsRemaining + dlinfo.WPoStChallengeWindow
            }

            atHeight := make(chan struct{})
            err := m.events.ChainAt(ctx.Context(), func(context.Context, *types.TipSet, abi.ChainEpoch) error {
                close(atHeight)
                return nil
            }, func(ctx context.Context, ts *types.TipSet) error {
                log.Warn("revert in handleWaitMutable")
                return nil
            }, 5, targetEpoch)
            if err != nil {
                log.Errorf("handleWaitMutable: events error: api error, not proceeding: %+v", err)
                return nil
            }

            select {
            case <-atHeight:
            case <-ctx.Context().Done():
                return ctx.Context().Err()
            }
        }
    }
    return ctx.Send(SectorDeadlineMutable{})
}

func (m *Sealing) handleReplicaUpdateWait(ctx statemachine.Context, sector SectorInfo) error {
    if sector.ReplicaUpdateMessage == nil {
        log.Errorf("handleReplicaUpdateWait: no replica update message cid recorded")
        return ctx.Send(SectorSubmitReplicaUpdateFailed{})
    }

    mw, err := m.Api.StateWaitMsg(ctx.Context(), *sector.ReplicaUpdateMessage, build.MessageConfidence, api.LookbackNoLimit, true)
    if err != nil {
        log.Errorf("handleReplicaUpdateWait: failed to wait for message: %+v", err)
        return ctx.Send(SectorSubmitReplicaUpdateFailed{})
    }

    switch mw.Receipt.ExitCode {
    case exitcode.Ok:
        // expected
    case exitcode.SysErrInsufficientFunds:
        fallthrough
    case exitcode.SysErrOutOfGas:
        log.Errorf("gas estimator was wrong or out of funds")
        return ctx.Send(SectorSubmitReplicaUpdateFailed{})
    default:
        return ctx.Send(SectorSubmitReplicaUpdateFailed{})
    }
    si, err := m.Api.StateSectorGetInfo(ctx.Context(), m.maddr, sector.SectorNumber, mw.TipSet)
    if err != nil {
        log.Errorf("api err failed to get sector info: %+v", err)
        return ctx.Send(SectorSubmitReplicaUpdateFailed{})
    }
    if si == nil {
        log.Errorf("api err sector not found")
        return ctx.Send(SectorSubmitReplicaUpdateFailed{})
    }

    if !si.SealedCID.Equals(*sector.UpdateSealed) {
        return ctx.Send(SectorAbortUpgrade{xerrors.Errorf("mismatch of expected onchain sealed cid after replica update, expected %s got %s", sector.UpdateSealed, si.SealedCID)})
    }
    return ctx.Send(SectorReplicaUpdateLanded{})
}

func (m *Sealing) handleFinalizeReplicaUpdate(ctx statemachine.Context, sector SectorInfo) error {
    cfg, err := m.getConfig()
    if err != nil {
        return xerrors.Errorf("getting sealing config: %w", err)
    }

    if err := m.sealer.ReleaseUnsealed(ctx.Context(), m.minerSector(sector.SectorType, sector.SectorNumber), sector.keepUnsealedRanges(sector.Pieces, false, cfg.AlwaysKeepUnsealedCopy)); err != nil {
        return ctx.Send(SectorFinalizeFailed{xerrors.Errorf("release unsealed: %w", err)})
    }

    if err := m.sealer.FinalizeReplicaUpdate(sector.sealingCtx(ctx.Context()), m.minerSector(sector.SectorType, sector.SectorNumber)); err != nil {
        return ctx.Send(SectorFinalizeFailed{xerrors.Errorf("finalize sector: %w", err)})
    }

    return ctx.Send(SectorFinalized{})
}

func (m *Sealing) handleUpdateActivating(ctx statemachine.Context, sector SectorInfo) error {
    if sector.ReplicaUpdateMessage == nil {
        return xerrors.Errorf("nil sector.ReplicaUpdateMessage!")
    }

    ts, err := m.Api.ChainHead(ctx.Context())
    if err != nil {
        return err
    }

    nv, err := m.Api.StateNetworkVersion(ctx.Context(), ts.Key())
    if err != nil {
        return err
    }

    lb := policy.GetWinningPoStSectorSetLookback(nv)

    // NOTE: the StateWaitMsg lookup is dropped entirely here; the target
    // height is hardcoded to a known landed epoch instead of mw.Height
    // (see mtelka's patch above) -- substitute your own epoch
    targetHeight := 1904621 + lb

    return m.events.ChainAt(context.Background(), func(context.Context, *types.TipSet, abi.ChainEpoch) error {
        return ctx.Send(SectorUpdateActive{})
    }, func(ctx context.Context, ts *types.TipSet) error {
        log.Warn("revert in handleUpdateActivating")
        return nil
    }, InteractivePoRepConfidence, targetHeight)
}

func (m *Sealing) handleReleaseSectorKey(ctx statemachine.Context, sector SectorInfo) error {
    if err := m.sealer.ReleaseSectorKey(sector.sealingCtx(ctx.Context()), m.minerSector(sector.SectorType, sector.SectorNumber)); err != nil {
        return ctx.Send(SectorReleaseKeyFailed{err})
    }

    return ctx.Send(SectorKeyReleased{})
}

func handleErrors(ctx statemachine.Context, err error, sector SectorInfo) error {
    switch err.(type) {
    case *ErrApi:
        log.Errorf("handleReplicaUpdate: api error, not proceeding: %+v", err)
        return nil
    case *ErrInvalidDeals:
        log.Warnf("invalid deals in sector %d: %v", sector.SectorNumber, err)
        return ctx.Send(SectorInvalidDealIDs{})
    case *ErrExpiredDeals: // Probably not much we can do here, maybe re-pack the sector?
        return ctx.Send(SectorDealsExpired{xerrors.Errorf("expired dealIDs in sector: %w", err)})
    default:
        return xerrors.Errorf("checkPieces sanity check error: %w (%+v)", err, err)
    }
}


donkabat commented Jun 1, 2023

@Reiers
could you change the log level from error to debug? It generates a lot of useless noise, and we are not able to fix this error:
ERROR sectors pipeline/states_replica_update.go:362 error in handleUpdateActivating {"error": "failed to load message: ipld: could not find bafy2bzacebkn7732l2p6gztd43a7u2nj2iockqdtfld7ud5nv2sr3qf3nmh5s"}


Reiers commented Jun 6, 2023

Unfortunately, it will keep looping.
The good news is that we are looking into fixing the FSM update-state cmd for this, so you can force the sector into Proving.
That should resolve the log-loop issue.

I will update this issue when/if we have a fix. Remember, apart from being annoying, it's not breaking anything, and you still get the power for the sector.
