Snap-up sector stuck in "UpdateActivating" state #8415
Comments
Quoting the author from slack
I could not find anything in the logs. But after looking through the code, I have a theory.
Which is predetermined by
But if we look at the lotus/chain/events/events_height.go Line 46 in b0f57d7
The miner would not be able to determine its height and compare it to targetHeight. I think this is where it got stuck. If the context was cancelled (a timeout, I suspect) before the eventHandler fired, the state would persist and nothing further would happen. In our case, 900 epochs amount to 7.5 hours, which falls in the middle of the node issue.

Now, if this theory is correct and "UpdateActivating" is idempotent, then firing it again should trigger the next chain of events. I will put all this on the issue and wait for someone from the core team to give their thoughts. Again, this is just a theory based on my understanding of the code, and it could be entirely wrong. |
Hi @SBudo Thanks for the report. Is the sector still stuck in that state? |
Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 24 hours. |
This issue was closed because it is missing author input. |
@Reiers I'm considering just trying to force the sector status to "Proving". |
@SBudo |
I have the same error with 2 sectors. Deals are active.
2022-05-06T18:02:22.086Z INFO markets loggers/loggers.go:20 storage provider event {"name": "ProviderEventDealActivationFailed", "proposal CID": "bafyreigooen5jybdm2z5prqn4udg2hjpspvzcnnxytnydptokpaw6qtwku", "state": "StorageDealFailing", "message": "error activating deal: failed to set up called handler: called check error (h: 1785124): failed to look up deal on chain: looking for publish deal message bafy2bzacea2blcv2boaphl7f2ffpbwsvtdc42qz47grcpf6fh4jficcbr6ivs: search msg failed: failed to load message: blockstore: block not found"} |
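For anyone scanning their own miner logs for this pattern, here is a rough Go sketch that pulls the JSON payload out of a log line shaped like the one above and reports the innermost error. The helper names `parseProviderEvent` and `rootCause` are made up for illustration, and the sketch assumes the payload starts at the first `{` and that wrapped errors are joined with `": "`, as in the quoted line.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// parseProviderEvent extracts the JSON object at the end of a log line
// and decodes it into a map of string fields (hypothetical helper).
func parseProviderEvent(line string) (map[string]string, error) {
	i := strings.Index(line, "{")
	if i < 0 {
		return nil, errors.New("no JSON payload found in log line")
	}
	fields := map[string]string{}
	if err := json.Unmarshal([]byte(line[i:]), &fields); err != nil {
		return nil, err
	}
	return fields, nil
}

// rootCause returns the last segment of a ": "-joined error chain.
func rootCause(msg string) string {
	if i := strings.LastIndex(msg, ": "); i >= 0 {
		return msg[i+2:]
	}
	return msg
}

func main() {
	// Shortened version of the log line quoted in the comment above.
	line := `2022-05-06T18:02:22.086Z INFO markets loggers/loggers.go:20 storage provider event {"name": "ProviderEventDealActivationFailed", "state": "StorageDealFailing", "message": "error activating deal: search msg failed: failed to load message: blockstore: block not found"}`

	ev, err := parseProviderEvent(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("event:", ev["name"])
	fmt.Println("state:", ev["state"])
	fmt.Println("root cause:", rootCause(ev["message"]))
}
```

A root cause of "block not found" points at the publish-deals message missing from the local blockstore rather than at the sector itself.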
I'm seeing the same thing here. Re-opening. |
I did another prune of the chain. After restarting the node and miner with a newly imported snapshot, I started to get this error in the log:
I also tried to force the sector status to "Proving", but it does not take effect (no errors when trying; the status is just not applied). |
lotus-miner logs every minute: example sector:
|
Hi all. I have the same problem with 2 sectors. They are stuck in status "UpdateActivating". Restarting lotus-miner did not help. |
What is more, the sealed files are not removed. When are you gonna fix this issue?
@donkabat This issue is not on the top of the list, since we are getting the power for sectors in If you have an extra server, and really want them gone:
|
Isn't it possible to somehow ask or force lotus daemon (full node) to download the single missing message from the network? |
@Reiers thanks that was the least you could do. |
Thanks for the great report and conversation everyone! I wanna state my understanding of the problem, and what I would suggest.
Understanding of the Problem
Why it's hard to fix
What I'd propose
Based on the above, I don't think it's easy to solve this edge case in general. Getting missing state trees is...hard. However, if users are very confident that the message did in fact land on chain (e.g. by checking on a dashboard), I would suggest setting the sector state to
Of course, forcing state is always a little unpredictable and error-prone, but I think that's the best thing to do. |
I solved the problem with one single such sector by modifying
Then compile, deploy the new binary, restart the miner, wait a few minutes until the sector moves to the next state, redeploy the old binary, and restart the miner again. |
@mtelka Yeah, that would definitely do it too! |
Thank you @mtelka, I searched everywhere and tried everything I could think of but yours was the only solution that finally worked.
And here's the states_replica_update.go code I used:
|
@Reiers |
Unfortunately, it will keep looping. I will update this issue when/if we have a fix. Remember, apart from being annoying, it's not breaking anything, and you still get the power for the sector. |
Checklist
Latest release, or the most recent RC (release candidate) for the upcoming release or the dev branch (master), or have an issue updating to any of these.
Lotus component
Lotus Version
Describe the Bug
Experienced a lotus node failure while a sector (sector 155) was in state "UpdateActivating". While the node and miner are all back up and running, the sector has now been stuck in that state for over 72 hours.
Timeline of the event:
Sector 155 has been stuck in "UpdateActivating" since.
Note that during the lotus node and miner downtime, all WindowPoSts (4 partitions) were missed (from 7:54am to 9:24am).
Lotus and miner logs.zip
Logging Information
Repo Steps
see above.