Running update-state on a pending ReplicaUpdateFailed causes looping miner crash
#9329
Closed
9 of 18 tasks
Labels
area/sealing
kind/bug
need/analysis (Hint: Needs Analysis)
P2 (Should be resolved)
SnapDeals
Checklist
Latest release, or the most recent RC (release candidate) for the upcoming release or the dev branch (master), or have an issue updating to any of these.
Lotus component
Lotus Version
Describe the Bug
We are beginning to see reports of a critical looping miner crash after running the update-state command on a ReplicaUpdateFailed sector.
Currently, if the FSM finishes a replica update within the deadline of the sector or the one before it, the finalisation of the replica update will hang for a while in the ReplicaUpdateFailed state until the deadline has passed. This is intended behaviour and doesn't represent a problem in itself.
Some users who are new to snapping sectors and are not aware of this behaviour are attempting to force a state update to UpdateActivating using the lotus-miner sectors update-state XXX UpdateActivating command. This command is being issued prior to the natural expiry of the active sector deadline.
The result of taking this action is a looping critical crash of the miner instance, which is proving very complicated to resolve. If the miner is able to stay online long enough between crashes, a lotus-miner sectors remove command can be issued, which will resolve the issue. However, in some cases the miner will not stay online long enough to allow any commands to be run, and the only resolution is a code-level change.
It is understandable that running an update-state command at this juncture would produce an error, but a total miner crash that leaves the miner instance completely unusable is not an acceptable outcome and needs to be resolved.
journalctl crash output is shown below.
https://filecoinproject.slack.com/archives/CPFTWMY7N/p1663342132529389
https://filecoinproject.slack.com/archives/CEGN061C5/p1657690542697849
Logging Information
Repo Steps
As above