Panic after upgrade to 1.22 #5331
Comments
could you provide the full neard log here, if possible? |
We got the same issue today :| |
@kucharskim thanks! Could you rerun with Is your node a validator or non-validator node? |
This was from a validator node. I don't have database which triggers this issue. I needed to restore data from secondary node to bring validator online. Sorry, didn't do a backup of the problematic |
@kucharskim it's ok, thanks for the report. I think I figured out the issue. The node kept running 1.21.0 past the epoch boundary at which the upgrade was already scheduled. At the end of epoch T, the node decides the epoch information for epoch T+2 and stores that information in the database. In the case of the sharding upgrade, if you ran 1.21.0, the binary was not updated, so the node thought that epoch T+2 would have only 1 shard and stored the epoch info for T+2 accordingly. That incorrect info stayed in the database and was used after the binary was changed to 1.22.0, which caused the crash. Since your data is already corrupted, the only solution now is to start from a backup with sharded state: https://near-protocol-public.s3.ca-central-1.amazonaws.com/backups/mainnet/rpc/data.tar |
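To illustrate the failure mode described above, here is a minimal Rust sketch. It is not nearcore's actual code: `EpochInfo`, `compute_epoch_info`, and both lookup functions are hypothetical names, and the shard count of 4 in the usage note is only an illustrative value. It shows how epoch info persisted by a binary that assumes 1 shard leads to the same kind of `index out of bounds` panic once newer code asks for shard 1, and how a checked lookup would report the mismatch instead.

```rust
// Hypothetical, simplified stand-in for the epoch info a node persists.
// This is NOT nearcore's real structure; it only mirrors the shape of the bug.
#[derive(Debug, Clone)]
struct EpochInfo {
    // One entry per shard: the validators assigned to that shard.
    chunk_producers_per_shard: Vec<Vec<String>>,
}

/// Builds the epoch info for epoch T+2 as a binary would at the end of
/// epoch T, given the number of shards that binary believes in.
fn compute_epoch_info(num_shards: usize, validators: &[&str]) -> EpochInfo {
    EpochInfo {
        chunk_producers_per_shard: (0..num_shards)
            .map(|_| validators.iter().map(|v| v.to_string()).collect())
            .collect(),
    }
}

/// Direct indexing: panics with "index out of bounds" when the stored
/// info has fewer shards than the caller expects.
fn chunk_producers_unchecked(info: &EpochInfo, shard_id: usize) -> &[String] {
    &info.chunk_producers_per_shard[shard_id]
}

/// Checked lookup: turns the same mismatch into a descriptive error.
fn chunk_producers_checked(info: &EpochInfo, shard_id: usize) -> Result<&[String], String> {
    info.chunk_producers_per_shard
        .get(shard_id)
        .map(|v| v.as_slice())
        .ok_or_else(|| {
            format!(
                "stored epoch info has {} shard(s), but shard {} was requested",
                info.chunk_producers_per_shard.len(),
                shard_id
            )
        })
}
```

With `let stale = compute_epoch_info(1, &["a", "b"]);` standing in for what the old binary stored, `chunk_producers_unchecked(&stale, 1)` panics, while `chunk_producers_checked(&stale, 1)` returns an `Err`. Info built with `compute_epoch_info(4, ...)` serves shard 1 without trouble.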
Yes, I figured I needed restore from a snapshot / another machine. Is there a way to avoid this problem inside |
Thanks, will try this snapshot. |
There is something specific to this upgrade that is uncommon: the node acts differently in the epoch before the network actually switches to the new protocol version depending on the client version. In some sense it is not a "stateless" upgrade. |
This should be communicated very clearly: the upgrade needs to be done before the 80% epoch boundary. |
@kucharskim yes you are definitely right. It is our mistake that we overlooked this. Fortunately such big upgrades won't happen for a while and we are working on ways to make big upgrades smoother next time, for example, near/NEPs#205 |
Closing this now since we know the cause |
Is it fixed though? |
@kucharskim this bug won't be triggered again because we are not doing any sharding upgrade soon. We will take this into account for future upgrades though. |
Describe the bug
Errors in logs after upgrading validator to 1.22
To Reproduce
upgrade binary and /neard run
Expected behavior
Validator should validate blocks
Screenshots
neard[910]: thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', chain/epoch_manager/src/lib.rs:1193:26
Version (please complete the following information):
Additional context
neard[910]: thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', chain/epoch_manager/src/lib.rs:1193:26