nearcore stuck with crashes on betanet-node7 #3084
Comments
In fact, this error happened "immediately" on node start ("immediately" here means almost 10 minutes after boot, but there were no sync logs during that period):
I believe this is caused by an unclean shutdown of the node, but I don't think this will cause the node to get stuck forever.
It was in that state 5 hours, so I call it “forever”
I don't think an unclean shut down can explain it? The batch writes to the storage are atomic, so the head should not be updated if the block is not persisted. This appears to be a real bug.
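To make the atomicity argument concrete, here is a minimal sketch. It assumes a hypothetical in-memory `Store` and `Batch`; these names, the `HEAD` / `BLOCK:` key layout, and the `simulate_crash` flag are made up for illustration and are not nearcore's storage API. If the block and the head pointer go into the same batch, a crash before commit should leave neither of them written, so the head can never point at a missing block.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory key-value store (illustration only, not nearcore's store).
#[derive(Default)]
struct Store {
    data: HashMap<String, Vec<u8>>,
}

/// A group of writes that must be applied all together or not at all.
#[derive(Default)]
struct Batch {
    writes: Vec<(String, Vec<u8>)>,
}

impl Batch {
    /// Stages a write; nothing reaches the store until `Store::commit`.
    fn put(&mut self, key: &str, value: &[u8]) {
        self.writes.push((key.to_string(), value.to_vec()));
    }
}

impl Store {
    /// Applies every staged write, or none of them.
    /// `simulate_crash` is an assumption of this sketch: it models an
    /// unclean shutdown that happens before the batch is committed.
    fn commit(&mut self, batch: Batch, simulate_crash: bool) -> Result<(), String> {
        if simulate_crash {
            // The batch is dropped; the store is left untouched.
            return Err("crashed before commit".to_string());
        }
        for (key, value) in batch.writes {
            self.data.insert(key, value);
        }
        Ok(())
    }
}

/// Invariant: if a head is recorded, the block it names must also be stored.
fn head_and_block_consistent(store: &Store) -> bool {
    match store.data.get("HEAD") {
        None => true,
        Some(hash) => {
            let key = format!("BLOCK:{}", String::from_utf8_lossy(hash));
            store.data.contains_key(&key)
        }
    }
}

fn main() {
    let mut store = Store::default();

    // Block and head pointer are staged in the same batch.
    let mut batch = Batch::default();
    batch.put("BLOCK:abc", b"block body");
    batch.put("HEAD", b"abc");

    // Unclean shutdown before commit: neither key is written.
    assert!(store.commit(batch, true).is_err());
    assert!(head_and_block_consistent(&store));

    // Normal commit: both keys are written together.
    let mut batch = Batch::default();
    batch.put("BLOCK:abc", b"block body");
    batch.put("HEAD", b"abc");
    assert!(store.commit(batch, false).is_ok());
    assert!(head_and_block_consistent(&store));
}
```

In this model the invariant only breaks if the head and the block are written in separate batches, which is why an unclean shutdown alone does not seem to explain the stuck state.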
Yes I was wrong. I was investigating it today but our nodes got nuked :(
It should have been fixed by #3099. We can reopen if we see it again.
Describe the bug
The node is stuck and produces tons of logs with the same backtraces:
The end of the log is the following (as is, without edits, note the timestamps):
The node uses 100% of all the CPU (the VM has 2 cores and both of them are busy with neard) and consumes 3.3GB of RAM.
strace reports that there are tons of writes of a single byte 0x01 (the file descriptors are anonymous pipes):
To Reproduce
N/A
Version (please complete the following information):
Additional context
It is an RPC node running on the betanet-node7 instance. I am restarting the instance.