This needs to be investigated, seems to have occurred on 11.3.0
Hi, I seem to be having problems: [History ERROR] Replay failed: Error merging bucket curr=0acce3 with snap=03c5ba: Malformed bucket: old non-DEAD + new INIT.. There may be a problem with the local filesystem. Ensure that there is enough space to perform that operation and that disc is behaving correctly. [ApplyLedgerChainWork.cpp:290]
disk is fine and there is 10 GB of free space
Aug 06 11:03:21 ip-172-31-62-17 stellar-core[2960]: 2019-08-06T11:03:21.671 GCJCS [History INFO] Applying transactions for ledgers 25187933..25195267, LCL is [seq=25187932, hash=48e36b]
Aug 06 11:03:21 ip-172-31-62-17 stellar-core[2960]: 2019-08-06T11:03:21.671 GCJCS [History INFO] Catching up: applying checkpoint 1/115 (0%)
Aug 06 11:03:21 ip-172-31-62-17 stellar-core[2960]: 2019-08-06T11:03:21.678 GCJCS [Tx INFO] applying ledger 25187933 (txs:32, ops:73)
Aug 06 11:03:22 ip-172-31-62-17 stellar-core[2960]: stellar-core: bucket/BucketList.cpp:134: void stellar::BucketLevel::prepare(stellar::Application &, uint32_t, uint32_t, std::shared_ptr, const std::vector<std::shared_ptr > &, bool): Assertion `!mNextCurr.isMerging()' failed.
Aug 06 11:03:22 ip-172-31-62-17 systemd[1]: stellar.service: Main process exited, code=dumped, status=6/ABRT
Aug 06 11:03:22 ip-172-31-62-17 systemd[1]: stellar.service: Unit entered failed state.
Aug 06 11:03:22 ip-172-31-62-17 systemd[1]: stellar.service: Failed with result 'core-dump'.
stellar-core 11.3.0 (5f7821d)
antb321
4:36 AM - Yesterday
I downgraded to the last version of 11.2 and it still occurred, then downgraded to stellar-core 11.0.0 (236f831), and it seems to be resolved.
Here is the log from 11.2
....
Aug 06 10:11:37 ip-172-31-62-17 stellar-core[11983]: 2019-08-06T10:11:37.146 GCJCS [History ERROR] Replay failed: Error merging bucket curr=0acce3 with snap=03c5ba: Malformed bucket: old non-DEAD + new INIT.. There may be a problem with the local filesystem. Ensure that there is enough space to perform that operation and that disc is behaving correctly. [ApplyLedgerChainWork.cpp:290]
Aug 06 10:11:37 ip-172-31-62-17 stellar-core[11983]: 2019-08-06T10:11:37.153 GCJCS [Ledger ERROR] Catchup will restart at next close. [LedgerManagerImpl.cpp:692]
Eek! Well, the scary message in here is the first one:
Error merging bucket curr=0acce3 with snap=03c5ba: Malformed bucket: old non-DEAD + new INIT
I'll talk to the original reporter to try to figure out a little more context. Judging from the lines that follow it, this looks like it happened during a catchup from some non-initial state. Replaying through the same range (say 25187904..25195268) does not reproduce such a failure:
No replay errors occur
No bucket with either putative hash 0acce3 or 03c5ba occurs at all (of 15,071 retained buckets, with --disable-bucket-gc)
This does not mean it didn't happen, but it means it's not trivial to provoke.
Vague initial hypothesis: I wonder if there's some way to get a malformed bucket of this sort if core crashes at the right time, in the form of the failure-to-fsync bug currently pending. The malformedness is either merging old LIVE + new INIT, or old INIT + new INIT, for some ledger entry.
This could happen if, say, some live entry E was killed (had a DEAD entry written to some bucket B) and core crashed before B was durably stored in the bucketlist. Then B would (potentially) be zero-sized, and from the bucketlist's perspective E would still be alive, but the database would think E was dead, and so when reviving it core would write a new INIT entry for it. Merging that new INIT over the still-live E would then be exactly the "old non-DEAD + new INIT" case reported in the error.
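To make the hypothesis concrete, here is a minimal Python sketch of the per-key merge rule and of the lost-DEAD scenario. It is a toy model, not stellar-core code: the names `EntryType`, `merge_entry`, `replay`, and `MalformedBucketError` are hypothetical, and the rules encoded are my understanding of the INIT/DEAD design (DEAD then INIT becomes LIVE, INIT then DEAD annihilates, INIT then LIVE stays INIT, anything else non-DEAD followed by INIT is malformed). Real merges operate on whole bucket files rather than on a single key's history.

```python
from enum import Enum
from typing import Iterable, Optional


class EntryType(Enum):
    LIVE = "LIVE"
    INIT = "INIT"
    DEAD = "DEAD"


class MalformedBucketError(Exception):
    """Raised when a merge sees a combination that should be impossible."""


def merge_entry(old: Optional[EntryType], new: EntryType) -> Optional[EntryType]:
    """Merge the older and newer record for one ledger-entry key.

    Returns the surviving record type, or None if the two records annihilate.
    Assumed rules (a simplification, not copied from stellar-core).
    """
    if old is None:
        return new  # nothing older to merge with
    if new is EntryType.INIT:
        if old is not EntryType.DEAD:
            # INIT claims "this key did not exist before", so an older
            # LIVE or INIT record contradicts it.
            raise MalformedBucketError("old non-DEAD + new INIT")
        return EntryType.LIVE  # deleted, then recreated
    if old is EntryType.INIT:
        if new is EntryType.DEAD:
            return None  # created and deleted: both records drop out
        return EntryType.INIT  # created, then updated: still a creation
    return new  # otherwise the newer record wins


def replay(history: Iterable[EntryType]) -> Optional[EntryType]:
    """Fold a key's records, oldest first, through merge_entry."""
    state: Optional[EntryType] = None
    for record in history:
        state = merge_entry(state, record)
    return state


if __name__ == "__main__":
    L, I, D = EntryType.LIVE, EntryType.INIT, EntryType.DEAD
    # Healthy case: E is live, gets deleted, then is recreated.
    print(replay([L, D, I]))
    # Hypothesized case: the DEAD record was in a bucket that never
    # reached disk, so the bucketlist still sees E as live when the
    # new INIT arrives.
    try:
        replay([L, I])
    except MalformedBucketError as err:
        print("Malformed bucket:", err)
```

In this model the only difference between the healthy history and the failing one is the missing DEAD record, which is why a bucket lost to a crash before it was durably written would be enough to trigger the error.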