Restarted W10 node sometimes becomes stuck adding new block (onPersist insufficient funds) #3273
Are you using Level or Bolt?
LevelDB.
That's more interesting. It obviously means a broken DB, but given the way it's handled internally, it's not trivial for it to get into this state. Yet if it happens, there should be a reason for it. CC @532910.
Uncovered some memory issues on my machine in recent days. It could well be a hardware fault that is responsible, so if you can't replicate, don't spend too long on it.
I was able to reproduce a similar bug on my local machine under the following conditions:
I was also able to reproduce the same problem during network syncing with low disk space, so from my side this problem also seems to be connected to hardware issues.
Do you still have the DB? Can we check what the state difference is? It shouldn't happen this way; whatever is written must be consistent, it's a single transaction after all (or the DB itself is broken if it can't ensure consistency on commit failure because of limited disk space).
I don't; I've removed the DB to test the release compatibility, but I suspect the problem can be reproduced. I'll try to reproduce it and save the DB to look at the diffs.
I have reproduced the same problem with boltdb, but can't share the database, unfortunately. @roman-khimov @AnnaShaleva Do you have any specific place in mind to look into?
Oh, I wrote too soon; my problem seems a bit different:
But there are other nodes which are OK, so it seems to be an application/DB issue, likely related to this one.
Zero ideas on this one. It'd be great if you could at least diff the proper state against whatever is in this broken DB. Luckily, your chain doesn't seem to be long.
My version of neo-go is 0.104 with some commits on top (based at 441eb8a), with fixes from #3279 (we reverted and reapplied #3110, but there were no conflicts, so I don't expect problems there) and some unrelated improvements in the network server (mostly for multi-IP and TLS).
Sanity check: values at 0xc1 differ because the corrupted node has acquired some headers; values at 0xc0 are the same. That being said, here are the seemingly relevant diffs stored under the 0x70 prefix (contract data):
Native notary (
There is also a diff for the GAS contract; it is a bit bigger, so I won't attach it here. The script I used to produce the files:

```go
package export_test

import (
	"encoding/hex"
	"os"
	"testing"
	"time"

	"github.com/stretchr/testify/require"
	"go.etcd.io/bbolt"
)

func TestExport(t *testing.T) {
	runExport(t, "./mainnet.bolt")
	runExport(t, "./corrupted.mainnet.bolt")
}

// runExport dumps every key-value pair of the "DB" bucket (and nested
// buckets) into a <path>.keys text file, one hex pair per line.
func runExport(t *testing.T, path string) {
	db, err := bbolt.Open(path, os.ModePerm, &bbolt.Options{
		Timeout:  time.Second,
		ReadOnly: true,
	})
	require.NoError(t, err)
	defer db.Close()

	f, err := os.OpenFile(path+".keys", os.O_CREATE|os.O_TRUNC|os.O_WRONLY, os.ModePerm)
	require.NoError(t, err)
	defer f.Close()

	require.NoError(t, db.View(func(tx *bbolt.Tx) error {
		printBucket(f, tx.Bucket([]byte("DB")), "")
		return nil
	}))
}

// printBucket recursively prints the bucket's contents, descending into
// nested buckets (cursor returns a nil value for them).
func printBucket(f *os.File, b *bbolt.Bucket, indent string) {
	c := b.Cursor()
	for k, v := c.First(); k != nil; k, v = c.Next() {
		if v != nil {
			f.WriteString(indent)
			f.WriteString(hex.EncodeToString(k))
			f.WriteString(": ")
			f.WriteString(hex.EncodeToString(v))
			f.WriteString("\n")
		} else {
			printBucket(f, b.Bucket(k), indent+hex.EncodeToString(k)+" : ")
		}
	}
}
```
Account is blocked when it's in the Policy's storage, not when it's missing from the Policy storage. Introduced in bbbc680. This bug leads to the fact that during native Neo cache initialization at the last block in the dBFT epoch, all candidates accounts are "blocked", and thus, stand-by committee and validators are used in the subsequent new epoch. Close #3424. This bug may lead to the consequences described in #3273, but it needs to be confirmed. Signed-off-by: Anna Shaleva <shaleva.ann@nspcc.ru>
@EdgeDLT, @fyfyrchik, please take a look at #3443. It should fix the problem, but that needs to be confirmed, so I'm not closing this issue.
@EdgeDLT, @fyfyrchik, 0.106.0 is out, you may try it. Please write back if it succeeded.
Current Behavior
Attempting to start an out-of-sync node produces:
Expected Behavior
The node should successfully sync the next block.
Possible Solution
Something related to #3110, maybe? Haven't looked too deeply into it.
Steps to Reproduce
Context
Ran into this issue with my indexer node. By default the node is not kept alive: it is fully synchronized, then shut down once all operations are complete. Occasionally, but not always, starting the node again produces this error. If it does, no further progress can be made without a full resync.
Your Environment
Windows 10