
Node crash after node restart #1390

Open
bekman1 opened this issue Jun 13, 2021 · 11 comments

Comments

@bekman1

bekman1 commented Jun 13, 2021

Hi guys, I set up an Ergo node on macOS Big Sur 11.4:
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed mode)
Ergo Protocol Reference Client 4.0.12

On the first node start everything works fine, but if I stop the node and run it again, it starts to sync and gets some peer connections, but after a short time my computer becomes overloaded and I get this Java error:

Uncaught error from thread [ergoref-akka.actor.default-dispatcher-5]: Java heap space, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[ergoref]
java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Unknown Source)
at java.base/java.util.ArrayList.grow(Unknown Source)
at java.base/java.util.ArrayList.grow(Unknown Source)
at java.base/java.util.ArrayList.add(Unknown Source)
at java.base/java.util.ArrayList.add(Unknown Source)
at org.iq80.leveldb.impl.WriteBatchImpl.delete(WriteBatchImpl.java:70)
at org.iq80.leveldb.impl.WriteBatchImpl.delete(WriteBatchImpl.java:31)
at scorex.db.LDBVersionedStore.$anonfun$cleanStart$1(LDBVersionedStore.scala:254)
at scorex.db.LDBVersionedStore.$anonfun$cleanStart$1$adapted(LDBVersionedStore.scala:253)
at scorex.db.LDBVersionedStore$$Lambda$2638/0x00000008008f8840.apply(Unknown Source)
at scala.collection.immutable.NumericRange.foreach(NumericRange.scala:74)
at scorex.db.LDBVersionedStore.cleanStart(LDBVersionedStore.scala:253)
at scorex.db.LDBVersionedStore.update(LDBVersionedStore.scala:228)
at scorex.crypto.authds.avltree.batch.VersionedLDBAVLStorage.$anonfun$update$1(VersionedLDBAVLStorage.scala:60)
at scorex.crypto.authds.avltree.batch.VersionedLDBAVLStorage$$Lambda$2632/0x00000008008b8840.apply$mcV$sp(Unknown Source)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.Try$.apply(Try.scala:213)
at scorex.crypto.authds.avltree.batch.VersionedLDBAVLStorage.update(VersionedLDBAVLStorage.scala:52)
at scorex.crypto.authds.avltree.batch.PersistentBatchAVLProver.generateProofAndUpdateStorage(PersistentBatchAVLProver.scala:25)
at org.ergoplatform.nodeView.state.UtxoState.$anonfun$applyModifier$2(UtxoState.scala:113)
at org.ergoplatform.nodeView.state.UtxoState$$Lambda$2617/0x000000080090d840.apply(Unknown Source)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at org.ergoplatform.nodeView.state.UtxoState.$anonfun$applyModifier$1(UtxoState.scala:110)
at org.ergoplatform.nodeView.state.UtxoState$$Lambda$1973/0x0000000800ba1040.apply(Unknown Source)
at scala.util.Success.flatMap(Try.scala:251)
at org.ergoplatform.nodeView.state.UtxoState.applyModifier(UtxoState.scala:109)
at org.ergoplatform.nodeView.state.UtxoState.applyModifier(UtxoState.scala:34)
at scorex.core.NodeViewHolder.$anonfun$applyState$1(NodeViewHolder.scala:249)
at scorex.core.NodeViewHolder$$Lambda$1929/0x0000000800b8e040.apply(Unknown Source)
at scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:162)

Unlocking the wallet is also impossible after a node restart.
What's wrong? Regards

@pragmaxim
Collaborator

Give it more memory, i.e. run the java command with the parameter -Xmx1024m.
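For example, the flag goes before -jar when launching the node. A minimal sketch, assuming the jar is named after the release mentioned above and that ergo.conf sits next to it (both assumptions):

# Start the node with a 1 GiB heap limit (raise as needed).
java -Xmx1024m -jar ergo-4.0.12.jar --mainnet --config ergo.conf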

@bekman1
Author

bekman1 commented Jun 14, 2021

1024m does not help :(
Attempts with 2048m and 4096m also do not help. Looks like a memory leak in the code.
On start the program uses 1.36-1.39 GB of RAM, but after a short time usage jumps to 4.35 GB and I get the same "Out of memory" error.

18:44:36.924 INFO [tor.default-dispatcher-23] s.c.n.peer.PeerManager - /159.65.11.55:9030 penalized, penalty: NonDeliveryPenalty
18:44:36.924 INFO [tor.default-dispatcher-23] s.c.n.peer.PeerManager - /159.65.11.55:9030 penalized, penalty: NonDeliveryPenalty
18:44:37.148 INFO [tor.default-dispatcher-19] o.e.n.ErgoNodeViewSynchronizer - Peer ConnectedPeer(ConnectionId(remote=/159.65.11.55:9030, local=/192.168.178.91:56812, direction=Outgoing)) has not delivered asked modifier 5985b0ac13c7f89a2c8b08f3ebfce3f5c783eb915bde3bad2c340b08ad86fd92 on time
18:44:37.148 INFO [tor.default-dispatcher-23] s.c.n.peer.PeerManager - /159.65.11.55:9030 penalized, penalty: NonDeliveryPenalty
18:44:39.066 INFO [tor.default-dispatcher-22] s.c.n.NetworkController - Failed to connect to : /104.186.109.55:9030 akka.io.TcpOutgoingConnection$$anon$2: Connect timeout of Some(1 second) expired
18:44:42.524 INFO [tor.default-dispatcher-11] s.c.n.NetworkController - Connecting to peer: PeerInfo(PeerSpec(ergoref,Version(4,0,10),legatumpool-node-1,Some(/63.250.54.96:9030),Vector(ModeFeature(utxo,true,None,-1))),0,None)
Uncaught error from thread [ergoref-akka.actor.default-dispatcher-15]: Java heap space, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[ergoref]
java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Unknown Source)
at java.base/java.util.ArrayList.grow(Unknown Source)
at java.base/java.util.ArrayList.grow(Unknown Source)
at java.base/java.util.ArrayList.add(Unknown Source)
at java.base/java.util.ArrayList.add(Unknown Source)
at org.iq80.leveldb.impl.WriteBatchImpl.delete(WriteBatchImpl.java:70)
at org.iq80.leveldb.impl.WriteBatchImpl.delete(WriteBatchImpl.java:31)
at scorex.db.LDBVersionedStore.$anonfun$cleanStart$1(LDBVersionedStore.scala:254)
at scorex.db.LDBVersionedStore.$anonfun$cleanStart$1$adapted(LDBVersionedStore.scala:253)
at scorex.db.LDBVersionedStore$$Lambda$2641/0x0000000800913840.apply(Unknown Source)
at scala.collection.immutable.NumericRange.foreach(NumericRange.scala:74)
at scorex.db.LDBVersionedStore.cleanStart(LDBVersionedStore.scala:253)
at scorex.db.LDBVersionedStore.update(LDBVersionedStore.scala:228)
at scorex.crypto.authds.avltree.batch.VersionedLDBAVLStorage.$anonfun$update$1(VersionedLDBAVLStorage.scala:60)
at scorex.crypto.authds.avltree.batch.VersionedLDBAVLStorage$$Lambda$2635/0x00000008008f3c40.apply$mcV$sp(Unknown Source)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.Try$.apply(Try.scala:213)
at scorex.crypto.authds.avltree.batch.VersionedLDBAVLStorage.update(VersionedLDBAVLStorage.scala:52)
at scorex.crypto.authds.avltree.batch.PersistentBatchAVLProver.generateProofAndUpdateStorage(PersistentBatchAVLProver.scala:25)
at org.ergoplatform.nodeView.state.UtxoState.$anonfun$applyModifier$2(UtxoState.scala:113)
at org.ergoplatform.nodeView.state.UtxoState$$Lambda$2620/0x0000000800eb5040.apply(Unknown Source)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at org.ergoplatform.nodeView.state.UtxoState.$anonfun$applyModifier$1(UtxoState.scala:110)
at org.ergoplatform.nodeView.state.UtxoState$$Lambda$1956/0x0000000800ba2840.apply(Unknown Source)
at scala.util.Success.flatMap(Try.scala:251)
at org.ergoplatform.nodeView.state.UtxoState.applyModifier(UtxoState.scala:109)
at org.ergoplatform.nodeView.state.UtxoState.applyModifier(UtxoState.scala:34)
at scorex.core.NodeViewHolder.$anonfun$applyState$1(NodeViewHolder.scala:249)
at scorex.core.NodeViewHolder$$Lambda$1913/0x0000000800b90040.apply(Unknown Source)
at scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:162)
at scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:162)
18:44:42.531 INFO [tor.default-dispatcher-12] o.e.n.UtxoNodeViewHolder - Apply modifier 3028384d4201f18befa4479ed8607d388e945fd63930764d854d85812f2b22d5 of type 108 to nodeViewHolder
[ERROR] [SECURITY][06/14/2021 18:44:42.533] [ergoref-akka.actor.default-dispatcher-15] [akka.actor.ActorSystemImpl(ergoref)] Uncaught error from thread [ergoref-akka.actor.default-dispatcher-15]: Java heap space, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[ergoref]
The error is always at the same place, right after a "ConnectedPeer ... has not delivered asked modifier" message.
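A heap dump taken at the moment of the crash would show exactly what fills the heap. A minimal sketch of capturing one with standard HotSpot flags (the jar name and paths are assumptions, not taken from this thread):

# Dump the heap to the working directory when the OutOfMemoryError is thrown.
java -Xmx4096m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=. \
  -jar ergo-4.0.12.jar --mainnet --config ergo.conf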

@kushti
Member

kushti commented Jun 15, 2021

@bekman1 @pragmaxim on macOS, the pure-Java port of LevelDB is used (as native LevelDB via JNI had problems with crashing all the time).

If the error is always about the same (with scorex.db.LDBVersionedStore.cleanStart(LDBVersionedStore.scala:253) in its stack trace), you can try the following setting:

ergo {
  ...
  node {
    ...
    keepVersions = 20
  }
}

@bekman1
Author

bekman1 commented Jun 17, 2021

Added keepVersions = 20 in ergo.conf and started Java with a 4 GB heap, but still got the OutOfMemory error. This time the error comes at this place:

10:16:54.223 INFO [tor.default-dispatcher-18] s.c.n.NetworkController - Connecting to peer: PeerInfo(PeerSpec(unknown,Version(0,0,1),unknown-/209.217.206.254:9030,Some(/209.217.206.254:9030),List()),0,None)
10:16:54.223 WARN [tor.default-dispatcher-18] s.c.n.NetworkController - Connection to peer /209.217.206.254:9030 is already established
10:16:59.414 INFO [ctor.default-dispatcher-4] s.c.n.NetworkController - Connecting to peer: PeerInfo(PeerSpec(ergoref,Version(4,0,10),ergo-mainnet-4.0.0,Some(/159.65.11.55:9030),Vector(ModeFeature(utxo,true,None,-1), SessionIdPeerFeature([B@4c09a27,7884669804957321718))),1623914123569,Some(Outgoing))
10:16:59.414 WARN [ctor.default-dispatcher-4] s.c.n.NetworkController - Connection to peer /159.65.11.55:9030 is already established

Uncaught error from thread [ergoref-akka.actor.default-dispatcher-17]: Java heap space, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[ergoref]
java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Unknown Source)
10:16:59.416 INFO [ctor.default-dispatcher-4] o.e.n.UtxoNodeViewHolder - Apply modifier c08642ef85ae81042338563e50909bde9edb4592832d9ea78e95cfcf50a1fc40 of type 108 to nodeViewHolder
at java.base/java.util.ArrayList.grow(Unknown Source)
at java.base/java.util.ArrayList.grow(Unknown Source)
at java.base/java.util.ArrayList.add(Unknown Source)
at java.base/java.util.ArrayList.add(Unknown Source)
at org.iq80.leveldb.impl.WriteBatchImpl.delete(WriteBatchImpl.java:70)
at org.iq80.leveldb.impl.WriteBatchImpl.delete(WriteBatchImpl.java:31)
at scorex.db.LDBVersionedStore.$anonfun$cleanStart$1(LDBVersionedStore.scala:254)
at scorex.db.LDBVersionedStore.$anonfun$cleanStart$1$adapted(LDBVersionedStore.scala:253)
at scorex.db.LDBVersionedStore$$Lambda$2617/0x00000008009f4040.apply(Unknown Source)
at scala.collection.immutable.NumericRange.foreach(NumericRange.scala:74)
at scorex.db.LDBVersionedStore.cleanStart(LDBVersionedStore.scala:253)
at scorex.db.LDBVersionedStore.update(LDBVersionedStore.scala:228)
at scorex.crypto.authds.avltree.batch.VersionedLDBAVLStorage.$anonfun$update$1(VersionedLDBAVLStorage.scala:60)
at scorex.crypto.authds.avltree.batch.VersionedLDBAVLStorage$$Lambda$2611/0x00000008009fdc40.apply$mcV$sp(Unknown Source)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.Try$.apply(Try.scala:213)
at scorex.crypto.authds.avltree.batch.VersionedLDBAVLStorage.update(VersionedLDBAVLStorage.scala:52)
at scorex.crypto.authds.avltree.batch.PersistentBatchAVLProver.generateProofAndUpdateStorage(PersistentBatchAVLProver.scala:25)
at org.ergoplatform.nodeView.state.UtxoState.$anonfun$applyModifier$2(UtxoState.scala:113)
at org.ergoplatform.nodeView.state.UtxoState$$Lambda$2596/0x0000000800a10040.apply(Unknown Source)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at org.ergoplatform.nodeView.state.UtxoState.$anonfun$applyModifier$1(UtxoState.scala:110)
at org.ergoplatform.nodeView.state.UtxoState$$Lambda$1958/0x0000000800ba6840.apply(Unknown Source)
at scala.util.Success.flatMap(Try.scala:251)
at org.ergoplatform.nodeView.state.UtxoState.applyModifier(UtxoState.scala:109)
at org.ergoplatform.nodeView.state.UtxoState.applyModifier(UtxoState.scala:34)
at scorex.core.NodeViewHolder.$anonfun$applyState$1(NodeViewHolder.scala:249)
at scorex.core.NodeViewHolder$$Lambda$1914/0x0000000800b93840.apply(Unknown Source)
at scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:162)
at scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:162)

@glasgowm148
Member

The macOS database has been replaced; please open a new issue if the problem persists in the latest node release.

@dreams-money

I seem to have this same or similar problem on ARM in particular. x86 machines with similar specs (2-core, 4 GB RAM) can sync and run just fine. I stuck with x86, but figured I should at least document the instance:

ergo.log

@kushti
Member

kushti commented Feb 9, 2022

The problem is in the pure-Java LevelDB implementation, which is a nightmare, it seems.

@dreams-money

The GitHub repo does seem to have a similar open issue from 2020. Would it be appropriate to mention this?

@kushti
Member

kushti commented Feb 14, 2022

Yes, could be related, thanks!

@pcmind

pcmind commented Apr 18, 2022

I know nothing about Ergo, but was intrigued by the issue being reported against the Java LevelDB implementation.

I did some testing on the Ergo Docker image cdd2cdc38420 to reproduce the issue.

To produce the OutOfMemoryError I did the following:

  1. Removed the LevelDB native implementation from the META-INF/native folder in ergo.jar to force usage of the Java implementation (see the sketch after this list).
  2. Used docker run --rm -p 9030:9030 -p 127.0.0.1:9053:9053 -v /localdatafolder:/home/ergo/.ergo --entrypoint "java" ergoplatform/ergo -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/localdatafolder -jar /home/ergo/.ergo/modifiedergo.jar --mainnet --config /home/ergo/.ergo/ergo.conf
  3. Waited for the out-of-memory issue (the same happens after restarting the container).
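For reference, one way to do step 1 is to strip the bundled native libraries out of the jar before running it. This is only a sketch (the jar filename is a placeholder for whatever your modified jar is called):

# Show the bundled native LevelDB libraries inside the jar.
unzip -l ergo.jar | grep 'META-INF/native'
# Remove them so the pure-Java LevelDB implementation is used instead.
zip -d ergo.jar 'META-INF/native/*'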

Inspecting the heap dump, 98% of the space is used by a single LevelDB WriteBatchImpl object, retained by scorex.db.LDBVersionedStore$$Lambda$1189 from a ForkJoin thread (ergoref-akka.actor.default-dispatcher-5).
So this is not a memory leak, but how the Ergo code operates: in my case I see a single WriteBatchImpl, created by LDBVersionedStore, holding almost 37,300,000 entries at the moment of failure.

LevelDB JNI and pure-Java LevelDB have different memory models and consumption, so using the native implementation could also produce memory issues, just later.
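For anyone who wants to watch this build-up without waiting for the crash, a class histogram of the running node should show the same WriteBatchImpl growth. A rough sketch using standard JDK tools (the grep pattern is only illustrative):

# Find the node's JVM process id.
jps -l
# Print a class histogram and look for LevelDB write-batch related classes.
jcmd <pid> GC.class_histogram | grep -iE 'WriteBatch|LDBVersionedStore'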

@dreams-money


@kushti @pragmaxim Is there a way to curb what @pcmind found? I'm wondering if this would solve the sync issues I've been hearing about (mostly Pi syncing issues).
