Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node stuck in "Generating state snapshot" #337

Open
andreclaro opened this issue Jun 17, 2024 · 28 comments
Open

Node stuck in "Generating state snapshot" #337

andreclaro opened this issue Jun 17, 2024 · 28 comments

Comments

@andreclaro
Copy link

andreclaro commented Jun 17, 2024

Summary

Node stuck in "Generating state snapshot" and not processing new blocks after upgrade to the latest software releases of op-geth (v1.101315.0 -> v1.101315.2) and op-node (v1.7.5 -> v1.7.7).

System information

Geth version: geth version

geth --version
geth version 1.101315.2-stable-7c281983

Op-node:

op-node --version
op-node version v0.10.14-dev

OS & Version: Linux Ubuntu

Expected behaviour

  • Process new blocks after upgrade/restart of the service
  • Generating state snapshot should not block node of processing new blocks

Actual behaviour

Generating state snapshot process is blocking op-geth of processing new blocks

Logs

op-geth logs:

geth[37251]: INFO [06-17|09:44:25.167] Generating state snapshot                root=85a73c..ab3243 at=0f1c8e..ddc728 accounts=8,212,647 slots=16,975,488 storage=1.78GiB    dangling=0 elapsed=17m45.437s eta=10h50m31.417s
geth[37251]: INFO [06-17|09:44:33.169] Generating state snapshot                root=85a73c..ab3243 at=0f326b..ebf92b accounts=8,258,971 slots=16,993,561 storage=1.78GiB    dangling=0 elapsed=17m53.438s eta=10h46m46.576s
geth[37251]: INFO [06-17|09:44:41.189] Generating state snapshot                root=85a73c..ab3243 at=0f492d..c259f8 accounts=8,307,499 slots=17,021,954 storage=1.79GiB    dangling=0 elapsed=18m1.459s  eta=10h42m47.389s
geth[37251]: INFO [06-17|09:44:49.207] Generating state snapshot                root=85a73c..ab3243 at=0f5bc9..6d21aa accounts=8,347,047 slots=17,064,430 storage=1.79GiB    dangling=0 elapsed=18m9.477s  eta=10h40m27.745s
eth[37251]: INFO [06-17|09:44:57.213] Generating state snapshot                root=85a73c..ab3243 in=0f6b51..25dc0c at=ca889b..4eac13 accounts=8,379,966 slots=17,122,787 storage=1.80GiB    dangling=0 elapsed=18m17.483s eta=10h39m19.246s

op-node logs:

Jun 17 09:44:53 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:53+0000 lvl=info msg="Received signed execution payload from p2p" id=0xad94307cc0391975de86ce7a47f50a6022ef2acc99c5839a061df7d85bb321d7:121509358 peer=16Uiu2HAmDzdJ2aupdsshYr8JbsVKd1qKgUwmT3YNSLtazT6Eq7AD
Jun 17 09:44:53 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:53+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0xad94307cc0391975de86ce7a47f50a6022ef2acc99c5839a061df7d85bb321d7:121509358
Jun 17 09:44:53 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:53+0000 lvl=warn msg="failed p2p sync request" peer=16Uiu2HAmTrP55nReSSHC8seJ29DFg1vQKmFXhtdW4GsSkXBPgHSv num=121507552 err="failed to read result part of response: stream reset"
Jun 17 09:44:53 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:53+0000 lvl=warn msg="failed p2p sync request" peer=16Uiu2HAm2ZSgLR74xfseQ3YGJEMpAMuuC2q5muzCR77K1YRGavoE num=121507551 err="failed to open stream: failed to negotiate protocol: protocols not supported: [/opstack/req/payload_by_number/10/0]"
Jun 17 09:44:53 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:53+0000 lvl=warn msg="failed p2p sync request" peer=16Uiu2HAmAAP78k6MEFKbZjFvwC2Syr86ePVKJGRa9eXeCqrnXiAq num=121507550 err="peer failed to serve request with code 1: not found"
Jun 17 09:44:53 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:53+0000 lvl=warn msg="cancelling p2p sync range request" peer=16Uiu2HAmAAP78k6MEFKbZjFvwC2Syr86ePVKJGRa9eXeCqrnXiAq rangeReqId=241
Jun 17 09:44:53 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:53+0000 lvl=warn msg="failed p2p sync request" peer=16Uiu2HAm2soquNJBrFk89c48kJMiP857W2TXubux7rQ5w8jhqajd num=121507554 err="peer failed to serve request with code 1: not found"
Jun 17 09:44:53 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:53+0000 lvl=warn msg="cancelling p2p sync range request" peer=16Uiu2HAm2soquNJBrFk89c48kJMiP857W2TXubux7rQ5w8jhqajd rangeReqId=241
Jun 17 09:44:55 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:55+0000 lvl=info msg="attempting connection" peer=16Uiu2HAmQGLnG7sjhQYCkTD9YmB56ALcs7JBXTS2TN8vTaNDNGFr
Jun 17 09:44:55 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:55+0000 lvl=info msg="attempting connection" peer=16Uiu2HAmEF6SvDakawQHQ3pkeC3zQRw6b6ZQHsNtDHEATm1SgbgL
Jun 17 09:44:55 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:55+0000 lvl=info msg="attempting connection" peer=16Uiu2HAm4M7kPVxrofcviyS1YYJUcm8pXSWVjjJrqgf9EyS1YFP5
Jun 17 09:44:55 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:55+0000 lvl=info msg="Received signed execution payload from p2p" id=0x67bddca00bc9bff9dd3d184465fd6d1eaf290260e428fc1f743c147c7aa4e70c:121509359 peer=16Uiu2HAmDzdJ2aupdsshYr8JbsVKd1qKgUwmT3YNSLtazT6Eq7AD
Jun 17 09:44:55 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:55+0000 lvl=info msg="attempting connection" peer=16Uiu2HAmSViPfWUXA8APJcgVKXPp736AHfknVB8ccze4tbMWvvFY
Jun 17 09:44:56 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:56+0000 lvl=info msg="attempting connection" peer=16Uiu2HAmCkYPPPG1Ggr37zRsVZUswYQxL9sSVRDgAgoFKfR89pzJ
Jun 17 09:44:57 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:57+0000 lvl=info msg="Received signed execution payload from p2p" id=0x326202ca06865686ecb4a24100fd533ec279f68baab64151fac35a5e02f7b61f:121509360 peer=16Uiu2HAmDzdJ2aupdsshYr8JbsVKd1qKgUwmT3YNSLtazT6Eq7AD
Jun 17 09:44:58 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:58+0000 lvl=info msg="attempting connection" peer=16Uiu2HAmFkMXtRfYQcksZjhzLH5dytN5JeQNhiyZVRC6x4qBFJYy
Jun 17 09:44:58 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:58+0000 lvl=warn msg="failed p2p sync request" peer=16Uiu2HAmAAP78k6MEFKbZjFvwC2Syr86ePVKJGRa9eXeCqrnXiAq num=121507615 err="peer failed to serve request with code 1: not found"
Jun 17 09:44:58 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:58+0000 lvl=warn msg="cancelling p2p sync range request" peer=16Uiu2HAmAAP78k6MEFKbZjFvwC2Syr86ePVKJGRa9eXeCqrnXiAq rangeReqId=242
Jun 17 09:44:58 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:58+0000 lvl=info msg="attempting connection" peer=16Uiu2HAmFg5sVt6Fee9F44c9FtzZi4Dix8geWjgqku4FQanFSbfq
Jun 17 09:44:59 m-optimism-02 op-node[38623]: t=2024-06-17T09:44:59+0000 lvl=info msg="attempting connection" peer=16Uiu2HAm8MXF3pZ3EhKf4MHYx8wCh5E54oFgDLwboCHxNS3MHVEJ

CLI

op-geth:

usr/local/bin/geth \
    --op-network=op-mainnet \
    --datadir=/var/lib/optimism/data \
    --syncmode=snap \
    --gcmode=archive \
    --http \
    --http.addr=0.0.0.0 \
    --http.port=8545 \
    --http.vhosts="*" \
    --http.corsdomain="*" \
    --http.api=web3,debug,eth,net,engine,geth \
    --ws \
    --ws.port=8546 \
    --ws.addr=0.0.0.0 \
    --ws.origins="*" \
    --ws.api=debug,eth,net,engine \
    --authrpc.addr=127.0.0.1 \
    --authrpc.port=8551 \
    --authrpc.jwtsecret=/var/lib/optimism/jwt.txt \
    --authrpc.vhosts="*" \
    --metrics \
    --metrics.addr=0.0.0.0 \
    --metrics.port=6060 \
    --port=30303 \
    --discovery.port=30303 \
    --maxpeers=500 \
    --rollup.historicalrpc=https://mainnet.optimism.io \
    --rollup.disabletxpoolgossip=true \
    --rollup.sequencerhttp=https://mainnet-sequencer.optimism.io \
    --rollup.superchain-upgrades \
    --rollup.halt=major \
    --verbosity=3

op-node:

/usr/local/bin/op-node \
    --syncmode=execution-layer \
    --l1.trustrpc \
    --l1.rpckind=basic \
    --l1=http://IP_ADDRESSS:8545 \
    --l1.beacon=http://IP_ADDRESSS:5052 \
    --l2=http://127.0.0.1:8551 \
    --rpc.addr=127.0.0.1 \
    --rpc.port=9545 \
    --l2.jwt-secret=/var/lib/optimism/jwt.txt \
    --network=op-mainnet \
    --p2p.peerstore.path=/var/lib/optimism/data/opnode_peerstore_db \
    --p2p.priv.path=/var/lib/optimism/data/opnode_p2p_priv.txt \
    --p2p.discovery.path=/var/lib/optimism/data/opnode_discovery_db \
    --rollup.load-protocol-versions=true \
    --rollup.halt=major
@opfocus
Copy link

opfocus commented Jun 17, 2024

Please add additional information: description of component upgrade process, full log after op-geth startup, other warning logs for op-node (p2p issues are usually not critical issues)

@andreclaro
Copy link
Author

Description of component upgrade process:

  • Compile op-node and op-geth from source
  • Stop op-node and op-geth
  • Start op-geth
  • Wait a few seconds
  • Start op-node

Node is now slowly recovering and showing logs like:

  • "Resuming state snapshot generation"
  • "Imported new potential chain segment"
  • "Chain head was updated"

Op-geth logs:

Jun 17 10:54:13 m-optimism-02 geth[3431]: INFO [06-17|10:54:13.740] Starting work on payload                 id=0x034d324ac0411490
Jun 17 10:54:13 m-optimism-02 geth[3431]: INFO [06-17|10:54:13.759] Aborting state snapshot generation       root=c5204c..ef90dc at=203350..a2bfba accounts=17,495,088 slots=29,517,607 storage=3.29GiB    dangling=0 elapsed=33m44.383s  eta=4h31m5.091s
Jun 17 10:54:13 m-optimism-02 geth[3431]: INFO [06-17|10:54:13.760] Resuming state snapshot generation       root=ca74fd..49a798 at=203350..a2bfba accounts=17,495,088 slots=29,517,607 storage=3.29GiB    dangling=0 elapsed=33m44.384s  eta=4h31m5.099s
Jun 17 10:54:13 m-optimism-02 geth[3431]: INFO [06-17|10:54:13.766] Imported new potential chain segment     number=121,504,439 hash=008c08..454d3c blocks=1 txs=13 mgas=0.884  elapsed=20.687ms    mgasps=42.733  age=3h53m18s snapdiffs=80.09KiB  triedirty=0.00B
Jun 17 10:54:13 m-optimism-02 geth[3431]: INFO [06-17|10:54:13.768] Chain head was updated                   number=121,504,439 hash=008c08..454d3c root=390e44..4c6b24 elapsed=1.223119ms  age=3h53m18s
Jun 17 10:54:13 m-optimism-02 geth[3431]: INFO [06-17|10:54:13.972] Starting work on payload                 id=0x03e1ad5ab32d2b49
Jun 17 10:54:13 m-optimism-02 geth[3431]: INFO [06-17|10:54:13.999] Aborting state snapshot generation       root=ca74fd..49a798 at=20348a..4c7bcf accounts=17,497,643 slots=29,518,485 storage=3.29GiB    dangling=0 elapsed=33m44.623s  eta=4h31m3.881s
Jun 17 10:54:14 m-optimism-02 geth[3431]: INFO [06-17|10:54:14.000] Resuming state snapshot generation       root=2814ad..3125a9 at=20348a..4c7bcf accounts=17,497,643 slots=29,518,485 storage=3.29GiB    dangling=0 elapsed=33m44.624s  eta=4h31m3.889s
Jun 17 10:54:14 m-optimism-02 geth[3431]: INFO [06-17|10:54:14.014] Imported new potential chain segment     number=121,504,440 hash=2cf535..9bd990 blocks=1 txs=14 mgas=3.154  elapsed=34.805ms    mgasps=90.624  age=3h53m17s snapdiffs=83.60KiB  triedirty=0.00B
Jun 17 10:54:14 m-optimism-02 geth[3431]: INFO [06-17|10:54:14.017] Chain head was updated                   number=121,504,440 hash=2cf535..9bd990 root=88bdd1..75cf2f elapsed=1.714696ms  age=3h53m17s

Additionally, I tried to disable snapshot (--snapshot=false) but op-geth is still entering in the generating snapshot

@opfocus
Copy link

opfocus commented Jun 17, 2024

I think snapshot here is not a same concept.
In addition:

Jun 17 10:54:14 m-optimism-02 geth[3431]: INFO [06-17|10:54:14.014] Imported new potential chain segment number=121,504,440 hash=2cf535..9bd990 blocks=1 txs=14 mgas=3.154 elapsed=34.805ms mgasps=90.624 age=3h53m17s snapdiffs=83.60KiB triedirty=0.00B
Jun 17 10:54:14 m-optimism-02 geth[3431]: INFO [06-17|10:54:14.017] Chain head was updated number=121,504,440 hash=2cf535..9bd990 root=88bdd1..75cf2f elapsed= 1.714696ms age=3h53m17s

It may mean that it has returned to normal

@andreclaro
Copy link
Author

btw, snapshot needs to be enabled with we use snap-sync:

log.Warn("Snap sync requested, enabling --snapshot")

@opfocus
Copy link

opfocus commented Jun 17, 2024

btw, snapshot needs to be enabled with we use snap-sync:

log.Warn("Snap sync requested, enabling --snapshot")

Sorry, I don't know much about this. Here may be some relevant information:

  1. Log related:

    // generate is a background thread that iterates over the state and storage tries,

  2. flag default value
    --snapshot (default: true) ($GETH_SNAPSHOT)
    Enables snapshot-database mode (default = enable)

  3. geth documentation part:
    https://geth.ethereum.org/docs/fundamentals/logs#syncing

The crux of the matter is why Generating state snapshot has to happen and is so time consuming, right?
I'm looking for help from others

@andreclaro
Copy link
Author

the issue happen again in mainnet...

Description

"Generating state snapshot" is affecting node, which gets out of sync.

System information

  • op_geth: "v1.101315.2"
  • op_node: "op-node/v1.7.7"
  • Ubuntu

Expected behaviour

Node should not get out of sync.

Actual behaviour

Node gets out of sync by more than 6000 blocks.

Logs

This are the latest logs where the node is also processing blocks ("Imported new potential chain segment" / "Chain head was updated"). However for some time the node is only generating state snapshots and getting further behind the public network.

op-geth

Jun 27 19:57:35 m-optimism-02 geth[1263]: INFO [06-27|19:57:35.352] Generating state snapshot                root=04f327..4a689f in=da96cf..e517af at=b5e103..ba5cef accounts=119,253,539 slots=318,285,026 storage=30.36GiB  dangling=0 elapsed=3h22m40.833s eta=34m41.291s
Jun 27 19:57:35 m-optimism-02 geth[1263]: INFO [06-27|19:57:35.413] Imported new potential chain segment     number=121,953,090 hash=20b718..e1933a blocks=1 txs=7  mgas=2.669  elapsed=477.707ms    mgasps=5.587   age=3h41m38s triedirty=0.00B
Jun 27 19:57:35 m-optimism-02 geth[1263]: INFO [06-27|19:57:35.415] Chain head was updated                   number=121,953,090 hash=20b718..e1933a root=7b0c63..b5761f elapsed=1.015623ms   age=3h41m38s
Jun 27 19:57:35 m-optimism-02 geth[1263]: INFO [06-27|19:57:35.783] Imported new potential chain segment     number=121,953,091 hash=8fea53..930873 blocks=1 txs=7  mgas=2.111  elapsed=365.674ms    mgasps=5.774   age=3h41m36s triedirty=0.00B
Jun 27 19:57:35 m-optimism-02 geth[1263]: INFO [06-27|19:57:35.785] Chain head was updated                   number=121,953,091 hash=8fea53..930873 root=2d1371..874032 elapsed="781.358µs"  age=3h41m36s
Jun 27 19:57:43 m-optimism-02 geth[1263]: INFO [06-27|19:57:43.353] Generating state snapshot                root=04f327..4a689f in=da96cf..e517af at=c11751..b4c505 accounts=119,253,539 slots=318,541,051 storage=30.38GiB  dangling=0 elapsed=3h22m48.834s eta=34m42.66s
Jun 27 19:57:44 m-optimism-02 geth[1263]: INFO [06-27|19:57:44.433] Imported new potential chain segment     number=121,953,092 hash=22d54a..34b31b blocks=1 txs=16 mgas=9.373  elapsed=8.646s       mgasps=1.084   age=3h41m43s triedirty=0.00B
Jun 27 19:57:44 m-optimism-02 geth[1263]: WARN [06-27|19:57:44.434] Ignoring already known beacon payload    number=121,953,092 hash=22d54a..34b31b age=3h41m43s
Jun 27 19:57:44 m-optimism-02 geth[1263]: INFO [06-27|19:57:44.437] Chain head was updated                   number=121,953,092 hash=22d54a..34b31b root=44874c..81f566 elapsed=1.917279ms   age=3h41m43s
Jun 27 19:57:45 m-optimism-02 geth[1263]: INFO [06-27|19:57:45.134] Imported new potential chain segment     number=121,953,093 hash=c321c6..d76921 blocks=1 txs=14 mgas=2.868  elapsed=693.580ms    mgasps=4.134   age=3h41m42s triedirty=0.00B
Jun 27 19:57:45 m-optimism-02 geth[1263]: INFO [06-27|19:57:45.137] Chain head was updated                   number=121,953,093 hash=c321c6..d76921 root=56df25..90bc84 elapsed=1.345763ms   age=3h41m42s
Jun 27 19:57:51 m-optimism-02 geth[1263]: INFO [06-27|19:57:51.371] Generating state snapshot                root=04f327..4a689f in=da96cf..e517af at=cce8b6..5d55bb accounts=119,253,539 slots=318,810,785 storage=30.40GiB  dangling=0 elapsed=3h22m56.851s eta=34m44.032s
Jun 27 19:57:51 m-optimism-02 geth[1263]: INFO [06-27|19:57:51.650] Imported new potential chain segment     number=121,953,094 hash=d20d58..9e51b9 blocks=1 txs=14 mgas=9.753  elapsed=6.508s       mgasps=1.499   age=3h41m46s triedirty=0.00B
Jun 27 19:57:52 m-optimism-02 geth[1263]: WARN [06-27|19:57:52.178] Ignoring already known beacon payload    number=121,953,094 hash=d20d58..9e51b9 age=3h41m47s
Jun 27 19:57:52 m-optimism-02 geth[1263]: INFO [06-27|19:57:52.182] Chain head was updated                   number=121,953,094 hash=d20d58..9e51b9 root=0b5694..997667 elapsed=2.497365ms   age=3h41m47s
Jun 27 19:57:52 m-optimism-02 geth[1263]: INFO [06-27|19:57:52.268] Imported new potential chain segment     number=121,953,095 hash=7217c4..12e630 blocks=1 txs=13 mgas=1.444  elapsed=83.602ms     mgasps=17.268  age=3h41m45s triedirty=0.00B
Jun 27 19:57:52 m-optimism-02 geth[1263]: INFO [06-27|19:57:52.270] Chain head was updated                   number=121,953,095 hash=7217c4..12e630 root=be08ae..a83b79 elapsed="797.269µs"  age=3h41m45
```

op-node
```
Jun 27 19:58:14 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:14+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0xac234f9146c810dea6121552580a4ed4cd864ccccde907b12a09c5120b1ba426:121959758
Jun 27 19:58:15 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:15+0000 lvl=info msg="attempting connection" peer=16Uiu2HAkxHXmw5iw5amSc32mTxoTo2R2gZKrAfx143uBbZ9P6QQr
Jun 27 19:58:15 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:15+0000 lvl=info msg="Received signed execution payload from p2p" id=0xed2040c95470d3ab3c66b398b2d66387185658a2898a9151fab17a6b8b4ca553:121959759 peer=16Uiu2HAkzc1eByUyVc8tmu97dXYwehh6QTo5MxK5VFTjHH4wkj6J
Jun 27 19:58:15 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:15+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0xed2040c95470d3ab3c66b398b2d66387185658a2898a9151fab17a6b8b4ca553:121959759
Jun 27 19:58:17 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:17+0000 lvl=info msg="Sync progress" reason="new chain head block" l2_finalized=0x22f88dea6338b4e91c6e68a4e71424b00079c36e7751a7e801a7c7499d3690f7:121952848 l2_safe=0x22f88dea6338b4e91c6e68a4e71424b00079c36e7751a7e801a7c7499d3690f7:121952848 l2_pending_safe=0x87dfd4a049bc84d2c31bd4a7dfa0b84e95ef9b7c0537253ec9277151b3b305f7:121952940 l2_unsafe=0xc97b9699f06ee17ac114e94ce2755c5aae861c6130affe754ad5883b02b7188c:121953102 l2_backup_unsafe=0x0000000000000000000000000000000000000000000000000000000000000000:0 l2_time=1719504981
Jun 27 19:58:17 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:17+0000 lvl=info msg="Received signed execution payload from p2p" id=0x999e1d918794702d659d4fa32e79beff68322863fe94385d14add76198f6de1b:121959760 peer=16Uiu2HAkzc1eByUyVc8tmu97dXYwehh6QTo5MxK5VFTjHH4wkj6J
Jun 27 19:58:17 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:17+0000 lvl=info msg="Sync progress" reason="new chain head block" l2_finalized=0x22f88dea6338b4e91c6e68a4e71424b00079c36e7751a7e801a7c7499d3690f7:121952848 l2_safe=0x22f88dea6338b4e91c6e68a4e71424b00079c36e7751a7e801a7c7499d3690f7:121952848 l2_pending_safe=0x87dfd4a049bc84d2c31bd4a7dfa0b84e95ef9b7c0537253ec9277151b3b305f7:121952940 l2_unsafe=0x55f4f8e3e37df26f8c43d58ee8c5dd7ac206d03a62a9314d08354047ae843a59:121953103 l2_backup_unsafe=0x0000000000000000000000000000000000000000000000000000000000000000:0 l2_time=1719504983
Jun 27 19:58:17 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:17+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0x999e1d918794702d659d4fa32e79beff68322863fe94385d14add76198f6de1b:121959760
Jun 27 19:58:18 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:18+0000 lvl=info msg="attempting connection" peer=16Uiu2HAkzg2h8aN2gAq497tkHmN5hDy9EaKAkhbMScmCAbBPxNwJ
Jun 27 19:58:18 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:18+0000 lvl=info msg="attempting connection" peer=16Uiu2HAm1zB9TgXxvtxjLXXJfhRw68ZMeCoiJ7BwSXKY2AbDAtVU
Jun 27 19:58:19 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:19+0000 lvl=info msg="Received signed execution payload from p2p" id=0x7ab504f8fb4452ca8c1fd749fec06de95a94ac93cfee6eab02b375a5bdb485d9:121959761 peer=16Uiu2HAkzc1eByUyVc8tmu97dXYwehh6QTo5MxK5VFTjHH4wkj6J
Jun 27 19:58:20 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:20+0000 lvl=info msg="attempting connection" peer=16Uiu2HAm7v25THFJZJvkoK3bH4Wqjga9P96uWgBLRGQSYqggDA2N
Jun 27 19:58:20 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:20+0000 lvl=info msg="attempting connection" peer=16Uiu2HAm1ftqsNaMCUi6JhQ6RdZrGvQ3Vt1d69xv1ZqzVMac8Cki
Jun 27 19:58:20 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:20+0000 lvl=info msg="attempting connection" peer=16Uiu2HAm5Yo4tL1PnjeiVAvgKtr8vWzMTLSQs9xd7vYUC2fS4GJC
Jun 27 19:58:21 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:21+0000 lvl=info msg="attempting connection" peer=16Uiu2HAmNytWmvGV1UHuUBngGFxyNsLkquzzCLeUoTLUGUrjW5Ju
Jun 27 19:58:21 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:21+0000 lvl=info msg="attempting connection" peer=16Uiu2HAmFBzRPMeAkNUipyJ1DEN5d9XyvvNvEP41xotCiPhCb6Kz
Jun 27 19:58:21 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:21+0000 lvl=info msg="Received signed execution payload from p2p" id=0xaac5384d7406582c361803169b19e1f1fde1ff67f118bfe01662253675302340:121959762 peer=16Uiu2HAkzc1eByUyVc8tmu97dXYwehh6QTo5MxK5VFTjHH4wkj6J
Jun 27 19:58:22 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:22+0000 lvl=error msg="Payload execution failed" block_hash=0xf3e2cf9db49412ae31e6d9c579c1cbd96529e9f3e3452df27eaa07a78ee084b7 err="Post \"http://127.0.0.1:8551\": context deadline exceeded"
Jun 27 19:58:22 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:22+0000 lvl=warn msg="Derivation process temporary error" attempts=1 err="temp: failed to update insert payload: failed to execute payload: Post \"http://127.0.0.1:8551\": context deadline exceeded"
Jun 27 19:58:22 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:22+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0x7ab504f8fb4452ca8c1fd749fec06de95a94ac93cfee6eab02b375a5bdb485d9:121959761
Jun 27 19:58:22 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:22+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0xaac5384d7406582c361803169b19e1f1fde1ff67f118bfe01662253675302340:121959762
Jun 27 19:58:23 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:23+0000 lvl=info msg="Received signed execution payload from p2p" id=0xe3607546f535e3b7b62bffecf741b044f08dac778afefe7c98f6a365349f325f:121959763 peer=16Uiu2HAkzc1eByUyVc8tmu97dXYwehh6QTo5MxK5VFTjHH4wkj6J
Jun 27 19:58:23 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:23+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0xe3607546f535e3b7b62bffecf741b044f08dac778afefe7c98f6a365349f325f:121959763
Jun 27 19:58:25 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:25+0000 lvl=info msg="attempting connection" peer=16Uiu2HAm2vXWR3ooDT3dnM9JuaHzeNysTmQV3tsqhXC8p1aFn3V7
Jun 27 19:58:25 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:25+0000 lvl=info msg="Received signed execution payload from p2p" id=0xcde7b7508a24f4e123734b96e724ff84d17f0afc73eb6f35ba77e87e3413345b:121959764 peer=16Uiu2HAkzc1eByUyVc8tmu97dXYwehh6QTo5MxK5VFTjHH4wkj6J
Jun 27 19:58:27 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:27+0000 lvl=info msg="Received signed execution payload from p2p" id=0xd4c6edd585941760e8ffc698763ed40acf3964c8176aa92ef104b39f9881ce4a:121959765 peer=16Uiu2HAkzc1eByUyVc8tmu97dXYwehh6QTo5MxK5VFTjHH4wkj6J
Jun 27 19:58:28 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:28+0000 lvl=info msg="attempting connection" peer=16Uiu2HAmAPxaqqrjA8p2yUD2rma2pZoaJJaMory5af7x5vfKqBLG
Jun 27 19:58:29 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:29+0000 lvl=info msg="Received signed execution payload from p2p" id=0xb1b26df341b66bb777847bc5138605f6f222fab57d195cb6fa12a0f2ab52de57:121959766 peer=16Uiu2HAkzc1eByUyVc8tmu97dXYwehh6QTo5MxK5VFTjHH4wkj6J
Jun 27 19:58:29 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:29+0000 lvl=error msg="Payload execution failed" block_hash=0xf3e2cf9db49412ae31e6d9c579c1cbd96529e9f3e3452df27eaa07a78ee084b7 err="Post \"http://127.0.0.1:8551\": context deadline exceeded"
Jun 27 19:58:29 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:29+0000 lvl=warn msg="Derivation process temporary error" attempts=2 err="temp: failed to update insert payload: failed to execute payload: Post \"http://127.0.0.1:8551\": context deadline exceeded"
Jun 27 19:58:29 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:29+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0xcde7b7508a24f4e123734b96e724ff84d17f0afc73eb6f35ba77e87e3413345b:121959764
Jun 27 19:58:29 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:29+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0xd4c6edd585941760e8ffc698763ed40acf3964c8176aa92ef104b39f9881ce4a:121959765
Jun 27 19:58:29 m-optimism-02 op-node[1264]: t=2024-06-27T19:58:29+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0xb1b26df341b66bb777847bc5138605f6f222fab57d195cb6fa12a0f2ab52de57:121959766
```

#### CLIs

op-geth:

```
/usr/local/bin/geth \
    --op-network=op-mainnet \
    --datadir=/var/lib/optimism/data \
    --syncmode=snap \
    --gcmode=archive \
    --http \
    --http.addr=0.0.0.0 \
    --http.port=8545 \
    --http.vhosts="*" \
    --http.corsdomain="*" \
    --http.api=web3,debug,eth,net,engine,geth \
    --ws \
    --ws.port=8546 \
    --ws.addr=0.0.0.0 \
    --ws.origins="*" \
    --ws.api=debug,eth,net,engine \
    --authrpc.addr=127.0.0.1 \
    --authrpc.port=8551 \
    --authrpc.jwtsecret=/var/lib/optimism/jwt.txt \
    --authrpc.vhosts="*" \
    --metrics \
    --metrics.addr=0.0.0.0 \
    --metrics.port=6060 \
    --port=30303 \
    --discovery.port=30303 \
    --maxpeers=500 \
    --rollup.historicalrpc=https://mainnet.optimism.io \
    --rollup.disabletxpoolgossip=true \
    --rollup.sequencerhttp=https://mainnet-sequencer.optimism.io \
    --rollup.superchain-upgrades \
    --rollup.halt=major \
    --verbosity=3
```

op-node

```
/usr/local/bin/op-node \
    --syncmode=execution-layer \
    --l1.trustrpc \
    --l1.rpckind=basic \
    --l1=http://ETH_ADDRESS:8545 \
    --l1.beacon=http://ETH_ADDRESS:5052 \
    --l2=http://127.0.0.1:8551 \
    --rpc.addr=127.0.0.1 \
    --rpc.port=9545 \
    --l2.jwt-secret=/var/lib/optimism/jwt.txt \
    --network=op-mainnet \
    --p2p.peerstore.path=/var/lib/optimism/data/opnode_peerstore_db \
    --p2p.priv.path=/var/lib/optimism/data/opnode_p2p_priv.txt \
    --p2p.discovery.path=/var/lib/optimism/data/opnode_discovery_db \
    --rollup.load-protocol-versions=true \
    --rollup.halt=major
```

@andreclaro
Copy link
Author

When I restart de node it starts again "Generating state snapshot" without processing new blocks. Logs:

Jun 27 20:23:34 m-optimism-02 geth[1441]: INFO [06-27|20:23:34.008] Generating state snapshot                root=cd7db8..47e256 at=000bb4..5adcf6 accounts=25092 slots=14909 storage=2.76MiB dangling=0 elapsed=8.004s  eta=12h26m55.013s
Jun 27 20:23:36 m-optimism-02 geth[1441]: INFO [06-27|20:23:36.608] Looking for peers                        peercount=3 tried=223 static=0
Jun 27 20:23:43 m-optimism-02 geth[1441]: INFO [06-27|20:23:43.244] Generating state snapshot                root=cd7db8..47e256 at=002561..0a9945 accounts=80007 slots=54874 storage=9.15MiB dangling=0 elapsed=16.009s eta=7h47m34.094s
Jun 27 20:23:51 m-optimism-02 geth[1441]: INFO [06-27|20:23:51.247] Generating state snapshot                root=cd7db8..47e256 at=00450f..aa0c3d accounts=147,852 slots=86056 storage=15.94MiB dangling=0 elapsed=24.012s eta=6h19m23.436s
Jun 27 20:23:59 m-optimism-02 geth[1441]: INFO [06-27|20:23:59.248] Generating state snapshot                root=cd7db8..47e256 in=005784..87caff at=338140..f1fe8b accounts=187,139 slots=250,057 storage=31.38MiB dangling=0 elapsed=32.013s eta=6h39m0.586s
Jun 27 20:24:07 m-optimism-02 geth[1441]: INFO [06-27|20:24:07.255] Generating state snapshot                root=cd7db8..47e256 in=0067b0..13c55c at=c035c1..a94276 accounts=221,556 slots=389,421 storage=44.41MiB dangling=0 elapsed=40.021s eta=7h0m55.19s
Jun 27 20:24:15 m-optimism-02 geth[1441]: INFO [06-27|20:24:15.259] Generating state snapshot                root=cd7db8..47e256 in=008a8f..c61aab at=1fbf5b..09ee45 accounts=295,951 slots=437,341 storage=52.91MiB dangling=0 elapsed=48.025s eta=6h17m47.152s
Jun 27 20:24:23 m-optimism-02 geth[1441]: INFO [06-27|20:24:23.276] Generating state snapshot                root=cd7db8..47e256 in=00acea..20c5cb at=0025e0..faec1c accounts=369,615 slots=544,266 storage=65.75MiB dangling=0 elapsed=56.041s eta=5h53m3.985s
Jun 27 20:24:31 m-optimism-02 geth[1441]: INFO [06-27|20:24:31.279] Generating state snapshot                root=cd7db8..47e256 in=00b43f..448241 at=d36e15..676750 accounts=385,416 slots=936,255 storage=98.46MiB dangling=0 elapsed=1m4.044s eta=6h27m2.078s
Jun 27 20:24:39 m-optimism-02 geth[1441]: INFO [06-27|20:24:39.279] Generating state snapshot                root=cd7db8..47e256 at=00d252..21f97f accounts=449,693 slots=1,038,518 storage=110.85MiB dangling=0 elapsed=1m12.045s eta=6h12m57.351s

@andreclaro
Copy link
Author

@opfocus , is this an issue that needs to be fixed?

Can the "generating snapshot" process be the cause of the node to run out of-sync???

should we use "--snapshot=false" to avoid this issue?

@opfocus
Copy link

opfocus commented Jun 28, 2024

Can you investigate the performance of your server, which appears in the op-node log above. "context deadline exceeded" messages that indicate things are taking too long (hence deadline exceeded). I am not sure

@andreclaro
Copy link
Author

I have increased resources allocation (cpu and memory) and disk is an io2 ebs volume with 20k iops, but I still see "context deadline exceeded" in the op-node logs. Op-geth finished "generating snapshot" but node is still 2000 block behind.

op-geth:

Jun 28 07:59:44 m-optimism-02 geth[4648]: INFO [06-28|07:59:44.509] Imported new potential chain segment     number=121,978,515 hash=44e58f..fae752 blocks=1 txs=12  mgas=0.830  elapsed=203.124ms    mgasps=4.084   age=1h36m17s triedirty=0.00B
Jun 28 07:59:44 m-optimism-02 geth[4648]: INFO [06-28|07:59:44.510] Chain head was updated                   number=121,978,515 hash=44e58f..fae752 root=e362f9..0372a0 elapsed="637.619µs"  age=1h36m17s
Jun 28 07:59:55 m-optimism-02 geth[4648]: INFO [06-28|07:59:55.194] Imported new potential chain segment     number=121,978,516 hash=4099bd..b1623d blocks=1 txs=8   mgas=13.746 elapsed=10.681s      mgasps=1.287   age=1h36m26s triedirty=0.00B
Jun 28 07:59:55 m-optimism-02 geth[4648]: WARN [06-28|07:59:55.194] Ignoring already known beacon payload    number=121,978,516 hash=4099bd..b1623d age=1h36m26s
Jun 28 07:59:55 m-optimism-02 geth[4648]: INFO [06-28|07:59:55.198] Chain head was updated                   number=121,978,516 hash=4099bd..b1623d root=3eea7c..c216ee elapsed=2.786871ms   age=1h36m26s
Jun 28 07:59:55 m-optimism-02 geth[4648]: INFO [06-28|07:59:55.663] Imported new potential chain segment     number=121,978,517 hash=ffc39b..26367f blocks=1 txs=10  mgas=2.271  elapsed=463.523ms    mgasps=4.899   age=1h36m24s triedirty=0.00B
Jun 28 07:59:55 m-optimism-02 geth[4648]: INFO [06-28|07:59:55.665] Chain head was updated                   number=121,978,517 hash=ffc39b..26367f root=f5841e..da457c elapsed="775.678µs"  age=1h36m24s

op-node:

Jun 28 08:00:19 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:19+0000 lvl=info msg="Received signed execution payload from p2p" id=0x77407905f683b97277d76c35c37ecf23835ec5ddf972726b97f4b18bec728dfd:121981421 peer=16Uiu2HAm9XhrnZKYNd4bpUpjuWbEPix3kJwMpMGSpcjwcydKRcp8
Jun 28 08:00:20 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:20+0000 lvl=info msg="attempting connection" peer=16Uiu2HAm3eqrDVcH2pFaBuoFPgh7ncYhpTEUNEHYc1gKnsW21B7z
Jun 28 08:00:21 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:21+0000 lvl=info msg="Received signed execution payload from p2p" id=0x30c756021473fef0ddb7b803cd4ec3702437a3c5a0068e4577e65b743ab07cfd:121981422 peer=16Uiu2HAm9XhrnZKYNd4bpUpjuWbEPix3kJwMpMGSpcjwcydKRcp8
Jun 28 08:00:21 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:21+0000 lvl=info msg="attempting connection" peer=16Uiu2HAm4pBud6PmDkq72TTydHYzNAZbvibaPebNvboABjAS8CJv
Jun 28 08:00:22 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:22+0000 lvl=error msg="Payload execution failed" block_hash=0x6b34a50f5654a4bbfe5a0039a47b360ddb64d9888225f492f3cd2bec2600813b err="Post \"http://127.0.0.1:8551\": context deadline exceeded"
Jun 28 08:00:22 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:22+0000 lvl=warn msg="Derivation process temporary error" attempts=1 err="temp: failed to update insert payload: failed to execute payload: Post \"http://127.0.0.1:8551\": context deadline exceeded"
Jun 28 08:00:22 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:22+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0x77407905f683b97277d76c35c37ecf23835ec5ddf972726b97f4b18bec728dfd:121981421
Jun 28 08:00:22 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:22+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0x30c756021473fef0ddb7b803cd4ec3702437a3c5a0068e4577e65b743ab07cfd:121981422
Jun 28 08:00:23 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:23+0000 lvl=info msg="Received signed execution payload from p2p" id=0x8a792649348d9768140b555a7dc4ba33ae938328d400434573a6da817083cf31:121981423 peer=16Uiu2HAm9XhrnZKYNd4bpUpjuWbEPix3kJwMpMGSpcjwcydKRcp8
Jun 28 08:00:23 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:23+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0x8a792649348d9768140b555a7dc4ba33ae938328d400434573a6da817083cf31:121981423
Jun 28 08:00:24 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:24+0000 lvl=info msg="Sync progress" reason="new chain head block" l2_finalized=0x6acd813fd568864eafbde7e98f1648a8e91884eb152ce6291fa687327aa33991:121973468 l2_safe=0x6acd813fd568864eafbde7e98f1648a8e91884eb152ce6291fa687327aa33991:121973468 l2_pending_safe=0x6acd813fd568864eafbde7e98f1648a8e91884eb152ce6291fa687327aa33991:121973468 l2_unsafe=0x6b34a50f5654a4bbfe5a0039a47b360ddb64d9888225f492f3cd2bec2600813b:121978525 l2_backup_unsafe=0x0000000000000000000000000000000000000000000000000000000000000000:0 l2_time=1719555827
Jun 28 08:00:25 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:25+0000 lvl=info msg="Received signed execution payload from p2p" id=0x3a361ec79f6b119b454cb322565691e4c6e49f2143375e8a94058a142802086e:121981424 peer=16Uiu2HAm9XhrnZKYNd4bpUpjuWbEPix3kJwMpMGSpcjwcydKRcp8
Jun 28 08:00:26 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:26+0000 lvl=info msg="attempting connection" peer=16Uiu2HAmEAGuCYWNsqWR8E4LPqGvfCBR222nUDqr81Un1Psm4roM
Jun 28 08:00:27 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:27+0000 lvl=info msg="attempting connection" peer=16Uiu2HAmDUGWtUgtExBhrNCmvkGJWgpZRKMbV68nkGhYCNtoW1NV
Jun 28 08:00:27 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:27+0000 lvl=info msg="Received signed execution payload from p2p" id=0x8f7fbfbc0e34edc0311b1909da007c64d5ae556615c87225352a417857b09857:121981425 peer=16Uiu2HAm9XhrnZKYNd4bpUpjuWbEPix3kJwMpMGSpcjwcydKRcp8
Jun 28 08:00:29 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:29+0000 lvl=info msg="Received signed execution payload from p2p" id=0xd527eb96fd7a380bdcc921b75c0bdd05ddd6181e778aa15f4711d5a0facca53c:121981426 peer=16Uiu2HAm9XhrnZKYNd4bpUpjuWbEPix3kJwMpMGSpcjwcydKRcp8
Jun 28 08:00:29 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:29+0000 lvl=error msg="Payload execution failed" block_hash=0x017b5234e2e0caf6f66ff49ae90758da353056c778e561fa20dd6b36852638c6 err="Post \"http://127.0.0.1:8551\": context deadline exceeded"
Jun 28 08:00:29 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:29+0000 lvl=warn msg="Derivation process temporary error" attempts=1 err="temp: failed to update insert payload: failed to execute payload: Post \"http://127.0.0.1:8551\": context deadline exceeded"
Jun 28 08:00:29 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:29+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0x3a361ec79f6b119b454cb322565691e4c6e49f2143375e8a94058a142802086e:121981424
Jun 28 08:00:29 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:29+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0x8f7fbfbc0e34edc0311b1909da007c64d5ae556615c87225352a417857b09857:121981425
Jun 28 08:00:29 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:29+0000 lvl=info msg="Optimistically queueing unsafe L2 execution payload" id=0xd527eb96fd7a380bdcc921b75c0bdd05ddd6181e778aa15f4711d5a0facca53c:121981426
Jun 28 08:00:30 m-optimism-02 op-node[4649]: t=2024-06-28T08:00:30+0000 lvl=info msg="attempting connection" peer=16Uiu2HAmLMff8TyjBXV22hfiaPWiYCp5t7dztkNeF4qdSeczPhEh

@andreclaro
Copy link
Author

Every time I restart op-geth, it starts to generate snapshots:

Jun 28 08:03:57 m-optimism-02 geth[1333040]: INFO [06-28|08:03:57.042] Generating state snapshot                root=f37bb2..ca4435 in=005784..87caff at=03a62f..3a611d accounts=187,467 slots=233,195 storage=30.04MiB dangling=0 elapsed=32.043s  eta=6h39m23.02s
Jun 28 08:04:05 m-optimism-02 geth[1333040]: INFO [06-28|08:04:05.043] Generating state snapshot                root=f37bb2..ca4435 at=005c86..10f356 accounts=198,112 slots=331,944 storage=38.71MiB dangling=0 elapsed=40.044s  eta=7h52m4.109s
Jun 28 08:04:07 m-optimism-02 geth[1333040]: INFO [06-28|08:04:07.572] New local node record                    seq=1,717,531,279,945 id=4857b91b76a03e11 ip=108.128.34.94 udp=30303 tcp=30303
Jun 28 08:04:13 m-optimism-02 geth[1333040]: INFO [06-28|08:04:13.052] Generating state snapshot                root=f37bb2..ca4435 at=007bf3..ac002a accounts=265,159 slots=417,557 storage=49.38MiB dangling=0 elapsed=48.054s  eta=7h2m40.094s
Jun 28 08:04:21 m-optimism-02 geth[1333040]: INFO [06-28|08:04:21.055] Generating state snapshot                root=f37bb2..ca4435 at=009859..93b546 accounts=326,266 slots=520,921 storage=61.19MiB dangling=0 elapsed=56.056s  eta=6h40m58.05s
Jun 28 08:04:29 m-optimism-02 geth[1333040]: INFO [06-28|08:04:29.056] Generating state snapshot                root=f37bb2..ca4435 at=00b425..8207dd accounts=385,846 slots=554,450 storage=67.56MiB dangling=0 elapsed=1m4.057s eta=6h27m19.945s
Jun 28 08:04:37 m-optimism-02 geth[1333040]: INFO [06-28|08:04:37.060] Generating state snapshot                root=f37bb2..ca4435 at=00bccc..933349 accounts=404,529 slots=1,022,520 storage=106.67MiB dangling=0 elapsed=1m12.062s eta=6h55m42.248s
Jun 28 08:04:45 m-optimism-02 geth[1333040]: INFO [06-28|08:04:45.062] Generating state snapshot                root=f37bb2..ca4435 at=00e0cf..0ab337 accounts=481,493 slots=1,149,376 storage=120.55MiB dangling=0 elapsed=1m20.063s eta=6h27m40.115s
Jun 28 08:04:53 m-optimism-02 geth[1333040]: INFO [06-28|08:04:53.064] Generating state snapshot                root=f37bb2..ca4435 at=010102..2d7a89 accounts=550,072 slots=1,263,094 storage=133.40MiB dangling=0 elapsed=1m28.065s eta=6h12m48.312s
Jun 28 08:05:01 m-optimism-02 geth[1333040]: INFO [06-28|08:05:01.064] Generating state snapshot                root=f37bb2..ca4435 at=0126ca..4e7220 accounts=630,803 slots=1,348,877 storage=145.33MiB dangling=0 elapsed=1m36.066s eta=5h54m21.044s

is this related with snap sync?

@andreclaro
Copy link
Author

@opfocus , can you ask a developer to have a look at this issue and logs?

IMO, this is not related with server resources... We are currently using a VM with 16 cpu cores and 64Gb (m6a.4xlarge), and still getting these warning messages:

Jun 28 08:15:18 m-optimism-02 op-node[1334149]: t=2024-06-28T08:15:18+0000 lvl=warn msg="Failed to share forkchoice-updated signal" state="&{HeadBlockHash:0xbce3d67769cdd4063a2abbc49032df9f90df22f57c1d7ee98eb225313f7ffec3 SafeBlockHash:0xc8e677bd7e1667e0f9644906e55c5a0a203f995aa39119f7bfebe0a9238e5fd7 FinalizedBlockHash:0xc8e677bd7e1667e0f9644906e55c5a0a203f995aa39119f7bfebe0a9238e5fd7}" err="Post \"http://127.0.0.1:8551\": context deadline exceeded"
Jun 28 08:15:18 m-optimism-02 op-node[1334149]: t=2024-06-28T08:15:18+0000 lvl=warn msg="Derivation process temporary error" attempts=1 err="engine stage failed: temp: temporarily cannot insert new safe block: failed to create new block via forkchoice: Post \"http://127.0.0.1:8551\": context deadline exceeded"

and the node is still running behind and unable to catch up...

@opfocus
Copy link

opfocus commented Jun 28, 2024

I will share it on Discord, as it seems I cannot tag others here. There should be three issues so far:

  1. Is the lag of 2000 blocks a persistent issue? According to the synchronization speed in the op-geth logs, it seems it will catch up soon.
  2. The process of "Generating state snapshot" after a restart seems to be very time-consuming.
  3. The issue of "context deadline exceeded."

Perhaps you should clarify the first issue: Will it always lag for some reason, making it impossible to sync to the latest block?

@andreclaro
Copy link
Author

andreclaro commented Jun 28, 2024

Now is unable to catch up... the recent chain head update age is 3h4m52s

Jun 28 12:01:07 m-optimism-02 geth[1333040]: INFO [06-28|12:01:07.858] Chain head was updated                   number=121,983,099 hash=2d6448..5d5f4b root=f66e82..c0682e elapsed="809.109µs"  age=3h4m52s

and it is still generating snapshots:

Jun 28 12:02:22 m-optimism-02 geth[1333040]: INFO [06-28|12:02:22.858] Generating state snapshot                root=f37bb2..ca4435 at=c74cb7..7c5f48 accounts=108,920,752 slots=290,736,814 storage=27.63GiB  dangling=0 elapsed=3h58m57.859s eta=1h7m59.09s

Every time I see this lagging, the node is generating the state snapshot... If I restart the node it seems to start generating the state snapshot from the begging....

I have increased all the resources allocation (cpu, memory and storage settings) but it does not help... :/

@imtipi
Copy link

imtipi commented Jun 28, 2024

can you use top to see your wa status?want to ensure this isn't disk io issue
image

@andreclaro
Copy link
Author

image

image

@imtipi
Copy link

imtipi commented Jun 28, 2024

can you also run iostat -xm?want to check the actual iops

@andreclaro
Copy link
Author

andre@m-optimism-02:~$ iostat -xm nvme1n1
Linux 6.2.0-1012-aws (m-optimism-02) 	06/28/24 	_x86_64_	(16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.04    0.00    3.03    8.65    0.00   77.27

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1       1953.46    181.42     0.42   0.02    0.76    95.10  278.68     19.48     0.00   0.00    0.50    71.59    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.63  96.77

@imtipi
Copy link

imtipi commented Jun 28, 2024

seems io is ok,looks like Generating state snapshot is required in snap sync,I will recommend try with Archive sync,so that you don't need to

rebuild any intermediate state "on-the-fly"

because you already

executes every block in the chain

https://docs.optimism.io/builders/node-operators/management/snap-sync

we can also wait developer to check this thread,because I didn't test it before.

@andreclaro
Copy link
Author

andreclaro commented Jun 28, 2024

Can I change the node syncmode from snap to full now? 🤔

Yes, we need a developer to look into this issue... this shouldn't be the normal behavior...

@imtipi
Copy link

imtipi commented Jun 28, 2024

Can I change the node syncmode from snap to full now? 🤔

better delete the datadir to start again

@andreclaro
Copy link
Author

andreclaro commented Jun 28, 2024

it is important to add that we don't see this problem with base mainnet... and both networks have the same configuration.
I see base mainnet also generating state snapshots but no impact in the node height.

I'm not sure if the generating state snapshot is the issue... but the optimism mainnet node is unable to catch up.. it is now 4h9m3s behind... :/

Jun 28 15:40:28 m-optimism-02 geth[1333040]: INFO [06-28|15:40:28.459] Chain head was updated                   number=121,987,754 hash=4496bf..24c345 root=5766d9..2d1d1f elapsed="788.357µs"  age=4h9m3s

@Chomtana
Copy link

Try resyncing again but delete

--gcmode=archive

I don't think snap sync works with archive mode

With archive mode, you must sync from a snapshot. Otherwise, it will take forever (months or even year).

@andreclaro
Copy link
Author

we are now observing the same issue with optimism sepolia node:

Jul 03 09:12:27 t-optimism-sepolia-02 geth[775]: INFO [07-03|09:12:27.995] Generating state snapshot                root=d90958..08bb3c in=f95b39..df95f6 at=a790be..9c4263 accounts=3,107,081 slots=103,483,294 storage=7.58GiB   dangling=0 elapsed=40m10.097s eta=1m4.212s
Jul 03 09:12:35 t-optimism-sepolia-02 geth[775]: INFO [07-03|09:12:35.998] Generating state snapshot                root=d90958..08bb3c in=f95b39..df95f6 at=a79956..920d49 accounts=3,107,081 slots=103,487,411 storage=7.58GiB   dangling=0 elapsed=40m18.100s eta=1m4.426s
Jul 03 09:12:43 t-optimism-sepolia-02 geth[775]: INFO [07-03|09:12:43.999] Generating state snapshot                root=d90958..08bb3c in=f95b39..df95f6 at=a7a271..8e966c accounts=3,107,081 slots=103,491,709 storage=7.58GiB   dangling=0 elapsed=40m26.101s eta=1m4.639s
Jul 03 09:12:52 t-optimism-sepolia-02 geth[775]: INFO [07-03|09:12:52.000] Generating state snapshot                root=d90958..08bb3c in=f95b39..df95f6 at=a7ab68..bb3b96 accounts=3,107,081 slots=103,496,012 storage=7.58GiB   dangling=0 elapsed=40m34.102s eta=1m4.852s
Jul 03 09:13:00 t-optimism-sepolia-02 geth[775]: INFO [07-03|09:13:00.001] Generating state snapshot                root=d90958..08bb3c in=f95b39..df95f6 at=a7b4ee..4c9522 accounts=3,107,081 slots=103,500,509 storage=7.58GiB   dangling=0 elapsed=40m42.103s eta=1m5.065s
Jul 03 09:13:08 t-optimism-sepolia-02 geth[775]: INFO [07-03|09:13:08.002] Generating state snapshot                root=d90958..08bb3c in=f95b39..df95f6 at=a7bd74..86044c accounts=3,107,081 slots=103,504,711 storage=7.58GiB   dangling=0 elapsed=40m50.104s eta=1m5.278s
Jul 03 09:13:16 t-optimism-sepolia-02 geth[775]: INFO [07-03|09:13:16.003] Generating state snapshot                root=d90958..08bb3c in=f95b39..df95f6 at=a7c66f..7a4136 accounts=3,107,081 slots=103,508,873 storage=7.58GiB   dangling=0 elapsed=40m58.105s eta=1m5.491s
Jul 03 09:13:24 t-optimism-sepolia-02 geth[775]: INFO [07-03|09:13:24.005] Generating state snapshot                root=d90958..08bb3c in=f95b39..df95f6 at=a7cf1e..fb3212 accounts=3,107,081 slots=103,513,032 storage=7.58GiB   dangling=0 elapsed=41m6.107s  eta=1m5.705s
Jul 03 09:13:32 t-optimism-sepolia-02 geth[775]: INFO [07-03|09:13:32.007] Generating state snapshot                root=d90958..08bb3c in=f95b39..df95f6 at=a7d7ba..de0794 accounts=3,107,081 slots=103,517,143 storage=7.58GiB   dangling=0 elapsed=41m14.109s eta=1m5.918s
Jul 03 09:13:40 t-optimism-sepolia-02 geth[775]: INFO [07-03|09:13:40.016] Generating state snapshot                root=d90958..08bb3c in=f95b39..df95f6 at=a7e07d..89d2fe accounts=3,107,081 slots=103,521,303 storage=7.58GiB   dangling=0 elapsed=41m22.118s eta=1m6.131s
Jul 03 09:13:48 t-optimism-sepolia-02 geth[775]: INFO [07-03|09:13:48.016] Generating state snapshot                root=d90958..08bb3c in=f95b39..df95f6 at=a7e960..3fab59 accounts=3,107,081 slots=103,525,560 storage=7.58GiB   dangling=0 elapsed=41m30.118s eta=1m6.344s
Jul 03 09:13:56 t-optimism-sepolia-02 geth[775]: INFO [07-03|09:13:56.018] Generating state snapshot                root=d90958..08bb3c in=f95b39..df95f6 at=a7f22c..2096c9 accounts=3,107,081 slots=103,529,838 storage=7.58GiB   dangling=0 elapsed=41m38.120s eta=1m6.558s
Jul 03 09:14:04 t-optimism-sepolia-02 geth[775]: INFO [07-03|09:14:04.021] Generating state snapshot                root=d90958..08bb3c in=f95b39..df95f6 at=a7fa73..b26d99 accounts=3,107,081 slots=103,533,718 storage=7.58GiB   dangling=0 elapsed=41m46.123s eta=1m6.771s
Jul 03 09:14:12 t-optimism-sepolia-02 geth[775]: INFO [07-03|09:14:12.023] Generating state snapshot                root=d90958..08bb3c in=f95b39..df95f6 at=a80308..ceebdd accounts=3,107,081 slots=103,537,755 storage=7.58GiB   dangling=0 elapsed=41m54.125s eta=1m6.984s
Jul 03 09:14:20 t-optimism-sepolia-02 geth[775]: INFO [07-03|09:14:20.023] Generating state snapshot                root=d90958..08bb3c in=f95b39..df95f6 at=a80b8a..a83bcb accounts=3,107,081 slots=103,541,866 storage=7.58GiB   dangling=0 elapsed=42m2.125s  eta=1m7.197s

@andreclaro
Copy link
Author

@imtipi @@opfocus , I do think developers should look into these issues.... 🙏🏽

@imtipi
Copy link

imtipi commented Jul 3, 2024

@imtipi @@opfocus , I do think developers should look into these issues.... 🙏🏽

soyboy(from dev team) also think snap sync isn't for archive node

I came to the same conclusion on snapsync. Probably not for archive nodes. Seems like it just does archiving once it catches up and then moving forward. Not really useful imo

@andreclaro
Copy link
Author

I will keep monitoring our optimism and base archive nodes using snap sync, and let you know if I see more issues.

BTW, Base nodes have been running without issues.

@leiqing02
Copy link

i hava the same question,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants