[BUG] - ChainTransitionError #2725

Closed
maierfelix opened this issue May 21, 2021 · 55 comments
Labels
bug Something isn't working

Comments

@maierfelix

maierfelix commented May 21, 2021

Internal/External
External

Summary
Unable to synchronize when setting up a new node

Steps to reproduce

  1. Download the latest Cardano node build (I'm using version 1.27.0 and build 6413627)
  2. Download configs and genesis file (I'm using version 6426791):
export LAST_BUILD=$(curl -s https://hydra.iohk.io/job/Cardano/cardano-node/cardano-deployment/latest-finished/download/1/index.html | grep -e "This item has moved" |  sed -e 's/.*build\/\(.*\)\/download.*/\1/')
wget -q -O mainnet-config.json https://hydra.iohk.io/build/${LAST_BUILD}/download/1/mainnet-config.json
wget -q -O mainnet-byron-genesis.json https://hydra.iohk.io/build/${LAST_BUILD}/download/1/mainnet-byron-genesis.json
wget -q -O mainnet-shelley-genesis.json https://hydra.iohk.io/build/${LAST_BUILD}/download/1/mainnet-shelley-genesis.json
wget -q -O mainnet-topology.json https://hydra.iohk.io/build/${LAST_BUILD}/download/1/mainnet-topology.json

  3. Create a cardano node with:
cardano-node run \
  --config xxx/cnode/config/mainnet-config.json \
  --topology xxx/cnode/config/mainnet-topology.json \
  --database-path xxx/cnode/db/ \
  --socket-path xxx/cnode/sockets/node.socket \
  --port 3001

Unexpected behavior
The node syncs properly until about slot 4492800 (start of the Shelley era?) and then starts to throw errors. I tried deleting the ledger and volatile folders in the database, but the errors still appear. Error log:


[DESKTOP-:cardano.node.ChainDB:Notice:34] [2021-05-21 11:11:35.02 UTC] Chain extended, new tip: c2df25e41eb7de23b702e2affc7c2d171b53dd9dc824b26a48890f365ab89f3d at slot 4483574
Event: LedgerUpdate (HardForkUpdateInEra Z (WrapLedgerUpdate {unwrapLedgerUpdate = ByronUpdatedProtocolUpdates [ProtocolUpdate {protocolUpdateVersion = 2.0.0, protocolUpdateState = UpdateStableCandidate (EpochNo 208)}]}))
[DESKTOP-:cardano.node.ChainDB:Notice:34] [2021-05-21 11:11:35.02 UTC] Chain extended, new tip: 92f7452fb13b978483e8a737400c76e1460d65e8c03d3e60b9cdfec959e4c4cd at slot 4483575
[DESKTOP-:cardano.node.ChainDB:Notice:34] [2021-05-21 11:11:36.27 UTC] Chain extended, new tip: 50be594a349edd8c3cf44bc5e43aac75e85404f1073d3e60cb00a96b3b3f3872 at slot 4484538
[DESKTOP-:cardano.node.ChainDB:Notice:34] [2021-05-21 11:11:37.52 UTC] Chain extended, new tip: b994e91880f17824d36564155cc941803e213b29888851bdbc4a7dba07b654fc at slot 4485603
[DESKTOP-:cardano.node.ChainDB:Notice:34] [2021-05-21 11:11:38.84 UTC] Chain extended, new tip: f930ab3caeac77f9dc597d07df48c86e28a020bb07937be606037ec9f3a2fc10 at slot 4486673
[DESKTOP-:cardano.node.ChainDB:Notice:34] [2021-05-21 11:11:40.09 UTC] Chain extended, new tip: 131df406e92d4658c08623a70b8f450830bf4ce3a114025b319a674d2f0fca97 at slot 4487780
[DESKTOP-:cardano.node.DnsSubscription:Error:78] [2021-05-21 11:11:41.02 UTC] Domain: "relays-new.cardano-mainnet.iohk.io" Application Exception: 52.58.171.193:3001 HeaderError (At (Block {blockPointSlot = SlotNo 4492800, blockPointHash = aa83acbf5904c0edfe4d79b3689d3d00fcfc553cf360fd2229b98d464c28e9de})) (HeaderProtocolError (HardForkValidationErrFromEra S (Z (WrapValidationErr {unwrapValidationErr = ChainTransitionError [OverlayFailure (VRFKeyBadNonce (Nonce "81e47a19e6b29b0a65b9591762ce5143ed30d0261e5d24a3201752506b20f15c") (SlotNo 4492800) (Nonce "67f10d861ce22c2afcdc48cb20dc1014e7b2ca97319ac945326ccf3425a5a2e7") (CertifiedVRF {certifiedOutput = OutputVRF {getOutputVRFBytes = "6\236Sx\209\245\EOT\SUBY\235\141\150\230\GS\233o\tP\251A\180\159\245\DC1\247\188\DEL\209\t\212\&8>\GS$\190p4\230t\156f\DC2p\r\213\206\176\198ew\184\138\EM\174(k\DC3!\209[\206\SUB\183\&6"}, certifiedProof = CertPraosVRF "@Z\163p\255\NUL\149D\162\190J\165\188R\196Ec3\244\249\182W\GSf\197Y\r.V)\208\139:6\t\244\189\ENQ\STX\237]\v\225\171\219\DEL*\183j\174\174G\254\DC1\ESC\ETX5\164\228\222\246F\147\SYN'\148\184\211\193\202qP\SI\SYN\177\226DrL\ETX"}))]})))) (Tip (SlotNo 4488479) eea2bfa5527f752af311507d13dd03dc39eb0e87537955a6cc03dcfd358efde7 (BlockNo 4486190)) (Tip (SlotNo 30029101) 15c87c1604b4c1e7fb4f2038ea9ff9511669ce908f1825d07a75a7558640ac4e (BlockNo 5746606))
[DESKTOP-:cardano.node.ErrorPolicy:Warning:52] [2021-05-21 11:11:41.03 UTC] IP 52.58.171.193:3001 ErrorPolicySuspendPeer (Just (ApplicationExceptionTrace (HeaderError (At (Block {blockPointSlot = SlotNo 4492800, blockPointHash = aa83acbf5904c0edfe4d79b3689d3d00fcfc553cf360fd2229b98d464c28e9de})) (HeaderProtocolError (HardForkValidationErrFromEra S (Z (WrapValidationErr {unwrapValidationErr = ChainTransitionError [OverlayFailure (VRFKeyBadNonce (Nonce "81e47a19e6b29b0a65b9591762ce5143ed30d0261e5d24a3201752506b20f15c") (SlotNo 4492800) (Nonce "67f10d861ce22c2afcdc48cb20dc1014e7b2ca97319ac945326ccf3425a5a2e7") (CertifiedVRF {certifiedOutput = OutputVRF {getOutputVRFBytes = "6\236Sx\209\245\EOT\SUBY\235\141\150\230\GS\233o\tP\251A\180\159\245\DC1\247\188\DEL\209\t\212\&8>\GS$\190p4\230t\156f\DC2p\r\213\206\176\198ew\184\138\EM\174(k\DC3!\209[\206\SUB\183\&6"}, certifiedProof = CertPraosVRF "@Z\163p\255\NUL\149D\162\190J\165\188R\196Ec3\244\249\182W\GSf\197Y\r.V)\208\139:6\t\244\189\ENQ\STX\237]\v\225\171\219\DEL*\183j\174\174G\254\DC1\ESC\ETX5\164\228\222\246F\147\SYN'\148\184\211\193\202qP\SI\SYN\177\226DrL\ETX"}))]})))) (Tip (SlotNo 4488479) eea2bfa5527f752af311507d13dd03dc39eb0e87537955a6cc03dcfd358efde7 (BlockNo 4486190)) (Tip (SlotNo 30029101) 15c87c1604b4c1e7fb4f2038ea9ff9511669ce908f1825d07a75a7558640ac4e (BlockNo 5746606))))) 200s 200s
[DESKTOP-:cardano.node.DnsSubscription:Error:75] [2021-05-21 11:11:41.03 UTC] Domain: "relays.stakepool247.eu" Application Exception: 35.228.55.2:3001 HeaderError (At (Block {blockPointSlot = SlotNo 4492800, blockPointHash = aa83acbf5904c0edfe4d79b3689d3d00fcfc553cf360fd2229b98d464c28e9de})) (HeaderProtocolError (HardForkValidationErrFromEra S (Z (WrapValidationErr {unwrapValidationErr = ChainTransitionError [OverlayFailure (VRFKeyBadNonce (Nonce "81e47a19e6b29b0a65b9591762ce5143ed30d0261e5d24a3201752506b20f15c") (SlotNo 4492800) (Nonce "67f10d861ce22c2afcdc48cb20dc1014e7b2ca97319ac945326ccf3425a5a2e7") (CertifiedVRF {certifiedOutput = OutputVRF {getOutputVRFBytes = "6\236Sx\209\245\EOT\SUBY\235\141\150\230\GS\233o\tP\251A\180\159\245\DC1\247\188\DEL\209\t\212\&8>\GS$\190p4\230t\156f\DC2p\r\213\206\176\198ew\184\138\EM\174(k\DC3!\209[\206\SUB\183\&6"}, certifiedProof = CertPraosVRF "@Z\163p\255\NUL\149D\162\190J\165\188R\196Ec3\244\249\182W\GSf\197Y\r.V)\208\139:6\t\244\189\ENQ\STX\237]\v\225\171\219\DEL*\183j\174\174G\254\DC1\ESC\ETX5\164\228\222\246F\147\SYN'\148\184\211\193\202qP\SI\SYN\177\226DrL\ETX"}))]})))) (Tip (SlotNo 4488480) 3fe1c2ae2065d4295691d6724a6b0558ad85e8abe53c9412c39295d208ebb9ef (BlockNo 4486191)) (Tip (SlotNo 30029101) 15c87c1604b4c1e7fb4f2038ea9ff9511669ce908f1825d07a75a7558640ac4e (BlockNo 5746606))
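The failing slot is consistent with the guess about the era boundary. A minimal sketch of the arithmetic, assuming the Byron epoch length of 21600 slots that the node prints at startup (shown in a log later in this thread) and the EpochNo 208 named in the LedgerUpdate event above; the function name is illustrative only:

```shell
#!/usr/bin/env bash
# Hypothetical helper: first slot of a given epoch when every earlier epoch
# has the same fixed length (an assumption that holds only within one era).
first_slot_of_epoch() {
  local epoch="$1" epoch_length="$2"
  echo $(( epoch * epoch_length ))
}

# Epoch 208 with 21600-slot Byron epochs (values taken from logs in this thread).
first_slot_of_epoch 208 21600   # prints 4492800
```

The result matches the SlotNo 4492800 in the HeaderError lines, which suggests the node fails on the first header after the Byron epochs end.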

System info (please complete the following information):

  • OS: Ubuntu
  • Version: 20.04.2 LTS
  • Node version: 1.27.0 (git rev 69c77dc)
@maierfelix added the bug label May 21, 2021
@ducknessman

I encountered the same problem. If it is solved, please tell me.

@ducknessman

Is it possible to skip this fixed block?

@erikd
Contributor

erikd commented May 22, 2021

@maierfelix please post the 4 config files you downloaded.

Is it possible to skip this fixed block?

@ducknessman No, this is not possible.

@ducknessman

@maierfelix please post the 4 config files you downloaded.

Is it possible to skip this fixed block?

@ducknessman No, this is not possible

Which configuration file do you need to view?

@erikd
Contributor

erikd commented May 22, 2021

Config files look fine.

Do you still see the Chain extended, new tip messages? If so, it is still syncing and any other messages are probably irrelevant.

@ducknessman

ducknessman commented May 22, 2021

Config files look fine.

Do you still see the Chain extended, new tip messages? If so, it is still syncing and any other messages are probably irrelevant.

No, the Chain extended, new tip messages no longer appear. The error log is:

Domain: "relays-new.cardano-mainnet.iohk.io" Application Exception: 3.123.218.74:3001 HeaderError (At (Block {blockPointSlot = SlotNo 4492800, blockPointHash = aa83acbf5904c0edfe4d79b3689d3d00fcfc553cf360fd2229b98d464c28e9de})) (HeaderProtocolError (HardForkValidationErrFromEra S (Z (WrapValidationErr {unwrapValidationErr = ChainTransitionError [OverlayFailure (VRFKeyBadNonce (Nonce "81e47a19e6b29b0a65b9591762ce5143ed30d0261e5d24a3201752506b20f15c") (SlotNo 4492800) (Nonce "67f10d861ce22c2afcdc48cb20dc1014e7b2ca97319ac945326ccf3425a5a2e7") (CertifiedVRF {certifiedOutput = OutputVRF {getOutputVRFBytes = "6\236Sx\209\245\EOT\SUBY\235\141\150\230\GS\233o\tP\251A\180\159\245\DC1\247\188\DEL\209\t\212\&8>\GS$\190p4\230t\156f\DC2p\r\213\206\176\198ew\184\138\EM\174(k\DC3!\209[\206\SUB\183\&6"}, certifiedProof = CertPraosVRF "@Z\163p\255\NUL\149D\162\190J\165\188R\196Ec3\244\249\182W\GSf\197Y\r.V)\208\139:6\t\244\189\ENQ\STX\237]\v\225\171\219\DEL*\183j\174\174G\254\DC1\ESC\ETX5\164\228\222\246F\147\SYN'\148\184\211\193\202qP\SI\SYN\177\226DrL\ETX"}))]})))) (Tip (SlotNo 4492798) 7e037dcb8995990d49d69ffc83327e79291e3e9c71fa46e0ddd48a1f6016f3a3 (BlockNo 4490509)) (Tip (SlotNo 30100085) 04f357385eab1fcaef35016a4f6b9ad262a100270eb9d322e98eb1b0b8b92608 (BlockNo 5750124))

@erikd
Contributor

erikd commented May 22, 2021

And what happens after that if anything?

And what does cardano-node --version say?

And what are the machine specs?

@ducknessman

ducknessman commented May 22, 2021

And what happens after that if anything?

And what does cardano-node --version say?

And what are the machine specs?

image

OS: Ubuntu
Version: 18.04
CPU: 8
Memory: 32G
Disk: 1T SSD

@maierfelix
Author

Config files look fine.

Do you still see the Chain extended, new tip messages? If so, it is still syncing and any other messages are probably irrelevant.

I don't get the Chain extended messages either once the errors start. I'm checking the current state with cardano-cli query tip --mainnet, and the block and slot properties don't seem to increment anymore.
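The check described here can be scripted. A rough sketch, assuming the JSON printed by cardano-cli query tip --mainnet contains a numeric "slot" field (the comment mentions block and slot properties, but the exact output layout is an assumption); the helper names are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the numeric "slot" value out of tip JSON on stdin.
# Whitespace is stripped first so the pattern works on pretty-printed output.
tip_slot() {
  tr -d ' \n' | sed -n 's/.*"slot":\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: succeeds (exit status 0) when the slot did not advance.
is_stalled() {
  local before="$1" after="$2"
  [ "$after" -le "$before" ]
}

# Example usage (requires a running node, so it is left commented out):
#   before=$(cardano-cli query tip --mainnet | tip_slot)
#   sleep 60
#   after=$(cardano-cli query tip --mainnet | tip_slot)
#   is_stalled "$before" "$after" && echo "tip has not moved"
```

Comparing two samples taken some time apart avoids mistaking a slow sync for a stuck one.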

@erikd
Contributor

erikd commented May 22, 2021

@ducknessman 32G is more than enough. 8G should be sufficient (at least for now).

@maierfelix Machine specs?

@ducknessman

@ducknessman 32G is more than enough. 8G should be sufficient (at least for now).

@maierfelix Machine specs?

I am resynchronizing now.
If there is a solution, I hope it can be shared.
I started the synchronization yesterday and did not touch anything, but it suddenly stopped synchronizing at a fixed height.

@ducknessman

I have a new question:
Can a Cardano node synchronize from snapshots?

@erikd
Contributor

erikd commented May 22, 2021

No, snapshots are currently not available.

@maierfelix
Author

maierfelix commented May 22, 2021

@ducknessman 32G is more than enough. 8G should be sufficient (at least for now).

@maierfelix Machine specs?

I'm on Windows 10 and just tried running with Docker, with the same chain errors. Before that, I used Ubuntu 20.04 on WSL2, both with the pre-built binaries and by building from source; both times it failed with the same errors and stopped syncing.

Machine specs:

  • CPU: Intel i9
  • 32 GB RAM
  • 1 TB SSD

@erikd
Contributor

erikd commented May 22, 2021

I have posted this ticket on the internal IOHK Slack. Hoping someone more knowledgeable than me can respond.

@disassembler
Contributor

Can you try with the config files here? https://hydra.iohk.io/build/6198010/download/1/index.html

A couple of changes were made recently for the upcoming Alonzo era, and I want to see if using the configs recommended for 1.27.0 resolves the issue for you.
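For anyone following along, fetching from a fixed build number rather than latest-finished might look like the sketch below. It assumes the four mainnet file names from the reproduction steps are also present under build 6198010 (only the Shelley genesis URL is spelled out elsewhere in this thread); the variable and function names are illustrative:

```shell
#!/usr/bin/env bash
# Build number quoted in this thread as the config set recommended for 1.27.0.
BUILD=6198010
BASE_URL="https://hydra.iohk.io/build/${BUILD}/download/1"

# Print one download URL per mainnet file. The file names follow the pattern
# used in the reproduction steps at the top of this issue (an assumption for
# this particular build).
config_urls() {
  local name
  for name in config byron-genesis shelley-genesis topology; do
    echo "${BASE_URL}/mainnet-${name}.json"
  done
}

# Example usage (network access needed, so it is left commented out):
#   config_urls | while read -r url; do wget -q -O "$(basename "$url")" "$url"; done
```

Pinning the build number keeps the downloaded files from changing underneath a node that is already syncing.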

@SteveDevDev

Can you try with the config files here? https://hydra.iohk.io/build/6198010/download/1/index.html

A couple of changes were made recently for the upcoming Alonzo era, and I want to see if using the configs recommended for 1.27.0 resolves the issue for you.

This fixed it for me. I put the JSON files on both of my nodes and restarted them. It's syncing again now. Thanks!

@erikd
Contributor

erikd commented May 23, 2021

@maierfelix @ducknessman Any feedback on @disassembler's suggestion?

Would also be interested in seeing a diff between the files you retrieved and the one that works.

@ducknessman

ducknessman commented May 23, 2021

@maierfelix @ducknessman Any feedback on @disassembler's suggestion?

Would also be interested in seeing a diff between the files you retrieved and the one that works.

I just replaced all the config files, and now it has started to synchronize.
I made a comparison before replacing them and found that there are still many differences. I hope the link in the documentation can be updated.
image
thx sooooooo much

@profd2004

@maierfelix @ducknessman Any feedback on @disassembler's suggestion?
Would also be interested in seeing a diff between the files you retrieved and the one that works.

I just replaced all the config files, and now it has started to synchronize.
I made a comparison before replacing them and found that there are still many differences. I hope the link in the documentation can be updated.
image
thx sooooooo much

The shelley-genesis.json over at https://hydra.iohk.io/build/6198010/download/1/index.html does not reflect the diff from your screenshot. The last key:value pair in the json is still "securityParam": 2160.

@ducknessman where are you getting your config files?

@ducknessman

@maierfelix @ducknessman Any feedback on @disassembler's suggestion?
Would also be interested in seeing a diff between the files you retrieved and the one that works.

I just replaced all the config files, and now it has started to synchronize.
I made a comparison before replacing them and found that there are still many differences. I hope the link in the documentation can be updated.
image
thx sooooooo much

The shelley-genesis.json over at https://hydra.iohk.io/build/6198010/download/1/index.html does not reflect the diff from your screenshot. The last key:value pair in the json is still "securityParam": 2160.

@ducknessman where are you getting your config files?

The link is:
https://hydra.iohk.io/build/6198010/download/1/index.html

@profd2004

Did the config file thing work for anyone else? I'm looking at the shelley-genesis over at the links in this thread and neither the testnet nor mainnet files (https://hydra.iohk.io/build/6198010/download/1/testnet-shelley-genesis.json, https://hydra.iohk.io/build/6198010/download/1/mainnet-shelley-genesis.json) look like @ducknessman's screenshot.

Neither of those files for example has a costModel: example/shelley/alonzo/costmodel.json line.
What am I missing?

@rdlrt

rdlrt commented May 23, 2021

@profd2004 That's because the genesis files referred to in the "latest" builds are in preparation for Alonzo, and the optional parameter ShelleyGenesisHash in your config.json will not match the newer genesis (even though the other parameters are skipped).
You don't need to worry about the newer build config/genesis just yet.
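To see which genesis hash a given config file expects, the value can be read straight from the file. A minimal sketch, assuming the key is spelled ShelleyGenesisHash as in the comment above and that key and value sit on a single line; the function name is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ShelleyGenesisHash value from a node config file.
# Uses sed rather than a JSON parser, so it assumes the key and its value
# share one line of the file.
shelley_genesis_hash_from_config() {
  sed -n 's/.*"ShelleyGenesisHash"[[:space:]]*:[[:space:]]*"\([0-9a-f]*\)".*/\1/p' "$1"
}

# Example usage:
#   shelley_genesis_hash_from_config mainnet-config.json
```

If the printed value differs from the hash of the genesis file sitting next to the config, the two files probably come from different builds.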

@sambor81

sambor81 commented May 23, 2021

It started syncing again, but I think it is stuck again at epoch 267 [51.4%], 25 minutes and increasing. Unbelievable how buggy this code is. I even copied the blockchain from a running server and that didn't work either. Do you have any clue or idea how to solve this issue?
Thank you so much for the help.

@erikd
Contributor

erikd commented May 23, 2021

Unbelievable how buggy this code is.

What other bugs are you facing? Have you raised tickets for any of them?

Yes, we handled changes to the config file poorly. That is something we would like to improve.

@mrbrinker

Are you sure it is stuck?
We are about 51.6% into epoch 267.

@sambor81

Unbelievable how buggy this code is.

What other bugs are you facing? Have you raised tickets for any of them?

Yes, we handled changes to the config file poorly. That is something we would like to improve.

I had issues with 1.27.0 node. After 3 days of fighting, it looks like everything works. So happy.

@erikd
Contributor

erikd commented May 23, 2021

I had issues with 1.27.0 node.

What issues? Did you raise a ticket? We cannot fix what we do not know about.

@sambor81

I had issues with 1.27.0 node.

What issues? Did you raise a ticket? We cannot fix what we do not know about.

I didn't raise a ticket; I was looking for help and information on Google. The main issue was that the 1.27.0 node stopped syncing at 14.9%. After swapping the files and restarting, I could not start the node, and when it finally was running after a few attempts, it kept disappearing and reappearing for 2-3 seconds in the system monitor. After a completely new installation, cardano-node started working.

@profd2004

profd2004 commented May 23, 2021

I had issues with 1.27.0 node.

What issues? Did you raise a ticket? We cannot fix what we do not know about.

@erikd Here is my specific issue that is oddly only happening with my test node. I'm using the same build for my test and mainnet node but only the test node is not syncing; hence I thought it was maybe something with the config.

I'm getting the exact issue on my remote cloud node as well as my local home node, at the same slotNo.

[2021-05-23 17:44:21.60 UTC] IP 195.154.69.26:3003 ErrorPolicySuspendPeer (Just (ApplicationExceptionTrace (HeaderError (At (Block {blockPointSlot = SlotNo 1598400, blockPointHash = 02b1c561715da9e540411123a6135ee319b02f60b9a11a603d3305556c04329f})) (HeaderProtocolError (HardForkValidationErrFromEra S (Z (WrapValidationErr {unwrapValidationErr = ChainTransitionError [OverlayFailure (VRFKeyBadNonce (Nonce "81e47a19e6b29b0a65b9591762ce5143ed30d0261e5d24a3201752506b20f15c") (SlotNo 1598400) (Nonce "74a0665cf3990f72fea801a670dfddb9efa4815d13bc0779bab26f15e42fcf46") (CertifiedVRF {certifiedOutput = OutputVRF {getOutputVRFBytes = "t\231\145\196\165ZhA\137S\209{Z<1\194\225]Yq\235\&7#!\161:\147\129Q\236x\207\195z\170\155\182mw\141\182\135\249\209\178\134\&3_:\167b\135\204\&4\205Z\172\230\163\226\EM\DC2\226\182"}, certifiedProof = CertPraosVRF "!\202\180:L)*\DC2\250\SOH\141V \240Z\EOT\n\183\245\141\SUB\191\ETXQ\"\EOT\155A\SOH'\160LD\253\204Z\249\129/i\178\237p\155\140\240\142\183)LG\137q\248\DLE\DC1\130W\183\162\149\DEL6=[5\225/18\154\192=\242\255\181\f\190\190\t"}))]})))) (Tip (SlotNo 1598339) c24fa5ebf4d88a653a6762773278b05e68a9da08fdcabac0f58abafdeedd049d (BlockNo 1597072)) (Tip (SlotNo 27422626) fc0213330f879136967b77a56c2694810ec4cf58925e9e6cfafdd61aa53690c7 (BlockNo 2608001))))) 200s 200s

Should I create an issue somewhere else?

@erikd
Contributor

erikd commented May 23, 2021

I'm using the same build for my test and mainnet node but only the test node is not syncing; hence I thought it was maybe something with the config.

So this is a test node that runs on mainnet or testnet? If it runs on mainnet I would be very interested in a diff between the configs of the working and non-working nodes.

Can I assume that you are running the same git checkout versions?

@jnardiello

Unfortunately I was in the same situation. Node sync stopped at 14.4%. Trying to restart the service after updating the config files returns this error:

May 23 23:17:43 blockproducer systemd[1]: Started Cardano node service.
May 23 23:17:44 blockproducer cardano-node[1384972]: Error decoding genesis at: /root/cardano-node/mainnet-shelley-genesis.json Error: Error in $: key "adaPerUTxOWord" not found
May 23 23:17:44 blockproducer systemd[1]: cardano-node.service: Main process exited, code=exited, status=1/FAILURE
May 23 23:17:44 blockproducer systemd[1]: cardano-node.service: Failed with result 'exit-code'.

Any idea? Doing a diff on the config files, it seems that adaPerUTxOWord was removed intentionally, but cardano-node is complaining. Any help would be much appreciated.
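Before restarting the service with a swapped genesis file, it can help to check that the key named in the error is present. A rough sketch using grep, with a hypothetical function name; it only looks for the quoted key followed by a colon and does not validate the JSON:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the given file contains the given JSON key.
# This is a plain text match on "key": and not a real JSON lookup.
genesis_has_key() {
  local file="$1" key="$2"
  grep -q "\"${key}\"[[:space:]]*:" "$file"
}

# Example usage:
#   genesis_has_key mainnet-shelley-genesis.json adaPerUTxOWord \
#     || echo "adaPerUTxOWord is missing from the genesis file"
```

A missing key points to a genesis file from a different build than the node binary expects.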

@erikd
Contributor

erikd commented May 23, 2021

Node sync stopped at 14.4%.

How are you measuring that?

@jnardiello

jnardiello commented May 23, 2021

How are you measuring that?

gLiveView

After adding the missing key back to the config file, cardano-node fails with this new error:

May 23 23:41:46 blockproducer systemd[1]: Started Cardano node service.
May 23 23:41:47 blockproducer cardano-node[1426490]: Wrong Shelley genesis file: the actual hash is "67f10d861ce22c2afcdc48cb20dc1014e7b2ca97319ac945326ccf3425a5a2e7", but the expected Shelley genesis hash given in the node configuration file is "1a3be38bcbb7911969283716ad7aa550250226b76a61fc51cc9a9a35d9276d81"
May 23 23:41:47 blockproducer systemd[1]: cardano-node.service: Main process exited, code=exited, status=1/FAILURE
May 23 23:41:47 blockproducer systemd[1]: cardano-node.service: Failed with result 'exit-code'.

It seems like cardano-node is complaining the config file has changed?

Edit: Starting from scratch with the new config file works, you just can't continue syncing if the config file has changed (per my understanding, might be wrong)
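The "actual hash" in this message is the same 64-character value that appears as the second Nonce in the VRFKeyBadNonce errors earlier in the thread. A small sketch for pulling the quoted 64-digit hex values out of log lines so they can be compared; the function name is illustrative, and the pattern simply matches any double-quoted 64-digit lowercase hex string rather than parsing the log format:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every double-quoted 64-digit hex string found on
# stdin, one per line, with the quotes removed.
quoted_hex64() {
  grep -oE '"[0-9a-f]{64}"' | tr -d '"'
}

# Example usage (unit name taken from the systemd lines above):
#   journalctl -u cardano-node | quoted_hex64 | sort | uniq -c
```

Seeing the same value in both the genesis-hash error and the nonce error is a hint that both trace back to the genesis file on disk.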

@erikd
Contributor

erikd commented May 23, 2021

It seems like cardano-node is complaining the config file has changed?

Yeah, config changed to support upcoming Alonzo features. We (IOHK/IOG) probably need to handle dissemination of these files in a better way. They should be locked to a release version instead of whatever is latest on master.

Starting from scratch with the new config file works, you just can't continue syncing if the config file has changed

Some changes will not be a problem and others will. Resyncing is probably the best option.

@jnardiello

@erikd I'm currently re-syncing with latest cardano-node and updated config files as mentioned in this issue. Let's see if this solves the issue 🤞

I totally agree that config files should be locked to release, especially if they are subject to change. Thank you a lot for your help!

@disassembler
Contributor

I'll get the release manager to add links in the release notes. We're a little spoiled with nix. It handles all the dependencies down to even the config file versions so we never run into the issue of mismatching config files in our deployments since we point to the tagged commit.

@profd2004

So this is a test node that runs on mainnet or testnet? If it runs on mainnet I would be very interested in a diff between the configs of the working and non-working nodes.
Can I assume that you are running the same git checkout versions?

Yes, same git version. Here is my setup:
I build a docker image in a ci/cd pipeline
push it to a test environment where it runs on testnet mounting in test configs and db
then I push the same image to a production environment where I run it on the mainnet with mainnet config and db.

Locally I've always run and developed against the testnet, which was working until I rebooted for 1.26.1. Since then, only production/mainnet works.

@erikd
Contributor

erikd commented May 23, 2021

We are moving fast and unfortunately we are breaking things. We need to keep moving fast, but make sure things don't get broken.

@erikd
Contributor

erikd commented May 23, 2021

@maierfelix You say:

Node version: 1.27.0 (git rev 69c77dc)

However, the 1.27.0 tag is not at commit 69c77dc (which is a commit on master).

If you are building a node to run on mainnet, you should never build from master. You should always build from the tag for the latest release (which is currently 1.27.0).
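One way to check which commit a binary was built from is to read the git rev that cardano-node --version reports. A sketch, assuming the output contains text of the form "git rev <hash>" (inferred from the version string quoted above, not from documentation) and using 8fe4614, the commit reported by the 1.27.0 Docker image log later in this thread, as the expected prefix; the helper names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the hash that follows "git rev" from version
# text supplied on stdin.
git_rev_of() {
  sed -n 's/.*git rev \([0-9a-f][0-9a-f]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeeds when the reported revision starts with the
# expected prefix.
rev_matches() {
  local actual="$1" expected_prefix="$2"
  case "$actual" in
    "$expected_prefix"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage (needs cardano-node on PATH, so it is left commented out):
#   rev=$(cardano-node --version | git_rev_of)
#   rev_matches "$rev" 8fe4614 || echo "binary was not built from the expected commit: $rev"
```

A prefix comparison is used because version strings may show either the short or the full commit hash.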

@profd2004

...I would be very interested in a diff between the configs of the working and non-working nodes...

@erikd here is that diff:
config json-diff

@erikd
Contributor

erikd commented May 24, 2021

@profd2004 One is a testnet config and the other is mainnet. They are different. They are not compatible.

@profd2004

profd2004 commented May 24, 2021

@profd2004 One is a testnet config and the other is mainnet. They are different. They are not compatible.

@erikd, yes, they are not compatible. The mainnet config is running on a mainnet node and it works just great.

The testnet config is running on a testnet node, where I am having issues. Both environments are running the same 1.27.0 build.

None of my testnet nodes are working.

@erikd
Contributor

erikd commented May 24, 2021

@profd2004 Ok, testnet may be busted. Please raise a separate ticket about that.

@bruceharrison1984

I am seeing this same issue when running from the cardano-node:1.27.0 Docker image for a db-sync node.

@erikd
Contributor

erikd commented May 30, 2021

@bruceharrison1984 is that testnet or mainnet?

@bruceharrison1984

My apologies, it is mainnet but a different error. I had to wait for the node to sync again to get the error to appear. You can disregard my +1.

For the sake of completeness, this is the error I received:

[ef7402f1:cardano.node.DnsSubscription:Error:34372] [2021-05-31 03:02:07.56 UTC] Domain: "relays-new.cardano-mainnet.iohk.io" Application Exception: 54.215.120.53:3001 InvalidBlock (At (Block {blockPointSlot = SlotNo 30847244, blockPointHash = 4638c1fcb92e1ec81fe95188dbc2dbdd35f37f7b6a2e214ef8304548b0c41159})) (InFutureExceedsClockSkew (RealPoint (SlotNo 30847244) 4638c1fcb92e1ec81fe95188dbc2dbdd35f37f7b6a2e214ef8304548b0c41159)

Which is totally unrelated to this topic.

@rae89

rae89 commented Jun 5, 2021

I may be having this same issue. I am trying to sync the cardano-node using the Docker container. I can get up to the Allegra era, and then eventually the Docker container exits. When I try starting the container again, the node.socket is no longer found in the /ipc directory, where it was available before the crash. Here is what the output looks like when I restart the container; it just hangs here before the container exits again:

Starting: /nix/store/b4hj6i49x89762mllqlqznmsa6n12wsh-cardano-node-exe-cardano-node-1.27.0/bin/cardano-node run
--config /nix/store/r6ygkc694c1vfhikdx4dhsqwkim7gds0-config-0.json
--database-path /data/db
--topology /nix/store/mb0zb61472xp1hgw3q9pz7m337rmfx7f-topology.yaml
--host-addr 127.0.0.1
--port 3001
--socket-path /ipc/node.socket

+RTS
-N2
-A16m
-qg
-qb
--disable-delayed-os-memory-return
-RTS
..or, once again, in a single line:
/nix/store/b4hj6i49x89762mllqlqznmsa6n12wsh-cardano-node-exe-cardano-node-1.27.0/bin/cardano-node run --config /nix/store/r6ygkc694c1vfhikdx4dhsqwkim7gds0-config-0.json --database-path /data/db --topology /nix/store/mb0zb61472xp1hgw3q9pz7m337rmfx7f-topology.yaml --host-addr 127.0.0.1 --port 3001 --socket-path /ipc/node.socket +RTS -N2 -A16m -qg -qb --disable-delayed-os-memory-return -RTS
Listening on http://127.0.0.1:12798
[61b4a33f:cardano.node.networkMagic:Notice:5] [2021-06-04 17:23:46.59 UTC] NetworkMagic 764824073
[61b4a33f:cardano.node.basicInfo.protocol:Notice:5] [2021-06-04 17:23:46.59 UTC] Byron; Shelley
[61b4a33f:cardano.node.basicInfo.version:Notice:5] [2021-06-04 17:23:46.59 UTC] 1.27.0
[61b4a33f:cardano.node.basicInfo.commit:Notice:5] [2021-06-04 17:23:46.59 UTC] 8fe4614
[61b4a33f:cardano.node.basicInfo.nodeStartTime:Notice:5] [2021-06-04 17:23:46.59 UTC] 2021-06-04 17:23:46.5987889 UTC
[61b4a33f:cardano.node.basicInfo.systemStartTime:Notice:5] [2021-06-04 17:23:46.59 UTC] 2017-09-23 21:44:51 UTC
[61b4a33f:cardano.node.basicInfo.slotLengthByron:Notice:5] [2021-06-04 17:23:46.59 UTC] 20s
[61b4a33f:cardano.node.basicInfo.epochLengthByron:Notice:5] [2021-06-04 17:23:46.59 UTC] 21600
[61b4a33f:cardano.node.basicInfo.slotLengthShelley:Notice:5] [2021-06-04 17:23:46.59 UTC] 1s
[61b4a33f:cardano.node.basicInfo.epochLengthShelley:Notice:5] [2021-06-04 17:23:46.59 UTC] 432000
[61b4a33f:cardano.node.basicInfo.slotsPerKESPeriodShelley:Notice:5] [2021-06-04 17:23:46.59 UTC] 129600
[61b4a33f:cardano.node.basicInfo.slotLengthAllegra:Notice:5] [2021-06-04 17:23:46.59 UTC] 1s
[61b4a33f:cardano.node.basicInfo.epochLengthAllegra:Notice:5] [2021-06-04 17:23:46.59 UTC] 432000
[61b4a33f:cardano.node.basicInfo.slotsPerKESPeriodAllegra:Notice:5] [2021-06-04 17:23:46.59 UTC] 129600
[61b4a33f:cardano.node.basicInfo.slotLengthMary:Notice:5] [2021-06-04 17:23:46.59 UTC] 1s
[61b4a33f:cardano.node.basicInfo.epochLengthMary:Notice:5] [2021-06-04 17:23:46.59 UTC] 432000
[61b4a33f:cardano.node.basicInfo.slotsPerKESPeriodMary:Notice:5] [2021-06-04 17:23:46.59 UTC] 129600
[61b4a33f:cardano.node.addresses:Notice:5] [2021-06-04 17:23:46.59 UTC] [SocketInfo 127.0.0.1:3001]
[61b4a33f:cardano.node.diffusion-mode:Notice:5] [2021-06-04 17:23:46.59 UTC] InitiatorAndResponderDiffusionMode
[61b4a33f:cardano.node.dns-producers:Notice:5] [2021-06-04 17:23:46.59 UTC] [DnsSubscriptionTarget {dstDomain = "relays-new.cardano-mainnet.iohk.io", dstPort = 3001, dstValency = 1}]
[61b4a33f:cardano.node.ip-producers:Notice:5] [2021-06-04 17:23:46.59 UTC] IPSubscriptionTarget {ispIps = [], ispValency = 0}

@sloik

sloik commented Jun 16, 2021

Is there an official solution? A node version that is working on mainnet? :)

@erikd
Contributor

erikd commented Jun 27, 2021

Anyone wanting to run the node on mainnet should do so from the latest tagged version (currently 1.27.0).

They should not use the master branch which has changes related to the upcoming Alonzo release and is not compatible with the current mainnet.

@sloik

sloik commented Jun 28, 2021

Anyone wanting to run the node on mainnet should do so from the latest tagged version (currently 1.27.0).

I did build it from a tag and I have this issue with the synchronization :| Will there be a fix for this?

@ggcaponetto

@erikd Could you please tell us how to find the correct config files for a specific cardano-node version (for example 1.28.0)? Navigating https://hydra.iohk.io is very confusing. All I find online are specific links to builds, but I don't know how to map a build number to a specific cardano-node version. Wouldn't it make sense to commit the config files to the code repo for both mainnet and testnet?

@profd2004

So should I be running different builds or tags on the testnet vs the mainnet? I thought the cardano-node binary should work on both networks; you just pass different configs and parameters when starting it. Is this assumption incorrect?

I just tried to spin up a brand new 1.29.0 node with config files from https://hydra.iohk.io/build/7366583/download/1/index.html and I'm getting this error when it is spun up on the testnet:
Just (ApplicationExceptionTrace (HeaderError (At (Block {blockPointSlot = SlotNo 1598400, blockPointHash = 02b1c561715da9e540411123a6135ee319b02f60b9a11a603d3305556c04329f})) (HeaderProtocolError (HardForkValidationErrFromEra S (Z (WrapValidationErr {unwrapValidationErr = ChainTransitionError [OverlayFailure (VRFKeyBadNonce (Nonce "81e47a19e6b29b0a65b9591762ce5143ed30d0261e5d24a3201752506b20f15c") (SlotNo 1598400) (Nonce "322ede4866f68aaa9f3110e1448392c1f87038c588d14849a1b99d10ee48da2b"

I ask about mainnet because I also upgraded my pool's mainnet deployment with the same 1.29.0 build (with mainnet config and command) and it came back up with zero issues.

Should I be barking up a different tree?
Right now the only relay I'm connected to is IOG's at relays-new.cardano-testnet.iohkdev.io

cat of my generated protocolMagicId shows the correct id (1097911063 for the testnet) so I know cardano-node is being passed the correct configuration.

At this point I'm just looking for something new to poke at; other than the config files.

@Jimbo4350
Contributor

Closing this. If this is still relevant please reopen.
