"Head state missing, repairing chain" I lost many blocks , how can I get them back? #19124

Closed
relaxbao opened this issue Feb 19, 2019 · 9 comments

Comments

@relaxbao

relaxbao commented Feb 19, 2019

Hi,

I run geth under supervisor on Ubuntu as a private chain. Geth crashed yesterday because generating the DAG file needed more CPU than our server could spare.

When it restarted, we had lost many blocks: the block number went from 1110002 back to 422522.
I need the lost blocks. Can I get them back?

Looking forward to your response, thank you!

System information

Geth version: 1.8.11
OS & Version: Linux Ubuntu

Expected behaviour

The block height should still be 1110002.

Actual behaviour

Now the block number is 422522, but it should be 1110002. How can I get back to 1110002?
Here is the log:

Head state missing, repairing chain      number=1110002 hash=3610a2…5c104d
INFO [02-18|16:07:13] Rewound blockchain to past state         number=422522  hash=e6875a…a56025
INFO [02-18|16:07:13] Loaded most recent local header          number=1110002 hash=7620a2…5c104d td=875201842768
INFO [02-18|16:07:13] Loaded most recent local full block      number=422522  hash=e6875a…a56025 td=460514461017
INFO [02-18|16:07:13] Loaded most recent local fast block      number=1110002 hash=7620a2…5c104d td=875201842768

Steps to reproduce the behaviour

I restarted geth, but the result is just the same.

@karalabe
Member

karalabe commented Feb 19, 2019

Geth keeps the state in memory (and garbage collects in memory) and only flushes every hour or so. If Geth crashes, whatever was in memory is lost.

In your case the block data is still there, just the historical states got lost, so the chain rolled back. Normally this is not that big of an issue, because when you reconnect to the network, Geth reprocesses from a past block. If you run a single node, however, there might be no remote peer with the data.

Long term I think we should fix Geth so that it reprocesses the blocks locally instead of reaching out to the network. Short term that won't help you, but you could try to do a `geth export chain.rlp 0 1110002` and then import into a different datadir (to make sure you don't lose any data).
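
Roughly, that could look like the following sketch (the datadir paths here are placeholders for your own setup, and the node must be stopped before exporting so the database is not locked):

# export the blocks that are still on disk from the old datadir
geth --datadir /path/to/old/datadir export chain.rlp 0 1110002

# initialise a fresh datadir from the same genesis file, then re-import;
# importing re-executes the blocks and rebuilds the state
geth --datadir /path/to/new/datadir init genesis.json
geth --datadir /path/to/new/datadir import chain.rlp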

@tsujp

tsujp commented Mar 7, 2019

This only affects blocks, right? Not the keystore etc.?

@relaxbao
Author

relaxbao commented Mar 7, 2019

This only affects blocks, right? Not the keystore etc.?

Yes, it only affects blocks.

Actually, I can still get the transactions in the higher blocks, but when it starts mining, the block number increases from the lower number, 422522.

On the other hand, the contract storage went back to its state at block 422522, while I need the state at block 1110002.

Is there some way to solve this problem?

@karalabe
Member

karalabe commented Mar 7, 2019

I wrote in my previous comment that you could have exported your chain and fixed it that way. If you started mining on top, it's probably way too messy now to try and extract the correct blocks.

@hito This only affects the state, yes.

@relaxbao
Author

relaxbao commented Mar 14, 2019

Geth keeps the state in memory (and garbage collects in memory) and only flushes every hour or so. If Geth crashes, whatever was in memory is lost.

In your case the block data is still there, just the historical states got lost, so the chain rolled back. Normally this is not that big of an issue, because when you reconnect to the network, Geth reprocesses from a past block. If you run a single node, however, there might be no remote peer with the data.

Long term I think we should fix Geth so that it reprocesses the blocks locally instead of reaching out to the network. Short term that won't help you, but you could try to do a `geth export chain.rlp 0 1110002` and then import into a different datadir (to make sure you don't lose any data).

Thank you so much, I think it's a great way to save all the data.
But after exporting my data and importing it into a new datadir, I got an error.

INFO [03-14|11:23:04] Imported new chain segment               blocks=2500 txs=43   mgas=2.321  elapsed=2.286s mgasps=1.015  number=420000 hash=ce1ed8…6dc233 cache=1.12mB
INFO [03-14|11:23:06] Imported new chain segment               blocks=2500 txs=5    mgas=1.446  elapsed=2.098s mgasps=0.689  number=422500 hash=0b6e6d…117d9e cache=1.12mB
ERROR[03-14|11:23:06] Non contiguous block insert              number=423619 hash=100756…a3b36b parent=ff2b53…f13d44 prevnumber=423618 prevhash=8d7930…bcebb6
ERROR[03-14|11:23:06] Import error                             err="invalid block 423619: non contiguous insert: item 1117 is #423618 [8d793034…], item 1118 is #423619 [10075641…] (parent [ff2b53e8…])"
INFO [03-14|11:23:06] Writing cached state to disk             block=422500 hash=0b6e6d…117d9e root=e6191e…df4f85

Here are my blocks 423618 and 423619, and the parent block of 423619. Is there something wrong with them?

> eth.getBlock(423618)
{
  difficulty: 941215,
  extraData: "0xd88301080b846765746888676f312e31302e32856c696e7578",
  gasLimit: 4294967295,
  gasUsed: 0,
  hash: "0x8d79303491e8384dedb57812e5c8eefd83d8125e5c287a7009000ed292bcebb6",
  logsBloom: "0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000",
  miner: "0xa969f32fcdc83a6039286f267f2e7a246b4b030a",
  mixHash: "0xad860354cf2e2b2a9e2c13bb72d1abef7606ea6e1d60f4d995e0e0e09f159f22",
  nonce: "0x230828cb4887e0b0",
  number: 423618,
  parentHash: "0xe2025cb6ddf79f3dc2414301b715b54a9aad10b0f25e494882133c2551377493",
  receiptsRoot: "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
  sha3Uncles: "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
  size: 540,
  stateRoot: "0xce0241488e1af373ffb9ae91eaf74cbeacb7984c0a3e293c64f676eed1c36fc1",
  timestamp: 1550546650,
  totalDifficulty: 460584013838,
  transactions: [],
  transactionsRoot: "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
  uncles: []
}

> eth.getBlock(423619)
{
  difficulty: 1000444,
  extraData: "0xd88301080b846765746888676f312e31302e32856c696e7578",
  gasLimit: 4294967295,
  gasUsed: 0,
  hash: "0x10075641add742f1447a67c9fc1136a5492a9b622e883042f38864ba33a3b36b",
  logsBloom: "0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000",
  miner: "0xd271baa1ed277c3730ad5b88ef97a5921d7a8c77",
  mixHash: "0x59f649b3123e348d84c889ae2e50f9240f1b826f65d2818c803ef94c15e6c84e",
  nonce: "0x0d7db510f26a7025",
  number: 423619,
  parentHash: "0xff2b53e8424ddaa724a9ab3561ef44c8dfee4d260b938b5212813cc379f13d44",
  receiptsRoot: "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
  sha3Uncles: "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
  size: 540,
  stateRoot: "0xba6860c90a95997f28b4da3c7f42cbf8d2e48e9f288dfeefd7e374b164eff745",
  timestamp: 1536726254,
  totalDifficulty: 460585640739,
  transactions: [],
  transactionsRoot: "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
  uncles: []
}

> eth.getBlock("0xff2b53e8424ddaa724a9ab3561ef44c8dfee4d260b938b5212813cc379f13d44")
{
  difficulty: 999952,
  extraData: "0xd88301080b846765746888676f312e31302e32856c696e7578",
  gasLimit: 4294967295,
  gasUsed: 0,
  hash: "0xff2b53e8424ddaa724a9ab3561ef44c8dfee4d260b938b5212813cc379f13d44",
  logsBloom: "0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000",
  miner: "0x10592c5155ad6655189bac1a61af49083e37152c",
  mixHash: "0xc52cd8c7110438d401e37dd359e31dfd719ec8ff20bfc7827697b40d9afc3ad4",
  nonce: "0x5e46c9d131d5a90d",
  number: 423618,
  parentHash: "0xc8a26455ee1781826047fe913824669fd3f30646b59fb226f917869f3cbcafb1",
  receiptsRoot: "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
  sha3Uncles: "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
  size: 540,
  stateRoot: "0xddbf4b764d59a7c11cf453a5bfaa90ad5b45ced46c159acacd82e550da59e55d",
  timestamp: 1536726248,
  totalDifficulty: 460584640295,
  transactions: [],
  transactionsRoot: "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
  uncles: []
}

I used the following steps to recover my data. Is there something wrong with them?

geth --datadir /home/workspace/recoverdatas/rdata export /home/workspace/recoverdatas/rdata/chain423619.rlp 0 423619

geth --datadir "/home/workspace/recoverdatas/datanew" init "/home/workspace/data/conf/genesis.json"

geth --datadir /home/workspace/recoverdatas/datanew import /home/workspace/recoverdatas/rdata/chain423619.rlp
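
Judging from the output above, eth.getBlock(423618) and the parent of 423619 are two different blocks at the same height, so the old database's number-to-hash index no longer seems to follow one contiguous chain. A quick way to see how far that mismatch extends would be to walk the range around the failure and compare each block's hash with the next block's parentHash, for example (a sketch, run with the old node still running on that datadir; the IPC path assumes the default geth.ipc location inside it):

# report every height N where block N+1 does not point at the canonical block N
geth --exec '
  for (var n = 423600; n < 423619; n++) {
    var cur = eth.getBlock(n), next = eth.getBlock(n + 1);
    if (next.parentHash != cur.hash)
      console.log("mismatch above block", n, ":", cur.hash, "vs", next.parentHash);
  }
' attach ipc:/home/workspace/recoverdatas/rdata/geth.ipc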

Shadowfiend added a commit to keep-network/keep-core that referenced this issue Apr 15, 2019
…th-persistent-disks

ETH Deployment: kill StatefulSets

Every now and again we were losing the entire chain state in our
`keep-dev` env. After tracking things down (see thread), we determined it
happens when all of our participating eth nodes get shot by Kube.

This shouldn't matter, since we're using persistent disks with the eth
`datadir` set there. It turns out this isn't enough. We lose state in
memory when the geth process is terminated. Per ethereum/go-ethereum#19124.
This is normally fine because there's at least one other node on the
real networks that can fill in the blanks when you come back up.

In our case there's not always another node, we were running only 2.

Now we're running 6 to try and hedge against all getting shot at once. 
Here we've removed the `StatefulSet` deployments because they're doing
us no good.
@BeOleg

BeOleg commented Apr 28, 2019

I get this as well, every time I restart the node via docker-compose:

root@nexwallet-eth3:/mnt/STORAGE/WALLETS# cat /var/lib/docker/containers/23406ad9bc38a5b9ab3c8e342295c91f2550d3b5c16f00d681f181aed9721c0d/23406ad9bc38a5b9ab3c8e342295c91f2550d3b5c16f00d681f181aed9721c0d-json.log | grep 'Head state missing'
{"log":"WARN [04-24|12:13:48.884] Head state missing, repairing chain      number=7623347 hash=b6c254…3a0dad\n","stream":"stderr","time":"2019-04-24T12:13:48.885446409Z"}
{"log":"WARN [04-25|19:19:10.008] Head state missing, repairing chain      number=7638271 hash=c73676…151eb8\n","stream":"stderr","time":"2019-04-25T19:19:10.008881355Z"}
{"log":"WARN [04-26|08:52:25.125] Head state missing, repairing chain      number=7641791 hash=c263d9…dfbe79\n","stream":"stderr","time":"2019-04-26T08:52:25.125447127Z"}
{"log":"WARN [04-28|09:21:28.698] Head state missing, repairing chain      number=7655058 hash=f11663…507ae5\n","stream":"stderr","time":"2019-04-28T09:21:28.698886159Z"}

Or if it restarts due to some fault, I lose a day or two of blocks.
How do I solve this? How do I restart properly?
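
Part of the answer may simply be giving geth time to shut down cleanly: on a graceful SIGINT/SIGTERM shutdown it writes its cached state to disk, whereas docker's default 10-second stop timeout can kill the process before that flush finishes. A sketch of a gentler restart (the service name and the 300-second timeout are placeholders):

# allow a generous grace period so geth can flush its in-memory state
docker-compose stop -t 300 geth
docker-compose start geth

# or, for a plain container
docker stop -t 300 <container>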

@holiman
Contributor

holiman commented Apr 29, 2019

@BeOleg I see that you've opened #19504, so let's continue over there.

@relaxbao yes, something is wrong with it! It seems to have lost track of the canonical chain, and there's a discrepancy in the chain. This is very interesting; however, since you're on a very old version, 1.8.11, I doubt we'll be able to get to the bottom of it.

@holiman
Contributor

holiman commented May 21, 2019

@relaxbao your scenario was fixed in #19514

@relaxbao
Author

relaxbao commented Jul 4, 2019

@relaxbao your scenario was fixed in #19514

@holiman Thank you very much, but I still have two questions:

  1. Can I get the blocks back if I stay on version 1.8.11?
  2. Is there some way to avoid this happening again?

Shadowfiend added a commit to keep-network/keep-common that referenced this issue Sep 4, 2019
…th-persistent-disks

ETH Deployment: kill StatefulSets
@fjl fjl removed the status:triage label Aug 27, 2020