Synchronisation failed: Dropping peer #15067

Closed
mcgravier opened this issue Aug 31, 2017 · 30 comments

Comments

@mcgravier

System information

Geth version: 1.6.7
OS & Version: Linux (Ubuntu 17.04)

Actual behaviour

After running for several hours, geth stops syncing any further.

INFO [08-31|21:29:22] Imported new chain segment blocks=1 txs=87 mgas=6.720 elapsed=310.079ms mgasps=21.670 number=4224370 hash=a3660d…b7b5fb
INFO [08-31|21:29:41] Imported new chain segment blocks=1 txs=117 mgas=3.595 elapsed=467.968ms mgasps=7.683 number=4224371 hash=5ae6f1…edc294
INFO [08-31|21:30:10] Imported new chain segment blocks=1 txs=61 mgas=6.711 elapsed=175.206ms mgasps=38.301 number=4224372 hash=5abb29…2a0891
WARN [08-31|21:30:15] message loop peer=d8cb8306a528cf96 err=EOF
WARN [08-31|21:31:05] Synchronisation failed, dropping peer peer=11fdde20fc7831ef err="retrieved hash chain is invalid"
WARN [08-31|21:31:45] Synchronisation failed, dropping peer peer=c300581e16c7d233 err=timeout
WARN [08-31|21:32:30] Synchronisation failed, dropping peer peer=938199d61038ff42 err="retrieved hash chain is invalid"
WARN [08-31|21:35:01] Synchronisation failed, dropping peer peer=0fc5fe924314d328 err="retrieved hash chain is invalid"

After a client restart, geth manages to sync the latest blocks, but after a few hours the issue repeats.
I'm running geth with these flags:
--rpc --shh --maxpeers 100 --lightserv 90 --cache 2048

The issue appeared today. Before that, I had been running the client in the background for days (or even weeks) without any problems.

@mcgravier
Author

Update:
I get this when closing the client:

ERROR[08-31|22:27:00] Failed to close database database=/home/adam/.ethereum/geth/chaindata err="leveldb/table: corruption on data-block (pos=1849286): checksum mismatch, want=0x66919f5e got=0x875cb029 [file=5152408.ldb]"

@arbach

arbach commented Sep 22, 2017

Same on Ubuntu 16.04 LTS

geth 1.7.0-stable-6c6c7b2a

@wtfiwtz

wtfiwtz commented Sep 23, 2017

Have a look at this: #15001... probably related

@yi-ji

yi-ji commented Nov 26, 2017

Same on Ubuntu 16.04, with geth-linux-amd64-1.7.0-6c6c7b2a

It turned out that my problem was at block 4370000, so updating to 1.7.3 solves it.
The reason is explained in #15265.

@dimcoderx

Ubuntu 17.10 (latest, fresh install)
geth 1.7.3-stable-4bb3c89d (latest)

The problem is "Synchronisation failed, dropping peer". I sync to the latest block, and 20-30 minutes later I get this error. Geth then spends 5-10 minutes trying to reconnect, after which I have to sync again, because I fell 100-200 blocks behind while it was retrying. This error repeats every 20-30 minutes. When I don't get the error, I have the latest block; when I do, it always leaves me 100-200 blocks behind because the node wasn't connected. It's annoying.

INFO [12-05|17:09:16] Imported new chain segment               blocks=1 txs=119  mgas=6.713  elapsed=18.631s   mgasps=0.360 number=4680097 hash=99e572…37a955
INFO [12-05|17:09:28] Imported new chain segment               blocks=1 txs=160  mgas=6.712  elapsed=12.155s   mgasps=0.552 number=4680098 hash=c6ca8f…20651f
WARN [12-05|17:09:28] Synchronisation failed, retrying         err="block body download canceled (requested)"
WARN [12-05|17:09:38] Synchronisation failed, dropping peer    peer=409d9b45abc2a211 err=timeout
WARN [12-05|17:09:55] Synchronisation failed, dropping peer    peer=45a8a36e755912da err=timeout
WARN [12-05|17:10:04] Synchronisation failed, dropping peer    peer=4ae3f639e2ada120 err=timeout
INFO [12-05|17:10:37] Imported new chain segment               blocks=1 txs=97   mgas=6.727  elapsed=14.629s   mgasps=0.460 number=4680099 hash=f897ce…5da529
INFO [12-05|17:10:56] Imported new chain segment               blocks=1 txs=137  mgas=6.727  elapsed=18.696s   mgasps=0.360 number=4680100 hash=28cb99…5c82e9
WARN [12-05|17:10:56] Synchronisation failed, retrying         err="block body download canceled (requested)"
WARN [12-05|17:11:03] Synchronisation failed, dropping peer    peer=45a8a36e755912da err=timeout
WARN [12-05|17:11:14] Synchronisation failed, dropping peer    peer=6cce04b224a22d57 err=timeout
WARN [12-05|17:11:26] Synchronisation failed, retrying         err="block body download canceled (requested)"
WARN [12-05|17:11:56] Synchronisation failed, dropping peer    peer=8af1d7b7928e93bb err=timeout
WARN [12-05|17:12:14] Synchronisation failed, dropping peer    peer=7ec5e61d504cce17 err=timeout
WARN [12-05|17:14:04] Synchronisation failed, dropping peer    peer=b7a2cafcbbb6d497 err=timeout
WARN [12-05|17:14:46] Synchronisation failed, retrying         err="block download canceled (requested)"
WARN [12-05|17:14:56] Synchronisation failed, dropping peer    peer=4bfd5c539119fba6 err=timeout
WARN [12-05|17:15:04] Synchronisation failed, dropping peer    peer=d87932c26878f725 err=timeout
WARN [12-05|17:15:12] Synchronisation failed, dropping peer    peer=3b6c09a5391927d1 err=timeout
WARN [12-05|17:15:54] Synchronisation failed, dropping peer    peer=0b78929ee2b1db7a err=timeout
WARN [12-05|17:16:38] Synchronisation failed, retrying         err="block download canceled (requested)"
WARN [12-05|17:16:48] Synchronisation failed, dropping peer    peer=500c559573dcb6df err=timeout
WARN [12-05|17:16:59] Synchronisation failed, dropping peer    peer=d87932c26878f725 err=timeout
WARN [12-05|17:17:06] Synchronisation failed, dropping peer    peer=25dcd2766622c1d8 err=timeout
WARN [12-05|17:17:26] Synchronisation failed, retrying         err="block body download canceled (requested)"
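
To see which failure reasons dominate a log like the one above, the WARN lines can be tallied by their `err` value. This is only a minimal Python sketch, assuming the log format shown in this thread; the function name `tally_sync_failures` and the file name `geth.log` mentioned below are hypothetical.

```python
import re
from collections import Counter

# Matches geth WARN lines such as:
#   WARN [12-05|17:09:38] Synchronisation failed, dropping peer    peer=409d... err=timeout
# The err value is either a bare token or a double-quoted string.
SYNC_FAIL = re.compile(
    r'Synchronisation failed, (?P<action>dropping peer|retrying)'
    r'.*?\berr=(?P<err>"[^"]*"|\S+)'
)


def tally_sync_failures(lines):
    """Count (action, err) pairs across an iterable of geth log lines."""
    counts = Counter()
    for line in lines:
        match = SYNC_FAIL.search(line)
        if match:
            # Strip the surrounding quotes so quoted and bare values compare equally.
            key = (match.group("action"), match.group("err").strip('"'))
            counts[key] += 1
    return counts
```

Something like `tally_sync_failures(open("geth.log"))` followed by `.most_common()` would then show whether timeouts or invalid-chain errors make up most of the dropped peers.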

@doexclusive

Same problem, Ubuntu 16.04

@raahil190

Same issue on Ubuntu 16.04. Is there a solution for this? Geth version 1.7.3 for me.

@Spacefish

Same issue here Geth 1.7.3 on Windows 10

@albertwh1te

Same issue here: Ubuntu 16.04, version 1.7.3-stable.
Is there a solution for this?

@maxvgi

maxvgi commented Jan 5, 2018

Debian 8, Geth 1.7.3-stable, the same issue. It first occurred yesterday. Geth had been running continuously for about two weeks before the problem occurred.

@bogatyy
Contributor

bogatyy commented Jan 5, 2018

Ubuntu 16.04, Geth 1.7.3-stable (commit 4bb3c89)

Same issue: if a peer has to be dropped because of a timeout, the whole blockchain sync freezes for ~1-2 minutes, disrupting the entire process (as opposed to gently dropping one bad peer and downloading from the others).

@DZDomi

DZDomi commented Jan 6, 2018

Ubuntu 16.04, running inside Docker on AWS with geth 1.7.3-stable, and having the same problem. We have now moved our geth node to a different datacenter (not AWS), and syncing has been stable for the last few days. Are you perhaps also running inside AWS when you hit this issue?

@userpasta

Ubuntu 16.04, geth 1.7.3-stable same issue. I'm running it on GCP, but it throws the same error on my Ubuntu 16.04 workstation at home.

@greensea

Debian 8 64-bit, geth 1.7.3-stable
Same issue: I can't catch up to the latest block and am always about a hundred blocks behind.

@nikashitsa

nikashitsa commented Jan 13, 2018

I have the same issue in Docker with ethereum/client-go:v1.7.2

@agrcrobles

Same issue on macOS High Sierra, on Ropsten.

t=25f1f73de08fa4e5ebaac8cfe7b18fbfdba4598048543e01ca957f180ca98609
INFO [01-17|03:42:32] Finished upgrading chain index           type=cht
WARN [01-17|03:42:43] message loop                             peer=9e99e183b5c71d51 err=EOF
INFO [01-17|03:43:08] Block synchronisation started 
WARN [01-17|03:43:11] Ancestor below allowance                 peer=c2f9fdd74dd62c55 number=92421  hash=000000…000000 allowance=92421
WARN [01-17|03:43:11] Synchronisation failed, dropping peer    peer=c2f9fdd74dd62c55 err="retrieved ancestor is invalid"

@Zumili

Zumili commented Jan 17, 2018

Same thing here, Ubuntu 16.04 geth 1.8.0-unstable go1.9.2
It stays 100-200 blocks behind and doesn't get the actual highest block. Only sometimes, after restarting geth, does it make progress, but even then it does not reach the highest block and gets stuck 100-200 blocks behind again.

@sleimana

Same issue with Geth 1.8 on Ubuntu 16.04.
It synced ~24M tries, then stopped. I restarted geth, but the state reset to zero. I had been waiting for three days. What a pain!

@wtfiwtz

wtfiwtz commented Mar 14, 2018

Please look at this detailed description of the issue: #15001 (comment)

@CryptoKiddies

CryptoKiddies commented Apr 23, 2018

Same issue on Geth 1.8.6. I get

geth[21609]: WARN [04-23|18:52:21] Synchronisation failed, retrying         err="block download canceled (requested)"
Apr 23 18:52:39 ip-xxx-xx-xx-xx geth[21609]: WARN [04-23|18:52:39] Synchronisation failed, dropping peer"

after which the blockchain falls out of sync by 10-100 blocks. This happens every time sync catches up.

Downgrading to Geth 1.8.3 solved my problem for 3 weeks, before the same issues reappeared.

@CryptoKiddies

CryptoKiddies commented May 17, 2018

@wtfiwtz this is not explained by @karalabe's explanation in #15001 (comment). I would love to see an optimization that addresses the problem you succinctly laid out in #14647 (comment). There has to be a suitable built-in alternative to hosting numerous independent nodes for robustness. Either a node waits too long for a response or it drops a peer too quickly. I haven't dug into the code enough (nor am I a Go developer) to figure out which is the case.

There should be a way to avoid settling into a degraded network state where most peers in a subgroup have all fallen behind together. This echo chamber situation should trigger a clearing of peers and a reconnect from the boot/static nodes. Maybe this is too difficult to implement, but it would go a very long way if possible. Perhaps it involves a nearest-neighbor analysis, which should be possible given that all your peers' connections are discoverable.
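
The echo chamber check proposed above could be expressed roughly as follows. This is purely an illustrative sketch, not anything geth implements: the function name, the `max_lag` and `quorum` thresholds, and the idea of having per-peer block heights plus an independent `reference_height` (for example from a second node) are all assumptions.

```python
def peers_look_stale(peer_heights, reference_height, max_lag=50, quorum=0.8):
    """Return True when at least `quorum` of the peers trail the reference
    height by more than `max_lag` blocks.

    peer_heights     -- hypothetical list of each peer's best known block number
    reference_height -- hypothetical trusted height from an independent source
    """
    if not peer_heights:
        # No peers at all: nothing to sync from, so treat it as stale too.
        return True
    behind = sum(1 for height in peer_heights if reference_height - height > max_lag)
    return behind / len(peer_heights) >= quorum
```

A caller that gets `True` would then drop its current peers and reconnect from the boot/static nodes, which is the reaction the comment asks for.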

@aliensyntax

Experiencing the same issue on Ubuntu 18.04 LTS (Bionic Beaver) with a brand new SSD + 8GB RAM. I've been looping around block 1.3M for the last 48 hours (nearly 80 hours total sync time so far and after many other previous attempts) using: geth --syncmode "fast" --cache 2048. For whatever reason, I also can't sync Parity due to similar issues.

It's been nearly a year now since this issue was posted and there doesn't appear to be a working solution. I don't believe it can be safely ignored as a byproduct of inefficient HDDs. Very many Ubuntu/Windows users have reported similar cases with consumer-grade Laptops/PCs + SSDs.

Given the hardware centralization risks, security compromises, and UX nightmare this implies, shouldn't more dev. resources be allocated to find a solution? I've run Geth and Parity nodes in 2016/2017 and never faced this level of obstruction with syncing the chain. Please fix this.

@prashantprabhakar

Having the same issue as the OP. I have a DApp running in production. People pay, but I can't detect the payments because my geth stops syncing (with the "Sync Failed" error).
I have to check daily whether my geth is still syncing; if not, simply restarting it fixes things.
But this is frustrating, and I am worried about how long this can go on. If I forget to check my geth node, I wake up to emails about canceled transactions (due to timeouts in the DApp).

Is there a proper solution to this? It seems everyone gets stuck in this hell at some point.
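
Until there is a real fix, the daily manual check could be automated with a small watchdog. What follows is a minimal Python sketch, not a recommended production setup. It assumes geth was started with `--rpc` and answers JSON-RPC on the default `http://127.0.0.1:8545`; `eth_blockNumber` is a standard JSON-RPC method, while `StallDetector`, `watch`, the 10-minute threshold and the `on_stall` hook are hypothetical names and values chosen for illustration.

```python
import json
import time
import urllib.request

RPC_URL = "http://127.0.0.1:8545"  # assumption: geth started with --rpc on the default port


def block_number(url=RPC_URL, timeout=10):
    """Return the node's current block height via the eth_blockNumber JSON-RPC call."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "method": "eth_blockNumber", "params": [], "id": 1}
    ).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return int(json.load(response)["result"], 16)  # result is a hex string


class StallDetector:
    """Reports a stall once the height has not advanced for max_idle seconds."""

    def __init__(self, max_idle):
        self.max_idle = max_idle
        self.last_height = None
        self.last_change = None

    def observe(self, height, now):
        if self.last_height is None or height > self.last_height:
            # First sample or real progress: remember it and reset the timer.
            self.last_height = height
            self.last_change = now
            return False
        return now - self.last_change >= self.max_idle


def watch(poll_seconds=30, max_idle=600, on_stall=lambda: print("geth looks stalled")):
    """Poll forever; on_stall is a hypothetical hook (send an alert, restart geth, ...)."""
    detector = StallDetector(max_idle)
    while True:
        try:
            height = block_number()
        except OSError:
            height = -1  # RPC unreachable: counts as no progress
        if detector.observe(height, time.monotonic()):
            on_stall()
        time.sleep(poll_seconds)
```

Keeping the stall logic in `StallDetector` separate from the polling loop makes it easy to test without a running node. Note that this only detects that the local height stopped moving; it cannot tell a stalled node from a chain that genuinely produced no blocks, which on mainnet would be unusual over ten minutes.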

@Daz0k

Daz0k commented Jun 11, 2018

Also experiencing this issue (on Windows + SSD (!)). Has any progress been made on finding the cause of this issue?

@Atrides

Atrides commented Jun 12, 2018

Related to #16825, which has a temporary solution

@mcgravier
Author

It seems that the problem can be caused by memory corruption. In my case, after tuning the RAM to more conservative settings and resyncing from scratch, everything finally started working fine.

@mattickx

mattickx commented Jul 4, 2018

@mcgravier what led you to conclude it was caused by memory corruption?
Could you also explain exactly what you changed when "tuning RAM to more conservative settings"?
It might help me and others.

@mcgravier
Author

@mathieumagalhaes

I have a PC with a Ryzen 7 processor and 3200 MHz memory. However, this particular processor is only guaranteed to work with 2666 MHz memory; anything beyond that is considered overclocking and may not be stable. I ran memtest for 24 hours and it reported memory errors, so I reduced the memory clock from 3200 MHz to 2666 MHz and all the issues disappeared. I can now run a Geth node for extended periods without hitting the issue.

@bbeeley

bbeeley commented Jul 8, 2018

I have this same issue with GCP. I have tried firewall settings (allowing UDP and TCP traffic), rebooting, changing RAM/CPUs, etc. I believe memory corruption would be unlikely, since I assume GCP gives me different hardware every time I reboot and/or change the amount of memory allocated to my VM.

Has anyone looked into firewall TCP session timeouts? I am wondering whether some common firewall settings (on either the local firewall or a peer's firewall) cause connections to be dropped if they are inactive for a certain length of time (10 minutes on GCP), and whether Geth could run into that in the course of normal operation. I am currently testing my theory on GCP by changing keepalive settings, but given the intermittent nature of this issue, it will be difficult to be sure I am on the right track. Has anyone else looked into this?
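
For readers unfamiliar with the mechanism, this is roughly what TCP keepalive tuning looks like at the socket level. It is a minimal Python sketch on a socket the program itself owns; it does not change geth's own connections, and whether geth enables keepalive on its peer sockets is not established in this thread. The helper name `enable_keepalive` and the timer values are assumptions, picked only so that probes start well before a 10-minute idle timeout.

```python
import socket


def enable_keepalive(sock, idle=120, interval=30, probes=4):
    """Turn on TCP keepalive and, where the platform exposes them, tune the timers.

    idle     -- seconds of silence before the first probe is sent
    interval -- seconds between unanswered probes
    probes   -- unanswered probes before the connection is declared dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These three constants are the Linux names; other platforms may lack some of them.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", probes),
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

With an idle time of 120 seconds, an otherwise silent connection would carry a probe every couple of minutes, which should keep a stateful firewall from treating it as inactive. The equivalent system-wide defaults on Linux live in the `net.ipv4.tcp_keepalive_*` sysctls, but those only affect sockets that have keepalive enabled.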

@fjl
Contributor

fjl commented Jan 22, 2019

Sorry, closing this because the report isn't actionable. There is no single bug in geth that causes sync failures. We are aware that sync may sometimes fail for networking reasons.
