Clique : Discarded bad propagated block#1 when syncing #14945

Open
mawenpeng opened this Issue Aug 9, 2017 · 8 comments

mawenpeng commented Aug 9, 2017

System information

Geth Version: 1.6.7-stable
OS & Version: CentOS Linux release 7.3.1611
Architecture: amd64
Protocol Versions: [63 62]
Go Version: go1.8.3

Expected behaviour

Sync block#1 successfully

Actual behaviour

Discarded block#1

Steps to reproduce the behaviour

  1. Init a private chain of 4 nodes with the Clique consensus engine: 3 nodes generate blocks, 1 more node only syncs blocks.
  2. Import the private keys into the 3 signing nodes and unlock the accounts.
  3. Add peers so that all 4 nodes are connected to each other.
  4. Start mining on 1 node; the other nodes will log warnings saying "Discarded bad propagated block number=1 hash=0ee7bf…0adaa0".

Not sure if this occurs with PoW.
It may also happen with 2 or 3 nodes.
If peers are added after mining has started on 1 node, the other nodes sync block #1 successfully.
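
For context on step 1, here is a minimal sketch of the kind of chain config such a private Clique network runs on. This is not the reporter's actual genesis; the chain id, period and epoch are placeholder assumptions, and the field names follow the params package as of 1.6.x/1.7.x.

// Hypothetical chain config for a small private Clique network (placeholder values).
package main

import (
	"fmt"
	"math/big"

	"github.com/ethereum/go-ethereum/params"
)

func main() {
	config := &params.ChainConfig{
		ChainId:        big.NewInt(1515), // assumed private chain id
		HomesteadBlock: big.NewInt(0),
		EIP155Block:    big.NewInt(0),
		EIP158Block:    big.NewInt(0),
		Clique: &params.CliqueConfig{
			Period: 15,    // assumed seconds between blocks
			Epoch:  30000, // checkpoint interval for resetting votes
		},
	}
	fmt.Println(config)
}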

@mawenpeng mawenpeng changed the title Clique block#1: Discarded bad propagated block when syncing Clique : Discarded bad propagated block#1 when syncing Aug 9, 2017

@karalabe karalabe self-assigned this Aug 9, 2017

joeb000 commented Aug 23, 2017

I am having a similar issue with my multi-node PoA setup. I only have a single miner, mining interval set to 1 second, and the "bitchin tricks" mining hack is in place (only mine when there is a tx in the pool).

I send a transaction from a non-mining node, it propagates, and a block is subsequently mined with the transaction in it. Everything actually works great except for the two WARN messages I am seeing in the console:

WARN [08-23|10:44:56] Discarded bad propagated block           number=10 hash=8810d8…d993f0
INFO [08-23|10:44:56] Imported new state entries               count=1   flushed=0 elapsed=172.953µs    processed=20 pending=4  retry=0 duplicate=0 unexpected=0
INFO [08-23|10:44:56] Imported new state entries               count=3   flushed=0 elapsed=243.195µs    processed=23 pending=7  retry=0 duplicate=0 unexpected=0
INFO [08-23|10:44:56] Imported new block headers               count=2   elapsed=2.640ms      number=11 hash=18316a…550c90 ignored=9
INFO [08-23|10:44:56] Imported new state entries               count=3   flushed=4 elapsed=96.093µs     processed=26 pending=4  retry=0 duplicate=0 unexpected=0
INFO [08-23|10:44:56] Imported new chain segment               blocks=2 txs=3 mgas=0.021 elapsed=885.425µs    mgasps=23.717 number=11 hash=18316a…550c90 ignored=4
WARN [08-23|10:44:56] Synchronisation failed, retrying         err="state data download canceled (requested)"

Even though it says synchronisation failed, it actually didn't: the block propagated successfully and the transaction was processed.

sirnicolas21 commented Sep 8, 2017

I am having the same issue, with 3 signer/mining nodes and one non-mining node that submits transactions. When a transaction is mined, one of the mining nodes reports a "bad block unknown ancestor" error and stops mining; the whole network gets stuck after that.

version 1.7.0 on commit (#14631)

amissine commented Sep 9, 2017

I just tried it with the latest commit 10181b5: PoW does not have this problem and works like a charm. It must be a PoA issue.

OniReimu commented Oct 14, 2017

From my understanding:

func NewProtocolManager(config *params.ChainConfig, mode downloader.SyncMode, networkId uint64, maxPeers int, mux *event.TypeMux, txpool txPool, engine consensus.Engine, blockchain *core.BlockChain, chaindb ethdb.Database) (*ProtocolManager, error) {
// ...
	// Figure out whether to allow fast sync or not
	if mode == downloader.FastSync && blockchain.CurrentBlock().NumberU64() > 0 {
		log.Warn("Blockchain not empty, fast sync disabled")
		mode = downloader.FullSync
	}
	if mode == downloader.FastSync {
		manager.fastSync = uint32(1)
	}
// ...
	inserter := func(blocks types.Blocks) (int, error) {
		// If fast sync is running, deny importing weird blocks
		if atomic.LoadUint32(&manager.fastSync) == 1 {
			log.Warn("Discarded bad propagated block", "number", blocks[0].Number(), "hash", blocks[0].Hash())
			return 0, nil
		}
		atomic.StoreUint32(&manager.acceptTxs, 1) // Mark initial sync done on any fetcher import
		return manager.blockchain.InsertChain(blocks)
	}
//...
}

When the ProtocolManager is created, the sync mode is taken from DefaultConfig in eth/config.go, where the default mode is FastSync.

// DefaultConfig contains default settings for use on the Ethereum main net.
var DefaultConfig = Config{
	SyncMode:             downloader.FastSync,
	EthashCacheDir:       "ethash",
	EthashCachesInMem:    2,
	EthashCachesOnDisk:   3,
	EthashDatasetsInMem:  1,
	EthashDatasetsOnDisk: 2,
	NetworkId:            1,
	LightPeers:           20,
	DatabaseCache:        128,
	GasPrice:             big.NewInt(18 * params.Shannon),

	TxPool: core.DefaultTxPoolConfig,
	GPO: gasprice.Config{
		Blocks:     10,
		Percentile: 50,
	},
}

Note the call chain:

  1. Storing eth/backend.go - New() in []serviceFuncs:
    cmd/geth/main.go - geth() -> cmd/geth/config.go - makeFullNode() -> cmd/utils/flags.go - RegisterEthService() -> node/node.go - Register()
  2. Run eth/backend.go - New():
    cmd/geth/main.go - geth() -> startNode() -> cmd/utils/cmd.go - StartNode() -> node/node.go - start() -> eth/backend.go - New() -> eth/handler.go - NewProtocolManager()

That is, on a fresh node whose current block height is 0, FastSync stays enabled. So when inserter is called by the block fetcher while fast sync is still active, log.Warn("Discarded bad propagated block", "number", blocks[0].Number(), "hash", blocks[0].Hash()) fires and the propagated block is dropped instead of imported.
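
To make that gating concrete, here is a tiny self-contained sketch (not geth code, just the same pattern as the quoted inserter) of why a block propagated while the fastSync flag is still set gets discarded, and why it would be imported once the flag is cleared, e.g. by starting the node with --syncmode "full".

// Toy illustration of the fast-sync gate described above (not geth code).
package main

import (
	"fmt"
	"sync/atomic"
)

func main() {
	// Set at startup because CurrentBlock().NumberU64() == 0 on a fresh node.
	var fastSync uint32 = 1

	inserter := func(blockNumber uint64) {
		if atomic.LoadUint32(&fastSync) == 1 {
			fmt.Printf("Discarded bad propagated block number=%d\n", blockNumber)
			return
		}
		fmt.Printf("Imported block number=%d\n", blockNumber)
	}

	// Block #1 arrives via propagation before fast sync has finished: dropped.
	inserter(1)

	// Once the flag is cleared (full sync mode, or fast sync done), it imports.
	atomic.StoreUint32(&fastSync, 0)
	inserter(1)
}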

facundomedica commented Mar 1, 2018

Any updates on this?
EDIT: I think I figured it out.

Check:

  1. That you have a "random enough" networkId.
  2. That the --nodiscover flag is set, and that your nodes are added manually.

Reason: some peers showed up in admin.peers whose IPs I didn't recognize (I was running only 2 nodes locally), and they were from somewhere in the USA. So you may be getting connections from other networks that interfere with yours.
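
In case it helps, a rough sketch of the same idea when configuring the node programmatically instead of via flags; the field names are assumptions based on the node and p2p packages of that era, and the enode URL is a placeholder you would replace with your own.

// Hypothetical: disable discovery and pin the peer set, so no foreign peers
// from the public network can join the private chain.
package main

import (
	"log"

	"github.com/ethereum/go-ethereum/node"
	"github.com/ethereum/go-ethereum/p2p"
	"github.com/ethereum/go-ethereum/p2p/discover"
)

func main() {
	// Placeholder enode URL of another local node; replace <pubkey> and port.
	peer, err := discover.ParseNode("enode://<pubkey>@127.0.0.1:30304")
	if err != nil {
		log.Fatal(err)
	}
	stack, err := node.New(&node.Config{
		P2P: p2p.Config{
			NoDiscovery: true,                   // same effect as --nodiscover
			StaticNodes: []*discover.Node{peer}, // add your nodes manually
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	_ = stack // register the eth service and call stack.Start() as usual (omitted)
}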

REPTILEHAUS commented Mar 20, 2018

Also hitting this issue: running 2 nodes and 4 signing nodes locally on different ports, connected via admin.addPeer(). It usually works with just 2 nodes and 2 signers.

facundomedica commented Mar 20, 2018

@REPTILEHAUS have you tried what is in my comment above? When you list the peers, are all of them known to you?

christiankiller commented Mar 20, 2018

We also ran into many issues and changed many things along the way; maybe one of these will solve your issue:

  1. Take a look at this repository for local deployment and this repository if you plan to run it on the cloud.
  2. Only add two sealers to extradata in your genesis.json (e.g. if you have 5 nodes in total, only authorize 2 signers in extradata; also check EIP-225).
  3. Add a sleep between calling miner.start() on the two authorized nodes; somehow they raced and blocked each other when started too close together.
  4. Add --syncmode "full" to your geth command (a sketch of the equivalent config follows below).
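
For the last point, here is a minimal sketch of forcing full sync programmatically (e.g. when embedding geth) rather than via the CLI flag. It only overrides SyncMode on the DefaultConfig quoted earlier in this thread; the NetworkId value is an assumption standing in for a "random enough" private id.

// Hypothetical: copy eth.DefaultConfig and switch off fast sync so the
// "Discarded bad propagated block" inserter path is never taken. Equivalent
// in spirit to passing --syncmode "full" on the command line.
package main

import (
	"fmt"

	"github.com/ethereum/go-ethereum/eth"
	"github.com/ethereum/go-ethereum/eth/downloader"
)

func main() {
	cfg := eth.DefaultConfig
	cfg.SyncMode = downloader.FullSync // avoid the FastSync gate in NewProtocolManager
	cfg.NetworkId = 60606              // assumed private network id

	fmt.Printf("sync mode: %v, network id: %d\n", cfg.SyncMode, cfg.NetworkId)
}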