Transaction propagation issue on private test net since geth 1.4.6 #2769

Closed
nimmaj opened this Issue Jul 1, 2016 · 21 comments

nimmaj commented Jul 1, 2016

System information

Geth version: 1.4.6
OS & Version: Linux(Ubuntu:xenial)

Synopsis

I've got two geth nodes joined together on a private test net. Both have a coinbase. I can send transactions from the coinbase on the miner to the coinbase on the client geth. Since 1.4.6 I cannot send them back.

Steps to reproduce:

  • bring up two geths, create coinbase on each, unlock accounts
  • join them together using admin_addPeer
  • start mining on one node - wait for it to have some ether
  • send a transaction of, say, 4 ether from the miner coinbase to the non-miner coinbase
  • observe that this succeeds
  • once the non-miner coinbase has a balance, send some ether back
  • observe that on 1.4.5 this works fine; on 1.4.6 this transaction is never included in a block
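
For anyone scripting these steps over JSON-RPC instead of the console, here is a rough sketch of the request bodies involved. `admin_addPeer`, `miner_start` and `eth_sendTransaction` are geth RPC methods; the enode URL, the port 30303 and the two addresses are placeholders I made up for illustration, not values taken from the attached test.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math/big"
)

// rpcRequest is a plain JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string        `json:"jsonrpc"`
	Method  string        `json:"method"`
	Params  []interface{} `json:"params"`
	ID      int           `json:"id"`
}

// etherToHexWei converts a whole number of ether into the 0x-prefixed hex
// wei string used in the "value" field of eth_sendTransaction.
func etherToHexWei(ether int64) string {
	weiPerEther := new(big.Int).Exp(big.NewInt(10), big.NewInt(18), nil)
	wei := new(big.Int).Mul(big.NewInt(ether), weiPerEther)
	return "0x" + wei.Text(16)
}

// buildRequest marshals one request body for the given method and params.
func buildRequest(id int, method string, params ...interface{}) string {
	body, err := json.Marshal(rpcRequest{JSONRPC: "2.0", Method: method, Params: params, ID: id})
	if err != nil {
		panic(err)
	}
	return string(body)
}

func main() {
	// Placeholders only: substitute the real enode URL and coinbase addresses.
	const minerEnode = "enode://MINER_NODE_ID@192.168.99.100:30303"

	fmt.Println(buildRequest(1, "admin_addPeer", minerEnode)) // sent to the client node
	fmt.Println(buildRequest(2, "miner_start", 1))            // sent to the miner node
	// Miner -> client: succeeds on both versions.
	fmt.Println(buildRequest(3, "eth_sendTransaction", map[string]string{
		"from": "0xMINER_COINBASE", "to": "0xCLIENT_COINBASE", "value": etherToHexWei(4),
	}))
	// Client -> miner: the transaction that is never included on 1.4.6.
	fmt.Println(buildRequest(4, "eth_sendTransaction", map[string]string{
		"from": "0xCLIENT_COINBASE", "to": "0xMINER_COINBASE", "value": etherToHexWei(1),
	}))
}
```

The first three requests behave the same on both versions; only the last one goes missing on 1.4.6.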

Presumably there is either a bug or I've misconfigured my setup. But the fact that the txns flow in one direction seems interesting.

I've attached some files that show the problem (dag generation takes a while). They work on a mac with docker-machine started and presume a docker machine ip address of 192.168.99.100 - you might need to change this to localhost on linux (there are command line args for this).

The runTest.sh file assumes that you are in a directory called ethBug to help it find the ip address.

  • docker-compose -f geth-docker-1.4.5 build
  • docker-compose -f geth-docker-1.4.5 up
  • npm install (i'm using node5, but presumably node6 will work)
  • ./runTest.sh

At this point observe that the test (eventually) finishes with a payment back.

Then:

  • ctrl-c the running docker compose (once)
  • docker-compose -f geth-docker-1.4.5 down
  • docker-compose -f geth-docker-1.4.6 build
  • docker-compose -f geth-docker-1.4.6 up
  • ./runTest.sh

Observe that this does not complete.

You can see this in the logs for 1.4.6:

geth-client_1  | I0701 13:09:53.111635 eth/api.go:1193] Tx(0x155385db87d578a08c5efb8486335da0ee690f72c0d10a95ead23763f4fffa09) to: 0xc64b8fc5796146dd68727e7bc3fba4edd7d30bb2
geth-miner_1   | I0701 13:09:53.854263 miner/worker.go:337] 🔨  Mined block (#7 / e95015dc). Wait 5 blocks for confirmation
geth-miner_1   | I0701 13:09:53.854502 miner/worker.go:555] commit new work on block 8 with 0 txs & 0 uncles. Took 117.752µs
geth-miner_1   | I0701 13:09:53.854599 miner/worker.go:433] 🔨 🔗  Mined 5 blocks back: block #2
geth-miner_1   | I0701 13:09:53.855311 miner/worker.go:555] commit new work on block 8 with 0 txs & 0 uncles. Took 223.753µs
geth-client_1  | I0701 13:09:53.865109 core/blockchain.go:964] imported 1 block(s) (0 queued 0 ignored) including 0 txs in 4.61781ms. #7 [e95015dc / e95015dc]
geth-miner_1   | I0701 13:09:55.576549 miner/worker.go:337] 🔨  Mined block (#8 / 67adc042). Wait 5 blocks for confirmation
geth-miner_1   | I0701 13:09:55.577478 miner/worker.go:555] commit new work on block 9 with 0 txs & 0 uncles. Took 847.668µs

So the txn never seems to be included in the block. The client has the txn as a pending txn. The miner has 0 pending txns.
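
To compare the two runs without eyeballing the logs, something like the following could pull the tx count out of the miner's "commit new work" lines. It is only a sketch: the regular expression is written against the log format shown above, and `txsPerBlock` is a name I made up.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// commitLine matches the miner's "commit new work" log lines shown above.
var commitLine = regexp.MustCompile(`commit new work on block (\d+) with (\d+) txs`)

// txsPerBlock returns, for every block number seen in "commit new work"
// lines, the largest tx count the miner reported for that block.
func txsPerBlock(log string) map[int]int {
	counts := make(map[int]int)
	for _, line := range strings.Split(log, "\n") {
		m := commitLine.FindStringSubmatch(line)
		if m == nil {
			continue // not a "commit new work" line
		}
		// The regexp only captures digits, so conversion errors are ignored here.
		block, _ := strconv.Atoi(m[1])
		txs, _ := strconv.Atoi(m[2])
		if prev, seen := counts[block]; !seen || txs > prev {
			counts[block] = txs
		}
	}
	return counts
}

func main() {
	log := `geth-miner_1   | I0701 13:09:53.854502 miner/worker.go:555] commit new work on block 8 with 0 txs & 0 uncles. Took 117.752µs
geth-miner_1   | I0701 13:09:55.577478 miner/worker.go:555] commit new work on block 9 with 0 txs & 0 uncles. Took 847.668µs`
	for block, txs := range txsPerBlock(log) {
		fmt.Printf("block %d: %d txs\n", block, txs)
	}
}
```

On the 1.4.6 log every block reports 0 txs, while the 1.4.5 log shows 1 tx for block 8.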

Presumably the transaction is not propagating from the client to the miner. Anyone have any idea why?

For completeness here is the corresponding part of the 1.4.5 logs:

geth-client_1  | I0701 13:07:45.585119 eth/api.go:1193] Tx(0xd832a7543de23e8d3094555a1e5bd65185d15b4ec7bebe0dde392524f3a8b636) to: 0x1558a5b5ad08b9bcad764e27cff23c860110bc0b
geth-miner_1   | I0701 13:07:45.768951 miner/worker.go:337] 🔨  Mined block (#7 / f552fdc9). Wait 5 blocks for confirmation
geth-miner_1   | I0701 13:07:45.769838 miner/worker.go:555] commit new work on block 8 with 1 txs & 0 uncles. Took 259.71µs
geth-miner_1   | I0701 13:07:45.777002 miner/worker.go:433] 🔨 🔗  Mined 5 blocks back: block #2
geth-client_1  | I0701 13:07:45.782043 core/blockchain.go:959] imported 1 block(s) (0 queued 0 ignored) including 0 txs in 5.709456ms. #7 [f552fdc9 / f552fdc9]
geth-miner_1   | I0701 13:07:45.783540 miner/worker.go:555] commit new work on block 8 with 1 txs & 0 uncles. Took 6.46318ms
geth-miner_1   | I0701 13:07:48.797639 miner/worker.go:337] 🔨  Mined block (#8 / 30e63db1). Wait 5 blocks for confirmation
geth-miner_1   | I0701 13:07:48.798662 miner/worker.go:555] commit new work on block 9 with 0 txs & 0 uncles. Took 356.817µs
geth-miner_1   | I0701 13:07:48.799672 miner/worker.go:433] 🔨 🔗  Mined 5 blocks back: block #3
geth-miner_1   | I0701 13:07:48.808388 miner/worker.go:555] commit new work on block 9 with 0 txs & 0 uncles. Took 8.576268ms
geth-client_1  | I0701 13:07:48.815356 core/blockchain.go:959] imported 1 block(s) (0 queued 0 ignored) including 1 txs in 8.734131ms. #8 [30e63db1 / 30e63db1]

Thanks v. much :-)
ethBug.zip

coeniebeyers Jul 4, 2016

I can confirm this behaviour. Any progress? This doesn't make any sense to me, but playing around with the --nodiscover flag seemed to solve some issues for me.

karalabe Jul 4, 2016

Member

This was actually a feature that behaves in a weird way on private networks with a single miner.

We've noticed that on the main network, when new nodes join, they already receive and try to process transactions against whatever stale state they have. Since initial sync can take quite some time, new nodes manage to pile up tens of thousands of transactions, which puts an enormous burden on them. Our solution was that nodes do not accept new transactions from remote nodes until they complete their initial sync cycle (= receive either a long chain from a remote node, or a fresh enough block).

In a private scenario with only 1 miner, the miner actually never completes a sync since it never receives a block from anyone else (as it is the only one minting the blocks). It's an unfortunate corner case we didn't think of.

One solution is to assume the chain is synced when a miner starts mining, but the drawback is that people starting a new node with --mine preset could end up in the same sync issue. Another solution was to only accept txs during actual mining (mining is halted during sync), but this causes a single-miner private-network node to stop accepting new txs if mining is stopped (e.g. only mine if txs are present). There's a third (albeit rare) corner case when there is no miner at all, so the sync cycle is never assumed complete.

I'm trying to figure out the best solution for this.
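
To make the corner case concrete, here is a toy model of the gate. It is a simplified sketch, not the actual geth code: `gatedNode` and its methods are hypothetical names, and "sync complete" is reduced to "a block arrived from a peer".

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// gatedNode is a hypothetical, stripped-down stand-in for a geth node.
// It is not the real protocol manager; it only models the sync gate.
type gatedNode struct {
	synced  uint32   // stays 0 until an initial sync cycle completes
	pending []string // hashes of transactions accepted from peers
}

// handleRemoteTx drops transactions from peers until the node is synced
// and reports whether the transaction was accepted.
func (n *gatedNode) handleRemoteTx(hash string) bool {
	if atomic.LoadUint32(&n.synced) == 0 {
		return false
	}
	n.pending = append(n.pending, hash)
	return true
}

// importRemoteBlock stands in for "received a fresh enough block from a
// peer", which is what completes the sync cycle in this simplified model.
func (n *gatedNode) importRemoteBlock() {
	atomic.StoreUint32(&n.synced, 1)
}

// mineLocalBlock models the sole miner sealing its own block. Nothing
// arrives from a peer, so the synced flag is left untouched.
func (n *gatedNode) mineLocalBlock() {}

func main() {
	miner := &gatedNode{}
	for i := 0; i < 8; i++ {
		miner.mineLocalBlock()
	}
	fmt.Println("accepted while sole miner:", miner.handleRemoteTx("0x1553"))

	// A block produced by any other node flips the flag.
	miner.importRemoteBlock()
	fmt.Println("accepted after a peer's block:", miner.handleRemoteTx("0x1553"))
}
```

However many blocks the sole miner seals itself, the flag never changes, which matches the one-directional behaviour reported above.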

aman-c Jul 4, 2016

I am also facing the same issue in geth 1.4.7.

nimmaj Jul 4, 2016

@karalabe - that's really interesting, thanks. So in my case I could work around the issue by mining a bit on each node and letting them sync before switching to a single miner. I will try that and report back later today.

You might ask why we're only using one miner? We are trying to have broadly one block per second and we've found that we get into a lot of trouble with multiple miners and low difficulties so we've switched to one miner for the moment. There are other rather interesting issues in this use case.

Many thanks for the clear explanation about what is going on - that's really helpful.

nimmaj Jul 4, 2016

As per your effective suggestion, mining on a random client node for a block while the miner is not mining makes this problem go away for my test case and our use case. Many thanks.

dan-turner Jul 5, 2016

I just lost two days on this... Thanks @nimmaj for creating this otherwise it could well have been another two. Any developments on an alternate solution?

Is this the sum total of the relevant changes? (Going to revert the changes in my private fork)

ecb8e23

wawrzek Jul 6, 2016

@karalabe Could you add the 'Private Network' label, please?

fjl added the private network label Jul 7, 2016

fjl Jul 7, 2016

Contributor

done (for all of them)

mjackson001 Jul 27, 2016

I just had a strange case presumably related somehow to this issue on a private testnet with three miners. The "main" miner was the only one mining and the other two miners were connected to the main miner but not each other. At first, I observed the behavior above.

Reading the discussion above, I started mining on the two other nodes, at which point transactions were indeed passed along to the main node. However, the two other miners happened to be mining faster than the main miner, and they were all out of sync in block numbers. The main miner (slowest) was in the 1930s, and the other two were in the 2030s and 2070s, respectively. None were syncing to the longest chain, but all were passing on transactions. I verified all were connected correctly by checking admin.peers.

wawrzek Aug 15, 2016

@mjackson001 I remember seeing a similar situation.

cdetrio Aug 24, 2016

Member

One solution is to replace the conditional at eth/handler.go#L668:

if atomic.LoadUint32(&pm.synced) == 0 {

with

if pm.downloader.Synchronising() {

It's not a perfect fix, since some txs might arrive before downloading starts, and some txs may occasionally slip through when downloading is interrupted (when a peer is dropped, pm.downloader.Synchronising() returns false until downloading resumes with a different peer). But it should filter most txs that arrive before syncing is finished.
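
The difference between the two checks can be sketched as a pair of predicates over a simplified state. This is an illustration under my own assumptions, not the geth code: `syncState` and both function names are hypothetical.

```go
package main

import "fmt"

// syncState is a hypothetical snapshot of the two signals discussed here.
type syncState struct {
	everSynced  bool // models pm.synced != 0
	downloading bool // models pm.downloader.Synchronising()
}

// dropWithSyncedFlag is the current check: drop txs until a sync has completed.
func dropWithSyncedFlag(s syncState) bool { return !s.everSynced }

// dropWhileDownloading is the proposed check: drop txs only during a download.
func dropWhileDownloading(s syncState) bool { return s.downloading }

func main() {
	scenarios := []struct {
		name  string
		state syncState
	}{
		{"sole miner, never synced", syncState{everSynced: false, downloading: false}},
		{"fresh node, mid-download", syncState{everSynced: false, downloading: true}},
		{"fresh node, peer dropped mid-sync", syncState{everSynced: false, downloading: false}},
		{"fully synced node", syncState{everSynced: true, downloading: false}},
	}
	for _, sc := range scenarios {
		fmt.Printf("%-36s current drops=%-5v proposed drops=%v\n",
			sc.name, dropWithSyncedFlag(sc.state), dropWhileDownloading(sc.state))
	}
}
```

The first and third scenarios have the same state, which is exactly the trade-off: the proposed check unblocks the sole miner but also lets txs through while a sync is interrupted.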

ethernomad Aug 30, 2016

Contributor

Just lost 3 hours because of this bug. 😩

joeb000 Sep 6, 2016

I was having the same issue (see ticket #2980) - it was easy enough to work around this issue by spinning up another mining geth instance on the same machine - less than ideal but definitely solves the issue for me.

@karalabe - really appreciate the help! Thank you!

randomnerd Oct 17, 2016

just ran into the same issue. this looks like a major bug. transaction does not even appear in eth.pendingTransactions at the non-mining machine...
any help?

randomnerd Oct 17, 2016

adding to the previous post:
our use case is one central mining node and a lot of non-mining clients that interact with contracts. i believe PoA schema would fit better, but afaik there is no PoA in geth yet.
so how do we set up the network to allow non-mining clients to send transactions?

iFA88 Oct 24, 2016

I think if you remove these lines:
https://github.com/ethereum/go-ethereum/blob/develop/eth/handler.go#L676-L678

And compile it ONLY for your miner client on your server; then it should work as a temporary fix.

mattcrooks Nov 1, 2016

+1 We are seeing the same behavior on our private net

eloudsa Dec 19, 2016

I'm using Geth 1.5.5 on a private network with one mining node (M) and a non-mining node (A). On the node A, I have a static-nodes.json file describing the node M.
I'm able to propagate transactions from M->A but not from A->M.
Is my problem related to this issue?
Is there any workaround other than patching the source code and rebuilding a new version of Geth?
Thanks.

eloudsa Dec 20, 2016

As a workaround, I have followed the solution given by @karalabe.
I started 2 miners on my machine and updated the file "static-nodes.json" accordingly.
My non-miner node is now able to send transactions that are processed by one of the miners.

randomnerd Jan 31, 2017

using parity (1.5 release with PoA) did the trick :)

obscuren Jan 31, 2017

Member

Closing this issue because it's relatively outdated and likely to be fixed. Please open a new issue with any of the 1.5.x versions.
