Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Block is lost and no one wrapped it as uncle, even myself #2298

Closed
sammy007 opened this issue Mar 6, 2016 · 45 comments
Closed

Block is lost and no one wrapped it as uncle, even myself #2298

sammy007 opened this issue Mar 6, 2016 · 45 comments
Assignees
Milestone

Comments

@sammy007
Copy link
Contributor

sammy007 commented Mar 6, 2016

I found block at height N. Usually I expect it to be at least an uncle in case of race. But no one included it as uncle in next 7 blocks. Also I found N+5 block and if my block is in the same geth instance block N was not included as uncle in N+5 block. This is weird. Can anyone explain how that happened? I have more than 96 peers for propagation.

This is not a single case, day before that happened too and I remember some lost blocks since ETH launch. Is there unresolved issues in network protocol? If I can't include my own block as uncle in my own block I think it's a design flaw or at least a bug.

@obscuren
Copy link
Contributor

obscuren commented Mar 6, 2016

Took a quick look through the code (on my phone) and I think there's indeed an error when there's a chain reorganization. It appears that when a block gets reverts - because it's deemed "less worth" - it doesn't get announced as SideChain even (this is the event that fired for uncle blocks). This likely results in the behavior you explained above.

I'll have a look tomorrow when I get home. Thanks for reporting the issue.

@sammy007
Copy link
Contributor Author

sammy007 commented Mar 6, 2016

Thanks. FYI, following blocks after N were uncle-free, so no limit was reached.

@Atrides
Copy link

Atrides commented Mar 6, 2016

I agree that's big issue. Up to release 1.3.5 I had 3-7 mined blocks that not in uncles comes.
After upgrade to 1.3.5 (3th March) now every day 30-40 lost blocks:

27 | 2016-03-06 00:00:00+01
39 | 2016-03-05 00:00:00+01
30 | 2016-03-04 00:00:00+01
 6 | 2016-03-03 00:00:00+01
 4 | 2016-03-02 00:00:00+01
 8 | 2016-03-01 00:00:00+01
 7 | 2016-02-29 00:00:00+01
 4 | 2016-02-28 00:00:00+01
 3 | 2016-02-27 00:00:00+01
 2 | 2016-02-26 00:00:00+01

@Atrides
Copy link

Atrides commented Mar 6, 2016

I'll prepare downgrade to 1.3.3 all my servers, at least till 1150000 block.

@obscuren
Copy link
Contributor

obscuren commented Mar 6, 2016

This issue is present in all versions of geth. Downgrading does not change anything if only making matters worse.

@Atrides
Copy link

Atrides commented Mar 6, 2016

Yes, it's in all versions, but since 1.3.5 were 5x more such blocks.

@obscuren
Copy link
Contributor

obscuren commented Mar 6, 2016

It won't matter. There's a change in all clients that determines the heaviest chain to prevent selfish mining. Downgrading won't change anything.

@obscuren
Copy link
Contributor

obscuren commented Mar 6, 2016

(Unless you can produce more blocks)

@obscuren
Copy link
Contributor

obscuren commented Mar 6, 2016

Relevant commit obscuren@5b28366

@Atrides
Copy link

Atrides commented Mar 6, 2016

I don't want produce more blocks to destroy somewhat. The miners get less coins if I lost 30 blocks per day. But I'll monitor block situation after downgrade. Tomorrow I'll report it.

@obscuren
Copy link
Contributor

obscuren commented Mar 6, 2016

Thanks. Looking forward to seeing the results of your experiment.

@sammy007
Copy link
Contributor Author

sammy007 commented Mar 6, 2016

@obscuren so this is still a problem or not a problem/by design? Just not clear for me after you referred to 5b28366 commit.

@obscuren
Copy link
Contributor

obscuren commented Mar 6, 2016

@sammy007 that commit isn't about the "self-uncle-mining". The linked commit shows a change in the chain acceptance code.

obscuren added a commit to obscuren/go-ethereum that referenced this issue Mar 7, 2016
Previously all blocks that were already in our chain were never re
announced as potential uncle block (e.g. ChainSideEvent). This is
problematic during mining where you want to gather as much possible
uncles as possible increasing the profit. This is now addressed in this
PR where during reorganisations of chains the old chain is regarded as
uncles.

Fixed ethereum#2298
@obscuren obscuren self-assigned this Mar 7, 2016
@obscuren
Copy link
Contributor

obscuren commented Mar 7, 2016

@sammy007 if you'd like to experiment try out the linked PR 👍

@Atrides
Copy link

Atrides commented Mar 7, 2016

One day of experiment. Downgrading to 1.3.3 doesn't help because of other nodes.

21 | 2016-03-07 00:00:00+01
32 | 2016-03-06 00:00:00+01
39 | 2016-03-05 00:00:00+01
30 | 2016-03-04 00:00:00+01
 6 | 2016-03-03 00:00:00+01
 4 | 2016-03-02 00:00:00+01

@sammy007
Copy link
Contributor Author

sammy007 commented Mar 7, 2016

@Atrides prolly you can try this PR on one of your nodes, I can't afford testing today, night is coming and I won't be eaten by my whale on morning.

@peterbitfly
Copy link
Contributor

I have tested the new PR and it looks like it does not solve the problem. I have mined block number 1117390 but suprnova beat me to it. If I look at the consecutive blocks no miner included it as an uncle.

Nevertheless what did work was that local uncles will be included in local mined blocks (if you are able to mine one fast enough, see block 1117351).

obscuren added a commit to obscuren/go-ethereum that referenced this issue Mar 8, 2016
Previously all blocks that were already in our chain were never re
announced as potential uncle block (e.g. ChainSideEvent). This is
problematic during mining where you want to gather as much possible
uncles as possible increasing the profit. This is now addressed in this
PR where during reorganisations of chains the old chain is regarded as
uncles.

Fixed ethereum#2298
@obscuren
Copy link
Contributor

@ppratscher This fix requires for at least 50% of the miners to update. If only you have this fix only you will include other people their reorganised blocks. It does not, unfortunately, work the other way around.

@Atrides
Copy link

Atrides commented Mar 11, 2016

If this fix works, upload to upstream and I can update all my nodes. Then we need yet another pool for that, at example ethpool can do it also.

@peterbitfly
Copy link
Contributor

ethpool has already been updated! Will be interesting to see how it affects the uncle rate of the network.

@fjl
Copy link
Contributor

fjl commented Mar 12, 2016

@Atrides if you want to help finding out the cause of the high memory use, you can do it as follows:

  • start geth with --memprofilerate=524288 in addition to other flags.
  • When mem usage is high (it probably takes a while to ramp up), use geth attach to get a JS console.
  • Run debug.writeMemProfile("~/geth.memprofile") in the JS console.
  • Post the file somewhere (e.g. gist.github.com).

Using the profile, we can see which part of geth is hogging all the memory.

@obscuren
Copy link
Contributor

@Atrides PR doesn't give you just more uncles. As I described above it doesn't take one minder to update. It's no magic patch

@Atrides
Copy link

Atrides commented Mar 12, 2016

@fjl http://dwarfpool.com/static/eth/geth.memprofile.gz
some stats: after start 50 MB
after 3 min 300 MB
after 20 min 580 MB
after 90 min 1150 MB
after 2h 1960 MB

@obscuren It's not because of more uncles, it's regarding found blocks that marked as lost and not uncles.

@fjl
Copy link
Contributor

fjl commented Mar 12, 2016

(pprof) top10
901.41MB of 948.34MB total (95.05%)
Dropped 778 nodes (cum <= 4.74MB)
Showing top 10 nodes out of 78 (cum >= 9MB)
      flat  flat%   sum%        cum   cum%
  331.77MB 34.98% 34.98%   331.77MB 34.98%  net/http.putBufioReader
  316.70MB 33.39% 68.38%   316.70MB 33.39%  net/http.newBufioReader
   78.03MB  8.23% 76.61%    78.03MB  8.23%  github.com/syndtr/goleveldb/leveldb/util.(*BufferPool).Get
   49.01MB  5.17% 81.78%    73.52MB  7.75%  github.com/ethereum/go-ethereum/rpc.(*httpConnHijacker).ServeHTTP
      40MB  4.22% 85.99%       40MB  4.22%  github.com/syndtr/goleveldb/leveldb/memdb.(*DB).Reset
   39.02MB  4.11% 90.11%    39.02MB  4.11%  encoding/json.(*Decoder).readValue
   16.50MB  1.74% 91.85%    18.50MB  1.95%  net/http.(*response).ReadFrom
   11.37MB  1.20% 93.05%    11.37MB  1.20%  gopkg.in/fatih/set%2ev0.(*Set).New
      10MB  1.05% 94.10%    20.50MB  2.16%  net.(*netFD).accept
       9MB  0.95% 95.05%        9MB  0.95%  net.(*TCPAddr).isWildcard

@fjl
Copy link
Contributor

fjl commented Mar 12, 2016

I think the memory usage issue with develop will be fixed by #2259. Most of the memory is used in some low-level cache maintained by the HTTP server. My guess is that we are triggering this because somehow connections are not closed properly.

@obscuren obscuren added this to the 1.4.0 milestone Mar 14, 2016
@obscuren
Copy link
Contributor

@Atrides Could you run geth version and report the output string?

@obscuren
Copy link
Contributor

I've build a test program that tests whether uncles get included. The results of the experiment can be found here.

In the log output you can see that the uncle that can be an uncle gets properly included and uncles that can not be included get discarded.

Please remember that uncles can only be 1 block deep, e.g. [S]-[1]-[2] where S being the common ancestor and [1] and [2] being blocks you mined. If another chain now gets imported, say [S]-[3]-[4]-[5] where [S] still the common ancestor and 3..5 blocks not mined by you, than the pending block [6] may ONLY contain block [1] since uncles may only by 1 deep and therefore [2] is discarded.

@obscuren
Copy link
Contributor

@Atrides pling

@sammy007
Copy link
Contributor Author

Is it possible to port patch into stable branch? 1.4 has serious issues, RPC is rewritten and broken, no one will use it for mining, this is money, not a fun.

@obscuren
Copy link
Contributor

@sammy007 It would help us enormously if you'd point out what is broken (in a separate issue, please).

@sammy007
Copy link
Contributor Author

#2255

@obscuren
Copy link
Contributor

Please try develop again. Many changes have happened since then

@karalabe
Copy link
Member

Wait, let's merge in bas's PR first. #2259

@obscuren
Copy link
Contributor

Done

@obscuren
Copy link
Contributor

@sammy007 @Atrides Let me clarify that this issue was in the code since forever. Nothing we added lately broke this, it was simply always there and you've found the issue (which is awesome, btw).

@obscuren obscuren reopened this Mar 23, 2016
@obscuren obscuren reopened this Mar 23, 2016
@obscuren
Copy link
Contributor

woops

@sammy007
Copy link
Contributor Author

it was simply always there and you've found the issue

I remember that I was bitching in gitter about it long ago, my bad that I didn't submit issue so far. Prolly not related to bug bounty, but if yes, I wouldn't resist :p

Just curious, is it really a big business to port to master? I really afraid to run 1.4 for mission critical, all my previous attempts was painful.

@peterbitfly
Copy link
Contributor

1.4 is good for mining. ethpool.org uses it since quite some time. RPC
works fine.
On Mar 23, 2016 1:15 PM, "Sammy Libre" notifications@github.com wrote:

it was simply always there and you've found the issue

I remember that I was bitching in gitter about it long ago, my bad that I
didn't submit issue so far. Prolly not related to bug bounty, but if yes, I
wouldn't resist :p

Just curious, is it really a big business to port to master? I really
afraid to run 1.4 for mission critical, all my previous attempts was
painful.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#2298 (comment)

@sammy007
Copy link
Contributor Author

sammy007 commented Apr 28, 2016

Lost on 1.4-rc, maybe due to different circumstances https://gist.github.com/sammy007/a194612f82274f2c4743facfd4bae5e1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants