Stuck at testnet block 205847 after restarting sequel+postgres node after long hiatus #78

Closed
isaacwaldron opened this issue Apr 4, 2014 · 8 comments


@isaacwaldron
Contributor

I'm running a sequel+postgres node on testnet that had been down for a couple of weeks. Upon restart it began syncing but after some time stopped updating at block 205847. It is repeatedly requesting blocks from attached nodes and gets "getblocks: 0000000000048aa605ae3c7b24195bd54dda2c92dce8b5ba65dbe793664a3902" replies but has not moved past the stalled block in over eight hours.

@mhanne
Contributor

mhanne commented Apr 4, 2014

Hi, thanks for reporting this. There were some fixes for testnet recently, but it looks like you are already beyond those blocks, so I assume you're using the current code.

Could you please run the node with the --verbose flag and watch the output when it processes the very first block? It should log the error that prevented it from accepting this block.

@isaacwaldron
Contributor Author

It actually doesn't seem to be getting any data: it just loops forever sending getblocks with the hash of block 205847 and never outputs anything else except occasional "connection failed" and "establishing connection" messages.

I've tried this with both this repo and your fork that I was using for the storage optimizations.

@isaacwaldron
Contributor Author

I recreated the database from yesterday's dump on test.webbtc.com and bitcoin_node is happily storing blocks once again. I still have a copy of the database stuck at 205847 if I can do more tests.

@mhanne
Contributor

mhanne commented Apr 5, 2014

Strange... I assume you restarted the node several times, so it shouldn't be due to bad peers.

If that isn't it, you can compare the two databases and see if there is any difference between the latest blocks - does yours have a side-chain block at around that depth maybe?
Can you put a dump of your DB somewhere I can download it from, to try and reproduce it here?

@comboy
Contributor

comboy commented Apr 6, 2014

I just did a testnet3 sync from scratch on current master. So it could have been some database corruption error, or maybe some change between versions (not sure what that could be)?

@mhanne
Contributor

mhanne commented Apr 6, 2014

Ah, thanks for checking. So it must be something related to the old database.

My first guess would be it's related to #57 still - there are some pretty weird reorg patterns on testnet. But around block 205847, I can't see any side blocks on webbtc. Are there any in your DB at that depth, or at the depth where you started? Do you remember which version of the code you were using before?

What I find curious is that if you only see 'getblocks' messages, it either means that none of your peers has any newer blocks, or they all sent them to you already and won't do it again.
When you restart the node, you should always see it doing something with the blocks, at least categorizing them as "main", "side" or "orphan".
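
For readers unfamiliar with those three categories, here is a minimal sketch of how an incoming block might be labelled. It is an illustration only, not bitcoin-ruby's actual code: the method name `categorize_block`, the `known_hashes` set and the `head_hash` argument are all hypothetical, and a real node also has to handle a side branch that overtakes the main chain.

```ruby
require "set"

# Hypothetical sketch: label an incoming block relative to what is already stored.
#   prev_hash    - hash of the block's parent
#   known_hashes - Set of block hashes already in the store (assumed structure)
#   head_hash    - hash of the current main-chain tip
def categorize_block(prev_hash, known_hashes, head_hash)
  # Parent unknown: the block cannot be attached anywhere yet.
  return :orphan unless known_hashes.include?(prev_hash)
  # Parent is the current tip: the block extends the main chain.
  return :main if prev_hash == head_hash
  # Parent is known but is not the tip: the block starts or extends a branch.
  :side
end
```

In this simplified model, a node that prints nothing at all for incoming blocks is not even reaching this step, which is what made the log output in this issue look odd.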

@isaacwaldron
Contributor Author

When it was stuck I restarted the node several times and at one point deleted peers.json to force it to find some new peers via the DNS seed. I've posted a copy of the DB to https://s3.amazonaws.com/rarefied-public/blockchain_testnet_205847_dbuser.sql.bz2 if you'd like to take a look.

mhanne added a commit to mhanne/bitcoin-ruby that referenced this issue Apr 7, 2014
this solves a reorg issue when the node thinks the current chain head is a side branch

lian#78
@mhanne
Contributor

mhanne commented Apr 7, 2014

Yes, it was a reorg issue. When a block comes in, there's a check for whether it is already stored. If it was, the node just skipped the block completely. But of course if this block is a side-chain block, it needs to run the branching logic again to get it into the main branch before it can continue.
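
The difference between the two behaviours can be sketched with a toy in-memory store. This is a rough model under stated assumptions, not the actual patch: `ToyStore`, `receive_skipping`, `receive_rechecking` and `reorg_to` are hypothetical names, and branch length (depth) stands in for whatever rule the real storage backend uses to pick the best chain.

```ruby
# Toy in-memory block store. Every name here is hypothetical and is not bitcoin-ruby's API.
class ToyStore
  Block = Struct.new(:hash_id, :prev, :depth, :chain)

  attr_reader :blocks

  def initialize
    @blocks = {}
  end

  def add(hash_id, prev, depth, chain)
    @blocks[hash_id] = Block.new(hash_id, prev, depth, chain)
  end

  # Deepest block currently marked as main chain (nil if there is none).
  def head
    @blocks.values.select { |b| b.chain == :main }.max_by(&:depth)
  end

  # Old behaviour: any block that is already stored is ignored outright.
  def receive_skipping(hash_id)
    @blocks.key?(hash_id) ? :skipped : :new_block
  end

  # Changed behaviour: a stored side-chain block gets its branch re-evaluated.
  def receive_rechecking(hash_id)
    block = @blocks[hash_id]
    return :new_block if block.nil?
    return :skipped unless block.chain == :side
    tip = head
    # Assumption: a longer branch wins; depth is used as a simplification.
    return :skipped if tip && block.depth <= tip.depth
    reorg_to(block)
    :reorganized
  end

  private

  # Walk back from the new tip to the fork point, then swap the branch labels.
  def reorg_to(tip)
    branch = []
    cur = tip
    while cur && cur.chain != :main
      branch << cur
      cur = @blocks[cur.prev]
    end
    fork_depth = cur ? cur.depth : -1
    # Old main-chain blocks above the fork point become side blocks.
    @blocks.each_value { |b| b.chain = :side if b.chain == :main && b.depth > fork_depth }
    # The blocks on the winning branch become main-chain blocks.
    branch.each { |b| b.chain = :main }
  end
end
```

With a store where the longest branch is labelled `:side`, `receive_skipping` never moves the head no matter how often peers resend that block, while `receive_rechecking` promotes the branch so the sync can continue from the new tip.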

Long story short, after this change I was able to sync your DB up to block 205909.
Thanks again for your help! :)

@mhanne mhanne closed this as completed May 9, 2014
mhanne added a commit that referenced this issue Jul 11, 2014
this solves a reorg issue when the node thinks the current chain head is a side branch

#78