Node can't catch up if offline long enough #823

Closed
rotilho opened this issue Apr 23, 2018 · 10 comments

@rotilho

rotilho commented Apr 23, 2018

Currently, when I run my node on an old laptop (a 4-year-old i7), it can't catch up with new blocks and slowly falls further behind.

If I download an up-to-date database from an external source, the node is able to keep up with the latest blocks.

Steps to reproduce the issue:

  1. On a slow computer, do a clean installation
  2. Start the node
  3. The node can't catch up and keeps accumulating unprocessed blocks

Describe the results you received:
Unprocessed blocks accumulate

Describe the results you expected:
The node slowly catches up with the latest blocks

Environment:

  • Windows 10
  • Node 12.1
  • 5200 RPM HDD
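
A simple way to watch whether the node is falling behind is to poll the RPC `block_count` action, which reports both `count` and `unchecked`. The sketch below assumes the RPC is enabled on its default port 7076; adjust the address for your setup.

```python
# Poll the node's block_count RPC and report how fast count/unchecked move.
# Assumes the RPC is enabled and listening on the default port 7076.
import json
import time
import urllib.request

RPC_URL = "http://[::1]:7076"  # adjust to your node's RPC address


def block_count():
    """Return (count, unchecked) from the node's `block_count` RPC action."""
    req = urllib.request.Request(
        RPC_URL,
        data=json.dumps({"action": "block_count"}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return int(body["count"]), int(body["unchecked"])


if __name__ == "__main__":
    prev = block_count()
    while True:
        time.sleep(60)
        cur = block_count()
        # If unchecked grows faster than count, the node is not keeping up.
        print(f"count +{cur[0] - prev[0]}/min, unchecked {cur[1] - prev[1]:+}/min")
        prev = cur
```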
@PlasmaPower
Contributor

What version are you experiencing this issue on?

@rotilho
Author

rotilho commented Apr 23, 2018

Basically every version; the latest I tested was 12.1.

If my node also fell behind after updating the DB, that would be fine. However, since this happens only when my node has been offline for a while, it raises the concern that there may be a serious performance issue (exponential?). What if the TPS increases? Will bootstrapping a new node even be possible once we have dozens of millions of blocks?

@PlasmaPower
Contributor

I wouldn't worry about future theoretical performance. There are much more efficient methods of bootstrapping, but they haven't been implemented yet (they are complicated). Using an alternative to LMDB may also speed things up. Right now though, we're making the existing bootstrapping system work until new ideas are implemented.

@ariesunny

@PlasmaPower Hey, this seems more related to the block-verification procedure during sync; using an alternative DB may not make a big improvement. Please review issue #833, thank you.
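
To get a rough feel for how expensive per-block signature verification is, here is a hedged micro-benchmark of plain ed25519 verification using PyNaCl. Note that the node itself uses an ed25519 variant with Blake2b and verifies signatures in optimized, multithreaded C++, so this is only a ballpark illustration, not a measurement of the node.

```python
# Rough single-threaded ed25519 verification benchmark (requires PyNaCl).
# Illustrative only: the node's own verification path differs (Blake2b variant,
# batching, multiple threads), so treat the number as a ballpark.
import time

from nacl.signing import SigningKey

sk = SigningKey.generate()
vk = sk.verify_key
signed = sk.sign(b"example block contents")

N = 5000
start = time.perf_counter()
for _ in range(N):
    vk.verify(signed)  # raises BadSignatureError if the signature is invalid
elapsed = time.perf_counter() - start
print(f"~{N / elapsed:.0f} verifications/second single-threaded")
```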

@kenzaburo

I have the same issue. I'm running node version 13.0 (the latest one).

@ariesunny

@rotilho Hi, how is your sync going? Has it finished?

@rotilho
Author

rotilho commented May 2, 2018

@ariesunny I gave up. The only way to keep it in sync is to download an updated copy of the database and not leave the node down for too long.

@mfontanini

mfontanini commented May 8, 2018

Is this being addressed or at least acknowledged by someone? I had an up-to-date node which I shut down and only turned back on after a couple of months, and it just cannot seem to catch up. I already updated to 12.1, but it's still completely stuck. The block count (~7.7M) plus downloaded blocks (~1.2M) adds up to more or less the total block count I see on nanode. The block count is going up very slowly, at a rate at which it simply won't ever catch up, and I do see some CPU usage on the nano node, so it's definitely doing something. This is running on an 8-core i7 @ 2.6 GHz with a 100 Mbit/s internet connection on Linux x64.

This is what I see in the log file:

[2018-05-07 20:47:30.431837]: 12971 accounts in pull queue
[2018-05-07 20:47:38.332263]: Error initiating bootstrap connection to [::ffff:45.77.113.181]:7075: No route to host
[2018-05-07 20:47:44.415537]: Broadcasting confirm req for block 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 to 19 representatives
[2018-05-07 20:47:44.415888]: Block 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 was republished to peers
[2018-05-07 20:47:44.415986]: Broadcasting confirm req for block 45839D8DDE7A42BF4D2AEE969C403D87BF940E5CA6F09CE476440CDC68C74F6A to 19 representatives
[2018-05-07 20:47:44.415994]: Broadcasting confirm req for block 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 to 9 representatives
[2018-05-07 20:47:44.416180]: Broadcasting confirm req for block 45839D8DDE7A42BF4D2AEE969C403D87BF940E5CA6F09CE476440CDC68C74F6A to 9 representatives
[2018-05-07 20:47:45.442525]: 12955 accounts in pull queue
[2018-05-07 20:47:51.573910]: 92 blocks in processing queue
[2018-05-07 20:47:56.318490]: Error initiating bootstrap connection to [::ffff:38.98.13.230]:49086: No route to host
[2018-05-07 20:48:00.416292]: Broadcasting confirm req for block 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 to 19 representatives
[2018-05-07 20:48:00.417092]: Block 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 was republished to peers
[2018-05-07 20:48:00.417487]: Broadcasting confirm req for block 45839D8DDE7A42BF4D2AEE969C403D87BF940E5CA6F09CE476440CDC68C74F6A to 19 representatives
[2018-05-07 20:48:00.417506]: Broadcasting confirm req for block 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 to 9 representatives
[2018-05-07 20:48:00.417999]: Broadcasting confirm req for block 45839D8DDE7A42BF4D2AEE969C403D87BF940E5CA6F09CE476440CDC68C74F6A to 9 representatives
[2018-05-07 20:48:00.512359]: 12907 accounts in pull queue
[2018-05-07 20:48:01.425857]: Error initiating bootstrap connection to [::ffff:80.101.36.174]:7075: No route to host
[2018-05-07 20:48:16.418597]: Block 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 was republished to peers
[2018-05-07 20:48:16.460890]: 12879 accounts in pull queue
[2018-05-07 20:48:21.362392]: Error initiating bootstrap connection to [::ffff:45.77.113.181]:7075: No route to host
[2018-05-07 20:48:29.307518]: Resolving fork between our block: 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 and block 45839D8DDE7A42BF4D2AEE969C403D87BF940E5CA6F09CE476440CDC68C74F6A both with root 8AD01907273D08B226B3D061ABF3A257FB9DAAD44F3AFBD72B5545F5F8FA31BD
[2018-05-07 20:48:29.307621]: Broadcasting confirm req for block 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 to 20 representatives
[2018-05-07 20:48:29.307771]: Broadcasting confirm req for block 45839D8DDE7A42BF4D2AEE969C403D87BF940E5CA6F09CE476440CDC68C74F6A to 20 representatives
[2018-05-07 20:48:29.307772]: Broadcasting confirm req for block 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 to 10 representatives
[2018-05-07 20:48:29.307943]: Broadcasting confirm req for block 45839D8DDE7A42BF4D2AEE969C403D87BF940E5CA6F09CE476440CDC68C74F6A to 10 representatives
[2018-05-07 20:48:31.467229]: 12865 accounts in pull queue
[2018-05-07 20:48:32.418663]: Broadcasting confirm req for block 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 to 19 representatives
[2018-05-07 20:48:32.419389]: Block 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 was republished to peers
[2018-05-07 20:48:32.419649]: Broadcasting confirm req for block 45839D8DDE7A42BF4D2AEE969C403D87BF940E5CA6F09CE476440CDC68C74F6A to 19 representatives
[2018-05-07 20:48:32.419668]: Broadcasting confirm req for block 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 to 9 representatives
[2018-05-07 20:48:32.420066]: Broadcasting confirm req for block 45839D8DDE7A42BF4D2AEE969C403D87BF940E5CA6F09CE476440CDC68C74F6A to 9 representatives
[2018-05-07 20:48:40.345799]: Error initiating bootstrap connection to [::ffff:38.98.13.230]:49086: No route to host
[2018-05-07 20:48:46.467503]: 12860 accounts in pull queue
[2018-05-07 20:48:46.467902]: Error initiating bootstrap connection to [::ffff:80.101.36.174]:7075: No route to host
[2018-05-07 20:48:48.420185]: Broadcasting confirm req for block 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 to 19 representatives
[2018-05-07 20:48:48.421190]: Block 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 was republished to peers
[2018-05-07 20:48:48.421425]: Broadcasting confirm req for block 45839D8DDE7A42BF4D2AEE969C403D87BF940E5CA6F09CE476440CDC68C74F6A to 19 representatives
[2018-05-07 20:48:48.421432]: Broadcasting confirm req for block 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 to 9 representatives
[2018-05-07 20:48:48.422022]: Broadcasting confirm req for block 45839D8DDE7A42BF4D2AEE969C403D87BF940E5CA6F09CE476440CDC68C74F6A to 9 representatives
[2018-05-07 20:49:01.467882]: 12855 accounts in pull queue
[2018-05-07 20:49:04.390553]: Error initiating bootstrap connection to [::ffff:45.77.113.181]:7075: No route to host
[2018-05-07 20:49:04.422168]: Broadcasting confirm req for block 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 to 19 representatives
[2018-05-07 20:49:04.422843]: Block 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 was republished to peers
[2018-05-07 20:49:04.422961]: Broadcasting confirm req for block 45839D8DDE7A42BF4D2AEE969C403D87BF940E5CA6F09CE476440CDC68C74F6A to 19 representatives
[2018-05-07 20:49:04.422977]: Broadcasting confirm req for block 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 to 9 representatives
[2018-05-07 20:49:04.423366]: Broadcasting confirm req for block 45839D8DDE7A42BF4D2AEE969C403D87BF940E5CA6F09CE476440CDC68C74F6A to 9 representatives
[2018-05-07 20:49:16.476059]: 12843 accounts in pull queue
[2018-05-07 20:49:20.423786]: Block 7B74B9ADB8F3C473BE2CFDF73DBF1537F1B6A3F49BC4AB2CEA23DE2587B32921 was republished to peers
[2018-05-07 20:49:24.377212]: Error initiating bootstrap connection to [::ffff:38.98.13.230]:49086: No route to host
[2018-05-07 20:49:26.954137]: 72 blocks in processing queue
[2018-05-07 20:49:30.500637]: Error initiating bootstrap connection to [::ffff:80.101.36.174]:7075: No route to host
[2018-05-07 20:49:31.482498]: 12817 accounts in pull queue
[2018-05-07 20:49:46.505484]: 12765 accounts in pull queue
[2018-05-07 20:49:50.187516]: 81 blocks in processing queue
[2018-05-07 20:49:51.428682]: Error initiating bootstrap connection to [::ffff:45.77.113.181]:7075: No route to host
[2018-05-07 20:49:57.391583]: Found a representative at [::ffff:45.32.246.108]:7075
[2018-05-07 20:50:01.513491]: 12700 accounts in pull queue

The fact that we have to resort to some random file-upload website to download the latest state of the ledger just to keep up is concerning. Also note that this is not a bootstrap from scratch; this node was up to date until March 23rd. Back then I did a bootstrap from scratch and it only took a few hours to get synced up.

Edit: I downloaded the latest dump from here (modified today at 12:33 am) and after almost 2 hours I'm still stuck while the number of downloaded blocks keeps going up. How does anyone even keep up with this?
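
To put "won't ever catch up" into numbers, here is a small helper that parses the `accounts in pull queue` lines from a log like the one above and estimates the drain rate; the log path and line format are assumptions based on this excerpt.

```python
# Estimate how fast the bootstrap pull queue is draining from node log output.
# The "[timestamp]: N accounts in pull queue" format is assumed from the excerpt above.
import re
import sys
from datetime import datetime

LINE = re.compile(r"\[(?P<ts>[\d\- :.]+)\]: (?P<n>\d+) accounts in pull queue")


def drain_estimate(path):
    samples = []
    with open(path) as f:
        for line in f:
            m = LINE.search(line)
            if m:
                ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
                samples.append((ts, int(m.group("n"))))
    if len(samples) < 2:
        return None
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    seconds = (t1 - t0).total_seconds()
    rate = (n0 - n1) / seconds if seconds else 0.0  # accounts drained per second
    eta_hours = n1 / rate / 3600 if rate > 0 else float("inf")
    return rate, eta_hours


if __name__ == "__main__":
    result = drain_estimate(sys.argv[1])
    if result:
        rate, eta_hours = result
        print(f"draining ~{rate:.2f} accounts/s, ~{eta_hours:.1f} h to empty the queue")
```

For the excerpt above (12971 → 12700 accounts between 20:47:30 and 20:50:01), that is roughly 270 accounts drained in about two and a half minutes, while new pulls keep being queued behind them, which matches the impression that it never finishes.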

@clemahieu
Contributor

The need for I/O optimizations is a well-known issue. We'll work on that once other, higher-priority issues are fixed.

@rkeene
Contributor

rkeene commented Aug 23, 2018

This should be resolved by a few different ongoing projects. Lazy bootstrapping (#995) will reduce the amount of disk I/O and voting traffic during bootstrapping, since it only pulls confirmed blocks, so there is no need to store them temporarily in a separate database or ask for confirmation. Vote-by-hash and vote stapling both reduce bandwidth usage, and vote stapling makes votes a bit more durable so that nodes can use them for longer, which greatly reduces network utilization.
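
For context, here is a conceptual sketch of the lazy-bootstrapping idea described above, not the node's actual implementation: only chains reachable from already-known hashes are pulled, and any referenced source blocks are queued on demand instead of everything being staged in a temporary database first. The `pull_chain` and `process_block` callables are hypothetical placeholders.

```python
# Conceptual sketch of lazy bootstrapping: pull chains on demand, starting from
# known hashes, and follow source references lazily. Not the node's real code.
from collections import deque


def lazy_bootstrap(start_hashes, pull_chain, process_block):
    """Pull only chains reachable from the starting hashes."""
    pending = deque(start_hashes)
    seen = set()
    while pending:
        head = pending.popleft()
        if head in seen:
            continue
        seen.add(head)
        for block in pull_chain(head):      # hypothetical network pull of one chain
            process_block(block)            # write straight into the ledger
            source = block.get("source")    # a receive block references a send...
            if source and source not in seen:
                pending.append(source)      # ...which is queued only when needed


# Toy usage with stub callables, just to show the flow:
if __name__ == "__main__":
    chains = {
        "A": [{"hash": "A1", "source": "B"}],
        "B": [{"hash": "B1", "source": None}],
    }
    lazy_bootstrap(["A"], lambda h: chains.get(h, []), lambda b: print("pulled", b["hash"]))
```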

I am going to close this ticket so that we can track the individual improvements directly but let me know if there are additional questions.

@rkeene rkeene closed this as completed Aug 23, 2018