
Syncing warm-up time #2048

Closed · acud opened this issue Jun 10, 2021 · 4 comments · Fixed by #2050
acud commented Jun 10, 2021

Task

We need to add a warm-up time to pushsync and pullsync protocols.

  • Add a CLI flag for the warm-up time, used by both protocols; default 10m. It must accept 0 for immediate startup (otherwise integration tests won't work) and enforce an upper bound of 1h, since anything higher is too long.
  • Update the helm charts and beekeeper to propagate the correct values for CI and integration tests.
  • The pusher and puller should start only after the warm-up time has expired (see the sketch after this list).
  • pullsync itself needs no changes.
  • pushsync must not give back receipts during the warm-up period; the stream should reset instead.
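
As a rough illustration of the first and third bullets, here is a minimal Go sketch, assuming a standalone `warmup-time` duration flag and placeholder `startPusher`/`startPuller` functions rather than bee's actual CLI and service wiring:

```go
// Minimal sketch (not the actual bee CLI): parse a warm-up duration,
// allow 0 for immediate startup, and reject anything above one hour.
package main

import (
	"errors"
	"flag"
	"log"
	"time"
)

const (
	defaultWarmupTime = 10 * time.Minute
	maxWarmupTime     = time.Hour
)

func validateWarmup(d time.Duration) error {
	if d < 0 {
		return errors.New("warmup time must not be negative")
	}
	if d > maxWarmupTime {
		return errors.New("warmup time must not exceed 1h")
	}
	return nil
}

func main() {
	warmup := flag.Duration("warmup-time", defaultWarmupTime, "warm-up before pusher/puller start (0 = immediate)")
	flag.Parse()

	if err := validateWarmup(*warmup); err != nil {
		log.Fatal(err)
	}

	// The pusher and puller only begin their work once the warm-up has elapsed.
	// A zero duration returns immediately, which keeps integration tests fast.
	time.Sleep(*warmup)
	startPusher()
	startPuller()
}

func startPusher() { log.Println("pusher started") }
func startPuller() { log.Println("puller started") }
```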

Eknir commented Jun 10, 2021

A slight clarification: pushsync may forward receipts, but should not generate receipts itself. @acud, can you confirm?


acud commented Jun 10, 2021

Correct. pushsync should not sign receipts during the warm-up period.
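
A rough sketch of what such a guard could look like in a pushsync-style handler; `Service`, `Chunk`, `Receipt` and the function fields here are illustrative stand-ins, not bee's actual types or APIs:

```go
package pushsyncsketch

import (
	"context"
	"errors"
	"time"
)

// ErrWarmingUp signals that the stream should be reset instead of signing a receipt.
var ErrWarmingUp = errors.New("pushsync: node warming up, no receipts issued")

// Chunk and Receipt stand in for the real bee types in this sketch.
type Chunk struct{ Address []byte }
type Receipt struct{ Signature []byte }

type Service struct {
	startedAt time.Time
	warmupDur time.Duration
	// closer returns a peer closer to the chunk, or an error when this node is the storer.
	closer  func(Chunk) (string, error)
	forward func(ctx context.Context, peer string, c Chunk) (*Receipt, error)
	sign    func(Chunk) (*Receipt, error)
}

func (s *Service) warmedUp() bool {
	return time.Since(s.startedAt) >= s.warmupDur
}

// Handle mirrors the behaviour discussed above: forwarding is always allowed,
// but signing a receipt locally is refused until the warm-up has elapsed.
func (s *Service) Handle(ctx context.Context, c Chunk) (*Receipt, error) {
	if peer, err := s.closer(c); err == nil {
		return s.forward(ctx, peer, c) // may carry back a receipt signed elsewhere
	}
	if !s.warmedUp() {
		return nil, ErrWarmingUp // caller resets the stream
	}
	return s.sign(c)
}
```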

ldeffenb (Collaborator) commented

In my experience, 10 minutes, or any hard-coded or user-specified time frame, is not going to solve the underlying problem here: services activating before the depth has been established.

I believe kademlia needs to find a way to use the count of known nodes per bin in the address book to estimate what depth is likely to be achievable, and provide a method that pusher, puller, and pushsync can use to determine whether they are "clear" to operate, i.e. the depth has reached the estimated achievable value. This "clear" flag only needs to be recalculated when the depth changes, to avoid unnecessary overhead.

This approach would also allow a subsequent loss of depth to "re-block" these features, provided the check is made in their normal processing loops and not only as a startup delay. I have seen my depth go from 15 to 3, stay there for a while, and then jump back to 15 as a single pivotal peer connection was lost from bin 4.
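
A rough sketch of that estimate-and-clear idea; `maxBins`, `saturationPeers` and the function names are illustrative assumptions, not bee's actual kademlia parameters:

```go
// Sketch: estimate the depth that should be achievable from the number of
// known (not necessarily connected) peers per bin in the address book, and
// expose a "clear to operate" flag that only needs recomputing on depth change.
package kademliasketch

const (
	maxBins         = 32
	saturationPeers = 4 // bins with at least this many known peers count toward achievable depth
)

// EstimateAchievableDepth returns the first bin whose known-peer count falls
// below the saturation threshold; all shallower bins are considered saturated.
func EstimateAchievableDepth(knownPerBin [maxBins]int) uint8 {
	for bin, count := range knownPerBin {
		if count < saturationPeers {
			return uint8(bin)
		}
	}
	return maxBins - 1
}

// Clear reports whether pusher, puller and pushsync may operate: the currently
// connected depth has reached the depth the address book suggests is achievable.
func Clear(connectedDepth uint8, knownPerBin [maxBins]int) bool {
	return connectedDepth >= EstimateAchievableDepth(knownPerBin)
}
```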

For some real-world traces of depth changes from node startup and over time, see:

https://ipfs.io/ipfs/QmSs8qGMEWyzezWtfs7fM2QuCcfjKn72J9D1zp1wBXibom

bee1 took 45 minutes to initially get to depth 14 jumping from 4
bee2 took 30 minutes to initially get to depth 13 jumping from 4
bee3 took 25 minutes to initially get to depth 15 jumping from 2

And for posterity, these are the current connected counts for topology bins 0->15 from left to right:

bee1: 8 8 15 10 12 20 20 20 20 19 20 17 14 8 3 4
bee2: 9 5 14 18 13 10 20 13 20 18 18 14 14 4 6 4
bee3: 12 10 9 11 10 20 19 15 11 14 17 15 16 9 7 3

All of them are running version 0.6.3-5b9541c4, which I believe was yesterday's master from GitHub.


acud commented Jun 10, 2021

@ldeffenb we are iterating on this on all fronts, in particular kademlia. This issue is about improving the protocols on top of that.
