Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reload Typha addresses between connection attempts. #7176

Merged
merged 4 commits into from
Jan 24, 2023

Conversation

fasaxc
Copy link
Member

@fasaxc fasaxc commented Jan 11, 2023

Description

Rework Typha discovery to avoid a long timeout if we happen to race with a Typha upgrade. Previously we could:

  • Load the list of typhas, just before...
  • ...all the typhas start shutting down to be upgraded
  • Attempt to connect to all the dead typhas one by one.
  • Only once we'd timed out trying to connect to each gone Typha, give up and restart. This could take 10s * number of Typhas; not the end of the world but also much longer than desired.

This change makes sure that we reload the list of Typhas between calls:

  • Extract out a Discoverer object to keep track of the discovery parameters.
  • Have the discoverer call back to initiate wireguard bootstrap handling.
  • Replace loop over slice of Typhas with an object that handles refreshing the list of Typhas and avoiding connecting to the same one twice.

Related issues/PRs

Todos

  • Tests
  • Release note

Release Note

Fix that, if a Typha client loads the list of Typha instances just before they all get upgraded, it takes 30s+ to time out.  Reload the list of Typha instances between each connection attempt.

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Every PR needs one docs-* label.

  • docs-pr-required: This change requires a change to the documentation that has not been completed yet.
  • docs-completed: This change has all necessary documentation completed.
  • docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.

  • release-note-required: This PR has user-facing changes. Most PRs should have this label.
  • release-note-not-required: This PR has no user-facing changes.

Other optional labels:

  • cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
  • needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.

@marvin-tigera marvin-tigera added this to the Calico v3.26.0 milestone Jan 11, 2023
@marvin-tigera marvin-tigera added docs-pr-required Change is not yet documented release-note-required Change has user-facing impact (no matter how small) labels Jan 11, 2023
@fasaxc fasaxc force-pushed the typha-discov-reload branch 5 times, most recently from 2bcfe6b to 841bd24 Compare January 16, 2023 16:09
@fasaxc fasaxc added docs-not-required Docs not required for this change and removed docs-pr-required Change is not yet documented labels Jan 16, 2023
@fasaxc fasaxc force-pushed the typha-discov-reload branch 2 times, most recently from 48ec750 to 8964e13 Compare January 17, 2023 13:36
@@ -21,53 +21,128 @@ import (
)

type Interface interface {
NodesClient
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split out these smaller interfaces to make mocking easier. Then I refactored the code and I didn't need to mock the NodeClient any more. Nevertheless, it seemed like a good idea to make mocking easier in future...

Rework Typha discovery:
- Extract out a Discoverer object to keep track of state.
- Replace loop over slice of Typhas with an object that
  handles refreshing the list of Typhas.
@fasaxc fasaxc marked this pull request as ready for review January 17, 2023 14:35
@fasaxc fasaxc requested a review from a team as a code owner January 17, 2023 14:35
Copy link
Member

@electricjesus electricjesus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! The integration with the existing bootstrap for WireGuard has been done quite nicely. There shouldn't be any concern if the post-filter discovery function ends up calling bootstrap (that will remove WireGuard config) more than once.


// Defensive: set a sanity limit in case NextAddr() keeps finding new Typha addresses.
maxTries := len(s.discoverer.CachedTyphaAddrs()) * 2
if maxTries < 6 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 6 seems to be a reasonable number given we typically see 3 typhas.. I guess we could explain why it's currently 6 in a comment? and maybe making this configurable is going to be overkill, too

@fasaxc fasaxc merged commit 52869ad into projectcalico:master Jan 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-not-required Docs not required for this change release-note-required Change has user-facing impact (no matter how small)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants