Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return RoundTripper even if we fail to vet masquerades by the given t… #38

Merged
merged 2 commits into from May 31, 2023

Conversation

oxtoacart
Copy link
Contributor

@oxtoacart oxtoacart commented May 31, 2023

…imeout

For getlantern/engineering#180

The short version is that if the network is down when we bootstrap fronted, it will never work even after the network recovers.

What's happening is this:

  1. When starting Lantern with no working Internet connection, fronted quickly blows through all 2000 configured masquerades and fails to vet any
  2. It then fails to return a RoundTripper
  3. Subsequent calls to obtain a RoundTripper time out because all masquerades failed to vet and so we never get a vetted masquerade
  4. Even if connectivity is restored, we don't ever get any vetted masquerades, to fronted remains broken

The solution is to not bother waiting for vetted masquerades before returning the RoundTripper. When someone then attempts to dial with it, if it has successfully vetted masquerades, those will be preferred, but if it hasn't yet vetted any, we'll dial with a random candidate, which at least has a hope of succeeding once connectivity is restored.

I tested this by following the steps in getlantern/engineering#180 and was able to verify that the fix worked. It also seems to just generally speed up startup if we're starting with decent network connectivity.

Note that even though the logged ticket is for Android, this affects Desktop and iOS as well.

@oxtoacart oxtoacart force-pushed the ox/issue180 branch 2 times, most recently from 6f12efc to aaedffc Compare May 31, 2023 03:05
Copy link
Contributor

@hwh33 hwh33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly, I haven't had much exposure to this part of the code. That said, I think this looks good. I appreciate the thorough explanation of the problem and solution!

@myleshorton
Copy link
Contributor

This sounds great @oxtoacart! I had noticed that test that expects a certain number of attempts or something -- I wonder if we can just look for more than 10 or something.

@oxtoacart
Copy link
Contributor Author

I'm realizing that we have a race condition due to dial temporarily consuming masquerades from our channels. We don't notice it much in production because we have so many masquerades, but in a test with just one masquerade like TestCustomValidators, it shows up. Will work on that, stay tuned.

@oxtoacart
Copy link
Contributor Author

Okay, I originally didn't want to get this invasive, but the whole thing we were doing with different channels for candidates, vetted and cached masquerades was pretty difficult to follow, and it caused the aforementioned issue of tests failing because while one goroutine is trying to use a masquerade, another goroutine can't.

The main changes here are:

  1. Use a single sorted list for all masquerades, where masquerades that were recently successfully dialed sort first
  2. When dialing, continuously loop through that list until either we dial successfully, or our context times out (note that the context just comes from the http Request and is controlled by setting the Timeout property of the containing http.Client
  3. Simplify the caching logic to simply save the top N masquerades from that sorted list based on our configured max cache size

@oxtoacart oxtoacart merged commit 13856a1 into main May 31, 2023
4 checks passed
@oxtoacart oxtoacart deleted the ox/issue180 branch May 31, 2023 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants