Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

load stats: fix integration test flake #12265

Merged
merged 1 commit into from
Jul 24, 2020
Merged

load stats: fix integration test flake #12265

merged 1 commit into from
Jul 24, 2020

Conversation

mattklein123
Copy link
Member

Waiting on a load stats response can race with resetting
the counters when initializing a watch. Moving the counter
increment prevents the race.

Fixes #11784

Risk Level: Low
Testing: Existing tests, 1000 runs of the integration test
Docs Changes: N/A
Release Notes: N/A

Waiting on a load stats response can race with resetting
the counters when initializing a watch. Moving the counter
increment prevents the race.

Fixes #11784

Signed-off-by: Matt Klein <mklein@lyft.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change LGTM (do no harm), but I don't fully grok why this fixes the race. Can you elaborate a little more for posterity?

@mattklein123
Copy link
Member Author

There is a race condition here:

requestLoadStatsResponse({"cluster_0"});
initiateClientConnection();
ASSERT_TRUE(waitForLoadStatsRequest({localityStats("winter", 0, 0, 1, 1)}));

requestLoadStatsResponse returns when "load_reporter.requests" is incremented. After it's incremented, the host stats are cleared (in particular rq_total). Thus, it's possible to miss the rq_total increment if the client connection happens after the stat increment but before the clear.

Sorry I should have made this more clear. This took me a while to figure out.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks!

@mattklein123 mattklein123 merged commit 3cd9cfb into master Jul 24, 2020
@mattklein123 mattklein123 deleted the load_stats_fix branch July 24, 2020 04:03
lambdai pushed a commit to lambdai/envoy-dai that referenced this pull request Jul 24, 2020
Waiting on a load stats response can race with resetting
the counters when initializing a watch. Moving the counter
increment prevents the race.

Fixes envoyproxy#11784

Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
istio-testing pushed a commit to istio/envoy that referenced this pull request Jul 25, 2020
Waiting on a load stats response can race with resetting
the counters when initializing a watch. Moving the counter
increment prevents the race.

Fixes envoyproxy#11784

Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>

Co-authored-by: Matt Klein <mklein@lyft.com>
KBaichoo pushed a commit to KBaichoo/envoy that referenced this pull request Jul 30, 2020
Waiting on a load stats response can race with resetting
the counters when initializing a watch. Moving the counter
increment prevents the race.

Fixes envoyproxy#11784

Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
chaoqin-li1123 pushed a commit to chaoqin-li1123/envoy that referenced this pull request Aug 7, 2020
Waiting on a load stats response can race with resetting
the counters when initializing a watch. Moving the counter
increment prevents the race.

Fixes envoyproxy#11784

Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: chaoqinli <chaoqinli@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LoadStatsIntegrationTest flake
3 participants