-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ext_proc flaky streaming integration test #30253
Merged
phlax
merged 6 commits into
envoyproxy:main
from
yanjunxiang-google:envoy_streamed_test_convert
Oct 18, 2023
Merged
Changes from 4 commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
f21934d
Moving ext_proc STREAMED mode small chunks integraton test to unit test
yanjunxiang-google db6e7aa
Merge branch 'main' of https://github.com/envoyproxy/envoy into envoy…
yanjunxiang-google 9fa0228
Fix flaky ext_proc streaming integration test
yanjunxiang-google 94a3878
Merge branch 'main' of https://github.com/envoyproxy/envoy into envoy…
yanjunxiang-google 43c35f9
deflake the test failure due to autonomous_upstream d'tor ASSERT crash
yanjunxiang-google ecb1eca
tear down the test server in each iteration to avoid random gRPC cha…
yanjunxiang-google File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still seems likely to flake when overloaded. Is there a way to run these tests with mock/fake time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's doable. I am thinking to convert some of these flaky tests in streaming_integration_test.cc into unit tests and put them in filter_test.cc. These will be done in follow up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use simulated time in integration tests, so IMO this is actionable in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also adding a unit test doesn't really address Harvey's comment that this test would be likely to flake :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting failure_mode_allow to true as below will be able to get the tests passing even timeout or gRPC channel random close happens:
The request will be forwarded as if the ext_proc filter does not exist in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDK what impact that setting has on the realism of the test. At first glance it makes me wonder if the test just isn't really checking that much.
The purpose of simulated time tests is to enable robust predictable testing of code with timeouts, including in integration tests. You can force a timeout to occur,, or not occur, based on explicit control of time from the test fixture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments! There are other ext_proc tests to test timeout behavior, like here and other tests around it:
https://github.com/envoyproxy/envoy/blob/5c9cc8460c582f1a895dbffd974e248f0b2911bc/test/extensions/filters/http/ext_proc/ext_proc_integration_test.cc#L1690C56-L1690C56
This streaming_integration_test.cc is not to test timeout scenario. It's to test the ext_proc server mutation behaviors when the ext_proc filter is configured with STREAMED or BUFFERED_PARTIAL mode.
The original test cases are written with the assumption that timeout won't happen and gRPC channel will not close. But in the tests, both of them may happen, which leads to flakiness of the test. So, extending the timeout value from 200ms to 2s will reduce the timeout probability. Set failure_mode_allow to true will catch up if either errors happens, the tests can still pass, like client still receive 200 instead of 500, et.al.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on how flaky this file still is after the proposed change, we can consider convering these streamed integration tests to unit tests using mocks, similar to what we have don in this PR: #29940.