Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ext_proc: gcc test flakes: IpVersionsClientTypeDeferredProcessing/ExtProcIntegrationTest.GetAndSetTrailersIncorrectlyOnResponse/IPv6_GoogleGrpc_NoDeferredProcessing #33017

Closed
jmarantz opened this issue Mar 20, 2024 · 8 comments
Assignees
Labels
area/test flakes stale stalebot believes this issue/PR has not been touched recently

Comments

@jmarantz
Copy link
Contributor

This flaked for me in a CI: https://dev.azure.com/cncf/envoy/_build/results?buildId=165561&view=logs&j=8bf29878-a4cc-50f7-4e84-2255e6fd4065&t=150a3762-112b-5713-0d28-af1eb6c2fbf4

2024-03-20T19:10:36.7571536Z [ RUN      ] IpVersionsClientTypeDeferredProcessing/ExtProcIntegrationTest.GetAndSetTrailersIncorrectlyOnResponse/IPv6_GoogleGrpc_NoDeferredProcessing
2024-03-20T19:10:36.7572847Z test/extensions/filters/http/ext_proc/ext_proc_integration_test.cc:252: Failure
2024-03-20T19:10:36.7573591Z Value of: response.waitForEndStream()
2024-03-20T19:10:36.7574178Z   Actual: false (Timed out waiting for end stream
2024-03-20T19:10:36.7574832Z )
2024-03-20T19:10:36.7575165Z Expected: true
2024-03-20T19:10:36.7575524Z Stack trace:
2024-03-20T19:10:36.7576086Z   0x5594d9350370: Envoy::ExtProcIntegrationTest::verifyDownstreamResponse()
2024-03-20T19:10:36.7577124Z   0x5594d92c450e: Envoy::ExtProcIntegrationTest_GetAndSetTrailersIncorrectlyOnResponse_Test::TestBody()
2024-03-20T19:10:36.7578166Z   0x5594dd083bed: testing::internal::HandleSehExceptionsInMethodIfSupported<>()
2024-03-20T19:10:36.7579047Z   0x5594dd07e89d: testing::internal::HandleExceptionsInMethodIfSupported<>()
2024-03-20T19:10:36.7579736Z   0x5594dd065128: testing::Test::Run()
2024-03-20T19:10:36.7580250Z   0x5594dd065b4b: testing::TestInfo::Run()
2024-03-20T19:10:36.7580762Z ... Google Test internal frames ...
2024-03-20T19:10:36.7581046Z 
2024-03-20T19:10:36.7582221Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_defer_processing_backedup_streams to: true
2024-03-20T19:10:36.7583968Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_no_extension_lookup_by_name to: true
2024-03-20T19:10:36.7585663Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_quic_always_support_server_preferred_address to: true
2024-03-20T19:10:36.7613937Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_immediate_response_use_filter_mutation_rule to: true
2024-03-20T19:10:36.7616082Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_send_header_raw_value to: true
2024-03-20T19:10:36.7617753Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_runtime_initialized to: false
2024-03-20T19:10:36.7619474Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_defer_processing_backedup_streams to: true
2024-03-20T19:10:36.7621225Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_no_extension_lookup_by_name to: true
2024-03-20T19:10:36.7622934Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_quic_always_support_server_preferred_address to: true
2024-03-20T19:10:36.7624732Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_immediate_response_use_filter_mutation_rule to: true
2024-03-20T19:10:36.7626586Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_send_header_raw_value to: true
2024-03-20T19:10:36.7628342Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_runtime_initialized to: false
2024-03-20T19:10:36.7630916Z [  FAILED  ] IpVersionsClientTypeDeferredProcessing/ExtProcIntegrationTest.GetAndSetTrailersIncorrectlyOnResponse/IPv6_GoogleGrpc_NoDeferredProcessing, where GetParam() = ((4-byte object <01-00 00-00>, 4-byte object <01-00 00-00>), false) (10547 ms)
@yanjunxiang-google
Copy link
Contributor

Response timeouts. This test is flaky. Retest should be able to get the CI pass.

Overall, ext_proc_integration_test.cc is growing too big. We probably should start to convert some of the test cases to unit test and move them to filter_test.cc. @tyxia @stevenzzzz

@jmarantz
Copy link
Contributor Author

We should not consider this state to be acceptable. We should use simulated time in tests, where possible, to make tests not be brittle to time. If that is impossible we should allow much more generous timeouts.

@tyxia
Copy link
Member

tyxia commented Mar 25, 2024

I am thinking we probably can first try to use timeout value larger than TestUtility::DefaultTimeout for the trailer case, to see if it helps to de-flake.

Regarding converting to unit test. I personally like integration test (and over unit test :)), as it tests the code in a closer-to-production and more E2E way. If we have concern with its size, we probably can refactor it into multiple integration test.

@yanjunxiang-google
Copy link
Contributor

yanjunxiang-google commented Apr 18, 2024

ext_proc has side stream process which adds round trip latency for a request. This probably is the reason why these ext_proc tests are frequently flaky due to timeout. As a first remedy, we can certainly increase the timeout to reduce the probability of the flakiness

Long term, for tests only need to verify ext_proc filter behavior/state machine, actually filter_test.cc: one example,

TEST_F(HttpFilterTest, SimplestPost) {
can test most of them. Tests in this file using simulated timer. We very rarely see flakiness in this test.

@yanjunxiang-google
Copy link
Contributor

@tyxia yes, integration tests is end-to-end and close to production. The draw back is it has to create an Envoy instance, which is slow and expensive for tests not necessarily need it.

Copy link

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label May 18, 2024
Copy link

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/test flakes stale stalebot believes this issue/PR has not been touched recently
Projects
None yet
Development

No branches or pull requests

4 participants