OCPBUGS-26601: Re-enable test/extended/router/http2 tests on AWS #28515
base: master
@frobware: This pull request references Jira Issue OCPBUGS-26601, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/jira refresh |
@frobware: This pull request references Jira Issue OCPBUGS-26601, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
See #26089 /approve |
/hold Revision 09eb1f0 was retested 3 times: holding |
/retest |
It's been a long time since we disabled these tests on AWS. I have been running the http2 tests on AWS all week and I haven't run into the issue once. Let's re-enable the http2 x AWS tests for better coverage. Original bug: https://bugzilla.redhat.com/show_bug.cgi?id=1912413
09eb1f0 to a464d89
@frobware: This pull request references Jira Issue OCPBUGS-26601, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
/test e2e-aws-ovn-upi |
@lihongan: The specified target(s) for
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This probably requires openshift/cloud-provider-aws#57 |
This commit tackles an intermittent issue found in AWS environments during the router's h2spec conformance tests: hostname resolution within the cluster is slow, which often results in test timeouts. Resolution is slower in AWS than in Azure or GCP, which suggests possible environmental differences in DNS handling.

The solution is to resolve the hostname on the test host before initiating the h2spec tests in the cluster. This makes test runs significantly faster and more consistent: the h2spec test now completes in ~80 seconds, compared with 376 seconds previously (about 6 minutes 16 seconds, i.e. past the 5-minute mark).

Using an alternative DNS resolver such as @1.1.1.1 on the node yields nearly instant results for the same hostname, which suggests distinctive DNS resolution characteristics within AWS clusters. This observation does not definitively attribute the issue to negative caching.

To adapt to this variability, I have adjusted the polling interval to 2 seconds and the overall test timeout to 10 minutes. These changes aim to improve test success rates across diverse cloud environments.

With these changes I consistently see the h2spec test on AWS completing in ~85 seconds:

Ran 1 of 1 Specs in 77.519 seconds
Ran 1 of 1 Specs in 90.507 seconds
Ran 1 of 1 Specs in 80.268 seconds

Without the change it appears to be very consistently 376 seconds.
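As a quick sanity check on the timings quoted in the commit message, the arithmetic can be reproduced with a few lines of Go. This is only an illustrative sketch: the helper name `meanSeconds` is hypothetical and not part of the PR; the input numbers are the three run times and the 376-second baseline reported above.

```go
package main

import "fmt"

// meanSeconds returns the arithmetic mean of the given run times (in seconds).
// It returns 0 for an empty slice. Hypothetical helper, for illustration only.
func meanSeconds(runs []float64) float64 {
	if len(runs) == 0 {
		return 0
	}
	total := 0.0
	for _, r := range runs {
		total += r
	}
	return total / float64(len(runs))
}

func main() {
	// Run times reported in the commit message after the change.
	after := []float64{77.519, 90.507, 80.268}
	// Run time reported before the change.
	const before = 376.0

	mean := meanSeconds(after)
	fmt.Printf("mean after: %.1f s\n", mean)
	fmt.Printf("speed-up:   %.1fx\n", before/mean)
}
```

The three runs average about 82.8 seconds, which sits between the "~80" and "~85" figures quoted in the text and is roughly 4.5 times faster than the 376-second baseline.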
a464d89 to 00ea63b
/retest-required |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: candita, frobware The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/unhold |
/hold I think the consensus was that this PR still requires openshift/cloud-provider-aws#57. |
Slack discussion: https://redhat-internal.slack.com/archives/CBWMXQJKD/p1704908895477469. |
57^ has merged. /test all |
/retest |
/jira refresh The requirements for Jira bugs have changed (Jira issues linked to PRs on main branch need to target different OCP), recalculating validity. |
@openshift-bot: This pull request references Jira Issue OCPBUGS-26601, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
/jira refresh The requirements for Jira bugs have changed (Jira issues linked to PRs on main branch need to target different OCP), recalculating validity. |
@openshift-bot: This pull request references Jira Issue OCPBUGS-26601, which is invalid:
Comment In response to this:
/test all |
@frobware: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Job Failure Risk Analysis for sha: 00ea63b
It's been a long time since we disabled these tests on AWS. I have been running the http2 tests on AWS all week and I haven't run into the issue once. Let's re-enable the http2 x AWS tests for better coverage.
This PR also addresses an intermittent issue encountered in AWS environments during the router's h2spec conformance tests. The challenge involved slower hostname resolution within the cluster, resulting in frequent timeouts. Notably, AWS exhibited slower resolution times compared to Azure or GCP, hinting at potential differences in DNS handling.
The solution implemented in this PR resolves the hostname on the test host before initiating the h2spec tests within the cluster. With this change the h2spec test completes in approximately 85 seconds, down from a consistent 376 seconds or so (about 6 minutes 16 seconds, i.e. past the 5-minute mark).
While the difference in resolution times suggests environmental variations, particularly in AWS, it's important to note that this PR does not definitively attribute the issue to negative caching. Instead, it prioritises the substantial improvement achieved through the new approach. As a precaution, the polling interval and overall test timeout have been adjusted to 2 seconds and 10 minutes, respectively, to enhance test success rates across diverse cloud environments.
This PR represents a practical win in terms of improved test efficiency, while acknowledging potential environmental differences for further investigation, if needed, in the future.
Original bug: https://bugzilla.redhat.com/show_bug.cgi?id=1912413