Fix intermittent issue on OciMetricsSupportTest #6151

@klustria klustria commented Feb 10, 2023

Changes include the following:

  1. In OciMetricsSupportTest.testEndpoint, extend the validation time to 3 seconds when checking that the metric endpoint has been restored. Intermittently, a race condition exists where the validation happens before the endpoint is restored.
  2. Define every countDownLatch locally in its test method rather than as a static variable. The shared static latch caused a chain reaction: when one test failed, later tests failed as well because they shared the same latch.
  3. Always verify that countDownLatch.await() completed; otherwise, assert a failure.
  4. Remove the use of a fixed port when starting a WebServer.
  5. Reset postingEndPoint to its original value before each test, so that @RepeatedTest can be used in the future for debugging.
  6. Apply the Helidon code style to both OciMetricsSupportTest and OciMetricsCdiExtensionTest. This includes making the test classes and methods package-private rather than public, and reordering fields based on whether they are static, final, etc.
  7. Note that OciMetricsCdiExtensionTest only has code style changes and the removal of the never-used delay method, so the logic in that test class is unchanged. Only OciMetricsSupportTest contains significant changes to resolve the reported issue.
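
Items 2 and 3 can be illustrated with a minimal sketch. This is not the code from the PR: the class `LocalLatchSketch` and the helper `signalAsync` are hypothetical stand-ins for a test method and the asynchronous work it waits on. It only shows a latch created per call and the boolean result of `await` being checked instead of ignored.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LocalLatchSketch {

    // Hypothetical stand-in for the asynchronous work a test waits on:
    // counts the given latch down from another thread.
    static void signalAsync(CountDownLatch latch) {
        Thread worker = new Thread(latch::countDown);
        worker.setDaemon(true);
        worker.start();
    }

    // Returns true only if the latch reached zero within the timeout.
    static boolean runOnce(boolean signal, long timeoutMillis) {
        // The latch is local, so a timeout in one run cannot leave
        // stale state behind for the next run (or the next test).
        CountDownLatch latch = new CountDownLatch(1);
        if (signal) {
            signalAsync(latch);
        }
        try {
            // await(timeout, unit) returns false on timeout, so the
            // result must be checked rather than discarded.
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        if (!runOnce(true, 2000)) {
            throw new AssertionError("latch was not counted down in time");
        }
        if (runOnce(false, 100)) {
            throw new AssertionError("latch should have timed out");
        }
    }
}
```

In a JUnit test, the same check would typically be an assertion on the value returned by `await`, so that a timeout fails the test at the point where it happened.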

RCA is described here: helidon-io#6112 (comment)
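
One common way to tolerate the kind of race described in item 1 is to poll a condition until a deadline instead of checking it once. The sketch below is an assumption about how such a wait could look, not the implementation in this PR; `AwaitConditionSketch`, `awaitCondition`, and the poll interval are hypothetical, and only the 3-second budget comes from the description.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class AwaitConditionSketch {

    // Polls the condition until it holds or the timeout elapses.
    // Returns true if the condition became true in time.
    static boolean awaitCondition(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical flag standing in for "the endpoint has been restored".
        AtomicBoolean restored = new AtomicBoolean(false);
        Thread restorer = new Thread(() -> {
            try {
                Thread.sleep(100); // restoration happens slightly later
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            restored.set(true);
        });
        restorer.setDaemon(true);
        restorer.start();

        // Allow up to 3 seconds, as in item 1 of the description.
        if (!awaitCondition(restored::get, 3000, 20)) {
            throw new AssertionError("endpoint was not restored within 3 seconds");
        }
    }
}
```

The wait returns as soon as the condition holds, so the larger budget only costs time when the condition is genuinely late.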

@klustria klustria added the 4.x Version 4.x label Feb 10, 2023
@klustria klustria self-assigned this Feb 10, 2023
@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Feb 10, 2023

klustria commented Feb 10, 2023

Validate / build (ubuntu-20.04) (pull_request) failed with the following error, which is not related to this change (log: https://github.com/helidon-io/helidon/actions/runs/4147930397/jobs/7175380333#step:4:28989):

Warning:  Tests run: 6, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.814 s - in io.helidon.nima.tests.integration.http2.webserver.Http2ServerTest
[INFO] 
[INFO] Results:
[INFO] 
Error:  Failures: 
Error:    Http2WebServerStopIdleTest.stopWhenIdleExpectTimelyStopHttp2:82 
Expected: is a value less than <500>
     but: <501> was greater than <500>
[INFO] 
Error:  Tests run: 22, Failures: 1, Errors: 0, Skipped: 1

@klustria
Member Author

Validate / examples (macos-latest) (pull_request), on the other hand, was cancelled. Will restart both validations.

@klustria klustria merged commit bcf0022 into helidon-io:main Feb 11, 2023