Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(orc8r): Stabilize cloud tests #14083

Merged
merged 10 commits into from
Oct 20, 2022

Conversation

MoritzThomasHuebner
Copy link
Contributor

@MoritzThomasHuebner MoritzThomasHuebner commented Oct 5, 2022

Closes #13998

Signed-off-by: Moritz Huebner moritz.huebner@tngtech.com

Summary

Upon inspecting the recent cloud-workflow failures, we found the following flaky tests:

  • TestSqlConfiguratorStorage_LoadEntities: 2 failures
  • TestGetSubscriberState: 2 failures
  • TestSingletonRunSuccess: 6 failures
  • TestSingletonRunFail: 4 failures
  • TestSyncSubscribers: 1 failure
  • TestRunBrokenIndexer: 1 failures

This PR focuses on TestSingletonRunSuccess, TestSingletonRunFail, and TestRunBrokenIndexer.

These changes focus on test isolation:

  • Added manual calls to indexer.DeregisterAllForTest(t) at the end of the tests. This is not strictly necessary to make the tests green, but it can cause issues with dangling indexers if the order of the tests is ever reversed.

TestSingletonRunFail:

  • Moved the creation of the reindexer thread after the registration of the indexers. This way, it is impossible that the reindexer is triggered when only one or two out of the three indexers are registered.

TestSingletonRunSuccess:

  • The issue is that the TestHookReindexDone is called even if there is an error during reindexing. Unexpected errors can be triggered in the remote indexer getIndexerClient.
  • Added a TestHookReindexFailure, that we use to keep track of reindexing job errors.
  • Safely pull all elements out of the channel ch using recvChSafe. This function keeps track of failures and will pull the requisite number of elements from the channel.

TestRunBrokenIndexer:

  • The issue is that in rare cases the reindexer can run twice before the indexer is registered.
  • Moved the creation of the reindexer thread after the registration of the indexer, similar to what we do in TestSingletonRunFail.
  • Removed one recvCh call as this appears to be unnecessary now,

Test Plan

TestSingletonRunSuccess:

  • Reproduced error locally by manually adding an error that would trigger a fixed number of times in the remote indexer getIndexerClient.
  • The fix in this PR fixes the error that occurs when we manually insert an error.
  • It was not possible to recreate this issue on the CI, since copy/pasting the test did not appear to increase the likelihood compared to a single run.

TestSingletonRunFail

  • Reproduced error by re-running many times on the CI initially.
  • After adding the change, the test is green even with many re-runs: workflow.

TestRunBrokenIndexer:

  • Error output can be reproduced locally by deleting one of the recvCh calls in the function.
  • After adding the change, the test is green locally.
  • Unfortunately, the test can not be re-run on the github runner many times in a single workflow as it fails for a different reason when running the second time.

Additional Information

  • This change is backwards-breaking

@pull-request-size pull-request-size bot added the size/XXL Denotes a Pull Request that changes 1000+ lines. label Oct 5, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Oct 5, 2022

Thanks for opening a PR! 💯

A couple initial guidelines

Howto

  • Reviews. The "Reviewers" listed for this PR are the Magma maintainers who will shepherd it.
  • Checks. All required CI checks must pass before merge.
  • Merge. Once approved and passing CI checks, use the ready2merge label to indicate the maintainers can merge your PR.

More info

Please take a moment to read through the Magma project's

If this is your first Magma PR, also consider reading

@github-actions github-actions bot added the component: ci All updates on CI (Jenkins/CircleCi/Github Action) label Oct 5, 2022
@MoritzThomasHuebner MoritzThomasHuebner changed the title fix(orc8r): Delete irrelevant cloud tests and workflows fix(orc8r): Stabilize cloud tests Oct 5, 2022
@MoritzThomasHuebner MoritzThomasHuebner force-pushed the stabilize_cloud_tests branch 3 times, most recently from 00f5b97 to b5ad548 Compare October 5, 2022 01:30
@github-actions
Copy link
Contributor

github-actions bot commented Oct 5, 2022

cloud-workflow

1 135 tests   1 135 ✔️  2m 25s ⏱️
   365 suites         0 💤
       7 files           0

Results for commit 55750d9.

♻️ This comment has been updated with latest results.

@pull-request-size pull-request-size bot added size/M Denotes a PR that changes 30-99 lines. and removed size/XXL Denotes a Pull Request that changes 1000+ lines. labels Oct 5, 2022
@github-actions github-actions bot added component: orc8r Orchestrator-related issue and removed component: ci All updates on CI (Jenkins/CircleCi/Github Action) labels Oct 5, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Oct 5, 2022

feg-workflow

    2 files  203 suites   40s ⏱️
374 tests 374 ✔️ 0 💤 0
388 runs  388 ✔️ 0 💤 0

Results for commit 55750d9.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 5, 2022

dp-workflow

14 tests   14 ✔️  2m 16s ⏱️
  1 suites    0 💤
  1 files      0

Results for commit 55750d9.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 5, 2022

agw-workflow

615 tests   611 ✔️  3m 48s ⏱️
    2 suites      4 💤
    2 files        0

Results for commit 55750d9.

♻️ This comment has been updated with latest results.

@MoritzThomasHuebner MoritzThomasHuebner force-pushed the stabilize_cloud_tests branch 2 times, most recently from 3103fc3 to e8a27b1 Compare October 6, 2022 04:50
@pull-request-size pull-request-size bot added size/L Denotes a Pull Request that changes 100-499 lines. size/M Denotes a PR that changes 30-99 lines. and removed size/M Denotes a PR that changes 30-99 lines. size/L Denotes a Pull Request that changes 100-499 lines. labels Oct 6, 2022
@voisey voisey linked an issue Oct 6, 2022 that may be closed by this pull request
2 tasks
@pull-request-size pull-request-size bot added size/L Denotes a Pull Request that changes 100-499 lines. and removed size/M Denotes a PR that changes 30-99 lines. labels Oct 7, 2022
@MoritzThomasHuebner MoritzThomasHuebner marked this pull request as ready for review October 7, 2022 09:16
@MoritzThomasHuebner MoritzThomasHuebner requested review from a team and maxhbr October 7, 2022 09:16
Copy link
Contributor

@sebathomas sebathomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This didn't exactly make the test easier to understand. ;) I think it's looking good but there's a few changes I'm not sure about. Maybe let's go over it tomorrow in person.

Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>
Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>
Keeps track of connection failures when running tests.

Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>
Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>
Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>
Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>
Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>
Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>
Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>
- Renamed `recvChSafe` to `recvChAndRetryFailures`.
- Added global check on `reindexConnectionFailsTotal` so it aborts if the connection keeps failing.

Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>
@sebathomas sebathomas merged commit afbca0f into magma:master Oct 20, 2022
@MoritzThomasHuebner MoritzThomasHuebner deleted the stabilize_cloud_tests branch February 23, 2023 01:21
lucasgonze pushed a commit to lucasgonze/magma that referenced this pull request Feb 29, 2024
* fix(orc8r): Stabilize reindexer singleton tests

Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>

* fix(orc8r): Reverted all temporary changes

Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>

* fix(orc8r): Added Connection failure test hook

Keeps track of connection failures when running tests.

Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>

* fix(orc8r): Added safe checks for reindexing failures

Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>

* fix(orc8r): Some refactoring to existing changes

Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>

* fix(orc8r): Simplified some changes

Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>

* fix(orc8r): Stabilized TestRunBrokenIndexer

Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>

* fix(orc8r): Changed += 1 to ++

Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>

* fix(orc8r): Improved `TestRunBrokenIndexer` stability

Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>

* fix(orc8r): Some changes for the review

- Renamed `recvChSafe` to `recvChAndRetryFailures`.
- Added global check on `reindexConnectionFailsTotal` so it aborts if the connection keeps failing.

Signed-off-by: Moritz Huebner <moritz.huebner@tngtech.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: orc8r Orchestrator-related issue size/L Denotes a Pull Request that changes 100-499 lines.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stabilize another flaky orc8r test
2 participants