Filter unknown flaky tests when filtering known intermittents #29370
@bors-servo try
Filter unknown flaky tests when filtering known intermittents

There are two kinds of flaky/intermittent tests in Servo. The traditional kind is the test that fails on the CI, but has an associated bug indicating that the test is an intermittent failure. Many of these tests have completely unstable results, for instance, those where an unpredictable set of subtests fails. It's impossible to generate stable results for these, so we have traditionally just discarded these unexpected results.

Another kind of intermittent test is one that will produce an expected result when rerun (i.e. it will flake). Some of these are also labeled with bugs, while some are not. In some cases, there is flakiness in some core Servo functionality that can lead to *any* test flaking, such as a race condition that can lead to an early screenshot for reftests. When these kinds of tests do not have associated bugs, they cause the CI to fail. In this case, it is impossible to label these tests as intermittent because it can literally be any test.

This change reruns failed tests in order to detect unlabeled tests in the second category. When the second run produces expected results, the CI will now pass instead of being blocked, and the flake will be reported to the new flakiness dashboard. This prevents unrelated flakes from slowing down the merge queue.

- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes do not require tests because they are a change for CI only.
💔 Test failed - checks-github
48193a2 to a84a162
@bors-servo try
Results from try job (#4195706688):
Flaky unexpected result (15)
Stable unexpected results that are known to be intermittent (30)
Stable unexpected results (1)
💔 Test failed - checks-github
@bors-servo try
@bors-servo retry
I'm so excited for this change!
Results from try job (#4200782460):
Flaky unexpected result (16)
Stable unexpected results that are known to be intermittent (23)
Stable unexpected results (1)
💔 Test failed - checks-github
I think the stable pass above is a case where the nightly WPT import script saw a flake and that became the expected result. I need to consider how to handle this case when importing WPT tests.
a84a162 to 93ab35e
@bors-servo try
Results from try job (#4207421230):
Flaky unexpected result (19)
Stable unexpected results that are known to be intermittent (18)
☀️ Test successful - checks-github
Looks good, with some minor comments :)
tests/wpt/servowpt.py (outdated)
return not result.flaky and not result.issues

output = []
add_result(output, "Flaky unexpected results:", unexpected_results,
nit: this section has a colon, but the other two don’t
93ab35e to f9ec77c
@delan Thanks for the review! I've uploaded a new version which addresses your comments.
Thanks! One last thing:
There are two kinds of flaky/intermittent tests in Servo. The traditional kind is the test that fails on the CI, but has an associated bug indicating that the test is an intermittent failure. Many of these tests have completely unstable results, for instance, those where an unpredictable set of subtests fails. It's impossible to generate stable results for these, so we have traditionally just discarded these unexpected results. Another kind of intermittent test is one that will produce an expected result when rerun (i.e. it will flake). Some of these are also labeled with bugs, while some are not. In some cases, there is flakiness in some core Servo functionality that can lead to *any* test flaking, such as a race condition that can lead to an early screenshot for reftests. When these kinds of tests do not have associated bugs, they cause the CI to fail. In this case, it is impossible to label these tests as intermittent because it can literally be any test. This change reruns failed tests in order to detect unlabeled tests in the second category. When the second run produces expected results, the CI will now pass instead of being blocked, and the flake will be reported to the new flakiness dashboard. This prevents unrelated flakes from slowing down the merge queue.
f9ec77c to 5e30ce8
@bors-servo r+
📌 Commit 5e30ce8 has been approved by
💔 Test failed - checks-github
@bors-servo retry
Results from try job (#4235473176):
Flaky unexpected result (17)
Stable unexpected results that are known to be intermittent (14)
☀️ Test successful - checks-github
There are two kinds of flaky/intermittent tests in Servo. The traditional kind is the test that fails on the CI, but has an associated bug indicating that the test is an intermittent failure. Many of these tests have completely unstable results, for instance, those where an unpredictable set of subtests fails. It's impossible to generate stable results for these, so we have traditionally just discarded these unexpected results.
Another kind of intermittent test is one that will produce an expected result when rerun (i.e. it will flake). Some of these are also labeled with bugs, while some are not. In some cases, there is flakiness in some core Servo functionality that can lead to any test flaking, such as a race condition that can lead to an early screenshot for reftests. When these kinds of tests do not have associated bugs, they cause the CI to fail. In this case, it is impossible to label these tests as intermittent because it can literally be any test.
This change reruns failed tests in order to detect unlabeled tests in the second category. When the second run produces expected results, the CI will now pass instead of being blocked, and the flake will be reported to the new flakiness dashboard. This prevents unrelated flakes from slowing down the merge queue.
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
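As a rough illustration of the rerun-and-classify idea described above, here is a minimal Python sketch. It is not the code from this PR: the `UnexpectedResult` record, the `classify` function, and the two callbacks (`rerun_is_expected`, `lookup_issues`) are hypothetical names invented for this example. Only the three result categories and the "not flaky and no issues" test for a stable unexpected result come from the discussion above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class UnexpectedResult:
    """Hypothetical record for a test whose first run was unexpected."""
    path: str
    flaky: bool = False  # True if the rerun produced the expected result
    issues: List[str] = field(default_factory=list)  # known-intermittent bugs


def classify(
    results: List[UnexpectedResult],
    rerun_is_expected: Callable[[str], bool],  # assumed: reruns one test
    lookup_issues: Callable[[str], List[str]],  # assumed: finds tracking bugs
) -> Tuple[List[UnexpectedResult], List[UnexpectedResult], List[UnexpectedResult]]:
    """Split unexpected results into (flaky, known intermittent, stable)."""
    flaky, known, stable = [], [], []
    for result in results:
        # Second run: an expected result now means the first failure flaked.
        result.flaky = rerun_is_expected(result.path)
        if result.flaky:
            flaky.append(result)
            continue
        # Still unexpected: check for an intermittent bug on file.
        result.issues = lookup_issues(result.path)
        if result.issues:
            known.append(result)
        else:
            # Neither flaky nor labeled: this is what should block CI.
            stable.append(result)
    return flaky, known, stable


def ci_should_fail(stable: List[UnexpectedResult]) -> bool:
    """Only stable, unlabeled unexpected results fail the run."""
    return len(stable) > 0
```

Under these assumptions, a run with only flaky results and known intermittents passes, which matches the successful try jobs above that list no "Stable unexpected results" entry.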