
Better test disabling system #118133

Open
zou3519 opened this issue Jan 23, 2024 · 5 comments
Labels
module: devx Related to PyTorch contribution experience (HUD, pytorchbot) needs design triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@zou3519
Contributor

zou3519 commented Jan 23, 2024

Motivation

We just added a large number of failing tests (10,000+) to CI. I put in a lot of work to skip the failing tests so that CI stays green. We anticipate doing this again (adding many failing tests to CI) in the future:

  • Adding Dynamo testing for python 3.12, which will likely add even more failing tests
  • CUDA Dynamo testing (we test CPU-only right now) which will add more failing tests

It would be nice if we had a way to mass-disable tests per configuration (e.g. dynamo 3.11 vs dynamo 3.8), perhaps with a single issue instead of one issue per test?

cc @ZainRizvi @kit1980 @huydhn @clee2000 @pytorch/pytorch-dev-infra

@zou3519 zou3519 added the module: devx Related to PyTorch contribution experience (HUD, pytorchbot) label Jan 23, 2024
@bdhirsh bdhirsh added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jan 29, 2024
@zou3519
Contributor Author

zou3519 commented Jan 29, 2024

Btw, this is probably the number one pain point for working with torch.compile tests today:

  1. We want to add even more failing tests. Manually adding unittest.skip / unittest.expectedFailure is tedious and takes weeks.
  2. People who fix the failing tests need to undo the unittest.skip / unittest.expectedFailure. A single change may fix ~100 failing tests: a dev needs to manually remove each unittest.expectedFailure, and may not catch all of the unittest.skip decorators they need to remove.

A better test disabling system would help by:

  1. allowing us to mass-disable failing tests
  2. mass-removing disables (if the tests consistently pass).

@malfet
Contributor

malfet commented Jan 29, 2024

FYI, under the hood the issue-based test disabling system reads https://github.com/pytorch/test-infra/blob/generated-stats/stats/disabled-tests-condensed.json, but we probably want to extend it to support wildcards of some sort, as adding 10K entries there is bad
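The wildcard extension suggested above could be sketched as follows. This is a hypothetical illustration, not the actual format of disabled-tests-condensed.json: the entry shape (pattern mapped to issue number and platforms) and the use of glob-style patterns are both assumptions here.

```python
import fnmatch

# Hypothetical registry: glob-style test-name pattern -> (tracking issue,
# platforms the disable applies to). The real condensed JSON maps exact
# test names to issue metadata; the wildcard keys are the proposed extension.
DISABLED = {
    "test_add_*_cuda": (118133, ["linux"]),
    "TestFoo.test_bar": (118133, ["linux", "win"]),
}

def is_disabled(test_name: str, platform: str) -> bool:
    """Return True if any pattern matches this test on this platform."""
    for pattern, (_issue, platforms) in DISABLED.items():
        if platform in platforms and fnmatch.fnmatch(test_name, pattern):
            return True
    return False
```

One wildcard entry like `test_add_*_cuda` would then stand in for an arbitrary number of per-test entries, which is the point of the extension: the file stays small even when 10K tests are disabled.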

@zou3519
Contributor Author

zou3519 commented Jan 31, 2024

Proposal:

  1. More fine-grained test disabling: we allow disabling tests at a per-configuration granularity. For example, tests can be disabled for dynamo-py311 and dynamo-py38 separately.
  2. No disabling wildcards. This seems like it could get messy in a compositional sense. The easiest thing is to add 10k+ entries into a json, csv, or database somewhere. If this is checked into a file somewhere in version control, we can do various things to prevent merge conflicts (e.g. add 3 spaces above and below each entry).
  3. We introduce some sort of "new" state. When adding new test configurations, there is some way to add them to the "new" state. "new" tests don't actually report signal in CI; instead, CI will auto-disable any failing tests and then move the test configuration from the "new" state to the regular state. Then it should probably report this.
  4. If a Disabled test is consistently passing, CI will automatically remove the test disable. (it appears this already happens today)

Example use case: enabling Dynamo for Python 3.12. Using the above scheme, we would create the Dynamo-py312 test configuration and then mark it as "new". CI would disable any failing tests and then move it to stable. After this, if devs fix failing tests, then CI will automatically re-enable the test. The task of "enabling Dynamo for Python 3.12" can be considered done when a sufficient number of tests are no longer disabled.
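Item 1 of the proposal (per-configuration granularity) could be sketched as below. The registry shape and the configuration names are hypothetical, chosen to match the examples in the proposal; only the idea that a disable is keyed by (test, configuration) rather than by test alone comes from the text.

```python
# Hypothetical per-configuration disable registry: each test maps to the
# set of configurations it is disabled under. Configuration names such as
# "dynamo-py311" follow the examples given in the proposal above.
DISABLED_PER_CONFIG = {
    "TestOps.test_add": {"dynamo-py311", "dynamo-py38"},
    "TestOps.test_mul": {"dynamo-py312"},
}

def should_skip(test_name: str, config: str) -> bool:
    """A test is skipped only under the configurations it is disabled for."""
    return config in DISABLED_PER_CONFIG.get(test_name, set())
```

Under this scheme, disabling TestOps.test_add for dynamo-py311 leaves the same test running (and reporting signal) under eager mode and under every other Python version.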

@huydhn
Contributor

huydhn commented Jan 31, 2024

1. More fine-grained test disabling: we allow disabling tests at a per-configuration granularity. For example, tests can be disabled for dynamo-py311 and dynamo-py38 separately.

I completely agree on this point; I feel bad disabling a test on all Linux platforms. Finer granularity is needed.

2. No disabling wildcards. This seems like it could get messy in a compositional sense. The easiest thing is to add 10k+ entries into a json, csv, or database somewhere. If this is checked into a file somewhere in version control, we can do various things to prevent merge conflicts (e.g. add 3 spaces above and below each entry).

The file is already on GitHub https://github.com/pytorch/test-infra/blob/generated-stats/stats/disabled-tests-condensed.json, so we have something to start with here. Sometimes, it's desirable to disable all the variants of a test though, i.e. different dtypes, which could make the total number of entries more manageable.

3. We introduce some sort of "new" state. When adding new test configurations, there is some way to add them to the "new" state. "new" tests don't actually report signal in CI; instead, CI will auto-disable any failing tests and then move the test configuration from the "new" state to the regular state. Then it should probably report this.

This is an interesting idea and it reminds me of the test quality concept. A new test could be classified as having unknown quality; a failed new test has bad quality and is disabled. I think we need more design discussion here, though, because only accepting good-quality tests upfront is what we have been doing until now on CI, i.e. if a new test breaks trunk, the commit is reverted rather than accepted with the test disabled. However, I get your point here when it comes to dynamo. I believe there are already talks about building a burndown chart for the number of tests disabled on dynamo, with a target to reduce them over time. cc @clee2000 as this seems relevant to the test dashboard that you're building.

4. If a Disabled test is consistently passing, CI will automatically remove the test disable. (it appears this already happens today)

Yup, this is an existing feature: disabled tests are run on CI daily, and they will be re-enabled if they pass without any failures for 2 weeks.
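The re-enable rule described above (passing on every daily rerun for two weeks) could be sketched as a simple check over a test's pass history. This is an illustrative model, not the actual test-infra implementation; the `pass_dates` record and the function name are assumptions.

```python
from datetime import date, timedelta

def should_reenable(pass_dates: set[date], today: date,
                    window_days: int = 14) -> bool:
    """True if the test passed on each of the last `window_days` daily runs.

    A single missing (failed or skipped) day resets the clock, mirroring
    the "without any failures after 2 weeks" condition described above.
    """
    return all(
        today - timedelta(days=offset) in pass_dates
        for offset in range(window_days)
    )
```

Combined with proposal item 4, this closes the loop: CI auto-disables a failing test, reruns it daily, and drops the disable entry once the window of clean passes is complete, with no manual cleanup needed.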

@zou3519
Contributor Author

zou3519 commented Feb 1, 2024

Some offline nuggets of wisdom from @malfet:
3) We don't want a "new" or "unknown quality" state for all new PyTorch tests. Instead, this should be a one-time opt-in thing that lets us populate the list of test skips
