
Better test disabling system #118133

Open
zou3519 opened this issue Jan 23, 2024 · 5 comments
Labels
module: devx Related to PyTorch contribution experience (HUD, pytorchbot) needs design triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@zou3519
Contributor

zou3519 commented Jan 23, 2024

Motivation

We just added a large number of failing tests (10,000+) to CI. I put in a lot of work to skip the failing tests so that CI stays green. We anticipate doing this again (adding many failing tests to CI) in the future:

  • Adding Dynamo testing for python 3.12, which will likely add even more failing tests
  • CUDA Dynamo testing (we test CPU-only right now) which will add more failing tests

It would be nice if we had a way to mass-disable tests per configuration (e.g. dynamo 3.11 vs dynamo 3.8), perhaps with a single issue instead of one issue per test?

cc @ZainRizvi @kit1980 @huydhn @clee2000 @pytorch/pytorch-dev-infra

@zou3519 zou3519 added the module: devx Related to PyTorch contribution experience (HUD, pytorchbot) label Jan 23, 2024
@bdhirsh bdhirsh added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jan 29, 2024
@zou3519
Contributor Author

zou3519 commented Jan 29, 2024

Btw, this is probably the number one pain point for working with torch.compile tests today:

  1. We want to add even more failing tests. Manually adding unittest.skip / unittest.expectedFailure is tedious and takes weeks.
  2. People who fix the failing tests need to undo the unittest.skip / unittest.expectedFailure. A single change may fix ~100 failing tests: a dev needs to manually remove each unittest.expectedFailure, and may not catch all of the unittest.skip decorators they need to remove.

A better test disabling system would help by:

  1. allowing us to mass-disable failing tests
  2. mass-removing disables (if the tests consistently pass).

@malfet
Contributor

malfet commented Jan 29, 2024

FYI, under the hood the issue-based test disabling system reads https://github.com/pytorch/test-infra/blob/generated-stats/stats/disabled-tests-condensed.json, but we probably want to extend it to support wildcards of some sort, as adding 10K entries there is bad
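The wildcard extension suggested above could be sketched as follows. This is a hypothetical illustration, not the actual format of disabled-tests-condensed.json: the entry shape (pattern mapped to issue number and platforms) and the use of glob-style patterns are both assumptions here.

```python
import fnmatch

# Hypothetical registry: glob-style test-name pattern -> (tracking issue,
# platforms the disable applies to). The real condensed JSON maps exact
# test names to issue metadata; the wildcard keys are the proposed extension.
DISABLED = {
    "test_add_*_cuda": (118133, ["linux"]),
    "TestFoo.test_bar": (118133, ["linux", "win"]),
}

def is_disabled(test_name: str, platform: str) -> bool:
    """Return True if any pattern matches this test on this platform."""
    for pattern, (_issue, platforms) in DISABLED.items():
        if platform in platforms and fnmatch.fnmatch(test_name, pattern):
            return True
    return False
```

One wildcard entry like `test_add_*_cuda` would then stand in for an arbitrary number of per-test entries, which is the point of the extension: the file stays small even when 10K tests are disabled.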

@zou3519
Contributor Author

zou3519 commented Jan 31, 2024

Proposal:

  1. More fine-grained test disabling: we allow disabling tests at a per-configuration granularity. For example, tests can be disabled for dynamo-py311 and dynamo-py38 separately.
  2. No disabling wildcards. This seems like it could get messy in a compositional sense. The easiest thing is to add 10k+ entries into a json, csv, or database somewhere. If this is checked into a file somewhere in version control, we can do various things to prevent merge conflicts (e.g. add 3 spaces above and below each entry).
  3. We introduce some sort of "new" state. When adding new test configurations, there is some way to add them to the "new" state. "new" tests don't actually report signal in CI; instead, CI will auto-disable any failing tests and then move the test configuration from the "new" state to the regular state. Then it should probably report this.
  4. If a Disabled test is consistently passing, CI will automatically remove the test disable. (it appears this already happens today)

Example use case: enabling Dynamo for Python 3.12. Using the above scheme, we would create the Dynamo-py312 test configuration and then mark it as "new". CI would disable any failing tests and then move it to stable. After this, if devs fix failing tests, then CI will automatically re-enable the test. The task of "enabling Dynamo for Python 3.12" can be considered done when a sufficient number of tests are no longer disabled.
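Item 1 of the proposal (per-configuration granularity) could be sketched as below. The registry shape and the configuration names are hypothetical, chosen to match the examples in the proposal; only the idea that a disable is keyed by (test, configuration) rather than by test alone comes from the text.

```python
# Hypothetical per-configuration disable registry: each test maps to the
# set of configurations it is disabled under. Configuration names such as
# "dynamo-py311" follow the examples given in the proposal above.
DISABLED_PER_CONFIG = {
    "TestOps.test_add": {"dynamo-py311", "dynamo-py38"},
    "TestOps.test_mul": {"dynamo-py312"},
}

def should_skip(test_name: str, config: str) -> bool:
    """A test is skipped only under the configurations it is disabled for."""
    return config in DISABLED_PER_CONFIG.get(test_name, set())
```

Under this scheme, disabling TestOps.test_add for dynamo-py311 leaves the same test running (and reporting signal) under eager mode and under every other Python version.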

@huydhn
Contributor

huydhn commented Jan 31, 2024

1. More fine-grained test disabling: we allow disabling tests at a per-configuration granularity. For example, tests can be disabled for dynamo-py311 and dynamo-py38 separately.

I completely agree on this point; I feel bad disabling a test on all Linux platforms. Finer granularity is needed.

2. No disabling wildcards. This seems like it could get messy in a compositional sense. The easiest thing is to add 10k+ entries into a json, csv, or database somewhere. If this is checked into a file somewhere in version control, we can do various things to prevent merge conflicts (e.g. add 3 spaces above and below each entry).

The file is already on GitHub https://github.com/pytorch/test-infra/blob/generated-stats/stats/disabled-tests-condensed.json, so we have something to start with here. Sometimes, it's desirable to disable all the variants of a test though, i.e. different dtypes, which could make the total number of entries more manageable.

3. We introduce some sort of "new" state. When adding new test configurations, there is some way to add them to the "new" state. "new" tests don't actually report signal in CI; instead, CI will auto-disable any failing tests and then move the test configuration from the "new" state to the regular state. Then it should probably report this.

This is an interesting idea and it reminds me of the test quality concept. A new test could be classified as having unknown quality; a failed new test has bad quality and is disabled. I think we need more design discussion here, though, because only accepting good-quality tests upfront is what we have been doing until now on CI, i.e. if a new test breaks trunk, the commit is reverted rather than accepted with the test disabled. However, I get your point here when it comes to dynamo. I believe there are already talks about building a burndown chart for the number of tests disabled on dynamo, with a target to reduce them over time. cc @clee2000 as this seems relevant to the test dashboard that you're building.

4. If a Disabled test is consistently passing, CI will automatically remove the test disable. (it appears this already happens today)

Yup, this is an existing feature: disabled tests are run on CI daily, and they will be re-enabled if they pass without any failures for 2 weeks.
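The re-enable rule described above (passing on every daily rerun for two weeks) could be sketched as a simple check over a test's pass history. This is an illustrative model, not the actual test-infra implementation; the `pass_dates` record and the function name are assumptions.

```python
from datetime import date, timedelta

def should_reenable(pass_dates: set[date], today: date,
                    window_days: int = 14) -> bool:
    """True if the test passed on each of the last `window_days` daily runs.

    A single missing (failed or skipped) day resets the clock, mirroring
    the "without any failures after 2 weeks" condition described above.
    """
    return all(
        today - timedelta(days=offset) in pass_dates
        for offset in range(window_days)
    )
```

Combined with proposal item 4, this closes the loop: CI auto-disables a failing test, reruns it daily, and drops the disable entry once the window of clean passes is complete, with no manual cleanup needed.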

@zou3519
Contributor Author

zou3519 commented Feb 1, 2024

Some offline nuggets of wisdom from @malfet:
3) We don't want a "new" or "unknown quality" state for all new PyTorch tests. Instead, this should be a one-time opt-in thing that lets us populate the list of test skips
