
Conversation

@jbschlosser (Contributor) commented Jul 3, 2023

Stack from ghstack (oldest at bottom):

MS2 of the Reproducible Testing BE initiative. For context, this is the ask:

```
Another thing that would be really great as we start to have more dependent 
systems or types of tests (functorch, dynamo, crossref) would be to have a 
minimally reproducible version of the test (something at the end of the HUD 
comment like: "Run python test/test_file.py -k test_name" but also if you need 
flags, like crossref it would be like "Run <flag to run crossref> python test/..." ). I'll 
often go through the test infra to find the flags that I need to pass when 
something only breaks crossref/dynamo tests.
```

Implementation details:
* Adds a new flag `PRINT_REPRO_ON_FAILURE` that is settable through the environment variable `PYTORCH_PRINT_REPRO_ON_FAILURE=1`
    * **Default is ON but I can be persuaded otherwise**
* When the flag is enabled, our base `TestCase` will wrap the test method in a context manager that catches any non-skip exceptions and appends a repro string to the exception message (see the sketch after this list). The repro includes the setting of necessary test flags through env vars. Example:

```
To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_WITH_CROSSREF=1 python test/test_ops.py -k test_foo_add_cuda_float32

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
```
* To keep track of flag settings, this PR introduces a new `TestEnvironment` class that defines global flags by querying related environment variables. Flag and env var names are purposefully kept searchable via full names. Example usages:
```python
TestEnvironment.def_flag("TEST_WITH_TORCHINDUCTOR", env_var="PYTORCH_TEST_WITH_INDUCTOR")
# can track implication relationships to avoid adding unnecessary flags to the repro
TestEnvironment.def_flag(
    "TEST_WITH_TORCHDYNAMO",
    env_var="PYTORCH_TEST_WITH_DYNAMO",
    implied_by_fn=lambda: TEST_WITH_TORCHINDUCTOR or TEST_WITH_AOT_EAGER)
# can use include_in_repro=False to keep the flag from appearing in the repro command
TestEnvironment.def_flag(
    "DISABLE_RUNNING_SCRIPT_CHK", env_var="PYTORCH_DISABLE_RUNNING_SCRIPT_CHK", include_in_repro=False)
# a flag's default value is False unless otherwise specified via `default`
TestEnvironment.def_flag(
    "PRINT_REPRO_ON_FAILURE", env_var="PYTORCH_PRINT_REPRO_ON_FAILURE", default=(not IS_FBCODE), include_in_repro=False)
```
* AFAICT it is only feasible to achieve this from within the test framework rather than at the CI level. This is because CI / `run_test.py` are unaware of individual test cases. Implementing it in our base `TestCase` class has the broadest area of effect, as it's not isolated to e.g. OpInfo tests.
* I couldn't find an easy way to test the logic via `test_testing.py`, as the logic for extracting the test filename doesn't work for generated test classes. I'm open to ideas on testing this, however.
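For illustration, here is a minimal sketch of the repro-wrapping and flag-tracking ideas above. All names and details are illustrative assumptions, not the actual implementation in this PR:

```python
import contextlib
import os
import unittest

class TestEnvironment:
    """Simplified stand-in: tracks flags and the env vars that set them."""
    _flags = {}  # flag name -> (env_var, value, include_in_repro)

    @classmethod
    def def_flag(cls, name, env_var=None, default=False, include_in_repro=True):
        raw = os.getenv(env_var) if env_var else None
        value = default if raw is None else raw == "1"
        cls._flags[name] = (env_var, value, include_in_repro)
        return value

    @classmethod
    def repro_env_vars(cls):
        # Only enabled flags marked for inclusion contribute to the repro command
        return [ev for ev, value, include in cls._flags.values()
                if value and include and ev]

@contextlib.contextmanager
def print_repro_on_failure(repro_cmd):
    """Append a repro string to any non-skip exception raised in the block."""
    try:
        yield
    except unittest.SkipTest:
        raise  # skips are not failures; leave them untouched
    except Exception as e:
        text = ("To execute this test, run the following from the base repo dir:\n"
                f"    {repro_cmd}")
        first = f"{e.args[0]}\n\n{text}" if e.args else text
        e.args = (first,) + e.args[1:]
        raise

# Hypothetical usage inside a TestCase-like runner:
TestEnvironment.def_flag("TEST_WITH_CROSSREF", env_var="PYTORCH_TEST_WITH_CROSSREF")
prefix = " ".join(f"{ev}=1" for ev in TestEnvironment.repro_env_vars())
with print_repro_on_failure(f"{prefix} python test/test_ops.py -k test_foo".lstrip()):
    pass  # the wrapped test method would run here
```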

@jbschlosser jbschlosser requested a review from a team as a code owner July 3, 2023 20:34
@pytorch-bot (bot) commented Jul 3, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/104537

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 675fe55:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ezyang (Contributor) left a comment


I don't want to be too picky about what eventually goes in, since it is better to have something rather than nothing, but I think my comments are definitely worth looking into.

@jbschlosser jbschlosser requested a review from ezyang July 6, 2023 22:02
jbschlosser added a commit that referenced this pull request Jul 6, 2023
ghstack-source-id: 19f27ae
Pull Request resolved: #104537
@jbschlosser (Contributor, Author) commented Jul 10, 2023

@huydhn do you have any insight on the other test failures? They don't seem related to my stuff. I can repro them locally without my changes, but then I don't see those failures in the HUD. What should I do?

jbschlosser added a commit that referenced this pull request Jul 10, 2023
…RADCHECK flag"


Finishes the job from #104537. See #104537 (review)

[ghstack-poisoned]
jbschlosser added a commit that referenced this pull request Jul 10, 2023
@huydhn (Contributor) commented Jul 10, 2023

> @huydhn do you have any insight on the other test failures? They don't seem related to my stuff. I can repro them locally without my changes, but then I don't see those failures in the HUD. What should I do?

From what I see, they are all disabled tests and shouldn't be run at all in the PR. Looking at an example failure, I see it was run as:

```
Executing ['/opt/conda/envs/py_3.9/bin/python', '-bb', 'test_type_hints.py', '--shard-id=0', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2']
```

while the correct command should be:

```
Executing ['/opt/conda/envs/py_3.9/bin/python', '-bb', 'test_type_hints.py', '--shard-id=0', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '--sc=test_type_hints_0', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests']
```

So there are several missing parameters:

* `--sc=test_type_hints_0`
* `--import-slow-tests`
* `--import-disabled-tests`

They are all controlled via env variables, I think, so let's double-check these parts (a hypothetical sketch of the pattern follows).
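The flag names below come from the log above, but the structure and names are illustrative assumptions, not the actual `run_test.py` code:

```python
import os

# If CI-only pytest flags are appended based on IS_CI, then a wrongly-computed
# IS_CI silently drops them from the command, matching the failure above.
IS_CI = bool(os.getenv("CI"))

def build_command(test_module: str, shard_id: int, num_shards: int) -> list:
    command = ["python", "-bb", f"{test_module}.py",
               f"--shard-id={shard_id}", f"--num-shards={num_shards}",
               "-v", "--use-pytest", "-x", "--reruns=2"]
    if IS_CI:
        # Available only on CI: stepcurrent cache plus slow/disabled test lists
        command += [f"--sc={test_module}_{shard_id}",
                    "--import-slow-tests", "--import-disabled-tests"]
    return command

print(build_command("test_type_hints", 0, 1))
```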

@huydhn (Contributor) commented Jul 10, 2023

Or maybe the issue is in how the `IS_CI` env variable is exposed. All three of the missing flags are available only on CI.

This doesn't look like a coincidence, so my theory is that `IS_CI` isn't set correctly in `run_test.py`.

@jbschlosser (Contributor, Author) commented Jul 10, 2023

> This doesn't look like a coincidence, so my theory is that `IS_CI` isn't set correctly in `run_test.py`.

Awesome investigation, thanks! Looking into it now.

Edit: Figured it out. The old code was doing this:

```python
IS_CI = bool(os.getenv('CI'))
```

which is `True` more often than `os.getenv('CI') == '1'` is. So I guess the `CI` env var is being set to something other than `1` when launching `run_test.py`.
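A quick demo of the difference (not code from the PR):

```python
import os

# GitHub Actions sets CI=true rather than CI=1
os.environ["CI"] = "true"

print(bool(os.getenv("CI")))   # True  -- any non-empty string is truthy
print(os.getenv("CI") == "1")  # False -- the stricter check misses CI=true
```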

@huydhn (Contributor) commented Jul 10, 2023

Ohh, that's the reason. The `CI` variable comes from GitHub itself (https://docs.github.com/en/actions/learn-github-actions/variables#default-environment-variables), and it's set to `true` according to the GitHub docs.
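Given that, a permissive check covers GitHub's spelling. An illustrative fix, not necessarily the exact change that landed:

```python
import os

# Accept both CI=1 and GitHub's CI=true as "running on CI"; illustrative only
IS_CI = os.getenv("CI", "").lower() in ("1", "true")
```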

jbschlosser added a commit that referenced this pull request Jul 10, 2023
jbschlosser added a commit that referenced this pull request Jul 10, 2023
@jbschlosser (Contributor, Author) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: This PR needs a `release notes:` label.
If your changes are user facing and intended to be a part of release notes, please use a label starting with `release notes:`.

If not, please add the `topic: not user facing` label.

To add a label, you can comment to pytorchbot, for example:

```
@pytorchbot label "topic: not user facing"
```

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


@janeyx99 (Contributor) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


pytorchmergebot pushed a commit that referenced this pull request Jul 11, 2023
@facebook-github-bot facebook-github-bot deleted the gh/jbschlosser/86/head branch July 14, 2023 14:33
Labels: `ciflow/trunk` (Trigger trunk jobs on your pull request), `Merged`, `module: testing` (Issues related to the torch.testing module, not tests), `topic: not user facing` (topic category)