Testing: Print test reproduction command on failure #104537
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/104537

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures as of commit 675fe55.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
I don't want to be too picky about what eventually goes in, since it is better to have something rather than nothing, but I think my comments are definitely worth looking into.
MS2 of the Reproducible Testing BE initiative. For context, this is the ask:

```
Another thing that would be really great as we start to have more dependent systems or
types of tests (functorch, dynamo, crossref) would be to have a minimally reproducible
version of the test (something at the end of the HUD comment like: "Run python
test/test_file.py -k test_name", but also, if you need flags, like crossref, it would be
like "Run <flag to run crossref> python test/..."). I'll often go through the test infra
to find the flags that I need to pass when something only breaks crossref/dynamo tests.
```

Implementation details:

* Adds a new flag `PRINT_REPRO_ON_FAILURE` that is settable through the environment variable `PYTORCH_PRINT_REPRO_ON_FAILURE=1`.
  * **Default is ON, but I can be persuaded otherwise.**
* When the flag is enabled, our base `TestCase` will wrap the test method in a context manager that catches any non-skip exceptions and appends a repro string to the exception message (see the first sketch after this list). The repro includes the setting of necessary test flags through env vars. Example:

  ```
  To execute this test, run the following from the base repo dir:
      PYTORCH_TEST_WITH_CROSSREF=1 python test/test_ops.py -k test_foo_add_cuda_float32

  This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
  ```

* To keep track of flag settings, this PR introduces a new `TestEnvironment` class that defines global flags by querying related environment variables (see the second sketch after this list). Flag and env var names are purposefully kept searchable via full names. Example usages:

  ```python
  TestEnvironment.def_flag("TEST_WITH_TORCHINDUCTOR", env_var="PYTORCH_TEST_WITH_INDUCTOR")

  # can track implication relationships to avoid adding unnecessary flags to the repro
  TestEnvironment.def_flag(
      "TEST_WITH_TORCHDYNAMO",
      env_var="PYTORCH_TEST_WITH_DYNAMO",
      implied_by_fn=lambda: TEST_WITH_TORCHINDUCTOR or TEST_WITH_AOT_EAGER)

  # can use include_in_repro=False to keep the flag from appearing in the repro command
  TestEnvironment.def_flag(
      "DISABLE_RUNNING_SCRIPT_CHK",
      env_var="PYTORCH_DISABLE_RUNNING_SCRIPT_CHK",
      include_in_repro=False)

  # the default value of `default` is False, but this can be changed per flag
  TestEnvironment.def_flag(
      "PRINT_REPRO_ON_FAILURE",
      env_var="PYTORCH_PRINT_REPRO_ON_FAILURE",
      default=(not IS_FBCODE),
      include_in_repro=False)
  ```

* AFAICT it is only feasible to achieve this from within the test framework rather than at the CI level, because CI / `run_test.py` are unaware of individual test cases. Implementing it in our base `TestCase` class has the broadest area of effect, as it's not isolated to e.g. OpInfo tests.
* I couldn't find an easy way to test the logic via `test_testing.py`, as the logic for extracting the test filename doesn't work for generated test classes. I'm open to ideas on testing this, however.
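For readers who want to see the shape of that test-method wrapping, here is a minimal sketch assuming plain `unittest`. It approximates the mechanism described above rather than reproducing the PR's actual code; in particular, `test/test_file.py` is a placeholder for the real test-filename extraction logic (the part that breaks for generated test classes).

```python
import os
import unittest
from contextlib import contextmanager

PRINT_REPRO_ON_FAILURE = os.environ.get("PYTORCH_PRINT_REPRO_ON_FAILURE", "1") == "1"

@contextmanager
def print_repro_on_failure(repro_str):
    try:
        yield
    except unittest.SkipTest:
        raise  # skips are not failures; let them propagate untouched
    except Exception as e:
        # append the repro command to the original failure message
        msg = f"{e.args[0] if e.args else ''}\n\n{repro_str}"
        e.args = (msg,) + e.args[1:]
        raise

class TestCase(unittest.TestCase):
    def __init__(self, methodName="runTest"):
        super().__init__(methodName)
        test_method = getattr(self, methodName, None)
        if PRINT_REPRO_ON_FAILURE and test_method is not None:
            # "test/test_file.py" stands in for the real filename extraction
            repro_str = (
                "To execute this test, run the following from the base repo dir:\n"
                f"    python test/test_file.py -k {methodName}\n\n"
                "This message can be suppressed by setting "
                "PYTORCH_PRINT_REPRO_ON_FAILURE=0"
            )
            def wrapped(*args, _method=test_method, **kwargs):
                with print_repro_on_failure(repro_str):
                    return _method(*args, **kwargs)
            # shadow the test method on the instance with the wrapped version;
            # unittest looks the method up via getattr(self, methodName)
            setattr(self, methodName, wrapped)
```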
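And a minimal sketch of how a `TestEnvironment`-style flag registry could work. The `def_flag` signature mirrors the usage above, but the internals (the `flags` dict, the `repro_env_vars()` helper) are assumptions made for illustration, not the actual implementation.

```python
import os

class TestEnvironment:
    # registered flags: name -> (value, env_var, include_in_repro)
    flags = {}

    @classmethod
    def def_flag(cls, name, env_var=None, default=False,
                 include_in_repro=True, implied_by_fn=lambda: False):
        value = default
        if env_var is not None and env_var in os.environ:
            # an explicit env var setting overrides the default
            value = os.environ[env_var] == "1"
        # implication: if an implying flag is on, treat this one as on too
        value = value or implied_by_fn()
        cls.flags[name] = (value, env_var, include_in_repro)
        # expose the flag as a module-level global so tests can refer to
        # it by name, e.g. TEST_WITH_TORCHDYNAMO
        globals()[name] = value

    @classmethod
    def repro_env_vars(cls):
        # build the env var prefix for the repro command; only enabled
        # flags marked include_in_repro are emitted
        return " ".join(f"{env_var}=1"
                        for value, env_var, include in cls.flags.values()
                        if value and include and env_var is not None)

# Example usage, mirroring the def_flag() calls above
TestEnvironment.def_flag("TEST_WITH_TORCHINDUCTOR",
                         env_var="PYTORCH_TEST_WITH_INDUCTOR")
TestEnvironment.def_flag("TEST_WITH_TORCHDYNAMO",
                         env_var="PYTORCH_TEST_WITH_DYNAMO",
                         implied_by_fn=lambda: TEST_WITH_TORCHINDUCTOR)
```

Exposing each flag as a module-level global is what keeps names like `TEST_WITH_TORCHDYNAMO` greppable, in line with the searchability goal above.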
Finishes the job from #104537. See #104537 (review).
@huydhn do you have any insight on the other test failures? They don't seem related to my changes. I can repro locally without my changes, but then I don't see those failures in the HUD. What should I do?
From what I see, they are all disabled tests and shouldn't be run at all in the PR. Looking at an example failure, I see it was run as:
while the correct command should be:
So there are several missing parameters:
They are all controlled via env variables, I think, so let's double-check these parts.
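As a purely illustrative sketch of that kind of env-var gating (the `RUN_DISABLED_TESTS` name here is hypothetical, not the variable `run_test.py` actually uses):

```python
import os
import unittest

def check_if_disabled(test_name, disabled_tests):
    # Skip tests on the disabled list unless the run explicitly opts in
    # via an env var (hypothetical name, for illustration only).
    run_disabled = os.environ.get("RUN_DISABLED_TESTS", "0") == "1"
    if test_name in disabled_tests and not run_disabled:
        raise unittest.SkipTest(f"{test_name} is disabled")
```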
Or maybe the issue is in how …
This doesn't look like a coincidence. So my theory is that …
Awesome investigation, thanks! Looking into it now. Edit: Figured it out. The old code was doing this: `IS_CI = bool(os.getenv('CI'))`, which is True more often than …
Ohh, that's the reason. The …
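To make that failure mode concrete: `bool(os.getenv('CI'))` is truthy for any non-empty value of `CI`, even `'false'`, so it fires in more environments than an exact-match check would. A minimal illustration:

```python
import os

os.environ["CI"] = "false"
print(bool(os.getenv("CI")))      # True  -- any non-empty string is truthy
print(os.getenv("CI") == "true")  # False -- exact-match check

del os.environ["CI"]              # when CI is unset, getenv returns None
print(bool(os.getenv("CI")))      # False -- None is falsy
```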
@pytorchbot merge
Merge failed. Reason: This PR needs a `release notes:` label. If not, please add the `topic: not user facing` label.

To add a label, you can comment to pytorchbot, for example `@pytorchbot label "topic: not user facing"`. For more information, see the PyTorch Bot commands wiki.

Details for Dev Infra team: Raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Finishes the job from #104537. See #104537 (review).
Pull Request resolved: #104819. Approved by: https://github.com/huydhn