Run functorch tests in default shards; delete functorch-specific shards #96464

Closed

zou3519 wants to merge 13 commits into gh/zou3519/615/base from gh/zou3519/615/head

Contributor

zou3519 commented Mar 9, 2023 •

Stack from ghstack:

-> Run functorch tests in default shards; delete functorch-specific shards #96464

Fixes #96347

This PR:

- Makes the functorch tests run as part of the "default" shards
- Deletes the functorch CI shard from all CI job configurations (if it exists)
- Increases the "default" shard count by 1 for each job, unless it was
previously set to 1, to accommodate the new functorch tests and not
regress time-to-signal.
- Adds a bunch of skips for ROCm and torchdynamo configurations. We can
investigate them later.
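
The shard-count rule above can be sketched as a small transformation over a test matrix. This is only an illustration, not the code this PR changes: the entry fields (`config`, `shard`, `num_shards`, `runner`) mirror the job name format shown by CI, while the helper name `merge_functorch_into_default` and the sample runner label are assumptions made for the example.

```python
from typing import Dict, List


def merge_functorch_into_default(matrix: List[Dict]) -> List[Dict]:
    """Hypothetical helper: drop 'functorch' entries and grow the 'default'
    shard count by one, unless the job only had a single default shard."""
    defaults = [e for e in matrix if e["config"] == "default"]
    # Everything that is neither default nor functorch is passed through untouched.
    others = [e for e in matrix if e["config"] not in ("default", "functorch")]
    if not defaults:
        return others

    old_count = defaults[0]["num_shards"]
    # Rule from the description: +1 shard, except when the count was 1.
    new_count = old_count if old_count == 1 else old_count + 1
    runner = defaults[0]["runner"]

    new_defaults = [
        {"config": "default", "shard": i, "num_shards": new_count, "runner": runner}
        for i in range(1, new_count + 1)
    ]
    return new_defaults + others


# Illustrative input only; the runner label is a placeholder.
before = [
    {"config": "default", "shard": 1, "num_shards": 2, "runner": "example-runner"},
    {"config": "default", "shard": 2, "num_shards": 2, "runner": "example-runner"},
    {"config": "functorch", "shard": 1, "num_shards": 1, "runner": "example-runner"},
    {"config": "distributed", "shard": 1, "num_shards": 1, "runner": "example-runner"},
]
after = merge_functorch_into_default(before)
```

In this sketch the two default shards become three and the functorch entry disappears, while the distributed entry is left alone.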

NB: I might go through some more iterations to figure out what other
skips need to be added, but this iteration of the PR seems to pass most of the
CI suite.
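
For readers unfamiliar with configuration-specific skips, here is a minimal sketch of the general pattern using only the standard library. It does not show PyTorch's own test helpers; the environment variable names, the flag names, and the test class are all hypothetical and exist only for this example.

```python
import os
import unittest

# Hypothetical flags for this example; a real test suite would derive these
# from its own configuration rather than from these made-up variables.
EXAMPLE_ON_ROCM = os.environ.get("EXAMPLE_TEST_WITH_ROCM", "0") == "1"
EXAMPLE_ON_TORCHDYNAMO = os.environ.get("EXAMPLE_TEST_WITH_TORCHDYNAMO", "0") == "1"


class ExampleFunctorchStyleTest(unittest.TestCase):
    @unittest.skipIf(EXAMPLE_ON_ROCM, "skipped on ROCm; investigate later")
    def test_skipped_on_rocm(self):
        self.assertEqual(1 + 1, 2)

    @unittest.skipIf(EXAMPLE_ON_TORCHDYNAMO, "skipped under torchdynamo; investigate later")
    def test_skipped_under_torchdynamo(self):
        self.assertEqual(sum([1, 2, 3]), 6)


def run_example() -> unittest.TestResult:
    """Run the example tests and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleFunctorchStyleTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

A skipped test still counts as run and does not fail the suite, which is what lets a new group of tests join an existing shard while known-bad configurations are investigated separately.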

Test Plan:

wait for CI


          Run functorch tests in default shards; delete functorch-specific shards

eac7b9e

Body to come soon

[ghstack-poisoned]

pytorch-bot bot added the release notes: releng label

pytorch-bot bot commented Mar 9, 2023 •

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/96464

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures

As of commit 6dd2479:

NEW FAILURES - The following jobs have failed:

linux-bionic-cuda11.8-py3.10-gcc7 / test (distributed, 2, 3, linux.8xlarge.nvidia.gpu) (gh)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

zou3519 added a commit that referenced this pull request


          Run functorch tests in default shards; delete functorch-specific shards

928844b

Body to come soon

ghstack-source-id: 0a00907
Pull Request resolved: #96464

Contributor Author

zou3519 commented Mar 9, 2023

Not ready for review yet

zou3519 added keep-going ciflow/trunk labels


          Update on "Run functorch tests in default shards; delete functorch-sp…

6475ba5

…ecific shards"

Body to come soon

[ghstack-poisoned]

zou3519 added a commit that referenced this pull request


          Run functorch tests in default shards; delete functorch-specific shards

60a4d4e

Body to come soon

ghstack-source-id: c100787
Pull Request resolved: #96464


          Update on "Run functorch tests in default shards; delete functorch-sp…

6973cee

…ecific shards"

Body to come soon

[ghstack-poisoned]


          Update on "Run functorch tests in default shards; delete functorch-sp…

ffc8851

…ecific shards"

Body to come soon

[ghstack-poisoned]

zou3519 added a commit that referenced this pull request


          Run functorch tests in default shards; delete functorch-specific shards

62279da

Body to come soon

ghstack-source-id: 53f7c2f
Pull Request resolved: #96464


          Update on "Run functorch tests in default shards; delete functorch-sp…

43022fd

…ecific shards"

Body to come soon

[ghstack-poisoned]

zou3519 added a commit that referenced this pull request


          Run functorch tests in default shards; delete functorch-specific shards

90d8b07

Body to come soon

ghstack-source-id: bfce049
Pull Request resolved: #96464


          Update on "Run functorch tests in default shards; delete functorch-sp…

5c6c67f

…ecific shards"

Body to come soon

[ghstack-poisoned]

zou3519 added a commit that referenced this pull request


          Run functorch tests in default shards; delete functorch-specific shards

4b41f9e

Body to come soon

ghstack-source-id: b462516
Pull Request resolved: #96464


          Update on "Run functorch tests in default shards; delete functorch-sp…

a87fc29

…ecific shards"

Body to come soon

[ghstack-poisoned]

zou3519 added a commit that referenced this pull request


          Run functorch tests in default shards; delete functorch-specific shards

574d25a

Body to come soon

ghstack-source-id: c2a9e0e
Pull Request resolved: #96464


          Update on "Run functorch tests in default shards; delete functorch-sp…

8d99c0a

…ecific shards"

Body to come soon

[ghstack-poisoned]

zou3519 marked this pull request as ready for review

March 16, 2023 16:31

zou3519 requested review from a team, Chillee, ezyang and kshitij12345 as code owners

March 16, 2023 16:31


          Update on "Run functorch tests in default shards; delete functorch-sp…

1cd55da

…ecific shards"

Fixes #96347

This PR:

- Makes the functorch tests run as part of the "default" shards
- Deletes the functorch CI shard from all CI job configurations (if it exists)
- Increases the "default" shard count by 1 for each job, unless it was
previously set to 1, to accommodate the new functorch tests and not
regress time-to-signal.
- Adds a bunch of skips for ROCm and torchdynamo configurations. We can
investigate them later.

NB: I might go through some more iterations to figure out what other
skips need to be added, but this iteration of the PR seems to pass most of the
CI suite.

Test Plan:
- wait for CI

[ghstack-poisoned]

zou3519 requested a review from huydhn

March 16, 2023 16:31

Contributor

huydhn commented Mar 16, 2023

FYI, there is one remaining functorch shard for MacOS x86_64 in periodic https://github.com/pytorch/pytorch/blob/master/.github/workflows/periodic.yml#L313

huydhn approved these changes

Contributor

huydhn left a comment

LGTM! Let's also update the MacOS x86_64 shard and wait to see if all tests pass.


          Update on "Run functorch tests in default shards; delete functorch-sp…

cd561a4

…ecific shards"

Fixes #96347

This PR:

- Makes the functorch tests run as part of the "default" shards
- Deletes the functorch CI shard from all CI job configurations (if it exists)
- Increases the "default" shard count by 1 for each job, unless it was
previously set to 1, to accommodate the new functorch tests and not
regress time-to-signal.
- Adds a bunch of skips for ROCm and torchdynamo configurations. We can
investigate them later.

NB: I might go through some more iterations to figure out what other
skips need to be added, but this iteration of the PR seems to pass most of the
CI suite.

Test Plan:
- wait for CI

[ghstack-poisoned]

zou3519 added a commit that referenced this pull request


          Run functorch tests in default shards; delete functorch-specific shards

d72f30e

Fixes #96347

This PR:

- Makes the functorch tests run as part of the "default" shards
- Deletes the functorch CI shard from all CI job configurations (if it exists)
- Increases the "default" shard count by 1 for each job, unless it was
previously set to 1, to accommodate the new functorch tests and not
regress time-to-signal.
- Adds a bunch of skips for ROCm and torchdynamo configurations. We can
investigate them later.

NB: I might go through some more iterations to figure out what other
skips need to be added, but this iteration of the PR seems to pass most of the
CI suite.

Test Plan:
- wait for CI

ghstack-source-id: 13a4c38
Pull Request resolved: #96464

Contributor Author

zou3519 commented Mar 16, 2023

> FYI, there is one remaining functorch shard for MacOS x86_64 in periodic https://github.com/pytorch/pytorch/blob/master/.github/workflows/periodic.yml#L313

Good catch, I forgot about the jobs in periodic. I updated that shard and also increased the default shard count by 1 for the jobs in periodic.


          Update on "Run functorch tests in default shards; delete functorch-sp…

988c010

…ecific shards"

Fixes #96347

This PR:

- Makes the functorch tests run as part of the "default" shards
- Deletes the functorch CI shard from all CI job configurations (if it exists)
- Increases the "default" shard count by 1 for each job, unless it was
previously set to 1, to accommodate the new functorch tests and not
regress time-to-signal.
- Adds a bunch of skips for ROCm and torchdynamo configurations. We can
investigate them later.

NB: I might go through some more iterations to figure out what other
skips need to be added, but this iteration of the PR seems to pass most of the
CI suite.

Test Plan:
- wait for CI

[ghstack-poisoned]

zou3519 added a commit that referenced this pull request


          Run functorch tests in default shards; delete functorch-specific shards

0c225b1

Fixes #96347

This PR:

- Makes the functorch tests run as part of the "default" shards
- Deletes the functorch CI shard from all CI job configurations (if it exists)
- Increases the "default" shard count by 1 for each job, unless it was
previously set to 1, to accommodate the new functorch tests and not
regress time-to-signal.
- Adds a bunch of skips for ROCm and torchdynamo configurations. We can
investigate them later.

NB: I might go through some more iterations to figure out what other
skips need to be added, but this iteration of the PR seems to pass most of the
CI suite.

Test Plan:
- wait for CI

ghstack-source-id: ee52726
Pull Request resolved: #96464


          Update on "Run functorch tests in default shards; delete functorch-sp…

1397fe2

…ecific shards"

Fixes #96347

This PR:

- Makes the functorch tests run as part of the "default" shards
- Deletes the functorch CI shard from all CI job configurations (if it exists)
- Increases the "default" shard count by 1 for each job, unless it was
previously set to 1, to accommodate the new functorch tests and not
regress time-to-signal.
- Adds a bunch of skips for ROCm and torchdynamo configurations. We can
investigate them later.

NB: I might go through some more iterations to figure out what other
skips need to be added, but this iteration of the PR seems to pass most of the
CI suite.

Test Plan:
- wait for CI

[ghstack-poisoned]

zou3519 added a commit that referenced this pull request


          Run functorch tests in default shards; delete functorch-specific shards

8f56395

Fixes #96347

This PR:

- Makes the functorch tests run as part of the "default" shards
- Deletes the functorch CI shard from all CI job configurations (if it exists)
- Increases the "default" shard count by 1 for each job, unless it was
previously set to 1, to accommodate the new functorch tests and not
regress time-to-signal.
- Adds a bunch of skips for ROCm and torchdynamo configurations. We can
investigate them later.

NB: I might go through some more iterations to figure out what other
skips need to be added, but this iteration of the PR seems to pass most of the
CI suite.

Test Plan:
- wait for CI

ghstack-source-id: ca719db
Pull Request resolved: #96464


          Update on "Run functorch tests in default shards; delete functorch-sp…

6dd2479

…ecific shards"

Fixes #96347

This PR:

- Makes the functorch tests run as part of the "default" shards
- Deletes the functorch CI shard from all CI job configurations (if it exists)
- Increases the "default" shard count by 1 for each job, unless it was
previously set to 1, to accommodate the new functorch tests and not
regress time-to-signal.
- Adds a bunch of skips for ROCm and torchdynamo configurations. We can
investigate them later.

NB: I might go through some more iterations to figure out what other
skips need to be added, but this iteration of the PR seems to pass most of the
CI suite.

Test Plan:
- wait for CI

[ghstack-poisoned]

zou3519 added a commit that referenced this pull request


          Run functorch tests in default shards; delete functorch-specific shards

2fa3a1d

Fixes #96347

This PR:

- Makes the functorch tests run as part of the "default" shards
- Deletes the functorch CI shard from all CI job configurations (if it exists)
- Increases the "default" shard count by 1 for each job, unless it was
previously set to 1, to accommodate the new functorch tests and not
regress time-to-signal.
- Adds a bunch of skips for ROCm and torchdynamo configurations. We can
investigate them later.

NB: I might go through some more iterations to figure out what other
skips need to be added, but this iteration of the PR seems to pass most of the
CI suite.

Test Plan:
- wait for CI

ghstack-source-id: a3b703a
Pull Request resolved: #96464

Contributor Author

zou3519 commented Mar 21, 2023

@pytorchbot merge -f "test failure looks flaky"

Collaborator

pytorchmergebot commented Mar 21, 2023

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot added the Merged label

pytorchmergebot closed this in

5acf403

huydhn mentioned this pull request

Mark ROCm tests as flaky #97259

Closed

pytorchmergebot pushed a commit that referenced this pull request


          Mark ROCm tests as flaky (#97259)

244736a

Before #96464, ROCm tests in trunk were already quite flaky: https://hud.pytorch.org/reliability/pytorch/pytorch?jobName=trunk%20%2F%20linux-focal-rocm5.4.2-py3.8%20%2F%20test%20(default).

After #96464, there is a new group of flaky failures coming from functorch, so let's mark the tests as flaky to monitor them without impacting trunk.

Two flaky tests currently seen in trunk are:

* #97256
* `functorch/test_memory_efficient_fusion.py` OOM

Pull Request resolved: #97259
Approved by: https://github.com/malfet, https://github.com/zou3519

zou3519 mentioned this pull request

Enable functorch testing for rocm #96560

Closed

cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request


          Run functorch tests in default shards; delete functorch-specific shar…

e359926

…ds (#96464)

Fixes #96347

This PR:

- Makes the functorch tests run as part of the "default" shards
- Deletes the functorch CI shard from all CI job configurations (if it exists)
- Increases the "default" shard count by 1 for each job, unless it was
previously set to 1, to accommodate the new functorch tests and not
regress time-to-signal.
- Adds a bunch of skips for ROCm and torchdynamo configurations. We can
investigate them later.

NB: I might go through some more iterations to figure out what other
skips need to be added, but this iteration of the PR seems to pass most of the
CI suite.

Test Plan:
- wait for CI
Pull Request resolved: pytorch/pytorch#96464
Approved by: https://github.com/huydhn

cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request


          Mark ROCm tests as flaky (#97259)

ab8ce2a

Before pytorch/pytorch#96464, ROCm tests in trunk were already quite flaky: https://hud.pytorch.org/reliability/pytorch/pytorch?jobName=trunk%20%2F%20linux-focal-rocm5.4.2-py3.8%20%2F%20test%20(default).

After pytorch/pytorch#96464, there is a new group of flaky failures coming from functorch, so let's mark the tests as flaky to monitor them without impacting trunk.

Two flaky tests currently seen in trunk are:

* pytorch/pytorch#97256
* `functorch/test_memory_efficient_fusion.py` OOM

Pull Request resolved: pytorch/pytorch#97259
Approved by: https://github.com/malfet, https://github.com/zou3519

cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request


          Run functorch tests in default shards; delete functorch-specific shar…

df152f4

…ds (#96464)

Fixes #96347

This PR:

- Makes the functorch tests run as part of the "default" shards
- Deletes the functorch CI shard from all CI job configurations (if it exists)
- Increases the "default" shard count by 1 for each job, unless it was
previously set to 1, to accommodate the new functorch tests and not
regress time-to-signal.
- Adds a bunch of skips for ROCm and torchdynamo configurations. We can
investigate them later.

NB: I might go through some more iterations to figure out what other
skips need to be added, but this iteration of the PR seems to pass most of the
CI suite.

Test Plan:
- wait for CI
Pull Request resolved: pytorch/pytorch#96464
Approved by: https://github.com/huydhn

cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request


          Mark ROCm tests as flaky (#97259)

4641e7b

Before pytorch/pytorch#96464, ROCm tests in trunk were already quite flaky: https://hud.pytorch.org/reliability/pytorch/pytorch?jobName=trunk%20%2F%20linux-focal-rocm5.4.2-py3.8%20%2F%20test%20(default).

After pytorch/pytorch#96464, there is a new group of flaky failures coming from functorch, so let's mark the tests as flaky to monitor them without impacting trunk.

Two flaky tests currently seen in trunk are:

* pytorch/pytorch#97256
* `functorch/test_memory_efficient_fusion.py` OOM

Pull Request resolved: pytorch/pytorch#97259
Approved by: https://github.com/malfet, https://github.com/zou3519

facebook-github-bot deleted the gh/zou3519/615/head branch

June 8, 2023 19:34

Labels

ciflow/trunk keep-going Merged release notes: releng