Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge Inductor perf smoke test with other inductor CI tests #93395

Closed
wants to merge 1 commit into from

Conversation

desertfire
Copy link
Contributor

@desertfire desertfire commented Jan 31, 2023

Stack from ghstack (oldest at bottom):

Summary: Now the smoke test can also be triggered with the
ciflow/inductor label.

Summary: Now the smoke test can also be triggered with the
ciflow/inductor label.

[ghstack-poisoned]
@desertfire desertfire requested a review from a team as a code owner January 31, 2023 20:50
@pytorch-bot
Copy link

pytorch-bot bot commented Jan 31, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/93395

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures

As of commit 45c5cb5:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Jan 31, 2023
desertfire added a commit that referenced this pull request Jan 31, 2023
Summary: Now the smoke test can also be triggered with the
ciflow/inductor label.

ghstack-source-id: b785a72c295336fb534d22d65b5fc0c81d79ba4f
Pull Request resolved: #93395
Copy link
Contributor

@weiwangmeta weiwangmeta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I will keep an eye on the queue time of A100, and increase runners if needed.

@desertfire
Copy link
Contributor Author

@malfet, let me know if this is what you had in mind when you said "running on master".

@desertfire
Copy link
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 2, 2023
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed (Rule OSS CI). The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be enough capacity?

@malfet
Copy link
Contributor

malfet commented Feb 2, 2023

@pytorchbot merge -f "Doesn't really affect trunk, but newly added config is green"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@weiwangmeta
Copy link
Contributor

Would there be enough capacity?

The standalone action was already running for every push to the master, the capacity is enough.
This move operation may increase the load, if more PRs add ciflow/inductor label. I will monitor and ensure we have enough capacity.

@ngimel
Copy link
Collaborator

ngimel commented Feb 2, 2023

Perf smoke test takes ~1min, it's very little compared to all the other accuracy sweeps we are running

@weiwangmeta
Copy link
Contributor

Perf smoke test takes ~1min, it's very little compared to all the other accuracy sweeps we are running

Good point, we might just have identified a BE work. This smoke test job spent most of its time checking out torchbench. There might be a way to skip that.
FYI, these torchbench setup tests were taking like 20 minutes, will see if this can be saved. https://github.com/pytorch/pytorch/actions/runs/4076511843/jobs/7024936041#step:11:1128

ragulpr added a commit to ragulpr/pytorch that referenced this pull request Feb 2, 2023
…n-dev-setup

* origin: (898 commits)
  Move dynamo.optimizations.distributed to backends (pytorch#93408)
  Remove cuda 11.6 from nightly (pytorch#93979)
  Refactor dynamo register_backend/BACKENDS (pytorch#93389)
  Remove cuda 11.6 from CI replace with 11.7 (pytorch#93406)
  [Dynamo] Rename `GuardBuilder.guarded_code` -> `check_fn_manager` (pytorch#93934)
  Revert "Remove CUDA 11.6 from nightly builds (pytorch#93404)"
  Revert "[inductor] fix crash issue when input is a view tensor (pytorch#90150)"
  Basic Validation for FSDP `state_dict` transformations of modules with persistent buffers (pytorch#93396)
  Merge Inductor perf smoke test with other inductor CI tests (pytorch#93395)
  [inductor] Don't import torchvision (pytorch#93027)
  [FSDP][3/N] Refactor `summon_full_params` unit tests (pytorch#92298)
  [FSDP][2/N] `_summon_full_params` -> `_unshard_params` (pytorch#92297)
  Remove CUDA 11.6 from nightly builds (pytorch#93404)
  Mark buffers that reuse other buffers (pytorch#93329)
  Refactor to allow reuse of SchedulerNode.allocate (pytorch#93328)
  retire sparse_mask_helper (pytorch#91714)
  update fbgemm third party (pytorch#93907)
  [inductor] fix crash issue when input is a view tensor (pytorch#90150)
  [Inductor] add config for weight prepacking (pytorch#93811)
  Check for none for NNModuleVariable.__module__ (pytorch#93326)
  ...
pytorchmergebot pushed a commit that referenced this pull request Feb 13, 2023
Addresses (#93395 (comment)) The perf smoke test is supposed to be around one minute. But the torchbench checkout process is taking more than 15 minutes. This PR explores a way to just checkout torchbench with only needed models that are later used to do perf smoke test and memory compression ratio check.

Torchbench installation has "python install.py models model1 model 2 model3" support to just install model1 model2 and model3, not providing "models model1 model2 model3" would install all models by default.

Before this PR, inductor job takes about 27 minutes (21 minutes spent in testing phase) https://github.com/pytorch/pytorch/actions/runs/4149154553/jobs/7178024253
After this PR, inductor job takes about 19 minutes (12 minutes spent in testing phase), pytorch checkout and docker image pull takes about 5 - 6 minutes total.  https://github.com/pytorch/pytorch/actions/runs/4149155814/jobs/7178735494

Pull Request resolved: #94578
Approved by: https://github.com/orionr, https://github.com/malfet, https://github.com/desertfire
@facebook-github-bot facebook-github-bot deleted the gh/desertfire/62/head branch June 8, 2023 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged topic: not user facing topic category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants