-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Consider updating pin to CMake version 3.23.1 on run_torchbench CI #91739
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91739
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 FailuresAs of commit 9024480: This comment was automatically generated by Dr. CI and updates every 15 minutes. |
.github/workflows/run_torchbench.yml
Outdated
| # see details at: https://github.com/pytorch/pytorch/issues/74985 | ||
| conda install -y numpy="${NUMPY_VERSION}" requests ninja pyyaml mkl mkl-include \ | ||
| setuptools cmake=3.22 cffi typing_extensions boto3 \ | ||
| setuptools cmake cffi typing_extensions boto3 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our best practice is to pin the version of the dependencies (cmake) so that they can be upgraded in a controlled manner. For this specific PR, I think you can try to up the version of cmake here to 3.23.1 or 3.23.* and see if torchbench passes. In PyTorch CI, the version of most dependencies are now pinned and cached.
Depending on the platforms, they are in the following locations:
- Mac, Linux, iOS conda https://github.com/pytorch/pytorch/tree/master/.github/requirements
- Default version https://github.com/pytorch/test-infra/blob/main/.github/actions/setup-miniconda/action.yml#L172
- Windows AMI conda https://github.com/pytorch/test-infra/blob/main/aws/ami/windows/scripts/Installers/Install-Conda-Dependencies.ps1#L18
These are just FYI. There is no need to update them as part of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the response! My current plan is to pin cmake in run_torchbench.yml to 3.23.1 within this PR.
My next plan is to make a new PR and update the cmake versions in the files you mentioned to 3.23.1 as well. Does this sound good? Is there anything I should be careful about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. There is no concern about upgrading cmake version AFAIK. It's ok as long as all the tests pass.
Windows AMI update involves a bit of a manual testing steps (baking the new AMI and testing it), but I can help you with that. Once the PR is ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pinned the cmake version to 3.23.1 in run_torchbench.yml, I'll try testing it out --
RUN_TORCHBENCH:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue
Looking at the tests, it turns out that the default conda channel doesn't have 3.23.1. The most recent version of cmake that the default channel which conda uses is 3.22.1 from 8 months ago, which threw me off since this person #74985 was able to run conda install cmake==3.23.0, and also ran their build on cmake version 3.23.1. I'm guessing some time between now and then, anaconda pulled those versions down, bringing us back to step one :(. I'm thinking of two possible solutions --
Option 1: Revert
Just revert back to 3.22. Since the default conda channel uses version 3.22.1, should we standardize the cmake version here in run_torchbench.yml to be 3.22.1 and the places mentioned here #91739 (comment) to 3.22.1 instead of 3.22.*?
Option 2: Use conda-forge channel
Another option I was looking at was using the conda-forge channel, which has cmake updated to version 3.25.1, so the command to install it would look like: conda install conda-forge::cmake=3.25.1 (at least for the latest version, there are previous versions we can use if 3.25.1 is too aggressive). I'm a first-time contributor so I don't know if using a different channel under conda is advisable. But from what I see, the default channel rarely updates cmake, while the conda-forge channel seems to be really on top of updates by comparison: see default vs. conda-forge --

What are your thoughts on which option looks best?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we could use conda-forge if there are important features from newer version there that we need. Otherwise, let's not bring that in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I did find it suspicious that conda-forge has multiple of the same versions of cmake like 3.20.2 or 3.18.2, maybe indicating reliability issues. So there are two things on my mind right now --
Close relevant issues
We should close these two issues that initially brought me here: #74985 and #75705
Reconsider use of conda-forge in Windows AMI?
I noticed in the file linked above by @huydhn here that the command conda install cmake is used on line 17, followed by conda install -y -c conda-forge cmake=3.22.3 on line 18. I have two concerns; first is that the cmake on line 17 is unpinned, so we should pin it to 3.22 or 3.22.* to follow the current practice of pinning cmake. The second is that it looks like there has been a determination that conda-forge isn't a reliable channel, and that we should only be using the main channel, so once we pin cmake to 3.22.* on line 17, we should remove line 18. I also just realized that file is in a different repo; should I make a PR there to implement these changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, #75705 can be closed because it's outdated while #74985 would probably still be there till conda makes 3.23.1 available to use.
Reconsider use of
conda-forgein Windows AMI?
This makes sense to get rid of conda-forge there and pin the one from conda to 3.22.* as you suggest, but I guess that the line is there for a reason (may be something broke in the past that I'm not aware of). You can make the PR there to update the AMI. We can then do a test to see if it works with 3.22.* like other workflows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can make the PR there to update the AMI. We can then do a test to see if it works with
3.22.*like other workflows.
Alright, I went ahead and made one -- pytorch/test-infra#1368
As for the tests on this PR, it looks like this is the discerning issue --
++ git merge-base 7dd28e9e83eb74ed03682c5a4174d5c5a3711c89 90244807f701bd6ec9537d0abcca786410889cfb
fatal: Not a valid commit name 90244807f701bd6ec9537d0abcca786410889cfb
I'm guessing, but it seems like TorchBench CI doesn't really like this commit existing in my forked repo of PyTorch. The only solution I can think of is cloning the master repo, creating a branch, making the changes, and opening another PR. Any ideas on resolving this so that the CI can get this commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worry, this change looks simple enough. I would be happy to merge this one for you as it stands.
|
Approving the CI for this PR to test the torchbench tests. If we want to upgrade cmake version for the torchbench CI, should we also be consistent to upgrade cmake across all other CI? |
.github/workflows/run_torchbench.yml
Outdated
| # see details at: https://github.com/pytorch/pytorch/issues/74985 | ||
| conda install -y numpy="${NUMPY_VERSION}" requests ninja pyyaml mkl mkl-include \ | ||
| setuptools cmake=3.22 cffi typing_extensions boto3 \ | ||
| setuptools cmake=3.23.1 cffi typing_extensions boto3 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol, I misread this as 3.22.1 while it's 3.23.1. We used to fix the version to the former till recently #90307 where it was relaxed a bit to 3.22.*
Based on pytorch#90307 Co-authored-by: Huy Do <huydhn@gmail.com>
|
@pytorchbot merge -f 'Only pin cmake version. Linter has passed' |
Merge startedYour change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
### Description #### Why pin cmake When cmake was unpinned and cmake version 3.23.0 came out, Windows and Ubuntu builds would break pytorch/pytorch#74985. While this would eventually be fixed, it was determined that it would be safer to pin cmake to a known stable version, 3.22. Finally, it was decided that 3.22 be relaxed to 3.22.* pytorch/pytorch#90307. #### Why drop `conda-forge` While `conda-forge` contains much more updated versions of cmake relative to conda's main channel pytorch/pytorch#91739 (comment), it's been revealed that this channel isn't particularly stable pytorch/pytorch#87208. This has lead to the decision to avoid the use of `conda-forge` unless it has been determined that this channel contains updates the main channel doesn't have that are vital to the functionality of PyTorch pytorch/pytorch#91739 (comment). ### How is this tested? There are some manual testing steps to be done, though I'm not sure how to use them so I'm looking for guidance on this matter. ### People with relevant context @huydhn

Issues Affected
Fixes #74985 and #75705
Description
Unpinned cmake initially used version 3.23.0 which broke a build #74985 (comment). This previously led to a necessary pin to cmake version 3.22. With the release of cmake 3.23.1, it is no longer necessary to pin cmake #74985 (comment).
CMake unpin change has not been added to
.jenkins/pytorch/win-test-helpers/installation-helpers/install_miniconda3.batbecause Windows dependencies were refactored away #88862.How is this Tested?
This change is tested using "RUN_TORCHBENCH:" in the PR body #77577 (comment).
People with Relevant Context
@janeyx99
RUN_TORCHBENCH: