Skip to content

Conversation

@matthewgraca
Copy link
Contributor

Issues Affected

Fixes #74985 and #75705

Description

Unpinned cmake initially used version 3.23.0 which broke a build #74985 (comment). This previously led to a necessary pin to cmake version 3.22. With the release of cmake 3.23.1, it is no longer necessary to pin cmake #74985 (comment).

CMake unpin change has not been added to .jenkins/pytorch/win-test-helpers/installation-helpers/install_miniconda3.bat because Windows dependencies were refactored away #88862.

How is this Tested?

This change is tested using "RUN_TORCHBENCH:" in the PR body #77577 (comment).

People with Relevant Context

@janeyx99

RUN_TORCHBENCH:

@matthewgraca matthewgraca requested a review from a team as a code owner January 4, 2023 23:25
@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Jan 4, 2023
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jan 4, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: matthewgraca / name: Matthew Graca (90fe0c8)

@pytorch-bot
Copy link

pytorch-bot bot commented Jan 4, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91739

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures

As of commit 9024480:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

# see details at: https://github.com/pytorch/pytorch/issues/74985
conda install -y numpy="${NUMPY_VERSION}" requests ninja pyyaml mkl mkl-include \
setuptools cmake=3.22 cffi typing_extensions boto3 \
setuptools cmake cffi typing_extensions boto3 \
Copy link
Contributor

@huydhn huydhn Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our best practice is to pin the version of the dependencies (cmake) so that they can be upgraded in a controlled manner. For this specific PR, I think you can try to up the version of cmake here to 3.23.1 or 3.23.* and see if torchbench passes. In PyTorch CI, the version of most dependencies are now pinned and cached.

Depending on the platforms, they are in the following locations:

These are just FYI. There is no need to update them as part of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the response! My current plan is to pin cmake in run_torchbench.yml to 3.23.1 within this PR.

My next plan is to make a new PR and update the cmake versions in the files you mentioned to 3.23.1 as well. Does this sound good? Is there anything I should be careful about?

Copy link
Contributor

@huydhn huydhn Jan 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. There is no concern about upgrading cmake version AFAIK. It's ok as long as all the tests pass.

Windows AMI update involves a bit of a manual testing steps (baking the new AMI and testing it), but I can help you with that. Once the PR is ready.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pinned the cmake version to 3.23.1 in run_torchbench.yml, I'll try testing it out --

RUN_TORCHBENCH:

Copy link
Contributor Author

@matthewgraca matthewgraca Jan 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue

Looking at the tests, it turns out that the default conda channel doesn't have 3.23.1. The most recent version of cmake that the default channel which conda uses is 3.22.1 from 8 months ago, which threw me off since this person #74985 was able to run conda install cmake==3.23.0, and also ran their build on cmake version 3.23.1. I'm guessing some time between now and then, anaconda pulled those versions down, bringing us back to step one :(. I'm thinking of two possible solutions --

Option 1: Revert

Just revert back to 3.22. Since the default conda channel uses version 3.22.1, should we standardize the cmake version here in run_torchbench.yml to be 3.22.1 and the places mentioned here #91739 (comment) to 3.22.1 instead of 3.22.*?

Option 2: Use conda-forge channel

Another option I was looking at was using the conda-forge channel, which has cmake updated to version 3.25.1, so the command to install it would look like: conda install conda-forge::cmake=3.25.1 (at least for the latest version, there are previous versions we can use if 3.25.1 is too aggressive). I'm a first-time contributor so I don't know if using a different channel under conda is advisable. But from what I see, the default channel rarely updates cmake, while the conda-forge channel seems to be really on top of updates by comparison: see default vs. conda-forge --
default-channel-cmake-versions

conda-forge-cmake-versions

What are your thoughts on which option looks best?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we could use conda-forge if there are important features from newer version there that we need. Otherwise, let's not bring that in.

Copy link
Contributor Author

@matthewgraca matthewgraca Jan 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I did find it suspicious that conda-forge has multiple of the same versions of cmake like 3.20.2 or 3.18.2, maybe indicating reliability issues. So there are two things on my mind right now --

Close relevant issues

We should close these two issues that initially brought me here: #74985 and #75705

Reconsider use of conda-forge in Windows AMI?

I noticed in the file linked above by @huydhn here that the command conda install cmake is used on line 17, followed by conda install -y -c conda-forge cmake=3.22.3 on line 18. I have two concerns; first is that the cmake on line 17 is unpinned, so we should pin it to 3.22 or 3.22.* to follow the current practice of pinning cmake. The second is that it looks like there has been a determination that conda-forge isn't a reliable channel, and that we should only be using the main channel, so once we pin cmake to 3.22.* on line 17, we should remove line 18. I also just realized that file is in a different repo; should I make a PR there to implement these changes?

Copy link
Contributor

@huydhn huydhn Jan 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, #75705 can be closed because it's outdated while #74985 would probably still be there till conda makes 3.23.1 available to use.

Reconsider use of conda-forge in Windows AMI?

This makes sense to get rid of conda-forge there and pin the one from conda to 3.22.* as you suggest, but I guess that the line is there for a reason (may be something broke in the past that I'm not aware of). You can make the PR there to update the AMI. We can then do a test to see if it works with 3.22.* like other workflows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make the PR there to update the AMI. We can then do a test to see if it works with 3.22.* like other workflows.

Alright, I went ahead and made one -- pytorch/test-infra#1368

As for the tests on this PR, it looks like this is the discerning issue --
++ git merge-base 7dd28e9e83eb74ed03682c5a4174d5c5a3711c89 90244807f701bd6ec9537d0abcca786410889cfb
fatal: Not a valid commit name 90244807f701bd6ec9537d0abcca786410889cfb
I'm guessing, but it seems like TorchBench CI doesn't really like this commit existing in my forked repo of PyTorch. The only solution I can think of is cloning the master repo, creating a branch, making the changes, and opening another PR. Any ideas on resolving this so that the CI can get this commit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worry, this change looks simple enough. I would be happy to merge this one for you as it stands.

@janeyx99
Copy link
Contributor

janeyx99 commented Jan 5, 2023

Approving the CI for this PR to test the torchbench tests.

If we want to upgrade cmake version for the torchbench CI, should we also be consistent to upgrade cmake across all other CI?

@matthewgraca matthewgraca changed the title Consider removing pin to CMake version 3.22 on run_torchbench CI Consider updating pint to CMake version 3.23.1 on run_torchbench CI Jan 5, 2023
@matthewgraca matthewgraca changed the title Consider updating pint to CMake version 3.23.1 on run_torchbench CI Consider updating pin to CMake version 3.23.1 on run_torchbench CI Jan 5, 2023
@mikaylagawarecki mikaylagawarecki added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jan 6, 2023
# see details at: https://github.com/pytorch/pytorch/issues/74985
conda install -y numpy="${NUMPY_VERSION}" requests ninja pyyaml mkl mkl-include \
setuptools cmake=3.22 cffi typing_extensions boto3 \
setuptools cmake=3.23.1 cffi typing_extensions boto3 \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, I misread this as 3.22.1 while it's 3.23.1. We used to fix the version to the former till recently #90307 where it was relaxed a bit to 3.22.*

Based on pytorch#90307

Co-authored-by: Huy Do <huydhn@gmail.com>
@huydhn
Copy link
Contributor

huydhn commented Jan 9, 2023

@pytorchbot merge -f 'Only pin cmake version. Linter has passed'

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@matthewgraca matthewgraca deleted the unpin-cmake-version branch January 9, 2023 19:55
huydhn pushed a commit to pytorch/test-infra that referenced this pull request Jan 18, 2023
### Description
#### Why pin cmake
When cmake was unpinned and cmake version 3.23.0 came out, Windows and
Ubuntu builds would break
pytorch/pytorch#74985. While this would
eventually be fixed, it was determined that it would be safer to pin
cmake to a known stable version, 3.22. Finally, it was decided that 3.22
be relaxed to 3.22.* pytorch/pytorch#90307.

#### Why drop `conda-forge`
While `conda-forge` contains much more updated versions of cmake
relative to conda's main channel
pytorch/pytorch#91739 (comment),
it's been revealed that this channel isn't particularly stable
pytorch/pytorch#87208. This has lead to the
decision to avoid the use of `conda-forge` unless it has been determined
that this channel contains updates the main channel doesn't have that
are vital to the functionality of PyTorch
pytorch/pytorch#91739 (comment).

### How is this tested?
There are some manual testing steps to be done, though I'm not sure how
to use them so I'm looking for guidance on this matter.

### People with relevant context
@huydhn
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged open source topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cmake==3.23.0 breaks the build with CMAKE_CUDA_COMPILE_WHOLE_COMPILATION

7 participants