Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

py-tensorflow: remove patch file for 2.16-rocm-enhanced #44783

Merged
merged 8 commits into from
Jul 3, 2024

Conversation

afzpatel
Copy link
Contributor

The changes to make tensorflow with rocm support compatible with spack have been pushed:
ROCm/tensorflow-upstream@c467913

@spackbot-app spackbot-app bot added the patch label Jun 20, 2024
@afzpatel afzpatel marked this pull request as ready for review June 20, 2024 09:57
Copy link
Contributor

@tldahlgren tldahlgren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed patch URLs exist. Just have a quick question about one of the two affected patches.

@tldahlgren tldahlgren self-assigned this Jun 20, 2024
@afzpatel
Copy link
Contributor Author

I had to add an additional patch as some of the paths were still assuming all the rocm components were in a single ROCM_PATH. I'll try to push the changes in the new patch to the rocm/tensorflow-upstream eventually.
I also removed version 2.14-rocm-enhanced since it didn't seem to be working and since we already have the newer 2.16-rocm-enhanced version.

Copy link
Contributor

@tldahlgren tldahlgren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the unused patch file from the package repository.

@afzpatel
Copy link
Contributor Author

@spackbot rerun pipeline

Copy link

spackbot-app bot commented Jun 25, 2024

I've started that pipeline for you!

tldahlgren
tldahlgren previously approved these changes Jun 27, 2024
Copy link
Contributor

@tldahlgren tldahlgren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@tldahlgren
Copy link
Contributor

LGTM. @adamjstewart @aweits Do either of you want to comment on this PR before it is merged?

@tldahlgren tldahlgren added the waiting-on-maintainer Waiting on a review from the package maintainer label Jun 27, 2024
Copy link
Contributor

@tldahlgren tldahlgren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will defer to @adamjstewart on this PR.

Copy link
Member

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding correctly, the reason that these patches are being removed is because they were merged into the AMD release branches. AKA, the existing versions were working correctly until the patches were merged, then all builds started failing, and this PR is required to fix them. This lack of stability is the main reason we avoid using branches in Spack. Would be great if we could use stable, checksummable release tarballs or commit hashes in the future. Would be even better if we didn't need to use a fork for TF+ROCm at all, but I think that's already in the works. I'll try to restrain my complaints because I think you've heard them plenty of times before, but it would be really nice to get TF+ROCm in CI so we can detect and fix breakages like this.

@adamjstewart adamjstewart merged commit 5262412 into spack:develop Jul 3, 2024
13 checks passed
@afzpatel afzpatel deleted the tf-rocm branch July 3, 2024 11:29
@afzpatel
Copy link
Contributor Author

afzpatel commented Jul 3, 2024

If I'm understanding correctly, the reason that these patches are being removed is because they were merged into the AMD release branches. AKA, the existing versions were working correctly until the patches were merged, then all builds started failing, and this PR is required to fix them. This lack of stability is the main reason we avoid using branches in Spack. Would be great if we could use stable, checksummable release tarballs or commit hashes in the future. Would be even better if we didn't need to use a fork for TF+ROCm at all, but I think that's already in the works. I'll try to restrain my complaints because I think you've heard them plenty of times before, but it would be really nice to get TF+ROCm in CI so we can detect and fix breakages like this.

The changes were merged only into the develop branch of https://github.com/ROCm/tensorflow-upstream. The rocm-enhanced branches were unaffected and py-tensorflow+rocm was still building successfully, but I ran into some other problems when testing that required some additional changes. Also, I didn't like the idea of having patch files that are almost 400 lines long which is why I wanted to remove them once the changes were commited. I've brought up using release tarballs instead of branches to the folks that maintain the ROCm tf fork, so hopefully we'll be using tarballs for future releases.

Since TF+ROCM is working with spack built ROCm binaries now, I think we can try adding that to the CI as well.

@alecbcs alecbcs removed the waiting-on-maintainer Waiting on a review from the package maintainer label Jul 8, 2024
hariharan-devarajan pushed a commit to hariharan-devarajan/spack that referenced this pull request Jul 10, 2024
* remove patch file for py-tensorflow@2.16-rocm-enhanced

* add changes

* fix style errors

* remove 2.14-rocm-enhanced version and add patch file

* fix stlye error

* remove jit patch

* add 2.14-rocm-enhanced version
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants