setup.py: fix ROCm build #7573

Merged: 1 commit merged into pytorch:main on May 10, 2023


justinkb
Contributor

@justinkb justinkb commented May 10, 2023

@pytorch-bot

pytorch-bot bot commented May 10, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/7573

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 26 New Failures

As of commit f2ab2f7:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot

Hi @justinkb!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@NicolasHug
Member

@atalman @malfet would you mind taking a look at this? Thanks!

Contributor

@malfet malfet left a comment


The change looks OK to me. I don't know much about the ROCm platform, but it does not affect other platforms, so from that perspective it's fine.

@justinkb please sign the CLA and then it's ready to go

@jeffdaily
Contributor

You can merge this as-is, but someone from our pytorch team will take a deeper look at torchvision to make sure it's doing the right thing for ROCm builds. I'm concerned that the globbing might be picking up other unwanted files from "cuda" and not the hipified ones.

@justinkb
Contributor Author

I signed the CLA, guess the bot has a delay on rechecking ;-)

@pmeier
Contributor

pmeier commented May 10, 2023

@jeffdaily

> I'm concerned that the globbing might be picking up other unwanted files from "cuda" and not the hipified ones.

What globbing are you referring to? I don't see any for image_path, which is what is used here.

vision/setup.py

Lines 327 to 332 in 078959f

image_path = os.path.join(extensions_dir, "io", "image")
image_src = (
    glob.glob(os.path.join(image_path, "*.cpp"))
    + glob.glob(os.path.join(image_path, "cpu", "*.cpp"))
    + glob.glob(os.path.join(image_path, "cuda", "*.cpp"))
)

vision/setup.py

Lines 129 to 130 in 078959f

this_dir = os.path.dirname(os.path.abspath(__file__))
extensions_dir = os.path.join(this_dir, "torchvision", "csrc")

Did you mean image_src?

Edit: 🤦 Yes, we are using image_src here as well. Scratch my comment above.

@justinkb
Contributor Author

justinkb commented May 10, 2023

@jeffdaily you seem to be correct in that intuition. we need another fix to fully tackle the rocm issues.

shall I squash both into this PR?

edit: done so now. there may be other places in setup.py where the ROCm handling isn't correct, and I will investigate further, but this at least fixes the image extension so torchvision can actually be imported.
edit2: in practice this is actually a bit of a useless fix, since decode_jpeg_cuda, when hipified, is functionally the same as decode_jpeg_cuda on CUDA without nvjpeg present. a ROCm build can't use nvjpeg (I'm assuming), so after the C preprocessor is done with the file, only the part between #if !NVJPEG_FOUND and #else is left. still, this future-proofs things in the event NVIDIA ever makes nvjpeg work with HIP (not likely)
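
For readers following the globbing concern, here is a minimal sketch of how the source selection could be made platform-aware. It is not necessarily the change merged in this PR. It assumes that hipified copies of the GPU sources sit in a sibling "hip" directory next to "cuda"; the helper name collect_image_sources and the is_rocm flag are hypothetical and used only for illustration.

```python
import glob
import os


def collect_image_sources(image_path, is_rocm):
    """Return the C++ sources for the image extension.

    Assumption (illustrative only): hipified GPU sources live in a "hip"
    directory next to "cuda", so only one of the two is globbed per build.
    """
    sources = glob.glob(os.path.join(image_path, "*.cpp"))
    sources += glob.glob(os.path.join(image_path, "cpu", "*.cpp"))

    # Pick the GPU source directory that matches the build platform, so a
    # ROCm build never compiles the unhipified "cuda" files.
    gpu_dir = "hip" if is_rocm else "cuda"
    sources += glob.glob(os.path.join(image_path, gpu_dir, "*.cpp"))

    return sorted(sources)
```

Calling collect_image_sources(image_path, is_rocm=True) would then return the top-level, "cpu", and "hip" sources only, which is the behavior the comment above asks for.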

@malfet malfet merged commit 20d90df into pytorch:main May 10, 2023
@malfet
Contributor

malfet commented May 10, 2023

Merged, thank you very much for your contribution.

@pmeier pmeier mentioned this pull request May 11, 2023
facebook-github-bot pushed a commit that referenced this pull request May 16, 2023
Reviewed By: vmoens

Differential Revision: D45903813

fbshipit-source-id: 1fb0458d24833caa8cf587a0fbd47a9e998ceea2
Successfully merging this pull request may close these issues.

terminate called after throwing an instance of 'c10::Error'
6 participants