Skip to content

Conversation

ahmadsharif1
Copy link
Contributor

@ahmadsharif1 ahmadsharif1 commented Aug 15, 2024

This diff adds GPU CI to TorchCodec.

It installs the NVFFMPEG headers, builds FFMPEG from source, builds TorchCodec and then runs test on the GPU tests.

This will be useful to catch GPU recording regressions.

I also added an explicit dependency on NPP image processing library because it was failing to build inside the container environment.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Aug 15, 2024
@ahmadsharif1 ahmadsharif1 marked this pull request as ready for review August 20, 2024 15:44
Comment on lines 32 to 34
conda create --yes --name test
conda activate test
conda install --yes pip cmake pkg-config nasm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also use --quiet where possible in every conda call, otherwise the logs get really long

# This fails because conda reactivate fails inside this env
# conda install --yes conda-forge::compilers

pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need to install torchaudio, and we shouldn't need to install torchvision at this point as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Torchvision is needed by gpu_benchmark.py


# We skip certain tests because they are not relevant to GPU decoding and they always fail with
# a custom FFMPEG build.
pytest -k "not (test_get_metadata or get_ffmpeg_version)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, fun fact: it's currently hard to tell whether the GPU tests are being ran at all, because they're protected within an "if" block. I'll share pointers on how to address this offline

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is accurate and a real problem. Should I address this in a different diff?


# We skip certain tests because they are not relevant to GPU decoding and they always fail with
# a custom FFMPEG build.
pytest -k "not (test_get_metadata or get_ffmpeg_version)"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is accurate and a real problem. Should I address this in a different diff?

conda activate test
conda install --quiet --yes pip cmake pkg-config nasm

pip install --quiet --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu124
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Torchvision is required for the gpu benchmark which uses functional transforms from vision. See gpu_benchmark.py for details.

https://github.com/pytorch/torchcodec/blob/f4065f1b477148cfb0ef94167fb0bf3a63803e55/benchmarks/decoders/gpu_benchmark.py#L8

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we don't need it to build, right? So we might prefer to install it later. This is OK though, since this job is just a temporary setup I guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. Should I move it below after the build step so it doesn't accidentally leak as a build-time dep? WDYT?

Copy link
Contributor

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ahmadsharif1 . Let's follow-up about https://github.com/pytorch/torchcodec/pull/187/files#r1723560554 in a separate PR.

Eventually we'll want to make sure wheels can be built (and tested) and integrate that with our existing wheel.yml workflows, but this is a great first step to test GPU-related changes.

@ahmadsharif1 ahmadsharif1 merged commit cafea81 into meta-pytorch:main Aug 21, 2024
@ahmadsharif1 ahmadsharif1 deleted the cuda1 branch August 21, 2024 13:59
NicolasHug added a commit to NicolasHug/torchcodec that referenced this pull request Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants