BETA CUDA interface: Fix CUDA context initialization #946
On `main`, spawning one `VideoDecoder` per thread fails with CUDA error 201, which is `CUDA_ERROR_INVALID_CONTEXT`. Uh?
I'm confused as well. This seems to indicate that our CUDA context initialization hack, where we create a dummy tensor to force context creation, doesn't work as expected:
torchcodec/src/torchcodec/_core/BetaCudaDeviceInterface.cpp
Lines 206 to 208 in 9c5da20
After a lot of trial and error, it seems that using `torch::zeros` instead of `torch::empty` resolves the problem. Why? I have no idea. Maybe `torch::empty` was optimized out? Maybe, but that doesn't explain why the default CUDA interface works fine with the snippet above... Anyway, now they both use `torch::zeros`, and they both work when running my multithreaded benchmarks.
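For reference, a one-decoder-per-thread check of the kind described above could be sketched roughly as follows. This is not the benchmark used in this PR: the helper `run_one_decoder_per_thread` and its `make_decoder` factory argument are hypothetical names introduced only for illustration, and the sketch assumes the decoder supports integer indexing to fetch a frame.

```python
import threading


def run_one_decoder_per_thread(make_decoder, num_threads=4):
    """Create one decoder per thread and decode its first frame.

    `make_decoder` is a hypothetical zero-argument factory (an assumption of
    this sketch). Returns a list of (thread_index, exception) pairs, one per
    thread that raised, sorted by thread index.
    """
    errors = []
    lock = threading.Lock()

    def worker(index):
        try:
            decoder = make_decoder()  # each thread builds its own decoder
            decoder[0]  # decode the first frame inside this thread
        except Exception as exc:
            # Record the failure instead of letting the thread die silently.
            with lock:
                errors.append((index, exc))

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(errors, key=lambda pair: pair[0])
```

With torchcodec installed and a CUDA device available, `make_decoder` could be something like `lambda: VideoDecoder("video.mp4", device="cuda")`, where `"video.mp4"` is a placeholder path. An empty return value means every thread created its decoder and decoded a frame without raising.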