
Conversation

@NicolasHug (Contributor) commented on Oct 8, 2025

On main, this fails:

```python
from torchcodec.decoders import VideoDecoder
from joblib import Parallel, delayed

video_path = "/home/nicolashug/videos_h264/vid.mp4"

def decode_one_video():
    decoder = VideoDecoder(video_path, device="cuda:0:beta", seek_mode="approximate")
    decoder.get_frame_at(-1)

Parallel(n_jobs=8, prefer="threads")(delayed(decode_one_video)() for _ in range(100))
```

with

```
RuntimeError: Failed to get decoder caps: 201
```

That is, spawning one VideoDecoder per thread fails with CUDA error 201, which is CUDA_ERROR_INVALID_CONTEXT.
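
As a point of reference (a minimal standalone sketch, not TorchCodec code): driver-API calls that require a current context return CUDA_ERROR_INVALID_CONTEXT when the calling thread has none, which matches the 201 above; presumably the decoder-caps query has the same requirement.

```cpp
// Minimal sketch: a driver-API call made from a thread with no current
// CUDA context fails with CUDA_ERROR_INVALID_CONTEXT (201).
#include <cuda.h>
#include <cstdio>

int main() {
    cuInit(0);
    // No context has been created or made current on this thread.
    CUdevice dev;
    CUresult res = cuCtxGetDevice(&dev);
    if (res == CUDA_ERROR_INVALID_CONTEXT) {
        std::printf("error %d: no current context\n", static_cast<int>(res));
    }
    return 0;
}
```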

Uh?

I'm confused as well. This seems to indicate that our CUDA context initialization hack, where we create a dummy tensor to force context creation, doesn't work as expected:

```cpp
// Initialize CUDA context with a dummy tensor
torch::Tensor dummyTensorForCudaInitialization = torch::empty(
    {1}, torch::TensorOptions().dtype(torch::kUInt8).device(device_));
```

After a lot of trial and error, it seems that using torch::zeros instead of torch::empty resolves the problem. Why? I have no idea. Maybe the torch::empty call was optimized out, since its result is never read? Perhaps, but that doesn't explain why the default CUDA interface works fine with the snippet above... Anyway, both interfaces now use torch::zeros, and both work when running my multithreaded benchmarks.
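
For illustration, this is the shape of the change (a sketch based on the snippet above, not the exact diff):

```cpp
// Initialize the CUDA context with a dummy tensor. Unlike torch::empty,
// torch::zeros actually writes to the allocation, so the call has an
// observable side effect and may be harder to elide.
torch::Tensor dummyTensorForCudaInitialization = torch::zeros(
    {1}, torch::TensorOptions().dtype(torch::kUInt8).device(device_));
```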

@meta-cla bot added the CLA Signed label on Oct 8, 2025
@NicolasHug merged commit 986f10c into meta-pytorch:main on Oct 9, 2025
50 checks passed
@NicolasHug deleted the fix-cuda-context-init branch on October 9, 2025