
Conversation

@NicolasHug (Member) commented on Sep 3, 2025

This PR fixes a CUDA stream synchronization bug between NVDEC (the decoder) and NPP (the color-conversion).

I hope this is also the fix for an issue that was originally discovered by @ronghanghu with https://fburl.com/oay62aq2.

For context:

  • A frame is first decoded by NVDEC as YUV, and we then use NPP to convert the frame from YUV to RGB
  • CUDA streams are... too complex for me to properly explain, so let's ask ChatGPT for a primer. The important points for us are:
    • NVDEC has a stream, and NPP has a stream
    • In most cases, and unless you explicitly request otherwise, the NVDEC stream is the same as the NPP stream. There's no problem in this case.
    • But that's not always the case: in general, the NVDEC stream may be different from the NPP stream. Because streams execute in parallel, we may sometimes launch the NPP kernel while the NVDEC stream hasn't actually finished decoding the frame. That's obviously a problem, and that's the bug this PR fixes: we now force the NPP stream to wait for the NVDEC stream to finish (see the sketch right after this list for the general pattern).
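
For readers who want the general pattern: below is a minimal sketch, expressed with PyTorch's Python stream API rather than the C++ calls the actual fix uses. The stream names are illustrative stand-ins, not torchcodec identifiers.

import torch

# Illustrative stand-ins: in torchcodec these streams live on the C++ side.
# FFmpeg keeps NVDEC on the default stream; NPP follows the current stream.
nvdec_stream = torch.cuda.default_stream()
npp_stream = torch.cuda.Stream()

# ... decode work would be enqueued on nvdec_stream here ...

# The general fix pattern: make the consumer (NPP) stream wait for all work
# already enqueued on the producer (NVDEC) stream before launching the
# color-conversion kernel.
npp_stream.wait_stream(nvdec_stream)

with torch.cuda.stream(npp_stream):
    pass  # ... the YUV -> RGB conversion kernel would be launched here ...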

The bug can be reproduced with this snippet:

#%%
import torch

path = "/home/nicolashug/Downloads/break/video.mp4"


DECODE_WITH_DEFAULT_STREAM = True
DECODE_WITH_DEFAULT_STREAM = False  # Leads to bug in main

with torch.cuda.stream(None if DECODE_WITH_DEFAULT_STREAM else torch.cuda.Stream()):
    from torchcodec.decoders import VideoDecoder
    # This will set stream 0 for NVDEC no matter what, because it's hardcoded in
    # FFmpeg.
    dec = VideoDecoder(path, device="cuda")

    prev_frame = None
    for index in range(len(dec)):
        # This will use the default stream for NVDEC (hardcoded) and the current
        # stream for NPP, as set above in the context manager.
        frame = dec.get_frame_at(index)

        if prev_frame is not None:
            abs_diff = (frame.data.cpu().float() - prev_frame.data.cpu().float()).abs()
            if abs_diff.max() <= 3:
                raise ValueError(f"Frame {index} is too similar to previous frame")

        prev_frame = frame

What we're observing is that sometimes the decoded frame i is actually the same as frame i - 1. That's a side effect of the original stream-sync bug. How exactly? I'm not entirely sure; it may be related to how the PyTorch CUDA caching allocator works, but I don't know for certain. What I do know is that this snippet consistently fails on main and consistently succeeds with this PR.
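
As a general illustration of why a missing cross-stream sync can surface as stale data (a hedged sketch, not the actual torchcodec code path): a buffer written on one stream and read on another without synchronization may be read before the write has completed.

import torch

side = torch.cuda.Stream()
out = torch.zeros(1024, 1024, device="cuda")

with torch.cuda.stream(side):
    # A write to `out` enqueued on the side stream.
    out.add_(torch.randn(1024, 1024, device="cuda"))

# Without torch.cuda.current_stream().wait_stream(side) here, this read on the
# current stream is not ordered after the write above and may observe the old
# (all-zero) contents.
snapshot = out.clone()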


While working on this, I discovered an interesting (and sad) piece of trivia: the NVDEC stream is hardcoded by FFmpeg to always be the default stream. This means that doing something like this:

with torch.cuda.stream(torch.cuda.Stream()):
    dec = VideoDecoder(path, device="cuda")
    dec.get_frame_at(...)

will always use the default stream for decoding. That's sad, and not really expected, but there's nothing we can do about it. Only NPP will rely on the newly created stream.
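
To make that concrete, here is a small check (a hypothetical snippet, not part of the PR) showing that inside such a block the current stream differs from the default stream; NPP follows the current stream, while FFmpeg keeps NVDEC on the default one.

import torch

with torch.cuda.stream(torch.cuda.Stream()):
    current = torch.cuda.current_stream()
    default = torch.cuda.default_stream()
    # Inside the block, the current stream is the newly created one, not the
    # default stream that FFmpeg hardcodes for NVDEC.
    print(current == default)  # False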

This is orthogonal to the bug: the bug still exists regardless of this fact.

@meta-cla bot added the CLA Signed label on Sep 3, 2025
@NicolasHug added the bug label on Sep 3, 2025
@scotts (Contributor) left a comment

Wow, great catch!

@NicolasHug NicolasHug merged commit 9fa8554 into pytorch:main Sep 3, 2025
47 of 50 checks passed
@ronghanghu commented on Sep 3, 2025

Thanks for the quick fix @NicolasHug!

Also, wondering if this patch will come in for the nightly TorchCodec pip wheel today, or will there be an updated pip wheel (e.g. 0.6.1) compatible with PyTorch 2.8? We're excited to try this out in our model inference use case.

@NicolasHug (Member, Author) replied:
Yes it's already available as the 2025-09-03 nightly wheel, as I manually triggered a nightly build after we merged this :)
