
Conversation

@NicolasHug (Member) commented on Sep 3, 2025

This PR fixes a CUDA stream synchronization bug between NVDEC (the decoder) and NPP (the color-conversion).

I hope this is also the fix for an issue that was originally discovered by @ronghanghu with https://fburl.com/oay62aq2.

For context:

  • A frame is first decoded by NVDEC as YUV, and we then use NPP to convert the frame from YUV to RGB
  • CUDA streams are... too complex for me to properly explain, so let's ask ChatGPT for a primer. The important points for us are:
    • NVDEC has a stream, and NPP has a stream
    • In most cases, and unless you explicitly request otherwise, the NVDEC stream is the same as the NPP stream. There's no problem in this case.
    • But that's not always the case: in general, the NVDEC stream may be different from the NPP stream. Because streams execute in parallel, we may sometimes launch the NPP kernel while the NVDEC stream hasn't actually finished decoding the frame. That's obviously a problem, and that's the bug this PR fixes: we now force the NPP stream to wait for the NVDEC stream to finish (see the sketch right after this list for the general pattern).
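
For readers who want the general pattern: below is a minimal sketch, expressed with PyTorch's Python stream API rather than the C++ calls the actual fix uses. The stream names are illustrative stand-ins, not torchcodec identifiers.

import torch

# Illustrative stand-ins: in torchcodec these streams live on the C++ side.
# FFmpeg keeps NVDEC on the default stream; NPP follows the current stream.
nvdec_stream = torch.cuda.default_stream()
npp_stream = torch.cuda.Stream()

# ... decode work would be enqueued on nvdec_stream here ...

# The general fix pattern: make the consumer (NPP) stream wait for all work
# already enqueued on the producer (NVDEC) stream before launching the
# color-conversion kernel.
npp_stream.wait_stream(nvdec_stream)

with torch.cuda.stream(npp_stream):
    pass  # ... the YUV -> RGB conversion kernel would be launched here ...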

The bug can be reproduced with this snippet:

#%%
import torch

path = "/home/nicolashug/Downloads/break/video.mp4"


DECODE_WITH_DEFAULT_STREAM = True
DECODE_WITH_DEFAULT_STREAM = False  # Leads to bug in main

with torch.cuda.stream(None if DECODE_WITH_DEFAULT_STREAM else torch.cuda.Stream()):
    from torchcodec.decoders import VideoDecoder
    # This will set stream 0 for NVDEC no matter what, because it's hardcoded in
    # FFmpeg.
    dec = VideoDecoder(path, device="cuda")

    prev_frame = None
    for index in range(len(dec)):
        # This will use the default stream for NVDEC (hardcoded) and the current
        # stream for NPP, as set above in the context manager.
        frame = dec.get_frame_at(index)

        if prev_frame is not None:
            abs_diff = (frame.data.cpu().float() - prev_frame.data.cpu().float()).abs()
            if abs_diff.max() <= 3:
                raise ValueError(f"Frame {index} is too similar to previous frame")

        prev_frame = frame

What we're observing is that sometimes the decoded frame i is actually the same as frame i - 1. That's a side effect of the original stream-sync bug. How exactly? I'm not entirely sure; it may be related to how the PyTorch CUDA caching allocator works, but I don't know for certain. What I do know is that this snippet consistently fails on main and consistently succeeds with this PR.
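
As a general illustration of why a missing cross-stream sync can surface as stale data (a hedged sketch, not the actual torchcodec code path): a buffer written on one stream and read on another without synchronization may be read before the write has completed.

import torch

side = torch.cuda.Stream()
out = torch.zeros(1024, 1024, device="cuda")

with torch.cuda.stream(side):
    # A write to `out` enqueued on the side stream.
    out.add_(torch.randn(1024, 1024, device="cuda"))

# Without torch.cuda.current_stream().wait_stream(side) here, this read on the
# current stream is not ordered after the write above and may observe the old
# (all-zero) contents.
snapshot = out.clone()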


While working on this, I discovered an interesting (and sad) piece of trivia: the NVDEC stream is hardcoded by FFmpeg to always be the default stream. This means that doing something like this:

with torch.cuda.stream(torch.cuda.Stream()):
    dec = VideoDecoder(path, device="cuda")
    dec.get_frame_at(...)

will always use the default stream for decoding. That's sad, and not really expected, but there's nothing we can do about it. Only NPP will rely on the newly created stream.
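
To make that concrete, here is a small check (a hypothetical snippet, not part of the PR) showing that inside such a block the current stream differs from the default stream; NPP follows the current stream, while FFmpeg keeps NVDEC on the default one.

import torch

with torch.cuda.stream(torch.cuda.Stream()):
    current = torch.cuda.current_stream()
    default = torch.cuda.default_stream()
    # Inside the block, the current stream is the newly created one, not the
    # default stream that FFmpeg hardcodes for NVDEC.
    print(current == default)  # False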

This is orthogonal to the bug: the bug still exists regardless of this fact.

@meta-cla bot added the CLA Signed label on Sep 3, 2025
@NicolasHug added the bug label on Sep 3, 2025
@scotts (Contributor) left a comment

Wow, great catch!

@NicolasHug NicolasHug merged commit 9fa8554 into pytorch:main Sep 3, 2025
47 of 50 checks passed
@ronghanghu commented on Sep 3, 2025

Thanks for the quick fix @NicolasHug!

Also, wondering if this patch will come in for the nightly TorchCodec pip wheel today, or will there be an updated pip wheel (e.g. 0.6.1) compatible with PyTorch 2.8? We're excited to try this out in our model inference use case.

@NicolasHug (Member, Author) replied:
Yes it's already available as the 2025-09-03 nightly wheel, as I manually triggered a nightly build after we merged this :)
