@NicolasHug NicolasHug commented Oct 2, 2025

This PR:

  • (Is a lot simpler than it seems: 80% of it is just comments and tests)
  • Adds support for approximate mode
  • Adds support for time-based APIs.
  • Drastically simplifies the logic of the BETA CUDA interface. We now rely on the NVCUVID callback, which tells us when a frame is ready in display order:
    • We don't have to solve the frame reordering problem anymore: the callback is triggered in display order.
    • It correctly assigns the frame's PTS without us having to guess.

If we weren't relying on the NVCUVID callback, we would have to solve both problems above ourselves, with codec-specific solutions. As a result, this PR also drastically simplifies future support for additional codecs. Spoiler: I already added #919 and #920 for HEVC and AV1.
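To make the callback-driven flow concrete, here is a toy sketch. It is not the code in this PR and does not use the real NVCUVID headers: `DispInfo`, `ToyDecoder`, `onDisplayPicture`, `receiveFrame` and `runToyExample` are hypothetical stand-ins. It assumes only what is described above, namely that the display callback fires once per frame, already in display order and with the PTS filled in.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical stand-in for the parser's display info: only what the sketch needs.
struct DispInfo {
  int pictureIndex;   // index of the decoded picture (decode order)
  int64_t timestamp;  // PTS, assumed to be assigned by the parser
};

// Hypothetical decoder-side state: frames that are ready, in display order.
struct ToyDecoder {
  std::queue<DispInfo> readyFrames;
};

// C-style callback with a user-data pointer. Because it is assumed to be
// invoked in display order, enqueueing is all the "reordering" we need.
int onDisplayPicture(void* userData, DispInfo* dispInfo) {
  auto* decoder = static_cast<ToyDecoder*>(userData);
  decoder->readyFrames.push(*dispInfo);
  return 1;  // non-zero signals success in this sketch
}

// Pops the next frame in display order. Returns false if none is ready.
bool receiveFrame(ToyDecoder& decoder, DispInfo& out) {
  if (decoder.readyFrames.empty()) {
    return false;
  }
  out = decoder.readyFrames.front();
  decoder.readyFrames.pop();
  return true;
}

// Plays the role of the parser: calls the callback once per frame, in display
// order, then drains the decoder and returns the PTS values it produced.
std::vector<int64_t> runToyExample() {
  ToyDecoder decoder;
  DispInfo displayOrder[] = {{2, 0}, {0, 1000}, {1, 2000}};
  for (auto& info : displayOrder) {
    onDisplayPicture(&decoder, &info);
  }
  std::vector<int64_t> ptsOut;
  DispInfo frame{};
  while (receiveFrame(decoder, frame)) {
    ptsOut.push_back(frame.timestamp);
  }
  return ptsOut;
}
```

In this sketch the picture indices arrive as 2, 0, 1 (decode order differs from display order), yet the consumer sees monotonically increasing PTS values without any codec-specific logic on the decoder side.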

In #910, I described this design alternative and at the time, I thought it wasn't compatible enough with our sendPacket() / receiveFrame() architecture. With #910 now merged as a minimal clean-ish skeleton of the interface, I can reason about this more clearly. And after spending a few days trying (and failing) to solve the frame-reordering problem for H264 only, I came to the conclusion that this solution, in this PR, is well worth it.

This new simplified design does come with a minor trade-off. I explain it in a note, in the code.


Why are approximate mode and time-based APIs now supported? Let's first answer: why were approximate mode and time-based APIs not supported before? It was because receiveFrame(avFrame, desiredPts) was only able to return a frame if we could find one with the exact desiredPts. In approximate mode, we can't guarantee that desiredPts corresponds to a frame's pts, so there generally can't be a match. Same with time-based APIs: desiredPts may not correspond to where a frame starts.

In this PR, we don't need that exact desiredPts matching logic anymore. But we can still guarantee that receiveFrame returns frames in display order, so we get approximate mode and time-based API support for free.
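A minimal sketch of the difference, with frames reduced to their PTS values. `receiveExact` and `receiveNext` are hypothetical helpers written for illustration, not the actual receiveFrame implementation before or after this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Exact-match behaviour (sketch): succeed only if a buffered frame has
// exactly desiredPts. An approximate or time-based desiredPts usually misses.
bool receiveExact(std::deque<int64_t>& bufferedPts, int64_t desiredPts, int64_t& outPts) {
  for (auto it = bufferedPts.begin(); it != bufferedPts.end(); ++it) {
    if (*it == desiredPts) {
      outPts = *it;
      bufferedPts.erase(it);
      return true;
    }
  }
  return false;
}

// Display-order behaviour (sketch): hand out the next frame in display order.
// No desiredPts is needed, so an inexact request cannot cause a miss here.
bool receiveNext(std::deque<int64_t>& displayOrderPts, int64_t& outPts) {
  if (displayOrderPts.empty()) {
    return false;
  }
  outPts = displayOrderPts.front();
  displayOrderPts.pop_front();
  return true;
}
```

With frames at PTS 0, 1000 and 2000, a request for 1001 finds nothing under `receiveExact`, while `receiveNext` simply yields 0, then 1000, then 2000, leaving it to the caller to decide which frame covers the requested time.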

@meta-cla meta-cla bot added the CLA Signed label Oct 2, 2025
@NicolasHug NicolasHug changed the title from "[WIP] BETA CUDA interface: support for approximate mode, time-based APIs" to "BETA CUDA interface: support for approximate mode, time-based APIs" Oct 2, 2025
@NicolasHug NicolasHug changed the title from "BETA CUDA interface: support for approximate mode, time-based APIs" to "BETA CUDA interface: support for approximate mode and time-based APIs" Oct 2, 2025
@NicolasHug NicolasHug marked this pull request as ready for review October 2, 2025 17:44

static int CUDAAPI
pfnDisplayPictureCallback(void* pUserData, CUVIDPARSERDISPINFO* dispInfo) {
BetaCudaDeviceInterface* decoder =

Nit: I prefer auto when the expression on the right has the literal type we're getting on the left.
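For illustration only, since the right-hand side of the line under review is not shown here: assuming it is a cast from the user-data pointer, the suggestion would look roughly like this. `Decoder` and `displayCallback` are hypothetical stand-ins, not names from the PR.

```cpp
#include <cassert>

// Hypothetical stand-in for the decoder class.
struct Decoder {
  int framesSeen = 0;
};

int displayCallback(void* pUserData) {
  // Spelling the type twice:
  //   Decoder* decoder = static_cast<Decoder*>(pUserData);
  // With auto, the cast already names the type, so it appears only once:
  auto* decoder = static_cast<Decoder*>(pUserData);
  decoder->framesSeen += 1;
  return 1;
}
```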

  parserParams.pfnSequenceCallback = pfnSequenceCallback;
  parserParams.pfnDecodePicture = pfnDecodePictureCallback;
- parserParams.pfnDisplayPicture = nullptr;
+ parserParams.pfnDisplayPicture = pfnDisplayPictureCallback;

This is the key difference, correct? That is, by registering this callback, we get the new behavior and can delete all of the relevant code?

Contributor Author


yes that's correct

@scotts scotts left a comment


Amazing improvement! :)

@NicolasHug NicolasHug merged commit 6d72f11 into meta-pytorch:main Oct 3, 2025
49 of 50 checks passed
@NicolasHug NicolasHug deleted the nvdec-rework-frame-ordering branch October 3, 2025 19:53