
Conversation


@NicolasHug NicolasHug commented Jul 27, 2025

This PR improves the color conversion of BT.709 full-range videos. On main, we were always using nppiNV12ToRGB_709CSC_8u_P2C3R_Ctx, which is for limited (studio) range. For full range, NPP provides nppiNV12ToRGB_709HDTV_8u_P2C3R_Ctx. But when I tried it, its output didn't match our CPU results very closely; in fact, the results were further off than with nppiNV12ToRGB_709CSC_8u_P2C3R_Ctx. So instead, we now rely on a custom color-conversion matrix, which produces results very close to the CPU ones on CUDA 12.9.

I will leave the PR description at that. I tried to document a lot of the color-conversion process in a note in the code, and I will extend that note when finalizing 10-bit support, hopefully soon.
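For readers who want the gist before reading the note: a full-range BT.709 YUV-to-RGB matrix can be derived from the standard luma coefficients Kr and Kb. The sketch below shows the textbook derivation; these are not necessarily the exact values used in the code.

```python
# Sketch: deriving a full-range BT.709 YUV -> RGB matrix from the luma
# coefficients Kr and Kb (textbook derivation; the authoritative values
# live in the note in the C++ code).
Kr, Kb = 0.2126, 0.0722  # BT.709 luma coefficients
Kg = 1.0 - Kr - Kb

# Rows map (Y, U, V) -> (R, G, B), with Y in [0, 1] and U, V in [-0.5, 0.5].
matrix = [
    [1.0, 0.0, 2.0 * (1.0 - Kr)],                                     # R
    [1.0, -2.0 * Kb * (1.0 - Kb) / Kg, -2.0 * Kr * (1.0 - Kr) / Kg],  # G
    [1.0, 2.0 * (1.0 - Kb), 0.0],                                     # B
]

def yuv_to_rgb(y, u, v):
    return tuple(r[0] * y + r[1] * u + r[2] * v for r in matrix)
```

A neutral pixel (u = v = 0) maps to r = g = b = y, which is a quick sanity check that the Y column is correct.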

Comparing CPU frame (top) with GPU frame (bottom) on CUDA 12.9

On main

bt709_full_range_cpu_gpu_comparison_main

This PR

bt709_full_range_cpu_gpu_comparison

@NicolasHug NicolasHug marked this pull request as draft July 27, 2025 14:28
@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) Jul 27, 2025
@NicolasHug NicolasHug changed the title Support 10 bits videos on CUDA [WIP] Fix full-range CUDA decoding Jul 27, 2025
@NicolasHug NicolasHug changed the title [WIP] Fix full-range CUDA decoding Fix BT709 full-range CUDA color conversion Aug 15, 2025
@NicolasHug NicolasHug marked this pull request as ready for review August 15, 2025 09:52
gpu_frame = decoder_gpu.get_frame_at(frame_index).data.cpu()
cpu_frame = decoder_cpu.get_frame_at(frame_index).data

torch.testing.assert_close(gpu_frame, cpu_frame, rtol=0, atol=2)
NicolasHug (Contributor, Author):

On main this fails on BT709_FULL_RANGE with

E           AssertionError: Tensor-likes are not close!
E           
E           Mismatched elements: 2012707 / 2764800 (72.8%)
E           Greatest absolute difference: 20 at index (0, 62, 206) (up to 2 allowed)
E           Greatest relative difference: 1.0 at index (0, 0, 640) (up to 0 allowed)

torch.testing.assert_close(gpu_frame, cpu_frame, rtol=0, atol=2)
elif cuda_version_used_for_building_torch() == (12, 8):
    assert psnr(gpu_frame, cpu_frame) > 20

NicolasHug (Contributor, Author):

So, I developed this PR on CUDA 12.9, and I was unconditionally using torch.testing.assert_close(gpu_frame, cpu_frame, rtol=0, atol=2) which was passing. And it's passing on the 12.9 CI job.

When I submitted the PR and the CI tested on CUDA 12.6, I realized the test wasn't passing. I'm unable to tell by how much, and I'm unable to reproduce locally because I don't have 12.6, and I can't ssh into the runner either.

12.8 is producing OK results, with a psnr of ~24, but it's not as good as with 12.9.
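For reference, the PSNR metric used in that threshold can be sketched as below. This is a standalone version working on flat Python sequences of 8-bit values; the actual test helper presumably operates on uint8 tensors.

```python
import math

def psnr(a, b, max_val=255.0):
    # Peak signal-to-noise ratio, in dB, between two equal-length
    # sequences of pixel values; higher means closer.
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse)
```

A uniform absolute error of 2 corresponds to roughly 42 dB, so a PSNR around 24 implies a mean squared error in the hundreds.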

I honestly think we should treat these as bugs in NPP that were eventually fixed in 12.9. I can't imagine us having to use different code paths depending on the current runtime CUDA version: that sounds too complicated, and I'm not even sure it's doable. In particular, compile-time #define checks wouldn't be enough, because we can compile on 12.9 and run on 12.8.

Note that 12.6 is considered legacy support for torch from now on: pytorch/pytorch#159980
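For context, a helper like cuda_version_used_for_building_torch() can be sketched by parsing torch.version.cuda, which is a string such as "12.8". This is an assumed implementation, shown without the torch dependency; note it reports the compile-time version, not the runtime one, which is exactly the limitation discussed above.

```python
def parse_cuda_version(version_string):
    # Turn a version string like "12.8" into a tuple that compares
    # numerically, e.g. (12, 8) < (12, 9) < (12, 10).
    return tuple(int(part) for part in version_string.split("."))

# In the test suite this would be fed torch.version.cuda, i.e. the CUDA
# version torch was *built* with, not the one installed at runtime.
```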


scotts commented Aug 18, 2025

Is there any documentation that explains "709 HDTV full color conversion" versus "RGB 709 CSC color conversion" in more detail? Whenever I search for it, I always land back at the Nvidia docs. Is it an Nvidia-only concept, or a part of the standard but using a different name?

NicolasHug (Contributor, Author) replied:

You probably won't find resources directly addressing this. It mostly has to do with the color range, full vs limited: Nvidia calls those "HDTV" and "CSC" respectively.

But the color-range (full vs limited) is an orthogonal concept to the color space (709), so it's unlikely you'll find resources explaining both concepts simultaneously for 709 specifically.

I am also really struggling to find decent resources that just explain the color range. It seems to be the kind of concept that you just have to know about. Almost all the Wikipedia pages I'm reading assume pre-existing knowledge of it. There is https://www.highresolution.tv/2024/07/02/hre-shorts-color-range/ which sounds decent, but TBH I've used LLMs to build my understanding.

When working on 10bit support, I will be extending the note to discuss limited range.
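To make the full-vs-limited distinction concrete, here is a sketch of the standard 8-bit quantization formulas: full range uses all 256 codes, while limited (studio) range reserves headroom and footroom. Function names are illustrative, not from the codebase.

```python
# Full range: normalized Y in [0, 1] uses all of [0, 255].
def quantize_full_y(y):
    return round(255 * y)

# Limited (studio) range: Y maps to [16, 235], chroma to [16, 240].
def quantize_limited_y(y):
    return round(16 + 219 * y)

def quantize_limited_c(c):  # c is centered chroma in [-0.5, 0.5]
    return round(128 + 224 * c)
```

This is why decoding limited-range content as if it were full range (or vice versa) produces washed-out or over-contrasted frames rather than a hard failure.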


scotts commented Aug 18, 2025

Yeah, I'm going through your comment (which is amazing!) and I'm coming to the understanding that these are just the names Nvidia uses for full-range and limited-range color.

// defined ours above. HOWEVER, the function itself expects Y to be in [0, 255]
// and U,V to be in [-128, 127]. So U and V need to be offset by -128 to center
// them around 0. The offsets can be applied by adding a 4th column to the
// matrix:
scotts (Contributor):

I get lost in this paragraph.

they expect the matrix coefficient to assume the input Y is in [0, 1], and [U, V] in [-0.5, 0.5]. We're in luck, that's how we defined ours above.

Who is "they"? The NPP functions? And at this point, the understanding I personally have from the text is that our matrix is aligned with what "they" want. But:

HOWEVER, the function itself expects Y to be in [0, 255] and U,V to be in [-128, 127]. So U and V need to be offset by -128 to center them around 0. The offsets can be applied by adding a 4th column to the matrix:

Now I'm confused: I feel like the first sentence I quoted said that Y is in [0, 1], but the second one says it's in [0, 255]. Or are "they" and "the function itself" different entities? By "the function itself," do you mean nppiNV12ToRGB_8u_ColorTwist32f_P2C3R_Ctx() from NPP? Same confusion applies to U and V.

What I think may be the case is that by "they" you mean "the field of people who understand this stuff and defined it" and by "this function" you mean the specific function from NPP that implements the functionality we need. Is that the case?

NicolasHug (Contributor, Author) replied:
I'll try to clarify. This is all very confusing to me too, because the API really is quite messy.

# ffmpeg -f lavfi -i testsrc2=duration=1:size=1920x720:rate=30 \
# -c:v libx264 -pix_fmt yuv420p -color_primaries bt709 -color_trc bt709 \
# -colorspace bt709 -color_range pc bt709_full_range.mp4
#
scotts (Contributor):
Up until this PR, we've maintained the rule that all reference resources can be generated from the generate_reference_resources.sh script. Is that something we want to continue to maintain? I think there is a lot of value in it, but that script is also not the cleanest artifact.

NicolasHug (Contributor, Author) commented Aug 19, 2025:
From what I can tell, this script generates the bmp / pt reference frames, but not the source videos themselves. I see similar comments there indicating how the videos were generated:

https://github.com/pytorch/torchcodec/blob/ffcb7ab2e98c204dfb103f46b2db154cbf1aa713/test/generate_reference_resources.sh#L65-L66

Here we're not generating or using reference frames; we're just comparing the CPU output with the GPU output.

I agree we should also try to check against a ground-truth reference, but I'll leave that as a follow-up if that's OK.

scotts (Contributor) replied:
We're actually doing a bit of both, which is messy. I'm going to create an issue about it.


scotts commented Aug 19, 2025

@NicolasHug, thank you for this fix! This is an enormous improvement in the quality of our CUDA support, and it is extremely non-obvious!

@NicolasHug NicolasHug merged commit 4af0bfe into meta-pytorch:main Aug 19, 2025
47 checks passed
@NicolasHug NicolasHug deleted the studiorange branch August 19, 2025 12:52