-
Notifications
You must be signed in to change notification settings - Fork 62
Fix BT709 full-range CUDA color conversion #791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
d538cd6
ed095ec
5125468
2b26045
dfa1ceb
e4a4772
ffbefea
3a899bc
55071df
1af1c16
6e1d9da
e2e6346
5ca876d
566d1f6
4a0a945
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -37,6 +37,31 @@ def get_ffmpeg_major_version(): | |
return int(ffmpeg_version.split(".")[0]) | ||
|
||
|
||
def cuda_version_used_for_building_torch() -> Optional[tuple[int, int]]: | ||
# Return the CUDA version that was used to build PyTorch. That's not always | ||
# the same as the CUDA version that is currently installed on the running | ||
# machine, which is what we actually want. On the CI though, these are the | ||
# same. | ||
if torch.version.cuda is None: | ||
return None | ||
else: | ||
return tuple(int(x) for x in torch.version.cuda.split(".")) | ||
|
||
|
||
def psnr(a, b, max_val=255) -> float: | ||
# Return Peak Signal-to-Noise Ratio (PSNR) between two tensors a and b. The | ||
# higher, the better. | ||
# According to https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio, | ||
# typical values for the PSNR in lossy image and video compression are | ||
# between 30 and 50 dB. | ||
# Acceptable values for wireless transmission quality loss are considered to | ||
# be about 20 dB to 25 dB | ||
mse = torch.mean((a.float() - b.float()) ** 2) | ||
if mse == 0: | ||
return float("inf") | ||
return 20 * torch.log10(max_val / torch.sqrt(mse)).item() | ||
|
||
|
||
# For use with decoded data frames. On CPU Linux, we expect exact, bit-for-bit | ||
# equality. On CUDA Linux, we expect a small tolerance. | ||
# On other platforms (e.g. MacOS), we also allow a small tolerance. FFmpeg does | ||
|
@@ -637,3 +662,24 @@ def sample_format(self) -> str: | |
}, | ||
}, | ||
) | ||
|
||
|
||
# This is a BT.709 full range video, generated with: | ||
# ffmpeg -f lavfi -i testsrc2=duration=1:size=1920x720:rate=30 \ | ||
# -c:v libx264 -pix_fmt yuv420p -color_primaries bt709 -color_trc bt709 \ | ||
# -colorspace bt709 -color_range pc bt709_full_range.mp4 | ||
# | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Up until this PR, we've maintained the rule that all generated references can be generated from the generate_reference_resources.sh script. Is that something we want to continue to maintain? I think there is a lot of value in it, but that script is also not the cleanest artifact. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. From what I can tell this script generates the bmp / pt reference frames, but not the source video themselves. I see similar comments there indicating how the videos were generated: Here we're not generating or using the frames, we're just comparing the CPU output with the GPU output. I agree we should also try to check against a ground truth reference, but I'll leave that out as a follow-up if that's OK There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We're actually doing a bit of both, which is messy. I'm going to create an issue about it. |
||
# We can confirm the color space and color range with: | ||
# ffprobe -v quiet -select_streams v:0 -show_entries stream=color_space,color_transfer,color_primaries,color_range -of default=noprint_wrappers=1 test/resources/bt709_full_range.mp4 | ||
# color_range=pc | ||
# color_space=bt709 | ||
# color_transfer=bt709 | ||
# color_primaries=bt709 | ||
BT709_FULL_RANGE = TestVideo( | ||
filename="bt709_full_range.mp4", | ||
default_stream_index=0, | ||
stream_infos={ | ||
0: TestVideoStreamInfo(width=1280, height=720, num_color_channels=3), | ||
}, | ||
frames={0: {}}, # Not needed for now | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I developed this PR on CUDA 12.9, and I was unconditionally using
torch.testing.assert_close(gpu_frame, cpu_frame, rtol=0, atol=2)
which was passing. And it's passing on the 12.9 CI job.When I submitted the PR and the CI tested on CUDA 12.6, I realized the test wasn't passing. I'm unable to tell by how much, and I'm unable to reproduce locally because I don't have 12.6, and I can't ssh into the runner either.
12.8 is producing OK results, with a psnr of ~24, but it's not as good as with 12.9.
I honestly think we should treat this as bugs in NPP that were eventually fixed in 12.9. I can't imagine us having to use different code-paths depending on the current runtime CUDA version. That sounds too complicated, and I'm not even sure that is doable. I.e. this isn't about compile-time
#define
checks, that wouldn't be enough, because we can compile on 12.9 and run on 12.8.Note that 12.6 is considered to be legacy support from now on with torch: pytorch/pytorch#159980