feat: add patches for VT #340

Merged · 5 commits · Feb 28, 2024

Conversation

@gnattu commented Feb 24, 2024

Changes

This backports the necessary code from upstream to make VT work properly. It introduces scale_vt, overlay_videotoolbox, and interop between VT and OpenCL. It also fixes full-range color and HDR passthrough support on 6.0.1.

Issues

Step one for #339

@nyanmisaka left a comment

Overall LGTM, except for two patches that can be improved.

@nyanmisaka

I ran the FATE test with all patches applied and it complains. Please add these two patches.

diff --git a/debian/patches/0004-add-cuda-pixfmt-converter-impl.patch b/debian/patches/0004-add-cuda-pixfmt-converter-impl.patch
index 9ba8147..0a12be4 100644
--- a/debian/patches/0004-add-cuda-pixfmt-converter-impl.patch
+++ b/debian/patches/0004-add-cuda-pixfmt-converter-impl.patch
@@ -1,3 +1,15 @@
+Index: jellyfin-ffmpeg/tests/ref/fate/source
+===================================================================
+--- jellyfin-ffmpeg.orig/tests/ref/fate/source
++++ jellyfin-ffmpeg/tests/ref/fate/source
+@@ -22,6 +22,7 @@ compat/djgpp/math.h
+ compat/float/float.h
+ compat/float/limits.h
+ libavcodec/bitstream_template.h
++libavfilter/dither_matrix.h
+ tools/decode_simple.h
+ Use of av_clip() where av_clip_uintp2() could be used:
+ Use of av_clip() where av_clip_intp2() could be used:
 Index: jellyfin-ffmpeg/libavfilter/dither_matrix.h
 ===================================================================
 --- /dev/null
-- 

And add this to your debian/patches/0064-backport-enhanced-vt.patch

diff --git a/tests/fate/hevc.mak b/tests/fate/hevc.mak
index 20c2e5b..c5df8ea 100644
--- a/tests/fate/hevc.mak
+++ b/tests/fate/hevc.mak
@@ -222,7 +222,7 @@ FATE_HEVC-$(call ALLYES, HEVC_DEMUXER MOV_DEMUXER HEVC_PARSER HEVC_MP4TOANNEXB_B
 fate-hevc-bsf-mp4toannexb: tests/data/hevc-mp4.mov
 fate-hevc-bsf-mp4toannexb: CMD = md5 -i $(TARGET_PATH)/tests/data/hevc-mp4.mov -c:v copy -fflags +bitexact -f hevc
 fate-hevc-bsf-mp4toannexb: CMP = oneline
-fate-hevc-bsf-mp4toannexb: REF = 1873662a3af1848c37e4eb25722c8df9
+fate-hevc-bsf-mp4toannexb: REF = 7d05a79c7a6665ae22c0043a4d83a811
 
 fate-hevc-skiploopfilter: CMD = framemd5 -skip_loop_filter nokey -i $(TARGET_SAMPLES)/hevc-conformance/SAO_D_Samsung_5.bit -sws_flags bitexact
 FATE_HEVC-$(call FRAMEMD5, HEVC, HEVC, HEVC_PARSER) += fate-hevc-skiploopfilter
-- 

Just in case someone downstream wants to run FATE in their builds; it doesn't mean much to us.

@gnattu commented Feb 26, 2024

FATE is fixed, but we have a situation.

The HDR passthrough works, but the color is not correct, and this is an upstream issue: I tested upstream 6.1 and it occurs there as well. Red light looks too yellow. It may appear OK for scenes like flames, but it can be very wrong in scenes like a sunset. Tone-mapped SDR output is not affected by this. HandBrake does not have this issue, so I may post additional patches during the 10.9 cycle to port the correct implementation from there and see if we can get it fixed and upstreamed.

@gnattu commented Feb 26, 2024

I think I found the issue. The required metadata, like kVTCompressionPropertyKey_MasteringDisplayColorVolume, kVTCompressionPropertyKey_ContentLightLevelInfo, and kVTCompressionPropertyKey_AmbientViewingEnvironment, does not seem to be set by the encoder.

@gnattu commented Feb 26, 2024

A big difference with VideoToolbox is that such static metadata should be passed in during the session configuration stage, but ffmpeg seems to handle it as per-frame data. What can I do about this? Hack the per-frame handler to modify the session config when the first frame comes in?

@nyanmisaka

A big difference for VideoToolbox is that such static Metadata should be passed in during the session configuration stage, but it seems like ffmpeg handles it as a per-frame data? What can I do to handle this? Hack the per-frame handler to modify the session config when the first frame comes in?

Yes, ffmpeg handles HDR metadata per frame. Maybe you can defer finalizing the VT encoding session until the first frame arrives.

@gnattu commented Feb 26, 2024

It seems the problem goes beyond the encoder. If I set -hwaccel_output_format videotoolbox_vld, the tone-mapped output of the scale_vt filter (the SDR output) also has wrong colors, yet the result of the OpenCL tone-mapping filter is fine. What could cause this? What does hwaccel_output_format do internally?

@nyanmisaka

Some HW decoders on other platforms fail to set the correct colorspace attributes on the output frames, so I usually override them based on the results of ffprobe.

setparams=color_primaries=bt2020:color_trc=smpte2084:colorspace=bt2020nc, ... ,color_primaries=bt709:color_trc=bt709:colorspace=bt709

Using -hwaccel_output_format videotoolbox_vld keeps the decoded frames as HW frames. If you do not set it, ffmpeg copies/transfers the HW frames to SW frames, which greatly reduces transcoding efficiency.

See also, figure 3 & 4 in https://developer.nvidia.com/blog/nvidia-ffmpeg-transcoding-guide/

@gnattu commented Feb 26, 2024

Strange. setparams fixes the HDR passthrough, but it does not fix the tone mapping by scale_vt, and we still get an incorrect SDR output.

@nyanmisaka

FFmpeg/FFmpeg@c2c96c4

./ffmpeg -hwaccel videotoolbox \
	-hwaccel_output_format videotoolbox_vld \
	-i ios-265.mov \
	-c:v hevc_videotoolbox \
	-profile:v main \
	-b:v 3M \
	-vf scale_vt=w=iw/2:h=ih/2:color_matrix=bt709:color_primaries=bt709:color_transfer=bt709 \
	-c:a copy \
	-tag:v hvc1 \
	/tmp/test.mp4

Input: hevc (Main 10) (hvc1 / 0x31637668), yuv420p10le(tv, bt2020nc/bt2020/arib-std-b67), 3840x2160
Output: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv, bt709, progressive), 1920x1080

The upstream maintainer tested HLG->SDR using the command containing -hwaccel_output_format videotoolbox_vld. You should probably use hwdownload and libx264 to verify at which stage this problem occurs.

@gnattu commented Feb 26, 2024

His conclusion was wrong, deadly wrong. He was probably using an example where the wrong color is hard to spot, or he was not careful enough.

I will show you an example where everyone can tell the color is wrong.

This is what it should be, and what ffmpeg gives you without -hwaccel_output_format videotoolbox_vld:

[image: correct sunset scene]

This is what specifying -hwaccel_output_format videotoolbox_vld gives you:

[image: incorrect sunset scene]

This sunset scene is wrong enough, as nobody would think a sunset is yellow rather than red.

Both were generated using upstream ffmpeg 6.1, so it is not a problem with our backport.

@nyanmisaka

Can you use ffprobe to print the stream info of the input and output files?

ffprobe -v quiet -hide_banner -i FILE -show_streams -select_streams v -print_format json

@gnattu commented Feb 26, 2024

The source:

{
    "streams": [
        {
            "index": 0,
            "codec_name": "hevc",
            "codec_long_name": "H.265 / HEVC (High Efficiency Video Coding)",
            "profile": "Main 10",
            "codec_type": "video",
            "codec_tag_string": "[0][0][0][0]",
            "codec_tag": "0x0000",
            "width": 3840,
            "height": 2160,
            "coded_width": 3840,
            "coded_height": 2160,
            "closed_captions": 0,
            "film_grain": 0,
            "has_b_frames": 2,
            "sample_aspect_ratio": "1:1",
            "display_aspect_ratio": "16:9",
            "pix_fmt": "yuv420p10le",
            "level": 150,
            "color_range": "tv",
            "color_space": "bt2020nc",
            "color_transfer": "arib-std-b67",
            "color_primaries": "bt2020",
            "chroma_location": "left",
            "field_order": "progressive",
            "refs": 1,
            "r_frame_rate": "30/1",
            "avg_frame_rate": "30/1",
            "time_base": "1/1000",
            "extradata_size": 1087,
            "disposition": {
                "default": 0,
                "dub": 0,
                "original": 0,
                "comment": 0,
                "lyrics": 0,
                "karaoke": 0,
                "forced": 0,
                "hearing_impaired": 0,
                "visual_impaired": 0,
                "clean_effects": 0,
                "attached_pic": 0,
                "timed_thumbnails": 0,
                "non_diegetic": 0,
                "captions": 0,
                "descriptions": 0,
                "metadata": 0,
                "dependent": 0,
                "still_image": 0
            },
            "tags": {
                "ENCODER": "Lavc57.107.100 libx265"
            }
        }
    ]
}

The correct output:

{
    "streams": [
        {
            "index": 0,
            "codec_name": "hevc",
            "codec_long_name": "H.265 / HEVC (High Efficiency Video Coding)",
            "profile": "Main",
            "codec_type": "video",
            "codec_tag_string": "hvc1",
            "codec_tag": "0x31637668",
            "width": 1920,
            "height": 1080,
            "coded_width": 1920,
            "coded_height": 1088,
            "closed_captions": 0,
            "film_grain": 0,
            "has_b_frames": 0,
            "sample_aspect_ratio": "1:1",
            "display_aspect_ratio": "16:9",
            "pix_fmt": "yuv420p",
            "level": 123,
            "color_range": "tv",
            "color_space": "bt709",
            "color_transfer": "bt709",
            "color_primaries": "bt709",
            "chroma_location": "left",
            "field_order": "progressive",
            "refs": 1,
            "id": "0x1",
            "r_frame_rate": "30/1",
            "avg_frame_rate": "30/1",
            "time_base": "1/15360",
            "start_pts": 0,
            "start_time": "0.000000",
            "duration_ts": 1518592,
            "duration": "98.866667",
            "bit_rate": "8912514",
            "nb_frames": "2966",
            "extradata_size": 111,
            "disposition": {
                "default": 1,
                "dub": 0,
                "original": 0,
                "comment": 0,
                "lyrics": 0,
                "karaoke": 0,
                "forced": 0,
                "hearing_impaired": 0,
                "visual_impaired": 0,
                "clean_effects": 0,
                "attached_pic": 0,
                "timed_thumbnails": 0,
                "non_diegetic": 0,
                "captions": 0,
                "descriptions": 0,
                "metadata": 0,
                "dependent": 0,
                "still_image": 0
            },
            "tags": {
                "language": "und",
                "handler_name": "VideoHandler",
                "vendor_id": "[0][0][0][0]",
                "encoder": "Lavc60.31.102 hevc_videotoolbox"
            }
        }
    ]
}

The wrong output:

{
    "streams": [
        {
            "index": 0,
            "codec_name": "hevc",
            "codec_long_name": "H.265 / HEVC (High Efficiency Video Coding)",
            "profile": "Main",
            "codec_type": "video",
            "codec_tag_string": "hvc1",
            "codec_tag": "0x31637668",
            "width": 1920,
            "height": 1080,
            "coded_width": 1920,
            "coded_height": 1088,
            "closed_captions": 0,
            "film_grain": 0,
            "has_b_frames": 0,
            "sample_aspect_ratio": "1:1",
            "display_aspect_ratio": "16:9",
            "pix_fmt": "yuv420p",
            "level": 123,
            "color_range": "tv",
            "color_space": "bt709",
            "color_transfer": "bt709",
            "color_primaries": "bt709",
            "chroma_location": "left",
            "field_order": "progressive",
            "refs": 1,
            "id": "0x1",
            "r_frame_rate": "30/1",
            "avg_frame_rate": "30/1",
            "time_base": "1/15360",
            "start_pts": 0,
            "start_time": "0.000000",
            "duration_ts": 1518592,
            "duration": "98.866667",
            "bit_rate": "9473045",
            "nb_frames": "2966",
            "extradata_size": 111,
            "disposition": {
                "default": 1,
                "dub": 0,
                "original": 0,
                "comment": 0,
                "lyrics": 0,
                "karaoke": 0,
                "forced": 0,
                "hearing_impaired": 0,
                "visual_impaired": 0,
                "clean_effects": 0,
                "attached_pic": 0,
                "timed_thumbnails": 0,
                "non_diegetic": 0,
                "captions": 0,
                "descriptions": 0,
                "metadata": 0,
                "dependent": 0,
                "still_image": 0
            },
            "tags": {
                "language": "und",
                "handler_name": "VideoHandler",
                "vendor_id": "[0][0][0][0]",
                "encoder": "Lavc60.31.102 hevc_videotoolbox"
            }
        }
    ]
}

Both outputs use the exact command the upstream maintainer used, apart from the input file; the correct one additionally omits -hwaccel_output_format videotoolbox_vld.

@gnattu commented Feb 26, 2024

The two output JSONs differ only in bitrate:

[image]

@gnattu commented Feb 26, 2024

And the most interesting part: it only affects tone mapping. The HDR passthrough works fine using setparams, even when scaling with scale_vt.

I don't think this is a problem inside the scale_vt filter itself. It could be the internal handling of videotoolbox_vld dropping necessary metadata, so the pixel buffers passed to scale_vt were never correct.

@nyanmisaka

If hwaccel_output_format is not used, then ffmpeg creates an intermediate buffer and sets its color, passing it to the first filter scale_vt.

w/o hwaccel_output_format
vt decoder (set color) -> download -> upload (set color) -> scale_vt (set color) -> vt encoder

w/ hwaccel_output_format
vt decoder (set color) -> scale_vt (set color) -> vt encoder

// set color func
https://github.com/FFmpeg/FFmpeg/blob/master/libavutil/hwcontext_videotoolbox.c#L583-L603

// upload
https://github.com/FFmpeg/FFmpeg/blob/84e669167de9394e0de1bee0244f48ffac6f3826/libavutil/hwcontext_videotoolbox.c#L717

// decoder postproc callback
https://github.com/FFmpeg/FFmpeg/blob/84e669167de9394e0de1bee0244f48ffac6f3826/libavcodec/videotoolbox.c#L146

@nyanmisaka

I think we can use CVBufferGetAttachment to trace the CVPixelBufferRef of the input of scale_vt.

@gnattu commented Feb 26, 2024

I've checked, and it is not the attachments. Very, very weird. I even overwrote the attachments on both the source and destination CVPixelBuffer in the scale_vt filter, but still got the same result, and I'm totally lost now.

@gnattu commented Feb 26, 2024

Submitted upstream to see if they have a clue:
https://trac.ffmpeg.org/ticket/10884

In the meantime, I'm going to add a workaround in Jellyfin Server to disable hwaccel_output_format when scale_vt is used for tone mapping, at least on Apple Silicon.

Interestingly, using hwupload actually gives me higher fps than hwaccel_output_format on Apple Silicon:

./ffmpeg -hwaccel videotoolbox \
        -i /Users/gnattu/Movies/ffmpeg-sample/4K_HLG.mkv \
        -c:v hevc_videotoolbox \
        -profile:v main \
        -b:v 3M \
        -vf format=p010,hwupload,scale_vt=w=iw/2:h=ih/2:color_matrix=bt709:color_primaries=bt709:color_transfer=bt709 \
        -c:a copy \
        -tag:v hvc1 \
        test3.mp4

This easily gives me over 250 fps:

frame= 2966 fps=265 q=-0.0 Lsize=  109227kB time=00:01:38.92 bitrate=9045.4kbits/s dup=1 drop=0 speed=8.85x

./ffmpeg -hwaccel videotoolbox \
        -hwaccel_output_format videotoolbox_vld \
        -i /Users/gnattu/Movies/ffmpeg-sample/4K_HLG.mkv \
        -c:v hevc_videotoolbox \
        -profile:v main \
        -b:v 3M \
        -vf scale_vt=w=iw/2:h=ih/2:color_matrix=bt709:color_primaries=bt709:color_transfer=bt709 \
        -c:a copy \
        -tag:v hvc1 \
        test3.mp4

The "recommended, supposedly fast" -hwaccel_output_format videotoolbox_vld struggles to even reach 200 fps:

frame= 2966 fps=193 q=-0.0 Lsize=  115998kB time=00:01:38.92 bitrate=9606.1kbits/s dup=1 drop=0 speed=6.44x

Now I'm even thinking about disabling hwaccel_output_format on Apple Silicon for most use cases, except for OpenCL interop. Either the Apple Silicon architecture makes copies faster than ffmpeg's hardware-frame handling, or ffmpeg's hardware-frame implementation is just too broken.

@nyanmisaka

What if you use ffmpeg's built-in software decoder? How does the color look then?

@gnattu commented Feb 26, 2024

The color is correct using the software decoder:

./ffmpeg -init_hw_device videotoolbox \
        -i /Users/gnattu/Movies/ffmpeg-sample/4K_HLG.mkv \
        -c:v hevc_videotoolbox \
        -profile:v main \
        -b:v 3M \
        -vf format=p010,hwupload,scale_vt=w=iw/2:h=ih/2:color_matrix=bt709:color_primaries=bt709:color_transfer=bt709 \
        -c:a copy \
        -tag:v hvc1 \
        test3.mp4

Same output as with hwupload, but much slower.

@nyanmisaka

This presumably proves that the videotoolbox hwaccel implementation in ffmpeg has quirks. We usually use software codecs as a basic reference. HLG only contains static metadata, so the results should be nearly identical every time you transcode.

@gnattu commented Feb 27, 2024

Our sub filter triggered bugs in my overlay_videotoolbox filter and in hwupload:

  1. I naively believed that all non-planar input could be hwuploaded to full-range BGRA with 16-bit color. I was wrong: the sub filter generates exclusively 8-bit BGRA frames, so an 8-bit to 16-bit conversion has to take place; otherwise Metal complains.
  2. Our filter_complex chain confuses the hwupload filter into picking a wrong pixel format, making VTPixelTransferSessionTransferImage complain that the pixel format is not supported; we have to manually specify a format like format=nv12 before the hwupload of the main filter.

So both the filter and Jellyfin Server need changes.

@gnattu commented Feb 27, 2024

HLG only contains static metadata, so the results should be nearly identical every time you transcode.

It should, but it does not. I'm not looking into it anymore, given that it does not improve performance at all. I need to find someone with an Intel Mac to test whether this is still the case, though. If using hwupload instead of hwaccel_output_format is slower on Intel Macs, then it's probably better to disable VT tone mapping on Intel Macs and use OpenCL tone mapping exclusively until upstream fixes it.

@nyanmisaka

Can you share the ffmpeg CLI and the corresponding errors?

@gnattu commented Feb 27, 2024

Can you share the ffmpeg CLI and the corresponding errors?

You mean the subtitle ones? It is not a subtitle-filter issue IMO; all the error codes come from the Apple stack.

VT raises kVTPixelTransferNotSupportedErr when the main input format is not specified; I guess the default software input becomes very weird.

Metal raises an assertion failure complaining that the rows are smaller than expected. This usually happens when the color depth does not match the texture's settings, which is what we have here: trying to load an 8-bit image into a 16-bit texture without any conversion.

The (broken) CLI looks like this:

/Users/gnattu/src/jellyfin-ffmpeg-macos-apple/ffmpeg -analyzeduration 200M -probesize 1G -init_hw_device videotoolbox=vt -noautorotate -t 300 -i file:"/Users/gnattu/Movies/ffmpeg-sample/120FPS.mkv" -map_metadata -1 -map_chapters -1 -threads 0 -map 0:0 -map 0:1 -map -0:0 -codec:v:0 hevc_videotoolbox -tag:v:0 hvc1 -b:v 292000 -maxrate 292000 -bufsize 584000 -force_key_frames:0 "expr:gte(t,n_forced*3)" -g:v:0 360 -keyint_min:v:0 360 -filter_complex "alphasrc=s=426x238:r=60:start='0',format=bgra,subtitles=f='/Users/gnattu/Movies/ffmpeg-sample/120FPS.mkv':si=0:alpha=1:sub2video=1:fontsdir='/Users/gnattu/Library/Application Support/jellyfin/cache/attachments/2af2cfc6ff2a3b75ad3699da3de7061d',hwupload=derive_device=videotoolbox[sub];[0:v]hwupload,scale_vt=w=426:h=238[main];[main][sub]overlay_videotoolbox=eof_action=pass:repeatlast=0" -start_at_zero -codec:a:0 ac3 -ab 128000 -ar 48000 wtf.mp4

Adding format=nv12 between [0:v] and hwupload fixes the VT error, and changing format=bgra to format=yuva420p also works around the Metal texture problem. The Metal texture one is a bug and should be fixed in the filter.

@nyanmisaka

Both of these work for me.

// qsv
ffmpeg -init_hw_device qsv=qs -filter_hw_device qs -i INPUT -an -sn -vf hwupload=extra_hw_frames=16 -f null -

...
Stream #0:0(und): Video: wrapped_avframe, qsv(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 29.97 fps, 29.97 tbn (default)

// d3d11va
ffmpeg -init_hw_device d3d11va=dx11 -filter_hw_device dx11 -i INPUT -an -sn -vf hwupload -f null -

...
Stream #0:0: Video: wrapped_avframe, d3d11(tv, bt709/unknown/unknown, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 23.98 fps, 23.98 tbn (default)

Will this command fail on videotoolbox? My video source is HEVC Main10 1080p.

// videotoolbox
ffmpeg -init_hw_device videotoolbox=vt -filter_hw_device vt -i INPUT -an -sn -vf hwupload -f null -

@nyanmisaka commented Feb 27, 2024

The pixel format output by the software decoder for HEVC Main is yuv420p rather than nv12, but hwcontext_videotoolbox selects the closest matching format, yuv420p, instead of converting to nv12 in software and then uploading. So I guess scale_vt does not support yuv420p as input and output.

https://github.com/FFmpeg/FFmpeg/blob/9d7d03ea4598484e2d200976d0064df41186a630/libavutil/hwcontext_videotoolbox.c#L45

All fixed-function hardware (non-GPGPU impl) on x86_64 that I know of only support semi-planar and packed formats such as nv12/p010 and bgra for post-processing.

So perhaps commenting out yuv420p in hwcontext_videotoolbox can avoid the need to manually insert a format=nv12 filter. In this case ffmpeg will do it for you.

@gnattu commented Feb 27, 2024

Will this command fail on videotoolbox? My video source is HEVC Main10 1080p.

Of course this works. I can even append scale_vt to hwupload in this case. It's only using it in our burn-in filter chain that confuses hwupload's choice of the best pixel format. I haven't seen other cases where specifying the format is mandatory.

The error is not from the VTTransfer in scale_vt, but from my overlay filter. The input pixel format was weird enough that VT refused to convert it to RGBA.

@gnattu commented Feb 27, 2024

The input pixel format was weird enough that VT refuses to convert it into RGBA.

What did it even get uploaded to?

When I try to retrieve the pixel format type using OSType pty = CVPixelBufferGetPixelFormatType((CVPixelBufferRef)input_main->data[3]);, it raises a segmentation fault, so it seems the CVPixelBuffer was not valid in the first place?

@nyanmisaka commented Feb 27, 2024

When I try to retrieve the pixel format type using OSType pty = CVPixelBufferGetPixelFormatType((CVPixelBufferRef)input_main->data[3]);, it raised a segmentation fault., so it seems that the CVPixelBuffer was not valid in the first place?

You have to check for NULL when using AVFrame->data.

The input pixel format was weird enough that VT refuses to convert it into RGBA.

So what you mean is that in overlay_videotoolbox:

  • nv12 -> rgba16 works
  • yuv420p -> rgba16 fails (kVTPixelTransferNotSupportedErr)

alphasrc (bgra) -> hwupload -> hwctx(bgra) [sub]
[0:v] yuv420p -> hwupload -> hwctx(yuv420p) -> scale_vt -> hwctx(yuv420p) [main]
[main][sub] hwctx(yuv420p) + hwctx(bgra) -> overlay_vt -> hwctx(yuv420p) [out]

@gnattu commented Feb 27, 2024

So what you mean is that in overlay_videotoolbox:

nv12 -> rgba16 works
yuv420p -> rgba16 fails (kVTPixelTransferNotSupportedErr)

Seems to be this case:

[Parsed_overlay_videotoolbox_6 @ 0x6000004604d0] Software format: yuv420p
[Parsed_overlay_videotoolbox_6 @ 0x6000004604d0] Hardware format: 2033463856
Error while filtering: Error number -12905 occurred

The 2033463856 is the four-character-code OSType y420, i.e. 420YpCbCr8Planar.

@nyanmisaka commented Feb 27, 2024

Then I think you should use a command like this to limit the input format to nv12 or p010, thus avoiding yuv420p. But this will insert an automatic conversion filter anyway:

"format=nv12|p010,hwupload,scale_vt..."

@gnattu commented Feb 27, 2024

"format=nv12|p010,hwupload,scale_vt..." Then I think you should use a command like this to limit the input format to nv12 or p010, thus avoiding yuv420p.

This is what I'm planning to do: detect the input color depth and specify an upload format when using subtitle burn-in.

@nyanmisaka

I imagine this would make the logic quite complex.

Just curious: what if you use OpenCL filters globally and enable hwaccel_output_format videotoolbox_vld? Currently our scale_opencl / tonemap_opencl / overlay_opencl filters are available for both nv12 and p010.

@gnattu commented Feb 27, 2024

I tried more with this and hope to solve it with a dual conversion.

For the overlay, we can first convert it to kCVPixelFormatType_4444AYpCbCr16 and then do the YUVA->RGBA64 conversion, which Apple supports.
For the main frame, we can directly convert it to the bi-planar equivalent kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange (nv12), then convert to RGBA64.

My plan is to check whether any of the inputs is in a non-bi-planar format and perform the additional conversion if so. The idea is that if the conversion is unavoidable, we should prefer doing it with Apple's native methods instead of ffmpeg's implementation, as it will be hardware-accelerated.

Is such an approach upstream-able? At the very least we can use it in our own fork, as it could reduce the complexity of the filter-chain building logic.

@gnattu commented Feb 27, 2024

Overlays that have an alpha channel and are not in BGRA format would already be uploaded as kCVPixelFormatType_4444AYpCbCr16. However, some overlays may not contain an alpha channel at all, and in those cases they may be in yuv420p format, so a conversion step is necessary.

@nyanmisaka commented Feb 27, 2024

{ kCVPixelFormatType_420YpCbCr8Planar,              false, AV_PIX_FMT_YUV420P },
{ kCVPixelFormatType_420YpCbCr8PlanarFullRange,     true,  AV_PIX_FMT_YUV420P },
{ kCVPixelFormatType_422YpCbCr8,                    false, AV_PIX_FMT_UYVY422 },
{ kCVPixelFormatType_32BGRA,                        true,  AV_PIX_FMT_BGRA },
{ kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange,  false, AV_PIX_FMT_NV12 },
{ kCVPixelFormatType_420YpCbCr8BiPlanarFullRange,   true,  AV_PIX_FMT_NV12 },
{ kCVPixelFormatType_4444AYpCbCr16,                 false, AV_PIX_FMT_AYUV64 },
{ kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange, false, AV_PIX_FMT_P010 },
{ kCVPixelFormatType_420YpCbCr10BiPlanarFullRange,  true,  AV_PIX_FMT_P010 },

I want to check first: is nv12/p010/yuv420p/bgra to ayuv64 conversion supported by VT transfer?

If so, maybe replace the intermediate format 64RGBALE/64RGBAHalf in overlay_videotoolbox with 4444AYpCbCr16.

@gnattu commented Feb 27, 2024

{ kCVPixelFormatType_420YpCbCr8Planar,              false, AV_PIX_FMT_YUV420P },
{ kCVPixelFormatType_420YpCbCr8PlanarFullRange,     true,  AV_PIX_FMT_YUV420P },
{ kCVPixelFormatType_422YpCbCr8,                    false, AV_PIX_FMT_UYVY422 },
{ kCVPixelFormatType_32BGRA,                        true,  AV_PIX_FMT_BGRA },
{ kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange,  false, AV_PIX_FMT_NV12 },
{ kCVPixelFormatType_420YpCbCr8BiPlanarFullRange,   true,  AV_PIX_FMT_NV12 },
{ kCVPixelFormatType_4444AYpCbCr16,                 false, AV_PIX_FMT_AYUV64 },
{ kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange, false, AV_PIX_FMT_P010 },
{ kCVPixelFormatType_420YpCbCr10BiPlanarFullRange,  true,  AV_PIX_FMT_P010 },

I want to check first, Is nv12/p010/yuv420p/bgra to ayuv64 conversion supported by VT transfer?

If so, maybe replace the intermediate format 64RGBALE/64RGBAHalf in overlay_videotoolbox with this one 4444AYpCbCr16.

I've checked most of the formats, and it appears that yuv420p is not compatible with most of them. In fact, it supports no format that guarantees "visually lossless" color conversion: every format it can convert to/from results in slightly-off color. Even converting to its own bi-planar equivalent kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange with VT alters the resulting color, as VT cannot directly convert it to any format with higher than 8-bit depth. Therefore, I don't think converting it with VT's transfer is a good idea right now. Since color correctness should be preferred over performance, I'm going to use the format=nv12 filter for now, and I'll add a warning to my upstream patch telling users to use nv12 instead of yuv420p when VT returns kVTPixelTransferNotSupportedErr.

@nyanmisaka commented Feb 27, 2024

You can restrict the filter's main and overlay formats, like this.
https://github.com/FFmpeg/FFmpeg/blob/1bb7d5ca9fc6dc1b961014255ada03dee40bac31/libavfilter/vf_overlay_cuda.c#L50-L61

@nyanmisaka left a comment

I think this is ready. You may want to do more testing on the server side before merging this.

@gnattu gnattu merged commit 06eef05 into jellyfin:jellyfin Feb 28, 2024
28 checks passed