[RFC] support the color space conversion in VPP #212

Closed
mypopydev opened this issue Jun 27, 2017 · 4 comments · Fixed by #245

Comments

@mypopydev
Contributor

mypopydev commented Jun 27, 2017

Request: support color space conversion (CSC) in VPP for yuv420p10le -> uyvy422.

Tested on Kaby Lake as follows:

  1. OS and i965 driver version

root@bluekbl:/home/junzhao/ffmpeg# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 17.04
Release: 17.04
Codename: zesty

root@bluekbl:/home/junzhao/ffmpeg# vainfo
libva info: VA-API version 0.40.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /opt/X11R7/lib/dri/i965_drv_video.so
libva info: Found init function __vaDriverInit_0_40
libva info: va_openDriver() returns 0
vainfo: VA-API version: 0.40 (libva )
vainfo: Driver version: Intel i965 driver for Intel(R) Kabylake - 1.8.4.pre1 (glk-alpha-66-g7633739)
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple : VAEntrypointVLD
VAProfileMPEG2Simple : VAEntrypointEncSlice
VAProfileMPEG2Main : VAEntrypointVLD
VAProfileMPEG2Main : VAEntrypointEncSlice
VAProfileH264ConstrainedBaseline: VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
VAProfileH264Main : VAEntrypointVLD
VAProfileH264Main : VAEntrypointEncSlice
VAProfileH264Main : VAEntrypointEncSliceLP
VAProfileH264High : VAEntrypointVLD
VAProfileH264High : VAEntrypointEncSlice
VAProfileH264High : VAEntrypointEncSliceLP
VAProfileH264MultiviewHigh : VAEntrypointVLD
VAProfileH264MultiviewHigh : VAEntrypointEncSlice
VAProfileH264StereoHigh : VAEntrypointVLD
VAProfileH264StereoHigh : VAEntrypointEncSlice
VAProfileVC1Simple : VAEntrypointVLD
VAProfileVC1Main : VAEntrypointVLD
VAProfileVC1Advanced : VAEntrypointVLD
VAProfileNone : VAEntrypointVideoProc
VAProfileJPEGBaseline : VAEntrypointVLD
VAProfileJPEGBaseline : VAEntrypointEncPicture
VAProfileVP8Version0_3 : VAEntrypointVLD
VAProfileVP8Version0_3 : VAEntrypointEncSlice
VAProfileHEVCMain : VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointEncSlice
VAProfileHEVCMain10 : VAEntrypointVLD
VAProfileHEVCMain10 : VAEntrypointEncSlice
VAProfileVP9Profile0 : VAEntrypointVLD
VAProfileVP9Profile0 : VAEntrypointEncSlice
VAProfileVP9Profile2 : VAEntrypointVLD

  2. GPU Decode HEVC 10 bit video (without format change to uyvy422)

    ./ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -i ../../video/The_World_in_HDR_in_4K_HDR10.mkv -f null /dev/null

frame= 9264 fps=121 q=-0.0 Lsize=N/A time=00:02:34.56 bitrate=N/A speed=2.02x

  3. GPU Decode HEVC 10 bit video (with format change to uyvy422)

    ./ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -hwaccel_output_format uyvy422 -i ../../video/The_World_in_HDR_in_4K_HDR10.mkv -f null /dev/null

frame= 9264 fps= 58 q=-0.0 Lsize=N/A time=00:02:34.56 bitrate=N/A speed=0.968x

So the yuv420p10le -> uyvy422 conversion causes about a 50% performance drop in this case (121 fps -> 58 fps). Please support the yuv420p10le -> uyvy422 conversion in GPU/VPP.
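
For reference, the "about 50%" figure can be checked from the two fps values reported above. This is only a quick arithmetic sketch; the variable names are made up for illustration.

```python
# fps values copied from the two ffmpeg runs above
fps_without_csc = 121  # decode only
fps_with_csc = 58      # decode plus conversion to uyvy422

# relative throughput loss caused by the format change
drop = 1 - fps_with_csc / fps_without_csc
print(f"performance drop: {drop:.1%}")
```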

@QuPengfei

The VAAPI driver does not support direct yuv420p10le -> uyvy422 conversion. Maybe we can work around it with yuv420p10le (P010) -> NV12 -> uyvy422. We will provide a trial patch later.

@lizhong1008
Contributor

lizhong1008 commented Jul 5, 2017

After debugging ffmpeg (./ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -hwaccel_output_format uyvy422 -i ../../video/The_World_in_HDR_in_4K_HDR10.mkv -f null /dev/null) and double-checking with @mypopydev, I found that ffmpeg doesn't call the libva VPP interface (VAEntrypointVideoProc). The color space conversion is done by vaGetImage in the decoder thread, and the driver's implementation of vaGetImage already supports converting P010 to UYVY. It is done in two steps: first P010 to NV12, then NV12 to UYVY.
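
To make the two steps concrete, here is a rough CPU-side sketch of what P010 -> NV12 -> UYVY means for the sample data. It is not the i965 shader code: the function names are hypothetical, the 10-bit to 8-bit step simply truncates (the driver may round or scale differently), and each 4:2:0 chroma row is just reused for two luma rows.

```python
def p010_to_nv12(y16, uv16):
    """Step 1 (assumed truncation): keep the top 8 bits of each 16-bit P010 word."""
    y8 = [[s >> 8 for s in row] for row in y16]
    uv8 = [[s >> 8 for s in row] for row in uv16]
    return y8, uv8


def nv12_to_uyvy(y8, uv8):
    """Step 2: pack NV12 planes into UYVY byte order (U0 Y0 V0 Y1 per pixel pair)."""
    out = []
    for r, yrow in enumerate(y8):
        # NV12 has one interleaved UV row per two luma rows
        crow = uv8[r // 2]
        packed = []
        for x in range(0, len(yrow), 2):
            u, v = crow[x], crow[x + 1]
            packed += [u, yrow[x], v, yrow[x + 1]]
        out.append(packed)
    return out


# Tiny 2x2 frame: 10-bit samples stored P010-style in the high bits of each word
y16 = [[64 << 6, 512 << 6], [940 << 6, 1023 << 6]]
uv16 = [[512 << 6, 256 << 6]]  # one U,V pair shared by all four pixels

uyvy = nv12_to_uyvy(*p010_to_nv12(y16, uv16))
print(uyvy)
```

Doing both steps in one pass avoids writing and re-reading the intermediate NV12 surface.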

In short, our driver supports P010 to UYVY when vaGetImage is called, and the command (./ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -hwaccel_output_format uyvy422 -i The_World_in_HDR_in_4K_HDR10.mkv -f null /dev/null) is already using hardware CSC rather than a software path.

Here is the gdb backtrace:

(gdb) bt
#0 i965_image_p010_processing (ctx=0x555557212cb0, src_surface=0x7fffffffd760, src_rect=0x7fffffffd7e0, dst_surface=0x7fffffffd770,
dst_rect=0x7fffffffd7e0) at i965_post_processing.c:5340
#1 0x00007fffedefd401 in i965_image_processing (ctx=0x555557212cb0, src_surface=0x7fffffffd760, src_rect=0x7fffffffd7e0, dst_surface=0x7fffffffd770,
dst_rect=0x7fffffffd7e0) at i965_post_processing.c:5514
#2 0x00007fffedec68c2 in i965_hw_getimage (ctx=0x555557212cb0, obj_surface=0x7fffe4016fe0, obj_image=0x55555721d610, rect=0x7fffffffd7e0)
at i965_drv_video.c:4976
#3 0x00007fffedec6a9c in i965_GetImage (ctx=0x555557212cb0, surface=67108888, x=0, y=0, width=3840, height=2160, image=167772160) at i965_drv_video.c:5021
#4 0x00007ffff6d3bd4d in vaGetImage (dpy=0x555557212c40, surface=67108888, x=0, y=0, width=3840, height=2160, image=167772160) at va.c:1471
#5 0x00005555562a72f3 in vaapi_map_frame (hwfc=hwfc@entry=0x7fffe40062a0, dst=0x555557337fa0, src=src@entry=0x55555734e260, flags=flags@entry=1)
at libavutil/hwcontext_vaapi.c:762
#6 0x00005555562a7503 in vaapi_transfer_data_from (hwfc=0x7fffe40062a0, dst=0x55555726cce0, src=0x55555734e260) at libavutil/hwcontext_vaapi.c:830
#7 0x00005555562a54f3 in av_hwframe_transfer_data (dst=0x55555726cce0, src=0x55555734e260, flags=) at libavutil/hwcontext.c:440
#8 0x00005555562a55b2 in transfer_data_alloc (flags=0, src=0x55555734e260, dst=0x55555726caa0) at libavutil/hwcontext.c:415
#9 av_hwframe_transfer_data (dst=0x55555726caa0, src=src@entry=0x55555734e260, flags=flags@entry=0) at libavutil/hwcontext.c:435
#10 0x00005555556c505e in hwaccel_retrieve_data (avctx=0x555557245a20, input=0x55555734e260) at ffmpeg_hw.c:354
#11 0x00005555556d0cc9 in decode_video (ist=0x555557247bc0, pkt=, got_output=0x7fffffffdb98, eof=0, decode_failed=0x7fffffffdb9c)
at ffmpeg.c:2455
#12 0x00005555556d14b3 in process_input_packet (ist=0x555557247bc0, pkt=0x7fffffffde10, no_eof=0) at ffmpeg.c:2644
#13 0x00005555556ae5fa in process_input (file_index=) at ffmpeg.c:4432
#14 transcode_step () at ffmpeg.c:4543
#15 transcode () at ffmpeg.c:4597
#16 main (argc=, argv=) at ffmpeg.c:4803

@xhaihao
Contributor

xhaihao commented Jul 10, 2017

yuv420p10le isn't P010. P010 uses the highest 10 bits of each 16-bit word for each channel, whereas yuv420p10le in FFmpeg uses the lowest 10 bits of each 16-bit word.
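
A small sketch of the bit-placement difference described above, for a single 10-bit sample. The helper names are hypothetical and only illustrate the layout; they are not FFmpeg or driver APIs.

```python
def to_p010_word(sample10):
    """P010: the 10-bit sample sits in the high bits, the low 6 bits are zero."""
    return (sample10 & 0x3FF) << 6


def to_yuv420p10le_word(sample10):
    """yuv420p10le: the 10-bit sample sits in the low bits, the high 6 bits are zero."""
    return sample10 & 0x3FF


def p010_word_to_yuv420p10le(word):
    """Converting one sample between the two layouts is a 6-bit shift."""
    return word >> 6


sample = 0x3FF  # maximum 10-bit value
print(hex(to_p010_word(sample)))         # 0xffc0
print(hex(to_yuv420p10le_word(sample)))  # 0x3ff
```

Note that the two formats also differ in plane layout: yuv420p10le keeps U and V in separate planes, while P010 interleaves them in one UV plane, so a real conversion needs more than the shift.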

@xhaihao
Contributor

xhaihao commented Jul 10, 2017

Our driver doesn't support the yuv420p10le format.

xhaihao added a commit to xhaihao/intel-vaapi-driver that referenced this issue Aug 9, 2017
…0 on GEN9

With the new shader, the driver can convert 10bit surface to 8bit
surface in a single step

This fixes intel#212

Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
@xhaihao xhaihao mentioned this issue Aug 9, 2017
xhaihao added a commit that referenced this issue Aug 17, 2017
…0 on GEN9

With the new shader, the driver can convert 10bit surface to 8bit
surface in a single step

This fixes #212

Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>