Broken Tigerlake support #773

Closed
eero-t opened this issue Nov 12, 2019 · 7 comments

eero-t commented Nov 12, 2019

Based on MediaSDK & FFmpeg VA-API and dmesg output, media-driver Tigerlake support doesn't work (yet).

This is with yesterday evening's git versions:

  • libva: 95eb8cf46936 Add missed slice parameter 'slice_data_num_emu_prevn_bytes'
  • gmmlib: 94306f5443b9 Initial Multi adapter changes.
  • media-driver: 772cf8a [CM] Add missing sw swizzling flag in MOS

MediaSDK git version, sample app / AVC:

$ sample_multi_transcode -i::h264 1280x720p_29.97_10mb_h264_cabac.264 -o::h264 output.h264 -hw
Multi Transcoding Sample Version 8.4.27.1933

libva info: VA-API version 1.6.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_6
libva info: va_openDriver() returns 0
Session 0:
Pipeline surfaces number (DecPool): 8
MFX HARDWARE Session 0 API ver 1.30 parameters: 
Input  video: AVC 
Output video: AVC 

Session 0 was NOT joined with other sessions

Transcoding started

[ERROR], sts=MFX_ERR_GPU_HANG(-21), PutBS, Encode: SyncOperation failed at samples/sample_multi_transcode/src/pipeline_transcode.cpp:1909

[ERROR], sts=MFX_ERR_ABORTED(-12), Transcode, PutBS failed at samples/sample_multi_transcode/src/pipeline_transcode.cpp:1871

[ERROR], sts=MFX_ERR_ABORTED(-12), Run, CTranscodingPipeline::Run::Transcode() [0x564c882566e0] failed at samples/sample_multi_transcode/src/pipeline_transcode.cpp:4440

 session 0 [0x564c882566e0] failed with status MFX_ERR_ABORTED shutting down the application...
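The three `[ERROR]` lines above form a chain in which the first entry (`MFX_ERR_GPU_HANG`) is the root failure and the later `MFX_ERR_ABORTED` entries are its consequences. A minimal sketch of pulling that first entry out of a saved log follows. The regular expression is inferred only from the lines shown in this report, and the helper names `parse_errors` and `first_error` are hypothetical, not part of MediaSDK.

```python
import re

# Pattern inferred from the sample output above (an assumption, not a
# documented MediaSDK log format):
# [ERROR], sts=NAME(code), <function>, <message> at <file>:<line>
ERR_RE = re.compile(
    r"\[ERROR\], sts=(?P<name>MFX_\w+)\((?P<code>-?\d+)\), "
    r"(?P<where>[^,]+), (?P<msg>.+?) at (?P<src>\S+):(?P<line>\d+)\s*$"
)

SAMPLE_LOG = """\
[ERROR], sts=MFX_ERR_GPU_HANG(-21), PutBS, Encode: SyncOperation failed at samples/sample_multi_transcode/src/pipeline_transcode.cpp:1909

[ERROR], sts=MFX_ERR_ABORTED(-12), Transcode, PutBS failed at samples/sample_multi_transcode/src/pipeline_transcode.cpp:1871

[ERROR], sts=MFX_ERR_ABORTED(-12), Run, CTranscodingPipeline::Run::Transcode() [0x564c882566e0] failed at samples/sample_multi_transcode/src/pipeline_transcode.cpp:4440
"""


def parse_errors(log_text):
    """Return every [ERROR] line as a dict, in the order it was logged."""
    errors = []
    for raw in log_text.splitlines():
        match = ERR_RE.search(raw.strip())
        if match:
            entry = match.groupdict()
            entry["code"] = int(entry["code"])
            entry["line"] = int(entry["line"])
            errors.append(entry)
    return errors


def first_error(log_text):
    """Return the earliest logged error (the likely root cause), or None."""
    errors = parse_errors(log_text)
    return errors[0] if errors else None


if __name__ == "__main__":
    root = first_error(SAMPLE_LOG)
    print(root["name"], root["code"], root["where"], root["line"])
```

Reporting only the first entry keeps triage focused on the GPU hang and not on the aborts that follow it.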

FFmpeg git version, VA-API / HEVC:

$ ffmpeg -loglevel verbose -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i Netflix_FoodMarket_4096x2160_10bit_420_100mbs_600.h265 -c:v hevc_vaapi -b:v 20M -an -vframes 300 -y output.h265
ffmpeg version N-95710-gb25b6432a7 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
...
Input #0, hevc, from 'Netflix_FoodMarket_4096x2160_10bit_420_100mbs_600.h265':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: hevc (Main 10), 1 reference frame, yuv420p10le(tv), 4096x2160, 60 fps, 60 tbr, 1200k tbn, 60 tbc
Stream mapping:
  Stream #0:0 -> #0:0 (hevc (native) -> hevc (hevc_vaapi))
Press [q] to stop, [?] for help
[graph 0 input from stream 0:0 @ 0x560286831dc0] w:4096 h:2160 pixfmt:vaapi_vld tb:1/1200000 fr:60/1 sar:0/1 sws_param:flags=2
[hevc_vaapi @ 0x560283f892c0] Input surface format is p010le.
[hevc_vaapi @ 0x560283f892c0] Using VAAPI profile VAProfileHEVCMain10 (18).
[hevc_vaapi @ 0x560283f892c0] Using VAAPI entrypoint VAEntrypointEncSlice (6).
[hevc_vaapi @ 0x560283f892c0] Using VAAPI render target format YUV420_10 (0x100).
[hevc_vaapi @ 0x560283f892c0] RC mode: VBR.
[hevc_vaapi @ 0x560283f892c0] RC target: 50% of 40000000 bps over 500 ms.
[hevc_vaapi @ 0x560283f892c0] RC buffer: 20000000 bits, initial fullness 15000000 bits.
[hevc_vaapi @ 0x560283f892c0] RC framerate: 60/1 (60.00 fps).
[hevc_vaapi @ 0x560283f892c0] Using intra, P- and B-frames (supported references: 4 / 4).
[hevc_vaapi @ 0x560283f892c0] All wanted packed headers available (wanted 0xd, found 0x1f).
[hevc_vaapi @ 0x560283f892c0] Using level 5.
Output #0, hevc, to 'output.h265':
  Metadata:
    encoder         : Lavf58.34.101
    Stream #0:0: Video: hevc (hevc_vaapi) (Main 10), 1 reference frame, vaapi_vld, 4096x2160, q=-1--1, 20000 kb/s, 60 fps, 60 tbn, 60 tbc
    Metadata:
      encoder         : Lavc58.61.100 hevc_vaapi
[hevc_vaapi @ 0x560283f892c0] Failed to end picture encode issue: 24 (internal encoding error).
[hevc_vaapi @ 0x560283f892c0] Encode failed: -5.
Video encoding failed
[AVIOContext @ 0x560283f83400] Statistics: 0 seeks, 2 writeouts

Note: I'm using the latest drm-tip kernel on this TGL GT2 device, and GuC (just) loads HuC for bit-rate controls. Dmesg complains about media pre-emption timeouts:

[    0.000000] Linux version 5.4.0-rc6 (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #1 SMP PREEMPT Mon Nov 11 20:02:36 EET 2019
[    0.000000] Command line: BOOT_IMAGE=/boot/drm_intel root=/dev/nvme0n1p2 rootwait fsck.repair=yes intel_iommu=igfx_off ro
...
[    4.713243] Setting dangerous option enable_guc - tainting kernel
[    4.715131] [drm] i915.alpha_support is deprecated, use i915.force_probe=9a49 instead
[    4.715702] i915 0000:00:02.0: vgaarb: deactivate vga console
[    4.724393] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    4.724395] [drm] Driver supports precise vblank timestamp query.
[    4.727538] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    4.729294] [drm] Finished loading DMC firmware i915/tgl_dmc_ver2_04.bin (v2.4)
[    4.800122] r8169 0000:ad:00.0 enp173s0: Link is Down
[    4.818173] [drm] GuC communication enabled
[    4.821228] i915 0000:00:02.0: GuC firmware i915/tgl_guc_35.2.0.bin version 35.2 submission:disabled
[    4.821238] i915 0000:00:02.0: HuC firmware i915/tgl_huc_7.0.3.bin version 7.0 authenticated:yes
[    4.823121] [drm] Initialized i915 1.6.0 20191101 for 0000:00:02.0 on minor 0
[    4.826719] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[    4.827077] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
[    4.827943] fbcon: i915drmfb (fb0) is primary device
[    4.829432] Console: switching to colour frame buffer device 240x67
[    4.853411] i915 0000:00:02.0: fb0: i915drmfb frame buffer device
[    7.802345] r8169 0000:ad:00.0 enp173s0: Link is Up - 1Gbps/Full - flow control off
[    7.802359] IPv6: ADDRCONF(NETDEV_CHANGE): enp173s0: link becomes ready
[    9.798765] systemd-journald[299]: File /var/log/journal/fb1709e5a77f4819a6750f84785108e5/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.
[  736.474418] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[  736.474451] i915 0000:00:02.0: sample_multi_tr[1033] context reset due to GPU hang
[  739.058418] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[  739.058448] i915 0000:00:02.0: sample_multi_tr[1033] context reset due to GPU hang
[ 1014.066443] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 1014.066460] i915 0000:00:02.0: ffmpeg[1069] context reset due to GPU hang
[ 1020.338432] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 1020.338499] i915 0000:00:02.0: ffmpeg[1069] context reset due to GPU hang
[ 1020.474431] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 1020.474488] i915 0000:00:02.0: ffmpeg[1069] context reset due to GPU hang
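To see at a glance which processes triggered the resets in the dmesg excerpt above, the hang lines can be counted per process. This is a minimal sketch: the regular expression is derived only from the lines quoted in this report (other kernel versions may word the message differently), and `summarize_hangs` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches lines such as:
# [  736.474451] i915 0000:00:02.0: sample_multi_tr[1033] context reset due to GPU hang
# The format is an assumption based on the log in this report.
HANG_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+i915 \S+: "
    r"(?P<comm>.+)\[(?P<pid>\d+)\] context reset due to GPU hang"
)

SAMPLE = """\
[  736.474418] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[  736.474451] i915 0000:00:02.0: sample_multi_tr[1033] context reset due to GPU hang
[  739.058418] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[  739.058448] i915 0000:00:02.0: sample_multi_tr[1033] context reset due to GPU hang
[ 1014.066443] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 1014.066460] i915 0000:00:02.0: ffmpeg[1069] context reset due to GPU hang
[ 1020.338432] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 1020.338499] i915 0000:00:02.0: ffmpeg[1069] context reset due to GPU hang
[ 1020.474431] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 1020.474488] i915 0000:00:02.0: ffmpeg[1069] context reset due to GPU hang
"""


def summarize_hangs(dmesg_text):
    """Count 'context reset due to GPU hang' lines per (process name, PID)."""
    counts = Counter()
    for raw in dmesg_text.splitlines():
        match = HANG_RE.match(raw.strip())
        if match:
            counts[(match.group("comm"), int(match.group("pid")))] += 1
    return counts


if __name__ == "__main__":
    for (comm, pid), count in summarize_hangs(SAMPLE).items():
        print(f"{comm}[{pid}]: {count} context reset(s)")
```

For the excerpt above this gives two resets for `sample_multi_tr[1033]` and three for `ffmpeg[1069]`. The text of a live kernel log (for example, the saved output of `dmesg`) can be passed in place of `SAMPLE`.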
@dvrogozh
Contributor

I am observing the same on the latest drm-tip kernel.

@dvrogozh
Contributor

dvrogozh commented Nov 22, 2019

Actually, not everything is broken. Status is:

  • AVC decoding - works
  • AVC VDENC encoding - works
  • AVC VME encoding - fails

Here are example command lines to try (apply Intel-Media-SDK/MediaSDK#1771 to get TGL support for MediaSDK):

wget https://fate-suite.libav.org/h264-conformance/AUD_MW_E.264
# works:
sample_decode h264 -i AUD_MW_E.264 -hw -vaapi -o a.yuv
sample_multi_transcode -i::h264 AUD_MW_E.264 -hw -async 1 -u 4 -o::h264 a.h264 -qsv-ff -cqp

# fails:
sample_multi_transcode -i::h264 AUD_MW_E.264 -hw -async 1 -u 4 -o::h264 a.h264
sample_multi_transcode -i::h264 AUD_MW_E.264 -hw -async 1 -u 4 -o::h264 a.h264 -cqp
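The working and failing transcode commands above differ only in the `-qsv-ff` flag. A minimal sketch of tagging each command with the encode path it presumably exercises follows. It assumes that `-qsv-ff` selects the fixed-function (VDENC) encoder and that its absence means VME, which matches the status list in this comment but is not verified against the MediaSDK sources here. `encode_path` is a hypothetical helper.

```python
import shlex

# Status per encode path, exactly as reported in this comment.
REPORTED_STATUS = {"VDENC": "works", "VME": "fails"}

# Transcode command lines copied from this comment.
COMMANDS = [
    "sample_multi_transcode -i::h264 AUD_MW_E.264 -hw -async 1 -u 4 -o::h264 a.h264 -qsv-ff -cqp",
    "sample_multi_transcode -i::h264 AUD_MW_E.264 -hw -async 1 -u 4 -o::h264 a.h264",
    "sample_multi_transcode -i::h264 AUD_MW_E.264 -hw -async 1 -u 4 -o::h264 a.h264 -cqp",
]


def encode_path(command):
    """Guess the AVC encode path from a sample_multi_transcode command line.

    Assumption: -qsv-ff means VDENC (fixed function); anything else means VME.
    """
    args = shlex.split(command)
    return "VDENC" if "-qsv-ff" in args else "VME"


if __name__ == "__main__":
    for command in COMMANDS:
        path = encode_path(command)
        print(f"{path:5} ({REPORTED_STATUS[path]}): {command}")
```

Grouping commands this way makes it easier to check, after a kernel or driver update, whether only the VME path is still affected.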

dmesg log from the failing case states:

[   77.006161] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[   77.006204] i915 0000:00:02.0: sample_multi_tr[1951] context reset due to GPU hang
[   77.006293] [drm:__i915_request_reset [i915]] client sample_multi_tr[1951]: gained 1 ban score, now 1

Interestingly, the i915 error state is actually empty. I filed a bug on FDO: https://bugs.freedesktop.org/show_bug.cgi?id=112377

@zxye
Contributor

zxye commented Mar 3, 2020

It has been root-caused as an i915 issue: https://bugs.freedesktop.org/show_bug.cgi?id=112377

@zxye zxye closed this as completed Mar 3, 2020
@dvrogozh
Contributor

dvrogozh commented Mar 3, 2020

@zxye: it's not fixed yet. We probably shouldn't close the issue until the fix is fully done. I suggest reopening it.

@zxye
Contributor

zxye commented Mar 4, 2020

OK, let's update and close it after the i915 patch is merged.

@zxye zxye reopened this Mar 4, 2020
@tursulin

tursulin commented Mar 5, 2020

Please test with drm-tip containing the below patch and report back.

Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date: Wed Mar 4 15:31:44 2020 +0000

drm/i915/tgl: WaDisableGPGPUMidThreadPreemption

@zxye
Contributor

zxye commented Mar 7, 2020

@dvrogozh could you please verify and close the issue?

@zxye zxye closed this as completed Mar 20, 2020