GPU hang with specific HEVC transcoding on CML #992
The issue cannot be reproduced with MSDK's transcoding sample using the command "sample_multi_transcode -i::h265 ~/input.h265 -deinterlace -o::h265 test-output.h265 -w 1920 -h 1080". I put the successfully transcoded 1080p video (downscaled from 2160p) here.

Ping.

@fulinjie, if the MSDK decoder can handle the stream, perhaps a workaround (WA) is possible on the FFmpeg side?

Hi @dmitryermilov, the main cause of this issue seems to be related to how the driver tolerates/handles a NULL pointer in an error case. • The reason MSDK works:

Yes, I'm working on a WA in FFmpeg to skip the invalid frames (which deviates from the native decoding pipeline), but IMHO it would be better to prevent the GPU hang somehow, whether or not we have the "valid check". PS: FYI, the internal discussion is accessible at:
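A minimal sketch of the kind of FFmpeg-side workaround described above, assuming the decoder can list the POCs each picture references and the POCs still held in its DPB. The names `PendingPic`, `dpb_contains()` and `should_skip_pic()` are hypothetical and do not come from FFmpeg or from the actual patch; the sketch only illustrates the policy of not submitting a picture whose references are missing, gated behind an option so default decoding is unchanged.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a picture about to be decoded: its own POC
 * plus the POCs of the pictures it references (e.g. from its RPS). */
typedef struct {
    int poc;
    const int *ref_pocs;
    size_t num_refs;
} PendingPic;

/* Returns true if `poc` is one of the POCs currently held in the DPB. */
static bool dpb_contains(const int *dpb_pocs, size_t dpb_len, int poc)
{
    for (size_t i = 0; i < dpb_len; i++) {
        if (dpb_pocs[i] == poc)
            return true;
    }
    return false;
}

/* Hypothetical workaround policy: with `skip_on_missing_ref` enabled, a picture
 * is skipped (never submitted to the hardware) if any of its references is
 * absent, e.g. because decoding began at a non-IRAP picture or earlier frames
 * were lost. With the option disabled, every picture is submitted as before. */
bool should_skip_pic(const PendingPic *pic,
                     const int *dpb_pocs, size_t dpb_len,
                     bool skip_on_missing_ref)
{
    if (!skip_on_missing_ref)
        return false;
    for (size_t i = 0; i < pic->num_refs; i++) {
        if (!dpb_contains(dpb_pocs, dpb_len, pic->ref_pocs[i]))
            return true;
    }
    return false;
}
```

Keeping the skip behind a flag reflects the trade-off raised in this thread: skipping hides the hang, but it also drops frames that a native decoding pipeline would still output (possibly as garbage).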
Yes, I fully understand you, @fulinjie. It goes without saying that the UMD should attempt to prevent GPU hangs.
The motivation here is not "simply" to skip as many frames as possible :) There should be a balance between:

Yep, agreed. The skipped frames in this clip are useless garbage, so it is better to skip them. The GPU hang can be hidden by applying the patch above.

@wangyan-intel, could we add a check in EndPicture? If there is no reference frame, the media driver should return a failure instead of sending the real command buffer to the GPU, to avoid the GPU hang.
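A minimal sketch of the EndPicture-time check suggested above, written as a standalone function rather than actual media-driver code. It assumes a simplified reference table (a hypothetical `RefEntry`, loosely modeled on the `picture_id`/`flags` fields of VA-API's `VAPictureHEVC`) and a flattened slice reference list in which 0xFF marks an unused slot. The sentinel values are chosen to match libva's `VA_INVALID_SURFACE` (0xffffffff) and `VA_PICTURE_HEVC_INVALID` (0x1); `validate_refs_before_submit()` and the other names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values chosen to mirror libva: VA_INVALID_SURFACE is 0xffffffff and
 * VA_PICTURE_HEVC_INVALID is 0x00000001. */
#define SKETCH_INVALID_SURFACE  0xffffffffu
#define SKETCH_PIC_INVALID_FLAG 0x00000001u
/* Marks an unused slot in a slice reference list (assumption). */
#define SKETCH_NO_REF_IDX       0xffu

/* Hypothetical stand-in for one VAPictureHEVC entry of ReferenceFrames[]. */
typedef struct {
    uint32_t picture_id;
    uint32_t flags;
} RefEntry;

typedef enum {
    SKETCH_STATUS_OK = 0,
    SKETCH_STATUS_MISSING_REF = 1
} SketchStatus;

/* Hypothetical EndPicture-time check. `ref_list` holds indices into
 * `ref_frames` taken from the current picture's slice reference lists.
 * Returns SKETCH_STATUS_MISSING_REF when an inter picture has no usable
 * reference, or when any referenced slot is out of range or marked invalid,
 * so the caller can fail instead of submitting a command buffer. */
SketchStatus validate_refs_before_submit(const RefEntry ref_frames[],
                                         size_t num_ref_frames,
                                         const uint8_t ref_list[],
                                         size_t ref_list_len,
                                         bool is_intra)
{
    size_t used = 0;

    for (size_t i = 0; i < ref_list_len; i++) {
        uint8_t idx = ref_list[i];
        if (idx == SKETCH_NO_REF_IDX)
            continue; /* unused list slot */
        if (idx >= num_ref_frames)
            return SKETCH_STATUS_MISSING_REF; /* points past the table */

        const RefEntry *ref = &ref_frames[idx];
        if (ref->picture_id == SKETCH_INVALID_SURFACE ||
            (ref->flags & SKETCH_PIC_INVALID_FLAG) != 0)
            return SKETCH_STATUS_MISSING_REF;
        used++;
    }

    /* A P/B picture with no reference at all cannot be decoded safely. */
    if (!is_intra && used == 0)
        return SKETCH_STATUS_MISSING_REF;
    return SKETCH_STATUS_OK;
}
```

In a real driver the error would be propagated out of EndPicture instead of building the command buffer; how media-driver actually implements its fix is not shown here.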
@XinfengZhang, sorry to bother you. Could you suggest any possible direction on this issue?

I will take a look. Sorry for the slow response.

@weizhu-intel, could you please help take a look? Thanks.

Hi Linjie & zcwang, could you pass an invalid surface ID instead of a correct ref_pic_id when a reference is missing? Then our driver can detect this. Thanks.
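A minimal sketch of the application-side change requested above: when a reference picture is missing, its entry is filled with an invalid surface ID and the invalid flag instead of a real surface, so that a driver-side check can recognize the gap. `RefEntry`, `DecodedPic` and `fill_ref_entry()` are hypothetical simplifications (loosely modeled on `VAPictureHEVC`); the constants mirror libva's `VA_INVALID_SURFACE` and `VA_PICTURE_HEVC_INVALID`, and `extra_flags` stands in for whatever RPS flags the caller would normally set.

```c
#include <stddef.h>
#include <stdint.h>

/* Values chosen to mirror libva: VA_INVALID_SURFACE is 0xffffffff and
 * VA_PICTURE_HEVC_INVALID is 0x00000001. */
#define SKETCH_INVALID_SURFACE  0xffffffffu
#define SKETCH_PIC_INVALID_FLAG 0x00000001u

/* Hypothetical stand-in for VAPictureHEVC (picture_id, pic_order_cnt, flags). */
typedef struct {
    uint32_t picture_id;
    int32_t  pic_order_cnt;
    uint32_t flags;
} RefEntry;

/* Hypothetical record of a reference the decoder actually has; a NULL pointer
 * means the reference was never decoded (lost, or stream started mid-GOP). */
typedef struct {
    uint32_t surface_id;
    int32_t  poc;
} DecodedPic;

/* Fills one reference entry. For a missing reference, write the invalid
 * surface ID and invalid flag rather than substituting some other real
 * surface, so a driver-side check can detect the gap. `extra_flags` stands in
 * for the RPS flags the caller would normally set for a valid reference. */
void fill_ref_entry(RefEntry *dst, const DecodedPic *src, uint32_t extra_flags)
{
    if (src == NULL) {
        dst->picture_id = SKETCH_INVALID_SURFACE;
        dst->pic_order_cnt = 0;
        dst->flags = SKETCH_PIC_INVALID_FLAG;
        return;
    }
    dst->picture_id = src->surface_id;
    dst->pic_order_cnt = src->poc;
    /* Never mark a real reference as invalid. */
    dst->flags = extra_flags & ~SKETCH_PIC_INVALID_FLAG;
}
```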
The issue was fixed by the following patch (i.e. the intel-ffmpeg-patchset included in the Media-Driver 2020Q3 release, but not upstream):

@weizhu-intel and @dmitryermilov,
This issue should be fixed in the latest media driver. Could you try it again?

Let me close this issue now, since it is fixed in the media driver. You can also add a strict check for invalid reference frames in FFmpeg or VPL as an option.
Need help with a GPU hang in HEVC transcoding on CML.
The following command causes a GPU hang with a specific HEVC video (sample video of about 5xxMB here).
Command
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i target-HEVC-video.mkv -vf 'deinterlace_vaapi=rate=field:auto=1,scale_vaapi=w=1920:h=1080' -c:v hevc_vaapi output.mp4
Test Environment
OS: Ubuntu 18.04 with kernel v5.7 or the latest i915 drm-tip kernel (v5.8-rc2 on 06-29).
Open Source Media Stack: 2020’Q1 release or the latest upstream on 7/1/2020
FFmpeg version: the latest upstream code as of 7/1 (commit e409262837 "avutil/common: Fix integer overflow in av_ceil_log2_c()")
vainfo: VA-API version: 1.8 (libva 2.8.0.pre1)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 20.3.pre (adc2326)
GPU hang in vcs:
...
Jul 1 11:50:46 intel-NUC kernel: [ 9831.062462] i915 0000:00:02.0: [drm] Resetting vcs0 for preemption time out
Jul 1 11:50:46 intel-NUC kernel: [ 9831.062468] i915 0000:00:02.0: [drm] ffmpeg[3208] context reset due to GPU hang
Jul 1 11:50:46 intel-NUC kernel: [ 9831.062510] i915 0000:00:02.0: [drm:__i915_request_reset [i915]] client ffmpeg[3208]: gained 1 ban score, now 1
Jul 1 11:50:46 intel-NUC kernel: [ 9831.063554] i915 0000:00:02.0: [drm] GPU HANG: ecode 9:4:a8fffffd, in ffmpeg [3208]
…
ERROR: 0x00000000
DONE_REG: 0xffffffff
FAULT_TLB_DATA: 0x00000011 0xb442c1b0
Address 0x00001b442c1b0000 GGTT
GTT_CACHE_EN: 0xf0007fff
vcs0 command stream:
CCID: 0x00000000
START: 0x00011000
HEAD: 0x00000268 [0x00000230]
head = 0x00000268, wraps = 0
TAIL: 0x00000ee0 [0x00000270, 0x00000298]
CTL: 0x00003001
len=16384, enabled
MODE: 0x00000000
HWS: 0xfffe3000
ACTHD: 0x00000000 000b3924
at ring: 0x00000000
IPEIR: 0x00000000
IPEHR: 0x13000002
ESR: 0x00000000
INSTDONE: 0xbbffffff
batch: [0x00000000_000b3000, 0x00000000_000bb000]
BBADDR: 0x00000000_000b3925
BB_STATE: 0x00000020
INSTPS: 0x00009080
INSTPM: 0x00000000
FADDR: 0x00000000 000b3b00
RC PSMI: 0x00000010
FAULT_REG: 0x00000000
GFX_MODE: 0x00008000
PDP0: 0x00000006237ef000
PDP1: 0x0000000000000000
PDP2: 0x0000000000000000
PDP3: 0x0000000000000000
engine reset count: 0
ELSP[0]: pid 2486, seqno 18:00000044, prio 0, head 00000e70, tail 00000ee0
ELSP[1]: pid 2485, seqno 1c:00000002, prio 0, head 00000000, tail 00000068
Active context: ffmpeg[2486] prio 0, guilty 1 active 0, runtime total 4540598ns, avg 3970720ns
Please refer to the log files:
ffmpeg-gpu-hang-gary-0701.zip