Enabling mb_rate_control kills whole machine (Skylake GT2) #172
Comments
Behaviour is identical with the 1.8.1 release. Whether console output appears seems to depend on whether the full DRM framebuffer is being used. If it is, then taking out the GPU kills the output entirely and I don't get anything. If not, the output doesn't die and gives the log above before locking up. Maybe a serial console would be able to get more output if there is any (a panic log, perhaps)?
Has anyone been able to reproduce this? The failure is completely consistent for me, always killing the whole machine when running as above. Is there anything else I can do to help debug it?
@fhvwy We will give your patch a try.
I'll test this on a similar workstation and report back.
I cannot reproduce this issue after applying the patch FFmpeg-devel-V3-lavc-vaapi_encode_h264-Enable-MB-rate-control..patch to ffmpeg commit 991eca0f8729043724ae4574be0eb4c20bdba915. (I applied the patch by copying the code line by line, because the patch is too old.) Env:
I have uploaded my patched file vaapi_encode_h264.c. To use it, change its extension to .c and use it in place of the original vaapi_encode_h264.c in ffmpeg commit 991eca0f8729043724ae4574be0eb4c20bdba915.
I tried this again on the same machine (Skylake 6300), with slightly newer software. The problem persists, but the machine is no longer hard-reset by the operation so I am able to extract some debug information. The graphics core is still completely dead, and doesn't work at all until the machine is rebooted. Using:
Kernel output: DRM error dump: http://ixia.jkqxz.net/~mrt/i965/bug172_drm_error.
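For anyone else trying to collect the same kind of dump: on i915 the error state recorded after a GPU hang is normally exposed through sysfs. The following is a minimal sketch, assuming the GPU is `card0` and that the file is readable by the current user (it usually requires root); the function name and the default path constant are illustrative and not part of any existing tool.

```python
from pathlib import Path

# Assumption: the Intel GPU is card0; on multi-GPU machines the index may differ.
DEFAULT_ERROR_PATH = Path("/sys/class/drm/card0/error")


def save_gpu_error_state(dest, src=DEFAULT_ERROR_PATH):
    """Copy the i915 error state from `src` to `dest`.

    Returns the number of bytes written, or None if `src` could not be read
    (for example: no such card, or insufficient permissions).
    """
    try:
        data = Path(src).read_bytes()
    except OSError:
        return None
    Path(dest).write_bytes(data)
    return len(data)


if __name__ == "__main__":
    written = save_gpu_error_state("bug172_drm_error")
    print("no error state readable" if written is None else f"saved {written} bytes")
```

Saving the file before rebooting matters here, since the error state is held by the kernel and is lost on reset.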
I tried another SKL unit, and still cannot reproduce this issue with ffmpeg commit 991eca0f8729043724ae4574be0eb4c20bdba915 + patch FFmpeg-devel-V3-lavc-vaapi_encode_h264-Enable-MB-rate-control..patch. CPU: Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz. Full output from running the ffmpeg command with the mb_rate_control option is below:
@wangzj0601 What input file are you using? Have you tried encoding more than 100 frames? The failure is very consistent for me, but how long it takes varies by file and other settings (though usually around 200 frames). E.g. with the 1080p "Big Buck Bunny" file running: the GPU always dies when encoding frame 234. Regarding the SKU you are using, have you tried one with 23 EUs rather than 24? That is one possible difference which I suggested above and haven't been able to check. (I think both the 6Y57 and 6600K will have 24, though do correct me if I'm wrong.)
@wangzj0601 Could you try ffmpeg 2fdc9f7c4939f83a6c9d1f9d85b6d37ce0bab714 + http://ixia.jkqxz.net/~mrt/i965/mb_rc.patch? Mark has rebased the ffmpeg patch against a newer version of FFmpeg. @fhvwy I think your SKL should have 24 EUs, since the PCI ID in your DRM error dump is 0x1912. Why do you think your machine has 23 EUs?
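To compare PCI IDs across test machines without digging through an error dump, the device ID can also be read from sysfs. This is a small sketch, assuming the GPU is `card0` and the standard `/sys/class/drm` layout; the helper name is made up for illustration.

```python
from pathlib import Path


def read_pci_device_id(card="card0", sysfs_root="/sys/class/drm"):
    """Return the PCI device ID of a DRM card as an int, or None if unavailable.

    The sysfs file contains the ID as hexadecimal text such as "0x1912".
    """
    path = Path(sysfs_root) / card / "device" / "device"
    try:
        return int(path.read_text().strip(), 16)
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    dev_id = read_pci_device_id()
    print("unknown" if dev_id is None else f"PCI device ID: {dev_id:#06x}")
```

Note that the PCI ID identifies the GPU variant, not necessarily the number of enabled EUs on a particular part.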
@xhaihao See table and notes in https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl-vol04-configurations.pdf - "[a] Particular SKUs produced by Intel may have one EU disabled.". It's visible at runtime in Beignet, which indicates that it has 23 compute units while other similar machines have 24. (I assume there is an ioctl() somewhere which will return how many there are.)
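On the ioctl() question: the i915 uapi header does define a GETPARAM query for the total EU count (`I915_PARAM_EU_TOTAL`). Below is a rough sketch of calling it from Python, assuming a 64-bit Linux system, a kernel new enough to support the parameter, and that the Intel GPU is `/dev/dri/renderD128`. The constants are copied from the kernel headers as I understand them; the class and function names are made up for this example.

```python
import ctypes
import fcntl
import os

# Constants from the kernel uapi headers (drm.h / i915_drm.h).
DRM_IOCTL_BASE = ord("d")
DRM_COMMAND_BASE = 0x40
DRM_I915_GETPARAM = 0x05
I915_PARAM_EU_TOTAL = 34


class DrmI915Getparam(ctypes.Structure):
    """Mirror of struct drm_i915_getparam { __s32 param; int *value; }."""
    _fields_ = [("param", ctypes.c_int32),
                ("value", ctypes.POINTER(ctypes.c_int))]


def iowr(type_, nr, size):
    """Encode a read/write ioctl number (generic Linux layout, as on x86)."""
    return (3 << 30) | (size << 16) | (type_ << 8) | nr


def getparam_request():
    """ioctl request number corresponding to DRM_IOCTL_I915_GETPARAM."""
    return iowr(DRM_IOCTL_BASE, DRM_COMMAND_BASE + DRM_I915_GETPARAM,
                ctypes.sizeof(DrmI915Getparam))


def query_eu_total(device="/dev/dri/renderD128"):
    """Return the EU count reported by the kernel, or None on any failure."""
    result = ctypes.c_int(0)
    gp = DrmI915Getparam(I915_PARAM_EU_TOTAL, ctypes.pointer(result))
    try:
        fd = os.open(device, os.O_RDWR)
    except OSError:
        return None
    try:
        fcntl.ioctl(fd, getparam_request(), gp)
    except OSError:
        return None
    finally:
        os.close(fd)
    return result.value


if __name__ == "__main__":
    print("EU total:", query_eu_total())
```

If this reports 23 on the failing machine and 24 on the ones that cannot reproduce the hang, that would support the disabled-EU theory.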
Thanks for sharing the detailed info.
I also tried to reproduce this issue on my KBL (i7-7567U) but could not. It looks like it only happens on specific CPUs. @wangzj0601 could you try to find a Skylake 6300 (or some other version with 23 EUs) to reproduce it?
Build ffmpeg git master with @mypopydev's patch to add the mb_rate_control option: https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2017-May/211334.html.
Input file doesn't seem to matter much. To be consistent I am using the Big Buck Bunny 1080p file here.
Take steps to avoid data loss (remount all data mounts readonly, sync).
Run:
After some frames (not repeatable between runs, but at most a few hundred) the machine becomes completely unresponsive.
On some runs I get a GPU hang log on the console (transcribed) before it locks up, but not consistently:
Power-cycle to recover the machine.
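The data-loss precaution in the steps above (remount data mounts read-only, then sync) could be scripted roughly as follows. This is only a sketch: the function names are invented here, the list of mount points is something you would supply yourself, and it defaults to a dry run that prints the commands instead of executing them. Actually remounting requires root.

```python
import os
import subprocess


def remount_commands(mounts_text, data_mountpoints):
    """Build `mount -o remount,ro` commands for read-write data mounts.

    `mounts_text` is text in /proc/mounts format; only mount points listed in
    `data_mountpoints` that are currently mounted "rw" are included.
    """
    wanted = set(data_mountpoints)
    cmds = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed lines
        mountpoint, options = fields[1], fields[3].split(",")
        if mountpoint in wanted and "rw" in options:
            cmds.append(["mount", "-o", "remount,ro", mountpoint])
    return cmds


def protect_data(data_mountpoints, dry_run=True):
    """Remount the given mount points read-only, then flush pending writes."""
    with open("/proc/mounts") as f:
        cmds = remount_commands(f.read(), data_mountpoints)
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    os.sync()
    return cmds


if __name__ == "__main__":
    # Hypothetical mount points; replace with the ones that hold your data.
    protect_data(["/home", "/data"])
```

A remount can fail with "busy" if files are open for writing on the mount, so it is worth checking the result before starting the encode.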
Setup:
There are probably at least two issues here: in the VAAPI driver (because enabling mb_rate_control has broken the GPU) and in the kernel (because it didn't recover). I've only sent this here because the reproducer is here, but please do forward this if appropriate.
Possibly relevant: The same ffmpeg command with the mb_rate_control option works fine on a Skylake 6260U (GT3, 48 EUs). Could there be something about the proprietary shader binaries which only works on the larger GPU and breaks horribly on the smaller one?