Weird tearing and artifacts in D3D11VA>>OpenCL mapping #243

Open
nyanmisaka opened this issue Apr 5, 2023 · 15 comments

@nyanmisaka

nyanmisaka commented Apr 5, 2023

Hello! I have a use case that needs OpenCL kernels to process HW-decoded frames, to make up for some functions that VPP cannot do.

d3d11va hwaccel -> d3d11 tex -> hwmap -> opencl image -> *_opencl filter

However, in testing, the mapped image shows tearing, as seen in the video. This issue only happens on Intel GPUs; I can't reproduce it on an AMD GPU with the same KHR extension. I thought it was an OpenCL runtime issue, so I also filed an issue with detailed steps in NEO.

FFmpeg was patched with the D3D11_RESOURCE_MISC_SHARED flag to allow interop.
For convenience you can also try our custom ffmpeg builds with DX11/QSV->OCL interop added.

And here's a sample video encoded in AV1 that can trigger this issue: av1_clip.zip

./ffmpeg.exe -init_hw_device d3d11va=dx -init_hw_device opencl=ocl@dx `
 -hwaccel_device dx -filter_hw_device ocl `
 -hwaccel d3d11va -hwaccel_output_format d3d11 -threads 1 `
 -c:v av1 -i "av1_clip.mp4" -an -sn `
 -vf "hwmap=derive_device=opencl,format=opencl,hwdownload,format=p010" `
 -c:v libx264 "tearing_artifacts.mp4"
Attached video: dx11_ocl_interop_tearing.mp4
@nyanmisaka
Author

nyanmisaka commented Apr 5, 2023

Note that the *_qsv decoders have no such issue; only the d3d11va hwaccel is affected.

Kindly pinging someone who might be knowledgeable about the d3d11va hwaccel. @galinart @feiwan1 @tong1wu

Thanks in advance!

@tong1wu
Contributor

tong1wu commented Apr 10, 2023

I will check on this. Thanks.

@tong1wu
Contributor

tong1wu commented Apr 10, 2023

@nyanmisaka Could you please provide the patch with your change so that I can debug on my side? I have added D3D11_RESOURCE_MISC_SHARED, but it does not seem to be enough. Currently the mapping can only handle the NV12 format. Thanks.

@nyanmisaka
Author

nyanmisaka commented Apr 10, 2023

@tong1wu Thanks for looking into it! As per the comment in NEO, the P010 format has been supported for years, but FFmpeg hasn't enabled it yet in hwcontext_opencl.c.

Here's the patch:
0001-Enable-P010-format-in-d3d11-opencl-mapping.patch

@nyanmisaka
Author

nyanmisaka commented Apr 10, 2023

And here's the other patch that enables QSV/D3D11 to OCL mapping:
0002-Add-support-for-QSV-D3D11-to-OpenCL-mapping.patch

./ffmpeg.exe -init_hw_device d3d11va=dx -init_hw_device qsv=qs@dx -init_hw_device opencl=ocl@dx `
 -hwaccel_device qs -filter_hw_device qs `
 -hwaccel qsv -hwaccel_output_format qsv `
 -c:v av1_qsv -i "av1_clip.mp4" -an -sn `
 -vf "hwmap=derive_device=opencl,format=opencl,hwdownload,format=p010" `
 -c:v libx264 "ok.mp4"

For comparison, this mapping has no issue, so I suspect there's something wrong with the d3d11va hwaccel.

@tong1wu
Contributor

tong1wu commented Apr 12, 2023

Weird. I tried on TGL:

  1. D3d11dec->opencl->d3d11->qsv->download worked fine.
  2. Rawvideo->d3d11 upload->opencl->download also worked fine.
  3. D3d11dec->qsv->opencl->download had corruption.

It doesn't look like something is going wrong with d3d11va because it works for several combinations. It seems the corruption only happens with d3d11dec->opencl followed by a download through OpenCL.

Most probably something went wrong in the driver when doing the synchronization.

@nyanmisaka
Author

nyanmisaka commented Apr 12, 2023

@tong1wu

> It doesn't look like something is going wrong with d3d11va because it works for several combinations.

It seems my original issue is specific to Intel discrete GPUs, like the DG1/Xe Max and DG2/Arc. I used to have a TGL (i7-1165g7) and it worked fine at that time. Or at least the issue was not obvious, although the output was not bit-perfect (mismatched checksums).

I also think there is a synchronization issue in FFmpeg or the driver. As far as I know, both ID3D11Device and OpenCL are thread-safe, while ID3D11DeviceContext and ID3D11VideoContext are not.

My finding is that when decoding other HEVC 4k clips using the d3d11va hwaccel with the following params, you can also get the same tearing and artifacts.

-c:v hevc -threads 1 -thread_type -slice-frame

Currently AV1 hwaccel in ffmpeg does not support threading.
The above command disables threading in HEVC hwaccel so it triggers the issue too.

[image: screenshot of a corrupted frame]

@tong1wu
Contributor

tong1wu commented Apr 13, 2023

From what I understand, if you specify -threads 1, there will be only 1 thread right? And from the code it seems -thread_type -slice-frame doesn't affect anything if threads is already set to 1. Just curious why we have this issue when the thread count is 1.

And I cannot reproduce this hevc issue on TGL. On DG2 it happens randomly, sometimes it's fine. But for the av1 issue I can reproduce it on TGL. I suspect they are different issues.

@nyanmisaka
Author

> From what I understand, if you specify -threads 1, there will be only 1 thread right?

Correct, my bad. -threads 1 implies -thread_type -slice-frame, i.e. -thread_type 0.

Either -threads 1 or -thread_type -frame can trigger the issue in the HEVC hwaccel, but -thread_type -slice still works fine.
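
To make the flag arithmetic in this comment concrete, here is a minimal, hedged sketch of how a -thread_type value such as -slice-frame clears bits in a mask. The DEMO_ constants are local stand-ins assumed to mirror FF_THREAD_FRAME (1) and FF_THREAD_SLICE (2) from libavcodec, the starting mask of slice+frame is assumed to be the default, and demo_apply_thread_type() is a hypothetical helper written for illustration, not FFmpeg's actual option parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins, assumed to mirror FF_THREAD_FRAME / FF_THREAD_SLICE. */
#define DEMO_THREAD_FRAME 1
#define DEMO_THREAD_SLICE 2

/*
 * Hypothetical helper: apply a flag string such as "-slice-frame" to a mask.
 * "+name" sets a bit, "-name" clears it, a bare "name" replaces the mask.
 * Only the names "frame" and "slice" are understood by this sketch.
 */
static int demo_apply_thread_type(int mask, const char *spec)
{
    while (*spec) {
        char cmd = 0;
        size_t len;
        int bit = 0;

        /* Optional '+' or '-' prefix in front of each flag name. */
        if (*spec == '+' || *spec == '-')
            cmd = *spec++;

        /* The flag name runs until the next prefix or the end of the string. */
        len = strcspn(spec, "+-");
        if (len == 5 && !strncmp(spec, "frame", len))
            bit = DEMO_THREAD_FRAME;
        else if (len == 5 && !strncmp(spec, "slice", len))
            bit = DEMO_THREAD_SLICE;
        assert(bit != 0 && "unknown flag name in this sketch");

        if (cmd == '+')
            mask |= bit;
        else if (cmd == '-')
            mask &= ~bit;
        else
            mask = bit;

        spec += len;
    }
    return mask;
}
```

Starting from slice+frame, "-slice-frame" leaves 0 (no threading), "-frame" leaves only slice threading, and "-slice" leaves only frame threading, which lines up with the cases described in this comment: the issue shows up whenever frame threading ends up disabled.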

On my side, both issues occur when AV_CODEC_CAP_FRAME_THREADS is disabled (HEVC) or not supported (AV1).

IMHO, if the corrupted frames have a pattern similar to the screenshot above, they should all be synchronization issues.

> And I cannot reproduce this hevc issue on TGL.

Indeed. It was hard for me to notice, but there is jittering on certain clips from time to time. Increasing the -threads value from 1 to 3 or more mitigates the issue.

> On DG2 it happens randomly, sometimes it's fine.

I never got a normal output using the above command on DG2.

I also found that this issue disappears when you manually limit the speed of the pipeline.

-vf realtime=speed=0.5,...

E.g. for the 60fps AV1 clip, it limits the pipeline speed to ~30fps and the issue disappears. However, the filter must be inserted after the decoder's d3d11/qsv output and before the hwmap=derive_device=opencl filter. This is why I suspect a d3d11va hwaccel issue.

@tong1wu
Contributor

tong1wu commented Apr 13, 2023

The OpenCL driver should have guaranteed the synchronization, as you discussed in the other issue.

I did a small experiment. Just add the following code before clEnqueueAcquireD3D11ObjectsKHR:

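/* Experiment: download src into a throwaway frame before the OpenCL acquire. */
AVFrame *tmp;
tmp = av_frame_alloc();
tmp->format = AV_PIX_FMT_P010LE;
err = av_hwframe_transfer_data(tmp, src, 0);
av_frame_free(&tmp); /* the downloaded data is never used */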

This downloads the data to a throwaway AVFrame, a path on which D3D11 must guarantee synchronization. With it, the output of your AV1 command is correct.

I think the responsibility lies with OpenCL: it does not synchronize properly and ends up working on dirty memory.

@nyanmisaka
Author

That makes sense. It invokes ID3D11DeviceContext_CopySubresourceRegion(), so the D3D11 texture from the decoder gets synchronized internally by the D3D driver before it is passed to clEnqueueAcquireD3D11ObjectsKHR().

This can be a temporary workaround, but it still degrades performance and it's not the desired behavior.

So the conclusions should be:

  1. The *_qsv decoders synchronize correctly, but the d3d11va hwaccel does not.
  2. The Windows OpenCL driver does not synchronize D3D11 textures correctly.

Could you Intel guys help me forward this issue to the OpenCL team?
It seems I've tried all channels for submitting issues to them, with no luck.

https://community.intel.com/t5/Graphics/cl-khr-d3d11-sharing-causes-tearing-and-artifacts-on-DG2-A380/m-p/1454522

Thanks again!

@nyanmisaka
Author

image

@tong1wu
Contributor

tong1wu commented Apr 14, 2023

Ok I'll try to forward it to OpenCL. Thanks.

@tong1wu
Contributor

tong1wu commented Apr 17, 2023

According to the OpenCL team, the GitHub issue will be analyzed by the first available engineer. So I guess we need to wait a bit and keep checking the status of intel/compute-runtime#602.

@nyanmisaka
Author

Thanks for your update. I'll keep an eye on it.

I made some changes to your small experiment to speed it up a little (GPU->GPU copy). It confirms that ID3D11DeviceContext_CopySubresourceRegion() can sync the texture implicitly.

Do you happen to know whether D3D11 has a texture sync function similar to vaSyncSurface() in VA-API?

    AVHWFramesContext *src_fc =
        (AVHWFramesContext*)src->hw_frames_ctx->data;
    AVD3D11VADeviceContext *device_hwctx = src_fc->device_ctx->hwctx;

#if 1
    /* For D3D11 frames, data[0] holds the texture and data[1] the array slice index. */
    int srcIdx = (intptr_t)src->data[1];
    ID3D11Resource *srcTex = (ID3D11Resource *)(ID3D11Texture2D *)src->data[0];
    ID3D11Texture2D *tmpTex = NULL;
    D3D11_TEXTURE2D_DESC srcTexDesc;
    D3D11_TEXTURE2D_DESC tmpTexDesc = {
        .Width          = src_fc->width,
        .Height         = src_fc->height,
        .MipLevels      = 1,
        .SampleDesc     = { .Count = 1 },
        .ArraySize      = 1,
        .Usage          = D3D11_USAGE_DEFAULT, //D3D11_USAGE_STAGING,
        //.CPUAccessFlags = D3D11_CPU_ACCESS_READ | D3D11_CPU_ACCESS_WRITE,
    };

    ID3D11Texture2D_GetDesc((ID3D11Texture2D *)srcTex, &srcTexDesc);
    tmpTexDesc.Format = srcTexDesc.Format;

    device_hwctx->lock(device_hwctx->lock_ctx);

    HRESULT hr = ID3D11Device_CreateTexture2D(device_hwctx->device, &tmpTexDesc, NULL, &tmpTex);
    if (FAILED(hr)) {
        av_log(src_fc, AV_LOG_ERROR, "Could not create the tmp texture (%lx)\n", (long)hr);
        device_hwctx->unlock(device_hwctx->lock_ctx);
        return AVERROR_UNKNOWN;
    }

    /* GPU->GPU copy of the decoded slice; the copy is discarded, only its implicit sync matters. */
    ID3D11DeviceContext_CopySubresourceRegion(device_hwctx->device_context,
                                              tmpTex, 0, 0, 0, 0,
                                              srcTex, srcIdx, NULL);
    ID3D11Texture2D_Release(tmpTex);

    device_hwctx->unlock(device_hwctx->lock_ctx);
#endif
