cl_khr_d3d11_sharing causes tearing and artifacts on DG2 #602

Open
nyanmisaka opened this issue Jan 12, 2023 · 3 comments

@nyanmisaka
Contributor

nyanmisaka commented Jan 12, 2023

Hello there! I'm seeing tearing and artifacts when sharing textures between D3D11 and OpenCL.

Here are the main steps used to decode and share a video frame in FFmpeg:


1, hwcontext_d3d11va

  • Create a D3D11 device on DG2 and enable multithread protection via ID3D10Multithread_SetMultithreadProtected.
  • Create an ID3D11Texture2D texture array with D3D11_RESOURCE_MISC_SHARED.

2, hwcontext_opencl

  • Create the OpenCL context with CL_CONTEXT_INTEROP_USER_SYNC=0 on the same D3D11 device.
  • Create the Y and UV images from sub-resources of the ID3D11Texture2D texture array, using clCreateFromD3D11Texture2DKHR and the cl_intel_d3d11_nv12_media_sharing extension.

3, d3d11va hwaccel decoder

  • ID3D11VideoDecoder decodes a frame as NV12 or P010 into the ID3D11Texture2D texture array.

4, hwcontext_opencl

  • Acquire the image from the ID3D11Texture2D texture array with clEnqueueAcquireD3D11ObjectsKHR and wait for the event.
  • Copy the image to the host for debugging.
  • Release the acquired image with clEnqueueReleaseD3D11ObjectsKHR and wait for the event.

5, Uninitialize and clean up the decoder and hwcontexts.
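
As a rough illustration of step 2, the sketch below shows how a sub-resource index could be derived for each plane of an array slice. calcSubresource mirrors the D3D11CalcSubresource() formula from d3d11.h. planeSubresource is a hypothetical helper that assumes two consecutive indices per slice (Y first, then UV); that is only my understanding of how the NV12 sharing path addresses planes and should be checked against the extension documentation.

```cpp
#include <cassert>
#include <cstdint>

// Same formula as the D3D11CalcSubresource() inline helper in d3d11.h.
constexpr std::uint32_t calcSubresource(std::uint32_t mipSlice,
                                        std::uint32_t arraySlice,
                                        std::uint32_t mipLevels) {
    return mipSlice + arraySlice * mipLevels;
}

// ASSUMPTION (hypothetical helper, not a driver or FFmpeg API):
// each array slice exposes two consecutive indices, plane 0 = Y, plane 1 = UV.
constexpr std::uint32_t planeSubresource(std::uint32_t arraySlice,
                                         std::uint32_t plane) {
    return 2 * arraySlice + plane;
}

// Compile-time sanity checks of the arithmetic above.
static_assert(calcSubresource(0, 3, 1) == 3, "single-mip array slice 3");
static_assert(planeSubresource(0, 1) == 1, "UV plane of slice 0");
static_assert(planeSubresource(2, 0) == 4, "Y plane of slice 2");
```

The value returned by planeSubresource would be what gets passed as the subresource argument of clCreateFromD3D11Texture2DKHR, once per plane.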


Once I set the decoder thread count to 1 (-threads 1) in FFmpeg, the output image shows tearing and artifacts.

I only see the issue on DG2 and a few Xe graphics parts; both are Gen12 platforms with the latest driver (4032) installed.

For comparison, I also tried the same command line on an AMD GPU and it works fine.

So I suspect there is a flaw in the Gen12 Windows driver, since the cl_khr_d3d11_sharing extension states that the driver is responsible for providing the synchronization guarantee when CL_CONTEXT_INTEROP_USER_SYNC=0 is set at context creation.

The test video is taken from http://www.larmoire.info/jellyfish/media/jellyfish-120-mbps-4k-uhd-hevc-10bit.mkv

./ffmpeg.exe -init_hw_device d3d11va=dx -init_hw_device opencl=ocl@dx `
 -hwaccel_device dx -filter_hw_device ocl `
 -hwaccel d3d11va -hwaccel_output_format d3d11 -threads 1 `
 -c:v hevc -i "jellyfish-120-mbps-4k-uhd-hevc-10bit.mkv" -an -sn `
 -vf "hwmap=derive_device=opencl,format=opencl,hwdownload,format=p010" `
 -c:v hevc_qsv -preset veryfast -global_quality 25 -g:v 120 -y "tearing_artifacts.mp4"

You can try our pre-built custom FFmpeg, or build FFmpeg with this patch applied to enable the MISC_SHARED flag.

Thanks in advance!

@XCRobert

Hi @nyanmisaka
What event do you try to wait for? Could you provide the details?

@nyanmisaka
Contributor Author

What event do you try to wait for? Could you provide the details?

I wait for the event returned by clEnqueueAcquireD3D11ObjectsKHR and then continue to the next step. Here's the FFmpeg code:

https://github.com/FFmpeg/FFmpeg/blob/891ed24f77da99c6d41bb7c116ba5925e3206ce2/libavutil/hwcontext_opencl.c#L2551-L2562

Can you reproduce the issue with my command on Windows using an Arc dGPU?

@nyanmisaka
Contributor Author

nyanmisaka commented Apr 26, 2023

Hi @XCRobert, we found that the flushAndWait() call fails to synchronize the D3D11 texture on DG2.

template <>
void D3DSharingFunctions<D3DTypesHelper::D3D11>::flushAndWait(D3DQuery *query) {
    d3d11DeviceContext->End(query);
    d3d11DeviceContext->Flush();
    while (d3d11DeviceContext->GetData(query, nullptr, 0, 0) != S_OK)
        ;
}

void D3DSurface::synchronizeObject(UpdateData &updateData) {
    D3DLOCKED_RECT lockedRect = {};
    sharingFunctions->setDevice(resourceDevice);
    if (sharedResource && !context->getInteropUserSyncEnabled()) {
        sharingFunctions->flushAndWait(d3dQuery);
    } else if (!sharedResource) {

We ran an experiment which showed that combining flushAndWait() with an ID3D11DeviceContext_CopySubresourceRegion() call works around the problem, but at a performance cost. See intel/cartwheel-ffmpeg#243 (comment).
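
The spin loop in the flushAndWait() quoted above has no timeout and no yield. For anyone instrumenting this path, a bounded variant might look like the sketch below. pollUntilReady is a hypothetical helper and not part of compute-runtime; it takes a generic callable so that it compiles without the D3D11 headers. In the real function the callable would presumably wrap the check `d3d11DeviceContext->GetData(query, nullptr, 0, 0) == S_OK`.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: polls `isReady` until it returns true or the
// timeout expires. Returns true if the condition was met in time.
template <typename Poll>
bool pollUntilReady(Poll isReady, std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!isReady()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false; // gave up: the query never signalled
        }
        std::this_thread::yield(); // let other threads run between polls
    }
    return true;
}
```

A false return would at least distinguish "the query never completes" from "the query completes but the texture contents are still stale", which is the case described in this issue.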
