Export CreateSyncToken from GlContext to csharp. #1154

Open
leozzyzheng opened this issue Mar 1, 2024 · 11 comments
@leozzyzheng

leozzyzheng commented Mar 1, 2024

Feature Description

The member function CreateSyncTokenForCurrentExternalContext of GlContext is not exported to the C# side. It is required to synchronize the texture copy when a texture is used directly on the GPU without readback.

Current Behaviour/State

Currently, WebCamTexture is read back to the CPU and then uploaded to the GPU again for the GPU calculation. However, the WebCamTexture can be copied to a TextureFrame directly on the GPU and fed to the calculator without leaving the GPU. This works on Android at least.
But without a sync point obtained from the Unity side, the calculator won't wait for Unity's copy to finish, which results in heavy flickering. GlContext has a static member function, CreateSyncTokenForCurrentExternalContext, that is designed for this use case. Please export it to the C# side.
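
To illustrate why the consumer needs a sync point, here is a minimal CPU-side analogy in C++. It is only a sketch: a `std::shared_future` stands in for the GPU fence, and `Frame`, `SumAfterSync`, and `RunOnce` are hypothetical names, not MediaPipe or Unity APIs. The producer thread plays the role of Unity's copy and the consumer plays the role of the calculator.

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for a texture plus its producer sync token.
struct Frame {
  std::vector<int> pixels;
  std::shared_future<void> producer_done;  // plays the role of the fence
};

// Consumer side: wait on the producer's token before reading the pixels.
int SumAfterSync(const Frame& frame) {
  frame.producer_done.wait();
  int sum = 0;
  for (int p : frame.pixels) sum += p;
  return sum;
}

// Runs one producer/consumer round trip and returns what the consumer saw.
int RunOnce(int width, int value) {
  Frame frame;
  frame.pixels.assign(width, 0);
  std::promise<void> done;
  frame.producer_done = done.get_future().share();

  std::thread producer([&frame, &done, value] {
    for (int& p : frame.pixels) p = value;  // the "copy"
    done.set_value();                       // signal the fence after the copy
  });
  const int result = SumAfterSync(frame);   // never observes a half-written frame
  producer.join();
  return result;
}
```

Without the `wait()` call the consumer could read the frame while the copy is still in flight, which is the CPU-side counterpart of the flickering described above.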

Additional Context

Function position: https://github.com/google/mediapipe/blob/master/mediapipe/gpu/gl_context.h#L322

@leozzyzheng leozzyzheng added the type:feature New feature or request label Mar 1, 2024
@homuler homuler self-assigned this Mar 2, 2024
@homuler
Owner

homuler commented Mar 2, 2024

Could you please tell me the relevant code you're using to input data into the CalculatorGraph? (e.g. copy from WebCamTexture to TextureFrame, give it to the CalculatorGraph, etc...)

@leozzyzheng
Author

Could you please tell me the relevant code you're using to input data into the CalculatorGraph? (e.g. copy from WebCamTexture to TextureFrame, give it to the CalculatorGraph, etc...)

In the Run function of ImageSourceSolution<T>, I check for the GLES configType and use Graphics.ConvertTexture to do the copy on the GPU (Graphics.Copy will trigger an in-place GPU readback on the WebCamTexture). If I remove the WaitForEndOfFrame, I get a 27ms delay on my test device, but the result flickers. Adding this wait stabilizes the result, but the delay grows to 36ms. So I think I could make it wait on the GPU side instead to reduce the delay.

        // Copy current image to TextureFrame
        if (graphRunner.configType == GraphRunner.ConfigType.OpenGLES)
        {
          textureFrame.ConvertTextureFrom(imageSource.GetCurrentTexture());
          yield return new WaitForEndOfFrame();
        }
        else
        {
          ReadFromImageSource(imageSource, textureFrame);
        }

@leozzyzheng
Author

leozzyzheng commented Mar 4, 2024

After digging into it more, the member function CreateSyncToken might be more suitable for this use case, since CreateSyncTokenForCurrentExternalContext won't switch context when creating the GLsync.

@homuler
Owner

homuler commented Mar 5, 2024

I haven't investigated thoroughly yet, but I think at least the following two need to be ported to Unity.

I'm not sure if it's enough to just generate a GlSyncToken.

@homuler homuler changed the title from "Export CreateSyncTokenForCurrentExternalContext from GlContext to csharp." to "Export CreateSyncToken from GlContext to csharp." Mar 5, 2024
@leozzyzheng
Author

leozzyzheng commented Mar 5, 2024

I currently use GlContext::CreateSyncToken() and void GlTextureBuffer::Updated(std::shared_ptr<GlSyncPoint>) to set the sync point, but it's more complex than I expected before using it.

  1. Graphics API calls only happen on the rendering thread, so we need to call GL.IssuePluginEvent and set the sync point on the rendering thread. This introduces a 2ms delay on my device, so a native rendering plugin might be needed.
  2. MediaPipe uses a dedicated thread for all GL API calls, so WaitUntilRelease and CreateSyncToken will be blocked for a long time when the dedicated thread is busy dispatching compute or reading the result back from the GPU. On my device, a dispatch or readback may block the thread for 20ms, so in the worst case WaitUntilRelease and CreateSyncToken will block the MainThread for the same amount of time.
  3. The WaitUntilRelease blocking issue already exists in the current code, and CreateSyncToken will increase the chance of hitting it.
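
The blocking described in point 2 can be sketched with a toy timeline model of a single thread that runs its jobs strictly in order. This is only an illustration under stated assumptions: `SerialThreadModel` and `BlockedMs` are hypothetical names, times are whole milliseconds, and nothing here reflects how MediaPipe's GlContext is actually implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy model of one dedicated thread that executes queued jobs in order.
// Hypothetical; not MediaPipe's GlContext.
class SerialThreadModel {
 public:
  // Queues a job at `now_ms` without waiting for it to finish.
  void RunAsync(int64_t now_ms, int64_t duration_ms) {
    Enqueue(now_ms, duration_ms);
  }

  // Queues a job at `now_ms` and returns how long a synchronous caller
  // stays blocked until that job has finished.
  int64_t RunSync(int64_t now_ms, int64_t duration_ms) {
    return Enqueue(now_ms, duration_ms) - now_ms;
  }

 private:
  // Schedules the job after everything already queued and returns its
  // completion time.
  int64_t Enqueue(int64_t now_ms, int64_t duration_ms) {
    const int64_t start_ms = std::max(now_ms, busy_until_ms_);
    busy_until_ms_ = start_ms + duration_ms;
    return busy_until_ms_;
  }

  int64_t busy_until_ms_ = 0;
};

// Blocking time of a near-instant synchronous call issued `offset_ms`
// after a long job (for example a compute dispatch) started at time 0.
int64_t BlockedMs(int64_t long_job_ms, int64_t offset_ms) {
  SerialThreadModel gl_thread;
  gl_thread.RunAsync(0, long_job_ms);
  return gl_thread.RunSync(offset_ms, 0);
}
```

In this model, a synchronous call issued 1ms after a 20ms job started is blocked for `BlockedMs(20, 1)`, that is 19ms, even though the call itself does almost no work. The same call issued after the long job has finished is not blocked at all.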

I hope I can find a way to solve these issues.

@leozzyzheng
Author

leozzyzheng commented Mar 6, 2024

Finally, I managed to make it work.

  1. I use this implementation to insert the sync point:
MpReturnCode mp_GlTextureBuffer__InsertProducerSyncPoint(mediapipe::GlTextureBuffer* gl_texture_buffer) {
  TRY_ALL
    const auto& producerContext = gl_texture_buffer->GetProducerContext();
    if (producerContext) {
      gl_texture_buffer->Updated(mediapipe::GlContext::CreateSyncTokenForCurrentExternalContext(producerContext));
    }

    RETURN_CODE(MpReturnCode::Success);
  CATCH_ALL
}
  2. Call InsertProducerSyncPoint on the rendering thread in Unity to make sure it happens after the actual copy command.
  3. I need to use GlExternalFenceSyncPoint rather than GlFenceSyncPoint. The sync point is created directly from the context of the rendering thread, so there is no need to wait for context switching or thread switching.
  4. Call WaitUntilRelease of TextureFrame in OnTextureFrameRelease to avoid blocking the MainThread.

Calling InsertProducerSyncPoint through GL.IssuePluginEvent is a little ugly, I think, but it only costs 1ms of waiting and is easier than making a native rendering plugin, so I just let it be.

After all of this, the GPU readback of WebCamTexture on the MainThread is gone, and the flickering is gone too. But the latency grows to 40ms. Part of that is WaitUntilRelease time that wasn't included in the measurement before, so even with the pure GPU method, the latency won't be much better than with the normal GPU method :(

@homuler
Owner

homuler commented Mar 6, 2024

By the way, does it run faster when calling AddPacketToInputStream from the callback of AsyncGPUReadback?

@leozzyzheng
Author

By the way, does it run faster when calling AddPacketToInputStream from the callback of AsyncGPUReadback?

I think not. I have tested without setting the sync point, and the latency is the same as with a sync point. My device is powerful enough to run at 60 FPS on the MainThread, so the readback on the MainThread won't push the frame time beyond 16ms and won't affect latency.

The bottleneck is the compute dispatch and result readback on the MediaPipe thread.

@leozzyzheng
Author

leozzyzheng commented Mar 6, 2024

I get your point. Async readback is surely a better way to reduce MainThread blocking in the existing GPU method (which requires a readback).

@leozzyzheng
Author

After profiling the GPU render stage, I found that my device needs around 30ms to finish the compute shaders for pose landmark, so there might be no way to reduce the latency other than using a faster model (I already use the lite model with a 320*240 camera texture resolution).

@leozzyzheng
Author

leozzyzheng commented Mar 7, 2024

Conclusion:
Export CreateSyncTokenForCurrentExternalContext. However, the lifecycle of the newly created shared_ptr might be difficult to manage on the C# side, so I directly exported a new interface in GlTextureBuffer to simplify the usage, as in the code above.

The latency might improve if the readback is too slow in the current GPU method on your device, or be similar if the readback is quick enough.

By the way, triggering AddPacketToInputStream at RenderPipelineManager.beginContextRendering (URP) gets the camera texture earlier, since the camera texture is updated in PostLateUpdate in Unity. Currently, using WaitForEndOfFrame lets the rendering logic finish before we call AddPacketToInputStream.
