
Calculator::Process() for node "facedetectionfrontgpu__ImageToTensorCalculator" failed: Only BGRA/RGBA textures are supported, passed format: 24 #1311

Closed
jbest2015 opened this issue Nov 18, 2020 · 9 comments
@jbest2015

Having an issue with all MediaPipe builds on my system. Everything compiles fine, but when I run it I get the BGRA error:

GLOG_logtostderr=1 bazel-bin/mediapipe/examples/desktop/face_detection/face_detection_gpu --calculator_graph_config_file=mediapipe/graphs/face_detection/face_detection_mobile_gpu.pbtxt

Result:

I20201118 09:54:38.296219 8095 demo_run_graph_main_gpu.cc:51] Get calculator graph config contents: # MediaPipe graph that performs face mesh with TensorFlow Lite on GPU.

# GPU buffer. (GpuBuffer)
input_stream: "input_video"

# Output image with rendered results. (GpuBuffer)
output_stream: "output_video"

# Detected faces. (std::vector<Detection>)
output_stream: "face_detections"

# Throttles the images flowing downstream for flow control. It passes through
# the very first incoming image unaltered, and waits for downstream nodes
# (calculators and subgraphs) in the graph to finish their tasks before it
# passes through another image. All images that come in while waiting are
# dropped, limiting the number of in-flight images in most part of the graph to
# 1. This prevents the downstream nodes from queuing up incoming images and data
# excessively, which leads to increased latency and memory usage, unwanted in
# real-time mobile applications. It also eliminates unnecessarily computation,
# e.g., the output produced by a node may get dropped downstream if the
# subsequent nodes are still busy processing previous inputs.
node {
  calculator: "FlowLimiterCalculator"
  input_stream: "input_video"
  input_stream: "FINISHED:output_video"
  input_stream_info: {
    tag_index: "FINISHED"
    back_edge: true
  }
  output_stream: "throttled_input_video"
}

# Subgraph that detects faces.
node {
  calculator: "FaceDetectionFrontGpu"
  input_stream: "IMAGE:throttled_input_video"
  output_stream: "DETECTIONS:face_detections"
}

# Converts the detections to drawing primitives for annotation overlay.
node {
  calculator: "DetectionsToRenderDataCalculator"
  input_stream: "DETECTIONS:face_detections"
  output_stream: "RENDER_DATA:render_data"
  node_options: {
    [type.googleapis.com/mediapipe.DetectionsToRenderDataCalculatorOptions] {
      thickness: 4.0
      color { r: 255 g: 0 b: 0 }
    }
  }
}

# Draws annotations and overlays them on top of the input images.
node {
  calculator: "AnnotationOverlayCalculator"
  input_stream: "IMAGE_GPU:throttled_input_video"
  input_stream: "render_data"
  output_stream: "IMAGE_GPU:output_video"
}
I20201118 09:54:38.296731 8095 demo_run_graph_main_gpu.cc:57] Initialize the calculator graph.
I20201118 09:54:38.297314 8095 demo_run_graph_main_gpu.cc:61] Initialize the GPU.
I20201118 09:54:38.305438 8095 gl_context_egl.cc:158] Successfully initialized EGL. Major : 1 Minor: 5
I20201118 09:54:38.363840 8106 gl_context.cc:324] GL version: 3.2 (OpenGL ES 3.2 NVIDIA 455.45.01)
I20201118 09:54:38.363948 8095 demo_run_graph_main_gpu.cc:67] Initialize the camera or load the video.
I20201118 09:54:38.848876 8095 demo_run_graph_main_gpu.cc:88] Start running the calculator graph.
I20201118 09:54:38.849045 8095 demo_run_graph_main_gpu.cc:93] Start grabbing and processing frames.
INFO: Created TensorFlow Lite delegate for GPU.
ERROR: Following operations are not supported by GPU delegate:
DEQUANTIZE:
164 operations will run on the GPU, and the remaining 0 operations will run on the CPU.
I20201118 09:54:39.205772 8095 demo_run_graph_main_gpu.cc:175] Shutting down.
E20201118 09:54:39.238298 8095 demo_run_graph_main_gpu.cc:186] Failed to run the graph: CalculatorGraph::Run() failed in Run:
Calculator::Process() for node "facedetectionfrontgpu__ImageToTensorCalculator" failed: Only BGRA/RGBA textures are supported, passed format: 24

I am running Ubuntu 18.04 with a 2080 Ti and NVIDIA drivers 455.
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

Any help is greatly appreciated.

@AlvarezAti90

AlvarezAti90 commented Nov 19, 2020

Hello,

I faced the same problem while testing the new palm_detection_gpu and hand_landmark_tracking_gpu graphs on desktop Ubuntu (mediapipe tag 0.8.0).
I tried several things:

1. Modified the demo_run_graph_main_gpu.cc source file to convert camera_frame_raw with the cv::COLOR_BGR2RGBA flag instead of cv::COLOR_BGR2RGB, and then used mediapipe::ImageFormat::SRGBA to create the ImageFrame;
or
added a SetAlphaCalculator to the entry-level graph descriptor file.

Both of these solved the "Only BGRA/RGBA textures are supported, passed format: 24" error, but with both solutions I then got an error stack like this:

I20201119 13:44:13.844982 3912 demo_run_graph_main_gpu.cc:88] Start running the calculator graph.
I20201119 13:44:13.848717 3912 demo_run_graph_main_gpu.cc:93] Start grabbing and processing frames.
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
INFO: Replacing 119 node(s) with delegate (TfLiteGpuDelegate) node, yielding 1 partitions.
[mutex.cc : 1355] RAW: Potential Mutex deadlock:
@ 0x55b187c61839 absl::lts_2020_02_25::DebugOnlyDeadlockCheck()
@ 0x55b187c61b20 absl::lts_2020_02_25::Mutex::Lock()
@ 0x55b187420b14 absl::lts_2020_02_25::MutexLock::MutexLock()
@ 0x55b18776d0c7 std::make_unique<>()
@ 0x55b18776be40 mediapipe::Tensor::GetCpuReadView()
@ 0x55b18776121f mediapipe::TensorsToDetectionsCalculator::ProcessGPU()::{lambda()#1}::operator()()
@ 0x55b1877639d9 std::_Function_handler<>::_M_invoke()
@ 0x55b187477613 std::function<>::operator()()
@ 0x55b187b59ef8 mediapipe::GlContext::Run()::{lambda()#2}::operator()()
@ 0x55b187b5cb53 std::_Function_handler<>::_M_invoke()
@ 0x55b187477613 std::function<>::operator()()
@ 0x55b187b58826 mediapipe::GlContext::DedicatedThread::Run()
@ 0x55b187b5a0f4 mediapipe::GlContext::Run()
@ 0x55b1877715e3 mediapipe::GlCalculatorHelperImpl::RunInGlContext()
@ 0x55b18776e347 mediapipe::GlCalculatorHelper::RunInGlContext()
@ 0x55b18776174c mediapipe::TensorsToDetectionsCalculator::ProcessGPU()
@ 0x55b18775e82f mediapipe::TensorsToDetectionsCalculator::Process()
@ 0x55b187b340f0 mediapipe::CalculatorNode::ProcessNode()
@ 0x55b187b0b76d mediapipe::internal::SchedulerQueue::RunCalculatorNode()
@ 0x55b187b0b0f8 mediapipe::internal::SchedulerQueue::RunNextTask()
@ 0x55b187b09c95 mediapipe::Executor::AddTask()::{lambda()#1}::operator()()
@ 0x55b187b09e9f std::_Function_handler<>::_M_invoke()
@ 0x55b1879e0dfe std::function<>::operator()()
@ 0x55b187b5a28f mediapipe::GlContext::RunWithoutWaiting()::{lambda()#1}::operator()()
@ 0x55b187b5cc6b std::_Function_handler<>::_M_invoke()
@ 0x55b1879e0dfe std::function<>::operator()()
@ 0x55b187b585f0 mediapipe::GlContext::DedicatedThread::ThreadBody()
@ 0x55b187b5857e mediapipe::GlContext::DedicatedThread::ThreadBody()
@ 0x7fd4a527a669 start_thread

[mutex.cc : 1365] RAW: Acquiring 0x7fd48ca378e8 Mutexes held: 0x55b189860558 0x7fd48c0afc98
[mutex.cc : 1367] RAW: Cycle:
[mutex.cc : 1381] RAW: mutex@0x7fd48ca378e8 stack:
@ 0x55b187c61839 absl::lts_2020_02_25::DebugOnlyDeadlockCheck()
@ 0x55b187c61b20 absl::lts_2020_02_25::Mutex::Lock()
@ 0x55b187420b14 absl::lts_2020_02_25::MutexLock::MutexLock()
@ 0x55b18776d0c7 std::make_unique<>()
@ 0x55b18776be40 mediapipe::Tensor::GetCpuReadView()
@ 0x55b18776121f mediapipe::TensorsToDetectionsCalculator::ProcessGPU()::{lambda()#1}::operator()()
@ 0x55b1877639d9 std::_Function_handler<>::_M_invoke()
@ 0x55b187477613 std::function<>::operator()()
@ 0x55b187b59ef8 mediapipe::GlContext::Run()::{lambda()#2}::operator()()
@ 0x55b187b5cb53 std::_Function_handler<>::_M_invoke()
@ 0x55b187477613 std::function<>::operator()()
@ 0x55b187b58826 mediapipe::GlContext::DedicatedThread::Run()
@ 0x55b187b5a0f4 mediapipe::GlContext::Run()
@ 0x55b1877715e3 mediapipe::GlCalculatorHelperImpl::RunInGlContext()
@ 0x55b18776e347 mediapipe::GlCalculatorHelper::RunInGlContext()
@ 0x55b18776174c mediapipe::TensorsToDetectionsCalculator::ProcessGPU()
@ 0x55b18775e82f mediapipe::TensorsToDetectionsCalculator::Process()
@ 0x55b187b340f0 mediapipe::CalculatorNode::ProcessNode()
@ 0x55b187b0b76d mediapipe::internal::SchedulerQueue::RunCalculatorNode()
@ 0x55b187b0b0f8 mediapipe::internal::SchedulerQueue::RunNextTask()
@ 0x55b187b09c95 mediapipe::Executor::AddTask()::{lambda()#1}::operator()()
@ 0x55b187b09e9f std::_Function_handler<>::_M_invoke()
@ 0x55b1879e0dfe std::function<>::operator()()
@ 0x55b187b5a28f mediapipe::GlContext::RunWithoutWaiting()::{lambda()#1}::operator()()
@ 0x55b187b5cc6b std::_Function_handler<>::_M_invoke()
@ 0x55b1879e0dfe std::function<>::operator()()
@ 0x55b187b585f0 mediapipe::GlContext::DedicatedThread::ThreadBody()
@ 0x55b187b5857e mediapipe::GlContext::DedicatedThread::ThreadBody()
@ 0x7fd4a527a669 start_thread

[mutex.cc : 1381] RAW: mutex@0x7fd48c0afc98 stack:
@ 0x55b187c61839 absl::lts_2020_02_25::DebugOnlyDeadlockCheck()
@ 0x55b187c61b20 absl::lts_2020_02_25::Mutex::Lock()
@ 0x55b187420b14 absl::lts_2020_02_25::MutexLock::MutexLock()
@ 0x55b18776d0c7 std::make_unique<>()
@ 0x55b18776b48e mediapipe::Tensor::GetOpenGlBufferWriteView()
@ 0x55b187760f8f mediapipe::TensorsToDetectionsCalculator::ProcessGPU()::{lambda()#1}::operator()()
@ 0x55b1877639d9 std::_Function_handler<>::_M_invoke()
@ 0x55b187477613 std::function<>::operator()()
@ 0x55b187b59ef8 mediapipe::GlContext::Run()::{lambda()#2}::operator()()
@ 0x55b187b5cb53 std::_Function_handler<>::_M_invoke()
@ 0x55b187477613 std::function<>::operator()()
@ 0x55b187b58826 mediapipe::GlContext::DedicatedThread::Run()
@ 0x55b187b5a0f4 mediapipe::GlContext::Run()
@ 0x55b1877715e3 mediapipe::GlCalculatorHelperImpl::RunInGlContext()
@ 0x55b18776e347 mediapipe::GlCalculatorHelper::RunInGlContext()
@ 0x55b18776174c mediapipe::TensorsToDetectionsCalculator::ProcessGPU()
@ 0x55b18775e82f mediapipe::TensorsToDetectionsCalculator::Process()
@ 0x55b187b340f0 mediapipe::CalculatorNode::ProcessNode()
@ 0x55b187b0b76d mediapipe::internal::SchedulerQueue::RunCalculatorNode()
@ 0x55b187b0b0f8 mediapipe::internal::SchedulerQueue::RunNextTask()
@ 0x55b187b09c95 mediapipe::Executor::AddTask()::{lambda()#1}::operator()()
@ 0x55b187b09e9f std::_Function_handler<>::_M_invoke()
@ 0x55b1879e0dfe std::function<>::operator()()
@ 0x55b187b5a28f mediapipe::GlContext::RunWithoutWaiting()::{lambda()#1}::operator()()
@ 0x55b187b5cc6b std::_Function_handler<>::_M_invoke()
@ 0x55b1879e0dfe std::function<>::operator()()
@ 0x55b187b585f0 mediapipe::GlContext::DedicatedThread::ThreadBody()
@ 0x55b187b5857e mediapipe::GlContext::DedicatedThread::ThreadBody()
@ 0x7fd4a527a669 start_thread

[mutex.cc : 1386] RAW: dying due to potential deadlock

I wonder if you would get something similar if you tried these solutions.

2. My second guess was to replace some calculators with their older versions:
ImageToTensorCalculator -> ImageTransformationCalculator and TfLiteConverterCalculator
InferenceCalculator -> TfLiteInferenceCalculator
TensorsToDetectionsCalculator -> TfLiteTensorsToDetectionsCalculator

Since the ImageTransformationCalculator supports not only RGBA but also RGB input, it could run with the original input from demo_run_graph_main_gpu.cc without any additional pixel format conversion. The graph could run and detect hands, but it performed far worse than the CPU version in the same version (tag 0.8.0) or than earlier versions with GPU.

While I was trying to find other solutions, I found a note in the new TensorConverterCalculator:
// IMPORTANT Notes:
// GPU tensors are currently only supported on mobile platforms.

So, I assume this is a known issue in the latest version.

I would also appreciate any help on this.

@AlvarezAti90

Just a quick update related to the error stack mentioned in point 1 of my last comment:
As the error suggests, a mutex is locked a second time without first being released.
It can be fixed by closing the scope in tensors_to_detections.cc (TensorsToDetectionsCalculator::ProcessGPU) at row 435 and opening a new scope on the next row, so that the decoded_boxes_view and scored_boxes_view variables are separated.

After this fix, the graph can run with GPU support, but sadly it still performs poorly (in terms of how often it detects hands). The CPU version somehow performs much better on the same video sample. I didn't see such a difference in earlier versions.

@AlvarezAti90

AlvarezAti90 commented Nov 20, 2020

Sorry, I ran some more tests, and with the two fixes I mentioned before, the GPU and CPU versions eventually perform similarly.
In summary, these are the steps to fix the issue:

1.
a. Modify the demo_run_graph_main_gpu.cc source file to convert camera_frame_raw with the cv::COLOR_BGR2RGBA flag instead of cv::COLOR_BGR2RGB, and then use mediapipe::ImageFormat::SRGBA to create the ImageFrame:

row 101: cv::cvtColor(camera_frame_raw, camera_frame, cv::COLOR_BGR2RGBA);
row 108: mediapipe::ImageFormat::SRGBA, camera_frame.cols, camera_frame.rows,

OR

b. Add a SetAlphaCalculator to the entry-level graph descriptor file; in your case, face_detection_mobile_gpu.pbtxt.

Change input_stream: "input_video" to input_stream: "input_videoraw" in row 4,

and add:

node {
  calculator: "SetAlphaCalculator"
  input_stream: "IMAGE_GPU:input_videoraw"
  output_stream: "IMAGE_GPU:input_video"
  node_options: {
    [type.googleapis.com/mediapipe.SetAlphaCalculatorOptions] {
      alpha_value: 255
    }
  }
}

In this case you also need to change demo_run_graph_main_gpu.cc row 33 to constexpr char kInputStream[] = "input_videoraw";

Also, do not forget to add the calculator to the BUILD file (mediapipe/graphs/face_detection/BUILD) under mobile_calculators, for example at row 22, like this:
"//mediapipe/calculators/image:set_alpha_calculator",

2.
Fix the mutex issue by closing the scope in tensors_to_detections.cc (TensorsToDetectionsCalculator::ProcessGPU) at row 435 and opening a new scope on the next row, so that the decoded_boxes_view and scored_boxes_view variables are separated.
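For anyone wondering why the scope split in step 2 matters: each tensor view holds a lock for its entire lifetime, so keeping two views alive in one scope holds two mutexes at once, which is what absl's deadlock checker trips on. A self-contained sketch of the pattern in standard C++ (Tensor and View here are illustrative stand-ins, not MediaPipe's actual types):

```cpp
#include <cassert>
#include <mutex>

// Stand-in for mediapipe::Tensor: a view keeps the tensor's mutex locked for
// as long as the view object lives (similar in spirit to GetCpuReadView()).
struct Tensor {
  std::mutex mu;
  int data = 0;
  struct View {
    std::unique_lock<std::mutex> lock;  // released in View's destructor
    int* data;
  };
  View GetView() { return View{std::unique_lock<std::mutex>(mu), &data}; }
};

int ProcessTwoTensors(Tensor& boxes, Tensor& scores) {
  int sum = 0;
  {  // First scope: decoded_boxes_view's lock is released at the closing brace.
    auto decoded_boxes_view = boxes.GetView();
    sum += *decoded_boxes_view.data;
  }
  {  // Second scope opens only after the first lock is gone, so at most one
     // lock is ever held at a time.
    auto scored_boxes_view = scores.GetView();
    sum += *scored_boxes_view.data;
  }
  return sum;
}

int DemoSum() {
  Tensor a, b;
  a.data = 1;
  b.data = 2;
  return ProcessTwoTensors(a, b);
}
```

Without the inner braces, both views (and both locks) would live to the end of the function, recreating the "Mutexes held" situation in the stack trace.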

@jbest2015
Author

Wow, this is great! Thank you so much for your help. Just a quick question: I am working on adding a feature to the desktop version of multi-hand tracking that labels left versus right. I thought I saw this out there as a modification to a calculator at some point, but I don't see it anymore. Any chance you have this around as well?

@AlvarezAti90

If I understand correctly, you are looking for the mediapipe/graphs/hand_tracking/subgraphs/hand_renderer_gpu.pbtxt file.
There is a GateCalculator at line 156 which disallows rendering the handedness label when both hands are detected.
So the demo app renders handedness data at a fixed position in the picture only when a single hand is detected in the frame.

If you want to render the handedness next to the related hand, you need to use multi_hand_rects to position the labels.
So you need to edit this subgraph a little, and you also need to extend the LabelsToRenderDataCalculator functionality so it can receive a starting location (for example, the rect of the hand) through an input stream and offset the label starting position accordingly. Currently these are fixed parameter options in the graph descriptor file: horizontal_offset_px and vertical_offset_px.

@jbest2015
Author

Yes sir, and I have both of those functions working, but the current hand landmarks only show 0 and 1, where 0 is the first hand detected and 1 is the second (regardless of whether it is left or right). I have seen some code that uses the palm vector in relation to the thumb position to tell left from right, but I cannot find it now. I was able to do it in Python, but for speed's sake I would prefer to do it in the calculator as you mentioned; I'm just struggling a bit in C++ without things like NumPy. I was hoping it was lying around and I missed it. Again, thank you for your prompt response and help! You have been great.
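Not from MediaPipe itself, but the geometric trick described above can be sketched with the sign of a 2D cross product over palm landmarks. Everything here is an assumption/sketch: the indices follow the standard 21-point hand model (0 = wrist, 5 = index MCP, 17 = pinky MCP), and the Left/Right mapping only holds for a known palm orientation, so the sign must be calibrated against known samples:

```cpp
#include <cassert>
#include <string>

struct Landmark { float x, y; };  // normalized image coordinates

// Sign of the 2D cross product of (wrist -> index MCP) x (wrist -> pinky MCP).
// With x growing right and y growing down, and the palm facing the camera,
// the sign differs between a left and a right hand.
std::string GuessHandedness(const Landmark& wrist, const Landmark& index_mcp,
                            const Landmark& pinky_mcp) {
  const float ax = index_mcp.x - wrist.x, ay = index_mcp.y - wrist.y;
  const float bx = pinky_mcp.x - wrist.x, by = pinky_mcp.y - wrist.y;
  const float cross = ax * by - ay * bx;
  // NOTE: this mapping is a guess and flips when the palm faces away from
  // the camera; verify the sign on labeled frames before relying on it.
  return cross > 0 ? "Left" : "Right";
}
```

In a calculator you would read these points from the NormalizedLandmarkList input stream rather than constructing them by hand, and could emit the string alongside the existing 0/1 hand index.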

@mcclanahoochie mcclanahoochie self-assigned this Nov 20, 2020
@jbest2015
Author

Confirmed that your solution solved my original issue. I did number 1, and I changed row 442 in tensors_to_detections_calculator.cc from:
glDispatchCompute(num_boxes_, 1, 1);
to:
glDispatchCompute(num_boxes_ + 1, 1, 1);

@jbest2015
Author

Following up on the handedness: I was compiling the wrong version (too many adjustments trying to fix my original version). I see exactly what you are talking about, and I am going to attempt the modifications to label the hands above the hand images.

@AlvarezAti90

Great! I'm happy it helped!
