
How to go about integrating mediapipe's hand tracking into an existing Visual Studio C++ project? #1162

Closed
CristianNCC opened this issue Sep 29, 2020 · 102 comments

Comments

@CristianNCC

I must admit that I am quite a newb when it comes to build systems. So far I've managed to build the hello_world and the hand_tracking examples on Windows (jumping through quite a few hoops along the way). Of course, building those examples gives me an executable, which I can't integrate into another project.

What I'd like to do is have mediapipe as a dll or something similar so I can pass the framework an image and get joint coordinates in return, ideally. Can someone point me in the right direction?

@CristianNCC CristianNCC changed the title How to go about integrate mediapipe's hand tracking into an existing Visual Studio C++ project? How to go about integrating mediapipe's hand tracking into an existing Visual Studio C++ project? Sep 29, 2020
@BitLoose

An example of this would be useful as I can't see any clear explanation of how mediapipe is integrated into a standard c++ application.

@ttamas0713

An example of this would be useful as I can't see any clear explanation of how mediapipe is integrated into a standard c++ application.

You can copy or modify the existing file called demo_run_graph_main.cc or demo_run_graph_main_gpu.cc at /mediapipe/examples/desktop. Also, simple_main_graph.cc (the C++ file of the hello_world example) can be useful to learn how to start a graph and get its output.

@CristianNCC

CristianNCC commented Oct 25, 2020

An example of this would be useful as I can't see any clear explanation of how mediapipe is integrated into a standard c++ application.

You can copy or modify the existing file called demo_run_graph_main.cc or demo_run_graph_main_gpu.cc at /mediapipe/examples/desktop. Also, simple_main_graph.cc (the C++ file of the hello_world example) can be useful to learn how to start a graph and get its output.

Well, I've tried that in many ways already :/. What I mean is, I've tried to change the Bazel BUILD file in /mediapipe/examples/desktop for demo_run_graph_main so it builds a DLL that you can simply plug into another application along with a header file. From what I've seen, the cc_library Bazel rule offers a "linkstatic" flag that is true by default and can be set to false. The output binaries didn't seem to differ when I changed that one flag, though...

@jerrysheen

I'm also interested in this; have you guys already figured out how to do this?

@shubhamvad

Hi guys, any update on this? I am also interested to know the solution to this. Thanks in advance.

@ghost

ghost commented Feb 4, 2021

I'm also trying to figure out the same thing, eagerly looking forward to this feature.

@FabianWildgrube

Same here. I would be very interested to see how this could be done.

@vamsee-asu2019

vamsee-asu2019 commented Mar 13, 2021

Have you guys figured this out?
I am trying to use MediaPipe in another C++ project. Any leads would be helpful.

@travisjayday

Bump. As a fellow noob to the framework, the task of integration into existing C++ projects seems really daunting without guidance.

@BitLoose

BitLoose commented Jun 16, 2021 via email

@sgowroji sgowroji self-assigned this Jul 20, 2021
@sgowroji sgowroji added legacy:hands Hand tracking/gestures/etc stat:awaiting response Waiting for user response type:others issues not falling in bug, performance, support, build and install or feature labels Jul 20, 2021
@carter54

Same here. An example of how to integrate a MediaPipe application into an existing C++ project would be very helpful~

@google-ml-butler

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.

@sgowroji sgowroji added stat:awaiting googler Waiting for Google Engineer's Response and removed stat:awaiting response Waiting for user response stale labels Aug 10, 2021
@sgowroji sgowroji assigned chuoling and unassigned sgowroji Aug 10, 2021
@sgowroji sgowroji added the platform:desktop desktop label Aug 10, 2021
@sgowroji sgowroji reopened this Aug 10, 2021
@sgowroji sgowroji added type:docs-feature Doc issues for new solution, or clarifications about functionality and removed type:others issues not falling in bug, performance, support, build and install or feature labels Aug 10, 2021
@rajkundu

rajkundu commented Feb 2, 2023

@rajkundu Hello, I have modified a version based on your fork. Now it works like a charm, single header file only; I added some APIs to easily convert data from a Packet without much proto writing.

I'm a C++ beginner, so my understanding of C++ is rather limited. However, when designing LibMP, there were some tradeoffs that I had to consider; I often prioritized compatibility when making design choices. One thing I learned early-on is that passing C++ objects across DLL boundaries is undefined. Apparently, this is due to C++'s lack of a standardized Application Binary Interface (ABI). To get around this, all of the functions in libmp.h return simple, C data types.

Regarding the following functions which you added to your fork of LibMP:

virtual const mediapipe::NormalizedLandmarkList DirectlyGetNormalizedLandmarkList(const char *outputStream) = 0;
virtual const std::vector<mediapipe::NormalizedLandmarkList> DirectlyGetNormalizedLandmarkListVec(const char *outputStream) = 0;
virtual cv::Mat *DirectlyGetAsCvMatWrong(const char *outputStream) = 0;
virtual cv::Mat *DirectlyGetAsCvMat(const char *outputStream) = 0;
virtual cv::Mat DirectlyGetAsCvMat2(const char *outputStream) = 0;

Due to passing complex C++ data types/objects, I believe that this code may be unstable. It only works if you use compatible (i.e., the same) compilers to build LibMP and your LibMP "client" application. It seems that this worked for your particular project/build environment/machine, but my understanding is that this is a coincidence - that others may not have the same luck as you. So, while it is certainly not ideal to be casting to and reinterpreting from void*, etc., LibMP does this to ensure that it works across as many different build environments as possible. As long as you're on the same platform, you should be able to compile LibMP with one compiler (i.e., not have to care what Bazel uses), then build your client application with any compiler of your choice.

My overall goal with LibMP was pretty much to interact with Bazel as little as possible - a "compile once, run anywhere" sort-of approach. Because it is a shared library, you should only have to compile LibMP once using Bazel. Then, you should be able to import it into any number of other projects on that same machine. Not only that, but if I understand correctly, then LibMP DLLs/SOs can actually be distributed (along with header files), so that others don't have to build LibMP themselves. In theory, after building LibMP, I should be able to compress all files (resolving Bazel symlinks) into an archive, which others using the same platform can simply unzip for use in their own projects. (If anyone is interested in this, please let me know!)

So, in the interest of maximizing compatibility, LibMP is limited to C data types and is therefore somewhat inconvenient to use. Of course, I encourage you to make improvements/optimizations for your own use case of LibMP, but do be aware of some potential limitations of shared libraries, especially those mentioned here.
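
For illustration, here is a minimal sketch of the kind of C-compatible DLL boundary described above. The header and function names are hypothetical, not LibMP's actual API; the point is only that plain C types cross the boundary, and proto data would be handed over as serialized bytes that the client parses with its own protobuf headers.

// hypothetical_mp_api.h -- hypothetical C-style boundary, NOT LibMP's real interface.
// Only plain C types cross the DLL boundary; all C++ objects stay inside the DLL.
#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

// Opaque handle; the client never sees the underlying C++ type.
typedef void* mp_handle;

// Create/destroy a graph instance from a pbtxt config string.
mp_handle mp_create(const char* graph_config, const char* input_stream);
void mp_destroy(mp_handle h);

// Feed one RGB frame (8-bit, row-major). Returns 0 on success.
int mp_process_frame(mp_handle h, const uint8_t* rgb, int width, int height);

// Copy the serialized proto bytes of the latest packet on output_stream into a
// caller-provided buffer. Returns the number of bytes written, or -1 on error.
int mp_copy_packet_bytes(mp_handle h, const char* output_stream,
                         uint8_t* out_buf, int out_buf_size);

#ifdef __cplusplus
}
#endif

With a boundary like this, a client built with any compiler on the same platform can parse the returned bytes with protobuf's ParseFromArray() using its own copy of the proto headers, which is the compatibility property described above.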

PRs are absolutely more than welcome to LibMP, but I don't think they should come at the cost of compatibility.

@lucasjinreal

Hello, thank you for the answer. I have solved all the problems; now it works like a charm.

@rajkundu

rajkundu commented Feb 3, 2023

@jinfagang Yay - glad to hear that! 😁

@lucasjinreal

@rajkundu Hello, for god's sake, I've hit another problem now.

I don't know why MediaPipe wouldn't support Windows 11, but it actually has problems at link time:

[screenshot of linker errors]

Did you get a chance to test on Windows 11? Can you build it on Win11?

As many, many users are now on Windows 11, supporting it should be a MUST to move this software forward.

@rajkundu

rajkundu commented Feb 4, 2023

I don’t think it’s a Windows 11 problem, because I primarily developed LibMP on a Windows 11 machine (Intel x64) myself. In terms of the compiler, I used MSVC 17.3.5.

Which compiler are you using? Are you setting the Bazel environment variables properly? See the build_libmp_win scripts in the top level of my MediaPipe fork, and see this page for more information on using Bazel with Windows.

@rajkundu

rajkundu commented Feb 4, 2023

I think we should move discussion to rajkundu/libmp-example#2 to avoid (further) polluting this thread. Sorry, everyone!

@lucasjinreal

@rajkundu It may be because of the Visual Studio version; I am using 2019, which is fine, but 2022 gives the above errors.

@oUp2Uo

oUp2Uo commented Feb 16, 2023

Hello @rajkundu, thanks for your great work; I'm using LibMP well now.
I have one more question:
Right now we can get multiFaceLandmarks from the result via libmp.h, from which we can get the landmarks.
Could we also get multiFaceGeometry from the result? In that case, we could get the rotation matrix of the faces, I guess?
I thought the way to get multiFaceGeometry would be similar to multiFaceLandmarks.
I have tried to read and understand the code, but I still do not know how to add it.
Do you have any idea about this? Thanks.

@rajkundu

@oUp2Uo In general, to access any MediaPipe output stream such as multi_face_geometry, you must first ensure that it is defined in your MediaPipe graph. E.g., in my libmp-example repository, I am using a slightly modified version of MediaPipe's face_mesh_desktop_live example graph, so only the multi_face_landmarks output stream is currently defined.

So, my first step to incorporate multi_face_geometry would be to find a MediaPipe example graph(s) that uses it - e.g., single_face_geometry_from_landmarks_gpu.pbtxt. Now, try to manually combine the necessary parts of these graphs' configs with the face_mesh_desktop_live graph which you're currently using for landmarks. Copy over nodes and adjust their connections as needed. For example, the ImagePropertiesCalculator in single_face_geometry_from_landmarks_gpu takes in a GPU image, but try passing it the throttled_input_video stream instead.

Here is my first draft of a combined graph:

# MediaPipe graph that performs face mesh with TensorFlow Lite on CPU.
# Input image. (ImageFrame)
input_stream: "input_video"
# Output image with rendered results. (ImageFrame)
output_stream: "output_video"
# Collection of detected/processed faces, each represented as a list of
# landmarks. (std::vector<NormalizedLandmarkList>)
output_stream: "multi_face_landmarks"

### Experimental face geometry stuff ###
# A list of geometry data for a single detected face.
#
# NOTE: there will not be an output packet in this stream for this particular
# timestamp if none of faces detected.
#
# (std::vector<face_geometry::FaceGeometry>)
output_stream: "multi_face_geometry"

# Throttles the images flowing downstream for flow control. It passes through
# the very first incoming image unaltered, and waits for downstream nodes
# (calculators and subgraphs) in the graph to finish their tasks before it
# passes through another image. All images that come in while waiting are
# dropped, limiting the number of in-flight images in most part of the graph to
# 1. This prevents the downstream nodes from queuing up incoming images and data
# excessively, which leads to increased latency and memory usage, unwanted in
# real-time mobile applications. It also eliminates unnecessary computation,
# e.g., the output produced by a node may get dropped downstream if the
# subsequent nodes are still busy processing previous inputs.
node {
	calculator: "FlowLimiterCalculator"
	input_stream: "input_video"
	input_stream: "FINISHED:output_video"
	input_stream_info: {
		tag_index: "FINISHED"
		back_edge: true
	}
	output_stream: "throttled_input_video"
}
# Defines side packets for further use in the graph.
node {
	calculator: "ConstantSidePacketCalculator"
	output_side_packet: "PACKET:0:num_faces"
	output_side_packet: "PACKET:1:use_prev_landmarks"
	output_side_packet: "PACKET:2:with_attention"
	node_options: {
		[type.googleapis.com/mediapipe.ConstantSidePacketCalculatorOptions]: {
			packet { int_value: 1 }
			packet { bool_value: true }
			packet { bool_value: true }
		}
	}
}
# Subgraph that detects faces and corresponding landmarks.
node {
	calculator: "FaceLandmarkFrontCpu"
	input_stream: "IMAGE:throttled_input_video"
	input_side_packet: "NUM_FACES:num_faces"
	input_side_packet: "USE_PREV_LANDMARKS:use_prev_landmarks"
	input_side_packet: "WITH_ATTENTION:with_attention"
	output_stream: "LANDMARKS:multi_face_landmarks"
	output_stream: "ROIS_FROM_LANDMARKS:face_rects_from_landmarks"
	output_stream: "DETECTIONS:face_detections"
	output_stream: "ROIS_FROM_DETECTIONS:face_rects_from_detections"
}
# Subgraph that renders face-landmark annotation onto the input image.
node {
	calculator: "FaceRendererCpu"
	input_stream: "IMAGE:throttled_input_video"
	input_stream: "LANDMARKS:multi_face_landmarks"
	input_stream: "NORM_RECTS:face_rects_from_landmarks"
	input_stream: "DETECTIONS:face_detections"
	output_stream: "IMAGE:output_video"
}

### Experimental face geometry stuff ###
### Adapted from https://github.com/google/mediapipe/blob/master/mediapipe/graphs/face_effect/subgraphs/single_face_geometry_from_landmarks_gpu.pbtxt

# Generates an environment that describes the current virtual scene.
node {
	calculator: "FaceGeometryEnvGeneratorCalculator"
	output_side_packet: "ENVIRONMENT:environment"
	node_options: {
		[type.googleapis.com/mediapipe.FaceGeometryEnvGeneratorCalculatorOptions] {
			environment: {
				origin_point_location: TOP_LEFT_CORNER
				perspective_camera: {
					vertical_fov_degrees: 63.0	# 63 degrees
					near: 1.0	# 1cm
					far: 10000.0	# 100m
				}
			}
		}
	}
}

# Extracts a single set of face landmarks associated with the most prominent
# face detected from a collection.
node {
	calculator: "SplitNormalizedLandmarkListVectorCalculator"
	input_stream: "multi_face_landmarks"
	output_stream: "face_landmarks"
	node_options: {
		[type.googleapis.com/mediapipe.SplitVectorCalculatorOptions] {
			ranges: { begin: 0 end: 1 }
			element_only: true
		}
	}
}
# Extracts the input image frame dimensions as a separate packet.
node {
	calculator: "ImagePropertiesCalculator"
	input_stream: "IMAGE:throttled_input_video"
	output_stream: "SIZE:input_image_size"
}
# Applies smoothing to the single set of face landmarks.
node {
	calculator: "FaceLandmarksSmoothing"
	input_stream: "NORM_LANDMARKS:face_landmarks"
	input_stream: "IMAGE_SIZE:input_image_size"
	output_stream: "NORM_FILTERED_LANDMARKS:smoothed_face_landmarks"
}
# Puts the single set of smoothed landmarks back into a collection to simplify
# passing the result into the `FaceGeometryFromLandmarks` subgraph.
node {
	calculator: "ConcatenateNormalizedLandmarkListVectorCalculator"
	input_stream: "smoothed_face_landmarks"
	output_stream: "multi_smoothed_face_landmarks"
}
# Computes face geometry from face landmarks for a single face.
node {
	calculator: "FaceGeometryFromLandmarks"
	input_stream: "MULTI_FACE_LANDMARKS:multi_smoothed_face_landmarks"
	input_stream: "IMAGE_SIZE:input_image_size"
	input_side_packet: "ENVIRONMENT:environment"
	output_stream: "MULTI_FACE_GEOMETRY:multi_face_geometry"
}

I haven't tested it at all, but hopefully it can be a starting point for you. Let me know if I can clarify any further!

@oUp2Uo

oUp2Uo commented Feb 17, 2023

@rajkundu Thanks for your information! I will try to understand this.

@oUp2Uo

oUp2Uo commented Feb 17, 2023

@rajkundu I have a question: in the graph, the landmark result is output_stream: "LANDMARKS:multi_face_landmarks"; how did you know that mediapipe::NormalizedLandmarkList should be used to get the data?
I am asking because I tried, but I do not know which class I should use to get the data of output_stream: "MULTI_FACE_GEOMETRY:multi_face_geometry".

Edit 20230220:
I found in another issue (#1253) that the face_geometry class should be defined in "mediapipe/modules/face_geometry/protos/face_geometry.pb.h".
But I searched the compiled libmp folder, and there was no face_geometry.pb.h file.

Edit 20230220 2:
After adding "//mediapipe/modules/face_geometry:face_geometry_from_detection", to the LIB_DEPS section in mediapipe\examples\desktop\libmp\BUILD, I got face_geometry.pb.h.
I also added "//mediapipe/modules/face_geometry:env_generator_calculator", to LIB_DEPS.
Now I can compile the new program.
But it still fails at runtime with Could not find type "type.googleapis.com/mediapipe.FaceGeometryEnvGeneratorCalculatorOptions" stored in google.protobuf.Any.

Edit 20230221:
After trying many, many times, I found that adding "//mediapipe/modules/face_geometry:geometry_pipeline_calculator", to LIB_DEPS solves the Could not find type "type.googleapis.com/mediapipe.FaceGeometryEnvGeneratorCalculatorOptions" stored in google.protobuf.Any error.

Edit 20230221 2:
After applying the patch in #2867, the graph can run now.

Edit 20230222:
By writing a function like static std::vector<std::vector<std::array<float, 3>>> get_landmarks(const std::shared_ptr<mediapipe::LibMP>& face_mesh),
which uses the same approach to get the multi_face_geometry data, I can retrieve the matrix now.
But it returns only one matrix even when there are more faces.

Edit 20230222 2:
Changed the graph from:

# Extracts the input image frame dimensions as a separate packet.
node {
	calculator: "ImagePropertiesCalculator"
	input_stream: "IMAGE:throttled_input_video"
	output_stream: "SIZE:input_image_size"
}
# Applies smoothing to the single set of face landmarks.
node {
	calculator: "FaceLandmarksSmoothing"
	input_stream: "NORM_LANDMARKS:face_landmarks"
	input_stream: "IMAGE_SIZE:input_image_size"
	output_stream: "NORM_FILTERED_LANDMARKS:smoothed_face_landmarks"
}
# Puts the single set of smoothed landmarks back into a collection to simplify
# passing the result into the `FaceGeometryFromLandmarks` subgraph.
node {
	calculator: "ConcatenateNormalizedLandmarkListVectorCalculator"
	input_stream: "smoothed_face_landmarks"
	output_stream: "multi_smoothed_face_landmarks"
}
# Computes face geometry from face landmarks for a single face.
node {
	calculator: "FaceGeometryFromLandmarks"
	input_stream: "MULTI_FACE_LANDMARKS:multi_smoothed_face_landmarks"
	input_stream: "IMAGE_SIZE:input_image_size"
	input_side_packet: "ENVIRONMENT:environment"
	output_stream: "MULTI_FACE_GEOMETRY:multi_face_geometry"
}

to

node {
	calculator: "FaceGeometryFromLandmarks"
	input_stream: "IMAGE_SIZE:input_image_size"
	input_stream: "MULTI_FACE_LANDMARKS:multi_face_landmarks"
	input_side_packet: "ENVIRONMENT:environment"
	output_stream: "MULTI_FACE_GEOMETRY:multi_face_geometry"
}

Now multiple matrices can be retrieved.
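
For anyone following along, below is a rough, untested sketch (assuming the BUILD deps and graph edits above) of how the multi_face_geometry packet could be read on the C++ side once you have a mediapipe::Packet for that output stream, e.g. from an OutputStreamPoller. Each FaceGeometry carries a 4x4 pose transform matrix in its pose_transform_matrix field.

#include <array>
#include <vector>

#include "mediapipe/framework/packet.h"
#include "mediapipe/modules/face_geometry/protos/face_geometry.pb.h"

// Extracts one 4x4 pose transform matrix per detected face from a packet on
// the "multi_face_geometry" output stream. Sketch only; error handling and
// stream/poller setup are omitted. Note that MatrixData also has a layout
// field, so check whether the packed data is row- or column-major before use.
std::vector<std::array<float, 16>> GetPoseMatrices(const mediapipe::Packet& packet) {
  std::vector<std::array<float, 16>> matrices;
  const auto& geometries =
      packet.Get<std::vector<mediapipe::face_geometry::FaceGeometry>>();
  for (const auto& geometry : geometries) {
    const auto& m = geometry.pose_transform_matrix();  // 4x4 MatrixData
    std::array<float, 16> mat{};
    for (int i = 0; i < m.packed_data_size() && i < 16; ++i) {
      mat[i] = m.packed_data(i);
    }
    matrices.push_back(mat);
  }
  return matrices;
}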

@lucasjinreal

I just have one single question: how do I specify the MediaPipe model path root?

@rajkundu

@oUp2Uo Thank you very much for sharing everything! Thanks to your helpful comment and edits, I have now added face geometry to LibMP. Hopefully others can now use it easily as well! :)

@chnoblouch

We integrated MediaPipe into our CMake project by creating a C wrapper for it. It works the same way as rajkundu's LibMP: we build a library with Bazel, which can then be used in CMake/Visual Studio/Xcode projects. The library consists of a shared object (libmediapipe.so) and a single header (mediapipe.h). It currently runs on Linux, Windows, and macOS.

This might be useful for people who want to use MediaPipe in languages that are currently not supported.
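
As a rough illustration of why a pure C boundary like this is convenient (the DLL name and function below are hypothetical, not the actual libmediapipe API): the shared library can be loaded at runtime by a client built with any toolchain, so the consuming project never has to touch Bazel.

// Windows-side sketch: load a hypothetical C-style MediaPipe wrapper DLL at runtime.
#include <windows.h>
#include <stdint.h>
#include <cstdio>

typedef int (*mp_process_frame_fn)(void* handle, const uint8_t* rgb,
                                   int width, int height);

int main() {
  HMODULE lib = LoadLibraryA("mp_wrapper.dll");  // hypothetical wrapper DLL
  if (!lib) {
    std::fprintf(stderr, "failed to load wrapper DLL\n");
    return 1;
  }
  // Resolve a C-linkage symbol by name; no C++ name mangling is involved.
  auto process_frame = reinterpret_cast<mp_process_frame_fn>(
      GetProcAddress(lib, "mp_process_frame"));
  if (!process_frame) {
    std::fprintf(stderr, "symbol mp_process_frame not found\n");
    FreeLibrary(lib);
    return 1;
  }
  // ... create a graph handle via the wrapper, feed frames, read packets ...
  FreeLibrary(lib);
  return 0;
}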

@rajkundu

rajkundu commented Mar 1, 2023

@chnoblouch That is awesome! I know at least @c1ngular was looking for something like this, and I'm sure there are others. Thank you for sharing!

@c1ngular

c1ngular commented Mar 2, 2023

@rajkundu @chnoblouch well done , thank you both .

@oUp2Uo

oUp2Uo commented Mar 2, 2023

@rajkundu But there are still some small problems.
For example, without FaceLandmarksSmoothing, the landmarks are not so smooth when the angle is large.
And FaceLandmarksSmoothing is only for a single face; I do not know how to filter multi_face_landmarks now.

When I add SplitNormalizedLandmarkListVectorCalculator / FaceLandmarksSmoothing / ConcatenateNormalizedLandmarkListVectorCalculator to the graph, the program freezes sometimes.

@Daumas-hugo

With the end of support for legacy solutions, are these solutions still a good idea to use for long-term app development?
And if we need a lot of optimization, is it still a good idea to pursue MediaPipe C++ integration?

@kuaashish kuaashish assigned kuaashish and unassigned markjsherwood Jun 12, 2023
@kuaashish kuaashish removed the stat:awaiting googler Waiting for Google Engineer's Response label Jun 12, 2023
@kuaashish

@CristianNCC,

We are no longer able to provide such a configuration from our end. However, we should ship pre-built binaries for Windows if we want better support. Thank you

@kuaashish kuaashish added the stat:awaiting response Waiting for user response label Jun 15, 2023
@github-actions

This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale label Jun 23, 2023
@github-actions

github-actions bot commented Jul 1, 2023

This issue was closed due to lack of activity after being marked stale for the past 7 days.

@github-actions github-actions bot closed this as completed Jul 1, 2023
@kuaashish kuaashish removed stat:awaiting response Waiting for user response stale labels Jul 3, 2023