Identification of Gesture in hand tracking model #40

Closed
DebankurS opened this issue Aug 22, 2019 · 9 comments

Labels
legacy:hands Hand tracking/gestures/etc

Comments

@DebankurS

Hi,

I am trying to understand how the gesture part of the application works,
but I was only able to get the Hand Landmark model and could not find any documentation for the gesture.

Please provide links or guides so that I can try it out.

@DebankurS DebankurS changed the title from "Identification of Gesture in hend tracking model" to "Identification of Gesture in hand tracking model" Aug 22, 2019
@Suraj520

Eagerly looking forward to the resolution of the above issue!

@mgyong

mgyong commented Aug 26, 2019

@DebankurS The gesture recognition detailed in the Google AI hand tracking blog post is not available in the open source example. @fanzhanggoogle

@camillol camillol added the legacy:hands Hand tracking/gestures/etc label Aug 27, 2019
@lisbravo

> @DebankurS The gesture recognition detailed in the Google AI hand tracking blog post is not available in the open source example. @fanzhanggoogle

- And do you plan to make it available?
- Can you please explain why some parts of the code are open and some, like this case, are not?

@mgyong

mgyong commented Aug 28, 2019

@psykhon We do not have plans to release the gesture code, as it is very basic and not production ready. It is just a series of rules mapping to a gesture.
We encourage our users to write their own gesture recognition based on the hand tracking example, which outputs 21 landmarks of the hand.
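
For illustration, a minimal sketch (not MediaPipe code) of what one such rule could look like, assuming the standard 21-point hand topology (wrist = 0; fingertips = 4, 8, 12, 16, 20); Point, Landmarks, and IsOpenHand are hypothetical names:

#include <array>

// Hypothetical rule-based gesture check over the 21 hand landmarks.
// A finger counts as extended when its tip is farther from the wrist
// than the joint two indices before the tip (PIP for the four fingers,
// MCP for the thumb); five extended fingers map to an "open hand".
struct Point { float x, y, z; };
using Landmarks = std::array<Point, 21>;

float DistSq(const Point& a, const Point& b) {
  const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return dx * dx + dy * dy + dz * dz;
}

bool IsOpenHand(const Landmarks& lm) {
  const Point& wrist = lm[0];
  constexpr int kTips[] = {4, 8, 12, 16, 20};
  for (int tip : kTips) {
    if (DistSq(lm[tip], wrist) <= DistSq(lm[tip - 2], wrist)) return false;
  }
  return true;
}

A real recognizer would combine several rules of this shape (one per gesture) and likely smooth the decision over a few frames.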

@DebankurS
Author

Any guide to the right data for the task, @mgyong? I could not find a labelled 21-landmark dataset for this.

@lisbravo

> @psykhon We do not have plans to release the gesture code, as it is very basic and not production ready. It is just a series of rules mapping to a gesture.
> We encourage our users to write their own gesture recognition based on the hand tracking example, which outputs 21 landmarks of the hand.

I understand, but at the same time I guess that if you want this framework to be accepted, you shouldn't tease with examples that later cannot be reproduced; otherwise it just feels like clickbait.

@mgyong

mgyong commented Aug 30, 2019

@psykhon We have full examples for hand tracking (released model + open-sourced pipeline). Definitely not teasing in any way :-)

@mgyong mgyong closed this as completed Aug 30, 2019
@DokRaphael

Hey, I am not really a developer (I am a designer) but I tried to implement an open-hand recognition.
Basically I am summing up the angles between the wrist-to-fingertip vectors of neighboring fingers. I am assuming that the wrist is landmark 0 and the fingertips are 4, 8, 12, 16 and 20 (I could be completely wrong about this). I am really sorry for my coding; I am just trying things out here and there, fast and dirty.

Here is what I did in landmark_letterbox_removal_calculator.cc:

// Copyright 2019 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include <algorithm>
#include <array>
#include <cmath>
#include <iostream>
#include <vector>

#include "absl/memory/memory.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/landmark.pb.h"
#include "mediapipe/framework/port/ret_check.h"

namespace mediapipe {

namespace {

constexpr char kLandmarksTag[] = "LANDMARKS";
constexpr char kOpen[] = "OPENHAND";  // Tag for a potential open-hand output.
constexpr char kLetterboxPaddingTag[] = "LETTERBOX_PADDING";

}  // namespace

// Adjusts landmark locations on a letterboxed image to the corresponding
// locations on the same image with the letterbox removed. This is useful to map
// the landmarks inferred from a letterboxed image, for example, output of
// the ImageTransformationCalculator when the scale mode is FIT, back to the
// corresponding input image before letterboxing.
//
// Input:
//   LANDMARKS: An std::vector<NormalizedLandmark> representing landmarks on a
//   letterboxed image.
//
//   LETTERBOX_PADDING: An std::array<float, 4> representing the letterbox
//   padding from the 4 sides ([left, top, right, bottom]) of the letterboxed
//   image, normalized to [0.f, 1.f] by the letterboxed image dimensions.
//
// Output:
//   LANDMARKS: An std::vector<NormalizedLandmark> representing landmarks with
//   their locations adjusted to the letterbox-removed (non-padded) image.
//
// Usage example:
// node {
//   calculator: "LandmarkLetterboxRemovalCalculator"
//   input_stream: "LANDMARKS:landmarks"
//   input_stream: "LETTERBOX_PADDING:letterbox_padding"
//   output_stream: "LANDMARKS:adjusted_landmarks"
// }
class LandmarkLetterboxRemovalCalculator : public CalculatorBase {
 public:
  static ::mediapipe::Status GetContract(CalculatorContract* cc) {
    RET_CHECK(cc->Inputs().HasTag(kLandmarksTag) &&
              cc->Inputs().HasTag(kLetterboxPaddingTag))
        << "Missing one or more input streams.";

    cc->Inputs().Tag(kLandmarksTag).Set<std::vector<NormalizedLandmark>>();
    cc->Inputs().Tag(kLetterboxPaddingTag).Set<std::array<float, 4>>();

    cc->Outputs().Tag(kLandmarksTag).Set<std::vector<NormalizedLandmark>>();

    return ::mediapipe::OkStatus();
  }

  ::mediapipe::Status Open(CalculatorContext* cc) override {
    cc->SetOffset(TimestampDiff(0));

    return ::mediapipe::OkStatus();
  }

  ::mediapipe::Status Process(CalculatorContext* cc) override {
    // Only process if there are input landmarks.
    if (cc->Inputs().Tag(kLandmarksTag).IsEmpty()) {
      return ::mediapipe::OkStatus();
    }

    const auto& input_landmarks =
        cc->Inputs().Tag(kLandmarksTag).Get<std::vector<NormalizedLandmark>>();
    const auto& letterbox_padding =
        cc->Inputs().Tag(kLetterboxPaddingTag).Get<std::array<float, 4>>();

    const float left = letterbox_padding[0];
    const float top = letterbox_padding[1];
    const float left_and_right = letterbox_padding[0] + letterbox_padding[2];
    const float top_and_bottom = letterbox_padding[1] + letterbox_padding[3];

    // Adjust the landmarks to the letterbox-removed image, as in the
    // original calculator.
    auto output_landmarks =
        absl::make_unique<std::vector<NormalizedLandmark>>();
    for (const auto& landmark : input_landmarks) {
      NormalizedLandmark new_landmark;
      new_landmark.set_x((landmark.x() - left) / (1.0f - left_and_right));
      new_landmark.set_y((landmark.y() - top) / (1.0f - top_and_bottom));
      // Keep z-coord as is.
      new_landmark.set_z(landmark.z());
      output_landmarks->emplace_back(new_landmark);
    }

    // Open-hand heuristic: sum the angles between the wrist-to-fingertip
    // vectors of neighboring fingers. The wrist is landmark 0 and the
    // fingertips are landmarks 4 (thumb), 8 (index), 12 (middle),
    // 16 (ring) and 20 (pinky).
    bool openhand = false;
    constexpr int kFingertips[] = {4, 8, 12, 16, 20};
    if (input_landmarks.size() >= 21) {
      const auto& wrist = input_landmarks[0];
      // Angle between the wrist->a and wrist->b vectors, in radians.
      auto angle_between = [&wrist](const NormalizedLandmark& a,
                                    const NormalizedLandmark& b) {
        const float ax = a.x() - wrist.x(), ay = a.y() - wrist.y(),
                    az = a.z() - wrist.z();
        const float bx = b.x() - wrist.x(), by = b.y() - wrist.y(),
                    bz = b.z() - wrist.z();
        const float dot = ax * bx + ay * by + az * bz;
        const float len_sq = (ax * ax + ay * ay + az * az) *
                             (bx * bx + by * by + bz * bz);
        // Clamp so rounding error cannot push acos outside its domain.
        const float cosine =
            std::max(-1.0f, std::min(1.0f, dot / std::sqrt(len_sq)));
        return std::acos(cosine);
      };
      float angle = 0.f;
      for (int f = 0; f < 4; ++f) {
        angle += angle_between(input_landmarks[kFingertips[f]],
                               input_landmarks[kFingertips[f + 1]]);
      }
      // The fingertip vectors of an open hand fan out, so the summed
      // angle is large; 1.1 rad is an empirically picked threshold.
      openhand = std::abs(angle) > 1.1f;
      if (openhand) {
        std::cout << "hand is open" << std::endl;
      }
    }

    // To surface the flag, register kOpen as an output in GetContract()
    // and emit a bool packet here, e.g.:
    // cc->Outputs().Tag(kOpen).AddPacket(
    //     MakePacket<bool>(openhand).At(cc->InputTimestamp()));
    cc->Outputs()
        .Tag(kLandmarksTag)
        .Add(output_landmarks.release(), cc->InputTimestamp());
    return ::mediapipe::OkStatus();
  }
};
REGISTER_CALCULATOR(LandmarkLetterboxRemovalCalculator);

}  // namespace mediapipe

@mgyong Is this a good direction? Now I have to figure out how I can surface this info in Objective-C. Is there a simple way to catch that in Objective-C to work with some iOS native stuff?
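
As one possible direction (my own sketch, not verified against the thread): the commented-out OPENHAND output above could be wired up roughly as below, after which the stream can be declared in the graph config and observed from the Objective-C side the same way the app already observes its other output streams.

// 1. In GetContract(), register the extra output stream
//    (assumption: one bool packet per processed frame):
cc->Outputs().Tag(kOpen).Set<bool>();

// 2. At the end of Process(), emit the flag at the input timestamp:
cc->Outputs().Tag(kOpen).AddPacket(
    MakePacket<bool>(openhand).At(cc->InputTimestamp()));

// 3. In the graph config, expose the stream on the node:
// node {
//   calculator: "LandmarkLetterboxRemovalCalculator"
//   input_stream: "LANDMARKS:landmarks"
//   input_stream: "LETTERBOX_PADDING:letterbox_padding"
//   output_stream: "LANDMARKS:adjusted_landmarks"
//   output_stream: "OPENHAND:open_hand"
// }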

@gabrielstuff

Hello,
It is really interesting to see so many people looking for the matching algorithm. Looking at what MediaPipe outputs for hands, I feel like the same idea as in the following article could be used:

https://medium.com/tensorflow/move-mirror-an-ai-experiment-with-pose-estimation-in-the-browser-using-tensorflow-js-2f7b769f9b23#3965

The landmarks of the hand are not that far from those of a body, so I do not see why the cosine similarity methodology or the weighted matching methodology would not work.

Finally, they use a vantage-point tree to find the closest match among all the stored landmark sets.
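
For reference, a minimal sketch of the cosine-similarity step from that article transplanted to hand landmarks (my own sketch, assuming 21 (x, y) landmarks flattened into a 42-float vector; PoseVec, Normalize, and CosineSimilarity are hypothetical names):

#include <array>
#include <cmath>

// 21 hand landmarks, (x, y) each, flattened into a 42-float vector.
using PoseVec = std::array<float, 42>;

// L2-normalize in place so cosine similarity reduces to a dot product.
void Normalize(PoseVec& v) {
  float norm = 0.f;
  for (float c : v) norm += c * c;
  norm = std::sqrt(norm);
  if (norm > 0.f) {
    for (float& c : v) c /= norm;
  }
}

// Cosine similarity between two normalized pose vectors; 1 = identical.
float CosineSimilarity(const PoseVec& a, const PoseVec& b) {
  float dot = 0.f;
  for (int i = 0; i < 42; ++i) dot += a[i] * b[i];
  return dot;
}

As in Move Mirror, a vantage-point tree over these vectors would then make nearest-pose lookup fast for a large gesture database.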
