NOTE: This repository is no longer maintained. Development continues at ofnote/gestop.
Built on top of mediapipe, this project is a tool for interacting with a computer through hand gestures. Out of the box, it is possible to:
- Use your hand to act as a replacement for the mouse.
- Perform hand gestures to control system parameters like screen brightness, volume etc.
In addition, it is possible to extend and customize the functionality of the application in numerous ways:
- Remap existing hand gestures to different functions in order to better suit your needs.
- Create custom functionality through the use of either python functions or shell scripts.
- Collect data and create your own custom gestures to use alongside the existing ones.
In addition to mediapipe's own requirements, this project depends on a few other libraries:
- ZeroMQ - The zeromq library (libzmq.so) must be installed and symlinked into this directory. The header-only C++ binding cppzmq must also be installed and its header (zmq.hpp) symlinked into the directory.
- pyzmq
- protobuf
- pynput
- pytorch
- pytorch-lightning
- xdotool
- Clone mediapipe and set it up. Make sure the provided hand tracking example is working.
- Clone this repo in the top level directory of mediapipe. Install all dependencies.
- Download the `models/` folder from the link above and place it in the `gestop/` directory.
- Run the instructions below to build and then execute the code.

Note: Run the build instructions in the `mediapipe/` directory, not inside this directory.

Note: Python dependencies can be installed simply by creating a virtual environment and running `pip install -r requirements.txt`.
To build and run the GPU version (Linux only):

```sh
bazel build -c opt --verbose_failures --copt -DMESA_EGL_NO_X11_HEADERS --copt -DEGL_NO_X11 gestop:hand_tracking_gpu
GLOG_logtostderr=1 bazel-bin/gestop/hand_tracking_gpu --calculator_graph_config_file=gestop/hand_tracking_desktop_live.pbtxt
```

To build and run the CPU version:

```sh
bazel build -c opt --define MEDIAPIPE_DISABLE_GPU=1 gestop:hand_tracking_cpu
GLOG_logtostderr=1 bazel-bin/gestop/hand_tracking_cpu --calculator_graph_config_file=gestop/hand_tracking_desktop_live.pbtxt
```

Finally, run the gesture receiver, which processes the keypoint stream coming from the executable:

```sh
python gestop/gesture_receiver.py
```
The hand keypoints are detected using Google's mediapipe. These keypoints are then fed into `gesture_receiver.py` through ZMQ (a minimal receiver sketch follows the list below). The tool recognizes two kinds of gestures:
- Static Gestures : Gestures whose meaning can be inferred from a single image itself.
- Dynamic Gestures : Gestures which can only be understood through a sequence of images i.e. a video.
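Before any gesture can be classified, the keypoint stream sent by the mediapipe executable has to be consumed. The sketch below shows one way this could look with pyzmq; the socket type, endpoint, and message format are assumptions for illustration, not the values used in this repository (those live in the executable and the `proto/` definitions).

```python
# Minimal sketch of consuming a keypoint stream over ZMQ.
# Assumed (not taken from this repo): a PULL socket on tcp://127.0.0.1:5556,
# one serialized protobuf message per frame.
import zmq

def receive_keypoints():
    context = zmq.Context()
    sock = context.socket(zmq.PULL)
    sock.connect("tcp://127.0.0.1:5556")
    while True:
        payload = sock.recv()   # one frame's worth of keypoints (raw bytes)
        yield payload           # parse with the project's protobuf definitions

if __name__ == "__main__":
    for raw_frame in receive_keypoints():
        print(f"received {len(raw_frame)} bytes of keypoint data")
```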
Static gestures, by default, are mapped to all functionality relevant to the mouse, such as left mouse click, scroll etc. Combined with mouse tracking, this allows one to replace the mouse entirely. The mouse is tracked simply by moving the hand, where the tip of the index finger reflects the position of the cursor. The gestures related to the mouse actions are detailed below. To train the neural network to recognize static gestures, a dataset was created manually for the available gestures.
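To make the fingertip-to-cursor idea concrete, here is an illustrative sketch (not the actual logic of `mouse_tracker.py`): a normalized landmark coordinate, as mediapipe reports it, is scaled to screen pixels and handed to pynput. The screen resolution is a hard-coded assumption.

```python
# Illustrative only: map a normalized index-fingertip coordinate (0..1)
# to an absolute cursor position using pynput.
from pynput.mouse import Controller

SCREEN_W, SCREEN_H = 1920, 1080   # assumed resolution
mouse = Controller()

def track(index_tip_x, index_tip_y):
    """Move the cursor to the point indicated by the index fingertip."""
    mouse.position = (int(index_tip_x * SCREEN_W), int(index_tip_y * SCREEN_H))

track(0.5, 0.5)   # centers the cursor
```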
For more complicated gestures involving movement of the hand, dynamic gestures are used. By default, they are mapped to various other actions for interfacing with the system, such as modifying screen brightness, switching workspaces, taking screenshots, etc. The data for these dynamic gestures comes from the SHREC2017 dataset. A dynamic gesture is performed by holding down the Ctrl key (which freezes the cursor), performing the gesture, and then releasing the key.
The project consists of a few distinct pieces:
- mediapipe executable - A modified version of the hand tracking example given in mediapipe; this executable tracks the keypoints, stores them in a protobuf, and transmits them using ZMQ.
- Gesture Receiver - See `gesture_receiver.py`, responsible for handling the ZMQ stream and utilizing the following modules.
- Mouse Tracker - See `mouse_tracker.py`, responsible for moving the cursor using the position of the index finger.
- Gesture Recognizer - See `gesture_recognizer.py`, takes in the keypoints from the mediapipe executable and converts them into a high-level description of the state of the hand, i.e. a gesture name.
- Gesture Executor - See `gesture_executor.py`, uses the gesture name from the previous module and executes an action.
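To show how these pieces fit together, here is a rough per-frame loop. Every name in it is a placeholder for illustration rather than the project's actual API; the real interfaces live in the files listed above.

```python
# Placeholder stubs standing in for the real modules.
def move_cursor(keypoints):            # Mouse Tracker
    pass

def recognize(keypoints):              # Gesture Recognizer
    return "Swipe +"

def execute(gesture, config):          # Gesture Executor
    print(f"executing action for {gesture!r}: {config.get(gesture)}")

def gesture_loop(keypoint_stream, config):
    """Gesture Receiver: drive the other modules for every incoming frame."""
    for keypoints in keypoint_stream:  # frames from the mediapipe executable via ZMQ
        move_cursor(keypoints)
        gesture = recognize(keypoints)
        execute(gesture, config)

# Toy usage: one fake frame of 21 landmarks x 3 coordinates.
gesture_loop([[0.0] * 63], {"Swipe +": ["py", "take_screenshot"]})
```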
- For best performance, perform dynamic gestures with the right hand only, as all data from SHREC is right-hand only.
- For dynamic gestures to work properly, you may need to change the keycodes being used in `gesture_executor.py`. Use the provided `find_keycode.py` to find the keycodes of the keys used to change screen brightness and volume. Finally, system shortcuts may need to be remapped so that they work even with the Ctrl key held down. For example, in addition to the usual default behaviour of `<Prnt_Screen>` taking a screenshot, you may need to add `<Ctrl+Prnt_Screen>` as a shortcut as well.
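For reference, a keycode finder in the spirit of `find_keycode.py` can be written with a pynput listener. This is a generic sketch, not the script shipped with the repo:

```python
# Print every key that is pressed, plus its virtual key code when available.
from pynput import keyboard

def on_press(key):
    vk = getattr(key, 'vk', None)   # KeyCode objects expose .vk; special keys may not
    print(f"pressed: {key} (vk={vk})")

with keyboard.Listener(on_press=on_press) as listener:
    listener.join()   # interrupt with Ctrl+C when done
```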
gestop is highly customizable and can be easily extended in various ways. The existing gesture-action pairs can be remapped, new actions can be defined (either as a Python function or a shell script, opening up a world of possibility for interacting with your computer), and finally, if you so desire, you can capture data to create your own gestures and retrain the network to use them. The ways to accomplish each of these are briefly described in this section.
The default gesture-action mappings are stored in `data/action_config.json`. The format of the config file is:

`{'gesture_name': ['type', 'func_name']}`

where `gesture_name` is the name of the gesture that is detected and `type` is either `sh` (shell) or `py` (python). If the type is `py`, then `func_name` is the name of a Python function; if the type is `sh`, then `func_name` is either a shell command or a shell script (`./path/to/shell_script.sh`). Refer to `data/action_config.json` and `gesture_executor.py` for more details.
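As an illustration of how such an entry could be dispatched, the sketch below branches on the `sh`/`py` type. The helper names and the example entries are made up; the real logic lives in `gesture_executor.py` and `user_config.py`.

```python
# Sketch of dispatching a {'gesture_name': ['type', 'func_name']} entry.
import subprocess

def no_func(state):
    return state

PY_ACTIONS = {"no_func": no_func}   # stand-in for the functions in user_config.py

def execute_action(gesture_name, config, state):
    action_type, func_name = config[gesture_name]
    if action_type == "sh":                        # shell command or ./path/to/script.sh
        subprocess.run(func_name, shell=True, check=False)
        return state
    if action_type == "py":                        # python function, looked up by name
        return PY_ACTIONS[func_name](state)
    return state

# Toy usage with an in-memory config.
config = {"Grab": ["py", "no_func"], "Swipe +": ["sh", "echo screenshot"]}
state = execute_action("Swipe +", config, state=None)
```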
It is encouraged to make all custom configuration in a new file rather than replacing the original. So, before making your modifications, copy `data/action_config.json` to a new file. Once your modifications are done in the new file, you can run the application with your custom config using `python gesture_receiver.py --config-path my_custom_config.json`.
To remap functionality, all you need to do is swap the values (i.e. `['type', 'func_name']`) for the gestures you wish to remap. For example, if you wish to take a screenshot with `Swipe +` instead of `Grab`, the configuration would change from:

```
"Grab" : ["py", "take_screenshot"],
"Swipe +" : ["py", "no_func"],
```

to:

```
"Grab" : ["py", "no_func"],
"Swipe +" : ["py", "take_screenshot"],
```
Adding new actions is a similar process to remapping gestures, except for the additional step of defining your Python function/shell command. As a simple example, if you wish to type your username on performing `Pinch`, the first step would be to write the Python function in `user_config.py`. The function would be something similar to the following:

```python
def print_username(self, S):
    ''' Prints username '''
    self.keyboard.type("sriramsk1999")
    return S
```
where `S` represents the State and is passed to all the functions in `user_config.py`. Refer to `user_config.py` and `config.py` for more examples of how to add new actions.

Finally, replace the existing `Pinch` mapping with your own in your configuration file:

```
"Pinch" : ["py", "print_username"],
```
To extend this application and create new gestures, there are a few prerequisites. First, download the data from the dataset link given above and place it in the `data/` directory. This ensures that your model has all the existing data, along with the new data, to train on.
You can record either a new static gesture or a new dynamic gesture, with the `static_data_collection.py` and `dynamic_data_collection.py` scripts respectively.
To collect data for a new static gesture, run the program and enter the name of the gesture and the hand with which you will be performing it. Run the mediapipe executable and hold the gesture while data is collected. 1000 samples are collected, which should take a minute or two. Hold your hand in the same pose, in good lighting, to ensure the model gets clean data.
To collect data for a new dynamic gesture, the process is mostly similar. Run the `dynamic_data_collection.py` program, enter the name of the gesture, and run the mediapipe executable. Data is collected only while the Ctrl key is held down, so to collect a single sample, hold the Ctrl key, perform the gesture, and then release. Repeat this process a few dozen times to collect enough data.
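Purely as an illustration of how samples might be delimited by the Ctrl key (the shipped `dynamic_data_collection.py` may differ, and the names below are made up), a pynput listener can toggle a recording flag and flush a buffered sequence of frames on release:

```python
from pynput import keyboard

CTRL_KEYS = (keyboard.Key.ctrl, keyboard.Key.ctrl_l, keyboard.Key.ctrl_r)
recording = False
current_sample = []   # keypoint frames of the gesture currently being performed
samples = []          # one entry per completed gesture

def on_press(key):
    global recording
    if key in CTRL_KEYS:
        recording = True

def on_release(key):
    global recording
    if key in CTRL_KEYS:
        recording = False
        if current_sample:
            samples.append(list(current_sample))   # one complete gesture sequence
            current_sample.clear()

def on_new_frame(keypoints):
    """Called for every frame of keypoints arriving from the executable."""
    if recording:
        current_sample.append(keypoints)

# Run alongside the loop that feeds on_new_frame() with incoming frames.
listener = keyboard.Listener(on_press=on_press, on_release=on_release)
listener.start()
```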
The next step is to retrain the network, using the `static_train_model.py` or `dynamic_train_model.py` script depending on the type of the new gesture. Finally, add the new gesture-action mapping to the configuration file. And that's it! Your new gesture is now part of gestop.
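For orientation, the snippet below shows what a minimal pytorch-lightning training run looks like. The toy network, input size, class count, and random data are placeholders for illustration only and are not the project's GestureNet or ShrecNet; in practice the features come from the collected dataset.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class ToyGestureNet(pl.LightningModule):
    """Toy classifier: 63 inputs (21 landmarks x 3 coords) -> n_classes gestures."""
    def __init__(self, input_dim=63, n_classes=12):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Random stand-in data; replace with the collected gesture dataset.
dataset = TensorDataset(torch.randn(256, 63), torch.randint(0, 12, (256,)))
trainer = pl.Trainer(max_epochs=5)
trainer.fit(ToyGestureNet(), DataLoader(dataset, batch_size=32))
```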
| Gesture name | Gesture Action |
|---|---|
| seven | Left Mouse Down |
| eight | Double Click |
| four | Right Mouse Down |
| spiderman | Scroll |
| hitchhike | Mode Switch |
Repo Overview
- `models/` -> Stores the trained model(s) which can be called by other files for inference
- `proto` -> Holds the definitions of the protobufs used in the project for data transfer
- `BUILD` -> Various build instructions for Bazel
- `static_data_collection.py` -> Script to create a custom static gesture dataset.
- `dynamic_data_collection.py` -> Script to create a custom dynamic gesture dataset.
- `data/static_gestures_mapping.json` -> Stores the encoding of the static gestures as integers
- `data/dynamic_gestures_mapping.json` -> Stores the encoding of the dynamic gestures as integers
- `data/static_gestures_data.csv` -> Dataset created with `data_collection.py`
- `data/action_config.json` -> Configuration of which gesture maps to which action.
- `hand_tracking_desktop_live.pbtxt` -> Definition of the mediapipe calculators being used. Check out mediapipe for more details.
- `hand_tracking_landmarks.cc` -> Source code for the mediapipe executable. GPU version is Linux only.
- `model.py` -> Definition of the models used.
- `static_train_model.py` -> Trains the "GestureNet" model for static gestures and saves it to disk
- `dynamic_train_model.py` -> Trains the "ShrecNet" model for dynamic gestures and saves it to disk
- `find_keycode.py` -> A sample program from pynput used to find the keycode of the key that was pressed. Useful in case the brightness and audio keys vary.
- `gesture_receiver.py` -> Handles the stream of data coming from the mediapipe executable by passing it to the various other modules.
- `mouse_tracker.py` -> Functions which implement mouse tracking.
- `gesture_recognizer.py` -> Functions which use the trained neural networks to recognize gestures from keypoints.
- `gesture_executor.py` -> Functions which implement the end action for an input gesture, e.g. Left Click, Reduce Screen Brightness.
- `config.py` -> Stores the configuration and state of the application in dataclasses for easy access.
- `user_config.py` -> Stores the definition of all the actions that will be executed when a particular gesture is detected.