Posebox:

A machine learning approach for pose estimation of a hand-drawn marker


Pose estimation is computationally expensive and often requires additional hardware, such as a depth sensor, for accurate plane detection. This project attempts to build a computationally cheap, real-time pose estimator for a fixed hand-drawn marker that needs no additional hardware.

('cause let's be honest, not everyone owns a printer)


Check out requirements.txt for specifics.

The project is structured per the following layout.

This means you will have to create the data folder and its subfolders accordingly.


The training data contains a specific marker, hand-drawn on paper, whose points are always annotated in the same order.

For example:

The data is created from a video of the hand-drawn marker on paper, captured with a mobile device.

The data flow pipeline is as follows:

Tool used:

VGG Image Annotator (VIA) is used for point annotations.
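As a rough illustration of how such point annotations could be turned into training targets, the sketch below assumes a VIA 2.x JSON export in which each point region stores `cx` and `cy` under `shape_attributes`. The function name `via_points_to_target`, the sample entry, the image size, and the choice of four points per marker are all hypothetical; the conversion code actually used in this project may differ.

```python
import json


def via_points_to_target(entry, width, height, n_points=4):
    """Flatten one VIA image entry into [x1, y1, ..., xn, yn], scaled to 0..1.

    Points are kept in the order in which they were annotated.
    n_points=4 is an assumption for illustration.
    """
    points = [
        region["shape_attributes"]
        for region in entry["regions"]
        if region["shape_attributes"].get("name") == "point"
    ]
    if len(points) != n_points:
        raise ValueError(f"expected {n_points} points, got {len(points)}")

    target = []
    for point in points:
        target.append(point["cx"] / width)   # normalize x by image width
        target.append(point["cy"] / height)  # normalize y by image height
    return target


# Hypothetical entry, shaped like one image record in a VIA 2.x export.
sample = json.loads("""
{
  "filename": "frame_0001.jpg",
  "regions": [
    {"shape_attributes": {"name": "point", "cx": 128, "cy": 64}, "region_attributes": {}},
    {"shape_attributes": {"name": "point", "cx": 384, "cy": 64}, "region_attributes": {}},
    {"shape_attributes": {"name": "point", "cx": 384, "cy": 192}, "region_attributes": {}},
    {"shape_attributes": {"name": "point", "cx": 128, "cy": 192}, "region_attributes": {}}
  ]
}
""")

print(via_points_to_target(sample, width=512, height=256))
```

VIA does not record the pixel dimensions of an image in this export, so the width and height are passed in explicitly here.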


Model: "sequential"
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 511, 511, 3)       39        
max_pooling2d (MaxPooling2D) (None, 255, 255, 3)       0         
conv2d_1 (Conv2D)            (None, 254, 254, 3)       39        
max_pooling2d_1 (MaxPooling2 (None, 127, 127, 3)       0         
mobilenetv2_1.00_224 (Model) (None, 4, 4, 1280)        2257984   
flatten (Flatten)            (None, 20480)             0         
dense (Dense)                (None, 64)                1310784   
dense_1 (Dense)              (None, 32)                2080      
dense_2 (Dense)              (None, 16)                528       
dense_3 (Dense)              (None, 8)                 136       
Total params: 3,571,590
Trainable params: 1,313,606
Non-trainable params: 2,257,984
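The parameter counts in the summary can be cross-checked by hand. The sketch below is only arithmetic, not the project's model code; it assumes a 512 x 512 x 3 input, 2 x 2 convolution kernels with 3 filters (inferred from the 39 parameters per layer and the one-pixel shrink from 512 to 511), and takes the MobileNetV2 count directly from the summary.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a standard 2D convolution."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit for a fully connected layer."""
    return in_units * out_units + out_units


# Assumption: both Conv2D layers use a 2x2 kernel, 3 input channels, 3 filters.
conv = conv2d_params(2, 2, 3, 3)

# Flattened MobileNetV2 output (4 x 4 x 1280) feeding the dense head.
layer_sizes = [4 * 4 * 1280, 64, 32, 16, 8]
dense = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

backbone = 2_257_984  # MobileNetV2 count, copied from the summary

trainable = 2 * conv + sum(dense)
total = trainable + backbone

print(conv, dense)       # 39 [1310784, 2080, 528, 136]
print(trainable, total)  # 1313606 3571590
```

The non-trainable count equals the backbone's parameter count, which is consistent with the MobileNetV2 layers being frozen while only the two small convolutions and the dense head are trained.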


Training was done on a manually captured and annotated dataset of 325 images. The model was trained on Google Colab, and checkpoints were saved to Google Drive.

Accuracy and Loss:


Images are resized to 512 x 512 before being fed to the model. The outputs are regressed coordinates, floats between 0 and 1, which are then scaled to the dimensions of the original image.
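The scaling step can be sketched as follows. This is a minimal example that assumes the eight outputs are interleaved as x1, y1, x2, y2, and so on; that layout and the helper name `scale_points` are assumptions for illustration, not taken from the project code.

```python
def scale_points(prediction, width, height):
    """Map normalized [x1, y1, x2, y2, ...] outputs to pixel coordinates.

    Assumes interleaved x/y values in the range 0..1.
    """
    if len(prediction) % 2 != 0:
        raise ValueError("prediction must contain an even number of values")
    xs = prediction[0::2]  # every even index is an x coordinate
    ys = prediction[1::2]  # every odd index is a y coordinate
    return [(x * width, y * height) for x, y in zip(xs, ys)]


# Hypothetical model output for a 1920 x 1080 source frame.
prediction = [0.25, 0.5, 0.75, 0.5, 0.75, 1.0, 0.25, 1.0]
print(scale_points(prediction, width=1920, height=1080))
# [(480.0, 540.0), (1440.0, 540.0), (1440.0, 1080.0), (480.0, 1080.0)]
```

Because the outputs are normalized, the same prediction can be mapped onto the original frame without regard to the 512 x 512 size used for inference.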

Result from the model:

The points of the marker are numbered in the same manner as the annotation.

Note that the order of the points is crucial and must therefore be kept consistent during annotation as well.


We use SemVer for versioning. For the versions available, see the tags on this repository.


Sanjeev Tripathi

Harshini Gudipally


This project is licensed under the MIT License - see the file for details