Human Pose estimation

This framework estimates the human pose in an image. The body parts used in this project are shown in the following image:

More information about the human pose model can be found here: [MPI-pose](https://pose.mpi-inf.mpg.de/)

For demo purposes I took images of myself.	The resulting human pose estimation drawn over the original image

Preparing the model

The models used in this project are based on the openpose project (Caffe) and PoseEstimation-CoreML (TensorFlow). The CoreML model files are not included in the repo. To create those files, do the following:

Install Python and CoreML tools (Python 3.7.5, coremltools 3.1)
Run CoreMLModels/download.sh
Make changes in the file multiPoseModel/mpi/pose_deploy_linevec_faster_4_stages_fixed_size.prototxt:

input_dim: 1 # This value will be defined at runtime ->  input_dim: 512
input_dim: 1 # This value will be defined at runtime ->  input_dim: 512

Run CoreMLModels/convert.sh. On success, the following CoreML files are created: PoseMNV2_Single_14.mlmodel and PoseCNN_Multi_15.mlmodel. PoseMNV2_Single_14 is used for fast inference of a single person in the image. PoseCNN_Multi_15 performs more sophisticated inference of all human bodies present in the image, at significantly slower speed.
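The .prototxt edit from the steps above can also be scripted. The following is a minimal sketch, not part of the repository: the function name `fix_input_dims` is hypothetical, and it assumes the placeholder lines look exactly like the ones quoted above. Values are substituted in the order the placeholders appear in the file.

```python
import re

# Matches the placeholder lines quoted in the steps above, e.g.
# "input_dim: 1 # This value will be defined at runtime"
PLACEHOLDER = re.compile(r"input_dim:[ \t]*1[ \t]*#[^\n]*runtime[^\n]*")


def fix_input_dims(prototxt_text, dims=(512, 512)):
    """Replace the placeholder input_dim lines, in file order, with fixed values.

    Hypothetical helper: raises ValueError if the number of placeholders
    found differs from the number of values supplied.
    """
    found = PLACEHOLDER.findall(prototxt_text)
    if len(found) != len(dims):
        raise ValueError(
            "expected %d placeholder lines, found %d" % (len(dims), len(found))
        )
    values = iter(dims)
    return PLACEHOLDER.sub(lambda match: "input_dim: %d" % next(values), prototxt_text)
```

To apply it, read the .prototxt file into a string, pass it through `fix_input_dims`, and write the result back before running convert.sh.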

The .prototxt mentioned above contains hardcoded values that fix the size of the input image: input_dim: XXX corresponds to the width of the NN input; input_dim: XXX corresponds to the height of the NN input. When changing these values, remember to set ModelConfigurationCNNMulti15.inputSize to the same size and use that configuration in place of the existing one in the framework, which sets the input size to 512x512.

Any values will work, but the best results are achieved when the aspect ratio matches that of the original image. Also note that larger values significantly slow down processing, as shown in the Performance section.
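One way to pick an input size that follows the image's aspect ratio is sketched below. This is an illustration only: the function `fit_input_size` is hypothetical, and rounding each side to a multiple of 8 is an assumption made here because the model output is the input size divided by 8.

```python
def fit_input_size(image_width, image_height, long_side=512, stride=8):
    """Return (width, height) that approximates the image aspect ratio.

    Assumptions (not from the project): the longer side is scaled to
    `long_side`, and both sides are snapped to a multiple of `stride`.
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = long_side / max(image_width, image_height)

    def snap(value):
        # Round to the nearest multiple of stride, but never below one stride.
        return max(stride, int(round(value * scale / stride)) * stride)

    return snap(image_width), snap(image_height)
```

For a 4032 x 3024 photo this yields 512 x 384. Lowering `long_side` trades accuracy for speed.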

Run the demo app in Xcode

To run the demo, install the CocoaPods dependencies first. Run the following commands in the Terminal app:

> cd <project-root-location>/pose
> pod install

Once the dependencies are installed, open the pose.xcworkspace file in Xcode. Select the poseDemo target and press the Build and Run button.

Neural network output details

The output of the MPI15 model is a group of matrices with dimensions (input_image_width / 8, input_image_height / 8). Each element of a matrix is a float. The mapping between the matrix index in the output and the body part:

POSE_MPI_BODY_PARTS {
{0,  "Head"},
{1,  "Neck"},
{2,  "RShoulder"},
{3,  "RElbow"},
{4,  "RWrist"},
{5,  "LShoulder"},
{6,  "LElbow"},
{7,  "LWrist"},
{8,  "RHip"},
{9,  "RKnee"},
{10, "RAnkle"},
{11, "LHip"},
{12, "LKnee"},
{13, "LAnkle"},
{14, "Chest"},
{15, "Background"}
};
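To illustrate how this index mapping is used, here is a minimal single-person sketch that picks the strongest cell of each heatmap and converts it back to input-image pixels. It is not the framework's implementation: the function name, the `threshold` value, and the choice to map a cell to its center are assumptions made for the example. The factor of 8 comes from the output dimensions described above.

```python
MPI_BODY_PARTS = [
    "Head", "Neck", "RShoulder", "RElbow", "RWrist",
    "LShoulder", "LElbow", "LWrist", "RHip", "RKnee",
    "RAnkle", "LHip", "LKnee", "LAnkle", "Chest", "Background",
]
STRIDE = 8  # output matrices are 1/8 of the input size in each dimension


def strongest_joints(heatmaps, threshold=0.1):
    """Map each body part to (x, y, confidence) in input-image pixels.

    heatmaps: one 2-D list of floats (rows of columns) per body part index.
    Parts whose best value is below `threshold` are omitted.
    """
    joints = {}
    for index, heatmap in enumerate(heatmaps):
        name = MPI_BODY_PARTS[index]
        if name == "Background":
            continue  # the background layer is not a joint
        value, col, row = max(
            (value, col, row)
            for row, line in enumerate(heatmap)
            for col, value in enumerate(line)
        )
        if value >= threshold:
            # Assumption: use the center of the winning cell.
            joints[name] = ((col + 0.5) * STRIDE, (row + 0.5) * STRIDE, value)
    return joints
```

Taking a single maximum per heatmap only makes sense when one person is in the image. With several people, each heatmap contains several peaks that have to be grouped into bodies.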

Heatmaps and PAFs

The PoseCNN_Multi_15 model produces two types of output matrices: heatmaps and PAFs (part affinity fields). Each heatmap corresponds to one joint, 15 in total. The PAF matrices represent body connections: each body connection has an X and a Y matrix, 28 in total (14 + 14). Including the one matrix that represents the background, the output has 44 matrices. The output of the single-person model PoseMNV2_Single_14 contains heatmaps only; it has neither PAF matrices nor a background layer.
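The channel arithmetic above (15 + 1 + 28 = 44) can be sketched as a split of the model output. The layout used here is an assumption for illustration: heatmaps first, then the background at index 15 as in the body part mapping, then the PAFs with the X and Y matrices of each connection stored next to each other. Verify the actual order against the converted model before relying on it.

```python
HEATMAP_COUNT = 15      # one per joint
BACKGROUND_INDEX = 15   # matches the "Background" entry in the mapping
PAF_COUNT = 28          # 14 connections, X and Y matrix each
TOTAL_COUNT = HEATMAP_COUNT + 1 + PAF_COUNT  # 44


def split_multi_output(channels):
    """Split the 44 output matrices into heatmaps, background and PAF pairs.

    Assumed layout (not confirmed by the source): heatmaps, background,
    then interleaved PAF matrices (x0, y0, x1, y1, ...).
    """
    if len(channels) != TOTAL_COUNT:
        raise ValueError("expected %d matrices, got %d" % (TOTAL_COUNT, len(channels)))
    heatmaps = channels[:HEATMAP_COUNT]
    background = channels[BACKGROUND_INDEX]
    pafs = channels[BACKGROUND_INDEX + 1:]
    paf_pairs = [(pafs[i], pafs[i + 1]) for i in range(0, PAF_COUNT, 2)]
    return heatmaps, background, paf_pairs
```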

Demo project

The repository also contains a demo project 'poseDemo' that demonstrates usage of the framework.

Sample	Images
Human pose result:	Heatmaps combined into one image. Each joint has its own color:

PAFs combined into one image:	All heatmap candidates. Each candidate has its own confidence which defines its opacity on the image:

Closer look at heatmap candidates corresponding to a head:	Closer look at heatmap candidates corresponding to a neck:

PAF matrix corresponding to a head-neck connection candidate. The head and neck heatmap joints are also shown on the image:	PAF matrix corresponding to an LShoulder-LElbow connection candidate. The LShoulder and LElbow heatmap joints are also shown on the image:
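A connection candidate such as head-neck is commonly scored by sampling the PAF along the segment between the two joint candidates and measuring how well the field aligns with the segment direction, in the spirit of the openpose approach this project is based on. The sketch below is an illustration under that assumption, not the framework's code: the function name, the number of samples, and nearest-cell sampling are choices made for the example.

```python
import math


def paf_score(paf_x, paf_y, start, end, samples=10):
    """Average alignment of the PAF with the segment start -> end.

    paf_x, paf_y: 2-D lists of floats (rows of columns).
    start, end: (col, row) joint candidates in matrix coordinates.
    Returns a value near 1 when the field points from start to end.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector along the segment
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1) if samples > 1 else 0.0
        col = int(round(start[0] + t * dx))  # nearest cell, no interpolation
        row = int(round(start[1] + t * dy))
        total += paf_x[row][col] * ux + paf_y[row][col] * uy
    return total / samples
```

Candidates whose score falls below some cutoff would be rejected, and the remaining ones compete so that each joint is used in at most one connection of a given type.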

Performance

Time to process one frame (1-2 persons in the view)

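| Stage | NN input size | iPhone XR (ms) | iPhone 8 (ms) | iPhone 5S (ms) |
|---|---|---|---|---|
| CoreML | 512 x 512 | 190 | 3670 | 20801 |
| CoreML | 256 x 256 | 70 | 1039 | 7162 |
| Post-processing | 512 x 512 | 19 | 67 | 100 |
| Post-processing | 256 x 256 | 5 | 35 | |
| Total | 512 x 512 | 219 | 3737 | 20901 |
| Total | 256 x 256 | 75 | 1074 | 7200 |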

All numbers above may vary from run to run.

The resulting pose depending on the NN input size (the smaller and faster, the less accurate the result)

512 x 512	256 x 256

Applications

Healthcare

Detecting anomalies in the human spine on still images:
Health and fitness guide.

Home security and automation (not related to mobile phones)

Detecting whether people are at home and checking that all equipment is switched off (iron/oven).
Locating people inside the living area and triggering automation (turning on lights/music/TV).

Improvements

NMS optimization: a parallel GPU implementation using the Metal API.
Use a different approximation for joints connection that is closer to real-life skeleton bones. Bones are not straight.
Implement more robust filtering for the output pose to get rid of artifacts.
Implement pose estimation on a video stream.
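For context on the NMS item above: non-maximum suppression on a heatmap keeps only the cells that are local maxima above a threshold. The following CPU reference is a sketch for illustration, not the framework's implementation. The function name, the 3x3 neighborhood, and the threshold are assumptions.

```python
def heatmap_peaks(heatmap, threshold=0.1):
    """Return (col, row, value) for cells that beat all 8 neighbors.

    heatmap: 2-D list of floats (rows of columns).
    A cell is kept if its value is at least `threshold` and strictly
    greater than every neighbor inside the matrix bounds.
    """
    rows = len(heatmap)
    cols = len(heatmap[0]) if rows else 0
    peaks = []
    for row in range(rows):
        for col in range(cols):
            value = heatmap[row][col]
            if value < threshold:
                continue
            neighbors = [
                heatmap[r][c]
                for r in range(max(0, row - 1), min(rows, row + 2))
                for c in range(max(0, col - 1), min(cols, col + 2))
                if (r, c) != (row, col)
            ]
            if all(value > other for other in neighbors):
                peaks.append((col, row, value))
    return peaks
```

Each cell is tested independently of the others, which is what makes this step a natural fit for one GPU thread per cell.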

In-Depth information

Some fun


The image was taken from Magic Poser
