We compare MMPose with other popular pose estimation frameworks and official releases in terms of training speed and accuracy. All comparisons use the same data and model settings.
To ensure a fair comparison, all experiments were run on the same hardware and with the same dataset. For each model setting, we kept the data pre-processing identical so that every toolbox receives the same feature input. In addition, all compared toolboxes load data through Memcached, a distributed memory-caching system, which minimizes I/O time during benchmarking.
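The exact Memcached deployment is cluster-specific, but the loading path looks roughly like the sketch below, which fetches encoded image bytes through `mmcv.FileClient` with the memcached backend and decodes them in memory. The `.conf` paths, `sys_path`, and image key are placeholders for a site-specific setup.

```python
import mmcv

# Minimal sketch of reading images from Memcached instead of disk.
# The server/client config paths and sys_path are placeholders that depend
# on the local Memcached deployment; the image must already be cached.
file_client = mmcv.FileClient(
    backend='memcached',
    server_list_cfg='/path/to/server_list.conf',
    client_cfg='/path/to/client.conf',
    sys_path='/path/to/memcached_python_bindings')

img_bytes = file_client.get('data/coco/train2017/000000000009.jpg')
img = mmcv.imfrombytes(img_bytes, flag='color')  # decode in memory, no extra disk I/O
print(img.shape)
```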
The reported time is the average training time per iteration, including data processing and model training. Training speed is measured in s/iter; lower is better.
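For illustration, the s/iter figure can be reproduced with a simple timing wrapper around a training loop. The sketch below is not MMPose's internal timer; `model`, `criterion`, `optimizer`, and `data_loader` are assumed to be already constructed, and a plain supervised step stands in for the actual heatmap training step.

```python
import time
import torch

def average_iter_time(model, criterion, optimizer, data_loader,
                      num_iters=200, warmup=20, device='cuda'):
    """Estimate the average training time per iteration (s/iter).

    Warmup iterations are excluded so that CUDA initialization and caching
    do not skew the mean. Timing starts before the batch is fetched, so data
    processing is included, matching how the numbers in the table were measured.
    The data_loader is assumed to yield at least num_iters batches.
    """
    model.train()
    times = []
    data_iter = iter(data_loader)
    for i in range(num_iters):
        start = time.perf_counter()               # timing covers data loading too
        inputs, targets = next(data_iter)
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)  # forward pass + loss
        loss.backward()                           # backward pass
        optimizer.step()
        torch.cuda.synchronize()                  # wait for queued GPU kernels
        if i >= warmup:
            times.append(time.perf_counter() - start)
    return sum(times) / len(times)                # average s/iter; lower is better
```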
We demonstrate the advantage of MMPose in both speed and accuracy on the standard COCO keypoint detection benchmark, using mAP (mean average precision) as the evaluation metric.
Model | Input size | MMPose (s/iter) | HRNet (s/iter) | MMPose (mAP) | HRNet (mAP) |
---|---|---|---|---|---|
resnet_50 | 256x192 | 0.28 | 0.64 | 0.718 | 0.704 |
resnet_50 | 384x288 | 0.81 | 1.24 | 0.731 | 0.722 |
resnet_101 | 256x192 | 0.36 | 0.84 | 0.726 | 0.714 |
resnet_101 | 384x288 | 0.79 | 1.53 | 0.748 | 0.736 |
resnet_152 | 256x192 | 0.49 | 1.00 | 0.735 | 0.720 |
resnet_152 | 384x288 | 0.96 | 1.65 | 0.750 | 0.743 |
hrnet_w32 | 256x192 | 0.54 | 1.31 | 0.746 | 0.744 |
hrnet_w32 | 384x288 | 0.76 | 2.00 | 0.760 | 0.758 |
hrnet_w48 | 256x192 | 0.66 | 1.55 | 0.756 | 0.751 |
hrnet_w48 | 384x288 | 1.23 | 2.20 | 0.767 | 0.763 |
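The mAP values above follow the standard COCO keypoint evaluation protocol. Assuming the tested model's predictions have already been written to a COCO-format keypoint result file (the file names below are placeholders), the metric can be computed with pycocotools roughly as follows:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths: COCO val2017 keypoint annotations and a COCO-format
# keypoint result file produced by the model under test.
coco_gt = COCO('annotations/person_keypoints_val2017.json')
coco_dt = coco_gt.loadRes('result_keypoints.json')

evaluator = COCOeval(coco_gt, coco_dt, iouType='keypoints')
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()      # prints AP / AR at several OKS thresholds

mAP = evaluator.stats[0]   # AP @ OKS=0.50:0.95, the value reported in the table
print(f'mAP: {mAP:.3f}')
```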
All benchmarks were run with the following hardware and software environment:

- 8 NVIDIA Tesla V100 (32G) GPUs
- Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
- Python 3.7
- PyTorch 1.4
- CUDA 10.1
- CUDNN 7.6.03
- NCCL 2.4.08