RCRwPreview.mp4 (video): our head pose predictions on the Biwi [8] dataset.
- 29.03.2023 Added a webcam demo ([here])
- 20.03.2023 Fixed a bug in the 6DRepNet evaluation. 6DRepNet was trained with BGR images, but we used RGB images for the evaluation. With this update, BGR images are used instead, and the performance is now similar to the paper.
Are head pose estimation results comparable? Not really.
We provide a comprehensive analysis of factors associated with the evaluation of head pose estimation methods.
We focus on the popular Biwi Kinect Head Pose Database (Biwi) [8] and show that different processing leads to incomparable test sets (Biwi variants).
What you can find:
- Comprehensive evaluation of head pose estimation methods on Biwi variants
- Models, checkpoints and test code for our works
- Code to reproduce and evaluate on different Biwi variants
- Biwi+ (file) [3]
  - Manually checked face bounding boxes for all frames of Biwi [8]
  - Pose labels in the RGB camera frame and z-y'-x'' rotation sequence
- Face bounding boxes and test sets (subsets) for Biwi [8] used by other authors; we call these "Biwi variants"
- A PyTorch Biwi variant dataset (file) to easily load the Biwi variants
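For illustration, loading a Biwi variant with the provided PyTorch dataset might look like the sketch below; the class name is taken from the file hpp/BiwiDataset.py, but the constructor arguments and return format are assumptions, not the exact API.

```python
# Hypothetical usage sketch; constructor arguments and return format are assumptions.
from torch.utils.data import DataLoader
from hpp.BiwiDataset import BiwiDataset  # reads images from "path_biwi" set in the module

dataset = BiwiDataset()
loader = DataLoader(dataset, batch_size=32, shuffle=False)
for images, poses in loader:  # the exact items returned depend on the implementation
    pass  # run a head pose estimator here
```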
Biwi variants: the different face bounding boxes used by different methods for cropping Biwi [8].
**Do different face detectors result in different test sets?**
Yes. The differences are quite drastic, as the face detector determines which subset of the original Biwi files is used as the test set (e.g., over 15% of Biwi images are skipped for the FSA-Net variant).
**Do different test sets change head pose estimation performance?**
Yes. The performance differences sometimes appear bigger than method-related gains.
**Is it important to use the same face detector for training and testing?**
No/depends. We can achieve similar performance if we post-process the detections of different face detection algorithms to produce bounding boxes similar to the ones used during training (i.e., a similar face crop); this requires a known mapping, as sketched below. Sometimes similar performance can be achieved with boxes from a detector not used during training (this depends on the method). However, we notice that even changing the box size by one pixel can change the results.
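To make the "known mapping" concrete, the sketch below post-processes a detector box toward a training-time crop convention; the scale and vertical-shift values are purely illustrative and would have to be estimated by comparing both detectors' boxes on the same images.

```python
# Illustrative box remapping; "scale" and "dy" are hypothetical values that
# must be fitted for a concrete pair of face detectors.
def remap_box(x1, y1, x2, y2, scale=1.1, dy=-0.05):
    w, h = x2 - x1, y2 - y1
    cx = x1 + w / 2            # box center (x)
    cy = y1 + h / 2 + dy * h   # box center (y), shifted slightly upward
    s = max(w, h) * scale      # enlarged square crop side length
    return cx - s / 2, cy - s / 2, cx + s / 2, cy + s / 2
```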
**Does it matter in which rotation representation (Euler angle rotation order), e.g., z-y'-x'' (we call pyr) or x-y'-z'' (we call ypr), we evaluate our methods?**
Yes. The results can be quite different and are not comparable, as the sketch below illustrates.
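As a minimal sketch of why the order matters, the snippet below converts the same ground-truth/prediction pair into both intrinsic rotation sequences with SciPy (uppercase axis strings denote intrinsic rotations); the angle values are made up for illustration.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

gt = R.from_euler("ZYX", [30, 20, 10], degrees=True)    # z-y'-x'' (pyr)
pred = R.from_euler("ZYX", [33, 18, 12], degrees=True)  # illustrative prediction

for order in ("ZYX", "XYZ"):  # z-y'-x'' vs. x-y'-z''
    err = np.abs(gt.as_euler(order, degrees=True) - pred.as_euler(order, degrees=True))
    print(order, "per-axis error:", err.round(2), "MAE:", err.mean().round(2))
```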
**Does correcting the pose from the depth camera to the RGB camera for Biwi improve results? (Why do we need this?)**
No, using no calibration seems to be better. A possible explanation could be a global offset of the center pose (0,0,0) between datasets. (Let us know if you find an explanation.)
**Is SOTA performance for Biwi on the current paperswithcode leaderboard meaningful?**
Not really. For example, Hopenet (2018) reported an MAE of 4.89 but achieves an MAE of 3.82 on the Biwi variant used by FSA-Net (2019).
Therefore, we suggest evaluating and comparing results with precisely defined evaluation protocols, and reporting those protocols. Biwi+ is a step in this direction: it provides a fixed test set with face bounding boxes for all Biwi images.
All results can be found [here]. This section is just a compilation of results on Biwi processed like Biwi+ [3] and Biwi processed like FSA-Net [5], except that we generously selected the best-performing face crop for each method. Some methods achieve better results than reported in their publications. MAE denotes the mean absolute error over pitch, yaw, and roll in degrees (e.g., for FSA-Net: (4.78 + 4.29 + 2.66)/3 ≈ 3.91).
Method | MAE [°] | Pitch [°] | Yaw [°] | Roll [°] | Format | Test Set | Num Images | Training Set | Crop | Unsup. Training on Test Set | Calibrated Biwi |
---|---|---|---|---|---|---|---|---|---|---|---|
WHENet 2020 | 4.79 | 5.06 | 6.00 | 3.33 | ypr | Biwi (FSA-Net) | 13219 | 300W-LP | Biwi+ (DLIB+manual) | ✖ | ✖ |
FSA-Net 2019 | 3.91 | 4.78 | 4.29 | 2.66 | ypr | Biwi (FSA-Net) | 13219 | 300W-LP | Biwi+ (DLIB+manual) | ✖ | ✖ |
Hopenet 2018 | 3.82 | 4.75 | 3.98 | 2.73 | ypr | Biwi (FSA-Net) | 13219 | 300W-LP | Biwi+ -> Dockerface, Hopenet | ✖ | ✖ |
RCRw (proposed) 2023 | 3.63 | 4.51 | 3.78 | 2.60 | ypr | Biwi (FSA-Net) | 13219 | 300W-LP | Biwi+ (DLIB+manual) | ✔ | ✖ |
6DRepNet 2022 | 3.41 | 3.92 | 3.70 | 2.60 | ypr | Biwi (FSA-Net) | 13219 | 300W-LP | Biwi+ -> MTCNN, FSA-Net | ✖ | ✖ |
PADACO 2019 | 3.69 | 4.20 | 3.31 | 3.56 | ypr | Biwi (FSA-Net) | 13219 | SynHead++ | Biwi+ (DLIB+manual) | ✔ | ✖ |
RCRw (proposed) 2023 | 3.34 | 3.91 | 3.43 | 2.68 | ypr | Biwi (FSA-Net) | 13219 | SynHead++ | Biwi+ (DLIB+manual) | ✔ | ✖ |
Except for Hopenet, all methods perform best using the Biwi+ face bounding boxes.
Method | MAE [°] | Pitch [°] | Yaw [°] | Roll [°] | Format | Test Set | Num Images | Training Set | Crop | Unsup. Training on Test Set | Calibrated Biwi |
---|---|---|---|---|---|---|---|---|---|---|---|
WHENet | 7.25 | 8.00 | 8.05 | 5.72 | pyr | Biwi+ | 15678 | 300W-LP | Biwi+ (DLIB+manual) | ✖ | ✔ |
FSA-Net | 5.75 | 6.43 | 6.27 | 4.55 | pyr | Biwi+ | 15678 | 300W-LP | Biwi+ (DLIB+manual) | ✖ | ✔ |
Hopenet | 5.73 | 7.65 | 5.32 | 4.21 | pyr | Biwi+ | 15678 | 300W-LP | Biwi+ -> Dockerface, Hopenet | ✖ | ✔ |
RCRw (proposed) | 4.55 | 6.34 | 4.55 | 2.74 | pyr | Biwi+ | 15678 | 300W-LP | Biwi+ (DLIB+manual) | ✔ | ✔ |
6DRepNet | 4.39 | 5.19 | 4.62 | 3.37 | pyr | Biwi+ | 15678 | 300W-LP | Biwi+ (DLIB+manual) | ✖ | ✔ |
PADACO | 4.13 | 4.51 | 4.11 | 3.78 | pyr | Biwi+ | 15678 | SynHead++ | Biwi+ (DLIB+manual) | ✔ | ✔ |
RCRw (proposed) | 3.86 | 4.73 | 3.95 | 2.89 | pyr | Biwi+ | 15678 | SynHead++ | Biwi+ (DLIB+manual) | ✔ | ✔ |
```bash
git clone --recurse-submodules https://github.com/kuhnkeF/headposeplus.git HeadPosePlus
cd HeadPosePlus
```
We assume a working Anaconda distribution and use Anaconda's virtual environment manager.
Change "path_biwi" in (hpp/BiwiDataset.py) to point to your copy of Biwi
- download Biwi Kinect Head Pose Database official website
- change "path_biwi" in (hpp/BiwiDataset.py) to your path containing the folders 01,02,03,...
- Download the model/checkpoint files, see here
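For example, the change in hpp/BiwiDataset.py could look like this (the path is a placeholder you must adapt):

```python
# In hpp/BiwiDataset.py -- illustrative placeholder path:
path_biwi = "/your/path/to/Biwi/hpdb"  # must contain the folders 01, 02, 03, ...
```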
```bash
chmod +x create_pytorch_env.sh
chmod +x create_tensorflow_env.sh
chmod +x eval_all.sh
./create_pytorch_env.sh
./create_tensorflow_env.sh
./eval_all.sh
```
To run the code we decided to use two environments:
- A PyTorch environment for the evaluation of PADACO, RCRw, Hopenet, and 6DRepNet
- A TensorFlow environment with Keras to evaluate FSA-Net and WHENet
The following scripts set up the environments and install the dependencies: `create_pytorch_env.sh` and `create_tensorflow_env.sh`.
Run `eval_all.sh` (or check out the eval_* Python files) to compute the results.
Precomputed results can be found in the /results folder.
- Unsupervised validation/model selection (when to stop training?) is another point that leads to incomparable/unfair results (this is the case for many UDA works and cross-dataset evaluation)
- Why is the original Biwi+ missing 1 image (15677 instead of 15678)?
  It is the first image of the dataset (01/frame_00003_rgb.png): the frame_00003_pose.bin file is missing in the annotations. In this updated version we simply copied the bounding box from 01/frame_00004. For our work, this does not change the results, as the change in error is smaller than 0.005.
- In [1] we report the mean result over 10 different models. We only provide and evaluate one of them here.
Biwi was intended for developing algorithms that work on depth images alone. The annotated poses (ground truth) are therefore in the coordinate frame of the depth camera. The parameters (intrinsic, extrinsic) of the RGB camera in relation to the depth camera are provided by the authors, so it is possible to transform the ground truth to the RGB camera coordinate frame. A simple test to validate the pose is to render the provided head models and overlay them on top of the RGB images: only with this "calibration" do the face in the image and the rendered head overlap correctly.
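As a sketch of this transformation: assuming extrinsics (R_ext, t_ext) that map depth-camera coordinates to RGB-camera coordinates, a head pose (R_head, t_head) given in the depth frame moves to the RGB frame as below. The identity/zero placeholders stand in for the values shipped with Biwi's calibration files.

```python
import numpy as np

# Placeholders: substitute the values from Biwi's per-subject calibration files.
R_ext = np.eye(3)      # depth -> RGB rotation (extrinsic)
t_ext = np.zeros(3)    # depth -> RGB translation (extrinsic)
R_head = np.eye(3)     # ground-truth head rotation, depth camera frame
t_head = np.zeros(3)   # ground-truth head position, depth camera frame

# A head-local point x maps to p_depth = R_head @ x + t_head, and
# p_rgb = R_ext @ p_depth + t_ext, hence:
R_head_rgb = R_ext @ R_head
t_head_rgb = R_ext @ t_head + t_ext
```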
Please acknowledge the effort by citing the corresponding papers in your publications.
We hope our code and data help your research.
[1]
@ARTICLE{kuhnke23_RelativePose_TBIOM,
author={Kuhnke, Felix and Ostermann, Jörn},
journal={IEEE Transactions on Biometrics, Behavior, and Identity Science},
title={Domain Adaptation for Head Pose Estimation Using Relative Pose Consistency},
year={2023},
volume={},
number={},
pages={1-1},
doi={10.1109/TBIOM.2023.3237039}}
@INPROCEEDINGS{kuhnke21_RelativePose_FG,
title={Relative Pose Consistency for Semi-Supervised Head Pose Estimation},
author={Kuhnke, Felix and Ihler, Sontje and Ostermann, Jörn},
booktitle={16th IEEE International Conference on Automatic Face and Gesture Recognition (FG)},
year={2021},
pages={01--08},
doi={10.1109/FG52635.2021.9666992}}
@inproceedings{kuhnke19_PADACO_ICCV,
title={Deep head pose estimation using synthetic images and partial adversarial domain adaption for continuous label spaces},
author={Kuhnke, Felix and Ostermann, Jörn},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
pages={10164--10173},
year={2019}}
Our models are restricted to research purposes only. Please cite our work whenever publishing anything using our data, code, or models. Please check the licenses of the linked models and code.
[1] Kuhnke, Felix and Ostermann, Jörn, "Domain Adaptation for Head Pose Estimation Using Relative Pose Consistency", 2023
ieee
tnt preprint
BibTeX
[2] Kuhnke, Felix and Ihler, Sontje and Ostermann, Jörn, "Relative Pose Consistency for Semi-Supervised Head Pose Estimation", 2021
ieee
tnt
BibTeX
[3] Kuhnke, Felix and Ostermann, Jörn, "Deep head pose estimation using synthetic images and partial adversarial domain adaption for continuous label spaces", 2019
ieee
tnt
BibTeX
[4] Ruiz, Nataniel and Chong, Eunji and Rehg, James M, "Fine-Grained Head Pose Estimation Without Keypoints", 2018
ieee
cvf
arxiv
github
[5] Yang, Tsun-Yi and Chen, Yi-Ting and Lin, Yen-Yu and Chuang, Yung-Yu, "FSA-Net: Learning fine-grained structure aggregation for head pose estimation from a single image", 2019
ieee
cvf
github
[6] Hempel, Thorsten and Abdelrahman, Ahmed A. and Al-Hamadi, Ayoub, "6d Rotation Representation For Unconstrained Head Pose Estimation", 2022
ieee
arxiv
github
[7] Zhou, Yijun and Gregson, James, "WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose", 2020
bmvc
arxiv
github
[8] Fanelli, Gabriele and Dantone, Matthias and Gall, Juergen and Fossati, Andrea and Van Gool, Luc, "Random Forests for Real Time 3D Face Analysis", 2013
springer
iai
[9] Kaipeng Zhang and Zhanpeng Zhang and Zhifeng Li and Yu Qiao, "Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks", 2016
ieee
arxiv
[10] Nataniel Ruiz and James M. Rehg, "Dockerface: an Easy to Install and Use Faster R-CNN Face Detector in a Docker Container", 2017
arxiv
github
[11] Joseph Redmon and Ali Farhadi, "YOLOv3: An Incremental Improvement", 2018
arxiv