This repo includes inference code for our Landmark Deep Equilibrium Model network (LDEQ), which was developed during an internship at Nvidia by myself, Pavlo Molchanov, Arash Vahdat, Hongxu Yin and Jan Kautz.
The weights we provide here were reproduced on different hardware than the hardware used for the paper experiments, and results may be slightly different.
This work uses deep equilibrium models to add a form of recurrence at test time, without having access to a recurrent loss at train time. This can be used to improve temporal coherence in video landmark detection when the model is trained on still images.
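To make the idea of test-time recurrence concrete, here is a toy sketch that is not the LDEQ model. It assumes that the recurrence comes from initialising the fixed point solver on each frame with the solution found on the previous frame. The function `f`, the scalar "frames" and the tolerance are all made up for illustration.

```python
import math

def f(z, x):
    """Toy contraction in z: its fixed point z* satisfies z* = f(z*, x)."""
    return 0.5 * math.tanh(z) + x

def solve_fixed_point(x, z_init, tol=1e-6, max_iter=100):
    """Iterate z <- f(z, x) until the update is smaller than tol."""
    z = z_init
    for n_iter in range(1, max_iter + 1):
        z_next = f(z, x)
        if abs(z_next - z) < tol:
            return z_next, n_iter
        z = z_next
    return z, max_iter

# Slowly varying scalar inputs stand in for consecutive video frames.
frames = [1.00, 1.01, 1.02, 1.03]

cold_iters, warm_iters = [], []
z_prev = 0.0
for x in frames:
    _, n_cold = solve_fixed_point(x, z_init=0.0)          # restart from scratch
    z_prev, n_warm = solve_fixed_point(x, z_init=z_prev)  # reuse previous frame
    cold_iters.append(n_cold)
    warm_iters.append(n_warm)

print("cold start iterations:", cold_iters)
print("warm start iterations:", warm_iters)
```

Because nearby frames have nearby fixed points, the warm-started solver begins closer to the answer, and consecutive predictions are tied together even though nothing recurrent was used when defining `f`.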
Please check out this video for a demo:
conda create -y -n ldeq python=3.8
conda activate ldeq
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install opencv-python==4.6.0.66 matplotlib==3.5.1 scipy==1.7.3 torchinfo==1.7.0 pandas
I haven't tried other versions of these libraries, so use them at your own risk.
NB: if opencv gives weird errors, run pip uninstall opencv-python and then pip install opencv-python-headless==3.5.5.64
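If you want to confirm that your environment matches the pinned versions, a small check along these lines may help. It is a sketch that only uses the standard library: the `EXPECTED` table is copied from the pip commands above, `check_versions` is a hypothetical helper name, and the build suffix (such as `+cu113`) is deliberately ignored when comparing.

```python
from importlib import metadata

# Pinned versions copied from the pip commands above.
EXPECTED = {
    "torch": "1.12.0+cu113",
    "torchvision": "0.13.0+cu113",
    "torchaudio": "0.12.0",
    "opencv-python": "4.6.0.66",
    "matplotlib": "3.5.1",
    "scipy": "1.7.3",
    "torchinfo": "1.7.0",
}

def check_versions(expected):
    """Return {package: (wanted, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Compare only the part before "+", so the CUDA build tag is ignored.
        if installed is None or installed.split("+")[0] != wanted.split("+")[0]:
            mismatches[name] = (wanted, installed)
    return mismatches

if __name__ == "__main__":
    for name, (wanted, installed) in check_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

Note that if you switched to opencv-python-headless as described above, opencv-python will be reported as not installed, which is expected.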
- Download the WFLW dataset that was used for training and testing LDEQ on WFLW. We used the precropped version from the HIH repo and it can be downloaded there (WFLW.zip).
- (optional) Check that the dataset loads correctly by running datasets/WFLW/dataset.py, after updating the path there
- Download the LDEQ trained weights here. This particular model uses the Anderson acceleration solver.
- Run
python test_LDEQ_WFLW.py --landmark_model_weights /path/to/final.pth.tar --dataset_path /path/to/WFLW --workers 4 --batch_size 32
Someone has kindly uploaded the official dataset under his own name here. I made sure this is the same version I used for the results in the paper.
While all videos used in WFLW-V have creative commons licences, Nvidia has stricter internal privacy rules. As such, while I provide an official download link for the landmark and bbox annotations, I can only provide a Python script that lets you download and crop the videos from YouTube yourself. This is limiting because some videos may disappear in the future. If you want to preprocess the dataset yourself, do the following:
- Download official bboxes annotations here
- Download official landmark annotations here
- Install pytube
pip install pytube==12.1.2
- Run
python utils/download_WFLW_V.py --output_folder /path/to/WFLW_V --n_processes 16
Note that pytube may ask you to sign in to your Google account and enter a code when you first run this script, depending on the platform you're using. If things freeze after entering the code (due to multiprocessing), restart the Python script. You may need to restart this script several times if your connection gets closed by YouTube servers.
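If you would rather not restart the download by hand, a wrapper like the one below can re-run it for you. This is only a sketch: `run_with_retries` is a hypothetical helper, and it assumes that the download script exits with a non-zero status when the connection is closed and that re-running it is safe. It does not help with the freeze described above, since a frozen process never exits.

```python
import subprocess
import sys
import time

def run_with_retries(cmd, max_attempts=5, wait_seconds=10.0):
    """Re-run cmd until it exits with status 0; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} exited with code {result.returncode}", file=sys.stderr)
        if attempt < max_attempts:
            time.sleep(wait_seconds)  # give the remote server a short break
    raise RuntimeError(f"command still failing after {max_attempts} attempts")

# Example usage with the command from this README:
# run_with_retries([
#     sys.executable, "utils/download_WFLW_V.py",
#     "--output_folder", "/path/to/WFLW_V",
#     "--n_processes", "16",
# ])
```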
The folder structure should look like:
WFLW_V
|
|-- bboxes
|   |-- -3rDiJYQ6CQ.npy
|   |-- -3uh4B-qDcs.npy
|   |-- ...
|
|-- landmarks
|   |-- -3rDiJYQ6CQ.npy
|   |-- -3uh4B-qDcs.npy
|   |-- ...
|
|-- videos
    |-- -3rDiJYQ6CQ.mp4
    |-- -3uh4B-qDcs.mp4
    |-- ...
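Since some videos may fail to download, it can be useful to check that every video has matching annotations before running the test script. The sketch below assumes the layout shown above, with files named after the video ID; `check_wflw_v_layout` is a hypothetical helper and not part of this repo.

```python
from pathlib import Path

def check_wflw_v_layout(root):
    """Compare the video IDs found in videos/, bboxes/ and landmarks/."""
    root = Path(root)
    # The file stem is the video ID, e.g. "-3rDiJYQ6CQ".
    video_ids = {p.stem for p in (root / "videos").glob("*.mp4")}
    bbox_ids = {p.stem for p in (root / "bboxes").glob("*.npy")}
    landmark_ids = {p.stem for p in (root / "landmarks").glob("*.npy")}
    return {
        "missing_bboxes": sorted(video_ids - bbox_ids),
        "missing_landmarks": sorted(video_ids - landmark_ids),
        "missing_videos": sorted((bbox_ids | landmark_ids) - video_ids),
    }

if __name__ == "__main__":
    report = check_wflw_v_layout("/path/to/WFLW_V")
    for problem, ids in report.items():
        print(f"{problem}: {len(ids)}")
```

A non-empty `missing_videos` list most likely means those videos could not be downloaded.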
- Download the LDEQ trained weights here. This particular model uses the fixed point iteration solver, which is faster. These weights are different from the ones above because they were obtained with more augmentations (in particular more crops). This helps performance on WFLW_V but worsens performance on WFLW.
- Edit or pass in the arguments in test_LDEQ_WFLW_V.py and run it. You can uncomment the line solver.test_all_videos_sequential(RWR=args.rwr, plot=True) in the main() function if you want to visualize video predictions alongside the ground truth.
Nvidia won't release the training code, sorry :(
If you find this work useful, please consider citing us:
@InProceedings{Micaelli_2023_CVPR,
author = {Micaelli, Paul and Vahdat, Arash and Yin, Hongxu and Kautz, Jan and Molchanov, Pavlo},
title = {Recurrence Without Recurrence: Stable Video Landmark Detection With Deep Equilibrium Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2023},
pages = {22814-22825}
}