mducducd/VQ_DDIM

Data

Data organization under data/ follows data/partition/modality/site/chat_index/person_index/clip_index/actual_data_files. An example of the data structure:

data
├── test
├── val
├── train
   ├── Video_files
       ├── NoXI
           ├── 010_2016-03-25_Paris
               ├── Expert_video
               ├── Novice_video
                   ├── 1
                       ├── 1.png
                       ├── ....
                       ├── 751.png
                   ├── ....
           ├── ....
       ├── RECOLA
   ├── Audio_files
       ├── NoXI
       ├── RECOLA
           ├── group-1
               ├── P25 
               ├── P26
                   ├── 1.wav
                   ├── ....
           ├── group-2
           ├── group-3
   ├── Emotion
       ├── NoXI
       ├── RECOLA
           ├── group-1
               ├── P25 
               ├── P26
                   ├── 1.csv
                   ├── ....
           ├── group-2
           ├── group-3
   ├── 3D_FV_files
       ├── NoXI
       ├── RECOLA
           ├── group-1
               ├── P25 
               ├── P26
                   ├── 1.npy
                   ├── ....
           ├── group-2
           ├── group-3
            
  • The task is to predict one role's reaction ('Expert' or 'Novice', 'P25' or 'P26', ...) to the other ('Novice' or 'Expert', 'P26' or 'P25', ...).
  • 3D_FV_files contain the extracted 3DMM coefficients, including expression (52-dim), angle (3-dim), and translation (3-dim) coefficients; see the loading sketch after this list.
  • The processed videos in each site have a frame rate of 25 fps and a resolution of 256 x 256 (height x width). Each video clip has 751 frames (about 30 s). The sampling rate of the audio files is 44100 Hz.
  • The csv files for the baseline training and validation dataloaders are available at 'data/train.csv' and 'data/val.csv'.
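
A minimal sketch of reading one 3DMM clip with NumPy. The example path and the coefficient ordering (52 expression, then 3 angle, then 3 translation) are assumptions based on the layout above; the repo's own dataloader may differ.

import numpy as np

# Illustrative path following data/partition/modality/site/chat_index/person_index/...
coeffs = np.load("data/train/3D_FV_files/RECOLA/group-1/P25/1.npy")  # e.g. shape (751, 58)

expression  = coeffs[..., :52]     # 52-dim expression coefficients
angle       = coeffs[..., 52:55]   # 3-dim head pose angles
translation = coeffs[..., 55:58]   # 3-dim translation
print(expression.shape, angle.shape, translation.shape)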

Environment

PyTorch3D should be installed first; follow the instructions in [PyTorch3D](https://github.com/facebookresearch/pytorch3d/blob/main/INSTALL.md).

Our setup (a quick sanity check is sketched after this list):

  • Python 3.9
  • PyTorch 1.9+
  • CUDA 11.6
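
A minimal import check for the setup above (a sketch; adjust to your install):

import torch
import pytorch3d

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("PyTorch3D:", pytorch3d.__version__)
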
Training

BeLFusion

  • First, train the vector-quantized variational autoencoder (VQ-VAE):
python train_vq_belfusion.py config=config/1_belfusion_vae.yaml name=All_VQVAEv2_W50
  • Once that finishes, you can train the offline/online variants of BeLFusion with the desired value of k (see the config-override sketch after this command):
python train_belfusion.py config=config/2_belfusion_ldm.yaml name=<NAME> arch.args.k=<INT (1 or 10)> arch.args.online=<BOOL>
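
The key=value arguments above look like OmegaConf-style overrides; the sketch below shows how such overrides are typically merged with the YAML config (an assumption, check the training scripts for the exact parsing):

from omegaconf import OmegaConf

# Load the base YAML and merge any `a.b.c=value` overrides passed on the command line.
base = OmegaConf.load("config/2_belfusion_ldm.yaml")
cli  = OmegaConf.from_cli()   # picks up e.g. name=run1 arch.args.k=10 arch.args.online=True
cfg  = OmegaConf.merge(base, cli)
print(OmegaConf.to_yaml(cfg))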

Validation

python evaluate.py  --resume ./results/train_offline/best_checkpoint.pth  --gpu-ids 1  --outdir results/val_offline --split val

  or

python evaluate.py  --resume ./results/train_online/best_checkpoint.pth  --gpu-ids 1  --online --outdir results/val_online --split val
  • To compute FID (FRRea), run the following script (a programmatic alternative is sketched after the command):
python -m pytorch_fid  ./results/val_offline/fid/real  ./results/val_offline/fid/fake
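
A programmatic alternative to the pytorch_fid command above (a sketch; the real/fake directories must already be populated by the evaluation step):

import torch
from pytorch_fid.fid_score import calculate_fid_given_paths

fid = calculate_fid_given_paths(
    ["./results/val_offline/fid/real", "./results/val_offline/fid/fake"],
    batch_size=50,
    device="cuda" if torch.cuda.is_available() else "cpu",
    dims=2048,  # default InceptionV3 pool3 feature dimension
)
print("FID:", fid)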

Supplement

Use evaluate_no_render.py for the BeLFusion metric evaluations only (audio-visual data are not loaded).
