mducducd/VQ_DDIM

Data

Data organization under data/ follows data/partition/modality/site/chat_index/person_index/clip_index/actual_data_files. An example of the data structure:

data
├── test
├── val
├── train
   ├── Video_files
       ├── NoXI
           ├── 010_2016-03-25_Paris
               ├── Expert_video
               ├── Novice_video
                   ├── 1
                       ├── 1.png
                       ├── ....
                       ├── 751.png
                   ├── ....
           ├── ....
       ├── RECOLA
   ├── Audio_files
       ├── NoXI
       ├── RECOLA
           ├── group-1
               ├── P25 
               ├── P26
                   ├── 1.wav
                   ├── ....
           ├── group-2
           ├── group-3
   ├── Emotion
       ├── NoXI
       ├── RECOLA
           ├── group-1
               ├── P25 
               ├── P26
                   ├── 1.csv
                   ├── ....
           ├── group-2
           ├── group-3
   ├── 3D_FV_files
       ├── NoXI
       ├── RECOLA
           ├── group-1
               ├── P25 
               ├── P26
                   ├── 1.npy
                   ├── ....
           ├── group-2
           ├── group-3
            
  • The task is to predict one role's reaction ('Expert' or 'Novice', 'P25' or 'P26', ...) to the other ('Novice' or 'Expert', 'P26' or 'P25', ...).
  • 3D_FV_files contain the extracted 3DMM coefficients, including expression (52-dim), angle (3-dim), and translation (3-dim) coefficients; see the loading sketch after this list.
  • The processed videos in each site have a frame rate of 25 fps and a resolution of 256 x 256 (height x width). Each video clip has 751 frames (about 30 s). The sampling rate of the audio files is 44100 Hz.
  • The csv files for the baseline training and validation dataloaders are available at 'data/train.csv' and 'data/val.csv'.
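
A minimal sketch of reading one 3DMM clip with NumPy. The example path and the coefficient ordering (52 expression, then 3 angle, then 3 translation) are assumptions based on the layout above; the repo's own dataloader may differ.

import numpy as np

# Illustrative path following data/partition/modality/site/chat_index/person_index/...
coeffs = np.load("data/train/3D_FV_files/RECOLA/group-1/P25/1.npy")  # e.g. shape (751, 58)

expression  = coeffs[..., :52]     # 52-dim expression coefficients
angle       = coeffs[..., 52:55]   # 3-dim head pose angles
translation = coeffs[..., 55:58]   # 3-dim translation
print(expression.shape, angle.shape, translation.shape)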

Environment

PyTorch3D should be installed first; follow the instructions in [PyTorch3D](https://github.com/facebookresearch/pytorch3d/blob/main/INSTALL.md).

Our setup (a quick sanity check is sketched after this list):

  • Python 3.9
  • PyTorch 1.9+
  • CUDA 11.6
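
A minimal import check for the setup above (a sketch; adjust to your install):

import torch
import pytorch3d

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("PyTorch3D:", pytorch3d.__version__)
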
Training

BeLFusion

  • First, train the vector-quantized variational autoencoder (VQ-VAE):
python train_vq_belfusion.py config=config/1_belfusion_vae.yaml name=All_VQVAEv2_W50
  • Once that finishes, you can train the offline/online variants of BeLFusion with the desired value of k (see the config-override sketch after this command):
python train_belfusion.py config=config/2_belfusion_ldm.yaml name=<NAME> arch.args.k=<INT (1 or 10)> arch.args.online=<BOOL>
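
The key=value arguments above look like OmegaConf-style overrides; the sketch below shows how such overrides are typically merged with the YAML config (an assumption, check the training scripts for the exact parsing):

from omegaconf import OmegaConf

# Load the base YAML and merge any `a.b.c=value` overrides passed on the command line.
base = OmegaConf.load("config/2_belfusion_ldm.yaml")
cli  = OmegaConf.from_cli()   # picks up e.g. name=run1 arch.args.k=10 arch.args.online=True
cfg  = OmegaConf.merge(base, cli)
print(OmegaConf.to_yaml(cfg))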

Validation

python evaluate.py  --resume ./results/train_offline/best_checkpoint.pth  --gpu-ids 1  --outdir results/val_offline --split val

  or

python evaluate.py  --resume ./results/train_online/best_checkpoint.pth  --gpu-ids 1  --online --outdir results/val_online --split val
  • To compute FID (FRRea), run the following script (a programmatic alternative is sketched after the command):
python -m pytorch_fid  ./results/val_offline/fid/real  ./results/val_offline/fid/fake
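
A programmatic alternative to the pytorch_fid command above (a sketch; the real/fake directories must already be populated by the evaluation step):

import torch
from pytorch_fid.fid_score import calculate_fid_given_paths

fid = calculate_fid_given_paths(
    ["./results/val_offline/fid/real", "./results/val_offline/fid/fake"],
    batch_size=50,
    device="cuda" if torch.cuda.is_available() else "cpu",
    dims=2048,  # default InceptionV3 pool3 feature dimension
)
print("FID:", fid)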

Supplement

Use evaluate_no_render.py for the BeLFusion metric evaluations only (audio-visual data are not loaded).
