rd20karim/M2T-Interpretable

Description

[BMVC 2024] Official implementation of our paper on interpretable motion-to-text generation:

GIF 1

If you find this code or paper useful in your work, please cite:

@INPROCEEDINGS{radouane2024guided,
      title={Guided Attention for Interpretable Motion Captioning}, 
      author={Karim Radouane and Andon Tchechmedjiev and Sylvie Ranwez and Julien Lagarde},
      booktitle = {Proceedings of the 35th British Machine Vision Conference},
      year={2024}
}

Quick start

conda env create -f environment.yaml
conda activate wbpy310
python -m spacy download en_core_web_sm

You also need to install wandb for hyperparameter tuning: pip install wandb

Preprocess datasets

For both HumanML3D and KIT-ML (augmented versions), you can follow the steps described here: project link

Model weights

You can download the best models here: models_weights

Dataset      Run ID     Attention supervision
Human-ML3D   ge2gc507   [0,0]
Human-ML3D   ba9hhkji   [0,3]
Human-ML3D   v6tv9rsx   [2,3]
Human-ML3D   hnjlc7r6   [3,3]
KIT-ML       u8qatt2y   [0,0]
KIT-ML       yxitfbp7   [0,3]
KIT-ML       ton5mfwh   [2,3]
KIT-ML       lupw04om   [1,3]

The attention supervision parameters refer, respectively, to the spatial and adaptive attention guidance weights.
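
To make the notation concrete: each pair can be read as the weights applied to the spatial and adaptive attention guidance terms added to the captioning loss, with [0,0] corresponding to no guidance. A minimal sketch of this combination (illustrative only; the function and variable names are not taken from the repository's code):

def guided_loss(ce_loss, spatial_loss, adaptive_loss, lambdas=(2, 3)):
    # lambdas = (spatial weight, adaptive weight), e.g. [2, 3] in the table above;
    # [0, 0] corresponds to training without attention guidance.
    return ce_loss + lambdas[0] * spatial_loss + lambdas[1] * adaptive_loss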

Code functionalities

  • In evaluation mode, all given arguments should correspond to the selected model weights path.
  • When providing a predefined config file, it is not necessary to give all the information; missing values will be inferred.
  • For each argument specified below, all available choices can be displayed by running:
python name_script.py --help

Training with hyperparameter tuning

To tune hyperparameters and visualize training progress, we use Wandb.

The hyperparameter search space can be set directly in the file ./configs/LSTM_{dn}.yaml, where dn is the dataset name (kit or h3D).

python train_wandb.py --config {config_path} --dataset_name {dataset_name}
  • The config path specifies the model to train and the hyperparameters to experiment with; other values can be added by editing the config file of the chosen model.
  • SEED is fixed to ensure the same model initialization across runs, for reproducibility.
  • Replace the variables project_path and aug_path with your absolute data paths.
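
For orientation, a wandb-based tuning run roughly follows the pattern below (a sketch only; the actual search space is read from the YAML config above by train_wandb.py, and the parameter names here are assumptions):

import wandb

# Illustrative sweep definition; parameter names are placeholders.
sweep_config = {
    "method": "grid",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "hidden_size": {"values": [128, 256]},
        "learning_rate": {"values": [1e-3, 5e-4]},
    },
}

def train():
    wandb.init()
    config = wandb.config          # hyperparameters sampled for this run
    # ... build the model from `config`, train it, and report wandb.log({"val_loss": ...})

sweep_id = wandb.sweep(sweep_config, project="M2T-Interpretable")
wandb.agent(sweep_id, function=train)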

Evaluation

python evaluate_m2L.py --config {config_path} --dataset_name {dataset_name}
  • This script saves various results and model predictions, and computes NLP scores at the batch level. For corpus-level evaluation, use the following:

NLP scores

After saving the model predictions, you can run the script below, which computes the different NLP metrics and stores them in LaTeX format.

python nlg_evaluation.py
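
For reference, corpus-level BLEU can be computed from the saved predictions along these lines (a sketch using NLTK; the file and column names are assumptions, not the exact output of the scripts):

import pandas as pd
from nltk.translate.bleu_score import corpus_bleu

# Hypothetical file and column names; adapt them to the CSV written by evaluate_m2L.py.
df = pd.read_csv("LSTM_kit_preds.csv")
references = [[ref.split()] for ref in df["reference"]]   # one (or more) reference(s) per sample
hypotheses = [hyp.split() for hyp in df["prediction"]]
print("Corpus BLEU-4:", corpus_bleu(references, hypotheses))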

Visualizations

Attention maps+Pose animation

Generate skeleton animation and attention maps (adaptive+spatio-temporal):

set PYTHONPATH=Project_Absolute_Path
python visualizations/visu_word_attentions.py --path PATH --n_map NUMBER_ATTENTION_MAP --n_gifs NUMBER_3D_ANIMATIONS --save_results DIRECTORY_SAVE_PLOTS

The directory in which to save the visualizations can be set in the evaluation .yaml file or passed as the argument --save_results path

Animation examples

  • The transparency level of the gold box represents the temporal attention variation for each predicted motion word selected based on adaptive attention.
  • The disk radius of each keypoint indicates the magnitude of the corresponding spatial attention weight.
GIF 1 GIF 2 GIF 3
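
As a sketch of how these two encodings can be drawn (illustrative only; the array names and shapes are assumptions, not the repository's plotting code):

import matplotlib.pyplot as plt
import numpy as np

def draw_frame(ax, joints_2d, spatial_weights, temporal_weight):
    # joints_2d: (n_joints, 2) projected keypoints; spatial_weights: (n_joints,) attention
    # over joints; temporal_weight: scalar in [0, 1] for the current predicted word.
    sizes = 300 * spatial_weights / (spatial_weights.max() + 1e-8)   # disk radius ~ spatial attention
    ax.scatter(joints_2d[:, 0], joints_2d[:, 1], s=sizes, c="tab:blue")
    # Gold box whose transparency follows the temporal attention weight
    ax.add_patch(plt.Rectangle((joints_2d[:, 0].min(), joints_2d[:, 1].min()),
                               np.ptp(joints_2d[:, 0]), np.ptp(joints_2d[:, 1]),
                               color="gold", alpha=float(temporal_weight)))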

Interpretability analysis

GIF 1

The following steps can be explored for interpretability analysis:

  • Adaptive gate density

In visu_word_attention.py, you can provide a list of language words for which to display the density curves.

  • Motion words histograms

In visu_word_attention.py, you can provide a list of motion words for which to display the histogram plots.

  • Spatio-temporal maps

The following script generates spatio-temporal attention maps, as well as gate density distributions and motion word histograms; the figures are saved in the visualizations/model_name/ folder.

python visualizations/visu_word_attention.py --path model_path --dataset_name dataset_name --n_map NUMBER_ATTENTION_MAP --n_gifs NUMBER_3D_ANIMATIONS
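
As an illustration of the kind of density figure produced for a chosen word list (a minimal sketch, assuming the adaptive gate values have been collected per word; this is not the repository's plotting code):

import matplotlib.pyplot as plt

# gate_values: hypothetical mapping word -> list of adaptive gate values collected on the test set.
def plot_gate_density(gate_values, words=("walks", "arms", "picks")):
    for word in words:
        if word in gate_values:
            plt.hist(gate_values[word], bins=30, density=True, alpha=0.5, label=word)
    plt.xlabel("adaptive gate value")
    plt.ylabel("density")
    plt.legend()
    plt.show()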

Reproduce paper evaluations

You only need to provide the model path (preferably placed under ./models/model_name); all the config information is then inferred from the metadata stored within the model.

Write and save model predictions

Run the following to save the full test-set predictions of each model, for corpus-level BLEU evaluation.

python src/evaluate_m2L.py --path {model_path}

The config path follows the format ./configs/LSTM_{dataset_name}.yaml

Beam search

Beam search can be performed simply by adding the beam size argument --beam_size

python src/evaluate_m2L.py --path model_path --beam_size BEAM_SIZE

BEAM_SIZE: a value of 1 is equivalent to greedy search; values greater than 1 perform beam search.
This script will print the BLEU-4 score for each beam and write the beam predictions to the file:

LSTM_{args.dataset_name}_preds_{args.lambdas}_beam_size_{beam_size}.csv
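
For context, beam decoding follows the usual pattern sketched below (the decoder_step interface is an assumption, not the repository's API; beam_size=1 reduces to greedy search):

import torch

def beam_search(decoder_step, start_token, end_token, beam_size=3, max_len=30):
    # decoder_step(tokens) -> log-probabilities over the vocabulary for the next word.
    beams = [([start_token], 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == end_token:                      # finished hypotheses are kept as-is
                candidates.append((tokens, score))
                continue
            log_probs = decoder_step(tokens)                 # shape: (vocab_size,)
            top_lp, top_ids = torch.topk(log_probs, beam_size)
            for lp, idx in zip(top_lp.tolist(), top_ids.tolist()):
                candidates.append((tokens + [idx], score + lp))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_size]
    return beams[0][0]                                       # best-scoring token sequence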

License

This code is distributed under the MIT license.