MICap: A Unified Model for Identity-aware Movie Descriptions


Overview

This repository contains the official code for the CVPR 2024 paper "MICap: A Unified Model for Identity-aware Movie Descriptions".

Teaser Image

Installation

Run the following commands to clone the repository:

git clone https://github.com/katha-ai/MovieIdentityCaptioner-CVPR2024.git
cd MovieIdentityCaptioner-CVPR2024

Environment

Create the required conda environment by running the following command:

conda env create -f conda_env.yml

Data

Create a micap_data folder and set the data_dir flag in the config_base.yaml file to its path. Place every item listed below, except the SPICE Jar file and the Checkpoints, inside the micap_data folder, and enter their relative paths in the config file at the locations given in the Instructions column.

| Features | Instructions |
| --- | --- |
| CLIP Features | The unzipped folder path should be filled in for the input_clip_dir flag in the config_base.yaml file |
| Face Features | The unzipped folder path should be filled in for the input_arc_face_dir flag in the config_base.yaml file |
| I3D Features | The unzipped folder path should be filled in for the input_fc_dir flag in the config_base.yaml file |
| Face Clusters | The unzipped file path should be filled in for the input_arc_face_clusters flag in the config_base.yaml file |
| MICap Json | The unzipped file path should be filled in for the input_json flag in the config_base.yaml file |
| BERT Text Embeddings | The unzipped folder path (fillin_data/bert_text_gender_embedding) should be filled in for the bert_embedding_dir flag in the config_base.yaml file |
| H5 label file | The unzipped file path (LSMDC16_labels_fillin_new_augmented.h5) should be filled in for the input_label_h5 flag in the config_base.yaml file |
| Tokenizer | The unzipped folder path should be filled in for the tokenizer_path flag in the config_base.yaml file |
| SPICE Jar file | The unzipped file should be placed in the iSPICE directory |
| Checkpoints | Folder containing the checkpoints for full captioning and joint-training full captioning (CIDEr score), and for FITB and joint-training FITB (class accuracy) |
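Since most setup problems come from a mistyped relative path, a small check before launching can help. The sketch below is not part of the repository: it assumes the config has already been loaded into a flat Python dict whose keys are the flag names from the table above, and the helper name find_missing_paths is hypothetical.

```python
from pathlib import Path

# Flags from the table above whose values are paths relative to data_dir.
RELATIVE_PATH_FLAGS = [
    "input_clip_dir",
    "input_arc_face_dir",
    "input_fc_dir",
    "input_arc_face_clusters",
    "input_json",
    "bert_embedding_dir",
    "input_label_h5",
    "tokenizer_path",
]


def find_missing_paths(config: dict) -> list:
    """Return the flags whose path is unset or does not exist under data_dir.

    Assumes a flat dict of flags (hypothetical layout, not confirmed by the repo).
    """
    data_dir = Path(config["data_dir"])
    missing = []
    for flag in RELATIVE_PATH_FLAGS:
        value = config.get(flag)
        # A flag counts as missing if it is empty or points to nothing on disk.
        if not value or not (data_dir / value).exists():
            missing.append(flag)
    return missing
```

If PyYAML is available, the dict could be obtained with yaml.safe_load on config_base.yaml; an empty list from find_missing_paths then suggests that every feature path resolves inside micap_data.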

Training

The run_type flag in the config_base.yaml file determines which task MICap is trained on: fitb (fill-in-the-blanks) only, fc (full captioning) only, or both.

Make sure the overfit and checkpoint flags are set to False. Also, ensure that the feature paths, given relative to the data directory, are set correctly in the config_base.yaml file.

Once the YAML file is configured, run:

python train_mod.py
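The flag requirements for training can be expressed as a quick pre-flight check. This is only a sketch under the assumption that the config is available as a flat dict keyed by flag name; training_flag_problems is a hypothetical helper and does not exist in the repository.

```python
def training_flag_problems(config: dict) -> list:
    """List reasons the config is not ready for a training run.

    Assumes a flat dict of flags (hypothetical layout, not confirmed by the repo).
    """
    problems = []
    for flag in ("overfit", "checkpoint"):
        # The instructions above ask for both flags to be False before training.
        if config.get(flag) is not False:
            problems.append(f"{flag} should be False, found {config.get(flag)!r}")
    if "run_type" not in config:
        problems.append("run_type is not set")
    return problems
```

An empty list means the flags match the training instructions; any returned message points at the flag to fix before running train_mod.py.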

Evaluation

To evaluate a pretrained model, set the checkpoint flag to True in the config_base.yaml file.

The run_type flag in the config_base.yaml file can be adjusted to specify the task for evaluation.

Once the YAML file is configured, run:

python train_mod.py
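Training and evaluation use the same entry point, so the checkpoint flag is what switches between them. As a rough illustration, assuming the config is held as a flat dict of flags, an evaluation variant could be derived from a training config without editing the original; evaluation_config is a hypothetical helper, not a function from the repository.

```python
import copy


def evaluation_config(base_config: dict) -> dict:
    """Return a copy of base_config with the checkpoint flag switched on.

    Assumes a flat dict of flags (hypothetical layout, not confirmed by the repo).
    """
    config = copy.deepcopy(base_config)
    # Evaluating a pretrained model requires checkpoint to be True.
    config["checkpoint"] = True
    return config
```

The copy leaves the training settings untouched, so the same base config can serve both runs; the result would still need to be written back to config_base.yaml before calling train_mod.py.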

Citation

If this project helps your research, please consider citing our paper using the following BibTeX entry:

@inproceedings{raajesh2024micap,
  title={MICap: A Unified Model for Identity-aware Movie Descriptions},
  author={Raajesh, Haran and Desanur, Naveen Reddy and Khan, Zeeshan and Tapaswi, Makarand},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={14011--14021},
  year={2024}
}