Repository for "Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights" in ECCV 2024.
- Download the WikiWeb2M dataset from WikiWeb2M.
- Update your local path variables in `env_definer.sh` and set them up according to the ReadME file; a sketch is given below.
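  A minimal sketch of what `env_definer.sh` might contain (the variable names here are hypothetical, not necessarily the repo's actual ones):

  ```bash
  #!/bin/bash
  # Hypothetical path variables -- substitute your own locations.
  export DATA_DIR=/path/to/wikiweb2m           # raw WikiWeb2M data and splits
  export IMAGE_DIR=/path/to/downloaded_images  # ~3-4 TB of downloaded images
  export CKPT_DIR=/path/to/checkpoints         # model checkpoints
  ```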
- Run `preprocess/launch_preprocess_tasks.py` with the task flags `download_img` and `extract_txt`, respectively, to download the images required for the dataset (around 3~4 TB) and to extract the text data locally; see the commands below.
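  For example, assuming the task is passed via a `--task` flag (an assumption about the exact CLI syntax; check the script's argument parser):

  ```bash
  # Download images (large: ~3-4 TB), then extract the text data locally.
  python preprocess/launch_preprocess_tasks.py --task download_img
  python preprocess/launch_preprocess_tasks.py --task extract_txt
  ```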
- Some preprocessed data are provided, including:
  - `.csv` files specifying the training, validation, and testing splits (the training split is large and provided here).
  - Extracted highlights for inference during evaluation.
  - GRIT image captions facilitating text-based GPT-4 Ctrl-CIC caption generation.

  Remember to move these files to the corresponding local paths you specified, e.g. as sketched below.
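  A hypothetical example (all source file names and destination directories below are placeholders; use the paths you set in `env_definer.sh`):

  ```bash
  # Hypothetical: move the provided files into your configured data root.
  mv splits/*.csv "${DATA_DIR}/splits/"
  mv highlights/* "${DATA_DIR}/highlights/"
  mv grit_captions/* "${DATA_DIR}/grit_captions/"
  ```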
- Extract CLIP image features with `scripts/extract_image_features.py` for efficient local training and evaluation.
- Generate relevance scores for pseudo training highlights with `scripts/mask_generation.py`.
- Run the training program with the corresponding config file, for example:

  ```bash
  python cli/train.py --config experiments/finetune/longt5.yaml
  ```
- For traditional CIC tasks, refer to `eval_configs`. Update the `run_id` according to your local checkpoints and run the inference script:

  ```bash
  python cli/eval.py --config experiments/eval_configs/eval_full.yaml
  ```

  CIC performance is recorded during inference.
- For Ctrl-CIC tasks, first generate Ctrl-CIC captions with `ccic_eval_configs`, e.g. via the sketch below.
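  A hypothetical invocation, assuming the same `cli/eval.py` entry point as above (the config filename is a placeholder; pick an actual file from `experiments/ccic_eval_configs/`):

  ```bash
  # Hypothetical: generate Ctrl-CIC captions with a Ctrl-CIC eval config.
  python cli/eval.py --config experiments/ccic_eval_configs/your_config.yaml
  ```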
The Ctrl-CIC captions can be evaluated as follows:
- CLIPScore and CLIPScore-Sentence, by setting `use_clip_score`, `use_sent_score`, and `load_predictions` to `True` and running the evaluation script again (a config-editing sketch follows below).
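  For instance, assuming these flags are top-level keys in your Ctrl-CIC eval YAML (an assumption about the config layout; the config filename is a placeholder):

  ```bash
  # Hypothetical sketch: flip the three flags in place, then re-run evaluation.
  # Key names follow the bullet above; the YAML layout is an assumption.
  CFG="experiments/ccic_eval_configs/your_config.yaml"
  for key in use_clip_score use_sent_score load_predictions; do
    sed -i "s/^${key}:.*/${key}: True/" "$CFG"
  done
  python cli/eval.py --config "$CFG"
  ```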
- Recall, with

  ```bash
  python scripts/calculate_recall.py
  ```

- Diversity, with

  ```bash
  python scripts/diversity_eval.py
  ```
- GPT-4(V)-empowered metrics:
  - First generate the JSON files to be uploaded:

    ```bash
    python scripts/generate_prompt.py --task eval
    ```

  - Update your OpenAI key here.
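    A sketch assuming the scripts read the key from the `OPENAI_API_KEY` environment variable (an assumption; the repo may load the key differently):

    ```bash
    # Hypothetical: export the key before issuing API calls.
    export OPENAI_API_KEY="sk-..."
    ```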
  - Run

    ```bash
    python scripts/query_response.py
    ```

    to issue the GPT-4(V) API calls.
  - Run

    ```bash
    python scripts/get_gpt_scores.py
    ```

    to compute the GPT-4(V) evaluation metric scores.
The pretrained weights are available at huggingface.
For an interactive Ctrl-CIC demo, you can run

```bash
python scripts/rctrl_inference.py
```

which allows flexible selection of the highlights and image. A similar program is provided for p-ctrl, but its output is shown on the command line.
The dataset and data loading implementation are based on the code provided in WikiWeb2M.
```bibtex
@InProceedings{Mao_2024_ECCV,
    author    = {Mao, Shunqi and Zhang, Chaoyi and Su, Hang and Song, Hwanjun and Shalyminov, Igor and Cai, Weidong},
    title     = {Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights},
    booktitle = {Proceedings of the 18th European Conference on Computer Vision (ECCV)},
    year      = {2024}
}
```