# ComicsPAP-Qwen2.5-VL: Sequential Narrative Understanding

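QLoRA fine-tuning of Qwen2.5-VL-7B on the ComicsPAP benchmark for the "Pick A Panel" task. The fine-tuned model reaches 66.41% validation accuracy, compared with the zero-shot results reported in the ComicsPAP paper: 30.53% for the base 7B model and 46.88% for the base 72B model.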

## Repository Structure

- src/train_qlora.py: Training script using PEFT and QLoRA.
- src/evaluate.py: Evaluation pipeline for the validation and test splits.
- official_repo/: Integrated utilities from the official ComicsPAP repository.
  - data_utils.py: Contains the SingleImagePickAPanel processor used for training.
- slurm/: Job scripts for the Snellius supercomputer.

## Project Highlights

- **Model:** Qwen2.5-VL-7B-Instruct (Vision-Language).
- **Technique:** QLoRA (Rank 16, Alpha 32) for parameter-efficient fine-tuning.
- **Compute:** Trained on 1x NVIDIA A100 (80GB) on the Snellius Supercomputer (SURFcua - TU/e).
- **Performance (accuracy):**
  - Ours (7B Fine-tuned): 66.41%
  - Base 72B Zero-shot: 46.88% (from paper)
  - Base 7B Zero-shot: 30.53% (from paper)
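
To give a feel for what the QLoRA setting above (rank 16, alpha 32) implies, here is a minimal arithmetic sketch. It assumes the standard LoRA formulation, where the low-rank update is scaled by alpha / rank and each adapted linear layer gains two small matrices. The 4096 x 4096 layer size is a hypothetical value chosen for illustration, not a dimension taken from this project or from Qwen2.5-VL, and the helper names are made up for this example.

```python
def lora_scaling(rank: int, alpha: int) -> float:
    """Scaling applied to the low-rank update in standard LoRA: alpha / rank."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one linear layer: A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


RANK, ALPHA = 16, 32   # values stated in this README
D_IN = D_OUT = 4096    # hypothetical layer size, for illustration only

scaling = lora_scaling(RANK, ALPHA)
adapter_params = lora_param_count(D_IN, D_OUT, RANK)
full_params = D_IN * D_OUT

print(f"scaling factor: {scaling}")
print(f"adapter parameters per layer: {adapter_params:,}")
print(f"share of the full weight matrix: {adapter_params / full_params:.2%}")
```

Under these assumptions the adapter trains well under 1% of the parameters of the layer it wraps, which is what makes fine-tuning a 7B model on a single GPU practical.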

## Getting Started

### 1. Installation

git clone https://github.com/kaj04/ComicsPAP-Project.git
cd ComicsPAP-Project
pip install -r requirements.txt

### 2. Run Evaluation

You can run the evaluation with the fine-tuned QLoRA adapters published on Hugging Face:

python src/evaluate.py --adapter_path kaj04/Qwen2.5-VL-7B-ComicsPAP-QLoRA --split val
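
The accuracy figures in this README are percentages of correctly picked panels. As a rough illustration of that metric only, the sketch below computes accuracy from a list of predicted option indices and a list of ground-truth indices. The function name, the index-based representation, and the sample values are assumptions made for this example; the actual logic in src/evaluate.py may differ.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Return the percentage of samples where the predicted panel index matches the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty split")
    correct = sum(pred == gold for pred, gold in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Hypothetical predictions and labels, for illustration only
sample_predictions = [0, 2, 1, 3, 4]
sample_labels = [0, 2, 3, 3, 1]

print(f"accuracy: {accuracy(sample_predictions, sample_labels):.2f}%")
```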

## Acknowledgments
This project is based on the **ComicsPAP** dataset and benchmark. Special thanks to the authors for their work on visual narrative understanding.

- **Official Dataset:** [VLR-CVC/ComicsPAP](https://huggingface.co/datasets/VLR-CVC/ComicsPAP)
- **Official Paper:** [ComicsPAP: Understanding Comic Strips by Picking the Correct Panel](https://arxiv.org/abs/2503.08561)

## Citation

@InProceedings{vivoli2025comicspap,
  author="Vivoli, Emanuele and Llabr{\'e}s, Artemis and Souibgui, Mohamed Ali and Bertini, Marco and Llobet, Ernest Valveny and Karatzas, Dimosthenis",
  editor="Yin, Xu-Cheng and Karatzas, Dimosthenis and Lopresti, Daniel",
  title="ComicsPAP: Understanding Comic Strips by Picking the Correct Panel",
  booktitle="Document Analysis and Recognition -- ICDAR 2025",
  year="2026",
  publisher="Springer Nature Switzerland",
  address="Cham",
  pages="337--350",
  isbn="978-3-032-04614-7"
}
