LXMERT Model Compression for Visual Question Answering

This implementation builds on the LXMERT repository, the PyTorch code for the EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers", and targets VQA v2.0.

See the complete report here (LaTeX template on Overleaf).

The project presentation slides are available here (Google Docs).

The abstract paper accepted at WeCNLP is available here (paper, poster, video).

Visual Question Answering Usage

Medical Visual Question Answering

"VQA-Med: Overview of the Medical Visual Question Answering Task at ImageCLEF 2019"

Answering Visual Questions from Blind People

"VizWiz Grand Challenge: Answering Visual Questions from Blind People"

Summary

Large-scale pretrained models such as LXMERT are becoming popular for learning cross-modal representations on text-image pairs for vision-language tasks. According to the lottery ticket hypothesis, NLP and computer vision models contain smaller subnetworks capable of being trained in isolation to full performance. In this project, we combine these observations to evaluate whether such trainable subnetworks exist in LXMERT when fine-tuned on the VQA task. In addition, we perform a model size cost-benefit analysis by investigating how much pruning can be done without significant loss in accuracy.
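To make the lottery-ticket idea concrete, here is a minimal sketch of how a candidate subnetwork is usually extracted. First, prune the smallest-magnitude weights of the fine-tuned model. Then rewind the surviving weights to their initial values, so the masked network can be retrained in isolation. This is an illustration under stated assumptions, not this repository's code. It operates on a flat list of floats rather than LXMERT's PyTorch parameters. The helper names `magnitude_mask` and `extract_ticket` are hypothetical, and the retraining step is omitted.

```python
from typing import List, Tuple


def magnitude_mask(trained: List[float], sparsity: float) -> List[int]:
    """Return a 0/1 mask that zeroes the `sparsity` fraction of smallest-magnitude weights."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(round(sparsity * len(trained)))
    # Indices ordered from smallest to largest absolute trained value.
    order = sorted(range(len(trained)), key=lambda i: abs(trained[i]))
    mask = [1] * len(trained)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def extract_ticket(init: List[float], trained: List[float],
                   sparsity: float) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: mask by trained magnitude, then rewind survivors to init values."""
    mask = magnitude_mask(trained, sparsity)
    ticket = [w0 * m for w0, m in zip(init, mask)]
    return ticket, mask


if __name__ == "__main__":
    init = [0.10, -0.20, 0.30, -0.40]     # weights before fine-tuning (toy values)
    trained = [0.05, -0.90, 0.01, 0.70]   # weights after fine-tuning (toy values)
    ticket, mask = extract_ticket(init, trained, sparsity=0.5)
    print(mask)    # [0, 1, 0, 1]
    print(ticket)  # [0.0, -0.2, 0.0, -0.4]
```

In a real run, the mask would be held fixed while the rewound subnetwork is retrained on VQA. Its accuracy would then be compared with the unpruned model at each sparsity level.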

Run

Install the required packages

pip3 install -r requirements.txt

Run All Experiments

To run all experiments, run the following command from the lxmert folder:

bash run/vqa_run.bash

Results

The plots are available in the lxmert/result directory.
The trained models are available in the lxmert/models directory.
The logs are available in the lxmert/logs directory.

Plots

Low Magnitude Pruning Subnetwork

Random Pruning Subnetwork

High Magnitude Pruning Subnetwork

All Results by Pruning Sparsity

All Results by Pruning Mode
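The three pruning modes named in the plot titles can be sketched as different ways of choosing which weights a mask removes. This is a minimal sketch with several assumptions. "Low magnitude" is taken to mean pruning the smallest-magnitude weights. "High magnitude" is interpreted as pruning the largest-magnitude weights, and "random" as pruning a uniformly random subset. The mode strings `"low"`, `"high"` and `"random"` and the helpers `pruning_mask` and `actual_sparsity` are hypothetical and may not match the repository's implementation.

```python
import random
from typing import List, Optional


def pruning_mask(weights: List[float], sparsity: float, mode: str,
                 seed: Optional[int] = None) -> List[int]:
    """Return a 0/1 mask that removes round(sparsity * len(weights)) entries.

    mode="low":    prune the smallest-magnitude weights
    mode="high":   prune the largest-magnitude weights (assumed interpretation)
    mode="random": prune a uniformly random subset
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n = len(weights)
    n_prune = int(round(sparsity * n))
    if mode == "low":
        order = sorted(range(n), key=lambda i: abs(weights[i]))
    elif mode == "high":
        order = sorted(range(n), key=lambda i: abs(weights[i]), reverse=True)
    elif mode == "random":
        order = list(range(n))
        random.Random(seed).shuffle(order)  # seeded for reproducibility
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    mask = [1] * n
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def actual_sparsity(mask: List[int]) -> float:
    """Fraction of weights removed by the mask (entries equal to 0)."""
    return mask.count(0) / len(mask) if mask else 0.0


if __name__ == "__main__":
    w = [0.05, -0.90, 0.01, 0.70]
    print(pruning_mask(w, 0.5, "low"))   # [0, 1, 0, 1]
    print(pruning_mask(w, 0.5, "high"))  # [1, 0, 1, 0]
    print(actual_sparsity(pruning_mask(w, 0.5, "random", seed=0)))  # 0.5
```

Comparing the three modes at the same sparsity isolates the effect of *which* weights are removed from *how many*. This is the contrast the per-mode and per-sparsity plots are organized around.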

Citation

@misc{hashemi2023lxmert,
      title={LXMERT Model Compression for Visual Question Answering}, 
      author={Maryam Hashemi and Ghazaleh Mahmoudi and Sara Kodeiri and Hadi Sheikhi and Sauleh Eetemadi},
      year={2023},
      eprint={2310.15325},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
