Implementation of the paper "Look, Read and Ask: Learning to Ask Questions by Reading Text in Images" (ICDAR 2021).

Note: we are currently resolving some issues with the evaluation code (NLG_eval and evaluate_textvqg.py). The model and the entire code base were originally written for Python 2.7 and are being ported to Python 3.8; the complete updated code will follow soon.
- Use PyTorch 1.7.0 with CUDA 10.2
- Install the other requirements from requirements.txt
To set up the environment:
```bash
# create a new env (the code is being ported to Python 3.8)
$ conda create -n textvqg python=3.8
# activate it
$ conda activate textvqg
# install pytorch, torchvision (CUDA 10.2 build)
$ conda install pytorch==1.7.0 torchvision==0.8.0 cudatoolkit=10.2 -c pytorch
# install other dependencies
$ pip install -r requirements.txt
```
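To sanity-check the install, you can confirm that the expected PyTorch and CUDA versions are visible (a quick optional check, not part of the original setup):

```bash
$ python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# expected output on a CUDA-capable machine: 1.7.0 10.2 True
```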
```bash
# Create the vocabulary files required for textVQG.
$ python utils/vocab.py
```
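For orientation, a vocabulary builder of this kind typically counts tokens in the question annotations and assigns indices to frequent words. The sketch below is a minimal illustration only; the annotation format, field names, and paths are assumptions, so refer to utils/vocab.py for the actual implementation.

```python
# Minimal sketch of vocabulary building (field names and paths are hypothetical).
import json
from collections import Counter

def build_vocab(annotation_file, min_count=2):
    """Count question tokens and keep those above a frequency threshold."""
    with open(annotation_file) as f:
        annotations = json.load(f)  # assumed: a list of dicts
    counter = Counter()
    for entry in annotations:
        counter.update(entry["question"].lower().split())  # "question" key is an assumption
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3}  # special tokens first
    for word, count in counter.most_common():
        if count >= min_count:
            vocab[word] = len(vocab)
    return vocab

if __name__ == "__main__":
    vocab = build_vocab("data/train_annotations.json")  # hypothetical path
    print(f"{len(vocab)} words in vocabulary")
```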
```bash
# Create the HDF5 dataset.
$ python utils/store_dataset.py
```
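Once the HDF5 file exists, a PyTorch Dataset can read samples from it lazily. The sketch below shows one common pattern; the HDF5 key names are hypothetical, not the repository's actual schema.

```python
# Minimal sketch of reading the packed HDF5 dataset (key names are hypothetical).
import h5py
import torch
from torch.utils.data import Dataset

class TextVQGDataset(Dataset):
    def __init__(self, h5_path):
        self.h5_path = h5_path
        with h5py.File(h5_path, "r") as f:
            self.length = f["questions"].shape[0]  # assumed key
        self.file = None  # open lazily so each DataLoader worker gets its own handle

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if self.file is None:
            self.file = h5py.File(self.h5_path, "r")
        image_feat = torch.from_numpy(self.file["image_features"][idx])  # assumed key
        ocr_tokens = torch.from_numpy(self.file["ocr_tokens"][idx])      # assumed key
        question = torch.from_numpy(self.file["questions"][idx])         # assumed key
        return image_feat, ocr_tokens, question
```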
```bash
# Train the model.
$ python train_textvqg.py
```
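The broad shape of the training loop is sketched below; the model interface, loss setup, and hyperparameters are placeholders rather than the script's actual configuration.

```python
# Skeletal training loop (model signature and hyperparameters are placeholders).
import torch
from torch.utils.data import DataLoader

def train(model, dataset, num_epochs=10, lr=1e-3, device="cuda"):
    loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=4)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss(ignore_index=0)  # index 0 = <pad>
    model.to(device).train()
    for epoch in range(num_epochs):
        total_loss = 0.0
        for image_feat, ocr_tokens, question in loader:
            image_feat = image_feat.to(device)
            ocr_tokens = ocr_tokens.to(device)
            question = question.to(device)
            optimizer.zero_grad()
            # Predict each next question token from the visual and OCR inputs.
            logits = model(image_feat, ocr_tokens, question[:, :-1])
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             question[:, 1:].reshape(-1))
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"epoch {epoch}: avg loss {total_loss / len(loader):.4f}")
```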
```bash
# Evaluate the model.
$ python evaluate_textvqg.py
```
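Evaluation scores the generated questions with standard NLG metrics (see the note above about open issues with NLG_eval and evaluate_textvqg.py). As a lightweight stand-in while those issues are resolved, corpus-level BLEU can be computed with NLTK:

```python
# Corpus-level BLEU as a lightweight stand-in for the full nlg-eval metrics.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# One list of reference token lists per generated question (toy example).
references = [[["what", "is", "written", "on", "the", "sign"]]]
hypotheses = [["what", "does", "the", "sign", "say"]]

smooth = SmoothingFunction().method1  # avoids zero scores on short sentences
print(f"BLEU: {corpus_bleu(references, hypotheses, smoothing_function=smooth):.4f}")
```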
(Figure: qualitative results of the proposed method.)
If you find this code/paper useful for your research, please consider citing:
@InProceedings{10.1007/978-3-030-86549-8_22,
author="Jahagirdar, Soumya
and Gangisetty, Shankar
and Mishra, Anand",
editor="Llad{\'o}s, Josep
and Lopresti, Daniel
and Uchida, Seiichi",
title="Look, Read and Ask: Learning to Ask Questions by Reading Text in Images",
booktitle="Document Analysis and Recognition -- ICDAR 2021",
year="2021",
publisher="Springer International Publishing",
address="Cham",
pages="335--349"
}
This repo uses a few utility functions provided by https://github.com/ranjaykrishna/iq.
For any clarification, comment, or suggestion, please create an issue or contact Soumya Shamarao Jahagirdar.