A dataset of digitized comic storybooks in English, with ground truth annotations for each panel on every page and ground truth text files for each narration box and speech balloon within a panel. Each page also comes with ground truth binary masks of its speech balloons and narration boxes.
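The binary masks are provided per page rather than per balloon, so one plausible first step when using them is to split a page mask into individual regions. What follows is a minimal sketch under stated assumptions and is not part of the dataset's tooling. It assumes the mask has already been loaded as a 2D list of 0/1 values (nonzero = balloon or narration box, zero = background), and the function name `region_boxes` is hypothetical. It labels 4-connected foreground regions with a breadth-first flood fill and returns one bounding box per region.

```python
from collections import deque
from typing import List, Tuple

# Bounding box as (min_row, min_col, max_row, max_col), all inclusive.
Box = Tuple[int, int, int, int]


def region_boxes(mask: List[List[int]]) -> List[Box]:
    """Return one bounding box per 4-connected foreground region.

    Hypothetical convention: nonzero pixels mark speech-balloon or
    narration-box area, zero pixels mark background.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                # Grow the bounding box to include this pixel.
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes


if __name__ == "__main__":
    # Toy 5x6 mask containing two separate regions.
    toy = [
        [1, 1, 0, 0, 0, 0],
        [1, 1, 0, 0, 1, 1],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    print(region_boxes(toy))
```

Running the file prints `[(0, 0, 1, 1), (1, 4, 3, 5)]` for the toy mask. With real pages you would first load and threshold the mask image (for example with Pillow). Balloons that touch in the mask merge into a single region under this approach, and 8-connectivity is an equally reasonable choice depending on how the masks were drawn.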
The associated paper was published at ICDAR 2021:
Gupta, V., Detani, V., Khokar, V., Chattopadhyay, C. (2021). C2VNet: A Deep Learning Framework Towards Comic Strip to Audio-Visual Scene Synthesis. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12822. Springer, Cham. https://doi.org/10.1007/978-3-030-86331-9_11
@inproceedings{DBLP:conf/icdar/GuptaDKC21,
  author    = {Vaibhavi Gupta and
               Vinay Detani and
               Vivek Khokar and
               Chiranjoy Chattopadhyay},
  editor    = {Josep Llad{\'{o}}s and
               Daniel Lopresti and
               Seiichi Uchida},
  title     = {C2VNet: {A} Deep Learning Framework Towards Comic Strip to Audio-Visual
               Scene Synthesis},
  booktitle = {16th International Conference on Document Analysis and Recognition,
               {ICDAR} 2021, Lausanne, Switzerland, September 5-10, 2021, Proceedings,
               Part {II}},
  series    = {Lecture Notes in Computer Science},
  volume    = {12822},
  pages     = {160--175},
  publisher = {Springer},
  year      = {2021},
  url       = {https://doi.org/10.1007/978-3-030-86331-9\_11},
  doi       = {10.1007/978-3-030-86331-9\_11},
  timestamp = {Thu, 16 Sep 2021 18:08:10 +0200},
  biburl    = {https://dblp.org/rec/conf/icdar/GuptaDKC21.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}