Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition

Introduction

[ALGORITHM]

@inproceedings{li2019show,
  title={Show, attend and read: A simple and strong baseline for irregular text recognition},
  author={Li, Hui and Wang, Peng and Shen, Chunhua and Zhang, Guyu},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={33},
  number={01},
  pages={8610--8617},
  year={2019}
}

Dataset

Train Dataset

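| trainset   | instance_num | repeat_num | source             |
| :--------- | -----------: | ---------: | :----------------- |
| icdar_2011 |         3567 |         20 | real               |
| icdar_2013 |          848 |         20 | real               |
| icdar2015  |         4468 |         20 | real               |
| coco_text  |        42142 |         20 | real               |
| IIIT5K     |         2000 |         20 | real               |
| SynthText  |      2400000 |          1 | synth              |
| SynthAdd   |      1216889 |          1 | synth, 1.6m in [1] |
| Syn90k     |      2400000 |          1 | synth              |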
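
As a rough illustration of how repeat_num shifts the balance between real and synthetic data, the sketch below computes the effective number of samples per pass over the training set. It assumes repeat_num means each instance of that dataset is seen that many times per epoch; that reading, and the names `TRAINSETS` and `effective_size`, are assumptions made for illustration and are not taken from the config.

```python
# (name, instance_num, repeat_num, source), copied from the train dataset table.
TRAINSETS = [
    ("icdar_2011", 3567, 20, "real"),
    ("icdar_2013", 848, 20, "real"),
    ("icdar2015", 4468, 20, "real"),
    ("coco_text", 42142, 20, "real"),
    ("IIIT5K", 2000, 20, "real"),
    ("SynthText", 2400000, 1, "synth"),
    ("SynthAdd", 1216889, 1, "synth"),
    ("Syn90k", 2400000, 1, "synth"),
]


def effective_size(source=None):
    """Sum instance_num * repeat_num, optionally restricted to one source type.

    Assumption: repeat_num is the number of times each instance is repeated
    within one epoch.
    """
    return sum(
        count * repeat
        for _, count, repeat, kind in TRAINSETS
        if source is None or kind == source
    )


real = effective_size("real")
synth = effective_size("synth")
print(f"real: {real}, synth: {synth}, real share: {real / (real + synth):.1%}")
```

Under that assumption, repeating the real datasets 20 times raises their share of each epoch well above their raw instance count, while synthetic data still makes up most of the samples.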

Test Dataset

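| testset | instance_num | type                  |
| :------ | -----------: | :-------------------- |
| IIIT5K  |         3000 | regular               |
| SVT     |          647 | regular               |
| IC13    |         1015 | regular               |
| IC15    |         2077 | irregular             |
| SVTP    |          645 | irregular, 639 in [1] |
| CT80    |          288 | irregular             |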

Results and Models

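| Methods | Backbone    | Decoder              | IIIT5K (regular) | SVT (regular) | IC13 (regular) | IC15 (irregular) | SVTP (irregular) | CT80 (irregular) | download     |
| :------ | :---------- | :------------------- | :--------------: | :-----------: | :------------: | :--------------: | :--------------: | :--------------: | :----------- |
| SAR     | R31-1/8-1/4 | ParallelSARDecoder   |       95.0       |     89.6      |      93.7      |       79.0       |       82.2       |       88.9       | model \| log |
| SAR     | R31-1/8-1/4 | SequentialSARDecoder |       95.2       |     88.7      |      92.4      |       78.2       |       81.9       |       89.6       | model \| log |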
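
One way to compare the two decoders with a single number per text type is an instance-weighted average over the benchmarks. This is a minimal sketch, not a metric reported by this README or by [1]. The names `weighted_accuracy`, `TEST_SIZES` and `ACCURACY` are hypothetical, and the sketch assumes the accuracies were measured on the test-set sizes listed in the test dataset table.

```python
# Test-set sizes from the test dataset table; accuracies from the results table.
TEST_SIZES = {
    "IIIT5K": 3000, "SVT": 647, "IC13": 1015,
    "IC15": 2077, "SVTP": 645, "CT80": 288,
}
REGULAR = ("IIIT5K", "SVT", "IC13")
IRREGULAR = ("IC15", "SVTP", "CT80")
ACCURACY = {
    "ParallelSARDecoder": {
        "IIIT5K": 95.0, "SVT": 89.6, "IC13": 93.7,
        "IC15": 79.0, "SVTP": 82.2, "CT80": 88.9,
    },
    "SequentialSARDecoder": {
        "IIIT5K": 95.2, "SVT": 88.7, "IC13": 92.4,
        "IC15": 78.2, "SVTP": 81.9, "CT80": 89.6,
    },
}


def weighted_accuracy(decoder, benchmarks):
    """Average accuracy over benchmarks, weighted by test-set instance count."""
    total = sum(TEST_SIZES[name] for name in benchmarks)
    hits = sum(ACCURACY[decoder][name] * TEST_SIZES[name] for name in benchmarks)
    return hits / total


for decoder in ACCURACY:
    regular = weighted_accuracy(decoder, REGULAR)
    irregular = weighted_accuracy(decoder, IRREGULAR)
    print(f"{decoder}: regular {regular:.2f}, irregular {irregular:.2f}")
```

Weighting by instance count keeps small sets such as CT80 (288 images) from dominating the summary the way an unweighted mean would.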

Notes:

  • R31-1/8-1/4 means that the feature map produced by the backbone has 1/8 the height and 1/4 the width of the input image.
  • We did not use beam search during decoding.
  • We implemented two decoders, ParallelSARDecoder and SequentialSARDecoder.
    • ParallelSARDecoder: decodes in parallel during training using an LSTM layer, which is faster.
    • SequentialSARDecoder: decodes sequentially during training using LSTMCell, which is easier to understand.
  • For the training dataset:
    • We did not construct distinct data groups (20 groups in [1]) to train the model group-by-group, since this would make model training too complicated.
    • Instead, we randomly selected 2.4m patches from Syn90k, 2.4m from SynthText and 1.2m from SynthAdd, and grouped all data together. See config for details.
  • We used 48 GPUs with total_batch_size = 64 * 48 in the experiment above to speed up training, while keeping the initial lr = 1e-3 unchanged.
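
The shape and batch-size arithmetic in the notes can be sketched as follows. The helper names and the 48 x 160 input size are hypothetical and chosen only for illustration; the 1/8 and 1/4 ratios and the 64 * 48 batch size come from the notes above.

```python
def feature_map_size(height, width, h_stride=8, w_stride=4):
    """Backbone output size for R31-1/8-1/4: 1/8 of input height, 1/4 of input width.

    Assumption: floor division. The actual backbone may round differently
    when the input size is not divisible by the strides.
    """
    return height // h_stride, width // w_stride


def total_batch_size(samples_per_gpu, num_gpus):
    """Total number of samples processed per optimizer step across all GPUs."""
    return samples_per_gpu * num_gpus


# Hypothetical 48 x 160 input image (height x width).
print(feature_map_size(48, 160))  # (6, 40)

# 64 samples per GPU on 48 GPUs, as in the experiment described above.
print(total_batch_size(64, 48))  # 3072
```

Because the feature map keeps more resolution along the width than the height, horizontally laid-out characters remain separable in the feature map.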

References

[1] Li, Hui and Wang, Peng and Shen, Chunhua and Zhang, Guyu. Show, attend and read: A simple and strong baseline for irregular text recognition. In AAAI 2019.