[Model] Add MASTER #807

Merged: 39 commits merged on May 5, 2022. The diff below shows changes from 36 commits.

Commits:
- `ccf1b6f` fix #794: add MASTER (JiaquanYe, Feb 27, 2022)
- `9f47122` fix conflict add MASTER (JiaquanYe, Feb 27, 2022)
- `d63c4e4` fix conflict add MASTER (JiaquanYe, Feb 27, 2022)
- `9c01466` fix conflict add MASTER (JiaquanYe, Feb 28, 2022)
- `263899d` fix conflict add MASTER (JiaquanYe, Feb 28, 2022)
- `570905c` fix conflict add MASTER (JiaquanYe, Feb 28, 2022)
- `d028ec8` fix conflict add MASTER (JiaquanYe, Feb 28, 2022)
- `0b15c26` fix conflict add MASTER (JiaquanYe, Mar 1, 2022)
- `248decd` Fix linting (gaotongxiao, Mar 2, 2022)
- `66fa5f8` Merge branch 'main' into feature/iss_794 (gaotongxiao, Mar 28, 2022)
- `0461300` after git rebase main (JiaquanYe, Feb 27, 2022)
- `c58fd7c` after git rebase main (JiaquanYe, Feb 27, 2022)
- `7161d3a` fix conflict add MASTER (JiaquanYe, Feb 27, 2022)
- `651b6bd` fix conflict add MASTER (JiaquanYe, Feb 28, 2022)
- `8f42bac` after git rebase main (JiaquanYe, Feb 28, 2022)
- `a3bec83` fix conflict add MASTER (JiaquanYe, Feb 28, 2022)
- `88d219b` fix conflict add MASTER (JiaquanYe, Feb 28, 2022)
- `321f0c3` fix conflict add MASTER (JiaquanYe, Mar 1, 2022)
- `bd589f9` after git rebase main (gaotongxiao, Mar 2, 2022)
- `bcf7f03` add GCAModule to plugins (JiaquanYe, Mar 28, 2022)
- `1be664d` coexist master and master_old (JiaquanYe, Apr 9, 2022)
- `c03bdbc` fix merge mmocr 0.5.0 conflict (JiaquanYe, Apr 9, 2022)
- `696310b` fix lint error (JiaquanYe, Apr 9, 2022)
- `11b51b6` fix remote feature branch conflict (JiaquanYe, Apr 10, 2022)
- `8d47842` Merge branch 'main' into pr/JiaquanYe/807 (gaotongxiao, Apr 11, 2022)
- `7fa541e` update (Mountchicken, Apr 21, 2022)
- `fc6e824` [fix] remove remains in `__init__` (Mountchicken, Apr 22, 2022)
- `12c7259` [update] update code in review (Mountchicken, Apr 22, 2022)
- `447131a` update readme for master (Mountchicken, Apr 22, 2022)
- `f9d1ae1` Add docstr to MasterDecoder, refined MasterDecoder, remove MASTERLoss (gaotongxiao, Apr 22, 2022)
- `0c6f651` Unify the output length of MasterDecoder in train and test mode; add … (gaotongxiao, Apr 23, 2022)
- `6111201` update readme (gaotongxiao, Apr 23, 2022)
- `c60f3d2` Merge remote-tracking branch 'origin/main' into test (Mountchicken, May 1, 2022)
- `d18ecda` update (Mountchicken, May 1, 2022)
- `eb9a1c1` update metafile,README,demo/README,config,ocr.py (Mountchicken, May 5, 2022)
- `0b4aa98` Update mmocr/utils/ocr.py (gaotongxiao, May 5, 2022)
- `001ac48` Merge remote-tracking branch 'origin/main' into test (Mountchicken, May 5, 2022)
- `3dca667` update (Mountchicken, May 5, 2022)
- `8f19d14` Merge branch 'feature/iss_794' of https://github.com/JiaquanYe/mmocr … (Mountchicken, May 5, 2022)
1 change: 1 addition & 0 deletions README.md
@@ -81,6 +81,7 @@ Supported algorithms:

- [x] [ABINet](configs/textrecog/abinet/README.md) (CVPR'2021)
- [x] [CRNN](configs/textrecog/crnn/README.md) (TPAMI'2016)
- [x] [MASTER](configs/textrecog/master/README.md) (PR'2021)
- [x] [NRTR](configs/textrecog/nrtr/README.md) (ICDAR'2019)
- [x] [RobustScanner](configs/textrecog/robust_scanner/README.md) (ECCV'2020)
- [x] [SAR](configs/textrecog/sar/README.md) (AAAI'2019)
1 change: 1 addition & 0 deletions README_zh-CN.md
@@ -81,6 +81,7 @@ MMOCR 是基于 PyTorch 和 mmdetection 的开源工具箱,专注于文本检

- [x] [ABINet](configs/textrecog/abinet/README.md) (CVPR'2021)
- [x] [CRNN](configs/textrecog/crnn/README.md) (TPAMI'2016)
- [x] [MASTER](configs/textrecog/master/README.md) (PR'2021)
- [x] [NRTR](configs/textrecog/nrtr/README.md) (ICDAR'2019)
- [x] [RobustScanner](configs/textrecog/robust_scanner/README.md) (ECCV'2020)
- [x] [SAR](configs/textrecog/sar/README.md) (AAAI'2019)
41 changes: 41 additions & 0 deletions configs/_base_/recog_datasets/ST_SA_MJ_train.py
@@ -0,0 +1,41 @@
# Text Recognition Training set, including:
# Synthetic Datasets: SynthText, SynthAdd (SynthText_Add), Syn90k

train_root = 'data/mixture'

train_img_prefix1 = f'{train_root}/Syn90k/mnt/ramdisk/max/90kDICT32px'
train_ann_file1 = f'{train_root}/Syn90k/label.lmdb'

train1 = dict(
    type='OCRDataset',
    img_prefix=train_img_prefix1,
    ann_file=train_ann_file1,
    loader=dict(
        type='AnnFileLoader',
        repeat=1,
        file_format='lmdb',
        parser=dict(
            type='LineStrParser',
            keys=['filename', 'text'],
            keys_idx=[0, 1],
            separator=' ')),
    pipeline=None,
    test_mode=False)

train_img_prefix2 = f'{train_root}/SynthText/' + \
    'synthtext/SynthText_patch_horizontal'
train_ann_file2 = f'{train_root}/SynthText/label.lmdb'

train_img_prefix3 = f'{train_root}/SynthText_Add'
train_ann_file3 = f'{train_root}/SynthText_Add/label.txt'

train2 = {key: value for key, value in train1.items()}
train2['img_prefix'] = train_img_prefix2
train2['ann_file'] = train_ann_file2

train3 = {key: value for key, value in train1.items()}
train3['img_prefix'] = train_img_prefix3
train3['ann_file'] = train_ann_file3
# Build a new loader dict: the copies above are shallow, so an in-place
# update would also switch train1/train2 away from lmdb.
train3['loader'] = dict(train1['loader'], file_format='txt')

train_list = [train1, train2, train3]
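`train2` and `train3` are built from `train1` with a dict comprehension, which copies only the top level, so the nested `loader` dict (and its `parser`) is the same object in all three entries. The minimal sketch below uses a made-up config (the keys and paths are placeholders, not the real dataset entries) to show how an in-place override of a nested key leaks back into the base entry, and two ways to avoid it: building a new `loader` dict, or using `copy.deepcopy`.

```python
import copy

# Hypothetical base dataset config with the same nesting as train1.
base = dict(
    type='OCRDataset',
    ann_file='a/label.lmdb',
    loader=dict(type='AnnFileLoader', file_format='lmdb'))

# Shallow copy, same pattern as the config: 'loader' is shared.
shallow = {key: value for key, value in base.items()}
shallow['loader']['file_format'] = 'txt'
leaked = base['loader']['file_format']  # the override leaked into base

# Reset, then override safely by replacing the nested dict.
base['loader']['file_format'] = 'lmdb'
safe = {key: value for key, value in base.items()}
safe['loader'] = dict(base['loader'], file_format='txt')

# copy.deepcopy is the heavier option: it copies every nesting level.
deep = copy.deepcopy(base)
deep['loader']['file_format'] = 'txt'
```

Replacing the nested dict keeps the config free of extra imports, while `deepcopy` is safer when several nested keys are overridden.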
61 changes: 61 additions & 0 deletions configs/_base_/recog_models/master.py
@@ -0,0 +1,61 @@
label_convertor = dict(
    type='AttnConvertor', dict_type='DICT90', with_unknown=True)

model = dict(
    type='MASTER',
    backbone=dict(
        type='ResNet',
        in_channels=3,
        stem_channels=[64, 128],
        block_cfgs=dict(
            type='BasicBlock',
            plugins=dict(
                cfg=dict(
                    type='GCAModule',
                    ratio=0.0625,
                    headers=1,
                    pooling_type='att',
                    is_att_scale=False,
                    fusion_type='channel_add'),
                position='after_conv2')),
        arch_layers=[1, 2, 5, 3],
        arch_channels=[256, 256, 512, 512],
        strides=[1, 1, 1, 1],
        plugins=[
            dict(
                cfg=dict(type='Maxpool2d', kernel_size=2, stride=(2, 2)),
                stages=(True, True, False, False),
                position='before_stage'),
            dict(
                cfg=dict(type='Maxpool2d', kernel_size=(2, 1), stride=(2, 1)),
                stages=(False, False, True, False),
                position='before_stage'),
            dict(
                cfg=dict(
                    type='ConvModule',
                    kernel_size=3,
                    stride=1,
                    padding=1,
                    norm_cfg=dict(type='BN'),
                    act_cfg=dict(type='ReLU')),
                stages=(True, True, True, True),
                position='after_stage')
        ],
        init_cfg=[
            dict(type='Kaiming', layer='Conv2d'),
            dict(type='Constant', val=1, layer='BatchNorm2d'),
        ]),
    encoder=None,
    decoder=dict(
        type='MasterDecoder',
        d_model=512,
        n_head=8,
        attn_drop=0.,
        ffn_drop=0.,
        d_inner=2048,
        n_layers=3,
        feat_pe_drop=0.2,
        feat_size=6 * 40),
    loss=dict(type='TFLoss', reduction='mean'),
    label_convertor=label_convertor,
    max_seq_len=30)
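The decoder's `feat_size=6 * 40` is presumably the number of spatial positions the backbone emits for the 48-pixel-high, 160-pixel-wide inputs produced by `master_pipeline.py`. The back-of-the-envelope check below assumes, without verifying it against the MMOCR source, that the ResNet stem, the `BasicBlock`s and the `after_stage` `ConvModule`s (3×3, stride 1, padding 1) all preserve spatial size and that images are padded to `max_width=160`; under those assumptions only the two `Maxpool2d` plugins shrink the map. The helper name `master_feat_size` is hypothetical.

```python
def master_feat_size(height=48, width=160):
    """Estimate the backbone output grid (H, W) for the MASTER config.

    Assumption: only the 'before_stage' Maxpool2d plugins change the
    spatial size; stem convs, blocks and after-stage convs keep it.
    """
    # (stride_h, stride_w) of each pooling plugin, per stage, following
    # the `stages` flags in the config (kernel size equals stride).
    pools_per_stage = [
        [(2, 2)],  # stage 1: Maxpool2d(2, stride=(2, 2))
        [(2, 2)],  # stage 2: Maxpool2d(2, stride=(2, 2))
        [(2, 1)],  # stage 3: Maxpool2d((2, 1), stride=(2, 1))
        [],        # stage 4: no pooling
    ]
    h, w = height, width
    for pools in pools_per_stage:
        for sh, sw in pools:
            # Unpadded pooling with kernel == stride floors the size.
            h, w = h // sh, w // sw
    return h, w


print(master_feat_size())  # (6, 40)
```

If `height` or `max_width` in the pipeline were changed, `feat_size` would presumably have to follow; calling the helper with the new values gives the matching grid under the same assumptions.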
42 changes: 42 additions & 0 deletions configs/_base_/recog_pipelines/master_pipeline.py
@@ -0,0 +1,42 @@
img_norm_cfg = dict(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='ResizeOCR',
        height=48,
        min_width=48,
        max_width=160,
        keep_aspect_ratio=True),
    dict(type='ToTensorOCR'),
    dict(type='NormalizeOCR', **img_norm_cfg),
    dict(
        type='Collect',
        keys=['img'],
        meta_keys=[
            'filename', 'ori_shape', 'img_shape', 'text', 'valid_ratio',
            'resize_shape'
        ]),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiRotateAugOCR',
        rotate_degrees=[0, 90, 270],
        transforms=[
            dict(
                type='ResizeOCR',
                height=48,
                min_width=48,
                max_width=160,
                keep_aspect_ratio=True),
            dict(type='ToTensorOCR'),
            dict(type='NormalizeOCR', **img_norm_cfg),
            dict(
                type='Collect',
                keys=['img'],
                meta_keys=[
                    'filename', 'ori_shape', 'img_shape', 'valid_ratio',
                    'img_norm_cfg', 'ori_filename', 'resize_shape'
                ]),
        ])
]
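With `keep_aspect_ratio=True`, each crop is scaled to a height of 48 and its width is bounded by `min_width` and `max_width`. The function below is a simplified sketch of how we assume this step behaves, including padding to `max_width` and the `valid_ratio` that the pipeline collects as meta information. The helper name `resize_keep_ratio` and the exact rounding are assumptions for illustration, not `ResizeOCR`'s implementation.

```python
import math


def resize_keep_ratio(ori_h, ori_w, height=48, min_width=48, max_width=160):
    """Sketch of keep-aspect-ratio resizing followed by right padding.

    Assumed behaviour: scale to `height`, clamp the width to
    [min_width, max_width], pad to max_width, and report which fraction
    of the padded width holds real image content (valid_ratio).
    """
    new_w = math.ceil(height / ori_h * ori_w)  # width after scaling
    new_w = max(min_width, new_w)              # enforce minimum width
    resize_w = min(max_width, new_w)           # squeeze over-long crops
    valid_ratio = min(1.0, new_w / max_width)
    padded_shape = (height, max_width)         # uniform shape for batching
    return (height, resize_w), padded_shape, valid_ratio


print(resize_keep_ratio(32, 100))  # ((48, 150), (48, 160), 0.9375)
```

Padding every crop to the same width keeps batch shapes fixed, and `valid_ratio` lets downstream modules tell content from padding.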
11 changes: 11 additions & 0 deletions configs/_base_/schedules/schedule_adam_step_12e.py
@@ -0,0 +1,11 @@
# optimizer
optimizer = dict(type='Adam', lr=4e-4)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=100,
    warmup_ratio=1.0 / 3,
    step=[11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
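`schedule_adam_step_12e.py` combines a 100-iteration linear warmup with a single step decay at epoch 11. The sketch below approximates the resulting learning rate. The decay factor `gamma=0.1` is an assumption (the config does not set it, so the step policy's default is presumed), epochs are treated as 0-based, and the warmup formula reflects our understanding of mmcv's linear warmup rather than its code; `lr_at` is a hypothetical helper.

```python
def lr_at(epoch, it, base_lr=4e-4, warmup_iters=100, warmup_ratio=1.0 / 3,
          steps=(11,), gamma=0.1):
    """Approximate LR at 0-based `epoch` and global iteration `it`.

    Assumptions: the LR is multiplied by `gamma` once per milestone in
    `steps` already reached; during the first `warmup_iters` iterations
    a linear warmup ramps it from warmup_ratio * lr up to lr.
    """
    exp = sum(1 for s in steps if epoch >= s)  # milestones passed
    regular_lr = base_lr * gamma ** exp
    if it < warmup_iters:
        k = (1 - it / warmup_iters) * (1 - warmup_ratio)
        return regular_lr * (1 - k)
    return regular_lr


for epoch, it in [(0, 0), (0, 50), (0, 100), (10, 10_000), (11, 10_000)]:
    print(epoch, it, lr_at(epoch, it))
```

Under these assumptions, training runs at 4e-4 after warmup for epochs 0 to 10 and at 4e-5 for the final (twelfth) epoch.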
54 changes: 54 additions & 0 deletions configs/textrecog/master/README.md
@@ -0,0 +1,54 @@
# MASTER

>[MASTER: Multi-aspect non-local network for scene text recognition](https://arxiv.org/abs/1910.02562)

<!-- [ALGORITHM] -->

## Abstract

Attention-based scene text recognizers have gained huge success, which leverages a more compact intermediate representation to learn 1d- or 2d- attention by a RNN-based encoder-decoder architecture. However, such methods suffer from attention-drift problem because high similarity among encoded features leads to attention confusion under the RNN-based local attention mechanism. Moreover, RNN-based methods have low efficiency due to poor parallelization. To overcome these problems, we propose the MASTER, a self-attention based scene text recognizer that (1) not only encodes the input-output attention but also learns self-attention which encodes feature-feature and target-target relationships inside the encoder and decoder and (2) learns a more powerful and robust intermediate representation to spatial distortion, and (3) owns a great training efficiency because of high training parallelization and a high-speed inference because of an efficient memory-cache mechanism. Extensive experiments on various benchmarks demonstrate the superior performance of our MASTER on both regular and irregular scene text.

<div align=center>
<img src="https://user-images.githubusercontent.com/65173622/164642001-037f81b7-37dd-4808-a6a9-09ff6f6a17ea.JPG">
</div>
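The abstract credits part of MASTER's inference speed to a memory-cache mechanism. The PR does not show how that cache is implemented, so the sketch below only illustrates the general idea of caching keys and values during autoregressive decoding (commonly called KV caching) and is not MASTER's code. The 2-d `key_of`/`value_of` projections are made-up toy stand-ins for learned layers, and each token doubles as its own query.

```python
import math


def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]


def key_of(x):  # toy stand-in for a learned key projection
    return [x[0] + x[1], x[0] - x[1]]


def value_of(x):  # toy stand-in for a learned value projection
    return [2 * x[0], x[1]]


def decode(tokens, use_cache):
    """Causal self-attention over a growing prefix, counting projections."""
    outputs, projections = [], 0
    cache_k, cache_v = [], []
    for t, x in enumerate(tokens):
        if use_cache:
            # Project only the newest token; earlier keys/values are reused.
            cache_k.append(key_of(x))
            cache_v.append(value_of(x))
            projections += 2
            keys, values = cache_k, cache_v
        else:
            # Without a cache, every prefix token is re-projected each step.
            prefix = tokens[:t + 1]
            keys = [key_of(p) for p in prefix]
            values = [value_of(p) for p in prefix]
            projections += 2 * len(prefix)
        outputs.append(attend(x, keys, values))
    return outputs, projections


tokens = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [0.4, 0.4], [-0.2, 0.1]]
cached, n_cached = decode(tokens, use_cache=True)
plain, n_plain = decode(tokens, use_cache=False)
```

Both variants produce identical outputs, but the cached one performs a number of projections linear in the sequence length (10 here) instead of quadratic (30).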

## Dataset

### Train Dataset

| trainset | instance_num | repeat_num | source |
| :-------: | :----------: | :--------: | :----: |
| SynthText | 7266686 | 1 | synth |
| SynthAdd | 1216889 | 1 | synth |
| Syn90k | 8919273 | 1 | synth |

### Test Dataset

| testset | instance_num | type |
| :-----: | :----------: | :-------: |
| IIIT5K | 3000 | regular |
| SVT | 647 | regular |
| IC13 | 1015 | regular |
| IC15 | 2077 | irregular |
| SVTP | 645 | irregular |
| CT80 | 288 | irregular |

## Results and Models

| Methods | Backbone | | Regular Text | | | | Irregular Text | | download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| [MASTER](/configs/textrecog/master/master_r31_12e_ST_MJ_SA.py) | R31-GCAModule | 95.27 | 89.8 | 95.17 | | 77.03 | 82.95 | 89.93 | [model](https://download.openmmlab.com/mmocr/textrecog/master/master_r31_12e_ST_MJ_SA-787edd36.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/master/master_r31_12e_ST_MJ_SA-787edd36.log.json) |

## Citation

```bibtex
@article{Lu2021MASTER,
title={{MASTER}: Multi-Aspect Non-local Network for Scene Text Recognition},
author={Ning Lu and Wenwen Yu and Xianbiao Qi and Yihao Chen and Ping Gong and Rong Xiao and Xiang Bai},
journal={Pattern Recognition},
year={2021}
}
```
33 changes: 33 additions & 0 deletions configs/textrecog/master/master_r31_12e_ST_MJ_SA.py
@@ -0,0 +1,33 @@
_base_ = [
    '../../_base_/default_runtime.py', '../../_base_/recog_models/master.py',
    '../../_base_/schedules/schedule_adam_step_12e.py',
    '../../_base_/recog_pipelines/master_pipeline.py',
    '../../_base_/recog_datasets/ST_SA_MJ_train.py',
    '../../_base_/recog_datasets/academic_test.py'
]

train_list = {{_base_.train_list}}
test_list = {{_base_.test_list}}

train_pipeline = {{_base_.train_pipeline}}
test_pipeline = {{_base_.test_pipeline}}

data = dict(
    samples_per_gpu=512,
    workers_per_gpu=4,
    val_dataloader=dict(samples_per_gpu=128),
    test_dataloader=dict(samples_per_gpu=128),
    train=dict(
        type='UniformConcatDataset',
        datasets=train_list,
        pipeline=train_pipeline),
    val=dict(
        type='UniformConcatDataset',
        datasets=test_list,
        pipeline=test_pipeline),
    test=dict(
        type='UniformConcatDataset',
        datasets=test_list,
        pipeline=test_pipeline))

evaluation = dict(interval=1, metric='acc')
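Every entry in `ST_SA_MJ_train.py` carries `pipeline=None`; this config instead passes `train_pipeline` and `test_pipeline` to `UniformConcatDataset`, whose role is to apply one shared pipeline to all member datasets. The hypothetical helper `apply_uniform_pipeline` below sketches that idea in plain Python. It is an approximation for illustration, not MMOCR's implementation, and the dataset entries are abbreviated placeholders.

```python
import copy


def apply_uniform_pipeline(dataset_cfgs, pipeline):
    """Hypothetical helper: give each dataset config its own pipeline copy."""
    out = []
    for cfg in dataset_cfgs:
        cfg = dict(cfg)  # leave the caller's config untouched
        # Deep-copy so transforms built later are not shared by datasets.
        cfg['pipeline'] = copy.deepcopy(pipeline)
        out.append(cfg)
    return out


train_list = [
    dict(type='OCRDataset', ann_file='Syn90k/label.lmdb', pipeline=None),
    dict(type='OCRDataset', ann_file='SynthText/label.lmdb', pipeline=None),
]
train_pipeline = [dict(type='LoadImageFromFile'), dict(type='ToTensorOCR')]
datasets = apply_uniform_pipeline(train_list, train_pipeline)
```

Keeping the pipeline out of the dataset files lets the same dataset lists be reused by models with different preprocessing.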
30 changes: 30 additions & 0 deletions configs/textrecog/master/master_toy_dataset.py
@@ -0,0 +1,30 @@
_base_ = [
    '../../_base_/default_runtime.py', '../../_base_/recog_models/master.py',
    '../../_base_/schedules/schedule_adam_step_12e.py',
    '../../_base_/recog_pipelines/master_pipeline.py',
    '../../_base_/recog_datasets/toy_data.py'
]

train_list = {{_base_.train_list}}
test_list = {{_base_.test_list}}

train_pipeline = {{_base_.train_pipeline}}
test_pipeline = {{_base_.test_pipeline}}

data = dict(
    workers_per_gpu=2,
    samples_per_gpu=8,
    train=dict(
        type='UniformConcatDataset',
        datasets=train_list,
        pipeline=train_pipeline),
    val=dict(
        type='UniformConcatDataset',
        datasets=test_list,
        pipeline=test_pipeline),
    test=dict(
        type='UniformConcatDataset',
        datasets=test_list,
        pipeline=test_pipeline))

evaluation = dict(interval=1, metric='acc')
52 changes: 52 additions & 0 deletions configs/textrecog/master/metafile.yml
@@ -0,0 +1,52 @@
Collections:
- Name: MASTER
  Metadata:
    Training Data: OCRDataset
    Training Techniques:
      - Adam
    Epochs: 12
    Batch Size: 512
    Training Resources: 4x Tesla A100
    Architecture:
      - ResNet31-GCAModule
      - MasterDecoder
  Paper:
    URL: https://arxiv.org/abs/1910.02562
    Title: "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition"
  README: configs/textrecog/master/README.md

Models:
  - Name: master_academic
    In Collection: MASTER
    Config: configs/textrecog/master/master_r31_12e_ST_MJ_SA.py
    Metadata:
      Training Data:
        - SynthText
        - SynthAdd
        - Syn90k
    Results:
      - Task: Text Recognition
        Dataset: IIIT5K
        Metrics:
          word_acc: 95.27
      - Task: Text Recognition
        Dataset: SVT
        Metrics:
          word_acc: 89.8
      - Task: Text Recognition
        Dataset: ICDAR2013
        Metrics:
          word_acc: 95.17
      - Task: Text Recognition
        Dataset: ICDAR2015
        Metrics:
          word_acc: 77.03
      - Task: Text Recognition
        Dataset: SVTP
        Metrics:
          word_acc: 82.95
      - Task: Text Recognition
        Dataset: CT80
        Metrics:
          word_acc: 89.93
    Weights: https://download.openmmlab.com/mmocr/textrecog/master/master_r31_12e_ST_MJ_SA-787edd36.pth
1 change: 1 addition & 0 deletions demo/README.md
@@ -219,6 +219,7 @@ means that `batch_mode` and `print_result` are set to `True`)
| ABINet | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#read-like-humans-autonomous-bidirectional-and-iterative-language-modeling-for-scene-text-recognition) | :heavy_check_mark: |
| CRNN | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#an-end-to-end-trainable-neural-network-for-image-based-sequence-recognition-and-its-application-to-scene-text-recognition) | :x: |
| CRNN_TPS | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#crnn-with-tps-based-stn) | :heavy_check_mark: |
| MASTER | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#master) | :heavy_check_mark: |
| NRTR_1/16-1/8 | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) | :heavy_check_mark: |
| NRTR_1/8-1/4 | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) | :heavy_check_mark: |
| RobustScanner | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#robustscanner-dynamically-enhancing-positional-clues-for-robust-text-recognition) | :heavy_check_mark: |
1 change: 1 addition & 0 deletions demo/README_zh-CN.md
@@ -216,6 +216,7 @@ mmocr 为了方便使用提供了预置的模型配置和对应的预训练权
| ABINet | [链接](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#read-like-humans-autonomous-bidirectional-and-iterative-language-modeling-for-scene-text-recognition) | :heavy_check_mark: |
| CRNN | [链接](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#an-end-to-end-trainable-neural-network-for-image-based-sequence-recognition-and-its-application-to-scene-text-recognition) | :x: |
| CRNN_TPS | [链接](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#crnn-with-tps-based-stn) | :heavy_check_mark: |
| MASTER | [链接](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#master) | :heavy_check_mark: |
| NRTR_1/16-1/8 | [链接](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) | :heavy_check_mark: |
| NRTR_1/8-1/4 | [链接](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) | :heavy_check_mark: |
| RobustScanner | [链接](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#robustscanner-dynamically-enhancing-positional-clues-for-robust-text-recognition) | :heavy_check_mark: |
4 changes: 3 additions & 1 deletion mmocr/models/textrecog/decoders/__init__.py
@@ -3,6 +3,7 @@
 from .abinet_vision_decoder import ABIVisionDecoder
 from .base_decoder import BaseDecoder
 from .crnn_decoder import CRNNDecoder
+from .master_decoder import MasterDecoder
 from .nrtr_decoder import NRTRDecoder
 from .position_attention_decoder import PositionAttentionDecoder
 from .robust_scanner_decoder import RobustScannerDecoder
@@ -14,5 +15,6 @@
     'CRNNDecoder', 'ParallelSARDecoder', 'SequentialSARDecoder',
     'ParallelSARDecoderWithBS', 'NRTRDecoder', 'BaseDecoder',
     'SequenceAttentionDecoder', 'PositionAttentionDecoder',
-    'RobustScannerDecoder', 'ABILanguageDecoder', 'ABIVisionDecoder'
+    'RobustScannerDecoder', 'ABILanguageDecoder', 'ABIVisionDecoder',
+    'MasterDecoder'
 ]
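Importing `MasterDecoder` in `decoders/__init__.py` matters because config strings such as `type='MasterDecoder'` are resolved through a registry, and a class decorated for registration is only added once its module has been imported. The sketch below is a hypothetical, stripped-down registry written to illustrate that mechanism; it is not mmcv's `Registry`, and the `MasterDecoder` class here is a placeholder stub.

```python
class Registry:
    """Minimal stand-in for a string-to-class registry (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        def _register(cls):
            # Registration happens when the decorated class is defined,
            # i.e. when its module is imported.
            self._modules[cls.__name__] = cls
            return cls
        return _register

    def build(self, cfg):
        cfg = dict(cfg)  # do not mutate the caller's config
        cls_name = cfg.pop('type')
        if cls_name not in self._modules:
            raise KeyError(f'{cls_name} is not in the {self.name} registry')
        return self._modules[cls_name](**cfg)


DECODERS = Registry('decoder')


@DECODERS.register_module()
class MasterDecoder:  # placeholder stub, not the real decoder
    def __init__(self, d_model=512, n_head=8, n_layers=3):
        self.d_model, self.n_head, self.n_layers = d_model, n_head, n_layers


decoder = DECODERS.build(dict(type='MasterDecoder', d_model=512, n_layers=3))
```

If the module defining the class were never imported, `build` would fail with a lookup error even though the config is otherwise correct.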