Changelog

0.6.1 (04/08/2022)

Highlights

ArT dataset is available for text detection and recognition!
Fixed several bugs that affected the correctness of the models.
Thanks to MIM, installation is much simpler now! The docs have been renewed as well.

New Features & Enhancements

Add ArT by @xinke-wang in #1006
add ABINet_Vision api by @Abdelrahman350 in #1041
add codespell ignore and use mdformat by @Harold-lkk in #1022
Add mim to extras_require in setup.py, update mminstall… by @gaotongxiao in #1062
Simplify normalized edit distance calculation by @maxbachmann in #1060
Test mim in CI by @gaotongxiao in #1090
Remove redundant steps by @gaotongxiao in #1091

Update links to SDMGR links by @gaotongxiao in #1252

Bug Fixes

Remove unnecessary requirements by @gaotongxiao in #1000
Remove confusing img_scales in pipelines by @gaotongxiao in #1007
inplace operator "+=" will cause RuntimeError when model backward by @garvan2021 in #1018
Fix a typo problem in MASTER by @Mountchicken in #1031
Fix config name of MASTER in ocr.py by @Mountchicken in #1044
Relax OpenCV requirement by @gaotongxiao in #1061
Restrict the minimum version of OpenCV to avoid potential vulnerability by @gaotongxiao in #1065
typo by @tpoisonooo in #1024
Fix a typo in setup.py by @gaotongxiao in #1095
fix #1067: add torchserve DockerFile and fix bugs by @Hegelim in #1073
Incorrect filename in labelme_converter.py by @xiefeifeihu in #1103
Fix dataset configs by @Mountchicken in #1106
Fix #1098: normalize text recognition scores by @Hegelim in #1119
Update ST_SA_MJ_train.py by @MingyuLau in #1117
PSENet metafile by @gaotongxiao in #1121
Flexible ways of getting file name by @balandongiv in #1107
Updating edge-embeddings after each GNN layer by @amitbcp in #1134
links update by @TekayaNidham in #1141
bug fix: access params by cfg.get by @doem97 in #1145
Fix a bug in LmdbAnnFileBackend that cause breaking in Synthtext detection training by @Mountchicken in #1159
Fix typo of --lmdb-map-size default value by @easilylazy in #1147
Fixed docstring syntax error of line 19 & 21 by @APX103 in #1157
Update lmdb_converter and ct80 cropped image source in document by @doem97 in #1164
MMCV compatibility due to outdated MMDet by @gaotongxiao in #1192
Update maximum version of mmcv by @xinke-wang in #1219
Update ABINet links for main by @Mountchicken in #1221
Update owners by @gaotongxiao in #1248
Add back some missing fields in configs by @gaotongxiao in #1171

Docs

Fix typos by @xinke-wang in #1001
Configure Myst-parser to parse anchor tag by @gaotongxiao in #1012
Fix an error in docs/en/tutorials/dataset_types.md by @Mountchicken in #1034
Update readme according to the guideline by @gaotongxiao in #1047
Limit markdown version by @gaotongxiao in #1172
Limit extension versions by @Mountchicken in #1210

Update installation guide by @gaotongxiao in #1254
Update image link by @gaotongxiao in #1255

New Contributors

@tpoisonooo made their first contribution in #1024
@Abdelrahman350 made their first contribution in #1041
@Hegelim made their first contribution in #1073
@xiefeifeihu made their first contribution in #1103
@MingyuLau made their first contribution in #1117
@balandongiv made their first contribution in #1107
@amitbcp made their first contribution in #1134
@TekayaNidham made their first contribution in #1141
@easilylazy made their first contribution in #1147
@APX103 made their first contribution in #1157

Full Changelog: https://github.com/open-mmlab/mmocr/compare/v0.6.0...v0.6.1

0.6.0 (05/05/2022)

Highlights

A new recognition algorithm, MASTER, has been added to MMOCR. It was the winning solution of the "ICDAR 2021 Competition on Scientific Table Image Recognition to Latex"! The model pre-trained on SynthText and MJSynth is available for testing. Credit to @JiaquanYe
DBNet++ is now available! It is equipped with a new Adaptive Scale Fusion module for feature enhancement. Benefiting from this, the new model achieves a 2% higher h-mean score than its predecessor on the ICDAR2015 dataset.
Three more dataset converters have been added: LSVT, RCTW and HierText. Check the dataset zoo (Det & Recog) for more information.
To improve data storage efficiency, MMOCR now supports loading both images and labels from .lmdb annotations for the text recognition task. To use this feature, run the new lmdb_converter.py to pack your cropped images and labels into an lmdb file. For a detailed tutorial, please refer to the following sections and the doc.
Testing models on multiple datasets is a widely used evaluation strategy. MMOCR now automatically reports mean scores when there is more than one dataset to evaluate, which makes comparing checkpoints more convenient. (Doc)
Evaluation is now more flexible and customizable. For text detection tasks, you can set the score threshold range within which the best results are searched for. (Doc) If too many results are flooding your text recognition training log, you can trim it by specifying a subset of metrics in the evaluation config. See the Evaluation section for details.
MMOCR provides a script to convert the .json labels produced by the popular annotation toolkit Labelme into the MMOCR-supported data format. @Y-M-Y contributed a log analysis tool that helps users gain a better understanding of the entire training process. Read the tutorial docs to get started.
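The mean-score reporting described above can be pictured with a small sketch. It assumes each dataset's evaluation returns a flat dict mapping metric names to floats; the `<dataset>_<metric>` and `mean_<metric>` key names and the dataset names are hypothetical and are not MMOCR's actual log keys.

```python
from typing import Dict


def report_with_means(per_dataset: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Flatten per-dataset metrics and, when two or more datasets are
    evaluated, add the arithmetic mean of every metric they all share."""
    report: Dict[str, float] = {}
    for name, metrics in per_dataset.items():
        for metric, value in metrics.items():
            # Hypothetical key scheme: "<dataset>_<metric>".
            report[f"{name}_{metric}"] = value
    if len(per_dataset) > 1:
        # Only average metrics reported by every dataset.
        shared = set.intersection(*(set(m) for m in per_dataset.values()))
        for metric in sorted(shared):
            values = [m[metric] for m in per_dataset.values()]
            report[f"mean_{metric}"] = sum(values) / len(values)
    return report


# Dummy numbers for illustration only.
scores = report_with_means({
    "dataset_a": {"acc": 1.0},
    "dataset_b": {"acc": 0.5},
})
```

With a single dataset no mean is added, which mirrors the "when there is more than one dataset" condition above.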

Lmdb Dataset

Reading images or labels from individual files can be slow when the data volume is large, e.g. on the scale of millions of samples. Besides, in academia, most scene text recognition datasets, including their images and labels, are stored in lmdb format. To align with mainstream practice and improve data storage efficiency, MMOCR now officially supports loading images and labels from lmdb datasets via a new pipeline, LoadImageFromLMDB. This section is a quick walkthrough to help you master this update and apply it in your research.

Specifications

To better align with the academic community, MMOCR now requires the following specifications for lmdb datasets:

The key that stores the number of samples in the dataset is num-samples instead of total_number (deprecated).
Images and labels are stored with keys in the form of image-000000001 and label-000000001, respectively.
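The key layout in the specification above can be illustrated with a plain dict standing in for an lmdb write transaction. This is only a sketch: the 1-based, nine-digit numbering is inferred from the example keys, storing num-samples as an ASCII-encoded integer is an assumption, and `pack_samples` is a hypothetical helper. MMOCR's lmdb_converter remains the reference for the real format.

```python
from typing import Dict, List, Tuple


def image_key(index: int) -> bytes:
    # Nine-digit, zero-padded, 1-based index, e.g. b"image-000000001".
    return b"image-%09d" % index


def label_key(index: int) -> bytes:
    return b"label-%09d" % index


def pack_samples(samples: List[Tuple[bytes, str]]) -> Dict[bytes, bytes]:
    """Lay out (encoded_image, text) pairs as lmdb-style key/value pairs.

    A dict is used here in place of a real lmdb transaction.
    """
    store: Dict[bytes, bytes] = {}
    for i, (img_bytes, text) in enumerate(samples, start=1):
        store[image_key(i)] = img_bytes
        store[label_key(i)] = text.encode("utf-8")
    # The sample count lives under "num-samples" (not "total_number").
    store[b"num-samples"] = str(len(samples)).encode("ascii")
    return store


store = pack_samples([(b"\x89PNG...", "HELLO"), (b"\xff\xd8...", "WORLD")])
```

To write such pairs into an actual database with the `lmdb` Python package, one would open an environment with `lmdb.open(path, map_size=...)` and call `txn.put(key, value)` for each pair inside `env.begin(write=True)`.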

Usage

First, prepare an lmdb dataset: use an existing academic lmdb dataset if it meets the specifications, or pack your images and annotations into an lmdb dataset with the tool provided by MMOCR.

Previously, MMOCR had a function txt2lmdb (deprecated) that only supported converting labels to lmdb format. However, it is quite different from academic lmdb datasets, which usually contain both images and labels. Now MMOCR provides a new utility lmdb_converter to convert recognition datasets with both images and labels to lmdb format.

Say that your recognition data in MMOCR's format are organized as follows. (See an example in ocr_toy_dataset).

# Directory structure

├── img_path
│   ├── img1.jpg
│   ├── img2.jpg
│   └── ...
└── label.txt (or label.jsonl)

# Annotation format

label.txt:  img1.jpg HELLO
            img2.jpg WORLD
            ...

label.jsonl:    {"filename": "img1.jpg", "text": "HELLO"}
                {"filename": "img2.jpg", "text": "WORLD"}
                ...
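The two annotation formats above carry the same information, so a .txt file can be rewritten as .jsonl. The sketch below assumes the filename is the first space-separated field of each .txt line and the rest of the line is the transcription; `txt_to_jsonl` is a hypothetical helper, not an MMOCR tool.

```python
import json
from typing import Iterable, List


def txt_to_jsonl(lines: Iterable[str]) -> List[str]:
    """Convert 'filename text' lines into JSON lines.

    Assumption: the filename is the first space-separated field and
    everything after the first space is the transcription.
    """
    out: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        filename, text = line.split(" ", 1)
        # json.dumps emits double-quoted keys and strings, i.e. valid JSON;
        # ensure_ascii=False keeps non-Latin text human-readable.
        out.append(json.dumps({"filename": filename, "text": text},
                              ensure_ascii=False))
    return out


jsonl_lines = txt_to_jsonl(["img1.jpg HELLO\n", "img2.jpg WORLD\n", "\n"])
```

Each resulting line must be valid JSON (double quotes), since it is read back with a JSON parser.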

Then pack these files up:

python tools/data/utils/lmdb_converter.py  {PATH_TO_LABEL} {OUTPUT_PATH} --i {PATH_TO_IMAGES}

Check out tools.md for more details.

The second step is to modify the configuration files. For example, to train CRNN on MJ and ST datasets:

Set parser to LineJsonParser and file_format to 'lmdb' in the dataset config:

# configs/_base_/recog_datasets/ST_MJ_train.py
train1 = dict(
    type='OCRDataset',
    img_prefix=train_img_prefix1,
    ann_file=train_ann_file1,
    loader=dict(
        type='AnnFileLoader',
        repeat=1,
        file_format='lmdb',
        parser=dict(
            type='LineJsonParser',
            keys=['filename', 'text'],
        )),
    pipeline=None,
    test_mode=False)

Use LoadImageFromLMDB in pipeline:

# configs/_base_/recog_pipelines/crnn_pipeline.py
train_pipeline = [
    dict(type='LoadImageFromLMDB', color_type='grayscale'),
    ...

You are good to go! Start training and MMOCR will load data from your lmdb dataset.

New Features & Enhancements

Add analyze_logs in tools and its description in docs by @Y-M-Y in #899
Add LSVT Data Converter by @xinke-wang in #896
Add RCTW dataset converter by @xinke-wang in #914
Support computing mean scores in UniformConcatDataset by @gaotongxiao in #981
Support loading images and labels from lmdb file by @Mountchicken in #982
Add recog2lmdb and new toy dataset files by @Mountchicken in #979
Add labelme converter for textdet and textrecog by @cuhk-hbsun in #972
Update CircleCI configs by @xinke-wang in #918
Update Git Action by @xinke-wang in #930
More customizable fields in dataloaders by @gaotongxiao in #933
Skip CIs when docs are modified by @gaotongxiao in #941
Rename Github tests, fix ignored paths by @gaotongxiao in #946
Support latest MMCV by @gaotongxiao in #959
Support dynamic threshold range in eval_hmean by @gaotongxiao in #962
Update the version requirement of mmdet in docker by @Mountchicken in #966
Replace opencv-python-headless with opencv-python by @gaotongxiao in #970
Update Dataset Configs by @xinke-wang in #980
Add SynthText dataset config by @xinke-wang in #983
Automatically report mean scores when applicable by @gaotongxiao in #995
Add DBNet++ by @xinke-wang in #973
Add MASTER by @JiaquanYe in #807
Allow choosing metrics to report in text recognition tasks by @gaotongxiao in #989
Add HierText converter by @Mountchicken in #948
Fix lint_only in CircleCI by @gaotongxiao in #998

Bug Fixes

Fix CircleCi Main Branch Accidentally Run PR Stage Test by @xinke-wang in #927
Fix a deprecation warning about mmdet.datasets.pipelines.formating by @Mountchicken in #944
Fix a Bug in ResNet plugin by @Mountchicken in #967
revert a wrong setting in db_r18 cfg by @gaotongxiao in #978
Fix TotalText Anno version issue by @xinke-wang in #945
Update installation step of albumentations by @gaotongxiao in #984
Fix ImgAug transform by @gaotongxiao in #949
Fix GPG key error in CI and docker by @gaotongxiao in #988
update label.lmdb by @Mountchicken in #991
correct meta key by @garvan2021 in #926
Use new image by @gaotongxiao in #976
Fix Data Converter Issues by @xinke-wang in #955

Docs

Update CONTRIBUTING.md by @gaotongxiao in #905
Fix the misleading description in test.py by @gaotongxiao in #908
Update recog.md for lmdb Generation by @xinke-wang in #934
Add MMCV by @gaotongxiao in #954
Add wechat QR code to CN readme by @gaotongxiao in #960
Update CONTRIBUTING.md by @gaotongxiao in #947
Use QR codes from MMCV by @gaotongxiao in #971
Renew dataset_types.md by @gaotongxiao in #997

New Contributors

@Y-M-Y made their first contribution in #899

Full Changelog: https://github.com/open-mmlab/mmocr/compare/v0.5.0...v0.6.0

0.5.0 (31/03/2022)

Highlights

MMOCR now supports SPACE recognition! (What a prominent feature!) Users only need to convert the recognition annotations that contain spaces from a plain .txt file to JSON line format .jsonl, and then revise a few configurations to enable the LineJsonParser. For more information, please read our step-by-step tutorial.
Tesseract is now available in MMOCR! While MMOCR is flexible enough to support various downstream tasks, users may sometimes be unsatisfied with DL models and want to turn to effective legacy solutions. Therefore, we offer this option in mmocr.utils.ocr by wrapping Tesseract as a detector and/or recognizer. Users can easily create an MMOCR object with MMOCR(det='Tesseract', recog='Tesseract'). Credit to @garvan2021
We release data converters for 16 widely used OCR datasets, covering multiple scenarios such as document, handwritten, and scene text. It is now more convenient to generate annotation files for these datasets. Check the dataset zoo (Det & Recog) for more information.
Special thanks to @EighteenSprings @BeyondYourself @yangrisheng, who actively participated in documentation translation!
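Why do annotations containing spaces need the JSON line format? The sketch below contrasts a plain-text parser that splits a line on every space and keeps the first two fields (a hypothetical stand-in, not MMOCR's exact parser logic) with parsing the same sample stored as a JSON line.

```python
import json

line_txt = "img3.jpg HELLO WORLD"
line_jsonl = json.dumps({"filename": "img3.jpg", "text": "HELLO WORLD"})

# Splitting on every space and keeping fields 0 and 1 silently drops "WORLD".
fields = line_txt.split(" ")
naive = (fields[0], fields[1])

# The JSON line keeps the full transcription, spaces included.
record = json.loads(line_jsonl)
parsed = (record["filename"], record["text"])
```

Because the JSON record delimits the text explicitly, no separator choice can truncate it, which is what makes space recognition trainable.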

Migration Guide - ResNet

Some refactoring processes are still going on. For text recognition models, we unified the ResNet-like architectures which are used as backbones. By introducing stage-wise and block-wise plugins, the refactored ResNet is highly flexible to support existing models, like ResNet31 and ResNet45, and other future designs of ResNet variants.

Plugin

A plugin is a module category inherited from MMCV's implementation of PLUGIN_LAYERS, which can be inserted between the stages of ResNet or into a BasicBlock. You can find a simple plugin implementation in mmocr/models/textrecog/plugins/common.py, reproduced below.

Plugin Example

import torch.nn as nn
from mmcv.cnn import PLUGIN_LAYERS


@PLUGIN_LAYERS.register_module()
class Maxpool2d(nn.Module):
    """A wrapper around nn.MaxPool2d().

    Args:
        kernel_size (int or tuple(int)): Kernel size for max pooling layer
        stride (int or tuple(int)): Stride for max pooling layer
        padding (int or tuple(int)): Padding for pooling layer
    """

    def __init__(self, kernel_size, stride, padding=0, **kwargs):
        # **kwargs absorbs extra arguments (e.g. channel information)
        # that the pooling layer does not need.
        super(Maxpool2d, self).__init__()
        self.model = nn.MaxPool2d(kernel_size, stride, padding)

    def forward(self, x):
        """
        Args:
            x (Tensor): Input feature map

        Returns:
            Tensor: The tensor after Maxpooling layer.
        """
        return self.model(x)
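The `@PLUGIN_LAYERS.register_module()` decorator is what lets a config entry such as `cfg=dict(type='Maxpool2d', ...)` be turned into an object. The toy registry below only illustrates that idea; `ToyRegistry`, `TOY_PLUGIN_LAYERS` and `ToyPlugin` are hypothetical names and this is not MMCV's Registry implementation.

```python
from typing import Any, Callable, Dict


class ToyRegistry:
    """Maps a class name to the class, then builds instances from cfg dicts."""

    def __init__(self) -> None:
        self._modules: Dict[str, type] = {}

    def register_module(self) -> Callable[[type], type]:
        def decorator(cls: type) -> type:
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg: Dict[str, Any], **extra: Any) -> Any:
        args = dict(cfg)  # copy so the caller's config is left untouched
        cls = self._modules[args.pop("type")]
        args.update(extra)  # e.g. channel info injected by the caller
        return cls(**args)


TOY_PLUGIN_LAYERS = ToyRegistry()


@TOY_PLUGIN_LAYERS.register_module()
class ToyPlugin:
    def __init__(self, kernel_size, stride, padding=0, **kwargs):
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.extra = kwargs  # unused extras land here, as in Maxpool2d


plugin = TOY_PLUGIN_LAYERS.build(
    dict(type="ToyPlugin", kernel_size=2, stride=(2, 2)),
    in_channels=64, out_channels=64)
```

The `**kwargs` in the constructor is what allows a builder to pass the same extra keyword arguments to every plugin, whether or not a particular plugin uses them.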

Stage-wise Plugins

ResNet is composed of stages, and each stage is composed of blocks. E.g., ResNet18 is composed of 4 stages, and each stage is composed of BasicBlocks. For each stage, we provide two ports for inserting stage-wise plugins, configured through the plugins argument of ResNet.
```
[port1: before stage] ---> [stage] ---> [port2: after stage]
```

For example, take a ResNet with four stages. Suppose we want to insert an additional convolution layer before each stage, and another convolution layer after stages 1, 2 and 4. Then you can define this special ResNet18 as follows:

resnet18_special = ResNet(
    # for simplicity, some required
    # parameters are omitted
    plugins=[
        dict(
            cfg=dict(
                type='ConvModule',
                kernel_size=3,
                stride=1,
                padding=1,
                norm_cfg=dict(type='BN'),
                act_cfg=dict(type='ReLU')),
            stages=(True, True, True, True),
            position='before_stage'),
        dict(
            cfg=dict(
                type='ConvModule',
                kernel_size=3,
                stride=1,
                padding=1,
                norm_cfg=dict(type='BN'),
                act_cfg=dict(type='ReLU')),
            stages=(True, True, False, True),
            position='after_stage')
    ])

You can also insert more than one plugin at each port, and those plugins will be executed in order. Let's take the ResNet in MASTER as an example:

Multiple Plugins Example

The ResNet in MASTER is based on ResNet31, and a module named GCAModule is applied after each stage. The GCAModule is inserted before the stage-wise convolution layer of ResNet31. As a result, there are two plugins at the after_stage port at the same time.

resnet_master = ResNet(
                # for simplicity, some required
                # parameters are omitted
                plugins=[
                    dict(
                        cfg=dict(type='Maxpool2d', kernel_size=2, stride=(2, 2)),
                        stages=(True, True, False, False),
                        position='before_stage'),
                    dict(
                        cfg=dict(type='Maxpool2d', kernel_size=(2, 1), stride=(2, 1)),
                        stages=(False, False, True, False),
                        position='before_stage'),
                    dict(
                        cfg=dict(type='GCAModule', kernel_size=3, stride=1, padding=1),
                        stages=[True, True, True, True],
                        position='after_stage'),
                    dict(
                        cfg=dict(
                            type='ConvModule',
                            kernel_size=3,
                            stride=1,
                            padding=1,
                            norm_cfg=dict(type='BN'),
                            act_cfg=dict(type='ReLU')),
                        stages=(True, True, True, True),
                        position='after_stage')
                ])
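How the plugin list above translates into an execution order can be sketched in a few lines. This assumes, as stated above, that plugins at the same port run in list order, that stages is a per-stage boolean mask, and that position is 'before_stage' or 'after_stage'; `trace_stage` is a hypothetical helper, not MMOCR's ResNet.forward.

```python
from typing import Dict, List


def trace_stage(plugins: List[Dict], stage_idx: int) -> List[str]:
    """Return what one stage executes, in order, given stage-wise plugins."""

    def at(position: str) -> List[str]:
        # List order is execution order within a port.
        return [p["cfg"]["type"] for p in plugins
                if p["position"] == position and p["stages"][stage_idx]]

    return at("before_stage") + [f"stage{stage_idx + 1}"] + at("after_stage")


# The resnet_master plugin list, with each cfg trimmed to its type.
master_plugins = [
    dict(cfg=dict(type="Maxpool2d"), stages=(True, True, False, False),
         position="before_stage"),
    dict(cfg=dict(type="Maxpool2d"), stages=(False, False, True, False),
         position="before_stage"),
    dict(cfg=dict(type="GCAModule"), stages=(True, True, True, True),
         position="after_stage"),
    dict(cfg=dict(type="ConvModule"), stages=(True, True, True, True),
         position="after_stage"),
]

schedule = {i + 1: trace_stage(master_plugins, i) for i in range(4)}
```

For every stage, GCAModule comes before ConvModule at the after_stage port simply because it appears first in the list.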

For each plugin, ResNet passes two parameters, in_channels and out_channels, to support operations that need to know the current number of channels.

Block-wise Plugin (Experimental)

We also refactored the BasicBlock used in ResNet. It can now be customized with block-wise plugins.

A BasicBlock is composed of two convolution layers in the main branch and a shortcut branch. We provide four ports for inserting plugins.

    [port1: before_conv1] ---> [conv1] --->
    [port2: after_conv1] ---> [conv2] --->
    [port3: after_conv2] ---> +(shortcut) ---> [port4: after_shortcut]

For each plugin, we pass a parameter, in_channels, to support operations that need to know the current number of channels.
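The port diagram above fixes where block-wise plugins run relative to conv1, conv2 and the shortcut addition. The sketch below only traces that order by name; the port names come from the diagram, while the port-to-names dict is a hypothetical input format (not BasicBlock's actual plugins argument) and BN/ReLU are omitted for brevity.

```python
from typing import Dict, List

PORTS = ("before_conv1", "after_conv1", "after_conv2", "after_shortcut")


def trace_basic_block(plugins: Dict[str, List[str]]) -> List[str]:
    """Order in which a BasicBlock-like block runs its layers and plugins."""
    unknown = set(plugins) - set(PORTS)
    if unknown:
        raise ValueError(f"unknown ports: {sorted(unknown)}")
    main = ["before_conv1", "conv1", "after_conv1", "conv2",
            "after_conv2", "add_shortcut", "after_shortcut"]
    steps: List[str] = []
    for step in main:
        if step in PORTS:
            steps.extend(plugins.get(step, []))  # plugins at this port, in order
        else:
            steps.append(step)  # a fixed layer of the block
    return steps


# One extra convolution before conv1, as in the example described below.
order = trace_basic_block({"before_conv1": ["ConvModule"]})
```

Rejecting unknown port names early is a design choice of this sketch: a misspelled port would otherwise be silently ignored.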

E.g., build a ResNet with a customized BasicBlock that has an additional convolution layer before conv1:

Block-wise Plugin Example

resnet_31 = ResNet(
        in_channels=3,
        stem_channels=[64, 128],
        block_cfgs=dict(type='BasicBlock'),
        arch_layers=[1, 2, 5, 3],
        arch_channels=[256, 256, 512, 512],
        strides=[1, 1, 1, 1],
        plugins=[
            dict(
                cfg=dict(type='Maxpool2d',
                kernel_size=2,
                stride=(2, 2)),
                stages=(True, True, False, False),
                position='before_stage'),
            dict(
                cfg=dict(type='Maxpool2d',
                kernel_size=(2, 1),
                stride=(2, 1)),
                stages=(False, False, True, False),
                position='before_stage'),
            dict(
                cfg=dict(
                type='ConvModule',
                kernel_size=3,
                stride=1,
                padding=1,
                norm_cfg=dict(type='BN'),
                act_cfg=dict(type='ReLU')),
                stages=(True, True, True, True),
                position='after_stage')
        ])

Full Examples

ResNet without plugins

ResNet45 is used in ASTER and ABINet without any plugins.

resnet45_aster = ResNet(
    in_channels=3,
    stem_channels=[64, 128],
    block_cfgs=dict(type='BasicBlock', use_conv1x1=True),
    arch_layers=[3, 4, 6, 6, 3],
    arch_channels=[32, 64, 128, 256, 512],
    strides=[(2, 2), (2, 2), (2, 1), (2, 1), (2, 1)])

resnet45_abi = ResNet(
    in_channels=3,
    stem_channels=32,
    block_cfgs=dict(type='BasicBlock', use_conv1x1=True),
    arch_layers=[3, 4, 6, 6, 3],
    arch_channels=[32, 64, 128, 256, 512],
    strides=[2, 1, 2, 1, 1])
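The two ResNet45 variants differ mainly in their strides, which determine the size of the output feature map. The sketch below computes it for a 32x128 input, chosen purely as an example. Assumptions: an int stride applies to both height and width, each strided stage rounds up (as a 3x3 convolution with padding 1 does), and the stem does not downsample.

```python
from typing import Sequence, Tuple, Union

Stride = Union[int, Tuple[int, int]]


def output_size(height: int, width: int,
                strides: Sequence[Stride]) -> Tuple[int, int]:
    """Feature-map (height, width) after applying per-stage strides."""
    for s in strides:
        sh, sw = (s, s) if isinstance(s, int) else s
        height = -(-height // sh)  # ceiling division
        width = -(-width // sw)
    return height, width


aster_strides = [(2, 2), (2, 2), (2, 1), (2, 1), (2, 1)]
abi_strides = [2, 1, 2, 1, 1]

aster_map = output_size(32, 128, aster_strides)
abi_map = output_size(32, 128, abi_strides)
```

Under these assumptions, the ASTER configuration collapses the height to a single row (a 1-D sequence for the decoder), while the ABINet configuration keeps a 2-D map with 8 rows.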

ResNet with plugins

ResNet31 is a typical architecture that uses stage-wise plugins. Before each of the first three stages, a max-pooling layer is used. After each stage, a convolution layer with BN and ReLU is used.

resnet_31 = ResNet(
    in_channels=3,
    stem_channels=[64, 128],
    block_cfgs=dict(type='BasicBlock'),
    arch_layers=[1, 2, 5, 3],
    arch_channels=[256, 256, 512, 512],
    strides=[1, 1, 1, 1],
    plugins=[
        dict(
            cfg=dict(type='Maxpool2d',
            kernel_size=2,
            stride=(2, 2)),
            stages=(True, True, False, False),
            position='before_stage'),
        dict(
            cfg=dict(type='Maxpool2d',
            kernel_size=(2, 1),
            stride=(2, 1)),
            stages=(False, False, True, False),
            position='before_stage'),
        dict(
            cfg=dict(
            type='ConvModule',
            kernel_size=3,
            stride=1,
            padding=1,
            norm_cfg=dict(type='BN'),
            act_cfg=dict(type='ReLU')),
            stages=(True, True, True, True),
            position='after_stage')
    ])
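Since all ResNet31 strides above are 1, its downsampling comes from the Maxpool2d plugins. The sketch below multiplies their strides per axis. Assumptions: only Maxpool2d entries reduce resolution, each pooling stride equals its reduction factor, an int stride applies to both axes, and the stem is ignored; `pooling_factors` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def pooling_factors(plugins: List[Dict], num_stages: int) -> Tuple[int, int]:
    """Total (height, width) reduction contributed by Maxpool2d plugins."""
    fh, fw = 1, 1
    for stage in range(num_stages):
        for p in plugins:
            cfg = p["cfg"]
            if cfg["type"] != "Maxpool2d" or not p["stages"][stage]:
                continue
            s = cfg["stride"]
            sh, sw = (s, s) if isinstance(s, int) else s
            fh, fw = fh * sh, fw * sw
    return fh, fw


resnet31_plugins = [
    dict(cfg=dict(type="Maxpool2d", kernel_size=2, stride=(2, 2)),
         stages=(True, True, False, False), position="before_stage"),
    dict(cfg=dict(type="Maxpool2d", kernel_size=(2, 1), stride=(2, 1)),
         stages=(False, False, True, False), position="before_stage"),
    dict(cfg=dict(type="ConvModule", kernel_size=3, stride=1, padding=1),
         stages=(True, True, True, True), position="after_stage"),
]

factors = pooling_factors(resnet31_plugins, num_stages=4)
```

The (2, 1) pooling before stage 3 halves only the height, so the width keeps more resolution, which suits horizontally laid-out text.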

Migration Guide - Dataset Annotation Loader

The annotation loaders LmdbLoader and HardDiskLoader have been unified into AnnFileLoader for a more consistent design and wider support for different file formats and storage backends. AnnFileLoader can load annotations from the disk (default), HTTP and Petrel backends, and parse annotations in txt or lmdb format. LmdbLoader and HardDiskLoader are deprecated, and users are recommended to modify their configs to use the new AnnFileLoader. Users can migrate configs that use the legacy HardDiskLoader by referring to the following example:

# Legacy config
train = dict(
    type='OCRDataset',
    ...
    loader=dict(
        type='HardDiskLoader',
        ...))

# Suggested config
train = dict(
    type='OCRDataset',
    ...
    loader=dict(
        type='AnnFileLoader',
        file_storage_backend='disk',
        file_format='txt',
        ...))

Similarly, using AnnFileLoader with file_format='lmdb' instead of LmdbLoader is strongly recommended.
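The migration above is mechanical, so it can be sketched as a small config transform. `migrate_loader` is a hypothetical helper: it swaps the loader type, sets file_format ('txt' for HardDiskLoader, 'lmdb' for LmdbLoader), fills in file_storage_backend='disk' when unset, and leaves every other loader field (such as the parser) unchanged, so those still need a manual review.

```python
import copy
from typing import Any, Dict

LEGACY_TO_FORMAT = {"HardDiskLoader": "txt", "LmdbLoader": "lmdb"}


def migrate_loader(dataset_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a dataset config with a legacy loader replaced."""
    cfg = copy.deepcopy(dataset_cfg)  # never mutate the caller's config
    loader = cfg.get("loader", {})
    legacy_type = loader.get("type")
    if legacy_type in LEGACY_TO_FORMAT:
        loader["type"] = "AnnFileLoader"
        loader["file_format"] = LEGACY_TO_FORMAT[legacy_type]
        loader.setdefault("file_storage_backend", "disk")
    return cfg


legacy = dict(type="OCRDataset", loader=dict(type="HardDiskLoader", repeat=1))
migrated = migrate_loader(legacy)
```

Configs that already use AnnFileLoader pass through unchanged, so the transform is safe to run repeatedly.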

New Features & Enhancements

Update mmcv install by @Harold-lkk in #775
Upgrade isort by @gaotongxiao in #771
Automatically infer device for inference if not specified by @gaotongxiao in #781
Add open-mmlab precommit hooks by @gaotongxiao in #787
Add windows CI by @gaotongxiao in #790
Add CurvedSyntext150k Converter by @gaotongxiao in #719
Add FUNSD Converter by @xinke-wang in #808
Support loading annotation file with petrel/http backend by @cuhk-hbsun in #793
Support different seeds on different ranks by @gaotongxiao in #820
Support json in recognition converter by @Mountchicken in #844
Add args and docs for multi-machine training/testing by @gaotongxiao in #849
Add warning info for LineStrParser by @xinke-wang in #850
Deploy openmmlab-bot by @gaotongxiao in #876
Add Tesserocr Inference by @garvan2021 in #814
Add LV Dataset Converter by @xinke-wang in #871
Add SROIE Converter by @xinke-wang in #810
Add NAF Converter by @xinke-wang in #815
Add DeText Converter by @xinke-wang in #818
Add IMGUR Converter by @xinke-wang in #825
Add ILST Converter by @Mountchicken in #833
Add KAIST Converter by @xinke-wang in #835
Add IC11 (Born-digital Images) Data Converter by @xinke-wang in #857
Add IC13 (Focused Scene Text) Data Converter by @xinke-wang in #861
Add BID Converter by @Mountchicken in #862
Add Vintext Converter by @Mountchicken in #864
Add MTWI Data Converter by @xinke-wang in #867
Add COCO Text v2 Data Converter by @xinke-wang in #872
Add ReCTS Data Converter by @xinke-wang in #892
Refactor ResNets by @Mountchicken in #809

Bug Fixes

Bump mmdet version to 2.20.0 in Dockerfile by @GPhilo in #763
Update mmdet version limit by @cuhk-hbsun in #773
Minimum version requirement of albumentations by @gaotongxiao in #769
Disable worker in the dataloader of gpu unit test by @gaotongxiao in #780
Standardize the type of torch.device in ocr.py by @gaotongxiao in #800
Use RECOGNIZER instead of DETECTORS by @cuhk-hbsun in #685
Add num_classes to configs of ABINet by @gaotongxiao in #805
Support loading space character from dict file by @gaotongxiao in #854
Description in tools/data/utils/txt2lmdb.py by @Mountchicken in #870
ignore_index in SARLoss by @Mountchicken in #869
Fix a bug that may cause inplace operation error by @Mountchicken in #884
Use hyphen instead of underscores in script args by @gaotongxiao in #890

Docs

Add deprecation message for deploy tools by @xinke-wang in #801
Reorganizing OpenMMLab projects in readme by @xinke-wang in #806
Add demo/README_zh.md by @EighteenSprings in #802
Add detailed version requirement table by @gaotongxiao in #778
Correct misleading section title in training.md by @gaotongxiao in #819
Update README_zh-CN document URL by @BeyondYourself in #823
translate testing.md. by @yangrisheng in #822
Fix confused description for load-from and resume-from by @xinke-wang in #842
Add documents getting_started in docs/zh by @BeyondYourself in #841
Add the model serving translation document by @BeyondYourself in #845
Update docs about installation on Windows by @Mountchicken in #852
Update tutorial notebook by @gaotongxiao in #853
Update Instructions for New Data Converters by @xinke-wang in #900
Brief installation instruction in README by @Harold-lkk in #897
update doc for ILST, VinText, BID by @Mountchicken in #902
Fix typos in readme by @gaotongxiao in #903
Recog dataset doc by @Harold-lkk in #893
Reorganize the directory structure section in det.md by @gaotongxiao in #894

New Contributors

@GPhilo made their first contribution in #763
@xinke-wang made their first contribution in #801
@EighteenSprings made their first contribution in #802
@BeyondYourself made their first contribution in #823
@yangrisheng made their first contribution in #822
@Mountchicken made their first contribution in #844
@garvan2021 made their first contribution in #814

Full Changelog: https://github.com/open-mmlab/mmocr/compare/v0.4.1...v0.5.0

v0.4.1 (27/01/2022)

Highlights

Visualizing edge weights in OpenSet KIE is now supported! #677
Some configurations have been optimized to significantly speed up the training and testing processes! Don't worry: you can still tune these parameters if these modifications do not work for you. #757
Now you can use CPU to train/debug your model! #752
We have fixed a severe bug that prevented users from calling mmocr.apis.test with our pre-built wheels. #667

New Features & Enhancements

Show edge score for openset kie by @cuhk-hbsun in #677
Download flake8 from github as pre-commit hooks by @gaotongxiao in #695
Deprecate the support for 'python setup.py test' by @Harold-lkk in #722
Disable multi-processing feature of cv2 to speed up data loading by @gaotongxiao in #721
Extend ctw1500 converter to support text fields by @Harold-lkk in #729
Extend totaltext converter to support text fields by @Harold-lkk in #728
Speed up training by @gaotongxiao in #739
Add setup multi-processing both in train and test.py by @Harold-lkk in #757
Support CPU training/testing by @gaotongxiao in #752
Support specify gpu for testing and training with gpu-id instead of gpu-ids and gpus by @Harold-lkk in #756
Remove unnecessary custom_import from test.py by @Harold-lkk in #758

Bug Fixes

Fix satrn onnxruntime test by @AllentDan in #679
Support both ConcatDataset and UniformConcatDataset by @cuhk-hbsun in #675
Fix bugs of show_results in single_gpu_test by @cuhk-hbsun in #667
Fix a bug for sar decoder when bi-rnn is used by @MhLiao in #690
Fix opencv version to avoid some bugs by @gaotongxiao in #694
Fix py39 ci error by @Harold-lkk in #707
Update visualize.py by @TommyZihao in #715
Fix link of config by @cuhk-hbsun in #726
Use yaml.safe_load instead of load by @gaotongxiao in #753
Add necessary keys to test_pipelines to enable test-time visualization by @gaotongxiao in #754

Docs

Fix recog.md by @gaotongxiao in #674
Add config tutorial by @gaotongxiao in #683
Add MMSelfSup/MMRazor/MMDeploy in readme by @cuhk-hbsun in #692
Add recog & det model summary by @gaotongxiao in #693
Update docs link by @gaotongxiao in #710
add pull request template.md by @Harold-lkk in #711
Add website links to readme by @gaotongxiao in #731
update readme according to standard by @Harold-lkk in #742

New Contributors

@MhLiao made their first contribution in #690
@TommyZihao made their first contribution in #715

Full Changelog: https://github.com/open-mmlab/mmocr/compare/v0.4.0...v0.4.1

v0.4.0 (15/12/2021)

Highlights

We release a new text recognition model, ABINet (CVPR 2021, Oral). With its dedicated model design and useful data augmentation transforms, ABINet can achieve the best performance on irregular text recognition tasks. Check it out!
We are also working hard to fulfill requests from our community. OpenSet KIE is one of these achievements; it extends the application of SDMGR from text node classification to node-pair relation extraction. We also provide a demo script to convert WildReceipt to the open set domain, though it cannot take full advantage of the OpenSet format. For more information, please read our tutorial.
Model APIs can now be exposed through TorchServe. (Docs)

Breaking Changes & Migration Guide

Postprocessor

Some refactoring processes are still going on. For all text detection models, we unified their decode implementations into a new module category, POSTPROCESSOR, which is responsible for decoding different raw outputs into boundary instances. In all text detection configs, the text_repr_type argument in bbox_head is deprecated and will be removed in a future release.

Migration guide: find a line like the following in the detection model's config:

text_repr_type=xxx,

And replace it with

postprocessor=dict(type='{MODEL_NAME}Postprocessor', text_repr_type=xxx),

Take a snippet of PANet's config as an example. Before the change, its config for bbox_head looks like:

    bbox_head=dict(
        type='PANHead',
        text_repr_type='poly',
        in_channels=[128, 128, 128, 128],
        out_channels=6,
        loss=dict(type='PANLoss')),

Afterwards:

    bbox_head=dict(
        type='PANHead',
        in_channels=[128, 128, 128, 128],
        out_channels=6,
        loss=dict(type='PANLoss'),
        postprocessor=dict(type='PANPostprocessor', text_repr_type='poly')),

There are other postprocessors and each takes different arguments. Interested users can find their interfaces or implementations in mmocr/models/textdet/postprocess or through our api docs.
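The migration step above can also be expressed as a dict transform. `migrate_bbox_head` is a hypothetical helper; deriving the postprocessor name by replacing a trailing 'Head' with 'Postprocessor' (PANHead to PANPostprocessor) is an assumption that matches the PANet example, so pass the name explicitly whenever your model's postprocessor is named differently. Other postprocessor-specific arguments are not handled.

```python
import copy
from typing import Any, Dict, Optional


def migrate_bbox_head(head: Dict[str, Any],
                      postprocessor_type: Optional[str] = None) -> Dict[str, Any]:
    """Move a deprecated text_repr_type from bbox_head into a postprocessor."""
    new_head = copy.deepcopy(head)
    if "text_repr_type" not in new_head:
        return new_head  # nothing to migrate
    repr_type = new_head.pop("text_repr_type")
    if postprocessor_type is None:
        # Assumed naming rule: '<Model>Head' -> '<Model>Postprocessor'.
        head_type = new_head["type"]
        if head_type.endswith("Head"):
            head_type = head_type[:-len("Head")]
        postprocessor_type = head_type + "Postprocessor"
    new_head["postprocessor"] = dict(type=postprocessor_type,
                                     text_repr_type=repr_type)
    return new_head


old_head = dict(type="PANHead", text_repr_type="poly",
                in_channels=[128, 128, 128, 128], out_channels=6,
                loss=dict(type="PANLoss"))
new_head = migrate_bbox_head(old_head)
```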

New Config Structure

We reorganized the configs/ directory by extracting reusable sections into configs/_base_. Now the directory tree of configs/_base_ is organized as follows:

_base_
├── det_datasets
├── det_models
├── det_pipelines
├── recog_datasets
├── recog_models
├── recog_pipelines
└── schedules

Most model configs now make full use of base configs, which makes the overall structure clearer and facilitates fair comparison across models. Despite the seemingly significant hierarchical differences, these changes do not break backward compatibility, as the names of the model configs remain the same.

New Features

Support openset kie by @cuhk-hbsun in #498
Add converter for the Open Images v5 text annotations by Krylov et al. by @baudm in #497
Support Chinese for kie show result by @cuhk-hbsun in #464
Add TorchServe support for text detection and recognition by @Harold-lkk in #522
Save filename in text detection test results by @cuhk-hbsun in #570
Add codespell pre-commit hook and fix typos by @gaotongxiao in #520
Avoid duplicate placeholder docs in CN by @gaotongxiao in #582
Save results to json file for kie. by @cuhk-hbsun in #589
Add SAR_CN to ocr.py by @gaotongxiao in #579
mim extension for windows by @gaotongxiao in #641
Support multiple pipelines for different datasets by @cuhk-hbsun in #657
ABINet Framework by @gaotongxiao in #651

Refactoring

Refactor textrecog config structure by @cuhk-hbsun in #617
Refactor text detection config by @cuhk-hbsun in #626
refactor transformer modules by @cuhk-hbsun in #618
refactor textdet postprocess by @cuhk-hbsun in #640

Docs

C++ example section by @apiaccess21 in #593
install.md Chinese section by @A465539338 in #364
Add Chinese Translation of deployment.md. by @fatfishZhao in #506
Fix a model link and add the metafile for SATRN by @gaotongxiao in #473
Improve docs style by @gaotongxiao in #474
Enhancement & sync Chinese docs by @gaotongxiao in #492
TorchServe docs by @gaotongxiao in #539
Update docs menu by @gaotongxiao in #564
Docs for KIE CloseSet & OpenSet by @gaotongxiao in #573
Fix broken links by @gaotongxiao in #576
Docstring for text recognition models by @gaotongxiao in #562
Add MMFlow & MIM by @gaotongxiao in #597
Add MMFewShot by @gaotongxiao in #621
Update model readme by @gaotongxiao in #604
Add input size check to model_inference by @mpena-vina in #633
Docstring for textdet models by @gaotongxiao in #561
Add MMHuman3D in readme by @gaotongxiao in #644
Use shared menu from theme instead by @gaotongxiao in #655
Refactor docs structure by @gaotongxiao in #662
Docs fix by @gaotongxiao in #664

Enhancements

Use bounding box around polygon instead of within polygon by @alexander-soare in #469
Add CITATION.cff by @gaotongxiao in #476
Add py3.9 CI by @gaotongxiao in #475
Update model-index.yml by @gaotongxiao in #484
Use container in CI by @gaotongxiao in #502
CircleCI Setup by @gaotongxiao in #611
Remove unnecessary custom_import from train.py by @gaotongxiao in #603
Change the upper bound of the MMCV version to 1.5.0 by @zhouzaida in #628
Update CircleCI by @gaotongxiao in #631
Pass custom_hooks to MMCV by @gaotongxiao in #609
Skip CI when specific files are changed by @gaotongxiao in #642
Add markdown linter in pre-commit hook by @gaotongxiao in #643
Use shape from loaded image by @cuhk-hbsun in #652
Cancel previous runs that are not completed by @Harold-lkk in #666
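#469 switches from a box lying within a text polygon to one drawn around it. Assuming "around" means the usual axis-aligned enclosing box, that box is just the coordinate-wise minimum and maximum of the vertices, as in this sketch. The flat `[x1, y1, x2, y2, ...]` polygon layout and the function name are assumptions for illustration.

```python
from typing import Sequence, Tuple


def polygon_to_enclosing_bbox(polygon: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) of the axis-aligned box around a polygon.

    `polygon` is a flat list [x1, y1, x2, y2, ...] (hypothetical layout).
    """
    if len(polygon) < 6 or len(polygon) % 2:
        raise ValueError('expected at least 3 (x, y) pairs')
    xs, ys = polygon[0::2], polygon[1::2]
    return min(xs), min(ys), max(xs), max(ys)


print(polygon_to_enclosing_bbox([0, 0, 4, 1, 3, 5, -1, 2]))  # -> (-1, 0, 4, 5)
```

Unlike an inscribed box, an enclosing box is guaranteed to contain every vertex, so no part of a slanted word is cut off.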

Bug Fixes

Modify algorithm "sar" weights path in metafile by @ShoupingShan in #581
Fix CUDA CI by @gaotongxiao in #472
Fix image export in test.py for KIE models by @gaotongxiao in #486
Allow invalid polygons in intersection and union by default by @gaotongxiao in #471
Update checkpoints' links for SATRN by @gaotongxiao in #518
Fix ONNX conversion bug caused by renaming the key img_shape to resize_shape by @Harold-lkk in #523
Fix PyTorch 1.6 incompatible checkpoints by @gaotongxiao in #540
Fix paper field in metafiles by @gaotongxiao in #550
Unify recognition task names in metafiles by @gaotongxiao in #548
Fix py3.9 CI by @gaotongxiao in #563
Always map location to cpu when loading checkpoint by @gaotongxiao in #567
Fix wrong model builder in recog_test_imgs by @gaotongxiao in #574
Improve dbnet r50 by fixing img std by @gaotongxiao in #578
Fix resource warning: unclosed file by @cuhk-hbsun in #577
Fix bug where different texts shared the same start_point in draw_texts_by_pil by @cuhk-hbsun in #587
Keep original texts for KIE by @cuhk-hbsun in #588
Fix random seed by @gaotongxiao in #600
Fix DBNet_r50 config by @gaotongxiao in #625
Change SBC case to DBC case by @cuhk-hbsun in #632
Fix kie demo by @innerlee in #610
Fix type check by @cuhk-hbsun in #650
Remove deprecated image validator in totaltext converter by @gaotongxiao in #661
Fix modification of the locals() dict by @Fei-Wang in #663
Fix #614: TextSnake targets by @HolyCrap96 in #660
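In common Chinese terminology "SBC case" denotes full-width characters and "DBC case" half-width ones, so #632 presumably normalizes full-width characters to their half-width forms. A widely used mapping shifts U+FF01 to U+FF5E onto ASCII by subtracting 0xFEE0 and turns the ideographic space U+3000 into a normal space; the sketch below shows that mapping. Whether MMOCR converts exactly this character range is an assumption, and `to_halfwidth` is a hypothetical name.

```python
def to_halfwidth(text: str) -> str:
    """Convert full-width ("SBC case") characters to half-width ("DBC case").

    Assumes "SBC case" means full-width, following common Chinese usage.
    """
    chars = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:  # ideographic space -> ASCII space
            code = 0x20
        elif 0xFF01 <= code <= 0xFF5E:  # full-width forms of '!' .. '~'
            code -= 0xFEE0
        chars.append(chr(code))
    return ''.join(chars)


print(to_halfwidth('ＫＩＥ：１２３　中文，ｏｋ'))  # -> KIE:123 中文,ok
```

CJK ideographs fall outside both ranges and are left untouched.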

New Contributors

@alexander-soare made their first contribution in #469
@A465539338 made their first contribution in #364
@fatfishZhao made their first contribution in #506
@baudm made their first contribution in #497
@ShoupingShan made their first contribution in #581
@apiaccess21 made their first contribution in #593
@zhouzaida made their first contribution in #628
@mpena-vina made their first contribution in #633
@Fei-Wang made their first contribution in #663

Full Changelog: https://github.com/open-mmlab/mmocr/compare/v0.3.0...0.4.0

v0.3.0 (25/8/2021)

Highlights

We add a new text recognition model, SATRN! Its pretrained checkpoint achieves the best performance among the text recognition models we provide. A lighter version of SATRN is also released, which obtains ~98% of the performance of the original model at only 45 MB in size. (@2793145003) #405
Improve the demo script ocr.py, which supports applying text detection, text recognition and key information extraction models end to end on images with easy-to-use commands. Users can find its full documentation in the demo section. (@samayala22, @manjrekarom) #371, #386, #400, #374, #428
Our documentation is reorganized into a clearer structure. More useful content is on the way! #409, #454
The Polygon3 requirement is removed since that project is no longer maintained or distributed. All references to it have been replaced with equivalent functions from shapely. #448

Breaking Changes & Migration Guide

Upgrade version requirement of MMDetection to 2.14.0 to avoid bugs #382
MMOCR now has its own model and layer registries inherited from MMDetection's or MMCV's counterparts (#436). The model registries are now organized in the following hierarchy:

mmcv.MODELS -> mmdet.BACKBONES -> BACKBONES
mmcv.MODELS -> mmdet.NECKS -> NECKS
mmcv.MODELS -> mmdet.ROI_EXTRACTORS -> ROI_EXTRACTORS
mmcv.MODELS -> mmdet.HEADS -> HEADS
mmcv.MODELS -> mmdet.LOSSES -> LOSSES
mmcv.MODELS -> mmdet.DETECTORS -> DETECTORS
mmcv.ACTIVATION_LAYERS -> ACTIVATION_LAYERS
mmcv.UPSAMPLE_LAYERS -> UPSAMPLE_LAYERS

To migrate your old implementation to the new backend, change the import path of any registries and their corresponding builder functions (including build_detectors) from mmdet.models.builder to mmocr.models.builder. If your model config refers to any model or layer of MMDetection or MMCV, add the mmdet. or mmcv. prefix to its name so that the model builder looks it up in the right namespace.
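As a hedged illustration of both migration steps, the fragment below shows where the prefix goes in a model config. The component choices (DBNet with an MMDetection ResNet backbone and MMOCR's FPNC neck) and their arguments are illustrative only, and the import change is written as comments because running it requires MMOCR to be installed.

```python
# Registry imports move from MMDetection to MMOCR, e.g.
#   before: from mmdet.models.builder import DETECTORS
#   after:  from mmocr.models.builder import DETECTORS

model = dict(
    type='DBNet',  # registered in MMOCR's own registry: no prefix
    backbone=dict(
        type='mmdet.ResNet',  # defined in MMDetection: needs the "mmdet." prefix
        depth=18),
    neck=dict(
        type='FPNC',  # MMOCR neck: no prefix (arguments illustrative)
        in_channels=[64, 128, 256, 512],
        lateral_channels=256),
)
```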

Interested users may check out MMCV's tutorial on Registry for in-depth explanations on its mechanism.
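The idea can be seen in miniature in the hedged, stdlib-only toy below: each registry has a scope and a parent, unprefixed names resolve only locally, and a scoped name such as `mmdet.ResNet` walks up the parent chain to the registry with that scope. It deliberately simplifies MMCV's real `Registry`, which handles more cases; names such as `ResNet31OCR` serve only as labels.

```python
from typing import Dict, Optional


class Registry:
    """Toy scoped registry with a parent chain (illustration, not MMCV's class)."""

    def __init__(self, scope: str, parent: Optional['Registry'] = None):
        self.scope = scope
        self.parent = parent
        self._modules: Dict[str, type] = {}

    def register(self, cls: type) -> type:
        self._modules[cls.__name__] = cls
        return cls

    def get(self, key: str) -> Optional[type]:
        # 'mmdet.ResNet' -> scope 'mmdet', name 'ResNet'; 'FPNC' -> no scope
        scope, _, name = key.rpartition('.')
        if not scope or scope == self.scope:
            # Unprefixed names are only looked up in this registry
            return self._modules.get(name)
        # Prefixed names are resolved by walking up to the registry with that scope
        node = self.parent
        while node is not None:
            if node.scope == scope:
                return node._modules.get(name)
            node = node.parent
        return None


# Mirror the chain mmcv.MODELS -> mmdet.BACKBONES -> BACKBONES
mmcv_models = Registry('mmcv')
mmdet_backbones = Registry('mmdet', parent=mmcv_models)
mmocr_backbones = Registry('mmocr', parent=mmdet_backbones)


@mmdet_backbones.register
class ResNet:
    pass


@mmocr_backbones.register
class ResNet31OCR:
    pass
```

With this setup, `mmocr_backbones.get('ResNet')` finds nothing, while `mmocr_backbones.get('mmdet.ResNet')` returns the MMDetection class, which is why the prefix is needed in configs.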

New Features

Automatically replace SyncBN with BN for inference #420, #453
Support batch inference for CRNN and SegOCR #407
Support exporting documentation in pdf or epub format #406
Support persistent_workers option in data loader #459
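SyncBN is designed for multi-GPU distributed training and can fail in single-process or CPU inference, which is presumably why #420 and #453 swap it for plain BN. One way to do this at the config level is a recursive rewrite of every `type='SyncBN'` entry, sketched below. The helper name is hypothetical, and MMOCR may instead convert the built modules.

```python
from typing import Any


def replace_syncbn_with_bn(cfg: Any) -> Any:
    """Return a copy of a config tree with every type='SyncBN' entry turned into 'BN'.

    Hypothetical helper for illustration; it rewrites the config, not built modules.
    """
    if isinstance(cfg, dict):
        new_cfg = {key: replace_syncbn_with_bn(value) for key, value in cfg.items()}
        if new_cfg.get('type') == 'SyncBN':
            new_cfg['type'] = 'BN'
        return new_cfg
    if isinstance(cfg, (list, tuple)):
        return type(cfg)(replace_syncbn_with_bn(item) for item in cfg)
    return cfg  # leaf values are returned as-is
```

Because dicts and lists are rebuilt rather than edited in place, the original training config stays unchanged.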

Bug Fixes

Remove deprecated key in kie_test_imgs.py #381
Fix dimension mismatch in batch testing/inference of DBNet #383
Fix dice loss staying at 1 when given an empty target #408
Fix a wrong link in ocr.py (@naarkhoo) #417
Fix undesired assignment to "pretrained" in test.py #418
Fix a problem in polygon generation of DBNet #421, #443
Skip invalid annotations in totaltext_converter #438
Add zero division handler in poly utils, remove Polygon3 #448
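With the common soft Dice formulation loss = 1 - 2 * sum(p * t) / (sum(p) + sum(t) + eps), an all-empty target makes the intersection term zero, so the loss is stuck at 1 whatever the prediction is and yields no useful gradient. Adding the same smoothing constant to numerator and denominator is one widely used remedy; the sketch below contrasts the two variants. We have not checked that #408 uses exactly this change, and the constants are arbitrary.

```python
from typing import Sequence


def dice_loss_eps_only(pred: Sequence[float], target: Sequence[float], eps: float = 1e-6) -> float:
    """Dice loss with a tiny epsilon only in the denominator (shows the reported symptom)."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - 2.0 * inter / (sum(pred) + sum(target) + eps)


def dice_loss_smoothed(pred: Sequence[float], target: Sequence[float], smooth: float = 1e-3) -> float:
    """Dice loss with the same smoothing term in numerator and denominator."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + smooth) / (sum(pred) + sum(target) + smooth)


empty_target = [0.0, 0.0, 0.0]
print(dice_loss_eps_only([0.0, 0.0, 0.0], empty_target))  # 1.0 even for a perfect prediction
print(dice_loss_smoothed([0.0, 0.0, 0.0], empty_target))  # 0.0
```

With smoothing, the loss on an empty target shrinks as the predicted probabilities shrink, so training can still push false positives down.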

Improvements

Replace lanms-proper with lanms-neo to support installation on Windows (with special thanks to @gen-ko who has re-distributed this package!)
Support MIM #394
Add tests for PyTorch 1.9 in CI #401
Enable fullscreen layout in readthedocs #413
General documentation enhancement #395
Update version checker #427
Add copyright info #439
Update citation information #440

Contributors

We thank @2793145003, @samayala22, @manjrekarom, @naarkhoo, @gen-ko, @duanjiaqi, @gaotongxiao, @cuhk-hbsun, @innerlee, @wdsd641417025 for their contribution to this release!

v0.2.1 (20/7/2021)

Highlights

Upgrade to use MMCV-full >= 1.3.8 and MMDetection >= 2.13.0 for latest features
Add ONNX and TensorRT export tool, supporting the deployment of DBNet, PSENet, PANet and CRNN (experimental) #278, #291, #300, #328
Unify parameter initialization through init_cfg in config files #365
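Under MMCV's init_cfg convention, a module's initialization is declared in its config rather than hard-coded. The fragments below are hedged examples: the `Pretrained`, `Kaiming` and `Constant` initializer types and the `torchvision://` checkpoint scheme are MMCV conventions, while the surrounding backbone and head settings are illustrative.

```python
# Initialize a backbone from torchvision's ImageNet weights
backbone = dict(
    type='ResNet',  # illustrative; from v0.3.0 on, MMDetection models take the "mmdet." prefix
    depth=18,
    init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18'))

# Several initializers can be combined, each targeting a layer type
head_init_cfg = [
    dict(type='Kaiming', layer='Conv2d'),
    dict(type='Constant', layer='BatchNorm2d', val=1),
]
```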

New Features

Support TextOCR dataset #293
Support Total-Text dataset #266, #273, #357
Support grouping text detection box into lines #290, #304
Add benchmark_processing script that benchmarks data loading process #261
Add SynthText preprocessor for text recognition models #351, #361
Support batch inference during testing #310
Add user-friendly OCR inference script #366
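#290 and #304 add grouping of detected boxes into text lines. The algorithm is not described here, so the sketch below uses a simple, hypothetical heuristic instead: boxes are visited top to bottom, a box joins an existing line when its vertical overlap with that line's last box covers at least `min_overlap` of the smaller height, and each line is finally sorted left to right.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def group_boxes_into_lines(boxes: Sequence[Box], min_overlap: float = 0.5) -> List[List[Box]]:
    """Group boxes into text lines by vertical overlap (hypothetical heuristic)."""
    lines: List[List[Box]] = []
    # Visit boxes from top to bottom by vertical center
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        for line in lines:
            ref = line[-1]
            overlap = min(box[3], ref[3]) - max(box[1], ref[1])
            min_height = min(box[3] - box[1], ref[3] - ref[1])
            if min_height > 0 and overlap / min_height >= min_overlap:
                line.append(box)
                break
        else:
            lines.append([box])  # no line matched: start a new one
    # Read each line left to right
    return [sorted(line, key=lambda b: b[0]) for line in lines]
```

A relative overlap threshold, rather than a fixed pixel gap, keeps the heuristic independent of font size.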

Bug Fixes

Fix improper handling of ignored classes in SDMGR loss #221
Fix potential numerical zero division error in DRRG #224
Fix installing requirements with pip and mim #242
Fix dynamic input error of DBNet #269
Fix space parsing error in LineStrParser #285
Fix TextSnake decode error #264
Correct isort setup #288
Fix a bug in SDMGR config #316
Fix kie_test_img for KIE nonvisual #319
Fix metafiles #342
Fix different device problem in FCENet #334
Ignore improper trailing empty characters in annotation files #358
Docs fixes #247, #255, #265, #267, #268, #270, #276, #287, #330, #355, #367
Fix NRTR config #356, #370
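#285 and #358 both concern how annotation lines are split. As a hedged illustration with a hypothetical `parse_recog_line` helper (not MMOCR's LineStrParser API): splitting only on the first separator keeps spaces inside the text label, and stripping trailing whitespace first stops stray characters at the end of a line from leaking into the label.

```python
def parse_recog_line(line: str, separator: str = ' ') -> dict:
    """Parse 'filename<separator>text' where the text itself may contain the separator."""
    line = line.rstrip()  # drop trailing whitespace, including the newline
    filename, sep, text = line.partition(separator)  # split on the first separator only
    if not sep:
        raise ValueError(f'no separator {separator!r} in line: {line!r}')
    return {'filename': filename, 'text': text}


print(parse_recog_line('imgs/1.jpg Hello World  \n'))
# -> {'filename': 'imgs/1.jpg', 'text': 'Hello World'}
```

A naive `line.split(' ')` would instead produce four fields for this line and silently lose part of the label.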

Improvements

Add backend for resizeocr #244
Skip image processing pipelines in SDMGR novisual #260
Speedup DBNet #263
Update mmcv installation method in workflow #323
Add part of Chinese documentations #353, #362
Add support for ConcatDataset with two workflows #348
Add list_from_file and list_to_file utils #226
Speed up sort_vertex #239
Support distributed evaluation of KIE #234
Add pretrained FCENet on IC15 #258
Support CPU for OCR demo #227
Avoid extra image pre-processing steps #375
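#239 speeds up sort_vertex, which puts polygon vertices into a consistent order. A standard approach, sketched here under a hypothetical name and signature (MMOCR's own function may differ), sorts the vertices by their angle around the centroid with a single `sorted` call.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sort_vertices_by_angle(points: Sequence[Point]) -> List[Point]:
    """Order vertices by their angle around the centroid.

    In image coordinates (y pointing down), ascending atan2 angles run clockwise
    on screen, so an axis-aligned box comes out as top-left, top-right,
    bottom-right, bottom-left. Assumes a convex (or star-shaped) polygon.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))


print(sort_vertices_by_angle([(2, 2), (0, 0), (0, 2), (2, 0)]))
# -> [(0, 0), (2, 0), (2, 2), (0, 2)]
```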

v0.2.0 (18/5/2021)

Highlights

Add the NER approach Bert-softmax (NAACL'2019)
Add the text detection method DRRG (CVPR'2020)
Add the text detection method FCENet (CVPR'2021)
Improve ease of use by adding an end-to-end text detection and recognition demo and a Colab online demo.
Simplify the installation.

New Features

Add Bert-softmax for NER task #148
Add DRRG #189
Add FCENet #133
Add end-to-end demo #105
Support batch inference #86 #87 #178
Add TPS preprocessor for text recognition #117 #135
Add demo documentation #151 #166 #168 #170 #171
Add checkpoint for Chinese recognition #156
Add metafile #175 #176 #177 #182 #183
Add support for numpy array inference #74

Bug Fixes

Fix the duplicated-point bug caused by transforms in TextSnake #130
Fix CTC loss NaN #159
Fix error raised in demo when the result is empty #144
Fix missing results when one image has a large number of boxes #98
Fix missing package in Dockerfile #109
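#130 fixes duplicated points that transforms introduced into TextSnake polygons. Rounding or resizing can collapse neighbouring vertices onto the same coordinate; a generic cleanup, sketched below as a hypothetical helper rather than the actual fix, drops consecutive duplicates and a closing point that repeats the first one.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def remove_duplicate_points(points: Sequence[Point], tol: float = 1e-6) -> List[Point]:
    """Drop consecutive (and wrap-around) duplicate vertices from a closed polygon."""
    cleaned: List[Point] = []
    for p in points:
        if not cleaned or math.dist(p, cleaned[-1]) > tol:
            cleaned.append(p)
    # The polygon is closed, so the last vertex must not repeat the first one
    if len(cleaned) > 1 and math.dist(cleaned[0], cleaned[-1]) <= tol:
        cleaned.pop()
    return cleaned


print(remove_duplicate_points([(0, 0), (0, 0), (4, 0), (4, 3), (4, 3), (0, 3), (0, 0)]))
# -> [(0, 0), (4, 0), (4, 3), (0, 3)]
```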

Improvements

Simplify the installation procedure by removing the compilation step #188
Speed up PANet post-processing so that it can handle dense text #188
Add zh-CN README #70 #95
Support Windows #89
Add Colab #147 #199
Add 1-step installation using conda environment #193 #194 #195

v0.1.0 (7/4/2021)

Highlights

MMOCR is released.

Main Features

Support text detection, text recognition and the corresponding downstream tasks such as key information extraction.
For text detection, support both single-step (PSENet, PANet, DBNet, TextSnake) and two-step (MaskRCNN) methods.
For text recognition, support the CTC-loss-based method CRNN; the encoder-decoder (with attention) methods SAR and RobustScanner; the segmentation-based method SegOCR; and the Transformer-based method NRTR.
For key information extraction, support the GCN-based method SDMG-R.
Provide checkpoints and log files for all of the methods above.
