ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

This is the PyTorch implementation of the paper "ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer" (ICCV 2023). The paper is available at this link.

News

2024.04.09 We released a new text spotting pipeline, Bridge Text Spotting, which combines the advantages of end-to-end and two-step text spotting. Code

2023.07.21 Code is available.

Getting Started

  • Installation

Python 3.8 + PyTorch 1.10.0 + CUDA 11.3 + torchvision 0.11.0 + Detectron2 (v0.2.1) + OpenCV (for visualization)

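conda create -n ESTS python=3.8 -y
conda activate ESTS
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
# Clone first: requirements.txt is expected inside the repository
git clone https://github.com/mxin262/ESTextSpotter.git
cd ESTextSpotter
pip install -r requirements.txt
# Build the bundled Detectron2 (v0.2.1)
cd detectron2-0.2.1
python setup.py build develop
pip install opencv-python
# Build the custom ops (path assumed to be relative to the repository root)
cd ../models/ests/ops
sh make.sh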
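
After installation, it can help to confirm that the environment matches the pinned versions. The following is a minimal sketch, not part of the repository: the helper names `find_mismatches` and `installed_versions` are made up for illustration, and it assumes the packages are visible to `importlib.metadata` under the names `torch` and `torchvision`.

```python
from importlib import metadata

# Versions pinned in the installation notes above.
EXPECTED = {"torch": "1.10.0", "torchvision": "0.11.0"}


def find_mismatches(installed, expected=EXPECTED):
    """Return {package: (expected, found)} for every package whose version differs."""
    problems = {}
    for name, want in expected.items():
        found = installed.get(name)
        # Ignore local build suffixes such as "+cu113".
        base = found.split("+")[0] if found else None
        if base != want:
            problems[name] = (want, found)
    return problems


def installed_versions(names):
    """Look up installed distribution versions; missing packages map to None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    issues = find_mismatches(installed_versions(EXPECTED))
    for name, (want, found) in issues.items():
        print(f"{name}: expected {want}, found {found}")
    if not issues:
        print("torch and torchvision match the pinned versions")
```

Run it inside the activated `ESTS` environment. Any line it prints in the form `name: expected X, found Y` points to a package worth reinstalling before building Detectron2 and the ops.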

Data Preparation

Please download TotalText, CTW1500, MLT, ICDAR2013, ICDAR2015, and CurvedSynText150k according to the guide provided by SPTS v2: README.md.

Please download MLT 2019 from Images / Annotations.

Extract all the datasets and organize them as follows:

- datasets
  | - CTW1500
  |   | - annotations
  |   | - ctwtest_text_image
  |   | - ctwtrain_text_image
  | - totaltext (or icdar2015)
  |   | - test_images
  |   | - train_images
  |   | - test.json
  |   | - train.json
  | - mlt2017 (or syntext1, syntext2)
      | - annotations
      | - images
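
A quick way to catch a misplaced folder before training is to check the tree programmatically. The sketch below is an illustration only: `missing_entries` and `LAYOUT` are hypothetical names, and the layout covers just the three folders spelled out above (add `icdar2015`, `syntext1`, or `syntext2` in the same way if you use them).

```python
from pathlib import Path

# Expected entries per dataset folder, taken from the tree above.
LAYOUT = {
    "CTW1500": ["annotations", "ctwtest_text_image", "ctwtrain_text_image"],
    "totaltext": ["test_images", "train_images", "test.json", "train.json"],
    "mlt2017": ["annotations", "images"],
}


def missing_entries(root, layout=LAYOUT):
    """Return the relative paths from `layout` that do not exist under `root`."""
    root = Path(root)
    missing = []
    for dataset, entries in layout.items():
        for entry in entries:
            if not (root / dataset / entry).exists():
                missing.append(f"{dataset}/{entry}")
    return missing
```

For example, `missing_entries("datasets")` returns an empty list when every listed folder and file is in place, and otherwise lists what is absent.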

Model Zoo

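| Dataset | Det-P | Det-R | Det-F1 | E2E-None | E2E-Full | Weights |
| --- | --- | --- | --- | --- | --- | --- |
| Pretrain | 90.7 | 85.3 | 87.9 | 73.8 | 85.5 | OneDrive |
| Total-Text | 91.8 | 88.2 | 90.0 | 80.9 | 87.1 | OneDrive |
| CTW1500 | 91.3 | 88.6 | 89.9 | 65.0 | 83.9 | OneDrive |

| Dataset | Det-P | Det-R | Det-F1 | E2E-S | E2E-W | E2E-G | Weights |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ICDAR2015 | 95.1 | 88 | 91.4 | 88.5 | 83.1 | 78.1 | OneDrive |

| Dataset | H-mean | Weights |
| --- | --- | --- |
| VinText | 73.6 | OneDrive |

| Dataset | Det-P | Det-R | Det-H | 1-NED | Weights |
| --- | --- | --- | --- | --- | --- |
| ICDAR 2019 ReCTS | 94.1 | 91.3 | 92.7 | 78.1 | OneDrive |

| Dataset | R | P | H | AP | Arabic | Latin | Chinese | Japanese | Korean | Bangla | Hindi | Weights |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MLT | 75.5 | 83.37 | 79.24 | 72.52 | 52.00 | 77.34 | 48.20 | 48.42 | 63.56 | 38.26 | 50.83 | OneDrive |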
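
For readers comparing their own runs against these numbers, the detection F1 (also called H-mean) columns are consistent with the usual harmonic mean of precision and recall. The sketch below assumes that standard definition; it is not taken from the repository's evaluation code, and `f_measure` is an illustrative name.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall, both given in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Det-P, Det-R) pairs copied from the tables above.
rows = {
    "Pretrain": (90.7, 85.3),
    "Total-Text": (91.8, 88.2),
    "CTW1500": (91.3, 88.6),
    "ICDAR2015": (95.1, 88.0),
}

for name, (p, r) in rows.items():
    print(f"{name}: {f_measure(p, r):.1f}")
```

Rounded to one decimal, these four values match the Det-F1 column (87.9, 90.0, 89.9, 91.4).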

Training

By default, we train on 8 GPUs with 2 images per GPU.
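
The default setup implies a global batch of 16 images per step. If you train on a different number of GPUs, the arithmetic for keeping that global batch can be sketched as below. This is only an illustration with hypothetical helper names; how (or whether) the training scripts expose the per-GPU batch size is not stated here, and other hyperparameters may also need adjusting.

```python
def global_batch_size(num_gpus, images_per_gpu):
    """Total number of images processed per optimization step."""
    return num_gpus * images_per_gpu


def images_per_gpu_for(target_global_batch, num_gpus):
    """Per-GPU batch that preserves the global batch on `num_gpus` devices."""
    if num_gpus <= 0 or target_global_batch % num_gpus != 0:
        raise ValueError("global batch must divide evenly across the GPUs")
    return target_global_batch // num_gpus


default_batch = global_batch_size(8, 2)       # default setup: 16 images per step
print(images_per_gpu_for(default_batch, 4))   # 4 GPUs need 4 images each
```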

  1. Pretrain
bash scripts/Pretrain.sh /path/to/your/dataset
  2. Fine-tune model on the mixed real dataset
bash scripts/Joint_train.sh /path/to/your/dataset
  3. Fine-tune model
bash scripts/TT_finetune.sh /path/to/your/dataset

Evaluation

The third argument selects the task: 0 for text detection, 1 for text spotting.

bash scripts/test.sh config/ESTS/ESTS_5scale_tt_finetune.py /path/to/your/dataset 1 /path/to/your/checkpoint /path/to/your/test_dataset

e.g.:

bash scripts/test.sh config/ESTS/ESTS_5scale_tt_finetune.py ../datasets 1 totaltext_checkpoint.pth totaltext_val
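
When evaluating several checkpoints, it can be convenient to assemble this command from named parts so the positional order (config, dataset root, task flag, checkpoint, test set) is not mixed up. The helper below is a hypothetical wrapper, not part of the repository; it only builds the argument list shown in the example above.

```python
import shlex

# Task flag expected by scripts/test.sh, as described above.
TASKS = {"detection": 0, "spotting": 1}


def build_eval_command(config, data_root, task, checkpoint, test_set):
    """Return the argument list for scripts/test.sh in its positional order."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {sorted(TASKS)}")
    return [
        "bash", "scripts/test.sh",
        config, data_root, str(TASKS[task]), checkpoint, test_set,
    ]


cmd = build_eval_command(
    "config/ESTS/ESTS_5scale_tt_finetune.py",
    "../datasets",
    "spotting",
    "totaltext_checkpoint.pth",
    "totaltext_val",
)
print(shlex.join(cmd))
```

The printed line reproduces the example command; the list can also be passed to `subprocess.run(cmd, check=True)` from the repository root.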

Visualization

Visualize the detection and recognition results

python vis.py

Example Results:

Copyright

This repository can only be used for non-commercial research purposes.

For commercial use, please contact Prof. Lianwen Jin (eelwjin@scut.edu.cn).

Copyright 2023, Deep Learning and Vision Computing Lab, South China University of Technology.

Acknowledgement

AdelaiDet, DINO, Detectron2, TESTR

Citation

If our paper helps your research, please cite it in your publications:

@InProceedings{Huang_2023_ICCV,
    author    = {Huang, Mingxin and Zhang, Jiaxin and Peng, Dezhi and Lu, Hao and Huang, Can and Liu, Yuliang and Bai, Xiang and Jin, Lianwen},
    title     = {ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {19495-19505}
}
