Focused on Japanese Text
- Reproduce the weak-supervision training described in the paper https://arxiv.org/pdf/1904.01941.pdf
- Generate character-level bounding boxes on all of Datapile's datasets.
```bash
git clone https://github.com/autonise/CRAFT-Remade.git
cd CRAFT-Remade
conda env create -f environment.yml
conda activate craft
pip3 install -r requirements.txt
```
- Put the images inside a folder.
- Get a pre-trained model from the pre-trained model list (currently only strong supervision using SynthText is available).
- Run the command:
```bash
python3 main.py synthesize --model=./model/final_model.pkl --folder=./input
```
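The `--folder` argument above points at a flat directory of images. As a rough sketch of what the synthesize step expects to find there, the helper below collects the image files from such a folder (the helper name and the extension list are assumptions, not part of the repository's API):

```python
from pathlib import Path

# Extensions accepted here are an assumption; adjust to match your data.
IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp'}

def collect_images(folder):
    """Return a sorted list of image paths found directly inside `folder`."""
    folder = Path(folder)
    return sorted(p for p in folder.iterdir()
                  if p.suffix.lower() in IMAGE_EXTENSIONS)
```

Non-image files in the folder are simply skipped, so stray metadata files will not break the run.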
- SynthText (CRAFT model) - download here
- SynthText (ResNet-UNet model) - coming soon
- Original model by the authors - download here
- Datapile - in progress
- Download the pre-trained model on the synthetic dataset here.
- Otherwise, if you want to train from scratch:
- Download my generated Japanese SynthText dataset here.
- Run the command:
```bash
python3 main.py train_synth
```
- To test your model on SynthText, run the command:
```bash
python3 main.py test_synth --model /path/to/model
```
- The assumed structure of the dataset is:
```
.
├── generated (this folder will contain the weak-supervision intermediate targets)
├── train
│   ├── img_1.jpg
│   ├── img_2.jpg
│   ├── img_3.jpg
│   ├── img_4.jpg
│   ├── img_5.jpg
│   ├── ...
│   └── train_gt.json (this can be generated using the pre_process function described below)
└── test
    ├── img_1.jpg
    ├── img_2.jpg
    ├── img_3.jpg
    ├── img_4.jpg
    ├── img_5.jpg
    ├── ...
    └── test_gt.json (this can be generated using the pre_process function described below)
```
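Before training, it can be worth checking that a dataset folder actually matches the layout above. The sketch below verifies the required subdirectories and annotation files; the directory and file names come from the structure shown, while the function itself is only an illustrative helper, not part of the repository:

```python
import json
from pathlib import Path

def validate_dataset(root):
    """Return a list of problems found in a dataset folder (empty = OK)."""
    root = Path(root)
    problems = []
    # 'generated', 'train', and 'test' come from the assumed structure above.
    for sub in ('generated', 'train', 'test'):
        if not (root / sub).is_dir():
            problems.append(f'missing directory: {sub}')
    for split in ('train', 'test'):
        gt = root / split / f'{split}_gt.json'
        if not gt.is_file():
            problems.append(f'missing annotation file: {split}/{gt.name}')
        else:
            json.loads(gt.read_text())  # fail early on malformed JSON
    return problems
```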
- First convert the Datapile dataset to an OCR-only format using the datapile_to_onmt.py script.
- To generate the JSON files for Datapile, change the corresponding values in config.py:
```python
'datapile': {
    'train': {
        'target_json_name': 'train_gt.json',
        'base_path': './input/datapile/train/',
    },
    'test': {
        'target_json_name': 'test_gt.json',
        'base_path': './input/datapile/test/',
    }
}
```
- Run the command:
```bash
python3 main.py pre_process --dataset datapile
```
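To make the role of the two config values concrete, here is a toy sketch of how they might drive the pre-processing: images are read from `base_path` and an annotation file named `target_json_name` is written next to them. The JSON schema used here (filename mapped to an empty annotation list) is invented for the sketch; the real schema is produced by `main.py pre_process`:

```python
import json
from pathlib import Path

def write_gt(split_cfg):
    """Write a (toy) ground-truth JSON for one split, driven by the config."""
    base = Path(split_cfg['base_path'])
    # Invented schema: one entry per .jpg, with an empty annotation list.
    gt = {img.name: [] for img in sorted(base.glob('*.jpg'))}
    out = base / split_cfg['target_json_name']
    out.write_text(json.dumps(gt, indent=2))
    return out
```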
- Run the command:
```bash
python3 main.py weak_supervision --model /path/to/strong/supervision/model --iterations <num_of_iterations(20)>
```
- This will train the weak-supervision model for the number of iterations you specified.
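Conceptually, each iteration follows the cycle from the CRAFT paper: the current model predicts character regions on real word-level images (pseudo-targets), the model is retrained on them, and the cycle repeats. The skeleton below illustrates only this control flow; both helpers are stubs, and none of these names belong to the repository's actual API:

```python
def generate_pseudo_targets(model, images):
    # Stub: the real step runs the model on real images and splits word-level
    # boxes into character-level pseudo-targets, weighted by confidence.
    return [None for _ in images]

def train_one_round(model, images, targets):
    # Stub: the real step fine-tunes the model on (image, pseudo-target) pairs.
    return model

def weak_supervision(model, images, iterations=20):
    """Alternate pseudo-target generation and training for N iterations."""
    for _ in range(iterations):
        targets = generate_pseudo_targets(model, images)
        model = train_one_round(model, images, targets)
    return model
```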