Focused on Japanese Text
- Reproduce the weak-supervision training described in the paper https://arxiv.org/pdf/1904.01941.pdf
- Generate character-level bounding boxes on all of Datapile's datasets.
```bash
git clone https://github.com/autonise/CRAFT-Remade.git
cd CRAFT-Remade
conda env create -f environment.yml
conda activate craft
pip3 install -r requirements.txt
```
- Put the images inside a folder.
- Get a pre-trained model from the pre-trained model list (currently only strong supervision using SynthText is available).
- Run the command:
```bash
python3 main.py synthesize --model=./model/final_model.pkl --folder=./input
```
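The `--folder` argument above points at a flat directory of images. As a rough sketch of what the synthesize step expects to find there, the helper below collects the image files from such a folder (the helper name and the extension list are assumptions, not part of the repository's API):

```python
from pathlib import Path

# Extensions accepted here are an assumption; adjust to match your data.
IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp'}

def collect_images(folder):
    """Return a sorted list of image paths found directly inside `folder`."""
    folder = Path(folder)
    return sorted(p for p in folder.iterdir()
                  if p.suffix.lower() in IMAGE_EXTENSIONS)
```

Non-image files in the folder are simply skipped, so stray metadata files will not break the run.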
- SynthText (CRAFT model) - download here
- SynthText (ResNet-UNet model) - coming soon
- Original model by the authors - download here
- Datapile - in progress
- Download the pre-trained model on the synthetic dataset here.
- Otherwise, if you want to train from scratch:
- Download my generated Japanese SynthText dataset here.
- Run the command:
```bash
python3 main.py train_synth
```
- To test your model on SynthText, run the command:
```bash
python3 main.py test_synth --model /path/to/model
```
- The assumed structure of the dataset is:
```
.
├── generated (this folder will contain the weak-supervision intermediate targets)
├── train
│   ├── img_1.jpg
│   ├── img_2.jpg
│   ├── img_3.jpg
│   ├── img_4.jpg
│   ├── img_5.jpg
│   ├── ...
│   └── train_gt.json (this can be generated using the pre_process function described below)
└── test
    ├── img_1.jpg
    ├── img_2.jpg
    ├── img_3.jpg
    ├── img_4.jpg
    ├── img_5.jpg
    ├── ...
    └── test_gt.json (this can be generated using the pre_process function described below)
```
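Before training, it can be worth checking that a dataset folder actually matches the layout above. The sketch below verifies the required subdirectories and annotation files; the directory and file names come from the structure shown, while the function itself is only an illustrative helper, not part of the repository:

```python
import json
from pathlib import Path

def validate_dataset(root):
    """Return a list of problems found in a dataset folder (empty = OK)."""
    root = Path(root)
    problems = []
    # 'generated', 'train', and 'test' come from the assumed structure above.
    for sub in ('generated', 'train', 'test'):
        if not (root / sub).is_dir():
            problems.append(f'missing directory: {sub}')
    for split in ('train', 'test'):
        gt = root / split / f'{split}_gt.json'
        if not gt.is_file():
            problems.append(f'missing annotation file: {split}/{gt.name}')
        else:
            json.loads(gt.read_text())  # fail early on malformed JSON
    return problems
```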
- First convert the Datapile dataset to an OCR-only format using the datapile_to_onmt.py script.
- To generate the JSON files for Datapile, change the corresponding values in config.py:
```python
'datapile': {
    'train': {
        'target_json_name': 'train_gt.json',
        'base_path': './input/datapile/train/',
    },
    'test': {
        'target_json_name': 'test_gt.json',
        'base_path': './input/datapile/test/',
    }
}
```
- Run the command:
```bash
python3 main.py pre_process --dataset datapile
```
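To make the role of the two config values concrete, here is a toy sketch of how they might drive the pre-processing: images are read from `base_path` and an annotation file named `target_json_name` is written next to them. The JSON schema used here (filename mapped to an empty annotation list) is invented for the sketch; the real schema is produced by `main.py pre_process`:

```python
import json
from pathlib import Path

def write_gt(split_cfg):
    """Write a (toy) ground-truth JSON for one split, driven by the config."""
    base = Path(split_cfg['base_path'])
    # Invented schema: one entry per .jpg, with an empty annotation list.
    gt = {img.name: [] for img in sorted(base.glob('*.jpg'))}
    out = base / split_cfg['target_json_name']
    out.write_text(json.dumps(gt, indent=2))
    return out
```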
- Run the command:
```bash
python3 main.py weak_supervision --model /path/to/strong/supervision/model --iterations <num_of_iterations(20)>
```
- This will train the weak-supervision model for the number of iterations you specified.
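Conceptually, each iteration follows the cycle from the CRAFT paper: the current model predicts character regions on real word-level images (pseudo-targets), the model is retrained on them, and the cycle repeats. The skeleton below illustrates only this control flow; both helpers are stubs, and none of these names belong to the repository's actual API:

```python
def generate_pseudo_targets(model, images):
    # Stub: the real step runs the model on real images and splits word-level
    # boxes into character-level pseudo-targets, weighted by confidence.
    return [None for _ in images]

def train_one_round(model, images, targets):
    # Stub: the real step fine-tunes the model on (image, pseudo-target) pairs.
    return model

def weak_supervision(model, images, iterations=20):
    """Alternate pseudo-target generation and training for N iterations."""
    for _ in range(iterations):
        targets = generate_pseudo_targets(model, images)
        model = train_one_round(model, images, targets)
    return model
```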