<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.5/doc/PaddleOCR_log.png">

<!--- @wandbcode{paddleocr} -->

<img src="https://i.imgur.com/gb6B4ig.png" width="400" alt="Weights & Biases" />

# Train and Debug Your OCR Models using PaddleOCR and Weights & Biases 🪄🐝

This notebook talks about how you can use W&B with PaddleOCR to track training metrics and log model checkpoints for all your OCR needs!

To use the W&B logger with the PaddleOCR training script just add the following at the bottom of your `config.yml` file.

```
wandb:
    project: CoolOCR
    entity: my_team
    name: MyOCRModel
```

To log the metrics and checkpoints to W&B during training, the wandb client now has a direct integration into PaddleOCR. Using wandb for logging automatically adds all the metrics to your W&B dashboard, saves the models at every evaluation step, tags the best model and adds appropriate metadata for the saved model. An example dashboard is available [here](https://wandb.ai/manan-goel/text_detection).

## Setup 🖥

We begin by cloning the PaddleOCR library and installing the the package.

In [None]:
# %%shell
# git clone https://github.com/PaddlePaddle/PaddleOCR
# pip install paddlepaddle-gpu pyclipper attrdict -qqq
# cd PaddleOCR
# pip install -e .

In [1]:
!pip install paddlepaddle-gpu pyclipper attrdict -qqq

In [3]:
# !cp -r /content/PaddleOCR /content/drive/MyDrive/MyOCR-gpu

## Training 🏋️‍♀️

PaddleOCR comes with a huge array of pre-implemented models involved in the OCR pipeline. For this tutorial we will be looking at the text detection models.

### Downloading Training and Validation Data 💾

We will use the ICDAR2015 dataset available [here](https://rrc.cvc.uab.es/?ch=4&com=downloads). The data has been logged as W&B artifacts for ease of use.

### Downloading pretrained weights📈

In [None]:
# !wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/ResNet50_dcn_asf_synthtext_pretrained.pdparams

### **Untar pretrained model**

In [None]:
# cd ./pretrain_models/

/content/drive/.shortcut-targets-by-id/1cSshdYUZTZkYNN2CZQLwNyiRaY41sdpH/PaddleOCR/pretrain_models


In [None]:
# !tar -xf rec_r45_abinet_train.tar

### Setup the config.yml file to use W&B🛠

In [None]:
# import yaml

# with open("configs/det/det_r50_vd_sast_icdar15.yml", "r") as f:
#     config = yaml.safe_load(f)
# config.update({
#     'wandb': {
#         'project': 'text_detection_2'
#     }
# })
# config['Global'].update({
#     'epoch_num': 5,
#     'eval_batch_step': [0, 1000],
#     'calc_metric_during_train': True
# })

# with open("configs/det/det_r50_db++_icdar15.yml", "w") as f:
#     yaml.safe_dump(config, f)

### Train your Model 🏋️‍♀️

The following command will finetune the pretrained MobileNetV3 on the ICDAR2015 dataset while logging all training and validation metrics to a W&B dashboard.

In [5]:
!python -m wget 'http://nz2.archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb'

# !sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb

Traceback (most recent call last):
  File "C:\ProgramData\miniconda3\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\ProgramData\miniconda3\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "d:\training\vn_paddleocr\lib\site-packages\wget.py", line 568, in <module>
    filename = download(args[0], out=options.output)
  File "d:\training\vn_paddleocr\lib\site-packages\wget.py", line 526, in download
    (tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
  File "C:\ProgramData\miniconda3\lib\urllib\request.py", line 241, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\ProgramData\miniconda3\lib\urllib\request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "C:\ProgramData\miniconda3\lib\urllib\request.py", line 503, in open
    req = Request(fullurl, data)
  File "C:\ProgramData\miniconda3\lib\urllib\request.py", line 322, in __init__


In [5]:
!pip install lmdb rapidfuzz visualdl pyclipper

Collecting lmdb
  Downloading lmdb-1.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (299 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m299.2/299.2 kB[0m [31m5.5 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting rapidfuzz
  Downloading rapidfuzz-3.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.3/3.3 MB[0m [31m55.2 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting visualdl
  Downloading visualdl-2.5.3-py3-none-any.whl (6.3 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m6.3/6.3 MB[0m [31m109.9 MB/s[0m eta [36m0:00:00[0m
Collecting bce-python-sdk (from visualdl)
  Downloading bce_python_sdk-0.8.97-py3-none-any.whl (241 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m241.5/241.5 kB[0m [31m31.7 MB/s[0m eta [36m0:00:00[0m
Collecting Flask-Babel>=3.0.0 (from visualdl)
  Downloading flask_babel-4.0.0-py3-none-any.whl (9

In [None]:
!python3 tools/train.py -c configs/det/det_r50_db++_icdar15.yml  \
         -o Global.pretrained_model=./pretrain_models/ResNet50_dcn_asf_synthtext_pretrained

### Evaluate text detection model

In [None]:
!python3 tools/eval.py -c configs/det/det_r50_db++_icdar15.yml  -o Global.checkpoints="./artifacts/model-txx4644w:v6/model_ckpt" PostProcess.box_thresh=0.1 PostProcess.unclip_ratio=1.5 Eval.dataset.data_dir='./train_data/data_ocr_doc/test_dir' Eval.dataset.label_file_list='./train_data/data_ocr_doc/test_label.txt'

### Perform detection on test images

In [None]:
!python3 tools/infer_det.py -c configs/det/det_r50_db++_icdar15.yml -o Global.infer_img='./doc/imgs_en/' Global.pretrained_model="./output/det_r50_icdar15/latest"

### Train recognition model

### **Downloading pretrained model for Recognition**

In [None]:
!wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/PP-OCRv4/english/en_PP-OCRv4_rec_train.tar

--2023-10-31 02:42:30--  https://paddleocr.bj.bcebos.com/PP-OCRv4/english/en_PP-OCRv4_rec_train.tar
Resolving paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)... 103.235.46.61, 2409:8c04:1001:1002:0:ff:b001:368a
Connecting to paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)|103.235.46.61|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 190740480 (182M) [application/x-tar]
Saving to: ‘./pretrain_models/en_PP-OCRv4_rec_train.tar’


2023-10-31 02:42:35 (35.4 MB/s) - ‘./pretrain_models/en_PP-OCRv4_rec_train.tar’ saved [190740480/190740480]



**Untar pretrained model**

In [None]:
cd /content/drive/MyDrive/PaddleOCR/pretrain_models

/content/drive/.shortcut-targets-by-id/1cSshdYUZTZkYNN2CZQLwNyiRaY41sdpH/PaddleOCR/pretrain_models


In [None]:
!tar -xf en_PP-OCRv4_rec_train.tar

In [None]:
cd /content/drive/MyDrive/PaddleOCR

/content/drive/.shortcut-targets-by-id/1cSshdYUZTZkYNN2CZQLwNyiRaY41sdpH/PaddleOCR


### Train recognition model

In [12]:
import yaml

with open("configs/rec/rec_r34_vd_none_bilstm_ctc.yml", "r") as f:
    config = yaml.safe_load(f)

config["Optimizer"].update({"regularizer": {"factor": 1e-4, "name": "L2"}})

config["Optimizer"].update({"lr": {"learning_rate": 5e-5}})

config["Train"].update(
    {
        "loader": {
            "batch_size_per_card": 32,
            "drop_last": True,
            "num_workers": 8,
            "shuffle": True,
        }
    }
)

config["Eval"].update(
    {
        "loader": {
            "batch_size_per_card": 8,
            "drop_last": False,
            "num_workers": 6,
            "shuffle": False,
        }
    }
)
# config['Optimizer']['lr']['learning_rate'] = 5e-5

# config['Train']['loader']['batch_size_per_card'] = 16
# config['Eval']['loader']['batch_size_per_card'] = 8

# modified_config_path = './configs/rec/rec_r34_vd_none_bilstm_ctc_vie_modified.yml'

# with open(modified_config_path, 'w') as file:
#   yaml.dump(config, file)
with open("configs/rec/rec_r34_vd_none_bilstm_ctc.yml", "w") as f:
    yaml.safe_dump(config, f)

In [None]:
!python tools/train.py -c configs/rec/rec_r34_vd_none_bilstm_ctc.yml 
                        # -o Global.pretrained_model='./pretrain_models/rec_r34_vd_none_bilstm_ctc_v2.0_train/best_accuracy' \
                        # Global.checkpoints=./output/rec/r34_vd_none_bilstm_ctc_vie_v2/latest

In [None]:
!python3 tools/train.py -c configs/rec/PP-OCRv4/en_PP-OCRv4_vie_rec.yml \
                        -o Global.pretrained_model=./pretrain_models/en_PP-OCRv4_rec_train/best_accuracy \

In [None]:
# GPU evaluation
!python3 tools/eval.py -c configs/rec/rec_r34_vd_none_bilstm_ctc.yml \
                       -o Global.pretrained_model='./output/rec/r34_vd_none_bilstm_ctc_vie/latest' \
                       Eval.dataset.data_dir='./train_data/2nd_phase/rec_data/Vietnamese' \
                       Eval.dataset.label_file_list='./train_data/2nd_phase/rec_data/output.txt' \
                      #  Eval.dataset.RecResizeImg.image_shape = [3, 32, 1024]

### **Make predictions with model**

In [12]:
# The configuration file used for prediction must match the training
!python3 tools/infer_rec.py -c configs/rec/rec_r34_vd_none_bilstm_ctc.yml -o Global.pretrained_model='./output/rec/r34_vd_none_bilstm_ctc_vie_v2/latest' Global.infer_img=train_data/2nd_phase/rec_data/double_word_data/argument_data_4/978.jpg

[2023/11/27 04:07:12] ppocr INFO: Architecture : 
[2023/11/27 04:07:12] ppocr INFO:     Backbone : 
[2023/11/27 04:07:12] ppocr INFO:         layers : 34
[2023/11/27 04:07:12] ppocr INFO:         name : ResNet
[2023/11/27 04:07:12] ppocr INFO:     Head : 
[2023/11/27 04:07:12] ppocr INFO:         fc_decay : 0
[2023/11/27 04:07:12] ppocr INFO:         name : CTCHead
[2023/11/27 04:07:12] ppocr INFO:     Neck : 
[2023/11/27 04:07:12] ppocr INFO:         encoder_type : rnn
[2023/11/27 04:07:12] ppocr INFO:         hidden_size : 256
[2023/11/27 04:07:12] ppocr INFO:         name : SequenceEncoder
[2023/11/27 04:07:12] ppocr INFO:     Transform : None
[2023/11/27 04:07:12] ppocr INFO:     algorithm : CRNN
[2023/11/27 04:07:12] ppocr INFO:     model_type : rec
[2023/11/27 04:07:12] ppocr INFO: Eval : 
[2023/11/27 04:07:12] ppocr INFO:     dataset : 
[2023/11/27 04:07:12] ppocr INFO:         data_dir : ./train_data/2nd_phase/rec_data/Vietnamese
[2023/11/27 04:07:12] ppocr INFO:         label_

In [None]:
!python3 tools/infer_rec.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml -o Global.pretrained_model='./output/v3_en_mobile/latest' Global.infer_img=train_data/rec_data_final/

### **Export model**

In [None]:
!python3 tools/export_model.py -c configs/rec/rec_r34_vd_none_bilstm_ctc.yml -o Global.pretrained_model="./output/rec/r34_vd_none_bilstm_ctc_vie/best_accuracy"  Global.save_inference_dir=./inference/crnn_vie

W1031 06:06:17.083590 50008 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 12.0, Runtime API Version: 11.8
W1031 06:06:17.084780 50008 gpu_resources.cc:149] device: 0, cuDNN Version: 8.9.
[2023/10/31 06:06:19] ppocr INFO: resume from ./output/rec/r34_vd_none_bilstm_ctc_vie/latest
I1031 06:06:27.363452 50008 interpretercore.cc:237] New Executor is Running.
[2023/10/31 06:06:34] ppocr INFO: inference model is saved to ./inference/crnn_vie/inference


In [None]:
!python3 tools/export_model.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml -o Global.pretrained_model="./output/v3_en_mobile/best_accuracy" Global.save_inference_dir=./inference/en_rec_pprocr_v3

In [None]:
!python3 tools/infer/predict_det.py \
            --image_dir="./train_data/61bb9a7943343e03bb9fcd1b_documents-product-template-software.png" \
            --det_model_dir="./inference/DB++/" \
            --det_algorithm="DB++"  \
            --det_db_box_thresh=0.1 \
            --det_db_thresh=0.1

In [None]:
!python3 tools/infer/predict_rec.py --image_dir="./train_data/kolapa_challange/images/10/" \
                                    --rec_model_dir="./inference/crnn_vie/" \
                                    --rec_image_shape="3,32,100" \
                                    --rec_char_dict_path="./ppocr/utils/dict/custom.txt" \
                                    --use_space_char=True

In [None]:
!python3 tools/infer/predict_rec.py --image_dir="./train_data/rec_data_final/20230420_000001.jpg"  \
                                    --rec_model_dir="./inference/en_rec_pprocr_v3/"  \
                                    --rec_image_shape="3,48,320"  \
                                    --rec_char_dict_path="./ppocr/utils/en_dict.txt"

[2023/10/13 01:42:02] ppocr INFO: In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', if you are using recognition model with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320
[2023/10/13 01:42:04] ppocr INFO: Predicts of ./train_data/rec_data_final/20230420_000001.jpg:('"IieCDESs"', 0.7385871410369873)


### Predict System

In [2]:
!python ./tools/infer/predict_system.py \
           --image_dir="D:\training\PaddleOCR\page-1.png"  \
           --use_gpu=True \
           --det_algorithm="DB++"  \
           --det_model_dir="./inference/DB++/"  \
           --det_db_thresh=0.1  \
           --det_db_box_thresh=0.1  \
           --det_db_unclip_ratio=2.5  \
           --rec_model_dir="./inference/crnn_vie/"  \
           --rec_algorithm="CRNN"  \
           --rec_image_shape="3,32,100"  \
           --rec_char_dict_path="./ppocr/utils/dict/custom.txt" \
           --vis_font_path="./ppocr/utils/font-times-new-roman.ttf" \
           --use_space_char=True
          #  --drop_score=0.5  \

[2023/11/30 16:11:07] ppocr INFO: In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', if you are using recognition model with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320
[2023/11/30 16:11:09] ppocr DEBUG: dt_boxes num : 28, elapsed : 2.495605230331421
[2023/11/30 16:11:11] ppocr DEBUG: rec_res num  : 28, elapsed : 1.397994041442871
[2023/11/30 16:11:11] ppocr DEBUG: 0  Predict time of D:\training\PaddleOCR\page-1.png: 3.950s
[2023/11/30 16:11:11] ppocr DEBUG: "HAI ĐỨA TR - THẠCH LAM", 0.971
[2023/11/30 16:11:11] ppocr DEBUG: "Phân tích tâm trạng hai chị em Liên khi đọi tàu", 0.981
[2023/11/30 16:11:11] ppocr DEBUG: "Một truyện ngn hay theo quan niệm truyền thống phải có cốt truyện đc biệt được tạo ra", 0.992
[2023/11/30 16:11:11] ppocr DEBUG: "bởi nhng tình huống éo le đy kịch tính. Không đi theo lối mòn đó, truyện -"Hai đứa tr/ in", 0.980
[2023/11/30 16:11:11] ppocr DEBUG: trOng tp Nãng trOng Vườn Của Thạch Lam Chỉ là mỘt Chuyện tâm tình nh nhẹ nh

# Test

In [None]:
import subprocess

def evaluate_detection(model_path, data_test_path, label_test_path):
  """
  Inputs:
    model_path (str): Path lead to model checkpoint
    data_test_path (str): Path lead to test data directory
    label_test_path (str): Path lead to test label text file
  """
  command = "python3 tools/eval.py"
  det_config = "configs/det/det_r50_db++_icdar15.yml"
  cmd_string = f"{command} -c {det_config} -o Global.checkpoints='{model_path}' Eval.dataset.data_dir='{data_test_path}' Eval.dataset.label_file_list='{label_test_path}'"
  result = subprocess.run(cmd_string , shell= True, text=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

  if result.returncode == 0:
    print("Detection Model Evaluation Result:", end='\n')
    print(result.stdout)
  else:
    print("Error in Detection Model Evaluation:", end='\n')
    print(result.stderr)

def evaluate_recognition(model_path, data_test_path, label_test_path):
  command = "python3 tools/eval.py"
  rec_config = "configs/rec/rec_r50_fpn_srn.yml"
  cmd_string = f"{command} -c {rec_config} -o Global.checkpoints='{model_path}' Eval.dataset.data_dir='{data_test_path}' Eval.dataset.label_file_list='{label_test_path}'"
  result = subprocess.run(cmd_string, shell=True, text=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

  if result.returncode == 0:
    print("Recognition Model Evaluation Result:", end='\n')
    print(result.stdout)
  else:
    print("Error in Recognition Model Evaluation:", end='\n')
    print(result.stderr)

model_path = "./artifacts/model-txx4644w:v6/model_ckpt"
data_test_path = "./train_data/data_ocr_doc/"
label_test_path = "./train_data/label_final4eval/det_label_final.txt"
rec_model_path = "./output/rec/srn_new/best_accuracy"
rec_data_test_path = "./train_data/rec_data_final/"
rec_label_test_path = "./train_data/label_final4eval/rec_label_eval.txt"

# evaluate_detection(model_path, data_test_path, label_test_path)
evaluate_recognition(rec_model_path, rec_data_test_path, rec_label_test_path)

In [None]:
!python3 tools/eval.py -c configs/det/det_r50_db++_icdar15.yml -o Global.checkpoints='./artifacts/model-txx4644w:v6/model_ckpt' Eval.dataset.data_dir='./train_data/data_ocr_doc/' Eval.dataset.label_file_list='./train_data/label_final4eval/det_label_final.txt'

### Evaluate functions

In [None]:
import subprocess

class ModelEvaluation:
  def __init__(self, model_path, data_test_path, label_test_path):
    self.model_path = model_path
    self.data_test_path = data_test_path
    self.label_test_path = label_test_path

  def _run_evaluation(self, command, config):
    cmd_string = f"{command} -c {config} -o Global.checkpoints='{self.model_path}' Eval.dataset.data_dir='{self.data_test_path}' Eval.dataset.label_file_test='{self.label_test_path}'"
    result = subprocess.run(cmd_string, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    if result.returncode == 0:
      print("Model Evaluation Result:")
      print(result.stdout)
    else:
      print("Error in Model Evaluation:")
      print(result.stderr)

  def evaluate_detection(self):
    command = "python3 tools/eval.py"
    config = "configs/det/det_r50_db++_icdar15.yml"
    self._run_evaluation(command, config)

  def evaluate_recognition(self):
    command = "python3 tools/eval.py"
    config = "configs/rec/rec_r50_fpn_srn.yml"
    self._run_evaluation(command, config)


# det_model_path = "./artifacts/model-txx4644w:v6/model_ckpt"
# det_data_test_path = "./train_data/data_ocr_doc/"
# det_label_test_path = "./train_data/label_final4eval/det_label_final.txt"

# det_evaluator = ModelEvaluation(det_model_path, det_data_test_path, det_label_test_path)
# det_evaluator.evaluate_detection()

rec_model_path = "./output/rec/srn_new/best_accuracy"
rec_data_test_path = "./train_data/rec_data_final"
rec_label_test_path = "./train_data/label_final4eval/rec_label_eval.txt"

rec_evaluator = ModelEvaluation(rec_model_path, rec_data_test_path, rec_label_test_path)
rec_evaluator.evaluate_recognition()

Error in Model Evaluation:
b'W1005 09:22:15.793431 10536 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 12.0, Runtime API Version: 11.8\nW1005 09:22:15.794628 10536 gpu_resources.cc:149] device: 0, cuDNN Version: 8.9.\n\reval model::   0%|          | 0/32 [00:00<?, ?it/s]Exception in thread Thread-2 (_thread_loop):\nTraceback (most recent call last):\n  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner\nTraceback (most recent call last):\n  File "/content/drive/.shortcut-targets-by-id/1cSshdYUZTZkYNN2CZQLwNyiRaY41sdpH/PaddleOCR/tools/eval.py", line 146, in <module>\n    self.run()\n  File "/usr/lib/python3.10/threading.py", line 953, in run\n    self._target(*self._args, **self._kwargs)\n  File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/dataloader_iter.py", line 604, in _thread_loop\n    batch = self._get_data()\n  File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/dataloader_iter.p