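# Handwriting Recognition Application
Let's build a handwriting recognition application in Colab. The application supports the following user use cases:
- The user can upload a handwritten image file.
- The user can write by hand on a canvas.
- The user can view the recognized text.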

## Downloading packages and example data
Install the required Python packages and download the example images. Skip this cell if you are not running on Colab.

In [1]:
!wget https://raw.githubusercontent.com/mrsyee/dl_apps/main/ocr/requirements-colab.txt
!pip install -r requirements-colab.txt

!mkdir examples
!cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Hello.png
!cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Hello_cursive.png
!cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Red.png
!cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/sentence.png
!cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/i_love_you.png
!cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/merrychristmas.png
!cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Rock.png
!cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Bob.png

--2024-11-06 08:24:44--  https://raw.githubusercontent.com/mrsyee/dl_apps/main/ocr/requirements-colab.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 123 [text/plain]
Saving to: ‘requirements-colab.txt’


2024-11-06 08:24:45 (5.47 MB/s) - ‘requirements-colab.txt’ saved [123/123]

Collecting sentencepiece==0.1.97 (from -r requirements-colab.txt (line 2))
  Downloading sentencepiece-0.1.97-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting protobuf==3.20.0 (from -r requirements-colab.txt (line 3))
  Downloading protobuf-3.20.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (698 bytes)
Collecting gradio==3.40.0 (from -r requirements-colab.txt (line 6))
  Downloading gradio-3.40.0-py3-none-any.whl.met

--2024-11-06 08:25:17--  https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Hello.png
Resolving github.com (github.com)... 20.205.243.166
Connecting to github.com (github.com)|20.205.243.166|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/mrsyee/dl_apps/main/ocr/examples/Hello.png [following]
--2024-11-06 08:25:17--  https://raw.githubusercontent.com/mrsyee/dl_apps/main/ocr/examples/Hello.png
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 42139 (41K) [image/png]
Saving to: ‘Hello.png’


2024-11-06 08:25:18 (36.4 MB/s) - ‘Hello.png’ saved [42139/42139]

--2024-11-06 08:25:18--  https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Hello_cursive.png
Resolving github.com (github.com)... 20
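
Because the download cell above is meant only for Colab, it can help to check the runtime before running it. The following is a minimal sketch, assuming that Colab preloads the `google.colab` module into the kernel; the helper name `running_on_colab` is hypothetical and not part of the original notebook.

```python
import sys


def running_on_colab() -> bool:
    """Return True when the google.colab module is already loaded.

    Assumption: the Colab runtime imports google.colab at kernel start,
    so its presence in sys.modules signals a Colab session.
    """
    return "google.colab" in sys.modules


if running_on_colab():
    print("Colab detected: run the download cell.")
else:
    print("Not on Colab: install the requirements and examples manually.")
```

Outside Colab, the same packages listed in `requirements-colab.txt` would need to be installed in the local environment by other means.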

## Importing packages

In [2]:
import os

import gradio as gr
import numpy as np
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel



## Implementing the image file upload UI

In [3]:
with gr.Blocks() as app:
    gr.Markdown("# Handwritten Image OCR")
    image = gr.Image(label="Handwritten image file")
    output = gr.Textbox(label="Output Box")
    convert_btn = gr.Button("Convert")

In [4]:
app.launch(inline=False, share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://2a6062f11690d71580.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




In [5]:
app.close()

Closing server running on port: 7860


## Implementing the TrOCR inferencer class
The TrOCR inferencer class initializes the TrOCR model and processor and runs inference.

In [6]:
class TrOCRInferencer:
    def __init__(self):
        print("[INFO] Initialize TrOCR Inferencer.")
        self.processor = TrOCRProcessor.from_pretrained(
            "microsoft/trocr-base-handwritten"
        )
        self.model = VisionEncoderDecoderModel.from_pretrained(
            "microsoft/trocr-base-handwritten"
        )

    def inference(self, image: Image.Image) -> str:
        """Recognize the text in a handwritten image.

        Runs in three steps: preprocessing, inference, and postprocessing.
        """
        # preprocess
        pixel_values = self.processor(images=image, return_tensors="pt").pixel_values
        # inference
        generated_ids = self.model.generate(pixel_values)
        # postprocess
        generated_text = self.processor.batch_decode(
            generated_ids, skip_special_tokens=True
        )[0]

        return generated_text

In [7]:
inferencer = TrOCRInferencer()

[INFO] Initialize TrOCR Inferencer.


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Some weights of VisionEncoderDecoderModel were not initialized from the model checkpoint at microsoft/trocr-base-handwritten and are newly initialized: ['encoder.pooler.dense.bias', 'encoder.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



## Implementing the inference function

In [8]:
def image_to_text(image: np.ndarray) -> str:
    image = Image.fromarray(image).convert("RGB")
    text = inferencer.inference(image)
    return text
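
`gr.Image` passes the uploaded picture to the event function as a NumPy array by default, which is why `image_to_text` converts it with `Image.fromarray(...).convert("RGB")`. The small sketch below uses synthetic arrays to show that grayscale and RGBA inputs both end up as three-channel RGB images. The array shapes and the helper name `to_rgb` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from PIL import Image


def to_rgb(array: np.ndarray) -> Image.Image:
    """Convert a uint8 image array to an RGB PIL image, as image_to_text does."""
    return Image.fromarray(array).convert("RGB")


# Synthetic inputs: a 2-D grayscale array and a 4-channel RGBA array.
gray = np.full((192, 600), 255, dtype=np.uint8)
rgba = np.zeros((192, 600, 4), dtype=np.uint8)

gray_image = to_rgb(gray)
rgba_image = to_rgb(rgba)

# PIL reports size as (width, height), the reverse of the array's (rows, columns).
print(gray_image.mode, gray_image.size)
print(rgba_image.mode, rgba_image.size)
```

Converting to RGB up front means the same event function can serve inputs with different channel layouts.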

In [9]:
with gr.Blocks() as app:
    gr.Markdown("# Handwritten Image OCR")
    image = gr.Image(label="Handwritten image file")
    output = gr.Textbox(label="Output Box")
    convert_btn = gr.Button("Convert")
    convert_btn.click(
        fn=image_to_text, inputs=image, outputs=output
    )

app.launch(inline=False, share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
IMPORTANT: You are using gradio version 3.40.0, however version 4.44.1 is available, please upgrade.
--------
Running on public URL: https://b1c10c2f8ac00ac1fd.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




In [10]:
app.close()

Closing server running on port: 7860


## Implementing the canvas UI

In [11]:
with gr.Blocks() as app:
    gr.Markdown("# Handwritten Image OCR")
    sketchpad = gr.Sketchpad(
        label="Handwritten Sketchpad",
        shape=(600, 192),
        brush_radius=2,
        invert_colors=False,
    )
    output = gr.Textbox(label="Output Box")
    convert_btn = gr.Button("Convert")
    convert_btn.click(
        fn=image_to_text, inputs=sketchpad, outputs=output
    )

app.launch(inline=False, share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
IMPORTANT: You are using gradio version 3.40.0, however version 4.44.1 is available, please upgrade.
--------
Running on public URL: https://823c7780d2457d393e.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




In [12]:
app.close()

Closing server running on port: 7860
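
If the user presses Convert on an untouched canvas, the event function may still be called with an empty drawing or with no image at all. One possible guard, sketched below, skips inference when the input is missing or has a single uniform pixel value. This is an illustration under stated assumptions: the helper name `is_blank` is hypothetical, and the sketch assumes the sketchpad delivers a NumPy array (or `None` when nothing was drawn).

```python
from typing import Optional

import numpy as np


def is_blank(image: Optional[np.ndarray]) -> bool:
    """Return True if the image is missing or every pixel has the same value."""
    if image is None or image.size == 0:
        return True
    return bool(image.max() == image.min())


# Synthetic canvases for illustration: 192 rows by 600 columns, white background.
empty_canvas = np.full((192, 600), 255, dtype=np.uint8)
drawn_canvas = empty_canvas.copy()
drawn_canvas[90:100, 100:500] = 0  # a dark horizontal stroke

print(is_blank(None))
print(is_blank(empty_canvas))
print(is_blank(drawn_canvas))
```

In an event function, such a check could return an empty string before calling `inferencer.inference`.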


## Implementing the final app

In [13]:
# Implement inferencer
class TrOCRInferencer:
    def __init__(self):
        print("[INFO] Initialize TrOCR Inferencer.")
        self.processor = TrOCRProcessor.from_pretrained(
            "microsoft/trocr-base-handwritten"
        )
        self.model = VisionEncoderDecoderModel.from_pretrained(
            "microsoft/trocr-base-handwritten"
        )

    def inference(self, image: Image.Image) -> str:
        """Recognize the text in a handwritten image.

        Runs in three steps: preprocessing, inference, and postprocessing.
        """
        # preprocess
        pixel_values = self.processor(images=image, return_tensors="pt").pixel_values
        # inference
        generated_ids = self.model.generate(pixel_values)
        # postprocess
        generated_text = self.processor.batch_decode(
            generated_ids, skip_special_tokens=True
        )[0]

        return generated_text

inferencer = TrOCRInferencer()


# Implement event function
def image_to_text(image: np.ndarray) -> str:
    image = Image.fromarray(image).convert("RGB")
    text = inferencer.inference(image)
    return text


# Implement app
with gr.Blocks() as app:
    gr.Markdown("# Handwritten Image OCR")
    with gr.Tab("Image upload"):
        image = gr.Image(label="Handwritten image file")
        output = gr.Textbox(label="Output Box")
        convert_btn = gr.Button("Convert")
        convert_btn.click(
            fn=image_to_text, inputs=image, outputs=output
        )

        gr.Markdown("## Image Examples")
        gr.Examples(
            examples=[
                os.path.join(os.getcwd(), "examples/Hello.png"),
                os.path.join(os.getcwd(), "examples/Hello_cursive.png"),
                os.path.join(os.getcwd(), "examples/Red.png"),
                os.path.join(os.getcwd(), "examples/sentence.png"),
                os.path.join(os.getcwd(), "examples/i_love_you.png"),
                os.path.join(os.getcwd(), "examples/merrychristmas.png"),
                os.path.join(os.getcwd(), "examples/Rock.png"),
                os.path.join(os.getcwd(), "examples/Bob.png"),
            ],
            inputs=image,
            outputs=output,
            fn=image_to_text,
        )

    with gr.Tab("Drawing"):
        sketchpad = gr.Sketchpad(
            label="Handwritten Sketchpad",
            shape=(600, 192),
            brush_radius=2,
            invert_colors=False,
        )
        output = gr.Textbox(label="Output Box")
        convert_btn = gr.Button("Convert")
        convert_btn.click(
            fn=image_to_text, inputs=sketchpad, outputs=output
        )

[INFO] Initialize TrOCR Inferencer.


Some weights of VisionEncoderDecoderModel were not initialized from the model checkpoint at microsoft/trocr-base-handwritten and are newly initialized: ['encoder.pooler.dense.bias', 'encoder.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
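
The `examples` list passed to `gr.Examples` in the final app repeats the same `os.path.join` call for every file. As a hedged alternative, the paths could be built from a list of file names. The names `EXAMPLE_FILES` and `example_paths` are hypothetical helpers introduced here for illustration; the file names themselves come from the download cell of this notebook.

```python
import os

# File names taken from the download cell of this notebook.
EXAMPLE_FILES = [
    "Hello.png",
    "Hello_cursive.png",
    "Red.png",
    "sentence.png",
    "i_love_you.png",
    "merrychristmas.png",
    "Rock.png",
    "Bob.png",
]


def example_paths(directory: str = "examples") -> list[str]:
    """Build absolute paths for the example images under the given directory."""
    base = os.path.join(os.getcwd(), directory)
    return [os.path.join(base, name) for name in EXAMPLE_FILES]


for path in example_paths():
    print(path)
```

The resulting list could then be passed as `examples=example_paths()` in place of the hand-written entries.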


In [14]:
# Launch the app
app.launch(inline=False, share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://cc508e6dfa23f4e5e2.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




In [None]:
app.close()