# Handwriting recognition application
* Upload an image file of English handwriting.
* Write by hand on a canvas.
* Convert the uploaded image or the canvas handwriting to text.

In [14]:
!mkdir examples
!cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Hello.png
!cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Hello_cursive.png
!cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Red.png
!cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/sentence.png
!cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/i_love_you.png
!cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/merrychristmas.png
!cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Rock.png
!cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Bob.png

mkdir: cannot create directory ‘examples’: File exists
--2024-09-23 16:48:01--  https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Hello.png
Resolving github.com (github.com)... 20.200.245.247
Connecting to github.com (github.com)|20.200.245.247|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/mrsyee/dl_apps/main/ocr/examples/Hello.png [following]
--2024-09-23 16:48:01--  https://raw.githubusercontent.com/mrsyee/dl_apps/main/ocr/examples/Hello.png
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 42139 (41K) [image/png]
Saving to: ‘Hello.png.4’


2024-09-23 16:48:02 (3.40 MB/s) - ‘Hello.png.4’ saved [42139/42139]

--2024-09-23 16:48:02--  https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Hello_cursive.png
Resolving github.com (github.com)... 20.200.245.247
Connecting to
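
The log shows the file being saved as `Hello.png.4`, which suggests the cell was rerun and wget kept numbered copies next to the original. A minimal sketch of an idempotent alternative in Python is given here. The helper names `plan_downloads` and `fetch_examples` are hypothetical and not part of the notebook; the base URL and file names are taken from the download cell.

```python
import os
import urllib.request

BASE_URL = "https://github.com/mrsyee/dl_apps/raw/main/ocr/examples"
FILES = [
    "Hello.png", "Hello_cursive.png", "Red.png", "sentence.png",
    "i_love_you.png", "merrychristmas.png", "Rock.png", "Bob.png",
]


def plan_downloads(target_dir, files=FILES, base_url=BASE_URL):
    """Return (url, path) pairs for files not yet present in target_dir."""
    plan = []
    for name in files:
        path = os.path.join(target_dir, name)
        if not os.path.exists(path):
            plan.append((f"{base_url}/{name}", path))
    return plan


def fetch_examples(target_dir="examples"):
    """Download only the missing example images (hypothetical helper)."""
    os.makedirs(target_dir, exist_ok=True)  # no error if the directory exists
    for url, path in plan_downloads(target_dir):
        urllib.request.urlretrieve(url, path)
```

Calling `fetch_examples()` a second time should download nothing, because every file already present is skipped.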

In [1]:
import os
import numpy as np
from PIL import Image
import gradio as gr
from transformers import TrOCRProcessor, VisionEncoderDecoderModel



In [16]:
import torch
print(torch.cuda.is_available())

False
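
`torch.cuda.is_available()` returns `False` here, so the model runs on the CPU. As a rough sketch of how that check is commonly turned into a device choice (the helper `pick_device` is hypothetical, and the notebook itself never moves the model to a GPU):

```python
def pick_device(cuda_available: bool) -> str:
    """Map the CUDA availability flag to a PyTorch device string."""
    return "cuda" if cuda_available else "cpu"


try:
    import torch
    device = pick_device(torch.cuda.is_available())
except ImportError:
    # Assumption for this sketch: fall back to the CPU when torch is missing.
    device = pick_device(False)

print(device)
```

On a machine with a GPU, the resulting string could be passed to `model.to(device)` and `pixel_values.to(device)` before calling `generate`.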


## Building the image file upload UI

In [17]:
with gr.Blocks() as app:
    gr.Markdown("# Handwritten Image OCR")
    image = gr.Image(label="Handwritten Image file")
    output = gr.Textbox(label="Output Box")
    convert_btn = gr.Button("Convert")

IMPORTANT: You are using gradio version 3.40.0, however version 4.29.0 is available, please upgrade.
--------


In [18]:
app.launch(inline=False, share=True)

Running on local URL:  http://127.0.0.1:7870
Running on public URL: https://76b302f2fb4fcefecb.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




In [19]:
app.close()

Closing server running on port: 7870


## Implementing a handwriting recognizer with the TrOCR model

In [2]:
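class TrOCRInferencer:
    """Wraps the TrOCR processor and model for handwriting recognition."""

    def __init__(self):
        print("[info] init TrOCR Inferencer ")
        # The processor prepares images for the model and decodes token ids.
        self.processor = TrOCRProcessor.from_pretrained("microsoft/trocr-large-handwritten")
        # Vision encoder plus text decoder checkpoint for handwritten text.
        self.model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-large-handwritten")

    def inference(self, image):
        # Convert the PIL image into a batch of pixel tensors.
        pixel_values = self.processor(images=image, return_tensors='pt').pixel_values
        # Generate token ids for the text in the image.
        generated_ids = self.model.generate(pixel_values)
        # Decode the first (and only) sequence in the batch to a string.
        generated_text = self.processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

        return generated_text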

In [3]:
inferencer = TrOCRInferencer()

[info] init TrOCR Inferencer 


Some weights of VisionEncoderDecoderModel were not initialized from the model checkpoint at microsoft/trocr-large-handwritten and are newly initialized: ['encoder.pooler.dense.bias', 'encoder.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
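
Loading the large checkpoint takes a while, and the notebook constructs `TrOCRInferencer()` in more than one cell. One way to build it only once per session is to cache the constructor call. This is a minimal sketch: `get_inferencer` and `FakeInferencer` are hypothetical names, and the stand-in class is used so the example runs without downloading model weights.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_inferencer(factory):
    """Build the inferencer on first use and reuse it afterwards."""
    return factory()


class FakeInferencer:
    """Stand-in for TrOCRInferencer so the sketch runs without the model."""
    instances = 0

    def __init__(self):
        FakeInferencer.instances += 1


first = get_inferencer(FakeInferencer)
second = get_inferencer(FakeInferencer)
print(first is second, FakeInferencer.instances)  # True 1
```

In the notebook, `get_inferencer(TrOCRInferencer)` would play the same role, returning the already loaded object on repeated calls.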


## Implementing the inference function

In [4]:
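def image_to_text(image):
    # By default Gradio passes the image as a NumPy array;
    # convert it to an RGB PIL image before handing it to the processor.
    image = Image.fromarray(image).convert('RGB')
    text = inferencer.inference(image)
    return text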
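
When the image component is empty, Gradio typically passes `None` to the callback, and `Image.fromarray(None)` would then likely raise an error. A minimal sketch of a guard follows. The wrapper `safe_handler` and the stand-in `fake_image_to_text` are hypothetical and exist only so the example runs without the model.

```python
def safe_handler(fn, empty_message=""):
    """Wrap a callback so a missing input returns a message instead of failing."""
    def wrapper(image):
        if image is None:
            return empty_message
        return fn(image)
    return wrapper


def fake_image_to_text(image):
    """Stand-in for image_to_text; always returns the same string."""
    return "recognized"


handler = safe_handler(fake_image_to_text, empty_message="No image provided.")
print(handler(None))        # No image provided.
print(handler([[0, 255]]))  # recognized
```

In the app, `safe_handler(image_to_text)` could be passed as `fn` to `convert_btn.click` in place of `image_to_text`.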

In [5]:
with gr.Blocks() as app: 
    gr.Markdown("# Handwritten Image OCR")
    image = gr.Image(label="Handwritten Image file")
    output = gr.Textbox(label="Output Box")
    convert_btn = gr.Button("Convert")
    convert_btn.click(fn=image_to_text, inputs=image, outputs=output)
    
app.launch(inline=False, share=True)
    

Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://52a27738ba7826a104.gradio.live




In [24]:
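# The Sketchpad arguments used in this notebook (shape, brush_radius,
# invert_colors) appear to be specific to gradio 3.x, so pin the version
# if a newer gradio is installed (uncomment to run).
# %pip install gradio==3.40.0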




Note: you may need to restart the kernel to use updated packages.


## Building a canvas UI and recognizing handwriting

In [25]:
with gr.Blocks() as app:
    gr.Markdown("# Handwritten Image OCR")
    sketchpad = gr.Sketchpad(
        label="Handwritten Sketchpad",
        shape=(600, 300),
        brush_radius=5,
        invert_colors=False,
    )
    output = gr.Textbox(label="Output Box")
    convert_btn = gr.Button("Convert")
    convert_btn.click(
        fn=image_to_text, inputs=sketchpad, outputs=output
    )
app.launch(inline=False, share=True)

Running on local URL:  http://127.0.0.1:7871
Running on public URL: https://3fb7908735b524f1e6.gradio.live




In [6]:
with gr.Blocks() as app:
    gr.Markdown("# Handwritten Image OCR")
    with gr.Tab("Image upload"):
        image = gr.Image(label="Handwritten image file")
        output = gr.Textbox(label="Output Box")
        convert_btn = gr.Button("Convert")
        convert_btn.click(
            fn=image_to_text, inputs=image, outputs=output
        )
        gr.Markdown("## Image Examples")
        gr.Examples(
            examples=[
                os.path.join(os.getcwd(), "examples/Hello.png"),
                os.path.join(os.getcwd(), "examples/Hello_cursive.png"),
                os.path.join(os.getcwd(), "examples/Red.png"),
                os.path.join(os.getcwd(), "examples/sentence.png"),
                os.path.join(os.getcwd(), "examples/i_love_you.png"),
                os.path.join(os.getcwd(), "examples/merrychristmas.png"),
                os.path.join(os.getcwd(), "examples/Rock.png"),
                os.path.join(os.getcwd(), "examples/Bob.png"),
                ],
            inputs=image,
            outputs=output,
            fn=image_to_text
            )
    with gr.Tab("Drawing"):
        gr.Markdown("# Handwritten Image OCR")
        sketchpad = gr.Sketchpad(
            label="Handwritten Sketchpad",
            shape=(600, 300),
            brush_radius=3,
            invert_colors=False,
            )
        output = gr.Textbox(label="Output Box")
        convert_btn = gr.Button("Convert")
        convert_btn.click(
            fn=image_to_text, inputs=sketchpad, outputs=output
            )
app.launch(inline=False, share=True)

Running on local URL:  http://127.0.0.1:7861
Running on public URL: https://9f97ad203ad7117148.gradio.live
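
The `examples` list passed to `gr.Examples` repeats the same `os.path.join(os.getcwd(), ...)` call for every file. A minimal sketch of building the same list from the file names is given here; the helper `example_paths` and the constant `EXAMPLE_NAMES` are hypothetical, and the folder and file names are taken from the cell.

```python
from pathlib import Path

EXAMPLE_NAMES = [
    "Hello.png", "Hello_cursive.png", "Red.png", "sentence.png",
    "i_love_you.png", "merrychristmas.png", "Rock.png", "Bob.png",
]


def example_paths(base_dir=None, folder="examples", names=EXAMPLE_NAMES):
    """Return absolute-style paths for the example images as strings.

    base_dir defaults to the current working directory, matching the
    os.getcwd() call used in the notebook.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return [str(base / folder / name) for name in names]


for path in example_paths():
    print(path)
```

The returned list could be passed directly as `examples=example_paths()`, which keeps the file names in one place if an image is added or renamed.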




In [7]:
app.close()

Closing server running on port: 7861


## Implementing the final app

In [9]:
class TrOCRInferencer:
    def __init__(self):
        print("[info] init TrOCR Inferencer")
        self.processor = TrOCRProcessor.from_pretrained("microsoft/trocr-large-handwritten")
        self.model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-large-handwritten")
        
    def inference(self, image):
        pixel_values = self.processor(images=image, return_tensors='pt').pixel_values
        generated_ids = self.model.generate(pixel_values)
        generated_text = self.processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

        return generated_text

inferencer = TrOCRInferencer()

def image_to_text(image):
    image = Image.fromarray(image).convert('RGB')
    text = inferencer.inference(image)
    return text


with gr.Blocks() as app:
    gr.Markdown("# Handwritten Image OCR")

    # First tab: image upload
    with gr.Tab("Image upload"):
        image = gr.Image(label="Handwritten image file")
        output = gr.Textbox(label="Output Box")
        convert_btn = gr.Button("Convert")
        convert_btn.click(
            fn=image_to_text, inputs=image, outputs=output
        )
        # Provide example images
        gr.Markdown("## Image Examples")
        gr.Examples(
            examples=[
                os.path.join(os.getcwd(), "examples/Hello.png"),
                os.path.join(os.getcwd(), "examples/Hello_cursive.png"),
                os.path.join(os.getcwd(), "examples/Red.png"),
                os.path.join(os.getcwd(), "examples/sentence.png"),
                os.path.join(os.getcwd(), "examples/i_love_you.png"),
                os.path.join(os.getcwd(), "examples/merrychristmas.png"),
                os.path.join(os.getcwd(), "examples/Rock.png"),
                os.path.join(os.getcwd(), "examples/Bob.png"),
                ],
            inputs=image,
            outputs=output,
            fn=image_to_text
            )

    # Second tab: drawing canvas
    with gr.Tab("Drawing"):
        gr.Markdown("# Handwritten Image OCR")
        sketchpad = gr.Sketchpad(
            label="Handwritten Sketchpad",
            shape=(600, 300),
            brush_radius=3,
            invert_colors=False,
            )
        output = gr.Textbox(label="Output Box")
        convert_btn = gr.Button("Convert")
        convert_btn.click(
            fn=image_to_text, inputs=sketchpad, outputs=output
            )
app.launch(inline=False, share=True)

[info] init TrOCR Inferencer


Some weights of VisionEncoderDecoderModel were not initialized from the model checkpoint at microsoft/trocr-large-handwritten and are newly initialized: ['encoder.pooler.dense.bias', 'encoder.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Running on local URL:  http://127.0.0.1:7862
Running on public URL: https://3956aa5aece97ee83e.gradio.live




