# **0. Setup**

## **Imports**

In [1]:
import cv2
import numpy as np
import matplotlib.pyplot as plt
from utils import *
import sys

## **Pybraille**

Pybraille performs grade-1 (uncontracted) Braille translation. The eventual goal is [grade-2 (contracted) translation](https://www.brailletranslator.org).

Installation:
```
pip install pybraille
```

In [2]:
from pybraille import convertText

print(convertText("hello"))

print(convertText("filename.txt")) # translates the literal string, not a file's contents (for a file such as tests/sample.txt, read its text first)

⠓⠑⠇⠇⠕
⠋⠊⠇⠑⠝⠁⠍⠑⠲⠞⠭⠞
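
To make the grade-1 idea concrete, here is a minimal sketch of letter-by-letter translation into Unicode Braille patterns. It is not Pybraille's implementation; the names `FIRST_DECADE`, `build_table`, and `to_braille` are made up for illustration, and only lowercase letters are handled (capital and number indicators are left out).

```python
# Unicode Braille Patterns start at U+2800; dots 1..6 map to bits 0x01..0x20.
BRAILLE_BASE = 0x2800

# Letters a-j, written as dot bitmasks (e.g. "b" = dots 1 and 2 = 0x01 | 0x02).
FIRST_DECADE = {
    "a": 0x01, "b": 0x03, "c": 0x09, "d": 0x19, "e": 0x11,
    "f": 0x0B, "g": 0x1B, "h": 0x13, "i": 0x0A, "j": 0x1A,
}


def build_table():
    """Build the a-z table from the first ten letters."""
    table = dict(FIRST_DECADE)
    # k-t repeat a-j with dot 3 added.
    for base_letter, letter in zip("abcdefghij", "klmnopqrst"):
        table[letter] = FIRST_DECADE[base_letter] | 0x04
    # u, v, x, y, z repeat a-e with dots 3 and 6 added.
    for base_letter, letter in zip("abcde", "uvxyz"):
        table[letter] = FIRST_DECADE[base_letter] | 0x24
    # w does not follow the pattern: dots 2, 4, 5, 6.
    table["w"] = 0x3A
    return table


TABLE = build_table()


def to_braille(text):
    """Translate letters one by one; pass every other character through unchanged."""
    return "".join(
        chr(BRAILLE_BASE + TABLE[ch]) if ch in TABLE else ch
        for ch in text.lower()
    )


print(to_braille("hello"))
```

For `"hello"` this produces the same cells as the Pybraille output above. Punctuation, digits, and capitalization need extra rules that this sketch does not attempt.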


## **EasyOCR**

Several competing packages offer scene text detection out of the box. EasyOCR is simple to set up and performs well when GPU-accelerated, though it is still not fast enough for real-time use.

Installation:
```
pip install git+https://github.com/jaidedai/easyocr.git
```
Note: with plain pip, this may trigger a reinstall of PyTorch. If it does, reinstall the PyTorch build that matches your CUDA version from [here](https://pytorch.org).  
Check your CUDA version with `nvcc --version`.

In [3]:
import easyocr
reader = easyocr.Reader(['en'], gpu=True)



## **PaddleOCR**

PaddleOCR offers multiple models, including pruned and quantized models for mobile deployment, which is exactly what we need.  
It also comes with a *layout analysis* feature (auto-detects title/picture/table/etc.), which could prove useful.

Installation:
```
pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
pip install paddleocr
```
Note: on Windows, you may also need to install shapely (`pip install shapely`).

In [1]:
from paddleocr import PaddleOCR, draw_ocr
ocr = PaddleOCR(lang="en")  # need to run only once to download and load model into memory

[2022/05/27 14:39:41] ppocr DEBUG: Namespace(help='==SUPPRESS==', use_gpu=True, ir_optim=True, use_tensorrt=False, min_subgraph_size=15, precision='fp32', gpu_mem=500, image_dir=None, det_algorithm='DB', det_model_dir='C:\\Users\\hocbu/.paddleocr/whl\\det\\en\\en_PP-OCRv3_det_infer', det_limit_side_len=960, det_limit_type='max', det_db_thresh=0.3, det_db_box_thresh=0.6, det_db_unclip_ratio=1.5, max_batch_size=10, use_dilation=False, det_db_score_mode='fast', det_east_score_thresh=0.8, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_sast_score_thresh=0.5, det_sast_nms_thresh=0.2, det_sast_polygon=False, det_pse_thresh=0, det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_box_type='quad', det_pse_scale=1, scales=[8, 16, 32], alpha=1.0, beta=1.0, fourier_degree=5, det_fce_box_type='poly', rec_algorithm='SVTR_LCNet', rec_model_dir='C:\\Users\\hocbu/.paddleocr/whl\\rec\\en\\en_PP-OCRv3_rec_infer', rec_image_shape='3, 48, 320', rec_batch_num=6, max_text_length=25, rec_char_dict_pa

### Installation Log (tells default model) (can delete later)

```
download https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_ppocr_mobile_v2.0_det_infer.tar to C:\Users\hocbu/.paddleocr/whl\det\en\en_ppocr_mobile_v2.0_det_infer\en_ppocr_mobile_v2.0_det_infer.tar
100%|██████████| 3.16M/3.16M [00:10<00:00, 313kiB/s] 
download https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar to C:\Users\hocbu/.paddleocr/whl\rec\en\en_number_mobile_v2.0_rec_infer\en_number_mobile_v2.0_rec_infer.tar
100%|██████████| 2.70M/2.70M [00:09<00:00, 277kiB/s] 
download https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar to C:\Users\hocbu/.paddleocr/whl\cls\ch_ppocr_mobile_v2.0_cls_infer\ch_ppocr_mobile_v2.0_cls_infer.tar
100%|██████████| 1.45M/1.45M [00:08<00:00, 175kiB/s] 
[2022/04/10 17:03:35] ppocr DEBUG: Namespace(help='==SUPPRESS==', use_gpu=True, ir_optim=True, use_tensorrt=False, min_subgraph_size=15, precision='fp32', gpu_mem=500, image_dir=None, det_algorithm='DB', det_model_dir='C:\\Users\\hocbu/.paddleocr/whl\\det\\en\\en_ppocr_mobile_v2.0_det_infer', det_limit_side_len=960, det_limit_type='max', det_db_thresh=0.3, det_db_box_thresh=0.6, det_db_unclip_ratio=1.5, max_batch_size=10, use_dilation=False, det_db_score_mode='fast', det_east_score_thresh=0.8, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_sast_score_thresh=0.5, det_sast_nms_thresh=0.2, det_sast_polygon=False, det_pse_thresh=0, det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_box_type='quad', det_pse_scale=1, scales=[8, 16, 32], alpha=1.0, beta=1.0, fourier_degree=5, det_fce_box_type='poly', rec_algorithm='CRNN', rec_model_dir='C:\\Users\\hocbu/.paddleocr/whl\\rec\\en\\en_number_mobile_v2.0_rec_infer', rec_image_shape='3, 32, 320', rec_batch_num=6, max_text_length=25, rec_char_dict_path='C:\\Users\\hocbu\\AppData\\Roaming\\Python\\Python39\\site-packages\\paddleocr\\ppocr\\utils\\en_dict.txt', use_space_char=True, vis_font_path='./doc/fonts/simfang.ttf', drop_score=0.5, e2e_algorithm='PGNet', e2e_model_dir=None, e2e_limit_side_len=768, e2e_limit_type='max', e2e_pgnet_score_thresh=0.5, e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_pgnet_valid_set='totaltext', e2e_pgnet_mode='fast', use_angle_cls=True, cls_model_dir='C:\\Users\\hocbu/.paddleocr/whl\\cls\\ch_ppocr_mobile_v2.0_cls_infer', cls_image_shape='3, 48, 192', label_list=['0', '180'], cls_batch_num=6, cls_thresh=0.9, enable_mkldnn=False, cpu_threads=10, use_pdserving=False, warmup=False, draw_img_save_dir='./inference_results', save_crop_res=False, crop_res_save_dir='./output', use_mp=False, total_process_num=1, process_id=0, benchmark=False, save_log_path='./log_output/', show_log=True, use_onnx=False, output='./output', 
table_max_len=488, table_model_dir=None, table_char_type='en', table_char_dict_path=None, layout_path_model='lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config', layout_label_map=None, mode='structure', lang='en', det=True, rec=True, type='ocr', ocr_version='PP-OCRv2', structure_version='STRUCTURE')
```

# **1. Single Image**

## Objectives
- Scene text detection
- Braille translation
- Text highlight and braille caption

### Read Image

In [5]:
folder = './demo_image/'
filename = 'demodemo2.jpg'

In [6]:
img = cv2.imread(folder+filename)
img_d, sf = imresize(img) # downscale, remember scaling factor
# Running OCR on the resized image yields integer coordinates (exact reason unknown)

img_ = img.copy()  # backup
img_d_ = img_d.copy() # backup

In [7]:
# visualize images
# print(img_d.shape)
# print(img.shape)
# plt.imshow(img)
# plt.imshow(img_d)

### Scene Text Detection (EasyOCR)

EasyOCR may have a built-in resolution reduction (and recovery) feature. Repeatedly rescaling the detected points by hand afterwards is a little annoying.
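
Until that is confirmed, the rescaling can at least be kept in one place. The sketch below assumes, as the cells in this notebook do with `int(n*sf)`, that multiplying a downscaled coordinate by `sf` gives the original-image coordinate, and that each box is a list of four `[x, y]` points. The helper names `scale_box` and `bounding_corners` are hypothetical and are not part of `utils`.

```python
def scale_box(box, sf):
    """Map a four-point box from downscaled coordinates to original-image pixels."""
    return [(int(x * sf), int(y * sf)) for x, y in box]


def bounding_corners(box, sf=1.0):
    """Return (top_left, bottom_right) of the axis-aligned rectangle around the box.

    Using min/max over all four points also covers slightly rotated boxes,
    where points 0 and 2 are not exactly the extreme corners.
    """
    scaled = scale_box(box, sf)
    xs = [x for x, _ in scaled]
    ys = [y for _, y in scaled]
    return (min(xs), min(ys)), (max(xs), max(ys))


# Example with a box in the format printed by the detection cell.
box = [[90, 146], [156, 146], [156, 172], [90, 172]]
print(bounding_corners(box, sf=2.0))
```

The corners come back as integer tuples, which is also the form that the drawing code later in the notebook needs.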

In [8]:
result = reader.readtext(img_d)

It is still a bit slow. Should we try quantization or tree shaking?  

According to a [Medium article](https://medium.com/swlh/ocr-engine-comparison-tesseract-vs-easyocr-729be893d3ae), OCR takes 0.07 s, but scene text detection is probably slower than that. Does the webcam window still update while text detection is running?
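
On the question of keeping the webcam window updating during detection: one option would be to run detection in a background thread and let the display loop draw the most recent finished result. The sketch below uses only the standard library; `LatestResultWorker` is a hypothetical helper, and whether it actually helps with EasyOCR (which depends on how much of the call releases the GIL) has not been measured here.

```python
import threading


class LatestResultWorker:
    """Run a slow function on the newest submitted item in a background thread."""

    def __init__(self, fn):
        self._fn = fn
        self._lock = threading.Lock()
        self._wake = threading.Event()
        self._pending = None
        self._has_pending = False
        self._result = None
        self._closed = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, item):
        """Replace any not-yet-started item with this newer one."""
        with self._lock:
            self._pending = item
            self._has_pending = True
        self._wake.set()

    def latest(self):
        """Return the most recent finished result (None until one exists)."""
        with self._lock:
            return self._result

    def close(self):
        """Ask the worker to stop and wait for it to finish."""
        with self._lock:
            self._closed = True
        self._wake.set()
        self._thread.join()

    def _run(self):
        while True:
            self._wake.wait()
            self._wake.clear()
            with self._lock:
                if self._closed:
                    return
                if not self._has_pending:
                    continue
                item = self._pending
                self._has_pending = False
            result = self._fn(item)  # the slow call runs outside the lock
            with self._lock:
                self._result = result
```

In the video loop this might look like `worker = LatestResultWorker(reader.readtext)` once, then `worker.submit(frame_d)` and `texts = worker.latest() or []` on every frame, so drawing never waits for detection. The highlighted boxes would then lag the live image by up to one detection cycle.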

### Scene Text Detection (PaddleOCR)

In [9]:
# result2 = ocr.ocr(img_d, cls=True)

### Highlight Text Regions
✅

### Text Region Selection
- Make a selection and save it
  - There should be a single selection. If multiple texts fall within the tolerance, choose the closest one
- Apply a stronger highlight to the selection
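
The selection rule in the list above can be sketched as follows. The `distance` helper the notebook imports from `utils` is not shown, so `box_distance` here is an assumed stand-in (distance from a point to an axis-aligned rectangle, 0 when the point is inside), and `select_closest` is a hypothetical name.

```python
import math


def box_distance(pt1, pt2, point):
    """Distance from point to the rectangle with corners pt1 (top-left) and pt2 (bottom-right)."""
    dx = max(pt1[0] - point[0], 0, point[0] - pt2[0])
    dy = max(pt1[1] - point[1], 0, point[1] - pt2[1])
    return math.hypot(dx, dy)


def select_closest(boxes, point, tolerance=200):
    """Return the index of the box closest to point, or None if none is within tolerance."""
    best_index, best_dist = None, tolerance
    for index, (pt1, pt2) in enumerate(boxes):
        dist = box_distance(pt1, pt2, point)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


boxes = [((0, 0), (10, 10)), ((100, 100), (120, 120))]
print(select_closest(boxes, (90, 90)))      # second box is closer
print(select_closest(boxes, (1000, 1000)))  # nothing within tolerance
```

Starting `best_dist` at the tolerance gives a single selection at most, which matches the `min_dist = 200` pattern used in the cell below.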

In [10]:
center = (int(img.shape[1]/2), int(img.shape[0]/2))
gaze_point = (999,999) # placeholder, not used yet (selection below is relative to center); change this, possibly make it interactive

In [11]:
# matplotlib details
fontsize = 0.025
plt.figure(figsize=(15, 15))

# highlight
weak = np.zeros(shape=img.shape)
strong = np.zeros(shape=img.shape)

# selection
min_dist = 200
selection = None

# iterate through each text
for i, r in enumerate(result): 
        
    # debug
    print(r)

    # points
    pt1 = [int(n*sf) for n in r[0][0]]
    pt2 = [int(n*sf) for n in r[0][2]]

    # add highlight
    try: weak = add_mask(weak, pt1, pt2)
    except Exception: print('highlight error')

    # add caption
    try:
        br = convertText(r[1])
        plt.figtext(0.5, 0.065+fontsize-fontsize*2*i, r[1], ha="center", fontsize=18)
        plt.figtext(0.5, 0.065+0.005-fontsize*2*i, br, ha="center", fontsize=18, bbox={"facecolor":"orange", "alpha":0.5, "pad":1})
    except Exception: 
        plt.figtext(0.5, 0.065+fontsize-fontsize*2*i, r[1], ha="center", fontsize=18)
        plt.figtext(0.5, 0.065+0.005-fontsize*2*i, "(error: contains special symbols)", ha="center", fontsize=18, bbox={"facecolor":"orange", "alpha":0.5, "pad":1})

    # distance
    dist = distance([pt1, pt2], center)
    # print(dist, r[1])
    if dist < min_dist: 
        min_dist = dist
        selection = r

if selection is not None:
    # OpenCV's rectangle does not accept coordinates given as lists; with lists the keyword call fails with
    # "TypeError: argument for rectangle() given by name ('thickness') and position (4)", so use tuples
    pt1 = tuple(int(n*sf) for n in selection[0][0])
    pt2 = tuple(int(n*sf) for n in selection[0][2])
    strong = add_mask(strong, pt1, pt2)
    img = cv2.rectangle(img, pt1, pt2, (255,255,255), thickness=20) # selection outline
    print('Selected: ', selection[1])
else:
    print('Selected: nothing within min_dist of center')
img = cv2.rectangle(img, center, center, (0,50,0), thickness=50) # visualize center
highlighted = apply_highlight(img, weak, strong)

plt.imshow(highlighted) # this line has to be in this cell.

([[90, 146], [156, 146], [156, 172], [90, 172]], 'AUDIO &', 0.9374154164120676)
([[101, 162], [193, 162], [193, 190], [101, 190]], 'RECORDING', 0.9990951954757775)
([[113, 179], [182, 179], [182, 206], [113, 206]], 'STUDiOS', 0.6264606800384376)
([[155.23076923076923, 149.15384615384616], [202.90531842243738, 154.3919196524136], [199.76923076923077, 176.84615384615384], [152.09468157756262, 170.6080803475864]], 'VIDEO', 0.9853656861861788)

# **2. Interactive Video**

## Objectives
- Receive webcam stream ✅
    - DroidCam prototype
- Interactive selection 
    - Intuitive selection behavior: hysteresis 🆇
    - Braille output on bottom (or as tooltip) ✅
- Real-time optimization 
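
The hysteresis item is not implemented yet. One possible shape for it, as a sketch under assumptions: keep the current selection unless another text is closer by more than a margin, so the selection does not flicker between two nearby texts. The function name `update_selection`, the `margin` value, and the use of a dictionary keyed by some stable text identity are all hypothetical.

```python
def update_selection(current, distances, margin=30.0, tolerance=200.0):
    """Pick the selected text with hysteresis.

    current: key of the currently selected text, or None.
    distances: dict mapping each detected text's key to its distance from the
        selection point (for example the frame center).
    A new text replaces the current one only if it is closer by more than margin.
    """
    in_range = {key: d for key, d in distances.items() if d < tolerance}
    if not in_range:
        return None
    best = min(in_range, key=in_range.get)
    if current in in_range and in_range[current] - in_range[best] <= margin:
        return current  # not enough improvement to justify switching
    return best


selected = None
for frame_distances in [{"EXIT": 50, "STAIRS": 60}, {"EXIT": 55, "STAIRS": 45}, {"EXIT": 90, "STAIRS": 20}]:
    selected = update_selection(selected, frame_distances)
    print(selected)
```

In the example the selection stays on the first text while the second is only slightly closer, and switches once the gap exceeds the margin. Matching texts across frames (here by their string) is itself an assumption; OCR output can vary from frame to frame.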

### Pygame 
- rationale: more flexible, powerful graphics than OpenCV GUI
  - possible addition of Dot Display module
  - possible addition of text presence animation (propagating/expanding dot wave)

In [113]:
import pygame
import cv2
import numpy as np
import time

In [114]:
def process(img, texts, sf):
    # selection
    min_dist = 200
    selection = None
    center = (int(img.shape[1]/2), int(img.shape[0]/2))


    # highlight mask
    weak = np.zeros(shape=img.shape)
    strong = np.zeros(shape=img.shape)

    # iterate through each text
    for i, r in enumerate(texts): 
        # print(r)

        # points
        pt1 = [int(n*sf) for n in r[0][0]]
        pt2 = [int(n*sf) for n in r[0][2]]

        # add highlight
        try: weak = add_mask(weak, pt1, pt2)
        except Exception: print('highlight error')

        # select
        dist = distance([pt1, pt2], center)
        if dist < min_dist: 
            min_dist = dist
            selection = r

    # selected text
    # print('Selected text: ', selection)
    if selection:
        pt1, pt2 = [int(n*sf) for n in selection[0][0]], [int(n*sf) for n in selection[0][2]]
        pt1, pt2 = tuple(pt1), tuple(pt2) # OpenCV's rectangle doesn't like coordinates given in list
        strong = add_mask(strong, pt1, pt2)
        img = cv2.rectangle(img, pt1, pt2, (255,255,255), 15) # outline

    # visualization
    img = cv2.rectangle(img, center, center, (0,50,0), 20) # center
    highlighted = apply_highlight(img, weak, strong)

    return highlighted, selection
    

In [115]:
pygame.init()
pygame.display.set_caption("Camera Braille Translation")
surface = pygame.display.set_mode([1280, 720])

# OpenCV
cap = cv2.VideoCapture(1)

# OpenCV FPS
fps = cap.get(cv2.CAP_PROP_FPS)
cap.set(cv2.CAP_PROP_FPS, 60)

# pygame
pygame.font.init()
main_font = pygame.font.SysFont('segoeuisymbol', 30) # for Braille
side_font = pygame.font.SysFont('segoeuisymbol', 15) # for Braille

background_color = (30,30,30)

# pygame FPS
clock = pygame.time.Clock()

initialized = False
stop = False


while not initialized:
    success, frame = cap.read()
    if success: break

while not stop:
    # frames per second 
    clock.tick()
    # profiling
    ts_base = time.time() # timestamp base

    # read cam
    success, frame = cap.read()
    if not success: print("not success")
    ts_read = time.time() - ts_base # time: reading from cam

    # scene text detection
    frame_d, sf = imresize(frame)
    texts = reader.readtext(frame_d)
    # print(texts)
    ts_ocr = time.time()-ts_base - ts_read # time: running scene text detection

    # ANNOTATION
    # frame processing
    frame, sel = process(frame, texts, sf)
    # each stage time = time elapsed since ts_base minus all earlier stages
    ts_hili = time.time()-ts_base - ts_read - ts_ocr # time: highlighting
    # pre-pygame-processing
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame = np.rot90(np.fliplr(frame))

    # RENDERING
    # background
    surface.fill(background_color)
    # frame
    frame_surface = pygame.surfarray.make_surface(frame)
    surface.blit(frame_surface, (0,0))
    ts_game = time.time()-ts_base - ts_read - ts_ocr - ts_hili # time: color conversion and frame rendering

    # DASHBOARD
    # selected text (in English and Braille)
    try: 
        selected_surf = main_font.render(sel[1], False, (255,255,255))
        surface.blit(selected_surf, (30,550))

        br = convertText(sel[1])
        b = main_font.render(br, False, (255,255,255))
        surface.blit(b, (30,610))
        # print(sel[1], br)
    except TypeError as e: pass # sel is None when nothing is selected
    ts_text = time.time()-ts_base - ts_read - ts_ocr - ts_hili - ts_game # time: rendering selected text
        
    # labels
    # selected
    selected_label_surf = side_font.render('English', False, (230,230,230))
    surface.blit(selected_label_surf, (30,535))
    # braille
    braille_label_surf = side_font.render('Grade I Braille', False, (230,230,230))
    surface.blit(braille_label_surf, (30,590))
    # framerate
    fps_text_surface = side_font.render('FPS:'+str(round(clock.get_fps(), 2)), False, (255,255,255))
    surface.blit(fps_text_surface, (950, 680))
    ts_label = time.time()-ts_base - ts_read - ts_ocr - ts_hili - ts_game - ts_text # time: rendering labels

    # profiling
    ts_sum = sum([ts_read, ts_ocr, ts_hili, ts_game, ts_text, ts_label])
    percentages = [round(ts_read/ts_sum*100),round(ts_ocr/ts_sum*100), round(ts_hili/ts_sum*100), round(ts_game/ts_sum*100), round(ts_text/ts_sum*100), round(ts_label/ts_sum*100)] # make this mapped
    percentages_label = ['Reading from camera', 'Running OCR', 'Highlighting', 'Rendering frame', 'Rendering text', 'Rendering label']

    # render title
    profile_surf = main_font.render('Profiler', False, (255,255,255))
    surface.blit(profile_surf, (950,490))
    # render stats
    for index, percentage in enumerate(percentages):
        surface.blit(side_font.render(str(percentage)+'%   '+percentages_label[index], False, (255,255,255)), (950, 540+(index*20)))

    # pygame essentials
    pygame.display.flip()
    # end management
    for event in pygame.event.get():
        if event.type == pygame.QUIT: 
            pygame.quit()
            stop = True

    # except TypeError as te: 
    #     print(te) # occurs in the beginning, when capture has not finished initialization

cap.release()

# calculate mismatch between time.time based vs. pygame's estimated FPS
# so we can check if we're missing something from profiling

# https://stackoverflow.com/questions/59948996/how-to-use-webcam-as-a-screen-of-pygame
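
The mismatch check mentioned in the comments above could be sketched like this: compare the frame period implied by the FPS estimate with the sum of the timed stages. `profiling_gap` is a hypothetical helper; it assumes the stage times are in seconds and that `fps` comes from something like `clock.get_fps()`.

```python
def profiling_gap(stage_seconds, fps):
    """Return the part of one frame period that the stage timers do not cover.

    stage_seconds: per-stage durations for one frame, in seconds.
    fps: frame rate estimate for the same period.
    Returns None when fps is not positive, in case the estimate is not ready yet.
    """
    if fps <= 0:
        return None
    frame_period = 1.0 / fps
    return frame_period - sum(stage_seconds)


# Example: at 20 FPS a frame takes 0.05 s; stages summing to 0.04 s leave 0.01 s unexplained.
print(profiling_gap([0.01, 0.03], 20.0))
```

Inside the loop this might be called as `profiling_gap([ts_read, ts_ocr, ts_hili, ts_game, ts_text, ts_label], clock.get_fps())`. A clearly positive gap would point at untimed work such as `pygame.display.flip()` or event handling.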

General Observation Notes
- Selection outline too thick
- Framerate is usable, but worsens as the amount of text increases

### Optimization

### Interactivity & UX
- DroidCam image size meter
- selection change event listener