# Using OCR to read players' numbers

In this notebook:
1. I use the EasyOCR package to try to read the numbers on the players' jerseys

This code was inspired by [this](https://www.kaggle.com/jinssaa/jersey-number-detection-using-ocr) great kernel

The first thing is to install EasyOCR with `pip install`

In [None]:
! pip install -q easyocr

Followed by importing some basic libraries

In [None]:
import cv2
import numpy as np
import pandas as pd
import easyocr
import matplotlib.pyplot as plt

from tqdm.notebook import tqdm

from glob import glob
from random import sample
from PIL import Image, ImageFont, ImageDraw, ImageEnhance

from pathlib import Path

I also loaded some defaults like the font I will be using and the color for the overlay text

In [None]:
FONT = ImageFont.truetype("../input/arial-font/arial.ttf", 15)
GREEN = (57,255,20)

To use EasyOCR, you first need to initialize the model. This loads it (downloading it first if needed). You can pass the languages and whether the model should be loaded onto the GPU (plus some other parameters)

In [None]:
reader = easyocr.Reader(['en'], gpu=False)

To apply the OCR model to an image, it is as easy as a single line

In [None]:
file = '../input/nfl-helmet-safety-cropped-jerseys-dataset-png/57583_000082_Sideline_342_H36.png'
reader.readtext(file, allowlist ='0123456789')

But this didn't work... The reason is that our crop is too small. A quick-and-dirty hack is to rescale the image

In [None]:
img = Image.open(file)
img

In [None]:
img = Image.open(file).resize((128,128))
img
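The fixed `(128,128)` resize above stretches any crop that is not square. As a minimal sketch, assuming you would rather keep the aspect ratio, the target size can be computed from the original one. The helper name `upscaled_size` and the `min_side` parameter are hypothetical and not part of PIL or EasyOCR.

```python
def upscaled_size(width, height, min_side=128):
    """Return a (width, height) whose shorter side is at least `min_side`.

    The aspect ratio is preserved; sizes that are already large enough
    are returned unchanged.
    """
    shorter = min(width, height)
    if shorter >= min_side:
        return width, height
    scale = min_side / shorter
    return round(width * scale), round(height * scale)


# Made-up crop size, just to show the call.
print(upscaled_size(32, 48))
```

With the PIL image from the cells above this could be used as `img.resize(upscaled_size(*img.size))`. Whether it reads more numbers than the plain square resize is something I have not measured.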

In [None]:
reader.readtext(np.array(img), allowlist ='0123456789')

The model now reads the number '36' (the second element of the result). `readtext` returns a list with one entry per detected text region, and each entry contains:
1. Bounding box
1. Label
1. Confidence
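Since each result is a `(bounding box, label, confidence)` entry, picking the most confident reading is a small filtering step. The sketch below uses a hand-written list with made-up values in place of real `readtext` output, and `best_prediction` is a hypothetical helper name.

```python
def best_prediction(results, conf_th=0.0):
    """Return the most confident (bbox, label, conf) entry above `conf_th`, or None."""
    kept = [r for r in results if r[2] > conf_th]
    if not kept:
        return None
    return max(kept, key=lambda r: r[2])


# Hand-written stand-in for the output of reader.readtext (values are made up).
# Each bounding box is a list of four [x, y] corner points.
fake_results = [
    ([[10, 20], [60, 20], [60, 70], [10, 70]], '36', 0.83),
    ([[70, 25], [90, 25], [90, 60], [70, 60]], '3', 0.12),
]

bbox, label, conf = best_prediction(fake_results, conf_th=0.5)
print(label, conf)
```

Returning `None` when nothing passes the threshold makes the "no prediction" case explicit, which matters here because most crops produce no reading at all.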

## Running against 1k images

Now that we have the basics covered, we can run this through a bunch of images. The following code does this:
1. Get all the files inside my custom cropped dataset
1. Make a new directory named `annotated`
1. Loop through the first N pictures and:
    1. Try to read any number on the image
    1. If it finds anything above the confidence threshold, write the bbox, label and confidence over the image and save it inside the `annotated` folder
1. Zip the folder and delete the original files

With that, you can simply download the zipped folder and see the final results

In [None]:
FILES = glob('../input/nfl-helmet-safety-cropped-jerseys-dataset-png/*.png')

In [None]:
!mkdir -p annotated

In [None]:
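N = 1000
conf_th = 0.01  # minimum confidence for a detection to be drawn
correct_predictions = []  # one boolean per detection, across all images
for file in tqdm(FILES[:N]):
    file = Path(file)
    # upscale the crop so the OCR model has enough pixels to work with
    img = Image.open(file).resize((128,128))
    draw = ImageDraw.Draw(img)
    save = False  # only save images with at least one drawn detection
    for bbox, label, conf in reader.readtext(np.array(img), allowlist ='0123456789'):
        # ground truth is the last filename token without its leading letter, e.g. 'H36' -> '36'
        correct = file.stem.split('_')[-1][1:] == label
        correct_predictions.append(correct)
        if conf > conf_th:
            save = True
            # bbox holds four corner points; corners 0 and 2 are opposite ends of the box
            draw.rectangle((tuple(bbox[0]), tuple(bbox[2])), outline = GREEN, width = 2)
            # write 'label(confidence)' centered just above the box
            draw.text(((bbox[0][0] + bbox[1][0])/2, bbox[0][1] - 2), f'{label}({conf:.2f})', anchor="ms", font=FONT, fill = GREEN)
    if save:
        img.save(Path('./annotated')/file.name)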
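The ground-truth check in the loop relies on the file naming: the last `_`-separated token of the stem is one letter followed by the jersey number, as in `H36`. A minimal sketch of that step as a standalone function, assuming every file follows this pattern (`label_from_filename` is a hypothetical helper name):

```python
from pathlib import Path


def label_from_filename(path):
    """Extract the jersey number from a name like '..._H36.png' -> '36'."""
    token = Path(path).stem.split('_')[-1]  # last token, e.g. 'H36'
    return token[1:]                        # drop the single leading letter


print(label_from_filename('57583_000082_Sideline_342_H36.png'))
```

Note that the comparison is done on strings, so a reading of `'036'` would not count as a match for `'36'`.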

As you can see, the accuracy is not very good. About 30% of the predictions are right when the model predicts anything at all. That number alone is not that bad. The problem is that the algorithm hardly predicts anything. Most of the time the numbers are too blurry or simply too occluded for the model to make any prediction.

In [None]:
print(f'Average number of RELATIVE correct predictions {np.array(correct_predictions).mean()}')
print(f'Average number of TOTAL correct predictions {np.array(correct_predictions).sum()/N}')
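The two numbers printed above mix two effects: how often the model predicts anything, and how often a prediction is right. As a rough sketch, assuming the results were stored per image instead of in one flat list, the two can be reported separately. The function `summarize` and the metric names are my own choices for illustration, and the example data is made up.

```python
def summarize(per_image):
    """Summarize OCR results.

    `per_image` holds one list per image, with one boolean per detection
    (True if the detected label matched the ground truth).
    """
    n_images = len(per_image)
    detections = [c for image in per_image for c in image]
    n_with_detection = sum(1 for image in per_image if image)
    n_images_correct = sum(1 for image in per_image if any(image))
    return {
        # share of images where the model predicted anything at all
        'coverage': n_with_detection / n_images if n_images else 0.0,
        # share of detections that were correct
        'precision': sum(detections) / len(detections) if detections else 0.0,
        # share of images with at least one correct detection
        'image_accuracy': n_images_correct / n_images if n_images else 0.0,
    }


# Made-up example: 5 images, only 2 of them produced any detection.
example = [[], [True], [], [False, True], []]
print(summarize(example))
```

Dividing by the number of images actually processed, rather than by `N`, also keeps the numbers meaningful when the dataset has fewer than `N` files.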

## Thanks for reading

That would be all for this kernel. As you can see, this technique is not strong enough to be used alone. My hope in sharing this publicly is that someone can improve this approach so we can actually use it.

In [None]:
from zipfile import ZipFile
import shutil
import os
def zip_folder(folder, rm_original = True):
    """Zip every file under `folder` into '<folder name>.zip' in the working directory."""
    folder = os.path.normpath(folder)
    # create a single ZipFile object named after the folder
    with ZipFile(os.path.basename(folder) + '.zip', 'w') as zipObj:
        # iterate over all the files in the directory tree
        for folderName, subfolders, filenames in os.walk(folder):
            for filename in filenames:
                # create complete filepath of file in directory
                filePath = os.path.join(folderName, filename)
                # add file to zip, keeping its path relative to `folder`
                zipObj.write(filePath, os.path.relpath(filePath, folder))
    # delete the original folder to free up space
    if rm_original:
        shutil.rmtree(folder)

In [None]:
zip_folder('annotated')
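For reference, the standard library can do the same job with `shutil.make_archive`. The sketch below is an alternative, not the function used above; `archive_folder` is a hypothetical name and the demo runs on a temporary folder with a dummy file so that it does not touch `annotated`.

```python
import shutil
import tempfile
from pathlib import Path
from zipfile import ZipFile


def archive_folder(folder, rm_original=False):
    """Zip `folder` into '<folder>.zip' next to it and return the archive path."""
    folder = Path(folder)
    # base_name is given without the extension; make_archive appends '.zip'
    archive = shutil.make_archive(str(folder), 'zip', root_dir=str(folder))
    if rm_original:
        shutil.rmtree(folder)
    return archive


# Self-contained demo on a temporary folder containing one dummy file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / 'annotated'
    demo.mkdir()
    (demo / 'example.txt').write_text('placeholder')
    archive = archive_folder(demo)
    with ZipFile(archive) as zf:
        print(Path(archive).name, zf.namelist())
```

Here the archive is written next to the folder rather than into the current working directory, which is a small difference from `zip_folder`.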