## High-level usage of Receipt Recognizer

The pipeline first detects text regions with a detector called CRAFT, then recognizes the characters with two models, Google's Tesseract and a ResNet + BiLSTM + attention model, whose outputs are combined in a simple ensemble. The last planned step is a NER model that detects business names, dates and total amounts; as explained in the sections below, it is not trained yet.

#### Brief explanation
I tried to keep the code as modular as possible. I want to move it to a factory design pattern so that the model classes can be called separately. Usage is fairly easy, with a few flaws that I show below. It starts by loading the pretrained detection model and then adds the other models on top of it. I wanted to add a NER model on top of everything but could not because of time constraints. I will train one with an earlier repository of mine once its bugs are resolved, and share it in this repository.

#### Analysis part
Since this is image processing, there is not much analysis to do. However, I wrote an analysis part with filters, in particular a derivative filter that finds the horizontal and vertical edges. After applying max pooling to its output, the regions that contain characters can be clearly separated from those that do not. These regions usually span more than one line, though, so running OCR on them would produce incoherent sentence sequences, which would hurt the NER model that comes afterwards (I plan to use a pre-trained large uncased BERT model). That is why I used a large and somewhat heavy detection model. The model is still fairly easy to understand and has few loops and dynamic parts, so it can be pruned easily. It can also be quantized and converted to an intermediate representation such as ONNX, which would not only make the model faster but also remove its framework dependencies and make it easier to deploy on other machines.
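The derivative filter plus max pooling idea can be illustrated with a small NumPy sketch. This is not the analysis code from the repository: the function names, the block size `k` and the `threshold` are assumptions chosen for the example, and the input is a synthetic image rather than a receipt.

```python
import numpy as np

def edge_strength(gray):
    """Sum of absolute horizontal and vertical first differences."""
    gray = gray.astype(np.float32)  # avoid uint8 wrap-around when subtracting
    dx = np.zeros_like(gray)
    dy = np.zeros_like(gray)
    dx[:, 1:] = np.abs(gray[:, 1:] - gray[:, :-1])  # responds to vertical edges
    dy[1:, :] = np.abs(gray[1:, :] - gray[:-1, :])  # responds to horizontal edges
    return dx + dy

def max_pool(arr, k):
    """Non-overlapping k x k max pooling; rows/columns that do not fill a block are dropped."""
    h, w = arr.shape
    h, w = h - h % k, w - w % k
    blocks = arr[:h, :w].reshape(h // k, k, w // k, k)
    return blocks.max(axis=(1, 3))

def text_mask(gray, k=8, threshold=50.0):
    """True for every k x k block that contains at least one strong edge."""
    return max_pool(edge_strength(gray), k) > threshold

# Synthetic page: white background with alternating dark columns that imitate character strokes.
page = np.full((32, 32), 255, dtype=np.uint8)
page[8:15, 8:23:2] = 0
mask = text_mask(page)
print(mask.astype(int))
```

Blocks that cover the strokes are flagged and the blank background is not, which is the separation described above. On a real receipt the flagged blocks tend to merge into multi-line regions, which is the reason given for using a dedicated detector instead.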

In [1]:
!pip install receiptrecognizer



In [2]:
import os
import cv2
from receiptrecognizer import ReceiptRecognizer as rr
from receiptrecognizer.utils import ImageUtils
from receiptrecognizer.models import Tesseract

In [3]:
# Initialize the class with the first model, the detector
model = rr.from_pretrained("craft_mlt_25k.pth")
# Put the detector in eval mode
model.detector.eval()

INFO:receiptrecognizer.model_downloader:Downloding model to /home/kemalaraz/receipt_models from google drive


Downloading 1Jk4eGD7crsqCCg9C9VjCLkMN3ze8kutZ into /home/kemalaraz/receipt_models/craft_mlt_25k.pth... 
79.3 MiB          

INFO:receiptrecognizer.module:Model downloaded to /home/kemalaraz/receipt_models/craft_mlt_25k.pth


Done.


Craft(
  (backbone): VGG16(
    (slice1): Sequential(
      (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (7): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (8): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (9): ReLU(inplace=True)
      (10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (11): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (slice2): Sequential(
      (12): ReLU(inplace=True)
      (13): MaxPool2d(kernel_

In [4]:
test_path = "src/test_instances/raw_images" # folder that holds the test images
results_path = "src/test_instances/results_detection" # folder where the results are written

### One image example (a folder with several images also works)
The Google Drive download link for the recognition model is broken, so the automatic download will not work. Please download the model manually from
https://drive.google.com/file/d/1b59rXuGGmKne1AuHnkgDzoYgKeETNMv9/view and put it under /home/user/receipt_models on Ubuntu or C:\Users\username\receipt_models on Windows.
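Whether the manually downloaded file is in the expected place can be checked before running the recognition step. The following is a minimal sketch, assuming the models folder is `receipt_models` inside the user's home directory as described above. The helpers `local_model_path` and `is_model_present` and the file name `recognition_model.pth` are hypothetical and not part of receiptrecognizer; use the actual name of the downloaded file.

```python
import os

def local_model_path(filename, models_dir=None):
    """Return the expected local path of a manually downloaded model file."""
    if models_dir is None:
        # Assumption: models live in ~/receipt_models on both Ubuntu and Windows.
        models_dir = os.path.join(os.path.expanduser("~"), "receipt_models")
    return os.path.join(models_dir, filename)

def is_model_present(filename, models_dir=None):
    """True if the model file exists at the expected location."""
    return os.path.isfile(local_model_path(filename, models_dir))

# "recognition_model.pth" is a placeholder name, not the real file name.
if not is_model_present("recognition_model.pth"):
    print("Recognition model not found at", local_model_path("recognition_model.pth"))
```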

In [9]:
all_results = []
for image_name in sorted(os.listdir(test_path)): # TODO: Could be replaced with imutils.paths.list_images(test_path)
    image_path = os.path.join(test_path, image_name)
    image = ImageUtils.loadImage(image_path)

    # Get the outcome of the detection model
    bboxes, polys, heatmap = model.detection(image)

    # crop images according to bboxes
    cropped_images = ImageUtils.crop_image(image[:,:,::-1], bboxes)

    image_results_path = os.path.join(results_path, os.path.basename(image_path).split("-")[0])
    # exist_ok=True already covers the case where the folder exists
    os.makedirs(image_results_path, exist_ok=True)

    # Get tesseract results and write cropped images to a folder
    tesseract_results = []
    for e, cropped_image in enumerate(cropped_images):
        cv2.imwrite(os.path.join(image_results_path, f"{e}-det-{os.path.basename(image_path)}"), cropped_image)
        tesseract_results.append(Tesseract.predict(cropped_image))

    # Run the recognition model on the folder that the cropped images were written to
    craft_rec_results = model.recognition(image_folder = image_results_path)
    # Save the detector heatmap next to the results
    filename, file_ext = os.path.splitext(os.path.basename(image_path))
    mask_file = os.path.join(results_path, "res_" + filename + "_mask.jpg")
    cv2.imwrite(mask_file, heatmap)

    #FileHandler.saveResult(image_path, image[:,:,::-1], bboxes)
    
    # Ensemble Tesseract's and the recognition model's results, then clean them up with regexes to get the final results
    aligned_results = model.allign_char_results(tesseract_results, craft_rec_results)
    all_results.append(aligned_results)

In [10]:
print(all_results)

[['Dona Mercedes Restaurant', '1030 1/2 San Fernando Rd', 'San Fernando CA 91341', 'Vero', 'CENTERL', '1 CHicharon', '8225', '3 Pupusa Queso', '$6.75', '1 Platanos Orden', '8775', '1 Diet coke', '$1.50', '2 Quesadilla salvadorena', '$4.00', 'SUBTOTAL: $22.25', 'TAX: $2.22', 'TOTAL: $24.47', 'TIP SUGGESTIONS', '18714440', '20%: $4.89', '25%: $6.12', 'Thank You!']]
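While no NER model is available, a field such as the total amount can be pulled from these lines with a plain regular expression. The sketch below is a stand-in technique, not the NER approach planned for this project and not part of receiptrecognizer; `find_total` and the `AMOUNT` pattern are hypothetical and only cover lines shaped like the ones in the output above.

```python
import re

# Matches amounts such as "$24.47" or "24.47" (two decimals required).
AMOUNT = re.compile(r"\$?\s*(\d+\.\d{2})")

def find_total(lines):
    """Return the amount on the first line that starts with 'TOTAL', or None."""
    for line in lines:
        # startswith skips 'SUBTOTAL', which only contains 'TOTAL' further in.
        if line.strip().upper().startswith("TOTAL"):
            match = AMOUNT.search(line)
            if match:
                return float(match.group(1))
    return None

lines = ['SUBTOTAL: $22.25', 'TAX: $2.22', 'TOTAL: $24.47', 'TIP SUGGESTIONS']
print(find_total(lines))  # 24.47
```

A rule like this breaks as soon as the OCR misreads the keyword or the currency symbol (compare `8225` for `$2.25` in the output above), which is one reason a learned NER model is the longer-term plan.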


#### Explainable AI
I saved what the CRAFT model attends to at inference time (the heatmap written in the loop above) to understand where the model concentrates when it creates the bounding boxes. Given time, I can extend this to all layers and add it to the other models.

### NER
I was planning to train a NER model to find the requested named entities, but due to time constraints I was unable to do so. I did label the dataset, which is also shared, and I was editing my named entity recognition repository, https://github.com/kemalaraz/NamedEntityRecognizer. However, that repository has some flaws in handling the CoNLL format, which is the format I labelled my dataset in, so I am skipping that part.
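For reference, CoNLL-style NER data puts one token and its tag on each line and separates sequences with a blank line. The sketch below shows one way to produce that layout with BIO tags from labelled receipt lines. It is an illustration only: `to_conll`, the `ORG` label, the whitespace tokenization and the simplification that a whole line carries a single label are assumptions, and the shared dataset may use different tags and columns.

```python
def to_conll(lines_with_labels):
    """Convert (text, label) pairs into CoNLL-style 'token tag' lines.

    label is an entity type such as "ORG", or None for text outside any entity.
    """
    out = []
    for text, label in lines_with_labels:
        for i, token in enumerate(text.split()):
            if label is None:
                tag = "O"
            else:
                # First token of an entity gets B-, the following ones I-.
                tag = ("B-" if i == 0 else "I-") + label
            out.append(f"{token} {tag}")
        out.append("")  # blank line separates sequences
    return "\n".join(out)

sample = [("Dona Mercedes Restaurant", "ORG"), ("Thank You!", None)]
print(to_conll(sample))
```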