from IPython.display import Image

## Image Binarization for Handwriting OCR

### Ross Kimberlin

### Nashville Software School

## Agenda

1) The problem domain <br>
2) The inspiration <br>
3) The data and the goal <br>
4) First approach (simple thresholding) <br>
5) How to measure success <br>
6) Second approach (neural network) <br>
7) The impact of this work

### 1) The Problem Domain - Optical Character Recognition (OCR)

Can artificial intelligence look at images and recognize meaningful characters, such as letters and numbers? <br>

A classic example is reading license plates.  To us, it's obvious which part is the plate number and which part is the background.  For a computer, telling them apart takes training.


<div>
<img src="attachment:Screenshot.png" width="500"/>
</div>

In this case, can AI look at handwritten documents and accurately predict what is text and what is background surface?
![HW%20sample%201.png](attachment:HW%20sample%201.png)

### 2) The Inspiration

The world is full of old handwritten documents that give us a priceless connection with history.  The more that the world changes, the more valuable this connection becomes.

However, many historical documents are hidden in library archives that are too far away for researchers to visit.  Libraries tend not to circulate or share these documents, since they are fragile.

Scanning these documents would let more people access them, learn from them and enjoy them, without exposing the documents to damage or requiring people to make long research trips.

Once the documents have been scanned and put online, transcribing them automatically with artificial intelligence would add even more value.  AI might even be able to translate them.  This could speed up collaborative research by streamlining some of the most time-consuming parts of scholars' jobs.

The current project does not attempt the transcription and translation of scanned document images.  

Instead, it sets the stage for those later steps by processing the image scans into a format that lets computers read the document text more easily.

### 3) The Data and the Goal for the Data

#### Data source: The Handwritten Document Image Binarization Contest

H-DIBCO 2016 Dataset | H-DIBCO 2017 Dataset
-------------------- | --------------------
10 images | 20 images
7,031 x 17,003 pixels | 19,483 x 30,262 pixels


### The Goal
![Side-by-side_1.png](attachment:Side-by-side_1.png)


### What you just saw is a binarized image.

The processed image does not look as natural to the human eye, but it is much easier for computers to understand.

This is because the "noise" on the writing surface--the stains, scratches, etc.--has been removed by the binarization process, making the "signal," the meaningful text, much easier to interpret.

### 4) First Approach - Use binary thresholding in OpenCV

Binary thresholding means mapping a range of pixel values onto just two possible values.  Color images have Red, Green, and Blue (RGB) values from 0 to 255; a grayscale image has a single brightness value per pixel, from 0 (black) to 255 (white).

To binarize, read in a grayscale image and then select a **threshold**, or cutoff value.  With OpenCV's `THRESH_BINARY` mode, every value above the threshold becomes a white pixel and every other value becomes a black pixel, so dark ink turns black and the lighter writing surface turns white.
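As a rough illustration, here is a minimal NumPy sketch of simple thresholding. It follows the convention of OpenCV's `THRESH_BINARY` mode, in which pixels brighter than the threshold become white (255) and all others become black (0). The `binarize` helper, the threshold of 127, and the tiny `page` array are made up for this example; they are not the values used in the project.

```python
import numpy as np


def binarize(gray: np.ndarray, threshold: int = 127, max_value: int = 255) -> np.ndarray:
    """Return a two-valued copy of a grayscale image.

    Pixels strictly brighter than `threshold` become `max_value` (white);
    all other pixels become 0 (black).
    """
    return np.where(gray > threshold, max_value, 0).astype(np.uint8)


# A tiny made-up "scan": dark ink (low values) on a light, stained page.
page = np.array(
    [[200, 190, 40, 210],
     [180, 35, 30, 205],
     [150, 145, 45, 220]],
    dtype=np.uint8,
)

binary = binarize(page, threshold=127)
print(binary)
# [[255 255   0 255]
#  [255   0   0 255]
#  [255 255   0 255]]
```

With OpenCV installed, `cv2.threshold(page, 127, 255, cv2.THRESH_BINARY)` should return the threshold together with the same two-valued array. Note that the stains (150, 145) land on the white side only because the cutoff happens to sit below them; a single global threshold will not always separate stains from ink this cleanly.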

### Example of Results

< INSERT SIDE-BY-SIDE >

### Results for Simple Thresholding

The average prediction success fell just slightly upon adding more data.
![Threshold%20scores%201-2.png](attachment:Threshold%20scores%201-2.png)

### 5) How to Measure Success

The appropriate measurement for this kind of classification problem is the **f1 score**. 

![f1%20score.png](attachment:f1%20score.png)

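The f1 score is the right measurement here because handwritten documents are **imbalanced data**--there is much more background surface than there is actual handwriting.  A model that labeled every pixel as background would still score well on plain accuracy while finding no text at all.  The f1 score, which is the harmonic mean of precision and recall, ignores correctly predicted background pixels, so it does not reward that shortcut.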

We also care much more about one of the classes (**text**) than we care about the other class (**background**).  Therefore, we want to penalize **false negatives**, so that we lose as little text as possible.
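To make the measurement concrete, here is a small sketch that computes the f1 score for pixel masks, treating **text** as the positive class. The helper name `f1_score_for_text` and the two 3 x 4 masks are invented for illustration and do not come from the project data.

```python
import numpy as np


def f1_score_for_text(truth: np.ndarray, predicted: np.ndarray) -> float:
    """f1 score with text pixels (True) as the positive class."""
    truth = truth.astype(bool)
    predicted = predicted.astype(bool)
    true_pos = np.sum(truth & predicted)     # ink predicted as ink
    false_pos = np.sum(~truth & predicted)   # background predicted as ink
    false_neg = np.sum(truth & ~predicted)   # ink predicted as background
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return float(2 * precision * recall / (precision + recall))


# Made-up ground truth: 4 text pixels out of 12.
truth = np.array(
    [[0, 0, 1, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 0]],
    dtype=bool,
)

# Made-up prediction: finds 3 of the 4 text pixels, plus 1 false positive.
predicted = np.array(
    [[0, 0, 1, 0],
     [0, 1, 1, 0],
     [1, 0, 0, 0]],
    dtype=bool,
)

all_background = np.zeros_like(truth)

print(f1_score_for_text(truth, predicted))       # 0.75
print(f1_score_for_text(truth, all_background))  # 0.0
print(np.mean(all_background == truth))          # accuracy of the empty page, about 0.67
```

The last two lines show the point about imbalance: predicting a blank page gets two thirds of the pixels "right" in this toy example, yet its f1 score is zero because no text was found.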

#### The occasional tiny spot of false positive prediction is OK (ink where there should be none in the predicted image on the right), as long as we don't have blank space where there should be ink.
![Side-by-side%202.png](attachment:Side-by-side%202.png)

### 6) Second Approach - Convolutional Neural Network with Keras
![CNN%203.png](attachment:CNN%203.png)

A **neural network** is an artificial intelligence model loosely inspired by the structure of the human brain: its nodes connect and pass signals to one another, much as neurons in our brains communicate across synapses.

There are many types of neural networks, but the one used here is a **convolutional neural network**. These are common in image processing.

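A convolution slides a small grid of weights (a kernel) across the image and computes a weighted sum at each position, so each output value summarizes a small neighborhood of pixels.  The network learns the kernel weights during training.  When we say this CNN has three **layers**, we mean it has three convolutional layers, and so on. We first tried three layers in our network, then five.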
![Gallego%203-layer%20NN.png](attachment:Gallego%203-layer%20NN.png)
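To give a feel for what a single convolution computes, here is a minimal NumPy sketch. It is not the Keras network used in this project: the `convolve2d` helper, the toy `image`, and the hand-picked vertical-edge `kernel` are assumptions made for illustration, whereas a real CNN learns its kernel weights during training. As in most deep learning libraries, the kernel is applied without flipping.

```python
import numpy as np


def convolve2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` and take a weighted sum at each position.

    Only positions where the kernel fits entirely inside the image are
    computed, so the output is smaller than the input.
    """
    k_rows, k_cols = kernel.shape
    out_rows = image.shape[0] - k_rows + 1
    out_cols = image.shape[1] - k_cols + 1
    out = np.zeros((out_rows, out_cols), dtype=float)
    for row in range(out_rows):
        for col in range(out_cols):
            window = image[row:row + k_rows, col:col + k_cols]
            out[row, col] = np.sum(window * kernel)
    return out


# Toy 4 x 5 image: light page (1.0) with a vertical ink stroke (0.0) in column 2.
image = np.array([[1.0, 1.0, 0.0, 1.0, 1.0]] * 4)

# Hand-picked kernel that responds to vertical light/dark boundaries.
kernel = np.array(
    [[1.0, 0.0, -1.0],
     [1.0, 0.0, -1.0],
     [1.0, 0.0, -1.0]]
)

features = convolve2d(image, kernel)
print(features)
# [[ 3.  0. -3.]
#  [ 3.  0. -3.]]
```

The output is strongly positive where the page goes from light to dark, strongly negative where it goes from dark to light, and zero where the two sides of the window match. Stacking several such layers lets a network combine simple responses like these into detectors for whole pen strokes.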

### The Process

### < TO DO - WALK THEM THROUGH THE PROCESS!!

### TO MAKE A WINDOW, YOU HAVE TO ... >

### Results for Neural Networks
![CNN%20scores%201-3.png](attachment:CNN%20scores%201-3.png)

#### Based on these figures, the f1 score of the simple thresholding approach appears to decrease as more images are added, whereas the f1 score of our convolutional neural network appears to improve as the amount of training data increases.
![Scores%20side-by-side%201.png](attachment:Scores%20side-by-side%201.png)

### 7) The Impact of This Work

Automatic transcription and translation of handwritten documents would not only reduce the cost and risk of viewing them but also allow a greater exchange of information across borders, cultures and languages than humanity has ever had.

This could have a revolutionary effect on the availability of historical, linguistic and literary resources to students and researchers worldwide.  

### Thank you for your attention!

### Appendix - Resources Cited 1

Calvo-Zaragoza, Jorge & Gallego, Antonio-Javier.  "A selectional auto-encoder approach for document image binarization."  Pattern Recognition. DOI: 10.1016/j.patcog.2018.08.011.

https://arxiv.org/pdf/1706.10241.pdf

https://github.com/ajgallego/document-image-binarization

### Appendix - Resources Cited 2

ICFHR Handwritten Document Image Binarization Contest (H-DIBCO).

http://vc.ee.duth.gr/h-dibco2016/

### Appendix - Resources Cited 3

OpenCV Binarization Documents

https://docs.opencv.org/3.4/d7/d4d/tutorial_py_thresholding.html


### What are the transposed network layers?

### What is batch normalization?

### What is ReLU?

### What is an activation layer?