## Evaluate OCR engine quality

Before processing all files, we evaluate the OCR quality on an excerpt of a sample image (source: [Deutsche Zeitung, Ausgaben am Montag, 23.12.1918](https://zefys.staatsbibliothek-berlin.de/kalender/auswahl/date/1918-12-23/30744015/)):
![sample.jpg](sample.jpg)

Let us OCR it:

In [None]:
import pytesseract
from PIL import Image

In [None]:
ocr_output = pytesseract.image_to_string(Image.open('sample.jpg'), lang='frk')  # using German fraktur OCR model

In [None]:
print(ocr_output)

#### 2.1.1 Manually create the 'ground truth' to evaluate against

In [None]:
ground_truth = input('Please insert corrected string: ')

In [None]:
print(ground_truth)
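Because `input()` reads a single line, a longer corrected transcription may be easier to keep in a text file. The following is a minimal sketch of that alternative, not part of the original workflow: the helper `load_ground_truth` and the file name `sample_gt.txt` are hypothetical, and the whitespace normalisation is an assumption about what should not count as an OCR error.

```python
from pathlib import Path


def load_ground_truth(path: str) -> str:
    """Read a manually corrected transcription from a UTF-8 text file.

    Hypothetical helper for illustration; not part of the notebook's `aux` package.
    """
    text = Path(path).read_text(encoding="utf-8")
    # Drop trailing spaces on each line and surrounding blank lines,
    # assuming such whitespace differences should not be scored as errors.
    return "\n".join(line.rstrip() for line in text.splitlines()).strip()


# Usage (assumes a file named 'sample_gt.txt' exists next to the notebook):
# ground_truth = load_ground_truth('sample_gt.txt')
```

Storing the ground truth in a file also makes the evaluation repeatable, since the corrected text does not have to be re-entered on every run.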

#### 2.1.2 Measure OCR precision, recall and F-measure

In Optical Character Recognition (OCR), precision, recall, and F-measure quantify how closely the machine-encoded text produced from an image of typed, handwritten, or printed text matches the text that is actually on the page. They can be computed over characters, words, or specific pieces of information within a document. Here's how these metrics apply to OCR quality evaluation:

##### Precision in OCR
In OCR, precision is the proportion of correctly identified characters or words out of all the characters or words that the OCR system output. High precision means that most of the text the OCR system reported was actually present in the document, indicating few false positives (i.e., text output by the system that is not in the document).

![](precision.png)

##### Recall in OCR
Recall in the context of OCR measures the OCR system's ability to capture all the relevant characters or words from the document images. It is the ratio of the correctly identified characters or words to all the characters or words that are actually present in the documents. High recall indicates that the OCR system is able to identify most of the actual text present, minimizing false negatives (i.e., failing to recognize text that is there).

![](recall.png)

##### F-measure (F1 Score) in OCR
The F-measure, or F1 score, is the harmonic mean of precision and recall, so it combines both into a single metric that gives a balanced view of the OCR system's overall performance. Since precision and recall often trade off against each other (improving one can reduce the other), the F1 score helps to evaluate how well the OCR system recognizes text accurately while minimizing both false positives and false negatives.

![](fmeasure.png)

These metrics are critical for assessing OCR systems, particularly in applications where the accuracy of text recognition directly impacts the outcome, such as document automation, data extraction from scanned documents, and automated processing of handwritten forms. A balance between high precision and high recall is often desired to ensure that the OCR system is both accurate and comprehensive in its text recognition capabilities.
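To make the three definitions concrete, here is a minimal word-level sketch. It is an illustration under stated assumptions, not the implementation of `measure_ocr_quality`: the function name `bag_of_words_prf` is hypothetical, tokens are obtained by whitespace splitting, and word order is ignored (a bag-of-words comparison).

```python
from collections import Counter


def bag_of_words_prf(ocr_text: str, truth_text: str) -> tuple[float, float, float]:
    """Word-level precision, recall and F1, ignoring word order (illustrative only)."""
    ocr_counts = Counter(ocr_text.split())
    truth_counts = Counter(truth_text.split())

    # Multiset intersection: a word counts as correct at most as often
    # as it occurs in the ground truth.
    true_positives = sum((ocr_counts & truth_counts).values())
    n_ocr = sum(ocr_counts.values())      # all words the OCR system output
    n_truth = sum(truth_counts.values())  # all words actually in the document

    precision = true_positives / n_ocr if n_ocr else 0.0
    recall = true_positives / n_truth if n_truth else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Made-up example strings: one misread word ("Zeituug") and one missing word ("zweimal").
p, r, f = bag_of_words_prf(
    "Die Zeituug erscheint täglich",
    "Die Zeitung erscheint täglich zweimal",
)
print(f"Precision: {p:.4f}, Recall: {r:.4f}, F1: {f:.4f}")
# Precision: 0.7500, Recall: 0.6000, F1: 0.6667
```

In this example, three of the four OCR words are correct (precision 0.75), while three of the five ground-truth words were found (recall 0.6). A character-level or alignment-based measure would score the near miss "Zeituug" more leniently than this word-level comparison does.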

In [None]:
from aux.measure_ocr_quality import measure_ocr_quality

In [None]:
precision, recall, f_score = measure_ocr_quality(ocr_output, ground_truth)

In [None]:
print(f'Precision: {round(precision, 4)}\nRecall: {round(recall, 4)}\nF1-score: {round(f_score, 4)}')