# File-Processing OCR Demo
This notebook shows how the `file-processing-ocr` library adds OCR functionality to the `file-processing` library as a plugin. We'll use it to extract text from two image files: `Charter_of_Rights_and_Freedoms.jpg` and `Health_Canada_logo.png`.

In [1]:
# Import the necessary classes
from file_processing import File
from file_processing_ocr.ocr_decorator import OCRDecorator

# Use the helper function to get the path to the test files
from file_processing_test_data import get_test_files_path

# Build the paths to the two test image files
test_files_path = get_test_files_path()
image_file_1_path = test_files_path / 'Charter_of_Rights_and_Freedoms.jpg'
image_file_2_path = test_files_path / 'Health_Canada_logo.png'

# Initialize File objects for the images and wrap with OCRDecorator
image_file_1 = OCRDecorator(File(str(image_file_1_path)))
image_file_2 = OCRDecorator(File(str(image_file_2_path)))

## Applying OCR to Images
The `OCRDecorator` wraps a `File` object from `file-processing`, adding OCR functionality as a plugin. After `process()` is called on the wrapped object, the extracted OCR text is stored in its `metadata` dictionary under the key `ocr_text`.
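
To make the wrapping idea concrete, here is a minimal sketch of the decorator pattern in plain Python. It is not the `file-processing-ocr` implementation: `SimpleFile`, `TextExtractingDecorator`, and the `extract_text` callback are hypothetical stand-ins. The only behaviour borrowed from this notebook is that `process()` fills a `metadata` dictionary with an `ocr_text` entry.

```python
class SimpleFile:
    """Hypothetical stand-in for a file object (not the real `File` class)."""

    def __init__(self, path):
        self.path = path
        self.metadata = {}

    def process(self):
        # Record some basic metadata, as a base processor might.
        self.metadata["path"] = self.path


class TextExtractingDecorator:
    """Hypothetical decorator that adds an `ocr_text` entry after processing."""

    def __init__(self, wrapped, extract_text):
        self._wrapped = wrapped
        self._extract_text = extract_text  # assumed callable: path -> str

    def process(self):
        # Run the wrapped object's own processing first, then add the extra key.
        self._wrapped.process()
        self._wrapped.metadata["ocr_text"] = self._extract_text(self._wrapped.path)

    def __getattr__(self, name):
        # Invoked only when normal lookup fails, so attributes such as
        # `metadata` and `path` are forwarded to the wrapped object.
        return getattr(self._wrapped, name)


# A fake extractor stands in for a real OCR engine in this sketch.
demo = TextExtractingDecorator(SimpleFile("logo.png"), lambda path: f"text from {path}")
demo.process()
print(demo.metadata)
```

Because the decorator forwards unknown attributes to the wrapped object, calling code can treat it much like the original object, which is the property the cells below rely on when reading `metadata`.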

In [2]:
# Process the first image file and extract OCR text
image_file_1.process()

# Access and print only the OCR text from the metadata dictionary
print(f"OCR Text from Charter_of_Rights_and_Freedoms.jpg:\n{image_file_1.metadata.get('ocr_text', 'No OCR text extracted')}")

OCR Text from Charter_of_Rights_and_Freedoms.jpg:
CANADIAN
seasimar—-~ CHARTER OF RIGHTS
aitteemee _ AND FREEDOMS

‘Sihenteaannivnsnstr ero

iredtmety efter
Democratic Rights
caitueteemsererernndend — Gemeanneenstnnnetdeuties miter iipten ti mama tyenmigihiet nt

Minority Language
cational Rights

Soatienaiatetaerere

Enforcement
Tepe a et nap Sgn pee

General
Fe cendeatieeresie scents

Citation



## OCR Text for Second Image
Let's apply OCR to the `Health_Canada_logo.png` file and view the extracted text.

In [3]:
# Process the second image file and extract OCR text
image_file_2.process()

# Access and print only the OCR text from the metadata dictionary
print(f"OCR Text from Health_Canada_logo.png:\n{image_file_2.metadata.get('ocr_text', 'No OCR text extracted')}")

OCR Text from Health_Canada_logo.png:
Health Sante
Canada Canada



## Conclusion
In this demo, we used the `file-processing-ocr` library as a plugin to add OCR capabilities to the `file-processing` library. By wrapping `File` objects with `OCRDecorator` and calling `process()`, we can extract text from images and read it from the `metadata` dictionary under the `ocr_text` key. This plugin approach adds OCR to supported file types without modifying the `file-processing` library itself.
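
The same pattern extends to several files at once. The helper below is a sketch, not part of either library: `collect_ocr_text` is a hypothetical name, and it assumes only what this notebook shows, namely that each wrapped object has a `process()` method and a `metadata` dictionary that may contain `ocr_text`. A small stub class stands in for `OCRDecorator(File(...))` so the example runs without the libraries installed.

```python
def collect_ocr_text(files, default=""):
    """Process each object and return {label: OCR text}.

    `files` maps a label (for example a file name) to an object exposing
    `process()` and a `metadata` dict, as the wrapped files in this demo do.
    """
    results = {}
    for label, file_obj in files.items():
        file_obj.process()
        # Fall back to `default` when no `ocr_text` key was produced.
        results[label] = file_obj.metadata.get("ocr_text", default)
    return results


class StubFile:
    """Hypothetical stand-in for an OCR-wrapped file, used only in this sketch."""

    def __init__(self, text=None):
        self._text = text
        self.metadata = {}

    def process(self):
        if self._text is not None:
            self.metadata["ocr_text"] = self._text


texts = collect_ocr_text({
    "logo.png": StubFile("Health Sante\nCanada Canada"),
    "blank.png": StubFile(),  # simulates an image with no recognized text
})
for name, text in texts.items():
    print(f"{name}: {text!r}")
```

With the real libraries, the dictionary values would be wrapped objects such as `image_file_1` and `image_file_2` from the cells above.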