
# **OCR Bootcamp Notes**

*(Optical Character Recognition)*

---

## ✅ **1. Introduction to OCR**

### **What is OCR?**

* OCR = **Optical Character Recognition**
* Converts **printed or handwritten text in images** into **editable text**.
* Works by detecting **characters, words, and text layout** from an image or document.

---

### **Why OCR?**

* Automates data entry.
* Digitizes physical documents.
* Enables **searchable PDFs**.
* Used in banking, healthcare, transportation, retail.

---

### **Real-world Applications**

* Scanning books into eBooks.
* License Plate Recognition.
* Invoice & Receipt Automation.
* Passport/ID verification.
* Subtitle extraction from videos.

---

### **OCR vs ICR**

* **OCR**: Recognizes printed text.
* **ICR** (Intelligent Character Recognition): Recognizes **handwritten text** using machine learning models.

---

## ✅ **2. How OCR Works**

1. **Image Acquisition** → Capture image/document.
2. **Preprocessing** → Clean image (noise removal, binarization).
3. **Text Detection** → Locate text regions.
4. **Character Recognition** → Extract text.
5. **Post-processing** → Correct errors.
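
Step 5 is the easiest stage to prototype without an OCR engine. The following is a minimal sketch, assuming a small domain vocabulary is known in advance. The `VOCABULARY` list and the `correct_word` / `correct_text` helpers are hypothetical names for illustration, not part of any OCR library. Each recognized word is snapped to its closest vocabulary entry using the standard-library `difflib` module.

```python
import difflib

# Hypothetical domain vocabulary; in practice this would come from your own documents.
VOCABULARY = ["invoice", "total", "amount", "date", "customer"]


def correct_word(word, vocabulary=VOCABULARY, cutoff=0.75):
    """Return the closest vocabulary entry, or the word unchanged if none is close enough."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    # Matched words come back in the vocabulary's (lowercase) spelling.
    return matches[0] if matches else word


def correct_text(text):
    """Apply correct_word to every whitespace-separated token (line breaks are not preserved)."""
    return " ".join(correct_word(token) for token in text.split())


print(correct_text("lnvoice tota1 42"))  # prints: invoice total 42
```

A lower `cutoff` fixes more errors but also rewrites more correct words (for example, "data" would be changed to "date" at 0.75), so the threshold needs tuning per document type.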

---

## ✅ **3. Image Preprocessing for OCR**

**Why preprocessing?**
Poor-quality images (noise, skew, uneven lighting) → low OCR accuracy.
Preprocessing cleans the image so the OCR engine sees clear, high-contrast text.

### **Techniques**

* **Grayscale conversion** → Reduce complexity.
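* **Thresholding** → Convert to black & white.
  * Binary (global) threshold: one cut-off value for the whole image.
  * Adaptive threshold: cut-off computed per neighbourhood, useful under uneven lighting.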
* **Noise removal** → Gaussian blur, Median filter.
* **Morphological operations** → Remove small artifacts.
* **Deskewing** → Correct tilted text.
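
To make the difference between binary and adaptive thresholding concrete, here is a small pure-Python illustration on a synthetic image. It is a teaching sketch under stated assumptions, not how OpenCV implements it: the helper names, the 5x10 test image, and the border handling (clipping the window instead of padding) are all choices made for this example.

```python
def binary_threshold(image, thresh=150):
    """Global threshold: one cut-off value for the whole image."""
    return [[255 if px > thresh else 0 for px in row] for row in image]


def adaptive_mean_threshold(image, block=3, c=15):
    """Local threshold: compare each pixel with its neighbourhood mean minus c."""
    height, width = len(image), len(image[0])
    radius = block // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Neighbourhood is clipped at the image border (an assumption of this sketch).
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            row.append(255 if image[y][x] > local_mean - c else 0)
        result.append(row)
    return result


# Synthetic 5x10 "page": background fades from 220 (left) to 40 (right),
# with two darker "ink" pixels in the middle row.
image = [[220 - 20 * x for x in range(10)] for _ in range(5)]
for x in (2, 7):
    image[2][x] -= 60

global_result = binary_threshold(image)
adaptive_result = adaptive_mean_threshold(image)

# Print both results: '#' is black (0), '.' is white (255).
for name, result in (("global", global_result), ("adaptive", adaptive_result)):
    print(name)
    for row in result:
        print("".join("#" if px == 0 else "." for px in row))
```

With the global cut-off, the shaded right side turns completely black and the second ink pixel is lost; the local rule keeps both ink pixels and nothing else. In OpenCV the corresponding call is `cv2.adaptiveThreshold` with `cv2.ADAPTIVE_THRESH_MEAN_C`.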

**Code Example (OpenCV Preprocessing):**


import cv2

# Use raw string for path
img = cv2.imread(r'C:\Users\hp\Documents\ds_materials\9.Deep_learning\cv\5.ocr_text_recognition\images\lifestyle-02.jpg')

# Check if image is loaded
if img is None:
    raise FileNotFoundError("Image not found. Check the file path.")

# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Apply threshold
_, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)

# Save processed image (raw string so the backslash is not read as an escape)
cv2.imwrite(r'images\processed.jpg', thresh)




## ✅ **4. OCR Engines & Libraries**

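* **Tesseract OCR** (widely used, open source; originally developed at HP, later sponsored by Google)
* **EasyOCR** (Deep learning-based, supports multiple languages)
* **PaddleOCR** (Deep learning-based toolkit from the PaddlePaddle project)
* **Google Cloud Vision API** (Cloud-based)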

---

### **Installing Tesseract**

**Windows/Linux Setup:**

* Install Tesseract from the official site.
* Add the Tesseract install folder to the `PATH` environment variable.
* Install the Python wrapper: `pip install pytesseract`
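
A quick way to check the setup is to ask Python whether the `tesseract` executable is reachable. The sketch below uses only the standard library. The `locate_tesseract` helper is a hypothetical name, and the Windows fallback path is an assumption (a common default install location) that may differ on your machine.

```python
import shutil
from pathlib import Path

# Assumption: a common default install location on Windows; adjust to your system.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def locate_tesseract(fallback=WINDOWS_DEFAULT):
    """Return the path of the tesseract executable, or None if it cannot be found."""
    on_path = shutil.which("tesseract")  # searches the PATH environment variable
    if on_path:
        return on_path
    if Path(fallback).is_file():
        return fallback
    return None


cmd = locate_tesseract()
if cmd is None:
    print("Tesseract not found: install it or add its folder to PATH.")
else:
    print(f"Using Tesseract at {cmd}")
```

If Tesseract is installed but not on `PATH`, pytesseract can be pointed at it directly by setting `pytesseract.pytesseract.tesseract_cmd` to the returned path.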


---

### **Basic OCR in Python:**

import pytesseract
from PIL import Image

img = Image.open('text_image.png')
text = pytesseract.image_to_string(img)
print(text)

!pip install pytesseract opencv-python





## ✅ **5. Advanced OCR Features**

* **Multilingual OCR** → Support for 100+ languages.
* **Custom language models** → Train for new fonts.
* **Extracting structured data** → Tables, forms.
* **Confidence scores** → Check accuracy.
* **Handwriting recognition** → Using deep learning (ICR).
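
Confidence scores and multilingual OCR can be combined in a few lines. The sketch below assumes the dictionary layout returned by `pytesseract.image_to_data` with `output_type=pytesseract.Output.DICT` (parallel `text` and `conf` lists, with `-1` for non-word rows). The helper names and the threshold of 60 are illustrative choices. pytesseract and Pillow are imported lazily so the filtering helper can be tried without Tesseract installed.

```python
def confident_words(data, min_conf=60.0):
    """Keep (text, confidence) pairs whose confidence is at least min_conf."""
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # Layout rows (page, block, line) have empty text and a confidence of -1.
        if text.strip() and float(conf) >= min_conf:
            words.append((text, float(conf)))
    return words


def ocr_with_confidence(image_path, lang="eng", min_conf=60.0):
    """Run Tesseract on an image and return only the confidently recognized words."""
    # Imported here so confident_words stays usable without these packages.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path),
        lang=lang,  # e.g. "eng+hin" if both language packs are installed
        output_type=pytesseract.Output.DICT,
    )
    return confident_words(data, min_conf)


# Hand-written sample in the same shape, for illustration only (not real OCR output).
sample = {"text": ["", "Total", "l2.5O", " "], "conf": ["-1", 96.0, 31, "-1"]}
print(confident_words(sample))
```

Words that fall below the threshold are good candidates for manual review or for a second pass after stronger preprocessing.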

---

## ✅ **6. Improving OCR Accuracy**

* Use **high-resolution images** (scans at around 300 DPI generally work well with Tesseract).
* Apply **deskewing** & **denoising**.
* Convert to **grayscale or binary**.
* Train custom models for complex fonts.
* Use **deep learning models such as CRNN** (convolutional recurrent neural networks) for difficult text.

---

## ✅ **7. OCR in Applications**

* **Extract text from PDFs** using `pdf2image` + OCR.
* **Real-time OCR** using webcam feed.
* **Batch OCR** for multiple documents.
* **Integrating OCR with Flask/FastAPI for APIs**.
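
Batch OCR is mostly file handling around a single OCR call. A minimal sketch, assuming all images sit in one folder: `batch_ocr`, `tesseract_ocr`, and the suffix list are hypothetical names chosen for this example. The OCR function is passed in as a parameter so a different engine can be swapped in.

```python
from pathlib import Path

# File types treated as images in this sketch.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def tesseract_ocr(path):
    """Default OCR function: plain Tesseract via pytesseract."""
    # Imported lazily so batch_ocr can be used with another OCR function.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(path))


def batch_ocr(folder, ocr_fn=tesseract_ocr):
    """Run ocr_fn on every image in folder and return {file name: extracted text}."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = ocr_fn(path)
    return results
```

Calling `batch_ocr("scans")`, where `scans` is a hypothetical folder name, returns a dictionary that can then be written to CSV or a database.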

!pip install pdf2image




### **Example: Extract Text from PDF**

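from pdf2image import convert_from_path
import pytesseract

# Render each PDF page as an image at 300 DPI (requires Poppler)
pages = convert_from_path('document.pdf', dpi=300)

# Run OCR on every page image and print the extracted text
for page in pages:
    text = pytesseract.image_to_string(page)
    print(text)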

**Note:** `pdf2image` relies on Poppler. The error `PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?` means Poppler is not installed or its folder is not on `PATH`. Install Poppler, add it to `PATH`, and rerun the cell.


## Extract data from image

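import streamlit as st
# ocr_utils is a local helper module that is not shown in these notes
from ocr_utils import load_image, preprocess_image, extract_text, extract_data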
import numpy as np
import cv2

st.set_page_config(page_title="OCR Text Recognition")
st.title("📝 OCR Text Recognition with Pytesseract")

uploaded_file = st.file_uploader("Upload an image with text", type=["jpg", "jpeg", "png"])
if uploaded_file:
    image = load_image(uploaded_file.read())
    st.image(image, caption="Original Image", use_column_width=True)

    thresh = preprocess_image(image)
    st.image(thresh, caption="Preprocessed Image", use_column_width=True)

    st.subheader("📄 Extracted Text")
    text = extract_text(thresh)
    st.code(text)

    st.subheader("🔍 Word Detection with Bounding Boxes")
    data = extract_data(thresh)
    n_boxes = len(data['level'])
    for i in range(n_boxes):
        (x, y, w, h) = (data['left'][i], data['top'][i], data['width'][i], data['height'][i])
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    st.image(image, caption="Detected Words", use_column_width=True)
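
The loop above draws a rectangle for every row in `data`, which includes page, block, paragraph, and line rows as well as words. A minimal sketch of a word-only filter follows, assuming `extract_data` returns the dictionary produced by `pytesseract.image_to_data` with `Output.DICT` (the notes do not show `ocr_utils`, so this is an assumption). `word_boxes` is a hypothetical helper name.

```python
WORD_LEVEL = 5  # Tesseract layout levels: 1 page, 2 block, 3 paragraph, 4 line, 5 word


def word_boxes(data):
    """Return (x, y, w, h, text) tuples for word-level rows with non-empty text."""
    boxes = []
    for i, level in enumerate(data["level"]):
        text = data["text"][i].strip()
        if int(level) == WORD_LEVEL and text:
            boxes.append(
                (data["left"][i], data["top"][i], data["width"][i], data["height"][i], text)
            )
    return boxes


# Hand-written sample in the same shape, for illustration only (not real OCR output).
sample = {
    "level": [1, 4, 5, 5],
    "left": [0, 10, 10, 60],
    "top": [0, 5, 5, 5],
    "width": [200, 120, 40, 50],
    "height": [100, 20, 20, 20],
    "text": ["", "", "Hello", "OCR"],
}
print(word_boxes(sample))  # prints: [(10, 5, 40, 20, 'Hello'), (60, 5, 50, 20, 'OCR')]
```

In the app, the drawing loop could then iterate over `word_boxes(data)` and call `cv2.rectangle` once per returned tuple.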



---

## ✅ **8. Projects & Case Studies**

### **Beginner**

* ✔ Extract text from an image.
* ✔ Convert scanned PDF to editable text.

### **Intermediate**

* ✔ License plate recognition.
* ✔ Business card reader using OCR + OpenCV.

### **Advanced**

* ✔ Automated invoice processing with table extraction.
* ✔ Real-time subtitle extraction from video feed.

---

## ✅ **Practice Session**

**Questions:**

1. Define OCR and its real-world uses.
2. Explain why image preprocessing is important for OCR.
3. Write Python code to extract text from an image using Tesseract.
4. How do you improve OCR accuracy?
5. What is the difference between OCR and ICR?

---

## ✅ **Assignments**

1. Extract text from a scanned handwritten note.
2. Create an OCR pipeline for multilingual documents.
3. Implement OCR for real-time video stream using OpenCV.
4. Process 100 scanned documents and export extracted text into Excel.

---

## ✅ **Tools & Libraries**

* **OpenCV** → Image preprocessing.
* **Pytesseract** → Python wrapper for the Tesseract OCR engine.
* **pdf2image** → Convert PDFs to images.
* **EasyOCR** → Multilingual deep learning-based OCR.
* **Flask/FastAPI** → Build OCR APIs.

---