**Install pytesseract**

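Installs the pytesseract library, a Python wrapper for the Tesseract OCR engine, which this notebook uses to extract text from images. The Tesseract engine itself is a separate program and must already be installed on the system.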

In [1]:
%pip install pytesseract

Collecting pytesseract
  Obtaining dependency information for pytesseract from https://files.pythonhosted.org/packages/7a/33/8312d7ce74670c9d39a532b2c246a853861120486be9443eebf048043637/pytesseract-0.3.13-py3-none-any.whl.metadata
  Downloading pytesseract-0.3.13-py3-none-any.whl.metadata (11 kB)
Downloading pytesseract-0.3.13-py3-none-any.whl (14 kB)
Installing collected packages: pytesseract
Successfully installed pytesseract-0.3.13
Note: you may need to restart the kernel to use updated packages.




**Install OpenCV**

Installs the OpenCV library (`opencv-python`) for computer vision tasks. It supports reading, writing, and analyzing images and video.


In [2]:
%pip install opencv-python

Note: you may need to restart the kernel to use updated packages.




**Install YOLOv11 Library**

Installs the Ultralytics package, which provides the `YOLO` class for training and running YOLOv11 models. The `-U` flag upgrades the package if an older version is already installed.



In [3]:
%pip install -U ultralytics

Collecting ultralytics
  Obtaining dependency information for ultralytics from https://files.pythonhosted.org/packages/c4/af/9d2d794f6a72ef75c8be1771b04dd043ded99744e59b9696bb090fbf9ebb/ultralytics-8.3.55-py3-none-any.whl.metadata
  Downloading ultralytics-8.3.55-py3-none-any.whl.metadata (35 kB)
Downloading ultralytics-8.3.55-py3-none-any.whl (904 kB)
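Since pip notes that the kernel may need a restart, it can help to confirm which package versions the running kernel actually sees. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not visible to this interpreter
            found[name] = None
    return found


if __name__ == "__main__":
    report = installed_versions(["pytesseract", "opencv-python", "ultralytics"])
    for name, ver in report.items():
        print(f"{name}: {ver or 'not installed'}")
```

A `None` entry after a successful install usually suggests the kernel is running in a different environment from the one pip installed into.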



**Import Required Libraries**

Imports the libraries used in the rest of the notebook:

* `YOLO` for object detection
* `cv2` (OpenCV) for image processing
* `pytesseract` for OCR

The cell also sets the path to the Tesseract OCR executable and stores the current working directory in `HOME`.

In [2]:
from ultralytics import YOLO
from IPython.display import Image
import cv2
import pytesseract

# Provide the path to the Tesseract executable
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

import os
HOME = os.getcwd()  # Getting the current working directory
print(HOME)

c:\Users\hramphul\Desktop\yolo_ocr
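The hard-coded Windows path above only works where Tesseract was installed to that exact location. A more portable lookup might check `PATH` first and then fall back to a list of known locations. This is a sketch under that assumption; `find_tesseract` and its `candidates` parameter are illustrative names, not part of pytesseract.

```python
import os
import shutil


def find_tesseract(candidates=()):
    """Return a path to the Tesseract executable, or None if none is found.

    Looks on PATH first, then tries each fallback path in `candidates`.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    cmd = find_tesseract([r"C:\Program Files\Tesseract-OCR\tesseract.exe"])
    print(cmd or "Tesseract executable not found")
```

If a path is found, it could be assigned to `pytesseract.pytesseract.tesseract_cmd` in the same way as in the cell above.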


**Define OCR Function**

Defines a function to perform OCR on the cropped regions of detected bounding boxes. It includes the following steps:

1. Extract bounding box coordinates and class names.
2. Crop the detected regions from the image.
3. Preprocess the cropped image (convert to grayscale and binarize) for better OCR results.
4. Apply OCR to extract text and print the results.

In [3]:
# OCR Function
def perform_ocr(image, detections):
    """
    Perform OCR on cropped regions from the detected bounding boxes and include class names.
    """
    for detection in detections:
        # Extract bounding box and class name
        x1, y1, x2, y2, class_name = detection
        cropped_image = image[int(y1):int(y2), int(x1):int(x2)]

        # Skip boxes that collapse to an empty crop; cvtColor would raise an error on them
        if cropped_image.size == 0:
            continue

        # Preprocess for better OCR results: grayscale, then fixed-threshold binarization
        gray = cv2.cvtColor(cropped_image, cv2.COLOR_BGR2GRAY)
        _, binary = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)

        # Perform OCR using Tesseract
        text = pytesseract.image_to_string(binary, lang='eng')
        print(f"Class '{class_name}' detected: {text.strip()}")

        # Display cropped regions (optional)
        # cv2.imshow(f"Region - {class_name}", binary)
        # cv2.waitKey(0)
    # cv2.destroyAllWindows()
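The fixed threshold of 150 used in the binarization step may not suit every scan. One common alternative is Otsu's method, which picks the threshold that maximizes the variance between the dark and light pixel classes; OpenCV offers it through the `cv2.THRESH_OTSU` flag, for example `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`. The pure-Python sketch below only illustrates the calculation on a gray-level histogram. The function name `otsu_threshold` is hypothetical, and the notebook itself does not use this method.

```python
def otsu_threshold(histogram):
    """Return the gray level t that maximizes between-class variance.

    `histogram[v]` is the number of pixels with gray level v.
    Pixels with value <= t form one class, the remaining pixels the other.
    """
    total = sum(histogram)
    weighted_total = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    background_count = 0
    background_sum = 0
    for level, count in enumerate(histogram):
        background_count += count
        background_sum += level * count
        foreground_count = total - background_count
        if background_count == 0 or foreground_count == 0:
            continue  # One class is empty, so this split is not usable
        mean_background = background_sum / background_count
        mean_foreground = (weighted_total - background_sum) / foreground_count
        # Proportional to the between-class variance (constant factor dropped)
        variance = (
            background_count
            * foreground_count
            * (mean_background - mean_foreground) ** 2
        )
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


if __name__ == "__main__":
    # Made-up histogram: dark text around level 50, light paper around level 200
    hist = [0] * 256
    hist[50] = 100
    hist[200] = 100
    print(otsu_threshold(hist))
```

Any threshold between the two peaks separates them equally well in this toy histogram; the sketch returns the lowest such level.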

**Load Model and Perform Inference**

Loads the trained YOLO model and performs inference on all invoice images in the specified folder. For each image:

* Detects objects using the YOLO model.
* Extracts bounding box coordinates and corresponding class names.
* Performs OCR on detected regions.
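Each row that the notebook reads from `results[0].boxes.data.tolist()` holds six values: the four box corners, a confidence score, and a class index. The sketch below isolates that mapping step so it can be tried without a model. The helper name `rows_to_detections`, the `min_conf` filter, and the sample data are assumptions added for illustration; the notebook itself keeps every detection.

```python
def rows_to_detections(rows, names, min_conf=0.0):
    """Convert [x1, y1, x2, y2, conf, cls] rows into
    (x1, y1, x2, y2, class_name) tuples.

    Rows whose confidence is below `min_conf` are dropped.
    """
    detections = []
    for x1, y1, x2, y2, conf, cls in rows:
        if conf < min_conf:
            continue
        # The class index arrives as a float, so cast it before the lookup
        detections.append((x1, y1, x2, y2, names[int(cls)]))
    return detections


if __name__ == "__main__":
    # Made-up rows and class map, for illustration only
    sample_rows = [
        [10.0, 20.0, 110.0, 60.0, 0.91, 0.0],
        [15.0, 80.0, 90.0, 120.0, 0.32, 1.0],
    ]
    sample_names = {0: "total", 1: "Subtotal"}
    print(rows_to_detections(sample_rows, sample_names, min_conf=0.5))
```

With `min_conf=0.5`, only the first sample row survives, which shows how low-confidence boxes could be kept out of the OCR step.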

In [17]:
# Load the best-trained model and run inference on every image in the folder.
model = YOLO(f'{HOME}/runs/detect/train/weights/best.pt')

# Path to the images folder
category = "invoices"
images_folder = f"{HOME}/{category}"
# Not used below; pass classes=target_classes to model(...) to restrict detection to these class indices
target_classes = [1]

# Iterate through all files in the images folder
for file_name in os.listdir(images_folder):
    if file_name.lower().endswith(('.jpg', '.jpeg', '.png')):  # Check for image files (case-insensitive)
        file_path = os.path.join(images_folder, file_name)

        # Load the image; cv2.imread returns None if the file cannot be read
        image = cv2.imread(file_path)
        if image is None:
            print(f"Skipping unreadable image: {file_name}")
            continue

        # Run the YOLO model on the current image
        results = model(file_path)

        # Extract bounding box coordinates and class names
        detections = []
        for box in results[0].boxes.data.tolist():
            x1, y1, x2, y2, conf, cls = box[:6]
            class_name = results[0].names[int(cls)]  # Map class index to class name
            detections.append((x1, y1, x2, y2, class_name))

        print(f"Detected objects for {file_name}: {detections}")

        # Perform OCR on detected regions
        perform_ocr(image, detections)

print("Processing complete for all images!")


image 1/1 c:\Users\hramphul\Desktop\yolo_ocr\invoices\service-invoice-template-1x.jpg: 640x480 1 Name_Client, 1 Products, 1 Subtotal, 2 Taxs, 1 Tel_Client, 1 billing address, 2 invoice dates, 2 invoice numbers, 1 total, 64.8ms
Speed: 3.0ms preprocess, 64.8ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 480)
Detected objects for service-invoice-template-1x.jpg: [(92.17758178710938, 964.3442993164062, 1211.083984375, 1309.714599609375, 'Products'), (1056.4527587890625, 1544.5643310546875, 1193.139404296875, 1574.37939453125, 'total'), (1089.963623046875, 1371.732421875, 1196.6043701171875, 1396.4669189453125, 'Subtotal'), (1067.8226318359375, 772.3897705078125, 1215.751953125, 801.2837524414062, 'invoice number'), (1046.158203125, 302.2987060546875, 1226.0513916015625, 339.7677307128906, 'invoice date'), (82.00032043457031, 556.3425903320312, 406.1117858886719, 589.2283935546875, 'billing address'), (1033.93994140625, 822.3646240234375, 1214.4046630859375, 850.43786621093