<div align="center">

  <a href="https://ultralytics.com/yolo" target="_blank">
    <img width="1024" src="https://raw.githubusercontent.com/ultralytics/assets/main/yolov8/banner-yolov8.png"></a>

  [中文](https://docs.ultralytics.com/zh/) | [한국어](https://docs.ultralytics.com/ko/) | [日本語](https://docs.ultralytics.com/ja/) | [Русский](https://docs.ultralytics.com/ru/) | [Deutsch](https://docs.ultralytics.com/de/) | [Français](https://docs.ultralytics.com/fr/) | [Español](https://docs.ultralytics.com/es/) | [Português](https://docs.ultralytics.com/pt/) | [Türkçe](https://docs.ultralytics.com/tr/) | [Tiếng Việt](https://docs.ultralytics.com/vi/) | [العربية](https://docs.ultralytics.com/ar/)

  <a href="https://github.com/ultralytics/ultralytics/actions/workflows/ci.yml"><img src="https://github.com/ultralytics/ultralytics/actions/workflows/ci.yml/badge.svg" alt="Ultralytics CI"></a>
  <a href="https://colab.research.google.com/github/ultralytics/notebooks/blob/main/notebooks/how-to-use-ultralytics-yolo-with-openai-for-number-plate-recognition.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
  
  <a href="https://ultralytics.com/discord"><img alt="Discord" src="https://img.shields.io/discord/1089800235347353640?logo=discord&logoColor=white&label=Discord&color=blue"></a>
  <a href="https://community.ultralytics.com"><img alt="Ultralytics Forums" src="https://img.shields.io/discourse/users?server=https%3A%2F%2Fcommunity.ultralytics.com&logo=discourse&label=Forums&color=blue"></a>
  <a href="https://reddit.com/r/ultralytics"><img alt="Ultralytics Reddit" src="https://img.shields.io/reddit/subreddit-subscribers/ultralytics?style=flat&logo=reddit&logoColor=white&label=Reddit&color=blue"></a>
  
  Welcome to the ANPR with Ultralytics YOLO11 notebook! <a href="https://github.com/ultralytics/ultralytics">YOLO11</a> is the latest version of the YOLO (You Only Look Once) AI models developed by <a href="https://ultralytics.com">Ultralytics</a>. We hope that the resources in this notebook will help you get the most out of YOLO11. Please browse the YOLO11 <a href="https://docs.ultralytics.com/">Docs</a> for details, raise an issue on <a href="https://github.com/ultralytics/ultralytics">GitHub</a> for support, and join our <a href="https://ultralytics.com/discord">Discord</a> community for questions and discussions!</div>

# Automatic Number Plate Recognition using Ultralytics YOLO11 + OpenAI `gpt-4o-mini`

This notebook provides a comprehensive guide to implementing automatic number plate recognition (ANPR) using the YOLO11 model in combination with `gpt-4o-mini`.

## What is Automatic Number Plate Recognition (ANPR)?
Automatic number plate recognition is a technology designed to identify and extract vehicle number plate information from images or videos. By leveraging the powerful capabilities of Ultralytics YOLO11 for object detection and OpenAI `gpt-4o-mini` for text recognition, ANPR becomes an efficient solution for automating vehicle identification tasks.

## Why Use YOLO11 + GPT-4o-Mini for ANPR?

- Accurate Detection: YOLO11’s object detection capabilities ensure precise localization of license plates in various conditions, such as low light or high-speed movement.

- Seamless Text Recognition: With `gpt-4o-mini`, extracted license plate regions are processed to recognize alphanumeric text accurately, even with variations in font, angle, or clarity.

- Real-Time Processing: The integration allows for real-time vehicle monitoring, making it ideal for applications in traffic management, parking systems, and security surveillance.🚗

### Setup

Install `ultralytics` and its [dependencies](https://github.com/ultralytics/ultralytics/blob/main/pyproject.toml) with pip, then check the software and hardware.

[![PyPI - Version](https://img.shields.io/pypi/v/ultralytics?logo=pypi&logoColor=white)](https://pypi.org/project/ultralytics/) [![Downloads](https://static.pepy.tech/badge/ultralytics)](https://clickpy.clickhouse.com/dashboard/ultralytics) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/ultralytics?logo=python&logoColor=gold)](https://pypi.org/project/ultralytics/)

In [1]:
!uv pip install ultralytics openai

import base64

import cv2
import ultralytics
from openai import OpenAI
from ultralytics import YOLO
from ultralytics.utils.downloads import safe_download
from ultralytics.utils.plotting import Annotator, colors

ultralytics.checks()

Ultralytics 8.3.203 🚀 Python-3.12.11 torch-2.8.0+cu126 CUDA:0 (Tesla T4, 15095MiB)
Setup complete ✅ (2 CPUs, 12.7 GB RAM, 39.1/112.6 GB disk)


### Download the Video and Model File

Download the sample video we’ll use for processing. If you prefer to use your own video file, downloading the sample is not necessary.  

✅ We’ll download a license plate detection model `anpr-demo-model.pt` trained on a small dataset, designed to detect license plates in the sample video. You are welcome to use your own custom models as well.  

⚠️ Note: This license plate detection model `anpr-demo-model.pt` is intended solely for proof of concept (POC) purposes and may only work with `anpr-demo-video.mp4`.

In [2]:
# download the sample video file
safe_download("https://github.com/ultralytics/assets/releases/download/v0.0.0/anpr-demo-video.mp4")

# download the sample model file
safe_download("https://github.com/ultralytics/assets/releases/download/v0.0.0/anpr-demo-model.pt")

Downloading https://ultralytics.com/assets/anpr-demo-video.mp4 to 'anpr-demo-video.mp4': 100% ━━━━━━━━━━━━ 9.5MB 20.6MB/s 0.5s
Downloading https://ultralytics.com/assets/anpr-demo-model.pt to 'anpr-demo-model.pt': 100% ━━━━━━━━━━━━ 5.2MB 5.4MB/s 1.0s


PosixPath('anpr-demo-model.pt')

### Read the Video and Model File

You can either read the video file directly or stream content from an RTSP (Real-Time Streaming Protocol) source, offering flexible video input options to meet your requirements.

✅ By default, we will use the demo video `anpr-demo-video.mp4` downloaded in the previous step. Additionally, we’ll set up the video writer to manage the output video processing.

✅ The model `anpr-demo-model.pt` will also be initialized in memory to handle the processing. You can also use your own model for license plate detection.
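Since the input can be either a local file or an RTSP stream, it can help to tell the two apart before opening the capture, for example to decide whether a read failure means "end of file" or "connection dropped". The following is a minimal sketch using only the standard library; `is_stream_source`, `STREAM_SCHEMES`, and the RTSP address are hypothetical names and values chosen for illustration, not part of Ultralytics or OpenCV.

```python
from urllib.parse import urlparse

# Assumed set of URL schemes treated as live streams; adjust to your setup.
STREAM_SCHEMES = {"rtsp", "rtmp", "http", "https"}


def is_stream_source(source: str) -> bool:
    """Return True when `source` looks like a network stream URL rather than a file path."""
    return urlparse(source).scheme.lower() in STREAM_SCHEMES


# A local file path has no stream scheme, while an RTSP URL does.
print(is_stream_source("anpr-demo-video.mp4"))  # → False
print(is_stream_source("rtsp://192.168.1.10:554/stream"))  # → True (hypothetical address)
```

Either kind of source string can then be passed to `cv2.VideoCapture(source)` in the cell below.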

In [3]:
cap = cv2.VideoCapture("anpr-demo-video.mp4")
assert cap.isOpened(), "Error reading video file"

# Video writer
w, h, fps = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT, cv2.CAP_PROP_FPS))
video_writer = cv2.VideoWriter("anpr-output.avi", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

# Load the Ultralytics YOLO license plate detection model
model = YOLO("anpr-demo-model.pt")

<img align="left" src="https://github.com/user-attachments/assets/101152c3-25ba-48a7-9916-c32c3be5857e" height="640">

In [4]:
import cv2
from ultralytics import YOLO

cap = cv2.VideoCapture("anpr-demo-video.mp4")
assert cap.isOpened(), "Error reading video file"

# Video writer
w, h, fps = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH,
                                       cv2.CAP_PROP_FRAME_HEIGHT,
                                       cv2.CAP_PROP_FPS))
video_writer = cv2.VideoWriter("anpr-output.avi",
                               cv2.VideoWriter_fourcc(*"mp4v"),
                               fps, (w, h))

# Load the Ultralytics YOLO license plate detection model
model = YOLO("anpr-demo-model.pt")

# Compute the maximum number of frames (for example, only 2 seconds)
max_frames = int(fps * 2)

frame_count = 0
while cap.isOpened():
    success, im0 = cap.read()
    if not success or frame_count >= max_frames:
        break

    # Write the frame unchanged (no additional predictions)
    video_writer.write(im0)

    frame_count += 1

cap.release()
video_writer.release()


### Configure OpenAI Client

It’s time to configure the OpenAI client that will accept `base64` image data, extract the number plate text, and return it as a response. The configuration allows you to adjust the content sent to the OpenAI model, and you can also choose different models, such as `gpt-4o`.

⚠️ For the API key, i.e., `OpenAI(api_key="api_key")`, you'll need to visit the [OpenAI API key settings page](https://platform.openai.com/settings/organization/api-keys) and generate an API key for use.
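To avoid pasting the key into the notebook, one option is to read it from an environment variable. The sketch below assumes the key is stored in `OPENAI_API_KEY`; `load_openai_api_key` is a hypothetical helper written for this example, not part of the `openai` package.

```python
import os
from typing import Mapping, Optional


def load_openai_api_key(
    env_var: str = "OPENAI_API_KEY",
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the API key from an environment mapping, or raise if it is missing."""
    # Fall back to the real process environment when no mapping is supplied.
    env = os.environ if environ is None else environ
    key = env.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before creating the client.")
    return key


# Usage (assumes the variable is set in the notebook environment):
# client = OpenAI(api_key=load_openai_api_key())
```

Failing early with a clear message is easier to debug than an authentication error raised in the middle of video processing.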

The `extract_text` function is designed to process a base64-encoded image containing a vehicle's number plate.

✅ It sends the image data to the OpenAI client using the `gpt-4o-mini` model and requests the extraction of only the license plate text.

✅ If the text cannot be extracted, the prompt instructs the model to respond with `None`.

✅ Additionally, the prompt requests the number plate text only, which helps keep other text near the plate out of the result.

In [None]:
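client = OpenAI(api_key="**")  # Replace "**" with your OpenAI API key

# Define the text prompt. The instructions, in order:
# - ask for the number plate text
# - fall back to None when no text can be read
# - return text only, with no extra formatting
# - replace non-English characters with a dot, because OpenCV can't process these directly
# Comments are kept outside the string so they are not sent to the model.
prompt = """
Can you extract the vehicle number plate text inside the image?
If you are not able to extract text, please respond with None.
Only output text, please.
If any text character is not from the English language, replace it with a dot (.)
"""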


def extract_text(base64_encoded_data):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{base64_encoded_data}"},
                    },
                ],
            }
        ],
    )
    return response
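
The reply text is available as `response.choices[0].message.content`. Because the prompt asks the model to answer `None` when it cannot read the plate, the reply arrives as the string `"None"` rather than Python's `None`. The sketch below shows one way to normalize it; `normalize_plate_text` is a hypothetical helper, and it assumes the model follows the prompt's fallback instruction.

```python
from typing import Optional


def normalize_plate_text(reply: Optional[str]) -> Optional[str]:
    """Return cleaned plate text, or None when the model reported nothing readable."""
    if reply is None:
        return None
    text = reply.strip()
    # The prompt asks the model to answer "None" when extraction fails.
    if not text or text.lower() == "none":
        return None
    return text


print(normalize_plate_text(" AB12 CDE \n"))  # → AB12 CDE
print(normalize_plate_text("None"))  # → None
```

A caller could then skip labeling a box when the helper returns `None` instead of drawing the literal text "None" on the frame.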

### Process Video Frames

In this step, we will process video frames to detect objects, crop regions with padding, and extract license plate text using an OpenAI model. Here's how it works:

✅ Frames are read using OpenCV, and objects are detected using the YOLO11 model.

✅ Bounding boxes are expanded with padding to crop regions of interest, and the padded coordinates are clamped so that they stay inside the frame.

✅ Cropped regions are encoded in base64 and sent to the `extract_text` function, which uses OpenAI’s model to retrieve license plate text.

✅ The extracted text is added as a label to the bounding box on the video frame.

✅ Processed frames are saved to an output video file.
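
The padding and encoding steps above can be sketched in isolation with the standard library alone. `pad_box` and `encode_jpeg_bytes` are hypothetical helper names; the clamping mirrors the `max`/`min` logic in the cell below, and the input bytes stand in for the JPEG buffer that `cv2.imencode` produces.

```python
import base64
from typing import Sequence, Tuple


def pad_box(box: Sequence[float], padding: int, width: int, height: int) -> Tuple[int, int, int, int]:
    """Expand an (x1, y1, x2, y2) box by `padding` pixels, clamped to the frame size."""
    x1 = max(int(box[0]) - padding, 0)
    y1 = max(int(box[1]) - padding, 0)
    x2 = min(int(box[2]) + padding, width)
    y2 = min(int(box[3]) + padding, height)
    return x1, y1, x2, y2


def encode_jpeg_bytes(jpeg_bytes: bytes) -> str:
    """Base64-encode raw JPEG bytes into the string that `extract_text` expects."""
    return base64.b64encode(jpeg_bytes).decode("utf-8")


# A box near the frame edges is clamped instead of running out of bounds.
print(pad_box((5.7, 20.2, 630.9, 100.0), padding=10, width=640, height=105))
# → (0, 10, 640, 105)
```

Clamping matters because negative indices in a NumPy slice count from the end of the array, which would silently produce a wrong or empty crop.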

In [6]:
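padding = 10  # Adjust the padding value as needed

# Re-open the video and the writer, because the earlier preview cell released both
cap = cv2.VideoCapture("anpr-demo-video.mp4")
assert cap.isOpened(), "Error reading video file"
video_writer = cv2.VideoWriter("anpr-output.avi", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

while cap.isOpened():
    success, im0 = cap.read()

    if not success:
        break

    results = model.predict(im0)[0].boxes
    boxes = results.xyxy.cpu()
    clss = results.cls.cpu()

    ann = Annotator(im0, line_width=3)

    for cls, box in zip(clss, boxes):
        height, width, _ = im0.shape  # Get the dimensions of the original image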

        # Calculate padded coordinates
        x1 = max(int(box[0]) - padding, 0)
        y1 = max(int(box[1]) - padding, 0)
        x2 = min(int(box[2]) + padding, width)
        y2 = min(int(box[3]) + padding, height)

        # Crop the object with padding and encode the numpy array to base64 format.
        base64_im0 = base64.b64encode(cv2.imencode(".jpg", im0[y1:y2, x1:x2])[1]).decode("utf-8")

        response = extract_text(base64_im0).choices[0].message.content

        print(f"Extracted text: {response}")

        ann.box_label(box, label=str(response), color=colors(cls, True))  # Draw the bounding boxes

    video_writer.write(im0)  # Write the processed video frame

cap.release()  # Release the video capture
video_writer.release()  # Release the video writer

<img align="left" src="https://github.com/user-attachments/assets/d9f5efe7-d058-48f4-9515-f7ccf6084c2f" height="640">