<a href="https://colab.research.google.com/github/klugenik/klugenik-Personal-Rep/blob/main/Digit_recognition_OCR.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **OCR-instructions**

**Abstract**

This notebook uses optical character recognition (OCR) to recognize and extract numerical readings from images of electricity meters. EasyOCR was chosen over Keras and Tesseract for several reasons: its API is simple to use; it ships pre-trained models that work without additional training, which lowers the effort for companies adopting it; it recognizes digits accurately across varied conditions and fonts, including non-standard fonts; and it integrates with widely used Python libraries such as OpenCV and Matplotlib for image processing and visualization of the results.

*Important: Before executing the Python code, upload the images 'meter0001.jpg' and 'meter0002.jpg' to the Google Colab environment.*

# **OCR-model**

**Installation**

The installation commands facilitate the setup of essential libraries crucial for developing an Optical Character Recognition (OCR) model tailored to recognize digits from images of electricity meters.

Firstly, *'opencv-python-headless'* is installed to provide a streamlined version of OpenCV, optimized for server environments or systems without graphical interfaces. This version supports a range of computer vision tasks such as image reading, processing, and feature detection, pivotal for enhancing and analyzing meter images without the need for graphical output.

Next, *'easyocr'* is installed, specializing in text extraction from images. This library supports multilingual OCR capabilities and employs efficient algorithms to accurately recognize text, including numerical digits, across diverse environmental conditions. It integrates seamlessly into the project to facilitate the extraction of meter readings from captured images, contributing to the overall functionality of the OCR solution.

Finally, *'matplotlib'* is installed to provide comprehensive visualization tools. As a versatile plotting library, matplotlib enables the creation of static, animated, and interactive visualizations. Its integration ensures the project can visually represent processed images, annotate detected digits, and present OCR results in a clear and interpretable manner. Together, these installations establish a robust environment for developing and evaluating an OCR model designed specifically for digit recognition in electricity meter images.

In [None]:
# Installation of OpenCV
!pip install opencv-python-headless

# Installation of EasyOCR
!pip install easyocr

# Installation of Matplotlib
!pip install matplotlib
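
After the installation cell has run, it can be useful to confirm that the packages are importable before continuing. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is an illustration and not part of the original notebook.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised, for example, for dotted names whose parent package is absent
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Import names used in this notebook (cv2 is provided by opencv-python-headless)
print(missing_modules(["cv2", "easyocr", "matplotlib"]))
```

An empty list means all three packages were found; any name that is printed still needs to be installed.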



**Importing Libraries**

This step imports the libraries required for developing an Optical Character Recognition (OCR) model for digit recognition in electricity meter images.

*'cv2'* (OpenCV) is imported to handle computer vision tasks such as image reading, manipulation, and processing, which are the basis for enhancing image quality and extracting relevant features from meter images.

*'easyocr'* is imported as a specialized OCR library designed to extract text from images. It supports multiple languages and is used here to identify digits against varied image backgrounds and conditions.

*'matplotlib.pyplot'* is imported to facilitate data visualization tasks, enabling the generation of graphical outputs that aid in visualizing processed images, annotating recognized digits, and presenting OCR results. This library plays a crucial role in visual feedback and evaluation of the OCR model's performance.

Lastly, *'os'* is imported to interact with the operating system, allowing for file management tasks such as navigating directories, accessing image files, and ensuring compatibility across different operating environments. Together, these libraries establish a comprehensive framework necessary for implementing and evaluating an OCR solution tailored for digit recognition in electricity meter images.

In [None]:
# Importing libraries for image processing, OCR, visualization, and operating system tasks
import cv2  # Library for computer vision tasks
import easyocr  # Optical Character Recognition (OCR) library
import matplotlib.pyplot as plt  # Visualization library
import os  # Library for interacting with the operating system

ModuleNotFoundError: No module named 'bidi.algorithm'

**Loading OCR Model**

This step initializes the OCR model by creating an EasyOCR reader with English language support. EasyOCR is an OCR library designed to extract text from images. Specifying ['en'] loads the English recognition model, whose character set includes the digits needed to read the electricity meters in our example. The resulting reader object is reused for all subsequent text recognition calls.


In [None]:
# Loading the OCR model with English language support
reader = easyocr.Reader(['en'])

**Defining Image Files**

This step defines a list named image_files containing the filenames of the electricity meter images to process, here 'meter0001.jpg' and 'meter0002.jpg'. Later steps iterate over this list to read each image and run OCR (Optical Character Recognition) on it to extract the digits.

In [None]:
# List of image filenames that should be extracted for digit recognition
image_files = [
    'meter0001.jpg', 'meter0002.jpg'
]

**Defining Digit Positions**

In the following step a dictionary named positions is established, with each key representing an image filename and each corresponding value being a list of tuples. These tuples contain the coordinates (x, y, width, height) that define the regions within the respective images where digits are located. For 'meter0001.jpg' and 'meter0002.jpg', the coordinates specify the precise locations and dimensions of regions of interest (ROIs), indicating where the digits are expected to be found. This facilitates targeted image processing and analysis operations on these specific regions.

In [None]:
# Dictionary defining the exact positions (x, y, width, height) of digits in each image
positions = {
    'meter0001.jpg': [(9, 13, 84, 146), (99, 13, 76, 144), (178, 13, 77, 143), (261, 13, 77, 143), (350, 17, 66, 136)],
    'meter0002.jpg': [(6, 7, 49, 92), (60, 7, 49, 91), (113, 5, 58, 92), (174, 5, 50, 92), (229, 6, 48, 91)]
}
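
To make the (x, y, width, height) convention concrete, the sketch below converts one such tuple into the row and column slices used to crop an image array, and checks that a region fits inside an image. The helper names and the image size of 420 x 160 pixels are assumptions for illustration only; the real image dimensions are not stated in this notebook.

```python
def roi_slices(position):
    """Convert an (x, y, width, height) tuple into (row_slice, col_slice)."""
    x, y, w, h = position
    if w <= 0 or h <= 0:
        raise ValueError(f"ROI must have positive size, got {position}")
    # Image arrays are indexed [row, column], so y comes first
    return slice(y, y + h), slice(x, x + w)


def fits_in_image(position, image_width, image_height):
    """Check that the ROI lies completely inside the image bounds."""
    x, y, w, h = position
    return x >= 0 and y >= 0 and x + w <= image_width and y + h <= image_height


rows, cols = roi_slices((9, 13, 84, 146))
print(rows, cols)  # slice(13, 159, None) slice(9, 93, None)

# Hypothetical image size, chosen only for this example
print(fits_in_image((350, 17, 66, 136), image_width=420, image_height=160))  # True
```

Checking the bounds up front helps because slicing beyond the image edge silently returns a smaller crop instead of raising an error.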

**Setting Image Folder Path**

The variable image_folder is set to '/content/', the default working directory in Google Colab; adjust it if the images are stored elsewhere. Even though the images were uploaded manually in an earlier step, defining the folder once gives every later step a single, consistent location from which to build the full image paths.

In [None]:
# Directory path for the images
image_folder = '/content/'  # The image path can vary due to folder location

**Defining function for Executing Optical Character Recognition (OCR) and Visualizing Detected Text in Image Regions**

The function process_image(image_filename, positions) in this step is used to perform Optical Character Recognition (OCR) on specified regions of an input image and display the results. The function begins by reading the image from the specified filename using cv2.imread. It then verifies if the image was successfully loaded, raising a FileNotFoundError if the image could not be loaded. Once the image is confirmed to be loaded, it is displayed using matplotlib, with the color space converted from BGR to RGB for correct visualization.

The function iterates over each position in the provided positions list, which contains the coordinates and dimensions of regions of interest (ROIs) within the image. For each position, it extracts the corresponding ROI from the image. OCR is then applied to the extracted ROI using the reader.readtext method, which returns the detected text and its coordinates within the ROI.

For each detected text result, the function calculates the absolute coordinates of the detected text in the context of the original image by adjusting the coordinates relative to the ROI's position. It also retrieves the detected text string and the confidence level of the detection. These results are then displayed on the image by plotting the bounding box of the detected text in red and annotating it with the detected text and its confidence level. Additionally, the detected text and confidence level are printed to the console.

Finally, the function turns off the axis display using plt.axis('off') and shows the annotated image with plt.show(), thus completing the visualization of the OCR results.

In [None]:
# Function to perform OCR and display the results
def process_image(image_filename, positions):
    # Read image
    img = cv2.imread(image_filename)

    # Make sure the image has loaded
    if img is None:
        raise FileNotFoundError(f"Image could not be loaded: {image_filename}")

    # Show image
    plt.figure(figsize=(10, 10))
    plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

    for idx, pos in enumerate(positions):
        x, y, w, h = pos
        # Cut out the region of interest
        roi = img[y:y+h, x:x+w]

        # Apply OCR to the cutout area
        results = reader.readtext(roi)

        for res in results:
            # Coordinates of the recognized text relative to the ROI
            (xr1, yr1), (xr2, yr2), (xr3, yr3), (xr4, yr4) = res[0]
            # Absolute coordinates in the original image
            x1, y1 = x + xr1, y + yr1
            x2, y2 = x + xr2, y + yr2
            x3, y3 = x + xr3, y + yr3
            x4, y4 = x + xr4, y + yr4
            # Text and confidence of recognition
            det, conf = res[1], res[2]
            # View results
            plt.plot([x1, x2, x3, x4, x1], [y1, y2, y3, y4, y1], 'r-')
            plt.text(x1, y1 - 10, f'{det} [{round(conf, 2)}]', color='red', fontsize=12)
            print(f'Digit {idx+1}: Text: {det}, Confidence: {conf:.2f}')

    plt.axis('off')
    plt.show()
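
The coordinate adjustment inside the loop can be isolated into a small helper. The sketch below assumes a result in the (bounding box, text, confidence) shape that the function above unpacks; the sample values are made up and the helper name `to_absolute` is not part of the original code.

```python
def to_absolute(result, roi_x, roi_y):
    """Shift one OCR result from ROI coordinates to full-image coordinates."""
    box, text, confidence = result
    # Each corner is offset by the top-left corner of the ROI
    absolute_box = [(roi_x + px, roi_y + py) for px, py in box]
    return absolute_box, text, confidence


# Made-up result for an ROI whose top-left corner is at (99, 13)
sample = ([(2, 3), (40, 3), (40, 60), (2, 60)], "7", 0.93)
box, text, conf = to_absolute(sample, roi_x=99, roi_y=13)
print(box)  # [(101, 16), (139, 16), (139, 73), (101, 73)]
```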



**Iterating Over Image Files and processing**

This step involves the iterative processing of multiple images (meter0001.jpg and meter0002.jpg). For each image file in the list, the complete path to the image is generated by concatenating the directory path (image_folder) with the image filename (image_file). This path is then used to load and process the specific image. A message is printed to the console to indicate which image is being processed, thereby providing clarity on the current operation. Subsequently, the process_image function is called with the constructed image path and the corresponding positional data retrieved from the positions dictionary for the specific image file. This procedure ensures that each image is processed according to its designated regions of interest, as defined by the positional data.

In [None]:
# Iterate through the image files and process them
for image_file in image_files:
    image_path = os.path.join(image_folder, image_file)
    print(f"Processing Image: {image_path}")
    process_image(image_path, positions[image_file])
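
The loop above raises a KeyError if a filename has no entry in the positions dictionary. One possible way to guard against that is sketched below; the helper name `build_jobs` and the filename 'meter0003.jpg' are assumptions for illustration.

```python
import os


def build_jobs(image_folder, image_files, positions):
    """Pair each image path with its ROI list, skipping files without ROIs."""
    jobs = []
    for image_file in image_files:
        rois = positions.get(image_file)
        if not rois:
            print(f"Skipping {image_file}: no positions defined")
            continue
        jobs.append((os.path.join(image_folder, image_file), rois))
    return jobs


jobs = build_jobs(
    "/content/",
    ["meter0001.jpg", "meter0003.jpg"],  # meter0003.jpg is a made-up name
    {"meter0001.jpg": [(9, 13, 84, 146)]},
)
print(len(jobs))  # 1
```

Each entry in `jobs` could then be passed to process_image as `process_image(path, rois)`.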

# **Image preprocessing - deblurring and denoising**

**Defining deblur_image function**

The deblur_image function sharpens an image to counteract mild blur. It first creates a blurred copy of the input with cv2.GaussianBlur, using a 5x5 Gaussian kernel (the sigma argument of 0 lets OpenCV derive the standard deviation from the kernel size). It then combines the two images with cv2.addWeighted, weighting the original by 1.5 and the blurred copy by -0.5. Subtracting part of the blurred copy amplifies the difference between the image and its smoothed version, which emphasizes edges; this technique is commonly called unsharp masking. The function returns the sharpened image.

In [None]:
# Blur reduction function
def deblur_image(image):
    blurred = cv2.GaussianBlur(image, (5, 5), 0)
    deblurred = cv2.addWeighted(image, 1.5, blurred, -0.5, 0)
    return deblurred
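
To see why the weights 1.5 and -0.5 sharpen an edge, the sketch below applies the same weighted sum to a single row of pixel values in plain Python. It is a simplification under stated assumptions: a 3-tap [1, 2, 1] / 4 kernel stands in for the 5x5 Gaussian kernel, and the function names are illustrative rather than OpenCV calls.

```python
def blur_1d(values):
    """Smooth with a [1, 2, 1] / 4 kernel, repeating the edge samples."""
    padded = [values[0]] + list(values) + [values[-1]]
    return [
        (padded[i - 1] + 2 * padded[i] + padded[i + 1]) / 4
        for i in range(1, len(padded) - 1)
    ]


def unsharp_1d(values, alpha=1.5, beta=-0.5):
    """Compute alpha * original + beta * blurred, clipped to the 0..255 range."""
    blurred = blur_1d(values)
    return [
        min(255, max(0, round(alpha * v + beta * b)))
        for v, b in zip(values, blurred)
    ]


row = [40, 40, 40, 200, 200, 200]  # a dark-to-bright edge
print(unsharp_1d(row))  # [40, 40, 20, 220, 200, 200]
```

Flat regions are unchanged because the two weights sum to 1, while the pixels on either side of the edge are pushed further apart, which increases local contrast.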

**Defining denoise_image function**

The denoise_image function reduces noise in an image. It uses cv2.fastNlMeansDenoisingColored, an implementation of the Non-Local Means Denoising algorithm for color images. The algorithm compares patches within the image and averages similar patches, which suppresses noise while preserving important image details. The arguments passed are a filter strength of 10 for both the luminance (h) and color (hColor) components, a template window size of 7, and a search window size of 21. The function returns the denoised image.

In [None]:
# Noise reduction function
def denoise_image(image):
    denoised = cv2.fastNlMeansDenoisingColored(image, None, 10, 10, 7, 21)
    return denoised
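
For orientation, the patch-averaging idea behind Non-Local Means can be written in its common textbook form. This is a sketch of the general algorithm, not a statement of how OpenCV implements it; normalization details in the library may differ.

```latex
\hat{u}(p) = \frac{1}{C(p)} \sum_{q \in S(p)} w(p, q)\, u(q),
\qquad
w(p, q) = \exp\!\left( -\frac{\lVert P(p) - P(q) \rVert_2^2}{h^2} \right),
\qquad
C(p) = \sum_{q \in S(p)} w(p, q)
```

Here $u(q)$ is the noisy value at pixel $q$, $P(p)$ is the template patch centered on $p$ (7x7 pixels with the settings above), $S(p)$ is the search window around $p$ (21x21 pixels), and $h$ is the filter strength. A larger $h$ gives dissimilar patches more weight, which removes more noise but also more detail.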

**Defining function for Executing Optical Character Recognition (OCR) and Visualizing Detected Text in Image Regions**

The process_image function is used to perform Optical Character Recognition (OCR) on specified regions of an image and display the results. Initially, the function reads the image from the file path provided using cv2.imread. It then checks whether the image has been successfully loaded; if not, it raises a FileNotFoundError indicating the failure to load the specified image.

Upon successful loading, the function applies deblur_image followed by denoise_image to the image. It then visualizes the preprocessed image by converting its color space from BGR to RGB and displaying it using matplotlib. The function iterates through a list of positions, each defining a rectangular region of interest (ROI) within the image. For each position, it extracts the corresponding ROI from the preprocessed image and applies the OCR process to this segment using reader.readtext, which detects and reads text within the ROI.

The OCR results include the bounding box coordinates of the detected text within the ROI. These coordinates are adjusted to the absolute position in the original image by adding the ROI's coordinates. The detected text and its confidence level are then displayed on the image. Bounding boxes are drawn around the detected text using red lines, and the text, along with its confidence score, is annotated above the detected text.

Finally, the function disables the axis labels and ticks with plt.axis('off') and shows the annotated image using plt.show(). This comprehensive approach ensures that the detected text and its location are clearly presented on the image.

In [None]:
# Function to perform OCR and display the results
def process_image(image_filename, positions):
    # Read image
    img = cv2.imread(image_filename)

    # Make sure the image has loaded
    if img is None:
        raise FileNotFoundError(f"Image could not be loaded: {image_filename}")

    # Image deblur and denoise
    deblurred_img = deblur_image(img)
    denoised_img = denoise_image(deblurred_img)

    # Show image
    plt.figure(figsize=(10, 10))
    plt.imshow(cv2.cvtColor(denoised_img, cv2.COLOR_BGR2RGB))

    for idx, pos in enumerate(positions):
        x, y, w, h = pos
        # Cut out the region of interest
        roi = denoised_img[y:y+h, x:x+w]

        # Apply OCR to the cutout area
        results = reader.readtext(roi)

        for res in results:
            # Coordinates of the recognized text relative to the ROI
            (xr1, yr1), (xr2, yr2), (xr3, yr3), (xr4, yr4) = res[0]
            # Absolute coordinates in the original image
            x1, y1 = x + xr1, y + yr1
            x2, y2 = x + xr2, y + yr2
            x3, y3 = x + xr3, y + yr3
            x4, y4 = x + xr4, y + yr4
            # Text and confidence of recognition
            det, conf = res[1], res[2]
            # View results
            plt.plot([x1, x2, x3, x4, x1], [y1, y2, y3, y4, y1], 'r-')
            plt.text(x1, y1 - 10, f'{det} [{round(conf, 2)}]', color='red', fontsize=12)
            print(f'Digit {idx+1}: Text: {det}, Confidence: {conf:.2f}')

    plt.axis('off')
    plt.show()

**Iterating Over Image Files and processing**

This step involves the iterative processing of multiple images (meter0001.jpg and meter0002.jpg). For each image file in the list, the complete path to the image is generated by concatenating the directory path (image_folder) with the image filename (image_file). This path is then used to load and process the specific image. A message is printed to the console to indicate which image is being processed, thereby providing clarity on the current operation. Subsequently, the process_image function is called with the constructed image path and the corresponding positional data retrieved from the positions dictionary for the specific image file. This procedure ensures that each image is processed according to its designated regions of interest, as defined by the positional data.

In [None]:
# Iterate through the image files and process them
for image_file in image_files:
    image_path = os.path.join(image_folder, image_file)
    print(f"Processing Image: {image_path}")
    process_image(image_path, positions[image_file])
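
Since the goal is a complete meter reading, the per-digit results printed above could be combined into one string as a follow-up step. The sketch below is one possible approach, not part of the original notebook: the helper name, the '?' placeholder, and the confidence threshold of 0.5 are all assumptions, and the sample data is made up.

```python
def assemble_reading(digit_results, min_confidence=0.5):
    """Join per-digit OCR texts into one string; '?' marks uncertain positions."""
    reading = []
    for text, confidence in digit_results:
        is_single_digit = len(text) == 1 and text in "0123456789"
        if is_single_digit and confidence >= min_confidence:
            reading.append(text)
        else:
            reading.append("?")
    return "".join(reading)


# Made-up (text, confidence) pairs, one per ROI, in left-to-right order
results = [("0", 0.98), ("4", 0.91), ("7", 0.32), ("l", 0.80), ("5", 0.88)]
print(assemble_reading(results))  # 04??5
```

Marking uncertain positions instead of dropping them keeps the digit count intact, so a partially recognized reading is not mistaken for a shorter number.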

# **Image preprocessing - black and white conversion**

**Defining function for Executing Optical Character Recognition (OCR) and Visualizing Detected Text in Image Regions**

In this step the process_image function is used to apply Optical Character Recognition (OCR) on an image and display the results, incorporating several preprocessing steps to enhance text detection. Initially, the function reads the image from the specified file path using cv2.imread and checks for successful loading, raising a FileNotFoundError if the image cannot be loaded.

Following the successful loading of the image, the function converts the image to grayscale using cv2.cvtColor, which simplifies the image by removing color information. This grayscale image is then binarized with cv2.threshold, producing an image in which every pixel is either black or white. The binarization uses Otsu's thresholding method to determine the threshold value automatically; because the cv2.THRESH_OTSU flag is set, the fixed value of 128 passed to cv2.threshold is ignored.
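
To make the idea behind Otsu's method concrete, here is a small pure-Python sketch that picks the threshold maximizing the between-class variance of a list of 8-bit pixel values. It is an illustration only, not OpenCV's implementation, and the function name otsu_threshold is an assumption; OpenCV may return a different value when several thresholds are equally good.

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold t for 8-bit values; pixels <= t form the dark class."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels in the dark class
    sum_bg = 0  # sum of intensities in the dark class
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor)
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Two clearly separated intensity groups: dark background and bright digits.
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
# Same rule as cv2.THRESH_BINARY: values above the threshold become 255.
binary = [255 if p > t else 0 for p in pixels]
```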

The function then visualizes the binary image using matplotlib, setting the colormap to grayscale to accurately represent the binary image. It iterates over the specified positions, each defining a rectangular region of interest (ROI) within the binary image. For each ROI, the function extracts the relevant segment and performs OCR using reader.readtext to detect and read any text present within the ROI.

The results of the OCR process include bounding box coordinates for the detected text. These coordinates are adjusted to reflect their positions in the context of the original image. The function then visualizes the detected text by drawing bounding boxes around the detected regions in red and annotating the text along with its confidence score on the image. Finally, the function disables axis labels and ticks using plt.axis('off') and displays the annotated image with plt.show(), thereby providing a clear view of the OCR results and their locations within the image.

In [None]:
# Function to perform OCR and display the results
def process_image(image_filename, positions):
    # Read image
    img = cv2.imread(image_filename)

    # Make sure the image has loaded
    if img is None:
        raise FileNotFoundError(f"Image could not be loaded: {image_filename}")

    # Convert image to grayscale
    gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Binarize image (thresholding)
    _, binary_img = cv2.threshold(gray_img, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

    # Show image
    plt.figure(figsize=(10, 10))
    plt.imshow(binary_img, cmap='gray')

    for idx, pos in enumerate(positions):
        x, y, w, h = pos
        # Cut out the region of interest (ROI)
        roi = binary_img[y:y+h, x:x+w]

        # Apply OCR to the cutout area
        results = reader.readtext(roi)

        for res in results:
            # Coordinates of the recognized text relative to the ROI
            (xr1, yr1), (xr2, yr2), (xr3, yr3), (xr4, yr4) = res[0]
            # Absolute coordinates in the original image
            x1, y1 = x + xr1, y + yr1
            x2, y2 = x + xr2, y + yr2
            x3, y3 = x + xr3, y + yr3
            x4, y4 = x + xr4, y + yr4
            # Text and confidence of recognition
            det, conf = res[1], res[2]
            # View results
            plt.plot([x1, x2, x3, x4, x1], [y1, y2, y3, y4, y1], 'r-')
            plt.text(x1, y1 - 10, f'{det} [{round(conf, 2)}]', color='red', fontsize=12)
            print(f'Digit {idx+1}: Text: {det}, Confidence: {conf:.2f}')

    plt.axis('off')
    plt.show()
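
The coordinate adjustment in process_image adds the ROI origin to each of the four corner points. As a compact illustration of the same arithmetic, the sketch below shifts a whole box at once. The helper name to_absolute and the sample box are hypothetical and do not come from actual OCR output.

```python
def to_absolute(box, roi_x, roi_y):
    """Shift a four-point box from ROI-relative to full-image coordinates."""
    return [(roi_x + px, roi_y + py) for px, py in box]


# Hypothetical box, as it might be returned for an ROI starting at (99, 13).
relative_box = [(5, 10), (60, 10), (60, 120), (5, 120)]
absolute_box = to_absolute(relative_box, roi_x=99, roi_y=13)
print(absolute_box)
```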

**Iterating Over Image Files and processing**

This step involves the iterative processing of multiple images (meter0001.jpg and meter0002.jpg). For each image file in the list, the complete path to the image is generated by concatenating the directory path (image_folder) with the image filename (image_file). This path is then used to load and process the specific image. A message is printed to the console to indicate which image is being processed, thereby providing clarity on the current operation. Subsequently, the process_image function is called with the constructed image path and the corresponding positional data retrieved from the positions dictionary for the specific image file. This procedure ensures that each image is processed according to its designated regions of interest, as defined by the positional data.

In [None]:
# Iterate through the image files and process them
for image_file in image_files:
    image_path = os.path.join(image_folder, image_file)
    print(f"Processing image: {image_path}")
    process_image(image_path, positions[image_file])
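
The loop above prints each detection separately. If process_image were modified to return the readtext results of every ROI (an assumption; the function as written returns nothing), the individual digits could be joined into one meter reading roughly as follows. The helper name assemble_reading is hypothetical, and the sample values are made up for illustration only.

```python
def assemble_reading(results_per_roi):
    """Join the most confident digit of each ROI; '?' marks ROIs without a digit."""
    reading = []
    for results in results_per_roi:
        # Keep only results whose text is exactly one ASCII digit.
        digits = [(text, conf) for _box, text, conf in results
                  if len(text) == 1 and text in '0123456789']
        if digits:
            best_text, _ = max(digits, key=lambda item: item[1])
            reading.append(best_text)
        else:
            reading.append('?')
    return ''.join(reading)


# Made-up (box, text, confidence) tuples, not actual OCR output.
sample = [
    [(None, '4', 0.91)],
    [(None, 'O', 0.40), (None, '0', 0.35)],
    [],
]
reading = assemble_reading(sample)
print(reading)
```

Marking empty ROIs with '?' keeps the digit positions aligned, so a missed digit is visible in the result instead of silently shortening the reading.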

# **Image preprocessing - grayscaling**

**Installation**

The installation commands facilitate the setup of essential libraries crucial for developing an Optical Character Recognition (OCR) model tailored to recognize digits from images of electricity meters.

Firstly, *'opencv-python-headless'* is installed to provide a streamlined version of OpenCV, optimized for server environments or systems without graphical interfaces. This version supports a range of computer vision tasks such as image reading, processing, and feature detection, pivotal for enhancing and analyzing meter images without the need for graphical output.

Next, *'easyocr'* is installed, specializing in text extraction from images. This library supports multilingual OCR capabilities and employs efficient algorithms to accurately recognize text, including numerical digits, across diverse environmental conditions. It integrates seamlessly into the project to facilitate the extraction of meter readings from captured images, contributing to the overall functionality of the OCR solution.

Finally, *'matplotlib'* is installed to provide comprehensive visualization tools. As a versatile plotting library, matplotlib enables the creation of static, animated, and interactive visualizations. Its integration ensures the project can visually represent processed images, annotate detected digits, and present OCR results in a clear and interpretable manner. Together, these installations establish a robust environment for developing and evaluating an OCR model designed specifically for digit recognition in electricity meter images.

In [None]:
# Installation of OpenCV
!pip install opencv-python-headless

# Installation of EasyOCR
!pip install easyocr

# Installation of Matplotlib
!pip install matplotlib

**Importing Libraries**

The libraries required for developing an Optical Character Recognition (OCR) model for digit recognition in electricity meter images are imported in this step.

*'cv2'* (OpenCV) is imported to handle various computer vision tasks such as image reading, manipulation, and processing, which are foundational for enhancing image quality and extracting relevant features from meter images.

*'easyocr'* is imported as a specialized OCR library designed to extract text from images. It supports multiple languages and provides algorithms optimized for accurate character recognition, which this notebook uses to identify digits across diverse image backgrounds and conditions.

*'matplotlib.pyplot'* is imported to facilitate data visualization tasks, enabling the generation of graphical outputs that aid in visualizing processed images, annotating recognized digits, and presenting OCR results. This library plays a crucial role in visual feedback and evaluation of the OCR model's performance.

Lastly, *'os'* is imported to interact with the operating system, allowing for file management tasks such as navigating directories, accessing image files, and ensuring compatibility across different operating environments. Together, these libraries establish a comprehensive framework necessary for implementing and evaluating an OCR solution tailored for digit recognition in electricity meter images.

In [None]:
# Importing libraries for image processing, OCR, visualization, and operating system tasks
import cv2  # Library for computer vision tasks
import easyocr  # Optical Character Recognition (OCR) library
import matplotlib.pyplot as plt  # Visualization library
import os  # Library for interacting with the operating system

**Loading OCR Model**

This step initializes the Optical Character Recognition (OCR) model by creating an EasyOCR reader with English language support. EasyOCR is a robust OCR library designed to extract text from images efficiently. Specifying ['en'] selects the English recognition model, which also covers the digits needed in our example of electricity meter readings. The resulting reader object is reused for all subsequent text recognition tasks.

In [None]:
# Loading the OCR model with English language support
reader = easyocr.Reader(['en'])

**Defining Image Files**

This step initializes a list named image_files containing the filenames of the electricity meter images, 'meter0001.jpg' and 'meter0002.jpg'. Defining the list up front keeps the workflow structured: later steps iterate over it to read each image and run OCR (Optical Character Recognition) to extract the digits.

In [None]:
# List of image filenames to process for digit recognition
image_files = [
    'meter0001.jpg', 'meter0002.jpg'
]

**Defining Digit Positions**

In the following step a dictionary named positions is established, with each key representing an image filename and each corresponding value being a list of tuples. These tuples contain the coordinates (x, y, width, height) that define the regions within the respective images where digits are located. For 'meter0001.jpg' and 'meter0002.jpg', the coordinates specify the precise locations and dimensions of regions of interest (ROIs), indicating where the digits are expected to be found. This facilitates targeted image processing and analysis operations on these specific regions.

In [None]:
# Dictionary defining the exact positions (x, y, width, height) of digits in each image
positions = {
    'meter0001.jpg': [(9, 13, 84, 146), (99, 13, 76, 144), (178, 13, 77, 143), (261, 13, 77, 143), (350, 17, 66, 136)],
    'meter0002.jpg': [(6, 7, 49, 92), (60, 7, 49, 91), (113, 5, 58, 92), (174, 5, 50, 92), (229, 6, 48, 91)],
}

**Setting Image Folder Path**

Although the images were uploaded manually in an earlier step, defining the folder path explicitly ensures that they can be located and loaded reliably. The variable image_folder is assigned the string '/content/', the default working directory in Google Colab; adjust it if the images are stored elsewhere. Every later file operation, such as reading or writing, builds its path from this single reference point.

In [None]:
# Directory path for the images
image_folder = '/content/'  # The image path can vary due to folder location

**Defining function for Executing Optical Character Recognition (OCR) and Visualizing Detected Text in Image Regions**

The process_image function in this step is designed to execute Optical Character Recognition (OCR) on specific regions of a grayscale image and to visualize the results. The process begins with reading the image from the specified file path using the cv2.imread function. The function then verifies that the image has been successfully loaded; if not, it raises a FileNotFoundError to indicate the failure to load the image.

Upon successful loading, the image is converted to grayscale with the cv2.cvtColor function, which simplifies the image by removing color information and reducing it to shades of gray. This grayscale image is then displayed using matplotlib to provide a visual reference for the OCR results.
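
For intuition, the grayscale conversion computes a weighted sum of the three color channels, with the standard weights 0.299 for red, 0.587 for green, and 0.114 for blue. The sketch below applies this formula to a single pixel in OpenCV's BGR channel order. It is an illustration only: the function name bgr_to_gray is an assumption, and OpenCV's internal arithmetic may round individual values differently.

```python
def bgr_to_gray(pixel):
    """Convert one (blue, green, red) pixel to a single 8-bit gray value."""
    b, g, r = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


# Green contributes most to perceived brightness, blue the least.
for name, pixel in [('blue', (255, 0, 0)), ('green', (0, 255, 0)), ('red', (0, 0, 255))]:
    print(name, bgr_to_gray(pixel))
```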

The function iterates over a list of positions, where each position specifies a rectangular region of interest (ROI) within the image. For each position, it extracts the corresponding ROI from the grayscale image. The OCR process is applied to this ROI using the reader.readtext method, which identifies and reads any text present within this segment.

The results from the OCR process include the coordinates of the detected text within the ROI. These coordinates are adjusted to reflect their positions in the context of the original image. The function then visualizes the OCR results by drawing bounding boxes around the detected text in red and annotating these boxes with the detected text and its confidence level. This information is also printed to the console for further reference.

Finally, the function disables the display of axis labels and ticks using plt.axis('off'), and shows the annotated image with plt.show(). This comprehensive approach ensures that the detected text and its spatial location within the image are clearly presented.

In [None]:
# Function to perform OCR and display the results
def process_image(image_filename, positions):
    # Read image
    img = cv2.imread(image_filename)

    # Make sure the image has loaded
    if img is None:
        raise FileNotFoundError(f"Image could not be loaded: {image_filename}")

    # Convert image to grayscale
    gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Show image
    plt.figure(figsize=(10, 10))
    plt.imshow(gray_img, cmap='gray')

    for idx, pos in enumerate(positions):
        x, y, w, h = pos
        # Cut out the region of interest (ROI)
        roi = gray_img[y:y+h, x:x+w]

        # Apply OCR to the cutout area
        results = reader.readtext(roi)

        for res in results:
            # Coordinates of the recognized text relative to the ROI
            (xr1, yr1), (xr2, yr2), (xr3, yr3), (xr4, yr4) = res[0]
            # Absolute coordinates in the original image
            x1, y1 = x + xr1, y + yr1
            x2, y2 = x + xr2, y + yr2
            x3, y3 = x + xr3, y + yr3
            x4, y4 = x + xr4, y + yr4
            # Text and confidence of recognition
            det, conf = res[1], res[2]
            # View results
            plt.plot([x1, x2, x3, x4, x1], [y1, y2, y3, y4, y1], 'r-')
            plt.text(x1, y1 - 10, f'{det} [{round(conf, 2)}]', color='red', fontsize=12)
            print(f'Digit {idx+1}: Text: {det}, Confidence: {conf:.2f}')

    plt.axis('off')
    plt.show()

**Iterating Over Image Files and processing**

This step involves the iterative processing of multiple images (meter0001.jpg and meter0002.jpg). For each image file in the list, the complete path to the image is generated by concatenating the directory path (image_folder) with the image filename (image_file). This path is then used to load and process the specific image. A message is printed to the console to indicate which image is being processed, thereby providing clarity on the current operation. Subsequently, the process_image function is called with the constructed image path and the corresponding positional data retrieved from the positions dictionary for the specific image file. This procedure ensures that each image is processed according to its designated regions of interest, as defined by the positional data.

In [None]:
# Iterate through the image files and process them
for image_file in image_files:
    image_path = os.path.join(image_folder, image_file)
    print(f"Processing image: {image_path}")
    process_image(image_path, positions[image_file])

# **Image preprocessing - adjustment of digit positioning**

**Installation**

The installation commands facilitate the setup of essential libraries crucial for developing an Optical Character Recognition (OCR) model tailored to recognize digits from images of electricity meters.

Firstly, *'opencv-python-headless'* is installed to provide a streamlined version of OpenCV, optimized for server environments or systems without graphical interfaces. This version supports a range of computer vision tasks such as image reading, processing, and feature detection, pivotal for enhancing and analyzing meter images without the need for graphical output.

Next, *'easyocr'* is installed, specializing in text extraction from images. This library supports multilingual OCR capabilities and employs efficient algorithms to accurately recognize text, including numerical digits, across diverse environmental conditions. It integrates seamlessly into the project to facilitate the extraction of meter readings from captured images, contributing to the overall functionality of the OCR solution.

Finally, *'matplotlib'* is installed to provide comprehensive visualization tools. As a versatile plotting library, matplotlib enables the creation of static, animated, and interactive visualizations. Its integration ensures the project can visually represent processed images, annotate detected digits, and present OCR results in a clear and interpretable manner. Together, these installations establish a robust environment for developing and evaluating an OCR model designed specifically for digit recognition in electricity meter images.

In [None]:
# Installation of OpenCV
!pip install opencv-python-headless

# Installation of EasyOCR
!pip install easyocr

# Installation of Matplotlib
!pip install matplotlib

**Importing Libraries**

The libraries required for developing an Optical Character Recognition (OCR) model for digit recognition in electricity meter images are imported in this step.

*'cv2'* (OpenCV) is imported to handle various computer vision tasks such as image reading, manipulation, and processing, which are foundational for enhancing image quality and extracting relevant features from meter images.

*'easyocr'* is imported as a specialized OCR library designed to extract text from images. It supports multiple languages and provides algorithms optimized for accurate character recognition, which this notebook uses to identify digits across diverse image backgrounds and conditions.

*'matplotlib.pyplot'* is imported to facilitate data visualization tasks, enabling the generation of graphical outputs that aid in visualizing processed images, annotating recognized digits, and presenting OCR results. This library plays a crucial role in visual feedback and evaluation of the OCR model's performance.

Lastly, *'os'* is imported to interact with the operating system, allowing for file management tasks such as navigating directories, accessing image files, and ensuring compatibility across different operating environments. Together, these libraries establish a comprehensive framework necessary for implementing and evaluating an OCR solution tailored for digit recognition in electricity meter images.

In [None]:
# Importing libraries for image processing, OCR, visualization, and operating system tasks
import cv2  # Library for computer vision tasks
import easyocr  # Optical Character Recognition (OCR) library
import matplotlib.pyplot as plt  # Visualization library
import os  # Library for interacting with the operating system

**Loading OCR Model**

This step initializes the Optical Character Recognition (OCR) model by creating an EasyOCR reader with English language support. EasyOCR is a robust OCR library designed to extract text from images efficiently. Specifying ['en'] selects the English recognition model, which also covers the digits needed in our example of electricity meter readings. The resulting reader object is reused for all subsequent text recognition tasks.

In [None]:
# Loading the OCR model with English language support
reader = easyocr.Reader(['en'])

**Defining Image Files**

This step initializes a list named image_files containing the filenames of the electricity meter images, 'meter0001.jpg' and 'meter0002.jpg'. Defining the list up front keeps the workflow structured: later steps iterate over it to read each image and run OCR (Optical Character Recognition) to extract the digits.

In [None]:
# List of image filenames to process for digit recognition
image_files = [
    'meter0001.jpg', 'meter0002.jpg'
]

**Defining Digit Positions**

In the following step a dictionary named positions is established, with each key representing an image filename and each corresponding value being a list of tuples. These tuples contain the coordinates (x, y, width, height) that define the regions within the respective images where digits are located. For 'meter0001.jpg' and 'meter0002.jpg', the coordinates specify the precise locations and dimensions of regions of interest (ROIs), indicating where the digits are expected to be found. This facilitates targeted image processing and analysis operations on these specific regions.

In [None]:
# Dictionary defining the exact positions (x, y, width, height) of digits in each image
positions = {
    'meter0001.jpg': [(9, 2, 74, 169), (99, 2, 68, 144), (178, 2, 77, 143), (261, 2, 77, 143), (350, 17, 66, 136)],
    'meter0002.jpg': [(6, 5, 50, 93), (60, 2, 50, 98), (113, 5, 58, 92), (174, 5, 50, 92), (229, 6, 48, 91)],
}
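
Since this section differs from the earlier ones mainly in the ROI coordinates, it can be useful to see exactly which regions were moved or resized. The sketch below compares the positions used in the previous sections with the adjusted ones for 'meter0002.jpg'. The helper name roi_deltas is an assumption; the coordinate values are those listed in this notebook.

```python
def roi_deltas(before, after):
    """Return (index, dx, dy, dw, dh) for every ROI that changed."""
    changes = []
    for idx, (old, new) in enumerate(zip(before, after)):
        delta = tuple(n - o for o, n in zip(old, new))
        if any(delta):
            changes.append((idx, *delta))
    return changes


# ROIs for 'meter0002.jpg': earlier sections vs. the adjusted values above.
before = [(6, 7, 49, 92), (60, 7, 49, 91), (113, 5, 58, 92), (174, 5, 50, 92), (229, 6, 48, 91)]
after = [(6, 5, 50, 93), (60, 2, 50, 98), (113, 5, 58, 92), (174, 5, 50, 92), (229, 6, 48, 91)]
changes = roi_deltas(before, after)
print(changes)
```

For this image only the first two regions change: both start slightly higher and are slightly wider and taller than before.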

**Setting Image Folder Path**

Although the images were uploaded manually in an earlier step, defining the folder path explicitly ensures that they can be located and loaded reliably. The variable image_folder is assigned the string '/content/', the default working directory in Google Colab; adjust it if the images are stored elsewhere. Every later file operation, such as reading or writing, builds its path from this single reference point.

In [None]:
# Directory path for the images
image_folder = '/content/'  # The image path can vary due to folder location

**Defining function for Executing Optical Character Recognition (OCR) and Visualizing Detected Text in Image Regions**

The following part with the function process_image is designed to conduct Optical Character Recognition (OCR) on designated regions of an image and to display the outcomes. Initially, the function reads the image from the specified file path using cv2.imread. It then ensures that the image has been successfully loaded; if not, it raises a FileNotFoundError, indicating that the image could not be found or opened.

Once the image is successfully loaded, it is displayed using matplotlib. The color space of the image is converted from BGR (Blue, Green, Red) to RGB (Red, Green, Blue) using cv2.cvtColor to ensure accurate color representation in the display. The function then iterates through a list of positions, each representing a rectangular region of interest (ROI) within the image. For each specified position, it extracts the corresponding ROI from the image.
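
The conversion is needed because cv2.imread stores channels in blue-green-red order while matplotlib expects red-green-blue; without it, red and blue would appear swapped in the plot. As a minimal illustration of what the conversion does, the sketch below reverses the channel order on a tiny image represented as nested lists. The function name bgr_image_to_rgb is an assumption, and real images in this notebook are NumPy arrays handled by cv2.cvtColor.

```python
def bgr_image_to_rgb(image):
    """Reverse the channel order of every pixel in a nested-list image."""
    return [[(r, g, b) for (b, g, r) in row] for row in image]


# A 1x2 image: one pure-blue and one pure-red pixel, given in BGR order.
bgr = [[(255, 0, 0), (0, 0, 255)]]
rgb = bgr_image_to_rgb(bgr)
print(rgb)
```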

The extracted ROI is subjected to OCR using the reader.readtext method, which detects and reads the text contained within this region. The results include bounding box coordinates for each detected text element, which are initially relative to the ROI. These coordinates are adjusted to reflect their absolute position in the context of the entire image.

The function then visualizes the results by plotting red bounding boxes around the detected text and annotating these boxes with the recognized text and the associated confidence level. Additionally, it prints the detected text and its confidence level to the console. Finally, the function turns off axis labels and ticks using plt.axis('off') and displays the annotated image with plt.show(), thus providing a comprehensive view of the OCR results and their spatial locations within the original image.

In [None]:
# Function to perform OCR and display the results
def process_image(image_filename, positions):
    # Read image
    img = cv2.imread(image_filename)

    # Make sure the image has loaded
    if img is None:
        raise FileNotFoundError(f"Image could not be loaded: {image_filename}")

    # Show image
    plt.figure(figsize=(10, 10))
    plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

    for idx, pos in enumerate(positions):
        x, y, w, h = pos
        # Cut out the region of interest (ROI)
        roi = img[y:y+h, x:x+w]

        # Apply OCR to the cutout area
        results = reader.readtext(roi)

        for res in results:
            # Coordinates of the recognized text relative to the ROI
            (xr1, yr1), (xr2, yr2), (xr3, yr3), (xr4, yr4) = res[0]
            # Absolute coordinates in the original image
            x1, y1 = x + xr1, y + yr1
            x2, y2 = x + xr2, y + yr2
            x3, y3 = x + xr3, y + yr3
            x4, y4 = x + xr4, y + yr4
            # Text and confidence of recognition
            det, conf = res[1], res[2]
            # View results
            plt.plot([x1, x2, x3, x4, x1], [y1, y2, y3, y4, y1], 'r-')
            plt.text(x1, y1 - 10, f'{det} [{round(conf, 2)}]', color='red', fontsize=12)
            print(f'Digit {idx+1}: Text: {det}, Confidence: {conf:.2f}')

    plt.axis('off')
    plt.show()


**Iterating Over Image Files and processing**

This step involves the iterative processing of multiple images (meter0001.jpg and meter0002.jpg). For each image file in the list, the complete path to the image is generated by concatenating the directory path (image_folder) with the image filename (image_file). This path is then used to load and process the specific image. A message is printed to the console to indicate which image is being processed, thereby providing clarity on the current operation. Subsequently, the process_image function is called with the constructed image path and the corresponding positional data retrieved from the positions dictionary for the specific image file. This procedure ensures that each image is processed according to its designated regions of interest, as defined by the positional data.

In [None]:
# Iterate through the image files and process them
for image_file in image_files:
    image_path = os.path.join(image_folder, image_file)
    print(f"Processing image: {image_path}")
    process_image(image_path, positions[image_file])