<h1 style="text-align: center;"><strong>Zero-shot-OCR</strong> :</h1>

## An optical character recognition (OCR) system that recognizes and extracts text from images without prior training on specific fonts, styles, or contexts.

![](http://imgur.com/CdE5O84.gif)

<a id="1"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(60, 121, 245) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 1. Importing Some Libraries / Dependencies </b></div>

In [None]:
!pip install verovio
!pip install tiktoken

In [None]:
import os
import pandas as pd
from transformers import AutoModel, AutoTokenizer
import matplotlib.pyplot as plt
import re

<a id="2"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(255, 217, 19) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 2. Initializing the Tokenizer and Model </b></div>

<h1 style="text-align: center;"><strong>General OCR Theory</strong> :</h1>

## Towards OCR-2.0 via a Unified End-to-end Model by stepfun-ai
![jzE1yg9.png](https://imgur.com/jzE1yg9.png)

<h1 style="text-align: center;"><strong>MODEL ARCHITECTURE</strong> :</h1>

![dWJtgvA.png](https://imgur.com/dWJtgvA.png)

In [None]:
tokenizer = AutoTokenizer.from_pretrained('ucaslcl/GOT-OCR2_0', 
                                          trust_remote_code=True)

model = AutoModel.from_pretrained('ucaslcl/GOT-OCR2_0', 
                                  trust_remote_code=True, 
                                  low_cpu_mem_usage=True, 
                                  device_map='cuda', 
                                  use_safetensors=True, 
                                  pad_token_id=tokenizer.eos_token_id)

model = model.eval().cuda()

<a id="3"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(31, 193, 27) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 3. Loading Files </b></div>

In [None]:
base_dir = '/kaggle/input/ai-of-god-3/Public_data/test_images'

In [None]:
submission_df = pd.read_excel("/kaggle/input/ai-of-god-3/Public_data/submission.csv.xlsx")

<a id="4"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(255, 156, 85) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 4. A List to Store Submission Data </b></div>

In [None]:
submission_data = []

<a id="5"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(82, 15, 70) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 5. A Function to apply OCR to a given image </b></div>

In [None]:
def apply_ocr(image_path):
    res = model.chat(tokenizer, image_path, ocr_type='ocr')
    return res


<a id="6"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(200, 13, 12) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 6. A Function to sort image files based on numeric value in the name </b></div>

In [None]:
def natural_sort_key(s):
    return [int(text) if text.isdigit() else text for text in re.split(r'(\d+)', s)]
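
To see why the key above is useful, here is a small sketch comparing plain string sorting with natural sorting. The filenames are made up for illustration and are not taken from the dataset.

```python
import re

def natural_sort_key(s):
    # Split into digit and non-digit runs; digit runs are compared as integers.
    return [int(text) if text.isdigit() else text for text in re.split(r'(\d+)', s)]

# Hypothetical filenames, for illustration only.
files = ['L_10.png', 'L_2.png', 'L_1.png']

plain_order = sorted(files)                          # character-by-character comparison
natural_order = sorted(files, key=natural_sort_key)  # numeric-aware comparison

print(plain_order)    # ['L_1.png', 'L_10.png', 'L_2.png']
print(natural_order)  # ['L_1.png', 'L_2.png', 'L_10.png']
```

With plain sorting, `L_10.png` lands before `L_2.png` because the strings are compared one character at a time; the natural key compares `10` and `2` as numbers instead.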

<a id="7"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(69, 13, 12) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(34, 34, 34);"> <b> 7. Predicting Texts from the Test Images </b></div>

1. **Looping Through Folders:** The code iterates over each folder in the main directory (`base_dir`) in sorted order.

2. **Identifying Page Number:** It extracts the page number from each folder name (e.g., 'Page_1' becomes '1').

3. **Checking for Directories:** It verifies that the current item is a folder.

4. **Looping Through Images:** Inside each folder, the code loops through all PNG images, sorted by line number with natural sorting, because the images in our `test_images` folder are not listed in sequential order.

5. **Extracting Line Number:** It retrieves the line number from the image filename (e.g., 'L_1.png' gives '1').

6. **Formatting Image ID:** The code formats each image ID as `P_{page number}_L_{line number}` (Example: `P_1_L_1`).

7. **Applying OCR:** It applies OCR to extract the text from each image, then runs the result through LanguageTool's Spanish checker to correct spelling and grammar mistakes.

8. **Storing Results:** The formatted image ID and predicted text are appended to the `submission_data` list created in section 4.

9. **Displaying Images:** For verification, it displays the first few images along with their predicted text.
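
Steps 2, 5 and 6 above can be sketched as one small pure function. The name `build_image_id` is hypothetical and only used here for illustration; the notebook itself does the same string splitting inline.

```python
def build_image_id(folder_name, image_file):
    # 'Page_1' -> '1': take whatever follows the last underscore.
    page_number = folder_name.split('_')[-1]
    # 'L_2.png' -> '2': take the part after the last underscore, then drop the extension.
    line_number = image_file.split('_')[-1].split('.')[0]
    return f'P_{page_number}_L_{line_number}'

example_id = build_image_id('Page_1', 'L_2.png')
print(example_id)  # P_1_L_2
```

This assumes the folder and file names follow the 'Page_1' and 'L_1.png' patterns described in the steps; names with a different layout would need different parsing.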



In [None]:
!pip install language-tool-python

import language_tool_python

# Initialize the tool for Spanish
tool = language_tool_python.LanguageTool('es')

In [None]:
for folder in sorted(os.listdir(base_dir)):
    folder_path = os.path.join(base_dir, folder)
    page_number = folder.split('_')[-1]
    
    if os.path.isdir(folder_path):
        for image_file in sorted(os.listdir(folder_path), key=natural_sort_key):
            if image_file.endswith('.png'):  
                image_path = os.path.join(folder_path, image_file)
                line_number = image_file.split('_')[-1].split('.')[0]
                formatted_image_id = f'P_{page_number}_L_{line_number}'
                predicted_text = apply_ocr(image_path)

                # Keep the raw OCR output, which may contain spelling or grammar mistakes
                incorrect_text = predicted_text
                
                # Check the raw text against Spanish rules and apply the suggested corrections
                matches = tool.check(incorrect_text)
                predicted_text = language_tool_python.utils.correct(incorrect_text, matches)
                                
                submission_data.append({'unique id': formatted_image_id, 'prediction': predicted_text})
                
                print(f"Processed {formatted_image_id}: {predicted_text}")
                if len(submission_data) <= 10:  # Show the first 10 predicted images
                    img = plt.imread(image_path)
                    plt.imshow(img)
                    plt.axis('off')
                    plt.show()

<a id="8"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(20, 13, 121) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 8. Creating a Submission file </b></div>

In [None]:
for index, row in submission_df.iterrows():
    matching_prediction = next((pred for pred in submission_data if pred['unique id'] == row['unique id']), None)
    if matching_prediction:
        submission_df.at[index, 'prediction'] = matching_prediction['prediction']  
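
The row-by-row loop above scans `submission_data` once per row. As an alternative sketch, the same fill can be done with a dictionary lookup and `Series.map`. The helper name `fill_predictions` and the toy data are hypothetical; only the column names `unique id` and `prediction` come from the notebook.

```python
import pandas as pd

def fill_predictions(df, predictions):
    # Build an id -> text lookup once instead of scanning the list for every row.
    lookup = {p['unique id']: p['prediction'] for p in predictions}
    mapped = df['unique id'].map(lookup)
    result = df.copy()
    # Rows without a match keep whatever value they already had.
    result['prediction'] = mapped.fillna(df['prediction'])
    return result

# Toy data, for illustration only.
toy_df = pd.DataFrame({
    'unique id': ['P_1_L_1', 'P_1_L_2', 'P_2_L_1'],
    'prediction': ['', '', 'keep'],
})
toy_predictions = [
    {'unique id': 'P_1_L_1', 'prediction': 'hola'},
    {'unique id': 'P_1_L_2', 'prediction': 'mundo'},
]

filled = fill_predictions(toy_df, toy_predictions)
print(filled)
```

One difference from the loop: if an ID appeared twice in the predictions, the dictionary would keep the last entry, whereas `next(...)` picks the first.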


In [None]:
submission_df.rename(columns={'unique id': 'unique Id'}, inplace=True)

In [None]:
submission_df.to_csv('submission.csv', index=False)

print("submission file created successfully!")

In [None]:
submission_df

# <div style="box-shadow: rgba(240, 46, 170, 0.4) -5px 5px inset, rgba(240, 46, 170, 0.3) -10px 10px inset, rgba(240, 46, 170, 0.2) -15px 15px inset, rgba(240, 46, 170, 0.1) -20px 20px inset, rgba(240, 46, 170, 0.05) -25px 25px inset; padding:20px; font-size:30px; font-family: consolas; display:fill; border-radius:15px; color: rgba(240, 46, 170, 0.7)"> <b> 💻 Thank You!</b></div>

<p style="font-family:verdana; color:rgb(34, 34, 34); font-family: consolas; font-size: 16px;"> If you enjoyed this zero-shot OCR notebook, please upvote it. Happy coding! 🚀💻🌟 <br>
    </p>