# Eksperimen 1: Full Pipeline (OCR + Gemini LLM)

Notebook ini menjalankan pipeline lengkap:
1.  **OCR (EasyOCR)**: Mengubah gambar tulisan tangan menjadi teks mentah.
2.  **LLM Correction (Gemini)**: Memperbaiki typo dan struktur menggunakan Google Gemini.


In [7]:
!pip install easyocr google-generativeai pandas

Collecting google-generativeai
  Downloading google_generativeai-0.8.6-py3-none-any.whl.metadata (3.9 kB)
Collecting google-ai-generativelanguage==0.6.15 (from google-generativeai)
  Downloading google_ai_generativelanguage-0.6.15-py3-none-any.whl.metadata (5.7 kB)
Collecting google-api-core (from google-generativeai)
  Downloading google_api_core-2.29.0-py3-none-any.whl.metadata (3.3 kB)
Collecting google-api-python-client (from google-generativeai)
  Downloading google_api_python_client-2.187.0-py3-none-any.whl.metadata (7.0 kB)
Collecting google-auth>=2.15.0 (from google-generativeai)
  Downloading google_auth-2.47.0-py3-none-any.whl.metadata (6.4 kB)
Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-ai-generativelanguage==0.6.15->google-generativeai)
  Downloading proto_plus-1.27.0-py3-none-any.whl.metadata (2.2 kB)
Collecting protobuf (from google-generativeai)
  Downloading protobuf-5.29.5-cp310-abi3-win_amd64.whl.metadata (592 bytes)
Collecting googleapis-common-protos<2.0.0,

In [8]:
import easyocr
import os
import glob
import time
import pandas as pd
import google.generativeai as genai

# --- CONFIGURATION ---
GEMINI_API_KEY = "AIzaSyBkHjbWnXjvya8LKYwngKA-gExoseuQVuM"
DATASET_DIR = r'F:\projek dosen\tutoring\Agentic Multimodal Tutor - SLL\playwithOCR\dataset\test'
IMAGES_DIR = os.path.join(DATASET_DIR, 'images')
GT_DIR = os.path.join(DATASET_DIR, 'gt')
PROMPT_FILE = "prompt_correction.txt"

genai.configure(api_key=GEMINI_API_KEY)
print("Gemini Configured!")

print("Loading EasyOCR...")
reader = easyocr.Reader(['en', 'id'], gpu=True)
print("EasyOCR Loaded!")


All support for the `google.generativeai` package has ended. It will no longer be receiving 
updates or bug fixes. Please switch to the `google.genai` package as soon as possible.
See README for more details:

https://github.com/google-gemini/deprecated-generative-ai-python/blob/main/README.md

  import google.generativeai as genai
Neither CUDA nor MPS are available - defaulting to CPU. Note: This module is much faster with a GPU.


Gemini Configured!
Loading EasyOCR...
EasyOCR Loaded!


In [9]:
def read_ground_truth(filename_base):
    gt_path = os.path.join(GT_DIR, f"{filename_base}.txt")
    if os.path.exists(gt_path):
        with open(gt_path, 'r', encoding='utf-8') as f:
            return f.read().strip()
    return ""

def run_ocr_phase():
    image_files = glob.glob(os.path.join(IMAGES_DIR, "*.jpg")) + \
                  glob.glob(os.path.join(IMAGES_DIR, "*.jpeg")) + \
                  glob.glob(os.path.join(IMAGES_DIR, "*.png"))
    
    print(f"Found {len(image_files)} images.")
    results = []
    
    for img_path in image_files:
        filename = os.path.basename(img_path)
        filename_base = os.path.splitext(filename)[0]
        print(f"OCR Processing: {filename}...")
        
        start_t = time.time()
        raw_list = reader.readtext(img_path, detail=0)
        time_taken = time.time() - start_t
        
        ocr_text = " ".join(raw_list)
        gt_text = read_ground_truth(filename_base)
        
        results.append({
            'filename': filename,
            'ocr_raw': ocr_text,
            'ground_truth': gt_text,
            'ocr_time': time_taken
        })
        
    return pd.DataFrame(results)

# Step 1: Run OCR
df = run_ocr_phase()
df.to_csv("ocr_stage1_results.csv", index=False)
print("OCR Stage Done. Saved to ocr_stage1_results.csv")

Found 11 images.
OCR Processing: if4908_103012500097_nomor1.jpg...




OCR Processing: if4908_103012500098_nomor1.jpg...




OCR Processing: if4908_103012500281_nomor1.jpg...




OCR Processing: if4908_103012500305_nomor1.jpg...




OCR Processing: if4908_103012500322_nomor1.jpg...




OCR Processing: if4908_103012530052_nomor1.jpg...




OCR Processing: if4910_103012500004_nomor1.jpg...




OCR Processing: if4911_103012500384_nomor1.jpg...




OCR Processing: if4909_103012500132_nomor1.jpeg...




OCR Processing: if4909_103012530074_nomor1.jpeg...




OCR Processing: if4910_103012500367_nomor1.png...




OCR Stage Done. Saved to ocr_stage1_results.csv


In [10]:
def run_llm_phase(df_input):
    # Load Prompt Template
    with open(PROMPT_FILE, 'r', encoding='utf-8') as f:
        template = f.read()
        
    model = genai.GenerativeModel('gemini-pro')
    
    refined_texts = []
    llm_times = []
    
    print("Starting LLM Correction...")
    for index, row in df_input.iterrows():
        ocr_text = row['ocr_raw']
        filename = row['filename']
        
        # Construct Prompt
        prompt = template.replace("{OCR_TEXT}", str(ocr_text))
        
        print(f"LLM Processing: {filename}...")
        start_t = time.time()
        try:
            response = model.generate_content(prompt)
            refined_text = response.text.strip()
        except Exception as e:
            print(f"Error LLM: {e}")
            refined_text = "[ERROR LLM]"
            
        time_taken = time.time() - start_t
        
        refined_texts.append(refined_text)
        llm_times.append(time_taken)
        
    df_input['llm_refined'] = refined_texts
    df_input['llm_time'] = llm_times
    return df_input

# Step 2: Run LLM
df_final = run_llm_phase(df)
df_final.to_csv("ocr_llm_final_results.csv", index=False)
print("Full Pipeline Done! Saved to ocr_llm_final_results.csv")

Starting LLM Correction...
LLM Processing: if4908_103012500097_nomor1.jpg...
Error LLM: 404 models/gemini-pro is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.
LLM Processing: if4908_103012500098_nomor1.jpg...
Error LLM: 404 models/gemini-pro is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.
LLM Processing: if4908_103012500281_nomor1.jpg...
Error LLM: 404 models/gemini-pro is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.
LLM Processing: if4908_103012500305_nomor1.jpg...
Error LLM: 404 models/gemini-pro is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.
LLM P

In [11]:
# Display Comparison
for index, row in df_final.head().iterrows():
    print(f"--- {row['filename']} ---")
    print("[OCR RAW]:", row['ocr_raw'][:100], "...")
    print("[LLM FIX]:", row['llm_refined'][:100], "...")
    print("[GT]:     ", row['ground_truth'][:100], "...")
    print("\n")

--- if4908_103012500097_nomor1.jpg ---
[OCR RAW]: progran Enkripsi Ku Mus M, Jvhı , d3 ,/4 , P ,4 inte{ K 1 Iaeq ( algontma M: Iooo ( = M < : 999 [apu ...
[LLM FIX]: [ERROR LLM] ...
[GT]:      program Enkripsi
kamus
   	m,d1,d2,d3,d4,p,q : integer
   	k : integer
algortitma
   	m = 1000 <= m  ...


--- if4908_103012500098_nomor1.jpg ---
[OCR RAW]: program Enkripsi Karvs 8ı , 81 , J4, 84 inleger P 1 , k : inlege Iyair?. inf ( J1, Ja, J1, J 4 P, 1) ...
[LLM FIX]: [ERROR LLM] ...
[GT]:      program enkripsi
kamus
        d1, d2, d3, d4: integer
        p, q, k: integer
algoritma
        in ...


--- if4908_103012500281_nomor1.jpg ---
[OCR RAW]: program Posyandu Vumuj N,w;h 1n+9w (E 2 , Oilai bMi 1 : [2ui qlyoritmg Japut ( n] FOc i . ^ Lu do Io ...
[LLM FIX]: [ERROR LLM] ...
[GT]:      program enkripsi
kamus
	d1, d2, d3, d4, p, q, k : integer

algoritma
       	input(d1, d2, d3, d4, p ...


--- if4908_103012500305_nomor1.jpg ---
[OCR RAW]: Prodran Erkrlora kamus dı /  da, 43 , d4, Fı 9 ;In