# 📜 OCR/HTR with Gemini AI

Extract text from **PDFs** and **images** using Google's Gemini AI. Supports both printed documents (OCR) and handwritten manuscripts (HTR) in multiple languages.

In [None]:
# Install required packages
!pip install -q google-genai PyPDF2 Pillow ipywidgets

# Import necessary libraries
import os
import io
import shutil
from pathlib import Path
from google.colab import files
import ipywidgets as widgets
from IPython.display import display, HTML, clear_output
from google import genai
from google.genai import types
from PyPDF2 import PdfReader, PdfWriter
from PIL import Image

# ============================================
# SUPPORTED FILE FORMATS
# ============================================

# Image formats supported by Gemini
SUPPORTED_IMAGE_FORMATS = {'.jpg', '.jpeg', '.png', '.webp', '.heic', '.heif'}
SUPPORTED_PDF_FORMAT = {'.pdf'}
ALL_SUPPORTED_FORMATS = SUPPORTED_IMAGE_FORMATS | SUPPORTED_PDF_FORMAT

# MIME type mapping
MIME_TYPES = {
    '.jpg': 'image/jpeg',
    '.jpeg': 'image/jpeg',
    '.png': 'image/png',
    '.webp': 'image/webp',
    '.heic': 'image/heic',
    '.heif': 'image/heif',
    '.pdf': 'application/pdf'
}

# ============================================
# CREATE FOLDER STRUCTURE
# ============================================

FOLDERS = {
    'input': 'input_files',  # For PDFs and images
    'results': 'results',
    'prompts': 'prompts',
    'log': 'logs'
}

for folder_name, folder_path in FOLDERS.items():
    os.makedirs(folder_path, exist_ok=True)

# ============================================
# CREATE PROMPT FILES
# ============================================

PROMPT_CONTENT = {
    "ocr_universal.md": '''# Universal OCR/HTR System Prompt

You are a high-precision OCR/HTR system engineered to produce research-grade, archival-quality text extraction from any document type in any language. Your output directly supports academic research and archival preservation.

## Core Principles

1. **Research-Grade Accuracy:** TRANSCRIBE every single word and character with absolute precision. Work character by character, word by word, line by line to minimize Character Error Rate (CER) and Word Error Rate (WER).
2. **Historical Authenticity:** PRESERVE the text exactly as written. RETAIN all spelling variations, grammatical structures, and punctuation. DO NOT normalize, modernize, or correct the text.
3. **Systematic Zone Analysis:** IDENTIFY and PROCESS distinct content zones in their precise reading order.
4. **Pure Archival Transcription:** DELIVER exact transcription only – no summarization, interpretation, or omissions.
5. **Multi-Script Support:** HANDLE all writing systems with equal precision.
6. **Mixed Content Processing:** TRANSCRIBE both printed and handwritten text, clearly indicating handwritten sections.

## Reading Direction
- Left-to-right, top-to-bottom for Western documents
- Right-to-left for Arabic, Hebrew, Persian, Urdu
- Appropriate direction for other writing systems

## Text Processing
- EXECUTE de-hyphenation: remove end-of-line hyphens (e.g., `ana-\\nlyse` → `analyse`)
- PRESERVE legitimate compound hyphens (e.g., `arc-en-ciel`)
- REPLICATE all diacritical marks exactly (é, ñ, ü, ç, etc.)
- MERGE semantically linked lines within the same paragraph
- ENFORCE double newline between paragraphs

## Handwritten Text
- MARK handwritten sections: `[HANDWRITTEN: text]`
- USE `[UNCERTAIN: text]` for unclear words
- INDICATE `[ILLEGIBLE]` for unreadable text

## Output Requirements
- DELIVER pure transcribed text only
- EXCLUDE all commentary or explanations
- PRESERVE original language(s) without translation
''',

    "htr_french.md": '''# HTR System Prompt for French Handwritten Documents

You are a high-precision HTR (Handwritten Text Recognition) system specialized in French-language handwritten documents, engineered for research-grade, archival-quality text extraction.

## Core Principles
1. **Research-Grade Accuracy:** TRANSCRIBE every word and character with absolute precision.
2. **Historical Authenticity:** PRESERVE text exactly as written – DO NOT normalize or correct.
3. **Pure Archival Transcription:** DELIVER exact transcription only.

## French-Specific Rules
- IMPLEMENT French spacing: ` : `, ` ; `, ` ! `, ` ? ` (space before punctuation)
- EXECUTE de-hyphenation: `cho-\\nrégraphiques` → `chorégraphiques`
- PRESERVE legitimate compounds: `arc-en-ciel`
- REPLICATE all diacritics: é, è, ê, ë, à, â, ù, û, ô, î, ï, ç, œ, æ

## Text Processing
- MERGE lines within the same paragraph
- ENFORCE double newline (`\\n\\n`) between paragraphs
- MARK uncertain readings with [?]

## Output Requirements
- DELIVER pure transcribed French text only
- EXCLUDE all commentary
- PRESERVE original spelling and grammar exactly
''',

    "htr_arabic.md": '''# HTR System Prompt for Arabic Handwritten Manuscripts

You are a high-precision HTR system specialized in Arabic-language handwritten manuscripts, engineered for research-grade, archival-quality text extraction.

## Core Principles
1. **Research-Grade Accuracy:** TRANSCRIBE every word and character with absolute precision.
2. **Historical Authenticity:** PRESERVE text exactly as written – DO NOT normalize or correct.
3. **Pure Archival Transcription:** DELIVER exact transcription only.

## Arabic-Specific Rules
- READ right-to-left, following traditional manuscript conventions
- REPLICATE all diacritical marks (tashkeel, harakat) when present
- PRESERVE ligatures and connected letter forms
- RESPECT traditional manuscript orthography

## Text Processing
- MERGE lines within the same paragraph
- ENFORCE double newline (`\\n\\n`) between paragraphs
- MARK uncertain readings with [?]

## Output Requirements
- DELIVER pure transcribed Arabic text only
- EXCLUDE all commentary
- PRESERVE original spelling and historical orthography
''',

    "htr_multilingual.md": '''# HTR System Prompt for Multilingual Handwritten Documents

You are a high-precision HTR system specialized in multilingual handwritten documents, engineered for research-grade, archival-quality text extraction.

## Language Detection Protocol
Before transcription, IDENTIFY:
1. **Writing system(s):** Latin, Arabic, Cyrillic, Greek, Hebrew, Chinese, Japanese, Korean, Devanagari, etc.
2. **Language(s):** Based on vocabulary, grammar, diacritics
3. **Text direction:** LTR, RTL, or vertical

## Output Header
BEGIN transcription with:
```
[LANGUAGE DETECTED: <language>]
[WRITING SYSTEM: <script>]
[TEXT DIRECTION: <direction>]
```

## Language-Specific Processing

### Latin Scripts (French, English, Spanish, German, etc.)
- Execute de-hyphenation for end-of-line breaks
- Apply language-specific spacing rules
- Preserve all diacritics (é, ñ, ö, ą, etc.)

### Arabic Script
- Read right-to-left
- Preserve tashkeel when present
- Maintain ligatures and connected forms

### Cyrillic Script
- Preserve hard/soft signs (ъ, ь)
- Maintain special characters (ё, є, і, ї)

### Other Scripts
- Use correct Unicode characters
- Preserve all diacritics and vowel marks

## Output Requirements
- BEGIN with language detection header
- DELIVER pure transcribed text
- NOTE language switches with [LANGUAGE SWITCH: <new language>]
'''
}

# Write prompt files to disk
for filename, content in PROMPT_CONTENT.items():
    filepath = os.path.join(FOLDERS['prompts'], filename)
    with open(filepath, 'w', encoding='utf-8') as f:
        f.write(content)

print("✅ Setup complete!")
print()
print("📁 Folder structure created:")
print(f"   ├── 📂 {FOLDERS['input']}/      ← Upload PDFs or images here")
print(f"   ├── 📂 {FOLDERS['results']}/         ← Output text files")
print(f"   └── 📂 {FOLDERS['prompts']}/         ← System prompts")
print()
print("📄 Available prompts:")
for prompt_name in PROMPT_CONTENT.keys():
    print(f"   • {prompt_name}")
print()
print("🖼️ Supported formats:")
print(f"   • Images: {', '.join(sorted(SUPPORTED_IMAGE_FORMATS))}")
print("   • Documents: .pdf")

## Step 2: Enter Your API Key 🔑

Enter your Google Gemini API key below.

**Recommended:** Add your API key to **Colab Secrets** (🔑 icon in the left sidebar) under the name `GEMINI_API_KEY` so it loads automatically.

**Don't have one?** Get it free at: https://aistudio.google.com/apikey
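
**Running outside Colab?** Colab Secrets will not be available there. Below is a minimal sketch of a fallback, assuming the key is exported as an environment variable called `GEMINI_API_KEY`. The helper name `resolve_api_key` is hypothetical, and its length check only mirrors the rough sanity check this notebook applies to the input box; it does not verify the key with Google.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(typed_value: str = "",
                    env: Optional[Mapping[str, str]] = None,
                    name: str = "GEMINI_API_KEY") -> Optional[str]:
    """Return the first plausible key: the typed value, then the environment.

    Hypothetical helper. "Plausible" only means longer than 20 characters.
    """
    env = os.environ if env is None else env
    for candidate in (typed_value, env.get(name, "")):
        candidate = (candidate or "").strip()
        if len(candidate) > 20:
            return candidate
    return None
```

A return value of `None` means no plausible key was found in either place.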

In [None]:
# ============================================
# API KEY CONFIGURATION
# ============================================
# Best practice: Use Colab Secrets (recommended)
# 1. Click the 🔑 key icon in the left sidebar
# 2. Add a secret named "GEMINI_API_KEY"
# 3. Paste your API key and enable notebook access

api_key_input = widgets.Password(
    placeholder='Paste your API key here (or use Colab Secrets)',
    description='API Key:',
    layout=widgets.Layout(width='500px'),
    style={'description_width': '80px'}
)

api_key_status = widgets.HTML(value="")

def validate_api_key(change):
    if len(change['new']) > 20:
        api_key_status.value = "<span style='color: green;'>✅ API key entered</span>"
    else:
        api_key_status.value = "<span style='color: orange;'>⏳ Please enter your full API key</span>"

api_key_input.observe(validate_api_key, names='value')

# Try to load from Colab Secrets (best practice)
try:
    from google.colab import userdata
    secret_key = userdata.get('GEMINI_API_KEY')
    if secret_key:
        api_key_input.value = secret_key
        api_key_status.value = "<span style='color: green;'>✅ API key loaded from Colab Secrets (recommended)</span>"
except Exception:
    api_key_status.value = "<span style='color: blue;'>💡 Tip: Add GEMINI_API_KEY to Colab Secrets for secure auto-loading</span>"

display(HTML("<b>Enter your Gemini API key:</b>"))
display(api_key_input)
display(api_key_status)
display(HTML("<br>"))
display(HTML("""
<details>
<summary>🔐 <b>How to use Colab Secrets (recommended)</b></summary>
<ol>
<li>Click the 🔑 <b>Secrets</b> icon in the left sidebar</li>
<li>Click <b>+ Add new secret</b></li>
<li>Name: <code>GEMINI_API_KEY</code></li>
<li>Value: Your API key (starts with <code>AIza...</code>)</li>
<li>Toggle <b>Notebook access</b> ON</li>
<li>Re-run this cell to auto-load</li>
</ol>
</details>
"""))

## Step 3: Upload Your Files 📁

Upload **PDF documents** or **image files** (JPG, PNG, WEBP, HEIC).
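
Files are accepted or skipped purely by their extension, compared case-insensitively. Below is a minimal sketch of that check that runs without Colab. `partition_by_extension` is a hypothetical helper, and `SUPPORTED` copies the extension list from the setup cell.

```python
from pathlib import Path

# Same extensions the notebook accepts in its setup cell.
SUPPORTED = {'.jpg', '.jpeg', '.png', '.webp', '.heic', '.heif', '.pdf'}


def partition_by_extension(filenames):
    """Split names into (accepted, rejected) based on the lowercased suffix."""
    accepted, rejected = [], []
    for name in filenames:
        target = accepted if Path(name).suffix.lower() in SUPPORTED else rejected
        target.append(name)
    return accepted, rejected
```

Note that only the name is inspected here, not the file contents, so a mislabeled file would still be accepted.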

In [None]:
# ============================================
# FILE UPLOAD (PDFs and Images)
# ============================================

uploaded_files = []

upload_status = widgets.HTML(value="")

def upload_files(b):
    global uploaded_files
    upload_status.value = "<span style='color: blue;'>📤 Upload dialog opened... Select your file(s)</span>"
    
    try:
        uploaded = files.upload()
        
        if uploaded:
            uploaded_files = []
            valid_files = []
            invalid_files = []
            
            for filename, content in uploaded.items():
                ext = Path(filename).suffix.lower()
                if ext in ALL_SUPPORTED_FORMATS:
                    # Save file to input folder
                    filepath = os.path.join(FOLDERS['input'], filename)
                    with open(filepath, 'wb') as f:
                        f.write(content)
                    uploaded_files.append(filepath)
                    valid_files.append((filename, ext))
                else:
                    invalid_files.append(filename)
            
            status_html = ""
            if valid_files:
                status_html += f"<span style='color: green;'>✅ Uploaded {len(valid_files)} file(s):</span><br>"
                for f, ext in valid_files:
                    icon = "🖼️" if ext in SUPPORTED_IMAGE_FORMATS else "📄"
                    status_html += f"&nbsp;&nbsp;&nbsp;{icon} {f}<br>"
            if invalid_files:
                status_html += f"<span style='color: red;'>❌ Skipped {len(invalid_files)} unsupported file(s):</span><br>"
                for f in invalid_files:
                    status_html += f"&nbsp;&nbsp;&nbsp;⚠️ {f}<br>"
                status_html += f"<br><i>Supported: {', '.join(sorted(ALL_SUPPORTED_FORMATS))}</i>"

            upload_status.value = status_html
        else:
            upload_status.value = "<span style='color: orange;'>⚠️ No files uploaded</span>"
    except Exception as e:
        upload_status.value = f"<span style='color: red;'>❌ Error: {str(e)}</span>"

upload_button = widgets.Button(
    description='📁 Upload PDF or Image Files',
    button_style='primary',
    layout=widgets.Layout(width='280px', height='40px')
)
upload_button.on_click(upload_files)

display(upload_button)
display(upload_status)
display(HTML("<br><i>💡 Supported: PDF, JPG, PNG, WEBP, HEIC</i>"))

## Step 4: OCR/HTR Settings 🎛️

Select the AI model and document type. Use **HTR prompts** for handwritten manuscripts.
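
The model you pick also changes how each request is configured. Below is a rough sketch of that mapping, using plain dictionaries instead of `google-genai` types so it runs without the SDK. `settings_for` is a hypothetical helper, and the values are copied from the OCR engine cell in Step 5.

```python
def settings_for(model_name: str) -> dict:
    """Return the temperature and thinking settings applied to a model id."""
    if "3-pro" in model_name.lower():
        # Gemini 3 Pro is configured with a thinking level
        return {"temperature": 0.1, "thinking": {"thinking_level": "low"}}
    # Other models get a dynamic thinking budget (-1)
    return {"temperature": 0.2, "thinking": {"thinking_budget": -1}}
```

Any model id that does not contain `3-pro` falls through to the Flash-style settings.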

In [None]:
# ============================================
# SETTINGS WIDGETS
# ============================================

# Model selection (Gemini 3 Pro only supports thinking_level, Flash supports budget)
model_dropdown = widgets.Dropdown(
    options=[
        ('Gemini 3 Pro (Highest quality)', 'gemini-3-pro-preview'),
        ('Gemini 2.5 Flash (Faster, good quality)', 'gemini-2.5-flash'),
    ],
    value='gemini-3-pro-preview',
    description='AI Model:',
    style={'description_width': '100px'},
    layout=widgets.Layout(width='450px')
)

# Document type / prompt selection
prompt_dropdown = widgets.Dropdown(
    options=[
        ('🔤 Universal OCR (printed documents, auto-detect language)', 'ocr_universal.md'),
        ('✍️ HTR French (handwritten French manuscripts)', 'htr_french.md'),
        ('✍️ HTR Arabic (handwritten Arabic manuscripts)', 'htr_arabic.md'),
        ('✍️ HTR Multilingual (handwritten, auto-detect language)', 'htr_multilingual.md'),
    ],
    value='ocr_universal.md',
    description='Document Type:',
    style={'description_width': '100px'},
    layout=widgets.Layout(width='500px')
)

# Custom prompt option
use_custom_prompt = widgets.Checkbox(
    value=False,
    description='Use custom prompt instead',
    style={'description_width': 'initial'}
)

custom_prompt_text = widgets.Textarea(
    placeholder='Enter your custom OCR/HTR instructions here...',
    layout=widgets.Layout(width='500px', height='150px'),
    disabled=True
)

def toggle_custom_prompt(change):
    custom_prompt_text.disabled = not change['new']
    prompt_dropdown.disabled = change['new']

use_custom_prompt.observe(toggle_custom_prompt, names='value')

# Model info display
model_info = widgets.HTML(value="")

def update_model_info(change):
    model = change['new']
    if "3-pro" in model:
        model_info.value = """
        <div style='background: #e8f5e9; padding: 10px; border-radius: 5px; margin: 5px 0;'>
        🧠 <b>Gemini 3 Pro</b>: Thinking level LOW, Temperature 0.1<br>
        🖼️ Images: High resolution (1120 tokens/image)<br>
        📄 PDFs: High resolution for best OCR
        </div>
        """
    else:
        model_info.value = """
        <div style='background: #e3f2fd; padding: 10px; border-radius: 5px; margin: 5px 0;'>
        ⚡ <b>Gemini 2.5 Flash</b>: Dynamic thinking, Temperature 0.2<br>
        🖼️ Images: High resolution with Pan & Scan<br>
        📄 PDFs: Optimized for speed
        </div>
        """

model_dropdown.observe(update_model_info, names='value')
update_model_info({'new': model_dropdown.value})

display(HTML("<h3>🤖 Select AI Model</h3>"))
display(model_dropdown)
display(model_info)

display(HTML("<h3>📜 Select Document Type</h3>"))
display(prompt_dropdown)

display(HTML("<br>"))
display(use_custom_prompt)
display(custom_prompt_text)

## Step 5: Start OCR Processing 🚀

Click the button below to start processing your uploaded file(s), whether PDFs or images.
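
PDFs are sent to the model one page at a time, and the page transcriptions are then stitched into a single text. Below is a minimal sketch of that assembly step, mirroring the separator and error-marker format used by `process_pdf` in the cell below. `join_pages` is a hypothetical standalone helper, with `None` standing for a page that failed.

```python
def join_pages(page_texts):
    """Concatenate per-page results in order; None marks a failed page."""
    parts = []
    for number, text in enumerate(page_texts, start=1):
        body = text if text else f"[ERROR: Failed to process page {number}]"
        # The first page has no separator; later pages are introduced by one.
        parts.append(body if number == 1 else f"\n\n--- Page {number} ---\n\n{body}")
    return "".join(parts)
```

Because a failed page leaves a marker rather than a gap, the page numbers in the output stay aligned with the source PDF.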

In [None]:
# ============================================
# OCR/HTR ENGINE
# ============================================

class GeminiOCR:
    """
    A high-precision OCR/HTR system using Google's Gemini model.
    Supports PDFs and image files with optimized settings per model.
    """

    def __init__(self, api_key: str, model_name: str, prompt_file: str = None, custom_prompt: str = None):
        self.client = genai.Client(api_key=api_key)
        self.model_name = model_name
        self.prompt_file = prompt_file
        self.custom_prompt = custom_prompt
        self.generation_config = self._setup_generation_config()
        
    def _get_system_instruction(self):
        """Load system instruction from prompt file or custom prompt."""
        if self.custom_prompt:
            return self.custom_prompt
        
        if self.prompt_file:
            prompt_path = os.path.join(FOLDERS['prompts'], self.prompt_file)
            try:
                with open(prompt_path, 'r', encoding='utf-8') as f:
                    return f.read()
            except Exception as e:
                print(f"❌ Error reading prompt file: {e}")
                raise
        
        # Default fallback
        return "Perform OCR on this document. Extract all text accurately."
        
    def _setup_generation_config(self):
        """Configure generation settings based on model type.
        
        Gemini 3 Pro: Uses thinking_level ('low'), temperature 0.1
        Gemini 2.5 Flash: Uses thinking_budget (dynamic), temperature 0.2
        
        Both use MEDIA_RESOLUTION_HIGH for best OCR quality.
        """
        config_params = {
            "top_p": 0.95,
            "top_k": 40,
            "max_output_tokens": 65535,
            "response_mime_type": "text/plain",
            "system_instruction": self._get_system_instruction(),
            "media_resolution": types.MediaResolution.MEDIA_RESOLUTION_HIGH,
        }
        
        if "3-pro" in self.model_name.lower():
            # Gemini 3 Pro: Use thinking_level (not budget), low temperature
            config_params["temperature"] = 0.1
            config_params["thinking_config"] = types.ThinkingConfig(thinking_level="low")
        else:
            # Flash models: Dynamic thinking, slightly higher temperature
            config_params["temperature"] = 0.2
            config_params["thinking_config"] = types.ThinkingConfig(thinking_budget=-1)
        
        # Safety settings for archival content
        config_params["safety_settings"] = [
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_HARASSMENT,
                threshold=types.HarmBlockThreshold.BLOCK_NONE
            ),
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
                threshold=types.HarmBlockThreshold.BLOCK_NONE
            ),
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
                threshold=types.HarmBlockThreshold.BLOCK_NONE
            ),
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
                threshold=types.HarmBlockThreshold.BLOCK_NONE
            )
        ]
        
        return types.GenerateContentConfig(**config_params)

    def extract_pdf_page(self, pdf_path, page_number):
        """Extract a single page from PDF as bytes."""
        try:
            reader = PdfReader(str(pdf_path))
            writer = PdfWriter()
            writer.add_page(reader.pages[page_number])
            output_buffer = io.BytesIO()
            writer.write(output_buffer)
            output_buffer.seek(0)
            return output_buffer.getvalue()
        except Exception as e:
            print(f"❌ Error extracting page {page_number + 1}: {e}")
            raise

    def get_pdf_page_count(self, pdf_path):
        """Get number of pages in PDF."""
        try:
            reader = PdfReader(str(pdf_path))
            return len(reader.pages)
        except Exception as e:
            print(f"❌ Error reading PDF: {e}")
            raise

    def process_content(self, content_bytes, mime_type, label=""):
        """Process content (PDF page or image) using Gemini API."""
        try:
            print(f"   └─ 📄 Processing {label}...")
            
            content_part = types.Part.from_bytes(
                data=content_bytes,
                mime_type=mime_type
            )
            
            # Simple user prompt - system instruction is in config
            user_prompt = "Please perform complete OCR transcription."
            
            response = self.client.models.generate_content(
                model=self.model_name,
                contents=[content_part, user_prompt],
                config=self.generation_config
            )
            
            if not response.candidates:
                raise Exception("No candidates in response")
            
            candidate = response.candidates[0]
            if not candidate.content or not candidate.content.parts:
                raise Exception(f"No valid response. Finish reason: {candidate.finish_reason}")

            # response.text can be None when no text part is returned, so guard before cleaning
            text_content = (response.text or "").replace('\xa0', ' ').strip()
            if not text_content:
                raise Exception("Empty text response")

            print(f"   └─ ✅ {label} complete")
            return text_content

        except Exception as e:
            print(f"   └─ ❌ {label} failed: {str(e)}")
            return None

    def process_pdf(self, pdf_path):
        """Process entire PDF file."""
        total_pages = self.get_pdf_page_count(pdf_path)
        print(f"   📄 Found {total_pages} page(s)")
        
        full_text = []
        successful_pages = 0
        
        for page_idx in range(total_pages):
            page_num = page_idx + 1
            page_bytes = self.extract_pdf_page(pdf_path, page_idx)
            text = self.process_content(page_bytes, 'application/pdf', f"page {page_num}")
            
            if text:
                if page_num == 1:
                    full_text.append(text)
                else:
                    full_text.append(f"\n\n--- Page {page_num} ---\n\n{text}")
                successful_pages += 1
            else:
                error_msg = f"[ERROR: Failed to process page {page_num}]"
                if page_num == 1:
                    full_text.append(error_msg)
                else:
                    full_text.append(f"\n\n--- Page {page_num} ---\n\n{error_msg}")
        
        return "".join(full_text), successful_pages, total_pages

    def process_image(self, image_path):
        """Process single image file."""
        ext = Path(image_path).suffix.lower()
        mime_type = MIME_TYPES.get(ext, 'image/jpeg')
        
        with open(image_path, 'rb') as f:
            image_bytes = f.read()
        
        text = self.process_content(image_bytes, mime_type, "image")
        return text, 1 if text else 0, 1


# ============================================
# PROCESSING BUTTON AND OUTPUT
# ============================================

ocr_output_area = widgets.Output()
ocr_results = {}

def run_ocr_process(b):
    global ocr_results
    ocr_results = {}
    
    with ocr_output_area:
        clear_output()
        
        # Get API key from widget or secrets
        api_key = api_key_input.value
        
        # If widget is empty, try to load from secrets dynamically
        if not api_key:
            try:
                from google.colab import userdata
                api_key = userdata.get('GEMINI_API_KEY')
                if api_key:
                    print("🔑 Found API key in Colab Secrets")
            except Exception:
                pass
        
        # Validate inputs
        if not api_key or len(api_key) < 20:
            print("❌ Please enter a valid API key in Step 2")
            print("   (Or add 'GEMINI_API_KEY' to Colab Secrets and re-run Step 2)")
            return
        
        if not uploaded_files:
            print("‚ùå Please upload at least one file in Step 3")
            return
        
        # Get settings
        model = model_dropdown.value
        
        # Get prompt
        custom_prompt = None
        prompt_file = None
        if use_custom_prompt.value and custom_prompt_text.value.strip():
            custom_prompt = custom_prompt_text.value.strip()
            print("📝 Using custom prompt")
        else:
            prompt_file = prompt_dropdown.value
            print(f"📝 Using prompt: {prompt_file}")

        print(f"🤖 Model: {model}")
        if "3-pro" in model:
            print("🧠 Thinking: level=LOW | Temperature: 0.1")
        else:
            print("⚡ Thinking: dynamic | Temperature: 0.2")
        print("🖼️ Media resolution: HIGH")
        print("\n" + "="*50)
        
        try:
            # Initialize OCR
            ocr = GeminiOCR(api_key, model, prompt_file, custom_prompt)
            print("✅ Gemini client initialized\n")
            
            # Process each file
            for i, file_path in enumerate(uploaded_files, 1):
                filename = Path(file_path).name
                ext = Path(file_path).suffix.lower()
                
                is_pdf = ext == '.pdf'
                file_type = "PDF" if is_pdf else "Image"
                
                print(f"\n{'📄' if is_pdf else '🖼️'} Processing {file_type} {i}/{len(uploaded_files)}: {filename}")
                print("-" * 40)
                
                try:
                    if is_pdf:
                        final_text, successful, total = ocr.process_pdf(file_path)
                    else:
                        final_text, successful, total = ocr.process_image(file_path)
                    
                    # Save result
                    output_filename = Path(file_path).stem + "_ocr.txt"
                    output_path = os.path.join(FOLDERS['results'], output_filename)
                    
                    with open(output_path, 'w', encoding='utf-8') as f:
                        f.write(f"OCR/HTR Result\n")
                        f.write(f"Source: {filename}\n")
                        f.write(f"Model: {model}\n")
                        f.write(f"Prompt: {prompt_file if prompt_file else 'Custom'}\n")
                        f.write("=" * 50 + "\n\n")
                        f.write(final_text if final_text else "[No text extracted]")
                    
                    ocr_results[output_filename] = {'path': output_path}
                    
                    print(f"\n✅ Complete! ({successful}/{total} {'pages' if is_pdf else 'processed'})")
                    print(f"   📄 Saved to: {output_path}")

                except Exception as e:
                    print(f"\n❌ Error processing {filename}: {str(e)}")

            # Summary
            print("\n" + "="*50)
            print("🎉 PROCESSING COMPLETE!")
            print(f"   Files processed: {len(ocr_results)}")
            print(f"   📁 Output folder: {FOLDERS['results']}/")
            print("\n👇 Download your results in the next step")

        except Exception as e:
            print(f"\n❌ Error: {str(e)}")

ocr_button = widgets.Button(
    description='🚀 Start OCR/HTR Processing',
    button_style='success',
    layout=widgets.Layout(width='250px', height='50px')
)
ocr_button.on_click(run_ocr_process)

display(ocr_button)
display(HTML("<br>"))
display(ocr_output_area)

## Step 6: Download Results 📥

Download the extracted text files to your computer.
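
Each result file starts with a short metadata header followed by a rule of 50 `=` characters, and multi-page PDFs carry `--- Page N ---` separators. If you want to work with individual pages after downloading, here is a minimal sketch for splitting a file's content back apart. `split_result` is a hypothetical helper, and it assumes the transcribed text itself never contains the separator pattern.

```python
import re

HEADER_RULE = "=" * 50 + "\n\n"
PAGE_MARKER = re.compile(r"\n\n--- Page (\d+) ---\n\n")


def split_result(text):
    """Return {page_number: page_text} for the content of one result file."""
    # Drop the metadata header if present; it ends with a rule of 50 "=" signs.
    _, rule, body = text.partition(HEADER_RULE)
    if not rule:
        body = text
    # With one capture group, re.split alternates: text, number, text, ...
    pieces = PAGE_MARKER.split(body)
    pages = {1: pieces[0]}
    for number, page_text in zip(pieces[1::2], pieces[2::2]):
        pages[int(number)] = page_text
    return pages
```

For an image result, which has no separators, the whole transcription comes back as page 1.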

In [None]:
# ============================================
# DOWNLOAD RESULTS
# ============================================

download_output = widgets.Output()

def download_results(b):
    with download_output:
        clear_output()
        
        if not ocr_results:
            print("❌ No results available yet. Please run Step 5 first.")
            return

        print("📥 Preparing downloads...\n")

        for filename, data in ocr_results.items():
            try:
                filepath = data['path']
                print(f"   Downloading: {filename}")
                files.download(filepath)
            except Exception as e:
                print(f"   ⚠️ Could not download {filename}: {e}")

        print("\n✅ Downloads initiated! Check your browser's download folder.")

def download_all_zip(b):
    """Zip and download all results."""
    with download_output:
        clear_output()
        
        results_path = Path(FOLDERS['results'])
        txt_files = list(results_path.glob('*.txt'))
        
        if not txt_files:
            print("❌ No result files found.")
            return

        print(f"📦 Zipping {len(txt_files)} file(s)...")
        shutil.make_archive('ocr_results', 'zip', results_path)

        print("📥 Downloading zip file...")
        files.download('ocr_results.zip')
        print("\n✅ Download initiated!")

download_button = widgets.Button(
    description='📥 Download Latest Results',
    button_style='info',
    layout=widgets.Layout(width='250px', height='40px')
)
download_button.on_click(download_results)

download_zip_button = widgets.Button(
    description='📦 Download All as ZIP',
    button_style='',
    layout=widgets.Layout(width='250px', height='40px')
)
download_zip_button.on_click(download_all_zip)

display(widgets.HBox([download_button, download_zip_button]))
display(HTML(f"<br><i>💡 All results are saved in <code>{FOLDERS['results']}/</code></i>"))
display(download_output)

## Step 7: Cleanup 🧹

Delete uploaded files (PDFs/images) or results when you're done.

In [None]:
cleanup_output = widgets.Output()

def _delete_files(folder):
    """Delete every regular file directly inside `folder`; return the number removed."""
    count = 0
    path = Path(folder)
    if path.exists():
        for f in path.glob('*'):
            if f.is_file():
                f.unlink()
                count += 1
    return count

def cleanup_uploads(b, clear=True):
    """Delete uploaded PDFs and images."""
    global uploaded_files
    with cleanup_output:
        if clear:
            clear_output()
        count = _delete_files(FOLDERS['input'])
        if count > 0:
            print(f"🧹 Deleted {count} uploaded file(s) from {FOLDERS['input']}/")
            uploaded_files = []
        else:
            print(f"📁 Upload folder ({FOLDERS['input']}/) is already empty")

def cleanup_results(b, clear=True):
    """Delete OCR/HTR results."""
    global ocr_results
    with cleanup_output:
        if clear:
            clear_output()
        # Count only files that were actually removed (subfolders are left alone)
        count = _delete_files(FOLDERS['results'])
        if count > 0:
            print(f"🧹 Deleted {count} result file(s)")
            ocr_results = {}
        else:
            print("📁 Results folder is already empty")

def cleanup_all(b):
    """Delete all temporary files."""
    with cleanup_output:
        clear_output()
        # Skip the inner clear_output() calls so both status messages stay visible
        cleanup_uploads(None, clear=False)
        cleanup_results(None, clear=False)
        print("\n✨ All temporary files cleared!")

btn_uploads = widgets.Button(
    description='🗑️ Delete Uploads',
    button_style='warning',
    layout=widgets.Layout(width='180px')
)
btn_res = widgets.Button(
    description='🗑️ Delete Results',
    button_style='warning',
    layout=widgets.Layout(width='180px')
)
btn_all = widgets.Button(
    description='🗑️ Delete Everything',
    button_style='danger',
    layout=widgets.Layout(width='180px')
)

btn_uploads.on_click(cleanup_uploads)
btn_res.on_click(cleanup_results)
btn_all.on_click(cleanup_all)

display(HTML("<b>🧹 Cleanup options:</b>"))
display(widgets.HBox([btn_uploads, btn_res, btn_all]))
display(cleanup_output)

---

### About

**OCR/HTR Colab Notebook** – Extract text from PDFs and images using Google's Gemini AI.

- **OCR:** Optical Character Recognition for printed documents
- **HTR:** Handwritten Text Recognition for manuscripts (French, Arabic, Multilingual)

**ZMO AI Pipelines** created by [Frédérick Madore](https://www.frederickmadore.com/)

Part of the [Leibniz-Zentrum Moderner Orient (ZMO)](https://www.zmo.de/) research tools.