## Image Descriptions with Gemini Vision

Generate detailed text descriptions of the extracted page images using Gemini 2.5 Flash.

**Prerequisites:**
- Run notebook 06-01 first to extract the images
- Google API key set in the `.env` file as `GOOGLE_API_KEY`

**Output:**
- Markdown descriptions saved to `data/rag-data/images_desc/{company}/{document}/page_X.md`
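
The output tree mirrors the image tree. A minimal sketch of that mapping, assuming images are stored as `data/rag-data/images/{company}/{document}/page_X.png` (the layout implied by the processing loop later in this notebook). `description_path_for` is a hypothetical helper, not part of the notebook, and the document name in the example is illustrative only.

```python
from pathlib import Path

IMAGES_DIR = Path("data/rag-data/images")
OUTPUT_DESC_DIR = Path("data/rag-data/images_desc")


def description_path_for(image_path: Path) -> Path:
    """Map an image path to its description path (hypothetical helper)."""
    # Keep the {company}/{document}/page_X part and swap root and suffix
    relative = image_path.relative_to(IMAGES_DIR)
    return (OUTPUT_DESC_DIR / relative).with_suffix(".md")


# Illustrative company and document names
print(description_path_for(IMAGES_DIR / "meta" / "annual-report" / "page_3.png"))
```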

### 1. Setup and Imports

In [1]:
from dotenv import load_dotenv
load_dotenv()

from pathlib import Path
from langchain_google_genai import ChatGoogleGenerativeAI
from PIL import Image
import base64
import io

### 2. Configuration

In [2]:
# Paths
IMAGES_DIR = "data/rag-data/images"
OUTPUT_DESC_DIR = "data/rag-data/images_desc"

# Model configuration
MODEL_NAME = "gemini-2.5-flash"

### 3. Initialize Gemini Model

In [3]:
# Configure Gemini model
model = ChatGoogleGenerativeAI(model=MODEL_NAME)

print(f"✓ Gemini model initialized: {MODEL_NAME}")

✓ Gemini model initialized: gemini-2.5-flash
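
If the API key is missing, the failure would otherwise only surface when the model is first used. A small sketch of an early check that could run right after `load_dotenv()`, assuming the key is stored under `GOOGLE_API_KEY`. `require_env` is a hypothetical helper name.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Calling `require_env("GOOGLE_API_KEY")` before creating the model turns a late authentication error into an immediate, readable one.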


### 4. Description Generation Function

In [4]:
def generate_image_description(image_path: Path) -> str:
    """
    Generate detailed textual description of an image using Gemini Vision.
    
    Args:
        image_path: Path to image file
    
    Returns:
        Detailed description as string
    """
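    # Load the image and re-encode it as PNG in memory;
    # the context manager closes the file handle afterwards
    with Image.open(image_path) as image:
        buffered = io.BytesIO()
        image.save(buffered, format="PNG")

    # Base64-encode the PNG bytes for the data URL
    image_base64 = base64.b64encode(buffered.getvalue()).decode()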
    
    # Create message with image
    from langchain_core.messages import HumanMessage
    
    message = HumanMessage(
        content=[
            {
                "type": "text",
                "text": """Analyze this financial document page image and provide a detailed description.

Focus on:
- Charts and graphs: describe data trends, axes labels, and key insights
- Tables: describe structure and key data points
- Text content: summarize main points
- Visual elements: describe layout and important visual information

Provide a comprehensive description that would help someone understand the content without seeing the image."""
            },
            {
                "type": "image_url",
                "image_url": f"data:image/png;base64,{image_base64}"
            }
        ]
    )
    
    # Generate description
    response = model.invoke([message])
    
    return response.content
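
Because the extracted pages are already PNG files, the Pillow round-trip is optional. A minimal sketch of building the same kind of data URL directly from the file bytes, using only the standard library. `image_to_data_url` is a hypothetical helper, and it assumes the file extension reflects the actual image format.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(image_path: Path) -> str:
    """Encode an image file as a base64 data URL without re-encoding it."""
    # Derive the MIME type from the file extension, e.g. image/png
    mime_type, _ = mimetypes.guess_type(image_path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image type: {image_path}")
    encoded = base64.b64encode(image_path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The trade-off is that the bytes are sent as stored, so a corrupt or mislabeled file is not caught locally the way it would be when Pillow opens it.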

### 5. Process All Images

In [5]:
def process_company_images(company_dir: Path) -> int:
    """Describe every page image of one company; return the number of new descriptions written."""
    company_name = company_dir.name
    desc_count = 0
    
    # Process each document directory
    for doc_dir in company_dir.iterdir():
        if doc_dir.is_dir():
            # Create output directory
            output_dir = Path(OUTPUT_DESC_DIR) / company_name / doc_dir.name
            output_dir.mkdir(parents=True, exist_ok=True)
            
            # Process each image
            for image_file in doc_dir.glob("page_*.png"):
                # Skip if description already exists
                desc_file = output_dir / f"{image_file.stem}.md"
                if desc_file.exists():
                    continue
                
                # Generate description
                try:
                    description = generate_image_description(image_file)
                    desc_file.write_text(description, encoding='utf-8')
                    desc_count += 1
                except Exception as e:
                    print(f"  ✗ Error processing {image_file.name}: {e}")
    
    return desc_count

In [6]:
# Find all company directories
images_path = Path(IMAGES_DIR)
company_dirs = [d for d in images_path.iterdir() if d.is_dir()]

print(f"Found {len(company_dirs)} companies\n")
print("=== Generating Image Descriptions ===\n")

total_descriptions = 0
for idx, company_dir in enumerate(company_dirs, 1):
    print(f"[{idx}/{len(company_dirs)}] {company_dir.name}...", end=" ")
    count = process_company_images(company_dir)
    total_descriptions += count
    print(f"✓ {count} descriptions")

print(f"\nTotal descriptions generated: {total_descriptions}")

Found 5 companies

=== Generating Image Descriptions ===

[1/5] amazon... ✓ 0 descriptions
[2/5] apple... ✓ 0 descriptions
[3/5] google... ✓ 0 descriptions
[4/5] meta... 

Retrying langchain_google_genai.chat_models._chat_with_retry.<locals>._chat_with_retry in 2.0 seconds as it raised DeadlineExceeded: 504 Deadline Exceeded.


✓ 60 descriptions
[5/5] meta10-k... ✓ 0 descriptions

Total descriptions generated: 60
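
The `504 Deadline Exceeded` message above shows the client library retrying a timed-out request on its own. If a page still fails after those retries, the loop only prints the error and moves on. A minimal sketch of an additional retry layer with exponential backoff, under the assumption that a second attempt is worthwhile for any exception. `call_with_backoff` and its parameters are hypothetical and not part of the notebook.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            # Give up and re-raise once the last attempt has failed
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Inside the processing loop, the call could then read `call_with_backoff(lambda: generate_image_description(image_file))`. Since existing descriptions are skipped, rerunning the cell is another way to pick up pages that failed.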


### 6. Verify Output

In [None]:
# Count total description files
desc_path = Path(OUTPUT_DESC_DIR)
total_desc_files = len(list(desc_path.rglob("*.md")))

print("\n=== Summary ===")
print(f"Total description files: {total_desc_files}")
print(f"Output directory: {OUTPUT_DESC_DIR}")
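
Counting files confirms that descriptions were written, but not that every image received one. A short sketch of a coverage check, assuming the description tree mirrors the image tree as in the processing loop. `find_missing_descriptions` is a hypothetical helper.

```python
from pathlib import Path


def find_missing_descriptions(images_dir: Path, desc_dir: Path) -> list[Path]:
    """Return the page images that have no matching .md description."""
    missing = []
    for image_file in sorted(images_dir.rglob("page_*.png")):
        # Same relative path under desc_dir, with the .md suffix
        desc_file = (desc_dir / image_file.relative_to(images_dir)).with_suffix(".md")
        if not desc_file.exists():
            missing.append(image_file)
    return missing
```

Calling `find_missing_descriptions(Path(IMAGES_DIR), Path(OUTPUT_DESC_DIR))` after a run lists the pages that still need a description, for example those that hit an error.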

### 7. Sample Description

In [None]:
# Show a sample description
desc_files = list(desc_path.rglob("*.md"))
if desc_files:
    sample = desc_files[0]
    print(f"Sample description from: {sample.name}\n")
    print(sample.read_text(encoding='utf-8')[:500] + "...")