<a href="https://colab.research.google.com/github/simon-clematide/colab-notebooks-for-teaching/blob/main/openai_image_description.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Image Description Sample Code

This notebook:

- Reads images from a specified folder
- Generates short descriptions for each image using OpenAI's vision model
- Writes the results to:
  - a JSONL file (canonical output)
  - an Excel file with an embedded thumbnail, filename, full path, and description for human inspection

## Setup


In [1]:
%pip install openai openpyxl pillow



In [2]:
import os
import json
import base64
from pathlib import Path
from typing import List

from openpyxl import Workbook
from openpyxl.drawing.image import Image as ExcelImage
from PIL import Image
from openai import OpenAI

# --- OpenAI API key handling ---

# Option 1: temporary manual paste (DO NOT commit or publish)
API_KEY = None  # e.g. "sk-..."  (leave as None in shared code)

# Option 2: environment variable / Colab secrets (recommended)
try:
    # Available in Google Colab
    from google.colab import userdata

    API_KEY = (
        API_KEY or os.environ.get("OPENAI_API_KEY") or userdata.get("OPENAI_API_KEY")
    )
except ImportError:
    # Local Jupyter / script
    API_KEY = API_KEY or os.environ.get("OPENAI_API_KEY")

if not API_KEY:
    raise RuntimeError(
        "OPENAI_API_KEY not found. "
        "Set it as an environment variable or via Colab userdata."
    )

client = OpenAI(api_key=API_KEY)

## Download Starter Images (Optional)

Download the sample images from Google Drive if you want to use the starter dataset.
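
Google Drive may answer with an HTML page (for example a quota or confirmation notice) instead of the archive itself, in which case extraction fails with a confusing error. A minimal sketch of a guard that could run between download and extraction, using only the standard library; `looks_like_zip` is a hypothetical helper name and not part of this notebook:

```python
import zipfile
from pathlib import Path


def looks_like_zip(path: Path) -> bool:
    """Return True if `path` exists and is recognized as a zip archive."""
    # zipfile.is_zipfile inspects the file contents, not the file extension
    return path.is_file() and zipfile.is_zipfile(path)
```

In this notebook the check could run right after the download, e.g. raising a `RuntimeError` when `looks_like_zip(ZIP_PATH)` is false.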


In [3]:
import zipfile
import requests

# Google Drive file ID and download URL
FILE_ID = "1FE3OrUZrWcL8pH4kYOBbaF4kLIXrMmXO"
ZIP_PATH = Path("images-starter.zip")
EXTRACT_DIR = Path(".")

# Download from Google Drive
print("Downloading images from Google Drive...")
url = f"https://drive.google.com/uc?export=download&id={FILE_ID}"

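# Download the file in chunks; fail early on HTTP errors instead of
# silently writing an error page to disk
response = requests.get(url, stream=True, timeout=60)
response.raise_for_status()
with open(ZIP_PATH, "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)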

print(f"Downloaded {ZIP_PATH.name}")

# Extract the zip file
print(f"Extracting to {EXTRACT_DIR}...")
with zipfile.ZipFile(ZIP_PATH, "r") as zip_ref:
    zip_ref.extractall(EXTRACT_DIR)

# Clean up zip file
ZIP_PATH.unlink()

print(f"✓ Images extracted to {EXTRACT_DIR.absolute()}")

Downloading images from Google Drive...
Downloaded images-starter.zip
Extracting to ....
✓ Images extracted to /content
  Found 3 files


## Input Configuration

Specify the folder containing your images.
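
`Path.glob` does not guarantee any particular order, so the file listing can come out unsorted. If a stable processing order matters (for example, to keep spreadsheet rows in filename order), a sketch like the following could be used; `list_images` is a hypothetical helper name and not part of this notebook:

```python
from pathlib import Path
from typing import Iterable, List


def list_images(folder: Path, extensions: Iterable[str]) -> List[Path]:
    """Return files in `folder` whose suffix is in `extensions`, sorted by path."""
    # Compare suffixes case-insensitively so "IMG.JPG" matches ".jpg"
    allowed = {ext.lower() for ext in extensions}
    return sorted(
        p for p in folder.glob("*") if p.is_file() and p.suffix.lower() in allowed
    )
```

With the names used in this notebook, the call would be `list_images(IMAGE_FOLDER, SUPPORTED_EXTENSIONS)`.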


In [4]:
# Folder containing images to process
IMAGE_FOLDER = Path("./images-starter")  # Change this to your image folder path

# Supported image extensions
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}

# Get list of image files
image_files = [
    f for f in IMAGE_FOLDER.glob("*") if f.suffix.lower() in SUPPORTED_EXTENSIONS
]

print(f"Found {len(image_files)} images in {IMAGE_FOLDER}")
for img in image_files:
    print(f"  - {img.name}")

Found 5 images in images-starter
  - img003.jpg
  - img002.jpg
  - img001.jpg
  - img005.jpg
  - img004.jpg


## Settings for Processing


In [5]:
MODEL_NAME = "gpt-4o-mini"  # Options used here: "gpt-4o", "gpt-4o-mini"
MAX_TOKENS = 150  # Upper bound on tokens in each generated description

## Image Description Function


In [6]:
def encode_image(image_path: Path) -> str:
    """
    Encode image to base64 string for API transmission.

    Args:
        image_path: Path to the image file

    Returns:
        Base64 encoded string of the image
    """
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def resize_image_for_excel(image_path: Path, max_width: int = 200) -> Path:
    """
    Resize image to fit within max_width while maintaining aspect ratio.
    Saves resized image to a temporary file.

    Args:
        image_path: Path to the original image
        max_width: Maximum width in pixels (default: 200)

    Returns:
        Path to the resized temporary image
    """
    # The resized copy is written next to the original under a prefixed name
    temp_path = image_path.parent / f"_temp_resized_{image_path.name}"

    # The context manager closes the source file handle when done
    with Image.open(image_path) as source:
        resized = source
        # Calculate new dimensions maintaining aspect ratio
        width, height = source.size
        if width > max_width:
            ratio = max_width / width
            new_height = int(height * ratio)
            resized = source.resize((max_width, new_height), Image.Resampling.LANCZOS)
        resized.save(temp_path)
    return temp_path


def describe_image(image_path: Path) -> str:
    """
    Generate a short description of an image using OpenAI's vision model.

    Args:
        image_path: Path to the image file

    Returns:
        A concise description of the image
    """
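    import mimetypes  # local import so this cell stays self-contained

    base64_image = encode_image(image_path)
    # Match the data URL's MIME type to the file (PNG, GIF, WebP, ...);
    # fall back to JPEG when the extension is not recognized
    mime_type = mimetypes.guess_type(image_path.name)[0] or "image/jpeg"

    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a helpful assistant that describes images concisely and"
                    " accurately."
                ),
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": (
                            "Provide a short, clear description of this image in 1-2"
                            " sentences."
                        ),
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:{mime_type};base64,{base64_image}"},
                    },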
                ],
            },
        ],
        max_tokens=MAX_TOKENS,
        temperature=0.7,
    )

    # `content` can be None (e.g. on a refusal), so guard before stripping
    return (response.choices[0].message.content or "").strip()

## Prepare Output Files


In [7]:
jsonl_path = Path("image_descriptions.jsonl")
excel_path = Path("image_descriptions.xlsx")

workbook = Workbook()
worksheet = workbook.active
worksheet.title = "Image Descriptions"

worksheet.append(["Image", "Filename", "Full Path", "Description"])
worksheet.row_dimensions[1].height = 20

## Process All Images


In [8]:
temp_files = []  # Track temporary resized images for cleanup

with jsonl_path.open("w", encoding="utf-8") as jsonl_file:
    # Excel rows start at 2 because row 1 holds the header
    for idx, image_path in enumerate(image_files, start=2):
        filename = image_path.name
        print(f"Processing {filename}...")

        try:
            description = describe_image(image_path)

            # --- JSONL (canonical output) ---
            record = {
                "filename": filename,
                "path": str(image_path.absolute()),
                "description": description,
            }
            jsonl_file.write(json.dumps(record, ensure_ascii=False) + "\n")

            # --- Excel (human inspection) ---
            # Append row data (leaving first cell empty for image)
            worksheet.append(["", filename, str(image_path.absolute()), description])

            # Resize and embed image in Excel
            temp_img_path = resize_image_for_excel(image_path, max_width=200)
            temp_files.append(temp_img_path)

            excel_img = ExcelImage(str(temp_img_path))
            cell_address = f"A{idx}"
            worksheet.add_image(excel_img, cell_address)

            # Set row height to accommodate image
            with Image.open(temp_img_path) as img:
                row_height = img.size[1] * 0.75  # Pixels to points (at 96 DPI)
            worksheet.row_dimensions[idx].height = row_height

            print(f"  ✓ {description[:60]}...")

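        except Exception as e:
            print(f"  ✗ Error processing {filename}: {e}")
            if worksheet.max_row < idx:
                # Failure happened before this image's row was written
                worksheet.append(
                    ["", filename, str(image_path.absolute()), f"ERROR: {e}"]
                )
            # Otherwise the row already exists and only the thumbnail step
            # failed; appending again would misalign later thumbnails.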

Processing img003.jpg...
  ✓ The image shows a grid with a series of filled squares (blac...
Processing img002.jpg...
  ✓ The image features a vintage adding machine with a typewrite...
Processing img001.jpg...
  ✓ The image depicts two hockey players engaged in a competitiv...
Processing img005.jpg...
  ✓ The image depicts a war-torn scene with soldiers examining a...
Processing img004.jpg...
  ✓ The image depicts a religious ceremony featuring a figure in...


In [9]:
# Format Excel columns
worksheet.column_dimensions["A"].width = 30  # Image column
worksheet.column_dimensions["B"].width = 25  # Filename column
worksheet.column_dimensions["C"].width = 60  # Path column
worksheet.column_dimensions["D"].width = 80  # Description column

workbook.save(excel_path)

# Clean up temporary resized images
for temp_file in temp_files:
    try:
        temp_file.unlink()
    except OSError:
        pass  # Best-effort cleanup; a leftover temp file is harmless

print("\n✓ Results saved to:")
print(f"  - {jsonl_path.absolute()}")
print(f"  - {excel_path.absolute()}")


✓ Results saved to:
  - /content/image_descriptions.jsonl
  - /content/image_descriptions.xlsx


## Usage Instructions

1. Create a folder (e.g., `./images`) and add your images
2. Update the `IMAGE_FOLDER` variable in cell 4 to point to your folder
3. Run all cells to process the images
4. Check the output files for results

The notebook will generate:

- `image_descriptions.jsonl`: Machine-readable output
- `image_descriptions.xlsx`: Human-readable spreadsheet with a thumbnail, filename, full path, and description per image
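
Because the JSONL file holds one JSON object per line (with the keys `filename`, `path`, and `description` written above), it can be read back with the standard library alone. A minimal sketch; `load_descriptions` is a hypothetical helper name and not part of this notebook:

```python
import json
from pathlib import Path
from typing import Dict, List


def load_descriptions(jsonl_path: Path) -> List[Dict[str, str]]:
    """Read a JSONL file into a list of records, skipping blank lines."""
    records = []
    with jsonl_path.open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

For example, `{r["filename"]: r["description"] for r in load_descriptions(Path("image_descriptions.jsonl"))}` would give a filename-to-description lookup.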
