# Generate Scene Descriptions with Gemini Batch API

---

## How to Run This Notebook

### Prerequisites
1. Google Cloud Project with Vertex AI API enabled
2. GCS Bucket with data uploaded:
   - `gs://harshasekar-comics-data/training_sequences/train_sequences.pkl`
   - `gs://harshasekar-comics-data/raw_panel_images/`
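3. Authentication configured (in Colab, uncomment cell A2; elsewhere, Application Default Credentials, e.g. `gcloud auth application-default login`)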

### Steps

| Step | Action | Time |
|------|--------|------|
| 1 | Run Part A (Setup) | 1 min |
| 2 | Run Part B (Test 5 examples) | 2 min |
| 3 | STOP - Share results with Claude | - |
| 4 | If approved, run Part C (Build shards) | 10 min |
| 5 | Run Part D (Upload & submit) | 5 min |
| 6 | Run the commands printed by D2 in Cloud Shell | 20-24 hrs |
| 7 | Run Part E (Monitor) | - |
| 8 | Run Part F (Download results) | 10 min |
| 9 | Run Part G (Merge & upload) | 5 min |

### Cost
- Test job: ~$0.01
- Full batch: ~$25

---
# PART A: Configuration
---

In [2]:
# A1: Install dependencies (Colab)
# !pip install -q google-cloud-storage google-cloud-aiplatform tqdm
print("Ready")

Ready


In [3]:
# A2: Authenticate (Colab)
# from google.colab import auth
# auth.authenticate_user()
print("Ready")

Ready


In [4]:
# A3: Configuration
from pathlib import Path

PROJECT_ID = "fluent-justice-478703-f8"
LOCATION = "us-central1"
BUCKET = "harshasekar-comics-data"

GCS_SEQUENCES_PATH = "training_sequences/train_sequences.pkl"
GCS_IMAGES_PREFIX = "raw_panel_images"
GCS_BATCH_INPUT = "batch_inputs/scene_descriptions"
GCS_BATCH_OUTPUT = "scene_descriptions/outputs"

SHARD_SIZE = 35000
MODEL = "gemini-2.5-flash-lite"

WORKDIR = Path(".")
SHARDS_DIR = WORKDIR / "scene_desc_shards"
SHARDS_DIR.mkdir(exist_ok=True)

print(f"Project: {PROJECT_ID}")
print(f"Bucket: {BUCKET}")
print(f"Model: {MODEL}")

Project: fluent-justice-478703-f8
Bucket: harshasekar-comics-data
Model: gemini-2.5-flash-lite


In [5]:
# A4: Load sequences
import pickle
from google.cloud import storage

print(f"Loading from gs://{BUCKET}/{GCS_SEQUENCES_PATH}")

client = storage.Client(project=PROJECT_ID)
bucket_obj = client.bucket(BUCKET)
blob = bucket_obj.blob(GCS_SEQUENCES_PATH)

pkl_bytes = blob.download_as_bytes()
sequences = pickle.loads(pkl_bytes)

print(f"Loaded {len(sequences):,} sequences")

Loading from gs://harshasekar-comics-data/training_sequences/train_sequences.pkl
Loaded 249,576 sequences
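
The later cells read a handful of fields from each sequence. As a rough sketch, those accesses can be collected into a small sanity check to run before building requests. The key list is inferred from how this notebook indexes `sequences`, not from a documented schema, and `check_sequence` and the `sample` dict are hypothetical names introduced here for illustration.

```python
from typing import Any

# Keys this notebook reads from each sequence (inferred from later cells,
# not from a documented schema).
REQUIRED_TOP_LEVEL = ("comic_no", "story_idx", "context", "target")
REQUIRED_TARGET = ("image_path", "page_no", "panel_no")


def check_sequence(seq: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one sequence dict (empty if none)."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in seq:
            problems.append(f"missing key: {key}")
    target = seq.get("target", {})
    for key in REQUIRED_TARGET:
        if key not in target:
            problems.append(f"missing target key: {key}")
    for i, panel in enumerate(seq.get("context", [])):
        if "image_path" not in panel:
            problems.append(f"context[{i}] missing image_path")
    return problems


# Hypothetical sample shaped like the fields used in this notebook.
sample = {
    "comic_no": 0,
    "story_idx": 0,
    "context": [{"image_path": f"/data/images/0/2_{i}.jpg"} for i in range(5)],
    "context_texts": ["HELLO"] * 5,
    "target": {"image_path": "/data/images/0/2_5.jpg", "page_no": 2, "panel_no": 5},
    "target_text": "GOODBYE",
}
print(check_sequence(sample))  # an empty list means no problems were found
```

Running a check like this over all sequences before Part C would surface missing fields locally, before any batch job is submitted.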


In [20]:
# A5: Path conversion helper
def delta_path_to_gcs_uri(delta_path: str) -> str:
    path = Path(delta_path)
    comic_no = path.parent.name
    filename = path.name
    return f"gs://{BUCKET}/{GCS_IMAGES_PREFIX}/{comic_no}/{filename}"

# Test
test_path = sequences[0]['context'][0]['image_path']
print(f"Delta: {test_path}")
print(f"GCS:   {delta_path_to_gcs_uri(test_path)}")

Delta: /scratch/bftl/hsekar/comics_project/data/images/0/2_0.jpg
GCS:   gs://harshasekar-comics-data/raw_panel_images/0/2_0.jpg
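
The helper keeps only the last two path components (the comic folder and the file name) and discards the rest of the cluster path. A self-contained sketch of the same mapping follows; it swaps in `PurePosixPath`, since the inputs are POSIX-style strings and no filesystem access is needed. The constants repeat the values from cell A3.

```python
from pathlib import PurePosixPath

BUCKET = "harshasekar-comics-data"      # same value as cell A3
GCS_IMAGES_PREFIX = "raw_panel_images"  # same value as cell A3


def delta_path_to_gcs_uri(delta_path: str) -> str:
    """Map .../<comic_no>/<filename> on the cluster to the matching GCS URI."""
    path = PurePosixPath(delta_path)
    return f"gs://{BUCKET}/{GCS_IMAGES_PREFIX}/{path.parent.name}/{path.name}"


uri = delta_path_to_gcs_uri("/scratch/bftl/hsekar/comics_project/data/images/0/2_0.jpg")
print(uri)
```

Because everything above the comic folder is dropped, the mapping only works if the GCS layout under `raw_panel_images/` mirrors the `<comic_no>/<filename>` layout on the cluster.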


In [21]:
# A6: Prompt template (matches fine-tuning prompt exactly)
SCENE_DESCRIPTION_PROMPT = '''You are looking at 6 consecutive panels from a comic book.

Here is the text from each panel:
{context_dialogue}
Panel 6: {target_dialogue}

Based on what you see in these panels, describe what happens in Panel 6 (the last panel).

Include the scene, any dialogue, and sound effects.

Write your response as a single flowing paragraph. Do not use bullet points, 
numbered lists, bold text, asterisks, or any markdown formatting. 
Weave the dialogue naturally into your description.

'''

print("Prompt template:")
print(SCENE_DESCRIPTION_PROMPT)

Prompt template:
You are looking at 6 consecutive panels from a comic book.

Here is the text from each panel:
{context_dialogue}
Panel 6: {target_dialogue}

Based on what you see in these panels, describe what happens in Panel 6 (the last panel).

Include the scene, any dialogue, and sound effects.

Write your response as a single flowing paragraph. Do not use bullet points, 
numbered lists, bold text, asterisks, or any markdown formatting. 
Weave the dialogue naturally into your description.
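
Cells B2 and C1 both fill this template with the same text-cleaning rules (strip, truncate, and substitute `[No text]` for empty panels). A minimal sketch of those rules as two reusable functions is shown below. The function names and the shortened `TEMPLATE` are illustrative stand-ins, not part of the notebook.

```python
def format_panel_texts(texts, max_chars=400):
    """Build the 'Panel N: ...' block for the context panels."""
    lines = []
    for panel_no, text in enumerate(texts, 1):
        cleaned = (text or "").strip()
        shown = cleaned[:max_chars] if cleaned else "[No text]"
        lines.append(f"Panel {panel_no}: {shown}")
    return "\n".join(lines)


def clean_target_text(text, max_chars=500):
    """Strip and truncate the target panel text, or mark it as empty."""
    cleaned = (text or "").strip()
    return cleaned[:max_chars] if cleaned else "[No text]"


# Abbreviated stand-in for SCENE_DESCRIPTION_PROMPT, for illustration only.
TEMPLATE = "Here is the text from each panel:\n{context_dialogue}\nPanel 6: {target_dialogue}\n"

prompt = TEMPLATE.format(
    context_dialogue=format_panel_texts(["HEADS UP!!", "", None]),
    target_dialogue=clean_target_text("  WHAT TH.. !!?  "),
)
print(prompt)
```

Sharing one pair of helpers between the test loop and the request builder would keep the test prompts and the batch prompts identical by construction.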




---
# PART B: Test Job (5 Examples)
---

**Run this before the full batch to verify output quality!**

In [22]:
# B1: Setup Gemini
import vertexai
from vertexai.generative_models import GenerativeModel, Part
import time

vertexai.init(project=PROJECT_ID, location=LOCATION)
model = GenerativeModel(MODEL)
print(f"Model ready: {MODEL}")

Model ready: gemini-2.5-flash-lite


In [23]:
# B2: Run test
import random

random.seed(42)
test_indices = random.sample(range(len(sequences)), 5)
test_sequences = [sequences[i] for i in test_indices]

print(f"Testing {len(test_sequences)} sequences: {test_indices}\n")

test_results = []

for i, seq in enumerate(test_sequences):
    print(f"[{i+1}/{len(test_sequences)}] Comic {seq['comic_no']}, Story {seq['story_idx']}...")
    
    # Build context dialogue
    context_texts = seq.get('context_texts', [])
    context_parts = []
    for j, text in enumerate(context_texts, 1):
        if text and text.strip():
            context_parts.append(f"Panel {j}: {text.strip()[:400]}")
        else:
            context_parts.append(f"Panel {j}: [No text]")
    context_dialogue = "\n".join(context_parts)
    
    # Target dialogue
    target_dialogue = seq.get('target_text', '') or '[No text]'
    target_dialogue = target_dialogue.strip()[:500] if target_dialogue.strip() else '[No text]'
    
    # Fill prompt
    prompt = SCENE_DESCRIPTION_PROMPT.format(
        context_dialogue=context_dialogue,
        target_dialogue=target_dialogue
    )
    
    # Build image parts
    image_parts = []
    for panel in seq['context']:
        gcs_uri = delta_path_to_gcs_uri(panel['image_path'])
        image_parts.append(Part.from_uri(gcs_uri, mime_type="image/jpeg"))
    target_gcs_uri = delta_path_to_gcs_uri(seq['target']['image_path'])
    image_parts.append(Part.from_uri(target_gcs_uri, mime_type="image/jpeg"))
    
    # Call Gemini
    try:
        response = model.generate_content(
            image_parts + [prompt],
            generation_config={"temperature": 0.3, "top_p": 0.9}
        )
        scene_description = response.text.strip()
        test_results.append({
            'index': test_indices[i],
            'comic_no': seq['comic_no'],
            'story_idx': seq['story_idx'],
            'target_text_ocr': target_dialogue,
            'scene_description': scene_description,
            'status': 'success'
        })
        print(f"   OK")
    except Exception as e:
        test_results.append({
            'index': test_indices[i],
            'comic_no': seq['comic_no'],
            'story_idx': seq['story_idx'],
            'target_text_ocr': target_dialogue,
            'error': str(e),
            'status': 'failed'
        })
        print(f"   ERROR: {e}")
    
    time.sleep(1)

print(f"\nDone: {sum(1 for r in test_results if r['status']=='success')}/{len(test_results)} successful")

Testing 5 sequences: [167621, 29184, 6556, 194393, 72097]

[1/5] Comic 992, Story 0...
   OK
[2/5] Comic 191, Story 3...
   OK
[3/5] Comic 40, Story 0...
   OK
[4/5] Comic 1149, Story 5...
   OK
[5/5] Comic 469, Story 0...
   OK

Done: 5/5 successful


In [25]:
# B3: Display results - COPY THIS OUTPUT AND SHARE WITH CLAUDE
print("="*70)
print("TEST RESULTS")
print("="*70)

for i, result in enumerate(test_results):
    print(f"\n{'─'*70}")
    print(f"TEST {i+1}/5 | Index: {result['index']} | Comic: {result['comic_no']}, Story: {result['story_idx']}")
    print(f"{'─'*70}")
    
    print(f"\nTARGET TEXT (OCR):")
    print(f"  {result['target_text_ocr']}")
    
    if result['status'] == 'success':
        print(f"\nGEMINI SCENE DESCRIPTION:")
        print(f"  {result['scene_description']}")
    else:
        print(f"\nERROR: {result.get('error')}")

print(f"\n{'─'*70}")
print("END OF TEST RESULTS")

TEST RESULTS

──────────────────────────────────────────────────────────────────────
TEST 1/5 | Index: 167621 | Comic: 992, Story: 0
──────────────────────────────────────────────────────────────────────

TARGET TEXT (OCR):
  THE DOLL MAN GOES INTO ACTION... HEADS UP!! WHAT TH.. !!?

GEMINI SCENE DESCRIPTION:
  In Panel 6, the scene shifts to a dramatic moment as the text announces "THE DOLL MAN GOES INTO ACTION...". A man with a shocked expression, his mouth agape, looks upwards. Above him, a superhero figure, presumably the Doll Man, is in mid-air, his cape flowing behind him. The Doll Man shouts "HEADS UP!!", and the shocked man exclaims "WHAT TH.. !!?". There are no sound effects in this panel.

──────────────────────────────────────────────────────────────────────
TEST 2/5 | Index: 29184 | Comic: 191, Story: 3
──────────────────────────────────────────────────────────────────────

TARGET TEXT (OCR):
  BUT A FEW BLOCKS AWAY, AMAY APPEARS AGAIN, AND HAILS A TAXI- TO THE AIRPORT- AND

In [15]:
# B4: Save test results
import json
with open(WORKDIR / "test_results.json", 'w') as f:
    json.dump(test_results, f, indent=2, default=str)
print("Saved to test_results.json")

Saved to test_results.json


---
# STOP - Share test results with Claude before continuing!
---

---
# PART C: Build JSONL Shards
---

In [26]:
# C1: Request builder
import json

def make_scene_request(seq_idx, sequence):
    context_texts = sequence.get('context_texts', [])
    context_parts = []
    for i, text in enumerate(context_texts, 1):
        if text and text.strip():
            context_parts.append(f"Panel {i}: {text.strip()[:400]}")
        else:
            context_parts.append(f"Panel {i}: [No text]")
    context_dialogue = "\n".join(context_parts)
    
    target_dialogue = sequence.get('target_text', '') or '[No text]'
    target_dialogue = target_dialogue.strip()[:500] if target_dialogue.strip() else '[No text]'
    
    prompt = SCENE_DESCRIPTION_PROMPT.format(
        context_dialogue=context_dialogue,
        target_dialogue=target_dialogue
    )
    
    image_parts = []
    for panel in sequence['context']:
        gcs_uri = delta_path_to_gcs_uri(panel['image_path'])
        image_parts.append({"file_data": {"file_uri": gcs_uri, "mime_type": "image/jpeg"}})
    target_gcs_uri = delta_path_to_gcs_uri(sequence['target']['image_path'])
    image_parts.append({"file_data": {"file_uri": target_gcs_uri, "mime_type": "image/jpeg"}})
    
    request_body = {
        "contents": [{"role": "user", "parts": image_parts + [{"text": prompt}]}],
        "generation_config": {"temperature": 0.3, "max_output_tokens": 512, "top_p": 0.9}
    }
    
    custom_id = f"{seq_idx}_{sequence['comic_no']}_{sequence['story_idx']}_{sequence['target']['page_no']}_{sequence['target']['panel_no']}"
    return {"custom_id": custom_id, "request": request_body}

print("Request builder ready")

Request builder ready
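
The `custom_id` string is built here and rebuilt with an identical f-string in G1, so the two must stay in sync for the merge to match. One possible way to guarantee that is a shared helper, sketched below. `make_custom_id` and `seq_idx_from_custom_id` are hypothetical names, and the sample values are made up.

```python
def make_custom_id(seq_idx, sequence):
    """Join the identifying fields with underscores, as cells C1 and G1 do."""
    target = sequence["target"]
    fields = (
        seq_idx,
        sequence["comic_no"],
        sequence["story_idx"],
        target["page_no"],
        target["panel_no"],
    )
    return "_".join(str(value) for value in fields)


def seq_idx_from_custom_id(custom_id):
    """Recover the sequence index, which is always the first field."""
    return int(custom_id.split("_", 1)[0])


# Made-up sample values for illustration.
sample = {"comic_no": 992, "story_idx": 0, "target": {"page_no": 3, "panel_no": 4}}
cid = make_custom_id(7, sample)
print(cid)
print(seq_idx_from_custom_id(cid))
```

Since the sequence index comes first, it can be recovered with a single split even if a later field ever contained an underscore.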


In [27]:
# C2: Create shards
from tqdm import tqdm

print(f"Creating shards ({len(sequences):,} sequences, {SHARD_SIZE:,} per shard)...")

shard_idx = 0
lines_in_shard = 0
shard_paths = []

current_shard_path = SHARDS_DIR / f"shard_{shard_idx:04d}.jsonl"
shard_file = current_shard_path.open("w", encoding="utf-8")

for seq_idx, seq in enumerate(tqdm(sequences)):
    request = make_scene_request(seq_idx, seq)
    shard_file.write(json.dumps(request) + "\n")
    lines_in_shard += 1
    
    if lines_in_shard >= SHARD_SIZE:
        shard_file.close()
        shard_paths.append(current_shard_path)
        shard_idx += 1
        lines_in_shard = 0
        current_shard_path = SHARDS_DIR / f"shard_{shard_idx:04d}.jsonl"
        shard_file = current_shard_path.open("w", encoding="utf-8")

shard_file.close()
if lines_in_shard > 0:
    shard_paths.append(current_shard_path)
else:
    # Sequence count was an exact multiple of SHARD_SIZE: drop the empty trailing shard
    current_shard_path.unlink()

print(f"Created {len(shard_paths)} shards")

Creating shards (249,576 sequences, 35,000 per shard)...


100%|██████████| 249576/249576 [00:21<00:00, 11347.96it/s]

Created 8 shards
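
The shard count follows directly from the two numbers printed above: 249,576 sequences at 35,000 per shard gives seven full shards plus one partial shard. A short sketch of that arithmetic is below; `expected_shard_sizes` is a hypothetical helper, useful for comparing against the line counts of the files actually written.

```python
import math


def expected_shard_sizes(total, shard_size):
    """Return the number of lines each shard should contain, in order."""
    n_shards = math.ceil(total / shard_size)
    sizes = [shard_size] * n_shards
    remainder = total % shard_size
    if remainder:
        sizes[-1] = remainder  # the last shard holds whatever is left over
    return sizes


sizes = expected_shard_sizes(249_576, 35_000)
print(len(sizes))   # 8
print(sizes[-1])    # 4576
```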





---
# PART D: Upload & Submit
---

In [28]:
# D1: Upload shards
from tqdm import tqdm

uploaded_uris = []
for shard_path in tqdm(shard_paths, desc="Uploading"):
    gcs_path = f"{GCS_BATCH_INPUT}/{shard_path.name}"
    blob = bucket_obj.blob(gcs_path)
    blob.upload_from_filename(str(shard_path))
    uploaded_uris.append(f"gs://{BUCKET}/{gcs_path}")

print(f"Uploaded {len(uploaded_uris)} shards")

Uploading: 100%|██████████| 8/8 [00:05<00:00,  1.41it/s]

Uploaded 8 shards





In [29]:
# D2: Generate submission commands
print("BATCH COMMANDS - Run these in Cloud Shell:")
print("="*70 + "\n")

for idx, uri in enumerate(uploaded_uris):
    output_uri = f"gs://{BUCKET}/{GCS_BATCH_OUTPUT}/job_{idx:04d}/"
    print(f"# Batch {idx}")
    print(f"gcloud ai models batch-predict \\")
    print(f"  --model={MODEL} \\")
    print(f"  --project={PROJECT_ID} \\")
    print(f"  --region={LOCATION} \\")
    print(f"  --input-uri={uri} \\")
    print(f"  --output-uri={output_uri}")
    print()

BATCH COMMANDS - Run these in Cloud Shell:

# Batch 0
gcloud ai models batch-predict \
  --model=gemini-2.5-flash-lite \
  --project=fluent-justice-478703-f8 \
  --region=us-central1 \
  --input-uri=gs://harshasekar-comics-data/batch_inputs/scene_descriptions/shard_0000.jsonl \
  --output-uri=gs://harshasekar-comics-data/scene_descriptions/outputs/job_0000/

# Batch 1
gcloud ai models batch-predict \
  --model=gemini-2.5-flash-lite \
  --project=fluent-justice-478703-f8 \
  --region=us-central1 \
  --input-uri=gs://harshasekar-comics-data/batch_inputs/scene_descriptions/shard_0001.jsonl \
  --output-uri=gs://harshasekar-comics-data/scene_descriptions/outputs/job_0001/

# Batch 2
gcloud ai models batch-predict \
  --model=gemini-2.5-flash-lite \
  --project=fluent-justice-478703-f8 \
  --region=us-central1 \
  --input-uri=gs://harshasekar-comics-data/batch_inputs/scene_descriptions/shard_0002.jsonl \
  --output-uri=gs://harshasekar-comics-data/scene_descriptions/outputs/job_0002/

# Bat

In [None]:
# D3: Save script
script_path = WORKDIR / "submit_batches.sh"
with open(script_path, 'w') as f:
    f.write("#!/bin/bash\n")
    for idx, uri in enumerate(uploaded_uris):
        output_uri = f"gs://{BUCKET}/{GCS_BATCH_OUTPUT}/job_{idx:04d}/"
        f.write(f"gcloud ai models batch-predict --model={MODEL} --project={PROJECT_ID} --region={LOCATION} --input-uri={uri} --output-uri={output_uri}\n")
print(f"Saved: {script_path}")
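
A slightly more defensive variant of the script writer is sketched below. It reuses the `gcloud` invocation exactly as printed in D2 (the command itself is not verified here), adds `set -euo pipefail` so the script stops at the first failed submission, and builds each line with `shlex.join` so that arguments are shell-quoted when needed. `build_submit_script` is a hypothetical helper name.

```python
import shlex


def build_submit_script(input_uris, model, project, region, bucket, output_prefix):
    """Return the text of a bash script with one submission command per shard."""
    lines = ["#!/bin/bash", "set -euo pipefail"]
    for idx, uri in enumerate(input_uris):
        output_uri = f"gs://{bucket}/{output_prefix}/job_{idx:04d}/"
        command = [
            "gcloud", "ai", "models", "batch-predict",  # copied from cell D2
            f"--model={model}",
            f"--project={project}",
            f"--region={region}",
            f"--input-uri={uri}",
            f"--output-uri={output_uri}",
        ]
        lines.append(shlex.join(command))
    return "\n".join(lines) + "\n"


script = build_submit_script(
    ["gs://harshasekar-comics-data/batch_inputs/scene_descriptions/shard_0000.jsonl"],
    model="gemini-2.5-flash-lite",
    project="fluent-justice-478703-f8",
    region="us-central1",
    bucket="harshasekar-comics-data",
    output_prefix="scene_descriptions/outputs",
)
print(script)
```

Returning the script as a string keeps the function easy to inspect before anything is written to disk or executed.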

---
# PART E: Monitor
---

In [1]:
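# E1: Check status command
# Needs PROJECT_ID and LOCATION from cell A3. After a kernel restart, re-run
# A3 first; otherwise this cell fails with the NameError shown below.
print("Run in Cloud Shell:")
print(f"gcloud ai batch-prediction-jobs list --project={PROJECT_ID} --region={LOCATION}")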

Run in Cloud Shell:


NameError: name 'PROJECT_ID' is not defined

In [6]:
# E2: Check output files
prefix = f"{GCS_BATCH_OUTPUT}/"
blobs = list(bucket_obj.list_blobs(prefix=prefix))
jobs = set()
for blob in blobs:
    parts = blob.name.split('/')
    if len(parts) >= 3:
        jobs.add(parts[2])
print(f"Found {len(jobs)} job folders")

Found 8 job folders
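
Cell E2 picks the job folder by position (`parts[2]`), which is correct only while `GCS_BATCH_OUTPUT` has exactly two path components. A sketch that derives the folder relative to the prefix instead is shown below. `job_folders` is a hypothetical helper, and the blob names are invented for illustration, not real output file names.

```python
def job_folders(blob_names, prefix):
    """Return the set of first-level folder names found under prefix."""
    jobs = set()
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        remainder = name[len(prefix):]
        folder, sep, _ = remainder.partition("/")
        if sep and folder:  # ignore files that sit directly under the prefix
            jobs.add(folder)
    return jobs


# Invented blob names for illustration.
names = [
    "scene_descriptions/outputs/job_0000/run/predictions.jsonl",
    "scene_descriptions/outputs/job_0000/run/errors.jsonl",
    "scene_descriptions/outputs/job_0001/run/predictions.jsonl",
    "scene_descriptions/outputs/readme.txt",
]
print(sorted(job_folders(names, "scene_descriptions/outputs/")))
```

In the notebook this would be called with `[b.name for b in blobs]` and the same `prefix` variable.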


---
# PART F: Download Results
---

In [7]:
# F1: Parse results
from tqdm import tqdm
import json

prefix = f"{GCS_BATCH_OUTPUT}/"
result_blobs = [b for b in bucket_obj.list_blobs(prefix=prefix) if b.name.endswith('.jsonl')]
print(f"Found {len(result_blobs)} result files")

results = {}
for blob in tqdm(result_blobs, desc="Parsing"):
    content = blob.download_as_text()
    for line in content.strip().split('\n'):
        if not line:
            continue
        try:
            data = json.loads(line)
            custom_id = data.get('custom_id', '')
            candidates = data.get('response', {}).get('candidates', [])
            if candidates:
                parts = candidates[0].get('content', {}).get('parts', [])
                if parts:
                    results[custom_id] = parts[0].get('text', '').strip()
        except Exception:
            # Skip malformed lines; a bare `except:` would also swallow KeyboardInterrupt
            continue

print(f"Parsed {len(results):,} results")

Found 8 result files


Parsing: 100%|██████████| 8/8 [00:16<00:00,  2.12s/it]

Parsed 249,571 results
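
The parse loop silently drops any line it cannot use, which is why 5 of the 249,576 requests are unaccounted for here. A sketch of the same extraction that also counts skipped lines is shown below. The nested `response -> candidates -> content -> parts -> text` shape is taken from cell F1; `extract_text` is a hypothetical helper, and the input lines are synthetic, not real batch output.

```python
import json


def extract_text(line: str):
    """Return (custom_id, text) for one output line, or None if it is unusable."""
    try:
        data = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    candidates = (data.get("response") or {}).get("candidates") or []
    if not candidates:
        return None
    parts = (candidates[0].get("content") or {}).get("parts") or []
    if not parts:
        return None
    return data.get("custom_id", ""), (parts[0].get("text") or "").strip()


# Synthetic lines for illustration only.
good = json.dumps({
    "custom_id": "1_2_3_4_5",
    "response": {"candidates": [{"content": {"parts": [{"text": " A scene. "}]}}]},
})
lines = [good, "not json", json.dumps({"custom_id": "9_9_9_9_9", "response": {}})]

results, skipped = {}, 0
for line in lines:
    parsed = extract_text(line)
    if parsed is None:
        skipped += 1
    else:
        results[parsed[0]] = parsed[1]

print(f"Parsed {len(results)} results, skipped {skipped}")
```

Keeping a skip count (or the skipped `custom_id` values) makes it possible to tell malformed lines apart from requests that returned no candidates.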





---
# PART G: Merge & Upload
---

In [8]:
# G1: Merge
from tqdm import tqdm

matched = 0
for seq_idx, seq in enumerate(tqdm(sequences, desc="Merging")):
    custom_id = f"{seq_idx}_{seq['comic_no']}_{seq['story_idx']}_{seq['target']['page_no']}_{seq['target']['panel_no']}"
    if custom_id in results:
        seq['scene_description'] = results[custom_id]
        matched += 1
    else:
        # No Gemini output for this sequence: fall back to the OCR text
        seq['scene_description'] = seq.get('target_text', '')

print(f"Matched: {matched:,}/{len(sequences):,} ({100*matched/len(sequences):.1f}%)")

Merging: 100%|██████████| 249576/249576 [00:00<00:00, 387783.39it/s]

Matched: 249,571/249,576 (100.0%)
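
The `100.0%` above is a rounding artifact: 5 sequences had no result and received the OCR fallback. The sketch below shows the same ratio at higher precision and one way to list the unmatched IDs. `missing_ids` is a hypothetical helper.

```python
def missing_ids(expected_ids, results):
    """Return the expected IDs that have no entry in results, in order."""
    return [cid for cid in expected_ids if cid not in results]


matched, total = 249_571, 249_576
print(f"{100 * matched / total:.1f}%")  # 100.0% (rounded up)
print(f"{100 * matched / total:.3f}%")  # 99.998%
print(total - matched)                  # 5

# Toy example of finding which IDs fell back to OCR text.
print(missing_ids(["0_a", "1_b", "2_c"], {"0_a": "text", "2_c": "text"}))
```

Saving the list of missing IDs alongside the merged pickle would make it easy to resubmit or exclude those sequences later.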





In [9]:
# G2: Save & upload
import pickle

local_path = WORKDIR / "train_sequences_with_descriptions.pkl"
with open(local_path, 'wb') as f:
    pickle.dump(sequences, f)
print(f"Saved locally: {local_path}")

gcs_path = "training_sequences/train_sequences_with_descriptions.pkl"
blob = bucket_obj.blob(gcs_path)
blob.upload_from_filename(str(local_path))
print(f"Uploaded to: gs://{BUCKET}/{gcs_path}")

Saved locally: train_sequences_with_descriptions.pkl
Uploaded to: gs://harshasekar-comics-data/training_sequences/train_sequences_with_descriptions.pkl


In [10]:
# G3: Show comparison
print("Sample comparison:")
for i in range(3):
    seq = sequences[i]
    print(f"\n--- Sequence {i} ---")
    print(f"OCR: {seq['target_text'][:100]}...")
    print(f"Scene: {seq.get('scene_description', 'N/A')[:200]}...")

Sample comparison:

--- Sequence 0 ---
OCR: I'VE BEEN WORKING ON IT FOR DAYS! I MUST GET A BREATH OF AIR AND CLEAR MY HEAD BEFORE I BEGIN MY TES...
Scene: In Panel 6, the scene shifts to a close-up of a man with a prominent mustache, wearing a blue hat and overalls, and smoking a pipe. He is wiping sweat from his brow with a handkerchief, looking exhaus...

--- Sequence 1 ---
OCR: AH, I FEEL BETTER ALREADY! NOW FOR A BRISK WALK TO CALM MY JANGLED NERVES!...
Scene: In Panel 6, a man with a large mustache and a pipe in his mouth, wearing overalls and a hat, steps out of a doorway onto a porch. He appears to be feeling refreshed, exclaiming, "AH, I FEEL BETTER ALR...

--- Sequence 2 ---
OCR: And WEIGHTLESS WIGGINS, THUG AND PROFESSIONAL SECOND STORY MAN, WHO IS LURKING NEAR GIMMICK'S HOUSE....
Scene: In Panel 6, the scene shifts outdoors to Gimmick's house, where a character named Weightless Wiggins, described as a thug and professional second-story man, is lurking. He observes Gimmick, w

---
# Done!
---

**Output file:** `gs://harshasekar-comics-data/training_sequences/train_sequences_with_descriptions.pkl`

**Next:** Update the fine-tuning notebook to use `scene_description` instead of `target_text`.