# Generate Scene PREDICTIONS with Gemini Batch API

---

## ⚠️ KEY DIFFERENCE FROM PREVIOUS NOTEBOOK

**OLD Notebook (08):** Gemini SAW Panel 6 → Described it (DESCRIPTION task)

**THIS Notebook (09):** Gemini does NOT see Panel 6 → Predicts it (PREDICTION task)

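This aligns Gemini's task with LLaVA's. LLaVA never sees Panel 6, so targets written by a model that did see it contained details LLaVA could not infer from the context panels. Generating the targets from Panels 1-5 only removes that training mismatch.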

---

## How to Run This Notebook

### Prerequisites
1. Google Cloud Project with Vertex AI API enabled
2. GCS Bucket with data uploaded:
   - `gs://harshasekar-comics-data/training_sequences/train_sequences.pkl`
   - `gs://harshasekar-comics-data/raw_panel_images/`
3. Authentication configured

### Steps

| Step | Action | Time |
|------|--------|------|
| 1 | Run Part A (Setup) | 1 min |
| 2 | Run Part B (Test 5 examples) | 2 min |
| 3 | STOP - Verify predictions look reasonable | - |
| 4 | If approved, run Part C (Build shards) | 10 min |
| 5 | Run Part D (Upload & submit) | 5 min |
| 6 | Run commands in Cloud Shell | 20-24 hrs |
| 7 | Run Part E (Monitor) | - |
| 8 | Run Part F (Download results) | 10 min |
| 9 | Run Part G (Merge & upload) | 5 min |

### Cost
- Test job: ~$0.01
- Full batch: ~$20-25 (slightly less than before - no Panel 6 image)
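
As a rough sanity check, the full-batch estimate can be divided by the number of requests. The sketch below uses only figures quoted in this notebook (249,576 sequences, ~$20-25 total); actual billing depends on token counts and current pricing, so treat the result as an order-of-magnitude figure.

```python
# Back-of-envelope cost per 1,000 requests, using figures quoted in this notebook.
NUM_REQUESTS = 249_576           # sequences loaded in cell A4
BATCH_COST_USD = (20.0, 25.0)    # estimated full-batch range from the Cost section

cost_per_1k = tuple(round(c / NUM_REQUESTS * 1000, 3) for c in BATCH_COST_USD)
print(f"Estimated cost per 1,000 requests: ${cost_per_1k[0]} to ${cost_per_1k[1]}")
```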

---
# PART A: Configuration
---

In [1]:
# A1: Install dependencies (Colab)
# !pip install -q google-cloud-storage google-cloud-aiplatform tqdm
print("Ready")

Ready


In [2]:
# A2: Authenticate (Colab)
# from google.colab import auth
# auth.authenticate_user()
print("Ready")

Ready


In [3]:
# A3: Configuration
from pathlib import Path

PROJECT_ID = "fluent-justice-478703-f8"
LOCATION = "us-central1"
BUCKET = "harshasekar-comics-data"

GCS_SEQUENCES_PATH = "training_sequences/train_sequences.pkl"
GCS_IMAGES_PREFIX = "raw_panel_images"

# NEW: Different output paths to avoid overwriting old data
GCS_BATCH_INPUT = "batch_inputs/scene_predictions"  # CHANGED
GCS_BATCH_OUTPUT = "scene_predictions/outputs"      # CHANGED

SHARD_SIZE = 35000
MODEL = "gemini-2.5-flash-lite"

WORKDIR = Path(".")
SHARDS_DIR = WORKDIR / "scene_pred_shards"  # CHANGED
SHARDS_DIR.mkdir(exist_ok=True)

print(f"Project: {PROJECT_ID}")
print(f"Bucket: {BUCKET}")
print(f"Model: {MODEL}")
print(f"\n⚠️ This notebook generates PREDICTIONS (Gemini won't see Panel 6)")

Project: fluent-justice-478703-f8
Bucket: harshasekar-comics-data
Model: gemini-2.5-flash-lite

⚠️ This notebook generates PREDICTIONS (Gemini won't see Panel 6)


In [4]:
# A4: Load sequences
import pickle
from google.cloud import storage

print(f"Loading from gs://{BUCKET}/{GCS_SEQUENCES_PATH}")

client = storage.Client(project=PROJECT_ID)
bucket_obj = client.bucket(BUCKET)
blob = bucket_obj.blob(GCS_SEQUENCES_PATH)

pkl_bytes = blob.download_as_bytes()
sequences = pickle.loads(pkl_bytes)

print(f"Loaded {len(sequences):,} sequences")

Loading from gs://harshasekar-comics-data/training_sequences/train_sequences.pkl
Loaded 249,576 sequences


In [5]:
# A5: Path conversion helper
def delta_path_to_gcs_uri(delta_path: str) -> str:
    path = Path(delta_path)
    comic_no = path.parent.name
    filename = path.name
    return f"gs://{BUCKET}/{GCS_IMAGES_PREFIX}/{comic_no}/{filename}"

# Test
test_path = sequences[0]['context'][0]['image_path']
print(f"Delta: {test_path}")
print(f"GCS:   {delta_path_to_gcs_uri(test_path)}")

Delta: /scratch/bftl/hsekar/comics_project/data/images/0/2_0.jpg
GCS:   gs://harshasekar-comics-data/raw_panel_images/0/2_0.jpg


In [6]:
# A6: Prompt template - IDENTICAL TO LLAVA PROMPT
#
# ╔═══════════════════════════════════════════════════════════════════════════╗
# ║  KEY CHANGE: This prompt is now IDENTICAL to what LLaVA sees!             ║
# ║  - Only 5 panels (no Panel 6)                                             ║
# ║  - Task is to PREDICT, not DESCRIBE                                       ║
# ║  - No target_dialogue (we don't know what's in Panel 6)                   ║
# ╚═══════════════════════════════════════════════════════════════════════════╝

SCENE_PREDICTION_PROMPT = '''You are looking at 5 consecutive panels from a comic book.

Here is the text from each panel:
{context_dialogue}

Based on the story so far, predict what happens in the next panel (Panel 6).
Describe the likely scene, characters, actions, and any dialogue.

Write your response as a single flowing paragraph. Do not use bullet points, 
numbered lists, bold text, asterisks, or any markdown formatting. 
Weave the dialogue naturally into your description.

'''

print("Prompt template (PREDICTION - no Panel 6):")
print("="*60)
print(SCENE_PREDICTION_PROMPT)
print("="*60)
print("\n✅ This prompt is IDENTICAL to what LLaVA will see during training and inference.")

Prompt template (PREDICTION - no Panel 6):
You are looking at 5 consecutive panels from a comic book.

Here is the text from each panel:
{context_dialogue}

Based on the story so far, predict what happens in the next panel (Panel 6).
Describe the likely scene, characters, actions, and any dialogue.

Write your response as a single flowing paragraph. Do not use bullet points, 
numbered lists, bold text, asterisks, or any markdown formatting. 
Weave the dialogue naturally into your description.



✅ This prompt is IDENTICAL to what LLaVA will see during training and inference.


---
# PART B: Test Job (5 Examples)
---

**Run this before the full batch to verify output quality!**

The predictions should be:
- Reasonable guesses based on story context
- NOT specific details that can only be known by seeing Panel 6
- Similar in style to what we want LLaVA to produce

In [7]:
# B1: Setup Gemini
import vertexai
from vertexai.generative_models import GenerativeModel, Part
import time

vertexai.init(project=PROJECT_ID, location=LOCATION)
model = GenerativeModel(MODEL)
print(f"Model ready: {MODEL}")

Model ready: gemini-2.5-flash-lite


In [8]:
# B2: Run test - ONLY 5 PANELS, NO PANEL 6!
import random

random.seed(42)
test_indices = random.sample(range(len(sequences)), 5)
test_sequences = [sequences[i] for i in test_indices]

print(f"Testing {len(test_sequences)} sequences: {test_indices}\n")

test_results = []

for i, seq in enumerate(test_sequences):
    print(f"[{i+1}/5] Comic {seq['comic_no']}, Story {seq['story_idx']}...")
    
    # Build context dialogue - ONLY PANELS 1-5
    context_texts = seq.get('context_texts', [])
    context_parts = []
    for j, text in enumerate(context_texts, 1):
        if text and text.strip():
            context_parts.append(f"Panel {j}: {text.strip()[:400]}")
        else:
            context_parts.append(f"Panel {j}: [No text]")
    context_dialogue = "\n".join(context_parts)
    
    # NO target_dialogue - we're predicting!
    
    # Fill prompt - only context_dialogue now
    prompt = SCENE_PREDICTION_PROMPT.format(
        context_dialogue=context_dialogue
    )
    
    # Build image parts - ONLY 5 CONTEXT PANELS, NOT PANEL 6!
    image_parts = []
    for panel in seq['context']:
        gcs_uri = delta_path_to_gcs_uri(panel['image_path'])
        image_parts.append(Part.from_uri(gcs_uri, mime_type="image/jpeg"))
    
    # ❌ REMOVED: target panel image - we don't send Panel 6!
    
    # Call Gemini
    try:
        response = model.generate_content(
            image_parts + [prompt],
            generation_config={"temperature": 0.3, "top_p": 0.9}
        )
        scene_prediction = response.text.strip()
        test_results.append({
            'index': test_indices[i],
            'comic_no': seq['comic_no'],
            'story_idx': seq['story_idx'],
            'actual_target_text': seq.get('target_text', '[No text]'),  # For comparison
            'scene_prediction': scene_prediction,
            'status': 'success'
        })
        print(f"   OK")
    except Exception as e:
        test_results.append({
            'index': test_indices[i],
            'comic_no': seq['comic_no'],
            'story_idx': seq['story_idx'],
            'actual_target_text': seq.get('target_text', '[No text]'),
            'error': str(e),
            'status': 'failed'
        })
        print(f"   ERROR: {e}")
    
    time.sleep(1)

print(f"\nDone: {sum(1 for r in test_results if r['status']=='success')}/5 successful")

Testing 5 sequences: [167621, 29184, 6556, 194393, 72097]

[1/5] Comic 992, Story 0...
   OK
[2/5] Comic 191, Story 3...
   OK
[3/5] Comic 40, Story 0...
   OK
[4/5] Comic 1149, Story 5...
   OK
[5/5] Comic 469, Story 0...
   OK

Done: 5/5 successful
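
The context-dialogue loop in B2 is repeated verbatim in C1. One way to keep the two in sync is a small shared helper. The sketch below is illustrative: `build_context_dialogue` and its `max_chars` parameter are hypothetical names, and the behavior (400-character cap, `[No text]` placeholder) is copied from the inline loop.

```python
def build_context_dialogue(context_texts, max_chars=400):
    """Format per-panel text as 'Panel N: ...' lines, one per context panel.

    Hypothetical helper mirroring the inline loop in cell B2, including the
    400-character cap and the '[No text]' placeholder for empty panels.
    """
    lines = []
    for panel_no, text in enumerate(context_texts, 1):
        cleaned = text.strip() if text else ""
        body = cleaned[:max_chars] if cleaned else "[No text]"
        lines.append(f"Panel {panel_no}: {body}")
    return "\n".join(lines)


# Toy input covering normal text, empty text, None, padded text, and overlong text.
example = build_context_dialogue(["HANDS UP!", "", None, "  Who's there?  ", "x" * 500])
print(example)
```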


In [9]:
# B3: Display results - VERIFY THESE LOOK LIKE REASONABLE PREDICTIONS
print("="*70)
print("TEST RESULTS - PREDICTION STYLE (Gemini did NOT see Panel 6)")
print("="*70)

for i, result in enumerate(test_results):
    print(f"\n{'─'*70}")
    print(f"TEST {i+1}/5 | Index: {result['index']} | Comic: {result['comic_no']}, Story: {result['story_idx']}")
    print(f"{'─'*70}")
    
    print(f"\nACTUAL PANEL 6 TEXT (for reference - Gemini did NOT see this):")
    # Fall back to a placeholder when the target text is missing or None
    actual_text = result.get('actual_target_text') or '[No text]'
    print(f"  {actual_text[:200]}..." if len(actual_text) > 200 else f"  {actual_text}")
    
    if result['status'] == 'success':
        print(f"\nGEMINI PREDICTION (what Gemini thinks happens next):")
        print(f"  {result['scene_prediction']}")
    else:
        print(f"\nERROR: {result.get('error')}")

print(f"\n{'─'*70}")
print("END OF TEST RESULTS")
print("\n⚠️ VERIFY: Predictions should be reasonable guesses, NOT specific details.")
print("   If predictions contain impossible-to-know details, something is wrong.")

TEST RESULTS - PREDICTION STYLE (Gemini did NOT see Panel 6)

──────────────────────────────────────────────────────────────────────
TEST 1/5 | Index: 167621 | Comic: 992, Story: 0
──────────────────────────────────────────────────────────────────────

ACTUAL PANEL 6 TEXT (for reference - Gemini did NOT see this):
  THE DOLL MAN GOES INTO ACTION... HEADS UP!! WHAT TH.. !!?

GEMINI PREDICTION (what Gemini thinks happens next):
  The lights have just come back on after being abruptly turned off, and the scene is now one of chaos and robbery. The man holding the gun, likely the leader of the robbers, is issuing orders to his accomplices, telling them to "take everything that ain't nailed down." The other men present, who were likely gathered for some sort of event or meeting before the lights went out, are now in a state of panic and fear, with some looking around in confusion and others reacting with alarm to the armed robbery. It's possible that Rocky Perrone, mentioned in the first pan

In [10]:
# B4: Save test results
import json
with open(WORKDIR / "test_prediction_results.json", 'w') as f:
    json.dump(test_results, f, indent=2, default=str)
print("Saved to test_prediction_results.json")

Saved to test_prediction_results.json


---
# STOP - Verify predictions look reasonable before continuing!
---

### What to Check:

✅ **Good predictions:** "The characters likely continue their escape. There may be dialogue about their next move..."

❌ **Bad predictions:** "John says 'The code is 7432' while standing next to the BLUE door..." (too specific)

If predictions look reasonable, continue to Part C.
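
The formatting rules in the prompt (single paragraph, no lists, no asterisks) can be checked mechanically before reading each prediction by hand. The sketch below is one possible check. `check_prediction_format` is a hypothetical helper, and it covers only formatting: whether a prediction is "too specific" still needs manual review.

```python
import re


def check_prediction_format(text):
    """Return a list of formatting problems in one prediction (empty list = OK)."""
    stripped = text.strip()
    if not stripped:
        return ["empty prediction"]
    problems = []
    # The prompt asks for a single flowing paragraph.
    if re.search(r"\n\s*\n", stripped):
        problems.append("multiple paragraphs")
    # Bullet points or numbered lists at the start of a line.
    if re.search(r"^\s*(?:[-*•]\s+|\d+[.)]\s+)", stripped, flags=re.MULTILINE):
        problems.append("list formatting")
    # Asterisks cover both bold and italic markdown.
    if "*" in stripped:
        problems.append("asterisks / markdown emphasis")
    return problems


print(check_prediction_format("The hero ducks behind the crate and shouts a warning."))
print(check_prediction_format("- The hero ducks\n- The villain fires"))
```

In the notebook, this could be applied to each `result['scene_prediction']` in `test_results` whose status is `'success'`.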

---
# PART C: Build JSONL Shards
---

In [13]:
# C1: Request builder - ONLY 5 PANELS, NO PANEL 6!
import json

def make_prediction_request(seq_idx, sequence):
    """Create a batch request with ONLY 5 context panels (no Panel 6)."""
    
    # Build context dialogue - ONLY PANELS 1-5
    context_texts = sequence.get('context_texts', [])
    context_parts = []
    for i, text in enumerate(context_texts, 1):
        if text and text.strip():
            context_parts.append(f"Panel {i}: {text.strip()[:400]}")
        else:
            context_parts.append(f"Panel {i}: [No text]")
    context_dialogue = "\n".join(context_parts)
    
    # NO target_dialogue - we're predicting!
    
    # Fill prompt - only context_dialogue
    prompt = SCENE_PREDICTION_PROMPT.format(
        context_dialogue=context_dialogue
    )
    
    # Build image parts - ONLY 5 CONTEXT PANELS!
    image_parts = []
    for panel in sequence['context']:
        gcs_uri = delta_path_to_gcs_uri(panel['image_path'])
        image_parts.append({"file_data": {"file_uri": gcs_uri, "mime_type": "image/jpeg"}})
    
    # ❌ REMOVED: target panel image
    
    request_body = {
        "contents": [{"role": "user", "parts": image_parts + [{"text": prompt}]}],
        "generation_config": {"temperature": 0.3, "max_output_tokens": 512, "top_p": 0.9}
    }
    
    custom_id = f"{seq_idx}_{sequence['comic_no']}_{sequence['story_idx']}_{sequence['target']['page_no']}_{sequence['target']['panel_no']}"
    return {"custom_id": custom_id, "request": request_body}

print("Request builder ready")
print("⚠️ Each request contains ONLY 5 images (no Panel 6)")

Request builder ready
⚠️ Each request contains ONLY 5 images (no Panel 6)
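
Because the whole point of this notebook is that Panel 6 is never sent, it may be worth asserting the image count on built requests before writing shards. The sketch below assumes the request layout produced by `make_prediction_request`; `count_parts` is a hypothetical helper and the toy request uses placeholder URIs, not real bucket paths.

```python
def count_parts(batch_line):
    """Count image and text parts in one batch request line (a dict as built in C1)."""
    parts = batch_line["request"]["contents"][0]["parts"]
    n_images = sum(1 for p in parts if "file_data" in p)
    n_text = sum(1 for p in parts if "text" in p)
    return n_images, n_text


# Toy request with placeholder URIs (illustrative only).
toy_line = {
    "custom_id": "0_0_0_1_6",
    "request": {
        "contents": [{
            "role": "user",
            "parts": [
                {"file_data": {"file_uri": f"gs://example-bucket/p{i}.jpg",
                               "mime_type": "image/jpeg"}}
                for i in range(5)
            ] + [{"text": "prompt goes here"}],
        }]
    },
}

assert count_parts(toy_line) == (5, 1), "expected 5 context images and 1 text prompt"
```

In the notebook, the same assertion could be run on `make_prediction_request(0, sequences[0])`.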


In [12]:
# C2: Create shards
from tqdm import tqdm

print(f"Creating shards ({len(sequences):,} sequences, {SHARD_SIZE:,} per shard)...")

shard_idx = 0
lines_in_shard = 0
shard_paths = []

current_shard_path = SHARDS_DIR / f"shard_{shard_idx:04d}.jsonl"
shard_file = current_shard_path.open("w", encoding="utf-8")

for seq_idx, seq in enumerate(tqdm(sequences)):
    request = make_prediction_request(seq_idx, seq)  # CHANGED function name
    shard_file.write(json.dumps(request) + "\n")
    lines_in_shard += 1
    
    if lines_in_shard >= SHARD_SIZE:
        shard_file.close()
        shard_paths.append(current_shard_path)
        shard_idx += 1
        lines_in_shard = 0
        current_shard_path = SHARDS_DIR / f"shard_{shard_idx:04d}.jsonl"
        shard_file = current_shard_path.open("w", encoding="utf-8")

shard_file.close()
if lines_in_shard > 0:
    shard_paths.append(current_shard_path)
else:
    # Sequence count was an exact multiple of SHARD_SIZE: remove the empty trailing shard
    current_shard_path.unlink()

print(f"Created {len(shard_paths)} shards")

Creating shards (249,576 sequences, 35,000 per shard)...


100%|██████████| 249576/249576 [00:21<00:00, 11770.67it/s]

Created 8 shards
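
The shard count follows directly from the sequence count and `SHARD_SIZE`. As a quick cross-check of the output above, using the values from A3 and A4:

```python
import math

NUM_SEQUENCES = 249_576   # from the A4 output
SHARD_SIZE = 35_000       # from the A3 configuration

num_shards = math.ceil(NUM_SEQUENCES / SHARD_SIZE)
last_shard_size = NUM_SEQUENCES - (num_shards - 1) * SHARD_SIZE
print(f"{num_shards} shards, last shard holds {last_shard_size:,} requests")
```

This gives 8 shards, with the final shard holding the 4,576 requests left over after seven full shards.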


---
# PART D: Upload & Submit
---

In [14]:
# D1: Upload shards
from tqdm import tqdm

uploaded_uris = []
for shard_path in tqdm(shard_paths, desc="Uploading"):
    gcs_path = f"{GCS_BATCH_INPUT}/{shard_path.name}"
    blob = bucket_obj.blob(gcs_path)
    blob.upload_from_filename(str(shard_path))
    uploaded_uris.append(f"gs://{BUCKET}/{gcs_path}")

print(f"Uploaded {len(uploaded_uris)} shards")

Uploading: 100%|██████████| 8/8 [00:05<00:00,  1.38it/s]

Uploaded 8 shards


In [15]:
# D2: Generate Cloud Shell commands
print("=" * 70)
print("RUN THESE COMMANDS IN GOOGLE CLOUD SHELL")
print("=" * 70)
print()

for i, uri in enumerate(uploaded_uris):
    output_uri = f"gs://{BUCKET}/{GCS_BATCH_OUTPUT}/shard_{i:04d}/"
    print(f"# Shard {i}")
    print(f"""curl -X POST \\
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \\
  -H "Content-Type: application/json" \\
  https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{LOCATION}/batchPredictionJobs \\
  -d '{{
    "displayName": "scene-predictions-shard-{i:04d}",
    "model": "publishers/google/models/{MODEL}",
    "inputConfig": {{
      "instancesFormat": "jsonl",
      "gcsSource": {{
        "uris": ["{uri}"]
      }}
    }},
    "outputConfig": {{
      "predictionsFormat": "jsonl",
      "gcsDestination": {{
        "outputUriPrefix": "{output_uri}"
      }}
    }}
  }}'""")
    print()
    print("sleep 2")
    print()

RUN THESE COMMANDS IN GOOGLE CLOUD SHELL

# Shard 0
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/fluent-justice-478703-f8/locations/us-central1/batchPredictionJobs \
  -d '{
    "displayName": "scene-predictions-shard-0000",
    "model": "publishers/google/models/gemini-2.5-flash-lite",
    "inputConfig": {
      "instancesFormat": "jsonl",
      "gcsSource": {
        "uris": ["gs://harshasekar-comics-data/batch_inputs/scene_predictions/shard_0000.jsonl"]
      }
    },
    "outputConfig": {
      "predictionsFormat": "jsonl",
      "gcsDestination": {
        "outputUriPrefix": "gs://harshasekar-comics-data/scene_predictions/outputs/shard_0000/"
      }
    }
  }'

sleep 2

# Shard 1
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1
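
An alternative to interpolating the JSON body into an f-string is to build it as a dict and serialize it with `json.dumps`, which avoids the doubled braces and handles quoting. The sketch below reuses the field names from the D2 commands; `build_batch_job_body` is a hypothetical helper, not part of any Google library.

```python
import json


def build_batch_job_body(shard_no, model, input_uri, output_prefix):
    """Build the batchPredictionJobs request body as a dict (hypothetical helper)."""
    return {
        "displayName": f"scene-predictions-shard-{shard_no:04d}",
        "model": f"publishers/google/models/{model}",
        "inputConfig": {
            "instancesFormat": "jsonl",
            "gcsSource": {"uris": [input_uri]},
        },
        "outputConfig": {
            "predictionsFormat": "jsonl",
            "gcsDestination": {"outputUriPrefix": output_prefix},
        },
    }


body = build_batch_job_body(
    0,
    "gemini-2.5-flash-lite",
    "gs://harshasekar-comics-data/batch_inputs/scene_predictions/shard_0000.jsonl",
    "gs://harshasekar-comics-data/scene_predictions/outputs/shard_0000/",
)
payload = json.dumps(body, indent=2)
print(payload)
```

The serialized payload could be saved to a file and passed to curl with `-d @<file>` in place of the inline body.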

---
# PART E: Monitor Jobs
---

In [None]:
# E1: Check status command
print("Run in Cloud Shell:")
print(f"gcloud ai batch-prediction-jobs list --project={PROJECT_ID} --region={LOCATION}")

In [None]:
# E2: Check output files
prefix = f"{GCS_BATCH_OUTPUT}/"
jobs = set()
for blob in bucket_obj.list_blobs(prefix=prefix):
    # The first path component below the output prefix is the per-shard folder
    relative = blob.name[len(prefix):]
    if '/' in relative:
        jobs.add(relative.split('/', 1)[0])
print(f"Found {len(jobs)} job folders")

---
# PART F: Download Results
---

In [6]:
# F1: Parse results
from tqdm import tqdm
import json

prefix = f"{GCS_BATCH_OUTPUT}/"
result_blobs = [b for b in bucket_obj.list_blobs(prefix=prefix) if b.name.endswith('.jsonl')]
print(f"Found {len(result_blobs)} result files")

results = {}
for blob in tqdm(result_blobs, desc="Parsing"):
    content = blob.download_as_text()
    for line in content.strip().split('\n'):
        if not line:
            continue
        try:
            data = json.loads(line)
            custom_id = data.get('custom_id', '')
            candidates = data.get('response', {}).get('candidates', [])
            if candidates:
                parts = candidates[0].get('content', {}).get('parts', [])
                if parts:
                    results[custom_id] = parts[0].get('text', '').strip()
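        except (json.JSONDecodeError, AttributeError, TypeError) as e:
            # Skip malformed lines, but report them instead of failing silently
            print(f"Skipping unparseable line in {blob.name}: {e}")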

print(f"Parsed {len(results):,} results")

Found 8 result files


Parsing: 100%|██████████| 8/8 [00:16<00:00,  2.06s/it]

Parsed 249,571 results
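
Five of the 249,576 requests produced no parsed result, and the loop in F1 does not record why. The sketch below shows one way to parse a single output line and return a reason on failure. It assumes the same layout F1 reads (a top-level `custom_id` plus `response.candidates[0].content.parts`); `extract_prediction` is a hypothetical helper. Unlike F1, it joins all text parts rather than taking only the first.

```python
import json


def extract_prediction(line):
    """Parse one batch output line into (custom_id, text, error)."""
    try:
        data = json.loads(line)
    except json.JSONDecodeError as exc:
        return None, None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, None, "not a JSON object"
    custom_id = data.get("custom_id", "")
    candidates = (data.get("response") or {}).get("candidates") or []
    if not candidates:
        return custom_id, None, "no candidates"
    parts = (candidates[0].get("content") or {}).get("parts") or []
    text = "".join(p.get("text", "") for p in parts).strip()
    if not text:
        return custom_id, None, "empty text"
    return custom_id, text, None


# Toy lines (illustrative only, not real batch output).
ok_line = json.dumps({
    "custom_id": "7_1_0_2_3",
    "response": {"candidates": [{"content": {"parts": [{"text": " A "}, {"text": "B"}]}}]},
})
print(extract_prediction(ok_line))
print(extract_prediction(json.dumps({"custom_id": "x", "response": {}})))
```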


---
# PART G: Merge & Upload
---

In [7]:
# G1: Merge - saves as 'scene_prediction' (not 'scene_description')
from tqdm import tqdm

matched = 0
for seq_idx, seq in enumerate(tqdm(sequences, desc="Merging")):
    custom_id = f"{seq_idx}_{seq['comic_no']}_{seq['story_idx']}_{seq['target']['page_no']}_{seq['target']['panel_no']}"
    if custom_id in results:
        seq['scene_prediction'] = results[custom_id]  # NEW KEY NAME
        matched += 1
    else:
        seq['scene_prediction'] = None

print(f"Matched: {matched:,}/{len(sequences):,} ({100*matched/len(sequences):.1f}%)")

Merging: 100%|██████████| 249576/249576 [00:00<00:00, 360200.20it/s]

Matched: 249,571/249,576 (100.0%)
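
The rounded percentage hides the 5 unmatched sequences. To list them for inspection or a retry, something like the following could be used. `find_unmatched` is a hypothetical helper, shown here on toy data rather than the real `sequences` list.

```python
def find_unmatched(seqs):
    """Return indices of sequences whose 'scene_prediction' is missing, None, or empty."""
    return [i for i, seq in enumerate(seqs) if not seq.get("scene_prediction")]


# Toy data (illustrative only).
toy_sequences = [
    {"comic_no": 1, "scene_prediction": "The hero leaps across the gap."},
    {"comic_no": 1, "scene_prediction": None},
    {"comic_no": 2},
]
print(find_unmatched(toy_sequences))
```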


In [8]:
# G2: Save & upload - NEW FILENAME
import pickle

local_path = WORKDIR / "train_sequences_with_predictions.pkl"  # CHANGED
with open(local_path, 'wb') as f:
    pickle.dump(sequences, f)
print(f"Saved locally: {local_path}")

gcs_path = "training_sequences/train_sequences_with_predictions.pkl"  # CHANGED
blob = bucket_obj.blob(gcs_path)
blob.upload_from_filename(str(local_path))
print(f"Uploaded to: gs://{BUCKET}/{gcs_path}")

Saved locally: train_sequences_with_predictions.pkl
Uploaded to: gs://harshasekar-comics-data/training_sequences/train_sequences_with_predictions.pkl


In [9]:
# G3: Show comparison
print("Sample comparison:")
print("="*70)
for i in range(3):
    seq = sequences[i]
    print(f"\n--- Sequence {i} ---")
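    # 'scene_prediction' is None for unmatched sequences, so fall back before slicing
    print(f"ACTUAL Panel 6 text: {(seq.get('target_text') or '[No text]')[:100]}...")
    print(f"PREDICTED (Gemini):  {(seq.get('scene_prediction') or 'N/A')[:200]}...")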
    print()

Sample comparison:

--- Sequence 0 ---
ACTUAL Panel 6 text: I'VE BEEN WORKING ON IT FOR DAYS! I MUST GET A BREATH OF AIR AND CLEAR MY HEAD BEFORE I BEGIN MY TES...
PREDICTED (Gemini):  The next panel will likely show the "Weightless Wiggins" character, the inventor from panels 3-5, testing his new contraption. He might be holding the device he just finished, perhaps a spring-loaded ...


--- Sequence 1 ---
ACTUAL Panel 6 text: AH, I FEEL BETTER ALREADY! NOW FOR A BRISK WALK TO CALM MY JANGLED NERVES!...
PREDICTED (Gemini):  The villain, Weightless Wiggins, having just completed his device, steps outside for some fresh air, perhaps to the rooftop where Plastic Man is currently struggling to understand why Wiggins isn't fa...


--- Sequence 2 ---
ACTUAL Panel 6 text: And WEIGHTLESS WIGGINS, THUG AND PROFESSIONAL SECOND STORY MAN, WHO IS LURKING NEAR GIMMICK'S HOUSE....
PREDICTED (Gemini):  The inventor, having just completed his mysterious contraption and feeling the need for some fresh 

---
# Done!
---

## Output File

**NEW:** `gs://harshasekar-comics-data/training_sequences/train_sequences_with_predictions.pkl`

This file contains:
- `scene_prediction`: Gemini's PREDICTION of Panel 6 (without seeing it)

## Key Difference from Previous File

| Old File | New File |
|----------|----------|
| `train_sequences_with_descriptions.pkl` | `train_sequences_with_predictions.pkl` |
| `scene_description` (Gemini SAW Panel 6) | `scene_prediction` (Gemini did NOT see Panel 6) |
| Specific details | Reasonable predictions |
| Task mismatch with LLaVA | Task ALIGNED with LLaVA |

## Next Step

Update fine-tuning notebook to use:
- File: `train_sequences_with_predictions.pkl`
- Field: `scene_prediction` (instead of `scene_description`)
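
Since `scene_prediction` is `None` for the few unmatched sequences, the fine-tuning notebook will need to skip them. A minimal sketch of that filter, assuming the sequence dicts keep the `context_texts` key used above. `build_training_pairs` is a hypothetical helper and the data below is a toy example.

```python
def build_training_pairs(seqs, field="scene_prediction"):
    """Yield (index, context_texts, target) for sequences with a usable prediction."""
    for idx, seq in enumerate(seqs):
        target = seq.get(field)
        if target and target.strip():
            yield idx, seq.get("context_texts", []), target.strip()


# Toy data (illustrative only).
toy = [
    {"context_texts": ["A", "B", "C", "D", "E"], "scene_prediction": " The chase continues. "},
    {"context_texts": ["F", "G", "H", "I", "J"], "scene_prediction": None},
]
pairs = list(build_training_pairs(toy))
print(pairs)
```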