In [4]:
from pathlib import Path

import torch
from diffusers import StableDiffusionPipeline
from gtts import gTTS
from transformers import pipeline

# ========== SETUP ==========
print("🔧 Loading models...")
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

# Use half precision on GPU; fall back to full precision on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
text2image = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=dtype
).to(device)

# Create output directory
output_dir = Path("output_story")
output_dir.mkdir(exist_ok=True)

# ========== INPUT ==========
story_text = """
INT. DARK ROOM - NIGHT

A single light bulb flickers above. A detective sits at a desk full of photos and maps, eyes tired.

SFX: Thunder in the distance.

Suddenly, the door creaks open.

INT. FOREST - MORNING

The camera pans across a foggy forest. Birds chirp faintly in the background.

A figure in a long coat walks between the trees, leaving footprints in the wet grass.
"""

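# ========== PROCESSING ==========
print("✂️ Splitting script into scenes...")
# Split on blank lines. Note: every paragraph becomes its own "scene",
# including headings ("INT. DARK ROOM - NIGHT") and "SFX:" cues.
scenes = [s.strip() for s in story_text.strip().split("\n\n") if s.strip()]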

print(f"🧩 Total scenes detected: {len(scenes)}")

for idx, scene in enumerate(scenes, start=1):
    print(f"\n🎬 Scene {idx}: {scene[:60]}...")

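    # Summarize to focus the prompt for the image.
    # Caveat: chunks shorter than min_length (20 tokens) force the model to pad
    # the summary with repeated or invented text (see the warnings in the output).
    summary = summarizer(scene, max_length=50, min_length=20, do_sample=False)[0]["summary_text"]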
    print(f"📝 Prompt: {summary}")

    # Generate image
    print("🖼 Generating image...")
    image = text2image(summary).images[0]
    image_path = output_dir / f"scene_{idx:02}.png"
    image.save(image_path)

    # Generate speech (gTTS calls Google Translate's TTS service, so this needs network access)
    print("🔊 Generating speech...")
    tts = gTTS(scene)
    audio_path = output_dir / f"scene_{idx:02}.mp3"
    tts.save(str(audio_path))

print(f"\n✅ All scenes generated and saved in '{output_dir}' folder!")


🔧 Loading models...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Device set to use cpu


Your max_length is set to 50, but your input_length is only 9. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=4)


✂️ Splitting script into scenes...
🧩 Total scenes detected: 7

🎬 Scene 1: INT. DARK ROOM - NIGHT...
📝 Prompt:  The Darkest Room is set to be set in the middle of the night . It is set in a dark room with dark lighting, dark lighting and dark lighting .
🖼 Generating image...


🔊 Generating speech...


Your max_length is set to 50, but your input_length is only 25. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=12)



🎬 Scene 2: A single light bulb flickers above. A detective sits at a de...
📝 Prompt:  A detective sits at a desk full of photos and maps, eyes tired and eyes tired . A single light bulb flickers above .
🖼 Generating image...


🔊 Generating speech...


Your max_length is set to 50, but your input_length is only 10. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=5)



🎬 Scene 3: SFX: Thunder in the distance....
📝 Prompt:  SFX: Thunder in the distance. SFX : Thunder is thundering in the background of the scene . SFX is Thunder .
🖼 Generating image...


Your max_length is set to 50, but your input_length is only 10. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=5)


🔊 Generating speech...

🎬 Scene 4: Suddenly, the door creaks open....
📝 Prompt:  Suddenly, the door creaks open. Suddenly, a doorcrackles open. The door is opened. Suddenly the door is filled with people .
🖼 Generating image...


🔊 Generating speech...

🎬 Scene 5: INT. FOREST - MORNING...


Your max_length is set to 50, but your input_length is only 9. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=4)


📝 Prompt:  The Forest is a weekly feature of our daily newspaper, including iReporter's weekly Travel Snapshots . Use this weekly Newsquiz to test your knowledge of stories you saw on CNN.com .
🖼 Generating image...


🔊 Generating speech...


Your max_length is set to 50, but your input_length is only 20. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=10)



🎬 Scene 6: The camera pans across a foggy forest. Birds chirp faintly i...
📝 Prompt:  The camera pans across a foggy forest . Birds chirp faintly in the background .
🖼 Generating image...


Your max_length is set to 50, but your input_length is only 20. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=10)


🔊 Generating speech...

🎬 Scene 7: A figure in a long coat walks between the trees, leaving foo...
📝 Prompt:  A figure in a long coat walks between the trees, leaving footprints in the wet grass .
🖼 Generating image...


🔊 Generating speech...

✅ All scenes generated and saved in 'output_story' folder!
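
The run above reports 7 scenes for a script that contains only two, because each blank-line-separated paragraph, including the headings, is processed separately. The headings are only 9 tokens long, so the summarizer fills its 20-token minimum with unrelated text (for example the "weekly Newsquiz" prompt for Scene 5). One possible alternative, sketched below, groups paragraphs under the preceding scene heading and builds the image prompt and the narration directly, without a summarizer. The helper names `group_scenes`, `image_prompt`, and `narration` are hypothetical, and the sketch assumes that headings start with `INT.` or `EXT.` and that sound cues start with `SFX:`.

```python
import re

# Assumption: scene headings (slug lines) begin with "INT." or "EXT.".
SLUG_LINE = re.compile(r"^(INT|EXT)\.\s", re.IGNORECASE)

sample_script = """
INT. DARK ROOM - NIGHT

A single light bulb flickers above. A detective sits at a desk full of photos and maps, eyes tired.

SFX: Thunder in the distance.

Suddenly, the door creaks open.

INT. FOREST - MORNING

The camera pans across a foggy forest. Birds chirp faintly in the background.

A figure in a long coat walks between the trees, leaving footprints in the wet grass.
"""


def is_sound_cue(paragraph):
    """Assumption: sound cues are paragraphs that start with 'SFX:'."""
    return paragraph.upper().startswith("SFX:")


def group_scenes(script):
    """Group blank-line-separated paragraphs under the preceding heading."""
    scenes = []
    for block in script.strip().split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if SLUG_LINE.match(block):
            # A heading opens a new scene.
            scenes.append({"heading": block, "body": []})
        elif scenes:
            scenes[-1]["body"].append(block)
        else:
            # Text before the first heading goes into an untitled scene.
            scenes.append({"heading": "", "body": [block]})
    return scenes


def image_prompt(scene):
    """Heading plus the visual paragraphs; sound cues are left out."""
    visual = [p for p in scene["body"] if not is_sound_cue(p)]
    return " ".join([scene["heading"], *visual]).strip()


def narration(scene):
    """Text to read aloud: action paragraphs only, no heading or sound cues."""
    return " ".join(p for p in scene["body"] if not is_sound_cue(p))


scenes = group_scenes(sample_script)
print(len(scenes))  # 2 for the sample script above
for scene in scenes:
    print(image_prompt(scene))
```

With this grouping, the loop in the cell would call `text2image(image_prompt(scene))` and `gTTS(narration(scene))` once per scene. Whether the heading helps or hurts the generated image is something to check empirically.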
