<p align="center">
<b>Gen AI</b><br>

</p>


In [None]:
import torch
import numpy as np
import cv2
import os
import random
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from google.colab import drive

# Mount Google Drive and create the output directory
drive.mount('/content/drive')
output_dir = "/content/drive/MyDrive/generated_videos"
os.makedirs(output_dir, exist_ok=True)

# Models used
models = [
    "damo-vilab/text-to-video-ms-1.7b",
    "cerspense/zeroscope_v2_576w"
]

# Basic prompts
prompts = [
    "Early school kids playing football during break time",
    "A person scrolling on a phone in cafe with coffee",
    "Old lady watering garden grass in the courtyard",
    "A playful puppy running across a grassy field, chasing butterflies.",
    "A 3D animation showing how the human heart pumps blood through the body."
]

# Style modifiers appended at random to each base prompt
style_modifiers = [
    "cinematic wide shot",
    "close-up view",
    "from a drone camera perspective",
    "with dramatic lighting",
    "in the morning sunlight",
    "in a cozy environment",
    "hyper-realistic style",
    "soft natural colors",
    "dynamic camera movement",
    "digital art style"
]

def expand_prompt(base_prompt):
    extras = random.sample(style_modifiers, k=random.randint(1, 3))
    return f"{base_prompt}, " + ", ".join(extras)

# Target video duration in seconds
target_duration_sec = 10

# Generate videos
video_count = 0
for model_index, model_id in enumerate(models):
    print(f"\nLoading model: {model_id}")

    if "zeroscope" in model_id:
        pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    else:
        pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, variant="fp16")

    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    pipe.enable_model_cpu_offload()

    if model_index == 0:
        selected_prompts = prompts[:3]
    else:
        selected_prompts = prompts[3:]

    for prompt in selected_prompts:
        video_count += 1
        final_prompt = expand_prompt(prompt)

        print(f"\nGenerating video {video_count}")
        print(f"Base prompt: {prompt}")
        print(f"Expanded prompt: {final_prompt}\n")

        fps = random.randint(8, 12)
        seed = random.randint(0, 999999)
        generator = torch.manual_seed(seed)

        target_frames = fps * target_duration_sec

        result = pipe(
            final_prompt,
            num_inference_steps=50,
            num_frames=target_frames,
            generator=generator
        )

        video_frames_batches = result.frames
        all_frames = [frame for batch in video_frames_batches for frame in batch]

        corrected_frames = []
        for frame in all_frames:
            if isinstance(frame, torch.Tensor):
                frame = frame.cpu().numpy()
            if frame.ndim == 2:
                frame = np.stack([frame]*3, axis=-1)
            elif frame.ndim == 3 and frame.shape[2] == 1:
                frame = np.concatenate([frame]*3, axis=2)
            elif frame.ndim == 3 and frame.shape[2] in [3, 4]:
                pass
            else:
                raise ValueError(f"Unsupported frame shape: {frame.shape}")
            if frame.dtype != np.uint8:
                frame = (frame * 255).clip(0, 255).astype(np.uint8)
            corrected_frames.append(frame)

        height, width, _ = corrected_frames[0].shape
        video_filename = os.path.join(output_dir, f"video_{video_count}.mp4")

        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        video_writer = cv2.VideoWriter(video_filename, fourcc, fps, (width, height))

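        for frame in corrected_frames:
            # The pipeline returns RGB(A) frames, but OpenCV writes BGR channel order
            if frame.shape[2] == 4:
                frame = cv2.cvtColor(frame, cv2.COLOR_RGBA2BGR)
            else:
                frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
            video_writer.write(frame)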

        video_writer.release()
        print(f"Saved video at: {video_filename}")


Mounted at /content/drive

Loading model: damo-vilab/text-to-video-ms-1.7b


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


`torch_dtype` is deprecated! Use `dtype` instead!
The TextToVideoSDPipeline has been deprecated and will not receive bug fixes or feature updates after Diffusers version 0.33.1. 



Generating video 1
Base prompt: Early school kids playing football during break time
Expanded prompt: Early school kids playing football during break time, dynamic camera movement, hyper-realistic style, from a drone camera perspective



Saved video at: /content/drive/MyDrive/generated_videos/video_1.mp4

Generating video 2
Base prompt: A person scrolling on a phone in cafe with coffee
Expanded prompt: A person scrolling on a phone in cafe with coffee, hyper-realistic style



Saved video at: /content/drive/MyDrive/generated_videos/video_2.mp4

Generating video 3
Base prompt: Old lady watering garden grass in the courtyard
Expanded prompt: Old lady watering garden grass in the courtyard, with dramatic lighting, soft natural colors



Saved video at: /content/drive/MyDrive/generated_videos/video_3.mp4

Loading model: cerspense/zeroscope_v2_576w


An error occurred while trying to fetch /root/.cache/huggingface/hub/models--cerspense--zeroscope_v2_576w/snapshots/6963642a64dbefa93663d1ecebb4ceda2d9ecb28/unet: Error no file named diffusion_pytorch_model.safetensors found in directory /root/.cache/huggingface/hub/models--cerspense--zeroscope_v2_576w/snapshots/6963642a64dbefa93663d1ecebb4ceda2d9ecb28/unet.
Defaulting to unsafe serialization. Pass `allow_pickle=False` to raise an error instead.
An error occurred while trying to fetch /root/.cache/huggingface/hub/models--cerspense--zeroscope_v2_576w/snapshots/6963642a64dbefa93663d1ecebb4ceda2d9ecb28/vae: Error no file named diffusion_pytorch_model.safetensors found in directory /root/.cache/huggingface/hub/models--cerspense--zeroscope_v2_576w/snapshots/6963642a64dbefa93663d1ecebb4ceda2d9ecb28/vae.
Defaulting to unsafe serialization. Pass `allow_pickle=False` to raise an error instead.
The TextToVideoSDPipeline has been deprecated and will not receive bug fixes or feature updates after 


Generating video 4
Base prompt: A playful puppy running across a grassy field, chasing butterflies.
Expanded prompt: A playful puppy running across a grassy field, chasing butterflies., digital art style



Saved video at: /content/drive/MyDrive/generated_videos/video_4.mp4

Generating video 5
Base prompt: A 3D animation showing how the human heart pumps blood through the body.
Expanded prompt: A 3D animation showing how the human heart pumps blood through the body., with dramatic lighting, soft natural colors, close-up view



Saved video at: /content/drive/MyDrive/generated_videos/video_5.mp4


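**Discussion:** For this task, I generated videos from text prompts with two models, damo-vilab/text-to-video-ms-1.7b and cerspense/zeroscope_v2_576w. To control video length, I set the number of generated frames to the FPS multiplied by the target duration (10 seconds), so a clip sampled at 8 to 12 FPS gets 80 to 120 frames. Varying the prompts, by appending one to three random style modifiers, and the settings (random FPS and seed) gave me more natural results.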
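
The relationship between frame count, FPS, and clip length can be sketched as below. This is a minimal illustration rather than part of the run above: the helper names `frames_for_duration` and `clip_duration` are hypothetical, and the FPS values and 10-second target are taken from the code cell.

```python
def frames_for_duration(fps: int, duration_sec: float) -> int:
    """Frames needed so that playback at `fps` lasts about `duration_sec` seconds."""
    return round(fps * duration_sec)


def clip_duration(num_frames: int, fps: int) -> float:
    """Playback length in seconds of `num_frames` frames shown at `fps`."""
    return num_frames / fps


# Check the FPS values the notebook can draw against the 10-second target
for fps in (8, 10, 12):
    n = frames_for_duration(fps, 10)
    print(f"fps={fps}: num_frames={n}, duration={clip_duration(n, fps)} s")
```

Because `num_frames` has to be an integer, a non-integer product is rounded, so the clip length can differ from the target by at most half a frame interval.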

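One way to make the prompt and setting variations repeatable is to draw them from a dedicated random generator seeded per video. The sketch below is an assumption about how this could be organized, not the code used for the run above: `sample_settings` is a hypothetical helper, the modifier list is shortened for illustration, and stripping the trailing period is an added choice that avoids output such as "butterflies., digital art style".

```python
import random

# Shortened copy of the notebook's style modifiers (illustration only)
STYLE_MODIFIERS = [
    "cinematic wide shot",
    "close-up view",
    "with dramatic lighting",
    "soft natural colors",
    "digital art style",
]


def expand_prompt(base_prompt: str, rng: random.Random) -> str:
    """Append 1 to 3 randomly chosen style modifiers to the base prompt."""
    extras = rng.sample(STYLE_MODIFIERS, k=rng.randint(1, 3))
    # Drop trailing periods and spaces so the result never contains ".,"
    return f"{base_prompt.rstrip('. ')}, " + ", ".join(extras)


def sample_settings(base_prompt: str, seed: int) -> dict:
    """Hypothetical helper: derive FPS and prompt from one seed so a run can be repeated."""
    rng = random.Random(seed)
    return {
        "seed": seed,
        "fps": rng.randint(8, 12),
        "prompt": expand_prompt(base_prompt, rng),
    }


settings = sample_settings("A playful puppy running across a grassy field.", seed=42)
print(settings)
```

The same `seed` can also be passed to `torch.manual_seed`, as the notebook already does, so that the sampled prompt, the FPS, and the diffusion noise are all tied to one logged value.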