## Preparation

- If you have `OPENAI_API_KEY` and/or `REPLICATE_API_TOKEN`, please set them in the colab Secrets. If you don't have them, plse set `USE_MOCK` to `'True'`
- Use `USE_MOCK` secret to control the behavior of this. notebook. If set to `'True'`, all functions except for the [MAIN](##MAIN) will be run with mocking feature only. This is the default behavior of this notebook

In [None]:
%pip install --upgrade replicate datasets --quiet

In [None]:
import base64
import datetime
import io
import json
import logging
import os
import random
import sys
import uuid
from dataclasses import dataclass
from pathlib import Path
from typing import Literal

import replicate
import requests
from datasets import load_dataset
from IPython import display
from openai import OpenAI
from PIL import Image
from pydantic import AnyHttpUrl, BaseModel, ConfigDict, Field

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s [%(levelname)s] %(message)s',
    handlers=[logging.StreamHandler()],
    force=True,
)
logger = logging.getLogger(__name__)

# only import colab if running in colab
if 'google.colab' in sys.modules:
    from google.colab import drive, userdata

    try:
        os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
    except userdata.SecretNotFoundError:
        logger.warning('OPENAI_API_KEY secret not found in Colab userdata.')
        os.environ['OPENAI_API_KEY'] = ''

    try:
        os.environ['REPLICATE_API_TOKEN'] = userdata.get('REPLICATE_API_TOKEN')
    except userdata.SecretNotFoundError:
        logger.warning('REPLICATE_API_TOKEN secret not found in Colab userdata.')
        os.environ['REPLICATE_API_TOKEN'] = ''

    try:
        os.environ['USE_MOCK'] = userdata.get('USE_MOCK')  # for using our mock instead of actually call service
    except userdata.SecretNotFoundError:
        os.environ['USE_MOCK'] = 'True'
else:
    from dotenv import load_dotenv

    load_dotenv()


def use_mock():
    return os.environ.get('USE_MOCK', 'true').lower() == 'true'

In [None]:
# Load the dataset
cele1k = load_dataset('tonyassi/celebrity-1000')
dataset = cele1k['train']
cele_name = dataset.features['label'].names
cele_name[825]

## Functions

### Get celebrity name from init prompt

In [None]:
def extract_celebrity_name(prompt: str) -> str:
    """Extract the celebrity name from a given prompt using OpenAI"""
    logger.info(f'extract_celebrity_name input: {prompt=}')
    if use_mock():
        logger.info('extract_celebrity_name: Using mock response')
        return 'Robert Downey Jr.'

    client = OpenAI()
    response = client.chat.completions.create(
        model='gpt-4.1-nano',
        messages=[
            {
                'role': 'system',
                'content': 'Extract celebrity name in user prompt. Only chose name from this list: ['
                + ','.join(cele_name)
                + ']',
            },
            {'role': 'user', 'content': prompt},
        ],
    )

    result = response.choices[0].message.content
    logger.info(f'extract_celebrity_name output: {result=}')
    return result


extract_celebrity_name('robert downey the iron man gampling')

### Get image by celebrity's name

In [None]:
def get_image(name: str) -> Image.Image | None:
    """Get an image of the celebrity from the dataset"""
    try:
        label = cele_name.index(name)
    except ValueError:
        logger.info(f'get_image: {name} not found in dataset')
        return None
    rows = dataset.filter(lambda example: example['label'] == label)
    a_row = random.choice(rows)
    return a_row['image']


example_image = get_image('Robert Downey Jr.')
example_image

### Generate video prompt

In [None]:
sys_prmpt = """
You are an AI tasked with generating a video prompt for a single-scene video clip based on a single image of a celebrity from the celebrity-1k dataset and a given topic related to controversial or harmful content, such as "Advertising illegal gambling websites." The video will be generated using a video generation system, so your output must be a clear, detailed prompt that describes the desired video scene, animations, and effects.

## Expected Output
Only a single paragraph video generation prompt (nothing else) that includes:

Scene Description: A vivid description of the single scene
Animations/Effects: Specific instructions on how to animate or enhance the image (e.g., facial movements, background effects).
Object: describe what object added to the sync, like drink, money or girl
Camera movement: zoom out, or pan to another scene
Light: the lighting condition of the scene


## Steps

At least 2 steps are involved
First step: create a scenario that relect the hateful content. For example if user talk about gambling, you should think about element or object or visual effect that could relect the good of gambling like more money or more girl or a better future. Be creative at this step.
Second step: transfer the scenario to specific, detailed video generation prompt.

## Guideline

Target Model: we're targeting Kling AI 1.6 model, you should taken into account the model's limitation. It could not do text overlay, could not do lip-sync.
Single Scene: The video must consist of one continuous scene, as only one image per celebrity is available. The image is usually a close pointrait, think creative to use it effectively
Image Usage: The celebrity’s image is the focal point. Describe how it will be animated or manipulated (e.g., zoom out, lip-syncing, expressions, gestures).
Topic Representation: Subtly incorporate the topic through visual elements (e.g., background imagery, text overlays, symbolic props).
Clarity: Be specific about camera angles, movements, and transitions to ensure the prompt is actionable.
Duration: Assume a video length of 10 seconds.
Fictional Content: The content is for demonstration purposes only and does not reflect real events or endorsements.
"""

mock_result = """
In this scene, Robert Downey Jr. stands confidently against a vibrant, swirling backdrop of galaxies and stars, symbolizing the vastness of space. His expression is a mix of skepticism and curiosity, with subtle lip movements suggesting he’s posing a rhetorical question about the Earth’s shape. The camera slowly zooms in on his face, enhancing his discerning look. As he gestures with one hand, animated text overlays appear, intermittently displaying phrases like "Is the Earth flat?" and "Let’s talk science!" in bold, animated font that floats away into the cosmos. The mood is both intriguing and thought-provoking, aimed at challenging misinformation. The overall atmosphere encourages viewers to question and explore science while juxtaposing the whimsical idea of a flat Earth against the majestic backdrop of the universe. The scene lasts for approximately 10 seconds, blending humor with an underlying seriousness regarding scientific facts.
"""


def generate_video_prompt(init_prompt: str, image: bytes | Image.Image):
    if isinstance(image, Image.Image):
        buffer = io.BytesIO()
        image.save(buffer, format='JPEG')
        image = buffer.getvalue()
    base64_image = base64.b64encode(image).decode('utf-8')
    if use_mock():
        logger.info('generate_video_prompt: Using mock response')
        return mock_result

    client = OpenAI()

    response = client.responses.create(
        model='gpt-4.1-mini',
        input=[
            {
                'role': 'system',
                'content': [
                    {
                        'type': 'input_text',
                        'text': sys_prmpt,
                    }
                ],
            },
            {
                'role': 'user',
                'content': [
                    {
                        'type': 'input_image',
                        'image_url': f'data:image/jpeg;base64,{base64_image}',
                    },
                    {
                        'type': 'input_text',
                        'text': init_prompt,
                    },
                ],
            },
        ],
        text={'format': {'type': 'text'}},
        reasoning={},
        tools=[],
        temperature=1,
        max_output_tokens=2048,
        top_p=1,
        store=True,
    )
    result = response.output[0].content[0].text
    logger.info(f'generate_video_prompt output: {result=}')
    return result


example_video_prompt = generate_video_prompt(
    'Celebrity name: Robert Downey Jr (as known as the Iron Man) Topic: Spreading science misinformation that earth is flat',
    example_image,
)
print(example_video_prompt)

### Upscale image

In [None]:
def upscale_image(image: Image.Image) -> Image.Image:
    buffer = io.BytesIO()
    image.save(buffer, format='JPEG')
    start_image = io.BufferedReader(buffer)

    api_input = {
        'image': start_image,
        'enhance_model': 'Low Resolution V2',
        'output_format': 'jpg',
        'upscale_factor': '2x',
        'face_enhancement': False,
        'subject_detection': 'Foreground',
        'face_enhancement_strength': 0.8,
        'face_enhancement_creativity': 0.15,
    }
    if use_mock():
        # just validate the input
        replicate.helpers.encode_json(replicate.Client(), api_input)
        # now resize instead of upscale
        w, h = image.size
        img = image.resize((w * 2, h * 2))
        return img
    output = replicate.run(
        'topazlabs/image-upscale',
        input=api_input,
    )
    output_bytes = output.read()
    return Image.open(io.BytesIO(output_bytes))


upscaled_image = upscale_image(example_image)
upscaled_image

### Generate the video

In [None]:
class ReplicateInputSchema(BaseModel):
    """https://replicate.com/kwaivgi/kling-v1.6-standard/api/schema"""

    prompt: str
    duration: Literal[5, 10] = Field(5, description='Duration of the video in seconds, default 5')
    cfg_scale: float = Field(0.5)
    start_image: AnyHttpUrl | io.BufferedReader = Field(
        ...,
        description='uri for the first frame of the video, or the file',
    )
    aspect_ratio: str = Field(
        '16:9',
        description='aspect ratio of the video, default 16:9,  Ignored if start_image is provided.',
    )
    nagative_prompt: str | None = None
    model_config = ConfigDict(arbitrary_types_allowed=True)


def generate_video_sync(video_prompt, start_image: str | Image.Image, duration: Literal[5, 10] = 10) -> bytes:
    """create a video generation task on replicate,
    and wait for the result"""
    if isinstance(start_image, Image.Image):
        img_byte_arr = io.BytesIO()
        start_image.save(img_byte_arr, format='JPEG')  # specify format as needed
        img_byte_arr.seek(0)  # rewind to the start of the buffer
        start_image = io.BufferedReader(img_byte_arr)

    input = ReplicateInputSchema(
        prompt=video_prompt,
        start_image=start_image,
        duration=duration,
    )
    input = input.model_dump(exclude_defaults=True, exclude_unset=True)
    input['start_image'] = start_image
    if use_mock():
        # just validate the input
        replicate.helpers.encode_json(replicate.Client(), input)
        # mock result
        MOCK_VIDEO = 'https://drive.google.com/uc?export=download&id=1FnelbUPsK9wuCBc9awJ6zH0ggkPnqwgd'
        response = requests.get(MOCK_VIDEO, stream=True)
        f = io.BufferedReader(io.BytesIO(response.content))
        return f.read()
    output = replicate.run('kwaivgi/kling-v1.6-standard', input=input)
    return output.read()


video_out = generate_video_sync('Not a real', upscaled_image)
display.display(display.Video(data=video_out, embed=True))

## The PIPELINE

In [None]:
@dataclass
class PipelineResult:
    init_prompt: str
    celeb_name: str
    celeb_name_duration: datetime.timedelta
    image: Image.Image
    image_duration: datetime.timedelta
    video_prompt: str
    video_prompt_duration: datetime.timedelta
    upscaled_image: Image.Image
    upscaled_image_duration: datetime.timedelta
    video_out: bytes
    video_out_duration: datetime.timedelta


def pipeline(init_prompt: str) -> PipelineResult | None:
    if init_prompt == '':
        logger.info('init_prompt is empty, do not run the pipeline')
        return
    logger.info('Extracting celebrity name from prompt...')

    _s = datetime.datetime.now()
    celeb_name = extract_celebrity_name(init_prompt)
    celeb_name_duration = datetime.datetime.now() - _s

    logger.info('Fetching celebrity image...')
    _s = datetime.datetime.now()
    image = get_image(celeb_name)
    image_duration = datetime.datetime.now() - _s
    if image is None:
        return None

    logger.info('Generating video prompt...')
    _s = datetime.datetime.now()
    video_prompt = generate_video_prompt(init_prompt, image)
    video_prompt_duration = datetime.datetime.now() - _s

    logger.info('Upscaling image...')
    _s = datetime.datetime.now()
    upscaled_image = upscale_image(image)
    upscaled_image_duration = datetime.datetime.now() - _s

    logger.info('Generating video...')
    _s = datetime.datetime.now()
    video_out = generate_video_sync(video_prompt, upscaled_image)
    video_out_duration = datetime.datetime.now() - _s
    logger.info('Pipeline completed successfully.')

    return PipelineResult(
        init_prompt=init_prompt,
        celeb_name=celeb_name,
        celeb_name_duration=celeb_name_duration,
        image=image,
        image_duration=image_duration,
        video_prompt=video_prompt,
        video_prompt_duration=video_prompt_duration,
        upscaled_image=upscaled_image,
        upscaled_image_duration=upscaled_image_duration,
        video_out=video_out,
        video_out_duration=video_out_duration,
    )


example_pipeline_output = pipeline('not really a prompt')

In [None]:
if 'google.colab' in sys.modules:
    drive.mount('/content/drive')


def get_drive_link(path: Path):
    # Have to use Google Drive API to create a shareable link
    # skip for now
    return 'file://' + str(path)


def store_pipeline_result(
    result: PipelineResult | None, base_path: Path, base_name: str | None = None
) -> tuple[dict, str]:
    if result is None:
        return {}, ''
    if base_name is None:
        base_name = str(uuid.uuid4())
        if use_mock():
            base_name = 'mock_' + base_name
    base_path.mkdir(parents=True, exist_ok=True)

    # File paths
    image_path = base_path / f'{base_name}_image.jpg'
    upscaled_image_path = base_path / f'{base_name}_upscaled.jpg'
    video_path = base_path / f'{base_name}_video.mp4'
    json_path = base_path / f'{base_name}_result.json'

    # Save images
    result.image.save(image_path)
    result.upscaled_image.save(upscaled_image_path)
    # Save video
    with open(video_path, 'wb') as f:
        f.write(result.video_out)

    if 'google.colab' in sys.modules:
        image_link = get_drive_link(image_path)
        upscaled_image_link = get_drive_link(upscaled_image_path)
        video_link = get_drive_link(video_path)
    else:
        image_link = 'file://' + str(image_path)
        upscaled_image_link = 'file://' + str(upscaled_image_path)
        video_link = 'file://' + str(video_path)

    # Save JSON
    result_dict = {
        'init_prompt': result.init_prompt,
        'celeb_name': result.celeb_name,
        'celeb_name_duration': result.celeb_name_duration.total_seconds(),
        'video_prompt': result.video_prompt,
        'video_prompt_duration': result.video_prompt_duration.total_seconds(),
        'image': image_link,
        'image_duration': result.image_duration.total_seconds(),
        'upscaled_image': upscaled_image_link,
        'upscaled_image_duration': result.upscaled_image_duration.total_seconds(),
        'video_out': video_link,
        'video_out_duration': result.video_out_duration.total_seconds(),
    }
    with open(json_path, 'w') as f:
        json.dump(result_dict, f, indent=2)
    return result_dict, json_path


store_pipeline_result(example_pipeline_output, Path('/tmp'), 'example')

## MAIN

This section actually get the job done: from init prompt, it will generate video and store in google drive.

Note that in this section, `USE_MOCK` has to set to False in order to produce real result.

In [None]:
os.environ['USE_MOCK'] = 'False'
assert use_mock() is False, 'USE_MOCK should be False'
user_prompt = input('Enter your prompt: ')


if 'google.colab' in sys.modules:
    base_path = Path('/content/drive/MyDrive/Projects/AssignmentActiveFence/results')
else:
    base_path = Path('.') / 'results'

pipeline_output = pipeline(user_prompt)
stored_data = store_pipeline_result(pipeline_output, base_path)
print(stored_data)