### <font color='#4285f4'>Overview</font>

This demo showcases the end-to-end creation of marketing videos with generative AI. Using Gemini and Vertex AI, the process begins by brainstorming video concepts and crafting tailored Veo 2 prompts. Veo 2 then generates short video segments from these prompts, and the segments are stitched together. Gemini analyzes the compiled video to write a voice-over script. Finally, text-to-speech creates narrations in both English and French, which are overlaid on the merged video to produce the complete marketing video.

Process Flow:

1. Transform the Veo 2 comprehensive documentation into a set of clear instructions that specifically guide the creation of effective Veo 2 prompts.
2. Create several Veo 2 prompts using Gemini (pass in the Veo 2 instructions). 
    * a. A prompt should include "styles", "composition", "Ambiance & Emotions", and "Cinematic effects" (doc). Having Gemini author the Veo 2 prompts surfaces new ideas, and passing in the Veo 2 instructions helps the prompts use the proper Veo 2 techniques.
3. Call GenAI to generate videos via Veo 2.
    * a. Currently, GenAI creates 6-second videos, so we prompt Gemini for 6-second segments.
4. Merge the videos into one video.
5. Ask Gemini to watch the merged video and create a voice over script.
6. Use Vertex AI text-to-speech to generate the voice over using a British accent and a male voice.
7. Use Vertex AI text-to-speech to generate the voice over using a French accent and a male voice.
8. Overlay the text-to-speech audio onto the merged video.

Notes:

1. The notebook does merge the videos, but the quality is much better when you merge with a high quality video tool.
2. The text-to-speech does not always align perfectly with the video.
3. You can also adjust the speed of the text-to-speech to control how fast the words are spoken.
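
Notes 2 and 3 can be combined: if the narration runs longer or shorter than the merged video, the speaking rate can be scaled by the ratio of the two durations. The sketch below is an illustration only. `fit_speaking_rate` and its clamp bounds are hypothetical and not part of this notebook, so check the Text-to-Speech documentation for the range your voice supports.

```python
def fit_speaking_rate(audio_seconds: float, video_seconds: float,
                      min_rate: float = 0.5, max_rate: float = 2.0) -> float:
    """Return a speaking-rate multiplier that fits the narration to the video length.

    A rate above 1.0 speaks faster (shorter audio); below 1.0 speaks slower.
    min_rate and max_rate are assumed bounds chosen to keep the speech natural.
    """
    if audio_seconds <= 0 or video_seconds <= 0:
        raise ValueError("durations must be positive")
    rate = audio_seconds / video_seconds
    # Clamp so an extreme mismatch does not produce unintelligible audio
    return max(min_rate, min(rate, max_rate))


# Narration measured at 36 s (generated at rate 1.0) for a 30 s video
print(fit_speaking_rate(36.0, 30.0))
```

The returned value could be passed as the `speaking_rate` argument of the `TextToSpeech` helper defined later in the notebook.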

Cost:
* Veo 2: 50 cents per second of generated video
* Medium: Remember to stop your Colab Enterprise Notebook Runtime
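
As a rough budgeting aid, the Veo 2 cost of a run can be estimated from the per-second price and the 6-second segment length stated above. `estimate_veo_cost` is a hypothetical helper, and the clip count of 5 is only an example value.

```python
def estimate_veo_cost(num_clips: int, seconds_per_clip: int = 6,
                      usd_per_second: float = 0.50) -> float:
    """Estimate the Veo 2 generation cost in US dollars.

    Defaults come from this notebook's notes: 6-second clips at 50 cents
    per generated second. Retries or regenerated clips add to the total.
    """
    return num_clips * seconds_per_clip * usd_per_second


# Example: five 6-second segments
print(f"${estimate_veo_cost(5):.2f}")
```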

Author: 
* Adam Paternostro

In [None]:
# Architecture Diagram
from IPython.display import Image
Image(url='https://storage.googleapis.com/data-analytics-golden-demo/chocolate-ai/v1/Artifacts/Campaign-Assets-Text-to-Video-01-Architecture.png', width=1200)

### <font color='#4285f4'>Video Walkthrough</font>

[![Video](https://storage.googleapis.com/data-analytics-golden-demo/chocolate-ai/v1/Videos/adam-paternostro-video.png)](https://storage.googleapis.com/data-analytics-golden-demo/chocolate-ai/v1/Videos/Campaign-Assets-Text-to-Video-01.mp4)


In [None]:
from IPython.display import HTML

HTML("""
<video width="800" height="600" controls>
  <source src="https://storage.googleapis.com/data-analytics-golden-demo/chocolate-ai/v1/Videos/Campaign-Assets-Text-to-Video-01.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>
""")

### <font color='#4285f4'>License</font>

```
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```

### <font color='#4285f4'>Pip installs</font>

In [None]:
# PIP Installs
import sys

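# https://pypi.org/project/moviepy/
# The helper cells below import from moviepy.editor, which is only available in MoviePy 1.x, so pin below 2.0.
!{sys.executable} -m pip install "moviepy<2"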

### <font color='#4285f4'>Initialize</font>

In [None]:
from PIL import Image
from IPython.display import HTML
from IPython.display import Audio
from functools import reduce
import IPython.display
import google.auth
import google.auth.transport.requests  # needed for google.auth.transport.requests.Request()
import requests
import json
import uuid
import base64
import os
import cv2
import random
import time
import datetime

import logging
from tenacity import retry, wait_exponential, stop_after_attempt, before_sleep_log, retry_if_exception

In [None]:
# Set these (run this cell to verify the output)

bigquery_location = "${bigquery_location}"
region = "${region}"
location = "${location}"
storage_account = "${chocolate_ai_bucket}"
dataset_name = "${bigquery_chocolate_ai_dataset}"
public_storage_storage_account = "data-analytics-golden-demo"

# Get the current date and time
now = datetime.datetime.now()

# Format the date and time as desired
formatted_date = now.strftime("%Y-%m-%d-%H-%M")

# Get some values using gcloud
project_id = !(gcloud config get-value project)
user = !(gcloud auth list --filter=status:ACTIVE --format="value(account)")

if len(project_id) != 1:
  raise RuntimeError(f"project_id is not set: {project_id}")
project_id = project_id[0]

if len(user) != 1:
  raise RuntimeError(f"user is not set: {user}")
user = user[0]

print(f"project_id = {project_id}")
print(f"user = {user}")

### <font color='#4285f4'>Helper Methods</font>

#### restAPIHelper
Calls the Google Cloud REST API using the current user's credentials.

In [None]:
def restAPIHelper(url: str, http_verb: str, request_body: str) -> str:
  """Calls the Google Cloud REST API passing in the current user's credentials"""

  import requests
  import google.auth
  import google.auth.transport.requests  # provides the Request object used to refresh the token
  import json

  # Get an access token based upon the current user
  creds, project = google.auth.default()
  auth_req = google.auth.transport.requests.Request()
  creds.refresh(auth_req)
  access_token=creds.token

  headers = {
    "Content-Type" : "application/json",
    "Authorization" : "Bearer " + access_token
  }

  if http_verb == "GET":
    response = requests.get(url, headers=headers)
  elif http_verb == "POST":
    response = requests.post(url, json=request_body, headers=headers)
  elif http_verb == "PUT":
    response = requests.put(url, json=request_body, headers=headers)
  elif http_verb == "PATCH":
    response = requests.patch(url, json=request_body, headers=headers)
  elif http_verb == "DELETE":
    response = requests.delete(url, headers=headers)
  else:
    raise RuntimeError(f"Unknown HTTP verb: {http_verb}")

  if response.status_code == 200:
    return json.loads(response.content)
    #image_data = json.loads(response.content)["predictions"][0]["bytesBase64Encoded"]
  else:
    error = f"Error restAPIHelper -> Status: '{response.status_code}' Text: '{response.text}'"
    raise RuntimeError(error)

#### RetryCondition (for retrying LLM calls)
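
The predicate in the next cell decides, by substring match on the error text, whether `tenacity` should retry a failed LLM call. The standalone sketch below shows the same idea together with a doubling backoff clamped between 1 and 60 seconds. `is_retryable` and `backoff_schedule` are hypothetical helpers for illustration; the schedule only approximates the decorator settings used later and is not tenacity's exact internal formula.

```python
RETRYABLE_MARKERS = ("RESOURCE_EXHAUSTED", "No content in candidate")


def is_retryable(error: Exception) -> bool:
    """Return True when the error text contains a known transient marker."""
    message = str(error)
    return any(marker in message for marker in RETRYABLE_MARKERS)


def backoff_schedule(attempts: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Illustrative wait times in seconds: double each attempt, never above cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


print(is_retryable(RuntimeError("429 RESOURCE_EXHAUSTED: quota exceeded")))
print(is_retryable(RuntimeError("400 INVALID_ARGUMENT")))
print(backoff_schedule(8))
```

Matching on message text is simple but brittle: if the service changes its wording, the error silently stops being retried, so the marker list needs occasional review.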

In [None]:
def RetryCondition(error):
  error_string = str(error)
  print(error_string)

  retry_errors = [
      "RESOURCE_EXHAUSTED",
      "No content in candidate",
      # Add more error messages here as needed
  ]

  for retry_error in retry_errors:
    if retry_error in error_string:
      print("Retrying...")
      return True

  return False

#### Gemini LLM
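
The function in the next cell walks the response as candidates, then content, then parts, then text. A compact sketch of that traversal follows, run against a hand-written sample dictionary (not real model output). `extract_first_text` is a hypothetical name used only here.

```python
import json


def extract_first_text(json_response: dict) -> str:
    """Return the text of the first part of the first candidate.

    Raises RuntimeError when the response does not have the expected shape.
    """
    try:
        return json_response["candidates"][0]["content"]["parts"][0]["text"]
    except (KeyError, IndexError) as error:
        raise RuntimeError(f"Unexpected response shape: {json_response}") from error


# Hand-written sample that mimics the structure the helper expects
sample_response = {
    "candidates": [
        {"content": {"role": "model",
                     "parts": [{"text": '{"text-to-video-prompt": "Close-up of a truffle"}'}]}}
    ]
}

text = extract_first_text(sample_response)
print(json.loads(text)["text-to-video-prompt"])
```

Collapsing the nested checks into one `try` trades the per-level error messages of the notebook's version for brevity; the notebook's retry predicate relies on the "No content in candidate" message, so the longer form is kept there.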

In [None]:
@retry(wait=wait_exponential(multiplier=1, min=1, max=60), stop=stop_after_attempt(10), retry=retry_if_exception(RetryCondition), before_sleep=before_sleep_log(logging.getLogger(), logging.INFO))
def GeminiLLM(prompt, model = "gemini-2.0-flash", response_schema = None,
                 temperature = 1, topP = 1, topK = 32):

  # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#supported_models
  # model = "gemini-2.0-flash"

  llm_response = None
  if temperature < 0:
    temperature = 0

  creds, project = google.auth.default()
  auth_req = google.auth.transport.requests.Request() # required to obtain the access token
  creds.refresh(auth_req)
  access_token=creds.token

  headers = {
      "Content-Type" : "application/json",
      "Authorization" : "Bearer " + access_token
  }

  # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference
  url = f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/publishers/google/models/{model}:generateContent"

  generation_config = {
    "temperature": temperature,
    "topP": topP,
    "maxOutputTokens": 8192,
    "candidateCount": 1,
    "responseMimeType": "application/json",
  }

  # Add in the response schema when it is provided
  if response_schema is not None:
    generation_config["responseSchema"] = response_schema

  if model == "gemini-2.0-flash":
    generation_config["topK"] = topK

  payload = {
    "contents": {
      "role": "user",
      "parts": {
          "text": prompt
      },
    },
    "generation_config": {
      **generation_config
    },
    "safety_settings": {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    }
  }

  response = requests.post(url, json=payload, headers=headers)

  if response.status_code == 200:
    try:
      json_response = json.loads(response.content)
    except Exception as error:
      raise RuntimeError(f"An error occurred parsing the JSON: {error}")

    if "candidates" in json_response:
      candidates = json_response["candidates"]
      if len(candidates) > 0:
        candidate = candidates[0]
        if "content" in candidate:
          content = candidate["content"]
          if "parts" in content:
            parts = content["parts"]
            if len(parts):
              part = parts[0]
              if "text" in part:
                text = part["text"]
                llm_response = text
              else:
                raise RuntimeError(f"No text in part: {response.content}")
            else:
              raise RuntimeError(f"No parts in content: {response.content}")
          else:
            raise RuntimeError(f"No parts in content: {response.content}")
        else:
          raise RuntimeError(f"No content in candidate: {response.content}")
      else:
        raise RuntimeError(f"No candidates: {response.content}")
    else:
      raise RuntimeError(f"No candidates: {response.content}")

    # Remove some typical response characters (if asking for a JSON reply)
    llm_response = llm_response.replace("```json","")
    llm_response = llm_response.replace("```","")
    llm_response = llm_response.replace("\n","")

    return llm_response

  else:
    raise RuntimeError(f"Error with prompt:'{prompt}'  Status:'{response.status_code}' Text:'{response.text}'")

#### Gemini LLM - Multimodal
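
`multimodal_prompt_list` becomes the `parts` array of the request, so a caller that wants Gemini to watch a video needs one text part and one file part. The sketch below builds such a list for a video stored in Cloud Storage. `build_video_parts` is a hypothetical helper and the bucket path is a placeholder; the `fileData`, `mimeType` and `fileUri` field names follow the Vertex AI `generateContent` REST reference.

```python
def build_video_parts(instruction: str, gcs_uri: str,
                      mime_type: str = "video/mp4") -> list[dict]:
    """Build a parts list holding a text instruction followed by a video reference."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError(f"Expected a gs:// URI, got: {gcs_uri}")
    return [
        {"text": instruction},  # text first, so error messages can show the prompt
        {"fileData": {"mimeType": mime_type, "fileUri": gcs_uri}},
    ]


# Placeholder bucket and object names, for illustration only
parts = build_video_parts(
    "Watch this video and write a voice-over script.",
    "gs://example-bucket/videos/merged.mp4",
)
print(parts)
```

Putting the text part first matches the error handling in the next cell, which reports `multimodal_prompt_list[0]` when a call fails.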

In [None]:
@retry(wait=wait_exponential(multiplier=1, min=1, max=60), stop=stop_after_attempt(10), retry=retry_if_exception(RetryCondition), before_sleep=before_sleep_log(logging.getLogger(), logging.INFO))
def GeminiLLM_Multimodal(multimodal_prompt_list, model = "gemini-2.0-flash", response_schema = None,
                 temperature = 1, topP = 1, topK = 32):

  # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#supported_models
  # model = "gemini-2.0-flash"

  llm_response = None
  if temperature < 0:
    temperature = 0

  creds, project = google.auth.default()
  auth_req = google.auth.transport.requests.Request() # required to obtain the access token
  creds.refresh(auth_req)
  access_token=creds.token

  headers = {
      "Content-Type" : "application/json",
      "Authorization" : "Bearer " + access_token
  }

  # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference
  url = f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/publishers/google/models/{model}:generateContent"

  generation_config = {
    "temperature": temperature,
    "topP": topP,
    "maxOutputTokens": 8192,
    "candidateCount": 1,
    "responseMimeType": "application/json",
  }

  # Add in the response schema when it is provided
  if response_schema is not None:
    generation_config["responseSchema"] = response_schema

  if model == "gemini-2.0-flash" or model == "gemini-1.0-pro":
    generation_config["topK"] = topK

  payload = {
    "contents": {
      "role": "user",
      "parts": multimodal_prompt_list
    },
    "generation_config": {
      **generation_config
    },
    "safety_settings": {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    }
  }

  response = requests.post(url, json=payload, headers=headers)

  if response.status_code == 200:
    try:
      json_response = json.loads(response.content)
    except Exception as error:
      raise RuntimeError(f"An error occurred parsing the JSON: {error}")

    if "candidates" in json_response:
      candidates = json_response["candidates"]
      if len(candidates) > 0:
        candidate = candidates[0]
        if "content" in candidate:
          content = candidate["content"]
          if "parts" in content:
            parts = content["parts"]
            if len(parts):
              part = parts[0]
              if "text" in part:
                text = part["text"]
                llm_response = text
              else:
                raise RuntimeError(f"No text in part: {response.content}")
            else:
              raise RuntimeError(f"No parts in content: {response.content}")
          else:
            raise RuntimeError(f"No parts in content: {response.content}")
        else:
          raise RuntimeError(f"No content in candidate: {response.content}")
      else:
        raise RuntimeError(f"No candidates: {response.content}")
    else:
      raise RuntimeError(f"No candidates: {response.content}")

    # Remove some typical response characters (if asking for a JSON reply)
    llm_response = llm_response.replace("```json","")
    llm_response = llm_response.replace("```","")
    llm_response = llm_response.replace("\n","")

    return llm_response

  else:
    raise RuntimeError(f"Error with prompt:'{multimodal_prompt_list[0]}'  Status:'{response.status_code}' Text:'{response.text}'")

#### Text to Speech Generation
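
`TextToSpeechLanguageList` in the next cell returns the raw JSON text of the voices list, which then has to be searched for a voice of the desired gender. A small sketch of that filtering step follows. `voices_by_gender` is a hypothetical helper and the sample voices are made up for illustration; only the field names (`voices`, `languageCodes`, `name`, `ssmlGender`) follow the `voices.list` reference.

```python
import json


def voices_by_gender(voices_json: str, ssml_gender: str) -> list[str]:
    """Return the names of voices whose ssmlGender matches (for example "MALE")."""
    voices = json.loads(voices_json).get("voices", [])
    return [voice["name"] for voice in voices if voice.get("ssmlGender") == ssml_gender]


# Made-up sample in the shape of a voices.list response
sample_voices = json.dumps({"voices": [
    {"languageCodes": ["en-GB"], "name": "example-voice-a", "ssmlGender": "FEMALE"},
    {"languageCodes": ["en-GB"], "name": "example-voice-b", "ssmlGender": "MALE"},
]})

print(voices_by_gender(sample_voices, "MALE"))
```

A name picked this way could be passed as `language_code_name` to `TextToSpeech`, with the matching `language_code` and `ssml_gender`.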

In [None]:
def TextToSpeechLanguageList(language_code):
  creds, project = google.auth.default()
  auth_req = google.auth.transport.requests.Request()
  creds.refresh(auth_req)
  access_token=creds.token

  headers = {
      "Content-Type" : "application/json",
      "Authorization" : "Bearer " + access_token,
      "x-goog-user-project" : project
  }

  # https://cloud.google.com/text-to-speech/docs/reference/rest/v1/voices/list
  url = f"https://texttospeech.googleapis.com/v1/voices?languageCode={language_code}"

  response = requests.get(url, headers=headers)

  if response.status_code == 200:
    return response.text
  else:
    error = f"Error with language_code:'{language_code}'  Status:'{response.status_code}' Text:'{response.text}'"
    raise RuntimeError(error)

In [None]:
def TextToSpeech(local_filename, text, language_code, language_code_name, ssml_gender, speaking_rate = 1):
  creds, project = google.auth.default()
  auth_req = google.auth.transport.requests.Request()
  creds.refresh(auth_req)
  access_token=creds.token

  headers = {
      "Content-Type" : "application/json",
      "Authorization" : "Bearer " + access_token,
      "x-goog-user-project" : project
  }

  # https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize
  url = f"https://texttospeech.googleapis.com/v1/text:synthesize"

  payload = {
   "input": {
      "text": text
   },
   "voice": {
      "languageCode": language_code,
      "name": language_code_name,
      "ssmlGender": ssml_gender # FEMALE | MALE
   },
   "audioConfig": {
      "audioEncoding": "MP3",
      "speakingRate": speaking_rate,
   }
  }

  response = requests.post(url, json=payload, headers=headers)

  if response.status_code == 200:
    audio_data = json.loads(response.content)["audioContent"]
    audio_data = base64.b64decode(audio_data)
    with open(local_filename, "wb") as f:
      f.write(audio_data)
    print(f"Audio generated OK.")
    return local_filename
  else:
    error = f"Error with text:'{text}'  Status:'{response.status_code}' Text:'{response.text}'"
    raise RuntimeError(error)

#### Overlay text to speech to video

In [None]:
from moviepy.editor import VideoFileClip, AudioFileClip

def MergeVideoAndAudio(video_filename, audio_filename, output_filename):
  # Load the video and audio files
  video = VideoFileClip(video_filename)
  audio = AudioFileClip(audio_filename)

  # Combine the video and audio
  final_clip = video.set_audio(audio)

  # Save the combined video
  final_clip.write_videofile(output_filename)

#### GetMenuItems

In [None]:
def GetMenuItems():
  sql = f"""SELECT menu_id, menu_name, menu_description
  FROM `{dataset_name}.menu`
  ORDER BY menu_id"""

  result_df = RunQuery(sql)
  result_list = []

  for index, row in result_df.iterrows():
    result_list.append({
        "menu_id": row['menu_id'],
        "menu_name": row['menu_name'],
        "menu_description": row['menu_description']
    })

  return result_list

#### Combine Videos
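
Because the merge in the next cell orders clips by file name, the segment names determine the order of scenes. Plain string sorting places "10" before "2", so zero-padded indexes are a safer naming scheme. The sketch below illustrates the difference; `segment_filename` and the `segment-` prefix are hypothetical and not names used by this notebook.

```python
def segment_filename(index: int, width: int = 2) -> str:
    """Return a zero-padded segment file name so that name order equals numeric order."""
    return f"segment-{index:0{width}d}.mp4"


unpadded = sorted(f"segment-{i}.mp4" for i in (1, 2, 10))
padded = sorted(segment_filename(i) for i in (1, 2, 10))

print(unpadded)  # segment 10 sorts ahead of segment 2
print(padded)    # numeric order preserved
```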

In [None]:
from moviepy.editor import VideoFileClip, concatenate_videoclips

def merge_videos_sorted(folder_path, output_video_name):
  """
  Merges all MP4 video files in the specified folder into a single video,
  sorted by file name.

  Args:
      folder_path: The path to the folder containing the videos.
      output_video_name: File name of the merged video, written into folder_path.
  """

  video_files = [f for f in os.listdir(folder_path) if f.endswith('.mp4')]
  video_files.sort()  # Sort the files by name

  clips = [VideoFileClip(os.path.join(folder_path, video)) for video in video_files]

  final_clip = concatenate_videoclips(clips)
  final_clip.write_videofile(os.path.join(folder_path, output_video_name))

#### Helper Functions

In [None]:
def RunQuery(sql):
  import time
  from google.cloud import bigquery
  client = bigquery.Client()

  if (sql.startswith("SELECT") or sql.startswith("WITH")):
      df_result = client.query(sql).to_dataframe()
      return df_result
  else:
    job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.INTERACTIVE)
    query_job = client.query(sql, job_config=job_config)

    # Check on the progress by getting the job's updated state.
    query_job = client.get_job(
        query_job.job_id, location=query_job.location
    )
    print("Job {} is currently in state {} with error result of {}".format(query_job.job_id, query_job.state, query_job.error_result))

    while query_job.state != "DONE":
      time.sleep(2)
      query_job = client.get_job(
          query_job.job_id, location=query_job.location
          )
      print("Job {} is currently in state {} with error result of {}".format(query_job.job_id, query_job.state, query_job.error_result))

    if query_job.error_result is None:
      return True
    else:
      raise Exception(query_job.error_result)

In [None]:
# This was generated by GenAI

def copy_file_to_gcs(local_file_path, bucket_name, destination_blob_name):
  """Copies a file from a local drive to a GCS bucket.

  Args:
      local_file_path: The full path to the local file.
      bucket_name: The name of the GCS bucket to upload to.
      destination_blob_name: The desired name of the uploaded file in the bucket.

  Returns:
      None
  """

  import os
  from google.cloud import storage

  # Ensure the file exists locally
  if not os.path.exists(local_file_path):
      raise FileNotFoundError(f"Local file '{local_file_path}' not found.")

  # Create a storage client
  storage_client = storage.Client()

  # Get a reference to the bucket
  bucket = storage_client.bucket(bucket_name)

  # Create a blob object with the desired destination path
  blob = bucket.blob(destination_blob_name)

  # Upload the file from the local filesystem
  content_type = ""
  if local_file_path.endswith(".html"):
    content_type = "text/html; charset=utf-8"

  if local_file_path.endswith(".json"):
    content_type = "application/json; charset=utf-8"

  if content_type == "":
    blob.upload_from_filename(local_file_path)
  else:
    blob.upload_from_filename(local_file_path, content_type = content_type)

  print(f"File '{local_file_path}' uploaded to GCS bucket '{bucket_name}' as '{destination_blob_name}'.  Content-Type: '{content_type}'.")

In [None]:
def download_from_gcs(destination_file_name, gcs_storage_bucket, object_name):
  # prompt: Write python code to download a blob from a gcs bucket.  do not use the requests method

  from google.cloud import storage
  storage_client = storage.Client()
  bucket = storage_client.bucket(gcs_storage_bucket)

  # Construct a client side representation of a blob.
  # Note `Bucket.blob` differs from `Bucket.get_blob` as it doesn't retrieve
  # any content from Google Cloud Storage. As we don't need additional data,
  # using `Bucket.blob` is preferred here.
  blob = bucket.blob(object_name)
  blob.download_to_filename(destination_file_name)

  print(
      "Downloaded storage object {} from bucket {} to local file {}.".format(
          object_name, gcs_storage_bucket, destination_file_name
      )
  )

In [None]:
# prompt: python to delete a file even if it does not exist

def delete_file(filename):
  try:
    os.remove(filename)
    print(f"File '{filename}' deleted successfully.")
  except FileNotFoundError:
    print(f"File '{filename}' not found.")

In [None]:
def PrettyPrintJson(json_string):
  json_object = json.loads(json_string)
  json_formatted_str = json.dumps(json_object, indent=2)
  # print(json_formatted_str)
  return json_formatted_str

### <font color='#4285f4'>Teach the LLM how to write Veo 2 prompts</font>
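
The guide in the next cell breaks a text-to-video prompt into seven elements: camera motion, composition, subject, action, scene, ambiance and style. As a minimal sketch of how those elements combine, the hypothetical helper below reassembles the guide's own example; in the notebook itself Gemini writes the prompts, so this is for illustration only.

```python
def compose_video_prompt(camera_motion: str, composition: str, subject: str,
                         action: str, scene: str, ambiance: str, style: str) -> str:
    """Join the seven prompt elements into one sentence-style prompt."""
    main_clause = " ".join([camera_motion, composition, subject, action, scene])
    return f"{main_clause}, {ambiance}, {style}."


prompt = compose_video_prompt(
    camera_motion="Camera dollies to show",
    composition="a close up of",
    subject="a desperate man in a green trench coat",
    action="making a call",
    scene="on a rotary style wall-phone",
    ambiance="green neon light",
    style="movie scene",
)
print(prompt)
```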

In [None]:
# We need to tell the LLM how to write text-to-video prompts

text_to_video_prompt_guide = """
Text-to-Video Prompt Writing Help:
<text-to-video-prompt-guide>
Here are some of our best practices for text-to-video prompts:

Detailed prompts = better videos:
  - The more details you add, the more control you’ll have over the video.
  - A prompt should look like this: "Camera dollies to show a close up of a desperate man in a green trench coat is making a call on a rotary style wall-phone, green neon light, movie scene."
    - Here is a breakdown of the elements needed to create a text-to-video prompt, using the above prompt as an example:
      - "Camera dollies to show" = "Camera Motion"
      - "A close up of" = "Composition"
      - "A desperate man in a green trench coat" = "Subject"
      - "Is making a call" = "Action"
      - "On a rotary style wall-phone" = "Scene"
      - "Green Neon light" = "Ambiance"
      - "Movie Scene" = "Style"

Use the right keywords for better control:
  - Here is a list of some keywords that work well with text-to-video, use these in your prompts to get the desired camera motion or style.
  - Subject: Who or what is the main focus of the shot.  Example: "happy woman in her 30s".
  - Scene: Where is the location of the shot. Example "on a busy street, in space".
  - Action: What the subject is doing. Examples: "walking", "running", "turning head".
  - Camera Motion: What the camera is doing. Example: "POV shot", "Aerial View", "Tracking Drone view", "Tracking Shot".

Example text-to-video prompt using the above keywords:
  - Example text-to-video prompt: "Tracking drone view of a man driving a red convertible car in Palm Springs, 1970s, warm sunlight, long shadows"
  - Example text-to-video prompt: "A POV shot from a vintage car driving in the rain, Canada at night, cinematic"

Styles:
  - Overall aesthetic. Consider using specific film style keywords.  Examples: "horror film", "film noir", "animated styles", "3D cartoon style render".
  - Example text-to-video prompt: "Over the shoulder of a young woman in a car, 1970s, film grain, horror film, cinematic"
  - Example text-to-video prompt: "Film noir style, man and woman walk on the street, mystery, cinematic, black and white"
  - Example text-to-video prompt: "A cute creature with snow leopard-like fur is walking in winter forest, 3D cartoon style render"
  - Example text-to-video prompt: "An architectural rendering of a white concrete apartment building with flowing organic shapes, seamlessly blending with lush greenery and futuristic elements"

Composition:
  - How the shot is framed. This is often relative to the subject e.g. wide shot, close-up, low angle
  - Example text-to-video prompt: "Extreme close-up of an eye with city reflected in it"
  - Example text-to-video prompt: "A wide shot of surfer walking on a beach with a surfboard, beautiful sunset, cinematic"

Ambiance & Emotions:
  - How the color and light contribute to the scene (blue tones, night)
  - Example text-to-video prompt: "A close-up of a girl holding adorable golden retriever puppy in the park, sunlight"
  - Example text-to-video prompt: "Cinematic close-up shot of a sad woman riding a bus in the rain, cool blue tones, sad mood"

Cinematic effects:
  - e.g. double exposure, projected, glitch camera effect.
  - Example text-to-video prompt: "A double exposure of silhouetted profile of a woman walking and lake, walking in a forest"
  - Example text-to-video prompt: "Close-up shot of a model with blue light with geometric shapes projected on her face"
  - Example text-to-video prompt: "Silhouette of a man walking in collage of cityscapes"
  - Example text-to-video prompt: "Glitch camera effect, close up of woman’s face speaking, neon colors"
</text-to-video-prompt-guide>
"""

### <font color='#4285f4'>Generate the Veo 2 Prompts using Gemini (let Gemini write the prompts for us)</font>

In [None]:
text_to_video_prompts = []

##### Text-to-Video: Random Menu item prompts
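
Each of the following cells asks Gemini for JSON that matches `response_schema` and then calls `json.loads` on the reply. Since a model reply can still deviate from the schema, a defensive check before appending to `text_to_video_prompts` may be worthwhile. The sketch below is one way to do that; `parse_prompt_response` is a hypothetical helper, not part of the notebook.

```python
import json

REQUIRED_KEY = "text-to-video-prompt"


def parse_prompt_response(llm_result: str) -> dict:
    """Parse the LLM reply and verify it carries a non-empty text-to-video prompt."""
    result = json.loads(llm_result)
    if not isinstance(result, dict):
        raise ValueError(f"Expected a JSON object, got {type(result).__name__}")
    prompt = result.get(REQUIRED_KEY)
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError(f"Missing or empty '{REQUIRED_KEY}' field")
    return result


# Hand-written reply for illustration
parsed = parse_prompt_response('{"text-to-video-prompt": "Close-up of cocoa being dusted over a mousse"}')
print(parsed[REQUIRED_KEY])
```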



In [None]:
# Write me the json in  OpenAPI 3.0 schema object for the below object.
# Make all fields required.
#  {
#    "text-to-video-prompt" : "text"
#  }
response_schema = {
  "type": "object",
  "required": [
    "text-to-video-prompt"
  ],
  "properties": {
    "text-to-video-prompt": {
      "type": "string"
    }
  }
}

menu_item_list = GetMenuItems()
menu_id = random.randint(1, len(menu_item_list))
menu_name = menu_item_list[menu_id-1]["menu_name"]
menu_description = menu_item_list[menu_id-1]["menu_description"]

gemini_ad_prompt = f"""You are a marketing expert and are creating a marketing video for a company that sells premium handcrafted chocolates, handmade desserts and delicious coffee drinks.
You need to create a marketing video for the company.
The company is based in Paris, France.

The video should be about the following menu item:
{menu_name}: {menu_description}

The video should show the menu item being made.
Focus the video on the finishing touches like sprinkling chocolate, adding a layer of cream, adding a layer of sugar, adding a layer of coffee, adding a layer of milk.

Output Fields:
- "text-to-video-prompt":
 - Read the  "Text-to-Video Prompt Writing Help" to learn more about how to create good text-to-video prompts.
 - Make sure you include all the relevant best practices when creating the text-to-video prompt:
 - A detailed prompt for generating the video using text-to-video technology.
 - Focus on creating the menu item with an artistic flair.
 - Do not include "text overlays" in the text-to-video prompt.
 - The prompt should also reference that we are in a chocolate, dessert, coffee shop in Paris so it knows the context of the video.
 - Do not include children in the text-to-video prompt.

{text_to_video_prompt_guide}
"""

#print(gemini_ad_prompt)
llm_result = GeminiLLM(gemini_ad_prompt, response_schema=response_schema)
gemini_ad_results_dict = json.loads(llm_result)
orginal_ad_results_dict = gemini_ad_results_dict # in case we swap it for a pre-canned one
print(f"Menu Name: {menu_name} - {menu_description}")
print()
print(PrettyPrintJson(json.dumps(gemini_ad_results_dict)))
text_to_video_prompts.append(gemini_ad_results_dict)

##### Text-to-Video: Swan cake with Friends

In [None]:
# Write me the json in  OpenAPI 3.0 schema object for the below object.
# Make all fields required.
#  {
#    "text-to-video-prompt" : "text"
#  }
response_schema = {
  "type": "object",
  "required": [
    "text-to-video-prompt"
  ],
  "properties": {
    "text-to-video-prompt": {
      "type": "string"
    }
  }
}

menu_item_list = GetMenuItems()
menu_id = 8  # fixed menu item for this scene (the prompt below is written around the Chocolate Swan Cake)
menu_name = menu_item_list[menu_id-1]["menu_name"]
menu_description = menu_item_list[menu_id-1]["menu_description"]

gemini_ad_prompt = f"""You are a marketing expert and are creating a marketing video for a company that sells premium handcrafted chocolates, handmade desserts and delicious coffee drinks.
You need to create a marketing video for the company.
The company is based in Paris, France.

Show people around a table with a Chocolate Swan Cake in the middle.
The people should be having fun like it is a party.
The party needs to be upscale.

The video should be about the following menu item:
{menu_name}: {menu_description}

Output Fields:
- "text-to-video-prompt":
 - Read the  "Text-to-Video Prompt Writing Help" to learn more about how to create good text-to-video prompts.
 - Make sure you include all the relevant best practices when creating the text-to-video prompt:
 - A detailed prompt for generating the video using text-to-video technology.
 - Focus on creating the menu item with an artistic flair.
 - Do not include "text overlays" in the text-to-video prompt.
 - The prompt should also reference that we are in a chocolate, dessert, coffee shop in Paris so it knows the context of the video.
 - Do not include children in the text-to-video prompt.

{text_to_video_prompt_guide}
"""

#print(gemini_ad_prompt)
llm_result = GeminiLLM(gemini_ad_prompt, response_schema=response_schema)
gemini_ad_results_dict = json.loads(llm_result)
orginal_ad_results_dict = gemini_ad_results_dict # in case we swap it for a pre-canned one
print(f"Menu Name: {menu_name} - {menu_description}")
print()
print(PrettyPrintJson(json.dumps(gemini_ad_results_dict)))
text_to_video_prompts.append(gemini_ad_results_dict)

##### Text-to-Video: Friends at Picnic

In [None]:
# Write me the json in  OpenAPI 3.0 schema object for the below object.
# Make all fields required.
#  {
#    "text-to-video-prompt" : "text"
#  }
response_schema = {
  "type": "object",
  "required": [
    "text-to-video-prompt"
  ],
  "properties": {
    "text-to-video-prompt": {
      "type": "string"
    }
  }
}

menu_item_list = GetMenuItems()
menu_id = random.randint(1, len(menu_item_list))
menu_name = menu_item_list[menu_id-1]["menu_name"]
menu_description = menu_item_list[menu_id-1]["menu_description"]

gemini_ad_prompt = f"""You are a marketing expert and are creating a marketing video for a company that sells premium handcrafted chocolates, handmade desserts and delicious coffee drinks.
You need to create a marketing video for the company.
The company is based in Paris, France.

A group of friends eating "{menu_name}" at a picnic in a French park.

The video should be about the following menu item:
{menu_name}: {menu_description}

Output Fields:
- "text-to-video-prompt":
 - Read the  "Text-to-Video Prompt Writing Help" to learn more about how to create good text-to-video prompts.
 - Make sure you include all the relevant best practices when creating the text-to-video prompt:
 - A detailed prompt for generating the video using text-to-video technology.
 - Focus on creating the menu item with an artistic flair.
 - Do not include "text overlays" in the text-to-video prompt.
 - The prompt should also reference that we are in a chocolate, dessert, coffee shop in Paris so it knows the context of the video.
 - Do not include children in the text-to-video prompt.

{text_to_video_prompt_guide}
"""

#print(gemini_ad_prompt)
llm_result = GeminiLLM(gemini_ad_prompt, response_schema=response_schema)
gemini_ad_results_dict = json.loads(llm_result)
orginal_ad_results_dict = gemini_ad_results_dict # in case we swap it for a pre-canned one
print(f"Menu Name: {menu_name} - {menu_description}")
print()
print(PrettyPrintJson(json.dumps(gemini_ad_results_dict)))
text_to_video_prompts.append(gemini_ad_results_dict)

##### Text-to-Video: Chocolatier in Front of Window

In [None]:
# Write me the json in  OpenAPI 3.0 schema object for the below object.
# Make all fields required.
#  {
#    "text-to-video-prompt" : "text"
#  }
response_schema = {
  "type": "object",
  "required": [
    "text-to-video-prompt"
  ],
  "properties": {
    "text-to-video-prompt": {
      "type": "string"
    }
  }
}

menu_item_list = GetMenuItems()
menu_id = random.randint(1, len(menu_item_list))
menu_name = menu_item_list[menu_id-1]["menu_name"]
menu_description = menu_item_list[menu_id-1]["menu_description"]

gemini_ad_prompt = f"""You are a marketing expert and are creating a marketing video for a company that sells premium handcrafted chocolates, handmade desserts and delicious coffee drinks.
You need to create a marketing video for the company.
The company is based in Paris, France.

A chocolatier chef preparing a "{menu_name}" and looking out a large picturesque window.
The video should pan from the chef to the window.

The video should be about the following menu item:
{menu_name}: {menu_description}

Output Fields:
- "text-to-video-prompt":
 - Read the  "Text-to-Video Prompt Writing Help" to learn more about how to create good text-to-video prompts.
 - Make sure you include all the relevant best practices when creating the text-to-video prompt:
 - A detailed prompt for generating the video using text-to-video technology.
 - Focus on creating the menu item with an artistic flair.
 - Do not include "text overlays" in the text-to-video prompt.
 - The prompt should also reference that we are in a chocolate, dessert, coffee shop in Paris so it knows the context of the video.
 - Do not include children in the text-to-video prompt.

{text_to_video_prompt_guide}
"""

#print(gemini_ad_prompt)
llm_result = GeminiLLM(gemini_ad_prompt, response_schema=response_schema)
gemini_ad_results_dict = json.loads(llm_result)
orginal_ad_results_dict = gemini_ad_results_dict # in case we swap it for a pre-canned one
print(f"Menu Name: {menu_name} - {menu_description}")
print()
print(PrettyPrintJson(json.dumps(gemini_ad_results_dict)))
text_to_video_prompts.append(gemini_ad_results_dict)

##### Text-to-Video: Friends at an Outdoor Chocolate Buffet

In [None]:
# Write me the json in  OpenAPI 3.0 schema object for the below object.
# Make all fields required.
#  {
#    "text-to-video-prompt" : "text"
#  }
response_schema = {
  "type": "object",
  "required": [
    "text-to-video-prompt"
  ],
  "properties": {
    "text-to-video-prompt": {
      "type": "string"
    }
  }
}

menu_item_list = GetMenuItems()
menu_id = random.randint(1, len(menu_item_list))
menu_name = menu_item_list[menu_id-1]["menu_name"]
menu_description = menu_item_list[menu_id-1]["menu_description"]

gemini_ad_prompt = f"""You are a marketing expert and are creating a marketing video for a company that sells premium handcrafted chocolates, handmade desserts and delicious coffee drinks.
You need to create a marketing video for the company.
The company is based in Paris, France.

A group of friends at an outdoor buffet that has custom chocolates and desserts on the streets of Paris.
They are enjoying the sunshine and eating the below menu items.

The video should be about the following menu items:
{menu_name}: {menu_description}
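{menu_item_list[menu_id % len(menu_item_list)]["menu_name"]}: {menu_item_list[menu_id % len(menu_item_list)]["menu_description"]}
{menu_item_list[(menu_id+1) % len(menu_item_list)]["menu_name"]}: {menu_item_list[(menu_id+1) % len(menu_item_list)]["menu_description"]}
{menu_item_list[(menu_id+2) % len(menu_item_list)]["menu_name"]}: {menu_item_list[(menu_id+2) % len(menu_item_list)]["menu_description"]}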


Output Fields:
- "text-to-video-prompt":
 - Read the  "Text-to-Video Prompt Writing Help" to learn more about how to create good text-to-video prompts.
 - Make sure you include all the relevant best practices when creating the text-to-video prompt:
 - A detailed prompt for generating the video using text-to-video technology.
 - Focus on creating the menu item with an artistic flair.
 - Do not include "text overlays" in the text-to-video prompt.
 - The prompt should also reference that we are in a chocolate, dessert, coffee shop in Paris so it knows the context of the video.
 - Do not include children in the text-to-video prompt.

{text_to_video_prompt_guide}
"""

#print(gemini_ad_prompt)
llm_result = GeminiLLM(gemini_ad_prompt, response_schema=response_schema)
gemini_ad_results_dict = json.loads(llm_result)
orginal_ad_results_dict = gemini_ad_results_dict # in case we swap it for a pre-canned one
print(f"Menu Name: {menu_name} - {menu_description}")
print()
print(PrettyPrintJson(json.dumps(gemini_ad_results_dict)))
text_to_video_prompts.append(gemini_ad_results_dict)

##### Text-to-Video: Discovering the Chocolate Shop

In [None]:
# Write me the json in  OpenAPI 3.0 schema object for the below object.
# Make all fields required.
#  {
#    "text-to-video-prompt" : "text"
#  }
response_schema = {
  "type": "object",
  "required": [
    "text-to-video-prompt"
  ],
  "properties": {
    "text-to-video-prompt": {
      "type": "string"
    }
  }
}

menu_item_list = GetMenuItems()
menu_id = random.randint(1, len(menu_item_list))
menu_name = menu_item_list[menu_id-1]["menu_name"]
menu_description = menu_item_list[menu_id-1]["menu_description"]

gemini_ad_prompt = f"""You are a marketing expert and are creating a marketing video for a company that sells premium handcrafted chocolates, handmade desserts and delicious coffee drinks.
You need to create a marketing video for the company.
The company is based in Paris, France.

Hidden in the winding streets of Paris there is a small but beautiful shop that sells chocolates and desserts.
Two friends discover it while walking down the street.
They look in the store window and see beautiful chocolates and desserts.


Output Fields:
- "text-to-video-prompt":
 - Read the  "Text-to-Video Prompt Writing Help" to learn more about how to create good text-to-video prompts.
 - Make sure you include all the relevant best practices when creating the text-to-video prompt:
 - A detailed prompt for generating the video using text-to-video technology.
 - Focus on creating the menu item with an artistic flair.
 - Do not include "text overlays" in the text-to-video prompt.
 - The prompt should also reference that we are in a chocolate, dessert, coffee shop in Paris so it knows the context of the video.
 - Do not include children in the text-to-video prompt.

{text_to_video_prompt_guide}
"""

#print(gemini_ad_prompt)
llm_result = GeminiLLM(gemini_ad_prompt, response_schema=response_schema)
gemini_ad_results_dict = json.loads(llm_result)
orginal_ad_results_dict = gemini_ad_results_dict # in case we swap it for a pre-canned one
print(f"Menu Name: {menu_name} - {menu_description}")
print()
print(PrettyPrintJson(json.dumps(gemini_ad_results_dict)))
text_to_video_prompts.append(gemini_ad_results_dict)

##### Text-to-Video: Inside a Display of Chocolates and Desserts

In [None]:
# Write me the json in  OpenAPI 3.0 schema object for the below object.
# Make all fields required.
#  {
#    "text-to-video-prompt" : "text"
#  }
response_schema = {
  "type": "object",
  "required": [
    "text-to-video-prompt"
  ],
  "properties": {
    "text-to-video-prompt": {
      "type": "string"
    }
  }
}

menu_item_list = GetMenuItems()
menu_id = random.randint(1, len(menu_item_list))
menu_name = menu_item_list[menu_id-1]["menu_name"]
menu_description = menu_item_list[menu_id-1]["menu_description"]

gemini_ad_prompt = f"""You are a marketing expert and are creating a marketing video for a company that sells premium handcrafted chocolates, handmade desserts and delicious coffee drinks.
You need to create a marketing video for the company.
The company is based in Paris, France.

The inside of a display of chocolates and desserts.

Output Fields:
- "text-to-video-prompt":
 - Read the  "Text-to-Video Prompt Writing Help" to learn more about how to create good text-to-video prompts.
 - Make sure you include all the relevant best practices when creating the text-to-video prompt:
 - A detailed prompt for generating the video using text-to-video technology.
 - Focus on creating the menu item with an artistic flair.
 - Do not include "text overlays" in the text-to-video prompt.
 - The prompt should also reference that we are in a chocolate, dessert, coffee shop in Paris so it knows the context of the video.
 - Do not include children in the text-to-video prompt.

{text_to_video_prompt_guide}
"""

#print(gemini_ad_prompt)
llm_result = GeminiLLM(gemini_ad_prompt, response_schema=response_schema)
gemini_ad_results_dict = json.loads(llm_result)
orginal_ad_results_dict = gemini_ad_results_dict # in case we swap it for a pre-canned one
print(f"Menu Name: {menu_name} - {menu_description}")
print()
print(PrettyPrintJson(json.dumps(gemini_ad_results_dict)))
text_to_video_prompts.append(gemini_ad_results_dict)

##### Text-to-Video: Truffles Being Made Inside a Paris Chocolate Shop

In [None]:
# Write me the json in  OpenAPI 3.0 schema object for the below object.
# Make all fields required.
#  {
#    "text-to-video-prompt" : "text"
#  }
response_schema = {
  "type": "object",
  "required": [
    "text-to-video-prompt"
  ],
  "properties": {
    "text-to-video-prompt": {
      "type": "string"
    }
  }
}

menu_item_list = GetMenuItems()
menu_id = random.randint(1, len(menu_item_list))
menu_name = menu_item_list[menu_id-1]["menu_name"]
menu_description = menu_item_list[menu_id-1]["menu_description"]

gemini_ad_prompt = f"""You are a marketing expert and are creating a marketing video for a company that sells premium handcrafted chocolates, handmade desserts and delicious coffee drinks.
You need to create a marketing video for the company.
The company is based in Paris, France.

Show inside of a beautiful Paris chocolate shop with chocolate truffles being made.
Dark chocolate decorated with a white drizzle.
The item should be like a piece of art work and be very detailed.
Show the finishing touches of the item.


Output Fields:
- "text-to-video-prompt":
 - Read the  "Text-to-Video Prompt Writing Help" to learn more about how to create good text-to-video prompts.
 - Make sure you include all the relevant best practices when creating the text-to-video prompt:
 - A detailed prompt for generating the video using text-to-video technology.
 - Focus on creating the menu item with an artistic flair.
 - Do not include "text overlays" in the text-to-video prompt.
 - The prompt should also reference that we are in a chocolate, dessert, coffee shop in Paris so it knows the context of the video.
 - Do not include children in the text-to-video prompt.

{text_to_video_prompt_guide}
"""

#print(gemini_ad_prompt)
llm_result = GeminiLLM(gemini_ad_prompt, response_schema=response_schema)
gemini_ad_results_dict = json.loads(llm_result)
orginal_ad_results_dict = gemini_ad_results_dict # in case we swap it for a pre-canned one
print(f"Menu Name: {menu_name} - {menu_description}")
print()
print(PrettyPrintJson(json.dumps(gemini_ad_results_dict)))
text_to_video_prompts.append(gemini_ad_results_dict)
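
The prompt cells above differ mainly in the scene description; the role text and the output-field rules are repeated verbatim in each one. A minimal sketch of factoring that out is shown here. `build_ad_prompt`, `ROLE_TEXT`, and `OUTPUT_FIELD_RULES` are hypothetical names that do not exist in the notebook, and the guide is passed in as a plain string.

```python
# Hypothetical helper (not part of the notebook): builds the Gemini prompt
# for one scene. The wording is copied from the cells above.
ROLE_TEXT = (
    "You are a marketing expert and are creating a marketing video for a company "
    "that sells premium handcrafted chocolates, handmade desserts and delicious coffee drinks.\n"
    "You need to create a marketing video for the company.\n"
    "The company is based in Paris, France.\n"
)

OUTPUT_FIELD_RULES = """Output Fields:
- "text-to-video-prompt":
 - Read the "Text-to-Video Prompt Writing Help" to learn more about how to create good text-to-video prompts.
 - Make sure you include all the relevant best practices when creating the text-to-video prompt:
 - A detailed prompt for generating the video using text-to-video technology.
 - Focus on creating the menu item with an artistic flair.
 - Do not include "text overlays" in the text-to-video prompt.
 - The prompt should also reference that we are in a chocolate, dessert, coffee shop in Paris so it knows the context of the video.
 - Do not include children in the text-to-video prompt.
"""


def build_ad_prompt(scene, prompt_guide, menu_items=None):
    """Return the full prompt text for one scene.

    scene:        free-text description of what the clip should show
    prompt_guide: the text-to-video prompt writing guide
    menu_items:   optional list of (name, description) tuples
    """
    parts = [ROLE_TEXT, scene.strip(), ""]
    if menu_items:
        parts.append("The video should be about the following menu items:")
        parts.extend(f"{name}: {description}" for name, description in menu_items)
        parts.append("")
    parts.extend([OUTPUT_FIELD_RULES, prompt_guide])
    return "\n".join(parts)
```

With a helper like this, each cell would only need to supply its scene text and, where relevant, the selected menu items.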

### <font color='#4285f4'>View Generated Text-to-Video Prompts</font>

In [None]:
for item in text_to_video_prompts:
  print("------------------------------------------------------------------------------------------------------------------------------------")
  print(item["text-to-video-prompt"])
  print("------------------------------------------------------------------------------------------------------------------------------------")

### <font color='#4285f4'>Create local working directories</font>

In [None]:
# Create a directory to download the videos from GCS and a directory to combine the text to speech and videos
directory_text_to_video = f"text-to-video-{formatted_date}"
directory_text_to_speech = f"text-to-speech-{formatted_date}"
directory_video_and_audio = f"video-and-audio-{formatted_date}"
directory_full_video = f"full-video-{formatted_date}"

# Holds the local video files (no audio)
os.makedirs(directory_text_to_video, exist_ok=True)
directory_text_to_video_path = os.getcwd() + f"/{directory_text_to_video}/"
print(f"directory_text_to_video_path: {directory_text_to_video_path}")

# Holds the audio file(s)
os.makedirs(directory_text_to_speech, exist_ok=True)
directory_text_to_speech_path = os.getcwd() + f"/{directory_text_to_speech}/"
print(f"directory_text_to_speech_path: {directory_text_to_speech_path}")

# Full video
os.makedirs(directory_full_video, exist_ok=True)
directory_full_video_path = os.getcwd() + f"/{directory_full_video}/"
print(f"directory_full_video_path: {directory_full_video_path}")

full_video_filename_with_audio_en_GB = directory_full_video_path + "full-video-with-audio-en-GB.mp4"
print(f"full_video_filename_with_audio_en_GB: {full_video_filename_with_audio_en_GB}")

full_video_filename_with_audio_fr_FR = directory_full_video_path + "full-video-with-audio-fr-FR.mp4"
print(f"full_video_filename_with_audio_fr_FR: {full_video_filename_with_audio_fr_FR}")

full_video_filename_no_audio = directory_full_video_path + "full-video-no-audio.mp4"
print(f"full_video_filename_no_audio: {full_video_filename_no_audio}")

### <font color='#4285f4'>Video Generation</font>

##### Function generateVideo (Call Veo 2 REST API)

In [None]:
def generateVideo(prompt, storage_account, output_gcs_path):
  """Calls text-to-video to create the video and waits for the output (which can take several minutes).  Saves the prompt/parameters with the video.  Returns the GCS URI of the generated video."""

  full_output_gcs_path = f"gs://{storage_account}/{output_gcs_path}"
  model = "veo-2.0-generate-001"
  url = f"https://{location}-aiplatform.googleapis.com/v1beta1/projects/{project_id}/locations/{location}/publishers/google/models/{model}:predictLongRunning"

  request_body = {
      "instances": [
          {
              "prompt": prompt
          }
        ],
      "parameters": {
          "storageUri": full_output_gcs_path,
          "aspectRatio":"16:9"
          }
      }

  rest_api_parameters = request_body.copy()

  print(f"url: {url}")
  print(f"request_body: {request_body}")
  json_result = restAPIHelper(url, "POST", request_body)
  print(f"json_result: {json_result}")
  operation_name = json_result["name"] # the operation identifier is returned in the "name" field

  url = f"https://{location}-aiplatform.googleapis.com/v1beta1/projects/{project_id}/locations/{location}/publishers/google/models/{model}:fetchPredictOperation"

  request_body = {
      "operationName": operation_name
      }

  status = False
  # {
  # "name": "projects/chocolate-ai-demo-xxxxxx/locations/us-central1/publishers/google/models/veo-2.0-generate-001/operations/6d737b7c-5824-4f44-bc58-2e8d8226d2c2",
  # "done": True,
  # "response": {
  #      "@type": "type.googleapis.com/cloud.ai.large_models.vision.GenerateVideoResponse",
  #      "raiMediaFilteredCount": 0,
  #      "videos": [
  #          {
  #              "gcsUri": "gs://chocolate-ai-data-xxxxxx/text-to-video/text-to-video-2025-04-15-13-59/9874965778463625250/sample_0.mp4",
  #              "mimeType": "video/mp4"
  #          }
  #      ]
  #  }
  # }

  while not status:
    time.sleep(10)
    print(f"url: {url}")
    print(f"request_body: {request_body}")
    json_result = restAPIHelper(url, "POST", request_body)
    print(f"json_result: {json_result}")
    if "done" in json_result:
      status = bool(json_result["done"]) # in the future might be a status of running
    else:
      print("Status 'done' JSON attribute not present.  Assuming not done...")

  # A finished operation can carry an "error" instead of a "response"
  if "error" in json_result:
    raise RuntimeError(f"Video generation failed: {json_result['error']}")

  # Get the filename of our video
  filename = json_result["response"]["videos"][0]["gcsUri"]

  # Save our prompt (this way we know what we used to generate the video)
  json_filename = "text-to-video-prompt.json"
  with open(json_filename, "w") as f:
    f.write(json.dumps(rest_api_parameters))

  # Get the random-number directory that text-to-video created for this output
  text_to_video_output_directory = filename.replace(full_output_gcs_path,"")
  text_to_video_output_directory = text_to_video_output_directory.split("/")[1]

  # Write the prompt to the same path as our output video.  Saving the prompt allows us to regenerate the video (you should also save the seed and any other settings)
  copy_file_to_gcs(json_filename, storage_account, f"{output_gcs_path}/{text_to_video_output_directory}/{json_filename}")
  delete_file(json_filename)

  return filename
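
`generateVideo` polls `fetchPredictOperation` every 10 seconds until `done` is true, with no upper bound on the wait. The sketch below separates the polling into a small function with a timeout. It is an illustration only: `poll_operation`, `fetch`, and `timeout_seconds` are hypothetical names, and it assumes the operation JSON has the usual long-running-operation shape (`done`, then either `response` or `error`).

```python
import time


def poll_operation(fetch, interval_seconds=10, timeout_seconds=900, sleep=time.sleep):
    """Call fetch() until the operation reports done, then return its JSON.

    fetch: zero-argument callable returning the operation as a dict
           (in the notebook this could wrap restAPIHelper(url, "POST", request_body)).
    Raises TimeoutError once the waited time reaches timeout_seconds and
    RuntimeError if the finished operation carries an "error" field.
    """
    waited = 0
    while True:
        result = fetch()
        if result.get("done"):
            if "error" in result:
                raise RuntimeError(f"Operation failed: {result['error']}")
            return result
        if waited >= timeout_seconds:
            raise TimeoutError(f"Operation not done after {waited} seconds")
        sleep(interval_seconds)
        waited += interval_seconds
```

Unlike the loop in `generateVideo`, this version checks the operation once before the first sleep. Passing `sleep` as a parameter keeps the function testable without real delays.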

#### Use pre-generated videos

##### Story 1 (Friends)

In [None]:
# Pre-Generated Videos (text-to-video)
# This downloads the files from a public bucket and then uploads them as though text-to-video generated them
"""
video_and_audio_processing = []

# If you want to use the pre-generated videos, this will use data from the public storage account
video_and_audio_processing.append({
    "video-filename" : "text-to-video-01.mp4"
})
video_and_audio_processing.append({
    "video-filename" : "text-to-video-02.mp4"
})
video_and_audio_processing.append({
    "video-filename" : "text-to-video-03.mp4"
})
video_and_audio_processing.append({
    "video-filename" : "text-to-video-04.mp4"
})
video_and_audio_processing.append({
    "video-filename" : "text-to-video-05.mp4"
})
video_and_audio_processing.append({
    "video-filename" : "text-to-video-06.mp4"
})
video_and_audio_processing.append({
    "video-filename" : "text-to-video-07.mp4"
})

# Download the videos to the notebook computer
for item in video_and_audio_processing:
  download_from_gcs(f"{directory_text_to_video_path}{item['video-filename']}", "data-analytics-golden-demo", f"chocolate-ai/v1/Campaign-Assets-Text-to-Video-01/story-01/{item['video-filename']}")

  # Simulate that text-to-video generated the videos (pretend that we have the files output by text-to-video on storage)
  copy_file_to_gcs(f"{directory_text_to_video_path}{item['video-filename']}", storage_account, f"chocolate-ai/Campaign-Assets-Text-to-Video-01/text-to-video-{formatted_date}/{item['video-filename']}")
"""

##### Story 2 (Swan Cake)

In [None]:
# Pre-Generated Videos (text-to-video)
# This downloads the files from a public bucket and then uploads them as though text-to-video generated them
"""
video_and_audio_processing = []

# If you want to use the pre-generated videos, this will use data from the public storage account
video_and_audio_processing.append({
    "video-filename" : "text-to-video-01.mp4"
})
video_and_audio_processing.append({
    "video-filename" : "text-to-video-02.mp4"
})
video_and_audio_processing.append({
    "video-filename" : "text-to-video-03.mp4"
})
video_and_audio_processing.append({
    "video-filename" : "text-to-video-04.mp4"
})
video_and_audio_processing.append({
    "video-filename" : "text-to-video-05.mp4"
})
video_and_audio_processing.append({
    "video-filename" : "text-to-video-06.mp4"
})


# Download the videos to the notebook computer
for item in video_and_audio_processing:
  download_from_gcs(f"{directory_text_to_video_path}{item['video-filename']}", "data-analytics-golden-demo", f"chocolate-ai/v1/Campaign-Assets-Text-to-Video-01/story-02/{item['video-filename']}")

  # Simulate that text-to-video generated the videos (pretend that we have the files output by text-to-video on storage)
  copy_file_to_gcs(f"{directory_text_to_video_path}{item['video-filename']}", storage_account, f"chocolate-ai/Campaign-Assets-Text-to-Video-01/text-to-video-{formatted_date}/{item['video-filename']}")
"""

##### Story 3 (Chocolatier)

In [None]:
# Pre-Generated Videos (text-to-video)
# This downloads the files from a public bucket and then uploads them as though text-to-video generated them

video_and_audio_processing = []

# If you want to use the pre-generated videos, this will use data from the public storage account
video_and_audio_processing.append({
    "video-filename" : "text-to-video-01.mp4"
})
video_and_audio_processing.append({
    "video-filename" : "text-to-video-02.mp4"
})
video_and_audio_processing.append({
    "video-filename" : "text-to-video-03.mp4"
})
video_and_audio_processing.append({
    "video-filename" : "text-to-video-04.mp4"
})
video_and_audio_processing.append({
    "video-filename" : "text-to-video-05.mp4"
})
video_and_audio_processing.append({
    "video-filename" : "text-to-video-06.mp4"
})
video_and_audio_processing.append({
    "video-filename" : "text-to-video-07.mp4"
})
video_and_audio_processing.append({
    "video-filename" : "text-to-video-08.mp4"
})

# Download the videos to the notebook computer
for item in video_and_audio_processing:
  download_from_gcs(f"{directory_text_to_video_path}{item['video-filename']}", "data-analytics-golden-demo", f"chocolate-ai/v1/Campaign-Assets-Text-to-Video-01/story-03/{item['video-filename']}")

  # Simulate that text-to-video generated the videos (pretend that we have the files output by text-to-video on storage)
  copy_file_to_gcs(f"{directory_text_to_video_path}{item['video-filename']}", storage_account, f"chocolate-ai/Campaign-Assets-Text-to-Video-01/text-to-video-{formatted_date}/{item['video-filename']}")
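
The three story cells build the same list of dictionaries by hand and differ only in the number of clips (7, 6, and 8). As a sketch, the list could be generated from the clip count. `pregenerated_clips` and `STORY_CLIP_COUNTS` are hypothetical names introduced for this illustration.

```python
# Clip counts taken from the three story cells above.
STORY_CLIP_COUNTS = {"story-01": 7, "story-02": 6, "story-03": 8}


def pregenerated_clips(count):
    """Return the list-of-dicts structure used above for `count` clips."""
    return [
        {"video-filename": f"text-to-video-{i:02d}.mp4"}
        for i in range(1, count + 1)
    ]


# Example: the list for Story 3
video_and_audio_processing = pregenerated_clips(STORY_CLIP_COUNTS["story-03"])
```

Selecting a story then becomes a one-line change instead of commenting cells in and out.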


####  Create the Videos (using text-to-video)

In [None]:
# We need to swap our prompts to our pre-generated prompts (this is Story 1)
text_to_video_prompts = []
text_to_video_prompts.append({
    "text-to-video-prompt" : "Couple going down the streets and seeing a chocolate shop, like window shopping and seeing the display of chocolate and desserts."
    })

text_to_video_prompts.append({
    "text-to-video-prompt" : "Camera slowly moves forward, showcasing the exterior of a Parisian cafe.  The storefront is painted a light blue with gold trim. Red and white striped awnings protect small round tables, each adorned with a single red rose in a vase. The camera pans to the left to show two friends, a man and a woman, walking down the street. They stop at the storefront and look through the window at beautiful pastries on display. The camera zooms in on their faces as their eyes widen in amazement as they gaze upon rows of colorful macarons in pastel pink, green, and yellow, gleaming eclairs, and artfully crafted chocolate sculptures. Slow zoom, warm sunlight, ambiance of wonder, Paris street scene."
    })

text_to_video_prompts.append({
    "text-to-video-prompt" : "Camera close-up of a chocolatier chef delicately placing chocolate curls onto a \"Parisian Chocolate Symphony\" dessert to a wide shot of the picturesque Parisian street scene outside the window.  The Parisian chocolate shop scene is softly lit and warm. The camera pans to the window."
    })

text_to_video_prompts.append({
    "text-to-video-prompt" : "Medium range shot of three friends sitting at a bistro table, laughing together, Parisian cafe in the background, warm afternoon sunlight.  The camera slowly pushes in on the table as we hear the friends talking about the delicious premium handcrafted chocolates, handmade desserts, and coffee drinks in front of them. Close up of the Parisian Surprise, a playful dome of rich milk chocolate, hiding a secret center of marshmallow fluff and a sprinkle of sea salt.  Pull back to reveal the Doughnut Delight: Five fluffy, mini chocolate chip brioche donuts are served skewered and standing upright in a bed of creamy vanilla bean ice cream.  Topped with a drizzle of homemade salted caramel sauce and a sprinkle of cocoa powder. Camera slowly rotates around to the Rich Indulgence: Three layers of our darkest chocolate mousse, separated by thin layers of hazelnut dacquoise and topped with a delicate chocolate cage.  Finally, a slow zoom in on the Midnight in Provence: A whimsical dance of dark chocolate mousse infused with lavender, nestled atop a buttery almond biscuit. Experience the surprise of black currant jelly pockets and a sprinkle of delicate lavender buds. Warm, inviting, indulgent."
    })

# Same prompt as the previous one
text_to_video_prompts.append({
    "text-to-video-prompt" : "Medium range shot of three friends sitting at a bistro table, laughing together, Parisian cafe in the background, warm afternoon sunlight.  The camera slowly pushes in on the table as we hear the friends talking about the delicious premium handcrafted chocolates, handmade desserts, and coffee drinks in front of them. Close up of the Parisian Surprise, a playful dome of rich milk chocolate, hiding a secret center of marshmallow fluff and a sprinkle of sea salt.  Pull back to reveal the Doughnut Delight: Five fluffy, mini chocolate chip brioche donuts are served skewered and standing upright in a bed of creamy vanilla bean ice cream.  Topped with a drizzle of homemade salted caramel sauce and a sprinkle of cocoa powder. Camera slowly rotates around to the Rich Indulgence: Three layers of our darkest chocolate mousse, separated by thin layers of hazelnut dacquoise and topped with a delicate chocolate cage.  Finally, a slow zoom in on the Midnight in Provence: A whimsical dance of dark chocolate mousse infused with lavender, nestled atop a buttery almond biscuit. Experience the surprise of black currant jelly pockets and a sprinkle of delicate lavender buds. Warm, inviting, indulgent."
    })

text_to_video_prompts.append({
    "text-to-video-prompt" : "Close-up shot of a pastry chef dusting  eclair with powdered sugar in a Parisian cafe, warm lighting, focus on the pastry."
    })

text_to_video_prompts.append({
    "text-to-video-prompt" : "Medium range shot of three friends sitting at a bistro table, laughing together, Parisian cafe in the background, warm afternoon sunlight.  The camera slowly pushes in on the table as we hear the friends talking about the delicious premium handcrafted chocolates, handmade desserts, and coffee drinks in front of them. Close up of the Parisian Surprise, a playful dome of rich milk chocolate, hiding a secret center of marshmallow fluff and a sprinkle of sea salt.  Pull back to reveal the Doughnut Delight: Five fluffy, mini chocolate chip brioche donuts are served skewered and standing upright in a bed of creamy vanilla bean ice cream.  Topped with a drizzle of homemade salted caramel sauce and a sprinkle of cocoa powder. Camera slowly rotates around to the Rich Indulgence: Three layers of our darkest chocolate mousse, separated by thin layers of hazelnut dacquoise and topped with a delicate chocolate cage.  Finally, a slow zoom in on the Midnight in Provence: A whimsical dance of dark chocolate mousse infused with lavender, nestled atop a buttery almond biscuit. Experience the surprise of black currant jelly pockets and a sprinkle of delicate lavender buds. Warm, inviting, indulgent."
    })

# if using Story 2, you need to remove a prompt (we are not actually using the prompt text)
#text_to_video_prompts.pop()

# if using Story 3, you need an additional prompt
text_to_video_prompts.append({"text-to-video-prompt" : "Text does not matter" })

for i, item in enumerate(text_to_video_prompts, start=1):
  filename = f"text-to-video-{i:02d}.mp4"
  text_to_video_output_path_for_generated_video = f"gs://{storage_account}/chocolate-ai/Campaign-Assets-Text-to-Video-01/text-to-video-{formatted_date}/{filename}"

  # No need to run this for the pre-generated videos
  # generateVideo(item["text-to-video-prompt"], storage_account, f"chocolate-ai/Campaign-Assets-Text-to-Video-01/text-to-video-{formatted_date}/{filename}")

  # Download the generated video from GCS
  download_from_gcs(f"{directory_text_to_video_path}{filename}", storage_account, f"chocolate-ai/Campaign-Assets-Text-to-Video-01/text-to-video-{formatted_date}/{filename}")


In [None]:
# See the downloaded files
!ls {directory_text_to_video_path}

### <font color='#4285f4'>Merge the videos into one video</font>

In [None]:
# Merge the videos (the file names, when sorted, match the position of each clip in the final video)

print("Merging videos (without audio)")
print(f"directory_text_to_video_path: {directory_text_to_video_path}")
print(f"full_video_filename_no_audio: {full_video_filename_no_audio}")

merge_videos_sorted(directory_text_to_video_path, full_video_filename_no_audio)
copy_file_to_gcs(f"{full_video_filename_no_audio}", storage_account, f"chocolate-ai/Campaign-Assets-Text-to-Video-01/text-to-video-{formatted_date}/full-video-no-audio.mp4")
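
The merge relies on the sorted file names matching the playback order. `merge_videos_sorted` is defined elsewhere in the notebook and is not shown here, so the sketch below only illustrates the ordering assumption. Plain string sorting works while the clip numbers share the same width; a name such as `text-to-video-010.mp4` would sort before `text-to-video-02.mp4`. `clip_sort_key` and `list_clips_in_order` are hypothetical helpers that sort by the numeric part instead.

```python
import os
import re


def clip_sort_key(filename):
    """Sort by the first run of digits in the name (extension excluded), then by name."""
    stem = os.path.splitext(filename)[0]
    match = re.search(r"\d+", stem)
    number = int(match.group()) if match else -1
    return (number, filename)


def list_clips_in_order(directory):
    """Return full paths of the .mp4 files in `directory`, in playback order."""
    names = [n for n in os.listdir(directory) if n.lower().endswith(".mp4")]
    return [os.path.join(directory, n) for n in sorted(names, key=clip_sort_key)]
```

Files without a number in their name sort first, so a merged output written into the same directory should be kept out of the clip folder.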

In [None]:
# Confirm the merged text-to-video file (no audio) exists
!ls {full_video_filename_no_audio}

In [None]:
# Read the merged video and embed it as a base64 data URL so it can be played inline in the notebook
with open(full_video_filename_no_audio, 'rb') as video_file:
  video_mp4 = video_file.read()
video_url = "data:video/mp4;base64," + base64.b64encode(video_mp4).decode()
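
Embedding the video as a base64 data URL keeps the player self-contained, but base64 text is about one third larger than the file it encodes, so long videos inflate the notebook output. The sketch below shows the encoding step on raw bytes and the size calculation. `to_data_url` and `encoded_length` are hypothetical helper names.

```python
import base64


def to_data_url(data, mime_type="video/mp4"):
    """Encode raw bytes as a data URL suitable for an HTML <source src=...>."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def encoded_length(byte_count):
    """Length of the base64 text for `byte_count` input bytes (4 characters per 3 bytes, padded)."""
    return 4 * ((byte_count + 2) // 3)
```

Calling `encoded_length(os.path.getsize(path))` before embedding gives a quick check on how much text the cell output will carry.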

In [None]:
# 16:9 aspect ratio
HTML(f"""
<p>Combined text-to-video using moviepy.</p>
<video width=600 height=337 controls>
      <source src="{video_url}" type="video/mp4">
</video>
""")

### <font color='#4285f4'>Text to Speech Generation - Use Gemini to write a script (let it watch the video and come up with the voice over)</font>

In [None]:
fileUri = f"gs://{storage_account}/chocolate-ai/Campaign-Assets-Text-to-Video-01/text-to-video-{formatted_date}/full-video-no-audio.mp4"

# Write me the json in  OpenAPI 3.0 schema object for the below object.
# Make all fields required.
#{
#  "voiceover": [
#    {
#      "timestamp": "0:00-0:05",
#      "visuals": "Show storefront and pastries in the window",
#      "script": "Craving something delicious?  Something decadent?  At Chocolate A.I., we hand-craft elegant chocolates, pastries, and desserts that are as delightful to look at as they are to taste."
#    }
#  ]
#}

response_schema = {
  "type": "object",
  "required": ["voiceover"],
  "properties": {
    "voiceover": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["timestamp", "visuals", "script"],
        "properties": {
          "timestamp": {
            "type": "string"
          },
          "visuals": {
            "type": "string"
          },
          "script": {
            "type": "string"
          },
          "explanation": {
            "type": "string"
          }
        }
      }
    }
  }
}

prompt = """You are a marketing expert. Watch the attached video for a chocolate, dessert, and coffee shop named Chocolate A.I. and create a voice-over script.

Sample Json Output:
{
  "voiceover": [
    {
      "timestamp": "0:00-0:05",
      "visuals": "Show storefront and pastries in the window",
      "script": "Craving something delicious?  Something decadent?  At Chocolate A.I., we hand-craft elegant chocolates, pastries, and desserts that are as delightful to look at as they are to taste.",
      "explanation": "Opens with enticing questions to capture attention and introduces the brand with a focus on craftsmanship and visual appeal."
    },
    {
      "timestamp": "0:06-0:11",
      "visuals": "Continue to show storefront and various pastries",
      "script": "Our light and airy shop, with its charming Parisian flair, offers the perfect setting to relax, indulge, and escape the everyday.",
      "explanation": "Highlights the shop's ambiance and atmosphere, emphasizing a sense of indulgence and escape."
    },
    {
      "timestamp": "0:12-0:17",
      "visuals": "Show the chef preparing a chocolate assortment and then friends enjoying treats",
      "script": "Using only the finest ingredients, our expert chocolatiers create a symphony of flavors that will tantalize your taste buds.",
      "explanation": "Emphasizes the quality of ingredients and the expertise of the chocolatiers, appealing to discerning taste buds."
    },
    {
      "timestamp": "0:18-0:29",
      "visuals": "Focus on the joy and connection happening over the food",
      "script": "Whether you're looking for a special treat for yourself or a loved one, Chocolate A.I. offers a range of options to satisfy every sweet tooth.  Come experience the magic of Chocolate A.I.",
      "explanation": "Positions Chocolate A.I. as a destination for both personal indulgence and gifting, while emphasizing the variety and ability to satisfy any chocolate craving. Ends with a call to action."
    },
    {
      "timestamp": "Optional closing shot",
      "visuals": "Storefront, logo, or website address",
      "script": "Chocolate A.I. Where happiness is always on the menu.",
      "explanation": "Reinforces the brand name and leaves a lasting impression by connecting Chocolate A.I. with happiness."
    }
  ]
}
"""

multimodal_prompt_list = [
    {"text": prompt},
    {"fileData": {"mimeType": "video/mp4", "fileUri": fileUri}},
]

voice_llm_response = GeminiLLM_Multimodal(multimodal_prompt_list, response_schema=response_schema)
voice_llm_dict = json.loads(voice_llm_response)

print(PrettyPrintJson(voice_llm_response))

In [None]:
# Save the generated voice-over script so we have a record of how the audio was produced
with open("full-text-to-speech.txt", "w") as f:
  f.write(PrettyPrintJson(voice_llm_response))

# upload it to GCS
copy_file_to_gcs("full-text-to-speech.txt", storage_account, f"chocolate-ai/Campaign-Assets-Text-to-Video-01/text-to-video-{formatted_date}/full-text-to-speech.txt")
delete_file("full-text-to-speech.txt")

In [None]:
# Narration script from the originally generated video, kept for reference.
# Note: the next cell rebuilds `text` from the Gemini response and overwrites this value.
text = """
Craving something delicious? Something decadent? At Chocolate A.I., we hand-craft elegant chocolates, pastries, and desserts that are as delightful to look at as they are to taste.
Our light and airy shop, with its charming Parisian flair, offers the perfect setting to relax, indulge, and escape the everyday.
Using only the finest ingredients, our expert chocolatiers create a symphony of flavors that will tantalize your taste buds.
Whether you're looking for a special treat for yourself or a loved one, Chocolate A.I. offers a range of options to satisfy every sweet tooth.
Chocolate A.I. Where happiness is always on the menu.
"""

In [None]:
# British
language_code = "en-GB"
language_code_name = "en-GB-Neural2-B"
ssml_gender = "MALE"

text = ""
for item in voice_llm_dict["voiceover"]:
  text += item["script"] + "  "

text = text.strip()

# NOTE: You might need to change "Chocolate A.I." to "Chocolate A.I.." (add a second period when the name ends a sentence).
# Story 2
# text = "Indulge in the artistry of chocolate at Chocolate A.I.. ----  Each creation is a testament to our passion for crafting edible masterpieces.  From the finest cocoa beans to the most delicate decorations, we use only the highest quality ingredients.  Whether you're celebrating a special occasion or simply treating yourself, Chocolate A.I. offers a symphony of flavors to satisfy your sweetest cravings.  Come experience the magic of  Chocolate A.I., where every bite is a work of art.  Chocolate A.I.. - Indulge  your senses."
# Story 3
# text = "Indulge in the artistry of chocolate at Chocolate A.I., where each piece is handcrafted to perfection.  Our master chocolatiers use only the finest, ethically sourced cocoa beans and the freshest ingredients to create a symphony of flavors that will tantalize your taste buds.  From decadent truffles to exquisite bonbons and luxurious chocolate bars, each creation is a testament to our unwavering commitment to quality and craftsmanship.  More than just a chocolate shop, Chocolate A.I. is an experience. A place where you can escape the everyday and savor moments of pure indulgence.  Visit Chocolate A.I. and discover a world of chocolate beyond your wildest imagination.  Chocolate A.I.. - Where artistry meets indulgence."

audio_file_name = f"{directory_text_to_speech_path}full-text-to-speech-en-GB.mp3"

print(f"Text: {text}")
TextToSpeech(audio_file_name, text, language_code, language_code_name, ssml_gender, 0.90)

display(Audio(audio_file_name, autoplay=True, rate=16000))

In [None]:
!ls {directory_text_to_speech_path}

In [None]:
# upload it to GCS
copy_file_to_gcs(audio_file_name, storage_account, f"chocolate-ai/Campaign-Assets-Text-to-Video-01/text-to-video-{formatted_date}/full-text-to-speech-en-GB.mp3")

In [None]:
# French
language_code = "fr-FR"
language_code_name = "fr-FR-Neural2-B"
ssml_gender = "MALE"

text = ""
for item in voice_llm_dict["voiceover"]:
  text += item["script"] + "  "

text = text.strip()

# NOTE: You might need to change "Chocolate A.I." to "Chocolate A.I.." (add a second period when the name ends a sentence).
# Story 2
# text = "Indulge in the artistry of chocolate at Chocolate A.I.. ----  Each creation is a testament to our passion for crafting edible masterpieces.  From the finest cocoa beans to the most delicate decorations, we use only the highest quality ingredients.  Whether you're celebrating a special occasion or simply treating yourself, Chocolate A.I. offers a symphony of flavors to satisfy your sweetest cravings.  Come experience the magic of  Chocolate A.I., where every bite is a work of art.  Chocolate A.I.. - Indulge  your senses."
# Story 3 (active): this hard-coded text replaces the script built from voice_llm_dict above
text = "Indulge in the artistry of chocolate at Chocolate A.I., where each piece is handcrafted to perfection.  Our master chocolatiers use only the finest, ethically sourced cocoa beans and the freshest ingredients to create a symphony of flavors that will tantalize your taste buds.  From decadent truffles to exquisite bonbons and luxurious chocolate bars, each creation is a testament to our unwavering commitment to quality and craftsmanship.  More than just a chocolate shop, Chocolate A.I. is an experience. A place where you can escape the everyday and savor moments of pure indulgence.  Visit Chocolate A.I. and discover a world of chocolate beyond your wildest imagination.  Chocolate A.I.. - Where artistry meets indulgence."

audio_file_name = f"{directory_text_to_speech_path}full-text-to-speech-fr-FR.mp3"

print(f"Text: {text}")
TextToSpeech(audio_file_name, text, language_code, language_code_name, ssml_gender, 0.95)  # alternative value: 1.01

display(Audio(audio_file_name, autoplay=True, rate=16000))

In [None]:
!ls {directory_text_to_speech_path}

In [None]:
# upload it to GCS
copy_file_to_gcs(audio_file_name, storage_account, f"chocolate-ai/Campaign-Assets-Text-to-Video-01/text-to-video-{formatted_date}/full-text-to-speech-fr-FR.mp3")
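
The two text-to-speech cells above build the narration by concatenating each `script` entry, and they note that a sentence ending in "Chocolate A.I." may need a second period. The following is a minimal sketch of that preparation step. The names `join_scripts` and `double_brand_period` are hypothetical (they are not part of the notebook), and the regular expression is only a heuristic: it assumes the brand ends a sentence when it is followed by the end of the text or by whitespace and a capital letter.

```python
import re

BRAND = "Chocolate A.I."  # brand spelling used in the narration text

def join_scripts(voiceover_items):
    """Concatenate the "script" field of each voice-over segment into one string."""
    return "  ".join(item["script"].strip() for item in voiceover_items)

def double_brand_period(text, brand=BRAND):
    """Append a second period where the brand name appears to end a sentence."""
    # Heuristic boundary: end of text, or whitespace followed by a capital letter.
    pattern = re.escape(brand) + r"(?=\s+[A-Z]|\s*$)"
    return re.sub(pattern, lambda match: match.group(0) + ".", text)

# Illustrative segments shaped like the "voiceover" list in the Gemini response.
sample = [
    {"script": "Visit Chocolate A.I. and relax."},
    {"script": "Come experience the magic of Chocolate A.I."},
    {"script": "Chocolate A.I. Where happiness is always on the menu."},
]
narration = double_brand_period(join_scripts(sample))
print(narration)
```

Because an already doubled period no longer matches the pattern, running the function twice leaves the text unchanged. A brand name followed by a capitalized word mid-sentence would be a false positive, so it is worth reading the result before sending it to text-to-speech.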

### <font color='#4285f4'>Combine Video with Audio (text-to-video with text-to-audio)</font>
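
Combining the two assets means keeping the silent video stream and attaching the narration as its audio track. The notebook does this with its own `MergeVideoAndAudio` helper, which is defined elsewhere in the notebook. Purely as an alternative illustration, the sketch below builds an `ffmpeg` command line that performs the same kind of merge. It assumes `ffmpeg` is installed, the file names are placeholders, and `build_merge_command` is a hypothetical name rather than the helper's actual implementation.

```python
import shlex

def build_merge_command(video_path, audio_path, output_path):
    """Return an ffmpeg argument list that muxes one video and one audio file."""
    return [
        "ffmpeg", "-y",      # overwrite the output file if it already exists
        "-i", video_path,    # input 0: the silent video
        "-i", audio_path,    # input 1: the narration
        "-map", "0:v:0",     # take the video stream from input 0
        "-map", "1:a:0",     # take the audio stream from input 1
        "-c:v", "copy",      # keep the video as-is, without re-encoding
        "-c:a", "aac",       # encode the MP3 narration as AAC for the MP4 container
        "-shortest",         # stop when the shorter of the two streams ends
        output_path,
    ]

command = build_merge_command("full-video.mp4", "narration-en-GB.mp3", "full-video-en-GB.mp4")
print(shlex.join(command))
# To run it: subprocess.run(command, check=True)
```

The `-shortest` flag cuts the output at the end of whichever stream finishes first, which is one reason the narration length should be close to the video length.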

#### English
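
The player cell in this section notes that the narration may need to be sped up or slowed down to match the video length. One rough way to pick a new value is to scale the current speaking rate by the ratio of narration length to video length. The sketch below assumes narration duration is inversely proportional to the speaking rate; `suggest_speaking_rate`, the example durations, and the default bounds are all assumptions for illustration, so check the limits of the voice you use.

```python
def suggest_speaking_rate(audio_seconds, video_seconds, current_rate=1.0,
                          min_rate=0.25, max_rate=4.0):
    """Suggest a speaking rate so the narration roughly fills the video.

    Assumes narration duration scales inversely with the speaking rate.
    The min/max bounds are assumed defaults, not values taken from the notebook.
    """
    if audio_seconds <= 0 or video_seconds <= 0:
        raise ValueError("durations must be positive")
    # A narration that runs long needs a higher rate; one that runs short needs a lower rate.
    rate = current_rate * audio_seconds / video_seconds
    return round(min(max(rate, min_rate), max_rate), 2)

# Hypothetical durations: a 33 s narration generated at rate 0.90 for a 30 s video.
print(suggest_speaking_rate(33.0, 30.0, current_rate=0.90))
```

The result is only a starting point: pauses and sentence breaks do not scale perfectly with the rate, so regenerate the audio and listen before merging.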

In [None]:
audio_file_name = f"{directory_text_to_speech_path}full-text-to-speech-en-GB.mp3"

print(f"full_video_filename_no_audio: {full_video_filename_no_audio}")
print(f"audio_file_name: {audio_file_name}")
print(f"full_video_filename_with_audio_en_GB: {full_video_filename_with_audio_en_GB}")

MergeVideoAndAudio(full_video_filename_no_audio, audio_file_name, full_video_filename_with_audio_en_GB)

In [None]:
! ls {directory_full_video_path}

In [None]:
# Load the merged English video and encode it as a base64 data URL for inline playback
with open(full_video_filename_with_audio_en_GB, "rb") as video_file:
  video_mp4 = video_file.read()
video_url = "data:video/mp4;base64," + base64.b64encode(video_mp4).decode()

In [None]:
# Play the video
# 16:9 aspect ratio
HTML(f"""
<p>You might need to speed up or slow down the text-to-speech to match the overall video length.</p>
<video width=600 height=337 controls>
      <source src="{video_url}" type="video/mp4">
</video>
""")

In [None]:
# Upload the completed video to GCS
copy_file_to_gcs(full_video_filename_with_audio_en_GB, storage_account, f"chocolate-ai/Campaign-Assets-Text-to-Video-01/text-to-video-{formatted_date}/full-video-with-audio-en-GB.mp4")

#### French

In [None]:
audio_file_name = f"{directory_text_to_speech_path}full-text-to-speech-fr-FR.mp3"

print(f"full_video_filename_no_audio: {full_video_filename_no_audio}")
print(f"audio_file_name: {audio_file_name}")
print(f"full_video_filename_with_audio_fr_FR: {full_video_filename_with_audio_fr_FR}")

MergeVideoAndAudio(full_video_filename_no_audio, audio_file_name, full_video_filename_with_audio_fr_FR)

In [None]:
! ls {directory_full_video_path}

In [None]:
# Load the merged French video and encode it as a base64 data URL for inline playback
with open(full_video_filename_with_audio_fr_FR, "rb") as video_file:
  video_mp4 = video_file.read()
video_url = "data:video/mp4;base64," + base64.b64encode(video_mp4).decode()

In [None]:
# Play the video
# 16:9 aspect ratio
HTML(f"""
<p>You might need to speed up or slow down the text-to-speech to match the overall video length.</p>
<video width=600 height=337 controls>
      <source src="{video_url}" type="video/mp4">
</video>
""")

In [None]:
# Upload the completed video to GCS
copy_file_to_gcs(full_video_filename_with_audio_fr_FR, storage_account, f"chocolate-ai/Campaign-Assets-Text-to-Video-01/text-to-video-{formatted_date}/full-video-with-audio-fr-FR.mp4")

### <font color='#4285f4'>View the Video in Cloud Storage</font>

In [None]:
print(f"View the GCS bucket: https://console.cloud.google.com/storage/browser/{storage_account}/chocolate-ai/Campaign-Assets-Text-to-Video-01/text-to-video-{formatted_date}")

### <font color='#4285f4'>Clean Up</font>
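
The cleanup cell below removes the three working directories with `shutil.rmtree`, which raises an error if a directory has already been deleted. A slightly more defensive variant is sketched here. `remove_directories` is a hypothetical helper, not part of the notebook, and it skips any path that no longer exists.

```python
import os
import shutil
import tempfile

def remove_directories(paths):
    """Delete each existing directory in `paths` and return the ones removed."""
    removed = []
    for path in paths:
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(path)
        else:
            print(f"Skipping (not found): {path}")
    return removed

# Demonstration on a throwaway directory rather than the notebook's real paths.
scratch = tempfile.mkdtemp()
result = remove_directories([scratch, os.path.join(scratch, "missing")])
print(f"Removed: {result}")
```

In the notebook, the same call could be given the three directory variables used by the cleanup cell, so that re-running the cell after a partial cleanup does not fail.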

In [None]:
user_input = input("Do you want to delete the files on this notebook machine (y/N)?")
if user_input.strip().upper() == "Y":
  import shutil
  print(f"Removing directory: {directory_text_to_video_path}")
  shutil.rmtree(directory_text_to_video_path)

  print(f"Removing directory: {directory_text_to_speech_path}")
  shutil.rmtree(directory_text_to_speech_path)

  print(f"Removing directory: {directory_full_video_path}")
  shutil.rmtree(directory_full_video_path)
