<center><h1>Group 27 - Part of Codebase</h1></center>

<strong>Important Note: The following code was written as part of an academic project.</strong>

## Text-to-Video Pipeline for Story Generation

This notebook generates a visual story file from a story prompt. The model used and how it works are described in the final report.

<br>
Note: The model used here requires significant compute. Please run the following code cells only on a GPU-powered system or on hosted resources such as Google Colab or Kaggle.
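
As a quick pre-flight check before running the heavier cells, the sketch below tests whether a CUDA GPU appears to be available. The helper name `gpu_probably_available` is hypothetical and not part of the project code. It uses `torch.cuda.is_available()` when PyTorch is installed and otherwise falls back to looking for the `nvidia-smi` executable, which is only a heuristic.

```python
import shutil


def gpu_probably_available():
    """Return True if a CUDA GPU appears to be usable (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # Heuristic fallback: the NVIDIA driver tools are on the PATH.
        return shutil.which("nvidia-smi") is not None
    return torch.cuda.is_available()


if not gpu_probably_available():
    print("No GPU detected; the video generation cells will likely be very slow or fail.")
```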

#### 1. Installing the dependencies

In [None]:
! pip install diffusers transformers accelerate torch
! pip install sentence-transformers

Collecting diffusers
  Downloading diffusers-0.23.1-py3-none-any.whl (1.7 MB)
Collecting accelerate
  Downloading accelerate-0.24.1-py3-none-any.whl (261 kB)
Installing collected packages: diffusers, accelerate
Successfully installed accelerate-0.24.1 diffusers-0.23.1
Collecting sentence-transformers
  Downloading sentence-transformers-2.2.2.tar.gz (85 kB)
  Preparing metadata (setup.py) ... done
Collecting sentencepiece (from sentence-transformers)
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

#### 2. Importing the libraries

In [None]:
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_video
import os
from moviepy.editor import VideoFileClip, concatenate_videoclips
import nltk
nltk.download('punkt')
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from nltk.tokenize import sent_tokenize
from IPython.display import display

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


#### 3. Preprocessing - Semantic Segmentation

In [None]:
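def semantic_segmentation(text):
    """Split a story into text segments by clustering sentence embeddings."""
    sentences = sent_tokenize(text)
    # One cluster per sentence, so each segment typically holds a single sentence.
    num_clusters = len(sentences)
    model = SentenceTransformer("paraphrase-distilroberta-base-v1")
    sentence_embeddings = model.encode(sentences)

    # K-means clustering on the embeddings (fixed seed for repeatable labels)
    kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
    kmeans.fit(sentence_embeddings)

    # Group sentences by cluster label; dicts keep insertion order, so segments
    # appear in the order in which their first sentence occurs in the story.
    clusters = {}
    for i, label in enumerate(kmeans.labels_):
        clusters.setdefault(label, []).append(sentences[i])

    segments = [" ".join(cluster) for cluster in clusters.values()]
    return segments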

# Story Prompt (Input)
story_prompt = 'In a small town nestled between rolling hills, lived a young girl named Lily. She had a curious pet named Sparky. One day, while exploring the woods behind her house, Lily stumbled upon a mysterious old map. The map seemed to lead to a hidden treasure. With Sparky by her side, Lily embarked on an adventure, facing challenges and overcoming obstacles along the way, until finally reaching a hidden cave. There, the long-lost treasure awaited their discovery. As they uncovered the treasure, a surprise awaited them—it held a secret that would change Lily\'s life forever.'

text_segments = semantic_segmentation(story_prompt)
text_segments

['In a small town nestled between rolling hills, lived a young girl named Lily.',
 'She had a curious pet named Sparky.',
 'One day, while exploring the woods behind her house, Lily stumbled upon a mysterious old map.',
 'The map seemed to lead to a hidden treasure.',
 'With Sparky by her side, Lily embarked on an adventure, facing challenges and overcoming obstacles along the way, until finally reaching a hidden cave.',
 'There, the long-lost treasure awaited their discovery.',
 "As they uncovered the treasure, a surprise awaited them—it held a secret that would change Lily's life forever."]
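
The grouping step inside `semantic_segmentation` can be tried on its own, without downloading the embedding model. The sketch below is a minimal illustration with hand-written cluster labels; the helper `group_by_label`, the sample sentences, and the labels are all made up for this example and do not come from K-means.

```python
def group_by_label(sentences, labels):
    """Join sentences that share a cluster label, keeping first-seen order."""
    clusters = {}
    for sentence, label in zip(sentences, labels):
        clusters.setdefault(label, []).append(sentence)
    return [" ".join(group) for group in clusters.values()]


sentences = [
    "Lily found a map.",
    "The map led to a cave.",
    "Sparky barked.",
    "The cave held treasure.",
]
labels = [2, 2, 0, 1]  # hypothetical labels, as if produced by a clustering step

segments = group_by_label(sentences, labels)
print(segments)
```

When every sentence has its own label, as in the cell above, the segments are simply the original sentences in story order. With fewer clusters than sentences, sentences that are far apart in the story could be merged into one segment.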

#### 4. Setting up the pipeline for Video Generation

In [None]:
pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()


#### 5. Generating videos from text segments

In [None]:
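for index, segment in enumerate(text_segments):
    # Generate the frames for this segment. With diffusers 0.23.1 (installed above)
    # `.frames` is the frame list for one clip; newer releases may return a batch,
    # in which case `.frames[0]` would be needed.
    video_frames = pipe(segment, num_inference_steps=25).frames
    # export_to_video writes a temporary .mp4 file and returns its path
    video_path = export_to_video(video_frames)
    # Rename the clip to <index>.mp4 so the clips can be ordered later
    new_video_path = os.path.join(os.path.dirname(video_path), f'{index}.mp4')
    os.rename(video_path, new_video_path)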

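
An alternative to renaming the temporary files is to decide each clip's path up front and keep all clips in a dedicated directory. The sketch below only builds the paths. The helper `clip_paths` and the directory name `story_clips` are hypothetical, and passing the path to `export_to_video` through its `output_video_path` argument is an assumption about the installed diffusers version.

```python
import tempfile
from pathlib import Path


def clip_paths(output_dir, num_segments):
    """Create output_dir if needed and return one <index>.mp4 path per segment."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    return [out / f"{index}.mp4" for index in range(num_segments)]


# Hypothetical per-run directory under the system temp folder
run_dir = Path(tempfile.gettempdir()) / "story_clips"
paths = clip_paths(run_dir, 3)
print([p.name for p in paths])

# Assumed usage inside the generation loop:
#   export_to_video(video_frames, output_video_path=str(paths[index]))
```

Keeping the clips in their own directory also keeps unrelated `.mp4` files in `/tmp/` out of the later collation and clean-up steps.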

#### 6. Collating the generated videos into a single video file

In [None]:
video_directory = '/tmp/'
output_path = '/tmp/combined_videos.mp4'
# Keep only the numbered clips (0.mp4, 1.mp4, ...); this skips combined_videos.mp4
# from an earlier run, whose name would break the numeric sort below.
video_files = [
    file for file in os.listdir(video_directory)
    if file.endswith(".mp4") and file[:-len(".mp4")].isdigit()
]
# Sort numerically so that 10.mp4 comes after 9.mp4
video_files.sort(key=lambda x: int(x.split('.')[0]))
video_clips = [VideoFileClip(os.path.join(video_directory, video_file)) for video_file in video_files]
final_clip = concatenate_videoclips(video_clips)
final_clip.write_videofile(output_path, codec="libx264", fps=24)

Moviepy - Building video /tmp/combined_videos.mp4.
Moviepy - Writing video /tmp/combined_videos.mp4

Moviepy - Done !
Moviepy - video ready /tmp/combined_videos.mp4
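
The numeric sort key in the collation cell matters once a story has more than ten segments. The short sketch below, using made-up file names, shows how a plain string sort would misorder the clips while the integer key keeps them in story order.

```python
# Hypothetical clip names for a 12-segment story
names = [f"{index}.mp4" for index in range(12)]

# Plain string sort compares character by character, so "10.mp4" lands before "2.mp4"
lexicographic = sorted(names)

# Sorting on the integer before the extension restores story order
numeric = sorted(names, key=lambda name: int(name.split(".")[0]))

print(lexicographic[:4])
print(numeric[:4])
```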


#### 7. Displaying the final output video

In [None]:
# Generated Video (Output)
video_path = '/tmp/combined_videos.mp4'
clip = VideoFileClip(video_path)
clip.ipython_display(loop=True, autoplay=True)

Moviepy - Building video __temp__.mp4.
Moviepy - Writing video __temp__.mp4

Moviepy - Done !
Moviepy - video ready __temp__.mp4

#### 8. Deleting the old videos before a new run

<strong>Note: Only run this cell when you wish to generate a video for a new story.</strong>

In [None]:
# Deleting old video files for a new run
video_directory = '/tmp/'
files = os.listdir(video_directory)
for file_name in files:
    if file_name.endswith('.mp4'):
        file_path = os.path.join(video_directory, file_name)
        os.remove(file_path)
        print(f"Deleted: {file_path}")

Deleted: /tmp/4.mp4
Deleted: /tmp/5.mp4
Deleted: /tmp/3.mp4
Deleted: /tmp/combined_videos.mp4
Deleted: /tmp/0.mp4
Deleted: /tmp/2.mp4
Deleted: /tmp/1.mp4
Deleted: /tmp/6.mp4
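
The cell above removes every `.mp4` file in `/tmp/`, including any that do not belong to this notebook. A narrower clean-up could delete only the numbered clips and the combined file. The sketch below shows this in a scratch directory; the helper `remove_story_videos` and the file `holiday.mp4` are hypothetical.

```python
import tempfile
from pathlib import Path


def remove_story_videos(directory, combined_name="combined_videos.mp4"):
    """Delete <number>.mp4 clips and the combined video; return the removed names."""
    removed = []
    for path in sorted(Path(directory).glob("*.mp4")):
        if path.stem.isdigit() or path.name == combined_name:
            path.unlink()
            removed.append(path.name)
    return removed


# Demonstration in a throwaway directory with empty placeholder files
with tempfile.TemporaryDirectory() as scratch:
    for name in ["0.mp4", "1.mp4", "combined_videos.mp4", "holiday.mp4"]:
        (Path(scratch) / name).touch()
    removed = remove_story_videos(scratch)
    remaining = sorted(p.name for p in Path(scratch).iterdir())

print(removed)
print(remaining)
```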
