# Text2Video Hugging Face models

I've tested `damo-vilab/text-to-video-ms-1.7b`. It runs in 6 GB of GPU memory in fp16.
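
As a rough sanity check on the memory figure, the sketch below estimates the size of the weights alone. It assumes that the `1.7b` in the model name refers to roughly 1.7 billion parameters, and it ignores activations, the scheduler state, and any framework overhead, so it is a lower bound rather than a measurement. The helper name `weights_gb` is made up for this illustration.

```python
# Rough weight-memory estimate.
# Assumption: "1.7b" in the model name means about 1.7e9 parameters.
NUM_PARAMS = 1.7e9
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weights_gb(num_params: float, dtype: str) -> float:
    """Return the size of the weights alone in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp32", "fp16"):
    print(f"{dtype}: {weights_gb(NUM_PARAMS, dtype):.1f} GB")
```

Under that assumption the fp16 weights take about half the space of the fp32 weights, which is consistent with the model fitting on a 6 GB card only in half precision.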

### Dependencies


The environment was defined with the following `pixi.toml`. A Jupyter kernel (ipykernel) has to be registered from this environment before the notebook can use it.

```toml
[project]
name = "text-to-video"
version = "0.1.0"
description = "Add a short description here"
channels = ["conda-forge", "nvidia", "pytorch"]
platforms = ["linux-64"]

[tasks]

[dependencies]
python = "3.10"
transformers = "4.33.3.*"
pytorch-gpu = "1.11.0.*"
ipykernel = "6.25.2.*"
diffusers = "0.18.2.*"
accelerate = "0.23.0.*"
opencv = "4.6.0.*"
pip = "23.2.1.*"
gradio = "3.23.0.*"
imageio = "2.31.1.*"
```


In [1]:
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_video
from IPython.display import Video
import imageio
from pathlib import Path
import skvideo.io

In [2]:
VIDEOS_PATH = str(Path.home() / "Videos/text2video")  # pathlib does not expand "~" on its own
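
A note on the home directory: `pathlib` treats `~` as an ordinary path component unless `expanduser()` is called, whereas a shell expands it automatically. The following minimal sketch shows the difference using only the standard library; the variable names `raw` and `expanded` are chosen for this illustration.

```python
from pathlib import Path

# "~" is kept as a literal directory name until expanduser() is called.
raw = Path("~") / "Videos/text2video"
expanded = raw.expanduser()

print(raw.is_absolute())  # False: still relative to the current directory
print(expanded)           # absolute path under the current user's home
print(expanded == Path.home() / "Videos/text2video")
```

Writing files through the unexpanded path would create (or look for) a directory literally named `~` inside the current working directory.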

In [3]:
from streamlit_jupyter import StreamlitPatcher, tqdm

In [4]:
pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

In [5]:
prompt = "Wizard hat with inscribed math equations rotating slowly"
video_frames = pipe(prompt, num_inference_steps=25, num_frames=50).frames


In [6]:
import numpy as np
from IPython.display import Video, Image

In [7]:
path = str(Path(VIDEOS_PATH).expanduser() / "output")  # frames directory, derived from VIDEOS_PATH instead of a hard-coded home

In [8]:
#StreamlitPatcher().jupyter()  # register streamlit with jupyter-compatible wrappers

In [9]:
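frames_dir = Path(path)
frames_dir.mkdir(parents=True, exist_ok=True)  # create the frames directory if it is missing
for i, frame in enumerate(video_frames):
    # Zero-pad the index so that ffmpeg's glob pattern picks the frames up in order
    frame_path = frames_dir / f"{i:04d}.png"
    imageio.imwrite(frame_path, frame)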
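
Frame file names matter here because the frames are later collected with a glob pattern, and glob matches are typically returned in lexicographic rather than numeric order. The sketch below illustrates the effect with plain string sorting; `frame_name` is a hypothetical helper, and the exact ordering used by a given ffmpeg build is an assumption based on the usual glob behaviour.

```python
def frame_name(index: int, width: int = 4) -> str:
    """Build a zero-padded file name such as '0007.png'."""
    return f"{index:0{width}d}.png"


# Lexicographic sorting puts "10.png" before "2.png".
unpadded = sorted(f"{i}.png" for i in range(12))
# With zero-padding, lexicographic and numeric order coincide.
padded = sorted(frame_name(i) for i in range(12))

print(unpadded[:4])  # ['0.png', '1.png', '10.png', '11.png']
print(padded[:4])    # ['0000.png', '0001.png', '0002.png', '0003.png']
```

With 50 frames, unpadded names would interleave frames 10 to 19 between frames 1 and 2, which shows up as jumps in the encoded video.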

In [10]:
output_video_path = str(Path(VIDEOS_PATH) / "out.mp4")
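
The frames are encoded with ffmpeg in the next cell. As a sketch, the same command line can also be assembled in Python and passed to `subprocess.run`, which avoids shell quoting issues. The flags mirror the ones used in this notebook; the helper names `build_ffmpeg_command` and `clip_seconds` and the example paths are made up for this illustration.

```python
from pathlib import Path


def build_ffmpeg_command(frames_dir: str, output_path: str, fps: int = 5) -> list[str]:
    """Return the ffmpeg argument list for encoding PNG frames to H.264."""
    return [
        "ffmpeg",
        "-framerate", str(fps),
        "-pattern_type", "glob",
        "-i", str(Path(frames_dir) / "*.png"),  # the glob is expanded by ffmpeg itself
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        output_path,
        "-y",  # overwrite the output file without asking
    ]


def clip_seconds(num_frames: int, fps: int) -> float:
    """Duration of the resulting clip in seconds."""
    return num_frames / fps


cmd = build_ffmpeg_command("frames", "out.mp4")
print(" ".join(cmd))
print(clip_seconds(50, 5))  # 10.0
```

To actually run the encoder, the list could be passed to `subprocess.run(cmd, check=True)`, assuming ffmpeg is on the `PATH`. At 5 frames per second, the 50 generated frames give a 10 second clip.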

In [11]:
!ffmpeg -framerate 5 -pattern_type glob -i $VIDEOS_PATH"/output/*.png" -c:v libx264 -pix_fmt yuv420p $output_video_path -y

ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enab