# Video Generation using fine-tuned StyleGAN3 model

This notebook contains code adapted from Class 7, titled *02_StyleGAN_inference.ipynb*, with help from Copilot.

## 00. Setup

First, let's clone the StyleGAN3 repository:

In [None]:
!git clone https://github.com/NVlabs/stylegan3.git

Then install `ninja`, which StyleGAN3 uses when compiling its custom PyTorch extensions:

In [None]:
!pip install ninja

I've already trained two models: one for the title sections and one for the rest of the film. They are linked here:

[main one]()

[titles one]()

## Imports

We have to temporarily add the StyleGAN3 repository to `sys.path` in order to import some functions from `torch_utils`.

To make this work I had to insert the path at the beginning of `sys.path`, and on my machine I had to use the full path (thanks to Copilot):

```base_dir = r'C:\Users\iplow\Documents\code\ExploringMachineIntelligence_Spring2024\class-7\stylegan3'```

In [2]:
import sys
import os


base_dir = "./stylegan3"
if base_dir not in sys.path:
    sys.path.insert(0, base_dir)

In [3]:
import torch
import numpy as np

from torch_utils import misc
from torch_utils.ops import upfirdn2d

from torchvision.transforms import ToTensor
from PIL import Image

from torchvision.transforms.functional import to_pil_image
from torch.nn.functional import interpolate
from torchvision.utils import make_grid
from IPython.display import display, HTML

import legacy
import dnnlib

In [4]:
device = "cpu"

if torch.cuda.is_available():
    device = "cuda"

elif torch.backends.mps.is_available():
    device = "mps"

print(f'torch version {torch.__version__}')
print(f'Using device: {device}')

torch version 2.3.1+cu121
Using device: cuda


## 01. Loading the models

In [5]:
network_pkl_main = r'C:\Users\iplow\Documents\code\coding-3-submission\conor-output\00007-stylegan3-t--gpus1-batch32-gamma32\network-snapshot-000020.pkl'
network_pkl_titles = r'C:\Users\iplow\Documents\code\coding-3-submission\conor-output-titles\00003-stylegan3-t--gpus1-batch32-gamma32\network-snapshot-000080.pkl'

with dnnlib.util.open_url(network_pkl_main) as f:
    model = legacy.load_network_pkl(f)
    g_model_1 = model['G'].eval().requires_grad_(False).to(device)

with dnnlib.util.open_url(network_pkl_titles) as f:
    model = legacy.load_network_pkl(f)
    g_model_2 = model['G'].eval().requires_grad_(False).to(device)

## 02. Projecting into latent space and generating frames

There are a couple more imports needed for the projection and to interpolate between projected frames. 

Unfortunately there's not enough time to project into the latent space for every frame of the video, so we have to fill in the blanks with interpolation.

In [6]:
from stylegan3.dnnlib.util import open_url
from utils import image_path_to_tensor
from utils import run_projector

from utils import slerp
from base64 import b64encode
import torchvision.transforms as transforms

# these are my added functions as I have a few more requirements
from utils import image_directory_to_tensors
from utils import get_ws_emas_for_scene
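The `slerp` helper imported above lives in `utils` and is not shown in this notebook. As a rough idea of what spherical linear interpolation between two latents involves, here is a minimal NumPy sketch. The name `slerp_sketch` and the handling of multi-dimensional latents (measuring the angle on the flattened arrays) are my assumptions, not necessarily what `utils.slerp` does.

```python
import numpy as np


def slerp_sketch(t, a, b, eps=1e-7):
    """Spherical interpolation from a (t=0) to b (t=1). Hypothetical helper."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Angle between the two latents, measured on the flattened arrays
    a_unit = a.ravel() / np.linalg.norm(a.ravel())
    b_unit = b.ravel() / np.linalg.norm(b.ravel())
    omega = np.arccos(np.clip(np.dot(a_unit, b_unit), -1.0, 1.0))
    sin_omega = np.sin(omega)
    if sin_omega < eps:
        # Nearly parallel latents: fall back to plain linear interpolation
        return (1.0 - t) * a + t * b
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / sin_omega


# Halfway between two orthogonal unit vectors stays on the unit circle
midpoint = slerp_sketch(0.5, [1.0, 0.0], [0.0, 1.0])
print(midpoint, np.linalg.norm(midpoint))
```

Unlike a straight linear blend, this keeps the intermediate latents at a similar distance from the origin, which tends to give smoother in-between frames.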

#### Fetch a feature extractor

In order to move closer to the target style vector, we'll be using a pre-trained feature extractor to tell us how close we are. We'll use [VGG16](https://www.geeksforgeeks.org/vgg-16-cnn-model/) for this.

In [7]:
url = 'https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metrics/vgg16.pt'
with open_url(url) as f:
    vgg16 = torch.jit.load(f).eval().to(device)
print('Using device:', device, file=sys.stderr)

Using device: cuda


## Preparing the images for interpolation

I put all my video files, separated by cuts, into a folder. Then I got Copilot to generate a script that splits each video into frames, at 1 frame per second.

This script is in the repository: [extract_frames.py](./extract_frames.py)
 
The script takes this file structure:

![A picture of file structure containing a base directory titled "original videos" and two subdirectories titled respective to the AI model we want to use for those images.](journal_img\arranged_videos.png)

The subdirectories are named after the key of the model we want to use for those images.
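The real `extract_frames.py` is in the repository and is not reproduced here. As a hedged sketch of the idea, the snippet below walks a base folder with one subdirectory per model key and builds an `ffmpeg` command per video that samples one frame per second. The function names, the list of video extensions, and the output layout (one folder per video) are assumptions for illustration.

```python
import subprocess
from pathlib import Path

# Assumption: which file extensions count as videos
VIDEO_SUFFIXES = {".mp4", ".mov", ".mkv"}


def build_extract_commands(videos_root, frames_root, fps=1):
    """Return one ffmpeg command per video found in <videos_root>/<MODEL_KEY>/."""
    commands = []
    for video in sorted(Path(videos_root).glob("*/*")):
        if video.suffix.lower() not in VIDEO_SUFFIXES:
            continue
        # Hypothetical layout: <frames_root>/<MODEL_KEY>/<video name>/frame00000.png
        out_dir = Path(frames_root) / video.parent.name / video.stem
        commands.append([
            "ffmpeg", "-i", str(video),
            "-vf", f"fps={fps}",        # sample `fps` frames per second
            "-start_number", "0",       # number frames from 00000
            str(out_dir / "frame%05d.png"),
        ])
    return commands


def run_extract_commands(commands):
    """Create each output folder, then run ffmpeg (must be installed)."""
    for command in commands:
        Path(command[-1]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(command, check=True)
```

Calling `run_extract_commands(build_extract_commands("original videos", "extracted_frames"))` would then write the frames, assuming `ffmpeg` is on the path.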

Now we need the paths to all of those images in code, and to create tensors from them. The extracted frames keep the same subdirectory structure as the videos.

In [22]:
target_images_base_path = r"extracted_frames/"

# Load the frames from each subdirectory as tensors, paired with the key of the model to use
all_subdir_tensors_and_models = image_directory_to_tensors(target_images_base_path, g_model_1, g_model_2, device)

['extracted_frames3/TITLES\\frame00000.png', 'extracted_frames3/TITLES\\frame00001.png', 'extracted_frames3/TITLES\\frame00002.png']
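`image_directory_to_tensors` is one of my own additions in `utils` and is not shown here. As a rough sketch of just its path-gathering half, assuming one subdirectory per model key containing `.png` frames, something like the following would produce `(frame_paths, model_key)` pairs in the same order the later loop unpacks them. The name `collect_frame_paths` is hypothetical, and the real function also converts each image into a tensor.

```python
from pathlib import Path


def collect_frame_paths(base_path):
    """Group sorted .png frame paths by the subdirectory (model key) they sit in."""
    groups = []
    subdirs = sorted(p for p in Path(base_path).iterdir() if p.is_dir())
    for subdir in subdirs:
        frames = sorted(subdir.glob("*.png"))   # sorted so frames stay in time order
        if frames:
            groups.append((frames, subdir.name))
    return groups
```

Sorting matters here: the interpolation later assumes consecutive entries are consecutive moments in the cut.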


We're going to make a directory which will contain all of our generated frames for each cut. This way we can save progress after each generation.

In [11]:
# Create a directory to hold the generated frames (no error if it already exists)
os.makedirs('generated_frames', exist_ok=True)

## Projecting into the $w$ space

We are now going to take each image and project it into the $w$ space of StyleGAN. This process starts with a random vector and makes changes to the latent vector and noise input until it converges on the closest matching image in StyleGAN space to our input image. This is quite a long process. To shorten it, set the `steps` variable to a smaller number, which reduces the number of optimization steps taken to find the closest match, at the cost of a less accurate match.

I have modified the code to work with our file structure, with one set of frames per cut in the film.

This takes a long time and is really the main part of the project. It took me 8 hours or so! Good luck!
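To make the projection idea concrete before running the real thing, here is a toy NumPy version of the same loop: start from a random latent, compare features of the generated output with features of the target, and nudge the latent downhill. Everything in it is a stand-in. The "generator" and "feature extractor" are small random matrices rather than StyleGAN3 and VGG16, and the moving average of the latent is an assumption based on the name `ws_ema`. It is not how `run_projector` is implemented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins, NOT the real networks:
generator = rng.normal(size=(16, 4))   # latent (4,) -> "image" (16,)
extractor = rng.normal(size=(8, 16))   # "image" (16,) -> features (8,)


def project_sketch(target_image, steps=500, ema_decay=0.9):
    """Gradient descent on a latent so its features approach the target's."""
    combined = extractor @ generator            # features of generator(w), linear in w
    target_features = extractor @ target_image
    # Step size 1/L, where L = 2 * sigma_max^2 bounds how fast the gradient changes;
    # with this choice the loss cannot increase from one step to the next.
    learning_rate = 1.0 / (2.0 * np.linalg.norm(combined, 2) ** 2)

    w = rng.normal(size=generator.shape[1])     # start from a random latent
    w_ema = w.copy()                            # smoothed latent (assumed from the name ws_ema)
    losses = []
    for _ in range(steps):
        residual = combined @ w - target_features
        losses.append(float(residual @ residual))          # squared feature distance
        w = w - learning_rate * 2.0 * (combined.T @ residual)
        w_ema = ema_decay * w_ema + (1.0 - ema_decay) * w
    return w_ema, losses


target = generator @ np.array([0.5, -1.0, 2.0, 0.0])  # a target the toy generator can reach
w_ema, losses = project_sketch(target)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.6f}")
```

Fewer steps means less time but a latent that sits further from the target, which is the same trade-off as lowering `steps` in the cell below.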

In [24]:
steps = 500
num_interp = 25

for i, (scene, model_identifier) in enumerate(all_subdir_tensors_and_models):
    # Decide which model to use based on the model_identifier
    if model_identifier == 'MAIN':
        g_model = g_model_1
    else:
        g_model = g_model_2 

    this_scene_ws_emas = []
    for j, frame_tensor in enumerate(scene):
        ws_ema = run_projector(projection_target=frame_tensor,
                               g_model=g_model, 
                               steps=steps,
                               perceptual_model=vgg16, 
                               device=device, 
                               save_path=None)
        this_scene_ws_emas.append(ws_ema)
        print(f'Scene {i+1}/{len(all_subdir_tensors_and_models)}, Frame {j+1}/{len(scene)} complete')

    interp_vals = np.linspace(1./num_interp, 1, num=num_interp)
    this_scene_latent_interps = []

    for j in range(len(this_scene_ws_emas) - 1):
        latent_a_np = this_scene_ws_emas[j].cpu().detach().numpy().squeeze()
        latent_b_np = this_scene_ws_emas[j+1].cpu().detach().numpy().squeeze()
        latent_interp = np.array([slerp(v, latent_a_np, latent_b_np) for v in interp_vals], dtype=np.float32)
        this_scene_latent_interps.append(latent_interp)

    # One output folder per scene, so later scenes do not overwrite earlier ones
    image_folder_name = f"generated_frames/{model_identifier}/scene_{i:03}"
    os.makedirs(image_folder_name, exist_ok=True)

    start_index = 0

    for k, latent_interp in enumerate(this_scene_latent_interps):
        for j, step in enumerate(latent_interp):
            step = torch.tensor(step).unsqueeze(0).to(device)
            image_tensor = g_model.synthesis(step, noise_mode='const')
            image = transforms.functional.to_pil_image(image_tensor.clamp(-1, 1).add(1).div(2).cpu().squeeze(0))
            # Calculate the image name index based on start_index and the current loop iteration
            image_name_index = start_index + k * len(latent_interp) + j
            # Save the image with the calculated name index
            image.save(f'./{image_folder_name}/{image_name_index:04}.jpg')            

image 0/500 | loss: 0.4338529109954834
image 10/500 | loss: 0.1981426179409027
image 20/500 | loss: 0.19325152039527893
image 30/500 | loss: 0.17646589875221252
image 40/500 | loss: 0.16870340704917908
image 50/500 | loss: 0.17523540556430817
image 60/500 | loss: 0.17212903499603271
image 70/500 | loss: 0.16980378329753876
image 80/500 | loss: 0.18504494428634644
image 90/500 | loss: 0.17219758033752441
image 100/500 | loss: 0.17117904126644135
image 110/500 | loss: 0.17082010209560394
image 120/500 | loss: 0.1705816239118576
image 130/500 | loss: 0.17958956956863403
image 140/500 | loss: 0.1705007702112198
image 150/500 | loss: 0.16997091472148895
image 160/500 | loss: 0.16975969076156616
image 170/500 | loss: 0.1699252426624298
image 180/500 | loss: 0.16972672939300537
image 190/500 | loss: 0.20126058161258698
image 200/500 | loss: 0.1720017045736313
image 210/500 | loss: 0.17230188846588135
image 220/500 | loss: 0.1694135218858719
image 230/500 | loss: 0.1692873239517212
image 240/5