<a href="https://colab.research.google.com/github/roberttwomey/machine-imagination-workshop/blob/main/generate_from_stored.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# BigGAN + CLIP + CMA-ES: Interpolation

This notebook takes the results of the BigGAN+CLIP+CMA-ES search for visual representations of text and generates interpolations that transition smoothly between them. Given a series of noise/class vectors (each a point in "latent space"), we generate intermediate steps between each pair of consecutive points and save the result as a video that moves smoothly from point to point, pausing on each phrase/image.

This video interpolation process is much faster than the text-to-image translation process.
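
As a rough guide to how long the resulting video will be, the sketch below counts frames for a looping walk. It assumes the defaults used in Step 4 (`num_steps = 90`, `len_hold = 30`, `fps = 30`) and that the walk returns from the last prompt to the first; `walk_length` is a hypothetical helper written for illustration, not part of the notebook.

```python
def walk_length(num_prompts, num_steps=90, len_hold=30, fps=30):
    """Return (total_frames, seconds) for a looping latent walk.

    Hypothetical helper: each prompt contributes len_hold held frames
    followed by num_steps interpolated frames toward the next prompt.
    """
    frames_per_segment = len_hold + num_steps
    total_frames = num_prompts * frames_per_segment
    return total_frames, total_frames / fps


frames, seconds = walk_length(num_prompts=3)
print(frames, "frames,", seconds, "seconds")  # 360 frames, 12.0 seconds
```

With three prompts and the default settings, this works out to 360 frames, or 12 seconds of video.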

Please reach out with any thoughts, questions, or awesome results: [@roberttwomey](https://twitter.com/roberttwomey)

In [None]:
#@title 1. Setup software libraries (run once)
#@markdown This cell installs the software libraries this notebook needs on this 
#@markdown Colab instance: torch and torchvision builds matching the installed CUDA version.

#@markdown Run this cell once - press the play button at top left.

#@markdown Afterwards, restart the kernel. Select __Runtime -> Restart runtime__
#@markdown from the top menu. Move on to Step 2 once you have restarted.

#@markdown (this takes around 4-5 minutes to run)

!pip install ipython-autotime
%load_ext autotime

# prints out what graphics card we have
!nvidia-smi -L

import subprocess

CUDA_version = [s for s in subprocess.check_output(["nvcc", "--version"]).decode("UTF-8").split(", ") if s.startswith("release")][0].split(" ")[-1]
print("CUDA version:", CUDA_version)

if CUDA_version == "10.0":
    torch_version_suffix = "+cu100"
elif CUDA_version == "10.1":
    torch_version_suffix = "+cu101"
elif CUDA_version == "10.2":
    torch_version_suffix = ""
else:
    torch_version_suffix = "+cu110"

!pip install torch==1.7.1{torch_version_suffix} torchvision==0.8.2{torch_version_suffix} -f https://download.pytorch.org/whl/torch_stable.html ftfy regex

In [None]:
#@title 2. Install ML Models
#@markdown Installs BigGAN — the image generator network. That is all we need
#@markdown to create our latent walks. Everything else is already available in Colab.

#@markdown (this takes around 1 minute to run)

# BigGAN
!pip install pytorch-pretrained-biggan

from IPython.display import HTML, clear_output
from PIL import Image
from IPython.display import Image as JupImage
import numpy as np
import nltk
from scipy.stats import truncnorm

# from biggan
import torch
from pytorch_pretrained_biggan import (BigGAN, one_hot_from_names, truncated_noise_sample,
                                       save_as_images, convert_to_images) #, display_in_terminal)
import logging
logging.basicConfig(level=logging.WARNING)

# WordNet is only used by one_hot_from_names, which this notebook does not call
nltk.download('wordnet')

# load biggan
model = BigGAN.from_pretrained('biggan-deep-512')
print("loaded bigGAN")

In [None]:
#@title 3. Upload your stored class and noise vectors

#@markdown Click on "Choose Files" below, and select all of the _something_noise.txt_ 
#@markdown and _something_class.txt_ files from before (for instance, "a sunrise 
#@markdown through a window_1_noise.txt" and "a sunrise through a window_1_class.txt").

#@markdown Upload as many pairs of files as you would like. We will generate 
#@markdown your interpolation ("latent walk") between these images in latent  
#@markdown space.

#@markdown You should see your uploaded files in the current directory if you click
#@markdown on the folder icon at left. 
#@markdown If you have already uploaded files in this session, you can click "Cancel upload".
from google.colab import files
uploaded = files.upload()

In [None]:
# Set your prompts and their order here. Use the stem of each uploaded filename
# (copy it from the upload output above), leaving off the "_class.txt" and
# "_noise.txt" suffixes. The order of phrases here determines the order of
# images in the output. You can repeat images if you want.

prompts = [
    "a sunrise through a window_1",
    "a dog sitting on a couch_1", 
    "a cat in a refrigerator_255"
]

In [None]:
#@title 4. Generate a latent walk!

#@markdown This cell takes each of the images we generated before, and using
#@markdown their locations in latent space (given by the noise and class vectors), 
#@markdown interpolates between them to create a smoothly flowing traversal 
#@markdown ("walk") through the space of possible images. 

#@markdown Set the following parameters to shape your output movie:
#@markdown - fps is how many frames per second we want in the output video. 
#@markdown - num_steps is how many intermediate frames to generate between each 
#@markdown successive phrase/image
#@markdown - len_hold is how many frames to pause/"hold" on each resultant image.

# the movie
fps = 30 #@param {type: 'number'}

# the interpolation
num_steps = 90 #@param {type:'number'}
len_hold = 30 #@param {type: 'number'}

truncation = 1.0

interpbase = '/content/interpolation'
!mkdir -p $interpbase
moviefilename = 'interpolation_%s.mp4'

import numpy as np
from numpy import asarray
from numpy import vstack
from numpy.random import randn
from numpy.random import randint
from numpy import arccos
from numpy import clip
from numpy import dot
from numpy import sin
from numpy import linspace
from numpy.linalg import norm
import os
import glob

# from
# https://discuss.pytorch.org/t/help-regarding-slerp-function-for-generative-model-sampling/32475/4

# spherical linear interpolation (slerp)
def slerp(val, low, high):
    omega = arccos(clip(dot(low/norm(low), high/norm(high)), -1, 1))
    so = sin(omega)
    if so == 0:
        # L'Hopital's rule/LERP
        return (1.0-val) * low + val * high
    return sin((1.0-val)*omega) / so * low + sin(val*omega) / so * high
 
# uniform interpolation between two points in latent space
def interpolate_points(p1, p2, n_steps=10):
    # interpolate ratios between the points
    ratios = np.linspace(0, 1, num=n_steps)
    # spherically interpolate (slerp) between the two vectors at each ratio
    vectors = list()
    for ratio in ratios:
        v = slerp(ratio, p1, p2)
        vectors.append(v)
    return np.asarray(vectors)

def get_class_file(path, prompt):
    # print(path+'%s*_class.txt'%prompt)
    result = glob.glob(path+'%s*_class.txt'%prompt)
    return(result)

def get_noise_file(path, prompt):
    # print(path+'%s*_noise.txt'%prompt)    
    result = glob.glob(path+'%s*_noise.txt'%prompt)
    return(result)

class_filenames = [get_class_file('/content/', prompt)[0] for prompt in prompts]
noise_filenames = [get_noise_file('/content/', prompt)[0] for prompt in prompts]

# print(class_filenames, noise_filenames)

class_inputs = [np.loadtxt(filename) for filename in class_filenames]
noise_inputs = [np.loadtxt(filename) for filename in noise_filenames]

# print(class_inputs, noise_inputs)

count = 0

# loop over inputs and generate interpolations
for i in range(len(class_inputs)):

    # generate interpolations
    noises = interpolate_points(noise_inputs[i], 
                                noise_inputs[(i+1)%len(class_inputs)], 
                                num_steps)
    classes = interpolate_points(class_inputs[i], 
                                 class_inputs[(i+1)%len(class_inputs)], 
                                 num_steps)

    # generate images in batches
    batch_size = 10 # 50
    for j in range(0, num_steps, batch_size):

        # clear_output()
        print(i, j, count)
        noise_vector = noises[j:j+batch_size]
        class_vector = classes[j:j+batch_size]

        # convert to tensors
        noise_vector = torch.tensor(noise_vector, dtype=torch.float32)
        class_vector = torch.tensor(class_vector, dtype=torch.float32)

        # put everything on cuda (GPU)
        noise_vector = noise_vector.to('cuda')
        noise_vector = noise_vector.clamp(-2*truncation, 2*truncation)
        class_vector = class_vector.to('cuda')
        class_vector = class_vector.softmax(dim=-1)
        model.to('cuda')

        # generate images
        with torch.no_grad():
            #print(noise_vector.shape)
            #print(class_vector.shape)
            output = model(noise_vector, class_vector, truncation)

        # move the generated images back to the CPU before converting them
        output = output.to('cpu')

        imgs = convert_to_images(output)

        # hold on the first image of this segment for len_hold frames
        if j == 0:
            for k in range(len_hold):
                imgs[0].save(interpbase+"/output_%05d.png" % count)
                count = count + 1
                
        for img in imgs: 
            img.save(interpbase+"/output_%05d.png" % count)
            count = count + 1

# generate mp4
out = moviefilename%fps
with open('list.txt','w') as f:
  for i in range(count):
    # print('file %s/output_%05d.png\n'%(interpbase, i))
    f.write('file %s/output_%05d.png\n'%(interpbase, i))
!ffmpeg -r $fps -f concat -safe 0 -i list.txt -c:v libx264 -pix_fmt yuv420p -profile:v baseline -movflags +faststart -r $fps $out -y -loglevel error -stats
# !echo ffmpeg -r $fps -f concat -safe 0 -i list.txt -c:v libx264 -pix_fmt yuv420p -profile:v baseline -movflags +faststart -r $fps $out -y
        

# display movie in notebook
from base64 import b64encode
with open(moviefilename%fps, 'rb') as f:
  data_url = "data:video/mp4;base64," + b64encode(f.read()).decode()
display(HTML("""
  <video controls autoplay loop>
        <source src="%s" type="video/mp4">
  </video>""" % data_url))

# play "ding" and download movie
from google.colab import files, output
output.eval_js('new Audio("https://freesound.org/data/previews/80/80921_1022651-lq.ogg").play()')
files.download(moviefilename%fps)

# Explanation

To learn more about BigGAN, latent space, and interpolations between noise/class vectors, see [this colab notebook](https://colab.research.google.com/github/roberttwomey/machine-imagination-workshop/blob/main/BigGAN_handson.ipynb).

It walks through examples of sampling from different categories (`"933) cheeseburger"` for example), and lets you explore how the noise vector and truncation affect generation.
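
To make the role of truncation concrete, here is a minimal standard-library sketch of truncated noise sampling. It assumes the common BigGAN recipe of drawing each component from a standard normal restricted to [-2, 2] and then scaling by the truncation value, and it assumes a 128-dimensional noise vector. `truncated_noise` is a hypothetical helper written for illustration; it is not the library's `truncated_noise_sample`.

```python
import random


def truncated_noise(dim=128, truncation=1.0, seed=None):
    """Sample a noise vector with components in [-2*truncation, 2*truncation].

    Hypothetical helper: rejection-samples a standard normal on [-2, 2],
    then scales every kept value by `truncation`.
    """
    rng = random.Random(seed)
    values = []
    while len(values) < dim:
        x = rng.gauss(0.0, 1.0)
        if -2.0 <= x <= 2.0:  # discard samples outside the truncation window
            values.append(truncation * x)
    return values


wide = truncated_noise(truncation=1.0, seed=0)
narrow = truncated_noise(truncation=0.4, seed=0)
print(max(abs(v) for v in wide) <= 2.0)    # True
print(max(abs(v) for v in narrow) <= 0.8)  # True
```

Smaller truncation values keep samples closer to the origin, which typically trades variety for fidelity. This matches the bounds used in Step 4, where the noise is clamped to `2*truncation` on either side of zero.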

That notebook also shows how to interpolate between two samples with a fixed number of steps, which is the same operation this notebook applies between each pair of stored vectors.
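
The difference between straight-line and spherical interpolation can be seen on a tiny example. The sketch below is a dependency-free illustration on two 2-D unit vectors. It uses the same slerp formula as Step 4 but is written with the standard library only; `lerp` and `length` are hypothetical helper names.

```python
import math


def lerp(t, a, b):
    """Straight-line interpolation between vectors a and b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def slerp(t, a, b):
    """Spherical interpolation: follows the arc between a and b."""
    cos_omega = sum(x * y for x, y in zip(a, b)) / (length(a) * length(b))
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    so = math.sin(omega)
    if so == 0:
        return lerp(t, a, b)  # vectors are parallel; fall back to lerp
    wa = math.sin((1.0 - t) * omega) / so
    wb = math.sin(t * omega) / so
    return [wa * x + wb * y for x, y in zip(a, b)]


a, b = [1.0, 0.0], [0.0, 1.0]
print(round(length(lerp(0.5, a, b)), 4))   # 0.7071
print(round(length(slerp(0.5, a, b)), 4))  # 1.0
```

The midpoint of the straight line is shorter than either endpoint, while the slerp midpoint keeps unit length. That is a common reason to prefer slerp when walking through a Gaussian latent space.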

# References
- PyTorch pretrained BigGAN from Hugging Face: https://github.com/huggingface/pytorch-pretrained-BigGAN
- Hands-on with BigGAN from Google (2018): [colab notebook](https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/biggan_generation_with_tf_hub.ipynb)