<a href="https://colab.research.google.com/github/roberttwomey/ml-art-code/blob/master/biggan/BigGAN_handson.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# BigGAN Hands-On: Generating Images with BigGAN

<!-- __NOTE for OOD users__: select the `tf-gpu-cyclegan` kernel. -->

This notebook walks you through some of the basics of generating images with BigGAN. You can read more about BigGAN in [the paper on arXiv](https://arxiv.org/abs/1809.11096) [1]. It was adapted for ML for the Arts by rtwomey@unl.edu from [an example](https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/biggan_generation_with_tf_hub.ipynb) and [pytorch-pretrained-BigGAN](https://github.com/huggingface/pytorch-pretrained-BigGAN).

We will move through this notebook and run each cell by pressing the **Play** button to the left of the cell.

## Activities:

1. BigGAN set up
2. Generate images
3. Explore latent vector math
4. Generate interpolations
5. Discussions

## Select a Model (optional)

By default, this notebook uses the 128 x 128 pixel BigGAN-deep model (`biggan-deep-128`), which is the active `module_path` in the cell below. To get started, I'd suggest leaving this unchanged.

To generate 256x256 or 512x512 images, comment out the active **`module_path`** below and uncomment one of the other `biggan-deep-*` lines. The commented-out `tfhub.dev` URLs come from the TensorFlow Hub example this notebook was adapted from and are probably not loadable with the PyTorch package used here.

In [None]:
# BigGAN-deep models
module_path = 'biggan-deep-128'
# module_path = 'biggan-deep-256'
# module_path = 'biggan-deep-512'


# module_path = 'https://tfhub.dev/deepmind/biggan-deep-128/1'  # 128x128 BigGAN-deep
# module_path = "https://tfhub.dev/deepmind/biggan-deep-256/1"  # 256x256 BigGAN-deep
# module_path = 'https://tfhub.dev/deepmind/biggan-deep-512/1'  # 512x512 BigGAN-deep

# BigGAN (original) models
# module_path = 'https://tfhub.dev/deepmind/biggan-128/2'  # 128x128 BigGAN
# module_path = 'https://tfhub.dev/deepmind/biggan-256/2'  # 256x256 BigGAN
# module_path = 'https://tfhub.dev/deepmind/biggan-512/2'  # 512x512 BigGAN

# 0. Setup

## Install pytorch-pretrained-BigGAN
We will use this pretrained BigGAN implemented in PyTorch: [pytorch-pretrained-BigGAN](https://github.com/huggingface/pytorch-pretrained-BigGAN)

In [None]:
!pip install pytorch-pretrained-biggan

Import other libraries

In [None]:
import IPython.display
import numpy as np
import io
import PIL.Image

import nltk
nltk.download('wordnet')

## Define some functions for sampling and displaying BigGAN images

In [None]:
def imgrid(imarray, cols=5, pad=1):
  if imarray.dtype != np.uint8:
    raise ValueError('imgrid input imarray must be uint8')
  pad = int(pad)
  assert pad >= 0
  cols = int(cols)
  assert cols >= 1
  N, H, W, C = imarray.shape
  rows = N // cols + int(N % cols != 0)
  batch_pad = rows * cols - N
  assert batch_pad >= 0
  post_pad = [batch_pad, pad, pad, 0]
  pad_arg = [[0, p] for p in post_pad]
  imarray = np.pad(imarray, pad_arg, 'constant', constant_values=255)
  H += pad
  W += pad
  grid = (imarray
          .reshape(rows, cols, H, W, C)
          .transpose(0, 2, 1, 3, 4)
          .reshape(rows*H, cols*W, C))
  if pad:
    grid = grid[:-pad, :-pad]
  return grid

def imshow(a, format='png', jpeg_fallback=True):
  a = np.asarray(a, dtype=np.uint8)
  data = io.BytesIO()
  PIL.Image.fromarray(a).save(data, format)
  im_data = data.getvalue()
  try:
    disp = IPython.display.display(IPython.display.Image(im_data))
  except IOError:
    if jpeg_fallback and format != 'jpeg':
      print(('Warning: image was too large to display in format "{}"; '
             'trying jpeg instead.').format(format))
      return imshow(a, format='jpeg')
    else:
      raise
  return disp

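def sample(model, z, y, truncation):
  # convert numpy inputs to float32 torch tensors (interpolated arrays may be float64)
  noise_vector = torch.from_numpy(np.asarray(z, dtype=np.float32))
  class_vector = torch.from_numpy(np.asarray(y, dtype=np.float32))

  # use the GPU if one is available, otherwise fall back to the CPU
  device = 'cuda' if torch.cuda.is_available() else 'cpu'
  noise_vector = noise_vector.to(device)
  class_vector = class_vector.to(device)
  model.to(device)

  # run the generator
  with torch.no_grad():
    output = model(noise_vector, class_vector, truncation)

  # move the result back to the CPU for display and saving
  output = output.to('cpu')

  return output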

## Load a pretrained BigGAN generator

(takes about 1 minute to run)

In [None]:
import torch
from pytorch_pretrained_biggan import (BigGAN, one_hot_from_names, one_hot_from_int,
                                       truncated_noise_sample, save_as_images,
                                       display_in_terminal, convert_to_images)

# OPTIONAL: if you want to have more information on what's happening, activate the logger as follows
import logging
logging.basicConfig(level=logging.INFO)

# Load pre-trained model
model = BigGAN.from_pretrained(module_path)

# 1. Explore BigGAN samples of a particular category

Let's generate an image with BigGAN!

We are going to specify two things:
- a `y` vector (__class vector__ 1 x 1000 long). This sets which _kind_ of image BigGAN will generate.
  - by default `y` is a "one hot" vector, which means it is all zero except for a single `1` at the index of the chosen class.
- a `z` vector (__noise vector__ 1 x 128 long). This sets _which_ particular instance it generates.

We can change `y`, `z`, and a few other variables to change the output.
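To make the two inputs concrete, here is a minimal NumPy-only sketch of their shapes. The `one_hot` helper and the plain Gaussian draw for `z` are stand-ins for illustration only; the notebook itself builds these vectors with `one_hot_from_int` and `truncated_noise_sample`.

```python
import numpy as np

NUM_CLASSES = 1000  # number of ImageNet classes (length of y)
Z_DIM = 128         # length of the noise vector z

def one_hot(index, num_classes=NUM_CLASSES):
    """Hypothetical helper: a 1 x num_classes vector with a single 1."""
    y = np.zeros((1, num_classes), dtype=np.float32)
    y[0, index] = 1.0
    return y

y = one_hot(933)  # 933 is the cheeseburger class
rng = np.random.RandomState(0)
z = rng.standard_normal((1, Z_DIM)).astype(np.float32)  # stand-in noise

print(y.shape, int(y.argmax()), float(y.sum()))
print(z.shape)
```

Changing the index passed to `one_hot` changes the kind of object; changing the seed changes which instance of that object you get.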

Set the random seed

In [None]:
noise_seed = 0

## Generate images for your category

In [None]:
# prepare an input to biggan
truncation = 0.5
num_samples = 10

class_vector = one_hot_from_names(['cheeseburger'], batch_size=num_samples)
# class_vector = one_hot_from_int([933], batch_size=num_samples)
noise_vector = truncated_noise_sample(truncation=truncation, batch_size=num_samples, seed=noise_seed)

# convert to torch
noise_vector = torch.from_numpy(noise_vector)
class_vector = torch.from_numpy(class_vector)

# use the GPU if one is available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
noise_vector = noise_vector.to(device)
class_vector = class_vector.to(device)
model.to(device)

# run the generator
with torch.no_grad():
  output = model(noise_vector, class_vector, truncation)

# move the result back to the CPU
output = output.to('cpu')

View a result:

In [None]:
results = convert_to_images(output)
results[2]

Show as a grid using our helper functions

In [None]:
imshow(imgrid(np.array(results)))

Optional: save outputs as pngs (uncomment the following)

In [None]:
# save_as_images(output)

## Try another category

Ok. Now try some other categories.

You can see the full list of categories here in this list: [imagenet1000 class ids](https://gist.githubusercontent.com/yrevar/942d3a0ac09ec9e5eb3a/raw/238f720ff059c1f82f368259d1ca4ffa5dd8f9f5/imagenet1000_clsidx_to_labels.txt).

Just type in the number (f.ex. `933` for cheeseburger) for the variable `category` below.

In [None]:
# set parameters
num_samples = 1  # between 1 and 20
noise_seed = 0  # using the same number will give you the same results, repeatedly random!
truncation = 0.5  # between 0.02 and 1

# set category
category = 800 # slot machine

Create our input vectors and generate our image. We are going to use our
`sample()` function to make it easier to generate images from the BigGAN model.

In [None]:
class_vector = one_hot_from_int([category], batch_size=num_samples)
noise_vector = truncated_noise_sample(truncation=truncation, batch_size=num_samples, seed=noise_seed)

# generate the output from the model
output = sample(model, noise_vector, class_vector, truncation)

In [None]:
results = convert_to_images(output)
if (num_samples > 1):
  imshow(imgrid(np.array(results)))
else:
  imshow(results[0])

In [None]:
print(noise_vector) # let's see what our noise vector was

In [None]:
print(class_vector) # let's see what our class vector was. See how it is all zero except for one `1`

Notice that the `class_vector` array is all `0` except for one `1`. That is what makes it a **one hot** array.

## Activities:
- try exploring different values of __`noise_seed`__. When you put in a different number for this, how does the result change?
- try changing the **`truncation`** value (between 0.02 and 1.0), how does this change the result?
- change the `category`. You can see the full list of categories here in this list: [imagenet1000 class ids](https://gist.githubusercontent.com/yrevar/942d3a0ac09ec9e5eb3a/raw/238f720ff059c1f82f368259d1ca4ffa5dd8f9f5/imagenet1000_clsidx_to_labels.txt). Just type in the number (f.ex. `933` for cheeseburger)
  - (these are 1000 categories from the larger ImageNet dataset which BigGAN was trained on)
  - [Nice youtube video](https://youtu.be/YY6LrQSxIbc) of the 1000 classes from BigGAN
- you can generate a set of outputs. f.ex. increase `num_samples` to `10`.
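One way to build intuition for `truncation` is the sketch below. It is an illustrative approximation, not the library's code: it assumes the noise is drawn from a standard normal, that values outside [-2, 2] are redrawn, and that the result is scaled by `truncation`. The function name `truncated_noise_sketch` is hypothetical.

```python
import numpy as np

def truncated_noise_sketch(batch_size, dim_z=128, truncation=1.0, seed=None):
    """Draw Gaussian noise, redraw anything outside [-2, 2], then scale."""
    rng = np.random.RandomState(seed)
    z = rng.standard_normal((batch_size, dim_z))
    outside = np.abs(z) > 2.0
    while outside.any():
        # redraw only the entries that fell outside the allowed range
        z[outside] = rng.standard_normal(int(outside.sum()))
        outside = np.abs(z) > 2.0
    return (truncation * z).astype(np.float32)

for t in (1.0, 0.5, 0.02):
    z = truncated_noise_sketch(4, truncation=t, seed=0)
    print(t, float(np.abs(z).max()))
```

Under these assumptions, a smaller `truncation` keeps every entry of `z` closer to zero, which tends to trade variety for more typical-looking images.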

# 2. Class Arithmetic (combinations of 2 or more categories)

If we have one vector for object 1, and one vector for object 2, we can make combinations of objects.

In [None]:
# sampling parameters
truncation = 0.2

# class and noise vectors
noise_seed_A = 0
category_A = 207 # golden retriever
# category_A = 850 # teddy bear

noise_seed_B = 0
category_B = 8 # hen
# category_B = 872 # tripod

# noise and class for image 1 (golden retriever)
z_A = truncated_noise_sample(truncation=truncation, batch_size=1, seed=noise_seed_A)
y_A = one_hot_from_int([category_A], batch_size=1)

# noise and class for image 2 (hen)
z_B = truncated_noise_sample(truncation=truncation, batch_size=1, seed=noise_seed_B)
y_B = one_hot_from_int([category_B], batch_size=1)

# make a combination
percent_A = 0.7 # how much of dog
percent_B = 1.0 - percent_A  # how much of hen

z_combo = percent_A * z_A + percent_B * z_B
y_combo = percent_A * y_A + percent_B * y_B

# Generate the images
output = sample(model, z_combo, y_combo, truncation)

# show the result
results = convert_to_images(output)
results[0]

## Activities
- try changing the `percent_A` to make the hybrid more dog, or more hen
- try your own two classes from the list above
- try three (!).
  - (you can do it: it's just algebra)
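For the three-class case, a minimal sketch of the algebra might look like the following. The helper name `mix_classes` and the choice to normalize the weights so they sum to 1 are assumptions for illustration; the resulting `y_combo` could be passed to `sample()` in place of the two-class version.

```python
import numpy as np

def mix_classes(categories, weights, num_classes=1000):
    """Hypothetical helper: weighted blend of one-hot class vectors."""
    if len(categories) != len(weights):
        raise ValueError('need exactly one weight per category')
    weights = np.asarray(weights, dtype=np.float32)
    weights = weights / weights.sum()  # normalize so the blend sums to 1
    y = np.zeros((1, num_classes), dtype=np.float32)
    for category, weight in zip(categories, weights):
        y[0, category] += weight
    return y

# golden retriever (207), hen (8), cheeseburger (933)
y_combo = mix_classes([207, 8, 933], [0.5, 0.3, 0.2])
print(np.nonzero(y_combo[0])[0], float(y_combo.sum()))
```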

# 3. Interpolate between BigGAN samples

Now we are going to calculate multiple intermediate samples between two different images.

First, let's make helper functions to interpolate between vectors.

In [None]:
def interpolate(A, B, num_interps):
  if A.shape != B.shape:
    raise ValueError('A and B must have the same shape to interpolate.')
  alphas = np.linspace(0, 1, num_interps)
  return np.array([(1-a)*A + a*B for a in alphas])

def interpolate_and_shape(A, B, num_samples, num_interps):
  interps = interpolate(A, B, num_interps)
  return (interps.transpose(1, 0, *range(2, len(interps.shape)))
                 .reshape(num_samples * num_interps, -1))

Try two different classes (`category_A`, and `category_B`). We are going to use the interpolate functions to generate intermediate steps between those things.

(Try 48 for komodo dragon, and 8 for hen)
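To see how the interpolation helpers order their output, here is a small self-contained check on toy 2-D vectors. The two functions are copied from the cell above; the toy inputs are made up for illustration.

```python
import numpy as np

def interpolate(A, B, num_interps):
    if A.shape != B.shape:
        raise ValueError('A and B must have the same shape to interpolate.')
    alphas = np.linspace(0, 1, num_interps)
    return np.array([(1 - a) * A + a * B for a in alphas])

def interpolate_and_shape(A, B, num_samples, num_interps):
    interps = interpolate(A, B, num_interps)
    return (interps.transpose(1, 0, *range(2, len(interps.shape)))
                   .reshape(num_samples * num_interps, -1))

# two samples, each a toy 2-D vector
A = np.array([[0.0, 0.0], [10.0, 10.0]])
B = np.array([[1.0, 2.0], [20.0, 30.0]])
steps = interpolate_and_shape(A, B, num_samples=2, num_interps=3)
print(steps.shape)
print(steps)
```

The rows come out grouped by sample: first all steps from `A[0]` to `B[0]`, then all steps from `A[1]` to `B[1]`. That is why `imgrid(..., cols=num_interps)` shows one interpolation per row.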

In [None]:
# sampling parameters
num_samples = 2
num_interps = 10
truncation = 0.25

# class and noise vectors
noise_seed_A = 0
category_A = 48 # komodo dragon
# category_A = 850 # teddy bear

noise_seed_B = 0
category_B = 8 # hen
# category_B = 872 # tripod

# noise vectors
z_A, z_B = [
    truncated_noise_sample(truncation=truncation, batch_size=num_samples, seed=noise_seed)
    for noise_seed in [noise_seed_A, noise_seed_B]
]

# class vectors
y_A, y_B = [
    one_hot_from_int([category] , batch_size=num_samples)
    for category in [category_A, category_B]
]

z_interp = interpolate_and_shape(z_A, z_B, num_samples, num_interps)
y_interp = interpolate_and_shape(y_A, y_B, num_samples, num_interps)

output = sample(model, z_interp, y_interp, truncation=truncation)
ims = convert_to_images(output)
imshow(imgrid(np.array(ims), cols=num_interps))

In [None]:
imshow(ims[2]) # show one hybrid in-between state

In [None]:
# save one image to disk
# (save_as_images appends "_<index>.png" to the name, so this writes my_hybrid_0.png)
save_as_images(output[2].unsqueeze(0), "my_hybrid")

# save all the images
# save_as_images(output, "hybrids")

## Activities

- Try varying the `truncation`
- Try varying the `noise_seed_A` and `noise_seed_B`
- Pick your own categories for `category_A` and `category_B`.
  - What is an ideal hybrid image to make?
  - reminder: you can see the full list of categories here in this list: [imagenet1000 class ids](https://gist.githubusercontent.com/yrevar/942d3a0ac09ec9e5eb3a/raw/238f720ff059c1f82f368259d1ca4ffa5dd8f9f5/imagenet1000_clsidx_to_labels.txt). Just type in the number (f.ex. `933` for cheeseburger)
- Increase `num_interps` to increase the number of intermediate steps between the two classes.
- Try to combine 3+ classes with arithmetic.

# 3a. Interpolation with Video

Helper functions for video interpolation

In [None]:
def interpolate_and_shape(A, B, num_samples, num_interps):
  interps = interpolate(A, B, num_interps)
  return (interps.transpose(1, 0, *range(2, len(interps.shape)))
                 .reshape(num_samples * num_interps, -1))

def get_interpolated_yz(categories_all, num_interps, noise_seed_A, noise_seed_B, truncation):
  nt = len(categories_all)
  num_samples = 1
  z_A, z_B = [truncated_noise_sample(truncation=truncation, batch_size=num_samples, seed=noise_seed)
              for noise_seed in [noise_seed_A, noise_seed_B]]

  y_interps = []
  for i in range(nt):
    category_A, category_B = categories_all[i], categories_all[(i+1)%nt]
    y_A, y_B = [one_hot_from_int([category], batch_size=num_samples) for category in [category_A, category_B]]


    y_interp = interpolate_and_shape(np.array(y_A), np.array(y_B), num_samples, num_interps)
    y_interps.append(y_interp)

  y_interp = np.vstack(y_interps)
  z_interp = interpolate_and_shape(z_A, z_B, num_samples, num_interps * nt)

  return y_interp, z_interp

def get_transition_yz(classes, num_interps, truncation):
  noise_seed_A, noise_seed_B = 10, 20   # fix this!
  return get_interpolated_yz(classes, num_interps, noise_seed_A, noise_seed_B, truncation=truncation)

def get_random_yz(num_classes, num_interps, truncation):
  # pick random ImageNet class ids in [0, 1000)
  random_classes = [ int(np.random.randint(1000)) for i in range(num_classes) ]
  return get_transition_yz(random_classes, num_interps, truncation=truncation)

def get_combination_yz(categories, noise_seed, truncation):
  # one row per category, plus a final row that combines all of them
  z = np.vstack([truncated_noise_sample(truncation=truncation, batch_size=1, seed=noise_seed)] * (len(categories)+1))
  y = np.zeros((len(categories)+1, 1000), dtype=np.float32)
  for i, c in enumerate(categories):
    y[i, c] = 1.0
    y[len(categories), c] = 1.0
  return y, z

def slerp(A, B, num_interps):  # see https://en.wikipedia.org/wiki/Slerp
  alphas = np.linspace(-1.5, 2.5, num_interps) # each unit step tends to be a 90 degree rotation in high-D space, so this is ~360 degrees
  omega = np.zeros((A.shape[0],1))
  for i in range(A.shape[0]):
      tmp = np.dot(A[i],B[i])/(np.linalg.norm(A[i])*np.linalg.norm(B[i]))
      omega[i] = np.arccos(np.clip(tmp,0.0,1.0))+1e-9
  return np.array([(np.sin((1-a)*omega)/np.sin(omega))*A + (np.sin(a*omega)/np.sin(omega))*B for a in alphas])

def slerp_and_shape(A, B, num_interps):
  interps = slerp(A, B, num_interps)
  return (interps.transpose(1, 0, *range(2, len(interps.shape)))
                 .reshape(num_interps, *interps.shape[2:]))
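As a rough illustration of why spherical interpolation can be preferable to straight-line interpolation for noise vectors, the sketch below compares the two on a pair of orthogonal unit vectors. It uses the textbook slerp formula with `t` in [0, 1] and the cosine clipped to [-1, 1], so it differs from the `slerp` helper above in its `alphas` range and clipping. The name `slerp_unit` is hypothetical.

```python
import numpy as np

def slerp_unit(a, b, t):
    """Textbook slerp between two 1-D vectors for t in [0, 1]."""
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1 - t) * a + t * b  # parallel vectors: fall back to lerp
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

mid_lerp = 0.5 * a + 0.5 * b       # straight-line midpoint
mid_slerp = slerp_unit(a, b, 0.5)  # midpoint along the arc

print(np.linalg.norm(mid_lerp), np.linalg.norm(mid_slerp))
```

The straight-line midpoint is shorter than either endpoint, while the slerp midpoint keeps the same length. For noise vectors this means the in-between frames stay at a similar distance from the origin as the endpoints.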

Video helpers


In [None]:
def save_images(imgs, filepath):
  for i, img in enumerate(imgs):
    outfile = os.path.join(filepath,"output_%05d.jpg"%i)
    PIL.Image.fromarray(img).save(outfile)

def save_images_with_hold(imgs, filepath, num_interps, len_hold):
  count = 0
  for i, img in enumerate(imgs):
    outfile = os.path.join(filepath,"output_%05d"%count)
    # thisout = PIL.Image.fromarray(img)
    # thisout.save(outfile)
    save_as_images(img.unsqueeze(0), outfile)
    count+=1

    if i%num_interps == 0:
        for j in range(len_hold):
            outfile = os.path.join(filepath,"output_%05d"%count)
            # thisout.save(outfile)
            save_as_images(img.unsqueeze(0), outfile)
            count+=1
  return count

# REMOVED OPENCV VIDEO CREATION FOR NOW

# def make_video(video_name, imgs):
#   _, height, width, _ = imgs.shape
#   video = cv2.VideoWriter(video_name, cv2.VideoWriter_fourcc('M','J','P','G'), fps=24, frameSize=(width,height))
#   for iter in range(0,imgs.shape[0]):
#       video.write(imgs[iter,:,:,::-1])
#   cv2.destroyAllWindows()
#   video.release()
# #   files.download(video_name)
#   print("download ", video_name)


# def make_video_from_samples(video_name, sess, noise, label, truncation=1.0, batch_size=8, vocab_size=vocab_size):
#   height, width = 512, 512
#   video = cv2.VideoWriter(video_name, cv2.VideoWriter_fourcc('M','J','P','G'), fps=30, frameSize=(width,height))
#   noise = np.asarray(noise)
#   label = np.asarray(label)
#   num = noise.shape[0]
#   if len(label.shape) == 0:
#     label = np.asarray([label] * num)
#   if label.shape[0] != num:
#     raise ValueError('Got # noise samples ({}) != # label samples ({})'
#                      .format(noise.shape[0], label.shape[0]))
#   label = one_hot_if_needed(label, vocab_size)
#   ims = []
#   for batch_start in tqdm(xrange(0, num, batch_size)):
#     s = slice(batch_start, min(num, batch_start + batch_size))
#     feed_dict = {input_z: noise[s], input_y: label[s], input_trunc: truncation}
#     ims = [sess.run(output, feed_dict=feed_dict)]
#     ims = np.concatenate(ims, axis=0)
#     ims = np.clip(((ims + 1) / 2.0) * 256, 0, 255)
#     ims = np.uint8(ims)
#     for iter in range(0,ims.shape[0]):
#       video.write(ims[iter,:,:,::-1])
#   cv2.destroyAllWindows()
#   video.release()
# #   files.download(video_name)
#   print("download ", video_name)

File paths:

In [None]:
import os
# file paths
work = os.getcwd() # get the current path

# the file directories
# workbase = %env WORK
workbase = "/content"

# results
resultsbase = os.path.join(workbase, "biggan", "results/")

# intermediate frames working directory
interpbase = os.path.join(workbase, "biggan", "interpolation/")

# video output files
filebase = 'myinterp_%d_%d'
moviefilename = filebase+'.mp4'

Delete old results and make a new folder if necessary

In [None]:
!mkdir -p $interpbase
!rm -f $interpbase/*

In [None]:
# frame rate for the movie
fps = 30
len_hold = 30 # how many frames to pause on each sample
num_interps = 180 # how many frames to transition between samples

# parameters for sampling from model
num_samples = 2
truncation = 0.25

# class and noise vectors
noise_seed_A = 0
category_A = 48

noise_seed_B = 0
category_B = 8 # hen


y_interp, z_interp = get_interpolated_yz([category_A, category_B], num_interps, noise_seed_A, noise_seed_B, truncation=truncation)
imgs = sample(model, z_interp, y_interp, truncation=truncation)

# save_images(imgs, interpbase)
# count = num_interps
count = save_images_with_hold(imgs, interpbase, num_interps, len_hold)
print("saved {0} images out... to {1}".format(count, interpbase))

In [None]:
fps = 30

out = moviefilename%(category_A, category_B)
# with open('list.txt','w') as f:
#   for i in range(count*2):
#     f.write('file %s/output_%05d.jpg\n'%(interpbase, i))
cmd = "ffmpeg -r {0} -i {1}/output_%05d_0.png -c:v libx265 -pix_fmt yuv420p -crf 0 -r {0} {2} -y"

os.system(cmd.format(fps, interpbase, out))
print(cmd.format(fps, interpbase, out))

### To make the movie file
1. Open a terminal here in JupyterHub.
2. Type `module load ffmpeg` to load the ffmpeg program.
3. Copy and paste the ffmpeg command printed above to generate your video file. This will create an output .mp4 file in your current directory.
4. Download the video file from the file browser.
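If you would rather launch ffmpeg from Python than paste the command into a terminal, one option is to assemble the arguments as a list, which avoids shell quoting issues. This is a sketch under assumptions: the helper name `build_ffmpeg_cmd` is hypothetical, the flags simply mirror the command string above, and the frame naming follows the `output_%05d_0.png` pattern used there.

```python
import os

def build_ffmpeg_cmd(fps, frame_dir, out_path):
    """Hypothetical helper: the same ffmpeg flags as above, as an argument list."""
    pattern = os.path.join(frame_dir, "output_%05d_0.png")
    return ["ffmpeg", "-y",
            "-r", str(fps), "-i", pattern,
            "-c:v", "libx265", "-pix_fmt", "yuv420p", "-crf", "0",
            "-r", str(fps), out_path]

cmd = build_ffmpeg_cmd(30, "/content/biggan/interpolation", "myinterp_48_8.mp4")
print(" ".join(cmd))

# To actually run it (requires ffmpeg on your PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```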

### Activities
- Try with your own classes for `category_A` and `category_B`
- Try with 3 or more classes. Replace `[category_A, category_B]` in `get_interpolated_yz` with 3 or more numbers.
  - f.ex.: `[48, 345, 7]` for komodo dragon, ox, rooster.
  - don't forget to change the output filename or save a local copy.
- try changing the fps, the `num_interps` (which sets how many steps between each image), and `len_hold`, which sets how long we pause on each image.
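To estimate how long the resulting video will be, you can count frames before rendering. The sketch below assumes the behavior of `save_images_with_hold` as written above: one frame per interpolation step, plus `len_hold` extra copies at the first step of each class. The function name `expected_frame_count` is hypothetical.

```python
def expected_frame_count(num_classes, num_interps, len_hold):
    """Frames written: all transition steps plus the held frames per class."""
    transition_frames = num_classes * num_interps
    hold_frames = num_classes * len_hold
    return transition_frames + hold_frames

fps = 30
frames = expected_frame_count(num_classes=2, num_interps=180, len_hold=30)
seconds = frames / fps
print(frames, seconds)
```

With the default settings above (2 classes, 180 steps, 30 held frames, 30 fps) this comes to 420 frames, or 14 seconds of video.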



# 4. Opposite of a class

In [None]:
# num_samples = 10
# truncation = 0.4
# noise_seed = 14
# category = 603

# z = truncated_z_sample(num_samples, truncation, noise_seed)
# y = one_hot([category] * num_samples)

# # print(y)

# # invert y (opposite of horse cart)
# # y = (1.0 - y)/(len(y[0]))
# y = y * -1.0

# print(y)

# ims = sample(sess, z, y, truncation=truncation)
# imshow(imgrid(ims, cols=min(num_samples, 5)))

# References

[1] Andrew Brock, Jeff Donahue, and Karen Simonyan. [Large Scale GAN Training for High Fidelity Natural Image Synthesis](https://arxiv.org/abs/1809.11096). *arXiv:1809.11096*, 2018.

