DSC160 Data Science and the Arts - Twomey - Spring 2020 - [dsc160.roberttwomey.com](http://dsc160.roberttwomey.com)

# BigGAN Hands-On

[BigGAN](https://arxiv.org/abs/1809.11096) set a new standard for high-resolution, high-fidelity image synthesis in 2018. It used up to four times as many parameters and eight times the batch size of previous models, and synthesized state-of-the-art images at resolutions up to 512 x 512 across the 1000 classes of ImageNet. It was also prohibitively expensive to train! Thankfully, Google has released a number of pretrained models for us to explore.

This exercise walks you through the use of the BigGAN network as a way to explore the role of latent vectors in generative output, and to build your familiarity with GANs and image synthesis. 

Compared to previous homeworks, this one is less about the extension and more about hands-on experience with GAN generation.

- Part 1 - Exploration (75 points total)
  - [A. Generate an Image](#1A.-Generate-One-Sample-Image)
  - [B. Generate a set of Images](#1B.-Generate-a-Set-of-Images) (20 points)
  - [C. Breeding Two Classes](#1C.-Breeding-Two-Classes) (15 points)
  - [D. Interpolation](#1D.-Interpolation) (30 points)
  - [E. Class Inversion](#1E.-Class-Inversion) (10 points)
- Part 2 - Extension (25 points total)
  - [A. Code](#2A.-Code) (15 points)
  - [B. Discussion](#2B.-Discussion) (10 points)

# Setup

Run one time to install tensorflow-hub. This module allows us to download [models shared through tfhub](https://tfhub.dev/s?subtype=module,placeholder), including the BigGAN variants.

In [None]:
# !pip install tensorflow-hub --user

Import modules

In [None]:
from io import StringIO, BytesIO
import IPython.display
import numpy as np
import PIL.Image
from scipy.stats import truncnorm
import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt

## Get BigGAN Set Up

First, set the module path.
By default, we load the BigGAN-deep generator for 256x256 images from `https://tfhub.dev/deepmind/biggan-deep-256/1`.
To generate 128x128 or 512x512 images or to use the original BigGAN generators, comment out the active `module_path` setting and uncomment one of the others.

In [None]:
# BigGAN-deep models
# module_path = 'https://tfhub.dev/deepmind/biggan-deep-128/1'  # 128x128 BigGAN-deep
module_path = 'https://tfhub.dev/deepmind/biggan-deep-256/1'  # 256x256 BigGAN-deep
# module_path = 'https://tfhub.dev/deepmind/biggan-deep-512/1'  # 512x512 BigGAN-deep

# BigGAN (original) models
# module_path = 'https://tfhub.dev/deepmind/biggan-128/2'  # 128x128 BigGAN
# module_path = 'https://tfhub.dev/deepmind/biggan-256/2'  # 256x256 BigGAN
# module_path = 'https://tfhub.dev/deepmind/biggan-512/2'  # 512x512 BigGAN

Download the pre-trained BigGAN generator module from TF Hub

In [None]:
tf.reset_default_graph()
print('Loading BigGAN module from:', module_path)
module = hub.Module(module_path)
inputs = {k: tf.placeholder(v.dtype, v.get_shape().as_list(), k)
          for k, v in module.get_input_info_dict().items()}
output = module(inputs)

### Examining BigGAN model inputs and outputs

Print some information about the model. What are its inputs and outputs?

In [None]:
print('Inputs:\n', '\n'.join('\t{}: {}'.format(*kv) for kv in inputs.items()))
print('\nOutput:', output)

The $z$ input is the noise vector (values drawn from a truncated normal distribution).

In [None]:
input_z = inputs['z']
input_z.shape.as_list()

In [None]:
dim_z = input_z.shape.as_list()[1]

The $y$ input is the class conditioning vector (one-hot) across 1000 classes. You can see the available classes [here](https://gist.githubusercontent.com/yrevar/942d3a0ac09ec9e5eb3a/raw/238f720ff059c1f82f368259d1ca4ffa5dd8f9f5/imagenet1000_clsidx_to_labels.txt).

In [None]:
input_y = inputs['y']
input_y.shape.as_list()

The `truncation` input sets the tradeoff between individual sample quality and overall sample variety.

In [None]:
input_trunc = inputs['truncation']
input_trunc

# Image Generation

Create a TensorFlow session and initialize variables:

In [None]:
initializer = tf.global_variables_initializer()
sess = tf.Session()
sess.run(initializer)

### Create a helper function to display images

In [None]:
def imshow(a, format='png', jpeg_fallback=True):
    """Display an image array inline in the notebook."""
    a = np.asarray(a, dtype=np.uint8)
    str_file = BytesIO()  # in-memory buffer for the encoded image
    PIL.Image.fromarray(a).save(str_file, format)
    im_data = str_file.getvalue()
    try:
        disp = IPython.display.display(IPython.display.Image(im_data))
    except IOError:
        if jpeg_fallback and format != 'jpeg':
            # Call .format() on the string, not on the return value of print()
            print('Warning: image was too large to display in format "{}"; '
                  'trying jpeg instead.'.format(format))
            return imshow(a, format='jpeg')
        else:
            raise
    return disp

## 1A. Generate One Sample Image

Let's generate a sample BigGAN output image from category number 933, generated with a truncation value of 0.2.

In [None]:
category_index = 933
truncation=0.2

### Generate a noise vector

Generate a noise vector $z$ of size `dim_z` using the [`truncnorm.rvs()`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.truncnorm.html) function. The authors suggest `(-2, 2)` for the bounds `a` and `b`, with the sampled values then multiplied by the truncation value. Note: to work with TensorFlow, this vector needs to have dimensions `(1, dim_z)`. Set `random_state = 0`.

In [None]:
z = truncation * truncnorm.rvs(-2, 2, size=(1, dim_z), random_state=0)

Plot a histogram showing the distribution of values in your $z$ vector:

In [None]:
n, bins, patches = plt.hist(z[0], 10)
plt.show()
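
To make the role of the truncation value more concrete, here is a minimal sketch that draws noise the same way as above for several truncation values and reports the largest absolute entry. The helper name `sample_z` is hypothetical, and `dim_z = 128` is hard-coded as an assumption so the block runs without loading the model.

```python
import numpy as np
from scipy.stats import truncnorm

dim_z = 128  # assumed latent size; in the notebook, read it from the model input


def sample_z(truncation, batch_size=1, seed=0):
    """Draw truncated-normal noise on [-2, 2], then scale it by `truncation`."""
    values = truncnorm.rvs(-2, 2, size=(batch_size, dim_z), random_state=seed)
    return truncation * values


for t in (0.2, 0.5, 1.0):
    z_t = sample_z(t)
    # Every entry lies within [-2 * t, 2 * t]
    print(t, z_t.shape, float(np.abs(z_t).max()))
```

With a fixed seed, changing the truncation only rescales the same underlying draw, so smaller values keep $z$ closer to the origin.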

### Generate a one-hot class selection vector

The class conditioning vector, $y$, selects the category of image we generate. To generate a "one-hot" class conditioning vector, we want an array of dimensions `(1, 1000)` that is uniformly zero, with the nth element (`category_index` above) set to 1.

In [None]:
y = np.zeros((1,1000), dtype=np.float32)
y[0, category_index]=1

Verify the shape:

In [None]:
y.shape

Verify which element is non-zero (it should be `933`, our `category_index`)

In [None]:
y.nonzero()
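
As a side note, the same one-hot row can be produced by indexing an identity matrix. The sketch below uses a toy size of 5 classes and a hypothetical class index so the result is easy to inspect; it is one possible alternative, not the only way to build the vector.

```python
import numpy as np

num_classes = 5      # toy size for readability; BigGAN uses 1000
toy_index = 3        # hypothetical class index for this example

# Approach used above: start from zeros and set one entry
y_manual = np.zeros((1, num_classes), dtype=np.float32)
y_manual[0, toy_index] = 1

# Alternative: pick one row of the identity matrix (the list index keeps it 2-D)
y_eye = np.eye(num_classes, dtype=np.float32)[[toy_index]]

print(y_eye)
print(np.array_equal(y_manual, y_eye))
```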

### Run our TF session to generate an output

Store our noise ($z$) and class ($y$) vectors, as well as our truncation value (`truncation`), in a feed dictionary.

In [None]:
feed_dict = {input_z: z, input_y: y, input_trunc: truncation}

Now run our TensorFlow session to generate results.

In [None]:
results = sess.run(output, feed_dict=feed_dict)

What shape is our output?

In [None]:
results.shape

We need to grab the image data and rescale the pixel range for display. (Rescale our values from the range `-1.0` to `1.0` to the range `0` to `255`.)

In [None]:
image = results[0]
image = np.clip(((image + 1) / 2.0) * 256, 0, 255)
image = np.uint8(image)
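
To check what the rescaling does, the sketch below applies the same formula to a few hand-picked values. The helper name `to_uint8` is hypothetical and only wraps the two lines above.

```python
import numpy as np


def to_uint8(values):
    """Map values in [-1, 1] to integer pixel values in [0, 255]."""
    scaled = ((values + 1) / 2.0) * 256
    # An input of exactly 1.0 scales to 256, which uint8 cannot hold, so clip first
    return np.uint8(np.clip(scaled, 0, 255))


samples = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
print(to_uint8(samples))  # [  0  64 128 192 255]
```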

Show the result:

In [None]:
imshow(image)

Wrap this image generation in a function for later use. Note that it returns the whole batch of images, not just the first one:

In [None]:
def generate_images(z, y, truncation):
    feed_dict = {input_z: z, input_y: y, input_trunc: truncation}
    results = sess.run(output, feed_dict=feed_dict)

    image = results
    image = np.clip(((image + 1) / 2.0) * 256, 0, 255)
    image = np.uint8(image)
    
    return image

In [None]:
result = generate_images(z, y, truncation)
imshow(result[0])

## 1B. Generate a Set of Images

Let's produce a set of 10 outputs at once! We need to create a batch of 10 noise vectors and 10 one-hot class vectors.

(20 points total)

In [None]:
num_samples = 10
truncation = 0.4

Create `zs`, which will hold `num_samples` (10) truncated noise vectors (use `truncnorm.rvs` again with the same parameters as above).

(5 points)

In [None]:
# your code goes here

Your result should have a shape of `(10, 128)`:

In [None]:
zs.shape

Create `ys`, which will hold `num_samples` (10) class selection/conditioning vectors. Use the same `category_index` as above.

(5 points)

In [None]:
# your code goes here

Your result should have a shape of `(10, 1000)`:

In [None]:
ys.shape

Use the `generate_images()` function above to calculate your results, and store them in `results`.

(5 points)

In [None]:
# your code goes here

Your results should have shape `(10, 256, 256, 3)`:

In [None]:
results.shape

Use `imshow()` to view the first result.

In [None]:
imshow(results[0])

There's a trick with `np.concatenate()` that lets us join all the images into a single horizontal strip. You will use this later in the exercise.

In [None]:
imshow(np.concatenate(results, axis=1))
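
Why does this work? Passing a 4-D batch to `np.concatenate()` treats it as a sequence of individual images, and `axis=1` of each image is its width. The sketch below shows the shapes on a small synthetic batch (tiny 4 x 4 images with assumed placeholder values, not BigGAN outputs).

```python
import numpy as np

# Synthetic batch: 10 images, each 4 x 4 pixels with 3 channels
batch = np.zeros((10, 4, 4, 3), dtype=np.uint8)
for i in range(batch.shape[0]):
    batch[i] = i * 25  # give each image its own flat gray level

# Iterating over the batch yields 10 arrays of shape (4, 4, 3);
# joining them along axis 1 places them side by side
strip = np.concatenate(batch, axis=1)
print(strip.shape)  # (4, 40, 3)
```

Using `axis=0` instead would stack the images vertically.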

Explore different values for `truncation` between 0.02 and 1.0. What is the effect of truncation on the outputs? (You can also consult the BigGAN paper to figure this out)

(5 points)

In [None]:
# your response

## 1C. Breeding Two Classes

This section shows you how a weighted sum of class vectors results in an intermixed, combined output image. You will use the same noise vector (`z`) as above, but with a weighted sum of two different class conditioning vectors (`y_A` and `y_B`).

(15 points total)

In [None]:
truncation = 0.4
weight = 0.5
category_A = 207 # golden retriever
category_B = 8 # hen

Create a noise vector $z$ using your method from above. Use the same parameters for `random_state` as well as `a` and `b`.

(3 points)

In [None]:
# your code here

Generate two class conditioning vectors `y_A` and `y_B`, one hot encoding `category_A` and `category_B` above.

(6 points)

In [None]:
# your code here

Create a new class vector (`y_new`) that is a weighted combination of `y_A` and `y_B`, using `weight`.

(3 points)

In [None]:
# your code here

Run the session to generate the output and display it.

(3 points)

In [None]:
# your code here

## 1D. Interpolation

Now we will do a linear interpolation between these two classes of output and generate a number of intermediate transformation steps (`num_interps`).

(30 points total)

In [None]:
num_interps = 10
truncation = 0.8
noise_seed = 0
category_A = 207 # golden retriever
category_B = 8 # hen
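
For reference, linear interpolation between two row vectors can be written with `np.linspace()` and broadcasting. The sketch below uses tiny 3-element vectors and a hypothetical helper named `interpolate`; it is one possible formulation, shown on toy data rather than on the BigGAN inputs.

```python
import numpy as np


def interpolate(start, end, num_steps):
    """Return `num_steps` rows moving linearly from `start` to `end`.

    `start` and `end` are arrays of shape (1, d); the result has shape (num_steps, d).
    """
    # Column of blend factors from 0 to 1, shaped (num_steps, 1) for broadcasting
    alphas = np.linspace(0.0, 1.0, num_steps).reshape(-1, 1)
    return (1 - alphas) * start + alphas * end


start = np.array([[0.0, 1.0, 0.0]])
end = np.array([[1.0, 0.0, 0.0]])
steps = interpolate(start, end, 5)
print(steps.shape)  # (5, 3)
print(steps)
```

The first row equals `start`, the last row equals `end`, and the middle row is their average.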

Create `z_A` and `y_A` for `category_A`, using `noise_seed` for your `random_state`.

(5 points)

In [None]:
# your code here

Create `z_B` and `y_B` for `category_B` (again, using `noise_seed` for your `random_state`).

(5 points)

In [None]:
# your code here

Create `num_interps` (10) samples that smoothly interpolate between `category_A` and `category_B`. Interpolate both the class vectors (`y_A`, `y_B`) and the noise vectors (`z_A`, `z_B`).

(15 points)

In [None]:
# your code here

Use `generate_images()` from above and the `np.concatenate()` trick to display the series of interpolations as a single wide image.

(5 points)

In [None]:
# your code here

## 1E. Class Inversion

If a one-hot vector conditions the class generation, what is the opposite of a given class? Let's try inverting the class conditioning vector (e.g., multiplying it by `-1.0`).

(15 points total)

First, generate and display an image from category `603` using the parameters below.

(5 points)

In [None]:
truncation = 0.2
noise_seed = 0
category_index = 603 # horse cart

# your code here

Now generate a companion image using the same $z$ vector, but with the category vector ($y$) inverted (multiplied by `-1`), and display the result.

(5 points)

In [None]:
# your code here

How would you describe the resulting image? What does it show?

(5 points, no wrong answers!)

In [None]:
# your response here

# Part 2. Extension

Extend this exercise in some aspect. Possible extensions include:
- Explore the full set of image classes. Find some particularly interesting classes and combinations of classes, and produce hybrid/mutant outputs. What if we combine multiple (3) class vectors? Do we see a result with attributes of all three inputs?
- Identify multiple points of interest in latent space. Create an animation with some traversal of those distinct points of interest (you may use ImageIO to create a GIF, as in the DCGAN notebook).
- Explore manipulations of the noise vector ($z$). For instance, one blog post suggests that applying `np.sin()` to the $z$ vector, with a fixed class conditioning, alters the zoom of the generated image.
- Try to upload a new image of one of the trained classes and "recover" a corresponding latent vector. See the reference below (https://arxiv.org/abs/1702.04782).
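
As a starting point for the multi-class idea in the first bullet, here is a minimal sketch of a class vector that blends several categories. The helper `mix_classes` is hypothetical, and normalizing the weights to sum to 1 is an assumption of this example, not a requirement stated by the model.

```python
import numpy as np


def mix_classes(indices, weights, num_classes=1000):
    """Build a (1, num_classes) class vector that blends several categories."""
    weights = np.asarray(weights, dtype=np.float32)
    weights = weights / weights.sum()  # assumption: keep the total weight at 1
    y_blend = np.zeros((1, num_classes), dtype=np.float32)
    for index, weight in zip(indices, weights):
        y_blend[0, index] += weight
    return y_blend


# Categories used earlier in this notebook: 207, 8, and 933
y_mix = mix_classes([207, 8, 933], [1, 1, 2])
print(y_mix[0, [207, 8, 933]])  # relative shares of 0.25, 0.25, and 0.5
```

The resulting vector can be passed to `generate_images()` in place of a one-hot `y`.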

## 2A. Code

Write your code below with comments:

(15 points)

In [None]:
# your code here

## 2B. Discussion

Describe your goals for the extension, your results, and how this could lead to an interesting generative art project:

(1 paragraph, 10 points)

```REPLACE THIS WITH YOUR DISCUSSION OF YOUR EXTENSION```

# Reference
- Andrew Brock, Jeff Donahue, and Karen Simonyan. [Large Scale GAN Training for High Fidelity Natural Image Synthesis](https://arxiv.org/abs/1809.11096). arXiv:1809.11096, 2018.
- https://twitter.com/quasimondo/status/1151819357988761601?lang=en
- Zachary C. Lipton and Subarna Tripathi. [Precise Recovery of Latent Vectors from Generative Adversarial Networks](https://arxiv.org/abs/1702.04782). arXiv:1702.04782, 2017.

