## Final Report: Deep Dream
CS 344  
Ridge DeJong




### *Vision*
This project revolves around Deep Dream, an image processing technique. It was originally created by Google to help build a better understanding of how deep neural networks see images. Deep Dream works by running the neural network in reverse: instead of asking the model to identify an object in an image, you choose an object and tell the model to find it in the image. If the image doesn't contain that object, the model will still try to find it because you said it is there. This causes the model to start altering the image and, by doing so, to create the illusion that the object is there. The significance of this is that these changes let us see the small pieces the model is looking for, which indicates what the model knows (and doesn't know) about that object and what direction the model needs to go to become more accurate. The purpose of this project was therefore to open up the hood of Deep Dream to better understand how it works and what its capabilities are.

### *Background*

This work was based on the Deep Dream tutorial by Chollet (https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/8.2-deep-dream.ipynb). Like the tutorial, this project used a pre-trained convolutional neural network (CNN) called Inception V3 as the model. A CNN was chosen over other neural networks because this type of network excels at processing images efficiently and accurately. This CNN was trained on ImageNet, a database of more than a million hand-labeled images of many types.

Other technologies involved in this program were the activation and loss of different layers (https://github.com/kvlinden-courses/cs344-code/blob/master/u08features/backpropagation.ipynb), along with a gradient ascent process (https://developers.google.com/machine-learning/glossary#gradient-descent), all of which have been seen before in class. (We studied gradient descent; gradient ascent is the same procedure, except that it steps along the gradient to maximize the loss instead of against it to minimize the loss.)

Some new technologies were preprocessing and deprocessing images, as well as detail injection. Preprocessing prepares an image for the model so that it is easier to analyze. In this case, it meant resizing the image and formatting it into an appropriate tensor. Deprocessing is the opposite: after the model has finished with the tensor, the tensor is converted back into an image so that the results can be seen.

Detail injection occurs between octaves (the successive scales at which the image is processed), when the image is upscaled. Since a smaller image loses detail and becomes blurry when scaled up, details from the original image are injected to counteract this. It works by calculating the difference between the original image resized to the new, larger scale and the smaller copy of the original upscaled to that same scale. This difference quantifies the details lost when going from the smaller scale to the larger one, and it is added back into the dream image. A visual of this detail injection process can be found in the Results section. The heart of this Deep Dream algorithm is shown below, while the full code can be found in the repo under "Deep Dream Code".

In [0]:
# Fill this to the path to the image you want to use
base_image_path = '/dog.jpg'

# Load the image into a Numpy array
img = preprocess_image(base_image_path)

# We prepare a list of shape tuples
# defining the different scales at which we will run gradient ascent
original_shape = img.shape[1:3]
successive_shapes = [original_shape]
for i in range(1, num_octave):
    shape = tuple([int(dim / (octave_scale ** i)) for dim in original_shape])
    successive_shapes.append(shape)

# Reverse list of shapes, so that they are in increasing order
successive_shapes = successive_shapes[::-1]

# Resize the Numpy array of the image to our smallest scale
original_img = np.copy(img)
shrunk_original_img = resize_img(img, successive_shapes[0])

for shape in successive_shapes:
    print('Processing image shape', shape)
    img = resize_img(img, shape)
    img = gradient_ascent(img,
                          iterations=iterations,
                          step=step,
                          max_loss=max_loss)
    upscaled_shrunk_original_img = resize_img(shrunk_original_img, shape)
    same_size_original = resize_img(original_img, shape)
    lost_detail = same_size_original - upscaled_shrunk_original_img

    img += lost_detail
    shrunk_original_img = resize_img(original_img, shape)
    save_img(img, fname='dream_at_scale_' + str(shape) + '.png')

save_img(img, fname='final_dream.png')
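
The loop above relies on `preprocess_image` at the start and on a deprocessing step when the image is saved. As a rough illustration of what that round trip does to pixel values, here is a minimal NumPy sketch. It assumes the Inception V3 convention of scaling pixels to the range [-1, 1]; the helper names `preprocess` and `deprocess` are hypothetical, and resizing and file loading are left out.

```python
import numpy as np

def preprocess(pixels):
    """Map uint8 pixels in [0, 255] to floats in [-1, 1] and add a batch axis."""
    x = pixels.astype("float32")
    x = x / 127.5 - 1.0
    return np.expand_dims(x, axis=0)  # shape (1, height, width, 3)

def deprocess(tensor):
    """Undo preprocess: drop the batch axis and map [-1, 1] back to uint8."""
    x = tensor[0]
    x = (x / 2.0 + 0.5) * 255.0
    # Rounding is a choice made for this sketch so the round trip is exact
    return np.clip(np.rint(x), 0, 255).astype("uint8")

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)  # stand-in image

tensor = preprocess(image)
restored = deprocess(tensor)
print(tensor.shape, float(tensor.min()), float(tensor.max()))
```

The clipping in `deprocess` matters after gradient ascent, since the dreamed tensor can drift outside [-1, 1].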

### *Implementation*

The project started by copying the tutorial mentioned above and getting it to run. Then, different types of images were fed to the model to see what the Deep Dream program did to them. There was a blank image, a static image, a high resolution image, a similar low resolution image, images with animals, a cartoon animal image, a clear image with lots of people, and a blurry image of people. With all these images, the first modification was to adjust some or all of the 11 layer activation coefficients, labeled mixed0 through mixed10. These coefficients specified how much each layer's activation contributed to the loss being maximized. The chunk of code where this was done is shown below with arbitrary values.


In [0]:
layer_contributions = {
    'mixed0': 0.001,
    'mixed1': 0.05,
    'mixed2': 0.5,
    'mixed3': 1,
    'mixed4': 1,
    'mixed5': 0.5,
    'mixed6': 0.1,
    'mixed7': 0.05,
    'mixed8': 0.005,
    'mixed9': 0.0001,
    'mixed10': 0.0001,
}
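
To make the role of these coefficients concrete, here is a minimal NumPy sketch of a weighted loss in the style of the Chollet tutorial: each layer contributes its mean squared activation, scaled by its coefficient. The random arrays are stand-ins for real Inception V3 activations, `dream_loss` and `example_contributions` are hypothetical names, and the tutorial's cropping of border pixels is omitted, so treat this as an illustration rather than the project's exact loss.

```python
import numpy as np

# Illustrative coefficients for three of the eleven layers
example_contributions = {"mixed3": 0.5, "mixed4": 1.0, "mixed8": 0.0}

def dream_loss(activations, contributions):
    """Weighted sum of each layer's mean squared activation."""
    loss = 0.0
    for name, coeff in contributions.items():
        act = activations[name]
        loss += coeff * np.sum(np.square(act)) / act.size
    return loss

rng = np.random.default_rng(0)
# Stand-in activations with shape (batch, height, width, channels)
activations = {
    "mixed3": rng.random((1, 5, 5, 8)),
    "mixed4": rng.random((1, 5, 5, 8)),
    "mixed8": rng.random((1, 3, 3, 16)),
}

loss = dream_loss(activations, example_contributions)
print(loss)
```

A coefficient of 0 removes a layer from the loss entirely, so gradient ascent no longer tries to amplify whatever that layer responds to.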

Second, for all these images the hyperparameters were adjusted. These included 'step' which was the gradient ascent step size, 'num_octave' which was the number of scales that the gradient ascent was run for, 'octave_scale' which was the size ratio between these different scales, 'iterations' which was the number of ascent steps for each scale, and lastly 'max_loss' which was the limit at which the gradient ascent process would terminate if reached. By using many different combinations of these hyperparameters over a diverse group of images, a good understanding of Deep Dream was gained, which is explained in the Results section. The code below shows the hyperparameters that were set.

In [0]:
step = 0.05  # Gradient ascent step size
num_octave = 5  # Number of scales at which to run gradient ascent
octave_scale = 1.6  # Size ratio between scales
iterations = 20  # Number of ascent steps per scale

# If our loss gets larger than 60,
# we will interrupt the gradient ascent process, to avoid ugly artifacts
max_loss = 60.
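
The interaction between `step`, `iterations`, and `max_loss` can be seen on a toy problem. The sketch below is not the project's `gradient_ascent` function: it assumes a made-up loss, 0.5 * sum(x**2), whose gradient is simply x, so the loss keeps growing as the ascent proceeds. The structure (a fixed number of steps with an early exit once the loss passes `max_loss`) follows the description above.

```python
import numpy as np

def toy_loss_and_grad(x):
    """Toy loss 0.5 * sum(x**2); its gradient with respect to x is x itself."""
    return 0.5 * np.sum(x ** 2), x

def toy_gradient_ascent(x, iterations, step, max_loss=None):
    """Take up to `iterations` ascent steps, stopping once loss exceeds max_loss."""
    x = x.copy()
    steps_taken = 0
    for _ in range(iterations):
        loss, grad = toy_loss_and_grad(x)
        if max_loss is not None and loss > max_loss:
            break
        x = x + step * grad  # ascent: move along the gradient, not against it
        steps_taken += 1
    return x, steps_taken

start = np.array([1.0, 2.0])
_, capped_steps = toy_gradient_ascent(start, iterations=20, step=0.1, max_loss=10.0)
_, free_steps = toy_gradient_ascent(start, iterations=20, step=0.1, max_loss=None)
_, small_steps = toy_gradient_ascent(start, iterations=20, step=0.01, max_loss=10.0)
print(capped_steps, free_steps, small_steps)
```

With the larger step the loss crosses `max_loss` before all 20 iterations run, while the smaller step never reaches the cap, so every iteration is used.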

### *Results*

The best way to communicate the results is through images. The first image below shows the original picture. The second image shows what happens when the layer activation coefficients are biased towards the lower layers, and the third image shows the coefficients biased towards the upper layers.


In [0]:
%%html
<img src="moon3.jpg" width="400" />
<img src="moon3lower.jpg" width="400" />
<img src="moon3upper.jpg" width="400" />

This was done with another image as well.

In [0]:
%%html
<img src="peopleblurry.jpg" width="400" />
<img src="peopleblurrylower.jpg" width="400" />
<img src="peopleblurryupper.jpg" width="400" />

Based on these outputs it can be concluded that the lower layers cause the model to focus on and extract the broader geometric patterns, while the higher layers tell it to look at the smaller, more detailed patterns.  

Next, the step was varied to experiment with different gradient ascent jumps. For the images below, the first of the pair uses a small step (0.005) while the second of the pair uses a larger step (0.05). 

In [0]:
%%html
<img src="moon3smallstep.jpg" width="400" />
<img src="moon3largestep.jpg" width="400" />
<img src="peopleblurrysmallstep.jpg" width="400" />
<img src="peopleblurrylargestep.jpg" width="400" />

These results show that a smaller step yields faint but coherent images that are almost recognizable as objects. On the other hand, larger steps result in more saturated colors but too many blended images to be recognized. These results align with the nature of gradient ascent. With smaller steps, the loss does not grow as fast as with larger steps, so when the number of iterations is fixed, the larger steps will end with a larger loss. However, the faster growth with larger steps also means the ascent skips over a lot, so there are more details but they are less refined.

Next, the number of octaves was varied, which controlled how many times the image was processed. The diagram below from the Chollet tutorial does a good job of visually explaining both octaves and detail injection.

In [0]:
%%html
<img src="octaves.jpg" width="800" />
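
The detail injection step in the diagram can also be checked numerically. The sketch below is a simplified stand-in for the project's code: it assumes a grayscale array, and it uses block averaging and pixel repetition (the hypothetical helpers `shrink` and `enlarge`) in place of the `resize_img` function. It shows that adding the lost detail back to the blurry upscaled copy recovers the original.

```python
import numpy as np

def shrink(img, factor):
    """Downscale a 2-D array by averaging factor x factor blocks."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def enlarge(img, factor):
    """Upscale a 2-D array by repeating each pixel (nearest neighbour)."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

rng = np.random.default_rng(0)
original = rng.random((8, 8))             # stand-in for the full-size image

small = shrink(original, 2)               # the previous, smaller octave
upscaled_small = enlarge(small, 2)        # blurry copy at the new scale
lost_detail = original - upscaled_small   # what the upscaling could not recover

restored = upscaled_small + lost_detail
print(np.allclose(restored, original))
```

In the real loop the lost detail is added to the dreamed image rather than to the blurry original, so the dream keeps its new patterns while regaining the original's sharpness.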

While testing the num_octave hyperparameter, the first image of each pair was processed at only one scale, while the second was processed at 5 scales. This was again tested on a second image.

In [0]:
%%html
<img src="moon3oneoct.jpg" width="400" />
<img src="moon3fiveoct.jpg" width="400" />
<img src="peopleblurryoneoct.jpg" width="400" />
<img src="peopleblurryfiveoct.jpg" width="400" />

It can be seen that a single octave makes minimal changes to the original image, while five octaves produce much more complex and colorful designs. This makes sense because more octaves means more iterations, and so more opportunities for the model to detect and add details to the image.

Next, the scale of the octaves was varied to change how much the image was scaled between processing loops. The first image of each pair uses a scale of 1.2 and the second image uses a scale of 1.8.

In [0]:
%%html
<img src="moon3scale12.jpg" width="400" />
<img src="moon3scale18.jpg" width="400" />
<img src="peopleblurryscale12.jpg" width="400" />
<img src="peopleblurryscale18.jpg" width="400" />

From these images it can be seen that the scale doesn't play a huge role since these images are pretty similar. However, looking closely reveals some subtle differences. With the 1.2 scale, the model produces smaller images (usually animal faces) within the image. On the moon these are harder to see, and with the people these animal faces can be seen on the smaller heads. In contrast, the 1.8 scale produces larger animal faces. The moon shows some eyes and noses, while the people with bigger heads in the foreground have recognizable animal faces and even some paws. It makes sense that animals appear because even though the model was trained on ImageNet which contains many different types of images, animal images make up a huge portion of this dataset. 
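
One way to see why the scale changes the size of the dreamed features is to compare the image sizes each setting produces. The sketch below reuses the shape calculation from the main loop; the 400 x 600 image size and the choice of 3 octaves are assumptions made only for illustration, and `octave_shapes` is a hypothetical helper name.

```python
def octave_shapes(original_shape, num_octave, octave_scale):
    """Return the image shapes used for each octave, smallest first."""
    shapes = [
        tuple(int(dim / (octave_scale ** i)) for dim in original_shape)
        for i in range(num_octave)
    ]
    return shapes[::-1]

gentle = octave_shapes((400, 600), num_octave=3, octave_scale=1.2)
steep = octave_shapes((400, 600), num_octave=3, octave_scale=1.8)
print("scale 1.2:", gentle)
print("scale 1.8:", steep)
```

With the larger scale the first octave works on a much smaller image, so a pattern dreamed there covers a larger share of the picture once it is upscaled, which is consistent with the larger animal faces seen at a scale of 1.8.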

Next, the number of iterations was varied. This controlled how many steps the gradient ascent took for each octave, and therefore also how large the loss ended up, since the loss grew at each iteration. Because of this, the max loss was tested at the same time. The max loss set the loss value at which the gradient ascent process would be cut off. If the max loss was too low, the number of iterations would not matter since the max loss would be reached every time. On the other hand, once the max loss is large enough it stops mattering, because every iteration is performed without being cut off. Taking this into account, the first image of each pair shows 10 iterations with a max loss of 10, while the second image shows 200 iterations with a max loss of 50.

In [0]:
%%html
<img src="moon3iterlow.jpg" width="400" />
<img src="moon3iterhigh.jpg" width="400" />
<img src="peopleblurryiterlow.jpg" width="400" />
<img src="peopleblurryiterhigh.jpg" width="400" />

These images show some interesting results. Fewer iterations (and therefore a lower loss) produced good, clear patterns within the images. More iterations gave mixed results. The moon with the higher settings looked very similar to the moon with the lower settings, with maybe slightly more color saturation. The people image, however, had much more saturation and so many details that it became noisy and unclear. The difference between these results is due to the nature of the images. When the moon was processed, the loss reached the max loss quickly, so fewer iterations were run. When the people image was processed, it never reached the max loss in the first octave and reached it more slowly than the moon in the other octaves. This means the number of iterations has a stronger influence on color and details than the final loss.

Additionally, different types of images were compared. The image outputs shown below summarize how the model treats these different types of images. For all these comparisons, the following parameters were set since they produced good images. 

In [0]:
layer_contributions = {
    'mixed0': 0.0,
    'mixed1': 0.0,
    'mixed2': 0.0,
    'mixed3': 0.5,
    'mixed4': 1,
    'mixed5': 1,
    'mixed6': 1,
    'mixed7': 0.5,
    'mixed8': 0.0,
    'mixed9': 0.0,
    'mixed10': 0.0,
}

In [0]:
step = 0.01  # Gradient ascent step size
num_octave = 3  # Number of scales at which to run gradient ascent
octave_scale = 1.4  # Size ratio between scales
iterations = 20  # Number of ascent steps per scale
# If our loss gets larger than 10,
# we will interrupt the gradient ascent process, to avoid ugly artifacts
max_loss = 10.

The first comparison was between a blank image and a static image.

In [0]:
%%html
<img src="blankdream.jpg" width="400" />
<img src="staticdream.jpg" width="400" />

Then, high resolution and low resolution images of the moon were compared.

In [0]:
%%html
<img src="moondream.jpg" width="400" />
<img src="moon3dream.jpg" width="400" />

Next, a real dog was compared with a cartoon dog.

In [0]:
%%html
<img src="dogrealdream.jpg" width="400" />
<img src="dogcartoondream.jpg" width="400" />

Lastly, a clear image of people was compared with a blurry image of people. 

In [0]:
%%html
<img src="peoplecleardream.jpg" width="400" />
<img src="peopleblurrydream.jpg" width="400" />

The knowledge of how all these hyperparameters and different image types influence the model was then put to the test. Using a combination of these hyperparameters, this program attempted to reproduce the results of other Deep Dream algorithms found across the internet. The first test was with the Mona Lisa from Google's Dreamscope app (https://gizmodo.com/someone-finally-turned-googles-deepdream-code-into-a-si-1719461004). The image from Google is shown first, and the image from this model is shown after.

In [0]:
%%html
<img src="monalisagoogledream.jpg" width="400" />
<img src="monalisadream.jpg" width="400" />

To get this resemblance, the following parameters were set.

In [0]:
layer_contributions = {
    'mixed0': 2,
    'mixed1': 1.5,
    'mixed2': 1,
    'mixed3': 0.5,
    'mixed4': 0.1,
    'mixed5': 0.1,
    'mixed6': 0.05,
    'mixed7': 0.05,
    'mixed8': 0.01,
    'mixed9': 0.01,
    'mixed10': 0.005,
}

In [0]:
step = 0.002  # Gradient ascent step size
num_octave = 1  # Number of scales at which to run gradient ascent
octave_scale = 1.6  # Size ratio between scales
iterations = 200  # Number of ascent steps per scale

# If our loss gets larger than 30,
# we will interrupt the gradient ascent process, to avoid ugly artifacts
max_loss = 30.

Google's image has more eyes and dog features because their model was trained on these things, while this Inception V3 model was trained on all types of images. Despite this difference in training, there are still a lot of similarities between these images, indicating that this model does a good job of recreating Deep Dream.  

A second test was performed with Manarola, Italy. The first image is from a Keras program using a VGG19 model (https://www.bonaccorso.eu/2017/07/09/keras-based-deepdream-experiment-based-vgg19/), followed by this program which uses an Inception V3 model.  


In [0]:
%%html
<img src="manarola1dream.jpg" width="400" />
<img src="manarola2dream.jpg" width="400" />

Getting this image output was the result of the following parameters.

In [0]:
layer_contributions = {
    'mixed0': 2,
    'mixed1': 1,
    'mixed2': 1,
    'mixed3': 0.5,
    'mixed4': 0.1,
    'mixed5': 0.1,
    'mixed6': 0.05,
    'mixed7': 0.05,
    'mixed8': 0.01,
    'mixed9': 0.01,
    'mixed10': 0.005,
}

In [0]:
step = 0.005  # Gradient ascent step size
num_octave = 4  # Number of scales at which to run gradient ascent
octave_scale = 1.2  # Size ratio between scales
iterations = 50  # Number of ascent steps per scale

# If our loss gets larger than 20,
# we will interrupt the gradient ascent process, to avoid ugly artifacts
max_loss = 20.

The main reason these images are so different is that the first image was produced at a very high resolution, which allowed more details to be extracted. Because feeding the model a high resolution image takes so long to process, the image size was reduced for testing to dramatically shorten the processing time. As a result, fewer details could be extracted.  

Overall, a good understanding of Deep Dream was gained through this project. By experimenting with the data and parameters of the program, it was seen how Deep Dream works and how it handles different images. Unfortunately, since the Inception V3 model was pretrained on many different types of images, its results were not biased towards a certain pattern (e.g., dog faces). Rather, its results included a lot of unique patterns with few distinguishable features. This made it difficult to interpret the results and see what the model was seeing, which was unfortunate since that was the goal of the project. If there had been more time for the project, I would have preferred to train my own CNN on a specific pattern and use that with the Deep Dream algorithm to better understand what the model is seeing. For example, if the CNN was trained on images of dogs and was given an image of a human, it is likely the output would have a face very similar to a dog. This would indicate that the model recognizes the similarities between dog and human faces. This type of insight would be much more valuable than that of Inception V3.

### *Implications*

There are different implications for this Deep Dream technology going forward. First, this technology is making an impact in the art world. The random and bizarre outputs that Deep Dream programs can produce take abstract art to a whole new level, sparking interest among many people. Another implication is similar to the goal of this project. As people better understand how neural networks see images (using Deep Dream), they will be able to design and tune neural networks to perform better than they already do. This means even more advancements in the area of image processing, which has many more implications of its own. I have seen the results of this advancement first-hand in airports, where cameras and scanners are replacing the people who check passports. Some might think replacing these workers is unethical. Others might wonder if the facial recognition cameras are less reliable than actual people, meaning a potential security breach. These are the questions that must be asked as Deep Dream technology becomes more and more popular.