## Assignment 3 - PyTorch version

This is a version of assignment 3 written in [PyTorch](https://pytorch.org/) rather than TensorFlow/Keras.

This is for anyone who has been having trouble running TensorFlow on their computer (especially if you have an M2 Mac!), or for students who just want the experience of working with a different deep learning library! For several years now, PyTorch has been the most [commonly used library among researchers](https://www.assemblyai.com/blog/pytorch-vs-tensorflow-in-2023/) in this field.

PyTorch operates at a lower level of abstraction than Keras, which means we need to write a bit more code to implement and train our model. This may make the code look more complex at first, but it is often preferable: the code hides less of what it is doing from you, and you have more control over how your models are built, trained and represented.

If you want more practice with PyTorch [take a look at their excellent set of tutorials](https://pytorch.org/tutorials/).

- __Extend the model in this notebook into one which maps (X,Y) -> (R,G,B).__
- __Add at least 2 more layers to the network.__
- __Experiment with alternative activation functions and optimizers.__
- __In a paragraph or so, describe how the image we have created differs from a normal image.__

You can find other images to play with [in scikit-image's data module](https://scikit-image.org/docs/dev/api/skimage.data.html), but of course you could also experiment with your own images. For that you might want to use the [Pillow](https://pillow.readthedocs.io/en/stable/) package, which has some [handy functions for loading and manipulating images](https://pillow.readthedocs.io/en/stable/reference/Image.html).

This shouldn't take you longer than an afternoon! __This will be handed in at the end of the module__ so once you have something working it would be _much appreciated_ if you go back over your code and tidy it up, maybe add comments to describe what is happening in the code.

Here are some more lovely examples from [David Ha](https://twitter.com/hardmaru):

![David Ha bw](./images/hardmaru_color.png)

---

If you like this work you could take some ideas explored by David Ha in his blog posts on this topic and re-implement them, or take them further for your final project. I think there is a lot of potential for creating really interesting images and even interesting drawing tools!

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import skimage
import random
from skimage.transform import resize
import matplotlib.pyplot as plt

Set the device for your machine.

On an M1/M2 Mac, device should be `mps`.

On a machine with an NVIDIA GPU, device should be `cuda`.

To run on the CPU of any other machine, device should be `cpu`.

In [None]:
# device = 'cuda'
device = 'mps' 
#device = 'cpu'
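
Rather than editing the cell above by hand on each machine, the device can also be picked automatically. This is a minimal sketch using PyTorch's availability checks, `torch.cuda.is_available()` and `torch.backends.mps.is_available()` (the MPS check is available in recent PyTorch versions); the helper name `pick_device` is our own, not part of PyTorch. Running it would overwrite the `device` chosen above.

```python
import torch

def pick_device() -> str:
    """Return the best available device string: CUDA first, then Apple's MPS, then CPU."""
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is present in recent PyTorch releases (1.12 and later)
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

# Hypothetical replacement for the hand-edited device cell above
device = pick_device()
print(device)
```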

The code below is the original code provided in the assignment brief. The rest of the assignment is broken up into the following sections:

- Section 1 - The original image and model provided in the assignment notebook.
- Section 2 - Extending the model to map (X,Y) to (R,G,B).
- Section 3 - Adding two or more layers to the neural network.
- Section 4 - Experimenting with different activation functions (4.1) and optimizers (4.2).
- Section 5 - Describing, in a paragraph or so, how the image we have created differs from a normal image.

<font size = "5"> Section 1

In [None]:
#Get image from skimage
img = skimage.data.camera()
smaller_img = resize(img, (64, 64)) # Resize it just to make things quicker
plt.imshow(img, cmap='gray')

In [None]:
#Our function that gets a grid of coordinates
def get_mgrid(sidelen):
    '''Generates a flattened grid of (x,y,...) coordinates in a range of -1 to 1.'''
    width = np.linspace(-1, 1, sidelen)
    height = np.linspace(-1, 1, sidelen)
    mgrid = np.stack(np.meshgrid(width, height), axis=-1)
    mgrid = np.reshape(mgrid, [-1, 2])
    return mgrid
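
To see how the flattened coordinate grid lines up with the flattened image, here is `get_mgrid` (copied so the snippet runs on its own) applied to a tiny 3 x 3 image. Because `np.meshgrid` uses `'xy'` indexing by default, the first column is the x coordinate and varies fastest. That matches the row-major order `np.reshape` uses to flatten the image, so row `k` of the grid is the coordinate of pixel `k`. The toy image is made-up data for illustration.

```python
import numpy as np

def get_mgrid(sidelen):
    '''Generates a flattened grid of (x,y) coordinates in a range of -1 to 1.'''
    width = np.linspace(-1, 1, sidelen)
    height = np.linspace(-1, 1, sidelen)
    mgrid = np.stack(np.meshgrid(width, height), axis=-1)
    return np.reshape(mgrid, [-1, 2])

grid = get_mgrid(3)

# A toy 3 x 3 "image" whose value at (row, col) is 10 * row + col
toy_img = np.arange(3)[:, None] * 10 + np.arange(3)[None, :]
pixels = np.reshape(toy_img, [-1, 1])

# Pixel k sits at row k // 3, column k % 3, and its coordinate is grid[k] = (x, y)
for k in (0, 5, 8):
    print(k, grid[k], pixels[k, 0])
```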

Defining the variables we will be using as our training dataset.

In [None]:
#defining the variables for training

image_side_length = smaller_img.shape[0]
X = get_mgrid(image_side_length)      # Inputs
X = np.float32(X)

#Convert input array to torch tensor
X_torch_tensor = torch.tensor(X, device=device)

y = np.reshape(smaller_img, [-1, 1])  # Outputs
y = np.float32(y)

#convert output array to torch tensor
y_torch_tensor = torch.tensor(y, device=device)

#Get the total number of coordinates 
num_coords = X_torch_tensor.shape[0]

Let's check that our data tensor is on the right device.

On an M1/M2 Mac this should print `mps:0`.

On the CPU it should print `cpu`.

On a machine with an NVIDIA GPU it should print `cuda:0`.

In [None]:
print(X_torch_tensor.device)

Here we define our model. Our network is a class that inherits from PyTorch's base class for neural network modules, `nn.Module`.

`def __init__(self):` defines the constructor, the function that is called when our network is first **initialised**. Here we tell it what the layers are and which parameters we want to keep track of.

`def forward(self, x):` defines the forward pass of the model: what the model does with every input example, which layers (and activation functions) are run, and what output the model generates.

In [None]:
class CPPN(nn.Module):
    def __init__(self):
        super(CPPN, self).__init__()

        # First fully connected layer
        self.fc1 = nn.Linear(2, 16)
        # Second fully connected layer
        self.fc2 = nn.Linear(16, 32)
        # Third fully connected layer
        self.fc3 = nn.Linear(32, 1)
    
    # x represents our data
    def forward(self, x):
        # Pass through first fully connected layer
        x = self.fc1(x)
        # Relu activation function
        x = F.relu(x)
        
        # Pass through second fully connected layer
        x = self.fc2(x)
        # Relu activation function
        x = F.relu(x)

        # Pass through third fully connected layer
        x = self.fc3(x)
        # Sigmoid activation function
        x = F.sigmoid(x)

        # Return our output
        return x
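
Each `nn.Linear(in_features, out_features)` layer computes `x @ W.T + b`, where `W` has shape `(out_features, in_features)`. The sketch below writes the same forward pass as `CPPN` in plain NumPy, with randomly drawn weights and zero biases (an assumption for illustration; PyTorch uses its own initialisation scheme), just to show the shapes flowing through the network.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
# Layer sizes matching CPPN: 2 -> 16 -> 32 -> 1
sizes = [(2, 16), (16, 32), (32, 1)]
# Each W follows nn.Linear's convention: shape (out_features, in_features)
params = [(rng.normal(0, 0.5, (out_f, in_f)), np.zeros(out_f)) for in_f, out_f in sizes]

def forward(x):
    (W1, b1), (W2, b2), (W3, b3) = params
    x = relu(x @ W1.T + b1)         # (N, 2)  -> (N, 16)
    x = relu(x @ W2.T + b2)         # (N, 16) -> (N, 32)
    return sigmoid(x @ W3.T + b3)   # (N, 32) -> (N, 1), squashed into (0, 1)

coords = np.array([[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]])
out = forward(coords)
print(out.shape)
```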

This is the code where we create our model, we create one version of our network based on our `CPPN` class. We also need to define the loss function we are using `criterion` and what `optimiser` we are using for updating the weights of the network.

In [None]:
#creating our model 

num_epochs = 8
batch_size = 1

cppn = CPPN()
cppn.to(device)
cppn.requires_grad_()

optimiser = torch.optim.SGD(cppn.parameters(), lr=0.001, momentum=0.9)
criterion = nn.MSELoss()
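
`nn.MSELoss()`, with its default `reduction='mean'`, averages the squared differences between the prediction and the target. A NumPy version of the same formula is below, on made-up values. It also shows the result is symmetric in its two arguments, although PyTorch's convention is to call `criterion(prediction, target)`.

```python
import numpy as np

def mse(prediction, target):
    """Mean squared error, the quantity nn.MSELoss() computes with reduction='mean'."""
    return np.mean((prediction - target) ** 2)

target = np.array([[0.0], [0.5], [1.0]])
prediction = np.array([[0.1], [0.5], [0.7]])
loss = mse(prediction, target)
print(loss)
```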

This is where we train the model. The for loop determines how many training steps we take, which depends on the number of coordinates `num_coords`, the number of epochs (passes through the data) `num_epochs`, and the batch size `batch_size`.

During training we process one batch at a time: the input batch `input_batch` is a batch of coordinates, and `true_pixel_values` holds the matching pixel values.

We use our network to compute approximate pixel values `approx_pixel_values` from the input coordinates and compare them to the true pixel values using our loss function `criterion`. Once we have the loss for that batch, `loss`, we call `loss.backward()` to backpropagate the error through the network, and then `optimiser.step()` to update the network's weights.

We repeat this for as many steps as `num_epochs` full passes through our dataset would take. Because each batch is drawn at random, individual coordinates may be seen more or less often than `num_epochs` times.
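The training loop draws each batch with `random.sample`, so over a run some coordinates are used more often than others. A common alternative, sketched here as an option rather than what the assignment does, is to shuffle the indexes once per epoch and walk through them in chunks, so every coordinate is used exactly once per epoch. `minibatch_indexes` is a hypothetical helper name.

```python
import numpy as np

def minibatch_indexes(num_coords, batch_size, rng):
    """Yield arrays of indexes that cover range(num_coords) exactly once, in random order."""
    order = rng.permutation(num_coords)
    for start in range(0, num_coords, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
batches = list(minibatch_indexes(10, 4, rng))
print([len(b) for b in batches])  # [4, 4, 2]: the last batch holds the remainder
```

In the notebook, each yielded array could be converted with `torch.from_numpy(...)` and used to index `X_torch_tensor` and `y_torch_tensor`, with an outer loop over `num_epochs`.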

In [None]:
coord_indexes = list(range(0, num_coords))
for i in range( int((num_epochs * num_coords) / batch_size) ):
    optimiser.zero_grad()
    cppn.zero_grad()

    # Get batch of data
    batch_indexes = torch.tensor(np.array(random.sample(coord_indexes, batch_size)))
    input_batch = X_torch_tensor[batch_indexes]
    true_pixel_values = y_torch_tensor[batch_indexes]

    # Process data with model
    approx_pixel_values = cppn(input_batch)

    # Calculate loss function
    loss = criterion(approx_pixel_values, true_pixel_values)
    
    if i % 1000 == 0:
        print(f'step {i}, loss {loss:.3f}')
    
    #Update model
    loss.backward()
    optimiser.step()
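
To make `optimiser.step()` less of a black box, here is a plain-Python sketch of the update rule PyTorch's `SGD` applies with momentum (and its defaults `dampening=0`, `nesterov=False`), run on a toy 1-D loss `(p - 3)^2` rather than the network. The helper name `sgd_momentum` is our own.

```python
def sgd_momentum(grad_fn, p, lr=0.001, momentum=0.9, steps=100):
    """Minimise a 1-D function with SGD + momentum (dampening=0, nesterov=False)."""
    velocity = None
    for _ in range(steps):
        g = grad_fn(p)
        # First step: the buffer is just the gradient; afterwards it accumulates
        velocity = g if velocity is None else momentum * velocity + g
        p = p - lr * velocity
    return p

# Toy loss f(p) = (p - 3)^2 with gradient 2 * (p - 3); its minimum is at p = 3
grad = lambda p: 2.0 * (p - 3.0)
print(sgd_momentum(grad, 0.0, lr=0.01, steps=500))
```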


Now let's process the whole image with the trained model.

In [None]:
#process the image 

with torch.no_grad():
    prediction = cppn(X_torch_tensor)

Now let's visualise the results!

In [None]:
# Move the prediction to the CPU, convert it to a numpy array and reshape it from 1D to 2D
reconstructed_img = prediction.cpu().numpy().reshape(64, 64)
# Scale the values from [0,1] to [0, 255] and cast the type into a uint8
reconstructed_img = (reconstructed_img * 255).astype(np.uint8)
# Look at our creation next to the original!
fig, axes_array = plt.subplots(1,3, figsize=(20,10))
axes_array[0].imshow(img, cmap='gray')
axes_array[1].imshow(smaller_img, cmap='gray')
axes_array[2].imshow(reconstructed_img, cmap='gray')
plt.show()
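
The scaling and casting steps above can be wrapped in one small function. This sketch (the helper name `to_uint8_image` is our own) also clips to [0, 255] before casting: the network's output only stays in [0, 1] when the final activation is a sigmoid, and casting out-of-range floats to `uint8` gives platform-dependent results.

```python
import numpy as np

def to_uint8_image(values, side, channels=None):
    """Reshape flat network outputs to an image, scale [0, 1] -> [0, 255], clip and cast to uint8."""
    shape = (side, side) if channels is None else (side, side, channels)
    img = np.asarray(values, dtype=np.float32).reshape(shape) * 255
    # Clip first so out-of-range values saturate instead of wrapping around
    return np.clip(img, 0, 255).astype(np.uint8)

flat = np.array([[-0.2], [0.5], [1.0], [1.3]])
print(to_uint8_image(flat, 2))
```

In the notebook this could be called as `to_uint8_image(prediction.cpu().numpy(), 64)`, or with `channels=3` for the colour model.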

<font size = "5" > Section 2 - Extend the Model to map (X,Y) -> (R,G,B)

In [None]:
#Get image from skimage

img_color = skimage.data.astronaut()
smaller_img_color = resize(img_color, (64, 64)) # Resize it just to make things quicker (the 3 colour channels are preserved)
plt.imshow(img_color)

In [None]:
#defining the variables for training

image_side_length_color = smaller_img_color.shape[0]
X = get_mgrid(image_side_length_color)      # Inputs
X = np.float32(X)

#Convert input array to torch tensor
X_torch_tensor = torch.tensor(X, device=device)

y = np.reshape(smaller_img_color, [-1, 3])  # Outputs: one row of (R, G, B) values per pixel, so there are 3 targets per coordinate
y = np.float32(y)

#convert output array to torch tensor
y_torch_tensor = torch.tensor(y, device=device)

#Get the total number of coordinates 
num_coords = X_torch_tensor.shape[0]

In [None]:
#defining a CPPN network to map color images

class CPPN_color(nn.Module):
    def __init__(self):
        super(CPPN_color, self).__init__()

        # First fully connected layer
        self.fc1 = nn.Linear(2, 16)
        # Second fully connected layer
        self.fc2 = nn.Linear(16, 32)
        # Third fully connected layer: 3 outputs instead of 1, one each for R, G and B
        self.fc3 = nn.Linear(32, 3)
    
    # x represents our data
    def forward(self, x):
        # Pass through first fully connected layer
        x = self.fc1(x)
        # Relu activation function
        x = F.relu(x)
        
        # Pass through second fully connected layer
        x = self.fc2(x)
        # Relu activation function
        x = F.relu(x)

        # Pass through third fully connected layer
        x = self.fc3(x)
        # Sigmoid activation function
        x = F.sigmoid(x)

        # Return our output
        return x

In [None]:
#creating our model to train on the color CPPN defined above

num_epochs = 20 #increased the number of epochs to better refine the results (fewer epochs gave less interesting results)
batch_size = 1

cppnc = CPPN_color()
cppnc.to(device)
cppnc.requires_grad_()

optimiser = torch.optim.SGD(cppnc.parameters(), lr=0.001, momentum=0.9)
criterion = nn.MSELoss()

In [None]:
#training the color model 

coord_indexes = list(range(0, num_coords))
for i in range( int((num_epochs * num_coords) / batch_size) ):
    optimiser.zero_grad()
    cppnc.zero_grad()

    # Get batch of data
    batch_indexes = torch.tensor(np.array(random.sample(coord_indexes, batch_size)))
    input_batch = X_torch_tensor[batch_indexes]
    true_pixel_values = y_torch_tensor[batch_indexes]

    # Process data with model
    approx_pixel_values = cppnc(input_batch)

    # Calculate loss function
    loss = criterion(approx_pixel_values, true_pixel_values)
    
    if i % 1000 == 0:
        print(f'step {i}, loss {loss:.3f}')
    
    #Update model
    loss.backward()
    optimiser.step()

In [None]:
#processing the image

with torch.no_grad():
    prediction_color = cppnc(X_torch_tensor)

In [None]:
#configuring and displaying the results of the color image after it has been trained on the model 

# Move the prediction to the CPU, convert it to a numpy array and reshape it from (num_pixels, 3) to (64, 64, 3)
reconstructed_img_color = prediction_color.cpu().numpy().reshape(64, 64, 3) #the third dimension holds the 3 RGB values
# Scale the values from [0,1] to [0, 255] and cast the type into a uint8
reconstructed_img_color = (reconstructed_img_color * 255).astype(np.uint8)
# Look at our creation next to the original!
fig, axes_array = plt.subplots(1,3, figsize=(20,10))
axes_array[0].imshow(img_color)
axes_array[1].imshow(smaller_img_color)
axes_array[2].imshow(reconstructed_img_color) 
plt.show()

For this section, I adjusted the model to map (X,Y) to (R,G,B) by changing the number of values the network outputs from 1 to 3 (as seen in the final layer of the network and in the reshaping of the target pixel values before training) and by reshaping the predictions back into a 64 x 64 x 3 image once the model has been trained. The model now produces 3 color values for each coordinate, as is necessary for an RGB image. The final output of this training is shown in the cell above, in the right-most image.
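The key point in these reshapes is that `np.reshape(..., [-1, 3])` keeps each pixel's three channel values together in one row, and reshaping back to `(side, side, 3)` restores the original layout. A quick check on a random stand-in image (random data, not the astronaut photo):

```python
import numpy as np

rng = np.random.default_rng(0)
side = 4
rgb_img = rng.random((side, side, 3))      # stand-in for smaller_img_color

targets = np.reshape(rgb_img, [-1, 3])     # one (R, G, B) row per pixel
print(targets.shape)                       # (16, 3)

# Row k holds the colour of the pixel at (k // side, k % side)
k = 6
assert np.array_equal(targets[k], rgb_img[k // side, k % side])

# Reshaping back recovers the original image exactly
restored = targets.reshape(side, side, 3)
assert np.array_equal(restored, rgb_img)
```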

<font size = "5" > Section 3 - Add at least two layers to the network.

To complete this section, I will go back to using the original grayscale photograph from Section 1. I have renamed the models to reflect the additional layers (and to avoid confusion with the previous code).

In [None]:
#Get image from skimage

img_add = skimage.data.camera()
smaller_img_add = resize(img_add, (64, 64)) # Resize it just to make things quicker
plt.imshow(img_add, cmap='gray')

In [None]:
#defining our variables for training

image_side_length_add = smaller_img_add.shape[0]
X = get_mgrid(image_side_length_add)      # Inputs
X = np.float32(X)

#Convert input array to torch tensor
X_torch_tensor = torch.tensor(X, device=device)

y = np.reshape(smaller_img_add, [-1, 1])  # Outputs: the grayscale image, one value per pixel
y = np.float32(y)

#convert output array to torch tensor
y_torch_tensor = torch.tensor(y, device=device)

#Get the total number of coordinates 
num_coords = X_torch_tensor.shape[0]

In [None]:
#defining our larger neural network 

class CPPN_add(nn.Module):
    def __init__(self):
        super(CPPN_add, self).__init__()

        # First fully connected layer
        self.fc1 = nn.Linear(2, 16)
        # Second fully connected layer
        self.fc2 = nn.Linear(16, 32)
        # Third fully connected layer (new): widens from 32 to 64 units
        self.fc3 = nn.Linear(32, 64)
        # Fourth fully connected layer (new): narrows from 64 back to 32 units
        self.fc4 = nn.Linear(64, 32)
        # Fifth fully connected layer: from 32 units down to the single output
        self.fc5 = nn.Linear(32, 1)
    
    # x represents our data
    def forward(self, x):
        # Pass through first fully connected layer
        x = self.fc1(x)
        # Relu activation function
        x = F.relu(x)
        
        # Pass through second fully connected layer
        x = self.fc2(x)
        # Relu activation function
        x = F.relu(x)

        # Pass through third fully connected layer
        x = self.fc3(x)
        # Relu activation function
        x = F.relu(x)

        # Pass through fourth fully connected layer
        x = self.fc4(x)
        # Relu activation function
        x = F.relu(x)

        # Pass through fifth fully connected layer
        x = self.fc5(x)
        # Sigmoid activation function
        x = F.sigmoid(x)

        # Return our output
        return x

In [None]:
#creating our model and defining the training epochs and batch size

num_epochs = 8
batch_size = 1

cppna = CPPN_add()
cppna.to(device)
cppna.requires_grad_()

optimiser = torch.optim.SGD(cppna.parameters(), lr=0.001, momentum=0.9)
criterion = nn.MSELoss()

In [None]:
#training the model 

coord_indexes = list(range(0, num_coords))
for i in range( int((num_epochs * num_coords) / batch_size) ):
    optimiser.zero_grad()
    cppna.zero_grad()

    # Get batch of data
    batch_indexes = torch.tensor(np.array(random.sample(coord_indexes, batch_size)))
    input_batch = X_torch_tensor[batch_indexes]
    true_pixel_values = y_torch_tensor[batch_indexes]

    # Process data with model
    approx_pixel_values = cppna(input_batch)

    # Calculate loss function
    loss = criterion(approx_pixel_values, true_pixel_values)
    
    if i % 1000 == 0:
        print(f'step {i}, loss {loss:.3f}')
    
    #Update model
    loss.backward()
    optimiser.step()

In [None]:
#processing the image

with torch.no_grad():
    prediction_add = cppna(X_torch_tensor)

In [None]:
#configuring and displaying the results of the grayscale image after it has been trained on the model 

# Move the prediction to the CPU, convert it to a numpy array and reshape it from 1D to 2D
reconstructed_img_add = prediction_add.cpu().numpy().reshape(64, 64)
# Scale the values from [0,1] to [0, 255] and cast the type into a uint8
reconstructed_img_add = (reconstructed_img_add * 255).astype(np.uint8)
# Look at our creation next to the original!
fig, axes_array = plt.subplots(1,3, figsize=(20,10))
axes_array[0].imshow(img_add, cmap='gray')
axes_array[1].imshow(smaller_img_add, cmap='gray')
axes_array[2].imshow(reconstructed_img_add, cmap='gray')
plt.show()

As shown in the code above, I added two additional layers to my network. Instead of widening to 32 units and then dropping straight to 1 output, my network widens further to 64 units before narrowing back to 32 and then to 1. What is interesting about the output of this network is the difference in tone, or more precisely, the size of the black region in the image. In the original network, the blurred image picks out a humanoid shape and leaves the area surrounding the man's head lighter. Similarly, it does not render the face or the camera as dark either. In this new network, however, that delineation seems to have disappeared: the entire left-hand side of the image is now black. It would be interesting to study the reason behind this further, and to ask questions such as: could this be a result of going from 32 units to 1 output so quickly? Would a more gradual narrowing towards the single output produce a better result?
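One way to explore a more gradual narrowing is to build the network from a list of layer widths, so different shapes can be tried without writing a new class each time. This is a sketch using `nn.Sequential`; the function names `make_cppn` and `count_params` are our own. The parameter counts show how much larger the five-layer network is: `2 -> 16 -> 32 -> 1` has 625 weights and biases, while `2 -> 16 -> 32 -> 64 -> 32 -> 1` has 4,817. The `tapered` widths are an untested example.

```python
import torch
import torch.nn as nn

def make_cppn(widths, hidden_activation=nn.ReLU, output_activation=nn.Sigmoid):
    """Build an MLP from layer widths, e.g. [2, 16, 32, 1]: ReLU between layers, sigmoid at the end."""
    layers = []
    for i, (in_f, out_f) in enumerate(zip(widths[:-1], widths[1:])):
        layers.append(nn.Linear(in_f, out_f))
        is_last = i == len(widths) - 2
        layers.append(output_activation() if is_last else hidden_activation())
    return nn.Sequential(*layers)

def count_params(model):
    return sum(p.numel() for p in model.parameters())

original = make_cppn([2, 16, 32, 1])
five_layer = make_cppn([2, 16, 32, 64, 32, 1])
# A hypothetical, more gradual taper towards the single output
tapered = make_cppn([2, 16, 32, 64, 32, 16, 8, 4, 1])

print(count_params(original), count_params(five_layer))  # 625 4817
with torch.no_grad():
    out = tapered(torch.zeros(5, 2))
print(out.shape)  # torch.Size([5, 1])
```

Because `nn.Sequential` is itself an `nn.Module`, `make_cppn([2, 16, 32, 64, 32, 1]).to(device)` could stand in for `CPPN_add()` in the training cell.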

<font size = "5" > Section 4.1 - Experiment with other activation functions. 

This section is split into several parts. Section 4.1 tries two different activation functions, softsign and tanhshrink, and Section 4.2 tries two different optimizers, Adam and Adadelta. To compare results, all tests use the original grayscale image and the network structure from Section 1. In the activation function tests, only the final (output) activation is changed; the hidden layers still use ReLU.

<font size = "4" > Softsign Activation Function

In [None]:
#defining the softsign activation function model

class CPPN_softsign(nn.Module):
    def __init__(self):
        super(CPPN_softsign, self).__init__()

        # First fully connected layer
        self.fc1 = nn.Linear(2, 16)
        # Second fully connected layer
        self.fc2 = nn.Linear(16, 32)
        # Third fully connected layer
        self.fc3 = nn.Linear(32, 1)
    
    # x represents our data
    def forward(self, x):
        # Pass through first fully connected layer
        x = self.fc1(x)
        # Relu activation function
        x = F.relu(x)
        
        # Pass through second fully connected layer
        x = self.fc2(x)
        # Relu activation function
        x = F.relu(x)

        # Pass through third fully connected layer
        x = self.fc3(x)
        # Softsign activation function
        x = F.softsign(x)

        # Return our output
        return x

In [None]:
#creating our model and defining the training epochs and batch size

num_epochs = 8
batch_size = 1

#creating one version of our network using the CPPN_softsign class
cppns = CPPN_softsign()
cppns.to(device)
cppns.requires_grad_()

#defining the criterion and optimizer to update the weights of the network - using the same optimizer and learning rate as the original cppn
optimizer = torch.optim.SGD(cppns.parameters(), lr=0.001, momentum=0.9)
criterion = nn.MSELoss()

In [None]:
#training the model

coord_indexes = list(range(0, num_coords))
for i in range(int((num_epochs * num_coords) / batch_size)):
    optimizer.zero_grad()
    cppns.zero_grad()

    #get batch of data
    batch_indexes = torch.tensor(np.array(random.sample(coord_indexes, batch_size)))
    input_batch = X_torch_tensor[batch_indexes]
    true_pixel_values = y_torch_tensor[batch_indexes]

    #the approx pixel values the model predicts
    approx_pixel_values = cppns(input_batch)

    #calculate loss function
    loss = criterion(approx_pixel_values, true_pixel_values)

    if i % 1000 ==0:
        print(f'step {i}, loss {loss:.3f}')

    #update model and weights 
    loss.backward()
    optimizer.step()

In [None]:
#process the whole image with the trained model

with torch.no_grad():
    prediction = cppns(X_torch_tensor)

In [None]:
#configuring and displaying the results of the grayscale image after it has been trained on the model 

#move the prediction to the CPU, convert it to a numpy array and reshape it from 1D to 2D
reconstructed_img_cppns = prediction.cpu().numpy().reshape(64, 64)

#scale the values to [0,255]; softsign outputs lie in (-1, 1), so clip negative values to 0
#rather than letting them wrap around when cast to uint8
reconstructed_img_cppns = np.clip(reconstructed_img_cppns * 255, 0, 255).astype(np.uint8)

#look at our creation next to the original! 
fig, axes_array = plt.subplots(1, 3, figsize= (20,10))
axes_array[0].imshow(img, cmap='gray')
axes_array[1].imshow(smaller_img, cmap='gray')
axes_array[2].imshow(reconstructed_img_cppns, cmap='gray')
plt.show()

What is interesting about this image is that it does distinguish between light and dark values; however, unlike the original model and the model with additional layers, the light and dark regions do not line up with where they appear in the image. In this reconstruction the darkest values are nearly centered, when in reality they lie more towards the left-hand side. This becomes more interesting when comparing the outputs of the sigmoid and softsign functions. Whereas the sigmoid function produces values between 0 and 1, the softsign function produces values between -1 and 1. This means softsign is zero-centered, and half of its output range falls outside the [0, 1] range of our target pixel values, so it is likely to produce more drastic changes in tone (here, shades of gray) than the sigmoid function. We see this in the produced image as a greater range of tones than with the sigmoid function, even though those tones are not as well matched to their locations.
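The two output ranges are easy to check numerically. Softsign is `x / (1 + |x|)`, with range (-1, 1); sigmoid is `1 / (1 + e^(-x))`, with range (0, 1). The sketch also shows one possible adjustment, our suggestion and not tested in this notebook: rescaling softsign with `(softsign(x) + 1) / 2` maps it into (0, 1), the same range as the target pixel values.

```python
import numpy as np

def softsign(x):
    return x / (1 + np.abs(x))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-10, 10, 201)
print(softsign(x).min(), softsign(x).max())   # symmetric around 0, with negative values
print(sigmoid(x).min(), sigmoid(x).max())     # always positive, below 1

# Shift and halve softsign so it lands in (0, 1) like the pixel targets
rescaled = (softsign(x) + 1) / 2
print(rescaled.min(), rescaled.max())
```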

<font size = "4"> Tanhshrink Activation Function

In [None]:
#defining the model using Tanhshrink

class CPPN_tanhshrink(nn.Module):
    def __init__(self):
        super(CPPN_tanhshrink, self).__init__()

        # First fully connected layer
        self.fc1 = nn.Linear(2, 16)
        # Second fully connected layer
        self.fc2 = nn.Linear(16, 32)
        # Third fully connected layer
        self.fc3 = nn.Linear(32, 1)
    
    # x represents our data
    def forward(self, x):
        # Pass through first fully connected layer
        x = self.fc1(x)
        # Relu activation function
        x = F.relu(x)
        
        # Pass through second fully connected layer
        x = self.fc2(x)
        # Relu activation function
        x = F.relu(x)

        # Pass through third fully connected layer
        x = self.fc3(x)
        # Tanhshrink activation function
        x = F.tanhshrink(x)

        # Return our output
        return x

In [None]:
#creating our model and defining the training epochs and batch size

num_epochs = 8
batch_size = 1

#creating one version of our network using the CPPN_tanhshrink class
cppnt = CPPN_tanhshrink()
cppnt.to(device)
cppnt.requires_grad_()

#defining the criterion and optimizer to update the weights of the network - using the same optimizer and learning rate as the original cppn
optimizer = torch.optim.SGD(cppnt.parameters(), lr=0.001, momentum=0.9)
criterion = nn.MSELoss()

In [None]:
#training the model

coord_indexes = list(range(0, num_coords))
for i in range(int((num_epochs * num_coords) / batch_size)):
    optimizer.zero_grad()
    cppnt.zero_grad()

    #get batch of data
    batch_indexes = torch.tensor(np.array(random.sample(coord_indexes, batch_size)))
    input_batch = X_torch_tensor[batch_indexes]
    true_pixel_values = y_torch_tensor[batch_indexes]

    #the approx pixel values the model predicts
    approx_pixel_values = cppnt(input_batch)

    #calculate loss function
    loss = criterion(approx_pixel_values, true_pixel_values)

    if i % 1000 ==0:
        print(f'step {i}, loss {loss:.3f}')

    #update model and weights 
    loss.backward()
    optimizer.step()

In [None]:
#process the whole image with the trained model

with torch.no_grad():
    prediction = cppnt(X_torch_tensor)

In [None]:
#configuring and displaying the image created from training

#move the prediction to the CPU, convert it to a numpy array and reshape it from 1D to 2D
reconstructed_img_cppnt = prediction.cpu().numpy().reshape(64, 64)

#scale the values to [0,255]; tanhshrink outputs are unbounded, so clip them to [0,255]
#rather than letting out-of-range values wrap around when cast to uint8
reconstructed_img_cppnt = np.clip(reconstructed_img_cppnt * 255, 0, 255).astype(np.uint8)

#look at our creation next to the original! 
fig, axes_array = plt.subplots(1, 3, figsize= (20,10))
axes_array[0].imshow(img, cmap='gray')
axes_array[1].imshow(smaller_img, cmap='gray')
axes_array[2].imshow(reconstructed_img_cppnt, cmap='gray')
plt.show()

Our tanhshrink model appears not to have recognized any light values at all in our original image. This is interesting when looking at the graph of the tanhshrink function, tanhshrink(x) = x - tanh(x) (available here: https://pytorch.org/docs/stable/generated/torch.nn.Tanhshrink.html). The graph tells us that as the input x increases, the output y also increases (the same is true for negative values), without ever levelling off. This is quite different from our softsign and sigmoid functions, which approach bounds at -1 and 1, or 0 and 1, respectively. What the produced image tells us is that tanhshrink is not a good choice of output activation for our model, since nothing keeps its outputs within the [0, 1] range of our pixel values. Whether this extends to other activation functions based on the hyperbolic tangent, such as tanh itself, would need to be tested separately.
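Evaluating tanhshrink, `x - tanh(x)`, at a few points shows its shape: it stays very close to 0 for small inputs (for small `x` it behaves roughly like `x**3 / 3`), grows without bound for large inputs, and is negative for negative inputs, so it can produce values far outside the [0, 1] range of our pixel values.

```python
import numpy as np

def tanhshrink(x):
    return x - np.tanh(x)

for x in [-5.0, -1.0, -0.1, 0.0, 0.1, 1.0, 5.0]:
    print(f"{x:5.1f} -> {tanhshrink(x):8.5f}")
```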

<font size = "5"> Section 4.2 - Experimenting with a new optimizer.

<font size = "4"> Adam Optimizer

In [None]:
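#setting the Adam optimizer
#as I am using the same network structure as Section 1, not everything is copied to this section to avoid redundancy; only what needs to be changed is copied here and changed
#start from a freshly initialised network so Adam trains from scratch rather than continuing from the SGD-trained weights
cppn = CPPN().to(device)
adam_optimizer = torch.optim.Adam(cppn.parameters(), lr=0.001)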

#training the model 
coord_indexes = list(range(0, num_coords))
for i in range(int((num_epochs * num_coords) / batch_size)):
    adam_optimizer.zero_grad()
    cppn.zero_grad()

    #get batch of data
    batch_indexes = torch.tensor(np.array(random.sample(coord_indexes, batch_size)))
    input_batch = X_torch_tensor[batch_indexes]
    true_pixel_values = y_torch_tensor[batch_indexes]

    #the approx pixel values the model predicts
    approx_pixel_values = cppn(input_batch)

    #calculate loss function
    loss = criterion(approx_pixel_values, true_pixel_values)

    if i % 1000 ==0:
        print(f'step {i}, loss {loss:.3f}')

    #update model and weights 
    loss.backward()
    adam_optimizer.step()

In [None]:
#process the whole image with the trained model

with torch.no_grad():
    prediction_adam = cppn(X_torch_tensor)

In [None]:
#move the prediction to the CPU, convert it to a numpy array and reshape it from 1D to 2D
reconstructed_img_adam = prediction_adam.cpu().numpy().reshape(64, 64)

#scale the values from [0,1] to [0,255] and cast the type into uint8
reconstructed_img_adam = (reconstructed_img_adam * 255).astype(np.uint8)

#look at our creation next to the original! 
fig, axes_array = plt.subplots(1, 3, figsize= (20,10))
axes_array[0].imshow(img, cmap='gray')
axes_array[1].imshow(smaller_img, cmap='gray')
axes_array[2].imshow(reconstructed_img_adam, cmap='gray')
plt.show()

The Adam optimizer produced results similar to the SGD optimizer, differing mainly in that the black region extends further into the higher y values. The locations of the darkest values match the original image more closely than in the image produced using the SGD optimizer. This suggests the Adam optimizer may be a better option for this model.
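Adam keeps running averages of the gradient and of the squared gradient, and scales each step by their ratio, so the effective step size adapts to each parameter. Below is a plain-Python sketch of the Adam update rule with PyTorch's default `betas=(0.9, 0.999)` and `eps=1e-8`, applied to the same kind of toy 1-D loss `(p - 3)^2` rather than the network. A larger `lr=0.1` is used here only so the toy problem converges quickly.

```python
import math

def adam(grad_fn, p, lr=0.1, betas=(0.9, 0.999), eps=1e-8, steps=2000):
    """Minimise a 1-D function with the Adam update rule."""
    b1, b2 = betas
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad_fn(p)
        m = b1 * m + (1 - b1) * g          # running mean of gradients
        v = b2 * v + (1 - b2) * g * g      # running mean of squared gradients
        m_hat = m / (1 - b1 ** t)          # bias corrections for starting at zero
        v_hat = v / (1 - b2 ** t)
        p -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return p

grad = lambda p: 2.0 * (p - 3.0)   # gradient of the toy loss (p - 3)^2
print(adam(grad, 0.0))
```

On the first step `m_hat / sqrt(v_hat)` is just the sign of the gradient, so Adam moves by roughly `lr` regardless of how large the gradient is.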

<font size = "4"> Adadelta Optimizer

In [None]:
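#setting the Adadelta optimizer (lr=1.0 is PyTorch's default for Adadelta)
#start again from a freshly initialised network rather than continuing from the Adam-trained weights
cppn = CPPN().to(device)
adadelta_optimizer = torch.optim.Adadelta(cppn.parameters(), lr=1.0)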

#training the model 
coord_indexes = list(range(0, num_coords))
for i in range(int((num_epochs * num_coords) / batch_size)):
    adadelta_optimizer.zero_grad()
    cppn.zero_grad()

    #get batch of data
    batch_indexes = torch.tensor(np.array(random.sample(coord_indexes, batch_size)))
    input_batch = X_torch_tensor[batch_indexes]
    true_pixel_values = y_torch_tensor[batch_indexes]

    #the approx pixel values the model predicts
    approx_pixel_values = cppn(input_batch)

    #calculate loss function
    loss = criterion(approx_pixel_values, true_pixel_values)

    if i % 1000 ==0:
        print(f'step {i}, loss {loss:.3f}')

    #update model and weights 
    loss.backward()
    adadelta_optimizer.step()

In [None]:
#processing the image

with torch.no_grad():
    prediction_ada = cppn(X_torch_tensor)

In [None]:
#configuring and displaying the image

#move the prediction to the CPU, convert it to a numpy array and reshape it from 1D to 2D
reconstructed_img_ada = prediction_ada.cpu().numpy().reshape(64, 64)

#scale the values from [0,1] to [0,255] and cast the type into uint8
reconstructed_img_ada = (reconstructed_img_ada * 255).astype(np.uint8)

#look at our creation next to the original! 
fig, axes_array = plt.subplots(1, 3, figsize= (20,10))
axes_array[0].imshow(img, cmap='gray')
axes_array[1].imshow(smaller_img, cmap='gray')
axes_array[2].imshow(reconstructed_img_ada, cmap='gray')
plt.show()

The Adadelta optimizer appears to have given us the best results of the three (Adadelta, Adam and SGD). In the image produced using Adadelta, the shapes are more clearly defined, most easily seen in the two black triangular corners: one reaching up to 10 on the y axis, and the other reaching out to 40 on the x axis. These regions of the produced image correspond to where the head and arm appear in the smaller and original images. Although the Adam and SGD optimizers gave us similar shapes in this region, neither is as clearly defined as with Adadelta, which suggests Adadelta may be the better choice of the three for our model.
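Adadelta replaces a hand-picked step size with a ratio of two running averages: one of squared gradients and one of squared past updates, so the `lr=1.0` used above only scales the resulting update. Below is a plain-Python sketch of the update rule as given in the PyTorch documentation (defaults `rho=0.9`, `eps=1e-6`), applied to the toy 1-D loss `(p - 3)^2` rather than the network.

```python
import math

def adadelta(grad_fn, p, lr=1.0, rho=0.9, eps=1e-6, steps=200):
    """Minimise a 1-D function with the Adadelta update rule."""
    sq_grad_avg = 0.0    # running average of squared gradients
    sq_delta_avg = 0.0   # running average of squared updates
    for _ in range(steps):
        g = grad_fn(p)
        sq_grad_avg = rho * sq_grad_avg + (1 - rho) * g * g
        delta = math.sqrt(sq_delta_avg + eps) / math.sqrt(sq_grad_avg + eps) * g
        sq_delta_avg = rho * sq_delta_avg + (1 - rho) * delta * delta
        p -= lr * delta
    return p

grad = lambda p: 2.0 * (p - 3.0)   # gradient of the toy loss (p - 3)^2
for steps in (10, 100, 1000):
    print(steps, adadelta(grad, 0.0, steps=steps))
```

The first updates are tiny (on the order of `sqrt(eps)`) and grow as the average of squared updates builds up, which is one reason Adadelta can be slow to get going.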

<font size= "5" > Section 5 - In a paragraph or so, describe how the image we have created differs from the original image. 

The images we have generated in the tests above differ from our original image in a few ways. First, each is a recreation of our original image: the network was trained on the pixel values of our image and then redrew the image based on where it learned those values appear within the pixel grid. Second, it has far fewer pixels than the original image, as we actually trained the network on a 64 x 64 pixel version of it (`smaller_img` and its equivalents in the various sections). Since our network was trained on, and produces, an image with far fewer pixels than the original, it is to be expected that the produced image has far fewer defining characteristics than the original image, from the camera stand to the man's facial features. Third, one could argue that the produced image bears no actual connection to the original image. Yes, it was produced by a network trained on the original image. Yes, it resembles the original in how light and dark are distributed (such as the black shape resembling the figure of the man). But the produced image is a separate entity produced by the computer: its pixels are not the pixels of the original image. Thus, it could be classified as a separate entity from the original image, rather than a reproduction of it.
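One further difference follows from how the image is stored: the trained network is a function from (x, y) coordinates to pixel values, so it can be evaluated on a grid of any size, not just the 64 x 64 grid it was trained on. The sketch below takes any such function `f` (a hypothetical stand-in for the trained model) and renders it at a chosen side length, using the same coordinate grid as `get_mgrid`.

```python
import numpy as np

def render(f, side):
    """Evaluate f on a side x side grid of (x, y) coordinates in [-1, 1] and return an image."""
    xs = np.linspace(-1, 1, side)
    coords = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2).astype(np.float32)
    values = np.asarray(f(coords))           # expected shape: (side * side, channels)
    img = values.reshape(side, side, -1)
    # Drop the channel axis for single-channel (grayscale) outputs
    return img[..., 0] if img.shape[-1] == 1 else img

# Stand-in "model": brightness increases from left (x = -1) to right (x = 1)
gradient = lambda c: (c[:, :1] + 1) / 2
small = render(gradient, 64)
large = render(gradient, 512)
print(small.shape, large.shape)   # (64, 64) (512, 512)
```

With the trained grayscale model, something like `render(lambda c: cppn(torch.tensor(c, device=device)).detach().cpu().numpy(), 512)` would draw a 512 x 512 version of the learned image.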