# Homework #3
## Introduction to deep learning


This Colaboratory notebook contains Homework #3, which is due **October 20 midnight (23:59 EET time)**. To complete the homework, download the notebook **(File -> Download .ipynb)** and submit it on the course webpage.

**NB! Links to your colaboratory will not be accepted as a solution!**
## Submission rules:

1. Please submit only the .ipynb file that you download from Colaboratory.
2. Run your homework exercises before submitting so that the output is present (preferably restart the kernel and run all cells).
3. Do not change the description of tasks in red (even if there is a typo|mistake|etc).
4. Please avoid unnecessarily long printouts.
5. Each task should be solved right under the question of the task and not elsewhere.
6. Solutions to both regular and bonus exercises should be submitted in one IPYNB file.


## List of homework exercises:

1.   [Ex1](#scrollTo=4YtaQwccjrAL) - 4 points
2.   [Ex2](#scrollTo=tOfgGIUtIizt) - 4 points
3.   [Ex3](#scrollTo=rt6Fuo28nQkd) - 2 points
4.   [Bonus 1](#scrollTo=wT-4aQqUtDU7) - 2 points
5.   [Bonus 2](#scrollTo=lEW4oyQhnRQA) - 2 points


In [None]:
# A bit of setup
import numpy as np
import matplotlib.pyplot as plt

Here we define a few functions that will help us visualise the classifiers we are going to build in this class. Don't worry if you don't understand this code completely.

In [None]:
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

# for very shallow models
def plot_classifier(X, y, W, b):
  st = 0.02
  x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
  y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
  xx, yy = np.meshgrid(np.arange(x_min, x_max, st),
                       np.arange(y_min, y_max, st))
  Z = np.dot(np.c_[xx.ravel(), yy.ravel()], W) + b
  Z = np.argmax(Z, axis=1)
  Z = Z.reshape(xx.shape)
  fig = plt.figure()
  plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral, alpha=0.8)
  plt.scatter(X[:, 0], X[:, 1], c=y, s=120, edgecolors = 'white', cmap=plt.cm.Spectral)
  plt.xlim(xx.min(), xx.max())
  plt.ylim(yy.min(), yy.max())

# for two-layer network
def plot_neural_network(X, y, W, b, W2, b2):
  st = 0.02
  x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
  y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
  xx, yy = np.meshgrid(np.arange(x_min, x_max, st),
                       np.arange(y_min, y_max, st))
  Z = np.dot(np.maximum(0, np.dot(np.c_[xx.ravel(), yy.ravel()], W) + b), W2) + b2
  Z = np.argmax(Z, axis=1)
  Z = Z.reshape(xx.shape)
  fig = plt.figure()
  plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral, alpha=0.8)
  plt.scatter(X[:, 0], X[:, 1], c=y, s=120, edgecolors = 'white', cmap=plt.cm.Spectral)
  plt.xlim(xx.min(), xx.max())
  plt.ylim(yy.min(), yy.max())
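Both plotting helpers follow the same recipe: build a dense grid of points with `np.meshgrid`, flatten it into an `(M, 2)` array, score every point with the model, take the `argmax` over classes and reshape back to the grid. A minimal sketch of just that recipe, assuming a hypothetical `predict_scores` callable that maps `(M, 2)` points to `(M, K)` class scores:

```python
import numpy as np

def predict_on_grid(predict_scores, X, step=0.02):
    """Evaluate a classifier on a dense 2-D grid around the data X.

    predict_scores is a hypothetical callable: (M, 2) points -> (M, K) class scores.
    """
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step),
                         np.arange(y_min, y_max, step))
    grid = np.c_[xx.ravel(), yy.ravel()]              # (M, 2) with M = xx.size
    scores = predict_scores(grid)                     # (M, K)
    Z = np.argmax(scores, axis=1).reshape(xx.shape)   # one class index per grid cell
    return xx, yy, Z

# Toy usage with a hypothetical linear scorer: class 0 if x > 0, class 1 if x < 0.
W_toy = np.array([[1.0, -1.0], [0.0, 0.0]])
X_toy = np.array([[-1.0, 0.0], [1.0, 0.0]])
xx, yy, Z = predict_on_grid(lambda P: P @ W_toy, X_toy, step=0.5)
print(xx.shape, Z[0])
```

Only the line that computes `scores` depends on the model, which is why the two plotting helpers differ in a single line.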

In [None]:
N = 100 # number of points per class
D = 2 # number of features (dimensions)
K = 2 # number of classes (purple and red circles)
X = np.zeros((N*K,D)) # data matrix (each row = single example)
num_examples = X.shape[0]
y = np.zeros((N*K, 1), dtype='int') # class labels

In [None]:
# Creating spiral data points
np.random.seed(1111)

for j in range(K):
  ix = range(N*j,N*(j+1))
  r = np.linspace(0.0,1,N) # radius
  t = np.linspace(j*4,(j+1)*4,N) + np.random.randn(N)*0.2 # theta
  X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
  y[ix, 0] = j

In [None]:
plt.scatter(X[:, 0], X[:, 1], c=y, s=120, edgecolors = 'white', cmap=plt.cm.Spectral)
plt.show()

For our implementation we need to transform the vector of correct labels `y` into a one-hot encoded matrix, which we call `truth`. The cell below creates `truth` from the labels `y`.

In [None]:
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(handle_unknown='ignore')
truth = enc.fit_transform(y).toarray()

# first column is for red
# second is for purple class
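The same matrix can be built in plain NumPy. This sketch uses hypothetical labels `y_demo` shaped like `y` (a column vector of class indices) and relies on the fact that row `i` of an identity matrix is the one-hot vector for class `i`. It matches the `OneHotEncoder` output only when every class `0..K-1` actually occurs in the labels, since the encoder derives its columns from the values it sees.

```python
import numpy as np

# Hypothetical labels in the same (n, 1) layout as `y` above.
y_demo = np.array([[0], [1], [1], [0]])
K_demo = 2  # number of classes

# Index the identity matrix with the flat label vector: one one-hot row per example.
truth_demo = np.eye(K_demo)[y_demo[:, 0]]
print(truth_demo)
```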

## Homework exercise 1 (4 points): three-layer network
<font color='red'>In class we obtained ~94% accuracy using a 2-layer neural network to classify the spiral points. This is surely not good enough for us, so let's make a few tweaks in an attempt to reach higher performance.</font>

<font color='red'>**PS: Before you start working on this task, please read it in its entirety!**</font>

<font color='red'> **(a)** Based on the code we wrote in the practice session, build a new 3-layer neural network consisting of two hidden layers (each of size `h`) and one output layer. Using ChatGPT/Gemini will likely result in architectures that are substantially different from what we intended, so try to follow the code provided in the practice session. Use `tanh` as the activation function for the hidden layers. Implement both the forward pass and backpropagation to update the model weights and produce predictions (you might want to check out [this post](https://medium.com/@Coursesteach/deep-learning-part-25-derivatives-of-activation-functions-4bbd7c7c7a1c) about the derivatives of the most popular activation functions). Answer the question at the end of this subtask. **(1.5 points)** </font>
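For the backward pass through a `tanh` layer, the useful identity is $\frac{d}{dz}\tanh(z) = 1 - \tanh^2(z)$, so the local gradient can be computed from the layer's stored forward output. A small sketch of that step with a finite-difference check; the helper name `tanh_backward` and the test values are illustrative only:

```python
import numpy as np

def tanh_backward(upstream_grad, tanh_output):
    # d/dz tanh(z) = 1 - tanh(z)**2, reusing the forward output tanh(z)
    # instead of recomputing it from the pre-activation z.
    return upstream_grad * (1.0 - tanh_output ** 2)

# Compare with a central finite difference on a few hypothetical pre-activations.
z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
eps = 1e-6
numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)
analytic = tanh_backward(np.ones_like(z), np.tanh(z))
print(np.max(np.abs(numeric - analytic)))  # should be very small
```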

In [None]:
np.random.seed(111)
num_examples = X.shape[0]

# Initialize parameters randomly
h = 100  # Size of hidden layer
W = np.random.randn(D,h)
b = np.zeros((1,h))
W2 = np.random.randn(h,h)
b2 = np.zeros((1,h))
W3 = np.random.randn(h,K)
b3 = np.zeros((1,K))

# Some hyperparameters
step_size = 1e-0

# Gradient descent loop
##### YOUR CODE STARTS #####
for i in range(2000):

  # Forward path
  ...

  # Compute the error
  ...

  if i % 100 == 0:
    print("iteration %d: loss %e" % (i, total_error))

  # Compute the gradient on answers
  ...

  # Backpropagate the gradient to the parameters
  ...

  # Perform a parameter update
  ...

  ##### YOUR CODE ENDS #####

<font color='red'> If you have implemented everything correctly, you should have encountered a problem :)
Below, name the problem and describe why it happens: </font>

<font color='red'> Your answer: </font>

<font color='red'> **(b)** Fix the issue you encountered above using one of the ideas we discussed in the lecture. Please insert and run the updated code in the cell below. **(1.5 points)** </font>

In [None]:
##### YOUR CODE STARTS #####
...
##### YOUR CODE ENDS #####

<font color='red'> If it works, evaluate your model by running the code below. </font>

In [None]:
# evaluate training set accuracy
hidden_layer = np.tanh(np.dot(X, W) + b)
hidden_layer_2 = np.tanh(np.dot(hidden_layer, W2) + b2) # NB, tanh activation
answers = np.dot(hidden_layer_2, W3) + b3 # Note, no activation function for the last layer!
predicted_class = np.argmax(answers, axis=1)
print('training accuracy: %.2f' % (np.mean(predicted_class == y[:,0])))

<font color='red'> You should get about 98% accuracy or more. </font>



<font color='red'> **(c)** Complete the function `plot_deep_neural_network` by adapting the code of `plot_neural_network` to visualise the obtained 3-layer network with the `tanh` activation function. Visualise the obtained decision boundary. How did it change compared to the one we observed with 2-layer models? Do you think you would be able to easily get 100% accuracy on this data? **(1 point)** </font>



In [None]:
def plot_deep_neural_network(X, y, W, b, W2, b2, W3, b3):
  st = 0.02  # grid step; named `st` as in the helpers above to avoid confusion with the hidden size `h`
  x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
  y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
  xx, yy = np.meshgrid(np.arange(x_min, x_max, st),
                       np.arange(y_min, y_max, st))

  ##### YOUR CODE STARTS #####
  Z = ...
  ##### YOUR CODE ENDS #####

  Z = np.argmax(Z, axis=1)
  Z = Z.reshape(xx.shape)
  fig = plt.figure()
  plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral, alpha=0.8)
  plt.scatter(X[:, 0], X[:, 1], c=y, s=120, edgecolors = 'white', cmap=plt.cm.Spectral)
  plt.xlim(xx.min(), xx.max())
  plt.ylim(yy.min(), yy.max())

In [None]:
# plot the resulting classifier
plot_deep_neural_network(X, y, W, b, W2, b2, W3, b3)

<font color='red'> Answer to **(c)**: </font>



## Homework exercise 2 (4 points): balancing model complexity and performance
<font color='red'> In this exercise, you’re going to explore how we can balance model size and performance in neural networks. The goal of this exercise is to help you understand the relationship between model complexity, resource usage, and performance.  </font>



In [None]:
# A bit of setup again
from tensorflow.keras.datasets import cifar10

from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Input, Conv2D, Activation, Flatten, Dense, MaxPooling2D, BatchNormalization, Dropout

# Keras comes with built-in loaders for common datasets
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

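# optionally shorten the dataset for quicker training
# (CIFAR-10 has 50,000 training images, so [:50000] keeps all of them; use a smaller number to speed up)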
X_train = X_train[:50000]
y_train = y_train[:50000]

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

In [None]:
mu = X_train.mean(axis=(0,1,2)) # finds mean of R, G and B separately
std = X_train.std(axis=(0,1,2)) # same for std
X_train_norm = (X_train - mu)/std
X_test_norm = (X_test - mu)/std
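The cell above standardises each colour channel separately: averaging over axes `(0, 1, 2)` (images, height, width) leaves one mean and one standard deviation per channel, which then broadcast over the last axis. The test set is transformed with the training-set statistics so that both sets go through the same mapping. A quick sanity check on a small random stand-in array (`X_demo` is hypothetical, with the same `(N, H, W, C)` layout):

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(8, 4, 4, 3)).astype(np.float64)  # 8 tiny RGB "images"

mu_demo = X_demo.mean(axis=(0, 1, 2))         # shape (3,): one mean per channel
std_demo = X_demo.std(axis=(0, 1, 2))         # shape (3,): one std per channel
X_demo_norm = (X_demo - mu_demo) / std_demo   # broadcasts over the channel axis

print(X_demo_norm.mean(axis=(0, 1, 2)))  # approximately 0 for every channel
print(X_demo_norm.std(axis=(0, 1, 2)))   # approximately 1 for every channel
```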


<font color='red'> **(a)** Take a CNN model created for CIFAR-10 from the practice session and copy it below.

<font color='red'>Compute and report the following properties:
<font color='red'>
<ol>
  <li>Number of model parameters</li>
  <li>Memory used by the model parameters</li>
  <li>Training and inference times</li>
  <li>Accuracy on the training and test set</li>
</ol>
</font>

<font color='red'> Additionally, visualize the learning curves (loss and accuracy over epochs). (**0.5 points**)

</font>
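In Keras, `model.summary()` and `model.count_params()` report the parameter count directly. The arithmetic behind those numbers is simple and helps when predicting the size of a new architecture: a `Conv2D` layer has `kernel_h * kernel_w * in_channels * filters` weights plus `filters` biases, a `Dense` layer has `in * units + units`, and pooling and `Flatten` layers have none. With `float32` weights, each parameter takes 4 bytes. The architecture below is a hypothetical example, not the practice-session model:

```python
def conv2d_params(kernel_size, in_channels, filters, use_bias=True):
    # Each of the `filters` kernels has kernel_size*kernel_size*in_channels weights (+1 bias).
    return kernel_size * kernel_size * in_channels * filters + (filters if use_bias else 0)

def dense_params(in_features, units, use_bias=True):
    return in_features * units + (units if use_bias else 0)

def param_memory_mb(num_params, bytes_per_param=4):
    # float32 = 4 bytes per parameter; 1 MB is taken here as 1024**2 bytes.
    return num_params * bytes_per_param / 1024 ** 2

# Hypothetical CNN on 32x32x3 inputs with 'same' padding, so only the 2x2 pools shrink
# the spatial size (32 -> 16 -> 8):
# Conv2D(32, 3x3) -> MaxPooling2D -> Conv2D(64, 3x3) -> MaxPooling2D -> Flatten -> Dense(10)
layers = [
    conv2d_params(3, 3, 32),       # first conv layer
    conv2d_params(3, 32, 64),      # second conv layer
    dense_params(8 * 8 * 64, 10),  # Flatten output (8*8*64 values) -> 10 classes
]
total = sum(layers)
print(layers, total, f"{param_memory_mb(total):.3f} MB")
```

This counts only the weights themselves; optimizer state (e.g. Adam's moment estimates) and activations need additional memory during training.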

In [None]:
##### YOUR CODE STARTS #####
# you might want to use several cells
##### YOUR CODE ENDS #####

In [None]:
# Visualize the learning curves
##### YOUR CODE STARTS #####
...
##### YOUR CODE ENDS #####

In [None]:
# Report the properties
###### YOUR CODE STARTS #####
print(f"Number of parameters: ...")
print(f"Memory usage for the model parameters: ... ")
print(f"Training time: ... ")
print(f"Inference time: ... ")
print(f"Accuracy on the training set: ...")
print(f"Accuracy on the test set: ...")
##### YOUR CODE ENDS #####
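One way to fill in the training and inference times is to wrap the calls in a wall-clock timer. A minimal sketch using `time.perf_counter`; the helper `time_call` is hypothetical, and the commented Keras usage assumes the variable names of the cells above. The first `predict` call may include one-off setup overhead, so an optional warm-up call before timing can give a more representative inference time.

```python
import time

def time_call(fn, *args, repeats=1, **kwargs):
    """Call fn(*args, **kwargs) `repeats` times; return (last result, mean wall-clock seconds)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    start = time.perf_counter()
    for _ in range(repeats):
        result = fn(*args, **kwargs)
    mean_seconds = (time.perf_counter() - start) / repeats
    return result, mean_seconds

# Self-contained check with a cheap function:
total, seconds = time_call(sum, range(1000), repeats=5)
print(total, f"{seconds:.6f} s")

# Assumed usage with a Keras model (illustrative names and arguments):
#   history, train_time = time_call(model.fit, X_train_norm, y_train, epochs=10, validation_split=0.1)
#   model.predict(X_test_norm[:32])                        # optional warm-up call
#   _, infer_time = time_call(model.predict, X_test_norm)
```

The `History` object returned by `model.fit` keeps per-epoch metrics in `history.history`, which is what the learning-curve plots need.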

<font color='red'> **(b)** Now, create a new model with approximately twice the number of parameters as the previous one, aiming for higher accuracy. You might consider adding layers, or tweaking the existing ones. Report the same properties as in **(a)**. Do not forget to visualize the learning curves. (**1.5 points**) </font>
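Parameter counts do not always scale the way one expects, so it can pay off to compute them before training. The sketch below uses a hypothetical helper `cnn_params` for a stack of `[Conv2D('same') -> MaxPooling2D(2)]` blocks followed by `Flatten -> Dense`, and compares widening the conv layers with adding a block. Verify any real model with `model.count_params()`.

```python
def cnn_params(widths, dense_units=10, in_channels=3, spatial=32, kernel=3):
    """Parameters of a hypothetical CNN: one Conv2D(w, kernel, padding='same') + 2x2 MaxPooling2D
    per entry of `widths`, then Flatten -> Dense(dense_units). Pooling/Flatten add no parameters."""
    total, c = 0, in_channels
    for w in widths:
        total += kernel * kernel * c * w + w   # conv weights + biases
        c = w
        spatial //= 2                          # each 2x2 max-pool halves height and width
    total += spatial * spatial * c * dense_units + dense_units
    return total

base = cnn_params([32, 64])
wider = cnn_params([45, 90])           # every conv layer about 1.4x wider
deeper = cnn_params([32, 64, 128])     # one extra conv block
for name, n in [("base", base), ("wider", wider), ("deeper", deeper)]:
    print(f"{name:7s} {n:8d}  ratio {n / base:.2f}")
```

In this toy setting the extra block adds fewer parameters than one might guess, because its pooling layer shrinks the `Flatten` output that feeds the `Dense` layer.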

In [None]:
##### YOUR CODE STARTS #####
# you might want to use several cells
##### YOUR CODE ENDS #####

In [None]:
# Visualize the learning curves
##### YOUR CODE STARTS #####
...
##### YOUR CODE ENDS #####

In [None]:
# Report the properties
###### YOUR CODE STARTS #####
print(f"Number of parameters: ...")
print(f"Memory usage for the model parameters: ... ")
print(f"Training time: ... ")
print(f"Inference time: ... ")
print(f"Accuracy on the training set: ...")
print(f"Accuracy on the test set: ...")
##### YOUR CODE ENDS #####

<font color='red'> **(c)** Finally, create a third model with approximately the same number of parameters as the initial baseline model, but aim for accuracy that matches or even exceeds the second (larger) model. Report the same properties and visualize the learning curves for this model. (**1.5 points**)

HINT: To get higher accuracy, focus on applying techniques you have learned from the lecture and practice sessions.
</font>
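When working within a fixed parameter budget, it helps to know that several common regularisation techniques are almost free in parameters: `Dropout` and data augmentation add none, and Keras `BatchNormalization` (with the default `center=True, scale=True`) stores four values per channel: trainable `gamma` and `beta`, plus non-trainable moving mean and variance. A small sketch of that overhead, assuming hypothetical conv layers with 32 and 64 filters:

```python
def batchnorm_params(channels, center=True, scale=True):
    trainable = channels * (int(center) + int(scale))  # beta and gamma
    non_trainable = 2 * channels                       # moving mean and moving variance
    return trainable, non_trainable

conv_widths = [32, 64]  # hypothetical filter counts of the conv layers followed by BatchNormalization
extra = [batchnorm_params(c) for c in conv_widths]
extra_trainable = sum(t for t, _ in extra)
extra_total = sum(t + n for t, n in extra)
print(extra_trainable, extra_total)
```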

In [None]:
##### YOUR CODE STARTS #####
# you might want to use several cells
##### YOUR CODE ENDS #####

In [None]:
# Visualize the learning curves
##### YOUR CODE STARTS #####
...
##### YOUR CODE ENDS #####

In [None]:
# Report the properties
###### YOUR CODE STARTS #####
print(f"Number of parameters: ...")
print(f"Memory usage for the model parameters: ... ")
print(f"Training time: ... ")
print(f"Inference time: ... ")
print(f"Accuracy on the training set: ...")
print(f"Accuracy on the test set: ...")
##### YOUR CODE ENDS #####

<font color='red'> **(d)** Summarise all key properties (such as number of parameters and memory usage) of all three models in a single neat table! Also add a figure with all three pairs of learning curves (perhaps using a different panel for each pair of curves). Explain how changes in the architecture affected the behaviour of all three models during training and inference. Which model would you choose for real-world use? Feel free to discuss different use cases. (**0.5 points**)
</font>
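One lightweight way to produce the table is to print it as Markdown and paste the result into a text cell. A pure-Python sketch with a hypothetical helper `summary_table`; the `"..."` entries are placeholders for the values you measured in (a) to (c), not results:

```python
def summary_table(rows, columns):
    """Render a list of dicts as a Markdown table string."""
    header = "| " + " | ".join(columns) + " |"
    sep = "| " + " | ".join("---" for _ in columns) + " |"
    body = ["| " + " | ".join(str(r.get(c, "")) for c in columns) + " |" for r in rows]
    return "\n".join([header, sep, *body])

columns = ["Model", "Params", "Memory (MB)", "Train time (s)",
           "Inference time (s)", "Train acc", "Test acc"]
# Placeholders only: replace "..." with your own measurements.
rows = [{"Model": name, **{c: "..." for c in columns[1:]}}
        for name in ["(a) baseline", "(b) larger", "(c) efficient"]]
table = summary_table(rows, columns)
print(table)
```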

<font color='red'>  Answer to **(d)**: </font>

## Homework exercise 3 (2 points): Create your own dataset and build a CNN model using fast.ai API
<font color='red'> In this exercise, you have a chance to test if CNN can distinguish between images of your favourite objects. </font>

In [None]:
!pip install fastai==1.0.61

In [None]:
from fastai.vision import *
from fastai.metrics import error_rate

from pathlib import Path
from PIL import Image

import warnings
warnings.filterwarnings('ignore')

from urllib.request import urlopen

<font color='red'> **(a)** Create your own dataset with two or more classes using the same approach we used in class, but this time choose the classes yourself. **(1 point)** </font>
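In fastai v1, a common layout is one sub-folder of images per class (for example `data/cats`, `data/dogs`), which `ImageDataBunch.from_folder` can load with `train='.'` and a `valid_pct` split. Please check this against the practice-session notebook, which may use different arguments. The stdlib sketch below only prepares that layout and cleans a downloaded list of image URLs (one per line); the helpers `make_class_folders` and `read_urls` are hypothetical and do not download anything.

```python
import tempfile
from pathlib import Path

def make_class_folders(root, class_names):
    """Create one sub-folder per class under `root` and return their paths."""
    folders = []
    for name in class_names:
        folder = Path(root) / name
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
    return folders

def read_urls(text):
    """Parse a URL list (one URL per line), dropping blank lines and duplicates, keeping order."""
    seen, urls = set(), []
    for line in text.splitlines():
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls

# Usage in a temporary directory with hypothetical class names and URLs:
with tempfile.TemporaryDirectory() as tmp:
    folders = make_class_folders(tmp, ["cats", "dogs"])
    created = [f.name for f in folders if f.is_dir()]
urls = read_urls("https://example.com/a.jpg\n\nhttps://example.com/b.jpg\nhttps://example.com/a.jpg\n")
print(created, urls)
```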

In [None]:
from google.colab import files
uploaded = files.upload()

In [None]:
##### YOUR CODE STARTS #####
...
##### YOUR CODE ENDS #####

In [None]:
# add more cells as necessary

<font color='red'> **(b)** Train a neural network on the images you have acquired. **(0.5 points)** </font>

In [None]:
##### YOUR CODE STARTS #####
learn = ...
##### YOUR CODE ENDS #####

Plot the confusion matrix to make sure that your model has learned something:

In [None]:
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
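A confusion matrix simply counts (actual, predicted) pairs. This NumPy sketch uses hypothetical label arrays, with rows as actual classes and columns as predicted classes; the diagonal holds the correct predictions, so accuracy is the trace divided by the total count.

```python
import numpy as np

def confusion_matrix(actual, predicted, num_classes):
    # Row = actual class, column = predicted class.
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (actual, predicted), 1)  # unbuffered add: repeated index pairs are all counted
    return cm

actual = np.array([0, 0, 1, 1, 1])     # hypothetical true labels
predicted = np.array([0, 1, 1, 1, 0])  # hypothetical model predictions
cm = confusion_matrix(actual, predicted, 2)
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy)
```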

<font color='red'> **(c)** Test your model on one or more images from the internet that represent the classes you have chosen but are unlikely to be in the training data (you can change your search query). Print out the class probabilities for one of these test images. **(0.5 points)** </font>
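In fastai v1, `learn.predict(img)` is expected to return the predicted category, its index and a tensor of per-class probabilities, and `learn.data.classes` holds the class names in the same order; please confirm this against the library version installed above. If you only have raw scores (logits), a softmax turns them into probabilities. A NumPy sketch with hypothetical class names and scores:

```python
import numpy as np

def softmax(logits):
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max()        # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def rank_classes(class_names, probs):
    """Pair class names with probabilities, most likely first."""
    return sorted(zip(class_names, np.asarray(probs, dtype=float)),
                  key=lambda pair: pair[1], reverse=True)

classes = ["cat", "dog", "fox"]      # hypothetical class names
probs = softmax([2.0, 0.5, -1.0])    # hypothetical raw scores for one image
for name, p in rank_classes(classes, probs):
    print(f"{name:>5s}: {p:.3f}")
```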

In [None]:
##### YOUR CODE STARTS #####
img = ...
img
##### YOUR CODE ENDS #####

In [None]:
# What are the probabilities of different classes for this image?
##### YOUR CODE STARTS #####
...
##### YOUR CODE ENDS #####

# Bonus exercises
*(NB, these are optional exercises!)*

## Bonus exercise 1 (2 bonus points):

<font color='red'> The [Stable Diffusion](https://stability.ai/blog/stable-diffusion-announcement) model has recently been shown to produce truly impressive results in image generation. Let's explore some of its power in this auxiliary exercise. Use the Stable Diffusion model from this [colab](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb?hl=en). Generate several tricky images to test the model you trained in exercise 3. Report the images you generated with Stable Diffusion, the prompts you used to generate them, the number of iterations you used, and the classification results from your CNN model. Briefly summarise the results you obtained. </font>

## Bonus exercise 2 (2 bonus points):

<font color='red'> [Pytorch](https://pytorch.org/) is another widely used and deeply loved deep learning library. In this task, you will get a chance to try it out. First, re-implement one of the CNNs you worked with in Ex2 (or try a different architecture) in Pytorch. Train and validate the model on the dataset you collected in Ex3. Second, use a ResNet model from [torchhub](https://pytorch.org/docs/stable/hub.html) and train it on the same data. Compare the results of these two models and the one from the fastai library.</font>

# Comments (optional feedback to the course instructors)
Here, please leave your comments regarding the homework, possibly answering the following questions:
* how much time did you spend on this homework?
* was it too hard/easy for you?
* what would you suggest to add or remove?
* anything else you would like to tell us

Your comments:

# <font color='red'>  End of the homework. Please don't delete this cell.</font>