# Exercise Sheet 6

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import decomposition
import tensorflow as tf
from tensorflow.keras.models import *
from tensorflow.keras.layers import *
from tensorflow.keras.optimizers import *
from tensorflow.keras import backend as K
from tensorflow.keras.callbacks import TensorBoard

Some general information on the `Model()` syntax can be found [here](https://keras.io/getting-started/functional-api-guide/).

## 1 U-Net

Re-implementing networks that are discussed in the literature is a vital skill. Here you will rebuild the architecture from [arXiv:1505.04597](https://arxiv.org/abs/1505.04597). The network is shown in Figure 1 of that paper.  
You can check your result via `model.compile()` and `model.summary()`.
<img src="net.png" style="width: 600px;"/>

### Solution

In [None]:
# declare input size
inputs = Input((572,572,1))

# define weight initializations according to what is suggested in the paper below eq. (2)
init = tf.keras.initializers.VarianceScaling(scale=2.0, mode='fan_in', distribution='normal', seed=None)

# first contracting block: two unpadded 3x3 convolutions, a center crop that is kept
# for the skip connection, and 2x2 max pooling; the deeper blocks follow the same pattern
Contract1conv1 = Conv2D(64, (3, 3), activation='relu', kernel_initializer=init) (inputs)
Contract1conv2 = Conv2D(64, (3, 3), activation='relu', kernel_initializer=init) (Contract1conv1)
Contract1crop = Cropping2D(cropping=((88, 88), (88, 88)))(Contract1conv2)
Contract1pool = MaxPooling2D((2, 2)) (Contract1conv2)

Contract2conv1 = Conv2D(128, (3, 3), activation='relu', kernel_initializer=init) (Contract1pool)
Contract2conv2 = Conv2D(128, (3, 3), activation='relu', kernel_initializer=init) (Contract2conv1)
Contract2crop = Cropping2D(cropping=((40, 40), (40, 40)))(Contract2conv2)
Contract2pool = MaxPooling2D((2, 2)) (Contract2conv2)

Contract3conv1 = Conv2D(256, (3, 3), activation='relu', kernel_initializer=init) (Contract2pool)
Contract3conv2 = Conv2D(256, (3, 3), activation='relu', kernel_initializer=init) (Contract3conv1)
Contract3crop = Cropping2D(cropping=((16, 16), (16, 16)))(Contract3conv2)
Contract3pool = MaxPooling2D((2, 2)) (Contract3conv2)

Contract4conv1 = Conv2D(512, (3, 3), activation='relu', kernel_initializer=init) (Contract3pool)
Contract4conv2 = Conv2D(512, (3, 3), activation='relu', kernel_initializer=init) (Contract4conv1)
Contract4crop = Cropping2D(cropping=((4, 4), (4, 4)))(Contract4conv2)
Contract4pool = MaxPooling2D(pool_size=(2, 2)) (Contract4conv2)

Contract5conv1 = Conv2D(1024, (3, 3), activation='relu', kernel_initializer=init) (Contract4pool)
Contract5conv2 = Conv2D(1024, (3, 3), activation='relu', kernel_initializer=init) (Contract5conv1)

# transpose convolution and concatenate with contracting path
# followed by convolutions
Expand4convtrans = Conv2DTranspose(512, (2, 2), strides=(2, 2), padding='same', kernel_initializer=init) (Contract5conv2)
Expand4concat = concatenate([Expand4convtrans, Contract4crop])
Expand4conv1 = Conv2D(512, (3, 3), activation='relu', kernel_initializer=init) (Expand4concat)
Expand4conv2 = Conv2D(512, (3, 3), activation='relu', kernel_initializer=init) (Expand4conv1)

Expand3convtrans = Conv2DTranspose(256, (2, 2), strides=(2, 2), padding='same', kernel_initializer=init) (Expand4conv2)
Expand3concat = concatenate([Expand3convtrans, Contract3crop])
Expand3conv1 = Conv2D(256, (3, 3), activation='relu', kernel_initializer=init) (Expand3concat)
Expand3conv2 = Conv2D(256, (3, 3), activation='relu', kernel_initializer=init) (Expand3conv1)

Expand2convtrans = Conv2DTranspose(128, (2, 2), strides=(2, 2), padding='same', kernel_initializer=init) (Expand3conv2)
Expand2concat = concatenate([Expand2convtrans, Contract2crop])
Expand2conv1 = Conv2D(128, (3, 3), activation='relu', kernel_initializer=init) (Expand2concat)
Expand2conv2 = Conv2D(128, (3, 3), activation='relu', kernel_initializer=init) (Expand2conv1)

Expand1convtrans = Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same', kernel_initializer=init) (Expand2conv2)
Expand1concat = concatenate([Expand1convtrans, Contract1crop], axis=3)
Expand1conv1 = Conv2D(64, (3, 3), activation='relu', kernel_initializer=init) (Expand1concat)
Expand1conv2 = Conv2D(64, (3, 3), activation='relu', kernel_initializer=init) (Expand1conv1)

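# the paper uses a pixel-wise softmax over the two output channels; here each channel gets an
# independent sigmoid, which is not identical (a single sigmoid output would equal a two-class softmax)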
outputs = Conv2D(2, (1, 1), activation='sigmoid', kernel_initializer=init) (Expand1conv2)

unet = Model(inputs=inputs, outputs=outputs)
unet.compile(optimizer='adam', loss='binary_crossentropy', metrics=["accuracy"])
unet.summary()
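
The crop margins passed to `Cropping2D` above follow from simple bookkeeping: every unpadded 3x3 convolution removes 2 pixels per dimension, every 2x2 pooling halves the size, and every up-convolution doubles it. The following is a minimal sketch of that arithmetic in plain Python; the helper name `unet_sizes` is made up for illustration and is not part of Keras.

```python
def unet_sizes(input_size=572, depth=4):
    """Track the spatial size through a U-Net built from unpadded 3x3 convolutions.

    Returns the sizes of the skip-connection feature maps (top to bottom),
    the crop margin needed for each skip connection (bottom to top),
    and the final output size.
    """
    size = input_size
    skip_sizes = []
    for _ in range(depth):
        size -= 4                  # two valid 3x3 convolutions remove 2 pixels each
        skip_sizes.append(size)    # feature map that is copied to the expanding path
        size //= 2                 # 2x2 max pooling
    size -= 4                      # two convolutions at the bottom of the "U"

    crops = []
    for skip in reversed(skip_sizes):
        size *= 2                          # 2x2 up-convolution with stride 2
        crops.append((skip - size) // 2)   # symmetric crop so that the shapes match
        size -= 4                          # two valid 3x3 convolutions
    return skip_sizes, crops, size


skip_sizes, crops, output_size = unet_sizes()
print(skip_sizes)   # [568, 280, 136, 64]
print(crops)        # [4, 16, 40, 88]
print(output_size)  # 388
```

The crop margins 4, 16, 40 and 88 are the values used for `Contract4crop` up to `Contract1crop`, and 388 is the output size quoted in the paper.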

For a neat visualization, we use TensorBoard:

In [None]:
# get session graph (K.get_session and tf.summary.FileWriter are TensorFlow 1.x APIs)
graph = K.get_session().graph

# write to file
tb_path = "logs_unet/"
writer = tf.summary.FileWriter(logdir=tb_path, graph=graph)
K.clear_session()

In [None]:
# run tensorboard in shell -> interrupt kernel to stop
# you can click the link to see the graph of the network we built
!tensorboard --logdir=logs_unet --port=6006

If the above link does not work for some reason, you should find TensorBoard at [localhost:6006](http://localhost:6006).

## 2 AlexNet

Again, the aim of this exercise is to build a network. In this exercise you should implement the network which is discussed in: [ImageNet Classification with Deep Convolutional Neural Networks](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) (Krizhevsky, Sutskever, Hinton). The network architecture is summarised in Figure 2 of that paper and more detailed descriptions are found in the text.  
You only need to implement the architecture and check that your network is consistent. Note that you can check your results by  
a) checking your model is compiling in Keras and  
b) by comparing your `model.summary()` with the desired dimensions.

### Solution

The parallel structure of AlexNet is due to the fact that it was designed to run on two GPUs. If you don't have access to a GPU, you can use [Google Colab](https://colab.research.google.com). There you can choose to run on a backend with a GPU or TPU. In the menu, click "Runtime" and then "Change runtime type".  
Unfortunately we only have one GPU available there.

##### Simplified version with Batch Normalization

In [None]:
# Input layer
inputs = Input(shape=(224,224,3))

# 1st convolution block
conv1_up = Conv2D(48, kernel_size=11, strides=4, padding='same')(inputs)
batchnorm1_up = BatchNormalization(axis=-1)(conv1_up)
act1_up = Activation('relu')(batchnorm1_up)

conv1_down = Conv2D(48, kernel_size=11, strides=4, padding='same')(inputs)
batchnorm1_down = BatchNormalization(axis=-1)(conv1_down)
act1_down = Activation('relu')(batchnorm1_down)

maxpool1_up = MaxPooling2D(pool_size=3, strides=2)(act1_up)
maxpool1_down = MaxPooling2D(pool_size=3, strides=2)(act1_down)

# 2nd convolution block
conv2_up = Conv2D(128, kernel_size=5, strides=1, padding='same')(maxpool1_up)
batchnorm2_up = BatchNormalization(axis=-1)(conv2_up)
act2_up = Activation('relu')(batchnorm2_up)

conv2_down = Conv2D(128, kernel_size=5, strides=1, padding='same')(maxpool1_down)
batchnorm2_down = BatchNormalization(axis=-1)(conv2_down)
act2_down = Activation('relu')(batchnorm2_down)

maxpool2_up = MaxPooling2D(pool_size=3, strides=2)(act2_up)
maxpool2_down = MaxPooling2D(pool_size=3, strides=2)(act2_down)

# 3rd convolution block
merge3 = concatenate([maxpool2_up, maxpool2_down])
conv3_up = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(merge3)
conv3_down = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(merge3)

# 4th convolution block
conv4_up = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(conv3_up)
conv4_down = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(conv3_down)

# 5th convolution block
conv5_up = Conv2D(128, kernel_size=3, strides=1, padding='same', activation='relu')(conv4_up)
conv5_down = Conv2D(128, kernel_size=3, strides=1, padding='same', activation='relu')(conv4_down)
maxpool5_up = MaxPooling2D(pool_size=3, strides=2)(conv5_up)
maxpool5_down = MaxPooling2D(pool_size=3, strides=2)(conv5_down)

# Dense Layers 1st block (use dropout)
merge_dense1 = concatenate([maxpool5_up, maxpool5_down])
flatten_dense1 = Flatten()(merge_dense1)
dense1_up = Dense(2048, activation='relu')(flatten_dense1)
dense1_dropout_up = Dropout(rate=0.5)(dense1_up)
dense1_down = Dense(2048, activation='relu')(flatten_dense1)
dense1_dropout_down = Dropout(rate=0.5)(dense1_down)

# Dense Layers 2nd block (use dropout)
merge_dense2 = concatenate([dense1_dropout_up, dense1_dropout_down])
flatten_dense2 = Flatten()(merge_dense2)
dense2_up = Dense(2048, activation='relu')(flatten_dense2)
dense2_dropout_up = Dropout(rate=0.5)(dense2_up)
dense2_down = Dense(2048, activation='relu')(flatten_dense2)
dense2_dropout_down = Dropout(rate=0.5)(dense2_down)

# Softmax
merge_dense3 = concatenate([dense2_dropout_up, dense2_dropout_down])
flatten_dense3 = Flatten()(merge_dense3)
output = Dense(1000, activation='softmax')(flatten_dense3)

# Model
alex_net = Model(inputs=inputs, outputs=output)

# summarize layers
alex_net.compile(loss="categorical_crossentropy", optimizer='adam')
alex_net.summary()

As you can see, the output dimension after the first convolution ($56 \times 56$) does not match the $55 \times 55$ from the paper. Several references state that the $224$ is a typo in the paper and should be $227$. I am not sure whether this is true, as the authors quote the number in so many places in the paper. A quick fix that gives the right output dimensions is to change the padding of the first layer to `valid` and add a symmetric zero padding of 2. We cannot know for sure what the authors did without their pre-processed data or code.
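
The mismatch can be reproduced without building the model. The sketch below applies the usual output-size rules for a strided convolution (`same`: $\lceil n / s \rceil$, `valid`: $\lfloor (n - k) / s \rfloor + 1$); the helper `conv_output_size` and its `extra_pad` argument are hypothetical names used only for this illustration.

```python
import math


def conv_output_size(size, kernel, stride, padding="valid", extra_pad=0):
    """Spatial output size of a 2D convolution along one dimension.

    extra_pad is a symmetric zero padding applied before the convolution,
    mimicking a preceding ZeroPadding2D layer.
    """
    size = size + 2 * extra_pad
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1


print(conv_output_size(224, 11, 4, padding="same"))  # 56: first version above
print(conv_output_size(224, 11, 4))                  # 54: plain 'valid' convolution
print(conv_output_size(224, 11, 4, extra_pad=2))     # 55: 'valid' plus zero padding of 2
print(conv_output_size(227, 11, 4))                  # 55: the often-quoted 227 input
```

Both the zero-padding fix and a $227 \times 227$ input give the $55 \times 55$ feature maps, so the output shape alone cannot tell the two readings of the paper apart.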

In [None]:
# Input layer
inputs = Input(shape=(224,224,3))

# 1st convolution block
pad1_up = ZeroPadding2D(2)(inputs) # fix for output dimensions
conv1_up = Conv2D(48, kernel_size=11, strides=4, padding='valid')(pad1_up)
batchnorm1_up = BatchNormalization(axis=-1)(conv1_up)
act1_up = Activation('relu')(batchnorm1_up)

pad1_down = ZeroPadding2D(2)(inputs) # fix for output dimensions
conv1_down = Conv2D(48, kernel_size=11, strides=4, padding='valid')(pad1_down)
batchnorm1_down = BatchNormalization(axis=-1)(conv1_down)
act1_down = Activation('relu')(batchnorm1_down)

maxpool1_up = MaxPooling2D(pool_size=3, strides=2)(act1_up)
maxpool1_down = MaxPooling2D(pool_size=3, strides=2)(act1_down)

# 2nd convolution block
conv2_up = Conv2D(128, kernel_size=5, strides=1, padding='same')(maxpool1_up)
batchnorm2_up = BatchNormalization(axis=-1)(conv2_up)
act2_up = Activation('relu')(batchnorm2_up)

conv2_down = Conv2D(128, kernel_size=5, strides=1, padding='same')(maxpool1_down)
batchnorm2_down = BatchNormalization(axis=-1)(conv2_down)
act2_down = Activation('relu')(batchnorm2_down)

maxpool2_up = MaxPooling2D(pool_size=3, strides=2)(act2_up)
maxpool2_down = MaxPooling2D(pool_size=3, strides=2)(act2_down)

# 3rd convolution block
merge3 = concatenate([maxpool2_up, maxpool2_down])
conv3_up = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(merge3)
conv3_down = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(merge3)

# 4th convolution block
conv4_up = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(conv3_up)
conv4_down = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(conv3_down)

# 5th convolution block
conv5_up = Conv2D(128, kernel_size=3, strides=1, padding='same', activation='relu')(conv4_up)
conv5_down = Conv2D(128, kernel_size=3, strides=1, padding='same', activation='relu')(conv4_down)
maxpool5_up = MaxPooling2D(pool_size=3, strides=2)(conv5_up)
maxpool5_down = MaxPooling2D(pool_size=3, strides=2)(conv5_down)

# Dense Layers 1st block (use dropout)
merge_dense1 = concatenate([maxpool5_up, maxpool5_down])
flatten_dense1 = Flatten()(merge_dense1)
dense1_up = Dense(2048, activation='relu')(flatten_dense1)
dense1_dropout_up = Dropout(rate=0.5)(dense1_up)
dense1_down = Dense(2048, activation='relu')(flatten_dense1)
dense1_dropout_down = Dropout(rate=0.5)(dense1_down)

# Dense Layers 2nd block (use dropout)
merge_dense2 = concatenate([dense1_dropout_up, dense1_dropout_down])
flatten_dense2 = Flatten()(merge_dense2)
dense2_up = Dense(2048, activation='relu')(flatten_dense2)
dense2_dropout_up = Dropout(rate=0.5)(dense2_up)
dense2_down = Dense(2048, activation='relu')(flatten_dense2)
dense2_dropout_down = Dropout(rate=0.5)(dense2_down)

# Softmax
merge_dense3 = concatenate([dense2_dropout_up, dense2_dropout_down])
flatten_dense3 = Flatten()(merge_dense3)
output = Dense(1000, activation='softmax')(flatten_dense3)

# Model
alex_net = Model(inputs=inputs, outputs=output)

# summarize layers
alex_net.compile(loss="categorical_crossentropy", optimizer='adam')
alex_net.summary()

##### Version with Local Response Normalization (LRN)

First we have to implement the LRN (see e.g. [here](https://resources.oreilly.com/examples/9781787128422/blob/0e1be827d0179cc535da74957866ed87a4ea0224/DeepLearningwithKeras_Code/Chapter07/tf-keras-func.py))

In [None]:
from tensorflow.keras.layers import Layer, InputSpec

class LocalResponseNormalization(Layer):

    def __init__(self, n=5, alpha=0.0001, beta=0.75, k=2, **kwargs):
        self.n = n
        self.alpha = alpha
        self.beta = beta
        self.k = k
        super(LocalResponseNormalization, self).__init__(**kwargs)

    def build(self, input_shape):
        self.shape = input_shape
        super(LocalResponseNormalization, self).build(input_shape)

    def call(self, x, mask=None):
        if K.image_data_format() == "channels_first":
            _, f, r, c = self.shape
        else:
            _, r, c, f = self.shape
        squared = K.square(x)
        pooled = K.pool2d(squared, (self.n, self.n), strides=(1, 1),
                          padding="same", pool_mode="avg")
        if K.image_data_format() == "channels_first":
            summed = K.sum(pooled, axis=1, keepdims=True)
            averaged = self.alpha * K.repeat_elements(summed, f, axis=1)
        else:
            summed = K.sum(pooled, axis=3, keepdims=True)
            averaged = self.alpha * K.repeat_elements(summed, f, axis=3)
        denom = K.pow(self.k + averaged, self.beta)
        return x / denom

    def compute_output_shape(self, input_shape):
        return input_shape
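
For comparison, the AlexNet paper defines the normalization as a sum over $n$ adjacent kernel maps at the same spatial position, $b^i_{x,y} = a^i_{x,y} / \big(k + \alpha \sum_j (a^j_{x,y})^2\big)^\beta$, whereas the layer above averages the squares over an $n \times n$ spatial window and sums over all channels. The following NumPy sketch implements the across-channel formula for channels-last arrays; the function name `lrn_across_channels` is an assumption made for this illustration, and the code is meant as a reference for small arrays, not as a drop-in Keras layer.

```python
import numpy as np


def lrn_across_channels(a, n=5, alpha=1e-4, beta=0.75, k=2.0):
    """Local response normalization over n adjacent channels (channels last)."""
    num_channels = a.shape[-1]
    squared = a ** 2
    out = np.empty_like(a, dtype=float)
    half = n // 2
    for i in range(num_channels):
        # window of neighbouring channels, clipped at the borders
        lo = max(0, i - half)
        hi = min(num_channels - 1, i + half)
        window_sum = squared[..., lo:hi + 1].sum(axis=-1)
        out[..., i] = a[..., i] / (k + alpha * window_sum) ** beta
    return out


activations = np.ones((1, 2, 2, 8))
normalized = lrn_across_channels(activations)
print(normalized.shape)        # (1, 2, 2, 8)
print(normalized[0, 0, 0, 0])  # border channel: only 3 channels in the window
print(normalized[0, 0, 0, 3])  # interior channel: full window of 5 channels
```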

In [None]:
# Input layer
inputs = Input(shape=(224,224,3))

# 1st convolution block
pad1_up = ZeroPadding2D(2)(inputs)
conv1_up = Conv2D(48, kernel_size=11, strides=4, padding='valid', activation='relu')(pad1_up)
LRN1_up = LocalResponseNormalization()(conv1_up)

pad1_down = ZeroPadding2D(2)(inputs)
conv1_down = Conv2D(48, kernel_size=11, strides=4, padding='valid', activation='relu')(pad1_down)
LRN1_down = LocalResponseNormalization()(conv1_down)

maxpool1_up = MaxPooling2D(pool_size=3, strides=2)(LRN1_up)
maxpool1_down = MaxPooling2D(pool_size=3, strides=2)(LRN1_down)

# 2nd convolution block
conv2_up = Conv2D(128, kernel_size=5, strides=1, padding='same', activation='relu')(maxpool1_up)
LRN2_up = LocalResponseNormalization()(conv2_up)

conv2_down = Conv2D(128, kernel_size=5, strides=1, padding='same', activation='relu')(maxpool1_down)
LRN2_down = LocalResponseNormalization()(conv2_down)

maxpool2_up = MaxPooling2D(pool_size=3, strides=2)(LRN2_up)
maxpool2_down = MaxPooling2D(pool_size=3, strides=2)(LRN2_down)

# 3rd convolution block
merge3 = concatenate([maxpool2_up, maxpool2_down])
conv3_up = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(merge3)
conv3_down = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(merge3)

# 4th convolution block
conv4_up = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(conv3_up)
conv4_down = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(conv3_down)

# 5th convolution block
conv5_up = Conv2D(128, kernel_size=3, strides=1, padding='same', activation='relu')(conv4_up)
conv5_down = Conv2D(128, kernel_size=3, strides=1, padding='same', activation='relu')(conv4_down)
maxpool5_up = MaxPooling2D(pool_size=3, strides=2)(conv5_up)
maxpool5_down = MaxPooling2D(pool_size=3, strides=2)(conv5_down)

# Dense Layers 1st block (use dropout)
merge_dense1 = concatenate([maxpool5_up, maxpool5_down])
flatten_dense1 = Flatten()(merge_dense1)
dense1_up = Dense(2048, activation='relu')(flatten_dense1)
dense1_dropout_up = Dropout(rate=0.5)(dense1_up)
dense1_down = Dense(2048, activation='relu')(flatten_dense1)
dense1_dropout_down = Dropout(rate=0.5)(dense1_down)

# Dense Layers 2nd block (use dropout)
merge_dense2 = concatenate([dense1_dropout_up, dense1_dropout_down])
flatten_dense2 = Flatten()(merge_dense2)
dense2_up = Dense(2048, activation='relu')(flatten_dense2)
dense2_dropout_up = Dropout(rate=0.5)(dense2_up)
dense2_down = Dense(2048, activation='relu')(flatten_dense2)
dense2_dropout_down = Dropout(rate=0.5)(dense2_down)

# Softmax
merge_dense3 = concatenate([dense2_dropout_up, dense2_dropout_down])
flatten_dense3 = Flatten()(merge_dense3)
output = Dense(1000, activation='softmax')(flatten_dense3)

# Model
alex_net = Model(inputs=inputs, outputs=output)

# summarize layers
alex_net.compile(loss="categorical_crossentropy", optimizer='adam')
alex_net.summary()

In [None]:
# get session graph (K.get_session and tf.summary.FileWriter are TensorFlow 1.x APIs)
graph = K.get_session().graph

# write to files
tb_path = "logs_alexnet/"
writer = tf.summary.FileWriter(logdir=tb_path, graph=graph)
K.clear_session()

In [None]:
# run tensorboard in shell -> interrupt kernel to stop
# you can click the link to see the graph of the network we built
!tensorboard --logdir=logs_alexnet

If the above link does not work for some reason, you should find TensorBoard at [localhost:6006](http://localhost:6006).

## 3 PCA


In the lecture we discussed how to obtain the first principal component $d$ as the eigenvector corresponding to the largest eigenvalue of $X^T X$. Show, using induction, that in general the matrix $D$ is given by the $l$ eigenvectors corresponding to the $l$ largest eigenvalues.  
Generate a numerical example to compare your data transformation with the transformation given by the implementation of PCA in sklearn. For instance, you can use an example based on a 2D-Gaussian as presented in the lectures.

### Solution

The proof can be found for example [here](https://people.eecs.berkeley.edu/~satishr/cs270/sp11/rough-notes/PCA.pdf).
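
A sketch of the induction step, under the assumption that $D$ is defined as the minimizer of the reconstruction error $\|X - X D D^T\|_F^2$ subject to $D^T D = I$, and that the first $l-1$ columns are already fixed to the eigenvectors $v_1, \dots, v_{l-1}$ of $X^T X$ with the largest eigenvalues $\lambda_1 \geq \dots \geq \lambda_{l-1}$. Using $D^T D = I$, the error can be written as

$$
\|X - X D D^T\|_F^2 = \mathrm{Tr}(X^T X) - \mathrm{Tr}(D^T X^T X D) = \mathrm{Tr}(X^T X) - \sum_{i=1}^{l} d_i^T X^T X d_i ,
$$

so the remaining column $d_l$ has to maximize $d^T X^T X d$ subject to $\|d\| = 1$ and $d \perp v_1, \dots, v_{l-1}$. Expanding $d = \sum_{j \geq l} c_j v_j$ with $\sum_j c_j^2 = 1$ gives

$$
d^T X^T X d = \sum_{j \geq l} \lambda_j c_j^2 \leq \lambda_l ,
$$

with equality for $d = v_l$. Together with the base case $l = 1$ from the lecture, this yields the claim. This is only a sketch: a complete proof also has to justify that the optimal $l$-dimensional solution contains the optimal $(l-1)$-dimensional one.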

In [None]:
# Generate some 2D data:
mean = [0,0]
cov = [[2,3],[3,5]]
x, y = np.random.multivariate_normal(mean, cov, 500).T  

X=np.array([x, y]).T
print(X.shape)

plt.scatter(x, y, s=2, color='gray')
_ = plt.axis('equal')

In [None]:
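# Eigendecomposition of X^T X (note: X is not mean-centered here)
Sigma = np.dot(X.T, X)
# eigh is meant for symmetric matrices and returns the eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(Sigma)
print(eigenvalues)
# reverse the order so that the largest eigenvalue comes first;
# after the transpose, the rows of `eigenvectors` are the eigenvectors
eigenvalues = eigenvalues[::-1]
sqrteigenvalues = np.sqrt(eigenvalues)
eigenvectors = eigenvectors.T[::-1]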

In [None]:
# plot the eigenvectors
t = np.linspace(-.1,.1,100)
EV1X = t*eigenvectors[0,0]*sqrteigenvalues[0]
EV1Y = t*eigenvectors[0,1]*sqrteigenvalues[0]
EV2X = t*eigenvectors[1,0]*sqrteigenvalues[1]
EV2Y = t*eigenvectors[1,1]*sqrteigenvalues[1]

plt.scatter(x, y, s=1, color='gray')
plt.axis('equal')
plt.plot(EV1X,EV1Y,color='r')
plt.plot(EV2X,EV2Y,color='g')

In [None]:
pca = decomposition.PCA(n_components=2)
pca.fit(X)
print('Sklearn eigenvectors:')
print(pca.components_)
print('Our eigenvectors:')
print(eigenvectors)
print('Sklearn singular values:')
print(pca.singular_values_)
print('Our singular values:')
print(sqrteigenvalues)

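As we can see, the results agree only approximately. The reason is that sklearn centers the data (it subtracts the sample mean of each feature) before the decomposition, while we diagonalized $X^T X$ without centering; since the data were drawn with zero mean, the difference is small. Note that the overall sign of each eigenvector does not matter.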
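
The effect of centering can be checked with NumPy alone. The sketch below draws its own sample (the names `X_demo` and `Xc` and the fixed seed are choices made for this illustration), subtracts the sample mean, and compares the eigendecomposition of $X_c^T X_c$ with the singular value decomposition of $X_c$. `np.linalg.svd` is used here as a stand-in for sklearn's PCA, which is based on an SVD of the centered data.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.multivariate_normal([0, 0], [[2, 3], [3, 5]], size=500)

# center the data: subtract the sample mean of each feature
Xc = X_demo - X_demo.mean(axis=0)

# eigendecomposition of the symmetric matrix Xc^T Xc, sorted by decreasing eigenvalue
evals, evecs = np.linalg.eigh(Xc.T @ Xc)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

# singular value decomposition of the centered data
_, singular_values, Vt = np.linalg.svd(Xc, full_matrices=False)

# singular values are the square roots of the eigenvalues
print(np.allclose(np.sqrt(evals), singular_values))
# principal directions agree up to an overall sign per vector
print(np.allclose(np.abs(evecs.T), np.abs(Vt)))
```

With centering, the two approaches agree to numerical precision; the only remaining freedom is the sign of each direction.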