
![alt text](http://gmum.net/images/logo.jpg =300x150)

# Workshop Overview:



1. Image classification with a CNN
2. Multisize image classification with a classical CNN
3. Set Aggregation Network for multisize image classification



In [0]:
import tensorflow as tf
import random
import numpy as np
import matplotlib.pyplot as plt
import itertools
import cv2

from tensorflow.keras.datasets import cifar10
%matplotlib inline

## 1. Image classification

Before moving on to multisize image classification, we will first create and train a network for classical CIFAR-10 image classification. The CIFAR-10 training set consists of 50000 RGB images of size 32x32. The code written in this stage will be reused in the next sections, so treat it as groundwork for the multisize problem.



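**1.1. Load the data**  
The following cell loads the CIFAR-10 dataset (downloading it first if you have not done so yet), scales the pixel values to [0, 1], splits the original test set in half into a validation set and a test set, and transforms the labels into one-hot encodings.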

In [0]:
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

n_classes = 10

X_train = X_train / 255.
X_test = X_test / 255.

valid_size = len(X_test)//2
X_valid = X_test[:valid_size]
y_valid = y_test[:valid_size]
X_test = X_test[valid_size:]
y_test = y_test[valid_size:]


with tf.Session() as sess:
    y_train = tf.one_hot(y_train, n_classes).eval()
    y_test = tf.one_hot(y_test, n_classes).eval()
    y_valid = tf.one_hot(y_valid, n_classes).eval()
    
y_train = y_train.reshape(-1, n_classes)
y_test = y_test.reshape(-1, n_classes)
y_valid = y_valid.reshape(-1, n_classes)

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
print(X_valid.shape, y_valid.shape)
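
As a side note, the one-hot transformation does not strictly need a TensorFlow session. The sketch below is a NumPy-only alternative, not the workshop's method; the helper name `one_hot_numpy` is hypothetical. It flattens the `(N, 1)` label array first, so no extra reshape is needed afterwards.

```python
import numpy as np

def one_hot_numpy(labels, n_classes):
    """Return a (num_samples, n_classes) float32 one-hot matrix for integer labels."""
    labels = np.asarray(labels).reshape(-1)       # CIFAR-10 labels arrive with shape (N, 1)
    return np.eye(n_classes, dtype=np.float32)[labels]

example_labels = np.array([[3], [0], [9]])        # toy labels in the CIFAR-10 layout
encoded = one_hot_numpy(example_labels, n_classes=10)
print(encoded.shape)
```

Each row of `encoded` contains a single 1 at the index of the class, which is the same representation the cell above produces.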

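**1.2. Define auxiliary functions**  
The function below calculates the accuracy for a dataset, where:


*   `sess` - is the opened session with the trained model
*   `dataset` - is the dataset to evaluate on
*   `y_dataset` - are the one-hot labels for `dataset`
*   `correct_sum` - is the op that returns the number of correct predictions in a batch
*   `x_placeholder` - is the placeholder for the inputs
*   `batch_size` - is the size of one batch, defaults to 500

The labels are fed through the global placeholder `y`, which is defined later in the notebook.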

We will use this function later to evaluate our models.



In [0]:
def calculate_accuracy_batch(sess, dataset, y_dataset, correct_sum, x_placeholder, batch_size=500):
    validation_accuracy = 0.
    for j in list(range(0, dataset.shape[0], batch_size)):
        good_pred = sess.run(correct_sum, feed_dict={x_placeholder: dataset[j:(j+batch_size)], y: y_dataset[j:(j+batch_size)]})
        validation_accuracy += good_pred

    validation_accuracy /= dataset.shape[0]
    return validation_accuracy
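
The reason for summing correct predictions and dividing once at the end, instead of averaging per-batch accuracies, is that the last batch may be smaller than the others. A minimal NumPy sketch of the same idea, with hypothetical names and toy data:

```python
import numpy as np

def batched_accuracy(predicted, true, batch_size=500):
    """Accumulate the number of correct predictions batch by batch, then normalize once."""
    correct = 0
    for start in range(0, len(true), batch_size):
        stop = start + batch_size                 # slicing past the end is safe in NumPy
        correct += np.sum(predicted[start:stop] == true[start:stop])
    return correct / len(true)

rng = np.random.RandomState(0)
true = rng.randint(0, 10, size=1234)              # 1234 is deliberately not a multiple of 500
predicted = true.copy()
predicted[:234] = (predicted[:234] + 1) % 10      # make exactly 234 predictions wrong

acc = batched_accuracy(predicted, true)
print(acc)
```

The result matches the accuracy computed on the whole array at once, even though the third batch holds only 234 samples.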

**1.3. Classical Convolutional Neural Network as a baseline**  

In the following task you have to write a small CNN and train it on the CIFAR-10 dataset.
As the workshop time is limited, you should use a small architecture. We propose the following:

1. Convolutional layers:
    *   Convolutional layer (32 filters, 3x3 kernel, stride=1)
    *   Max pooling (2x2)
    *   Convolutional layer (64 filters, 3x3 kernel, stride=1)
    *   Max pooling (2x2)
    *   Convolutional layer (64 filters, 3x3 kernel, stride=1)
    *   Max pooling (2x2)
    
2. Flatten layers:
    *  One classical flatten layer.
    
3. Dense layers:
    *  Dense layer (128 units, ReLU as activation function)
    *  Output dense layer (`n_classes` units, no activation function). Instead of applying a softmax activation followed by a cross-entropy loss, return the logits here and use the [`softmax_cross_entropy_with_logits_v2`](https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits_v2) cost function, which is numerically more stable.
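
To see why working on logits is more stable, the NumPy sketch below compares a naive "softmax, then log" computation with a version that uses the log-sum-exp trick. It only illustrates the idea and does not claim to reproduce TensorFlow's internal implementation; both function names are hypothetical.

```python
import numpy as np

def naive_cross_entropy(logits, one_hot_labels):
    """Softmax followed by log: breaks down when exp() overflows."""
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=1, keepdims=True)
    return -np.sum(one_hot_labels * np.log(probs), axis=1)

def stable_cross_entropy(logits, one_hot_labels):
    """Cross-entropy computed directly from logits with the log-sum-exp trick."""
    shifted = logits - logits.max(axis=1, keepdims=True)    # largest logit becomes 0
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.sum(one_hot_labels * log_probs, axis=1)

extreme_logits = np.array([[1000.0, 0.0, -1000.0]])
labels = np.array([[1.0, 0.0, 0.0]])

with np.errstate(over='ignore', invalid='ignore', divide='ignore'):
    naive = naive_cross_entropy(extreme_logits, labels)     # exp(1000) overflows
stable = stable_cross_entropy(extreme_logits, labels)

moderate_logits = np.array([[2.0, 1.0, 0.1]])
with np.errstate(over='ignore', invalid='ignore', divide='ignore'):
    naive_moderate = naive_cross_entropy(moderate_logits, labels)
stable_moderate = stable_cross_entropy(moderate_logits, labels)
```

For moderate logits both versions agree; for the extreme logits the naive version is no longer a finite number, while the stable one still is.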



In [0]:
tf.reset_default_graph()

**1.4. Define the network architecture**

Please complete the following functions, which define the network layers, as described above. We suggest using the [tf.layers](https://www.tensorflow.org/api_docs/python/tf/layers) module.

You will probably need the following layers:

*  [conv2d](https://www.tensorflow.org/api_docs/python/tf/layers/conv2d)
*  [max_pooling2d](https://www.tensorflow.org/api_docs/python/tf/layers/max_pooling2d)
*  [flatten](https://www.tensorflow.org/api_docs/python/tf/layers/flatten)
*  [dense](https://www.tensorflow.org/api_docs/python/tf/layers/dense)

Please write the code in variable scopes.

In [0]:
def conv_layers(x):
    """
    Write the function which takes the RGB image as input and then passes it
    through convolutional and max_pooling layers, as described in section 1.3.
    """
    with tf.variable_scope('baseline_net', reuse=tf.AUTO_REUSE):
        x_conv = ##

    return x_conv
    
    
def flatten_layer(x):
    """
    Write the function for flattening the feature maps obtained from the convolutional layers. 
    """
    x_flat = ##
    return x_flat

  
def dense_layers(x):
    """
    Write the function which takes the flattened output of the convolutions and passes it
    through dense layers, as described in section 1.3.
    """
    with tf.variable_scope('baseline_net', reuse=tf.AUTO_REUSE):
        logits = ##
    return logits


def create_network_1(x):
    """
    This function is the pipeline for the image processing. The image is processed as follows:
    1. Apply the convolutional layers (function conv_layers).
    2. Flatten the image, before passing it to dense layers (function flatten_layer).
    3. Apply the dense layers to get the logits (function dense_layers).
    Return the logits. 
    """
    x_conv = conv_layers(x)
    x_flat = flatten_layer(x_conv)
    logits = dense_layers(x_flat)
    return logits

**1.5. Define the placeholders for training data**  
Remember about the proper shape for training images (in the CIFAR-10 dataset every image is a 32x32 RGB image). The labels are represented in one-hot encoding.

In [0]:
""" Define the input placeholder. """
x = ##

""" Define the true labels placeholder. """
y = ##

**1.6. Create the logits**  

In [0]:
""" Create the logits. """
logits = create_network_1(x)

**1.7. Define the loss function and optimizer**  

In this section you are asked to complete the loss function and optimization operation.  
*   As mentioned above, we will use the cross entropy loss. The loss is implemented in [`tf.nn.softmax_cross_entropy_with_logits_v2`](https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits_v2)
*   As the optimizer please use the [`Adam optimizer`](https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer) with learning rate `1e-3` and other parameters set to their default values.



In [0]:
""" Define the cross entropy loss function. Remember that for numerical
stability reasons we use the tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits) function. 
You should not apply the softmax function on logits yourself. """

cross_entropy = ##
cross_entropy = ##

learning_rate = 1e-3
""" Define the Adam optimizer with default parameters, that will minimize our cross_entropy loss function. """
train_step = ##

**1.8. Define the accuracy calculator**  

The ops below define the accuracy and the correct sum (the unnormalized accuracy, which will later be passed to the evaluation function).

In [0]:
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
correct_prediction = tf.cast(correct_prediction, tf.float32)

accuracy = tf.reduce_mean(correct_prediction)
correct_sum = tf.reduce_sum(correct_prediction)

**1.9. Train the Neural Network**

In this exercise you are asked to train and evaluate the CNN. Please complete the code for the train function, where:


*   `sess` - is the opened tf.Session; you may assume that the global variables have already been initialized.
*   `X_train`, `y_train` - are the training images and their one-hot labels.
*   `X_valid`, `y_valid` - are the validation images and labels. Using them is not obligatory, but together with the `calculate_accuracy_batch(sess, X_valid, y_valid, correct_sum, x)` function defined previously they let you monitor the network's score on out-of-train data.
*   `epoch_num` - is the number of epochs.
*   `batch_size` - is the size of one batch.


In [0]:
print_every = 1
def train(sess, X_train, X_valid, y_train, y_valid, epoch_num, batch_size):
    val_accs_baseline = []
    set_size = X_train.shape[0]
    for epoch in range(epoch_num):
        for i in range(0, set_size, batch_size):
            """
            Complete the training loop.
            """
            ##
        
        if epoch % print_every == 0:
            val_acc = calculate_accuracy_batch(sess, X_valid, y_valid, correct_sum, x)
            val_accs_baseline.append(val_acc)
            print('Epoch: {0:d}, validation accuracy: {1:.3f}'.format(epoch, val_acc))
        
        # shuffle the dataset
        perm = np.random.permutation(set_size)
        X_train = X_train[perm, :]
        y_train = y_train[perm, :]
        
    return val_accs_baseline
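
Two details of the loop above are easy to get wrong: the last mini-batch can be shorter than `batch_size`, and the images and labels must be shuffled with the same permutation. The NumPy sketch below illustrates both on toy data; `iterate_minibatches` is a hypothetical helper, and in the workshop loop each batch would additionally be passed to `sess.run` through a `feed_dict`.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size):
    """Yield consecutive (X_batch, y_batch) slices; the last batch may be smaller."""
    for start in range(0, X.shape[0], batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]

rng = np.random.RandomState(0)
X = np.arange(10).reshape(10, 1)         # toy inputs: sample i holds the value i
y = np.arange(10).reshape(10, 1) * 2     # toy labels: always twice the input

perm = rng.permutation(X.shape[0])       # one permutation, applied to both arrays
X_shuffled, y_shuffled = X[perm], y[perm]

batch_sizes = [len(x_batch) for x_batch, _ in iterate_minibatches(X_shuffled, y_shuffled, batch_size=4)]
print(batch_sizes)
```

Because both arrays are indexed with the same `perm`, every label still belongs to its image after shuffling.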

**1.10. Run the code**  
The cell below runs the train function and outputs the classification accuracy on the test set.

In [0]:
# some hyperparameters 
epoch_num = 10
batch_size = 64

with tf.Session() as sess:
    #initialize the session
    sess.run(tf.global_variables_initializer())
    baseline_accuracy = train(sess, X_train, X_valid, y_train, y_valid, epoch_num, batch_size)
    test_acc = calculate_accuracy_batch(sess, X_test, y_test, correct_sum, x)
    print('Final test accuracy: {0:.3f}'.format(test_acc))

**1.11. Plot the accuracy**

In [0]:
plt.figure(figsize=(9,7))

plt.plot(baseline_accuracy, label='Accuracy')
plt.ylim(0.2, 0.8)

plt.ylabel("Accuracy")
plt.xlabel("Epoch")

plt.legend(prop={'size': 15})
plt.show()

## 2. Multisize image classification: training and testing with multisized images using a ConvNet with global max pooling

Now we will move to the task of multisize image classification, where the input images may have different shapes. 
A common approach to this problem is to resize all the images to the same shape and process them together. Another solution is to train a separate network for each of the shapes. An alternative method relies on convolutional layers. However, although the convolutional layers can process inputs of any spatial size, the head that follows them (for example the layer right before the classifier) cannot. A possible solution to this problem is to use global average or global max pooling layers. In this section we will use the latter, i.e. global max pooling.
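
The key property of global max pooling is that its output size depends only on the number of channels, not on the spatial size of the feature maps. A small NumPy sketch with random stand-in feature maps (the shapes are illustrative assumptions, not the exact shapes produced by the network):

```python
import numpy as np

def global_max_pool(feature_maps):
    """Reduce (batch, height, width, channels) to (batch, channels) by a maximum over space."""
    return feature_maps.max(axis=(1, 2))

rng = np.random.RandomState(0)
small = rng.rand(2, 3, 3, 64)    # stand-in for feature maps of small images
large = rng.rand(2, 7, 7, 64)    # stand-in for feature maps of large images

pooled_small = global_max_pool(small)
pooled_large = global_max_pool(large)
print(pooled_small.shape, pooled_large.shape)
```

Both results have 64 features per sample, so the same dense layers can follow in either case. In TensorFlow the same reduction can be written as `tf.reduce_max(x, axis=[1, 2])`.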

In the following task you have to write a CNN that can handle multisize images. Then train it on the CIFAR-10 dataset. For the purpose of this workshop we will create an artificial multisize image dataset by resizing the CIFAR-10 dataset to 56x56 and 20x20.

The CNN architecture will be very similar to the one in section 1, so you may reuse your code. We propose the following architecture:

1. Convolutional layers:
    *   Convolutional layer (32 filters, 3x3 kernel, stride=1)
    *   Max pooling (2x2)
    *   Convolutional layer (64 filters, 3x3 kernel, stride=1)
    *   Max pooling (2x2)
    *   Convolutional layer (64 filters, 3x3 kernel, stride=1)
    *   Max pooling (2x2)
    
2. Flatten layers:
    * Global max pooling layer
    
3. Dense layers:
    * Dense layer (128 units, ReLU as activation function)
    * Dense layer (128 units, ReLU as activation function)
    * Output dense layer (`n_classes` units, no activation function). Instead of applying a softmax activation followed by a cross-entropy loss, return the logits here and use the `softmax_cross_entropy_with_logits_v2` cost function, which is numerically more stable.
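
It can help to trace the spatial size of the three input resolutions through the convolutional part. The sketch below is plain arithmetic under stated assumptions: pooling uses stride 2 without padding, and the convolution padding is a free choice (the workshop text does not fix it), so both `'same'` and `'valid'` are shown. The helper names are hypothetical.

```python
def output_size(size, kernel, stride, padding):
    """Output length along one spatial dimension for a convolution or pooling window."""
    if padding == 'same':
        return -(-size // stride)                     # ceil(size / stride)
    return max((size - kernel) // stride + 1, 0)      # 'valid'; 0 means the window no longer fits

def final_side(size, conv_padding):
    """Side length after three (3x3 conv, stride 1) + (2x2 max pool, stride 2) stages."""
    for _ in range(3):
        size = output_size(size, kernel=3, stride=1, padding=conv_padding)
        size = output_size(size, kernel=2, stride=2, padding='valid')
    return size

sizes_same = {side: final_side(side, 'same') for side in (20, 32, 56)}
sizes_valid = {side: final_side(side, 'valid') for side in (20, 32, 56)}
flattened_same = {side: s * s * 64 for side, s in sizes_same.items()}

print(sizes_same, sizes_valid, flattened_same)
```

Under these assumptions a plain flatten would give a different number of features for every input size, which is why a shared dense layer cannot follow it directly. The arithmetic also suggests that with `'valid'` convolutions the 20x20 input shrinks to nothing before the last pooling step, so `'same'` padding looks like the safer choice here.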

This task is a little harder, as now we have to classify images that do not share a single shape. This is cumbersome because, as you have seen earlier, we defined the placeholders with a fixed image shape. We can solve this problem in two different ways:

1. Dynamically shaped placeholders. This is the harder method and we won't use it here. However, if you have enough time, you can try to implement it on your own.
2. Using more placeholders for the input data. Assuming we know the shapes beforehand (as in this situation), we can specify a separate placeholder for each of the shapes.
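
The second approach amounts to keeping one array per image shape, so that each array can be fed to its own placeholder. In this workshop the data is already split by size, but for a mixed collection the grouping could be sketched as follows (a hypothetical helper on toy data, not part of the exercise):

```python
import numpy as np
from collections import defaultdict

def group_by_shape(images, labels):
    """Map each image shape to stacked arrays of the images and labels having that shape."""
    buckets = defaultdict(lambda: ([], []))
    for image, label in zip(images, labels):
        bucket_images, bucket_labels = buckets[image.shape]
        bucket_images.append(image)
        bucket_labels.append(label)
    return {shape: (np.stack(imgs), np.array(lbls)) for shape, (imgs, lbls) in buckets.items()}

images = [np.zeros((20, 20, 3)), np.zeros((32, 32, 3)), np.zeros((20, 20, 3)), np.zeros((56, 56, 3))]
labels = [1, 4, 7, 2]
grouped = group_by_shape(images, labels)

for shape, (imgs, lbls) in grouped.items():
    print(shape, imgs.shape, lbls)
```

Each entry of `grouped` is a regular, rectangular batch, which is what a fixed-shape placeholder requires.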


---



Comment:
We use an artificially resized CIFAR-10, as original multisize datasets are way too big for the purpose of this workshop. Moreover, implementing the upcoming models so that they use those datasets efficiently in TensorFlow is much more difficult.  

In [0]:
tf.reset_default_graph()

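**2.1. Create the dataset**  
The cell below builds the multisize dataset: the first third of the training set is resized to 56x56, the second third to 20x20, and the rest is kept at 32x32. The validation and test sets are provided in all three sizes.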


In [0]:
train_chunk_size = len(X_train)//3

X_train_res_56 = np.array(list(map(lambda x: cv2.resize(x, dsize=(56, 56), interpolation=cv2.INTER_CUBIC), X_train[:train_chunk_size])))
X_train_res_20 = np.array(list(map(lambda x: cv2.resize(x, dsize=(20, 20), interpolation=cv2.INTER_CUBIC), X_train[train_chunk_size:2*train_chunk_size])))

X_test_res_56 = np.array(list(map(lambda x: cv2.resize(x, dsize=(56, 56), interpolation=cv2.INTER_CUBIC), X_test)))
X_test_res_20 = np.array(list(map(lambda x: cv2.resize(x, dsize=(20, 20), interpolation=cv2.INTER_CUBIC), X_test)))

X_valid_res_56 = np.array(list(map(lambda x: cv2.resize(x, dsize=(56, 56), interpolation=cv2.INTER_CUBIC), X_valid)))
X_valid_res_20 = np.array(list(map(lambda x: cv2.resize(x, dsize=(20, 20), interpolation=cv2.INTER_CUBIC), X_valid)))

X_train_res_32 = X_train[2*train_chunk_size:]
X_test_res_32 = X_test
X_valid_res_32 = X_valid

y_train_56 = y_train[:train_chunk_size]
y_train_20 = y_train[train_chunk_size:2*train_chunk_size]
y_train_32 = y_train[2*train_chunk_size:]
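
A quick check of the slicing above, using indices instead of images: every training sample ends up in exactly one of the three resolutions, and the 32x32 part receives the two leftover samples because 50000 is not divisible by 3. The variable names below are illustrative.

```python
import numpy as np

indices = np.arange(50000)                # one index per CIFAR-10 training image
chunk = len(indices) // 3                 # same expression as train_chunk_size above

part_56 = indices[:chunk]                 # resized to 56x56
part_20 = indices[chunk:2 * chunk]        # resized to 20x20
part_32 = indices[2 * chunk:]             # kept at 32x32

print(len(part_56), len(part_20), len(part_32))
```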

**2.2. Define the network architecture**

Please complete the following functions. 
You will probably need the following layers:

*  [conv2d](https://www.tensorflow.org/api_docs/python/tf/layers/conv2d)
*  [max_pooling2d](https://www.tensorflow.org/api_docs/python/tf/layers/max_pooling2d)
*  [flatten](https://www.tensorflow.org/api_docs/python/tf/layers/flatten)
*  [dense](https://www.tensorflow.org/api_docs/python/tf/layers/dense)

Remember to write the code using variable scopes, as we want to share the weights between the networks built for the different input shapes.

In [0]:
def conv_layers(x):
    """
    Write the function which takes the RGB image as input and then passes it
    through convolutional and max_pooling layers. 
    """
    with tf.variable_scope('max_pool_net', reuse=tf.AUTO_REUSE):
        x_conv = ##
        
    return x_conv
    
    
def dense_layers(x):
    """
    Write the function which takes the output of global max pooling
    and then passes it through dense layers.
    """
    with tf.variable_scope('max_pool_net', reuse=tf.AUTO_REUSE):
        # The output layer is sketched below; the two hidden dense layers are left for you to add.
        logits = tf.layers.dense(inputs=x, units=n_classes, name='FC_3') ##
        
    return logits

    
def image_max_pooling(x):
    """
    Write the function for global max_pooling over the filters.
    """
    x_flat = ##
    return x_flat


def create_network_2(x):
    """
    This function is the pipeline for the image processing. The image is processed as follows:
    1. Apply the convolutional layers.
    2. Use global max pooling on each filter of the returned image and then flatten it, before passing it to dense layers.
    3. Apply dense layers to get the logits.
    Return the logits.
    """
    x_conv = conv_layers(x)
    x_flat = image_max_pooling(x_conv)
    logits = dense_layers(x_flat)
    return logits

**2.3. Define the placeholders for the training data**  
Remember about the proper shape for training images (in the CIFAR-10 dataset every original image is a 32x32 RGB image). The labels are represented using one-hot encoding. We also have to handle images with shapes 20x20 and 56x56.

In [0]:
""" Define the input placeholder. """
x = ##

""" Define the true labels placeholder. """
y = ##

""" You can consider using more placeholders, e.g. one per image size.
    The cells below expect them to be named x_56 and x_20. """

**2.4. Define the logits as the output from our CNN**

Create the logits using the function from section 2.2.

In [0]:
logits = create_network_2(x)

""" If you created more placeholders, you have to handle them somehow.
    Maybe this is the time and place. """
logits_20 = ##
logits_56 = ##

**2.5. Define the loss function and optimizer**

In this section you are asked to complete the loss function and optimization operation.  
*   As mentioned above, we will use the cross entropy loss. The loss is implemented in [`tf.nn.softmax_cross_entropy_with_logits_v2`](https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits_v2)
*   As the optimizer please use the [`Adam optimizer`](https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer) with learning rate `1e-3` and other parameters set to their default values.

In [0]:
""" Define the cross entropy loss function """
cross_entropy = ##


""" Define the Adam optimizer with default parameters, that will minimize our cross_entropy loss function. """
train_step = ##

**2.6. Define the accuracy calculator**

The functions below define the accuracy and correct_sum operations for evaluating our model. 

In [0]:
""" Create a vector that tells us whether the predictions from our net (logits)
    are equal to the correct class labels (y). """
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
correct_prediction = tf.cast(correct_prediction, tf.float32)

""" Calculate the accuracy of the predictions.
    You should also calculate the number of correct predictions, as this is required by the calculate_accuracy_batch function. """
accuracy = tf.reduce_mean(correct_prediction)
correct_sum = tf.reduce_sum(correct_prediction)


""" It is also useful to calculate the number of correct predictions and the accuracy
    separately for images of different sizes. """
# Correct predictions
correct_prediction_56 = tf.equal(tf.argmax(logits_56, 1), tf.argmax(y, 1))
correct_prediction_56 = tf.cast(correct_prediction_56, tf.float32)

correct_prediction_20 = tf.equal(tf.argmax(logits_20, 1), tf.argmax(y, 1))
correct_prediction_20 = tf.cast(correct_prediction_20, tf.float32)

# Accuracies
accuracy_56 = tf.reduce_mean(correct_prediction_56)
accuracy_20 = tf.reduce_mean(correct_prediction_20)

correct_sum_56 = tf.reduce_sum(correct_prediction_56)
correct_sum_20 = tf.reduce_sum(correct_prediction_20)

**2.7. Train our Neural Network**  

In this exercise you are asked to train and evaluate the CNN. Please complete the code for the train function, where:

*   `sess` - is the opened tf.Session; you may assume that the global variables have already been initialized.
*   `X_train`, `y_train` - are the 32x32 training images and their one-hot labels.
*   `X_valid`, `y_valid` - are the 32x32 validation images and labels. Using them is not obligatory, but together with the `calculate_accuracy_batch(sess, X_valid, y_valid, correct_sum, x)` function defined previously they let you monitor the network's score on out-of-train data.
*   `epoch_num` - is the number of epochs.
*   `batch_size` - is the size of one batch.
*   `print_every` - is the number of epochs between two validation reports.

The 56x56 and 20x20 training data are read from the global variables `X_train_res_56`, `y_train_56`, `X_train_res_20` and `y_train_20`. The function trains and is evaluated on data of size 32x32, 56x56 and 20x20.

In [0]:
def train(sess, X_train, X_valid, y_train, y_valid, epoch_num, batch_size, print_every=1):
    val_accs_pooling = []
    val_accs_56 = []
    val_accs_20 = []
    set_size = X_train.shape[0]
    global X_train_res_56, y_train_56
    global X_train_res_20, y_train_20
    for epoch in range(epoch_num):
        for i in range(0, set_size, batch_size):
            """
            Complete the training loop.
            """
            ##
        
        if epoch % print_every == 0:
            val_acc = calculate_accuracy_batch(sess, X_valid, y_valid, correct_sum, x)
            acc_56 = calculate_accuracy_batch(sess, X_valid_res_56, y_valid, correct_sum_56, x_56)
            acc_20 = calculate_accuracy_batch(sess, X_valid_res_20, y_valid, correct_sum_20, x_20)
            
            val_accs_pooling.append(val_acc)
            val_accs_56.append(acc_56)
            val_accs_20.append(acc_20)           

            print('Epoch: {0:d}, ACC: {1:.3f}, ACC (56): {2:.3f}, ACC (20): {3:.3f}'.format(epoch, val_acc, acc_56, acc_20))
        
        # shuffle the dataset
        perm = np.random.permutation(set_size)
        X_train = X_train[perm, :]
        y_train = y_train[perm, :]
        perm = np.random.permutation(len(X_train_res_56))
        X_train_res_56 = X_train_res_56[perm, :]
        y_train_56 = y_train_56[perm, :]
        perm = np.random.permutation(len(X_train_res_20))
        X_train_res_20 = X_train_res_20[perm, :]
        y_train_20 = y_train_20[perm, :]

        
    max_pool_accuracy = {}
    max_pool_accuracy['Acc'] = val_accs_pooling
    max_pool_accuracy['Acc_20'] = val_accs_20
    max_pool_accuracy['Acc_56'] = val_accs_56
    
    return max_pool_accuracy

**2.8. Run the code**  
Run the training using the train function from section 2.7 and report the accuracy on the test set for each image size.

In [0]:
epoch_num = 15
batch_size = 64

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    max_pool_accuracy = train(sess, X_train_res_32, X_valid_res_32, y_train_32, y_valid, epoch_num, batch_size) 
    test_acc = calculate_accuracy_batch(sess, X_test_res_32, y_test, correct_sum, x)
    test_acc_56 = calculate_accuracy_batch(sess, X_test_res_56, y_test, correct_sum_56, x_56)
    test_acc_20 = calculate_accuracy_batch(sess, X_test_res_20, y_test, correct_sum_20, x_20)
    print("Final test accuracy: Original: {0:.3f}, 56x56: {1:.3f}, 20x20: {2:.3f}".format(test_acc, test_acc_56, test_acc_20))

**2.9. Plot the accuracy**

In [0]:
plt.figure(figsize=(9,7))

plt.plot(max_pool_accuracy['Acc'], label='Accuracy (original)')
plt.plot(max_pool_accuracy['Acc_20'], label='Accuracy (20x20)')
plt.plot(max_pool_accuracy['Acc_56'], label='Accuracy (56x56)')
plt.ylim(0.2, 0.8)

plt.ylabel("Accuracy")
plt.xlabel("Epoch")


plt.legend(prop={'size': 15})
plt.show()

## 3. Set Aggregation Network for multisize image classification

In this section we will use the Set Aggregation Network (SAN) instead of global max pooling. The architecture of the model will look like this:

1. Convolutional layers:
    *   Convolutional layer (32 filters, 5x5 kernel, strides=1)
    *   Max pooling (2x2)
    *   Convolutional layer (32 filters, 5x5 kernel, strides=1)
    *   Max pooling (2x2)
    
2. Flatten layers:
    * SAN layer (projection dim=128)
    
3. Dense layers:
    * Dense layer (128 units, ReLU as activation function)
    * Output dense layer (`n_classes` units, no activation function). Instead of applying a softmax activation followed by a cross-entropy loss, we return the logits here and use the `softmax_cross_entropy_with_logits_v2` cost function, which is numerically more stable.
    
The SAN layer takes a batch of sets as input and outputs a batch of fixed-size representations of those sets. You will have to preprocess the output of the convolutional layers before feeding it to SAN. Every image (a 3-dimensional feature map) has to be represented as a set of vectors, one per pixel, each taken across the feature map channels:

$$X_{feature\ map} = \{ v_{(1,1)}, ..., v_{(m,n)} \}$$
where $v_{(i,j)}$ is the vector representation of the pixel along all channels at position $(i,j)$, composed of:

$$v_{(i,j)} = (normalized(i), normalized(j), pixel^{(i,j)}_1, ..., pixel^{(i,j)}_k),$$

where $pixel^{(i,j)}_l$ is the pixel at position $(i,j)$ in channel $l$.  

The position of the pixel in the feature map is normalized to fall in $(-1,1]$:

$$normalized(i) = \frac{2i}{m}-1$$  
$$normalized(j) = \frac{2j}{n}-1$$
<br>
<br>
![](https://ww2.ii.uj.edu.pl/~z1101353/Feature_Map.png)
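
Note that the `iter_function` provided later in this section normalizes positions with `np.linspace(-1, 1, num)`, which is a slightly different spacing than the $\frac{2i}{m}-1$ formula. The sketch below compares the two for a hypothetical feature-map height `m = 5`, assuming one-based indices in the formula:

```python
import numpy as np

m = 5                                                  # hypothetical feature-map height

formula_coords = 2 * np.arange(1, m + 1) / m - 1       # 2i/m - 1 for i = 1..m
linspace_coords = np.linspace(-1.0, 1.0, num=m)        # spacing used by iter_function

print(formula_coords)
print(linspace_coords)
```

Both variants map positions monotonically into a fixed interval and end at 1; the `linspace` version additionally starts exactly at -1. Either should serve the purpose of making positions comparable across feature maps of different sizes.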
<br>
<br>

Then the normalized image is processed by the SAN layer, which is given by:

$$SAN(X_{feature\ map}) = \sum_{v \in X_{feature\ map}} ReLU(w^{T}v + b)$$ 

In the equation above you may use the mean instead of sum. 
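
A NumPy sketch of the SAN forward pass may make the formula more concrete. It assumes that $w$ is a matrix of shape `(features, proj_dim)`, so that the output has `proj_dim` components; the function name, the random weights and the set sizes are illustrative assumptions, not the workshop's reference solution.

```python
import numpy as np

def san_forward(sets, weights, bias, reduce='sum'):
    """
    sets:    (batch, set_size, features) array, one set of vectors per sample
    weights: (features, proj_dim) projection matrix
    bias:    (proj_dim,) bias vector
    Returns a (batch, proj_dim) array whose shape does not depend on set_size.
    """
    projected = np.maximum(sets @ weights + bias, 0.0)   # ReLU(w^T v + b) for every v
    if reduce == 'mean':
        return projected.mean(axis=1)
    return projected.sum(axis=1)

rng = np.random.RandomState(0)
features, proj_dim = 34, 128                  # e.g. 2 coordinates + 32 channels
weights = rng.randn(features, proj_dim)
bias = rng.randn(proj_dim)

small_sets = rng.randn(4, 25, features)       # 4 samples, 25 vectors each
large_sets = rng.randn(4, 169, features)      # 4 samples, 169 vectors each

out_small = san_forward(small_sets, weights, bias)
out_large = san_forward(large_sets, weights, bias)
print(out_small.shape, out_large.shape)
```

Since the aggregation is a sum (or mean) over the set elements, the result has the same shape for any set size and does not change when the vectors of a set are reordered.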



In [0]:
tf.reset_default_graph()

**3.1. Define the network architecture**


Please complete the following functions. 
You will probably need the following layers and functions:

*  [conv2d](https://www.tensorflow.org/api_docs/python/tf/layers/conv2d)
*  [conv1d](https://www.tensorflow.org/api_docs/python/tf/layers/conv1d) or  [tensordot](https://www.tensorflow.org/api_docs/python/tf/tensordot) for implementing the SAN layer. 
*  [max_pooling2d](https://www.tensorflow.org/api_docs/python/tf/layers/max_pooling2d)
*  [flatten](https://www.tensorflow.org/api_docs/python/tf/layers/flatten)
*  [dense](https://www.tensorflow.org/api_docs/python/tf/layers/dense)
*  [tf.map_fn](https://www.tensorflow.org/api_docs/python/tf/map_fn)

Please write the code using variable scopes. 

In [0]:
def conv_layers(x):
    """
    Write the function which takes the RGB image as input and then passes it
    through convolutional and max_pooling layers. 
    """
    with tf.variable_scope('san_net', reuse=tf.AUTO_REUSE):
        x_conv = ##
    return x_conv
    
    
def dense_layers(x):
    """
    Write the function which takes the output from SAN layer and processes it through dense layers.
    """
    with tf.variable_scope('san_net', reuse=tf.AUTO_REUSE):
        logits = ##
        
    return logits
    
    
def san_layer(x, proj_dim):
    """
    Write the function which takes the preprocessed, normalized feature maps returned from convolutional layers
    and passes it through the SAN layer.
    """
    with tf.variable_scope('san_net', reuse=tf.AUTO_REUSE):
        x_reduced = ##
    return x_reduced


def iter_function(img): 
    x_shape = img.shape[0].value 
    y_shape = img.shape[1].value 
    
    x_lim = np.arange(x_shape, dtype=np.int32) 
    y_lim = np.arange(y_shape, dtype=np.int32) 
    
    prod_ = list(itertools.product(x_lim, y_lim)) 
    img_values = tf.gather_nd(img, prod_) 
    
    x_lim_norm = np.linspace(start=-1., stop=1., num=x_shape) 
    y_lim_norm = np.linspace(start=-1., stop=1., num=y_shape) 
    
    prod_normalized = np.array(list(itertools.product(x_lim_norm, y_lim_norm))) 
    
    return tf.concat([prod_normalized, img_values], axis=1) 
    
    
def prepare_batch(batch_x):
    """
    Write the function which takes the feature maps returned from convolutional layers and 
    performs the SAN preprocessing.
    You can preprocess every single feature map separately, which is done by the `iter_function`
    above. To do so, apply the tf.map_fn along the samples from batch_x. 
    """
    x_prepared = ##
    return x_prepared
    

def create_network_3(x, proj_dim=128):
    """
    This function is the pipeline for the image processing. The image is processed as follows:
    1. Apply the convolutional layers.
    2. Prepare the conv nets output to be able to pass it through the SAN layer.
    3. Use the SAN layer, before passing the image to dense layers.
    4. Apply dense layers to get the logits.
    """
    x_conv = conv_layers(x)
    x_prepared = prepare_batch(x_conv)
    x_flat = san_layer(x_prepared, proj_dim)
    logits = dense_layers(x_flat)
    return logits
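
To check your understanding of what `iter_function` produces for a single feature map, here is a NumPy analogue on a tiny array. It is a sketch for illustration only (the name `feature_map_to_set` is hypothetical) and relies on the fact that `itertools.product` and a row-major `reshape` enumerate positions in the same order.

```python
import itertools
import numpy as np

def feature_map_to_set(feature_map):
    """Turn an (m, n, k) feature map into an (m * n, 2 + k) array of [row, col, channels...]."""
    m, n, k = feature_map.shape
    rows = np.linspace(-1.0, 1.0, num=m)
    cols = np.linspace(-1.0, 1.0, num=n)
    coords = np.array(list(itertools.product(rows, cols)))   # row-major order of positions
    pixels = feature_map.reshape(m * n, k)                    # same row-major order
    return np.concatenate([coords, pixels], axis=1)

feature_map = np.arange(2 * 3 * 4, dtype=np.float64).reshape(2, 3, 4)
as_set = feature_map_to_set(feature_map)
print(as_set.shape)
```

Every row of `as_set` starts with the two normalized coordinates and continues with the channel values at that position, which is the vector $v_{(i,j)}$ described earlier.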

**3.2. Define the placeholders for the training data**

This code is the same as in section 2.3, so you may just copy it.

In [0]:
""" Define the input placeholder. """
x = ##

""" Define the true labels placeholder. """
y = ##

""" You can consider using more placeholders, e.g. one per image size.
    The cells below expect them to be named x_56 and x_20. """

**3.3. Define the logits as the output from our CNN**

Create the logits using the function from section 3.1.

In [0]:
logits = create_network_3(x)

""" If you created more placeholders, you have to handle them somehow.
    Maybe this is the time and place. The cells below expect logits_56 and logits_20. """

**3.4. Define the loss and optimizer**

In this section you are asked to complete the loss function and optimization operation.  
*   As mentioned above, we will use the cross entropy loss. The loss is implemented in [`tf.nn.softmax_cross_entropy_with_logits_v2`](https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits_v2)
*   As the optimizer please use the [`Adam optimizer`](https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer) with learning rate `1e-3` and other parameters set to their default values.

In [0]:
""" Define the cross entropy loss function """
cross_entropy = ##


""" Define the Adam optimizer with default parameters, that will minimize our cross_entropy loss function. """
train_step = ##

**3.5. Define the accuracy calculator**  

The functions below define the accuracy and correct_sum operations for evaluating our model. 


In [0]:
""" Create a vector that tells us whether the predictions from our net (logits)
    are equal to the correct class labels (y). """
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
correct_prediction = tf.cast(correct_prediction, tf.float32)

""" Calculate the accuracy of the predictions.
    You should also calculate the number of correct predictions, as this is required by the calculate_accuracy_batch function. """
accuracy = tf.reduce_mean(correct_prediction)
correct_sum = tf.reduce_sum(correct_prediction)


""" It will be useful to calculate the number of correct predictions and the accuracy
    also for images of other sizes. """
# Correct predictions
correct_prediction_56 = tf.equal(tf.argmax(logits_56, 1), tf.argmax(y, 1))
correct_prediction_56 = tf.cast(correct_prediction_56, tf.float32)

correct_prediction_20 = tf.equal(tf.argmax(logits_20, 1), tf.argmax(y, 1))
correct_prediction_20 = tf.cast(correct_prediction_20, tf.float32)

# Accuracies
accuracy_56 = tf.reduce_mean(correct_prediction_56) 
accuracy_20 = tf.reduce_mean(correct_prediction_20) 

correct_sum_56 = tf.reduce_sum(correct_prediction_56)
correct_sum_20 = tf.reduce_sum(correct_prediction_20) 

**3.6 Train our Neural Network**  

In this exercise you are asked to train and evaluate the CNN. Please complete the code for the train function, where:

*   `sess` - is the opened tf.Session; you may assume that the global variables have already been initialized.
*   `X_train`, `y_train` - are the 32x32 training images and their one-hot labels.
*   `X_valid`, `y_valid` - are the 32x32 validation images and labels. Using them is not obligatory, but together with the `calculate_accuracy_batch(sess, X_valid, y_valid, correct_sum, x)` function defined previously they let you monitor the network's score on out-of-train data.
*   `epoch_num` - is the number of epochs.
*   `batch_size` - is the size of one batch.
*   `print_every` - is the number of epochs between two validation reports.

The 56x56 and 20x20 training data are read from the global variables `X_train_res_56`, `y_train_56`, `X_train_res_20` and `y_train_20`. The function trains on data of size 32x32, 56x56 and 20x20.

In [0]:
def train(sess, X_train, X_valid, y_train, y_valid, epoch_num, batch_size, print_every=1):
    val_accs_pooling = []
    val_accs_56 = []
    val_accs_20 = []
    set_size = X_train.shape[0]
    global X_train_res_56, y_train_56
    global X_train_res_20, y_train_20
    for epoch in range(epoch_num):
        for i in range(0, set_size, batch_size):
            """
            Complete the training loop.
            """
            ##
        
        if epoch % print_every == 0:
            val_acc = calculate_accuracy_batch(sess, X_valid, y_valid, correct_sum, x)
            acc_56 = calculate_accuracy_batch(sess, X_valid_res_56, y_valid, correct_sum_56, x_56)
            acc_20 = calculate_accuracy_batch(sess, X_valid_res_20, y_valid, correct_sum_20, x_20)
            
            val_accs_pooling.append(val_acc)
            val_accs_56.append(acc_56)
            val_accs_20.append(acc_20)
            
            print('Epoch: {0:d}, ACC: {1:.3f}, ACC (56): {2:.3f}, ACC (20): {3:.3f}'.format(epoch, val_acc, acc_56, acc_20))
        
        # shuffle the dataset
        perm = np.random.permutation(set_size)
        X_train = X_train[perm, :]
        y_train = y_train[perm, :]
        perm = np.random.permutation(len(X_train_res_56))
        X_train_res_56 = X_train_res_56[perm, :]
        y_train_56 = y_train_56[perm, :]
        perm = np.random.permutation(len(X_train_res_20))
        X_train_res_20 = X_train_res_20[perm, :]
        y_train_20 = y_train_20[perm, :]
        
        
    san_accuracy = {}
    san_accuracy['Acc'] = val_accs_pooling
    san_accuracy['Acc_20'] = val_accs_20
    san_accuracy['Acc_56'] = val_accs_56
    
    return san_accuracy

**3.7. Run the code**  
Run the training using the train function from section 3.6 and report the accuracy on the test set for each image size.

In [0]:
epoch_num = 15
batch_size = 64

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    san_accuracy = train(sess, X_train_res_32, X_valid_res_32, y_train_32, y_valid, epoch_num, batch_size)
    test_acc = calculate_accuracy_batch(sess, X_test_res_32, y_test , correct_sum, x)
    test_acc_56 = calculate_accuracy_batch(sess, X_test_res_56, y_test, correct_sum_56, x_56)
    test_acc_20 = calculate_accuracy_batch(sess, X_test_res_20, y_test,correct_sum_20, x_20)
    print("Final test accuracy: Original: {0:.3f}, 56x56: {1:.3f}, 20x20: {2:.3f}".format(test_acc, test_acc_56, test_acc_20))

**3.8. Plot the accuracy**

In [0]:
plt.figure(figsize=(9,7))

plt.plot(san_accuracy['Acc'], label='Accuracy (original)')
plt.plot(san_accuracy['Acc_20'], label='Accuracy (20x20)')
plt.plot(san_accuracy['Acc_56'], label='Accuracy (56x56)')
plt.ylim(0.2, 0.8)

plt.ylabel("Accuracy")
plt.xlabel("Epoch")


plt.legend(prop={'size': 15})
plt.show()

**3.9. Compare the accuracies of the CNN with global max pooling and the CNN with a SAN layer**

In [0]:
plt.figure(figsize=(12,10))

plt.plot(san_accuracy['Acc'], label='Accuracy - SAN (original)')
plt.plot(san_accuracy['Acc_56'], label='Accuracy - SAN (56x56)')
plt.plot(san_accuracy['Acc_20'], label='Accuracy - SAN (20x20)')

plt.plot(max_pool_accuracy['Acc'], label='Accuracy - max pooling (original)')
plt.plot(max_pool_accuracy['Acc_56'], label='Accuracy - max pooling  (56x56)')
plt.plot(max_pool_accuracy['Acc_20'], label='Accuracy - max pooling  (20x20)')
plt.ylim(0.2, 0.8)

plt.legend(prop={'size': 15})
plt.show()