<img src="https://www.th-koeln.de/img/logo.svg" style="float: right;" width="200">

# 7th exercise: <font color="#C70039">MNIST Classification with a Convolutional Neural Network</font>
* Course: DIS21a.1
* Lecturer: <a href="https://www.gernotheisenberg.de/">Gernot Heisenberg</a>
* Author of notebook modifications and adaptations: <a href="https://www.gernotheisenberg.de/">Gernot Heisenberg</a>
* Date:   08.08.2023

<img src="https://miro.medium.com/max/1250/1*vkQ0hXDaQv57sALXAJquxA.jpeg" style="float: center;" width="600">

---------------------------------
**GENERAL NOTE 1**: 
Please make sure you read the entire notebook, since it contains a lot of information about your tasks (e.g. regarding the setting of certain parameters or specific computational tricks). The markdown text and the code comments also explain how things work together as a whole.

**GENERAL NOTE 2**: 
* When commenting source code, please use English only.
* When describing an observation (for instance, after you have run through your test plan), you may use German.

This applies to all exercises in DIS 21a.1.

---------------------

### <font color="ce33ff">DESCRIPTION</font>:

This notebook teaches you how to set up and use convolutional neural networks (CNNs) effectively.
For this purpose, the MNIST digits from the earlier exercise are classified again.
With the densely (fully) connected ANN from that exercise, you achieved a test accuracy of about 97%.
Even though the CNN here is very basic, its accuracy will clearly exceed that of the densely connected ANN.

This notebook shows in a few lines of code what a basic CNN looks like.
Basically, it is a simple stack of `Conv2D` and `MaxPooling2D` layers. 

-------------------------------------------------------------------------------------------------------------

### <font color="FFC300">TASKS</font>:
Within this notebook, the tasks that you need to work on are always listed as bullet points below. 
If a task is more challenging and consists of several steps, this is indicated as well. 
Make sure you have worked through the task list and documented what you did.
This should be done using markdown.<br> 
<font color=red>Make sure you don't forget to specify your name and your matriculation number in the notebook before submitting it.</font>

**YOUR TASKS in this exercise are as follows**:
1. import the notebook to Google Colab.
2. make sure you have specified your name and your matriculation number in the header, below my name and date.
    * set the date too and remove mine.
3. read the entire notebook carefully. 
    * for better understanding, add comments wherever you feel it is necessary.
    * run the notebook for the first time and note the result in a markdown table. 
        * I have provided you with an example of a table in markdown (see below). Make sure you adapt your table accordingly. 
        * Put the table at the end of the notebook. 
        * This type of table will be needed in the other exercises as well. Always put it at the end.
    
| type of method | loss function | optimizer | accuracy |
| :-: | :-: | :-: | :-: |
| classification | categorical_crossentropy | bamm ! | .666 |

4. write the code for 'loading and preparing the MNIST data set'.
5. write the code for 'training the CNN'.
    * epochs=5
    * batch_size=64 
6. write the code for 'evaluating the CNN', initially with the given hyperparameters. 

7. compare the result to the one you obtained with the densely connected ANN and compute the increase in accuracy in percent.

8. write a test plan for testing other hyperparameters. Store the values (including the one from task 7) in a table like the one above.
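
One way to organise such a test plan is to enumerate the hyperparameter combinations up front and emit one table row per planned run. The sketch below is only an illustration: the candidate values, the `plan` list, and the `to_markdown_row` helper are assumptions made here, not part of the exercise, and the accuracy column stays empty until you have actually run the experiment.

```python
from itertools import product

# Hypothetical candidate values; choose your own for the real test plan.
optimizers = ["rmsprop", "adam"]
epochs_options = [5, 10]
batch_sizes = [64, 128]

# One dictionary per planned run, covering every combination.
plan = [
    {"optimizer": opt, "epochs": ep, "batch_size": bs, "accuracy": None}
    for opt, ep, bs in product(optimizers, epochs_options, batch_sizes)
]

def to_markdown_row(run):
    """Format one planned run as a markdown table row (hypothetical helper)."""
    acc = "" if run["accuracy"] is None else f"{run['accuracy']:.4f}"
    return f"| {run['optimizer']} | {run['epochs']} | {run['batch_size']} | {acc} |"

header = "| optimizer | epochs | batch size | accuracy |\n| :-: | :-: | :-: | :-: |"
table = "\n".join([header] + [to_markdown_row(r) for r in plan])
print(table)
```

After each run, you would fill in the `accuracy` entry of the matching dictionary and regenerate the table.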

-----------------------------------------------------------------------------------

## START OF THE NOTEBOOK CODE
----------------------------------------------------------------------------------------------------------------------
### necessary imports
others are going to be included as soon as they are needed

In [None]:
import tensorflow
# print the version explicitly; a bare expression in the middle of a cell is not displayed
print(tensorflow.keras.__version__)

from tensorflow.keras import layers
from tensorflow.keras import models

### building the CNN

<font color="#C70039">NOTE: </font>
A CNN takes as input tensors of shape `(image_height, image_width, image_channels)` 
(not including the batch dimension). 

In this case, the CNN is configured to process inputs of size `(28, 28, 1)`, which is the format of the MNIST grayscale images. This is done by passing the argument `input_shape=(28, 28, 1)` to the first layer.
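
To make this input format concrete, the following sketch brings a small batch of synthetic 28x28 grayscale images into the `(28, 28, 1)` per-image layout. It uses NumPy with random stand-in data rather than the real MNIST set; the variable names and the scaling to `[0, 1]` are assumptions for illustration, not the exercise solution.

```python
import numpy as np

# Synthetic stand-in for a batch of 5 grayscale images (values 0..255).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Add the trailing channel axis: (batch, height, width) -> (batch, height, width, 1).
images_4d = images.reshape((5, 28, 28, 1))

# Scaling to [0, 1] as float32 is a common preprocessing step (an assumption here).
images_4d = images_4d.astype("float32") / 255

print(images_4d.shape)      # (5, 28, 28, 1)
print(images_4d.shape[1:])  # (28, 28, 1), the per-image shape without the batch dimension
```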

In [None]:
model = models.Sequential()

model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))

# Subsequent layers infer their input shape from the output of the previous layer,
# so an explicit specification of the input shape is not needed.
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(64, (3, 3), activation='relu'))

Display the architecture of the CNN

In [None]:
model.summary()

The output of every `Conv2D` and `MaxPooling2D` layer is a 3D tensor of shape `(height, width, channels)`. The width 
and height dimensions tend to shrink as you go deeper into the network. 
The number of channels is controlled by the first argument passed to the `Conv2D` layers (e.g. 32 or 64).
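
The shrinking can be traced by hand. The sketch below assumes the Keras defaults used in the model above (stride-1 convolutions without padding, and pooling with a stride equal to the pool size); the helper functions and the `layers_spec` list are hypothetical names introduced only for this illustration.

```python
def conv_output_size(size, kernel):
    """Output size of a stride-1 convolution without padding ('valid')."""
    return size - kernel + 1

def pool_output_size(size, pool):
    """Output size of non-overlapping max pooling (stride equal to pool size)."""
    return size // pool

size = 28
shapes = []
# (layer type, window size, output channels); pooling keeps the channel count.
layers_spec = [
    ("conv", 3, 32),
    ("pool", 2, 32),
    ("conv", 3, 64),
    ("pool", 2, 64),
    ("conv", 3, 64),
]
for kind, window, channels in layers_spec:
    if kind == "conv":
        size = conv_output_size(size, window)
    else:
        size = pool_output_size(size, window)
    shapes.append((size, size, channels))

print(shapes)
# [(26, 26, 32), (13, 13, 32), (11, 11, 64), (5, 5, 64), (3, 3, 64)]
```

These values should match the output shapes reported by `model.summary()`, apart from the leading batch dimension.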

The next step is to feed the last output tensor (of shape `(3, 3, 64)`) into a densely (fully) connected classifier network, i.e. a stack of `Dense` layers. 
These classifiers process vectors, which are 1D, whereas the current output is a 3D tensor. 

So, the 3D outputs first need to be flattened to 1D, and then a few `Dense` layers are added on top.

In [None]:
model.add(layers.Flatten()) # this could also be done by the reshaping function. You can try it out, if you like.
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

Since this is a classification of 10 digits again, use a final layer with 10 outputs and a softmax activation (as done before already). 
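
As a reminder of what the softmax activation does, here is a minimal NumPy sketch. The `softmax` function and the example scores are assumptions for illustration; Keras computes this internally in the final layer.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1D array of scores."""
    shifted = logits - np.max(logits)  # subtracting the max avoids overflow
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical raw scores for the 10 digit classes.
scores = np.array([1.0, 2.0, 0.5, 0.1, 3.0, 0.0, -1.0, 0.3, 0.2, 1.5])
probs = softmax(scores)

print(probs.sum())            # sums to 1 up to floating-point rounding
print(int(np.argmax(probs)))  # 4, the index of the largest score
```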
Now here's what the final CNN looks like.

In [None]:
model.summary()

Note that the `(3, 3, 64)` outputs were flattened into vectors of shape `(576,)` before going through the two `Dense` layers.
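
The number 576 is simply 3 * 3 * 64. The following sketch reproduces the flattening with NumPy on a synthetic feature map and, as a plausibility check, computes the parameter count of the first `Dense` layer (weights plus biases). The variable names are assumptions for illustration.

```python
import numpy as np

# Synthetic stand-in for one feature map of shape (height, width, channels).
feature_map = np.arange(3 * 3 * 64, dtype="float32").reshape((3, 3, 64))

# Flattening keeps every value and only changes the layout to 1D.
flat = feature_map.reshape(-1)
print(flat.shape)  # (576,)

# Dense(64) on a 576-element vector: one weight per input/unit pair plus one bias per unit.
dense_params = flat.size * 64 + 64
print(dense_params)  # 36928
```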

Now, train the CNN on the MNIST digits. 

### loading and preparing the MNIST data set
as done before (compare with your old code)

In [None]:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

In [None]:
'''ADD THE MISSING CODE HERE'''
'''LOOK AT THE TEXT ABOVE TO SEE WHAT PARAMETERS THE NETWORK SHALL CONTAIN'''

# your code


### training the CNN

To start, use
* epochs=5
* batch_size=64 
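
To get a feeling for what these two settings mean, the sketch below computes how many weight updates they imply. It assumes that the full standard MNIST training split of 60,000 images is used without a validation split; the variable names are chosen only for this illustration.

```python
import math

train_samples = 60_000  # size of the standard MNIST training split
batch_size = 64
epochs = 5

# The last batch of each epoch is smaller, hence the rounding up.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 938
print(total_updates)    # 4690
```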

In [None]:
'''ADD THE MISSING CODE HERE'''
'''LOOK AT THE TEXT ABOVE TO SEE WHAT PARAMETERS THE NETWORK SHALL CONTAIN'''

# your code


### evaluate the CNN
Evaluate the model on the test data.

In [None]:
'''ADD THE MISSING CODE HERE'''
'''LOOK AT THE TEXT ABOVE TO SEE WHAT PARAMETERS THE NETWORK SHALL CONTAIN'''

# your code


In [None]:
test_acc

The densely connected ANN had a test accuracy of about ___%. 

The basic CNN has a test accuracy of about ___% (double bamm !!!).

This is an increase of ____ %
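
When reporting the increase, it helps to distinguish the absolute difference (in percentage points) from the relative increase (in percent). The sketch below shows both; the `relative_increase_percent` helper is a hypothetical name, and the two accuracies are placeholders, not measured results.

```python
def relative_increase_percent(old, new):
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100

# Placeholder accuracies for illustration only; insert your measured values.
dense_acc = 0.97  # "about 97%" as stated in the description
cnn_acc = 0.99    # hypothetical value, not a measured result

print(f"absolute: {(cnn_acc - dense_acc) * 100:.2f} percentage points")
print(f"relative: {relative_increase_percent(dense_acc, cnn_acc):.2f} %")
```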

### <font color="#C70039">Include your result table here and reflect a good test plan (see task list)</font>