# Exercise 1

In this exercise, we practice using deep neural networks on image data. We use the popular [Fashion-MNIST](https://www.tensorflow.org/datasets/catalog/fashion_mnist) data set, which consists of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from 10 classes. Our goal is to build a neural network model that predicts the label of a given image. We will do this in the following exercises.
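
For reference, the 10 integer labels correspond to the clothing categories listed in the Fashion-MNIST documentation. The sketch below maps a label to its name; `CLASS_NAMES` and `label_to_name` are illustrative names and are not part of the exercise.

```python
# Class names from the Fashion-MNIST documentation, indexed by integer label.
# CLASS_NAMES and label_to_name are illustrative helpers, not part of the exercise.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def label_to_name(label: int) -> str:
    """Return the human-readable class name for an integer label in 0..9."""
    return CLASS_NAMES[label]

print(label_to_name(9))  # Ankle boot
```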

### Exercise 1(a) (2 points)

Load the libraries below.

```
import matplotlib.pyplot as plt

import numpy as np
import tensorflow as tf
```

In [1]:
import matplotlib.pyplot as plt

import numpy as np
import tensorflow as tf

### Exercise 1(b) (2 points)

Load the `fashion_mnist` data as follows:

```
fashion_mnist = tf.keras.datasets.fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
```

In [2]:
fashion_mnist = tf.keras.datasets.fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

### Exercise 1(c) (2 points)

Report the shape of the `train` and `test` data sets.

In [3]:
print('Train data shape:', x_train.shape)
print('Test data shape:', x_test.shape)

Train data shape: (60000, 28, 28)
Test data shape: (10000, 28, 28)


### Exercise 1(d) (12 points)

Build a CNN model as follows:

- Convert the class labels to 0-1 (one-hot) encoding.
- The CNN model should have the following layers in the given order:
    - `Conv2D` with 32 filters, `kernel_size=(3,3)` and `activation=relu`
    - `MaxPooling2D` with `pool_size=(2,2)`
    - `Conv2D` with 32 filters, `kernel_size=(3,3)` and `activation=relu`
    - `MaxPooling2D` with `pool_size=(2,2)`
    - `Flatten`
    - `Dense` with 128 neurons and `activation=relu`
    - `Dense` with 10 neurons and `activation=softmax`
- Compile the network with the following:
    - `optimizer='adam'`
    - `loss='categorical_crossentropy'`
    - `metrics=['accuracy']`
- Train the deep neural network with `epochs=50`, `batch_size=128`, and `validation_split=0.1`.
- Evaluate the model on the `test` data set.
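
The 0-1 encoding in the first bullet is commonly called one-hot encoding: each integer label becomes a vector of length 10 with a single 1 at the label's index. The pure-Python sketch below shows the idea on three made-up labels. The `one_hot` helper is hypothetical and only illustrates conceptually what `tf.keras.utils.to_categorical` produces.

```python
def one_hot(labels, num_classes):
    """Return one row per label, with 1.0 at the label's index and 0.0 elsewhere."""
    return [[1.0 if k == label else 0.0 for k in range(num_classes)]
            for label in labels]

# Made-up sample labels, chosen only for illustration
encoded = one_hot([9, 0, 3], num_classes=10)
for row in encoded:
    print(row)
```

A one-hot target like this is the format that `loss='categorical_crossentropy'` expects.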


In [4]:
# One-hot encode the integer class labels (10 classes)
y_train = tf.keras.utils.to_categorical(y_train)
y_test = tf.keras.utils.to_categorical(y_test)

# Two Conv2D/MaxPooling2D stages followed by a dense classifier
md1 = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=(3,3), activation='relu', input_shape=(28,28,1)),
    tf.keras.layers.MaxPooling2D(pool_size = (2,2)),
    tf.keras.layers.Conv2D(32, kernel_size=(3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size = (2,2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

md1.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Hold out 10% of the training data for validation
md1.fit(x_train, y_train, epochs = 50, batch_size = 128, validation_split = 0.1)

Epoch 1/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 11s 20ms/step - accuracy: 0.6629 - loss: 4.3648 - val_accuracy: 0.8425 - val_loss: 0.4489
Epoch 2/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 7s 16ms/step - accuracy: 0.8510 - loss: 0.4169 - val_accuracy: 0.8630 - val_loss: 0.3766
Epoch 3/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 7s 17ms/step - accuracy: 0.8757 - loss: 0.3491 - val_accuracy: 0.8703 - val_loss: 0.3571
Epoch 4/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 8s 18ms/step - accuracy: 0.8876 - loss: 0.3064 - val_accuracy: 0.8760 - val_loss: 0.3361
Epoch 5/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 9s 20ms/step - accuracy: 0.8952 - loss: 0.2809 - val_accuracy: 0.8757 - val_loss: 0.3476
Epoch 6/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 7s 16ms/step - accuracy: 0.9007 - loss: 0.2624 - val_accuracy: 0.8810 - val_loss: 0.3438
Epoch 7/50
[output truncated]

<keras.src.callbacks.history.History at 0x276ffd9e290>

### Exercise 1(e) (12 points)

Build a CNN model as follows:

- The CNN model should have the following layers in the given order:
    - `Conv2D` with 64 filters, `kernel_size=(3,3)` and `activation=relu`
    - `MaxPooling2D` with `pool_size=(2,2)`
    - `Conv2D` with 64 filters, `kernel_size=(3,3)` and `activation=relu`
    - `MaxPooling2D` with `pool_size=(2,2)`
    - `Conv2D` with 64 filters, `kernel_size=(3,3)` and `activation=relu`
    - `MaxPooling2D` with `pool_size=(2,2)`
    - `Flatten`
    - `Dense` with 128 neurons and `activation=relu`
    - `Dense` with 10 neurons and `activation=softmax`
- Compile the network with the following:
    - `optimizer='adam'`
    - `loss='categorical_crossentropy'`
    - `metrics=['accuracy']`
- Train the deep neural network with `epochs=50`, `batch_size=128`, and `validation_split=0.1`.
- Evaluate the model on the `test` data set.

Note that the target labels were already one-hot encoded in part 1(d), so there is no need to encode them again; you can go ahead and train the model.
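
The training settings above (`batch_size=128`, `validation_split=0.1`) account for the `422/422` step counter that appears in the training logs. The arithmetic can be sketched as follows, assuming the 60,000 training examples reported in part 1(c). The variable names are illustrative, and the exact rounding Keras applies internally does not matter here because the split is a whole number.

```python
import math

n_train = 60_000         # training examples, from the shape reported in part 1(c)
validation_split = 0.1   # fraction of the training data held out for validation
batch_size = 128

n_val = round(n_train * validation_split)   # examples used for validation
n_fit = n_train - n_val                     # examples used for gradient updates
steps_per_epoch = math.ceil(n_fit / batch_size)

print(n_val, n_fit, steps_per_epoch)        # 6000 54000 422
```

The last batch of each epoch is smaller than 128, which is why the division is rounded up.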


In [5]:
md2 = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(64, kernel_size=(3,3), activation='relu', input_shape=(28,28,1)),
    tf.keras.layers.MaxPooling2D(pool_size = (2,2)),
    tf.keras.layers.Conv2D(64, kernel_size=(3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size = (2,2)),
    tf.keras.layers.Conv2D(64, kernel_size=(3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size = (2,2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

md2.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

md2.fit(x_train, y_train, epochs = 50, batch_size = 128, validation_split = 0.1)

Epoch 1/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 19s 40ms/step - accuracy: 0.6481 - loss: 1.9074 - val_accuracy: 0.8088 - val_loss: 0.5169
Epoch 2/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 17s 39ms/step - accuracy: 0.8291 - loss: 0.4682 - val_accuracy: 0.8208 - val_loss: 0.4794
Epoch 3/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 17s 40ms/step - accuracy: 0.8463 - loss: 0.4138 - val_accuracy: 0.8597 - val_loss: 0.3868
Epoch 4/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 17s 40ms/step - accuracy: 0.8676 - loss: 0.3570 - val_accuracy: 0.8640 - val_loss: 0.3724
Epoch 5/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 17s 40ms/step - accuracy: 0.8790 - loss: 0.3326 - val_accuracy: 0.8637 - val_loss: 0.3684
Epoch 6/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 19s 44ms/step - accuracy: 0.8867 - loss: 0.3047 - val_accuracy: 0.8743 - val_loss: 0.3456
Epoch 7/50
[output truncated]

<keras.src.callbacks.history.History at 0x27680c530d0>

### Exercise 1(f) (12 points)

Build a CNN model as follows:

- The CNN model should have the following layers in the given order:
    - `Conv2D` with 32 filters, `kernel_size=(3,3)` and `activation=relu`
    - `MaxPooling2D` with `pool_size=(2,2)`
    - `Conv2D` with 64 filters, `kernel_size=(3,3)` and `activation=relu`
    - `MaxPooling2D` with `pool_size=(2,2)`
    - `Conv2D` with 128 filters, `kernel_size=(3,3)` and `activation=relu`
    - `MaxPooling2D` with `pool_size=(2,2)`
    - `Flatten`
    - `Dense` with 128 neurons and `activation=relu`
    - `Dense` with 10 neurons and `activation=softmax`
- Compile the network with the following:
    - `optimizer='adam'`
    - `loss='categorical_crossentropy'`
    - `metrics=['accuracy']`
- Train the deep neural network with `epochs=50`, `batch_size=128`, and `validation_split=0.1`.
- Evaluate the model on the `test` data set.

Note that the target labels were already one-hot encoded in part 1(d), so there is no need to encode them again; you can go ahead and train the model.
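
To see how the stacked `Conv2D` and `MaxPooling2D` layers shrink the 28x28 input, the sketch below traces the feature-map side length through each layer. It assumes the Keras defaults used in these exercises: `'valid'` padding with stride 1 for the convolutions, and a pooling stride equal to the pool size. The `spatial_sizes` helper is hypothetical.

```python
def spatial_sizes(n_stages, size=28, kernel=3, pool=2):
    """Return the feature-map side length after each Conv2D and MaxPooling2D layer."""
    sizes = [size]
    for _ in range(n_stages):
        size = size - kernel + 1   # 'valid' convolution, stride 1
        sizes.append(size)
        size = size // pool        # max pooling, stride equal to the pool size
        sizes.append(size)
    return sizes

two_stage = spatial_sizes(2)     # architecture in 1(d)
three_stage = spatial_sizes(3)   # architectures in 1(e) and 1(f)
print(two_stage)                 # [28, 26, 13, 11, 5]
print(three_stage)               # [28, 26, 13, 11, 5, 3, 1]

# Flatten output length = side * side * filters of the last Conv2D layer
print(two_stage[-1] ** 2 * 32)     # 800 for 1(d)
print(three_stage[-1] ** 2 * 128)  # 128 for 1(f)
```

With three stages, the feature map is reduced to a single 1x1 position per filter before the `Flatten` layer.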


In [6]:
md3 = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=(3,3), activation='relu', input_shape=(28,28,1)),
    tf.keras.layers.MaxPooling2D(pool_size = (2,2)),
    tf.keras.layers.Conv2D(64, kernel_size=(3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size = (2,2)),
    tf.keras.layers.Conv2D(128, kernel_size=(3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size = (2,2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

md3.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

md3.fit(x_train, y_train, epochs = 50, batch_size = 128, validation_split = 0.1)

Epoch 1/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 14s 29ms/step - accuracy: 0.6636 - loss: 2.0795 - val_accuracy: 0.8302 - val_loss: 0.4619
Epoch 2/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 10s 23ms/step - accuracy: 0.8436 - loss: 0.4266 - val_accuracy: 0.8507 - val_loss: 0.4057
Epoch 3/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 10s 23ms/step - accuracy: 0.8644 - loss: 0.3630 - val_accuracy: 0.8690 - val_loss: 0.3632
Epoch 4/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 10s 23ms/step - accuracy: 0.8800 - loss: 0.3236 - val_accuracy: 0.8690 - val_loss: 0.3611
Epoch 5/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 10s 23ms/step - accuracy: 0.8906 - loss: 0.2937 - val_accuracy: 0.8690 - val_loss: 0.3555
Epoch 6/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 10s 23ms/step - accuracy: 0.8973 - loss: 0.2789 - val_accuracy: 0.8817 - val_loss: 0.3446
Epoch 7/50
[output truncated]

<keras.src.callbacks.history.History at 0x2768d28c9d0>

### Exercise 1(g) (3 points)

Based on the results from parts 1(d)-1(f), which model would you use to predict the label of the `fashion_mnist` data set? Please be specific.

In [7]:
# Print the test accuracy of each model, in order: md1, md2, md3
test_loss, test_acc = md1.evaluate(x_test, y_test, verbose = 0)
print(test_acc)
test_loss, test_acc = md2.evaluate(x_test, y_test, verbose = 0)
print(test_acc)
test_loss, test_acc = md3.evaluate(x_test, y_test, verbose = 0)
print(test_acc)

0.8888000249862671
0.8773000240325928
0.8759999871253967


Based on these results, I would use the first model (from part 1(d)) because it has the highest test accuracy: 0.8888, versus 0.8773 for the second model and 0.8760 for the third.
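
Alongside test accuracy, it can help to compare the size of the three architectures. The sketch below counts weights and biases from the layer specifications in parts 1(d)-1(f), assuming Keras defaults (`'valid'` padding, stride 1, pooling stride equal to the pool size). The `count_params` helper is hypothetical and not part of the original answer.

```python
def count_params(conv_filters, in_size=28, in_channels=1, kernel=3, pool=2,
                 dense_units=(128, 10)):
    """Count weights and biases for a Conv2D/MaxPooling2D stack
    followed by Flatten and Dense layers."""
    total, size, channels = 0, in_size, in_channels
    for filters in conv_filters:
        total += kernel * kernel * channels * filters + filters  # conv weights + biases
        size = (size - kernel + 1) // pool                       # 'valid' conv, then pooling
        channels = filters
    features = size * size * channels                            # Flatten output length
    for units in dense_units:
        total += features * units + units                        # dense weights + biases
        features = units
    return total

params = {
    "1(d)": count_params((32, 32)),
    "1(e)": count_params((64, 64, 64)),
    "1(f)": count_params((32, 64, 128)),
}
for name, n in params.items():
    print(name, n)
# 1(d) 113386
# 1(e) 84106
# 1(f) 110474
```

Under these assumptions, the first model has the most parameters, largely because its `Flatten` layer feeds 800 features into the first `Dense` layer.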

### Exercise 1(h) (15 points)

Improve the performance of the best model architecture. You may consider adding more layers, changing the activation function, adding `Dropout` layers, etc.

In [12]:
# Same architecture as md1, with Dropout(0.2) added after each pooling layer
md4 = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=(3,3), activation='relu', input_shape=(28,28,1)),
    tf.keras.layers.MaxPooling2D(pool_size = (2,2)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Conv2D(32, kernel_size=(3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size = (2,2)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

md4.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

md4.fit(x_train, y_train, epochs = 50, batch_size = 128, validation_split = 0.1)

Epoch 1/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 12s 24ms/step - accuracy: 0.5912 - loss: 4.6075 - val_accuracy: 0.8000 - val_loss: 0.5453
Epoch 2/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 11s 25ms/step - accuracy: 0.7842 - loss: 0.5802 - val_accuracy: 0.8412 - val_loss: 0.4282
Epoch 3/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 12s 27ms/step - accuracy: 0.8203 - loss: 0.4814 - val_accuracy: 0.8615 - val_loss: 0.3897
Epoch 4/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 11s 26ms/step - accuracy: 0.8411 - loss: 0.4224 - val_accuracy: 0.8682 - val_loss: 0.3604
Epoch 5/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 11s 27ms/step - accuracy: 0.8608 - loss: 0.3788 - val_accuracy: 0.8738 - val_loss: 0.3389
Epoch 6/50
422/422 ━━━━━━━━━━━━━━━━━━━━ 12s 28ms/step - accuracy: 0.8640 - loss: 0.3581 - val_accuracy: 0.8822 - val_loss: 0.3191
Epoch 7/50
[output truncated]

<keras.src.callbacks.history.History at 0x27699fee290>

In [13]:
test_loss, test_acc = md4.evaluate(x_test, y_test, verbose = 0)
print(test_acc)

0.9036999940872192


Adding `Dropout` layers with a rate of 0.2 after each pooling layer improved the test accuracy of my best model from 0.8888 to 0.9037.
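
For intuition about what a dropout rate of 0.2 does during training, here is a minimal pure-Python sketch: each activation is zeroed with probability `rate`, and the remaining ones are scaled by `1 / (1 - rate)` so the expected value is unchanged. The `dropout` function and the all-ones input are illustrative assumptions; this is not the Keras implementation itself.

```python
import random

def dropout(values, rate, rng):
    """Training-time dropout: zero each value with probability `rate`
    and scale the survivors by 1 / (1 - rate)."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
activations = [1.0] * 10_000      # made-up activations, all equal to 1
dropped = dropout(activations, rate=0.2, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(f"fraction zeroed: {zero_fraction:.3f}")     # close to 0.2
print(f"mean after dropout: {mean_after:.3f}")     # close to 1.0
```

At evaluation time the layer passes its input through unchanged, so the test accuracy above is computed without any units dropped.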