# CODEATHON 2: Recognizing UVA landmarks with neural nets (50 pts)
![UVA Grounds](https://giving.virginia.edu/sites/default/files/2019-02/jgi-teaser-image.jpg)

The UVA Grounds is known for its Jeffersonian architecture and place in U.S. history as a model for college and university campuses throughout the country. Throughout its history, the University of Virginia has won praise for its unique Jeffersonian architecture.

In this codeathon, you will attempt to build an image recognition system to classify different buildings/landmarks on Grounds. You can earn 50 points for this codeathon, plus 10 bonus points. To make it easier for you, some code has been provided to help you process the data; you may modify it to fit your needs. You must submit the .ipynb file via UVA Collab with the following format: yourcomputingID_codeathon_2.ipynb

In [None]:
import sys
import sklearn
import os
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from functools import partial

%tensorflow_version 2.x
import tensorflow as tf
from tensorflow import keras

np.random.seed(42)
tf.random.set_seed(42)

# Step 2: Process the Dataset
The full dataset is huge (37+ GB), with over 13K images in 18 classes, so it will take a while to download, extract, and process. To save you time and effort, a subset of the data has been resized and compressed to only 379 MB and stored on my Firebase server. This subset is the dataset your grade will be benchmarked on. If you are up for a challenge (and perhaps bonus points), contact the instructor for the full dataset!

In [None]:
# Download dataset from Firebase (the URL is quoted so the shell does not treat "&" as a command separator)
!wget -O dataset.zip "https://firebasestorage.googleapis.com/v0/b/uva-landmark-images.appspot.com/o/dataset.zip?alt=media&token=e1403951-30d6-42b8-ba4e-394af1a2ddb7"

In [None]:
# Extract content
!unzip /content/dataset.zip

In [None]:
data_dir = "/content/dataset/"
batch_size = 32
# IMPORTANT: Depending on which pre-trained model you choose, you may need to change these dimensions accordingly
img_height = 150
img_width = 150

# Training Dataset
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split = 0.2,
    subset = "training",
    seed = 42,
    image_size= (img_height, img_width),
    batch_size = batch_size
)

# Validation Dataset
validation_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split = 0.2,
    subset = "validation",
    seed = 42,
    image_size = (img_height, img_width),
    batch_size = batch_size
)

In [None]:
# Visualize some of the training samples from one batch
# The class names must follow the order in which image_dataset_from_directory assigns labels:
# alphanumeric order of the subdirectory names (the same list is available as train_ds.class_names)
class_names = ['AcademicalVillage', 'AldermanLibrary', 'AlumniHall', 'AquaticFitnessCenter',
  'BavaroHall', 'BrooksHall', 'ClarkHall', 'MadisonHall', 'MinorHall', 'NewCabellHall',
  'NewcombHall', 'OldCabellHall', 'OlssonHall', 'RiceHall', 'Rotunda', 'ScottStadium',
  'ThorntonHall', 'UniversityChapel']

# Rows and columns are set to fit one training batch (32)
n_rows = 8
n_cols = 4
plt.figure(figsize=(n_cols * 3, n_rows * 3))
for images, labels in train_ds.take(1):
    for i in range (n_rows*n_cols):
        plt.subplot(n_rows, n_cols, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.axis('off')
        plt.title(class_names[labels[i]], fontsize=12)
plt.subplots_adjust(wspace=.2, hspace=.2)
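Because `image_dataset_from_directory` infers integer labels from the subdirectory names in sorted (alphanumeric) order, the hard-coded `class_names` list only lines up with the labels if it follows that same order. The sketch below shows how that order could be derived directly from the folders; `inferred_class_names` is a hypothetical helper, and the demo builds a throwaway directory rather than touching `/content/dataset/`. Note that uppercase letters sort before lowercase ones, which is why `NewCabellHall` comes before `NewcombHall`.

```python
import os
import tempfile


def inferred_class_names(directory):
    """Subdirectory names in sorted order, i.e. the order in which integer labels are assigned."""
    return sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )


# Demo on a temporary folder with a hypothetical subset of the landmark classes
with tempfile.TemporaryDirectory() as root:
    for name in ["Rotunda", "RiceHall", "NewcombHall", "NewCabellHall"]:
        os.makedirs(os.path.join(root, name))
    names = inferred_class_names(root)

print(names)  # label 0 is names[0], label 1 is names[1], and so on
```

In the notebook, comparing `inferred_class_names(data_dir)` (or `train_ds.class_names`) against the hand-written list is a quick sanity check.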


In [None]:
# YOUR CODE STARTS HERE
print(len(class_names))

# Step 3: Create your own CNN architecture
You must design your own architecture. To get started, you may take inspiration from one in the CNN notebook (e.g., use one similar to LeNet-5 or AlexNet). You will have to report the design of the architecture:

1.   How many layers does it have?
2.   Why did you decide on a certain number of nodes per layer?
3.   Which activation functions did you choose?
4.   How many parameters does it have in total?

Hint: use `myModel.summary()` to learn about the layers and parameters




# My Model Starts Here

The following architecture is based on the AlexNet model; I have changed the input shape. When experimenting with my own models, I could not achieve a validation accuracy above 0.1071.
source: https://thecleverprogrammer.com/2021/12/13/alexnet-architecture-using-python/

In [None]:
!pip install visualkeras

In [None]:
import visualkeras

myModel = keras.models.Sequential()

myModel.add(keras.layers.Conv2D(filters=96, kernel_size=(11, 11),
                        strides=(4, 4), activation="relu",
                        input_shape=(150, 150, 3)))
myModel.add(keras.layers.BatchNormalization())
myModel.add(keras.layers.MaxPool2D(pool_size=(3, 3), strides= (2, 2)))
myModel.add(keras.layers.Conv2D(filters=256, kernel_size=(5, 5),
                        strides=(1, 1), activation="relu",
                        padding="same"))
myModel.add(keras.layers.BatchNormalization())
myModel.add(keras.layers.MaxPool2D(pool_size=(3, 3), strides=(2, 2)))
myModel.add(keras.layers.Conv2D(filters=384, kernel_size=(3, 3),
                        strides=(1, 1), activation="relu",
                        padding="same"))
myModel.add(keras.layers.BatchNormalization())
myModel.add(keras.layers.Conv2D(filters=384, kernel_size=(3, 3),
                        strides=(1, 1), activation="relu",
                        padding="same"))
myModel.add(keras.layers.BatchNormalization())
myModel.add(keras.layers.Conv2D(filters=256, kernel_size=(3, 3),
                        strides=(1, 1), activation="relu",
                        padding="same"))
myModel.add(keras.layers.BatchNormalization())
myModel.add(keras.layers.MaxPool2D(pool_size=(3, 3), strides=(2, 2)))
myModel.add(keras.layers.Flatten())
myModel.add(keras.layers.Dense(4096, activation="relu"))
myModel.add(keras.layers.Dropout(0.5))
myModel.add(keras.layers.Dense(18, activation="softmax"))

myModel.summary()
visualkeras.layered_view(myModel)

After designing the model, you will need to train it. To do so, you will need to pick a number of `epochs` (full passes over the training data), which `optimizer` to use (from `keras.optimizers`), a `loss` function, and some `metrics`.
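With the default `label_mode="int"`, `image_dataset_from_directory` yields each label as a single integer class id, which is why `sparse_categorical_crossentropy` is the matching loss (`categorical_crossentropy` expects one-hot vectors instead). As a minimal NumPy sketch, assuming the model outputs softmax probabilities, the loss for a batch is the mean negative log-probability assigned to each true class. The function name and the clipping constant are illustrative assumptions; Keras's exact numerical handling may differ slightly.

```python
import numpy as np


def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean of -log(p[true class]) over the batch; y_true holds integer class ids."""
    probs = np.clip(probs, eps, 1.0)                      # avoid log(0)
    picked = probs[np.arange(len(y_true)), y_true]        # probability given to the true class
    return float(np.mean(-np.log(picked)))


# Two samples, three classes, integer labels (no one-hot encoding needed)
y_true = np.array([0, 2])
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
loss = sparse_categorical_crossentropy(y_true, probs)
print(loss)
```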

In [None]:
Epochs = 10 ##TODO
Optimizer = keras.optimizers.Adam(learning_rate=1e-3) #TODO
Loss = "sparse_categorical_crossentropy" #TODO
Metrics = ["accuracy"] #TODO keep in mind that this can be multiple metrics including at least the accuracy
myModel.compile(loss= Loss, optimizer = Optimizer, metrics = Metrics)
myHistory = myModel.fit(train_ds,
                      validation_data=validation_ds,
                      epochs = Epochs)

Next, you need to create (1) a plot of training and validation `loss` and (2) a plot of training and validation `accuracy`. These plots might give you some insights about your model performance and possibility of overfitting.

Report the performance of your architecture on the validation set in a `confusion matrix`. Comment on the performance by answering the following questions:
- How well do you think your architecture is doing (overall accuracy)?
- Where does it make the most mistakes?
- Which classes can be improved?
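The first two questions can be answered directly from the confusion matrix: overall accuracy is the diagonal sum divided by the total count, and the largest off-diagonal cells are the most frequent mistakes. A minimal sketch, assuming scikit-learn's convention that `confusion_matrix(y_true, y_pred)` puts true labels on the rows; `summarize_confusion` is a hypothetical helper, and the toy labels are not results from this notebook.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def summarize_confusion(y_true, y_pred, num_classes, top_k=3):
    """Overall accuracy plus the top_k most frequent (true, predicted, count) mistakes."""
    # Argument order is (y_true, y_pred): rows = true labels, columns = predicted labels
    cm = confusion_matrix(y_true, y_pred, labels=np.arange(num_classes))
    accuracy = np.trace(cm) / cm.sum()
    errors = cm.copy()
    np.fill_diagonal(errors, 0)  # keep only the mistakes
    order = np.argsort(errors, axis=None, kind="stable")[::-1][:top_k]
    rows, cols = np.unravel_index(order, errors.shape)
    mistakes = [(int(t), int(p), int(errors[t, p]))
                for t, p in zip(rows, cols) if errors[t, p] > 0]
    return accuracy, mistakes


# Toy example with 3 classes
acc, mistakes = summarize_confusion([0, 0, 1, 1, 2, 2, 2],
                                    [0, 1, 1, 1, 1, 1, 2], num_classes=3)
print(acc, mistakes)
```

In the notebook this could be called as `summarize_confusion(val_labels, pred_labels, num_classes=len(class_names))`, with the indices mapped back through `class_names`.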

In [None]:
# From slides
# def plot_learning_curves(loss, accuracy):
#     plt.plot(np.arange(len(loss)) + 0.5, loss, "b.-", label="Training loss")
#     plt.plot(np.arange(len(accuracy)) + 1, accuracy, "r.-", label="Accuracy")
#     plt.gca().xaxis.set_major_locator(mpl.ticker.MaxNLocator(integer=True))
#     plt.axis([1, 10, -0.02, 3])
#     plt.legend(fontsize=14)
#     plt.xlabel("Epochs")
#     plt.ylabel("Loss")
#     plt.grid(True)

def plot_loss_curves(train, val):
    plt.plot(np.arange(len(train)) + 0.5, train, "b.-", label="Training loss")
    plt.plot(np.arange(len(val)) + 1, val, "r.-", label="Validation loss")
    plt.gca().xaxis.set_major_locator(mpl.ticker.MaxNLocator(integer=True))
    # plt.axis([1, 10, -0.02, 3])
    plt.legend(fontsize=14)
    plt.xlabel("Epochs")
    plt.ylabel("Loss")
    plt.grid(True)

def plot_accuracy_curves(train, val):
    plt.plot(np.arange(len(train)) + 0.5, train, "b.-", label="Training accuracy")
    plt.plot(np.arange(len(val)) + 1, val, "r.-", label="Validation accuracy")
    plt.gca().xaxis.set_major_locator(mpl.ticker.MaxNLocator(integer=True))
    # plt.axis([1, 10, -0.02, 3])
    plt.legend(fontsize=14)
    plt.xlabel("Epochs")
    plt.ylabel("Accuracy")
    plt.grid(True)

In [None]:
# Your evaluation code here
plot_loss_curves(myHistory.history["loss"], myHistory.history["val_loss"])

In [None]:
plot_accuracy_curves(myHistory.history["accuracy"], myHistory.history["val_accuracy"])

In [None]:
# Collect the true labels and the model's predictions in the SAME pass over validation_ds.
# image_dataset_from_directory shuffles by default and reshuffles on every iteration, so labels
# gathered in one loop do not line up with the output of a separate predict(validation_ds) call.
pics = []
actual_values = []
predicted_values = []

for images, labels in validation_ds:
  probs = myModel.predict(images, verbose=0)
  predicted_values.extend(np.argmax(probs, axis=1))  # predicted class = highest softmax probability
  for i in range(images.shape[0]):
    pics.append(images[i].numpy().astype("uint8"))
    actual_values.append(labels[i].numpy())

val_labels = np.asarray(actual_values)

In [None]:
from sklearn.metrics import confusion_matrix

pred_labels = np.asarray(predicted_values)

print(type(pred_labels))
print(type(val_labels))
print(pred_labels[0:5], "\n\n", val_labels[0:5])

In [None]:
confusion_matrix = sklearn.metrics.confusion_matrix(val_labels, pred_labels)  # (y_true, y_pred): rows are true labels
print(confusion_matrix)

In [None]:
import seaborn as sns
import pandas as pd

# Label both heatmap axes with the 18 class indices (0-17)
label_ids = list(range(len(class_names)))
con_mat_df = pd.DataFrame(confusion_matrix, index=label_ids, columns=label_ids)
figure = plt.figure(figsize=(8, 8))
sns.heatmap(con_mat_df, annot=True, fmt="d", cmap=plt.cm.Blues)
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.tight_layout()
plt.show()

## Report Model Configuration
**Num layers:** 17

**Justification for Nodes per layer:** These are the layer widths proposed in the original AlexNet. The pattern is to shrink the kernel size (11x11, then 5x5, then 3x3) while increasing the number of filters (96, then 256, then 384), which makes sense for examining the big picture first and then the finer details of an image. A large number of filters is important for examining an image from many different "angles" or perspectives. Also, the final hidden layer is very wide (4096 fully connected units) and is regularized by only a single dropout layer, so there is potential for overfitting. This is visible in the graphs: training always performs substantially better than validation.

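**Activation Functions:** ReLU in the hidden layers, because it trains much faster than saturating activations such as sigmoid and tanh (the AlexNet paper reports roughly 6x faster convergence than tanh), and softmax in the output layer to turn the 18 outputs into class probabilities.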

**Total Number of Free Parameters:**

Total params: 13,267,730

Trainable params: 13,264,978

Non-trainable params: 2,752
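The totals above can be checked by hand. The sketch below redoes the arithmetic for `myModel`, assuming the layer settings in the Step 3 code cell (150x150x3 input, five convolutions, one 4096-unit dense layer, 18 outputs). A `Conv2D` layer has k*k*c_in*c_out weights plus one bias per filter; each `BatchNormalization` layer has 4 values per channel (gamma and beta are trainable, the moving mean and variance are not); pooling, flatten, and dropout layers have no parameters.

```python
# Spatial size after each size-changing layer ('same' convolutions keep the size)
side = 150
side = (side - 11) // 4 + 1   # Conv2D 11x11, stride 4, 'valid' padding -> 35
side = (side - 3) // 2 + 1    # MaxPool 3x3, stride 2 -> 17
side = (side - 3) // 2 + 1    # MaxPool after the 5x5 'same' conv -> 8
side = (side - 3) // 2 + 1    # MaxPool after the three 3x3 'same' convs -> 3
flat = side * side * 256      # Flatten output size

# (kernel size, input channels, filters) for the five Conv2D layers
convs = [(11, 3, 96), (5, 96, 256), (3, 256, 384), (3, 384, 384), (3, 384, 256)]
conv_total = sum(k * k * c_in * c_out + c_out for k, c_in, c_out in convs)

bn_channels = sum(c_out for _, _, c_out in convs)   # one BatchNormalization after each conv
bn_trainable = 2 * bn_channels                      # gamma, beta
bn_non_trainable = 2 * bn_channels                  # moving mean, moving variance

dense_total = (flat * 4096 + 4096) + (4096 * 18 + 18)

total = conv_total + bn_trainable + bn_non_trainable + dense_total
trainable = total - bn_non_trainable
print(f"{total:,} total, {trainable:,} trainable, {bn_non_trainable:,} non-trainable")
```

Most of the parameters (about 9.4 million) sit in the single `Dense(4096)` layer that follows `Flatten`.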

## Analysis of Model
**How well is my architecture doing (overall accuracy)?** Validation accuracy of 75%.

**Where does my model make the most mistakes?** Label 13 ('RiceHall') had the worst performance, identified by scanning each row of the confusion matrix and comparing the ratio of mislabeled to correctly labeled images. Label 14 ('Rotunda') was comparably bad.

**Which classes can be improved?** Most of them could be improved. Judging from the distribution of actual labels, I do not think the image set is balanced, which makes the network predict the more frequent classes more often than the rarer ones. There is no clear "accurate" diagonal in this confusion matrix, although it is stronger for the lower-numbered labels. Again, I think this is because there are proportionally more images of the lower-numbered labels.
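The imbalance suspicion can be tested by counting images per class folder before training. A minimal sketch, assuming the one-folder-per-class layout under `data_dir`; `count_images_per_class` is a hypothetical helper, the extension list mirrors the formats `image_dataset_from_directory` reads, and the demo uses a throwaway directory instead of the real dataset.

```python
import os
import tempfile
from collections import Counter

# Image formats read by image_dataset_from_directory
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".bmp", ".gif")


def count_images_per_class(directory):
    """Count image files inside each class subdirectory of `directory`."""
    counts = Counter()
    for name in sorted(os.listdir(directory)):
        class_dir = os.path.join(directory, name)
        if os.path.isdir(class_dir):
            counts[name] = sum(1 for f in os.listdir(class_dir)
                               if f.lower().endswith(IMAGE_EXTS))
    return counts


# Self-contained demo; in the notebook, call count_images_per_class(data_dir) instead
with tempfile.TemporaryDirectory() as root:
    demo = {"Rotunda": ["a.jpg", "b.jpg", "c.JPG"], "RiceHall": ["d.png", "notes.txt"]}
    for cls, files in demo.items():
        os.makedirs(os.path.join(root, cls))
        for f in files:
            open(os.path.join(root, cls, f), "w").close()
    counts = count_images_per_class(root)

imbalance_ratio = max(counts.values()) / min(counts.values())
print(counts, imbalance_ratio)
```

For the validation split alone, `np.bincount(val_labels, minlength=18)` gives the same kind of per-class count.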

# Step 4: Use a Pre-trained Network with Transfer Learning
Now that you have your own custom model and some baseline performance, let's see if you can improve the performance using transfer learning and a pre-trained model. You may use any pre-trained model EXCEPT the ones already provided, such as `Xception`, `MobileNet`, and `EfficientNetB6`. Keep in mind that each pre-trained model may expect a different input shape, so adjust the size of your training images accordingly.

Make sure you report the design of this architecture by answering the same questions 1-4 from Step 3.

Hint: use `ImageNet` weights when loading the pre-trained network, then add a `GlobalAveragePooling2D` layer and an output layer with `softmax` activation.
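The `GlobalAveragePooling2D` layer suggested in the hint collapses each feature map to its mean, turning a `(batch, height, width, channels)` tensor into `(batch, channels)` without adding any parameters. A NumPy sketch of the same reduction (the function name is hypothetical); for ResNet50 without its top on 150x150 inputs, the final feature map should be 5x5 with 2048 channels, so the pooled vector has 2048 entries.

```python
import numpy as np


def global_average_pool(feature_maps):
    """(batch, height, width, channels) -> (batch, channels): mean over the spatial axes."""
    return feature_maps.mean(axis=(1, 2))


# Random stand-in for a ResNet50 feature map on 150x150 inputs
rng = np.random.default_rng(0)
features = rng.random((2, 5, 5, 2048), dtype=np.float32)
pooled = global_average_pool(features)
print(pooled.shape)
```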



In [None]:
!pip install keras_applications

In [None]:
# Your code here
from tensorflow.keras.applications import ResNet50, resnet50
base_model = ResNet50(weights='imagenet', include_top=False)

In [None]:
# https://stackoverflow.com/questions/70091290/tensorflow-datasets-crop-resize-images-per-batch-after-dataset-batch
# def resize_data(images):
#   tf.print('Original shape -->', tf.shape(images))
#   SIZE = (224, 224)

#   return tf.image.resize_with_crop_or_pad(images, SIZE[0], SIZE[1])

# Apply ResNet50's own input preprocessing (the images stay 150x150; nothing is resized here)
def preprocess(image, label):
  preprocessed = keras.applications.resnet50.preprocess_input(image)
  return preprocessed, label

train_images_resized = train_ds.map(preprocess)
validation_images_resized = validation_ds.map(preprocess)
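For reference, `resnet50.preprocess_input` uses Keras's "caffe" preprocessing mode: it converts RGB images to BGR and subtracts the ImageNet per-channel mean, without scaling to [0, 1]. The NumPy function below re-implements that idea for illustration only (`caffe_style_preprocess` is a hypothetical name); the notebook should keep calling the real `preprocess_input`. This also shows why any data passed to the ResNet-based model, including the validation set used for predictions, needs the same preprocessing as the training data.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by the "caffe" preprocessing mode
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def caffe_style_preprocess(rgb_images):
    """Illustrative re-implementation: RGB -> BGR, then zero-center each channel."""
    bgr = rgb_images[..., ::-1].astype(np.float32)  # reverse the channel axis
    return bgr - IMAGENET_BGR_MEAN


pixel = np.array([[[[255.0, 128.0, 0.0]]]])  # one 1x1 RGB image, shape (1, 1, 1, 3)
out = caffe_style_preprocess(pixel)
print(out[0, 0, 0])
```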

Next, you will adapt this pre-trained model to your UVA Landmark dataset. It is recommended that you try the two-phase training approach for your model:

1.   Phase 1: Freeze the pre-trained weights and only train the top layer.
2.   Phase 2: Train the entire network with a much smaller learning rate (adapt the model to UVA data, but avoid destroying the transferred weights).
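A minimal sketch of the two-phase schedule above, written against `tf.keras`. The helper names (`build_transfer_model`, `two_phase_fit`), the epoch counts, and the learning rates (1e-3, then 1e-5) are illustrative assumptions, not values from this notebook, and a tiny randomly initialized convolutional base stands in for ResNet50 so the example runs without downloading weights. The key points are to freeze the base before the first `compile`/`fit`, then unfreeze and recompile with a much smaller learning rate, because changes to `trainable` only take effect after recompiling.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

NUM_CLASSES = 18  # the 18 UVA landmark classes


def build_transfer_model(base_model, num_classes=NUM_CLASSES):
    """Stack a GlobalAveragePooling2D + softmax head on a convolutional base."""
    avg = keras.layers.GlobalAveragePooling2D()(base_model.output)
    output = keras.layers.Dense(num_classes, activation="softmax")(avg)
    return keras.Model(inputs=base_model.input, outputs=output)


def two_phase_fit(model, base_model, train_data, val_data,
                  phase1_epochs=5, phase2_epochs=10,
                  phase1_lr=1e-3, phase2_lr=1e-5):
    # Phase 1: freeze the pre-trained base so only the new head is trained
    base_model.trainable = False
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=phase1_lr),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    h1 = model.fit(train_data, validation_data=val_data, epochs=phase1_epochs, verbose=0)

    # Phase 2: unfreeze everything and fine-tune with a much smaller learning rate;
    # recompiling is needed for the new `trainable` setting to take effect
    base_model.trainable = True
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=phase2_lr),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    h2 = model.fit(train_data, validation_data=val_data, epochs=phase2_epochs, verbose=0)
    return h1, h2


# Tiny stand-in base (NOT ResNet50) and random data, just to exercise the schedule
inputs = keras.Input(shape=(32, 32, 3))
features = keras.layers.Conv2D(8, 3, activation="relu")(inputs)
tiny_base = keras.Model(inputs, features)
model = build_transfer_model(tiny_base)

x_fake = np.random.rand(16, 32, 32, 3).astype("float32")
y_fake = np.random.randint(0, NUM_CLASSES, size=16)
ds = tf.data.Dataset.from_tensor_slices((x_fake, y_fake)).batch(8)
h1, h2 = two_phase_fit(model, tiny_base, ds, ds, phase1_epochs=1, phase2_epochs=1)
```

With the notebook's objects, the equivalent call would be `two_phase_fit(franken_model, base_model, train_images_resized, validation_images_resized)`.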



In [None]:
# Phase 1 code here
for layer in base_model.layers:
  layer.trainable = False

# How to do this source: https://towardsdatascience.com/build-a-custom-resnetv2-with-the-desired-depth-92892ec79d4b
 # Add classifier on top.
# v2 has BN-ReLU before Pooling
# X = BatchNormalization()(X)
# X = Activation('relu')(X)
# X = AveragePooling2D(pool_size=8)(X)
# y = Flatten()(X)
# y = Dense(512, activation='relu')(y)
# y = BatchNormalization()(y)
# y = Dropout(0.5)(y)

# outputs = Dense(num_classes,
#                 activation='softmax')(y)

# # Instantiate model.
# model = Model(inputs=inputs, outputs=outputs)

avg = keras.layers.GlobalAveragePooling2D()(base_model.output)
# one hidden layer
hidden = keras.layers.Dense(1024, activation='relu')(avg)
red = keras.layers.Dropout(0.5)(hidden)
# output layer
output = keras.layers.Dense(18, activation="softmax")(red)

franken_model = keras.Model(inputs=base_model.input, outputs=output)

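# NOTE: unfreezing here, before any training has happened, skips the frozen (Phase 1) step:
# the whole network, including the ImageNet weights, is trained from the start in the next cell.
for layer in base_model.layers:
  layer.trainable = True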

franken_model.summary()
print(len(franken_model.layers))

In [None]:
# Phase 2 code here
frankenOptimizer = keras.optimizers.Adamax()

frankenLoss = "sparse_categorical_crossentropy"
frankenMetrics = ["accuracy"]

franken_model.compile(frankenOptimizer, frankenLoss, metrics = frankenMetrics)
frankenHistory = franken_model.fit(train_images_resized,
                      validation_data=validation_images_resized,
                      epochs = 50)

In [None]:
plot_loss_curves(frankenHistory.history["loss"], frankenHistory.history["val_loss"])

In [None]:
plot_accuracy_curves(frankenHistory.history["accuracy"], frankenHistory.history["val_accuracy"])

Repeat the same reporting of performance using the confusion matrix:
- Did this pre-trained network do better overall?
- In which classes did it improve the accuracy over the model above?
- Which class still has low performance?

Typically, your network must reach a reasonable performance of at least 84% overall accuracy to be considered successful in this domain. If your network achieves an accuracy of 94% or above on the validation set, you will also receive 10 bonus points, so keep trying!
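To answer which classes improved and which are still weak, per-class recall (the diagonal of each row divided by the row total, with rows as true labels) can be compared between the two confusion matrices. A minimal sketch: `per_class_recall` and `compare_per_class` are hypothetical helpers, reusing the 84% overall target as a per-class threshold is an assumption, and the 2-class matrices are toy values rather than results from this notebook.

```python
import numpy as np


def per_class_recall(cm):
    """Fraction of each TRUE class predicted correctly (rows = true labels)."""
    cm = np.asarray(cm, dtype=float)
    support = cm.sum(axis=1)
    return np.divide(np.diag(cm), support, out=np.zeros(len(cm)), where=support > 0)


def compare_per_class(cm_baseline, cm_new, threshold=0.84):
    """Per-class recall change, plus the classes whose new recall is still below threshold."""
    new_recall = per_class_recall(cm_new)
    delta = new_recall - per_class_recall(cm_baseline)
    still_low = np.flatnonzero(new_recall < threshold)
    return delta, still_low


cm_custom = [[8, 2], [5, 5]]   # toy matrices, not results from this notebook
cm_resnet = [[7, 3], [1, 9]]
delta, still_low = compare_per_class(cm_custom, cm_resnet)
print(delta, still_low)
```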

In [None]:
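# Predict on the preprocessed validation set (same ResNet50 preprocess_input used for training),
# collecting the true labels in the same pass so they stay aligned with the shuffled predictions.
franken_true = []
franken_pred = []
for images, labels in validation_images_resized:
  probs = franken_model.predict(images, verbose=0)
  franken_pred.extend(np.argmax(probs, axis=1))
  franken_true.extend(labels.numpy())

pred_labels = np.asarray(franken_pred)
val_labels = np.asarray(franken_true)

print(type(pred_labels))
print(type(val_labels))
print(pred_labels[0:5], "\n\n", val_labels[0:5])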

In [None]:
confusion_matrix = sklearn.metrics.confusion_matrix(val_labels, pred_labels)  # (y_true, y_pred): rows are true labels
print(confusion_matrix)

In [None]:
# Label both heatmap axes with the 18 class indices (0-17)
label_ids = list(range(18))
con_mat_df = pd.DataFrame(confusion_matrix, index=label_ids, columns=label_ids)
figure = plt.figure(figsize=(8, 8))
sns.heatmap(con_mat_df, annot=True, fmt="d", cmap=plt.cm.Blues)
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.tight_layout()
plt.show()

## Report Model Configuration
**Num layers:** 179

**Justification for Nodes per layer:** I kept the original ResNet50 layers (~175) unchanged and only added a global average pooling layer, one hidden layer of 1024 dense units, a dropout layer (rate 0.5) to reduce overfitting, and the 18-way softmax output layer for class prediction. There is no particular reason for choosing 1024, other than that I think it is sufficiently large for learning.

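**Activation Functions:** ReLU in the hidden layers, because it trains much faster than saturating activations such as sigmoid and tanh (the AlexNet paper reports roughly 6x faster convergence than tanh), and softmax in the output layer to turn the 18 outputs into class probabilities.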

**Total Number of Free Parameters:**

Total params: 25,704,338

Trainable params: 25,651,218

Non-trainable params: 53,120
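As with the custom model, these totals can be reconstructed. The sketch assumes the standard figures for `ResNet50(include_top=False)` (23,587,712 parameters, of which 53,120 are non-trainable BatchNorm statistics) and the head from the Phase 1 cell: global average pooling (no parameters), `Dense(1024)`, dropout (no parameters), and `Dense(18)`.

```python
RESNET50_NO_TOP_TOTAL = 23_587_712   # ResNet50(include_top=False) parameters
RESNET50_BN_NON_TRAINABLE = 53_120   # BatchNorm moving means and variances


def dense_params(n_in, n_out):
    return n_in * n_out + n_out      # weights + one bias per unit


# GAP outputs 2048 features; pooling and dropout add no parameters
head = dense_params(2048, 1024) + dense_params(1024, 18)
total = RESNET50_NO_TOP_TOTAL + head
trainable = total - RESNET50_BN_NON_TRAINABLE
print(f"head {head:,}, total {total:,}, trainable {trainable:,}")
```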

## Analysis of Model
**How well is my architecture doing (overall accuracy)?** 0.8246 Validation Loss

**Where does my model make the most mistakes?** The model had a really hard time discerning between labels 11, 13, 14, and 15 ('OldCabellHall', 'RiceHall', 'Rotunda', 'ScottStadium'), as identified in the confusion matrix by the count of inaccurate predictions per category.

**Which classes can be improved?** All of them could definitely be improved. I fear my validation set is skewed toward having too many images from class 17 ('UniversityChapel'), which would let the model blindly predict label 17 more often and still appear "correct" against the validation set.

# Step 5: Reflection

Write at least a paragraph answering these prompts: How did your own network perform in comparison to the pre-trained one? What are the major differences between the architectures? Additionally, report on your experience implementing different models for this assignment (Was it hard/easy/fun?, from which part did you learn the most?)!

My network performed worse than the pre-trained one. My model used many filters in a few stacked convolution layers, whereas the pre-trained one relied on layer depth (~180 layers) to learn. I found this assignment very challenging. There does not seem to be any reason or rationale for which layers to use, how to arrange them, and how to set hyperparameters like nodes, filters, kernels, and strides. It would have been nice to see examples of grid search or something similar to tune these models. I guess the problem is that neural nets are by nature "black boxes", so it is hard for me to infer what should be changed when looking at the output, because it is so damn sensitive and seemingly irrational. I want to take an online course on deep learning network architecture to figure some of this out because I think it is cool; I just don't know a good way to do it, and I do not want to go to grad school to find out.

**Note for the Grader**
Please consider my Piazza post [here](https://piazza.com/class/l6yw646y6u617h/post/279) as justification for why this assignment was turned in late.
I kept trying to come up with my own architecture for Step 3, but nothing ever got more than 0.35 accuracy. [This is why I just implemented the AlexNet architecture; I want to be transparent that I am not claiming that design as my original work, for plagiarism purposes.] The amount of testing I was doing used up my GPU credits for the last day, and because the 'franken_model' has so many layers, training it completely eliminated any remaining GPU credits. I have been locked out of the GPU from 10:30pm 11/30 until at least 5:00pm 12/1. I was temporarily out of credits on my personal Google account as well, but that blackout was shorter.