# Assignment 8: 75 points 
## Neural Networks: Advanced Techniques and Convolutional Neural Network

## Special Notice: for this last assignment, you may take 2 weeks to work on it if you wish to see/read the material in Week 10 for the last task, but that is optional.  I have given you enough information to do the last task even before Week 10 readings and lectures. 

### IMPORTANT: 
#### You MUST read everything in tnis notebook CAREFULLY, including ALL code comments.  If you do not, then you may easily make mistakes.

Be sure to review the class slides if you need to. (But read the comments in this notebook first.)

Detailed documentation for Transfer Learning, written by the creator of Keras, is here:

https://keras.io/guides/transfer_learning/ 

Keras API Topic Reference is here:

https://keras.io/api/

For the last task in this assignment, it is not necessary, but if you want to see all of the options for convolutional layers you can look at the documention here:

https://keras.io/api/layers/convolution_layers/convolution2d/

And for pooling layers (again not really necessary) you can look at:

https://keras.io/api/layers/pooling_layers/

### Heads up: Some of these models will take some time to run, so use playsound (or beepy) and make productive use of your time while you are waiting for them to finish training.

In [44]:
# Task 1: 5 points.  Set up environment

####################################################################################
# If some of these do not import properly, you may need to install them and re-run #
####################################################################################
!pip install playsound
import keras
import sklearn
import tensorflow
import time
    
import matplotlib         as mpl   
import matplotlib.pyplot  as plt
import numpy              as np   
import pandas             as pd
import beepy              as bp
import playsound          as ps

from keras.datasets          import cifar10  
from pprint                  import pprint   

from sklearn.cluster         import KMeans
from sklearn.decomposition   import PCA
from sklearn.ensemble        import BaggingClassifier, ExtraTreesClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model    import SGDClassifier, LogisticRegression
from sklearn.metrics         import confusion_matrix, precision_recall_curve, precision_score, recall_score, f1_score, silhouette_score, homogeneity_score, completeness_score
from sklearn.model_selection import cross_val_predict, cross_val_score, GridSearchCV
from sklearn.neural_network  import MLPClassifier
from sklearn.pipeline        import make_pipeline
from sklearn.preprocessing   import StandardScaler
from sklearn.svm             import LinearSVC, SVC
from sklearn.tree            import DecisionTreeClassifier, export_graphviz
from sklearn.model_selection import train_test_split

from yellowbrick.classifier  import ClassBalance, ClassificationReport, ClassPredictionError, ConfusionMatrix
from yellowbrick.cluster     import SilhouetteVisualizer

from tensorflow.keras.callbacks import EarlyStopping

np.random.seed(42) 

%matplotlib inline 


# Make this notebook's output stable across runs
tensorflow.random.set_seed(42)
np.random.seed(42)

'Done'

SyntaxError: invalid syntax (3222271599.py, line 7)

In [42]:
# Load and prep the CIFAR-10 data

# Note that I have added new lines here 
# that relate to the use of early stopping
# AND I have deleted the lines that created the 
# input data which we do not need for these neural networks

(X_train, y_train), (X_test, y_test) = cifar10.load_data() 

X_train  = X_train.astype('float32')
X_test   = X_test.astype('float32')

# Normalize the data
X_train /= 255.0  # The largest number is 255, and the smallest 0
X_test  /= 255.0  # So this division will normalize the data.

X_valid, X_train = X_train[:5000], X_train[5000:] # 1st 5000 for validation in early stopping #
y_valid, y_train = y_train[:5000], y_train[5000:] # X_train and y_train for early stopping    #

# We also have to use ravel to change the target values (the values we want to predict). 
y_train = np.ravel(y_train)
y_valid = np.ravel(y_valid)
y_test  = np.ravel(y_test)

LABEL_NAMES = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

'Done' 


'Done'

In my chapter 11 lecture slide called 'Network Architecture using Batch Norm' you can see some Keras code for a small network.  For this model we will build our first serious, deep model which will also use Batch Normalization (BN).  There are numerous, significant benefits of BN, so it's important to know how to use it.  One of those benefits is that you can often use a larger learning rate, which helps to significantly reduced training time.  Let's try that in this next model.  

BN is easily implemented, simply by adding the BN layer before each Dense layer.  

In [43]:
# Task 2: 20 points

# Here we will build a BN neural net with 7 hidden layers with 300, 200, 100, 90, 
# 80, 70, and 60 neurons.  The math works out to be 1,044,998 neurons with 
# 7,944 of them not trainable.  Those are the parameters associated with the BN equations.
# With over a million trainable parameters you might expect this to run quite slowly,
# and if you just use the default hyperparameters it will run slowly.  However,
# one of the many benefits of BN is that you can often use a larger learning rate,
# which can dramatically reduce training time.  The default learning rate is 0.001
# but we will use a rate that is 5 times larger than that.

# We will also use early stopping to take advantage of the larger learning rate.
# This is an easy way to modify the default learning rate:

opt = keras.optimizers.Adam(learning_rate=0.005)    # default 0.001

# 5 points: Create a sequential model with 7 hidden layers of 300, 200, 100, 90, 80
#           70, and 60 neurons.  All of them should use the 'elu' activation function
#           and 'he_normal' kernel initializer.  You can refer to assignment 7
#           if you forget how to set those parameters.
#           Of course you will have a Flatten layer in the beginning that accepts
#           an input shape of [32, 32, 3] and a Dense output layer using 'softmax'
#           activation for its 10 neurons.
# 2 points: Add a BatchNormalization layer BEFORE each of the hidden layers, 
#           as well as a BatchNormalization layer in front of the output layer. 
#           Don't change any of the BN parameter default settings. 
# 2 points: In Keras you can give your own customized name to each layer
#           by setting the parameter 'name' to a string variable,
#           for example: name = 'My_Layer_1', or anything else as long is
#           it does not use any spaces. So add a name of your choice to each 
#           of your layers. This helps when looking at the summary.

#   Save the above architecture as the value of the variable cifar10_BN

# 1 point:  Call the summary method of cifar10_BN and verify that you see
#           1,037,054 trainable parameters and 7,944 non-trainable.
# 3 points: Compile this model using sparse_categorical_crossentropy loss,
#           the 'accuracy' metric, and set optimizer = opt, which I 
#           already defined above for you.
# 2 points: Define your early stopping callback as you did in the previous
#           assignment with restore_best_weights set to True, but set patience to 3
#           Save that callback into the variable early_stopping_cb
# 4 points: Call the fit method on cifar10_BN.  Because we are using early
#           stopping you will need to pass it X_train and y_train and use
#           both X_valid and y_valid for your validation data to trigger
#           early stopping. Don't forget to add the reference to early_stopping_cb
#           inside your call to the fit method. Consult Assignment 7 if you need to.
#           Save the results into variable cifar10_BN_History
# 1 point:  Before and after your fit method, add the necessary code
#           to capture the start and stop time, print out the elapsed time
#           print out the accuracy score on the test data, and call playsound.

####################  insert your code below  ####################


opt = keras.optimizers.Adam(learning_rate=0.005)                                                                     # Defining the learning rate for the optimizer.


model = keras.Sequential()                                                                                           # Creating a blank sequential model.


model.add(keras.layers.Flatten(input_shape=(32, 32, 3), name='Input_Layer'))                                         # Adding the input layer.


layer_sizes = [300, 200, 100, 90, 80, 70, 60]                                                                        # Add BatchNormalization and hidden layers.
for i, size in enumerate(layer_sizes):
    model.add(keras.layers.BatchNormalization(name=f'BatchNorm_{i+1}'))
    model.add(keras.layers.Dense(size, activation='elu', kernel_initializer='he_normal', name=f'Hidden_Layer_{i+1}'))# Making a for loop to add each hidden layer.


model.add(keras.layers.BatchNormalization(name='BatchNorm_Output'))                                                  # Adding the Output layer.
model.add(keras.layers.Dense(10, activation='softmax', name='Output_Layer'))                                         # Adding output layer with softmax activation.

opt = keras.optimizers.Adam(learning_rate=0.005)
cifar10_BN.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])                      # Compile the model.

model.summary()                                                                                                      # Display model summary.


early_stopping_cb = keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True)                             # Define early stopping callback.


start_time = time.time()                                                                                             # Train the model with early stopping.

cifar10_BN_History = cifar10_BN.fit(
    X_train, y_train,
    epochs=100,  
    validation_data=(X_valid, y_valid),
    callbacks=[early_stopping_cb]
)

end_time = time.time()                                                                                               # Grabbing the end time.
elapsed_time = end_time - start_time
print(f"Elapsed time: {elapsed_time:.2f} seconds")

test_loss, test_accuracy = cifar10_BN.evaluate(X_test, y_test)                                                       # Evaluate the model on test data
print(f"Test accuracy: {test_accuracy:.4f}")                                                                         # Printing out the test accuracy.


playsound("/Users/matthewivezaj/Desktop/yourcodeisdonerunning.m4a")                                                  # Play a sound to indicate completion

# Don't forget to add your code for timing and output messages
####################  insert your code above  ####################

Model: "sequential_16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_layer (Flatten)       (None, 3072)              0         
                                                                 
 hidden_layer_1 (Dense)      (None, 300)               921900    
                                                                 
 batch_norm_1 (BatchNormaliz  (None, 300)              1200      
 ation)                                                          
                                                                 
 hidden_layer_2 (Dense)      (None, 200)               60200     
                                                                 
 batch_norm_2 (BatchNormaliz  (None, 200)              800       
 ation)                                                          
                                                                 
 hidden_layer_3 (Dense)      (None, 100)             


    Error 277 for command:
        open "C:\Users\matth\OneDrive\Desktop\yourcodeisdonerunning.m4a"
    A problem occurred in initializing MCI.

    Error 305 for command:
        close "C:\Users\matth\OneDrive\Desktop\yourcodeisdonerunning.m4a"
    Cannot specify extra characters after a string enclosed in quotation marks.
Failed to close the file: "C:\Users\matth\OneDrive\Desktop\yourcodeisdonerunning.m4a"


PlaysoundException: 
    Error 277 for command:
        open "C:\Users\matth\OneDrive\Desktop\yourcodeisdonerunning.m4a"
    A problem occurred in initializing MCI.

For me that only ran for 11 epochs (about 10 minutes) and had accuracy of 0.5231999754905701, only slightly better than my best previous model, though your results may be somewhat different.  

#### Transfer Learning
We have discussed transfer learning before and I also discuss it again in my Chapter 11 lecture.  The idea is to load a pretrained model that operates on data that is similar to yours, and use it to recognize the low-level features of your data.  But you can delete the top layer(s) from the pretrained model, and replace it with your own top layer(s) in order to fine tune it for your specific data.  For our next model, we will use a famous predefined model called ResNet-50, which you can download directly from the web.  Let's see how it does on the CIFAR-10 data, even though it was not trained on CIFAR-10.

ResNet-50 has 107 layers and was trained on ImageNet, which has (the last time I looked at it) more than 14 million high-resolution, color images in 1,000 categories. So the fact that it, and many other large models, are freely available to anyone to download and use from the web is a great service to the machine learning community.  Most individuals simply do not have enough computing power at home to train such models.

You can learn more about ImageNet at: https://www.image-net.org/about.php

In [16]:
# Load ResNet-50

# This cell will download ResNet-50 from GitHub in a minute or less.
# The parameter settings are telling it not to include the top layer
# because ResNet was trained to recognize 1,000 categories of images in
# ImageNet.  But we are INCLUDING the pretrained imagenet weights because 
# they will be used to recognize our low-level image features in CIFAR-10.
# We have to tell it that, for our data, the input shape is (32, 32, 3)
# or else it will assume input shapes of (224, 224, 3).
# Finally, we are telling it to use max pooling, which is a concept
# I will discuss in my Chapter 14 lecture on Convolutional Neural Nets.

resnet50_model = tensorflow.keras.applications.ResNet50(include_top=False,       weights="imagenet", 
                                                        input_shape=(32, 32, 3), pooling='max') # Max pooling flattens the last layer from 1,1,2048 to 2048
                                                                                              # and gives better results than avg pooling
resnet50_model.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
Model: "resnet50"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 32, 32, 3)]  0           []                               
                                                                                                  
 conv1_pad (ZeroPadding2D)      (None, 38, 38, 3)    0           ['input_1[0][0]']                
                                                                                                  
 conv1_conv (Conv2D)            (None, 16, 16, 64)   9472        ['conv1_pad[0][0]']              
                                                                                                  
 conv1_bn (BatchNormalization)  (None, 16, 16, 

Notice how many parameters there are -- nearly 24 MILLION!  Also notice that 53,120 are not trainable because they are parameters related to the Batch Normalization layers which you can see in the summary output.  Take a few moments to scan through that summary and look at the different types of layers.  

Since this model is already trained, we will only use it to convert our CIFAR-10 images into a flat vector of 2048 numbers.  Notice that the final layer in the summary above shows that the output (after removing the top layer) has shape (1, 1, 2048), i.e. 1 row with only 1 column and a third dimension of 2048 features. 

So instead of using the 3072 features of CIFAR-10, we will use those 2048 in a similar manner to how we previously used a dimensionally-reduced set of features created by an autoencoder or by using principal component analysis.  But we would expect the ResNet features to give us better accuracy since they encode the low-level feature knowledge that ResNet-50 learned when it was trained on the massive ImageNet dataset.  

In [17]:
# ATTENTION: This cell will take time to run, almost 7 minutes
# on my computer, so just let it run until finished.  It is slow
# due to a required preprocessing step described next.

# We need to re-create our training and test data because 
# ResNet-50 requires us to preprocess our data with a custom
# algorithm that does 2 things:
#     First, it changes the order of the 3 color channels.
#     CIFAR-10 uses RGB, but ResNet-50 wants to see BGR.
#     Second, the preprocessor will zero-center the data 
#     but does not scale it.  So you will not see our previous
#     code that normalized the data by dividing it by 255.

# You can see this preprocessing function below called 'preprocess_input'.
# There are several other large models, all available from the Keras
# website, that have their own custom preprocessors.

(X_train, y_train), (X_test, y_test) = cifar10.load_data() 

X_train  = X_train.astype('float32')
X_test   = X_test.astype('float32')

# Process the training and test data, as required by ResNet-50, and then
# use the predict method to convert our 3072 features down to 2048.
X_train = resnet50_model.predict(keras.applications.resnet.preprocess_input(X_train))
X_test  = resnet50_model.predict(keras.applications.resnet.preprocess_input(X_test))

X_valid, X_train = X_train[:5000], X_train[5000:] # 1st 5000 for validation in early stopping 
y_valid, y_train = y_train[:5000], y_train[5000:] # X_train and y_train modified for early stopping  

# We again use ravel to change the target values 
y_train = np.ravel(y_train)
y_valid = np.ravel(y_valid)
y_test  = np.ravel(y_test)

LABEL_NAMES = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

'Done' 




'Done'

In [18]:
# Let's look at the new shapes of our data:

print('X_train shape:\t', X_train.shape)
print('X_valid shape:\t', X_valid.shape)
print('X_test shape:\t',  X_test.shape)

print('\ny_train shape:\t', y_train.shape)
print('y_valid shape:\t', y_valid.shape)
print('y_test shape:\t',  y_test.shape)

X_train shape:	 (45000, 2048)
X_valid shape:	 (5000, 2048)
X_test shape:	 (10000, 2048)

y_train shape:	 (45000,)
y_valid shape:	 (5000,)
y_test shape:	 (10000,)


In [22]:
# Task 3: 10 points

# NEW TOP LAYER FOR THE TRANSFER MODEL

# 5 points: Set up the top layer with 2 Dense layers of 100 
#           neurons each and 'elu' activation.  Set parameter 
#           kernel_initializer to he_normal, and of course, you 
#           must add your Dense softmax output layer.  Make your 
#           first Dense layer the input layer by setting input_shape 
#           to [2048]. You have already done something similar 
#           in a previous assignment. Save the model into transferTop
# 1 point:  Call the summary method of transferTop
# 2 points: Compile transferTop with the sparse_categorical_crossentropy
#           loss function, the Adam optimizer, and accuracy for metrics.
# 2 points: Call the fit method of transferTop on the training data,
#           for 30 epochs, but define an early stopping callback
#           with patience=5 and use it with the validation data.  
#           Save your fitted model result into transferTop_History
#           You have also done something very similar before.

####################  insert your code below  ####################

startTime = time.perf_counter() 

# Definition of model
transferTop = keras.Sequential([                                                                        # Creating a sequential model with 2 dense layers and adding in a softmax activation.
    keras.layers.Dense(100, activation='elu', kernel_initializer='he_normal', input_shape=(2048,)),
    keras.layers.Dense(100, activation='elu', kernel_initializer='he_normal'),
    keras.layers.Dense(10, activation='softmax')  
])


# Summary
transferTop.summary()                                                                                   # Bringing up a summary of the model



# Compile

transferTop.compile(optimizer='adam',                                                                   # Compiling the model.
                    loss='sparse_categorical_crossentropy',
                    metrics=['accuracy'])

# Early stopping callback

early_stopping_cb = EarlyStopping(patience=5, restore_best_weights=True)                                # Defining the early stopping callback and assigning it to a variable.


# Fit
transferTop_History  = transferTop.fit(X_train, y_train,                                        # Fitting the model on the training data and passing some other params.
                                      epochs=30,
                                      validation_data=(X_valid, y_valid),  
                                      callbacks=[early_stopping_cb])


########################### Your code ends above ##############################


stopTime      = time.perf_counter()                                      
print(f'\nElapsed time: {stopTime - startTime:0.0f} seconds') 
print('Accuracy:', transferTop.evaluate(X_test, y_test))

playsound('yourcodeisdonerunning.m4a') 

# In assg 7, this took 289s with all 3072 features, giving 0.49950000643730164
# Now with 2048 features from Resnet50, it took 78 sec in only 9 iters for 0.6582000255584717, massively best model!


Model: "sequential_8"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_9 (Dense)             (None, 100)               204900    
                                                                 
 dense_10 (Dense)            (None, 100)               10100     
                                                                 
 dense_11 (Dense)            (None, 10)                1010      
                                                                 
Total params: 216,010
Trainable params: 216,010
Non-trainable params: 0
_________________________________________________________________
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30

Elapsed time: 18 seconds
Accuracy: [0.992353618144989, 0.6549000144004822]


NameError: name 'playsound' is not defined

In [None]:
# Now we must (yet again) reload and prepare our CIFAR-10
# data because we are done using ResNet-50

(X_train, y_train), (X_test, y_test) = cifar10.load_data() 

X_train  = X_train.astype('float32')
X_test   = X_test.astype('float32')

# Normalize the data
X_train /= 255.0  # The largest number is 255, and the smallest 0
X_test  /= 255.0  # So this division will normalize the data.

X_valid, X_train = X_train[:5000], X_train[5000:] # 1st 5000 for validation in early stopping #
y_valid, y_train = y_train[:5000], y_train[5000:] # X_train and y_train for early stopping  #

# Again use ravel to change the target values
y_train = np.ravel(y_train)
y_valid = np.ravel(y_valid)
y_test  = np.ravel(y_test)

LABEL_NAMES = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

'Done' 


The remaining tasks use an architecture called Convolutional Neural Networks (CNN) that was specifically designed for images.  The original images were the MNIST handwritten digits, but CNNs have evolved for color images and remain the dominant architecture for images.  This assignment is being given to you in Week 9, and we aren't discussing CNNs in class until next week.  So you will learn about them then, but that does not prevent us from actually building some now.  This will help generate your curiosity for next week. 

In [None]:
# Task 4: 10 points

# Baseline CNN Model

# Note: This baseline model will not give very good results!

# I have given you the start of some baseline code
# for a CNN.  The Conv2D layers refer to the actual convolutions
# you'll learn about next week.  These involve the creation of
# filters, which are like small windows that will slide across
# an image looking for imporant features, like lines, arcs, edges, etc.
# Where you see filters=32 that means there are 32 such filters
# that will learn to detect these low level features.  The kernel_size
# parameter defines the size of these filters, in this case 3 pixels by 3 pixels.
# You'll learn about padding next week. Convolutions are generally followed
# by a pooling layer, usually max pooling as you see here, or average pooling.
# Again you'll learn about these next week.  pool_size is also a
# measurement in pixels. 

# 6 points:  Copy the 3 layers here (2 convolutional layers followed
#            by a max pooling layer) and add 2 additional sets after
#            them for a total of 9 layers - 3 sets of 3 layers.
#            Remember that only the first convolutional layer accepts
#            the (32, 32, 3) input images, so remove that from all
#            the other convolutional layers.  You must also make one
#            additional change to the second and third pairs of convolution
#            layers.  In the second pair of convolutions, change the number
#            of filters from 32 to 64.  And in the third pair of convolutions
#            set the number of filters to 128.  After all this you will have
#            3 sets of 2 convolutions followed by a max pooling layer,
#            where set 1 has 32 filters in each convolution, the second set
#            has 64 filters each, and the third set has 128 filters each.
# 1 point:   After the 3rd max pooling layer, add a Flatten layer
#            with no arguments.  This will take the output of that last
#            max pooling layer and flatten it into a long vector, feeding
#            it to the next layer.
# 2 points:  After the Flatten layer, you'll add two Dense layers of 100
#            neurons each, using the same activation function and
#            kernel initializers that you see in the convolutional layers.
# 1 point:   Finally, add your Dense softmax output layer with 10 neurons.

# When you look at the summary, you should see that it has a total
# of 13 layers.

####################  insert your code below  ####################

model = keras.Sequential()                                                                                                                                # Creating the blank sequential model.

model.add(keras.layers.Conv2D(32, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same', input_shape=(32, 32, 3), name='Conv_Layer_1'))# Adding convolutional layer with 32 filters. 
model.add(keras.layers.Conv2D(32, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same', name='Conv_Layer_2'))                         # Adding convolutional layer with 32 filters. 
model.add(keras.layers.MaxPooling2D((2, 2), name='MaxPool_Layer_1'))                                                                                      # Adding maxpooling layer with input shape 2

model.add(keras.layers.Conv2D(64, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same', name='Conv_Layer_3'))                         # Adding convolutional layer with 64 filters. 
model.add(keras.layers.Conv2D(64, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same', name='Conv_Layer_4'))                         # Adding convolutional layer with 64 filters. 
model.add(keras.layers.MaxPooling2D((2, 2), name='MaxPool_Layer_2'))                                                                                      # Adding max pooling layer to match layers with 64 filters.      

model.add(keras.layers.Conv2D(128, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same', name='Conv_Layer_5'))                        # Adding convolutional layer with 128 filters. 
model.add(keras.layers.Conv2D(128, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same', name='Conv_Layer_6'))                        # Adding convolutional layer with 128 filters. 
model.add(keras.layers.MaxPooling2D((2, 2), name='MaxPool_Layer_3'))                                                                                      # Adding max pooling layer to match layers with 128 filters. 

model.add(keras.layers.Flatten(name='Flatten_Layer'))                                                                                                     # Adding flattened layer.

model.add(keras.layers.Dense(100, activation='elu', kernel_initializer='he_normal', name='Dense_Layer_1'))                                                # Adding dense layer with 100 neurons.
model.add(keras.layers.Dense(100, activation='elu', kernel_initializer='he_normal', name='Dense_Layer_2'))                                                # Adding dense layer with 100 neurons.

model.add(keras.layers.Dense(10, activation='softmax', name='Output_Layer'))                                                                              # Output layer with 10 neurons for classification.

model.summary()                                                                                                                                           # Displaying the summary.
####################  insert your code above  ####################

baseCNN.summary()

opt               = keras.optimizers.Adam(learning_rate=0.001) # This is the default learning rate for Adam

early_stopping_cb = keras.callbacks.EarlyStopping(patience            =3, 
                                                  restore_best_weights=True)

baseCNN.compile(loss       = "sparse_categorical_crossentropy",
                optimizer  = opt,
                metrics    =["accuracy"])
# Fit
startTime = time.perf_counter() 

baseCNN_History  = baseCNN.fit(X_train, y_train, 
                               epochs          = 20,         
                               validation_data = (X_valid, y_valid),
                               callbacks       = [early_stopping_cb]) 

stopTime      = time.perf_counter()                                      
print(f'\nElapsed time: {stopTime - startTime:0.0f} seconds') 
print('Accuracy:', baseCNN.evaluate(X_test, y_test))

playsound('yourcodeisdonerunning.m4a') 


Like I said above, this model does not give very good results for an architecture that is specifically designed for images, right?  On my laptop it only had accuracy of 0.39430001378059387 and stopped in 11 minutes after only 4 epochs.  But the next and final task will be to MODIFY this model which will result in your best model BY FAR.  For that final model, you must write ALL of the code yourself. Of course, MUCH of it you have already done for other assignments and tasks, so you could even copy a portion from other cells and modify it for various parameter settings and variable names, etc.  But be careful not to make simple errors!

When you think it's ready to run, double-check that you have:

* defined your model
* saved it into the variable revisedCNN
* defined your (optional, but strongly advised) early stopping callback (I strongly suggest a patience of 3 epochs)
* compiled your model with the loss function, a choice of optimizer, and accuracy for the evaluation metric
* called the fit method on the training data
* specified the maximum number of epochs
* added a reference to the validation data for your callback (if you decide to use it)
* added a reference to your early stopping callback (if you decide to use it)

### Also make sure you do the REQUIRED step of retraining your final, preferred model for at least 10 epochs as described below.

It's totally up to you if you want to add the timing and playsound alert. 

In [None]:
# Task 5: 30 points 

# Revised CNN Model

# Your final task is to REVISE the model just used and see how much better
# you can make it with some combination of the following techniques.  By now,
# you have already either used these techniques, or you have at least
# seen them in the textbook.  

# Here are some things to experiment with:
# 1. Learning Rate
# 2. Adding BatchNormalization layers, which you have already done above for Task 2
#    (I strongly suggest you do this one to get better accuracy.)
# 3. Adding Dropout layers.  You won't see any in the ResNet-50 summary, but
#    Dropout is a common regularization technique to reduce overfitting.  You can 
#    find code examples in the textbook or in Géron's notebook for chapter 11. You 
#    can experiment with different values for the dropout rate. See Géron for advice.
# 4. Adding additional Conv2D layers, followed by a pooling layer.
# 5. You can try an AveragePooling2D layer instead of MaxPooling2D. For this,
#    you can consult: https://keras.io/api/layers/pooling_layers/average_pooling2d/ 
# 6. For your Conv2D layers, you can try changing the kernel_size 
#    to (5, 5) instead of (3, 3)
# 7. Anything else you can think of.

# You do NOT have to try all of these techniques.  They are only ideas.

# STRONGLY SUGGESTED:  If you add several of these changes all at once,
#                      then if the model accuracy is either better or worse
#                      you will not know which technique made it better/worse.
#                      Thus, try adding one at a time, train the model and 
#                      check the accuracy results.  Some things may not give
#                      any benefit at all or may make it MUCH better.
#                      Because you will therefore be doing numerous training runs,
#                      to avoid waiting huge amounts of time, try to run
#                      the baseline model for only 2 epochs, and each revised one
#                      for only 2 epochs.  Then evaluate the results and continue.

# REQUIRED:            For your FINAL model after you decide which changes to keep,
#                      train it for at least 10 epochs (more, if you prefer)
# BE AWARE that this final model will likely take a LONG time to train.  Mine took
#    2 hours, so I STRONGLY advise you to use early stopping so you don't have
#    to wait any longer than necessary.

# Without much effort at all, I was easily able to get my own modified base model
# to achieve more than 78% accuracy, so I stopped there.  Some of you may be able 
# to do better than that, if you are willing to spend the time.  I have seen 
# references to CNN models achieving better than 90% accuracy on the CIFAR-10 data!

# While your grade for this task does NOT depend on doing better than mine, you
# should easily be able to do better than the baseline.  

# REQUIRED: For each model that you try, add some comments to the notebook 
# that BRIEFLY describe the changes and what the final accuracy was for that model.

# Make sure your final code is the model definition that gave you your best results.

# Begin by copying the code for the baseline model into this cell from the previous task,
# run it for only 2 epochs as suggested above, make a note of the results, and then start
# making modified versions until you find something better.  

# When you start making modifications, change the variable name of your model to
# revisedCNN.  Do NOT continue to use the variable baseCNN

# The more brief written summaries/results of models that I see in your comments,  
# the more of the 30 points you will receive.  But you only need to show me your 
# final, best model, which I will also run on my own computer.

# Have fun!!

##########  Your Code Below ###############








##########  Your Code Above ###############


# One last warning:  CAREFULLY read my comments in this entire cell 
# to avoid errors.  ALSO, re-read the checklist in the markup cell 
# just above this cell to avoid missing something important!!!!!!!