# Assignment 4: 75 points (+ 10 extra credit for SVM kernel trick)
## Support Vector Classifier, L1 & L2 Regularization, more on Cross Validation.

### IMPORTANT: 
#### You MUST read everything in this notebook CAREFULLY, including ALL code comments.  If you do not, then you may easily make mistakes.

This week we will build a Support Vector Machine as described in our textbook, and evaluate its expected performance using cross-validation, but we will not be combining cross-validation with hyperparameter tuning this time.

Be sure to review the class slides if you need to. (But read the comments in this notebook first.)

You may need to consult the following documentation URLs in order to complete this assignment:

Documentation for LinearSVC: https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html

Documentation for cross validation: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html


### Important note:

The original version of this notebook used a package called 'beepy', but that package will not work if you are using a Python version higher than 3.7.  Thus, the notebook was updated to use a package called 'playsound'. It will be used in Task 7 and the optional Task 8. Instructions for installation and the downloadable audio file were included in Assignment 3.

In [None]:
# Task 1: 5 points.  Set up environment

# If some of these do not import properly, you may need to install them and re-run

import keras
import playsound
import sklearn
import tensorflow
import time
    
import matplotlib         as mpl   
import matplotlib.pyplot  as plt
import numpy              as np   
import pandas             as pd

from keras.datasets          import cifar10 
from playsound               import playsound
from pprint                  import pprint   
from sklearn.linear_model    import SGDClassifier, LogisticRegression
from sklearn.metrics         import confusion_matrix, precision_recall_curve, precision_score, recall_score, f1_score
from sklearn.model_selection import cross_val_predict, cross_val_score, GridSearchCV
from sklearn.pipeline        import make_pipeline
from sklearn.preprocessing   import StandardScaler
from sklearn.svm             import LinearSVC, SVC
from yellowbrick.classifier  import ClassBalance, ClassificationReport, ClassPredictionError, ConfusionMatrix, ROCAUC

np.random.seed(42) 

%matplotlib inline 

'Done'

In [None]:
# Task 2: 5 points 

####################################################################################################
#### Add your code below to load and preprocess the CIFAR-10 dataset as in previous assignments ####
####           including a random seed of 42, all training data, test data and LABEL_NAMES      ####
####################################################################################################





In [None]:
# Task 3, 10 Points

# You will be studying the support vector machine (SVM) this week,
# which is implemented in scikit-learn by LinearSVC.

# Last week, when you implemented a logistic regression classifier,
# you noticed that it ran quite slowly. You will discover that
# LinearSVC is slow for a similar reason.  Both algorithms are, at their
# core, binary classifiers, i.e. they can only distinguish 2 categories.
# LinearSVC deals with the ten categories of the CIFAR-10 dataset by using
# a one-vs-others (one-vs-rest) approach, which is its default, and logistic
# regression does the same when it is configured for one-vs-rest.
# Thus, TEN binary models are built, each recognizing only one of the
# 10 CIFAR-10 categories, e.g. the 'bird' classifier will recognize
# pictures of birds vs. everything else.

# from sklearn.svm           import LinearSVC
# from sklearn.pipeline      import make_pipeline
# from sklearn.preprocessing import StandardScaler

# It is highly recommended to scale the input data for an SVC, usually with
# sklearn's StandardScaler (which rescales each feature to a mean of 0 and
# a standard deviation of 1).  But we have already normalized our CIFAR data
# using the min-max method, so it ranges between 0 and 1.  If you standardize
# that to center it at 0, it will actually do slightly worse than just
# keeping it normalized as is.

# n_jobs is not supported by LinearSVC, so you will have to wait a bit
# for this, more than 10 minutes on my computer.

# 4 points: Create a LinearSVC model with 'random_state' set to 42
#           and 'max_iter' set to only 100, which will stop way too early
#           and likely generate a warning, but will save you a LOT of wait time!
# 4 points: Call its fit method to train the model on X_train_flat, y_train
# 2 points: Print the accuracy score of svcModel with an appropriate message 

startTime  = time.perf_counter() 
####################  insert your code below for 10 points ####################

svcModel = 



##########################  insert your code above ############################
stopTime   = time.perf_counter()                                    
print(f'\nElapsed time: {stopTime - startTime:0.0f} seconds')
playsound('yourcodeisdonerunning.m4a') 

# You will see a score that is much worse than the approx. 40% accuracy with
# logistic regression, but that's ok for now.  We are just practicing building
# different models.  We will do better than 40% later.
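
To make the one-vs-others idea from Task 3 concrete, here is a minimal sketch on a small synthetic dataset. The synthetic data is an assumption standing in for CIFAR-10, and the names `X_demo`, `y_demo`, and `demo_svc` are hypothetical. It only checks that a fitted LinearSVC holds one row of coefficients per class, which is what you would expect if one binary model is built per class.

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Small synthetic 4-class problem (a stand-in, NOT the CIFAR-10 data)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=10,
    n_classes=4, random_state=42,
)

demo_svc = LinearSVC(random_state=42, max_iter=2000)
demo_svc.fit(X_demo, y_demo)

# One row of coefficients and one intercept per class:
# each row is the binary "this class vs. everything else" model.
print('coef_ shape:     ', demo_svc.coef_.shape)       # (n_classes, n_features)
print('intercept_ shape:', demo_svc.intercept_.shape)  # (n_classes,)
```

With the ten CIFAR-10 classes, the same attribute would have ten rows, one per category.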

In [None]:
# Task 4: 10 Points

# CROSS VALIDATION Revisited

# Previously we performed grid search using cross validation
# for hyperparameter tuning via the GridSearchCV class.
# However, we can also run cross validation with cross_val_score
# without tuning any hyperparameters, to get a more reliable estimate
# of the model's performance than a single train/test evaluation gives.

# Here is the documentation to get the correct parameter names for cross_val_score:
# https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html

# 2 points:  Define svcModel the same way as in the previous cell.
# 8 points:  Call cross_val_score on svcModel and the training data, using 3-fold 
#            cross validation and n_jobs = -1 (both will save time), 
#            'accuracy' for scoring, and set 'verbose' to True which will 
#            print the elapsed time without you computing it yourself.

####################  insert your code below for 10 points ####################

svcModel   =  
xValScores = 

##########################  insert your code above ##########################

print('\nCross validation scores are:\t', xValScores, '\nTheir mean is:\t\t\t', 
      xValScores.mean(), '\nTheir standard deviation is:\t', xValScores.std())

# On my laptop that ran in LESS time than the previous cell,
# even though we trained the model only once there and 3 times here.
# This is because each of the 3 models was trained on only 2/3 of
# the data, and because n_jobs=-1 let the 3 fits run in parallel on
# separate cores (my laptop has 8). If you have fewer cores available,
# it will likely take longer.
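
As a rough illustration of what 3-fold cross validation does, the sketch below runs `cross_val_score` on a small synthetic dataset with a different estimator than the one in Task 4. The data and the names `X_demo`, `y_demo`, `demo_model`, and `fold_sizes` are hypothetical. It also prints the fold sizes, to show that each of the 3 fits trains on about 2/3 of the samples.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Small synthetic 3-class problem (a stand-in, NOT the CIFAR-10 data)
X_demo, y_demo = make_classification(
    n_samples=600, n_features=20, n_informative=10,
    n_classes=3, random_state=42,
)

demo_model = SGDClassifier(random_state=42)

# cv=3 trains 3 separate models, each on the 2 folds that are not held out
demo_scores = cross_val_score(demo_model, X_demo, y_demo, cv=3, scoring='accuracy')
print('scores:', demo_scores)
print('mean:', demo_scores.mean(), 'std:', demo_scores.std())

# For a classifier with an integer cv, scikit-learn uses stratified folds,
# so StratifiedKFold is used here to show the sizes of each split.
fold_sizes = [
    (len(train_idx), len(val_idx))
    for train_idx, val_idx in StratifiedKFold(n_splits=3).split(X_demo, y_demo)
]
print('(train, validation) sizes per fold:', fold_sizes)
```

The sketch leaves out `n_jobs` and `verbose`, which the task asks for; they change how the work is scheduled and reported, not the scores themselves.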


In [None]:
# Task 5, 10 Points

# Investigate both L1 and L2 Regularization

# You may need to consult the documentation at:
# https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html

# We will build several models with different values for alpha, which
# controls the strength of the regularization. From chapter 4 and my lectures,
# you will learn that if alpha is large enough, L1 regularization will force
# some of the coefficients to exactly zero, which might be helpful
# if their associated input variables (the pixel values for our CIFAR-10 data)
# are not very useful for predictions.
# L2 regularization shrinks coefficients toward zero but generally does not
# make them exactly zero; its advantage is when most or all variables
# are helpful in making predictions.

# We will switch back to SGDClassifier because it runs MUCH faster than
# LinearSVC or logistic regression.  Here are some of the parameters to use:
# loss = 'modified_huber', n_jobs = -1, random_state = 42

# When alpha = 0, no regularization is used at all.
# However, setting alpha = 0 raises an error here because, with the default
# learning-rate schedule, alpha is also used to compute the learning rate.
# So to avoid that error, we will set alpha to a VERY small number, effectively
# eliminating any effect of regularization. We will therefore use:
# alpha=0.0000000000000001
# If you get an error using that value, delete one of the zeros and try again (and again..)

# 8 points: Set baselineSGD to an SGDClassifier with the parameters just described.
# 2 points: Call the fit method of baselineSGD on the training data.

####################  insert your code below for 10 points ####################

baselineSGD = 



###########################  insert your code above ##########################

print('\nFor alpha = 0.0000000000000001,')
print(' Baseline score on test data is:\n', baselineSGD.score(X_test_flat, y_test))   
print(' Baseline coefficients are:\n') 
pprint(baselineSGD.coef_)   # Look at the learned coefficients
                            # You should not see any coefficient with value of 0.0
    
# When I did this, I had a baseline accuracy score of 0.273
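
The alpha = 0 restriction mentioned in Task 5 can be seen in a small sketch like the one below. It assumes SGDClassifier's default learning-rate schedule and uses a hypothetical synthetic dataset. The names `alpha_zero_error` and `tiny_alpha_model` are made up for illustration, and the tiny alpha here is just an example value, not the one the task asks for.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Small synthetic binary problem (a stand-in, NOT the CIFAR-10 data)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=42)

alpha_zero_error = None
try:
    # Depending on the scikit-learn version, the check may run at
    # construction time or at fit time, so both are inside the try.
    SGDClassifier(alpha=0.0, random_state=42).fit(X_demo, y_demo)
except ValueError as err:
    alpha_zero_error = err
    print('alpha=0 was rejected:', err)

# A very small positive alpha is accepted and trains normally
tiny_alpha_model = SGDClassifier(alpha=1e-8, random_state=42).fit(X_demo, y_demo)
print('tiny-alpha training accuracy:', tiny_alpha_model.score(X_demo, y_demo))
```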

In [None]:
# Task 6: 10 Points   

# You may need to consult the documentation at:
# https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html

# Now we will compare that baseline to models that use either L1
# or L2 regularization, to help reduce overfitting. We will
# look at the coefficients because we want to see if any of them are forced
# to have zero values when we use L1, assuming that alpha is large enough
# for that to happen.

# Define and train an L1 regularized SGDClassifier and 
# another one with L2 and display the accuracy scores of both 
# to compare their performance. Use 0.0008 for the value of alpha.  
# Use 'pprint' (NOT 'print') to print the values of the model coefficients.

# For loss, n_jobs and random_state use the same values as the previous cell.

# To build a classifier with L1 use: 
# penalty='l1'
# and for L2 use:
# penalty='l2'

# On my computer this takes less than 3 minutes.

# 5 points: Set l1SGD to the SGDClassifier described above using L1 penalty
#           Call its fit method to train it.
#           Print the trained model's score with some appropriate message.
#           Use 'print' to print some appropriate message about the coefficients.
#           Use 'pprint' to print those model coefficients. To access the
#           coefficients reference the 'coef_' property of the trained model,
#           which you can see in the previous cell.
#
# 5 points: Do the same thing again, but using l2SGD for L2 penalty

####################  insert your code below for 10 points ####################  

print('\nFor alpha = 0.0008') # Now enter the rest of the code yourself, per the description above

l1SGD = 





    
l2SGD = 





###########################  insert your code above ##########################


#### What do you see in the output?  
You should be able to see that many of the coefficients for the L1 model were squeezed down to 0, which means their associated pixels will be ignored.

On my computer I got better accuracy with the L1 model, suggesting that L1's underlying assumption, that only some of the pixels matter for prediction, holds for this setup.  However, I was able to find other parameter settings for which L2 consistently does better than L1. That is generally the case for most models, and it explains why L2 is the default regularization technique for SGDClassifier.
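
One way to build intuition for why L1 produces exact zeros while L2 does not is the simplified sketch below. It is a generic illustration under stated assumptions, not scikit-learn's internal update rule: `l1_shrink` applies soft-thresholding (subtract a fixed amount, stopping at 0) and `l2_shrink` applies multiplicative shrinkage. The weights and both helper functions are made up for this example.

```python
def l1_shrink(weight, threshold):
    """Soft-thresholding: move the weight toward 0 by a fixed amount, stopping at 0."""
    if abs(weight) <= threshold:
        return 0.0
    return weight - threshold if weight > 0 else weight + threshold


def l2_shrink(weight, rate):
    """Multiplicative shrinkage: scale the weight toward 0 by a fixed fraction."""
    return weight * (1.0 - rate)


demo_weights = [0.9, -0.4, 0.05, -0.02, 0.0005]   # made-up coefficients

l1_result = [l1_shrink(w, 0.1) for w in demo_weights]
l2_result = [l2_shrink(w, 0.1) for w in demo_weights]

print('L1:', l1_result)
print('L2:', l2_result)
print('exact zeros with L1:', sum(w == 0.0 for w in l1_result))
print('exact zeros with L2:', sum(w == 0.0 for w in l2_result))
```

Small weights fall inside the L1 threshold and become exactly 0, whereas the L2 step only makes every weight a bit smaller.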

In [None]:
# Task 7, 25 Points  

# You may need to consult the documentation at:
# https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html

# Elastic net regularization uses BOTH L1 and L2 in your model.
# For elastic net there is another parameter in sklearn called l1_ratio
# that ranges between 0 and 1 and controls the relative strength of
# L1 vs. L2. When l1_ratio = 0, L1 is ignored and the model uses only
# L2, and when l1_ratio = 1, L2 is ignored and the model uses only L1.

# So let's see if we can do better than the previous cell by 
# using elasticnet.  In my own experimentation, I found that
# with the models we have been using here, L1 does best with
# alpha values near 0.0008 and L2 does best around 0.0005.
# So let's try several different values for l1_ratio, all
# with alpha=0.00065, which is the midpoint of 0.0005 and 0.0008.

# Write a Python for loop to:
# 5 points: loop through the values [0.25, 0.33, 0.5, 0.67, 0.75] 
#           with loop variable L1Ratio
# 5 points: create an SGDClassifier using a penalty of 'elasticnet', 
#           alpha of 0.00065 for each model, l1_ratio set to the loop variable,
#           and the same values for loss, n_jobs, and random_state as before.
# 5 points: fit the model on the training data
# 5 points: print messages indicating the value of L1Ratio and the model's score
# 5 points: print messages about the elapsed time, and call playsound

# Also add code to print the elapsed time information and notify yourself with playsound

####################  insert your code below for 25 points ####################


print('For alpha = 0.00065') # Add the rest of the code yourself, per instructions above








###########################  insert your code above ##########################

# Notice how the accuracy fluctuates with each
# increasing value of L1Ratio through the loop.
# Hyperparameter tuning would be easier if it were more predictable!
# The reason I did not ask you to do hyperparameter tuning here
# is simply to save training time. Even so, this cell will be
# relatively slow, about 6 minutes on my computer.
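
To see how l1_ratio blends the two penalties, the sketch below computes an elastic net penalty by hand. It assumes the form I understand scikit-learn's SGD user guide to describe: alpha times a mix of the L1 term (sum of absolute weights) and the L2 term (half the sum of squared weights). The function `elastic_net_penalty` and the weights are hypothetical, written only for illustration.

```python
def elastic_net_penalty(weights, alpha, l1_ratio):
    """alpha * (l1_ratio * L1 term + (1 - l1_ratio) * L2 term)."""
    l1_term = sum(abs(w) for w in weights)        # sum of |w_j|
    l2_term = 0.5 * sum(w * w for w in weights)   # half the sum of w_j^2
    return alpha * (l1_ratio * l1_term + (1.0 - l1_ratio) * l2_term)


demo_weights = [1.0, -2.0, 0.0]   # made-up coefficients

for ratio in [0.0, 0.25, 0.5, 0.75, 1.0]:
    penalty = elastic_net_penalty(demo_weights, alpha=0.00065, l1_ratio=ratio)
    print(f'l1_ratio = {ratio:4.2f}  ->  penalty = {penalty:.6f}')
```

At l1_ratio = 0 only the L2 term contributes, and at l1_ratio = 1 only the L1 term does, which matches the description of the two extremes in Task 7.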

In [None]:
# Task 8: 10 extra credit points

# Polynomial SVC with Kernel Trick

# If you choose to do this optional task, you will be using
# a different support vector machine algorithm called 'SVC'.

# You will need to consult the documentation for SVC at:
# https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html 
# for the appropriate parameter names to use as arguments.

# 8 points: Define a polynomial support vector classifier of degree 5
#           Set coef0 to 5, C to 5 and max_iter to only 200 (to save time again)
# 2 points: Train the model 

# About 5 minutes on my computer
# Note: After you see the elapsed time message you will
#       STILL have to wait before you see the prediction score.
 
startTime   = time.perf_counter() 

####################  insert your code below for 10 points ####################

svcModel    = 


###########################  insert your code above ###########################

stopTime    = time.perf_counter()              
print(f'\nElapsed time:\t {stopTime - startTime:0.4f} seconds')  

print('SVC Score with Kernel Trick:', svcModel.score(X_test_flat, y_test))  # Even this is slow, so be patient!
playsound('yourcodeisdonerunning.m4a')

# You will see that this does not do very well as a classifier on this dataset,
# but we could certainly do better if we tuned several of the hyperparameters.
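
For readers curious about what the "kernel trick" in Task 8 refers to, here is a minimal sketch in plain Python. It assumes a polynomial kernel of the form (gamma * x·z + coef0) ** degree, with parameter names chosen to mirror SVC's. It uses degree 2 and 2-dimensional inputs rather than the degree 5 of the task, so that the explicit feature map stays small. The helper names and the sample points are hypothetical.

```python
import math


def poly_kernel(x, z, degree=2, coef0=1.0, gamma=1.0):
    """Polynomial kernel computed directly from the original features."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree


def explicit_features_degree2(x):
    """Explicit degree-2 feature map for a 2-D input (with coef0 = gamma = 1)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, r2 * x1 * x2, x2 * x2]


x_demo, z_demo = (1.0, 2.0), (3.0, -1.0)   # made-up points

kernel_value = poly_kernel(x_demo, z_demo)
explicit_value = sum(
    a * b
    for a, b in zip(explicit_features_degree2(x_demo),
                    explicit_features_degree2(z_demo))
)

print('kernel on original features:    ', kernel_value)
print('dot product of expanded features:', explicit_value)
```

The two numbers agree, which is the point of the trick: the kernel gives the dot product in the expanded feature space without ever building the expanded features. For degree 5 on thousands of pixel values, the explicit expansion would be far too large to construct.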