# Assignment 2: 75 points (+ 10 extra credit)
# Cross Validation and Hyperparameter Tuning for CIFAR-10

### IMPORTANT: 
#### You MUST read everything in tnis notebook CAREFULLY, including ALL code comments.  If you do not, then you may easily make mistakes.

In Week 1 we discussed Cross Validation and in Week 2 we discuss Hyperparameter Tuning.  In this assignment we will learn how to use Cross Validation to help you perform Hyperparameter Tuning.  Be sure to review the class slides if you need to.

You will need to consult the following documentation URLs in order to conplete this assignment:

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html 

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html



In [1]:
# Task 1: 5 points.  Set up environment

# As we did for Assignment 1, you will need the following imports.   
# If some of these do not import properly, you may need to install them
# and then re-run this cell.

import keras
import sklearn
import tensorflow
import time

import beepy              as bp    # for audio alerts
import matplotlib         as mpl   # for graphing
import matplotlib.pyplot  as plt
import numpy              as np    # for fast vector and matrix operations
import pandas             as pd

from keras.datasets          import cifar10  # The Keras package comes with several datasets, incl. CIFAR10
from pprint                  import pprint   # pprint means 'pretty print'.  You'll see why when we use it.
from sklearn.linear_model    import SGDClassifier
from sklearn.model_selection import cross_val_predict, cross_val_score, GridSearchCV

np.random.seed(42) # for reproducibility
# The next line tells Jupyter to show all plots inside the notebook
%matplotlib inline 

'Done'

'Done'

In [2]:
# Task 2: 10 points

# Load and prepare the CIFAR-10 dataset.
# You already did all of this in Assignment 1, so you can go back to that
# assignment and copy what you need into this cell.
# Be sure to include:
# 1. Setting the values of: X_train, y_train, X_test, and y_test
#    with X_train and X_test as normalized 'float32' numbers,
#    and with y_train and y_test converted to 1-dimensional vectors
# 3. Creation of X_train_flat and X_test_flat by reshaping X_train and X_test
#
# Again, ALL of this can be found in Assignment 1, so it should be
# an easy 10 points for you here.

# This cell needs: 
# import numpy as np
# from   keras.datasets import cifar10

np.random.seed(42) # Make this notebook's output stable across runs

#################### Insert your code below for 10 points ###############

(X_train, y_train), (X_test, y_test) = keras.datasets.cifar10.load_data()

# Normalize the data
X_train  = X_train.astype('float32')
X_test   = X_test.astype('float32')
X_train /= 255.0                      # The largest number is 255, and the smallest 0
X_test  /= 255.0                      # So this division will normalize the data.

X_train_flat = X_train.reshape(50000, X_train.shape[1]*X_train.shape[2]*X_train.shape[3])
X_test_flat  = X_test.reshape(10000, X_train.shape[1]*X_train.shape[2]*X_train.shape[3])

# We also have to use ravel to change the target values (the values we want to predict). 
# e.g. y_train has an original shape of (50000, 1), i.e. a 2-dimensional matrix, albeit with only one column.
# If we keep it in that shape, it will cause our modeling software to complain because 
# it wants the target values to appear as a 1-dimensional vector, which is what ravel will do for us.

y_train = np.ravel(y_train)
y_test  = np.ravel(y_test)
########################### Your code ends above ###########################

'Done' 

'Done'

In [3]:
# Task 3: 15 points

# Use CROSS VALIDATION to perform GRID SEARCH

# We have not yet studied Regularization, but we will later in this class.
# Just know that it's an important concept and there are hyperparameters
# to control it.  This assignment will help find good values for it. 

# We use grid search to find better values than the default for the hyperparameters 
# 'penalty' (for regularization, with default l2), 'alpha' (default 0.0001), 
# which controls the strength of regularization, and loss functions (default 'hinge')

# You may need to consult the following to complete this cell:
# https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html
# https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

# This cell needs the additional imports:
# import time
# from sklearn.linear_model     import SGDClassifier
# nfrom sklearn.model_selection import GridSearchCV

# First, set up your grid in the form of a Python dictionary.
# Each key is the string name of a hyperparameter, and its value
# is a list of values for that hyperparameter to use:
# For key 'loss', use values 'hinge' and 'modified_huber'
# For key 'penalty', use values 'l1' and 'l2'.  Those are the names
#    of two different regularizations that we'll study later.
# For key 'alpha', use the values: 0.0001, 0.001, 0.01, and 0.1
#    The value of alpha controls the strength of the regularization

# I have added the code already for the 'loss' functions.
# You need to add the code for 'penalty' and 'alpha'.

#################### Insert your code below for 5 points ##################
hp_grid    = {'loss':    ['hinge', 'modified_huber'], # Default is hinge
              'penalty': ['l1', 'l2'],
              'alpha':   [0.0001, 0.001, 0.01, 0.1]
             } 

########################### Your code ends above ###########################

# Note that with this grid, we will be building 16 models (2 * 2 * 4) 
# And all 16 models will be trained 3 times because of the 3-fold cross-validation

# Next, call SGDClassifier and pass it these arguments: 
# (look at the documentation for the exact argument names and perhaps other values)
# A maximum of 30 iterations. 

# You will likely see warning messages suggesting
# that you increase the number of iterations.  In the 'real world' you
# would probably use the default of 1000 iterations, but I'm having you
# use only 30 to save you a lot of time, while still getting practice.

# The maximum number of jobs (processors).  (We have already used this in Assignment 1)
# A random state of 42
# Save the classifer into the variable 'sgd'

# Then call GridSearchCV to create your grid search object, passing it the arguments:
# sgd
# hp_grid that you created above
# and a setting to set GridSearchCV to do 3-fold cross validation, again to save time
# over the default of 5-fold. 
# Save your grid search in the variable gridSearch

#################### Insert your code below for 5 points ##################
sgd        =     SGDClassifier(max_iter=30, n_jobs=-1, random_state=42)     
gridSearch =     GridSearchCV(sgd, hp_grid, cv=3)
########################### Your code ends above ###########################

# The variable gridSearch now encapsulates both your SGDClassifer as well as your 
# hyperparameter grid.  You can treat it just as if it was the name of a model
# that now needs to be fitted to the data.  The2efore, call gridSearch with its fit method
# and pass it the two arguments X_train_flat and y_train
# I have set up some timing, print statements and an audio alert for you.

startTime = time.perf_counter()                               # Capture the starting time 

########### Insert your grid search code below for 5 points ################
gridSearch.fit(X_train_flat, y_train)
########################### Your code ends above ###########################

stopTime  = time.perf_counter()                               # Capture the ending time
print(f'Elapsed time:\t {stopTime - startTime:0.4f} seconds') # Compute and display the time difference

print('Best params:\t', gridSearch.best_params_)              # Let's look at the best hyperparameter values.
print('Best estimator:\t', gridSearch.best_estimator_)        # You can get additional info this way, 
                                                              #including the best model's hyperparameter settings. 
bp.beep(sound='ping')                                         # To get your attention when this code is done




Elapsed time:	 234.9292 seconds
Best params:	 {'alpha': 0.1, 'loss': 'modified_huber', 'penalty': 'l2'}
Best estimator:	 SGDClassifier(alpha=0.1, loss='modified_huber', max_iter=30, n_jobs=-1,
              random_state=42)


In [4]:
# This code is adapted from Geron's chapter 2 cell, where he says: 
# "Let's look at the score of each hyperparameter combination tested during the grid search:"

gridResults = gridSearch.cv_results_

# show everything captured in the grid search cross-validation 
print('Plain print version of grid results: \n\n', gridResults)  # This will be kind of messy and hard to read


Plain print version of grid results: 

 {'mean_fit_time': array([7.8221519 , 4.05920045, 8.58809861, 4.04238025, 3.31697234,
       4.15598726, 8.80018234, 4.21493864, 1.89517617, 3.96844935,
       6.56839379, 5.32015308, 1.84254694, 1.96242428, 3.04668903,
       5.36755904]), 'std_fit_time': array([0.02033203, 0.01659742, 0.06956068, 0.01548768, 0.06983328,
       0.01964309, 0.46369796, 0.00755246, 0.01677323, 0.04872959,
       1.07619215, 0.01790813, 0.00685765, 0.09414567, 0.00616069,
       0.17603815]), 'mean_score_time': array([0.03847869, 0.03172286, 0.0284543 , 0.03680428, 0.04606692,
       0.03740414, 0.03186742, 0.030605  , 0.03459175, 0.03001968,
       0.03066969, 0.02849825, 0.03420432, 0.03267741, 0.02885365,
       0.03096652]), 'std_score_time': array([0.0059816 , 0.00219589, 0.00178622, 0.00474676, 0.00498781,
       0.00551577, 0.00298861, 0.00111484, 0.01088469, 0.00091648,
       0.00195335, 0.00014131, 0.00447988, 0.0054345 , 0.00437231,
       0.00373838]), '

In [5]:
# Task 4: 5 points

# 'pretty print' the gridResults and compare to the plain print version above
# from pprint import pprint   # pprint means 'pretty print'

print('Pretty Printed version:\n')

 # Just call the 'pprint' function and pass it gridResults as the argument
#################### Insert your code below for 5 points ##################
pprint(gridResults)
########################### Your code ends above ###########################

# You should notice that it's easier to read than the preious cell

Pretty Printed version:

{'mean_fit_time': array([7.8221519 , 4.05920045, 8.58809861, 4.04238025, 3.31697234,
       4.15598726, 8.80018234, 4.21493864, 1.89517617, 3.96844935,
       6.56839379, 5.32015308, 1.84254694, 1.96242428, 3.04668903,
       5.36755904]),
 'mean_score_time': array([0.03847869, 0.03172286, 0.0284543 , 0.03680428, 0.04606692,
       0.03740414, 0.03186742, 0.030605  , 0.03459175, 0.03001968,
       0.03066969, 0.02849825, 0.03420432, 0.03267741, 0.02885365,
       0.03096652]),
 'mean_test_score': array([0.28413955, 0.26485935, 0.28285973, 0.30557922, 0.24434035,
       0.31201946, 0.28337981, 0.28163977, 0.10207999, 0.33569939,
       0.30737989, 0.32753939, 0.09998   , 0.3301795 , 0.10002   ,
       0.37485992]),
 'param_alpha': masked_array(data=[0.0001, 0.0001, 0.0001, 0.0001, 0.001, 0.001, 0.001,
                   0.001, 0.01, 0.01, 0.01, 0.01, 0.1, 0.1, 0.1, 0.1],
             mask=[False, False, False, False, False, False, False, False,
                 

In [6]:
# In the preceding cell, notice that the 'params': section shows the hyperparameter 
# settings for each of the grid search models and 'mean_test_score': 
# gives the average of the cross validation splits for each of the 16 models

# Here are the mean test scores and their hyperparameter values for the models.
for test_score, params in zip(gridResults["mean_test_score"], gridResults["params"]):
    print(test_score, params)
    
# Just look at the average test score at the beginning of each line,
# and you will notice that the largest value is the line that shows
# what you saw above as output for the best parameter values:
# print('Best params:\t', gridSearch.best_params_)

0.2841395494183 {'alpha': 0.0001, 'loss': 'hinge', 'penalty': 'l1'}
0.2648593505988995 {'alpha': 0.0001, 'loss': 'hinge', 'penalty': 'l2'}
0.2828597322209322 {'alpha': 0.0001, 'loss': 'modified_huber', 'penalty': 'l1'}
0.30557922262891596 {'alpha': 0.0001, 'loss': 'modified_huber', 'penalty': 'l2'}
0.24434034540238034 {'alpha': 0.001, 'loss': 'hinge', 'penalty': 'l1'}
0.31201945623874033 {'alpha': 0.001, 'loss': 'hinge', 'penalty': 'l2'}
0.2833798058228203 {'alpha': 0.001, 'loss': 'modified_huber', 'penalty': 'l1'}
0.28163976862068424 {'alpha': 0.001, 'loss': 'modified_huber', 'penalty': 'l2'}
0.10207998888144161 {'alpha': 0.01, 'loss': 'hinge', 'penalty': 'l1'}
0.33569938945634864 {'alpha': 0.01, 'loss': 'hinge', 'penalty': 'l2'}
0.3073798922437488 {'alpha': 0.01, 'loss': 'modified_huber', 'penalty': 'l1'}
0.3275393906498445 {'alpha': 0.01, 'loss': 'modified_huber', 'penalty': 'l2'}
0.0999799996799776 {'alpha': 0.1, 'loss': 'hinge', 'penalty': 'l1'}
0.3301795046542367 {'alpha': 0.1, '

In [7]:
# For the best readability put the grid search results into a pandas dataframe,
# and you will see all available information
# import pandas as pd
pd.DataFrame(gridSearch.cv_results_)  

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_alpha,param_loss,param_penalty,params,split0_test_score,split1_test_score,split2_test_score,mean_test_score,std_test_score,rank_test_score
0,7.741982,0.05801,0.050847,0.003287,0.0001,hinge,l1,"{'alpha': 0.0001, 'loss': 'hinge', 'penalty': ...",0.298914,0.291894,0.26161,0.28414,0.016186,8
1,4.068912,0.040881,0.044123,0.01335,0.0001,hinge,l2,"{'alpha': 0.0001, 'loss': 'hinge', 'penalty': ...",0.269815,0.292374,0.232389,0.264859,0.024738,12
2,8.720941,0.058463,0.029858,0.001979,0.0001,modified_huber,l1,"{'alpha': 0.0001, 'loss': 'modified_huber', 'p...",0.281814,0.297294,0.269471,0.28286,0.011383,10
3,4.108213,0.068133,0.035931,0.006869,0.0001,modified_huber,l2,"{'alpha': 0.0001, 'loss': 'modified_huber', 'p...",0.336413,0.313614,0.266711,0.305579,0.029018,7
4,3.386056,0.095332,0.032073,0.005512,0.001,hinge,l1,"{'alpha': 0.001, 'loss': 'hinge', 'penalty': '...",0.224756,0.246655,0.26161,0.24434,0.015135,13
5,4.15482,0.011835,0.037901,0.002727,0.001,hinge,l2,"{'alpha': 0.001, 'loss': 'hinge', 'penalty': '...",0.343253,0.307974,0.284831,0.312019,0.024022,5
6,8.859801,0.478158,0.034725,0.010017,0.001,modified_huber,l1,"{'alpha': 0.001, 'loss': 'modified_huber', 'pe...",0.284514,0.291954,0.273671,0.28338,0.007507,9
7,4.213021,0.018744,0.028877,0.001055,0.001,modified_huber,l2,"{'alpha': 0.001, 'loss': 'modified_huber', 'pe...",0.299934,0.274915,0.270071,0.28164,0.013086,11
8,1.896812,0.011559,0.032462,0.007012,0.01,hinge,l1,"{'alpha': 0.01, 'loss': 'hinge', 'penalty': 'l1'}",0.104758,0.099958,0.101524,0.10208,0.001999,14
9,3.944738,0.042876,0.032075,0.002323,0.01,hinge,l2,"{'alpha': 0.01, 'loss': 'hinge', 'penalty': 'l2'}",0.329513,0.372413,0.305172,0.335699,0.027797,2


In [7]:
# Task 5: 10 points   

# Build another model using the best combination of hyperparameters
# from your cross validation grid search in the earlier cell.
# Also use:
# max_iter     = 1000 # The default, for valid comparison with Assignment 1 
# n_jobs       = -1   # So you don't have to wait as long
# random_state = 42   # For valid comparison with Assignment 1 

# Also: time your model, use beepy, and end with displaying the accuracy score
# on the test data: X_test_flat and y_test.  You already did this in Assignment 1,
# and I also did all of that except the test data accuracy above.

# 4 points for creating your classifier with the proper settings just described
# and 1 point each for:
#  capturing the start time
#  fitting your model
#  capturing the stop time
#  printing a timing message
#  calling beepy
#  displaying the model's score on the test data

#################### Insert your code below for 10 points ##################

sgdBest = SGDClassifier(max_iter=1000, n_jobs=-1, random_state=42) 
BestParams    = {'loss':    ['modified_huber'], 
              'penalty': ['l2'],
              'alpha':   [0.1]
             } 
best_grid_search = GridSearchCV(sgdBest, BestParams, cv=3)
startTime        =                        time.perf_counter()                                                                                     # Capture the starting time                1 points
best_grid_search.fit(X_test_flat, y_test)                                                                                                         # Train the model                          4 points
stopTime         =                        time.perf_counter()                                                                                     # Capture the ending time                  1 points
print(f'\nElapsed Time Default:           {stopTime - startTime:0.0f} seconds')                                                                   # Display the elapsed time                 1 points
bp.beep(sound='ping')                                                                                                                             # Invoking favorite beepy sound.
print('Default Model Accuracy: ',         best_grid_search.score(X_test_flat, y_test))                                                            # Accuracy of predictions on the test data 3 points
  
########################### Your code ends above ###########################

# How did your model do?  When I did this, the accuracy was more than 
# 7 percentage points better than the baseline SGD model from assignment 1


Elapsed Time Default:           10 seconds
Default Model Accuracy:  0.451


In [8]:
# Task 6: 10 points 

# Given the grid search results, the best L2 parameter value for alpha is 0.1, 
# (assuming you got the same results as I did)
# but that was the highest value attempted in the grid search.
# Could an even higher value of alpha do even better?
# Let's try out 0.2 manually to find out. Keep everything else the same as the previous cell.
# This relates to Geron's "Tip" box in the "Grid Search" section.
# Make sure you print out the score on the test data to see the results

#################### Insert your code below for 5 points ##################

anotherSGD = SGDClassifier(max_iter=1000, n_jobs=-1, random_state=42) 
BestParams    = {'loss':    ['modified_huber'], 
              'penalty': ['l2'],
              'alpha':   [0.2]
             } 
anotherGridSearch = GridSearchCV(anotherSGD, BestParams, cv=3)
startTime        =                        time.perf_counter()                                                         # Capture the starting time                1 points
anotherGridSearch.fit(X_train_flat, y_train)                                                                          # Train the model                          4 points
stopTime         =                        time.perf_counter()                                                         # Capture the ending time                  1 points
print(f'\nElapsed Time Default:           {stopTime - startTime:0.0f} seconds')                                       # Display the elapsed time                 1 points
print('Default Model Accuracy: ',         anotherGridSearch.score(X_test_flat, y_test))                               # Accuracy of predictions on the test data 3 points
bp.beep(sound='success')  

########################### Your code ends above ###########################

# And now, let's do that AGAIN but trying alpha set to 0.3

#################### Insert your code below for 5 points ##################

sgdFinal = SGDClassifier(max_iter=1000, n_jobs=-1, random_state=42) 
BestParams    = {'loss':    ['modified_huber'], 
              'penalty': ['l2'],
              'alpha':   [0.3]
             } 
FinalGridSearch = GridSearchCV(sgdFinal, BestParams, cv=3)
startTime        =                        time.perf_counter()                                                         # Capture the starting time                1 points
FinalGridSearch.fit(X_train_flat, y_train)                                                                            # Train the model                          4 points
stopTime         =                        time.perf_counter()                                                         # Capture the ending time                  1 points
print(f'\nElapsed Time Default:           {stopTime - startTime:0.0f} seconds')                                       # Display the elapsed time                 1 points
print('Default Model Accuracy: ',         FinalGridSearch.score(X_test_flat, y_test))                                 # Accuracy of predictions on the test data 3 points
bp.beep(sound='success')  


########################### Your code ends above ###########################

# alpha=0.2 did slightly better on my computer giving a score of: 0.3999 
# But alpha=0.3 did not do as well.

# So there are THREE lessons here:
# 1. Grid search helps to systematically find better hyperparameters than the defaults.
# 2. If your best hyperparameters are 'edge' values, i.e. either the largest or the smallest 
#    in the range of those tested, you may still be able to find a better value if you extend
#    the range, as shown with the models in this cell.  
# 3. To possibly do even better you should do another grid search
#    with more values slightly below and above the best value found so far.

# Another thing is that we only tested two loss functions.  But if you look 
# at the sklearn documentation for SGDClassifier, there are more than 2
# so we would probably want to try them too, unless the documentation suggests
# that they may not be worthwhile, e.g. some are designed for regression whereas
# we are doing classification.



Elapsed Time Default:           27 seconds
Default Model Accuracy:  0.3998

Elapsed Time Default:           22 seconds
Default Model Accuracy:  0.3857


In [9]:
# Task 7: 20 points

# Set up a NEW grid search around the current best known value for alpha.
# (If your best known value is different from mine, that's fine.  Use yours.)
# Try 2 values less than your best alpha value, then 2 more values
# that are more than the best value (you may skip the best value
# since you already did it above).

# NOTE: You are only creating a grid for 'alpha'.  Don't include any 
#       other hyperparameters in that grid, but make sure you fix
#       all others with their values from your best model so far.

# Supply ALL code, including, for 2 points each:
# the new hp_grid
# the new SGDClassifier for variable sgd, including the same values you just used above for
#     loss, max_iter, n_jobs, and random_state
# the new gridSearch 
# the timing start
# the call to fit gridSearch
# the timing stop
# printing of the elapsed time
# printing of the best_params_ 
# printing of the best_estimator_
# the call to beepy

#################### Insert your code below for 20 points ##################
hp_grid    = {                                                                                                             # the new hp_grid
              'alpha':   [ 0.3, 0.2, 0.01,0.001]                                                                           # Using 2 values above alpha and two values below.
             } 

sgd = SGDClassifier(loss="modified_huber", penalty='l2', max_iter=1000, n_jobs=-1, random_state=42) 
GridSearch = GridSearchCV(sgd, hp_grid, cv=3)
startTime        =                        time.perf_counter()                                                              # Capture the starting time                1 points
GridSearch.fit(X_train_flat, y_train)                                                                                      # Train the model                          4 points
stopTime         =                        time.perf_counter()                                                              # Capture the ending time                  1 points
print(f'\nElapsed Time Default:           {stopTime - startTime:0.0f} seconds')                                            # Display the elapsed time                 1 points
print('Default Model Accuracy: ',         GridSearch.score(X_test_flat, y_test))                                           # Accuracy of predictions on the test data 3 points
print('Best params:\t', GridSearch.best_params_)                                                                           # Let's look at the best hyperparameter values.
print('Best estimator:\t', GridSearch.best_estimator_)                                                                     # You can get additional info this way, 
bp.beep(sound='wilhelm')
########################### Your code ends above ###########################

# The next code is copied from above so you can easily find the best model.
# but you'll need to manually compare the best in this run to your previous best 
# to see which one is overall the best.

gridResults = gridSearch.cv_results_
for test_score, params in zip(gridResults["mean_test_score"], gridResults["params"]):
    print(test_score, params)

# For me, the best model was still alpha = 0.20


Elapsed Time Default:           179 seconds
Default Model Accuracy:  0.3857
Best params:	 {'alpha': 0.3}
Best estimator:	 SGDClassifier(alpha=0.3, loss='modified_huber', n_jobs=-1, random_state=42)
0.2841395494183 {'alpha': 0.0001, 'loss': 'hinge', 'penalty': 'l1'}
0.2648593505988995 {'alpha': 0.0001, 'loss': 'hinge', 'penalty': 'l2'}
0.2828597322209322 {'alpha': 0.0001, 'loss': 'modified_huber', 'penalty': 'l1'}
0.30557922262891596 {'alpha': 0.0001, 'loss': 'modified_huber', 'penalty': 'l2'}
0.24434034540238034 {'alpha': 0.001, 'loss': 'hinge', 'penalty': 'l1'}
0.31201945623874033 {'alpha': 0.001, 'loss': 'hinge', 'penalty': 'l2'}
0.2833798058228203 {'alpha': 0.001, 'loss': 'modified_huber', 'penalty': 'l1'}
0.28163976862068424 {'alpha': 0.001, 'loss': 'modified_huber', 'penalty': 'l2'}
0.10207998888144161 {'alpha': 0.01, 'loss': 'hinge', 'penalty': 'l1'}
0.33569938945634864 {'alpha': 0.01, 'loss': 'hinge', 'penalty': 'l2'}
0.3073798922437488 {'alpha': 0.01, 'loss': 'modified_huber',

### Optional Task 8: 10 extra credit points

Above we performed cross validation during grid search via the GridSearchCV class.
However, we can run cross validation directly on a model to get a good estimate of its performance.  You can do this by using cross_val_score. First, you will need to consult the documentation here to perform this task:

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html

In [47]:
# You have already trained a model called sgdFinal above for Task 6, 
# so you do not need to re-train it here unless you have restarted Anaconda
# or rebooted your computer. If you have done one of those, then you need to 
# re-run task 6 so you can use sgdFinal here.

# REMINDER: You'll need to consult the documentation to determine 
# the positions and/or names of the arguments.

# Here are the possible points for the arguments to cross_val_score:
# 1 point:  your pre-trained model sgdFinal from above
# 1 point:  X_train_flat 
# 1 point:  y_train
# 1 point:  Use 7-fold cross validation
# 1 point:  Use all available processors (cores) to speed up training as we have previously done
# 1 point:  Verbosity is turned off by default.  Turn it on by setting it either to 1 or to True
#           This will print useful output, including the elapsed time
# 1 point:  Use 'accuracy' for scoring
# 1 point:  Print the vector of cross validation scores, i.e. the value of xValScores. 
#           Be sure to print some appropriate label to show what we are looking at.
# 1 point:  Print the mean of xValScores (just use the mean() method)
#           Be sure to print some appropriate label to show what we are looking at.
# 1 point:  Print the standard deviation of xValScores (the std() method)
#           Be sure to print some appropriate label to show what we are looking at.

# from sklearn.model_selection import cross_val_score

#################### Insert your code below for 20 points ##################
crossvalscore = cross_val_score(sgdFinal, X_train_flat, y_train, cv=7, n_jobs = -1,verbose=True,scoring='accuracy')     #Creating an instance of the cross value score.
print("The cross validation scores: "+str(crossvalscore))                                                               #Printing out the mean of the cross validation scores.

print("Mean cross-validation scores: " + str(crossvalscore.mean()))                                                     #Printing out the mean of the cross validation scores.

print("Standard deviation of cross-validation scores: " + str(crossvalscore.std()))                                     #Printing out the standard deviation of the cross validation scores.
# gridSearch =     GridSearchCV(sgd, hp_grid, cv=3)
# new_cross_val_score = cross_val_score(sgdFinal, X_train_flat,y_train, cv=7)
# pd.DataFrame(new_cross_val_score.cv_results_) 


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.


The cross validation scores: [0.33431331 0.33081338 0.33389332 0.33207336 0.31611368 0.33739325
 0.33141977]
Mean cross-validation scores: 0.33086001119518355
Standard deviation of cross-validation scores: 0.006358925985668936


[Parallel(n_jobs=-1)]: Done   7 out of   7 | elapsed:    9.7s finished


### You are done!
Your best model should be approximately 40% accurate on the test data.  That is not very good, but it's still about 4 times better than guessing at random.  In future assignments you will see that we can do much better than 40%.