Notebook 2: Evaluating Classifier Performance
=============================================

## Goals for learning
In this assignment, we will:

1) Explore methods for calculating classifier performance metrics
2) Reflect on the real-world trade-offs that these performance metrics can inform 
3) Practice working with the array-oriented programming paradigm
4) Gain experience working with off-the-shelf machine learning and data analysis libraries

## Instructions
* Read through the notebook.
* Answer any plain text questions (replace cell content, "YOUR RESPONSE HERE", with your response).
* Insert your code within the code blocks marked with the comments "# START your code here" and "# STOP your code here".
* Do not use loops, iteration, or recursion in any of the code cells.
* Do not use any "Generative AI" tools or assistants in the creation of your solutions.
* Run all cells to make sure your code works and you see reasonable results.

## Submission details
* Due: Monday 9/15, 11:59 PM
* [Submission instructions](https://www.cs.oswego.edu/~agraci2/csc461/submission_instructions.html)

## Notebook premise
You are a machine learning engineer for a large healthcare company. Your organization needs a model for predicting which patients are likely to have diabetes. Much to your dismay, the financial analysts have decided that it would be much more cost-effective to purchase a third-party model than to develop one in-house. You have been tasked with evaluating two different third-party models, one from Company A and one from Company B, and recommending which one your company should purchase based on performance.

## About the dataset
The [Pima Indians Diabetes Database](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database) is a public-domain dataset provided by UC Irvine.
It contains medical predictor variables that can be used to diagnostically predict whether or not a patient has diabetes.

##### <b>Question:</b> Based on the above description, does this data best support a *binary* or *multiclass* classification task?

##### <b>Question:</b> Based on the above description, what real-world entity does a row (one item from axis 0) represent?

## Loading our data in Python
The Python code snippet below shows:

1) how to access the database from within a notebook and 
2) how to read in [CSV](https://en.wikipedia.org/wiki/Comma-separated_values)-formatted data using the [Pandas](https://pypi.org/project/pandas/) library

Note: This is the same dataset that we used for Notebook #1.
If you need to download the dataset again, please see the "Dataset: Pima Indians Diabetes Database" submodule on Brightspace, 
or download it from [Kaggle](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database).

In [6]:
import pandas

# START your code here
DATASET_ROOT_DIR='/home/agraci2/data/' # Please edit with your dataset path
# STOP your code here

# dataframe = pandas.read_csv(DATASET_ROOT_DIR + 'pima/diabetes.csv')
df = pandas.read_csv('diabetes.csv')

Use the following cell to further explore the dataset using the [DataFrame](https://pandas.pydata.org/docs/reference/frame.html) datatype's functions, then use the results to answer the questions below.

In [8]:
# START your code here
pandas.set_option('display.max_columns',None)

print(df.head())
print('------------------------------------------------------------')
print("Shape: ",df.shape)
print("Size: ",df.size)
# STOP your code here

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0            6      148             72             35        0  33.6   
1            1       85             66             29        0  26.6   
2            8      183             64              0        0  23.3   
3            1       89             66             23       94  28.1   
4            0      137             40             35      168  43.1   

   DiabetesPedigreeFunction  Age  Outcome  
0                     0.627   50        1  
1                     0.351   31        0  
2                     0.672   32        1  
3                     0.167   21        0  
4                     2.288   33        1  
------------------------------------------------------------
Shape:  (768, 9)
Size:  6912


##### <b>Question:</b> How many patients are represented in the database?

## Preparing our "Training Set" and "Evaluation Set" from the data

Now that we have our data loaded, the next thing we want to do is to partition all of our data into two sets:

1) A "Training Set" that can be used to train our classifier models.
2) An "Evaluation Set" (sometimes called a "Test Set") that can be used to evaluate our models against "novel" data (by which I mean data that the model hasn't seen before).

Typically, you want to use the bulk of your data for training in order to get the best results from your model. In this exercise, we will be using an 80/20 split.

In [11]:
import numpy

# We want to use an 80/20 split between our training and test sets
TRAINING_PERCENTAGE=80

# First, we will use the NumPy library to randomly select entries for either the training set or evaluation set
np_random = numpy.random.RandomState(seed=12345)
rand_unifs = np_random.uniform(0,1,size=df.shape[0]) # A collection of random numbers [0,1) corresponding to each entry in our dataframe
division_threshold = numpy.percentile(rand_unifs, TRAINING_PERCENTAGE) # A threshold that the random numbers above can be checked against to see if they fall into the 80% or 20%

# The training set will use the ~80% of entries whose random number falls below the threshold
train_indicator = rand_unifs < division_threshold # A collection of True/False indicators corresponding to each entry in our dataframe
train_dataframe = df[train_indicator].reset_index(drop=True) # Filter our dataframe based on the training indicators above

# The test set will use the remaining 20% of entries
eval_indicator = rand_unifs >= division_threshold # A collection of True/False indicators corresponding to each entry in our dataframe (inverse of train_indicator)
eval_dataframe = df[eval_indicator].reset_index(drop=True) # Filter our dataframe based on the evaluation indicators above

# Show how many entries (rows) are in our training vs evaluation dataframes:
print(f'Number of entries in the training set: {len(train_dataframe)}')
print(f'Number of entries in the evaluation set: {len(eval_dataframe)}')

Number of entries in the training set: 614
Number of entries in the evaluation set: 154


Use the following cell to further explore the dataset using the [DataFrame](https://pandas.pydata.org/docs/reference/frame.html) datatype's functions, then use the results to answer the questions below.

In [13]:
# START your code here
print('\n------------------------------------------------------------\nTRAINING SET\n------------------------------------------------------------')
print(train_dataframe.head())
print('------------------------------------------------------------')
print("Shape: ",train_dataframe.shape)
print("Size: ",train_dataframe.size)
print('\n------------------------------------------------------------\nTEST SET\n------------------------------------------------------------')
print(eval_dataframe.head())
print('------------------------------------------------------------')
print("Shape: ",eval_dataframe.shape)
print("Size: ",eval_dataframe.size)
# STOP your code here


------------------------------------------------------------
TRAINING SET
------------------------------------------------------------
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0            1       85             66             29        0  26.6   
1            8      183             64              0        0  23.3   
2            1       89             66             23       94  28.1   
3            0      137             40             35      168  43.1   
4            5      116             74              0        0  25.6   

   DiabetesPedigreeFunction  Age  Outcome  
0                     0.351   31        0  
1                     0.672   32        1  
2                     0.167   21        0  
3                     2.288   33        1  
4                     0.201   30        0  
------------------------------------------------------------
Shape:  (614, 9)
Size:  5526

------------------------------------------------------------
TEST SET
------

### Separating features from labels in our training and evaluation sets
Remember that in **supervised learning** tasks such as classification, we use **labeled data**. 
This means that the entries (rows) in our database include **labels** (which can be thought of as a special purpose **feature**).

##### <b>Question:</b> Based on what you know from our dataframe object, which columns correspond to our regular input features / predictor variables ($X$)?

##### <b>Question:</b> Based on what you know from our dataframe object, which column corresponds to our output labels ($Y_t$)?

##### Next steps:
Now that we have our training set and evaluation set partitioned, we will want to separate out the features ($X$) from the labels ($Y_t$) for both sets. The resulting datastructures will be [N-dimensional arrays](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html).

In [18]:
# START your code here
LABEL_COLUMN='Outcome'
# STOP your code here

train_features = train_dataframe.loc[:, train_dataframe.columns != LABEL_COLUMN].values # Take all columns except for label
train_labels = train_dataframe[LABEL_COLUMN].values # Take only the label column

eval_features = eval_dataframe.loc[:, eval_dataframe.columns != LABEL_COLUMN].values # Take all columns except for label
eval_labels = eval_dataframe[LABEL_COLUMN].values # Take only the label column

print("Training set (num_rows, num_columns):")
print("- Original training data shape: {}".format(train_dataframe.shape))
print("- New training features shape: {}".format(train_features.shape))
print("- New training labels shape: {}".format(train_labels.shape))
# Note: the absence of a number indicates an implied "1"

print("Evaluation set (num_rows, num_columns):")
print("- Original evaluation data shape: {}".format(eval_dataframe.shape))
print("- New evaluation features shape: {}".format(eval_features.shape))
print("- New evaluation labels shape: {}".format(eval_labels.shape)) 
# Note: the absence of a number indicates an implied "1"

Training set (num_rows, num_columns):
- Original training data shape: (614, 9)
- New training features shape: (614, 8)
- New training labels shape: (614,)
Evaluation set (num_rows, num_columns):
- Original evaluation data shape: (154, 9)
- New evaluation features shape: (154, 8)
- New evaluation labels shape: (154,)


## Implementing evaluation functions
Now that we have a feel for what our inputs look like and how our data sets are structured, 
we are ready to implement functions that calculate various **performance metrics**. 
These metrics will allow us to better understand and compare our classification models.

**Please reference the slides posted to Brightspace or your own notes from class to see the equations for each of the performance metrics.**
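
For quick reference, the standard textbook definitions are sketched below in terms of the confusion-matrix counts ($TP$, $TN$, $FP$, $FN$). The notation here is an assumption and may differ from the slides, which remain the authoritative source for this course.

```latex
\begin{aligned}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{ErrorRate} &= \frac{FP + FN}{TP + TN + FP + FN} = 1 - \text{Accuracy} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
F_1              &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{aligned}
```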

### Implementation instructions
* Only place your code between the comments "# START your code here" and "# STOP your code here".
* Do not modify or add code outside of these blocks.
* Do not include any libraries other than those provided.
* **Do not use loops, iteration, or recursion!**
    
### Implementation hints
* For *TruePositives*, *TrueNegatives*, *FalsePositives*, and *FalseNegatives*, look at the [Pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#) functions, [NumPy ndarray](https://numpy.org/doc/stable/reference/arrays.ndarray.html#) functions, and [NumPy logical](https://numpy.org/doc/stable/reference/routines.logic.html) functions available to you. Some of the functions I found especially helpful are:
    * [numpy.logical_and()](https://numpy.org/doc/stable/reference/generated/numpy.logical_and.html)
    * [numpy.logical_or()](https://numpy.org/doc/stable/reference/generated/numpy.logical_or.html)
    * [numpy.logical_not()](https://numpy.org/doc/stable/reference/generated/numpy.logical_not.html)
    * [numpy.greater()](https://numpy.org/doc/stable/reference/generated/numpy.greater.html)
    * [ndarray.sum()](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.sum.html)
* Remember that in Python, booleans can be used as integers (False=0, True=1)
* For *Accuracy*, *ErrorRate*, *Recall*, *Precision*, and *F1Score*, use the other functions as building blocks.
* Feel free to use print statements to explore your intermediate results during development, but please clean them up before submitting.
* Farther down in the notebook, I have provided a checkpoint for you to validate your work.
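
To illustrate the hints above, here is a minimal sketch of counting matches without a loop. The arrays and the `toy_` names are hypothetical and are not taken from the diabetes data; the sketch only shows how element-wise comparisons, `numpy.logical_and()`, and `ndarray.sum()` fit together.

```python
import numpy

# Hypothetical toy arrays (not taken from the diabetes data)
toy_predictions = numpy.array([1, 0, 1, 1, 0, 0])
toy_labels = numpy.array([1, 0, 0, 1, 1, 0])

# Element-wise comparison yields one True/False value per entry
predicted_positive = toy_predictions == 1
actually_positive = toy_labels == 1

# logical_and keeps only the positions where both conditions hold
both_positive = numpy.logical_and(predicted_positive, actually_positive)

# Summing booleans counts the True values (True=1, False=0)
toy_count = int(both_positive.sum())

print(both_positive)
print(toy_count)
```

The same pattern should carry over to the other three counts by changing which values are compared against.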

In [20]:
# Returns an integer (the number of predictions that were true/1 and matched the known label) 
def TruePositives(predictions, labels):
    
    # START your code here
    tp = numpy.logical_and(predictions==1,labels==1).sum()
    # STOP your code here
    
    return tp

# Returns an integer (the number of predictions that were true/1 while the known label was false/0) 
def FalsePositives(predictions, labels):
    
    # START your code here
    fp = numpy.logical_and(predictions==1,labels==0).sum()
    # STOP your code here
    
    return fp

# Returns an integer (the number of predictions that were false/0 and matched the known label) 
def TrueNegatives(predictions, labels):
    
    # START your code here
    tn = numpy.logical_and(predictions==0,labels==0).sum()
    # STOP your code here
    
    return tn

# Returns an integer (the number of predictions that were false/0 while the known label was true/1)
def FalseNegatives(predictions, labels):
    
    # START your code here
    fn = numpy.logical_and(predictions==0,labels==1).sum()
    # STOP your code here
    
    return fn

# Returns a float (the accuracy)
def Accuracy(predictions, labels):
    
    # START your code here
    tp = TruePositives(predictions, labels)
    fp = FalsePositives(predictions, labels)
    tn = TrueNegatives(predictions, labels)
    fn = FalseNegatives(predictions, labels)
    accuracy = (tp+tn)/(tp+tn+fp+fn)
    # STOP your code here
    
    return accuracy

# Returns a float (the error rate)
def ErrorRate(predictions, labels):
    
    # START your code here
    tp = TruePositives(predictions, labels)
    fp = FalsePositives(predictions, labels)
    tn = TrueNegatives(predictions, labels)
    fn = FalseNegatives(predictions, labels)
    er = (fp+fn)/(tp+tn+fp+fn)
    # STOP your code here
    
    return er

# Returns a float (the recall/sensitivity)
def Recall(predictions, labels):
    
    # START your code here
    tp = TruePositives(predictions, labels)
    fn = FalseNegatives(predictions, labels)
    sensitivity = tp/(tp+fn)
    # STOP your code here
    
    return sensitivity

# Returns a float (the Precision)
def Precision(predictions, labels):
    
    # START your code here
    tp = TruePositives(predictions, labels)
    fp = FalsePositives(predictions, labels)
    precision = tp/(tp+fp)
    # STOP your code here
    
    return precision

# Returns a float (the F1 score, the harmonic mean of precision and recall)
def F1Score(predictions, labels):
    
    # START your code here
    sensitivity = Recall(predictions, labels)
    precision = Precision(predictions, labels)
    f1_score = (2*sensitivity*precision)/(sensitivity+precision)
    # STOP your code here
    
    return f1_score

# Prints the confusion matrix and all of the metrics above for one set of predictions
def PrintPerformanceMetrics(title, predictions, labels):
    tp = TruePositives(predictions, labels)
    fp = FalsePositives(predictions, labels)
    tn = TrueNegatives(predictions, labels)
    fn = FalseNegatives(predictions, labels)
    print(title)
    # Print the confusion matrix
    table = [['', 'Predicted 0', 'Predicted 1', ''], ['Actual 0', f'tn={tn}', f'fp={fp}', f'{tn+fp}'], ['Actual 1', f'fn={fn}', f'tp={tp}', f'{fn+tp}'], ['', f'{tn+fn}', f'{fp+tp}', f'n={tp+fp+tn+fn}']]
    print("\n{}\t\t|{}\t|{}".format(table[0][0], table[0][1], table[0][2]))
    print("{}\t|{}\t\t|{}".format(table[1][0], table[1][1], table[1][2]))
    print("{}\t|{}\t\t|{}\n".format(table[2][0], table[2][1], table[2][2]))
    # Print computed metrics
    print(f'Number of entries in set: {labels.size}')
    print(f'Accuracy: {Accuracy(predictions, labels)}')
    print(f'Error Rate: {ErrorRate(predictions, labels)}')
    print(f'Recall: {Recall(predictions, labels)}')
    print(f'Precision: {Precision(predictions, labels)}')
    print(f'F1 Score: {F1Score(predictions, labels)}')

print("Done")

Done


## Instantiate and train an off-the-shelf classification model (KNN)

Company A provides you with their model, which uses a K-Nearest Neighbors (KNN) algorithm.

We will explore this algorithm in detail later in the course! For now, we simply want to observe how the model performs, without regard for how it works under the hood.

For our evaluation, we will be using an [off-the-shelf implementation of KNN](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html) provided by the [scikit-learn](https://scikit-learn.org/stable/index.html) library. Pretend that this is the model implementation provided by Company A.

In [22]:
%%time
# ^This will display how long it takes to execute this cell

import sklearn.neighbors

# Grab an off-the-shelf classification model for our first example
modelA = sklearn.neighbors.KNeighborsClassifier()

# Use our training set (both features and labels) to train the model
modelA_trained = modelA.fit(train_features, train_labels)

CPU times: total: 953 ms
Wall time: 1.93 s


## Instantiate and train an off-the-shelf Model (Naive Bayes)

Company B provides you with their model, which uses a Naive Bayes algorithm. Again, we only want to observe how the model performs for the purposes of this exercise.

For our evaluation, we will be using an [off-the-shelf implementation of Naive Bayes](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html) provided by the [scikit-learn](https://scikit-learn.org/stable/index.html) library. Pretend that this is the implementation provided by Company B.

In [24]:
%%time
# ^This will display how long it takes to execute this cell

import sklearn.naive_bayes

# Grab an off-the-shelf classification model for our second example
modelB = sklearn.naive_bayes.GaussianNB()

# Use our training set (both features and labels) to train the model
modelB_trained = modelB.fit(train_features, train_labels)

CPU times: total: 0 ns
Wall time: 23.9 ms


## Evaluating our classifier models
Now that our models are trained, we are ready to use them to make predictions.
In order to understand how much we can trust the results of these predictions, we will need to understand the performance of our trained models.

In [26]:
%%time
# ^This will display how long it takes to execute this cell

# Use the first model to make predictions based on the training set
modelA_train_pred = modelA_trained.predict(train_features)

# Use the first model to make predictions based on the evaluation set
modelA_eval_pred = modelA_trained.predict(eval_features)

# Next we will use the evaluation metric functions that you implemented earlier and display the results for the first model:
print("------------------------------------------------------------------------")
PrintPerformanceMetrics('Model A Training Data', modelA_train_pred, train_labels)
print("------------------------------------------------------------------------")
PrintPerformanceMetrics('Model A Evaluation Data', modelA_eval_pred, eval_labels)
print("------------------------------------------------------------------------")

------------------------------------------------------------------------
Model A Training Data

		|Predicted 0	|Predicted 1
Actual 0	|tn=362		|fp=43
Actual 1	|fn=75		|tp=134

Number of entries in set: 614
Accuracy: 0.8078175895765473
Error Rate: 0.19218241042345277
Recall: 0.6411483253588517
Precision: 0.7570621468926554
F1 Score: 0.6943005181347152
------------------------------------------------------------------------
Model A Evaluation Data

		|Predicted 0	|Predicted 1
Actual 0	|tn=81		|fp=14
Actual 1	|fn=32		|tp=27

Number of entries in set: 154
Accuracy: 0.7012987012987013
Error Rate: 0.2987012987012987
Recall: 0.4576271186440678
Precision: 0.6585365853658537
F1 Score: 0.54
------------------------------------------------------------------------
CPU times: total: 15.6 ms
Wall time: 22.6 ms


In [27]:
%%time
# ^This will display how long it takes to execute this cell

# Use the model to make predictions based on the training set
modelB_train_pred = modelB_trained.predict(train_features)

# Use the model to make predictions based on the evaluation set
modelB_eval_pred = modelB_trained.predict(eval_features)

# Next we will use the evaluation metric functions that you implemented earlier and display the results for the second model:
print("------------------------------------------------------------------------")
PrintPerformanceMetrics('Model B Training Data', modelB_train_pred, train_labels)
print("------------------------------------------------------------------------")
PrintPerformanceMetrics('Model B Evaluation Data', modelB_eval_pred, eval_labels)
print("------------------------------------------------------------------------")

------------------------------------------------------------------------
Model B Training Data

		|Predicted 0	|Predicted 1
Actual 0	|tn=346		|fp=59
Actual 1	|fn=84		|tp=125

Number of entries in set: 614
Accuracy: 0.7671009771986971
Error Rate: 0.23289902280130292
Recall: 0.5980861244019139
Precision: 0.6793478260869565
F1 Score: 0.6361323155216285
------------------------------------------------------------------------
Model B Evaluation Data

		|Predicted 0	|Predicted 1
Actual 0	|tn=80		|fp=15
Actual 1	|fn=23		|tp=36

Number of entries in set: 154
Accuracy: 0.7532467532467533
Error Rate: 0.24675324675324675
Recall: 0.6101694915254238
Precision: 0.7058823529411765
F1 Score: 0.6545454545454547
------------------------------------------------------------------------
CPU times: total: 0 ns
Wall time: 15.2 ms


In [28]:
# Optional Checkpoint

# Feel free to use this for validating your results
# Note: Because the data split uses a fixed random seed, the split is reproducible, so your results should match the expected values below.

# START your code here
enable_checkpoint=True
# STOP your code here

# If enabled, check just one of our cases (model B on eval data)
if enable_checkpoint:
    checked_decimals=7
    numpy.testing.assert_almost_equal(Accuracy(modelB_eval_pred, eval_labels), 0.7532467532467533, decimal=checked_decimals, verbose=True)
    numpy.testing.assert_almost_equal(ErrorRate(modelB_eval_pred, eval_labels), 0.24675324675324672, decimal=checked_decimals, verbose=True)
    numpy.testing.assert_almost_equal(Recall(modelB_eval_pred, eval_labels), 0.6101694915254238, decimal=checked_decimals, verbose=True)
    numpy.testing.assert_almost_equal(Precision(modelB_eval_pred, eval_labels), 0.7058823529411765, decimal=checked_decimals, verbose=True)
    numpy.testing.assert_almost_equal(F1Score(modelB_eval_pred, eval_labels), 0.6545454545454547, decimal=checked_decimals, verbose=True)
    print("Passed checkpoint!")

Passed checkpoint!
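
As a sanity check on where the checkpoint values come from, the sketch below recomputes each metric by hand from the four counts printed in the "Model B Evaluation Data" confusion matrix above. It assumes the standard definitions of each metric and uses plain Python arithmetic only.

```python
# Counts copied from the "Model B Evaluation Data" confusion matrix printed above
tn, fp, fn, tp = 80, 15, 23, 36

total = tp + tn + fp + fn        # 154 entries in the evaluation set
accuracy = (tp + tn) / total     # 116 / 154
error_rate = (fp + fn) / total   # 38 / 154
recall = tp / (tp + fn)          # 36 / 59
precision = tp / (tp + fp)       # 36 / 51
f1_score = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f}, error_rate={error_rate:.4f}")
print(f"recall={recall:.4f}, precision={precision:.4f}, f1={f1_score:.4f}")
```

Rounded to four decimals, these agree with the values reported for Model B on the evaluation set.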


##### <b>Question:</b> Which model has a greater <u>accuracy</u> when run on **new** data?

##### <b>Question:</b> Assuming your company cares <u>equally</u> about incorrectly classifying a healthy person as diabetic and incorrectly classifying a diabetic person as healthy, which model would you recommend that your company purchase? Why?

##### <b>Question:</b> Based on the above results, which model can more quickly **train** on large datasets?

##### <b>Question:</b> Based on the above results, which model can more quickly make **predictions** at runtime?

Before you send out your email to management, making your model recommendation, you decide to double-check your work using **cross-validation**, specifically **K-fold cross validation**, in order to make sure you didn't run into any problems with **overfitting** or **selection bias**.

Here are the steps you take:

1) Choose an appropriate value of "K" to determine the size of your "folds".
    * If K=1 (each partition holds a single row), this is also called "leave-one-out cross validation"
    * For this task, you decide to define K=8
2) Shuffle your entire data set.
3) Partition your dataset into N/K partitions:
    * Where N is the number of rows in your data.
    * Where K is the size of each partition.
4) For each partition, re-run your training and evaluation procedure (this is called a "fold"):
    * Use the current partition as the **evaluation set**.
    * Use all other partitions as the **training set**.
    * Save off the accuracy for each run.
5) Average the results from each fold.
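
Steps 2 through 4 can be sketched on row indices alone, before any model is involved. The sizes and the `toy_` names below are hypothetical. The sketch follows this notebook's convention that K is the size of each partition; note that many references (and scikit-learn's `KFold`) instead use K for the number of folds. It shuffles with `permutation()` rather than the percentile thresholds used in the cell below, so treat it as an illustration of the idea and not as the notebook's method.

```python
import numpy

# Hypothetical toy sizes, following this notebook's convention that K is the partition size
toy_N = 12               # number of rows
toy_K = 3                # rows per partition
toy_P = toy_N // toy_K   # number of partitions (folds)

# Step 2: shuffle the row indices with a fixed seed so the result is repeatable
toy_rng = numpy.random.RandomState(seed=0)
shuffled_indices = toy_rng.permutation(toy_N)

# Step 3: split the shuffled indices into P equally sized partitions
toy_partitions = numpy.split(shuffled_indices, toy_P)

# Step 4: each partition takes one turn as the evaluation set
for fold, eval_idx in enumerate(toy_partitions):
    train_idx = numpy.concatenate([p for i, p in enumerate(toy_partitions) if i != fold])
    print(f"fold {fold}: eval rows {sorted(eval_idx.tolist())}, {len(train_idx)} training rows")
```

Every row lands in exactly one evaluation set, so each row is used for evaluation once and for training in all other folds.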

In [38]:
K=8 # Size of each partition
N=df.shape[0] # Size of total dataset
P=int(N/K) # Number of folds, number of partitions
print("K:{}, N:{}, P:{}".format(K,N,P))

# Use the NumPy library to "shuffle" our data
np_random = numpy.random.RandomState(seed=12345)
rand_unifs = np_random.uniform(0,1,size=df.shape[0]) # A collection of random numbers [0,1) corresponding to each entry in our dataframe

# Create N/K Partitions
fold_percentage=100/P
partitions=[]
for idx in range(0,P):
    lower_percentage = idx * fold_percentage
    upper_percentage = (idx+1) * fold_percentage
    lower_threshold = numpy.percentile(rand_unifs, lower_percentage)
    upper_threshold = numpy.percentile(rand_unifs, upper_percentage)
    lower_indicator = rand_unifs >= lower_threshold
    upper_indicator = rand_unifs <= upper_threshold
    part_indicator = numpy.logical_and(lower_indicator, upper_indicator)
    part_dataframe = df[part_indicator].reset_index(drop=True)
    partitions.append(part_dataframe)
    
print('Number of partitions: {}'.format(len(partitions)))
print('Size of each partition: {}'.format(len(partitions[0])))
print('Total data size: {}'.format(len(partitions) * len(partitions[0])))
for partition in partitions:
    assert len(partition) == len(partitions[0]), len(partition) 
    
# For each partition, re-run training and evaluation
AccuraciesModelA=[]
AccuraciesModelB=[]
for idx in range(0,P):
    train_data = pandas.concat([x for i,x in enumerate(partitions) if i!=idx])
    eval_data = partitions[idx]
    train_features = train_data.loc[:, train_data.columns != LABEL_COLUMN].values
    train_labels = train_data[LABEL_COLUMN].values
    eval_features = eval_data.loc[:, eval_data.columns != LABEL_COLUMN].values
    eval_labels = eval_data[LABEL_COLUMN].values
    
    modelA_trained = modelA.fit(train_features, train_labels)
    modelA_eval_pred = modelA_trained.predict(eval_features)
    AccuraciesModelA.append(Accuracy(modelA_eval_pred, eval_labels))
    
    modelB_trained = modelB.fit(train_features, train_labels)
    modelB_eval_pred = modelB_trained.predict(eval_features)
    AccuraciesModelB.append(Accuracy(modelB_eval_pred, eval_labels))
    
# Average results from each fold
ModelA_Avg = numpy.mean(AccuraciesModelA)
ModelB_Avg = numpy.mean(AccuraciesModelB)
print("Average accuracy for Model A: {}".format(ModelA_Avg))
print("Average accuracy for Model B: {}".format(ModelB_Avg))

K:8, N:768, P:96
Number of partitions: 96
Size of each partition: 8
Total data size: 768
Average accuracy for Model A: 0.71484375
Average accuracy for Model B: 0.75390625


##### <b>Question:</b> Did the results of the cross-validation change your mind about which model to recommend? Why?

##### Congratulations, you have reached the end of this notebook!