# Project -- Fine Tuning and Evaluation

In [1]:
# Imports:
import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import fbeta_score, make_scorer, confusion_matrix, classification_report
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.experimental import enable_halving_search_cv
from sklearn.model_selection import HalvingGridSearchCV
import pickle

In [2]:
# 1) Revisit the ‘data analysis and preparation’ step.
# Make any changes to the data preprocessing you wish to make, such as adding additional stopwords for text data, removing a column, or handling NaNs differently.
# Alter the code so that only a train/test split is done. Save the CSV files.


# No changes were needed - data contained only numerical values and no missing values. In addition, a train/test split was already used.

In [3]:
# 2) Load the training and testing CSV files into four separate Pandas objects to hold labels and features for training, and labels and features for testing.

# Subsample the training data to reduce memory use and run time
train = pd.read_csv('Training.csv').sample(n=10000, random_state=42)
test = pd.read_csv('Testing.csv')

train_label = np.array(train['Condition'])
train_features = np.array(train.drop('Condition', axis=1))

test_label = np.array(test['Condition'])
test_features = np.array(test.drop('Condition', axis=1))

In [4]:
# 3) Train and evaluate a decision tree using the GridSearchCV module from sklearn.
# Define a dictionary of hyperparameter options you wish to try. Try both Gini and entropy as the ‘criterion’. 
# Refer to your previous work on decision trees to get an idea of which values to try for other hyperparameters.
# Define a StratifiedKFold object to setup cross-validation.
# Choose a metric to use for determining the best model. Define a make_scorer object that uses that metric and the ‘weighted’ average option.
# Define a GridSearchCV object for a decision tree, using the above. (Use HalvingGridSearchCV if this is too slow.)
# Run the grid search to train a decision tree.
# Display the best model hyperparameters, the weighted mean training score, and the weighted mean cross-validation score.
# Find predictions on the training data and use them to display the confusion matrix and classification report.
# Try new hyperparameters and repeat the above until you are satisfied that the best possible model has been found.

param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [1, 2, 3, 4, 5],
    'min_samples_split': [2, 3, 4, 5, 6],
    'min_samples_leaf': [1, 2, 3, 4, 5]
}

stratified_kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
f2_scorer = make_scorer(fbeta_score, beta=2, average='weighted')

dt_grid_search = GridSearchCV(
    estimator=DecisionTreeClassifier(),
    param_grid=param_grid,
    cv=stratified_kfold,
    scoring=f2_scorer,
    return_train_score=True,
)
dt_grid_search.fit(train_features, train_label)

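print("Best parameters:", dt_grid_search.best_params_)
# Note: best_score_ is the mean cross-validated (held-out fold) score of the best candidate,
# not a score on the training folds, despite the label printed below.
print("Weighted mean training score:", dt_grid_search.best_score_)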

pred = dt_grid_search.predict(train_features)
print(confusion_matrix(train_label, pred))
print(classification_report(train_label, pred))

Best parameters: {'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 5, 'min_samples_split': 2}
Weighted mean training score: 0.7187069978865335
[[3651 1271]
 [1489 3589]]
              precision    recall  f1-score   support

           0       0.71      0.74      0.73      4922
           1       0.74      0.71      0.72      5078

    accuracy                           0.72     10000
   macro avg       0.72      0.72      0.72     10000
weighted avg       0.72      0.72      0.72     10000
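The task asks for both the mean training score and the mean cross-validation score of the best model. A minimal sketch of one way to read both from a fitted search is shown below. It assumes `return_train_score=True` (as in the cell above) and uses synthetic data from `make_classification` as a stand-in for `Training.csv`; the small parameter grid is also only illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; the notebook itself uses Training.csv).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
f2_scorer = make_scorer(fbeta_score, beta=2, average='weighted')

search = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=42),
    param_grid={'criterion': ['gini', 'entropy'], 'max_depth': [2, 4]},  # illustrative grid
    cv=cv,
    scoring=f2_scorer,
    return_train_score=True,  # required for the mean_train_score column
)
search.fit(X, y)

# Row of cv_results_ that belongs to the best candidate.
best = search.best_index_
mean_train_score = search.cv_results_['mean_train_score'][best]
mean_cv_score = search.cv_results_['mean_test_score'][best]  # same value as best_score_

print("Weighted mean training score:", mean_train_score)
print("Weighted mean cross-validation score:", mean_cv_score)
```

A training score well above the cross-validation score for the same candidate would be one sign of overfitting.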



In [5]:
# 4) Train and evaluate an SVM using the GridSearchCV module from sklearn.
# Define a Pipeline object that uses standardization or normalization, prior to fitting an SVC estimator with class_weight='balanced'.
# Define a dictionary of hyperparameter options you wish to try. Refer to your previous work on decision trees to get an idea of which values to try for other hyperparameters.
# Define a StratifiedKFold object to setup cross-validation.
# Choose a metric to use for determining the best model. Define a make_scorer object that uses that metric and the ‘weighted’ average option.
# Define a GridSearchCV object for the SVM pipeline, using the above. (Use HalvingGridSearchCV if this is too slow.)
# Run the grid search to train an SVM classifier.
# Display the best model hyperparameters, the weighted mean training score, and the weighted mean cross-validation score.
# Find predictions on the training data and use them to display the confusion matrix and classification report.
# Try new hyperparameters and repeat the above until you are satisfied that the best possible model has been found.

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svc', SVC(class_weight='balanced'))
])

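param_grid = {
    'svc__decision_function_shape': ['ovo', 'ovr'],  # ignored by SVC for binary classification
    'svc__degree': [1, 10, 15, 20, 50],  # only used by the 'poly' kernel, which is not searched here
    'svc__gamma': [1, 5, 10, 15],  # used by 'rbf' but ignored by 'linear'
    'svc__C': [5, 10, 15, 30, 50],
    'svc__kernel': ['linear', 'rbf']
}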

stratified_kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
f2_scorer = make_scorer(fbeta_score, beta=2, average='weighted')

svm_grid_search = HalvingGridSearchCV(pipeline, param_grid, cv=stratified_kfold, scoring=f2_scorer, return_train_score=True)
svm_grid_search.fit(train_features, train_label)

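print("Best parameters:", svm_grid_search.best_params_)
# Note: best_score_ is the mean cross-validated (held-out fold) score of the best candidate,
# not a score on the training folds, despite the label printed below.
print("Weighted mean training score:", svm_grid_search.best_score_)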

pred = svm_grid_search.predict(train_features)
print(confusion_matrix(train_label, pred))
print(classification_report(train_label, pred))

Best parameters: {'svc__C': 15, 'svc__decision_function_shape': 'ovr', 'svc__degree': 1, 'svc__gamma': 10, 'svc__kernel': 'linear'}
Weighted mean training score: 0.7035474257492174
[[3990  932]
 [1876 3202]]
              precision    recall  f1-score   support

           0       0.68      0.81      0.74      4922
           1       0.77      0.63      0.70      5078

    accuracy                           0.72     10000
   macro avg       0.73      0.72      0.72     10000
weighted avg       0.73      0.72      0.72     10000
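Several hyperparameters in the SVM grid only matter for some kernels, so many candidates are effectively repeats of each other. The sketch below shows one possible alternative, not what was run for the results above: `param_grid` is passed as a list of dictionaries, one per kernel, and `ParameterGrid` is used only to count the candidates. The reduced grid is an illustrative assumption.

```python
from sklearn.model_selection import ParameterGrid

# Grid as written in the notebook: every combination of every option is a candidate.
full_grid = {
    'svc__decision_function_shape': ['ovo', 'ovr'],
    'svc__degree': [1, 10, 15, 20, 50],
    'svc__gamma': [1, 5, 10, 15],
    'svc__C': [5, 10, 15, 30, 50],
    'svc__kernel': ['linear', 'rbf'],
}

# Hypothetical alternative: one sub-grid per kernel, listing only the
# hyperparameters that kernel actually uses.
per_kernel_grid = [
    {'svc__kernel': ['linear'], 'svc__C': [5, 10, 15, 30, 50]},
    {'svc__kernel': ['rbf'], 'svc__C': [5, 10, 15, 30, 50], 'svc__gamma': [1, 5, 10, 15]},
]

n_full = len(ParameterGrid(full_grid))
n_per_kernel = len(ParameterGrid(per_kernel_grid))

print("Candidates in the full grid:", n_full)
print("Candidates in the per-kernel grid:", n_per_kernel)
```

Both `GridSearchCV` and `HalvingGridSearchCV` accept a list of dictionaries for `param_grid`, so the second form can be used in place of the first.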



In [6]:
# 5) Add a markdown cell to compare the best decision tree and SVM models. Discuss overfitting/underfitting, how well the models do on important metrics, 
# and any other observations you feel are important. Decide which model is best and save it to a file using the ‘pickle’ library.

with open("model.pkl", "wb") as f:
    pickle.dump(svm_grid_search.best_estimator_, f)

# **Explanation:**

# Decision Tree Model
**1st Attempt:**
<pre>
Best parameters: {'criterion': 'entropy', 'max_depth': 5, 'min_samples_leaf': 1, 'min_samples_split': 10}
Weighted mean training score: 0.7182680438484256
[[3939  983]
 [1727 3351]]
              precision    recall  f1-score   support

           0       0.70      0.80      0.74      4922
           1       0.77      0.66      0.71      5078

    accuracy                           0.73     10000
   macro avg       0.73      0.73      0.73     10000
weighted avg       0.73      0.73      0.73     10000
</pre>

**2nd and Final Attempt:**
<pre>
Best parameters: {'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 5, 'min_samples_split': 6}
Weighted mean training score: 0.7187069978865335
[[3651 1271]
 [1489 3589]]
              precision    recall  f1-score   support

           0       0.71      0.74      0.73      4922
           1       0.74      0.71      0.72      5078

    accuracy                           0.72     10000
   macro avg       0.72      0.72      0.72     10000
weighted avg       0.72      0.72      0.72     10000
</pre>

# SVM Model
**1st and Final Attempt:**
<pre>
Best parameters: {'svc__C': 20, 'svc__decision_function_shape': 'ovo', 'svc__degree': 10, 'svc__gamma': 3, 'svc__kernel': 'linear'}
Weighted mean training score: 0.7147642016470885
[[3990  932]
 [1876 3202]]
              precision    recall  f1-score   support

           0       0.68      0.81      0.74      4922
           1       0.77      0.63      0.70      5078

    accuracy                           0.72     10000
   macro avg       0.73      0.72      0.72     10000
weighted avg       0.73      0.72      0.72     10000
</pre>

After reviewing these results, I believe that the best model is the SVM. The decision rests mainly on the confusion matrices. Reading the top-right cell of each matrix as the false negatives (FN) and the bottom-left cell as the false positives (FP), the SVM has 932 FN and 1876 FP, which is the lowest FN count of the three models (the two decision tree attempts have 983 and 1271). False negatives are the worst outcome for this use case, so I treat the FN count as the most important attribute.

The F2 scores differ only marginally: the SVM's score (0.7148) is about 0.004 lower than the decision trees' scores (0.7183 and 0.7187), a difference in the thousandths place, which might indicate slight overfitting of the decision tree models. So, while the SVM does not have the best score of all the models, that characteristic does not hold the most importance here.

Next, the classification reports. The only accuracy values obtained are 0.73 and 0.72, so the SVM's accuracy of 0.72 is in line with the other models. Per class, the SVM reaches precision 0.68 and recall 0.81 on the negative class versus precision 0.77 and recall 0.63 on the positive class, so it performs somewhat better on the negative class (f1-score 0.74 vs. 0.70). This is not a great outcome, but the gap is modest, which indicates that there is some balance between classes 0 and 1.

Overall, I believe that the best model is the SVM, as it obtains the lowest number of FNs (the most important attribute in this use case), a relatively good balance between classes 0 and 1, and a reasonably good accuracy and F2 score.
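Because the comparison depends on which cell of the matrix is counted as FN, it helps to spell out the layout. scikit-learn's `confusion_matrix` puts true classes on the rows and predicted classes on the columns, so the meaning of the off-diagonal cells depends on which class is treated as positive. The sketch below unpacks the SVM training matrix both ways; the helper `unpack` is hypothetical and not part of scikit-learn. The FN/FP labels used in the discussion above correspond to the reading with class 0 as the positive class; which reading is appropriate depends on what `Condition` encodes, which the notebook does not state.

```python
# SVM training confusion matrix as printed above
# (rows: true class 0/1, columns: predicted class 0/1).
svm_train_matrix = [[3990, 932],
                    [1876, 3202]]


def unpack(matrix, positive_class):
    """Return (tn, fp, fn, tp) for a 2x2 matrix, given which class counts as positive."""
    (c00, c01), (c10, c11) = matrix
    if positive_class == 1:
        # True 0 predicted 1 is a false positive; true 1 predicted 0 is a false negative.
        return c00, c01, c10, c11
    # With class 0 as positive the roles of the cells are mirrored.
    return c11, c10, c01, c00


for positive_class in (1, 0):
    tn, fp, fn, tp = unpack(svm_train_matrix, positive_class)
    print(f"positive class = {positive_class}: TN={tn} FP={fp} FN={fn} TP={tp}")
```

With a NumPy matrix from `confusion_matrix`, `tn, fp, fn, tp = matrix.ravel()` gives the reading with class 1 as positive.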

In [7]:
# 6) Evaluate and discuss the final model.
# Load the model from its pickle file.
# Use the model to make predictions on the test data.
# Use the predictions to display the confusion matrix, classification report, and any other important metrics not already in the report.
# Add a markdown cell to discuss how well your model would perform in the desired use case. Be specific, quoting test metrics, when discussing how reliable you expect the predictions to be. 
# Discuss any problems and limitations of the model.

# Load best model:
with open("model.pkl", "rb") as f:
    best_model = pickle.load(f)


# Make predictions and display metrics:
preds = best_model.predict(test_features)
print(confusion_matrix(test_label, preds))
print(classification_report(test_label, preds))
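# Note: without an `average` argument this is the binary F2 for class 1,
# not the weighted F2 used as the scorer during the grid search.
print(fbeta_score(test_label, preds, beta=2))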

[[1769  388]
 [ 849 1369]]
              precision    recall  f1-score   support

           0       0.68      0.82      0.74      2157
           1       0.78      0.62      0.69      2218

    accuracy                           0.72      4375
   macro avg       0.73      0.72      0.71      4375
weighted avg       0.73      0.72      0.71      4375

0.6439928497506822
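The F2 value printed above comes from `fbeta_score` called without `average`, while the grid search scorer used `average='weighted'`, so the two numbers are not directly comparable. As a rough sketch, both variants can be recomputed from the printed test confusion matrix with the standard count-based F-beta formula. The helper `fbeta_from_counts` is hypothetical and written for this example only.

```python
def fbeta_from_counts(tp, fp, fn, beta=2.0):
    """F-beta from counts: (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP)."""
    b2 = beta ** 2
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)


# Test confusion matrix as printed above (rows: true class, columns: predicted class).
test_matrix = [[1769, 388],
               [849, 1369]]
(c00, c01), (c10, c11) = test_matrix

# Per-class F2: each class is treated as the "positive" class in turn.
f2_class1 = fbeta_from_counts(tp=c11, fp=c01, fn=c10)
f2_class0 = fbeta_from_counts(tp=c00, fp=c10, fn=c01)

# Weighted average: per-class F2 weighted by the number of true samples in each class.
support0 = c00 + c01
support1 = c10 + c11
f2_weighted = (support0 * f2_class0 + support1 * f2_class1) / (support0 + support1)

print("Binary F2 (class 1):", f2_class1)
print("Weighted F2:", f2_weighted)
```

The binary value reproduces the number printed by the cell above; the weighted value is the one to set against the cross-validation scores from the grid search.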


# **Explanation:**
<pre>
FN: 388 / 4375 = 0.08868 (test) | 932 / 10000 = 0.0932 (train)
FP: 849 / 4375 = 0.19406 (test) | 1876 / 10000 = 0.1876 (train)
</pre>
**SVM Training Metrics:**
<pre>
Best parameters: {'svc__C': 20, 'svc__decision_function_shape': 'ovo', 'svc__degree': 10, 'svc__gamma': 3, 'svc__kernel': 'linear'}
Weighted mean training score: 0.7147642016470885
[[3990  932]
 [1876 3202]]
              precision    recall  f1-score   support

           0       0.68      0.81      0.74      4922
           1       0.77      0.63      0.70      5078

    accuracy                           0.72     10000
   macro avg       0.73      0.72      0.72     10000
weighted avg       0.73      0.72      0.72     10000
</pre>
Comparing the training metrics above with the new metrics from the testing data, this model performs reasonably well. Judged by the FN and FP rates computed above (the most important quantities for this use case), the model does slightly better on the test data than on the training data: the FN rate is about 0.5 percentage points lower (8.9% vs. 9.3%) and the FP rate is about 0.6 percentage points higher (19.4% vs. 18.8%). Because FNs are the worst outcome for this use case, I count this as a net improvement. Apart from these small differences in the confusion matrices, the other metrics are essentially the same (accuracy is 0.72 on both sets). In conclusion, the marginal differences in the confusion matrices indicate consistent behavior between training and testing, suggesting no overfitting or underfitting issues.
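The rate comparison can be reproduced with a few lines of arithmetic. This is a minimal sketch that takes the counts from the two confusion matrices and keeps the FN/FP labels used in this explanation (top-right cell as FN, bottom-left cell as FP); the dictionary layout is only an illustration.

```python
# Counts copied from the test and training confusion matrices above,
# labeled the same way as in the explanation (top-right = FN, bottom-left = FP).
counts = {
    "test": {"FN": 388, "FP": 849, "total": 4375},
    "train": {"FN": 932, "FP": 1876, "total": 10000},
}

# Rate = count divided by the number of samples in that split.
rates = {
    split: {kind: c[kind] / c["total"] for kind in ("FN", "FP")}
    for split, c in counts.items()
}

for kind in ("FN", "FP"):
    diff_pp = (rates["test"][kind] - rates["train"][kind]) * 100  # percentage points
    print(
        f"{kind}: test {rates['test'][kind]:.5f} | train {rates['train'][kind]:.5f} "
        f"| difference {diff_pp:+.2f} pp"
    )
```

Dividing by the split size is what makes the two matrices comparable, since the test set (4375 rows) is less than half the size of the training sample (10000 rows).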
