#### Model Evaluation   
- Confusion matrix
- Accuracy
- Precision
- Recall
- F1-Score
- Cross validation
- Area under curve
- Gradient Boosting

#### AUC-ROC curve   
The ROC (Receiver Operating Characteristic) curve is a graphical representation  
of the performance of a binary classification model at various classification thresholds.  
The AUC (Area Under the Curve) condenses that curve into a single number  
that measures the ability of the model to distinguish between the two classes.  

**True Positive Rate** or Sensitivity = $\frac{True\ Positives}{True\ Positives + False\ Negatives}$

**False Positive Rate** = $\frac{False\ Positives}{False\ Positives + True\ Negatives}$  

**Specificity** = $\frac{True\ Negatives}{False\ Positives + True\ Negatives}$  

False Positive Rate = 1 - Specificity.   
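
As a small worked example of the formulas above, the following sketch computes the three rates from confusion-matrix counts. The counts are hypothetical and chosen only for illustration.

```python
# Hypothetical confusion-matrix counts for a binary classifier
tp, fn = 40, 10   # the 50 actual positives: 40 found, 10 missed
fp, tn = 5, 45    # the 50 actual negatives: 5 false alarms, 45 correct

tpr = tp / (tp + fn)            # True Positive Rate (sensitivity)
fpr = fp / (fp + tn)            # False Positive Rate
specificity = tn / (fp + tn)    # True Negative Rate

print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}, Specificity = {specificity:.2f}")
print(f"1 - Specificity = {1 - specificity:.2f}")
```

With these counts, the False Positive Rate and 1 - Specificity come out equal, as the identity states.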


- ROC Curves summarize the trade-off between the true positive rate and false positive rate  
    for a predictive model using different probability thresholds.  
    It is a plot of the false positive rate (x-axis) versus the true positive rate (y-axis)    
    for a number of different candidate threshold values between 0.0 and 1.0.  
    It plots the false alarm rate versus the hit rate.  
- Precision-Recall curves summarize the  
    trade-off between the true positive rate and the positive predictive value  
    for a predictive model using different probability thresholds.   
- ROC curves are appropriate when the observations are balanced between each class,  
    whereas precision-recall curves are appropriate for imbalanced datasets.   

1. ROC curves of different models can be compared directly in general or for different thresholds.  
2. AUC can be used as a summary of the model skill.  

- Smaller values on the x-axis of the plot indicate lower false positives and higher true negatives.  
- Larger values on the y-axis of the plot indicate higher true positives and lower false negatives.   
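
To make the threshold sweep concrete, here is a minimal plain-Python sketch that builds ROC points by hand and approximates the area under them with the trapezoidal rule. The helper names `roc_points` and `trapezoid_auc`, the labels, the scores, and the thresholds are all illustrative assumptions; this is not how scikit-learn implements `roc_curve`.

```python
def roc_points(y_true, scores, thresholds):
    """Return one (fpr, tpr) pair per threshold; a sample is predicted
    positive when its score is >= the threshold."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def trapezoid_auc(points):
    """Approximate the area under the ROC points with the trapezoidal rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical labels and predicted scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
thresholds = [1.1, 0.8, 0.4, 0.35, 0.1]   # from strictest to most permissive

points = roc_points(y_true, scores, thresholds)
for t, (fpr, tpr) in zip(thresholds, points):
    print(f"threshold {t:4.2f}: FPR = {fpr:.2f}, TPR = {tpr:.2f}")
print(f"AUC = {trapezoid_auc(points):.2f}")
```

Lowering the threshold moves the point up and to the right: more positives are caught, at the cost of more false alarms.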

#### scikit-learn predict() vs predict_proba()   
The `predict()` method predicts a category for each sample in a set of inputs.  
It returns one discrete class label per input sample.  

The `predict_proba()` method returns, for each input sample, the predicted probability of every category.  
This is useful when we also want to know the model's confidence in its prediction.   
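
The relationship between the two outputs can be sketched without a fitted model. For `LogisticRegression`, the label from `predict()` corresponds to the class with the highest probability in `predict_proba()`. The probability rows and the helper `labels_from_probabilities` below are hypothetical, made up only to show the idea.

```python
# Hypothetical predict_proba()-style output: one row per sample, one column per class
classes = [0, 1, 2]
probabilities = [
    [0.90, 0.08, 0.02],
    [0.10, 0.55, 0.35],
    [0.05, 0.30, 0.65],
]


def labels_from_probabilities(probabilities, classes):
    """For each sample, pick the class whose probability is highest."""
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in probabilities]


labels = labels_from_probabilities(probabilities, classes)
confidences = [max(row) for row in probabilities]   # probability of the chosen class

for label, conf, row in zip(labels, confidences, probabilities):
    print(f"label = {label}   confidence = {conf:.2f}   probabilities = {row}")
```

The second sample gets label 1 with a confidence of only 0.55, a distinction that the label alone would hide.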

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
import numpy as np

from sklearn.metrics import roc_auc_score

iris_data = load_iris()

In [None]:
X = iris_data.data
y = iris_data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4) # random_state=42

model = LogisticRegression()
model.fit(X_train, y_train)

predicted_labels = model.predict(X_test)
predicted_probabilities = model.predict_proba(X_test)

print(f"\nroc_auc_score: {roc_auc_score(y_test, predicted_probabilities, multi_class='ovr')}\n")  

print("Label argmax    max of probabilities")
for res in zip(predicted_labels, predicted_probabilities):
    probs = res[1]
    print(f"  {res[0]} {np.argmax(res[1]):6}      {round(max(probs), 2)}   {probs}")

##### ROC curve and ROC AUC for a Logistic Regression model on a small test problem.   

In [None]:
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score

In [None]:
def plot_auc(model, X_test, y_test):
    
    # generate a no-skill prediction: the same score (0) for every sample
    ns_probs = [0 for _ in range(len(y_test))] # no-skill probabilities
    
    lr_probs = model.predict_proba(X_test)     # logistic regression probabilities
    lr_probs = lr_probs[:, 1]                  # probabilities for positive outcome only

    # calculate and display scores
    ns_auc = roc_auc_score(y_test, ns_probs)   
    lr_auc = roc_auc_score(y_test, lr_probs)   
    print(f'\nNo Skill: ROC AUC = {ns_auc:.3f}')
    print(f'Logistic: ROC AUC = {lr_auc:.3f}')

    # calculate roc curves:  fpr = false positive rate, tpr = true positive rate
    ns_fpr, ns_tpr, _ = roc_curve(y_test, ns_probs)
    lr_fpr, lr_tpr, _ = roc_curve(y_test, lr_probs)

    # plot the roc curve for the model
    plt.plot(ns_fpr, ns_tpr, linestyle='--', label='No Skill')
    plt.plot(lr_fpr, lr_tpr, marker='.', label='Logistic')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.legend();

In [None]:
# generate 2 class dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=2)

model = LogisticRegression(solver='lbfgs')
model.fit(X_train, y_train)

plot_auc(model, X_test, y_test)

In [None]:
# generate 2 class dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=2)

model = LogisticRegression(solver='sag')
model.fit(X_train, y_train)

plot_auc(model, X_test, y_test)

#### Precision-Recall Curves     
In information retrieval (finding documents based on queries) we measure precision and recall. These measures are also useful in applied machine learning for evaluating binary classification models.

Positive predictive value or $Precision = \frac{True\ Positives}{True\ Positives + False\ Positives}$   

Recall or $Sensitivity =  \frac{True\ Positives}{True\ Positives + False\ Negatives}$   

Reviewing both precision and recall is useful when the observations are imbalanced between the two classes, specifically when there are many examples of no event (class 0) and only a few examples of an event (class 1). Neither measure uses the true negatives, so both are concerned only with the correct prediction of the minority class, class 1.  
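
A small sketch of why this matters on imbalanced data: accuracy can look strong while precision and recall reveal that the minority class is handled poorly. The labels, the predictions, and the helper `precision_recall_f1` are hypothetical and written in plain Python for illustration.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical imbalanced data: 95 negatives followed by 5 positives
y_true = [0] * 95 + [1] * 5
# Hypothetical predictions: 3 false alarms, then only 2 of the 5 positives found
y_pred = [1] * 3 + [0] * 92 + [1] * 2 + [0] * 3

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)

print(f"accuracy  = {accuracy:.2f}")
print(f"precision = {precision:.2f}   recall = {recall:.2f}   f1 = {f1:.2f}")
```

Here the 92 true negatives dominate the accuracy, while precision and recall ignore them and report only how the 5 positive cases were handled.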



##### scikit-learn LogisticRegression solvers   
1. newton-cg   
2. lbfgs (Limited-memory Broyden–Fletcher–Goldfarb–Shanno Algorithm)    
3. liblinear (A Library for Large Linear Classification)
4. sag (Stochastic Average Gradient)    
5. saga   


#### Plot precision_recall_curve for the model   

In [None]:
def plot_precision_recall(model, X_test, y_test):
    lr_probs = model.predict_proba(X_test)
    # keep probabilities for the positive outcome only
    lr_probs = lr_probs[:, 1]

    yhat = model.predict(X_test)
    lr_precision, lr_recall, _ = precision_recall_curve(y_test, lr_probs)
    lr_f1, lr_auc = f1_score(y_test, yhat), auc(lr_recall, lr_precision)
    print(f'Logistic: f1 = {lr_f1:.3f}   auc = {lr_auc:.3f}')

    # plot the precision-recall curves
    no_skill = len(y_test[y_test==1]) / len(y_test)  # fraction of positive samples = no-skill precision
    plt.plot([0, 1], [no_skill, no_skill], linestyle='--', label='No Skill')

    plt.plot(lr_recall, lr_precision, marker='.', label='Logistic')
    plt.xlabel('Recall')
    plt.ylabel('Precision')
    plt.legend();

In [None]:
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import f1_score
from sklearn.metrics import auc

plot_precision_recall(model, X_test, y_test)

- Use ROC curves when there are roughly equal numbers of observations for each class.
- Use Precision-Recall curves when there is a moderate to large class imbalance.

#### Task   
Draw AUC-ROC curves for  
- a classifier with Sonar data.  
- a male/female classifier based on height, weight, collar size, shoulder and waist measures.   


#### classify Sonar data   

In [None]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

In [None]:
dataset = pd.read_csv('../Data/sonar.csv', header = None)
target = 60
dataset.shape

In [None]:
dataset.head()

In [None]:
dataset.replace({'R': 0, 'M': 1}, inplace=True)  # encode labels   

In [None]:
X = dataset.drop(columns=target)  # `columns=` already selects the axis
y = dataset[target]

#### Shuffle or stratify   
Passing `stratify=y` to `train_test_split` makes the split preserve the class proportions of `y`: the training and test sets each contain roughly the same share of every class as the full dataset.
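
The idea behind stratification can be sketched in plain Python: split each class separately, then combine the pieces. The helper `stratified_split` below is a hypothetical, simplified illustration and not scikit-learn's implementation; the labels are made up.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size, seed=0):
    """Return (train_idx, test_idx) so that each class contributes
    about `test_size` of its samples to the test set."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # shuffle within the class
        n_test = round(len(indices) * test_size)   # this class's share of the test set
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical imbalanced labels: 90 of class 0 and 10 of class 1
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels, test_size=0.2)

test_share = sum(labels[i] for i in test_idx) / len(test_idx)
train_share = sum(labels[i] for i in train_idx) / len(train_idx)
print(f"class 1 share: test = {test_share:.2f}, train = {train_share:.2f}")
```

Both subsets keep the original 10% share of class 1, which a purely random split of a small dataset does not guarantee.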

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify = y, test_size = 0.1, random_state = 1)

In [None]:
print(f"y_test: {y_test.value_counts()}")
print(f"y_train: {y_train.value_counts()}")

In [None]:
model = LogisticRegression()
# model = LogisticRegression(solver='lbfgs')
model.fit(X_train, y_train)

In [None]:
y_pred = model.predict(X_test)
train_accuracy_score = accuracy_score(y_train, model.predict(X_train))
test_accuracy_score = accuracy_score(y_test, y_pred)
print(f'Accuracy score on training data: {train_accuracy_score}')
print(f'Accuracy score on test data    : {test_accuracy_score}')

#### Plot AUC  

In [None]:
plot_auc(model, X_test, y_test)

#### Plot Precision-Recall Curve  

In [None]:
plot_precision_recall(model, X_test, y_test)

#### Student Data   

In [None]:
cols = ['Height_cm', 'Weight_Kg', 'Sex']
df_std = pd.read_excel('../Data/Seven Schools.xlsx', sheet_name='all_Schools', usecols=cols)

df_std = df_std[cols]                           # re-order columns
df_std.Sex = df_std.Sex.str.strip()             # remove leading and trailing spaces
df_std.replace({'F': 0, 'M': 1}, inplace=True)  # encode labels   
df_std.Sex = df_std.Sex.astype('int')
df_std.head()

In [None]:
X = df_std[['Height_cm', 'Weight_Kg']].values
y = df_std['Sex'].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.1, random_state = 1)

model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
train_accuracy_score = accuracy_score(y_train, model.predict(X_train))
test_accuracy_score = accuracy_score(y_test, y_pred)
print(f'Accuracy score on training data: {train_accuracy_score}')
print(f'Accuracy score on test data    : {test_accuracy_score}')

#### AUC   

In [None]:
plot_auc(model, X_test, y_test)

#### Plot Precision-Recall Curve  

In [None]:
plot_precision_recall(model, X_test, y_test)

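#### Lower F1 scores with reduced features   
With fewer features the model's accuracy tends to drop, and the F1 score and AUC tend to drop with it. The next cells use weight as the only feature.   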

In [None]:
cols = ['Weight_Kg', 'Sex']
df_std = pd.read_excel('../Data/Seven Schools.xlsx', sheet_name='all_Schools', usecols=cols)

df_std = df_std[cols]                           # re-order columns
df_std.Sex = df_std.Sex.str.strip()             # remove leading and trailing spaces
df_std.replace({'F': 0, 'M': 1}, inplace=True)  # encode labels   
df_std.Sex = df_std.Sex.astype('int')

X = df_std[['Weight_Kg']].values
y = df_std['Sex'].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.1, random_state = 1)

model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
train_accuracy_score = accuracy_score(y_train, model.predict(X_train))
test_accuracy_score = accuracy_score(y_test, y_pred)
print(f'Accuracy score on training data: {train_accuracy_score}')
print(f'Accuracy score on test data    : {test_accuracy_score}')

In [None]:
plot_auc(model, X_test, y_test)

In [None]:
plot_precision_recall(model, X_test, y_test)

#### With height as the only feature     

In [None]:
cols = ['Height_cm', 'Sex']
df_std = pd.read_excel('../Data/Seven Schools.xlsx', sheet_name='all_Schools', usecols=cols)

df_std = df_std[cols]                           # re-order columns
df_std.Sex = df_std.Sex.str.strip()             # remove leading and trailing spaces
df_std.replace({'F': 0, 'M': 1}, inplace=True)  # encode labels   
df_std.Sex = df_std.Sex.astype('int')

X = df_std[['Height_cm']].values
y = df_std['Sex'].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.1, random_state = 1)

model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
train_accuracy_score = accuracy_score(y_train, model.predict(X_train))
test_accuracy_score = accuracy_score(y_test, y_pred)
print(f'Accuracy score on training data: {train_accuracy_score}')
print(f'Accuracy score on test data    : {test_accuracy_score}')

In [None]:
plot_auc(model, X_test, y_test)

In [None]:
plot_precision_recall(model, X_test, y_test)