# MODEL EVALUATION

# EVALUATING BINARY CLASSIFIER PREDICTIONS

Given a trained classification model, you want to evaluate its quality.

### ACCURACY (scoring = "accuracy")

Accuracy is the proportion of observations predicted correctly. Real-world data often has imbalanced classes (for example, 90% women and 10% men), and accuracy then suffers from a paradox: a model can be highly accurate yet have little predictive power, for instance by always predicting the majority class.
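To make the accuracy paradox concrete, here is a minimal sketch in plain Python. The labels and the always-majority "model" are hypothetical and exist only for illustration; they are not part of the dataset used in this notebook.

```python
# Hypothetical imbalanced labels: 90 observations of class 0, 10 of class 1
y_true = [0] * 90 + [1] * 10

# A "model" that ignores its input and always predicts the majority class
y_pred = [0] * len(y_true)

# Accuracy: share of predictions that match the true label
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Number of minority-class observations the model actually identified
minority_found = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))

print(accuracy)        # 0.9
print(minority_found)  # 0
```

The model scores 90% accuracy while never identifying a single observation of the minority class.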

#### Performance metrics
Accuracy = (TP+TN)/(TP+TN+FP+FN)

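TP: True Positives. Observations predicted positive that are actually positive.

TN: True Negatives. Observations predicted negative that are actually negative.

FP: False Positives. Observations predicted positive that are actually negative.        ==> TYPE 1 ERROR

FN: False Negatives. Observations predicted negative that are actually positive.        ==> TYPE 2 ERROR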
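As a rough illustration of the four counts, the sketch below uses scikit-learn's `confusion_matrix` on a small pair of label vectors. The vectors are made up for this example. For binary labels 0/1, flattening the matrix with `ravel()` yields the counts in the order TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Accuracy rebuilt from the four counts
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(tp, tn, fp, fn)  # 3 3 1 1
print(accuracy)        # 0.75
```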

In [1]:
# Load libraries

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

In [2]:
## Generate Feature Matrix and Target Vector 

X,y = make_classification(n_samples = 10000,
                          n_features = 3,
                          n_informative = 3,
                          n_redundant = 0,
                          n_classes = 2,
                          random_state = 1)

In [3]:
# Create Logistic Regression

logit = LogisticRegression()

In [4]:
# Cross Validate Model Using Accuracy

cross_val_score(logit, X , y, scoring = "accuracy")

array([0.9555, 0.95  , 0.9585, 0.9555, 0.956 ])

### PRECISION

Precision = TP / (TP + FP)

Proportion of the observations predicted to be positive that are actually positive. It can be read as a measure of noise in our positive predictions: the lower the precision, the more false positives they contain.
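A small worked example may help here. The label vectors below are hypothetical, and the counting is done by hand in plain Python so each term of the formula is visible.

```python
# Hypothetical true labels and predictions (1 = positive, 0 = negative)
y_true = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

# True positives: predicted positive and actually positive
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))

# False positives: predicted positive but actually negative
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp)

print(tp, fp)     # 3 2
print(precision)  # 0.6
```

Of the five positive predictions, three are correct, so two out of five are noise.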

In [5]:
# Cross Validate using precision

cross_val_score(logit, X, y, scoring = "precision")

array([0.95963673, 0.94820717, 0.9635996 , 0.96149949, 0.96060606])

### RECALL

Recall = TP / (TP + FN )

Proportion of the truly positive observations that are predicted to be positive. Recall measures the model's ability to identify an observation of the positive class.

High-recall models are optimistic in that they have a low bar for predicting that an observation is in the positive class.
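The "low bar" idea can be sketched by thresholding predicted probabilities. The scores below are hypothetical stand-ins for a model's predicted probability of the positive class, and `recall_at` is a helper written for this example, not a scikit-learn function.

```python
# Hypothetical true labels and predicted probabilities of the positive class
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.3, 0.1, 0.05]

def recall_at(threshold):
    """Recall when every score >= threshold is predicted positive."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)

print(recall_at(0.5))   # 0.5
print(recall_at(0.25))  # 0.75
```

Lowering the bar from 0.5 to 0.25 raises recall, but in this toy data it also lets in one more false positive, which is why recall is usually read together with precision.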


In [6]:
# Cross Validate using recall

cross_val_score(logit, X, y, scoring = "recall")

array([0.951, 0.952, 0.953, 0.949, 0.951])

### F1 - SCORE

F1 = 2 * [(Precision * Recall) / (Precision + Recall)]


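The F1 score is the harmonic mean of precision and recall, so it balances the two: it is high only when both are high.

It summarizes the correctness of the positive predictions, that is, how well the observations labeled as positive match the observations that are actually positive.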
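To see why the formula behaves as a balance, the sketch below compares F1 with a plain arithmetic mean. The precision and recall values are hypothetical, and `f1` is a helper defined for this example that assumes precision + recall is greater than zero.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (assumes precision + recall > 0)."""
    return 2 * (precision * recall) / (precision + recall)

# Lopsided case: perfect precision, very poor recall
arithmetic_mean = (1.0 + 0.1) / 2
lopsided_f1 = f1(1.0, 0.1)

# Balanced case: precision and recall are equal
balanced_f1 = f1(0.6, 0.6)

print(round(arithmetic_mean, 3))  # 0.55
print(round(lopsided_f1, 3))      # 0.182
print(round(balanced_f1, 3))      # 0.6
```

The arithmetic mean hides the poor recall, while F1 is pulled toward the weaker of the two values.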



In [7]:
# Cross Validate using F1-Score

cross_val_score(logit, X, y, scoring = "f1")

array([0.95529884, 0.9500998 , 0.95827049, 0.95520886, 0.95577889])
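The four metrics above were each computed with a separate `cross_val_score` call. One possible alternative, sketched here, is scikit-learn's `cross_validate`, which accepts a list of scorer names and returns one array of fold scores per metric under keys of the form `test_<name>`. The data generation repeats the cell above so the block runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Same synthetic data as in the notebook
X, y = make_classification(n_samples=10000,
                           n_features=3,
                           n_informative=3,
                           n_redundant=0,
                           n_classes=2,
                           random_state=1)

logit = LogisticRegression()

# Evaluate several metrics in a single cross-validation pass
metrics = ["accuracy", "precision", "recall", "f1"]
results = cross_validate(logit, X, y, scoring=metrics)

# Report the mean score across folds for each metric
for name in metrics:
    fold_scores = results["test_" + name]
    print(name, round(fold_scores.mean(), 3))
```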

### Do we have the true "y" values?
If so, we can compute accuracy directly by comparing them with our predicted values "ŷ".

In [10]:
# Load libraries 

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [11]:
# Create training and test split

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.1, random_state = 1)

In [12]:
# Train on the training set and predict the test target vector      #ŷ

y_hat = logit.fit(X_train, y_train).predict(X_test)

In [13]:
# Calculate accuracy

accuracy_score(y_test, y_hat)

0.947
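The same held-out comparison works for the other metrics. The sketch below is self-contained, so it rebuilds the data, the split, and the predictions, then uses `precision_score`, `recall_score`, `f1_score`, and `confusion_matrix` from `sklearn.metrics`. No output values are shown because they were not part of the original run.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Same synthetic data and split as in the notebook
X, y = make_classification(n_samples=10000, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1,
                                                    random_state=1)

# Train on the training set, predict the test target vector (ŷ)
y_hat = LogisticRegression().fit(X_train, y_train).predict(X_test)

# Counts of each outcome on the test set
tn, fp, fn, tp = confusion_matrix(y_test, y_hat).ravel()

# Metrics computed directly from true and predicted labels
precision = precision_score(y_test, y_hat)
recall = recall_score(y_test, y_hat)
f1 = f1_score(y_test, y_hat)

print("TP, TN, FP, FN:", tp, tn, fp, fn)
print("precision:", precision)
print("recall:", recall)
print("f1:", f1)
```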