You now understand how precision, recall (sensitivity), and the F1 score can be used to assess a model's performance. Let's return to the Pima Indians diabetes dataset and work through an example in Python. Run all the cells in the notebook. The data preparation steps have already been performed, and a logistic regression model has been trained and used to make predictions.

In [None]:
# Import our dependencies
from pathlib import Path
import pandas as pd

In [None]:
# Load our data
data = Path('../Resources/diabetes.csv')
df = pd.read_csv(data)
df.head()

 ## Separate the Features (X) from the Target (y)

In [None]:
y = df["Outcome"]
X = df.drop(columns="Outcome")

 ## Split our data into training and testing sets

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    random_state=1, 
                                                    stratify=y)
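Passing `stratify=y` asks `train_test_split` to keep the class proportions of `y` the same in the training and testing sets. The sketch below checks this on a synthetic label vector with 500 zeros and 268 ones, an assumed stand-in for the `Outcome` column rather than the CSV itself. With the default test size of 25%, each class should be split 75/25.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Outcome column (an assumption): 500 nondiabetic (0), 268 diabetic (1)
y_demo = np.array([0] * 500 + [1] * 268)
# A single dummy feature; its values do not affect how the labels are stratified
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=1, stratify=y_demo)

# Count how many samples of each class landed in each split
train_counts = np.bincount(y_tr)
test_counts = np.bincount(y_te)
print("train:", train_counts, "test:", test_counts)

# Class 1 makes up the same share (about 35%) of both splits
print(train_counts[1] / len(y_tr), test_counts[1] / len(y_te))
```

Without `stratify`, a random split could leave the test set with noticeably more or fewer diabetics than the full dataset, which would distort the per-class metrics computed later.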
X_train.shape

 ## Create a Logistic Regression Model

In [None]:
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(solver='lbfgs',
                                max_iter=200,
                                random_state=1)

 ## Fit (train) our model using the training data

In [6]:
classifier.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=200,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=1, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

 ## Make predictions

In [7]:
y_pred = classifier.predict(X_test)
results = pd.DataFrame({"Prediction": y_pred, "Actual": y_test}).reset_index(drop=True)
results.head(20)
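For a binary problem, `LogisticRegression.predict` assigns class 1 when the model's decision function (the log-odds) is positive, which corresponds to a predicted probability of class 1 above 0.5. The sketch below confirms this on synthetic data from `make_classification`, which is an assumption standing in for the diabetes features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data standing in for the diabetes features (an assumption)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=1)
model = LogisticRegression(solver="lbfgs", max_iter=200, random_state=1).fit(X_demo, y_demo)

preds = model.predict(X_demo)

# Log-odds for class 1; positive log-odds means P(class 1) > 0.5
log_odds = model.decision_function(X_demo)
manual_preds = (log_odds > 0).astype(int)

# The same rule expressed with probabilities (column 1 holds P(class 1))
proba_preds = (model.predict_proba(X_demo)[:, 1] > 0.5).astype(int)

print("predict matches log-odds rule:", np.array_equal(preds, manual_preds))
print("predict matches 0.5 probability rule:", np.array_equal(preds, proba_preds))
```

The 0.5 cutoff is fixed inside `predict`; trading precision for recall means thresholding `predict_proba` yourself.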

   Prediction  Actual
0           0       0
1           1       1
2           0       0
3           1       1
4           0       0
5           0       0
6           1       1
7           1       0
8           1       1
9           0       0


In [8]:
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))

0.7760416666666666
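Accuracy is the fraction of predictions that match the actual labels. The sketch below shows the element-wise comparison on a small hypothetical toy example, then reproduces the notebook's accuracy from the confusion matrix counts printed further down (113 correct predictions of class 0 plus 36 correct predictions of class 1, out of 192 test samples).

```python
import numpy as np

# Accuracy = correct predictions / total predictions.
# Hypothetical toy labels to illustrate the element-wise comparison:
y_true_toy = np.array([0, 1, 0, 1, 1])
y_pred_toy = np.array([0, 1, 1, 1, 0])
toy_accuracy = np.mean(y_true_toy == y_pred_toy)  # 3 of 5 match
print(toy_accuracy)

# The same ratio from this notebook's confusion matrix:
# the diagonal (113 + 36) holds the correct predictions out of 192
notebook_accuracy = (113 + 36) / 192
print(notebook_accuracy)
```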


Next, import the relevant functions from `sklearn.metrics` and print the confusion matrix, which is the table of true positives, false positives, true negatives, and false negatives.

In [9]:
# Import confusion_matrix and classification_report
from sklearn.metrics import confusion_matrix, classification_report

In [10]:
# Compute and print the confusion matrix
matrix = confusion_matrix(y_test, y_pred)
print(matrix)

[[113  12]
 [ 31  36]]


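The matrix printed in the notebook is unlabeled. Scikit-learn places the actual classes in the rows and the predicted classes in the columns, both ordered by label (0, then 1), so it can be read as follows:

| | Predicted 0 (no diabetes) | Predicted 1 (diabetes) |
|---|---|---|
| Actual 0 (no diabetes) | 113 | 12 |
| Actual 1 (diabetes) | 31 | 36 |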


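How many true positives are there? 113, counting class 0 (no diabetes) as the positive class. With diabetes (class 1) as the positive class, there are 36.

How many false positives are there? 31, counting class 0 as the positive class (31 diabetics were predicted to be nondiabetic). With class 1 as the positive class, there are 12.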
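Which cells count as true or false positives depends on which class is treated as positive. In scikit-learn's binary confusion matrix, `ravel()` flattens the rows in the order true negatives, false positives, false negatives, true positives, with class 1 (diabetes) as the positive class. The sketch below hard-codes the matrix printed above and shows how the roles swap when class 0 is treated as positive instead.

```python
import numpy as np

# The confusion matrix printed above (rows = actual, columns = predicted)
matrix = np.array([[113, 12],
                   [31, 36]])

# With class 1 (diabetes) as the positive class, ravel() yields tn, fp, fn, tp
tn, fp, fn, tp = matrix.ravel()
print(f"Class 1 positive: TP={tp}, FP={fp}, FN={fn}, TN={tn}")

# With class 0 (no diabetes) as the positive class, the roles swap:
# actual 0 / predicted 0 becomes TP, and actual 1 / predicted 0 becomes FP
tp0, fp0, fn0, tn0 = tn, fn, fp, tp
print(f"Class 0 positive: TP={tp0}, FP={fp0}, FN={fn0}, TN={tn0}")
```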

Although we could calculate these metrics by hand, scikit-learn's classification_report function does it for us:

In [11]:
# classification_report computes precision, recall (sensitivity), F1, and support for each class
report = classification_report(y_test, y_pred)
print(report)

              precision    recall  f1-score   support

           0       0.78      0.90      0.84       125
           1       0.75      0.54      0.63        67

    accuracy                           0.78       192
   macro avg       0.77      0.72      0.73       192
weighted avg       0.77      0.78      0.77       192
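To see where the per-class rows of the report come from, the sketch below recomputes them by hand from the confusion matrix counts printed earlier in this notebook. For each class, precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is the harmonic mean of the two. The helper `class_metrics` is a hypothetical name introduced only for this illustration.

```python
# Confusion matrix counts from this notebook (rows = actual, columns = predicted)
matrix = [[113, 12],
          [31, 36]]

def class_metrics(matrix, positive):
    """Hypothetical helper: precision, recall, and F1 for one class of a 2x2 confusion matrix."""
    other = 1 - positive
    tp = matrix[positive][positive]  # actual positive, predicted positive
    fp = matrix[other][positive]     # actual other, predicted positive
    fn = matrix[positive][other]     # actual positive, predicted other
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

for label in (0, 1):
    p, r, f = class_metrics(matrix, label)
    print(f"class {label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

The rounded values match rows `0` and `1` of the report above.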



What is the sensitivity/recall of this model? 0.90 for class 0 (no diabetes); for class 1 (diabetes) it is only 0.54.

What is the precision of this model, to two decimal places? 0.78 for class 0 (no diabetes); 0.75 for class 1 (diabetes).
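Because the answers above depend on which class counts as positive, it helps to name that class explicitly. `precision_score` and `recall_score` in `sklearn.metrics` accept a `pos_label` argument (default 1). The label arrays below are synthetic, built only so that they reproduce this notebook's confusion matrix counts; they are not the actual test data.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Synthetic labels that reproduce the notebook's confusion matrix:
# 125 actual 0s (113 predicted 0, 12 predicted 1) and
# 67 actual 1s (31 predicted 0, 36 predicted 1)
y_true = np.array([0] * 125 + [1] * 67)
y_pred = np.array([0] * 113 + [1] * 12 + [0] * 31 + [1] * 36)

for pos in (0, 1):
    precision = precision_score(y_true, y_pred, pos_label=pos)
    recall = recall_score(y_true, y_pred, pos_label=pos)
    print(f"pos_label={pos}: precision={precision:.2f} recall={recall:.2f}")
```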

The precision for predicting nondiabetics (0.78) and diabetics (0.75) is similar. However, the recall (sensitivity) for diabetes (0.54) is much lower than for the absence of diabetes (0.90): the model misses 31 of the 67 diabetics in the test set. This lower recall also pulls down the F1 score for diabetics (0.63, versus 0.84 for nondiabetics).