Evaluating Logistic Regression Models - Lab

Introduction

In regression, you predict continuous values, so it makes sense to measure error as the distance between your estimates and the true values. When classifying a binary variable, however, each prediction is simply correct or incorrect. As a result, we tend to break performance down by how many false positives and false negatives the model produces.
In particular, we examine a few specific measurements when evaluating the performance of a classification algorithm. In this lab, you'll review precision, recall, accuracy, and F1-score and use them to evaluate your logistic regression models.

Objectives

You will be able to:

Understand and assess precision, recall, and accuracy of classifiers
Evaluate classification models using various metrics

Terminology Review

Let's take a moment and review some classification evaluation metrics:

$Precision = \frac{\text{Number of True Positives}}{\text{Number of Predicted Positives}}$

$Recall = \frac{\text{Number of True Positives}}{\text{Number of Actual Total Positives}}$

$Accuracy = \frac{\text{Number of True Positives + True Negatives}}{\text{Total Observations}}$

$\text{F1-Score} = 2 \times \frac{Precision \times Recall}{Precision + Recall}$

At times, it may be better to tune a classification algorithm to optimize precision or recall rather than overall accuracy. For example, imagine predicting whether a patient is at risk for cancer and should be brought in for additional testing. In cases such as this, we often want to cast a slightly wider net, so it is much preferable to optimize for recall (the share of actual cancer-positive cases that the model identifies) than for precision (the share of predicted cancer-risk patients who are indeed positive).
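
To make the trade-off concrete, here is a small worked example of the four formulas above. The counts are invented for illustration (a hypothetical screening of 1,000 patients) and do not come from heart.csv.

```python
# Hypothetical screening results for 1,000 patients (made-up counts).
tp, fp, fn, tn = 40, 60, 10, 890

precision = tp / (tp + fp)                  # share of flagged patients who are truly positive
recall = tp / (tp + fn)                     # share of truly positive patients who were flagged
accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all patients classified correctly
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"accuracy={accuracy:.3f} f1={f1:.3f}")
# → precision=0.400 recall=0.800 accuracy=0.930 f1=0.533
```

With these made-up counts the model reaches 93% accuracy even though only 40% of the flagged patients are truly positive, which shows how accuracy alone can hide the precision and recall picture when positives are rare.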

1. Split the data into train and test sets

import pandas as pd
df = pd.read_csv('heart.csv')

#Your code here

2. Create a standard logistic regression model

#Your code here

3. Write a function to calculate the precision

def precision(y_hat, y):
    #Your code here

4. Write a function to calculate the recall

def recall(y_hat, y):
    #Your code here

5. Write a function to calculate the accuracy

def accuracy(y_hat, y):
    #Your code here

6. Write a function to calculate the F1-score

def f1_score(y_hat, y):
    #Your code here
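
Each of the four functions above can be built from the same four counts: true positives, false positives, false negatives, and true negatives. The sketch below shows one possible way to obtain them, assuming labels are coded as 0 and 1. The helper `confusion_counts` and the toy lists are hypothetical and not part of the lab template.

```python
def confusion_counts(y_hat, y):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1.

    Hypothetical helper for illustration; y_hat holds predictions, y holds true labels.
    """
    pairs = list(zip(y_hat, y))
    tp = sum(1 for pred, actual in pairs if pred == 1 and actual == 1)
    fp = sum(1 for pred, actual in pairs if pred == 1 and actual == 0)
    fn = sum(1 for pred, actual in pairs if pred == 0 and actual == 1)
    tn = sum(1 for pred, actual in pairs if pred == 0 and actual == 0)
    return tp, fp, fn, tn


# Toy predictions and labels, invented for illustration.
y_hat_demo = [1, 1, 0, 1, 0, 0, 1, 0]
y_demo = [1, 0, 0, 1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_hat_demo, y_demo)
print(tp, fp, fn, tn)  # → 3 1 1 3
```

Because the helper only iterates over paired values, it should work the same way on plain lists, NumPy arrays, or pandas Series.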

7. Calculate the precision, recall, accuracy, and F1-score of your classifier.

Do this for both the training and the test set

#Your code here

Great Job! Now it's time to check your work with sklearn.

8. Calculating Metrics with sklearn

Each of the metrics we calculated above is also available inside the sklearn.metrics module.

In the cell below, import the following functions:

precision_score
recall_score
accuracy_score
f1_score

Compare the results of your performance metrics functions with the sklearn functions above. Calculate these values for both your train and test set.

#Your code here
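
One detail worth checking when you compare results: the functions in this lab take the predictions first (`y_hat, y`), whereas the sklearn metric functions take the true labels first (`y_true, y_pred`). Importing `f1_score` from sklearn also rebinds the name of the function written in step 6, so the sketch below imports it under an alias. The alias and the toy labels are assumptions made for illustration.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.metrics import f1_score as sk_f1_score  # alias avoids clobbering your own f1_score

# Toy labels and predictions, invented for illustration.
y_true_demo = [1, 0, 0, 1, 1, 0, 1, 1]
y_pred_demo = [1, 1, 0, 1, 0, 0, 1, 0]

# sklearn expects the true labels first and the predictions second.
sk_precision = precision_score(y_true_demo, y_pred_demo)  # 3 TP / 4 predicted positives
sk_recall = recall_score(y_true_demo, y_pred_demo)        # 3 TP / 5 actual positives
sk_accuracy = accuracy_score(y_true_demo, y_pred_demo)    # 5 correct / 8 observations
sk_f1 = sk_f1_score(y_true_demo, y_pred_demo)

# Swapping the arguments silently exchanges precision and recall.
swapped_precision = precision_score(y_pred_demo, y_true_demo)

print(sk_precision, sk_recall, sk_accuracy, swapped_precision)
```

If your own precision and recall appear to be swapped relative to sklearn, the argument order is the first thing to check.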

9. Comparing Precision, Recall, Accuracy, and F1-Score of Test vs Train Sets

Calculate and then plot the precision, recall, accuracy, and F1-score for the test and train splits using different train set sizes. What do you notice?

import matplotlib.pyplot as plt
%matplotlib inline

training_Precision = []
testing_Precision = []
training_Recall = []
testing_Recall = []
training_Accuracy = []
testing_Accuracy = []
training_F1 = []
testing_F1 = []

for i in range(10, 95):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=None)  # replace the "None" here
    logreg = LogisticRegression(fit_intercept=False, C=1e12)
    model_log = None
    y_hat_test = None
    y_hat_train = None

    # 8 lines of code here: append each train and test metric to its list
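
As a rough sketch of the experiment above, the following loop refits a model at several test-set sizes and records accuracy for each split. It makes several assumptions: synthetic data from `make_classification` stands in for heart.csv, `LogisticRegression` uses its default settings with a raised `max_iter` rather than the template's arguments, the grid of sizes is coarser, and only one metric is tracked. All variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart data, so the sketch runs on its own.
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)

train_acc, test_acc = [], []
# Test-set fractions 0.10, 0.20, ..., 0.90 (coarser than range(10, 95) to keep it quick).
test_sizes = [i / 100.0 for i in range(10, 95, 10)]

for size in test_sizes:
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_demo, y_demo, test_size=size, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # Record one metric per split; the other metrics follow the same pattern.
    train_acc.append(accuracy_score(y_tr, clf.predict(X_tr)))
    test_acc.append(accuracy_score(y_te, clf.predict(X_te)))

for size, tr, te in zip(test_sizes, train_acc, test_acc):
    print(f"test_size={size:.2f} train_acc={tr:.3f} test_acc={te:.3f}")
```

Fixing `random_state` keeps the splits reproducible; leaving it unset, as in the template, gives a different split on every run and therefore noisier plots.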

Create four scatter plots: test and train precision in the first, test and train recall in the second, test and train accuracy in the third, and test and train F1-score in the fourth.

# code for test and train precision

# code for test and train recall

# code for test and train accuracy

# code for test and train F1-score

Summary

Nice! In this lab, you gained some extra practice with evaluation metrics for classification algorithms. You also got some further Python practice by coding these functions yourself, giving you a deeper understanding of how they work. Going forward, continue to think about scenarios in which you might prefer to optimize one of these metrics over another.
