# Introduction to Evaluating Binary Classifiers

We'll work with a dataset of 644 applicants that has the following columns:

- `gre` - applicant's score on the Graduate Record Exam, a generalized test for prospective graduate students (score ranges from 200 to 800)
- `gpa` - college grade point average (continuous between 0.0 and 4.0)
- `admit` - binary value, 0 or 1, where 1 means the applicant was admitted to the program and 0 means the applicant was rejected

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

In [2]:
admissions = pd.read_csv("admissions.csv")

In [3]:
admissions.head()

Unnamed: 0,admit,gpa,gre
0,0,3.177277,594.102992
1,0,3.412655,631.528607
2,0,2.728097,553.714399
3,0,3.093559,551.089985
4,0,3.141923,537.184894


### Using Logistic Regression to Predict Labels

In [4]:
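# Fit a logistic regression model that predicts admission from GPA alone
lr = LogisticRegression()
lr.fit(admissions[["gpa"]], admissions["admit"])
# Predict labels for the same rows the model was trained on
labels = lr.predict(admissions[["gpa"]])
admissions["predicted_label"] = labels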

In [5]:
admissions["predicted_label"].value_counts()

0    598
1     46
Name: predicted_label, dtype: int64

In [6]:
admissions.head()

Unnamed: 0,admit,gpa,gre,predicted_label
0,0,3.177277,594.102992,0
1,0,3.412655,631.528607,0
2,0,2.728097,553.714399,0
3,0,3.093559,551.089985,0
4,0,3.141923,537.184894,0


### Finding Prediction Accuracy

In [7]:
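admissions["actual_label"] = admissions["admit"]
# Boolean Series: True where the prediction matches the actual label
matches = (admissions["predicted_label"] == admissions["actual_label"])
correct_predictions = admissions[matches]
# Accuracy = number of correct predictions / total number of observations
accuracy = len(correct_predictions) / len(admissions)
accuracy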

0.6459627329192547
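To put this accuracy in context, it can help to compare it with a trivial baseline that rejects every applicant. The sketch below is only an illustration: it does not load `admissions.csv`, and instead reuses the counts reported elsewhere in this notebook (385 true negatives and 15 false positives, so 400 applicants were actually rejected). The variable names are assumptions made for this example.

```python
# Counts taken from the notebook's own outputs (not recomputed from the CSV)
total_applicants = 644
actual_rejected = 385 + 15  # true negatives + false positives

# Accuracy reported by the logistic regression model above
model_accuracy = 0.6459627329192547

# Accuracy of a hypothetical baseline that predicts 0 (rejected) for everyone
baseline_accuracy = actual_rejected / total_applicants

print(f"model:    {model_accuracy:.4f}")
print(f"baseline: {baseline_accuracy:.4f}")
```

Under these counts the model is only slightly ahead of the always-reject baseline, which suggests that accuracy alone says little about how well the admitted applicants are identified.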

### Binary Classification Outcomes

In [10]:
TP = (admissions["predicted_label"] == 1) & (admissions["actual_label"] == 1)
true_positives = len(admissions[TP])
true_positives

31

In [11]:
TN = (admissions["predicted_label"] == 0) & (admissions["actual_label"] == 0)
true_negatives = len(admissions[TN])
true_negatives

385

In [18]:
FN = (admissions["predicted_label"] == 0) & (admissions["actual_label"] == 1)
false_negatives = len(admissions[FN])
false_negatives

213

In [19]:
FP = (admissions["predicted_label"] == 1) & (admissions["actual_label"] == 0)
false_positives = len(admissions[FP])
false_positives

15
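As a cross-check, the four counts above can be arranged into a confusion matrix. The following is a minimal sketch, not the notebook's own method: because the CSV is not reproduced here, it rebuilds stand-in label arrays from the reported counts (31, 385, 213, 15) and passes them to scikit-learn's `confusion_matrix`. The names `y_true` and `y_pred` are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Counts reported in the cells above
tp, tn, fn, fp = 31, 385, 213, 15

# Synthetic stand-ins for the actual_label and predicted_label columns
y_true = np.array([1] * tp + [0] * tn + [1] * fn + [0] * fp)
y_pred = np.array([1] * tp + [0] * tn + [0] * fn + [1] * fp)

# For labels 0 and 1, rows are actual labels and columns are predicted labels:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
cm_tn, cm_fp, cm_fn, cm_tp = cm.ravel()
print(cm)
```

With the real DataFrame, the equivalent call would presumably be `confusion_matrix(admissions["actual_label"], admissions["predicted_label"])`.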

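### Sensitivity

Sensitivity, or the true positive rate, is the share of applicants who were actually admitted that the model correctly predicted as admitted.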

In [20]:
sensitivity = true_positives / (true_positives + false_negatives)
sensitivity

0.12704918032786885

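### Specificity

Specificity, or the true negative rate, is the share of applicants who were actually rejected that the model correctly predicted as rejected.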

In [21]:
specificity = true_negatives / (true_negatives + false_positives)
specificity

0.9625
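The three metrics computed in this notebook all derive from the same four outcome counts, so they can be gathered in one place. The helper below is a hypothetical convenience function (`classification_rates` is not part of the original notebook or of any library), shown as a sketch that takes the counts reported above as plain integers.

```python
def classification_rates(tp, tn, fp, fn):
    """Return accuracy, sensitivity, and specificity from outcome counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,     # correct predictions / all predictions
        "sensitivity": tp / (tp + fn),     # true positive rate
        "specificity": tn / (tn + fp),     # true negative rate
    }


# Counts reported earlier in this notebook
rates = classification_rates(tp=31, tn=385, fp=15, fn=213)
for name, value in rates.items():
    print(f"{name}: {value:.4f}")
# accuracy: 0.6460
# sensitivity: 0.1270
# specificity: 0.9625
```

Seen side by side, the numbers show that the model rarely admits a rejected applicant (high specificity) but misses most applicants who were actually admitted (low sensitivity).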