# Precision and Recall Lab

### Introduction

In [1]:
import pandas as pd

df = pd.read_csv('./coerced_customer_churn.csv', index_col = 0)

In [4]:
X = df.drop('Churn', axis = 1)
y = df['Churn']

Now let's split the data into training, validation, and test sets, and then fit our logistic regression model.

In [8]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .4)
X_validate, X_test, y_validate, y_test = train_test_split(X_test, y_test, test_size = .5)
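
The two calls above chain together: the first holds out 40% of the rows, and the second splits that held-out portion in half, which should give roughly a 60/20/20 split. A minimal sketch of that arithmetic follows. The helper name `split_fractions` is hypothetical and used only for illustration, and actual row counts may differ slightly because whole rows cannot be divided.

```python
def split_fractions(first_test_size, second_test_size):
    """Return the (train, validate, test) fractions for two chained splits."""
    held_out = first_test_size             # share set aside by the first split
    train = 1 - held_out                   # share kept for training
    test = held_out * second_test_size     # the second split's test share
    validate = held_out - test             # the remainder is the validation share
    return train, validate, test


train, validate, test = split_fractions(0.4, 0.5)
print(f"train={train:.0%}, validate={validate:.0%}, test={test:.0%}")
```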

In [10]:
# X_validate.shape

In [5]:
from sklearn.linear_model import LogisticRegression

In [13]:
model = LogisticRegression(solver="lbfgs").fit(X_train, y_train)

In [14]:
model.score(X_validate, y_validate)

0.7927608232789212

### Breaking down our error

Let's start by looking at our confusion matrix.

In [54]:
from sklearn.metrics import confusion_matrix

In [55]:
y_pred_val = model.predict(X_validate)

In [82]:
mtx = confusion_matrix(y_validate.values, y_pred_val)
mtx

array([[893,  98],
       [194, 224]])

Let's turn this into a labeled DataFrame so the rows and columns are easier to read.

In [105]:
conf_mtx_df = pd.DataFrame(mtx, index = ['observed -', 'observed +'], columns = ['predicted -', 'predicted +'])

In [106]:
conf_mtx_df

| | predicted - | predicted + |
|---|---|---|
| observed - | 893 | 98 |
| observed + | 194 | 224 |


Now let's break down the confusion matrix.  From here, assign the `true_positive`, `true_negative`, `false_positive`, and `false_negative` values below.

In [None]:
TN = 893
TP = 224
FP = 98
FN = 194

In [122]:
true_negative = 893

true_positive = 224

false_positive = 98

false_negative = 194 
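
Rather than typing the counts by hand, the four values can be unpacked from the matrix itself. The sketch below uses a plain nested list (`mtx_rows`, a hypothetical name) holding the same counts so that it runs on its own. With the NumPy array `mtx` from above, `mtx.ravel()` should yield the same four values in the order true negative, false positive, false negative, true positive.

```python
# Counts copied from the confusion matrix above, as a plain nested list.
# Rows are observed classes, columns are predicted classes,
# with the negative class listed first.
mtx_rows = [[893, 98],
            [194, 224]]

# Unpack row by row: the first row is observed -, the second is observed +.
(true_negative, false_positive), (false_negative, true_positive) = mtx_rows

total = true_negative + false_positive + false_negative + true_positive
print(true_negative, false_positive, false_negative, true_positive, total)
```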


In [123]:
ACC = (true_positive+true_negative)/(true_positive+false_positive+false_negative+true_negative)

In [124]:
ACC

0.7927608232789212
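
An accuracy figure is easier to judge against a simple reference point. As a rough sketch, using the counts from the confusion matrix above, the code below compares our accuracy with that of a hypothetical model that always predicts the negative class. The name `baseline_accuracy` is an assumption for illustration.

```python
# Counts from the confusion matrix above.
true_negative, false_positive = 893, 98
false_negative, true_positive = 194, 224

total = true_negative + false_positive + false_negative + true_positive
observed_negative = true_negative + false_positive

# Accuracy of the fitted model on the validation set.
accuracy = (true_positive + true_negative) / total

# Accuracy of always predicting "negative": every observed negative is correct.
baseline_accuracy = observed_negative / total

print(f"model accuracy:  {accuracy:.4f}")
print(f"always-negative: {baseline_accuracy:.4f}")
```

Since roughly 70% of the validation rows are negative, accuracy alone says little about how the model handles the positive class, which is what precision and recall measure.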

### Working through Precision and Recall

Now let's calculate the precision and recall.  

1. Precision 

Let's start with precision.  Remember that precision is the fraction of observations classified as positive that are actually positive: TP / (TP + FP).

In [125]:
precision = true_positive/(true_positive + false_positive)
precision

0.6956521739130435

2. Recall

Next, let's move to recall.  Recall is the fraction of actual positive events that were classified as positive: TP / (TP + FN).

In [126]:
recall = true_positive/(true_positive + false_negative)
recall

0.5358851674641149
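
The two calculations can be wrapped in one small helper, sketched below. The function name `precision_recall` is hypothetical, and returning `0.0` when a denominator is zero is an assumption made here to avoid a division error, not something the lab prescribes.

```python
def precision_recall(true_positive, false_positive, false_negative):
    """Compute (precision, recall) from counts, using 0.0 when undefined."""
    predicted_positive = true_positive + false_positive   # everything we flagged
    observed_positive = true_positive + false_negative    # everything truly positive
    precision = true_positive / predicted_positive if predicted_positive else 0.0
    recall = true_positive / observed_positive if observed_positive else 0.0
    return precision, recall


precision, recall = precision_recall(true_positive=224,
                                     false_positive=98,
                                     false_negative=194)
print(precision, recall)
```

Note that both metrics share the same numerator, the true positives. Only the denominator changes: predicted positives for precision, observed positives for recall.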

We can also get these calculations from sklearn's `metrics` module.

In [112]:
from sklearn.metrics import precision_score, recall_score

In [127]:
precision_score(y_validate, y_pred_val)

0.6956521739130435

In [128]:
recall_score(y_validate, y_pred_val)

0.5358851674641149

So we can see that while much of what we predicted as positive was actually positive (about 70%), we only captured a little over half (about 54%) of the positive events.