# Precision and Recall Lab

### Introduction

In this lesson, we'll train a classifier and explore different metrics for measuring the classifier's performance.

### Loading our Data

Let's begin by loading our data.

In [1]:
import pandas as pd

df = pd.read_csv('./coerced_customer_churn.csv', index_col = 0)

Now let's take a look at our data.

In [2]:
df[:2]

Unnamed: 0,SeniorCitizen,tenure,MonthlyCharges,gender,Partner,Dependents,PhoneService,MultipleLines_x0_No phone service,MultipleLines_x0_Yes,InternetService_x0_Fiber optic,...,StreamingMovies_x0_Yes,Contract_x0_One year,Contract_x0_Two year,PaperlessBilling,PaymentMethod_x0_Credit card (automatic),PaymentMethod_x0_Electronic check,PaymentMethod_x0_Mailed check,Churn,TotalCharges,TotalCharges_is_na
0,0,1,29.85,0.0,1.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,29.85,False
1,0,34,56.95,1.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1889.5,False


As we can see, our data has already been formatted so that we can train a model.  Let's get to it.  Assign every column except `Churn` to the variable `X`, and assign `Churn` to `y` as the target.

In [4]:
X = df.drop('Churn', axis = 1)
y = df['Churn']

Now let's split the data into training, validation, and test sets.

In [11]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size = .4,
                                                    random_state = 1)
X_validate, X_test, y_validate, y_test = train_test_split(X_test, y_test, 
                                                          test_size = .5, 
                                                          random_state = 1)
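
The two calls above carve the data into roughly 60% training, 20% validation, and 20% test. A minimal sketch of that arithmetic, using only the two `test_size` values from the cell above (the variable names are ours, for illustration):

```python
# Fractions passed to the two train_test_split calls above.
first_test_size = 0.4   # share of all rows held out by the first split
second_test_size = 0.5  # share of the held-out rows that become the final test set

train_frac = 1 - first_test_size
validate_frac = first_test_size * (1 - second_test_size)
test_frac = first_test_size * second_test_size

print(round(train_frac, 2), round(validate_frac, 2), round(test_frac, 2))
```

The exact row counts in each set also depend on rounding, so they may differ from these fractions by a row.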

Now let's fit our logistic regression model, setting the solver to `lbfgs` and the `random_state` to 1.  Then check the accuracy on the validation set.

In [53]:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(solver="lbfgs",
                           random_state = 1).fit(X_train, y_train)
model.score(X_validate, y_validate)

0.8097941802696949

### Breaking down our error

Now let's see a confusion matrix with the validation data.  Use the model to predict on the validation data.

In [19]:
from sklearn.metrics import confusion_matrix

In [20]:
y_pred_val = model.predict(X_validate)

In [21]:
mtx = confusion_matrix(y_validate.values, y_pred_val)

In [26]:
import pandas as pd
conf_mtx_df = pd.DataFrame(mtx, index = ['observed -', 'observed +'],
                           columns = ['predicted -', 'predicted +'])
conf_mtx_df.iloc[::-1, ::-1].T

| | observed + | observed - |
|---|---|---|
| predicted + | 199 | 106 |
| predicted - | 162 | 942 |


Now let's break down the confusion matrix.  From here, assign the true negatives (`TN`), true positives (`TP`), false positives (`FP`), and false negatives (`FN`) below.

In [29]:
TN = 942
TP = 199
FP = 106
FN = 162
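
Rather than copying the numbers by hand, the four counts can be unpacked from the matrix itself. scikit-learn lays out a binary confusion matrix with observed classes as rows and predicted classes as columns, so the order is `[[TN, FP], [FN, TP]]`. A minimal sketch, using a plain nested list as a stand-in for `mtx` with the values shown above:

```python
# Stand-in for `mtx`: rows are observed (-, +), columns are predicted (-, +).
mtx_values = [[942, 106],
              [162, 199]]

# Unpack row by row: first row is observed negatives, second is observed positives.
(TN, FP), (FN, TP) = mtx_values
print(TN, FP, FN, TP)
```

With the actual NumPy array, `TN, FP, FN, TP = mtx.ravel()` does the same thing.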

Use the four variables declared in the cell above to calculate the accuracy.  It should line up with the score we saw above.

In [28]:
accuracy = (TP + TN)/(TP + TN + FP + FN)
accuracy
# 0.8097941802696949

0.8097941802696949

### Working through Precision and Recall

Now let's calculate the precision and recall.  

1. Precision 

Let's start with precision.  Remember that precision is the percentage of observations classified as positive that are actually positive.  

> Use the variables above to calculate the precision.

In [30]:
precision = TP/(TP + FP)
precision

0.6524590163934426

> Then import `precision_score` from `sklearn.metrics` and check that you get the same number.

In [31]:
from sklearn.metrics import precision_score

In [33]:
precision_score(y_validate, y_pred_val)
# 0.6524590163934426

0.6524590163934426

So we can see that roughly one third of what our classifier flags is a false positive.  In other words, about one third of the customers our classifier predicts will churn do not.

2. Recall

Next, let's move to recall.  Remember that recall is the percentage of observed positive events that were classified as positive. 

In [35]:
recall = TP/(TP + FN)
recall

0.5512465373961218

Now, import `recall_score` from `sklearn.metrics` and check that we get a matching recall.

In [36]:
from sklearn.metrics import recall_score

In [37]:
recall_score(y_validate, y_pred_val)

0.5512465373961218

So we can see that about 45% of the customers who actually churned (162 of 361) are not captured by our classifier.

### Balancing Data 

In [52]:
model_balanced = LogisticRegression(solver="lbfgs",
                           random_state = 1, class_weight='balanced').fit(X_train, y_train)
model_balanced.score(X_validate, y_validate)

0.7643718949609652

Notice that our score decreases from `0.8097941802696949`.

In [42]:
# Predict with the balanced model (not the original `model`)
y_val_predict = model_balanced.predict(X_validate)

In [49]:
mtx_balanced = confusion_matrix(y_validate.values, y_val_predict)

In [50]:
import pandas as pd
conf_mtx_df_balanced = pd.DataFrame(mtx_balanced, index = ['observed -', 'observed +'],
                           columns = ['predicted -', 'predicted +'])
conf_mtx_df_balanced.iloc[::-1, ::-1].T

| | observed + | observed - |
|---|---|---|
| predicted + | 286 | 257 |
| predicted - | 75 | 791 |


In [51]:
conf_mtx_df.iloc[::-1, ::-1].T

| | observed + | observed - |
|---|---|---|
| predicted + | 199 | 106 |
| predicted - | 162 | 942 |


We can see that using a `class_weight` of `balanced` makes our model better at predicting true positives (286 versus 199), but worse at predicting true negatives (791 versus 942).  It suffers from more false positives as well (257 versus 106), while false negatives decreased (75 versus 162).
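
To see why the balanced model leans toward predicting churn, it helps to look at how the weights are derived. Per the scikit-learn documentation, `class_weight='balanced'` weights each class by `n_samples / (n_classes * count_of_that_class)`, so the rarer class counts for more in the loss. The sketch below applies that formula to the validation-set class counts from the confusion matrix (1,048 non-churners and 361 churners) purely for illustration; the model itself derives its weights from `y_train`, whose counts are not shown in this lab. The helper name `balanced_weights` is ours.

```python
from collections import Counter

def balanced_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Illustrative labels only: validation-set class counts, not the training set.
labels = [0.0] * 1048 + [1.0] * 361
weights = balanced_weights(labels)
print(weights)
```

With these counts, a churner is weighted about 2.9 times as heavily as a non-churner, which pushes the model to trade some false positives for fewer false negatives.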

Let's take a look at the precision and recall scores.

In [41]:
precision_score(y_validate, y_val_predict), recall_score(y_validate, y_val_predict)

(0.5267034990791897, 0.7922437673130194)

In [None]:
# previous precision score 0.6524590163934426
# previous recall score 0.5512465373961218

We can see that by changing to balanced, the recall of the classifier greatly increased (from about 0.55 to 0.79), but the precision score decreased (from about 0.65 to 0.53).

In [54]:
from sklearn.metrics import f1_score

In [55]:
f1_score(y_validate, model.predict(X_validate))

0.5975975975975976

In [57]:
f1_score(y_validate, model_balanced.predict(X_validate))

0.6327433628318584

Because the F1 score is the harmonic mean of precision and recall, it rewards a model that balances the two.  By this measure, the balanced model performed better.
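
As a check on the two scores above, F1 can be recomputed by hand as $F_1 = \frac{2 \cdot precision \cdot recall}{precision + recall}$, which is equivalent to $\frac{2TP}{2TP + FP + FN}$. The sketch below does this from the confusion-matrix counts shown earlier; the helper name `f1_from_counts` is ours, not part of scikit-learn.

```python
def f1_from_counts(tp, fp, fn):
    """Harmonic mean of precision and recall, computed from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Counts taken from the two confusion matrices above.
f1_default = f1_from_counts(tp=199, fp=106, fn=162)
f1_balanced = f1_from_counts(tp=286, fp=257, fn=75)
print(round(f1_default, 4), round(f1_balanced, 4))
```

Note that true negatives never enter the calculation, so F1 focuses entirely on how well the positive (churn) class is handled.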

### Summary

In this lesson, we practiced calculating the precision and recall scores, and compared two logistic regression models: one trained with the default class weights, and one trained with `class_weight='balanced'`.

We applied the formulas of:

* $precision = \frac{TP}{TP + FP}$
* $recall = \frac{TP}{TP + FN}$

We also computed the accuracy score as the total correctly classified (TP + TN) divided by all of the observations (TP + TN + FP + FN).

### Resources

[Class weight](https://stackoverflow.com/questions/30972029/how-does-the-class-weight-parameter-in-scikit-learn-work)