# Precision and Recall Lab

### Introduction

In this lesson, we'll train a classifier and explore different metrics for measuring its performance.  We'll do so by looking at customer churn data from a telecommunications company.

### Loading our Data

Let's begin by loading our data.

In [7]:
import pandas as pd
url = "https://raw.githubusercontent.com/jigsawlabs-student/logistic-regression/master/0-classification-fundamentals/3-metrics-for-classification/coerced_customer_churn.csv"
df = pd.read_csv(url, index_col = 0)

Now let's take a look at our data.

In [8]:
df[:2]

Unnamed: 0,SeniorCitizen,tenure,MonthlyCharges,gender,Partner,Dependents,PhoneService,MultipleLines_x0_No phone service,MultipleLines_x0_Yes,InternetService_x0_Fiber optic,...,StreamingMovies_x0_Yes,Contract_x0_One year,Contract_x0_Two year,PaperlessBilling,PaymentMethod_x0_Credit card (automatic),PaymentMethod_x0_Electronic check,PaymentMethod_x0_Mailed check,Churn,TotalCharges,TotalCharges_is_na
0,0,1,29.85,0.0,1.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,29.85,False
1,0,34,56.95,1.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1889.5,False


As we can see, our data has already been formatted so that we can train a model.  Let's get to it.  Assign every column except `Churn` to the variable `X`, and assign `Churn` to `y` as the target.
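As a rough illustration of separating features from a target, here is a minimal sketch on a tiny made-up DataFrame. The names `toy_df`, `toy_X`, and `toy_y` and the values are assumptions for illustration only, not the lab's data.

```python
import pandas as pd

# Toy stand-in for the churn data; values are made up for illustration.
toy_df = pd.DataFrame({
    "tenure": [1, 34, 2, 45],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "Churn": [0.0, 0.0, 1.0, 0.0],
})

# Every column except the target becomes the feature matrix.
toy_X = toy_df.drop(columns=["Churn"])
# The target is the single `Churn` column.
toy_y = toy_df["Churn"]

print(toy_X.shape, toy_y.shape)  # (4, 2) (4,)
```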

In [9]:
X = None
y = None

In [10]:
X.shape, y.shape

# ((7043, 31), (7043,))

((7043, 31), (7043,))

Now let's split the data into training, validation, and test sets.  Below, we first split the full dataset into training and test datasets.  Now it's your turn to split that test dataset in half into validation and test datasets.

> Make sure you stratify the data by the `y_test` data.  Set `random_state = 1`.
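To see what stratifying a second split does, here is a minimal sketch on synthetic held-out data. The arrays and the `*_demo` names are hypothetical; only `train_test_split` and its `test_size`, `random_state`, and `stratify` parameters come from scikit-learn.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic held-out data: 100 rows, 30 positives and 70 negatives.
X_holdout = np.arange(200).reshape(100, 2)
y_holdout = np.array([1] * 30 + [0] * 70)

# Split the held-out rows in half, keeping the class ratio in each half.
X_val_demo, X_test_demo, y_val_demo, y_test_demo = train_test_split(
    X_holdout, y_holdout, test_size=0.5, random_state=1, stratify=y_holdout
)

# Both halves keep the 30% positive rate of the data they came from.
print(y_val_demo.mean(), y_test_demo.mean())
```

Stratifying matters most when one class is rare, since an unstratified split can leave one half with noticeably fewer positives than the other.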

In [68]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size = .4,
                                                    random_state = 1, stratify = y)
X_validate, X_test, y_validate, y_test = None, None, None, None

In [17]:
X_validate[:2]

Unnamed: 0,SeniorCitizen,tenure,MonthlyCharges,gender,Partner,Dependents,PhoneService,MultipleLines_x0_No phone service,MultipleLines_x0_Yes,InternetService_x0_Fiber optic,...,StreamingMovies_x0_No internet service,StreamingMovies_x0_Yes,Contract_x0_One year,Contract_x0_Two year,PaperlessBilling,PaymentMethod_x0_Credit card (automatic),PaymentMethod_x0_Electronic check,PaymentMethod_x0_Mailed check,TotalCharges,TotalCharges_is_na
2105,0,3,75.25,0.0,0.0,0.0,1.0,0.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,242.0,False
3336,0,55,20.3,0.0,1.0,1.0,1.0,0.0,0.0,0.0,...,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1079.05,False


Now let's fit our logistic regression model.  Set the solver to `lbfgs` and the `random_state` to 1, then check the accuracy on the validation set.
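The fit-then-score pattern can be sketched on synthetic data as follows. The dataset from `make_classification` and the names `demo_model`, `X_tr`, `X_val`, and `val_accuracy` are assumptions for illustration, so the score will not match the lab's.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data (roughly 25% positives).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=5, weights=[0.75, 0.25], random_state=1
)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=1, stratify=y_demo
)

# Fit on the training rows only, then score on the held-out rows.
demo_model = LogisticRegression(solver="lbfgs", random_state=1)
demo_model.fit(X_tr, y_tr)

# For classifiers, `score` returns the accuracy of the predictions.
val_accuracy = demo_model.score(X_val, y_val)
print(val_accuracy)
```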

In [20]:
from sklearn.linear_model import LogisticRegression
model = None


# 0.8062455642299503

0.8062455642299503

### Breaking down our error

Next let's use the validation data to create a confusion matrix.  Use the `confusion_matrix` function from `sklearn.metrics`.   

> We've indicated a breakdown below for you to check your work.
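One detail worth keeping in mind: `confusion_matrix` puts observed labels on the rows and predicted labels on the columns, with the labels in sorted order, so for 0/1 labels it returns `[[TN, FP], [FN, TP]]`. The sketch below, on made-up labels with hypothetical `*_demo` names, shows one way to rearrange that into a layout with predictions on the rows and the positive class first.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Made-up observed and predicted labels for illustration.
y_true_demo = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred_demo = [1, 0, 1, 0, 1, 0, 0, 0]

# Rows are observed labels, columns are predicted labels: [[TN, FP], [FN, TP]].
mtx_demo = confusion_matrix(y_true_demo, y_pred_demo)
tn, fp, fn, tp = mtx_demo.ravel()
print(tn, fp, fn, tp)  # 3 1 2 2

# Rearranged so rows are predictions and the positive class comes first.
layout_demo = pd.DataFrame(
    [[tp, fp], [fn, tn]],
    index=["predicted +", "predicted -"],
    columns=["observed +", "observed -"],
)
print(layout_demo)
```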

In [30]:
from sklearn.metrics import confusion_matrix

In [33]:
import pandas as pd
conf_mtx_df = None


conf_mtx_df
# 	observed +	observed -
# predicted +	197	96
# predicted -	177	939

| | observed + | observed - |
|---|---|---|
| predicted + | 197 | 96 |
| predicted - | 177 | 939 |


Now let's break down the confusion matrix.  From it, assign the true positives, true negatives, false positives, and false negatives to `TP`, `TN`, `FP`, and `FN` below.

In [34]:
TP = 197
TN = 939
FP = 96
FN = 177

Use the four variables declared in the cell above to calculate the accuracy.  It should line up with the score we saw above.

In [36]:
accuracy = None
accuracy
# 0.8062455642299503

0.8062455642299503

Let's also check that the total number of positive observations and negative observations line up with what we see in our confusion matrix.  Use the variables to make the correct calculations.

In [37]:
total_positives = None

total_positives

374

In [38]:
y_validate.sum()

374.0

In [39]:
total_negatives = None
total_negatives

1035

In [40]:
(y_validate == 0).sum()

1035

### Working through Precision and Recall

Now let's calculate the precision and recall.  

1. Precision 

Let's start with precision.  Remember that precision is **the percentage of observations our classifier predicts as positive** that are actually positive.

> Use the variables above to calculate the precision.

In [41]:
precision = None
precision
# 0.6723549488054608

0.6723549488054608

> Then import `precision_score` from `sklearn.metrics` and check that you get the same number.

In [44]:

# 0.6723549488054608

0.6723549488054608

So we can see that roughly one third of what our classifier flags is a false positive.  In other words, about one third of the customers our classifier predicts will churn do not.

2. Recall

Next, let's move to recall.  Remember that recall is the **percentage of observed positive events** that were classified as positive. 

In [35]:
recall = None
recall

# 0.5267379679144385

0.5267379679144385

Now import `recall_score` from `sklearn.metrics` and check that we get a matching recall.

In [46]:


# 0.5267379679144385

0.5267379679144385

So we can see that many churned customers are not captured by our classifier: with a recall of about 0.53, it misses nearly half of them.
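As a small check of how the scikit-learn helpers relate to the count-based formulas, here is a sketch on made-up labels (the `*_demo` names and values are assumptions for illustration). Both functions take the observed labels first and the predicted labels second; swapping the arguments would swap precision and recall.

```python
from sklearn.metrics import precision_score, recall_score

# Made-up labels: 2 true positives, 1 false positive, 2 false negatives.
y_true_demo = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred_demo = [1, 0, 1, 0, 1, 0, 0, 0]

# Precision = TP / (TP + FP) = 2 / (2 + 1)
precision_demo = precision_score(y_true_demo, y_pred_demo)
# Recall = TP / (TP + FN) = 2 / (2 + 2)
recall_demo = recall_score(y_true_demo, y_pred_demo)

print(round(precision_demo, 4), recall_demo)  # 0.6667 0.5
```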

### Balancing Data 

Let's see if we can perform a little better by balancing our data. Currently, we have an imbalanced dataset. 

In [47]:
y_train.mean()

# 0.26532544378698225

0.26532544378698225

This means that during training, the model is optimized to perform better on the negative observations than the positive ones, as there are almost three times as many negative observations.  We can alter this by setting `class_weight` to `balanced` when initializing the `LogisticRegression` model.

As explained in the documentation:

```text 
The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``
```

So the fewer observations a class has, the larger the weight the estimator applies to the cost of misclassifying each of its observations.
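The formula quoted from the documentation can be worked through by hand on a tiny example. The sketch below uses a made-up target `y_toy` (an assumption for illustration) and compares the manual result with scikit-learn's `compute_class_weight` helper.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Made-up target: 3 negatives and 1 positive.
y_toy = np.array([0, 0, 0, 1])

# n_samples / (n_classes * np.bincount(y))
n_samples, n_classes = len(y_toy), 2
manual_weights = n_samples / (n_classes * np.bincount(y_toy))
# Class 0: 4 / (2 * 3) = 0.667, class 1: 4 / (2 * 1) = 2.0
print(manual_weights)

# scikit-learn's helper applies the same formula.
sklearn_weights = compute_class_weight(
    "balanced", classes=np.array([0, 1]), y=y_toy
)
print(sklearn_weights)
```

Here the rare positive class receives three times the weight of the negative class, which mirrors the 3-to-1 imbalance in the toy target.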

> Fit a logistic regression model with `class_weight = 'balanced'`, `solver = 'lbfgs'`, and `random_state = 1`.  Score the `balanced_model` on the validation set.

In [69]:
balanced_model = None

# 0.7359829666430092

> Notice that the score decreases from `0.8062455642299503` previously to `0.7359829666430092`.  This is expected: accuracy measures the share of all observations classified correctly, while our balanced classifier gives more weight to fitting the positive events, at the expense of the more numerous negative ones.

Now let's see how our balanced model changes the way it classifies the positive events.  Begin by creating a confusion matrix for the balanced model.

In [57]:
mtx_balanced = None

In [58]:
import pandas as pd
conf_mtx_df_balanced = None


# 	observed +	observed -
# predicted +	286	284
# predicted -	88	751

| | observed + | observed - |
|---|---|---|
| predicted + | 286 | 284 |
| predicted - | 88 | 751 |


Now let's compare this to the original confusion matrix.

In [59]:
conf_mtx_df.iloc[::-1, ::-1].T

| | observed + | observed - |
|---|---|---|
| predicted + | 197 | 96 |
| predicted - | 177 | 939 |


We can see that with a `class_weight` of `balanced`, our model performs better at identifying true positives, but worse at identifying true negatives.

Let's take a look at the precision and recall scores.

In [62]:
precision_balanced = None
precision_balanced
# 0.5017543859649123

0.5017543859649123

In [63]:
recall_balanced = recall_score(y_validate, y_val_predict_balanced)

recall_balanced

# 0.7647058823529411

0.7647058823529411

In [None]:
# previous precision score 0.6723549488054608
# previous recall score 0.5267379679144385

We can see that by changing to balanced, the recall of the classifier greatly increased, but the precision score decreased.

### Using the F1 score

Finally, we can use the `f1_score` from `sklearn.metrics` to compute the harmonic mean of the precision and recall scores.  We'll use this metric to compare our two models.
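To make the harmonic mean concrete, here is a minimal sketch that computes F1 directly from confusion-matrix counts. The helper `f1_from_counts` is hypothetical and not part of scikit-learn; the counts are the ones reported in the two confusion matrices earlier in this lab.

```python
def f1_from_counts(tp, fp, fn):
    """Harmonic mean of precision and recall, computed from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Counts from the original model's confusion matrix.
f1_original = f1_from_counts(tp=197, fp=96, fn=177)
# Counts from the balanced model's confusion matrix.
f1_balanced = f1_from_counts(tp=286, fp=284, fn=88)

print(round(f1_original, 4), round(f1_balanced, 4))  # 0.5907 0.6059
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot score well on F1 by excelling at only one of precision or recall.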

In [64]:
from sklearn.metrics import f1_score

In [65]:
f1_score(y_validate, model.predict(X_validate))
# 0.5907046476761618

0.5907046476761618

> Check the f1_score on the balanced model.

In [67]:

# 0.6059322033898306

0.6059322033898306

We can see that, taking the harmonic mean of precision and recall, the balanced model performed slightly better.

### Summary

In this lesson, we practiced calculating the precision and recall scores, and compared two logistic regression models: one trained with the default class weights and one trained with `class_weight` set to `balanced`.

We applied the formulas of:

* $precision = \frac{TP}{TP + FP}$
* $recall = \frac{TP}{TP + FN}$

We also computed the accuracy score as the total correctly classified (TP + TN) divided by all of the observations (TP + TN + FP + FN).
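As a recap, the three formulas can be applied to the counts from the original model's confusion matrix on the validation set. This is a minimal sketch using plain arithmetic, with the counts copied from that matrix.

```python
# Counts from the original model's confusion matrix.
TP, TN, FP, FN = 197, 939, 96, 177

# Share of all observations classified correctly.
accuracy = (TP + TN) / (TP + TN + FP + FN)
# Share of predicted positives that are actually positive.
precision = TP / (TP + FP)
# Share of observed positives that were predicted positive.
recall = TP / (TP + FN)

print(round(accuracy, 4), round(precision, 4), round(recall, 4))
# 0.8062 0.6724 0.5267
```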

### Resources

[Class weight](https://stackoverflow.com/questions/30972029/how-does-the-class-weight-parameter-in-scikit-learn-work)