# Testing the Model

Recall that at the outset of this modeling exercise, we set aside a test set for evaluating our final model. We are now ready to use it.

## Importing Packages

In [1]:
import joblib
import pandas as pd

## Feature Selector

In [2]:
from sklearn.base import BaseEstimator, TransformerMixin

class FeatureSelector(BaseEstimator, TransformerMixin):
    def __init__(self, columns):
        self.columns = columns
        
    def fit(self, X, y=None):
        return self
    
    def transform(self, X, y=None):
        return X[self.columns]
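The class above is redefined here because `joblib.load` relies on pickle, which looks classes up by name. A custom class must therefore be defined (or importable) before loading any object that contains it. The following is a minimal sketch of how such a selector could sit at the front of a scikit-learn `Pipeline`. It assumes the saved model is built along these lines, which the notebook does not confirm. The toy DataFrame, the column choice, and the `LogisticRegression` step are all made up for illustration.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class FeatureSelector(BaseEstimator, TransformerMixin):
    """Keep only the named columns of a DataFrame."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        return X[self.columns]


# Made-up toy data (not taken from the Lending Club files).
df_toy = pd.DataFrame({
    "funded_amnt": [1000.0, 2000.0, 3000.0, 4000.0, 5000.0, 6000.0],
    "dti": [5.0, 30.0, 8.0, 28.0, 6.0, 35.0],
    "addr_state": ["WA", "OH", "TN", "MO", "MA", "CA"],
    "charged_off": [False, True, False, True, False, True],
})

# Hypothetical pipeline: column selection followed by a classifier.
pipe = Pipeline([
    ("select", FeatureSelector(columns=["funded_amnt", "dti"])),
    ("clf", LogisticRegression(max_iter=1000)),
])

# The whole DataFrame goes in, including unused columns and the target;
# the selector passes only the requested columns to the classifier.
pipe.fit(df_toy, df_toy["charged_off"])
preds = pipe.predict(df_toy)

selected = pipe.named_steps["select"].transform(df_toy)
print(list(selected.columns))
print(preds.shape)
```

Because the selection happens inside the pipeline, extra columns in the input are simply ignored at prediction time.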

## Reading In Our Fitted Model Object

In [3]:
model_saved = joblib.load("pickle/lending_club_model.pkl")

## Reading In Our Test Set

In [4]:
df_test = pd.read_csv("data_processed/02_binary_testing.csv")
df_test

Unnamed: 0,funded_amnt,addr_state,annual_inc,application_type,dti,earliest_cr_line,emp_length,emp_title,fico_range_high,fico_range_low,...,zip_code,last_pymnt_amnt,num_actv_rev_tl,mo_sin_rcnt_rev_tl_op,mo_sin_old_rev_tl_op,bc_util,bc_open_to_buy,avg_cur_bal,acc_open_past_24mths,charged_off
0,8000.0,WA,36000.0,Individual,1.33,Jan-2005,4 years,Video Editor,774.0,770.0,...,980xx,279.63,2.0,0.0,115.0,23.6,2291.0,142.0,2.0,False
1,10000.0,OH,65000.0,Individual,20.53,Dec-2006,7 years,Emerson Network Power,679.0,675.0,...,430xx,1027.64,7.0,6.0,71.0,87.3,1165.0,12397.0,6.0,False
2,11075.0,TN,34800.0,Individual,17.66,Oct-2003,10+ years,Paraplanner,664.0,660.0,...,384xx,8528.03,8.0,4.0,157.0,91.7,158.0,1052.0,9.0,False
3,15000.0,MO,95000.0,Individual,21.54,Sep-2007,3 years,Analyst,664.0,660.0,...,633xx,12392.94,10.0,20.0,110.0,97.0,4728.0,19138.0,6.0,False
4,3000.0,MA,57600.0,Individual,3.65,Jan-1989,,,674.0,670.0,...,018xx,101.07,,,,,,,,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
403209,6000.0,WA,18000.0,Individual,29.27,Nov-1999,,,689.0,685.0,...,984xx,5371.78,8.0,5.0,211.0,55.5,2890.0,1173.0,4.0,False
403210,22600.0,CA,60000.0,Individual,33.64,Dec-2002,2 years,Account Manager Team Lead,679.0,675.0,...,928xx,663.21,10.0,24.0,134.0,74.3,5350.0,4909.0,3.0,True
403211,24000.0,UT,106000.0,Individual,11.34,Mar-2007,8 years,Assistant Sales Manager,669.0,665.0,...,847xx,19840.69,6.0,7.0,110.0,63.9,2092.0,5141.0,3.0,False
403212,21000.0,FL,88000.0,Individual,21.25,Jan-1999,1 year,Parts Manager,679.0,675.0,...,322xx,7103.35,4.0,4.0,205.0,11.0,3750.0,60199.0,6.0,False


## Model Accuracy with 0.5 Threshold

Notice that the test data needs no preprocessing. All of the preprocessing steps are built into the fitted model object stored in the pickle file, so we can pass the raw DataFrame directly to predict.

In [5]:
from sklearn.metrics import f1_score, accuracy_score

# predict once and reuse; scikit-learn metrics take (y_true, y_pred)
y_pred = model_saved.predict(df_test)
print(accuracy_score(df_test["charged_off"], y_pred))
print(f1_score(df_test["charged_off"], y_pred))

0.8829256920642636
0.6820031256736365
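The gap between the two scores comes from how they are defined: accuracy counts true negatives as successes, while F1 depends only on true positives, false positives, and false negatives. The sketch below works through both formulas on a small set of made-up labels, which are hypothetical and unrelated to the test set above.

```python
import numpy as np

# Hypothetical labels and predictions (1 = charged off).
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

# Confusion-matrix counts.
tp = int(np.sum((y_true == 1) & (y_pred == 1)))
fp = int(np.sum((y_true == 0) & (y_pred == 1)))
fn = int(np.sum((y_true == 1) & (y_pred == 0)))
tn = int(np.sum((y_true == 0) & (y_pred == 0)))

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy = {accuracy:.3f}")  # 0.700
print(f"f1       = {f1:.3f}")        # 0.571
```

Here the five true negatives lift accuracy to 0.7, but they do not enter the F1 calculation at all, so F1 stays lower.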


## Using Optimal Threshold

Recall from our previous work that the optimal threshold is 0.48. Let's see how that threshold affects accuracy.

In [6]:
# apply threshold to positive probabilities to create labels
def to_inference(pos_probs, threshold):
    return (pos_probs >= threshold).astype('int')
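To make the effect of the threshold concrete, here is a small sketch that applies `to_inference` to made-up probabilities and then scans a grid of candidate thresholds for the one with the highest accuracy. The data, the grid, and the `accuracy_at` helper are illustrative assumptions and not the procedure used earlier in this project. In practice such a search would normally be run on validation data, not on the test set.

```python
import numpy as np


def to_inference(pos_probs, threshold):
    """Turn positive-class probabilities into 0/1 labels."""
    return (pos_probs >= threshold).astype("int")


# Hypothetical true labels and predicted positive-class probabilities.
y_true = np.array([0, 0, 1, 0, 1, 1])
pos_probs = np.array([0.10, 0.30, 0.48, 0.42, 0.60, 0.90])


def accuracy_at(threshold):
    """Fraction of labels predicted correctly at the given threshold."""
    return float(np.mean(to_inference(pos_probs, threshold) == y_true))


# The 0.48 case is labeled negative at 0.50 but positive at 0.47.
print(to_inference(pos_probs, 0.50), accuracy_at(0.50))
print(to_inference(pos_probs, 0.47), accuracy_at(0.47))

# Scan thresholds 0.05, 0.10, ..., 0.95 and keep the most accurate one.
thresholds = np.linspace(0.05, 0.95, 19)
scores = [accuracy_at(t) for t in thresholds]
best = thresholds[int(np.argmax(scores))]
print(f"best threshold on this toy data: {best:.2f}")
```

Lowering the threshold labels more borderline cases as positive, which helps when those cases are truly positive and hurts otherwise.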

Accuracy improves very slightly, from 0.8829 at the default 0.5 threshold to 0.8834.

In [7]:
accuracy_score(df_test["charged_off"], to_inference(model_saved.predict_proba(df_test)[:,1], 0.47))

0.8833572246995393