<h1> Introduction </h1>

<p> This notebook uses TensorFlow to build a neural network that predicts the likelihood of loan default, and visualizes some of the insights from the study. The kernel will evolve over time as I continue to add features and study the Lending Club data. </p>

<h3> Dependencies </h3>

<p> The cell below imports the external libraries and loads the data. </p>

In [None]:
%matplotlib inline
# __future__ imports come before all other imports
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import itertools
import numpy as np
import pandas as pd
from sklearn import preprocessing
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import tensorflow as tf
from tensorflow.contrib.learn.python.learn import metric_spec
from tensorflow.contrib.learn.python.learn.estimators import _sklearn
from tensorflow.contrib.learn.python.learn.estimators import estimator
from tensorflow.contrib.learn.python.learn.estimators import model_fn
from tensorflow.python.framework import ops
from tensorflow.python.saved_model import loader
from tensorflow.python.saved_model import tag_constants
from tensorflow.python.util import compat
tf.logging.set_verbosity(tf.logging.FATAL) 
df = pd.read_csv("../input/loan.csv", low_memory=False)

<h3> Creating the Target Label </h3>

<p> In a prior notebook, I examined the 'loan_status' column. The cell below creates a column with binary value 0 (False) for loans not in default, and binary value 1 (True) for loans in default. </p>

In [None]:
df['Default_Binary'] = False
df.Default_Binary = df.loan_status.isin([
    'Default',
    'Charged Off',
    'Late (31-120 days)',
    'Late (16-30 days)',
    'Does not meet the credit policy. Status:Charged Off'
])
df[['loan_status','Default_Binary']].head()
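
<p> As a rough sketch of the labelling step, the snippet below applies the same <code>isin</code> check to a small made-up frame and reports the share of rows flagged as default. The <code>toy</code> frame and its five rows are invented for illustration and are not taken from the Lending Club data. </p>

```python
import pandas as pd

# Toy stand-in for the loan_status column (hypothetical rows).
toy = pd.DataFrame({
    "loan_status": ["Current", "Charged Off", "Fully Paid",
                    "Default", "Late (16-30 days)"]
})

# Statuses treated as default, mirroring the list used in the cell above.
default_statuses = [
    "Default",
    "Charged Off",
    "Late (31-120 days)",
    "Late (16-30 days)",
    "Does not meet the credit policy. Status:Charged Off",
]

# isin returns True where the status appears in the list, False otherwise.
toy["Default_Binary"] = toy["loan_status"].isin(default_statuses)

# The mean of a boolean column is the fraction of True values.
default_rate = toy["Default_Binary"].mean()
print(toy["Default_Binary"].value_counts())
print("Default rate: {:.2f}".format(default_rate))
```

<p> Running the same two summary lines on the real column is a quick way to see how imbalanced the two classes are before training. </p>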

<h3> Creating a category feature for "Loan Purpose" </h3>

<p> Below I create a new column for loan purpose, and assign each type of loan purpose an integer value. </p>

In [None]:
# Map each loan purpose to an integer code; any purpose not listed gets 0.
purpose_codes = {
    'debt_consolidation': 1,
    'credit_card': 2,
    'home_improvement': 3,
    'other': 4,
    'major_purchase': 5,
    'small_business': 6,
    'car': 7,
    'medical': 8,
    'moving': 9,
    'vacation': 10,
    'house': 11,
    'wedding': 12,
    'renewable_energy': 13,
    'educational': 14,
}
# A vectorized lookup replaces the row-by-row loop and the deprecated set_value calls.
df['Purpose_Cat'] = df.purpose.map(purpose_codes).fillna(0).astype(int)

In [None]:
# Correlation of each numeric column with the default label.
# Non-numeric columns are skipped: a Pearson correlation is undefined for strings.
label = df['Default_Binary'].astype(float)
x_cols = df.select_dtypes(include=[np.number]).columns

labels = []
values = []
for col in x_cols:
    labels.append(col)
    # Series.corr drops rows where either value is missing
    values.append(df[col].corr(label))

ind = np.arange(len(labels))
width = 0.9
fig, ax = plt.subplots(figsize=(12,40))
rects = ax.barh(ind, np.array(values), height=width, color='y')
ax.set_yticks(ind)
ax.set_yticklabels(labels, rotation='horizontal')
ax.set_xlabel("Correlation coefficient")
ax.set_title("Correlation with Default_Binary")
plt.show()
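
<p> For reference, the coefficient plotted for each column is the Pearson correlation between that column and the 0/1 label. The sketch below computes it two ways on made-up numbers, once with <code>np.corrcoef</code> and once from the definition. The arrays <code>feature</code> and <code>target</code> are hypothetical. </p>

```python
import numpy as np

# Hypothetical numeric feature and binary label (not real loan data).
feature = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
target = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Library route: off-diagonal entry of the 2x2 correlation matrix.
r = np.corrcoef(feature, target)[0, 1]

# Manual route: sum of co-deviations divided by the product of
# the root sums of squared deviations.
fx = feature - feature.mean()
ty = target - target.mean()
manual = (fx * ty).sum() / np.sqrt((fx ** 2).sum() * (ty ** 2).sum())

print(r, manual)
```

<p> A value near 0 means the column carries little linear signal about default on its own. It says nothing about non-linear effects, which the network may still pick up. </p>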

<h3> Scaling Interest Rates </h3>

<p> Below I scale the interest rate for each loan to a value between 0 and 1 </p>

In [None]:
x = df.int_rate.values.reshape(-1, 1)  # MinMaxScaler expects a 2-D array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
df['int_rate_scaled'] = x_scaled.ravel()  # flatten back to one value per row
print(df.int_rate_scaled[0:5])
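
<p> Min-max scaling maps each value x to (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. The sketch below reproduces that arithmetic with plain NumPy. The helper <code>min_max_scale</code> and the sample rates are hypothetical and only illustrate the formula. </p>

```python
import numpy as np

def min_max_scale(values):
    """Scale a 1-D sequence to [0, 1] using (x - min) / (max - min)."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # A constant column has no spread; returning zeros is a
        # convention assumed here to avoid dividing by zero.
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

rates = [5.0, 10.0, 12.5, 25.0]  # hypothetical interest rates in percent
scaled = min_max_scale(rates)
print(scaled)
```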

<h3> Scaling Loan Amount </h3>

<p> Below I scale the amount funded for each loan to a value between 0 and 1 </p>

In [None]:
x = df.funded_amnt.values.reshape(-1, 1)  # MinMaxScaler expects a 2-D array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
df['funded_amnt_scaled'] = x_scaled.ravel()  # flatten back to one value per row
print(df.funded_amnt_scaled[0:5])

<h3> Setting up the Neural Network </h3>

<p> Below I split the data into training, testing, and prediction sets. </p>
<p> After that, I assign the feature and target columns, and create the function that will be used to pass the data into the model. </p>

In [None]:
training_set = df[0:500000] # Train on first 500k rows
testing_set = df[500000:850000] # Test on next 350k rows (slice ends are exclusive)
prediction_set = df[850000:] # Predict on final 37k rows


COLUMNS = ['Purpose_Cat','funded_amnt_scaled','int_rate_scaled','Default_Binary']          
FEATURES = ['Purpose_Cat','funded_amnt_scaled','int_rate_scaled']
LABEL = 'Default_Binary'

def input_fn(data_set):
    feature_cols = {k: tf.constant(data_set[k].values) for k in FEATURES} 
    labels = tf.constant(data_set[LABEL].values)
    return feature_cols, labels

print(input_fn(training_set))
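
<p> Because Python slices exclude their end index, consecutive splits should reuse the previous end as the next start, otherwise a row falls between the sets. The sketch below checks that property on row indices. The helper <code>sequential_split</code> and the sizes used are hypothetical. </p>

```python
import numpy as np

def sequential_split(n_rows, train_end, test_end):
    """Split row indices 0..n_rows-1 into three consecutive blocks."""
    idx = np.arange(n_rows)
    # Each slice starts exactly where the previous one stopped.
    return idx[:train_end], idx[train_end:test_end], idx[test_end:]

train_idx, test_idx, pred_idx = sequential_split(10, 5, 8)

# Together the three blocks should cover every row exactly once.
covered = np.concatenate([train_idx, test_idx, pred_idx])
print(len(train_idx), len(test_idx), len(pred_idx))
print(np.array_equal(covered, np.arange(10)))
```

<p> A sequential split also assumes the rows are in no meaningful order. If the file is sorted by issue date or status, shuffling the indices first may give more representative sets. </p>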

<h3> Fitting The Model </h3>

In [None]:
feature_cols = [tf.contrib.layers.real_valued_column(k)
              for k in FEATURES]
print(feature_cols)
#config = tf.contrib.learn.RunConfig(keep_checkpoint_max=1) ######## DO NOT DELETE
regressor = tf.contrib.learn.DNNClassifier(
  feature_columns=feature_cols, hidden_units=[10, 20, 10], ) 
regressor.fit(input_fn=lambda: input_fn(training_set), steps=300)

<h3> Evaluating the Model </h3>

In [None]:
# Evaluate on the testing set and report the loss
ev = regressor.evaluate(input_fn=lambda: input_fn(testing_set), steps=10)
loss_score = ev["loss"]
print("Loss: {0:f}".format(loss_score))
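
<p> To help interpret the number above, here is a minimal sketch of how a binary classification loss and an accuracy score can be computed from predicted probabilities. It assumes the reported loss is a mean cross-entropy (log loss), which I have not verified against the estimator's internals. The functions and the sample values are hypothetical. </p>

```python
import math

def log_loss(labels, probs, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and P(label = 1)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def accuracy(labels, probs, threshold=0.5):
    """Fraction of rows where the thresholded probability matches the label."""
    hits = sum(int(p >= threshold) == y for y, p in zip(labels, probs))
    return hits / len(labels)

labels = [0, 0, 1, 1]            # hypothetical true classes
probs = [0.1, 0.4, 0.35, 0.8]    # hypothetical predicted P(default)

print("Log loss: {:.4f}".format(log_loss(labels, probs)))
print("Accuracy: {:.2f}".format(accuracy(labels, probs)))
```

<p> With few defaults in the data, a model that always predicts "no default" can score a high accuracy, so the loss is usually the more informative of the two numbers. </p>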

<h3> Visualizing Predicted Default Probability by Loan Purpose </h3>

In [None]:
figure = plt.figure(figsize=(27, 9))

x_min, x_max = df['int_rate_scaled'].min() - .5, df['int_rate_scaled'].max() + .5
y_min, y_max = df['funded_amnt_scaled'].min() - .5, df['funded_amnt_scaled'].max() + .5
xx, yy = np.meshgrid(np.arange(x_min, x_max, .02),
                     np.arange(y_min, y_max, .02))

for i in range(1,15):
    cm = plt.cm.RdBu
    cm_bright = ListedColormap(['#FF0000', '#0000FF'])
    ax = plt.subplot(3, 5, i)
    
    meshset = pd.DataFrame(data=np.c_[xx.ravel(), yy.ravel(),np.full(yy.ravel().shape, i),np.full(yy.ravel().shape, False)],
                           index=np.arange(0,len(xx.ravel())),
                           columns=['int_rate_scaled','funded_amnt_scaled','Purpose_Cat','Default_Binary'])

    Z = regressor.predict_proba(input_fn = lambda: input_fn(meshset))
    Z = np.array(list(Z))[:,1]
    Z = Z.reshape(xx.shape)
    
    ax.contourf(xx, yy, Z, cmap=cm, alpha=.8)
    trainsample = training_set.sample(frac=0.0001).reset_index(drop=True)
    testsample = testing_set.sample(frac=0.0001).reset_index(drop=True)
    X_train=trainsample[['int_rate_scaled','funded_amnt_scaled']]
    y_train=trainsample['Default_Binary'].astype(int)
    X_test=testsample[['int_rate_scaled','funded_amnt_scaled']]
    y_test=testsample['Default_Binary'].astype(int)

    # Plot also the training points
    ax.scatter(X_train.iloc[:,0], X_train.iloc[:,1], c=y_train, cmap=cm_bright)
    # and testing points
    ax.scatter(X_test.iloc[:,0], X_test.iloc[:,1], c=y_test, cmap=cm_bright,
               alpha=0.6)

    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    ax.set_xticks(())
    ax.set_yticks(())

In [None]:

plt.show()
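
<p> The contour plots rely on a common pattern: build a grid with <code>np.meshgrid</code>, flatten it into one row per grid point, score every point, then reshape the scores back to the grid shape. The sketch below walks through that pattern on a tiny grid. The function <code>toy_score</code> is a hypothetical stand-in for the model's predicted probability. </p>

```python
import numpy as np

xs = np.arange(0.0, 1.0, 0.5)   # two x positions: 0.0, 0.5
ys = np.arange(0.0, 1.5, 0.5)   # three y positions: 0.0, 0.5, 1.0
xx, yy = np.meshgrid(xs, ys)    # both arrays have shape (len(ys), len(xs))

# One row per grid point: column 0 is x, column 1 is y.
grid_points = np.c_[xx.ravel(), yy.ravel()]

def toy_score(points):
    # Hypothetical scorer: a sigmoid of (x - y), standing in for predict_proba.
    return 1.0 / (1.0 + np.exp(-(points[:, 0] - points[:, 1])))

# Reshape the flat scores so that Z[i, j] belongs to (xx[i, j], yy[i, j]).
Z = toy_score(grid_points).reshape(xx.shape)
print(xx.shape, grid_points.shape, Z.shape)
```

<p> Because <code>ravel</code> and <code>reshape</code> use the same ordering, each score lands back on the grid cell it was computed for, which is what <code>contourf</code> expects. </p>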

<h3> Visualize Predictions Relative to Loan Size </h3>

In [None]:
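# `predictions` is not defined in an earlier cell; it is assumed here to be the
# predicted probability of default (class 1) for each loan in the prediction set.
predictions = np.array(list(
    regressor.predict_proba(input_fn=lambda: input_fn(prediction_set))))[:, 1]
plt.plot(prediction_set.funded_amnt_scaled, predictions, 'ro')
plt.ylabel("Model Prediction Value")
plt.xlabel("Funded Amount of Loan (Scaled between 0-1)")
plt.show()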

<h3> Visualize Predictions Relative to Loan Purpose </h3>

In [None]:
# Set the figure size before plotting so it applies to this figure
plt.figure(figsize=(8, 8))
plt.plot(prediction_set.Purpose_Cat, predictions, 'ro')
plt.ylabel("Default Prediction Value")
plt.xlabel("Loan Purpose")
plt.title("DNN Classifier Predicting Default By Loan Purpose")
# One label per Purpose_Cat code, 1 through 14
labels = ['Debt Consolidation', 'Credit Card', 'Home Improvement', 'Other',
         'Major Purchase', 'Small Business', 'Car', 'Medical',
         'Moving', 'Vacation', 'House', 'Wedding',
         'Renewable Energy', 'Educational']

plt.xticks(range(1, 15), labels, rotation='vertical')

plt.show()