<img align="center" src="http://sydney.edu.au/images/content/about/logo-mono.jpg">
<h1 align="center" style="margin-top:10px">Statistical Learning with Python</h1>
<h2 align="center" style="margin-top:20px">Lecture 9: Neural Networks (Regression)</h2>
<br>

<a href="#1.-Credit-Card-Data">Credit Card Data</a> <br>
<a href="#2.-Single-Layer-Perceptron">Single Layer Perceptron</a> <br>
<a href="#3.-Scikit-Learn-Wrapper">Scikit-Learn Wrapper</a> <br>
<a href="#4.-Model-Evaluation">Model Evaluation</a> <br>

This notebook relies on the following libraries and settings.

In [1]:
# Packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore') 

In [2]:
# Plot settings
sns.set_context('notebook') # optimises figures for notebook display
sns.set_style('ticks') # set default plot style
crayon = ['#4E79A7','#F28E2C','#E15759','#76B7B2','#59A14F', 
          '#EDC949','#AF7AA1','#FF9DA7','#9C755F','#BAB0AB']
sns.set_palette(crayon) # set custom color scheme
%matplotlib inline
plt.rcParams['figure.figsize'] = (9, 6)

In [3]:
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score,  mean_absolute_error

### 1. Credit Card Data

We again use the `Credit` dataset. Here we simply repeat the steps from the previous notebook to load and process the data. 

In [4]:
# We will always assume that the data file is in a subdirectory called "Data"
train=pd.read_hdf('Data/Credit.h5', 'train')
test=pd.read_hdf('Data/Credit.h5', 'test')
train.head(10) 

Obs,Income,Limit,Cards,Age,Education,Student,Married,Balance,Male,Caucasian,Asian
400,18.701,5524,5,64,7,0,0,966,0,0,1
26,14.09,4323,5,25,16,0,1,671,0,0,0
280,54.319,3063,3,59,8,1,0,269,0,1,0
261,67.937,5184,4,63,12,0,1,345,1,0,1
131,23.793,3821,4,56,12,1,1,868,0,0,0
381,115.123,7760,3,83,14,0,0,661,0,0,0
361,53.566,5891,4,82,10,0,0,712,0,1,0
21,17.7,2860,4,63,16,0,0,89,0,0,1
193,28.508,3933,4,56,14,0,1,336,1,0,1
259,41.4,2561,2,36,14,0,1,0,1,1,0


We use every variable other than the response, `Balance`, as a predictor.

In [5]:
# Response label
response = 'Balance'

# Create a list with the names of all variables other than the response
predictors = [x for x in train.columns if x!= response]

# Convert the data to NumPy arrays. This is not strictly necessary, but arrays behave consistently across scikit-learn and Keras.
y_train = train[response].to_numpy() 
X_train = train[predictors].to_numpy() 

y_test = test[response].to_numpy()
X_test = test[predictors].to_numpy() 

In [6]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test  = scaler.transform(X_test)
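
The scaling step above can be sketched by hand as follows. This is a minimal illustration on synthetic data, not the notebook's actual arrays: the names `X_tr`, `X_te`, `mu` and `sigma` are hypothetical. The key point is that the mean and standard deviation are computed on the training set only and then reused for the test set, so that no information from the test data leaks into the preprocessing.

```python
import numpy as np

# Hypothetical data standing in for the training and test predictors
rng = np.random.default_rng(0)
X_tr = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
X_te = rng.normal(loc=50.0, scale=10.0, size=(20, 3))

# Column-wise statistics estimated from the training data only
mu = X_tr.mean(axis=0)
sigma = X_tr.std(axis=0)  # population standard deviation (ddof=0), as StandardScaler uses

# Apply the same training statistics to both sets
X_tr_std = (X_tr - mu) / sigma
X_te_std = (X_te - mu) / sigma

print(X_tr_std.mean(axis=0).round(6))  # approximately zero in every column
print(X_tr_std.std(axis=0).round(6))   # approximately one in every column
```

The standardised test set will generally not have exactly zero mean and unit variance, since it is scaled with the training statistics.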

### 2. Single Layer Perceptron

In [7]:
from keras.models import Sequential
from keras.layers import Dense

slp = Sequential()
slp.add(Dense(24, input_dim=X_train.shape[1], activation='relu'))
slp.add(Dense(1))
slp.compile(loss='mse', optimizer='rmsprop')
slp.fit(X_train, y_train, epochs=10000, verbose=0)

<keras.callbacks.History at 0x1680328af28>
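
To see what the fitted network computes, the sketch below writes out the forward pass of a model with one hidden layer of 24 ReLU units and a linear output unit in plain NumPy. It assumes the 10 predictors shown in the data table; the weights are random placeholders rather than the values learned by `slp`, and the names `relu`, `forward`, `W1`, `b1`, `W2` and `b2` are hypothetical.

```python
import numpy as np

def relu(z):
    """Rectified linear unit, applied elementwise."""
    return np.maximum(z, 0.0)

def forward(X, W1, b1, W2, b2):
    """Prediction of a one-hidden-layer network with a linear output."""
    hidden = relu(X @ W1 + b1)  # shape (n, 24): hidden unit activations
    return hidden @ W2 + b2     # shape (n, 1): one prediction per row

rng = np.random.default_rng(0)
n_features, n_hidden = 10, 24  # assumed: 10 predictors, 24 hidden units

# Random placeholder parameters (not the trained weights)
W1 = rng.normal(size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

X_demo = rng.normal(size=(5, n_features))  # five hypothetical standardised rows
preds = forward(X_demo, W1, b1, W2, b2)

n_params = W1.size + b1.size + W2.size + b2.size
print(preds.shape, n_params)
```

Under these assumptions the model has 10 × 24 + 24 parameters in the hidden layer and 24 + 1 in the output layer, which training adjusts to minimise the mean squared error.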

### 3. Scikit-Learn Wrapper

In [8]:
from keras.wrappers.scikit_learn import KerasRegressor

def build_model():
    model = Sequential()
    model.add(Dense(24, input_dim=X_train.shape[1], activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer='rmsprop')
    return model

estimator = KerasRegressor(build_fn=build_model, epochs=100, verbose=0)
cross_val_score(estimator, X_train, y_train, cv=5, scoring = 'neg_mean_squared_error')

array([-480320.19213479, -394190.32693342, -470625.01228574,
       -505443.28002781, -420750.55544441])
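
The scores above are negated mean squared errors, because scikit-learn scorers follow a "higher is better" convention. A short sketch of converting them to the RMSE scale used in the evaluation section is given below; the array is copied from the output above and the variable names are hypothetical. Note that the wrapper was fitted with `epochs=100` whereas `slp` was trained for 10000 epochs, so these values are not directly comparable with the test results for `slp`.

```python
import numpy as np

# Cross-validation scores copied from the output above (negated MSE per fold)
neg_mse = np.array([-480320.19213479, -394190.32693342, -470625.01228574,
                    -505443.28002781, -420750.55544441])

# Flip the sign to recover the MSE, then take the square root for the RMSE
rmse_per_fold = np.sqrt(-neg_mse)

# Overall cross-validation RMSE based on the average MSE across folds
cv_rmse = np.sqrt(-neg_mse.mean())

print(rmse_per_fold.round(1))
print(round(cv_rmse, 1))
```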

### 4. Model Evaluation


In [9]:
# Benchmark
ols = LinearRegression()
ols.fit(X_train, y_train)

# Initialise table
columns=['RMSE', 'R-Squared', 'MAE']
rows=['Linear Regression', 'Single Layer Perceptron']
results =pd.DataFrame(0.0, columns=columns, index=rows)

methods = [ols, slp] 

for i, method in enumerate(methods):    
    y_pred = method.predict(X_test)
    results.iloc[i, 0] = np.sqrt(mean_squared_error(y_test, y_pred))
    results.iloc[i, 1] = r2_score(y_test, y_pred)
    results.iloc[i, 2] = mean_absolute_error(y_test, y_pred) 

results.round(2)

Model,RMSE,R-Squared,MAE
Linear Regression,97.19,0.96,80.03
Single Layer Perceptron,13.27,1.0,9.73
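
For reference, the three metrics in the table can be computed directly from their definitions. The sketch below uses a tiny made-up example rather than the credit data; `y_true` and `y_pred` are hypothetical values chosen so that the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical observed and predicted values
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

residuals = y_true - y_pred

# Root mean squared error: square root of the average squared residual
rmse = np.sqrt(np.mean(residuals ** 2))

# Mean absolute error: average absolute residual
mae = np.mean(np.abs(residuals))

# R-squared: one minus the ratio of residual to total sum of squares
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(rmse, mae, r2)
```

The RMSE penalises large errors more heavily than the MAE, which is why the two can rank models differently when a few predictions are far off.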


### Additional Code

The two cells below format the notebook for display online. Please omit them from your work.

In [10]:
%%html
<style>
@import url('https://fonts.googleapis.com/css?family=Source+Sans+Pro|Open+Sans:800&display=swap');
</style>

In [11]:
from IPython.display import HTML
with open('jstyle.css', 'r') as f:  # close the file once the stylesheet is read
    style = f.read()
HTML('<style>' + style + '</style>')