# Regression Task in Deep Learning

Case Study: Loan Amount Calculator

Another common type of machine learning problem is "regression", which consists of predicting a continuous value instead of a discrete label. Examples include predicting tomorrow's temperature from meteorological data, or predicting how long a software project will take to complete from its specifications.

First, we import the required libraries.

In [None]:
# Import modules
import os

# Data handling
import pandas as pd
import numpy as np

# Modeling
import keras
from keras import models
from keras import layers
from keras import backend as K

# Plotting
import matplotlib.pyplot as plt

In [None]:
# Check Keras version
keras.__version__

'2.2.5'

## Build a Loan Amount Calculator using Multiple Linear Regression Dataset

The company wants to predict the potential interest rate of a loan based on the loan amount, income, purpose, and other variables. Your job is to build a model that identifies possible influences on the interest rate (variable `int_rate`) and to set up a multiple linear regression to predict it.
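
For reference, a multiple linear regression can be fitted directly with ordinary least squares. The following is a minimal sketch, not the model built in this notebook: it assumes NumPy and uses the five rows shown by `LoanStats.head()` purely as toy data, so the fitted coefficients carry no real meaning. The names `X`, `X1`, `coef`, and `predictions` are illustrative.

```python
import numpy as np

# Toy data: the five rows shown by LoanStats.head().
# Columns: loan_amnt, term, annual_inc
X = np.array([
    [5000.0, 36.0, 24000.0],
    [2500.0, 60.0, 30000.0],
    [2400.0, 36.0, 12252.0],
    [10000.0, 36.0, 49200.0],
    [3000.0, 60.0, 80000.0],
])
y = np.array([10.65, 15.27, 15.96, 13.49, 12.69])  # int_rate

# Prepend a column of ones so the model learns an intercept
X1 = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimise ||X1 @ coef - y||^2
coef, _, rank, _ = np.linalg.lstsq(X1, y, rcond=None)

predictions = X1 @ coef
print("intercept and coefficients:", coef)
print("fitted int_rate values:", predictions)
```

With only five rows and four parameters the fit is nearly exact, which is why a sample this small is only suitable for showing the mechanics.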


### Data Understanding

In [None]:
# Load the data

LoanStats = pd.read_csv('LoanStats.csv', sep = ',')

LoanStats.head()

Unnamed: 0,int_rate,loan_amnt,term,grade,home_ownership,annual_inc,purpose
0,10.65,5000,36,B,RENT,24000.0,credit_card
1,15.27,2500,60,C,RENT,30000.0,car
2,15.96,2400,36,C,RENT,12252.0,small_business
3,13.49,10000,36,C,RENT,49200.0,other
4,12.69,3000,60,B,RENT,80000.0,other


In [None]:
# Drop the categorical columns, which are not used in this model
LoanStats.drop(columns = ['grade','home_ownership','purpose'], inplace=True)

In [None]:
#Check for missing Values
LoanStats.isna().sum()

int_rate      0
loan_amnt     0
term          0
annual_inc    0
dtype: int64

#### Split the data in train and test

In [None]:
train_dataset = LoanStats.sample(frac=0.8,random_state=0)
test_dataset = LoanStats.drop(train_dataset.index)
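
The `sample` / `drop` pattern above can be checked on a small frame. This is a sketch on hypothetical toy data (the frame `toy` and its values are made up for illustration); it relies on the index labels being unique, because `drop` removes rows by label.

```python
import pandas as pd

# Hypothetical toy frame with 10 rows and a default, unique integer index
toy = pd.DataFrame({
    "int_rate": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0],
    "loan_amnt": [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000],
})

# 80% of the rows, drawn at random but reproducibly
train = toy.sample(frac=0.8, random_state=0)

# Everything that was not sampled becomes the test set
test = toy.drop(train.index)

print(len(train), len(test))
print(sorted(train.index.tolist() + test.index.tolist()))
```

Every row lands in exactly one of the two sets, so no test row is seen during training.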

In [None]:
#Let us inspect test dataset
test_dataset.head()

Unnamed: 0,int_rate,loan_amnt,term,annual_inc
10,14.65,6500,60,72000.0
11,12.69,12000,36,75000.0
13,9.91,3000,36,15000.0
19,6.03,9200,36,77385.19
21,12.42,21000,36,105000.0


In [None]:
test_dataset.shape

(7957, 4)

In [None]:
# int_rate is the target: remove it from the features and keep it as the labels
train_labels = train_dataset.pop('int_rate')
test_labels = test_dataset.pop('int_rate')

In [None]:
# Inspect target
train_labels[0:10]

20260     9.63
35519    11.48
26446    14.72
15586    11.99
32646    16.07
12288    11.49
10196    11.99
38396    13.36
20432    12.68
37497    12.53
Name: int_rate, dtype: float64

## Prepare the data
All features should be on a comparable scale; the model then trains faster.

In [None]:
#First we have to get the mean and sd of the training data
mean = train_dataset.mean(axis = 0)
sd = train_dataset.std(axis=0)

In [None]:
#Transformation (Z-transformation)

train_dataset -= mean # -= subtract the mean from every value in a column

train_dataset /= sd

In [None]:
# Validate results:
train_dataset.std(axis=0)

loan_amnt     1.0
term          1.0
annual_inc    1.0
dtype: float64

In [None]:
# Apply the same Z-transformation to the test data,
# using the mean and sd computed on the training data

test_dataset -= mean

test_dataset /= sd
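
The Z-transformation can be sketched on a small array. The numbers below are hypothetical; the point is that the test data is scaled with the training mean and standard deviation, never with its own statistics. `ddof=1` is used to match the default of pandas `.std()`.

```python
import numpy as np

# Hypothetical toy features: columns are loan_amnt and term
train = np.array([
    [1000.0, 36.0],
    [3000.0, 60.0],
    [5000.0, 36.0],
    [7000.0, 60.0],
])
test = np.array([[4000.0, 36.0]])

# Statistics come from the training data only
mean = train.mean(axis=0)
sd = train.std(axis=0, ddof=1)  # sample standard deviation, as in pandas

train_z = (train - mean) / sd
test_z = (test - mean) / sd  # reuse the training mean and sd

print(train_z.mean(axis=0), train_z.std(axis=0, ddof=1))
print(test_z)
```

After the transformation each training column has mean 0 and standard deviation 1; the test columns generally will not, which is expected.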

## Build Network

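We use `relu` as the only activation function.
Since this exercise is about using a basic neural network in Keras and not about the quality of the model, we use only simple fully connected (`Dense`) layers. Note that this is a plain feed-forward network, not a CNN.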

In [None]:
# Build a small fully connected network:
# one hidden layer with 64 units and one linear output unit

network = models.Sequential()
network.add(layers.Dense(64, activation='relu', input_shape = (3,)))
network.add(layers.Dense(1))

In [None]:
## Summary of network
network.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_3 (Dense)              (None, 64)                256       
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 65        
Total params: 321
Trainable params: 321
Non-trainable params: 0
_________________________________________________________________
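
The parameter counts in the summary can be reproduced by hand: a `Dense` layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name used only for this check.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(3, 64)   # 3 input features -> 64 units
output = dense_params(64, 1)   # 64 hidden units -> 1 output
total = hidden + output

print(hidden, output, total)  # 256 65 321
```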


### Initialization - Part 2
- Loss function: `mse` (mean squared error)
- Optimizer: `rmsprop`
- Metrics: `mape` (mean absolute percentage error) and `mae` (mean absolute error)
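
To make these quantities concrete, here is a minimal NumPy sketch of the three measures on hypothetical values. It follows the textbook definitions; Keras additionally guards the MAPE denominator against values near zero, which this sketch does not.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean of the squared errors
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    # Mean of the absolute errors, in the unit of the target
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    # Mean absolute error relative to the true value, in percent
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

# Hypothetical interest rates and predictions
y_true = np.array([10.0, 12.5, 16.0])
y_pred = np.array([11.0, 12.0, 14.0])

print(mse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```

Because `int_rate` is itself a percentage, an MAE of 1.0 means the prediction is off by one percentage point on average.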

In [None]:
network.compile(optimizer='rmsprop', loss='mse',metrics=['mape', 'mae'])

In [None]:
network.fit(train_dataset, train_labels, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

<keras.callbacks.History at 0x7f24043d9c88>

In [None]:
# evaluate() returns the loss first, followed by the metrics in compile order
test_mse, test_mape, test_mae = network.evaluate(x=test_dataset, y=test_labels)



In [None]:
print("MAPE & MAE:", test_mape , test_mae)

MAPE & MAE: 152.34189457957493 15.26102464840582


In the 10th epoch, the mean absolute percentage error (MAPE) and mean absolute error (MAE) on the training data are 26.08 and 2.7, respectively.

On the test data, the MAPE and MAE are 152.34 and 15.26, respectively.

The test error is far higher than the training error, so the model as configured does not generalize well. Optimizing the network, for example its layers, is advisable for better results.
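
One hedged way to put such error values in context is a naive baseline that always predicts the mean of the training labels. The sketch below uses only the ten training labels and five test labels printed earlier in this notebook, so it is far too small a sample for conclusions; it only illustrates the check. The names `train_sample`, `test_sample`, and `baseline_mae` are illustrative.

```python
import numpy as np

# First ten training labels and first five test labels shown in the notebook
train_sample = np.array([9.63, 11.48, 14.72, 11.99, 16.07,
                         11.49, 11.99, 13.36, 12.68, 12.53])
test_sample = np.array([14.65, 12.69, 9.91, 6.03, 12.42])

# Baseline: predict the training mean for every test row
baseline_prediction = train_sample.mean()
baseline_mae = float(np.mean(np.abs(test_sample - baseline_prediction)))

print("baseline prediction:", baseline_prediction)
print("baseline MAE:", baseline_mae)
```

A trained model is expected to beat this kind of baseline on the full test set; if it does not, the preprocessing and evaluation steps are worth re-checking before tuning the architecture.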