# Long Short-Term Memory Network 

The Long Short-Term Memory network, or LSTM network, is a recurrent neural network that is trained using Backpropagation Through Time and overcomes the vanishing gradient problem. LSTM networks have memory blocks that are connected through layers.

We frame the task as a regression problem: given this month's log error, where log error = log(Zestimate) - log(sale price), what is the log error next month?

We can write a simple function to convert our single column of data into a two-column dataset: the first column containing this month’s (t) log error and the second column containing next month’s (t+1) log error, to be predicted.
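
As a minimal sketch of that framing, assuming a toy array of log errors (the values below are made up for illustration and are not taken from the Zillow data), each value at time t is paired with the value at t+1:

```python
import numpy as np

# Hypothetical log-error values, for illustration only
log_errors = np.array([0.03, -0.01, 0.02, 0.05, -0.02])

# Column 0: value at time t; column 1: value at time t+1 (the target)
pairs = np.column_stack([log_errors[:-1], log_errors[1:]])
print(pairs)
```

Each row's target becomes the next row's input, which is what makes this a one-step-ahead prediction problem.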

Before we get started, let’s first import all of the functions and classes we intend to use. This assumes a working SciPy environment with the Keras deep learning library installed.

In [13]:
import numpy as np
import pandas as pd
import math
from collections import defaultdict
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import LabelEncoder
from keras.layers import Dropout, BatchNormalization

We fix the random number seed to help make our results reproducible.

In [14]:
# fix random seed for reproducibility
np.random.seed(7)
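
The short sketch below shows what `np.random.seed` guarantees: re-seeding NumPy restarts the same pseudo-random sequence. It is only an illustration. Depending on the Keras backend, seeding NumPy alone may not be enough to make training fully deterministic, since the backend can keep its own random state.

```python
import numpy as np

np.random.seed(7)
first = np.random.rand(3)

np.random.seed(7)          # re-seeding restarts the same sequence
second = np.random.rand(3)

print(np.array_equal(first, second))
```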

In [15]:
train = pd.read_csv("train_2016_v2.csv", parse_dates=["transactiondate"])
prop = pd.read_csv('properties 2016.csv')
sample = pd.read_csv('sample_submission.csv')
 
print('Fitting Label Encoder on properties')
for c in prop.columns:
    prop[c]=prop[c].fillna(-1)
    if prop[c].dtype == 'object':
        lbl = LabelEncoder()
        lbl.fit(list(prop[c].values))
        prop[c] = lbl.transform(list(prop[c].values))
        
#Create df_train and x_train y_train from that
print('Creating training set:')
df_train = train.merge(prop, how='left', on='parcelid')



Fitting Label Encoder on properties
Creating training set:


LSTMs are sensitive to the scale of the input data, particularly when sigmoid or tanh activation functions are used.

So, we rescale the data to the range 0 to 1, also called normalizing. We can easily normalize the dataset using the MinMaxScaler preprocessing class from the scikit-learn library.

In [16]:
df_train = df_train.fillna(-1.0)  # fillna returns a new frame, so keep the result
dataset = df_train[['logerror']]
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
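
To make the transform and its inverse explicit, the sketch below reproduces the min-max arithmetic in plain NumPy for a feature range of 0 to 1. The helper names `min_max_scale` and `min_max_invert` are hypothetical and not part of scikit-learn, and the input values are made up.

```python
import numpy as np

def min_max_scale(values):
    """Scale a 1-D array to [0, 1]; returns scaled values plus (min, max)."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo), (lo, hi)

def min_max_invert(scaled, bounds):
    """Undo min_max_scale using the stored (min, max)."""
    lo, hi = bounds
    return scaled * (hi - lo) + lo

values = np.array([-0.4, 0.0, 0.1, 0.6])   # made-up log errors
scaled, bounds = min_max_scale(values)
restored = min_max_invert(scaled, bounds)
print(scaled, restored)
```

The inverse step matters later, because predictions made on the scaled data have to be mapped back to log-error units before computing RMSE.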

After fitting the model and estimating its skill on the training dataset, we need an idea of its skill on new, unseen data.
For a normal classification or regression problem, we would do this using cross validation.

With time series data, the sequence of values is important.
A simple method is to split the ordered dataset into train and test datasets.
The code below calculates the index of the split point and separates the data into a training dataset with 90% of the observations, leaving the remaining 10% for testing the model.

In [17]:
# split into train and test sets
train_size = int(len(dataset) * 0.90)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
print(len(train), len(test))

81247 9028
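
The split above relies on the rows already being in time order, and the merged frame is not sorted explicitly in this notebook. The sketch below shows one way the ordering could be enforced before splitting. It uses made-up dates and values, and the 80/20 ratio is chosen only because the toy array is tiny.

```python
import numpy as np

# Hypothetical transaction dates and log errors, deliberately out of order
dates = np.array(
    ["2016-03-01", "2016-01-15", "2016-02-10", "2016-04-05", "2016-01-02"],
    dtype="datetime64[D]",
)
values = np.array([0.03, -0.01, 0.02, 0.05, -0.02])

order = np.argsort(dates)               # indices that put rows in date order
ordered = values[order].reshape(-1, 1)  # column vector, like the scaled dataset

split = int(len(ordered) * 0.8)
train_part, test_part = ordered[:split], ordered[split:]
print(len(train_part), len(test_part))
```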


We define a function to create a new dataset.

The function takes two arguments: the dataset, which is a NumPy array that we want to convert into a dataset, and look_back, which is the number of previous time steps to use as input variables to predict the next time period (1 by default).

This default will create a dataset where X is the log error at a given time (t) and Y is the log error at the next time (t + 1).

In [18]:
# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [],[]
    for i in range(len(dataset)-look_back):
        a = dataset[i:(i+look_back), :]
        dataX.append(a)
        dataY.append(dataset[i + look_back, :])
    return np.array(dataX), np.array(dataY)
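
A quick way to check the windowing is to run the function on a tiny array and inspect the shapes. The sketch below repeats the same logic as the function above so that it runs on its own; the toy input is an assumption for illustration.

```python
import numpy as np

def create_dataset(dataset, look_back=1):
    # Same windowing logic as the function above
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back):
        dataX.append(dataset[i:(i + look_back), :])
        dataY.append(dataset[i + look_back, :])
    return np.array(dataX), np.array(dataY)

toy = np.arange(5, dtype=float).reshape(-1, 1)   # [[0], [1], [2], [3], [4]]

X1, Y1 = create_dataset(toy, look_back=1)
X2, Y2 = create_dataset(toy, look_back=2)

print(X1.shape, Y1.shape)   # (4, 1, 1) (4, 1)
print(X2.shape, Y2.shape)   # (3, 2, 1) (3, 1)
```

Note that each call loses `look_back` rows, because the last windows have no following value to use as a target.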

We create a function build_model, which takes train, test, look_back, activation, optimizer, epochs, and loss as arguments.

The LSTM network expects the input data (X) to be provided with a specific array structure in the form of: [samples, time steps, features].

Currently, our data is in the form: [samples, features] and we are framing the problem as one time step for each sample. We can transform the prepared train and test input data into the expected structure using numpy.reshape().
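
The reshape itself can be illustrated on a small array. This is a sketch with made-up sizes, showing only how a time-step axis of length 1 is inserted:

```python
import numpy as np

samples, features = 6, 1
flat = np.arange(samples * features, dtype=float).reshape(samples, features)

# Insert a time-step axis of length 1: [samples, features] -> [samples, 1, features]
sequenced = np.reshape(flat, (flat.shape[0], 1, flat.shape[1]))
print(flat.shape, "->", sequenced.shape)   # (6, 1) -> (6, 1, 1)
```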

To fit an LSTM network for this problem, we use a visible layer with 1 input, a hidden layer with 4 LSTM blocks or neurons, and an output layer that makes a single value prediction. The activation function for the LSTM blocks is passed in as a parameter. The network is trained for the given number of epochs with a batch size of 256.

In [19]:
look_back = 1
def build_model(train,test,look_back,activation,optimizer,epochs,loss):
    # reshape into X=t and Y=t+1
    trainX, trainY = create_dataset(train, look_back)
    testX, testY = create_dataset(test, look_back)
    # reshape input to be [samples, time steps, features]
    trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
    testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
    # create and fit the LSTM network
    model = Sequential()
    model.add(LSTM(4, input_shape=(1, look_back),activation = activation))
    model.add(Dense(1))
    model.compile(loss=loss, optimizer=optimizer)
    model.fit(trainX, trainY, epochs=epochs, batch_size=256, verbose=2)
    # make predictions
    trainPredict = model.predict(trainX)
    testPredict = model.predict(testX)
    # invert predictions
    trainPredict = scaler.inverse_transform(trainPredict)
    trainY = scaler.inverse_transform(trainY)
    testPredict = scaler.inverse_transform(testPredict)
    testY = scaler.inverse_transform(testY)
    # calculate root mean squared error
    trainScore = math.sqrt(mean_squared_error(trainY, trainPredict[:, 0]))
    print('Train Score: %.2f RMSE' % (trainScore))
    testScore = math.sqrt(mean_squared_error(testY, testPredict[:, 0]))
    print('Test Score: %.2f RMSE' % (testScore))
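
The scores printed by build_model are root mean squared errors in the original log-error units. As a minimal sketch of that metric in plain NumPy (the `rmse` helper and the sample values are hypothetical, not part of the notebook):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two arrays of equal length."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

actual = [0.0, 0.1, -0.2, 0.3]      # made-up log errors
predicted = [0.1, 0.1, -0.1, 0.1]   # made-up predictions
print(round(rmse(actual, predicted), 4))
```

Flattening both inputs avoids accidental broadcasting when one array has shape (n, 1) and the other has shape (n,).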

The function build_model is called several times with different activation functions (sigmoid, relu, tanh), optimizers (adam, adagrad), and loss functions (mean_squared_error, hinge, logcosh), each for 100 epochs.

In [20]:
build_model(train,test,1,'sigmoid','adam',100,'mean_squared_error')

Epoch 1/100
 - 2s - loss: 0.0771
Epoch 2/100
 - 2s - loss: 4.8103e-04
Epoch 3/100
 - 2s - loss: 3.0218e-04
Epoch 4/100
 - 2s - loss: 3.0219e-04
Epoch 5/100
 - 2s - loss: 3.0219e-04
Epoch 6/100
 - 2s - loss: 3.0220e-04
Epoch 7/100
 - 2s - loss: 3.0221e-04
Epoch 8/100
 - 2s - loss: 3.0221e-04
Epoch 9/100
 - 2s - loss: 3.0223e-04
Epoch 10/100
 - 2s - loss: 3.0220e-04
Epoch 11/100
 - 2s - loss: 3.0222e-04
Epoch 12/100
 - 2s - loss: 3.0226e-04
Epoch 13/100
 - 2s - loss: 3.0226e-04
Epoch 14/100
 - 2s - loss: 3.0227e-04
Epoch 15/100
 - 2s - loss: 3.0230e-04
Epoch 16/100
 - 2s - loss: 3.0225e-04
Epoch 17/100
 - 2s - loss: 3.0227e-04
Epoch 18/100
 - 2s - loss: 3.0233e-04
Epoch 19/100
 - 2s - loss: 3.0238e-04
Epoch 20/100
 - 2s - loss: 3.0245e-04
Epoch 21/100
 - 2s - loss: 3.0252e-04
Epoch 22/100
 - 2s - loss: 3.0256e-04
Epoch 23/100
 - 2s - loss: 3.0256e-04
Epoch 24/100
 - 2s - loss: 3.0274e-04
Epoch 25/100
 - 2s - loss: 3.0256e-04
Epoch 26/100
 - 2s - loss: 3.0271e-04
Epoch 27/100
 - 2s - loss

We can see that the model has an RMSE of about 0.16 on the training dataset and about 0.15 on the test dataset. Of all the combinations tried, this set of activation, loss, and optimizer functions is among the most accurate at predicting the log error of home values.

In [21]:
build_model(train,test,1,'relu','adam',100,'mean_squared_error')

Epoch 1/100
 - 2s - loss: 0.0323
Epoch 2/100
 - 2s - loss: 3.7371e-04
Epoch 3/100
 - 2s - loss: 3.7311e-04
Epoch 4/100
 - 2s - loss: 3.7236e-04
Epoch 5/100
 - 2s - loss: 3.7132e-04
Epoch 6/100
 - 2s - loss: 3.7015e-04
Epoch 7/100
 - 2s - loss: 3.6867e-04
Epoch 8/100
 - 2s - loss: 3.6712e-04
Epoch 9/100
 - 2s - loss: 3.6521e-04
Epoch 10/100
 - 2s - loss: 3.6311e-04
Epoch 11/100
 - 2s - loss: 3.6065e-04
Epoch 12/100
 - 2s - loss: 3.5804e-04
Epoch 13/100
 - 2s - loss: 3.5509e-04
Epoch 14/100
 - 2s - loss: 3.5190e-04
Epoch 15/100
 - 1s - loss: 3.4853e-04
Epoch 16/100
 - 2s - loss: 3.4498e-04
Epoch 17/100
 - 2s - loss: 3.4126e-04
Epoch 18/100
 - 2s - loss: 3.3755e-04
Epoch 19/100
 - 2s - loss: 3.3400e-04
Epoch 20/100
 - 2s - loss: 3.2911e-04
Epoch 21/100
 - 2s - loss: 3.2592e-04
Epoch 22/100
 - 2s - loss: 3.2150e-04
Epoch 23/100
 - 2s - loss: 3.1812e-04
Epoch 24/100
 - 2s - loss: 3.1466e-04
Epoch 25/100
 - 2s - loss: 3.1175e-04
Epoch 26/100
 - 2s - loss: 3.0994e-04
Epoch 27/100
 - 2s - loss

With the relu activation, the model again has an RMSE of about 0.16 on the training dataset and about 0.15 on the test dataset, so this combination is also among the most accurate of those tried.

In [22]:
build_model(train,test,1,'tanh','adam',100,'mean_squared_error')

Epoch 1/100
 - 2s - loss: 0.0680
Epoch 2/100
 - 2s - loss: 3.3425e-04
Epoch 3/100
 - 2s - loss: 3.1591e-04
Epoch 4/100
 - 2s - loss: 3.1582e-04
Epoch 5/100
 - 2s - loss: 3.1572e-04
Epoch 6/100
 - 2s - loss: 3.1558e-04
Epoch 7/100
 - 2s - loss: 3.1542e-04
Epoch 8/100
 - 2s - loss: 3.1522e-04
Epoch 9/100
 - 2s - loss: 3.1505e-04
Epoch 10/100
 - 2s - loss: 3.1480e-04
Epoch 11/100
 - 2s - loss: 3.1452e-04
Epoch 12/100
 - 2s - loss: 3.1418e-04
Epoch 13/100
 - 2s - loss: 3.1380e-04
Epoch 14/100
 - 2s - loss: 3.1345e-04
Epoch 15/100
 - 2s - loss: 3.1283e-04
Epoch 16/100
 - 2s - loss: 3.1251e-04
Epoch 17/100
 - 2s - loss: 3.1188e-04
Epoch 18/100
 - 2s - loss: 3.1128e-04
Epoch 19/100
 - 2s - loss: 3.1059e-04
Epoch 20/100
 - 2s - loss: 3.0999e-04
Epoch 21/100
 - 2s - loss: 3.0934e-04
Epoch 22/100
 - 2s - loss: 3.0865e-04
Epoch 23/100
 - 2s - loss: 3.0793e-04
Epoch 24/100
 - 2s - loss: 3.0741e-04
Epoch 25/100
 - 2s - loss: 3.0632e-04
Epoch 26/100
 - 2s - loss: 3.0572e-04
Epoch 27/100
 - 2s - loss

With the tanh activation, the model has an RMSE of about 0.16 on the training dataset and about 0.15 on the test dataset. This combination of activation and optimizer is likewise among the most accurate of those tried. The loss function here is mean_squared_error, which measures the average of the squares of the errors, that is, the squared differences between the estimator and what is estimated.

In [23]:
#Hinge loss function
build_model(train,test,1,'tanh','adam',100,'hinge')

Epoch 1/100
 - 2s - loss: 0.8655
Epoch 2/100
 - 2s - loss: 0.2673
Epoch 3/100
 - 2s - loss: 0.0011
Epoch 4/100
 - 2s - loss: 8.3260e-04
Epoch 5/100
 - 2s - loss: 7.0076e-04
Epoch 6/100
 - 2s - loss: 6.1188e-04
Epoch 7/100
 - 2s - loss: 5.4652e-04
Epoch 8/100
 - 2s - loss: 4.9493e-04
Epoch 9/100
 - 2s - loss: 4.5169e-04
Epoch 10/100
 - 2s - loss: 4.1609e-04
Epoch 11/100
 - 2s - loss: 3.8580e-04
Epoch 12/100
 - 2s - loss: 3.5984e-04
Epoch 13/100
 - 2s - loss: 3.3609e-04
Epoch 14/100
 - 2s - loss: 3.1593e-04
Epoch 15/100
 - 2s - loss: 2.9944e-04
Epoch 16/100
 - 2s - loss: 2.8545e-04
Epoch 17/100
 - 2s - loss: 2.7224e-04
Epoch 18/100
 - 2s - loss: 2.5860e-04
Epoch 19/100
 - 2s - loss: 2.4523e-04
Epoch 20/100
 - 2s - loss: 2.3238e-04
Epoch 21/100
 - 2s - loss: 2.2013e-04
Epoch 22/100
 - 2s - loss: 2.0834e-04
Epoch 23/100
 - 2s - loss: 1.9641e-04
Epoch 24/100
 - 2s - loss: 1.8467e-04
Epoch 25/100
 - 2s - loss: 1.7258e-04
Epoch 26/100
 - 2s - loss: 1.6038e-04
Epoch 27/100
 - 2s - loss: 1.4648

The model has an RMSE of about 57.46 on the training dataset and about 57.48 on the test dataset, so this set of functions gives poor predictions of the log error.
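
A likely reason is that hinge is a classification loss rather than a regression loss. Assuming the standard definition, mean(max(1 - y_true * y_pred, 0)), the loss reaches zero whenever the prediction is large enough, regardless of how far it is from the target. The sketch below uses hypothetical scaled targets to show that an inflated prediction can have zero hinge loss and a very large squared error at the same time.

```python
import numpy as np

def hinge(y_true, y_pred):
    # Standard hinge loss: mean(max(1 - y_true * y_pred, 0))
    return float(np.mean(np.maximum(1.0 - y_true * y_pred, 0.0)))

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

y_true = np.array([0.4, 0.5, 0.6])       # hypothetical scaled targets in [0, 1]
accurate = y_true.copy()                  # perfect regression predictions
inflated = np.full_like(y_true, 10.0)     # far too large, but hinge is satisfied

print("accurate:", hinge(y_true, accurate), mse(y_true, accurate))
print("inflated:", hinge(y_true, inflated), mse(y_true, inflated))
```

This is consistent with the training log above, where the hinge loss falls steadily even though the final RMSE is large.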

In [24]:
#logcosh loss function
build_model(train,test,1,'tanh','adam',100,'logcosh')

Epoch 1/100
 - 2s - loss: 0.0526
Epoch 2/100
 - 2s - loss: 8.0372e-04
Epoch 3/100
 - 2s - loss: 1.5111e-04
Epoch 4/100
 - 2s - loss: 1.5109e-04
Epoch 5/100
 - 2s - loss: 1.5109e-04
Epoch 6/100
 - 2s - loss: 1.5106e-04
Epoch 7/100
 - 2s - loss: 1.5108e-04
Epoch 8/100
 - 1s - loss: 1.5108e-04
Epoch 9/100
 - 1s - loss: 1.5108e-04
Epoch 10/100
 - 2s - loss: 1.5107e-04
Epoch 11/100
 - 1s - loss: 1.5107e-04
Epoch 12/100
 - 2s - loss: 1.5107e-04
Epoch 13/100
 - 2s - loss: 1.5102e-04
Epoch 14/100
 - 2s - loss: 1.5103e-04
Epoch 15/100
 - 2s - loss: 1.5101e-04
Epoch 16/100
 - 1s - loss: 1.5101e-04
Epoch 17/100
 - 2s - loss: 1.5097e-04
Epoch 18/100
 - 2s - loss: 1.5095e-04
Epoch 19/100
 - 2s - loss: 1.5095e-04
Epoch 20/100
 - 2s - loss: 1.5097e-04
Epoch 21/100
 - 2s - loss: 1.5088e-04
Epoch 22/100
 - 2s - loss: 1.5089e-04
Epoch 23/100
 - 2s - loss: 1.5083e-04
Epoch 24/100
 - 2s - loss: 1.5079e-04
Epoch 25/100
 - 2s - loss: 1.5089e-04
Epoch 26/100
 - 2s - loss: 1.5078e-04
Epoch 27/100
 - 2s - loss

The model has an RMSE of about 0.16 on the training dataset and about 0.15 on the test dataset. This set of functions therefore gives the same accuracy as the mean squared error loss function.
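
The similarity is expected for small errors, because log(cosh(x)) is approximately x²/2 when x is close to zero. The sketch below checks this numerically with made-up targets and small simulated errors. The logged losses above show the same pattern: the logcosh loss settles at roughly half the mean squared error loss.

```python
import numpy as np

def logcosh(y_true, y_pred):
    # mean(log(cosh(error)))
    err = y_pred - y_true
    return float(np.mean(np.log(np.cosh(err))))

def mse(y_true, y_pred):
    return float(np.mean((y_pred - y_true) ** 2))

rng = np.random.default_rng(0)
y_true = rng.uniform(0.4, 0.6, size=1000)            # hypothetical scaled targets
y_pred = y_true + rng.normal(0.0, 0.02, size=1000)   # small prediction errors

ratio = logcosh(y_true, y_pred) / mse(y_true, y_pred)
print(round(ratio, 3))
```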

In [25]:
build_model(train,test,1,'tanh','Adagrad',100,'hinge')

Epoch 1/100
 - 2s - loss: 0.6602
Epoch 2/100
 - 2s - loss: 0.1939
Epoch 3/100
 - 2s - loss: 0.0024
Epoch 4/100
 - 2s - loss: 0.0015
Epoch 5/100
 - 2s - loss: 0.0012
Epoch 6/100
 - 2s - loss: 0.0011
Epoch 7/100
 - 2s - loss: 0.0010
Epoch 8/100
 - 2s - loss: 9.4884e-04
Epoch 9/100
 - 2s - loss: 8.9868e-04
Epoch 10/100
 - 2s - loss: 8.5881e-04
Epoch 11/100
 - 2s - loss: 8.2596e-04
Epoch 12/100
 - 2s - loss: 7.9799e-04
Epoch 13/100
 - 2s - loss: 7.7324e-04
Epoch 14/100
 - 2s - loss: 7.5099e-04
Epoch 15/100
 - 2s - loss: 7.3112e-04
Epoch 16/100
 - 2s - loss: 7.1346e-04
Epoch 17/100
 - 2s - loss: 6.9726e-04
Epoch 18/100
 - 2s - loss: 6.8242e-04
Epoch 19/100
 - 2s - loss: 6.6879e-04
Epoch 20/100
 - 2s - loss: 6.5608e-04
Epoch 21/100
 - 2s - loss: 6.4411e-04
Epoch 22/100
 - 2s - loss: 6.3287e-04
Epoch 23/100
 - 2s - loss: 6.2235e-04
Epoch 24/100
 - 2s - loss: 6.1249e-04
Epoch 25/100
 - 2s - loss: 6.0313e-04
Epoch 26/100
 - 2s - loss: 5.9431e-04
Epoch 27/100
 - 2s - loss: 5.8590e-04
Epoch 28/10

With the Adagrad optimizer and hinge loss, the model has an RMSE of about 18.40 on both the training and test datasets.

In [26]:
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back),activation = 'tanh'))
model.add(Dense(1))
model.add(BatchNormalization())
model.add(Dropout(.6))
model.add(Dense(1))
model.compile(loss='hinge', optimizer='Adagrad')
model.fit(trainX, trainY, epochs=200, batch_size=256, verbose=2)
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform(trainY)
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform(testY)
trainScore = math.sqrt(mean_squared_error(trainY, trainPredict[:, 0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY, testPredict[:, 0]))
print('Test Score: %.2f RMSE' % (testScore))

Epoch 1/200
 - 3s - loss: 0.7224
Epoch 2/200
 - 2s - loss: 0.4858
Epoch 3/200
 - 2s - loss: 0.4538
Epoch 4/200
 - 2s - loss: 0.4306
Epoch 5/200
 - 2s - loss: 0.4079
Epoch 6/200
 - 2s - loss: 0.3889
Epoch 7/200
 - 2s - loss: 0.3696
Epoch 8/200
 - 2s - loss: 0.3541
Epoch 9/200
 - 2s - loss: 0.3363
Epoch 10/200
 - 2s - loss: 0.3219
Epoch 11/200
 - 2s - loss: 0.3057
Epoch 12/200
 - 2s - loss: 0.2919
Epoch 13/200
 - 2s - loss: 0.2771
Epoch 14/200
 - 2s - loss: 0.2634
Epoch 15/200
 - 2s - loss: 0.2504
Epoch 16/200
 - 2s - loss: 0.2387
Epoch 17/200
 - 2s - loss: 0.2258
Epoch 18/200
 - 2s - loss: 0.2129
Epoch 19/200
 - 2s - loss: 0.2021
Epoch 20/200
 - 2s - loss: 0.1903
Epoch 21/200
 - 2s - loss: 0.1795
Epoch 22/200
 - 2s - loss: 0.1677
Epoch 23/200
 - 2s - loss: 0.1571
Epoch 24/200
 - 2s - loss: 0.1466
Epoch 25/200
 - 2s - loss: 0.1365
Epoch 26/200
 - 2s - loss: 0.1259
Epoch 27/200
 - 2s - loss: 0.1166
Epoch 28/200
 - 2s - loss: 0.1076
Epoch 29/200
 - 2s - loss: 0.0974
Epoch 30/200
 - 2s - lo

The number of epochs has been set to 200, and BatchNormalization and Dropout layers have been added, but this doesn't improve the RMSE. Instead, the error increases to 22.20 on the training dataset and 22.19 on the test dataset, compared with the run at epochs=100.
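
One thing worth noting about this architecture is that Dropout(.6) sits directly after a Dense(1) layer, so the single value feeding the final layer is zeroed about 60% of the time during training. Whether this contributes to the higher error was not tested here. The sketch below only illustrates the mechanism, using a hypothetical `dropout_train` helper that implements inverted dropout in NumPy.

```python
import numpy as np

def dropout_train(x, rate, rng):
    """Inverted dropout: zero each element with probability `rate`, rescale the rest."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
single_unit = np.ones((10000, 1))   # stand-in for a Dense(1) output, all ones for clarity
dropped = dropout_train(single_unit, rate=0.6, rng=rng)

zero_fraction = float(np.mean(dropped == 0.0))
print(round(zero_fraction, 2))
```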

### Kernel Initializers
Initializations define the way to set the initial random weights of Keras layers.

#### Normal
Initializer that generates tensors with a normal distribution.
Here, we used a random normal distribution for the weights with 200 epochs; the RMSE is 21.52 on the training dataset and 21.52 on the test dataset.

#### Random Uniform
Initializer that generates tensors with a uniform distribution.
Here, we used a random uniform distribution for the weights with 200 epochs; the RMSE is 21.51 on the training dataset and 21.50 on the test dataset.

Even with normally or uniformly distributed initial weights, the RMSE of the log error doesn't improve.
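
To give a feel for what these two initializers produce, the sketch below draws a small weight matrix from each distribution with NumPy. The standard deviation of 0.05 and the range of -0.05 to 0.05 are assumptions about the Keras defaults and should be checked against the installed version. The shape (4, 1) is chosen to match a Dense(1) layer fed by 4 LSTM units.

```python
import numpy as np

rng = np.random.default_rng(7)
shape = (4, 1)   # weights from 4 LSTM units to 1 output

# Assumed defaults; verify against your Keras version
normal_weights = rng.normal(loc=0.0, scale=0.05, size=shape)
uniform_weights = rng.uniform(low=-0.05, high=0.05, size=shape)

print(normal_weights.ravel())
print(uniform_weights.ravel())
```

Both schemes start from small values centered on zero, which may be why switching between them makes little difference to the final error here.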

In [27]:
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back),activation = 'tanh'))
model.add(Dense(1,kernel_initializer = 'normal'))
model.add(BatchNormalization())
model.add(Dropout(.6))
model.add(Dense(1,kernel_initializer = 'normal'))
model.compile(loss='hinge', optimizer='Adagrad')
model.fit(trainX, trainY, epochs=200, batch_size=256, verbose=2)
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform(trainY)
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform(testY)
trainScore = math.sqrt(mean_squared_error(trainY, trainPredict[:, 0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY, testPredict[:, 0]))
print('Test Score: %.2f RMSE' % (testScore))

Epoch 1/200
 - 3s - loss: 0.8240
Epoch 2/200
 - 2s - loss: 0.5971
Epoch 3/200
 - 2s - loss: 0.4466
Epoch 4/200
 - 2s - loss: 0.4187
Epoch 5/200
 - 2s - loss: 0.4024
Epoch 6/200
 - 2s - loss: 0.3841
Epoch 7/200
 - 2s - loss: 0.3659
Epoch 8/200
 - 2s - loss: 0.3511
Epoch 9/200
 - 2s - loss: 0.3362
Epoch 10/200
 - 2s - loss: 0.3195
Epoch 11/200
 - 2s - loss: 0.3078
Epoch 12/200
 - 2s - loss: 0.2922
Epoch 13/200
 - 2s - loss: 0.2816
Epoch 14/200
 - 2s - loss: 0.2678
Epoch 15/200
 - 2s - loss: 0.2551
Epoch 16/200
 - 2s - loss: 0.2428
Epoch 17/200
 - 2s - loss: 0.2315
Epoch 18/200
 - 2s - loss: 0.2205
Epoch 19/200
 - 2s - loss: 0.2086
Epoch 20/200
 - 2s - loss: 0.1989
Epoch 21/200
 - 2s - loss: 0.1872
Epoch 22/200
 - 2s - loss: 0.1767
Epoch 23/200
 - 2s - loss: 0.1659
Epoch 24/200
 - 2s - loss: 0.1551
Epoch 25/200
 - 2s - loss: 0.1457
Epoch 26/200
 - 2s - loss: 0.1365
Epoch 27/200
 - 2s - loss: 0.1273
Epoch 28/200
 - 2s - loss: 0.1169
Epoch 29/200
 - 2s - loss: 0.1082
Epoch 30/200
 - 2s - lo

In [28]:
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back),activation = 'tanh'))
model.add(Dense(1,kernel_initializer = 'random_uniform'))
model.add(BatchNormalization())
model.add(Dropout(.6))
model.add(Dense(1,kernel_initializer = 'random_uniform'))
model.compile(loss='hinge', optimizer='Adagrad')
model.fit(trainX, trainY, epochs=200, batch_size=256, verbose=2)
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform(trainY)
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform(testY)
trainScore = math.sqrt(mean_squared_error(trainY, trainPredict[:, 0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY, testPredict[:, 0]))
print('Test Score: %.2f RMSE' % (testScore))

Epoch 1/200
 - 3s - loss: 0.8307
Epoch 2/200
 - 2s - loss: 0.6101
Epoch 3/200
 - 2s - loss: 0.4535
Epoch 4/200
 - 2s - loss: 0.4196
Epoch 5/200
 - 2s - loss: 0.4008
Epoch 6/200
 - 2s - loss: 0.3811
Epoch 7/200
 - 2s - loss: 0.3657
Epoch 8/200
 - 2s - loss: 0.3513
Epoch 9/200
 - 2s - loss: 0.3355
Epoch 10/200
 - 2s - loss: 0.3215
Epoch 11/200
 - 2s - loss: 0.3066
Epoch 12/200
 - 2s - loss: 0.2953
Epoch 13/200
 - 2s - loss: 0.2811
Epoch 14/200
 - 2s - loss: 0.2678
Epoch 15/200
 - 2s - loss: 0.2543
Epoch 16/200
 - 2s - loss: 0.2429
Epoch 17/200
 - 2s - loss: 0.2324
Epoch 18/200
 - 2s - loss: 0.2206
Epoch 19/200
 - 2s - loss: 0.2077
Epoch 20/200
 - 2s - loss: 0.1977
Epoch 21/200
 - 2s - loss: 0.1876
Epoch 22/200
 - 2s - loss: 0.1771
Epoch 23/200
 - 2s - loss: 0.1666
Epoch 24/200
 - 2s - loss: 0.1570
Epoch 25/200
 - 2s - loss: 0.1469
Epoch 26/200
 - 2s - loss: 0.1370
Epoch 27/200
 - 2s - loss: 0.1272
Epoch 28/200
 - 2s - loss: 0.1180
Epoch 29/200
 - 2s - loss: 0.1079
Epoch 30/200
 - 2s - lo

## Summary:

In this notebook, we implemented LSTM recurrent neural networks in Python, using the Keras and TensorFlow deep learning libraries, for time series prediction of the log error, log(Zestimate) - log(sale price), using the 2016 property dataset and the corresponding log error values provided by Zillow for home value prediction.

First, we fixed the random seed for reproducibility, normalized the dataset, and split it into training (90%) and test (10%) sets, then converted each array of values into a dataset matrix. An LSTM network was then built with 1 input and a hidden layer of 4 LSTM units to predict the next log error from the current one, starting from randomly initialized weights and trained with various epochs, optimizers, activation functions, and loss functions. With epochs=100, the activation functions sigmoid, relu, and tanh, the adam optimizer, and the mean_squared_error or logcosh loss, the model gives the best RMSE of 0.16 on the training data and 0.15 on the test data. The hinge loss gave much larger errors, and setting the kernel initializers explicitly (normal, random uniform) with 200 epochs did not improve the RMSE.


#### Reference:
https://www.sciencedirect.com/science/article/pii/S0377221703005484