# T81-558: Applications of Deep Neural Networks
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), School of Engineering and Applied Science, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

**Module 10 Assignment: Time Series Neural Network**

**Student Name: Julia Huang**

# Assignment Instructions

For this assignment you will use an LSTM to predict a time series contained in the data file **[series-31-num.csv](https://data.heatonresearch.com/data/t81-558/datasets/series-31-num.csv)**.  The code that you will use to complete this will be similar to the sunspots example from the course module.  This data set contains two columns: *time* and *value*.  Create an LSTM network and train it with a sequence size of 5 and a prediction window of 1.  If you use a different sequence size, you will not have the correct number of submission rows. The data set is fairly simple, so once the network is trained you should easily be able to get an RMSE below 1.0.  FYI, I generated this data set by fitting a cubic spline to a series of random points.

This is a time series data set, so do not randomize the order of the rows!  For your training data, use all *time* values less than 3000; for your test data, use the remaining values, those greater than or equal to 3000. For the submit file, send me the results of your test evaluation.  You should have two columns: *time* and *value*.  The column *time* should be the time at the beginning of each predicted sequence. The *value* should be the next value that was predicted for each of your sequences.

Your submission file will look similar to:

|time|value|
|-|-|
|3000|37.022846|
|3001|37.030582|
|3002|37.03816|
|3003|37.045563|
|3004|37.0528|
|...|...|
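
The row count implied by these instructions can be checked with a little arithmetic. The following is a minimal sketch, assuming the test split holds the 1000 *time* values from 3000 through 3999; `window_starts` is a hypothetical helper name used only for illustration and is not part of the course code.

```python
# Hypothetical helper: list the starting time of every window of
# seq_size values that still has one value after it to predict.
def window_starts(times, seq_size):
    return [times[i] for i in range(len(times) - seq_size)]

test_times = list(range(3000, 4000))   # assumed test split: 3000..3999
starts = window_starts(test_times, seq_size=5)

print(len(starts))      # number of submission rows: 995
print(starts[0])        # time at the start of the first window: 3000
print(starts[0] + 5)    # time of the value the first window predicts: 3005
```

With a sequence size of 5, the last 5 test rows cannot start a full window that is followed by a target, which is why a different sequence size changes the number of submission rows.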

# Assignment Submit Function

You will submit the 10 programming assignments electronically.  The following submit function can be used to do this.  My server will perform a basic check of each assignment and let you know if it sees any basic problems. 

**It is unlikely that you will need to modify this function.**

In [0]:
import base64
import os
import numpy as np
import pandas as pd
import requests

# This function submits an assignment.  You can submit an assignment as many times as you like; only the final
# submission counts.  The parameters are as follows:
# data - Pandas dataframe output.
# key - Your student key that was emailed to you.
# no - The assignment class number, should be 1 through 10.
# source_file - The full path to your Python or IPYNB file.  This must have "_class1" as part of its name.
#               The number must match your assignment number.  For example "_class2" for class assignment #2.
def submit(data,key,no,source_file=None):
    if source_file is None and '__file__' not in globals(): raise Exception('Must specify a filename when using a Jupyter notebook.')
    if source_file is None: source_file = __file__
    suffix = '_class{}'.format(no)
    if suffix not in source_file: raise Exception('{} must be part of the filename.'.format(suffix))
    with open(source_file, "rb") as image_file:
        encoded_python = base64.b64encode(image_file.read()).decode('ascii')
    ext = os.path.splitext(source_file)[-1].lower()
    if ext not in ['.ipynb','.py']: raise Exception("Source file is {} must be .py or .ipynb".format(ext))
    r = requests.post("https://api.heatonresearch.com/assignment-submit",
        headers={'x-api-key':key}, json={'csv':base64.b64encode(data.to_csv(index=False).encode('ascii')).decode("ascii"),
        'assignment': no, 'ext':ext, 'py':encoded_python})
    if r.status_code == 200:
        print("Success: {}".format(r.text))
    else: print("Failure: {}".format(r.text))
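
The two filename checks inside `submit` can be tried out before anything is sent to the server. The following is a minimal sketch; `check_source_file` is a hypothetical helper introduced here for illustration, not part of the course code, and it only mirrors the suffix and extension tests shown above.

```python
import os

# Hypothetical helper (not part of the course code): repeat the two
# filename checks from submit() without contacting the server.
def check_source_file(source_file, no):
    suffix = '_class{}'.format(no)
    if suffix not in source_file:
        return False
    ext = os.path.splitext(source_file)[-1].lower()
    return ext in ['.ipynb', '.py']

print(check_source_file('assignment_yourname_class10.ipynb', 10))  # True
print(check_source_file('assignment_yourname_class10.txt', 10))    # False
print(check_source_file('assignment_yourname_class1.ipynb', 10))   # False
```

Because the suffix test is a substring match, a file named `..._class10.ipynb` would also satisfy the check for assignment 1, so it is worth confirming that the `no` argument matches the notebook you intend to send.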

# Google CoLab Instructions

If you are using Google CoLab, it will be necessary to mount your GDrive so that you can send your notebook during the submit process.  Running the following code will map your GDrive to /content/drive.

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
!ls /content/drive/My\ Drive/Colab\ Notebooks

assignment_hjulia_class7.ipynb	 assignment_jhuang_class8.ipynb
assignment_jhuang_class10.ipynb  assignment_juliahuang_class3.ipynb
assignment_jhuang_class1.ipynb	 assignment_ZachWang_class9.ipynb
assignment_jhuang_class2.ipynb	 Kaggle_features_v1.ipynb
assignment_jhuang_class3.ipynb	 Kaggle_preprocessing_v1.ipynb
assignment_jhuang_class4.ipynb	 Untitled
assignment_jhuang_class5.ipynb	 Untitled0.ipynb
assignment_jhuang_class6.ipynb


# Assignment #10 Sample Code

The following code provides a starting point for this assignment.

In [13]:
import pandas as pd
import os

    

# This is your student key that I emailed to you at the beginning of the semester.
key = "Yg3Uc8sn118A6HaWAFSKG5g1Th1nOyw34jLD5Uh8"  # This is an example key and will not work.

# You must also identify your source file.  (modify for your local setup)
file='/content/drive/My Drive/Colab Notebooks/assignment_jhuang_class10.ipynb'  # Google CoLab
# file='C:\\Users\\jeffh\\projects\\t81_558_deep_learning\\assignments\\assignment_yourname_class10.ipynb'  # Windows
#file='/Users/jheaton/projects/t81_558_deep_learning/assignments/assignment_yourname_class10.ipynb'  # Mac/Linux

# Read from time series file
df = pd.read_csv("https://data.heatonresearch.com/data/t81-558/datasets/series-31-num.csv")


print("Starting file:")
print(df[0:10])

print("Ending file:")
print(df[-10:])




Starting file:
   time      value
0     0  10.000000
1     1  10.050953
2     2  10.101758
3     3  10.152415
4     4  10.202924
5     5  10.253286
6     6  10.303499
7     7  10.353566
8     8  10.403485
9     9  10.453256
Ending file:
      time      value
3990  3990  14.694572
3991  3991  14.727313
3992  3992  14.760351
3993  3993  14.793687
3994  3994  14.827322
3995  3995  14.861256
3996  3996  14.895491
3997  3997  14.930026
3998  3998  14.964862
3999  3999  15.000000


In [14]:
df_train = df[df['time']<3000]
df_test = df[df['time']>=3000]

series_train = df_train['value'].tolist()
series_test = df_test['value'].tolist()

print("Training set has {} observations.".format(len(series_train)))
print("Test set has {} observations.".format(len(series_test)))


Training set has 3000 observations.
Test set has 1000 observations.


In [47]:
time=df_test.time
time

3000    3000
3001    3001
3002    3002
3003    3003
3004    3004
        ... 
3995    3995
3996    3996
3997    3997
3998    3998
3999    3999
Name: time, Length: 1000, dtype: int64

In [37]:
import numpy as np

# Slice a series into overlapping windows of seq_size values (x) and
# the single value that follows each window (y).
def to_sequences(seq_size, obs):
    x = []
    y = []

    for i in range(len(obs)-seq_size):
        window = obs[i:(i+seq_size)]
        after_window = obs[i+seq_size]
        # Wrap each value in a list so x has shape (samples, seq_size, 1).
        window = [[v] for v in window]
        x.append(window)
        y.append(after_window)
        
    return np.array(x),np.array(y)

SEQUENCE_SIZE = 5
x_train,y_train = to_sequences(SEQUENCE_SIZE,series_train)
x_test,y_test = to_sequences(SEQUENCE_SIZE,series_test)

print("Shape of training set: {}".format(x_train.shape))
print("Shape of test set: {}".format(x_test.shape))

Shape of training set: (2995, 5, 1)
Shape of test set: (995, 5, 1)
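
The shapes above follow the (samples, time steps, features) layout that the LSTM layer expects. As an alternative to the Python loop, the same windows can be produced without looping. The following is a minimal sketch, assuming NumPy 1.20 or later (for `sliding_window_view`); `to_sequences_vectorized` is a hypothetical name and not part of the course code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical vectorized variant of to_sequences (illustration only).
def to_sequences_vectorized(seq_size, obs):
    obs = np.asarray(obs, dtype=float)
    # Every window of length seq_size; drop the last one because
    # no value follows it to serve as a target.
    windows = sliding_window_view(obs, seq_size)[:-1]
    x = windows[..., np.newaxis]   # shape: (samples, seq_size, 1)
    y = obs[seq_size:]             # the value right after each window
    return x, y

x_demo, y_demo = to_sequences_vectorized(5, range(10))
print(x_demo.shape)   # (5, 5, 1)
print(y_demo)         # the targets 5.0 through 9.0
```

A series of length n yields n - 5 windows, which matches the 2995 training and 995 test samples reported above.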


In [38]:

x_train

array([[[10.        ],
        [10.05095321],
        [10.10175826],
        [10.15241526],
        [10.20292437]],

       [[10.05095321],
        [10.10175826],
        [10.15241526],
        [10.20292437],
        [10.25328571]],

       [[10.10175826],
        [10.15241526],
        [10.20292437],
        [10.25328571],
        [10.30349943]],

       ...,

       [[36.47831158],
        [36.49049175],
        [36.50246712],
        [36.51423794],
        [36.52580449]],

       [[36.49049175],
        [36.50246712],
        [36.51423794],
        [36.52580449],
        [36.537167  ]],

       [[36.50246712],
        [36.51423794],
        [36.52580449],
        [36.537167  ],
        [36.54832574]]])

In [39]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.callbacks import EarlyStopping
import numpy as np

print('Build model...')
model = Sequential()
model.add(LSTM(64, dropout=0.0, recurrent_dropout=0.0,input_shape=(None, 1)))
model.add(Dense(32))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, verbose=1, mode='auto', restore_best_weights=True)
print('Train...')

model.fit(x_train,y_train,validation_data=(x_test,y_test),callbacks=[monitor],verbose=2,epochs=1000)

Build model...
Train...
Train on 2995 samples, validate on 995 samples
Epoch 1/1000
2995/2995 - 2s - loss: 59.9661 - val_loss: 12.6534
Epoch 2/1000
2995/2995 - 1s - loss: 0.3260 - val_loss: 0.3293
Epoch 3/1000
2995/2995 - 1s - loss: 0.0192 - val_loss: 0.1103
Epoch 4/1000
2995/2995 - 1s - loss: 0.0074 - val_loss: 0.0576
Epoch 5/1000
2995/2995 - 1s - loss: 0.0051 - val_loss: 0.0437
Epoch 6/1000
2995/2995 - 1s - loss: 0.0029 - val_loss: 0.0260
Epoch 7/1000
2995/2995 - 1s - loss: 0.0030 - val_loss: 0.0202
Epoch 8/1000
2995/2995 - 1s - loss: 0.0023 - val_loss: 0.0340
Epoch 9/1000
2995/2995 - 1s - loss: 0.0025 - val_loss: 0.0186
Epoch 10/1000
2995/2995 - 1s - loss: 0.0024 - val_loss: 0.0216
Epoch 11/1000
2995/2995 - 1s - loss: 0.0020 - val_loss: 0.0155
Epoch 12/1000
2995/2995 - 1s - loss: 0.0020 - val_loss: 0.0224
Epoch 13/1000
2995/2995 - 1s - loss: 0.0019 - val_loss: 0.0191
Epoch 14/1000
2995/2995 - 1s - loss: 0.0023 - val_loss: 0.0164
Epoch 15/1000
2995/2995 - 1s - loss: 0.0017 - val_loss

<tensorflow.python.keras.callbacks.History at 0x7fcf3d795fd0>

In [40]:
from sklearn import metrics

pred = model.predict(x_test)
score = np.sqrt(metrics.mean_squared_error(y_test,pred))
print("Score (RMSE): {}".format(score))


Score (RMSE): 0.08435883438478084
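
The score is the square root of the mean squared difference between the targets and the predictions. The following is a minimal sketch that computes it by hand with NumPy; the `rmse` helper and the three sample values are made up for illustration and are not taken from the assignment data.

```python
import numpy as np

# Hypothetical helper: root mean squared error computed by hand.
def rmse(y_true, y_pred):
    # Flatten both inputs first. Predictions come back with shape (n, 1)
    # while targets have shape (n,); subtracting them unflattened would
    # broadcast to an (n, n) matrix and give a wrong score.
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up values: errors of 0, 0 and 2 give sqrt(4/3), about 1.155.
print(rmse([1.0, 2.0, 3.0], [[1.0], [2.0], [5.0]]))
```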


In [61]:
pred.shape

(995, 1)

In [66]:
df_submit=pd.DataFrame(pred, columns=["value"])
df_submit.index = df_submit.index + 3000
df_submit=pd.concat([time, df_submit], axis=1)
df_submit.dropna(inplace=True)
df_submit

      time      value
3000  3000  36.437775
3001  3001  36.444473
3002  3002  36.450966
3003  3003  36.457272
3004  3004  36.463440
...    ...        ...
3990  3990  14.752495
3991  3991  14.785945
3992  3992  14.819691
3993  3993  14.853735
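
The same two-column frame can also be built in one step, pairing each prediction with the time at the start of its window. The following is a minimal sketch; `build_submission` is a hypothetical helper, and the demo times and predictions are made up for illustration rather than taken from the model above.

```python
import numpy as np
import pandas as pd

# Hypothetical helper: pair each prediction with the time at the
# start of the window that produced it.
def build_submission(times, pred):
    pred = np.asarray(pred).ravel()          # (n, 1) -> (n,)
    times = np.asarray(times)[:len(pred)]    # keep one start time per prediction
    return pd.DataFrame({'time': times, 'value': pred})

demo_times = np.arange(3000, 3010)                       # 10 made-up test times
demo_pred = np.linspace(36.0, 36.4, 5).reshape(-1, 1)    # 5 made-up predictions
demo = build_submission(demo_times, demo_pred)
print(demo)
```

Truncating the time values to the number of predictions plays the same role as the `concat` followed by `dropna` in the cell above: the trailing times that never start a full window are discarded.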


In [67]:
submit(source_file=file,data=df_submit,key=key,no=10)

Success: Submitted Assignment #10 for hjulia:
You have submitted this assignment 2 times. (this is fine)

