<a href="https://colab.research.google.com/github/rajanieprabha/HackZurich/blob/master/time_series.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##### Copyright 2019 The TensorFlow Authors.

In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Time series forecasting

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://www.tensorflow.org/tutorials/structured_data/time_series"><img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/structured_data/time_series.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/docs/blob/master/site/en/tutorials/structured_data/time_series.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
  <td>
    <a href="https://storage.googleapis.com/tensorflow_docs/docs/site/en/tutorials/structured_data/time_series.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>

In [0]:
from __future__ import absolute_import, division, print_function, unicode_literals
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import glob
mpl.rcParams['figure.figsize'] = (8, 6)
mpl.rcParams['axes.grid'] = False

In [0]:
from google.colab import drive

drive.mount("/content/gdrive")



In [0]:
# The location to model is chosen by its index into `parking_locations` in the "Read csv files for one location" cell below (currently index 29).


In [0]:
tf.random.set_seed(13)

In [0]:
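parking_locations = []
for filename in glob.glob('/content/gdrive/My Drive/Archive/*.csv'):
  # The location name is the part of the file name before the first "-".
  # Note: glob does not guarantee any particular file order.
  location = os.path.basename(filename).split("-")[0]
  if location not in parking_locations:
    parking_locations.append(location)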

print(parking_locations)
print(len(parking_locations))



## Check list uniqueness

In [0]:
print("The original list is: " + str(parking_locations))

# A list has no duplicates exactly when converting it to a set does not shrink it.
all_unique = len(set(parking_locations)) == len(parking_locations)

if all_unique:
    print("List contains all unique elements")
else:
    print("List does not contain all unique elements")

## Read csv files for one location

In [0]:
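data = []
target_location = parking_locations[29]
for filename in glob.glob('/content/gdrive/My Drive/Archive/*.csv'):
  # Keep only the files whose name starts with the chosen location.
  if os.path.basename(filename).split("-")[0] == target_location:
    print(filename)
    data.append(pd.read_csv(filename, names=['Date', 'free'], index_col='Date', parse_dates=True))

# Stack the per-file frames into one frame, ordered by timestamp.
data = pd.concat(data)
data.sort_index(inplace=True)
print(len(data))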

Let's take a glance at the data.

In [0]:
data.head()


In [0]:

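# Find "out of service" periods: runs of more than 60 consecutive readings
# with 0 free spots (more than 10 hours at one reading every 10 minutes).
MAX_ZERO_RUN = 60

free_values = data['free'].values
remove_these = []   # row positions to drop
run_start = None    # position where the current run of zeros began

for position, free in enumerate(free_values):
  if free == 0:
    if run_start is None:
      run_start = position
  else:
    # A non-zero reading ends the run; flag it if it was long enough.
    if run_start is not None and position - run_start > MAX_ZERO_RUN:
      remove_these.extend(range(run_start, position))
    run_start = None

# A run of zeros may last until the end of the data.
if run_start is not None and len(free_values) - run_start > MAX_ZERO_RUN:
  remove_these.extend(range(run_start, len(free_values)))

print(len(remove_these), "rows flagged as out of service")

In [0]:
# DataFrame.drop returns a new frame, so the result must be assigned back.
# Selecting by position also avoids surprises with duplicate timestamps.
keep = np.ones(len(data), dtype=bool)
keep[remove_these] = False
data = data.iloc[keep]
data.head()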

As you can see above, an observation is recorded every 10 minutes. This means that, for a single hour, you will have 6 observations. Similarly, a single day will contain 144 (6x24) observations. 

Given a specific time, let's say you want to predict the free spots 6 hours in the future. In order to make this prediction, you choose to use 5 days of observations. Thus, you would create a window containing the last 720 (5x144) observations to train the model. Many such configurations are possible, making this dataset a good one to experiment with.
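The window sizes above follow directly from the 10-minute sampling interval. A minimal sketch of that arithmetic (the constant names here are illustrative and are not used elsewhere in the notebook):

```python
# One observation every 10 minutes.
MINUTES_PER_OBSERVATION = 10

obs_per_hour = 60 // MINUTES_PER_OBSERVATION   # 6
obs_per_day = 24 * obs_per_hour                 # 144
history_window = 5 * obs_per_day                # 5 days of history -> 720
target_offset = 6 * obs_per_hour                # 6 hours ahead -> 36

print(obs_per_hour, obs_per_day, history_window, target_offset)  # 6 144 720 36
```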

The function below returns the windows of time described above for the model to train on. The parameter `history_size` is the size of the past window of information. The `target_size` is how far in the future the model needs to learn to predict: for a window covering positions `i - history_size` to `i - 1`, the label is `dataset[i + target_size]`.

In [0]:
data.head()

In [0]:
def univariate_data(dataset, start_index, end_index, history_size, target_size):
  data = []
  labels = []

  start_index = start_index + history_size
  if end_index is None:
    end_index = len(dataset) - target_size

  for i in range(start_index, end_index):
    indices = range(i-history_size, i)
    # Reshape data from (history_size,) to (history_size, 1)
    data.append(np.reshape(dataset[indices], (history_size, 1)))
    labels.append(dataset[i+target_size])
  return np.array(data), np.array(labels)
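To make the indexing concrete, here is a toy run of the same windowing logic on the series 0, 1, ..., 9 with `history_size=3` and `target_size=2`. The helper `make_univariate_windows` is a hypothetical standalone copy of `univariate_data`, repeated only so the snippet runs on its own. Note that the label lands `target_size + 1` steps after the last value in the window.

```python
import numpy as np

def make_univariate_windows(dataset, start_index, end_index, history_size, target_size):
  """Standalone copy of univariate_data, for illustration only."""
  data, labels = [], []
  start_index = start_index + history_size
  if end_index is None:
    end_index = len(dataset) - target_size
  for i in range(start_index, end_index):
    indices = range(i - history_size, i)
    data.append(np.reshape(dataset[indices], (history_size, 1)))
    labels.append(dataset[i + target_size])
  return np.array(data), np.array(labels)

series = np.arange(10)
x, y = make_univariate_windows(series, 0, None, history_size=3, target_size=2)

print(x.shape)                  # (5, 3, 1): five windows of three values each
print(x[0].flatten(), y[0])     # [0 1 2] 5
print(x[-1].flatten(), y[-1])   # [4 5 6] 9
```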

In the following sections, the first 300,000 rows of the data form the training dataset and the remaining rows form the validation dataset. At 144 observations per day, this amounts to about 2,083 days (300,000 / 144) of training data.

In [0]:
TRAIN_SPLIT = 300000
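Because the rows are in time order, the split is chronological: nothing is shuffled before splitting, so every validation row comes after every training row. A minimal sketch of the arithmetic and a sanity check, using a hypothetical helper `chronological_split` that is not part of the notebook (the notebook itself passes `TRAIN_SPLIT` as a start or end index to the windowing functions):

```python
import numpy as np

OBS_PER_DAY = 144          # 6 observations per hour * 24 hours
TRAIN_SPLIT = 300000

print(TRAIN_SPLIT / OBS_PER_DAY)   # about 2083 days of training data

def chronological_split(values, train_split):
  """Hypothetical helper: the first `train_split` rows train, the rest validate."""
  if train_split >= len(values):
    raise ValueError("TRAIN_SPLIT leaves no rows for validation")
  return values[:train_split], values[train_split:]

train, val = chronological_split(np.arange(10), 7)
print(train, val)   # [0 1 2 3 4 5 6] [7 8 9]
```

If a location has fewer than 300,000 rows, the validation set would be empty, so a check like this on `len(dataset)` may be worth running per location.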

The random seed was set earlier with `tf.random.set_seed(13)` to make results more reproducible.

## Part 1: Forecast a univariate time series
First, you will train a model using only a single feature (the number of free spots), and use it to make predictions for that value in the future.

The feature itself is extracted from `data` further below; first, define a few helpers for plotting.

In [0]:
def create_time_steps(length):
  # Time steps relative to the present: -length, ..., -1.
  return list(range(-length, 0))

In [0]:
def show_plot(plot_data, delta, title):
  labels = ['History', 'True Future', 'Model Prediction']
  marker = ['.-', 'rx', 'go']
  time_steps = create_time_steps(plot_data[0].shape[0])
  if delta:
    future = delta
  else:
    future = 0

  plt.title(title)
  for i, x in enumerate(plot_data):
    if i:
      plt.plot(future, plot_data[i], marker[i], markersize=10,
               label=labels[i])
    else:
      plt.plot(time_steps, plot_data[i].flatten(), marker[i], label=labels[i])
  plt.legend()
  plt.xlim([time_steps[0], (future+5)*2])
  plt.xlabel('Time-Step')
  return plt

### Baseline
Before proceeding to train a model, let's first set a simple baseline. Given an input window, the baseline method predicts the next point to be the average of all the observations in the history window.

In [0]:
def baseline(history):
  return np.mean(history)
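To see what this baseline does on a trend, here is a small sketch scoring it with mean absolute error (the same `'mae'` loss the model is compiled with later). The toy windows and labels are made up for illustration: five windows of four values from a steadily rising series, each labelled with the value that follows it.

```python
import numpy as np

def baseline(history):
  # Predict the next value as the mean of the whole history window.
  return np.mean(history)

# Hypothetical toy data from a rising series.
windows = np.arange(20, dtype=float).reshape(5, 4)
labels = windows[:, -1] + 1

predictions = np.array([baseline(w) for w in windows])
mae = np.mean(np.abs(predictions - labels))
print(mae)   # 2.5
```

On a rising series the window mean lags behind the latest values, which is the kind of error a learned model should be able to reduce.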

Let's see if you can beat this baseline using a recurrent neural network.

### Recurrent neural network

A Recurrent Neural Network (RNN) is a type of neural network well-suited to time series data. RNNs process a time series step-by-step, maintaining an internal state summarizing the information they've seen so far. For more details, read the [RNN tutorial](https://www.tensorflow.org/tutorials/sequences/recurrent). In this tutorial, you will use a specialized RNN layer called Long Short-Term Memory ([LSTM](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/layers/LSTM)).
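To illustrate the step-by-step processing, here is a minimal NumPy sketch of a plain (Elman) recurrent cell, which is simpler than an LSTM: the same weights are applied at every time step, and the hidden state `h` carries a summary of everything seen so far. The weights are random placeholders, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 1, 4

# Random placeholder weights; a real layer learns these during training.
W_x = rng.normal(size=(input_dim, hidden_dim))
W_h = rng.normal(size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

def simple_rnn(sequence):
  """Run a plain RNN over a (time_steps, input_dim) sequence; return the final state."""
  h = np.zeros(hidden_dim)              # initial state: nothing seen yet
  for x_t in sequence:                  # one time step at a time
    h = np.tanh(x_t @ W_x + h @ W_h + b)
  return h

sequence = np.array([[0.1], [0.5], [-0.3]])   # 3 time steps, 1 feature
final_state = simple_rnn(sequence)
print(final_state.shape)   # (4,)
```

An LSTM replaces the single `tanh` update with gated updates of a separate cell state, which helps it retain information over long windows.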

The batch size and shuffle buffer size below are used later with `tf.data` to shuffle, batch, and cache the dataset.

In [0]:
BATCH_SIZE = 256
BUFFER_SIZE = 10000



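The following constants control training: `EVALUATION_INTERVAL` is the number of steps per epoch and `EPOCHS` is the number of epochs. Both are overridden with new values before the multi-step model is trained.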

In [0]:
EVALUATION_INTERVAL = 200
EPOCHS = 10


In [0]:
#features_considered = ['p (mbar)', 'T (degC)', 'rho (g/m**3)']

In [0]:
features_considered = ['free']
features = data[features_considered]
features.index = data.index
features.head()

Let's have a look at how the number of free spots varies across time.

In [0]:
features.plot(subplots=True)

The first step is to normalize the dataset using the mean and standard deviation of the training data (the first `TRAIN_SPLIT` rows), so that no information from the validation period leaks into training.

In [0]:
dataset = features.values
# Compute the normalization statistics on the training rows only.
data_mean = dataset[:TRAIN_SPLIT].mean(axis=0)
data_std = dataset[:TRAIN_SPLIT].std(axis=0)
print(data_std)
print(data_mean)

In [0]:
dataset = (dataset-data_mean)/data_std
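Predictions come out in normalized units, so they have to be mapped back with the same statistics (the notebook does this at the end with `* data_std + data_mean`). A small sketch of the round trip, with statistics computed on the training rows only; the toy values and `train_split` are illustrative:

```python
import numpy as np

values = np.array([[10.0], [20.0], [30.0], [40.0], [100.0]])  # toy free-spot counts
train_split = 4

# Statistics from the training rows only, so validation data does not leak in.
mean = values[:train_split].mean(axis=0)
std = values[:train_split].std(axis=0)

normalized = (values - mean) / std
restored = normalized * std + mean

print(mean, std)
print(np.allclose(restored, values))   # True
```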

In [0]:
print(dataset)
print(len(dataset))

### Single step model
In a single step setup, the model learns to predict a single point in the future based on some history provided.

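The function below performs the same windowing task as `univariate_data` above; however, here it samples the past observations based on the given step size, and it can return either a single future value (`single_step=True`) or a sequence of future values.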

In [0]:
def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size, step, single_step=False):
  data = []
  labels = []

  start_index = start_index + history_size
  if end_index is None:
    end_index = len(dataset) - target_size

  for i in range(start_index, end_index):
    indices = range(i-history_size, i, step)
    data.append(dataset[indices])

    if single_step:
      labels.append(target[i+target_size])
    else:
      labels.append(target[i:i+target_size])

  return np.array(data), np.array(labels)
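The index arithmetic inside `multivariate_data` can be checked on a toy series whose values equal their positions. The snippet below reproduces one loop iteration by hand rather than calling `multivariate_data`. Note that the single-step label lies `target_size` steps after position `i`, while the multi-step labels start at `i` itself.

```python
import numpy as np

series = np.arange(20)            # toy series where value == position
history_size, step, target_size = 8, 2, 3
i = 10                            # the window ends just before position i

window = series[range(i - history_size, i, step)]   # sampled history
single_step_label = series[i + target_size]         # single_step=True
multi_step_label = series[i:i + target_size]        # single_step=False

print(window)              # [2 4 6 8]
print(single_step_label)   # 13
print(multi_step_label)    # [10 11 12]
```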

Here, the network is shown the last 1,000 observations (about 6.9 days), sampled once per hour (`STEP = 6`). The sampling is done every hour since a drastic change is not expected within 60 minutes. Thus, each input window contains 167 observations. For the single step prediction model, the label for a data point is the number of free spots 12 hours into the future. To create this label, the value 72 (12x6) observations later is used.
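The resulting input shape can be checked with a little arithmetic, using the values set in the cell below:

```python
import math

past_history = 1000   # observations of history (10 minutes apart)
STEP = 6              # keep one observation per hour
future_target = 72    # 12 hours * 6 observations per hour

timesteps = len(range(0, past_history, STEP))
print(timesteps)                         # 167
print(math.ceil(past_history / STEP))    # 167
print(past_history / 144)                # history length in days, about 6.9
```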

In [0]:
past_history = 1000
future_target = 72
STEP = 6


The helper below plots the training and validation loss recorded during training.


In [0]:
def plot_train_history(history, title):
  loss = history.history['loss']
  val_loss = history.history['val_loss']

  epochs = range(len(loss))

  plt.figure()

  plt.plot(epochs, loss, 'b', label='Training loss')
  plt.plot(epochs, val_loss, 'r', label='Validation loss')
  plt.title(title)
  plt.legend()

  plt.show()

### Multi-Step model
In a multi-step prediction model, given a past history, the model needs to learn to predict a range of future values. Thus, unlike a single step model, where only a single future point is predicted, a multi-step model predicts a sequence of future values.

For the multi-step model, the training data again consists of the past 1,000 observations sampled every hour. However, here the model needs to learn to predict the number of free spots for the next 12 hours. Since an observation is taken every 10 minutes, the output is 72 predictions. For this task, the dataset is windowed with `multivariate_data` using the default `single_step=False`, so each label is a sequence of 72 future values.

In [0]:
future_target = 72
x_train_multi, y_train_multi = multivariate_data(dataset, dataset[:,0], 0,
                                                 TRAIN_SPLIT, past_history,
                                                 future_target, STEP)
x_val_multi, y_val_multi = multivariate_data(dataset, dataset[:,0],
                                             TRAIN_SPLIT, None, past_history,
                                             future_target, STEP)
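Each window yields one example, so the number of examples follows from the start and end indices passed above. A sketch with a hypothetical helper `window_counts` and a made-up row count (the real number depends on the location's data); `STEP` changes how many points are in each window, not how many windows there are.

```python
def window_counts(n_rows, train_split, past_history, future_target):
  """Hypothetical helper: windows multivariate_data yields for each split."""
  # Training: i runs from past_history up to train_split.
  train_windows = train_split - past_history
  # Validation: i runs from train_split + past_history up to n_rows - future_target.
  val_windows = (n_rows - future_target) - (train_split + past_history)
  return train_windows, max(val_windows, 0)

print(window_counts(400000, 300000, 1000, 72))   # (299000, 98928)
print(window_counts(300500, 300000, 1000, 72))   # (299000, 0): no validation data
```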

Let's check out a sample data-point.

In [0]:
print('Single window of past history : {}'.format(x_train_multi[0].shape))
print('\n Target free spots to predict : {}'.format(y_train_multi[0].shape))

In [0]:
BATCH_SIZE = 256
BUFFER_SIZE = 10000
train_data_multi = tf.data.Dataset.from_tensor_slices((x_train_multi, y_train_multi))
train_data_multi = train_data_multi.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()

val_data_multi = tf.data.Dataset.from_tensor_slices((x_val_multi, y_val_multi))
val_data_multi = val_data_multi.batch(BATCH_SIZE).repeat()

Plotting a sample data-point.

In [0]:
def multi_step_plot(history, true_future, prediction):
  plt.figure(figsize=(12, 6))
  num_in = create_time_steps(len(history))
  num_out = len(true_future)
  plt.plot(num_in, np.array(history[:,0]), label='History')
  plt.plot(np.arange(num_out)/STEP, np.array(true_future), 'bo',
           label='True Future')
  if prediction.any():
    plt.plot(np.arange(num_out)/STEP, np.array(prediction), 'ro',
             label='Predicted Future')
  plt.legend(loc='upper left')
  plt.show()

In this plot and subsequent similar plots, the history is sampled every hour, while the 72 future points are 10 minutes apart; the x-axis is in hours relative to the end of the history.

In [0]:
print(train_data_multi)
for x, y in train_data_multi:
  multi_step_plot(x[0], y[0], np.array([0]))
  break

Since the task here is a bit more complicated than the previous task, the model now consists of four stacked LSTM layers. Finally, since 72 predictions are made, the dense layer outputs 72 values.

In [0]:

multi_step_model = tf.keras.models.Sequential()
multi_step_model.add(tf.keras.layers.LSTM(32,
                                          return_sequences=True,
                                          input_shape=x_train_multi.shape[-2:]))
                                   
multi_step_model.add(tf.keras.layers.LSTM(32, return_sequences=True))
multi_step_model.add(tf.keras.layers.LSTM(32, return_sequences=True))
multi_step_model.add(tf.keras.layers.LSTM(16))
multi_step_model.add(tf.keras.layers.Dense(72))

multi_step_model.compile(optimizer=tf.keras.optimizers.RMSprop(clipvalue=1.0), loss='mae')
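To see how the stacked layers fit together: the first three LSTM layers use `return_sequences=True`, so each passes a full sequence of shape `(batch, 167, 32)` to the next; the last LSTM returns only its final state `(batch, 16)`, which the dense layer maps to 72 outputs. The sketch below counts the trainable parameters per layer, assuming the default Keras weight layout (kernel, recurrent kernel and bias, with `use_bias=True`) and one input feature. The total should match what `multi_step_model.summary()` reports.

```python
def lstm_params(units, input_dim):
  # Kernel (input_dim x 4*units), recurrent kernel (units x 4*units)
  # and bias (4*units): one slice per gate.
  return 4 * units * (input_dim + units) + 4 * units

def dense_params(units, input_dim):
  return input_dim * units + units

layers = [
  ("LSTM(32), return_sequences=True", lstm_params(32, 1)),
  ("LSTM(32), return_sequences=True", lstm_params(32, 32)),
  ("LSTM(32), return_sequences=True", lstm_params(32, 32)),
  ("LSTM(16)", lstm_params(16, 32)),
  ("Dense(72)", dense_params(72, 16)),
]
for name, count in layers:
  print(f"{name:35s}{count:>8d}")

total = sum(count for _, count in layers)
print("Total:", total)   # Total: 25352
```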

Before training, check the shape of the model's predictions: one row of 72 values per example in the batch.

In [0]:
for x, y in val_data_multi.take(1):
  print (multi_step_model.predict(x).shape)

In [0]:
EPOCHS = 100
EVALUATION_INTERVAL = 500

multi_step_history = multi_step_model.fit(train_data_multi, epochs=EPOCHS,
                                          steps_per_epoch=EVALUATION_INTERVAL,
                                          validation_data=val_data_multi,
                                          validation_steps=50)
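Because both datasets are repeated indefinitely with `.repeat()`, an "epoch" here is defined by `steps_per_epoch` rather than by one pass over the data. A quick sketch of how many windows that covers, using the values from the cell above (`VALIDATION_STEPS` is a name introduced here for the literal `50`; counts are approximate because the last batch of each pass over the data can be smaller):

```python
BATCH_SIZE = 256
EVALUATION_INTERVAL = 500   # passed as steps_per_epoch
VALIDATION_STEPS = 50       # passed as validation_steps
EPOCHS = 100

train_windows_per_epoch = EVALUATION_INTERVAL * BATCH_SIZE
val_windows_per_epoch = VALIDATION_STEPS * BATCH_SIZE
print(train_windows_per_epoch)               # 128000
print(val_windows_per_epoch)                 # 12800
print(EPOCHS * train_windows_per_epoch)      # 12800000
```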

In [0]:
plot_train_history(multi_step_history, 'Multi-Step Training and validation loss')

In [0]:
print(val_data_multi)

In [0]:
print(x[1])

#### Predict a multi-step future
Let's now have a look at how well your network has learnt to predict the future.

In [0]:
# val_data_multi repeats forever, so take a fixed number of batches.
for x, y in val_data_multi.take(3):
  multi_step_plot(x[0], y[0], multi_step_model.predict(x)[0])

In [0]:
multi_step_model.save('/content/gdrive/My Drive/zuerichparkhaushardauii.h5')

In [0]:
new_model = tf.keras.models.load_model('/content/gdrive/My Drive/zuerichparkhaushardauii.h5')

# Smoke test with a random window of free-spot counts in the original units.
# The window length must match what the model was trained on.
window_length = new_model.input_shape[1]
x_test = np.random.uniform(0, 200, size=(1, window_length, 1))
y_test = np.random.uniform(0, 200, size=(1, future_target))

# The model was trained on normalized data: normalize the input, then map
# the prediction back to free-spot counts.
x_test_norm = (x_test - data_mean) / data_std
prediction = new_model.predict(x_test_norm)[0] * data_std + data_mean
multi_step_plot(x_test[0], y_test[0], prediction)



#p = (new_model.predict(x_test, steps=1))
#print(p*data_std+data_mean)
#multi_step_plot(x_test, y_test, p)