# Workbook on time-series prediction with different Neural Models

## Aims:
- to give you familiarity with using the keras Dataset class for time series data   
   via the util function ```keras.utils.timeseries_dataset_from_array()```
- to give you experience of configuring and running models that are split into a _feature encoder_ and a _prediction head_
- to provide a springboard for you to explore different neural models for sequence prediction problems

## Note: Python Naming Convention
- We've used a naming convention similar to the one used in keras/tensorflow
- i.e. `names_with_underscores_separating_words`

In [None]:
import socket
import pandas as pd
import numpy as np
import logging

import tensorflow as tf
import tensorflow.keras as keras
tf.get_logger().setLevel('ERROR')
logging.getLogger('tensorflow').setLevel(logging.ERROR)

import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

# Part 1: Introducing the keras [timeseries data set from array function](https://keras.io/api/preprocessing/timeseries)

##  A simple example that fits on-screen
To illustrate this, let's make a simple data set with four features and 10 time steps.

To make it obvious what is happening, we'll assume:
- the first feature has the values 1,2,...10, 
- the second feature has the values 10,20,...
- the third feature has the values 100,200 ... 
- and so on


Now to make it easy to visualise on screen let's say:
- we are using a window of size 3,
- we are trying to predict the first column, one day in advance
- i.e. at time _t_ 
  - we give the model x[t], x[t-1], and x[t-2] 
  - and we want to predict x[t+1][0] 

### 1.1 Run the next cell to make the data in numpy and display it onscreen

In [None]:
dataset = np.zeros([10,4],dtype='int32')
for row in range(10):
    val=row+1
    dataset[row][0] = val 
    dataset[row][1] = val*10
    dataset[row][2] = val*100
    dataset[row][3] = val*1000
dataset

### 1.2 Now we will use a Keras built in function to give us a tensorflow dataset object

Things to note:
- Keras uses the term _sequence_length_ as a synonym for *window size*
- you define the target (y value to predict) through the `targets` array: slicing it from row `window_size` onwards makes each window's target the row that follows it, and the column index says *which column* to predict.
- you can define how many steps forwards to take between samples (the *stride* - same as for 2D convolutional networks)
- for now we will set the *batch_size* to be 1 - later we will see how to change this dynamically
- **you will normally set shuffle=False** - otherwise you lose the temporal ordering of the windows.

**Run the cell below to create the Keras Dataset object**

In [None]:
window_size = 3
timeseries = keras.utils.timeseries_dataset_from_array(
    dataset,
    sequence_length=window_size, # lets take sequences of length 3
    targets=dataset[window_size:,0],# we want to predict feature[0] on the next input after our sequence
    sequence_stride=1,
    sampling_rate=1,
    batch_size=1,
    shuffle=False,
    seed=None,
    start_index=None,
    end_index=None,
)

## What's in a dataset?
Lots of stuff and lots of functionality!

Datasets are designed to be used as part of a production pipeline.  
So instead of getting access via indexes (like you would for pandas or numpy), they
provide access by iteration, with methods such as _batch()_ or _take()_ to control what you iterate over

For now, it's easiest to look at what this dataset object contains if we convert it to a list then print it out.

**Note** This code throws out a warning that it is reaching the end of the sequence and cannot process any more data.
- This seems to be perfectly normal - there is lots of discussion online about it.
- setting the warning flags to remove this warning does not seem to work

In [None]:
list(timeseries)

### Yuk!
We can just about see that in list form, the timeseries dataset contains 7 pairs of items.
- there are 8 possible windows of length 3 in 10 rows, but the last one has no following row to use as a target, which leaves 7

Each item  has:
- a tensor of shape (batchsize (1), sequence length (3), num_features(4))
- a scalar value (again wrapped up inside a tensor) 
  - because we only asked to predict 1 feature (the one at index 0)
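
One way to check your understanding of the 7 pairs is to build the same windows by hand. The cell below is a minimal NumPy sketch, not the Keras implementation: `toy`, `windows` and `targets` are names invented here, and it assumes a stride and sampling rate of 1.

```python
import numpy as np

# toy data: 10 time steps, 4 features (1, 10, 100 and 1000 times the row number)
toy = np.arange(1, 11).reshape(-1, 1) * np.array([1, 10, 100, 1000])

window_size = 3
# a window is only usable if there is still a row after it to act as the target
num_pairs = len(toy) - window_size
windows = np.stack([toy[i:i + window_size] for i in range(num_pairs)])
targets = toy[window_size:, 0]   # feature 0 of the row that follows each window

print(windows.shape)   # (7, 3, 4)
print(windows[0])      # rows 1, 2 and 3 of the toy data
print(targets)         # the values 4 to 10
```

Apart from the extra batch dimension, each `windows[i]` and `targets[i]` should match one of the pairs in the list printed above.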
  

### For you to experiment
Try  asking for
- different length sequences (line 1)
- or different size batches (line 8), 
- or for more than one feature as a label(line 5) 

then re-run the two code cells above and make sure you understand what you are getting

## But a tensorflow dataset does have some advantages
- for example we can ask it to batch up the data
- and tensorflow preprocessing layers or a model's fit() method will consume it one batch at a time

So let's ask our timeseries to give us a load of batches

Note that batch() and take() return new dataset objects (e.g. a BatchDataset), so we have to iterate over their contents using
```
for item in timeseries.batch(batch_size=1):
```

instead of using slices like we would for numpy arrays or pandas dataframes
- our timeseries is already batched (with batch size 1), so calling batch() again groups whole batches together and adds an extra leading dimension

In [None]:
print('First batches of size 1')
for item in timeseries.batch(batch_size=1):
    print(f'{item[:-1]} : {item[-1]}')
    

In [None]:
print('Now batches of size 2')
x2= timeseries.batch(batch_size=2)
for item in x2:
    print(f'x= {item[:-1]} \n y= {item[-1]}')
print('The last batch only has one thing in, because there are only 7 sequences of length 3 in 0...9') 

## Finally, this is one way to flatten one of these items into a single row using a helper function

Let's take the first batch as an example and turn it into a 'windowed' row of size (sequence length * number of features) and a scalar (the label).

As you can see, the function *flatten_dummy_sequence()* uses some reshaping and indexing into the tensors.

This can take a while to get your head around, so it's useful to implement and test your code with *toy* data designed so you can easily spot if you are pulling out the right things.


In [None]:
def flatten_dummy_sequence(item):
    # item is an (X, y) pair; X has shape (batch, sequence_length, num_features)
    item_shape = item[0].shape
    # size of one windowed row: sequence_length * num_features
    num_windowed_features = item_shape[-2] * item_shape[-1]
    # pass the new shapes as lists so they are unambiguous 1-D shapes
    X = tf.reshape(item[0], [num_windowed_features])
    y = tf.reshape(item[1], [1])
    return X, y

my_iterator = iter(timeseries)
first_item = my_iterator.get_next()
X,y = flatten_dummy_sequence(first_item) 
print(f'when flattened the first item is:\n {X} : {y}')
X,y = flatten_dummy_sequence(my_iterator.get_next())
print(f'when flattened the next item is:\n {X} : {y}')
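
The helper above returns tensors. If you want genuine NumPy arrays, one option is the dataset's `as_numpy_iterator()` method. The cell below is a small sketch of that route. It rebuilds the toy data so that it runs on its own, and the names `toy` and `toy_series` are invented for the illustration.

```python
import numpy as np
import tensorflow.keras as keras

# same toy data as in section 1.1: 10 rows, 4 features
toy = np.arange(1, 11).reshape(-1, 1) * np.array([1, 10, 100, 1000])
toy_series = keras.utils.timeseries_dataset_from_array(
    toy, targets=toy[3:, 0], sequence_length=3, batch_size=1)

# take(2) limits us to the first two batches,
# as_numpy_iterator() hands each one back as NumPy arrays
for X_batch, y_batch in toy_series.take(2).as_numpy_iterator():
    print(type(X_batch), X_batch.reshape(-1), y_batch)
```

Once the data is in NumPy form, the usual `reshape()` and slicing tools are available again.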


# Part 2: Comparing three different neural architectures for a time-series prediction problem

## Data set description and characteristics
Delhi data from [here](https://www.kaggle.com/datasets/sumanthvrao/daily-climate-time-series-data)

4 variables: temp, humidity, windspeed, pressure
Training set: 


## 2.1 Start by loading and cleaning up the data

In [None]:
# set path to data
if (socket.gethostname()=='csctcloud'): #on csctcloud
    datapath="/home/common/datasets"
elif (socket.gethostname()[0:7]=='jupyter'): #hostname starts with 'jupyter'
    datapath="~/shared/datasets"
else: #machine specific- this is for jim's development
    datapath = "../datasets"
reldirname= datapath +"/delhi/"


#load data into pandas dataframes
train_raw = pd.read_csv(reldirname +"DailyDelhiClimateTrain.csv")
test_raw =  pd.read_csv(reldirname +"DailyDelhiClimateTest.csv")

### Print some descriptive statistics

In [None]:
#print some statistics
print (f'training data set has {train_raw.shape[0]} '
       f'rows and {train_raw.shape[1]} features\n'
       f' it has {train_raw.isna().sum().sum()} nulls\n'
       f'    test data set has  {test_raw.shape[0]} '
       f'rows and {test_raw.shape[1]} features\n'
       f' it has {test_raw.isna().sum().sum()} nulls\n'
      )

print(f'column names are {train_raw.columns}')

#easier to read and use the pandas index item if we convert it to a list
print(f' or more nicely when converted from pandas index to a list:\n {list(train_raw.columns)}')

#use the pandas function to provide stats about the numerical features in the raw data
train_raw.describe()

### Identify and fix the outliers
What stands out immediately is that there are some weird outliers in the meanpressure column.

If it turns out that these are isolated odd readings we will replace them by the 50th centile value (the median)

It's cleanest to do this using the pandas ```.loc[row,col]``` syntax   
 using a condition for the row 
 - look this up if you've not come across it before

In [None]:
median = train_raw['meanpressure'].median()

# set values above 1200 mbar to the median value
train_raw.loc[ train_raw['meanpressure'] >1200, 'meanpressure'] = median

# set values below 800 mbar to the median value
train_raw.loc[ train_raw['meanpressure'] <800, 'meanpressure'] = median

train_raw.describe()
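
If the ```.loc[condition, column]``` pattern is new to you, it can help to try it on a tiny frame first. The values below are made up purely for illustration (they are not taken from the Delhi data), and `toy` and `out_of_range` are names invented for this sketch.

```python
import pandas as pd

# made-up pressure readings: two of them are obviously implausible
toy = pd.DataFrame({'meanpressure': [1010.0, 7679.0, 1005.0, -3.0, 1008.0]})
median = toy['meanpressure'].median()

# a boolean Series marking the rows to change; | combines the two conditions
out_of_range = (toy['meanpressure'] > 1200) | (toy['meanpressure'] < 800)

# .loc[rows, column] assigns only to the rows where the condition is True
toy.loc[out_of_range, 'meanpressure'] = median
print(toy)
```

Combining the two conditions with `|` does in one step what the cell above does in two.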

###  For now just predict the next day's temperature. 

We use the pandas built-in ```shift()``` method to move the values by a given number of periods
- -1 in our case: shifting up by one row puts the next day's temperature alongside each day's features

Then we will need to fill in something sensible for the last day, where we have no target

In [None]:
print('This is the first few rows of raw training data\n'
      f'{train_raw.head()}')

#select the mean temp to be the y value
#and keep the date alongside it as the index
train_y = train_raw[['date','meantemp']].set_index('date')

#shift by -1 so that each row holds the *next* day's temperature
train_y =train_y.shift(periods=-1)

#fill NaN in last row with something sensible
train_y.iloc[-1] = train_raw['meantemp'].mean()


print('\nthis is the labels (y-data) showing how we have manipulated it\n'
      f'{train_y.head()}'
     )
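
The direction of ```shift()``` is easy to get backwards, so it is worth checking on a toy frame. This is a minimal sketch with made-up temperatures; the frame `temps` and its column names are invented for the illustration.

```python
import pandas as pd

temps = pd.DataFrame({'meantemp': [10.0, 11.0, 12.0, 13.0]},
                     index=['day1', 'day2', 'day3', 'day4'])

# positive periods move values down: each row sees the PREVIOUS day's value
temps['shift_plus_1'] = temps['meantemp'].shift(periods=1)
# negative periods move values up: each row sees the NEXT day's value
temps['shift_minus_1'] = temps['meantemp'].shift(periods=-1)

print(temps)
```

The row that has nothing to borrow from (the first row for +1, the last row for -1) is filled with NaN, which is why one value always needs patching afterwards.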




## 2.2 Getting the data ready for ML with Keras 

### Start by making a time series dataset from the train and test data we loaded
with the aim of predicting the next day's temperature.   
- for simplicity we'll drop the date columns and change everything to numpy arrays
- we will leave out all the parameters where the default settings are fine
- we'll take a window size of 7 days in case there are weekly effects



In [None]:
if 'date' in train_raw.columns:
    train_raw=train_raw.drop(columns=['date'])
if 'date' in test_raw.columns:
    test_raw=test_raw.drop(columns=['date'])
train=train_raw.to_numpy()
test = test_raw.to_numpy()

print(f'train and test shape {train.shape} , {test.shape}')

### next we'll apply a standard scaler to transform values 
Note how we fit the scaler to the training data only (because the test data is *unseen*),  
but apply the scaler to both training and test data

In [None]:
scaler = StandardScaler()
train=scaler.fit_transform(train)
test=scaler.transform(test)
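
To see what the scaler is doing, here is a NumPy sketch of the same idea on made-up numbers: the mean and standard deviation come from the training rows only and are then applied to both sets. It also shows how a scaled value for column 0 can be turned back into original units, which is what you would need in order to report temperature errors in degrees. The arrays are invented for illustration; in the notebook the fitted values live in `scaler.mean_` and `scaler.scale_`.

```python
import numpy as np

# made-up data: column 0 plays the role of temperature, column 1 of pressure
train_toy = np.array([[20.0, 1000.0], [30.0, 1010.0], [40.0, 1020.0]])
test_toy = np.array([[35.0, 1005.0]])

# statistics are estimated from the training data only
mean = train_toy.mean(axis=0)
std = train_toy.std(axis=0)      # population standard deviation (ddof=0)

train_scaled = (train_toy - mean) / std
test_scaled = (test_toy - mean) / std   # test data re-uses the training statistics

# undo the scaling for column 0 to get back to the original units
restored = test_scaled[:, 0] * std[0] + mean[0]
print(train_scaled.mean(axis=0), train_scaled.std(axis=0), restored)
```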

### Set the window size
- we'll choose 7 since in many things to do with human activity there are weekly cycles

In [None]:
window_size=7


### Finally let's make the basic timeseries datasets
and afterwards store and print out the shape of each batch and item

In [None]:
train_keras_series = keras.utils.timeseries_dataset_from_array(
    train,
    sequence_length=window_size, 
    targets=train[window_size:,0],
    batch_size=1
    )

test_keras_series = keras.utils.timeseries_dataset_from_array(
    test,
    sequence_length=window_size, 
    targets=test[window_size:,0],
    batch_size=1
    )

In [None]:
first_item=iter(train_keras_series).get_next()
batch_shape= list(first_item)[0].shape
print(f'shape of batches is {batch_shape}')

item_shape= batch_shape[1:]
print(f'shape of items is {item_shape}')

## 2.3 Define Some common things to use in our comparisons
- If you were doing this *for real* you  would probably define these via a dictionary,    
  so you could iterate over different values in code to find the best hyper-parameters for each algorithm
- but we'll leave  that for your self-study

What we will do is use the *pipeline* workflow 
- and define a common *regression head* 
- this will sit on top of the different algorithm *bodies*
- and have one dense layer, then a single output node

In [None]:
epochs=15
batch_size=20
first_layer_nodes=20
dense_nodes= 10

regression_head = keras.Sequential([    
    keras.layers.Dense(dense_nodes,activation='relu'),
    keras.layers.Dense(1,activation='linear')])
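
One thing to be aware of when re-using a single `regression_head` object in several models: the models then share the same layer objects, and so the same weights, so training one model also changes the head seen by the others. If you want each model to start from its own freshly initialised head, a small factory function is one option. This is a sketch only; `make_regression_head` is a name invented here, not part of the notebook.

```python
import tensorflow.keras as keras

def make_regression_head(dense_nodes=10):
    """Return a new, independently initialised regression head (hypothetical helper)."""
    return keras.Sequential([
        keras.layers.Dense(dense_nodes, activation='relu'),
        keras.layers.Dense(1, activation='linear')])

# each call creates new layer objects, so no weights are shared between them
head_a = make_regression_head()
head_b = make_regression_head()
print(head_a is head_b)   # False
```

A model could then be assembled as, for example, `keras.Sequential([some_body, make_regression_head(dense_nodes)])`.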

### Finally define a performance reporting function 
- using a neat bit of code from [stackoverflow](https://stackoverflow.com/questions/56226621/how-to-extract-data-labels-back-from--)
- Is this assumption about the appropriate error metric right?

In [None]:
def evaluate_and_report(model, name,train_ds,test_ds):
    ''' gets the train and test mse error
        for a given model and train/test datasets
        and make a nice plot
        Parameters:
        ==========
        model: trained instance of Keras Sequential or Model class
        name: string to use for reporting
        train_ds: ndarray or Keras dataset
        test_ds: ndarray or keras dataset
        '''
    trainres=f'Training MSE= {model.evaluate(train_ds)}'
    testres=f'Test MSE= {model.evaluate(test_ds)}'

    y_train = np.concatenate([y for x, y in train_ds], axis=0)
    y_train_pred= model.predict(train_ds).reshape(y_train.shape[0])
    print('made training predictions')
    
    y_test = np.concatenate([y for x, y in test_ds], axis=0)
    y_test_pred= model.predict(test_ds).reshape(y_test.shape[0])
    print('made test predictions')
    actual = np.concatenate((y_train,y_test))
    predicted= np.concatenate((y_train_pred, y_test_pred))

    fig,ax = plt.subplots(figsize=(15,5))
    ax.set_ylim((-2.5,2.5))
    ax.plot(predicted,label='predicted')
    ax.plot(actual,label="actual")
    ax.axvline(x=y_train.shape[0],color='red')
    ax.set_title(f'{name} results, red line denotes switch from train to test\n{trainres}\n{testres}')
    ax.legend()
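
On the question of which error metric to use: MSE is what the models are trained on, but it is in squared (and here, scaled) units. The sketch below computes MSE alongside two common alternatives on made-up numbers so you can compare them. `regression_errors` is a helper invented for this illustration, and it assumes you already have the actual and predicted values as arrays.

```python
import numpy as np

def regression_errors(actual, predicted):
    """Return MSE, RMSE and MAE for two equal-length sequences (hypothetical helper)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = predicted - actual
    mse = np.mean(residuals ** 2)
    return {'mse': mse,
            'rmse': np.sqrt(mse),                # same units as the target
            'mae': np.mean(np.abs(residuals))}   # less sensitive to large errors

errors = regression_errors([0.0, 1.0, 2.0], [0.0, 2.0, 4.0])
print(errors)
```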

## 2.4 Algorithm 1: a MLP with a time window.

### Preprocess data
For this case we can 'flatten' each X item in the timeseries dataset from a window_size x 4 array (7 x 4) into a single row of 28 values

- I've been a bit lazy and worked out the flat size in advance
- and I've just printed out the first item from the dataset   
  to illustrate the dataset.take() method

In [None]:
flatsize = window_size* 4 #window size * num features

def flatten_weather_sequence(X,y):
    return(tf.reshape(X,[1,flatsize]),y)

In [None]:
flattened_train = train_keras_series.map(flatten_weather_sequence)
flattened_test = test_keras_series.map(flatten_weather_sequence)
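
To see what order the values end up in after flattening, here is a NumPy sketch on a dummy window filled with the numbers 0 to 27. `np.reshape` and `tf.reshape` both work in row-major order, so the four features of the first day come first, then the four features of the second day, and so on. The array names are invented for the illustration.

```python
import numpy as np

window_size, num_features = 7, 4
# dummy window: row d holds the 4 features for day d
window = np.arange(window_size * num_features).reshape(window_size, num_features)

flat = window.reshape(1, window_size * num_features)
print(flat.shape)     # (1, 28)
print(flat[0, :8])    # day 0's four features followed by day 1's four features
```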

### let's examine the sizes of the tensors holding the examples and batches

In [None]:
# get a batch 
first_item= iter(flattened_train).get_next()
print(f' first_item is of type {type(first_item)}\n'
      f'with contents {first_item}\n'
     )

In [None]:
flattened_batch_shape = list(first_item)[0].shape
print( f'shape of batches in flattened version is now {flattened_batch_shape}')

In [None]:
print(f'batchsize {flattened_batch_shape[0]}, item shape {flattened_batch_shape[1:]}')
flattened_item_shape = flattened_batch_shape[1:]

### Now let's build a sequential model for our MLP
- we'll use a single hidden layer of 20 nodes (we defined this in section 2.3) 
- note that fit() takes its batches straight from the dataset: the Keras documentation says not to specify `batch_size` when the input is a dataset, so to train on batches of 20 we re-batch the dataset on the fly with `unbatch().batch(batch_size)`

In [None]:
#define the body
mlp_body= keras.Sequential(
    [
    keras.Input(shape=flattened_item_shape),
    keras.layers.Dense(first_layer_nodes,activation='relu'),
    ]
    )
       
#add our standard prediction head on top
mlp= keras.Sequential( [mlp_body, regression_head])

# build, summarise and train
mlp.compile(optimizer='adam', loss='mse')
mlp.summary()
# re-batch the (batch size 1) dataset into batches of batch_size for training
history=mlp.fit(flattened_train.unbatch().batch(batch_size), epochs=epochs)

### Let's see how well it did on the training and test data


In [None]:
evaluate_and_report(mlp,"MLP", flattened_train, flattened_test)

## 2.5 Algorithm 2: A 1-D CNN

### Preprocess the data
The 2D CNN needs to know the height and width of images in order to optimise its inner loops
- look at my code from week 2 for an example
- or [Keras.layers.Conv2d api](https://keras.io/api/layers/convolution_layers/convolution2d/)

Similarly the 1D CNN layer needs to have a fixed number of timesteps (the sequence length) to work with
- not necessarily the same as the size of the filters (usually bigger)
- but it needs to know the size of the loop to run its filters over

So we can re-use the code we wrote to create the datasets which had sequences of length 7 days


### Now specify the 1D CNN architecture
- Let's see how we get on with kernel size 3 (days) : another hyper-parameter to be tuned
- and for fairness with the MLP (which had 20 hidden nodes) we'll have 20 kernels
- we also need to specify the input shape of each item, which is (sequence_length, num_features)  
  i.e. the shape of the batches we just found, without the leading batch dimension

**The main thing to note with ```Conv1D``` layers**
- is that for sequence problems we combine the kernel outputs using a ```GlobalAveragePooling1D()```
- instead of a ```Flatten()``` layer 

In [None]:

oneD_cnn_body = keras.Sequential(
                           [ keras.Input(shape=item_shape),
                             keras.layers.Conv1D(
                                 filters= first_layer_nodes,
                                 kernel_size=3,
                                 activation='relu'
                                 ),
                            keras.layers.GlobalAveragePooling1D(),
                            ]
            )
oneD_cnn= keras.Sequential (
    [oneD_cnn_body,
     regression_head
    ] )
oneD_cnn.compile(optimizer='adam', loss='mse')
oneD_cnn.summary()
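
To make the shapes in the summary less mysterious, here is a NumPy sketch of what a 1D convolution followed by global average pooling does to one window. It uses random made-up weights and assumes the Keras defaults of 'valid' padding and a stride of 1; it is an illustration of the arithmetic, not the Keras implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
window_size, num_features, num_filters, kernel_size = 7, 4, 20, 3

x = rng.normal(size=(window_size, num_features))                 # one input window
kernels = rng.normal(size=(kernel_size, num_features, num_filters))
bias = np.zeros(num_filters)

# 'valid' padding with stride 1: the kernel fits in 7 - 3 + 1 = 5 positions
out_steps = window_size - kernel_size + 1
conv_out = np.stack([
    np.tensordot(x[t:t + kernel_size], kernels, axes=([0, 1], [0, 1])) + bias
    for t in range(out_steps)])
conv_out = np.maximum(conv_out, 0.0)      # relu activation

pooled = conv_out.mean(axis=0)            # global average pooling over time steps
flattened = conv_out.reshape(-1)          # what a Flatten() layer would give instead

print(conv_out.shape, pooled.shape, flattened.shape)   # (5, 20) (20,) (100,)
```

So pooling hands the regression head 20 values per window (one per filter), whereas flattening would hand it 100.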



In [None]:
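# the Keras docs say not to pass batch_size for dataset inputs,
# so re-batch the (batch size 1) dataset into batches of batch_size instead
history= oneD_cnn.fit(train_keras_series.unbatch().batch(batch_size), epochs=epochs)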

In [None]:
evaluate_and_report(oneD_cnn,"1-D ConvNet",train_keras_series,test_keras_series)

## 2.6 Algorithm 3: LSTM Network
- For the LSTMs we will simply use our original dataset, 
- getting the sequence length right would be a good start for experimentation

**Notice** that this is considerably slower than MLP or CNN  
- because it is having to do Backpropagation Through Time
- the memory overhead is also bigger


### Define and train model
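
Part of the difference is simply the number of weights in each *body*. The arithmetic below is a back-of-envelope sketch using the standard parameter-count formulas for a dense layer, a Conv1D layer and an LSTM layer (with biases), with the sizes used in this workbook. You can compare the numbers with what `summary()` reports. The rest of the LSTM's cost comes from having to process the time steps one after another.

```python
window_size, num_features, units, kernel_size = 7, 4, 20, 3

# dense layer on the flattened window: one weight per input per node, plus biases
mlp_params = (window_size * num_features) * units + units

# Conv1D: each filter has kernel_size * num_features weights, plus one bias
cnn_params = kernel_size * num_features * units + units

# LSTM: four gates, each with input weights, recurrent weights and biases
lstm_params = 4 * (units * (num_features + units) + units)

print(mlp_params, cnn_params, lstm_params)   # 580 260 2000
```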

In [None]:
lstm_body=keras.Sequential(
        [keras.Input(shape=item_shape),
         keras.layers.LSTM(units=first_layer_nodes,
                           stateful=False,
                          ),
        ])

lstmnet= keras.Sequential( [lstm_body,regression_head]) 


lstmnet.compile(optimizer='adam',loss='mse')
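# the Keras docs say not to pass batch_size for dataset inputs,
# so re-batch the (batch size 1) dataset into batches of batch_size instead
history=lstmnet.fit(train_keras_series.unbatch().batch(batch_size),epochs=epochs)


### Evaluate and show results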

In [None]:
evaluate_and_report(lstmnet,"LSTM", train_keras_series,test_keras_series)


# Part 3: For you to experiment

To make sure you are familiar with using Keras timeseries datasets and these algorithms you could experiment with:
- Looking at the outputs as the models train: is it worth changing the maximum number of epochs allowed?
- Changing the number of feature detectors (hidden layer perceptrons - 1DConv filters - LSTM nodes) in the models
- different datasets from kaggle etc.
- creating a benchmark algorithm with prediction='same as last time step'
- doing a fairer algorithm comparison via hyper-parameter tuning
- finding ways of monitoring memory usage
    - can you do this without requiring jupyter extensions you may not be able to install in a work environment?
- investigating what sorts of errors different models make
  - starting by improving the plotting function 
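
As a starting point for the 'same as last time step' benchmark, here is a minimal NumPy sketch. It assumes the same windowing as the rest of the workbook: the target is the value straight after each window, so the benchmark prediction is the last value inside the window. `persistence_mse` is a name invented here and the series is made up.

```python
import numpy as np

def persistence_mse(series, window_size):
    """MSE of predicting 'the next value equals the last value in the window'."""
    series = np.asarray(series, dtype=float)
    actual = series[window_size:]              # the value after each window
    predicted = series[window_size - 1:-1]     # the last value inside each window
    return float(np.mean((predicted - actual) ** 2))

toy_series = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
print(persistence_mse(toy_series, window_size=2))
```

In the notebook you could try something like `persistence_mse(test[:, 0], window_size)` and compare the result with the test MSE of the three models.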