# Practical Business Analytics Week 5 Lab - Time Series

This week's lab looks at processing time series data and applying neural networks to sequential data.

Work through the cells in this Jupyter Notebook, following the instructions in the text boxes to load and analyse the data.

Run the cells below to import the necessary libraries.

In [None]:
!pip install yfinance keras tensorflow

In [None]:
import pandas as pd
import datetime
import math
import yfinance as yahooFinance
from statsmodels.tsa import seasonal
import numpy as np
import keras
from keras import layers
from keras import ops
from keras.utils import timeseries_dataset_from_array
import matplotlib.pyplot as plt

## Encoding time values

First we will look at how we can write functions to encode a time in 24-hour clock format "HH:MM:SS" (for example "15:15:00" for 3.15pm) into a pair of numeric features.

We will start by using the Python *datetime* library to generate Python objects corresponding to a particular time.

In [None]:
midday = datetime.time.fromisoformat("12:00")

From this Python object we can access the hour, minute and second of the time using the *hour*, *minute* and *second* attributes of the *midday* object.

```python
print(midday.hour) # 12
```

Try extracting the minute below.
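As a quick sketch, the *minute* and *second* attributes work in exactly the same way as *hour*. The second time, `later`, is a hypothetical extra example:

```python
import datetime

midday = datetime.time.fromisoformat("12:00")
print(midday.minute)  # 0, because "12:00" is exactly on the hour

# A hypothetical time with a non-zero minute component
later = datetime.time.fromisoformat("15:15:00")
print(later.hour, later.minute, later.second)  # 15 15 0
```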

We are going to construct a function that takes the number of seconds from midnight and converts this into two features. The first step towards doing this is to calculate the number of seconds from midnight from a Python *time* object.

Given that there are 60 seconds in a minute and 60 minutes in an hour, write a Python function that takes a *time* object and returns the number of seconds from midnight.

Test the function using the time object below. You should get a result of $70650$ seconds.

```python
test_time = datetime.time.fromisoformat("19:37:30")
```
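One possible solution is sketched below. The function name `seconds_from_midnight` is our own choice, and the sketch ignores microseconds, which the lab's "HH:MM:SS" strings never contain:

```python
import datetime

def seconds_from_midnight(t):
    """Return the number of whole seconds between midnight and the datetime.time t."""
    # 3600 seconds per hour, 60 seconds per minute
    return t.hour * 3600 + t.minute * 60 + t.second

test_time = datetime.time.fromisoformat("19:37:30")
print(seconds_from_midnight(test_time))  # 70650
```

The check is $19 \times 3600 + 37 \times 60 + 30 = 68400 + 2220 + 30 = 70650$.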

Now we will construct two features from the time in seconds. To do this we will use the following two formulae:

$$\textrm{Feature}_1 = \cos{\frac{2\pi s}{S}}$$

$$\textrm{Feature}_2 = \sin{\frac{2\pi s}{S}}$$

where $s$ is the time in seconds from midnight, and $S$ is the total number of seconds in a day ($S = 24 \times 60 \times 60 = 86400$).

Write a Python function to calculate these two features from the time in seconds from midnight.

You can use the *math.sin()* and *math.cos()* functions to calculate sine and cosine, and *math.pi* for the value of $\pi$. To return two values from a Python function, return them together as a tuple:

```python
return (a,b)
```
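A minimal sketch of the feature function is shown below. The names `seconds_to_features` and `SECONDS_PER_DAY` are illustrative choices rather than names required by the lab:

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # S = 86400

def seconds_to_features(s, S=SECONDS_PER_DAY):
    """Map seconds from midnight onto a point on the unit circle."""
    angle = 2 * math.pi * s / S
    return (math.cos(angle), math.sin(angle))

print(seconds_to_features(0))         # (1.0, 0.0) at midnight
print(seconds_to_features(6 * 3600))  # approximately (0.0, 1.0) at 06:00
```

Because the angle wraps around after $S$ seconds, midnight and 23:59:59 end up next to each other on the circle.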

We can combine these functions to build a function that takes a string as input and returns the two features as output.

It will look something like:

```python
def timeToFeatures(ts):
    time = datetime.time.fromisoformat(ts)
    seconds = ...
    x,y = ...
    return (x,y)
```

Test your function using:

```python
timeToFeatures("23:59:00")
```

The output should look like:

```python
(0.9999904807207345, -0.004363309284747432)
```
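For reference, here is one possible way of filling in the template above. It is a sketch that inlines both steps, and the values it prints should agree with the expected output to many decimal places:

```python
import datetime
import math

def timeToFeatures(ts):
    """Convert an "HH:MM:SS" string into the (cos, sin) time-of-day features."""
    time = datetime.time.fromisoformat(ts)
    seconds = time.hour * 3600 + time.minute * 60 + time.second
    angle = 2 * math.pi * seconds / 86400  # 86400 seconds in a day
    x, y = math.cos(angle), math.sin(angle)
    return (x, y)

print(timeToFeatures("23:59:00"))
```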

It would be interesting to visualise these two features over a 24 hour period. We can generate a list of input time strings using:

```python
times = ["{:02d}".format(h)+":00:00" for h in range(24)]
```

We can create a pandas DataFrame with a column for the time using:

```python
d = pd.DataFrame({"Time":times})
```

Do this below.

To generate the features, we can use the pandas *map* method to apply our function to each value in the *Time* column, and then plot the result. The Python code below applies *timeToFeatures* to each time and stores the two features in the *x* and *y* columns.

```python
d["x"]=d["Time"].map(lambda t: timeToFeatures(t)[0])
d["y"]=d["Time"].map(lambda t: timeToFeatures(t)[1])
d.plot()
```

Run this below.
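The plot shows why we use two features rather than the raw hour. The sketch below uses a hypothetical helper, `hour_to_features`, which applies the same formulae to whole hours. With raw hour values, 23:00 and 01:00 are 22 units apart, but on the circle they are exactly as close as 01:00 and 03:00:

```python
import math

def hour_to_features(h):
    """(cos, sin) encoding of an hour of the day, h in [0, 24)."""
    angle = 2 * math.pi * h / 24
    return (math.cos(angle), math.sin(angle))

# Raw hour values make 23:00 and 01:00 look far apart...
raw_gap = abs(23 - 1)
# ...but in feature space they are as close as 01:00 and 03:00
circle_gap = math.dist(hour_to_features(23), hour_to_features(1))
ref_gap = math.dist(hour_to_features(1), hour_to_features(3))
print(raw_gap, round(circle_gap, 4), round(ref_gap, 4))
```

A single sine feature on its own would not be enough, because, for example, 03:00 and 09:00 have the same sine value. The cosine feature distinguishes them.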

## Predicting stock values

Stock prices are a typical example of a classical time series.
We can download historical daily stock prices from the web (one value per trading day, each taken at the same point in the day) up to the previous trading day, and then build neural network models to predict future values.

For this experiment, I have used:
 - The first 70% of the stock prices, in date order, to TRAIN the neural network
 - The remaining 30% of the stock prices, so the newest dates, to TEST the neural network

In practice, as we discussed in the lecture, you would use a more robust time series cross-validation approach (sliding window or forward chaining) rather than a single train/test split.
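As an illustration, forward chaining can be sketched with plain index arrays. The function `forward_chaining_splits` is hypothetical; scikit-learn provides a comparable `TimeSeriesSplit` class. Every fold trains only on data that comes before its test block:

```python
import numpy as np

def forward_chaining_splits(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs for forward-chaining evaluation.

    The series is cut into n_folds + 1 consecutive blocks. Fold k
    (k = 1..n_folds) trains on the first k blocks and tests on the
    block that follows them, so test data always comes later in time.
    """
    block = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_idx = np.arange(0, k * block)
        # The last fold also takes any leftover samples at the end
        end = n_samples if k == n_folds else (k + 1) * block
        test_idx = np.arange(k * block, end)
        yield train_idx, test_idx

for train_idx, test_idx in forward_chaining_splits(100, 4):
    print(f"train 0-{train_idx[-1]}, test {test_idx[0]}-{test_idx[-1]}")
```

A sliding-window variant would keep the training set at a fixed size by starting `train_idx` a fixed number of blocks before the test block, rather than always starting at 0.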

First we will download and plot the Google stock data.

```python
stock = yahooFinance.Ticker("GOOG")
stock_data = stock.history(period="max")
stock_data['Open'].plot()
```

We can break down the time series into the overall trend, seasonal changes and residuals using the *seasonal_decompose* function in *statsmodels*.

Notice that, as well as the overall trend, the decomposition also detects seasonal changes.

```python
# period counts observations (rows); this data has one row per trading day, not per calendar day
dec=seasonal.seasonal_decompose(stock_data["Open"],period=365)
dec.plot()
```

### Deep neural network prediction

In this experiment, we are going to use a fixed number of previous days' stock prices to predict the next day's price using a deep neural network. We will use the previous $n$ time steps, $t-1,t-2,\ldots,t-n$, to predict the next time step $t$.

We will be using the **Keras** library to build and train our neural network models. You can read about **Keras** on the website: https://keras.io/

Keras is a neural network library that can run on top of popular frameworks such as TensorFlow, JAX and PyTorch, and provides implementations of many common neural network layers and architectures.

The first thing we need to do is detrend the data, which we can do by dividing each price by the mean over a *rolling window* of recent prices. Fortunately, pandas has built-in functions that can do this for us.

We will calculate the means over a window of 10 rows (trading days) using:

```python
stock_means = stock_data["Open"].rolling(10,min_periods=1).mean()
```
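To make the rolling mean concrete, here is a small NumPy re-implementation for illustration only. The function `trailing_mean` is hypothetical, and in the lab you should use the pandas call above. Like pandas' default (`center=False`), each mean covers the current value and the values before it, and `min_periods=1` means the first few positions average over fewer values:

```python
import numpy as np

def trailing_mean(values, window):
    """Mean of the last `window` values at each position, using fewer values
    at the start (mirroring pandas rolling(window, min_periods=1).mean())."""
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    for i in range(len(values)):
        start = max(0, i - window + 1)  # window may be shorter near the start
        out[i] = values[start:i + 1].mean()
    return out

print(trailing_mean([1, 2, 3, 4, 5], window=2))  # values 1.0, 1.5, 2.5, 3.5, 4.5
```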

We will now process the stock data by dividing it by the rolling mean, and then rescaling it so that it lies between $0$ and $1$. First we divide the opening prices by the means we calculated above, and store the result in a new Series:

```python
stock_scaled = stock_data["Open"]/stock_means
```

Then we can use the minimum and maximum of *stock_scaled* to rescale it to lie between $0$ and $1$. If the minimum is *stock_min* and the maximum is *stock_max*, we can use:

$$\textrm{stock\_norm} = \frac{\textrm{stock\_scaled}-\textrm{stock\_min}}{\textrm{stock\_max}-\textrm{stock\_min}}$$

To find the minimum of a pandas DataFrame or Series *d*, you can use the method *d.min()*, and for the maximum, *d.max()*. Write Python code to create a normalised stock time series, *stock_norm*.
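The same min-max formula is sketched below on a small NumPy array of hypothetical values standing in for *stock_scaled*. With a pandas Series the arithmetic is identical. The helper `min_max_normalise` is our own name, and it also returns the minimum and maximum because you will need them later to undo the scaling:

```python
import numpy as np

def min_max_normalise(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    v_min, v_max = values.min(), values.max()
    return (values - v_min) / (v_max - v_min), v_min, v_max

# Hypothetical detrended values standing in for stock_scaled
scaled = np.array([0.97, 1.02, 1.00, 0.95, 1.05])
norm, s_min, s_max = min_max_normalise(scaled)
print(norm)  # 0.95 maps to 0.0 and 1.05 maps to 1.0
```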

Now plot the normalised and detrended data using the *stock_norm.plot()* method.

#### Rolling windows

To generate a dataset for training, we can use the built-in Keras function *timeseries_dataset_from_array*, which creates a dataset object that returns the data points at $t-1,t-2,\ldots,t-n$ as the features (X), and the data point at $t$ as the target (y).

To see this in an example, we will create an array of values from 1 to 100, and take consecutive data points from it, with $n=10$.

```python
data = np.arange(1,101) # Generate a range from 1 to 100
input_data = data[:-10] # Leave off the last 10 values
targets = data[10:] # The targets start at the value at index 10
dataset = timeseries_dataset_from_array(input_data,targets,sequence_length=10,batch_size=1).as_numpy_iterator()
for i in range(1,6):
  print("Step",i)
  inputs, targets = next(dataset) # fetch the next X and y values
  print("X:",inputs[0]) 
  print("y:",targets[0])
```

Try this below.
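The windowing idea can also be written directly in NumPy, which may help to see what the Keras dataset contains. The function `make_windows` is a hypothetical illustration, not part of Keras. One difference worth noting: when Keras is called as above with `data[:-10]` as input, it yields $90 - 10 + 1 = 81$ windows (the last target is 91). This sketch uses every possible window, so it yields 90 windows, ending with target 100:

```python
import numpy as np

def make_windows(series, n):
    """Return (X, y): each row of X holds n consecutive values, and the
    matching entry of y is the value that immediately follows them."""
    series = np.asarray(series)
    X = np.stack([series[i:i + n] for i in range(len(series) - n)])
    y = series[n:]
    return X, y

data = np.arange(1, 101)
X, y = make_windows(data, 10)
print(X.shape, y.shape)  # (90, 10) (90,)
print(X[0], y[0])        # first window holds 1..10, its target is 11
```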

For a time series we cannot randomly select data for a train and test split, because the model must only learn from the past. Instead, we will take the first $70\%$ of the data as training and the remaining $30\%$ as test. Then we will create time series datasets from these using Keras.

```python
rows = stock_norm.shape[0]
stock_train = stock_norm.iloc[0:int(0.7*rows)]
stock_test = stock_norm.iloc[(int(0.7*rows)+1):rows]
n = 10 # Set n=10 to start with
rolling_stock_train = timeseries_dataset_from_array(stock_train[:-n], stock_train[n:], sequence_length=n,batch_size=128)
rolling_stock_test = timeseries_dataset_from_array(stock_test[:-n], stock_test[n:], sequence_length=n,batch_size=128)
```

Run this code below.

Now we will create and train a dense neural network model using Keras. Dense neural networks are composed of fully connected layers, where every neuron in a layer is connected to every neuron in the previous layer. We will use the Keras Sequential API to build a network with three dense layers: two hidden layers (50 and 20 neurons) with the "relu" activation function, followed by an output layer.

The final layer is the output, and has a linear activation as we are predicting a continuous value.

The model is then compiled and trained on our training data.

```python
dense_model = keras.Sequential()
dense_model.add(layers.Input((n,)))
dense_model.add(layers.Dense(50,activation='relu'))
dense_model.add(layers.Dense(20,activation='relu'))
dense_model.add(layers.Dense(1,activation='linear'))

dense_model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),loss='mean_squared_error')
dense_model.fit(rolling_stock_train,epochs=20)
```
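To see what the dense model computes, here is a NumPy sketch of its forward pass with the same layer sizes. The weights are random and untrained (Keras initialises its own weights differently), so this only illustrates the arithmetic: each Dense layer computes $\textrm{activation}(XW + b)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10  # window length, as in the lab

# Random weights with the same shapes as the Keras model:
# Dense(50, relu) -> Dense(20, relu) -> Dense(1, linear)
W1, b1 = rng.normal(size=(n, 50)), np.zeros(50)
W2, b2 = rng.normal(size=(50, 20)), np.zeros(20)
W3, b3 = rng.normal(size=(20, 1)), np.zeros(1)

def relu(z):
    return np.maximum(z, 0.0)

def dense_forward(X):
    """Forward pass through the three dense layers."""
    h1 = relu(X @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return h2 @ W3 + b3  # linear output layer

X_batch = rng.random((4, n))  # 4 windows of n normalised prices
print(dense_forward(X_batch).shape)  # (4, 1)

n_params = sum(w.size for w in (W1, b1, W2, b2, W3, b3))
print(n_params)  # 1591
```

The parameter count, $(10 \times 50 + 50) + (50 \times 20 + 20) + (20 \times 1 + 1) = 1591$, should match the total reported by `dense_model.summary()` for $n = 10$.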

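Now we can plot the model's predictions against the true values. The first $n$ test values are only used as inputs, so the first prediction lines up with position $n$ of *stock_test* (its $(n+1)$-th value). This is why the code below selects `stock_test.iloc[n:(n+l)]`.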

```python
pred = dense_model.predict(rolling_stock_test)
l = pred.shape[0]
output = pd.DataFrame(stock_test.iloc[n:(n+l)])
output["Prediction"] = pred
output.plot()
```
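The alignment can be checked with a little index arithmetic. The sketch below uses a hypothetical helper, `prediction_positions`, and a hypothetical 30-day test series. It assumes the dataset is built as in the lab, `timeseries_dataset_from_array(s[:-n], s[n:], sequence_length=n)`, which gives $(T - n) - n + 1$ windows for a test series of length $T$. Window $i$ predicts position $n + i$:

```python
import numpy as np

def prediction_positions(test_length, n):
    """Positions in the test series that the windowed predictions line up with."""
    n_windows = (test_length - n) - n + 1  # len(s[:-n]) - sequence_length + 1
    return np.arange(n, n + n_windows)

s = np.arange(100, 130)  # hypothetical 30-day test series
pos = prediction_positions(len(s), 10)
print(len(pos), pos[0], pos[-1])  # 11 10 20
```

Here `len(pos)` plays the role of `l = pred.shape[0]`, and `pos` covers the same positions as `stock_test.iloc[n:(n+l)]`.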

We can also construct a **recurrent** neural network in Keras using the LSTM layer. LSTM stands for Long Short-Term Memory. An LSTM layer carries a memory state from one time step to the next, which makes it well suited to sequential data. We will create and fit a model using the Keras built-in LSTM layers.

```python
rnn_model = keras.Sequential()
rnn_model.add(layers.Input((n,1)))
rnn_model.add(layers.LSTM(50,return_sequences=True))
rnn_model.add(layers.LSTM(30,return_sequences=False))
rnn_model.add(layers.Dense(1, activation='linear'))

rnn_model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),loss='mean_squared_error')
rnn_model.fit(rolling_stock_train,epochs=20)
```
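To illustrate what happens inside an LSTM layer, here is a NumPy sketch of the standard LSTM cell equations, stepped through one window. The weights are random, the gate layout is our own choice, and this is not Keras's internal implementation. Each window is read as $n$ time steps with 1 feature each, matching `Input((n,1))`. Keeping only the final hidden state corresponds to `return_sequences=False`. Keeping `h` at every step would correspond to `return_sequences=True`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    x_t: (batch, features); h_prev, c_prev: (batch, units)
    W: (features, 4*units), U: (units, 4*units), b: (4*units,)
    Gates are stacked in the order input, forget, candidate, output.
    """
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4, axis=1)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_t = f * c_prev + i * g   # forget part of the old memory, add new information
    h_t = o * np.tanh(c_t)     # output a gated view of the memory
    return h_t, c_t

rng = np.random.default_rng(0)
batch, n, units = 4, 10, 50
X = rng.random((batch, n, 1))  # windows with shape (batch, n, 1 feature)
W = rng.normal(scale=0.1, size=(1, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

h = np.zeros((batch, units))
c = np.zeros((batch, units))
for t in range(n):  # walk through each window one day at a time
    h, c = lstm_step(X[:, t, :], h, c, W, U, b)
print(h.shape)  # (4, 50)
```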


Again we can plot the predictions compared to the true values:

```python
pred = rnn_model.predict(rolling_stock_test)
l = pred.shape[0]
output = pd.DataFrame(stock_test.iloc[n:(n+l)])
output["Prediction"] = pred
output.plot()
```

### Plotting on the original scale

We might want to plot our predictions for the test data on the original scale with the data before it was detrended. To do this, we need the mean values used to detrend the test data, and the original opening prices. We can find this data using:

```python
stock_data_test=stock_data["Open"].iloc[(int(0.7*rows)+1):rows].iloc[n:(n+l)] # Original un-normalised stock data
stock_means_test=stock_means.iloc[(int(0.7*rows)+1):rows].iloc[n:(n+l)] # Means used for detrending the test data
```

Using these, see if you can generate a DataFrame with the original stock data and the un-normalised predictions. To do this you will need to first undo the normalisation, and then multiply by the means. Remember the data were normalised using:

$$\textrm{stock\_norm} = \frac{\textrm{stock\_scaled}-\textrm{stock\_min}}{\textrm{stock\_max}-\textrm{stock\_min}}$$

To undo this you will first need to multiply by $\textrm{stock\_max}-\textrm{stock\_min}$, before adding $\textrm{stock\_min}$. Finally you will need to multiply these values by the means.
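The two inversion steps are sketched below as a hypothetical helper, `denormalise`, and checked with a round trip on made-up prices. With the lab's variables you might call it as `denormalise(pred.ravel(), stock_min, stock_max, stock_means_test.values)`, flattening `pred` because it has shape $(l, 1)$.

```python
import numpy as np

def denormalise(pred_norm, stock_min, stock_max, means):
    """Invert the min-max scaling, then undo the division by the rolling means."""
    pred_scaled = np.asarray(pred_norm) * (stock_max - stock_min) + stock_min
    return pred_scaled * np.asarray(means)

# Round trip on hypothetical numbers: normalise, then recover the prices
prices = np.array([100.0, 102.0, 101.0, 105.0])
means = np.array([100.0, 101.0, 101.0, 102.0])
scaled = prices / means
s_min, s_max = scaled.min(), scaled.max()
norm = (scaled - s_min) / (s_max - s_min)
print(np.allclose(denormalise(norm, s_min, s_max, means), prices))  # True
```

One caveat: each rolling mean in *stock_means_test* includes that day's actual opening price. This means predictions rescaled in this way already carry some information about the value being predicted.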

## Optional extras

Try experimenting with different numbers of layers and neurons in the neural networks used above. For example to create a dense layer with 100 neurons as part of your model you would use:

```python
dense_model.add(layers.Dense(100,activation='relu'))
```

You can add this to the existing model definition between the *Input* layer and the final output layer.

Experiment with both the dense and LSTM networks.