# Time series
Notes on time series, sequences, and the prediction of seasonal variations and trends in data, following the course by [Laurence Moroney, with examples](https://github.com/lmoroney/dlaicourse/tree/master/TensorFlow%20In%20Practice/Course%204%20-%20S%2BP).

Types of time series:
- **Univariate** - single value at each step in time
- **Multivariate** - multiple values at each step in time
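
As a small illustration of the two cases, here is how they might look as NumPy arrays. The values and the column meanings (temperature, humidity) are made up for the example.

```python
import numpy as np

# Univariate: one value per time step -> shape (time_steps,)
temperature = np.array([21.0, 22.5, 23.1, 22.0])

# Multivariate: several values per time step -> shape (time_steps, features)
# Hypothetical columns: temperature, humidity
weather = np.array([
    [21.0, 0.40],
    [22.5, 0.38],
    [23.1, 0.35],
    [22.0, 0.42],
])

print(temperature.shape)  # (4,)
print(weather.shape)      # (4, 2)
```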

Time series analysis searches for patterns over time. Typical uses:
- Prediction: of the future (forecasting) or of the past (imputation)
- Anomaly detection
- Feature detection in speech recognition

Common patterns in time series:
- **trend**: the overall direction of the series, e.g. a constant or linear function of time
- **seasonality**: the same pattern repeats at predictable intervals
- **combination** of trend and seasonality
- **autocorrelation**: the series correlates with a delayed copy of itself; the delay is often called the lag
- **white noise**: a truly random signal that can't be predicted
- **non-stationary**: different patterns appear in different time spans of the data

Note: a **non-stationary** time series may need several different models (each limited to one time span of the data) plus anomaly detection.
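
The patterns listed above can be combined into a synthetic series, which is handy for experiments. This is a minimal sketch in NumPy; the function names, the shape of the seasonal pattern and all numeric values are arbitrary choices for illustration.

```python
import numpy as np

def trend(time, slope=0.0):
    """Linear trend: grows by `slope` per time step."""
    return slope * time

def seasonal_pattern(phase):
    """Arbitrary repeating shape: a cosine early in the period, then exponential decay."""
    return np.where(phase < 0.4, np.cos(phase * 2 * np.pi), 1 / np.exp(3 * phase))

def seasonality(time, period, amplitude=1.0):
    """Repeat the same pattern every `period` steps."""
    phase = (time % period) / period
    return amplitude * seasonal_pattern(phase)

def white_noise(time, noise_level=1.0, seed=None):
    """Unpredictable Gaussian noise, one sample per time step."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=len(time)) * noise_level

time = np.arange(4 * 365)  # four years of daily steps
series = (10
          + trend(time, slope=0.05)
          + seasonality(time, period=365, amplitude=40)
          + white_noise(time, noise_level=5, seed=42))
```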

![autocorrelation1](autocorrelation1.jpg)
![autocorrelation2](autocorrelation2.jpg)
![realmix](realmix.jpg)
![non-stationary](non-stationary.jpg)
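
Autocorrelation can be checked numerically. One simple way, sketched here with NumPy, is to correlate the series with a copy of itself shifted by the lag. The helper name `autocorrelation` and the toy signals are assumptions made for this example.

```python
import numpy as np

def autocorrelation(series, lag):
    """Pearson correlation between the series and a copy of itself delayed by `lag` steps."""
    return np.corrcoef(series[:-lag], series[lag:])[0, 1]

time = np.arange(200)
seasonal = np.sin(2 * np.pi * time / 20)             # repeats every 20 steps
noise = np.random.default_rng(0).normal(size=1000)   # white noise

print(autocorrelation(seasonal, 20))  # close to 1: one full period later the values repeat
print(autocorrelation(seasonal, 10))  # close to -1: half a period later the sign flips
print(autocorrelation(noise, 1))      # close to 0: white noise has no memory
```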



## Metrics in time series
Training, validation and test data sets, and error measures for time series (TS).


### Fixed Partitions
Note the differences between time-series ML and classical ML:
- You want to include **full cycles** in every set, so that no set is biased by a different representation of the data.
- Tune hyperparameters on the validation set.
- Then include the validation set in training and check the final performance on the test set.
- If the final performance is good, train the production model on the whole data set, including the test set, before releasing: the test set holds the most recent data, which carries the most relevant signal for the forecast.
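
A fixed partition is just a split by time index, with no shuffling. The sketch below assumes four years of daily data and a yearly season, so each set holds whole cycles; the array contents are placeholders.

```python
import numpy as np

period = 365                                   # assumed season length in days
series = np.arange(4 * period, dtype=float)    # placeholder for four years of daily data

train_end = 2 * period   # years 1-2 for training
valid_end = 3 * period   # year 3 for validation, year 4 for testing

x_train = series[:train_end]
x_valid = series[train_end:valid_end]
x_test = series[valid_end:]

# After tuning: retrain on train + validation, then evaluate on the test set
x_train_valid = series[:valid_end]
```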

![fixed-partitions.jpg](fixed-partitions.jpg)

### Rolling Partitions

- Take a small portion of the training period and try to predict the next day (still inside the training set).
- Add that day to the training data and try to predict the following day.
- Repeat until the whole training set is used.

Monitor model performance on the validation set. Model accuracy should gradually increase.

Sometimes you don't allocate a test set at all and use the next day's live data as your test set.
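
The rolling procedure can be sketched as a loop that only ever looks at data before the step being predicted. The helper `walk_forward_forecast` and the stand-in model `last_week_mean` are hypothetical names for this example; a real model would be refit or updated on `history` at each step.

```python
import numpy as np

def walk_forward_forecast(series, initial_size, predict_next):
    """Predict every step after `initial_size`, using only the data before that step."""
    predictions = []
    for t in range(initial_size, len(series)):
        history = series[:t]                       # everything observed so far
        predictions.append(predict_next(history))
        # on the next iteration series[t] becomes part of the history
    return np.array(predictions)

def last_week_mean(history):
    """Stand-in for a model: forecast the mean of the last 7 observations."""
    return history[-7:].mean()

series = np.arange(30, dtype=float)   # toy data
preds = walk_forward_forecast(series, initial_size=10, predict_next=last_week_mean)
errors = np.abs(series[10:] - preds)
print(len(preds), preds[0])  # 20 6.0
```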

![roll-partitions.jpg](roll-partitions.jpg)


### Metrics for evaluating performance

**MSE** (L2 metric) penalizes each error in proportion to its square, so big errors affect the metric much more severely than small ones.

**MAE** (L1 metric) does not penalize large errors as much as MSE does.
If your gain or loss is simply proportional to the size of the error, MAE may be the better choice.

**MAPE** (mean absolute percentage error) is the mean ratio between the absolute error and the absolute value; it gives an idea of the size of the errors compared to the values.

```python
from tensorflow import keras
mae = keras.metrics.mean_absolute_error(y_validation, prediction).numpy()
```
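
For reference, the three metrics can also be written directly in NumPy. This is a plain sketch of the definitions, not the Keras implementation; the sample values are made up.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: squaring makes large errors dominate."""
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    """Mean absolute error: every error counts in proportion to its size."""
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Undefined where y_true is 0."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 190.0, 300.0])

print(mse(y_true, y_pred))   # 3400.0
print(mae(y_true, y_pred))   # 40.0
print(mape(y_true, y_pred))  # about 13.33
```

The single error of 100 contributes 10000 of the 10200 total squared error, but only 100 of the 120 total absolute error.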

![](err-metric.jpg)

Baselines (for relative performance): naive forecast, moving average (rolling average), or moving average of the differenced series.

"Constant" naive forecast: f(t+1) = f(t)

![](relative-performance.jpg)
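
The naive forecast takes a couple of lines in NumPy: the prediction for each validation step is simply the value one step earlier. The series and `split_time` below are toy values.

```python
import numpy as np

series = np.array([3.0, 5.0, 4.0, 6.0, 8.0, 7.0])  # toy data
split_time = 3                                      # validation starts here

x_valid = series[split_time:]
naive_forecast = series[split_time - 1:-1]  # f(t) = value observed at t - 1

naive_mae = np.mean(np.abs(x_valid - naive_forecast))
print(naive_forecast)  # [4. 6. 8.]
```

Any model worth keeping should beat this MAE on the same validation period.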

Note that here the moving average does worse than the naive forecast:
- mae(naive forecast) = 5.7
- mae(moving average) = 7.14

The likely cause is that a moving average does not anticipate the trend and seasonality components. **Differencing** is a technique to remove trend and seasonality: instead of studying the time series itself, we study the difference between the value at time t and the value at an earlier time t - lag. Differencing requires an assumption about the **time lag**.

![](differencing1.jpg)

Baseline prediction: `f(t) = f(t-lag) + roll_avg(differencing[t, window])`, with a moving window of 32.
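
The baseline formula can be sketched as follows. The toy series (a linear trend plus a spike every `lag` steps, no noise) and the small `lag` and `window` values are assumptions chosen so the result is easy to check by hand.

```python
import numpy as np

def moving_average_forecast(series, window_size):
    """Forecast each step as the mean of the previous `window_size` values.

    The result starts at time `window_size` and has len(series) - window_size entries.
    """
    return np.array([series[t:t + window_size].mean()
                     for t in range(len(series) - window_size)])

# Toy data (made up): linear trend plus a spike every `lag` steps, no noise
lag, window, split_time = 4, 2, 16
time = np.arange(24)
series = 0.5 * time + 10.0 * (time % lag == 0)

# Differencing: diff_series[i] is the change between time i and time i + lag
diff_series = series[lag:] - series[:-lag]

# Trailing moving average of the differences, aligned to the validation period
diff_moving_avg = moving_average_forecast(diff_series, window)[split_time - lag - window:]

# Add back the value from one lag earlier: f(t) = f(t - lag) + roll_avg(diff)
forecast = series[split_time - lag:-lag] + diff_moving_avg

x_valid = series[split_time:]
print(np.mean(np.abs(x_valid - forecast)))  # 0.0 here, because the toy series has no noise
```

On real data the differenced series is noisy, which is why the moving average over it is needed at all.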

![](differencing2.jpg)

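Second smoothing step: also smooth the past values f(t-lag) with a moving average over a centered window around t-lag, e.g. \[t-lag-5, t-lag+5\], which for a lag of 365 is \[t-370, t-360\]. For the present values used in prediction we can't use a centered window, because we don't know the future; use trailing windows there.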
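
The difference between trailing and centered windows can be sketched with two small helpers. The names `trailing_mean` and `centered_mean` and the toy values are assumptions for illustration.

```python
import numpy as np

def trailing_mean(series, t, window):
    """Mean of the `window` values before t: uses [t - window, t - 1] only."""
    return series[t - window:t].mean()

def centered_mean(series, t, half_width):
    """Mean of the values around t: uses [t - half_width, t + half_width]."""
    return series[t - half_width:t + half_width + 1].mean()

series = np.arange(40, dtype=float)  # toy data
t, lag = 30, 10

present = trailing_mean(series, t, window=5)          # values 25..29, all already known
past = centered_mean(series, t - lag, half_width=3)   # values 17..23, all before t
print(present, past)  # 27.0 20.0
```

A centered window around `t` itself would need `series[t + 1]` and later values, which do not exist yet at forecast time.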

![](differencing3.jpg)

Note that the smoothed version now beats the naive forecast:
- mae(naive forecast) = 5.7
- mae(smooth moving average) = 4.5

Statistical forecasting = using linear approximations of the time series (mainly smoothed moving averages).


## Preparing features and labels
TensorFlow has great support for time series through [tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset). A **Dataset** can be used to represent an input pipeline as a collection of elements and a "logical plan" of transformations that act on those elements.

```python
import tensorflow as tf

a = tf.data.Dataset.range(1, 3)   # ==> [1, 2]
b = tf.data.Dataset.range(4, 8)   # ==> [4, 5, 6, 7]
a.concatenate(b).batch(2)         # ==> [[1, 2], [4, 5], [6, 7]]
d = tf.data.Dataset.from_tensor_slices(['hello', 'world'])
```


```python
# A TensorSliceDataset is not a NumPy ndarray: displaying it shows only shapes and types
d = tf.data.Dataset.from_tensor_slices(['hello', 'world'])
d
```

```
<TensorSliceDataset shapes: (), types: tf.string>
```

```python
a = tf.data.Dataset.range(1, 3)
b = tf.data.Dataset.range(4, 8)
ds = a.concatenate(b).batch(2)   # BatchDataset([tf.Tensor<tf.int64>])
print([tensors.numpy() for tensors in ds])  # .numpy() .shape .dtype
ds == [[1,2], [4,5], [6,7]]
```

```
[array([1, 2], dtype=int64), array([4, 5], dtype=int64), array([6, 7], dtype=int64)]
False
```

```python
import tensorflow as tf

raw_ds = tf.data.Dataset.range(10)
rolling_ds = raw_ds.window(3, shift=1)
for window_ds in rolling_ds:
    for val in window_ds:
        print(val.numpy(), end=" ")
    print()
```

```
0 1 2
1 2 3
2 3 4
3 4 5
4 5 6
5 6 7
6 7 8
7 8 9
8 9
9
```


```python
# drop_remainder=True keeps full windows only
rolling_ds = raw_ds.window(3, shift=1, drop_remainder=True)
for window_ds in rolling_ds:
    for val in window_ds:
        print(val.numpy(), end=" ")
    print()
```

```
0 1 2
1 2 3
2 3 4
3 4 5
4 5 6
5 6 7
6 7 8
7 8 9
```


```python
# get inputs in numpy format
# batch(5): the batch size is at least the window size (3), so each window becomes one tensor
ds = rolling_ds.flat_map(lambda window: window.batch(5))
for w in ds:
    print(w.numpy())
```

```
[0 1 2]
[1 2 3]
[2 3 4]
[3 4 5]
[4 5 6]
[5 6 7]
[6 7 8]
[7 8 9]
```


```python
# labeling - split each time window into features x and label y
ds = ds.map(lambda window: (window[:-1], window[-1]))
for x,y in ds:
    print(x.numpy(), y.numpy())
```

```
[0 1] 2
[1 2] 3
[2 3] 4
[3 4] 5
[4 5] 6
[5 6] 7
[6 7] 8
[7 8] 9
```


Sequence bias is when the order of things can impact the selection of things. For example, if I were to ask you your favorite TV show, and listed "Game of Thrones", "Killing Eve", "Travellers" and "Doctor Who" in that order, you're probably more likely to select 'Game of Thrones' as you are familiar with it and it's the first thing you see, even if it is equal to the other TV shows. When training on a dataset, we don't want the sequence to impact the training in a similar way, so it's good to shuffle the data.

```python
# shuffle to avoid sequence order bias at training.
# buffer_size - number of elements held in the buffer from which the next item is drawn at random;
# for a full shuffle it must be at least the dataset size.
ds = ds.shuffle(buffer_size=10)
for x,y in ds:
    print(x.numpy(), y.numpy())
```

```
[5 6] 7
[1 2] 3
[7 8] 9
[3 4] 5
[0 1] 2
[6 7] 8
[4 5] 6
[2 3] 4
```


```python
ds = ds.batch(2).prefetch(1)
for x,y in ds:
    print("x=", x.numpy().tolist())
    print("y=", y.numpy())
```

```
x= [[6, 7], [0, 1]]
y= [8 2]
x= [[4, 5], [3, 4]]
y= [6 5]
x= [[7, 8], [5, 6]]
y= [9 7]
x= [[2, 3], [1, 2]]
y= [4 3]
```
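
The steps above (window, flatten, label, shuffle, batch) can be collected into one helper. This is a sketch under assumptions: the function name `windowed_dataset` and its parameter names are choices made for this example, and `window_size` here counts only the input values, so each window holds `window_size + 1` elements including the label.

```python
import tensorflow as tf

def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    """Turn a 1-D series into shuffled batches of (inputs, label) pairs."""
    ds = tf.data.Dataset.from_tensor_slices(series)
    # window_size inputs plus 1 label per window; keep full windows only
    ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
    # each window is itself a Dataset; flatten it into a single tensor
    ds = ds.flat_map(lambda w: w.batch(window_size + 1))
    # shuffle whole windows to avoid sequence bias
    ds = ds.shuffle(shuffle_buffer)
    # last value is the label, the rest are the inputs
    ds = ds.map(lambda w: (w[:-1], w[-1]))
    return ds.batch(batch_size).prefetch(1)

dataset = windowed_dataset(list(range(10)), window_size=2, batch_size=2, shuffle_buffer=10)
for x, y in dataset:
    print("x=", x.numpy().tolist(), "y=", y.numpy().tolist())
```

The batch order changes from run to run because of the shuffle, but in every pair the label is the value that follows the inputs in the original series.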
