# Course Outline
Week 1 - focus on time series types and basic forecasting on them. Start preparing time series for running ML algorithms on them (e.g. splitting into training, validation and test sets)
Week 2 - forecasting using Dense model vs. Naive prediction
Week 3 - Using RNNs to forecast time series, stateful vs. stateless approaches, training on windows of data
Week 4 - Forecasting real-world data using CNN

## Time series examples
TS are everywhere: stock prices, weather forecasts, historical trends (e.g. Moore's Law).
A TS is an ordered sequence of values that are usually equally spaced over time.

Univariate TS - has a single value at each timestamp; Multivariate TS - has multiple values at each timestamp (e.g. Birth and Death rate or Global Temperature vs. CO2 concentration)

## ML applied to time series
What types of things can we do with ML over time series?
1. Prediction/forecasting based on the data. We can predict future values (e.g. birth vs. death rates to plan retirement/immigration programs in the country) or past values to see how we got to where we are now (imputation). Imputation also covers filling gaps in the data, e.g. predicting missing intermediate values.
2. Detect anomalies, for example in website logs so that we can see potential DoS attacks
3. Spot patterns to determine what generated the series itself, e.g. analyse sound waves to spot words in them.

## Common patterns in time series
There are a number of very common patterns
1. Trend - the time series moves in a specific direction (e.g. Moore's Law)
2. Seasonality - patterns repeat at predictable intervals (active users on the website)
3. Combo of both Trend and Seasonality
4. Unpredictable - aka white noise, not much we can do here
5. Autocorrelated - the series correlates with a delayed copy of itself; the delay is often called a "lag"

Often in real world data sets we have to deal with all types in one TS.
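
A minimal sketch of how these patterns combine, assuming NumPy. The helper names (`trend`, `seasonality`, `white_noise`, `autocorrelation`) and all parameter values are made up for illustration and are not from the course material:

```python
import numpy as np

def trend(time, slope=0.0):
    """Straight line: the series moves by `slope` per time step."""
    return slope * time

def seasonality(time, period, amplitude=1.0):
    """Sine pattern that repeats every `period` steps."""
    return amplitude * np.sin(2 * np.pi * (time % period) / period)

def white_noise(time, noise_level=1.0, seed=None):
    """Unpredictable component: independent Gaussian noise."""
    rng = np.random.default_rng(seed)
    return rng.normal(scale=noise_level, size=len(time))

def autocorrelation(x, lag):
    """Pearson correlation between the series and a copy delayed by `lag` steps."""
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]

# Four "years" of daily values: baseline + trend + yearly seasonality + noise
time = np.arange(4 * 365)
series = (10
          + trend(time, slope=0.05)
          + seasonality(time, period=365, amplitude=40)
          + white_noise(time, noise_level=5, seed=42))

# With a yearly season, the series should correlate strongly with itself at lag 365
yearly_autocorr = autocorrelation(series, lag=365)
```

Plotting `series` against `time` shows the upward drift, the yearly wave and the jitter on top of both.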

## Train, validation and test sets (to measure performance)
*Fixed partitioning* - splits the dataset into Training, Validation and Test periods. If the data has seasonality, we have to make sure that every period covers complete seasons (e.g. whole years).
First train the model on the training period and validate it on the validation period. Here we can experiment with the architecture and hyperparameters until we get the desired performance.
After that we can retrain the model on train + validation data and test it on the test period to see if the model performs as well.
If it does, we take the usual step of retraining once more, this time including the test data, since the most recent data is usually the most informative about the future.
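
Fixed partitioning comes down to slicing the series in time order. A small sketch, assuming a 1-D NumPy array of daily values; `fixed_partition` and the split points are hypothetical names and values chosen so that each period covers whole years:

```python
import numpy as np

def fixed_partition(series, train_end, valid_end):
    """Split chronologically into [0, train_end), [train_end, valid_end), [valid_end, end)."""
    return series[:train_end], series[train_end:valid_end], series[valid_end:]

# Placeholder data: 5 "years" of daily values
series = np.arange(5 * 365, dtype=float)

# 3 years for training, 1 year for validation, 1 year for testing
x_train, x_valid, x_test = fixed_partition(series, train_end=3 * 365, valid_end=4 * 365)
```

Unlike a random train/test split, the order is never shuffled, so the model is always evaluated on data that comes after what it was trained on.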

*Roll-forward partitioning* - gradually increases the training period (e.g. by 1 day/week at a time). At each iteration we train the model on the training period and use it to forecast the following day or week of the validation period.
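
One way to sketch roll-forward partitioning is a generator that yields index ranges. `roll_forward_splits` and its parameters are hypothetical names; the model training step itself is left out:

```python
import numpy as np

def roll_forward_splits(n_samples, initial_train_size, horizon):
    """Yield (train_indices, valid_indices); the training period grows by `horizon` each step."""
    train_end = initial_train_size
    while train_end + horizon <= n_samples:
        train_idx = np.arange(0, train_end)
        valid_idx = np.arange(train_end, train_end + horizon)
        yield train_idx, valid_idx
        train_end += horizon

# 4 weeks of daily data: start with 2 weeks of training, forecast 1 week at a time
splits = list(roll_forward_splits(n_samples=28, initial_train_size=14, horizon=7))
for train_idx, valid_idx in splits:
    print(f"train on days 0..{train_idx[-1]}, forecast days {valid_idx[0]}..{valid_idx[-1]}")
```

Each iteration would retrain the model on `train_idx` and score it on `valid_idx`, which costs more compute than fixed partitioning but mimics how the model would be used in production.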

## Metrics for evaluating performance
```jupyterpython
import numpy as np  # forecasts and actual are assumed to be equal-length NumPy arrays

errors = forecasts - actual
mse = np.square(errors).mean() # Mean Squared Error - squaring keeps negative errors from cancelling positive ones and penalises large errors more
rmse = np.sqrt(mse) # Root Mean Squared Error - brings the error back to the scale of the original values
mae = np.abs(errors).mean() # Mean Absolute Error (also called Mean Absolute Deviation)
mape = np.abs(errors / actual).mean() # Mean Absolute Percentage Error - mean ratio between the absolute error and the absolute actual value; shows the size of the error relative to the values
```

mae vs. mse: if large errors are potentially dangerous and cost much more than smaller errors, then you'd prefer mse.
If the gain or loss is proportional to the size of the error, then mae would be better.
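
A small worked example of that difference, with made-up numbers: both forecasts below have the same mae, but the one with a single large miss gets a much higher mse.

```python
import numpy as np

def mae(forecasts, actual):
    return np.abs(forecasts - actual).mean()

def mse(forecasts, actual):
    return np.square(forecasts - actual).mean()

actual = np.array([10.0, 10.0, 10.0, 10.0])
steady = np.array([12.0, 12.0, 12.0, 12.0])  # off by 2 at every step
spiky = np.array([10.0, 10.0, 10.0, 18.0])   # perfect except for one error of 8

print(mae(steady, actual), mse(steady, actual))  # 2.0 4.0
print(mae(spiky, actual), mse(spiky, actual))    # 2.0 16.0
```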

Naive Forecast MAE:
```jupyter
keras.metrics.mean_absolute_error(x_valid, naive_forecast).numpy()
```
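
The notes don't define the naive forecast. The sketch below assumes the usual definition: the forecast for each step is the value of the previous step. `split_time` and the sample values are made up, and the MAE is computed with plain NumPy instead of Keras:

```python
import numpy as np

series = np.array([3.0, 5.0, 4.0, 6.0, 7.0, 9.0])
split_time = 3  # hypothetical index where the validation period starts

x_valid = series[split_time:]                # [6, 7, 9]
naive_forecast = series[split_time - 1:-1]   # [4, 6, 7]: each value shifted one step forward

# Same quantity as the Keras call, computed by hand
naive_mae = np.abs(naive_forecast - x_valid).mean()
```

This MAE is the baseline that any more complex model has to beat.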

## Moving Average and Differencing
MA is a simple forecasting method: the forecast is the mean of the values over a fixed preceding period (e.g. 30 days), called the *averaging window*.
It eliminates a lot of the noise and gives a curve that roughly follows the original series, but it does not anticipate trend or seasonality.
Depending on the series, it can therefore perform worse than a naive forecast.
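
A minimal sketch of a moving average forecast, assuming a 1-D NumPy array. `moving_average_forecast` is a hypothetical helper built on `np.convolve`; other implementations (e.g. a loop or a cumulative sum) work just as well:

```python
import numpy as np

def moving_average_forecast(series, window_size):
    """Forecast each step as the mean of the previous `window_size` values.

    result[i] is the forecast for series[i + window_size].
    """
    weights = np.ones(window_size) / window_size
    # "valid" mode gives the mean of every full window; the last window would
    # forecast a step beyond the end of the series, so it is dropped.
    return np.convolve(series, weights, mode="valid")[:-1]

series = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
forecast = moving_average_forecast(series, window_size=2)  # forecasts for series[2:]
```

On this steadily rising series the forecasts (1.5, 2.5, 3.5) lag behind the actual values (3, 4, 5), which illustrates why an MA does not anticipate a trend.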

To avoid this, we can remove trend and seasonality from the time series with a technique called *differencing*.
Instead of studying the time series itself, we study the difference between the value at time t and the value one period earlier, e.g. Series(t) - Series(t-365).
After computing an MA on the differenced series, we have to add back the earlier value to get a forecast for the original series:

Forecasts = MA of differenced series + series(t-365)

There might still be quite a lot of noise, which comes from the past values we added back. We can smooth those past values by applying a moving average to them as well:

Forecasts = trailing MA of differenced series + centered MA of past series (t-365)
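
The first formula (MA of the differenced series plus the value one period earlier) can be sketched as below. The helper names are hypothetical, and a period of 7 is used instead of 365 only to keep the example small; smoothing of the past values is not included here:

```python
import numpy as np

def trailing_ma(series, window):
    """Mean of the `window` values before each step; result[i] forecasts series[i + window]."""
    return np.convolve(series, np.ones(window) / window, mode="valid")[:-1]

def diff_ma_forecast(series, period, window):
    """Forecasts = trailing MA of differenced series + series(t - period).

    Returns (start, forecasts), where forecasts[i] is the forecast for series[start + i].
    """
    diff = series[period:] - series[:-period]   # diff[j] = series[j + period] - series[j]
    diff_ma = trailing_ma(diff, window)         # diff_ma[i] forecasts diff[i + window]
    start = period + window                     # first time step that gets a forecast
    past = series[start - period:len(series) - period]  # series(t - period) for t >= start
    return start, diff_ma + past

# Toy series: linear trend + a pattern that repeats every 7 steps, no noise
time = np.arange(28)
pattern = np.array([0.0, 1.0, 4.0, 2.0, 0.0, -1.0, 3.0])
series = 0.5 * time + pattern[time % 7]

start, forecasts = diff_ma_forecast(series, period=7, window=3)
```

Because this toy series has no noise, the differenced series is constant and the forecasts reproduce `series[start:]` exactly; on real data the MA only averages the noise down.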

## Trailing vs. Centered Windows
Note that we use a trailing window when computing the moving average of present values, e.g. from t - 30 to t - 1.
When we compute the moving average of past values from one year ago, we use a centered window instead, e.g. from t - 1 year - 5 days to t - 1 year + 5 days.
Moving averages over centered windows can be more accurate than those over trailing windows.
We can't use centered windows to smooth present values, since we don't know future values yet.
To smooth past values, however, we can afford to use centered windows.
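
A small sketch of the two window types on a noise-free rising series, with hypothetical helper names. It shows why a centered window can be more accurate: the trailing mean lags behind the value at t, while the centered mean lands on it.

```python
import numpy as np

def trailing_mean(series, t, window):
    """Mean of series[t - window : t]; uses only values before t."""
    return series[t - window:t].mean()

def centered_mean(series, t, half_width):
    """Mean of series[t - half_width : t + half_width + 1]; needs values after t."""
    return series[t - half_width:t + half_width + 1].mean()

series = np.arange(20.0)  # steadily rising: series[t] == t

print(trailing_mean(series, t=10, window=5))      # 7.0  (lags behind series[10] == 10)
print(centered_mean(series, t=10, half_width=5))  # 10.0 (matches series[10])
```

The centered call reads `series[11]` to `series[15]`, which is only possible when t lies in the past, e.g. when smoothing the values from one year ago.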