<a href="https://colab.research.google.com/github/jinhyung426/deeplearning.ai/blob/main/tf_chap4_Time_Series_(1).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# TensorFlow

## Part 4. Time Series

## (1) Basics of Time Series

**1. Time Series**
  - ordered sequence of values that are usually equally spaced over time
  - anything that has a time factor

**2. Types of Time Series**

**1) Univariate Time Series**
  - Only one variable is varying over time
  
  ex) stock price

**2) Multivariate Time Series**
  - More than one variable varies over time

  ex) birth rate vs. death rate, CO2 concentration vs. global temperature

  ex) movement of a body (can be modeled as several univariate series or as one combined multivariate series)

# Input Shape

**input_shape = (batch_size, timestep, dimension)**
 
 (1) dimension = 1  ----->  Univariate Time Series

 (2) dimension > 1  ----->  Multivariate Time Series

 (3) timestep => any sequence length is possible (pass `None` in Keras for a variable length)
 
 (4) batch_size => does not need to be specified in Keras
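
A minimal NumPy sketch of the shape convention above. The series values and the window length of 5 are arbitrary assumptions chosen for illustration; no Keras model is involved.

```python
import numpy as np

# Hypothetical univariate series of 20 values
series = np.arange(20, dtype=np.float32)

timestep = 5  # window length (arbitrary, for illustration only)

# Cut the series into non-overlapping windows: (batch_size, timestep)
windows = series.reshape(-1, timestep)

# Add a trailing axis for the feature dimension: (batch_size, timestep, 1)
univariate_batch = windows[..., np.newaxis]
print(univariate_batch.shape)  # (4, 5, 1)

# Stacking a second variable along the last axis makes the batch multivariate
second_variable = windows * 2.0
multivariate_batch = np.stack([windows, second_variable], axis=-1)
print(multivariate_batch.shape)  # (4, 5, 2)
```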

**3. Uses of Time Series Analysis**

**1) Imputation**
  - **Using present data to estimate past or missing data**
  - Filling holes and gaps with imputed values
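
One simple way to fill gaps is linear interpolation between the nearest known values. This is only a sketch: the notes do not name an imputation method, so the choice of `np.interp` and the sample series are assumptions.

```python
import numpy as np

# Hypothetical series with missing values marked as NaN
series = np.array([1.0, 2.0, np.nan, 4.0, np.nan, np.nan, 7.0])

time = np.arange(len(series))
known = ~np.isnan(series)

# Linear interpolation between the nearest known neighbours
imputed = series.copy()
imputed[~known] = np.interp(time[~known], time[known], series[known])
print(imputed)  # [1. 2. 3. 4. 5. 6. 7.]
```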

**2) Detect Anomalies**

- ex. detect hacker attacks

**3) Spot Patterns**

- ex. recognize words or subwords

**4. Common Patterns in Time Series**

**1) Trend**
  - Upwards facing trend
  - Downwards facing trend

**2) Seasonality**
  - patterns repeat at predictable intervals

  - ex. peaks on weekends, dips on weekdays

**3) Combination of Trend and Seasonality**
   
   - ex. overall upward trend with local seasonal peaks

**4) Random Data (Noise)**
  - nothing predictable to learn, so there is not much we can do
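
The patterns so far (trend, seasonality, noise) can be combined into a synthetic series. A rough sketch follows; the helper names, the cosine-shaped season, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def trend(time, slope=0.0):
    """Straight line through the origin with the given slope."""
    return slope * time

def seasonality(time, period, amplitude=1.0):
    """Repeat one cosine cycle every `period` steps."""
    return amplitude * np.cos(2 * np.pi * (time % period) / period)

def noise(time, noise_level=1.0, seed=None):
    """Gaussian white noise, one value per timestep."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=len(time)) * noise_level

time = np.arange(4 * 365)  # four years of daily steps
series = (
    10
    + trend(time, slope=0.05)
    + seasonality(time, period=365, amplitude=40)
    + noise(time, noise_level=5, seed=42)
)
```

Setting `slope=0` or `amplitude=0` isolates a single pattern, which helps when checking how a model reacts to each one.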

**5) Autocorrelated Time Series**
  - data that follows a predictable shape, even if the scale is different
  - it correlates with a delayed copy of itself; the delay is called the lag
  - no trend, no seasonality but the entire series isn't random
  - ex. a series with memory: each step depends on the previous ones, but spikes (innovations) appear at random timesteps

**6) Multiple Autocorrelated Series**
  - decay can be interrupted with another pulse
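
A small sketch of autocorrelation, assuming a first-order autoregressive process (AR(1)) as the example series. The coefficient `phi` and the helper names are illustrative choices, not taken from the notes.

```python
import numpy as np

def autocorrelated_series(n_steps, phi=0.8, seed=None):
    """AR(1): each value is phi times the previous value plus a random innovation."""
    rng = np.random.default_rng(seed)
    innovations = rng.normal(size=n_steps)
    series = np.zeros(n_steps)
    for t in range(1, n_steps):
        series[t] = phi * series[t - 1] + innovations[t]
    return series

def lag_correlation(series, lag):
    """Pearson correlation between the series and a copy delayed by `lag` steps."""
    return np.corrcoef(series[:-lag], series[lag:])[0, 1]

series = autocorrelated_series(5000, phi=0.8, seed=0)
white_noise = np.random.default_rng(1).normal(size=5000)

print(lag_correlation(series, lag=1))       # should land near phi
print(lag_correlation(white_noise, lag=1))  # should land near zero
```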

**5. Real World Data & Machine Learning**

- **Real World** : Trend + Seasonality + Autocorrelation + Noise
- **Machine Learning** : spot patterns and make predictions (except noise) under the assumption that patterns in the past will continue in the future

**6. Stationary VS Non-Stationary Time Series**

**(1) Stationary Time Series**
  - behavior doesn't change over time 
  - more data results in better prediction

**(2) Non-Stationary Time Series**
  - behavior changes over time

  ex. A positive trend may turn negative after an unexpected event such as a financial crisis or a tech innovation
  - the optimal time window for training varies: more data is not always better, because old data may no longer reflect current behavior

**7. Fixed Partitioning VS Roll Forward Partitioning**

**(1) Fixed Partitioning**
- each period (train / validation / test) should contain a whole number of seasons

- [Step 1] Train with train data
- [Step 2] Evaluate with val data
- [Step 3] Retrain with train + val data
- [Step 4] Evaluate with test data
- [Step 5] Retrain with all data, including the test data, since the test data is the most recent and therefore carries the strongest signal
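
The split itself can be sketched as index slicing. The series and the split points below are assumptions; they are chosen so that each period spans whole years.

```python
import numpy as np

# Hypothetical daily series covering four years
series = np.arange(4 * 365, dtype=np.float32)

# Illustrative split points on year boundaries, so every
# period sees the full yearly seasonality
train_end = 2 * 365
valid_end = 3 * 365

x_train = series[:train_end]
x_valid = series[train_end:valid_end]
x_test = series[valid_end:]

print(len(x_train), len(x_valid), len(x_test))  # 730 365 365
```

Unlike a random split, the order is preserved: validation always follows training in time, and test follows validation.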

**(2) Roll Forward Partitioning**
- [Step 1] Start with a short training period and gradually increase it
- [Step 2] At each step, use the model to forecast the following day in the validation period
- [Step 3] This amounts to doing fixed partitioning many times, continually refining the model
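
A rough sketch of the roll-forward loop. The "model" here is only a stand-in (the mean of the last 7 values), and both function names are hypothetical.

```python
import numpy as np

def forecast_next(history, window=7):
    """Stand-in model (an assumption): predict the mean of the last `window` values."""
    return history[-window:].mean()

def roll_forward_forecasts(series, initial_train_size):
    """Forecast one step ahead, then grow the training period by one step and repeat."""
    forecasts = []
    for split in range(initial_train_size, len(series)):
        history = series[:split]  # training period grows each iteration
        forecasts.append(forecast_next(history))
    return np.array(forecasts)

series = np.arange(30, dtype=np.float64)
forecasts = roll_forward_forecasts(series, initial_train_size=20)
errors = forecasts - series[20:]
print(errors)  # every error is -4.0 for this linear series
```

With a real model, `forecast_next` would retrain (or fine-tune) on `history` before predicting, which is what makes this approach more expensive than fixed partitioning.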


**8. Metrics**

- errors = forecast - true

(1) **MSE** = np.square(errors).mean()
  - use when large errors are disproportionately costly, since squaring penalizes them heavily
  - sensitive to outliers

(2) **RMSE** = np.sqrt(mse)

(3) **MAE** = np.abs(errors).mean()

(4) **MAPE** = np.abs(errors / x_valid).mean()
  - mean ratio of the absolute error to the absolute true value

(5) **Huber Loss**
  - less sensitive to outliers than MSE: quadratic for small errors, linear for large ones
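
The metrics above, computed on a tiny made-up example. The `huber` helper and its threshold `delta=1.0` are assumptions written out for illustration; the sample values are arbitrary.

```python
import numpy as np

def huber(errors, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it."""
    abs_errors = np.abs(errors)
    quadratic = 0.5 * np.square(errors)
    linear = delta * (abs_errors - 0.5 * delta)
    return np.where(abs_errors <= delta, quadratic, linear).mean()

x_valid = np.array([10.0, 20.0, 30.0, 40.0])
forecast = np.array([12.0, 18.0, 30.0, 50.0])
errors = forecast - x_valid             # [ 2. -2.  0. 10.]

mse = np.square(errors).mean()          # 27.0
rmse = np.sqrt(mse)
mae = np.abs(errors).mean()             # 3.5
mape = np.abs(errors / x_valid).mean()  # 0.1375
huber_loss = huber(errors, delta=1.0)   # 3.125
```

The single large error (10) dominates the MSE, while MAE and Huber loss grow only linearly with it.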

**9. Moving Average**

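- averaging window : the forecast is the mean of the last N values, which reduces noise but does not anticipate trend or seasonality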
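
A minimal sketch of a moving average forecast using a trailing window. The function name and the cumulative-sum implementation are illustrative choices.

```python
import numpy as np

def moving_average_forecast(series, window_size):
    """Forecast each step as the mean of the previous `window_size` values.

    The result has len(series) - window_size entries; entry i is the
    forecast for series[i + window_size].
    """
    cumulative = np.cumsum(np.insert(series, 0, 0.0))
    window_sums = cumulative[window_size:] - cumulative[:-window_size]
    # The last window has no following value to forecast, so drop it
    return window_sums[:-1] / window_size

series = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(moving_average_forecast(series, window_size=3))  # [2. 3. 4.]
```

On this upward series every forecast lags behind the true value (2 vs. 4, 3 vs. 5, 4 vs. 6), which shows why a plain moving average struggles with trend.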

**10. Differencing & Smoothing both past and present values**
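- differencing removes trend and seasonality

- ex. **forecast the differenced series value(t) - value(t-365), then add value(t-365) back**
- to remove the noise carried over from the past values, also smooth value(t-365) with a moving average
- final forecast = moving average of the differenced series + moving average of the past series
- Trailing window for present values / Centered window for past values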
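
The steps above can be sketched end to end on a synthetic series. Everything here is an assumption made for illustration: the noise-free series, the window sizes (50 and 10), and the helper function.

```python
import numpy as np

def moving_average_forecast(series, window_size):
    """Entry i is the mean of series[i:i + window_size], used as the forecast for the next step."""
    cumulative = np.cumsum(np.insert(series, 0, 0.0))
    window_sums = cumulative[window_size:] - cumulative[:-window_size]
    return window_sums[:-1] / window_size

period = 365
time = np.arange(4 * period)
series = 0.1 * time + 10 * np.sin(2 * np.pi * time / period)  # trend + seasonality
split_time = 3 * period                                       # validation starts here

# 1. Differencing: diff_series[i] = series[i + period] - series[i]
diff_series = series[period:] - series[:-period]

# 2. Trailing moving average of the differenced series, aligned to the validation period
window = 50
diff_ma = moving_average_forecast(diff_series, window)
diff_ma_valid = diff_ma[split_time - period - window:]

# 3. Add back the raw past values (one period earlier)
forecast = series[split_time - period:-period] + diff_ma_valid

# 4. Alternative: add back past values smoothed with a roughly centered window
half = 5
past_smoothed = moving_average_forecast(
    series[split_time - period - half:-period + half], 2 * half
)
forecast_smoothed = past_smoothed + diff_ma_valid

x_valid = series[split_time:]
print(np.abs(forecast - x_valid).mean())
print(np.abs(forecast_smoothed - x_valid).mean())
```

A centered window is only possible for past values, because it needs data on both sides of the point being smoothed; for present values the future is unknown, so only a trailing window can be used.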

**11. Creating Data**

In [None]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

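dataset = tf.data.Dataset.range(10)
# Slide a window of 5 values one step at a time; drop incomplete windows at the end
dataset = dataset.window(5, shift=1, drop_remainder=True)
# Each window is itself a dataset; batch(5) turns it into a single tensor
dataset = dataset.flat_map(lambda window: window.batch(5))
# Split each window into features (first 4 values) and label (last value)
dataset = dataset.map(lambda window: (window[:-1], window[-1:]))
# Shuffle to avoid sequence bias (the model favoring items because of their order)
dataset = dataset.shuffle(buffer_size=10)
# Group the windows into batches of 2 and prefetch one batch while the current one is processed
dataset = dataset.batch(2).prefetch(1)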

for x, y in dataset:
    print("x : ", x.numpy())
    print("y : ", y.numpy())

x :  [[0 1 2 3]
 [5 6 7 8]]
y :  [[4]
 [9]]
x :  [[4 5 6 7]
 [2 3 4 5]]
y :  [[8]
 [6]]
x :  [[1 2 3 4]
 [3 4 5 6]]
y :  [[5]
 [7]]
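
To sanity-check which windows the pipeline above should produce, the same feature/label pairs can be built with plain NumPy. This is only a sketch: `make_windows` is a hypothetical helper, and it omits the shuffling and batching steps.

```python
import numpy as np

def make_windows(series, window_size):
    """Return (features, labels): each feature row holds `window_size`
    consecutive values and its label is the value that follows."""
    n_windows = len(series) - window_size
    features = np.stack([series[i:i + window_size] for i in range(n_windows)])
    labels = np.array([series[i + window_size] for i in range(n_windows)])
    return features, labels[:, np.newaxis]

features, labels = make_windows(np.arange(10), window_size=4)
print(features.shape, labels.shape)  # (6, 4) (6, 1)
print(features[0], labels[0])        # [0 1 2 3] [4]
```

The six pairs match the six (x, y) rows printed above, only in their original order.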
