### Time-Series:

#### Concept of Time-Series:

- Let's plot the number of items sold on Amazon over time.
- Here the $Y$-axis $\to$ number of items sold and the $X$-axis $\to$ time, which could be in weeks/months/years or at a finer granularity like minutes/hours.
<img src='https://drive.google.com/uc?id=1cDRfNQ6JuN-ug-KItoTSycozZ0ZUTf_5'>
- The pattern could repeat over time. It could be an hourly, weekly, seasonal, weather-driven or yearly pattern.
<img src='https://drive.google.com/uc?id=1-Y9SDKDMzbK2RIk_Jsii-aEIYMg0Dp9K'>


#### Forecasting:

The most interesting problem in Time-Series is Forecasting.


##### Task:

- Given historical data $x_1, x_2, \dots, x_{t-1}, x_t$ of the number of items sold on Amazon up to time $t$, predict $x_{t+1}, x_{t+2}, \dots$
<img src='https://drive.google.com/uc?id=1IZRPQbWYUaj0vQE9zMJuZ3kLuFOXARfC'>

So, if we are given historical hourly sales data for Amazon, can we predict how many sales will happen in the future?


***Question:*** How do we predict future data using historical data?

<img src='https://drive.google.com/uc?id=1iH-mXVEvUhOHguTQfoqZT2yCRxXcvQFb'>

##### Approaches:

###### Average of last 3 months:

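- This may not work. For example, if we compute $x_{t+1}$ as the average of the last 3 months, the forecast can fall sharply compared to the most recent values, because a plain average ignores trend and seasonality.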
<img src='https://drive.google.com/uc?id=15dmpPRI8ywn0VRT51j6M1xdpW6FBN0mt'>
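
A minimal sketch of this baseline, assuming a hypothetical monthly series with a steady upward trend (the function name `last_k_average_forecast` and the numbers are illustrative, not taken from the plots above). It shows one way the plain average can mislead: the forecast lands below the latest observation even though sales keep rising.

```python
from statistics import mean


def last_k_average_forecast(history, k=3):
    """Forecast the next value as the plain average of the last k observations."""
    if len(history) < k:
        raise ValueError("need at least k observations")
    return mean(history[-k:])


# Hypothetical monthly sales with a steady upward trend.
monthly_sales = [100, 120, 140, 160, 180, 200]
forecast = last_k_average_forecast(monthly_sales, k=3)
print(forecast)  # 180, which is below the latest value of 200
```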


###### Regression/Curve Fitting:

- This could be a valid approach.
- We can have features like Weather, Holiday, Time (hrs), Day of Week (DOW) and Month of Year (MOY), with Sales per hour as the target.
<img src='https://drive.google.com/uc?id=1ev9oKYwlh_GmTlqKtsKAcvvnBndUXNW6'>




***Flaw:***
- However, there is a flaw with this approach. It may not capture the trend over the years.
- For example, in a growing startup, sales on a chilly Saturday in October during the holiday season might keep increasing over the years, even though all the feature values stay the same.
<img src='https://drive.google.com/uc?id=1iZSmjlNPGUV_M_m7YTVgE7RaAyZ_rukc'>

***Solution:***
- Create a feature $\to$ Year of Operation. Let's say the company was started in 2012. Then the feature values will be: 0 $\to$ 2012, 1 $\to$ 2013, 2 $\to$ 2014 and so on.
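
The time-based features above can be sketched as follows. This is only an illustration: the function name `time_features` and the dictionary keys are hypothetical, and the Weather and Holiday features are left out because they would come from external data sources.

```python
from datetime import datetime

START_YEAR = 2012  # founding year from the example above


def time_features(ts: datetime) -> dict:
    """Derive calendar features plus the 'year of operation' trend feature."""
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),  # Monday = 0 ... Sunday = 6
        "month_of_year": ts.month,
        "year_of_operation": ts.year - START_YEAR,  # 2012 -> 0, 2013 -> 1, ...
    }


# A Saturday afternoon in October 2014.
features = time_features(datetime(2014, 10, 18, 15))
print(features)
```

The `year_of_operation` column is what lets a regression model separate the year-over-year trend from the repeating calendar pattern.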

###### Weighted Average:

- In a weighted moving average, we take a weighted average of the last $k$ observations.
<img src='https://drive.google.com/uc?id=1lrQRk6RWt7T0RzWcBVNjQQORz9h34xJ7'>

- $\hat{x}_{t+1}$ is computed as $\alpha_0 x_t + \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \dots + \alpha_k x_{t-k}$ divided by $\alpha_0 + \alpha_1 + \alpha_2 + \dots + \alpha_k$.
- Mathematically, it can be written as:
$$\hat{x}_{t+1} = \frac{\sum_{i=0}^{k} \alpha_i \, x_{t-i}}{\sum_{i=0}^{k} \alpha_i}$$
<img src='https://drive.google.com/uc?id=1gXZN13RzH5H-wSIVnHXXY05189MZDZD9'>
- Note that we use the observations in the window from $x_{t-k}$ to $x_t$ and give each of them a weight. The most recent observations get higher weights and older observations get lower weights.
- So, $\alpha_0 > \alpha_1 > \alpha_2 > \dots > \alpha_k$.
- Using data from $x_{t-k}, \dots, x_t$ we can predict $\hat{x}_{t+1}$. Similarly, using $x_{t-k+1}, \dots, x_t, \hat{x}_{t+1}$ we can predict $\hat{x}_{t+2}$.
- Thus, we can keep predicting into the future.
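
The weighted moving average and the "keep predicting into the future" step can be sketched as below. The function names and the weights `[3, 2, 1]` are assumptions chosen for illustration; the only requirement taken from the text is that more recent observations get larger weights.

```python
def wma_forecast(history, weights):
    """One-step forecast. weights[i] is alpha_i, applied to x_{t-i}."""
    if len(history) < len(weights):
        raise ValueError("need at least len(weights) observations")
    recent_first = history[::-1][:len(weights)]  # x_t, x_{t-1}, ...
    numerator = sum(a * x for a, x in zip(weights, recent_first))
    return numerator / sum(weights)


def wma_forecast_ahead(history, weights, steps):
    """Multi-step forecast: each prediction is fed back in as if observed."""
    extended = list(history)
    forecasts = []
    for _ in range(steps):
        next_value = wma_forecast(extended, weights)
        forecasts.append(next_value)
        extended.append(next_value)
    return forecasts


sales = [10, 12, 14, 16]
weights = [3, 2, 1]  # alpha_0 > alpha_1 > alpha_2
forecasts = wma_forecast_ahead(sales, weights, steps=2)
print(forecasts)
```

Because later forecasts are built on earlier forecasts, errors can accumulate the further ahead we predict.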


- Weighted moving average (WMA) is a type of regression with manual weights.
<img src='https://drive.google.com/uc?id=1ioOsYvo6hO8NkXhlxz57RjGGlHacIHou'>
- We can also use WMA with computed weights, i.e. we estimate the weights $\alpha_i$ from the data instead of setting them manually.



#### Train/Test Split in Time-Series:

***Question:*** How do we split train/test data? Till now, we used random train-test split. Will this approach work in Time-Series as well?

***Answer:***

- No, random sampling will fail in Time-Series.
    <img src='https://drive.google.com/uc?id=1k8QmK7KTQiapezyOYKpdgTC-i4BJHqbf'>
- With a random split, the training data might contain future observations while the test sample contains older ones. The model has then already seen the future and may give excellent predictions that are too good to be true.

***Solution:***

- We sequentially break the data into Train, Validation and Test splits, so that the oldest observations are used for training and the most recent ones for testing.
    <img src='https://drive.google.com/uc?id=1T14QoZsKHEFlNjNxdBjWxaSeiIUxNn9O'>
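
A minimal sketch of a sequential split, assuming the series is already sorted by time. The function name `sequential_split` and the split sizes are hypothetical choices for illustration.

```python
def sequential_split(series, n_val, n_test):
    """Split a time-ordered series into train, validation and test blocks."""
    n_train = len(series) - n_val - n_test
    if n_train <= 0:
        raise ValueError("not enough observations for the requested split")
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, val, test


# Ten time-ordered observations; here the values double as time indices.
observations = list(range(10))
train, val, test = sequential_split(observations, n_val=2, n_test=2)
print(train, val, test)  # [0, 1, 2, 3, 4, 5] [6, 7] [8, 9]
```

Every training point comes before every validation point, which in turn comes before every test point, so the model never sees the future during training.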



#### Resampling:

##### Hourly $\to$ Daily:

Suppose we are given Hourly time-series data, then how do we convert it to daily time-series?

***Answer:***
- We sum up the data in each 24-hour window, so 1 point now represents 1 day.
<img src='https://drive.google.com/uc?id=1z4CO-Lc_qxHya-VoZE1XaO4Z5q5i1-NU'>
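
The hourly-to-daily aggregation can be sketched in plain Python as below, assuming the data is a list of `(timestamp, value)` pairs (a hypothetical representation chosen for illustration). In pandas, the same kind of aggregation is usually done with `Series.resample`.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_to_daily(hourly):
    """Sum hourly (timestamp, value) pairs into one total per calendar day."""
    daily = defaultdict(float)
    for ts, value in hourly:
        daily[ts.date()] += value
    return dict(daily)


# Two full days of hypothetical data: 5 items sold every hour.
start = datetime(2023, 1, 1)
hourly = [(start + timedelta(hours=h), 5) for h in range(48)]
daily = hourly_to_daily(hourly)
print(daily)
```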


##### Hourly $\to$ Minutely:

Suppose we are given hourly time-series data. How do we get a more granular time-series?

***Answer:***
- Assuming a uniform spread, we divide the hourly value by 60 and assign the result to each minute in that hour.
<img src='https://drive.google.com/uc?id=1Uwgo37_FfNYxW-xhLrFQTNkkf75tg-Fr'>
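
A minimal sketch of the uniform-spread idea, again assuming a hypothetical list of `(timestamp, value)` pairs where each timestamp marks the start of an hour.

```python
from datetime import datetime, timedelta


def hourly_to_minutely(hourly):
    """Spread each hourly total evenly over the 60 minutes of that hour."""
    minutely = []
    for ts, value in hourly:
        per_minute = value / 60
        for minute in range(60):
            minutely.append((ts + timedelta(minutes=minute), per_minute))
    return minutely


hourly = [(datetime(2023, 1, 1, 9), 120), (datetime(2023, 1, 1, 10), 60)]
minutely = hourly_to_minutely(hourly)
print(len(minutely), minutely[0], minutely[-1])
```

The per-minute values within an hour still add up to the original hourly total, but any real within-hour variation is lost under this assumption.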


#### Missing values treatment:

Imagine we have some missing data. How do we impute it?

***Possible Approaches:***
- Drop values? We cannot simply drop missing values in time-series data, because that breaks the regular spacing between observations.
- One option is to fill the missing value with the previous data point, i.e. $x_t = x_{t-1}$.
- We could calculate the missing value as the average of the previous and next data points, i.e.
$$x_t = \frac{x_{t-1} + x_{t+1}}{2}$$
<img src='https://drive.google.com/uc?id=1fLNlowuK8RMgnUStTXUJYYmvBbeA-bFU'>
- There is another option known as Centered Moving Average (CMA).
<img src='https://drive.google.com/uc?id=1_m71-rte8O09QlDxW7aRRugWTPygGx16'>
- This CMA is a generalization of looking back and forward for missing value imputation.
- Sometimes, people also use a concept called Interpolation.
<img src='https://drive.google.com/uc?id=11a-4m9xpdnG-Of2G8E7dmX0EN0ZsgW-O'>
- Imagine we have 2 observations $t-1$ and $t+1$ with $t^{th}$ observation missing. Then:
$$x_t = \frac{x_{t-1} + x_{t+1}}{2}$$
- This can be thought of as CMA with $m = 1$, i.e. one observation on each side of the missing point.
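
The imputation options above can be sketched as follows. Missing points are represented as `None`, and $m$ is assumed to mean the number of observations taken on each side of the missing point; both choices, and the function names, are assumptions made for illustration.

```python
def forward_fill(series):
    """Replace each missing value with the previous value (x_t = x_{t-1})."""
    filled = []
    last_seen = None
    for value in series:
        if value is None:
            value = last_seen  # stays None if nothing has been seen yet
        filled.append(value)
        last_seen = value
    return filled


def centered_moving_average_fill(series, m=1):
    """Replace each missing value with the mean of the observed values
    among the m points before it and the m points after it."""
    filled = list(series)
    for t, value in enumerate(series):
        if value is None:
            window = series[max(0, t - m):t] + series[t + 1:t + 1 + m]
            observed = [v for v in window if v is not None]
            if observed:
                filled[t] = sum(observed) / len(observed)
    return filled


sales = [10, 12, None, 16, 18]
print(forward_fill(sales))                          # [10, 12, 12, 16, 18]
print(centered_moving_average_fill(sales, m=1))     # [10, 12, 14.0, 16, 18]
```

With `m=1` and both neighbours present, the centered average gives the same value as the midpoint formula $\frac{x_{t-1} + x_{t+1}}{2}$.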

#### Outliers/Anomalies:

- In the graph below, you can see sudden spikes or dips in the number of items sold on an hourly basis.
<img src='https://drive.google.com/uc?id=1WdFEij9hNQRjDFF66UINQ5AGXnGtiajV'>
- A sudden spike could be caused by a celebrity endorsing a product, which leads to a buying surge, whereas a sudden dip could be due to servers being down or the website not working correctly, which results in a loss of revenue.

So how do we find such sudden changes? Let's discuss some of the approaches:

##### Percentage Change:

- Between every pair of consecutive observations $x_{t-1}$ and $x_t$, we get a percentage change as follows:
$$\Delta_t = \frac{x_t - x_{t-1}}{x_{t-1}}$$
<img src='https://drive.google.com/uc?id=1phnlnQvud2FhRoa9xhLgHP0-UGa6N_iN'>
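- Over the values $\Delta_1, \Delta_2, \Delta_3, \dots, \Delta_t$, we compute the IQR and flag the changes that fall far outside it as outliers (for example, using the common $1.5 \times$ IQR rule).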
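
The percentage-change check can be sketched as below. The $1.5 \times$ IQR fences are a common convention assumed here, not something fixed by the method, and the function names and sales numbers are hypothetical.

```python
from statistics import quantiles


def percentage_changes(series):
    """Relative change between consecutive observations.
    Assumes no observation is zero, otherwise the division fails."""
    return [(curr - prev) / prev for prev, curr in zip(series, series[1:])]


def iqr_outlier_indices(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical hourly sales with one spike at position 5.
sales = [100, 102, 101, 103, 102, 250, 104, 103, 105]
deltas = percentage_changes(sales)
flagged = iqr_outlier_indices(deltas)
print(flagged)  # [4, 5]: the jump into the spike and the drop back out of it
```

Note that `deltas[i]` describes the change from `sales[i]` to `sales[i + 1]`, so a single spike shows up as two flagged changes.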

##### Isolation Forest/ OneClass SVM/ DBSCAN:

- Isolation Forest (iForest), One-Class SVM and DBSCAN work on tabular data.
<img src='https://drive.google.com/uc?id=1BCe9bar5Ta9ncBDYtb9JkZnZKYTC-Kbt'>
- Note that Time-Series data is sequential data. Hence, the above approaches would not work here.

##### Regression:

- For every time $t$, we will predict $\hat{x}_{t+1}$ using Regression/WMA.
<img src='https://drive.google.com/uc?id=1PdAvdJh-4DYgGcQnF5sTlKZPCZeQQlGR'>
- Now we have both the observed value $x_{t+1}$ and the predicted value $\hat{x}_{t+1}$.
<img src='https://drive.google.com/uc?id=1YgSCEKz6r5QCZgGsr1zzX9IqvkWZpJeu'>
- If the relative error $\frac{x_{t+1} - \hat{x}_{t+1}}{\hat{x}_{t+1}}$ is very high or very low (beyond a set threshold), then we consider this point an outlier.

***Assumptions:***

- Historical data has no outliers.
- There is some pattern in the data.
- If we identify a point as an outlier, then we don't use $x_{t+1}$ for future calculations but use $\hat{x}_{t+1}$ for further predictions.
- This works very well as long as we keep removing historical outliers.
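
A minimal sketch of this procedure, using a weighted moving average as the predictor. The weights, the threshold of 0.5 and the function names are assumptions chosen for illustration; the first few observations are assumed to be outlier-free, as stated above.

```python
def wma(values, weights):
    """Weighted moving average; weights[0] applies to the latest value."""
    recent_first = values[::-1][:len(weights)]
    return sum(a * x for a, x in zip(weights, recent_first)) / sum(weights)


def detect_outliers(series, weights, threshold=0.5):
    """Flag points whose relative error against the forecast exceeds the
    threshold, and replace them with the forecast for later predictions."""
    k = len(weights)
    cleaned = list(series[:k])  # assumed outlier-free starting history
    outliers = []
    for t in range(k, len(series)):
        predicted = wma(cleaned, weights)
        relative_error = (series[t] - predicted) / predicted
        if abs(relative_error) > threshold:
            outliers.append(t)
            cleaned.append(predicted)  # use the prediction, not the outlier
        else:
            cleaned.append(series[t])
    return outliers, cleaned


# Hypothetical hourly sales with a spike at index 4.
sales = [100, 102, 101, 103, 300, 102, 104]
outliers, cleaned = detect_outliers(sales, weights=[3, 2, 1])
print(outliers)  # [4]
```

Replacing the flagged point with its prediction keeps the spike from distorting the forecasts for the points that follow it.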


***Question:*** Is a DL approach or classical ARIMA better for time series?

***Answer:***
- DL-based approaches like LSTMs/GRUs are much more powerful than classical Time-Series models like ARIMA.
<img src='https://drive.google.com/uc?id=1xyaOluZrJnYWadTCv_3kQU5qPkTpDcUj'>
- If the dataset is simple, ARIMA models are preferred as they are computationally cheaper.