# Timeseries Forecasting

## 1. Definition

* Predicting the future values of a variable over time.<br>
* Time is the index variable in forecasting problems.<br>

## 2. Applications of Forecasting

* Weather forecasting<br>
* Stock market forecasting<br>
* Sales forecasting<br>
* Revenue forecasting<br>
* Loss forecasting<br>
* ATM cash management<br>
* HR resource planning<br>
* Shift management in call centre<br>
* GDP forecasting<br>
* Employment rate forecasting<br>

<b>Demand Forecasting Applications :</b>

* Inventory management<br>
* Supply chain management<br>
* HR resource planning<br>
* Shift management in call centre<br>
* Dynamic pricing<br>

## 3. Types of Data

* Cross Sectional Data : Object varies and time is fixed.<br>
* Time Series Data : Object is fixed and time varies. Time may be hour/week/month/quarter/year<br>
* Pooled/Hierarchical/Panel Data : Both object and time vary. It is a collection of time series, one per object.<br>

## 4. Characteristics of Timeseries data

* Data should be indexed by time with uniform intervals (hourly/daily/weekly/monthly/yearly)<br>
* Data should not have missing values<br>
* Current data is a function of past data<br>
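
The first two characteristics can be checked programmatically. The sketch below is a minimal illustration using only the standard library; the helper name `check_series` and the sample values are assumptions made for this example. It compares gaps as fixed-length durations, so it suits hourly/daily/weekly data; calendar months differ in length and would need a period-based comparison instead.

```python
from datetime import date, timedelta

def check_series(timestamps, values):
    """Return (uniform_interval, has_missing) for a time-indexed series."""
    # Gaps between consecutive timestamps; a uniform series has one distinct gap.
    gaps = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}
    uniform_interval = len(gaps) == 1
    # Missing observations are represented as None in this sketch.
    has_missing = any(v is None for v in values)
    return uniform_interval, has_missing

start = date(2023, 1, 1)
daily = [start + timedelta(days=i) for i in range(5)]

# Five consecutive days, all observed.
print(check_series(daily, [10, 12, 11, 13, 12]))           # (True, False)

# One day is skipped in the index and one value is missing.
print(check_series(daily[:2] + daily[3:], [10, 12, None, 12]))  # (False, True)
```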

## 5. Components of Time series / Patterns in Time series

### Trend

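Linear Trend : Positive or negative linear trend. The series increases or decreases at a roughly constant rate over time.<br>
Non linear Trend : The series does not follow a straight line. The rate or direction of change varies over time (for example curved or exponential growth).<br>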

### Seasonality

* Depending on the season, there is a spike or a drop in sales.<br>
* For a fixed period of time there is a hike in sales, and for another fixed period there is a drop in sales.<br>
* The seasonal period can be daily/weekly/monthly/quarterly/yearly<br>

### Cyclicity

* A cycle usually lasts more than a year.<br>
* Ex :- Worldcup match, Olympics etc.<br>
* It is similar to Seasonality, but seasonality repeats within a year while cyclicity spans more than a year.<br>

### Irregular

* We cannot find any pattern.

--------------------------------------------------------------------------------------------------------------------------------
* Every time series has one or more of these components.<br>
* The most frequent patterns are Trend, Seasonality and Irregular. Cyclicity is rarely identified because it needs more data.<br>

## 6. Decomposition

Trend - Tt, Seasonality - St, Irregular - It<br>
Additive : Yt = Tt+St+It<br>
Multiplicative : Yt = Tt*St*It<br>

* It helps to check whether these components (Tt, St, It) are present<br>
* It also shows the contribution of each component<br>
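
To make the two forms concrete, here is a small sketch that composes a series from its components. All numbers are made up for illustration. It also shows that taking logs turns the multiplicative form into an additive one, since log(Tt*St*It) = log(Tt) + log(St) + log(It).

```python
import math

trend = [100, 110, 120, 130]

# Additive: seasonal and irregular effects are in the units of the series
# and are centred on 0.
seasonal_add = [5, -5, 5, -5]
irregular_add = [1, -1, 0, 0]
additive = [t + s + i for t, s, i in zip(trend, seasonal_add, irregular_add)]
print(additive)  # [106, 104, 125, 125]

# Multiplicative: seasonal and irregular effects are factors centred on 1.
seasonal_mul = [1.1, 0.9, 1.1, 0.9]
irregular_mul = [1.0, 1.0, 1.02, 0.98]
multiplicative = [t * s * i for t, s, i in zip(trend, seasonal_mul, irregular_mul)]
print([round(v, 2) for v in multiplicative])  # [110.0, 99.0, 134.64, 114.66]

# The log of a multiplicative series is additive in the logged components.
log_is_additive = all(
    math.isclose(math.log(y), math.log(t) + math.log(s) + math.log(i))
    for y, t, s, i in zip(multiplicative, trend, seasonal_mul, irregular_mul)
)
print(log_is_additive)  # True
```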

## 7. EDA

* Understand data for peaks and slumps<br>
* Understand reasons for peaks and slumps<br>
* Analyzing a time series means analyzing the variations in the time series<br>

### Auto Correlation Function (ACF)

* Correlation of a series with itself<br>
* Correlation of sales with lag1_sales/lag2_sales/lead1_sales/lead2_sales<br>
* It gives insight into how sales correlate with lagged, lead or differenced sales<br>
* Sales = f(MS, lag1, lag2, lead1, lead2, ....)<br>
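
A minimal sketch of the sample autocorrelation at a given lag, written in plain Python so the arithmetic is visible. The function name and the sales figures are illustrative assumptions. In practice a library routine such as `acf` in `statsmodels.tsa.stattools` would normally be used.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` with itself shifted by `lag` steps."""
    n = len(series)
    mean = sum(series) / n
    # Total variation of the series around its mean.
    denom = sum((x - mean) ** 2 for x in series)
    # Co-variation between each value and the value `lag` steps earlier.
    numer = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return numer / denom

# Made-up sales with a repeating pattern every 4 periods.
sales = [10, 20, 30, 20] * 3

print(autocorrelation(sales, 0))            # 1.0 (a series is perfectly correlated with itself)
print(round(autocorrelation(sales, 4), 3))  # 0.667 (strong positive correlation at the seasonal lag)
print(round(autocorrelation(sales, 2), 3))  # -0.833 (peaks line up with troughs)
```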

### Partial Auto Correlation Function(PACF)

* Suppose Apr_sales = f(Mar_sales, Feb_sales, Jan_sales)<br>
* PACF captures the direct impact of a single lag (for example only Feb_sales on Apr_sales) after removing the impact of the lags in between (Mar_sales)<br>
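
As a worked illustration, the partial autocorrelation at lag 2 can be computed from the ordinary autocorrelations at lags 1 and 2 (written `r1` and `r2` here, names chosen for this sketch) with the standard formula (r2 - r1^2) / (1 - r1^2). At lag 1 the partial autocorrelation equals `r1`, because there is no intermediate lag to remove.

```python
def partial_autocorr_lag2(r1, r2):
    """Lag-2 partial autocorrelation from the lag-1 and lag-2 autocorrelations."""
    return (r2 - r1 ** 2) / (1 - r1 ** 2)

# If lag-2 correlation is fully explained by lag 1 (r2 == r1**2),
# the direct lag-2 effect is (numerically) zero.
print(abs(partial_autocorr_lag2(0.8, 0.64)) < 1e-9)  # True

# If lag-2 correlation is higher than lag 1 alone would imply,
# a direct lag-2 effect remains.
print(round(partial_autocorr_lag2(0.5, 0.5), 3))     # 0.333
```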

### Auto Regression

* It is self regression<br>
* Yt = f(Yt-1, Yt-2, ...)<br>
* Mathematical relationships between Yt and its lags<br>
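
A minimal sketch of the simplest case, Yt = c + phi*Yt-1, fitted by ordinary least squares on (Yt-1, Yt) pairs. The function name `fit_ar1` and the noise-free example series are assumptions for illustration; real data contains noise, and a library model would also handle higher-order lags.

```python
def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_{t-1}; returns (c, phi)."""
    x = series[:-1]   # lagged values y_{t-1}
    y = series[1:]    # current values y_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    phi = cov / var
    c = mean_y - phi * mean_x
    return c, phi

# Noise-free series generated with c = 2.0 and phi = 0.5.
series = [10.0]
for _ in range(9):
    series.append(2.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)
print(round(c, 6), round(phi, 6))   # 2.0 0.5

# One-step-ahead forecast from the last observed value.
next_value = c + phi * series[-1]
```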

### Stationary Time Series

* Over time, the series should have a constant mean and constant variance<br>
* A sine curve is a good example of constant mean and variance<br>

<b>How to check for stationarity</b>

* Looking at the line chart<br>
* Hypothesis testing. ADF (Augmented Dickey-Fuller) test / unit root test.<br>
* H0 : Series is not stationary, H1 : Series is stationary<br>
* If p-value < 0.05, we can reject the null hypothesis and conclude that the series is stationary<br>

<b>Why to check series is stationary or not</b>

* Some time series models, such as ARMA, require the series to be stationary<br>

<b>How to convert non-stationary series into stationary series</b>

* Detrending : Removing trend from data<br>
* Deseasonalizing : Removing seasonality from data<br>
* Log() : Apply log transformation<br>
* Diff() : Differencing time series<br>
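
The last two transformations can be sketched in a few lines. The helper `difference` and the sample series are illustrative assumptions. First differencing removes a linear trend, and differencing the logged values turns constant percentage growth into a constant. A seasonal lag (for example `lag=12` for monthly data) can be passed to remove a repeating yearly pattern.

```python
import math

def difference(series, lag=1):
    """Subtract the value `lag` steps earlier from each value."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# A linear trend becomes a constant after first differencing.
linear = [10, 12, 14, 16, 18]
print(difference(linear))  # [2, 2, 2, 2]

# 10% growth per period: log followed by differencing gives a constant log(1.1).
exponential = [100, 110, 121, 133.1]
log_diff = difference([math.log(v) for v in exponential])
print([round(d, 4) for d in log_diff])  # [0.0953, 0.0953, 0.0953]
```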

## 8. Time Series Modeling

### Univariate Time Series Modeling

Basic Techniques : <br>
* Averages (MA, WMA, CMA) - Moving Average, Weighted Moving Average, Centered Moving Average<br>
<br>
Intermediate Techniques :<br>
* Decomposition - Additive/multiplicative<br>
* ETS Models (Exponential smoothing models/Holt-Winters methods) - single/double/triple<br>
<br>
Advanced Techniques :<br>
* ARIMA model - Auto Regressive Integrated Moving Average<br>
* AR, MA, ARMA, ARIMA, SARIMA - seasonal ARIMA model etc.<br>
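
The averaging methods listed under the basic techniques above can be sketched as follows. Function names, the window size and the weights are assumptions for illustration; the centered version here assumes an odd window.

```python
def moving_average(series, window):
    """Trailing simple moving average: mean of the last `window` values."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def weighted_moving_average(series, weights):
    """Trailing weighted average; weights[-1] applies to the most recent value."""
    window = len(weights)
    total = sum(weights)
    return [sum(w * v for w, v in zip(weights, series[i - window + 1:i + 1])) / total
            for i in range(window - 1, len(series))]

def centered_moving_average(series, window):
    """Average of an odd-sized window centred on each point."""
    half = window // 2
    return [sum(series[i - half:i + half + 1]) / window
            for i in range(half, len(series) - half)]

sales = [10, 20, 30, 40, 50]
print(moving_average(sales, 3))            # [20.0, 30.0, 40.0]
print([round(v, 2) for v in weighted_moving_average(sales, [1, 2, 3])])  # [23.33, 33.33, 43.33]
print(centered_moving_average(sales, 3))   # [20.0, 30.0, 40.0]
```

The trailing and centered averages give the same numbers here, but they are aligned differently: the first trailing value belongs to the third observation, while the first centered value belongs to the second.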

### Multivariate Time Series Modeling

* The series depends not only on its own past values but also on the past values of other variables.<br>
* ARIMAX : ARIMA with X (exogenous) variables<br>
* SARIMAX : SARIMA with X (exogenous) variables<br>
* Regression(helpful for panel data) and Machine learning algorithms<br>

### Very Advanced Techniques

* ARCH/GARCH<br>
* Wavelets<br>
* Kalman filters<br>
* VAR<br>
<b>Deep Learning Models</b><br>
LSTM, RNN<br>

## 9. Packages for Time Series Analysis

* statsmodels.tsa<br>
* prophet (by Facebook) (good for univariate analysis)<br>
* stldecompose (helpful for decomposition and analysis)<br>

## 10. Modeling

* Load data (with `import pandas as pd`)<br>
  -> `sales_data = pd.read_csv('tractor_sales.csv')`<br>
* Check whether the data is already time series data, i.e. whether the date column has a datetime type<br>
  -> `sales_data.info()`<br>
* Build a datetime variable to replace the object-typed date column<br>
  -> `date_range_val = pd.date_range(start='2003-01-01', freq='MS', periods=len(sales_data))`<br>
* Set the time series variable as the index<br>
  -> `sales_data.set_index(date_range_val, inplace=True)`<br>
* Plot the series and check whether it is stationary<br>
* A line chart is preferred for time series data. A boxplot can also be used.<br>
* Check the contribution of the trend, seasonal and irregular components with the help of decomposition, where `ts` is the sales series (with `from statsmodels.tsa.seasonal import seasonal_decompose`)<br>
  -> `decompose = seasonal_decompose(ts, model='multiplicative', two_sided=False, extrapolate_trend=4)`<br>
  -> `decompose.plot()`

<b>Libraries for Decomposition :</b><br>
* statsmodels (`statsmodels.tsa.seasonal.seasonal_decompose`)<br>
* prophet, formerly fbprophet (component plots via `plot_components`)<br>
* stldecompose (preferred library)<br>

<b>Note :</b> 
* For a better model we need at least 2 years of data<br>
* In a multiplicative decomposition, if the irregular component is close to 1, the major contribution comes from the trend and seasonal components.<br>
* If the irregular component is not close to 1, it also makes a major contribution<br>
* Forecasting works well with decomposition if the irregular component (It) is close to 1.<br>
* The trend and seasonal values are used to predict future values; the irregular value is the error<br>
* In the case study, the trend and seasonal components explained 96% of the data<br>
* If one component looks very strong then Additive will give better results than Multiplicative<br>
* In additive decomposition the irregular component is close to 0, and in multiplicative decomposition it is close to 1<br>

### Data Preparation

* Missing Values : Impute missing values<br>
* Outliers : Handle outliers (values outside the range mean-3std to mean+3std)<br>
* Train and test split : It's not random. The most recent data is the test set and the earlier data is the train set<br>
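
A minimal sketch of the outlier rule and the chronological split, using only the standard library. The helper names, the choice to cap outliers at the bounds (rather than drop or impute them), and the sample values are assumptions for illustration.

```python
from statistics import mean, stdev

def cap_outliers(values, k=3):
    """Clip values to the range [mean - k*std, mean + k*std]."""
    m, s = mean(values), stdev(values)
    lower, upper = m - k * s, m + k * s
    return [min(max(v, lower), upper) for v in values]

def chronological_split(values, test_size):
    """Earlier observations form the train set, the last `test_size` the test set.

    `test_size` must be at least 1.
    """
    return values[:-test_size], values[-test_size:]

# Made-up series in time order with one extreme spike at position 10.
values = [100 + (i % 4) for i in range(24)]
values[10] = 1000

capped = cap_outliers(values)
print(capped[10] < 1000)   # True (the spike is pulled back to mean + 3*std)

train, test = chronological_split(capped, test_size=6)
print(len(train), len(test))  # 18 6
```

The split keeps the time order intact, so the model is always evaluated on data that comes after everything it was trained on.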

### Exponential Smoothing

* Ft = f(Yt-1, et-1), Ft -> forecast for time t, Yt-1 -> past value, et-1 -> past error (et-1 = Yt-1 - Ft-1), Ft = alpha*Yt-1 + (1-alpha)*Ft-1, which is the same as Ft = Ft-1 + alpha*et-1<br>
* Single exponential smoothing : If the time series has no trend and no seasonality; alpha<br>
* Double exponential smoothing : If the time series has trend but no seasonality; alpha, beta<br>
* Triple exponential smoothing : If the time series has trend and seasonality; alpha, beta, gamma<br>
* `from statsmodels.tsa.holtwinters import ExponentialSmoothing`
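
A minimal sketch of single exponential smoothing in plain Python, to show the recurrence at work: each new forecast is alpha times the latest observation plus (1 - alpha) times the previous forecast. The function name, the sample series and the choice to initialise the first forecast with the first observation are assumptions for this example; library implementations offer other initialisation options and estimate alpha from the data.

```python
def simple_exponential_smoothing(series, alpha):
    """One-step-ahead forecasts.

    forecasts[t] is the forecast for series[t]; the final entry is the
    forecast for the next, not yet observed, period.
    """
    forecasts = [series[0]]  # assumption: start from the first observation
    for actual in series:
        previous = forecasts[-1]
        # Same as: previous + alpha * (actual - previous), i.e. error correction.
        forecasts.append(alpha * actual + (1 - alpha) * previous)
    return forecasts

sales = [10, 20, 30]
print(simple_exponential_smoothing(sales, alpha=0.5))  # [10, 10.0, 15.0, 22.5]
```

A larger alpha reacts faster to recent observations, and a smaller alpha gives a smoother forecast.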