### Task:

Dataset to work with:

https://fred.stlouisfed.org/series/T10Y2Y

Choose one of: SARIMA (ARIMA), Darts, Prophet

#### Forecasting instruments:

- ETS
- ARIMA
- SARIMA
- Prophet
- Exponential smoothing


#### References:
- https://www.kaggle.com/code/freespirit08/time-series-for-beginners-with-arima
- https://www.kaggle.com/code/redwankarimsony/time-series-forecasting-with-arima
- https://www.kaggle.com/code/abhishekmamidi/time-series-analysis-arima-model

#### Purpose:
This notebook is a simplified, tutorial-style template for understanding time series forecasting with an ARIMA model in Python.

#### Objective:
Build a model to forecast the spread between the 10-year and 2-year Treasury constant maturity rates (T10Y2Y). The data consists of a date column and the daily value of the spread in percent.

#### ARIMA includes:
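- ARIMA combines AR (autoregressive), I (integrated), and MA (moving average) components. It has three hyperparameters, one per component: p (number of autoregressive lags), d (order of differencing), and q (number of moving-average lags). The AR part models how the current value depends on values from previous time periods.
- The MA part models the current value in terms of past forecast errors, which absorbs short-lived noise. The I part differences the series d times to remove trend, so that the AR and MA parts are fitted to a stationary series.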
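The effect of the d parameter can be illustrated with a small sketch. It assumes a synthetic random walk (not the T10Y2Y data) and uses only NumPy; all variable names are illustrative. One round of differencing recovers the stationary noise that generated the walk.

```python
import numpy as np

# Assumption: a synthetic random walk stands in for a non-stationary series.
rng = np.random.default_rng(0)
noise = rng.normal(loc=0.0, scale=1.0, size=500)  # stationary white noise
walk = np.cumsum(noise)                           # integrated once, so non-stationary

# First-order differencing (d=1) undoes the integration.
diffed = np.diff(walk, n=1)

# The differenced series matches the original noise, minus its first element.
print(np.allclose(diffed, noise[1:]))
print(len(walk), len(diffed))  # differencing shortens the series by one point
```

Each additional order of differencing drops one more observation, which is one reason to keep d as small as possible.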

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
%matplotlib inline 
from matplotlib import rcParams
rcParams['figure.figsize'] = 10, 6


from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.tsa.seasonal import seasonal_decompose
# statsmodels.tsa.arima_model was removed in statsmodels 0.13; use the current module
from statsmodels.tsa.arima.model import ARIMA

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv('/kaggle/input/dataset-lesson11-forecasting/T10Y2Y.csv')
df.head(30)

Unnamed: 0,DATE,T10Y2Y
0,2018-09-26,0.23
1,2018-09-27,0.23
2,2018-09-28,0.24
3,2018-10-01,0.27
4,2018-10-02,0.23
5,2018-10-03,0.3
6,2018-10-04,0.32
7,2018-10-05,0.35
8,2018-10-08,.
9,2018-10-09,0.33


In [3]:
df.info()
print('----------------------------------------------------')
print(df.describe())
print('\n','isnul ----------------------------------------------------')
print(df.isnull().sum())
print('\n','isna ----------------------------------------------------')
print(df.isna().sum())


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1305 entries, 0 to 1304
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   DATE    1305 non-null   object
 1   T10Y2Y  1305 non-null   object
dtypes: object(2)
memory usage: 20.5+ KB
----------------------------------------------------
              DATE T10Y2Y
count         1305   1305
unique        1305    250
top     2018-09-26      .
freq             1     55

 isnul ----------------------------------------------------
DATE      0
T10Y2Y    0
dtype: int64

 isna ----------------------------------------------------
DATE      0
T10Y2Y    0
dtype: int64
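The summary above reports `.` as the most frequent value (55 occurrences), yet `isnull()` counts zero missing entries, because the placeholder is stored as an ordinary string. One way to handle this at load time is sketched below. It assumes a small inline sample that mimics the layout of the CSV (the real file path is not used) and relies on the `na_values` and `parse_dates` parameters of `pd.read_csv`.

```python
import io
import pandas as pd

# Assumption: a small inline sample mimicking the layout of the CSV export.
sample_csv = io.StringIO(
    "DATE,T10Y2Y\n"
    "2018-10-04,0.32\n"
    "2018-10-05,0.35\n"
    "2018-10-08,.\n"
    "2018-10-09,0.33\n"
)

# Treat "." as missing and parse the dates while reading.
sample = pd.read_csv(sample_csv, na_values=["."], parse_dates=["DATE"])

print(sample.dtypes)                    # column types after parsing
print(sample["T10Y2Y"].isna().sum())    # number of missing observations
```

With this approach the column is numeric from the start, and the missing days show up in `isna()` without a separate conversion step.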


In [4]:
# Convert the 'T10Y2Y' column to numeric (float); non-numeric entries such as '.' become NaN
df['T10Y2Y'] = pd.to_numeric(df['T10Y2Y'], errors='coerce')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1305 entries, 0 to 1304
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   DATE    1305 non-null   object 
 1   T10Y2Y  1250 non-null   float64
dtypes: float64(1), object(1)
memory usage: 20.5+ KB


In [5]:
# Fill each missing value with the mean of the nearest valid values before and after it.
# Only the numeric column is used: including the string DATE column raises a TypeError.
spread = df['T10Y2Y']
df['T10Y2Y'] = spread.fillna((spread.ffill() + spread.bfill()) / 2)

df.head(30)
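The fill this cell aims for, replacing a missing value with the mean of the nearest valid observations on either side, can be checked on a toy series. The values are borrowed from the data preview; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Toy series with one gap (values borrowed from the data preview).
s = pd.Series([0.32, 0.35, np.nan, 0.33])

# Mean of the last valid value before and the first valid value after each gap.
filled = s.fillna((s.ffill() + s.bfill()) / 2)

print(filled.tolist())
```

A gap at the very start or end of the series has only one valid neighbour and stays NaN under this rule. `s.interpolate()` is an alternative that gives the same result for single-row gaps.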

In [None]:
# Convert DATE to datetime (infer_datetime_format is deprecated in pandas 2.x)
df['DATE'] = pd.to_datetime(df['DATE'], format='%Y-%m-%d')
ds = df.set_index(['DATE'])
ds.head(30)

### Check the data for stationarity

In [None]:
# Plot the series
fig, ax = plt.subplots(figsize=(15,6))
ds.plot(xlabel="Date", ylabel="percent", title="10-Year Treasury Constant Maturity Minus 2-Year Treasury Constant Maturity", ax=ax);

#### Rolling Statistics

In [None]:
win = 22  # window length in observations; about one month of daily (business-day) data

#Rolling statistics
rolmean = ds.rolling(window=win).mean() 
rolstd = ds.rolling(window=win).std()
# print(rolmean,rolstd)

#Plot rolling statistics
orig = plt.plot(ds, color='blue', label='Original')
mean = plt.plot(rolmean, color='red', label='Rolling Mean')
std = plt.plot(rolstd, color='black', label='Rolling Std')
plt.legend(loc='best')
plt.title('Rolling Mean & Standard Deviation')
plt.show(block=False)
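The same rolling statistics can also be inspected numerically rather than visually. The sketch below assumes a synthetic business-day series with a linear trend (not the T10Y2Y data). A rolling mean that drifts over time while the rolling standard deviation stays roughly constant hints at a non-stationary mean.

```python
import numpy as np
import pandas as pd

# Assumption: synthetic business-day series with an upward trend plus noise.
rng = np.random.default_rng(42)
idx = pd.bdate_range("2018-09-26", periods=260)
toy = pd.Series(np.linspace(0.0, 2.0, 260) + rng.normal(0, 0.05, 260), index=idx)

win = 22  # same window length as in the rolling-statistics cell
rolmean = toy.rolling(window=win).mean()
rolstd = toy.rolling(window=win).std()

# The first win - 1 entries are NaN because the window is not yet full.
print(rolmean.isna().sum())

# Compare the first and last complete windows.
print(rolmean.dropna().iloc[[0, -1]])
print(rolstd.dropna().iloc[[0, -1]])
```

This is only an informal check; a formal unit-root test such as `adfuller`, already imported in this notebook, is the usual follow-up.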