# Introduction to Time Series Forecasting with Seasonality

In many data analyses, the order in which observations are collected is irrelevant. However, much of the data collected in the real world is recorded over time, so its order carries information. A time series model exploits this structure to improve predictions. The first step in analyzing a time series is to identify and understand the built-in structure of the data over time. These underlying patterns can be classified into four distinct categories:
* Trend - Long-term increase or decrease in the data (usually the easiest to identify).
* Seasonality - Short-term patterns within the overall time structure that repeat at a fixed interval (for example, weekly or yearly).
* Cyclical Component - Long-term oscillations (big waves) within the overall time structure, without a fixed period.
* Noise (Error) - Random changes occurring in the overall time structure (unlikely to be repeated).
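
To make these components concrete, here is a minimal sketch that builds a synthetic daily series from a linear trend, a weekly seasonal pattern, and random noise, then recovers the trend with a 7-day centered moving average. All values and names (`trend`, `seasonal`, `moving_average`) are illustrative assumptions and are not taken from the COVID-19 data; the cyclical component is left out for brevity. The sketch uses only the standard library.

```python
import math
import random

random.seed(0)  # fixed seed so the noise is reproducible

N_DAYS = 70
PERIOD = 7  # assumed weekly seasonality

# Trend: a slow, steady rise over the whole series.
trend = [0.5 * t for t in range(N_DAYS)]
# Seasonality: a pattern that repeats every PERIOD days.
seasonal = [3.0 * math.sin(2 * math.pi * t / PERIOD) for t in range(N_DAYS)]
# Noise: random fluctuations with mean 0.
noise = [random.gauss(0, 0.5) for _ in range(N_DAYS)]

series = [tr + s + n for tr, s, n in zip(trend, seasonal, noise)]


def moving_average(values, window):
    """Centered moving average; returns None where the window does not fit."""
    half = window // 2
    out = [None] * len(values)
    for i in range(half, len(values) - half):
        out[i] = sum(values[i - half:i + half + 1]) / window
    return out


# Averaging over one full seasonal period cancels the weekly pattern,
# leaving an estimate of the trend (plus a little averaged noise).
estimated_trend = moving_average(series, PERIOD)
```

Subtracting `estimated_trend` from `series` would leave the seasonal pattern plus noise, which is the idea behind a classical additive decomposition.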

![image](./Time_Series_Example_Image.jpg)

The second step is to check the underlying assumptions:
* Stationarity - The statistical properties of the series, such as its mean and variance, are constant over time. (Stationarity does not require the data to be normally distributed.)
* Uncorrelated Random Error - We assume that the error term is random and uncorrelated across time, with a constant mean and variance.
* No outliers - Extreme values can distort the estimated trend and seasonality, so they should be identified and handled before fitting.
* Random shocks - If shocks are present, they are assumed to be randomly distributed with a mean of 0 and a constant variance.
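
A rough way to check the constant-mean part of stationarity is to compare summary statistics across different stretches of the series. The sketch below, on a made-up trending series, compares the mean of the first and second halves before and after first-order differencing. The data and the helper name `half_means` are hypothetical, and this is only an informal check; a formal test such as the augmented Dickey-Fuller test is a better choice in practice.

```python
import random
import statistics

random.seed(1)  # fixed seed so the example is reproducible

# A hypothetical non-stationary series: linear trend plus noise.
series = [2.0 * t + random.gauss(0, 1) for t in range(200)]


def half_means(values):
    """Return the mean of the first half and of the second half."""
    mid = len(values) // 2
    return statistics.mean(values[:mid]), statistics.mean(values[mid:])


# First-order differencing: y[t] - y[t-1] removes a linear trend.
differenced = [b - a for a, b in zip(series, series[1:])]

raw_first, raw_second = half_means(series)
diff_first, diff_second = half_means(differenced)

print(f"raw series means:  {raw_first:.1f} vs {raw_second:.1f}")
print(f"differenced means: {diff_first:.2f} vs {diff_second:.2f}")
```

The two halves of the raw series have very different means, while the halves of the differenced series have nearly the same mean, which is what a constant mean over time looks like.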

## Setting up our Facebook Prophet Analysis (USA COVID-19 Data)

### Checking Directory
```python
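# Notebook shell commands (IPython `!` syntax): list the files in the
# working directory, then print its path.
!ls
!pwd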
```

### Importing all the required packages
```python
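import numpy as np
import pandas as pd

# The package was renamed from `fbprophet` to `prophet` in version 1.0,
# so try the new name first and fall back to the old one.
try:
    from prophet import Prophet
    from prophet.diagnostics import cross_validation, performance_metrics
    from prophet.plot import plot_plotly, plot_components_plotly, plot_cross_validation_metric
except ImportError:
    from fbprophet import Prophet
    from fbprophet.diagnostics import cross_validation, performance_metrics
    from fbprophet.plot import plot_plotly, plot_components_plotly, plot_cross_validation_metric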
```

### Processing the Raw Data
```python
# Load the Our World in Data COVID-19 dataset.
stay_at_home_df = pd.read_csv("https://covid.ourworldindata.org/data/owid-covid-data.csv")
stay_at_home_df.head()

# Keep only the columns used in this analysis. The .copy() avoids pandas'
# SettingWithCopyWarning when a new column is added below.
columns = ['location', 'date', 'total_cases', 'total_tests', 'stringency_index', 'positive_rate', 'total_deaths']
cleaned_stay_at_home_df = stay_at_home_df[columns].copy()
cleaned_stay_at_home_df['cases_per_tests_ratio'] = (
    cleaned_stay_at_home_df['total_cases'] / cleaned_stay_at_home_df['total_tests']
)
cleaned_stay_at_home_df.dtypes

# Split into one DataFrame per location and select the United States.
groupby_location_df = cleaned_stay_at_home_df.groupby('location', as_index=False)
separated_location_df = dict(iter(groupby_location_df))
us_covid_df = separated_location_df['United States']
```

### Setting up for our Facebook Prophet Time Series Analysis
```python
import os

OUTPUT_DIR = './Seasonality_Comparison'


def plot_metrics(forecast_cv, location, img_name):
    """Save one cross-validation plot per error metric."""
    os.makedirs(OUTPUT_DIR, exist_ok=True)  # savefig fails if the folder is missing
    for metric in ('mse', 'rmse', 'mae', 'mape'):
        metric_plot = plot_cross_validation_metric(forecast_cv, metric=metric)
        metric_plot.suptitle(f'{location}_{metric}_{img_name}', y=0.95)
        metric_plot.savefig(f'{OUTPUT_DIR}/{location}_{metric}_{img_name}')


def plot_forecast(forecast, m, location, img_name):
    """Save the forecast plot and the component (trend/seasonality) plot."""
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    forecast_plot = m.plot(forecast)
    forecast_plot.suptitle(f'{location}_forecast_{img_name}', y=0.95)
    forecast_plot.savefig(f'{OUTPUT_DIR}/{location}_forecast_{img_name}')

    components_plot = m.plot_components(forecast)
    components_plot.suptitle(f'{location}_components_{img_name}', y=0.95)
    components_plot.savefig(f'{OUTPUT_DIR}/{location}_components_{img_name}')


def prophet_predicts(location_df, location, yearly_seasonality, weekly_seasonality):
    """Fit Prophet on total cases, forecast 30 days ahead, and save the plots."""
    print(location_df.columns)
    # Initial training window for cross-validation: all but the last 31 days
    # (assumes one row per day).
    initial = f'{location_df.shape[0] - 31} days'
    # Prophet expects the date column to be named 'ds' and the target 'y'.
    prophet_df = location_df[['date', 'total_cases']].rename(columns={'date': 'ds', 'total_cases': 'y'})
    m = Prophet(interval_width=0.95, yearly_seasonality=yearly_seasonality,
                weekly_seasonality=weekly_seasonality)
    m.fit(prophet_df)
    future = m.make_future_dataframe(periods=30)
    forecast = m.predict(future)
    forecast_cv = cross_validation(m, initial=initial, period='30 days', horizon='30 days')

    # Build the file-name suffix from the enabled seasonalities.
    img_name = 'daily.png'
    if yearly_seasonality:
        img_name = 'yearly_' + img_name
    if weekly_seasonality:
        img_name = 'weekly_' + img_name

    plot_forecast(forecast, m, location, img_name)
    plot_metrics(forecast_cv, location, img_name)
```
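
The four metrics plotted by `plot_metrics` (MSE, RMSE, MAE, and MAPE) have simple definitions. As a reference, here is a minimal standard-library sketch that computes them for a pair of made-up actual and predicted values. The function name `error_metrics` and the numbers are illustrative assumptions; this is not Prophet's implementation, which also aggregates the errors over a rolling window across the forecast horizon.

```python
import math


def error_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n        # mean squared error
    mae = sum(abs(e) for e in errors) / n       # mean absolute error
    # Mean absolute percentage error; undefined when an actual value is 0.
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "mape": mape}


# Hypothetical values, chosen only to illustrate the arithmetic.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]
metrics = error_metrics(actual, predicted)
print(metrics)
```

RMSE is in the same units as the data, which makes it easier to interpret than MSE, while MAPE is scale-free and so is convenient for comparing runs on series of different sizes.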

### Results of our analysis
```python
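# Compare all four combinations of yearly and weekly seasonality.
prophet_predicts(us_covid_df, 'United States', yearly_seasonality=True, weekly_seasonality=False)
prophet_predicts(us_covid_df, 'United States', yearly_seasonality=False, weekly_seasonality=False)
prophet_predicts(us_covid_df, 'United States', yearly_seasonality=False, weekly_seasonality=True)
prophet_predicts(us_covid_df, 'United States', yearly_seasonality=True, weekly_seasonality=True)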
```
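
Each call writes its plots under a file name derived from the two seasonality flags. The small sketch below mirrors the naming logic in `prophet_predicts` to show which suffix each of the four runs produces; the standalone helper `image_name` is introduced here only for illustration.

```python
def image_name(yearly_seasonality, weekly_seasonality):
    """Mirror the file-name suffix logic used in prophet_predicts."""
    name = 'daily.png'
    if yearly_seasonality:
        name = 'yearly_' + name
    if weekly_seasonality:
        name = 'weekly_' + name
    return name


# Same flag combinations, in the same order, as the four calls above.
for yearly, weekly in [(True, False), (False, False), (False, True), (True, True)]:
    print(f'yearly={yearly}, weekly={weekly}: {image_name(yearly, weekly)}')
```

The forecast plot for the run with both seasonalities enabled, for example, is saved as `United States_forecast_weekly_yearly_daily.png` in the `Seasonality_Comparison` folder.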

