<a class="anchor" id="0"></a>
# COVID-19 in the USA: EDA & forecasting of confirmed cases with holiday impact. Prophet with holidays and pseudo-holidays - tuning of 11 parameters:
* lower_window
* upper_window
* prior_scale
* mode
* changepoint_prior_scale
* weekly_fourier_order
* mode_seasonality_weekly
* weekly_seasonality_prior_scale
* triply_fourier_order (for period = 3 days)
* mode_seasonality_triply
* triply_seasonality_prior_scale

### NEW: 
* added new anomaly dates - pseudo-holidays (from "COVID-19 Open Data"):
    - Weakening and strengthening of quarantine
    - Very comfortable conditions for rest
* added tuning of the 9th parameter ("changepoint_prior_scale")
* added tuning of the 10th parameter ("weekly seasonality prior_scale")
* added tuning of the 11th parameter ("triply seasonality prior_scale")
* added new plots
* restructured the notebook

# Acknowledgements

### Datasets:
- dataset [COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University](https://github.com/CSSEGISandData/COVID-19)
- my dataset with holidays data [COVID-19: Holidays of countries](https://www.kaggle.com/vbmokin/covid19-holidays-of-countries) - it is recommended to follow the updates
- dataset [COVID-19 Open Data](https://github.com/GoogleCloudPlatform/covid-19-open-data), including dataset [Oxford COVID-19 government response tracker](https://www.bsg.ox.ac.uk/research/research-projects/oxford-covid-19-government-response-tracker) and dataset [NOAA](https://www.ncei.noaa.gov/)

### Notebooks:
- notebook [COVID-19 : Prophet forecast - next 2 weeks](https://www.kaggle.com/vbmokin/covid-19-prophet-forecast-next-2-weeks)
- notebook with the code to read the data [COVID-19: current situation on August](https://www.kaggle.com/corochann/covid-19-current-situation-on-august)
- notebook [COVID-19 Novel Coronavirus EDA & Forecasting Cases](https://www.kaggle.com/khoongweihao/covid-19-novel-coronavirus-eda-forecasting-cases) from [@Wei Hao Khoong](https://www.kaggle.com/khoongweihao)
- plot from notebook [COVID-19 India Analysis](https://www.kaggle.com/n1sarg/covid-19-india-analysis)

### Libraries from GitHub:
- https://facebook.github.io/prophet/
- https://facebook.github.io/prophet/docs/
- https://github.com/facebook/prophet
- https://github.com/dr-prodigy/python-holidays

There are many studies in the field of coronavirus forecasting, and many researchers use **Prophet** (from Facebook). But for some reason, the impact of holidays is usually not taken into account. Despite all the prohibitions, it is difficult for people to stay at home, and they still find ways to celebrate the **holidays** to which they are accustomed. The desire to celebrate is especially strong when people sit at home all the time looking for something to do. In my opinion, the impact of holidays shows up as a possible jump in the number of confirmed cases within 4-10 days after a holiday, because people went shopping and visited each other, perhaps even in violation of quarantine requirements.

Prophet uses the library [holidays](https://github.com/dr-prodigy/python-holidays) with information about the main holidays of 67 countries, but this package has some disadvantages. That is why I created my own improved dataset and plan to update it periodically. Now **my dataset [COVID-19: Holidays of countries](https://www.kaggle.com/vbmokin/covid19-holidays-of-countries) has holidays for 70 countries** and is better adapted for use in forecasting coronavirus cases.

Holidays and pseudo-holidays (**anomaly dates**) are defined in three ways:
- dates of official public holidays;
- dates when quarantine was weakened, according to open data;
- dates of very comfortable conditions for rest (average temperature above its 95% quantile and no rainfall); this rule should be adapted individually for each country (NOAA open data are used).

The model is **tuned in two stages**. Each stage runs an exhaustive search over 4 candidate values for each parameter: first for one group of parameters, then for the other. The second stage uses the optimal parameters found in the first stage. Each stage ends with an interactive graph (library "plotly") that shows the WAPE for each combination of parameters.
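The error metric can be illustrated with a minimal sketch. It assumes that WAPE here means the sum of absolute errors divided by the sum of actual values, expressed in percent, which is what `eval_error` computes later in this notebook (after rounding the predictions). The function name `wape` and the toy numbers are hypothetical.

```python
def wape(actual, predicted):
    """Weighted absolute percentage error, in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    total = sum(actual)
    if total == 0:
        raise ValueError("WAPE is undefined when the actual values sum to zero")
    # Sum of absolute errors relative to the total of the actual values
    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))
    return 100.0 * abs_err / total


# Toy daily counts, made up for illustration only
actual = [100, 200, 300, 400]
predicted = [110, 190, 330, 360]
print(wape(actual, predicted))
```

Unlike a per-day percentage error, this metric is not distorted by days with very few cases, because each error is weighted by the overall volume.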

The Prophet model with all optimized parameters and holidays is used for **forecasting** the next days and for visualizing the forecast results. The data is taken from [COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University](https://github.com/CSSEGISandData/COVID-19) (this dataset is usually updated daily and is available as of yesterday), so the next days are counted from the date of the last commit of this notebook.

<a class="anchor" id="0.1"></a>
## Table of Contents

1. [Import libraries](#1)
1. [Download data](#2)
1. [Selection data with holidays](#3)
    - [Holidays with a shift](#3.1)
    - [Additional dates of anomalies as holidays](#3.2)    
        - [The weakening of quarantine](#3.2.1)
        - [Very comfortable conditions for rest (not yet taken into account - needs clarification)](#3.2.2)
    - [Removing the holidays for the period with fewer than 10 daily cases](#3.3)
1. [EDA](#4)
    - [Plots - Confirmed cases over time](#4.1)
    - [Statistics](#4.2)
    - [Set initial values for tuning](#4.3)
1. [Tuning Prophet model and holidays parameters](#5)
    - [Stage 1 - Tuning holiday parameters](#5.1)
        - [Model training, forecasting and evaluation](#5.1.1)
        - [Results visualization](#5.1.2)
    - [Stage 2 - Tuning seasonality parameters](#5.2)
        - [Model training, forecasting and evaluation](#5.2.1)
        - [Results visualization](#5.2.2)
    - [Results of all tuning](#5.3)
1. [Prediction](#6)

In [None]:
country_main = 'US'

## 1. Import libraries<a class="anchor" id="1"></a>

[Back to Table of Contents](#0.1)

Import libraries

In [None]:
import os
import pandas as pd
import numpy as np
import requests
import seaborn as sns
from matplotlib import pyplot as plt
import plotly.express as px
import plotly.graph_objects as go

from datetime import date, timedelta, datetime
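# The package was renamed from "fbprophet" to "prophet" in version 1.0
try:
    from prophet import Prophet
    from prophet.make_holidays import make_holidays_df
    from prophet.diagnostics import cross_validation, performance_metrics
    from prophet.plot import plot_cross_validation_metric
except ImportError:
    from fbprophet import Prophet
    from fbprophet.make_holidays import make_holidays_df
    from fbprophet.diagnostics import cross_validation, performance_metrics
    from fbprophet.plot import plot_cross_validation_metric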
import holidays
from collections import Counter
import pycountry

import warnings
warnings.simplefilter('ignore')

## 2. Download data<a class="anchor" id="2"></a>

[Back to Table of Contents](#0.1)
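The series downloaded in this section is cumulative, while the forecasting below works with daily new cases. The conversion is done with `diff()` in the cells of this section. The sketch below shows the same idea in plain Python on made-up numbers; the function name `daily_from_cumulative` is hypothetical.

```python
def daily_from_cumulative(cumulative):
    """Convert cumulative counts into daily increments.

    The first day has no previous value, so it is set to 0,
    as is done for the first row in this notebook.
    """
    if not cumulative:
        return []
    daily = [0]
    for prev, cur in zip(cumulative, cumulative[1:]):
        daily.append(cur - prev)
    return daily


# Made-up cumulative counts for illustration only
cumulative = [1, 1, 5, 12, 12, 30]
print(daily_from_cumulative(cumulative))
```

Note that a downward correction in the source data would produce a negative daily value with this approach.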

In [None]:
# Thanks https://github.com/CSSEGISandData/COVID-19
myfile = requests.get('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
with open('data', 'wb') as f:
    f.write(myfile.content)
confirmed_global_df = pd.read_csv('data')
confirmed_global_df

In [None]:
confirmed_global_df = confirmed_global_df[confirmed_global_df['Country/Region'] == country_main].reset_index(drop=True)
confirmed_global_df

In [None]:
# Thanks to https://www.kaggle.com/corochann/covid-19-current-situation-on-august
def _convert_date_str(df):
    try:
        df.columns = list(df.columns[:4]) + [datetime.strptime(d, "%m/%d/%y").date().strftime("%Y-%m-%d") for d in df.columns[4:]]
    except ValueError:
        print('_convert_date_str failed with %y, try %Y')
        df.columns = list(df.columns[:4]) + [datetime.strptime(d, "%m/%d/%Y").date().strftime("%Y-%m-%d") for d in df.columns[4:]]

_convert_date_str(confirmed_global_df)
confirmed_global_df

In [None]:
# Thanks to https://www.kaggle.com/corochann/covid-19-current-situation-on-august
df = confirmed_global_df.melt(
    id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'], value_vars=confirmed_global_df.columns[4:], var_name='Date', value_name='ConfirmedCases')
df2 = df.groupby(["Date", "Country/Region"], as_index=False)['ConfirmedCases'].sum()
df2.columns = ['Date', 'Country', 'Confirmed']

In [None]:
df2

In [None]:
df2['Confirmed'] = df2['Confirmed'].diff()
df2.loc[0,'Confirmed'] = 0
df2

In [None]:
latest_date = df2['Date'].max()
latest_date

## 3. Selection data with holidays<a class="anchor" id="3"></a>

[Back to Table of Contents](#0.1)

## 3.1. Holidays with a shift<a class="anchor" id="3.1"></a>

[Back to Table of Contents](#0.1)
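The idea of a shifted holiday can be sketched as follows: the model is given the date on which the effect is expected, not the date of the holiday itself. The 7-day lag below is the value that `aux_holidays_df_generator` uses for pseudo-holidays later in this notebook; that the official holidays in the dataset use the same lag is an assumption. The rows and the helper `shift_date` are hypothetical.

```python
from datetime import datetime, timedelta


def shift_date(date_str, lag_days=7):
    """Return the date `lag_days` after `date_str` (format YYYY-MM-DD)."""
    shifted = datetime.strptime(date_str, "%Y-%m-%d") + timedelta(days=lag_days)
    return shifted.strftime("%Y-%m-%d")


# Hypothetical holiday rows; the column names mirror those used in this notebook
holiday_rows = [
    {"holiday": "Independence Day", "ds_holidays": "2020-07-04"},
    {"holiday": "Christmas Day", "ds_holidays": "2020-12-25"},
]
for row in holiday_rows:
    # 'ds' is the date passed to Prophet, 'ds_holidays' is the actual holiday
    row["ds"] = shift_date(row["ds_holidays"])

print(holiday_rows)
```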

### Thanks to the dataset [COVID-19: Holidays of countries](https://www.kaggle.com/vbmokin/covid19-holidays-of-countries)

In [None]:
# Thanks to dataset https://www.kaggle.com/vbmokin/covid19-holidays-of-countries
holidays_df = pd.read_csv('../input/covid19-holidays-of-countries/holidays_df_of_70_countries_for_covid_19_2021.csv')
holidays_df[holidays_df['country'] == country_main]

In [None]:
holidays_df_code_countries = holidays_df['code'].unique()
holidays_df_code_countries

In [None]:
# From notebook https://www.kaggle.com/vbmokin/covid-19-prophet-forecast-next-2-weeks
def dict_code_countries_with_holidays(list_name_countries: list,
                                      holidays_df: pd.DataFrame):
        
    """
    Defines a dictionary with the names of user countries and their two-letter codes (ISO 3166) 
    in the dataset "COVID-19: Holidays of countries" 
    
    Returns: 
    - countries: dictionary with the names of user countries and their two-letter codes (ISO 3166) 
    - holidays_df_identificated: DataFrame with holidays data for countries from dictionary 'countries'
    
    Args: 
    - list_name_countries: list of country names (name, common_name, official_name, or alpha_2 / alpha_3 codes from ISO 3166)
    - holidays_df: DataFrame with holidays "COVID-19: Holidays of countries"
    """
    
    import pycountry
    
    # Identification of countries for which there are names according to ISO
    countries = {}
    dataset_all_countries = list(holidays_df['code'].unique())
    list_name_countries_identificated = []
    list_name_countries_not_identificated = []
    for country in list_name_countries:
        try: 
            country_id = pycountry.countries.get(alpha_2=country)
            if country_id.alpha_2 in dataset_all_countries:
                countries[country] = country_id.alpha_2
        except AttributeError:
            try: 
                country_id = pycountry.countries.get(name=country)
                if country_id.alpha_2 in dataset_all_countries:
                    countries[country] = country_id.alpha_2
            except AttributeError:
                try: 
                    country_id = pycountry.countries.get(official_name=country)
                    if country_id.alpha_2 in dataset_all_countries:
                        countries[country] = country_id.alpha_2
                except AttributeError:
                    try: 
                        country_id = pycountry.countries.get(common_name=country)
                        if country_id.alpha_2 in dataset_all_countries:
                            countries[country] = country_id.alpha_2
                    except AttributeError:
                        try: 
                            country_id = pycountry.countries.get(alpha_3=country)
                            if country_id.alpha_2 in dataset_all_countries:
                                countries[country] = country_id.alpha_2
                        except AttributeError:
                            list_name_countries_not_identificated.append(country)
    holidays_df_identificated = holidays_df[holidays_df['code'].isin(countries.values())]
    
    print(f'Thus, the dataset has holidays in {len(countries)} countries from your list with {len(list_name_countries)} countries')
#     if len(countries) == len(dataset_all_countries):
#         print('All available in this dataset holiday data is used')
#     else:
#         print("Holidays are available in the dataset for such countries (if there are countries from your list, then it's recommended making changes to the list)")
#         print(np.array(holidays_df[~holidays_df['code'].isin(countries.values())].country_official_name.unique()))
        
    return countries, holidays_df_identificated.reset_index(drop=True)

In [None]:
countries_dict, holidays_df = dict_code_countries_with_holidays([country_main],holidays_df)
countries_dict

In [None]:
# From https://www.kaggle.com/vbmokin/covid-19-prophet-forecast-next-2-weeks
def adaption_df_to_holidays_df_for_prophet(df, col, countries_dict):
    # Adapt the dataframe df (by column=col) to holidays_df using the list of countries in the dictionary countries_dict
    
    # Filter df for countries that are present in the dataset with holidays
    df = df[df[col].isin(list(countries_dict.keys()))].reset_index(drop=True)
    
    # Add alpha_2 (code from ISO 3166) for each country
    df['iso_alpha'] = None
    for key, value in countries_dict.items():
        df.loc[df[col] == key, 'iso_alpha'] = value    
    
    return df

In [None]:
df2 = adaption_df_to_holidays_df_for_prophet(df2, 'Country', countries_dict)
df2.columns = ['Date', 'Country', 'Confirmed', 'iso_alpha']
df2

In [None]:
country_iso_alpha = df2.loc[0,'iso_alpha']
country_iso_alpha

## 3.2. Additional dates of anomalies as holidays<a class="anchor" id="3.2"></a>

[Back to Table of Contents](#0.1)

**Thanks to [COVID-19 Open Data](https://github.com/GoogleCloudPlatform/covid-19-open-data)**

In [None]:
# Thanks to https://github.com/GoogleCloudPlatform/covid-19-open-data
data = pd.read_csv(f"https://storage.googleapis.com/covid19-open-data/v2/{country_iso_alpha}/main.csv")

In [None]:
def aux_holidays_df_generator(holidays_df, dates_list, name, source):
    # Add dates from dates_list with anomalies of various kinds to the holiday dataset holidays_df
    # name - the name of the anomaly
    # source - the source of the primary information used for processing
    
    last_row = len(holidays_df)
    holidays_dates = holidays_df['ds_holidays'].tolist()
    common_dates = list(set(holidays_dates).intersection(set(dates_list)))
    dates_list = list(set(dates_list).difference(set(common_dates)))
    for i in range(len(dates_list)):
        # Copy the last original row as a template (DataFrame.append was removed in pandas 2.0)
        holidays_df = pd.concat([holidays_df, holidays_df.loc[[last_row-1]]], ignore_index=True)
        ds_dt = datetime.strptime(dates_list[i], '%Y-%m-%d')
        holidays_df.loc[last_row+i, 'ds_holidays'] = dates_list[i]
        holidays_df.loc[last_row+i, 'holiday'] = name
        holidays_df.loc[last_row+i, 'ds'] = (ds_dt + timedelta(days=7)).strftime('%Y-%m-%d')
        holidays_df.loc[last_row+i, 'source'] = source
        
    return holidays_df.sort_values(by=['ds_holidays'])

In [None]:
# Thanks to https://www.kaggle.com/vbmokin/covid-19-in-ukraine-prophet-holidays-tuning
def plot_with_anomalies(df, cols_y_list, cols_y_list_name, dates_x, col_anomalies, val_anomal, log_y=False):
    # Draws a plot of the features cols_y_list (y) against dates_x (x) from the dataframe df
    # with vertical red lines at the dates where col_anomalies == val_anomal,
    # spanning from the minimum (or 0, if lower) to the maximum of the feature cols_y_list[0],
    # with log_y = False or True
    # cols_y_list_name - dictionary of plot names for the columns in cols_y_list (key - name of the feature, value - its name for the plot legend),
    # the name of cols_y_list[0] is the title of the whole plot
    
    fig = px.line(df, x=dates_x, y=cols_y_list[0], title=cols_y_list_name[cols_y_list[0]], log_y=log_y, template='gridon',width=1000, height=800)
    for i in range(len(cols_y_list)-1):
        fig.add_trace(go.Scatter(x=df[dates_x], y=df[cols_y_list[i+1]], mode='lines', name=cols_y_list_name[cols_y_list[i+1]]))
    
    anomal_dates_list = df[df[col_anomalies] == val_anomal][dates_x].tolist()
    y_max = df[cols_y_list[0]].max()
    y_min = min(df[cols_y_list[0]].min(),0)
    for i in range(len(anomal_dates_list)):
        anomal_date = anomal_dates_list[i]
        fig.add_shape(dict(type="line", x0=anomal_date, y0=y_min, x1=anomal_date, y1=y_max, line=dict(color="red", width=1)))
    fig.show()

### 3.2.1. The weakening of quarantine<a class="anchor" id="3.2.1"></a>

[Back to Table of Contents](#0.1)

#### Thanks to:
* [COVID-19 Open Data](https://github.com/GoogleCloudPlatform/covid-19-open-data)
* [Oxford COVID-19 government response tracker](https://www.bsg.ox.ac.uk/research/research-projects/oxford-covid-19-government-response-tracker)
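A weakening of quarantine is detected here as a day on which the stringency index is lower than on the previous day. The sketch below shows that rule in plain Python on made-up values; the function name `dates_of_decrease` is hypothetical, and days with missing values are skipped.

```python
def dates_of_decrease(dates, values):
    """Return the dates whose value is lower than the previous day's value.

    Pairs in which either value is missing (None) are skipped.
    """
    flagged = []
    for prev, cur, day in zip(values, values[1:], dates[1:]):
        if prev is not None and cur is not None and cur < prev:
            flagged.append(day)
    return flagged


# Made-up stringency values for illustration only
dates = ["2020-05-01", "2020-05-02", "2020-05-03", "2020-05-04", "2020-05-05"]
stringency = [72.7, 72.7, 68.9, None, 65.0]
print(dates_of_decrease(dates, stringency))
```

In this toy series only the drop on 2020-05-03 is flagged; the later drop is not detected because the value in between is missing.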

In [None]:
# Thanks to https://www.kaggle.com/vbmokin/covid-19-in-ukraine-prophet-holidays-tuning
# A drop in the stringency index relative to the previous day marks a weakening of quarantine
# (comparisons with missing values are False, so gaps in the data are never flagged)
data['stringency_index_jump'] = (data['stringency_index'].diff() < 0).astype(int)
source_gov = 'https://www.bsg.ox.ac.uk/research/research-projects/oxford-covid-19-government-response-tracker'
dates_gov_list = data[data['stringency_index_jump'] == 1]['date'].tolist()
holidays_df = aux_holidays_df_generator(holidays_df, dates_gov_list, 'the weakening of quarantine', source_gov)
plot_with_anomalies(data, ["stringency_index"], {"stringency_index" : "Stringency index and dates of the weakening of quarantine in " + country_main}, 'date', 'stringency_index_jump', 1)

### 3.2.2. Very comfortable conditions for rest (not yet taken into account - needs clarification) <a class="anchor" id="3.2.2"></a>

[Back to Table of Contents](#0.1)

#### Thanks to:
* [COVID-19 Open Data](https://github.com/GoogleCloudPlatform/covid-19-open-data)
* [NOAA](https://www.ncei.noaa.gov/)
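The rule for "very comfortable conditions" is: average temperature above its 95% quantile and no rainfall. A minimal sketch on synthetic data follows. The helper names are hypothetical, and the quantile uses linear interpolation between order statistics, which to my knowledge is the default of the pandas `quantile` method used in the cell below.

```python
from datetime import date, timedelta


def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def comfortable_days(records, q=0.95):
    """Dates that are hotter than the q-quantile and have no rainfall."""
    threshold = quantile([r["average_temperature"] for r in records], q)
    return [r["date"] for r in records
            if r["average_temperature"] > threshold and r["rainfall"] == 0]


# Synthetic weather: 40 days with steadily rising temperature and no rain
start = date(2020, 6, 1)
records = [
    {"date": (start + timedelta(days=i)).isoformat(),
     "average_temperature": float(i + 1),
     "rainfall": 0.0}
    for i in range(40)
]
records[38]["rainfall"] = 2.5  # the second-hottest day is rainy, so it is excluded

print(comfortable_days(records))
```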

In [None]:
# Thanks to https://www.kaggle.com/vbmokin/covid-19-in-ukraine-prophet-holidays-tuning
data['rest_comfort'] = 0
data.loc[(data['average_temperature'] > data['average_temperature'].quantile(.95)) & (data['rainfall'] == 0), 'rest_comfort'] = 1
dates_weather_list = data[data['rest_comfort'] == 1]['date'].tolist()
#holidays_df = aux_holidays_df_generator(holidays_df, dates_weather_list, 'Very comfortable conditions for rest', 'https://www.ncei.noaa.gov/')
plot_with_anomalies(data, ["average_temperature", "rainfall"], {"average_temperature" : "Average temperature over time in " + country_main, "rainfall" : "rainfall"}, 'date', 'rest_comfort', 1)

## 3.3. Removing the holidays for the period with fewer than 10 daily cases<a class="anchor" id="3.3"></a>

[Back to Table of Contents](#0.1)
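The filtering step can be sketched in plain Python: find the first date with at least 10 daily cases and keep only the holidays on or after it. The function names and all dates below are made up for illustration.

```python
def first_date_with_cases(daily_cases, threshold=10):
    """Return the first date whose count reaches the threshold, or None.

    daily_cases is a list of (date_str, count) pairs sorted by date.
    """
    for day, count in daily_cases:
        if count >= threshold:
            return day
    return None


def drop_early_holidays(holiday_dates, start_date):
    """Keep holidays on or after start_date (ISO dates compare correctly as strings)."""
    return [d for d in holiday_dates if d >= start_date]


# Made-up data for illustration only
daily_cases = [("2020-03-01", 3), ("2020-03-02", 8), ("2020-03-03", 14), ("2020-03-04", 22)]
holiday_dates = ["2020-01-08", "2020-02-24", "2020-03-03", "2020-06-01"]

start = first_date_with_cases(daily_cases)
kept = drop_early_holidays(holiday_dates, start)
print(start, kept)
```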

In [None]:
# Removing the holidays that fall before the first day with at least 10 new confirmed cases
holidays_df['ds_dt'] = pd.to_datetime(holidays_df['ds'], format='%Y-%m-%d', errors='ignore')
date_the_first_many_cases = datetime.strptime(df2.loc[df2['Confirmed'] >= 10, 'Date'].iloc[0], '%Y-%m-%d')
holidays_df = holidays_df[holidays_df['ds_dt'] >= date_the_first_many_cases]
holidays_df

## 4. EDA<a class="anchor" id="4"></a>

[Back to Table of Contents](#0.1)

## 4.1. Plots - Confirmed cases over time<a class="anchor" id="4.1"></a>

[Back to Table of Contents](#0.1)

In [None]:
fig = px.line(df2, x="Date", y="Confirmed", 
              title="Confirmed cases in " + country_main, 
              log_y=False,template='gridon',width=1000, height=600)
fig.show()

In [None]:
fig = px.line(df2, x="Date", y="Confirmed", 
              title="Confirmed cases (logarithmic scale) in " + country_main, 
              log_y=True,template='gridon',width=1000, height=600)
fig.show()

In [None]:
df2['holiday'] = 0
holidays_df_dates = holidays_df['ds'].tolist()
df2.loc[df2['Date'].isin(holidays_df_dates), 'holiday'] = 1
plot_with_anomalies(df2, ["Confirmed"], {"Confirmed" : "Confirmed cases and holidays data in " + country_main}, 'Date', 'holiday', 1)
df2 = df2.drop(columns=['holiday'])

## 4.2. Statistics<a class="anchor" id="4.2"></a>

[Back to Table of Contents](#0.1)

## Descriptive statistics

In [None]:
df2.describe()

## Earliest Cases

In [None]:
df2.head()

## Latest Cases

In [None]:
df2.tail()

## 4.3. Set initial values for tuning<a class="anchor" id="4.3"></a>

[Back to Table of Contents](#0.1)

In [None]:
# For stage 1 of tuning
changepoint_prior_scale_initial_level = 0.7
add_season_reg_coef = 2
lower_window_list = [-1, -2, -3, -4] # must be exactly 4 values (identical allowed)
upper_window_list = [1, 2, 3, 4] # must be exactly 4 values (identical allowed)
prior_scale_list = [0.05, 1, 5, 20] # must be exactly 4 values (identical allowed)

# For stage 2 of tuning
changepoint_prior_scale_list = [0.1, 0.2, 0.3, 0.5] # must be exactly 4 values (identical allowed)
weekly_fourier_order_list = [6, 8, 10, 12] # must be exactly 4 values (identical allowed)
triply_fourier_order_list = [0, 1, 2, 3] # must be exactly 4 values (identical allowed)
# 0 in fourier_order lists means the absence of this component

# Check length of lists
tuning_lists = [lower_window_list, upper_window_list, prior_scale_list,
                changepoint_prior_scale_list, weekly_fourier_order_list, triply_fourier_order_list]
if any(len(values) != 4 for values in tuning_lists):
    print('Each tuning list must contain exactly 4 values!')

In [None]:
df2 = df2.drop(columns = ['Country', 'iso_alpha'])
df2.columns = ['ds','y']
df2

In [None]:
days_to_forecast = 7 # in future (after training data)
days_to_forecast_for_evalution = 7 # the latest days of the data, held out for model evaluation
first_forecasted_date = sorted(list(set(df2['ds'].values)))[-days_to_forecast_for_evalution]

print('The first date to perform forecasts for evaluation is: ' + str(first_forecasted_date))
print('The last date of the forecast into the future is: ' + (datetime.strptime(df2['ds'].max(), "%Y-%m-%d")+timedelta(days = days_to_forecast)).strftime("%Y-%m-%d"))

## 5. Tuning Prophet model and holidays parameters<a class="anchor" id="5"></a>

[Back to Table of Contents](#0.1)

In [None]:
def convert10_base4(n):
    # convert decimal to base 4
    alphabet = "0123"
    if n < 4:
        return alphabet[n]
    else:
        return convert10_base4(n // 4) + alphabet[n % 4]

## 5.1. Stage 1 - Tuning holiday parameters<a class="anchor" id="5.1"></a>

[Back to Table of Contents](#0.1)
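Stage 1 enumerates all 4 x 4 x 4 = 64 combinations of `lower_window`, `upper_window` and `prior_scale` by writing the loop index in base 4 and using each digit as a position in the corresponding list. The sketch below reproduces that mapping with `divmod` and checks that it gives the same order as `itertools.product`. The function name `combo_from_index` is hypothetical; the candidate lists are the ones set in section 4.3.

```python
from itertools import product

lower_window_list = [-1, -2, -3, -4]
upper_window_list = [1, 2, 3, 4]
prior_scale_list = [0.05, 1, 5, 20]


def combo_from_index(i):
    """Decode i (0..63) into three base-4 digits and pick one value per list."""
    d0, rest = divmod(i, 16)   # most significant digit -> lower_window
    d1, d2 = divmod(rest, 4)   # middle digit -> upper_window, last digit -> prior_scale
    return lower_window_list[d0], upper_window_list[d1], prior_scale_list[d2]


grid_by_index = [combo_from_index(i) for i in range(4 ** 3)]
grid_by_product = list(product(lower_window_list, upper_window_list, prior_scale_list))

print(len(grid_by_index), grid_by_index == grid_by_product)
```

The base-4 index is convenient when a single loop counter also has to address rows of a results table, while `product` is the more common way to write a plain grid search.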

## 5.1.1. Model training, forecasting and evaluation<a class="anchor" id="5.1.1"></a>

[Back to Table of Contents](#0.1)
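Each candidate model is trained on all days before `first_forecasted_date` and evaluated on the days from that date onward. A minimal sketch of this hold-out split on synthetic data follows; the function name `split_by_date` and the rows are hypothetical.

```python
def split_by_date(rows, first_forecasted_date):
    """Split (date_str, y) rows into training and validation parts.

    Training: dates strictly before first_forecasted_date.
    Validation: dates on or after it.
    """
    train = [r for r in rows if r[0] < first_forecasted_date]
    val = [r for r in rows if r[0] >= first_forecasted_date]
    return train, val


# Synthetic series of 14 days, for illustration only
rows = [(f"2021-01-{d:02d}", 100 + d) for d in range(1, 15)]
days_for_evaluation = 7

# The validation window starts at the 7th date from the end
first_forecasted_date = sorted(r[0] for r in rows)[-days_for_evaluation]
train, val = split_by_date(rows, first_forecasted_date)
print(first_forecasted_date, len(train), len(val))
```

Because the split is chronological, the model never sees the evaluation days during training.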

In [None]:
# Thanks to https://www.kaggle.com/vbmokin/covid-19-in-ukraine-prophet-holidays-tuning
def make_forecasts(country_df, holidays_df, days_to_forecast, days_to_forecast_for_evalution, first_forecasted_date):
    
    def eval_error(forecast_df, country_df_val, first_forecasted_date, title):
        # Evaluate forecasts against the validation set country_df_val and calculate the relative error (WAPE, %)
        forecast_df.loc[forecast_df['yhat'] < 0, 'yhat'] = 0
        result_df = forecast_df[(forecast_df['ds'] >= pd.to_datetime(first_forecasted_date))]
        result_val_df = result_df.merge(country_df_val, on=['ds'])
        result_val_df['rel_diff'] = (result_val_df['y'] - result_val_df['yhat'].round()).abs()
        relative_error = sum(result_val_df['rel_diff'].values)*100/result_val_df['y'].sum()
        return relative_error
    
    def model_training_forecasting(df, forecast_days, holidays_df=None, mode_main='multiplicative'):
        # Prophet model training and forecasting
        
        model = Prophet(daily_seasonality=False, weekly_seasonality=False, yearly_seasonality=False, 
                        holidays=holidays_df, changepoint_range=1, changepoint_prior_scale = changepoint_prior_scale_initial_level,
                        seasonality_mode = mode_main)
        model.add_seasonality(name='weekly', period=7, fourier_order=8, mode = 'multiplicative', 
                              prior_scale = changepoint_prior_scale_initial_level)
        model.fit(df)
        future = model.make_future_dataframe(periods=forecast_days)
        forecast = model.predict(future)
        forecast.loc[forecast['yhat'] < 0, 'yhat'] = 0
        return model, forecast

    
    cols_w = ['ds', 'trend', 'yhat', 'yhat_lower', 'yhat_upper', 'trend_lower', 'trend_upper', 'additive_terms', 'additive_terms_lower', 'additive_terms_upper',
              'multiplicative_terms','multiplicative_terms_lower', 'multiplicative_terms_upper', 'weekly', 'weekly_lower', 'weekly_upper']
    cols_h = ['ds', 'trend', 'yhat', 'yhat_lower', 'yhat_upper', 'trend_lower', 'trend_upper', 'additive_terms', 'additive_terms_lower', 'additive_terms_upper',
              'holidays', 'holidays_lower', 'holidays_upper', 'multiplicative_terms','multiplicative_terms_lower', 'multiplicative_terms_upper', 'weekly',
              'weekly_lower', 'weekly_upper']
    mode_main_list = ['additive', 'multiplicative']
    relative_errors_holidays = []
    counter = 0
    results = pd.DataFrame(columns=['Conf_real', 'Conf_pred', 'Conf_pred_h', 'mode', 'n_h', 'err', 'err_h', 'prior_scale', 'how_less, %'])
    
    country_holidays_df = holidays_df[holidays_df['code'] == country_iso_alpha][['ds', 'holiday', 'lower_window', 'upper_window', 'prior_scale']].reset_index(drop=True)
    country_dfs = []            

    # Data preparation for forecast with Prophet
    country_df['ds'] = pd.to_datetime(country_df['ds'])

    # Set training and validation datasets
    country_df_future = country_df.copy()
    country_df_val = country_df[(country_df['ds'] >= pd.to_datetime(first_forecasted_date))].copy()
    country_df = country_df[(country_df['ds'] < pd.to_datetime(first_forecasted_date))]

    n = 64 # number of combination of parameters lower_window / upper_window / prior_scale
    for k in range(2):
        # 'additive' and 'multiplicative' mode tuning
        # Without holidays
        # Model training and forecasting without holidays
        model, forecast = model_training_forecasting(country_df, days_to_forecast_for_evalution, mode_main=mode_main_list[k])
        #fig = model.plot_components(forecast)

        # Evaluate forecasts against the validation set and calculate the relative error
        forecast_df = forecast[['ds', 'yhat']].copy()
        relative_error = eval_error(forecast_df, country_df_val, first_forecasted_date, 'without holidays')

        # With holidays
        # Model training with tuning prior_scale and forecasting
        for i in range(n):
            parameters_iter = convert10_base4(i).zfill(3)
            lower_window_i = lower_window_list[int(parameters_iter[0])]
            upper_window_i = upper_window_list[int(parameters_iter[1])]
            prior_scale_i = prior_scale_list[int(parameters_iter[2])]
            country_holidays_df['lower_window'] = lower_window_i
            country_holidays_df['upper_window'] = upper_window_i
            country_holidays_df['prior_scale'] = prior_scale_i
            number_holidays = len(country_holidays_df[(country_holidays_df['ds'] > '2020-01-21') & (country_holidays_df['ds'] < '2021-10-01')])
            model_holidays, forecast_holidays = model_training_forecasting(country_df, days_to_forecast_for_evalution, country_holidays_df, 
                                                                           mode_main=mode_main_list[k])

            # Evaluate forecasts against the validation set and calculate the relative error
            forecast_holidays_df = forecast_holidays[['ds', 'yhat']].copy()
            relative_error_holidays = eval_error(forecast_holidays_df, country_df_val, first_forecasted_date, 'with holidays impact')

            # Save results
            if (k == 0) and (i == 0):
                relative_error_holidays_min = relative_error_holidays
                forecast_holidays_df_best = forecast_holidays[cols_h]
                model_holidays_best = model_holidays
                lower_window_best = lower_window_i
                upper_window_best = upper_window_i
                prior_scale_best = prior_scale_i
                mode_best = mode_main_list[k]

            elif (relative_error_holidays < relative_error_holidays_min):
                relative_error_holidays_min = relative_error_holidays
                forecast_holidays_df_best = forecast_holidays[cols_h]
                model_holidays_best = model_holidays
                lower_window_best = lower_window_i
                upper_window_best = upper_window_i
                prior_scale_best = prior_scale_i
                mode_best = mode_main_list[k]

            # Save results to dataframe with result for the last date
            confirmed_real_last = country_df_val.tail(1)['y'].values[0].astype('int')
            results.loc[i+n*k,'Conf_real'] = confirmed_real_last if confirmed_real_last > 0 else 0
            confirmed_pred_last = int(round(forecast_df.tail(1)['yhat'].values[0]))
            results.loc[i+n*k,'Conf_pred'] = confirmed_pred_last if confirmed_pred_last > 0 else 0
            confirmed_pred_holidays_last = int(round(forecast_holidays_df_best.tail(1)['yhat'].values[0]))
            results.loc[i+n*k,'Conf_pred_h'] = confirmed_pred_holidays_last if confirmed_pred_holidays_last > 0 else 0
            results.loc[i+n*k,'mode'] = mode_main_list[k]
            results.loc[i+n*k,'n_h'] = number_holidays
            results.loc[i+n*k,'err'] = relative_error
            results.loc[i+n*k,'err_h'] = relative_error_holidays
            results.loc[i+n*k,'lower_window'] = lower_window_i
            results.loc[i+n*k,'upper_window'] = upper_window_i
            results.loc[i+n*k,'prior_scale'] = prior_scale_i
            results.loc[i+n*k,'how_less, %'] = round((relative_error-relative_error_holidays)*100/relative_error,1)

            print('i =',i+n*k,' from',2*n-1,':  lower_window =', lower_window_i, 'upper_window =',upper_window_i, 'prior_scale =', prior_scale_i)
            print('relative_error_holidays =',relative_error_holidays, 'relative_error_holidays_min =',relative_error_holidays_min, '\n')

        # Results visualization
        print('Seasonality mode is', mode_main_list[k])
        print('The best error of the model with holidays is', relative_error_holidays_min, 'with lower_window =', str(lower_window_best),
              ' upper_window =', str(upper_window_best), ' prior_scale =', str(prior_scale_best))
        print('The error of model without holidays is', relative_error, '\n')

    # Save results to dataframe with all dates
    forecast_holidays_df_best['country'] = country_main
    forecast_holidays_df_best.rename(columns={'yhat':'confirmed'}, inplace=True)
    forecast_holidays_dfs = forecast_holidays_df_best.tail(days_to_forecast_for_evalution)

    # Forecasting the future
    if relative_error < relative_error_holidays_min:
        # The forecast without taking into account the holidays is the best
        model_future_best, forecast_future_best = model_training_forecasting(country_df_future, days_to_forecast, mode_main=mode_best)
        forecast_plot = model_future_best.plot(forecast_future_best, ylabel='Confirmed in '+ country_main + ' (forecasting without holidays) - ' + mode_best)
        cols = cols_w
        print('The best model is model without holidays')
    else:
        # The forecast taking into account the holidays is the best
        print('The best model is model with holidays')
        model_future_best, forecast_future_best = model_training_forecasting(country_df_future, days_to_forecast, holidays_df,
                                                                             mode_main=mode_best)
        forecast_plot = model_future_best.plot(forecast_future_best, ylabel='Confirmed in '+ country_main + ' (forecasting with holidays) - ' + mode_best)
        cols = cols_h
    # Save forecasting results 
    forecast_future_df_best = forecast_future_best[cols]
    forecast_future_df_best['country'] = country_main
    forecast_future_df_best.rename(columns={'yhat':'confirmed'}, inplace=True)    
    forecast_future_dfs = forecast_future_df_best.tail(days_to_forecast)
    fig = model_future_best.plot_components(forecast_future_best)
    return forecast_holidays_dfs, relative_errors_holidays, forecast_future_dfs, results

In [None]:
%%time
forecast_holidays_dfs, relative_errors_holidays, forecast_future_dfs, results = make_forecasts(df2, holidays_df, 
                                                                                               days_to_forecast, days_to_forecast_for_evalution, first_forecasted_date)

In [None]:
forecast_future_dfs.head(3)

In [None]:
forecast_holidays_dfs.head(3)

In [None]:
forecast_holidays_dfs.to_csv('forecast_holidays_dfs.csv', index=False)
forecast_future_dfs.to_csv('forecast_future_dfs.csv', index=False)
results.to_csv('results.csv', index=False)

## 5.1.2. Results visualization<a class="anchor" id="5.1.2"></a>

[Back to Table of Contents](#0.1)

In [None]:
# Visualization of results
print(f'5D plot of Prophet model parameters and COVID-19 error of forecasting to {str(days_to_forecast_for_evalution)} days')

In [None]:
# Determination of the best parameters
results['err_h'] = results['err_h'].astype('float')
results['lower_window'] = results['lower_window'].astype('int')
results['upper_window'] = results['upper_window'].astype('int')
results_a = results[results['mode'] == 'additive']
results_m = results[results['mode'] == 'multiplicative']

In [None]:
# Interactive plot with results of parameters tuning
fig = px.scatter_3d(results_a, x='lower_window', y='upper_window', z='err_h',
                     color='prior_scale', color_discrete_sequence= px.colors.sequential.Plasma_r, opacity=1,
                    title='Interactive plot with results of parameters tuning for additive mode')
fig.update(layout=dict(title=dict(x=0.5)))

In [None]:
# Interactive plot with results of parameters tuning
fig = px.scatter_3d(results_m, x='lower_window', y='upper_window', z='err_h',
                     color='prior_scale', color_discrete_sequence= px.colors.sequential.Plasma_r, opacity=1,
                    title='Interactive plot with results of parameters tuning for multiplicative mode')
fig.update(layout=dict(title=dict(x=0.5)))

In [None]:
display(results_a.nsmallest(5, 'err_h'))
display(results_m.nsmallest(5, 'err_h'))

In [None]:
# The smallest WAPE:
best_result = results.nsmallest(1, 'err_h').reset_index(drop=True)
lower_window_opt = best_result.lower_window[0]
upper_window_opt = best_result.upper_window[0]
prior_scale_opt = best_result['prior_scale'][0]
mode_opt = best_result['mode'][0]

In [None]:
print(f"Thus, for {country_main} the optimal parameters of Prophet model that gave a WAPE = {best_result['err_h'][0]} are:")
print("* lower_window =", lower_window_opt)
print("* upper_window =", upper_window_opt)
print("* prior_scale =", prior_scale_opt)
print("* mode_opt =", mode_opt)

In [None]:
holidays_df['lower_window'] = lower_window_opt
holidays_df['upper_window'] = upper_window_opt
holidays_df['prior_scale'] = prior_scale_opt

In [None]:
# The smallest WAPE:
display(best_result)

## 5.2. Stage 2 - Tuning seasonality parameters<a class="anchor" id="5.2"></a>

[Back to Table of Contents](#0.1)
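
A note on the Fourier orders tuned in this stage. Prophet builds each custom seasonality from pairs `sin(2*pi*k*t/period)`, `cos(2*pi*k*t/period)` for `k = 1..fourier_order`. The sketch below assumes daily data, so that `t` is an integer day index, and counts how many of those columns are really different from each other. The helper names `fourier_features` and `distinct_columns` are illustrative and are not part of the notebook or of Prophet.

```python
import math

def fourier_features(t, period, order):
    """Return [sin_1, cos_1, ..., sin_order, cos_order] for integer day index t."""
    feats = []
    for k in range(1, order + 1):
        angle = 2.0 * math.pi * k * t / period
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats

def distinct_columns(period, order, n_days=21, tol=1e-9):
    """Count columns that are neither constant nor +/- a copy of an earlier column."""
    rows = [fourier_features(t, period, order) for t in range(n_days)]
    columns = list(zip(*rows))
    kept = []
    for col in columns:
        if max(col) - min(col) < tol:
            continue  # constant column: carries no seasonal shape
        duplicate = any(
            all(abs(a - b) < tol for a, b in zip(col, other))
            or all(abs(a + b) < tol for a, b in zip(col, other))
            for other in kept
        )
        if not duplicate:
            kept.append(col)
    return len(kept)

weekly_distinct = distinct_columns(period=7, order=10)   # 20 columns requested
triply_distinct = distinct_columns(period=3, order=10)   # 20 columns requested
print(weekly_distinct, triply_distinct)  # 6 2
```

Under this assumption, a 7-day period sampled once a day has only 6 distinct seasonal columns (orders 1 to 3) and a 3-day period has only 2 (order 1). Larger orders repeat the same shapes, so their effect on the fit comes mainly through how the prior is spread over the duplicated columns. This may help when reading the error surface over `weekly_fn` and `triply_fn`.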

## 5.2.1. Model training, forecasting and evaluation<a class="anchor" id="5.2.1"></a>

[Back to Table of Contents](#0.1)
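
The relative error used for evaluation in this notebook is a WAPE: the sum of absolute differences between the real values and the rounded forecasts, divided by the sum of the real values, in percent. Below is a minimal sketch of that calculation on plain Python lists. The function name `wape` and the sample numbers are illustrative only; the notebook's own `eval_error` works on dataframes.

```python
def wape(actual, predicted):
    """Weighted absolute percentage error, in percent."""
    total_actual = sum(actual)
    if total_actual == 0:
        raise ValueError("WAPE is undefined when the actual values sum to zero")
    # Negative forecasts are clipped to 0 and forecasts are rounded,
    # mirroring the intent of eval_error in this notebook
    abs_errors = [abs(y - round(max(yhat, 0))) for y, yhat in zip(actual, predicted)]
    return 100.0 * sum(abs_errors) / total_actual

# Made-up values: absolute errors are 10, 10 and 0 on a total of 600
example_error = wape([100, 200, 300], [110.4, 190.2, 300.0])
print(round(example_error, 2))  # 3.33
```

Because the denominator is the total over the whole validation window, days with large counts weigh more than days with small counts.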

In [None]:
# Thanks to https://www.kaggle.com/vbmokin/covid-19-in-ukraine-prophet-holidays-tuning
def make_forecasts_stage2(country_df, holidays_df, days_to_forecast, days_to_forecast_for_evalution, first_forecasted_date,
                          mode_main='multiplicative'):
    
    def eval_error(forecast_df, country_df_val, first_forecasted_date, title):
        # Evaluate forecasts against the validation set country_df_val and return the relative error (WAPE, %)
        forecast_df.loc[forecast_df['yhat'] < 0, 'yhat'] = 0  # .loc is required: chained indexing assigns to a copy
        result_df = forecast_df[(forecast_df['ds'] >= pd.to_datetime(first_forecasted_date))]
        result_val_df = result_df.merge(country_df_val, on=['ds'])
        result_val_df['rel_diff'] = (result_val_df['y'] - result_val_df['yhat'].round()).abs()
        relative_error = sum(result_val_df['rel_diff'].values)*100/result_val_df['y'].sum()
        
        return relative_error
    
    def model_training_forecasting(df, forecast_days, holidays_df=None, mode_main='multiplicative', 
                                  weekly_fourier_order=10, triply_fourier_order=10,
                                  changepoint_prior_scale = changepoint_prior_scale_initial_level, mode_seasonality = 'additive'):
        # Prophet model training and forecasting
        
        model = Prophet(daily_seasonality=False, weekly_seasonality=False, yearly_seasonality=False, 
                        holidays=holidays_df, changepoint_range=1, changepoint_prior_scale = changepoint_prior_scale,
                        seasonality_mode = mode_main)
        if weekly_fourier_order > 0:
            model.add_seasonality(name='weekly', period=7, fourier_order=weekly_fourier_order, mode = mode_seasonality, 
                                  prior_scale = changepoint_prior_scale*add_season_reg_coef)
        if triply_fourier_order > 0:
            model.add_seasonality(name='triply', period=3, fourier_order=triply_fourier_order, mode = mode_seasonality, 
                                  prior_scale = changepoint_prior_scale/add_season_reg_coef)
        model.fit(df)
        future = model.make_future_dataframe(periods=forecast_days)
        forecast = model.predict(future)
        forecast.loc[forecast['yhat'] < 0, 'yhat'] = 0  # clip negative forecasts in place
        return model, forecast

    
    cols_w = ['ds', 'trend', 'yhat', 'yhat_lower', 'yhat_upper', 'trend_lower', 'trend_upper', 'additive_terms', 'additive_terms_lower', 'additive_terms_upper',
              'multiplicative_terms','multiplicative_terms_lower', 'multiplicative_terms_upper']
    cols_h = ['ds', 'trend', 'yhat', 'yhat_lower', 'yhat_upper', 'trend_lower', 'trend_upper', 'additive_terms', 'additive_terms_lower', 'additive_terms_upper',
              'holidays', 'holidays_lower', 'holidays_upper', 'multiplicative_terms','multiplicative_terms_lower', 'multiplicative_terms_upper']
    mode_seasonality_list = ['additive', 'multiplicative']
    relative_errors_holidays = []
    counter = 0
    results = pd.DataFrame(columns=['Conf_real', 'Conf_pred', 'Conf_pred_h', 'mode_s', 'err', 'err_h', 'weekly_fn', 'triply_fn', 'ch_p_s_fn', 'how_less, %'])
    
    country_dfs = []
    # Data preparation for forecast with Prophet
    country_df['ds'] = pd.to_datetime(country_df['ds'])

    # Set training and validation datasets
    country_df_future = country_df.copy()
    country_df_val = country_df[(country_df['ds'] >= pd.to_datetime(first_forecasted_date))].copy()
    country_df = country_df[(country_df['ds'] < pd.to_datetime(first_forecasted_date))]

    n = 64 # number of combinations of weekly_fourier_order / triply_fourier_order / changepoint_prior_scale (4**3)
    relative_error_min = 100
    for k in range(2):
        # 'additive' and 'multiplicative' mode tuning
        # Without holidays
        # Model training and forecasting without holidays
        model, forecast = model_training_forecasting(country_df, days_to_forecast_for_evalution, mode_main=mode_main,
                                                     mode_seasonality = mode_seasonality_list[k])
        #fig = model.plot_components(forecast)

        # Evaluate forecasts against the validation set and calculate the relative error
        forecast_df = forecast[['ds', 'yhat']].copy()
        relative_error = eval_error(forecast_df, country_df_val, first_forecasted_date, 'without holidays')
        # Keep the seasonality mode with the smallest error of the model without holidays
        if (k == 0) or (relative_error < relative_error_min):
            relative_error_min = relative_error
            mode_seasonality_w_best = mode_seasonality_list[k]

        # With holidays
        # Model training with tuning of Fourier orders and changepoint_prior_scale, and forecasting
        for i in range(n):
            parameters_iter = convert10_base4(i).zfill(3)
            weekly_fourier_order_i = weekly_fourier_order_list[int(parameters_iter[0])]
            triply_fourier_order_i = triply_fourier_order_list[int(parameters_iter[1])]
            changepoint_prior_scale_i = changepoint_prior_scale_list[int(parameters_iter[2])]
            model_holidays, forecast_holidays = model_training_forecasting(country_df, days_to_forecast_for_evalution, 
                                                                           holidays_df, mode_main=mode_main,
                                                                           weekly_fourier_order = weekly_fourier_order_i, 
                                                                           triply_fourier_order = triply_fourier_order_i,
                                                                           changepoint_prior_scale = changepoint_prior_scale_i,
                                                                           mode_seasonality = mode_seasonality_list[k])
            
            # Evaluate forecasts against the validation set and calculate the relative error
            forecast_holidays_df = forecast_holidays[['ds', 'yhat']].copy()
            relative_error_holidays = eval_error(forecast_holidays_df, country_df_val, first_forecasted_date, 'with holidays impact')

            # Save results
            if (k == 0) and (i == 0):
                relative_error_holidays_min = relative_error_holidays
                forecast_holidays_df_best = forecast_holidays[cols_h].copy()
                model_holidays_best = model_holidays
                weekly_fourier_order_best = weekly_fourier_order_i
                triply_fourier_order_best = triply_fourier_order_i
                changepoint_prior_scale_best = changepoint_prior_scale_i
                mode_seasonality_best = mode_seasonality_list[k]

            elif (relative_error_holidays < relative_error_holidays_min):
                relative_error_holidays_min = relative_error_holidays
                forecast_holidays_df_best = forecast_holidays[cols_h].copy()
                model_holidays_best = model_holidays
                weekly_fourier_order_best = weekly_fourier_order_i
                triply_fourier_order_best = triply_fourier_order_i
                changepoint_prior_scale_best = changepoint_prior_scale_i
                mode_seasonality_best = mode_seasonality_list[k]

            # Save results to dataframe with result for the last date
            confirmed_real_last = country_df_val.tail(1)['y'].values[0].astype('int')
            results.loc[i+n*k,'Conf_real'] = confirmed_real_last if confirmed_real_last > 0 else 0
            confirmed_pred_last = int(round(forecast_df.tail(1)['yhat'].values[0]))
            results.loc[i+n*k,'Conf_pred'] = confirmed_pred_last if confirmed_pred_last > 0 else 0
            confirmed_pred_holidays_last = int(round(forecast_holidays_df.tail(1)['yhat'].values[0]))
            results.loc[i+n*k,'Conf_pred_h'] = confirmed_pred_holidays_last if confirmed_pred_holidays_last > 0 else 0
            results.loc[i+n*k,'mode_s'] = mode_seasonality_list[k]
            results.loc[i+n*k,'err'] = relative_error
            results.loc[i+n*k,'err_h'] = relative_error_holidays
            results.loc[i+n*k,'weekly_fn'] = weekly_fourier_order_i
            results.loc[i+n*k,'triply_fn'] = triply_fourier_order_i
            results.loc[i+n*k,'ch_p_s_fn'] = changepoint_prior_scale_i
            results.loc[i+n*k,'how_less, %'] = round((relative_error-relative_error_holidays)*100/relative_error,1)

            print('i =',i+n*k,' from',2*n-1,':  weekly_fourier_order =', weekly_fourier_order_i, 'triply_fourier_order =', triply_fourier_order_i,
                  'changepoint_prior_scale =', changepoint_prior_scale_i)
            print('relative_error_holidays =',relative_error_holidays, 'relative_error_holidays_min =',relative_error_holidays_min, '\n')

        # Results visualization
        print('Seasonality mode is', mode_seasonality_list[k])
        print('The best error of model with holidays is', relative_error_holidays_min,
              'weekly_fourier_order =', weekly_fourier_order_best, 'triply_fourier_order =', triply_fourier_order_best,
              'changepoint_prior_scale =', changepoint_prior_scale_best)
        print('The error of model without holidays is', relative_error, '\n')

    # Save results to dataframe with all dates
    forecast_holidays_df_best['country'] = country_main
    forecast_holidays_df_best.rename(columns={'yhat':'confirmed'}, inplace=True)
    forecast_holidays_dfs = forecast_holidays_df_best.tail(days_to_forecast_for_evalution)

    # Forecasting the future
    if relative_error < relative_error_holidays_min:
        # The forecast without taking into account the holidays is the best
        model_future_best, forecast_future_best = model_training_forecasting(country_df_future, days_to_forecast, mode_main=mode_main,
                                                                             mode_seasonality = mode_seasonality_w_best)
        forecast_plot = model_future_best.plot(forecast_future_best, ylabel='Confirmed in '+ country_main + ' (forecasting without holidays) - ' + mode_seasonality_w_best)
        cols = cols_w
        print('The best model is model without holidays')
    else:
        # The forecast taking into account the holidays is the best
        print('The best model is model with holidays')
        model_future_best, forecast_future_best = model_training_forecasting(country_df_future, days_to_forecast, 
                                                                             holidays_df, mode_main=mode_main,
                                                                             weekly_fourier_order = weekly_fourier_order_best, 
                                                                             triply_fourier_order = triply_fourier_order_best,
                                                                             changepoint_prior_scale = changepoint_prior_scale_best,
                                                                             mode_seasonality = mode_seasonality_best)
        forecast_plot = model_future_best.plot(forecast_future_best, ylabel='Confirmed in '+ country_main + ' (forecasting with holidays) - ' + mode_seasonality_best)
        cols = cols_h
    # Save forecasting results 
    forecast_future_df_best = forecast_future_best[cols].copy()  # copy to avoid modifying a slice of the forecast
    forecast_future_df_best['country'] = country_main
    forecast_future_df_best.rename(columns={'yhat':'confirmed'}, inplace=True)    
    forecast_future_dfs = forecast_future_df_best.tail(days_to_forecast)
    fig = model_future_best.plot_components(forecast_future_best)
    return forecast_holidays_dfs, relative_errors_holidays, forecast_future_dfs, results

In [None]:
%%time
forecast_holidays_dfs, relative_errors_holidays, forecast_future_dfs, results = make_forecasts_stage2(df2, holidays_df, days_to_forecast, days_to_forecast_for_evalution, first_forecasted_date, mode_main=mode_opt)

In [None]:
forecast_holidays_dfs.to_csv('forecast_holidays_dfs2.csv', index=False)
forecast_future_dfs.to_csv('forecast_future_dfs2.csv', index=False)
results.to_csv('results2.csv', index=False)

## 5.2.2. Results visualization<a class="anchor" id="5.2.2"></a>

[Back to Table of Contents](#0.1)
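
To read the `results` table: each row label is `i + n*k`, where `k` is the seasonality mode (0 for additive, 1 for multiplicative) and `i` runs over the 64 parameter combinations. The tuning loop turns `i` into three base-4 digits, one per parameter list. The sketch below shows this decoding and checks that it matches a plain nested grid. The helper `to_base4` is a hypothetical stand-in for the notebook's `convert10_base4` (assumed to return the base-4 digits as a string), and the candidate lists hold placeholder values, not the ones used in this notebook.

```python
from itertools import product

def to_base4(i):
    """Hypothetical stand-in for convert10_base4: base-4 digits of i as a string."""
    if i == 0:
        return "0"
    digits = []
    while i > 0:
        digits.append(str(i % 4))
        i //= 4
    return "".join(reversed(digits))

# Placeholder candidate lists (illustrative values only)
weekly_list = [0, 3, 5, 10]
triply_list = [0, 3, 5, 10]
cps_list = [0.01, 0.05, 0.1, 0.5]

def decode(i):
    """Map a combination index 0..63 to (weekly order, triply order, changepoint scale)."""
    d = to_base4(i).zfill(3)
    return weekly_list[int(d[0])], triply_list[int(d[1])], cps_list[int(d[2])]

decoded = [decode(i) for i in range(64)]
grid = list(product(weekly_list, triply_list, cps_list))
print(decoded == grid)  # True
print(decode(27))       # (3, 5, 0.5), because 27 is "123" in base 4
```

So the weekly order changes slowest and the changepoint prior scale changes fastest as the row label grows. `itertools.product` would give the same order without the digit conversion.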

In [None]:
results

In [None]:
# Visualization of results
print(f'3D plot of Prophet model parameters and COVID-19 error of forecasting to {str(days_to_forecast_for_evalution)} days')

In [None]:
# Determination of the best parameters
results['err_h'] = results['err_h'].astype('float')
results['weekly_fn'] = results['weekly_fn'].astype('int')
results['triply_fn'] = results['triply_fn'].astype('int')
results_a = results[results['mode_s'] == 'additive']
results_m = results[results['mode_s'] == 'multiplicative']

In [None]:
results_a

In [None]:
# Interactive plot with results of parameters tuning - additive
fig = px.scatter_3d(results_a, x='weekly_fn', y='triply_fn', z='err_h',
                    color='ch_p_s_fn', color_discrete_sequence= px.colors.sequential.Plasma_r, opacity=1,
                    title='Interactive plot with results of parameters tuning for additive mode')
fig.update(layout=dict(title=dict(x=0.5)))

In [None]:
# Interactive plot with results of parameters tuning - multiplicative
fig = px.scatter_3d(results_m, x='weekly_fn', y='triply_fn', z='err_h',
                    color='ch_p_s_fn', color_discrete_sequence= px.colors.sequential.Plasma_r, opacity=1,
                    title='Interactive plot with results of parameters tuning for multiplicative mode')
fig.update(layout=dict(title=dict(x=0.5)))

In [None]:
display(results_a.nsmallest(5, 'err_h'))
display(results_m.nsmallest(5, 'err_h'))

In [None]:
# The smallest WAPE:
best_result2 = results.nsmallest(1, 'err_h').reset_index(drop=True)
weekly_fourier_order_opt = best_result2.weekly_fn[0]
triply_fourier_order_opt = best_result2.triply_fn[0]
mode_seasonality_opt = mode_seasonality_weekly_opt = mode_seasonality_triply_opt = best_result2['mode_s'][0]
changepoint_prior_scale_opt = best_result2['ch_p_s_fn'][0]
weekly_seasonality_prior_scale_opt = changepoint_prior_scale_opt*add_season_reg_coef
triply_seasonality_prior_scale_opt = changepoint_prior_scale_opt/add_season_reg_coef

## 5.3. Results of all tuning<a class="anchor" id="5.3"></a>

[Back to Table of Contents](#0.1)

In [None]:
# The smallest WAPE:
display(best_result2)

In [None]:
print(f"Thus, for {country_main} the optimal 11 parameters of Prophet model that gave a WAPE = {best_result2.err_h[0]} are:")
print("* lower_window =", lower_window_opt)
print("* upper_window =", upper_window_opt)
print("* prior_scale =", prior_scale_opt)
print("* changepoint_prior_scale =", changepoint_prior_scale_opt)
print("* mode_opt =", mode_opt)
print("* weekly_fourier_order =", weekly_fourier_order_opt)
print("* mode_seasonality_weekly =", mode_seasonality_weekly_opt)
print("* weekly_seasonality_prior_scale =", weekly_seasonality_prior_scale_opt)
print("* triply_fourier_order =", triply_fourier_order_opt)
print("* mode_seasonality_triply =", mode_seasonality_triply_opt)
print("* triply_seasonality_prior_scale =", triply_seasonality_prior_scale_opt)

## 6. Prediction <a class="anchor" id="6"></a>

[Back to Table of Contents](#0.1)

In [None]:
# Thanks to https://www.kaggle.com/vbmokin/covid-19-in-ukraine-prophet-holidays-tuning
def model_training_forecasting(df, forecast_days, holidays_df=None, mode_main='multiplicative', 
                               weekly_fourier_order=10, triply_fourier_order=10, 
                               changepoint_prior_scale = changepoint_prior_scale_initial_level, mode_seasonality = 'additive'):
    # Optimal Prophet model training and forecasting

    model = Prophet(daily_seasonality=False, weekly_seasonality=False, yearly_seasonality=False, 
                    holidays=holidays_df, changepoint_range=1, changepoint_prior_scale = changepoint_prior_scale,
                    seasonality_mode = mode_main)
    if weekly_fourier_order > 0:
        model.add_seasonality(name='weekly', period=7, fourier_order=weekly_fourier_order, mode = mode_seasonality, 
                                  prior_scale = changepoint_prior_scale*add_season_reg_coef)
    if triply_fourier_order > 0:
        model.add_seasonality(name='triply', period=3, fourier_order=triply_fourier_order, mode = mode_seasonality, 
                                  prior_scale = changepoint_prior_scale/add_season_reg_coef)
    model.fit(df)
    future = model.make_future_dataframe(periods=forecast_days)
    forecast = model.predict(future)
    
    forecast.loc[forecast['yhat'] < 0, 'yhat'] = 0  # clip negative forecasts in place
    forecast['yhat_lower'] = forecast['yhat_lower'].round().astype('int')
    forecast['yhat'] = forecast['yhat'].round().astype('int')
    forecast['yhat_upper'] = forecast['yhat_upper'].round().astype('int')
    
    return model, forecast

In [None]:
model_future_opt, forecast_future_opt = model_training_forecasting(df2, days_to_forecast, holidays_df, mode_main=mode_opt,
                                                                   weekly_fourier_order = weekly_fourier_order_opt, 
                                                                   triply_fourier_order = triply_fourier_order_opt,
                                                                   changepoint_prior_scale = changepoint_prior_scale_opt,
                                                                   mode_seasonality = mode_seasonality_opt)

In [None]:
fig = model_future_opt.plot(forecast_future_opt)

In [None]:
fig = model_future_opt.plot_components(forecast_future_opt)

In [None]:
forecast_future_opt_future = forecast_future_opt[['ds', 'yhat_lower', 'yhat', 'yhat_upper']]
forecast_future_opt_future.tail(days_to_forecast)

In [None]:
forecast_future_opt_future.to_csv('forecast_future_opt_future.csv', index=False)
best_result2.to_csv('best_result2.csv', index=False)
holidays_df.to_csv('holidays_df_all.csv', index=False)

I hope you find this kernel useful and enjoyable.

Your comments and feedback are most welcome.

[Go to Top](#0)