# Covid-19 forecasting using Prophet

This project uses the Prophet library in Python to forecast confirmed COVID-19 cases and compares forecasts from models tuned with different hyperparameter search strategies.

Methodology:

1. Data Preparation: The COVID-19 dataset is loaded into a Pandas DataFrame and preprocessed to ensure consistency and accuracy.
2. Time Series Analysis: The Prophet library is used to model the time series data, capturing both trend and seasonality.
3. Model Training: The Prophet model is trained on historical COVID-19 case data to learn patterns and relationships.
4. Forecasting: The trained model is used to generate forecasts for future COVID-19 cases based on the learned patterns.
5. Evaluation: The accuracy of the forecasts is evaluated using metrics such as Mean Absolute Error (MAE) or Root Mean Squared Error (RMSE).
6. Visualization: The forecasted COVID-19 cases are visualized using Matplotlib and Seaborn to provide intuitive insights into future trends.

### Loading Libraries and constants

In [None]:
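import numpy as np
import pandas as pd
from prophet import Prophet
import malib.data.clean as mc
import malib.data.format as mf
import malib.data.plotting as mp
# NOTE: the `msp` (split) and `mt` (tuning) helpers used below must also be
# imported from malib; their module paths are not shown in this notebook.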

In [None]:
PATH = "../../data/raw/covid_19_clean_complete.csv"
DS = "Date"
Y = "Confirmed"
SAVE_PATH = "../../data/processed/"
FILE_NAME = "df_clean_target_confirmed.csv"

In [None]:
# load data
df_raw = pd.read_csv(PATH)

# View the first few rows of the dataframe
display(df_raw.head())

### Transform Data

In [None]:
# Convert the date and target columns to the format the model requires
df = mf.format_date(df_raw, DS, Y)

## Exploratory Data Analysis

In [None]:
mp.plot_time_series(df, DS, ['Confirmed', 'Deaths', 'Recovered'], False, "Temporal Evolution COVID-19")

In [None]:
mp.plot_decompose_time_series(df, DS, Y)

### Analysis of the Graph

1. **Original Series (Confirmed)**
   - The first subplot shows the original time series of confirmed COVID-19 cases. Here, we can observe how the confirmed cases have varied over time.

2. **Trend**
   - The second subplot shows the long-term trend of the time series. This trend represents the general behavior of the number of confirmed cases without considering seasonal fluctuations and residuals.
   - In this case, we can see that the trend is upward, indicating a continuous increase in the number of confirmed cases over time.

3. **Seasonality**
   - The third subplot shows the seasonal component of the time series. This component captures periodic variations that occur at regular intervals (e.g., weekly or monthly patterns).
   - Here, the seasonal component shows a regular rise and fall in confirmed cases that repeats at a fixed interval.

4. **Residual**
   - The fourth subplot shows the residual, which is the part of the time series that remains after removing the trend and seasonality. It represents random fluctuations and noise that are not explained by the other two components.
   - In this case, the residuals show variations that do not follow a clear pattern and seem quite random, although with some significant fluctuations.
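
The four panels correspond to a classical additive decomposition (observed = trend + seasonal + residual). The sketch below shows that idea in plain pandas, assuming a daily series and a weekly period. `decompose_additive` is a hypothetical name, and `mp.plot_decompose_time_series` may use a different method or period.

```python
import numpy as np
import pandas as pd


def decompose_additive(series: pd.Series, period: int = 7) -> pd.DataFrame:
    """Split a series into trend + seasonal + residual (additive model)."""
    # Trend: centered moving average over one full period.
    trend = series.rolling(window=period, center=True).mean()
    detrended = series - trend
    # Seasonal: mean detrended value at each position within the period,
    # re-centered so the component averages to zero over one period.
    position = np.arange(len(series)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means = seasonal_means - seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.loc[position].to_numpy(), index=series.index)
    # Residual: whatever trend and seasonality do not explain.
    residual = series - trend - seasonal
    return pd.DataFrame(
        {"observed": series, "trend": trend, "seasonal": seasonal, "residual": residual}
    )


# Synthetic data (not the COVID-19 dataset): linear growth plus a weekly pattern.
dates = pd.date_range("2020-01-22", periods=56, freq="D")
weekly_pattern = np.tile([0, 1, 2, 3, 2, 1, -9], 8)
values = pd.Series(10.0 * np.arange(56) + weekly_pattern, index=dates)
parts = decompose_additive(values)
```

On this synthetic series the residual is zero wherever the trend is defined, because the data contain nothing beyond a linear trend and an exact weekly cycle. Real case counts leave a non-trivial residual, as in the fourth subplot.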

## Data Transform & Clean

In [None]:
CLEANING_RULES = {
    'del_negative': True,  # Delete rows with negative values in the target column
    'del_days': None,  # Days of the week to delete, where 0 = Monday and 6 = Sunday
    'del_zeros': None,  # Delete rows with zeros in the target column
    'log_transform': None  # Log-transform the target column
}
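
The implementation of `mc.clean_df` is not shown in this notebook. As a rough illustration of what rules like these could do, here is a sketch in plain pandas. `apply_cleaning_rules` is a hypothetical stand-in, and the use of `np.log1p` (which tolerates zeros) for the log transform is an assumption.

```python
import numpy as np
import pandas as pd


def apply_cleaning_rules(df, target, rules, date_col="Date"):
    """Hypothetical stand-in for mc.clean_df; the real helper may differ."""
    out = df.copy()
    if rules.get("del_negative"):
        # Keep only rows whose target is non-negative.
        out = out[out[target] >= 0]
    if rules.get("del_days"):
        # dayofweek uses 0 = Monday ... 6 = Sunday.
        weekdays = pd.to_datetime(out[date_col]).dt.dayofweek
        out = out[~weekdays.isin(rules["del_days"])]
    if rules.get("del_zeros"):
        out = out[out[target] != 0]
    if rules.get("log_transform"):
        # Assumption: log(1 + y) so that zero counts stay finite.
        out = out.assign(**{target: np.log1p(out[target])})
    return out.reset_index(drop=True)


sample = pd.DataFrame(
    {"Date": ["2020-03-01", "2020-03-02", "2020-03-03"], "Confirmed": [5, -1, 0]}
)
rules = {"del_negative": True, "del_days": None, "del_zeros": None, "log_transform": None}
cleaned = apply_cleaning_rules(sample, "Confirmed", rules)
```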

In [None]:
# Cleaning dataset
df = mc.clean_df(df,Y,CLEANING_RULES)

In [None]:
# Format into forecast dataframe
df = mf.format_input_and_target(df,DS, Y)

In [None]:
# Group by date and sum the target over all rows for that date
df = mf.group_by_columns_and_sum(df, ['ds'], ['y'])
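
Prophet expects a dataframe with a `ds` (date) column and a `y` (target) column. The two helper calls above are assumed to produce that shape and then collapse the rows to one per date. Their implementations are not shown, so the following is only a sketch of the equivalent pandas operations, with `to_prophet_frame` as a hypothetical name.

```python
import pandas as pd


def to_prophet_frame(df, date_col, target_col):
    """Rename to ds/y and sum the target per date (hypothetical helper)."""
    out = (
        df[[date_col, target_col]]
        .rename(columns={date_col: "ds", target_col: "y"})
        .assign(ds=lambda d: pd.to_datetime(d["ds"]))
    )
    # One row per date: sum the target over every row sharing that date.
    return out.groupby("ds", as_index=False)["y"].sum().sort_values("ds").reset_index(drop=True)


# Toy rows for two regions on two dates (made-up numbers).
raw = pd.DataFrame(
    {
        "Date": ["2020-03-02", "2020-03-01", "2020-03-01", "2020-03-02"],
        "Country/Region": ["A", "A", "B", "B"],
        "Confirmed": [7, 5, 3, 4],
    }
)
prophet_df = to_prophet_frame(raw, "Date", "Confirmed")
```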

## **Hyperparameter Tuning in Prophet Model**

In [None]:
## Split into train and test
train,test = msp.get_train_test_by_date_ratio(df,[0.8,0.2],'ds')
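
For time series the split has to be chronological: the test set is the most recent block of dates, never a random sample. A minimal sketch of such a split follows, assuming the helper above behaves this way. `split_by_date_ratio` is a hypothetical name and takes a single train ratio rather than the `[0.8, 0.2]` list used in the notebook.

```python
import pandas as pd


def split_by_date_ratio(df, train_ratio=0.8, date_col="ds"):
    """Chronological split: the earliest dates go to train, the rest to test."""
    if not 0 < train_ratio < 1:
        raise ValueError("train_ratio must be between 0 and 1")
    ordered = df.sort_values(date_col).reset_index(drop=True)
    dates = ordered[date_col].drop_duplicates().reset_index(drop=True)
    n_train = max(1, int(len(dates) * train_ratio))
    last_train_date = dates.iloc[n_train - 1]
    train_part = ordered[ordered[date_col] <= last_train_date]
    test_part = ordered[ordered[date_col] > last_train_date]
    return train_part, test_part


toy = pd.DataFrame({"ds": pd.date_range("2020-01-22", periods=10, freq="D"), "y": range(10)})
toy_train, toy_test = split_by_date_ratio(toy, train_ratio=0.8)
```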


In [None]:
# Avoid the IOPub data rate error by disabling verbose logging
import logging
logging.getLogger("prophet").setLevel(logging.ERROR)
logging.getLogger("cmdstanpy").disabled=True

In [None]:
num_days_to_predict = msp.get_df_n_days(test)

In [None]:
cutoff = {
    'initial': str(round(msp.get_df_n_days(train)/2,0)) + " days",
    'period': '7 days',
    'horizon': '7 days'
}
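
The `initial`, `period` and `horizon` keys have the same names as the arguments of `prophet.diagnostics.cross_validation`. How `mt.compare_optimizers` consumes this dictionary is not shown, so the sketch below is an assumption about the intended use. `build_cutoff` and `run_cv` are hypothetical names. Note that `round(..., 0)` in the cell above keeps a float, so its string looks like `"75.0 days"`; integer division produces `"75 days"` instead.

```python
def build_cutoff(n_train_days, period_days=7, horizon_days=7):
    """Build cross-validation window sizes as whole-day strings."""
    return {
        "initial": f"{n_train_days // 2} days",  # first half of train as the initial window
        "period": f"{period_days} days",         # spacing between successive cutoffs
        "horizon": f"{horizon_days} days",       # how far ahead each fold forecasts
    }


def run_cv(model, cutoff):
    """Run Prophet's rolling-origin cross-validation on an already fitted model."""
    # Imported lazily so build_cutoff works even without Prophet installed.
    from prophet.diagnostics import cross_validation, performance_metrics

    df_cv = cross_validation(
        model,
        initial=cutoff["initial"],
        period=cutoff["period"],
        horizon=cutoff["horizon"],
    )
    return performance_metrics(df_cv)


example_cutoff = build_cutoff(150)
```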

In [None]:
PATH_HP_OPTUNA = "../../data/processed/hyperparameters/hp_optuna.csv"
PATH_HP_HYPEROPT = "../../data/processed/hyperparameters/hp_hyperopt.csv"
PATH_HP_PARAM_GRID = "../../data/processed/hyperparameters/hp_param_grid.csv"
trials = 2000
# results,df_optuna,df_hyperopt = mt.compare_optimizers(train, cutoff,trials)
# df_optuna.to_csv(PATH_HP_OPTUNA)
# df_hyperopt.to_csv(PATH_HP_HYPEROPT)

In [None]:
# Hyperparameter grid

# params_grid = {  'growth': ["linear"], 
#                 'changepoints': [None], 
#                 'n_changepoints': [ 25,50,75], 
#                 'changepoint_range': [0.7,0.8,0.9],
#                 'yearly_seasonality': ["auto"],
#                 'weekly_seasonality': ["auto"],
#                 'daily_seasonality': ["auto"],
#                 'holidays': [None],
#                 'seasonality_mode': ['multiplicative',"additive"],
#                 'changepoint_prior_scale': [0.01, 0.05, 0.1, 0.3, 0.4, 0.5,0.6,0.7,0.8,0.9],
#                 'seasonality_prior_scale': [0.01, 0.1, 1.0, 3.0, 5.0, 7.0, 8.0, 9.0, 10.0,20,50,80],
#                 'holidays_prior_scale': [0.01, 0.1, 1.0, 2.0, 3.0, 5.0, 8.0,9.0, 10.0,30,50,70,90],
#                 'mcmc_samples': [0],
#                 'interval_width': [ 0.8,0.9],
#                 'uncertainty_samples': [0]
#               }


# bp = mt.all_hyperparameters_tunning(train,params_grid)
# bp.to_csv(PATH_HP_PARAM_GRID)
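
An exhaustive grid search evaluates every combination of the listed values. The commented grid above expands to 3 × 3 × 2 × 10 × 12 × 13 × 2 = 56,160 combinations, which is probably why it is left commented out. The sketch below shows the expansion on a reduced grid. `expand_grid` is a hypothetical helper, not the implementation of `mt.all_hyperparameters_tunning`.

```python
from itertools import product


def expand_grid(grid):
    """Yield one dict per combination of the values listed in `grid`."""
    keys = list(grid)
    for combo in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, combo))


# Reduced version of the grid above, kept small for illustration.
small_grid = {
    "n_changepoints": [25, 50, 75],
    "changepoint_range": [0.7, 0.8, 0.9],
    "seasonality_mode": ["multiplicative", "additive"],
}
candidates = list(expand_grid(small_grid))
```

Each element of `candidates` can be unpacked into the model constructor as `Prophet(**candidate)`, since all three keys are real `Prophet` arguments.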

## Model Training

In [None]:
df_opt = pd.read_csv(PATH_HP_OPTUNA)
df_hyp = pd.read_csv(PATH_HP_HYPEROPT)
df_pg = pd.read_csv(PATH_HP_PARAM_GRID)

In [None]:
columns = [ 'growth',
                'changepoints', 
                'n_changepoints', 
                'changepoint_range',
                'yearly_seasonality',
                'weekly_seasonality',
                'daily_seasonality',
                'holidays',
                'seasonality_mode',
                'changepoint_prior_scale',
                'seasonality_prior_scale',
                'holidays_prior_scale',
                'mcmc_samples',
                'interval_width',
                'uncertainty_samples'
          ]
values = list(eval(df_pg['params'][0]))
dfpg = pd.DataFrame([values], columns=columns)

common_columns = dfpg.columns.intersection(df_opt.columns).intersection(df_hyp.columns)
df_concat = pd.concat([dfpg[common_columns], df_opt[common_columns], df_hyp[common_columns]], ignore_index=True)
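
The cell above parses the stored `params` string with `eval`, which executes whatever the CSV contains. If the stored value is a plain Python literal, `ast.literal_eval` is a safer alternative. The sketch assumes the column holds a tuple literal in the same order as `columns`. The example string is made up, since the real CSV contents are not shown.

```python
import ast

# Hypothetical stored value; the real contents of hp_param_grid.csv are not shown.
raw_params = (
    "('linear', None, 25, 0.8, 'auto', 'auto', 'auto', None, "
    "'additive', 0.05, 10.0, 10.0, 0, 0.8, 0)"
)

# literal_eval accepts only Python literals (strings, numbers, tuples, None, ...),
# so it cannot run arbitrary code the way eval can.
values = list(ast.literal_eval(raw_params))

column_names = [
    "growth", "changepoints", "n_changepoints", "changepoint_range",
    "yearly_seasonality", "weekly_seasonality", "daily_seasonality",
    "holidays", "seasonality_mode", "changepoint_prior_scale",
    "seasonality_prior_scale", "holidays_prior_scale", "mcmc_samples",
    "interval_width", "uncertainty_samples",
]
params = dict(zip(column_names, values))
```

`literal_eval` raises `ValueError` on anything that is not a literal (for example a bare `nan`), so `eval` may still be needed if the file was written in a different format.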

In [None]:
# Iterate over the DataFrame rows and train one model per parameter set
models = []
for index, row in df_concat.iterrows():
    m = Prophet(
        n_changepoints=int(row['n_changepoints']),
        changepoint_range=row['changepoint_range'],
        seasonality_mode=row['seasonality_mode'],
        changepoint_prior_scale=row['changepoint_prior_scale'],
        seasonality_prior_scale=row['seasonality_prior_scale'],
        holidays_prior_scale=row['holidays_prior_scale'],
        interval_width=row['interval_width']
    )
    m.fit(train)
    models.append(m)
    print(f"Model {index + 1} trained successfully")

## Model Prediction

In [None]:
forecast = {}
hyper_opt_names = ["params_grid", "optuna", "hyperopt"]

for index, m in enumerate(models):
    future = m.make_future_dataframe(periods=num_days_to_predict)
    f = m.predict(future)
    f = f[['ds', 'yhat']].tail(num_days_to_predict)
    
    # Store each prediction in the dictionary under the matching key from hyper_opt_names
    forecast[hyper_opt_names[index]] = f

In [None]:
for key,f in forecast.items():
    f.to_csv(f"../../data/processed/prediction/{key}.csv",index=False,float_format="%.2f")
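
The methodology lists an evaluation step with MAE and RMSE, but the notebook stops after saving the predictions. The sketch below shows one way to score each forecast against the held-out `test` set by joining on `ds`. `score_forecasts` is a hypothetical helper, and it assumes `test` has `ds` and `y` columns with the same dates as the forecasts.

```python
import numpy as np
import pandas as pd


def score_forecasts(forecasts, test_df):
    """Return MAE and RMSE per forecast, best (lowest RMSE) first."""
    rows = []
    for name, f in forecasts.items():
        # Align predictions with actuals on the date column.
        merged = test_df.merge(f, on="ds", how="inner")
        err = merged["y"] - merged["yhat"]
        rows.append(
            {
                "optimizer": name,
                "mae": err.abs().mean(),
                "rmse": np.sqrt((err ** 2).mean()),
            }
        )
    return pd.DataFrame(rows).sort_values("rmse").reset_index(drop=True)


# Toy data (made-up numbers) standing in for `test` and `forecast`.
toy_dates = pd.date_range("2020-07-01", periods=3, freq="D")
toy_test = pd.DataFrame({"ds": toy_dates, "y": [10.0, 20.0, 30.0]})
toy_forecasts = {
    "a": pd.DataFrame({"ds": toy_dates, "yhat": [12.0, 18.0, 33.0]}),
    "b": pd.DataFrame({"ds": toy_dates, "yhat": [10.0, 20.0, 30.0]}),
}
scores = score_forecasts(toy_forecasts, toy_test)
```

In the notebook this would be called as `score_forecasts(forecast, test)`.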