<a class="anchor" id="0"></a>
# [AI-ML-DS : Training for beginners](https://www.kaggle.com/vbmokin/ai-ml-ds-training-for-beginners-in-kaggle). Level 2 (simple). 2021
## Kaggle GM, Prof. [@vbmokin](https://www.kaggle.com/vbmokin)
### [Vinnytsia National Technical University](https://vntu.edu.ua/), Ukraine
#### [Chair of the System Analysis and Information Technologies](http://mmss.vntu.edu.ua/index.php/ua/)

# COVID-19 in one region of Ukraine

## Dataset [COVID-19 in Ukraine: daily data](https://www.kaggle.com/vbmokin/covid19-in-ukraine-daily-data)

## Acknowledgements

### Datasets:
- official data of Ukraine (https://covid19.rnbo.gov.ua/) - via API
- my dataset [COVID-19 in Ukraine: daily data](https://www.kaggle.com/vbmokin/covid19-in-ukraine-daily-data) - for the next commits

### Notebooks:
* [AI-ML-DS Training. L1A : COVID in UA - Prophet](https://www.kaggle.com/vbmokin/ai-ml-ds-training-l1t-covid-in-ua-prophet?scriptVersionId=63736090)
* [COVID in UA: Prophet with 4, Nd seasonality](https://www.kaggle.com/vbmokin/covid-in-ua-prophet-with-4-nd-seasonality)
* [Data Science for tabular data: Advanced Techniques](https://www.kaggle.com/vbmokin/data-science-for-tabular-data-advanced-techniques)
* [EDA for tabular data: Advanced Techniques](https://www.kaggle.com/vbmokin/eda-for-tabular-data-advanced-techniques)
* [COVID-19 in Ukraine: EDA & Forecasting](https://www.kaggle.com/vbmokin/covid-19-in-ukraine-eda-forecasting)
* [COVID-19 new cases in 70 countries - FB Prophet](https://www.kaggle.com/vbmokin/covid-19-new-cases-in-70-countries-fb-prophet)

### Libraries from GitHub:
- https://facebook.github.io/prophet/
- https://facebook.github.io/prophet/docs/
- https://github.com/facebook/prophet

<a class="anchor" id="0.1"></a>
## Table of Contents

1. [Import libraries](#1)
1. [Download data](#2)
1. [EDA & FE](#3)
1. [Modeling](#4)
1. [Prediction & Visualization](#5)

## 1. Import libraries<a class="anchor" id="1"></a>

[Back to Table of Contents](#0.1)

In [None]:
# Work with Data - the main Python libraries
import numpy as np
import pandas as pd

# For importing data from the API
import requests

# Visualization
import matplotlib.pyplot as plt

# Modeling and Prediction
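try:
    from prophet import Prophet      # package name since Prophet 1.0
except ImportError:
    from fbprophet import Prophet    # package name in older releases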
from sklearn.metrics import mean_absolute_error, r2_score

import warnings
warnings.simplefilter('ignore')

In [None]:
prediction_period = 7  # Period for prediction, days

## 2. Download data<a class="anchor" id="2"></a>

[Back to Table of Contents](#0.1)

### Full list of API parameters
https://api-covid19.rnbo.gov.ua/charts/main-data?mode=ukraine

### Example for region "Zhytomyrska oblast"
code_region = 4914

In [None]:
# Download data via API from the Portal of RNBO of Ukraine: https://api-covid19.rnbo.gov.ua/
# https://api-covid19.rnbo.gov.ua/charts/main-data?mode=ukraine
code_region = 4907  # "Zhytomyrska oblast"
print('Download daily data from the Portal of RNBO of Ukraine')
response = requests.get(f'https://api-covid19.rnbo.gov.ua/charts/main-data?mode=ukraine&country={code_region}')
response.raise_for_status()  # Stop early if the request failed
# Save the raw response, then load it into a dataframe
with open('data', 'wb') as f:
    f.write(response.content)
df_data = pd.read_json('data')
df_data

In [None]:
# Display the last 10 rows of the dataframe "df_data"
df_data.tail(10)
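
The download cell saves the response to a file before parsing it. The payload can also be parsed straight from memory, as in the minimal sketch below. The sample payload is invented for illustration; it only assumes that the response contains the `dates` and `confirmed` fields used later in this notebook.

```python
import io
import pandas as pd

# Hypothetical payload: only the 'dates' and 'confirmed' keys come from this
# notebook; the values are made up for illustration.
sample_payload = (
    '{"dates": ["2021-05-01", "2021-05-02", "2021-05-03"], '
    '"confirmed": [100, 112, 130]}'
)

# pd.read_json accepts a file-like object, so no temporary file is needed.
df_sample = pd.read_json(io.StringIO(sample_payload))
print(df_sample)
```

With a real response object from `requests`, the same idea would read `pd.read_json(io.StringIO(response.text))`.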

## 3. EDA & FE<a class="anchor" id="3"></a>

[Back to Table of Contents](#0.1)

In [None]:
# Calculate the daily number of new confirmed cases
df_data['n_confirmed'] = df_data['confirmed'].diff()
#df_data['n_confirmed'] = df_data['confirmed']
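
A small illustration of what `diff()` does here, assuming that `confirmed` is a running total (which the use of `diff()` suggests). The numbers are toy values, not data from the portal.

```python
import pandas as pd

# Toy cumulative totals for four consecutive days
cumulative = pd.Series([10, 15, 15, 22], name='confirmed')

# Each value minus the previous one; the first row has no predecessor
daily = cumulative.diff()
print(daily.tolist())  # [nan, 5.0, 0.0, 7.0]
```

The leading `NaN` is why the next cell calls `dropna()` before converting the column to integers.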

In [None]:
# Filtering the missing data
data = df_data[['dates','n_confirmed']].dropna().reset_index(drop=True)
data['n_confirmed'] = data['n_confirmed'].astype('int')
data.tail(4)

In [None]:
# Delete the last row, which has a zero value
data = data[:-1]
data.tail(3)
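
The cell above always removes the last row. If the notebook is rerun when the last row is not zero, that would discard a valid observation. One possible guard is sketched below; the helper name `drop_trailing_zero` is hypothetical.

```python
import pandas as pd

def drop_trailing_zero(frame, column='n_confirmed'):
    """Return the frame without its last row if that row's value in `column` is zero."""
    if len(frame) > 0 and frame[column].iloc[-1] == 0:
        return frame.iloc[:-1]
    return frame

# Toy data: the last row is zero, so it is dropped
toy = pd.DataFrame({'dates': ['d1', 'd2', 'd3'], 'n_confirmed': [5, 7, 0]})
print(drop_trailing_zero(toy))

# Here the last row is non-zero, so nothing is removed
print(drop_trailing_zero(toy.iloc[:2]))
```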

In [None]:
# Build the plot
data['n_confirmed'].plot()

In [None]:
# Preparing data for modeling with Prophet
data.columns = ['ds', 'y']
data.tail()

In [None]:
# Removing zero values
data = data[data['y'] > 0].reset_index(drop=True)
data

In [None]:
# Divide the dataset into training and validation datasets
valid = data[(len(data)-prediction_period):].reset_index(drop=True)
data = data[:(len(data)-prediction_period)]
valid

In [None]:
data.tail(prediction_period)
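
The split above keeps the time order: the last `prediction_period` rows become the validation set and everything before them is used for training. The same logic can be wrapped in a small function, sketched here on toy data (the name `split_last_n` is hypothetical).

```python
import pandas as pd

def split_last_n(frame, n):
    """Split a time-ordered frame into (train, valid); valid holds the last n rows."""
    if not 0 < n < len(frame):
        raise ValueError("n must be between 1 and len(frame) - 1")
    train = frame.iloc[:-n].reset_index(drop=True)
    valid = frame.iloc[-n:].reset_index(drop=True)
    return train, valid

# Toy series of 10 days
toy = pd.DataFrame({'ds': pd.date_range('2021-01-01', periods=10), 'y': range(10)})
train, valid = split_last_n(toy, 3)
print(len(train), len(valid))  # 7 3
```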

### Selecting the data for the biggest wave

In [None]:
# Rows around the smallest daily increase
data.loc[303:305]

In [None]:
# Select the data for the biggest wave
df2 = data[304:].reset_index(drop=True)
df2

In [None]:
df2.plot()

**ADDITIONAL TASK:** Try another row number (instead of 304) as the starting point of the wave.

See a plot of all the data here: https://covid19.rnbo.gov.ua/
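
The starting point can also be located programmatically instead of by eye. The sketch below is one possible approach, not the method used in this notebook: it smooths the series with a centred rolling mean and takes the position of the minimum. The helper name `find_wave_start` and its parameters are hypothetical, and the series is toy data.

```python
import pandas as pd

def find_wave_start(y, window=7, search_from=0):
    """Index label of the smallest centred rolling mean of y at or after search_from."""
    smoothed = y.rolling(window, center=True, min_periods=1).mean()
    return smoothed.iloc[search_from:].idxmin()

# Toy series: a decline followed by a new wave
toy = pd.Series([50, 40, 30, 20, 10, 5, 10, 30, 60, 90, 120, 100])
start = find_wave_start(toy, window=3)
print(start)  # 5
```

On the real data this could be used as `df2 = data.loc[find_wave_start(data['y']):].reset_index(drop=True)`, possibly with `search_from` set so that earlier, quieter periods are skipped.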

## 4. Modeling<a class="anchor" id="4"></a>

[Back to Table of Contents](#0.1)

### For all data

In [None]:
# Build Prophet model
model = Prophet()

# Training model for all data
model.fit(data)

### For the biggest wave - df2

In [None]:
# Build Prophet model with parameters and structure
# from the notebook https://www.kaggle.com/vbmokin/covid-19-in-ukraine-eda-forecasting
# but without holidays
model2 = Prophet(daily_seasonality=False, weekly_seasonality=False, yearly_seasonality=False, 
                changepoint_range=1, changepoint_prior_scale = 0.3)
model2.add_seasonality(name='weekly', period=7, fourier_order=12, 
                      mode = 'multiplicative', prior_scale = 0.24)
model2.add_seasonality(name='triply', period=3, fourier_order=2, 
                      mode = 'multiplicative', prior_scale = 0.15)

# Training model for df2
model2.fit(df2)

**ADDITIONAL TASK:** Try changing the parameters of the Prophet model and evaluate how this affects the shape of the curve and the accuracy of the forecast.
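
When experimenting with `fourier_order`, it helps to see what it controls. Prophet models each seasonality as a partial Fourier sum, so a higher order means more sine and cosine terms. The NumPy sketch below rebuilds such terms for illustration only; it is not Prophet's own code, and the helper name `fourier_features` is hypothetical.

```python
import numpy as np

def fourier_features(t, period, order):
    """Columns sin(2*pi*k*t/period) and cos(2*pi*k*t/period) for k = 1..order."""
    t = np.asarray(t, dtype=float)
    columns = []
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)

days = np.arange(28)  # four weeks of daily time steps
weekly = fourier_features(days, period=7, order=12)
print(weekly.shape)   # (28, 24)

# On integer days, harmonics above order 3 coincide with lower harmonics
# or with a constant, so the 24 columns span only 7 independent directions.
print(np.linalg.matrix_rank(weekly, tol=1e-8))  # 7
```

For daily data this suggests that comparing a smaller `fourier_order` for the weekly term is a reasonable experiment, since the extra terms add parameters without adding new shapes.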

## 5. Prediction & Visualization<a class="anchor" id="5"></a>

[Back to Table of Contents](#0.1)

In [None]:
def prediction_accuracy(valid, model):
    # Calculate the relative prediction error (WAPE, %) of the model on the validation dataset
    
    future = model.make_future_dataframe(periods = prediction_period)   # Valid data prediction
    forecast = model.predict(future)
    forecast = forecast[(len(forecast)-prediction_period):].copy()
    forecast.loc[forecast['yhat'] < 0, 'yhat'] = 0   # Replace negative predictions with zero
    y_val = forecast['yhat'].round()  # Prediction    
    y_target = valid['y']             # Real data
    
    # WAPE = sum(|error|) / sum(real) = MAE / mean(real)
    return round(mean_absolute_error(y_target, y_val)*100/y_target.mean(), 2)
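
For reference, WAPE (weighted absolute percentage error) is commonly defined as the sum of absolute errors divided by the sum of the actual values. A minimal NumPy sketch under that definition follows; the helper name `wape` and the numbers are for illustration only.

```python
import numpy as np

def wape(y_true, y_pred):
    """Weighted absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.abs(y_true - y_pred).sum() / np.abs(y_true).sum()

# Absolute errors 10 + 10 + 30 = 50, actual total 600
print(round(wape([100, 200, 300], [110, 190, 330]), 2))  # 8.33
```

Unlike MAPE, this measure stays well defined when individual days have very small actual values, because only the total appears in the denominator.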

### For all data

In [None]:
# Make a forecast for prediction_period days ahead
future = model.make_future_dataframe(periods = prediction_period)
forecast = model.predict(future)

# Make values integer, and replace negative values with zero
feature_all = ['yhat_lower', 'yhat', 'yhat_upper']
forecast[feature_all] = forecast[feature_all].astype('int')
for feature in feature_all:
    forecast.loc[forecast[feature] < 0, feature] = 0
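
The cell above truncates the forecast to integers with `astype('int')` and then zeroes the negatives in a loop. An alternative sketch using `clip` and `round` is shown below on toy values; note that it rounds to the nearest integer rather than truncating, so results can differ by one.

```python
import pandas as pd

# Toy forecast values, invented for illustration
toy_forecast = pd.DataFrame({
    'yhat_lower': [-3.2, 4.1],
    'yhat': [1.6, 9.4],
    'yhat_upper': [5.9, 14.4],
})
columns = ['yhat_lower', 'yhat', 'yhat_upper']

# Clip negatives to zero first, then round to the nearest integer
toy_forecast[columns] = toy_forecast[columns].clip(lower=0).round().astype(int)
print(toy_forecast)
```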

In [None]:
# Draw plot of the values with forecasting data
figure = model.plot(forecast, xlabel = 'Date', ylabel = 'Number of confirmed cases')

The simple model fits the data very poorly!

In [None]:
# Draw plot with the components (trend and weekly seasonality) of the forecasts
figure_component = model.plot_components(forecast)

In [None]:
# Output the prediction for the next prediction_period days
forecast[['ds', 'yhat_lower', 'yhat', 'yhat_upper']].tail(prediction_period)

In [None]:
print(f"Relative error (WAPE) for model = {prediction_accuracy(valid, model)}%")

### For the biggest wave - df2

**TASK:** Make similar calculations for the dataframe df2:
1. Make a forecast for it for prediction_period=7 days ahead.
2. Draw a plot of the values with forecasting data.
3. Draw a plot with the components (trend and weekly seasonality) of the forecasts.
4. Output the prediction for the next prediction_period=7 days.

In [None]:
# Make a forecast for prediction_period days ahead
future2 = model2.make_future_dataframe(periods = prediction_period)
forecast2 = model2.predict(future2)

# Make values integer, and replace negative values with zero
forecast2[feature_all] = forecast2[feature_all].astype('int')
for feature in feature_all:
    forecast2.loc[forecast2[feature] < 0, feature] = 0

In [None]:
# Draw plot of the values with forecasting data
figure2 = model2.plot(forecast2, xlabel = 'Date', ylabel = 'Number of confirmed cases')

The more complex model fits the data much better.

In [None]:
# Draw plot with the components (trend and weekly seasonality) of the forecasts
figure_component2 = model2.plot_components(forecast2)

In [None]:
# Output the prediction for the next prediction_period days
forecast2[['ds', 'yhat_lower', 'yhat', 'yhat_upper']].tail(prediction_period)

In [None]:
print(f"Relative error (WAPE) for model2 = {prediction_accuracy(valid, model2)}%")

I hope you find this notebook useful and enjoyable.

Your comments and feedback are most welcome.

[Go to Top](#0)