<a class="anchor" id="0"></a>
# COVID-19 in Vinnytsia region of Ukraine

## Thanks to <a href="https://www.kaggle.com/vbmokin">@vbmokin</a>

## Dataset [COVID-19 in Ukraine: daily data](https://www.kaggle.com/vbmokin/covid19-in-ukraine-daily-data)

## Acknowledgements

### Datasets:
- official data of Ukraine (https://covid19.rnbo.gov.ua/) - via API
- dataset [COVID-19 in Ukraine: daily data](https://www.kaggle.com/vbmokin/covid19-in-ukraine-daily-data) - for the next commits

### Notebooks:
* [AI-ML-DS Training. L1A : COVID in UA - Prophet](https://www.kaggle.com/vbmokin/ai-ml-ds-training-l1t-covid-in-ua-prophet?scriptVersionId=63736090)
* [COVID in UA: Prophet with 4, Nd seasonality](https://www.kaggle.com/vbmokin/covid-in-ua-prophet-with-4-nd-seasonality)
* [Data Science for tabular data: Advanced Techniques](https://www.kaggle.com/vbmokin/data-science-for-tabular-data-advanced-techniques)
* [EDA for tabular data: Advanced Techniques](https://www.kaggle.com/vbmokin/eda-for-tabular-data-advanced-techniques)
* [COVID-19 in Ukraine: EDA & Forecasting](https://www.kaggle.com/vbmokin/covid-19-in-ukraine-eda-forecasting)
* [COVID-19 in 67 countries: daily Prophet forecast](https://www.kaggle.com/vbmokin/covid-19-in-67-countries-daily-prophet-forecast)
* [COVID-19 UA: one region forecasting](https://www.kaggle.com/vbmokin/covid-19-ua-one-region-forecasting)
* [COVID-19 new cases in 70 countries - FB Prophet](https://www.kaggle.com/vbmokin/covid-19-new-cases-in-70-countries-fb-prophet)

### Libraries from GitHub:
- https://facebook.github.io/prophet/
- https://facebook.github.io/prophet/docs/
- https://github.com/facebook/prophet
- https://github.com/dr-prodigy/python-holidays

<a class="anchor" id="0.1"></a>
## Table of Contents

1. [Import libraries](#1)
1. [Download data](#2)
1. [EDA & FE](#3)
1. [Modeling](#4)
1. [Prediction & Visualization](#5)

## 1. Import libraries<a class="anchor" id="1"></a>

[Back to Table of Contents](#0.1)

In [None]:
# Work with Data - the main Python libraries
import pandas as pd

import numpy as np
from datetime import date, timedelta, datetime

# For importing data from the API
import requests

# Modeling and Prediction
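try:
    from prophet import Prophet  # package name since Prophet 1.0
except ImportError:
    from fbprophet import Prophet  # package name in older releases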

import warnings
warnings.simplefilter('ignore')

## 2. Download data<a class="anchor" id="2"></a>

[Back to Table of Contents](#0.1)

### Full list of API parameters
https://api-covid19.rnbo.gov.ua/charts/main-data?mode=ukraine

### Retrieving information about region "Vinnytsia oblast"
code_region = 4907

In [None]:
# Download data via API from the Portal of RNBO of Ukraine: https://api-covid19.rnbo.gov.ua/
# https://api-covid19.rnbo.gov.ua/charts/main-data?mode=ukraine
code_region = 4907  # "Vinnytsia region"
print('Downloading daily data from the Portal of RNBO of Ukraine')
myfile = requests.get(f'https://api-covid19.rnbo.gov.ua/charts/main-data?mode=ukraine&country={code_region}')
myfile.raise_for_status()  # stop early if the request failed
# Save the response to a local file and close it before reading it back
with open('data', 'wb') as f:
    f.write(myfile.content)
df_data = pd.read_json('data')
df_data
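
The cell above writes the response to a local file before parsing it. As a minimal sketch, the same payload could be turned into a dataframe in memory. The helper name `payload_to_frame` and the sample payload are hypothetical. The assumption that the API returns a JSON object of equal-length lists (including `dates` and `confirmed`) is inferred from how `df_data` is used later in this notebook, not from the API documentation.

```python
import pandas as pd

def payload_to_frame(payload):
    """Build a dataframe from a dict of equal-length lists (assumed API shape)."""
    lengths = {len(values) for values in payload.values()}
    if len(lengths) > 1:
        raise ValueError('All series in the payload must have the same length')
    return pd.DataFrame(payload)

# Hypothetical sample payload, for illustration only (not real data)
sample_payload = {
    'dates': ['2021-01-01', '2021-01-02', '2021-01-03'],
    'confirmed': [100, 130, 175],
}
df_sample = payload_to_frame(sample_payload)
print(df_sample)
```

With a live response, the call would presumably look like `payload_to_frame(requests.get(url).json())`, which avoids the intermediate file.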

In [None]:
# Display the last 10 rows of the dataframe "df_data"
df_data.tail(10)

## 3. EDA & FE<a class="anchor" id="3"></a>

[Back to Table of Contents](#0.1)

In [None]:
# Calculate the daily number of new confirmed cases from the cumulative totals
df_data['n_confirmed'] = df_data['confirmed'].diff()
#df_data['n_confirmed'] = df_data['confirmed']
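
A small illustration of what `diff()` does to a cumulative series may help here. The numbers below are made up for the example. Note that the first row has no predecessor, so it becomes `NaN`.

```python
import pandas as pd

# Hypothetical cumulative counts, for illustration only
cumulative = pd.Series([100, 130, 175, 175, 240], name='confirmed')

# diff() subtracts the previous row, so the first value is NaN
daily = cumulative.diff()
print(daily.tolist())  # [nan, 30.0, 45.0, 0.0, 65.0]
```

A day on which the cumulative total does not change produces a daily value of zero.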

In [None]:
# Drop rows with missing values (the first row is NaN after diff)
data = df_data[['dates','n_confirmed']].dropna().reset_index(drop=True)
data['n_confirmed'] = data['n_confirmed'].astype('int')
data.tail(4)

In [None]:
# Delete the last zero value
data = data[:-1]
data.tail(3)

In [None]:
# Build the plot
data['n_confirmed'].plot()

In [None]:
# Preparing data for modeling with Prophet
data.columns = ['ds', 'y']
data.tail()

In [None]:
# Removing zero values
data = data[data['y'] > 0].reset_index(drop=True)
data

### Selection data with the biggest wave

In [None]:
# Inspect the rows around the smallest daily increase
data.loc[303:305]

In [None]:
# Selection data with the biggest wave
df2 = data[304:].reset_index(drop=True)
df2
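
The start index 304 above was chosen by inspecting the data. As a rough sketch of how such a cut point might be located programmatically, the hypothetical helper `find_wave_start` below takes the position of the minimum of a centered rolling mean. It is not the method used in this notebook, and the sample values are invented.

```python
import pandas as pd

def find_wave_start(y, window=7):
    """Return the position of the lowest point of the smoothed series."""
    smoothed = y.rolling(window, center=True, min_periods=1).mean()
    return int(smoothed.to_numpy().argmin())

# Hypothetical daily counts: one wave fades, then a larger one begins
y = pd.Series([50, 80, 120, 90, 40, 10, 5, 8, 30, 90, 200, 350])
start = find_wave_start(y, window=3)
wave = y.iloc[start:].reset_index(drop=True)
print(start, wave.tolist())
```

On real data the global minimum may well sit at the very beginning of the series, so the search would probably have to be restricted to a range between two known peaks.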

In [None]:
df2.plot()

## 4. Modeling<a class="anchor" id="4"></a>

[Back to Table of Contents](#0.1)

### For all data

In [None]:
# Build Prophet model
model = Prophet()

# Training model for all data
model.fit(data)

### For the biggest wave - df2

In [None]:
# Build Prophet model with parameters and structure
# from the notebook https://www.kaggle.com/vbmokin/covid-19-in-ukraine-eda-forecasting
# but without holidays
model2 = Prophet(daily_seasonality=False, weekly_seasonality=False, yearly_seasonality=False, 
                changepoint_range=1, changepoint_prior_scale = 0.3)
model2.add_seasonality(name='weekly', period=7, fourier_order=12, 
                      mode = 'multiplicative', prior_scale = 0.24)
model2.add_seasonality(name='triply', period=3, fourier_order=2, 
                      mode = 'multiplicative', prior_scale = 0.15)

# Training model for df2
model2.fit(df2)
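
Each `add_seasonality` call asks Prophet to represent a periodic effect with a truncated Fourier series: for a period P and `fourier_order` N, the component is a weighted sum of sin(2πnt/P) and cos(2πnt/P) for n = 1..N, so N harmonics give 2N regressors. The sketch below builds such regressors with NumPy to show their shape and periodicity. The function `fourier_features` is a hypothetical illustration, not Prophet's internal code.

```python
import numpy as np

def fourier_features(t, period, order):
    """Sine and cosine terms for harmonics 1..order of the given period."""
    t = np.asarray(t, dtype=float)
    columns = []
    for n in range(1, order + 1):
        angle = 2.0 * np.pi * n * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)

t = np.arange(28)  # four weeks of daily time steps
weekly = fourier_features(t, period=7, order=12)
triply = fourier_features(t, period=3, order=2)
print(weekly.shape, triply.shape)  # (28, 24) (28, 4)
```

Every column repeats after one period, which is what makes the fitted component seasonal.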

## 5. Prediction & Visualization<a class="anchor" id="5"></a>

[Back to Table of Contents](#0.1)

### For all data

In [None]:
# Make a forecast for 7 days ahead
future = model.make_future_dataframe(periods = 7)
forecast = model.predict(future)

# Make values integer, and replace negative values with zero
feature_all = ['yhat_lower', 'yhat', 'yhat_upper']
forecast[feature_all] = forecast[feature_all].astype('int')
for feature in feature_all:
    forecast.loc[forecast[feature] < 0, feature] = 0
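
The same post-processing (cast to integer, then replace negatives with zero) is repeated for every forecast in this notebook. One possible way to factor it out is sketched below. The helper name `to_nonnegative_int` is hypothetical, and the sample values are invented. Clipping before the cast gives the same result as the loop above, because the integer cast truncates toward zero.

```python
import pandas as pd

def to_nonnegative_int(forecast, columns=('yhat_lower', 'yhat', 'yhat_upper')):
    """Return a copy with the given columns floored at zero and cast to int."""
    result = forecast.copy()
    cols = list(columns)
    result[cols] = result[cols].clip(lower=0).astype('int')
    return result

# Hypothetical forecast values, for illustration only
demo = pd.DataFrame({
    'yhat_lower': [-5.2, 1.9],
    'yhat': [0.4, 7.6],
    'yhat_upper': [3.3, 12.0],
})
clean = to_nonnegative_int(demo)
print(clean)
```

The helper returns a copy, so the original forecast dataframe keeps its raw floating-point values.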

In [None]:
# Draw plot of the values with forecasting data
figure = model.plot(forecast, xlabel = 'Date', ylabel = 'Number of confirmed cases')

The simple model fits the data very poorly!

In [None]:
# Draw plot with the components (trend and weekly seasonality) of the forecasts
figure_component = model.plot_components(forecast)

In [None]:
# Output the prediction for the next 7 days
forecast[['ds', 'yhat_lower', 'yhat', 'yhat_upper']].tail(7)

### For the biggest wave - df2

In [None]:
# Make a forecast for 7 days ahead
future2 = model2.make_future_dataframe(periods = 7)
forecast2 = model2.predict(future2)

# Make values integer, and replace negative values with zero
forecast2[feature_all] = forecast2[feature_all].astype('int')
for feature in feature_all:
    forecast2.loc[forecast2[feature] < 0, feature] = 0

In [None]:
# Draw plot of the values with forecasting data
figure2 = model2.plot(forecast2, xlabel = 'Date', ylabel = 'Number of confirmed cases')

The more complex model fits the data much better.

In [None]:
# Draw plot with the components (trend, 'weekly' and 'triply' seasonality) of the forecasts
figure_component2 = model2.plot_components(forecast2)

In [None]:
# Output the prediction for the next 7 days
forecast2[['ds', 'yhat_lower', 'yhat', 'yhat_upper']].tail(7)

In [None]:
def merge(list1, list2):
    # Copy every column of list2 into list1 by row position (row i to row i),
    # ignoring the indexes; both dataframes must have the same number of rows
    for d in list2:
        list1[d] = list2[d].values
    return list1

In [None]:
def eval_error(forecast_df, title):
    # Evaluate forecasts on the validation period (from first_forecasted_date onwards)
    # and return the relative error in percent. The "title" argument is not used.
    # Relies on the global variables first_forecasted_date and y_real_sum.
    forecast_df.loc[forecast_df['yhat'] < 0, 'yhat'] = 0
    result_val_df = forecast_df[pd.to_datetime(forecast_df['ds']) >= pd.to_datetime(first_forecasted_date)].copy()
    result_val_df['rel_diff'] = (result_val_df['y'] - result_val_df['yhat'].round()).abs()
    return result_val_df['rel_diff'].sum() * 100 / y_real_sum
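
The value returned by `eval_error` is the total absolute error over the validation period divided by the total of the actual values, expressed in percent: `100 * sum(|y - round(yhat)|) / sum(y)`. A self-contained sketch of the same formula, with explicit arguments instead of global variables, could look as follows. The function name `relative_error_pct` and the numbers are hypothetical.

```python
import numpy as np

def relative_error_pct(y_true, y_pred):
    """Total absolute error as a percentage of the total actual value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.round(np.asarray(y_pred, dtype=float))
    return float(np.abs(y_true - y_pred).sum() * 100 / y_true.sum())

# Hypothetical values: the rounded forecasts are 12 and 18, so the errors are 2 and 2
error = relative_error_pct([10, 20], [12.4, 17.6])
print(round(error, 2))  # 13.33
```

Because the denominator is the sum of the actual values, days with many cases weigh more heavily than quiet days.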

In [None]:
days_to_forecast = 14 # number of days to forecast into the future (after the training data)
days_to_forecast_for_evalution = 14 # number of the latest days of data used to evaluate the forecast
first_forecasted_date = sorted(list(set(data['ds'].values)))[-days_to_forecast_for_evalution]
end_forecasted_date = (datetime.strptime(data['ds'].max(), "%Y-%m-%d")+timedelta(days = days_to_forecast)).strftime("%Y-%m-%d")
first_data_date = data['ds'].min()

print('The first date of data for modeling is: ' + first_data_date)
print('The first date to perform forecasts for evaluation is: ' + first_forecasted_date)
print('The last date of the forecast into the future is: ' + end_forecasted_date)

In [None]:
# Remove last 2 weeks in order to check accuracy
data_quality = data[:-14]
model_quality = Prophet()
model_quality.fit(data_quality)
data_quality

In [None]:
# Make a forecast for 14 days ahead (till today for quality check)
future_quality = model_quality.make_future_dataframe(periods = 14)
forecast_quality = model_quality.predict(future_quality)

# Make values integer, and replace negative values with zero
feature_all_q = ['yhat_lower', 'yhat', 'yhat_upper']
forecast_quality[feature_all_q] = forecast_quality[feature_all_q].astype('int')
for feature in feature_all_q:
    forecast_quality.loc[forecast_quality[feature] < 0, feature] = 0
    
forecast_quality[['ds', 'yhat_lower', 'yhat', 'yhat_upper']].tail(14)

In [None]:
# Check accuracy
cmp_df = merge(forecast_quality.set_index('ds')[['yhat', 'yhat_lower', 'yhat_upper']], data)
cmp_df['y'] = cmp_df['y'].astype('int')
y_real_sum = data.tail(14)['y'].sum()
relative_error = eval_error(cmp_df, 'Simple method')
cmp_df.tail(14)
print('Relative error of first model', relative_error)
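
The `merge` helper pairs forecasts and actual values by row position, which assumes that both dataframes cover exactly the same dates in the same order. Rows with zero values were removed earlier, so gaps in the calendar are possible, while `make_future_dataframe` appends consecutive days. A more defensive alternative, sketched here under those assumptions, is to join on the date column. The helper `align_by_date` and the sample values are hypothetical.

```python
import pandas as pd

def align_by_date(forecast, actual):
    """Inner-join forecast and actual values on the date column `ds`."""
    left = forecast[['ds', 'yhat']].copy()
    right = actual[['ds', 'y']].copy()
    # Make both date columns the same dtype before joining
    left['ds'] = pd.to_datetime(left['ds'])
    right['ds'] = pd.to_datetime(right['ds'])
    return left.merge(right, on='ds', how='inner')

# Hypothetical values; the actual series has no row for 2021-01-03
forecast_demo = pd.DataFrame({
    'ds': pd.date_range('2021-01-01', periods=4, freq='D'),
    'yhat': [10, 12, 14, 16],
})
actual_demo = pd.DataFrame({
    'ds': ['2021-01-01', '2021-01-02', '2021-01-04'],
    'y': [11, 12, 15],
})
aligned = align_by_date(forecast_demo, actual_demo)
print(aligned)
```

With an inner join, a date that is missing from either side is simply left out of the comparison instead of shifting all later rows.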

In [None]:
# Remove last 2 weeks in order to check accuracy
df2_quality = df2[:-14]
model2_quality = Prophet(daily_seasonality=False, weekly_seasonality=False, yearly_seasonality=False, 
                changepoint_range=1, changepoint_prior_scale = 0.3)
model2_quality.add_seasonality(name='weekly', period=7, fourier_order=12, 
                      mode = 'multiplicative', prior_scale = 0.24)
model2_quality.add_seasonality(name='triply', period=3, fourier_order=2, 
                      mode = 'multiplicative', prior_scale = 0.15)
model2_quality.fit(df2_quality)
df2_quality

In [None]:
# Make a forecast for 14 days ahead
future2_quality = model2_quality.make_future_dataframe(periods = 14)
forecast2_quality = model2_quality.predict(future2_quality)

# Make values integer, and replace negative values with zero
forecast2_quality[feature_all_q] = forecast2_quality[feature_all_q].astype('int')
for feature in feature_all_q:
    forecast2_quality.loc[forecast2_quality[feature] < 0, feature] = 0
    
forecast2_quality[['ds', 'yhat_lower', 'yhat', 'yhat_upper']].tail(14)

In [None]:
# Check accuracy
cmp_df = merge(forecast2_quality.set_index('ds')[['yhat', 'yhat_lower', 'yhat_upper']], df2)
cmp_df['y'] = cmp_df['y'].astype('int')
y_real_sum = df2.tail(14)['y'].sum()
relative_error2 = eval_error(cmp_df, 'Model with custom seasonality')
cmp_df.tail(14)
print('Relative error of second model', relative_error2)

In [None]:
if (relative_error2 < relative_error):
    print("The second model has the lower relative error (", relative_error2, "), so it is more accurate")
else:
    print("The first model has the lower relative error (", relative_error, "), so it is more accurate")

I hope you find this notebook useful and enjoyable.

Your comments and feedback are most welcome.

[Go to Top](#0)