## Dataset [COVID-19 in Ukraine: daily data](https://www.kaggle.com/vbmokin/covid19-in-ukraine-daily-data)

Thanks to [AI-ML-DS Training. L1A : COVID in UA - Prophet](https://www.kaggle.com/vbmokin/ai-ml-ds-training-l1t-covid-in-ua-prophet?scriptVersionId=46900540)

My upgrade:

- added new plots;
- changed some parameters.

The notebook was prepared by team "SAIT VNTU SG" from the course "AI-ML-DS Training" (tutor - [@vbmokin](https://www.kaggle.com/vbmokin))

## 1. Import libraries<a class="anchor" id="1"></a>

[Back to Table of Contents](#0.1)

In [None]:
# Work with Data - the main Python libraries
import numpy as np
import pandas as pd

# For importing data from the API
import requests

# Visualization
import matplotlib.pyplot as plt

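# Modeling and Prediction
# The package was renamed from "fbprophet" to "prophet" in v1.0, so try the new name first
try:
    from prophet import Prophet
    from prophet.make_holidays import make_holidays_df
    from prophet.diagnostics import cross_validation, performance_metrics
    from prophet.plot import plot_cross_validation_metric
except ImportError:
    from fbprophet import Prophet
    from fbprophet.make_holidays import make_holidays_df
    from fbprophet.diagnostics import cross_validation, performance_metrics
    from fbprophet.plot import plot_cross_validation_metric
import holidays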

import warnings
warnings.simplefilter('ignore')

## 2. Download data<a class="anchor" id="2"></a>

[Back to Table of Contents](#0.1)

#### Thanks to https://api-covid19.rnbo.gov.ua/

In [None]:
# Download data via API from the Portal of RNBO of Ukraine
for filename in ['main-data?mode=ukraine&fbclid=IwAR1vNXEE0nkmorUmGP4StG4cLrj1Z9VoX3c3Bi8dfltr0elgOj4b0M3ONvk']:
    print('Download daily data from the Portal of RNBO of Ukraine')
    url = f'https://api-covid19.rnbo.gov.ua/charts/{filename}'
    myfile = requests.get(url)
    # Write inside a "with" block so the file is closed before it is read back
    with open(filename, 'wb') as f:
        f.write(myfile.content)

df_data = pd.read_json(filename)
df_data

In [None]:
# Display the last 10 rows of the dataframe "df_data"
df_data.tail(10)

## 3. EDA & FE<a class="anchor" id="3"></a>

[Back to Table of Contents](#0.1)

In [None]:
# Calculate the daily number of confirmed cases from the cumulative totals
df_data['n_confirmed'] = df_data['confirmed'].diff()

In [None]:
# Filtering the missing data
data = df_data[['dates','n_confirmed']].dropna().reset_index(drop=True)
data['n_confirmed'] = data['n_confirmed'].astype('int')
data.tail(5)
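
The two cells above rely on `diff()` to turn a cumulative series into daily increments. A minimal sketch of that step on made-up numbers (the values in `toy` are illustrative only, not taken from the RNBO data):

```python
import pandas as pd

# Hypothetical cumulative counts for five consecutive days (illustrative only)
toy = pd.DataFrame({
    'dates': ['2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04', '2020-04-05'],
    'confirmed': [100, 130, 130, 175, 240],
})

# diff() subtracts the previous row, so the first row has no predecessor and becomes NaN
toy['n_confirmed'] = toy['confirmed'].diff()

# Drop that NaN row; after that the column can be cast from float back to int
daily = toy[['dates', 'n_confirmed']].dropna().reset_index(drop=True)
daily['n_confirmed'] = daily['n_confirmed'].astype('int')
print(daily)
```

The first day is lost because it has no previous value, and a zero appears wherever the cumulative total did not change, which is why zero rows are filtered out later in this section.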

In [None]:
# Build the plot
data['n_confirmed'].plot()

In [None]:
# Calculate the daily number of deaths from the cumulative totals
df_data['n_deaths'] = df_data['deaths'].diff()

In [None]:
data = df_data[['dates','n_deaths']].dropna().reset_index(drop=True)
data['n_deaths'] = data['n_deaths'].astype('int')
data.tail(5)

In [None]:
data['n_deaths'].plot()

In [None]:
# Calculate the daily number of recovered patients from the cumulative totals
df_data['n_recovered'] = df_data['recovered'].diff()

In [None]:
data = df_data[['dates','n_recovered']].dropna().reset_index(drop=True)
data['n_recovered'] = data['n_recovered'].astype('int')
data.tail(5)

In [None]:
data['n_recovered'].plot()

In [None]:
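# Preparing data for modeling with Prophet.
# "data" was overwritten by the deaths/recovered cells above, so rebuild it from confirmed cases
data = df_data[['dates','n_confirmed']].dropna().reset_index(drop=True)
data['n_confirmed'] = data['n_confirmed'].astype('int')
data.columns = ['ds', 'y']
data.tail()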

In [None]:
# Removing zero values
data = data[data['y'] > 0].reset_index(drop=True)
data

### Selecting data with the biggest wave

In [None]:
# Build the plot for April-May 2020
df = data[(data['ds'] >= '2020-04-01') & (data['ds'] < '2020-06-01')]
df['y'].plot()

In [None]:
# The smallest value
df.loc[73,:]

In [None]:
# Select the data from the start of the biggest wave
df2 = data[73:].reset_index(drop=True)
df2.head()
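
The row label 73 used above was read off the plot by hand. As a sketch of an alternative, assuming the goal is simply the lowest daily value inside the plotted window, `idxmin()` can return that label directly. The data here is synthetic, and the names `toy`, `window`, `start` and `wave` are only for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic daily series with a dip in the middle (illustrative only)
dates = pd.date_range('2020-03-20', periods=100, freq='D')
y = np.abs(np.arange(100) - 40) + 5      # smallest value 5 at position 40
toy = pd.DataFrame({'ds': dates, 'y': y})

# Restrict to the same window as the plot, then take the label of the smallest value
window = toy[(toy['ds'] >= '2020-04-01') & (toy['ds'] < '2020-06-01')]
start = window['y'].idxmin()

# Keep everything from that row onward, mirroring df2 = data[73:]
wave = toy.loc[start:].reset_index(drop=True)
print(start, wave.loc[0, 'ds'].date())
```

This works here because the index was reset earlier, so row labels and row positions coincide.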

## 4. Modeling<a class="anchor" id="4"></a>

[Back to Table of Contents](#0.1)

### For all data

In [None]:
# Build Prophet model
model = Prophet()

# Training model for all data
model.fit(data)

### For the biggest wave - df2

In [None]:
# Build Prophet model with parameters and structure
# from the notebook https://www.kaggle.com/vbmokin/covid-19-in-ukraine-eda-forecasting 
# but without holidays
model2 = Prophet(daily_seasonality=True, weekly_seasonality=False, yearly_seasonality=False, 
                changepoint_range=1, changepoint_prior_scale = 0.2)
model2.add_seasonality(name='weekly', period=7, fourier_order=12, 
                      mode = 'multiplicative', prior_scale = 0.24)
model2.add_seasonality(name='triply', period=3, fourier_order=2, 
                      mode = 'multiplicative', prior_scale = 0.15)

# Training model for df2
model2.fit(df2)
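
Prophet models each seasonality as a truncated Fourier series, so `fourier_order` sets how many sine/cosine pairs are available for the given period. The sketch below rebuilds such features with NumPy for `period=7` on daily time steps. It illustrates the idea only and is not Prophet's internal code; the helper name `fourier_features` is hypothetical.

```python
import numpy as np

def fourier_features(t, period, order):
    """Return columns sin(2*pi*k*t/period), cos(2*pi*k*t/period) for k = 1..order."""
    cols = []
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)

t = np.arange(21)                      # three weeks of daily time steps
X = fourier_features(t, period=7, order=12)

# 12 harmonics give 24 columns, but on daily samples a 7-day pattern has only
# 7 degrees of freedom, so many columns are linear combinations of the others
rank = np.linalg.matrix_rank(X, tol=1e-8)
print(X.shape, rank)
```

On daily data this suggests that the higher harmonics of a 7-day period largely duplicate the lower ones, which may be worth keeping in mind when tuning `fourier_order`.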

## 5. Prediction & Visualization<a class="anchor" id="5"></a>

[Back to Table of Contents](#0.1)

### For all data

In [None]:
# Make a forecast for 7 days ahead
future = model.make_future_dataframe(periods = 7)
forecast = model.predict(future)

In [None]:
# Plot the observed values together with the forecast
figure = model.plot(forecast, xlabel = 'Date', ylabel = 'Number of confirmed cases')

In [None]:
# Draw plot with the components (trend and weekly seasonality) of the forecasts
figure_component = model.plot_components(forecast)

In [None]:
# Output the prediction for the next 7 days
forecast[['yhat_lower', 'yhat', 'yhat_upper']] = forecast[['yhat_lower', 'yhat', 'yhat_upper']].astype('int')
forecast[['ds', 'yhat_lower', 'yhat', 'yhat_upper']].tail(7)
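
One detail of the cell above: `astype('int')` truncates toward zero rather than rounding, and Prophet's lower bound is not constrained to be non-negative, so it can dip below zero for count data. A minimal sketch of an alternative post-processing step on made-up values (the frame `toy_forecast` is hypothetical):

```python
import pandas as pd

# Made-up forecast values (illustrative only)
toy_forecast = pd.DataFrame({
    'yhat_lower': [-3.7, 10.2],
    'yhat':       [4.6, 20.4],
    'yhat_upper': [12.9, 30.8],
})
cols = ['yhat_lower', 'yhat', 'yhat_upper']

# Truncation, as in the notebook: -3.7 becomes -3 and 4.6 becomes 4
truncated = toy_forecast[cols].astype('int')

# Alternative: round to the nearest integer, then clip negatives to zero
cleaned = toy_forecast[cols].round().clip(lower=0).astype('int')
print(truncated)
print(cleaned)
```

Whether clipping is appropriate depends on how the interval will be reported; it is shown here only as one option.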

### For the biggest wave - df2

In [None]:
# Make a forecast for 7 days ahead
future = model2.make_future_dataframe(periods = 7)
forecast = model2.predict(future)

In [None]:
# Plot the observed values together with the forecast
figure = model2.plot(forecast, xlabel = 'Date', ylabel = 'Number of confirmed cases')

In [None]:
# Draw plot with the components (trend and the seasonalities defined for model2) of the forecasts
figure_component = model2.plot_components(forecast)

In [None]:
# Output the prediction for the next 7 days
forecast[['yhat_lower', 'yhat', 'yhat_upper']] = forecast[['yhat_lower', 'yhat', 'yhat_upper']].astype('int')
forecast[['ds', 'yhat_lower', 'yhat', 'yhat_upper']].tail(7)

I hope you find this notebook useful and enjoyable.

Your comments and feedback are most welcome.

[Go to Top](#0)