<a class="anchor" id="0"></a>
# [AI-ML-DS : Training for beginners](https://www.kaggle.com/vbmokin/ai-ml-ds-training-for-beginners-in-kaggle). Level 1 (very simple). 2020
## Kaggle GM, Prof. [@vbmokin](https://www.kaggle.com/vbmokin)
### [Vinnytsia National Technical University](https://vntu.edu.ua/), Ukraine
#### [Chair of the System Analysis and Information Technologies](http://mmss.vntu.edu.ua/index.php/ua/)

# The concept of training:
* the **latest version (commit)** of the notebook contains:
    * the basic tasks (after "TASK:")
    * the additional tasks for self-study (after "ADDITIONAL TASK:")
* the **previous version (commit)** of the notebook contains the **answers** to the basic tasks

### It is recommended to start this course with another notebook, "[AI-ML-DS Training. L1T: Titanic - Decision Tree](https://www.kaggle.com/vbmokin/ai-ml-ds-training-l1t-titanic-decision-tree)", and then move on to this one.

## Dataset [COVID-19 in Ukraine: daily data](https://www.kaggle.com/vbmokin/covid19-in-ukraine-daily-data)

## Acknowledgements

### Datasets:
- official data of Ukraine (https://covid19.rnbo.gov.ua/) - via API
- my dataset [COVID-19 in Ukraine: daily data](https://www.kaggle.com/vbmokin/covid19-in-ukraine-daily-data) - for the next commits
- my dataset with holidays data [COVID-19: Holidays of countries](https://www.kaggle.com/vbmokin/covid19-holidays-of-countries) - for the next commits

### Notebooks:
* [Data Science for tabular data: Advanced Techniques](https://www.kaggle.com/vbmokin/data-science-for-tabular-data-advanced-techniques)
* [EDA for tabular data: Advanced Techniques](https://www.kaggle.com/vbmokin/eda-for-tabular-data-advanced-techniques)
* [COVID-19 in Ukraine: EDA & Forecasting](https://www.kaggle.com/vbmokin/covid-19-in-ukraine-eda-forecasting)
* [COVID-19 in 67 countries: daily Prophet forecast](https://www.kaggle.com/vbmokin/covid-19-in-67-countries-daily-prophet-forecast)

### Libraries from GitHub:
- https://facebook.github.io/prophet/
- https://facebook.github.io/prophet/docs/
- https://github.com/facebook/prophet
- https://github.com/dr-prodigy/python-holidays

<a class="anchor" id="0.1"></a>
## Table of Contents

1. [Import libraries](#1)
1. [Download data](#2)
1. [EDA & FE](#3)
1. [Modeling](#4)
1. [Prediction & Visualization](#5)

## 1. Import libraries<a class="anchor" id="1"></a>

[Back to Table of Contents](#0.1)

In [None]:
# Work with Data - the main Python libraries
import numpy as np
import pandas as pd

# For importing data from the API
import requests

# Visualization
import matplotlib.pyplot as plt

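# Modeling and Prediction
# (since version 1.0 the package is published as `prophet`; this notebook uses the older `fbprophet` name)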
from fbprophet import Prophet
from fbprophet.make_holidays import make_holidays_df
from fbprophet.diagnostics import cross_validation, performance_metrics
from fbprophet.plot import plot_cross_validation_metric
import holidays

import warnings
warnings.simplefilter('ignore')

## 2. Download data<a class="anchor" id="2"></a>

[Back to Table of Contents](#0.1)

#### Thanks to https://api-covid19.rnbo.gov.ua/

In [None]:
# Download data via API from the Portal of RNBO of Ukraine
for filename in ['main-data?mode=ukraine&fbclid=IwAR1vNXEE0nkmorUmGP4StG4cLrj1Z9VoX3c3Bi8dfltr0elgOj4b0M3ONvk']:
    print('Download daily data from the Portal of RNBO of Ukraine')
    url = f'https://api-covid19.rnbo.gov.ua/charts/{filename}'
    myfile = requests.get(url)
    myfile.raise_for_status()  # stop early if the request failed
    # Save the response; the context manager closes the file
    with open(filename, 'wb') as f:
        f.write(myfile.content)

# Read the saved JSON file into a dataframe
df_data = pd.read_json(filename)
df_data
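The cell above saves the API response to a file and reads it back. As a minimal sketch, the same JSON can also be parsed directly from memory. This assumes the payload is a JSON object that maps column names to equal-length lists (the notebook later uses the columns `dates` and `confirmed`). The sample payload below is invented for illustration, and `frame_from_json` is a hypothetical helper, not part of the original notebook.

```python
import io

import pandas as pd


def frame_from_json(raw_json: str) -> pd.DataFrame:
    """Parse a JSON string of the form {column: [values, ...]} into a DataFrame."""
    return pd.read_json(io.StringIO(raw_json))


# Invented sample payload; the real API response may contain more columns.
sample = '{"dates": ["2020-03-03", "2020-03-04", "2020-03-05"], "confirmed": [1, 1, 3]}'
df_sample = frame_from_json(sample)
print(df_sample)
```

With a live connection, the text of the response could be passed in the same way, for example `frame_from_json(requests.get(url, timeout=30).text)`.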

**TASK:** Display the last 10 rows of the dataframe "df_data".

In [None]:
# Display the last 10 rows of the dataframe "df_data"


## 3. EDA & FE<a class="anchor" id="3"></a>

[Back to Table of Contents](#0.1)

In [None]:
# Calculate the daily number of confirmed cases
df_data['n_confirmed'] = df_data['confirmed'].diff()
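To see what `diff()` does here, consider a small sketch on an invented cumulative series (the numbers are not real data). Each value becomes the difference from the previous one, so the first value has no predecessor and turns into `NaN`. This is why the next cell drops missing data.

```python
import pandas as pd

# Invented cumulative totals of confirmed cases
cumulative = pd.Series([1, 1, 3, 7, 7, 12], name="confirmed")

# Daily increments: value minus the previous value
daily = cumulative.diff()
print(daily.tolist())

# Sanity check: the running sum of the increments restores the totals
restored = daily.fillna(cumulative.iloc[0]).cumsum()
print(restored.tolist())
```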

In [None]:
# Filter out the missing data
data = df_data[['dates','n_confirmed']].dropna().reset_index(drop=True)
data['n_confirmed'] = data['n_confirmed'].astype('int')
data.tail(5)

In [None]:
# Build the plot
data['n_confirmed'].plot()

**ADDITIONAL TASK:** Try to make a prediction for another value: not the number of new confirmed cases, but the number of deaths or another column of the dataframe df_data.
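One possible starting point for this task is to wrap the preparation steps in a small function that accepts any cumulative column. The sketch below is an assumption-laden illustration: `to_prophet_frame` is a hypothetical helper, the toy dataframe is invented, and the column name `deaths` is assumed (check `df_data.columns` for the real names). The function also renames the columns to `ds` and `y`, which Prophet expects.

```python
import pandas as pd


def to_prophet_frame(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Turn a cumulative column into a daily (ds, y) frame (hypothetical helper)."""
    out = pd.DataFrame({"ds": df["dates"], "y": df[column].diff()})
    out = out.dropna().reset_index(drop=True)  # the first diff is always NaN
    out["y"] = out["y"].astype(int)
    return out


# Invented toy data; "deaths" is an assumed column name
toy = pd.DataFrame({
    "dates": pd.date_range("2020-03-03", periods=5, freq="D"),
    "confirmed": [1, 1, 3, 7, 12],
    "deaths": [0, 0, 1, 1, 3],
})
deaths = to_prophet_frame(toy, "deaths")
print(deaths)
```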

In [None]:
# Preparing data for modeling with Prophet
data.columns = ['ds', 'y']
data.tail()

In [None]:
# Removing zero values
data = data[data['y'] > 0].reset_index(drop=True)
data

### Selecting data for the biggest wave

In [None]:
# Build the plot for July
df = data[(data['ds'] >= '2020-07-01') & (data['ds'] < '2020-08-01')]
df['y'].plot()

In [None]:
# The smallest value in July (row 113)
df.loc[113,:]

In [None]:
# Selecting data for the biggest wave
df2 = data[113:].reset_index(drop=True)
df2.head()

**ADDITIONAL TASK:** Try another number (instead of 113) as the reference point, for example a point before the second wave in summer.
See the plot of all the data here: https://covid19.rnbo.gov.ua/
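Instead of reading the reference point off the plot, it can be located programmatically. The following is a minimal sketch on invented numbers: it finds the index of the smallest value inside a date window with `idxmin()` and keeps everything from that row onward. It assumes the dataframe has the default integer index, as it does after `reset_index(drop=True)`.

```python
import pandas as pd

# Invented toy data in the (ds, y) layout used by the notebook
toy = pd.DataFrame({
    "ds": pd.date_range("2020-06-28", periods=8, freq="D"),
    "y": [900, 950, 870, 660, 540, 810, 880, 920],
})

# Restrict the search to July, then find the row with the minimum
window = toy[(toy["ds"] >= "2020-07-01") & (toy["ds"] < "2020-08-01")]
start = window["y"].idxmin()

# Keep the data from the reference point onward
wave = toy.loc[start:].reset_index(drop=True)
print(start)
print(wave)
```

Changing the window bounds is then enough to experiment with other reference points.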

## 4. Modeling<a class="anchor" id="4"></a>

[Back to Table of Contents](#0.1)

### For all data

In [None]:
# Build Prophet model
model = Prophet()

# Training model for all data
model.fit(data)

### For the biggest wave - df2

In [None]:
# Build Prophet model with parameters and structure 
# from the notebook https://www.kaggle.com/vbmokin/covid-19-in-ukraine-eda-forecasting 
# but without holidays
model2 = Prophet(daily_seasonality=False, weekly_seasonality=False, yearly_seasonality=False, 
                changepoint_range=1, changepoint_prior_scale = 0.3)
model2.add_seasonality(name='weekly', period=7, fourier_order=12, 
                      mode = 'multiplicative', prior_scale = 0.24)
model2.add_seasonality(name='triply', period=3, fourier_order=2, 
                      mode = 'multiplicative', prior_scale = 0.15)

# Training model for df2
model2.fit(df2)

**ADDITIONAL TASK:** Try changing the parameters of the Prophet model and evaluate how this affects the shape of the curve and the accuracy of the forecast.
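To compare parameter settings, it helps to have a number that measures accuracy. One common approach, sketched below, is to hold out the last 7 days, predict them, and compute the mean absolute percentage error (MAPE). This sketch does not use Prophet: the prediction is a simple seasonal-naive baseline (repeat the last observed week), the data are invented, and `mape` is a hypothetical helper. Dividing by the actual values is safe here only because zero values are assumed to have been removed, as in the notebook.

```python
import numpy as np
import pandas as pd


def mape(actual, predicted) -> float:
    """Mean absolute percentage error, in percent (hypothetical helper)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)


# Invented toy data in the (ds, y) layout used by the notebook
toy = pd.DataFrame({
    "ds": pd.date_range("2020-07-01", periods=21, freq="D"),
    "y": [100 + 10 * i for i in range(21)],
})

# Hold out the last 7 days for evaluation
horizon = 7
train, test = toy.iloc[:-horizon], toy.iloc[-horizon:]

# Seasonal-naive baseline: repeat the last observed week of the training data
baseline = train["y"].iloc[-horizon:].to_numpy()
score = mape(test["y"], baseline)
print(f"Baseline MAPE: {score:.2f}%")
```

A Prophet model fitted on `train` could be scored in the same way by passing its `yhat` values for the held-out dates to `mape`. A model that does not beat the baseline is probably not worth the extra parameters.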

## 5. Prediction & Visualization<a class="anchor" id="5"></a>

[Back to Table of Contents](#0.1)

### For all data

In [None]:
# Make a forecast for 7 days ahead
future = model.make_future_dataframe(periods = 7)
forecast = model.predict(future)

In [None]:
# Draw plot of the values with forecasting data
figure = model.plot(forecast, xlabel = 'Date', ylabel = 'Number of confirmed cases')

In [None]:
# Draw plot with the components (trend and weekly seasonality) of the forecasts
figure_component = model.plot_components(forecast)

In [None]:
# Output the prediction for the next 7 days
forecast[['yhat_lower', 'yhat', 'yhat_upper']] = forecast[['yhat_lower', 'yhat', 'yhat_upper']].astype('int')
forecast[['ds', 'yhat_lower', 'yhat', 'yhat_upper']].tail(7)
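Note that `astype('int')` truncates toward zero rather than rounding, and that the lower bound of a forecast can be negative even though case counts cannot. The sketch below shows the difference on invented forecast values. Rounding and clipping at zero is one possible alternative, not the method used in the notebook.

```python
import pandas as pd

# Invented forecast values in Prophet's column layout
toy_forecast = pd.DataFrame({
    "yhat_lower": [-3.6, 10.2],
    "yhat": [4.7, 15.9],
    "yhat_upper": [12.6, 21.4],
})
cols = ["yhat_lower", "yhat", "yhat_upper"]

# Truncation, as in the cell above
truncated = toy_forecast[cols].astype(int)

# Alternative: round to the nearest integer and forbid negative counts
rounded = toy_forecast[cols].round().clip(lower=0).astype(int)

print(truncated)
print(rounded)
```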

### For the biggest wave - df2

**TASK:** Make similar calculations for the dataframe df2 using model2:
1. Make a forecast for it for 7 days ahead.
2. Draw a plot of the values with forecasting data.
3. Draw a plot with the components (trend and weekly seasonality) of the forecasts.
4. Output the prediction for the next 7 days.

In [None]:
# Make a forecast for 7 days ahead


In [None]:
# Draw plot of the values with forecasting data


In [None]:
# Draw plot with the components (trend and weekly seasonality) of the forecasts


In [None]:
# Output the prediction for the next 7 days


I hope you find this notebook useful and enjoyable.

Your comments and feedback are most welcome.

[Go to Top](#0)