<a href="https://colab.research.google.com/github/michalastocki/machine-learning-bootcamp/blob/master/unsupervised/05_case_studies/03_coronavirus.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

* @author: krakowiakpawel9@gmail.com  
* @site: e-smartdata.org

### prophet
Library homepage: [https://facebook.github.io/prophet/](https://facebook.github.io/prophet/)  

Documentation/User Guide: [https://facebook.github.io/prophet/docs/quick_start.html](https://facebook.github.io/prophet/docs/quick_start.html)

A library from Facebook for working with time series.

To install the prophet library, run the command below:
```
!pip install fbprophet
```
To upgrade to the latest version, run the command below:
```
!pip install --upgrade fbprophet
```
This course is based on version `0.5`.
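
From release 1.0 onward the package is published as `prophet` rather than `fbprophet`, so the import path depends on the installed version. A minimal sketch, assuming the notebook should run on either version; the helper name `load_prophet` is hypothetical and not part of the library:

```python
import importlib


def load_prophet():
    """Return the Prophet class from whichever package is installed, or None."""
    # Try the newer package name first, then the 0.x name used in this course.
    for module_name in ("prophet", "fbprophet"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # not installed under this name, try the next one
        return module.Prophet
    return None


Prophet = load_prophet()
print("Prophet available:", Prophet is not None)
```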

### Table of contents:
1. [Library imports](#0)
2. [Loading the data](#1)
3. [Exploring and preparing the data](#2)
4. [Building the model](#3)




### <a name='0'></a> Library imports

In [0]:
import numpy as np
import pandas as pd
import io
import plotly.express as px
import plotly.graph_objects as go

np.random.seed(42)

In [9]:
from google.colab import files
uploaded = files.upload()

Saving covid_19_data.csv to covid_19_data (1).csv


### <a name='1'></a> Loading the data

In [11]:
# data at `url` covers 22.01.2020 to 17.02.2020; the uploaded file used below extends to 10.03.2020
url = 'https://storage.googleapis.com/esmartdata-courses-files/ml-course/coronavirus.csv'
#data = pd.read_csv(url, parse_dates=['Date', 'Last Update'])
data = pd.read_csv(io.StringIO(uploaded['covid_19_data.csv'].decode('utf-8')), parse_dates=['ObservationDate', 'Last Update'])


data.head()

| | SNo | ObservationDate | Province/State | Country/Region | Last Update | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2020-01-22 | Anhui | Mainland China | 2020-01-22 17:00:00 | 1.0 | 0.0 | 0.0 |
| 1 | 2 | 2020-01-22 | Beijing | Mainland China | 2020-01-22 17:00:00 | 14.0 | 0.0 | 0.0 |
| 2 | 3 | 2020-01-22 | Chongqing | Mainland China | 2020-01-22 17:00:00 | 6.0 | 0.0 | 0.0 |
| 3 | 4 | 2020-01-22 | Fujian | Mainland China | 2020-01-22 17:00:00 | 1.0 | 0.0 | 0.0 |
| 4 | 5 | 2020-01-22 | Gansu | Mainland China | 2020-01-22 17:00:00 | 0.0 | 0.0 | 0.0 |
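
The loading cell decodes the uploaded bytes and wraps them in `io.StringIO` so that `pd.read_csv` can parse them, with `parse_dates` turning the two date columns into datetimes. A minimal sketch of the same pattern on two rows; the `raw_bytes` name, the second row, and the date format in the text are assumptions made for illustration, not taken from the real file:

```python
import io

import pandas as pd

# Stand-in for uploaded['covid_19_data.csv']: files.upload() returns {filename: bytes}.
# The rows and the date format are illustrative assumptions.
raw_bytes = (
    b"SNo,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered\n"
    b"1,01/22/2020,Anhui,Mainland China,1/22/2020 17:00,1.0,0.0,0.0\n"
    b"2,01/22/2020,,Japan,1/22/2020 17:00,2.0,0.0,0.0\n"
)

# Decode the bytes to text, then let read_csv parse the date columns.
sample = pd.read_csv(
    io.StringIO(raw_bytes.decode("utf-8")),
    parse_dates=["ObservationDate", "Last Update"],
)
print(sample.dtypes)
```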


### <a name='2'></a> Exploring and preparing the data

In [12]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4719 entries, 0 to 4718
Data columns (total 8 columns):
SNo                4719 non-null int64
ObservationDate    4719 non-null datetime64[ns]
Province/State     3011 non-null object
Country/Region     4719 non-null object
Last Update        4719 non-null datetime64[ns]
Confirmed          4719 non-null float64
Deaths             4719 non-null float64
Recovered          4719 non-null float64
dtypes: datetime64[ns](2), float64(3), int64(1), object(2)
memory usage: 295.1+ KB


In [13]:
data.isnull().sum()

SNo                   0
ObservationDate       0
Province/State     1708
Country/Region        0
Last Update           0
Confirmed             0
Deaths                0
Recovered             0
dtype: int64

In [15]:
# missing Province/State -> fill in with Country/Region
data['Province/State'] = np.where(data['Province/State'].isnull(), data['Country/Region'], data['Province/State'])
data.isnull().sum()

SNo                0
ObservationDate    0
Province/State     0
Country/Region     0
Last Update        0
Confirmed          0
Deaths             0
Recovered          0
dtype: int64
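
The fill step replaces each missing `Province/State` with the `Country/Region` of the same row. A minimal sketch on made-up rows, using `Series.fillna` with another Series, which gives the same result as the `np.where` call for this case:

```python
import numpy as np
import pandas as pd

# Made-up rows: two of them have no province.
toy = pd.DataFrame({
    "Province/State": ["Hubei", np.nan, "Ontario", np.nan],
    "Country/Region": ["Mainland China", "Japan", "Canada", "Thailand"],
})

# Take the country name wherever the province is missing (values are aligned by index).
toy["Province/State"] = toy["Province/State"].fillna(toy["Country/Region"])
print(toy)
```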

In [17]:
data['Country/Region'].value_counts().nlargest(10)

Mainland China    1513
US                 951
Australia          217
Canada             141
Thailand            49
Japan               49
Taiwan              48
South Korea         48
Singapore           48
Hong Kong           48
Name: Country/Region, dtype: int64

In [20]:
data['Country/Region'] = np.where(data['Country/Region'] == 'Mainland China', 'China', data['Country/Region']) 
data['Country/Region'].value_counts().nlargest(10)

China        1513
US            951
Australia     217
Canada        141
Thailand       49
Japan          49
Taiwan         48
Macau          48
Hong Kong      48
Singapore      48
Name: Country/Region, dtype: int64

In [24]:
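# Count is the number of rows (records) per country in the dataset, not the number of confirmed cases
tmp = data['Country/Region'].value_counts().nlargest(15).reset_index()
tmp.columns = ['Country/Region', 'Count']
tmp = tmp.sort_values(by=['Count', 'Country/Region'], ascending=[False, True])
# ISO 3166-1 alpha-3 codes; the list must follow the sorted row order above (Macau is left without a code)
tmp['iso_alpha'] = ['CHN', 'USA', 'AUS', 'CAN', 'JPN', 'THA', 'HKG', np.nan, 'SGP', 'KOR', 'TWN', 'FRA', 'MYS', 'VNM', 'NPL']
tmp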

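| | Country/Region | Count | iso_alpha |
|---|---|---|---|
| 0 | China | 1513 | CHN |
| 1 | US | 951 | USA |
| 2 | Australia | 217 | AUS |
| 3 | Canada | 141 | CAN |
| 5 | Japan | 49 | JPN |
| 4 | Thailand | 49 | THA |
| 8 | Hong Kong | 48 | HKG |
| 7 | Macau | 48 | |
| 9 | Singapore | 48 | SGP |
| 10 | South Korea | 48 | KOR |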
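
Assigning the ISO codes as a positional list only works while the list order matches the row order exactly. A minimal sketch of an alternative that looks codes up by country name; the `ISO_ALPHA3` dictionary is a hypothetical, partial lookup table written for this example, and countries missing from it simply get `NaN`:

```python
import pandas as pd

# Hypothetical, partial lookup table (ISO 3166-1 alpha-3 codes).
ISO_ALPHA3 = {
    "China": "CHN",
    "US": "USA",
    "Australia": "AUS",
    "Canada": "CAN",
    "Japan": "JPN",
    "Thailand": "THA",
}

counts = pd.DataFrame({
    "Country/Region": ["China", "US", "Australia", "Canada", "Thailand", "Japan", "Macau"],
    "Count": [1513, 951, 217, 141, 49, 49, 48],
})

# map() matches on the country name, so the row order no longer matters.
counts["iso_alpha"] = counts["Country/Region"].map(ISO_ALPHA3)
print(counts)
```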


In [26]:
px.scatter_geo(tmp, locations='iso_alpha', size='Count', size_max=40, template='plotly_dark', color='Count',
               text='Country/Region', projection='natural earth', color_continuous_scale='reds', width=950,
               title='Number of coronavirus cases worldwide - TOP15')

In [27]:
px.scatter_geo(tmp, locations='iso_alpha', size='Count', size_max=40, template='plotly_dark', color='Count',
               text='Country/Region', projection='natural earth', color_continuous_scale='reds', scope='asia', width=950,
               title='Number of coronavirus cases - Asia (from the global TOP15)')

In [28]:
px.bar(tmp, x='Country/Region', y='Count', template='plotly_dark', width=950, color_discrete_sequence=['#42f5c8'],
       title='Number of coronavirus cases by country')

In [36]:
# query() would parse Country/Region as a division, so filter with a boolean mask instead
px.bar(tmp[tmp['Country/Region'] != 'China'], x='Country/Region', y='Count', template='plotly_dark', width=950,
       color_discrete_sequence=['#42f5c8'], title='Number of coronavirus cases by country (excluding China)')
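
`DataFrame.query` parses `Country/Region` as the division `Country / Region`, so a column name containing `/` cannot be used bare in a query string. A minimal sketch on made-up rows that shows the failure and a boolean-mask filter that avoids it:

```python
import pandas as pd

counts = pd.DataFrame({
    "Country/Region": ["China", "US", "Australia"],
    "Count": [1513, 951, 217],
})

# The bare column name is read as two variables divided by each other and fails.
try:
    counts.query("Country/Region != 'China'")
    query_failed = False
except Exception as exc:
    query_failed = True
    print("query failed with:", type(exc).__name__)

# A boolean mask does not go through the expression parser at all.
without_china = counts[counts["Country/Region"] != "China"]
print(without_china)
```

Recent pandas versions also accept backtick-quoted column names inside `query`, but the mask works regardless of version.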

In [35]:
tmp = data.groupby(by=data['ObservationDate'].dt.date)[['Confirmed', 'Deaths', 'Recovered']].sum().reset_index()
tmp

| | ObservationDate | Confirmed | Deaths | Recovered |
|---|---|---|---|---|
| 0 | 2020-01-22 | 555.0 | 17.0 | 28.0 |
| 1 | 2020-01-23 | 653.0 | 18.0 | 30.0 |
| 2 | 2020-01-24 | 941.0 | 26.0 | 36.0 |
| 3 | 2020-01-25 | 1438.0 | 42.0 | 39.0 |
| 4 | 2020-01-26 | 2118.0 | 56.0 | 52.0 |
| 5 | 2020-01-27 | 2927.0 | 82.0 | 61.0 |
| 6 | 2020-01-28 | 5578.0 | 131.0 | 107.0 |
| 7 | 2020-01-29 | 6165.0 | 133.0 | 126.0 |
| 8 | 2020-01-30 | 8235.0 | 171.0 | 143.0 |
| 9 | 2020-01-31 | 9925.0 | 213.0 | 222.0 |
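
Grouping by calendar date and summing collapses the per-province rows into a single row per day. A minimal sketch of that aggregation on a toy frame; the values for 2020-01-23 are made up for illustration:

```python
import pandas as pd

toy = pd.DataFrame({
    "ObservationDate": pd.to_datetime(
        ["2020-01-22", "2020-01-22", "2020-01-23", "2020-01-23"]
    ),
    "Province/State": ["Anhui", "Beijing", "Anhui", "Beijing"],
    "Confirmed": [1.0, 14.0, 9.0, 22.0],
    "Deaths": [0.0, 0.0, 0.0, 0.0],
    "Recovered": [0.0, 0.0, 0.0, 0.0],
})

# Group on the date part only, sum the three numeric columns, and turn the
# date index back into an ordinary column.
daily = (
    toy.groupby(toy["ObservationDate"].dt.date)[["Confirmed", "Deaths", "Recovered"]]
    .sum()
    .reset_index()
)
print(daily)
```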


In [38]:
fig = go.Figure()

trace1 = go.Scatter(x=tmp['ObservationDate'], y=tmp['Confirmed'], mode='markers+lines', name='Confirmed')
trace2 = go.Scatter(x=tmp['ObservationDate'], y=tmp['Deaths'], mode='markers+lines', name='Deaths')
trace3 = go.Scatter(x=tmp['ObservationDate'], y=tmp['Recovered'], mode='markers+lines', name='Recovered')

fig.add_trace(trace1)
fig.add_trace(trace2)
fig.add_trace(trace3)

fig.update_layout(template='plotly_dark', width=950, title='Coronavirus (22.01-10.03.2020)')

In [39]:
data_confirmed = tmp[['ObservationDate', 'Confirmed']]
data_confirmed.columns = ['ds', 'y']
data_confirmed.head()

| | ds | y |
|---|---|---|
| 0 | 2020-01-22 | 555.0 |
| 1 | 2020-01-23 | 653.0 |
| 2 | 2020-01-24 | 941.0 |
| 3 | 2020-01-25 | 1438.0 |
| 4 | 2020-01-26 | 2118.0 |
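
Prophet expects its input frame to have a date column named `ds` and a value column named `y`, which is why the columns are renamed here. A minimal sketch of the same preparation using `rename` plus an explicit datetime conversion; the frame name `daily` is an assumption for this example, and the three rows are copied from the table above:

```python
import pandas as pd

daily = pd.DataFrame({
    "ObservationDate": ["2020-01-22", "2020-01-23", "2020-01-24"],
    "Confirmed": [555.0, 653.0, 941.0],
})

# rename() returns a new frame, so the original column names stay untouched.
prophet_df = daily.rename(columns={"ObservationDate": "ds", "Confirmed": "y"})
# Make the date column a real datetime column rather than strings or date objects.
prophet_df["ds"] = pd.to_datetime(prophet_df["ds"])
print(prophet_df.dtypes)
```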


In [41]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=data_confirmed['ds'], y=data_confirmed['y'], mode='markers+lines',
                         name='Confirmed', fill='tozeroy'))
fig.update_layout(template='plotly_dark', width=950, title='Number of confirmed cases (22.01-10.03)')

### <a name='3'></a> Building the model

In [42]:
from fbprophet import Prophet
from fbprophet.plot import plot_plotly

# fit the model (all seasonal components disabled, so only the trend is estimated)
model = Prophet(yearly_seasonality=False, weekly_seasonality=False, daily_seasonality=False)
model.fit(data_confirmed)

# forecast 7 days beyond the last observed date
future = model.make_future_dataframe(periods=7, freq='D')
forecast = model.predict(future)
plot_plotly(model, forecast)
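
With its default settings, `make_future_dataframe(periods=7, freq='D')` returns the historical dates followed by seven further daily dates, and `predict` then produces a forecast for every row. A pandas-only sketch of roughly what that frame looks like; it does not call Prophet, and the five-day history is a made-up stand-in for the real series:

```python
import pandas as pd

# Made-up stand-in for the dates the model was fitted on.
history = pd.DataFrame({"ds": pd.date_range("2020-03-06", periods=5, freq="D")})

periods = 7
last_date = history["ds"].max()

# Generate periods + 1 dates starting at the last observed date and drop the
# first one, leaving only dates strictly after the history.
extra = pd.date_range(start=last_date, periods=periods + 1, freq="D")[1:]

# History first, then the new dates, in a single 'ds' column.
future = pd.DataFrame({"ds": history["ds"].tolist() + extra.tolist()})
print(future.tail(periods))
```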