<a href="https://colab.research.google.com/github/niltontac/EspAnalise-EngDados/blob/master/Covid_19_Analysis_and_Predictions%20-%20In%20Progress.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#About these Datasets

#####The datasets explored in the analyses below are provided by Johns Hopkins University, a renowned institution in the United States that is at the forefront of collecting worldwide data on Covid-19. I also use data from the Kaggle platform, which brings together users from all over the world who contribute real data from reliable sources.

#####All datasets explored here are updated daily with the numbers of confirmed cases, deaths and recoveries from Covid-19. Note that these are time series data, so the numbers reported for a given day are cumulative.

---

#About this Analysis

#####This is an exploratory analysis intended to discover relationships, patterns, behaviors and trends, including through predictions made with Machine Learning algorithms. It summarizes the main characteristics of the data clearly and objectively, often with visual methods, so that the data can be understood.

#####The Python programming language is used to apply statistical techniques and prediction algorithms.

---

#####Source (Datasets):
#####Johns Hopkins University:
#####https://coronavirus.jhu.edu/

#####Kaggle:
#####https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset
#####https://www.kaggle.com/unanimad/corona-virus-brazil

#####All datasets on GitHub:

##### https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series
##### https://github.com/niltontac/EspAnalise-EngDados/tree/master/data/Novel_Corona_Virus_2019_Dataset
##### https://github.com/niltontac/EspAnalise-EngDados/tree/master/data/covid19_brazil_data
---

#####Analyst: Nilton Thiago de Andrade Coura
#####Recife/PE - Brazil
#####niltontac@gmail.com
#####https://github.com/niltontac

# Covid-19 - Exploratory Analysis and Predictions

![alt text](https://i.ibb.co/txCZFvr/3-D-medical-animation-coronavirus-structure.jpg)

Importing Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly as py
import plotly.express as px

from fbprophet.plot import plot_plotly
from fbprophet import Prophet
from fbprophet.plot import add_changepoints_to_plot


import warnings
warnings.filterwarnings('ignore')

# Loading dataset
# Last dataset update 04/28/2020

covid19confirmed = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')

covid19deaths = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')

covid19recovered = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')

covid19 = pd.read_csv('https://raw.githubusercontent.com/niltontac/EspAnalise-EngDados/master/data/Novel_Corona_Virus_2019_Dataset/covid_19_data.csv', parse_dates=['ObservationDate', 'Last Update'])

covid19Brazil = pd.read_csv('https://raw.githubusercontent.com/niltontac/EspAnalise-EngDados/master/data/covid19_brazil_data/brazil_covid19.csv')



Assigning last update:

In [0]:
last_date_update = '4/28/20'

Checking the last 5 rows of each dataset to confirm when it was last updated:

In [3]:
print('covid19confirmed:')
print(covid19confirmed.tail())
####
print('covid19deaths:')
print(covid19deaths.tail())
####
print('covid19recovered:')
print(covid19recovered.tail())
####
print('covid19:')
print(covid19.tail())
####
print('covid19Brazil:')
print(covid19Brazil.tail())

covid19confirmed:
                Province/State         Country/Region  ...  4/27/20  4/28/20
259  Saint Pierre and Miquelon                 France  ...        1        1
260                        NaN            South Sudan  ...        6       34
261                        NaN         Western Sahara  ...        6        6
262                        NaN  Sao Tome and Principe  ...        4        8
263                        NaN                  Yemen  ...        1        1

[5 rows x 102 columns]
covid19deaths:
                Province/State         Country/Region  ...  4/27/20  4/28/20
259  Saint Pierre and Miquelon                 France  ...        0        0
260                        NaN            South Sudan  ...        0        0
261                        NaN         Western Sahara  ...        0        0
262                        NaN  Sao Tome and Principe  ...        0        0
263                        NaN                  Yemen  ...        0        0

[5 rows x 102 colu

In [0]:
# Rename column 'ObservationDate' to 'Date'

covid19 = covid19.rename(columns={'ObservationDate' : 'Date'})

Dimensions of the datasets (rows, columns):

In [5]:
print('covid19confirmed:')
print(covid19confirmed.shape)
####
print('covid19deaths:')
print(covid19deaths.shape)
####
print('covid19recovered:')
print(covid19recovered.shape)
####
print('covid19:')
print(covid19.shape)
####
print('covid19Brazil:')
print(covid19Brazil.shape)

covid19confirmed:
(264, 102)
covid19deaths:
(264, 102)
covid19recovered:
(250, 102)
covid19:
(19607, 8)
covid19Brazil:
(2430, 5)


Checking for null or missing values:

In [6]:
print('covid19confirmed:')
print(pd.DataFrame(covid19confirmed.isnull().sum()))
####
print('covid19deaths:')
print(pd.DataFrame(covid19deaths.isnull().sum()))
####
print('covid19recovered:')
print(pd.DataFrame(covid19recovered.isnull().sum()))
####
print('covid19:')
print(pd.DataFrame(covid19.isnull().sum()))
####
print('covid19Brazil:')
print(pd.DataFrame(covid19Brazil.isnull().sum()))

covid19confirmed:
                  0
Province/State  182
Country/Region    0
Lat               0
Long              0
1/22/20           0
...             ...
4/24/20           0
4/25/20           0
4/26/20           0
4/27/20           0
4/28/20           0

[102 rows x 1 columns]
covid19deaths:
                  0
Province/State  182
Country/Region    0
Lat               0
Long              0
1/22/20           0
...             ...
4/24/20           0
4/25/20           0
4/26/20           0
4/27/20           0
4/28/20           0

[102 rows x 1 columns]
covid19recovered:
                  0
Province/State  183
Country/Region    0
Lat               0
Long              0
1/22/20           0
...             ...
4/24/20           0
4/25/20           0
4/26/20           0
4/27/20           0
4/28/20           0

[102 rows x 1 columns]
covid19:
                    0
SNo                 0
Date                0
Province/State  10001
Country/Region      0
Last Update         0
Confirmed       

Some datasets have missing (null) values in the "Province/State" column.
Let's replace them with the placeholder 'unknow':

In [0]:
# Replacing missing data

covid19confirmed = covid19confirmed.fillna('unknow')
covid19deaths = covid19deaths.fillna('unknow')
covid19recovered = covid19recovered.fillna('unknow')
covid19 = covid19.fillna('unknow')
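
Note that calling `fillna` on a whole DataFrame fills every column, numeric ones included. As a minimal sketch on a toy frame (the rows below are illustrative, not taken from the datasets), the replacement could instead be limited to the "Province/State" column by passing a dictionary:

```python
import numpy as np
import pandas as pd

# Toy frame (illustrative values only): one text column and one numeric column with gaps
toy = pd.DataFrame({
    'Province/State': ['Hubei', np.nan, np.nan],
    'Country/Region': ['China', 'Yemen', 'Brazil'],
    'Confirmed': [10.0, np.nan, 3.0],
})

# Fill only the text column; the numeric gap stays NaN instead of becoming a string
toy_filled = toy.fillna({'Province/State': 'unknow'})

print(toy_filled)
print(toy_filled.isnull().sum())
```

With this variant a missing count would remain visible in the null check rather than being turned into text, which keeps numeric columns numeric.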

In [8]:
# Checking for null or missing values again

print('covid19confirmed:')
print(pd.DataFrame(covid19confirmed.isnull().sum()))
####
print('covid19deaths:')
print(pd.DataFrame(covid19deaths.isnull().sum()))
####
print('covid19recovered:')
print(pd.DataFrame(covid19recovered.isnull().sum()))
####
print('covid19:')
print(pd.DataFrame(covid19.isnull().sum()))

covid19confirmed:
                0
Province/State  0
Country/Region  0
Lat             0
Long            0
1/22/20         0
...            ..
4/24/20         0
4/25/20         0
4/26/20         0
4/27/20         0
4/28/20         0

[102 rows x 1 columns]
covid19deaths:
                0
Province/State  0
Country/Region  0
Lat             0
Long            0
1/22/20         0
...            ..
4/24/20         0
4/25/20         0
4/26/20         0
4/27/20         0
4/28/20         0

[102 rows x 1 columns]
covid19recovered:
                0
Province/State  0
Country/Region  0
Lat             0
Long            0
1/22/20         0
...            ..
4/24/20         0
4/25/20         0
4/26/20         0
4/27/20         0
4/28/20         0

[102 rows x 1 columns]
covid19:
                0
SNo             0
Date            0
Province/State  0
Country/Region  0
Last Update     0
Confirmed       0
Deaths          0
Recovered       0


##Plotly Visualizations: Exploratory data analysis and predictions for the World and Brazil

###Worldwide:

Interactive Graph 01

Global records including confirmed, deaths and recovered cases:

In [9]:
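# Select several columns with a list (double brackets); a bare tuple is rejected by recent pandas versions
cases_growth = covid19.groupby('Date')[['Confirmed', 'Deaths', 'Recovered']].sum()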
cases_growth = cases_growth.reset_index()
cases_growth = cases_growth.sort_values('Date', ascending=False)

fig = go.Figure()
fig.update_layout(title_text='Global records including confirmed, deaths and recovered cases:', 
                  xaxis_title='Period Date', yaxis_title='Cases', template='plotly_dark')

fig.add_trace(go.Scatter(x=cases_growth['Date'], 
                        y=cases_growth['Confirmed'], 
                        mode='lines+markers',
                        name='Global Confirmed',
                        line=dict(color='Yellow', width=2)))

fig.add_trace(go.Scatter(x=cases_growth['Date'], 
                        y=cases_growth['Deaths'], 
                        mode='lines+markers',
                        name='Global Deaths',
                        line=dict(color='red', width=2)))

fig.add_trace(go.Scatter(x=cases_growth['Date'], 
                        y=cases_growth['Recovered'], 
                        mode='lines+markers',
                        name='Global Recovered',
                        line=dict(color='green', width=2)))

fig.show()

Interactive Graph 02

Global growth, death and recovery rates:

In [10]:
cases_rate = covid19.groupby(['Date']).agg({'Deaths': ['sum'],'Recovered': ['sum'],'Confirmed': ['sum']})
cases_rate.columns = ['Global_Deaths','Global_Recovered','Global_Confirmed']
cases_rate = cases_rate.reset_index()
cases_rate['Increase_new_cases_per_day']=cases_rate['Global_Confirmed'].diff().shift(-1)
# Calculating rates
# lambda function
cases_rate['Global_Deaths_rate_%'] = cases_rate.apply(lambda row: ((row.Global_Deaths)/(row.Global_Confirmed))*100 , axis=1).round(2)
cases_rate['Global_Recovered_rate_%'] = cases_rate.apply(lambda row: ((row.Global_Recovered)/(row.Global_Confirmed))*100 , axis=1).round(2)
cases_rate['Global_Growth_rate_%']=cases_rate.apply(lambda row: row.Increase_new_cases_per_day/row.Global_Confirmed*100, axis=1).round(2)
cases_rate['Global_Growth_rate_%']=cases_rate['Global_Growth_rate_%'].shift(+1)



fig = go.Figure()
fig.update_layout(title_text='Global rate for growth confirmed, death and recovered cases:', 
                  xaxis_title='Period Date', yaxis_title='Rate', template='plotly_dark')
fig.add_trace(go.Scatter(x=cases_rate['Date'], 
                         y=cases_rate['Global_Deaths_rate_%'],
                         mode='lines+markers',
                         name='Global Death rate %',
                         line=dict(color='red', width=2)))

fig.add_trace(go.Scatter(x=cases_rate['Date'], 
                         y=cases_rate['Global_Recovered_rate_%'],
                         mode='lines+markers',
                         name='Global Recovery rate %',
                         line=dict(color='Green', width=2)))

fig.add_trace(go.Scatter(x=cases_rate['Date'], 
                         y=cases_rate['Global_Growth_rate_%'],
                         mode='lines+markers',
                         name='Global Growth Confirmed rate %',
                         line=dict(color='Yellow', width=2)))

fig.show()

Overall numbers; note the day-by-day increase in new confirmed cases:

In [11]:
cases_rate.tail()

Unnamed: 0,Date,Global_Deaths,Global_Recovered,Global_Confirmed,Increase_new_cases_per_day,Global_Deaths_rate_%,Global_Recovered_rate_%,Global_Growth_rate_%
93,2020-04-24,197151.0,793420.0,2810715.0,86031.0,7.01,28.23,3.76
94,2020-04-25,202846.0,816685.0,2896746.0,74729.0,7.0,28.19,3.06
95,2020-04-26,206544.0,865733.0,2971475.0,70289.0,6.95,29.13,2.58
96,2020-04-27,211167.0,893967.0,3041764.0,74634.0,6.94,29.39,2.37
97,2020-04-28,217153.0,928658.0,3116398.0,,6.97,29.8,2.45
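
The growth-rate column above is the day-over-day percentage change in cumulative confirmed cases. As a cross-check, here is a minimal sketch that recomputes it with `diff` and `pct_change`, using the last three cumulative totals from the table above (the variable names are only for illustration):

```python
import pandas as pd

# Cumulative global confirmed cases for 2020-04-26 to 2020-04-28, from the table above
confirmed = pd.Series(
    [2971475, 3041764, 3116398],
    index=pd.to_datetime(['2020-04-26', '2020-04-27', '2020-04-28']),
)

# New cases per day: difference between consecutive cumulative totals
new_cases = confirmed.diff()

# Growth rate in percent: new cases divided by the previous day's total
growth_rate = (confirmed.pct_change() * 100).round(2)

print(new_cases)
print(growth_rate)
```

The first entry of both series is NaN because there is no previous day to compare against.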


Confirmed, Deaths, Recovered and Active cases in all affected countries around the world:

In [12]:
cases_temp = covid19confirmed[['Country/Region', last_date_update]]
cases_temp = cases_temp.groupby('Country/Region').sum().sort_values(by = last_date_update, ascending = False)
# Rename the latest-date column to 'Confirmed' once, rather than adding a second 'Confirmed' column
cases_temp = cases_temp.rename(columns = {last_date_update: 'Confirmed'})
# Column assignment aligns on the 'Country/Region' index, so the right-hand sides need no sorting
cases_temp['Recovered'] = covid19recovered.groupby('Country/Region')[last_date_update].sum()
cases_temp['Deaths'] = covid19deaths.groupby('Country/Region')[last_date_update].sum()
cases_temp['Active'] = cases_temp['Confirmed'] - cases_temp['Recovered'] - cases_temp['Deaths']
cases_temp['Mortality_Rate_%'] = (cases_temp['Deaths'] / cases_temp['Confirmed'] * 100).round(2)

cases_temp.head(30)

Country/Region,Confirmed,Recovered,Deaths,Active,Mortality_Rate_%
US,1012582,115936,58355,838291,5.76
Spain,232128,123903,23822,84403,10.26
Italy,201505,68941,27359,105205,13.58
France,169053,47775,23694,97584,14.02
United Kingdom,162350,813,21745,139792,13.39
Germany,159912,117400,6314,36198,3.95
Turkey,114653,38809,2992,72852,2.61
Russia,93558,8456,867,84235,0.93
Iran,92584,72439,5877,14268,6.35
China,83940,78422,4637,881,5.52
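
To make the derived columns concrete, here is a small worked example that recomputes the "Active" and "Mortality_Rate_%" values for the US row of the table above in plain Python (the variable names are only for illustration):

```python
# Latest cumulative totals for the US row of the table above
confirmed = 1012582
recovered = 115936
deaths = 58355

# Active cases: confirmed cases that have neither recovered nor died
active = confirmed - recovered - deaths

# Mortality rate: deaths as a percentage of confirmed cases
mortality_rate = round(deaths / confirmed * 100, 2)

print(active)          # 838291
print(mortality_rate)  # 5.76
```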


###Global Predictions using the Prophet Machine Learning Algorithm, a procedure for forecasting time series data

In [0]:
mortality = covid19.copy()

mortality = mortality.groupby(['Date', 'Country/Region']).agg({'Deaths': ['sum'],'Recovered': ['sum'],'Confirmed': ['sum']})
mortality.columns = ['Deaths','Recovered','Confirmed']
mortality = mortality.reset_index()
# Keep only rows with at least one death and one confirmed case;
# this also rules out division by zero below
mortality = mortality[mortality.Deaths != 0]
mortality = mortality[mortality.Confirmed != 0]

# Mortality rate: deaths as a percentage of confirmed cases
mortality['mortality_rate'] = mortality['Deaths'] / mortality['Confirmed'] * 100

In [0]:
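floorVar=0
# Assumed upper bound (carrying capacity) for confirmed cases in the logistic model;
# despite the name, this is not the world population
worldPop=10000000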

# Modelling total confirmed cases 
confirmed_training_dataset = pd.DataFrame(covid19.groupby('Date')['Confirmed'].sum().reset_index()).rename(columns={'Date': 'ds', 'Confirmed': 'y'})
confirmed_training_dataset['floor'] = floorVar
confirmed_training_dataset['cap'] = worldPop

# Modelling mortality rate
mortality_training_dataset = pd.DataFrame(mortality.groupby('Date')['mortality_rate'].mean().reset_index()).rename(columns={'Date': 'ds', 'mortality_rate': 'y'})

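# Modelling deaths
# NOTE: the cap of 2500 is far below the global death totals in the data
# (217153 on 2020-04-28, see cases_rate above), so the logistic trend saturates well under the observations
death_training_dataset = pd.DataFrame(covid19.groupby('Date')['Deaths'].sum().reset_index()).rename(columns={'Date': 'ds', 'Deaths': 'y'})
death_training_dataset['floor'] = 0
death_training_dataset['cap'] = 2500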

In [15]:
# Total dataframe model 
m = Prophet(
    growth="logistic",
    interval_width=0.98,
    yearly_seasonality=False,
    weekly_seasonality=False,
    daily_seasonality=True,
    seasonality_mode='additive'
    )

m.fit(confirmed_training_dataset)
future = m.make_future_dataframe(periods=50)
future['cap']=worldPop
future['floor']=floorVar
confirmed_forecast = m.predict(future)

# Mortality rate model
m_mortality = Prophet ()
m_mortality.fit(mortality_training_dataset)
mortality_future = m_mortality.make_future_dataframe(periods=31)
mortality_forecast = m_mortality.predict(mortality_future)

# Deaths model
m2 = Prophet(interval_width=0.95,
            growth="logistic")
m2.fit(death_training_dataset)

future2 = m2.make_future_dataframe(periods=7)
future2['cap']=2500
future2['floor']=0
death_forecast = m2.predict(future2)

INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
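
With `growth="logistic"`, Prophet fits a saturating trend bounded by the `cap` column, which is why `cap` (and optionally `floor`) is set on both the training and the future dataframes above. As a minimal sketch of the basic logistic curve, cap / (1 + exp(-k (t - m))), in plain Python: the growth rate `k` and midpoint `m` below are arbitrary illustrative values, not fitted parameters, and Prophet's actual trend additionally lets the rate change at changepoints.

```python
import math

def logistic_trend(t, cap, k, m):
    """Basic logistic curve: approaches `cap` as t grows and equals cap / 2 at t = m."""
    return cap / (1.0 + math.exp(-k * (t - m)))

cap = 10_000_000   # same value as worldPop above, treated as a carrying capacity
k = 0.1            # hypothetical growth rate (not a fitted value)
m = 100            # hypothetical midpoint in days (not a fitted value)

for day in (0, 100, 200, 400):
    print(day, round(logistic_trend(day, cap, k, m)))
```

Because the curve never exceeds `cap`, a cap far below the observed values (such as 2500 for global deaths, when the `cases_rate` table already shows 217153) leaves the trend unable to follow the data.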


In [16]:
fig = plot_plotly(m, confirmed_forecast)
annotations = []
annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1.10,
                              xanchor='left', yanchor='bottom',
                              text='Predictions for Total Confirmed cases (Global)',
                              font=dict(family='Arial',
                                        size=25,
                                        color='rgb(37,37,37)'),
                              showarrow=False))
fig.update_layout(annotations=annotations)
fig

In [17]:
fig = plot_plotly(m_mortality, mortality_forecast)
annotations = []
annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1.10,
                              xanchor='left', yanchor='bottom',
                              text='Predictions for Mortality Rate (Global)',
                              font=dict(family='Arial',
                                        size=25,
                                        color='rgb(37,37,37)'),
                              showarrow=False))
fig.update_layout(annotations=annotations)
fig

In [18]:
fig_death = plot_plotly(m2, death_forecast)  
annotations = []
annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1.10,
                              xanchor='left', yanchor='bottom',
                              text='Predictions for Deaths (Global)',
                              font=dict(family='Arial',
                                        size=25,
                                        color='rgb(37,37,37)'),
                              showarrow=False))
fig_death.update_layout(annotations=annotations)
fig_death

###Brazil:

Analysis of the spread of Covid-19 in Brazil.

Confirmed, Deaths and Recovered cases, in addition to the increase in cases day by day:

In [19]:
cases_Brazil = covid19.loc[covid19['Country/Region']=='Brazil']
cases_Brazil = cases_Brazil.groupby(['Date', 'Country/Region']).agg({'Confirmed':['sum'], 'Deaths':['sum'], 'Recovered':['sum']}).sort_values('Date')
cases_Brazil.columns = ['Confirmed', 'Deaths', 'Recovered']
cases_Brazil = cases_Brazil.reset_index()
# Difference in chronological order, so each row holds the (positive) increase over the previous day
cases_Brazil['Increase_new_confirmed_per_day'] = cases_Brazil['Confirmed'].diff()
cases_Brazil['Increase_new_deaths_per_day'] = cases_Brazil['Deaths'].diff()
cases_Brazil['Increase_new_recovered_per_day'] = cases_Brazil['Recovered'].diff()
# Show the most recent dates first
cases_Brazil = cases_Brazil.sort_values('Date', ascending = False).reset_index(drop=True)

cases_Brazil_confirmed = cases_Brazil[cases_Brazil['Confirmed']!=0]
cases_Brazil_confirmed

Unnamed: 0,Date,Country/Region,Confirmed,Deaths,Recovered,Increase_new_confirmed_per_day,Increase_new_deaths_per_day,Increase_new_recovered_per_day
0,2020-04-28,Brazil,73235.0,5083.0,32544.0,5789.0,480.0,1402.0
1,2020-04-27,Brazil,67446.0,4603.0,31142.0,4346.0,317.0,990.0
2,2020-04-26,Brazil,63100.0,4286.0,30152.0,3776.0,229.0,992.0
3,2020-04-25,Brazil,59324.0,4057.0,29160.0,5281.0,353.0,1505.0
4,2020-04-24,Brazil,54043.0,3704.0,27655.0,4007.0,373.0,1082.0
...,...,...,...,...,...,...,...,...
58,2020-03-01,Brazil,2.0,0.0,0.0,0.0,0.0,0.0
59,2020-02-29,Brazil,2.0,0.0,0.0,1.0,0.0,0.0
60,2020-02-28,Brazil,1.0,0.0,0.0,0.0,0.0,0.0
61,2020-02-27,Brazil,1.0,0.0,0.0,0.0,0.0,0.0


Interactive Graph 03

All records in Brazil - confirmed, death and recovered cases:

In [20]:
fig = go.Figure()
fig.update_layout(title_text='All records in Brazil - confirmed, death and recovered cases:', 
                  xaxis_title='Period Date', yaxis_title='Cases',template='plotly_dark')

fig.add_trace(go.Scatter(x=cases_Brazil_confirmed['Date'], 
                        y=cases_Brazil_confirmed['Confirmed'], 
                        mode='lines+markers',
                        name='Confirmed',
                        line=dict(color='yellow', width=2)))

fig.add_trace(go.Scatter(x=cases_Brazil_confirmed['Date'], 
                        y=cases_Brazil_confirmed['Deaths'], 
                        mode='lines+markers',
                        name='Deaths',
                        line=dict(color='red', width=2)))

fig.add_trace(go.Scatter(x=cases_Brazil_confirmed['Date'], 
                        y=cases_Brazil_confirmed['Recovered'], 
                        mode='lines+markers',
                        name='Recovered',
                        line=dict(color='green', width=2)))

fig.show()

Interactive Graph 04

Growth, death and recovery rates in Brazil:

In [21]:
cases_Brazil_rate = covid19.loc[covid19['Country/Region']=='Brazil']
cases_Brazil_rate = cases_Brazil_rate.groupby(['Date']).agg({'Deaths': ['sum'],'Recovered': ['sum'],'Confirmed': ['sum']})
cases_Brazil_rate.columns = ['Brazil_Deaths','Brazil_Recovered','Brazil_Confirmed']
cases_Brazil_rate = cases_Brazil_rate.reset_index()
cases_Brazil_rate['Increase_cases_per_day_in_Brazil']=cases_Brazil_rate['Brazil_Confirmed'].diff().shift(-1)

# Drop rows with zero deaths or zero confirmed cases; this also prevents division by zero below
cases_Brazil_rate = cases_Brazil_rate[cases_Brazil_rate.Brazil_Deaths != 0]
cases_Brazil_rate = cases_Brazil_rate[cases_Brazil_rate.Brazil_Confirmed != 0]

# Calculating rate
# lambda function
cases_Brazil_rate['Brazil_Deaths_rate_%'] = cases_Brazil_rate.apply(lambda row: ((row.Brazil_Deaths)/(row.Brazil_Confirmed))*100 , axis=1)
cases_Brazil_rate['Brazil_Recovered_rate_%'] = cases_Brazil_rate.apply(lambda row: ((row.Brazil_Recovered)/(row.Brazil_Confirmed))*100 , axis=1)
cases_Brazil_rate['Brazil_Growth_rate_%']=cases_Brazil_rate.apply(lambda row: row.Increase_cases_per_day_in_Brazil/row.Brazil_Confirmed*100, axis=1)
cases_Brazil_rate['Brazil_Growth_rate_%']=cases_Brazil_rate['Brazil_Growth_rate_%'].shift(+1)



fig = go.Figure()
fig.update_layout(title_text='Brazil rate for growth confirmed, death and recovered cases:', 
                  xaxis_title='Period Date', yaxis_title='Rate', template='plotly_dark')

fig.add_trace(go.Scatter(x=cases_Brazil_rate['Date'], 
                         y=cases_Brazil_rate['Brazil_Deaths_rate_%'],
                         mode='lines+markers',
                         name='Brazil Death rate %',
                         line=dict(color='red', width=2)))

fig.add_trace(go.Scatter(x=cases_Brazil_rate['Date'], 
                         y=cases_Brazil_rate['Brazil_Recovered_rate_%'],
                         mode='lines+markers',
                         name='Brazil Recovery rate %',
                         line=dict(color='Green', width=2)))

fig.add_trace(go.Scatter(x=cases_Brazil_rate['Date'], 
                         y=cases_Brazil_rate['Brazil_Growth_rate_%'],
                         mode='lines+markers',
                         name='Brazil Growth Confirmed rate %',
                         line=dict(color='yellow', width=2)))

fig.show()

####Province/Region of Brazil:

Confirmed cases for each region:

In [22]:
cases_Brazil_region = covid19Brazil.loc[:,['region', 'cases', 'date']].groupby(['region', 'date']).sum().reset_index().sort_values(['cases', 'date'], ascending=False)
cases_Brazil_region = cases_Brazil_region.drop_duplicates(subset = ['region'])
cases_Brazil_region = cases_Brazil_region.set_index('date')
cases_Brazil_region_confirmed = cases_Brazil_region[cases_Brazil_region["cases"]>0]
cases_Brazil_region_confirmed

date,region,cases
2020-04-28,Sudeste,36068
2020-04-28,Nordeste,20665
2020-04-28,Norte,8745
2020-04-28,Sul,4033
2020-04-28,Centro-Oeste,2375


Interactive Graph 05

Difference between confirmed cases for each region:

In [23]:
fig = go.Figure()
fig.update_layout(
    title_text='Confirmed cases by region to date',
    height=400, width=500, xaxis_title='Regions', yaxis_title='Confirmed Cases')

fig.add_trace(go.Bar(
                x=cases_Brazil_region_confirmed["region"],
                y=cases_Brazil_region_confirmed["cases"],
                name='Confirmed cases',
                marker_color='darkcyan',
                marker_line_color='rgb(8,48,107)',
                marker_line_width=2, 
                opacity=0.7)
             )

fig.show()

Death cases for each region:

In [24]:
cases_Brazil_region_deaths = covid19Brazil.loc[:,['region', 'deaths', 'date']].groupby(['region', 'date']).sum().reset_index().sort_values(['deaths', 'date'], ascending=False)
cases_Brazil_region_deaths = cases_Brazil_region_deaths.drop_duplicates(subset = ['region'])
cases_Brazil_region_deaths = cases_Brazil_region_deaths.set_index('date')
cases_Brazil_region_deaths_confirmed = cases_Brazil_region_deaths[cases_Brazil_region_deaths["deaths"]>0]
cases_Brazil_region_deaths_confirmed

date,region,deaths
2020-04-28,Sudeste,2922
2020-04-28,Nordeste,1311
2020-04-28,Norte,543
2020-04-28,Sul,166
2020-04-28,Centro-Oeste,75


Interactive Graph 06

Difference between deaths for each region:

In [25]:
fig2 = go.Figure()
fig2.update_layout(
    title_text='Deaths by region to date',
    height=400, width=500, xaxis_title='Regions', yaxis_title='Deaths')
fig2.add_trace(go.Bar(
                x=cases_Brazil_region_deaths_confirmed["region"],
                y=cases_Brazil_region_deaths_confirmed["deaths"],
                name='Deaths',
                marker_color='red',
                marker_line_color='rgb(8,48,107)',
                marker_line_width=2, 
                opacity=0.7)
             )

fig2.show()

####States of Brazil:

Confirmed cases and deaths in each state of Brazil:

In [26]:
cases_Brazil_state = covid19Brazil.groupby(['state', 'date']).sum().reset_index().sort_values(['cases', 'deaths','date'], ascending=False)
cases_Brazil_state = cases_Brazil_state.drop_duplicates(subset = ['state'])
cases_Brazil_state = cases_Brazil_state.set_index('date')

cases_Brazil_state_confirmed = cases_Brazil_state[cases_Brazil_state["cases"]>0]
cases_Brazil_state_confirmed

date,state,cases,deaths
2020-04-28,São Paulo,24041,2049
2020-04-28,Rio de Janeiro,8504,738
2020-04-28,Ceará,6918,403
2020-04-28,Pernambuco,5724,508
2020-04-28,Amazonas,4337,351
2020-04-28,Bahia,2540,86
2020-04-28,Maranhão,2528,145
2020-04-28,Pará,2262,129
2020-04-28,Espírito Santo,1874,64
2020-04-28,Minas Gerais,1649,71


Interactive Graph 07

Confirmed cases in each state of Brazil:

In [27]:
fig = go.Figure()

fig.update_layout(
    title_text='Confirmed Cases by States of Brazil to Date',
    height=700, width=800, xaxis_title='States', yaxis_title='Confirmed Cases')
fig.add_trace(go.Bar(
                x=cases_Brazil_state_confirmed["state"],
                y=cases_Brazil_state_confirmed["cases"],
                marker_color='darkcyan',
                marker_line_color='rgb(8,48,107)',
                marker_line_width=2, 
                opacity=0.7)
             )



fig.show()

Interactive Graph 08

Deaths cases in each state of Brazil:

In [28]:
cases_Brazil_state_graph_Deaths = covid19Brazil.groupby(['state', 'date']).sum().reset_index().sort_values(['deaths', 'date'], ascending=False)
cases_Brazil_state_graph_Deaths = cases_Brazil_state_graph_Deaths.drop_duplicates(subset = ['state'])
cases_Brazil_state_graph_Deaths = cases_Brazil_state_graph_Deaths.set_index('date')
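# Build the mask from the same frame that is being filtered, so rows and mask stay aligned
cases_Brazil_state_graph_Deaths_conf = cases_Brazil_state_graph_Deaths[cases_Brazil_state_graph_Deaths["cases"]>0]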

fig_deaths = go.Figure()

fig_deaths.add_trace(go.Bar(
                x=cases_Brazil_state_graph_Deaths_conf["state"],
                y=cases_Brazil_state_graph_Deaths_conf["deaths"],
                marker_color='red',
                marker_line_color='rgb(8,48,107)',
                marker_line_width=2, 
                opacity=0.7)
             )

fig_deaths.update_layout(
    title_text='Deaths Cases by States of Brazil to Date',
    height=700, width=800, xaxis_title='States', yaxis_title='Deaths')

fig_deaths.show()

####State of Pernambuco (I live here)

Confirmed and Deaths cases in Pernambuco:

In [29]:
cases_Brazil_state_Pernambuco = covid19Brazil.loc[covid19Brazil['state']=='Pernambuco']
cases_Brazil_state_Pernambuco = cases_Brazil_state_Pernambuco.groupby(['date']).agg({'cases':['sum'], 'deaths':['sum']}).sort_values('date', ascending = False)
cases_Brazil_state_Pernambuco.columns = ['cases', 'deaths']
cases_Brazil_state_Pernambuco = cases_Brazil_state_Pernambuco.reset_index()

cases_Brazil_state_Pernambuco_confirmed = cases_Brazil_state_Pernambuco[cases_Brazil_state_Pernambuco['cases']!=0]
cases_Brazil_state_Pernambuco_confirmed.style.background_gradient(cmap='Reds')

Unnamed: 0,date,cases,deaths
0,2020-04-28,5724,508
1,2020-04-27,5358,450
2,2020-04-26,4898,415
3,2020-04-25,4507,381
4,2020-04-24,3999,352
5,2020-04-23,3519,312
6,2020-04-22,3298,282
7,2020-04-21,2908,260
8,2020-04-20,2690,234
9,2020-04-19,2459,216


Interactive Graph 09

Confirmed and Deaths cases in Pernambuco:

In [30]:
fig = go.Figure()
fig.update_layout(title_text='Confirmed and Deaths cases in Pernambuco', 
                  xaxis_title='Period Date', yaxis_title='Cases', template='seaborn', width=1200, height=600)

fig.add_trace(go.Scatter(x=cases_Brazil_state_Pernambuco_confirmed['date'], 
                        y=cases_Brazil_state_Pernambuco_confirmed['cases'], 
                        mode='lines+markers',
                        name='Confirmed',
                        line=dict(color='darkcyan', width=2)))

fig.add_trace(go.Scatter(x=cases_Brazil_state_Pernambuco_confirmed['date'], 
                        y=cases_Brazil_state_Pernambuco_confirmed['deaths'], 
                        mode='lines+markers',
                        name='Deaths',
                        line=dict(color='red', width=2)))

Interactive Graph 10

Death rate and daily growth rate of confirmed cases in Pernambuco (%):

In [31]:
# Keep only the rows for Pernambuco and total deaths and cases per date.
cases_Brazil_state_Pernambuco_rate = covid19Brazil.loc[covid19Brazil['state']=='Pernambuco']
cases_Brazil_state_Pernambuco_rate = cases_Brazil_state_Pernambuco_rate.groupby(['date']).agg({'deaths': ['sum'],'cases': ['sum']})
cases_Brazil_state_Pernambuco_rate.columns = ['Pernambuco_Deaths','Pernambuco_Cases']
cases_Brazil_state_Pernambuco_rate = cases_Brazil_state_Pernambuco_rate.reset_index()
# New cases reported on the following day (cumulative total of day t+1 minus day t).
cases_Brazil_state_Pernambuco_rate['Increase_cases_per_day_in_Pernambuco']=cases_Brazil_state_Pernambuco_rate['Pernambuco_Cases'].diff().shift(-1)

# Drop dates with no deaths or no cases yet; this also rules out division by zero below.
cases_Brazil_state_Pernambuco_rate = cases_Brazil_state_Pernambuco_rate[cases_Brazil_state_Pernambuco_rate.Pernambuco_Deaths != 0]
cases_Brazil_state_Pernambuco_rate = cases_Brazil_state_Pernambuco_rate[cases_Brazil_state_Pernambuco_rate.Pernambuco_Cases != 0].copy()

# Death rate: cumulative deaths as a percentage of cumulative confirmed cases.
cases_Brazil_state_Pernambuco_rate['Pernambuco_Deaths_rate_%'] = cases_Brazil_state_Pernambuco_rate['Pernambuco_Deaths'] / cases_Brazil_state_Pernambuco_rate['Pernambuco_Cases'] * 100
# Growth rate: next-day increase as a percentage of the current total,
# shifted forward one row so each date shows its growth relative to the previous day.
cases_Brazil_state_Pernambuco_rate['Pernambuco_Growth_rate_%'] = cases_Brazil_state_Pernambuco_rate['Increase_cases_per_day_in_Pernambuco'] / cases_Brazil_state_Pernambuco_rate['Pernambuco_Cases'] * 100
cases_Brazil_state_Pernambuco_rate['Pernambuco_Growth_rate_%'] = cases_Brazil_state_Pernambuco_rate['Pernambuco_Growth_rate_%'].shift(+1)



fig = go.Figure()
fig.update_layout(title_text='Death rate and daily growth rate in Pernambuco (%)',
                  xaxis_title='Period Date', yaxis_title='Rate', template='seaborn', width=1200, height=600)
fig.add_trace(go.Scatter(x=cases_Brazil_state_Pernambuco_rate['date'], 
                         y=cases_Brazil_state_Pernambuco_rate['Pernambuco_Deaths_rate_%'],
                         mode='lines+markers',
                         name='Death rate %',
                         line=dict(color='red', width=2)))

fig.add_trace(go.Scatter(x=cases_Brazil_state_Pernambuco_rate['date'], 
                         y=cases_Brazil_state_Pernambuco_rate['Pernambuco_Growth_rate_%'],
                         mode='lines+markers',
                         name='Growth rate %',
                         line=dict(color='darkcyan', width=2)))
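
To make the two rates in the cell above concrete, here is a minimal sketch on made-up cumulative counts (the `toy` frame and its column names are assumptions for illustration, not part of the dataset). It also checks that the diff/shift route used above gives the same growth rate as pandas' built-in `pct_change`.

```python
import pandas as pd

# Hypothetical cumulative counts for three consecutive days (not real data).
toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-03"]),
    "cases": [100, 150, 180],
    "deaths": [5, 9, 9],
})

# Death rate: cumulative deaths as a percentage of cumulative cases.
toy["death_rate_pct"] = toy["deaths"] / toy["cases"] * 100

# Growth rate: new cases on a day as a percentage of the previous day's total.
toy["growth_rate_pct"] = toy["cases"].pct_change() * 100

# The notebook's route: next-day increase over the current total, shifted one row down.
increase_next_day = toy["cases"].diff().shift(-1)
toy["growth_rate_via_shift"] = (increase_next_day / toy["cases"] * 100).shift(1)

print(toy.round(2))
```

The first row has no growth rate in either column because there is no previous day to compare against.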

####Predictions for Brazil using Prophet, a procedure for forecasting time series data

In [0]:
# Total deaths and confirmed cases per date across all states.
cases_Brazil_predictions = covid19Brazil.groupby(['date']).agg({'deaths': ['sum'],'cases': ['sum']})
cases_Brazil_predictions.columns = ['Brazil_Deaths','Brazil_Confirmed']
cases_Brazil_predictions = cases_Brazil_predictions.reset_index()
# Drop dates with no deaths or no cases yet; this also rules out division by zero below.
cases_Brazil_predictions = cases_Brazil_predictions[cases_Brazil_predictions.Brazil_Deaths != 0]
cases_Brazil_predictions = cases_Brazil_predictions[cases_Brazil_predictions.Brazil_Confirmed != 0].copy()

# Mortality rate: cumulative deaths as a percentage of cumulative confirmed cases.
# (The earlier "+1" on deaths inflated the rate and is not needed once zero rows are dropped.)
cases_Brazil_predictions['Brazil_mortality_rate_%'] = cases_Brazil_predictions['Brazil_Deaths'] / cases_Brazil_predictions['Brazil_Confirmed'] * 100

In [0]:
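# Lower and upper bounds for the logistic growth model of confirmed cases.
floorVar=0
# Despite its name, this is used as the saturation cap, not as a population figure:
# 10,000,000 is far below Brazil's actual population (about 210 million).
BrazilPop=10000000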

# Modelling total confirmed cases 
Brazil_confirmed_training_dataset = pd.DataFrame(covid19Brazil.groupby('date')['cases'].sum().reset_index()).rename(columns={'date': 'ds', 'cases': 'y'})
Brazil_confirmed_training_dataset['floor'] = floorVar
Brazil_confirmed_training_dataset['cap'] = BrazilPop

# Modelling mortality rate
Brazil_mortality_training_dataset = pd.DataFrame(cases_Brazil_predictions.groupby('date')['Brazil_mortality_rate_%'].mean().reset_index()).rename(columns={'date': 'ds', 'Brazil_mortality_rate_%': 'y'})

# Modelling deaths
Brazil_death_training_dataset = pd.DataFrame(covid19Brazil.groupby('date')['deaths'].sum().reset_index()).rename(columns={'date': 'ds', 'deaths': 'y'})
Brazil_death_training_dataset['floor'] = 0
Brazil_death_training_dataset['cap'] = 2500
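
The `floor` and `cap` columns set above are what Prophet's logistic growth mode uses as the lower and upper saturation levels. As a rough illustration of why the cap matters, here is a simplified sketch of a saturating logistic curve. It is not Prophet's implementation (Prophet's trend also includes changepoints, omitted here), and the function name and the parameters `k` and `m` are assumptions chosen for the example.

```python
import math

def logistic_trend(t, cap, k, m, floor=0.0):
    """Saturating growth: starts near `floor` and levels off at `cap`.

    t: days since the start of the series
    k: growth rate (hypothetical value below)
    m: day on which the curve reaches its midpoint (hypothetical value below)
    """
    return floor + (cap - floor) / (1.0 + math.exp(-k * (t - m)))

# Hypothetical parameters; the cap mirrors the deaths cap used in the notebook.
cap, k, m = 2500, 0.15, 40

for day in (0, 20, 40, 60, 120):
    print(day, round(logistic_trend(day, cap, k, m), 1))
```

Because the curve can never rise above `cap`, the forecast is bounded by whatever value is chosen, so the cap should sit above the largest value the series is expected to reach.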

In [34]:
# Total dataframe model 
m_Brazil = Prophet(
    growth="logistic",
    interval_width=0.98,
    yearly_seasonality=False,
    weekly_seasonality=False,
    daily_seasonality=True,
    seasonality_mode='additive'
    )

m_Brazil.fit(Brazil_confirmed_training_dataset)
future_Brazil = m_Brazil.make_future_dataframe(periods=50)
future_Brazil['cap']=BrazilPop
future_Brazil['floor']=floorVar
confirmed_forecast_Brazil = m_Brazil.predict(future_Brazil)

# Mortality rate model (default linear growth, so no cap or floor is needed)
m_Brazil_mortality = Prophet()
m_Brazil_mortality.fit(Brazil_mortality_training_dataset)
mortality_future_Brazil = m_Brazil_mortality.make_future_dataframe(periods=31)
mortality_forecast_Brazil = m_Brazil_mortality.predict(mortality_future_Brazil)

# Deaths model
m2_Brazil = Prophet(interval_width=0.95,
            growth="logistic")
m2_Brazil.fit(Brazil_death_training_dataset)

future2_Brazil = m2_Brazil.make_future_dataframe(periods=7)
future2_Brazil['cap']=2500
future2_Brazil['floor']=0
death_forecast_Brazil = m2_Brazil.predict(future2_Brazil)

INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


In [35]:
fig_confirmed_Brazil = plot_plotly(m_Brazil, confirmed_forecast_Brazil)
annotations = []
annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1.10,
                              xanchor='left', yanchor='bottom',
                              text='Predictions for Total Confirmed cases in Brazil',
                              font=dict(family='Arial',
                                        size=25,
                                        color='rgb(37,37,37)'),
                              showarrow=False))
fig_confirmed_Brazil.update_layout(annotations=annotations)
fig_confirmed_Brazil

In [36]:
fig_death_Brazil_rate = plot_plotly(m_Brazil_mortality, mortality_forecast_Brazil)
annotations = []
annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1.10,
                              xanchor='left', yanchor='bottom',
                              text='Predictions for Mortality Rate in Brazil',
                              font=dict(family='Arial',
                                        size=25,
                                        color='rgb(37,37,37)'),
                              showarrow=False))
fig_death_Brazil_rate.update_layout(annotations=annotations)
fig_death_Brazil_rate

In [37]:
fig_death_Brazil = plot_plotly(m2_Brazil, death_forecast_Brazil)  
annotations = []
annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1.10,
                              xanchor='left', yanchor='bottom',
                              text='Predictions for Deaths in Brazil',
                              font=dict(family='Arial',
                                        size=25,
                                        color='rgb(37,37,37)'),
                              showarrow=False))
fig_death_Brazil.update_layout(annotations=annotations)
fig_death_Brazil
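
The three forecast cells above build the same title annotation and change only the text. One way to avoid the repetition is a small helper; `title_annotation` is a hypothetical name introduced here, and the dictionary keys are the same Plotly annotation properties already used above.

```python
def title_annotation(text, size=25):
    """Build a Plotly layout annotation that acts as a left-aligned chart title."""
    return dict(
        xref='paper', yref='paper', x=0.0, y=1.10,
        xanchor='left', yanchor='bottom',
        text=text,
        font=dict(family='Arial', size=size, color='rgb(37,37,37)'),
        showarrow=False,
    )

# Same annotation list as in the cells above, built in one line.
annotations = [title_annotation('Predictions for Deaths in Brazil')]
print(annotations[0]['text'])
```

With this in place, each figure could be titled with a call such as `fig_death_Brazil.update_layout(annotations=[title_annotation('Predictions for Deaths in Brazil')])`.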

Report in progress; more will be added over the next few days.