<a href="https://colab.research.google.com/github/niltontac/EspAnalise-EngDados/blob/master/Covid_19_Analysis_and_Predictions%20-%20In%20Progress.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# About these datasets

##### The datasets explored in the analyses below are provided by Johns Hopkins University, a renowned US institution at the forefront of worldwide Covid-19 data collection. I also use data from the Kaggle platform, which gathers users from all over the world who contribute real data from reliable sources.

##### All datasets explored here are updated daily with the numbers of confirmed cases, deaths and recoveries from Covid-19. Note that they are time series, so the number of cases on a given day is cumulative.
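Each value is a running total. The number of new cases on a given day is therefore that day's total minus the previous day's. Below is a minimal pandas sketch with made-up numbers. It assumes the count before the first day was zero:

```python
import pandas as pd

# Illustrative cumulative counts (made-up numbers, not taken from the datasets)
cumulative = pd.Series([10, 15, 15, 22],
                       index=pd.to_datetime(['2020-04-01', '2020-04-02',
                                             '2020-04-03', '2020-04-04']))

# diff() subtracts the previous day's total; the first day has no predecessor,
# so (assuming zero cases before it) its new cases equal its cumulative value
daily_new = cumulative.diff().fillna(cumulative.iloc[0]).astype(int)
print(daily_new.tolist())  # [10, 5, 0, 7]
```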

---

##### Sources (datasets):
##### Johns Hopkins University:
##### https://coronavirus.jhu.edu/

##### Kaggle:
##### https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset
##### https://www.kaggle.com/unanimad/corona-virus-brazil

##### All datasets on GitHub:

##### https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series
##### https://github.com/niltontac/EspAnalise-EngDados/tree/master/data/Novel_Corona_Virus_2019_Dataset
##### https://github.com/niltontac/EspAnalise-EngDados/tree/master/data/covid19_brazil_data
---

##### Analyst: Nilton Thiago de Andrade Coura
##### Recife/PE - Brazil
##### niltontac@gmail.com
##### https://github.com/niltontac

# Covid-19 - Exploratory Analysis and Predictions

![3D medical animation of the coronavirus structure](https://i.ibb.co/txCZFvr/3-D-medical-animation-coronavirus-structure.jpg)

Importing Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly as py
import plotly.express as px
import plotly.graph_objects as go

# fbprophet was renamed to `prophet` in version 1.0; on newer installs use
# `from prophet import Prophet` and `from prophet.plot import ...` instead
from fbprophet import Prophet
from fbprophet.plot import plot_plotly, add_changepoints_to_plot

import warnings
warnings.filterwarnings('ignore')

# Loading the datasets
# Data used here was last updated on April 12, 2020 (re-running may download newer data)

covid19confirmed = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')

covid19deaths = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')

covid19recovered = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')

covid19 = pd.read_csv('https://raw.githubusercontent.com/niltontac/EspAnalise-EngDados/master/data/Novel_Corona_Virus_2019_Dataset/covid_19_data.csv', parse_dates=['ObservationDate', 'Last Update'])

covid19Brazil = pd.read_csv('https://raw.githubusercontent.com/niltontac/EspAnalise-EngDados/master/data/covid19_brazil_data/brazil_covid19.csv')


Assigning the date of the last update (the header of the most recent date column in the JHU files):

In [0]:
last_date_update = '4/12/20'
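Hard-coding `'4/12/20'` means this cell must be edited whenever the data is refreshed. The sketch below derives the most recent date column from the JHU wide layout instead. In that layout, each date gets its own column after `Province/State`, `Country/Region`, `Lat` and `Long`. The same layout can also be reshaped into one row per location and date. The helpers `latest_date_column` and `wide_to_long` are hypothetical, and the sample values are made up:

```python
import pandas as pd

ID_COLS = ['Province/State', 'Country/Region', 'Lat', 'Long']

def latest_date_column(df):
    """Return the header of the most recent date column in a JHU-style wide frame."""
    date_cols = [c for c in df.columns if c not in ID_COLS]
    # Headers look like '4/12/20' (month/day/two-digit year), so parse before comparing
    return max(date_cols, key=lambda c: pd.to_datetime(c, format='%m/%d/%y'))

def wide_to_long(df):
    """Reshape one-column-per-date data into one row per (location, date)."""
    long_df = df.melt(id_vars=ID_COLS, var_name='Date', value_name='Count')
    long_df['Date'] = pd.to_datetime(long_df['Date'], format='%m/%d/%y')
    return long_df

# Tiny illustrative frame in the JHU layout (made-up values)
sample = pd.DataFrame({
    'Province/State': ['unknown', 'Hubei'],
    'Country/Region': ['Country A', 'Country B'],
    'Lat': [0.0, 1.0],
    'Long': [0.0, 1.0],
    '4/9/20': [5, 50],
    '4/10/20': [8, 60],
    '4/12/20': [13, 75],
})

print(latest_date_column(sample))  # 4/12/20
print(wide_to_long(sample).shape)  # (6, 6)
```

The headers are parsed before taking the maximum because, compared as text, `'4/9/20'` would sort after `'4/12/20'`. On the real data, `latest_date_column(covid19confirmed)` would return the value to assign to `last_date_update`.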

Checking the last five rows of each dataset to confirm when it was last updated:

In [3]:
print('covid19confirmed:')
print(covid19confirmed.tail())
####
print('covid19deaths:')
print(covid19deaths.tail())
####
print('covid19recovered:')
print(covid19recovered.tail())
####
print('covid19:')
print(covid19.tail())
####
print('covid19Brazil:')
print(covid19Brazil.tail())

covid19confirmed:
                Province/State         Country/Region  ...  4/11/20  4/12/20
259  Saint Pierre and Miquelon                 France  ...        1        1
260                        NaN            South Sudan  ...        4        4
261                        NaN         Western Sahara  ...        4        6
262                        NaN  Sao Tome and Principe  ...        4        4
263                        NaN                  Yemen  ...        1        1

[5 rows x 86 columns]
covid19deaths:
                Province/State         Country/Region  ...  4/11/20  4/12/20
259  Saint Pierre and Miquelon                 France  ...        0        0
260                        NaN            South Sudan  ...        0        0
261                        NaN         Western Sahara  ...        0        0
262                        NaN  Sao Tome and Principe  ...        0        0
263                        NaN                  Yemen  ...        0        0

[5 rows x 86 columns]

In [0]:
# Rename column 'ObservationDate' to 'Date'

covid19 = covid19.rename(columns={'ObservationDate' : 'Date'})

Dimensions of the datasets (rows, columns):

In [5]:
print('covid19confirmed:')
print(covid19confirmed.shape)
####
print('covid19deaths:')
print(covid19deaths.shape)
####
print('covid19recovered:')
print(covid19recovered.shape)
####
print('covid19:')
print(covid19.shape)
####
print('covid19Brazil:')
print(covid19Brazil.shape)

covid19confirmed:
(264, 86)
covid19deaths:
(264, 86)
covid19recovered:
(250, 86)
covid19:
(14171, 8)
covid19Brazil:
(1998, 5)
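The tail, shape and null checks repeat the same `print` pattern for each dataset. A hypothetical helper that loops over a dictionary of named DataFrames could replace the repetition. This sketch runs on made-up frames:

```python
import pandas as pd

def summarize(frames):
    """Print shape and missing-value counts for each named DataFrame (hypothetical helper)."""
    summary = {}
    for name, df in frames.items():
        nulls = df.isnull().sum()
        print(f'{name}: shape={df.shape}')
        if nulls.any():
            # List only the columns that actually contain missing values
            print(nulls[nulls > 0].to_string())
        else:
            print('  no missing values')
        summary[name] = (df.shape, int(nulls.sum()))
    return summary

# Made-up frames standing in for the real datasets
demo = summarize({
    'with_gaps': pd.DataFrame({'Province/State': [None, 'Hubei'], 'Count': [1, 2]}),
    'complete': pd.DataFrame({'Count': [3, 4, 5]}),
})
```

On the notebook's data, it would be called as `summarize({'covid19confirmed': covid19confirmed, 'covid19deaths': covid19deaths, 'covid19recovered': covid19recovered, 'covid19': covid19, 'covid19Brazil': covid19Brazil})`.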


Checking for null or missing values:

In [6]:
print('covid19confirmed:')
print(pd.DataFrame(covid19confirmed.isnull().sum()))
####
print('covid19deaths:')
print(pd.DataFrame(covid19deaths.isnull().sum()))
####
print('covid19recovered:')
print(pd.DataFrame(covid19recovered.isnull().sum()))
####
print('covid19:')
print(pd.DataFrame(covid19.isnull().sum()))
####
print('covid19Brazil:')
print(pd.DataFrame(covid19Brazil.isnull().sum()))

covid19confirmed:
                  0
Province/State  182
Country/Region    0
Lat               0
Long              0
1/22/20           0
...             ...
4/8/20            0
4/9/20            0
4/10/20           0
4/11/20           0
4/12/20           0

[86 rows x 1 columns]
covid19deaths:
                  0
Province/State  182
Country/Region    0
Lat               0
Long              0
1/22/20           0
...             ...
4/8/20            0
4/9/20            0
4/10/20           0
4/11/20           0
4/12/20           0

[86 rows x 1 columns]
covid19recovered:
                  0
Province/State  183
Country/Region    0
Lat               0
Long              0
1/22/20           0
...             ...
4/8/20            0
4/9/20            0
4/10/20           0
4/11/20           0
4/12/20           0

[86 rows x 1 columns]
covid19:
                   0
SNo                0
Date               0
Province/State  6924
Country/Region     0
Last Update        0
Confirmed          0
Deat

Some datasets have missing (null) values in the "Province/State" column.
Let's replace them with 'unknown':

In [0]:
# Replacing missing values

covid19confirmed = covid19confirmed.fillna('unknown')
covid19deaths = covid19deaths.fillna('unknown')
covid19recovered = covid19recovered.fillna('unknown')
covid19 = covid19.fillna('unknown')
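Calling `fillna` on a whole DataFrame fills every column, not only `Province/State`. The checks above found gaps only in that column, so here it makes no difference. If a future update introduced gaps in a numeric column, though, filling them with a string would silently turn that column into text. A small sketch with made-up values contrasts the two approaches:

```python
import numpy as np
import pandas as pd

# Made-up frame: a text column and a numeric column, both with a gap
df = pd.DataFrame({'Province/State': [np.nan, 'Hubei'],
                   'Count': [np.nan, 5.0]})

# Whole-frame fill: the numeric column now mixes a float and a string
whole = df.fillna('unknown')

# Targeted fill: only the text column is touched, the numeric gap stays NaN
targeted = df.copy()
targeted['Province/State'] = targeted['Province/State'].fillna('unknown')

print(whole['Count'].dtype)     # object
print(targeted['Count'].dtype)  # float64
```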

In [8]:
# Checking for null or missing values again

print('covid19confirmed:')
print(pd.DataFrame(covid19confirmed.isnull().sum()))
####
print('covid19deaths:')
print(pd.DataFrame(covid19deaths.isnull().sum()))
####
print('covid19recovered:')
print(pd.DataFrame(covid19recovered.isnull().sum()))
####
print('covid19:')
print(pd.DataFrame(covid19.isnull().sum()))

covid19confirmed:
                0
Province/State  0
Country/Region  0
Lat             0
Long            0
1/22/20         0
...            ..
4/8/20          0
4/9/20          0
4/10/20         0
4/11/20         0
4/12/20         0

[86 rows x 1 columns]
covid19deaths:
                0
Province/State  0
Country/Region  0
Lat             0
Long            0
1/22/20         0
...            ..
4/8/20          0
4/9/20          0
4/10/20         0
4/11/20         0
4/12/20         0

[86 rows x 1 columns]
covid19recovered:
                0
Province/State  0
Country/Region  0
Lat             0
Long            0
1/22/20         0
...            ..
4/8/20          0
4/9/20          0
4/10/20         0
4/11/20         0
4/12/20         0

[86 rows x 1 columns]
covid19:
                0
SNo             0
Date            0
Province/State  0
Country/Region  0
Last Update     0
Confirmed       0
Deaths          0
Recovered       0


## Plotly visualizations: exploratory data analysis and predictions for the world and Brazil.

### Worldwide:

Interactive Graph 01

Global records including confirmed, deaths and recovered cases:

In [9]:
# Select several columns with a list; tuple-style selection was removed in pandas 2.0
cases_growth = covid19.groupby('Date')[['Confirmed', 'Deaths', 'Recovered']].sum()
cases_growth = cases_growth.reset_index()
cases_growth = cases_growth.sort_values('Date', ascending=False)

fig = go.Figure()
fig.update_layout(title_text='Global records including confirmed, deaths and recovered cases:', 
                  xaxis_title='Period Date', yaxis_title='Cases', template='plotly_dark')

fig.add_trace(go.Scatter(x=cases_growth['Date'], 
                        y=cases_growth['Confirmed'], 
                        mode='lines+markers',
                        name='Global Confirmed',
                        line=dict(color='Yellow', width=2)))

fig.add_trace(go.Scatter(x=cases_growth['Date'], 
                        y=cases_growth['Deaths'], 
                        mode='lines+markers',
                        name='Global Deaths',
                        line=dict(color='red', width=2)))

fig.add_trace(go.Scatter(x=cases_growth['Date'], 
                        y=cases_growth['Recovered'], 
                        mode='lines+markers',
                        name='Global Recovered',
                        line=dict(color='green', width=2)))

fig.show()

Interactive Graph 02

Global growth rate of confirmed cases, together with the death and recovery rates (as a percentage of confirmed cases):

In [10]:
cases_rate = covid19.groupby(['Date']).agg({'Deaths': ['sum'],'Recovered': ['sum'],'Confirmed': ['sum']})
cases_rate.columns = ['Global_Deaths','Global_Recovered','Global_Confirmed']
cases_rate = cases_rate.reset_index()
# Increase from this day to the next (shift(-1) moves the next day's difference up one row)
cases_rate['Increase_new_cases_per_day'] = cases_rate['Global_Confirmed'].diff().shift(-1)
# Death and recovery rates as a percentage of confirmed cases
cases_rate['Global_Deaths_rate_%'] = cases_rate['Global_Deaths'] / cases_rate['Global_Confirmed'] * 100
cases_rate['Global_Recovered_rate_%'] = cases_rate['Global_Recovered'] / cases_rate['Global_Confirmed'] * 100
# Day-over-day growth of confirmed cases, shifted so each value sits on the day it describes
cases_rate['Global_Growth_rate_%'] = cases_rate['Increase_new_cases_per_day'] / cases_rate['Global_Confirmed'] * 100
cases_rate['Global_Growth_rate_%'] = cases_rate['Global_Growth_rate_%'].shift(1)



fig = go.Figure()
fig.update_layout(title_text='Global rate for growth confirmed, death and recovered cases:', 
                  xaxis_title='Period Date', yaxis_title='Rate', template='plotly_dark')
fig.add_trace(go.Scatter(x=cases_rate['Date'], 
                         y=cases_rate['Global_Deaths_rate_%'],
                         mode='lines+markers',
                         name='Global Death rate %',
                         line=dict(color='red', width=2)))

fig.add_trace(go.Scatter(x=cases_rate['Date'], 
                         y=cases_rate['Global_Recovered_rate_%'],
                         mode='lines+markers',
                         name='Global Recovery rate %',
                         line=dict(color='Green', width=2)))

fig.add_trace(go.Scatter(x=cases_rate['Date'], 
                         y=cases_rate['Global_Growth_rate_%'],
                         mode='lines+markers',
                         name='Global Growth Confirmed rate %',
                         line=dict(color='Yellow', width=2)))

fig.show()

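General numbers. Note how many new confirmed cases are added each day. Because of `.shift(-1)`, the `Increase_new_cases_per_day` value on each row is the increase from that day to the next, which is why the last row is empty: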

In [11]:
cases_rate.tail()

| | Date | Global_Deaths | Global_Recovered | Global_Confirmed | Increase_new_cases_per_day | Global_Deaths_rate_% | Global_Recovered_rate_% | Global_Growth_rate_% |
|---|---|---|---|---|---|---|---|---|
| 76 | 2020-04-07 | 81865.0 | 300054.0 | 1426096.0 | 85008.0 | 5.740497 | 21.040239 | 6.021481 |
| 77 | 2020-04-08 | 88338.0 | 328661.0 | 1511104.0 | 84246.0 | 5.845925 | 21.749727 | 5.960889 |
| 78 | 2020-04-09 | 95455.0 | 353975.0 | 1595350.0 | 96369.0 | 5.983327 | 22.187921 | 5.575129 |
| 79 | 2020-04-10 | 102525.0 | 376096.0 | 1691719.0 | 79795.0 | 6.060404 | 22.231588 | 6.040618 |
| 80 | 2020-04-11 | 108503.0 | 402110.0 | 1771514.0 | NaN | 6.124874 | 22.698663 | 4.7168 |
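The `diff().shift(-1)` step followed by `shift(+1)` computes the day-over-day growth rate: (today's total - yesterday's total) / yesterday's total x 100. This sketch uses made-up totals to show that pandas' `pct_change` gives the same values in one step:

```python
import numpy as np
import pandas as pd

# Made-up cumulative totals, oldest first
confirmed = pd.Series([100.0, 120.0, 150.0, 165.0])

# Notebook style: next-day increase divided by today's total, then shifted down one row
increase_next_day = confirmed.diff().shift(-1)
growth_notebook = (increase_next_day / confirmed * 100).shift(1)

# Equivalent one-liner: percentage change versus the previous day
growth_pct = confirmed.pct_change() * 100

print(growth_pct.round(2).tolist())  # [nan, 20.0, 25.0, 10.0]
```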


Confirmed, Deaths, Recovered and Active cases in all affected countries around the world:

In [12]:
# Latest cumulative totals per country (countries split into provinces are summed)
cases_temp = covid19confirmed.groupby('Country/Region')[[last_date_update]].sum()
cases_temp = cases_temp.rename(columns={last_date_update: 'Confirmed'})
# Column assignment aligns on the country index, so no pre-sorting is needed
cases_temp['Recovered'] = covid19recovered.groupby('Country/Region')[last_date_update].sum()
cases_temp['Deaths'] = covid19deaths.groupby('Country/Region')[last_date_update].sum()
cases_temp['Active'] = cases_temp['Confirmed'] - cases_temp['Recovered'] - cases_temp['Deaths']
cases_temp = cases_temp.sort_values(by='Confirmed', ascending=False)

cases_temp.style.background_gradient(cmap='Reds')

| Country/Region | Confirmed | Recovered | Deaths | Active |
|---|---|---|---|---|
| US | 555313 | 32988 | 22020 | 500305 |
| Spain | 166831 | 62391 | 17209 | 87231 |
| Italy | 156363 | 34211 | 19899 | 102253 |
| France | 133670 | 27469 | 14412 | 91789 |
| Germany | 127854 | 60300 | 3022 | 64532 |
| United Kingdom | 85206 | 626 | 10629 | 73951 |
| China | 83134 | 77956 | 3343 | 1835 |
| Iran | 71686 | 43894 | 4474 | 23318 |
| Turkey | 56956 | 3446 | 1198 | 52312 |
| Belgium | 29647 | 6463 | 3600 | 19584 |
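`Active` is computed as Confirmed - Recovered - Deaths, and pandas aligns the three per-country Series on the country index. The recovered file has fewer rows (250 versus 264), so some countries or provinces may lack a recovered entry. Plain subtraction then yields NaN for them, while `sub(..., fill_value=0)` treats the missing value as zero. A sketch with made-up countries:

```python
import pandas as pd

# Made-up latest totals per country; 'C' is missing from the recovered data
confirmed = pd.Series({'A': 100, 'B': 50, 'C': 10})
recovered = pd.Series({'A': 30, 'B': 20})
deaths = pd.Series({'A': 5, 'B': 2, 'C': 1})

# Plain subtraction aligns on the index and yields NaN where a country is missing
active_nan = confirmed - recovered - deaths

# sub(..., fill_value=0) treats a missing recovered value as zero instead
active_filled = confirmed.sub(recovered, fill_value=0).sub(deaths, fill_value=0)

print(active_nan.to_dict())
print(active_filled.to_dict())
```

Which behaviour is preferable is a judgment call. Filling with zero assumes no recoveries were reported, whereas NaN keeps the gap visible.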


### Global predictions using Prophet, a procedure for forecasting time series data

In [0]:
mortality = covid19.copy()

mortality = mortality.groupby(['Date', 'Country/Region']).agg({'Deaths': ['sum'],'Recovered': ['sum'],'Confirmed': ['sum']})
mortality.columns = ['Deaths','Recovered','Confirmed']
mortality = mortality.reset_index()
mortality = mortality[mortality.Deaths != 0]
mortality = mortality[mortality.Confirmed != 0]
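# Prevent division by zero (redundant after the filters above, kept as a safeguard)
def ifNull(d):
    return d if d != 0 else 1

# Note: the +1 in the numerator inflates the rate slightly, most for countries with few cases
mortality['mortality_rate'] = mortality.apply(lambda row: ((row.Deaths + 1) / ifNull(row.Confirmed)) * 100, axis=1)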
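This mortality rate adds 1 to the death count before dividing. That inflates the rate most where case counts are small: 1 death among 10 cases reads as 20% instead of 10%. The vectorized sketch below uses made-up counts to compare this formula with the plain case fatality rate (deaths / confirmed x 100). In the plain version, a zero denominator becomes NaN rather than being replaced by 1:

```python
import numpy as np
import pandas as pd

# Made-up cumulative counts for three country-days
df = pd.DataFrame({'Deaths': [1, 50, 0], 'Confirmed': [10, 1000, 0]})

# Notebook-style rate: (Deaths + 1) / Confirmed, with 1 substituted for a zero denominator
smoothed = (df['Deaths'] + 1) / df['Confirmed'].replace(0, 1) * 100

# Plain case fatality rate; a zero denominator becomes NaN instead of a number
plain = df['Deaths'] / df['Confirmed'].replace(0, np.nan) * 100

print(smoothed.tolist())
print(plain.tolist())
```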

In [0]:
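floorVar = 0
# Saturation level (carrying capacity) assumed for the logistic model;
# despite the name, this is not the world population
worldPop = 10000000

# Modelling total confirmed cases (Prophet expects 'ds'/'y'; logistic growth also needs 'cap')
confirmed_training_dataset = pd.DataFrame(covid19.groupby('Date')['Confirmed'].sum().reset_index()).rename(columns={'Date': 'ds', 'Confirmed': 'y'})
confirmed_training_dataset['floor'] = floorVar
confirmed_training_dataset['cap'] = worldPop

# Modelling mortality rate (unweighted mean of the per-country rates on each date)
mortality_training_dataset = pd.DataFrame(mortality.groupby('Date')['mortality_rate'].mean().reset_index()).rename(columns={'Date': 'ds', 'mortality_rate': 'y'})

# Modelling deaths
# Caution: this cap (2,500) is below the observed global deaths (108,503 on 2020-04-11),
# so the logistic curve cannot reach the data; a cap above the largest observed value is needed
death_training_dataset = pd.DataFrame(covid19.groupby('Date')['Deaths'].sum().reset_index()).rename(columns={'Date': 'ds', 'Deaths': 'y'})
death_training_dataset['floor'] = 0
death_training_dataset['cap'] = 2500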
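Prophet expects a `ds` (date) column and a `y` (value) column. Logistic growth also needs a `cap` column, plus an optional `floor`, in both the training and the future frames. The cap is the level at which the curve saturates, so it should lie above the observed values. The deaths model above uses a cap of 2,500, although global deaths had already passed 100,000 by 2020-04-11. Below is a sketch of a hypothetical helper that builds the frame and checks the cap, run on made-up data:

```python
import pandas as pd

def to_prophet_frame(df, date_col, value_col, cap=None, floor=None):
    """Sum a series per date into Prophet's ds/y layout (hypothetical helper).

    For logistic growth a 'cap' column is added. The cap is checked against the
    largest observed value, because a saturating curve cannot rise above it,
    and an optional 'floor' must lie below the cap.
    """
    out = (df.groupby(date_col)[value_col].sum()
             .reset_index()
             .rename(columns={date_col: 'ds', value_col: 'y'}))
    if cap is not None:
        observed_max = out['y'].max()
        if cap <= observed_max:
            raise ValueError(f'cap={cap} must exceed the observed maximum {observed_max}')
        out['cap'] = cap
        if floor is not None:
            if floor >= cap:
                raise ValueError('floor must be below cap')
            out['floor'] = floor
    return out

# Made-up daily totals reported by two provinces
raw = pd.DataFrame({
    'Date': pd.to_datetime(['2020-04-10', '2020-04-10', '2020-04-11', '2020-04-11']),
    'Deaths': [1000, 1500, 1200, 1800],
})

frame = to_prophet_frame(raw, 'Date', 'Deaths', cap=10000, floor=0)
print(frame)

# A cap below the observed maximum (3,000 here) is rejected
try:
    to_prophet_frame(raw, 'Date', 'Deaths', cap=2500)
except ValueError as err:
    caught_message = str(err)
    print(caught_message)
```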

In [15]:
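# Confirmed-cases model: logistic growth bounded by floor and cap
m = Prophet(
    growth="logistic",
    interval_width=0.98,
    yearly_seasonality=False,
    weekly_seasonality=False,
    # Daily seasonality describes intra-day patterns, which daily totals cannot reveal
    daily_seasonality=True,
    seasonality_mode='additive'
    )

m.fit(confirmed_training_dataset)
future = m.make_future_dataframe(periods=50)
# The future frame needs the same cap/floor columns as the training data
future['cap'] = worldPop
future['floor'] = floorVar
confirmed_forecast = m.predict(future)

# Mortality rate model
m_mortality = Prophet()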
m_mortality.fit(mortality_training_dataset)
mortality_future = m_mortality.make_future_dataframe(periods=31)
mortality_forecast = m_mortality.predict(mortality_future)

# Deaths model
m2 = Prophet(interval_width=0.95,
            growth="logistic")
m2.fit(death_training_dataset)

future2 = m2.make_future_dataframe(periods=7)
future2['cap']=2500
future2['floor']=0
death_forecast = m2.predict(future2)

INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


In [16]:
fig = plot_plotly(m, confirmed_forecast)
annotations = []
annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1.10,
                              xanchor='left', yanchor='bottom',
                              text='Predictions for Total Confirmed cases (Global)',
                              font=dict(family='Arial',
                                        size=25,
                                        color='rgb(37,37,37)'),
                              showarrow=False))
fig.update_layout(annotations=annotations)
fig

In [17]:
fig = plot_plotly(m_mortality, mortality_forecast)
annotations = []
annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1.10,
                              xanchor='left', yanchor='bottom',
                              text='Predictions for Mortality Rate (Global)',
                              font=dict(family='Arial',
                                        size=25,
                                        color='rgb(37,37,37)'),
                              showarrow=False))
fig.update_layout(annotations=annotations)
fig

In [18]:
fig_death = plot_plotly(m2, death_forecast)  
annotations = []
annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1.10,
                              xanchor='left', yanchor='bottom',
                              text='Predictions for Deaths (Global)',
                              font=dict(family='Arial',
                                        size=25,
                                        color='rgb(37,37,37)'),
                              showarrow=False))
fig_death.update_layout(annotations=annotations)
fig_death
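Besides plotting, the frame returned by Prophet's `predict` can be inspected directly. It includes `ds`, the point forecast `yhat` and the interval bounds `yhat_lower`/`yhat_upper`, whose width is set by `interval_width`. Below is a sketch of a hypothetical helper that keeps only the days after the last observation, run on a synthetic stand-in for a forecast frame:

```python
import pandas as pd

def future_only(forecast, last_observed):
    """Return forecast rows dated after the last observed day (hypothetical helper)."""
    cols = ['ds', 'yhat', 'yhat_lower', 'yhat_upper']
    mask = forecast['ds'] > pd.Timestamp(last_observed)
    return forecast.loc[mask, cols].reset_index(drop=True)

# Synthetic stand-in for the frame returned by Prophet's predict() (made-up values)
fake_forecast = pd.DataFrame({
    'ds': pd.date_range('2020-04-10', periods=5, freq='D'),
    'yhat': [100.0, 110.0, 121.0, 133.0, 146.0],
    'yhat_lower': [90.0, 98.0, 105.0, 112.0, 119.0],
    'yhat_upper': [110.0, 122.0, 137.0, 154.0, 173.0],
    'trend': [100.0, 110.0, 121.0, 133.0, 146.0],
})

print(future_only(fake_forecast, '2020-04-12'))
```

For the deaths model, this would be `future_only(death_forecast, death_training_dataset['ds'].max())`.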

### Brazil:

Analysis of the spread of Covid-19 in Brazil.

Confirmed cases, deaths and recoveries, plus the day-by-day increase in each:

In [19]:
cases_Brazil = covid19.loc[covid19['Country/Region']=='Brazil'].copy()
cases_Brazil = cases_Brazil.groupby(['Date', 'Country/Region']).agg({'Confirmed':['sum'], 'Deaths':['sum'], 'Recovered':['sum']}).sort_values('Date', ascending = False)
cases_Brazil.columns = ['Confirmed', 'Deaths', 'Recovered']
cases_Brazil = cases_Brazil.reset_index()
# Rows are sorted newest first, so diff(-1) computes "this day minus the previous day"
# (plain diff() would subtract the later day and produce negative increases)
cases_Brazil['Increase_new_confirmed_per_day'] = cases_Brazil['Confirmed'].diff(-1)
cases_Brazil['Increase_new_deaths_per_day'] = cases_Brazil['Deaths'].diff(-1)
cases_Brazil['Increase_new_recovered_per_day'] = cases_Brazil['Recovered'].diff(-1)

cases_Brazil_confirmed = cases_Brazil[cases_Brazil['Confirmed']!=0]
cases_Brazil_confirmed

| | Date | Country/Region | Confirmed | Deaths | Recovered | Increase_new_confirmed_per_day | Increase_new_deaths_per_day | Increase_new_recovered_per_day |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020-04-11 | Brazil | 20727.0 | 1124.0 | 173.0 | 1089.0 | 67.0 | 0.0 |
| 1 | 2020-04-10 | Brazil | 19638.0 | 1057.0 | 173.0 | 1546.0 | 107.0 | 0.0 |
| 2 | 2020-04-09 | Brazil | 18092.0 | 950.0 | 173.0 | 1922.0 | 131.0 | 46.0 |
| 3 | 2020-04-08 | Brazil | 16170.0 | 819.0 | 127.0 | 2136.0 | 133.0 | 0.0 |
| 4 | 2020-04-07 | Brazil | 14034.0 | 686.0 | 127.0 | 1873.0 | 122.0 | 0.0 |
| 5 | 2020-04-06 | Brazil | 12161.0 | 564.0 | 127.0 | 1031.0 | 78.0 | 0.0 |
| 6 | 2020-04-05 | Brazil | 11130.0 | 486.0 | 127.0 | 770.0 | 41.0 | 0.0 |
| 7 | 2020-04-04 | Brazil | 10360.0 | 445.0 | 127.0 | 1304.0 | 86.0 | 0.0 |
| 8 | 2020-04-03 | Brazil | 9056.0 | 359.0 | 127.0 | 1012.0 | 35.0 | 0.0 |
| 9 | 2020-04-02 | Brazil | 8044.0 | 324.0 | 127.0 | 1208.0 | 84.0 | 0.0 |
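Because `cases_Brazil` is sorted newest first, the direction of `diff` matters. The default `diff()` subtracts the row above, which is the later day, so the increases come out negative. This sketch compares the options on made-up totals:

```python
import pandas as pd

# Made-up cumulative totals sorted newest first, as in cases_Brazil
newest_first = pd.Series([150, 120, 100],
                         index=pd.to_datetime(['2020-04-03', '2020-04-02', '2020-04-01']))

# diff() subtracts the row above (the LATER day); shifting keeps the flipped sign
flipped = newest_first.diff().shift(-1)

# diff(-1) subtracts the row below (the previous day): each day's increase
daily_increase = newest_first.diff(-1)

# Same result computed in ascending order and then re-sorted newest first
via_ascending = newest_first.sort_index().diff().sort_index(ascending=False)

print(flipped.tolist())         # [-30.0, -20.0, nan]
print(daily_increase.tolist())  # [30.0, 20.0, nan]
```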


Interactive Graph 03

All records in Brazil - confirmed, death and recovered cases:

In [20]:
fig = go.Figure()
fig.update_layout(title_text='All records in Brazil - confirmed, death and recovered cases:', 
                  xaxis_title='Period Date', yaxis_title='Cases',template='plotly_dark')

fig.add_trace(go.Scatter(x=cases_Brazil_confirmed['Date'], 
                        y=cases_Brazil_confirmed['Confirmed'], 
                        mode='lines+markers',
                        name='Confirmed',
                        line=dict(color='yellow', width=2)))

fig.add_trace(go.Scatter(x=cases_Brazil_confirmed['Date'], 
                        y=cases_Brazil_confirmed['Deaths'], 
                        mode='lines+markers',
                        name='Deaths',
                        line=dict(color='red', width=2)))

fig.add_trace(go.Scatter(x=cases_Brazil_confirmed['Date'], 
                        y=cases_Brazil_confirmed['Recovered'], 
                        mode='lines+markers',
                        name='Recovered',
                        line=dict(color='green', width=2)))

fig.show()

Interactive Graph 04

Growth rate of confirmed cases in Brazil, together with the death and recovery rates (as a percentage of confirmed cases):

In [21]:
cases_Brazil_rate = covid19.loc[covid19['Country/Region']=='Brazil']
cases_Brazil_rate = cases_Brazil_rate.groupby(['Date']).agg({'Deaths': ['sum'],'Recovered': ['sum'],'Confirmed': ['sum']})
cases_Brazil_rate.columns = ['Brazil_Deaths','Brazil_Recovered','Brazil_Confirmed']
cases_Brazil_rate = cases_Brazil_rate.reset_index()
# Increase from this day to the next (rows are in ascending date order here)
cases_Brazil_rate['Increase_cases_per_day_in_Brazil'] = cases_Brazil_rate['Brazil_Confirmed'].diff().shift(-1)

# Keep only days with at least one death and one confirmed case (this also prevents division by zero)
cases_Brazil_rate = cases_Brazil_rate[cases_Brazil_rate.Brazil_Deaths != 0]
cases_Brazil_rate = cases_Brazil_rate[cases_Brazil_rate.Brazil_Confirmed != 0].copy()

# Death and recovery rates as a percentage of confirmed cases
cases_Brazil_rate['Brazil_Deaths_rate_%'] = cases_Brazil_rate['Brazil_Deaths'] / cases_Brazil_rate['Brazil_Confirmed'] * 100
cases_Brazil_rate['Brazil_Recovered_rate_%'] = cases_Brazil_rate['Brazil_Recovered'] / cases_Brazil_rate['Brazil_Confirmed'] * 100
# Day-over-day growth of confirmed cases, shifted so each value sits on the day it describes
cases_Brazil_rate['Brazil_Growth_rate_%'] = cases_Brazil_rate['Increase_cases_per_day_in_Brazil'] / cases_Brazil_rate['Brazil_Confirmed'] * 100
cases_Brazil_rate['Brazil_Growth_rate_%'] = cases_Brazil_rate['Brazil_Growth_rate_%'].shift(1)



fig = go.Figure()
fig.update_layout(title_text='Brazil rate for growth confirmed, death and recovered cases:', 
                  xaxis_title='Period Date', yaxis_title='Rate', template='plotly_dark')

fig.add_trace(go.Scatter(x=cases_Brazil_rate['Date'], 
                         y=cases_Brazil_rate['Brazil_Deaths_rate_%'],
                         mode='lines+markers',
                         name='Brazil Death rate %',
                         line=dict(color='red', width=2)))

fig.add_trace(go.Scatter(x=cases_Brazil_rate['Date'], 
                         y=cases_Brazil_rate['Brazil_Recovered_rate_%'],
                         mode='lines+markers',
                         name='Brazil Recovery rate %',
                         line=dict(color='Green', width=2)))

fig.add_trace(go.Scatter(x=cases_Brazil_rate['Date'], 
                         y=cases_Brazil_rate['Brazil_Growth_rate_%'],
                         mode='lines+markers',
                         name='Brazil Growth Confirmed rate %',
                         line=dict(color='yellow', width=2)))

fig.show()

#### Regions of Brazil:

Confirmed cases for each region:

In [22]:
cases_Brazil_region = covid19Brazil.loc[:,['region', 'cases', 'date']].groupby(['region', 'date']).sum().reset_index().sort_values(['cases', 'date'], ascending=False)
cases_Brazil_region = cases_Brazil_region.drop_duplicates(subset = ['region'])
cases_Brazil_region = cases_Brazil_region.set_index('date')
cases_Brazil_region_confirmed = cases_Brazil_region[cases_Brazil_region["cases"]>0]
cases_Brazil_region_confirmed

| date | region | cases |
|---|---|---|
| 2020-04-12 | Sudeste | 12799 |
| 2020-04-12 | Nordeste | 4246 |
| 2020-04-12 | Sul | 2159 |
| 2020-04-12 | Norte | 1898 |
| 2020-04-12 | Centro-Oeste | 1067 |
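The cell above takes each region's largest total by sorting and dropping duplicates. For cumulative counts, that is normally the most recent date, but a downward revision of a state's count would make the two differ. The alternative sketch below filters to the latest date before aggregating. It assumes dates are stored as ISO `YYYY-MM-DD` strings, as in the outputs shown, and uses made-up rows and region names:

```python
import pandas as pd

# Made-up rows in the brazil_covid19.csv layout (date, region, state, cases, deaths)
rows = pd.DataFrame({
    'date': ['2020-04-11', '2020-04-11', '2020-04-12', '2020-04-12', '2020-04-12'],
    'region': ['R1', 'R2', 'R1', 'R1', 'R2'],
    'state': ['S1', 'S3', 'S1', 'S2', 'S3'],
    'cases': [10, 5, 12, 7, 6],
    'deaths': [1, 0, 1, 2, 0],
})

latest = rows['date'].max()  # ISO 'YYYY-MM-DD' strings sort chronologically as text
by_region = (rows[rows['date'] == latest]
             .groupby('region')[['cases', 'deaths']]
             .sum()
             .sort_values('cases', ascending=False))
print(by_region)
```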


Interactive Graph 05

Comparison of confirmed cases across regions:

In [23]:
fig = go.Figure()
fig.update_layout(
    title_text='Confirmed cases by region to date',
    height=400, width=500, xaxis_title='Regions', yaxis_title='Confirmed Cases')

fig.add_trace(go.Bar(
                x=cases_Brazil_region_confirmed["region"],
                y=cases_Brazil_region_confirmed["cases"],
                name='Confirmed cases',
                marker_color='darkcyan',
                marker_line_color='rgb(8,48,107)',
                marker_line_width=2, 
                opacity=0.7)
             )

fig.show()

Deaths in each region:

In [24]:
cases_Brazil_region_deaths = covid19Brazil.loc[:,['region', 'deaths', 'date']].groupby(['region', 'date']).sum().reset_index().sort_values(['deaths', 'date'], ascending=False)
cases_Brazil_region_deaths = cases_Brazil_region_deaths.drop_duplicates(subset = ['region'])
cases_Brazil_region_deaths = cases_Brazil_region_deaths.set_index('date')
cases_Brazil_region_deaths_confirmed = cases_Brazil_region_deaths[cases_Brazil_region_deaths["deaths"]>0]
cases_Brazil_region_deaths_confirmed

| date | region | deaths |
|---|---|---|
| 2020-04-12 | Sudeste | 787 |
| 2020-04-12 | Nordeste | 246 |
| 2020-04-12 | Norte | 87 |
| 2020-04-12 | Sul | 70 |
| 2020-04-12 | Centro-Oeste | 33 |


Interactive Graph 06

Comparison of deaths across regions:

In [25]:
fig2 = go.Figure()
fig2.update_layout(
    title_text='Deaths by region to date',
    height=400, width=500, xaxis_title='Regions', yaxis_title='Deaths')
fig2.add_trace(go.Bar(
                x=cases_Brazil_region_deaths_confirmed["region"],
                y=cases_Brazil_region_deaths_confirmed["deaths"],
                name='Deaths',
                marker_color='red',
                marker_line_color='rgb(8,48,107)',
                marker_line_width=2, 
                opacity=0.7)
             )

fig2.show()

#### States of Brazil:

Confirmed cases and deaths in each state of Brazil:

In [26]:
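# Sum only the numeric columns; summing the text column 'region' is not meaningful
cases_Brazil_state = covid19Brazil.groupby(['state', 'date'])[['cases', 'deaths']].sum().reset_index().sort_values(['cases', 'deaths','date'], ascending=False)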
cases_Brazil_state = cases_Brazil_state.drop_duplicates(subset = ['state'])
cases_Brazil_state = cases_Brazil_state.set_index('date')

cases_Brazil_state_confirmed = cases_Brazil_state[cases_Brazil_state["cases"]>0]
cases_Brazil_state_confirmed

| date | state | cases | deaths |
|---|---|---|---|
| 2020-04-12 | São Paulo | 8755 | 588 |
| 2020-04-12 | Rio de Janeiro | 2855 | 170 |
| 2020-04-12 | Ceará | 1676 | 74 |
| 2020-04-12 | Amazonas | 1206 | 62 |
| 2020-04-12 | Pernambuco | 960 | 85 |
| 2020-04-12 | Minas Gerais | 806 | 20 |
| 2020-04-12 | Santa Catarina | 768 | 24 |
| 2020-04-12 | Paraná | 738 | 30 |
| 2020-04-12 | Bahia | 673 | 21 |
| 2020-04-12 | Rio Grande do Sul | 653 | 16 |


Interactive Graph 07

Confirmed cases in each state of Brazil:

In [27]:
fig = go.Figure()

fig.add_trace(go.Bar(
                x=cases_Brazil_state_confirmed["state"],
                y=cases_Brazil_state_confirmed["cases"],
                marker_color='darkcyan',
                marker_line_color='rgb(8,48,107)',
                marker_line_width=2, 
                opacity=0.7)
             )

fig.update_layout(
    title_text='Confirmed Cases by States of Brazil to Date',
    height=700, width=800, xaxis_title='States', yaxis_title='Confirmed Cases')

fig.show()

Interactive Graph 08

Deaths in each state of Brazil:

In [28]:
cases_Brazil_state_graph_Deaths = covid19Brazil.groupby(['state', 'date'])[['cases', 'deaths']].sum().reset_index().sort_values(['deaths', 'date'], ascending=False)
cases_Brazil_state_graph_Deaths = cases_Brazil_state_graph_Deaths.drop_duplicates(subset = ['state'])
cases_Brazil_state_graph_Deaths = cases_Brazil_state_graph_Deaths.set_index('date')
# Filter with this frame's own column; a mask taken from another DataFrame need not line up row by row
cases_Brazil_state_graph_Deaths_conf = cases_Brazil_state_graph_Deaths[cases_Brazil_state_graph_Deaths["cases"]>0]

fig_deaths = go.Figure()

fig_deaths.add_trace(go.Bar(
                x=cases_Brazil_state_graph_Deaths_conf["state"],
                y=cases_Brazil_state_graph_Deaths_conf["deaths"],
                marker_color='red',
                marker_line_color='rgb(8,48,107)',
                marker_line_width=2, 
                opacity=0.7)
             )

fig_deaths.update_layout(
    title_text='Deaths Cases by States of Brazil to Date',
    height=700, width=800, xaxis_title='States', yaxis_title='Deaths')

fig_deaths.show()

#### State of Pernambuco (where I live)

Confirmed cases and deaths in Pernambuco:

In [29]:
cases_Brazil_state_Pernambuco = covid19Brazil.loc[covid19Brazil['state']=='Pernambuco']
cases_Brazil_state_Pernambuco = cases_Brazil_state_Pernambuco.groupby(['date']).agg({'cases':['sum'], 'deaths':['sum']}).sort_values('date', ascending = False)
cases_Brazil_state_Pernambuco.columns = ['cases', 'deaths']
cases_Brazil_state_Pernambuco = cases_Brazil_state_Pernambuco.reset_index()

cases_Brazil_state_Pernambuco_confirmed = cases_Brazil_state_Pernambuco[cases_Brazil_state_Pernambuco['cases']!=0]
cases_Brazil_state_Pernambuco_confirmed.style.background_gradient(cmap='Reds')

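| | date | cases | deaths |
|---|---|---|---|
| 0 | 2020-04-12 | 960 | 85 |
| 1 | 2020-04-11 | 816 | 72 |
| 2 | 2020-04-10 | 684 | 65 |
| 3 | 2020-04-09 | 555 | 56 |
| 4 | 2020-04-08 | 401 | 46 |
| 5 | 2020-04-07 | 352 | 34 |
| 6 | 2020-04-06 | 223 | 30 |
| 7 | 2020-04-05 | 201 | 21 |
| 8 | 2020-04-04 | 176 | 14 |
| 9 | 2020-04-03 | 136 | 10 |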


Interactive Graph 09

Confirmed cases and deaths in Pernambuco:

In [30]:
fig = go.Figure()
fig.update_layout(title_text='Confirmed and Deaths cases in Pernambuco', 
                  xaxis_title='Period Date', yaxis_title='Cases', template='seaborn', width=1200, height=600)

fig.add_trace(go.Scatter(x=cases_Brazil_state_Pernambuco_confirmed['date'], 
                        y=cases_Brazil_state_Pernambuco_confirmed['cases'], 
                        mode='lines+markers',
                        name='Confirmed',
                        line=dict(color='darkcyan', width=2)))

fig.add_trace(go.Scatter(x=cases_Brazil_state_Pernambuco_confirmed['date'], 
                        y=cases_Brazil_state_Pernambuco_confirmed['deaths'], 
                        mode='lines+markers',
                        name='Deaths',
                        line=dict(color='red', width=2)))

fig.show()

Interactive Graph 10

Death rate and growth rate of confirmed cases in Pernambuco (%):

In [31]:
cases_Brazil_state_Pernambuco_rate = covid19Brazil.loc[covid19Brazil['state']=='Pernambuco']
cases_Brazil_state_Pernambuco_rate = cases_Brazil_state_Pernambuco_rate.groupby(['date']).agg({'deaths': ['sum'],'cases': ['sum']})
cases_Brazil_state_Pernambuco_rate.columns = ['Pernambuco_Deaths','Pernambuco_Cases']
cases_Brazil_state_Pernambuco_rate = cases_Brazil_state_Pernambuco_rate.reset_index()
# Increase from this day to the next (rows are in ascending date order here)
cases_Brazil_state_Pernambuco_rate['Increase_cases_per_day_in_Pernambuco'] = cases_Brazil_state_Pernambuco_rate['Pernambuco_Cases'].diff().shift(-1)

# Keep only days with at least one death and one case (this also prevents division by zero)
cases_Brazil_state_Pernambuco_rate = cases_Brazil_state_Pernambuco_rate[cases_Brazil_state_Pernambuco_rate.Pernambuco_Deaths != 0]
cases_Brazil_state_Pernambuco_rate = cases_Brazil_state_Pernambuco_rate[cases_Brazil_state_Pernambuco_rate.Pernambuco_Cases != 0].copy()

# Death rate and day-over-day growth of confirmed cases, in %
cases_Brazil_state_Pernambuco_rate['Pernambuco_Deaths_rate_%'] = cases_Brazil_state_Pernambuco_rate['Pernambuco_Deaths'] / cases_Brazil_state_Pernambuco_rate['Pernambuco_Cases'] * 100
cases_Brazil_state_Pernambuco_rate['Pernambuco_Growth_rate_%'] = cases_Brazil_state_Pernambuco_rate['Increase_cases_per_day_in_Pernambuco'] / cases_Brazil_state_Pernambuco_rate['Pernambuco_Cases'] * 100
cases_Brazil_state_Pernambuco_rate['Pernambuco_Growth_rate_%'] = cases_Brazil_state_Pernambuco_rate['Pernambuco_Growth_rate_%'].shift(1)



fig = go.Figure()
fig.update_layout(title_text='Confirmed and Deaths cases in Pernambuco - Rate %', 
                  xaxis_title='Period Date', yaxis_title='Rate', template='seaborn', width=1200, height=600)
fig.add_trace(go.Scatter(x=cases_Brazil_state_Pernambuco_rate['date'], 
                         y=cases_Brazil_state_Pernambuco_rate['Pernambuco_Deaths_rate_%'],
                         mode='lines+markers',
                         name='Death rate %',
                         line=dict(color='red', width=2)))

fig.add_trace(go.Scatter(x=cases_Brazil_state_Pernambuco_rate['date'], 
                         y=cases_Brazil_state_Pernambuco_rate['Pernambuco_Growth_rate_%'],
                         mode='lines+markers',
                         name='Confirmed rate %',
                         line=dict(color='darkcyan', width=2)))
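Because the counts are cumulative, the death rate divides cumulative deaths by cumulative confirmed cases, and the growth rate is the day-over-day relative change of the cumulative case count (the same ratio pandas' `pct_change()` returns). A minimal sketch on made-up numbers (not real Pernambuco data) shows both rates and the daily new cases recovered with `diff()`:

```python
import pandas as pd

# Toy cumulative series (hypothetical numbers, not real data)
df = pd.DataFrame({
    "date": pd.date_range("2020-04-01", periods=4, freq="D"),
    "cases": [100, 120, 150, 150],
    "deaths": [5, 6, 9, 10],
})

# Death rate (%): cumulative deaths / cumulative confirmed cases
df["death_rate_%"] = df["deaths"] / df["cases"] * 100

# Daily new cases recovered from the cumulative counts (first day is NaN)
df["new_cases"] = df["cases"].diff()

# Day-over-day growth (%): (cases[t] - cases[t-1]) / cases[t-1]
df["growth_rate_%"] = df["new_cases"] / df["cases"].shift(1) * 100

print(df)
```

A day with no new confirmed cases shows a growth rate of 0 even though the cumulative curve stays flat at its previous level.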

####Predictions for Brazil using the Prophet algorithm, a procedure for forecasting time series data

In [0]:
# Aggregate the state-level rows into national cumulative totals per date
cases_Brazil_predictions = covid19Brazil.groupby(['date']).agg({'deaths': ['sum'], 'cases': ['sum']})
cases_Brazil_predictions.columns = ['Brazil_Deaths', 'Brazil_Confirmed']
cases_Brazil_predictions = cases_Brazil_predictions.reset_index()

# Keep only days with at least one death and one confirmed case (this also prevents division by zero)
cases_Brazil_predictions = cases_Brazil_predictions.query('Brazil_Deaths != 0 and Brazil_Confirmed != 0').copy()

# Mortality rate (%): cumulative deaths / cumulative confirmed cases
cases_Brazil_predictions['Brazil_mortality_rate_%'] = (
    cases_Brazil_predictions['Brazil_Deaths'] / cases_Brazil_predictions['Brazil_Confirmed'] * 100)
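Dropping days with zero confirmed cases is one way to avoid dividing by zero. If those days need to stay in the frame, a vectorized guard is to turn zero denominators into `NaN` before dividing, so the result is `NaN` rather than `inf`. A minimal sketch, using a hypothetical helper `mortality_rate_pct` and toy numbers:

```python
import numpy as np
import pandas as pd

def mortality_rate_pct(deaths: pd.Series, confirmed: pd.Series) -> pd.Series:
    """Hypothetical helper: deaths / confirmed * 100, NaN where confirmed is 0."""
    # Replacing 0 with NaN makes the division yield NaN instead of inf
    return deaths / confirmed.replace(0, np.nan) * 100

rates = mortality_rate_pct(pd.Series([0, 2, 10]), pd.Series([0, 50, 200]))
print(rates)
```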

In [0]:
# Bounds for the logistic-growth model of confirmed cases.
# BrazilPop is an assumed ceiling of 10 million confirmed cases, not Brazil's population (about 210 million in 2020).
floorVar = 0
BrazilPop = 10000000

# Modelling total confirmed cases: Prophet expects a 'ds' (date) column and a 'y' (value) column
Brazil_confirmed_training_dataset = covid19Brazil.groupby('date')['cases'].sum().reset_index().rename(columns={'date': 'ds', 'cases': 'y'})
Brazil_confirmed_training_dataset['floor'] = floorVar
Brazil_confirmed_training_dataset['cap'] = BrazilPop

# Modelling mortality rate
Brazil_mortality_training_dataset = cases_Brazil_predictions.groupby('date')['Brazil_mortality_rate_%'].mean().reset_index().rename(columns={'date': 'ds', 'Brazil_mortality_rate_%': 'y'})

# Modelling deaths; the ceiling of 2500 deaths is a hard-coded assumption
Brazil_death_training_dataset = covid19Brazil.groupby('date')['deaths'].sum().reset_index().rename(columns={'date': 'ds', 'deaths': 'y'})
Brazil_death_training_dataset['floor'] = 0
Brazil_death_training_dataset['cap'] = 2500
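Prophet trains on a frame with a `ds` column (dates) and a `y` column (values); with `growth="logistic"` every row also needs a `cap`, and optionally a `floor`. The sketch below mirrors that preparation on toy long-format data (one row per state per day). The helper `to_prophet_frame` is hypothetical and the counts are made up:

```python
import pandas as pd

# Hypothetical long-format data: one row per state per date, cumulative counts
raw = pd.DataFrame({
    "date": ["2020-04-01", "2020-04-01", "2020-04-02", "2020-04-02"],
    "state": ["Pernambuco", "São Paulo", "Pernambuco", "São Paulo"],
    "cases": [10, 100, 15, 130],
})

def to_prophet_frame(df, value_col, cap, floor=0):
    """Hypothetical helper: national daily totals in Prophet's ds/y layout."""
    # Sum the states into one national value per day
    out = df.groupby("date", as_index=False)[value_col].sum()
    # Prophet expects the timestamp column 'ds' and the target column 'y'
    out = out.rename(columns={"date": "ds", value_col: "y"})
    out["ds"] = pd.to_datetime(out["ds"])
    # Logistic growth needs the saturation bounds on every row
    out["cap"] = cap
    out["floor"] = floor
    return out

train = to_prophet_frame(raw, "cases", cap=10_000_000)
print(train)
```

The same `cap` and `floor` columns have to be added to the frame returned by `make_future_dataframe` before calling `predict`.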

In [34]:
from fbprophet import Prophet

# Total confirmed cases model: logistic growth between 'floor' and 'cap'
m_Brazil = Prophet(
    growth="logistic",
    interval_width=0.98,
    yearly_seasonality=False,
    weekly_seasonality=False,
    # The data has one row per day, so a within-day (daily) seasonality cannot be estimated
    daily_seasonality=False,
    seasonality_mode='additive'
    )

m_Brazil.fit(Brazil_confirmed_training_dataset)
# Forecast 50 days ahead; logistic growth also needs 'cap' and 'floor' in the future frame
future_Brazil = m_Brazil.make_future_dataframe(periods=50)
future_Brazil['cap'] = BrazilPop
future_Brazil['floor'] = floorVar
confirmed_forecast_Brazil = m_Brazil.predict(future_Brazil)

# Mortality rate model (default linear growth), 31 days ahead
m_Brazil_mortality = Prophet()
m_Brazil_mortality.fit(Brazil_mortality_training_dataset)
mortality_future_Brazil = m_Brazil_mortality.make_future_dataframe(periods=31)
mortality_forecast_Brazil = m_Brazil_mortality.predict(mortality_future_Brazil)

# Deaths model: logistic growth with the same bounds used in training, 7 days ahead
m2_Brazil = Prophet(interval_width=0.95,
                    growth="logistic")
m2_Brazil.fit(Brazil_death_training_dataset)

future2_Brazil = m2_Brazil.make_future_dataframe(periods=7)
future2_Brazil['cap'] = 2500
future2_Brazil['floor'] = 0
death_forecast_Brazil = m2_Brazil.predict(future2_Brazil)

INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
INFO:fbprophet:n_changepoints greater than number of observations. Using 20.
INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
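Ignoring changepoints, Prophet's logistic trend is roughly a curve that rises from `floor` toward `cap`, g(t) = floor + (cap - floor) / (1 + exp(-k (t - m))). This is why `BrazilPop` and the hard-coded 2500 matter: the trend component of the confirmed and death forecasts cannot rise above them. A simplified NumPy sketch of that curve; the growth rate `k` and midpoint `m` values are illustrative assumptions, whereas Prophet estimates them and lets them change at changepoints:

```python
import numpy as np

def logistic_trend(t, cap, floor=0.0, k=0.1, m=0.0):
    """Saturating growth between floor and cap (simplified: no changepoints)."""
    t = np.asarray(t, dtype=float)
    return floor + (cap - floor) / (1.0 + np.exp(-k * (t - m)))

days = np.arange(0, 121)

# Confirmed-cases style curve with the assumed 10 million ceiling
curve = logistic_trend(days, cap=10_000_000, k=0.1, m=60)

# Deaths style curve with the hard-coded 2500 ceiling
deaths_curve = logistic_trend(days, cap=2500, floor=0, k=0.1, m=60)

# At the midpoint m the curve sits halfway between floor and cap
print(curve[60], deaths_curve[60])
```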


In [35]:
fig_confirmed_Brazil = plot_plotly(m_Brazil, confirmed_forecast_Brazil)
annotations = []
annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1.10,
                              xanchor='left', yanchor='bottom',
                              text='Predictions for Total Confirmed cases in Brazil',
                              font=dict(family='Arial',
                                        size=25,
                                        color='rgb(37,37,37)'),
                              showarrow=False))
fig_confirmed_Brazil.update_layout(annotations=annotations)
fig_confirmed_Brazil

In [36]:
fig_death_Brazil_rate = plot_plotly(m_Brazil_mortality, mortality_forecast_Brazil)
annotations = []
annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1.10,
                              xanchor='left', yanchor='bottom',
                              text='Predictions for Mortality Rate in Brazil',
                              font=dict(family='Arial',
                                        size=25,
                                        color='rgb(37,37,37)'),
                              showarrow=False))
fig_death_Brazil_rate.update_layout(annotations=annotations)
fig_death_Brazil_rate

In [37]:
fig_death_Brazil = plot_plotly(m2_Brazil, death_forecast_Brazil)  
annotations = []
annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1.10,
                              xanchor='left', yanchor='bottom',
                              text='Predictions for Deaths in Brazil',
                              font=dict(family='Arial',
                                        size=25,
                                        color='rgb(37,37,37)'),
                              showarrow=False))
fig_death_Brazil.update_layout(annotations=annotations)
fig_death_Brazil
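The three plotting cells above build the same title annotation and differ only in its text. A small hypothetical helper could keep the styling in one place; this is a sketch, not part of Prophet or Plotly:

```python
def title_annotation(text, size=25):
    """Hypothetical helper: the title annotation dict repeated in the plotting cells."""
    return dict(
        xref="paper", yref="paper", x=0.0, y=1.10,
        xanchor="left", yanchor="bottom",
        text=text,
        font=dict(family="Arial", size=size, color="rgb(37,37,37)"),
        showarrow=False,
    )

# Usage in a notebook cell, once a forecast figure exists, e.g.:
# fig_death_Brazil.update_layout(annotations=[title_annotation("Predictions for Deaths in Brazil")])
```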

Report in progress; more analysis will be added over the next few days.