**Introduction**

This is the first Python notebook I have written and shared on Kaggle.
My goal is to practice my Python and data science skills and to learn the Plotly graphing libraries (https://plot.ly/graphing-libraries/). Along the way I will share some insights I found in this dataset about the coronavirus in Brazil.
I should point out that I have no medical background.

**COVID-19 in Brazil**

This dataset is updated daily by the Ministry of Health and contains the cases reported since the first case of COVID-19 in Brazil.

Many thanks to Raphael Fontes, who prepared this dataset. I recommend his article explaining how to collect the COVID-19 data from the Brazilian Ministry of Health website.

Links:
* [Raphael Fontes explains how he prepared the dataset](https://medium.com/mindstorm-%EF%B8%8F/coletando-e-armazenando-os-n%C3%BAmeros-de-casos-do-coronav%C3%ADrus-no-brasil-73c4f5909514)
* [COVID-19 - Brazil - Kaggle dataset](https://www.kaggle.com/unanimad/corona-virus-brazil)
* [Original data source - Ministry of Health](http://plataforma.saude.gov.br/novocoronavirus/)
* [Veja Abril Charts used for validation](https://veja.abril.com.br/saude/a-epidemia-de-coronavirus-no-brasil-em-tempo-real/)

Notes/Questions:
On some dates the Suspects and Negative counts decrease. This is unexpected, because the dataset holds cumulative totals of reported cases across Brazil. (The Veja website does not use these two fields.)
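
The dates on which a cumulative count goes down can be located programmatically. The following is only a minimal sketch and not part of the original analysis: the table `toy` holds made-up numbers, it assumes the column names used later in this notebook (`Date`, `State`, `Suspects`, `Negative`), and `find_decreases` is a hypothetical helper name.

```python
import pandas as pd

# Made-up cumulative counts for one state; NOT taken from the real dataset.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-04"]),
    "State": ["São Paulo"] * 4,
    "Suspects": [10, 25, 18, 30],
    "Negative": [1, 3, 3, 2],
})

def find_decreases(df, columns):
    """Return rows where any of `columns` is lower than on the previous date of the same state."""
    df = df.sort_values(["State", "Date"])
    # Day-over-day change within each state; a negative value means the running total dropped.
    deltas = df.groupby("State")[columns].diff()
    mask = (deltas < 0).any(axis=1)
    return df.loc[mask, ["Date", "State"] + columns]

decreases = find_decreases(toy, ["Suspects", "Negative"])
print(decreases)
```

On the real data the same helper could be called as `find_decreases(COVID19_BR, ["Suspects", "Negative"])` once the columns have been renamed.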

Next Step:
Work on geo data

**Importing Libraries and ETL process**

In [1]:
import numpy as np
import pandas as pd
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot, plot
import plotly.express as px
from plotly.subplots import make_subplots

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
# Any results you write to the current directory are saved as output.

#Loading database
COVID19_BR = pd.read_csv("../input/corona-virus-brazil/brazil_covid19.csv")
#Transforming data
COVID19_BR.date = pd.to_datetime(COVID19_BR.date)
COVID19_BR.hour = pd.to_datetime(COVID19_BR.hour, format='%H:%M').dt.time
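# Series.dt.week was removed in pandas 2.0; isocalendar().week returns the same ISO week number
COVID19_BR["Week_Number"] = COVID19_BR.date.dt.isocalendar().week.astype("int64")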

#Removing duplicated lines (when updated more than once a day)
COVID19_BR = COVID19_BR.drop_duplicates(subset=["date","state"], keep = "last")

#Rename columns
COVID19_BR.columns = ['Date', 'Hour', 'State', 'Suspects', 'Negative', 'Confirmed', "Deaths","Week_Number"]

#reading data infos
COVID19_BR.info()

/kaggle/input/corona-virus-brazil/brazil_covid19.csv
<class 'pandas.core.frame.DataFrame'>
Int64Index: 669 entries, 0 to 683
Data columns (total 8 columns):
Date           669 non-null datetime64[ns]
Hour           669 non-null object
State          669 non-null object
Suspects       669 non-null int64
Negative       669 non-null int64
Confirmed      669 non-null int64
Deaths         669 non-null int64
Week_Number    669 non-null int64
dtypes: datetime64[ns](1), int64(5), object(2)
memory usage: 47.0+ KB
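
The de-duplication step in the cell above keeps one row per date and state. The sketch below shows how `drop_duplicates(..., keep="last")` behaves on a made-up table (the `cases` column and all values are illustrative only). Sorting by `hour` first is an extra precaution I am assuming here, not something the original cell does; it relies on the hours being zero-padded `HH:MM` strings.

```python
import pandas as pd

# Made-up rows: the same state reported twice on the same date.
toy = pd.DataFrame({
    "date": ["2020-03-17", "2020-03-17", "2020-03-18"],
    "hour": ["16:30", "10:00", "16:00"],
    "state": ["São Paulo", "São Paulo", "São Paulo"],
    "cases": [164, 150, 240],
})

# Sort so the latest update of each day comes last, then keep only that row.
deduped = (
    toy.sort_values(["date", "state", "hour"])
       .drop_duplicates(subset=["date", "state"], keep="last")
)
print(deduped)
```

Without the sort, `keep="last"` keeps whichever duplicate appears last in the file, which is only the latest update if the file is already in chronological order.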


**Data overview**


In [2]:
# Select several columns with a list; tuple-style selection no longer works in pandas 2.0
temp = COVID19_BR.groupby('State', as_index=False)[["Suspects","Negative","Confirmed","Deaths"]].max()
msg = """
This dataset contains the cases published by the Ministry of Health since the first case of COVID-19 in Brazil.
The first record in the dataset is from """ + str(COVID19_BR.Date.dt.date.min()) + """
and the latest record is from """ + str(COVID19_BR.Date.dt.date.max()) + """
So far, the Ministry of Health has recorded """ + str(temp.Suspects.sum()) + """ suspected cases, """ + str(temp.Negative.sum()) + """ negative cases, """ + str(temp.Confirmed.sum()) + """ confirmed cases and """ + str(temp.Deaths.sum()) + """ deaths.
"""
print(msg)


This dataset contains the cases published by the Ministry of Health since the first case of COVID-19 in Brazil.
The first record in the dataset is from 2020-01-30
and the latest record is from 2020-03-18
So far, the Ministry of Health has recorded 11280 suspected cases, 2074 negative cases, 428 confirmed cases and 4 deaths.
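
The totals above take the maximum of each column per state and then sum over states. Because some cumulative columns occasionally decrease, the maximum can differ from the most recent value. The sketch below contrasts the two on made-up numbers; states `A` and `B` and every value in `toy` are hypothetical.

```python
import pandas as pd

# Made-up cumulative counts; state A's total drops on the last day.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-16", "2020-03-17", "2020-03-18"] * 2),
    "State": ["A"] * 3 + ["B"] * 3,
    "Suspects": [100, 120, 90, 10, 15, 20],
})

# Approach used in the cell above: highest value ever reported per state.
total_by_max = toy.groupby("State")["Suspects"].max().sum()

# Alternative: value on the most recent date per state.
total_by_latest = toy.sort_values("Date").groupby("State")["Suspects"].last().sum()

print(total_by_max, total_by_latest)
```

For columns that never decrease, such as a well-behaved cumulative count, both approaches give the same total.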



To validate my analysis, I compared my numbers with those on the Veja website.

In [3]:
temp = COVID19_BR.groupby('Date', as_index=False)['Confirmed'].sum()
temp = temp[(temp.Confirmed>0)]
x1 = temp.Date
y1 = temp.Confirmed
temp = COVID19_BR.groupby('Date', as_index=False)['Deaths'].sum()
temp = temp[(temp.Deaths>0)]
x2 = temp.Date
y2 = temp.Deaths

fig = make_subplots(rows=1, cols=2)

fig.add_trace(    
    go.Scatter(x = x1, y = y1, name = y1.name,
                         line=dict(color='green', width=2)),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(x = x2, y = y2, name = y2.name,
                         line=dict(color='darkred', width=2)),
    row=1, col=2
)

fig.update_layout({
"plot_bgcolor": "rgba(0, 0, 0, 0)",
"paper_bgcolor": "rgba(0, 0, 0, 0)",
})

fig.update_layout(title_text="Number of cases - COVID-19 in Brazil")

fig.show()
print("Data validation: https://veja.abril.com.br/saude/a-epidemia-de-coronavirus-no-brasil-em-tempo-real/")

Data validation: https://veja.abril.com.br/saude/a-epidemia-de-coronavirus-no-brasil-em-tempo-real/


**Exploratory Data Analysis**

Table 1 - Cases per Brazilian State

In [4]:
print("Cases per Brazilian State - sorted by Confirmed Cases")
temp = COVID19_BR.groupby('State', as_index=False)[["Suspects","Negative","Confirmed","Deaths"]].max()
temp.sort_values("Confirmed",ascending = False).style.background_gradient(cmap='Reds')

Cases per Brazilian State - sorted by Confirmed Cases


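| | State | Suspects | Negative | Confirmed | Deaths |
|---|---|---|---|---|---|
| 25 | São Paulo | 5334 | 709 | 240 | 4 |
| 20 | Rio de Janeiro | 1254 | 194 | 45 | 0 |
| 6 | Distrito Federal | 327 | 107 | 26 | 0 |
| 19 | Rio Grande do Sul | 416 | 330 | 19 | 0 |
| 16 | Pernambuco | 89 | 33 | 16 | 0 |
| 12 | Minas Gerais | 925 | 104 | 15 | 0 |
| 13 | Paraná | 400 | 119 | 13 | 0 |
| 23 | Santa Catarina | 346 | 55 | 10 | 0 |
| 5 | Ceará | 493 | 95 | 9 | 0 |
| 7 | Espírito Santo | 71 | 21 | 9 | 0 |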


Bar Chart - Cases per Brazilian State

In [5]:
col = "Suspects"
temp = COVID19_BR.groupby('State', as_index=False)[col].max()
temp = temp[temp[col]>0]

print("Cases per Brazilian State -  "+temp[col].name+" Cases")

fig = px.bar(temp, x="State", y=col, text=col)
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.update_layout(
    title= "COVID-19 - Number of " + temp[col].name + " cases in Brazilian States")
fig.update_layout({
"plot_bgcolor": "rgba(0, 0, 0, 0)",
"paper_bgcolor": "rgba(0, 0, 0, 0)",
})

fig.show()

Cases per Brazilian State -  Suspects Cases


In [6]:
col = "Negative"
temp = COVID19_BR.groupby('State', as_index=False)[col].max()
temp = temp[temp[col]>0]

print("Cases per Brazilian State -  "+temp[col].name+" Cases")

fig = px.bar(temp, x="State", y=col, text=col)
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.update_layout(
    title= "COVID-19 - Number of " + temp[col].name + " cases in Brazilian States")
fig.update_layout({
"plot_bgcolor": "rgba(0, 0, 0, 0)",
"paper_bgcolor": "rgba(0, 0, 0, 0)",
})

fig.show()


Cases per Brazilian State -  Negative Cases


In [7]:
col = "Confirmed"
temp = COVID19_BR.groupby('State', as_index=False)[col].max()
temp = temp[temp[col]>0]

print("Cases per Brazilian State -  "+temp[col].name+" Cases")

fig = px.bar(temp, x="State", y=col, text=col)
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.update_layout(
    title= "COVID-19 - Number of " + temp[col].name + " cases in Brazilian States")
fig.update_layout({
"plot_bgcolor": "rgba(0, 0, 0, 0)",
"paper_bgcolor": "rgba(0, 0, 0, 0)",
})

fig.show()


Cases per Brazilian State -  Confirmed Cases


In [8]:
col = "Deaths"
temp = COVID19_BR.groupby('State', as_index=False)[col].max()
temp = temp[temp[col]>0]

print("Cases per Brazilian State -  "+temp[col].name+" Cases")

fig = px.bar(temp, x="State", y=col, text=col)
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.update_layout(
    title= "COVID-19 - Number of " + temp[col].name + " cases in Brazilian States")
fig.update_layout({
"plot_bgcolor": "rgba(0, 0, 0, 0)",
"paper_bgcolor": "rgba(0, 0, 0, 0)",
})

fig.show()


Cases per Brazilian State -  Deaths Cases
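
The four bar-chart cells above differ only in the value of `col`, so they could be folded into one function. The following is a sketch under that assumption; `state_totals` and `plot_state_bars` are hypothetical helper names and `toy` holds made-up values. Plotly is imported inside the plotting helper so that the aggregation part can be tried without it.

```python
import pandas as pd

def state_totals(df, col):
    """Maximum reported value of `col` per state, largest first, zero rows dropped."""
    totals = df.groupby("State", as_index=False)[col].max()
    totals = totals[totals[col] > 0]
    return totals.sort_values(col, ascending=False).reset_index(drop=True)

def plot_state_bars(df, col):
    """Build the same bar chart as the cells above for any column."""
    import plotly.express as px  # imported lazily; only needed for plotting
    fig = px.bar(state_totals(df, col), x="State", y=col, text=col)
    fig.update_layout(
        title="COVID-19 - Number of " + col + " cases in Brazilian States",
        xaxis={"categoryorder": "total descending"},
        plot_bgcolor="rgba(0, 0, 0, 0)",
        paper_bgcolor="rgba(0, 0, 0, 0)",
    )
    return fig

# Made-up data with hypothetical states A, B and C.
toy = pd.DataFrame({
    "State": ["A", "A", "B", "B", "C", "C"],
    "Confirmed": [1, 5, 0, 0, 2, 9],
})
totals = state_totals(toy, "Confirmed")
print(totals)
```

With the real data, a loop such as `for col in ["Suspects", "Negative", "Confirmed", "Deaths"]: plot_state_bars(COVID19_BR, col).show()` would draw all four charts.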


Chart - Cases over time in Brazil
> Suspects and Negative cases sometimes decrease over time.

In [9]:
temp = COVID19_BR.groupby('Date', as_index=False)[['Suspects', 'Negative', 'Confirmed', 'Deaths']].sum()
temp.head()
x1 = temp.Date
y1 = temp.Suspects
y2 = temp.Negative
y3 = temp.Confirmed
y4 = temp.Deaths

trace1 = go.Scatter(x = x1, y = y1, name = y1.name,
                         line=dict(color='orange', width=2))
trace2 = go.Scatter(x = x1, y = y2, name = y2.name,
                         line=dict(color='blue', width=2, dash='dot'))
trace3 = go.Scatter(x = x1, y = y3, name = y3.name,
                         line=dict(color='green', width=2))
trace4 = go.Scatter(x = x1, y = y4, name = y4.name,
                         line=dict(color='darkred', width=2))
data = [trace1, trace2, trace3, trace4]
layout = dict(title = 'Cases over time - COVID-19 in Brazil',
              xaxis= dict(title= 'Date',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Number of cases',ticklen= 5,zeroline= False),
              plot_bgcolor='white'
             )
fig = dict(data = data, layout = layout)

iplot(fig)
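
The series plotted above are cumulative totals. If daily new cases are of interest, they can be derived by differencing the cumulative series. This is a minimal sketch on made-up numbers and not part of the original analysis; the `national` table and the `New_Confirmed` column name are assumptions for illustration.

```python
import pandas as pd

# Made-up national cumulative totals.
national = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-15", "2020-03-16", "2020-03-17", "2020-03-18"]),
    "Confirmed": [10, 14, 14, 25],
})

# diff() gives the day-over-day change; the first day has no predecessor,
# so its cumulative value is used as the number of new cases.
national["New_Confirmed"] = (
    national["Confirmed"].diff().fillna(national["Confirmed"]).astype(int)
)
print(national)
```

A negative value in such a column would point to one of the dates where the reported cumulative total decreased.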

Chart - Cases over time in Brazilian States

In [10]:
col = "Suspects"

fig = go.Figure()
for name, group in COVID19_BR.groupby('State'):
    #trace = go.Histogram()
    trace = go.Scatter(x = group.Date, y = group[col])
    trace.name = name
#    trace.x = group.Deaths
    fig.add_trace(trace)
#fig.update_xaxes(title_text = "Date")
#fig.update_yaxes(title_text = "Number of cases")

fig.update_layout(
    title="Cases over time - COVID-19 - " + str(col) + " cases in Brazilian States",
    xaxis_title="Date",
    yaxis_title="Number of cases - " + col  # use col directly instead of a leftover temp from an earlier cell
)

fig.update_layout({
"plot_bgcolor": "rgba(0, 0, 0, 0)",
"paper_bgcolor": "rgba(0, 0, 0, 0)",
})
iplot(fig)
print("You are able to hide some lines here")

You are able to hide some lines here


In [11]:
col = "Negative"

fig = go.Figure()
for name, group in COVID19_BR.groupby('State'):
    #trace = go.Histogram()
    trace = go.Scatter(x = group.Date, y = group[col])
    trace.name = name
#    trace.x = group.Deaths
    fig.add_trace(trace)
#fig.update_xaxes(title_text = "Date")
#fig.update_yaxes(title_text = "Number of cases")

fig.update_layout(
    title="Cases over time - COVID-19 - " + str(col) + " cases in Brazilian States",
    xaxis_title="Date",
    yaxis_title="Number of cases - " + col  # use col directly instead of a leftover temp from an earlier cell
)

fig.update_layout({
"plot_bgcolor": "rgba(0, 0, 0, 0)",
"paper_bgcolor": "rgba(0, 0, 0, 0)",
})
iplot(fig)
print("You are able to hide some lines here")

You are able to hide some lines here


In [12]:
col = "Confirmed"

fig = go.Figure()
for name, group in COVID19_BR.groupby('State'):
    #trace = go.Histogram()
    trace = go.Scatter(x = group.Date, y = group[col])
    trace.name = name
#    trace.x = group.Deaths
    fig.add_trace(trace)
#fig.update_xaxes(title_text = "Date")
#fig.update_yaxes(title_text = "Number of cases")

fig.update_layout(
    title="Cases over time - COVID-19 - " + str(col) + " cases in Brazilian States",
    xaxis_title="Date",
    yaxis_title="Number of cases - " + col  # use col directly instead of a leftover temp from an earlier cell
)

fig.update_layout({
"plot_bgcolor": "rgba(0, 0, 0, 0)",
"paper_bgcolor": "rgba(0, 0, 0, 0)",
})
iplot(fig)
print("You are able to hide some lines here")

You are able to hide some lines here


In [13]:
col = "Deaths"

fig = go.Figure()
for name, group in COVID19_BR.groupby('State'):
    #trace = go.Histogram()
    trace = go.Scatter(x = group.Date, y = group[col])
    trace.name = name
#    trace.x = group.Deaths
    fig.add_trace(trace)
#fig.update_xaxes(title_text = "Date")
#fig.update_yaxes(title_text = "Number of cases")

fig.update_layout(
    title="Cases over time - COVID-19 - " + str(col) + " cases in Brazilian States",
    xaxis_title="Date",
    yaxis_title="Number of cases - " + col  # use col directly instead of a leftover temp from an earlier cell
)

fig.update_layout({
"plot_bgcolor": "rgba(0, 0, 0, 0)",
"paper_bgcolor": "rgba(0, 0, 0, 0)",
})
iplot(fig)
print("You are able to hide some lines here")

You are able to hide some lines here


In [14]:
print("Number of states with confirmed cases over time")
# Count only the states that have at least one confirmed case on each date
temp = COVID19_BR[COVID19_BR.Confirmed > 0].groupby("Date", as_index=False)["State"].count()
x1 = temp.Date
y1 = temp.State

trace1 = go.Scatter(x = x1, y = y1, name = y1.name)
data = [trace1]
layout = dict(title = 'Number of states with confirmed cases over time - COVID-19 in Brazil',
              xaxis= dict(title= 'Date',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Number of states',ticklen= 5,zeroline= False),
              plot_bgcolor='white'
             )
fig = dict(data = data, layout = layout)

iplot(fig)

Number of states with confirmed cases over time
