# **Exploring coronavirus data from Brazil**

![](https://image.freepik.com/free-vector/big-data-research-coronavirus-disease-3d-vector-neon-illustration-virus-data-cloud-futuristic-virology-analysis-sars-pathogen-exploration-concept_1217-1713.jpg)

We will do a simple analysis of data on the spread of COVID-19 in Brazil.

We will begin by importing some useful libraries.

In [None]:
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go

In [None]:
df = pd.read_csv('/kaggle/input/corona-virus-brazil/brazil_covid19.csv')

In [None]:
df

Suspects cases.
We will look for the states that have the most cases.

In [None]:
df_ranked = df.groupby("state").last().drop(columns=['date']).sort_values(by=['cases'], ascending=False)
df_ranked
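
Note that `groupby("state").last()` returns the last row of each state in file order, so it only yields the latest totals if the rows are already sorted by date. The sketch below sorts explicitly before grouping. It uses a small toy frame whose values are invented purely for illustration; only the column names (`date`, `state`, `cases`, `deaths`) are taken from the dataset.

```python
import pandas as pd

# Toy data, invented for illustration only; rows are deliberately out of date order.
toy = pd.DataFrame({
    "date": ["2020-03-02", "2020-03-01", "2020-03-02", "2020-03-01"],
    "state": ["A", "A", "B", "B"],
    "cases": [5, 2, 9, 4],
    "deaths": [1, 0, 2, 1],
})

# Parse the dates and sort, so that .last() really is the most recent row per state.
toy["date"] = pd.to_datetime(toy["date"])
latest = (
    toy.sort_values("date")
       .groupby("state")
       .last()
       .drop(columns=["date"])
       .sort_values(by="cases", ascending=False)
)
print(latest)
```

If the CSV is already ordered by date, the one-liner above gives the same result and this extra sort is only a safeguard.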

São Paulo, Rio de Janeiro and Ceará are the states with the most confirmed cases.
Pernambuco, even though it is fifth in number of cases, is third in number of deaths.
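
A difference between a state's position by cases and by deaths can be made explicit by ranking both columns. This is a minimal sketch on a toy table (the numbers and the state labels `S1` to `S5` are invented for illustration); the same calls could be applied to `df_ranked[['cases', 'deaths']]`.

```python
import pandas as pd

# Toy totals, invented for illustration only.
toy_ranked = pd.DataFrame(
    {"cases": [100, 80, 60, 50, 40], "deaths": [10, 6, 3, 2, 5]},
    index=["S1", "S2", "S3", "S4", "S5"],
)

# Rank 1 = highest value in each column.
ranks = toy_ranked.rank(ascending=False, method="min").astype(int)
ranks.columns = ["cases_rank", "deaths_rank"]

# A positive shift means the state ranks higher in deaths than in cases.
ranks["shift"] = ranks["cases_rank"] - ranks["deaths_rank"]
print(ranks.sort_values("shift", ascending=False))
```

In the toy table, `S5` is fifth by cases but third by deaths, which is the same pattern described for Pernambuco.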

Let's see how the number of cases is increasing over time.

In [None]:
cases = df[['date','cases','deaths']].groupby('date').sum().reset_index()

# Start the plot at the first date with more than 0 cases.
cases = cases[cases['cases'] > 0].melt(id_vars=['date'], value_vars=['cases', 'deaths'])

fig = px.line(cases, x='date', y='value', color='variable')
# The x axis holds dates, not states.
fig.update_layout(title='COVID-19 in Brazil over time',
                  xaxis_title='Date', yaxis_title='Number of cases', legend_title='<b>COVID-19</b>',
                  legend=dict(x=0.03, y=0.98))
fig.show()

In [None]:
# Keep only the five states with the most cases, from the first date with cases.
states = df[df['state'].isin(df_ranked.index[:5])]
states = states[states['cases'] > 0]

fig = px.line(states, x='date', y='cases', color='state')
fig.update_layout(title='COVID-19 in Brazil over time (cases)',
                  xaxis_title='Date', yaxis_title='Number of cases', legend_title='<b>Rank of top 5 states</b>',
                  legend=dict(x=0.03, y=0.98))
fig.show()

As we can see in the plots, around March 15 and March 25 the number of cases in these states increased sharply.
At the end of March, São Paulo starts to show accelerated growth in COVID-19 cases.
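
Accelerated growth is easier to judge from daily new cases than from running totals. The sketch below derives daily increments per state with a grouped `diff()`, assuming the `cases` column is cumulative (which the use of `.last()` for state totals suggests). The toy values are invented for illustration.

```python
import pandas as pd

# Toy cumulative counts, invented for illustration only.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"] * 2),
    "state": ["A"] * 3 + ["B"] * 3,
    "cases": [1, 3, 8, 0, 2, 3],
})

# Sort within each state so that diff() compares consecutive days.
toy = toy.sort_values(["state", "date"])

# Daily increase per state; the first day has no predecessor,
# so it falls back to that day's cumulative count.
toy["new_cases"] = toy.groupby("state")["cases"].diff().fillna(toy["cases"])
print(toy)
```

Plotting `new_cases` instead of `cases` with the same `px.line` call would show where the daily increments themselves start to grow.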

In [None]:
# Same five states, from the first date with recorded deaths.
states = df[df['state'].isin(df_ranked.index[:5])]
states = states[states['deaths'] > 0]

fig = px.line(states, x='date', y='deaths', color='state')
fig.update_layout(title='COVID-19 in Brazil over time (deaths)',
                  xaxis_title='Date', yaxis_title='Number of deaths', legend_title='<b>Rank of top 5 states</b>',
                  legend=dict(x=0.03, y=0.98))
fig.show()

São Paulo is also first in number of COVID-19 deaths, followed by Rio de Janeiro and Pernambuco, which passed Ceará in the first week of April.

In [None]:
fig = go.Figure(data=[
    go.Bar(name='Cases', x=df_ranked.index, y=df_ranked['cases']),
    go.Bar(name='Deaths', x=df_ranked.index, y=df_ranked['deaths'])
])
fig.update_layout(barmode='stack', title="COVID-19 in Brazil: cases and deaths by state",
                  xaxis_title="States", yaxis_title="Number of cases and deaths", legend_title='<b>COVID-19</b>',
                  legend=dict(x=0.90, y=0.5))
fig.show()

Let's create a dataframe ranked by region.

In [None]:
#Creating dataframe ranked by region.
df_ranked_region = df_ranked.groupby("region").agg({'cases':'sum', 'deaths':'sum'}).sort_values(by=['cases'], ascending=False)
df_ranked_region

Now let's analyze the data distributed by regions.

In [None]:
plt.figure(figsize=(12,12))

df_ranked_region['cases'].plot( kind='pie'
                       , autopct='%1.1f%%'
                       , shadow=True
                       , startangle=10)

plt.title('COVID-19 distribution: cases by region in Brazil', size=25)
plt.legend(loc = "upper right"
           , fontsize = 10
           , ncol = 1 
           , fancybox = True
           , framealpha = 0.80
           , shadow = True
           , borderpad = 1);

The southeastern region accounts for almost 60% of the country's confirmed cases.
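
The percentages shown on the pie chart can also be computed directly, which makes a figure like this easy to check. This is a minimal sketch on a toy regional table (region labels `R1` to `R3` and all values are invented for illustration); the same expression could be applied to `df_ranked_region['cases']`.

```python
import pandas as pd

# Toy regional totals, invented for illustration only.
toy_region = pd.DataFrame(
    {"cases": [60, 25, 15], "deaths": [6, 3, 1]},
    index=["R1", "R2", "R3"],
)

# Share of the national total per region, in percent.
shares = (toy_region["cases"] / toy_region["cases"].sum() * 100).round(1)
print(shares)
```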

In [None]:
plt.figure(figsize=(12,12))

df_ranked_region['deaths'].plot( kind='pie'
                       , autopct='%1.1f%%'
                       , shadow=True
                       , startangle=10)

plt.title('COVID-19 distribution: death toll by region in Brazil', size=25)
plt.legend(loc = "upper right"
           , fontsize = 10
           , ncol = 1 
           , fancybox = True
           , framealpha = 0.80
           , shadow = True
           , borderpad = 1);

The southeastern region has the highest death toll in Brazil.

**CONCLUSION**

The southeastern region is the most affected by the novel coronavirus so far. It is also the most populous region, so it would be interesting to add population data for each state. The number of international flights and the level of social isolation could also contribute to a deeper analysis and help predict the spread of the infection.
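
As a rough sketch of how population data could be used, case counts can be normalized to cases per 100,000 inhabitants once a population table is available. The dataset used here does not include one, so `toy_population` and its `population` column are hypothetical, and every number below is a placeholder rather than a real figure.

```python
import pandas as pd

# Toy totals and populations: placeholder numbers, NOT real figures.
toy_cases = pd.DataFrame({"state": ["A", "B"], "cases": [50, 30]})
toy_population = pd.DataFrame(
    {"state": ["A", "B"], "population": [1_000_000, 200_000]}
)

# Attach the (hypothetical) population to each state and normalize.
merged = toy_cases.merge(toy_population, on="state", how="left")
merged["cases_per_100k"] = merged["cases"] / merged["population"] * 100_000
print(merged.sort_values("cases_per_100k", ascending=False))
```

In the toy data, state `B` has fewer cases than `A` but a higher rate per inhabitant, which is the kind of difference absolute counts hide.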