# Coronavirus in Brazil

This notebook presents a visual analysis of the ongoing Coronavirus pandemic in Brazil, comparing the situation in different states over time. The [COVID-19 dataset](https://www.kaggle.com/unanimad/corona-virus-brazil) was provided by [Raphael Fontes](https://www.kaggle.com/unanimad), who gathered it from the Brazilian Health Ministry.

---

# Table of contents

1. [Acknowledgment](#acknowledgment)
1. [COVID-19 in Brazil](#brazil)
1. [Methodology and results](#methodology)
1. [Conclusion](#conclusion)
1. [Disclaimer](#disclaimer)

---

<div id="acknowledgment"></div>
# Acknowledgment
* The [COVID-19 cases dataset](https://www.kaggle.com/unanimad/corona-virus-brazil) used in this notebook was kindly built and provided by Raphael Fontes, using official data published daily by the Brazilian Health Ministry.

<div id="brazil"></div>
# COVID-19 in Brazil
Brazil passed 10K cases on April 4th and 500 deaths on April 6th. The Brazilian outbreak currently has multiple foci, mainly in São Paulo, Rio de Janeiro, and Ceará, with the number of cases doubling every 4 days. Because of a shortage of viral tests, the Brazilian public healthcare service is testing only a small group of people, mostly the most severe cases. This leads to a high number of undetected cases and, probably, a sharp increase in the number of confirmed cases in the coming days.

> Due to the growing number of confirmed cases, on March 20th the Brazilian Health Ministry stopped counting the number of suspected cases for some states.
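
The 4-day doubling time mentioned above can be translated into a daily growth rate with a short calculation. The sketch below is only an illustration that assumes steady exponential growth; the function names are hypothetical and are not part of the original analysis.

```python
import math

def daily_growth_from_doubling(doubling_days):
    """Daily growth rate r such that cases double every `doubling_days` days."""
    return 2 ** (1 / doubling_days) - 1

def doubling_time(daily_growth):
    """Days needed for cases to double at a constant daily growth rate."""
    return math.log(2) / math.log(1 + daily_growth)

rate = daily_growth_from_doubling(4)
print(f"daily growth for a 4-day doubling time: {rate:.1%}")
print(f"doubling time recovered from that rate: {doubling_time(rate):.1f} days")
```

Under this assumption, doubling every 4 days corresponds to roughly 19% more cases each day.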

<div id="methodology"></div>
# Methodology and results

## Data Processing
First, I loaded the Brazilian states dataset, which contains state names, abbreviations, population sizes and geographic coordinates.

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import plotly.express as px
import folium, json, math, requests, plotly, warnings
from branca.colormap import linear
from bs4 import BeautifulSoup

In [None]:
print('Last update on', pd.to_datetime('now'))

In [None]:
warnings.filterwarnings('ignore')

In [None]:
states = pd.read_csv('../input/brazilianstates/states.csv')
states.columns = map(str.lower, states.columns)
states.head()

Afterward, I loaded the Brazilian COVID-19 dataset, which contains the number of cases by state, and added a new column named `potentials`. This column holds the number of people in each state who could be infected by COVID-19, based on an assumed average infection rate of 80% of the total population, citing the [World Health Organization](https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200306-sitrep-46-covid-19.pdf?sfvrsn=96b04adf_2).

In [None]:
corona = pd.read_csv('/kaggle/input/corona-virus-brazil/brazil_covid19.csv').drop('region', axis=1)
corona = corona.merge(states, on='state', how='left')
corona['potentials'] = corona['population'] * 0.8
corona['potentials'] = corona['potentials'].astype(int)
corona['date'] = pd.to_datetime(corona['date'])
corona = corona[(corona['cases'] > 0)]
corona.head()

In [None]:
corona.dtypes
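
Because the merge above is a left join on `state`, a state name spelled differently in the two files would silently end up with a missing population. The following is a minimal sketch of how this could be checked with the `indicator` and `validate` options of `DataFrame.merge`. The two small frames and their values are made up for illustration and are not real data.

```python
import pandas as pd

# Toy stand-ins for the two datasets; values are illustrative only.
cases = pd.DataFrame({
    'state': ['São Paulo', 'Ceará', 'Sao Paulo'],  # last name is misspelled on purpose
    'cases': [100, 20, 5],
})
states_toy = pd.DataFrame({
    'state': ['São Paulo', 'Ceará'],
    'population': [1000, 500],
})

# validate='many_to_one' raises if a state appears twice in the lookup table;
# indicator=True adds a '_merge' column describing where each row was found.
merged = cases.merge(states_toy, on='state', how='left',
                     validate='many_to_one', indicator=True)

# Rows tagged 'left_only' found no match in the states table.
unmatched = merged.loc[merged['_merge'] == 'left_only', 'state'].tolist()
print(unmatched)
```

An empty list would suggest that every row received a population value.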

## Total number of cases by state
Next, I grouped the data by state (`group_uf`) and built a new dataframe named `uf` with the latest record for each state, displayed as a table with a color gradient for easy visualization. A table and a plot with the total number of cases and deaths were also created.

In [None]:
group_uf = corona.groupby('state')
uf = group_uf.tail(1).sort_values('cases', ascending=False).drop(columns=['date']).set_index('state')
uf.style.background_gradient(cmap='Reds', subset=['suspects','refuses','cases','deaths'])
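
Note that `group_uf.tail(1)` keeps the last row of each group, which is the latest cumulative count only when the rows are in chronological order. The sketch below, on made-up data with hypothetical state labels, sorts by date first to make that assumption explicit.

```python
import pandas as pd

# Toy data, deliberately out of chronological order for state 'A'.
toy = pd.DataFrame({
    'date': pd.to_datetime(['2020-04-02', '2020-04-01', '2020-04-01', '2020-04-02']),
    'state': ['A', 'A', 'B', 'B'],
    'cases': [30, 10, 5, 8],
})

latest = (toy.sort_values('date')   # oldest first, newest last
             .groupby('state')
             .tail(1)               # last (newest) row of each state
             .sort_values('cases', ascending=False)
             .set_index('state'))
print(latest['cases'].to_dict())
```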

In [None]:
d = {'cases' : uf['cases'].sum(), 'deaths': uf['deaths'].sum()}
total = pd.DataFrame(d.items(), columns=['type', 'total_count']).set_index('type')
total

In [None]:
fig, ax = plt.subplots(figsize=(12, 8))
plt.bar(x=total.index, height=total['total_count'], color=['limegreen','red'])

[ax.annotate('%s' % y, xy=(x-0.03,y+500), fontsize=14, fontweight='bold') for x,y in zip(range(0,2), total['total_count'])]
[ax.spines[side].set_visible(False) for side in ['left','right','top']]
plt.grid(which='major', axis='y')
plt.xlabel(None)
plt.ylabel('Cases count')
plt.xticks(fontsize=14)
plt.yticks(fontsize=12)
plt.title('COVID-19: number of cases in Brazil', fontsize=16, fontweight='bold', color='#333333')
plt.show();

## Number of confirmed cases over time per state
For a better visualization of the infection growth rate, a new dataframe `cumulated` was created. This dataframe sums the number of cases across all states for each date. New columns were added: the number of new cases and deaths per day and the growth rate of cases and deaths. Then a plot was created showing the COVID-19 evolution in Brazil.

In [None]:
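# numeric_only=True keeps text columns (state names, abbreviations) out of the sum
cumulated = corona.groupby('date').sum(numeric_only=True).reset_index()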
cumulated = cumulated[(cumulated['cases'] >= 100)]
cumulated['new_cases'] = cumulated['cases'].diff().fillna(0).astype(int)
cumulated['growth_cases'] = cumulated['cases'].diff().fillna(0).astype(int)/cumulated['cases']
cumulated['new_deaths'] = cumulated['deaths'].diff().fillna(0).astype(int)
cumulated['growth_deaths'] = cumulated['deaths'].diff().fillna(0).astype(int)/cumulated['deaths']
cumulated.head()
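
The cell above divides each day's new cases by that same day's cumulative total. A common alternative is to divide by the previous day's total, which is what `Series.pct_change` computes. The sketch below compares both on made-up numbers, and neither is claimed to be the definitive definition of growth rate for this analysis. The column names `share_new` and `growth_prev` are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({'cases': [100, 150, 300]})  # made-up cumulative counts

toy['new_cases'] = toy['cases'].diff().fillna(0).astype(int)
# Share of today's total that is new (the form used in the cell above).
toy['share_new'] = toy['new_cases'] / toy['cases']
# Growth relative to the previous day's total.
toy['growth_prev'] = toy['cases'].pct_change().fillna(0)

print(toy)
```

The two measures diverge as growth accelerates: going from 150 to 300 cases is a 100% increase over the previous day, while new cases make up 50% of the new total.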

In [None]:
fig, ax = plt.subplots(figsize=(14, 10))
plt.plot(cumulated['date'], cumulated['cases'], color='limegreen', linewidth=8, alpha=0.5, marker='o')
plt.plot(cumulated['date'], cumulated['deaths'], color='red', linewidth=4, alpha=0.9, marker='o')
plt.bar(cumulated['date'], cumulated['new_cases'])
[ax.annotate('%s' % y, xy=(x,y+100), fontsize=10) for x,y in zip(cumulated['date'], cumulated['cases'])]

plt.xticks(rotation=90, ha='right')
plt.title('COVID-19: number of cases in Brazil', fontsize=18, fontweight='bold', color='#333333')

plt.ylabel('Number of cases', fontsize=12)
plt.xlabel(None)

# ymin/ymax of axvline are fractions of the axes height (0 to 1), not data values
plt.axvline(pd.Timestamp('2020-03-16'), 0, 1, c='#CCCCCC', linestyle='--', linewidth=2, alpha=1)
ax.annotate('Companies start home-office', xy=(pd.Timestamp('2020-03-16'),19000), fontsize=12, rotation=90)
plt.axvline(pd.Timestamp('2020-03-21'), 0, 1, c='#CCCCCC', linestyle='dotted', linewidth=2, alpha=1)
ax.annotate('SP government declares quarantine', xy=(pd.Timestamp('2020-03-21'),19000), fontsize=12, rotation=90)

plt.legend(loc=2, labels=['cases','deaths'], fontsize=14)

plt.grid(which='major', axis='y')
[ax.spines[side].set_visible(False) for side in ['left','right','top']]
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d/%m'))
plt.show();

In [None]:
fig, ax = plt.subplots(figsize=(14, 10))

labels = []
for s in uf.index[:7]:
    plt.plot(corona['date'][(corona['state'] == s)], corona['cases'][(corona['state'] == s)], linewidth=4, alpha=0.9)
    labels.append(s)
    
plt.xticks(rotation=90, ha='right')
plt.title('COVID-19: number of cases per state in Brazil', fontsize=18, fontweight='bold', color='#333333')

plt.ylabel('Number of cases', fontsize=12)
plt.xlabel(None)

plt.legend(loc=6, fontsize=14, labels=labels)

plt.grid(which='major', axis='y', color='#EEEEEE')
[ax.spines[side].set_visible(False) for side in ['left','right','top']]
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d/%m'))
plt.show();

## Number of cases over time per state
The following chart shows the infection progression over time per state, comparing:
* The cumulative number of confirmed cases
* The cumulative number of deaths

In [None]:
plt.figure(figsize=(40, 60))
# enumerate from 1 so that every state, including the last one, gets a panel
for i, s in enumerate(uf.index, start=1):
    ax = plt.subplot(9,3,i)
    plt.subplots_adjust(bottom=-0.1)
    plt.xticks(rotation=90, ha='right', fontsize=16)
    plt.yticks(fontsize=16)
    
    c = corona[(corona['state'] == s)]
    plt.plot(c['date'], c['cases'], linewidth=8, color='limegreen', alpha=0.5, marker='o')
    plt.plot(c['date'], c['deaths'], linewidth=8, color='red', alpha=0.7, marker='o')

    ax.text(0.05,0.9,s, transform=ax.transAxes, fontsize=24, fontweight='bold')
    plt.ylabel(None)
    plt.xlabel(None)
    plt.legend(labels=['cases','deaths'], loc='center left', fontsize=16)
    plt.grid(which='major', axis='y')
    [ax.spines[side].set_visible(False) for side in ['left','right','top']]
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%d/%m'))
    
plt.show();
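
When drawing one panel per state, the panel index has to run from 1 up to the number of states. The sketch below uses hypothetical state labels and a hypothetical helper, `panel_layout`, to show how the grid size and the 1-based panel positions could be derived. It also shows why pairing labels with `range(1, n)` would leave the last label without a panel.

```python
import math

def panel_layout(labels, ncols=3):
    """Return the number of rows and a list of (panel_index, label) pairs."""
    nrows = math.ceil(len(labels) / ncols)
    return nrows, list(enumerate(labels, start=1))

toy_states = ['A', 'B', 'C', 'D', 'E', 'F', 'G']  # hypothetical labels
nrows, panels = panel_layout(toy_states)
print(nrows, panels[-1])

# For comparison: zip stops at the shorter input, so range(1, n) drops one label.
short = list(zip(toy_states, range(1, len(toy_states))))
print(len(panels), len(short))
```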

Next, a bar plot of the latest number of confirmed cases per state was created to help visualize the data.

In [None]:
fig, ax = plt.subplots(figsize=(12, 8))
plt.bar(uf['uf'], uf['cases'], color='limegreen')

[ax.annotate('%s' % y, xy=(x-0.5,y+100), fontsize=10, fontweight='bold') for x,y in zip(range(0,27), uf['cases'])]
[ax.spines[side].set_visible(False) for side in ['left','right','top']]
plt.grid(which='major', axis='y')
plt.ylim(0,6000)
plt.xlabel(None)
plt.ylabel('Cases count')
plt.xticks(fontsize=14)
plt.yticks(fontsize=12)
plt.title('COVID-19: number of cases per state in Brazil', fontsize=16, fontweight='bold', color='#333333')
plt.show();

In [None]:
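geo_path = '/kaggle/input/brazil-geojson/brazil_geo.json'
# open the file in a context manager so it is closed after loading
with open(geo_path, encoding='utf-8') as f:
    geo = json.load(f)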

In [None]:
df = uf.reset_index().set_index('uf')
df.head()

In [None]:
fig, ax = plt.subplots(figsize=(12,10))
s = sns.scatterplot(data=df, x='deaths', y='cases', hue='region', s=300, alpha=0.5)
plt.legend(loc=5,markerscale=1.5, frameon=False, fontsize=12)
[ax.spines[side].set_visible(False) for side in ['left','right','top']]
plt.grid(which='major', axis='both', color='#EEEEEE')
plt.ylim(0,7000)
plt.xlabel('Deaths', fontsize=12)
plt.ylabel('Cases', fontsize=12)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)

d = df.reset_index().loc[:4,:]
for i in range(0,len(d)):
    plt.annotate(d.iloc[i]['uf'], xy=(d.iloc[i]['deaths']-3.5, d.iloc[i]['cases']+125))

plt.title('COVID-19: number of cases and deaths per state in Brazil', fontsize=16, fontweight='bold', color='#333333')
    
plt.show();

In [None]:
colormap = linear.YlOrRd_09.scale(0,5000)

# named brazil_map to avoid shadowing the built-in map() used earlier
brazil_map = folium.Map(
    width=800, height=600,
    location=[-15.77972, -47.92972], 
    zoom_start=4
)
folium.GeoJson(
    geo,
    name='cases',
    style_function=lambda feature: {
        'fillColor': colormap(df['cases'][feature['id']]),
        'color': 'black',
        'weight': 0.4,
    }
).add_to(brazil_map)
colormap.caption = 'Confirmed COVID-19 cases per state'
colormap.add_to(brazil_map)

brazil_map
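
In the map cell above, the lookup `df['cases'][feature['id']]` raises a `KeyError` if the GeoJSON contains a feature whose `id` is not in the dataframe index. A more defensive style callback could fall back to a neutral color, as in the sketch below. The helper `make_style_function`, the toy lookup table and the two-color scale are all hypothetical stand-ins. With real data, a pandas Series also offers `.get`, and the `branca` colormap could be passed as `color_for`.

```python
def make_style_function(values, color_for, missing_color='#CCCCCC'):
    """Build a style callback; `values` maps a feature id to a number."""
    def style(feature):
        value = values.get(feature['id'])  # None when the id is unknown
        fill = missing_color if value is None else color_for(value)
        return {'fillColor': fill, 'color': 'black', 'weight': 0.4}
    return style

# Hypothetical stand-ins: a tiny lookup table and a two-color scale.
toy_cases = {'SP': 4000, 'RJ': 1200}
toy_scale = lambda v: '#bd0026' if v > 2000 else '#feb24c'

style = make_style_function(toy_cases, toy_scale)
print(style({'id': 'SP'}))
print(style({'id': 'XX'}))  # unknown id falls back to the neutral color
```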

<div id='conclusion'></div>
# Conclusion
Given the high rate of underreporting and undetected cases, and even considering the recent social distancing and quarantine efforts, Brazil could face between 50 and 140 million infected people over the next 60 days. According to the World Health Organization, 80% of infected people show mild symptoms or no sign of infection. This makes clear the importance of social distancing and quarantine efforts to slow the spread of infection and avoid the collapse of the healthcare infrastructure.

<div id="disclaimer"></div>
# Disclaimer
Due to the complex nature of a pandemic, this work does not intend to be an accurate projection or a model that tries to reproduce the complexity of reality. The main goal of this project is to encourage reflection on the importance of social distancing, quarantine and other infection prevention actions, to minimize the effects of the pandemic and help flatten the infection curve.

## Thank you and stay home! :)