# COVID-19 Brazil Analysis
### What is the Brazilian status regarding the COVID-19 pandemic? 

![COVID-19](https://images.trustinnews.pt/uploads/sites/5/2020/04/28587115-1600x1067.jpg)

---

### Table of contents
* [Introduction](#intro)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and discussion](#results)
* [Conclusion](#conclusion)

---

# Introduction <a id='intro'></a>
Brazil is the fifth-largest country in the world, with a population of 209 million distributed among 27 federative units. It is also home to one of the world's largest cities, São Paulo, with a population of 12.2 million.

Brazil is an emerging country facing economic, poverty, and corruption issues. Despite this, the country has [SUS](https://www.saude.gov.br/), one of the largest public healthcare systems in the world. In addition, rich states like São Paulo, Rio de Janeiro, and Distrito Federal have an impressive health infrastructure composed of private and public services, although those states face problems related to high demographic density. Poor states like Roraima, Acre, Piauí, and others in the North and Northeast regions, on the other hand, suffer from extreme poverty, poor sanitation, and inadequate medical assistance.

## The problem
Given the recent COVID-19 pandemic, Brazil, like other countries around the world, is facing one of the hardest events of the century. The imminent risk of a healthcare system collapse mobilized the government, companies, and the population towards prevention and mitigation strategies. Those strategies include quarantines, aid for infected people, and macroeconomic actions that try to prevent an economic slowdown.

### Based on these circumstances, what is the current situation of each Brazilian state? How do they compare to each other, considering the infection spread rate, the number of confirmed cases, the number of deaths, and other demographic features of each state?

## Acknowledgments
* The [Brazilian Health Ministry](https://www.saude.gov.br/) is making a strong effort to provide relevant information both on its official website and in press releases;
* [Raphael Fontes](https://www.kaggle.com/unanimad/corona-virus-brazil) did impressive work gathering, processing, publishing, and updating daily a dataset containing the official COVID-19 data in Brazil.

* This work would not be possible without the combined efforts of many people.

---

# Data <a id='data'></a>
## Import, load and process data
* The main dataset used in this work was downloaded from the [Coronavirus - Brasil dataset](https://www.kaggle.com/unanimad/corona-virus-brazil), with daily reports published by the Brazilian Health Ministry and assembled by [Raphael Fontes](https://www.kaggle.com/unanimad);
* The [Brazilian Dataset](https://www.kaggle.com/thiagobodruk/brazilianstates) was used to enrich the information about the Brazilian states with population size, GDP, geolocation, and other demographic data;
* Additionally, I created a dataset containing the number of ICU beds in each state;
* Finally, I downloaded a [dataset containing the airport traffic](https://pt.wikipedia.org/wiki/Lista_de_aeroportos_do_Brasil) by state.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from urllib.request import urlopen
from bs4 import BeautifulSoup
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.cluster import KMeans, AffinityPropagation
import json, requests

In [None]:
print('Last update on', pd.to_datetime('now'))

In [None]:
def load_data(url):
    return pd.read_csv(url)

In [None]:
df = load_data('../input/corona-virus-brazil/brazil_covid19.csv').drop(columns=['region']).rename(columns={'date':'Date','state':'State','cases':'Cases','deaths':'Deaths'})
brazil = load_data('../input/brazilianstates/states.csv').rename({'Demographic Density': 'Density', 'Cities count': 'Cities'}, axis=1)
airports = load_data('../input/brazilianstates/airports.csv').rename({'Passengers rate': 'Airport'}, axis=1)
icu = load_data('../input/brazilianstates/icu-beds.csv').rename({'Public beds per citizen': 'Public BPC', 'Private beds per citizen': 'Private BPC'}, axis=1)
df = df.merge(brazil, how='left', on='State')
df = df.merge(airports, how='left', on='UF')
df = df.merge(icu, how='left', on='UF')
df.head()
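
The left merges above keep every case row even when a state name or `UF` code finds no match in the auxiliary tables, and the later `fillna(0)` would hide such gaps. A minimal sketch, on made-up frames, of how unmatched keys could be surfaced with the `indicator` option of `DataFrame.merge`; the frame contents and the misspelled state name are hypothetical:

```python
import pandas as pd

# Hypothetical toy frames standing in for the case data and the state metadata.
cases = pd.DataFrame({'State': ['São Paulo', 'Acre', 'Sao Paulo'],
                      'Cases': [100, 5, 20]})
states = pd.DataFrame({'State': ['São Paulo', 'Acre'], 'UF': ['SP', 'AC']})

# indicator=True adds a `_merge` column telling where each row found a match;
# validate='many_to_one' raises if the metadata has duplicated state names.
merged = cases.merge(states, how='left', on='State',
                     indicator=True, validate='many_to_one')

# Rows marked 'left_only' had no counterpart in the metadata table.
unmatched = merged.loc[merged['_merge'] == 'left_only', 'State'].unique().tolist()
print('States without metadata:', unmatched)
```

An empty list would suggest that every case row received its demographic columns.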

## Data Engineering
First, two new features were created, to improve the understanding of the context of each Brazilian state:
* `Mortality` is the mortality rate, computed as the number of deaths divided by the total number of cases (stored as a fraction, not a percentage);
* `Subnotif` is the estimated number of sub-notifications (under-reported cases), based on the [global average mortality rate of 3.4%](https://www.worldometers.info/coronavirus/coronavirus-death-rate/). Whenever a state's death rate exceeds the global average, we assume there are unnoticed cases. The number of sub-notified cases is calculated using the equation `sub = ((0.97*deaths)/0.034) - cases`.

In [None]:
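def subnotif(row):
    # Estimate unreported cases only when mortality exceeds the 3.4% global average.
    if row['Mortality'] > 0.034:
        # Clamp at zero: the estimate is slightly negative just above the threshold.
        return max(((0.97 * row['Deaths']) / 0.034) - row['Cases'], 0)
    return 0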

    
def engineer_data(df):
    df['Date'] = pd.to_datetime(df['Date'])
    df['Mortality'] = df['Deaths']/df['Cases']
    df['Mortality'] = df['Mortality'].fillna(0)
    df['Subnotif'] = df.apply(subnotif, axis=1).astype(int)
    df['Cases per 1K'] = df['Cases']/(df['Population']/1000)
    df['Deaths per 1K'] = df['Deaths']/(df['Population']/1000)
    df = df.fillna(0)
    df = df[['Date', 'UF', 'State','Region', 'Capital', 'Cases', 'Deaths', 'Cases per 1K', 'Deaths per 1K', 'Mortality', 'Subnotif', 'Area','Cities',
             'Population', 'Density', 'Airport', 'GDP','GDP rate', 'Poverty','Latitude', 'Longitude','ICU beds',
             'Public beds','Private beds','Public BPC','Private BPC']]
    return df

df = engineer_data(df)
df.head()

---

# Methodology <a id='methodology'></a>
The main goal of this work is to analyze and compare each Brazilian state, based on different features, and try to create category clusters grouping them by similarity. To achieve this goal, the current methodology is based on two stages: data visualization and clustering models.

## Data Visualization
The Brazilian states will be compared, correlating different features, providing a visual approach for a better understanding of the context. New datasets will be created, covering different grouping and sorting modes, as demanded by each plot.

## Clustering Models
Afterward, different clustering models will be created and trained using the assembled data:
* K-means Clustering;
* Agglomerative Clustering;
* Spectral Clustering;
* Affinity Propagation.

Before the creation of the clustering models, the following steps seem necessary:
* Evaluation of the correlation of the features;
* Selection of the most relevant features;
* Normalization of the data, using `MinMaxScaler`, if necessary;
* Binning of the data, if necessary.
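
The four models listed above are all available in scikit-learn. A minimal sketch of fitting them side by side on normalized data follows; the synthetic blobs, the cluster count of 3, and the seeds are assumptions for illustration, not the notebook's data or settings:

```python
import numpy as np
from sklearn.cluster import (AffinityPropagation, AgglomerativeClustering,
                             KMeans, SpectralClustering)
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: 27 "states" with 3 features, drawn from 3 separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0], [5, 5, 5], [10, 0, 5]])
X = np.vstack([c + rng.normal(scale=0.5, size=(9, 3)) for c in centers])
X = MinMaxScaler().fit_transform(X)  # scale every feature to [0, 1]

models = {
    'K-means': KMeans(n_clusters=3, n_init=10, random_state=0),
    'Agglomerative': AgglomerativeClustering(n_clusters=3),
    'Spectral': SpectralClustering(n_clusters=3, random_state=0),
    # Affinity Propagation picks the number of clusters by itself.
    'Affinity Propagation': AffinityPropagation(random_state=0),
}
labels = {name: model.fit_predict(X) for name, model in models.items()}
for name, pred in labels.items():
    print(name, '->', len(set(pred)), 'clusters')
```

Because Affinity Propagation does not take a cluster count, its result may differ in size from the other three.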

---

# Analysis <a id='analysis'></a>
## What is the current status?
Before we start, let's create another dataset first, grouping the data by state, ordering by the number of cases and setting the state's abbreviation as the index.

In [None]:
pd.options.mode.chained_assignment = None
g = df.groupby('UF')
t = g.tail(1).sort_values('Cases', ascending=False).set_index('UF').drop(columns=['Date'])
t['Subnotif'] = t['Subnotif'].clip(lower=0)  # negative estimates make no sense; floor them at zero
t

Now, let's take a look at the number of cases and deaths over time. For practical purposes, the series starts on the day the total number of cases reached 100.

In [None]:
c = df[['Date','Cases','Deaths']].groupby('Date').sum().reset_index()
c = c[(c['Cases'] >= 100)].melt(id_vars='Date', value_vars=['Cases', 'Deaths'])

fig = px.line(c, x='Date', y='value', color='variable')
fig.update_layout(title='COVID-19 in Brazil: total number of cases and deaths over time',
                  xaxis_title='Date', yaxis_title='Number of cases and deaths',legend_title='<b>COVID-19</b>',
                  legend=dict(x=0.02,y=0.98))
fig.show()

In [None]:
s = df[(df['UF'].isin([i for i in t.index[:7]]))]
s = s[(s['Cases'] >= 100)]

fig = px.line(s, x='Date', y='Cases', color='UF')
fig.update_layout(title='COVID-19 in Brazil: total number of cases over time',
                  xaxis_title='Date', yaxis_title='Number of cases', legend_title='<b>Top 7 states</b>',
                  legend=dict(x=0.02,y=0.98))
fig.show()

In [None]:
fig = px.scatter(t, x="Population", y="Cases", title="COVID-19 in Brazil: population size vs number of cases by state",
                 size="Cases", color="Mortality",hover_name=t.index, log_x=True, log_y=True, size_max=60)
fig.show()

In [None]:
fig = px.scatter(t, x="Density", y="Cases", title="COVID-19 in Brazil: demographic density vs number of cases by state",
                 size="Cases", color="Mortality",hover_name=t.index, log_x=True, log_y=True, size_max=60)
fig.show()

As seen above, SP has the largest number of cases, followed by RJ, CE, AM, MG, and PR. The number of cases also appears strongly correlated with both population size and demographic density. The next plot shows the total number of cases and deaths in each state.
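
The visual impression from the scatter plots can be backed by a correlation coefficient. A minimal sketch, using made-up values rather than the notebook's data, of computing Pearson on log-transformed values (matching the log-log axes) and Spearman on ranks:

```python
import numpy as np
import pandas as pd

# Hypothetical values, NOT the notebook's data: population and cases for five states.
toy = pd.DataFrame({
    'Population': [45_000_000, 17_000_000, 9_000_000, 4_000_000, 800_000],
    'Cases': [8000, 2500, 1500, 600, 50],
}, index=['A', 'B', 'C', 'D', 'E'])

# Pearson on log10 values mirrors what the log-log scatter plots display.
pearson_log = np.log10(toy).corr(method='pearson').loc['Population', 'Cases']
# Spearman only uses rank order, so one dominant state cannot inflate it.
spearman = toy.corr(method='spearman').loc['Population', 'Cases']
print(f'Pearson (log-log): {pearson_log:.2f}, Spearman: {spearman:.2f}')
```

With only 27 states, either coefficient is a rough indication rather than a precise measurement.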

In [None]:
fig = go.Figure(data=[
    go.Bar(name='Cases', x=t.index, y=t['Cases']),
    go.Bar(name='Deaths', x=t.index, y=t['Deaths'])
])
fig.update_layout(barmode='stack', title="COVID-19 in Brazil: number of cases by state", 
                  xaxis_title="Brazilian states", yaxis_title="Number of cases", legend_title='<b>COVID-19</b>',
                  legend=dict(x=0.84,y=0.5))
fig.show()

In [None]:
fig = px.scatter(t, x="Deaths", y="Cases", title="COVID-19 in Brazil: cases vs deaths by state",
                 size="Cases", color="Mortality",hover_name=t.index, log_y=True, log_x=True, size_max=60)
fig.show()

In [None]:
y = t.copy()
y.sort_values('Cases per 1K', ascending=False, inplace=True)
fig = go.Figure(data=[
    go.Bar(name='Cases per 1K citizens', x=y.index, y=y['Cases per 1K']),
    go.Bar(name='Deaths per 1K citizens', x=y.index, y=y['Deaths per 1K'])
])
fig.update_layout(barmode='stack', title="COVID-19 in Brazil: cases and deaths per 1K citizens by state", 
                  xaxis_title="Brazilian states", yaxis_title="Cases per 1K citizens", legend_title='<b>COVID-19</b>',
                  legend=dict(x=0.78,y=0.5))
fig.show()

SP has both the largest number of confirmed cases and the largest number of deaths, surpassing the second-ranked state by more than 70%. We can also notice a very strong correlation between the number of cases and the number of deaths.

In [None]:
fig = go.Figure(data=[
    go.Bar(x=t.sort_values('Airport', ascending=False).index, y=t.sort_values('Airport', ascending=False)['Airport'],
    text=t.sort_values('Airport', ascending=False)['Airport'],
    marker_color='#EF553B', name='Airport traffic'),
    go.Scatter(x=t.sort_values('Airport', ascending=False).index,
           y=np.full((1,len(t.index)), t['Airport'].mean()).tolist()[0], marker_color='blue', name='Brazilian avg')
])

fig.update_layout(title='Airport traffic by state',
                  xaxis_title='Brazilian states', yaxis_title='Airport traffic', legend_title='<b>COVID-19</b>',
                  legend=dict(x=0.75,y=0.95))
fig.show();

In [None]:
fig = px.scatter(t, x="Airport", y="Cases", title="COVID-19 in Brazil: airport traffic vs number of cases by state",
                 size="Cases", color="Mortality",hover_name=t.index, log_x=True, log_y=True, size_max=60)
fig.show()

As expected, the number of airport passengers in each state seems to have a strong correlation with the number of cases.

In [None]:
fig = go.Figure(data=[
    go.Bar(x=t.sort_values('Mortality', ascending=False).index,
    y=t.sort_values('Mortality', ascending=False)['Mortality'],
    text=round(t.sort_values('Mortality', ascending=False)['Mortality'], 2),
    marker_color='#EF553B', name='Mortality'),
    go.Scatter(x=t.sort_values('Mortality', ascending=False).index,
           y=[0.034] * len(t.index), marker_color='blue', name='World avg')  # 3.4% global average quoted in the text
])

fig.update_layout(title='COVID-19 in Brazil: average mortality by state',
                  xaxis_title='Brazilian states', yaxis_title='Deaths per case', legend_title='<b>COVID-19</b>',
                  legend=dict(x=0.75,y=0.5))
fig.show();

Dividing the number of deaths by the number of cases gives the mortality rate. According to the World Health Organization, the [COVID-19 global average mortality rate is 3.4%](https://www.worldometers.info/coronavirus/coronavirus-death-rate/). The same source indicates that 80% of infected people show mild or no symptoms, which reduces the number of tests applied and, consequently, the number of confirmed cases. As seen above, many Brazilian states have a mortality rate above the global average, which suggests sub-notified (under-reported) cases.

Based on this assumption, we can estimate the number of unreported cases from the difference between the actual mortality rate and the global average rate. The estimate is computed as `((0.97*deaths)/0.034) - cases`.
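
To make the arithmetic concrete, here is a small worked sketch of the equation. The input numbers are hypothetical, the function and constant names are illustrative, and the rationale for the 0.97 multiplier is not stated in the text, so it is kept as an opaque constant; negative results are floored at zero, as the notebook does for the per-state table:

```python
GLOBAL_MORTALITY = 0.034  # global average mortality rate quoted in the text
CORRECTION = 0.97         # multiplier taken as-is from the notebook's equation

def estimate_subnotified(cases, deaths):
    """Estimated unreported cases; zero when the formula yields a negative number."""
    expected_cases = (CORRECTION * deaths) / GLOBAL_MORTALITY
    return max(expected_cases - cases, 0)

# Hypothetical state: 1,500 confirmed cases and 100 deaths (about 6.7% mortality).
print(round(estimate_subnotified(1500, 100)))  # 97 / 0.034 - 1500, about 1353
# Hypothetical state at 2% mortality: the formula is negative, so the estimate is 0.
print(estimate_subnotified(1000, 20))
```

In the first case the estimate nearly doubles the confirmed count, which is the effect visible in the area chart that follows.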

In [None]:
s = df[['Date','Cases','Subnotif']].groupby('Date').sum().reset_index()
s = s.rename({'Cases':'Confirmed cases','Subnotif':'Sub-notified cases'}, axis=1)
s = s[(s['Confirmed cases'] >= 100)].melt(id_vars='Date', value_vars=['Confirmed cases', 'Sub-notified cases'])

fig = px.area(s, x='Date', y='value', color='variable')
fig.update_layout(title='COVID-19 in Brazil: total number of cases vs sub-notifications',
                  xaxis_title='Date', yaxis_title='Number of cases',legend_title='<b>COVID-19</b>',
                  legend=dict(x=0.02,y=0.98))
fig.show()

Adding the estimated sub-notifications substantially increases the number of cases, since states like SP and RJ, responsible for more than 50% of the total number of cases, register mortality rates of over 6%, almost double the global average.

In [None]:
fig = go.Figure(data=[
    go.Bar(name='Cases', x=t.index, y=t['Cases']),
    go.Bar(name='Sub-notifications', x=t.index, y=t['Subnotif'])
])
fig.update_layout(barmode='stack', title="COVID-19 in Brazil: cases vs sub-notified cases by state", 
                  xaxis_title="Brazilian states", yaxis_title="Number of cases", legend_title='<b>COVID-19</b>',
                  legend=dict(x=0.75,y=0.5))
fig.show()

The plot above shows the Brazilian states with the highest numbers of sub-notifications, namely SP, RJ, PE, AM, CE, PB, and PI. Those states probably face this problem for different reasons.
* States like SP and RJ have the largest populations, resulting in a faster spread of the infection. Despite the size of their infrastructure, they can't handle a massive surge all at once.
* Other northern states probably face problems related to the lack of healthcare infrastructure.

In [None]:
fig = go.Figure(data=[
    go.Bar(x=t.sort_values('Public BPC', ascending=False).index, y=t.sort_values('Public BPC', ascending=False)['Public BPC'],
    text=t.sort_values('Public BPC', ascending=False)['Public BPC'],
    marker_color='#EF553B', name='ICU beds per citizen'),
    go.Scatter(x=t.sort_values('Public BPC', ascending=False).index,
           y=np.full((1,len(t.index)), t['Public BPC'].mean()).tolist()[0], marker_color='blue', name='Brazilian avg')
])

fig.update_layout(title='Number of public ICU beds per 10K citizens by state',
                  xaxis_title='Brazilian states', yaxis_title='Public ICU beds per 10K citizens', legend_title='<b>COVID-19</b>',
                  legend=dict(x=0.75,y=0.95))
fig.show();

In [None]:
fig = px.scatter(t, x="Cases", y="Public BPC", title="Number of public ICU beds per 10K citizens vs number of cases by state",
                 size="Cases", color="Mortality",hover_name=t.index, log_y=False, log_x=True, size_max=60)
fig.show()

As seen, most Brazilian states with a mortality rate over 3.4% have a number of public ICU beds close to, or under, one bed per 10K citizens. The following plots show the evolution of the infection over the last weeks.

In [None]:
h = df[(df['Cases'] >= 100)].sort_values(['Date','UF'])
fig = go.Figure(data=go.Heatmap(
        z=h['Cases'],
        x=h['Date'],
        y=h['UF'],
        colorscale='Viridis'))

fig.update_layout(
    title='COVID-19 in Brazil: number of cases over time', xaxis_nticks=45)

fig.show()

In [None]:
fig = go.Figure(data=go.Heatmap(
        z=h['Mortality'],
        x=h['Date'],
        y=h['UF'],
        colorscale='Viridis'))

fig.update_layout(
    title='COVID-19 in Brazil: mortality rate over time', xaxis_nticks=45)

fig.show()

We can notice a fast increase in the number of cases in SP and slight increases in CE, RJ, and AM. There was also a massive increase in the mortality rate in CE, probably due to increased regional infection surveillance, mainly the growing number of detection tests since April 1st.

In [None]:
# Load the state boundaries; the context manager closes the file afterwards.
with open('/kaggle/input/brazil-geojson/brazil_geo.json') as f:
    geo = json.load(f)
fig = px.choropleth(t, geojson=geo, locations=t.index, color='Mortality',
            scope="south america",labels={'Mortality':'Mortality'})
fig.update_layout(title='COVID-19 in Brazil: mortality rate by state', margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

In [None]:
h = t.reset_index()
fig = px.treemap(h, path=['Region','UF'], values='Cases', color='Mortality', color_continuous_scale='thermal')
fig.update_layout(title='COVID-19 in Brazil: number of cases by region')
fig.show()

In [None]:
h = t.reset_index()
fig = px.treemap(h, path=['Region','UF'], values='Cases per 1K', color='Mortality', color_continuous_scale='thermal')  # reuse the precomputed column
fig.update_layout(title='COVID-19 in Brazil: cases per 1K citizens by region')
fig.show()

The Brazilian Northeast region has the highest mortality rate, with an average of 5.3%. Moreover, 5 of its 9 states have a mortality rate over 8%. Due to both economic and infrastructure issues, the region suffers from poor healthcare services when compared to the Southeast and South regions.
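
A regional average can be computed in two ways, and they can disagree when state sizes differ. A minimal sketch, on hypothetical numbers rather than the notebook's data, comparing the mean of the state rates with the pooled rate (total deaths over total cases):

```python
import pandas as pd

# Hypothetical numbers, NOT the notebook's data.
toy = pd.DataFrame({
    'UF': ['X1', 'X2', 'Y1', 'Y2'],
    'Region': ['Northeast', 'Northeast', 'South', 'South'],
    'Cases': [1000, 100, 500, 500],
    'Deaths': [50, 10, 10, 20],
})
toy['Mortality'] = toy['Deaths'] / toy['Cases']

by_region = toy.groupby('Region').agg(
    mean_of_states=('Mortality', 'mean'),  # every state counts equally
    cases=('Cases', 'sum'),
    deaths=('Deaths', 'sum'),
)
# Pooled rate: weighted by the number of cases in each state.
by_region['pooled'] = by_region['deaths'] / by_region['cases']
print(by_region[['mean_of_states', 'pooled']].round(3))
```

In the toy Northeast, a small state with a high rate pulls the unweighted mean well above the pooled rate, so it helps to state which definition a quoted regional figure uses.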

In [None]:
fig = px.scatter(t, x="Poverty", y="Mortality", title="COVID-19 in Brazil: poverty vs mortality rate by state",
                 size="Cases", color="Mortality",hover_name=t.index, log_x=False, log_y=False, size_max=60)
fig.show()

Interestingly, the extreme poverty rate seems to have only a weak correlation with the mortality rate. A likely explanation is that the infection rate is higher in rich states, which have high demographic density and intense circulation of people.

## Features correlation
Based on the data exploration, the number of cases shows a stronger correlation with the following features:
* Deaths count
* Sub-notifications count
* Population size
* Airports traffic
* ICU beds count

In [None]:
t.replace({'North':0, 'Northeast': 1, 'Center-west': 2, 'South': 3, 'Southeast': 4}, inplace=True)

fig, ax = plt.subplots(figsize=(12,10))
sns.heatmap(t.select_dtypes('number').corr(), vmin=0, cmap='YlGnBu')  # correlate numeric columns only
plt.show();

In [None]:
corr = t.select_dtypes('number').corr().loc[['Cases', 'Deaths']].transpose()  # select rows by name, not position
corr = corr[(corr['Cases'] > 0.25)].sort_values('Cases', ascending=False)
features = corr.index.tolist()
features.append('Mortality')
features.remove('Private BPC')
print('Selected features:', features)

## Normalizing, binning and splitting
After the features selection, a subset of the states' dataset was created. This new subset will be used to store the normalized and binned data.

In [None]:
d = t[features].copy()
d.head()

Two helper functions, `create_bins()` and `normalize_data()`, were created to simplify the binning and normalizing steps.

In [None]:
def create_bins(df, columns, q=5):
    for column in columns:
        df[column] = pd.qcut(df[column], q, duplicates='drop').cat.codes

def normalize_data(df, columns):
    scaler = MinMaxScaler()
    # Scale the DataFrame passed in, not the global `d`.
    df[columns] = scaler.fit_transform(df[columns])

Then, all the selected features were split into 7 quantile bins, except `Cases`, `Deaths` and `Mortality`, which are important scalar features and keep their original values. The main objective of this segmentation is to create a classification scale with seven tiers. Next, all the features were normalized using `MinMaxScaler`, converting all the data to an interval between zero and one.
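
A minimal sketch of what the two steps do to a single column, using hypothetical population figures (in millions) for seven unnamed states:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical population figures for seven states, NOT the notebook's data.
toy = pd.DataFrame({'Population': [45.9, 21.2, 17.3, 11.4, 7.1, 3.0, 0.6]},
                   index=list('ABCDEFG'))

# pd.qcut splits the values into quantile bins of equal size; .cat.codes turns
# the resulting intervals into ordinal tiers starting at 0.
toy['Tier'] = pd.qcut(toy['Population'], q=7, duplicates='drop').cat.codes

# MinMaxScaler maps the tiers linearly onto the interval [0, 1].
toy['Tier scaled'] = MinMaxScaler().fit_transform(toy[['Tier']]).ravel()
print(toy)
```

With seven distinct values and seven bins, each state lands in its own tier, so the largest state ends at 1.0 and the smallest at 0.0. Binning discards the size of the gaps between states and keeps only their order.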

In [None]:
create_bins(d, ['Airport', 'Subnotif', 'ICU beds', 'Private beds', 'Population', 'Public beds', 'Cities', 'GDP rate', 'GDP', 'Density'], q=7)
normalize_data(d, d.columns)
d.head()

## K-means Clustering
Finally, a K-means Clustering model was created with 5 clusters, allowing for two levels below the average, one around it, and two above it. Then, the model was fit with the selected features and the clusters were predicted. The output was stored in both the normalized dataset and the original one.

In [None]:
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)  # fixed seed (arbitrary choice) so cluster labels are reproducible
pred = kmeans.fit_predict(d[d.columns])
t['K-means'], d['K-means'] = [pred, pred]
d[d.columns].sort_values(['K-means','Cases','Deaths'], ascending=False).style.background_gradient(cmap='YlGnBu', low=0, high=0.2)

As expected, the model classified the Brazilian states according to the correlations matrix shown above.
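
K-means assigns cluster ids arbitrarily, so the same group of states may receive a different number on another run. A minimal sketch, on made-up values, of renumbering the clusters by their mean number of cases so that the ids carry an order; the `Cluster` column name is hypothetical:

```python
import pandas as pd

# Hypothetical per-state table with raw K-means labels (values are made up).
toy = pd.DataFrame({
    'Cases': [8000, 7000, 300, 250, 1500, 1200],
    'K-means': [1, 1, 2, 2, 0, 0],
}, index=['A', 'B', 'C', 'D', 'E', 'F'])

# Rank the clusters by their mean number of cases (0 = fewest cases).
order = toy.groupby('K-means')['Cases'].mean().sort_values().index
relabel = {old: new for new, old in enumerate(order)}
toy['Cluster'] = toy['K-means'].map(relabel)
print(toy)
```

After relabeling, cluster 0 always holds the states with the fewest cases, which makes plots from different runs easier to compare.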

In [None]:
fig = px.treemap(t.reset_index(), path=['K-means','UF'], values='Cases')
fig.update_layout(title='K-means clusters')
fig.show()

In [None]:
c = t.sort_values(['K-means','Cases'], ascending=False)
data = [go.Bar(x=c[(c['K-means'] == i)].index, y=c[(c['K-means'] == i)]['Cases'],text=c[(c['K-means'] == i)]['Cases'],name=i) for i in range(0,5)]
fig = go.Figure(data=data)

fig.update_layout(title='K-means Clustering: number of cases by cluster',
                  xaxis_title='Brazilian states', yaxis_title='Number of cases', legend_title='<b>Clusters</b>',
                  legend=dict(x=0.02,y=0.5))
fig.show();

In [None]:
c = t.sort_values(['K-means','Mortality'], ascending=False)
data = [go.Bar(x=c[(c['K-means'] == i)].index, y=c[(c['K-means'] == i)]['Mortality'],text=c[(c['K-means'] == i)]['Mortality'],name=i) for i in range(0,5)]
data.append(go.Scatter(x=t.sort_values('Mortality', ascending=False).index,
           y=[0.034] * len(t.index), marker_color='black', name='World avg'))  # 3.4% global average quoted in the text

fig = go.Figure(data=data)
fig.update_layout(title='K-means Clustering: mortality rate by cluster',
                  xaxis_title='Brazilian states', yaxis_title='Deaths per case', legend_title='<b>Clusters</b>',
                  legend=dict(x=0.82,y=0.5))
fig.show();

---

# Results and discussion<a id='results'></a>
Each generated group is described below.

## Small, unprepared and deadly
* Composed of Piauí (PI), Paraíba (PB), Sergipe (SE), Alagoas (AL), Pará (PA), Maranhão (MA) and Rio Grande do Norte (RN);
* Those states present a small number of cases. However, they have the highest mortality rate, over 17%, indicating very poor infection monitoring;
* In addition, most of those states have a high demographic density, despite the reduced population size;
* This cluster is composed mainly of states from the Northeast region;
* Those states are slightly poorer than those in the Southeast and South regions, presenting a small GDP and a reduced number of ICU beds;
* Airport traffic is also very low compared to the other regions.

## Medium, touristic and contagious
* Composed of Pernambuco (PE), Amazonas (AM) and Ceará (CE);
* Those states present an intermediate number of cases. They also have a very high mortality rate, indicating poor infection monitoring;
* This cluster is composed mainly of states from the Northeast region;
* Those states also have an intermediate demographic density and population size, compared to the other states;
* The economy in this region is mainly based on tourism, presenting a small to intermediate GDP rate, compared to the other states;
* The number of ICU beds is slightly over the national average;
* Airport traffic is intense in this region, since it is a tourist destination.

## Big, well prepared and a bit safer
* Composed of Minas Gerais (MG), Paraná (PR), Bahia (BA), Santa Catarina (SC), Rio Grande do Sul (RS), Distrito Federal (DF), Espírito Santo (ES), Goiás (GO) and Mato Grosso (MT);
* Those states have an average mortality rate very close to, or under, the global average, indicating better infection monitoring and prevention;
* All the states have an intermediate number of cases;
* They have a high GDP, a high number of ICU beds, and relevant airport traffic;
* The population size and demographic density are smaller than in the first group, but remain large and relevant when compared to the other states;
* Airport traffic is intense, mostly related to business and tourism travel.

## Medium, distant and infected
* Composed of Rondônia (RO), Roraima (RR), Acre (AC), Mato Grosso do Sul (MS), Amapá (AP) and Tocantins (TO);
* Those states present a small number of cases. However, they have a high mortality rate, indicating poor infection monitoring;
* Most of those states also have a low demographic density and a reduced population size;
* This cluster is composed mainly of states from the North region;
* Those states are slightly poorer than those in other regions, presenting a small GDP and the smallest number of ICU beds;
* Airport traffic is the lowest, indicating little circulation of people.

## Huge, populous and highly infectious
* Composed of São Paulo (SP) and Rio de Janeiro (RJ);
* The two states combined are responsible for over 50% of the COVID-19 cases and have an average mortality rate that doubles the global average, indicating a high rate of sub-notifications;
* São Paulo ranks 1st in the number of cases and Rio de Janeiro ranks 2nd;
* Both states also have the highest number of deaths, compared to the other states;
* São Paulo and Rio de Janeiro are two colossi in terms of population size, demographic density, GDP rate, number of ICU beds, and airport traffic, when compared to the other states;
* Airport traffic is extremely intense, since the two states have the biggest international airports in Brazil.

---

# Conclusion <a id='conclusion'></a>
At the present date, Brazil has reached almost 20K cases and over 1K deaths. As seen in this report, the Brazilian pandemic has multiple foci distributed all over the country. States like São Paulo, Rio de Janeiro, and Ceará have the largest numbers of cases, which are doubling every 4 days. Due to the lack of available tests, Brazilian public healthcare services are testing just a small group of people, mostly the most severe cases. This poor monitoring leads to a higher number of sub-notified cases and, probably, a surge in the number of confirmed cases in the coming days.
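
A doubling time such as the one quoted above can be derived from two cumulative counts. A minimal sketch assuming exponential growth between the two observations; the counts and the function name are hypothetical:

```python
import math

def doubling_time(cases_start, cases_end, days):
    """Days needed to double, assuming exponential growth between two counts."""
    growth_rate = math.log(cases_end / cases_start) / days
    return math.log(2) / growth_rate

# Hypothetical counts: 1,000 cases growing to 4,000 in 8 days (two doublings).
print(round(doubling_time(1000, 4000, 8), 2))
```

In this example the count doubles twice in 8 days, which corresponds to a doubling time of 4 days.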

The population size and the traffic of passengers seem to have an important influence on the spread of the infection. The number of ICU beds and proper healthcare infrastructure also appear relevant to the mortality rate. Big states like São Paulo and Rio de Janeiro are suffering from the rapid spread of the infection due to their high demographic density. Despite their large healthcare infrastructure, those states can't respond to the pandemic surge quickly enough. On the other hand, poor states, mostly northern and northeastern ones, don't have proper healthcare infrastructure, considering the low number of ICU beds per citizen. This handicap leads to a very low rate of infection testing and monitoring, culminating in a very high mortality rate due to the high number of sub-notified cases. Some states, like Paraná, Santa Catarina, and Distrito Federal, present an intermediate scenario, due to a high GDP rate, a high number of ICU beds per citizen, and lower demographic density. Those states deal with a lower number of cases and good monitoring of the infection spread, resulting in a mortality rate under, or very close to, the global average.

Despite the recent regional quarantines and the mitigation efforts, Brazil will face its critical moment in the next few weeks, between April 15th and May 15th. Currently, social media and the traditional press have a very important role in reinforcing the importance of social distancing and other prevention efforts to avoid a healthcare system collapse. However, a growing number of people, including the Brazilian president himself, defend what they call the "vertical quarantine" instead of the traditional horizontal quarantine. Under this strategy, only the risk group would be isolated while healthy citizens would circulate normally. As seen in this report, the higher the demographic density, the higher the infection rate, leading to a rapid increase in the number of cases. Although 80% of infected people will have mild or no symptoms, 15% will need intensive healthcare.

Finally, based on this analysis, social isolation and the public efforts to prevent the rapid spread of the infection seem very appropriate, mainly in high-density areas and regions with poor healthcare services.

---

## Disclaimer
Due to the complex nature of a pandemic, this work is not intended to be an accurate projection or a model that reproduces the complexity of reality. The main goal of this project is to encourage reflection on the importance of social distancing, quarantine, and other infection prevention efforts to minimize the pandemic's effects and help flatten the infection curve.

### Thank you! Please, upvote if you like it. :)