# Overview: COVID-19 analysis in the Brazilian state of Maranhao

This notebook is an attempt to analyse COVID-19 cases in the Brazilian state of Maranhao. The data is available at [Covid-19 - Brazil by brasil.io](https://brasil.io/dataset/covid19/caso/) and [Covid-19 - by Raphael Fontes](https://www.kaggle.com/unanimad/corona-virus-brazil). It is also an opportunity to practice my recently acquired knowledge of data analysis. If you have any suggestions, please do not hesitate to leave them in the comment section.
![Covid-19](https://newslab.com.br/wp-content/uploads/2020/02/750x450_covid.jpg) 

## Methodology
* First, I extracted the provided data, then explored and organized it into a form suitable for plotting;
* Then I showed how cases, deaths and death rates are distributed across the state;
* Finally, I compared Maranhao with other states at the regional and national levels.

![MA](http://4.bp.blogspot.com/-Sj5s_wZfd54/TvUPE6W95II/AAAAAAAAL_4/I0-OIe_1puc/s400/mapa_localizacao_maranhao_no_Brasil.png)

This notebook was inspired by the notebook made by Elloá B. Guedes.

Check it out: [Overview - Brazil](https://www.kaggle.com/elloaguedes/panorama-do-covid-19-no-brasil?rvi=1)

**Disclaimer: This notebook does not intend to be an accurate analysis that tries to reproduce the complexity of reality. The information in this notebook provides an approach to better understand the situation through data visualization.**



### Imports

In [None]:
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
import json
%matplotlib inline

# Bokeh
from bokeh.io import output_file,show,output_notebook,push_notebook
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource,HoverTool,CategoricalColorMapper
from bokeh.layouts import row,column,gridplot
from bokeh.models.widgets import Tabs,Panel
from bokeh.models import GeoJSONDataSource
output_notebook()

# plotly packages
import plotly.express as px
from plotly.subplots import make_subplots
from plotly.graph_objs import *
import plotly.graph_objects as go



# Data cleaning & organization

### Dataframe Maranhao

In [None]:
# Dataframe COVID-19 - Maranhao
path= "../input/covid19-ma/covid19_ma.csv"
df = pd.read_csv(path, encoding ='utf-8')
df.head(5)

In [None]:
# Basic info
df.info()

In [None]:
# Selecting the columns we want
df_ma = df[['date','epidemiological_week','city_ibge_code','city','estimated_population',
            'last_available_confirmed','new_confirmed','last_available_deaths','new_deaths',
            'last_available_confirmed_per_100k_inhabitants','last_available_death_rate']]

In [None]:
# Transforming NaN cells
df_ma = df_ma.fillna(0)
df_ma.head(3)

In [None]:
# Adding the region column for further analysis: Capital, Metropolitan, Countryside
# Imperatriz gets its own label so it can be analysed separately
df_ma['region'] = df_ma.city
def region_func(row):
    if row.region == 'São Luís':
        row.region = 'Capital'
    elif row.region == 'Imperatriz':
        row.region = 'Imperatriz'
    elif row.region == 'Raposa':
        row.region = 'Metropolitan'
    elif row.region == 'São José de Ribamar':
        row.region = 'Metropolitan'
    elif row.region == 'Paço do Lumiar':
        row.region = 'Metropolitan'
    else:
        row.region = 'Countryside'
    return row

df_ma = df_ma.apply(region_func, axis='columns')
df_ma.head(3)
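
The row-wise function above can also be written as a dictionary lookup. The following is a minimal sketch on a toy frame (`toy` and `region_by_city` are illustrative names, not part of the notebook). It should produce the same labels, assuming every city that is not listed counts as Countryside.

```python
import pandas as pd

# Toy stand-in for df_ma; only the 'city' column matters here.
toy = pd.DataFrame({'city': ['São Luís', 'Imperatriz', 'Raposa',
                             'Paço do Lumiar', 'Caxias']})

# Explicit city -> region lookup; any city not listed falls back to Countryside.
region_by_city = {
    'São Luís': 'Capital',
    'Imperatriz': 'Imperatriz',
    'Raposa': 'Metropolitan',
    'São José de Ribamar': 'Metropolitan',
    'Paço do Lumiar': 'Metropolitan',
}
toy['region'] = toy['city'].map(region_by_city).fillna('Countryside')
print(toy)
```

A lookup like this avoids calling a Python function once per row, which tends to matter as the frame grows.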

In [None]:
# Dropping rows whose city is 0, i.e. rows that had no city name before fillna(0)
zero_rows = df_ma[df_ma['city'] == 0].index
df_ma.drop(zero_rows, inplace = True)
df_ma.tail()
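
The filter above works because `fillna(0)` turned missing city names into the number 0. The sketch below reproduces that on a toy frame with made-up values and compares it with an alternative ordering (drop the rows without a city first, then fill). All names in it are illustrative.

```python
import pandas as pd

# Toy frame with made-up values; the second row has no city name.
toy = pd.DataFrame({
    'city': pd.Series(['São Luís', None], dtype=object),
    'new_confirmed': [3.0, None],
})

# Same two steps as the notebook: fill, then keep rows whose city is not 0.
filled = toy.fillna(0)
via_zero = filled[filled['city'] != 0]

# Alternative: drop rows with no city first, then fill the remaining gaps.
via_dropna = toy.dropna(subset=['city']).fillna(0)

print(via_zero)
print(via_dropna)
```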

## Which period does the dataframe cover?

In [None]:
import datetime
print('First confirmed case: ', datetime.datetime.strptime(df.date.min(), '%Y-%m-%d').strftime('%d/%m/%Y'))
print('Last confirmed case: ', datetime.datetime.strptime(df.date.max(), '%Y-%m-%d').strftime('%d/%m/%Y'))

In [None]:
# Last available date as a dd/mm/yyyy string (used in the plot titles)
last_date = datetime.datetime.strptime(df_ma.date.max(), '%Y-%m-%d').strftime('%d/%m/%Y')
print(last_date)

# Data analysis

**First, we divide the state into three regions: Capital, Countryside and Metropolitan. The Metropolitan region consists of the cities of Raposa, Paço do Lumiar and São José de Ribamar.**
* Imperatriz gets its own dataset so we can look at how COVID-19 is behaving there specifically, since it is the second-largest city in Maranhao by population.

In [None]:
# Dividing by region
countryside = df_ma[df_ma['region']== 'Countryside']
capital = df_ma[df_ma['region']== 'Capital']
metropolitan = df_ma[df_ma['region']== 'Metropolitan']
# Adding Imperatriz
itz = df_ma[df_ma['city']== 'Imperatriz']

In [None]:
# Table - Countryside
# Aggregating data: confirmed and deaths
cases = countryside.groupby(['date']).last_available_confirmed.sum()
deaths = countryside.groupby(['date']).last_available_deaths.sum()


# Creating a table
tabelax = pd.DataFrame({"cases": cases}).reset_index()
tabelay = pd.DataFrame({'deaths': deaths}).reset_index()

# Joining the tables
tabela_countryside = tabelax.join(tabelay.set_index('date'), on = 'date')

# Adding death rate to the table
# Death rate is defined by the number of deaths divided by the number of confirmed cases
tabela_countryside['death_rate'] = tabela_countryside['deaths']/tabela_countryside['cases']

# Number of new cases every day
tabela_countryside['new_cases'] = tabela_countryside.cases.diff()

# Number of new deaths every day
tabela_countryside['new_deaths'] = tabela_countryside.deaths.diff()
tabela_countryside.tail()

In [None]:
# Table - Capital
# Aggregating data: confirmed and deaths
cases = capital.groupby(['date']).last_available_confirmed.sum()
deaths = capital.groupby(['date']).last_available_deaths.sum()


# Creating a table
tabelax = pd.DataFrame({"cases": cases}).reset_index()
tabelay = pd.DataFrame({'deaths': deaths}).reset_index()

# Joining the tables
tabela_capital = tabelax.join(tabelay.set_index('date'), on = 'date')

# Adding death rate to the table
# Death rate is defined by the number of deaths divided by the number of confirmed cases
tabela_capital['death_rate'] = tabela_capital['deaths']/tabela_capital['cases']

# Number of new cases every day
tabela_capital['new_cases'] = tabela_capital.cases.diff()

# Number of new deaths every day
tabela_capital['new_deaths'] = tabela_capital.deaths.diff()
tabela_capital.tail()

In [None]:
# Table - Metropolitan
# Aggregating data: confirmed and deaths
cases = metropolitan.groupby(['date']).last_available_confirmed.sum()
deaths = metropolitan.groupby(['date']).last_available_deaths.sum()


# Creating a table
tabelax = pd.DataFrame({"cases": cases}).reset_index()
tabelay = pd.DataFrame({'deaths': deaths}).reset_index()

# Joining the tables
tabela_metropolitan = tabelax.join(tabelay.set_index('date'), on = 'date')

# Adding death rate to the table
# Death rate is defined by the number of deaths divided by the number of confirmed cases
tabela_metropolitan['death_rate'] = tabela_metropolitan['deaths']/tabela_metropolitan['cases']

# Number of new cases every day
tabela_metropolitan['new_cases'] = tabela_metropolitan.cases.diff()

# Number of new deaths every day
tabela_metropolitan['new_deaths'] = tabela_metropolitan.deaths.diff()
tabela_metropolitan.tail()

In [None]:
# Table - Imperatriz
# Aggregating data: confirmed and deaths
cases = itz.groupby(['date']).last_available_confirmed.sum()
deaths = itz.groupby(['date']).last_available_deaths.sum()

# Creating a table
tabelax = pd.DataFrame({"cases": cases}).reset_index()
tabelay = pd.DataFrame({'deaths': deaths}).reset_index()

# Joining the tables
tabela_itz = tabelax.join(tabelay.set_index('date'), on = 'date')

# Adding death rate to the table
# Death rate is defined by the number of deaths divided by the number of confirmed cases
tabela_itz['death_rate'] = tabela_itz['deaths']/tabela_itz['cases']

# Number of new cases every day
tabela_itz['new_cases'] = tabela_itz.cases.diff()

# Number of new deaths every day
tabela_itz['new_deaths'] = tabela_itz.deaths.diff()
tabela_itz.tail()
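
The four tables above follow the same recipe, so the steps could be wrapped in one function. This is a minimal sketch, assuming a hypothetical helper named `build_region_table` and a toy frame with made-up numbers. It is not the code the notebook actually runs.

```python
import pandas as pd

def build_region_table(frame):
    """Aggregate a city-level frame into one row per date."""
    table = (frame.groupby('date')[['last_available_confirmed',
                                    'last_available_deaths']]
                  .sum()
                  .rename(columns={'last_available_confirmed': 'cases',
                                   'last_available_deaths': 'deaths'})
                  .reset_index())
    # Death rate: cumulative deaths divided by cumulative confirmed cases
    table['death_rate'] = table['deaths'] / table['cases']
    # Day-over-day differences of the cumulative counts
    table['new_cases'] = table['cases'].diff()
    table['new_deaths'] = table['deaths'].diff()
    return table

# Two made-up cities reported on two dates.
toy = pd.DataFrame({
    'date': ['2020-04-01', '2020-04-01', '2020-04-02', '2020-04-02'],
    'last_available_confirmed': [10, 5, 14, 6],
    'last_available_deaths': [1, 0, 2, 0],
})
toy_table = build_region_table(toy)
print(toy_table)
```

With a helper like this, each regional table would reduce to a call such as `build_region_table(countryside)`.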

### Total confirmed cases per region

In [None]:
fig_region = go.Figure()
fig_region.add_trace(go.Scatter(x=tabela_countryside['date'], y=tabela_countryside['cases'] , 
                         line=dict(color='green', width=3),
                         name = 'Other regions'))
fig_region.add_trace(go.Scatter(x=tabela_capital['date'], y=tabela_capital['cases'] , 
                         line=dict(color='navy', width=3),
                         name = 'São Luís'))
fig_region.add_trace(go.Scatter(x=tabela_metropolitan['date'], y=tabela_metropolitan['cases'] , 
                         line=dict(color='darkorange', width=3),
                         name = 'Metropolitan region'))
fig_region.add_trace(go.Scatter(x=tabela_itz['date'], y=tabela_itz['cases'] , 
                         line=dict(color='steelblue', width=3),
                         name = 'Imperatriz'))
fig_region.update_xaxes(title = 'Date')
fig_region.update_yaxes(title = 'Count (log scale)')
fig_region.update_layout(yaxis_type="log",hovermode='x',
                         width=700,height=700,
                         title_text = 'Total confirmed cases per region up to ' + last_date,
                         font=dict(family="Arial, monospace",size=12,color="black"))
fig_region.show()

### New cases per region

In [None]:
layout = Layout(
    paper_bgcolor='rgba(0,0,0,0)',
    plot_bgcolor='rgba(0,0,0,0)',)

fig_daily = make_subplots(rows= 4, cols=1,subplot_titles=('São Luís', 
                                                   'Imperatriz','Metropolitan region','Other regions'
                                                 ))
# São Luís
fig_daily.append_trace(go.Scatter(x=tabela_capital['date'], y=tabela_capital['new_cases'],
                             line_color='navy', fill='tonexty', name = ""),row=1, col=1)
# Imperatriz                                       
fig_daily.append_trace(go.Scatter(x=tabela_itz['date'], y=tabela_itz['new_cases'],
                             line_color='steelblue' ,fill = 'tonexty',name = ""),row=2, col=1)

# Metropolitan region
fig_daily.append_trace(go.Scatter(x=tabela_metropolitan['date'], y=tabela_metropolitan['new_cases'],
                             line_color='darkorange',fill = 'tonexty' ,name = ""),row=3, col=1)


# Other regions
fig_daily.append_trace(go.Scatter(x=tabela_countryside['date'], y=tabela_countryside['new_cases'],
                             line_color='green',fill = 'tonexty' ,name = ""),row=4, col=1)

# Layout
fig_daily.update_yaxes(showticklabels=True)
fig_daily.update_layout(width=700,height=1600,
                   showlegend = False,yaxis={'categoryorder':'total ascending'},
                  title_text="New confirmed cases up to " + last_date)
fig_daily['layout'].update(layout)
fig_daily.show()

### Total deaths per region

In [None]:
fig_region3 = go.Figure()
fig_region3.add_trace(go.Scatter(x=tabela_countryside['date'], y=tabela_countryside['deaths'] , 
                         line=dict(color='green', width=3),
                         name = 'Other regions'))
fig_region3.add_trace(go.Scatter(x=tabela_capital['date'], y=tabela_capital['deaths'] , 
                         line=dict(color='navy', width=3),
                         name = 'São Luís'))
fig_region3.add_trace(go.Scatter(x=tabela_metropolitan['date'], y=tabela_metropolitan['deaths'] , 
                         line=dict(color='darkorange', width=3),
                         name = 'Metropolitan region'))
fig_region3.add_trace(go.Scatter(x=tabela_itz['date'], y=tabela_itz['deaths'] , 
                         line=dict(color='steelblue', width=3),
                         name = 'Imperatriz'))
fig_region3.update_xaxes(title = 'Date')
fig_region3.update_yaxes(title = 'Count')
fig_region3.update_layout(hovermode='x',
                         width=700,height=500,
                         title_text = 'Total deaths per region up to ' + last_date,
                         font=dict(family="Arial, monospace",size=12,color="black"))
fig_region3.show()

### New deaths per region

In [None]:
layout = Layout(
    paper_bgcolor='rgba(0,0,0,0)',
    plot_bgcolor='rgba(0,0,0,0)',)

fig_daily2 = make_subplots(rows= 4, cols=1,subplot_titles=('São Luís', 
                                                   'Imperatriz','Metropolitan region','Other regions'
                                                   ))
# São Luís
fig_daily2.append_trace(go.Scatter(x=tabela_capital['date'], y=tabela_capital['new_deaths'],
                             line_color='navy',fill = 'tonexty' ,name = ""),row=1, col=1)

# Imperatriz                                       
fig_daily2.append_trace(go.Scatter(x=tabela_itz['date'], y=tabela_itz['new_deaths'],
                             line_color='steelblue',fill = 'tonexty',name = ""),row=2, col=1)

# Metropolitan region
fig_daily2.append_trace(go.Scatter(x=tabela_metropolitan['date'], y=tabela_metropolitan['new_deaths'],
                             line_color='darkorange',fill = 'tonexty',name = ""),row=3, col=1)

# Other regions
fig_daily2.append_trace(go.Scatter(x=tabela_countryside['date'], y=tabela_countryside['new_deaths'],
                             line_color='green',fill = 'tonexty' ,name = ""),row=4, col=1)


# Layout
fig_daily2.update_yaxes(showticklabels=True)
fig_daily2.update_layout(width=700,height=1500,
                   showlegend = False,yaxis={'categoryorder':'total ascending'},
                  title_text="New confirmed deaths up to " + last_date)
fig_daily2['layout'].update(layout)
fig_daily2.show()

In [None]:
# Dataframe Deaths - Details
path_deaths = '../input/covid19-ma/obitos_detalhes_covid19_ma.csv'
deaths = pd.read_csv(path_deaths, encoding ='latin1', sep = ';')

## Geographical analysis

In [None]:
# Sorting by confirmed cases so each city's highest cumulative count comes first
tabela2 = df_ma.sort_values(by=['last_available_confirmed'], ascending=False).copy()
# Dropping duplicates and keeping the first row per city
tabela2 = tabela2.drop_duplicates(subset='city', keep='first')

In [None]:
df_ma['date'] = pd.to_datetime(df_ma['date'])
df_15 = df_ma.groupby('city').apply(lambda x: x.set_index('date').resample('1D').last())
# Table with new confirmed cases in the last 15 days
tabela_new_confirmed = df_15.groupby(level=0)['new_confirmed'].apply(
    lambda x: x.shift().rolling(min_periods=1,window=15).sum()).reset_index(
    name='new_confirmed_last_15_days')
tabela_new_confirmed= tabela_new_confirmed.sort_values(by=['date',
                                     'new_confirmed_last_15_days'], ascending=False).copy()
tabela_new_confirmed= tabela_new_confirmed.drop_duplicates(subset='city',keep ='first')

## Table with new deaths in the last 15 days
tabela_new_deaths = df_15.groupby(level=0)['new_deaths'].apply(
    lambda x: x.shift().rolling(min_periods=1,window=15).sum()).reset_index(
    name='new_deaths_last_15_days')
tabela_new_deaths= tabela_new_deaths.sort_values(by=['date',
                                     'new_deaths_last_15_days'], ascending=False).copy()
tabela_new_deaths= tabela_new_deaths.drop_duplicates(subset='city',keep ='first')
#Merging the tables
tabela_last_15_days = pd.merge(tabela_new_confirmed,tabela_new_deaths, on = ['city','date'], how ='left')
tabela_last_15_days.head(10)
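
The `shift().rolling(...).sum()` pattern above is easier to follow on a small series. This is a minimal sketch with made-up daily counts, where a window of 3 days stands in for the 15 days used in the notebook:

```python
import pandas as pd

# Made-up daily counts for five consecutive days.
daily = pd.Series([1, 2, 3, 4, 5],
                  index=pd.date_range('2020-05-01', periods=5, freq='D'))

# shift() moves every value one day forward, so the window that ends on a
# given day covers the previous days and excludes that day itself.
# min_periods=1 lets the sum start as soon as one earlier value exists.
window_sum = daily.shift().rolling(window=3, min_periods=1).sum()
print(window_sum)
```

The first day has no earlier value, so its result is NaN. From the fourth day on, each value is the sum of the three preceding days.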

In [None]:
# Adding deaths and confirmed cases per 10k inhabitants
tabela2['last_available_deaths_per_10k_inhabitants'] = tabela2['last_available_deaths']/(tabela2['estimated_population']/10000)
tabela2['last_available_confirmed_per_10k_inhabitants'] = tabela2['last_available_confirmed']/(tabela2['estimated_population']/10000) 
tabela2.head()

### The cities with the most confirmed cases in the state

In [None]:
layout = Layout(
    paper_bgcolor='rgba(0,0,0,0)',
    plot_bgcolor='rgba(0,0,0,0)',)

fig3 = make_subplots(rows= 2, cols=1,subplot_titles=('Totals', 
                                                   'Per 10k inhabitants'
                                                   ))
# Confirmed
tabela2_by_confirmed = tabela2.sort_values(by = 'last_available_confirmed', ascending=False)
top10_confirmed = tabela2_by_confirmed.head(10)
fig3.append_trace(go.Bar(name = 'Total',
                         y=top10_confirmed['last_available_confirmed'], x=top10_confirmed['city'],
                             marker_color='darkblue',text = top10_confirmed['last_available_confirmed']), 
                  row=1, col=1)
fig3.update_traces(texttemplate='%{text:.0f}', textposition='auto')
fig3.update_yaxes(showticklabels=False)

# Confirmed by 10k
tabela2_by_k = tabela2.sort_values(by = 'last_available_confirmed_per_10k_inhabitants', ascending=False)
top10_k = tabela2_by_k.head(10)                                       
fig3.append_trace(go.Bar(name = 'Per 10k',
                         y=top10_k['last_available_confirmed_per_10k_inhabitants'], x=top10_k['city'],
                            marker_color='blue',text = top10_k['last_available_confirmed_per_10k_inhabitants']) ,
                  row=2, col=1)
fig3.update_yaxes(showticklabels=False)
fig3.update_traces(texttemplate='%{text:.0f}', textposition='auto')

# Layout
fig3.update_layout(width=700,height=650,
                   showlegend = False,yaxis={'categoryorder':'total ascending'},
                  title_text="Cities with the most confirmed cases up to " + last_date)
fig3['layout'].update(layout)
fig3.show()

### The cities with the most deaths in the state

In [None]:
layout = Layout(
    paper_bgcolor='rgba(0,0,0,0)',
    plot_bgcolor='rgba(0,0,0,0)',)

fig4 = make_subplots(rows= 2, cols=1,subplot_titles=('Totals', 
                                                   'Per 10k inhabitants'
                                                   ))
# Deaths
tabela2_by_deaths = tabela2.sort_values(by = 'last_available_deaths', ascending=False)
top10_deaths = tabela2_by_deaths.head(10)
fig4.append_trace(go.Bar(name = 'Total',
                         y=top10_deaths['last_available_deaths'], x=top10_deaths['city'],
                             marker_color='darkred',text = top10_deaths['last_available_deaths']), 
                  row=1, col=1)
fig4.update_traces(texttemplate='%{text:.0f}', textposition='auto')
fig4.update_yaxes(showticklabels=False)

# Deaths by 10k
tabela2_deaths_by_k = tabela2.sort_values(by = 'last_available_deaths_per_10k_inhabitants', ascending=False)
top10_deaths_k = tabela2_deaths_by_k.head(10)                                       
fig4.append_trace(go.Bar(name = 'Per 10k',
                         y=top10_deaths_k['last_available_deaths_per_10k_inhabitants'], x=top10_deaths_k['city'],
                            marker_color='red',text = top10_deaths_k['last_available_deaths_per_10k_inhabitants']) ,
                  row=2, col=1)
fig4.update_yaxes(showticklabels=False)
fig4.update_traces(texttemplate='%{text:.0f}', textposition='auto')

# Layout
fig4.update_layout(width=700,
    height=650,showlegend = False,yaxis={'categoryorder':'total ascending'},
                  title_text="Cities with the most deaths up to " + last_date)
fig4['layout'].update(layout)
fig4.show()

### How are the cases spread across the state?

In [None]:
# Loading the json file of Maranhao
path2 = "../input/majson/data/MA.json"
df_mapa = gpd.read_file(path2) 
df_mapa.head()

In [None]:
# Transforming GEOCODIGO and city_ibge_code to str
df_mapa['GEOCODIGO'] = df_mapa['GEOCODIGO'].astype(str)
tabela2['city_ibge_code'] = tabela2['city_ibge_code'].astype(str)

In [None]:
# Setting the indexes
tabela2.set_index('city_ibge_code', inplace = True)
df_mapa.set_index('GEOCODIGO', inplace = True)

In [None]:
# Renaming axes
tabela2.rename_axis('code', inplace = True)
df_mapa.rename_axis('code', inplace = True)

In [None]:
# Merging the tables
tabela3 = pd.merge(df_mapa, tabela2, on='code', how='left')
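
A left join keeps every municipality from the map, including those with no rows in the case table. One way to see which ones came out empty is the `indicator` option of `pd.merge`. The sketch below uses plain pandas frames with placeholder codes and names rather than the real GeoDataFrame. `shapes`, `stats` and their values are made up for illustration.

```python
import pandas as pd

# Toy stand-ins: 'shapes' plays the role of df_mapa, 'stats' of tabela2.
shapes = pd.DataFrame({'NOME': ['A', 'B', 'C']},
                      index=pd.Index(['001', '002', '003'], name='code'))
stats = pd.DataFrame({'last_available_confirmed': [12, 7]},
                     index=pd.Index(['001', '003'], name='code'))

# indicator=True adds a '_merge' column saying where each row came from.
merged = pd.merge(shapes, stats, on='code', how='left', indicator=True)

# Rows present only in the left frame have no case data attached.
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched)
```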

In [None]:
# Replacing NaN cells in the death_rate column
tabela3['last_available_death_rate'] = tabela3['last_available_death_rate'].fillna(0)
# Replacing the remaining NaN cells with 'Sem casos' ("No cases")
tabela3.fillna('Sem casos', inplace = True)
tabela3.tail(3)

In [None]:
# Manipulating GeoJSON data
merged_json = json.loads(tabela3.to_json())
json_data = json.dumps(merged_json)
geosource = GeoJSONDataSource(geojson = json_data)

In [None]:
import json
import math
from bokeh.io import show
from bokeh.models import (CDSView, ColorBar, ColumnDataSource,
                          CustomJS, CustomJSFilter, 
                          GeoJSONDataSource, HoverTool,
                          LinearColorMapper,LogColorMapper, LogTicker)
from bokeh.layouts import column, row, widgetbox
from bokeh.palettes import magma
from bokeh.plotting import figure

# Color palette
palette = magma(256)

# Reversing the color palette
palette = palette[::-1]

# Setting the color mapper's high value
#high = int(math.ceil(tabelaslz['confirmados'].max()/ 1000.0)) * 1000

# Color mapping the values
color_mapper = LogColorMapper(palette = palette, low = 1, 
                              high = 10000, nan_color = '#F8F8FF')

# Custom tick labels
tick_labels = {1:'1', 10:'10', 100:'100', 1000:'1000', 10000:'>10000'}

# Color bar creation
color_bar = ColorBar(color_mapper = color_mapper, 
                     label_standoff = 8,
                     width = 450, height = 15,
                     border_line_color = None,
                     location = (0,1), 
                     orientation = 'horizontal',
                     ticker = LogTicker(),
                     major_label_overrides = tick_labels
                     )

# Figure object
p = figure(title = 'Confirmed cases '+ last_date, 
           plot_height = 550 ,
           plot_width = 500, 
           toolbar_location = 'right',
           tools = 'pan, wheel_zoom, box_zoom, reset')
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.xaxis.visible = False
p.yaxis.visible = False

# Patch to render
render = p.patches('xs','ys', source = geosource,
                   fill_color = {'field' :'last_available_confirmed',
                                 'transform' : color_mapper},
                   line_color = 'gray', 
                   line_width = 1, 
                   fill_alpha = 1)
# Creating hover tool
p.add_tools(HoverTool(renderers = [render],
                      tooltips = [('City','@NOME'),
                                  ('Population (IBGE, 2019)','@estimated_population{,}'),
                                  ('Total confirmed','@last_available_confirmed'),
                                  ('Total deaths','@last_available_deaths'),
                                  ('Case fatality rate','@last_available_death_rate{0.1f%}')], 
                                  formatters={'death_rate':'printf',
                                              'estimated_population_2019': 'numeral'}
                                 ))

p.add_layout(color_bar, 'above')

show(p)
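
The map uses a logarithmic color scale because case counts span several orders of magnitude. The sketch below is a rough, plain-Python approximation of how a value could be placed on such a scale. It is not Bokeh's actual implementation, and `log_palette_index` is a hypothetical helper. The defaults mirror the `low`, `high` and palette size chosen above.

```python
import math

def log_palette_index(value, low=1, high=10000, n_colors=256):
    """Approximate position of `value` on a log scale split into n_colors bins."""
    # Values outside [low, high] are clamped to the ends of the scale.
    clipped = min(max(value, low), high)
    fraction = math.log10(clipped / low) / math.log10(high / low)
    return min(int(fraction * n_colors), n_colors - 1)

for count in (1, 100, 10000, 50000):
    print(count, log_palette_index(count))
```

On this scale a city with 100 cases lands in the middle of the palette, halfway between 1 and 10000, which keeps small outbreaks visually distinguishable from each other.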

In [None]:
tests_path = '../input/covid19-ma/testes_covid19_ma.csv'
tests = pd.read_csv(tests_path, encoding ='latin1', sep = ';',skipfooter = 3, 
                    engine = 'python',
                    infer_datetime_format = True,
                    dayfirst= True,
                    thousands = '.',
                    decimal = ','
                   )
tests.tail()

## How does Maranhao compare on a regional and national basis?

### Regional Analysis

![Nordeste](http://2.bp.blogspot.com/-DXKo_6Loaik/TV3uzGbaNfI/AAAAAAAAArE/qq9NSbP-kl8/s1600/mapa_nordeste1.jpg)

In [None]:
# Dataframe COVID-19 - Brazil
path3 = "../input/corona-virus-brazil/brazil_covid19.csv"
df3 = pd.read_csv(path3, encoding ='utf-8')
# Rename columns
df3.rename(columns={'state':'uf'}, inplace = True)
df3.tail(5)

In [None]:
# Dataframe with important information about the states
path4 = "../input/brazilianstates/states.csv"
df_states = pd.read_csv(path4, encoding = 'utf-8')
df_states.tail(5)

In [None]:
# Lowercasing the column names
df_states = df_states.rename(str.lower, axis='columns')
# Selecting columns we want
df_states = df_states[['uf', 'state','area', 'population',
       'demographic density', 'cities count', 'gdp', 'gdp rate', 'poverty']]
df_states.columns

In [None]:
# Selecting Nordeste region where Maranhao is located
df_ne = df3.copy()
rows_ne = df_ne[df_ne['region'] != 'Nordeste'].index
df_ne.drop(rows_ne, inplace = True)
print(df_ne.region.unique())

In [None]:
# Confirmed cases and deaths
tabela4 = df_ne.groupby(['date','uf'])[['cases','deaths']].agg('sum')
tabela4 = tabela4.reset_index()

In [None]:
tabela4 = pd.merge(tabela4,df_states, on = 'uf', how ='left')
tabela4.tail(3)

In [None]:
# Removing zero rows in the column 'cases'
zero_rows = tabela4[tabela4['cases'] == 0].index
tabela4.drop(zero_rows, inplace = True)

In [None]:
# Adding death rate
tabela4['death_rate'] = tabela4['deaths']/tabela4['cases']
# Adding cases per 100k inhabitants
tabela4['cases_per_100k'] = tabela4['cases']/(tabela4['population']/100000)
# Adding deaths per 100k inhabitants
tabela4['deaths_per_100k'] = tabela4['deaths']/(tabela4['population']/100000)
tabela4.tail()
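
To make the three derived columns concrete, here is the same arithmetic on a single row. The figures are made up for a hypothetical state and only illustrate the calculation.

```python
import pandas as pd

# Made-up figures for a hypothetical state, purely to show the arithmetic.
toy = pd.DataFrame({'cases': [5000], 'deaths': [250], 'population': [2_000_000]})

# Death rate: deaths divided by confirmed cases
toy['death_rate'] = toy['deaths'] / toy['cases']
# Per-100k values: count divided by the population expressed in units of 100,000
toy['cases_per_100k'] = toy['cases'] / (toy['population'] / 100_000)
toy['deaths_per_100k'] = toy['deaths'] / (toy['population'] / 100_000)
print(toy)
```

Here 2,000,000 inhabitants are 20 units of 100,000, so 5000 cases become 250 cases per 100k and 250 deaths become 12.5 deaths per 100k.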

In [None]:
# Subsets by state in the Nordeste region
ma = tabela4[tabela4['state']== 'Maranhão']
al = tabela4[tabela4['state']=='Alagoas']
ba = tabela4[tabela4['state']=='Bahia']
ce = tabela4[tabela4['state']=='Ceará']
pb = tabela4[tabela4['state']=='Paraíba']
pe = tabela4[tabela4['state']=='Pernambuco']
pi = tabela4[tabela4['state']=='Piauí']
rn = tabela4[tabela4['state']=='Rio Grande do Norte']
se = tabela4[tabela4['state']=='Sergipe']
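
The nine subsets above could also be collected in a dictionary keyed by state code. This is a minimal sketch on a toy frame with made-up numbers. `by_state` is an illustrative name, and the sketch groups on the `uf` column instead of the full state name used above.

```python
import pandas as pd

# Made-up cumulative counts for two states on two dates.
toy = pd.DataFrame({
    'date': ['2020-05-01', '2020-05-01', '2020-05-02', '2020-05-02'],
    'uf': ['MA', 'CE', 'MA', 'CE'],
    'cases': [100, 300, 120, 350],
})

# One DataFrame per state, keyed by its two-letter code.
by_state = {uf: group for uf, group in toy.groupby('uf')}
print(by_state['MA'])
```

A dictionary like this makes it possible to loop over the states when adding traces to a figure, instead of repeating one block per state.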

## Confirmed cases comparison - Regional

### Total confirmed cases

In [None]:
fig6 = go.Figure()
##### TOTAL ######
# Maranhão
fig6.add_trace(go.Scatter(x=ma['date'], y=ma['cases'] , 
                         line=dict(color='navy', width=3),
                         name = 'MA'))
# Alagoas
fig6.add_trace(go.Scatter(x=al['date'], y=al['cases'] , 
                         line=dict(color='darksalmon', width=2,dash = 'dash'), 
                         name = 'AL'))
# Bahia
fig6.add_trace(go.Scatter(x=ba['date'], y=ba['cases'] , 
                         line=dict(color='darkred', width=2, dash = 'dash'), 
                         name = 'BA'))
# Ceará
fig6.add_trace(go.Scatter(x=ce['date'], y=ce['cases'] , 
                         line=dict(color='darkkhaki', width=2, dash = 'dash'), 
                         name = 'CE'))
# Paraíba
fig6.add_trace(go.Scatter(x=pb['date'], y=pb['cases'] , 
                         line=dict(color='darkorange', width=2, dash = 'dash'), 
                         name = 'PB'))
# Pernambuco
fig6.add_trace(go.Scatter(x=pe['date'], y=pe['cases'] , 
                         line=dict(color='steelblue', width=2, dash = 'dash'), 
                         name = 'PE'))
# Piauí
fig6.add_trace(go.Scatter(x=pi['date'], y=pi['cases'] , 
                         line=dict(color='darkgreen', width=2, dash = 'dash'), 
                         name = 'PI'))
# Rio Grande do Norte
fig6.add_trace(go.Scatter(x=rn['date'], y=rn['cases'] , 
                         line=dict(color='darkviolet', width=2, dash = 'dash'), 
                         name = 'RN'))
# Sergipe
fig6.add_trace(go.Scatter(x=se['date'], y=se['cases'] , 
                         line=dict(color='crimson', width=2, dash = 'dash'), 
                         name = 'SE'))
fig6.update_layout(width=600,
    height=500,
    title={
        'text': "Total confirmed cases",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title="Date",
    yaxis_title="Count (log scale)",
    font=dict(
        family="Arial, monospace",
        size=12,
        color="black"))
fig6.update_traces(hovertemplate= None)
fig6.update_layout(hovermode='x', yaxis_type="log")
fig6['layout'].update(layout)
fig6.show()

### Confirmed cases per 100k inhabitants

In [None]:
##### Per 100k Inhabitants ######
fig7 = go.Figure()
# Maranhão
fig7.add_trace(go.Scatter(x=ma['date'], y=ma['cases_per_100k'] , 
                         line=dict(color='navy', width=3),
                         name = 'MA'))
# Alagoas
fig7.add_trace(go.Scatter(x=al['date'], y=al['cases_per_100k'] , 
                         line=dict(color='darksalmon', width=2,dash = 'dash'), 
                         name = 'AL'))
# Bahia
fig7.add_trace(go.Scatter(x=ba['date'], y=ba['cases_per_100k'] , 
                         line=dict(color='darkred', width=2, dash = 'dash'), 
                         name = 'BA'))
# Ceará
fig7.add_trace(go.Scatter(x=ce['date'], y=ce['cases_per_100k'] , 
                         line=dict(color='darkkhaki', width=2, dash = 'dash'), 
                         name = 'CE'))
# Paraíba
fig7.add_trace(go.Scatter(x=pb['date'], y=pb['cases_per_100k'] , 
                         line=dict(color='darkorange', width=2, dash = 'dash'), 
                         name = 'PB'))
# Pernambuco
fig7.add_trace(go.Scatter(x=pe['date'], y=pe['cases_per_100k'] , 
                         line=dict(color='steelblue', width=2, dash = 'dash'), 
                         name = 'PE'))
# Piauí
fig7.add_trace(go.Scatter(x=pi['date'], y=pi['cases_per_100k'] , 
                         line=dict(color='darkgreen', width=2, dash = 'dash'), 
                         name = 'PI'))
# Rio Grande do Norte
fig7.add_trace(go.Scatter(x=rn['date'], y=rn['cases_per_100k'] , 
                         line=dict(color='darkviolet', width=2, dash = 'dash'), 
                         name = 'RN'))
# Sergipe
fig7.add_trace(go.Scatter(x=se['date'], y=se['cases_per_100k'] , 
                         line=dict(color='crimson', width=2, dash = 'dash'), 
                         name = 'SE'))
fig7.update_layout(width=600,
    height=500,
    title={
        'text': "Confirmed cases per 100k inhabitants",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title="Date",
    yaxis_title="Count",
    font=dict(
        family="Arial, monospace",
        size=12,
        color="black"))
fig7.update_traces(hovertemplate= None)
fig7.update_layout(hovermode='x')
fig7['layout'].update(layout)
fig7.show()

* The number of confirmed cases in Maranhao has been growing at a slower rate than in other states over the last few weeks;
* At the moment, MA has the 3rd highest number of total cases in the region and ranks 6th in cases per 100k inhabitants.
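
Rankings like the ones above can be read off the plots, or computed from the rows of the latest date. This is a minimal sketch with made-up figures for three hypothetical states. The column names follow `tabela4`, and everything else is illustrative.

```python
import pandas as pd

# Made-up latest-date figures for three hypothetical states.
latest = pd.DataFrame({
    'uf': ['AA', 'BB', 'CC'],
    'cases': [900, 400, 650],
    'cases_per_100k': [45.0, 80.0, 32.5],
})

# Rank 1 = highest value; ties would share the lowest rank of the tied group.
latest['rank_total'] = latest['cases'].rank(ascending=False, method='min').astype(int)
latest['rank_per_100k'] = latest['cases_per_100k'].rank(ascending=False, method='min').astype(int)
print(latest.sort_values('rank_total'))
```

The two rankings can differ considerably, since a populous state may lead in total cases while sitting lower in cases per 100k inhabitants.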

## Deaths comparison - Regional

### Total deaths

In [None]:
fig8 = go.Figure()
# Maranhão
fig8.add_trace(go.Scatter(x=ma['date'], y=ma['deaths'] , 
                         line=dict(color='navy', width=3),
                         name = 'MA'))
# Alagoas
fig8.add_trace(go.Scatter(x=al['date'], y=al['deaths'] , 
                         line=dict(color='darksalmon', width=2,dash = 'dash'), 
                         name = 'AL'))
# Bahia
fig8.add_trace(go.Scatter(x=ba['date'], y=ba['deaths'] , 
                         line=dict(color='darkred', width=2, dash = 'dash'), 
                         name = 'BA'))
# Ceará
fig8.add_trace(go.Scatter(x=ce['date'], y=ce['deaths'] , 
                         line=dict(color='darkkhaki', width=2, dash = 'dash'), 
                         name = 'CE'))
# Paraíba
fig8.add_trace(go.Scatter(x=pb['date'], y=pb['deaths'] , 
                         line=dict(color='darkorange', width=2, dash = 'dash'), 
                         name = 'PB'))
# Pernambuco
fig8.add_trace(go.Scatter(x=pe['date'], y=pe['deaths'] , 
                         line=dict(color='steelblue', width=2, dash = 'dash'), 
                         name = 'PE'))
# Piauí
fig8.add_trace(go.Scatter(x=pi['date'], y=pi['deaths'] , 
                         line=dict(color='darkgreen', width=2, dash = 'dash'), 
                         name = 'PI'))
# Rio Grande do Norte
fig8.add_trace(go.Scatter(x=rn['date'], y=rn['deaths'] , 
                         line=dict(color='darkviolet', width=2, dash = 'dash'), 
                         name = 'RN'))
# Sergipe
fig8.add_trace(go.Scatter(x=se['date'], y=se['deaths'] , 
                         line=dict(color='crimson', width=2, dash = 'dash'), 
                         name = 'SE'))
fig8.update_layout(width=600,
    height=500,
    title={
        'text': "Total deaths",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title="Date",
    yaxis_title="Count",
    font=dict(
        family="Arial, monospace",
        size=12,
        color="black"))
fig8.update_traces(hovertemplate= None)
fig8.update_layout(hovermode='x')
fig8['layout'].update(layout)
fig8.show()

### Deaths per 100k inhabitants

In [None]:
#Layout
layout = Layout(
    paper_bgcolor='rgba(0,0,0,0)',
    plot_bgcolor='rgba(0,0,0,0)',)
fig9 = go.Figure()
# Maranhão
fig9.add_trace(go.Scatter(x=ma['date'], y=ma['deaths_per_100k'] , 
                         line=dict(color='navy', width=3),
                         name = 'MA'))
# Alagoas
fig9.add_trace(go.Scatter(x=al['date'], y=al['deaths_per_100k'] , 
                         line=dict(color='darksalmon', width=2,dash = 'dash'), 
                         name = 'AL'))
# Bahia
fig9.add_trace(go.Scatter(x=ba['date'], y=ba['deaths_per_100k'] , 
                         line=dict(color='darkred', width=2, dash = 'dash'), 
                         name = 'BA'))
# Ceará
fig9.add_trace(go.Scatter(x=ce['date'], y=ce['deaths_per_100k'] , 
                         line=dict(color='darkkhaki', width=2, dash = 'dash'), 
                         name = 'CE'))
# Paraíba
fig9.add_trace(go.Scatter(x=pb['date'], y=pb['deaths_per_100k'] , 
                         line=dict(color='darkorange', width=2, dash = 'dash'), 
                         name = 'PB'))
# Pernambuco
fig9.add_trace(go.Scatter(x=pe['date'], y=pe['deaths_per_100k'] , 
                         line=dict(color='steelblue', width=2, dash = 'dash'), 
                         name = 'PE'))
# Piauí
fig9.add_trace(go.Scatter(x=pi['date'], y=pi['deaths_per_100k'] , 
                         line=dict(color='darkgreen', width=2, dash = 'dash'), 
                         name = 'PI'))
# Rio Grande do Norte
fig9.add_trace(go.Scatter(x=rn['date'], y=rn['deaths_per_100k'] , 
                         line=dict(color='darkviolet', width=2, dash = 'dash'), 
                         name = 'RN'))
# Sergipe
fig9.add_trace(go.Scatter(x=se['date'], y=se['deaths_per_100k'] , 
                         line=dict(color='crimson', width=2, dash = 'dash'), 
                         name = 'SE'))
fig9.update_layout(width=600,
    height=500,
    title={
        'text': "Deaths per 100k inhabitants",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title="Date",
    yaxis_title="Count",
    font=dict(
        family="Arial, monospace",
        size=12,
        color="black"))
fig9.update_traces(hovertemplate= None)
fig9.update_layout(hovermode='x')
fig9['layout'].update(layout)
fig9.show()

* The number of deaths in Maranhao increased at a faster rate than in most other states during the first three months; now the cases in Alagoas, Rio Grande do Norte and Sergipe are increasing considerably;
* At the moment, MA has the 4th highest number of deaths in the region; considering deaths per 100k inhabitants, MA has the 2nd lowest number.
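
The regional comparison cells repeat the same `add_trace` call once per state, changing only the table, the color and the label. Below is a minimal sketch of a loop-based alternative that builds the trace specifications as plain dictionaries. `STATE_COLORS` and `build_state_traces` are hypothetical names introduced here for illustration; the colors are the ones used in the cells above, and the toy input is made up.

```python
# Colors copied from the cells above; the highlighted state is drawn solid and thicker.
STATE_COLORS = {
    "MA": "navy", "AL": "darksalmon", "BA": "darkred", "CE": "darkkhaki",
    "PB": "darkorange", "PE": "steelblue", "PI": "darkgreen",
    "RN": "darkviolet", "SE": "crimson",
}

def build_state_traces(frames, column, highlight="MA"):
    """Build one scatter-trace spec (a plain dict) per state.

    frames: dict mapping a state code to a table that has a 'date'
            column and the column to plot.
    """
    traces = []
    for uf, frame in frames.items():
        if uf == highlight:
            line = dict(color=STATE_COLORS[uf], width=3)
        else:
            line = dict(color=STATE_COLORS[uf], width=2, dash="dash")
        traces.append(dict(type="scatter",
                           x=list(frame["date"]),
                           y=list(frame[column]),
                           line=line,
                           name=uf))
    return traces

# Toy input (made-up numbers); a pandas DataFrame per state works the same way
toy_frames = {
    "MA": {"date": ["2020-06-01", "2020-06-02"], "deaths": [10, 12]},
    "PI": {"date": ["2020-06-01", "2020-06-02"], "deaths": [3, 4]},
}
traces = build_state_traces(toy_frames, "deaths")
```

In the notebook, the resulting list could be passed as `go.Figure(data=build_state_traces({'MA': ma, 'AL': al}, 'deaths'))`, extended with the remaining states, and the same helper could be reused with `'deaths_per_100k'` or `'death_rate'`.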

## Death rate comparison - Regional

In [None]:
#Layout
layout = Layout(
    paper_bgcolor='rgba(0,0,0,0)',
    plot_bgcolor='rgba(0,0,0,0)',)
fig10 = go.Figure()
# Maranhão
fig10.add_trace(go.Scatter(x=ma['date'], y=ma['death_rate'] , 
                         line=dict(color='navy', width=3),
                         name = 'MA'))
# Alagoas
fig10.add_trace(go.Scatter(x=al['date'], y=al['death_rate'] , 
                         line=dict(color='darksalmon', width=2,dash = 'dash'), 
                         name = 'AL'))
# Bahia
fig10.add_trace(go.Scatter(x=ba['date'], y=ba['death_rate'] , 
                         line=dict(color='darkred', width=2, dash = 'dash'), 
                         name = 'BA'))
# Ceará
fig10.add_trace(go.Scatter(x=ce['date'], y=ce['death_rate'] , 
                         line=dict(color='darkkhaki', width=2, dash = 'dash'), 
                         name = 'CE'))
# Paraíba
fig10.add_trace(go.Scatter(x=pb['date'], y=pb['death_rate'] , 
                         line=dict(color='darkorange', width=2, dash = 'dash'), 
                         name = 'PB'))
# Pernambuco
fig10.add_trace(go.Scatter(x=pe['date'], y=pe['death_rate'] , 
                         line=dict(color='steelblue', width=2, dash = 'dash'), 
                         name = 'PE'))
# Piauí
fig10.add_trace(go.Scatter(x=pi['date'], y=pi['death_rate'] , 
                         line=dict(color='darkgreen', width=2, dash = 'dash'), 
                         name = 'PI'))
# Rio Grande do Norte
fig10.add_trace(go.Scatter(x=rn['date'], y=rn['death_rate'] , 
                         line=dict(color='darkviolet', width=2, dash = 'dash'), 
                         name = 'RN'))
# Sergipe
fig10.add_trace(go.Scatter(x=se['date'], y=se['death_rate'] , 
                         line=dict(color='crimson', width=2, dash = 'dash'), 
                         name = 'SE'))
fig10.update_layout(width=600,
    height=500,
    title={
        'text': "Death rate",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title="Date",
    yaxis_title="Rate",
    yaxis_tickformat = '.1%',
    font=dict(
        family="Arial, monospace",
        size=12,
        color="black"))
fig10.update_traces(hovertemplate= None)
fig10.update_layout(hovermode='x')
fig10['layout'].update(layout)
fig10.show()

* The death rate in Maranhão fluctuated at the beginning, reaching a peak of 6.7% on April 14th;
* At the moment, MA has the lowest death rate in the region (2.3%).
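
A peak like the one mentioned above can be located with `idxmax` instead of being read from the chart. This is a minimal sketch, assuming a single-state table with `date` and `death_rate` columns as in the plots. The function name `death_rate_peak` is hypothetical and the three rows are illustrative values, not the notebook's data.

```python
import pandas as pd

def death_rate_peak(df):
    """Return (date, value) of the maximum death rate in a single-state table."""
    peak_row = df.loc[df["death_rate"].idxmax()]
    return peak_row["date"], peak_row["death_rate"]

# Illustrative values only
toy_ma = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-13", "2020-04-14", "2020-04-15"]),
    "death_rate": [0.061, 0.067, 0.064],
})

peak_date, peak_value = death_rate_peak(toy_ma)
print(f"{peak_date:%B %d}: {peak_value:.1%}")
```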

## National Analysis

### How is Maranhao on a national basis regarding confirmed cases and deaths?

In [None]:
# Creating a table for all Brazilian states
df_br = df3.copy()
# Confirmed cases and deaths
tabela_br = df_br.groupby(['date','uf'])[['cases','deaths']].agg('sum')
tabela_br = tabela_br.reset_index()

In [None]:
# Merging tables
tabela_br = pd.merge(tabela_br,df_states, on = 'uf', how ='left')
tabela_br.tail(3)

In [None]:
# Removing zero rows in the column 'cases'
zero_rows = tabela_br[tabela_br['cases'] == 0].index
tabela_br.drop(zero_rows, inplace = True)

In [None]:
# Adding death rate
tabela_br['death_rate'] = tabela_br['deaths']/tabela_br['cases']
# Adding cases per 100k inhabitants
tabela_br['cases_per_100k'] = tabela_br['cases']/(tabela_br['population']/100000)
# Adding deaths per 100k inhabitants
tabela_br['deaths_per_100k'] = tabela_br['deaths']/(tabela_br['population']/100000)
tabela_br.tail()

In [None]:
# Sorting by cases (highest first)
tabela_states = tabela_br.sort_values(by=['cases'], ascending=False).copy()
# Dropping duplicates and keeping the first row per state, i.e. the one with the most cases
tabela_states = tabela_states.drop_duplicates(subset='uf', keep='first')

### Correlation Matrix
* Let's see if there is any correlation among the variables.

In [None]:
# Editing the columns names
tabela_states = tabela_states.rename(columns= {'demographic density':'demographic_density',
                               'cities count':'cities_count',
                             'gdp rate':'gdp_rate'})
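# Correlation matrix (numeric columns only; text columns such as 'uf' cannot be correlated)
corr = tabela_states.select_dtypes(include='number').corr()
# Styler.set_precision was removed in recent pandas versions; format() works in old and new ones
corr.style.background_gradient(cmap='coolwarm').format('{:.2f}')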

* The correlation table shows that population is strongly correlated with cases and deaths (0.96 and 0.94, respectively);
* Therefore, let's make a scatter plot of confirmed cases against deaths, with the marker size proportional to the population.
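
For reference, `DataFrame.corr()` computes the Pearson coefficient by default. Below is a minimal sketch of that coefficient computed by hand and checked against `numpy.corrcoef`. The helper `pearson_r` and the numbers are hypothetical and only illustrate the formula; they are not taken from the dataset.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Made-up values: population in millions and confirmed cases for four states
population = [1.0, 2.0, 4.0, 8.0]
cases = [10.0, 22.0, 39.0, 81.0]

r_manual = pearson_r(population, cases)
r_numpy = float(np.corrcoef(population, cases)[0, 1])
print(round(r_manual, 4), round(r_numpy, 4))
```

A value close to 1 means that the two variables grow together almost linearly, which is what the 0.96 and 0.94 in the table indicate for population against cases and deaths.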

## Covid-19 Brazilian states: Confirmed cases X Deaths

In [None]:
figx = px.scatter(tabela_states, x= 'cases',
                    y= 'deaths',
                    color= "state", hover_name="uf",
                    hover_data=["cases","deaths","death_rate"],
                    color_continuous_scale=px.colors.sequential.Plasma,
                    title='Covid-19 Brazil: Confirmed cases X Deaths' ,
                    size = 'population',
                    width = 600,
                    height =700,text = tabela_states['uf'],
                    
                    )

figx.update_traces(texttemplate='%{text}', textposition='top center')
figx.update_coloraxes(colorscale="hot")
figx.update(layout_coloraxis_showscale=False)
figx.update_yaxes(title_text="Deaths (log scale)", )
figx.update_xaxes(title_text="Confirmed cases (log scale)", )
figx.update_layout(showlegend = False,yaxis_type="log", xaxis_type = 'log')
figx.show()

## Covid-19 Brazilian states: Confirmed cases X Deaths - Per 100k inhabitants

In [None]:
figy = px.scatter(tabela_states, x= 'cases_per_100k',
                    y= 'deaths_per_100k',
                    color= "state", hover_name="uf",
                    hover_data=["cases","deaths","death_rate"],
                    color_continuous_scale=px.colors.sequential.Plasma,
                    title='Confirmed cases X Deaths - per 100k inhabitants' ,
                    size = 'population',
                    width = 600,
                    height =600,text = tabela_states['uf'],
                    
                    )

figy.update_traces(texttemplate='%{text}', textposition='bottom center')
figy.update_coloraxes(colorscale="hot")
figy.update(layout_coloraxis_showscale=False)
figy.update_yaxes(title_text="Deaths per 100k inhabitants", )
figy.update_xaxes(title_text="Confirmed cases per 100k inhabitants", )
figy.update_layout(showlegend = False)
figy.show()

## Death rate comparison - National

In [None]:
# Removing Maranhao from the dataframe
rows_ma = df3[df3['uf'] == 'MA'].index
df3.drop(rows_ma, inplace = True)
print(df3.uf.unique())

In [None]:
# Confirmed cases and deaths
tabela5 = df3.groupby(['date'])[['cases','deaths']].agg('sum')
tabela5.tail(3)

In [None]:
# Adding death rate
tabela5['death_rate'] = tabela5['deaths']/tabela5['cases']
tabela5.fillna(0,inplace = True)
tabela5 = tabela5.reset_index()
tabela5.tail(3)

In [None]:
#Layout
layout = Layout(
    paper_bgcolor='rgba(0,0,0,0)',
    plot_bgcolor='rgba(0,0,0,0)',)
fig11 = go.Figure()
# Maranhão
fig11.add_trace(go.Bar(x=ma['date'], y=ma['death_rate'], 
                          marker_color='navy',
                         name = 'MA'))
# Other Brazilian states (Maranhão was removed from df3 above)
fig11.add_trace(go.Bar(x=tabela5['date'] , y=tabela5['death_rate'],
                    marker_color ='green',
                         name ='BR'))

fig11.update_layout(width=600,
    height=500,
    title={
        'text': "Death rate: Other states X Maranhão",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title="Date",
    yaxis_title="Death rate",
    yaxis_tickformat = '.1%',
    font=dict(
        family="Arial, monospace",
        size=12,
        color="black"))
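fig11.update_traces(hovertemplate= None)
fig11.update_layout(hovermode='x')
# Apply the transparent background defined in `layout`, as in the previous figures
fig11['layout'].update(layout)
fig11.show()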

* While the death rate in other states increased substantially in the first month, in Maranhao it fluctuated.
* The death rate in the other states of Brazil has been decreasing since May 12th.
* The death rate in Maranhão has reached a plateau around 2.3%.
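
The comparison above relies on aggregating every state except Maranhao. Below is a minimal sketch of the same aggregation done with a boolean mask, so that the source table is left unchanged (the notebook drops the rows from `df3` in place). The function name `death_rate_excluding` is hypothetical and the numbers are made up.

```python
import pandas as pd

def death_rate_excluding(df, uf):
    """Daily death rate (deaths / cases) summed over every state except `uf`."""
    others = df[df["uf"] != uf]  # boolean mask: df itself is not modified
    daily = others.groupby("date")[["cases", "deaths"]].sum()
    # Days with zero cases would give NaN, so they are set to 0 as in the notebook
    daily["death_rate"] = (daily["deaths"] / daily["cases"]).fillna(0)
    return daily.reset_index()

# Made-up numbers, for illustration only
toy = pd.DataFrame({
    "date": ["2020-05-01"] * 3 + ["2020-05-02"] * 3,
    "uf": ["MA", "SP", "RJ"] * 2,
    "cases": [100, 400, 100, 150, 600, 200],
    "deaths": [5, 30, 10, 6, 40, 16],
})

rest = death_rate_excluding(toy, "MA")
print(rest)
```

Because the rows are filtered rather than dropped, the original table can still be used afterwards for analyses that include Maranhao.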

# Conclusion
* This notebook has shown how Covid-19 has been evolving in Maranhao. The capital, Sao Luis, has been the epicenter in the state and is by far the city with the most cases and deaths right now;
* The number of cases in the countryside of Maranhao has been increasing at a faster rate in the last few weeks, as shown by several cities having more confirmed cases per 10k inhabitants than the capital;
* Compared to the other states in the Nordeste region, Maranhao is in the top 5 for both total confirmed cases and total deaths, although it has a relatively low death rate (2.2%);
* On a national basis, Maranhao is the 11th state with the most deaths so far.

**This notebook will be updated and improved weekly, so you are invited to come back whenever you want.**

**If you liked this notebook, please upvote; it will inspire me to keep doing more analyses. If you want to share notebooks related to Covid-19 or anything else you are working on, use the comment section. I will be glad to see them.**

## Thank you! Stay home, stay safe.

# Links

[GeoJson data](https://github.com/luizpedone/municipal-brazilian-geodata)
License: MIT License

[Statistical data about Brazilian states - by Thiago Bodruk](https://www.kaggle.com/thiagobodruk/brazilianstates#states.csv)

[Covid-19 - Kaggle - by Raphael Fontes](https://www.kaggle.com/unanimad/corona-virus-brazil)

[Covid-19 - Brasil by brasil.io](https://brasil.io/dataset/covid19/caso/)* 



*Original source: State Health Secretariats

*Released by: Álvaro Justen and dozens of collaborators

*License: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
