# Visualizing COVID-19 around the world
> Evolution and current stage of each country

- toc: true 
- badges: true
- hide_binder_badge: true
- comments: true
- categories: [colab]
- image: images/world.png

> If you are viewing this post on a mobile device, please hold the device in landscape orientation.

This post brings together analyses of COVID-19 occurrences in different countries. Specifically, the charts isolate countries in Western Europe, the US, Japan, and Iran, in addition to Brazil. Every country shown in this analysis had more confirmed cases than Brazil on March 18, 2020. The data used in the analysis is provided by [Johns Hopkins University](https://github.com/CSSEGISandData/COVID-19).

The initial analysis presents both absolute metrics (confirmed, deaths, and recovered) and relative metrics (confirmed, deaths, and recovered per 1 million inhabitants). Each metric is shown in three ways: by calendar day, by days elapsed since the first case, and by days elapsed since the hundredth case.
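
To make the relative metrics and the day-aligned views concrete, here is a minimal sketch with made-up numbers. The helper names `per_million` and `days_since_threshold`, the population, and the case counts are illustrative assumptions and are not part of the notebook, which performs the equivalent steps with pandas.

```python
# Hypothetical figures for illustration only; they are not taken from the dataset.
population = 2_000_000
confirmed_by_day = [40, 90, 120, 300, 700]  # cumulative confirmed cases, one entry per day


def per_million(count, population):
    """Scale an absolute count to a rate per 1 million inhabitants."""
    return count * 1_000_000 / population


def days_since_threshold(cumulative, threshold=100):
    """Return (days elapsed, count) pairs, starting on the first day the threshold is reached.

    Raises StopIteration if the series never reaches the threshold.
    """
    start = next(i for i, count in enumerate(cumulative) if count >= threshold)
    return [(i - start, count) for i, count in enumerate(cumulative) if i >= start]


rates = [per_million(count, population) for count in confirmed_by_day]
aligned = days_since_threshold(confirmed_by_day)
```

With these inputs, `rates` is `[20.0, 45.0, 60.0, 150.0, 350.0]` and `aligned` is `[(0, 120), (1, 300), (2, 700)]`: the two days below 100 cases are dropped and day 0 becomes the day the hundredth case was reached.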

> Whenever you switch between condition types (confirmed, deaths, or recovered), double-click the main area of the chart (the part with the gridded background) so that the zoom readjusts.

The final analysis focuses on confirmed cases and uses georeferencing to visualize the trajectory of the virus around the world. On the map, hovering the mouse over a country shows information about that location.

> You can animate the timeline of case evolution by pressing the play button, or view a specific date using the slider.

Each chart offers several other interaction options, available from the menu that appears in its upper right corner.

In [1]:
#hide
!pip install -U pandas
!pip install -U plotly
!pip install -U plotly-express



In [4]:
#hide 
import datetime

import numpy as np
import pandas as pd
import plotly.express as px

# base variables
initial_date = datetime.date(2020,1,22)
days_elapsed = (datetime.date.today() - initial_date).days
base_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports"

# dicts for data collection: one daily report per date, keyed by date
days = [initial_date + datetime.timedelta(days=n) for n in range(0, days_elapsed)]
csvs = {day: f"{base_url}/{day.strftime('%m-%d-%Y')}.csv" for day in days}

# reports from 03-22-2020 onward name the country column "Country_Region";
# dates are compared as date objects because "%m-%d-%Y" strings do not sort chronologically across years
format_change = datetime.date(2020, 3, 22)
columns = ["country", "confirmed", "deaths", "recovered", "date"]

# collecting data
keep_columns = "Country/Region,Confirmed,Deaths,Recovered".split(",")
dfs = pd.concat(pd.read_csv(url)[keep_columns].assign(date=day.strftime("%m-%d-%Y")) for day, url in csvs.items() if day < format_change)
dfs.columns = columns

keep_columns2 = "Country_Region,Confirmed,Deaths,Recovered".split(",")
dfs2 = pd.concat(pd.read_csv(url)[keep_columns2].assign(date=day.strftime("%m-%d-%Y")) for day, url in csvs.items() if day >= format_change)
dfs2.columns = columns

dfs = pd.concat([dfs, dfs2])
dfs.index = range(0, len(dfs))
all_na = dfs["confirmed"].isna() & dfs["deaths"].isna() & dfs["recovered"].isna()

# removing empty entries
df_imputed = dfs[~all_na].copy()  # explicit copy, so later in-place fixes do not act on a slice of dfs
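
The daily reports are stored as one CSV per day, named in `MM-DD-YYYY` format. The sketch below shows how such a URL can be built and why dates in this format are safer to compare as `datetime.date` objects than as text. `report_url` is a hypothetical helper introduced only for illustration, and the two dates are arbitrary examples.

```python
import datetime

BASE_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_daily_reports"
)


def report_url(day):
    """Build the daily-report URL for a date (hypothetical helper)."""
    return f"{BASE_URL}/{day.strftime('%m-%d-%Y')}.csv"


first = datetime.date(2020, 12, 31)
second = datetime.date(2021, 1, 1)

# As text, "12-31-2020" does not sort before "01-01-2021", because the month comes first.
names_in_order = first.strftime("%m-%d-%Y") < second.strftime("%m-%d-%Y")
# The date objects themselves compare chronologically.
dates_in_order = first < second
```

Here `names_in_order` is `False` while `dates_in_order` is `True`, so any filter or minimum taken over the text form is only reliable within a single calendar year.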

In [5]:
#hide 
keep_columns3 = ["iso3", "Country_Region", "Population"]
data_pop = pd.read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/UID_ISO_FIPS_LookUp_Table.csv")[keep_columns3]
data_pop.columns = ["countrycode", "country", "pop"]
data_pop = data_pop.drop_duplicates(subset=["country"])

In [6]:
#hide
# fixing inconsistencies: each variant spelling maps to one canonical country name
country_aliases = {
    'Mainland China': 'China', 'Hong Kong': 'China', 'Macau': 'China',
    'Hong Kong SAR': 'China', 'Macao SAR': 'China',
    'Korea, South': 'South Korea', 'Republic of Korea': 'South Korea',
    'UK': 'United Kingdom', 'North Ireland': 'United Kingdom',
    'The Bahamas': 'Bahamas', 'Bahamas, The': 'Bahamas',
    'The Gambia': 'Gambia', 'Gambia, The': 'Gambia',
    'Taiwan*': 'Taiwan', 'Taipei and environs': 'Taiwan',
    'Ivory Coast': "Cote d'Ivoire",
    ' Azerbaijan': 'Azerbaijan',
    'Czech Republic': 'Czechia',
    'Republic of Ireland': 'Ireland',
    'Iran (Islamic Republic of)': 'Iran',
    'Viet Nam': 'Vietnam',
    'Russian Federation': 'Russia',
    'Republic of Moldova': 'Moldova',
    'Republic of the Congo': 'Congo (Brazzaville)',
    'Cape Verde': 'Cabo Verde',
    'East Timor': 'Timor-Leste',
}
df_imputed["country"] = df_imputed["country"].replace(country_aliases)

# the population table needs the same canonical names for the merge
data_pop["country"] = data_pop["country"].replace({'Taiwan*': 'Taiwan', 'Korea, South': 'South Korea'})






In [7]:
#hide
df_relative = pd.merge(df_imputed, data_pop, on="country")
df_relative.columns = ['country', 'confirmed', 'deaths', 'recovered', 'date', 'countrycode', 'pop']
df_relative["confirmed_1mi"] = (df_relative["confirmed"] / df_relative["pop"]) * 1000000
df_relative["deaths_1mi"] = (df_relative["deaths"] / df_relative["pop"]) * 1000000
df_relative["recovered_1mi"] = (df_relative["recovered"] / df_relative["pop"]) * 1000000
#ts_relative = pd.melt(df_relative.drop("pop", axis=1), 
#                      ["country", "date"], var_name="status", value_name="count")

#### Evolution and current stage

##### Cumulative cases by day

In [8]:
#hide 
# aggregating occurrences
df_aggregated = df_relative.pivot_table(index=["date", "country", "countrycode"], 
                                       values=["confirmed", "confirmed_1mi", "deaths", "deaths_1mi", "recovered", "recovered_1mi"], 
                                       aggfunc='sum').reset_index()

# filtering countries
countries = ['China', 'Italy', 'Iran', 'Spain', 'Germany', 'France', 'US', 
             'Switzerland', 'United Kingdom', 'Japan', 'Portugal', 'Brazil']
# df_countries = df_aggregated.query("date == '03-22-2020'").sort_values("confirmed", ascending=False).set_index("country")
# countries = list(df_countries.loc[:"Brazil"].index)
df_brazil = df_aggregated.query(f"country in {countries}").drop("countrycode", axis=1)

# resampling time series
df_brazil["date"] = pd.to_datetime(df_brazil["date"])
df_brazil = df_brazil.groupby("country").resample("D", on="date").sum().reset_index()
df_brasil = df_brazil.copy()
df_brasil.columns = ["country", "date", "Confirmed", "Confirmed/million", "Deaths", "Deaths/million", "Recovered", "Recovered/million"]
ts_brazil = pd.melt(df_brasil, ["country", "date"], var_name="status", value_name="count")

In [9]:
#hide_input
labels = {"country": "Country",
          "status": "Metric",
          "count": "Value",
          "date": "Day"}

po = px.line(ts_brazil, x="date", y="count", color="country", animation_frame="status",
             labels=labels)

po.update_layout(
    xaxis_title="Day",
    xaxis_tickformat = "%d/%m",
    yaxis_title="Metric",
    yaxis_tickformat = "g",
    updatemenus=[{"visible": False}],
)

po.show()

##### Cumulative cases by days since the first confirmed case

In [10]:
#hide 

# computing day zero (dates are parsed first: "%m-%d-%Y" strings do not sort chronologically across years)
start_date = pd.to_datetime(df_relative["date"], format="%m-%d-%Y").groupby(df_relative["country"]).min().reset_index()
start_date.columns = ["country", "start_date"]

# adding day zero to df
df_extended = pd.merge(start_date, df_relative)
days_elapsed = pd.to_datetime(df_extended["date"], format="%m-%d-%Y") - df_extended["start_date"]
df_extended["days"] = days_elapsed.dt.days  # whole days since the country's first report
metrics = ["confirmed", "confirmed_1mi", "deaths", "deaths_1mi", "recovered", "recovered_1mi"]
df_extended[metrics] = df_extended[metrics].fillna(0)

# aggregating occurrences
df_aggregated_day0 = df_extended.pivot_table(index=["date","days","country"],
                                             values=["confirmed", "confirmed_1mi", "deaths", "deaths_1mi", "recovered", "recovered_1mi"], 
                                             aggfunc='sum').reset_index()

# filtering countries
df_brazil_day0 = df_aggregated_day0.query(f"country in {countries} and country != 'China'").copy()

# resampling time series
df_brazil_day0["date"] = pd.to_datetime(df_brazil_day0["date"])
df_brazil_day0 = df_brazil_day0.groupby("country").resample("D", on="date").sum().reset_index()
df_brasil_day0 = df_brazil_day0.copy()
df_brasil_day0.columns = ["country", "date", "days", "Confirmed", "Confirmed/million", "Deaths", "Deaths/million", "Recovered", "Recovered/million"]
ts_brazil_day0 = pd.melt(df_brasil_day0.drop("date", axis=1), ["country", "days"], var_name="status", value_name="count")






In [11]:
#hide_input
labels["days"] = "Days"

po = px.line(ts_brazil_day0, x="days", y="count", color="country", animation_frame="status",
             labels=labels)

po.update_layout(
    xaxis_title="Days since the 1st confirmed case",
    yaxis_title="Metric",
    yaxis_tickformat = "g",
    updatemenus=[{"visible": False}],
)

po.show()

##### Cumulative cases by days since the hundredth confirmed case

In [12]:
#hide

# removing entries with fewer than 100 occurrences
ts_100_confirmed = df_brazil_day0.query("confirmed >= 100")

# computing start day
start_day = ts_100_confirmed.groupby("country")["days"].min().reset_index()
start_day.columns = ["country", "start_day"]

# updating start day
df_100_confirmed = pd.merge(start_day, ts_100_confirmed)
df_100_confirmed["days100"] = df_100_confirmed["days"] - df_100_confirmed["start_day"]
df_100_confirmed = df_100_confirmed.drop(["start_day","days"], axis = 1)
df_brasil_100_confirmed = df_100_confirmed.copy()
df_brasil_100_confirmed.columns = ["country", "date", "Confirmed", "Confirmed/million", "Deaths", "Deaths/million", "Recovered", "Recovered/million", "days100"]
ts_100_confirmed = pd.melt(df_brasil_100_confirmed.drop("date", axis=1), ["country", "days100"], var_name="status", value_name="count")

In [13]:
#hide_input
labels["days100"] = "Days"

# labels is passed explicitly so the "days100" entry above takes effect
po = px.line(ts_100_confirmed, x="days100", y="count", color="country", animation_frame="status",
             labels=labels)

po.update_layout(
    xaxis_title="Days since the 100th confirmed case",
    yaxis_title="Metric",
    yaxis_tickformat = "g",
    updatemenus=[{"visible": False}],
)

po.show()

#### Spread trajectory around the world

In [14]:
#hide
def reindex_by_date(df, max_day):
    min_day = min(df["date"])
    df = df.set_index("date")
    dates = pd.date_range(min_day, max_day, name="date")
    return df.reindex(dates).ffill()
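
The helper above places each country's series on a complete daily calendar and carries the last known totals forward over any missing days. A minimal sketch of that reindex-and-forward-fill idea on a tiny frame; the dates and counts are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical cumulative counts with no report on 2020-03-03.
reports = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-04"]),
    "confirmed": [5, 8, 13],
})

# Reindex onto every calendar day, then carry the last known total forward.
full_range = pd.date_range(reports["date"].min(), "2020-03-05", name="date")
filled = reports.set_index("date").reindex(full_range).ffill()
```

The result has one row per day from March 1 to March 5, with the gap on March 3 and the trailing day filled from the most recent earlier report. Forward filling suits cumulative totals, since a missing report means the total is unknown, not zero.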

##### Cumulative confirmed cases by country over time

In [15]:
#hide
df_aggregated["date"] = pd.to_datetime(df_aggregated["date"])
last_day = max(df_aggregated["date"])
df_world = df_aggregated.groupby("country").apply(lambda df: reindex_by_date(df, last_day)).reset_index("date").reset_index(drop=True)
df_world["date"] = pd.to_datetime(df_world["date"])

In [16]:
#hide
df_weeks = df_world.set_index("date").groupby(["country", "countrycode"]).resample("W").max()
df_weeks = df_weeks.reset_index(["country", "countrycode"], drop=True).reset_index()
df_weeks["log_confirmed"] = np.log1p(np.log1p(df_weeks["confirmed_1mi"]))
df_weeks["log_deaths"] = np.log1p(np.log1p(df_weeks["deaths_1mi"]))

df_deaths = df_weeks.query("deaths_1mi > 0")
df_weeks = df_weeks.sort_values("date")
df_weeks["date"] = df_weeks["date"].dt.strftime("%d/%m")
df_deaths = df_deaths.sort_values("date")
df_deaths["date"] = df_deaths["date"].dt.strftime("%d/%m")
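
The cell above colors the maps by `log1p(log1p(x))` of the per-million rate. Applying the logarithm twice compresses the range strongly, so countries with few cases stay visible next to those with many. A small standard-library sketch of the effect; the function name `compress` and the sample rates are hypothetical.

```python
import math


def compress(rate_per_million):
    """Apply log1p twice, mirroring the transform used for the map colors."""
    return math.log1p(math.log1p(rate_per_million))


# Hypothetical per-million rates spanning five orders of magnitude.
values = [0, 10, 1_000, 100_000]
compressed = [compress(v) for v in values]
```

A rate of 0 stays at 0, the ordering of countries is preserved, and a rate of 100,000 per million maps to roughly 2.5, which is why the color bar tick labels are hidden and only the maximum is named.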

In [17]:
#hide_input
fig = px.choropleth(df_weeks,
                    color="log_confirmed", 
                    locations="countrycode", hover_name="country",
                    hover_data=["confirmed", "confirmed_1mi", 
                                "deaths", "deaths_1mi", 
                                "recovered", "recovered_1mi"],
                    animation_frame="date",
                    range_color=(0, max(df_weeks["log_confirmed"])*1.1),
                    color_continuous_scale='reds',
                    labels = {"date": "Date",
                              "countrycode": "Country code",
                              "confirmed": "Confirmed cases",
                              "deaths": "Number of deaths",
                              "recovered": "People recovered",
                              "confirmed_1mi": "Confirmed cases (per million inhabitants)",
                              "deaths_1mi": "Number of deaths (per million inhabitants)",
                              "recovered_1mi": "People recovered (per million inhabitants)",
                              "log_confirmed": "Confirmed cases (log scale)"
                              }
                    )

fig.update_geos(projection={"scale": 30}, 
                fitbounds="locations", 
                visible=False,
                )

fig.update_layout(coloraxis_colorbar_title = "Maximum",
                  coloraxis_colorbar_showticklabels = False)

fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 1500

fig.show()

##### Cumulative deaths by country over time

In [18]:
#hide_input
fig = px.choropleth(df_deaths,
                    color="log_deaths", 
                    locations="countrycode", hover_name="country",
                    hover_data=["confirmed", "confirmed_1mi", 
                                "deaths", "deaths_1mi", 
                                "recovered", "recovered_1mi"],
                    animation_frame="date",
                    range_color=(0, max(df_deaths["log_deaths"])*1.1),
                    color_continuous_scale='reds',
                    labels = {"date": "Date",
                              "countrycode": "Country code",
                              "confirmed": "Confirmed cases",
                              "deaths": "Number of deaths",
                              "recovered": "People recovered",
                              "confirmed_1mi": "Confirmed cases (per million inhabitants)",
                              "deaths_1mi": "Number of deaths (per million inhabitants)",
                              "recovered_1mi": "People recovered (per million inhabitants)",
                              "log_deaths": "Number of deaths (log scale)"
                              }
                    )

fig.update_geos(projection={"scale": 30}, 
                fitbounds="locations", 
                visible=False,
                )

fig.update_layout(coloraxis_colorbar_title = "Maximum",
                  coloraxis_colorbar_showticklabels = False)

fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 1500

fig.show()