<a href="https://colab.research.google.com/github/laventura/covid19/blob/master/Covid19_Fatality_Curves_FT_style.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Covid-19 Fatality Curves (FT-style chart)

Compare Covid-19 fatality curves by country in an FT-style chart.

* Author: Atul Acharya
* Credit: code based on work by Pratap Vardhan; original visualization by John Burn-Murdoch (@jburnmurdoch) of the Financial Times
* categories: [growth, fatalities, interactive]


In [0]:
#hide
import pandas as pd
import altair as alt
import numpy as np
from IPython.display import HTML

In [0]:
#hide
url = ('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/'
       'csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
df = pd.read_csv(url)


In [0]:
#hide
# rename the country column and normalise country names
df.rename(columns={'Country/Region': 'Country'}, inplace=True)
df['Country'] = df['Country'].replace({'Korea, South': 'South Korea'})
df = df[~df['Country'].isin(['Cruise Ship', 'Diamond Princess'])]   # Remove Ships
# drop unwanted columns
df.drop(columns=['Province/State', 'Lat', 'Long'], inplace=True)


In [0]:
#hide
date_cols = df.columns[~df.columns.isin(['Country'])]
## sum provinces into one row per country, then reshape to long format (one row per country and date)
df_country = (df.groupby('Country')[date_cols].sum())
COL_CONFIRMED_DEATHS = 'Confirmed Deaths'
df_country = df_country.stack().reset_index(name=COL_CONFIRMED_DEATHS)
df_country = df_country.rename(columns={'level_1': 'Date'})
df_country['Date'] = pd.to_datetime(df_country['Date'], format='%m/%d/%y')
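To make the reshape concrete, here is the same groupby/stack sequence applied to a made-up table in the JHU wide layout (one column per `m/d/yy` date). The country names and counts are invented for illustration; only the column-naming conventions mirror the notebook.

```python
import pandas as pd

# Toy stand-in for the JHU CSSE wide table (values are made up):
# one row per province/country, one column per date in m/d/yy format.
wide = pd.DataFrame({
    'Country': ['A', 'A', 'B'],
    '3/1/20': [1, 2, 0],
    '3/2/20': [3, 4, 5],
})

date_cols = wide.columns[~wide.columns.isin(['Country'])]

# Sum provinces into one row per country, then stack the date columns into rows.
long = (wide.groupby('Country')[date_cols].sum()
            .stack()
            .reset_index(name='Confirmed Deaths')
            .rename(columns={'level_1': 'Date'}))
long['Date'] = pd.to_datetime(long['Date'], format='%m/%d/%y')
print(long)
```

Country A's two provinces are summed before reshaping, so each (country, date) pair appears exactly once in the long table.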

In [0]:
#hide
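## minimum cumulative deaths (on the latest date) for a country to be included
MIN_CASES = 10
LAST_DATE = date_cols[-1]

# sometimes the last date column is still empty; if so, walk backwards to the latest non-empty one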
for c in date_cols[::-1]:
  if not df[c].fillna(0).eq(0).all():
    LAST_DATE = c
    break
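The backwards scan can also be written as a small reusable function. `last_nonempty_column` is a hypothetical helper, not part of the notebook; the toy frame simulates a feed whose newest date column has not been filled in yet.

```python
import numpy as np
import pandas as pd

def last_nonempty_column(frame, cols):
    """Hypothetical helper: return the right-most column in `cols` with at least
    one non-zero value, skipping trailing all-NaN or all-zero columns.
    Falls back to the last column if every column is empty."""
    for c in reversed(list(cols)):
        if not frame[c].fillna(0).eq(0).all():
            return c
    return cols[-1]

# Made-up data: the newest date ('4/1/20') has not been populated yet.
toy = pd.DataFrame({'3/30/20': [5, 7], '3/31/20': [6, 9], '4/1/20': [np.nan, np.nan]})
print(last_nonempty_column(toy, toy.columns))
```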


In [0]:
#hide 
## get list of countries where MIN_CASES is met on LAST_DATE, sorted by deaths (descending)
last_date = pd.to_datetime(LAST_DATE, format='%m/%d/%y')
countries = df_country[df_country['Date'].eq(last_date) &
                       df_country[COL_CONFIRMED_DEATHS].ge(MIN_CASES)].sort_values(by=COL_CONFIRMED_DEATHS, ascending=False)
countries = countries['Country'].values

In [7]:
countries

array(['Italy', 'Spain', 'China', 'France', 'US', 'Iran',
       'United Kingdom', 'Netherlands', 'Germany', 'Belgium',
       'Switzerland', 'Turkey', 'Brazil', 'South Korea', 'Sweden',
       'Portugal', 'Indonesia', 'Austria', 'Canada', 'Philippines',
       'Denmark', 'Romania', 'Ecuador', 'Ireland', 'Japan', 'Iraq',
       'Greece', 'Dominican Republic', 'Egypt', 'Malaysia', 'Algeria',
       'Morocco', 'Norway', 'India', 'Poland', 'San Marino', 'Panama',
       'Peru', 'Czechia', 'Argentina', 'Luxembourg', 'Pakistan', 'Mexico',
       'Australia', 'Serbia', 'Israel', 'Hungary', 'Finland', 'Ukraine',
       'Colombia', 'Burkina Faso', 'Slovenia', 'Lebanon', 'Albania',
       'Bosnia and Herzegovina'], dtype=object)

In [0]:
#hide
## align each country's curve on the day it reached SINCE_CASES_MIN fatalities
SINCE_CASES_MIN = 10  # Days since at least 10 fatalities
COL_X = f'Days since {SINCE_CASES_MIN}th fatality'

# new df restricted to the selected countries
dff2 = df_country[df_country['Country'].isin(countries)].copy()

# first date on which each country's cumulative deaths reach SINCE_CASES_MIN
days_since = (dff2.assign(F=dff2[COL_CONFIRMED_DEATHS].ge(SINCE_CASES_MIN))
                  .set_index('Date')
                  .groupby('Country')['F']
                  .transform('idxmax'))

In [0]:
#hide
# days elapsed since each country's 10th fatality
dff2[COL_X] = (dff2['Date'] - days_since.values).dt.days.values
# keep only rows on or after that day
dff2 = dff2[dff2[COL_X].ge(0)]
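A toy run of the alignment step, assuming two invented countries with day-by-day cumulative deaths. `idxmax` on a boolean column returns the label of the first `True`, so each country's day 0 becomes the first date it reached `SINCE_CASES_MIN` deaths. This relies on every kept country having already passed the threshold (`MIN_CASES` and `SINCE_CASES_MIN` are both 10); for a country that never reaches it, `idxmax` would silently return its first date.

```python
import pandas as pd

SINCE_CASES_MIN = 10
COL_X = f'Days since {SINCE_CASES_MIN}th fatality'

# Made-up cumulative death counts for two countries over four days.
toy = pd.DataFrame({
    'Country': ['A'] * 4 + ['B'] * 4,
    'Date': list(pd.date_range('2020-03-01', periods=4)) * 2,
    'Confirmed Deaths': [2, 11, 20, 35, 10, 14, 19, 30],
})

# Flag rows at or above the threshold; with Date as the index, idxmax per
# country yields the first Date where the flag is True.
first_day = (toy.assign(F=toy['Confirmed Deaths'].ge(SINCE_CASES_MIN))
                .set_index('Date')
                .groupby('Country')['F']
                .transform('idxmax'))

toy[COL_X] = (toy['Date'] - first_day.values).dt.days.values
toy = toy[toy[COL_X].ge(0)]   # drop days before each country's own day 0
print(toy[['Country', COL_X, 'Confirmed Deaths']])
```

Country A starts one day later than B, but after alignment both curves begin at day 0, which is what lets the chart compare trajectories rather than calendar dates.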

In [0]:
#hide
# color mapping for countries
def get_country_colors(x):
    mapping = {
        'Italy': 'black',
        'Iran': '#A1BA59',
        'South Korea': '#E45756',
        'Spain': '#F58518',
        'Germany': '#9D755D',
        'France': '#9B1005',
        'China': '#F3A514',
        'US': '#2495D3',
        'Switzerland': '#9D755D',
        'Norway': '#C1B7AD',
        'United Kingdom': '#2495D3',
        'Netherlands': '#C1B7AD',
        'Sweden': '#C1B7AD',
        'Belgium': '#C1B7AD',
        'Denmark': '#C1B7AD',
        'Austria': '#C1B7AD',
        'Japan': '#9467bd'}
    return mapping.get(x, '#C1B7AD')

In [0]:
#hide

# baseline countries: interesting ones, highlighted by default
base_countries = ['Italy', 'US', 'United Kingdom', 'South Korea', 'Japan', 'China']

max_date = dff2['Date'].max()

color_domain = list(dff2['Country'].unique())
color_range  = list(map(get_country_colors, color_domain))

COL_Y_CUMU_DEATHS = 'Cumulative Fatalities'

## Fatalities growth trajectory: US, UK, Spain, Italy, South Korea

Since last week (March 27 onwards), both cases and deaths in the US have been accelerating. How are the growth curves evolving so far?

From the chart below, it appears that:
* UK fatalities are doubling roughly every 2.1 days
* US fatalities are doubling roughly every 3 days
* Spain's fatalities, which had been growing fast, are now stabilising at a doubling time of about 2.3 days
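Doubling times like these can be estimated from any two points on a curve, assuming growth is roughly exponential between them: if deaths go from N1 to N2 over t days, the doubling time is t * ln(2) / ln(N2 / N1). The sketch below is one way to compute it; `doubling_time` and the sample numbers are illustrative and are not taken from the JHU data or from how the figures above were derived.

```python
import math

def doubling_time(n_start, n_end, days):
    """Illustrative helper: days needed to double, assuming constant exponential
    growth between two cumulative counts observed `days` apart."""
    if n_start <= 0 or n_end <= n_start or days <= 0:
        raise ValueError('need 0 < n_start < n_end and days > 0')
    daily_factor = (n_end / n_start) ** (1 / days)   # growth multiplier per day
    return math.log(2) / math.log(daily_factor)

# Made-up example: 10 deaths growing to 1,000 over 14 days.
print(round(doubling_time(10, 1000, 14), 2))  # → 2.11
```

Because real curves bend, the estimate depends on which two points are chosen; a recent window (for example the last week) reflects the current pace better than the full history.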

In [12]:
#hide_input
HTML(f'<small class="float-right">Last updated on {pd.to_datetime(LAST_DATE, format="%m/%d/%y").strftime("%B %d, %Y")}</small>')

In [0]:
#hide
#### Make fatalities growth curves, with FT-style chart

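def make_fatality_curves_chart(highlight_countries=['France', 'Spain', 'United Kingdom'], baseline_countries=base_countries):
  '''Build an FT-style log-scale chart of cumulative fatalities by country.

  Curves for `highlight_countries` and `baseline_countries` start fully opaque;
  all other countries are faded and can be selected by clicking the legend.
  Dashed reference lines show deaths doubling every 1, 2, 3 and 7 days.
  '''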
  # A. selection
  selection = alt.selection_multi(fields=['Country'],
                                  bind='legend',
                                  init=[{'Country': c} for c in highlight_countries + baseline_countries])
  
  # B. base chart
  base_chart = alt.Chart(dff2, width=800, height=500).encode(
      x=alt.X(f'{COL_X}:Q',
              scale=alt.Scale(domain=(0,40))),  # up to 40 days
      y=alt.Y(f'{COL_CONFIRMED_DEATHS}:Q', 
              scale=alt.Scale(type='log', domain=(10,30_000)),  # up to 30K
              axis=alt.Axis(title=COL_Y_CUMU_DEATHS)),
      color=alt.Color(
          'Country:N',
          scale=alt.Scale(domain=color_domain, range=color_range),
          legend=alt.Legend(columns=len(color_domain)//15+1, symbolLimit=len(color_domain))),
      tooltip=list(dff2),
      opacity=alt.condition(selection, alt.value(1), alt.value(0.05))
  )

  # C. ref growth lines: series that double every d days
  max_day = dff2[COL_X].max()
  Xs = [x for x in range(0, max_day+1)]

  # exp growth rate, k; for doublings every 'd' day
  Ks = [np.log(2) / d for d in [1, 2, 3, 7]]
  # daily growth rate (%)
  Rs = [np.exp(k) - 1 for k in Ks]   

  #columns for ref growth lines
  Ys = ['a1', 'a2', 'a3', 'a7']
  ref_colors = ['#9494B8', '#B3B3CC', '#D1D1E0', '#E0E0EB']
  yseries = []
  for r in Rs:
    y = [SINCE_CASES_MIN * (1+r)**t for t in range(0, max_day+1)]
    yseries.append(y)
  # create a DF with growth lines
  ref_growth_df = pd.DataFrame({COL_X: Xs,
                    Ys[0]: yseries[0],
                    Ys[1]: yseries[1],
                    Ys[2]: yseries[2],
                    Ys[3]: yseries[3]})
  # D. ref growth lines chart
  ref_growth_base = alt.Chart(ref_growth_df).transform_fold(
      Ys,
  ).mark_line(clip=True, strokeDash=[2,2]).encode(
      alt.X(f'{COL_X}:Q', scale=alt.Scale(domain=(0,40))),  # only upto N days on X-axis
      alt.Y('value:Q', scale=alt.Scale(type='log', domain=(10,30_000))),  # up to N on y-axis
      color=alt.Color(
        'key:N',
        scale=alt.Scale(domain=Ys, range=ref_colors), legend=None),
      # constant opacity for all reference lines
      opacity=alt.value(0.5),
  )

  # E. Ref lines text - a lot of hardcoding for alignment
  def ref_label(x, y, text, align, angle, dy):
    '''Gray, semi-transparent text placed at (x days, y deaths); positions and angles are hand-tuned.'''
    return alt.Chart(pd.DataFrame({COL_X: [x], 'value': y})).mark_text(
        text=text,
        align=align,
        angle=angle,
        dy=dy,
        color='gray',
        opacity=0.5
    ).encode(x=COL_X, y='value')

  rtxt1 = ref_label(10, 20_000, 'DEATHS DOUBLE EVERY DAY', 'center', 299, 5)
  rtxt2 = ref_label(22, 30_000, '...EVERY 2 DAYS', 'left', 318, 5)
  rtxt3 = ref_label(36, 50_000, '...EVERY 3 DAYS', 'center', 326, -2)
  rtxt4 = ref_label(36, 400, '...EVERY WEEK', 'left', 342, -2)

  # F. add actual growth curves to chart
  full_chart = (

      # base curves
      base_chart.mark_line(point=True, clip=True).add_selection(selection) + 
      # write country names at the end of each line, excl China
      # (parentheses are required: Python's & binds tighter than != and >=)
      base_chart.transform_filter(
        (alt.datum['Country'] != 'China') &
        (alt.datum['Date'] >= int(max_date.timestamp() * 1000))  # last date only
      ).mark_text(dy=-8, align='right').encode(text='Country:N') + 
      # mark China at a hardcoded point (day 38, 2,837 deaths) inside the 40-day window
      base_chart.transform_filter( 
          (alt.datum['Country'] == 'China') & 
          (alt.datum[COL_X] == 38) & 
          (alt.datum[COL_CONFIRMED_DEATHS] == 2837)
      ).mark_text(align='center', dy=-8).encode(text='Country:N') + 
      # ref growth lines - add this at the end
      ref_growth_base +
      # add the text for ref growth lines
      rtxt1 + 
      rtxt2 + 
      rtxt3 + 
      rtxt4
  ).properties(
      title="Coronavirus Deaths by Country",
  ).configure_title(
      font='Helvetica Neue',
      fontSize=20,
      anchor='middle',
      color='gray'
  ).configure_legend(
      labelFontSize=13, 
      titleFontSize=15
  ).configure_axis(
      labelFontSize=13, 
      titleFontSize=15
  ).configure_view(
      fill='#FFF1E4', # FT bg color
  )

  return full_chart
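Section C builds each reference line from a continuous rate k = ln(2) / d and a daily rate r = e^k - 1, so every line simplifies to 10 * 2^(t/d). The check below is a sketch that reuses the notebook's parameters (start at `SINCE_CASES_MIN` = 10 deaths, doubling every 1, 2, 3 and 7 days) over an assumed 0-40 day range; it confirms the two forms agree and prints the implied daily growth rate.

```python
import numpy as np

SINCE_CASES_MIN = 10
days = np.arange(0, 41)   # assumed range, matching the chart's 0-40 day x domain

for d in [1, 2, 3, 7]:
    k = np.log(2) / d                 # continuous growth rate for doubling every d days
    r = np.exp(k) - 1                 # equivalent daily growth rate
    y_notebook = SINCE_CASES_MIN * (1 + r) ** days
    y_closed = SINCE_CASES_MIN * 2.0 ** (days / d)
    assert np.allclose(y_notebook, y_closed)
    print(f'doubles every {d} day(s): daily growth {r:.1%}')
```

The daily rates come out at about 100%, 41.4%, 26.0% and 10.4%. The one-day line reaches 10 * 2^40 deaths by day 40, far above the y domain, which is why the lines are drawn with `clip=True`.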

In [14]:
chart1 = make_fatality_curves_chart()
chart1
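The label angles in section E (299, 318, 326 and 342 degrees) are tuned by hand. They could instead be derived from the chart geometry: on a log y axis each doubling climbs a fixed number of pixels, so the on-screen slope depends only on the plot size and the axis domains. `label_angle` is a hypothetical sketch, assuming the rendered axes span exactly the domains passed to `alt.Scale`; if Vega-Lite rounds the log domain outward to "nice" values, pass the rounded domain instead.

```python
import math

def label_angle(doubling_days, width=800, height=500,
                x_domain=(0, 40), y_domain=(10, 30_000)):
    """Hypothetical helper: approximate text angle (degrees, clockwise as in
    Vega-Lite) for a reference line that doubles every `doubling_days` days,
    assuming a linear x axis and a log10 y axis spanning exactly these domains."""
    px_per_day = width / (x_domain[1] - x_domain[0])
    px_per_decade = height / math.log10(y_domain[1] / y_domain[0])
    rise_per_day = math.log10(2) / doubling_days * px_per_decade  # pixels climbed per day
    slope_deg = math.degrees(math.atan2(rise_per_day, px_per_day))
    # Screen y points down, so an upward line is a counter-clockwise rotation,
    # i.e. 360 - slope in Vega-Lite's clockwise angle convention.
    return 360 - slope_deg

for d in [1, 2, 3, 7]:
    print(d, round(label_angle(d)))
```

Computing the angles this way keeps the labels aligned if the chart's `width`, `height` or domains change, at the cost of depending on how Vega-Lite finalises the scale.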

In [15]:
#hide_input
HTML('<small class="float-right">Author: @AtulAcharya (based on code credit: @PratapVardhan, viz credit: @jburnmurdoch)</small>')

### Sources
Interactive by [@AtulAcharya](https://twitter.com/AtulAcharya)[^1]

Based on code by [@PratapVardhan](https://github.com/pratapvardhan/notebooks/blob/master/covid19/covid19-compare-country-death-trajectories.ipynb)

Visualization style: [John Burn-Murdoch (@jburnmurdoch)](https://twitter.com/jburnmurdoch) of [Financial Times](https://www.ft.com/coronavirus-latest)

[^1]: Source: ["2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE"](https://systems.jhu.edu/research/public-health/ncov/), [GitHub repository](https://github.com/CSSEGISandData/COVID-19).

