# Where is the curve flattening?

> Inflection-sensitive chart for detecting successful interventions - from the article "How to tell if we're beating Covid-19".

- author: Daniel Cox, Martin Boehler
- categories: [compare, growth, interactive, altair, europe, US, states, world, countries]
- image: images/where-are-we-winning.png
- permalink: /us-inflection/
- toc: true

States/countries will drift off the diagonal when they are flattening the curve.

> Tip: To highlight states/countries, click their names in the legend (shift-click to select several). Click outside the legend to highlight all states/countries.


Only entries with more than 100 confirmed cases are considered.

The five entries with the most confirmed cases are highlighted initially.

In [1]:
#hide

import altair as alt
import numpy as np
import pandas as pd
import requests

In [2]:
#hide

# number of confirmed cases an entry must exceed to appear in a chart:
threshold = 100

In [3]:
#hide

africa = [
    'Algeria',
    'Angola',
    'Benin',
    'Botswana',
    'Burkina Faso',
    'Burundi',
    'Cabo Verde',
    'Cameroon',
    'Central African Republic',
    'Chad',
    'Comoros',
    'Congo (Brazzaville)',
    'Congo (Kinshasa)',
    'Djibouti',
    'Egypt',
    'Equatorial Guinea',
    'Eritrea',
    'Eswatini',
    'Ethiopia',
    'Gabon',
    'Gambia',
    'Ghana',
    'Guinea',
    'Guinea-Bissau',
    'Ivory Coast',
    'Kenya',
    'Lesotho',
    'Liberia',
    'Libya',
    'Madagascar',
    'Malawi',
    'Mali',
    'Mauritania',
    'Mauritius',
    'Morocco',
    'Mozambique',
    'Namibia',
    'Niger',
    'Nigeria',
    'Rwanda',
    'Sao Tome and Principe',
    'Senegal',
    'Seychelles',
    'Sierra Leone',
    'Somalia',
    'South Africa',
    'South Sudan',
    'Sudan',
    'Tanzania',
    'Togo',
    'Tunisia',
    'Uganda',
    'Western Sahara',
    'Zambia',
    'Zimbabwe'
]

In [4]:
#hide

america = [
    'Antigua and Barbuda',
    'Argentina',
    'Bahamas',
    'Barbados',
    'Belize',
    'Bolivia',
    'Brazil',
    'Canada',
    'Chile',
    'Colombia',
    'Costa Rica',
    'Cuba',
    'Dominica',
    'Dominican Republic',
    'Ecuador',
    'El Salvador',
    'Grenada',
    'Guatemala',
    'Guyana',
    'Haiti',
    'Honduras',
    'Jamaica',
    'Mexico',
    'Nicaragua',
    'Panama',
    'Paraguay',
    'Peru',
    'Saint Kitts and Nevis',
    'Saint Lucia',
    'Saint Vincent and the Grenadines',
    'Suriname',
    'Trinidad and Tobago',
    'United States of America',
    'Uruguay',
    'Venezuela'
]

In [5]:
#hide

asiapacific = [
    'Afghanistan',
    'Armenia',
    'Australia',
    'Azerbaijan',
    'Bahrain',
    'Bangladesh',
    'Bhutan',
    'Brunei',
    'Cambodia',
    'China',
    'Cyprus',
    'East Timor',
    'Fiji',
    'Georgien',
    'Hong Kong',
    'India',
    'Indonesia',
    'Iran',
    'Iraq',
    'Israel',
    'Japan',
    'Jordan',
    'Kazakhstan',
    'Kuwait',
    'Kyrgyzstan',
    'Laos',
    'Lebanon',
    'Malaysia',
    'Maldives',
    'Mongolia',
    'Myanmar',
    'Nepal',
    'New Zealand',
    'Oman',
    'Pakistan',
    'Papua New Guinea',
    'Philippines',
    'Qatar',
    'Russia',
    'Saudi Arabia',
    'Singapore',
    'South Korea',
    'Sri Lanka',
    'Syria',
    'Taiwan',
    'Tajikistan',
    'Thailand',
    'Turkey',
    'United Arab Emirates',
    'Uzbekistan',
    'Vietnam',
    'West Bank and Gaza',
    'Yemen'
]

In [6]:
#hide

europe = [
    'Albania',
    'Andorra',
    'Armenia',
    'Austria',
    'Azerbaijan',
    'Belarus',
    'Belgium',
    'Bosnia Herzegovina',
    'Bulgaria',
    'Croatia',
    'Cyprus',
    'Czechia',
    'Denmark',
    'Estonia',
    'Finland',
    'France',
    'Georgien',
    'Germany',
    'Greece',
    'Hungary',
    'Iceland',
    'Ireland',
    'Italy',
    'Kazakhstan',
    'Kosovo',
    'Latvia',
    'Liechtenstein',
    'Lithuania',
    'Luxembourg',
    'Malta',
    'Moldova',
    'Monaco',
    'Montenegro',
    'Netherlands',
    'North Macedonia',
    'Norway',
    'Poland',
    'Portugal',
    'Romania',
    'Russia',
    'San Marino',
    'Serbia',
    'Slovakia',
    'Slovenia',
    'Spain',
    'Sweden',
    'Switzerland',
    'Turkey',
    'Ukraine',
    'United Kingdom',
    'Vatican City'
]

In [7]:
#hide

usa = [
    'Alabama',
    'Alaska',
    'American Samoa',
    'Arizona',
    'Arkansas',
    'California',
    'Colorado',
    'Connecticut',
    'Delaware',
    'Diamond Princess',
    'District of Columbia',
    'Florida',
    'Georgia',
    'Grand Princess',
    'Guam',
    'Hawaii',
    'Idaho',
    'Illinois',
    'Indiana',
    'Iowa',
    'Kansas',
    'Kentucky',
    'Louisiana',
    'Maine',
    'Maryland',
    'Massachusetts',
    'Michigan',
    'Minnesota',
    'Mississippi',
    'Missouri',
    'Montana',
    'Nebraska',
    'Nevada',
    'New Hampshire',
    'New Jersey',
    'New Mexico',
    'New York',
    'North Carolina',
    'North Dakota',
    'Northern Mariana Islands',
    'Ohio',
    'Oklahoma',
    'Oregon',
    'Pennsylvania',
    'Puerto Rico',
    'Rhode Island',
    'South Carolina',
    'South Dakota',
    'Tennessee',
    'Texas',
    'Utah',
    'Vermont',
    'Virgin Islands',
    'Virginia',
    'Washington',
    'West Virginia',
    'Wisconsin',
    'Wyoming'
]

In [8]:
#hide

# load and transform 'csse global' dataset:
url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
df_world = pd.read_csv(url, index_col=['Country/Region', 'Province/State', 'Lat', 'Long'])
df_world.reset_index(['Lat', 'Long'], drop=True, inplace=True)
df_world.columns.name = 'date'
df_world = df_world.stack().reset_index().set_index('date')
df_world.index = pd.to_datetime(df_world.index)
df_world.columns = ['country', 'state', 'confirmed']

# move Hong Kong to country level:
df_world.loc[df_world.state =='Hong Kong', 'country'] = 'Hong Kong'
df_world.loc[df_world.state =='Hong Kong', 'state'] = np.nan

# clean country names:
# ('Georgia' is renamed because it would otherwise collide with the US state
#  of the same name once both datasets are concatenated)
df_world['country'] = df_world['country'].replace({
    'Bosnia and Herzegovina': 'Bosnia Herzegovina',
    'Georgia'               : 'Georgien',
    'Timor-Leste'           : 'East Timor',
    "Cote d'Ivoire"         : 'Ivory Coast',
    'Burma'                 : 'Myanmar',
    'Korea, South'          : 'South Korea',
    'Taiwan*'               : 'Taiwan',
    'US'                    : 'United States of America',
    'Holy See'              : 'Vatican City',
})

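# aggregate provinces to country level (sum only the numeric column,
# so the text column 'state' is never passed to sum()):
df_world = df_world.groupby(['country', 'date'])[['confirmed']].sum()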

In [9]:
#hide

# load and transform 'csse us' dataset:
url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv'
df_usa = pd.read_csv(url, index_col=['UID', 'iso2', 'iso3', 'code3', 'FIPS', 'Admin2', 'Province_State', 'Country_Region', 'Lat', 'Long_', 'Combined_Key'])
df_usa.reset_index(['UID', 'iso2', 'iso3', 'code3', 'FIPS', 'Admin2', 'Lat', 'Long_', 'Combined_Key'], drop=True, inplace=True)
df_usa.columns.name = 'date'
df_usa = df_usa.stack().reset_index().set_index('date')
df_usa.index = pd.to_datetime(df_usa.index)
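# after stacking, the columns are Province_State, Country_Region and the count;
# the state name is stored in 'country' so states and countries share one code path:
df_usa.columns = ['country', 'state', 'confirmed']

# aggregate counties to state level (sum only the numeric column):
df_usa = df_usa.groupby(['country', 'date'])[['confirmed']].sum()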

In [10]:
#hide

# concat data:
df = pd.concat([df_world, df_usa]).reset_index()
df = (df.sort_values(by=['country', 'date'])
        .groupby(['country', 'date'])['confirmed']
        .sum()).reset_index()

# additional measurements
df = df.assign(daily_new_abs=(df.groupby('country', as_index=False)[['confirmed']]
                                .diff()
                                .fillna(0)
                                .astype('int64')))
df = df.assign(daily_new_avg=(df.groupby('country', as_index=False)[['daily_new_abs']]
                                .rolling(7)
                                .mean()
                                .fillna(0)
                                .round(decimals=2)
                                .reset_index(drop=True)))

# slice data
df_africa      = df[(df.confirmed > threshold) & (df.daily_new_avg > 0) & (df['country'].isin(africa))]
df_america     = df[(df.confirmed > threshold) & (df.daily_new_avg > 0) & (df['country'].isin(america))]
df_asiapacific = df[(df.confirmed > threshold) & (df.daily_new_avg > 0) & (df['country'].isin(asiapacific))]
df_europe      = df[(df.confirmed > threshold) & (df.daily_new_avg > 0) & (df['country'].isin(europe))]
df_usa         = df[(df.confirmed > threshold) & (df.daily_new_avg > 0) & (df['country'].isin(usa))]

In [11]:
#hide

alt.data_transformers.disable_max_rows()

def make_chart(data=df):

    countries = data.country.unique().tolist()

    highlighted = data.sort_values('confirmed', ascending=False).groupby('country').head(1).country.tolist()[:5]

    selection = alt.selection_multi(bind='legend',
                                    fields=['country'],
                                    init=[{'country': x} for x in highlighted])

    base = (alt.Chart(data=data)
               .properties(width=550)
               .encode(x=alt.X(scale=alt.Scale(type='log'),
                               shorthand='confirmed:Q',
                               title='Total Confirmed Cases (log scale)'),
                       y=alt.Y(scale=alt.Scale(type='log'),
                               shorthand='daily_new_avg:Q',
                               title='Average Daily New Cases (log scale)'),
                       color=alt.Color(legend=alt.Legend(columns=3,
                                                         symbolLimit=len(countries),
                                                         title='Country/State:'),
                                       scale=alt.Scale(scheme='category20b'),
                                       shorthand='country:N'),
                       tooltip=list(data),
                       opacity=alt.condition(selection, alt.value(1), alt.value(0.05))))

    chart = (base.mark_line()
                 .add_selection(selection)
                 .configure_legend(labelFontSize=10,
                                   titleFontSize=12)
                 .configure_axis(labelFontSize=10,
                                 titleFontSize=12))

    return chart

## Africa

In [12]:
#hide_input
make_chart(df_africa)

## America

In [13]:
#hide_input
make_chart(df_america)

## Asia-Pacific

In [14]:
#hide_input
make_chart(df_asiapacific)

## Europe

In [15]:
#hide_input
make_chart(df_europe)

## United States of America

In [16]:
#hide_input
make_chart(df_usa)

## Explanation
The exponential growth stage of a pandemic must end sometime, either as the virus runs out of people to infect, or as societies get it under control. However, it can be difficult to tell exactly when exponential growth is ending, for several reasons:

* Humans aren't wired to understand exponentials at a glance.
* It can be difficult to compare regions with differing first-infection dates, testing rates, and populations.
* The news tends to report individual data points, without the contextual information necessary to interpret them.
* If a chart doesn't explicitly plot the rate of new cases, a change must be quite dramatic before it becomes distinguishable.

This visualization plots the sliding average of daily new cases against the total number of cases. During exponential growth, daily new cases are proportional to total cases, so all regions line up on a common diagonal baseline, and a region's trace plummets clearly downward once it gets the virus under control. As explained in the caveats below, this visualization has a very specific purpose: to make it clear whether a given state has managed to exit the exponential trajectory or not.
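
The alignment onto a diagonal can be checked with a small sketch on synthetic data. The helper `loglog_slope` and both series below are illustrative assumptions, not part of the notebook above. The helper fits a straight line through the points of the log-log plot: a slope near 1 means the region is still on the exponential baseline, and a negative slope means its trace has turned downward.

```python
import numpy as np
import pandas as pd


def loglog_slope(confirmed: pd.Series, window: int = 7) -> float:
    """Slope of log10(average daily new cases) against log10(total cases)."""
    daily_new = confirmed.diff()
    daily_avg = daily_new.rolling(window).mean()
    # keep only points that can be drawn on logarithmic axes
    mask = (confirmed > 0) & (daily_avg > 0)
    x = np.log10(confirmed[mask])
    y = np.log10(daily_avg[mask])
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)


days = np.arange(60)

# hypothetical region growing by 20% per day, starting from 10 cases
exponential = pd.Series(10 * 1.2 ** days)

# hypothetical region following a logistic curve that flattens after day 30
logistic = pd.Series(100_000 / (1 + np.exp(-0.25 * (days - 30))))

slope_exp = loglog_slope(exponential)         # on the diagonal: slope close to 1
slope_late = loglog_slope(logistic.iloc[35:])  # past the peak: slope below 0

print(f"exponential: {slope_exp:.3f}, flattening: {slope_late:.3f}")
```

Real case counts are far noisier than these curves, so a fitted slope is only a rough indicator; the charts above rely on visual inspection instead.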

_minutephysics_ has an excellent video on this visualization type, [How to tell if we're beating Covid-19](https://youtu.be/54XLXg4fYsc).

## Caveats
1. The logarithmic scales can make it seem as if states are closer together than they actually are. For example, at time of writing (April 5th), New York (the leader in US cases) and New Jersey (the runner-up) appear to be in a close race, but New York has over three times as many cases as New Jersey.
2. The logarithmic scale can also obscure a resurgence of infections after a significant downturn, since the trace moves only slightly to the right over a short period late in an outbreak.
3. Time is not represented by the x-axis, which is unusual for most charts made about Covid-19. This is the plot's main advantage, because it aligns states onto roughly the same trajectory regardless of population or testing rate, but it may be unexpected.
4. The true number of cases is unknown, so the actual slope of the log-log change plot is unknown. All countries/states are also increasing their testing rate over time, so these data may imply that the infection rate is increasing faster than it actually is.
5. The data these plots rely on are incomplete, and come in less smoothly than they may imply. Healthcare systems around the world collect and report data when they can.
6. The y-axis shows the 7-day sliding-window average of daily new cases (on a logarithmic scale) rather than the raw daily counts, because the day-to-day variability is too large to detect the trend visually. The trailing average lags behind the latest daily figures, so the plot is a pessimistic estimate of where each state is on its trajectory.
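
The lag mentioned in the last caveat can be illustrated with a minimal sketch. The daily counts below are made up for the purpose: new cases drop abruptly from 1000 to 200 per day, and a 7-day trailing average (the same window length as `rolling(7)` in the notebook) needs several more days before it fully reflects the drop.

```python
import pandas as pd

# hypothetical daily new cases: 1000 per day for 14 days, then 200 per day
daily_new = pd.Series([1000] * 14 + [200] * 14)

# trailing 7-day average, as used for the y-axis of the charts
trailing_avg = daily_new.rolling(7).mean()

# first day on which the raw counts show the lower level
first_drop_day = int((daily_new == 200).idxmax())

# first day on which the trailing average has fully caught up
caught_up_day = int((trailing_avg == 200).idxmax())

lag_days = caught_up_day - first_drop_day
print(f"raw drop on day {first_drop_day}, average catches up on day {caught_up_day}")
```

In between those two days the average still contains some of the higher counts, which is why a trace can look worse than the most recent daily figures.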

## References
* The initial animated implementation and descriptions were made by [Daniel Cox](https://twitter.com/danielpcox), with thanks to Henry of _minutephysics_ for [How to tell if we're beating Covid-19](https://youtu.be/54XLXg4fYsc), and [covidtracking.com](https://covidtracking.com) for US data.
* The visualizations for the US, Africa, America, Asia-Pacific and Europe were made by [Martin Boehler](https://www.linkedin.com/in/martin-boehler/), with thanks to Daniel Cox for this great inspiration and initial implementation, and [Johns Hopkins University CSSE](https://systems.jhu.edu/) for the [2019 Novel Coronavirus Covid-19 (2019-nCoV) Data Repository](https://github.com/CSSEGISandData/COVID-19).