## Summary notes

Visualising the share of deaths by cause for each country.

<blockquote class="twitter-tweet"><p lang="en" dir="ltr"><a href="https://twitter.com/hashtag/r4ds?src=hash&amp;ref_src=twsrc%5Etfw">#r4ds</a> presents Week 3 of <a href="https://twitter.com/hashtag/TidyTuesday?src=hash&amp;ref_src=twsrc%5Etfw">#TidyTuesday</a>! Let&#39;s explore global causes of mortality!<br><br>Make a meaningful graphic, and post your code!<br><br>Data: <a href="https://t.co/ygKv8PqOfI">https://t.co/ygKv8PqOfI</a><br>Article: <a href="https://t.co/MOnlCBzdaL">https://t.co/MOnlCBzdaL</a><br>Blog: <a href="https://t.co/cZJ94Hhz7U">https://t.co/cZJ94Hhz7U</a> <a href="https://twitter.com/hashtag/tidyverse?src=hash&amp;ref_src=twsrc%5Etfw">#tidyverse</a> <a href="https://twitter.com/hashtag/rstats?src=hash&amp;ref_src=twsrc%5Etfw">#rstats</a> <a href="https://twitter.com/hashtag/dataviz?src=hash&amp;ref_src=twsrc%5Etfw">#dataviz</a> <a href="https://twitter.com/hashtag/ggplot2?src=hash&amp;ref_src=twsrc%5Etfw">#ggplot2</a> <a href="https://twitter.com/R4DScommunity?ref_src=twsrc%5Etfw">@R4DScommunity</a> <a href="https://t.co/52rktsOcSQ">pic.twitter.com/52rktsOcSQ</a></p>&mdash; Tom Mock ❤️ Quarto (@thomas_mock) <a href="https://twitter.com/thomas_mock/status/985864534832402432?ref_src=twsrc%5Etfw">April 16, 2018</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

## Dependencies

```python
import os
import requests
import polars as pl
import pandas as pd
import altair as alt
from vega_datasets import data as vdata
```

## Functions

```python
def cache_file(url: str, fname: str, dir_: str = './__cache') -> str:
    """Download the file at `url` into `dir_` as `fname` (unless it is
    already cached there) and return the local path.

    Preconditions:
    - dir_ exists
    """
    local_path = os.path.join(dir_, fname)
    if not os.path.exists(local_path):
        r = requests.get(url, allow_redirects=True)
        r.raise_for_status()  # don't cache an HTML error page as if it were data
        with open(local_path, 'wb') as f:  # the with-block closes the file handle
            f.write(r.content)
    return local_path
```
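The caching logic can be tried without touching the network. The sketch below uses a hypothetical `cache_bytes` helper that does not appear in the original notebook. It receives the download step as a `fetch` callable. It also creates `dir_` if it is missing, which removes the "dir_ exists" precondition. A fake downloader then confirms that a second call is served from the cache.

```python
import tempfile
from pathlib import Path
from typing import Callable


def cache_bytes(fetch: Callable[[], bytes], fname: str,
                dir_: str = './__cache') -> Path:
    """Return the path of `fname` inside `dir_`, calling `fetch()` only if
    the file is not cached yet.

    Hypothetical variant of cache_file: the download step is injected as
    `fetch` so the caching logic can be exercised offline.
    """
    directory = Path(dir_)
    directory.mkdir(parents=True, exist_ok=True)  # create dir_ instead of requiring it
    local_path = directory / fname
    if not local_path.exists():
        # write_bytes opens, writes and closes the file in one call
        local_path.write_bytes(fetch())
    return local_path


# Demo with a fake downloader that records how often it is called.
calls = []

def fake_fetch() -> bytes:
    calls.append(1)
    return b'payload'

with tempfile.TemporaryDirectory() as tmp:
    first = cache_bytes(fake_fetch, 'demo.bin', tmp)
    second = cache_bytes(fake_fetch, 'demo.bin', tmp)  # cache hit, no fetch
    content = first.read_bytes()
    same_path = first == second

print(len(calls))  # → 1
```

To reach the real data you would pass something like `lambda: requests.get(url).content` as `fetch`. Injecting the download mainly keeps the cache check easy to test.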

## Main

### Set theme

```python
alt.themes.enable('latimes')
```

### Cache the data

```python
gmortality_path = cache_file(
    url=('https://github.com/rfordatascience/tidytuesday/blob/master'
         + '/data/2018/2018-04-16/global_mortality.xlsx?raw=true'),
    fname='global_mortality.xlsx'
)
```

```python
iso_path = cache_file(
    url=('https://raw.githubusercontent.com/lukes/'
         + 'ISO-3166-Countries-with-Regional-Codes/master/all/all.csv'),
    fname='iso_3166.csv'
)
```

### Load the data

```python
mortality = pl.DataFrame(pd.read_excel(gmortality_path)).lazy()
mortality.schema
```

```
{'country': polars.datatypes.Utf8,
 'country_code': polars.datatypes.Utf8,
 'year': polars.datatypes.Int64,
 'Cardiovascular diseases (%)': polars.datatypes.Float64,
 'Cancers (%)': polars.datatypes.Float64,
 'Respiratory diseases (%)': polars.datatypes.Float64,
 'Diabetes (%)': polars.datatypes.Float64,
 'Dementia (%)': polars.datatypes.Float64,
 'Lower respiratory infections (%)': polars.datatypes.Float64,
 'Neonatal deaths (%)': polars.datatypes.Float64,
 'Diarrheal diseases (%)': polars.datatypes.Float64,
 'Road accidents (%)': polars.datatypes.Float64,
 'Liver disease (%)': polars.datatypes.Float64,
 'Tuberculosis (%)': polars.datatypes.Float64,
 'Kidney disease (%)': polars.datatypes.Float64,
 'Digestive diseases (%)': polars.datatypes.Float64,
 'HIV/AIDS (%)': polars.datatypes.Float64,
 'Suicide (%)': polars.datatypes.Float64,
 'Malaria (%)': polars.datatypes.Float64,
 'Homicide (%)': polars.datatypes.Float64,
 'Nutritional deficiencies (%)': polars.datatypes.Float64,
 'Meningit
```

```python
iso = pl.DataFrame(pd.read_csv(iso_path)).lazy()
iso.schema
```

```
{'name': polars.datatypes.Utf8,
 'alpha-2': polars.datatypes.Utf8,
 'alpha-3': polars.datatypes.Utf8,
 'country-code': polars.datatypes.Int64,
 'iso_3166-2': polars.datatypes.Utf8,
 'region': polars.datatypes.Utf8,
 'sub-region': polars.datatypes.Utf8,
 'intermediate-region': polars.datatypes.Utf8,
 'region-code': polars.datatypes.Float64,
 'sub-region-code': polars.datatypes.Float64,
 'intermediate-region-code': polars.datatypes.Float64}
```

```python
countries = alt.topo_feature(vdata.world_110m.url, 'countries')
```

### Visualise the data

```python
# Long-format table (one row per cause) for a single country and year
_query = mortality.filter(
    (pl.col('country') == 'Vanuatu')
    & (pl.col('year') == 2003)
).melt(  # renamed `unpivot` in polars 1.0
    id_vars=['country', 'country_code', 'year'],
    variable_name='cause',
    value_name='share (%)'
).with_columns(
    [pl.col('cause').str.replace(' (%)', '', literal=True),  # strip the " (%)" suffix
     pl.col('share (%)').round(2)]
)
```
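The `melt` call above turns the wide table into a long one. In the wide table each cause is its own column. In the long table each cause is a row, which lets Altair encode `cause` on the y axis. Below is a minimal sketch of the same reshape on a toy pandas DataFrame. The values are made up and are not real mortality figures. pandas calls the argument `var_name`, where polars uses `variable_name`.

```python
import pandas as pd

# Toy wide table shaped like the mortality data (made-up numbers):
# one row per country-year, one "<cause> (%)" column per cause.
wide = pd.DataFrame({
    'country': ['A', 'A'],
    'country_code': ['AAA', 'AAA'],
    'year': [2003, 2004],
    'Cancers (%)': [10.123, 11.0],
    'Malaria (%)': [2.456, 1.5],
})

# Wide -> long: one row per (country, year, cause).
long = wide.melt(
    id_vars=['country', 'country_code', 'year'],
    var_name='cause',
    value_name='share (%)',
)

# Strip the literal " (%)" suffix and round, mirroring the polars query.
long['cause'] = long['cause'].str.replace(' (%)', '', regex=False)
long['share (%)'] = long['share (%)'].round(2)
print(long)
```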

```python
_ch = alt.Chart(
    _query.collect().to_pandas()
).encode(
    x='share (%):Q',
    y=alt.Y('cause:N', sort='-x'),
)
_bars = _ch.mark_bar().encode(
    color=alt.Color('cause:N', legend=None)
)
_text = _ch.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='share (%):Q',
)
(_bars + _text).properties(
    width=500,
    title='Share of deaths by cause, Vanuatu, 2003'
).configure_title(
    fontSize=20,
    anchor='start'
)
```

```python
# Cardiovascular share for European countries in 2016, with the numeric
# ISO 3166-1 country code kept as the key for the map lookup
_query = mortality.join(
    other=iso,
    left_on='country_code',
    right_on='alpha-3'
).filter(
    (pl.col('year') == 2016)
    & (pl.col('region') == 'Europe')
).select(
    ['country',
     pl.col('country-code').alias('country_id'),
     pl.col('Cardiovascular diseases (%)').round(2).alias('share (%)')]
).collect().to_pandas()
```
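polars' `join` defaults to an inner join. Any mortality row whose `country_code` has no match in the ISO `alpha-3` column is therefore dropped without warning. This would affect, for example, aggregate regions if the dataset contains any. The sketch below uses hypothetical toy tables to show one way to list those rows. It does a pandas left merge with `indicator=True`, which marks each row as `'both'` or `'left_only'`.

```python
import pandas as pd

# Hypothetical toy stand-ins for the mortality and ISO tables.
mortality_toy = pd.DataFrame({
    'country': ['France', 'Germany', 'Some region'],
    'country_code': ['FRA', 'DEU', None],
    'year': [2016, 2016, 2016],
})
iso_toy = pd.DataFrame({
    'alpha-3': ['FRA', 'DEU'],
    'country-code': [250, 276],
    'region': ['Europe', 'Europe'],
})

# A left join keeps every mortality row; '_merge' shows which ones matched.
checked = mortality_toy.merge(
    iso_toy, how='left', left_on='country_code', right_on='alpha-3',
    indicator=True,
)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'country'].tolist()
print(unmatched)  # → ['Some region']
```

polars can answer the same question directly with an anti join, `mortality.join(iso, left_on='country_code', right_on='alpha-3', how='anti')`. It returns only the left rows that have no match.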

```python
alt.Chart(
    _query
).mark_geoshape(
    stroke='black'  # adds country borders
).encode(
    shape='geo:G',
    color=alt.Color('share (%):Q', scale=alt.Scale(scheme="reds")),
    tooltip=['country:N',
             'share (%):Q']
).transform_lookup(
    lookup='country_id',
    from_=alt.LookupData(data=countries, key='id'),
    as_='geo'
).project(
    type='mercator',
    scale=415,
    center=[15, 54],
    clipExtent=[[0, 0], [600, 400]],
).properties(
    width=650,
    height=400,
    title=('Percentage share of deaths by Cardiovascular diseases'
           + ' in Europe, 2016')
)
```
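The `transform_lookup` step attaches a geometry to each table row. For every row, it finds the TopoJSON country feature whose `id` equals the row's `country_id` and stores that feature in a new field named `geo`, which `shape='geo:G'` then draws. The plain-Python sketch below only illustrates that idea and is not Vega-Lite's implementation. The rows, features, and the `lookup` helper are hypothetical. Rows without a matching feature end up with no geometry, much like a left join.

```python
# Hypothetical toy rows (made-up shares) and TopoJSON-like features.
rows = [
    {'country': 'France', 'country_id': 250, 'share (%)': 25.0},
    {'country': 'Nowhere', 'country_id': 999, 'share (%)': 10.0},
]
features = [
    {'type': 'Feature', 'id': 250, 'geometry': None},
    {'type': 'Feature', 'id': 276, 'geometry': None},
]


def lookup(rows, features, lookup_field, key, as_):
    """Attach to each row the feature whose `key` equals row[lookup_field].

    Rows without a match get None, like a left join. Keys must be equal
    Python values, so 250 and '250' would not match in this sketch.
    """
    index = {f[key]: f for f in features}
    return [{**row, as_: index.get(row[lookup_field])} for row in rows]


joined = lookup(rows, features, lookup_field='country_id', key='id', as_='geo')
print([r['geo']['id'] if r['geo'] else None for r in joined])  # → [250, None]
```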

Because of a compatibility issue when plotting world maps with Altair[^2], we ran this notebook in a separate kernel to avoid dependency conflicts.

[^2]: See [Compatibility issues with Python 3.9.7 and altair 4.1.0?](https://github.com/altair-viz/altair/issues/2504)