# Death Cause by Country

*Table of Contents*

1.   Initialize the Project Requirements
2.   The Datasets
3.   Transforming the Death Cause Reason by Country Dataset
4.   Creating the Selectors
5.   The Geographical Chart
6.   The Top Countries Bar Chart
7.   The Top Causes Bar Chart
8.   The Final Visualization
9.   Bonus Visualization


Originally by **Erik Salsborn** & **Anders Lundkvist**


Further developments by **Erik Salsborn**

# 1. Initialize the Project Requirements

In [87]:
import altair as alt
import pandas as pd
import io
import numpy as np
import sys
import seaborn as sns
from vega_datasets import data

alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')


# 2. The Datasets


In [88]:
# Death Cause Reason by Country
df_dcr = pd.read_csv("https://dl.dropbox.com/s/ag3ox1ye3xognle/DeathCauseReasonbyCountry_LaddaUpp-4.csv?dl=0")
# OLD https://dl.dropbox.com/s/k959wjskwnjsn3d/DeathCauseReasonbyCountry.csv?dl=0
# CLEANED https://dl.dropbox.com/s/ag3ox1ye3xognle/DeathCauseReasonbyCountry_LaddaUpp-4.csv?dl=0


# Country Coordinates
df_ccw = pd.read_csv('https://dl.dropbox.com/s/k8weq4ybd23mi78/country-coordinates-world-updatedLADDAUPPNM-3.csv?dl=0')
# OLD https://dl.dropbox.com/s/p6bumbl5he1pblb/country-coordinates-world.csv?dl=0
# CLEANED https://dl.dropbox.com/s/k8weq4ybd23mi78/country-coordinates-world-updatedLADDAUPPNM-3.csv?dl=0


# Population by Country
df_pbc = pd.read_csv('https://dl.dropbox.com/s/uzlir3ovrymqya9/population_by_country_2020.csv?dl=0')


# Continent by Country
df_cbc = pd.read_csv('https://dl.dropbox.com/s/q4dkgkur131xofj/countryContinent-LaddaUPP-4.csv?dl=0', encoding="ISO-8859-1")
# OLD https://dl.dropbox.com/s/7i3c2gl2l8lqxj3/countryContinent.csv?dl=0
# CLEANED https://dl.dropbox.com/s/q4dkgkur131xofj/countryContinent-LaddaUPP-4.csv?dl=0

Let's have a look at all the datasets:


1. Country Coordinates:

In [89]:
df_ccw.head()

Unnamed: 0,latitude,longitude,Country
0,33.93911,67.709953,Afghanistan
1,41.153332,20.168331,Albania
2,28.033886,1.659626,Algeria
3,-14.270972,-170.132217,American Samoa
4,42.546245,1.601554,Andorra


2. Population by Country:

In [90]:
df_pbc.head()

Unnamed: 0,Country (or dependency),Population (2020),Yearly Change,Net Change,Density (P/Km²),Land Area (Km²),Migrants (net),Fert. Rate,Med. Age,Urban Pop %,World Share
0,China,1440297825,0.39 %,5540090,153,9388211,-348399.0,1.7,38,61 %,18.47 %
1,India,1382345085,0.99 %,13586631,464,2973190,-532687.0,2.2,28,35 %,17.70 %
2,United States,331341050,0.59 %,1937734,36,9147420,954806.0,1.8,38,83 %,4.25 %
3,Indonesia,274021604,1.07 %,2898047,151,1811570,-98955.0,2.3,30,56 %,3.51 %
4,Pakistan,221612785,2.00 %,4327022,287,770880,-233379.0,3.6,23,35 %,2.83 %


3. Continent by Country:

In [91]:
df_cbc.head()

Unnamed: 0,country,code_2,code_3,country_code,iso_3166_2,continent,sub_region,region_code,sub_region_code
0,Afghanistan,AF,AFG,4,ISO 3166-2:AF,Asia,Southern Asia,142.0,34.0
1,Åland Islands,AX,ALA,248,ISO 3166-2:AX,Europe,Northern Europe,150.0,154.0
2,Albania,AL,ALB,8,ISO 3166-2:AL,Europe,Southern Europe,150.0,39.0
3,Algeria,DZ,DZA,12,ISO 3166-2:DZ,Africa,Northern Africa,2.0,15.0
4,American Samoa,AS,ASM,16,ISO 3166-2:AS,Oceania,Polynesia,9.0,61.0


And finally, 4. Death Cause Reason by Country:

In [92]:
df_dcr.head()

Unnamed: 0,Country Name,Covid-19 Deaths,Cardiovascular diseases,Respiratory diseases,Kidney diseases,Neonatal disorders,Meningitis,Malaria,Interpersonal violence,HIV/AIDS,...,Neoplasms,"Fire, heat",Drowning,Drug use disorders,Road injuries,Environmental heat and cold exposure,Self-harm,Conflict and terrorism,Diabetes,Unnamed: 32
0,Afghanistan,2201.0,61995,7082,5637,23701,1563,530,5015,318,...,21247,485,1687,406,8254,59,1613,24295,4817,
1,Albania,1181.0,12904,815,329,161,13,0,57,2,...,4705,18,36,29,243,4,152,0,175,
2,Algeria,2762.0,97931,7528,8201,8756,292,0,459,264,...,23816,782,526,526,11051,40,1515,13,5328,
3,Andorra,84.0,169,39,16,0,0,0,0,3,...,230,0,0,0,8,0,8,0,9,
4,Angola,33.0,25724,3934,2464,18189,2520,10784,974,16802,...,12791,513,793,80,9253,114,1928,16,4033,


# 3. Transforming the Death Cause Reason by Country Dataset

To visualize the number of deaths attributed to each cause in every country, we **transform** the *Death Cause Reason* dataset from wide to long format, so that each row holds a country, a cause of death, and the corresponding number of deaths. This can be done with the pandas melt operation.

In [93]:
df_dcr_melted = pd.melt(
    df_dcr,
    id_vars=['Country Name'],
    var_name='Cause',
    value_name='Number of Cases'
)

# Drop NaNs and rows with zero cases
df_dcr_melted = df_dcr_melted.dropna()
df_dcr_melted = df_dcr_melted[df_dcr_melted['Number of Cases'] != 0.0]

Now, let's have a look at the transformed Death Cause Reason by Country dataset:

In [94]:
df_dcr_melted.head()

Unnamed: 0,Country Name,Cause,Number of Cases
0,Afghanistan,Covid-19 Deaths,2201.0
1,Albania,Covid-19 Deaths,1181.0
2,Algeria,Covid-19 Deaths,2762.0
3,Andorra,Covid-19 Deaths,84.0
4,Angola,Covid-19 Deaths,33.0
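
To make the reshaping concrete, here is a minimal sketch on a two-country, two-cause excerpt that uses values from the head() output above. The names wide and long_df are only illustrative and are not part of the notebook.

```python
import pandas as pd

# Small excerpt in the same wide layout as df_dcr (one column per cause)
wide = pd.DataFrame({
    "Country Name": ["Afghanistan", "Albania"],
    "Malaria": [530, 0],
    "Meningitis": [1563, 13],
})

# Wide -> long: one row per (country, cause) pair
long_df = pd.melt(
    wide,
    id_vars=["Country Name"],
    var_name="Cause",
    value_name="Number of Cases",
)

# Same clean-up as in the cell above: drop missing values and zero counts
long_df = long_df.dropna()
long_df = long_df[long_df["Number of Cases"] != 0]

print(long_df)
```

Four (country, cause) pairs are produced by the melt, and the Albania/Malaria row is removed afterwards because its count is zero.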


Because populations differ widely between countries, we want to show the number of cases per capita rather than raw counts. To do this, we need a new column named 'Cases Per Capita'. Let's reexamine the population dataset:

In [95]:
df_pbc.head()

Unnamed: 0,Country (or dependency),Population (2020),Yearly Change,Net Change,Density (P/Km²),Land Area (Km²),Migrants (net),Fert. Rate,Med. Age,Urban Pop %,World Share
0,China,1440297825,0.39 %,5540090,153,9388211,-348399.0,1.7,38,61 %,18.47 %
1,India,1382345085,0.99 %,13586631,464,2973190,-532687.0,2.2,28,35 %,17.70 %
2,United States,331341050,0.59 %,1937734,36,9147420,954806.0,1.8,38,83 %,4.25 %
3,Indonesia,274021604,1.07 %,2898047,151,1811570,-98955.0,2.3,30,56 %,3.51 %
4,Pakistan,221612785,2.00 %,4327022,287,770880,-233379.0,3.6,23,35 %,2.83 %


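Let's begin by merging the **Population (2020)** of each country into the Death Cause Reason by Country dataset. Note that pd.merge performs an inner join by default, so countries whose names do not match exactly in both datasets are dropped.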

In [96]:
# Renaming the column 'Country (or dependency)' to 'Country Name'
# to match the df_dcr_melted column for Countries
df_pbc = df_pbc.rename(columns={'Country (or dependency)': 'Country Name'})

df_dcr_melted = pd.merge(
    df_dcr_melted,
    df_pbc[['Country Name', 'Population (2020)']], on='Country Name'
)

df_dcr_melted.head()

Unnamed: 0,Country Name,Cause,Number of Cases,Population (2020)
0,Afghanistan,Covid-19 Deaths,2201.0,39074280
1,Afghanistan,Cardiovascular diseases,61995.0,39074280
2,Afghanistan,Respiratory diseases,7082.0,39074280
3,Afghanistan,Kidney diseases,5637.0,39074280
4,Afghanistan,Neonatal disorders,23701.0,39074280
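
Since the merge relies on exact name matches, it can be worth checking which countries would be lost. The sketch below shows one way to do that with a left join and the indicator flag. The two small frames, the 'Czechia' / 'Czech Republic' mismatch, and the rounded population figures are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical excerpts; the spelling mismatch is invented for the example
deaths = pd.DataFrame({
    "Country Name": ["Albania", "Czechia"],
    "Number of Cases": [1181.0, 100.0],
})
population = pd.DataFrame({
    "Country Name": ["Albania", "Czech Republic"],
    "Population (2020)": [2_800_000, 10_700_000],
})

# A left join keeps every row of `deaths`; the _merge column records
# whether a population row was found for it
checked = pd.merge(
    deaths, population, on="Country Name", how="left", indicator=True
)

unmatched = (
    checked.loc[checked["_merge"] == "left_only", "Country Name"]
    .unique()
    .tolist()
)
print(unmatched)  # ['Czechia']
```

Any name that shows up in the unmatched list would need to be harmonized in one of the source files before the inner merge keeps it.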


Excellent! With the new column **Population (2020)** added to the **Death Cause Reason by Country** dataset, our next step is to create another column called 'Cases Per Capita'.

In [97]:
df_dcr_melted['Cases Per Capita'] = df_dcr_melted['Number of Cases'] / df_dcr_melted['Population (2020)']
df_dcr_melted.head()

Unnamed: 0,Country Name,Cause,Number of Cases,Population (2020),Cases Per Capita
0,Afghanistan,Covid-19 Deaths,2201.0,39074280,5.6e-05
1,Afghanistan,Cardiovascular diseases,61995.0,39074280,0.001587
2,Afghanistan,Respiratory diseases,7082.0,39074280,0.000181
3,Afghanistan,Kidney diseases,5637.0,39074280,0.000144
4,Afghanistan,Neonatal disorders,23701.0,39074280,0.000607
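
As a quick sanity check, the first row can be reproduced by hand. The sketch below also rescales the value to deaths per 100,000 inhabitants, which is easier to read than a small fraction. That rescaling is our own optional addition and is not used in the rest of the notebook.

```python
# Values for Afghanistan, taken from the output above
cases = 2201            # Covid-19 Deaths
population = 39074280   # Population (2020)

per_capita = cases / population
per_100k = per_capita * 100_000  # optional, more readable scale

print(f"{per_capita:.6f}")  # 0.000056
print(f"{per_100k:.2f}")    # 5.63
```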


Fantastic! Now that the data is ready for visualization, let's create the selectors that the final visualization will use to filter the data.

# 4. Creating the Selectors

As our prototype states, we want a few different selections and filters. These are the three types:

1.   Selecting a dot/country in the geoChart to see more information about that country in the **Top Causes Bar Chart**
2.   A dropdown menu to select and filter by **Cause of Death**
3.   Selecting and filtering by **continent** in the legend

## Single selection - Clicking on and filter by country

Let's start by creating the first one, which is fairly straightforward. All we need to do is create a selection_single using the Altair library and base the selection on the country, in our case, using the 'Country Name' column.

In [98]:
selector = alt.selection_single( fields=['Country Name'], empty='all')

## Dropdown-menu - selecting and filtering by a death cause

The second selector filters by cause of death using a dropdown menu. It is similar to our first selector, but this time it needs to be an actual dropdown menu. We therefore first create a binding_select to generate the dropdown and then bind it to our selector named 'dropSelect'. The init argument makes 'Covid-19 Deaths' the cause shown when the chart first loads.

In [99]:
input_dropdown = alt.binding_select(
    options = ['Covid-19 Deaths','Cardiovascular diseases','Respiratory diseases ','Kidney diseases','Neonatal disorders ','Meningitis ','Malaria ','Interpersonal violence','HIV/AIDS','Tuberculosis','Maternal disorders','Lower respiratory infections','Alcohol use disorders','Diarrheal diseases','Poisoning','Nutritional deficiencies',' Alzheimers disease','Parkinsons disease',' Acute hepatitis','Digestive diseases',' Cirrhosis and other chronic liver diseases','Protein-energy malnutrition','Neoplasms',"Fire, heat",'Drowning','Drug use disorders','Road injuries','Environmental heat and cold exposure','Self-harm',' Conflict and terrorism','Diabetes '], name="Death cause: ")

dropSelect = alt.selection_single(fields=['Cause'], init={'Cause':'Covid-19 Deaths'},
                                 bind=input_dropdown)
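
The dropdown only filters correctly when each option matches a value in the 'Cause' column character for character, including any leading or trailing spaces. Instead of typing the list by hand, the options could be derived from the data. The sketch below uses a tiny stand-in frame; the trailing space in 'Malaria ' is an assumption made to illustrate the point, and the names melted, cause_options and clean_options are hypothetical.

```python
import pandas as pd

# Stand-in for df_dcr_melted; the trailing space in 'Malaria ' is assumed
melted = pd.DataFrame({
    "Country Name": ["Angola", "Angola", "Albania"],
    "Cause": ["Malaria ", "Drowning", "Drowning"],
    "Number of Cases": [10784, 793, 36],
})

# Options exactly as they appear in the data, so every entry has a match
cause_options = sorted(melted["Cause"].unique())
print(cause_options)  # ['Drowning', 'Malaria ']

# Alternative: strip the whitespace in the data first, then build the
# options from the cleaned column so both stay in sync
melted["Cause"] = melted["Cause"].str.strip()
clean_options = sorted(melted["Cause"].unique())
print(clean_options)  # ['Drowning', 'Malaria']
```

Either list could then be passed as the options argument of alt.binding_select in place of the hard-coded one.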


## Single selection (Legend) - clicking on and filter by continent

Our final selection is tied to the legend that lists all continents, so that clicking a continent filters the charts. Passing 'legend' as the bind parameter attaches the selection to the legend.

In [100]:
selection = alt.selection_single(fields=['continent'], bind='legend')

Now, all our selectors are prepared and ready to be applied to all of our charts.

# 5. The Geographical Chart

Let's begin creating our initial plots. The geographical chart should consist of dots, with each dot representing a different country. The size of each dot should vary based on the 'Cases per Capita' column and be filtered by the cause of death. Additionally, these dots should be positioned according to their actual world coordinates.

## Circle/Dot chart

We will create a mark_circle chart. For the longitude and latitude positions, we will utilize the actual coordinates of the countries. The size of the circles will be contingent on the cases per capita. The color will be determined by the continent to which the country belongs. Finally, we will implement a tooltip to display Country Name, Cases per Capita, and the number of cases.

In [101]:
geoChart = alt.Chart(df_dcr_melted
).transform_filter(
  dropSelect

).transform_lookup( # lookup to get the position for the country
    lookup='Country Name',
    from_=alt.LookupData(df_ccw,'Country', ['longitude', 'latitude'])

).transform_lookup( # lookup to get the continent of the country
    lookup='Country Name',
    from_=alt.LookupData(df_cbc,'country', ['continent'])

).transform_filter( # filter by continent only after the lookup has added that field
  selection

).mark_circle().encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    size=alt.Size('Cases Per Capita:Q', title='Cases Per Capita', legend=alt.Legend(orient='right')),
    color= alt.Color('continent:N', title='Continents', legend=alt.Legend(orient='right')),
    opacity = alt.condition(selector, alt.value(1), alt.value(0.17)),
    tooltip=['Country Name:N', 'Cases Per Capita:Q', 'Number of Cases:Q']

).add_selection(
  dropSelect

).add_selection(
  selector

).add_selection(
  selection
)


Let's have a look at what we just did:

In [102]:
geoChart

Looks good, but we are missing a world background. Let's add that.

## World background

To make the geographical plot readable, the dots representing each country need to sit on top of a world map. Let's load a TopoJSON feature collection with the outlines of the world's countries from vega_datasets and draw it as the background.

In [103]:
source = alt.topo_feature(data.world_110m.url, 'countries')

world_background = alt.Chart(source).mark_geoshape(
    fill='lightgrey',
    stroke='white'
).properties(
    width=900,
    height=500
).project('naturalEarth1')

world_background

Now let's combine the geoChart and the map.

## The results

In [104]:
world_background + geoChart

# 6. The Top Countries Bar Chart

To create this bar chart, we utilize the mark_bar with the x-axis representing Cases per Capita and the y-axis representing Country Name. We then apply tooltips for Country Name, Continent, and Cases per Capita, along with the total number of cases.


In [105]:
barChart_toplist = alt.Chart(df_dcr_melted
).properties(
    width=200

).transform_filter(
    dropSelect

).transform_lookup( # lookup Country from the other dataset and get the continent of each country
    lookup='Country Name',
    from_=alt.LookupData(df_cbc,'country', ['continent'])

).transform_filter( # filter by continent only after the lookup has added that field
    selection

).mark_bar().encode(
    x = alt.X('Cases Per Capita:Q', title='Cases Per Capita'),
    y = alt.Y('Country Name:N', title='Country' , sort='-x'),
    tooltip=['Country Name:N', 'continent:N', 'Cases Per Capita:Q', 'Number of Cases:Q'],
    color = 'continent:N' ,
    opacity = alt.condition(selector, alt.value(1), alt.value(0.17))

).transform_window(
    rank='rank(Cases Per Capita)',
    sort=[alt.SortField('Cases Per Capita', order='descending')]

).transform_filter(
    (alt.datum.rank < 17)

).properties(
    title="Toplist"

).add_selection(
  dropSelect

).add_selection(
  selector
)
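
The window transform ranks the countries by 'Cases Per Capita' and the filter rank < 17 keeps the top 16. For checking the chart against the data, a rough pandas equivalent can be sketched as below. It assumes that tied values share the lowest rank of their group, which is our understanding of how the rank window operation behaves. The country labels, the values and the names ranked, top_n and top are hypothetical.

```python
import pandas as pd

# Hypothetical per-capita values for a single cause
ranked = pd.DataFrame({
    "Country Name": ["A", "B", "C", "D"],
    "Cases Per Capita": [0.004, 0.001, 0.004, 0.002],
})

# Rank 1 is the highest value; ties share the lowest rank of their group
ranked["rank"] = ranked["Cases Per Capita"].rank(method="min", ascending=False)

top_n = 2  # the chart uses rank < 17, i.e. top_n = 16
top = ranked[ranked["rank"] < top_n + 1]

print(top["Country Name"].tolist())  # ['A', 'C']
```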


Let's have a look at our bar chart together with what we have done so far:

In [106]:
alt.vconcat(world_background + geoChart , barChart_toplist)

# 7. The Top Causes Bar Chart

Looks good. Now, finally, let's add the last element to our first visualization, the top causes bar chart. To achieve this, we use a similar approach as the previous bar chart with the mark_bar.

In [107]:
barChart_commoncauses = alt.Chart(df_dcr_melted
).properties(
    width=200

).mark_bar(

).transform_filter(
    selector

).transform_aggregate( # Sum the cases per capita for each cause
    totalIncidents='sum(Cases Per Capita):Q',
    groupby=["Cause"]

).encode(
    x = alt.X('totalIncidents:Q', title='Cases Per Capita'),
    y = alt.Y('Cause:N', title='Death Cause', sort='-x'),
    tooltip=['Cause:N','totalIncidents:Q'],
    opacity=alt.condition(selector, alt.value(0), alt.value(1))

).transform_window(
    rank='rank(totalIncidents)',
    sort=[alt.SortField('totalIncidents', order='descending')]

).transform_filter(
    (alt.datum.rank < 17)

).properties(
    title= 'Most Common Causes of Death For Selected Country'
)
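
The chart above filters on the clicked country, sums 'Cases Per Capita' per cause, and sorts the result. The same steps can be sketched in pandas to verify what the bars should show. The Afghanistan values come from the output in section 3. The Albania value and the names melted, selected_country and top_causes are hypothetical.

```python
import pandas as pd

melted = pd.DataFrame({
    "Country Name": ["Afghanistan", "Afghanistan", "Afghanistan", "Albania"],
    "Cause": [
        "Covid-19 Deaths",
        "Cardiovascular diseases",
        "Respiratory diseases",
        "Covid-19 Deaths",
    ],
    # The last value (Albania) is invented for illustration
    "Cases Per Capita": [0.000056, 0.001587, 0.000181, 0.0004],
})

selected_country = "Afghanistan"  # stands in for the clicked country

top_causes = (
    melted[melted["Country Name"] == selected_country]   # filter by country
    .groupby("Cause", as_index=False)["Cases Per Capita"]
    .sum()                                                # total per cause
    .sort_values("Cases Per Capita", ascending=False)     # largest first
)

print(top_causes["Cause"].tolist())
# ['Cardiovascular diseases', 'Respiratory diseases', 'Covid-19 Deaths']
```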

# 8. The Final Visualization

Let's have a look at the top causes bar chart together with the rest of the visualization:

In [108]:
alt.vconcat(world_background + geoChart , barChart_toplist |  barChart_commoncauses)

# 9. Bonus Visualization

Finally, we create our second visualization. It uses a dot plot to illustrate the relationship between population and the number of cases, which lets you examine how population size relates to the number of deaths for a specific cause.

This is achieved by using a mark_point (scatter chart) with population on the x-axis and the number of cases on the y-axis.

In [109]:
scatterChart = alt.Chart(df_dcr_melted

).transform_filter(
  dropSelect

).transform_lookup( # lookup to get the population of each country
    lookup='Country Name',
    from_=alt.LookupData(df_pbc,'Country Name', ['Population (2020)'])

).transform_lookup( # lookup to get the continent of each country
    lookup='Country Name',
    from_=alt.LookupData(df_cbc,'country', ['continent'])

).mark_point(size=70).encode(
    x = alt.X('Population (2020):Q', axis=alt.Axis(title='Population')),
    y = alt.Y('Number of Cases:Q', axis=alt.Axis(title='Number of Cases')),
    tooltip = ['Country Name:N', 'continent:N', 'Population (2020):Q','Number of Cases:Q'],
    color = alt.Color('continent:N', title='Continents'),
    opacity = alt.condition(selection, alt.value(1), alt.value(0.17))

).properties(
    width = 760,
    height = 560
)

scatter_plots = scatterChart.add_selection(dropSelect).add_selection(selection).interactive()

Let's have a look at the result, our second visualization:

In [110]:
scatter_plots