# Relationships and Hierarchy (Notebook 2)

Structured data usually represents a number of instances with their attributes. Those instances usually have an ID column (similar to primary keys in databases) as well as references to other ID columns (similar to foreign keys). Understanding these relationships is very important.
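As a minimal sketch of such a relationship, the toy tables below (invented for illustration, not the notebook's dataset) use `continent_id` as a primary key in one table and as a foreign key in the other. A pandas `merge` resolves the reference:

```python
import pandas as pd

# Toy data, invented for illustration only
continents = pd.DataFrame({
    "continent_id": [1, 2],          # primary key
    "continent": ["Europe", "Asia"],
})
countries_toy = pd.DataFrame({
    "country_id": [10, 11, 12],      # primary key
    "country": ["France", "Japan", "Italy"],
    "continent_id": [1, 2, 1],       # foreign key into `continents`
})

# Resolve the foreign key: each country row gains its continent name.
# validate="many_to_one" raises if `continent_id` is not unique in `continents`.
joined = countries_toy.merge(
    continents, on="continent_id", how="left", validate="many_to_one"
)

# Number of countries per continent
countries_per_continent = joined.groupby("continent").size()

print(joined[["country", "continent"]])
print(countries_per_continent)
```

The `validate` argument is optional, but it is a cheap way to check that the assumed relationship actually holds in the data.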

Our dataset contains countries, which we group into continents. Aside from this, we already found a classification based on each country's wealth.


In this Notebook we will analyse:

* Which countries and continents emit the most CO2?
* How can we visualize continents and countries at the same time?

In [None]:
%run _prepare.ipynb
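
The cells below use names that are not defined in this notebook, such as `countries`, `filter_most_recent`, `px`, `widgets`, and `interact`; they presumably come from `_prepare.ipynb`. As a hedged sketch only, a boolean mask like `filter_most_recent` could be built as follows. The toy frame and the "globally latest year" rule are assumptions, not the actual preparation code:

```python
import pandas as pd

# Toy stand-in for the real `countries` frame (values invented for illustration)
countries_toy = pd.DataFrame({
    "country": ["France", "France", "Japan", "Japan"],
    "continent": ["Europe", "Europe", "Asia", "Asia"],
    "year": [2019, 2020, 2019, 2020],
    "co2": [1.0, 2.0, 3.0, 4.0],
})

# Assumption: "most recent" means the latest year present anywhere in the data
filter_most_recent_toy = countries_toy["year"] == countries_toy["year"].max()

# Boolean indexing keeps only the rows where the mask is True
latest = countries_toy[filter_most_recent_toy]
print(latest)
```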

## Visualize CO2 per continent

In [None]:
countries[countries.year == 2020].groupby("continent").co2.sum().sort_values(ascending=False).plot.barh(title="CO2 2020 per Continent")

In [None]:
top50_countries_by_co2 = countries[filter_most_recent].groupby(["country","continent"]).co2.sum().sort_values(ascending=False).head(50)
top50_countries_by_co2.plot.barh(title="CO2 of the 50 highest-emitting countries", figsize=(10,10))

## How could we display both the continent and the country relationship?

Very similar to the chart above, but ordered and colored by continent:

In [None]:
px.bar(top50_countries_by_co2.reset_index(), y="country", x="co2", color="continent", height=700)

Switching the axes makes it easier to compare continents:

In [None]:
px.bar(
    top50_countries_by_co2.reset_index(), 
    x="continent",
    color="country", 
    y="co2", 
    title='CO2 per continent / country in 2020', 
    height=500, 
    width=1200
)

However, even though we only show the top 50 countries, their colors are not unique in this chart, and it looks rather messy.
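
One likely reason for the repeated colors is that a qualitative palette has a fixed number of entries and is cycled once the categories outnumber it. The sketch below mimics that cycling in plain Python. The palette size of 10 is an assumption (it should match the length of Plotly's default qualitative sequence), and the color and country names are placeholders:

```python
from collections import Counter
from itertools import cycle, islice

PALETTE_SIZE = 10   # assumption: length of the default qualitative palette
N_COUNTRIES = 50    # as in the top-50 chart

palette = [f"color_{i}" for i in range(PALETTE_SIZE)]          # placeholder colors
countries_demo = [f"country_{i}" for i in range(N_COUNTRIES)]  # placeholder countries

# Assign one color per country by cycling through the palette
assignment = dict(zip(countries_demo, islice(cycle(palette), N_COUNTRIES)))

# Count how many countries end up sharing each color
shared = Counter(assignment.values())
print(shared.most_common(3))
```

With these numbers every color is reused by several countries, so color alone cannot identify a country in the stacked chart.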

# 📝 Find better ways to visualize both continent and country

* Does it work for even more than 50 countries?
* What is the main aspect of the visualization?
* Which aspect does the visualization hide?

For ideas, you can check the [plotly-express examples](https://plotly.com/python/plotly-express)

**Option 1:**

In [None]:
px.icicle(
    countries[filter_most_recent], path=["continent","country"], values="co2", 
    title='CO2 consumption in the last recorded year', height=400, width=800
).update_layout(margin=dict(t=30,l=0,r=0,b=0))

**Option 2:**

In [None]:
px.treemap(
    countries[filter_most_recent], 
    path=[px.Constant("World"), "continent","country"], 
    values="co2", 
    title='CO2 consumption in the last recorded year', 
    height=600, 
    width=1000
)

**Option 3:**

In [None]:
px.sunburst(
    countries[filter_most_recent], 
    path=[px.Constant("World"), "continent","country"], 
    values="co2", 
    title='CO2 consumption in the last recorded year', 
    height=600, 
    width=1000
)
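
All three options rely on the same idea: each parent level in `path` is sized by the sum of `values` over its children. A minimal pandas sketch of that aggregation, using invented toy values rather than the notebook's data:

```python
import pandas as pd

# Toy leaf-level data (invented values), one row per country
leaves = pd.DataFrame({
    "continent": ["Europe", "Europe", "Asia"],
    "country": ["France", "Italy", "Japan"],
    "co2": [2.0, 1.5, 4.0],
})

# Level 1: each continent is the sum of its countries
per_continent = leaves.groupby("continent")["co2"].sum()

# Level 0: the root ("World") is the sum of all continents
world_total = per_continent.sum()

print(per_continent)
print("World:", world_total)
```

This is why the charts emphasize each country's share of the total while making it harder to read exact values for small countries.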

# How can we iterate over years more quickly?

In [None]:
def display_chart(year):
    return px.sunburst(
        countries[countries.year == year], 
        path=[px.Constant("World"), "continent","country"], 
        values="co2", 
        title=f'CO2 consumption in {year}',  # show the selected year instead of a fixed label
        height=600, 
        width=1000
    )

In [None]:
source_year = widgets.IntSlider(value=2020, min=1950, max=2020)

In [None]:
interact(display_chart, year=source_year)