In [None]:
%run _prepare.ipynb

# Analysing time (Notebook 3)

Time is the fourth dimension of our world and an essential component of most datasets. It is therefore often helpful to analyse important features at different points in time to get a full understanding.

While time is a truly continuous scale, it has multiple repeating units:
 * Every 24 hours form a day
 * Every 7 days form a week
 * A varying number of days, between 28 and 31, forms a month
 * Most of the time, 365 days form a year (366 in leap years).
 
Technically, it is often a bit annoying to work with a proper time format because of parsing, timezones and similar details that, in most use cases, don't make a real difference. It is therefore totally valid to focus only on those abstractions we are interested in, so that we don't lose track. In this case, our dataset makes it easy, as it gives us one row per country and year.
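
As a minimal sketch of this idea, assume a hypothetical toy frame `events` with a full `timestamp` column (the names and values are made up for illustration; our dataset already ships a ready-made `year` column, so this step is not needed here). The pandas `.dt` accessor reduces full timestamps to the abstraction of interest:

```python
import pandas as pd

# Hypothetical toy data, not part of the CO2 dataset
events = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2019-12-31 23:00", "2020-01-01 01:00", "2020-02-29 12:00"]
    ),
    "value": [1.0, 2.0, 3.0],
})

# Keep only the abstractions we care about
events["year"] = events["timestamp"].dt.year
events["month"] = events["timestamp"].dt.month

# Aggregate per year, ignoring time of day and timezones entirely
per_year = events.groupby("year")["value"].sum()
```

After this step, `per_year` has one value per year, which is the same granularity our dataset provides out of the box.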

The **most commonly used charts for time-series** are:

* **line-charts**: Visualize independent timelines 
* **area-charts**: Visualize aggregating timelines

## Visualizing Timelines

### Using pandas plotting backend

Plots created with the pandas API require the wide-column format:

In [None]:
plot_df_pandas = countries.groupby(["year","continent"]).co2.sum().unstack("continent").reset_index()
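
To see what the wide format looks like, here is a sketch on a hypothetical toy frame (`toy` and its values are made up for illustration). The same `groupby(...).sum().unstack(...)` pattern turns several rows per year and continent into one row per year with one column per continent:

```python
import pandas as pd

# Hypothetical toy data: several rows per year and continent
toy = pd.DataFrame({
    "year": [2000, 2000, 2000, 2001, 2001, 2001],
    "continent": ["Asia", "Europe", "Europe", "Asia", "Europe", "Europe"],
    "co2": [5.0, 1.0, 2.0, 6.0, 3.0, 4.0],
})

# Sum per (year, continent), then move the continents into columns
wide = toy.groupby(["year", "continent"]).co2.sum().unstack("continent").reset_index()
```

Each continent column of `wide` becomes one line (or one stacked area) when calling `wide.plot(x="year")`.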

In [None]:
plot_df_pandas.plot(x="year", figsize=(20,5), kind="line")

In [None]:
plot_df_pandas.plot(x="year", figsize=(20,5), kind="area")

### Using plotly

Plotly (and Altair) prefer the long-data format:

In [None]:
plot_df_px = countries.groupby(["year","continent"]).co2.sum().reset_index()

In [None]:
px.line(plot_df_px, y="co2", x="year", color="continent")

In [None]:
px.area(plot_df_px, y="co2", x="year", color="continent")

### Using Altair

In [None]:
alt.data_transformers.disable_max_rows()

In [None]:
alt.Chart(plot_df_px, width=1009, height=250).mark_line().encode(
    x="year", 
    y="co2",
    color="continent"
)

Altair can also compute aggregations for us dynamically, for example via the `aggregate` argument of `alt.Y` used in the charts further below.

## Using different encodings

Altair uses the pandas `dtype` attribute to choose a default encoding type. However, you can easily override it manually:


| Data Type | Shorthand Code | Description |
| ----------- | ----------- | ----------- |
|quantitative| Q| a continuous real-valued quantity|
|ordinal| O| a discrete ordered quantity|
|nominal| N| a discrete unordered category|
|temporal| T| a time or date value|
|geojson| G| a geographic shape|

In [None]:
alt.Chart(continents.query("year>1990"), width=500).mark_point().encode(
    x=alt.X("year", scale=alt.Scale(zero=False)), 
    y=alt.Y("co2"), 
    color="co2_per_capita",
    tooltip=["country"],
    shape="country"
)

Split one chart into multiple charts, here with the `row` encoding:

In [None]:
alt.data_transformers.disable_max_rows()
alt.Chart(countries.query("year>1900"), width=1009, height=150).mark_area().encode(
    x="year", 
    y=alt.Y(field='co2', aggregate='sum', type='quantitative'),
    color="country",
    row="continent"
)

# More Timelines with Altair

In [None]:
raw_data_info.co2_per_capita

In [None]:
fig = alt.Chart(countries, width=900, height=250).mark_line().encode(
    x="year", 
    y=alt.Y(field='co2_per_capita', aggregate='mean', type='quantitative'),
    color="continent"
)

In [None]:
base_line_chart = alt.Chart(countries, width=900, height=250).mark_line().encode(
    x="year", 
    color="continent"
)

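# Stack both aggregations vertically; `type` is given explicitly because no shorthand string is used
(
    base_line_chart.encode(alt.Y(field='co2_per_capita', aggregate='mean', type='quantitative'))
    & base_line_chart.encode(alt.Y(field='co2', aggregate='sum', type='quantitative'))
)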

# 📝 Task: Make a similar chart that shows the usage of the different co2 sources

* Hint: Altair requires the [long-data-format](https://de.wikipedia.org/wiki/Wide-Format_und_Long-Format). Use the `.melt()` method on your dataframe
* You can use the `cols_co2_sources` variable
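
As a sketch of what `.melt()` does, here is a hypothetical toy frame with two made-up source columns (`coal_co2` and `oil_co2` stand in for the entries of `cols_co2_sources`). The identifier columns are kept, and every remaining column is turned into a `variable`/`value` pair:

```python
import pandas as pd

# Hypothetical wide toy data: one column per co2 source
toy_wide = pd.DataFrame({
    "country": ["A", "B"],
    "year": [2000, 2000],
    "coal_co2": [1.0, 2.0],
    "oil_co2": [3.0, 4.0],
})

# Long format: one row per (country, year, source)
toy_long = toy_wide.melt(id_vars=["country", "year"])
```

The resulting `variable` column can then be used for the `color` encoding and `value` for the y-axis.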

In [None]:
cols_co2_sources

In [None]:
plot_df = countries[["country","continent", "year", *cols_co2_sources]].melt(["country","continent","year"])

In [None]:
co2_sources_chart = alt.Chart(plot_df, width=900, height=250).mark_line().encode(
    x="year", 
    y=alt.Y(field='value', aggregate='sum', type='quantitative'),
    color="variable"
)

In [None]:
co2_sources_chart

# Altair Interactivity

Altair's superpowers are selections, filters and transformations. By using them, we can create great interactive charts.

In [None]:
import altair as alt

brush = alt.selection(type='interval', encodings=['x'])

plot_df = countries[["country","continent", "year", *cols_co2_sources]].melt(["country","continent","year"])
co2_sources_chart = alt.Chart(plot_df, width=900, height=250).mark_line().encode(
    x="year", 
    y=alt.Y(field='value', aggregate='sum', type='quantitative'),
    color="variable"
)

upper = co2_sources_chart.encode(
    alt.X('year', scale=alt.Scale(domain=brush))
)

lower = co2_sources_chart.properties(
    height=60
).add_selection(brush)

upper & lower

And all of this can still be combined with Altair's other superpowers. For example, try:
`upper.mark_area() & upper.encode(row="continent") & lower`

# Discover the relationship between current co2 consumption, co2 per capita and historical co2 consumption

Steps:
 * Create a chart that incorporates all three dimensions at a high level
 * Create another chart with timelines
 * Combine both charts into a dashboard using a selection and a filter

In [None]:
scatter = alt.Chart(countries[filter_most_recent].query("co2>100")).mark_circle().encode(
    x="co2_per_capita",
    y="cumulative_co2",
    color="continent",
    size="co2",
    tooltip=["country", "co2", "co2_per_capita", "cumulative_co2"]
).properties(
    width=1000,
    height=400
)
scatter

In [None]:
selection = alt.selection_multi(fields=['country'], empty='none')

alt.data_transformers.disable_max_rows()
line_co2 = alt.Chart(countries, width=600, height=150).mark_line().encode(
    x="year", 
    y="co2",
    color="country:N",
    tooltip = ["country", "co2"]
)

dashboard = alt.hconcat(
    scatter.properties(width=400, height=450).add_selection( selection ), line_co2.transform_filter(selection)
)
dashboard.resolve_scale(color='independent').configure(autosize=alt.AutoSizeParams(resize=True))

# 📝 Task: Extend the dashboard 

The goal is to enable a deeper understanding of **co2_per_capita** and **historical co2** consumption. To this end, add additional timelines to the dashboard.

In [None]:
selection = alt.selection_multi(fields=['country'], empty='none')

alt.data_transformers.disable_max_rows()
line_co2 = alt.Chart(countries, width=600, height=150).mark_line().encode(
    x="year", 
    y="co2",
    color="country:N",
    tooltip = ["country", "co2"]
).transform_filter(selection)

dashboard = alt.hconcat(
    scatter.properties(width=400, height=450).add_selection( selection ),
    line_co2 & line_co2.encode(y="co2_per_capita") & 
    line_co2.encode(y="cumulative_co2") & line_co2.encode(y="gdp")
)
dashboard.resolve_scale(color='independent').configure(autosize=alt.AutoSizeParams(resize=True))