# Session 9: Interactive Visualization with Altair


# Background on Altair 

-   Visualization library for Python
-   Takes a *declarative* approach to data visualization: you specify relationships between the data and the output (e.g., "map x to a position and y to a color") rather than how each element should be drawn ("put a red circle here and a blue circle there"), which keeps plotting code concise*

*Source: <https://altair-viz.github.io/altair-tutorial/README.html>

# Basic building block: the mark

-   Basic building block is a `mark`: the kind of shape drawn for each row of data, which you then configure by encoding `x`, `y`, color, and interactivity
- Types of marks:
    - `mark_point()`
    - `mark_circle()`
    - `mark_square()`
    - `mark_line()`
    - `mark_area()`
    - `mark_bar()`
    - `mark_tick()`

-   Illustrating example with WHO data on country-year-level life expectancy

# Reading in data

In [43]:
import pandas as pd
import numpy as np
import altair as alt
from altair import datum

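# Load the WHO life expectancy data (one row per country-year)
who = pd.read_csv('Life Expectancy Data.csv')
who.head()
# Strip stray whitespace from the column names and lowercase them
who.columns = [col.strip().lower() for col in who.columns]
who.columns
# Count the rows available for each year
who.year.value_counts()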


2013    193
2015    183
2014    183
2012    183
2011    183
2010    183
2009    183
2008    183
2007    183
2006    183
2005    183
2004    183
2003    183
2002    183
2001    183
2000    183
Name: year, dtype: int64
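
The list comprehension in the cell above normalizes the column names so that later cells can refer to them consistently (e.g., `who.year`, `'life expectancy'`). A small sketch of the same step, using hypothetical raw headers (the names below are illustrative assumptions, not read from the CSV):

```python
# Hypothetical raw headers with inconsistent case and stray whitespace
raw_columns = ["Country", "Year", "Status", "Life expectancy ", " Schooling"]

# Same cleaning step as in the cell above: trim whitespace, then lowercase
clean_columns = [col.strip().lower() for col in raw_columns]
print(clean_columns)
# ['country', 'year', 'status', 'life expectancy', 'schooling']
```

Note that `strip()` only removes leading and trailing whitespace, so a name such as `'life expectancy'` keeps its internal space and has to be referenced as a string rather than as an attribute.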

# Basic mark_point() with no encoding 

In [None]:
alt.Chart(who).mark_point()


# Adding encoding to the mark_point()

In [11]:
alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = 'schooling',
  y = 'life expectancy'
)

# Cleaning up

In [16]:
alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = 'schooling',
  y = 'life expectancy',
  color = 'status'
).configure_axis(
    grid=False
)

# Further customizing
- To further customize, switch syntax within `encode()` from:
    - `x = 'variable name'`, `y = 'variable name'`, `color = 'variable name'`, etc.
- To:
    - `x = alt.X()`, `y = alt.Y()`, `color = alt.Color()` with the parentheses then holding further customizations 

# Further customizing: colors 

In [18]:
domain = ['Developing', 'Developed']
colors = ['seagreen', '#7D3C98']

alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = 'schooling',
  y = 'life expectancy',
  color = alt.Color('status').scale(domain = domain, range = colors)
).configure_axis(
    grid=False
)

# Further customizing: X and Y labels 

In [19]:
alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = alt.X('schooling', title = "Average years of schooling"),
  y = alt.Y('life expectancy', title = "Life expectancy (2013)"),
  color = alt.Color('status').scale(domain = domain, range = colors)
).configure_axis(
    grid=False
)

# Bar charts: `mark_bar()`

- Similar to stat = 'identity' in ggplot, can create bar charts using a specific column of the dataset
- Can also do transformations to create the values displayed in bars within the code itself. See here for a list: https://altair-viz.github.io/user_guide/transform/index.html 
- Our example will also show the value of declaring variable types explicitly rather than relying on Altair's automatic inference of the data type:
    - Q: quantitative
    - O: ordinal
    - N: nominal 
    - T: temporal
    - G: geojson

# Example of an identity bar chart: don't declare types

In [33]:
who_subset = who[(who.country.isin(['United States of America', 'Canada', 'Mexico'])) &
                (who.year > 2004)].copy()
alt.Chart(who_subset).mark_bar().encode(
    x = alt.X('year', title = "Year"),
    y = alt.Y('life expectancy', title = "Life expectancy"),
    xOffset="country:N",
    color = alt.Color('country:N', title = "")
)

# Example of an identity bar chart: declare types

In [34]:

alt.Chart(who_subset).mark_bar().encode(
    x = alt.X('year:O', title = "Year"),
    y = alt.Y('life expectancy:Q', title = "Life expectancy"),
    xOffset="country:N",
    color = alt.Color('country:N', title = "")
)

# Example of a transformation-based bar chart: mean by group

In [40]:
alt.Chart(who[who.year > 2009]).mark_bar().encode(
    x = alt.X('year:O', title = "Year"),
    xOffset = "status:N",
    y = alt.Y('avg_life:Q', title = "Average Life expectancy"),
    color = alt.Color('status:N', title = "")
).transform_aggregate(
    avg_life = 'mean(life expectancy)',
    groupby = ['status', 'year']
)
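
For intuition, `transform_aggregate` with `groupby` computes roughly what a pandas group-by mean would. The sketch below shows that equivalent computation on made-up rows (the data frame and its values are assumptions for illustration); it is not how Altair performs the aggregation internally.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "status": ["Developed", "Developed", "Developing", "Developing"],
    "year": [2010, 2010, 2010, 2010],
    "life expectancy": [80.0, 82.0, 60.0, 70.0],
})

# One row per (status, year) holding the group mean, like
# transform_aggregate(avg_life='mean(life expectancy)', groupby=[...])
avg = (
    toy.groupby(["status", "year"], as_index=False)
       .agg(avg_life=("life expectancy", "mean"))
)
print(avg)
```

Doing the aggregation inside the chart keeps the plotting call self-contained; doing it in pandas first makes the intermediate table easier to inspect.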

# Example of a transformation-based bar chart: filter within chart itself

In [51]:
## notice layering of filters 
alt.Chart(who).mark_bar().encode(
    x = alt.X('year:O', title = "Year"),
    y = alt.Y('life expectancy:Q', title = "Life expectancy"),
    xOffset="country:N",
    color = alt.Color('country:N', title = "")
).transform_filter(
    alt.FieldOneOfPredicate(field = 'country',
                            oneOf = ["Canada", "Mexico",
                                              "United States of America",
                                              "Cuba"])
).transform_filter(
    alt.FieldGTPredicate(field = 'year', gt = 2009)
)

# Different types of interactivity

- Tooltips: hovering over points to bring up information
- Selections: 
    - Allow users to select an interval range of the chart
    - Allow users to select a single point
    - Allow users to select multiple points

# Illustrating tooltips

In [56]:
c = alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = alt.X('schooling', title = "Average years of schooling"),
  y = alt.Y('life expectancy', title = "Life expectancy (2013)"),
  color = alt.Color('status').scale(domain = domain, range = colors),
  tooltip = [alt.Tooltip('country', title = "Country:"),
            alt.Tooltip('life expectancy', title = "Life exp:"),
            alt.Tooltip('schooling', title = 'Years schooling:')]
).configure_axis(
    grid=False
).interactive()  # also enables panning and zooming
c  # display the chart

# Illustrating selections

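- Can use `add_selection()` to attach a selection (here an interval "brush") to a chart so that users can drag to select a region of points; Altair 5 deprecates this method in favor of `add_params()`
- On its own, the selection only draws the brush; use `alt.condition()` in an encoding to make the chart respond to it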


In [57]:
brush = alt.selection_interval()
alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = alt.X('schooling', title = "Average years of schooling"),
  y = alt.Y('life expectancy', title = "Life expectancy (2013)"),
  color = alt.Color('status').scale(domain = domain, range = colors)
).configure_axis(
    grid=False
).add_selection(
    brush
)


In [74]:
brush = alt.selection_interval()
alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = alt.X('schooling', title = "Average years of schooling"),
  y = alt.Y('life expectancy', title = "Life expectancy (2013)"),
  color = alt.condition(brush, 'status:N', alt.value('lightgray'))
).configure_axis(
    grid=False
).add_selection(
    brush
)
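
The logic behind `alt.condition(brush, 'status:N', alt.value('lightgray'))` can be sketched in plain Python: points inside the brushed rectangle keep their status color, and everything else falls back to light gray. All names and values below are hypothetical, and Altair evaluates the real condition in the browser rather than in Python.

```python
# Plain-Python sketch of what the conditional color encoding computes.

def in_brush(point, brush):
    """True if the point lies inside the rectangular interval selection."""
    (x_min, x_max), (y_min, y_max) = brush["x"], brush["y"]
    return x_min <= point["x"] <= x_max and y_min <= point["y"] <= y_max

def point_color(point, brush, palette):
    """Selected points keep their status color; the rest turn light gray."""
    return palette[point["status"]] if in_brush(point, brush) else "lightgray"

palette = {"Developing": "seagreen", "Developed": "#7D3C98"}
brush = {"x": (10, 15), "y": (60, 75)}           # region the user dragged out
points = [
    {"x": 12, "y": 70, "status": "Developing"},  # inside the brush
    {"x": 18, "y": 82, "status": "Developed"},   # outside the brush
]

colors_out = [point_color(p, brush, palette) for p in points]
print(colors_out)  # ['seagreen', 'lightgray']
```

One difference from this sketch: before anything is brushed, Altair by default treats every point as selected, so the chart starts out fully colored.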


# Interactivity across multiple charts 

- Altair also gives us the ability to have selections on one chart propagate through to other charts 
- It does this using the `transform_filter()` that we outlined earlier
- Can use this to explore correlations across multiple variables 

# Step one: create multiple charts 

In [91]:
scatter_schooling = alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = alt.X('schooling', title = "Average years of schooling"),
  y = alt.Y('life expectancy', title = "Life expectancy (2013)"),
  color = alt.Color('status').scale(domain = domain, range = colors)
)

scatter_gdp = alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = alt.X('gdp', title = "GDP"),
  y = alt.Y('life expectancy', title = "Life expectancy (2013)"),
  color = alt.Color('status').scale(domain = domain, range = colors)
)

scatter_schooling | scatter_gdp

# Step two: add selection on one chart and filtering on another



In [94]:
brush = alt.selection_interval()
scatter_schooling = alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = alt.X('schooling', title = "Average years of schooling"),
  y = alt.Y('life expectancy', title = "Life expectancy (2013)"),
  color = alt.condition(brush, 'status:N', alt.value('lightgray'))
).add_selection(
    brush
)
scatter_gdp = alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = alt.X('gdp', title = "GDP"),
  y = alt.Y('life expectancy', title = "Life expectancy (2013)"),
  color = 'status:N'
).transform_filter(
    brush
)
scatter_schooling | scatter_gdp



# Another example: filtering by year 

In [97]:
select_year = alt.selection_interval(encodings = ['x'])

bar_slider = alt.Chart(who).mark_bar().encode(
    x = 'year:O',
    y = 'count()'
).add_selection(select_year)

scatter_schooling = alt.Chart(who).mark_point().encode(
  x = alt.X('schooling', title = "Average years of schooling"),
  y = alt.Y('life expectancy', title = "Life expectancy"),
  color = alt.condition(select_year, 'status:N', alt.value('lightgray')),
  opacity = alt.condition(select_year, alt.value(0.8), alt.value(0.1))
)

scatter_schooling & bar_slider


# Summing up 

- Reviewed general syntax for visualizations in altair: 
    - `mark` as the basic building block
    - `encode` to specify mappings to x, y, and color
- Use of `transform_aggregate` and `transform_filter` to transform data within the plotting call itself
- Two types of interactivity:
    - Tooltips
    - Selections, which can propagate through to multiple charts