# Problem Set 3.2: Altair

[Click here to open this notebook in your browser](https://leifwalsh.github.io/data-analysis-problem-sets/lab/index.html?path=3-visualization-basics/3.2-altair/3.2-altair.ipynb)

Learn my favorite visualization library, [Altair](https://altair-viz.github.io/).

We'll use the same `mpg` dataset from the [last problem set](../3.1-basic-plotting/3.1-basic-plotting.ipynb) and make roughly the same charts, but this time we'll do it with an API that makes more sense. We'll be able to do some more sophisticated things easily.

In [None]:
import pandas as pd
import altair as alt
mpg = pd.read_csv("mpg.csv")

## The Grammar of Graphics

As before, we start with a simple scatter plot:

In [None]:
alt.Chart(mpg).encode(
    x="weight",
    y="horsepower"
).mark_point()

Recall how we did this with the pandas API:

In [None]:
mpg.plot.scatter(x="weight", y="horsepower")

Again, we're specifying two main ideas here:

1. How we want to _draw_ each observation (the "geometry")
2. How we want to _encode_ each measurement of an observation (encoding `weight` as the x-coordinate and `horsepower` as the y-coordinate)

I think the Altair library does a really good job of representing these concepts in a way that's cleanly separated, and this makes them more composable. I find it easier to tweak different pieces of the chart when they're clearly independently controllable.

The API design theory behind this comes from [_The Grammar of Graphics_](https://www.cs.uic.edu/~wilkinson/TheGrammarOfGraphics/GOG.html) by Leland Wilkinson, popularized in the R package [`ggplot2`](https://ggplot2.tidyverse.org/) by Hadley Wickham (see also [the `ggplot2` book](https://ggplot2-book.org/mastery.html)).

The two ideas above (drawing and encoding) are what the Grammar of Graphics calls "layers" or "structures" (there are seven, though you rarely need all of them). What this theory does well is help you think about them independently and combine them to get what you want.

The Grammar of Graphics is widely regarded as a good idea and there are many Python libraries that implement something inspired by it. I like Altair best because it's the first one where the idea really clicked for me, but the others are good too.

I also think that learning to think the way the Grammar of Graphics wants you to makes you better at using visualization libraries that aren't designed that way. It helps you (or at least it helps me) phrase what I want to do so that I can go look for how to do it.

**Caveat:** We're not building things up from first principles in this notebook. We're going to see a little bit of how things compose, and sample a few things you can do with Altair. The scope of what you can do is huge, way beyond what we can discuss in detail. The goal here is to give you some examples to play with to start to understand the feel of the API, and a couple of cool advanced things you can do to entice you to learn more. Beyond that, my best advice is to explore the gallery and Stack Overflow.

Let's look at that code again and point out the pieces:

In [None]:
alt.Chart(
    # Here we provide the DataFrame we want to display. More on the format of this later.
    mpg
).encode(
    # Here we're saying which columns should map to which visual features of what's to be drawn. There are many others.
    x="weight",
    y="horsepower",
).mark_point()  # And here we're just saying "the marks I want to draw are points". That is, a scatter plot.

Since these objects follow something like the Builder pattern, you can save partially configured Chart objects and reuse them in different ways. We'll soon see how this makes the library very composable.

In [None]:
chart = alt.Chart(mpg).encode(
    x="weight",
    y="horsepower",
)

You can take the same chart with some encodings and just draw it with different shapes:

In [None]:
scatter = chart.mark_point()
scatter

In [None]:
line = chart.mark_line()
line

You can also add more encoding channels to one of your charts:

In [None]:
scatter.encode(color="origin")

You could back up to the `chart` object too and change things from there:

In [None]:
chart.mark_line().encode(color="origin")

Of course, you can make it bigger too:

In [None]:
scatter.encode(color="origin").properties(width=800, height=600)

### Multiple Charts

Another thing you can do by reusing chart objects is put them next to each other (like with `matplotlib`'s `plt.subplots()` we saw last time):

In [None]:
bars = alt.Chart(mpg).mark_bar().encode(x="model_year")
weight = bars.encode(y="mean(weight)")
acceleration = bars.encode(y="mean(acceleration)")
efficiency = bars.encode(y="mean(mpg)")
alt.hconcat(weight, acceleration, efficiency)

In [None]:
alt.vconcat(weight, acceleration, efficiency)

There's a shorthand for these:
- `vconcat` is `&`
- `hconcat` is `|`

In [None]:
weight | acceleration | efficiency

And you can mix them:

In [None]:
(weight | acceleration) & efficiency

Last time, we made a box plot and remarked that it would be nice to be able to see the population sizes as well. We can do this easily by stacking our charts:

In [None]:
x_axis = alt.X("model_year", scale=alt.Scale(domain=[69, 83]))
boxes = alt.Chart(mpg).encode(x=x_axis, y="mpg").mark_boxplot().properties(width=300, height=300)
populations = alt.Chart(mpg).encode(x=x_axis, y="count()").mark_bar().properties(width=300, height=100)
boxes & populations

We can also encode variables in our data as the chart columns or rows the data points will be separated into:

In [None]:
boxes.facet(column="origin") & populations.facet(column="origin")

## Interactivity

Altair is actually just a Python DSL (domain-specific language) for creating a [Vega-Lite](https://vega.github.io/vega-lite/) specification. Vega-Lite is a declarative grammar of interactive graphics that renders in the browser, which means it has facilities for all kinds of interactions with the user (typically with the mouse).

One easy one is tooltips, which are another channel you can encode information into:

In [None]:
scatter.encode(
    color="origin",
    tooltip=["origin", "model_year", "name", "weight", "horsepower"]
).properties(width=800, height=600)

### Crossfilter

One really cool interactivity example is called crossfiltering: you can select a portion of data in one chart in order to highlight that portion in another.

First, let's make some charts that show the distribution of cars along a few different dimensions.

In [None]:
chart = alt.Chart(mpg).mark_bar().properties(width=200, height=200).encode(y="count()")
base = chart.encode(x=alt.X(alt.repeat("column")).bin(maxbins=20))
base.repeat(column=["mpg", "weight", "acceleration", "displacement"])

Next, we can add in a selection and highlight what's selected.

I should be honest here: I don't entirely understand how this works. I just looked at <https://altair-viz.github.io/gallery/interactive_layered_crossfilter.html> and copied as much of it as I needed to get things working.

In [None]:
brush = alt.selection_interval(encodings=["x"])
background = base.encode(color=alt.value("#ddd")).add_params(brush)
highlight = base.transform_filter(brush)
alt.layer(background, highlight).repeat(column=["mpg", "weight", "acceleration", "displacement"])

## Gallery

As with pandas and seaborn, there is also an [Altair gallery](https://altair-viz.github.io/gallery/index.html#example-gallery).

## Exercises

We're just doing the same exercises from the previous notebook, but as an opportunity to explore Altair.

### Exercise 1

Make a plot that shows the distribution of acceleration separately for each manufacturer.

### Exercise 2

Make a bar plot of something with error bars. Then make a box plot of the same thing.

### Exercise 3

Use `altair` to show the joint distribution between `mpg` and `weight`. Use the `color` channel (Altair's counterpart to seaborn's `hue`) to show some interesting property.

### Exercise 4

Visit the [Altair Gallery](https://altair-viz.github.io/gallery/index.html) and make three plots of these data that interest you.