# Declarative visualizations with Altair

To understand the approach that we will now introduce, we need to draw a new distinction. _Imperative_ data visualization programming focuses on telling the computer **how** to present information: for example, whether a line should be drawn, what style it should have, what color it should be, and so on. Matplotlib is an example of an imperative visualization library. The alternative is _declarative_ data visualization, which focuses on **what** you would like to see and lets the computer figure out how to achieve the desired outcome.

## Introducing Altair

Altair is a Python library that implements a declarative approach to data visualization. It was initially developed by Jake VanderPlas and Brian Granger, in close collaboration with several members of Jeff Heer's group. It is based on [Vega-Lite](https://idl.cs.washington.edu/papers/vega-lite/), a framework for declarative visualization that Heer and members of his group developed. The main idea is that there can be a neat separation between the specification of the visualization, the code that generates the specification, and the front end that renders it. This makes it possible to write rather concise descriptions of a visualization that are independent of both the interface used to create the description and the details of how it is rendered. This level of abstraction makes the programming interface more expressive. In particular, it makes it relatively easy to create interactive visualizations with rather powerful properties. We will get to that.

Let's start simple. Like Seaborn, Altair relies on tidy Pandas DataFrames as its input, so we begin by getting such a DataFrame:

In [None]:
import pandas as pd

In [None]:
abide = pd.read_csv('/home/jovyan/shared/abide2/abide2.tsv', sep='\t')

In [None]:
# Raw string avoids an invalid-escape warning; copy so columns can be added safely
v1 = abide.filter(regex=r"\w_\w_V1").copy()

In [None]:
v1["subject"] = abide["subject"]
v1["group"] = abide["group"]
v1["age"] = abide["age"]

In [None]:
import altair as alt

The fundamental unit of operation in Altair is the `Chart` object. When we first create a `Chart`, it is full of potential, but it doesn't draw anything yet. For it to do something, we must call one of its `mark_*` methods, which determines the kind of marks that will be drawn (points, bars, lines, and so on). Even then, nothing appears, because marks mean nothing without an encoding. It is only when we specify how columns of the data are encoded in properties of the marks (e.g., their x and y positions) that data appears on the page.

In [None]:
chart = alt.Chart(v1)

In [None]:
point = chart.mark_point()

In [None]:
point.encode(
    x='fsArea_R_V1_ROI',
    y='fsArea_L_V1_ROI',
).interactive()

Additional encodings are easy to add. Here, age is mapped to both the color and the size of each point:

In [None]:
point.encode(
    x='fsArea_R_V1_ROI',
    y='fsArea_L_V1_ROI',
    color="age",
    size="age"
).interactive()

Also, once a `Chart` has been created, it can accept different marks, and these can be defined using different encodings. Moreover, we can tell Altair something about the variables, to help it decide how to draw the marks. For example, appending ":N" to the "group" variable tells Altair that this is a nominal variable (not a quantitative one, despite the fact that it takes the values "1" and "2").

In [None]:
chart.mark_bar().encode(
    x='group:N',
    y='fsArea_L_V1_ROI',
).interactive()

With this language in mind, we can start composing visualizations. For example, we can apply a transform to a chart, such as a regression fit, and layer the result on top of the original chart. The layering is done literally using the "+" operator!

In [None]:
chart = alt.Chart(v1).mark_point().encode(
    x='fsArea_R_V1_ROI',
    y='fsArea_L_V1_ROI'
)

chart + chart.transform_regression('fsArea_R_V1_ROI', 'fsArea_L_V1_ROI').mark_line()

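With these basic building blocks in hand, we can combine things to flexibly create rather rich and elaborate visualizations. Here, two scatter plots are placed side by side and linked through an interval selection (a "brush"): the points selected by dragging in either panel are colored by group in both panels, and all other points are drawn in gray: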

In [None]:
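# `alt.selection_interval` is available in both Altair 4 and 5; in Altair 5
# and later, `add_selection` below is deprecated in favor of `add_params`.
brush = alt.selection_interval(resolve='global')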

base = alt.Chart(v1).mark_point().encode(
    y='age',
    color=alt.condition(brush, 'group:N', alt.ColorValue('gray')),
).add_selection(
    brush
).properties(
    width=250,
    height=250
)

base.encode(x='fsArea_R_V1_ROI') | base.encode(x='fsArea_L_V1_ROI')

An important thing to keep in mind is that as we move from more imperative visualization (e.g., Matplotlib) to more declarative visualization (e.g., Seaborn and then Altair), we give up quite a bit of control over the appearance of the visualizations and the elements that appear in them. For example, it would be rather tricky to 