# Introduction to Altair

[Altair](https://altair-viz.github.io/) is a declarative statistical visualization library for Python. Altair offers a powerful and concise visualization grammar for quickly building a wide range of statistical graphics.

By *declarative*, we mean that you can provide a high-level specification of *what* you want the visualization to include, in terms of *data*, *graphical marks*, and *encoding channels*, rather than having to specify *how* to implement the visualization in terms of for-loops, low-level drawing commands, *etc*. The key idea is that you declare links between data fields and visual encoding channels, such as the x-axis, y-axis, color, *etc*. The rest of the plot details are handled automatically. Building on this declarative plotting idea, a surprising range of simple to sophisticated visualizations can be created using a concise grammar.

Altair is based on [Vega-Lite](https://vega.github.io/vega-lite/), a high-level grammar of interactive graphics. Altair provides a friendly Python [API (Application Programming Interface)](https://en.wikipedia.org/wiki/Application_programming_interface) that generates Vega-Lite specifications in [JSON (JavaScript Object Notation)](https://en.wikipedia.org/wiki/JSON) format. Environments such as Jupyter Notebooks, JupyterLab, and Colab can then take this specification and render it directly in the web browser. To learn more about the motivation and basic concepts behind Altair and Vega-Lite, watch the [Vega-Lite presentation video from OpenVisConf 2017](https://www.youtube.com/watch?v=9uaHRWj04D4).

This notebook will guide you through the basic process of creating visualizations in Altair.

The content has been adapted from [this tutorial](https://github.com/uwdata/visualization-curriculum).



In [4]:
import altair as alt
import pandas as pd
import numpy as np

In [2]:
alt.__version__

'4.2.0'

In [5]:
starwars = pd.read_csv('data/starwars.csv')

In [15]:
starwars.head()

Unnamed: 0,name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
0,Luke Skywalker,172.0,77.0,blond,fair,blue,19.0,male,masculine,Tatooine,Human,,,
1,C-3PO,167.0,75.0,,gold,yellow,112.0,none,masculine,Tatooine,Droid,,,
2,R2-D2,96.0,32.0,,"white, blue",red,33.0,none,masculine,Naboo,Droid,,,
3,Darth Vader,202.0,136.0,none,white,yellow,41.9,male,masculine,Tatooine,Human,,,
4,Leia Organa,150.0,49.0,brown,light,brown,19.0,female,feminine,Alderaan,Human,,,


## The Chart Object

The fundamental object in Altair is the `Chart`, which takes a data frame as a single argument:

In [6]:
chart = alt.Chart(starwars)

So far, we have defined the `Chart` object and passed it the data frame we loaded above. We have not yet told the chart to *do* anything with the data.

## Marks and Encodings

With a chart object in hand, we can now specify how we would like the data to be visualized. We first indicate what kind of graphical *mark* (geometric shape) we want to use to represent the data. We can set the `mark` attribute of the chart object using the `Chart.mark_*` methods.

For example, we can show the data as a point using `Chart.mark_point()`:

In [20]:
chart.mark_point()


Here the rendering consists of one point per row in the dataset, all plotted on top of each other, since we have not yet specified positions for these points.

To visually separate the points, we can map various *encoding channels*, or *channels* for short, to fields in the dataset. For example, we could *encode* the field `height` of the data using the `x` channel, which represents the x-axis position of the points. To specify this, use the `encode` method:

In [24]:
chart.mark_point().encode(
  x='height'
)

The `encode()` method builds a key-value mapping from encoding channels (such as `x`, `y`, `color`, `shape`, `size`, *etc.*) to fields in the dataset, accessed by field name. For Pandas data frames, Altair automatically determines an appropriate data type for the mapped column, which in this case is the *quantitative* type, since `height` holds numerical values.

Though we've now separated the data by one attribute, many points still overlap along that single axis. Let's further separate them by adding a `y` encoding channel, mapped to the `'mass'` field:

In [27]:
chart.mark_point().encode(
  x='height',
  y='mass'
)

Notice that one character is a clear outlier with a very large mass. To find out who it is, we can enable a simple interactive tooltip by mapping fields to the `tooltip` channel:


In [28]:
chart.mark_point().encode(
  x='height',
  y='mass',
  tooltip=['name', 'height', 'mass']
)


It is worth noting that Altair also provides channel classes for constructing encoding definitions, using syntax such as `alt.X('height')`. The code above can be rewritten as:

In [30]:
chart.mark_point().encode(
  alt.X('height'),
  alt.Y('mass'),
  alt.Tooltip(['name', 'height', 'mass'])
)


In the examples above, the data type for each field is inferred automatically based on its type within the Pandas data frame. We can also explicitly indicate the data type to Altair by annotating the field name:

- `'b:N'` indicates a *nominal* type (unordered, categorical data),
- `'b:O'` indicates an *ordinal* type (rank-ordered data),
- `'b:Q'` indicates a *quantitative* type (numerical data with meaningful magnitudes), and
- `'b:T'` indicates a *temporal* type (date/time data)

For example, we can write `alt.X('height:N')` to override the type that was automatically inferred.

## Graphical Marks

In the example above, we saw the use of `point` marks to visualize the data. However, the `point` mark type is only one of many geometric shapes that can be used to visually represent data. Altair includes a number of built-in mark types, including:

- `mark_area()` - Filled areas defined by a top-line and a baseline.
- `mark_bar()` - Rectangular bars.
- `mark_circle()` - Scatter plot points as filled circles.
- `mark_line()` - Connected line segments.
- `mark_point()` - Scatter plot points with configurable shapes.
- `mark_rect()` - Filled rectangles, useful for heatmaps.
- `mark_rule()` - Vertical or horizontal lines spanning the axis.
- `mark_square()` - Scatter plot points as filled squares.
- `mark_text()` - Scatter plot points represented by text.
- `mark_tick()` - Vertical or horizontal tick marks.

For a complete list, and links to examples, see the [Altair marks documentation](https://altair-viz.github.io/user_guide/marks.html).

Below is an example of `mark_tick()`:

In [97]:
chart.mark_tick().encode(
  alt.X('height')
)


## Encoding Channels

At the heart of Altair is the use of *encodings* that bind data fields (with a given data type) to available encoding *channels* of a chosen *mark* type. Some of the most commonly used encoding channels are:

- `x`: Horizontal (x-axis) position of the mark.
- `y`: Vertical (y-axis) position of the mark.
- `size`: Size of the mark. May correspond to area or length, depending on the mark type.
- `color`: Mark color, specified as a [legal CSS color](https://developer.mozilla.org/en-US/docs/Web/CSS/color_value).
- `opacity`: Mark opacity, ranging from 0 (fully transparent) to 1 (fully opaque).
- `shape`: Plotting symbol shape for `point` marks.
- `tooltip`: Tooltip text to display upon mouse hover over the mark.
- `order`: Mark ordering, determines line/area point order and drawing order.
- `column`: Facet the data into horizontally-aligned subplots.
- `row`: Facet the data into vertically-aligned subplots.

For a complete list of available channels, see the [Altair encoding documentation](https://altair-viz.github.io/user_guide/encoding.html).

The example below adds a `Color` encoding to the scatter plot we created previously.

In [101]:
chart.mark_point().encode(
  alt.X('height'),
  alt.Y('mass'),
  alt.Color('sex'),
  alt.Tooltip(['name', 'height', 'mass', 'species'])
)

Making trellis plots (small multiples) is also easy in Altair: add a `Column` or `Row` channel in the `encode` method.

In [7]:
chart.mark_point().encode(
  alt.X('height'),
  alt.Y('mass'),
  alt.Color('sex'),
  alt.Tooltip(['name', 'height', 'mass', 'species']), 
  alt.Column('sex')
)


## Data Transformation: Aggregation

To allow for more flexibility in how data are visualized, Altair has a built-in syntax for *aggregation* of data. For example, we can compute the average of all values by specifying an aggregation function along with the field name:

In [32]:
chart.mark_point().encode(
  x='average(height)',
  y='sex'
)

Now within each y-axis category, we see a single point reflecting the *average* of the values within that category. 

Altair supports a variety of aggregation functions, including `count`, `min` (minimum), `max` (maximum), `average`, `median`, and `stdev` (standard deviation).

## Changing the Mark Type

Let's say we want to represent our aggregated values using rectangular bars rather than circular points. We can do this by replacing `Chart.mark_point` with `Chart.mark_bar`:

In [33]:
chart.mark_bar().encode(
  x='average(height)',
  y='sex'
)

Because the nominal field `sex` is mapped to the `y`-axis, the result is a horizontal bar chart. To get a vertical bar chart, we can simply swap the `x` and `y` keywords:

In [37]:
chart.mark_bar().encode(
  x='sex',
  y='average(height)'
)

## Scale - Customizing a Visualization

By default, Altair and Vega-Lite make some choices about properties of the visualization, but these can be customized. For example, we can specify axis titles using the `axis` attribute of channel classes, modify scale properties using the `scale` attribute, and set the color of the marks by passing the `color` keyword to the `Chart.mark_*` methods with any valid [CSS color string](https://developer.mozilla.org/en-US/docs/Web/CSS/color_value):

In [45]:
chart.mark_point(color='firebrick').encode(
  alt.X('height', axis=alt.Axis(title="Height (cm)")),
  alt.Y('mass', scale=alt.Scale(type='log'), axis=alt.Axis(title='Weight (kg)')),
  alt.Tooltip(['name', 'height', 'mass'])
)

## Interactivity

One of the most exciting features of Altair and Vega-Lite is their support for interaction.

To create a simple interactive plot that supports panning and zooming, we can invoke the `interactive()` method of the `Chart` object. In the chart below, click and drag to *pan* or use the scroll wheel to *zoom*:

In [71]:
chart.mark_point(color='firebrick').encode(
  alt.X('height', axis=alt.Axis(title="Height (cm)")),
  alt.Y('mass', scale=alt.Scale(type='log'), axis=alt.Axis(title='Weight (kg)')),
  alt.Tooltip(['name', 'height', 'mass'])
).interactive()


## Layers and Composition
Using a set of *view composition* operators, Altair can take multiple chart definitions and combine them to create more complex views.


In [68]:
# Droids
# starwars[starwars.species == 'Droid']

training_relationship = ["Yoda", "Dooku", "Qui-Gon Jinn", "Obi-Wan Kenobi", "Luke Skywalker"]
jedi = starwars[starwars['name'].isin(training_relationship)].sort_values('birth_year', ascending=False)

line = alt.Chart(jedi).mark_line().encode(
  alt.X("birth_year", axis=alt.Axis(title="Year born (BBY = Before Battle of Yavin)"), scale=alt.Scale(domain=[900, 0])),
  alt.Y("height", axis= alt.Axis(title="Height (cm)"))
)
point = alt.Chart(jedi).mark_circle().encode(
  alt.X("birth_year", axis=alt.Axis(title="Year born (BBY = Before Battle of Yavin)"), scale=alt.Scale(domain=[900, 0])),
  alt.Y("height", axis= alt.Axis(title="Height (cm)"))
)

line + point

Or, we can also create this chart by *reusing* and *modifying* a previous chart definition! Rather than completely re-write a chart, we can start with the line chart, then invoke the `mark_circle` method to generate a new chart definition with a different mark type:

In [69]:
height = alt.Chart(jedi).mark_line().encode(
  alt.X("birth_year", axis=alt.Axis(title="Year born (BBY = Before Battle of Yavin)"), scale=alt.Scale(domain=[900, 0])),
  alt.Y("height", axis= alt.Axis(title="Height (cm)"))
)

height + height.mark_circle()

Now, what if we'd like to see this chart alongside other plots?

We can use *concatenation* operators to place multiple charts side-by-side, either vertically or horizontally. Here, we'll use the `|` (pipe) operator to perform horizontal concatenation of two charts:

In [70]:
mass = alt.Chart(jedi).mark_line().encode(
  alt.X("birth_year", axis=alt.Axis(title="Year born (BBY = Before Battle of Yavin)"), scale=alt.Scale(domain=[900, 0])),
  alt.Y("mass", axis= alt.Axis(title="Weight (kg)"))
)

(height + height.mark_circle()) | (mass + mass.mark_circle())

## More complex interaction
For more complex interactions, such as linked charts and cross-filtering, Altair provides a *selection* abstraction for defining interactive selections and then binding them to components of a chart. 

Below is a more complex example. The upper histogram shows the count of characters per height bin and uses an interactive selection to modify the opacity of points in the lower scatter plot, which shows height versus mass.

Drag out an interval in the upper chart and see how it affects the points in the lower chart.

In [96]:
# create an interval selection over an x-axis encoding
brush = alt.selection(type="interval", encodings=['x'])

# determine opacity based on brush
opacity = alt.condition(brush, alt.value(0.9), alt.value(0.1))

# define the base chart, with the common parts of the background and highlights
base = alt.Chart(starwars).mark_bar().encode(
    alt.X('height:Q', 
      scale=alt.Scale(domain=[0, 280]),
      axis=alt.Axis(title=None, labelAngle=0),  # no title, horizontal labels
      bin=True
    ),
    alt.Y('count()', title=None), # counts, no axis title
).properties(
    width=400, # set the chart width to 400 pixels
    height=50  # set the chart height to 50 pixels
)
#  grey background with selection
background = base.encode(
    color=alt.value('#ddd')
).add_selection(brush)

# blue highlights on the transformed data
highlight = base.transform_filter(brush)

# a detail scatter plot of height vs. mass
# modulate point opacity based on the brush selection
detail = alt.Chart(starwars).mark_point().encode(
    alt.X('height', scale=alt.Scale(domain=[0,280])),
    alt.Y('mass'),
    # set opacity based on brush selection
    opacity=opacity
).properties(width=400) # set chart width to match the first chart

# vertically concatenate (vconcat) charts using the '&' operator
(background + highlight) & detail