# Altair Intro: Encoding

In the grammar of graphics, an encoding defines the mapping between data and the various properties of marks. The most common encodings are x and y, which map data to positions on a chart. In this session, we will play with the various encoding types supported by Altair.

Let's start with a toy dataset.

In [1]:
import altair as alt
import pandas as pd

source = pd.DataFrame({"category": [1, 2, 3, 4, 5, 6], "value": [4, 6, 10, 3, 7, 8],
                       "quality": ["standard", "good", "excellent", "standard", "good", "excellent"]})
source

|   | category | value | quality |
|---|----------|-------|-----------|
| 0 | 1 | 4 | standard |
| 1 | 2 | 6 | good |
| 2 | 3 | 10 | excellent |
| 3 | 4 | 3 | standard |
| 4 | 5 | 7 | good |
| 5 | 6 | 8 | excellent |


In this toy dataset, we have three columns: category, value and quality.

## X and Y encoding

In our first attempt, let's just set the x and y encoding to see what the plot looks like.

In [2]:
base = alt.Chart(source).mark_point(size=200, filled=True).encode(x="category", y="value")
base

This is a reasonable scatter plot.  Let's now explore other encoding types that will allow us to visualize the "quality" column as well.

## Color encoding

In [3]:
scatter_plot = base.encode(color="quality")
scatter_plot

By simply setting color to a column name ("quality" in this case), we mapped the quality column to the color of the points in our plot. Note that Altair chose a reasonable default colormap in this case. If you would like to change this default, you need to modify the "scale" of the color encoding. Please see the tutorial on scale for more details. We will leave the colormap as it is for now.

## Shape encoding

In [4]:
scatter_plot = base.encode(shape="quality")
scatter_plot

Instead of color, one can also use shape to encode data. Notice that the different qualities appear as different shapes in the scatter plot. Shape encoding is a good choice when the number of distinct values is small (e.g. fewer than 6). Unfortunately, it becomes ineffective when there are many distinct values.
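One way to apply this rule of thumb is to count the distinct values in a column before choosing the shape channel. The sketch below uses plain pandas. The threshold constant `MAX_SHAPE_LEVELS` is a hypothetical name that simply restates the "fewer than 6" guideline above; it is not an Altair setting.

```python
import pandas as pd

source = pd.DataFrame({
    "category": [1, 2, 3, 4, 5, 6],
    "value": [4, 6, 10, 3, 7, 8],
    "quality": ["standard", "good", "excellent", "standard", "good", "excellent"],
})

# Hypothetical threshold taken from the rule of thumb in the text.
MAX_SHAPE_LEVELS = 6

# Number of distinct values in the column we want to encode as shape.
n_levels = source["quality"].nunique()

# True when the column has few enough distinct values for shape to stay readable.
shape_is_reasonable = n_levels < MAX_SHAPE_LEVELS
print(n_levels, shape_is_reasonable)
```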

In Altair, we can also use multiple encodings together. For example, in the chart below we encode quality as both color and shape.

In [5]:
scatter_plot = base.encode(color="quality", 
                           shape="quality")
scatter_plot

## Size encoding

We can also use the size of the mark to encode data.

In [6]:
scatter_plot = base.encode(size="value")
scatter_plot

Note that large values are mapped to large circles in the chart above. If we want to map large values to small circles, we can reverse the order of values in the corresponding scale. We will come back to this when we talk about scale.

## Opacity encoding

Another commonly used encoding is the opacity of the mark. This is especially important when we have multiple overlapping data points in the plot. In the example below, we encode the "value" column as the opacity.

In [7]:
scatter_plot = base.encode(color="quality", 
                           size="value", 
                           opacity="value")
scatter_plot

## Using a constant for encoding

Lastly, sometimes we just want to set a channel to a constant value. This can be done by setting the corresponding mark property when we specify the mark. It can also be done by using `alt.value`. For example, to add a constant stroke color, we can set the stroke encoding to `alt.value("color_name")`.

In [8]:
scatter_plot = base.encode(color="quality", 
                           size="value", 
                           opacity="value",
                           stroke=alt.value("black"))
scatter_plot

## Summary

In this tutorial, we explored a few of the common encoding types: x, y, color, shape, size, opacity and stroke. To learn more, see [Altair's documentation](https://altair-viz.github.io/user_guide/encoding.html), which provides a complete list of encoding types.