Marks = bars, lines, points, etc.

Mark attributes = position, shape, size, color. These serve as *channels* through which the underlying data values are encoded.

With a basic framework of data types, marks, and encoding channels, we can concisely create a wide variety of visualizations.

In [1]:
import pandas as pd
import altair as alt

In [4]:
data = pd.read_csv("countries.csv")
data.shape

(1704, 6)

In [5]:
data.head()

Unnamed: 0,country,continent,year,lifeExpectancy,population,gdpPerCapita
0,Afghanistan,Asia,1952,28.801,8425333,779.445314
1,Afghanistan,Asia,1957,30.332,9240934,820.85303
2,Afghanistan,Asia,1962,31.997,10267083,853.10071
3,Afghanistan,Asia,1967,34.02,11537966,836.197138
4,Afghanistan,Asia,1972,36.088,13079460,739.981106


In [7]:
set(data.year)

{1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, 2002, 2007}

In [9]:
data2002 = data.loc[data.year==2002]
data2002

Unnamed: 0,country,continent,year,lifeExpectancy,population,gdpPerCapita
10,Afghanistan,Asia,2002,42.129,25268405,726.734055
22,Albania,Europe,2002,75.651,3508512,4604.211737
34,Algeria,Africa,2002,70.994,31287142,5288.040382
46,Angola,Africa,2002,41.003,10866106,2773.287312
58,Argentina,Americas,2002,74.340,38331121,8797.640716
...,...,...,...,...,...,...
1654,Vietnam,Asia,2002,73.017,80908147,1764.456677
1666,West Bank and Gaza,Asia,2002,72.370,3389578,4515.487575
1678,"Yemen, Rep.",Asia,2002,60.308,18701257,2234.820827
1690,Zambia,Africa,2002,39.193,10595811,1071.613938


In [15]:
alt.Chart(data2002).mark_point().encode(
    alt.X("lifeExpectancy:Q"),
    alt.Y("continent:O")
)

By default, axes for linear quantitative scales include zero to ensure a proper baseline for comparing ratio-valued data. In some cases, however, a zero baseline may be meaningless or you may want to focus on interval comparisons. 

To disable automatic inclusion of zero, configure the scale mapping using the `scale` attribute of the encoding channel:

In [20]:
alt.Chart(data2002).mark_point().encode(
    alt.X("lifeExpectancy:Q", scale=alt.Scale(zero=False)),
    alt.Y("continent:O")
)

In [21]:
alt.Chart(data2002).mark_point().encode(
    alt.X("lifeExpectancy:Q", scale=alt.Scale(zero=False, nice=False)),
    alt.Y("continent:O")
)

The `size` encoding channel sets a mark's size or extent. 

The meaning of the channel can vary based on the mark type; e.g. for point marks, the size channel maps to the pixel area of the plotting symbol.
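Because size is an area, a symbol's width grows with the square root of the encoded value, not linearly. A rough sketch of that arithmetic follows. It assumes that the size value is the area of the symbol's square bounding box, which is how Vega's symbol marks appear to define it; `circle_diameter` is a hypothetical helper, not an Altair function.

```python
import math


def circle_diameter(size_px2: float) -> float:
    """Approximate on-screen diameter (px) of a circle mark with the given size.

    Assumption: `size` is the area of the symbol's square bounding box,
    so the side of that box (the circle's diameter) is its square root.
    """
    return math.sqrt(size_px2)


# Quadrupling the size value only doubles the diameter.
for size in (250, 1000):
    print(size, round(circle_diameter(size), 1))
```

This is why a size range such as `[0, 1000]` gives circles that top out at a few tens of pixels across, not hundreds.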

In [22]:
alt.Chart(data2002).mark_point().encode(
    alt.X("lifeExpectancy:Q", scale=alt.Scale(zero=False, nice=False)),
    alt.Y("continent:O"),
    alt.Size("population:Q")
)

In [23]:
alt.Chart(data2002).mark_point().encode(
    alt.X("lifeExpectancy:Q", scale=alt.Scale(zero=False, nice=False)),
    alt.Y("continent:O"),
    alt.Size("population:Q", scale=alt.Scale(range=[0,1000]))
)

In [25]:
alt.Chart(data2002).mark_point(filled=True).encode(
    alt.X("lifeExpectancy:Q", scale=alt.Scale(zero=False, nice=False)),
    alt.Y("continent:O"),
    alt.Size("population:Q", scale=alt.Scale(range=[0,1000])),
    alt.Color("gdpPerCapita:Q")
)

In [28]:
alt.Chart(data2002).mark_circle().encode(
    alt.X("lifeExpectancy:Q", scale=alt.Scale(zero=False, nice=False)),
    alt.Y("continent:O"),
    alt.Size("population:Q", scale=alt.Scale(range=[0,1000])),
    alt.Color("gdpPerCapita:Q"),
    alt.OpacityValue(0.9)
)

In [30]:
alt.Chart(data2002).mark_circle().encode(
    alt.X("lifeExpectancy:Q", scale=alt.Scale(zero=False, nice=False)),
    alt.Y("continent:O"),
    alt.Size("population:Q", scale=alt.Scale(range=[0,1000])),
    alt.Color("gdpPerCapita:Q"),
    alt.OpacityValue(0.9),
    alt.Tooltip("country")
)

The `order` encoding channel determines the order of data points, affecting both the order in which they are drawn and, for line and area marks, the order in which they are connected to one another.

In [31]:
alt.Chart(data2002).mark_circle().encode(
    alt.X("lifeExpectancy:Q", scale=alt.Scale(zero=False, nice=False)),
    alt.Y("continent:O"),
    alt.Size("population:Q", scale=alt.Scale(range=[0,1000])),
    alt.Color("gdpPerCapita:Q"),
    alt.OpacityValue(0.9),
    alt.Tooltip("country"),
    alt.Order("population:Q", sort="descending")  # draw smaller circles later than (i.e. on top of) larger ones
)

In [34]:
alt.Chart(data2002).mark_circle().encode(
    alt.X("lifeExpectancy:Q", scale=alt.Scale(zero=False, nice=False)),
    alt.Y("continent:O"),
    alt.Size("population:Q", scale=alt.Scale(range=[0,1000])),
    alt.Color("gdpPerCapita:Q"),
    alt.OpacityValue(0.9),
    alt.Order("population:Q", sort="descending"),
    tooltip = [alt.Tooltip("country:N"),
              alt.Tooltip("population:Q"),
              alt.Tooltip("gdpPerCapita:Q")]
)

# what if I want to change how the text in the tooltip is displayed?

## Small multiples

The `column` and `row` encoding channels generate either a horizontal (columns) or vertical (rows) set of sub-plots, in which the data is partitioned according to the provided data field.

In [36]:
alt.Chart(data2002).mark_circle().encode(
    alt.X("lifeExpectancy:Q", scale=alt.Scale(zero=False, nice=False)),
    alt.Size("population:Q", scale=alt.Scale(range=[0,1000])),
    alt.Color("gdpPerCapita:Q"),
    alt.OpacityValue(0.9),
    alt.Tooltip("country"),
    alt.Order("population:Q", sort="descending"),  # draw smaller circles later than (i.e. on top of) larger ones
    alt.Column("continent:O")
)

In [46]:
alt.Chart(data2002).mark_circle().encode(
    alt.X("lifeExpectancy:Q",
          scale=alt.Scale(zero=False)),
    alt.Size("population:Q", 
             scale=alt.Scale(range=[0,1000])),
    alt.Color("gdpPerCapita:Q",
              legend=alt.Legend(orient="bottom",
                                titleOrient="left")),
    alt.OpacityValue(0.9),
    alt.Tooltip("country"),
    alt.Order("population:Q", sort="descending"),  # draw smaller circles later than (i.e. on top of) larger ones
    alt.Row("continent:O")
)

# Exploring the Philippines

In [53]:
data.head()

Unnamed: 0,country,continent,year,lifeExpectancy,population,gdpPerCapita
0,Afghanistan,Asia,1952,28.801,8425333,779.445314
1,Afghanistan,Asia,1957,30.332,9240934,820.85303
2,Afghanistan,Asia,1962,31.997,10267083,853.10071
3,Afghanistan,Asia,1967,34.02,11537966,836.197138
4,Afghanistan,Asia,1972,36.088,13079460,739.981106


In [57]:
dataPI = data[data.country=="Philippines"]
dataPI

Unnamed: 0,country,continent,year,lifeExpectancy,population,gdpPerCapita
1212,Philippines,Asia,1952,47.752,22438691,1272.880995
1213,Philippines,Asia,1957,51.334,26072194,1547.944844
1214,Philippines,Asia,1962,54.757,30325264,1649.552153
1215,Philippines,Asia,1967,56.393,35356600,1814.12743
1216,Philippines,Asia,1972,58.065,40850141,1989.37407
1217,Philippines,Asia,1977,60.06,46850962,2373.204287
1218,Philippines,Asia,1982,62.082,53456774,2603.273765
1219,Philippines,Asia,1987,64.151,60017788,2189.634995
1220,Philippines,Asia,1992,66.458,67185766,2279.324017
1221,Philippines,Asia,1997,68.564,75012988,2536.534925


In [71]:
alt.Chart(dataPI).mark_line().encode(
    alt.X("year:O"),
    alt.Y("population:Q"),
)

In [69]:
alt.Chart(dataPI).mark_bar().encode(
    alt.X("year:O"),
    alt.Y("gdpPerCapita:Q"),
)