# Altair

[Altair](https://altair-viz.github.io/index.html) is a declarative statistical visualization library for Python.

Data in Altair is built around the pandas DataFrame. Altair expects tabular data made up of records (rows) and fields (columns), and it works best with long-form data: one row per observation, with that observation's metadata in the other columns.
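As a minimal sketch of what "long-form" means, the snippet below reshapes a small wide-form table with `DataFrame.melt()`. The table and its column names are hypothetical and only chosen to resemble the data used later in this notebook.

```python
import pandas as pd

# Hypothetical wide-form table: one row per date, one column per category.
wide_df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-05", "2020-01-12"]),
    "A": [356, 475],
    "B": [40, 62],
})

# melt() gives every (Date, Category) pair its own row, which is the
# long-form layout Altair works best with.
long_df = wide_df.melt(id_vars="Date", var_name="Category", value_name="ItemsSold")
print(long_df)
```

Each row of `long_df` is now a single observation, so `Category` can be mapped directly to an encoding channel such as `color`.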

Altair deliberately trades some flexibility for a simpler user experience aimed at exploratory visualisation, so not every customisation is available.

## Import packages

In [1]:
import altair as alt
import pandas as pd
import numpy as np

## Basic Visualisations

In [2]:
data = pd.DataFrame({'a': list('CCCDDDEEE'),
                     'b': [2, 7, 4, 1, 2, 6, 8, 4, 7]})

In [3]:
data.head()

| | a | b |
|---|---|---|
| 0 | C | 2 |
| 1 | C | 7 |
| 2 | C | 4 |
| 3 | D | 1 |
| 4 | D | 2 |


The Altair `Chart` object takes a DataFrame as its first argument.
To visualise the data, call one of the chart's `mark_*()` methods (for example `mark_point()`), which sets how each record is drawn.
The `Chart.encode()` method maps encoding channels (such as `x`, `y` and `color`) to columns of the dataset.

In [4]:
alt.Chart(data).mark_point().encode(
    x='a',
    y='b',
)

Altair infers the data type of each column automatically when the data is a pandas DataFrame. If the data is provided in another form (such as a JSON-style list of records), the [encoding data type](https://altair-viz.github.io/user_guide/encodings/index.html#encoding-data-types) must be specified explicitly.

To generate a bar chart, use `mark_bar()`. Data can also be aggregated inside the encoding: wrapping the column name in an aggregate function, as in `'average(b)'`, plots the average of `b` for each value of `a`:

In [5]:
alt.Chart(data).mark_bar().encode(
    x='a',
    y='average(b)'
)

We can make the bar chart horizontal by swapping `x` and `y`.

In [6]:
chart = alt.Chart(data).mark_bar().encode(
    y='a',
    x='average(b)'
)
chart

## Under the hood

Altair converts the chart specification to a JSON string that follows the Vega-Lite schema. It can be viewed with the `to_json()` method.
The output embeds the full dataset alongside the encoding and mark definitions.

In [7]:
print(chart.to_json())

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.17.0.json",
  "config": {
    "view": {
      "continuousHeight": 300,
      "continuousWidth": 400
    }
  },
  "data": {
    "name": "data-347f1284ea3247c0f55cb966abbdd2d8"
  },
  "datasets": {
    "data-347f1284ea3247c0f55cb966abbdd2d8": [
      {
        "a": "C",
        "b": 2
      },
      {
        "a": "C",
        "b": 7
      },
      {
        "a": "C",
        "b": 4
      },
      {
        "a": "D",
        "b": 1
      },
      {
        "a": "D",
        "b": 2
      },
      {
        "a": "D",
        "b": 6
      },
      {
        "a": "E",
        "b": 8
      },
      {
        "a": "E",
        "b": 4
      },
      {
        "a": "E",
        "b": 7
      }
    ]
  },
  "encoding": {
    "x": {
      "aggregate": "average",
      "field": "b",
      "type": "quantitative"
    },
    "y": {
      "field": "a",
      "type": "nominal"
    }
  },
  "mark": "bar"
}


## Customisation

In [8]:
alt.Chart(data, title='Average number of items per category').mark_bar(color='firebrick').encode(
    y=alt.Y('a', title='category'),
    x=alt.X('average(b)', title='avg items by category'),
)


Note that Altair 5 introduced a method-based syntax for setting channel options:

`x=alt.X('a').title('Category')` instead of

`x=alt.X('a', title='Category')`.

The examples in this notebook use the older attribute-based syntax, which continues to work.

In [9]:
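alt.Chart(data).mark_bar(color='firebrick').encode(
    # sort sets an explicit order for the categories on the axis
    y=alt.Y('a', title='category', sort=['E','D','C']),
    x=alt.X('average(b)', title='avg items by category'),
    # the color encoding takes precedence over the fixed mark colour above
    color='a',
)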


## Bar Chart

As shown above, bar charts are created by calling `mark_bar()` and specifying the encoding channels.

See the Altair guide to [Bar charts](https://altair-viz.github.io/user_guide/marks/bar.html).

In [10]:
from altair import datum

source = pd.DataFrame(
    {
        "a": ["A", "B", "C", "D", "E", "F", "G", "H", "I"],
        "b": [28, 55, 43, 91, 81, 53, 19, 87, 52],
        "c": [1, 3, 7, 1, 4, 8, 2, 2, 6],
        "d": ["X", "X", "X", "Y", "Y", "Y", "Z", "Z", "Z"]
    }
)

alt.Chart(source).mark_bar().encode(
    x=alt.X("a", axis=alt.Axis(labelAngle=0)),
    y="b",
).transform_filter(
    # Keep only the rows where c is not 1.
    # Conditions can be combined, e.g. (datum.a != 'B') & (datum.a != 'C')
    (datum.c != 1)
)
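`transform_filter()` leaves the DataFrame untouched; the predicate is stored in the chart specification and applied at render time. To see which rows survive the filter, the same condition can be evaluated in pandas. This is only a cross-check, not how Altair evaluates the expression.

```python
import pandas as pd

source = pd.DataFrame(
    {
        "a": ["A", "B", "C", "D", "E", "F", "G", "H", "I"],
        "b": [28, 55, 43, 91, 81, 53, 19, 87, 52],
        "c": [1, 3, 7, 1, 4, 8, 2, 2, 6],
        "d": ["X", "X", "X", "Y", "Y", "Y", "Z", "Z", "Z"]
    }
)

# Same predicate as transform_filter(datum.c != 1).
kept = source[source["c"] != 1]
print(kept["a"].tolist())

# Same predicate as the compound filter (datum.a != 'B') & (datum.a != 'C').
kept_compound = source[(source["a"] != "B") & (source["a"] != "C")]
print(kept_compound["a"].tolist())
```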

### Histogram

In [11]:
alt.Chart(source).mark_bar().encode(
    alt.X("c"),
    y='count()',
)

In [12]:
alt.Chart(source).mark_bar().encode(
    alt.X("c", bin=alt.Bin(extent=[1,9], step=1), title='c values'),
    y='count()',
)
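The binned counts can be cross-checked with NumPy. The sketch below uses bin edges 1 to 9 in steps of 1 to mirror `extent=[1,9], step=1`; treatment of the final bin edge may differ slightly between NumPy and Vega-Lite, so it is meant as a sanity check rather than an exact reproduction.

```python
import numpy as np

c = np.array([1, 3, 7, 1, 4, 8, 2, 2, 6])  # column c of source

edges = np.arange(1, 10)                    # 1, 2, ..., 9
counts, _ = np.histogram(c, bins=edges)

# Map each bin's lower edge to the number of values that fall in it.
print(dict(zip(edges[:-1].tolist(), counts.tolist())))
```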

In [13]:
alt.Chart(source).mark_bar().encode(
    x=alt.X("a"),
    y=alt.Y("b"),
    color=alt.Color("c"),
)

## Box Plot

In [14]:
alt.Chart(source).mark_boxplot().encode(
   y=alt.Y('b'),
)

In [15]:
alt.Chart(source).mark_boxplot().encode(
   x=alt.X('b'),
   y=alt.Y('d'),
   color='d'
).properties(height=100)

## Time series data

In [16]:
date_range = pd.date_range(start='2020-01-01', end='2021-12-31', freq='W')

np.random.seed(20)
items_sold = np.random.randint(low=1, high=500, size=len(date_range))

df_ts = pd.DataFrame({
    'Date': date_range,
    'ItemsSold': items_sold,
    'Category': 'A'
})

In [17]:
df_ts.head()

| | Date | ItemsSold | Category |
|---|---|---|---|
| 0 | 2020-01-05 | 356 | A |
| 1 | 2020-01-12 | 475 | A |
| 2 | 2020-01-19 | 272 | A |
| 3 | 2020-01-26 | 224 | A |
| 4 | 2020-02-02 | 413 | A |


In [18]:
chart = alt.Chart(df_ts).mark_line(point=True).encode(
    x='Date',
    y='ItemsSold',
    tooltip=['Date','ItemsSold', 'Category'],
).properties(
    title='Weekly items sold in 2020 and 2021', width=600
).interactive()

chart

### Update colours

Global config: `configure_mark()` sets defaults at the top level, so it applies to the chart and all of its subcharts.

In [19]:
chart_global = alt.Chart(df_ts).mark_line().encode(
    x='Date',
    y='ItemsSold',
    tooltip=['Date','ItemsSold'],
).configure_mark(
   opacity=0.2,
   color='red'
).properties(
    title='Weekly items sold in 2020 and 2021', width=600
).interactive()

chart_global

Local config: arguments passed to `mark_line()` affect only the chart they are set on.

In [20]:
chart_local = alt.Chart(df_ts).mark_line(opacity=0.2, color='red').encode(
    x='Date',
    y='ItemsSold',
    tooltip=['Date','ItemsSold'],
).properties(
    title='Weekly items sold in 2020 and 2021', width=600
).interactive()

chart_local

Encoding: a channel can be mapped directly to a constant with `alt.value()` instead of to a column.

In [21]:
chart_encoding = alt.Chart(df_ts).mark_line().encode(
    x='Date',
    y='ItemsSold',
    tooltip=['Date','ItemsSold'],
    opacity=alt.value(0.2),
    color=alt.value('red')
).properties(
    title='Weekly items sold in 2020 and 2021', width=600
).interactive()

chart_encoding

### Multiple series

In [22]:
items_sold_B = np.random.randint(low=20, high=100, size=len(date_range))
items_sold_C = np.random.randint(low=100, high=250, size=len(date_range))

df2 = pd.DataFrame({
    'Date': date_range,
    'ItemsSold': items_sold_B,
    'Category': 'B'
})

df3 = pd.DataFrame({
    'Date': date_range,
    'ItemsSold': items_sold_C,
    'Category': 'C'
})

df = pd.concat([df_ts, df2, df3])

df.head()

| | Date | ItemsSold | Category |
|---|---|---|---|
| 0 | 2020-01-05 | 356 | A |
| 1 | 2020-01-12 | 475 | A |
| 2 | 2020-01-19 | 272 | A |
| 3 | 2020-01-26 | 224 | A |
| 4 | 2020-02-02 | 413 | A |


In [23]:
multi_chart = alt.Chart(df).mark_line(point=True).encode(
    x='Date',
    y='ItemsSold',
    tooltip=['Date','ItemsSold'],
    color='Category'
)

multi_chart.properties(
    title='Weekly items sold in 2020 and 2021', width=600
)

In [24]:
labels = alt.Chart(df).mark_text(align='left', dx=5).encode(
    alt.X('Date', aggregate='max'),
    alt.Y('ItemsSold', aggregate={'argmax': 'Date'}),
    alt.Text('Category'),
    alt.Color('Category', legend=None),
)

(multi_chart + labels).properties(
    title='Weekly items sold in 2020 and 2021', width=600
)

## Layering

The previous chart shows weekly data, but a moving average or another reference line can make it easier to read.
Layered charts provide this: two or more charts are overlaid with the `+` operator.

`transform_window()` declares a new field that is computed over a sliding `frame` of rows. The default frame is `[None, 0]`, meaning all rows from the start up to and including the current one.
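To make the `frame` argument concrete, the sketch below reproduces three frames with pandas on a short illustrative series. The pandas calls are my own analogy for what each frame computes, not something Altair runs.

```python
import pandas as pd

items = pd.Series([10, 20, 30, 40, 50])  # illustrative values

# frame=[None, 0]: every row from the start up to the current row.
cumulative = items.expanding().mean()

# frame=[None, None]: every row, so each row gets the overall mean.
overall = items.mean()

# frame=[-3, 0]: the current row and the three rows before it.
moving = items.rolling(window=4, min_periods=1).mean()

print(cumulative.tolist())
print(overall)
print(moving.tolist())
```

`min_periods=1` mirrors the way the window is simply shorter for the first few rows, where fewer than three preceding rows exist.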

In [25]:
chart

In [26]:
# Add a horizontal line at the overall mean of ItemsSold
mean = alt.Chart(df_ts).mark_line(color="#d62728", size=1).transform_window(
    mean="mean(ItemsSold)", frame=[None, None]
).encode(
    x='Date',
    y='mean:Q',
    tooltip=['Date', 'mean:Q'],
)

(chart + mean).properties(width=600, title ='Weekly items sold and average').interactive()

Moving average over a 4-week period: `frame=[-3, 0]` covers the current week and the three weeks before it.

In [27]:
mean = alt.Chart(df_ts).mark_line(color="#d62728", size=2).transform_window(
    mean="mean(ItemsSold)", frame=[-3, 0]
    ).encode(
    x='Date',
    y='mean:Q',
    tooltip=['Date','mean:Q'],
)

(chart + mean).properties(width=600, title ='Weekly items sold and average').interactive()


### Concat charts

In [28]:
def create_chart(df, category):
    # Line chart of weekly items sold, restricted to a single category
    c = alt.Chart(df).mark_line(point=True).encode(
        x='Date',
        y='ItemsSold',
        tooltip=['Date','ItemsSold']
    ).transform_filter(
        datum.Category == category
    ).properties(
        title=f'Weekly items sold in 2020 and 2021 for category {category}'
    ).interactive()

    return c

create_chart(df, 'A')

In [29]:
create_chart(df, 'A') | create_chart(df, 'B') | create_chart(df, 'C')

In [30]:
create_chart(df, 'A') & create_chart(df, 'B') & create_chart(df, 'C')

In [31]:
(create_chart(df, 'A') & create_chart(df, 'B')) | create_chart(df, 'C')

## Build a regression

In [32]:
df['other_col'] = df['ItemsSold'].apply(lambda x: x * np.random.random())
chart = alt.Chart(df).mark_point().encode(x='ItemsSold', y='other_col')
chart + chart.transform_regression('ItemsSold', 'other_col', method="poly").mark_line(color='red')
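With `method="poly"`, the red line is a polynomial fit; as far as I know the order defaults to 3 and can be changed with the `order` argument of `transform_regression()`. The sketch below shows the same kind of fit with NumPy on synthetic data that follows an exact cubic, so the recovered coefficients are easy to check. It is an analogy for what the transform computes, not Altair's own code path.

```python
import numpy as np

# Synthetic data that follows an exact cubic (hypothetical, for illustration).
x = np.linspace(0, 10, 50)
y = 0.5 * x**3 - 2 * x**2 + x + 4

# Least-squares polynomial fit of degree 3; coefficients are highest power first.
coeffs = np.polyfit(x, y, deg=3)
fitted = np.polyval(coeffs, x)

print(np.round(coeffs, 3))
```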

## Formatting

In [33]:
df['other_col'] = df['ItemsSold'].apply(lambda x: x * np.random.random()) - 250
alt.Chart(df).mark_point().encode(
    x=alt.X('ItemsSold'),
    y=alt.Y('other_col'),
    color=alt.Color('ItemsSold', scale=alt.Scale(scheme='viridis'), title='Scale for legend')
).properties(
    title={
        "text": ["First line of title", "Second line of title"],
        "subtitle": ["This is subtitle", "Second subtitle"],
        "color": "red",
        "subtitleColor": "green"
    }
)