# Introduction to Altair

[Altair](https://altair-viz.github.io/) is a declarative statistical visualization library for Python. Altair offers a powerful and concise visualization grammar for quickly building a wide range of statistical graphics.

By *declarative*, we mean that you can provide a high-level specification of *what* you want the visualization to include, in terms of *data*, *graphical marks*, and *encoding channels*, rather than having to specify *how* to implement the visualization in terms of for-loops, low-level drawing commands, *etc*. The key idea is that you declare links between data fields and visual encoding channels, such as the x-axis, y-axis, color, *etc*. The rest of the plot details are handled automatically. Building on this declarative plotting idea, a surprising range of simple to sophisticated visualizations can be created using a concise grammar.

Altair is based on [Vega-Lite](https://vega.github.io/vega-lite/), a high-level grammar of interactive graphics. Altair provides a friendly Python [API (Application Programming Interface)](https://en.wikipedia.org/wiki/Application_programming_interface) that generates Vega-Lite specifications in [JSON (JavaScript Object Notation)](https://en.wikipedia.org/wiki/JSON) format. Environments such as Jupyter Notebooks, JupyterLab, and Colab can then take this specification and render it directly in the web browser. To learn more about the motivation and basic concepts behind Altair and Vega-Lite, watch the [Vega-Lite presentation video from OpenVisConf 2017](https://www.youtube.com/watch?v=9uaHRWj04D4).

This notebook will guide you through the basic process of creating visualizations in Altair. First, you will need to make sure you have the Altair package and its dependencies installed (for more, see the [Altair installation documentation](https://altair-viz.github.io/getting_started/installation.html)), or you are using a notebook environment that includes the dependencies pre-installed.

## Learning Goals
Those who actively work through this notebook will be able to:
- Describe the `Chart` object in Altair
- Use three different methods to read in data
- Attach data to the chart object
- Describe the basic structure of how visualizations (vizzes) are created in Altair
- Create bar and column charts

## Imports

To start, we must import the necessary libraries: Pandas for data frames and Altair for visualization.

In [1]:
import pandas as pd
import altair as alt

## Renderers

Depending on your environment, you may need to specify a [renderer](https://altair-viz.github.io/user_guide/display_frontends.html) for Altair. If you are using __JupyterLab__, __Jupyter Notebook__, or __Google Colab__ with a live Internet connection you should not need to do anything. Otherwise, please read the documentation for [Displaying Altair Charts](https://altair-viz.github.io/user_guide/display_frontends.html).

## Data

Data in Altair is built around the Pandas data frame, which consists of a set of named data *columns*. We will also regularly refer to data columns as data *fields*.


### Vega Datasets

When using Altair, datasets are commonly provided as data frames. Alternatively, Altair can also accept a URL to load a network-accessible dataset. As we will see, the named columns of the data frame are an essential piece of plotting with Altair.
We will often use datasets from the [vega-datasets](https://github.com/vega/vega-datasets) repository. Some of these datasets are directly available as Pandas data frames:

In [2]:
from vega_datasets import data  # import vega_datasets
cars = data.cars()              # load cars data as a Pandas data frame
cars.head()                     # display the first five rows

| | Name | Miles_per_Gallon | Cylinders | Displacement | Horsepower | Weight_in_lbs | Acceleration | Year | Origin |
|---|---|---|---|---|---|---|---|---|---|
| 0 | chevrolet chevelle malibu | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 1970-01-01 | USA |
| 1 | buick skylark 320 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 1970-01-01 | USA |
| 2 | plymouth satellite | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 1970-01-01 | USA |
| 3 | amc rebel sst | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 1970-01-01 | USA |
| 4 | ford torino | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 1970-01-01 | USA |


Datasets in the vega-datasets collection can also be accessed via URLs:

In [6]:
data.cars.url

'https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/cars.json'

Dataset URLs can be passed directly to Altair (for supported formats like JSON and [CSV](https://en.wikipedia.org/wiki/Comma-separated_values)), or loaded into a Pandas data frame like so:

In [7]:
pd.read_json(data.cars.url).head() # load JSON data into a data frame

| | Name | Miles_per_Gallon | Cylinders | Displacement | Horsepower | Weight_in_lbs | Acceleration | Year | Origin |
|---|---|---|---|---|---|---|---|---|---|
| 0 | chevrolet chevelle malibu | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 1970-01-01 | USA |
| 1 | buick skylark 320 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 1970-01-01 | USA |
| 2 | plymouth satellite | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 1970-01-01 | USA |
| 3 | amc rebel sst | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 1970-01-01 | USA |
| 4 | ford torino | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 1970-01-01 | USA |


For more information about data frames and some useful transformations to prepare Pandas data frames for plotting with Altair:

- work through the [Basic Data Wrangling with Pandas](https://ubc-dsci.github.io/dsci-320-instructors/pre-req/0.3-pandas-wrangling.html) section in our preparation module, and
- see the [Specifying Data with Altair documentation](https://altair-viz.github.io/user_guide/data.html).

### Weather Data

For the rest of the notebook we will use a simple dataset _airport_weather_snippet_.
The data was collected at weather stations at airports across Canada and is freely available [online](https://climate.weather.gc.ca/historical_data/search_historic_data_e.html).
We have done some preliminary cleaning to provide you with a clean small subset of the data.
**Go to EdStem to download the zip folder that contains the [data folder](https://edstem.org/us/courses/31933/resources?download=23650) for the labs. Place the data folder in the same directory as your Jupyter notebook.**

Let's first load the dataset, which is stored in a CSV file.

In [8]:
path = 'data/airport.csv'
df = pd.read_csv(path, index_col=0, parse_dates=True)
df

| | airport | year | month | temp | rain | snow | precip |
|---|---|---|---|---|---|---|---|
| 0 | Calgary | 2000 | January | -9.7 | 0.0 | 23.8 | 10.2 |
| 1 | Calgary | 2000 | May | 8.8 | 22.6 | 6.4 | 28.8 |
| 2 | Calgary | 2000 | September | 10.7 | 52.8 | 0.8 | 53.6 |
| 3 | Montreal | 2000 | January | -10.1 | 20.0 | 64.8 | 95.8 |
| 4 | Montreal | 2000 | May | 13.3 | 132.5 | 0.0 | 132.5 |
| 5 | Montreal | 2000 | September | 14.3 | 65.5 | 0.0 | 65.5 |
| 6 | Toronto | 2000 | January | -5.8 | 16.4 | 17.0 | 29.2 |
| 7 | Toronto | 2000 | May | 14.3 | 124.4 | 0.0 | 124.4 |
| 8 | Toronto | 2000 | September | 15.9 | 70.0 | 0.0 | 70.0 |
| 9 | Vancouver | 2000 | January | 3.5 | 134.0 | 18.8 | 151.4 |


The dataset has 12 items with 7 attributes. We have shortened the numerical attribute names as follows:
- temp: Mean Temp (°C)
- rain: Total Rain (mm)
- snow: Total Snow (cm)
- precip: Total Precip (mm)

In future notebooks we will spend a bit more time describing the data, but for right now let's dive into visualizing.

## The Chart Object

Visualization begins with data, and Altair is no exception.
Our first course of action is to attach the data frame to the fundamental object in Altair, the `Chart`.
The `Chart` takes a data frame as a single argument:

In [11]:
chart = alt.Chart(df)

So far, we have defined the `Chart` object and passed it the data frame we loaded above. We have not yet told the chart to *do* anything with the data.

## Marks and Encodings

With a chart object in hand, we can now specify how we would like the data to be visualized. We first indicate what kind of graphical *mark* (geometric shape) we want to use to represent the data. We can set the `mark` attribute of the chart object using the `Chart.mark_*` methods.

For example, we can show the data as a point using `Chart.mark_point()`:

In [13]:
alt.Chart(df).mark_point()

Here the rendering consists of one point per row in the dataset, all plotted on top of each other, since we have not yet specified positions for these points.

To visually separate the points, we can map various *encoding channels*, or *channels* for short, to fields in the dataset. For example, we could *encode* the field `'airport'` of the data using the `y` channel, which represents the y-axis position of the points. To specify this, use the `encode` method:


In [14]:
alt.Chart(df).mark_point().encode(
  y='airport',
)

The `encode()` method builds a key-value mapping from encoding channels (such as `x`, `y`, `color`, `shape`, `size`, *etc.*) to fields in the dataset, accessed by field name. For Pandas data frames, Altair automatically determines an appropriate data type for the mapped column, which in this case is the *nominal* type, indicating unordered, categorical values.
This level of abstraction allows the designer to focus on the higher level constructs of what is being encoded as opposed to the nitty-gritty at the code level.

Though we have now separated the data by one attribute, we still have multiple points overlapping within each category. Let's further separate these by adding an `x` encoding channel, mapped to the `'month'` field:

In [15]:
alt.Chart(df).mark_point().encode(
    y='airport',
    x='month'
)

This visual representation does not really provide us with any insights that we could not infer from the data table. Choosing which channel should be used to encode the data is not a trivial endeavour. In the coming notebooks, we will explore this issue in detail.

Let's change the data that is encoded on the `x` channel from `'month'` to the `'rain'` field:

In [20]:
alt.Chart(df).mark_point().encode(
    y='airport',
    x='rain',
)

_The Calgary airport has the least amount of rain and the Vancouver airport has the most._

The data type of the `'rain'` field is again automatically inferred by Altair, and this time is treated as a *quantitative* type (that is, a real-valued number). We see that grid lines and appropriate axis titles are automatically added as well.

Above we have specified key-value pairs using keyword arguments (`x='rain'`). In addition, Altair provides construction methods for encoding definitions, using the syntax `alt.X('rain')`. This alternative is useful for providing more parameters to an encoding, as we will see later in this notebook.


In [17]:
alt.Chart(df).mark_point().encode(
    alt.Y('airport'),
    alt.X('rain'),
)

The two styles of specifying encodings can be mixed, as long as positional arguments come before keyword arguments, as Python requires: `alt.Y('airport'), x='rain'` is also a valid input to the `encode` method.

In the examples above, the data type for each field is inferred automatically based on its type within the Pandas data frame.  In the next notebook we will explicitly indicate the data type to Altair.

## Data Aggregation

To allow for more flexibility in how data are visualized, Altair has a built-in syntax for *aggregation* of data. For example, we can compute the average of all values by specifying an aggregation function along with the field name:

In [28]:
alt.Chart(df).mark_point().encode(
    y='airport',
    x='average(rain)',
)

Now within each y-axis category, we see a single point reflecting the *average* of the values within that category.

_Does Calgary really have the lowest average rainfall of all these cities? (It does!) Still, how might this plot mislead? Which months are included? What counts as rain?_

Altair supports a variety of aggregation functions, including `count`, `min` (minimum), `max` (maximum), `average`, `median`, and `stdev` (standard deviation). Aggregation is a data transformation. In a later notebook, we will take a tour of data transformations, including aggregation, sorting, filtering, and creation of new derived fields using calculation formulas.

## Changing the Mark Type

Let's say we want to represent our aggregated values using rectangular bars rather than circular points. We can do this by replacing `Chart.mark_point` with `Chart.mark_bar`:

In [22]:
alt.Chart(df).mark_bar().encode(
    y='airport',
    x='average(rain)',

)

Because the nominal field `airport` is mapped to the `y`-axis, the result is a horizontal bar chart. To get a vertical bar chart, we can simply swap the `x` and `y` keywords:

In [23]:
alt.Chart(df).mark_bar().encode(
    x='airport',
    y='average(rain)'
)

## Publishing a Visualization

Once you have visualized your data, perhaps you would like to publish it somewhere on the web. This can be done straightforwardly using the [vega-embed JavaScript package](https://github.com/vega/vega-embed). A simple example of a stand-alone HTML document can be generated for any chart using the `Chart.save` method:

```python
chart = alt.Chart(df).mark_bar().encode(
    x='average(rain)',
    y='airport',
)
chart.save('chart.html')
```


The basic HTML template produces output that looks like this, where the JSON specification for your plot produced by `Chart.to_json` should be stored in the `spec` JavaScript variable:

```html
<!DOCTYPE html>
<html>
<head>
<script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
<script src="https://cdn.jsdelivr.net/npm/vega-lite@4"></script>
<script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
</head>
<body>
<div id="vis"></div>
<script>
(function(vegaEmbed) {
var spec = {}; /* JSON output for your chart's specification */
var embedOpt = {"mode": "vega-lite"}; /* Options for the embedding */

function showError(el, error){
    el.innerHTML = ('<div style="color:red;">'
                    + '<p>JavaScript Error: ' + error.message + '</p>'
                    + "<p>This usually means there's a typo in your chart specification. "
                    + "See the javascript console for the full traceback.</p>"
                    + '</div>');
throw error;
}
const el = document.getElementById('vis');
vegaEmbed("#vis", spec, embedOpt)
    .catch(error => showError(el, error));
})(vegaEmbed);
</script>
</body>
</html>
```

The `Chart.save` method provides a convenient way to save such HTML output to file. For more information on embedding Altair/Vega-Lite, see the [documentation of the vega-embed project](https://github.com/vega/vega-embed).


## Summary
🎉 Hooray, you've completed the introduction to Altair!
In this gentle introduction, we started out by showing three ways data can be read in. Next, you were introduced to the fundamental building block of Altair vizzes, the `Chart` object. The rest of the notebook showed how marks and encoding channels specify how data is represented. In the next notebook, we will take a closer look at the various data types in Altair.
