# INFO 3402 – Week 13: Visualizing data in Altair and Vega

[Brian C. Keegan, Ph.D.](http://brianckeegan.com/)  
[Assistant Professor, Department of Information Science](https://www.colorado.edu/cmci/people/information-science/brian-c-keegan)  
University of Colorado Boulder  

Copyright and distributed under an [MIT License](https://opensource.org/licenses/MIT).

## Learning Objectives
* Visualizing data using "grammars of graphics"
* Using Altair to construct Vega specifications for visualizing data

## Resources
* Documentation
  * [Vega and Vega-Lite](https://vega.github.io/) documentation.
  * [Altair](https://altair-viz.github.io/index.html) documentation.
* Optional readings
  * Wilkinson, L. *[The Grammar of Graphics](https://www.springer.com/gp/book/9780387245447).*
  * Wilkinson, L. "[The Grammar of Graphics](https://link.springer.com/chapter/10.1007/978-3-642-21551-3_13)." *Handbook of Computational Statistics*, 2012.
  * Wickham, H. "[A Layered Grammar of Graphics](https://www.tandfonline.com/doi/abs/10.1198/jcgs.2009.07098)." *J. of Comp. and Graphical Statistics*, 2010.
  * Satyanarayan, A., Wongsuphasawat, K., & Heer, J. "[Declarative interaction design for data visualization](https://dl.acm.org/doi/abs/10.1145/2642918.2647360)." *Proc. UIST'14*.
  * Satyanarayan, A., Moritz, D., Wongsuphasawat, K., & Heer, J. "[Vega-Lite: A Grammar of Interactive Graphics](https://idl.cs.washington.edu/papers/vega-lite)." *IEEE Trans. Visualization & Comp. Graphics*, 2017.
  * VanderPlas, J., Granger, B.E., Heer, J., *et al*. "[Altair: Interactive Statistical Visualizations for Python](https://joss.theoj.org/papers/10.21105/joss.01057.pdf)." *The Journal of Open Source Software*, 2018.

## Install libraries

You only need to do this once. At the terminal (or Anaconda Prompt on Windows) run:

<code>conda install -c conda-forge altair vega_datasets</code>

If it's been a while since you last updated, now is also a good time to run <code>conda update --all</code>.

## Import libraries

In [1]:
import pandas as pd
import altair as alt

In [4]:
# Alternative to the conda command above: install from inside the notebook
%pip install altair vega_datasets

Note: you may need to restart the kernel to use updated packages.


## Make some fake data

In [5]:
data = pd.DataFrame({'a': list('CCCDDDEEE'),
                     'b': [2, 7, 4, 1, 2, 6, 8, 4, 7]})

data

|   | a | b |
|---|---|---|
| 0 | C | 2 |
| 1 | C | 7 |
| 2 | C | 4 |
| 3 | D | 1 |
| 4 | D | 2 |
| 5 | D | 6 |
| 6 | E | 8 |
| 7 | E | 4 |
| 8 | E | 7 |


## Make a Chart object with the `data` inside

In [6]:
chart = alt.Chart(data)

## Make a basic mark

In [8]:
chart.mark_point()

## Make a basic mark with encodings

In [9]:
chart.mark_point().encode(x='a')

In [10]:
chart.mark_point().encode(x='a',y='b')

## Try an alternative mark and encoding

In [11]:
chart.mark_point().encode(x='b',y='a')

In [16]:
chart_bar = chart.mark_bar().encode(y='average(b)',x='a')
chart_bar
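
To see what the `average(b)` shorthand asks for, the aggregation can be reproduced by hand. The sketch below is an illustration in plain Python, not a description of how Altair or Vega-Lite computes it internally; `rows`, `groups`, and `averages` are names introduced here.

```python
from collections import defaultdict
from statistics import mean

# Same values as the `data` DataFrame above, as plain (a, b) pairs.
rows = list(zip('CCCDDDEEE', [2, 7, 4, 1, 2, 6, 8, 4, 7]))

# Collect the b values that belong to each a category.
groups = defaultdict(list)
for category, value in rows:
    groups[category].append(value)

# One mean per category: the number that each bar's length encodes.
averages = {category: mean(values) for category, values in groups.items()}
print(averages)
```

With pandas, `data.groupby('a')['b'].mean()` gives the same per-category means.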

## Change mark color and swap the axes

In [17]:
chart_bar = chart.mark_bar(color='Coral').encode(x='average(b)',y='a')
chart_bar

## Examine JSON output

In [14]:
print(chart_bar.to_json())

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
  "config": {
    "view": {
      "continuousHeight": 300,
      "continuousWidth": 400
    }
  },
  "data": {
    "name": "data-347f1284ea3247c0f55cb966abbdd2d8"
  },
  "datasets": {
    "data-347f1284ea3247c0f55cb966abbdd2d8": [
      {
        "a": "C",
        "b": 2
      },
      {
        "a": "C",
        "b": 7
      },
      {
        "a": "C",
        "b": 4
      },
      {
        "a": "D",
        "b": 1
      },
      {
        "a": "D",
        "b": 2
      },
      {
        "a": "D",
        "b": 6
      },
      {
        "a": "E",
        "b": 8
      },
      {
        "a": "E",
        "b": 4
      },
      {
        "a": "E",
        "b": 7
      }
    ]
  },
  "encoding": {
    "x": {
      "aggregate": "average",
      "field": "b",
      "type": "quantitative"
    },
    "y": {
      "field": "a",
      "type": "nominal"
    }
  },
  "mark": {
    "color": "Coral",
    "type": "bar"
  }
}
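
Because the specification is ordinary JSON, it can be read back with any JSON parser. The sketch below uses a trimmed copy of the output above (the `config`, `data`, and `datasets` sections are left out, and only the mark type is kept) to show how the `mark` and `encoding` sections map onto the Altair calls. The names `spec_text` and `spec` are introduced here for illustration.

```python
import json

# Trimmed copy of the specification printed above.
spec_text = '''
{
  "encoding": {
    "x": {"aggregate": "average", "field": "b", "type": "quantitative"},
    "y": {"field": "a", "type": "nominal"}
  },
  "mark": {"type": "bar"}
}
'''

spec = json.loads(spec_text)

# Each encoding channel maps a data field to a visual property.
for channel, definition in spec["encoding"].items():
    print(channel, definition["field"], definition["type"],
          definition.get("aggregate"))

# The mark section corresponds to the mark_bar() call.
print(spec["mark"]["type"])
```

With a live chart, `chart_bar.to_dict()` returns the same structure as a Python dictionary, so no parsing step is needed.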


## Save chart to HTML

In [18]:
chart_bar.save('chart_bar.html')