<a href="https://colab.research.google.com/github/odu-cs625-datavis/public-fall24-mcw/blob/main/Annotations_in_Vega_Altair.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Vega-Altair Annotations Examples**

This notebook contains examples of customizing and adding annotations to Vega-Altair charts.

The other notebooks (Seaborn, Vega-Lite) include examples from [Chapter 28, Graphics for Communication](https://byuidatascience.github.io/python4ds/graphics-for-communication.html) of [Python for Data Science](https://byuidatascience.github.io/python4ds/), which is itself a Python port of Grolemund and Wickham's [R for Data Science](https://r4ds.had.co.nz/index.html) book. Because that book already uses Vega-Altair for its examples, they are not repeated here.

In [1]:
!pip install altair==5.4.1

Collecting altair==5.4.1
  Downloading altair-5.4.1-py3-none-any.whl.metadata (9.4 kB)
Collecting narwhals>=1.5.2 (from altair==5.4.1)
  Downloading narwhals-1.13.3-py3-none-any.whl.metadata (7.4 kB)
Downloading altair-5.4.1-py3-none-any.whl (658 kB)
Downloading narwhals-1.13.3-py3-none-any.whl (201 kB)
Installing collected packages: narwhals, altair
  Attempting uninstall: altair
    Found existing installation: altair 4.2.2
    Uninstalling altair-4.2.2:
      Successfully uninstalled altair-4.2.2
Successfully installed altair-5.4.1 narwhals-1.13.3


In [2]:
import altair as alt
import pandas as pd

In [3]:
alt.__version__

'5.4.1'

# **Other Annotations**

There are a few examples of adding annotations (bar chart with labels, bar chart with line overlay, histogram with mean overlay) in the [Vega-Altair vs. Vega-Lite JSON Examples](https://colab.research.google.com/drive/1DGIHEoohX9YRXb9ZDsmCnC4PWETdwssW?usp=drive_link) notebook.

## **Layering Text over Heatmap**

From https://vega.github.io/vega-lite/examples/layer_text_heatmap.html

In [4]:
cars = alt.UrlData(url = "https://vega.github.io/vega-lite/examples/data/cars.json", format=alt.JsonDataFormat(type="json"))
cars

UrlData({
  format: JsonDataFormat({
    type: 'json'
  }),
  url: 'https://vega.github.io/vega-lite/examples/data/cars.json'
})

In [5]:
base = alt.Chart(cars).transform_aggregate(
    num_cars='count()',
    groupby=["Origin", "Cylinders"]
)

heatmap = base.mark_rect().encode(
    x='Cylinders:O',
    y='Origin:O',
    color=alt.Color('num_cars:Q', title='Count of Records',
                    legend=alt.Legend(direction='horizontal', gradientLength=120))
)

text = base.mark_text().encode(
    x='Cylinders:O',
    y='Origin:O',
    text='num_cars:Q',
    color=alt.condition(
        alt.datum.num_cars < 40,
        alt.value('black'),  # The True color
        alt.value('white')   # The False color
    )
)

chart = alt.layer(heatmap, text).configure_axis(
    grid=True,
    tickBand='extent'
)

chart.display()

## **Carbon Dioxide in the Atmosphere**

From https://vega.github.io/vega-lite/examples/layer_line_co2_concentration.html

In [6]:
co2_concentration = alt.UrlData(url = "https://vega.github.io/vega-lite/examples/data/co2-concentration.csv", format=alt.CsvDataFormat(type="csv"))
co2_concentration

UrlData({
  format: CsvDataFormat({
    type: 'csv'
  }),
  url: 'https://vega.github.io/vega-lite/examples/data/co2-concentration.csv'
})

In [7]:
base = alt.Chart(co2_concentration).transform_calculate(
    year="year(datum.Date)",
    decade="floor(datum.year / 10)",
    scaled_date="(datum.year % 10) + (month(datum.Date) / 12)",
    # The upstream "end" calculation is omitted: it reads first_date/last_date,
    # which are never defined here, and no layer below uses it.
).encode(
    x=alt.X("scaled_date:Q", title="Year into Decade", axis=alt.Axis(tickCount=11)),
    y=alt.Y("CO2:Q", title="CO2 concentration in ppm", scale=alt.Scale(zero=False)),
    color=alt.Color("decade:O", legend=None, scale=alt.Scale(scheme="magma"))
)

# Line mark for CO2 concentration
line = base.mark_line()

# Text mark for the left side
left_text = base.mark_text(align='left', dx=3, dy=1, baseline="top", aria=False).encode(
    x=alt.X("scaled_date:Q", aggregate="min"),
    y=alt.Y("CO2:Q", aggregate={"argmin": "scaled_date"}),
    text=alt.Text("year:N", aggregate={"argmin": "scaled_date"})
)

# Text mark for the right side
right_text = base.mark_text(align='left', dx=3, dy=1, aria=False).encode(
    x=alt.X("scaled_date:Q", aggregate="max"),
    y=alt.Y("CO2:Q", aggregate={"argmax": "scaled_date"}),
    text=alt.Text("year:N", aggregate={"argmax": "scaled_date"})
)

# Layering the marks together
chart = alt.layer(line, left_text, right_text).properties(
    width=800,
    height=500
)

chart.display()


The x value is set by `scaled_date="(datum.year % 10) + (month(datum.Date) / 12)"`. Note that Vega's `month()` is zero-based (January is 0, December is 11).

Examples:
* 1960-01 = 0 + 0/12 = 0
* 1961-02 = 1 + 1/12 ≈ 1.083
* 1965-06 = 5 + 5/12 ≈ 5.417
* 1975-02 = 5 + 1/12 ≈ 5.083
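
The same arithmetic can be sketched in plain Python. This is only an illustration and not part of the chart: `scaled_date` below is a hypothetical helper, and it subtracts 1 from Python's one-based `date.month` on the assumption that it should mirror Vega's `month()`, which is zero-based (January is 0).

```python
from datetime import date


def scaled_date(d: date) -> float:
    """Position of a date within its decade, in years (hypothetical helper)."""
    # Vega's month() is zero-based, so subtract 1 from Python's one-based month.
    return (d.year % 10) + (d.month - 1) / 12


# Print the year-month and its position within the decade for a few dates.
for d in (date(1960, 1, 1), date(1961, 2, 1), date(1965, 6, 1), date(1975, 2, 1)):
    print(d.isoformat()[:7], round(scaled_date(d), 3))
```

Dates that fall in the same month and the same year-of-decade (for example 1965-02 and 1975-02) map to the same x position, which is what lets the decades be drawn as overlapping lines.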

[Aggregate](https://vega.github.io/vega-lite/docs/aggregate.html) lets us compute summary statistics such as min and max. For the x value of the ending year text, we compute the max of the `scaled_date` field.

[Argmin/Argmax](https://vega.github.io/vega-lite/docs/aggregate.html#argmax) returns the data record at which another field reaches its minimum or maximum, so we can read a different field from that record. For the y value of the ending year text, we find the record with the maximum `scaled_date` and use its CO2 value. Because `decade` is encoded as color and is not aggregated, these aggregates are computed once per decade, which gives one label at each end of every line.
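
To make the argmax behaviour concrete, here is a small plain-Python sketch of what the right-hand label's encoding computes. The rows are made-up illustrative values, not records from the CO2 dataset, and `label_positions` is a hypothetical name. Within each decade, it picks the row with the largest `scaled_date`, then reads `CO2` and `year` from that row.

```python
from collections import defaultdict

# Made-up illustrative rows, not real measurements.
rows = [
    {"decade": 196, "year": 1968, "scaled_date": 8.5, "CO2": 323.0},
    {"decade": 196, "year": 1969, "scaled_date": 9.917, "CO2": 324.1},
    {"decade": 197, "year": 1970, "scaled_date": 0.0, "CO2": 325.0},
    {"decade": 197, "year": 1979, "scaled_date": 9.917, "CO2": 336.7},
]


def label_positions(rows):
    """For each decade, return x, y and label text taken from the argmax row."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["decade"]].append(row)

    labels = {}
    for decade, members in groups.items():
        # argmax: the whole row in which scaled_date is largest
        last = max(members, key=lambda r: r["scaled_date"])
        labels[decade] = {
            "x": last["scaled_date"],
            "y": last["CO2"],
            "text": str(last["year"]),
        }
    return labels


print(label_positions(rows))
```

Swapping `max` for `min` gives the argmin case used for the left-hand labels.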

## **Line Chart with Highlighted Rectangles**

From https://vega.github.io/vega-lite/examples/layer_falkensee.html

In [8]:
# Define the population data
population_data = pd.DataFrame([
      {"year": "1875", "population": 1309},
      {"year": "1890", "population": 1558},
      {"year": "1910", "population": 4512},
      {"year": "1925", "population": 8180},
      {"year": "1933", "population": 15915},
      {"year": "1939", "population": 24824},
      {"year": "1946", "population": 28275},
      {"year": "1950", "population": 29189},
      {"year": "1964", "population": 29881},
      {"year": "1971", "population": 26007},
      {"year": "1981", "population": 24029},
      {"year": "1985", "population": 23340},
      {"year": "1989", "population": 22307},
      {"year": "1990", "population": 22087},
      {"year": "1991", "population": 22139},
      {"year": "1992", "population": 22105},
      {"year": "1993", "population": 22242},
      {"year": "1994", "population": 22801},
      {"year": "1995", "population": 24273},
      {"year": "1996", "population": 25640},
      {"year": "1997", "population": 27393},
      {"year": "1998", "population": 29505},
      {"year": "1999", "population": 32124},
      {"year": "2000", "population": 33791},
      {"year": "2001", "population": 35297},
      {"year": "2002", "population": 36179},
      {"year": "2003", "population": 36829},
      {"year": "2004", "population": 37493},
      {"year": "2005", "population": 38376},
      {"year": "2006", "population": 39008},
      {"year": "2007", "population": 39366},
      {"year": "2008", "population": 39821},
      {"year": "2009", "population": 40179},
      {"year": "2010", "population": 40511},
      {"year": "2011", "population": 40465},
      {"year": "2012", "population": 40905},
      {"year": "2013", "population": 41258},
      {"year": "2014", "population": 41777}
])
population_data['year'] = pd.to_datetime(population_data['year'], format='%Y')

# Define the event data
event_data = pd.DataFrame([
    {"start": "1933", "end": "1945", "event": "Nazi Rule"},
    {"start": "1948", "end": "1989", "event": "GDR (East Germany)"}
])
event_data['start'] = pd.to_datetime(event_data['start'], format='%Y')
event_data['end'] = pd.to_datetime(event_data['end'], format='%Y')

# Rectangle mark for historical periods
rect = alt.Chart(event_data).mark_rect().encode(
    x=alt.X('start:T', axis=alt.Axis(title='Year')),
    x2='end:T',
    color=alt.Color('event:N', legend=alt.Legend(title="Historical Event"))
)

# Line and point marks for population data share one set of encodings
population = alt.Chart(population_data).encode(
    x=alt.X('year:T', title='Year'),
    y=alt.Y('population:Q', title='Population')
)
line = population.mark_line(color="#333")
point = population.mark_point(color="#333")

# Combine the charts
chart = alt.layer(rect, line, point).properties(width=500)

chart.display()