# Plotting COVID's effect on CDVS workshop attendance

When COVID started, we shut down CDVS workshops and adapted them for online delivery. Attendance actually hit a record after that transition, so we wanted to tell that story with our attendance data.

The original plot was prepared in Excel and then imported into PowerPoint to add annotations. Here I'll show how I would build the basic chart in Python using Altair, and then how I would add similar annotations directly in code. Annotating in code is a slower process, but you can achieve reasonable results.

![progression](images/WorkshopAttendanceLineProgression.png)

In [1]:
import pandas as pd
import altair as alt

## Load data & add an index field for sorting

We need something to keep the original data order in the final plot. There is no other quantity in the original two columns that we can sort by to get the correct semester order.

In [16]:
df = pd.read_excel('data/AnnualReport2022/WorkshopAttendanceLine.xlsx').reset_index()
df

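| | index | Semester | Attendance |
|---|---|---|---|
| 0 | 0 | Fall 2017 | 456 |
| 1 | 1 | Spring 2018 | 450 |
| 2 | 2 | Fall 2018 | 332 |
| 3 | 3 | Spring 2019 | 462 |
| 4 | 4 | Fall 2019 | 438 |
| 5 | 5 | Spring 2020 | 491 |
| 6 | 6 | Fall 2020 | 281 |
| 7 | 7 | Spring 2021 | 711 |
| 8 | 8 | Fall 2021 | 435 |
| 9 | 9 | Spring 2022 | 625 |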
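
To see why the extra `index` column is needed, here is a minimal plain-Python sketch (no pandas required) comparing alphabetical order, which is what I'd expect a nominal axis to fall back on when no sort is given, with the row order from the spreadsheet. The variable names are just for illustration.

```python
# Semester labels in the order they appear in the spreadsheet
semesters = [
    "Fall 2017", "Spring 2018", "Fall 2018", "Spring 2019", "Fall 2019",
    "Spring 2020", "Fall 2020", "Spring 2021", "Fall 2021", "Spring 2022",
]

# Alphabetical sorting groups every Fall semester ahead of every Spring one
alphabetical = sorted(semesters)

# Pairing each label with its row position (as reset_index() does)
# gives a key that restores the chronological order
indexed = list(enumerate(semesters))
restored = [label for _, label in sorted(indexed)]

print(alphabetical[:3])       # first three labels are all Fall semesters
print(restored == semesters)  # row position recovers the original order
```

Neither the label text nor the attendance count sorts into the right sequence, so the row position is the only usable key.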


## Base line chart

- All defaults except plot size
- Instead of layering circles, you could do `point=True` in the `mark_line()` arguments, but you can't control size that way
    - To control overlay point properties you do `point=alt.OverlayMarkDef()`
    - [OverlayMarkDef documentation](https://altair-viz.github.io/user_guide/generated/core/altair.OverlayMarkDef.html)

In [197]:
base = alt.Chart(df).encode(
    alt.X('Semester:N'),
    alt.Y('Attendance:Q')
).properties(
    width = 550,
    height = 300
)

line = base.mark_line()

points = base.mark_circle()

alt.layer(
    line,
    points
)

## Pretty good version of final plot

***This is close to the version I would probably save to SVG and add annotations in PowerPoint or Adobe Illustrator because doing the rest in Altair is time-consuming and difficult!***

- Splitting `Semester` on the space turns each label into a list, which displays as one line per entry
- The `index` field prepared above gives the x-axis something to sort by
- Semester labels are easier to read when they're horizontal
- Line thickness and color are set explicitly
- Round joins in the line keep pointy transitions from poking out beyond the circles
- Points now inherit their color from the line, but their default opacity is below 100%, so it is set explicitly
- `domainWidth` sets the width of the axis line
- global `.configure_view(strokeWidth=0)` takes away the box around the plot

### Two ways to wrap text in Altair

- Text in titles, subtitles, or other text marks will display as one line per entry in a list
- https://github.com/altair-viz/altair/issues/2376
- https://stackoverflow.com/questions/71215156/how-to-wrap-axis-label-in-altair

In [234]:
base = alt.Chart(df).encode(
    alt.X('semesterWrap:N')
            .sort(alt.SortField('index'))
            .axis(labelAngle = 0)
            .title('Semester'),
    alt.Y('Attendance:Q').axis(tickCount=5, domainWidth=0)
).transform_calculate(
    semesterWrap = alt.expr.split(alt.datum.Semester,' ')
).properties(
    width = 550,
    height = 300
)

line = base.mark_line(
    size=4, 
    strokeJoin="round"
).encode(
    color = alt.value("#C84E00")
)

points = line.mark_circle(
    size=50, 
    opacity=1.0
)

alt.layer(line, 
          points
).configure_view(
    strokeWidth=0
)

### Transparent background in SVG output

I want a transparent SVG chart so I can put things behind the plot, like the eventual yellow COVID rectangle.

`background="none"` in `.configure()` is the only way I have found to make the background of the whole plot transparent in the SVG output. You can set a background color, which translates into a colored `<rect fill="color">` in the SVG, but there doesn't seem to be a vega-lite config equivalent to `<rect fill-opacity="0">`. Instead, `background="none"` simply removes the background `<rect>` from the SVG, which is fine.

https://www.fabiofranchino.com/log/set-a-vega-vega-lite-background-to-transparent/

In [235]:
alt.layer(line, 
          points
).configure(
    background="none"
).configure_view(
    strokeWidth=0
).save('attendance_line.svg')

## Add data labels and highlight circles

- By default, open circles would show up at every data point
- **Filter down to just the points you want to highlight** in that specification
- Unfortunately, the `dy` offset on a text mark applies to every label in that layer, so you can't nudge individual labels, and some of them overlap the line

In [237]:
base = alt.Chart(df).encode(
    alt.X('semesterWrap:N')
            .sort(alt.SortField('index'))
            .axis(labelAngle = 0)
            .title('Semester'),
    alt.Y('Attendance:Q').axis(tickCount=5, domainWidth=0, zindex=1)
).transform_calculate(
    semesterWrap = alt.expr.split(alt.datum.Semester,' ')
).properties(
    width = 550,
    height = 300
)

line = base.mark_line(
    size=4, 
    strokeJoin="round"
).encode(
    color = alt.value("#C84E00")
)

points = line.mark_circle(
    size=50, 
    opacity=1.0
)

count_text = base.mark_text(
    dy = -15
).encode(
    text = 'Attendance:Q',
    color = alt.value('#666666')
)

highlight_circles = base.mark_point(
    size=300, 
    opacity=1.0
).encode(
    color = alt.value("#988675")
).transform_filter(
    alt.FieldOneOfPredicate(field='Semester', oneOf=['Fall 2020','Spring 2021'])
)

alt.layer(line, 
          points, 
          count_text, 
          highlight_circles
).configure_view(
    strokeWidth=0
)

## Final plot with highlight band, text annotations, and title

- Title gets specified in the top-level chart
- Since grid lines are overlapping the red plot line, make them black and reduce opacity
- Lots of other small configurations on the axis label spacing, title alignment, etc.
- *These font sizes are probably the ones I'd use in the simpler plot I'd save to SVG to annotate in PowerPoint*

In [249]:
base = alt.Chart(
    df, 
    title = alt.TitleParams(
        "CDVS workshop attendance set a record after a COVID transition"
    )
).encode(
    alt.X('semesterWrap:N')
            .sort(alt.SortField('index'))
            .axis(labelAngle = 0)
            .title(''),
    # Grid lines stay behind all elements unless set zindex > 0
    alt.Y('Attendance:Q')
            .axis(tickCount=5, domainWidth=0, zindex=1)
            .title('')
).transform_calculate(
    # Splitting on a space makes it into a list, which spans one line per entry
    semesterWrap = alt.expr.split(alt.datum.Semester,' ')
).properties(
    width = 550,
    height = 300
)

line = base.mark_line(
    size=4, 
    strokeJoin="round"
).encode(
    color = alt.value("#C84E00")
)

points = line.mark_circle(
    size=50, 
    opacity=1.0
)

count_text = base.mark_text(
    dy = -15,
    fontSize = 12
).encode(
    text = 'Attendance:Q',
    color = alt.value('#888888')
)

highlight_circles = base.mark_point(
    size=300, 
    opacity=1.0
).encode(
    color = alt.value("#988675")
).transform_filter(
    alt.FieldOneOfPredicate(field='Semester', oneOf=['Fall 2020','Spring 2021'])
)

covid_band = alt.Chart(df).mark_rect().encode(
    x = alt.value(325),
    x2 = alt.value(550),
    color = alt.value("#FCF7E5")
)

covid_text = alt.Chart(df).mark_text(
    fontSize=36,
    fontStyle="bold"
).encode(
    x = alt.value(390),
    y = alt.value(280),
    text = alt.value("COVID"),
    color = alt.value('white')
)

fall2020_text = base.mark_text(
    fontSize=12,
    fontStyle='italic',
    dx = -70,
    dy = -5
).encode(
    text = alt.value(['Revising curriculum','for distance learning']),
    color = alt.value("#666666")
).transform_filter(
    alt.datum.Semester == "Fall 2020"
)

spring2021_text = base.mark_text(
    fontSize=12,
    fontStyle='italic',
    dx = -70,
    dy = -5
).encode(
    text = alt.value(['Record attendance','with online offerings']),
    color = alt.value("#333333")
).transform_filter(
    alt.datum.Semester == "Spring 2021"
)

alt.layer(covid_band, 
          line, 
          points, 
          count_text, 
          highlight_circles, 
          covid_text, 
          fall2020_text,
          spring2021_text
).configure_axis(
    titlePadding = 10,
    labelFontSize = 14,
    labelColor = "#333333",
    titleFontSize = 12,
    gridColor = "black",
    gridOpacity = 0.15
).configure_title(
    fontSize = 16,
    anchor = 'start',
    offset = 15,
    color = "#333333",
    subtitleColor = "#666666"
).configure_view(
    strokeWidth=0
)
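
The `covid_band` and `covid_text` layers above are positioned with raw pixel values through `alt.value(...)`, so it helps to know roughly where each semester lands on the x-axis. The helper below is a plain-Python sketch of that arithmetic. It assumes the x-axis is a point scale with the default outer padding of half a step, which I believe is what Vega-Lite uses for a line over a nominal field; `point_scale_centers` is a hypothetical name, not an Altair function.

```python
def point_scale_centers(n_categories, width, padding=0.5):
    """Pixel x-position of each category on a point scale spanning `width` pixels.

    Assumes outer padding is expressed as a fraction of the step between points.
    """
    step = width / (n_categories - 1 + 2 * padding)
    return [(padding + i) * step for i in range(n_categories)]


# Ten semesters across the 550-pixel-wide plot
centers = point_scale_centers(n_categories=10, width=550)

spring_2020_x = centers[5]   # index 5 in the data
fall_2020_x = centers[6]     # index 6 in the data
midpoint = (spring_2020_x + fall_2020_x) / 2

print(spring_2020_x, fall_2020_x, midpoint)
```

Under those assumptions, the band's left edge at 325 pixels falls between the Spring 2020 and Fall 2020 points, and `x2 = 550` runs it to the right edge of the plot.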