# Fill-in-the-blank exercises

Read the instructions, then create each visualization by filling in the blank sections marked `----`.

In [None]:
import pandas as pd
import altair as alt
from altair import datum

In [None]:
alt.data_transformers.enable('json')

# Polio cases by state from 1928-1969

**Note: Data is normalized cases per 100,000 people for each state**

- [Downloaded from visdatasets](https://visdatasets.github.io/)
- Originally retrieved from [Project Tycho](https://www.tycho.pitt.edu/); aggregated into yearly values.
- [Good article on visualizations of this data](http://www.randalolson.com/2016/03/04/revisiting-the-vaccine-visualizations/)

For this example we'll load data from an Excel file. If the workbook has more than one sheet, specify the sheet name; otherwise pandas reads the first sheet.

In [None]:
polio = pd.read_excel('data/polio_incidence_rates_united_states.xlsx', 
                      sheet_name='polio_incidence_rates')
polio.head()

## Timeline of total incidence per year (summed over all states)

Start exploring by making a line chart that plots the sum of all the polio cases per year.

In [None]:
alt.Chart(----).mark_----().encode(
    x = '----:-',
    y = '----:-'
).properties(
    width = 600
)

## Timeline overlaying (detail) all states

We can add `detail=` to the encoding to split the data according to some categorical variable. This will make a mark for each unique entry in that category, adding a finer level of detail, without tying that variable to any other visual property like color or symbol type.

Add a `detail=` section to the previous encoding to create more detail – one line for each state.

In [None]:
alt.Chart(----).mark_----(opacity=0.3).encode(
    x = '----:-',
    y = '----:-',
    detail = '----:-'
).properties(
    width = 600
)

## Making a simple DataFrame to hold the year of the polio vaccine introduction

We'll use this DataFrame to draw a rule that annotates some of the charts.

In [None]:
vacc = pd.DataFrame([{"Introduction": 1955}])

## Timeline of all states overlaid with mean cases across states

Now we'll practice layering up multiple charts using the `+` operator.

- Make the bottom layer just like your previous plot, with one line per state.
- Put over that a single line showing the **mean** number of cases per year (across all states).
- Also add a rule at 1955, the year the vaccine was introduced.

**Note that we can layer charts using data from different DataFrames!**

In [None]:
state_lines = alt.Chart(----).----

mean_line = alt.Chart(----).mark_line(strokeWidth=3).encode(----)

rule = alt.Chart(vacc).mark_rule().encode(
    x='Introduction:O'
)

state_lines + mean_line + rule

## Median line with upper and lower quartile boundaries

Often data patterns are clearer if you don't show as much detail. We can use an area plot to show upper and lower quartile bounds around a (layered) median line.

- `mark_area()` has both `y` and `y2` encoding channels.
- These will be the lower and upper bounds of the plotted area.
- `q1()` and `q3()` are aggregation functions which calculate the lower and upper quartile, respectively
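To see which numbers the band edges represent, the sketch below computes per-group quartiles with pandas on made-up data (the columns `t` and `value` are hypothetical). Vega-Lite may interpolate quantiles slightly differently from pandas, so treat this as an approximation of what `q1()` and `q3()` return, not a reimplementation.

```python
import pandas as pd

# Toy data: five observations at each of two time steps (values are made up).
toy = pd.DataFrame({
    "t": [1] * 5 + [2] * 5,
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0],
})

grouped = toy.groupby("t")["value"]

# One row per time step: lower quartile, median, and upper quartile.
band = pd.DataFrame({
    "q1": grouped.quantile(0.25),
    "median": grouped.quantile(0.5),
    "q3": grouped.quantile(0.75),
})
```

In the chart, `q1` and `q3` would be the bottom and top of the area at each `t`, and `median` would be the line drawn over it.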

In [None]:
base = alt.Chart(polio).properties(width=500)

line = base.mark_line().encode(----)

confidence_interval = base.mark_area(opacity=0.3).encode(
    x ='----:-',
    y = 'q1(----)',
    y2 = 'q3(----)'
)

rule = alt.Chart(----).----

confidence_interval + line + rule

## Mean line with 95% confidence intervals

Now try to do the same thing, but with the line showing the `mean` number of cases across states, and the area showing the lower and upper bounds of the 95% confidence interval.

- Instead of using `q1()` and `q3()` for the lower and upper quartiles from the last exercise, now use `ci0()` and `ci1()` for the confidence interval aggregation functions in the `y=` and `y2=` arguments of `encode()`.
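Vega-Lite documents `ci0` and `ci1` as the lower and upper boundaries of a bootstrapped 95% confidence interval of the mean. The sketch below illustrates the general idea of a percentile bootstrap using only the standard library. The function `bootstrap_ci`, its parameters, and the sample values are all hypothetical, and the resampling details inside Vega may differ.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    # Resample with replacement, record the mean of each resample, and sort.
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    # Cut off the same number of resampled means from each tail.
    k = int((1.0 - level) / 2.0 * n_resamples)
    return means[k], means[-k - 1]


# Made-up rates for one year across eight hypothetical states.
rates = [2.1, 3.4, 1.8, 5.0, 4.2, 2.9, 3.7, 6.1]
lower, upper = bootstrap_ci(rates)
```

The interval narrows as the number of observations grows, which is why it describes uncertainty about the mean and not the spread of the states themselves.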

In [None]:
base = alt.Chart(----)

line = base.----

confidence_interval = base.----

rule = alt.Chart(----).----

----

## Bar chart of sum of incidents by state (over all time)

Now create a simple **horizontal** bar chart showing the total number of cases per state.

- Try to put the state names and the bars running horizontally so it's easier to read the state names.
- By default the states will be listed alphabetically. Notice how it's hard to see the data pattern.

In [None]:
alt.Chart(----).----

## Sorted bar chart of sum of incidents by state (over all time)

It's better practice to sort the bars (descending) by their length. To do that we need to change the specification for the y-axis to sort the categorical variable.

In [None]:
alt.Chart(polio).mark_bar().encode(
    x = '----:-',
    y = alt.Y('----:-',
            sort=alt.EncodingSortField(
                field="----",
                op="sum",
                order="descending"))
)

## Top 10 bar chart of summed cases per state, sorted (descending) by number of cases

`transform_window()` is currently very poorly documented in Altair. Its purpose is to compute things like running averages and ranks. In Tableau, these are usually called "table calculations".

- Here I [pulled from this example](https://github.com/altair-viz/altair/blob/master/altair/examples/top_k_letters.py) to calculate the top 10 states
- Note that since I needed to use the sum of cases in multiple places I put it in a `transform_aggregate()` section
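For readers more used to pandas, the aggregate, window, and filter steps can be sketched as follows. This is an analogy on made-up data (the columns `group` and `value` are hypothetical), not what Altair runs internally. It assumes the window `rank` starts at 1 and gives tied rows the same rank, which corresponds to `method="min"` in pandas.

```python
import pandas as pd

# Toy data: several rows per group (values are made up).
toy = pd.DataFrame({
    "group": ["a", "a", "b", "b", "c", "c", "d"],
    "value": [1.0, 2.0, 5.0, 5.0, 0.5, 0.5, 4.0],
})

# Like transform_aggregate: one row per group with the summed value.
totals = toy.groupby("group", as_index=False).agg(total=("value", "sum"))

# Like transform_window with a descending sort: rank 1 is the largest total.
totals["rank"] = totals["total"].rank(method="min", ascending=False).astype(int)

# Like transform_filter: keep the k highest-ranked groups.
k = 2
top_k = totals[totals["rank"] <= k].sort_values("rank")
```

Because ranks start at 1, keeping the top `k` groups needs `rank <= k`.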

In [None]:
alt.Chart(polio).mark_bar().encode(
    x = 'sum_cases:Q',
    y = alt.Y('state:N',
            sort=alt.EncodingSortField(
                field="sum_cases",
                op="sum",
                order="descending")
    )
).transform_aggregate(
    sum_cases='sum(cases)',
    groupby=['state']
).transform_window(
    rank = 'rank(sum_cases)',
    sort=[alt.SortField('sum_cases', order='descending')]
).transform_filter(
    alt.datum.rank <= 10  # rank starts at 1, so <= 10 keeps ten states
)

## Heatmap of cases by state and year

Use rectangle marks, and show cases in color, states on the left and years on the bottom.

[Vega-Lite color schemes](https://vega.github.io/vega/docs/schemes/)

*(Note: To see the trend more clearly, limit the color scale domain to 0-50.)*

In [None]:
alt.Chart(----).mark_----().encode(
    x = '----:-',
    y = '----:-',
    color = alt.Color('----:-', scale=alt.Scale(scheme='reds', domain=[0,50]))
).properties(
    width = 500,
    height = 500
)

## Heatmap with states sorted by sum of cases

Now, once again:

- sort the states by the sum of the number of cases (over all years)
- add a rule at the year when the vaccine was introduced

In [None]:
heatmap = alt.Chart(----).----

rule = alt.Chart(----).----

---- + ----

## US States Symbol Map

US map with a circle for each state showing the total number of cases over the years for that state.

In [None]:
state_locs = pd.read_excel('data/polio_incidence_rates_united_states.xlsx', sheet_name='state_locations')
state_locs.head()

In [None]:
states = alt.topo_feature('https://vega.github.io/vega-datasets/data/us-10m.json', 'states')
proj_type = 'albersUsa'
width = 600
height = 400

background = alt.Chart(states).mark_geoshape(
    fill='#e5d8bd',
    stroke='white',
    opacity=0.5
).project(
    type = proj_type
).properties(
    width = width,
    height = height
)

points = alt.Chart(----).mark_circle().encode(
    longitude = '----:Q',
    latitude = '----:Q',
    size = '----:Q',
    tooltip = ['----:N','----:Q']
).transform_aggregate(
    sum_cases = 'sum(----)',
    groupby = ['state']
).transform_lookup(
    lookup = "state",
    from_ = alt.LookupData(data=state_locs, key='state', fields=['latitude','longitude'])
).project(
    type = proj_type
).properties(
    width = width,
    height = height
)

background + points
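The `transform_lookup()` step in the cell above works much like a left join: each aggregated row picks up the listed fields from the row with the matching key in the second table. A rough pandas equivalent on made-up data (the frames and column names here are hypothetical) looks like this:

```python
import pandas as pd

# Aggregated totals per region and a separate table of coordinates (made-up values).
totals = pd.DataFrame({"region": ["x", "y"], "total": [10.0, 4.0]})
locations = pd.DataFrame({
    "region": ["x", "y", "z"],
    "lat": [40.0, 35.0, 30.0],
    "lon": [-100.0, -90.0, -80.0],
})

# Left join: keep every row of `totals` and attach lat/lon where the key matches.
joined = totals.merge(locations[["region", "lat", "lon"]], on="region", how="left")
```

Region `z` has coordinates but no total, so it does not appear in `joined`.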