# SI649-24-WINTER -> Altair Assignment 3

School of Information, University of Michigan


## Assignment Overview

We will replicate 4 visualizations based on an earlier FiveThirtyEight article about Biden's approval rating over the preceding years (Bycoffe, Mehta, Silver, Radcliffe, 2023).

This assignment is worth **11** points in total:
* 2.5 points for each visualization (x 4)
* 1 bonus point for hosting the fourth visualization on Hugging Face

For this lab, please write Altair code to answer the questions. It's fine if your visualization looks slightly different from the example (e.g., showing 1.1 instead of 1.0, using orange instead of red, or having different titles, chart width/height, and mark size/opacity).

When you are finished, upload your .ipynb notebook to Canvas.

### Resources:
- Altair Interactive charts gallery: [https://altair-viz.github.io/gallery/index.html#interactive-charts](https://altair-viz.github.io/gallery/index.html#interactive-charts)
- Getting started in Panel: [https://panel.holoviz.org/getting_started/index.html](https://panel.holoviz.org/getting_started/index.html)

### General Hints: 
* We recommend that you finish all the static charts before adding interactions. 
* If you see duplicated axes, use `axis=None` to get rid of unnecessary axes.  
* `resolve_scale` controls whether the sub-charts of a layered or concatenated chart share scales (and axes) or use independent ones.


In [26]:
# start with the setup

# suppress warnings about future deprecations
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import pandas as pd
import altair as alt
import numpy as np
import pprint
import datetime as dt
from vega_datasets import data
import matplotlib.pyplot as plt

# Solve a javascript error by explicitly setting the renderer
alt.renderers.enable('jupyterlab')



In [27]:
#load data 
df1=pd.read_csv("https://raw.githubusercontent.com/dallascard/SI649_public/main/altair_hw3/approval_polllist.csv")
df2=pd.read_csv("https://raw.githubusercontent.com/dallascard/SI649_public/main/altair_hw3/approval_topline.csv")

In [28]:
# convert approval ratings from 0-100 to proportions (0-1) so they can be formatted as percentages
df1['approve_percent']=df1['approve']/100
df1.head()

| | president | subgroup | modeldate | startdate | enddate | pollster | grade | samplesize | population | weight | ... | adjusted_approve | adjusted_disapprove | multiversions | tracking | url | poll_id | question_id | createddate | timestamp | approve_percent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Joe Biden | Voters | 2/14/2023 | 1/2/2023 | 1/8/2023 | Rasmussen Reports | B | 1500.0 | lv | 0.135108 | ... | 46.977828 | 48.201321 | | T | https://www.rasmussenreports.com/public_conten... | 81949 | 166984 | 1/9/2023 | 2023/2/14 9:25 | 0.46 |
| 1 | Joe Biden | Voters | 2/14/2023 | 1/3/2023 | 1/9/2023 | Rasmussen Reports | B | 1500.0 | lv | 0.13469 | ... | 46.977828 | 48.201321 | | T | https://www.rasmussenreports.com/public_conten... | 81961 | 167024 | 1/10/2023 | 2023/2/14 9:25 | 0.46 |
| 2 | Joe Biden | Voters | 2/14/2023 | 1/5/2023 | 1/8/2023 | Premise | | 1642.0 | rv | 0.809922 | ... | 42.374544 | 51.650266 | | | https://projects.fivethirtyeight.com/polls/202... | 81977 | 167102 | 1/13/2023 | 2023/2/14 9:25 | 0.4 |
| 3 | Joe Biden | Voters | 2/14/2023 | 1/5/2023 | 1/8/2023 | Morning Consult | B | 2000.0 | rv | 0.107462 | ... | 41.792708 | 52.803786 | | | https://morningconsult.com/2023/01/23/bidens-a... | 82026 | 167374 | 1/23/2023 | 2023/2/14 9:25 | 0.43 |
| 4 | Joe Biden | Voters | 2/14/2023 | 1/5/2023 | 1/9/2023 | Navigator Research | B/C | 1000.0 | rv | 0.941366 | ... | 45.24421 | 51.434311 | | | https://navigatorresearch.org/nearly-three-in-... | 81996 | 167157 | 1/18/2023 | 2023/2/14 9:25 | 0.46 |


In [29]:
# parse the timestamps, then melt approve/disapprove into a single 'rate' column labeled by 'choice'
df2['timestamp']=pd.to_datetime(df2['timestamp'])
df2=pd.melt(df2, id_vars=['president', 'subgroup', 'timestamp'], value_vars=['approve','disapprove']).rename(columns={'variable':'choice', 'value':'rate'})
df2.head()

| | president | subgroup | timestamp | choice | rate |
|---|---|---|---|---|---|
| 0 | Joe Biden | All polls | 2023-02-14 09:24:00 | approve | 43.108745 |
| 1 | Joe Biden | Adults | 2023-02-14 09:24:00 | approve | 41.936438 |
| 2 | Joe Biden | Voters | 2023-02-14 09:25:00 | approve | 44.082005 |
| 3 | Joe Biden | All polls | 2023-02-13 19:23:00 | approve | 42.822155 |
| 4 | Joe Biden | Adults | 2023-02-13 19:24:00 | approve | 41.936438 |


# Visualization 1: Average Approval Ratings for Joe Biden

We will replicate the following visualization:
![alt text](https://raw.githubusercontent.com/dallascard/SI649_public/main/altair_hw3/fig/1_1.jpg?raw=true)

**Description of the visualization (static):**
*   Use *df1* for this exercise
*   This visualization has 3 components: **bar chart**, **vertical line**, and **average value text** 
*   All 3 components share the same x axis, which displays the *average* approval percentage for Biden. 
*   The bar chart has a low opacity because we want to add interactions (see the next cell). 

**Description of the visualization (interactivity):**
1. When hovering over bars, the associated average approval rating will show up as tooltips.
![alt text](https://raw.githubusercontent.com/ruotongg/SI649_data/main/w6-raw/fig/1_2.gif?raw=true)
2. Brushing over the bars will change the opacity of the bars.
3. Brushing over the bars will update the vertical line and its text label to show the average approval across the brushed pollsters' polls.
![alt text](https://raw.githubusercontent.com/ruotongg/SI649_data/main/w6-raw/fig/1_3.gif?raw=true)

**Sample style settings (optional):**
Here's a list of default style settings we used to generate the graph.
* Original opacity for the bar chart: 0.6. 
* Brushed opacity: 1
* Bar height = 15
* Vertical line size = 3, color = "firebrick"
* Text color='firebrick', fontSize=12, align='left', dx=7
* After building the compound chart, disable the view border with `.configure_view(strokeWidth=0)`.

Hint:
* We recommend getting all static components working before writing any interactivity. 
* Add one interaction at a time and test whether or not it works. 
* To add an interaction other than a tooltip or zooming, you need four steps (review the in-class demo). 
* A selection is used in two scenarios: 1) inside a *condition*, which is then used in `encode`; 2) inside `transform_filter`. In this visualization, you will implement both. Think through which one you will use where before trying to build this.

In [30]:
## Static component: bars (mean approval per pollster)
avg_approve = alt.Chart(df1).transform_aggregate(
    groupby=['pollster'],
    mean_approve_percent='mean(approve_percent)'
).mark_bar(size=15).encode(
    alt.Y('pollster:N', axis=alt.Axis(title=None)),
    alt.X('mean_approve_percent:Q', axis=alt.Axis(title=None)),
    tooltip=alt.Tooltip('mean_approve_percent:Q', format='.0%', title=None)
).properties(
    width=400,
    title='Average Approval Ratings for Joe Biden'
)

## Interaction: brush along the y-axis; brushed bars become fully opaque
brush=alt.selection_interval(encodings=["y"])
opacityCondition = alt.condition(brush, alt.OpacityValue(1), alt.OpacityValue(0.6))
avg_approve=avg_approve.add_params(brush).encode(
    opacity = opacityCondition
)

## Static component: vertical line at the mean approval of the brushed rows
# (no data here: the line inherits df1 from alt.layer below)
line = alt.Chart().mark_rule(color='firebrick',size=3).encode(
    x='mean(approve_percent):Q',
).transform_filter(
    brush
)

text = line.mark_text(
    fontSize=12,
    align='left', 
    dx=7, 
    color='firebrick'
).encode(
    alt.Text(
        'mean(approve_percent):Q',
        formatType='number',
        format='.2%')
)

## Put it all together
alt.layer(avg_approve, line, text, data=df1).configure_view(
    strokeWidth=0
)



# Visualization 2: Reported Approval Ratings for Biden

We will replicate the following visualization:
![alt text](https://raw.githubusercontent.com/dallascard/SI649_public/main/altair_hw3/fig/2_1.jpg?raw=true)

**Description of the visualization (static):**
*   Use *df1* for this exercise
*   This visualization has 2 components: **scatterplot** and **bar chart** 
*   The scatterplot is lightgray because we want to add interactions (described below). 

**Description of the visualization (interactivity):**
1. Brushing over points will change the color of the points.
![alt text](https://raw.githubusercontent.com/ruotongg/SI649_data/main/w6-raw/fig/2_2.gif?raw=true)
2. When brushing over the points, the associated bars will be filtered.
![alt text](https://raw.githubusercontent.com/ruotongg/SI649_data/main/w6-raw/fig/2_3.gif?raw=true)

**Sample style settings (optional):**
Here's a list of default style settings we used to generate the graph.
* Original color for the scatter chart: lightgray. 
* After building the compound chart, disable the view border with `.configure_view(strokeWidth=0)`.

Hint:
* Since there are multiple approval rating values for each pollster, we will use the average value of approval ratings in the bar chart.
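To check what each bar should show, this small pandas sketch computes the same per-pollster average on hypothetical toy rows (`polls`, with made-up pollster names and values). In Altair, `mean(adjusted_approve)` in an encoding groups by the other non-aggregated fields (here `pollster`), so each bar should match one of these values.

```python
import pandas as pd

# Hypothetical rows standing in for df1: several polls per pollster
polls = pd.DataFrame({
    "pollster": ["Pollster A", "Pollster A", "Pollster B", "Pollster C"],
    "adjusted_approve": [46.0, 48.0, 42.0, 41.0],
})

# One average per pollster, the aggregate the bar chart draws
per_pollster = polls.groupby("pollster")["adjusted_approve"].mean()
print(per_pollster)
```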

In [31]:
# Create selection and condition
brush = alt.selection_interval(encodings=["x"])
colorCondition = alt.condition(brush, "pollster:N", alt.value("lightgray"))

# scatter plot
points = alt.Chart(df1).mark_point().encode(
    x=alt.X('startdate:T', axis=alt.Axis(title=None)),
    y=alt.Y('adjusted_approve:Q', axis=alt.Axis(title='Approve Rate of Joe Biden')),
    color=colorCondition
).add_params(
    brush
).properties(
    width=400
)

# bar plot
bars = alt.Chart(df1).mark_bar().encode(
    x=alt.X('mean(adjusted_approve):Q', axis=alt.Axis(title='Mean of approve rate')),
    y=alt.Y('pollster:N', axis=alt.Axis(title=None)),
    color='pollster:N'
).transform_filter(
    brush
).properties(
    width=400
)

# Put all together
(points & bars).properties(
    title='Recently Reported Approve Rate of Joe Biden'
).configure_title(
    anchor='start'
).configure_view(
    strokeWidth=0
)



# Visualization 3: Approval Ratings for Joe Biden 2021-2023

We will replicate the following visualization:
![alt text](https://raw.githubusercontent.com/dallascard/SI649_public/main/altair_hw3/fig/3_1.jpg?raw=true)

**Description of the visualization (static):**
*   Use *df2* for this exercise
*   This visualization has 4 components: **line chart**, **vertical line**, **points** and **texts** 

**Description of the visualization (interactive):**
1. Enable zooming and panning along the x-axis. (The gif below only displays the line chart.)
![alt text](https://raw.githubusercontent.com/dallascard/SI649_public/main/altair_hw3/fig/3_2.gif?raw=true)
2. Display a vertical line that moves with the mouse. This will require you to add an additional chart component (let's call it **vLine**).
![alt text](https://raw.githubusercontent.com/dallascard/SI649_public/main/altair_hw3/fig/3_3.gif?raw=true)
3. Display the intersection of the **vLine** with the **line chart** as 2 circles (let's call these two circles **intersection dots**). 
4. When hovering over these **intersection dots**, display the *ratings* in a text label.  
![alt text](https://raw.githubusercontent.com/dallascard/SI649_public/main/altair_hw3/fig/3_4.gif?raw=true)

**Sample style settings (optional):**
Here's a list of default style settings we used to generate the graph.

* line chart: size=2.5
* vLine: size=4, color="lightgray", initial opacity = 0 
* intersection dots: size=90
* text label: fontSize=14, align='left', dx=7

**Hints**

* We only want to enable zooming and panning along the x-axis.
* There are multiple ways of implementing the **vLine**. Here is one of them: 
> 1) Use `mark_rule` to generate a line for every data point and set each line's opacity to 0.

> 2) When the mouse hovers over a line, display it by changing its opacity. 

* The implementation of the **intersection dots** is similar to that of the **vLine**. Do you need a new selection/condition for the **intersection dots**?

In [32]:
# Create a selection for zooming and panning across the x-axis
pan_x = alt.selection_interval(bind='scales', encodings=['x'])

# Create a selection and condition for the vertical line, annotation dots, and text annotations
nearest = alt.selection_point(nearest=True, on='mouseover', fields=['timestamp'], empty=False)

# Create the base chart and filter to All polls
line = alt.Chart(df2).mark_line(interpolate='basis', size=2.5).transform_filter(
    alt.datum.subgroup == 'All polls'
).encode(
    x=alt.X('timestamp:T', title=None),
    y='rate:Q',
    color='choice:N'
).add_params(pan_x)

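# Vertical line: only the rule at the hovered timestamp survives the filter
rules = alt.Chart(df2).mark_rule(color='lightgray', size=4).encode(
    x='timestamp:T',
).transform_filter(
    alt.datum.subgroup == 'All polls'
).transform_filter(
    nearest
)

# Invisible selectors that capture the nearest timestamp; filtered to All polls
# so the selected timestamp always exists in the line chart
selectors = alt.Chart(df2).mark_point().encode(
    x='timestamp:T',
    opacity=alt.value(0),
).transform_filter(
    alt.datum.subgroup == 'All polls'
).add_params(
    nearest
)

# Intersection dots: visible only at the hovered timestamp
points = line.mark_point(size=90).encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)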

# Text labels next to the intersection dots
text = line.mark_text(align='left', dx=7, fontSize=14).encode(
    text=alt.condition(nearest, alt.Text('rate:Q', format='.2f'), alt.value(' '))
)

# Put them all together
alt.layer(
    line, points, selectors, rules, text
).properties(
    width=500
)




# Visualization 4: Interactive smoothing (hosted on Hugging Face using Panel)

![alt text](https://raw.githubusercontent.com/dallascard/SI649_public/main/altair_hw3/fig/5_1.jpg?raw=true)

**Description of the visualization (static):**
* Use *df2* for this exercise
* The base chart should plot each poll's **approval** rating as a small grey point
* It should also plot the **smoothed** time series of approval ratings as a red line, using a moving window smoother

**Description of the visualization (interactive):**
* A dropdown box should allow the user to select between showing "Adults", "Voters", or "All polls"
* A date range slider should allow changing the beginning and end of the time series that is shown
* A slider should control the window size of the moving average, allowing the user to make the line more or less smooth
* The min and max values for the y-axis should remain the same as the user changes the date slider

![alt text](https://raw.githubusercontent.com/dallascard/SI649_public/main/altair_hw3/fig/panel.gif?raw=true)


**Sample style settings (optional):**
Here's a list of default style settings we used to generate the graph.
* The line is red with size=2
* The points are grey, with size=2 and opacity=0.7
* The domain was set to [30, 60]

**Hints**
* To compute the moving average smoothing, you can use the `rolling()` function in Pandas
* To get the smoothed time series to line up with the points, you can use the `shift()` function
* To add the dropdown box and sliders, use the corresponding widgets in **Panel**
* Because you are using Panel, you can do calculations, like computing the moving average, outside of the Altair chart specification
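To see why `shift()` lines the smoothed series up with the points, here is a small sketch on a hypothetical series (`rate`, made-up values) with an assumed odd window size. `rolling(window=w).mean()` is a trailing average, so the curve lags; shifting by `-(w // 2)` moves each average to the middle row of its window. For odd windows this matches pandas' own `center=True` option.

```python
import pandas as pd

# Hypothetical approval values, one per row
rate = pd.Series([40.0, 42.0, 44.0, 43.0, 45.0, 47.0])

window = 3  # assumed odd window size

# Trailing average: each value averages the current row and the
# (window - 1) rows before it
trailing = rate.rolling(window=window).mean()

# Shift back half a window so each average sits at the center of its rows
centered = trailing.shift(-(window // 2))

# pandas' built-in centered window gives the same result for odd windows
builtin = rate.rolling(window=window, center=True).mean()
```

Note that an integer window counts rows, not days, so unevenly spaced timestamps are smoothed over a varying time span.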
  



In [33]:
# Import Panel

import panel as pn

# Enable Panel extensions
pn.extension('vega')

# Define a function to create and return a plot

def create_plot(subgroup, date_range, moving_av_window):

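    # Keep only approval ratings for the chosen subgroup
    df = df2.loc[(df2['choice'] == 'approve') & (df2['subgroup'] == subgroup)]
    # Keep rows whose calendar day falls inside the selected range (inclusive);
    # pd.Timestamp accepts both date and datetime values from the slider
    start_date = pd.Timestamp(date_range[0]).normalize()
    end_date = pd.Timestamp(date_range[1]).normalize()
    day = df['timestamp'].dt.normalize()
    df = df.loc[(day >= start_date) & (day <= end_date)].copy()
    # Trailing moving average, shifted back half a window to center it on the points
    df['smoothed'] = df['rate'].rolling(window=moving_av_window).mean().shift(-(moving_av_window // 2))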
    
    # Line chart
    line = alt.Chart(df).mark_line(
        color='red',
        size=2
    ).encode(
        alt.X('timestamp:T', axis=alt.Axis(title=None)),
        alt.Y('smoothed:Q', scale=alt.Scale(domain=[30,60]), axis=alt.Axis(title='mov_avg'))
    )

    # Scatter plot with individual polls
    scatter = alt.Chart(df).mark_point(
        size=2,
        color='grey',
        opacity=0.7
    ).encode(
        alt.X('timestamp:T'),
        alt.Y('mean(rate):Q', scale=alt.Scale(domain=[30,60]), axis=alt.Axis(title='approve'))
    )

    # Put them together
    plot = alt.layer(scatter, line)

    # Return the combined chart
    return plot

# Create the selection widget
subgroups=list(df2['subgroup'].unique())
subgroup_select = pn.widgets.Select(name='Select', options=subgroups)

# Create the slider for the date range
date_range_slider = pn.widgets.DateRangeSlider(
    name='Date Range Slider',
    start=dt.datetime(2021,1,26).date(),
    end=dt.datetime(2023,2,14).date(),
    value=(dt.datetime(2021,1,26).date(), dt.datetime(2023,2,14).date())
)

# Create the slider for the moving average window
moving_slider = pn.widgets.IntSlider(
    name='Moving average window',
    start = 1,
    end = 100,
    step = 1,
    value= 1
)

# Bind the widgets to the create_plot function (the plot is rebuilt whenever a widget changes)
bound_plot = pn.bind(create_plot, subgroup_select, date_range_slider, moving_slider)

# Combine everything in a Panel Column to create an app
dashboard = pn.Column(bound_plot, subgroup_select, date_range_slider, moving_slider)

# Mark the app as servable so `panel serve` can serve it
dashboard.servable()



# Bonus: Deploy your completed visualization 4 as a Space on Hugging Face

For 1 bonus point, set up a public Space on Hugging Face, configured to use Panel.

Modify the `requirements.txt` file and the `app.py` file with your code.

Make sure the visualization works as intended there, and then *paste the URL of your space below*.

Note: you will only get credit for this part if your visualization works and can be accessed by the teaching team at the time of grading.

<a href='https://huggingface.co/spaces/gaoyujia/SI649_hw3_viz4'>My Hugging Face Page</a>