# Applied Data Visualization – Homework 5
*https://www.dataviscourse.net/2024-applied/*


In this homework we will create charts using Vega-Altair. 



## Your Info and Submission Instructions

* *First name: Logan*
* *Last name: Correa*
* *Email: u1094034@umail.utah.edu*
* *UID: u1094034*



For your submission, please do the following things: 
* **rename the file to `HW5_lastname.ipynb`**
* **include all files that you need to run the homework, including the data file provided** 
* **don't use absolute paths; use a relative path to the same directory for referencing data**

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Needed for this homework
import altair as alt

plt.style.use('default')
# This next line tells jupyter to render the images inline
%matplotlib inline
import matplotlib_inline
# This renders your figures as vector graphics AND gives you an option to download a PDF too
matplotlib_inline.backend_inline.set_matplotlib_formats('svg', 'pdf')

# Part 1: Avalanche Calendar

In this assignment, we will create an interactive visualization using Vega-Altair that has two linked views:
1. A calendar-like heatmap (day x month) of average number of avalanches by date, and
2. A bar chart of total avalanche count by year.
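As a cross-check, here is a minimal pandas sketch of what each view aggregates. It uses hypothetical toy data shaped like the `complete_data` table built below (one row per day, zeros included). The heatmap averages the daily count for each (month, day-of-month) cell across years, and the bar chart sums the counts per year. In the notebook itself, Altair performs these aggregations through the `mean(...)` and `sum(...)` shorthands.

```python
import pandas as pd

# Hypothetical toy input: one row per calendar day with a daily avalanche count
# (zeros included), shaped like `complete_data` in the notebook
daily = pd.DataFrame({
    'date': pd.to_datetime(['2010-01-05', '2011-01-05', '2012-01-05', '2011-02-10']),
    'avalanche_count': [2, 0, 4, 3],
})

# View 1 (heatmap): mean count for each (month, day-of-month) cell across all years
heatmap_values = (
    daily.groupby([daily['date'].dt.month.rename('month'),
                   daily['date'].dt.day.rename('day')])['avalanche_count']
    .mean()
)

# View 2 (bar chart): total count for each year
bar_values = daily.groupby(daily['date'].dt.year.rename('year'))['avalanche_count'].sum()

print(heatmap_values)
print(bar_values)
```

Jan 5 averages (2 + 0 + 4) / 3 = 2.0 across the three years. A missing zero row would have raised that average, which is why the zero-filling step matters.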

Chart requirements:
 - The bar chart should display the counts across all dates by default, but should be filtered to only the selected date (or dates) by clicking on the heatmap. 
 - When hovering over a date, a tooltip should appear with the date and the value (average number of avalanches on that date).
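The linked filter can be read as follows: keep only the rows whose (month, day) matches a selected heatmap cell, then re-sum per year. With nothing selected, every date counts. The pandas sketch below illustrates that logic on hypothetical toy data. In the chart, Vega-Lite evaluates the selection itself, so `yearly_totals` is only an illustrative helper, not part of Altair.

```python
import pandas as pd

# Hypothetical toy data shaped like `complete_data` (one row per day, zeros included)
daily = pd.DataFrame({
    'date': pd.to_datetime(['2010-01-05', '2011-01-05', '2011-01-06', '2012-03-01']),
    'avalanche_count': [2, 1, 5, 4],
})

def yearly_totals(df, selected=None):
    """Sum counts per year, optionally keeping only the selected (month, day) pairs.

    `selected=None` mimics an empty selection, where the bar chart shows all dates.
    """
    if selected:
        month_day = list(zip(df['date'].dt.month, df['date'].dt.day))
        df = df[[md in selected for md in month_day]]
    return df.groupby(df['date'].dt.year)['avalanche_count'].sum()

all_dates = yearly_totals(daily)                      # no cell selected
jan_5_only = yearly_totals(daily, selected={(1, 5)})  # one cell clicked
print(all_dates)
print(jan_5_only)
```

Years with no matching date drop out of the filtered result, just as their bars disappear from the filtered chart.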

See the video below for an example of interaction:

![An example of output interactivity](calendar_example.gif)

Hints:
- Similar to HW 2, you will need to create a data set with *all valid dates*, not only those that appear in the data. We need to account for zeros!
- Highly recommend browsing the Vega-Altair example gallery for help: https://altair-viz.github.io/gallery/index.html
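A tiny hypothetical example shows why zero-filling matters. If you count only the days that have records and then average, you overstate the typical count, because days without avalanches never appear in the grouped result. Reindexing onto a full `pd.date_range` with `fill_value=0` is one way to restore those days. The notebook's `complete_data` cell uses an equivalent left merge followed by `fillna(0)`.

```python
import pandas as pd

# Hypothetical avalanche records: one row per observed avalanche
records = pd.DataFrame({'Date': pd.to_datetime(['2010-01-01', '2010-01-01', '2010-01-03'])})

# Count avalanches per observed day (days with no record are simply absent)
per_day = records.groupby('Date').size()

# Reindex onto every calendar day in the range, inserting 0 for missing days
all_days = pd.date_range(start='2010-01-01', end='2010-01-04')
per_day_full = per_day.reindex(all_days, fill_value=0)

# Averaging with and without the zero days gives very different answers
mean_observed_only = per_day.mean()    # (2 + 1) / 2
mean_with_zeros = per_day_full.mean()  # (2 + 0 + 1 + 0) / 4
print(mean_observed_only, mean_with_zeros)
```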

In [3]:
# Read in data
avy_df = pd.read_csv('./avalanches.csv')

# Convert dates to the correct format
avy_df['Date'] = pd.to_datetime(avy_df['Date'])

# Filter out 2009 because its records are incomplete
avy_df['Year'] = avy_df['Date'].dt.year.astype('Int64')
avy_df = avy_df[avy_df['Year']>2009]

In [4]:
# Count the number of avalanches recorded on each day
# ('Date' was already converted to datetime in the previous cell)
avalanches_per_day = avy_df.groupby('Date').size().reset_index(name='avalanche_count')

# Build every calendar day from Jan 1 of the first kept year through the last
# recorded date, so that days without avalanches are represented
start = pd.Timestamp(year=avalanches_per_day['Date'].min().year, month=1, day=1)
full_date_range = pd.DataFrame({'Date': pd.date_range(start=start,
                                                      end=avalanches_per_day['Date'].max())})

# Left-merge the counts onto the complete date range; days with no avalanches become 0
complete_data = pd.merge(full_date_range, avalanches_per_day, on='Date', how='left').fillna(0)

# The merge introduces NaN (float), so convert avalanche_count back to integers
complete_data['avalanche_count'] = complete_data['avalanche_count'].astype(int)

# Rename Date to lowercase date for the Altair encodings below
complete_data = complete_data.rename(columns={'Date': 'date'})

#complete_data.head()

In [5]:
# Point selection over the (day, month) cells: click a cell to select that date,
# shift-click to add more dates; with nothing selected, every date counts
date_selection = alt.selection_point(encodings=['x', 'y'])

# Define the heatmap
heatmap = alt.Chart(complete_data, title="Average Avalanche Count by Date").mark_rect().encode(
    alt.X("date(date):O").title("Day").axis(format="%e", labelAngle=0),  # Day of month
    alt.Y("month(date):O").title("Month"),
    color=alt.condition(
        date_selection,
        alt.Color("mean(avalanche_count):Q", title="Avg").legend(None),
        alt.value("lightgray")
    ),
    opacity=alt.condition(date_selection, alt.value(1), alt.value(.7)),
    tooltip=[
        alt.Tooltip("monthdate(date)", title="Date"),
        alt.Tooltip("mean(avalanche_count):Q", title="Avg", format=".2f"),  # Average avalanche count
    ]
).properties(
    width=400,
    height=200
).add_params(
    date_selection
)

# Define the bar chart, filtered by the selected date(s) from the heatmap
barchart = alt.Chart(complete_data, title="Avalanches by Year").mark_bar().encode(
    alt.X("year(date):O").title("Year"),  # One bar per year
    alt.Y("sum(avalanche_count):Q").title("Avalanche Count")  # Total avalanches per year
).properties(
    width=300,
    height=200
).transform_filter(
    date_selection
)

# Combine heatmap and bar chart side by side
heatmap | barchart

# Part 2: Bubble Chart, Revisited

For this assignment we are recreating the Bubble Chart from Homework 3 but this time using exclusively Vega-Altair. 

A refresher on the requirements:
- Each `Discipline` bubble and label should be colored according to the `Sport` variable. You can pick your own colors, as long as they are discernible.
- Each bubble's size should depend on the number of gold medals awarded. (This can be calculated as the number of unique `Event`-`Gender` pairs in the data set.)
- There should be a label noting that the 1940 and 1944 Olympic Games were not held (due to World War II).

Plus additional requirement:
- When hovering over a bubble, a tooltip with all the underlying data should appear.
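The bubble-size rule can be sketched in pandas: filter to gold, keep one row per unique (Gender, Event) pair within each Year-Discipline cell, then count. The toy rows below are hypothetical. In data where a team medal appears once per athlete, counting rows directly would overcount, which is why the sketch deduplicates first. Grouping by Year and Discipline only (not by Country) yields exactly one bubble per cell.

```python
import pandas as pd

# Hypothetical medal rows: a team gold appears once per athlete, so rows != medals
medals = pd.DataFrame({
    'Year': [1924, 1924, 1924, 1924, 1928],
    'Discipline': ['Bobsleigh'] * 5,
    'Gender': ['Men'] * 5,
    'Event': ['Four-man', 'Four-man', 'Four-man', 'Two-man', 'Five-man'],
    'Medal': ['Gold', 'Gold', 'Silver', 'Gold', 'Gold'],
})

gold = medals[medals['Medal'] == 'Gold']

# One gold medal per unique (Gender, Event) pair within a Year-Discipline cell
gold_counts = (
    gold.drop_duplicates(['Year', 'Discipline', 'Gender', 'Event'])
        .groupby(['Year', 'Discipline'])
        .size()
        .rename('Gold_Medals')
)
print(gold_counts)
```

Deduplicating on the two columns does the same job as the notebook's `Gender_Event` string concatenation combined with `nunique`.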

![A bubble grid chart of medals for winter olympics](bubble_chart.svg)

We are giving you the code necessary to prepare the data set, since you have done that already for HW3. This is primarily an exercise in precise formatting using Vega-Altair.

Hints:
- Notice that we converted the `Year` variable to a datetime. Think about how you can leverage that in your encoding.
- There is a variety of ways to create an annotation box. One way to do it is to create a "dummy" DataFrame that you use to `.encode()` your `.mark_rect()` and `.mark_text()`. Another could be to use the `alt.datum()` command (see https://altair-viz.github.io/user_guide/encodings/index.html#datum-and-value).
- Check out `.mark_rect()` properties here: https://altair-viz.github.io/user_guide/marks/rect.html
- Again recommending the Vega-Altair example gallery for help: https://altair-viz.github.io/gallery/index.html
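Two details from these hints can be sketched in pandas. First, `pd.to_datetime(..., format='%Y')` is a vectorized way to turn bare years into January 1 datetimes, matching what the row-wise `apply` below produces. Second, a "dummy" annotation frame built with real datetimes (not strings) has the same dtype as the `Year` column, so every layer can share a single temporal x encoding such as `'year(Year):T'`. The column names `Year_End` and `Note` are hypothetical; they only need to match whatever fields the rect and text layers encode.

```python
import pandas as pd

# Vectorized year conversion: '%Y' parses a bare year as January 1 of that year
years = pd.Series([1924, 1936, 1948])
year_dates = pd.to_datetime(years.astype(str), format='%Y')

# Hypothetical dummy annotation frame spanning the 1940/1944 gap, using real
# datetimes so its dtype matches the medal data's Year column
gap = pd.DataFrame({
    'Year': pd.to_datetime(['1939-01-01']),
    'Year_End': pd.to_datetime(['1945-01-01']),
    'Note': ['Games Not Held'],
})
print(year_dates)
print(gap.dtypes)
```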

In [21]:
medals_df = pd.read_csv('./winter.csv')

# Convert Year to a datetime
medals_df['Year'] = medals_df['Year'].apply(lambda x: pd.to_datetime(f"{x}-01-01"))

# Concatenate Gender & Event to get unique gender/event variable
medals_df['Gender_Event'] = medals_df['Gender'] + medals_df['Event']

# Count the unique gender/event pairs for every year, discipline, country, and medal type
medals_df_grouped = (
    medals_df
    .groupby(['Year', 'City', 'Sport', 'Discipline', 'Country', 'Medal'])
    .agg(Count = ('Gender_Event', 'nunique'))
    .reset_index()
)

#display(medals_df_grouped)

In [50]:
# Keep only gold medals, then sum across countries so each Year-Discipline gets one bubble
# (without this step, overlapping circles are drawn, one per winning country)
gold_medals = (
    medals_df_grouped[medals_df_grouped['Medal'] == 'Gold']
    .groupby(['Year', 'City', 'Sport', 'Discipline'], as_index=False)
    .agg(Count=('Count', 'sum'),
         Countries=('Country', lambda s: ', '.join(sorted(s))))
)

# Create the bubble chart
bubble_chart = alt.Chart(gold_medals).mark_circle(opacity=1).encode(
    x=alt.X('year(Year):T', title='Year'),  # Use year from the Year datetime
    y=alt.Y('Discipline:N', title='Discipline',
            sort=alt.EncodingSortField(field='Sport', order='ascending')  # Group disciplines by Sport
           ).axis(domain=False, ticks=False),
    size=alt.Size('Count:Q', title='Number of Gold Medals').legend(symbolOpacity=0.3),  # Total gold medals
    color=alt.Color('Sport:N', title='Sport'),  # Color by Sport
    tooltip=[  # Tooltip with all the underlying data
        alt.Tooltip('year(Year):T', title='Year'),
        alt.Tooltip('City:N', title='City'),
        alt.Tooltip('Sport:N', title='Sport'),
        alt.Tooltip('Discipline:N', title='Discipline'),
        alt.Tooltip('Countries:N', title='Winning Countries'),
        alt.Tooltip('Count:Q', title='Gold Medals')
    ]
).properties(
    width=800,
    height=400,
    title="Winter Olympics Gold Medals",
    view=alt.ViewConfig(stroke=None)
)

# Shade the gap when the Olympics were not held (1940 and 1944).
# Use real datetimes and the same temporal x encoding as the bubble chart,
# so all layers share one time scale instead of mixing temporal and ordinal scales
rect = alt.Chart(pd.DataFrame({
    'Year': pd.to_datetime(['1939-01-01']),
    'Year_End': pd.to_datetime(['1945-01-01']),  # End of the shaded range
    'Note': ['Games Not Held']
})).mark_rect(color='lightgray').encode(
    x=alt.X('year(Year):T'),
    x2=alt.X2('year(Year_End):T')
)

# Add text at the horizontal center of the rectangle (1942 is midway between 1939 and 1945)
text = alt.Chart(pd.DataFrame({
    'Year': pd.to_datetime(['1942-01-01']),
    'Note': ['Games Not Held']
})).mark_text(align='center', baseline='middle', fontSize=8).encode(
    x=alt.X('year(Year):T'),
    y=alt.value(200),  # Vertical center of the 400-px-high chart
    text='Note:N',
    color=alt.value('black')
)

# Combine bubble chart with the annotations
bubble_chart + rect + text


# Bonus: Bubble Chart Interactivity

For bonus points, add any non-trivial interactivity to the Bubble Chart from Part 2. See our second Altair lecture for inspiration.

Options include brushing with another linked view, a widget that modifies the chart's appearance, a widget that filters the underlying data, etc.

# Grading Scheme

* Part 1: 5 points
* Part 2: 5 points
* Bonus: 2 points