# Task 3 — Interactive & Extended Vega-Lite Visualizations

This task extends the exploratory visualizations from **Task 2** with Vega-Lite’s interactive and compositional features (written in Altair) to support richer analytical exploration.  
Each chart below builds directly on its earlier static version and adds interactivity through tooltips, click highlighting, parameter-driven filtering, and layering.

```python
import pandas as pd
import altair as alt
from vega_datasets import data as vega_data

# Embed every row in the chart spec (Altair otherwise stops at 5,000 rows)
alt.data_transformers.disable_max_rows()
# Render charts as SVG in the notebook
alt.renderers.enable('svg')

df = pd.read_csv("../data/data_science_job_posts_2025_clean.csv", low_memory=False)
```

## Interactive Visualization 1 — Skill Demand Ranking

**Goal**  
Reveal the most frequently requested skills in job postings and let users explore different “Top-N” cutoffs and focus on specific skills.

**Vega-Lite Design & Interactivity**  
- **Mark:** Horizontal bars (`mark_bar`).  
- **Tooltip:** shows the skill and its posting count.  
- **Params & Selections:**  
  - `TopN`: a **slider** (5–50, step 5) that keeps only the top-N skills, using a `rank` window transform over `num_postings` followed by a filter on `rank <= TopN`.  
  - `sel`: a **click selection** that highlights the chosen skill (blue) and dims the others (gray).
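The `TopN` filter keeps every row whose window `rank` is at most N. When the window is sorted by `num_postings`, Vega-Lite's `rank` gives tied values the same rank and skips the following ranks, which matches pandas' `rank(method="min")`. The sketch below mirrors that logic for cross-checking which skills the chart should show; `top_n_skills` is a hypothetical helper and the counts are made-up illustration data, not results from this dataset.

```python
import pandas as pd

def top_n_skills(counts: pd.DataFrame, n: int) -> pd.DataFrame:
    """Mirror the chart's window + filter: rank by num_postings (descending),
    tied counts share the lowest rank, then keep rows with rank <= n."""
    ranked = counts.assign(
        rank=counts["num_postings"].rank(method="min", ascending=False).astype(int)
    )
    # Stable sort so tied skills keep their original order
    return ranked[ranked["rank"] <= n].sort_values("rank", kind="mergesort")

# Hypothetical counts: "sql" and "r" tie for second place
demo = pd.DataFrame({
    "skill": ["python", "sql", "r", "spark", "excel"],
    "num_postings": [120, 90, 90, 40, 10],
})
print(top_n_skills(demo, 2)["skill"].tolist())  # → ['python', 'sql', 'r']
```

Because of ties, N=2 can return three bars. If exactly N bars are preferred, the `row_number` window operation could be used in place of `rank`.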

**How to Use**  
- Move the **Top-N** slider to show the top 5–50 skills.  
- **Click** any bar to highlight a specific skill.

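```python
# `skills` holds a stringified list per posting (assumed format: "['python', 'sql']");
# split it into a Python list and explode to one row per (posting, skill).
# Non-string values (e.g. NaN) become an empty list instead of raising.
skills = (
    df.assign(skills=df["skills"].apply(
        lambda s: [x.strip(" '\"") for x in s.strip("[]").split(",")] if isinstance(s, str) else []
    ))
      .explode("skills")
)

# Count postings per skill; empty strings are excluded and NaN (from empty lists) is dropped by value_counts
skill_counts = (
    skills.loc[skills["skills"] != "", "skills"]
          .value_counts()
          .rename_axis("skill")
          .reset_index(name="num_postings")
)

# Slider-bound parameter: how many top-ranked skills to show
topN = alt.param(name="TopN", value=15, bind=alt.binding_range(name="Top N", min=5, max=50, step=5))
# Click a bar to select that skill; double-click clears the selection
sel = alt.selection_point(fields=["skill"], on="click", clear="dblclick")

chart = (
    alt.Chart(skill_counts, title="Skill Demand Ranking (interactive)")
      # Sort explicitly: without `sort`, Vega-Lite ranks rows in the order it sees them, not by count
      .transform_window(rank="rank(num_postings)",
                        sort=[alt.SortField("num_postings", order="descending")])
      .transform_filter(alt.datum.rank <= topN)
      .mark_bar()
      .encode(
          y=alt.Y("skill:N", sort="-x", title=None),
          x=alt.X("num_postings:Q", title="Count in Job Postings"),
          tooltip=["skill:N", alt.Tooltip("num_postings:Q", title="Count")],
          # Selected skill in blue, others in gray (all bars blue while nothing is selected)
          color=alt.condition(sel, alt.value("#1f77b4"), alt.value("#bbbbbb"))
      )
      .add_params(topN, sel)
      .properties(width=800, height=550)
    #   .save("../visualizations/vi_1_interactive_topN_skill_demand_ranking.png")
)
chart
```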
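The list-splitting lambda above appears to assume each `skills` value is a stringified Python list such as `"['python', 'sql']"`, and it would split a skill name that itself contains a comma. A minimal sketch of a stricter parser under that same format assumption, using `ast.literal_eval`; `parse_skills` is a hypothetical helper, and any value that is not a valid list literal falls back to an empty list.

```python
import ast

def parse_skills(raw) -> list[str]:
    """Parse a stringified list like "['python', 'sql']" into clean skill names.
    Non-strings (e.g. NaN) and malformed values yield an empty list."""
    if not isinstance(raw, str):
        return []
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        return []
    if not isinstance(value, (list, tuple)):
        return []
    # Strip whitespace and drop empty entries
    return [str(x).strip() for x in value if str(x).strip()]

print(parse_skills("['python', 'sql', 'machine learning']"))  # → ['python', 'sql', 'machine learning']
print(parse_skills(float("nan")))  # → []
```

In the notebook, `df["skills"].apply(parse_skills)` could stand in for the lambda before `.explode("skills")`.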


## Interactive Visualization 2 — Median Salary by U.S. State

**Goal**  
Expose geographic patterns in compensation and posting volume, with controls to filter low-sample states.

**Vega-Lite Design & Interactivity**   
- **Params & Controls:**  
  - `MinN`: a **slider** (1–200) that hides states with fewer than the chosen number of postings.  
  - `Metric`: **radio buttons** that **toggle the color encoding** between *Median Salary* and *# Postings*.  
- **Mark:** `mark_geoshape` with white borders; AlbersUSA projection.  
- **Tooltip:** state name, median salary, and posting count (postings with a non-missing `salary_mid`).
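The map joins the TopoJSON state shapes to `state_med` on the numeric FIPS code (`id` in the `us_10m` file, `fips_int` in the data) and then drops states whose count is below `MinN`; shapes with no matching row have no `n` and are filtered out as well. The same logic in pandas, as a minimal sketch for checking which states will render; `states_shown` is a hypothetical helper and the state rows are made-up illustration values.

```python
import pandas as pd

def states_shown(shape_ids, med: pd.DataFrame, min_n: int) -> pd.DataFrame:
    """Left-join shape ids to state aggregates on FIPS, then keep rows with n >= min_n.
    Unmatched shapes get NaN for n, and NaN >= min_n evaluates to False."""
    shapes = pd.DataFrame({"id": list(shape_ids)})
    joined = shapes.merge(med, how="left", left_on="id", right_on="fips_int")
    return joined[joined["n"] >= min_n]

# Hypothetical aggregates for CA (FIPS 6), NY (36), WY (56); TX (48) has no row
demo_med = pd.DataFrame({
    "state": ["CA", "NY", "WY"],
    "fips_int": [6, 36, 56],
    "median_salary": [150000.0, 140000.0, 95000.0],
    "n": [180, 120, 1],
})
shown = states_shown([6, 36, 48, 56], demo_med, min_n=2)
print(shown["state"].tolist())  # → ['CA', 'NY']
```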

**How to Use**  
- Increase **Min # postings** to focus on states with larger sample sizes.  
- Switch **Color by** to see either **pay** or **demand concentration**.

```python
# Aggregate: median salary per state.
# `n` counts postings with a non-missing `salary_mid`, not every posting.
state_med = (
    df.groupby(["state", "fips_int"], as_index=False)
      .agg(median_salary=("salary_mid", "median"),
           n=("salary_mid", "count"))
      .dropna(subset=["fips_int"])
      .assign(fips_int=lambda d: d["fips_int"].astype(int))  # TopoJSON ids are integer FIPS codes
)

# Controls
minN   = alt.param("MinN", value=2, bind=alt.binding_range(name="Min # postings", min=1, max=200, step=1))
metric = alt.param("Metric", value="Median Salary",
                   bind=alt.binding_radio(name="Color by:", options=["Median Salary", "# Postings"]))

# Base map: join state aggregates onto the shapes by FIPS code, then apply MinN.
# Shapes without a matching row have no `n` and are filtered out too.
base = (
    alt.Chart(alt.topo_feature(vega_data.us_10m.url, "states"))
      .transform_lookup(
          lookup="id",
          from_=alt.LookupData(state_med, key="fips_int", fields=["state", "median_salary", "n"])
      )
      .transform_filter(alt.datum.n >= minN)
      .project(type="albersUsa")
      .properties(width=800, height=550)
)

tooltip = ["state:N",
           alt.Tooltip("median_salary:Q", title="Median", format="$.0f"),
           alt.Tooltip("n:Q", title="# postings")]

# Two layers; the radio param decides which layer keeps its data
(
    base.mark_geoshape(stroke="white", strokeWidth=0.5)
        .transform_filter("Metric == 'Median Salary'")
        .encode(
            color=alt.Color("median_salary:Q", title="Median Salary (USD)", scale=alt.Scale(scheme="blues")),
            tooltip=tooltip
        )
  +
    base.mark_geoshape(stroke="white", strokeWidth=0.5)
        .transform_filter("Metric == '# Postings'")
        .encode(
            color=alt.Color("n:Q", title="# Postings", scale=alt.Scale(scheme="greens")),
            tooltip=tooltip
        )
).add_params(minN, metric).resolve_scale(
    # Layers share scales by default; keep salary and counts on separate color scales
    color="independent"
).properties(
    title=alt.TitleParams("Interactive Map by State", anchor="start", fontSize=14)
# ).save("../visualizations/vi_2_interactive_median_salary_by_state.png")
)
```
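In a layered chart, Vega-Lite shares scales across layers unless a `resolve_scale` setting says otherwise, so the two color encodings (`median_salary` and `n`) would be drawn against one merged domain. The arithmetic below is a rough illustration with made-up ranges, assuming a linear color scale: on a shared domain that reaches salary values, every posting count sits in the bottom sliver of the ramp. `.resolve_scale(color="independent")` on the layered chart gives each layer its own color scale and legend.

```python
def position_on_ramp(value: float, lo: float, hi: float) -> float:
    """Fraction of the way along a linear color ramp whose domain is [lo, hi]."""
    return (value - lo) / (hi - lo)

# Illustrative ranges only: salaries up to 200,000 and posting counts up to 200
shared_lo, shared_hi = 1, 200_000   # union of both fields' domains
count_lo, count_hi = 1, 200         # domain of `n` alone

for n in (1, 50, 200):
    shared = position_on_ramp(n, shared_lo, shared_hi)
    independent = position_on_ramp(n, count_lo, count_hi)
    print(f"n={n:>3}: shared {shared:.4f}  independent {independent:.2f}")
```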