## VI LAB 2 

## Data collection and preparation

For this project, the data was obtained directly from the NSF Award Search portal, which is the official source used by the National Science Foundation to publish information about funded grants (referred to administratively as “awards”). This source was chosen because it provides all the attributes required by the project specification, including state, directorate, award dates, and awarded amounts, and because it allows filtering by both time period and award status.

Two datasets were collected to fully satisfy the project requirements. The first dataset contains all NSF grants awarded during the last five years (2020–2024) and serves as the baseline for analyzing current funding distribution and evolution. The second dataset contains NSF grants that were explicitly terminated during the Trump administration (2017–2021), filtered using the “terminated” award status. This separation is intentional and necessary, as the project explicitly requires analyzing both recent grants and historical cancellations from a different political period.

Because the NSF portal limits how much can be exported at once, the 2020–2024 dataset was downloaded in several smaller date ranges and then merged into a single file. Each partial export uses the same columns, so the merged file covers the full period with a consistent structure.
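The merge step can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes each partial export is a CSV with an `award_id` column (the name used later in the notebook) and that overlapping date ranges may export the same award twice, so duplicates are dropped by that key. The sample rows are made up, and the standard library is used only so the sketch runs without dependencies.

```python
import csv
import io


def merge_exports(chunks, key="award_id"):
    """Concatenate CSV exports and skip rows whose key was already seen."""
    seen = set()
    merged = []
    for chunk in chunks:
        for row in csv.DictReader(chunk):
            if row[key] in seen:
                continue  # same award exported by two overlapping date ranges
            seen.add(row[key])
            merged.append(row)
    return merged


# Two tiny in-memory "exports" with one overlapping award (hypothetical data).
part_a = io.StringIO("award_id,state,award_amount\n1,CA,1000\n2,NY,2500\n")
part_b = io.StringIO("award_id,state,award_amount\n2,NY,2500\n3,TX,400\n")

merged_rows = merge_exports([part_a, part_b])
print([row["award_id"] for row in merged_rows])
```

Comparing the merged row count against the sum of the partial exports is a quick way to see how many duplicates the overlap produced.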

## Data cleaning

All major data cleaning was performed in OpenRefine to keep the Python notebook focused on visualization rather than preprocessing. In OpenRefine, column names were standardized across datasets, unnecessary administrative fields were removed, and monetary values were converted to numeric format. A derived year attribute was created from the award start date to support temporal analysis. Additionally, a categorical flag (`cancelled_trump`) was introduced to clearly distinguish between baseline grants and Trump-era terminated grants.

After cleaning, the datasets were exported as clean CSV files and loaded into the Python notebook. Only minimal preprocessing was performed in Python, consisting of type checks, column name normalization, and the creation of aggregated DataFrames for each visualization task.
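For readers without OpenRefine, the cleaning steps described above can be approximated in plain Python. The sketch below is only an illustration under assumptions: the raw field names (`Award Number`, `Start Date`, `Awarded Amount`), the dollar-formatted amounts, and the `MM/DD/YYYY` date format are hypothetical, not taken from the actual export. Only the output names (`award_id`, `state`, `award_amount`, `year`, `cancelled_trump`) match the notebook.

```python
from datetime import datetime


def clean_row(raw, cancelled_trump):
    """Return a cleaned copy of one raw award record (field names assumed)."""
    # Monetary value: strip currency formatting and convert to a number.
    amount_text = raw["Awarded Amount"].replace("$", "").replace(",", "").strip()
    # Derived attribute: calendar year of the award start date.
    start = datetime.strptime(raw["Start Date"], "%m/%d/%Y")
    return {
        "award_id": raw["Award Number"],
        "state": raw["State"],
        "award_amount": float(amount_text),
        "year": start.year,
        # Flag separating baseline grants from Trump-era terminated grants.
        "cancelled_trump": cancelled_trump,
    }


# One made-up record in the assumed raw format.
raw = {
    "Award Number": "0000001",
    "State": "CA",
    "Awarded Amount": "$1,250,000.00",
    "Start Date": "09/01/2021",
}
cleaned = clean_row(raw, cancelled_trump=False)
print(cleaned)
```

Fields that are not copied into the returned dictionary are dropped, which mirrors the removal of unnecessary administrative columns.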

In [143]:
import pandas as pd
import altair as alt

# Performance: required by the project (datasets > 5000 rows)
alt.data_transformers.enable("vegafusion")


DataTransformerRegistry.enable('vegafusion')

In [144]:
# load datasets

base_path = "."

df_grants = pd.read_csv(
    f"{base_path}/NSF_Grants_Last5Years_Clean.csv"
)

df_trump = pd.read_csv(
    f"{base_path}/trump17-21-csv.csv"
)


In [145]:
# Ensure correct dtypes.
# astype(int) raises if a year is missing, so this also checks that every row has one.
df_grants["year"] = df_grants["year"].astype(int)
df_grants["award_amount"] = pd.to_numeric(df_grants["award_amount"], errors="coerce")

df_trump["year"] = df_trump["year"].astype(int)
df_trump["award_amount"] = pd.to_numeric(df_trump["award_amount"], errors="coerce")


In [146]:
# Drop rows with critical missing values
df_grants = df_grants.dropna(subset=["state", "directorate", "year"])
df_trump = df_trump.dropna(subset=["directorate"])


In [147]:
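# Plain Python ints keep the dropdown options and the initial value JSON-friendly
year_options = [int(y) for y in sorted(df_grants["year"].unique())]

year_selection = alt.selection_point(
    fields=["year"],
    bind=alt.binding_select(options=year_options, name="Year: "),
    # Initial value given as a list of field/value mappings, as in the later cells
    value=[{"year": year_options[0]}],
)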

state_selection = alt.selection_point(
    fields=["state"],
    bind=alt.binding_select(
        options=sorted(df_grants["state"].unique()),
        name="State: "
    )
)


## Q1: How are the grants distributed by states every year?

In [148]:
import altair as alt
import pandas as pd

# 0. SWITCH BACK TO THE DEFAULT DATA TRANSFORMER
# (this sets the data transformer, not the renderer; the aggregated q1_df is small)
alt.data_transformers.enable("default")

# 1. DATA PREPARATION
q1_df = (
    df_grants.groupby(["state", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)

# 2. CREATE SELECTIONS
# Year dropdown, restricted to the years present in q1_df
year_selection = alt.selection_point(
    fields=["year"],
    bind=alt.binding_select(
        options=sorted(q1_df["year"].unique()), name="Select Year: "
    ),
    value=[{"year": 2021}],
)

state_click = alt.selection_point(fields=["state"], empty="all")

# 3. CREATE CHARTS
q1_bars = (
    alt.Chart(q1_df)
    .mark_bar()
    .encode(
        x=alt.X("state:N", sort="-y", title="State"),
        y=alt.Y("grants_count:Q", title="Number of grants"),
        color=alt.condition(
            state_click,
            alt.Color(
                "grants_count:Q", scale=alt.Scale(scheme="blues"), title="Grants count"
            ),
            alt.value("lightgray"),
        ),
        tooltip=[
            alt.Tooltip("state:N", title="State"),
            alt.Tooltip("grants_count:Q", title="Grants"),
            alt.Tooltip("total_amount:Q", title="Total amount ($)", format=",.0f"),
        ],
    )
    .add_params(year_selection, state_click)
    .transform_filter(year_selection)
    .properties(width=750, height=380, title="Q1 — Grants by State")
)

q1_state_trend = (
    alt.Chart(q1_df)
    .mark_line(point=True)
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("grants_count:Q", title="Grants"),
        tooltip=[
            alt.Tooltip("state:N"),
            alt.Tooltip("year:O"),
            alt.Tooltip("grants_count:Q", title="Grants"),
            alt.Tooltip("total_amount:Q", title="Total amount ($)", format=",.0f"),
        ],
    )
    .transform_filter(state_click)
    .properties(width=750, height=180, title="Selected State — Grants over Time")
)

# 4. DISPLAY
(q1_bars & q1_state_trend)

In [149]:
import altair as alt
import pandas as pd
from vega_datasets import data

# 0. USE THE DEFAULT DATA TRANSFORMER (this is the data transformer, not the renderer)
alt.data_transformers.enable("default")

# 1. PREPARE THE DATA
# Aggregate grant counts and total amounts per state and year
q1_df = (
    df_grants.groupby(["state", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)

# 2. DEFINE THE MAP DATA
# We need to map State Abbreviations (AK, AL) to FIPS Codes (02, 01) for the map to work.
state_to_fips = {
    "WA": "53",
    "DE": "10",
    "DC": "11",
    "WI": "55",
    "WV": "54",
    "HI": "15",
    "FL": "12",
    "WY": "56",
    "PR": "72",
    "NJ": "34",
    "NM": "35",
    "TX": "48",
    "LA": "22",
    "NC": "37",
    "ND": "38",
    "NE": "31",
    "TN": "47",
    "NY": "36",
    "PA": "42",
    "AK": "02",
    "NV": "32",
    "NH": "33",
    "VA": "51",
    "CO": "08",
    "CA": "06",
    "AL": "01",
    "AR": "05",
    "VT": "50",
    "IL": "17",
    "GA": "13",
    "IN": "18",
    "IA": "19",
    "MA": "25",
    "AZ": "04",
    "ID": "16",
    "CT": "09",
    "ME": "23",
    "MD": "24",
    "OK": "40",
    "OH": "39",
    "UT": "49",
    "MO": "29",
    "MN": "27",
    "MI": "26",
    "RI": "44",
    "KS": "20",
    "MT": "30",
    "MS": "28",
    "SC": "45",
    "KY": "21",
    "OR": "41",
    "SD": "46",
}

# Create a lookup dataframe
fips_df = pd.DataFrame(list(state_to_fips.items()), columns=["state", "id"])
fips_df["id"] = fips_df["id"].astype(int)  # Ensure ID is integer to match topojson

# 3. AGGREGATE DATA BY STATE FIRST (for the map - sum across all years)
# This gives us one row per state with total funding
q1_map_agg = (
    q1_df.groupby("state")
    .agg(total_amount=("total_amount", "sum"))
    .reset_index()
)

# Merge FIPS IDs into aggregated data
q1_map_data_agg = q1_map_agg.merge(fips_df, on="state", how="inner")

# Keep the full data with FIPS for the trend chart
q1_map_data_full = q1_df.merge(fips_df, on="state", how="inner")

# 4. CREATE THE INTERACTION
# Clicking a state selects it. The fallback below only covers the selection
# constructor; add_params further down still requires Altair 5.
try:
    # Try new syntax first
    state_select = alt.selection_point(fields=["id"], empty="all")
except AttributeError:
    # Fallback for older Altair versions
    state_select = alt.selection_single(fields=["id"], empty="all")

# 5. DRAW THE MAP (Overview)
# We use a standard US map topology
us_states = alt.topo_feature(data.us_10m.url, "states")

map_chart = (
    alt.Chart(us_states)
    .mark_geoshape(stroke="white", strokeWidth=0.5)
    .transform_lookup(
        lookup="id",
        from_=alt.LookupData(q1_map_data_agg, "id", ["total_amount", "state"]),
        default=0  # Fallback for states without data (applies to every looked-up field, including "state")
    )
    .encode(
        color=alt.Color(
            "total_amount:Q", 
            scale=alt.Scale(scheme="blues"), 
            title="Total Funding ($)",
            legend=alt.Legend(format="$,.0f")
        ),
        tooltip=[
            alt.Tooltip("state:N", title="State"),
            alt.Tooltip("total_amount:Q", title="Total Funding", format="$,.0f")
        ],
    )
    .add_params(state_select)
    .project(type="albersUsa")
    .properties(
        width=700, height=400, title="Q1: Funding by State (Click to Filter Trend)"
    )
)

# 6. DRAW THE TREND LINE (Detail)
# Shows the trend for the SELECTED state
# Filter the trend data based on the selected state's id
trend_chart = (
    alt.Chart(q1_map_data_full)
    .mark_line(point=True, strokeWidth=2)
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("total_amount:Q", title="Total Amount ($)", axis=alt.Axis(format="$,.0f")),
        color=alt.value("lightblue"),
        tooltip=[
            alt.Tooltip("state:N", title="State"),
            alt.Tooltip("year:O", title="Year"),
            alt.Tooltip("total_amount:Q", title="Total Amount", format="$,.0f")
        ],
    )
    .transform_filter(state_select)  # This will filter by the id field in the selection
    .add_params(state_select)
    .properties(width=700, height=200, title="Funding Trend for Selected State")
)

# 7. COMBINE
q1_v2 = map_chart & trend_chart
q1_v2

In [150]:
import altair as alt
import pandas as pd

# 0. SETUP
alt.data_transformers.enable("default")

# 1. DATA PREPARATION
q1_df = (
    df_grants.groupby(["state", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)

# 2. CREATE SELECTIONS
min_year = int(q1_df["year"].min())
max_year = int(q1_df["year"].max())

slider = alt.binding_range(min=min_year, max=max_year, step=1, name="Select Year: ")

try:
    # Modern Altair
    year_select = alt.selection_point(
        name="year_select", fields=["year"], bind=slider, value=[{"year": max_year}]
    )
    state_select = alt.selection_point(
        name="state_select", fields=["state"], empty="all"
    )
except AttributeError:
    # Older Altair
    year_select = alt.selection_single(
        name="year_select", fields=["year"], bind=slider, init={"year": max_year}
    )
    state_select = alt.selection_single(
        name="state_select", fields=["state"], empty="all"
    )

# 3. CHART A: MAIN BAR CHART
bars = (
    alt.Chart(q1_df)
    .mark_bar()
    .encode(
        x=alt.X("state:N", sort="-y", title="State"),
        y=alt.Y("grants_count:Q", title="Number of Grants"),
        color=alt.condition(
            state_select,
            alt.Color("grants_count:Q", scale=alt.Scale(scheme="blues"), legend=None),
            alt.value("#f0f0f0"),  # Very light gray for unselected
        ),
        tooltip=[
            alt.Tooltip("state:N"),
            alt.Tooltip("grants_count:Q"),
            alt.Tooltip("total_amount:Q", format="$,.0f"),
        ],
    )
    .transform_filter(year_select)
    .properties(width=550, height=400, title="Grants Distribution")
)

# 4. CHART B: KPI TEXT (Subtle & Professional)
# We layer two text marks: one for the label, one for the number
base_text = (
    alt.Chart(q1_df).transform_filter(year_select).transform_filter(state_select)
)

# Layer 1: The Label "Total Funding"
label = base_text.mark_text(
    align="center",
    color="#888",  # Light gray
    fontSize=14,
    dy=-15,  # Move up slightly
).encode(text=alt.value("Total Funding"), y=alt.value(200), x=alt.value(100))

# Layer 2: The Value (The Number)
value = base_text.mark_text(
    align="center",
    color="#444",  # Darker gray (but not black)
    fontSize=24,  # Smaller than 40
    fontWeight="bold",
    dy=15,  # Move down slightly
).encode(
    text=alt.Text("sum(total_amount):Q", format="$,.0f"),
    y=alt.value(200),
    x=alt.value(100),
)

kpi_section = (label + value).properties(width=200, height=400)

# 5. DISPLAY (Combine and Attach Slider)
# Attaching add_params to the final object puts the slider at the bottom
q1_v3 = (
    (bars | kpi_section)
    .add_params(year_select, state_select)
    .resolve_scale(color="independent")
)

q1_v3

In [151]:
import altair as alt
import pandas as pd
from vega_datasets import data

# 0. SETUP
alt.data_transformers.enable("default")

# 1. DATA PREPARATION
q1_df = (
    df_grants.groupby(["state", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)

# 2. MAP DATA PREPARATION
state_to_fips = {
    "WA": "53",
    "DE": "10",
    "DC": "11",
    "WI": "55",
    "WV": "54",
    "HI": "15",
    "FL": "12",
    "WY": "56",
    "PR": "72",
    "NJ": "34",
    "NM": "35",
    "TX": "48",
    "LA": "22",
    "NC": "37",
    "ND": "38",
    "NE": "31",
    "TN": "47",
    "NY": "36",
    "PA": "42",
    "AK": "02",
    "NV": "32",
    "NH": "33",
    "VA": "51",
    "CO": "08",
    "CA": "06",
    "AL": "01",
    "AR": "05",
    "VT": "50",
    "IL": "17",
    "GA": "13",
    "IN": "18",
    "IA": "19",
    "MA": "25",
    "AZ": "04",
    "ID": "16",
    "CT": "09",
    "ME": "23",
    "MD": "24",
    "OK": "40",
    "OH": "39",
    "UT": "49",
    "MO": "29",
    "MN": "27",
    "MI": "26",
    "RI": "44",
    "KS": "20",
    "MT": "30",
    "MS": "28",
    "SC": "45",
    "KY": "21",
    "OR": "41",
    "SD": "46",
}
fips_df = pd.DataFrame(list(state_to_fips.items()), columns=["state", "id"])
fips_df["id"] = fips_df["id"].astype(int)
q1_full = q1_df.merge(fips_df, on="state", how="inner")

# 3. INTERACTION SETUP
min_year = int(q1_full["year"].min())
max_year = int(q1_full["year"].max())
slider = alt.binding_range(min=min_year, max=max_year, step=1, name="Select Year: ")

try:
    year_select = alt.selection_point(
        name="year_select", fields=["year"], bind=slider, value=[{"year": max_year}]
    )
    state_select = alt.selection_point(name="state_select", fields=["id"], empty="all")
except AttributeError:
    year_select = alt.selection_single(
        name="year_select", fields=["year"], bind=slider, init={"year": max_year}
    )
    state_select = alt.selection_single(name="state_select", fields=["id"], empty="all")

# 4. CHART A: THE MAP (Smaller, placed to the right of the bars)
us_states = alt.topo_feature(data.us_10m.url, "states")

map_base = (
    alt.Chart(us_states)
    .mark_geoshape(fill="lightgray", stroke="white")
    .project(type="albersUsa")
)

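# Start from the per-year data and attach each state's geometry, so the year
# filter sees every (state, year) row. Looking up q1_full from the topology
# instead would keep only one row (one year) per state.
map_layer = (
    alt.Chart(q1_full)
    .mark_geoshape(stroke="white")
    .transform_filter(year_select)
    .transform_lookup(
        lookup="id",
        from_=alt.LookupData(data=us_states, key="id"),
        as_="geo",
    )
    .encode(
        shape="geo:G",
        color=alt.Color(
            "total_amount:Q", scale=alt.Scale(scheme="blues"), title="Funding ($)"
        ),
        tooltip=["state:N", alt.Tooltip("total_amount:Q", format="$,.0f")],
    )
    .project(type="albersUsa")
)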

the_map = (map_base + map_layer).properties(
    width=300, height=350, title="Geographic Overview"  # REDUCED WIDTH (was 500)
)

# 5. CHART B: THE BAR CHART (Larger, placed to the left of the map)
the_bars = (
    alt.Chart(q1_full)
    .mark_bar()
    .encode(
        x=alt.X("total_amount:Q", title="Total Funding ($)"),
        y=alt.Y("state:N", sort="-x", title="State"),
        color=alt.condition(
            state_select,
            alt.Color("total_amount:Q", scale=alt.Scale(scheme="blues"), legend=None),
            alt.value("lightgray"),
        ),
        tooltip=["state", "total_amount"],
    )
    .transform_filter(year_select)
    .properties(
        width=400,  # INCREASED WIDTH (was 200)
        height=350,
        title="Ranked Funding by State",
    )
)

# 6. CHART C: THE EVOLUTION (Uniform Color)
the_trend = (
    alt.Chart(q1_full)
    .mark_line(point=True)
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("total_amount:Q", title="Total Funding ($)"),
        color=alt.value("steelblue"),  # UNIFORM COLOR (Fixed Blue)
        tooltip=["state", "year", alt.Tooltip("total_amount", format="$,.0f")],
    )
    .transform_filter(state_select)
    .properties(
        width=750,  # Matches the sum of top charts approx (300+400 + padding)
        height=200,
        title="Evolution of Funding (Selected State)",
    )
)

# 7. FINAL DASHBOARD
q1_v4 = (
    ((the_bars | the_map) & the_trend)
    .add_params(year_select, state_select)
    .resolve_scale(color="independent")
)

q1_v4

In [152]:
import altair as alt
import pandas as pd

# 0. SETUP
alt.data_transformers.enable("default")

# 1. DATA PREPARATION
# Per-state, per-year aggregates feed the bars, the trend line, and the KPI
q1_df = (
    df_grants.groupby(["state", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)

# 2. CREATE SELECTIONS
min_year = int(q1_df["year"].min())
max_year = int(q1_df["year"].max())

slider = alt.binding_range(min=min_year, max=max_year, step=1, name="Select Year: ")

try:
    # Modern Altair
    year_select = alt.selection_point(
        name="year_select", fields=["year"], bind=slider, value=[{"year": max_year}]
    )
    state_select = alt.selection_point(
        name="state_select", fields=["state"], empty="all"
    )
except AttributeError:
    # Older Altair
    year_select = alt.selection_single(
        name="year_select", fields=["year"], bind=slider, init={"year": max_year}
    )
    state_select = alt.selection_single(
        name="state_select", fields=["state"], empty="all"
    )

# 3. LEFT COLUMN: MAIN BAR CHART
bars = (
    alt.Chart(q1_df)
    .mark_bar()
    .encode(
        x=alt.X("state:N", sort="-y", title="State"),
        y=alt.Y("grants_count:Q", title="Number of Grants"),
        color=alt.condition(
            state_select,
            alt.Color("grants_count:Q", scale=alt.Scale(scheme="blues"), legend=None),
            alt.value("#f0f0f0"),  # Light gray for unselected
        ),
        tooltip=[
            alt.Tooltip("state:N"),
            alt.Tooltip("grants_count:Q"),
            alt.Tooltip("total_amount:Q", format="$,.0f"),
        ],
    )
    .transform_filter(year_select)
    .properties(width=500, height=400, title="Grants Distribution (Click a bar)")
)

# 4. RIGHT COLUMN: EVOLUTION + KPI

# A. Evolution Chart (Top Right)
# Shows the trend for the selected state over ALL years
trend_chart = (
    alt.Chart(q1_df)
    .mark_line(point=True, strokeWidth=3)
    .encode(
        x=alt.X("year:O", title="Year"),  # Ordinal to show specific years
        y=alt.Y("total_amount:Q", title="Total Amount ($)", axis=alt.Axis(format="~s")),
        color=alt.value("#4c78a8"),  # Blue to match the bar chart color scheme
        tooltip=[alt.Tooltip("year:O"), alt.Tooltip("total_amount:Q", format="$,.0f")],
    )
    .transform_filter(state_select)  # This is the magic: Filter by the click!
    .properties(
        width=250,
        height=180,  # Half the height of the main chart roughly
        title="History (Selected State)",
    )
)

# B. KPI Text (Bottom Right)
base_text = (
    alt.Chart(q1_df).transform_filter(year_select).transform_filter(state_select)
)

# Layer 1: Label
label = base_text.mark_text(align="center", color="#888", fontSize=14, dy=-15).encode(
    text=alt.value("Total Funding (Selected Year)"), y=alt.value(75), x=alt.value(110)
)

# Layer 2: Value
value = base_text.mark_text(
    align="center",
    color="#444",
    fontSize=24,
    fontWeight="bold",
    dy=15,
).encode(
    text=alt.Text("sum(total_amount):Q", format="$,.0f"),
    y=alt.value(75),  # Centered vertically in its box
    x=alt.value(110),  # Centered horizontally
)

kpi_section = (label + value).properties(width=225, height=155)

# 5. ASSEMBLE
# Right column is Trend on top of KPI
right_col = trend_chart & kpi_section

# Final is Bars on left of Right Column
q1_v5 = (
    (bars | right_col)
    .add_params(year_select, state_select)
    .resolve_scale(color="independent")
)

q1_v5

In [186]:
import altair as alt
import pandas as pd

# 0. SETUP
alt.data_transformers.enable("default")

# 1. DATA PREPARATION

# A. Yearly Data (Specific Years)
q1_yearly = (
    df_grants.groupby(["state", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)

# B. Global Data (Year 0 = "All Time")
q1_total = (
    df_grants.groupby(["state"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)
q1_total["year"] = 0  # Assign 0 for the aggregate

# C. Combine
q1_full = pd.concat([q1_yearly, q1_total], ignore_index=True)


# 2. INTERACTION SETUP
# Get list of years from data + Add 0
years = sorted(q1_yearly["year"].unique())
year_options = [0] + years
year_labels = ["All Years (Total)"] + [str(y) for y in years]

input_element = alt.binding_select(
    options=year_options, labels=year_labels, name="Select Year: "
)

try:
    # Modern Altair
    year_select = alt.selection_point(
        name="year_select", fields=["year"], bind=input_element, value=[{"year": 0}]
    )
    state_select = alt.selection_point(
        name="state_select", fields=["state"], empty="all"
    )
except AttributeError:
    # Older Altair
    year_select = alt.selection_single(
        name="year_select", fields=["year"], bind=input_element, init={"year": 0}
    )
    state_select = alt.selection_single(
        name="state_select", fields=["state"], empty="all"
    )


# 3. LEFT COLUMN: MAIN BAR CHART
# Uses q1_full to show either specific year stats or All-Time totals
bars = (
    alt.Chart(q1_full)
    .mark_bar()
    .encode(
        x=alt.X("state:N", sort="-y", title="State"),
        y=alt.Y("grants_count:Q", title="Number of Grants"),
        color=alt.condition(
            state_select,
            alt.Color("grants_count:Q", scale=alt.Scale(scheme="blues"), legend=None),
            alt.value("#f0f0f0"),  # Light gray for unselected
        ),
        tooltip=[
            alt.Tooltip("state:N"),
            alt.Tooltip("year:O", title="Year"),
            alt.Tooltip("grants_count:Q", title="Grants"),
            alt.Tooltip("total_amount:Q", format="$,.0f", title="Total Amount"),
        ],
    )
    .add_params(year_select, state_select)
    .transform_filter(year_select)  # <--- Listens to Dropdown
    .properties(width=500, height=400, title="Grants Distribution (Click a bar)")
)


# 4. RIGHT COLUMN: EVOLUTION + KPI

# A. Evolution Chart (Top Right)
# Uses q1_yearly ONLY (excludes Year 0 so it doesn't plot a weird point)
trend_chart = (
    alt.Chart(q1_yearly)
    .mark_line(point=True, strokeWidth=3)
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("total_amount:Q", title="Total Amount ($)", axis=alt.Axis(format="~s")),
        color=alt.value("#4c78a8"),
        tooltip=[
            alt.Tooltip("state:N", title="State"),
            alt.Tooltip("year:O"),
            alt.Tooltip("total_amount:Q", format="$,.0f"),
        ],
    )
    .transform_filter(state_select)  # Listens to Click
    .properties(
        width=250,
        height=180,
        title="History (Selected State)",
    )
)

# B. KPI Text (Bottom Right)
# Uses q1_full to show correct totals based on dropdown
base_text = (
    alt.Chart(q1_full).transform_filter(year_select).transform_filter(state_select)
)

# Layer 1: Label
label = base_text.mark_text(align="center", color="#888", fontSize=14, dy=-15).encode(
    text=alt.value("Total Funding (Selected Year)"), y=alt.value(75), x=alt.value(110)
)

# Layer 2: Value
value = base_text.mark_text(
    align="center",
    color="#444",
    fontSize=24,
    fontWeight="bold",
    dy=15,
).encode(
    text=alt.Text("sum(total_amount):Q", format="$,.0f"),
    y=alt.value(75),
    x=alt.value(110),
)

kpi_section = (label + value).properties(width=225, height=155)


# 5. ASSEMBLE
right_col = trend_chart & kpi_section

final_q1 = (
    (bars | right_col).resolve_scale(color="independent")
)

final_q1

To analyze grant distribution by state, I implemented a **composite dashboard** centered on a sorted bar chart. A bar chart was chosen over a choropleth map for the primary view because it allows for precise ranking and direct comparison of grant magnitudes, which are often obscured by geography in map views.

The design follows Shneiderman’s mantra: the bars provide the **overview** for the selected year. The **filtering** mechanism (a year dropdown that also offers an all-years total) enables temporal exploration, allowing users to observe shifts in distribution over time. **Details-on-demand** are achieved through linking: clicking a state highlights it (a "focus+context" scheme in which the selected bar stays blue and the rest turn light gray) and updates the side panels.

The right-hand column adds critical context: the **trend line** reveals the selected state's 5-year funding trajectory (evolution), while the **KPI text** provides the precise financial figure ($), bridging the gap between abstract patterns and exact data. This structure answers the question by showing both the relative standing of states and their individual historical performance.
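To sanity-check the values the linked views display, the same filtering logic can be reproduced without Altair. The sketch below is an illustrative stand-in for the dashboard, not its implementation: the records are made up, and the helper names `overview`, `trend`, and `kpi` are hypothetical. Each helper mirrors one view: the ranked bars for a year, the trend line for a clicked state, and the KPI figure for a year and state.

```python
from collections import defaultdict

# (state, year, award_amount) records; values are made up for illustration.
grants = [
    ("CA", 2020, 500.0), ("CA", 2020, 300.0), ("CA", 2021, 700.0),
    ("NY", 2020, 900.0), ("NY", 2021, 100.0), ("NY", 2021, 200.0),
    ("TX", 2021, 50.0),
]

# Aggregate to one [count, total] pair per (state, year), like q1_yearly.
agg = defaultdict(lambda: [0, 0.0])
for state, year, amount in grants:
    agg[(state, year)][0] += 1
    agg[(state, year)][1] += amount


def overview(year):
    """Bar-chart view: states ranked by grant count for one year."""
    rows = [(s, c) for (s, y), (c, _) in agg.items() if y == year]
    return sorted(rows, key=lambda r: (-r[1], r[0]))


def trend(state):
    """Line-chart view: total funding per year for the clicked state."""
    return sorted((y, t) for (s, y), (_, t) in agg.items() if s == state)


def kpi(year, state):
    """KPI text: total funding for the selected year and state."""
    return agg[(state, year)][1] if (state, year) in agg else 0.0


print(overview(2021))
print(trend("CA"))
print(kpi(2021, "NY"))
```

Running the same three helpers on the real aggregated data gives reference numbers to compare against the tooltips and the KPI text.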

## Q2: How are the grants distributed per directorate? And for a certain year?

In [154]:
# Q2 aggregation: grants per directorate per year
q2_df = (
    df_grants
    .groupby(["directorate", "year"])
    .agg(
        grants_count=("award_id", "count"),
        total_amount=("award_amount", "sum")
    )
    .reset_index()
)

#q2_df.head()


In [155]:
dir_click = alt.selection_point(fields=["directorate"], empty="all")

In [156]:
q2_overview = (
    alt.Chart(q2_df)
    .mark_bar()
    .encode(
        y=alt.Y("directorate:N", sort="-x", title="Directorate"),
        x=alt.X("grants_count:Q", title="Number of grants"),
        color=alt.condition(
            dir_click,
            alt.Color("grants_count:Q", scale=alt.Scale(scheme="blues"), title="Grants count"),
            alt.value("lightgray")
        ),
        tooltip=[
            alt.Tooltip("directorate:N", title="Directorate"),
            alt.Tooltip("year:O", title="Year"),
            alt.Tooltip("grants_count:Q", title="Grants"),
            alt.Tooltip("total_amount:Q", title="Total amount ($)", format=",.0f"),
        ],
    )
    .add_params(year_selection, dir_click)
    .transform_filter(year_selection)
    .properties(
        title="Q2 — Grants by Directorate (select a year + click a directorate)",
        width=750,
        height=420,
    )
)


In [157]:
q2_trend = (
    alt.Chart(q2_df)
    .mark_line(point=True)
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("grants_count:Q", title="Number of grants"),
        tooltip=[
            alt.Tooltip("directorate:N", title="Directorate"),
            alt.Tooltip("year:O", title="Year"),
            alt.Tooltip("grants_count:Q", title="Grants"),
            alt.Tooltip("total_amount:Q", title="Total amount ($)", format=",.0f"),
        ],
    )
    .transform_filter(dir_click)
    .properties(
        title="Selected directorate — grants over time",
        width=750,
        height=180,
    )
)


In [158]:
(q2_overview & q2_trend)


In [159]:
import altair as alt
import pandas as pd

# 0. SETUP
alt.data_transformers.enable("default")

# 1. DATA PREPARATION
q2_df = (
    df_grants.groupby(["directorate", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)

# 2. CREATE SELECTIONS
min_year = int(q2_df["year"].min())
max_year = int(q2_df["year"].max())

slider = alt.binding_range(min=min_year, max=max_year, step=1, name="Select Year: ")

try:
    # Modern Altair
    year_select = alt.selection_point(
        name="year_select", fields=["year"], bind=slider, value=[{"year": max_year}]
    )
    dir_select = alt.selection_point(
        name="dir_select", fields=["directorate"], empty="all"
    )
except AttributeError:
    # Older Altair
    year_select = alt.selection_single(
        name="year_select", fields=["year"], bind=slider, init={"year": max_year}
    )
    dir_select = alt.selection_single(
        name="dir_select", fields=["directorate"], empty="all"
    )

# 3. LEFT COLUMN: HORIZONTAL BARS (The Overview)
bars = (
    alt.Chart(q2_df)
    .mark_bar()
    .encode(
        x=alt.X("grants_count:Q", title="Number of Grants"),
        y=alt.Y("directorate:N", sort="-x", title="Directorate"),
        color=alt.condition(
            dir_select,
            alt.Color("grants_count:Q", scale=alt.Scale(scheme="teals"), legend=None),
            alt.value("#f0f0f0"),
        ),
        tooltip=[
            alt.Tooltip("directorate:N", title="Directorate"),
            alt.Tooltip("grants_count:Q", title="Grants"),
            alt.Tooltip("total_amount:Q", title="Total Amount ($)", format=",.0f"),
        ],
    )
    .transform_filter(year_select)
    .properties(width=450, height=550, title="Grants by Directorate (Click to Filter)")
)

# 4. RIGHT COLUMN: TREND + KPI

# A. Trend Line (The History)
trend_chart = (
    alt.Chart(q2_df)
    .mark_line(point=True, strokeWidth=3)
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y(
            "total_amount:Q", title="Total Funding ($)", axis=alt.Axis(format="~s")
        ),
        color=alt.value("teal"),
        tooltip=[
            alt.Tooltip("directorate:N"),
            alt.Tooltip("year:O"),
            alt.Tooltip("total_amount:Q", format="$,.0f"),
        ],
    )
    .transform_filter(dir_select)
    .properties(
        width=300,
        height=200,
        title="Funding History (Selected Directorate)",
    )
)

# B. KPI Text (The Detail)
base_text = alt.Chart(q2_df).transform_filter(year_select).transform_filter(dir_select)

label = base_text.mark_text(align="center", color="#888", fontSize=14, dy=-15).encode(
    text=alt.value("Total Funding (Selected Year)"), y=alt.value(75), x=alt.value(150)
)

value = base_text.mark_text(
    align="center", color="#444", fontSize=24, fontWeight="bold", dy=15
).encode(
    text=alt.Text("sum(total_amount):Q", format="$,.0f"),
    y=alt.value(75),
    x=alt.value(150),
)

kpi_section = (label + value).properties(width=300, height=155)

# 5. ASSEMBLE
# We vertically stack Trend + KPI
right_col = trend_chart & kpi_section

# We horizontally stack Bars | Right Column
q2_v2 = (
    (bars | right_col)
    .add_params(year_select, dir_select)
    .resolve_scale(color="independent")
)

q2_v2

In [160]:
import altair as alt
import pandas as pd

# 0. SETUP
alt.data_transformers.enable("default")

# 1. DATA PREPARATION
q2_df = (
    df_grants.groupby(["directorate", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)

# 2. CREATE SELECTIONS
min_year = int(q2_df["year"].min())
max_year = int(q2_df["year"].max())

slider = alt.binding_range(min=min_year, max=max_year, step=1, name="Select Year: ")

try:
    year_select = alt.selection_point(
        name="year_select", fields=["year"], bind=slider, value=[{"year": max_year}]
    )
    dir_select = alt.selection_point(
        name="dir_select", fields=["directorate"], empty="all"
    )
except AttributeError:
    year_select = alt.selection_single(
        name="year_select", fields=["year"], bind=slider, init={"year": max_year}
    )
    dir_select = alt.selection_single(
        name="dir_select", fields=["directorate"], empty="all"
    )

# 3. LEFT COLUMN: HORIZONTAL BARS
bars = (
    alt.Chart(q2_df)
    .mark_bar()
    .encode(
        x=alt.X("grants_count:Q", title="Number of Grants"),
        y=alt.Y("directorate:N", sort="-x", title="Directorate"),
        color=alt.condition(
            dir_select,
            alt.Color(
                "grants_count:Q",
                scale=alt.Scale(scheme="blues"),
                legend=None,
            ),
            alt.value("#f0f0f0"),
        ),
        tooltip=[
            alt.Tooltip("directorate:N", title="Directorate"),
            alt.Tooltip("grants_count:Q", title="Grants"),
            alt.Tooltip("total_amount:Q", title="Total Amount ($)", format=",.0f"),
        ],
    )
    .transform_filter(year_select)
    .properties(width=450, height=550, title="Grants by Directorate (Click to Filter)")
)

# 4. RIGHT COLUMN COMPONENTS

# A. Spacer to create a gap at the top
top_spacer = (
    alt.Chart(q2_df)
    .mark_rect()
    .encode(opacity=alt.value(0))
    .properties(width=300, height=30)
)

# B. Trend Line
trend_chart = (
    alt.Chart(q2_df)
    .mark_line(point=True, strokeWidth=3)
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y(
            "total_amount:Q", title="Total Funding ($)", axis=alt.Axis(format="~s")
        ),
        color=alt.value("#4c78a8"),
        tooltip=[
            alt.Tooltip("directorate:N"),
            alt.Tooltip("year:O"),
            alt.Tooltip("total_amount:Q", format="$,.0f"),
        ],
    )
    .transform_filter(dir_select)
    .properties(
        width=300,
        height=220,
        title="Funding History (Selected Directorate)",
    )
)

# C. KPI Text
base_text = alt.Chart(q2_df).transform_filter(year_select).transform_filter(dir_select)

label = base_text.mark_text(align="center", color="#888", fontSize=14, dy=-15).encode(
    text=alt.value("Total Funding (Selected Year)"), y=alt.value(75), x=alt.value(100)
)

value = base_text.mark_text(
    align="center", color="#444", fontSize=24, fontWeight="bold", dy=15
).encode(
    text=alt.Text("sum(total_amount):Q", format="$,.0f"),
    y=alt.value(75),
    x=alt.value(100),
)

kpi_section = (label + value).properties(width=300, height=100)

# D. Legend
legend_chart = (
    alt.Chart(q2_df)
    .mark_circle(opacity=0)
    .encode(
        color=alt.Color(
            "grants_count:Q",
            scale=alt.Scale(scheme="blues"),
            legend=alt.Legend(
                title="Grant Count Intensity",
                orient="bottom",
                direction="horizontal",
                titleAnchor="middle",
                gradientLength=200,
            ),
        )
    )
    .transform_filter(year_select)
    .properties(width=300, height=40)
)

# 5. ASSEMBLE
# The spacer now goes first in the column
right_col = alt.vconcat(
    top_spacer,
    trend_chart,
    kpi_section,
    legend_chart,
    spacing=5,
)

# Final Assembly
q2_v3 = (
    (bars | right_col)
    .add_params(year_select, dir_select)
    .resolve_scale(color="independent")
    .configure_view(stroke=None)
    .configure_concat(spacing=10)
)

q2_v3

In [192]:
import altair as alt
import pandas as pd

# 0. SETUP
alt.data_transformers.enable("default")

# 1. DATA PREPARATION

# A. Yearly Data (Specific Years)
q2_yearly = (
    df_grants.groupby(["directorate", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)

# B. Global Data (Year 0 = "All Time")
q2_total = (
    df_grants.groupby(["directorate"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)
q2_total["year"] = 0  # Assign 0 for the aggregate

# C. Combine
q2_full = pd.concat([q2_yearly, q2_total], ignore_index=True)


# 2. INTERACTION SETUP
years = sorted(q2_yearly["year"].unique())
year_options = [0] + years
year_labels = ["All Years (Total)"] + [str(y) for y in years]

input_element = alt.binding_select(
    options=year_options, labels=year_labels, name="Select Year: "
)

try:
    year_select = alt.selection_point(
        name="year_select", fields=["year"], bind=input_element, value=[{"year": 0}]
    )
    dir_select = alt.selection_point(
        name="dir_select", fields=["directorate"], empty="all"
    )
except AttributeError:
    # Older Altair
    year_select = alt.selection_single(
        name="year_select", fields=["year"], bind=input_element, init={"year": 0}
    )
    dir_select = alt.selection_single(
        name="dir_select", fields=["directorate"], empty="all"
    )

# 3. LEFT COLUMN: HORIZONTAL BARS
bars = (
    alt.Chart(q2_full)
    .mark_bar()
    .encode(
        x=alt.X("grants_count:Q", title="Number of Grants"),
        y=alt.Y("directorate:N", sort="-x", title="Directorate"),
        color=alt.condition(
            dir_select,
            alt.Color(
                "grants_count:Q",
                scale=alt.Scale(scheme="blues"),
                legend=None,
            ),
            alt.value("#f0f0f0"),
        ),
        tooltip=[
            alt.Tooltip("directorate:N", title="Directorate"),
            alt.Tooltip("year:O", title="Year"),
            alt.Tooltip("grants_count:Q", title="Grants"),
            alt.Tooltip("total_amount:Q", title="Total Amount ($)", format=",.0f"),
        ],
    )
    .add_params(dir_select, year_select)
    .transform_filter(year_select)  # <--- Filter by Dropdown
    .properties(width=450, height=550, title="Grants by Directorate (Click to Filter)")
)

# 4. RIGHT COLUMN COMPONENTS

# A. Spacer
top_spacer = (
    alt.Chart(q2_full)
    .mark_rect()
    .encode(opacity=alt.value(0))
    .properties(width=300, height=30)
)

# B. Trend Line
trend_chart = (
    alt.Chart(q2_yearly)
    .mark_line(point=True, strokeWidth=3)
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y(
            "total_amount:Q", title="Total Funding ($)", axis=alt.Axis(format="~s")
        ),
        color=alt.value("#4c78a8"),
        tooltip=[
            alt.Tooltip("directorate:N"),
            alt.Tooltip("year:O"),
            alt.Tooltip("total_amount:Q", format="$,.0f"),
        ],
    )
    .transform_filter(dir_select)
    .properties(
        width=300,
        height=220,
        title="Funding History (Selected Directorate)",
    )
)

# C. KPI Text
base_text = (
    alt.Chart(q2_full).transform_filter(year_select).transform_filter(dir_select)
)

label = base_text.mark_text(align="center", color="#888", fontSize=14, dy=-15).encode(
    text=alt.value("Total Funding (Selected Year)"), y=alt.value(50), x=alt.value(125)
)

value = base_text.mark_text(
    align="center", color="#444", fontSize=24, fontWeight="bold", dy=15
).encode(
    text=alt.Text("sum(total_amount):Q", format="$,.0f"),
    y=alt.value(50),
    x=alt.value(125),
)

kpi_section = (label + value).properties(width=300, height=100)

# D. Legend
legend_chart = (
    alt.Chart(q2_full)
    .mark_circle(opacity=0)
    .encode(
        color=alt.Color(
            "grants_count:Q",
            scale=alt.Scale(scheme="blues"),
            legend=alt.Legend(
                title="Grant Count Intensity",
                orient="bottom",
                direction="horizontal",
                titleAnchor="middle",
                gradientLength=200,
            ),
        )
    )
    .transform_filter(year_select)
    .properties(width=300, height=40)
)

# 5. ASSEMBLE
right_col = alt.vconcat(
    top_spacer,
    trend_chart,
    kpi_section,
    legend_chart,
    spacing=5,
)

final_q2 = (
    (bars | right_col)
    .resolve_scale(color="independent")
)

final_q2

To analyze grant distribution across the 47+ NSF directorates, I designed a **composite dashboard** centered on a sorted horizontal bar chart. This provides a clear 'Leaderboard' of grant volume, which is essential for comparing such a large number of categories.

Addressing the need to see data 'for a certain year,' I implemented a **Dropdown Selector** that allows users to instantly toggle between a global 'All-Time' summary and specific fiscal years. This satisfies the multi-level granularity requirement without cluttering the interface.
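The 'All-Time' option relies on a sentinel year appended to the yearly aggregates, so that one equality filter serves both the global and the per-year view. The sketch below reproduces that idea on a toy table; the data values and the `aggregate` helper are invented for illustration and are not part of the notebook.

```python
import pandas as pd

# Toy stand-in for df_grants (values are invented for illustration).
toy_grants = pd.DataFrame(
    {
        "directorate": ["BIO", "BIO", "ENG", "ENG", "ENG"],
        "year": [2020, 2021, 2020, 2020, 2021],
        "award_id": [1, 2, 3, 4, 5],
        "award_amount": [100.0, 200.0, 50.0, 150.0, 300.0],
    }
)

ALL_YEARS = 0  # sentinel bound to the "All Years (Total)" dropdown option


def aggregate(df, keys):
    """Count grants and sum amounts per group (hypothetical helper)."""
    return (
        df.groupby(keys)
        .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
        .reset_index()
    )


yearly = aggregate(toy_grants, ["directorate", "year"])
total = aggregate(toy_grants, ["directorate"])
total["year"] = ALL_YEARS  # tag the global rows with the sentinel

full = pd.concat([yearly, total], ignore_index=True)

# Filtering on the sentinel returns one row per directorate with overall totals.
all_time = full[full["year"] == ALL_YEARS].set_index("directorate")
print(all_time[["grants_count", "total_amount"]])
```

Because the sentinel rows live in the same table, the trend line has to read from the yearly table only; otherwise the year 0 point would show up on its time axis.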

Adhering to the **Details-on-Demand** principle, clicking a directorate reveals its specific historical context in the side panel: a **Trend Line** showing funding evolution over the last 5 years and a **KPI Text** displaying the exact dollar amount for the selected timeframe. This separation ensures the main view remains an uncluttered overview while providing deep-dive data when needed.
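The KPI value is what remains after both filters (year and directorate) are applied and the amounts are summed. A rough pandas equivalent is sketched below, assuming that an empty directorate selection means "all directorates"; the `kpi_total` function and the toy numbers are hypothetical.

```python
import pandas as pd

# Toy aggregate table; year 0 stands for the all-time rows.
toy = pd.DataFrame(
    {
        "directorate": ["BIO", "ENG", "BIO", "ENG"],
        "year": [0, 0, 2021, 2021],
        "total_amount": [300.0, 500.0, 200.0, 300.0],
    }
)


def kpi_total(df, year, directorate=None):
    """Sum total_amount for one year; directorate=None mimics an empty selection."""
    mask = df["year"] == year
    if directorate is not None:
        mask &= df["directorate"] == directorate
    return float(df.loc[mask, "total_amount"].sum())


print(f"${kpi_total(toy, 0):,.0f}")            # nothing clicked, all-time view
print(f"${kpi_total(toy, 2021, 'BIO'):,.0f}")  # BIO clicked, year 2021
```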

## Q3: Are the cancelled grants especially hitting a certain directorate?

In [162]:
import pandas as pd
import altair as alt

# --- Aggregations ---
q3_cancel_df = (
    df_trump.groupby(["directorate"])
    .agg(
        cancelled_count=("award_id", "count"), cancelled_amount=("award_amount", "sum")
    )
    .reset_index()
)

q3_base_df = (
    df_grants.groupby(["directorate"])
    .agg(base_count=("award_id", "count"), base_amount=("award_amount", "sum"))
    .reset_index()
)

q3_df = q3_base_df.merge(q3_cancel_df, on="directorate", how="outer").fillna(0)

q3_df["cancel_rate"] = (
    q3_df["cancelled_count"] / q3_df["base_count"].replace(0, float("nan"))
).fillna(0)

q3_scatter_df = q3_df[(q3_df["base_count"] > 0) & (q3_df["cancelled_count"] > 0)].copy()

dir_sel = alt.selection_point(fields=["directorate"], empty="all")

q3_scatter = (
    alt.Chart(q3_scatter_df)
    .mark_circle(opacity=0.8, stroke="black", strokeWidth=0.4)
    .encode(
        x=alt.X("base_count:Q", title="Baseline grants (last 5 years)"),
        y=alt.Y("cancelled_count:Q", title="Cancelled grants (Trump era)"),
        size=alt.Size("cancelled_amount:Q", title="Cancelled amount ($)", legend=None),
        color=alt.Color(
            "cancel_rate:Q",
            title="Cancellation rate",
            scale=alt.Scale(scheme="oranges"),
        ),
        tooltip=[
            alt.Tooltip("directorate:N", title="Directorate"),
            alt.Tooltip("base_count:Q", title="Baseline grants"),
            alt.Tooltip("cancelled_count:Q", title="Cancelled grants"),
            alt.Tooltip("cancel_rate:Q", title="Cancel rate", format=".2%"),
            alt.Tooltip(
                "cancelled_amount:Q", title="Cancelled amount ($)", format=",.0f"
            ),
        ],
    )
    .add_params(dir_sel)
    .properties(
        width=750,
        height=380,
        title="Q3 — Cancelled grants vs baseline distribution (by directorate)",
    )
)

# --- Bars ---
n_dirs = q3_df[q3_df["cancelled_count"] > 0]["directorate"].nunique()
rank_height = max(300, n_dirs * 18)

q3_bars = (
    alt.Chart(q3_df)
    .mark_bar()
    .encode(
        y=alt.Y(
            "directorate:N",
            sort="-x",
            title="Directorate",
            axis=alt.Axis(labelLimit=200),
        ),
        x=alt.X("cancelled_count:Q", title="Cancelled grants"),
        color=alt.condition(dir_sel, alt.value("#d95f02"), alt.value("lightgray")),
        tooltip=[
            alt.Tooltip("directorate:N"),
            alt.Tooltip("cancelled_count:Q", title="Cancelled grants"),
            alt.Tooltip("cancel_rate:Q", title="Cancel rate", format=".2%"),
        ],
    )
    .transform_filter(alt.datum.cancelled_count > 0)
    .add_params(dir_sel)
    .properties(
        width=750, height=rank_height, title="Cancelled grants ranking (click to focus)"
    )
)

q3_cancel_by_year = (
    df_trump.groupby(["directorate", "year"])
    .agg(
        cancelled_count=("award_id", "count"), cancelled_amount=("award_amount", "sum")
    )
    .reset_index()
)

q3_trend = (
    alt.Chart(q3_cancel_by_year)
    .mark_line(point=True)
    .encode(
        x=alt.X("year:O", title="Year (Trump era)"),
        y=alt.Y("cancelled_count:Q", title="Cancelled grants"),
        tooltip=[
            alt.Tooltip("directorate:N"),
            alt.Tooltip("year:O"),
            alt.Tooltip("cancelled_count:Q", title="Cancelled grants"),
            alt.Tooltip(
                "cancelled_amount:Q", title="Cancelled amount ($)", format=",.0f"
            ),
        ],
    )
    .transform_filter(dir_sel)
    .properties(
        width=750, height=180, title="Selected directorate — cancellations over time"
    )
)

(q3_scatter & q3_bars & q3_trend)

In [163]:
import altair as alt
import pandas as pd


alt.data_transformers.enable("default")

q3_cancel_df = (
    df_trump.groupby(["directorate"])
    .agg(
        cancelled_count=("award_id", "count"), cancelled_amount=("award_amount", "sum")
    )
    .reset_index()
)

q3_base_df = (
    df_grants.groupby(["directorate"])
    .agg(base_count=("award_id", "count"), base_amount=("award_amount", "sum"))
    .reset_index()
)

q3_df = q3_base_df.merge(q3_cancel_df, on="directorate", how="outer").fillna(0)
# Use NaN (not 1) for a zero baseline, so the rate becomes 0 instead of the raw count
q3_df["cancel_rate"] = (
    q3_df["cancelled_count"] / q3_df["base_count"].replace(0, float("nan"))
).fillna(0)

q3_scatter_df = q3_df[(q3_df["base_count"] > 0)].copy()

q3_trend_data = (
    df_trump.groupby(["directorate", "year"])
    .agg(cancelled_count=("award_id", "count"))
    .reset_index()
)

try:
    dir_select = alt.selection_point(fields=["directorate"], empty="all")
except AttributeError:
    dir_select = alt.selection_single(fields=["directorate"], empty="all")


scatter = (
    alt.Chart(q3_scatter_df)
    .mark_circle(size=100, stroke="black", strokeWidth=0.5, opacity=0.8)
    .encode(
        x=alt.X(
            "base_count:Q",
            scale=alt.Scale(type="log"),
            title="Total Grants (Log Scale)",
        ),
        y=alt.Y("cancel_rate:Q", title="Cancellation Rate", axis=alt.Axis(format="%")),
        color=alt.condition(
            dir_select,
            alt.Color("cancel_rate:Q", scale=alt.Scale(scheme="reds"), legend=None),
            alt.value("#f0f0f0"),  # Turn gray if not clicked
        ),
        size=alt.Size("cancelled_amount:Q", title="Lost Funding ($)", legend=None),
        tooltip=[
            alt.Tooltip("directorate:N", title="Directorate"),
            alt.Tooltip("base_count:Q", title="Total Grants"),
            alt.Tooltip("cancel_rate:Q", title="Cancel Rate", format=".1%"),
            alt.Tooltip("cancelled_amount:Q", title="Lost Funding", format="$,.0f"),
        ],
    )
    .add_selection(dir_select)  # <--- Interaction Driver
    .properties(
        width=450, height=400, title="Q3: Cancellation Intensity (Rate vs Volume)"
    )
)

# Add a mean line for context
mean_rate = q3_scatter_df["cancel_rate"].mean()
rule = (
    alt.Chart(pd.DataFrame({"mean_rate": [mean_rate]}))
    .mark_rule(color="gray", strokeDash=[4, 4])
    .encode(y="mean_rate:Q")
)

left_chart = scatter + rule

# 4. RIGHT COLUMN COMPONENTS

# A. Spacer (Top margin)
top_spacer = (
    alt.Chart(q3_df)
    .mark_rect()
    .encode(opacity=alt.value(0))
    .properties(width=300, height=50)
)

# B. Trend Chart (Red)
trend_chart = (
    alt.Chart(q3_trend_data)
    .mark_line(point=True, strokeWidth=3)
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("cancelled_count:Q", title="Cancellations"),
        color=alt.value("#d62728"),  # Standard Red
        tooltip=["directorate", "year", "cancelled_count"],
    )
    .transform_filter(dir_select)
    .properties(width=300, height=200, title="Cancellation History (Selected)")
)

# C. KPI Text
base_text = alt.Chart(q3_df).transform_filter(dir_select)

label = base_text.mark_text(align="center", color="#888", fontSize=14, dy=-15).encode(
    text=alt.value("Total Lost Funding"), y=alt.value(60), x=alt.value(150)
)

value = base_text.mark_text(
    align="center", color="#333", fontSize=24, fontWeight="bold", dy=15
).encode(
    text=alt.Text("sum(cancelled_amount):Q", format="$,.0f"),
    y=alt.value(60),
    x=alt.value(150),
)

kpi_section = (label + value).properties(width=300, height=120)

legend_chart = (
    alt.Chart(q3_df)
    .mark_circle(opacity=0)
    .encode(
        color=alt.Color(
            "cancel_rate:Q",
            scale=alt.Scale(scheme="reds"),
            legend=alt.Legend(
                title="Cancellation Intensity (Rate)",
                orient="bottom",
                direction="horizontal",
                titleAnchor="middle",
                gradientLength=200,
            ),
        )
    )
    .properties(width=300, height=50)
)

# ASSEMBLE
right_col = alt.vconcat(top_spacer, trend_chart, kpi_section, legend_chart, spacing=10)

q3_v2 = (
    (left_chart | right_col)
    .resolve_scale(color="independent")
    .configure_view(stroke=None)
    .configure_concat(spacing=30)
)

q3_v2



In [164]:
import altair as alt
import pandas as pd

# 0. SETUP
alt.data_transformers.enable("default")

# 1. DATA PREPARATION
# A. Cancelled Stats
q3_cancel_df = (
    df_trump.groupby(["directorate"])
    .agg(
        cancelled_count=("award_id", "count"), cancelled_amount=("award_amount", "sum")
    )
    .reset_index()
)

# B. Base Stats
q3_base_df = (
    df_grants.groupby(["directorate"])
    .agg(base_count=("award_id", "count"), base_amount=("award_amount", "sum"))
    .reset_index()
)

# C. Merge & Rate
q3_df = q3_base_df.merge(q3_cancel_df, on="directorate", how="outer").fillna(0)
q3_df["cancel_rate"] = (
    q3_df["cancelled_count"] / q3_df["base_count"].replace(0, 1)
).fillna(0)

# Filter for plotting (must have base grants)
q3_plot_df = q3_df[q3_df["base_count"] > 0].copy()

# D. Trend Data
q3_trend_data = (
    df_trump.groupby(["directorate", "year"])
    .agg(cancelled_count=("award_id", "count"))
    .reset_index()
)

# 2. SELECTION
try:
    dir_select = alt.selection_point(fields=["directorate"], empty="all")
except AttributeError:
    dir_select = alt.selection_single(fields=["directorate"], empty="all")

# 3. LEFT CHART: BAR CHART (Ranking)
bars = (
    alt.Chart(q3_plot_df)
    .mark_bar()
    .encode(
        x=alt.X("cancelled_count:Q", title="Number of Cancellations"),
        y=alt.Y("directorate:N", sort="-x", title="Directorate"),
        color=alt.condition(
            dir_select,
            alt.Color("cancelled_count:Q", scale=alt.Scale(scheme="blues"), legend=None),
            alt.value("#f0f0f0"),
        ),
        tooltip=[
            "directorate",
            "cancelled_count",
            alt.Tooltip("cancelled_amount", format="$,.0f"),
        ],
    )
    .add_selection(dir_select)
    .properties(width=250, height=450, title="Ranking: Total Cancellations")
)

# 4. RIGHT CHART: BUBBLE RADAR (Analysis)
bubble_radar = (
    alt.Chart(q3_plot_df)
    .mark_circle(stroke="black", strokeWidth=0.5, opacity=0.8)
    .encode(
        x=alt.X(
            "base_count:Q", scale=alt.Scale(type="log"), title="Total Grants Size (Log)"
        ),
        y=alt.Y("cancel_rate:Q", title="Cancellation Rate", axis=alt.Axis(format="%")),
        size=alt.Size(
            "cancelled_amount:Q",
            title="Lost Funding ($)",
            legend=None,
            scale=alt.Scale(range=[50, 500]),
        ),
        color=alt.condition(
            dir_select,
            alt.value("#4c78a8"),  # Highlight color for the selected directorate
            alt.value("#f0f0f0"),
        ),
        tooltip=[
            alt.Tooltip("directorate:N"),
            alt.Tooltip("cancel_rate:Q", format=".1%"),
            alt.Tooltip("cancelled_amount:Q", format="$,.0f"),
        ],
    )
    .add_selection(dir_select)
    .properties(width=400, height=250, title="Analysis: Intensity vs. Size")
)

# 5. BOTTOM RIGHT: TREND LINE (Context)
trend_line = (
    alt.Chart(q3_trend_data)
    .mark_line(point=True, color="#4c78a8")
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("cancelled_count:Q", title="Cancellations"),
        tooltip=["directorate", "year", "cancelled_count"],
    )
    .transform_filter(dir_select)
    .properties(width=400, height=150, title="Timeline: When did it happen?")
)

# 6. ASSEMBLE
right_col = alt.vconcat(bubble_radar, trend_line, spacing=10)

q3_v3 = (
    (bars | right_col)
    .resolve_scale(color="independent")
    .configure_view(stroke=None)
    .configure_concat(spacing=30)
)

q3_v3




In [165]:
import altair as alt
import pandas as pd

# 0. SETUP
alt.data_transformers.enable("default")

# 1. DATA PREPARATION
# A. Cancelled Stats
q3_cancel_df = (
    df_trump.groupby(["directorate"])
    .agg(
        cancelled_count=("award_id", "count"), cancelled_amount=("award_amount", "sum")
    )
    .reset_index()
)

# B. Base Stats
q3_base_df = (
    df_grants.groupby(["directorate"])
    .agg(base_count=("award_id", "count"), base_amount=("award_amount", "sum"))
    .reset_index()
)

# C. Merge
q3_df = q3_base_df.merge(q3_cancel_df, on="directorate", how="outer").fillna(0)
q3_df["cancel_rate"] = (
    q3_df["cancelled_count"] / q3_df["base_count"].replace(0, 1)
).fillna(0)

# Filter
q3_plot_df = q3_df[q3_df["base_count"] > 0].copy()

# D. Trend Data
q3_trend_data = (
    df_trump.groupby(["directorate", "year"])
    .agg(cancelled_count=("award_id", "count"))
    .reset_index()
)

# 2. SELECTION
try:
    dir_select = alt.selection_point(fields=["directorate"], empty="all")
except AttributeError:
    dir_select = alt.selection_single(fields=["directorate"], empty="all")

# 3. LEFT CHART: BAR CHART (Leaderboard)
bars = (
    alt.Chart(q3_plot_df)
    .mark_bar()
    .encode(
        x=alt.X("cancelled_count:Q", title="Number of Cancellations"),
        y=alt.Y("directorate:N", sort="-x", title="Directorate"),
        color=alt.condition(
            dir_select,
            alt.Color("cancelled_count:Q", scale=alt.Scale(scheme="blues"), legend=None),
            alt.value("#f0f0f0"),
        ),
        tooltip=[
            "directorate",
            "cancelled_count",
            alt.Tooltip("cancelled_amount", format="$,.0f"),
        ],
    )
    .add_selection(dir_select)
    .properties(width=250, height=450, title="Ranking: Total Cancellations")
)

# 4. RIGHT CHART: LINEAR SCATTER (Context)
# CHANGED: Now using simple Counts on both axes
# Insight: Points high up but to the left are "Disproportionately Hit"
linear_scatter = (
    alt.Chart(q3_plot_df)
    .mark_circle(stroke="black", strokeWidth=0.5, opacity=0.8)
    .encode(
        x=alt.X("base_count:Q", title="Total Grants Issued (Size)"),
        y=alt.Y("cancelled_count:Q", title="Total Cancellations (Hits)"),
        size=alt.Size(
            "cancelled_amount:Q",
            title="Lost Funding ($)",
            legend=None,
            scale=alt.Scale(range=[50, 500]),
        ),
        color=alt.condition(dir_select, alt.value("#4c78a8"), alt.value("#f0f0f0")),
        tooltip=[
            alt.Tooltip("directorate:N"),
            alt.Tooltip("base_count:Q", title="Total Grants"),
            alt.Tooltip("cancelled_count:Q", title="Cancelled Grants"),
            alt.Tooltip("cancelled_amount:Q", format="$,.0f"),
        ],
    )
    .add_selection(dir_select)
    .properties(width=400, height=250, title="Context: Volume vs. Cancellations")
)

# 5. BOTTOM RIGHT: TREND LINE
trend_line = (
    alt.Chart(q3_trend_data)
    .mark_line(point=True, color="#4c78a8")
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("cancelled_count:Q", title="Cancellations"),
        tooltip=["directorate", "year", "cancelled_count"],
    )
    .transform_filter(dir_select)
    .properties(width=400, height=150, title="Timeline: When did it happen?")
)

# 6. ASSEMBLE
right_col = alt.vconcat(linear_scatter, trend_line, spacing=10)

q3_v4 = (
    (bars | right_col)
    .resolve_scale(color="independent")
    .configure_view(stroke=None)
    .configure_concat(spacing=30)
)

q3_v4



In [193]:
import altair as alt
import pandas as pd

# 0. SETUP
alt.data_transformers.enable("default")

# Load Data (Assuming files are in the same directory)
df_grants = pd.read_csv("NSF_Grants_Last5Years_Clean.csv")
df_trump = pd.read_csv("trump17-21-csv.csv")

# Clean columns just in case
df_grants.columns = df_grants.columns.str.strip()
df_trump.columns = df_trump.columns.str.strip()


# 1. DATA PREPARATION

# --- A. Yearly Data (Specific Years) ---
# Grants per Year
base_yearly = (
    df_grants.groupby(["directorate", "year"])
    .agg(base_count=("award_id", "count"), base_amount=("award_amount", "sum"))
    .reset_index()
)

# Cancellations per Year
cancel_yearly = (
    df_trump.groupby(["directorate", "year"])
    .agg(
        cancelled_count=("award_id", "count"), cancelled_amount=("award_amount", "sum")
    )
    .reset_index()
)

# Merge Yearly
yearly_df = base_yearly.merge(
    cancel_yearly, on=["directorate", "year"], how="outer"
).fillna(0)


# --- B. Global Data (Year 0 = "All Time") ---
# Total Grants (All Time)
base_total = (
    df_grants.groupby(["directorate"])
    .agg(base_count=("award_id", "count"), base_amount=("award_amount", "sum"))
    .reset_index()
)
base_total["year"] = 0  # Assign "0" to represent the global aggregate

# Total Cancellations (All Time)
cancel_total = (
    df_trump.groupby(["directorate"])
    .agg(
        cancelled_count=("award_id", "count"), cancelled_amount=("award_amount", "sum")
    )
    .reset_index()
)
cancel_total["year"] = 0

# Merge Global
total_df = base_total.merge(
    cancel_total, on=["directorate", "year"], how="outer"
).fillna(0)


# --- C. Prepare the "Total" Rows ---
# We calculate the "Static Size" (Total Grants) for the X-axis
# This ensures the dots stay in the same X-position regardless of the year selected
base_total_fixed = base_total[["directorate", "base_count"]].rename(
    columns={"base_count": "static_base_count"}
)

# Merge Static Size into Yearly Data
yearly_df = yearly_df.merge(base_total_fixed, on="directorate", how="left").fillna(0)

# For Global rows, static_base_count is just the base_count
total_rows = total_df.copy()
total_rows["static_base_count"] = total_rows["base_count"]


# --- D. Combine & Metrics ---
q3_full = pd.concat([yearly_df, total_rows], ignore_index=True)

# Filter: Keep only Year 0 and 2018-2021
target_years = [0, 2018, 2019, 2020, 2021]
q3_full = q3_full[q3_full["year"].isin(target_years)]

# Filter for Plotting: Keep rows with either Base Grants OR Cancellations
q3_plot_full = q3_full[
    (q3_full["static_base_count"] > 0) | (q3_full["cancelled_count"] > 0)
].copy()

# Trend Data (Context Line Chart)
q3_trend_data = (
    df_trump.groupby(["directorate", "year"])
    .agg(cancelled_count=("award_id", "count"))
    .reset_index()
)


# 2. INTERACTION SETUP
year_options = [0, 2018, 2019, 2020, 2021]
year_labels = ["All Years (Total)", "2018", "2019", "2020", "2021"]

input_element = alt.binding_select(
    options=year_options, labels=year_labels, name="Select Year: "
)

try:
    year_select = alt.selection_point(
        name="year_select", fields=["year"], bind=input_element, value=[{"year": 0}]
    )
    dir_select = alt.selection_point(
        name="dir_select", fields=["directorate"], empty="all"
    )
except AttributeError:
    # Fallback for older Altair
    year_select = alt.selection_single(
        name="year_select", fields=["year"], bind=input_element, init={"year": 0}
    )
    dir_select = alt.selection_single(
        name="dir_select", fields=["directorate"], empty="all"
    )


# 3. LEFT CHART: BAR CHART (Ranking)
bars = (
    alt.Chart(q3_plot_full)
    .mark_bar()
    .encode(
        x=alt.X("cancelled_count:Q", title="Number of Cancellations"),
        y=alt.Y("directorate:N", sort="-x", title="Directorate"),
        color=alt.condition(
            dir_select,
            alt.Color(
                "cancelled_count:Q", scale=alt.Scale(scheme="blues"), legend=None
            ),
            alt.value("#f0f0f0"),
        ),
        tooltip=[
            "directorate",
            "year",
            "cancelled_count",
            alt.Tooltip("cancelled_amount", format="$,.0f"),
        ],
    )
    .add_params(dir_select, year_select)
    .transform_filter(year_select)
    .properties(width=250, height=450, title="Ranking: Total Cancellations")
)


# 4. RIGHT CHART: LINEAR SCATTER (Dynamic Zoom Added!)
linear_scatter = (
    alt.Chart(q3_plot_full)
    .mark_circle(stroke="black", strokeWidth=0.5, opacity=0.8)
    .encode(
        x=alt.X("static_base_count:Q", title="General Directorate Size (Total Grants)"),
        y=alt.Y("cancelled_count:Q", title="Cancellations (Selected Year)"),
        size=alt.Size(
            "cancelled_amount:Q",
            title="Lost Funding ($)",
            legend=None,
            scale=alt.Scale(range=[50, 500]),
        ),
        color=alt.condition(dir_select, alt.value("#4c78a8"), alt.value("#f0f0f0")),
        tooltip=[
            alt.Tooltip("directorate:N"),
            alt.Tooltip("year:O", title="Data Year"),
            alt.Tooltip("static_base_count:Q", title="Directorate Size (Total)"),
            alt.Tooltip("cancelled_count:Q", title="Cancelled Grants"),
            alt.Tooltip("cancelled_amount:Q", format="$,.0f"),
        ],
    )
    .add_params(dir_select, year_select)
    .transform_filter(year_select)
    .properties(width=400, height=250, title="Context: Volume vs. Cancellations")
    .interactive()  # <--- THIS ENABLES DYNAMIC ZOOM & PAN
)


# 5. BOTTOM RIGHT: TREND LINE
trend_line = (
    alt.Chart(q3_trend_data)
    .mark_line(point=True, color="#4c78a8")
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("cancelled_count:Q", title="Cancellations"),
        tooltip=["directorate", "year", "cancelled_count"],
    )
    .transform_filter(dir_select)
    .properties(width=400, height=150, title="Timeline: When did it happen?")
)


# 6. ASSEMBLE
right_col = alt.vconcat(linear_scatter, trend_line, spacing=10)

final_q3 = (
    (bars | right_col)
    .resolve_scale(color="independent")
)

final_q3

To determine if cancellations disproportionately targeted specific directorates, I designed a **composite dashboard** that distinguishes **volume** from **intensity**. A raw count is biased by directorate size, so I paired a **Ranked Bar Chart** (Left) for absolute impact with an **Interactive Scatter Plot** (Right) for relative context.
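The difference between volume and intensity can be seen on a three-row toy table: ranking by raw count and ranking by rate put different directorates first. This is only a sketch with invented directorate names and numbers; it also treats a zero baseline as "no rate" rather than dividing by it.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "directorate": ["LARGE", "SMALL", "NONE"],
        "base_count": [1000, 20, 0],
        "cancelled_count": [30, 5, 2],
    }
)

# Divide only where a baseline exists; rows without one become NaN, then 0.
toy["cancel_rate"] = (
    toy["cancelled_count"] / toy["base_count"].where(toy["base_count"] > 0)
).fillna(0)

by_volume = toy.sort_values("cancelled_count", ascending=False)["directorate"].tolist()
by_rate = toy.sort_values("cancel_rate", ascending=False)["directorate"].tolist()

print("Ranked by count:", by_volume)
print("Ranked by rate: ", by_rate)
```

The large directorate leads on absolute cancellations, while the small one leads on rate, which is the case the scatter plot is meant to expose.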

The scatter plot maps static directorate size (X, the total number of baseline grants across all years) against cancellations (Y). Because the X position does not depend on the selected year, dots move only vertically when the filter changes, which makes yearly comparisons intuitive. To facilitate **temporal analysis**, I implemented a **Dropdown Selector** that toggles between a global 'All-Time' summary and specific fiscal years.
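The stable X position comes from joining each directorate's overall grant count onto every yearly row. A minimal sketch of that join, on invented numbers, is shown here; the column name `static_base_count` follows the notebook, everything else is illustrative.

```python
import pandas as pd

# Yearly rows: the per-year baseline varies (and can even be 0).
yearly = pd.DataFrame(
    {
        "directorate": ["BIO", "BIO", "ENG"],
        "year": [2019, 2020, 2020],
        "base_count": [0, 40, 70],
        "cancelled_count": [3, 1, 4],
    }
)

# Overall totals per directorate, independent of year.
totals = pd.DataFrame({"directorate": ["BIO", "ENG"], "base_count": [120, 200]})

static = totals.rename(columns={"base_count": "static_base_count"})
positioned = yearly.merge(static, on="directorate", how="left")

# Each directorate now has exactly one X value across all of its yearly rows.
x_per_directorate = positioned.groupby("directorate")["static_base_count"].nunique()
print(positioned)
print(x_per_directorate)
```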

Adhering to Shneiderman's mantra (overview first, zoom and filter, then details on demand), the dashboard supports **Zoom & Pan** on the scatter plot to resolve occlusion in dense clusters. This design separates natural scaling (the diagonal trend, where larger directorates have more cancellations) from anomalies (outliers high on Y but low on X), allowing users to pinpoint disproportionately affected directorates across different timeframes.
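The visual reading "high on Y but low on X" can also be approximated numerically by comparing each directorate with the count it would have under the overall rate. The sketch below is one possible way to do that, not something the dashboard computes; the data and the `RATIO_THRESHOLD` cut-off are assumptions chosen only for the example.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "directorate": ["A", "B", "C", "D"],
        "static_base_count": [1000, 800, 500, 100],
        "cancelled_count": [20, 16, 10, 12],
    }
)

# The overall rate is the slope of the "natural scaling" line through the origin.
overall_rate = toy["cancelled_count"].sum() / toy["static_base_count"].sum()
toy["expected"] = overall_rate * toy["static_base_count"]

RATIO_THRESHOLD = 2.0  # hypothetical cut-off, not taken from the notebook
toy["flagged"] = toy["cancelled_count"] > RATIO_THRESHOLD * toy["expected"]

flagged = toy.loc[toy["flagged"], "directorate"].tolist()
print(flagged)
```

Here only the smallest directorate is flagged: its cancellations are several times what its size alone would predict, even though its raw count is not the highest.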

## Q4: How has the total grant amount evolved over the years?

In [167]:
import pandas as pd
import altair as alt

# --------------------------------------------------
# DATA AGGREGATION
# --------------------------------------------------

q4_df = (
    df_grants.groupby("year")
    .agg(total_amount=("award_amount", "sum"), grants_count=("award_id", "count"))
    .reset_index()
    .sort_values("year")
)

# --------------------------------------------------
# INTERACTION: YEAR SELECTION
# --------------------------------------------------

year_sel = alt.selection_point(fields=["year"], empty="all")

# --------------------------------------------------
# MAIN TREND: TOTAL FUNDING (PRIMARY STORY)
# --------------------------------------------------

q4_funding = (
    alt.Chart(q4_df)
    .mark_line(point=True, strokeWidth=3)
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y(
            "total_amount:Q", title="Total NSF funding ($)", axis=alt.Axis(format="~s")
        ),
        color=alt.condition(year_sel, alt.value("#1f77b4"), alt.value("#b0c4de")),
        tooltip=[
            alt.Tooltip("year:O", title="Year"),
            alt.Tooltip("total_amount:Q", title="Total funding ($)", format=",.0f"),
            alt.Tooltip("grants_count:Q", title="Number of grants"),
        ],
    )
    .add_params(year_sel)
    .properties(
        width=750,
        height=300,
        title="Q4 — Evolution of total NSF funding over the last 5 years",
    )
)

# --------------------------------------------------
# CONTEXT: NUMBER OF GRANTS (SECONDARY STORY)
# --------------------------------------------------

q4_grants = (
    alt.Chart(q4_df)
    .mark_bar()
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("grants_count:Q", title="Number of grants"),
        color=alt.condition(year_sel, alt.value("#ff7f0e"), alt.value("#ffd8b1")),
        tooltip=[
            alt.Tooltip("year:O"),
            alt.Tooltip("grants_count:Q", title="Number of grants"),
            alt.Tooltip("total_amount:Q", title="Total funding ($)", format=",.0f"),
        ],
    )
    .add_params(year_sel)
    .properties(width=750, height=200, title="Number of grants per year (context)")
)

# --------------------------------------------------
# FINAL COMPOSITION
# --------------------------------------------------

(q4_funding & q4_grants)

In [168]:
# ---- sizes (tweakable) ----
W_BIG = 560
H_BIG = 520
W_SMALL = 330

# LEFT (BIG): breakdown by directorate
breakdown = (
    alt.Chart(q4_dir)
    .mark_bar()
    .encode(
        x=alt.X("total_amount:Q", title="Funding ($)", axis=alt.Axis(format="~s")),
        y=alt.Y(
            "directorate:N",
            sort="-x",
            title="Directorate",
            axis=alt.Axis(labelLimit=180)  # helps a bit
        ),
        tooltip=[
            alt.Tooltip("directorate:N", title="Directorate"),
            alt.Tooltip("total_amount:Q", title="Funding", format="$,.0f"),
            alt.Tooltip("grants_count:Q", title="Grants"),
        ],
        color=alt.Color("total_amount:Q", scale=alt.Scale(scheme="blues"), legend=None),
    )
    .transform_filter(year_select)
    .properties(width=W_BIG, height=H_BIG, title="Breakdown by directorate (selected year)")
)

# RIGHT (SMALL): line over time
line = (
    alt.Chart(q4_year)
    .mark_line(point=True, strokeWidth=3)
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("total_amount:Q", title="Total funding ($)", axis=alt.Axis(format="~s")),
        tooltip=[
            alt.Tooltip("year:O", title="Year"),
            alt.Tooltip("total_amount:Q", title="Total funding", format="$,.0f"),
            alt.Tooltip("grants_count:Q", title="Grants count"),
            alt.Tooltip("yoy_pct:Q", title="YoY %", format="+.1f"),
        ],
    )
    .properties(width=W_SMALL, height=220, title="Total funding over time")
)

highlight = (
    alt.Chart(q4_year)
    .mark_point(size=180, filled=True)
    .encode(x="year:O", y="total_amount:Q", color=alt.value("#1f77b4"))
    .transform_filter(year_select)
)

line_block = (line + highlight).add_params(year_select)

# KPIs (same as before, just keep width aligned)
kpi_totals = kpi_totals.properties(width=W_SMALL, height=140)
kpi_yoy = kpi_yoy.properties(width=W_SMALL, height=110)

right_col = alt.vconcat(line_block, kpi_totals, kpi_yoy, spacing=12)

q4_v2 = (
    (breakdown | right_col)
    .configure_view(stroke=None)
    .configure_concat(spacing=24)
)

q4_v2


NameError: name 'q4_dir' is not defined
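
The `NameError` shows that `q4_dir` did not exist when this cell ran. The cell also relies on `q4_year`, `year_select`, `kpi_totals` and `kpi_yoy`, none of which are defined in the cells shown here. As a hedged sketch only, the two data frames could be rebuilt along the following lines, assuming they are a per-year-and-directorate aggregate and a per-year aggregate with a year-over-year percentage (inferred from the `yoy_pct` tooltip). The helper name `build_q4_tables` and the tiny demo frame are hypothetical.

```python
import pandas as pd

# Hypothetical miniature stand-in for df_grants (the real frame comes from the cleaned CSV)
df_demo = pd.DataFrame(
    {
        "award_id": [1, 2, 3, 4, 5],
        "year": [2020, 2020, 2021, 2021, 2021],
        "directorate": ["ENG", "BIO", "ENG", "ENG", "BIO"],
        "award_amount": [100.0, 300.0, 200.0, 100.0, 200.0],
    }
)


def build_q4_tables(df):
    """Return (q4_dir, q4_year) aggregates assumed by the dashboard cell."""
    # Funding and grant count per year and directorate (feeds the breakdown bars)
    q4_dir = df.groupby(["year", "directorate"], as_index=False).agg(
        total_amount=("award_amount", "sum"), grants_count=("award_id", "count")
    )
    # Funding and grant count per year (feeds the line chart)
    q4_year = (
        df.groupby("year", as_index=False)
        .agg(total_amount=("award_amount", "sum"), grants_count=("award_id", "count"))
        .sort_values("year")
        .reset_index(drop=True)
    )
    # Year-over-year change in percent; the first year has no predecessor, so it is NaN
    q4_year["yoy_pct"] = q4_year["total_amount"].pct_change() * 100
    return q4_dir, q4_year


q4_dir_demo, q4_year_demo = build_q4_tables(df_demo)
print(q4_year_demo)
```

On the real data this would be called as `q4_dir, q4_year = build_q4_tables(df_grants)`. The selection and the KPI charts would still need to be defined before the cell can run.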

In [194]:
import altair as alt
import pandas as pd

# 0. SETUP
alt.data_transformers.enable("default")

# 1. DATA PREPARATION
# We need the finest granularity to allow filtering by both State and Directorate
q4_df = (
    df_grants.groupby(["year", "state", "directorate"])
    .agg(total_amount=("award_amount", "sum"), grants_count=("award_id", "count"))
    .reset_index()
)

# 2. CREATE DROPDOWN SELECTIONS

# A. State Selector
states = sorted(q4_df["state"].unique())
state_input = alt.binding_select(
    options=[None] + states,  # None adds the "All" option
    labels=["All States"] + states,
    name="Select State: ",
)
state_select = alt.selection_point(fields=["state"], bind=state_input)

# B. Directorate Selector
dirs = sorted(q4_df["directorate"].unique())
dir_input = alt.binding_select(
    options=[None] + dirs,
    labels=["All Directorates"] + dirs,
    name="Select Directorate: ",
)
dir_select = alt.selection_point(fields=["directorate"], bind=dir_input)

# 3. MAIN CHART: EVOLUTION AREA CHART
# We use an Area chart to emphasize the "Volume" of funding over time
evolution_chart = (
    alt.Chart(q4_df)
    .mark_area(
        line={"color": "#4c78a8"},  # Darker blue line on top
        color=alt.Gradient(
            gradient="linear",
            stops=[
                alt.GradientStop(color="#4c78a8", offset=0),
                alt.GradientStop(color="white", offset=1),
            ],
            x1=1,
            x2=1,
            y1=1,
            y2=0,
        ),
        opacity=0.6,
    )
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y(
            "sum(total_amount):Q", title="Total Funding ($)", axis=alt.Axis(format="~s")
        ),
        tooltip=[
            alt.Tooltip("year:O", title="Year"),
            alt.Tooltip("sum(total_amount):Q", title="Total Funding", format="$,.0f"),
            alt.Tooltip("sum(grants_count):Q", title="Grants Count"),
        ],
    )
    .add_params(state_select, dir_select)
    .transform_filter(state_select)
    .transform_filter(dir_select)
    .properties(width=600, height=300, title="Evolution of Funding (Filtered View)")
)

# Add points on top for better hover interaction
points = (
    alt.Chart(q4_df)
    .mark_circle(size=60, color="#4c78a8")
    .encode(
        x="year:O",
        y="sum(total_amount):Q",
        tooltip=[
            alt.Tooltip("year:O"),
            alt.Tooltip("sum(total_amount):Q", format="$,.0f"),
            alt.Tooltip("sum(grants_count):Q"),
        ],
    )
    .transform_filter(state_select)
    .transform_filter(dir_select)
)

# 4. KPI CARDS (Dynamic)
# Base chart for KPIs
base_kpi = alt.Chart(q4_df).transform_filter(state_select).transform_filter(dir_select)

# KPI 1: Total Funding (Sum of the filtered view)
kpi_fund_text = base_kpi.mark_text(
    align="center", fontSize=24, fontWeight="bold", color="#4c78a8"
).encode(text=alt.Text("sum(total_amount):Q", format="$,.2s"))
kpi_fund_label = base_kpi.mark_text(
    align="center", fontSize=12, color="gray", dy=-20
).encode(text=alt.value("Total Funding (Selected)"))
kpi_fund = (kpi_fund_label + kpi_fund_text).properties(width=150, height=80)

# KPI 2: Total Grants
kpi_count_text = base_kpi.mark_text(
    align="center", fontSize=24, fontWeight="bold", color="#4c78a8"
).encode(text=alt.Text("sum(grants_count):Q", format=","))
kpi_count_label = base_kpi.mark_text(
    align="center", fontSize=12, color="gray", dy=-20
).encode(text=alt.value("Total Grants (Selected)"))
kpi_count = (kpi_count_label + kpi_count_text).properties(width=150, height=80)


# 5. ASSEMBLE
# Chart on Left | KPIs on Right (Stacked Vertically)
chart_layer = evolution_chart + points
kpi_col = alt.vconcat(kpi_fund, kpi_count, spacing=20)

final_q4 = (
    (chart_layer | kpi_col)
    .resolve_scale(color="independent")
)

final_q4

## Q5: For a selected state, how have the grants evolved? Are there cancelled grants?

In [169]:
import altair as alt
import pandas as pd

# 0. SETUP
alt.data_transformers.enable("default")

# 1. LOAD DATA
# Ensure you have these files in your folder
df_grants = pd.read_csv("NSF_Grants_Last5Years_Clean.csv")
df_trump = pd.read_csv("trump17-21-csv.csv")

# Clean columns just in case
df_grants.columns = df_grants.columns.str.strip()
df_trump.columns = df_trump.columns.str.strip()

# 2. AGGREGATIONS
q5_grants = (
    df_grants.groupby(["state", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)

q5_trump = (
    df_trump.groupby(["state", "year"])
    .agg(
        cancelled_count=("award_id", "count"), cancelled_amount=("award_amount", "sum")
    )
    .reset_index()
)

# 3. CREATE SELECTION (Dropdown)
# Get list of states for the dropdown
states = sorted(q5_grants["state"].unique())
state_dropdown = alt.binding_select(options=states, name="Select State: ")

try:
    # Modern Altair
    state_selection = alt.selection_point(
        fields=["state"], bind=state_dropdown, value=[{"state": "CA"}]
    )
except AttributeError:
    # Older Altair
    state_selection = alt.selection_single(
        fields=["state"], bind=state_dropdown, init={"state": "CA"}
    )

# 4. CHART DEFINITIONS

# Chart A: Total Funding (Line)
q5_amount_line = (
    alt.Chart(q5_grants)
    .mark_line(point=True, color="#4c78a8")  # Standard Blue
    .encode(
        x=alt.X("year:O", title="Year (last 5 years)"),
        y=alt.Y(
            "total_amount:Q", title="Total Funding ($)", axis=alt.Axis(format="~s")
        ),
        tooltip=[
            alt.Tooltip("state:N"),
            alt.Tooltip("year:O"),
            alt.Tooltip("total_amount:Q", title="Total funding ($)", format=",.0f"),
            alt.Tooltip("grants_count:Q", title="Number of grants"),
        ],
    )
    .add_selection(state_selection)  # <--- Selection added here
    .transform_filter(state_selection)
    .properties(
        width=750, height=200, title="Q5 — Selected State: Total Funding Evolution"
    )
)

# Chart B: Number of Grants (Bar)
q5_count_bar = (
    alt.Chart(q5_grants)
    .mark_bar(color="#72b7b2")  # Teal
    .encode(
        x=alt.X("year:O", title="Year (last 5 years)"),
        y=alt.Y("grants_count:Q", title="Number of grants"),
        tooltip=[
            alt.Tooltip("state:N"),
            alt.Tooltip("year:O"),
            alt.Tooltip("grants_count:Q", title="Number of grants"),
            alt.Tooltip("total_amount:Q", title="Total funding ($)", format=",.0f"),
        ],
    )
    .transform_filter(state_selection)  # Listens to the same selection
    .properties(width=750, height=150, title="Grant Count Evolution")
)

# Chart C: Cancelled Grants (Bar - Red)
q5_cancelled = (
    alt.Chart(q5_trump)
    .mark_bar(color="#e45756")  # Red
    .encode(
        x=alt.X("year:O", title="Year (Trump era)"),
        y=alt.Y("cancelled_count:Q", title="Cancelled grants"),
        tooltip=[
            alt.Tooltip("state:N"),
            alt.Tooltip("year:O"),
            alt.Tooltip("cancelled_count:Q", title="Cancelled grants"),
            alt.Tooltip(
                "cancelled_amount:Q", title="Cancelled amount ($)", format=",.0f"
            ),
        ],
    )
    .transform_filter(state_selection)  # Listens to the same selection
    .properties(width=750, height=150, title="Trump Era (2017–2021): Cancelled Grants")
)

# 5. ASSEMBLE
q5_v1 = q5_amount_line & q5_count_bar & q5_cancelled

q5_v1

Deprecated since `altair=5.0.0`. Use add_params instead.
  .add_selection(state_selection)  # <--- Selection added here
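
The warning appears because `add_selection` is deprecated in favor of `add_params` in Altair 5, while the `try`/`except` in the cell only switches the selection constructor. One possible way to keep a single code path for both versions is a small wrapper such as the hypothetical `attach_params` below, which checks which method the chart object offers. It is a sketch, not part of the notebook.

```python
def attach_params(chart, *params):
    """Attach selection parameters with whichever method the chart object provides."""
    if hasattr(chart, "add_params"):
        # Altair 5 and later
        return chart.add_params(*params)
    # Altair 4 fallback
    return chart.add_selection(*params)
```

With this helper, a call such as `attach_params(alt.Chart(q5_grants).mark_line(), state_selection)` could stand in for the direct `.add_selection(...)` call.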


In [170]:
import altair as alt
import pandas as pd

alt.data_transformers.enable("default")

q5_grants = (
    df_grants.groupby(["state", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)

q5_trump = (
    df_trump.groupby(["state", "year"])
    .agg(cancelled_count=("award_id", "count"), cancelled_amount=("award_amount", "sum"))
    .reset_index()
)

states = sorted(q5_grants["state"].dropna().unique())
state_dropdown = alt.binding_select(options=states, name="Select State: ")

min_year = int(q5_grants["year"].min())
max_year = int(q5_grants["year"].max())
year_slider = alt.binding_range(min=min_year, max=max_year, step=1, name="Select Year: ")

try:
    state_sel = alt.selection_point(fields=["state"], bind=state_dropdown, value=[{"state": "CA"}])
    year_sel = alt.selection_point(fields=["year"], bind=year_slider, value=[{"year": max_year}])
except AttributeError:
    state_sel = alt.selection_single(fields=["state"], bind=state_dropdown, init={"state": "CA"})
    year_sel = alt.selection_single(fields=["year"], bind=year_slider, init={"year": max_year})

# LEFT BIG

W_BIG, H_BIG = 560, 420
W_SMALL = 340

funding_line = (
    alt.Chart(q5_grants)
    .mark_line(point=True, strokeWidth=3, color="#4c78a8")
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("total_amount:Q", title="Total Funding ($)", axis=alt.Axis(format="~s")),
        tooltip=[
            alt.Tooltip("state:N"),
            alt.Tooltip("year:O"),
            alt.Tooltip("total_amount:Q", title="Total funding", format="$,.0f"),
            alt.Tooltip("grants_count:Q", title="Grants"),
        ],
    )
    .transform_filter(state_sel)
    .properties(width=W_BIG, height=H_BIG, title="Q5 — Funding evolution (selected state)")
)

# highlight selected year on the line
funding_highlight = (
    alt.Chart(q5_grants)
    .mark_point(size=180, filled=True, color="#1f77b4")
    .encode(x="year:O", y="total_amount:Q")
    .transform_filter(state_sel)
    .transform_filter(year_sel)
)

left = (funding_line + funding_highlight).add_params(state_sel, year_sel)

# 4) RIGHT TOP: KPI

kpi_base = alt.Chart(q5_grants).transform_filter(state_sel).transform_filter(year_sel)

kpi_label_1 = kpi_base.mark_text(align="center", color="#888", fontSize=13, dy=-18).encode(
    x=alt.value(170), y=alt.value(45), text=alt.value("Total Funding (Selected Year)")
)
kpi_value_1 = kpi_base.mark_text(align="center", color="#333", fontSize=22, fontWeight="bold", dy=8).encode(
    x=alt.value(170), y=alt.value(45), text=alt.Text("total_amount:Q", format="$,.0f")
)

kpi_label_2 = kpi_base.mark_text(align="center", color="#888", fontSize=13, dy=-18).encode(
    x=alt.value(170), y=alt.value(95), text=alt.value("Grants Count (Selected Year)")
)
kpi_value_2 = kpi_base.mark_text(align="center", color="#333", fontSize=20, fontWeight="bold", dy=8).encode(
    x=alt.value(170), y=alt.value(95), text=alt.Text("grants_count:Q", format=",.0f")
)

kpi_year = (kpi_label_1 + kpi_value_1 + kpi_label_2 + kpi_value_2).properties(width=W_SMALL, height=135)

# RIGHT MID:

count_chart = (
    alt.Chart(q5_grants)
    .mark_bar(color="#72b7b2")
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("grants_count:Q", title="Grants"),
        tooltip=[alt.Tooltip("year:O"), alt.Tooltip("grants_count:Q", title="Grants")],
        opacity=alt.condition(year_sel, alt.value(1.0), alt.value(0.35))
    )
    .transform_filter(state_sel)
    .properties(width=W_SMALL, height=150, title="Grant count (selected state)")
)

# RIGHT BOTTOM: Trump-era cancellations 

cancel_chart = (
    alt.Chart(q5_trump)
    .mark_bar(color="#e45756")
    .encode(
        x=alt.X("year:O", title="Year (Trump era)"),
        y=alt.Y("cancelled_count:Q", title="Cancelled grants"),
        tooltip=[
            alt.Tooltip("year:O"),
            alt.Tooltip("cancelled_count:Q", title="Cancelled grants"),
            alt.Tooltip("cancelled_amount:Q", title="Lost funding", format="$,.0f"),
        ],
    )
    .transform_filter(state_sel)
    .properties(width=W_SMALL, height=160, title="Trump era cancellations (2017–2021)")
)

cancel_kpi_base = alt.Chart(q5_trump).transform_filter(state_sel)

cancel_label = cancel_kpi_base.mark_text(align="center", color="#888", fontSize=13, dy=-18).encode(
    x=alt.value(170), y=alt.value(55), text=alt.value("Total Lost Funding (Trump era)")
)
cancel_value = cancel_kpi_base.mark_text(align="center", color="#333", fontSize=22, fontWeight="bold", dy=8).encode(
    x=alt.value(170), y=alt.value(55), text=alt.Text("sum(cancelled_amount):Q", format="$,.0f")
)

cancel_kpi = (cancel_label + cancel_value).properties(width=W_SMALL, height=95)

# Assemble (dashboard layout)

right = alt.vconcat(kpi_year, count_chart, cancel_chart, cancel_kpi, spacing=12)

q5_v2 = (
    (left | right)
    .configure_view(stroke=None)
    .configure_concat(spacing=24)
)

q5_v2


In [171]:
import altair as alt
import pandas as pd

# 0. SETUP
alt.data_transformers.enable("default")

# 1. DATA PREPARATION
q5_grants = (
    df_grants.groupby(["state", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)

q5_trump = (
    df_trump.groupby(["state", "year"])
    .agg(
        cancelled_count=("award_id", "count"), cancelled_amount=("award_amount", "sum")
    )
    .reset_index()
)

# 2. SELECTION
states = sorted(q5_grants["state"].dropna().unique())
state_input = alt.binding_select(options=states, name="Select State: ")

try:
    state_select = alt.selection_point(
        fields=["state"], bind=state_input, value=[{"state": "CA"}]
    )
except AttributeError:
    # Fallback for older Altair versions
    state_select = alt.selection_single(
        fields=["state"], bind=state_input, init={"state": "CA"}
    )

# 3. TOP CHART: EVOLUTION (Dual Axis)
# Note: We use .transform_filter(state_select) here, but we DO NOT add .add_params() yet.
base_evolution = (
    alt.Chart(q5_grants)
    .transform_filter(state_select)
    .encode(x=alt.X("year:O", title=None))
)

# Layer A: Grant Count (Bars)
bar_vol = base_evolution.mark_bar(color="#9ecae1", opacity=0.6).encode(
    y=alt.Y(
        "grants_count:Q", title="Number of Grants", axis=alt.Axis(titleColor="#6baed6")
    ),
    tooltip=["year", "grants_count"],
)

# Layer B: Total Funding (Line)
line_val = base_evolution.mark_line(color="#08519c", strokeWidth=3, point=True).encode(
    y=alt.Y(
        "total_amount:Q",
        title="Total Funding ($)",
        axis=alt.Axis(format="~s", titleColor="#08519c"),
    ),
    tooltip=["year", alt.Tooltip("total_amount", format="$,.0f")],
)

# Combine layers
evolution_chart = (
    alt.layer(bar_vol, line_val)
    .resolve_scale(y="independent")
    .properties(width=600, height=250, title="Q5: State Evolution (Volume vs. Value)")
)

# 4. BOTTOM CHART: CANCELLATIONS
# Again, just filtering, no add_params
cancel_chart = (
    alt.Chart(q5_trump)
    .mark_bar(color="#de2d26")
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("cancelled_count:Q", title="Cancelled Grants"),
        tooltip=[
            alt.Tooltip("year:O"),
            alt.Tooltip("cancelled_count:Q", title="Cancelled Count"),
            alt.Tooltip("cancelled_amount:Q", title="Lost Funding", format="$,.0f"),
        ],
    )
    .transform_filter(state_select)
    .properties(width=600, height=120, title="Impact: Cancelled Grants (Trump Era)")
)

# 5. ASSEMBLE
# ✅ CRITICAL FIX: Add the parameter ONCE to the final concatenated object
q5_v3 = (
    alt.vconcat(evolution_chart, cancel_chart)
    .add_params(state_select)
    .configure_concat(spacing=5)
    .configure_view(stroke=None)
)

q5_v3

In [172]:
import altair as alt
import pandas as pd
import itertools

# 0. SETUP
alt.data_transformers.enable("default")

# 1. DATA AGGREGATION
# A. Grants (Base)
q5_grants = (
    df_grants.groupby(["state", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)

# B. Trump Cancellations
q5_trump_agg = (
    df_trump.groupby(["state", "year"])
    .agg(
        cancelled_count=("award_id", "count"), cancelled_amount=("award_amount", "sum")
    )
    .reset_index()
)

# 2. CREATE MASTER TIMELINE (The Fix)
# We need every combination of State + Year from BOTH datasets to ensure no year is dropped.

# Get all unique states
all_states = pd.concat([q5_grants["state"], q5_trump_agg["state"]]).unique()

# Get all unique years (Union of Grants years + Trump years)
all_years = pd.concat([q5_grants["year"], q5_trump_agg["year"]]).unique()

# Create a scaffold (All States x All Years)
# This ensures that if 2019 exists in Trump but not Grants, it's still in the chart.
master_rows = list(itertools.product(all_states, all_years))
q5_master = pd.DataFrame(master_rows, columns=["state", "year"])

# 3. MERGE DATA ONTO MASTER
# Merge Grants Data
q5_master = q5_master.merge(q5_grants, on=["state", "year"], how="left")

# Merge Trump Data
q5_master = q5_master.merge(
    q5_trump_agg, on=["state", "year"], how="left", suffixes=("_grant", "_cancel")
)

# Fill NaNs with 0 so the lines/bars draw continuously
q5_master = q5_master.fillna(0)


# 4. SELECTION
# Use the master list of states
states_list = sorted(all_states)
state_input = alt.binding_select(options=states_list, name="Select State: ")

try:
    state_select = alt.selection_point(
        fields=["state"], bind=state_input, value=[{"state": "CA"}]
    )
except AttributeError:
    state_select = alt.selection_single(
        fields=["state"], bind=state_input, init={"state": "CA"}
    )


# 5. TOP CHART: EVOLUTION (Dual Axis)
base_evolution = (
    alt.Chart(q5_master)
    .transform_filter(state_select)
    .encode(x=alt.X("year:O", title=None))
)

# Layer A: Grant Count (Bars)
bar_vol = base_evolution.mark_bar(color="#9ecae1", opacity=0.6).encode(
    y=alt.Y(
        "grants_count:Q", title="Number of Grants", axis=alt.Axis(titleColor="#6baed6")
    ),
    tooltip=["year", "grants_count"],
)

# Layer B: Total Funding (Line)
line_val = base_evolution.mark_line(color="#08519c", strokeWidth=3, point=True).encode(
    y=alt.Y(
        "total_amount:Q",
        title="Total Funding ($)",
        axis=alt.Axis(format="~s", titleColor="#08519c"),
    ),
    tooltip=["year", alt.Tooltip("total_amount", format="$,.0f")],
)

evolution_chart = (
    alt.layer(bar_vol, line_val)
    .resolve_scale(y="independent")
    .properties(width=600, height=250, title="Q5: State Evolution (Volume vs. Value)")
)


# 6. BOTTOM CHART: CANCELLATIONS (Symmetrical Timeline)
# Now uses the SAME q5_master dataset, so the X-axis is identical
cancel_chart = (
    alt.Chart(q5_master)
    .mark_bar(color="#de2d26")
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("cancelled_count:Q", title="Cancelled Grants"),
        tooltip=[
            alt.Tooltip("year:O"),
            alt.Tooltip("cancelled_count:Q", title="Cancelled Count"),
            alt.Tooltip("cancelled_amount:Q", title="Lost Funding", format="$,.0f"),
        ],
    )
    .transform_filter(state_select)
    .properties(width=600, height=120, title="Impact: Cancelled Grants (Trump Era)")
)


# 7. ASSEMBLE
q5_v4 = (
    alt.vconcat(evolution_chart, cancel_chart)
    .add_params(state_select)
    .configure_concat(spacing=5)
    .configure_view(stroke=None)
)

q5_v4

In [195]:
import altair as alt
import pandas as pd
import itertools

# 0. SETUP
alt.data_transformers.enable("default")

# 1. DATA AGGREGATION
# A. Grants (Base)
q5_grants = (
    df_grants.groupby(["state", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)

# B. Trump Cancellations
q5_trump_agg = (
    df_trump.groupby(["state", "year"])
    .agg(
        cancelled_count=("award_id", "count"), cancelled_amount=("award_amount", "sum")
    )
    .reset_index()
)

# 2. CREATE MASTER TIMELINE
# Ensure we have rows for every year (2017-2024), even if data is missing in one file
all_states = pd.concat([q5_grants["state"], q5_trump_agg["state"]]).unique()
all_years = pd.concat([q5_grants["year"], q5_trump_agg["year"]]).unique()

master_rows = list(itertools.product(all_states, all_years))
q5_master = pd.DataFrame(master_rows, columns=["state", "year"])

# Merge Grants Data
q5_master = q5_master.merge(q5_grants, on=["state", "year"], how="left")

# Merge Trump Data
q5_master = q5_master.merge(q5_trump_agg, on=["state", "year"], how="left")

# Fill NaNs with 0
q5_master = q5_master.fillna(0)


# 3. SELECTION
states_list = sorted(all_states)
state_input = alt.binding_select(options=states_list, name="Select State: ")

try:
    state_select = alt.selection_point(
        fields=["state"], bind=state_input, value=[{"state": "CA"}]
    )
except AttributeError:
    state_select = alt.selection_single(
        fields=["state"], bind=state_input, init={"state": "CA"}
    )


# 4. TOP CHART: EVOLUTION (Blue)
# Bars = Volume (Count), Line = Value ($)
base_evolution = (
    alt.Chart(q5_master)
    .transform_filter(state_select)
    .encode(x=alt.X("year:O", title=None))
)

bar_vol = base_evolution.mark_bar(color="#9ecae1", opacity=0.6).encode(
    y=alt.Y(
        "grants_count:Q", title="Number of Grants", axis=alt.Axis(titleColor="#6baed6")
    ),
    tooltip=["year", "grants_count"],
)

line_val = base_evolution.mark_line(color="#08519c", strokeWidth=3, point=True).encode(
    y=alt.Y(
        "total_amount:Q",
        title="Total Funding ($)",
        axis=alt.Axis(format="~s", titleColor="#08519c"),
    ),
    tooltip=["year", alt.Tooltip("total_amount", format="$,.0f")],
)

evolution_chart = (
    alt.layer(bar_vol, line_val)
    .resolve_scale(y="independent")
    .properties(width=600, height=250, title="Q5: State Evolution (Volume vs. Value)")
)


# 5. BOTTOM CHART: IMPACT (Red)
# Bars = Volume (Cancelled Count), Line = Value (Lost Funding)
base_cancel = (
    alt.Chart(q5_master)
    .transform_filter(state_select)
    .encode(x=alt.X("year:O", title="Year"))
)

# Layer A: Cancelled Count (Bars - Light Red)
cancel_bar = base_cancel.mark_bar(color="#fc9272", opacity=0.6).encode(
    y=alt.Y(
        "cancelled_count:Q",
        title="Cancelled Grants",
        axis=alt.Axis(titleColor="#fc9272"),
    ),
    tooltip=["year", "cancelled_count"],
)

# Layer B: Lost Funding (Line - Dark Red)
cancel_line = base_cancel.mark_line(color="#de2d26", strokeWidth=3, point=True).encode(
    y=alt.Y(
        "cancelled_amount:Q",
        title="Lost Funding ($)",
        axis=alt.Axis(format="~s", titleColor="#de2d26"),
    ),
    tooltip=["year", alt.Tooltip("cancelled_amount", format="$,.0f")],
)

cancel_chart = (
    alt.layer(cancel_bar, cancel_line)
    .resolve_scale(y="independent")
    .properties(width=600, height=150, title="Impact: Cancellations & Lost Funding")
)


# 6. ASSEMBLE
final_q5 = (
    alt.vconcat(evolution_chart, cancel_chart)
    .add_params(state_select)
)

final_q5

To show how funding evolved in a selected state, I designed a **vertically stacked, dual-axis dashboard**. The top panel shows the funding ecosystem and the bottom panel shows the cancellation impact, both on the same timeline so the two can be compared year by year.

Both charts use a **Dual-Axis approach** to combine **Volume** (Bars: Grant Counts) and **Value** (Lines: Funding Amount). This matters because a drop in the number of grants does not always mean a drop in funding; plotting the two metrics separately shows when they diverge.

A key technical decision was constructing a '**Master Timeline**' (2017–2024) that merges the active grants and cancellation datasets onto a single state-by-year scaffold. Years missing from either dataset are filled with zeros, which keeps the x-axis identical in both panels and avoids gaps in the bars and lines. One caveat follows from the zero fill: a zero can mean either no activity or no coverage in the source file, so the top panel is only informative for the years the baseline dataset actually contains (2020–2024). With that in mind, the layout places the 'Trump Era' cancellations in the context of the state's overall funding.
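
Because the master timeline fills missing state-year combinations with zeros, "no data" and "zero activity" look the same on the chart. A minimal sketch of one way to keep the two apart is to record coverage flags before filling. The flag columns `in_grants` and `in_cancels` and the toy frames are hypothetical and are not part of the notebook.

```python
import itertools

import pandas as pd

# Toy aggregates standing in for q5_grants and q5_trump_agg (illustrative numbers)
grants = pd.DataFrame(
    {"state": ["CA", "CA"], "year": [2020, 2021], "grants_count": [5, 7]}
)
cancels = pd.DataFrame({"state": ["CA"], "year": [2018], "cancelled_count": [2]})

# Scaffold: every state crossed with every year seen in either dataset
states = pd.concat([grants["state"], cancels["state"]]).unique()
years = sorted(pd.concat([grants["year"], cancels["year"]]).unique())
scaffold = pd.DataFrame(
    list(itertools.product(states, years)), columns=["state", "year"]
)

master = scaffold.merge(grants, on=["state", "year"], how="left").merge(
    cancels, on=["state", "year"], how="left"
)

# Record which rows had real data BEFORE the zero fill hides the difference
master["in_grants"] = master["grants_count"].notna()
master["in_cancels"] = master["cancelled_count"].notna()
master = master.fillna({"grants_count": 0, "cancelled_count": 0})

print(master)
```

The flags could then drive a tooltip or a lighter bar opacity for years that a dataset does not cover.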

## Q6: Select some attribute that has not been mentioned previously (e.g., party governing, population of the state, number of funded institutions in the state…), and let the user interactively explore the information around the attribute to get insights.

For Question 6, state population was selected as an additional attribute not previously used in the analysis. Population is a meaningful contextual variable that enables deeper exploration beyond absolute grant counts or total funding amounts. By relating funding to population size, users can investigate whether certain states receive disproportionately high or low levels of funding relative to their population, revealing patterns that are not visible through raw totals alone.

The attribute supports per capita comparisons, outlier detection, and interactive exploration of funding per capita across states and years. It also joins directly onto the state-based aggregations used in earlier questions.

In [174]:
import pandas as pd
import altair as alt

df_pop_raw = pd.read_csv("estimated_population.csv")
df_abbr_raw = pd.read_csv("state_abbreviations.csv")

# Clean column names
df_pop_raw.columns = df_pop_raw.columns.str.strip()
df_abbr_raw.columns = df_abbr_raw.columns.str.strip()

# Ensure we have a 'state' column (full names like Alabama, Alaska, ...)
if "state" not in df_pop_raw.columns:
    raise ValueError(f"estimated_population.csv must have a 'state' column. Found: {list(df_pop_raw.columns)}")

pop_cols = [c for c in df_pop_raw.columns if c.lower().startswith("pop_")]
if not pop_cols:
    raise ValueError(f"Could not find pop_YYYY columns. Found: {list(df_pop_raw.columns)}")

df_pop_long = df_pop_raw.melt(
    id_vars=["state"],
    value_vars=pop_cols,
    var_name="year",
    value_name="population"
)

# Convert year from 'pop_2020' -> 2020
df_pop_long["year"] = df_pop_long["year"].str.replace("pop_", "", regex=False).astype(int)

# Convert population to numeric
df_pop_long["population"] = pd.to_numeric(df_pop_long["population"], errors="coerce")

# Keep only 2020-2024 (safety)
df_pop_long = df_pop_long[df_pop_long["year"].between(2020, 2024)]

# Standardize state name
df_pop_long = df_pop_long.rename(columns={"state": "state_name"})
df_pop_long["state_name"] = df_pop_long["state_name"].astype(str).str.strip()

df_abbr = df_abbr_raw.copy()

# Detect likely columns for state name and abbreviation
name_candidates = [c for c in df_abbr.columns if "name" in c.lower() or ("state" in c.lower() and "abbr" not in c.lower())]
abbr_candidates = [c for c in df_abbr.columns if "abbr" in c.lower() or "code" in c.lower()]

if not name_candidates or not abbr_candidates:
    raise ValueError(
        "state_abbreviations.csv must contain columns for full state name and abbreviation.\n"
        f"Columns found: {list(df_abbr.columns)}"
    )

name_col = name_candidates[0]
abbr_col = abbr_candidates[0]

df_abbr = df_abbr.rename(columns={name_col: "state_name", abbr_col: "state"})
df_abbr["state_name"] = df_abbr["state_name"].astype(str).str.strip()
df_abbr["state"] = df_abbr["state"].astype(str).str.strip()

# Normalize case (helps joins)
df_abbr["state_name_key"] = df_abbr["state_name"].str.lower()
df_pop_long["state_name_key"] = df_pop_long["state_name"].str.lower()

# Join to add 2-letter codes
df_pop_long = df_pop_long.merge(
    df_abbr[["state_name_key", "state"]],
    on="state_name_key",
    how="left"
)

# Debug unmapped names
unmapped = df_pop_long[df_pop_long["state"].isna()]["state_name"].dropna().unique()
print("Unmapped population state names (should be empty):", unmapped[:20], " ... total:", len(unmapped))

# Keep only mapped rows + required cols
df_pop_long = df_pop_long.dropna(subset=["state", "population"])
df_pop_long = df_pop_long[["state", "year", "population"]].copy()

print("Population long shape:", df_pop_long.shape)
print("Population states:", df_pop_long["state"].nunique(), "Years:", sorted(df_pop_long["year"].unique()))

# sanity: ensure expected columns exist
required_cols = {"state", "year", "award_amount", "award_id"}
missing = required_cols - set(df_grants.columns)
if missing:
    raise ValueError(f"df_grants missing required columns: {missing}. Found: {list(df_grants.columns)}")

# Ensure year numeric
df_grants["year"] = pd.to_numeric(df_grants["year"], errors="coerce").astype("Int64")

q6_grants = (
    df_grants
    .dropna(subset=["state", "year", "award_amount"])
    .groupby(["state", "year"])
    .agg(
        total_amount=("award_amount", "sum"),
        grants_count=("award_id", "count")
    )
    .reset_index()
)

print("NSF aggregated shape:", q6_grants.shape)
print("NSF states:", q6_grants["state"].nunique(), "Years:", sorted(q6_grants["year"].unique()))

q6_df = q6_grants.merge(df_pop_long, on=["state", "year"], how="inner")
q6_df["funding_per_capita"] = q6_df["total_amount"] / q6_df["population"]

print("Merged q6_df shape:", q6_df.shape)
print("Merged states:", q6_df["state"].nunique(), "Years:", sorted(q6_df["year"].unique()))

# If still empty, show mismatch hints
if q6_df.empty:
    print("\nq6_df is EMPTY. Debug hints:")
    print("Sample NSF states:", sorted(q6_grants["state"].unique())[:15])
    print("Sample POP states:", sorted(df_pop_long["state"].unique())[:15])
    print("Sample NSF years:", sorted(q6_grants["year"].unique()))
    print("Sample POP years:", sorted(df_pop_long["year"].unique()))
    raise ValueError("Merge produced empty q6_df. See debug hints above.")

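# Plain Python ints keep the dropdown options and the initial value JSON-friendly
year_options = [int(y) for y in sorted(q6_df["year"].unique())]
year_selection = alt.selection_point(
    fields=["year"],
    bind=alt.binding_select(options=year_options, name="Year: "),
    value=[{"year": year_options[-1]}]  # start on the latest year
)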

state_click = alt.selection_point(fields=["state"], empty="all")

q6_overview = (
    alt.Chart(q6_df)
    .mark_bar()
    .encode(
        x=alt.X("state:N", sort="-y", title="State"),
        y=alt.Y("funding_per_capita:Q", title="Funding per capita ($)", axis=alt.Axis(format="~s")),
        color=alt.condition(
            state_click,
            alt.Color("funding_per_capita:Q", scale=alt.Scale(scheme="purples"), title="Funding per capita"),
            alt.value("lightgray")
        ),
        tooltip=[
            alt.Tooltip("state:N", title="State"),
            alt.Tooltip("year:O", title="Year"),
            alt.Tooltip("population:Q", title="Population", format=",.0f"),
            alt.Tooltip("total_amount:Q", title="Total funding ($)", format=",.0f"),
            alt.Tooltip("funding_per_capita:Q", title="Funding per capita ($)", format=",.2f"),
            alt.Tooltip("grants_count:Q", title="Grants count"),
        ]
    )
    .add_params(year_selection, state_click)
    .transform_filter(year_selection)
    .properties(width=750, height=380, title="Q6 — NSF funding per capita by state (select year + click a state)")
)



q6_trend = (
    alt.Chart(q6_df)
    .mark_line(point=True)
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("funding_per_capita:Q", title="Funding per capita ($)", axis=alt.Axis(format="~s")),
        tooltip=[
            alt.Tooltip("state:N", title="State"),
            alt.Tooltip("year:O", title="Year"),
            alt.Tooltip("funding_per_capita:Q", title="Funding per capita ($)", format=",.2f"),
            alt.Tooltip("total_amount:Q", title="Total funding ($)", format=",.0f"),
            alt.Tooltip("population:Q", title="Population", format=",.0f"),
            alt.Tooltip("grants_count:Q", title="Grants count"),
        ]
    )
    .transform_filter(state_click)
    .properties(width=750, height=200, title="Selected state — funding per capita over time (2020–2024)")
)

(q6_overview & q6_trend)



Unmapped population state names (should be empty): []  ... total: 0
Population long shape: (255, 3)
Population states: 51 Years: [2020, 2021, 2022, 2023, 2024]
NSF aggregated shape: (231, 4)
NSF states: 52 Years: [2020, 2021, 2022, 2023, 2024]
Merged q6_df shape: (222, 6)
Merged states: 50 Years: [2020, 2021, 2022, 2023, 2024]
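
The counts above show that the inner merge is lossy: the NSF aggregate has 52 state codes and the population table has 51, but only 50 survive the join (222 of 231 rows). One way to list the keys that fail to match is an outer merge with an indicator column. The sketch below uses small toy frames in place of `q6_grants` and `df_pop_long`; the codes and values are made up for illustration and are not taken from the real data.

```python
import pandas as pd

# Toy stand-ins for q6_grants and df_pop_long (illustrative values only).
grants = pd.DataFrame({
    "state": ["CA", "CA", "PR", "TX"],
    "year": [2020, 2021, 2020, 2020],
    "total_amount": [100.0, 120.0, 5.0, 80.0],
})
pop = pd.DataFrame({
    "state": ["CA", "CA", "TX", "WY"],
    "year": [2020, 2021, 2020, 2020],
    "population": [39_500_000, 39_200_000, 29_200_000, 577_000],
})

# how="outer" keeps every key; indicator=True adds a "_merge" column that
# records whether each key was found on the left, the right, or both sides.
check = grants.merge(pop, on=["state", "year"], how="outer", indicator=True)

only_grants = sorted(check.loc[check["_merge"] == "left_only", "state"].unique())
only_pop = sorted(check.loc[check["_merge"] == "right_only", "state"].unique())

print("In grants but not in population:", only_grants)
print("In population but not in grants:", only_pop)
```

Running the same check on the real frames would show which codes are dropped and whether the cause is a missing population row or a mismatched abbreviation.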


In [175]:
# Q6
import pandas as pd
import altair as alt

alt.data_transformers.enable("default")


df_pop_raw = pd.read_csv("estimated_population.csv")
df_abbr_raw = pd.read_csv("state_abbreviations.csv")

df_pop_raw.columns = df_pop_raw.columns.str.strip()
df_abbr_raw.columns = df_abbr_raw.columns.str.strip()

pop_cols = [c for c in df_pop_raw.columns if c.lower().startswith("pop_")]
if "state" not in df_pop_raw.columns or not pop_cols:
    raise ValueError(
        "estimated_population.csv must have 'state' + columns like pop_2020..pop_2024. "
        f"Found: {list(df_pop_raw.columns)}"
    )

df_pop_long = df_pop_raw.melt(
    id_vars=["state"],
    value_vars=pop_cols,
    var_name="year",
    value_name="population"
)
df_pop_long["year"] = df_pop_long["year"].str.replace("pop_", "", regex=False).astype(int)
df_pop_long["population"] = pd.to_numeric(df_pop_long["population"], errors="coerce")
df_pop_long = df_pop_long[df_pop_long["year"].between(2020, 2024)].copy()

df_pop_long = df_pop_long.rename(columns={"state": "state_name"})
df_pop_long["state_name"] = df_pop_long["state_name"].astype(str).str.strip()

# detect columns in abbreviations
df_abbr = df_abbr_raw.copy()
name_candidates = [c for c in df_abbr.columns if "name" in c.lower() or (c.lower() == "state")]
abbr_candidates = [c for c in df_abbr.columns if "abbr" in c.lower() or "code" in c.lower()]

if not name_candidates or not abbr_candidates:
    raise ValueError(
        "state_abbreviations.csv must contain full state-name + abbreviation columns.\n"
        f"Columns found: {list(df_abbr.columns)}"
    )

name_col = name_candidates[0]
abbr_col = abbr_candidates[0]

df_abbr = df_abbr.rename(columns={name_col: "state_name", abbr_col: "state"})
df_abbr["state_name"] = df_abbr["state_name"].astype(str).str.strip()
df_abbr["state"] = df_abbr["state"].astype(str).str.strip()

df_abbr["state_name_key"] = df_abbr["state_name"].str.lower()
df_pop_long["state_name_key"] = df_pop_long["state_name"].str.lower()

df_pop_long = df_pop_long.merge(
    df_abbr[["state_name_key", "state"]],
    on="state_name_key",
    how="left"
)

df_pop_long = df_pop_long.dropna(subset=["state", "population"])
df_pop_long = df_pop_long[["state", "year", "population"]].copy()

# NSF AGGREGATION 

required_cols = {"state", "year", "award_amount", "award_id"}
missing = required_cols - set(df_grants.columns)
if missing:
    raise ValueError(f"df_grants missing required columns: {missing}. Found: {list(df_grants.columns)}")

# Nullable Int64 keeps unparseable years as <NA> instead of raising; they are dropped below
df_grants["year"] = pd.to_numeric(df_grants["year"], errors="coerce").astype("Int64")

q6_grants = (
    df_grants.dropna(subset=["state", "year", "award_amount"])
    .groupby(["state", "year"])
    .agg(
        total_amount=("award_amount", "sum"),
        grants_count=("award_id", "count"),
    )
    .reset_index()
)

q6_df = q6_grants.merge(df_pop_long, on=["state", "year"], how="inner")
q6_df["funding_per_capita"] = q6_df["total_amount"] / q6_df["population"]

if q6_df.empty:
    raise ValueError("q6_df is empty after merge. Check state mapping + years.")

min_year = int(q6_df["year"].min())
max_year = int(q6_df["year"].max())

slider = alt.binding_range(min=min_year, max=max_year, step=1, name="Select Year: ")

try:
    year_select = alt.selection_point(
        name="year_select", fields=["year"], bind=slider, value=[{"year": max_year}]
    )
    # empty="all": while no state is clicked, every state passes the filter
    state_select = alt.selection_point(
        name="state_select", fields=["state"], empty="all", on="click", clear="dblclick"
    )
except AttributeError:
    year_select = alt.selection_single(
        name="year_select", fields=["year"], bind=slider, init={"year": max_year}
    )
    state_select = alt.selection_single(fields=["state"], empty="all")

# LEFT bars
bars = (
    alt.Chart(q6_df)
    .mark_bar()
    .encode(
        x=alt.X("state:N", sort="-y", title="State"),
        y=alt.Y("funding_per_capita:Q", title="Funding per capita ($/person)", axis=alt.Axis(format=",.2f")),
        color=alt.Color("funding_per_capita:Q", scale=alt.Scale(scheme="purples"), legend=None),
        stroke=alt.condition(state_select, alt.value("black"), alt.value(None)),
        strokeWidth=alt.condition(state_select, alt.value(1.5), alt.value(0)),
        tooltip=[
            alt.Tooltip("state:N", title="State"),
            alt.Tooltip("year:O", title="Year"),
            alt.Tooltip("population:Q", title="Population", format=",.0f"),
            alt.Tooltip("total_amount:Q", title="Total NSF funding", format="$,.0f"),
            alt.Tooltip("funding_per_capita:Q", title="Funding per capita", format="$,.2f"),
            alt.Tooltip("grants_count:Q", title="Grants count"),
        ],
    )
    .transform_filter(year_select)
    .add_params(year_select, state_select)
    .properties(width=560, height=420, title="Q6 — Funding per capita by state (click a bar)")
)

# RIGHT panel

history = (
    alt.Chart(q6_df)
    .mark_line(point=True, strokeWidth=3)
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("funding_per_capita:Q", title="$/person", axis=alt.Axis(format=",.2f")),
        color=alt.value("#6a3d9a"),
        tooltip=[
            alt.Tooltip("state:N", title="State"),
            alt.Tooltip("year:O", title="Year"),
            alt.Tooltip("funding_per_capita:Q", title="Funding per capita", format="$,.2f"),
            alt.Tooltip("total_amount:Q", title="Total NSF funding", format="$,.0f"),
            alt.Tooltip("population:Q", title="Population", format=",.0f"),
            alt.Tooltip("grants_count:Q", title="Grants count"),
        ],
    )
    .transform_filter(state_select)
    .properties(width=340, height=200, title="History (selected state)")
)

# KPI block 
kpi_base = alt.Chart(q6_df).transform_filter(year_select).transform_filter(state_select)

def kpi(label_text, expr, fmt, y):
    lbl = kpi_base.mark_text(align="center", color="#888", fontSize=13, dy=-10).encode(
        text=alt.value(label_text), x=alt.value(170), y=alt.value(y)
    )
    val = kpi_base.mark_text(align="center", color="#333", fontSize=22, fontWeight="bold", dy=12).encode(
        text=alt.Text(expr, format=fmt), x=alt.value(170), y=alt.value(y)
    )
    return lbl + val

kpi_panel = (
    alt.Chart(pd.DataFrame({"x":[0]}))
    .mark_rect(opacity=0)
    .encode()
    .properties(width=340, height=220)
    + kpi("Funding per capita (selected year)", "mean(funding_per_capita):Q", "$,.2f", y=60)
    + kpi("Total NSF funding (selected year)", "sum(total_amount):Q", "$,.0f", y=135)
    + kpi("Population (selected year)", "mean(population):Q", ",.0f", y=210)
)

# Placeholder text when no state is selected
placeholder = (
    alt.Chart(pd.DataFrame())
    .mark_text(align="center", color="#999", fontSize=14)
    .encode(text="msg:N", x=alt.value(170), y=alt.value(120))
    .properties(width=340, height=200, title="History (selected state)")
)

# Show placeholder when empty selection; otherwise show history
right_top = alt.layer(
    placeholder.transform_filter(~state_select),
    history.transform_filter(state_select),
)

# For KPIs: show a light placeholder block when empty
kpi_placeholder = (
    alt.Chart(pd.DataFrame())
    .mark_text(align="center", color="#bbb", fontSize=13)
    .encode(text="msg:N", x=alt.value(170), y=alt.value(110))
    .properties(width=340, height=220)
)

right_bottom = alt.layer(
    kpi_placeholder.transform_filter(~state_select),
    kpi_panel.transform_filter(state_select),
)

right_col = right_top & right_bottom


q6_v2 = (
    (bars | right_col)
    .resolve_scale(color="independent")
    .configure_view(stroke=None)
    .configure_concat(spacing=18)
)

q6_v2


In [196]:
import pandas as pd
import altair as alt

alt.data_transformers.enable("default")

# 1. LOAD & CLEAN
df_pop_raw = pd.read_csv("estimated_population.csv")
df_abbr_raw = pd.read_csv("state_abbreviations.csv")

# Clean cols
df_pop_raw.columns = df_pop_raw.columns.str.strip()
df_abbr_raw.columns = df_abbr_raw.columns.str.strip()

# Melt Population
pop_cols = [c for c in df_pop_raw.columns if c.lower().startswith("pop_")]
df_pop_long = df_pop_raw.melt(
    id_vars=["state"], value_vars=pop_cols, var_name="year", value_name="population"
)
df_pop_long["year"] = (
    df_pop_long["year"].str.replace("pop_", "", regex=False).astype(int)
)
df_pop_long["population"] = pd.to_numeric(df_pop_long["population"], errors="coerce")
df_pop_long = df_pop_long[df_pop_long["year"].between(2020, 2024)].copy()
df_pop_long = df_pop_long.rename(columns={"state": "state_name"})
df_pop_long["state_name"] = df_pop_long["state_name"].astype(str).str.strip()

# Clean Abbreviations
df_abbr = df_abbr_raw.copy()
name_col = [
    c for c in df_abbr.columns if "name" in c.lower() or (c.lower() == "state")
][0]
abbr_col = [c for c in df_abbr.columns if "abbr" in c.lower() or "code" in c.lower()][0]
df_abbr = df_abbr.rename(columns={name_col: "state_name", abbr_col: "state"})
df_abbr["state_name"] = df_abbr["state_name"].astype(str).str.strip()
df_abbr["state"] = df_abbr["state"].astype(str).str.strip()

# Merge Pop + Abbr
df_abbr["state_name_key"] = df_abbr["state_name"].str.lower()
df_pop_long["state_name_key"] = df_pop_long["state_name"].str.lower()
df_pop_long = df_pop_long.merge(
    df_abbr[["state_name_key", "state"]], on="state_name_key", how="left"
)
df_pop_long = df_pop_long.dropna(subset=["state", "population"])
df_pop_long = df_pop_long[["state", "year", "population"]].copy()

# NSF Data Prep
# Nullable Int64 keeps unparseable years as <NA> instead of raising; they are dropped below
df_grants["year"] = pd.to_numeric(df_grants["year"], errors="coerce").astype("Int64")
q6_grants = (
    df_grants.dropna(subset=["state", "year", "award_amount"])
    .groupby(["state", "year"])
    .agg(total_amount=("award_amount", "sum"), grants_count=("award_id", "count"))
    .reset_index()
)

# Merge All
q6_df = q6_grants.merge(df_pop_long, on=["state", "year"], how="inner")
q6_df["funding_per_capita"] = q6_df["total_amount"] / q6_df["population"]

# CALCULATE NATIONAL AVERAGES
us_avg = q6_df.groupby("year")["funding_per_capita"].mean().reset_index()
us_avg = us_avg.rename(columns={"funding_per_capita": "us_avg_per_capita"})
q6_df = q6_df.merge(us_avg, on="year", how="left")


# 2. INTERACTION SETUP
min_year = int(q6_df["year"].min())
max_year = int(q6_df["year"].max())

slider = alt.binding_range(min=min_year, max=max_year, step=1, name="Select Year: ")

try:
    year_select = alt.selection_point(
        name="year_select", fields=["year"], bind=slider, value=[{"year": max_year}]
    )
    state_select = alt.selection_point(
        name="state_select", fields=["state"], empty="all", on="click", clear="dblclick"
    )
except AttributeError:
    year_select = alt.selection_single(
        name="year_select", fields=["year"], bind=slider, init={"year": max_year}
    )
    state_select = alt.selection_single(fields=["state"], empty="all")


# 3. LEFT CHART: SCATTER PLOT
base_scatter = alt.Chart(q6_df).transform_filter(year_select)

points = (
    base_scatter.mark_circle(size=120, opacity=0.8, stroke="white", strokeWidth=1)
    .encode(
        x=alt.X("population:Q", title="State Population", axis=alt.Axis(format="~s")),
        y=alt.Y(
            "funding_per_capita:Q",
            title="Funding Per Capita ($)",
            axis=alt.Axis(format="$,.0f"),
        ),
        color=alt.condition(
            state_select,
            alt.Color(
                "funding_per_capita:Q", scale=alt.Scale(scheme="viridis"), legend=None
            ),
            alt.value("lightgray"),
        ),
        size=alt.condition(state_select, alt.value(150), alt.value(80)),
        tooltip=[
            alt.Tooltip("state:N", title="State"),
            alt.Tooltip("population:Q", format=",.0f"),
            alt.Tooltip("total_amount:Q", format="$,.0f", title="Total Funding"),
            alt.Tooltip("funding_per_capita:Q", format="$,.2f", title="Per Capita"),
        ],
    )
    .add_params(state_select, year_select)
)

rule = base_scatter.mark_rule(color="red", strokeDash=[5, 5], size=2).encode(
    y="mean(us_avg_per_capita):Q",
    tooltip=[
        alt.Tooltip(
            "mean(us_avg_per_capita):Q", format="$,.2f", title="National Average"
        )
    ],
)

rule_text = base_scatter.mark_text(
    align="left", dx=5, dy=-5, color="red", fontWeight="bold"
).encode(
    y=alt.Y("mean(us_avg_per_capita):Q"), x=alt.value(0), text=alt.value("National Avg")
)

left_chart = (points + rule + rule_text).properties(
    width=500, height=400, title="Efficiency Matrix: Population vs. Funding Intensity"
)


# 4. RIGHT PANEL: DETAILS & HISTORY

# A. Trend Comparison
history_base = alt.Chart(q6_df).transform_filter(state_select)

state_line = history_base.mark_line(point=True, strokeWidth=4, color="#440154").encode(
    x=alt.X("year:O", title="Year"),
    y=alt.Y("funding_per_capita:Q", title="$/Person"),
    tooltip=["year", alt.Tooltip("funding_per_capita", format="$,.2f")],
)

avg_line = (
    alt.Chart(us_avg)
    .mark_line(strokeDash=[5, 5], color="red", opacity=0.5)
    .encode(x=alt.X("year:O"), y=alt.Y("us_avg_per_capita:Q"))
)

# Filter by the clicked state; with empty="all", no click means every state is drawn
history_chart = (
    (avg_line + state_line)
    .transform_filter(state_select)
    .properties(
        width=350, height=200, title="History: Selected State vs. National Avg (Red)"
    )
)

# B. KPI Block
kpi_base = alt.Chart(q6_df).transform_filter(year_select).transform_filter(state_select)


def make_kpi(label, value_col, fmt, y_pos):
    lbl = kpi_base.mark_text(align="center", color="#666", fontSize=12).encode(
        text=alt.value(label), x=alt.value(175), y=alt.value(y_pos)
    )
    val = kpi_base.mark_text(
        align="center", color="#333", fontSize=20, fontWeight="bold"
    ).encode(
        text=alt.Text(value_col, format=fmt), x=alt.value(175), y=alt.value(y_pos + 20)
    )
    return lbl + val


kpis = (
    alt.Chart(pd.DataFrame({"x": [0]}))
    .mark_rect(opacity=0)
    .properties(width=350, height=180)
    + make_kpi("State Population", "mean(population):Q", ",.0f", 20)
    + make_kpi("Total Funding Received", "sum(total_amount):Q", "$,.2s", 80)
    + make_kpi("Per Capita Funding", "mean(funding_per_capita):Q", "$,.2f", 140)
)


# 5. ASSEMBLE
# Stack the history chart and KPI block vertically. With empty="all", an empty selection shows all states rather than blank panels.
right_panel = alt.vconcat(history_chart, kpis, spacing=20)

final_q6 = (
    (left_chart | right_panel).configure_view(stroke=None).configure_concat(spacing=20)
)

final_q6

In [217]:
import pandas as pd
import altair as alt

alt.data_transformers.enable("default")

# 1. LOAD & CLEAN
df_pop_raw = pd.read_csv("estimated_population.csv")
df_abbr_raw = pd.read_csv("state_abbreviations.csv")

# Clean cols
df_pop_raw.columns = df_pop_raw.columns.str.strip()
df_abbr_raw.columns = df_abbr_raw.columns.str.strip()

# Melt Population
pop_cols = [c for c in df_pop_raw.columns if c.lower().startswith("pop_")]
df_pop_long = df_pop_raw.melt(
    id_vars=["state"], value_vars=pop_cols, var_name="year", value_name="population"
)
df_pop_long["year"] = (
    df_pop_long["year"].str.replace("pop_", "", regex=False).astype(int)
)
df_pop_long["population"] = pd.to_numeric(df_pop_long["population"], errors="coerce")
df_pop_long = df_pop_long[df_pop_long["year"].between(2020, 2024)].copy()
df_pop_long = df_pop_long.rename(columns={"state": "state_name"})
df_pop_long["state_name"] = df_pop_long["state_name"].astype(str).str.strip()

# Clean Abbreviations
df_abbr = df_abbr_raw.copy()
name_col = [
    c for c in df_abbr.columns if "name" in c.lower() or (c.lower() == "state")
][0]
abbr_col = [c for c in df_abbr.columns if "abbr" in c.lower() or "code" in c.lower()][0]
df_abbr = df_abbr.rename(columns={name_col: "state_name", abbr_col: "state"})
df_abbr["state_name"] = df_abbr["state_name"].astype(str).str.strip()
df_abbr["state"] = df_abbr["state"].astype(str).str.strip()

# Merge Pop + Abbr
df_abbr["state_name_key"] = df_abbr["state_name"].str.lower()
df_pop_long["state_name_key"] = df_pop_long["state_name"].str.lower()
df_pop_long = df_pop_long.merge(
    df_abbr[["state_name_key", "state"]], on="state_name_key", how="left"
)
df_pop_long = df_pop_long.dropna(subset=["state", "population"])
df_pop_long = df_pop_long[["state", "year", "population"]].copy()

# NSF Data Prep
# Nullable Int64 keeps unparseable years as <NA> instead of raising; they are dropped below
df_grants["year"] = pd.to_numeric(df_grants["year"], errors="coerce").astype("Int64")
q6_grants = (
    df_grants.dropna(subset=["state", "year", "award_amount"])
    .groupby(["state", "year"])
    .agg(total_amount=("award_amount", "sum"), grants_count=("award_id", "count"))
    .reset_index()
)

# Merge All
q6_df = q6_grants.merge(df_pop_long, on=["state", "year"], how="inner")
q6_df["funding_per_capita"] = q6_df["total_amount"] / q6_df["population"]

# CALCULATE NATIONAL AVERAGES
us_avg = q6_df.groupby("year")["funding_per_capita"].mean().reset_index()
us_avg = us_avg.rename(columns={"funding_per_capita": "us_avg_per_capita"})
q6_df = q6_df.merge(us_avg, on="year", how="left")


# 2. INTERACTION SETUP
min_year = int(q6_df["year"].min())
max_year = int(q6_df["year"].max())

slider = alt.binding_range(min=min_year, max=max_year, step=1, name="Select Year: ")

try:
    year_select = alt.selection_point(
        name="year_select", fields=["year"], bind=slider, value=[{"year": max_year}]
    )
    state_select = alt.selection_point(
        name="state_select", fields=["state"], empty="all", on="click", clear="dblclick"
    )
except AttributeError:
    year_select = alt.selection_single(
        name="year_select", fields=["year"], bind=slider, init={"year": max_year}
    )
    state_select = alt.selection_single(fields=["state"], empty="all")


# 3. LEFT CHART: SCATTER PLOT
base_scatter = alt.Chart(q6_df).transform_filter(year_select)

points = (
    base_scatter.mark_circle(size=120, opacity=0.8, stroke="white", strokeWidth=1)
    .encode(
        x=alt.X("population:Q", title="State Population", axis=alt.Axis(format="~s")),
        y=alt.Y(
            "funding_per_capita:Q",
            title="Funding Per Capita ($)",
            axis=alt.Axis(format="$,.0f"),
        ),
        color=alt.condition(
            state_select,
            alt.Color(
                "funding_per_capita:Q", scale=alt.Scale(scheme="viridis"), legend=None
            ),
            alt.value("lightgray"),
        ),
        size=alt.condition(state_select, alt.value(150), alt.value(80)),
        tooltip=[
            alt.Tooltip("state:N", title="State"),
            alt.Tooltip("population:Q", format=",.0f"),
            alt.Tooltip("total_amount:Q", format="$,.0f", title="Total Funding"),
            alt.Tooltip("funding_per_capita:Q", format="$,.2f", title="Per Capita"),
        ],
    )
    .add_params(state_select, year_select)
    .interactive()  # enable zoom and pan on the scatter plot
)

rule = base_scatter.mark_rule(color="red", strokeDash=[5, 5], size=2).encode(
    y="mean(us_avg_per_capita):Q",
    tooltip=[
        alt.Tooltip(
            "mean(us_avg_per_capita):Q", format="$,.2f", title="National Average"
        )
    ],
)

rule_text = base_scatter.mark_text(
    align="left", dx=5, dy=-5, color="red", fontWeight="bold"
).encode(
    y=alt.Y("mean(us_avg_per_capita):Q"), x=alt.value(0), text=alt.value("National Avg")
)

left_chart = (points + rule + rule_text).properties(
    width=500, height=400, title="Efficiency Matrix: Population vs. Funding Intensity"
)


# 4. RIGHT PANEL: DETAILS & HISTORY

# A. Trend Comparison
history_base = alt.Chart(q6_df).transform_filter(state_select)

state_line = history_base.mark_line(point=True, strokeWidth=4, color="#440154").encode(
    x=alt.X("year:O", title="Year"),
    y=alt.Y("funding_per_capita:Q", title="$/Person"),
    tooltip=["year", alt.Tooltip("funding_per_capita", format="$,.2f")],
)

avg_line = (
    alt.Chart(us_avg)
    .mark_line(strokeDash=[5, 5], color="red", opacity=0.5)
    .encode(x=alt.X("year:O"), y=alt.Y("us_avg_per_capita:Q"))
)

# Filter by the clicked state; with empty="all", no click means every state is drawn
history_chart = (
    (avg_line + state_line)
    .transform_filter(state_select)
    .properties(
        width=350, height=200, title="History: Selected State vs. National Avg (Red)"
    )
)

# B. KPI Block
kpi_base = alt.Chart(q6_df).transform_filter(year_select).transform_filter(state_select)


def make_kpi(label, value_col, fmt, y_pos):
    lbl = kpi_base.mark_text(align="center", color="#666", fontSize=12).encode(
        text=alt.value(label), x=alt.value(175), y=alt.value(y_pos)
    )
    val = kpi_base.mark_text(
        align="center", color="#333", fontSize=20, fontWeight="bold"
    ).encode(
        text=alt.Text(value_col, format=fmt), x=alt.value(175), y=alt.value(y_pos + 20)
    )
    return lbl + val


kpis = (
    alt.Chart(pd.DataFrame({"x": [0]}))
    .mark_rect(opacity=0)
    .properties(width=350, height=180)
    + make_kpi("State Population", "mean(population):Q", ",.0f", 20)
    + make_kpi("Total Funding Received", "sum(total_amount):Q", "$,.2s", 80)
    + make_kpi("Per Capita Funding", "mean(funding_per_capita):Q", "$,.2f", 140)
)


# 5. ASSEMBLE
# Stack the history chart and KPI block vertically. With empty="all", an empty selection shows all states rather than blank panels.
right_panel = alt.vconcat(history_chart, kpis, spacing=20)

final_q6 = (
    (left_chart | right_panel).configure_view(stroke=None).configure_concat(spacing=20)
)

final_q6

In [218]:
import pandas as pd
import altair as alt

alt.data_transformers.enable("default")

# 1. LOAD & CLEAN
df_pop_raw = pd.read_csv("estimated_population.csv")
df_abbr_raw = pd.read_csv("state_abbreviations.csv")

# Clean cols
df_pop_raw.columns = df_pop_raw.columns.str.strip()
df_abbr_raw.columns = df_abbr_raw.columns.str.strip()

# Melt Population
pop_cols = [c for c in df_pop_raw.columns if c.lower().startswith("pop_")]
df_pop_long = df_pop_raw.melt(
    id_vars=["state"], value_vars=pop_cols, var_name="year", value_name="population"
)
df_pop_long["year"] = (
    df_pop_long["year"].str.replace("pop_", "", regex=False).astype(int)
)
df_pop_long["population"] = pd.to_numeric(df_pop_long["population"], errors="coerce")
df_pop_long = df_pop_long[df_pop_long["year"].between(2020, 2024)].copy()
df_pop_long = df_pop_long.rename(columns={"state": "state_name"})
df_pop_long["state_name"] = df_pop_long["state_name"].astype(str).str.strip()

# Clean Abbreviations
df_abbr = df_abbr_raw.copy()
name_col = [
    c for c in df_abbr.columns if "name" in c.lower() or (c.lower() == "state")
][0]
abbr_col = [c for c in df_abbr.columns if "abbr" in c.lower() or "code" in c.lower()][0]
df_abbr = df_abbr.rename(columns={name_col: "state_name", abbr_col: "state"})
df_abbr["state_name"] = df_abbr["state_name"].astype(str).str.strip()
df_abbr["state"] = df_abbr["state"].astype(str).str.strip()

# Merge Pop + Abbr
df_abbr["state_name_key"] = df_abbr["state_name"].str.lower()
df_pop_long["state_name_key"] = df_pop_long["state_name"].str.lower()
df_pop_long = df_pop_long.merge(
    df_abbr[["state_name_key", "state"]], on="state_name_key", how="left"
)
df_pop_long = df_pop_long.dropna(subset=["state", "population"])
df_pop_long = df_pop_long[["state", "year", "population"]].copy()

# NSF Data Prep
# Nullable Int64 keeps unparseable years as <NA> instead of raising; they are dropped below
df_grants["year"] = pd.to_numeric(df_grants["year"], errors="coerce").astype("Int64")

# --- NEW: AGGREGATE "ALL YEARS" DATA (Year 0) ---
# A. Yearly Data
q6_yearly = (
    df_grants.dropna(subset=["state", "year", "award_amount"])
    .groupby(["state", "year"])
    .agg(total_amount=("award_amount", "sum"), grants_count=("award_id", "count"))
    .reset_index()
)

# B. Global Data (Year 0)
# Note: For population, we can't sum populations over years.
# We'll take the *average* population of the state over the 5 years for the "Total" view.
q6_total_grants = (
    df_grants.dropna(subset=["state", "award_amount"])
    .groupby(["state"])
    .agg(total_amount=("award_amount", "sum"), grants_count=("award_id", "count"))
    .reset_index()
)
q6_total_grants["year"] = 0

# C. Population for Year 0 (mean over the available years)
pop_avg = df_pop_long.groupby("state")["population"].mean().reset_index()
pop_avg["year"] = 0

# Combine Pop Data (Yearly + Year 0)
df_pop_full = pd.concat([df_pop_long, pop_avg], ignore_index=True)

# Combine Grants Data (Yearly + Year 0)
q6_grants_full = pd.concat([q6_yearly, q6_total_grants], ignore_index=True)

# Merge All
q6_df = q6_grants_full.merge(df_pop_full, on=["state", "year"], how="inner")
q6_df["funding_per_capita"] = q6_df["total_amount"] / q6_df["population"]

# CALCULATE NATIONAL AVERAGES
us_avg = q6_df.groupby("year")["funding_per_capita"].mean().reset_index()
us_avg = us_avg.rename(columns={"funding_per_capita": "us_avg_per_capita"})
q6_df = q6_df.merge(us_avg, on="year", how="left")


# 2. INTERACTION SETUP
# Get list of years from data (which now includes 0)
years = sorted(q6_df["year"].unique())  # [0, 2020, 2021, 2022, 2023, 2024]
year_options = years
year_labels = ["All Years (Total)"] + [str(y) for y in years if y != 0]

input_element = alt.binding_select(
    options=year_options, labels=year_labels, name="Select Year: "
)

try:
    year_select = alt.selection_point(
        name="year_select", fields=["year"], bind=input_element, value=[{"year": 0}]
    )
    state_select = alt.selection_point(
        name="state_select", fields=["state"], empty="all", on="click", clear="dblclick"
    )
except AttributeError:
    year_select = alt.selection_single(
        name="year_select", fields=["year"], bind=input_element, init={"year": 0}
    )
    state_select = alt.selection_single(fields=["state"], empty="all")


# 3. LEFT CHART: SCATTER PLOT
base_scatter = alt.Chart(q6_df).transform_filter(year_select)

points = (
    base_scatter.mark_circle(size=120, opacity=0.8, stroke="white", strokeWidth=1)
    .encode(
        x=alt.X("population:Q", title="State Population", axis=alt.Axis(format="~s")),
        y=alt.Y(
            "funding_per_capita:Q",
            title="Funding Per Capita ($)",
            axis=alt.Axis(format="$,.0f"),
        ),
        color=alt.condition(
            state_select,
            alt.Color(
                "funding_per_capita:Q", scale=alt.Scale(scheme="viridis"), legend=None
            ),
            alt.value("lightgray"),
        ),
        size=alt.condition(state_select, alt.value(150), alt.value(80)),
        tooltip=[
            alt.Tooltip("state:N", title="State"),
            alt.Tooltip("population:Q", format=",.0f"),
            alt.Tooltip("total_amount:Q", format="$,.0f", title="Total Funding"),
            alt.Tooltip("funding_per_capita:Q", format="$,.2f", title="Per Capita"),
        ],
    )
    .add_params(state_select, year_select)
    .interactive()
)

rule = base_scatter.mark_rule(color="red", strokeDash=[5, 5], size=2).encode(
    y="mean(us_avg_per_capita):Q",
    tooltip=[
        alt.Tooltip(
            "mean(us_avg_per_capita):Q", format="$,.2f", title="National Average"
        )
    ],
)

rule_text = base_scatter.mark_text(
    align="left", dx=5, dy=-5, color="red", fontWeight="bold"
).encode(
    y=alt.Y("mean(us_avg_per_capita):Q"), x=alt.value(0), text=alt.value("National Avg")
)

left_chart = (points + rule + rule_text).properties(
    width=500, height=400, title="Efficiency Matrix: Population vs. Funding Intensity"
)


# 4. RIGHT PANEL: DETAILS & HISTORY

# A. Trend Comparison
# NOTE: We filter OUT Year 0 for the trend line so it doesn't plot a weird point
history_base = alt.Chart(q6_df[q6_df["year"] != 0]).transform_filter(state_select)

state_line = history_base.mark_line(point=True, strokeWidth=4, color="#440154").encode(
    x=alt.X("year:O", title="Year"),
    y=alt.Y("funding_per_capita:Q", title="$/Person"),
    tooltip=["year", alt.Tooltip("funding_per_capita", format="$,.2f")],
)

avg_line = (
    alt.Chart(us_avg[us_avg["year"] != 0])
    .mark_line(strokeDash=[5, 5], color="red", opacity=0.5)
    .encode(x=alt.X("year:O"), y=alt.Y("us_avg_per_capita:Q"))
)

history_chart = (
    (avg_line + state_line)
    .transform_filter(state_select)
    .properties(
        width=350, height=200, title="History: Selected State vs. National Avg (Red)"
    )
)

# B. KPI Block
kpi_base = alt.Chart(q6_df).transform_filter(year_select).transform_filter(state_select)


def make_kpi(label, value_col, fmt, y_pos):
    lbl = kpi_base.mark_text(align="center", color="#666", fontSize=12).encode(
        text=alt.value(label), x=alt.value(175), y=alt.value(y_pos)
    )
    val = kpi_base.mark_text(
        align="center", color="#333", fontSize=20, fontWeight="bold"
    ).encode(
        text=alt.Text(value_col, format=fmt), x=alt.value(175), y=alt.value(y_pos + 20)
    )
    return lbl + val


kpis = (
    alt.Chart(pd.DataFrame({"x": [0]}))
    .mark_rect(opacity=0)
    .properties(width=350, height=180)
    + make_kpi("State Population", "mean(population):Q", ",.0f", 20)
    + make_kpi("Total Funding Received", "sum(total_amount):Q", "$,.2s", 80)
    + make_kpi("Per Capita Funding", "mean(funding_per_capita):Q", "$,.2f", 140)
)


# 5. ASSEMBLE
right_panel = alt.vconcat(history_chart, kpis, spacing=20)

final_q6 = (
    (left_chart | right_panel).configure_view(stroke=None).configure_concat(spacing=20)
)

final_q6

For the final analysis, I designed an **'Efficiency Matrix' scatter plot** that normalizes funding by state size. Plotting **Population (X)** against **Funding Per Capita (Y)** exposes structural disparities that raw totals hide: small states with high research intensity appear in the top-left, while large states that are relatively underfunded appear in the bottom-right.
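To make the normalization concrete, here is a minimal sketch of the per-capita calculation on two made-up states. The state codes and dollar figures are hypothetical and only illustrate how a ranking by raw totals can differ from a ranking by funding intensity.

```python
import pandas as pd

# Hypothetical figures for illustration only (not taken from the NSF data).
toy = pd.DataFrame(
    {
        "state": ["AA", "BB"],
        "population": [1_000_000, 20_000_000],
        "total_amount": [50_000_000, 400_000_000],
    }
)

# Same normalization as the notebook: dollars awarded per resident.
toy["funding_per_capita"] = toy["total_amount"] / toy["population"]

# Ranking by raw total and ranking by per-capita intensity can disagree.
top_by_total = toy.sort_values("total_amount", ascending=False)["state"].iloc[0]
top_by_intensity = (
    toy.sort_values("funding_per_capita", ascending=False)["state"].iloc[0]
)

print(toy[["state", "total_amount", "funding_per_capita"]])
print("Largest total:", top_by_total, "| Highest intensity:", top_by_intensity)
```

In this toy case the larger state receives more money in total, but the smaller state sits higher on the Y axis of the scatter plot.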

A dynamic **National Average Reference Line (red)** provides an immediate benchmark, so users can see which states are above the average in the selected period. The benchmark is the mean of the states' per-capita values for that period.
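The sketch below shows how the benchmark behaves on made-up data. The first computation mirrors the notebook's `groupby("year")["funding_per_capita"].mean()`, which is an unweighted mean across states. The population-weighted variant is shown only for comparison and is an assumption, not something the notebook uses.

```python
import pandas as pd

# Hypothetical figures for illustration only.
toy = pd.DataFrame(
    {
        "state": ["AA", "BB", "AA", "BB"],
        "year": [2023, 2023, 2024, 2024],
        "population": [1_000_000, 9_000_000, 1_000_000, 9_000_000],
        "total_amount": [100_000_000, 90_000_000, 50_000_000, 180_000_000],
    }
)
toy["funding_per_capita"] = toy["total_amount"] / toy["population"]

# Benchmark as computed in the notebook: unweighted mean of state values per year.
us_avg = (
    toy.groupby("year")["funding_per_capita"]
    .mean()
    .reset_index()
    .rename(columns={"funding_per_capita": "us_avg_per_capita"})
)

# Alternative for comparison (assumption): total dollars / total population.
totals = toy.groupby("year")[["total_amount", "population"]].sum()
weighted = (
    (totals["total_amount"] / totals["population"])
    .rename("weighted_per_capita")
    .reset_index()
)

print(us_avg.merge(weighted, on="year"))
```

Because every state counts equally in the unweighted mean, a small state with very high intensity pulls the red line up more than it would in a population-weighted figure.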

Crucially, I integrated an **'All Years' (Year 0)** aggregation. This lets the user toggle between a long-term structural view, which averages out yearly noise, and individual yearly snapshots. When a state is selected, the historical trend line on the right excludes the aggregate 'Year 0' row so that it shows only the actual year-by-year evolution, answering the question: 'Is this state's efficiency a consistent pattern or a one-time anomaly?'
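A minimal sketch of the 'Year 0' sentinel pattern follows. The data is made up, and the aggregation rule (funding summed over the period, population averaged) is an assumption chosen for illustration; the notebook's own Q6 aggregation may differ.

```python
import pandas as pd

# Hypothetical yearly figures for illustration only.
yearly = pd.DataFrame(
    {
        "state": ["AA", "AA", "BB", "BB"],
        "year": [2023, 2024, 2023, 2024],
        "population": [1_000_000, 1_000_000, 4_000_000, 4_000_000],
        "total_amount": [10_000_000, 30_000_000, 40_000_000, 40_000_000],
    }
)

# Collapse all years into one row per state.
# Assumption: funding is summed and population is averaged over the period.
all_years = yearly.groupby("state", as_index=False).agg(
    total_amount=("total_amount", "sum"),
    population=("population", "mean"),
)
all_years["year"] = 0  # sentinel value meaning "All Years"

full = pd.concat([yearly, all_years], ignore_index=True)
full["funding_per_capita"] = full["total_amount"] / full["population"]

# The scatter plot can show the aggregate; the trend line must exclude it.
snapshot = full[full["year"] == 0]
trend = full[full["year"] != 0]

print(snapshot[["state", "funding_per_capita"]])
print(trend[["state", "year", "funding_per_capita"]])
```

Keeping the aggregate in the same DataFrame lets one year selector drive both views, at the cost of having to filter `year != 0` wherever a true time axis is drawn.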

# Final Visualization

## Visualizations one after the other

In [198]:
# CRITICAL FIX: make sure final_q6 carries no configure_*() settings.
# Altair accepts configuration only on the top-level chart, so a final_q6 built
# with .configure_view() / .configure_concat() cannot be nested in the dashboard.
# An earlier cell defines final_q6 that way; this cell rebuilds it without them.
# Run this cell before running the dashboard_full cell.
# Correct definition: (q6_left | q6_right).resolve_scale(color="independent")

# If q6_left and q6_right exist, recreate final_q6 correctly
if 'q6_left' in globals() and 'q6_right' in globals():
    final_q6 = (q6_left | q6_right).resolve_scale(color="independent")
    print("✓ Fixed final_q6 - removed configure methods")
else:
    print("⚠ Warning: q6_left and q6_right not found. Please run Cell 55 first.")


✓ Fixed final_q6 - removed configure methods


In [206]:
# ============================================================
# SCRIPT 1 — FULL-SIZE STACKED DASHBOARD
# (One chart after another, no resizing)
# ============================================================

dashboard_full = (
    alt.vconcat(
        final_q1,
        final_q2,
        final_q3,
        final_q4,
        final_q5,
        final_q6,
        spacing=80,  # generous spacing so nothing collides
    )
    .resolve_scale(color="independent")
    .configure_view(stroke=None)
    .configure_concat(spacing=80)
)

dashboard_full

## First attempt at plotting all visualizations together

In [None]:
import pandas as pd
import altair as alt
import itertools

# ---------- Global setup ----------
alt.data_transformers.enable("default")
try:
    alt.data_transformers.disable_max_rows()
except Exception:
    pass

# ---------- Global styling ----------
BASE_CONFIG = (
    alt.Chart()
    .configure_view(stroke=None)
    .configure_axis(
        labelFont="Arial",
        titleFont="Arial",
        labelFontSize=11,
        titleFontSize=12,
        gridColor="#e6e6e6",
    )
    .configure_title(font="Arial", fontSize=14, anchor="start")
    .configure_legend(
        labelFont="Arial",
        titleFont="Arial",
        labelFontSize=11,
        titleFontSize=12,
    )
    .configure_concat(spacing=22)
)

# ============================================================
# Q1 (with unique params)
# ============================================================
q1_yearly = (
    df_grants.groupby(["state", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)

q1_total = (
    df_grants.groupby(["state"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)
q1_total["year"] = 0
q1_full = pd.concat([q1_yearly, q1_total], ignore_index=True)

q1_years = sorted(q1_yearly["year"].unique())
q1_year_options = [0] + q1_years
q1_year_labels = ["All Years (Total)"] + [str(y) for y in q1_years]

q1_input = alt.binding_select(options=q1_year_options, labels=q1_year_labels, name="Q1 - Year: ")

try:
    q1_year_select = alt.selection_point(
        name="q1_year_select", fields=["year"], bind=q1_input, value=[{"year": 0}]
    )
    q1_state_select = alt.selection_point(
        name="q1_state_select", fields=["state"], empty="all"
    )
except AttributeError:
    q1_year_select = alt.selection_single(
        name="q1_year_select", fields=["year"], bind=q1_input, init={"year": 0}
    )
    q1_state_select = alt.selection_single(
        name="q1_state_select", fields=["state"], empty="all"
    )

q1_bars = (
    alt.Chart(q1_full)
    .mark_bar()
    .encode(
        x=alt.X("state:N", sort="-y", title="State"),
        y=alt.Y("grants_count:Q", title="Number of Grants"),
        color=alt.condition(
            q1_state_select,
            alt.Color("grants_count:Q", scale=alt.Scale(scheme="blues"), legend=None),
            alt.value("#f0f0f0"),
        ),
        tooltip=[
            alt.Tooltip("state:N"),
            alt.Tooltip("year:O", title="Year"),
            alt.Tooltip("grants_count:Q", title="Grants"),
            alt.Tooltip("total_amount:Q", title="Total Amount", format="$,.0f"),
        ],
    )
    .add_params(q1_year_select, q1_state_select)
    .transform_filter(q1_year_select)
    .properties(width=430, height=260, title="Q1 — Grants Distribution (Click a bar)")
)

q1_trend = (
    alt.Chart(q1_yearly)
    .mark_line(point=True, strokeWidth=3)
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("total_amount:Q", title="Total Amount ($)", axis=alt.Axis(format="~s")),
        color=alt.value("#4c78a8"),
        tooltip=[alt.Tooltip("year:O"), alt.Tooltip("total_amount:Q", format="$,.0f")],
    )
    .transform_filter(q1_state_select)
    .properties(width=220, height=115, title="History (Selected State)")
)

q1_base_text = alt.Chart(q1_full).transform_filter(q1_year_select).transform_filter(q1_state_select)

q1_kpi_label = q1_base_text.mark_text(align="center", color="#888", fontSize=12, dy=-12).encode(
    text=alt.value("Total Funding (Selected Year)"), y=alt.value(55), x=alt.value(105)
)
q1_kpi_value = q1_base_text.mark_text(
    align="center", color="#444", fontSize=18, fontWeight="bold", dy=10
).encode(
    text=alt.Text("sum(total_amount):Q", format="$,.0f"), y=alt.value(55), x=alt.value(105)
)
q1_kpi = (q1_kpi_label + q1_kpi_value).properties(width=210, height=90)

final_q1 = (q1_bars | (q1_trend & q1_kpi)).resolve_scale(color="independent")


# ============================================================
# Q2 (unique params)
# ============================================================
q2_yearly = (
    df_grants.groupby(["directorate", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)

q2_total = (
    df_grants.groupby(["directorate"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)
q2_total["year"] = 0
q2_full = pd.concat([q2_yearly, q2_total], ignore_index=True)

q2_years = sorted(q2_yearly["year"].unique())
q2_year_options = [0] + q2_years
q2_year_labels = ["All Years (Total)"] + [str(y) for y in q2_years]

q2_input = alt.binding_select(options=q2_year_options, labels=q2_year_labels, name="Q2 - Year: ")

try:
    q2_year_select = alt.selection_point(
        name="q2_year_select", fields=["year"], bind=q2_input, value=[{"year": 0}]
    )
    q2_dir_select = alt.selection_point(
        name="q2_dir_select", fields=["directorate"], empty="all"
    )
except AttributeError:
    q2_year_select = alt.selection_single(
        name="q2_year_select", fields=["year"], bind=q2_input, init={"year": 0}
    )
    q2_dir_select = alt.selection_single(
        name="q2_dir_select", fields=["directorate"], empty="all"
    )

q2_bars = (
    alt.Chart(q2_full)
    .mark_bar()
    .encode(
        x=alt.X("grants_count:Q", title="Number of Grants"),
        y=alt.Y("directorate:N", sort="-x", title="Directorate"),
        color=alt.condition(
            q2_dir_select,
            alt.Color("grants_count:Q", scale=alt.Scale(scheme="blues"), legend=None),
            alt.value("#f0f0f0"),
        ),
        tooltip=[
            alt.Tooltip("directorate:N", title="Directorate"),
            alt.Tooltip("year:O", title="Year"),
            alt.Tooltip("grants_count:Q", title="Grants"),
            alt.Tooltip("total_amount:Q", title="Total Amount ($)", format=",.0f"),
        ],
    )
    .add_params(q2_dir_select, q2_year_select)
    .transform_filter(q2_year_select)
    .properties(width=380, height=320, title="Q2 — Grants by Directorate (Click to Filter)")
)

q2_trend = (
    alt.Chart(q2_yearly)
    .mark_line(point=True, strokeWidth=3)
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("total_amount:Q", title="Total Funding ($)", axis=alt.Axis(format="~s")),
        color=alt.value("#4c78a8"),
        tooltip=[
            alt.Tooltip("directorate:N"),
            alt.Tooltip("year:O"),
            alt.Tooltip("total_amount:Q", format="$,.0f"),
        ],
    )
    .transform_filter(q2_dir_select)
    .properties(width=260, height=120, title="Funding History (Selected Directorate)")
)

q2_base_text = alt.Chart(q2_full).transform_filter(q2_year_select).transform_filter(q2_dir_select)
q2_kpi_label = q2_base_text.mark_text(align="center", color="#888", fontSize=12, dy=-12).encode(
    text=alt.value("Total Funding (Selected Year)"), y=alt.value(55), x=alt.value(130)
)
q2_kpi_value = q2_base_text.mark_text(
    align="center", color="#444", fontSize=18, fontWeight="bold", dy=10
).encode(
    text=alt.Text("sum(total_amount):Q", format="$,.0f"), y=alt.value(55), x=alt.value(130)
)
q2_kpi = (q2_kpi_label + q2_kpi_value).properties(width=260, height=80)

q2_legend = (
    alt.Chart(q2_full)
    .mark_circle(opacity=0)
    .encode(
        color=alt.Color(
            "grants_count:Q",
            scale=alt.Scale(scheme="blues"),
            legend=alt.Legend(
                title="Grant Count Intensity",
                orient="bottom",
                direction="horizontal",
                titleAnchor="middle",
                gradientLength=180,
            ),
        )
    )
    .transform_filter(q2_year_select)
    .properties(width=260, height=40)
)

final_q2 = (q2_bars | (q2_trend & q2_kpi & q2_legend)).resolve_scale(color="independent")


# ============================================================
# Q3 (unique params)
# ============================================================
base_yearly = (
    df_grants.groupby(["directorate", "year"])
    .agg(base_count=("award_id", "count"), base_amount=("award_amount", "sum"))
    .reset_index()
)

cancel_yearly = (
    df_trump.groupby(["directorate", "year"])
    .agg(cancelled_count=("award_id", "count"), cancelled_amount=("award_amount", "sum"))
    .reset_index()
)

yearly_df = base_yearly.merge(cancel_yearly, on=["directorate", "year"], how="outer").fillna(0)

base_total = (
    df_grants.groupby(["directorate"])
    .agg(base_count=("award_id", "count"), base_amount=("award_amount", "sum"))
    .reset_index()
)
base_total["year"] = 0

cancel_total = (
    df_trump.groupby(["directorate"])
    .agg(cancelled_count=("award_id", "count"), cancelled_amount=("award_amount", "sum"))
    .reset_index()
)
cancel_total["year"] = 0

total_df = base_total.merge(cancel_total, on=["directorate", "year"], how="outer").fillna(0)

base_total_fixed = base_total[["directorate", "base_count"]].rename(columns={"base_count": "static_base_count"})
yearly_df = yearly_df.merge(base_total_fixed, on="directorate", how="left").fillna(0)

total_rows = total_df.copy()
total_rows["static_base_count"] = total_rows["base_count"]

q3_full = pd.concat([yearly_df, total_rows], ignore_index=True)
q3_full = q3_full[q3_full["year"].isin([0, 2018, 2019, 2020, 2021])]
q3_plot_full = q3_full[(q3_full["static_base_count"] > 0) | (q3_full["cancelled_count"] > 0)].copy()

q3_trend_data = (
    df_trump.groupby(["directorate", "year"])
    .agg(cancelled_count=("award_id", "count"))
    .reset_index()
)

q3_year_options = [0, 2018, 2019, 2020, 2021]
q3_year_labels = ["All Years (Total)", "2018", "2019", "2020", "2021"]
q3_input = alt.binding_select(options=q3_year_options, labels=q3_year_labels, name="Q3 - Year: ")

try:
    q3_year_select = alt.selection_point(
        name="q3_year_select", fields=["year"], bind=q3_input, value=[{"year": 0}]
    )
    q3_dir_select = alt.selection_point(
        name="q3_dir_select", fields=["directorate"], empty="all"
    )
except AttributeError:
    q3_year_select = alt.selection_single(
        name="q3_year_select", fields=["year"], bind=q3_input, init={"year": 0}
    )
    q3_dir_select = alt.selection_single(
        name="q3_dir_select", fields=["directorate"], empty="all"
    )

q3_rank = (
    alt.Chart(q3_plot_full)
    .mark_bar()
    .encode(
        x=alt.X("cancelled_count:Q", title="Cancellations"),
        y=alt.Y("directorate:N", sort="-x", title="Directorate"),
        color=alt.condition(
            q3_dir_select,
            alt.Color("cancelled_count:Q", scale=alt.Scale(scheme="blues"), legend=None),
            alt.value("#f0f0f0"),
        ),
        tooltip=[
            "directorate",
            alt.Tooltip("year:O", title="Year"),
            alt.Tooltip("cancelled_count:Q", title="Cancelled"),
            alt.Tooltip("cancelled_amount:Q", title="Lost Funding", format="$,.0f"),
        ],
    )
    .add_params(q3_dir_select, q3_year_select)
    .transform_filter(q3_year_select)
    .properties(width=260, height=260, title="Q3 — Ranking: Total Cancellations")
)

q3_scatter = (
    alt.Chart(q3_plot_full)
    .mark_circle(stroke="black", strokeWidth=0.5, opacity=0.8)
    .encode(
        x=alt.X("static_base_count:Q", title="Directorate Size (Total Grants)"),
        y=alt.Y("cancelled_count:Q", title="Cancellations (Selected Year)"),
        size=alt.Size("cancelled_amount:Q", scale=alt.Scale(range=[40, 350]), legend=None),
        color=alt.condition(q3_dir_select, alt.value("#4c78a8"), alt.value("#f0f0f0")),
        tooltip=[
            alt.Tooltip("directorate:N"),
            alt.Tooltip("year:O", title="Year"),
            alt.Tooltip("static_base_count:Q", title="Size"),
            alt.Tooltip("cancelled_count:Q", title="Cancelled"),
            alt.Tooltip("cancelled_amount:Q", title="Lost Funding", format="$,.0f"),
        ],
    )
    .add_params(q3_dir_select, q3_year_select)
    .transform_filter(q3_year_select)
    .properties(width=330, height=150, title="Context: Volume vs. Cancellations")
    .interactive()
)

q3_timeline = (
    alt.Chart(q3_trend_data)
    .mark_line(point=True, color="#4c78a8")
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("cancelled_count:Q", title="Cancellations"),
        tooltip=["directorate", "year", "cancelled_count"],
    )
    .transform_filter(q3_dir_select)
    .properties(width=330, height=110, title="Timeline: When did it happen?")
)

final_q3 = (q3_rank | (q3_scatter & q3_timeline)).resolve_scale(color="independent")


# ============================================================
# Q4 (unique params)
# ============================================================
q4_df = (
    df_grants.groupby(["year", "state", "directorate"])
    .agg(total_amount=("award_amount", "sum"), grants_count=("award_id", "count"))
    .reset_index()
)

q4_states = sorted(q4_df["state"].unique())
q4_state_input = alt.binding_select(
    options=[None] + q4_states,
    labels=["All States"] + q4_states,
    name="Q4 - State: ",
)
q4_state_select = alt.selection_point(name="q4_state_select", fields=["state"], bind=q4_state_input)

q4_dirs = sorted(q4_df["directorate"].unique())
q4_dir_input = alt.binding_select(
    options=[None] + q4_dirs,
    labels=["All Directorates"] + q4_dirs,
    name="Q4 - Directorate: ",
)
q4_dir_select = alt.selection_point(name="q4_dir_select", fields=["directorate"], bind=q4_dir_input)

q4_area = (
    alt.Chart(q4_df)
    .mark_area(
        line={"color": "#4c78a8"},
        color=alt.Gradient(
            gradient="linear",
            stops=[alt.GradientStop(color="#4c78a8", offset=0),
                   alt.GradientStop(color="white", offset=1)],
            x1=1, x2=1, y1=1, y2=0
        ),
        opacity=0.6,
    )
    .encode(
        x=alt.X("year:O", title="Year"),
        y=alt.Y("sum(total_amount):Q", title="Total Funding ($)", axis=alt.Axis(format="~s")),
        tooltip=[
            alt.Tooltip("year:O", title="Year"),
            alt.Tooltip("sum(total_amount):Q", title="Total Funding", format="$,.0f"),
            alt.Tooltip("sum(grants_count):Q", title="Grants Count"),
        ],
    )
    .add_params(q4_state_select, q4_dir_select)
    .transform_filter(q4_state_select)
    .transform_filter(q4_dir_select)
    .properties(width=430, height=200, title="Q4 — Evolution of Funding (Filtered View)")
)

q4_points = (
    alt.Chart(q4_df)
    .mark_circle(size=55, color="#4c78a8")
    .encode(
        x="year:O",
        y="sum(total_amount):Q",
        tooltip=[
            alt.Tooltip("year:O"),
            alt.Tooltip("sum(total_amount):Q", format="$,.0f"),
            alt.Tooltip("sum(grants_count):Q", format=","),
        ],
    )
    .transform_filter(q4_state_select)
    .transform_filter(q4_dir_select)
)

q4_base_kpi = alt.Chart(q4_df).transform_filter(q4_state_select).transform_filter(q4_dir_select)

q4_kpi_fund = (
    q4_base_kpi.mark_text(align="center", fontSize=18, fontWeight="bold", color="#4c78a8")
    .encode(text=alt.Text("sum(total_amount):Q", format="$,.2s"))
    .properties(width=180, height=35)
)
q4_kpi_fund_lbl = (
    q4_base_kpi.mark_text(align="center", fontSize=11, color="gray", dy=-16)
    .encode(text=alt.value("Total Funding (Selected)"))
    .properties(width=180, height=35)
)

q4_kpi_cnt = (
    q4_base_kpi.mark_text(align="center", fontSize=18, fontWeight="bold", color="#4c78a8")
    .encode(text=alt.Text("sum(grants_count):Q", format=","))  # count already aggregated
    .properties(width=180, height=35)
)
q4_kpi_cnt_lbl = (
    q4_base_kpi.mark_text(align="center", fontSize=11, color="gray", dy=-16)
    .encode(text=alt.value("Total Grants (Selected)"))
    .properties(width=180, height=35)
)

q4_kpis = (q4_kpi_fund_lbl + q4_kpi_fund) & (q4_kpi_cnt_lbl + q4_kpi_cnt)

final_q4 = ((q4_area + q4_points) | q4_kpis).resolve_scale(color="independent")


# ============================================================
# Q5 (unique params)
# ============================================================
q5_grants = (
    df_grants.groupby(["state", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)

q5_trump_agg = (
    df_trump.groupby(["state", "year"])
    .agg(cancelled_count=("award_id", "count"), cancelled_amount=("award_amount", "sum"))
    .reset_index()
)

all_states = pd.concat([q5_grants["state"], q5_trump_agg["state"]]).unique()
all_years = pd.concat([q5_grants["year"], q5_trump_agg["year"]]).unique()
q5_master = pd.DataFrame(list(itertools.product(all_states, all_years)), columns=["state", "year"])
q5_master = q5_master.merge(q5_grants, on=["state", "year"], how="left")
q5_master = q5_master.merge(q5_trump_agg, on=["state", "year"], how="left")
q5_master = q5_master.fillna(0)

q5_states_list = sorted(all_states)
q5_state_input = alt.binding_select(options=q5_states_list, name="Q5 - State: ")

try:
    q5_state_select = alt.selection_point(
        name="q5_state_select", fields=["state"], bind=q5_state_input, value=[{"state": "CA"}]
    )
except AttributeError:
    q5_state_select = alt.selection_single(
        name="q5_state_select", fields=["state"], bind=q5_state_input, init={"state": "CA"}
    )

q5_base = alt.Chart(q5_master).transform_filter(q5_state_select).encode(x=alt.X("year:O", title="Year"))

q5_bar_vol = q5_base.mark_bar(opacity=0.55, color="#9ecae1").encode(
    y=alt.Y("grants_count:Q", title="Grants (count)", axis=alt.Axis(titleColor="#6baed6")),
    tooltip=["year", "grants_count"],
)
q5_line_val = q5_base.mark_line(color="#08519c", strokeWidth=3, point=True).encode(
    y=alt.Y("total_amount:Q", title="Total Funding ($)", axis=alt.Axis(format="~s", titleColor="#08519c")),
    tooltip=["year", alt.Tooltip("total_amount:Q", format="$,.0f")],
)

q5_top = (
    alt.layer(q5_bar_vol, q5_line_val)
    .resolve_scale(y="independent")
    .properties(width=420, height=170, title="Q5 — State Evolution (Volume vs. Value)")
)

q5_cancel_bar = q5_base.mark_bar(opacity=0.55, color="#fc9272").encode(
    y=alt.Y("cancelled_count:Q", title="Cancelled (count)", axis=alt.Axis(titleColor="#fc9272")),
    tooltip=["year", "cancelled_count"],
)
q5_cancel_line = q5_base.mark_line(color="#de2d26", strokeWidth=3, point=True).encode(
    y=alt.Y("cancelled_amount:Q", title="Lost Funding ($)", axis=alt.Axis(format="~s", titleColor="#de2d26")),
    tooltip=["year", alt.Tooltip("cancelled_amount:Q", format="$,.0f")],
)

q5_bottom = (
    alt.layer(q5_cancel_bar, q5_cancel_line)
    .resolve_scale(y="independent")
    .properties(width=420, height=140, title="Impact: Cancellations & Lost Funding")
)

final_q5 = alt.vconcat(q5_top, q5_bottom, spacing=10).add_params(q5_state_select)


# ============================================================
# Q6 (unique params) — reads CSVs
# ============================================================
df_pop_raw = pd.read_csv("estimated_population.csv")
df_abbr_raw = pd.read_csv("state_abbreviations.csv")

df_pop_raw.columns = df_pop_raw.columns.str.strip()
df_abbr_raw.columns = df_abbr_raw.columns.str.strip()

pop_cols = [c for c in df_pop_raw.columns if c.lower().startswith("pop_")]
df_pop_long = df_pop_raw.melt(id_vars=["state"], value_vars=pop_cols, var_name="year", value_name="population")
df_pop_long["year"] = df_pop_long["year"].str.replace("pop_", "", regex=False).astype(int)
df_pop_long["population"] = pd.to_numeric(df_pop_long["population"], errors="coerce")
df_pop_long = df_pop_long[df_pop_long["year"].between(2020, 2024)].copy()
df_pop_long = df_pop_long.rename(columns={"state": "state_name"})
df_pop_long["state_name"] = df_pop_long["state_name"].astype(str).str.strip()

df_abbr = df_abbr_raw.copy()
name_col = [c for c in df_abbr.columns if "name" in c.lower() or (c.lower() == "state")][0]
abbr_col = [c for c in df_abbr.columns if "abbr" in c.lower() or "code" in c.lower()][0]
df_abbr = df_abbr.rename(columns={name_col: "state_name", abbr_col: "state"})
df_abbr["state_name"] = df_abbr["state_name"].astype(str).str.strip()
df_abbr["state"] = df_abbr["state"].astype(str).str.strip()

df_abbr["state_name_key"] = df_abbr["state_name"].str.lower()
df_pop_long["state_name_key"] = df_pop_long["state_name"].str.lower()

df_pop_long = df_pop_long.merge(df_abbr[["state_name_key", "state"]], on="state_name_key", how="left")
df_pop_long = df_pop_long.dropna(subset=["state", "population"])
df_pop_long = df_pop_long[["state", "year", "population"]].copy()

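# Coerce year to numeric and drop unparseable rows first: casting NaN to int raises.
df_grants["year"] = pd.to_numeric(df_grants["year"], errors="coerce")
df_grants = df_grants.dropna(subset=["year"]).copy()
df_grants["year"] = df_grants["year"].astype(int)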
q6_grants = (
    df_grants.dropna(subset=["state", "year", "award_amount"])
    .groupby(["state", "year"])
    .agg(total_amount=("award_amount", "sum"), grants_count=("award_id", "count"))
    .reset_index()
)

q6_df = q6_grants.merge(df_pop_long, on=["state", "year"], how="inner")
q6_df["funding_per_capita"] = q6_df["total_amount"] / q6_df["population"]

us_avg = q6_df.groupby("year")["funding_per_capita"].mean().reset_index()
us_avg = us_avg.rename(columns={"funding_per_capita": "us_avg_per_capita"})
q6_df = q6_df.merge(us_avg, on="year", how="left")

q6_min_year = int(q6_df["year"].min())
q6_max_year = int(q6_df["year"].max())
q6_slider = alt.binding_range(min=q6_min_year, max=q6_max_year, step=1, name="Q6 - Year: ")

try:
    q6_year_select = alt.selection_point(
        name="q6_year_select", fields=["year"], bind=q6_slider, value=[{"year": q6_max_year}]
    )
    q6_state_select = alt.selection_point(
        name="q6_state_select", fields=["state"], empty="all", on="click", clear="dblclick"
    )
except AttributeError:
    q6_year_select = alt.selection_single(
        name="q6_year_select", fields=["year"], bind=q6_slider, init={"year": q6_max_year}
    )
    q6_state_select = alt.selection_single(name="q6_state_select", fields=["state"], empty="all")

q6_base_scatter = alt.Chart(q6_df).transform_filter(q6_year_select)

q6_points = (
    q6_base_scatter.mark_circle(size=110, opacity=0.85, stroke="white", strokeWidth=1)
    .encode(
        x=alt.X("population:Q", title="Population", axis=alt.Axis(format="~s")),
        y=alt.Y("funding_per_capita:Q", title="Funding per capita ($)", axis=alt.Axis(format="$,.0f")),
        color=alt.condition(
            q6_state_select,
            alt.Color("funding_per_capita:Q", scale=alt.Scale(scheme="viridis"), legend=None),
            alt.value("lightgray"),
        ),
        size=alt.condition(q6_state_select, alt.value(150), alt.value(80)),
        tooltip=[
            alt.Tooltip("state:N", title="State"),
            alt.Tooltip("population:Q", format=",.0f"),
            alt.Tooltip("total_amount:Q", format="$,.0f", title="Total Funding"),
            alt.Tooltip("funding_per_capita:Q", format="$,.2f", title="Per Capita"),
        ],
    )
    .add_params(q6_state_select, q6_year_select)
)

q6_rule = q6_base_scatter.mark_rule(color="red", strokeDash=[5, 5], size=2).encode(
    y="mean(us_avg_per_capita):Q"
)
q6_rule_text = q6_base_scatter.mark_text(align="left", dx=5, dy=-5, color="red", fontWeight="bold").encode(
    y=alt.Y("mean(us_avg_per_capita):Q"), x=alt.value(0), text=alt.value("National Avg")
)

q6_left = (q6_points + q6_rule + q6_rule_text).properties(
    width=420, height=260, title="Q6 — Population vs. Funding Intensity"
)

q6_history_base = alt.Chart(q6_df).transform_filter(q6_state_select)
q6_state_line = q6_history_base.mark_line(point=True, strokeWidth=3, color="#440154").encode(
    x=alt.X("year:O", title="Year"),
    y=alt.Y("funding_per_capita:Q", title="$/Person"),
    tooltip=["year", alt.Tooltip("funding_per_capita:Q", format="$,.2f")],
)

q6_avg_line = (
    alt.Chart(us_avg)
    .mark_line(strokeDash=[5, 5], color="red", opacity=0.5)
    .encode(x=alt.X("year:O"), y=alt.Y("us_avg_per_capita:Q"))
)

q6_history = (
    (q6_avg_line + q6_state_line)
    .transform_filter(q6_state_select)
    .properties(width=260, height=120, title="History vs. National Avg")
)

q6_kpi_base = alt.Chart(q6_df).transform_filter(q6_year_select).transform_filter(q6_state_select)

def q6_make_kpi(label, value_col, fmt, y_pos):
    lbl = q6_kpi_base.mark_text(align="center", color="#666", fontSize=11).encode(
        text=alt.value(label), x=alt.value(130), y=alt.value(y_pos)
    )
    val = q6_kpi_base.mark_text(align="center", color="#333", fontSize=16, fontWeight="bold").encode(
        text=alt.Text(value_col, format=fmt), x=alt.value(130), y=alt.value(y_pos + 18)
    )
    return lbl + val

q6_kpis = (
    alt.Chart(pd.DataFrame({"x": [0]})).mark_rect(opacity=0).properties(width=260, height=135)
    + q6_make_kpi("Population", "mean(population):Q", ",.0f", 10)
    + q6_make_kpi("Total Funding", "sum(total_amount):Q", "$,.2s", 55)
    + q6_make_kpi("Per Capita", "mean(funding_per_capita):Q", "$,.2f", 100)
)

q6_right = alt.vconcat(q6_history, q6_kpis, spacing=10)

final_q6 = (q6_left | q6_right).resolve_scale(color="independent")


# ============================================================
# FINAL DASHBOARD LAYOUT (2 columns x 3 rows)
# ============================================================
row1 = (final_q1 | final_q2)
row2 = (final_q3 | final_q4)
row3 = (final_q5 | final_q6)

final_dashboard = (
    (row1 & row2 & row3)
    .configure_view(stroke=None)
    .configure_concat(spacing=26)
    .configure_axis(
        labelFont="Arial",
        titleFont="Arial",
        labelFontSize=11,
        titleFontSize=12,
        gridColor="#e6e6e6",
    )
    .configure_title(font="Arial", fontSize=14, anchor="start")
    .configure_legend(
        labelFont="Arial",
        titleFont="Arial",
        labelFontSize=11,
        titleFontSize=12,
    )
)

final_dashboard


In [210]:
# ============================================================
# SCRIPT 2: COMPACT 3×2 DASHBOARD (3 top, 3 bottom)
# ============================================================

# --- ROW 1 (Top: Q1, Q2, Q3) ---
row1 = alt.hconcat(final_q1, final_q2, final_q3, spacing=20)

# --- ROW 2 (Bottom: Q4, Q5, Q6) ---
row2 = alt.hconcat(final_q4, final_q5, final_q6, spacing=20)

# --- FINAL COMPACT DASHBOARD ---
dashboard_compact = (
    alt.vconcat(row1, row2, spacing=35)
    .resolve_scale(color="independent")
    .configure_view(stroke=None)
    .configure_concat(spacing=20)
    .configure_axis(
        labelFont="Arial",
        titleFont="Arial",
        labelFontSize=9,  # Smaller for compact view
        titleFontSize=10,  # Smaller for compact view
        gridColor="#e6e6e6",
    )
    .configure_title(font="Arial", fontSize=11, anchor="start")  # Smaller titles
    .configure_legend(
        labelFont="Arial",
        titleFont="Arial",
        labelFontSize=9,
        titleFontSize=10,
    )
)

dashboard_compact

## Attempts at the final visualization

In [214]:
import altair as alt

# ============================================================
# SCRIPT 3: MICRO-COMPACT DASHBOARD (Screen Fit)
# ============================================================

# 1. VISUAL CONFIGURATION (The "Tightening")
# We use a config dictionary to force everything smaller globally
micro_config = {
    "view": {"stroke": "transparent"},
    "axis": {"domain": False, "tickSize": 3, "labelFontSize": 8, "titleFontSize": 9},
    "legend": {"labelFontSize": 8, "titleFontSize": 9, "symbolSize": 20},
    "header": {"labelFontSize": 9, "titleFontSize": 10},
    "title": {"fontSize": 10, "anchor": "middle"},
}

# 2. SPACER CONFIGURATION
# We use tight spacing (10px horizontal, 20px vertical) to pack the charts close
H_SPACING = 10
V_SPACING = 20

# 3. ASSEMBLY
# We assemble them into the grid with tight spacing
row1 = alt.hconcat(final_q1, final_q2, final_q3, spacing=H_SPACING)
row2 = alt.hconcat(final_q4, final_q5, final_q6, spacing=H_SPACING)

dashboard_micro = (
    alt.vconcat(row1, row2, spacing=V_SPACING)
    .resolve_scale(color="independent")
    .configure(**micro_config)  # Apply the tight config
    .configure_concat(spacing=V_SPACING)
)

dashboard_micro

In [215]:
import altair as alt

# ============================================================
# 1. DEFINE MICRO-CHARTS (Optimized for Screen Fit)
# ============================================================

# --- Q1 (Top Left) ---
# Width: 220px (Bars) + 120px (History) = ~340px Total
q1_micro = (
    alt.Chart(q1_full)
    .mark_bar()
    .encode(
        x=alt.X("state:N", sort="-y", title=None),
        y=alt.Y("grants_count:Q", title="Grants"),
        color=alt.condition(
            state_select,
            alt.Color("grants_count:Q", scale=alt.Scale(scheme="blues"), legend=None),
            alt.value("#eee"),
        ),
        tooltip=["state", "grants_count"],
    )
    .add_params(year_select, state_select)
    .transform_filter(year_select)
    .properties(width=220, height=180, title="Q1: Grants by State")
    | (
        alt.Chart(q1_yearly)
        .mark_line(point=True)
        .encode(
            x=alt.X("year:O", title=None),
            y=alt.Y("total_amount:Q", axis=alt.Axis(format="~s"), title=None),
        )
        .transform_filter(state_select)
        .properties(width=120, height=80, title="History")
        & alt.Chart(q1_full)
        .transform_filter(year_select)
        .transform_filter(state_select)
        .mark_text(fontWeight="bold")
        .encode(text=alt.Text("sum(total_amount):Q", format="$.2s"))
        .properties(width=120, height=30)
    )
).resolve_scale(color="independent")

# --- Q2 (Top Center) ---
# Width: 180px (Bars) + 130px (History) = ~310px Total
q2_micro = (
    alt.Chart(q2_full)
    .mark_bar()
    .encode(
        x=alt.X("grants_count:Q", title=None),
        y=alt.Y("directorate:N", sort="-x", title=None),
        color=alt.condition(
            dir_select,
            alt.Color("grants_count:Q", scale=alt.Scale(scheme="blues"), legend=None),
            alt.value("#eee"),
        ),
        tooltip=["directorate", "grants_count"],
    )
    .add_params(dir_select, year_select)
    .transform_filter(year_select)
    .properties(width=180, height=180, title="Q2: Directorates")
    | (
        alt.Chart(q2_yearly)
        .mark_line(point=True)
        .encode(
            x=alt.X("year:O", title=None),
            y=alt.Y("total_amount:Q", axis=alt.Axis(format="~s"), title=None),
        )
        .transform_filter(dir_select)
        .properties(width=130, height=80, title="History")
        & alt.Chart(q2_full)
        .transform_filter(year_select)
        .transform_filter(dir_select)
        .mark_text(fontWeight="bold")
        .encode(text=alt.Text("sum(total_amount):Q", format="$.2s"))
        .properties(width=130, height=30)
    )
).resolve_scale(color="independent")

# --- Q3 (Top Right) ---
# Width: 140px (Bars) + 180px (Scatter) = ~320px Total
q3_micro = (
    alt.Chart(q3_plot_full)
    .mark_bar()
    .encode(
        x=alt.X("cancelled_count:Q", title="Cancelled"),
        y=alt.Y("directorate:N", sort="-x", title=None),
        color=alt.condition(
            dir_select,
            alt.Color(
                "cancelled_count:Q", scale=alt.Scale(scheme="blues"), legend=None
            ),
            alt.value("#eee"),
        ),
        tooltip=["directorate", "cancelled_count"],
    )
    .add_params(dir_select, year_select)
    .transform_filter(year_select)
    .properties(width=140, height=180, title="Q3: Cancellations")
    | alt.Chart(q3_plot_full)
    .mark_circle(stroke="black")
    .encode(
        x=alt.X("static_base_count:Q", title="Size", axis=alt.Axis(format="~s")),
        y=alt.Y("cancelled_count:Q", title="Hits"),
        size=alt.Size("cancelled_amount:Q", legend=None),
        color=alt.condition(dir_select, alt.value("#4c78a8"), alt.value("#eee")),
        tooltip=["directorate", "cancelled_count"],
    )
    .add_params(dir_select, year_select)
    .transform_filter(year_select)
    .interactive()
    .properties(width=180, height=180, title="Impact")
).resolve_scale(color="independent")

# --- Q4 (Bottom Left) ---
# Width: 300px Total
q4_micro = (
    alt.Chart(q4_df)
    .mark_area(opacity=0.6)
    .encode(
        x=alt.X("year:O", title=None),
        y=alt.Y("sum(total_amount):Q", title="Fund ($)", axis=alt.Axis(format="~s")),
        tooltip=["year", alt.Tooltip("sum(total_amount)", format="$,.2s")],
    )
    .add_params(state_select, dir_select)
    .transform_filter(state_select)
    .transform_filter(dir_select)
    .properties(width=220, height=150, title="Q4: Funding Trend")
    | alt.Chart(q4_df)
    .transform_filter(state_select)
    .transform_filter(dir_select)
    .mark_text(size=16, fontWeight="bold")
    .encode(text=alt.Text("sum(total_amount):Q", format="$.2s"))
    .properties(width=80, height=150, title="Total")
).resolve_scale(color="independent")

# --- Q5 (Bottom Center) ---
# Width: 280px Total
q5_base = (
    alt.Chart(q5_master)
    .transform_filter(state_select)
    .encode(x=alt.X("year:O", title=None))
)
q5_micro = alt.vconcat(
    alt.layer(
        q5_base.mark_bar(color="#9ecae1").encode(
            y=alt.Y("grants_count:Q", title="Grants")
        ),
        q5_base.mark_line(color="#08519c").encode(
            y=alt.Y("total_amount:Q", axis=alt.Axis(format="~s"), title=None)
        ),
    )
    .resolve_scale(y="independent")
    .properties(width=280, height=90, title="Q5: Evolution"),
    alt.layer(
        q5_base.mark_bar(color="#fc9272").encode(
            y=alt.Y("cancelled_count:Q", title="Cancel")
        ),
        q5_base.mark_line(color="#de2d26").encode(
            y=alt.Y("cancelled_amount:Q", axis=alt.Axis(format="~s"), title=None)
        ),
    )
    .resolve_scale(y="independent")
    .properties(width=280, height=60, title="Impact"),
    spacing=5,
).add_params(state_select)

# --- Q6 (Bottom Right) ---
# Width: 220px (Scatter) + 80px (KPI) = ~300px Total
q6_micro = (
    alt.Chart(q6_df)
    .mark_circle(size=60)
    .encode(
        x=alt.X("population:Q", axis=alt.Axis(format="~s"), title="Pop"),
        y=alt.Y("funding_per_capita:Q", axis=alt.Axis(format="$~s"), title="$/Cap"),
        color=alt.condition(
            state_select,
            alt.Color(
                "funding_per_capita:Q", scale=alt.Scale(scheme="purples"), legend=None
            ),
            alt.value("#eee"),
        ),
        tooltip=["state", "funding_per_capita"],
    )
    .add_params(state_select, year_select)
    .transform_filter(year_select)
    .interactive()
    .properties(width=220, height=160, title="Q6: Efficiency")
    | alt.Chart(q6_df)
    .transform_filter(year_select)
    .transform_filter(state_select)
    .mark_text(size=14, fontWeight="bold")
    .encode(text=alt.Text("mean(funding_per_capita):Q", format="$.2f"))
    .properties(width=80, height=160, title="$/Person")
).resolve_scale(color="independent")


# ============================================================
# 2. ASSEMBLE DASHBOARD
# ============================================================

# Row 1: Q1, Q2, Q3
row1 = alt.hconcat(q1_micro, q2_micro, q3_micro, spacing=15)

# Row 2: Q4, Q5, Q6
row2 = alt.hconcat(q4_micro, q5_micro, q6_micro, spacing=15)

# Final Grid
dashboard_compact = (
    alt.vconcat(row1, row2, spacing=20)
    .configure_view(stroke=None)
    .configure_axis(labelFontSize=9, titleFontSize=10, grid=False)  # Clean look
    .configure_title(fontSize=11, anchor="middle")
    .configure_concat(spacing=20)
)

dashboard_compact

In [219]:
import altair as alt
import pandas as pd
import itertools

# 0. GLOBAL SETUP
alt.data_transformers.enable("default")

# ==============================================================================
# 1. DATA LOADING & CLEANING
# ==============================================================================
df_grants = pd.read_csv("NSF_Grants_Last5Years_Clean.csv")
df_trump = pd.read_csv("trump17-21-csv.csv")
df_pop_raw = pd.read_csv("estimated_population.csv")
df_abbr_raw = pd.read_csv("state_abbreviations.csv")

# Clean columns
df_grants.columns = df_grants.columns.str.strip()
df_trump.columns = df_trump.columns.str.strip()
df_pop_raw.columns = df_pop_raw.columns.str.strip()
df_abbr_raw.columns = df_abbr_raw.columns.str.strip()

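# Ensure numeric year. Unparseable values become 0, which is also the
# "All Years" sentinel used below, so such rows would also show up in the totals view.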
df_grants["year"] = (
    pd.to_numeric(df_grants["year"], errors="coerce").fillna(0).astype(int)
)

# ==============================================================================
# 2. DATA PREPARATION FOR ALL QUESTIONS
# ==============================================================================

# --- Q1 PREP (State Distribution) ---
q1_yearly = (
    df_grants.groupby(["state", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)
q1_total = (
    df_grants.groupby(["state"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)
q1_total["year"] = 0
q1_full = pd.concat([q1_yearly, q1_total], ignore_index=True)

# --- Q2 PREP (Directorate Distribution) ---
q2_yearly = (
    df_grants.groupby(["directorate", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)
q2_total = (
    df_grants.groupby(["directorate"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)
q2_total["year"] = 0
q2_full = pd.concat([q2_yearly, q2_total], ignore_index=True)

# --- Q3 PREP (Cancellations) ---
# Base & Cancelled
base_yearly = (
    df_grants.groupby(["directorate", "year"])
    .agg(base_count=("award_id", "count"), base_amount=("award_amount", "sum"))
    .reset_index()
)
cancel_yearly = (
    df_trump.groupby(["directorate", "year"])
    .agg(
        cancelled_count=("award_id", "count"), cancelled_amount=("award_amount", "sum")
    )
    .reset_index()
)
yearly_df = base_yearly.merge(
    cancel_yearly, on=["directorate", "year"], how="outer"
).fillna(0)

base_total = (
    df_grants.groupby(["directorate"])
    .agg(base_count=("award_id", "count"), base_amount=("award_amount", "sum"))
    .reset_index()
)
base_total["year"] = 0
cancel_total = (
    df_trump.groupby(["directorate"])
    .agg(
        cancelled_count=("award_id", "count"), cancelled_amount=("award_amount", "sum")
    )
    .reset_index()
)
cancel_total["year"] = 0
total_df = base_total.merge(
    cancel_total, on=["directorate", "year"], how="outer"
).fillna(0)

# Static Size Logic (Keep dots aligned)
base_total_fixed = base_total[["directorate", "base_count"]].rename(
    columns={"base_count": "static_base_count"}
)
yearly_df = yearly_df.merge(base_total_fixed, on="directorate", how="left").fillna(0)
total_rows = total_df.copy()
total_rows["static_base_count"] = total_rows["base_count"]

q3_full = pd.concat([yearly_df, total_rows], ignore_index=True)
# Filter for relevant years
q3_full = q3_full[q3_full["year"].isin([0, 2018, 2019, 2020, 2021])]
q3_plot_full = q3_full[
    (q3_full["static_base_count"] > 0) | (q3_full["cancelled_count"] > 0)
].copy()

# --- Q4 PREP (Evolution) ---
q4_df = (
    df_grants.groupby(["year", "state", "directorate"])
    .agg(total_amount=("award_amount", "sum"), grants_count=("award_id", "count"))
    .reset_index()
)

# --- Q5 PREP (State Impact) ---
q5_grants_agg = (
    df_grants.groupby(["state", "year"])
    .agg(grants_count=("award_id", "count"), total_amount=("award_amount", "sum"))
    .reset_index()
)
q5_trump_agg = (
    df_trump.groupby(["state", "year"])
    .agg(
        cancelled_count=("award_id", "count"), cancelled_amount=("award_amount", "sum")
    )
    .reset_index()
)

all_states = pd.concat([q5_grants_agg["state"], q5_trump_agg["state"]]).unique()
all_years = pd.concat([q5_grants_agg["year"], q5_trump_agg["year"]]).unique()
q5_master = pd.DataFrame(
    list(itertools.product(all_states, all_years)), columns=["state", "year"]
)
q5_master = (
    q5_master.merge(q5_grants_agg, on=["state", "year"], how="left")
    .merge(q5_trump_agg, on=["state", "year"], how="left")
    .fillna(0)
)

# --- Q6 PREP (Population Efficiency with Year 0) ---
# 1. Clean Population
pop_cols = [c for c in df_pop_raw.columns if c.lower().startswith("pop_")]
df_pop_long = df_pop_raw.melt(
    id_vars=["state"], value_vars=pop_cols, var_name="year", value_name="population"
)
df_pop_long["year"] = (
    df_pop_long["year"].str.replace("pop_", "", regex=False).astype(int)
)
df_pop_long["population"] = pd.to_numeric(df_pop_long["population"], errors="coerce")
df_pop_long = df_pop_long[df_pop_long["year"].between(2020, 2024)].copy()
df_pop_long = df_pop_long.rename(columns={"state": "state_name"})

# 2. Merge Abbr
df_abbr = df_abbr_raw.rename(
    columns={
        [c for c in df_abbr_raw.columns if "name" in c.lower()][0]: "state_name",
        [c for c in df_abbr_raw.columns if "abbr" in c.lower()][0]: "state",
    }
)
df_abbr["state_name_key"] = df_abbr["state_name"].str.strip().str.lower()
df_pop_long["state_name_key"] = df_pop_long["state_name"].str.strip().str.lower()
df_pop_long = df_pop_long.merge(
    df_abbr[["state_name_key", "state"]], on="state_name_key", how="left"
).dropna(subset=["state", "population"])

# 3. Create Year 0 Data
q6_yearly = (
    df_grants.dropna(subset=["state", "year", "award_amount"])
    .groupby(["state", "year"])
    .agg(total_amount=("award_amount", "sum"), grants_count=("award_id", "count"))
    .reset_index()
)
q6_total = (
    df_grants.dropna(subset=["state", "award_amount"])
    .groupby(["state"])
    .agg(total_amount=("award_amount", "sum"), grants_count=("award_id", "count"))
    .reset_index()
)
q6_total["year"] = 0

pop_avg = df_pop_long.groupby("state")["population"].mean().reset_index()
pop_avg["year"] = 0

# 4. Combine & Merge
df_pop_full = pd.concat(
    [df_pop_long[["state", "year", "population"]], pop_avg], ignore_index=True
)
q6_grants_full = pd.concat([q6_yearly, q6_total], ignore_index=True)
q6_df = q6_grants_full.merge(df_pop_full, on=["state", "year"], how="inner")
q6_df["funding_per_capita"] = q6_df["total_amount"] / q6_df["population"]


# ==============================================================================
# 3. GLOBAL INTERACTION SETUP
# ==============================================================================
# Create a superset of years for the dropdown (plain ints, so labels read "2020")
all_possible_years = sorted(
    {int(y) for y in q1_full["year"]} | {int(y) for y in q3_full["year"]}
)
year_input = alt.binding_select(
    options=all_possible_years,
    labels=["All Years (Total)"] + [str(y) for y in all_possible_years if y != 0],
    name="Select Year: ",
)

year_select = alt.selection_point(
    fields=["year"], bind=year_input, value=[{"year": 0}], name="year_sel"
)
state_select = alt.selection_point(fields=["state"], empty="all", name="state_sel")
dir_select = alt.selection_point(fields=["directorate"], empty="all", name="dir_sel")


# ==============================================================================
# 4. MICRO-CHART DEFINITIONS
# ==============================================================================

# --- Q1 (Top Left) ---
q1_micro = (
    alt.Chart(q1_full)
    .mark_bar()
    .encode(
        x=alt.X("state:N", sort="-y", title=None),
        y=alt.Y("grants_count:Q", title="Grants"),
        color=alt.condition(
            state_select,
            alt.Color("grants_count:Q", scale=alt.Scale(scheme="blues"), legend=None),
            alt.value("#eee"),
        ),
        tooltip=["state", "grants_count"],
    )
    .add_params(year_select, state_select)
    .transform_filter(year_select)
    .properties(width=220, height=180, title="Q1: Grants by State")
    | (
        alt.Chart(q1_yearly)
        .mark_line(point=True)
        .encode(
            x=alt.X("year:O", title=None),
            y=alt.Y("total_amount:Q", axis=alt.Axis(format="~s"), title=None),
        )
        .transform_filter(state_select)
        .properties(width=120, height=80, title="History")
        & alt.Chart(q1_full)
        .transform_filter(year_select)
        .transform_filter(state_select)
        .mark_text(fontWeight="bold")
        .encode(text=alt.Text("sum(total_amount):Q", format="$.2s"))
        .properties(width=120, height=30)
    )
).resolve_scale(color="independent")

# --- Q2 (Top Center) ---
q2_micro = (
    alt.Chart(q2_full)
    .mark_bar()
    .encode(
        x=alt.X("grants_count:Q", title=None),
        y=alt.Y("directorate:N", sort="-x", title=None),
        color=alt.condition(
            dir_select,
            alt.Color("grants_count:Q", scale=alt.Scale(scheme="blues"), legend=None),
            alt.value("#eee"),
        ),
        tooltip=["directorate", "grants_count"],
    )
    .add_params(dir_select, year_select)
    .transform_filter(year_select)
    .properties(width=180, height=180, title="Q2: Directorates")
    | (
        alt.Chart(q2_yearly)
        .mark_line(point=True)
        .encode(
            x=alt.X("year:O", title=None),
            y=alt.Y("total_amount:Q", axis=alt.Axis(format="~s"), title=None),
        )
        .transform_filter(dir_select)
        .properties(width=130, height=80, title="History")
        & alt.Chart(q2_full)
        .transform_filter(year_select)
        .transform_filter(dir_select)
        .mark_text(fontWeight="bold")
        .encode(text=alt.Text("sum(total_amount):Q", format="$.2s"))
        .properties(width=130, height=30)
    )
).resolve_scale(color="independent")

# --- Q3 (Top Right) ---
q3_micro = (
    alt.Chart(q3_plot_full)
    .mark_bar()
    .encode(
        x=alt.X("cancelled_count:Q", title="Cancelled"),
        y=alt.Y("directorate:N", sort="-x", title=None),
        color=alt.condition(
            dir_select,
            alt.Color(
                "cancelled_count:Q", scale=alt.Scale(scheme="blues"), legend=None
            ),
            alt.value("#eee"),
        ),
        tooltip=["directorate", "cancelled_count"],
    )
    .add_params(dir_select, year_select)
    .transform_filter(year_select)
    .properties(width=140, height=180, title="Q3: Cancellations")
    | alt.Chart(q3_plot_full)
    .mark_circle(stroke="black")
    .encode(
        x=alt.X("static_base_count:Q", title="Size", axis=alt.Axis(format="~s")),
        y=alt.Y("cancelled_count:Q", title="Hits"),
        size=alt.Size("cancelled_amount:Q", legend=None),
        color=alt.condition(dir_select, alt.value("#4c78a8"), alt.value("#eee")),
        tooltip=["directorate", "cancelled_count"],
    )
    .add_params(dir_select, year_select)
    .transform_filter(year_select)
    .interactive()
    .properties(width=180, height=180, title="Impact")
).resolve_scale(color="independent")

# --- Q4 (Bottom Left) ---
q4_micro = (
    alt.Chart(q4_df)
    .mark_area(opacity=0.6)
    .encode(
        x=alt.X("year:O", title=None),
        y=alt.Y("sum(total_amount):Q", title="Fund ($)", axis=alt.Axis(format="~s")),
        tooltip=["year", alt.Tooltip("sum(total_amount)", format="$,.2s")],
    )
    .add_params(state_select, dir_select)
    .transform_filter(state_select)
    .transform_filter(dir_select)
    .properties(width=220, height=150, title="Q4: Funding Trend")
    | alt.Chart(q4_df)
    .transform_filter(state_select)
    .transform_filter(dir_select)
    .mark_text(size=16, fontWeight="bold")
    .encode(text=alt.Text("sum(total_amount):Q", format="$.2s"))
    .properties(width=80, height=150, title="Total")
).resolve_scale(color="independent")

# --- Q5 (Bottom Center) ---
q5_base = (
    alt.Chart(q5_master)
    .transform_filter(state_select)
    .encode(x=alt.X("year:O", title=None))
)
q5_micro = alt.vconcat(
    alt.layer(
        q5_base.mark_bar(color="#9ecae1").encode(
            y=alt.Y("grants_count:Q", title="Grants")
        ),
        q5_base.mark_line(color="#08519c").encode(
            y=alt.Y("total_amount:Q", axis=alt.Axis(format="~s"), title=None)
        ),
    )
    .resolve_scale(y="independent")
    .properties(width=280, height=90, title="Q5: Evolution"),
    alt.layer(
        q5_base.mark_bar(color="#fc9272").encode(
            y=alt.Y("cancelled_count:Q", title="Cancel")
        ),
        q5_base.mark_line(color="#de2d26").encode(
            y=alt.Y("cancelled_amount:Q", axis=alt.Axis(format="~s"), title=None)
        ),
    )
    .resolve_scale(y="independent")
    .properties(width=280, height=60, title="Impact"),
    spacing=5,
).add_params(state_select)

# --- Q6 (Bottom Right) ---
q6_micro = (
    alt.Chart(q6_df)
    .mark_circle(size=60)
    .encode(
        x=alt.X("population:Q", axis=alt.Axis(format="~s"), title="Pop"),
        y=alt.Y("funding_per_capita:Q", axis=alt.Axis(format="$~s"), title="$/Cap"),
        color=alt.condition(
            state_select,
            alt.Color(
                "funding_per_capita:Q", scale=alt.Scale(scheme="purples"), legend=None
            ),
            alt.value("#eee"),
        ),
        tooltip=["state", "funding_per_capita"],
    )
    .add_params(state_select, year_select)
    .transform_filter(year_select)
    .interactive()
    .properties(width=220, height=160, title="Q6: Efficiency")
    | alt.Chart(q6_df)
    .transform_filter(year_select)
    .transform_filter(state_select)
    .mark_text(size=14, fontWeight="bold")
    .encode(text=alt.Text("mean(funding_per_capita):Q", format="$.2f"))
    .properties(width=80, height=160, title="$/Person")
).resolve_scale(color="independent")


# ==============================================================================
# 5. ASSEMBLE GRID
# ==============================================================================
row1 = alt.hconcat(q1_micro, q2_micro, q3_micro, spacing=15)
row2 = alt.hconcat(q4_micro, q5_micro, q6_micro, spacing=15)

dashboard_compact = (
    alt.vconcat(row1, row2, spacing=20)
    .configure_view(stroke=None)
    .configure_axis(labelFontSize=9, titleFontSize=10, grid=False)
    .configure_title(fontSize=11, anchor="middle")
    .configure_concat(spacing=20)
)

dashboard_compact
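
A year dropdown like the one in the cell above needs its `options` and `labels` lists to line up element by element, with the `0` sentinel mapped to "All Years (Total)". A minimal sketch of building both lists in one pass follows; `build_year_dropdown` and its parameters are hypothetical names introduced here for illustration and are not part of the notebook.

```python
def build_year_dropdown(years, sentinel=0, total_label="All Years (Total)"):
    """Return (options, labels) for a year dropdown, paired element by element."""
    # Cast to plain int so duplicates such as 2020 and 2020.0 collapse
    # and labels read "2020" rather than "2020.0".
    options = sorted({int(y) for y in years} | {sentinel})
    # Build each label from its own option, so the pairing does not depend
    # on the sentinel sorting first.
    labels = [total_label if y == sentinel else str(y) for y in options]
    return options, labels


options, labels = build_year_dropdown([2021, 2020.0, 2018, 2020])
```

The two lists could then be passed as `options=` and `labels=` to `alt.binding_select`.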

In [220]:
import altair as alt
import pandas as pd
import itertools

# 0. GLOBAL SETUP
alt.data_transformers.enable("default")

# ==============================================================================
# 1. DATA LOADING & PREP
# ==============================================================================
df_grants = pd.read_csv("NSF_Grants_Last5Years_Clean.csv")
df_trump = pd.read_csv("trump17-21-csv.csv")
df_pop_raw = pd.read_csv("estimated_population.csv")
df_abbr_raw = pd.read_csv("state_abbreviations.csv")

# Clean columns
df_grants.columns = df_grants.columns.str.strip()
df_trump.columns = df_trump.columns.str.strip()
df_pop_raw.columns = df_pop_raw.columns.str.strip()
df_abbr_raw.columns = df_abbr_raw.columns.str.strip()
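# Ensure numeric year (unparseable values become 0, the same value used as the "All Years" sentinel)
df_grants["year"] = (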
    pd.to_numeric(df_grants["year"], errors="coerce").fillna(0).astype(int)
)

# --- GLOBAL LISTS FOR FILTERS ---
# dropna() keeps sorted() from failing on a mix of strings and NaN
all_years = sorted(int(y) for y in df_grants["year"].unique() if y != 0)
all_states = sorted(df_grants["state"].dropna().unique())
all_dirs = sorted(df_grants["directorate"].dropna().unique())

# ==============================================================================
# 2. DATA PREPARATION
# ==============================================================================

# Q1 PREP
q1_yearly = (
    df_grants.groupby(["state", "year"])
    .agg(cnt=("award_id", "count"), amt=("award_amount", "sum"))
    .reset_index()
)
q1_total = (
    df_grants.groupby(["state"])
    .agg(cnt=("award_id", "count"), amt=("award_amount", "sum"))
    .reset_index()
)
q1_total["year"] = 0
q1_full = pd.concat([q1_yearly, q1_total], ignore_index=True)

# Q2 PREP
q2_yearly = (
    df_grants.groupby(["directorate", "year"])
    .agg(cnt=("award_id", "count"), amt=("award_amount", "sum"))
    .reset_index()
)
q2_total = (
    df_grants.groupby(["directorate"])
    .agg(cnt=("award_id", "count"), amt=("award_amount", "sum"))
    .reset_index()
)
q2_total["year"] = 0
q2_full = pd.concat([q2_yearly, q2_total], ignore_index=True)

# Q3 PREP
base_yearly = (
    df_grants.groupby(["directorate", "year"])
    .agg(base_count=("award_id", "count"))
    .reset_index()
)
cancel_yearly = (
    df_trump.groupby(["directorate", "year"])
    .agg(cancel_count=("award_id", "count"), cancel_amt=("award_amount", "sum"))
    .reset_index()
)
q3_full = base_yearly.merge(
    cancel_yearly, on=["directorate", "year"], how="outer"
).fillna(0)
# Add totals (Year 0) logic for Q3
base_total = (
    df_grants.groupby(["directorate"])
    .agg(base_count=("award_id", "count"))
    .reset_index()
)
base_total["year"] = 0
cancel_total = (
    df_trump.groupby(["directorate"])
    .agg(cancel_count=("award_id", "count"), cancel_amt=("award_amount", "sum"))
    .reset_index()
)
cancel_total["year"] = 0
q3_full = pd.concat(
    [
        q3_full,
        base_total.merge(cancel_total, on=["directorate", "year"], how="outer").fillna(
            0
        ),
    ],
    ignore_index=True,
)
# Add static size for scatter x-axis
static_size = base_total[["directorate", "base_count"]].rename(
    columns={"base_count": "static_size"}
)
q3_full = q3_full.merge(static_size, on="directorate", how="left")
q3_full = q3_full[
    q3_full["year"].isin([0, 2018, 2019, 2020, 2021])
]  # Filter relevant Trump years

# Q4 PREP
q4_df = (
    df_grants.groupby(["year", "state", "directorate"])
    .agg(total_amount=("award_amount", "sum"))
    .reset_index()
)

# Q5 PREP
q5_data = pd.DataFrame(
    list(itertools.product(all_states, range(2017, 2025))), columns=["state", "year"]
)
q5_data = q5_data.merge(
    df_grants.groupby(["state", "year"])["award_amount"]
    .sum()
    .rename("fund")
    .reset_index(),
    on=["state", "year"],
    how="left",
)
q5_data = q5_data.merge(
    df_trump.groupby(["state", "year"])["award_amount"]
    .sum()
    .rename("lost")
    .reset_index(),
    on=["state", "year"],
    how="left",
).fillna(0)

# Q6 PREP
df_pop = df_pop_raw.melt(
    id_vars=["state"],
    value_vars=[c for c in df_pop_raw.columns if "pop_" in c],
    var_name="year",
    value_name="pop",
)
df_pop["year"] = df_pop["year"].str.replace("pop_", "", regex=False).astype(int)
df_abbr = df_abbr_raw.rename(
    columns={df_abbr_raw.columns[0]: "name", df_abbr_raw.columns[1]: "abbr"}
)
df_pop = df_pop.merge(df_abbr, left_on="state", right_on="name")  # Get 2 letter code
q6_funds = (
    df_grants.dropna(subset=["state", "year"])
    .groupby(["state", "year"])
    .agg(total=("award_amount", "sum"))
    .reset_index()
)
q6_df = q6_funds.merge(
    df_pop, left_on=["state", "year"], right_on=["abbr", "year"], how="inner"
)
q6_df["per_cap"] = q6_df["total"] / q6_df["pop"]
# Add Year 0 (Avg)
q6_0 = (
    q6_df.groupby("abbr").agg(pop=("pop", "mean"), total=("total", "sum")).reset_index()
)
q6_0["per_cap"] = q6_0["total"] / q6_0["pop"] / 5  # Approx annual avg
q6_0["year"] = 0
q6_0["state"] = q6_0["abbr"]  # Fix column name for Altair
q6_df = pd.concat([q6_df, q6_0], ignore_index=True)


# ==============================================================================
# 3. CHART CONSTRUCTION
# ==============================================================================

# --- Q1: STATES (Year Filter + State Click) ---
q1_yr_sel = alt.selection_point(
    fields=["year"],
    bind=alt.binding_select(options=[0] + all_years, name="Q1 Year:"),
    value=[{"year": 0}],
    name="q1_year",
)
q1_st_sel = alt.selection_point(fields=["state"], empty="all", name="q1_state")

q1_chart = (
    alt.Chart(q1_full)
    .mark_bar()
    .encode(
        x=alt.X("state:N", sort="-y", title=None),
        y=alt.Y("cnt:Q", title="Grants"),
        color=alt.condition(
            q1_st_sel,
            alt.Color("cnt:Q", scale=alt.Scale(scheme="blues"), legend=None),
            alt.value("#eee"),
        ),
        tooltip=["state", "cnt", alt.Tooltip("amt", format="$,.2s")],
    )
    .add_params(q1_yr_sel, q1_st_sel)
    .transform_filter(q1_yr_sel)
    .properties(width=200, height=180, title="Q1: Grants by State")
    | alt.Chart(q1_yearly)
    .mark_line(point=True)
    .encode(
        x=alt.X("year:O", title=None),
        y=alt.Y("amt:Q", axis=alt.Axis(format="~s"), title=None),
    )
    .transform_filter(q1_st_sel)
    .properties(width=100, height=80, title="History")
).resolve_scale(color="independent")

# --- Q2: DIRECTORATES (Year Filter + Dir Click) ---
q2_yr_sel = alt.selection_point(
    fields=["year"],
    bind=alt.binding_select(options=[0] + all_years, name="Q2 Year:"),
    value=[{"year": 0}],
    name="q2_year",
)
q2_dir_sel = alt.selection_point(fields=["directorate"], empty="all", name="q2_dir")

q2_chart = (
    alt.Chart(q2_full)
    .mark_bar()
    .encode(
        x=alt.X("cnt:Q", title="Grants"),
        y=alt.Y("directorate:N", sort="-x", title=None),
        color=alt.condition(
            q2_dir_sel,
            alt.Color("cnt:Q", scale=alt.Scale(scheme="blues"), legend=None),
            alt.value("#eee"),
        ),
        tooltip=["directorate", "cnt"],
    )
    .add_params(q2_dir_sel, q2_yr_sel)
    .transform_filter(q2_yr_sel)
    .properties(width=180, height=180, title="Q2: Directorates")
    | alt.Chart(q2_yearly)
    .mark_line(point=True)
    .encode(
        x=alt.X("year:O", title=None),
        y=alt.Y("amt:Q", axis=alt.Axis(format="~s"), title=None),
    )
    .transform_filter(q2_dir_sel)
    .properties(width=100, height=80, title="History")
).resolve_scale(color="independent")

# --- Q3: CANCELLATIONS (Year Filter + Dir Click) ---
q3_yr_sel = alt.selection_point(
    fields=["year"],
    bind=alt.binding_select(options=[0, 2018, 2019, 2020, 2021], name="Q3 Year:"),
    value=[{"year": 0}],
    name="q3_year",
)
q3_dir_sel = alt.selection_point(fields=["directorate"], empty="all", name="q3_dir")

q3_chart = (
    alt.Chart(q3_full)
    .mark_bar()
    .encode(
        x=alt.X("cancel_count:Q", title="Cancel"),
        y=alt.Y("directorate:N", sort="-x", title=None),
        color=alt.condition(
            q3_dir_sel,
            alt.Color("cancel_count:Q", scale=alt.Scale(scheme="reds"), legend=None),
            alt.value("#eee"),
        ),
        tooltip=["directorate", "cancel_count"],
    )
    .add_params(q3_dir_sel, q3_yr_sel)
    .transform_filter(q3_yr_sel)
    .properties(width=140, height=180, title="Q3: Cancellations")
    | alt.Chart(q3_full)
    .mark_circle(stroke="black")
    .encode(
        x=alt.X("static_size:Q", title="Size", axis=alt.Axis(format="~s")),
        y=alt.Y("cancel_count:Q", title="Hits"),
        size=alt.Size("cancel_amt:Q", legend=None),
        color=alt.condition(q3_dir_sel, alt.value("red"), alt.value("#eee")),
        tooltip=["directorate", "cancel_amt"],
    )
    .add_params(q3_dir_sel, q3_yr_sel)
    .transform_filter(q3_yr_sel)
    .interactive()
    .properties(width=160, height=180, title="Impact")
).resolve_scale(color="independent")

# --- Q4: EVOLUTION (State Filter + Directorate Filter) ---
# Dropdown filters for state and directorate ("All" leaves the selection empty)
q4_st_sel = alt.selection_point(
    fields=["state"],
    bind=alt.binding_select(
        options=[None] + all_states, labels=["All"] + all_states, name="Q4 State:"
    ),
    name="q4_state",
)
q4_dir_sel = alt.selection_point(
    fields=["directorate"],
    bind=alt.binding_select(
        options=[None] + all_dirs, labels=["All"] + all_dirs, name="Q4 Dir:"
    ),
    name="q4_dir",
)

q4_chart = (
    alt.Chart(q4_df)
    .mark_area(opacity=0.6)
    .encode(
        x=alt.X("year:O", title=None),
        y=alt.Y("sum(total_amount):Q", title="Fund ($)", axis=alt.Axis(format="~s")),
        tooltip=["year", alt.Tooltip("sum(total_amount)", format="$,.2s")],
    )
    .add_params(q4_st_sel, q4_dir_sel)
    .transform_filter(q4_st_sel)
    .transform_filter(q4_dir_sel)
    .properties(width=220, height=150, title="Q4: Funding Trend")
    | alt.Chart(q4_df)
    .transform_filter(q4_st_sel)
    .transform_filter(q4_dir_sel)
    .mark_text(size=14, fontWeight="bold")
    .encode(text=alt.Text("sum(total_amount):Q", format="$.2s"))
    .properties(width=80, height=150, title="Total")
).resolve_scale(color="independent")

# --- Q5: STATE IMPACT (State Filter) ---
# Dropdown filter for state, defaulting to CA
q5_st_sel = alt.selection_point(
    fields=["state"],
    bind=alt.binding_select(options=all_states, name="Q5 State:"),
    value=[{"state": "CA"}],
    name="q5_state",
)

q5_base = (
    alt.Chart(q5_data).transform_filter(q5_st_sel).encode(x=alt.X("year:O", title=None))
)
q5_chart = alt.vconcat(
    alt.layer(
        q5_base.mark_bar(color="#9ecae1").encode(
            y=alt.Y("fund:Q", axis=alt.Axis(format="~s"), title="Funded")
        ),
        q5_base.mark_line(color="#08519c").encode(y="fund:Q"),
    )
    .resolve_scale(y="independent")
    .properties(width=280, height=80, title="Q5: Evolution"),
    alt.layer(
        q5_base.mark_bar(color="#fc9272").encode(
            y=alt.Y("lost:Q", axis=alt.Axis(format="~s"), title="Lost")
        ),
        q5_base.mark_line(color="#de2d26").encode(y="lost:Q"),
    )
    .resolve_scale(y="independent")
    .properties(width=280, height=60, title="Impact"),
    spacing=5,
).add_params(q5_st_sel)

# --- Q6: EFFICIENCY (Year Filter + State Click) ---
q6_yr_sel = alt.selection_point(
    fields=["year"],
    bind=alt.binding_select(options=[0, 2020, 2021, 2022, 2023, 2024], name="Q6 Year:"),
    value=[{"year": 0}],
    name="q6_year",
)
q6_st_sel = alt.selection_point(fields=["abbr"], empty="all", name="q6_state")

q6_chart = (
    alt.Chart(q6_df)
    .mark_circle(size=60)
    .encode(
        x=alt.X("pop:Q", axis=alt.Axis(format="~s"), title="Pop"),
        y=alt.Y("per_cap:Q", axis=alt.Axis(format="$~s"), title="$/Cap"),
        color=alt.condition(
            q6_st_sel,
            alt.Color("per_cap:Q", scale=alt.Scale(scheme="purples"), legend=None),
            alt.value("#eee"),
        ),
        tooltip=["abbr", "pop", "per_cap"],
    )
    .add_params(q6_st_sel, q6_yr_sel)
    .transform_filter(q6_yr_sel)
    .interactive()
    .properties(width=220, height=160, title="Q6: Efficiency")
    | alt.Chart(q6_df)
    .transform_filter(q6_yr_sel)
    .transform_filter(q6_st_sel)
    .mark_text(size=14, fontWeight="bold")
    .encode(text=alt.Text("mean(per_cap):Q", format="$.2f"))
    .properties(width=80, height=160, title="$/Person")
).resolve_scale(color="independent")


# ==============================================================================
# 4. FINAL GRID ASSEMBLY
# ==============================================================================

row1 = alt.hconcat(q1_chart, q2_chart, q3_chart, spacing=15)
row2 = alt.hconcat(q4_chart, q5_chart, q6_chart, spacing=15)

dashboard = (
    alt.vconcat(row1, row2, spacing=30)
    .configure_view(stroke=None)
    .configure_axis(labelFontSize=9, titleFontSize=10, grid=False)
    .configure_title(fontSize=11, anchor="middle")
    .configure_concat(spacing=30)
)

dashboard

## Final viz

In [221]:
import altair as alt
import pandas as pd
import itertools

# 0. GLOBAL SETUP
alt.data_transformers.enable("default")

# ==============================================================================
# 1. DATA LOADING & CLEANING
# ==============================================================================
df_grants = pd.read_csv("NSF_Grants_Last5Years_Clean.csv")
df_trump = pd.read_csv("trump17-21-csv.csv")
df_pop_raw = pd.read_csv("estimated_population.csv")
df_abbr_raw = pd.read_csv("state_abbreviations.csv")

# Clean columns
df_grants.columns = df_grants.columns.str.strip()
df_trump.columns = df_trump.columns.str.strip()
df_pop_raw.columns = df_pop_raw.columns.str.strip()
df_abbr_raw.columns = df_abbr_raw.columns.str.strip()

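# Ensure a numeric year. Caveat: unparseable or missing years become 0, which is
# also the "all years" sentinel used by the aggregations below.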
df_grants["year"] = (
    pd.to_numeric(df_grants["year"], errors="coerce").fillna(0).astype(int)
)

# --- GLOBAL FILTER LISTS ---
all_years = sorted([y for y in df_grants["year"].unique() if y != 0])
# dropna() avoids a TypeError when sorting a mix of strings and NaN.
all_states = sorted(df_grants["state"].dropna().unique())
all_dirs = sorted(df_grants["directorate"].dropna().unique())

# ==============================================================================
# 2. DATA PREPARATION (ALL QUESTIONS)
# ==============================================================================

# --- Q1 PREP ---
q1_yearly = (
    df_grants.groupby(["state", "year"])
    .agg(cnt=("award_id", "count"), amt=("award_amount", "sum"))
    .reset_index()
)
q1_total = (
    df_grants.groupby(["state"])
    .agg(cnt=("award_id", "count"), amt=("award_amount", "sum"))
    .reset_index()
)
q1_total["year"] = 0
q1_full = pd.concat([q1_yearly, q1_total], ignore_index=True)

# --- Q2 PREP ---
q2_yearly = (
    df_grants.groupby(["directorate", "year"])
    .agg(cnt=("award_id", "count"), amt=("award_amount", "sum"))
    .reset_index()
)
q2_total = (
    df_grants.groupby(["directorate"])
    .agg(cnt=("award_id", "count"), amt=("award_amount", "sum"))
    .reset_index()
)
q2_total["year"] = 0
q2_full = pd.concat([q2_yearly, q2_total], ignore_index=True)

# --- Q3 PREP ---
base_yearly = (
    df_grants.groupby(["directorate", "year"])
    .agg(base_count=("award_id", "count"))
    .reset_index()
)
cancel_yearly = (
    df_trump.groupby(["directorate", "year"])
    .agg(cancel_count=("award_id", "count"), cancel_amt=("award_amount", "sum"))
    .reset_index()
)
q3_full = base_yearly.merge(
    cancel_yearly, on=["directorate", "year"], how="outer"
).fillna(0)
# Totals (Year 0)
base_total = (
    df_grants.groupby(["directorate"])
    .agg(base_count=("award_id", "count"))
    .reset_index()
)
base_total["year"] = 0
cancel_total = (
    df_trump.groupby(["directorate"])
    .agg(cancel_count=("award_id", "count"), cancel_amt=("award_amount", "sum"))
    .reset_index()
)
cancel_total["year"] = 0
q3_full = pd.concat(
    [
        q3_full,
        base_total.merge(cancel_total, on=["directorate", "year"], how="outer").fillna(
            0
        ),
    ],
    ignore_index=True,
)
# Static Size for X-axis
static_size = base_total[["directorate", "base_count"]].rename(
    columns={"base_count": "static_size"}
)
q3_full = q3_full.merge(static_size, on="directorate", how="left")
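# Keep the total (year 0) plus 2018-2021. Note that 2017 is not in this list, so
# the year-0 total can include 2017 cancellations that no single-year view shows.
q3_full = q3_full[q3_full["year"].isin([0, 2018, 2019, 2020, 2021])]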

# --- Q4 PREP ---
q4_df = (
    df_grants.groupby(["year", "state", "directorate"])
    .agg(total_amount=("award_amount", "sum"))
    .reset_index()
)

# --- Q5 PREP ---
q5_data = pd.DataFrame(
    list(itertools.product(all_states, range(2017, 2025))), columns=["state", "year"]
)
q5_data = q5_data.merge(
    df_grants.groupby(["state", "year"])["award_amount"]
    .sum()
    .rename("fund")
    .reset_index(),
    on=["state", "year"],
    how="left",
)
q5_data = q5_data.merge(
    df_trump.groupby(["state", "year"])["award_amount"]
    .sum()
    .rename("lost")
    .reset_index(),
    on=["state", "year"],
    how="left",
).fillna(0)

# --- Q6 PREP (per-capita funding, with year-0 totals and a US average) ---
# 1. Clean Pop
pop_cols = [c for c in df_pop_raw.columns if c.lower().startswith("pop_")]
df_pop_long = df_pop_raw.melt(
    id_vars=["state"], value_vars=pop_cols, var_name="year", value_name="population"
)
df_pop_long["year"] = (
    df_pop_long["year"].str.replace("pop_", "", regex=False).astype(int)
)
df_pop_long["population"] = pd.to_numeric(df_pop_long["population"], errors="coerce")
df_pop_long = df_pop_long[df_pop_long["year"].between(2020, 2024)].copy()
df_pop_long = df_pop_long.rename(columns={"state": "state_name"})

# 2. Merge Abbreviations
df_abbr = df_abbr_raw.rename(
    columns={df_abbr_raw.columns[0]: "state_name", df_abbr_raw.columns[1]: "state"}
)  # Ensure col name is 'state' for merge
df_pop_long = df_pop_long.merge(df_abbr, on="state_name", how="left").dropna(
    subset=["state", "population"]
)

# 3. Create "Year 0" Population (Average over 5 years)
pop_avg = df_pop_long.groupby("state")["population"].mean().reset_index()
pop_avg["year"] = 0
df_pop_full = pd.concat(
    [df_pop_long[["state", "year", "population"]], pop_avg], ignore_index=True
)

# 4. Grants Data (Yearly + Total)
q6_yearly = (
    df_grants.dropna(subset=["state", "year", "award_amount"])
    .groupby(["state", "year"])
    .agg(total_amount=("award_amount", "sum"))
    .reset_index()
)
q6_total = (
    df_grants.dropna(subset=["state", "award_amount"])
    .groupby(["state"])
    .agg(total_amount=("award_amount", "sum"))
    .reset_index()
)
q6_total["year"] = 0
q6_grants_full = pd.concat([q6_yearly, q6_total], ignore_index=True)

# 5. Merge & Calculate Per Capita
q6_df = q6_grants_full.merge(df_pop_full, on=["state", "year"], how="inner")
q6_df["funding_per_capita"] = q6_df["total_amount"] / q6_df["population"]

# 6. National average per year (unweighted mean of the state per-capita values)
us_avg = (
    q6_df.groupby("year")["funding_per_capita"]
    .mean()
    .reset_index()
    .rename(columns={"funding_per_capita": "us_avg"})
)
q6_df = q6_df.merge(us_avg, on="year", how="left")


# ==============================================================================
# 3. CHART DEFINITIONS (MICRO SIZE)
# ==============================================================================

# --- Q1 (Top Left) ---
q1_yr_sel = alt.selection_point(
    fields=["year"],
    bind=alt.binding_select(options=[0] + all_years, name="Q1 Year:"),
    value=[{"year": 0}],
    name="q1_year",
)
q1_st_sel = alt.selection_point(fields=["state"], empty=True, name="q1_state")

q1_micro = (
    alt.Chart(q1_full)
    .mark_bar()
    .encode(
        x=alt.X("state:N", sort="-y", title=None),
        y=alt.Y("cnt:Q", title="Grants"),
        color=alt.condition(
            q1_st_sel,
            alt.Color("cnt:Q", scale=alt.Scale(scheme="blues"), legend=None),
            alt.value("#eee"),
        ),
        tooltip=["state", "cnt"],
    )
    .add_params(q1_yr_sel, q1_st_sel)
    .transform_filter(q1_yr_sel)
    .properties(width=220, height=180, title="Q1: Grants by State")
    | (
        alt.Chart(q1_yearly)
        .mark_line(point=True)
        .encode(
            x=alt.X("year:O", title=None),
            y=alt.Y("amt:Q", axis=alt.Axis(format="~s"), title=None),
        )
        .transform_filter(q1_st_sel)
        .properties(width=120, height=80, title="History")
        & alt.Chart(q1_full)
        .transform_filter(q1_yr_sel)
        .transform_filter(q1_st_sel)
        .mark_text(fontWeight="bold")
        .encode(text=alt.Text("sum(amt):Q", format="$.2s"))
        .properties(width=120, height=30)
    )
).resolve_scale(color="independent")

# --- Q2 (Top Center) ---
q2_yr_sel = alt.selection_point(
    fields=["year"],
    bind=alt.binding_select(options=[0] + all_years, name="Q2 Year:"),
    value=[{"year": 0}],
    name="q2_year",
)
q2_dir_sel = alt.selection_point(fields=["directorate"], empty=True, name="q2_dir")

q2_micro = (
    alt.Chart(q2_full)
    .mark_bar()
    .encode(
        x=alt.X("cnt:Q", title=None),
        y=alt.Y("directorate:N", sort="-x", title=None),
        color=alt.condition(
            q2_dir_sel,
            alt.Color("cnt:Q", scale=alt.Scale(scheme="blues"), legend=None),
            alt.value("#eee"),
        ),
        tooltip=["directorate", "cnt"],
    )
    .add_params(q2_dir_sel, q2_yr_sel)
    .transform_filter(q2_yr_sel)
    .properties(width=180, height=180, title="Q2: Directorates")
    | (
        alt.Chart(q2_yearly)
        .mark_line(point=True)
        .encode(
            x=alt.X("year:O", title=None),
            y=alt.Y("amt:Q", axis=alt.Axis(format="~s"), title=None),
        )
        .transform_filter(q2_dir_sel)
        .properties(width=130, height=80, title="History")
        & alt.Chart(q2_full)
        .transform_filter(q2_yr_sel)
        .transform_filter(q2_dir_sel)
        .mark_text(fontWeight="bold")
        .encode(text=alt.Text("sum(amt):Q", format="$.2s"))
        .properties(width=130, height=30)
    )
).resolve_scale(color="independent")

# --- Q3 (Top Right) ---
q3_yr_sel = alt.selection_point(
    fields=["year"],
    bind=alt.binding_select(options=[0, 2018, 2019, 2020, 2021], name="Q3 Year:"),
    value=[{"year": 0}],
    name="q3_year",
)
q3_dir_sel = alt.selection_point(fields=["directorate"], empty=True, name="q3_dir")

q3_micro = (
    alt.Chart(q3_full)
    .mark_bar()
    .encode(
        x=alt.X("cancel_count:Q", title="Cancelled"),
        y=alt.Y("directorate:N", sort="-x", title=None),
        color=alt.condition(
            q3_dir_sel,
            alt.Color("cancel_count:Q", scale=alt.Scale(scheme="reds"), legend=None),
            alt.value("#eee"),
        ),
        tooltip=["directorate", "cancel_count"],
    )
    .add_params(q3_dir_sel, q3_yr_sel)
    .transform_filter(q3_yr_sel)
    .properties(width=140, height=180, title="Q3: Cancellations")
    | alt.Chart(q3_full)
    .mark_circle(stroke="black")
    .encode(
        x=alt.X("static_size:Q", title="Size", axis=alt.Axis(format="~s")),
        y=alt.Y("cancel_count:Q", title="Hits"),
        size=alt.Size("cancel_amt:Q", legend=None),
        color=alt.condition(q3_dir_sel, alt.value("#4c78a8"), alt.value("#eee")),
        tooltip=["directorate", "cancel_amt"],
    )
    .add_params(q3_dir_sel, q3_yr_sel)
    .transform_filter(q3_yr_sel)
    .interactive()
    .properties(width=180, height=180, title="Impact")
).resolve_scale(color="independent")

# --- Q4 (Bottom Left) ---
q4_st_sel = alt.selection_point(
    fields=["state"],
    bind=alt.binding_select(
        options=[None] + all_states, labels=["All"] + all_states, name="Q4 State:"
    ),
    name="q4_state",
)
q4_dir_sel = alt.selection_point(
    fields=["directorate"],
    bind=alt.binding_select(
        options=[None] + all_dirs, labels=["All"] + all_dirs, name="Q4 Dir:"
    ),
    name="q4_dir",
)

q4_micro = (
    alt.Chart(q4_df)
    .mark_area(opacity=0.6)
    .encode(
        x=alt.X("year:O", title=None),
        y=alt.Y("sum(total_amount):Q", title="Fund ($)", axis=alt.Axis(format="~s")),
        tooltip=["year", alt.Tooltip("sum(total_amount)", format="$,.2s")],
    )
    .add_params(q4_st_sel, q4_dir_sel)
    .transform_filter(q4_st_sel)
    .transform_filter(q4_dir_sel)
    .properties(width=220, height=150, title="Q4: Funding Trend")
    | alt.Chart(q4_df)
    .transform_filter(q4_st_sel)
    .transform_filter(q4_dir_sel)
    .mark_text(size=16, fontWeight="bold")
    .encode(text=alt.Text("sum(total_amount):Q", format="$.2s"))
    .properties(width=80, height=150, title="Total")
).resolve_scale(color="independent")

# --- Q5 (Bottom Center) ---
q5_st_sel = alt.selection_point(
    fields=["state"],
    bind=alt.binding_select(options=all_states, name="Q5 State:"),
    value=[{"state": "CA"}],
    name="q5_state",
)
q5_base = (
    alt.Chart(q5_data).transform_filter(q5_st_sel).encode(x=alt.X("year:O", title=None))
)

q5_micro = alt.vconcat(
    alt.layer(
        q5_base.mark_bar(color="#9ecae1").encode(
            y=alt.Y("fund:Q", axis=alt.Axis(format="~s"), title="Funded")
        ),
        q5_base.mark_line(color="#08519c").encode(y=alt.Y("fund:Q")),
    )
    .resolve_scale(y="independent")
    .properties(width=280, height=90, title="Q5: Evolution"),
    alt.layer(
        q5_base.mark_bar(color="#fc9272").encode(
            y=alt.Y("lost:Q", axis=alt.Axis(format="~s"), title="Lost")
        ),
        q5_base.mark_line(color="#de2d26").encode(y=alt.Y("lost:Q")),
    )
    .resolve_scale(y="independent")
    .properties(width=280, height=60, title="Impact"),
    spacing=5,
).add_params(q5_st_sel)

# --- Q6 (Bottom Right) ---
# Scatter with a year-0 (all years) option, a US-average rule, and zoom/pan.
q6_yr_sel = alt.selection_point(
    fields=["year"],
    bind=alt.binding_select(
        options=[0] + sorted([y for y in q6_df["year"].unique() if y != 0]),
        name="Q6 Year:",
    ),
    value=[{"year": 0}],
    name="q6_year",
)
q6_st_sel = alt.selection_point(fields=["state"], empty=True, name="q6_state")

base_scatter = alt.Chart(q6_df).transform_filter(q6_yr_sel)
q6_scatter = (
    base_scatter.mark_circle(size=60)
    .encode(
        x=alt.X("population:Q", axis=alt.Axis(format="~s"), title="Pop"),
        y=alt.Y("funding_per_capita:Q", axis=alt.Axis(format="$~s"), title="$/Cap"),
        color=alt.condition(
            q6_st_sel,
            alt.Color(
                "funding_per_capita:Q", scale=alt.Scale(scheme="purples"), legend=None
            ),
            alt.value("#eee"),
        ),
        tooltip=[
            "state",
            "population",
            alt.Tooltip("funding_per_capita", format="$,.2f"),
        ],
    )
    .add_params(q6_st_sel, q6_yr_sel)
    .interactive()  # enable zoom and pan
)
q6_rule = base_scatter.mark_rule(color="red").encode(
    y="mean(us_avg):Q"
)  # National Avg Line

q6_micro = (
    (q6_scatter + q6_rule).properties(width=220, height=160, title="Q6: Efficiency")
    | (
        # History Line (Exclude Year 0 for trend)
        alt.Chart(q6_df[q6_df["year"] != 0])
        .mark_line(point=True)
        .encode(
            x=alt.X("year:O", title=None), y=alt.Y("funding_per_capita:Q", title=None)
        )
        .transform_filter(q6_st_sel)
        .properties(width=80, height=100, title="History")
        &
        # KPI Text
        alt.Chart(q6_df)
        .transform_filter(q6_yr_sel)
        .transform_filter(q6_st_sel)
        .mark_text(size=14, fontWeight="bold")
        .encode(text=alt.Text("mean(funding_per_capita):Q", format="$.2f"))
        .properties(width=80, height=60, title="$/Person")
    )
).resolve_scale(color="independent")


# ==============================================================================
# 4. FINAL GRID ASSEMBLY
# ==============================================================================
row1 = alt.hconcat(q1_micro, q2_micro, q3_micro, spacing=15)
row2 = alt.hconcat(q4_micro, q5_micro, q6_micro, spacing=15)

dashboard = (
    alt.vconcat(row1, row2, spacing=25)
    .configure_view(stroke=None)
    .configure_axis(labelFontSize=9, titleFontSize=10, grid=False)
    .configure_title(fontSize=11, anchor="middle")
    .configure_concat(spacing=20)
)

dashboard

**Concluding Comment: The 'Command Center' Dashboard**

This final visualization serves as the project's analytical capstone: a **Unified 'Command Center' Dashboard**.

Arranging the six research questions in a **2×3 Micro-Chart Grid** addresses the tension between overview and detail. Instead of forcing the user to scroll through fragmented insights, this layout shows the entire NSF funding landscape on a single screen.

### Key Analytical Features:

- **Synchronized Interactivity**: Each chart keeps its full interactive controls (filtering by year, state, or directorate), and any linked History or KPI panel updates with the selection, so users can drill into specific anomalies without losing the broader context.

- **Spatial & Temporal Context**: The top row focuses on **Distribution** (Geography, Directorate structure, and Political impact via cancellations), while the bottom row focuses on **Evolution & Efficiency** (Funding trends, State-specific histories, and Per-Capita normalization).

- **Efficiency Matrix (Q6)**: The bottom-right scatter plot uses the 'Year 0' aggregation (a sentinel year that holds each state's all-years total) to plot funding per capita against population. This makes structural outliers easy to spot: states that punch above their weight in funding relative to their population.
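
The 'Year 0' logic and the per-capita comparison can be sketched on toy data. The block below is a minimal illustration and not the notebook's exact code: the numbers are made up, and the helper `with_year_zero` and the `above_avg` column are hypothetical names introduced here. The column names and the definition of `us_avg` as an unweighted mean across states follow the Q6 preparation step.

```python
import pandas as pd

# Toy inputs with made-up numbers; column names mirror the notebook's.
grants = pd.DataFrame(
    {
        "state": ["CA", "CA", "WY", "WY"],
        "year": [2020, 2021, 2020, 2021],
        "award_amount": [400.0, 600.0, 30.0, 50.0],
    }
)
pop = pd.DataFrame(
    {
        "state": ["CA", "CA", "WY", "WY"],
        "year": [2020, 2021, 2020, 2021],
        "population": [40.0, 40.0, 0.5, 0.5],
    }
)


def with_year_zero(df, value_col, how):
    """Hypothetical helper: add one row per state with year == 0 (all years)."""
    yearly = df.groupby(["state", "year"], as_index=False)[value_col].sum()
    total = yearly.groupby("state", as_index=False)[value_col].agg(how)
    total["year"] = 0
    return pd.concat([yearly, total], ignore_index=True)


# Funding is summed over the years; population is averaged over the years.
funding = with_year_zero(grants, "award_amount", "sum")
people = with_year_zero(pop, "population", "mean")

per_cap = funding.merge(people, on=["state", "year"], how="inner")
per_cap["funding_per_capita"] = per_cap["award_amount"] / per_cap["population"]

# Unweighted mean of the state values within each year (including year 0).
per_cap["us_avg"] = per_cap.groupby("year")["funding_per_capita"].transform("mean")
per_cap["above_avg"] = per_cap["funding_per_capita"] > per_cap["us_avg"]

overall = per_cap[per_cap["year"] == 0].set_index("state")
print(overall[["funding_per_capita", "us_avg", "above_avg"]])
```

One consequence worth keeping in mind when reading the chart: the year-0 value divides a multi-year funding sum by an average population, so it is a cumulative figure and is not directly comparable to the single-year values.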

This design follows Shneiderman's visual information-seeking mantra: it gives an immediate **Overview**, offers tools to **Zoom and Filter**, and delivers **Details-on-Demand** through tooltips and linked views, all within a compact, non-scrolling interface suitable for executive decision-making.
