# Animal Welfare Outcomes: Visual Analytics for California Shelter Data
**Sunitee Deepak Jundre** | M.S. Applied Data Science | Luddy School | Indiana University Bloomington | sjundre@iu.edu

---

## Project Summary

This notebook implements five interactive visualizations analyzing California animal shelter intake and outcome data.
The goal is to help shelter managers, program funders, and advocacy groups identify where targeted interventions will have the greatest impact.

**Analytical dimensions covered:**
- Categorical: outcome distribution by intake type
- Topical/Linguistic: intake theme frequency by species
- Geospatial: per-capita intake rate vs live release rate by county
- Temporal: monthly intake trends with seasonal baseline
- Network: directed transfer flow between shelter facilities

**Data source:**
Long Beach Animal Shelter — TidyTuesday 2025-03-04
`https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-03-04/longbeach.csv`

For a multi-shelter California view, additional sources:
- LA County PawStats: `https://data.lacounty.gov/datasets/animal-care-and-control-pawstats`
- San Jose: `https://data.sanjoseca.gov/dataset/animal-shelter-intake-and-outcomes`
- Sonoma County: `https://data.sonomacounty.ca.gov/Government/Animal-Shelter-Intake-and-Outcome/924a-vesw`
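
Combining these sources would require mapping each portal's column names onto a shared schema before concatenating. The sketch below is a minimal illustration using tiny inline frames instead of downloads. The second portal's column names (`Type`, `Intake Type`, and so on) and the `harmonize` helper are hypothetical, since the real portal schemas were not checked here.

```python
import pandas as pd

# Shared schema, using the Long Beach column names as the target.
TARGET_COLS = ['animal_type', 'intake_type', 'outcome_type', 'intake_date']

def harmonize(frame, rename_map, source):
    """Rename source-specific columns to the shared schema and tag the source."""
    out = frame.rename(columns=rename_map)
    out = out[TARGET_COLS].copy()
    out['animal_type'] = out['animal_type'].str.lower()
    out['intake_date'] = pd.to_datetime(out['intake_date'], errors='coerce')
    out['source'] = source
    return out

long_beach = pd.DataFrame({
    'animal_type': ['dog', 'cat'],
    'intake_type': ['stray', 'owner surrender'],
    'outcome_type': ['adoption', 'transfer'],
    'intake_date': ['2023-02-20', '2023-10-03'],
})

# Hypothetical schema for a second portal; real column names will differ.
other = pd.DataFrame({
    'Type': ['DOG'],
    'Intake Type': ['STRAY'],
    'Outcome Type': ['RETURN TO OWNER'],
    'Intake Date': ['2024-01-15'],
})
other_map = {'Type': 'animal_type', 'Intake Type': 'intake_type',
             'Outcome Type': 'outcome_type', 'Intake Date': 'intake_date'}

combined = pd.concat(
    [harmonize(long_beach, {}, 'Long Beach'),
     harmonize(other, other_map, 'Other portal')],
    ignore_index=True,
)
print(combined[['animal_type', 'source']])
```

Keeping a `source` column makes it possible to compare shelters later without losing track of where each record came from.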


---
## Part 1: Data Loading and Stakeholder Analysis

| Stakeholder | Insight Need | Visualization |
|---|---|---|
| Shelter managers | Which intake pathways drive poor outcomes? | Stacked bar, Topic heatmap |
| Program funders | Which counties need intervention most? | County scatter, Transfer network |
| Advocacy groups | Are conditions improving over time? | Temporal line chart |

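All visualizations use text-encoded tooltips. Viz 1 and Viz 2 use colorblind-safe palettes (Safe, Blues); the RdYlGn scale that encodes live release rate in Viz 3 and Viz 5 is a red-green scale and is not colorblind-safe, so those two charts rely on their tooltips to convey exact values.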


In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import warnings
warnings.filterwarnings('ignore')


# Real data: Long Beach Animal Shelter (TidyTuesday 2025-03-04)
DATA_URL = 'https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-03-04/longbeach.csv'

try:
    df_raw = pd.read_csv(DATA_URL)
    print(f"Loaded real data: {len(df_raw):,} records")
    USE_REAL_DATA = True
except Exception:
    print("Could not load remote CSV — generating synthetic data for reproducibility.")
    USE_REAL_DATA = False


if not USE_REAL_DATA:
    np.random.seed(42)
    counties   = ['Los Angeles','San Diego','Orange','Riverside','San Bernardino',
                  'Santa Clara','Alameda','Sacramento','Contra Costa','Fresno',
                  'Kern','San Francisco','Ventura','San Mateo','Sonoma']
    intake_types = ['Stray','Owner Surrender','Seized/Cruelty','Transfer In','Born in Shelter']
    species      = ['Dog','Cat','Rabbit','Bird','Other']
    outcomes     = ['Adoption','Transfer','Return to Owner','Euthanasia','Died in Care']
    months       = pd.date_range('2021-01-01', '2025-12-01', freq='MS')
    n            = 4200

    df_raw = pd.DataFrame({
        'animal_type':    np.random.choice(species, n, p=[0.45,0.38,0.07,0.05,0.05]),
        'intake_type':    np.random.choice(intake_types, n, p=[0.48,0.28,0.06,0.10,0.08]),
        'outcome_type':   np.random.choice(outcomes, n, p=[0.42,0.22,0.15,0.16,0.05]),
        'intake_date':    np.random.choice(months, n),
        'jurisdiction':   np.random.choice(counties, n),
        'intake_condition': np.random.choice(
            ['Normal','Injured','Sick','Feral','Pregnant','Nursing','Behavior'], n,
            p=[0.50,0.15,0.12,0.08,0.05,0.05,0.05]),
        'was_outcome_alive': np.random.choice([True, False], n, p=[0.79, 0.21])
    })

    print(f"Using synthetic data: {len(df_raw):,} records")


# --- DATA CLEANING ---
# Normalize column names to lowercase
df_raw.columns = df_raw.columns.str.lower().str.replace(' ', '_')

# Parse date
date_col = [c for c in df_raw.columns if 'date' in c or 'time' in c]
if date_col:
    df_raw[date_col[0]] = pd.to_datetime(df_raw[date_col[0]], errors='coerce')
    df_raw['year']  = df_raw[date_col[0]].dt.year
    df_raw['month'] = df_raw[date_col[0]].dt.to_period('M').dt.to_timestamp()

SPECIES_COL  = 'animal_type'     if 'animal_type'  in df_raw.columns else df_raw.columns[0]
INTAKE_COL   = 'intake_type'     if 'intake_type'  in df_raw.columns else df_raw.columns[1]
OUTCOME_COL  = 'outcome_type'    if 'outcome_type' in df_raw.columns else df_raw.columns[2]
COUNTY_COL   = 'jurisdiction'    if 'jurisdiction' in df_raw.columns else None
ALIVE_COL    = 'was_outcome_alive' if 'was_outcome_alive' in df_raw.columns else None
CONDITION_COL = 'intake_condition' if 'intake_condition' in df_raw.columns else None

# Copy so later column assignments (e.g. df['theme']) do not write to a view of df_raw
df = df_raw.dropna(subset=[SPECIES_COL, INTAKE_COL, OUTCOME_COL]).copy()
if 'year' in df.columns:
    df = df[df['year'].between(2010, 2025)].copy()

print(f"Clean dataset: {len(df):,} records | "
      f"{df[SPECIES_COL].nunique()} species | "
      f"{df[INTAKE_COL].nunique()} intake types | "
      f"{df[OUTCOME_COL].nunique()} outcome types")
df.head(3)


Loaded real data: 29,787 records
Clean dataset: 29,600 records | 10 species | 12 intake types | 18 outcome types


| | animal_id | animal_name | animal_type | primary_color | secondary_color | sex | dob | intake_date | intake_condition | intake_type | ... | jurisdiction | outcome_type | outcome_subtype | latitude | longitude | outcome_is_dead | was_outcome_alive | geopoint | year | month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A693708 | *charlien | dog | white | | Female | 2013-02-21 | 2023-02-20 | ill mild | stray | ... | Long Beach | euthanasia | ill severe | 33.804794 | -118.188926 | True | False | 33.8047935, -118.1889261 | 2023 | 2023-02-01 |
| 1 | A708149 | | reptile | brown | green | Unknown | | 2023-10-03 | normal | stray | ... | Long Beach | rescue | other resc | 33.867999 | -118.200931 | False | True | 33.8679994, -118.2009307 | 2023 | 2023-10-01 |
| 2 | A638068 | | bird | green | red | Unknown | | 2020-01-01 | injured severe | wildlife | ... | Long Beach | euthanasia | inj severe | 33.760478 | -118.148091 | True | False | 33.7604783, -118.1480912 | 2020 | 2020-01-01 |


---
## Part 2: Visualization Prototypes

Five complementary visualizations, each targeting a distinct stakeholder need.
Each cell also writes an interactive HTML export (for example `viz1_stacked_bar.html`) alongside this notebook.


In [2]:
# --- VISUALIZATION 1: STACKED BAR — OUTCOME DISTRIBUTION BY INTAKE TYPE ---


cat_data = (df.groupby([INTAKE_COL, OUTCOME_COL])
              .size()
              .reset_index(name='Count'))

top_intake = cat_data.groupby(INTAKE_COL)['Count'].sum().nlargest(5).index
cat_data = cat_data[cat_data[INTAKE_COL].isin(top_intake)]

fig_bar = px.bar(
    cat_data,
    x=INTAKE_COL,
    y='Count',
    color=OUTCOME_COL,
    title='Categorical Outcome Distribution by Intake Type',
    labels={INTAKE_COL: 'Intake Type', 'Count': 'Number of Animals'},
    color_discrete_sequence=px.colors.qualitative.Safe,
    text_auto=True,
    template='plotly_white'
)
fig_bar.update_layout(
    barmode='stack',
    legend_title='Outcome Type',
    font=dict(size=12),
    margin=dict(t=50, l=40, r=40, b=60)
)
fig_bar.update_traces(
    hovertemplate='<b>%{x}</b><br>%{fullData.name}: %{y:,}<extra></extra>'
)

fig_bar.write_html('viz1_stacked_bar.html')

print("Displaying Viz 1: Intake Type vs Outcome Distribution...")
fig_bar.show()

Displaying Viz 1: Intake Type vs Outcome Distribution...
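
Raw counts make it hard to compare outcome mixes between a high-volume intake type and a low-volume one. One option is to convert counts to within-intake-type shares before plotting. The sketch below uses toy numbers (illustrative only, not taken from the dataset) and assumes the same `intake_type` / `outcome_type` / `Count` layout as `cat_data`.

```python
import pandas as pd

# Toy counts: intake type x outcome type (illustrative values only).
counts = pd.DataFrame({
    'intake_type':  ['Stray', 'Stray', 'Seized/Cruelty', 'Seized/Cruelty'],
    'outcome_type': ['Adoption', 'Euthanasia', 'Adoption', 'Euthanasia'],
    'Count':        [600, 200, 30, 20],
})

# Share of each outcome within its intake type (shares sum to 100 per intake type).
counts['Share'] = (counts['Count']
                   / counts.groupby('intake_type')['Count'].transform('sum')
                   * 100)

# Euthanasia share per intake type, comparable despite very different volumes.
euth = (counts[counts['outcome_type'] == 'Euthanasia']
        .set_index('intake_type')['Share'])
print(euth.round(1))
```

In this toy example the smaller category has the higher euthanasia share (40% vs 25%), a difference that a count-based stacked bar would visually bury.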


In [3]:
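# --- VISUALIZATION 2: TOPIC HEATMAP, INTAKE THEMES BY SPECIES ---
# Keys are lowercase prefixes so they match both the synthetic labels
# ('Injured') and the real Long Beach labels ('injured severe', 'ill mild').
theme_map = {
    'injured':  'Medical/Injury',
    'sick':     'Medical/Injury',
    'ill':      'Medical/Injury',
    'normal':   'Stray/No ID',
    'feral':    'Behavioral Issue',
    'behavior': 'Behavioral Issue',
    'pregnant': 'Overcrowding',
    'nursing':  'Overcrowding',
}

def condition_to_theme(value):
    """Map an intake condition to a theme by case-insensitive prefix match."""
    text = str(value).strip().lower()
    for prefix, theme in theme_map.items():
        if text.startswith(prefix):
            return theme
    return 'Other'

if CONDITION_COL:
    df['theme'] = df[CONDITION_COL].map(condition_to_theme)
    heat_source = df.groupby([SPECIES_COL, 'theme']).size().reset_index(name='Count')
    heat_pivot  = heat_source.pivot(index=SPECIES_COL, columns='theme', values='Count').fillna(0)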
else:
    themes = ['Medical/Injury','Behavioral Issue','Stray/No ID',
              'Hoarding/Cruelty','Abandonment','Overcrowding',
              'Bite History','Age/Health Decline','Owner Financial']
    species_vals = df[SPECIES_COL].value_counts().nlargest(5).index.tolist()
    np.random.seed(42)
    heat_pivot = pd.DataFrame(
        np.random.randint(10, 400, size=(len(species_vals), len(themes))),
        index=species_vals, columns=themes
    )

fig_heat = px.imshow(
    heat_pivot,
    text_auto=True,
    color_continuous_scale='Blues',
    title='Topic Frequency Heatmap: Intake Themes by Species',
    labels=dict(x='Intake Theme', y='Species', color='Record Count'),
    aspect='auto'
)
fig_heat.update_layout(
    template='plotly_white',
    font=dict(size=11),
    margin=dict(t=50, l=80, r=40, b=100),
    xaxis_tickangle=-35
)
fig_heat.update_traces(
    hovertemplate='Species: <b>%{y}</b><br>Theme: <b>%{x}</b><br>Count: %{z:,}<extra></extra>'
)

fig_heat.write_html('viz2_topic_heatmap.html')

print("Displaying Viz 2: Topic Heatmap...")
fig_heat.show()

Displaying Viz 2: Topic Heatmap...
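
When a few species account for most records, raw counts let those rows dominate the color scale. A `log1p` transform is one way to compress the range while keeping zero counts at zero. The sketch below applies it to a toy species-by-theme table (values are illustrative only).

```python
import numpy as np
import pandas as pd

# Toy species x theme counts (illustrative values only).
heat_counts = pd.DataFrame(
    {'Medical/Injury': [1200, 900, 15], 'Behavioral Issue': [300, 40, 0]},
    index=['dog', 'cat', 'rabbit'],
)

# log1p(x) = log(1 + x): compresses large counts and maps 0 to 0.
heat_log = np.log1p(heat_counts)

print(heat_log.round(2))
```

The transformed table can drive the color channel while the raw counts stay in the cell labels and tooltips, so readers still see actual record counts.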


In [4]:
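# --- VISUALIZATION 3: BUBBLE SCATTER, INTAKE RATE VS LIVE RELEASE RATE ---
if ALIVE_COL and COUNTY_COL:
    county_stats = (df.groupby(COUNTY_COL)
                      .agg(total=(SPECIES_COL, 'count'),   # use the detected column, not a hard-coded name
                           live=(ALIVE_COL, 'sum'))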
                      .reset_index())
    county_stats['LRR'] = county_stats['live'] / county_stats['total']
    county_stats = county_stats.rename(columns={COUNTY_COL: 'County'})
else:
    np.random.seed(42)
    counties = ['Los Angeles','San Diego','Orange','Riverside','San Bernardino',
                'Santa Clara','Alameda','Sacramento','Contra Costa','Fresno',
                'Kern','San Francisco','Ventura','San Mateo','Sonoma']
    county_stats = pd.DataFrame({
        'County':     counties,
        'total':      np.random.randint(800, 5000, len(counties)),
        'LRR':        np.clip(np.random.normal(0.72, 0.12, len(counties)), 0.38, 0.97),
        'IntakeRate': np.random.uniform(3.5, 18.0, len(counties))
    })

rate_label = 'Intake Rate (per 1,000 residents)'
if 'IntakeRate' not in county_stats.columns:
    # No population column is available on this path, so this is a relative
    # intake index (10 = the average jurisdiction), not a true per-capita rate.
    pop_proxy = county_stats['total'].mean()
    county_stats['IntakeRate'] = county_stats['total'] / pop_proxy * 10
    rate_label = 'Relative Intake Index (10 = jurisdiction average)'

fig_scatter = px.scatter(
    county_stats,
    x='IntakeRate',
    y='LRR',
    size='IntakeRate',
    color='LRR',
    hover_name='County',
    color_continuous_scale='RdYlGn',
    range_color=[0.4, 0.95],
    size_max=40,
    title='California Shelters: Intake Rate vs Live Release Rate by County',
    labels={'IntakeRate': rate_label,
            'LRR': 'Live Release Rate (LRR)'},
    template='plotly_white'
)
fig_scatter.add_hline(
    y=0.9,
    line_dash='dash',
    line_color='#16a34a',
    annotation_text='90% No-Kill threshold',
    annotation_position='top right'
)
fig_scatter.update_layout(font=dict(size=12), margin=dict(t=50, l=60, r=40, b=60))
fig_scatter.update_traces(
    hovertemplate='<b>%{hovertext}</b><br>Intake Rate: %{x:.1f}<br>LRR: %{y:.2f}<extra></extra>'
)

fig_scatter.write_html('viz3_county_scatter.html')
print("Displaying Viz 3: County Intake Rate vs LRR...")
fig_scatter.show()

Displaying Viz 3: County Intake Rate vs LRR...
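
The two axes of this chart reduce to simple ratios: live release rate is live outcomes divided by total outcomes, and a per-capita intake rate is records divided by residents, scaled to 1,000. The sketch below works through both on toy data. The `population` table and its `residents` values are hypothetical; real figures would have to come from census data joined on jurisdiction.

```python
import pandas as pd

# Toy outcome records; was_outcome_alive mirrors the Long Beach column.
records = pd.DataFrame({
    'jurisdiction': ['A', 'A', 'A', 'A', 'B', 'B'],
    'was_outcome_alive': [True, True, True, False, True, False],
})

# Hypothetical resident counts; real values would come from census data.
population = pd.DataFrame({
    'jurisdiction': ['A', 'B'],
    'residents': [2000, 500],
})

stats = (records.groupby('jurisdiction')
                .agg(total=('was_outcome_alive', 'size'),
                     live=('was_outcome_alive', 'sum'))
                .reset_index()
                .merge(population, on='jurisdiction', how='left'))

stats['LRR'] = stats['live'] / stats['total']                     # live release rate
stats['IntakeRate'] = stats['total'] / stats['residents'] * 1000  # per 1,000 residents
stats['meets_90'] = stats['LRR'] >= 0.90                          # 90% threshold flag
print(stats)
```

Here jurisdiction B has fewer records than A but twice the per-capita intake rate, which is the kind of comparison that raw totals hide.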


In [5]:
if 'month' in df.columns:
    monthly = (df.groupby('month')
                 .size()
                 .reset_index(name='Intake')
                 .sort_values('month'))
    monthly['month'] = pd.to_datetime(monthly['month'])
else:
    months_range = pd.date_range('2021-01-01', '2025-12-01', freq='MS')
    np.random.seed(42)
    n = len(months_range)
    intake_vals = (2500
                   + 800 * np.sin(np.arange(n) * 2 * np.pi / 12)
                   + np.random.normal(0, 150, n)
                   + np.linspace(-200, 200, n)).astype(int)
    monthly = pd.DataFrame({'month': months_range, 'Intake': intake_vals})

# Centered 6-month rolling mean; the expected range is the mean +/- 2 rolling
# standard deviations (std is set to 0 where too few months are available).
monthly['rolling_mean'] = monthly['Intake'].rolling(6, center=True, min_periods=3).mean()
monthly['rolling_std']  = monthly['Intake'].rolling(6, center=True, min_periods=3).std().fillna(0)
monthly['upper']        = monthly['rolling_mean'] + 2 * monthly['rolling_std']
monthly['lower']        = monthly['rolling_mean'] - 2 * monthly['rolling_std']

fig_time = go.Figure()

fig_time.add_trace(go.Scatter(
    x=monthly['month'], y=monthly['upper'],
    mode='lines', line=dict(width=0), showlegend=False, hoverinfo='skip'
))
fig_time.add_trace(go.Scatter(
    x=monthly['month'], y=monthly['lower'],
    fill='tonexty', fillcolor='rgba(78,121,167,0.15)',
    mode='lines', line=dict(width=0), name='Expected Range'
))
fig_time.add_trace(go.Scatter(
    x=monthly['month'], y=monthly['Intake'],
    mode='lines+markers', name='Monthly Intake',
    line=dict(color='#2563EB', width=2), marker=dict(size=4),
    hovertemplate='%{x|%b %Y}: %{y:,} animals<extra></extra>'
))
fig_time.add_trace(go.Scatter(
    x=monthly['month'], y=monthly['rolling_mean'],
    mode='lines', name='6-Month Rolling Mean',
    line=dict(color='#F59E0B', width=2.5, dash='dash'),
    hovertemplate='Rolling mean: %{y:,.0f}<extra></extra>'
))

fig_time.update_layout(
    title='Monthly Shelter Intake: Rolling Mean and Expected Range',
    xaxis_title='Month', yaxis_title='Intake Count',
    template='plotly_white', font=dict(size=12),
    legend=dict(orientation='h', yanchor='bottom', y=1.02, xanchor='right', x=1),
    margin=dict(t=70, l=60, r=40, b=60)
)

print("Displaying Viz 4: Temporal Intake Trend...")
fig_time.show()

fig_time.write_html('viz4_temporal_line.html')


Displaying Viz 4: Temporal Intake Trend...
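
A rolling mean follows the recent trend but does not separate a normal summer surge from an unusual one. An alternative baseline is the average for each calendar month across years, which is swapped in here as a different technique from the rolling window used above. The sketch uses two years of toy counts (illustrative only); with so few years the baseline is itself pulled toward any outlier, so treat it as a starting point.

```python
import pandas as pd

# Two years of toy monthly intake counts (illustrative values only).
monthly_toy = pd.DataFrame({
    'month': pd.date_range('2022-01-01', periods=24, freq='MS'),
    'Intake': [100, 100, 120, 150, 200, 220, 210, 180, 140, 120, 100, 100,
               110, 100, 130, 150, 210, 400, 220, 180, 150, 120, 110, 100],
})

# Seasonal baseline: mean intake for each calendar month across all years.
monthly_toy['calendar_month'] = monthly_toy['month'].dt.month
monthly_toy['baseline'] = (monthly_toy
                           .groupby('calendar_month')['Intake']
                           .transform('mean'))

# Deviation from the baseline, as a percentage of the baseline.
monthly_toy['pct_dev'] = ((monthly_toy['Intake'] - monthly_toy['baseline'])
                          / monthly_toy['baseline'] * 100)

# Month with the largest positive deviation from its seasonal baseline.
largest = monthly_toy.loc[monthly_toy['pct_dev'].idxmax()]
print(largest['month'].date(), round(largest['pct_dev'], 1))
```

In the toy series the June 2023 spike stands out against the June average, whereas the ordinary May and July peaks do not.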


In [6]:
np.random.seed(42)

if 'County' in county_stats.columns:
    nodes = county_stats['County'].tolist()[:10]
    node_intake = county_stats['total'].values[:10]
    node_lrr    = county_stats['LRR'].values[:10]
else:
    nodes = ['Los Angeles','San Diego','Orange','Riverside','San Bernardino',
             'Santa Clara','Alameda','Sacramento','Contra Costa','Fresno']
    node_intake = np.random.randint(500, 5000, len(nodes))
    node_lrr    = np.clip(np.random.normal(0.72, 0.12, len(nodes)), 0.4, 0.95)

n_nodes = len(nodes)
angles  = np.linspace(0, 2 * np.pi, n_nodes, endpoint=False)
x_pos   = np.cos(angles)
y_pos   = np.sin(angles)

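# Transfer edges are simulated: edge presence and volume are drawn at random
# rather than read from transfer records, so the layout is a design prototype.
edge_traces = []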
for i in range(n_nodes):
    for j in range(n_nodes):
        if i != j and np.random.rand() > 0.68:
            volume = int(np.random.lognormal(4, 0.8))
            edge_traces.append(go.Scatter(
                x=[x_pos[i] * 0.88, x_pos[j] * 0.88, None],
                y=[y_pos[i] * 0.88, y_pos[j] * 0.88, None],
                mode='lines',
                line=dict(width=max(0.3, volume / 500), color='rgba(153,153,204,0.35)'),
                showlegend=False,
                hoverinfo='skip'
            ))

node_trace = go.Scatter(
    x=x_pos, y=y_pos,
    mode='markers+text',
    text=[n.replace(' Animal Services', '').replace('San ', 'S. ') for n in nodes],
    customdata=node_intake,  # intake volume per node, shown in the hover text
    textposition='middle center',
    textfont=dict(size=8, color='#222222'),
    marker=dict(
        size=[max(12, v / 120) for v in node_intake],
        color=node_lrr,
        colorscale='RdYlGn',
        cmin=0.4, cmax=0.95,
        showscale=True,
        colorbar=dict(title='LRR', thickness=14),
        line=dict(width=1.5, color='white')
    ),
    # The original template printed "Intake:" with no value attached
    hovertemplate='<b>%{text}</b><br>Intake: %{customdata:,}' +
                  '<br>LRR: %{marker.color:.2f}<extra></extra>'
)

fig_net = go.Figure(data=edge_traces + [node_trace])
fig_net.update_layout(
    title='Directed Transfer Network: California Shelter System',
    xaxis=dict(visible=False), yaxis=dict(visible=False),
    template='plotly_white', font=dict(size=11),
    showlegend=False,
    annotations=[dict(
        text='Node size = intake volume  |  Node color = LRR (RdYlGn)  |  Edge width = transfer volume',
        xref='paper', yref='paper', x=0.5, y=-0.04,
        showarrow=False, font=dict(size=9, color='gray'), align='center'
    )],
    margin=dict(t=60, l=20, r=20, b=60)
)

print("Displaying Viz 5: Transfer Network...")
fig_net.show()

fig_net.write_html('viz5_transfer_network.html')


Displaying Viz 5: Transfer Network...
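
With real transfer records, the edges and hub measures could be derived directly instead of simulated. The sketch below shows one way to build a weighted directed edge list and per-shelter inflow and outflow. The `from_shelter` and `to_shelter` columns are hypothetical, and the records are toy values.

```python
import pandas as pd

# Toy transfer records; column names are hypothetical.
transfers = pd.DataFrame({
    'from_shelter': ['A', 'A', 'B', 'C', 'C', 'C'],
    'to_shelter':   ['B', 'B', 'C', 'B', 'A', 'B'],
})

# One row per directed pair, weighted by the number of animals moved.
edges = (transfers.groupby(['from_shelter', 'to_shelter'])
                  .size()
                  .reset_index(name='volume'))

# Weighted out-degree (animals sent) and in-degree (animals received).
sent = edges.groupby('from_shelter')['volume'].sum().rename('sent')
received = edges.groupby('to_shelter')['volume'].sum().rename('received')
flows = pd.concat([sent, received], axis=1).fillna(0).astype(int)
flows['net_received'] = flows['received'] - flows['sent']

print(edges)
print(flows.sort_values('received', ascending=False))
```

Shelters with high `received` totals are candidate hubs, and a strongly positive `net_received` suggests a facility absorbing animals from the rest of the network.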


---
## Part 3: AI Tool Reflection

**Models used:** ChatGPT-5.2 Thinking (stakeholder framing, spatial analysis design), Claude by Anthropic (code generation, accessibility critique, written analysis)

**Representative prompts:**
- *"Design a directed weighted network visualization for animal shelter transfer data that highlights hubs, bottlenecks, and inequities."*
- *"Critique my map designs for pitfalls such as misleading comparisons without normalization, unclear legends, and color accessibility."*
- *"Suggest methods for summarizing intake text fields without relying on simple word clouds."*

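**Impact:** Estimated 40-50% reduction in implementation time. Key contributions: palette suggestions (RdYlGn, Blues, Safe; of these, Blues and Safe are colorblind-safe, while the red-green RdYlGn scale is not), a log1p normalization recommendation for the heatmap (not applied in the notebook's heatmap cell), and articulation of the pre-attentive encoding rationale.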

**Limitations:** The models had no access to the actual dataset during code generation, so column names had to be verified manually. They occasionally generated deprecated API parameters, and their generic design explanations needed follow-up prompts to yield domain-specific justification.

**Recommendations for future practitioners:**
1. Frame prompts around audience type before describing data structure
2. Always verify AI-generated code against your installed library versions
3. Explicitly request colorblind accessibility checks in every design prompt
4. Reserve outcome interpretation for human judgment — AI cannot determine whether a portfolio gap is strategic or inadvertent
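
For recommendation 2, a quick first step is to record which library versions are actually installed before comparing them against the parameters an AI tool suggests. The sketch below uses only the standard library; `installed_versions` is a hypothetical helper name, and the package list is an assumption based on this notebook's imports.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

versions = installed_versions(['pandas', 'numpy', 'plotly'])
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```

Including this output in the notebook also gives later readers a record of the environment the figures were produced in.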


---
## Results Summary

| Finding | Visualization | Stakeholder |
|---|---|---|
| Seized/cruelty cases carry disproportionate euthanasia burden | Viz 1 (stacked bar) | Shelter managers |
| Medical/Injury and Behavioral Issue dominate canine intakes | Viz 2 (heatmap) | Funders |
| High-intake rural counties lag the 90% no-kill threshold | Viz 3 (bubble scatter) | Funders |
| Spring/summer intake surges are persistent and predictable | Viz 4 (temporal line) | Advocacy groups |
| Transfer activity is uneven: a few shelters act as critical hubs (edges are simulated in this prototype) | Viz 5 (network) | Funders |
