# 2025 IWRC Seed Fund - Interactive Visualizations

This notebook creates clean, professional interactive visualizations:
1. Interactive pie chart of research keywords
2. Interactive map of Illinois showing funded institutions

---

## ⚠️ IMPORTANT DATA SOURCE NOTICE

**This notebook uses `fact sheet data.xlsx` for VISUALIZATION PURPOSES ONLY.**

### Data Limitations

The fact sheet Excel file has **CRITICAL LIMITATIONS**:
- ❌ **No Project ID column** - Cannot reliably deduplicate projects
- ❌ **Unknown duplicates** - May contain multiple rows per project
- ❌ **Cannot cross-reference** - No way to match with master dataset
- ⚠️ **Financial totals unreliable** - Summing award amounts may double-count
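
To make the double-counting risk concrete, here is a minimal sketch using only the standard library. The rows, the `Project Title` field, and the institution names are hypothetical and do not come from the fact sheet. It only shows how one repeated row inflates a naive sum, and that exact-match duplicates can be counted even without a Project ID.

```python
from collections import Counter

# Hypothetical rows: the "Well survey" project appears twice, as it might
# in a sheet that has no Project ID column to catch the repeat.
rows = [
    {"Institution": "Univ A", "Project Title": "Stream study", "Award Amount": 10_000},
    {"Institution": "Univ B", "Project Title": "Well survey", "Award Amount": 25_000},
    {"Institution": "Univ B", "Project Title": "Well survey", "Award Amount": 25_000},
]

# Naive total: every row is counted, including the repeat.
naive_total = sum(row["Award Amount"] for row in rows)

# Treat two rows as duplicates only when every field matches exactly.
row_counts = Counter(tuple(sorted(row.items())) for row in rows)
exact_duplicates = sum(n - 1 for n in row_counts.values())
deduped_total = sum(dict(key)["Award Amount"] for key in row_counts)

print(f"Naive total:      ${naive_total:,}")
print(f"Exact duplicates: {exact_duplicates}")
print(f"Deduplicated:     ${deduped_total:,}")
```

Exact-match detection is only a lower bound: two rows for the same project that differ in any field (a typo in the title, say) would not be caught, which is why a stable project identifier is needed for reliable totals. In pandas, `DataFrame.duplicated()` and `DataFrame.drop_duplicates()` perform the same exact-match check.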

### Safe Uses (This Notebook)
✅ **Keyword frequency analysis** - Safe for visualization (counts keywords, not financial metrics)  
✅ **Geographic distribution** - Safe for visualization (shows institution locations)

### Unsafe Uses (NOT Done Here)
❌ **Financial totals** - Do NOT sum Award Amount without deduplication  
❌ **Student counts** - Do NOT sum student fields without deduplication  
❌ **ROI calculations** - Do NOT use for investment metrics  

### For Accurate Financial Metrics
**Use the centralized data loader with master dataset:**
```python
from iwrc_data_loader import IWRCDataLoader

loader = IWRCDataLoader()
df = loader.load_master_data(deduplicate=True)
metrics = loader.calculate_metrics(df, period='10yr')
# Now metrics['investment'], metrics['students'], etc. are accurate
```

**See:** `docs/MIGRATION_FROM_FACT_SHEET.md` for complete migration guide  
**See:** `data/consolidated/FACT_SHEET_DATA_README.md` for detailed limitations

---

In [None]:
# Import required libraries
import pandas as pd
import plotly.express as px
from collections import Counter

In [None]:
# Load the 2025 data from fact sheet
# ⚠️ WARNING: This file lacks Project ID column and cannot be reliably deduplicated
# ⚠️ Use ONLY for keyword analysis and geographic visualization (safe operations)
# ⚠️ DO NOT use for financial totals or student counts without master data verification
df = pd.read_excel('fact sheet data.xlsx', sheet_name='2025 data')
print(f"Loaded {len(df)} rows from 2025 data")
print("⚠️  Note: Row count may include duplicates (no Project ID to verify)")
print("⚠️  Using ONLY for keyword/geography visualization (safe operations)")

## Part 1: Interactive Research Keywords Pie Chart
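
The keyword cell in this part keeps the ten most frequent keywords and folds the remainder into a single "Other" slice. A minimal standalone sketch of that grouping is shown here. The helper name `top_n_with_other` and the sample labels are hypothetical and only illustrate the logic; they are not part of the notebook's data.

```python
from collections import Counter


def top_n_with_other(labels, top_n=10, other_label="Other"):
    """Count labels, keep the top_n most frequent, and sum the rest under other_label."""
    ranked = Counter(labels).most_common()  # sorted by count, highest first
    grouped = dict(ranked[:top_n])
    remainder = sum(count for _, count in ranked[top_n:])
    if remainder > 0:
        # Add to any existing entry so a real label named "Other" is not overwritten.
        grouped[other_label] = grouped.get(other_label, 0) + remainder
    return grouped


# Hypothetical sample labels, grouped with top_n=2 to keep the output short.
sample = ["water quality", "groundwater", "water quality", "nutrients", "wetlands"]
grouped = top_n_with_other(sample, top_n=2)
print(grouped)
```

Because the remainder is summed rather than dropped, the slice counts still add up to the number of input labels, so the percentages in the chart remain relative to all keywords.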

In [None]:
# Combine keywords from the 'Keyword 2' and 'Keyword 3' columns (spreadsheet columns O and P)
keyword2 = df['Keyword 2'].dropna().tolist()
keyword3 = df['Keyword 3'].dropna().tolist()
all_keywords = keyword2 + keyword3

# Count keyword frequencies
keyword_counts = Counter(all_keywords)
print(f"Total keywords: {len(all_keywords)}")
print(f"Unique keywords: {len(keyword_counts)}")

# Prepare data - take top 10 and group rest as 'Other'
sorted_keywords = keyword_counts.most_common()
top_n = 10
top_keywords = dict(sorted_keywords[:top_n])
other_count = sum(count for _, count in sorted_keywords[top_n:])

if other_count > 0:
    top_keywords['Other'] = other_count

# Create DataFrame
pie_df = pd.DataFrame({
    'Keyword': list(top_keywords.keys()),
    'Count': list(top_keywords.values())
})

pie_df['Percentage'] = (pie_df['Count'] / pie_df['Count'].sum() * 100).round(1)
print("\nKeyword Distribution:")
print(pie_df.to_string(index=False))

In [None]:
# Create clean, professional pie chart
fig_pie = px.pie(
    pie_df,
    values='Count',
    names='Keyword',
    title='2025 IWRC Seed Fund Projects - Research Topic Distribution',
    hole=0.4,  # Donut chart
    color_discrete_sequence=px.colors.qualitative.Set3
)

fig_pie.update_traces(
    textposition='outside',
    textinfo='label+percent',
    hovertemplate='<b>%{label}</b><br>Count: %{value}<br>Percentage: %{percent}<extra></extra>',
    marker=dict(line=dict(color='white', width=3))
)

fig_pie.update_layout(
    title=dict(
        text='2025 IWRC Seed Fund Projects<br>Research Topic Distribution',
        x=0.5,
        xanchor='center',
        font=dict(size=24, family='Arial', color='#2c3e50')
    ),
    font=dict(size=13, family='Arial'),
    showlegend=True,
    legend=dict(
        orientation='v',
        yanchor='middle',
        y=0.5,
        xanchor='left',
        x=1.05,
        bgcolor='rgba(255,255,255,0.8)',
        bordercolor='#ccc',
        borderwidth=1
    ),
    height=700,
    width=1100,
    paper_bgcolor='white',
    plot_bgcolor='white'
)

# Save as HTML
fig_pie.write_html(
    '2025_keyword_pie_chart_interactive.html',
    config={'displayModeBar': True, 'displaylogo': False}
)
print("✅ Saved: 2025_keyword_pie_chart_interactive.html")

# Display
fig_pie.show()

## Part 2: Interactive Illinois Institutions Map
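
The map in this part places each institution by looking up its city in a hand-maintained coordinate table, and a city missing from that table is plotted at a default point instead. The sketch below shows one way to surface such cities before plotting. The helper name `locate_cities` and the input list are hypothetical; the two coordinate pairs are copied from the notebook's table.

```python
# Subset of the notebook's coordinate table, for illustration only.
CITY_COORDINATES = {
    "Champaign": (40.1164, -88.2434),
    "Chicago": (41.8781, -87.6298),
}


def locate_cities(cities, coordinates):
    """Split cities into those found in the table and those that are missing."""
    located, missing = {}, []
    for city in cities:
        key = city.strip()  # tolerate stray whitespace from spreadsheet cells
        if key in coordinates:
            located[city] = coordinates[key]
        else:
            missing.append(city)
    return located, missing


# Hypothetical input: one clean name, one with trailing whitespace, one unmapped.
located, missing = locate_cities(["Champaign", "Chicago ", "Peoria"], CITY_COORDINATES)
print("Located:", located)
print("Missing:", missing)
```

Reporting the missing cities makes a gap in the table visible, whereas a silent fallback would stack every unmapped institution on the same marker.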

In [None]:
# Prepare institution data
institution_data = df.groupby(['Institution', 'City']).size().reset_index(name='Project Count')

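# Sum award amounts per institution for hover display only
# (fact sheet rows may be duplicated, so these totals can be inflated)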
funding_by_institution = df.groupby(['Institution', 'City'])['Award Amount'].sum().reset_index()
funding_by_institution.columns = ['Institution', 'City', 'Total Funding']

# Merge
institution_data = institution_data.merge(funding_by_institution, on=['Institution', 'City'])

# Add coordinates for Illinois cities
coordinates = {
    'Champaign': (40.1164, -88.2434),
    'Urbana': (40.1106, -88.2073),
    'Carbondale': (37.7272, -89.2167),
    'Normal': (40.5142, -88.9906),
    'Chicago': (41.8781, -87.6298),
    'Charleston': (39.4961, -88.1781),
    'Evanston': (42.0451, -87.6877),
    'Godfrey': (38.9556, -90.1868),
    'Edwardsville': (38.8114, -89.9531)
}

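# Cities missing from the table fall back to a default point in central Illinois
default_coords = (40.0, -89.0)
institution_data['Latitude'] = institution_data['City'].map(lambda x: coordinates.get(x, default_coords)[0])
institution_data['Longitude'] = institution_data['City'].map(lambda x: coordinates.get(x, default_coords)[1])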

# Create short names (the first matching entry wins, so more specific names are listed before general ones)
def shorten_name(name):
    replacements = {
        'University of Illinois Urbana-Champaign': 'UIUC',
        'Southern Illinois University Carbondale': 'SIU Carbondale',
        'Southern Illinois University': 'SIU',
        'Illinois State University': 'ISU',
        'Illinois Institute of Technology': 'IIT',
        'University of Illinois Chicago': 'UIC',
        'University of Illinois': 'U of I',
        'Eastern Illinois University': 'EIU',
        'Northwestern University': 'Northwestern',
        'Lewis and Clark Community College': 'Lewis & Clark CC',
        'Not for profit': 'Non-profit Org'
    }
    for old, new in replacements.items():
        if old in name:
            return new
    return name

institution_data['Short Name'] = institution_data['Institution'].apply(shorten_name)
institution_data['Funding Display'] = institution_data['Total Funding'].apply(lambda x: f'${x:,.0f}')

print("Institution data prepared:")
print(institution_data[['Short Name', 'City', 'Project Count', 'Funding Display']].to_string(index=False))

In [None]:
# Create a clean map using plotly express scatter_geo
fig_map = px.scatter_geo(
    institution_data,
    lat='Latitude',
    lon='Longitude',
    size='Project Count',
    color='Project Count',
    hover_name='Institution',
    hover_data={
        'City': True,
        'Project Count': True,
        'Funding Display': True,
        'Latitude': False,
        'Longitude': False
    },
    text='Short Name',
    size_max=50,
    color_continuous_scale='YlOrRd',
    title='2025 IWRC Seed Fund - Funded Institutions Across Illinois'
)

# Update traces for better appearance
fig_map.update_traces(
    textposition='top center',
    textfont=dict(size=10, family='Arial', color='black'),
    marker=dict(
        line=dict(width=2, color='DarkSlateGray'),
        opacity=0.9,
        sizemode='diameter',
        sizemin=8
    )
)

# Update geo layout to focus on Illinois
fig_map.update_geos(
    scope='usa',
    projection_type='albers usa',
    showland=True,
    landcolor='#f5f5f5',
    showlakes=True,
    lakecolor='#cfe2f3',
    showrivers=False,
    showcountries=False,
    showsubunits=True,
    subunitcolor='#bbb',
    subunitwidth=1,
    lonaxis_range=[-91.5, -87.0],
    lataxis_range=[36.9, 42.6],
    bgcolor='white'
)

# Update overall layout
fig_map.update_layout(
    title=dict(
        text='2025 IWRC Seed Fund<br>Funded Institutions Across Illinois',
        x=0.5,
        xanchor='center',
        font=dict(size=24, family='Arial', color='#2c3e50')
    ),
    font=dict(family='Arial', size=12),
    coloraxis_colorbar=dict(
        title='Projects',
        thickness=20,
        len=0.6,
        x=1.02,
        bgcolor='rgba(255,255,255,0.8)',
        bordercolor='#ccc',
        borderwidth=1
    ),
    height=900,
    width=800,
    margin=dict(l=0, r=120, t=100, b=0),
    paper_bgcolor='white'
)

# Save as HTML
fig_map.write_html(
    '2025_illinois_institutions_map_interactive.html',
    config={'displayModeBar': True, 'displaylogo': False}
)
print("✅ Saved: 2025_illinois_institutions_map_interactive.html")

# Display
fig_map.show()

## Summary

In [None]:
print("=" * 70)
print("✅ INTERACTIVE VISUALIZATIONS CREATED SUCCESSFULLY")
print("=" * 70)
print("\n📊 Files Created:")
print("  1. 2025_keyword_pie_chart_interactive.html")
print("  2. 2025_illinois_institutions_map_interactive.html")
print("\n🎨 Key Features:")
print("  • Clean, professional design")
print("  • Hover tooltips with detailed information")
print("  • Interactive zoom and pan")
print("  • Click to filter in pie chart")
print("  • Proper Illinois map focus (no giant bubble!)")
print("  • Download buttons for saving as images")
print("\n📈 Data Summary (FROM FACT SHEET - VISUALIZATIONS ONLY):")
print(f"  • Total Rows: {len(df)}")
print(f"  • Total Institutions: {df['Institution'].nunique()}")
print(f"  • Research Topics: {len(keyword_counts)}")
print("\n⚠️  IMPORTANT - FINANCIAL METRICS WARNING:")
print(f"  • Award Amount Sum (fact sheet): ${df['Award Amount'].sum():,.2f}")
print("  • ⚠️  THIS MAY BE INFLATED due to unknown duplicates!")
print("  • ⚠️  For accurate financial metrics, use master data with deduplication")
print("\n✅ VERIFIED METRICS (Cross-referenced with master data):")
print("  Run this code to get accurate metrics:")
print("  ```python")
print("  from iwrc_data_loader import IWRCDataLoader")
print("  loader = IWRCDataLoader()")
print("  df_master = loader.load_master_data(deduplicate=True)")
print("  df_2025 = df_master[df_master['project_year'] == 2025]")
print("  metrics = loader.calculate_metrics(df_2025)")
print("  print(f'Verified Investment: ${metrics[\"investment\"]:,.2f}')")
print("  print(f'Verified Students: {metrics[\"students\"]}')")
print("  print(f'Verified Projects: {metrics[\"projects\"]}')")
print("  ```")
print("\n💡 How to Use:")
print("  • Open HTML files in any web browser")
print("  • Hover over elements for details")
print("  • Click and drag to zoom/pan on the map")
print("  • Click legend items in pie chart to show/hide")
print("\n📚 Additional Resources:")
print("  • Migration Guide: docs/MIGRATION_FROM_FACT_SHEET.md")
print("  • Data Limitations: data/consolidated/FACT_SHEET_DATA_README.md")
print("  • Corrected Files Index: CORRECTED_FILES_INDEX.md")
print("=" * 70)