# Football Recruiting Interactive Map

This notebook creates an interactive U.S. map visualizing football recruiting data, showing:
- Blue markers for high schools
- Green markers for colleges
- Red lines connecting high schools to their committed colleges
- Line thickness based on number of recruits per pathway
- Hover tooltips with school names and recruit counts

## Features
- Geocoding with caching to avoid re-querying
- College-specific filtering
- Albers USA projection for U.S.-focused view
- Optional sampling (`sample_size`) to keep large datasets responsive


In [15]:
# Import required libraries
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import json
import os
from geopy.geocoders import Nominatim
from geopy.exc import GeocoderTimedOut, GeocoderServiceError
import time
from collections import defaultdict
import warnings
warnings.filterwarnings('ignore')

print("Libraries imported successfully!")


Libraries imported successfully!


## 1. Data Loading and Preparation


In [16]:
# Load the recruiting data
df = pd.read_csv('data/recruiting_data.csv')
print(f"Loaded {len(df):,} recruiting records")
print(f"Date range: {df['class_year'].min()} - {df['class_year'].max()}")
print(f"Columns: {list(df.columns)}")

# Show sample data
print("\nSample data:")
print(df[['school', 'committedTo', 'city', 'stateProvince', 'latitude', 'longitude']].head())


Loaded 67,179 recruiting records
Date range: 2000 - 2025
Columns: ['year', 'ranking', 'name', 'school', 'committedTo', 'position', 'height', 'weight', 'stars', 'rating', 'city', 'stateProvince', 'country', 'class_year', 'latitude', 'longitude']

Sample data:
                      school     committedTo          city stateProvince  \
0                De La Salle           Miami       Concord            CA   
1  Evangel Christian Academy         Florida    Shreveport            LA   
2                    Saginaw  Michigan State       Saginaw            MI   
3                 Notre Dame   Florida State  Sherman Oaks            CA   
4           Thomas Jefferson        Colorado        Denver            CO   

    latitude   longitude  
0  37.976852 -122.033562  
1  32.522183  -93.765194  
2  43.420039  -83.949037  
3  34.150872 -118.448987  
4  39.739236 -104.984862  


## 2. Geocoding Functions with Caching


In [17]:
def load_geocode_cache():
    """Load existing geocoding cache from JSON file"""
    cache_file = 'data/geocode_cache.json'
    if os.path.exists(cache_file):
        with open(cache_file, 'r') as f:
            return json.load(f)
    return {}

def save_geocode_cache(cache):
    """Save geocoding cache to JSON file"""
    cache_file = 'data/geocode_cache.json'
    os.makedirs(os.path.dirname(cache_file), exist_ok=True)
    with open(cache_file, 'w') as f:
        json.dump(cache, f, indent=2)

def geocode_location(name, cache, geolocator):
    """Geocode a location with caching"""
    if name in cache:
        return cache[name]
    
    try:
        # Add ", USA" to help with geocoding accuracy
        location = geolocator.geocode(f"{name}, USA", timeout=10)
        if location:
            result = {
                'latitude': location.latitude,
                'longitude': location.longitude,
                'address': location.address
            }
            cache[name] = result
            return result
        else:
            cache[name] = None
            return None
    except (GeocoderTimedOut, GeocoderServiceError) as e:
        # Transient failure: do not cache it, so the lookup is retried on the next run
        print(f"Geocoding error for {name}: {e}")
        return None

# Initialize geocoding cache and geolocator
geocode_cache = load_geocode_cache()
geolocator = Nominatim(user_agent="football_recruiting_map")

print(f"Loaded {len(geocode_cache)} cached geocoding results")


Loaded 258 cached geocoding results
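
The caching behaviour can be checked without touching the network. The following is a minimal sketch rather than the notebook's own code: `cached_lookup` and `fake_geocode` are hypothetical names, the stub stands in for `geolocator.geocode`, and the coordinates are made up for illustration.

```python
import json
import os
import tempfile

def cached_lookup(name, cache, lookup):
    """Return cache[name] if present; otherwise call lookup(name) and store the result."""
    if name in cache:
        return cache[name]
    result = lookup(name)  # may be None when nothing is found
    cache[name] = result
    return result

calls = []

def fake_geocode(name):
    """Hypothetical stand-in for geolocator.geocode; records each call."""
    calls.append(name)
    known = {"Alabama": {"latitude": 33.2, "longitude": -87.5}}  # made-up coordinates
    return known.get(name)

cache = {}
first = cached_lookup("Alabama", cache, fake_geocode)
second = cached_lookup("Alabama", cache, fake_geocode)     # served from the cache
missing = cached_lookup("Nowhere U", cache, fake_geocode)  # "not found" is cached as None

# Round-trip the cache through JSON, mirroring what the save/load helpers do
path = os.path.join(tempfile.mkdtemp(), "geocode_cache.json")
with open(path, "w") as f:
    json.dump(cache, f, indent=2)
with open(path) as f:
    reloaded = json.load(f)

print(len(calls))         # number of real lookups performed
print(reloaded == cache)  # None survives the round trip as JSON null
```

Only two lookups reach the stub, because the repeated name is answered from the dictionary.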


## 3. College Geocoding


In [18]:
# Get unique colleges and geocode them
unique_colleges = df['committedTo'].unique()
print(f"Found {len(unique_colleges)} unique colleges")

# Geocode colleges (this may take a while on the first run)
college_coords = {}
geocoded_count = 0
failed_count = 0

for i, college in enumerate(unique_colleges):
    if i % 10 == 0:
        print(f"Geocoding progress: {i}/{len(unique_colleges)} ({i/len(unique_colleges)*100:.1f}%)")
    
    was_cached = college in geocode_cache
    result = geocode_location(college, geocode_cache, geolocator)
    if result:
        college_coords[college] = (result['latitude'], result['longitude'])
        geocoded_count += 1
    else:
        failed_count += 1
    
    # Be respectful to the geocoding service: pause only after a real request
    if not was_cached:
        time.sleep(1)

# Save updated cache
save_geocode_cache(geocode_cache)

print(f"\nGeocoding complete:")
print(f"Successfully geocoded: {geocoded_count}")
print(f"Failed to geocode: {failed_count}")
print(f"Total cached entries: {len(geocode_cache)}")


Found 258 unique colleges
Geocoding progress: 0/258 (0.0%)
Geocoding progress: 10/258 (3.9%)
Geocoding progress: 20/258 (7.8%)
Geocoding progress: 30/258 (11.6%)
Geocoding progress: 40/258 (15.5%)
Geocoding progress: 50/258 (19.4%)
Geocoding progress: 60/258 (23.3%)
Geocoding progress: 70/258 (27.1%)
Geocoding progress: 80/258 (31.0%)
Geocoding progress: 90/258 (34.9%)
Geocoding progress: 100/258 (38.8%)
Geocoding progress: 110/258 (42.6%)
Geocoding progress: 120/258 (46.5%)
Geocoding progress: 130/258 (50.4%)
Geocoding progress: 140/258 (54.3%)
Geocoding progress: 150/258 (58.1%)
Geocoding progress: 160/258 (62.0%)
Geocoding progress: 170/258 (65.9%)
Geocoding progress: 180/258 (69.8%)
Geocoding progress: 190/258 (73.6%)
Geocoding progress: 200/258 (77.5%)
Geocoding progress: 210/258 (81.4%)
Geocoding progress: 220/258 (85.3%)
Geocoding progress: 230/258 (89.1%)
Geocoding progress: 240/258 (93.0%)
Geocoding progress: 250/258 (96.9%)

Geocoding complete:
Successfully geocoded: 255
Fail

## 4. Data Aggregation and Pathway Analysis


In [19]:
def aggregate_pathways(df, college_coords, sample_size=None):
    """Aggregate recruiting pathways and prepare data for visualization"""
    
    # Filter out rows with missing coordinates or failed geocoding
    valid_df = df[
        (df['latitude'].notna()) & 
        (df['longitude'].notna()) & 
        (df['committedTo'].isin(college_coords.keys()))
    ].copy()
    
    # Filter to only include locations within the United States using country column
    valid_df = valid_df[valid_df['country'] == 'USA'].copy()
    
    print(f"After US filtering: {len(valid_df):,} records from USA")
    
    if sample_size:
        # Keep only the top `sample_size` recruits by rating (for performance)
        valid_df = valid_df.nlargest(sample_size, 'rating')
    
    print(f"Using {len(valid_df):,} valid records for visualization")
    
    # Calculate city and college recruit counts for hover information
    city_counts = valid_df.groupby(['city', 'stateProvince']).size().reset_index(name='city_total_recruits')
    college_counts = valid_df.groupby('committedTo').size().reset_index(name='college_total_recruits')
    
    # Aggregate pathways (HS -> College) with counts
    pathway_counts = valid_df.groupby(['school', 'committedTo']).size().reset_index(name='recruit_count')
    
    # Add coordinates for high schools and colleges
    pathway_data = []
    
    for _, row in pathway_counts.iterrows():
        hs_name = row['school']
        college_name = row['committedTo']
        count = row['recruit_count']
        
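        # Get high school coordinates (already in data).
        # Note: the lookup is by school name only, so same-named schools in
        # different cities all use the first matching record's location.
        hs_record = valid_df[valid_df['school'] == hs_name].iloc[0]
        hs_lat, hs_lon = hs_record['latitude'], hs_record['longitude']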
        
        # Get college coordinates (from geocoding)
        college_lat, college_lon = college_coords[college_name]
        
        # Get city total recruits for hover (0 if the city is not found)
        city_match = city_counts[
            (city_counts['city'] == hs_record['city']) & 
            (city_counts['stateProvince'] == hs_record['stateProvince'])
        ]
        city_total = city_match['city_total_recruits'].iloc[0] if len(city_match) > 0 else 0
        
        # Get college total recruits for hover (0 if the college is not found)
        college_match = college_counts[college_counts['committedTo'] == college_name]
        college_total = college_match['college_total_recruits'].iloc[0] if len(college_match) > 0 else 0
        
        pathway_data.append({
            'hs_name': hs_name,
            'hs_lat': hs_lat,
            'hs_lon': hs_lon,
            'hs_city': hs_record['city'],
            'hs_state': hs_record['stateProvince'],
            'college_name': college_name,
            'college_lat': college_lat,
            'college_lon': college_lon,
            'recruit_count': count,
            'city_total_recruits': city_total,
            'college_total_recruits': college_total
        })
    
    return pd.DataFrame(pathway_data)

# Create aggregated data (sample for performance - use recent years or top recruits)
recent_data = df[df['class_year'] >= 2020]  # Focus on recent recruiting
pathway_df = aggregate_pathways(recent_data, college_coords, sample_size=5000)

print(f"\nPathway analysis:")
print(f"Total pathways: {len(pathway_df)}")
print(f"Total recruits: {pathway_df['recruit_count'].sum()}")
print(f"Average recruits per pathway: {pathway_df['recruit_count'].mean():.1f}")
print(f"Max recruits in single pathway: {pathway_df['recruit_count'].max()}")

# Show top pathways
print("\nTop 10 recruiting pathways:")
top_pathways = pathway_df.nlargest(10, 'recruit_count')
for _, row in top_pathways.iterrows():
    print(f"{row['hs_name']} -> {row['college_name']}: {row['recruit_count']} recruits")


After US filtering: 15,799 records from USA
Using 5,000 valid records for visualization

Pathway analysis:
Total pathways: 3236
Total recruits: 3648
Average recruits per pathway: 1.1
Max recruits in single pathway: 11

Top 10 recruiting pathways:
IMG Academy -> Georgia: 11 recruits
IMG Academy -> Alabama: 6 recruits
IMG Academy -> Miami: 6 recruits
IMG Academy -> Michigan: 6 recruits
Miami Central -> Miami: 6 recruits
Chaminade-Madonna -> Miami: 5 recruits
Mater Dei -> USC: 5 recruits
West Bloomfield -> Michigan: 5 recruits
American Heritage -> Miami: 4 recruits
Belleville -> Michigan: 4 recruits
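
Because pathways are grouped by school name and college, two different high schools that share a name end up under one school. The sketch below uses only the standard library and hypothetical rows (the second "Central" is invented) to show how adding city and state to the key keeps such schools apart. It illustrates the idea and is not the notebook's aggregation code.

```python
from collections import Counter

# Hypothetical records: (school, city, state, college)
records = [
    ("Central", "Phenix City", "AL", "Alabama"),
    ("Central", "Phenix City", "AL", "Alabama"),
    ("Central", "Miami", "FL", "Miami"),
    ("IMG Academy", "Bradenton", "FL", "Georgia"),
]

# Keyed by name only: both "Central" schools share one school name
by_name = Counter((school, college) for school, _, _, college in records)

# Keyed by name + city + state: the two schools stay distinct
by_school = Counter(records)

schools_by_name = {school for school, _ in by_name}
schools_by_location = {(s, c, st) for s, c, st, _ in by_school}
print(len(schools_by_name), len(schools_by_location))
```

In pandas terms, the same effect would come from grouping on `['school', 'city', 'stateProvince', 'committedTo']` instead of `['school', 'committedTo']`.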


## 5. Interactive Map Creation


In [20]:
def create_recruiting_map(pathway_df, filter_college=None):
    """Create interactive Plotly map showing recruiting pathways"""
    
    # Filter by specific college if requested
    if filter_college:
        pathway_df = pathway_df[pathway_df['college_name'].str.contains(filter_college, case=False, na=False)]
        print(f"Filtered to {len(pathway_df)} pathways for {filter_college}")
    
    # Create figure with Albers USA projection
    fig = go.Figure()
    
    # Add high school markers (blue)
    hs_trace = go.Scattergeo(
        lat=pathway_df['hs_lat'],
        lon=pathway_df['hs_lon'],
        mode='markers',
        marker=dict(
            size=8,
            color='blue',
            opacity=0.7,
            symbol='circle'
        ),
        name='High Schools',
        hovertemplate='<b>%{text}</b><br>High School<br>City Total Recruits: %{customdata}<extra></extra>',
        text=[f"{row['hs_name']}<br>{row['hs_city']}, {row['hs_state']}" for _, row in pathway_df.iterrows()],
        customdata=pathway_df['city_total_recruits']
    )
    fig.add_trace(hs_trace)
    
    # Add college markers (green)
    college_trace = go.Scattergeo(
        lat=pathway_df['college_lat'],
        lon=pathway_df['college_lon'],
        mode='markers',
        marker=dict(
            size=10,
            color='green',
            opacity=0.8,
            symbol='diamond'
        ),
        name='Colleges',
        hovertemplate='<b>%{text}</b><br>College<br>Total Recruits Received: %{customdata}<extra></extra>',
        text=pathway_df['college_name'],
        customdata=pathway_df['college_total_recruits']
    )
    fig.add_trace(college_trace)
    
    # Add connecting lines (red, thickness based on recruit count)
    max_count = pathway_df['recruit_count'].max()  # computed once, outside the loop
    for _, row in pathway_df.iterrows():
        # Calculate line width based on recruit count (min 1, max 8)
        line_width = max(1, min(8, row['recruit_count']))
        
        # Calculate opacity based on recruit count (min 0.3, max 0.9)
        line_opacity = max(0.3, min(0.9, 0.3 + (row['recruit_count'] / max_count) * 0.6))
        
        # Create RGBA color with opacity
        line_color = f'rgba(255, 0, 0, {line_opacity})'
        
        fig.add_trace(go.Scattergeo(
            lat=[row['hs_lat'], row['college_lat']],
            lon=[row['hs_lon'], row['college_lon']],
            mode='lines',
            line=dict(
                color=line_color,
                width=line_width
            ),
            showlegend=False,
            hoverinfo='skip'  # No hover labels on connecting lines
        ))
    
    # Update layout with Albers USA projection
    fig.update_layout(
        title=dict(
            text=f"Football Recruiting Map{' - ' + filter_college if filter_college else ''}",
            x=0.5,
            font=dict(size=20)
        ),
        geo=dict(
            projection_type='albers usa',
            showland=True,
            landcolor='lightgray',
            showocean=True,
            oceancolor='lightblue',
            showlakes=True,
            lakecolor='lightblue',
            showrivers=True,
            rivercolor='lightblue',
            scope='usa',
            center=dict(lat=39.8283, lon=-98.5795),  # Geographic center of USA
            lonaxis_range=[-125, -66],  # Continental US longitude range
            lataxis_range=[24, 50]     # Continental US latitude range
        ),
        width=1200,
        height=800,
        showlegend=True,
        legend=dict(
            x=0.02,
            y=0.98,
            bgcolor='rgba(255,255,255,0.8)'
        )
    )
    
    return fig

# Create the main recruiting map
print("Creating interactive recruiting map...")
fig = create_recruiting_map(pathway_df)

# Display the map
fig.show()


Creating interactive recruiting map...
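
The width and opacity rules used for the connecting lines can be isolated into a small helper, which makes them easier to tune. This is a sketch: `line_style` is a hypothetical name, its defaults mirror the constants in `create_recruiting_map()`, and unlike the notebook it rounds the opacity to two decimals in the color string.

```python
def line_style(count, max_count, min_width=1, max_width=8,
               min_opacity=0.3, max_opacity=0.9):
    """Map a pathway's recruit count to (line width, rgba color string)."""
    # Width equals the recruit count, clamped to [min_width, max_width]
    width = max(min_width, min(max_width, count))
    # Opacity rises linearly with the pathway's share of the largest count
    share = count / max_count if max_count else 0.0
    opacity = min_opacity + share * (max_opacity - min_opacity)
    opacity = max(min_opacity, min(max_opacity, opacity))
    return width, f"rgba(255, 0, 0, {opacity:.2f})"

for count in (1, 6, 11):
    print(count, line_style(count, max_count=11))
```

With the defaults, a pathway of 11 recruits is capped at width 8 and reaches the maximum opacity of 0.9.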


## 6. College-Specific Filtering


In [21]:
# Function to show recruiting map for a specific college
def show_college_recruiting(college_name):
    """Show recruiting map filtered to a specific college"""
    print(f"\nCreating recruiting map for {college_name}...")
    
    # Filter data for this college
    college_pathways = pathway_df[pathway_df['college_name'].str.contains(college_name, case=False, na=False)]
    
    if len(college_pathways) == 0:
        print(f"No recruiting data found for {college_name}")
        return
    
    print(f"Found {len(college_pathways)} high schools sending recruits to {college_name}")
    print(f"Total recruits: {college_pathways['recruit_count'].sum()}")
    
    # Show top high schools for this college
    top_hs = college_pathways.nlargest(10, 'recruit_count')
    print("\nTop high schools:")
    for _, row in top_hs.iterrows():
        print(f"  {row['hs_name']} ({row['hs_city']}, {row['hs_state']}): {row['recruit_count']} recruits")
    
    # Create filtered map
    fig = create_recruiting_map(pathway_df, filter_college=college_name)
    fig.show()
    
    return fig

# Example: Show Alabama recruiting
print("Example: Alabama recruiting map")
alabama_fig = show_college_recruiting("Alabama")


Example: Alabama recruiting map

Creating recruiting map for Alabama...
Found 108 high schools sending recruits to Alabama
Total recruits: 136

Top high schools:
  IMG Academy (Bradenton, FL): 6 recruits
  Buford (Buford, GA): 4 recruits
  Central (Phenix City, AL): 4 recruits
  Mater Dei (Santa Ana, CA): 4 recruits
  Carver (Montgomery, AL): 3 recruits
  Thompson (Alabaster, AL): 3 recruits
  Aledo (Aledo, TX): 2 recruits
  All Saints Episcopal (Fort Worth, TX): 2 recruits
  Anniston (Anniston, AL): 2 recruits
  Brookwood (Snellville, GA): 2 recruits
Filtered to 108 pathways for Alabama
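
The college filter is a case-insensitive substring match, so a query can also pick up other colleges whose names contain it. The sketch below shows the difference between substring and exact matching in plain Python. `match_colleges` is a hypothetical helper, and the name list is invented to show the overlap.

```python
def match_colleges(names, query, exact=False):
    """Return the names matching query, ignoring case.

    exact=False mimics a substring filter; exact=True requires the whole name to match.
    """
    q = query.casefold()
    if exact:
        return [n for n in names if n.casefold() == q]
    return [n for n in names if q in n.casefold()]

# Hypothetical college names, chosen to show the overlap
names = ["Alabama", "South Alabama", "Alabama State", "Georgia"]

print(match_colleges(names, "alabama"))              # substring match
print(match_colleges(names, "alabama", exact=True))  # whole-name match
```

In pandas, `Series.str.contains` also treats the pattern as a regular expression by default; passing `regex=False` matches it literally, and comparing `str.lower()` against the lowercased query gives an exact match.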


## 7. Additional Analysis and Statistics


In [22]:
# Analyze recruiting patterns
print("\n=== RECRUITING ANALYSIS ===")

# Top colleges by total recruits
college_totals = pathway_df.groupby('college_name')['recruit_count'].sum().sort_values(ascending=False)
print("\nTop 15 colleges by total recruits:")
for college, count in college_totals.head(15).items():
    print(f"  {college}: {count} recruits")

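# Geographic distribution
# Note: this counts pathway rows per state, so a high school that sends
# recruits to several colleges is counted once per college.
print("\nGeographic distribution of high schools:")
state_counts = pathway_df.groupby('hs_state').size().sort_values(ascending=False)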
print(f"Top 10 states by number of high schools:")
for state, count in state_counts.head(10).items():
    print(f"  {state}: {count} high schools")

# Pathway size statistics (recruits per pathway; no distances are computed here)
print("\nRecruiting distance analysis:")
print(f"Average recruits per pathway: {pathway_df['recruit_count'].mean():.1f}")
print(f"Median recruits per pathway: {pathway_df['recruit_count'].median():.1f}")
print(f"Pathways with 1 recruit: {(pathway_df['recruit_count'] == 1).sum()}")
print(f"Pathways with 5+ recruits: {(pathway_df['recruit_count'] >= 5).sum()}")



=== RECRUITING ANALYSIS ===

Top 15 colleges by total recruits:
  Alabama: 136 recruits
  Georgia: 136 recruits
  Ohio State: 121 recruits
  Notre Dame: 119 recruits
  Texas: 117 recruits
  LSU: 117 recruits
  Oregon: 116 recruits
  Texas A&M: 116 recruits
  Michigan: 110 recruits
  Penn State: 109 recruits
  Oklahoma: 102 recruits
  Florida: 98 recruits
  Miami: 92 recruits
  Clemson: 92 recruits
  Tennessee: 86 recruits

Geographic distribution of high schools:
Top 10 states by number of high schools:
  TX: 502 high schools
  FL: 435 high schools
  GA: 346 high schools
  CA: 235 high schools
  AL: 134 high schools
  LA: 111 high schools
  NC: 108 high schools
  OH: 108 high schools
  MS: 90 high schools
  TN: 85 high schools

Recruiting distance analysis:
Average recruits per pathway: 1.1
Median recruits per pathway: 1.0
Pathways with 1 recruit: 2938
Pathways with 5+ recruits: 8
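
The figures printed under "Recruiting distance analysis" describe recruits per pathway rather than distances. If actual high school to college distances are wanted, one option is the haversine formula. The sketch below is an assumption about how that could be done, not something the notebook computes, and `haversine_km` is a hypothetical helper.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Coordinates taken from the sample rows in section 1 (Concord, CA and Denver, CO)
print(round(haversine_km(37.976852, -122.033562, 39.739236, -104.984862)))
```

Applied to each row of `pathway_df` with the `hs_lat`, `hs_lon`, `college_lat` and `college_lon` columns, this would give a per-pathway distance that could then be averaged or weighted by `recruit_count`.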


## 8. Export Interactive Map


In [23]:
# Export the main map as HTML file
def save_map_html(fig, filename):
    """Save Plotly figure as interactive HTML file"""
    fig.write_html(filename)
    print(f"Map saved as {filename}")

def save_map_if_not_exists(fig, filename):
    """Save map only if file doesn't already exist"""
    if not os.path.exists(filename):
        save_map_html(fig, filename)
        return True
    else:
        print(f"File {filename} already exists, skipping...")
        return False

# Save the main recruiting map (only if it doesn't exist)
if not os.path.exists('football_recruiting_map.html'):
    save_map_html(fig, 'football_recruiting_map.html')
else:
    print("football_recruiting_map.html already exists, skipping...")

# Save the Alabama-specific map if the figure was created and the file doesn't already exist
if globals().get('alabama_fig') is not None:
    if not os.path.exists('alabama_recruiting_map.html'):
        save_map_html(alabama_fig, 'alabama_recruiting_map.html')
    else:
        print("alabama_recruiting_map.html already exists, skipping...")

print("\nInteractive maps export complete!")
print("You can open these files in any web browser to view the interactive maps.")


football_recruiting_map.html already exists, skipping...
alabama_recruiting_map.html already exists, skipping...

Interactive maps export complete!
You can open these files in any web browser to view the interactive maps.


In [24]:
## 10. Circle-Based Recruit Count Visualizations


In [25]:
def create_city_recruit_map(df, college_coords, sample_size=None):
    """Create map showing recruit counts by city using circle sizes"""
    
    # Filter and prepare data
    valid_df = df[
        (df['latitude'].notna()) & 
        (df['longitude'].notna()) & 
        (df['committedTo'].isin(college_coords.keys()))
    ].copy()
    
    valid_df = valid_df[valid_df['country'] == 'USA'].copy()
    
    if sample_size:
        valid_df = valid_df.nlargest(sample_size, 'rating')
    
    # Aggregate by city to get total recruits per city
    city_recruits = valid_df.groupby(['city', 'stateProvince', 'latitude', 'longitude']).size().reset_index(name='total_recruits')
    
    # Calculate circle sizes (min 5, max 50)
    max_recruits = city_recruits['total_recruits'].max()
    city_recruits['circle_size'] = city_recruits['total_recruits'].apply(
        lambda x: max(5, min(50, 5 + (x / max_recruits) * 45))
    )
    
    # Create figure
    fig = go.Figure()
    
    # Add city circles
    city_trace = go.Scattergeo(
        lat=city_recruits['latitude'],
        lon=city_recruits['longitude'],
        mode='markers',
        marker=dict(
            size=city_recruits['circle_size'],
            color='blue',
            opacity=0.6,
            symbol='circle',
            line=dict(width=1, color='darkblue')
        ),
        name='Cities by Recruit Count',
        hovertemplate='<b>%{text}</b><br>Total Recruits: %{customdata}<br>%{lat:.3f}, %{lon:.3f}<extra></extra>',
        text=[f"{row['city']}, {row['stateProvince']}" for _, row in city_recruits.iterrows()],
        customdata=city_recruits['total_recruits']
    )
    fig.add_trace(city_trace)
    
    # Update layout
    fig.update_layout(
        title=dict(
            text="Football Recruiting by City (Circle Size = Recruit Count)",
            x=0.5,
            font=dict(size=20)
        ),
        geo=dict(
            projection_type='albers usa',
            showland=True,
            landcolor='lightgray',
            showocean=True,
            oceancolor='lightblue',
            showlakes=True,
            lakecolor='lightblue',
            showrivers=True,
            rivercolor='lightblue',
            scope='usa',
            center=dict(lat=39.8283, lon=-98.5795),
            lonaxis_range=[-125, -66],
            lataxis_range=[24, 50]
        ),
        width=1200,
        height=800,
        showlegend=True,
        legend=dict(
            x=0.02,
            y=0.98,
            bgcolor='rgba(255,255,255,0.8)'
        )
    )
    
    return fig


In [26]:
def create_college_recruit_map(df, college_coords, sample_size=None):
    """Create map showing recruit counts by college using circle sizes"""
    
    # Filter and prepare data
    valid_df = df[
        (df['latitude'].notna()) & 
        (df['longitude'].notna()) & 
        (df['committedTo'].isin(college_coords.keys()))
    ].copy()
    
    valid_df = valid_df[valid_df['country'] == 'USA'].copy()
    
    if sample_size:
        valid_df = valid_df.nlargest(sample_size, 'rating')
    
    # Aggregate by college to get total recruits per college
    college_recruits = valid_df.groupby('committedTo').size().reset_index(name='total_recruits')
    
    # Add coordinates for colleges
    college_data = []
    for _, row in college_recruits.iterrows():
        college_name = row['committedTo']
        college_lat, college_lon = college_coords[college_name]
        college_data.append({
            'college_name': college_name,
            'latitude': college_lat,
            'longitude': college_lon,
            'total_recruits': row['total_recruits']
        })
    
    college_df = pd.DataFrame(college_data)
    
    # Calculate circle sizes (min 8, max 60)
    max_recruits = college_df['total_recruits'].max()
    college_df['circle_size'] = college_df['total_recruits'].apply(
        lambda x: max(8, min(60, 8 + (x / max_recruits) * 52))
    )
    
    # Create figure
    fig = go.Figure()
    
    # Add college circles
    college_trace = go.Scattergeo(
        lat=college_df['latitude'],
        lon=college_df['longitude'],
        mode='markers',
        marker=dict(
            size=college_df['circle_size'],
            color='green',
            opacity=0.7,
            symbol='diamond',
            line=dict(width=1, color='darkgreen')
        ),
        name='Colleges by Recruit Count',
        hovertemplate='<b>%{text}</b><br>Total Recruits: %{customdata}<br>%{lat:.3f}, %{lon:.3f}<extra></extra>',
        text=college_df['college_name'],
        customdata=college_df['total_recruits']
    )
    fig.add_trace(college_trace)
    
    # Update layout
    fig.update_layout(
        title=dict(
            text="Football Recruiting by College (Circle Size = Recruit Count)",
            x=0.5,
            font=dict(size=20)
        ),
        geo=dict(
            projection_type='albers usa',
            showland=True,
            landcolor='lightgray',
            showocean=True,
            oceancolor='lightblue',
            showlakes=True,
            lakecolor='lightblue',
            showrivers=True,
            rivercolor='lightblue',
            scope='usa',
            center=dict(lat=39.8283, lon=-98.5795),
            lonaxis_range=[-125, -66],
            lataxis_range=[24, 50]
        ),
        width=1200,
        height=800,
        showlegend=True,
        legend=dict(
            x=0.02,
            y=0.98,
            bgcolor='rgba(255,255,255,0.8)'
        )
    )
    
    return fig


In [27]:
# Create and display the new circle-based maps
print("Creating city recruit count map...")
city_fig = create_city_recruit_map(recent_data, college_coords, sample_size=5000)
city_fig.show()

print("\nCreating college recruit count map...")
college_fig = create_college_recruit_map(recent_data, college_coords, sample_size=5000)
college_fig.show()


Creating city recruit count map...



Creating college recruit count map...


In [28]:
# Save the new circle-based maps as HTML files (only if they don't exist)
if not os.path.exists('city_recruit_count_map.html'):
    save_map_html(city_fig, 'city_recruit_count_map.html')
    print("city_recruit_count_map.html created")
else:
    print("city_recruit_count_map.html already exists, skipping...")

if not os.path.exists('college_recruit_count_map.html'):
    save_map_html(college_fig, 'college_recruit_count_map.html')
    print("college_recruit_count_map.html created")
else:
    print("college_recruit_count_map.html already exists, skipping...")

print("\nCircle-based maps export complete!")
print("- city_recruit_count_map.html: Shows cities with circle sizes based on total recruits")
print("- college_recruit_count_map.html: Shows colleges with circle sizes based on total recruits")


city_recruit_count_map.html already exists, skipping...
college_recruit_count_map.html already exists, skipping...

Circle-based maps export complete!
- city_recruit_count_map.html: Shows cities with circle sizes based on total recruits
- college_recruit_count_map.html: Shows colleges with circle sizes based on total recruits
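
Both circle maps scale the marker size linearly with the recruit count. Plotly's default `sizemode` treats `marker.size` as a diameter, so a linear mapping makes large counts look disproportionately big by area. The sketch below compares the linear mapping with a square-root alternative. `linear_size` and `sqrt_size` are hypothetical helpers, with defaults that follow the city map's 5 to 50 range.

```python
import math

def linear_size(count, max_count, min_size=5, max_size=50):
    """Diameter grows linearly with count (the mapping used for the city map)."""
    return min_size + (count / max_count) * (max_size - min_size)

def sqrt_size(count, max_count, min_size=5, max_size=50):
    """Diameter grows with sqrt(count), so marker area tracks count more closely."""
    return min_size + math.sqrt(count / max_count) * (max_size - min_size)

for count in (1, 25, 100):
    print(count, round(linear_size(count, 100), 1), round(sqrt_size(count, 100), 1))
```

Both mappings agree at the maximum count; the square-root version gives small and mid-sized cities more visible markers. Which one reads better depends on the data, so treat this as an option to try rather than a fix.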


## 9. Usage Instructions

### Running the Notebook
1. **Activate the environment**: `conda activate football`
2. **Start Jupyter**: `jupyter notebook` or `jupyter lab`
3. **Open this notebook** and run all cells

### Customizing the Visualization

**To show a specific college's recruiting:**
```python
show_college_recruiting("Ohio State")
```

**To change the data sample:**
```python
# Use all data (may be slow)
pathway_df = aggregate_pathways(df, college_coords)

# Use only recent years
recent_data = df[df['class_year'] >= 2022]
pathway_df = aggregate_pathways(recent_data, college_coords)
```

**To adjust map appearance:**
- Modify marker sizes in the `create_recruiting_map()` function
- Change colors by updating the `color` parameters
- Adjust line thickness by modifying the `line_width` calculation

### Dependencies
The required packages are listed in `environment.yml`:
- pandas: Data manipulation
- plotly: Interactive visualizations
- geopy: Geocoding services
- jupyter: Notebook environment

### Performance Notes
- First run will be slower due to geocoding
- Subsequent runs use cached geocoding results
- Large datasets (>10k records) may be slow to render
- Consider sampling data for initial exploration
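
One likely cause of slow rendering is that the pathway map adds a separate trace for every connecting line. A common way to reduce the trace count is to batch all segments that share a line width into one trace, separated by `None`, which Plotly line traces render as a gap by default. The sketch below only builds the coordinate lists. `batch_segments` is a hypothetical helper and the coordinates are made up.

```python
from collections import defaultdict

def batch_segments(pathways, max_width=8):
    """Group line segments by width; None marks the break between segments."""
    batches = defaultdict(lambda: {"lat": [], "lon": []})
    for p in pathways:
        width = max(1, min(max_width, p["recruit_count"]))
        batches[width]["lat"] += [p["hs_lat"], p["college_lat"], None]
        batches[width]["lon"] += [p["hs_lon"], p["college_lon"], None]
    return dict(batches)

# Hypothetical pathways with made-up coordinates
pathways = [
    {"hs_lat": 27.5, "hs_lon": -82.6, "college_lat": 33.9, "college_lon": -83.4, "recruit_count": 1},
    {"hs_lat": 33.7, "hs_lon": -117.9, "college_lat": 34.0, "college_lon": -118.3, "recruit_count": 1},
    {"hs_lat": 25.8, "hs_lon": -80.2, "college_lat": 25.7, "college_lon": -80.3, "recruit_count": 6},
]

batches = batch_segments(pathways)
print(sorted(batches))         # distinct widths, i.e. the number of traces needed
print(len(batches[1]["lat"]))  # two segments, each followed by a None break
```

Each batch could then be passed to a single `go.Scattergeo(lat=..., lon=..., mode='lines', line=dict(width=width))` call, giving at most eight line traces instead of one per pathway. The trade-off is that every line in a batch shares one color and opacity.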
