# 🏙️ Tulsa, Oklahoma Spatial DBSCAN Analysis

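This notebook demonstrates DBSCAN spatial clustering for Tulsa, Oklahoma, using synthetic point data. scikit-learn does the clustering, Folium builds the interactive map, and GeoPandas handles the GeoJSON export.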

## 🎯 Objectives
- Generate synthetic spatial data around Tulsa hotspots
- Apply DBSCAN clustering to identify spatial patterns
- Create interactive maps with cluster visualization
- Export results for further analysis
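
As a minimal illustration of what DBSCAN returns, here is a run on toy coordinates (not Tulsa data). Points with enough neighbors within `eps` form clusters, and isolated points are labeled `-1` (noise):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy 2-D points: a tight group of three, a pair, and one isolated point
points = np.array([[0, 0], [0, 1], [1, 0],
                   [10, 10], [10, 11],
                   [50, 50]])

# min_samples counts the point itself, so each point of the pair is a core point
labels = DBSCAN(eps=1.5, min_samples=2).fit_predict(points)
print(labels)  # [ 0  0  0  1  1 -1]
```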

## 📍 Tulsa Key Locations
- **Downtown Tulsa**: Business and entertainment district
- **Tulsa International Airport**: Major transportation hub
- **University of Tulsa**: Educational institution
- **Gathering Place**: Major riverfront park and attraction
- **Brookside District**: Shopping and dining area
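
To get a sense of scale, the sketch below computes the great-circle (haversine) distance between two of the hotspots. It assumes a spherical Earth with a mean radius of 6,371 km. The helper name `haversine_m` is illustrative and not part of any library. Downtown Tulsa and the airport come out roughly 10.6 km apart, which is far larger than the 500 m neighborhood used for clustering later.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

# Coordinates taken from the hotspot dictionary used in this notebook
downtown = (36.1540, -95.9928)
airport = (36.1984, -95.8881)
d = haversine_m(*downtown, *airport)
print(f"Downtown -> Airport: {d / 1000:.1f} km")
```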

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import geopandas as gpd
import folium
from folium import plugins
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

print("✅ Libraries imported successfully!")
print("🗺️ Ready to analyze Tulsa, Oklahoma!")

In [None]:
# Tulsa Configuration
CITY_NAME = "Tulsa"
STATE = "Oklahoma"
CITY_CENTER = [36.1540, -95.9928]  # Downtown Tulsa coordinates
ZOOM_LEVEL = 11

# Define Tulsa hotspots (key activity areas)
tulsa_hotspots = {
    "Downtown Tulsa": [36.1540, -95.9928],
    "Tulsa Airport": [36.1984, -95.8881],
    "University of Tulsa": [36.1512, -95.9443],
    "Gathering Place": [36.1254, -95.9837],
    "Brookside District": [36.1180, -95.9792]
}

# Geographic boundaries for Tulsa area
LAT_MIN, LAT_MAX = 36.05, 36.25
LON_MIN, LON_MAX = -96.10, -95.85

print(f"🏙️ Analyzing {CITY_NAME}, {STATE}")
print(f"📍 City center: {CITY_CENTER}")
print(f"🎯 Number of hotspots: {len(tulsa_hotspots)}")
print(f"📏 Analysis area: {LAT_MAX-LAT_MIN:.2f}° × {LON_MAX-LON_MIN:.2f}°")
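
The analysis area above is reported in degrees. Away from the equator, a degree of longitude is shorter than a degree of latitude, so the same number of degrees covers different ground distances on the two axes. The quick conversion below assumes a spherical Earth (mean radius 6,371 km). `TULSA_LAT` is a rounded value chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (spherical approximation)
TULSA_LAT = 36.15           # approximate latitude of downtown Tulsa

# Length of one degree along a meridian and along Tulsa's parallel
m_per_deg_lat = math.pi * EARTH_RADIUS_M / 180
m_per_deg_lon = m_per_deg_lat * math.cos(math.radians(TULSA_LAT))

# Ground size of the 0.20° x 0.25° analysis box
box_ns_km = 0.20 * m_per_deg_lat / 1000
box_ew_km = 0.25 * m_per_deg_lon / 1000

print(f"1° latitude  ≈ {m_per_deg_lat / 1000:.1f} km")
print(f"1° longitude ≈ {m_per_deg_lon / 1000:.1f} km")
print(f"Analysis box ≈ {box_ns_km:.1f} km N-S × {box_ew_km:.1f} km E-W")
```

This difference is why a single Euclidean `eps` expressed in degrees does not describe a circular neighborhood on the ground.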

In [None]:
# Generate synthetic spatial data for Tulsa
def generate_tulsa_data():
    """
    Generate synthetic points normally distributed around each Tulsa hotspot.
    """
    all_points = []
    
    # Points per hotspot (adjust based on area importance)
    points_config = {
        "Downtown Tulsa": 150,      # Major business district
        "Tulsa Airport": 80,        # Transportation hub
        "University of Tulsa": 100, # Educational institution
        "Gathering Place": 120,     # Major attraction
        "Brookside District": 70    # Shopping/dining area
    }
    
    for hotspot_name, center_coords in tulsa_hotspots.items():
        num_points = points_config[hotspot_name]
        
        # Generate points around each hotspot with realistic spread
        lat_center, lon_center = center_coords
        
        # Adjust spread (standard deviation in degrees; 0.01° of latitude ≈ 1.1 km)
        if "Downtown" in hotspot_name:
            spread = 0.008  # Tight cluster for downtown
        elif "Airport" in hotspot_name:
            spread = 0.012  # Wider spread for the airport area
        else:
            spread = 0.010  # Standard spread for other areas
        
        # Generate points with normal distribution around center
        lats = np.random.normal(lat_center, spread, num_points)
        lons = np.random.normal(lon_center, spread, num_points)
        
        # Ensure points stay within Tulsa boundaries
        lats = np.clip(lats, LAT_MIN, LAT_MAX)
        lons = np.clip(lons, LON_MIN, LON_MAX)
        
        # Create point data
        for lat, lon in zip(lats, lons):
            all_points.append({
                'latitude': lat,
                'longitude': lon,
                'hotspot_origin': hotspot_name,
                'point_id': len(all_points)
            })
    
    return pd.DataFrame(all_points)

# Generate the data
tulsa_data = generate_tulsa_data()

print(f"📊 Generated {len(tulsa_data)} data points for Tulsa analysis")
print(f"🎯 Hotspot distribution:")
print(tulsa_data['hotspot_origin'].value_counts())
print(f"\n📍 Coordinate ranges:")
print(f"   Latitude: {tulsa_data['latitude'].min():.4f} to {tulsa_data['latitude'].max():.4f}")
print(f"   Longitude: {tulsa_data['longitude'].min():.4f} to {tulsa_data['longitude'].max():.4f}")
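
`generate_tulsa_data` uses the same spread in degrees for latitude and longitude. At Tulsa's latitude, this makes each synthetic cloud about 19% narrower east-west than north-south on the ground. If circular clouds are preferred, one option is to specify the spread in meters and convert it separately for each axis. The function `sample_hotspot_points` below is a hypothetical helper, and its spherical-Earth constant is an approximation. The 900 m spread roughly matches the 0.008° downtown spread.

```python
import numpy as np

M_PER_DEG_LAT = 111_195  # approx. meters per degree of latitude (spherical Earth, R = 6,371 km)

def sample_hotspot_points(center, n, spread_m, rng):
    """Draw n points around center=(lat, lon) with a standard deviation of
    spread_m meters along both the north-south and east-west axes."""
    lat0, lon0 = center
    m_per_deg_lon = M_PER_DEG_LAT * np.cos(np.radians(lat0))
    lats = rng.normal(lat0, spread_m / M_PER_DEG_LAT, n)
    lons = rng.normal(lon0, spread_m / m_per_deg_lon, n)
    return lats, lons

rng = np.random.default_rng(42)
center = (36.1540, -95.9928)  # Downtown Tulsa, as in tulsa_hotspots
lats, lons = sample_hotspot_points(center, 5000, spread_m=900, rng=rng)

# Convert the sample spreads back to meters; both should be close to 900
ns_m = np.std(lats) * M_PER_DEG_LAT
ew_m = np.std(lons) * M_PER_DEG_LAT * np.cos(np.radians(center[0]))
print(f"N-S spread ≈ {ns_m:.0f} m, E-W spread ≈ {ew_m:.0f} m")
```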

In [None]:
# Apply DBSCAN clustering
def apply_spatial_clustering(data, eps_meters=500, min_samples=3):
    """
    Apply DBSCAN clustering to latitude/longitude points

    Parameters:
    - data: DataFrame with 'latitude' and 'longitude' columns (degrees)
    - eps_meters: Maximum great-circle distance between neighboring points (meters)
    - min_samples: Minimum number of points (including the point itself) for a core point
    """
    # The haversine metric expects [latitude, longitude] in radians
    coordinates = np.radians(data[['latitude', 'longitude']].values)

    # Convert eps from meters to radians on a spherical Earth (mean radius ≈ 6,371 km).
    # A single eps in degrees would be anisotropic: at Tulsa's latitude (≈36.15°),
    # 1° of longitude is only about 81% as long as 1° of latitude.
    EARTH_RADIUS_M = 6_371_000
    eps_radians = eps_meters / EARTH_RADIUS_M

    # Apply DBSCAN using great-circle (haversine) distances
    dbscan = DBSCAN(eps=eps_radians, min_samples=min_samples,
                    metric='haversine', algorithm='ball_tree')
    cluster_labels = dbscan.fit_predict(coordinates)
    
    # Add cluster labels to data
    data_clustered = data.copy()
    data_clustered['cluster'] = cluster_labels
    
    # Calculate cluster statistics
    n_clusters = len(set(cluster_labels)) - (1 if -1 in cluster_labels else 0)
    n_noise = list(cluster_labels).count(-1)
    
    print(f"🔍 DBSCAN Clustering Results:")
    print(f"   Parameters: eps={eps_meters}m, min_samples={min_samples}")
    print(f"   Number of clusters: {n_clusters}")
    print(f"   Number of noise points: {n_noise}")
    print(f"   Percentage clustered: {((len(data) - n_noise) / len(data) * 100):.1f}%")
    
    return data_clustered, n_clusters, n_noise

# Apply clustering with Tulsa-appropriate parameters
tulsa_clustered, num_clusters, num_noise = apply_spatial_clustering(
    tulsa_data, 
    eps_meters=500,  # 500 meter radius for urban clustering
    min_samples=3    # Minimum 3 points per cluster
)

# Display cluster summary (cluster -1 holds the noise points)
cluster_summary = tulsa_clustered.groupby('cluster').agg(
    Point_Count=('latitude', 'count'),
    Avg_Latitude=('latitude', 'mean'),
    Avg_Longitude=('longitude', 'mean'),
    Primary_Hotspot=('hotspot_origin', lambda x: x.mode().iloc[0]),
).round(4)
print(f"\n📈 Cluster Summary:")
print(cluster_summary.head(10))
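
Because the data are synthetic, each point's true `hotspot_origin` is known, so the clustering can be scored against it. One common choice is scikit-learn's `adjusted_rand_score`, where 1.0 means the two partitions match exactly and values near 0 are chance level. A `pandas.crosstab` of clusters against origins complements the score. The small DataFrame here is made up for illustration; with the notebook's data you would pass `tulsa_clustered` instead. Treating noise points (`-1`) as one extra group is a simplifying assumption.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score

# Hypothetical mini-result standing in for tulsa_clustered
df = pd.DataFrame({
    'hotspot_origin': ['Downtown Tulsa'] * 3 + ['Tulsa Airport'] * 3,
    'cluster': [0, 0, 0, 1, 1, -1],
})

# Rows: DBSCAN cluster labels; columns: the hotspot each point was generated around
print(pd.crosstab(df['cluster'], df['hotspot_origin']))

# Agreement between the true origins and the DBSCAN labels (noise counted as its own group)
ari = adjusted_rand_score(df['hotspot_origin'], df['cluster'])
print(f"Adjusted Rand index: {ari:.3f}")
```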

In [None]:
# Create interactive map visualization
def create_tulsa_map(data_clustered):
    """
    Create an interactive Folium map showing Tulsa clusters
    """
    # Create base map centered on Tulsa
    tulsa_map = folium.Map(
        location=CITY_CENTER,
        zoom_start=ZOOM_LEVEL,
        tiles='OpenStreetMap'
    )
    
    # Define CSS colors for clusters (CircleMarker takes CSS color names;
    # Icon-only names such as 'lightred' and 'darkpurple' are not valid here)
    cluster_colors = ['red', 'blue', 'green', 'purple', 'orange', 'darkred',
                      'salmon', 'goldenrod', 'darkblue', 'darkgreen', 'cadetblue',
                      'indigo', 'teal', 'pink', 'lightblue', 'lightgreen']
    
    # Add cluster points to map
    for idx, row in data_clustered.iterrows():
        cluster_id = int(row['cluster'])
        
        if cluster_id == -1:
            # Noise points in gray
            color = 'gray'
            popup_text = f"Noise point<br>Origin: {row['hotspot_origin']}"
        else:
            # Cluster points in assigned colors
            color = cluster_colors[cluster_id % len(cluster_colors)]
            popup_text = f"Cluster {cluster_id}<br>Origin: {row['hotspot_origin']}"
        
        folium.CircleMarker(
            location=[row['latitude'], row['longitude']],
            radius=4,
            popup=popup_text,
            color=color,
            fill=True,
            fillColor=color,
            fillOpacity=0.7
        ).add_to(tulsa_map)
    
    # Add hotspot markers
    for hotspot_name, coords in tulsa_hotspots.items():
        folium.Marker(
            location=coords,
            popup=f"📍 {hotspot_name}",
            icon=folium.Icon(color='black', icon='star')
        ).add_to(tulsa_map)
    
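    # Add title, with counts computed from data_clustered
    n_clusters = data_clustered.loc[data_clustered['cluster'] != -1, 'cluster'].nunique()
    n_noise = int((data_clustered['cluster'] == -1).sum())
    title_html = f'''
                 <h3 align="center" style="font-size:20px"><b>Tulsa, Oklahoma - Spatial DBSCAN Analysis</b></h3>
                 <p align="center">Clusters: {n_clusters} | Points: {len(data_clustered)} | Noise: {n_noise}</p>
                 '''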
    tulsa_map.get_root().html.add_child(folium.Element(title_html))
    
    return tulsa_map

# Create and display the map
tulsa_map = create_tulsa_map(tulsa_clustered)
print("🗺️ Interactive map created successfully!")
print("📍 Black stars show original hotspot locations")
print("🔴 Colored circles show clustered data points")
print("⚫ Gray circles show noise points (not in any cluster)")

# Display the map
tulsa_map
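
With several hundred markers, it can help to also summarize each cluster as a centroid plus a radius. The hypothetical helper `cluster_extents` below computes two values for each non-noise cluster: the mean position, and the largest haversine distance from that mean to a member point. Distances are in meters and assume a spherical Earth. The demo rows are made up.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # spherical-Earth approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Vectorized great-circle distance in meters; inputs in degrees."""
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = phi2 - phi1
    dlmb = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

def cluster_extents(df):
    """Centroid and max member distance (m) for every cluster except noise (-1)."""
    rows = []
    for cid, grp in df[df['cluster'] != -1].groupby('cluster'):
        clat, clon = grp['latitude'].mean(), grp['longitude'].mean()
        dist = haversine_m(clat, clon, grp['latitude'].to_numpy(), grp['longitude'].to_numpy())
        rows.append({'cluster': cid, 'lat': clat, 'lon': clon, 'radius_m': float(dist.max())})
    return pd.DataFrame(rows)

# Tiny made-up example: two points 0.002° of latitude apart, plus one noise point
demo = pd.DataFrame({'latitude': [36.150, 36.152, 36.200],
                     'longitude': [-95.990, -95.990, -95.900],
                     'cluster': [0, 0, -1]})
extents = cluster_extents(demo)
print(extents)
```

These values could then be drawn with `folium.Circle(location=[lat, lon], radius=radius_m, fill=False)`. Note that `folium.Circle` takes its `radius` in meters, whereas `CircleMarker` (used for the points above) takes its `radius` in pixels.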

In [None]:
# Export results for further analysis
def export_tulsa_results(data_clustered):
    """
    Export clustering results to CSV and GeoJSON formats
    """
    # Export to CSV
    csv_filename = 'tulsa_spatial_analysis.csv'
    data_clustered.to_csv(csv_filename, index=False)
    print(f"📄 Results exported to {csv_filename}")
    
    # Create GeoDataFrame for spatial export (x = longitude, y = latitude)
    gdf = gpd.GeoDataFrame(
        data_clustered,
        geometry=gpd.points_from_xy(data_clustered['longitude'], data_clustered['latitude']),
        crs='EPSG:4326'
    )
    
    # Export to GeoJSON
    geojson_filename = 'tulsa_clusters.geojson'
    gdf.to_file(geojson_filename, driver='GeoJSON')
    print(f"🗺️ Spatial data exported to {geojson_filename}")
    
    return csv_filename, geojson_filename

# Export the results
csv_file, geojson_file = export_tulsa_results(tulsa_clustered)

# Generate summary report
print(f"\n📊 TULSA SPATIAL ANALYSIS SUMMARY")
print(f"="*50)
print(f"City: {CITY_NAME}, {STATE}")
print(f"Analysis Date: {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M')}")
print(f"Total Data Points: {len(tulsa_clustered)}")
print(f"Number of Clusters: {num_clusters}")
print(f"Noise Points: {num_noise} ({(num_noise/len(tulsa_clustered)*100):.1f}%)")
print(f"Clustered Points: {len(tulsa_clustered)-num_noise} ({((len(tulsa_clustered)-num_noise)/len(tulsa_clustered)*100):.1f}%)")
print(f"\n📁 Output Files:")
print(f"   • {csv_file} - Tabular data with cluster assignments")
print(f"   • {geojson_file} - Spatial data for GIS applications")
print(f"\n🎯 Largest Clusters:")
largest_clusters = tulsa_clustered[tulsa_clustered['cluster'] != -1]['cluster'].value_counts().head(5)
for cluster_id, count in largest_clusters.items():
    primary_hotspot = tulsa_clustered[tulsa_clustered['cluster'] == cluster_id]['hotspot_origin'].mode().iloc[0]
    print(f"   • Cluster {cluster_id}: {count} points (primarily {primary_hotspot})")
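
GeoJSON (RFC 7946) stores positions as `[longitude, latitude]`, which is why the export builds points with longitude as x. For environments without GeoPandas, the same kind of FeatureCollection can be built with the standard library alone. `rows_to_geojson` is a hypothetical helper and the sample row is made up:

```python
import json

def rows_to_geojson(rows):
    """Build a GeoJSON FeatureCollection from dicts with latitude/longitude keys."""
    features = []
    for row in rows:
        # Every other key becomes a feature property
        props = {k: v for k, v in row.items() if k not in ('latitude', 'longitude')}
        features.append({
            'type': 'Feature',
            # RFC 7946 position order: longitude first, then latitude
            'geometry': {'type': 'Point', 'coordinates': [row['longitude'], row['latitude']]},
            'properties': props,
        })
    return {'type': 'FeatureCollection', 'features': features}

sample = [{'latitude': 36.1540, 'longitude': -95.9928, 'cluster': 0,
           'hotspot_origin': 'Downtown Tulsa'}]
fc = rows_to_geojson(sample)
print(json.dumps(fc, indent=2))
```

Applying this to `tulsa_clustered` would go through something like `tulsa_clustered.to_dict('records')`. Depending on the pandas version, NumPy scalar values may need converting to plain Python numbers before `json.dumps`.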

## 🎉 Analysis Complete!

You've successfully completed a spatial DBSCAN analysis for Tulsa, Oklahoma! 

### 🔍 What We Discovered
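- Identified spatial clusters in synthetic points generated around key Tulsa locations
- Compared each cluster with the hotspot most of its points came from
- Created an interactive map of clusters, noise points, and hotspots
- Exported results to CSV and GeoJSON for further analysis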

### 🚀 Next Steps
1. **Experiment with parameters** - Try different `eps_meters` and `min_samples` values in `apply_spatial_clustering`
2. **Add real data** - Incorporate actual Tulsa datasets (crime, business locations, etc.)
3. **Compare with other cities** - Run similar analysis for other Oklahoma cities
4. **Temporal analysis** - Add time-based clustering analysis
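
When choosing a value for `eps_meters`, one common heuristic is the k-distance plot. For each point, take the distance to its `min_samples`-th nearest neighbor (counting the point itself, as DBSCAN does), sort those distances, and look for the bend where they start rising steeply. The sketch below uses scikit-learn's `NearestNeighbors` with the haversine metric. The helper name and the synthetic test data are illustrative assumptions and are not part of the notebook.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

EARTH_RADIUS_M = 6_371_000  # spherical-Earth approximation

def k_distances_m(lat, lon, min_samples=3):
    """Sorted distance (meters) from each point to its min_samples-th neighbor, self included."""
    coords = np.radians(np.column_stack([lat, lon]))  # haversine expects [lat, lon] in radians
    nn = NearestNeighbors(n_neighbors=min_samples, metric='haversine', algorithm='ball_tree')
    nn.fit(coords)
    dist_rad, _ = nn.kneighbors(coords)  # first column is distance 0 (the point itself)
    return np.sort(dist_rad[:, -1] * EARTH_RADIUS_M)

# Synthetic check: a tight cloud near downtown plus a few scattered points
rng = np.random.default_rng(0)
lat = np.concatenate([rng.normal(36.154, 0.003, 200), rng.uniform(36.05, 36.25, 10)])
lon = np.concatenate([rng.normal(-95.993, 0.003, 200), rng.uniform(-96.10, -95.85, 10)])
kd = k_distances_m(lat, lon, min_samples=3)
print(np.percentile(kd, [50, 90, 99]).round(0))
```

Plotting `kd` (for example with `plt.plot(kd)`) and reading off the bend gives a starting value in meters for `eps_meters`. This is a heuristic, not a guarantee of good clusters.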

### 🏙️ Create Your Own City Analysis
Want to analyze your own city? Check out the **[Add Your City Guide](../ADD_YOUR_CITY_GUIDE.md)** for step-by-step instructions!

### 📚 Learn More
- **[PyMapGIS on PyPI](https://pypi.org/project/pymapgis/)** - Explore more spatial analysis capabilities
- **[DBSCAN Documentation](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html)** - Learn about clustering parameters
- **[Folium Documentation](https://python-visualization.github.io/folium/)** - Create more advanced maps

---
**Happy spatial analyzing! 🗺️✨**