# City Planning 101: Census Data, ArcGIS, and Data Visualization

This notebook demonstrates how to work with census data, ArcGIS, and data visualization tools for city planning applications.

## Setup

First, let's import the necessary libraries:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization defaults
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['figure.dpi'] = 100

print("✓ Libraries imported successfully")

## Part 1: Working with Census Data

The U.S. Census Bureau publishes extensive demographic and economic data through a public API. The `census` Python package is one way to query it.

### Setting up Census API Access

To use the Census API:
1. Get a free API key from: https://api.census.gov/data/key_signup.html
2. Set it as an environment variable in the shell that launches Jupyter: `export CENSUS_API_KEY='your_key_here'`

For this tutorial, we'll work with sample data instead.
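
For reference, a rough sketch of what a real request involves: the snippet below reads the key from the environment and builds a query URL without sending it. The dataset path (`2020/dec/pl`) and the variable `P1_001N` (total population) are assumptions chosen for illustration, so check the Census API documentation for the dataset you need. `build_census_url` is a hypothetical helper, not part of any library.

```python
import os
from urllib.parse import urlencode

# Hypothetical helper: assembles a Census API query URL (no request is sent).
def build_census_url(variables, geography, key=None,
                     dataset="2020/dec/pl"):  # dataset path is an assumption
    params = {"get": ",".join(variables), "for": geography}
    if key:  # include the key only when one is available
        params["key"] = key
    # Keep ',', ':' and '*' unescaped so the URL stays readable
    query = urlencode(params, safe=",:*")
    return f"https://api.census.gov/data/{dataset}?{query}"

# Read the key set via `export CENSUS_API_KEY=...`; None if it is not set
api_key = os.environ.get("CENSUS_API_KEY")

url = build_census_url(["NAME", "P1_001N"], "state:*", key=api_key)
print(url)
```

Building the URL separately from sending it makes the query easy to inspect before any network call is made.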

In [None]:
# Example: What census data looks like
# This simulates data you would get from the Census API

census_data_example = pd.DataFrame({
    'state_name': ['California', 'Texas', 'Florida', 'New York', 'Pennsylvania'],
    'total_population': [39538223, 29145505, 21538187, 20201249, 13002700],
    'median_age': [36.5, 34.8, 42.0, 38.9, 40.8],
    'median_household_income': [75235, 64034, 59227, 72108, 63463]
})

print("Sample Census Data:")
display(census_data_example)
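
The Census API returns JSON as a list of rows whose first row holds the column names, with every value encoded as a string. A minimal sketch of turning such a response into a DataFrame like the sample above follows. The `raw_response` text is hand-written using populations from this section, and `P1_001N` is assumed as the population variable.

```python
import json
import pandas as pd

# Hand-written stand-in for an API response body (not fetched from the network)
raw_response = """[
  ["NAME", "P1_001N", "state"],
  ["California", "39538223", "06"],
  ["Texas", "29145505", "48"],
  ["Florida", "21538187", "12"]
]"""

rows = json.loads(raw_response)

# First row is the header; the remaining rows are data
census_df = pd.DataFrame(rows[1:], columns=rows[0])

# Give the columns descriptive names
census_df = census_df.rename(columns={
    "NAME": "state_name",
    "P1_001N": "total_population",
    "state": "state_fips",
})

# Values arrive as strings, so convert counts to integers explicitly.
# The FIPS code stays a string so the leading zero in "06" survives.
census_df["total_population"] = census_df["total_population"].astype(int)

print(census_df)
```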

## Part 2: Data Manipulation with Pandas and NumPy

Now let's create sample city planning data and analyze it.

In [None]:
# Create sample city district data
districts_data = {
    'district_name': [
        'Downtown', 'North Side', 'South Side', 'East Side', 'West Side',
        'Riverside', 'Hillside', 'Industrial', 'Suburban', 'Commercial'
    ],
    'population': [25000, 18500, 22000, 15000, 19000, 12000, 8500, 3000, 28000, 5000],
    'area_sq_miles': [2.5, 4.2, 3.8, 5.1, 3.5, 2.8, 6.2, 8.5, 12.0, 1.5],
    'median_income': [65000, 52000, 48000, 45000, 58000, 70000, 55000, 42000, 72000, 85000],
    'parks_count': [5, 8, 6, 4, 7, 10, 15, 2, 20, 1],
    'schools_count': [3, 5, 4, 3, 4, 2, 3, 1, 8, 0],
    'crime_rate': [3.2, 2.8, 4.1, 3.5, 2.5, 1.8, 1.2, 2.0, 1.5, 2.3]
}

df = pd.DataFrame(districts_data)

print("City Districts Data:")
display(df)

### Calculate Urban Planning Metrics

In [None]:
# Calculate population density
df['population_density'] = df['population'] / df['area_sq_miles']

# Calculate amenities per capita
df['parks_per_10k'] = (df['parks_count'] / df['population']) * 10000
df['schools_per_10k'] = (df['schools_count'] / df['population']) * 10000

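# Calculate livability score: a weighted sum of income (30), low crime (30),
# parks per capita (20), and schools per capita (20), each scaled by the column maximum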
df['livability_score'] = (
    (df['median_income'] / df['median_income'].max()) * 30 +
    ((df['crime_rate'].max() - df['crime_rate']) / df['crime_rate'].max()) * 30 +
    (df['parks_per_10k'] / df['parks_per_10k'].max()) * 20 +
    (df['schools_per_10k'] / df['schools_per_10k'].max()) * 20
)

print("Calculated Metrics:")
display(df[['district_name', 'population_density', 'parks_per_10k', 'livability_score']].head())
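
The weights in the livability score are a modeling choice rather than a standard, so it can help to make them explicit parameters. The sketch below restates the same formula as a function and applies it to a three-district subset of the data. `livability` and the `weights` dictionary are illustrative names, not part of any library.

```python
import pandas as pd

# Three districts from the table above, enough to exercise the formula
sample = pd.DataFrame({
    "district_name": ["Downtown", "Hillside", "Commercial"],
    "population": [25000, 8500, 5000],
    "median_income": [65000, 55000, 85000],
    "parks_count": [5, 15, 1],
    "schools_count": [3, 3, 0],
    "crime_rate": [3.2, 1.2, 2.3],
})

def livability(df, weights):
    """Weighted sum of four components, each scaled to at most 1."""
    parks = df["parks_count"] / df["population"] * 10000
    schools = df["schools_count"] / df["population"] * 10000
    components = pd.DataFrame({
        "income": df["median_income"] / df["median_income"].max(),
        # Lower crime is better, so measure the gap to the worst district
        "safety": (df["crime_rate"].max() - df["crime_rate"]) / df["crime_rate"].max(),
        "parks": parks / parks.max(),
        "schools": schools / schools.max(),
    })
    return sum(components[name] * w for name, w in weights.items())

weights = {"income": 30, "safety": 30, "parks": 20, "schools": 20}
sample["livability_score"] = livability(sample, weights)
print(sample[["district_name", "livability_score"]].round(1))
```

Each component is scaled against the districts passed in, so scores on this subset differ from those computed on all ten districts.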

### Statistical Analysis with NumPy

In [None]:
# Population statistics
pop_array = df['population'].values

print("Population Statistics:")
print(f"  Mean: {np.mean(pop_array):,.0f}")
print(f"  Median: {np.median(pop_array):,.0f}")
print(f"  Std Dev: {np.std(pop_array):,.0f}")  # population std (ddof=0)
print(f"  Min: {np.min(pop_array):,.0f}")
print(f"  Max: {np.max(pop_array):,.0f}")

# Correlation analysis: Pearson coefficient between income and crime rate
correlation = np.corrcoef(df['median_income'], df['crime_rate'])[0, 1]
print(f"\nCorrelation (Income vs Crime Rate): {correlation:.3f}")
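
One detail worth knowing when reading these numbers: `np.std` defaults to the population standard deviation (`ddof=0`), while pandas' `Series.std` defaults to the sample version (`ddof=1`), so the two can disagree on the same data. A short sketch using the district populations from above:

```python
import numpy as np
import pandas as pd

population = pd.Series([25000, 18500, 22000, 15000, 19000,
                        12000, 8500, 3000, 28000, 5000])

pop_std = np.std(population.to_numpy())             # ddof=0: divide by n
sample_std = np.std(population.to_numpy(), ddof=1)  # ddof=1: divide by n - 1
pandas_std = population.std()                       # pandas defaults to ddof=1

print(f"Population std (ddof=0): {pop_std:,.0f}")
print(f"Sample std (ddof=1):     {sample_std:,.0f}")
print(f"pandas Series.std():     {pandas_std:,.0f}")
```

When the table covers every district in the city, the population form is a reasonable choice. The sample form suits cases where the districts are drawn from a larger set.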

## Part 3: Data Visualization

Let's create various visualizations to understand our city planning data.

### Population Distribution

In [None]:
# Population by district
plt.figure(figsize=(12, 6))
df_sorted = df.sort_values('population', ascending=False)
bars = plt.bar(df_sorted['district_name'], df_sorted['population'], 
               color='steelblue', edgecolor='navy', alpha=0.7)

plt.title('Population by District', fontsize=16, fontweight='bold')
plt.xlabel('District', fontsize=12)
plt.ylabel('Population', fontsize=12)
plt.xticks(rotation=45, ha='right')
plt.grid(axis='y', alpha=0.3)

# Add value labels
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
            f'{int(height):,}', ha='center', va='bottom', fontsize=9)

plt.tight_layout()
plt.show()

### Income vs Crime Rate Analysis

In [None]:
# Scatter plot: Income vs Crime
plt.figure(figsize=(10, 6))
scatter = plt.scatter(df['median_income'], df['crime_rate'], 
                     s=df['population']/100, alpha=0.6,
                     c=df['population_density'], cmap='viridis',
                     edgecolors='black', linewidth=1)

# Add labels
for _, row in df.iterrows():
    plt.annotate(row['district_name'], 
                (row['median_income'], row['crime_rate']),
                xytext=(5, 5), textcoords='offset points', fontsize=8)

cbar = plt.colorbar(scatter)
cbar.set_label('Population Density', rotation=270, labelpad=20)

plt.title('Median Income vs Crime Rate by District', fontsize=14, fontweight='bold')
plt.xlabel('Median Household Income ($)', fontsize=12)
plt.ylabel('Crime Rate (per 1000)', fontsize=12)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

### District Metrics Heatmap

In [None]:
# Create heatmap of normalized metrics
plt.figure(figsize=(10, 8))

metrics = ['population', 'median_income', 'parks_count', 'crime_rate', 'population_density']
df_normalized = df[metrics].copy()

# Normalize
for col in metrics:
    df_normalized[col] = (df[col] - df[col].min()) / (df[col].max() - df[col].min())

df_normalized.index = df['district_name']

sns.heatmap(df_normalized.T, annot=True, fmt='.2f', cmap='YlOrRd',
            cbar_kws={'label': 'Normalized Value'},
            linewidths=0.5, linecolor='gray')

plt.title('District Metrics Heatmap (Normalized)', fontsize=14, fontweight='bold')
plt.xlabel('District', fontsize=12)
plt.ylabel('Metric', fontsize=12)
plt.tight_layout()
plt.show()
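
The loop above applies min-max scaling, which maps each column to [0, 1]. It divides by zero when a column is constant, and it leaves `crime_rate` oriented so that high values mean worse outcomes. A hedged sketch of a helper that handles both cases follows. `min_max` and its `invert` flag are illustrative names, and whether to invert a metric is a presentation choice.

```python
import pandas as pd

def min_max(series, invert=False):
    """Scale a Series to [0, 1]; constant columns map to 0.0."""
    span = series.max() - series.min()
    if span == 0:
        scaled = pd.Series(0.0, index=series.index)
    else:
        scaled = (series - series.min()) / span
    # Optionally flip so that 1.0 always means "better"
    return 1 - scaled if invert else scaled

# Crime rates for four of the districts in the table above
crime_rate = pd.Series(
    [3.2, 2.8, 4.1, 1.2],
    index=["Downtown", "North Side", "South Side", "Hillside"],
)

print(min_max(crime_rate))               # highest crime maps to 1.0
print(min_max(crime_rate, invert=True))  # lowest crime maps to 1.0
```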

### Comprehensive Dashboard

In [None]:
# Create a multi-panel dashboard
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('City Planning Dashboard', fontsize=16, fontweight='bold')

# Plot 1: Population
ax1 = axes[0, 0]
df_sorted = df.sort_values('population', ascending=False)
ax1.bar(range(len(df_sorted)), df_sorted['population'], color='steelblue', alpha=0.7)
ax1.set_xticks(range(len(df_sorted)))
ax1.set_xticklabels(df_sorted['district_name'], rotation=45, ha='right')
ax1.set_title('Population by District')
ax1.set_ylabel('Population')
ax1.grid(axis='y', alpha=0.3)

# Plot 2: Income distribution
ax2 = axes[0, 1]
ax2.hist(df['median_income'], bins=8, color='green', alpha=0.7, edgecolor='black')
ax2.set_title('Median Income Distribution')
ax2.set_xlabel('Median Income ($)')
ax2.set_ylabel('Frequency')
ax2.grid(axis='y', alpha=0.3)

# Plot 3: Crime vs Density
ax3 = axes[1, 0]
ax3.scatter(df['population_density'], df['crime_rate'], s=100, alpha=0.6, c='red', edgecolors='black')
ax3.set_title('Crime Rate vs Population Density')
ax3.set_xlabel('Population Density (per sq mi)')
ax3.set_ylabel('Crime Rate (per 1000)')
ax3.grid(alpha=0.3)

# Plot 4: Parks vs Population
ax4 = axes[1, 1]
ax4.scatter(df['population'], df['parks_count'], s=100, alpha=0.6, c='forestgreen', edgecolors='black')
ax4.set_title('Parks Count vs Population')
ax4.set_xlabel('Population')
ax4.set_ylabel('Number of Parks')
ax4.grid(alpha=0.3)

plt.tight_layout()
plt.show()

## Part 4: ArcGIS Integration

The ArcGIS API for Python (the `arcgis` package) exposes GIS capabilities for spatial analysis from Python.

### Key Capabilities:

1. **Geocoding**: Convert addresses to coordinates
2. **Feature Layers**: Access and query spatial data
3. **Web Maps**: Create interactive maps
4. **Spatial Analysis**: Perform GIS operations

### Example: Creating Location Data

In [None]:
# Create a DataFrame with spatial information
facilities = pd.DataFrame({
    'name': ['City Hall', 'Central Park', 'Community Center', 'Public Library', 'Fire Station'],
    'type': ['Government', 'Recreation', 'Community', 'Education', 'Emergency'],
    'latitude': [40.7128, 40.7829, 40.7589, 40.7531, 40.7308],
    'longitude': [-74.0060, -73.9654, -73.9851, -73.9772, -73.9973],
    'capacity': [500, 50000, 300, 150, 50]
})

print("City Planning Facilities with Coordinates:")
display(facilities)

print("\n✓ This data can be used with ArcGIS to create feature layers and web maps")
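
Even without a GIS, latitude/longitude pairs support simple spatial questions such as how far apart two facilities are. The sketch below uses the haversine formula with a mean Earth radius of 6371 km. It treats the Earth as a sphere, so the result is an approximation. `haversine_km` is an illustrative helper.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# City Hall and Central Park coordinates from the facilities table
city_hall = (40.7128, -74.0060)
central_park = (40.7829, -73.9654)

distance = haversine_km(*city_hall, *central_park)
print(f"City Hall to Central Park: {distance:.1f} km")
```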

### Using ArcGIS (Optional)

To use the full ArcGIS functionality, install the `arcgis` package (for example with `pip install arcgis`) and connect to a GIS:

```python
from arcgis.gis import GIS
from arcgis.geocoding import geocode

# Connect to ArcGIS Online
gis = GIS()  # Anonymous access
# For authenticated access (placeholder credentials):
# gis = GIS("https://www.arcgis.com", "your_username", "your_password")

# Geocode an address (geocode() returns a list of candidate matches)
results = geocode("1600 Pennsylvania Avenue NW, Washington, DC")
if results:
    location = results[0]
    print(f"Coordinates: {location['location']}")

# Search for data
items = gis.content.search("census tracts", item_type="Feature Layer")
```
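
If the `arcgis` package is not available, one portable option is to write point data as GeoJSON, a format many GIS tools can read. A minimal sketch using only the standard library follows. The two facilities are copied from the table above, and `to_feature` is an illustrative helper. Note that GeoJSON orders coordinates as longitude, then latitude.

```python
import json

facilities = [
    {"name": "City Hall", "type": "Government",
     "latitude": 40.7128, "longitude": -74.0060},
    {"name": "Central Park", "type": "Recreation",
     "latitude": 40.7829, "longitude": -73.9654},
]

def to_feature(record):
    """Convert one facility record into a GeoJSON Point feature."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # GeoJSON uses [longitude, latitude] order
            "coordinates": [record["longitude"], record["latitude"]],
        },
        "properties": {"name": record["name"], "type": record["type"]},
    }

feature_collection = {
    "type": "FeatureCollection",
    "features": [to_feature(r) for r in facilities],
}

geojson_text = json.dumps(feature_collection, indent=2)
print(geojson_text)
```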

## Summary

This notebook demonstrated:

1. **Census Data**: How to structure and work with demographic data
2. **Pandas/NumPy**: Data manipulation and statistical analysis for city planning
3. **Visualization**: Creating informative plots with matplotlib and seaborn
4. **ArcGIS**: Spatial data concepts and GIS integration

### Next Steps:

- Get a Census API key and explore real census data
- Sign up for ArcGIS Online to access full GIS capabilities
- Explore additional libraries like geopandas for spatial operations
- Create interactive visualizations with plotly or folium