# Creating a Stylized Network of Census Tracts

This notebook builds a stylized network of up to about 200 census tract nodes and annotates each node with road statistics from OpenStreetMap. Salt Lake City, Utah is the running example; changing the location parameters adapts the workflow to another US city or county.

## How to Use This Notebook for Different Cities

To adapt this notebook to a different city:

1. **Change the Census parameters**:
   - Modify the `state_fips` and `county_fips` variables for your target location
   - You can find FIPS codes for US states and counties [here](https://www.census.gov/library/reference/code-lists/ansi.html)

2. **Change the OpenStreetMap location**:
   - Modify the `place_name` variable with your target city name
   - Format: 'City Name, State, Country' (e.g., 'Boston, Massachusetts, USA')

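Output file names and plot titles are derived from `place_name`, so they follow your chosen city automatically. The last-resort download fallback in Section 3 uses hard-coded Salt Lake City coordinates, so update those as well. The methodology is the same for any US city.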
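As a rough sketch of how these parameters fit together, the snippet below gathers them in one place, checks that the FIPS parts have the expected shape, and derives the file-name prefix the same way the plotting cells do. The helper names `validate_fips` and `make_clean_name` are illustrative and not part of the notebook.

```python
def validate_fips(state_fips: str, county_fips: str) -> str:
    """Return the 5-digit county FIPS code, or raise if a part looks malformed."""
    if not (state_fips.isdigit() and len(state_fips) == 2):
        raise ValueError(f"state_fips must be 2 digits, got {state_fips!r}")
    if not (county_fips.isdigit() and len(county_fips) == 3):
        raise ValueError(f"county_fips must be 3 digits, got {county_fips!r}")
    return state_fips + county_fips


def make_clean_name(place_name: str) -> str:
    """Build the file-name prefix: separators become underscores, all lowercase."""
    return place_name.replace(', ', '_').replace(' ', '_').lower()


# Example: switching the notebook to Boston (Suffolk County, Massachusetts)
state_fips = '25'
county_fips = '025'
place_name = 'Boston, Massachusetts, USA'

print(validate_fips(state_fips, county_fips))
print(make_clean_name(place_name))
```

Keeping the FIPS codes as strings matters because leading zeros (as in `'035'`) would be lost if they were stored as integers.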

## 1. Setup and Dependencies

First, let's install and import the necessary libraries.

In [None]:
# Install required packages
!pip install geopandas osmnx networkx matplotlib contextily cenpy requests lxml

In [None]:
# Import libraries
import warnings

import cenpy
import contextily as ctx
import geopandas as gpd
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import osmnx as ox
import pandas as pd
from shapely.geometry import MultiPolygon, Point

## 2. Download Census Tract Data

We'll use the Census API through cenpy to download census tract data. For this example, we're using Salt Lake City, Utah, but you can modify the state and county codes to analyze any other US city or region.

In [None]:
# Set up cenpy connection
conn = cenpy.remote.APIConnection('DECENNIALSF12010')

# Example: Salt Lake County FIPS code is 49035 (state 49 = Utah, county 035 = Salt Lake)
# Modify these values to analyze a different city/county
state_fips = '49'  # Utah
county_fips = '035'  # Salt Lake County
county_fips_full = state_fips + county_fips

# Get census tract data for the selected county
variables = ['P001001']  # Total population
census_tracts = conn.query(variables, geo_unit='tract:*', geo_filter={'state': state_fips, 'county': county_fips})

# Download census tract geometries from the Census TIGER/Line files,
# trying the most recent vintage first and falling back to older ones
census_tracts_geo = None
for year in (2024, 2022, 2021, 2020):
    url = f"https://www2.census.gov/geo/tiger/TIGER{year}/TRACT/tl_{year}_{state_fips}_tract.zip"
    print(f"Attempting to download from: {url}")
    try:
        census_tracts_geo = gpd.read_file(url)
    except Exception as e:
        print(f"Error with {year} data: {e}")
        continue
    # Filter to only the county we want
    census_tracts_geo = census_tracts_geo[census_tracts_geo['COUNTYFP'] == county_fips]
    print(f"Successfully downloaded {year} data and filtered to {len(census_tracts_geo)} tracts in county {county_fips}")
    break

if census_tracts_geo is None:
    raise RuntimeError("Could not download TIGER/Line tract geometries for any of the years tried")

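# Join the geometries with the census data.
# Note: the population table comes from the 2010 decennial census, while the
# TIGER/Line files above use 2020-era tract boundaries. Tract codes that changed
# between vintages will not match, and the inner join silently drops them.
census_tracts_geo['tract'] = census_tracts_geo['TRACTCE']
n_api_rows = len(census_tracts)
census_tracts = pd.merge(census_tracts, census_tracts_geo, on='tract')
# Pass the CRS explicitly so it survives the round trip through a plain DataFrame
census_tracts = gpd.GeoDataFrame(census_tracts, geometry='geometry', crs=census_tracts_geo.crs)
print(f"Matched {len(census_tracts)} of {n_api_rows} census API rows to tract geometries")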

# Display the first few census tracts
census_tracts.head()
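Because the join above is an inner merge on the tract code, rows that fail to match disappear without any message. A minimal sketch of a pre-merge check, using plain Python sets, is shown below. The function name `report_unmatched` and the toy tract codes are made up for illustration.

```python
def report_unmatched(api_tracts, geometry_tracts):
    """Compare two collections of tract codes and list what an inner join would drop."""
    api_set = set(api_tracts)
    geo_set = set(geometry_tracts)
    return {
        'matched': sorted(api_set & geo_set),
        'only_in_api': sorted(api_set - geo_set),
        'only_in_geometry': sorted(geo_set - api_set),
    }


# Toy 6-digit tract codes (hypothetical values, not real Salt Lake County tracts)
report = report_unmatched(
    ['100100', '100200', '100300'],
    ['100100', '100301', '100302'],
)
for key, codes in report.items():
    print(f"{key}: {len(codes)} -> {codes}")
```

In the notebook, the two inputs would be the `tract` column of the API result and the `TRACTCE` column of `census_tracts_geo`, evaluated before the merge overwrites `census_tracts`.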

In [None]:
# Randomly sample 200 census tracts if there are more.
# Sampling removes some neighbours, so the adjacency graph built later may be disconnected.
if len(census_tracts) > 200:
    census_tracts = census_tracts.sample(n=200, random_state=42)
else:
    print(f"There are only {len(census_tracts)} census tracts in the selected county")
    
census_tracts.head()

## 3. Download OpenStreetMap Data

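Now we'll download road network data from OpenStreetMap. For this example, we're using Salt Lake City, but you can change `place_name` to analyze another city or region. Note that the primary download covers roughly 5 km around the geocoded address, so tracts outside that area will report zero roads in Section 5.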

In [None]:
# Set the place name - change this to analyze a different city
place_name = 'Salt Lake City, Utah, USA'  # Example location

warnings.filterwarnings('ignore')  # Suppress warnings

try:
    print(f"Downloading street network data for {place_name}...")
    try:
        # Try using the address method first
        city_graph = ox.graph_from_address(place_name, network_type='drive', dist=5000)  # 5km radius
        print("Successfully downloaded using graph_from_address")
    except Exception as e1:
        print(f"Address method failed: {e1}")
        try:
            # Fall back to geocoding the place and using its bounding box
            from shapely.geometry import box
            
            # Get a bounding box for the area
            gdf = ox.geocode_to_gdf(place_name)
            if len(gdf) == 0:
                raise ValueError(f"Could not geocode {place_name}")
                
            # Create a bounding box
            bbox = box(*gdf.total_bounds)
            
            # Get the network within this boundary
            city_graph = ox.graph_from_polygon(bbox, network_type='drive')
            print("Successfully downloaded using graph_from_polygon with bounding box")
        except Exception as e2:
            print(f"Polygon method failed: {e2}")
            # Final fallback - use coordinates for Salt Lake City and get a network around it
            center_lat, center_lng = 40.7608, -111.8910  # Salt Lake City coordinates
            city_graph = ox.graph_from_point((center_lat, center_lng), dist=5000, network_type='drive')
            print("Successfully downloaded using fallback graph_from_point")
    
    # Convert the OSM graph to GeoPandas for visualization
    nodes, edges = ox.graph_to_gdfs(city_graph)
    
    # Display basic stats
    print(f"Number of nodes: {len(nodes)}")
    print(f"Number of edges: {len(edges)}")
    
except Exception as e:
    print(f"Error downloading OpenStreetMap data: {e}")
    print("Using a simple fallback approach...")
    
    # Fallback to hardcoded coordinates for Salt Lake City
    center_lat, center_lng = 40.7608, -111.8910  # Salt Lake City coordinates
    city_graph = ox.graph_from_point((center_lat, center_lng), dist=5000, network_type='drive')
    nodes, edges = ox.graph_to_gdfs(city_graph)
    print(f"Fallback successful - Number of nodes: {len(nodes)}, Number of edges: {len(edges)}")
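The nested `try`/`except` blocks above implement a chain of download strategies. One way to flatten such a chain, sketched here with stand-in functions so it runs without network access, is to keep the strategies in an ordered list and return the first one that succeeds. The helper `first_successful` and the two dummy fetchers are assumptions for illustration only.

```python
def first_successful(strategies):
    """Try (label, callable) pairs in order; return (label, result) of the first success."""
    errors = []
    for label, fetch in strategies:
        try:
            return label, fetch()
        except Exception as exc:
            # Remember why this strategy failed and move on to the next one
            errors.append(f"{label}: {exc}")
    raise RuntimeError("All strategies failed:\n" + "\n".join(errors))


# Stand-in callables (hypothetical) that mimic a failing and a working download
def failing_fetch():
    raise ConnectionError("geocoder unavailable")


def working_fetch():
    return {"nodes": 3, "edges": 2}


label, result = first_successful([
    ("address", failing_fetch),
    ("point", working_fetch),
])
print(f"Succeeded with strategy '{label}': {result}")
```

In the notebook, each callable could be a `lambda` wrapping `ox.graph_from_address`, `ox.graph_from_polygon`, or `ox.graph_from_point` with the arguments used above.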

## 4. Create a Stylized Network of Census Tracts

Now we'll create a network where nodes represent census tracts, and edges represent connections between tracts.

In [None]:
# Create a NetworkX graph for census tracts
tract_graph = nx.Graph()

# Add census tracts as nodes
# Use the centroid of each tract as the node position
for idx, tract in census_tracts.iterrows():
    # Handle MultiPolygon vs Polygon geometry types
    if isinstance(tract.geometry, MultiPolygon):
        # For MultiPolygon, use the centroid of the largest polygon
        largest_poly = max(tract.geometry.geoms, key=lambda x: x.area)
        centroid = largest_poly.centroid
    else:
        centroid = tract.geometry.centroid
    
    # Add node with attributes
    tract_graph.add_node(idx, 
                        pos=(centroid.x, centroid.y),
                        tract_id=tract.get('GEOID', str(idx)),
                        geometry=tract.geometry)

In [None]:
# Create edges between adjacent census tracts
# Two tracts are connected if they share a boundary

# Accessing .sindex builds (and caches) the spatial index used in the loop below
census_tracts.sindex

# For each tract, find neighbors and create edges
for idx1, tract1 in census_tracts.iterrows():
    # Find potential neighbors (tracts that might intersect)
    possible_matches_idx = list(census_tracts.sindex.intersection(tract1.geometry.bounds))
    possible_matches = census_tracts.iloc[possible_matches_idx]
    
    # Filter to tracts that actually touch this tract
    neighbors = possible_matches[possible_matches.geometry.touches(tract1.geometry)]
    
    # Add edges to the graph for each neighbor
    for idx2, tract2 in neighbors.iterrows():
        if idx1 != idx2:  # Don't add self-loops
            # Use the shared boundary length as edge weight (in CRS units, i.e. degrees for
            # unprojected TIGER data; tracts touching only at a corner get weight 0)
            shared_boundary = tract1.geometry.intersection(tract2.geometry)
            boundary_length = shared_boundary.length
            
            # Add edge with the boundary length as a weight
            tract_graph.add_edge(idx1, idx2, weight=boundary_length)

# Print some network statistics
print(f"Number of nodes: {len(tract_graph.nodes())}")
print(f"Number of edges: {len(tract_graph.edges())}")
print(f"Is connected: {nx.is_connected(tract_graph)}")

# Handle disconnected components by connecting them to their nearest neighbors
if not nx.is_connected(tract_graph):
    print("\nThe graph has disconnected components. Adding connections to make it connected...")
    
    # Get the connected components
    components = list(nx.connected_components(tract_graph))
    print(f"Number of disconnected components: {len(components)}")
    
    # Function to find the closest pair of nodes between two components
    def find_closest_nodes(comp1, comp2):
        min_dist = float('inf')
        closest_pair = None
        
        for node1 in comp1:
            pos1 = tract_graph.nodes[node1]['pos']
            for node2 in comp2:
                pos2 = tract_graph.nodes[node2]['pos']
                
                # Calculate Euclidean distance
                dist = ((pos1[0] - pos2[0])**2 + (pos1[1] - pos2[1])**2)**0.5
                
                if dist < min_dist:
                    min_dist = dist
                    closest_pair = (node1, node2)
        
        return closest_pair, min_dist
    
    # Connect each component to the main component (the largest one)
    main_component = max(components, key=len)
    other_components = [c for c in components if c != main_component]
    
    for i, component in enumerate(other_components):
        # Find closest nodes between this component and the main component
        closest_pair, distance = find_closest_nodes(main_component, component)
        
        if closest_pair:
            node1, node2 = closest_pair
            # Add an edge between the closest nodes with a weight inversely proportional to distance
            tract_graph.add_edge(node1, node2, weight=1/distance, is_artificial=True)
            print(f"Connected component {i+1} to main component with edge between nodes {node1} and {node2}")
    
    print(f"After connecting: Is connected: {nx.is_connected(tract_graph)}")
    print(f"Total edges after connecting: {len(tract_graph.edges())}")
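The closest-pair search above measures Euclidean distance directly on longitude/latitude values, which distorts distances away from the equator. A possible alternative, sketched below with only the standard library, is the haversine great-circle distance. It assumes positions are `(lon, lat)` pairs in degrees, matching the `pos` node attribute, and treats the Earth as a sphere. The names `haversine_m` and `closest_pair` and the sample coordinates are illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(pos1, pos2):
    """Great-circle distance in metres between two (lon, lat) pairs given in degrees."""
    lon1, lat1 = map(math.radians, pos1)
    lon2, lat2 = map(math.radians, pos2)
    dlon, dlat = lon2 - lon1, lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def closest_pair(positions_a, positions_b):
    """Return ((key_a, key_b), distance_m) for the nearest pair across two position dicts."""
    candidates = (
        ((key_a, key_b), haversine_m(pos_a, pos_b))
        for key_a, pos_a in positions_a.items()
        for key_b, pos_b in positions_b.items()
    )
    return min(candidates, key=lambda item: item[1])


# Hypothetical centroids near Salt Lake City, keyed by node id
main_component = {'a': (-111.89, 40.76), 'b': (-111.95, 40.70)}
other_component = {'c': (-111.88, 40.77), 'd': (-112.10, 40.60)}

pair, distance_m = closest_pair(main_component, other_component)
print(f"Closest pair: {pair}, about {distance_m:.0f} m apart")
```

Like the loop in the notebook, this compares every pair of nodes, which is fine for a few hundred tracts.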

## 5. Extract OpenStreetMap Statistics for Each Census Tract

We'll now collect statistics from OSM data for each census tract.

In [None]:
# Calculate OSM road statistics for a single tract
def calculate_osm_stats(tract_geometry, edges_gdf):
    # Check if edges_gdf is a valid GeoDataFrame
    if not isinstance(edges_gdf, gpd.GeoDataFrame):
        print(f"Warning: edges_gdf is not a GeoDataFrame but {type(edges_gdf)}")
        # Return empty stats
        return {
            'road_length': 0,
            'road_segments': 0,
            'primary_roads': 0,
            'residential_roads': 0,
            'average_speed': np.nan
        }
        
    try:
        # Clip the road network to this tract
        roads_in_tract = gpd.clip(edges_gdf, tract_geometry)
        
        # Helper function to convert speed values (handles "20 mph" format)
        def parse_speed(speed_value):
            # Handle missing and empty values
            if speed_value is None or (isinstance(speed_value, float) and np.isnan(speed_value)) or (isinstance(speed_value, str) and speed_value.strip() == ''):
                return np.nan
            # Handle list/sequence values
            if isinstance(speed_value, (list, tuple, np.ndarray, pd.Series)):
                for val in speed_value:
                    parsed = parse_speed(val)
                    if not (isinstance(parsed, float) and np.isnan(parsed)):
                        return parsed
                return np.nan
            # Convert string value
            if isinstance(speed_value, str):
                import re
                match = re.search(r"(\d+)", speed_value)
                if match:
                    return float(match.group(1))
            # Numeric values
            if isinstance(speed_value, (int, float)):
                return float(speed_value)
            return np.nan

        # Apply speed parsing to all values and calculate average
        # Use an explicitly typed empty Series when the column is missing
        speeds = roads_in_tract['maxspeed'].apply(parse_speed) if 'maxspeed' in roads_in_tract.columns else pd.Series(dtype=float)
        avg_speed = speeds.mean(skipna=True)
        
        # Helper function to check if a highway value (string or list) matches any of the given types
        def is_highway_type(highway_value, types):
            # Handle missing values
            if highway_value is None or (isinstance(highway_value, float) and np.isnan(highway_value)):
                return False
            # String case
            if isinstance(highway_value, str):
                return highway_value in types
            # Iterable case
            if isinstance(highway_value, (list, tuple, np.ndarray, pd.Series)):
                try:
                    iterable = list(highway_value)
                except Exception:
                    return False
                for h in iterable:
                    if isinstance(h, str) and h in types:
                        return True
                return False
            return False
        
        # Calculate road type counts using improved method
        primary_road_types = ['primary', 'trunk', 'motorway']
        residential_road_types = ['residential']
        
        if 'highway' in roads_in_tract.columns and not roads_in_tract.empty:
            # Use a safer approach to count road types
            primary_roads = 0
            residential_roads = 0
            
            for _, row in roads_in_tract.iterrows():
                highway_val = row['highway']
                if is_highway_type(highway_val, primary_road_types):
                    primary_roads += 1
                if is_highway_type(highway_val, residential_road_types):
                    residential_roads += 1
        else:
            primary_roads = 0
            residential_roads = 0
        
        # Calculate statistics (road_length is in CRS units, i.e. degrees unless the data are projected)
        stats = {
            'road_length': roads_in_tract.length.sum() if not roads_in_tract.empty else 0,
            'road_segments': len(roads_in_tract),
            'primary_roads': primary_roads,
            'residential_roads': residential_roads,
            'average_speed': avg_speed
        }
        return stats
        
    except Exception as e:
        print(f"Error in calculate_osm_stats: {e}")
        # Return empty stats on error
        return {
            'road_length': 0,
            'road_segments': 0,
            'primary_roads': 0,
            'residential_roads': 0,
            'average_speed': np.nan
        }
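The `parse_speed` helper above keeps the first number it finds and ignores the unit. By OpenStreetMap tagging convention a bare `maxspeed` number is in km/h, while miles per hour carry an explicit `mph` suffix, so mixed data can skew the average. The sketch below normalizes single string values to mph. The function name `maxspeed_to_mph` is hypothetical, and the set of accepted unit spellings is an assumption.

```python
import re

KMH_PER_MPH = 1.609344

# A number, optionally followed by a unit; anything else is treated as unparseable
_SPEED_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(mph|km/h|kmh|kph)?\s*$", re.IGNORECASE)


def maxspeed_to_mph(value):
    """Convert one OSM maxspeed string to mph; return None if it cannot be parsed."""
    if not isinstance(value, str):
        return None
    match = _SPEED_RE.match(value)
    if match is None:
        return None  # non-numeric tags such as 'signals' or 'none'
    number = float(match.group(1))
    unit = (match.group(2) or "km/h").lower()  # bare numbers default to km/h
    return number if unit == "mph" else number / KMH_PER_MPH


for raw in ["25 mph", "50", "80 km/h", "signals"]:
    print(f"{raw!r} -> {maxspeed_to_mph(raw)}")
```

List-valued tags would still need the first-valid-element handling that `parse_speed` already provides.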

In [None]:
# Calculate OSM statistics for each tract and add to node attributes
try:
    # Re-get the edges GeoDataFrame if needed
    if not isinstance(edges, gpd.GeoDataFrame):
        print("Re-creating the edges GeoDataFrame from the city_graph...")
        _, edges = ox.graph_to_gdfs(city_graph)
        print(f"Created edges GeoDataFrame with {len(edges)} rows")
    
    # Calculate OSM statistics for each tract and add to node attributes
    print("Calculating OSM statistics for each census tract...")
    for node_id in tract_graph.nodes():
        tract_geometry = tract_graph.nodes[node_id]['geometry']
        try:
            stats = calculate_osm_stats(tract_geometry, edges)
            # Add stats as node attributes
            nx.set_node_attributes(tract_graph, {node_id: stats})
        except Exception as e:
            print(f"Error calculating stats for node {node_id}: {e}")
            # Add empty stats to avoid issues with later code
            empty_stats = {
                'road_length': 0,
                'road_segments': 0,
                'primary_roads': 0,
                'residential_roads': 0,
                'average_speed': np.nan
            }
            nx.set_node_attributes(tract_graph, {node_id: empty_stats})
    
    print("OSM statistics calculation completed")
    
except Exception as e:
    print(f"Error processing OSM statistics: {e}")
    print("Setting default empty statistics for all nodes")
    
    # Set empty stats for all nodes
    empty_stats = {
        'road_length': 0,
        'road_segments': 0,
        'primary_roads': 0,
        'residential_roads': 0,
        'average_speed': np.nan
    }
    
    # Add empty stats to all nodes
    for node_id in tract_graph.nodes():
        nx.set_node_attributes(tract_graph, {node_id: empty_stats})

## 6. Visualize the Stylized Network

Now let's create a visualization of our network of up to 200 census tract nodes.

In [None]:
# Extract node positions
node_positions = nx.get_node_attributes(tract_graph, 'pos')

# Set up the figure
plt.figure(figsize=(15, 15))

# Draw census tract boundaries
census_tracts.plot(ax=plt.gca(), facecolor='none', edgecolor='gray', alpha=0.3)

# Draw the network
nx.draw_networkx(
    tract_graph, 
    pos=node_positions,
    node_size=50,
    node_color='blue',
    edge_color='red',
    alpha=0.7,
    with_labels=False
)

# Add basemap for context
ctx.add_basemap(plt.gca(), crs=census_tracts.crs.to_string(), source=ctx.providers.CartoDB.Positron)

plt.title(f'Stylized Network of Census Tracts in {place_name}')
plt.axis('off')
plt.tight_layout()

# Create a filename based on the place name (removing spaces and commas)
clean_name = place_name.replace(', ', '_').replace(' ', '_').lower()
plt.savefig(f'{clean_name}_network.png', dpi=300)
plt.show()

## 7. Advanced Visualization with Node Attributes from OSM Data

Let's create a more informative visualization using the OSM statistics we collected.

In [None]:
# Extract road length data for node sizing
road_lengths = []
for node in tract_graph.nodes():
    road_length = tract_graph.nodes[node].get('road_length', 0)
    road_lengths.append(road_length)

# Normalize for visualization; fall back to 1 when every road length is 0
# so the division below is always safe
max_length = max(road_lengths, default=0) or 1
node_sizes = [100 * (length / max_length) + 20 for length in road_lengths]

# Set up the figure
plt.figure(figsize=(15, 15))

# Draw census tract boundaries
census_tracts.plot(ax=plt.gca(), facecolor='none', edgecolor='gray', alpha=0.3)

# Draw the network with node sizes based on road length
nx.draw_networkx(
    tract_graph, 
    pos=node_positions,
    node_size=node_sizes,
    node_color='blue',
    edge_color='red',
    alpha=0.7,
    with_labels=False
)

# Add basemap for context
ctx.add_basemap(plt.gca(), crs=census_tracts.crs.to_string(), source=ctx.providers.CartoDB.Positron)

plt.title(f'{place_name} Network: Node Size Represents Total Road Length')
plt.axis('off')
plt.tight_layout()
plt.savefig(f'{clean_name}_network_with_attributes.png', dpi=300)
plt.show()

## 8. Export the Network for Further Analysis

Let's export our network for future use.

In [None]:
# Prepare a copy of the network for GraphML export
# (the in-memory graph keeps its geometry attributes)
tract_graph_export = tract_graph.copy()

# Process node attributes for GraphML compatibility
for node in tract_graph_export.nodes():
    node_data = tract_graph_export.nodes[node]
    
    # Remove geometry attribute (not serializable)
    if 'geometry' in node_data:
        del node_data['geometry']
    
    # Convert position tuple to string
    if 'pos' in node_data:
        pos = node_data['pos']
        if isinstance(pos, tuple):
            node_data['pos'] = f"{pos[0]},{pos[1]}"
    
    # Convert values that GraphML cannot store directly
    for attr in list(node_data.keys()):
        value = node_data[attr]
        if isinstance(value, (list, tuple)):
            # Check sequences first: pd.isna() returns an array for them
            node_data[attr] = str(value)
            continue
        if isinstance(value, np.generic):
            # Convert numpy scalars to Python native types
            value = value.item()
        # Replace missing values with a string representation
        node_data[attr] = "NA" if pd.isna(value) else value
    
# Process edge attributes for GraphML compatibility
for u, v, data in tract_graph_export.edges(data=True):
    # Convert values that GraphML cannot store directly
    for attr in list(data.keys()):
        value = data[attr]
        if isinstance(value, (list, tuple)):
            # Check sequences first: pd.isna() returns an array for them
            data[attr] = str(value)
            continue
        if isinstance(value, np.generic):
            # Convert numpy scalars to Python native types
            value = value.item()
        # Replace missing values with a string representation
        data[attr] = "NA" if pd.isna(value) else value
            
# Try alternative export methods if write_graphml fails
try:
    nx.write_graphml(tract_graph_export, f'{clean_name}_tract_network.graphml')
    print(f"Successfully exported network to {clean_name}_tract_network.graphml")
except Exception as e:
    print(f"GraphML export failed: {e}")
    print("Retrying with infer_numeric_types=True...")
    
    try:
        # infer_numeric_types=True lets NetworkX choose one numeric type per attribute
        nx.write_graphml(tract_graph_export, f'{clean_name}_tract_network.graphml', infer_numeric_types=True)
        print(f"Successfully exported network to {clean_name}_tract_network.graphml using infer_numeric_types")
    except Exception as e2:
        print(f"Second GraphML export attempt failed: {e2}")
        print("Exporting to GEXF format instead...")
        
        try:
            nx.write_gexf(tract_graph_export, f'{clean_name}_tract_network.gexf')
            print(f"Successfully exported network to {clean_name}_tract_network.gexf")
        except Exception as e3:
            print(f"GEXF export also failed: {e3}")
            print("Trying with simplest format, edge list...")
            
            try:
                nx.write_edgelist(tract_graph_export, f'{clean_name}_tract_network.edges')
                print(f"Successfully exported network as edge list to {clean_name}_tract_network.edges")
            except Exception as e4:
                print(f"Edge list export also failed: {e4}")
                
# Save node and edge data to CSV for easy import into other tools
nodes_df = pd.DataFrame.from_dict(dict(tract_graph.nodes(data=True)), orient='index')

# Convert geometry objects to WKT strings for saving
if 'geometry' in nodes_df.columns:
    nodes_df['geometry_wkt'] = nodes_df['geometry'].apply(lambda x: x.wkt if x else None)
    nodes_df = nodes_df.drop(columns=['geometry'])

nodes_df.to_csv(f'{clean_name}_tract_nodes.csv')

# Handle edges: one row per edge, with each attribute (weight, is_artificial) in its own column
edges_df = nx.to_pandas_edgelist(tract_graph)
if not edges_df.empty:  # Check if there are any edges
    edges_df.to_csv(f'{clean_name}_tract_edges.csv', index=False)

print("Network data exported successfully")
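The attribute clean-up performed before the GraphML export can also be factored into a small helper. The sketch below covers only built-in Python types (numpy scalars would first need `.item()`, as in the cell above). The function name `graphml_safe` and the sample attribute values are illustrative.

```python
import math


def graphml_safe(value, missing="NA"):
    """Return a str, bool, int, or float stand-in for an attribute value."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return missing  # missing values become a placeholder string
    if isinstance(value, (str, bool, int, float)):
        return value  # already a simple scalar
    return str(value)  # lists, tuples, dicts, and other objects


# Hypothetical node attributes resembling those built in this notebook
attrs = {
    'tract_id': '49035100100',
    'pos': (-111.89, 40.76),
    'road_length': 0.25,
    'road_segments': 12,
    'average_speed': float('nan'),
}
cleaned = {key: graphml_safe(value) for key, value in attrs.items()}
print(cleaned)
```

Note that replacing NaN with a string leaves an attribute such as `average_speed` with mixed types across nodes, which downstream tools may need to handle when reading the file.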

## Summary of Network Creation and Export

This notebook has successfully created a stylized network of census tracts using OpenStreetMap data for Salt Lake City. Here's a summary of what we've accomplished:

1. **Data Collection**:
   - Downloaded census tract data for Salt Lake County, Utah
   - Retrieved OpenStreetMap road network data for the area
   - Sampled down to at most 200 census tracts

2. **Network Creation**:
   - Built a network where nodes represent census tracts
   - Created edges between adjacent census tracts
   - Ensured the network is fully connected by adding necessary edges between disconnected components

3. **Network Analysis**:
   - Calculated road statistics for each census tract (length, segments, types, speeds)
   - Created visualizations with node sizes representing road lengths

4. **Data Export**:
   - Exported the network to GraphML format for use in other network analysis tools
   - Saved node and edge data as CSV files
   - Saved the network visualizations as PNG files
