# US Traffic Accidents Visualization

Interactive D3.js visualization showing US traffic accident data by state with detailed road feature breakdowns.

## Instructions
1. Place your CSV file in the `archive/` folder
2. Run all cells in this notebook
3. The visualization will appear at the bottom

In [41]:
import pandas as pd
import numpy as np
import os
import glob
import json
from IPython.display import HTML, display

## Data Processing

In [42]:
# Find CSV file in archive folder
csv_files = sorted(glob.glob('archive/*.csv'))
if not csv_files:
    # Stop here: every later cell depends on df
    raise FileNotFoundError(
        "No CSV file found in archive/ folder. "
        "Please place your CSV file in the archive/ folder"
    )

# If several CSV files are present, the first one in sorted order is used
csv_file = csv_files[0]
print(f"Loading traffic data from: {csv_file}")
df = pd.read_csv(csv_file)

print(f"Original data shape: {df.shape}")
print("\nAll column names:")
print(df.columns.tolist())

Loading traffic data from: archive\US_Accidents_March23.csv
Original data shape: (7728394, 46)

All column names:
['ID', 'Source', 'Severity', 'Start_Time', 'End_Time', 'Start_Lat', 'Start_Lng', 'End_Lat', 'End_Lng', 'Distance(mi)', 'Description', 'Street', 'City', 'County', 'State', 'Zipcode', 'Country', 'Timezone', 'Airport_Code', 'Weather_Timestamp', 'Temperature(F)', 'Wind_Chill(F)', 'Humidity(%)', 'Pressure(in)', 'Visibility(mi)', 'Wind_Direction', 'Wind_Speed(mph)', 'Precipitation(in)', 'Weather_Condition', 'Amenity', 'Bump', 'Crossing', 'Give_Way', 'Junction', 'No_Exit', 'Railway', 'Roundabout', 'Station', 'Stop', 'Traffic_Calming', 'Traffic_Signal', 'Turning_Loop', 'Sunrise_Sunset', 'Civil_Twilight', 'Nautical_Twilight', 'Astronomical_Twilight']


In [43]:
# Select columns for analysis
columns_needed = ['Severity', 'State', 'City', 'Start_Time', 'Weather_Condition', 
                 'Civil_Twilight', 'Amenity', 'Bump', 'Crossing', 'Give_Way', 
                 'Junction', 'No_Exit', 'Railway', 'Stop', 'Traffic_Signal']

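# Check which columns exist
available_cols = [col for col in columns_needed if col in df.columns]
print(f"Available columns: {available_cols}")

# Drop rows with missing values, then draw a reproducible sample.
# min() guards against files with fewer than 50,000 complete rows.
df_clean = df[available_cols].dropna()
df_sample = df_clean.sample(n=min(50000, len(df_clean)), random_state=42)
print(f"Sample data shape: {df_sample.shape}")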

Available columns: ['Severity', 'State', 'City', 'Start_Time', 'Weather_Condition', 'Civil_Twilight', 'Amenity', 'Bump', 'Crossing', 'Give_Way', 'Junction', 'No_Exit', 'Railway', 'Stop', 'Traffic_Signal']
Sample data shape: (50000, 15)
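
Only 15 of the 46 columns are used after this point, so one way to reduce memory use would be to read just those columns in the first place. The sketch below assumes a hypothetical helper, `load_needed_columns`, and uses a tiny in-memory CSV as a stand-in for the real file; it relies on the `usecols` parameter of `pd.read_csv`, which accepts a callable so that names missing from the file are skipped rather than raising an error.

```python
import io
import pandas as pd

def load_needed_columns(source, wanted):
    """Read only the columns listed in `wanted` (hypothetical helper).

    Column names that are absent from the file are silently skipped.
    """
    wanted = set(wanted)
    return pd.read_csv(source, usecols=lambda name: name in wanted)

# Tiny in-memory stand-in for the real CSV file
demo_csv = io.StringIO(
    "ID,Severity,State,Description\n"
    "A-1,2,CA,example\n"
    "A-2,3,TX,example\n"
)

# 'City' is requested but not present in the demo data, so it is skipped
demo_df = load_needed_columns(demo_csv, ['Severity', 'State', 'City'])
print(demo_df.columns.tolist())
```

With the real data, the call would be along the lines of `load_needed_columns(csv_file, columns_needed)`.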


In [44]:
# Process road features data
road_features = ['Amenity', 'Bump', 'Crossing', 'Give_Way', 'Junction', 'No_Exit', 'Railway', 'Stop', 'Traffic_Signal']
existing_features = [col for col in road_features if col in df_sample.columns]

# Create state-level data for map
state_totals = df_sample.groupby('State').size().reset_index(name='total_accidents')
print("State totals created")

# Create state-level heatmap data
state_heatmaps = {}
for state in df_sample['State'].unique():
    state_data = df_sample[df_sample['State'] == state]
    heatmap_data = []
    
    for twilight in ['Day', 'Night']:
        twilight_data = state_data[state_data['Civil_Twilight'] == twilight]
        row = {'twilight': twilight}
        
        for feature in existing_features:
            # Number of rows where the feature flag is True (0 if the slice is empty)
            row[feature] = int((twilight_data[feature] == True).sum())
        
        heatmap_data.append(row)
    
    state_heatmaps[state] = heatmap_data

print("Heatmap data created")

State totals created
Heatmap data created
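
The nested loops above filter the sample once per state and twilight period. As an alternative, the same counts could be computed in a single `groupby` pass. This is a minimal sketch, assuming the feature columns are boolean; `build_state_heatmaps` is a hypothetical helper name and the small demo frame stands in for `df_sample`.

```python
import pandas as pd

def build_state_heatmaps(frame, features, periods=('Day', 'Night')):
    """Count True flags per state, twilight period, and feature (hypothetical helper)."""
    # Summing boolean columns counts the True values in each group
    counts = frame.groupby(['State', 'Civil_Twilight'])[list(features)].sum()
    heatmaps = {}
    for state in frame['State'].unique():
        rows = []
        for period in periods:
            row = {'twilight': period}
            for feature in features:
                if (state, period) in counts.index:
                    # int() converts NumPy integers so json.dump can serialize them
                    row[feature] = int(counts.loc[(state, period), feature])
                else:
                    # No accidents recorded for this state and period
                    row[feature] = 0
            rows.append(row)
        heatmaps[state] = rows
    return heatmaps

# Small stand-in for df_sample
demo = pd.DataFrame({
    'State': ['CA', 'CA', 'CA', 'TX'],
    'Civil_Twilight': ['Day', 'Day', 'Night', 'Day'],
    'Crossing': [True, False, True, True],
    'Junction': [False, False, True, False],
})
result = build_state_heatmaps(demo, ['Crossing', 'Junction'])
print(result['CA'])
```

The returned dictionary has the same shape as `state_heatmaps`: one list per state with a `Day` row and a `Night` row.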


In [45]:
# Save files for visualization
state_totals.to_csv('state_totals.csv', index=False)

with open('state_heatmaps.json', 'w') as f:
    json.dump(state_heatmaps, f)

print("Data processed successfully!")
print("Files created: state_totals.csv, state_heatmaps.json")

# Show basic stats
print("\nData overview:")
print(f"Civil Twilight distribution:\n{df_sample['Civil_Twilight'].value_counts()}")
print(f"\nTop 10 states by accidents:\n{state_totals.sort_values('total_accidents', ascending=False).head(10)}")

Data processed successfully!
Files created: state_totals.csv, state_heatmaps.json

Data overview:
Civil Twilight distribution:
Civil_Twilight
Day      37055
Night    12945
Name: count, dtype: int64

Top 10 states by accidents:
   State  total_accidents
3     CA            11264
8     FL             5767
41    TX             3760
38    SC             2413
32    NY             2306
25    NC             2242
36    PA             1942
43    VA             1932
21    MN             1247
9     GA             1173


## Visualization Design and Analysis

This analysis uses two complementary D3.js visualizations to examine traffic accident patterns across the United States. The first visualization is a choropleth map that displays accident frequency by state using color intensity. This approach allows for quick identification of geographic patterns and helps highlight states that may require focused safety interventions.

The second visualization is a heatmap that examines the relationship between road infrastructure features and time of day. It shows how often accidents are recorded near road elements such as traffic signals, crossings, and junctions, split into day and night using the `Civil_Twilight` field. Understanding these temporal patterns is crucial for developing targeted safety measures.

Our working hypotheses are as follows. We expect higher accident counts in densely populated states with extensive highway networks. For the temporal analysis, we expect accidents near traffic signals to peak during high-volume daytime hours, while crossings and junctions may show increased risk at night because of reduced visibility.

In [46]:
# Embed D3.js visualizations using IFrame workaround
import subprocess
import sys
import time
from IPython.display import IFrame

# Start the server in the background. Popen returns immediately, so no
# helper thread is needed; sys.executable reuses the notebook's interpreter.
server_process = subprocess.Popen(
    [sys.executable, 'generate_static.py'],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Give the server time to start
time.sleep(3)

# Display visualization in iframe with full height
print('D3.js Visualizations:')
display(IFrame('http://localhost:8001/static_visualization.html', width='100%', height=1600))

D3.js Visualizations:
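
The fixed three-second pause in the cell above can be too short on a slow machine, in which case the iframe loads before the server is listening and stays blank. One possible alternative is to poll the port until it accepts connections. The sketch below uses only the standard library; `wait_for_port` is a hypothetical helper, and the port number 8001 is taken from the URL in the cell above.

```python
import socket
import time

def wait_for_port(host, port, timeout=10.0, interval=0.2):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connection means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet: wait briefly and retry
            time.sleep(interval)
    return False

# Possible use in the notebook, in place of time.sleep(3):
#   if wait_for_port('localhost', 8001):
#       display(IFrame('http://localhost:8001/static_visualization.html',
#                      width='100%', height=1600))
#   else:
#       print('Server did not start within the timeout')
```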


## Results and Discussion

The geographic distribution reveals clear patterns in accident frequency across states. States with darker coloring on the map correspond to higher accident totals; in this sample, California (11,264), Florida (5,767), and Texas (3,760) lead. These are also the three most populous states, which is consistent with traffic volume and infrastructure complexity being primary drivers of accident frequency. Note that the totals are raw counts and are not normalized by population or miles driven.

The temporal analysis provides insight into infrastructure-related risk factors. Traffic signals show elevated accident counts during daytime hours, coinciding with peak commuting traffic. Conversely, crossings and junctions show proportionally higher nighttime counts, suggesting that reduced visibility and potentially impaired decision-making increase risk at these locations.
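
Because daytime records outnumber nighttime records in the sample (37,055 versus 12,945), raw counts will tend to be higher during the day for every feature. One way to compare the two periods on equal footing is to compute, within each period, the fraction of accidents that have a given feature flagged. The following is a minimal sketch with made-up demo rows, not the notebook's actual results; `feature_rate_by_twilight` is a hypothetical helper name.

```python
import pandas as pd

def feature_rate_by_twilight(frame, features):
    """Fraction of accidents in each twilight period with each feature flagged."""
    # The mean of a boolean column is the share of True values
    return frame.groupby('Civil_Twilight')[list(features)].mean()

# Illustrative rows only: four daytime and two nighttime accidents
demo = pd.DataFrame({
    'Civil_Twilight': ['Day', 'Day', 'Day', 'Day', 'Night', 'Night'],
    'Traffic_Signal': [True, True, False, False, True, False],
    'Crossing': [True, False, False, False, True, True],
})
rates = feature_rate_by_twilight(demo, ['Traffic_Signal', 'Crossing'])
print(rates)
```

In the demo rows, traffic signals appear in two daytime accidents and one nighttime accident, yet the share is 0.5 in both periods. A difference in raw counts alone therefore does not imply a difference in proportion.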

These findings have practical implications for transportation safety policy. States with higher accident concentrations could benefit from targeted resource allocation, particularly focusing on infrastructure improvements at high-risk locations. Enhanced lighting systems at crossings and optimized signal timing could address the specific temporal patterns observed in the data.

This analysis is based on a random sample of 50,000 records drawn from the 7,728,394 records in the source file, after dropping rows with missing values in the selected columns. The sample provides a substantial foundation but may not capture all seasonal or regional variations. Future research could expand this framework by incorporating weather conditions, seasonal patterns, and socioeconomic factors to develop more comprehensive risk models for traffic safety planning.