# 🔮 Mobile Speed Camera Data Processing

This notebook processes raw mobile speed camera data and converts it into the GeoJSON format required for the heatmap visualization.

## Overview

1. Load raw data
2. Clean and preprocess
3. Geocode locations
4. Analyze and aggregate
5. Convert to GeoJSON
6. Copy to web app


In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import json
import os
from pathlib import Path

## 1. Load Raw Data

First, let's load the mobile speed camera data from the CSV file.


In [2]:
# Set paths
ROOT_DIR = Path('../')
DATA_DIR = ROOT_DIR / 'data'
OUTPUT_DIR = ROOT_DIR / 'output'

# Create output directory if it doesn't exist
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Load the mobile speed camera data
camera_path = DATA_DIR / 'Mobile_Speed_Camera_Visits_and_Stays.csv'
df = pd.read_csv(camera_path, header=0)

# Display the first few rows
df.head()

Unnamed: 0,Date,TimeAtSiteInHours,Description of Site,Camera Location,Street,Number Checked,Highest Speed,Average Speed,Posted Speed
0,12/04/2023,1.57,Barron Street/Stonehaven Crescent,0288A,Strickland Crescent,438,0.0,0.0,50
1,18/04/2023,1.42,Belconnen Way & Ginninderra Drive,0098E,Haydon Drive,412,0.0,0.0,80
2,20/04/2023,1.58,Between Brown St Dudley Street,0052B,Novar Street,214,0.0,0.0,60
3,12/04/2023,1.52,Between Brown Street & Dudley Street,0052C,Novar Street,176,0.0,0.0,60
4,14/04/2023,1.58,Between Hopetoun Circuit and Strickland Crescent,0105A,Stonehaven Crescent,380,0.0,0.0,50


In [3]:
# Basic data exploration
print(f"Dataset shape: {df.shape}")
df.info()

Dataset shape: (83486, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 83486 entries, 0 to 83485
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Date                 83486 non-null  object 
 1   TimeAtSiteInHours    83468 non-null  float64
 2   Description of Site  83486 non-null  object 
 3   Camera Location      83486 non-null  object 
 4   Street               83486 non-null  object 
 5   Number Checked       83486 non-null  int64  
 6   Highest Speed        83430 non-null  float64
 7   Average Speed        83481 non-null  float64
 8   Posted Speed         83486 non-null  int64  
dtypes: float64(3), int64(2), object(4)
memory usage: 5.7+ MB


## 2. Clean and Preprocess Data

Now let's clean the data and prepare it for geocoding.


In [4]:
# Clean column names (strip whitespace, lowercase, replace spaces with underscores)
df.columns = [col.strip().lower().replace(' ', '_') for col in df.columns]

# Show columns after cleaning
df.columns

Index(['date', 'timeatsiteinhours', 'description_of_site', 'camera_location',
       'street', 'number_checked', 'highest_speed', 'average_speed',
       'posted_speed'],
      dtype='object')

In [5]:
# Check for missing values
print("Missing values by column:")
df.isna().sum()

Missing values by column:


date                    0
timeatsiteinhours      18
description_of_site     0
camera_location         0
street                  0
number_checked          0
highest_speed          56
average_speed           5
posted_speed            0
dtype: int64

In [6]:
# Convert date to datetime and extract useful features
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month

# Show the transformed data
df[['date', 'year', 'month']].head()

Unnamed: 0,date,year,month
0,2023-04-12,2023,4
1,2023-04-18,2023,4
2,2023-04-20,2023,4
3,2023-04-12,2023,4
4,2023-04-14,2023,4
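
The explicit `format='%d/%m/%Y'` matters because a string such as `12/04/2023` is ambiguous between 12 April and 4 December. The following is a minimal sketch on a few date strings copied from the output above; the `errors="coerce"` variant and the `"not a date"` value are illustrative assumptions, not part of the original pipeline.

```python
import pandas as pd

# Sample strings copied from the Date column shown earlier
raw = pd.Series(["12/04/2023", "18/04/2023", "20/04/2023"])

# Explicit day-first format, as used in the cell above
parsed = pd.to_datetime(raw, format="%d/%m/%Y")
print(parsed.dt.day.tolist(), parsed.dt.month.tolist())

# Assumption: if malformed rows may exist, errors="coerce" turns them
# into NaT instead of raising, so they can be counted and inspected
with_bad = pd.to_datetime(
    pd.Series(["12/04/2023", "not a date"]),
    format="%d/%m/%Y",
    errors="coerce",
)
print(with_bad.isna().tolist())
```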


## 3. Geocoding Locations

We need to map the locations to coordinates. For this, we could either:

1. Use a geocoding service
2. Create a mapping of known locations

Let's check what unique locations we have first.


In [7]:
# Check unique streets
print(f"Number of unique streets: {df['street'].nunique()}")
print("\nSample of streets:")
df['street'].unique()[:20]

Number of unique streets: 314

Sample of streets:


array(['Strickland Crescent', 'Haydon Drive', 'Novar Street',
       'Stonehaven Crescent', 'Gundaroo Drive', 'McCulloch Street',
       'Shumack Street', 'Kosciuszko Avenue', 'Bangalay Crescent',
       'Goyder Street', 'Hodgson Crescent', 'Shoalhaven Avenue',
       'Blackwood Terrace', 'Kings Avenue', 'Dixon Drive',
       'Duggan Street', 'Burrinjuck Crecent', 'Melrose Drive',
       'Learmonth Drive', 'Drakeford Drive'], dtype=object)
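
The sample above contains at least one misspelled name (`'Burrinjuck Crecent'`), which a geocoding service may fail to match or may match incorrectly. One possible approach is to normalise street names before geocoding. The sketch below is an assumption about how that could look: `STREET_FIXES` and `normalise_street` are hypothetical names, and the corrections table holds only the one entry visible in the output.

```python
# Hypothetical corrections table; only this entry is taken from the sample output
STREET_FIXES = {"Burrinjuck Crecent": "Burrinjuck Crescent"}


def normalise_street(name: str) -> str:
    """Collapse repeated whitespace and apply known spelling corrections."""
    cleaned = " ".join(name.split())
    return STREET_FIXES.get(cleaned, cleaned)


print(normalise_street("  Burrinjuck  Crecent "))
print(normalise_street("Kings Avenue"))
```

In the notebook this could be applied with `df['street'].map(normalise_street)` before collecting the unique streets, so that the cache keys and the DataFrame values stay consistent.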

In [32]:
# Geocode each unique street with Nominatim, caching the results on disk

from geopy.geocoders import Nominatim
from geopy.exc import GeocoderTimedOut, GeocoderServiceError
import time
import json
from tqdm import tqdm  # For progress bar

# First check if we already have saved coordinates
coordinates_file = OUTPUT_DIR / 'street_coordinates.json'
street_coordinates = {}

# Load existing coordinates if available
if os.path.exists(coordinates_file):
    with open(coordinates_file, 'r') as f:
        print(f"Loading existing coordinates from {coordinates_file}")
        street_coordinates = json.load(f)
    print(f"Loaded {len(street_coordinates)} street coordinates")

# Function to safely geocode an address with retries
def geocode_with_retry(geolocator, address, attempt=1, max_attempts=5):
    try:
        # Append "ACT, Australia" to make the query more specific
        full_address = f"{address}, ACT, Australia"
        return geolocator.geocode(full_address)
    except (GeocoderTimedOut, GeocoderServiceError):
        # Retry until max_attempts calls have been made in total
        if attempt < max_attempts:
            time.sleep(1)  # Wait before retrying
            # Pass max_attempts through so a caller's override is respected
            return geocode_with_retry(geolocator, address, attempt=attempt + 1, max_attempts=max_attempts)
        return None

# Create a geocoder instance with a custom user agent
geolocator = Nominatim(user_agent="data_alchemy_mobile_camera_processing")

# Get the unique streets
unique_streets = df['street'].unique()
print(f"Found {len(unique_streets)} unique streets to geocode")

# Find streets that need to be geocoded
streets_to_geocode = [street for street in unique_streets if street not in street_coordinates]
print(f"Need to geocode {len(streets_to_geocode)} new streets")

# Geocode each street that doesn't have coordinates yet
if streets_to_geocode:
    for street in tqdm(streets_to_geocode, desc="Geocoding streets"):
        if street not in street_coordinates:  # Double check
            location = geocode_with_retry(geolocator, street)
            
            if location:
                street_coordinates[street] = {'lat': location.latitude, 'lon': location.longitude}
            else:
                print(f"Could not find coordinates for: {street}")
            
            # Be nice to the geocoding service
            time.sleep(1)
            
            # Save progress after each attempt so an interrupted run can resume
            with open(coordinates_file, 'w') as f:
                json.dump(street_coordinates, f, indent=2)

# Print the results
print(f"Successfully geocoded {len(street_coordinates)} out of {len(unique_streets)} streets")
print(f"Coordinates saved to {coordinates_file}")

# For streets that couldn't be geocoded, add some default values for important ones
# This is a pre-defined list of important streets in Canberra
default_coordinates = {
    'Gungahlin Drive': {'lat': -35.2150, 'lon': 149.1150},
    'Horse Park Drive': {'lat': -35.1750, 'lon': 149.1450},
    'Parkes Way': {'lat': -35.2900, 'lon': 149.1390},
    'Belconnen Way': {'lat': -35.2440, 'lon': 149.0740},
    'William Hovell Drive': {'lat': -35.2750, 'lon': 149.0300},
    'Hindmarsh Drive': {'lat': -35.3560, 'lon': 149.1000},
    'Athllon Drive': {'lat': -35.3950, 'lon': 149.0850},
    'Yamba Drive': {'lat': -35.3750, 'lon': 149.1000},
    'Tuggeranong Parkway': {'lat': -35.3400, 'lon': 149.0450},
    'Monaro Highway': {'lat': -35.3890, 'lon': 149.1500},
    'Drakeford Drive': {'lat': -35.4250, 'lon': 149.0700},
    'Canberra Avenue': {'lat': -35.3430, 'lon': 149.1770},
    'Barton Highway': {'lat': -35.2100, 'lon': 149.0950},
    'Federal Highway': {'lat': -35.2370, 'lon': 149.1600},
    'Kings Highway': {'lat': -35.3520, 'lon': 149.2350},
    'Lady Denman Drive': {'lat': -35.2900, 'lon': 149.0850},
    'Mugga Lane': {'lat': -35.3780, 'lon': 149.1300},
    'Sutton Road': {'lat': -35.1900, 'lon': 149.2600},
    'Kingsford Smith Drive': {'lat': -35.2250, 'lon': 149.0350},
    'Macarthur Avenue': {'lat': -35.2550, 'lon': 149.1300},
    'Antill Street': {'lat': -35.2450, 'lon': 149.1420},
    'Namatjira Drive': {'lat': -35.3550, 'lon': 149.0380}
}

# Add default coordinates for important streets that might be missing
for street, coords in default_coordinates.items():
    if street not in street_coordinates:
        street_coordinates[street] = coords
        print(f"Added default coordinates for {street}")

# Save final version with defaults included
with open(coordinates_file, 'w') as f:
    json.dump(street_coordinates, f, indent=2)

print(f"Final coordinates count: {len(street_coordinates)}")

Loading existing coordinates from ../output/street_coordinates.json
Loaded 314 street coordinates
Found 314 unique streets to geocode
Need to geocode 0 new streets
Successfully geocoded 314 out of 314 streets
Coordinates saved to ../output/street_coordinates.json
Final coordinates count: 314
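
The retry logic in the geocoding cell can also be written as a loop with a bounded number of attempts and a growing delay between them. The sketch below is a generic version under stated assumptions: `call_with_retry` and `flaky_lookup` are hypothetical names, and `flaky_lookup` is a stand-in for `geolocator.geocode` so the example runs without network access.

```python
import time


def call_with_retry(func, *args, max_attempts=3, base_delay=1.0,
                    retry_on=(Exception,), sleep=time.sleep):
    """Call func(*args); on a listed exception, wait and try again.

    Makes at most max_attempts calls and returns None if all of them fail.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args)
        except retry_on:
            if attempt == max_attempts:
                return None
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


# Stand-in for geolocator.geocode: fails twice, then succeeds
calls = {"n": 0}


def flaky_lookup(address):
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return f"found {address}"


# sleep is replaced with a no-op so the demonstration finishes immediately
result = call_with_retry(flaky_lookup, "Kings Avenue",
                         retry_on=(TimeoutError,), sleep=lambda s: None)
print(result, calls["n"])
```

With geopy, `retry_on` would be `(GeocoderTimedOut, GeocoderServiceError)` and `func` would be `geolocator.geocode`. Passing `sleep` as a parameter keeps the function easy to test.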


In [33]:

# Function to add coordinates based on the street name
def add_coordinates(row):
    street = row['street']
    if street in street_coordinates:
        return pd.Series([street_coordinates[street]['lat'], street_coordinates[street]['lon']])
    else:
        # Leave coordinates empty for streets without a known location
        return pd.Series([None, None])

# Apply the function to add latitude and longitude columns
df[['latitude', 'longitude']] = df.apply(add_coordinates, axis=1)

# Check how many locations were successfully geocoded
print(f"Locations with coordinates: {df['latitude'].notna().sum()} out of {len(df)}")

# Display sample with coordinates
df[df['latitude'].notna()].head()

Locations with coordinates: 83486 out of 83486


Unnamed: 0,date,timeatsiteinhours,description_of_site,camera_location,street,number_checked,highest_speed,average_speed,posted_speed,year,month,latitude,longitude
0,2023-04-12,1.57,Barron Street/Stonehaven Crescent,0288A,Strickland Crescent,438,0.0,0.0,50,2023,4,-35.318611,149.101296
1,2023-04-18,1.42,Belconnen Way & Ginninderra Drive,0098E,Haydon Drive,412,0.0,0.0,80,2023,4,-35.242947,149.090703
2,2023-04-20,1.58,Between Brown St Dudley Street,0052B,Novar Street,214,0.0,0.0,60,2023,4,-35.31288,149.098216
3,2023-04-12,1.52,Between Brown Street & Dudley Street,0052C,Novar Street,176,0.0,0.0,60,2023,4,-35.31288,149.098216
4,2023-04-14,1.58,Between Hopetoun Circuit and Strickland Crescent,0105A,Stonehaven Crescent,380,0.0,0.0,50,2023,4,-35.317822,149.108902
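
Row-wise `df.apply(..., axis=1)` works, but it calls a Python function once per row, which can be slow for a frame of this size. A possible alternative is a vectorised lookup with `Series.map`. This is a sketch on toy data: the street names and coordinate values below are placeholders, not real locations.

```python
import pandas as pd

# Toy lookup in the same shape as street_coordinates; values are placeholders
coords = {
    "A Street": {"lat": -1.0, "lon": 10.0},
    "B Street": {"lat": -2.0, "lon": 20.0},
}
toy = pd.DataFrame({"street": ["A Street", "B Street", "A Street", "C Street"]})

# One row per street, with 'lat' and 'lon' columns, indexed by street name
lookup = pd.DataFrame.from_dict(coords, orient="index")

# Series.map aligns on the lookup index; streets without an entry become NaN
toy["latitude"] = toy["street"].map(lookup["lat"])
toy["longitude"] = toy["street"].map(lookup["lon"])
print(toy)
```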


## 4. Analyze and Aggregate Data

Now let's aggregate the data by street: total hours on site, number of visits, and total vehicles checked.


In [35]:
# Group by street and calculate total hours, number of visits, and total vehicles checked
street_stats = df.groupby('street').agg(
    total_hours=('timeatsiteinhours', 'sum'),
    visits=('date', 'count'),
    total_checked=('number_checked', 'sum')
).sort_values('visits', ascending=False)

# Display top 20 most visited streets
street_stats.head(20)

Unnamed: 0_level_0,total_hours,visits,total_checked
street,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Monaro Highway,5040.99,3774,4421729
Athllon Drive,3184.7,2377,1880615
Gungahlin Drive,2512.57,1936,2196939
Belconnen Way,2502.91,1929,1451788
Hindmarsh Drive,2340.76,1774,1552137
Canberra Avenue,2218.38,1675,1299611
Majura Parkway,1958.83,1494,1606654
Yamba Drive,1938.47,1494,848083
Ginninderra Drive,1933.86,1478,965716
Northbourne Avenue,1796.94,1374,1247971


In [36]:
# Add coordinates to the aggregated data
def add_coords_to_agg(index):
    street = index
    if street in street_coordinates:
        return street_coordinates[street]['lat'], street_coordinates[street]['lon']
    else:
        return None, None
    
# Add latitude and longitude to the aggregated data
street_stats['latitude'], street_stats['longitude'] = zip(*street_stats.index.map(add_coords_to_agg))

# Create a DataFrame for streets with coordinates
geo_data = street_stats.dropna(subset=['latitude', 'longitude']).reset_index()

# Normalize the visits to get intensity values between 0-100
geo_data['intensity'] = 100 * (geo_data['visits'] - geo_data['visits'].min()) / (geo_data['visits'].max() - geo_data['visits'].min())

# Display the result
geo_data.head()

Unnamed: 0,street,total_hours,visits,total_checked,latitude,longitude,intensity
0,Monaro Highway,5040.99,3774,4421729,-35.315856,149.16968,100.0
1,Athllon Drive,3184.7,2377,1880615,-35.420157,149.065853,62.973761
2,Gungahlin Drive,2512.57,1936,2196939,-35.258908,149.088125,51.285449
3,Belconnen Way,2502.91,1929,1451788,-35.243707,149.055269,51.09992
4,Hindmarsh Drive,2340.76,1774,1552137,-35.341428,149.046281,46.991784
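
The intensity column is a min-max rescaling of `visits`: the most-visited street maps to 100 and the least-visited street maps to 0. The formula divides by `max - min`, so it breaks down if every street has the same count. Below is a minimal sketch of the same scaling with a guard for that case; `min_max_scale` is a hypothetical helper, and returning zeros for a constant series is an assumed convention.

```python
import pandas as pd


def min_max_scale(values: pd.Series, upper: float = 100.0) -> pd.Series:
    """Rescale values linearly so the minimum maps to 0 and the maximum to upper."""
    lo, hi = values.min(), values.max()
    if hi == lo:
        # Assumed convention: a constant series has no spread, so return zeros
        return pd.Series(0.0, index=values.index)
    return upper * (values - lo) / (hi - lo)


# Visit counts for the top three streets from the table above
visits = pd.Series([3774, 2377, 1936], index=["Monaro Highway", "Athllon Drive", "Gungahlin Drive"])
scaled = min_max_scale(visits)
print(scaled)
```

The values printed here differ from the notebook's `intensity` column, because the notebook takes the minimum over all streets rather than over these three.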


## 5. Convert to GeoJSON

Now let's transform our data to GeoJSON format for visualization.


In [37]:
# Create GeoJSON features
features = []
for _, row in geo_data.iterrows():
    feature = {
        "type": "Feature",
        "properties": {
            "intensity": float(row['intensity']),
            "location": row['street'],
            "visits": int(row['visits']),
            "hours": float(row['total_hours']),
            "checked": int(row['total_checked'])
        },
        "geometry": {
            "type": "Point",
            "coordinates": [float(row['longitude']), float(row['latitude'])]
        }
    }
    features.append(feature)

# Create the GeoJSON structure
geojson_data = {
    "type": "FeatureCollection",
    "features": features
}

# Preview the first feature
geojson_data["features"][0]

{'type': 'Feature',
 'properties': {'intensity': 100.0,
  'location': 'Monaro Highway',
  'visits': 3774,
  'hours': 5040.99,
  'checked': 4421729},
 'geometry': {'type': 'Point', 'coordinates': [149.1696805, -35.315856]}}
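
GeoJSON positions are ordered `[longitude, latitude]`, the reverse of the more familiar "lat, lon" convention, so swapped coordinates are a common mistake. The sketch below is a simple sanity check rather than a full validator; `check_point_feature` is a hypothetical helper, and the sample feature is the one previewed above.

```python
def check_point_feature(feature: dict) -> list:
    """Return a list of problems found in a GeoJSON Point feature (empty if none)."""
    problems = []
    if feature.get("type") != "Feature":
        problems.append("type is not 'Feature'")
    geometry = feature.get("geometry") or {}
    if geometry.get("type") != "Point":
        problems.append("geometry type is not 'Point'")
    coords = geometry.get("coordinates") or []
    if len(coords) < 2:
        problems.append("coordinates need at least [lon, lat]")
        return problems
    lon, lat = coords[0], coords[1]
    if not -180 <= lon <= 180:
        problems.append(f"longitude out of range: {lon}")
    if not -90 <= lat <= 90:
        problems.append(f"latitude out of range: {lat}")
    return problems


good = {
    "type": "Feature",
    "properties": {"location": "Monaro Highway"},
    "geometry": {"type": "Point", "coordinates": [149.1696805, -35.315856]},
}
# Same feature with the coordinates accidentally swapped to [lat, lon]
swapped = {**good, "geometry": {"type": "Point", "coordinates": [-35.315856, 149.1696805]}}

print(check_point_feature(good))
print(check_point_feature(swapped))
```

A range check catches the swap for Canberra because a longitude near 149 is not a valid latitude. It would not catch a swap for locations where both values fall within ±90.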

In [38]:
# Save the GeoJSON data to a file
output_file = OUTPUT_DIR / 'speed_cameras.json'
with open(output_file, 'w') as f:
    json.dump(geojson_data, f, indent=2)

print(f"GeoJSON data saved to {output_file}")

GeoJSON data saved to ../output/speed_cameras.json


In [39]:
# Also save a CSV version for compatibility
csv_output = OUTPUT_DIR / 'speed_cameras.csv'
geo_data[['street', 'latitude', 'longitude', 'intensity', 'visits', 'total_hours', 'total_checked']].to_csv(csv_output, index=False)
print(f"CSV data saved to {csv_output}")

CSV data saved to ../output/speed_cameras.csv


## 6. Copy to Web App

Finally, let's copy the processed files to the client/data directory so they can be used by the web application.


In [None]:
import shutil

# Define paths
client_data_dir = ROOT_DIR / '..' / 'client' / 'data'

# Ensure the client data directory exists
os.makedirs(client_data_dir, exist_ok=True)

# Copy the GeoJSON file
shutil.copy(output_file, client_data_dir / 'speed_cameras.json')

# Copy the CSV file
shutil.copy(csv_output, client_data_dir / 'speed_cameras.csv')

print(f"Files copied to web app directory: {client_data_dir}")

Files copied to web app directory: ../../client/data
