This notebook analyzes the City of Cape Town's service alerts, both planned and unplanned, and summarizes them by service area.

In [1]:
import json
from pathlib import Path

import pandas as pd

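# Find the latest service alerts files.
# max() picks the lexicographically greatest path, so this assumes the filename
# suffix sorts chronologically (e.g. a zero-padded timestamp).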
service_alerts_dir = Path('service-alerts-data')
planned_file = max(service_alerts_dir.glob('service_alerts_planned_*.json'))
unplanned_file = max(service_alerts_dir.glob('service_alerts_unplanned_*.json'))

# Read the JSON files
with open(planned_file, encoding='utf-8') as f:
    planned_alerts = json.load(f)

with open(unplanned_file, encoding='utf-8') as f:
    unplanned_alerts = json.load(f)

# Convert to DataFrames
planned_df = pd.DataFrame(planned_alerts)
unplanned_df = pd.DataFrame(unplanned_alerts)

# Add a column to identify the alert type
planned_df['alert_type'] = 'Planned'
unplanned_df['alert_type'] = 'Unplanned'

# Combine the dataframes
all_alerts = pd.concat([planned_df, unplanned_df], ignore_index=True)

# Count alerts per distinct value of each date/timestamp column.
# Note on effective_date: Map display: effective_date >= today.
date_columns = [
    'publish_date',
    'effective_date',
    'expiry_date',
    'start_timestamp',
    'forecast_end_timestamp',
]
for column in date_columns:
    print(all_alerts[column].value_counts())
    print()


publish_date
2025-06-09T22:00:00.000Z    27
2025-06-04T22:00:00.000Z    21
2025-06-08T22:00:00.000Z    19
2025-06-05T22:00:00.000Z    12
2025-06-06T22:00:00.000Z     9
2025-06-03T22:00:00.000Z     8
2025-06-07T22:00:00.000Z     4
2025-06-01T22:00:00.000Z     2
2025-05-19T22:00:00.000Z     2
2024-07-14T22:00:00.000Z     2
2025-06-25T22:00:00.000Z     1
2025-06-24T22:00:00.000Z     1
2025-06-23T22:00:00.000Z     1
2025-06-18T22:00:00.000Z     1
2025-06-12T22:00:00.000Z     1
2025-06-02T22:00:00.000Z     1
2025-05-31T22:00:00.000Z     1
Name: count, dtype: int64

effective_date
2025-06-09T22:00:00.000Z    22
2025-06-08T22:00:00.000Z    22
2025-06-04T22:00:00.000Z    21
2025-06-05T22:00:00.000Z    16
2025-06-06T22:00:00.000Z     9
2025-06-03T22:00:00.000Z     8
2025-06-07T22:00:00.000Z     4
2025-06-01T22:00:00.000Z     2
2025-05-20T22:00:00.000Z     2
2024-07-14T22:00:00.000Z     2
2025-06-12T22:00:00.000Z     1
2025-06-24T22:00:00.000Z     1
2025-06-23T22:00:00.000Z     1
2025-06-18T22:0
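
The date columns above are ISO 8601 strings in UTC, and `value_counts()` treats them as plain text. Every value shown ends in `22:00:00.000Z`, which is midnight in South Africa (UTC+2), so they look like local calendar dates. Below is a minimal sketch of parsing them and filtering on `effective_date`, as the map-display note in the code suggests. The sample rows, the `effective_local` and `effective_day` column names, and the fixed `reference_day` are illustrative assumptions, not part of the alerts data.

```python
import pandas as pd

# Hypothetical sample mirroring the timestamp format printed above.
sample = pd.DataFrame({
    'effective_date': [
        '2025-06-09T22:00:00.000Z',
        '2025-06-08T22:00:00.000Z',
        '2024-07-14T22:00:00.000Z',
    ]
})

# Parse the strings as timezone-aware UTC timestamps.
effective_utc = pd.to_datetime(sample['effective_date'], utc=True)

# Convert to South African time (UTC+2, no daylight saving).
sample['effective_local'] = effective_utc.dt.tz_convert('Africa/Johannesburg')

# Local calendar day, convenient for grouping and counting.
sample['effective_day'] = sample['effective_local'].dt.date

# Keep alerts effective on or after a reference day.
# A fixed day is assumed here so the result is reproducible; in practice
# this might be pd.Timestamp.now(tz='Africa/Johannesburg').normalize().
reference_day = pd.Timestamp('2025-06-09', tz='Africa/Johannesburg')
current = sample[sample['effective_local'] >= reference_day]

print(sample[['effective_date', 'effective_day']])
print(len(current), "alert(s) effective on or after", reference_day.date())
```

Once the columns are real datetimes, `value_counts().sort_index()` lists the counts in chronological order rather than by frequency.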

Let's analyze the distribution of alerts across different service areas, split by planned vs unplanned:

In [2]:
# Count alerts per service area, with one column per alert type
service_area_summary = pd.pivot_table(
    all_alerts,
    values='Id',
    index='service_area',
    columns='alert_type',
    aggfunc='count',
    fill_value=0
)

# Add a total column
service_area_summary['Total'] = service_area_summary.sum(axis=1)

# Sort by total number of alerts
service_area_summary = service_area_summary.sort_values('Total', ascending=False)

print("Service Alerts Summary:")
print("-" * 60)
print(service_area_summary)
print("\nTotal Alerts:", len(all_alerts))
print("Planned Alerts:", len(planned_df))
print("Unplanned Alerts:", len(unplanned_df))

Service Alerts Summary:
------------------------------------------------------------
alert_type                 Planned  Unplanned  Total
service_area                                        
Electricity                      8         38     46
Water & Sanitation               7         38     45
Refuse                           0         17     17
Drivers Licence Enquiries        0          3      3
Roads and Stormwater             2          0      2

Total Alerts: 113
Planned Alerts: 17
Unplanned Alerts: 96
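
As an aside, a similar summary can be built with `pd.crosstab`, which adds row and column totals in one call through `margins=True`. This is only a sketch on a small made-up frame: the `alerts` data below is hypothetical. Unlike the pivot table above, `crosstab` counts rows rather than non-null `Id` values, so the two could differ if any `Id` were missing.

```python
import pandas as pd

# Hypothetical miniature of all_alerts with only the two columns needed.
alerts = pd.DataFrame({
    'service_area': [
        'Electricity', 'Electricity', 'Refuse',
        'Water & Sanitation', 'Electricity',
    ],
    'alert_type': [
        'Planned', 'Unplanned', 'Unplanned',
        'Unplanned', 'Unplanned',
    ],
})

# Cross-tabulate service area against alert type; margins=True appends
# a 'Total' row and a 'Total' column.
summary = pd.crosstab(
    alerts['service_area'],
    alerts['alert_type'],
    margins=True,
    margins_name='Total',
)

# Sort service areas by total, keeping the grand-total row at the bottom.
body = summary.drop(index='Total').sort_values('Total', ascending=False)
summary_sorted = pd.concat([body, summary.loc[['Total']]])

print(summary_sorted)
```

Passing `normalize='index'` instead of `margins=True` would give each service area's planned/unplanned split as proportions.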


Let's create a bar chart to visualize the distribution of alerts by service area: