# Traffic Incident Analysis in Manila



## Introduction

Traffic incidents are a critical aspect of urban planning and safety management. Understanding the factors that contribute to these incidents can help policymakers and stakeholders make informed decisions to improve road safety and reduce the frequency and severity of accidents.

In this notebook, we explore a dataset of traffic incident reports spanning roughly two to three years. The dataset includes detailed information such as the location (latitude and longitude), time of day, date, and weather conditions at the time of the incident. Our analysis focuses on identifying patterns and correlations between these variables to determine how location, time of day, and weather relate to traffic incidents.

We will address several key questions:
- **Do specific locations experience a higher frequency of incidents?**
- **Are there certain times of day when incidents are more likely to occur?**
- **How does weather affect the frequency and severity of traffic incidents?**

By the end of this analysis, we aim to provide insights that can inform traffic safety strategies and improve overall road conditions. This notebook is intended for a general audience, including recruiters and other data scientists interested in the methodology behind the analysis.

In [11]:
# Load the dataset
import pandas as pd

df = pd.read_csv('data/data_mmda_traffic_spatial.csv')

In [12]:
df.head()

Unnamed: 0,Date,Time,City,Location,Latitude,Longitude,High_Accuracy,Direction,Type,Lanes_Blocked,Involved,Tweet,Source
0,2018-08-20,7:55 AM,Pasig City,ORTIGAS EMERALD,14.586343,121.061481,1,EB,VEHICULAR ACCIDENT,1.0,TAXI AND MC,MMDA ALERT: Vehicular accident at Ortigas Emer...,https://twitter.com/mmda/status/10313302019705...
1,2018-08-20,8:42 AM,Mandaluyong,EDSA GUADIX,14.589432,121.057243,1,NB,STALLED L300 DUE TO MECHANICAL PROBLEM,1.0,L300,MMDA ALERT: Stalled L300 due to mechanical pro...,https://twitter.com/mmda/status/10313462477459...
2,2018-08-20,9:13 AM,Makati City,EDSA ROCKWELL,14.559818,121.040737,1,SB,VEHICULAR ACCIDENT,1.0,SUV AND L300,MMDA ALERT: Vehicular accident at EDSA Rockwel...,https://twitter.com/mmda/status/10313589669896...
3,2018-08-20,8:42 AM,Mandaluyong,EDSA GUADIX,14.589432,121.057243,1,NB,STALLED L300 DUE TO MECHANICAL PROBLEM,1.0,L300,MMDA ALERT: Stalled L300 due to mechanical pro...,https://twitter.com/mmda/status/10313590696535...
4,2018-08-20,10:27 AM,San Juan,ORTIGAS CLUB FILIPINO,14.601846,121.046754,1,EB,VEHICULAR ACCIDENT,1.0,2 CARS,MMDA ALERT: Vehicular accident at Ortigas Club...,https://twitter.com/mmda/status/10313711248424...


In [13]:
# Check the data types of each column
df.dtypes

Date              object
Time              object
City              object
Location          object
Latitude         float64
Longitude        float64
High_Accuracy      int64
Direction         object
Type              object
Lanes_Blocked    float64
Involved          object
Tweet             object
Source            object
dtype: object

## Dataset Overview

The dataset consists of traffic incident records with the following columns:

| Column Name       | Data Type | Description                                 |
|-------------------|-----------|---------------------------------------------|
| `Date`            | object    | The date the incident occurred              |
| `Time`            | object    | The time the incident occurred              |
| `City`            | object    | City where the incident took place          |
| `Location`        | object    | Textual location description                |
| `Latitude`        | float64   | Latitude coordinate of the incident         |
| `Longitude`       | float64   | Longitude coordinate of the incident        |
| `High_Accuracy`   | int64     | Indicator of GPS accuracy (1 = high)        |
| `Direction`       | object    | Direction of travel (if available)          |
| `Type`            | object    | Incident type (e.g., accident, stalled van) |
| `Lanes_Blocked`   | float64   | Number of lanes blocked (if any)            |
| `Involved`        | object    | Entities involved (if mentioned)            |
| `Tweet`           | object    | Original tweet text describing the incident |
| `Source`          | object    | URL of the source tweet                     |

### Data Cleaning Note

The `Date` and `Time` columns are currently stored as `object` types. To enable proper temporal analysis, we will:

- Convert both `Date` and `Time` to appropriate datetime formats.
- Create a new column called `Datetime` by combining `Date` and `Time`.

This will help with:
- Sorting and filtering by time
- Extracting features like hour, day of week, etc.
- Time-based visualizations and aggregations
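
The conversion described above could look roughly like the following. This is a minimal sketch on a small hand-made frame rather than the real data: the `%I:%M %p` time format is an assumption based on the sample rows (e.g. `7:55 AM`), and the `sample`, `Hour`, and `DayOfWeek` names are illustrative rather than taken from the original notebook.

```python
import pandas as pd

# Tiny stand-in for the real dataset; the first two rows reuse values from the
# sample output above, and the third has a missing time to show how gaps behave.
sample = pd.DataFrame({
    "Date": ["2018-08-20", "2018-08-20", "2018-08-20"],
    "Time": ["7:55 AM", "10:27 AM", None],
})

# Join the date and time strings; a missing time yields a missing result.
combined = sample["Date"].str.cat(sample["Time"], sep=" ")

# Parse the combined string once. errors="coerce" turns unparseable or
# missing values into NaT instead of raising.
sample["Datetime"] = pd.to_datetime(
    combined, format="%Y-%m-%d %I:%M %p", errors="coerce"
)

# Parse the date column on its own as well.
sample["Date"] = pd.to_datetime(sample["Date"], format="%Y-%m-%d", errors="coerce")

# Derived features for later grouping and plotting.
sample["Hour"] = sample["Datetime"].dt.hour
sample["DayOfWeek"] = sample["Datetime"].dt.day_name()
```

Parsing a single combined string keeps the date and time in one `datetime64` column, so sorting, filtering, and the `.dt` accessors all work without further conversion. Rows with a missing time end up as `NaT` in `Datetime` and can be handled explicitly later.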

In [None]:
# 

[datetime.time(7, 55) datetime.time(8, 42) datetime.time(9, 13) ...
 datetime.time(2, 4) datetime.time(4, 44) datetime.time(2, 13)]


In [None]:
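# Fill missing 'Time' and 'Direction' values with the placeholder 'Unknown'.
# Note: if 'Time' was already converted to time objects, this leaves the
# column with mixed types (time values and the string 'Unknown').
df['Time'] = df['Time'].fillna('Unknown')
df['Direction'] = df['Direction'].fillna('Unknown')

# Drop rows that are missing any of 'City', 'Location', 'Type',
# 'Lanes_Blocked', or 'Involved'
df = df.dropna(subset=['City', 'Location', 'Type', 'Lanes_Blocked', 'Involved'])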
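
Because `dropna` removes a row when any of the listed columns is missing, it may be worth checking how many rows are affected before committing to it. The following is a sketch on a toy frame; `toy`, `required`, and the values are made up for illustration and are not from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the incident table with a few gaps.
toy = pd.DataFrame({
    "City": ["Pasig City", None, "Makati City"],
    "Direction": ["EB", None, "SB"],
    "Lanes_Blocked": [1.0, 1.0, np.nan],
})

# Count missing values per column to see where the gaps are.
missing_per_column = toy.isna().sum()

# Drop rows missing any required column and measure the loss.
required = ["City", "Lanes_Blocked"]
cleaned = toy.dropna(subset=required)
rows_dropped = len(toy) - len(cleaned)

print(missing_per_column)
print(f"Dropped {rows_dropped} of {len(toy)} rows")
```

If a single column accounts for most of the dropped rows, filling it with a placeholder instead of dropping may preserve more of the data.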