## City of Toronto Collisions Data


The Total Collisions Dataset is distributed as a CSV file (the cells below load GeoJSON exports) containing detailed records of motor vehicle collisions within the City of Toronto. Locations use the WGS84 coordinate reference system (EPSG:4326), so collision points can be placed consistently on a map. Key attributes include the geographic location of each collision, whether it resulted in a fatality or injury, and the timestamp of the event. For our analysis, we will focus on data from 2021 to 2024 to align with recent census data, providing insight into contemporary collision trends and patterns. The dataset may also include supplementary fields such as road conditions, visibility, and the types of vehicles involved, which help describe the factors contributing to these incidents. By analyzing this dataset, we aim to identify high-risk areas and underlying causes of collisions to inform preventative strategies and improve road safety.

## Setup Notebook

In [3]:
# Import 3rd party libraries
import warnings

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import geopandas as gpd
import folium
from IPython.display import display
from shapely.geometry import Point

# Configure Notebook
%matplotlib inline
plt.style.use('fivethirtyeight')
sns.set_context("notebook")
warnings.filterwarnings('ignore')

## Import GeoJSON Data

In [None]:
# Create a base map centred on downtown Toronto
map_2 = folium.Map(location=[43.6426, -79.3871],
                   tiles='OpenStreetMap',
                   zoom_start=10)

# Path to the GeoJSON export of the collision data
geojson_file_path = "/Users/aaliyashaikh/Desktop/Traffic_Collisions_Open_Data_-6024052409346627848.geojson"

# Add the collision layer, reporting a clear error if the file is missing
try:
    folium.GeoJson(geojson_file_path, name="Collision Data").add_to(map_2)
except FileNotFoundError:
    print(f"Error: The file '{geojson_file_path}' was not found. Please verify the path.")

# Display the map in the notebook
display(map_2)

In [None]:
# Import dataset as a GeoDataFrame
collision_data = gpd.read_file('FATALS_KSI_4359710384762535516.geojson')

# View DataFrame
collision_data.head()

## Data Analysis

In [None]:
# Check the number of columns and rows
collision_data.shape

In [None]:
# Check the columns in DataFrame
collision_data.columns

Based on the Toronto Police Service's Traffic Collisions Open Data (ASR-T-TBL-001), here is a description of each column in the dataset:

- OBJECTID: A unique identifier for each collision record.
- EVENT_UNIQUE_ID: A unique code assigned to each specific event.
- OCC_DATE: The date and time of the collision occurrence.
- OCC_MONTH: The month in which the collision occurred.
- OCC_DOW: The day of the week on which the collision happened.
- OCC_YEAR: The year in which the collision occurred.
- OCC_HOUR: The hour of the day (24-hour format) when the collision took place.
- DIVISION: The division code where the incident was recorded.
- FATALITIES: The number of fatalities resulting from the collision.
- INJURY_COLLISIONS: Indicates whether the collision involved injuries (e.g., "YES" or "NO").
- FTR_COLLISIONS: Indicates whether the collision was a "Fail to Remain" incident.
- PD_COLLISIONS: Indicates whether the collision involved property damage.
- HOOD_158: Numeric code representing the neighborhood where the collision occurred.
- NEIGHBOURHOOD_158: The name of the neighborhood and its associated number.
- LONG_WGS84: Longitude of the collision location (in WGS84 coordinate system).
- LAT_WGS84: Latitude of the collision location (in WGS84 coordinate system).
- AUTOMOBILE: Indicates if an automobile was involved in the collision (e.g., "YES" or "NO").
- MOTORCYCLE: Indicates if a motorcycle was involved in the collision.
- PASSENGER: Indicates if a passenger vehicle was involved in the collision.
- BICYCLE: Indicates if a bicycle was involved in the collision.
- PEDESTRIAN: Indicates if a pedestrian was involved in the collision.
- x: A spatial coordinate (likely projected coordinate system) for mapping the location.
- y: Another spatial coordinate for mapping the location.
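
Several of these columns are described as YES/NO flags (e.g., INJURY_COLLISIONS and AUTOMOBILE), and the other vehicle-type columns likely follow the same convention. A minimal sketch, assuming the flags are stored as the strings "YES" and "NO" (matched case-insensitively here), converts them to pandas' nullable boolean dtype so they can be summed or averaged directly. The helper name `yes_no_to_bool` and the toy rows are hypothetical.

```python
import pandas as pd


def yes_no_to_bool(df, columns):
    """Return a copy of df with 'YES'/'NO' flag columns converted to nullable booleans.

    Matching ignores case and surrounding spaces; any other value,
    including a missing one, becomes <NA>.
    """
    out = df.copy()
    for col in columns:
        normalized = out[col].str.strip().str.upper()
        out[col] = normalized.map({"YES": True, "NO": False}).astype("boolean")
    return out


# Hypothetical toy rows, not taken from the dataset
toy = pd.DataFrame({"BICYCLE": ["YES", "NO", None],
                    "PEDESTRIAN": ["no", " YES", "NO"]})
flags = yes_no_to_bool(toy, ["BICYCLE", "PEDESTRIAN"])
print(flags["BICYCLE"].sum())  # collisions flagged as involving a bicycle (<NA> is skipped)
```

Keeping unrecognised values as <NA> rather than False avoids silently counting unrecorded cases as "no".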

In [None]:
# Check data types per column
print(collision_data.dtypes)

In [None]:
# Check summary statistics for the numerical columns
collision_data.describe()

When analyzing collision data, the columns to drop depend on your analysis goals. However, some columns may be less relevant or redundant for most collision-related analyses. Here’s a suggestion of columns you can drop and why:

Columns that can be dropped:
- OBJECTID: A unique identifier that is not informative for analysis.
- EVENT_UNIQUE_ID: Similar to OBJECTID, this is unique to each record and not relevant for statistical or pattern analysis.
- HOOD_158: Numeric code for neighborhoods is redundant when the neighborhood name (NEIGHBOURHOOD_158) is available.
- x and y: Projected spatial coordinates may be redundant if LONG_WGS84 and LAT_WGS84 are present and sufficient for geospatial analysis.
- OCC_MONTH: The month can be derived from the date in OCC_DATE if needed.
- OCC_YEAR: Similarly, the year can also be derived from OCC_DATE.
- DIVISION: This may be redundant if neighborhood-level analysis is sufficient or if it doesn’t contribute to your objectives.
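- PASSENGER: This column may overlap with AUTOMOBILE since most passenger vehicles are classified as automobiles. You can keep one depending on your needs.

The drop cell below also removes AUTOMOBILE, MOTORCYCLE, FTR_COLLISIONS and PD_COLLISIONS, which narrows the dataset to the fields used in the analyses that follow.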
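
Before dropping HOOD_158 as redundant, it is worth confirming that it really maps one-to-one onto NEIGHBOURHOOD_158; the same check applies to any pair of columns believed to encode the same information. A minimal sketch with a hypothetical helper `is_one_to_one` and made-up rows:

```python
import pandas as pd


def is_one_to_one(df, col_a, col_b):
    """True if each value of col_a pairs with at most one value of col_b, and vice versa."""
    a_to_b = df.groupby(col_a)[col_b].nunique()
    b_to_a = df.groupby(col_b)[col_a].nunique()
    return bool((a_to_b <= 1).all() and (b_to_a <= 1).all())


# Hypothetical rows; real neighbourhood names are replaced by placeholders
toy = pd.DataFrame({
    "HOOD_158": [1, 1, 2],
    "NEIGHBOURHOOD_158": ["Neighbourhood A (1)", "Neighbourhood A (1)", "Neighbourhood B (2)"],
})
print(is_one_to_one(toy, "HOOD_158", "NEIGHBOURHOOD_158"))
```

If the check returns False, the code and the name are not interchangeable and both may be worth keeping.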

In [None]:
# Drop columns 
collision_data = collision_data.drop(columns = ['OBJECTID', 'EVENT_UNIQUE_ID', 'HOOD_158', 'x', 'y',
                                      'OCC_MONTH', 'OCC_YEAR', 'DIVISION', 'PASSENGER', 'AUTOMOBILE',
                                      'FTR_COLLISIONS', 'PD_COLLISIONS', 'MOTORCYCLE'], errors='ignore')

# Check if columns are removed
collision_data.columns

## Data Cleaning

Data cleaning for the collision dataset ensures:

- Accuracy: Removes errors and inconsistencies.
- Efficiency: Streamlines the dataset for quicker and easier analysis.
- Reliability: Produces trustworthy insights and recommendations.
- Focus: Tailors the data for the specific analysis of collision patterns and trends.

In [None]:
# Check for missing values
print(collision_data.isnull().sum())

Handling missing values should be approached differently for numerical and categorical data to preserve the dataset's integrity and reliability. For numerical columns, such as LATITUDE and LONGITUDE, missing values can be imputed with a summary statistic: the mean when the distribution is roughly symmetric, or the median when it is skewed, since the median is less affected by extreme values. This keeps numerical columns complete for analysis and modeling while limiting the distortion that imputation introduces. For categorical columns, such as the YES/NO indicator fields (e.g., BICYCLE or PEDESTRIAN), missing values often carry contextual meaning, such as data that were unrecorded or unknown. Imputing them with the most frequent category (the mode) could distort the actual patterns and introduce bias. Instead, categorical columns are better handled by retaining the NaN values or replacing them with a placeholder such as "Unknown" or "Not Recorded", which preserves their context. Treating the two data types differently keeps the analysis accurate without compromising the dataset's validity.
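
The placeholder approach for categorical columns can be sketched as follows. The helper `fill_categorical_placeholder` is hypothetical and, as an assumption, treats every object or string column as categorical; numeric and datetime columns are left untouched, so it complements numerical imputation rather than replacing it.

```python
import pandas as pd


def fill_categorical_placeholder(df, placeholder="Unknown"):
    """Return a copy of df with missing values in text columns replaced by a placeholder.

    Assumes every object/string column is categorical; numeric and datetime
    columns are not touched.
    """
    out = df.copy()
    text_cols = out.select_dtypes(include=["object", "string"]).columns
    out[text_cols] = out[text_cols].fillna(placeholder)
    return out


# Hypothetical toy rows
toy = pd.DataFrame({"BICYCLE": ["YES", None], "FATALITIES": [1.0, None]})
filled = fill_categorical_placeholder(toy)
print(filled)
```

The numeric FATALITIES value stays NaN here, leaving it for the numerical imputation step.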

In [None]:
# Function to impute missing values in numerical columns.
# Categorical columns are deliberately left as NaN (see the discussion above).
# Note: the DataFrame is modified in place and also returned.
def handle_missing_data(df):
    for column in df.columns:
        if df[column].isnull().sum() > 0:  # Only process columns with missing values
            if pd.api.types.is_numeric_dtype(df[column]):
                # Numerical data: use the mean or median depending on skewness
                skewness = df[column].skew()
                if abs(skewness) < 0.5:  # Roughly symmetric distribution
                    impute_value = df[column].mean()
                    print(f"Imputing missing values in numerical column '{column}' with mean: {impute_value:.2f}")
                else:  # Skewed distribution
                    impute_value = df[column].median()
                    print(f"Imputing missing values in numerical column '{column}' with median: {impute_value:.2f}")
                # Assign back rather than calling fillna(inplace=True) on a column,
                # which does not reliably update the DataFrame under copy-on-write
                df[column] = df[column].fillna(impute_value)
    return df

# Handle missing data
collision_handled = handle_missing_data(collision_data)

In [None]:
# Verify that missing values have been handled
print(collision_handled.isnull().sum())

All missing values in numerical columns have now been imputed; missing values in categorical columns are intentionally left as NaN. Next, let's remove duplicate rows.

In [None]:
# Remove Duplicates
collision_data.drop_duplicates(inplace=True)

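Now let's remove outliers with the Interquartile Range (IQR) method: for each numeric column, rows whose value lies more than 1.5 times the IQR below the first quartile (Q1) or above the third quartile (Q3) are dropped.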
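
A small worked example makes the rule concrete, using made-up values and pandas' default linear interpolation for quantiles. The second series illustrates a caveat: when most values in a column are identical (for instance a count column that is usually 0, as FATALITIES may be), the IQR is 0 and every other value is treated as an outlier.

```python
import pandas as pd

# Made-up values with one obvious outlier
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
q1, q3 = values.quantile(0.25), values.quantile(0.75)  # 3.25 and 7.75
iqr = q3 - q1                                          # 4.5
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr          # -3.5 and 14.5
kept = values[(values >= lower) & (values <= upper)]
print(kept.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Caveat: a mostly-zero column has IQR = 0, so every non-zero row is dropped
sparse = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 1, 2])
s_q1, s_q3 = sparse.quantile(0.25), sparse.quantile(0.75)
s_iqr = s_q3 - s_q1
sparse_kept = sparse[(sparse >= s_q1 - 1.5 * s_iqr) & (sparse <= s_q3 + 1.5 * s_iqr)]
print(sparse_kept.tolist())  # only the zeros remain
```

It may therefore be worth inspecting each column's IQR before applying the filter to every numeric column.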

In [None]:
# Function to calculate IQR and remove outliers
def remove_outliers(collision_data,column):
    Q1 = collision_data[column].quantile(0.25)  # 25th percentile
    Q3 = collision_data[column].quantile(0.75)  # 75th percentile
    IQR = Q3 - Q1                   # Inter-Quartile Range
    
    # Define lower and upper bounds
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    
    # Filter out rows with outliers
    return collision_data[(collision_data[column] >= lower_bound) & (collision_data[column] <= upper_bound)]

# List of numeric columns
numeric_columns = collision_data.select_dtypes(include=['float64', 'int64']).columns

# Remove outliers using the IQR method
for column in numeric_columns:
    collision_data = remove_outliers(collision_data, column)

That completes the data cleaning. Let's check the new size of the dataset.

In [None]:
collision_data.shape

In [None]:
# View DataFrame
collision_data.head()

## Exploratory Data Analysis of Collisions Data

The exploratory data analysis (EDA) for the collisions dataset aims to uncover patterns, trends, and insights into the distribution and characteristics of collisions across Toronto. By examining factors such as temporal trends, spatial distributions, road conditions, light conditions, and driver behavior, we can better understand the key contributors to traffic incidents. This analysis provides a foundational understanding of the dataset, enabling us to identify high-risk areas, seasonal patterns, and other influential factors that impact collision frequency. Through this EDA, we aim to generate actionable insights that can inform road safety initiatives, policy decisions, and future predictive modeling efforts.

### Extracted Time Features from Collision Data

After converting the `DATE` column to a proper datetime format, we extracted several time-based attributes to analyze collision patterns more effectively. The following features were derived:

1. **Year**: The year in which the collision occurred.
2. **Month**: The month of the collision (1 = January, 12 = December).
3. **Day of the Week**: The name of the day (e.g., Monday, Tuesday).
4. **Hour**: The hour of the day when the collision occurred (0-23).

These features allow for temporal analysis of the data, such as identifying trends over years, seasonal patterns, or the most common days and hours for collisions. For example, examining collisions by the day of the week can help identify whether weekdays or weekends experience more accidents, while the hour attribute can reveal peak traffic or high-risk hours.
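
Because a week has five weekdays but only two weekend days, raw counts favour weekdays; comparing the average number of collisions per calendar day is fairer. A minimal sketch, assuming a datetime-like column such as `DATE`; the helper name is hypothetical, and days with no collisions are counted as zero so they still pull the average down.

```python
import numpy as np
import pandas as pd


def mean_daily_collisions_by_daytype(timestamps):
    """Average number of collisions per calendar day, split into weekdays and weekends."""
    days = pd.to_datetime(pd.Series(timestamps)).dt.normalize()  # drop the time of day
    # Count collisions per day, then insert days with no collisions as 0
    daily = days.value_counts().sort_index().asfreq("D", fill_value=0)
    day_type = np.where(daily.index.dayofweek >= 5, "Weekend", "Weekday")  # Monday=0 ... Sunday=6
    return daily.groupby(day_type).mean()


# Hypothetical collisions in the week starting Monday 1 January 2024
sample = ["2024-01-01 08:00", "2024-01-01 17:30", "2024-01-03 12:00",
          "2024-01-06 01:15", "2024-01-06 02:40", "2024-01-06 23:05", "2024-01-07 14:00"]
print(mean_daily_collisions_by_daytype(sample))
```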


In [None]:
# Ensure the DATE column is in datetime format
collision_data['DATE'] = pd.to_datetime(collision_data['DATE'], errors='coerce')

# Extract year, month, day of the week, and hour
collision_data['Year'] = collision_data['DATE'].dt.year
collision_data['Month'] = collision_data['DATE'].dt.month
collision_data['DayOfWeek'] = collision_data['DATE'].dt.day_name()
collision_data['Hour'] = collision_data['DATE'].dt.hour

# Display the updated DataFrame
print(collision_data.head())


These extracted features will support predictive modeling, such as forecasting collisions by time or understanding time-based risk factors within each ward.

### Yearly Collision Trends Analysis

Understanding the yearly trends in collision data is essential for identifying patterns over time, such as increases or decreases in the number of collisions. This analysis can highlight the effectiveness of safety interventions or reveal emerging risks that require attention. By plotting the number of collisions for each year, we can assess long-term trends, which can inform city planning, policy-making, and resource allocation for traffic safety improvements. The following code calculates and visualizes the yearly collision trends from the dataset to provide these insights.
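
Alongside the bar chart, the year-over-year percentage change summarises the trend in a single column of numbers. A minimal sketch on made-up `Year` values; if the most recent year is only partially covered by the data, it will look like a drop, so such a year is best excluded or flagged.

```python
import pandas as pd

years = pd.Series([2021, 2021, 2022, 2022, 2022, 2023])  # made-up 'Year' values
yearly = years.value_counts().sort_index()               # collisions per year
yoy_change = yearly.pct_change() * 100                   # % change vs. the previous year
print(pd.DataFrame({"Collisions": yearly, "YoY change (%)": yoy_change.round(1)}))
```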


In [None]:
# Plot collisions per year
plt.figure(figsize=(10, 6))
sns.countplot(x='Year', data=collision_data)
plt.title('Yearly Collision Counts')
plt.xlabel('Year')
plt.ylabel('Number of Collisions')
plt.xticks(rotation=45)
plt.show()


### Monthly Collision Patterns

Understanding the distribution of collisions across months provides valuable insights into seasonal trends and potential environmental or behavioral factors influencing road safety. Analyzing monthly collision patterns can help identify peak months for collisions, enabling policymakers and urban planners to implement targeted safety measures during high-risk periods. Below, we will explore the monthly collision trends using the dataset.
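
Reindexing the monthly counts to months 1 to 12 guarantees that every month appears in calendar order, even one with no collisions, so month labels always line up with the right bars. A minimal sketch on made-up `Month` values; `calendar.month_abbr` gives English abbreviations under the default locale.

```python
import calendar

import pandas as pd

months = pd.Series([1, 1, 3, 12])  # made-up 'Month' values (1 = January)
monthly = months.value_counts().reindex(range(1, 13), fill_value=0)
monthly.index = [calendar.month_abbr[m] for m in monthly.index]
print(monthly)
```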

In [None]:
# Plot collisions per month
plt.figure(figsize=(10, 6))
sns.countplot(x='Month', data=collision_data)
plt.title('Monthly Collision Counts')
plt.xlabel('Month')
plt.ylabel('Number of Collisions')
plt.xticks(range(0, 12), ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], rotation=45)
plt.show()

#### Relationship between Month and Year
To explore the monthly collision trends across individual years from 2020 to 2023, we will use a series of bar plots, one for each year. This approach focuses on the collision counts for each month within a given year, enabling a more granular examination of seasonal patterns and year-specific variations. By visualizing how collisions fluctuate month by month in each year, we can identify recurring trends, seasonal peaks, or anomalies specific to certain years. This targeted analysis is particularly useful for tailoring safety interventions and resource allocation to year-specific collision patterns.
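
A compact numeric companion to the faceted plots is a Year-by-Month table. In the sketch below (made-up rows), `pd.crosstab` counts collisions per cell and the reindex adds months with no collisions as 0, which `groupby(...).size()` would otherwise omit.

```python
import pandas as pd

toy = pd.DataFrame({"Year": [2021, 2021, 2022], "Month": [1, 7, 7]})  # made-up rows
table = (pd.crosstab(toy["Year"], toy["Month"])
         .reindex(columns=range(1, 13), fill_value=0))
print(table)
```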

In [None]:
# Ensure the data has necessary columns: 'Year', 'Month' and collision counts
collision_data['Month'] = collision_data['DATE'].dt.month
collision_data['Year'] = collision_data['DATE'].dt.year

# Filter data to include only years from 2020 to 2023
collision_filtered = collision_data[(collision_data['Year'] >= 2020) & (collision_data['Year'] <= 2023)]

# Group the data by 'Year' and 'Month' to count the number of collisions per month per year
monthly_trends = collision_filtered.groupby(['Year', 'Month']).size().reset_index(name='Collision_Count')

# Create separate plots for each year showing monthly collision counts
g = sns.FacetGrid(monthly_trends, col="Year", col_wrap=2, height=4, sharey=True)
g.map(sns.barplot, "Month", "Collision_Count", order=range(1, 13))

# Set titles and labels
g.set_axis_labels("Month", "Collision Count")
g.set_titles("{col_name}")
g.fig.suptitle("Monthly Collision Trends by Year (2020-2023)", y=1.02)

plt.show()

### Day of the Week Collision Pattern

To further analyze collision patterns, we will examine how collisions are distributed across different days of the week. This analysis can help uncover trends related to weekly activities, such as increased traffic on weekdays due to work commutes or higher collision counts on weekends due to leisure travel. By visualizing these patterns, we can better understand how daily behaviors influence road safety and identify high-risk days for implementing targeted interventions.
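
Day names sort alphabetically by default (Friday before Monday). Converting them to an ordered categorical, as in this sketch on made-up values, keeps Monday-to-Sunday order in tables and groupbys as well as plots, and reports days with no collisions as 0.

```python
import pandas as pd

DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

days = pd.Series(["Friday", "Monday", "Friday", "Sunday"])  # made-up 'DayOfWeek' values
days = days.astype(pd.CategoricalDtype(categories=DAY_ORDER, ordered=True))
counts = days.value_counts(sort=False)  # one entry per category, in calendar order
print(counts)
```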

In [None]:
# Plot collisions by day of the week
plt.figure(figsize=(10, 6))
sns.countplot(x='DayOfWeek', data=collision_data, order=['Monday', 'Tuesday', 'Wednesday', 
                                                    'Thursday', 'Friday', 'Saturday', 'Sunday'])
plt.title('Collisions by Day of the Week')
plt.xlabel('Day of the Week')
plt.ylabel('Number of Collisions')
plt.xticks(rotation=45)
plt.show()

### Hour Collision Patterns
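
A minimal sketch of an hourly breakdown, assuming the integer `Hour` column extracted earlier: it counts collisions for each hour from 0 to 23 and keeps hours with no collisions as 0. The timestamps are made up. One caution: if the date column holds dates without a time of day, every collision falls into hour 0, so it is worth checking that before interpreting the result.

```python
import pandas as pd

# Made-up timestamps standing in for the 'DATE' column
stamps = pd.to_datetime(pd.Series(["2022-03-01 08:15", "2022-03-01 08:40", "2022-03-02 17:05"]))
hours = stamps.dt.hour
hourly = hours.value_counts().reindex(range(24), fill_value=0)
peak_hour = int(hourly.idxmax())
print(f"Peak hour: {peak_hour}:00 with {hourly[peak_hour]} collisions")
```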

## Import Ward Data

To predict the number of collisions in Toronto, the collision data must be spatially joined with the ward boundaries to determine which ward each collision occurred in. This process involves using the geographic coordinates (latitude and longitude) from the collision data and mapping them to the corresponding ward polygons in the ward dataset. After assigning collisions to their respective wards, we can aggregate the number of collisions per ward to identify trends and high-risk areas. This data can then be enriched with additional ward-specific features, such as population, road density, and traffic volume, to build a predictive model. By training a machine learning model with these features, we can forecast collision counts and provide actionable insights for city planning and traffic safety improvements.
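
The spatial join described above relies on a point-in-polygon test (the `within` predicate), with longitude as x and latitude as y. The sketch below illustrates the idea with the classic ray-casting algorithm in plain Python. It is not how geopandas implements `sjoin` (that work is delegated to Shapely/GEOS with a spatial index), the square "wards" are made up, holes and multipart polygons are ignored, and points exactly on a boundary may fall either way.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count the polygon edges crossed by a ray from (x, y) towards +x.

    polygon is a list of (x, y) vertices; an odd number of crossings means inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:       # crossing lies to the right of the point
                inside = not inside
    return inside


def assign_ward(lon, lat, wards):
    """Return the id of the first ward polygon containing (lon, lat), or None."""
    for ward_id, polygon in wards.items():
        if point_in_polygon(lon, lat, polygon):
            return ward_id
    return None


# Made-up square wards in arbitrary units
wards = {
    "Ward A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Ward B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
print(assign_ward(15, 5, wards))
```

Returning None for a point outside every ward mirrors what a left join produces for collisions that fall outside all ward boundaries.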

In [None]:
# Load the ward boundaries shapefile (the companion .shx and .dbf files must be in the same folder)
ward_shapefile_path = "WARD_WGS84.shp"
ward_data = gpd.read_file(ward_shapefile_path)

# Create a base map
map_wards = folium.Map(location=[43.7, -79.4], zoom_start=11)  # Adjust to center on Toronto

# Add the ward shapefile to the map
ward_data_json = ward_data.to_json()  # Convert GeoDataFrame to GeoJSON
folium.GeoJson(ward_data_json, name="Wards").add_to(map_wards)

# Display the map
map_wards

## Ward Data Analysis 

In [None]:
# View geoDataFrame
ward_data.head()

In [None]:
# Check the number of columns and rows
ward_data.shape

In [None]:
# Check the columns in DataFrame
ward_data.columns

The columns describe the following:
- 'AREA_ID': A unique identifier for each ward in the dataset.
- 'AREA_TYPE': The type of area (every row is WD18, i.e. a ward in Toronto's 2018 ward model).
- 'AREA_S_CD': A short code representing the ward, likely numeric.
- 'AREA_L_CD': A longer code that may provide additional context or differentiation for the ward.
- 'AREA_NAME': The official name of the ward.
- 'X', 'Y': Coordinates in a projected coordinate system.
- 'LONGITUDE', 'LATITUDE': Geographic longitude and latitude of the ward's approximate center.
  
Let's explore the differences between 'AREA_S_CD' and 'AREA_L_CD'.

In [None]:
ward_data['AREA_S_CD'].unique()

In [None]:
ward_data['AREA_L_CD'].unique()

The two columns contain the same values, so only one is needed: keep 'AREA_L_CD' and drop 'AREA_S_CD'.

These other features will also be dropped:
- Drop 'AREA_ID' and let 'AREA_L_CD' act as the only identifier of the wards.
- Drop 'AREA_TYPE', as all wards are of type WD18 and the column therefore adds no new information or classification.
- Drop 'X', 'Y', 'LONGITUDE' and 'LATITUDE', since the `geometry` column already holds each ward's boundary.
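
Comparing the outputs of `unique()` shows that two columns share the same set of values, but not that they agree row by row. A minimal sketch of a stricter check, with a hypothetical helper `columns_identical`; it compares the string forms of the values so that an integer column and a text column holding the same codes still match (note that "1" and "01" would count as different).

```python
import pandas as pd


def columns_identical(df, col_a, col_b):
    """True if the two columns hold the same value in every row (compared as strings)."""
    return bool((df[col_a].astype(str) == df[col_b].astype(str)).all())


# Made-up ward codes
toy = pd.DataFrame({"AREA_S_CD": ["01", "02", "03"], "AREA_L_CD": ["01", "02", "03"]})
print(columns_identical(toy, "AREA_S_CD", "AREA_L_CD"))
```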

In [None]:
# Drop unnecessary columns 
ward_data = ward_data.drop(columns = ['AREA_ID', 'AREA_S_CD', 'AREA_TYPE', 'X', 'Y', 'LONGITUDE', 'LATITUDE'], errors='ignore')

# Check the columns are dropped in the GeoDataFrame
ward_data.columns

In [None]:
# Check data types per column
print(ward_data.dtypes)

## Ward Data Cleaning

In [None]:
# Check for missing values
print(ward_data.isnull().sum())
# Check for duplicates
ward_data.duplicated().sum()

In [None]:
# View geoDataFrame
ward_data.head()

The results indicate that there are no missing values or duplicate rows in the remaining attributes (AREA_L_CD, AREA_NAME, and geometry). The ward dataset is therefore clean and ready for spatial operations and further processing.

## Overlay Collision Data onto Ward Data 


Overlaying the collision data onto the ward data is essential for spatial analysis and understanding where collisions are occurring within the city. By mapping each collision to a specific ward, we can identify patterns and trends in collision occurrences relative to geographic boundaries. This allows for aggregating the number of collisions per ward, which is critical for targeted analysis, policy-making, and resource allocation. For example, high-collision wards can be prioritized for road safety improvements or public awareness campaigns. Additionally, integrating collision data with ward-specific attributes such as population, traffic volume, or road density enables more accurate predictive modeling and helps address traffic safety issues more effectively.


Converting collision data into a GeoDataFrame is essential for spatial analysis, as it allows for operations like spatial joins, overlays, and mapping. By creating a geometry column from LATITUDE and LONGITUDE, each collision is represented as a precise point in space. Assigning a Coordinate Reference System (CRS), such as EPSG:4326 (WGS84), ensures the data aligns accurately with other spatial datasets, like ward boundaries. This conversion enables mapping collisions to specific wards, visualizing spatial patterns, and ensuring compatibility with geospatial tools, making it a critical step for reliable and accurate analysis.
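
Because point geometries are built as (x, y), that is (longitude, latitude), swapped or placeholder coordinates silently put collisions in the wrong place. A minimal sanity check before building the geometry, assuming the LATITUDE/LONGITUDE column names used in the cell below; the helper name is hypothetical. It relies on Toronto lying north of the equator and west of the prime meridian, so valid rows have positive latitude and negative longitude.

```python
import pandas as pd


def coordinate_report(df, lat_col="LATITUDE", lon_col="LONGITUDE"):
    """Count rows whose coordinates are missing, impossible, (0, 0) or in the wrong hemisphere."""
    lat, lon = df[lat_col], df[lon_col]
    return {
        "missing": int((lat.isna() | lon.isna()).sum()),
        "out_of_range": int(((lat.abs() > 90) | (lon.abs() > 180)).sum()),
        "zero_zero": int(((lat == 0) & (lon == 0)).sum()),
        # Toronto: latitude > 0 and longitude < 0, so a swap shows up here
        "wrong_hemisphere": int(((lat < 0) | (lon > 0)).sum()),
    }


# Made-up rows: valid, swapped, missing latitude, and a (0, 0) placeholder
toy = pd.DataFrame({
    "LATITUDE": [43.65, -79.38, None, 0.0],
    "LONGITUDE": [-79.38, 43.65, -79.40, 0.0],
})
print(coordinate_report(toy))
```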

In [None]:
# Convert collision data to a GeoDataFrame; points are built as (x=longitude, y=latitude)
collision_data["geometry"] = gpd.points_from_xy(collision_data["LONGITUDE"], collision_data["LATITUDE"], crs="EPSG:4326")
collision_data = gpd.GeoDataFrame(collision_data, geometry="geometry", crs="EPSG:4326")

In [None]:
# Ensure CRS matches
if collision_data.crs != ward_data.crs:
    collision_data = collision_data.to_crs(ward_data.crs)

# Spatial join to find which ward each collision occurred in
collision_with_wards = gpd.sjoin(collision_data, ward_data, how="left", predicate="within")

# Display the results
print(collision_with_wards.head())

# Visualize on a map
map_wards = folium.Map(location=[43.7, -79.4], zoom_start=11)
ward_data_json = ward_data.to_json()
folium.GeoJson(ward_data_json, name="Wards").add_to(map_wards)

# Add collision points
for _, row in collision_data.iterrows():
    folium.CircleMarker(
        location=[row.geometry.y, row.geometry.x],
        radius=3,
        color="red",
        fill=True,
        fill_color="red",
        fill_opacity=0.6
    ).add_to(map_wards)

map_wards

In [None]:
# Perform the spatial join to find which ward each collision occurred in
collision_with_wards = gpd.sjoin(collision_data, ward_data, how="left", predicate="within")

# Drop unnecessary geometry columns for tabular display
collision_table = collision_with_wards.drop(columns=["geometry", "index_right"])

# Display the table in the notebook
collision_table.tail()  # Display the last few rows of the table

## Exploratory Data Analysis with Collision and Ward Data (2021 to 2024)

The exploratory data analysis (EDA) linking collision data to ward data focuses on understanding how traffic incidents are distributed across different geographic regions. By spatially associating each collision with its corresponding ward, we aim to identify high-risk areas, observe regional patterns, and analyze variations in collision frequency. This analysis allows us to explore how factors such as road conditions, traffic density, and infrastructure might vary across wards, influencing collision rates. By integrating collision and ward data, we can generate valuable insights to prioritize safety improvements, allocate resources effectively, and support data-driven decision-making at a regional level.

### Collisions Across Wards from 2021 to 2024

When analyzing the number of collisions within each ward, patterns may emerge regarding high-collision areas versus wards with lower collision counts. This type of analysis is critical for understanding geographic trends in collision frequency. High-collision wards might correspond to regions with greater population density, higher traffic volumes, or complex road infrastructure, while low-collision wards might indicate less urbanized or lower-traffic areas.

Before diving into this section, we aim to refine our collision dataset by filtering for data between the years 2021 and 2024, providing a recent and focused analysis. This filtered dataset will then be spatially joined with ward boundaries to associate each collision with its respective ward. By sorting the results in ascending order of ward number and calculating the percentage distribution of collisions across wards, we gain a clear understanding of collision patterns in different regions. This step is essential for identifying high-collision areas and drawing meaningful insights to inform regional safety strategies and resource allocation.

In [None]:
# Convert the 'DATE' column to datetime for filtering
collision_data["DATE"] = pd.to_datetime(collision_data["DATE"])

# Filter collisions from 2021 to 2024
collision_2021_2024 = collision_data[(collision_data["DATE"].dt.year >= 2021) & (collision_data["DATE"].dt.year <= 2024)]

# Perform the spatial join to associate collisions with wards
collision_with_wards = gpd.sjoin(collision_2021_2024, ward_data, how="left", predicate="within")

# Clean up the resulting DataFrame by dropping unnecessary geometry and index columns
collision_table = collision_with_wards.drop(columns=["geometry", "index_right"])

# Count collisions for each ward (number and name)
ward_collision_counts = collision_table.groupby(["AREA_L_CD", "AREA_NAME"]).size().reset_index(name="Number of Collisions")

# Sort wards in ascending order by ward number
ward_collision_counts = ward_collision_counts.sort_values(by="AREA_L_CD")

# Calculate the percentage of collisions for each ward
total_collisions = ward_collision_counts["Number of Collisions"].sum()
ward_collision_counts["Percentage of Collisions"] = (ward_collision_counts["Number of Collisions"] / total_collisions) * 100

# Display the summary DataFrame
ward_collision_counts.head(25)

### Collisions Across All Wards Every Month from 2021 to 2023

To gain a deeper understanding of collision patterns across different wards over time, we will generate both a tabular and visual representation of the number of collisions occurring in each ward per month from 2021 to 2023. The table provides a clear breakdown of collision counts by ward, organized by year and month, offering a detailed view of temporal trends. Following this, a bar graph visually highlights these trends, enabling easy comparison of collision counts across wards and months. These insights are essential for identifying high-risk periods and areas, aiding in targeted road safety interventions and policy-making.

In [None]:
# Convert the 'DATE' column to datetime for filtering
collision_data["DATE"] = pd.to_datetime(collision_data["DATE"])

# Filter collisions from 2021 to 2023 (copy so new columns can be added safely)
collision_filtered = collision_data[
    (collision_data["DATE"].dt.year >= 2021) & (collision_data["DATE"].dt.year <= 2023)
].copy()

# Extract year and month from the 'DATE' column
collision_filtered["Year"] = collision_filtered["DATE"].dt.year
collision_filtered["Month"] = collision_filtered["DATE"].dt.month

# Perform the spatial join to associate collisions with wards
collision_with_wards = gpd.sjoin(collision_filtered, ward_data, how="left", predicate="within")

# Group collisions by ward number, ward name, and month
ward_monthly_collisions = (
    collision_with_wards.groupby(["AREA_L_CD", "AREA_NAME", "Year", "Month"])
    .size()
    .reset_index(name="Number of Collisions")
)

# Pivot the table to display months as columns
ward_monthly_pivot = ward_monthly_collisions.pivot_table(
    index=["AREA_L_CD", "AREA_NAME"],
    columns=["Year", "Month"],
    values="Number of Collisions",
    fill_value=0
)

# Flatten multi-level columns for readability
ward_monthly_pivot.columns = [f"{year}-{month:02}" for year, month in ward_monthly_pivot.columns]

# Reset the index to make it a clean DataFrame
ward_monthly_pivot.reset_index(inplace=True)

# Display the first few rows of the resulting DataFrame
ward_monthly_pivot.head(25)