<a href="https://colab.research.google.com/github/stheria4/sds510/blob/master/Module3Basics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python Data Transformation Basics: Phoenix Crime Data Analysis

**Name:** Sean Theriault  
**Student ID:** stheria4  
**Course:** SDS 510 – Python for Data Wrangling  
**Date:** 11/4/2025  
**Project:** Module 3 - Basics: Crime Data Grouping

## File Import in Google Colab

When using Google Colaboratory, the CSV file was uploaded directly from my local machine using the following code.  

In [2]:
# from google.colab import files

# uploaded = files.upload()

Saving crime-data_crime-data_crimestat.csv to crime-data_crime-data_crimestat.csv
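
As a minimal sketch of what can be done with the upload result: in Colab, `files.upload()` returns a dictionary that maps each file name to its raw bytes, and those bytes can be read straight into a DataFrame. The helper name `frame_from_upload` and the tiny stand-in dictionary are hypothetical and only there so the example runs outside Colab.

```python
import io

import pandas as pd


def frame_from_upload(uploaded):
    """Return (file name, DataFrame) for the first CSV in an upload dict.

    `uploaded` maps file names to raw bytes, the shape returned by
    google.colab.files.upload(). The helper itself is hypothetical.
    """
    if not uploaded:
        raise ValueError("no files were uploaded")
    name, payload = next(iter(uploaded.items()))
    return name, pd.read_csv(io.BytesIO(payload))


# Stand-in for `uploaded = files.upload()` (made-up rows, not the real dataset).
fake_upload = {
    "sample.csv": b"ZIP,PREMISE TYPE\n85006,APARTMENT\n85014,PARKING LOT\n"
}
name, sample_df = frame_from_upload(fake_upload)
print(name, sample_df.shape)
```

In a real Colab session the dictionary would come from `files.upload()` instead of `fake_upload`.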


## Import Libraries and Load Data

This section imports the necessary Python libraries and loads the Phoenix crime dataset from the local data folder into a Pandas DataFrame.

In [3]:
# Import libraries
import pandas as pd
import numpy as np

# Load the data
crime_df = pd.read_csv('data/crime-data_crime-data_crimestat.csv')


## Data Check

This section performs an initial exploration of the dataset: the first rows, column names and data types, summary statistics, and missing values per column.

In [4]:
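# Data Check
print(crime_df.head())          # first five rows
crime_df.info()                 # prints dtypes and non-null counts itself; print() would add a stray "None"
print(crime_df.describe())      # summary statistics for numeric columns
print(crime_df.isnull().sum())  # missing values per column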

        INC NUMBER        OCCURRED ON        OCCURRED TO   UCR CRIME CATEGORY  \
0  201600000594484  11/01/2015  00:00                NaN                 RAPE   
1  201500002102327  11/01/2015  00:00  11/01/2015  09:00        LARCENY-THEFT   
2  201500002168686  11/01/2015  00:00  11/11/2015  09:30        LARCENY-THEFT   
3  201500002102668  11/01/2015  00:00  11/01/2015  11:50  MOTOR VEHICLE THEFT   
4  201600000052855  11/01/2015  00:00  01/09/2016  00:00  MOTOR VEHICLE THEFT   

             100 BLOCK ADDR      ZIP         PREMISE TYPE  GRID  
0         13XX E ALMERIA RD  85006.0  SINGLE FAMILY HOUSE  BD30  
1            51XX N 15TH ST  85014.0            APARTMENT  BJ30  
2       14XX E HIGHLAND AVE  85014.0          PARKING LOT  BI30  
3            69XX W WOOD ST  85043.0  SINGLE FAMILY HOUSE  AF12  
4  N 43RD AVE & W CACTUS RD  85029.0  SINGLE FAMILY HOUSE  DA19  
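
The `head()` output shows ZIP values such as `85006.0`, which means pandas stored the column as floats, most likely because it contains missing values. A hedged sketch of one way to tidy this is pandas' nullable `Int64` dtype. The sample rows below are made up for illustration and are not taken from the dataset.

```python
import io

import pandas as pd

# Made-up sample with one missing ZIP, mimicking the column layout above.
sample_csv = io.StringIO(
    "INC NUMBER,ZIP,PREMISE TYPE\n"
    "1,85006,SINGLE FAMILY HOUSE\n"
    "2,,APARTMENT\n"
    "3,85014,PARKING LOT\n"
)
sample_df = pd.read_csv(sample_csv)

# The empty cell forces the whole column to float64.
dtype_before = str(sample_df["ZIP"].dtype)

# Nullable integers keep the missing value as <NA> and drop the ".0".
sample_df["ZIP"] = sample_df["ZIP"].astype("Int64")

# Share of missing values per column, often easier to read than raw counts.
missing_share = sample_df.isnull().mean()
print(dtype_before, sample_df["ZIP"].dtype)
print(missing_share)
```

Whether this conversion is worthwhile depends on how ZIP is used later. For pure grouping, the float values work as they are.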


## Convert Date Column to Datetime

This section converts the OCCURRED ON column to pandas datetime values.  
This ensures that date-based operations, such as grouping by month or calculating trends over time, can be performed correctly.

In [6]:
# Convert date column to datetime
# (the dataset has no 'Date' column; the incident timestamp is 'OCCURRED ON')
if 'OCCURRED ON' in crime_df.columns:
    crime_df['OCCURRED ON'] = pd.to_datetime(crime_df['OCCURRED ON'], errors='coerce')
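
To illustrate the conversion on its own, here is a small sketch with made-up strings in the same layout as the printed timestamps. The explicit `format` string, including the double space between date and time, is an assumption based on how the values appear in the `head()` output. `errors='coerce'` turns anything unparseable into `NaT` instead of raising.

```python
import pandas as pd

# Made-up values: two valid timestamps, one bad string, one missing value.
raw = pd.Series(["11/01/2015  00:00", "11/11/2015  09:30", "not a date", None])

# Month/day/year with a 24-hour time; unparseable entries become NaT.
parsed = pd.to_datetime(raw, format="%m/%d/%Y  %H:%M", errors="coerce")

# Rows lost to parsing: NaT after conversion minus values already missing.
n_bad = int(parsed.isna().sum()) - int(raw.isna().sum())

# A monthly period per row, handy for later grouping by month.
months = parsed.dt.to_period("M")
print(n_bad)
print(months)
```

Counting the coerced rows is a cheap way to check that the chosen format really fits the data before relying on the parsed column.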

## Grouping and Trends

This section analyzes the data by:

- Counting crimes by Premise Type  
- Counting crimes by ZIP code  
- Summarizing violent vs non-violent crimes
- Showing monthly crime trends by ZIP code
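
To make the shape of the last item concrete, here is a toy sketch of a month-by-ZIP count table. The four rows are invented, and the sketch groups on a monthly period derived from the date column, which is one of several equivalent ways to bucket by month.

```python
import pandas as pd

# Invented incidents: two ZIP codes across two months.
toy = pd.DataFrame({
    "OCCURRED ON": pd.to_datetime(
        ["2015-11-01", "2015-11-15", "2015-12-03", "2015-12-20"]
    ),
    "ZIP": [85006, 85014, 85006, 85006],
})

# Count rows per (month, ZIP), then pivot ZIP codes into columns.
monthly = (
    toy.groupby([toy["OCCURRED ON"].dt.to_period("M"), "ZIP"])
    .size()
    .unstack(fill_value=0)
)
print(monthly)
```

Each row of the result is a month and each column a ZIP code. `fill_value=0` fills in month and ZIP combinations that have no incidents.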

In [11]:
# Different grouping

# Count of crimes by Premise Type
if 'PREMISE TYPE' in crime_df.columns:
    location_counts = crime_df.groupby('PREMISE TYPE').size().sort_values(ascending=False)
    print("\nCrimes by Premise Type:")
    print(location_counts)

# Count of crimes by ZIP
if 'ZIP' in crime_df.columns:
    zip_counts = crime_df.groupby('ZIP').size().sort_values(ascending=False)
    print("\nCrimes by ZIP:")
    print(zip_counts)

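# Violent vs Non-Violent crimes
# NOTE: isin() matches exactly and is case-sensitive. The categories in this dataset are
# uppercase (e.g. 'RAPE', 'LARCENY-THEFT'), so these labels must be replaced with the
# exact UCR CRIME CATEGORY values; as written, every row is flagged False.
violent_crimes = ['Assault','Robbery','Homicide']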
if 'UCR CRIME CATEGORY' in crime_df.columns and 'ZIP' in crime_df.columns:
    crime_df['Violent'] = crime_df['UCR CRIME CATEGORY'].isin(violent_crimes)
    violent_counts = crime_df.groupby('ZIP')['Violent'].sum().sort_values(ascending=False)
    print("\nViolent Crimes by ZIP:")
    print(violent_counts)

# Trend analysis by month (using OCCURRED ON)
if 'OCCURRED ON' in crime_df.columns and 'ZIP' in crime_df.columns:
    # Unparseable dates become NaT and are dropped by the grouper
    crime_df['OCCURRED ON'] = pd.to_datetime(crime_df['OCCURRED ON'], errors='coerce')
    # Group on the column via key= so crime_df keeps its index and the cell can be re-run
    monthly_trends = crime_df.groupby([pd.Grouper(key='OCCURRED ON', freq='M'),'ZIP']).size().unstack(fill_value=0)
    print("\nMonthly Crime Trends (first 5 rows):")
    print(monthly_trends.head())


Crimes by Premise Type:
PREMISE TYPE
SINGLE FAMILY HOUSE            85268
APARTMENT                      83455
PARKING LOT                    47689
STREET / ROADWAY / SIDEWALK    39956
DEPARTMENT / DISCOUNT STORE    31726
                               ...  
REST AREA                         16
STOREROOM / SHED                  14
LAKE / WATERWAY / BEACH           10
MILITARY INSTALLATION              7
TRIBAL LANDS                       7
Length: 98, dtype: int64

Crimes by ZIP:
ZIP
85015.0    30599
85008.0    28921
85051.0    26863
85009.0    26770
85041.0    25356
           ...  
85256.0        1
85249.0        1
85361.0        1
85355.0        1
85390.0        1
Length: 117, dtype: int64

Violent Crimes by ZIP:
ZIP
85003.0    0
85004.0    0
85006.0    0
85007.0    0
85008.0    0
          ..
85388.0    0
85390.0    0
85392.0    0
85395.0    0
85396.0    0
Name: Violent, Length: 117, dtype: int64
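
Every ZIP code shows 0 violent crimes because `isin()` compares labels exactly, and the title-case names in `violent_crimes` never match the uppercase values in UCR CRIME CATEGORY. A minimal sketch of a fix on made-up rows follows. Only `RAPE` is used as a violent label here because it is the only violent category confirmed by the `head()` output. The full label set is an assumption to be filled in after inspecting the real unique values.

```python
import pandas as pd

# Made-up rows that reuse category names visible in the head() output.
toy = pd.DataFrame({
    "UCR CRIME CATEGORY": ["RAPE", "LARCENY-THEFT", "MOTOR VEHICLE THEFT", "RAPE", None],
    "ZIP": [85006, 85014, 85014, 85043, 85006],
})

# The original title-case labels match nothing.
title_case = ["Assault", "Robbery", "Homicide"]
n_matched = int(toy["UCR CRIME CATEGORY"].isin(title_case).sum())
print(n_matched)  # 0

# Step 1: look at the labels that actually occur.
print(sorted(toy["UCR CRIME CATEGORY"].dropna().unique()))

# Step 2: build the violent set from those labels (assumption: extend as needed).
violent_labels = {"RAPE"}

# Normalize whitespace and case before matching; missing values stay False.
normalized = toy["UCR CRIME CATEGORY"].str.strip().str.upper()
toy["Violent"] = normalized.isin(violent_labels)

violent_by_zip = toy.groupby("ZIP")["Violent"].sum().sort_values(ascending=False)
print(violent_by_zip)
```

Printing the unique categories first avoids guessing at label spellings, which is what produced the all-zero result above.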
