# Project 1

In this project, I use pandas and Python to analyze NYPD arrest data from 2025, focusing on which precincts have the most arrests. New York City has 77 police precincts, numbered from 1 to 123 (with some gaps). By counting arrests per precinct number, we can see how arrests are distributed across the city.

# Download the NYPD Arrest Data CSV from Data.gov

URL: https://catalog.data.gov/dataset/nypd-arrest-data-year-to-date

# Part A: Using Pandas to calculate the Mean, Median, and Mode

In [None]:
# Step 1- Load the data
import pandas as pd

# read the CSV file
df = pd.read_csv('NYPD_Arrest_Data__Year_to_Date_.csv')

# display the first few rows of the dataframe
print(df.head())

# display the column names
print(df.columns.tolist())

# display the number of rows and columns
print(df.shape)

# Step 2- I select a numeric column, I am going to select "ARREST_PRECINCT"!
numeric_col = 'ARREST_PRECINCT'

# convert the selected column to a numeric type (unparseable values become NaN)
df[numeric_col] = pd.to_numeric(df[numeric_col], errors='coerce')


# Step 3- Use Pandas to compute "the mean", "the median", and "the mode" of the "ARREST_PRECINCT" column
# compute the mean of the ARREST_PRECINCT column
mean = df[numeric_col].mean()

# compute the median of the ARREST_PRECINCT column
median = df[numeric_col].median()

# compute the mode of the ARREST_PRECINCT column (first value if there is a tie)
mode = df[numeric_col].mode()[0]

print(f"Mean precinct: {mean:.2f}")
print(f"Median precinct: {median:.2f}")
print(f"Mode precinct: {mode}")

   ARREST_KEY ARREST_DATE  PD_CD                           PD_DESC  KY_CD  \
0   298760433  01/02/2025    782          WEAPONS, POSSESSION, ETC  236.0   
1   299030225  01/07/2025    105                 STRANGULATION 1ST  106.0   
2   299127494  01/08/2025    849    NY STATE LAWS,UNCLASSIFIED VIO  677.0   
3   299188536  01/09/2025    259  CRIMINAL MISCHIEF,UNCLASSIFIED 4  351.0   
4   299533742  01/16/2025    155                            RAPE 2  104.0   

                        OFNS_DESC    LAW_CODE LAW_CAT_CD ARREST_BORO  \
0               DANGEROUS WEAPONS  PL 2650101          M           Q   
1                  FELONY ASSAULT  PL 1211200          F           M   
2                OTHER STATE LAWS  LOC00000V0          V           K   
3  CRIMINAL MISCHIEF & RELATED OF  PL 1450001          M           M   
4                            RAPE  PL 1303001          F           K   

   ARREST_PRECINCT  JURISDICTION_CODE AGE_GROUP PERP_SEX PERP_RACE  \
0              115                
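
Since precinct numbers are identifiers rather than quantities, the mode (the busiest precinct) is the easiest of the three statistics to interpret. The sketch below shows, on a small made-up sample and not the real file, how `errors='coerce'` handles a bad entry and how `value_counts()` gives the arrest count behind the mode. The names `sample`, `precinct`, and `arrests_per_precinct` are my own and are only for illustration.

```python
import pandas as pd

# Made-up sample (NOT the real data): precinct values as strings, one bad entry
sample = pd.DataFrame({"ARREST_PRECINCT": ["14", "40", "14", "n/a", "75", "14"]})

# errors="coerce" turns unparseable entries into NaN instead of raising an error
precinct = pd.to_numeric(sample["ARREST_PRECINCT"], errors="coerce")
n_invalid = int(precinct.isna().sum())

# value_counts() skips NaN by default and sorts by frequency, highest first
arrests_per_precinct = precinct.value_counts()
busiest = int(arrests_per_precinct.index[0])

print(f"Invalid entries: {n_invalid}")
print(f"Busiest precinct: {busiest} ({arrests_per_precinct.iloc[0]} arrests)")
print(f"Mode from pandas: {precinct.mode()[0]:.0f}")
```

On the real dataframe, the same two calls could be applied to `df['ARREST_PRECINCT']` to check how many rows were coerced to NaN before trusting the statistics.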

# Part B: Comparing with a hand-coded approach in plain Python

In [None]:
# I am calculating the mean, median, and mode the hard way, without pandas!
import csv

# path to the CSV file
path = "NYPD_Arrest_Data__Year_to_Date_.csv"

# name of the column to analyze
column_name = 'ARREST_PRECINCT'

# read the file manually and collect the numeric values
value = []
with open(path, 'r', newline='') as file:
    reader = csv.DictReader(file)
    for row in reader:
        val = row[column_name]
        try:
            num = float(val)
            value.append(num)
        except ValueError:
            continue  # skip non-numeric rows

print(f"Read {len(value)} numeric values from {column_name}")
        
    
# compute the mean, median, and mode of the "ARREST_PRECINCT" column
# Mean = sum / count
total = 0
for v in value:
    total += v

mean = total / len(value)
print(f"Mean: {mean:.2f}")

# Median = middle value when sorted
sorted_values = sorted(value)
n = len(sorted_values)
if n % 2 == 1:
    median = sorted_values[n//2]
else:
    median = (sorted_values[n//2-1]+sorted_values[n//2])/2
print(f"median: {median:.2f}")

# Mode = most frequent value
from collections import Counter
counter = Counter(value)
mode_data = counter.most_common(1)
mode = mode_data[0][0]
print(f"mode: {mode:.2f}")


Read 212486 numeric values from ARREST_PRECINCT
Mean: 63.08
median: 62.00
mode: 14.00
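
One way to double-check the hand-written formulas is to compare them against Python's built-in `statistics` module. The sketch below does this on a short made-up list, not the real arrest data; the variable names are hypothetical and chosen only for this example.

```python
import statistics
from collections import Counter

# Made-up precinct numbers, for illustration only
values = [14.0, 40.0, 14.0, 75.0, 103.0, 14.0, 40.0]

# Hand-written versions, following the same steps as the cell above
manual_mean = sum(values) / len(values)

ordered = sorted(values)
n = len(ordered)
if n % 2 == 1:
    manual_median = ordered[n // 2]
else:
    manual_median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

manual_mode = Counter(values).most_common(1)[0][0]

# Standard-library results for comparison
lib_mean = statistics.mean(values)
lib_median = statistics.median(values)
lib_mode = statistics.mode(values)

print(f"Mean:   manual={manual_mean:.2f}  statistics={lib_mean:.2f}")
print(f"Median: manual={manual_median:.2f}  statistics={lib_median:.2f}")
print(f"Mode:   manual={manual_mode:.2f}  statistics={lib_mode:.2f}")
```

If the two columns agree on a small sample like this, the same comparison could be run on the full `value` list as a sanity check.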


# Part C: Data Visualization
## Drawing a bar chart to show the number of arrests in the top 10 precincts
### I am creating a table and a bar chart to visualize the top 10 precincts by number of arrests!

In [None]:
import csv

# Load the data
path = "NYPD_Arrest_Data__Year_to_Date_.csv"
column_name = "ARREST_PRECINCT"

# Step 1: Read counts of arrests per precinct
counts = {}
with open(path, "r", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        try:
            val = int(float(row[column_name]))
            counts[val] = counts.get(val, 0) + 1
        except (ValueError, KeyError):
            continue

# Step 2: Get top 10 precincts by arrest count
top_precincts = sorted(counts.items(), key=lambda x: x[1], reverse=True)[:10]

# Step 3: Print as a formatted text table
print("=" * 45)
print(f"{'Rank':<5}{'Precinct':<12}{'Arrest Count':>15}")
print("=" * 45)
for i, (precinct, cnt) in enumerate(top_precincts, start=1):
    print(f"{i:<5}{precinct:<12}{cnt:>15,}")
print("=" * 45)

# Step 4: Create bar chart
# reuse the top 10 list from Step 2
top = top_precincts
max_count = top[0][1] if top else 0

def bar(count, max_count, width=40, fill="*"):
    if max_count == 0:
        return ""
    filled = int(round(count / max_count * width))
    return fill * filled

print("Top 10 Precincts — Arrest Counts (bar chart)")
for precinct, cnt in top:
    print(f"{precinct:>3}: {bar(cnt, max_count)} {cnt}")

Rank Precinct       Arrest Count
1    14                    7,947
2    40                    7,423
3    75                    6,953
4    103                   6,010
5    44                    5,978
6    46                    5,104
7    120                   4,877
8    73                    4,580
9    47                    4,560
10   110                   4,409
Top 10 Precincts — Arrest Counts (bar chart)
 14: **************************************** 7947
 40: ************************************* 7423
 75: *********************************** 6953
103: ****************************** 6010
 44: ****************************** 5978
 46: ************************** 5104
120: ************************* 4877
 73: *********************** 4580
 47: *********************** 4560
110: ********************** 4409
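
As a possible shortcut, the counting and sorting steps above could also be done with `collections.Counter`, whose `most_common()` method returns the values already ranked by frequency. This is only a sketch on a made-up list standing in for the CSV column; the bar width of 20 is an arbitrary choice for the example.

```python
from collections import Counter

# Made-up precinct values standing in for the ARREST_PRECINCT column
precincts = [14, 40, 14, 75, 40, 14, 103]

# Counter tallies each value; most_common(2) returns the 2 most frequent
counts = Counter(precincts)
top = counts.most_common(2)
max_count = top[0][1]

# Scale each bar relative to the largest count (width 20 is arbitrary)
lines = []
for precinct, cnt in top:
    bar_len = round(cnt / max_count * 20)
    lines.append(f"{precinct:>3}: {'*' * bar_len} {cnt}")

print("\n".join(lines))
```

With the real data, `most_common(10)` would play the role of the `sorted(...)[:10]` call in Step 2.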
