# BEST NEIGHBORHOOD IN PITTSBURGH  
**Team: The Code Warriors**

## Introduction

In this project, we set out to answer the question: **What is the best neighborhood in Pittsburgh for families?** We used public data from the Western Pennsylvania Regional Data Center (WPRDC) to compare neighborhoods and figure out which one is the most family-friendly.

We looked at three things that are important to families:
- **Low crime rates** (safer neighborhoods)
- **High public school enrollment** (a sign that schools are available and that families live there)
- **Low vacancy rates** (fewer empty homes means a more stable and active community)

We picked these because they are easy to measure and give a good idea of what living in a neighborhood might be like for a family. For each neighborhood, we scaled each factor to a score between 0 and 1, then averaged the three scores to find the overall best neighborhood.

At first, we thought about using other data, like air quality, parks, or commute times. But we decided to focus on crime, school enrollment, and vacancy rates because:
1. The data was easy to find and clean  
2. These things are important to families  
3. They were simple to compare across neighborhoods

This project gave us the chance to use data to answer a practical question. We also learned how different types of information can help us understand what makes a neighborhood a good place to live.

## The Metric

To figure out the best neighborhood in Pittsburgh for families, we created a score based on three things that matter for family life: safety, schools, and housing stability. Each is measured with a public dataset from WPRDC. We scaled each measurement to a score from 0 (the worst neighborhood in the data) to 1 (the best), then averaged the three scores with equal weight to get each neighborhood's overall score.
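
As a rough illustration of the scoring described above, the sketch below applies min-max scaling to three made-up neighborhoods and averages the results. The neighborhood names, the numbers, and the `min_max` helper are hypothetical and are not part of our notebook; only the column names mirror the ones used in the code cells in The Data section.

```python
import pandas as pd

def min_max(series: pd.Series, lower_is_better: bool = False) -> pd.Series:
    """Scale a column to the 0-1 range; flip it when smaller raw values are better."""
    spread = series.max() - series.min()
    if spread == 0:
        # Every neighborhood has the same value, so give them all a neutral score.
        return pd.Series(0.5, index=series.index)
    scaled = (series - series.min()) / spread
    return 1 - scaled if lower_is_better else scaled

# Made-up values for three placeholder neighborhoods (not real WPRDC data).
toy = pd.DataFrame({
    "neighborhood": ["A", "B", "C"],
    "crime_count": [100, 300, 500],
    "total_students_enrolled": [400, 600, 200],
    "vacancy_percent": [5.0, 10.0, 15.0],
})

toy["crime_score"] = min_max(toy["crime_count"], lower_is_better=True)
toy["school_score"] = min_max(toy["total_students_enrolled"])
toy["vacancy_score"] = min_max(toy["vacancy_percent"], lower_is_better=True)

# Equal-weight average of the three scores.
toy["final_score"] = toy[["crime_score", "school_score", "vacancy_score"]].mean(axis=1)

print(toy[["neighborhood", "final_score"]])
```

In this toy table, neighborhood A gets (1 + 0.5 + 1) / 3 ≈ 0.83 and neighborhood C gets 0, because C is the worst on all three measures.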

Here are the metrics we used and the data we worked with:

### 1. Crime Rate (Safety)
- **What we measured:** The number of police-reported incidents in each neighborhood from 2023 onward. This is a raw count, not adjusted for population.
- **Why it matters:** Fewer reported incidents suggests a safer place to live.
- **Dataset Used:** Police Incident Blotter  
- **Description:** Contains detailed records of police-reported incidents in Pittsburgh, including dates, locations, and types of offenses.
- **Link:** [Police Incident Blotter](https://data.wprdc.org/dataset/uniform-crime-reporting-data/resource/044f2016-1dfd-4ab0-bc1e-065da05fca2e)

### 2. School Enrollment (Education Access)
- **What we measured:** The total number of students enrolled in public schools per neighborhood.
- **Why it matters:** More students often means stronger school presence and access for families.
- **Dataset Used:** Pittsburgh Public Schools Enrollment by Neighborhood, School, and Feeder Pattern  
- **Description:** Shows enrollment numbers by neighborhood and school. Used to identify areas with higher school engagement and access.
- **Link:** [School Enrollment Data](https://data.wprdc.org/dataset/pittsburgh-public-schools-enrollment/resource/cbf270fd-891e-49bb-98fb-d6d52c260847)

### 3. Vacancy Rate (Housing Stability)
- **What we measured:** The percentage of vacant homes in each neighborhood.
- **Why it matters:** A lower vacancy rate usually means the area is more stable and has more long-term residents.
- **Dataset Used:** Pittsburgh Neighborhood Profiles  
- **Description:** Includes detailed demographic, economic, and housing data by neighborhood. Used to assess neighborhood stability via vacancy rate.
- **Link:** [Neighborhood Profiles](https://data.wprdc.org/dataset/ucsur_neighborhoodprofiles_2024/resource/a2d6468e-0229-4c6a-92c5-10814092e580)

## The Data

In [13]:
import pandas as pd
import matplotlib.pyplot as plt

# Step 1: Load the dataset
crime_df = pd.read_csv("FinalProject/Big-Ideas-Final-Project/crime_data.csv")

# Step 2: Convert date column to datetime
crime_df["INCIDENTTIME"] = pd.to_datetime(crime_df["INCIDENTTIME"], errors="coerce")

# Step 3: Keep incidents from 2023 onward (rows with unparseable dates are dropped too)
crime_df = crime_df[crime_df["INCIDENTTIME"].dt.year >= 2023]

# Step 4: Group by neighborhood and count incidents
crime_counts = crime_df.groupby("INCIDENTNEIGHBORHOOD").size().reset_index(name="crime_count")

# Step 5: Normalize (lower crime = higher score)
crime_counts["crime_score"] = 1 - (
    (crime_counts["crime_count"] - crime_counts["crime_count"].min()) /
    (crime_counts["crime_count"].max() - crime_counts["crime_count"].min())
)

# Step 6: Rename the key column so it matches the other datasets for the merge
crime_counts = crime_counts.rename(columns={"INCIDENTNEIGHBORHOOD": "neighborhood"})

# Preview
crime_counts.head()

FileNotFoundError: [Errno 2] No such file or directory: 'FinalProject/Big-Ideas-Final-Project/crime_data.csv'
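
The `FileNotFoundError` above means pandas could not find the CSV relative to the notebook's working directory. One way to make this kind of failure easier to diagnose is to build the path explicitly and check it before loading. This is only a sketch: `DATA_DIR` and the `load_csv` helper are hypothetical, and the right folder depends on where the downloaded WPRDC files actually live.

```python
from pathlib import Path

import pandas as pd

# Hypothetical location of the downloaded CSVs; adjust to match your checkout.
DATA_DIR = Path("data")

def load_csv(filename: str, data_dir: Path = DATA_DIR) -> pd.DataFrame:
    """Read a CSV from data_dir, failing with a message that shows where we looked."""
    path = data_dir / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected {path.resolve()} (working directory: {Path.cwd()})"
        )
    return pd.read_csv(path)
```

With a helper like this, a load could be written as `crime_df = load_csv("crime_data.csv")`, and a missing file would report the full path that was checked.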

In [7]:
# Step 1: Load the dataset
school_df = pd.read_csv("data/school_enrollment.csv")

# Step 2: Drop rows that have no enrollment figure
school_df = school_df.dropna(subset=["total_students_enrolled"])

# Step 3: Group by neighborhood and sum enrollment
school_totals = school_df.groupby("neighborhood")["total_students_enrolled"].sum().reset_index()

# Step 4: Normalize (more students = higher score)
school_totals["school_score"] = (
    (school_totals["total_students_enrolled"] - school_totals["total_students_enrolled"].min()) /
    (school_totals["total_students_enrolled"].max() - school_totals["total_students_enrolled"].min())
)

# Preview
school_totals.head()


FileNotFoundError: [Errno 2] No such file or directory: 'data/school_enrollment.csv'

In [6]:
# Step 1: Load the dataset
vacancy_df = pd.read_csv("data/neighborhood_profiles.csv")

# Step 2: Filter to neighborhoods only
vacancy_df = vacancy_df[vacancy_df["GeographyType"] == "neighborhood"]

# Step 3: Select needed columns
vacancy_data = vacancy_df[["NeighborhoodGroup", "Var_2022_vacancy_Per_2"]].copy()
vacancy_data.columns = ["neighborhood", "vacancy_percent"]

# Step 4: Normalize (lower vacancy = higher score)
vacancy_data["vacancy_score"] = 1 - (
    (vacancy_data["vacancy_percent"] - vacancy_data["vacancy_percent"].min()) /
    (vacancy_data["vacancy_percent"].max() - vacancy_data["vacancy_percent"].min())
)

# Preview
vacancy_data.head()


FileNotFoundError: [Errno 2] No such file or directory: 'data/neighborhood_profiles.csv'

In [8]:
# Step 1: Merge all datasets on "neighborhood"
merged = pd.merge(crime_counts[["neighborhood", "crime_score"]],
                  school_totals[["neighborhood", "school_score"]],
                  on="neighborhood", how="inner")

merged = pd.merge(merged,
                  vacancy_data[["neighborhood", "vacancy_score"]],
                  on="neighborhood", how="inner")

# Step 2: Calculate average of the three scores
merged["final_score"] = merged[["crime_score", "school_score", "vacancy_score"]].mean(axis=1)

# Step 3: Sort by final score
final_rankings = merged.sort_values("final_score", ascending=False).reset_index(drop=True)

# Show top 10
final_rankings.head(10)

NameError: name 'crime_counts' is not defined
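
Because the merges above use `how="inner"`, any neighborhood whose name is spelled differently in one dataset silently drops out of the ranking. A quick way to check for this, sketched here on made-up tables, is an outer merge with `indicator=True`, which adds a `_merge` column showing where each row came from. The two small frames and their values are hypothetical stand-ins for `crime_counts` and `school_totals`.

```python
import pandas as pd

# Hypothetical stand-ins for two of the per-neighborhood score tables.
left = pd.DataFrame({"neighborhood": ["A", "B", "C"], "crime_score": [1.0, 0.5, 0.0]})
right = pd.DataFrame({"neighborhood": ["A", "B", "D"], "school_score": [0.2, 1.0, 0.7]})

# An outer merge keeps every row; _merge records "both", "left_only", or "right_only".
check = pd.merge(left, right, on="neighborhood", how="outer", indicator=True)

# Rows that an inner merge would have dropped.
unmatched = check[check["_merge"] != "both"]
print(unmatched[["neighborhood", "_merge"]])
```

Here the unmatched rows are C (present only in the left table) and D (present only in the right table). Running the same check on the real tables would show which neighborhood names need to be reconciled before ranking.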

In [9]:
# Plot the ten highest-scoring neighborhoods as a horizontal bar chart
top10 = final_rankings.head(10)
plt.figure(figsize=(10, 6))
plt.barh(top10["neighborhood"], top10["final_score"])
plt.xlabel("Family-Friendliness Score")
plt.title("Top 10 Best Neighborhoods for Families in Pittsburgh")
plt.gca().invert_yaxis()  # put the highest score at the top
plt.tight_layout()
plt.show()

NameError: name 'final_rankings' is not defined
