# Project 1 Report Text Sections
---

## SECTION 1: Introduction 

```markdown
# NYC Housing & Neighborhood Factors - Exploratory Data Analysis

## Introduction

This analysis examines the relationship between housing prices and neighborhood characteristics in New York City. Understanding these relationships is crucial for real estate investors, policymakers, and urban planners seeking to identify factors that influence property values.

The analysis integrates multiple data sources to provide a comprehensive view of the NYC housing market:
- Housing value trends over 25+ years
- Crime patterns and public safety indicators
- Socioeconomic and demographic characteristics
- Economic indicators affecting affordability

By exploring these interconnected factors, we aim to uncover patterns and correlations that explain variation in housing prices across NYC neighborhoods.
```

---

## SECTION 2: Dataset Description 

```markdown
## Dataset Description

This analysis utilizes four primary datasets covering different aspects of NYC neighborhoods:

### 1. Housing Data (Zillow Home Value Index)
- **Source**: Zillow Research Data
- **Coverage**: 26,307 ZIP-code-level records in the source file (one row per ZIP code), later filtered to State = "NY"
- **Time Period**: January 2000 to December 2025 (312 months)
- **Key Variables**: ZIP code, city, metro area, monthly home values
- **Description**: The Zillow Home Value Index (ZHVI) represents the typical home value for a given geographic area, smoothed and seasonally adjusted to show market trends over time.

### 2. Crime Data (NYPD Complaint Records)
- **Source**: NYC Open Data - NYPD Complaint Database
- **Coverage**: 577,674 crime incidents
- **Time Period**: 2023 (recent year)
- **Key Variables**: Complaint number, date, borough, offense type, law category
- **Description**: Official NYPD records of criminal complaints reported across all five NYC boroughs, providing insights into neighborhood safety and crime patterns.

### 3. Demographics Data (Census ACS)
- **Source**: U.S. Census Bureau American Community Survey
- **Coverage**: 33,772 census areas
- **Time Period**: 2023 5-year estimates
- **Key Variables**: Population, median income, age, education levels, employment rates
- **Description**: Socioeconomic and demographic characteristics of NYC neighborhoods from the Census Bureau's annual community survey.

### 4. Economic Indicators (FRED)
- **Source**: Federal Reserve Economic Data
- **Coverage**: 197 time points
- **Time Period**: January 2023 to present
- **Key Variables**: 30-year mortgage rates, Federal Funds rate
- **Description**: Macroeconomic indicators that influence housing affordability and market conditions, including interest rates and monetary policy measures.
```

---

## SECTION 3: Data Overview Findings 

```markdown
## Initial Data Exploration Findings

### Housing Data Characteristics
The housing subset is selected with State = "NY". This filter keeps every ZIP code in New York State, not only the five boroughs, so the housing figures in this report describe the state-level subset unless a further city or county filter is applied. The dataset contains 321 columns: geographic identifier columns followed by 312 monthly home values from January 2000 through December 2025. This longitudinal structure allows us to track price trends over more than two decades.

### Crime Data Characteristics
The crime dataset contains 577,674 complaint records with 36 variables describing each incident. Key fields include offense descriptions, borough locations, law categories (felony, misdemeanor, violation), and temporal information. This substantial sample size provides robust insights into crime patterns across NYC neighborhoods.

### Demographics Data Characteristics
The demographics dataset covers 33,772 geographic areas with 12 socioeconomic variables including population size, median household income, median age, educational attainment (bachelor's degrees), and employment statistics. This granular data enables neighborhood-level analysis of demographic factors.

### Economic Data Characteristics
The economic indicators dataset tracks two critical metrics over time: the 30-year fixed mortgage rate (MORTGAGE30US) and the Federal Funds rate (FEDFUNDS). These indicators reflect broader economic conditions affecting housing affordability and demand.
```
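The wide layout described above (identifier columns followed by one column per month) is awkward for plotting and grouping. A minimal sketch of reshaping it to long format follows; the toy frame and its column names (`RegionName`, `State`, month-end date headers) are assumptions for illustration, not the verified schema of the project file.

```python
import pandas as pd

# Toy stand-in for the housing file: identifier columns plus one column per month.
# Column names are assumed for illustration only.
wide = pd.DataFrame({
    "RegionName": ["11385", "10001"],
    "State": ["NY", "NY"],
    "2000-01-31": [200000.0, 450000.0],
    "2000-02-29": [201000.0, 452000.0],
})

id_cols = ["RegionName", "State"]

# Turn every non-identifier column into (date, home_value) rows.
long = wide.melt(id_vars=id_cols, var_name="date", value_name="home_value")
long["date"] = pd.to_datetime(long["date"])
long = long.sort_values(["RegionName", "date"]).reset_index(drop=True)

print(long)
```

In long format, each row is one ZIP code in one month, which makes filtering by date range or computing per-ZIP changes a single `groupby`.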

---

## SECTION 4: Visualization Interpretations 

### Chart 1: Housing Price Trends
```markdown
### Analysis: Housing Price Trends Over Time

This visualization displays housing value trajectories for five sample NYC ZIP codes over approximately 300 months (2000-2025). 

**Key Observations:**
- **Strong upward trend**: All ZIP codes show substantial price appreciation from 2000 to 2025
- **2008 Financial Crisis impact**: A notable dip is visible around the 100-month mark (2008-2009), reflecting the housing market crash
- **Recovery period**: Prices recovered and exceeded pre-crisis levels by approximately month 150 (2012-2013)
- **Price variation**: ZIP codes show different price levels, with the orange line (ZIP 11385) consistently commanding the highest values (reaching ~$850,000)
- **Recent acceleration**: The steepest growth appears in the most recent period (months 250-300), corresponding to 2020-2025

The diverse price trajectories suggest that location within NYC significantly affects both absolute home values and appreciation rates.
```
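The month numbers quoted in the Chart 1 text can be checked against calendar dates. The sketch below assumes the x-axis is a 0-based month offset with 0 standing for January 2000; if the chart starts counting at 1, every result shifts back by one month.

```python
import pandas as pd

def month_index_to_period(index, start="2000-01"):
    """Map a 0-based month offset on the x-axis to a calendar month."""
    return pd.Period(start, freq="M") + index

# Month marks mentioned in the chart interpretation.
for idx in (100, 150, 250, 300):
    print(idx, month_index_to_period(idx))
```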

### Chart 2: Price Distribution
```markdown
### Analysis: Current Home Value Distribution

This histogram reveals the distribution of current NYC home values with key statistical insights:

**Key Observations:**
- **Right-skewed distribution**: Most ZIP codes cluster in the lower price ranges, with a long tail of high-value areas
- **Median value**: $306,920 (shown by red dashed line) is the typical home value across ZIP codes in the filtered data
- **Concentration**: The largest group (more than 500 ZIP codes) falls in the $200,000-$400,000 range
- **Outliers**: A small number of ZIP codes have typical values above $1 million, with some reaching $6 million
- **Mean vs. median**: The median is substantially lower than the mean ($458,458), indicating that high-value outliers pull the average upward

This distribution pattern is typical of urban housing markets where a subset of premium neighborhoods command significantly higher prices.
```
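The gap between mean and median reported for Chart 2 is what a right-skewed distribution produces. The sketch below illustrates this on synthetic log-normal values (not the project data; the distribution parameters are arbitrary assumptions) and computes the mean-to-median ratio from the two figures quoted in the report.

```python
import numpy as np

# Synthetic right-skewed "home values"; parameters are arbitrary, for illustration.
rng = np.random.default_rng(0)
values = rng.lognormal(mean=12.6, sigma=0.6, size=2000)

mean_value = values.mean()
median_value = np.median(values)
print(f"synthetic mean:   {mean_value:,.0f}")
print(f"synthetic median: {median_value:,.0f}")

# Ratio implied by the figures quoted in the report text.
reported_ratio = 458_458 / 306_920
print(f"reported mean / median: {reported_ratio:.2f}")
```

A ratio well above 1 is a quick signal that the median is the safer summary of a "typical" value.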

### Chart 3: Crime Types
```markdown
### Analysis: Most Common Crime Types

This horizontal bar chart ranks the 15 most frequent crime categories in NYC:

**Key Observations:**
- **Petit Larceny dominates**: With over 110,000 incidents, petit larceny (theft of property valued at $1,000 or less) is by far the most common crime
- **Harassment ranks second**: Approximately 95,000 harassment incidents were reported
- **Property crimes prevail**: The top crimes are predominantly property-related (larceny, burglary, theft-fraud, motor vehicle theft), although harassment, the second-ranked category, is not a property offense
- **Violent crimes present**: Assault and felony assault appear in the top 15, though at lower frequencies than property crimes
- **Quality of life offenses**: Categories like "offenses against public administration" and "vehicle and traffic laws" indicate substantial enforcement of quality-of-life violations

Understanding these crime patterns is important for neighborhood safety perceptions, which can influence housing demand and prices.
```
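A ranking like the one in Chart 3 can be produced with `value_counts`, and dividing by the total shows what share of all complaints each category represents. The sketch below uses a toy frame; the column name `OFNS_DESC` is assumed to be the offense-description field and should be checked against the actual file.

```python
import pandas as pd

# Toy complaint records; the real file has one row per complaint.
# "OFNS_DESC" is an assumed column name for the offense description.
complaints = pd.DataFrame({
    "OFNS_DESC": [
        "PETIT LARCENY", "HARASSMENT", "PETIT LARCENY",
        "BURGLARY", "PETIT LARCENY", "HARASSMENT",
    ]
})

# Count complaints per offense type and keep the 15 most frequent.
top = complaints["OFNS_DESC"].value_counts().head(15)

# Share of all complaints, useful before claiming a category is a "majority".
share = top / len(complaints)

print(pd.DataFrame({"count": top, "share": share.round(3)}))
```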

### Chart 4: Crime by Borough
```markdown
### Analysis: Crime Distribution Across NYC Boroughs

This bar chart compares total crime incidents across NYC's five boroughs:

**Key Observations:**
- **Brooklyn leads**: Approximately 162,000 incidents, making it the borough with the highest crime volume
- **Similar levels in major boroughs**: Manhattan (~136,000), Bronx (~130,000), and Queens (~124,000) show relatively comparable crime totals
- **Staten Island significantly lower**: Only ~24,000 incidents, roughly one-seventh of Brooklyn's total
- **Null values minimal**: A small category labeled "(null)" suggests high data quality with few missing borough identifications

**Important Context**: These raw counts don't account for population differences. Brooklyn and Queens have larger populations than Manhattan, so per-capita crime rates may tell a different story. Staten Island's low count aligns with its smaller, more suburban character.

Complaint volume varies widely by borough, so borough (or a finer geography) should be carried into any comparison of safety and housing values.
```
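The per-capita caveat for Chart 4 can be made concrete with a short calculation. The sketch below combines the approximate borough counts quoted in the chart text with 2020 Census borough populations; the population figures are entered by hand as an assumption and are not part of the project datasets.

```python
import pandas as pd

# Approximate complaint counts as read from Chart 4.
counts = pd.Series({
    "Brooklyn": 162_000,
    "Manhattan": 136_000,
    "Bronx": 130_000,
    "Queens": 124_000,
    "Staten Island": 24_000,
})

# 2020 Census resident populations (hand-entered assumption, not project data).
population = pd.Series({
    "Brooklyn": 2_736_074,
    "Manhattan": 1_694_251,
    "Bronx": 1_472_654,
    "Queens": 2_405_464,
    "Staten Island": 495_747,
})

# Complaints per 1,000 residents, highest first.
rate_per_1000 = (counts / population * 1000).sort_values(ascending=False)
print(rate_per_1000.round(1))
```

On these rough inputs the ordering changes: the Bronx and Manhattan rank above Brooklyn per resident. Resident population is itself an imperfect denominator, since commuters and visitors add to Manhattan's daytime population.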

### Chart 5: Economic Trends
```markdown
### Analysis: Economic Indicators Over Time

This dual-line chart tracks two critical economic metrics from 2023 to 2026:

**Key Observations:**

**Mortgage Rates (Orange line - MORTGAGE30US):**
- **Peak period**: Rates peaked around 7.5% in the second half of 2023
- **Declining trend**: Gradual decrease to approximately 6.0-6.2% by early 2026
- **Volatility**: Notable fluctuations throughout the period, likely reflecting shifting expectations about Federal Reserve policy and longer-term bond yields
- **Affordability impact**: The decline from 7.5% to 6% represents significant improvement in monthly payment affordability

**Federal Funds Rate (Blue dots - FEDFUNDS):**
- **Peak around 5.3%**: Reached highest point in mid-2023
- **Recent decline**: Decreased to approximately 3.5-3.8% by 2026
- **Policy shift**: The downward trend suggests a transition from aggressive interest rate hikes to a more accommodative monetary policy

**Housing Market Implications:**
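- Mortgage rates easing from their 2023 peak may have supported demand late in the period, but this series starts in 2023 and cannot explain the price acceleration that Chart 1 shows beginning around 2020
- On the approximate readings above, the gap between mortgage and Fed Funds rates stayed near 2 to 2.5 percentage points (about 7.5% vs. 5.3% at the peak, about 6.1% vs. 3.6% in early 2026), so mortgage rates fell somewhat less than the policy rate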
- These favorable financing conditions may have supported continued demand despite high absolute price levels
```
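The affordability effect of the rate decline noted for Chart 5 can be illustrated with the standard fixed-rate amortization formula, payment = P * r / (1 - (1 + r)^-n), where r is the monthly rate and n the number of monthly payments. The $400,000 loan amount below is a hypothetical chosen for illustration, not a figure from the data.

```python
def monthly_payment(principal, annual_rate_pct, years=30):
    """Fixed-rate monthly payment (principal and interest only)."""
    r = annual_rate_pct / 100 / 12   # monthly interest rate
    n = years * 12                   # number of monthly payments
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)

loan = 400_000  # hypothetical loan amount, for illustration only

payment_high = monthly_payment(loan, 7.5)
payment_low = monthly_payment(loan, 6.0)

print(f"7.5%: ${payment_high:,.2f} per month")
print(f"6.0%: ${payment_low:,.2f} per month")
print(f"difference: ${payment_high - payment_low:,.2f} per month")
```

For this hypothetical loan the payment falls by roughly $400 a month, before taxes and insurance.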

### Chart 6: Box Plot
```markdown
### Analysis: Home Value Distribution Summary

This box plot provides a statistical summary of NYC home value distribution:

**Key Observations:**
- **Median (orange line)**: Approximately $300,000, confirming our earlier histogram analysis
- **Interquartile range (box)**: The middle 50% of values fall between roughly $200,000 and $500,000
- **Outliers (circles above)**: Numerous high-value properties extend well beyond the typical range
- **Extreme outliers**: Several properties valued at $3-6 million represent ultra-premium real estate
- **Compact lower range**: The lower whisker is relatively short, indicating few properties below $100,000
- **Long upper tail**: The extensive outlier distribution confirms significant price stratification

This visualization reinforces that while most NYC homes cluster in a moderate price range, a substantial luxury market segment exists, driving up average values and creating affordability challenges.
```
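The points drawn as circles in a box plot are conventionally the values beyond 1.5 times the interquartile range from the box, which is also matplotlib's default whisker setting. The sketch below applies that rule to a small set of made-up values, not the project data.

```python
import numpy as np

# Made-up home values for illustration.
values = np.array(
    [150_000, 200_000, 250_000, 300_000, 350_000, 500_000, 3_000_000],
    dtype=float,
)

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Conventional 1.5 * IQR fences.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = values[(values < lower_fence) | (values > upper_fence)]
print(f"Q1={q1:,.0f}  median={median:,.0f}  Q3={q3:,.0f}  IQR={iqr:,.0f}")
print("outliers:", outliers)
```

Listing the flagged values this way gives the preprocessing step a concrete set of records to review when deciding how to treat extreme values.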

---

## SECTION 5: Key Findings Summary 

```markdown
## Summary of Key Findings

### 1. Housing Market Trends
- **Long-term appreciation**: Housing values in the filtered data have increased substantially over 25 years, with median values rising from approximately $175,000 in 2000 to $307,000 in 2025 (about 75% in nominal terms)
- **Market resilience**: Following the 2008 financial crisis decline, the market fully recovered and continued upward growth
- **Recent acceleration**: 2020-2025 showed particularly strong price growth despite economic uncertainties
- **Geographic variation**: Significant price differences exist across ZIP codes, with some areas commanding 2-3x the values of others

### 2. Crime and Safety Patterns
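- **High volume of incidents**: 577,674 complaints were recorded for 2023, enough to support borough-level and offense-level breakdowns
- **Most common offenses**: Petit larceny (over 110,000) and harassment (about 95,000) are the two most frequent categories; property offenses such as larceny, burglary, and motor vehicle theft make up much of the rest of the top 15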
- **Borough differences**: Brooklyn leads in total incidents, while Staten Island shows significantly lower crime volumes
- **Implications for housing**: Crime patterns likely influence neighborhood desirability and property values, though this relationship requires further statistical analysis

### 3. Economic Context
- **Improving affordability conditions**: Mortgage rates declined from about 7.5% to about 6.0% between 2023 and early 2026
- **Federal Reserve policy shift**: The Fed Funds rate decreased from 5.3% to under 4%, signaling a more accommodative stance
- **Market stimulus**: Lower borrowing costs may have supported housing demand and prices in 2025, though this analysis does not test that link
- **Rate volatility**: Fluctuations in economic indicators reflect ongoing policy adjustments and market uncertainties

### 4. Demographic Landscape
- **Extensive coverage**: Analysis includes 33,772 geographic areas with detailed socioeconomic data
- **Multiple dimensions tracked**: Income, education, employment, and age demographics available for neighborhood characterization
- **Foundation for further analysis**: This rich demographic data enables investigation of relationships between community characteristics and housing values

---

## Challenges Faced and Solutions

### Data Quality Issues
**Challenge**: Initial parquet files were corrupted and could not be read using standard pandas methods.
**Solution**: Coordinated with team member to re-export data in CSV format, which loaded successfully.

### Data Size and Processing
**Challenge**: Crime dataset contains 577,674 records, requiring careful memory management and processing time.
**Solution**: Utilized efficient pandas operations and focused visualizations on aggregated summaries rather than individual records.

### Geographic Alignment
**Challenge**: Housing data is keyed by ZIP code, while the Census demographics are reported for ZIP Code Tabulation Areas (ZCTAs), which approximate ZIP codes but do not match them one-to-one, creating potential matching issues.
**Solution**: Noted for future work - cleaning and preprocessing team will need to address geographic standardization.

### Time Period Inconsistencies
**Challenge**: Different datasets cover different time periods (housing: 2000-2025, crime: 2023, demographics: 2023 5-year estimates, economic indicators: 2023 onward).
**Solution**: Focused analysis on recent periods where data overlap exists; noted temporal limitations for interpretation.

---

## Future Recommendations

### For Data Cleaning and Preprocessing Team:
1. **Geographic standardization**: Convert all geographic identifiers to a common standard (recommend ZIP codes)
2. **Temporal alignment**: Filter housing data to match crime and demographics time periods for valid comparisons
3. **Missing value treatment**: Systematic approach to handling missing demographic and crime data
4. **Outlier handling**: Decide whether extreme-value properties should be retained or treated separately

### For Feature Engineering Team:
1. **Crime rate normalization**: Calculate per-capita or per-area crime rates to enable fair borough comparisons
2. **Price change metrics**: Compute year-over-year and overall appreciation rates for each ZIP code
3. **Composite indicators**: Create neighborhood quality scores combining crime, demographics, and amenities
4. **Interaction features**: Explore relationships between median income, crime rates, and housing values

### For Future Analysis:
1. **Statistical testing**: Formal correlation and regression analysis to quantify relationships
2. **Spatial analysis**: Mapping techniques to visualize geographic patterns
3. **Time series forecasting**: Predict future price trends based on historical patterns
4. **Segmentation**: Cluster analysis to identify distinct neighborhood types

---

## Data Sources and Methodology

### Data Acquisition
All datasets were acquired from public sources and loaded with the Python pandas library:
- Housing: Downloaded from Zillow Research Data portal
- Crime: Accessed via NYC Open Data API
- Demographics: Retrieved from Census Bureau ACS 5-year estimates
- Economic: Obtained from FRED API

### Analysis Environment
- **Platform**: Jupyter Notebook
- **Language**: Python 3.x
- **Libraries**: pandas, numpy, matplotlib, seaborn
- **Visualization approach**: Created 6 distinct charts to explore different data dimensions

### Analytical Approach
This exploratory analysis focused on:
1. Understanding data structure and quality
2. Calculating summary statistics
3. Visualizing distributions and trends
4. Identifying patterns and anomalies
5. Generating insights for downstream modeling
```
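The price change metrics recommended for feature engineering (year-over-year and overall appreciation) can be sketched as follows. The compound annual growth rate uses the approximate 2000 and 2025 median values quoted in the summary; the monthly series is synthetic, built only to show the `pct_change(12)` pattern.

```python
import pandas as pd

def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1 / years) - 1

# Approximate medians quoted in the summary, 25 years apart.
overall = cagr(175_000, 307_000, 25)
print(f"overall CAGR: {overall:.2%}")

# Synthetic monthly series for one ZIP code (0.5% growth per month).
months = pd.period_range("2024-01", periods=24, freq="M")
prices = pd.Series([300_000 * 1.005 ** i for i in range(24)], index=months)

# Year-over-year change: compare each month with the same month a year earlier.
yoy = prices.pct_change(12)
print(yoy.dropna().round(4).tail(3))
```

The first 12 months have no year-over-year value because there is no earlier observation to compare against. For a multi-ZIP long-format table, the same `pct_change(12)` would be applied within a `groupby` on the ZIP code column.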

---

## SECTION 6: Team Contributions 

```markdown
## Team Member Contributions

### [Xiao Teng] - Exploratory Data Analysis
- Loaded and validated all four datasets (housing, crime, demographics, economic indicators)
- Performed comprehensive exploratory data analysis
- Created 6 visualizations examining housing trends, crime patterns, and economic conditions
- Generated summary statistics and identified key patterns in the data
- Documented findings and insights for downstream analysis phases
- Identified data quality issues and recommended preprocessing steps

### [Teammate 1 Name] - Data Acquisition
- Developed data collection scripts for multiple sources
- Acquired housing data from Zillow Research portal
- Retrieved crime records via NYC Open Data API
- Obtained demographic data from Census Bureau
- Collected economic indicators from FRED database
- Documented data acquisition methodology

### [Teammate 2 Name] - Data Cleaning and Preprocessing
- [To be completed by teammate]

### [Teammate 3 Name] - Feature Engineering
- [To be completed by teammate]
```