# Readymade Data Module Assignment
## NYC 311 Service Request Analysis

**Student Name:** Nick Pisarczyk

**U-M Unique Name:** npisar

**Research Question:** State your chosen research question here

---

**BEFORE YOU START:**
1. Read the assignment instructions on Canvas carefully.
2. Make a copy of this notebook and work on your own copy.
3. Understand the dataset before you start cleaning and analyzing. NYC Open Data has a nice portal and a data dictionary for exploring their datasets.
4. NYC 311 is a **very large** dataset. Before fetching data from the portal or API, think about your research question, start with a small subset, and increase its size as you get more comfortable with the data. Generally, you do **not** need the entire dataset to answer your research question.
5. This notebook serves as a template. You can add cells or make adjustments as you see fit, but keep all the sections mentioned in the assignment instructions. Also format your notebook for readability.

## Data Statement

Describe your data source here. Include:
- Where you obtained the data (URL or API endpoint)
- What subset you're analyzing (dates, geography, etc.)
- Any filters or sampling you applied
- File size/number of records

## 1. Setup and Data Loading

In [None]:
# Import libraries
# You can import any libraries you may need
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

# Set visualization style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

# Display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

### Option A: Load from downloaded CSV file

**Note**: If you are running the notebook on a Colab kernel, you **cannot** directly import data from your own laptop. See the class repo README files for more details.

In [None]:
# Load your downloaded data
# df = pd.read_csv('311_Service_Requests.csv',
#                  parse_dates=['Created Date', 'Closed Date'],
#                  low_memory=False)
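
Because the full export is very large, one way to start small is to read only the columns you need and cap the number of rows. The sketch below is illustrative, not a required approach: it uses an in-memory CSV as a stand-in for the downloaded file, and both the column names and the date format are assumptions that you should check against your own file.

```python
import io
import pandas as pd

# Stand-in for the downloaded file (made-up rows); replace with your CSV path.
sample_csv = io.StringIO(
    "Unique Key,Created Date,Closed Date,Complaint Type,Borough\n"
    "1,01/05/2023 10:00:00 AM,01/05/2023 02:00:00 PM,Noise - Residential,BROOKLYN\n"
    "2,01/06/2023 08:30:00 AM,,Illegal Parking,QUEENS\n"
    "3,01/07/2023 11:15:00 PM,01/08/2023 01:15:00 AM,Noise - Residential,BROOKLYN\n"
)

# Read only the columns needed and at most `nrows` rows.
wanted_columns = ["Created Date", "Closed Date", "Complaint Type", "Borough"]
df_small = pd.read_csv(sample_csv, usecols=wanted_columns, nrows=2)

# Parse dates with an explicit format (assumed here); unparseable or
# missing values become NaT instead of raising an error.
date_format = "%m/%d/%Y %I:%M:%S %p"
for col in ["Created Date", "Closed Date"]:
    df_small[col] = pd.to_datetime(df_small[col], format=date_format, errors="coerce")

print(df_small.shape)
print(df_small.dtypes)
```

Once the analysis works on the small sample, raise `nrows` (or drop it) to scale up.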

### Option B: Load from Socrata API (recommended for smaller datasets)

In [None]:

# !pip install sodapy  # UNCOMMENT THIS LINE IF YOU NEED TO INSTALL sodapy

# from sodapy import Socrata

# # Example: Get up to 100,000 Brooklyn records created in 2023
# client = Socrata("data.cityofnewyork.us", None)
# results = client.get("erm2-nwe9",
#                      where="created_date >= '2023-01-01' AND created_date < '2024-01-01' AND borough = 'BROOKLYN'",
#                      limit=100000)
# df = pd.DataFrame.from_records(results)

# # Convert date columns
# df['created_date'] = pd.to_datetime(df['created_date'])
# df['closed_date'] = pd.to_datetime(df['closed_date'])

## 2. Data Description

You can describe the data in many ways. Here are some baseline requirements:
- Display basic information about the dataset (what are the relevant variables? What are their types? How many observations are there?)
- Conduct summary statistics of the relevant variables
- Check for missing values
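
As a minimal sketch of the three baseline checks above, the example below runs them on a tiny synthetic frame. The frame `df_demo` and its values are made up for illustration; swap in your own DataFrame and relevant columns.

```python
import pandas as pd

# Tiny synthetic frame standing in for the loaded 311 data (values are made up).
df_demo = pd.DataFrame({
    "created_date": pd.to_datetime(
        ["2023-01-05 10:00", "2023-01-06 08:30", "2023-01-07 23:15"]),
    "closed_date": pd.to_datetime(
        ["2023-01-05 14:00", None, "2023-01-08 01:15"]),
    "complaint_type": ["Noise - Residential", "Illegal Parking", "Noise - Residential"],
    "borough": ["BROOKLYN", "QUEENS", None],
})

# Basic information: number of observations, variables, and their types.
n_rows, n_cols = df_demo.shape
print(f"{n_rows} observations, {n_cols} variables")
print(df_demo.dtypes)

# Summary statistics: count/unique/top/freq for text columns, range for dates.
print(df_demo[["complaint_type", "borough"]].describe())
print(df_demo["created_date"].agg(["min", "max"]))

# Missing values: count and percentage per column.
missing = pd.DataFrame({
    "n_missing": df_demo.isna().sum(),
    "pct_missing": (df_demo.isna().mean() * 100).round(1),
})
print(missing)
```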

In [None]:
# You can have as many cells as you want

## 3. Data Cleaning

Document your cleaning decisions and rationale here

In [None]:
# Example cleaning steps (customize based on your needs)

# 1. Remove rows with missing essential data
# df_clean = df.dropna(subset=['created_date', 'complaint_type'])

# 2. Filter to specific time period if needed
# df_clean = df_clean[(df_clean['created_date'] >= '2023-01-01') &
#                     (df_clean['created_date'] < '2024-01-01')]

# 3. Create derived variables
# Example: Calculate response time
# df_clean['response_time_hours'] = (
#     (df_clean['closed_date'] - df_clean['created_date']).dt.total_seconds() / 3600
# )

# Example: Extract temporal features
# df_clean['hour'] = df_clean['created_date'].dt.hour
# df_clean['day_of_week'] = df_clean['created_date'].dt.dayofweek
# df_clean['month'] = df_clean['created_date'].dt.month

print(f"Original dataset: {len(df)} records")
# print(f"Cleaned dataset: {len(df_clean)} records")
# print(f"Removed: {len(df) - len(df_clean)} records ({((len(df) - len(df_clean))/len(df)*100):.1f}%)")
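
To see how the cleaning steps above behave, here is a hedged sketch that applies them to four synthetic records. The records and the names `df_raw` and `df_step` are made up. The final step, flagging requests whose close time precedes their creation time, is an extra check that is not part of the template; whether to drop, keep, or investigate such rows is a decision you should document.

```python
import pandas as pd

# Synthetic records stored as strings (values are made up).
df_raw = pd.DataFrame({
    "created_date": ["2023-03-01T09:00:00", "2023-03-02T12:00:00",
                     "2022-12-31T23:00:00", "2023-03-04T08:00:00"],
    "closed_date": ["2023-03-01T15:00:00", "2023-03-02T10:00:00",
                    "2023-01-01T05:00:00", None],
    "complaint_type": ["Noise - Residential", "Illegal Parking",
                       "HEAT/HOT WATER", None],
})

# Convert date strings to datetimes; invalid or missing values become NaT.
df_step = df_raw.copy()
for col in ["created_date", "closed_date"]:
    df_step[col] = pd.to_datetime(df_step[col], errors="coerce")

# 1. Remove rows with missing essential data.
df_step = df_step.dropna(subset=["created_date", "complaint_type"])

# 2. Keep only requests created in 2023.
in_2023 = (df_step["created_date"] >= "2023-01-01") & (df_step["created_date"] < "2024-01-01")
df_step = df_step[in_2023].copy()

# 3. Derive response time in hours.
df_step["response_time_hours"] = (
    (df_step["closed_date"] - df_step["created_date"]).dt.total_seconds() / 3600
)

# Extra check (assumption): flag rows closed before they were created.
df_step["negative_response"] = df_step["response_time_hours"] < 0

print(df_step[["created_date", "response_time_hours", "negative_response"]])
print(f"Kept {len(df_step)} of {len(df_raw)} records")
```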

## 4. Exploratory Data Analysis

Add narrative about what you're exploring, why, and what you've found.

In [None]:
# Example: Most common complaint types
# complaint_counts = df_clean['complaint_type'].value_counts().head(15)
# plt.figure(figsize=(12, 6))
# complaint_counts.plot(kind='barh')
# plt.xlabel('Number of Requests')
# plt.ylabel('Complaint Type')
# plt.title('Top 15 Most Common 311 Complaint Types')
# plt.tight_layout()
# plt.show()

In [None]:
# Example: Temporal patterns
# requests_by_month = df_clean.groupby('month').size()
# plt.figure(figsize=(12, 6))
# requests_by_month.plot(kind='bar')
# plt.xlabel('Month')
# plt.ylabel('Number of Requests')
# plt.title('311 Requests by Month')
# plt.xticks(rotation=0)
# plt.tight_layout()
# plt.show()

## 5. Research Question Analysis

This is the core of your assignment. Document your analytical approach here, and add cells as you see fit.

In [None]:
# Your focused analysis goes here
# This will vary significantly based on your research question. You can organize your analysis as you like. You can have as many cells as you want.

### Statistical Testing (if applicable)

In [None]:
# Example: Statistical tests
# from scipy import stats

# # t-test example
# group1 = df_clean[df_clean['borough'] == 'MANHATTAN']['response_time_hours'].dropna()
# group2 = df_clean[df_clean['borough'] == 'BRONX']['response_time_hours'].dropna()
# t_stat, p_value = stats.ttest_ind(group1, group2)
# print(f"t-statistic: {t_stat:.3f}")
# print(f"p-value: {p_value:.3f}")
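
By default, `stats.ttest_ind` assumes the two groups have equal variances. Durations such as response times are often right-skewed with unequal spread, so it can be worth comparing the result against Welch's t-test (`equal_var=False`) and a rank-based Mann-Whitney U test. The sketch below uses made-up lognormal samples for two hypothetical groups. It only illustrates the calls and is not a recommendation for any particular research question.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Made-up, right-skewed "response times" in hours for two hypothetical groups.
group_a = rng.lognormal(mean=2.0, sigma=1.0, size=200)
group_b = rng.lognormal(mean=2.3, sigma=1.2, size=250)

# Welch's t-test: compares means without assuming equal variances.
welch = stats.ttest_ind(group_a, group_b, equal_var=False)

# Mann-Whitney U: rank-based comparison that is less sensitive to outliers.
mwu = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"Welch t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
print(f"Mann-Whitney U = {mwu.statistic:.1f}, p = {mwu.pvalue:.4f}")
print(f"Medians: {np.median(group_a):.2f} vs {np.median(group_b):.2f}")
```

Reporting medians alongside the test results helps readers judge whether a statistically significant difference is also practically meaningful.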

## Key Visualizations

Create at least 3 polished visualizations that answer your research question

In [None]:
# Visualization 1
# Your code here

Interpretation of Visualization 1

In [None]:
# Visualization 2
# Your code here

Interpretation of Visualization 2

In [None]:
# Visualization 3
# Your code here

Interpretation of Visualization 3

## Written Summary

Summarize your key findings here. What patterns did you discover? What can you conclude?

### Research Question and Motivation

- Why is this question interesting? 
- What might we learn?

### Methods

Describe your data source, cleaning steps, and analytical approach

### Findings

Summarize key patterns and statistical results (refer to your key visualizations)

### Limitations

Discuss methodological limitations. What are the potential biases in 311 data? What alternative explanations exist for your findings? What can and cannot be concluded from this analysis?

### Ethical Considerations

Reflect on the ethical implications of using 311 data:
- Who is represented in this data? Who might be underrepresented?
- What are potential privacy concerns?
- How might this analysis be used or misused?
- What are the implications for equity and justice?

## AI Appendix (if applicable)

If you used AI during this assignment, explain:
1. what part of the work it was used for; 
2. what AI tools you used; 
3. the prompts you used; 
4. how you analyzed the AI work for accuracy; and, 
5. steps you took to rework and revise your final documents so that they were both factually accurate and reflected your own voice and style.


## Submission

Submit your assignment as an .ipynb file. Double-check it against the assignment instructions on Canvas.