# Project: Investigating New Orleans 311 Data

## Student Name
Syeda Mah Noor Asad

## Webpage
https://syedamahnoorasad.github.io/Investigating-Nola-311-Data/

## Dataset Choice
I am using the NOLA 311 Calls/Requests dataset covering 2012-2026. The dataset consists of records created when residents of New Orleans submit non-emergency problems to the city government. The data is publicly available at: https://data.nola.gov/City-Administration/311-OPCD-Calls-2012-Present-/2jgv-pqrq/data_preview. I downloaded the data in CSV format and import it into a DataFrame using pandas.

I chose this dataset for these reasons:
1. It relates to New Orleans, so it hits close to home.
2. The dataset is massive: at the time of download it had over 900,000 records.
3. It is generally fairly organized but has issues with consistency, tidiness, and missing fields, which make it a great candidate for pre-processing.
4. The data covers several categories of reported issues, such as sewerage, parking, accessibility, and electricity. These sub-categories can be relevant and interesting.
5. It is real, open-government data that can lead to practical insights into real-world trends.
6. My own PhD research is based around civic data, and this dataset is closely aligned with it. Investigating data like this will be helpful for me.
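
The consistency and missing-field issues mentioned in reason 3 can be quantified before any cleaning. The sketch below is a minimal illustration on a tiny hand-made sample, not the full file: the column names mirror the preview of the CSV shown later in this document, and it assumes that a `(0.0, 0.0)` coordinate pair is a placeholder for a missing location.

```python
import pandas as pd

# Tiny hand-made sample; column names mirror the 311 export, rows are illustrative only.
sample = pd.DataFrame({
    "Request Type": ["Trash/Recycling", "Roads and Streets", "Tax and Revenue"],
    "Address": ["2608 Magnolia St", "725 Voisin St", None],
    "Date Closed": ["11/02/2024 04:09:40 AM", None, "02/22/2022 01:09:55 AM"],
    "Longitude": [-90.09129, -90.108963, 0.0],
    "Latitude": [29.940758, 29.988242, 0.0],
})

# Share of missing values per column, largest first.
missing_share = sample.isna().mean().sort_values(ascending=False)

# Assumption: (0.0, 0.0) is a placeholder, not a real New Orleans location.
placeholder_coords = (sample["Longitude"] == 0.0) & (sample["Latitude"] == 0.0)

print(missing_share)
print("Rows with placeholder coordinates:", int(placeholder_coords.sum()))
```

Running the same two checks on the full DataFrame would show which columns need imputation, filtering, or a documented exclusion rule.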

## Project Plan
For this project, I want to build a public, end-to-end walkthrough that starts with data extraction, continues through cleaning and exploratory analysis, and ends with a clear managerial insight. I am currently considering three realistic datasets and will finalize one after deeper quality checks.


Question to answer: *What are the peak demand windows by station cluster, and where should operators rebalance bikes to reduce stockouts?*


## Workflow

### Import data from the CSV file

In [5]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme(style='whitegrid')

In [6]:
# Import the data from the dataset CSV file
nola_311 = pd.read_csv('311_OPCD_Calls_2012-Present.csv')

# Display the first few rows of the dataset to understand its structure
nola_311.head()

| Unnamed: 0 | Service Request # | Request Type | Request Reason | Date Created | Date Modified | Date Closed | Request Status | Responsible Agency | Address | Council District | Status | Contractor | Contractor Action | RowID | X | Y | Longitude | Latitude | Location |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2021-847416 | Mayor's Request | Requests to the Mayor | 12/10/2021 09:22:27 PM | 07/08/2022 02:30:48 PM | 07/08/2022 09:30:48 AM | Closed | Executive Office of the Mayor | | | Resolved | | | 847416 | | | 0.0 | 0.0 | (0.0, 0.0) |
| 1 | 2022-858955 | Tax and Revenue | Occupational License Tax | 02/04/2022 03:39:45 PM | 02/22/2022 07:09:55 AM | 02/22/2022 01:09:55 AM | Closed | Bureau of Revenue | | | Resolved | | | 858955 | | | 0.0 | 0.0 | (0.0, 0.0) |
| 2 | 2024-1145120 | Traffic Safety | Request for Traffic Calming due to Speeding | 10/30/2024 12:18:29 PM | 12/22/2024 04:52:36 PM | | Pending | Department of Public Works | 1 Bamboo Rd | A | Pending | | | 1145120 | 3663515.88941 | 539789.54625 | -90.124582 | 29.978768 | (29.978768001649772, -90.12458247242685) |
| 3 | 2024-1145799 | Trash/Recycling | Missed Trash Pick-Up | 11/02/2024 07:58:56 AM | 12/22/2024 04:51:17 PM | 11/02/2024 04:09:40 AM | Closed | Department of Sanitation | 2608 Magnolia St | B | Invalid Request | IV Waste | Invalid Request | 1145799 | 3674205.35803 | 526080.457854 | -90.09129 | 29.940758 | (29.940758205316882, -90.091290391251) |
| 4 | 2024-1145838 | Roads and Streets | Push-up/Pavement Expansion | 11/02/2024 03:10:10 PM | 12/22/2024 04:48:43 PM | | Pending | Department of Public Works | 725 Voisin St | A | Pending | | | 1145838 | 3668424.02055 | 543287.21947 | -90.108963 | 29.988242 | (29.988242136809447, -90.10896265604138) |
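
The date columns in the preview above look like strings in `MM/DD/YYYY hh:mm:ss AM/PM` form, so they would need parsing before any time-based analysis. The following is a minimal sketch on three rows copied from the preview. It assumes the rest of the file follows the same format, and `days_to_close` and `suspicious` are hypothetical names introduced only for this illustration.

```python
import pandas as pd

# Three rows copied from the preview; the full file is assumed to share this layout.
preview = pd.DataFrame({
    "Request Type": ["Mayor's Request", "Traffic Safety", "Trash/Recycling"],
    "Date Created": ["12/10/2021 09:22:27 PM", "10/30/2024 12:18:29 PM", "11/02/2024 07:58:56 AM"],
    "Date Closed": ["07/08/2022 09:30:48 AM", None, "11/02/2024 04:09:40 AM"],
})

# Assumed timestamp format, inferred from the preview rows only.
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"
for col in ["Date Created", "Date Closed"]:
    # errors="coerce" turns unparseable or missing values into NaT instead of raising.
    preview[col] = pd.to_datetime(preview[col], format=DATE_FORMAT, errors="coerce")

# Time from creation to closure in days; NaN where the request is still open.
elapsed = preview["Date Closed"] - preview["Date Created"]
preview["days_to_close"] = elapsed.dt.total_seconds() / 86400

# Flag rows closed before they were created so they can be inspected, not silently used.
suspicious = preview["days_to_close"] < 0

print(preview[["Request Type", "days_to_close"]])
print("Rows closed before creation:", int(suspicious.sum()))
```

One of the preview rows has a `Date Closed` earlier than its `Date Created`, which is why the sketch flags negative durations. The cause is not stated in the data, so such rows are worth checking rather than correcting by assumption.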


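In [None]:
# Interesting stat (draft dataset): total net sales by region, highest first.
# Assumes tidy_df is the cleaned draft DataFrame with 'region' and 'net_sales' columns.
region_sales = (
    tidy_df.groupby('region', as_index=False)['net_sales']
    .sum()
    .sort_values('net_sales', ascending=False)
)
top_region = region_sales.iloc[0]
print(f"Top region by net sales: {top_region['region']} (${top_region['net_sales']:.2f})")
region_sales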

In [None]:
# Graph: monthly net sales trend
monthly = tidy_df.groupby('month', as_index=False)['net_sales'].sum()

plt.figure(figsize=(7, 4))
ax = sns.lineplot(data=monthly, x='month', y='net_sales', marker='o', linewidth=2.5, color='#1f77b4')
ax.set_title('Monthly Net Sales (Draft Dataset)')
ax.set_xlabel('Month')
ax.set_ylabel('Net Sales ($)')
plt.tight_layout()
plt.show()

## ETL Notes and Challenges
- Product strings had combined information (category + item), so they were split into separate tidy columns.
- Return records needed a business rule to avoid overstating revenue; this draft sets returned transactions to zero net sales.
- Dates were parsed and standardized to monthly periods for trend analysis.
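
The three steps above can be sketched on a tiny stand-in for the draft dataset. Only `region`, `month`, `net_sales`, and `tidy_df` appear in the draft code. The raw column names (`order_date`, `product`, `gross_sales`, `returned`), the `" - "` separator, and all values are hypothetical and chosen purely for illustration.

```python
import pandas as pd

# Hypothetical raw rows standing in for the draft dataset.
raw = pd.DataFrame({
    "order_date": ["2024-01-15", "2024-01-28", "2024-02-03"],
    "region": ["North", "South", "North"],
    "product": ["Office - Chair", "Tech - Monitor", "Office - Desk"],
    "gross_sales": [120.0, 300.0, 250.0],
    "returned": [False, True, False],
})

tidy_df = raw.copy()

# 1. Split the combined product string into separate category and item columns.
tidy_df[["category", "item"]] = tidy_df["product"].str.split(" - ", n=1, expand=True)

# 2. Business rule from the notes: returned transactions count as zero net sales.
tidy_df["net_sales"] = tidy_df["gross_sales"].where(~tidy_df["returned"], 0.0)

# 3. Parse dates and standardize them to monthly periods (stored as 'YYYY-MM' strings).
tidy_df["month"] = pd.to_datetime(tidy_df["order_date"]).dt.to_period("M").astype(str)

print(tidy_df[["region", "category", "item", "month", "net_sales"]])
```

Storing the month as a string is one possible choice that keeps the column easy to plot. Keeping it as a pandas `Period` would also work if date arithmetic is needed later.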

In the final project, I will apply the same workflow to the chosen real dataset, add more robust validation checks, and extend the analysis with deeper comparisons (category-level seasonality, return-rate risk, and operational recommendations).