
# Day 04 – Structured vs Unstructured Data
## Analytics Cookbook using NYC 311 Service Requests Dataset

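This notebook uses NYC 311 service requests to show how structured fields (timestamps, complaint types, boroughs) and unstructured text (complaint descriptors) coexist in a real operational system, and how combining the two leads to decision-ready analytics.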

Dataset: NYC 311 Service Requests (Kaggle)


## 1. Environment Setup

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```


## 2. Load the Dataset

```python
# low_memory=False parses the file in one pass, avoiding mixed-type inference (DtypeWarning)
df = pd.read_csv("311_Service_Requests.csv", low_memory=False)
df.head()
```
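If the 311 extract is large, one option is to load only the columns analyzed later by passing `usecols` to `pd.read_csv`. The sketch below assumes the export carries the column names used in this notebook, and it runs against a tiny in-memory stand-in (`sample_csv`, made-up rows) rather than the real file.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for "311_Service_Requests.csv" (made-up rows, illustration only)
sample_csv = io.StringIO(
    "Unique Key,Created Date,Complaint Type,Borough,Descriptor,Agency\n"
    "1,01/05/2015 10:15:00 PM,Noise - Residential,BROOKLYN,Loud Music/Party,NYPD\n"
    "2,01/06/2015 08:30:00 AM,HEATING,BRONX,HEAT,HPD\n"
)

# Read only the four columns the notebook analyzes; the others are never loaded
cols = ["Created Date", "Complaint Type", "Borough", "Descriptor"]
df_small = pd.read_csv(sample_csv, usecols=cols, low_memory=False)
print(df_small.columns.tolist())
```

Note that `usecols` keeps the file's column order, not the order of the list, so select `df_small[cols]` afterwards if order matters.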


## 3. Identify Structured and Unstructured Fields

```python
# List every column before deciding which are structured fields and which hold free text
df.columns
```
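`df.columns` only lists names. A quick per-column profile can inform the structured vs unstructured call: a low distinct count suggests a categorical code, and a higher average word count suggests free text. The helper `profile_columns` and the sample rows below are hypothetical, and the signals are rough hints, not a rule.

```python
import pandas as pd


def profile_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: dtype, distinct-value count and mean word count per column."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_unique": frame.nunique(dropna=True),
        # Average number of whitespace-separated tokens in the non-missing values
        "mean_words": frame.apply(
            lambda s: s.dropna().astype(str).str.split().str.len().mean()
        ),
    })


# Made-up rows shaped like the 311 columns
sample = pd.DataFrame({
    "Created Date": ["01/05/2015 10:15:00 PM", "01/06/2015 08:30:00 AM", "01/06/2015 09:00:00 AM"],
    "Borough": ["BROOKLYN", "BRONX", "BROOKLYN"],
    "Descriptor": ["Loud Music/Party", "HEAT", "Loud Talking"],
})
profile = profile_columns(sample)
print(profile)
```

On the full dataset, running the profile on a random sample of rows keeps it fast.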


## 4. Select Relevant Columns

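```python
# Structured: Created Date, Complaint Type, Borough; unstructured: Descriptor
cols = ["Created Date", "Complaint Type", "Borough", "Descriptor"]
df = df[cols].copy()  # explicit copy avoids SettingWithCopyWarning when columns are added later
df.head()
```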


## 5. Prepare Structured Data

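```python
# Parse timestamps; unparseable values become NaT and are dropped with rows lacking a complaint type
df["Created Date"] = pd.to_datetime(df["Created Date"], errors="coerce")
df = df.dropna(subset=["Created Date", "Complaint Type"])
# Derive calendar fields for trend analysis
df["Year"] = df["Created Date"].dt.year
df["Month"] = df["Created Date"].dt.to_period("M")
df.head()
```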
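The `Month` column derived above can drive a trend view. A minimal sketch, on made-up rows standing in for the prepared frame, counts requests per month and complaint type and fills absent combinations with zero.

```python
import pandas as pd

# Made-up rows standing in for the prepared 311 frame
df_demo = pd.DataFrame({
    "Created Date": pd.to_datetime(
        ["2015-01-05", "2015-01-20", "2015-02-03", "2015-02-10", "2015-02-28"]
    ),
    "Complaint Type": ["HEATING", "HEATING", "Noise - Residential", "HEATING", "Noise - Residential"],
})
df_demo["Month"] = df_demo["Created Date"].dt.to_period("M")

# Rows: month; columns: complaint type; values: number of requests (0 where none)
monthly = df_demo.groupby(["Month", "Complaint Type"]).size().unstack(fill_value=0)
print(monthly)
```

On the real frame, selecting `monthly[top_complaints.index]` after computing `top_complaints` restricts the view to the most common types.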


## 6. Structured Analytics

```python
# Ten most frequent complaint types
top_complaints = df["Complaint Type"].value_counts().head(10)
top_complaints
```


```python
top_complaints.plot(kind="bar", figsize=(10, 5))
plt.title("Top 10 Complaint Types")
plt.ylabel("Number of Complaints")
plt.xlabel("Complaint Type")
plt.show()
```
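`Borough` is selected in step 4 but not analyzed. One way to use it is a cross-tabulation of complaint type by borough, both as raw counts and as each type's share per borough. The rows below are made up for illustration.

```python
import pandas as pd

# Made-up rows standing in for the prepared 311 frame
df_demo = pd.DataFrame({
    "Complaint Type": ["HEATING", "HEATING", "Noise - Residential", "HEATING", "Noise - Residential"],
    "Borough": ["BRONX", "BROOKLYN", "BROOKLYN", "BRONX", "MANHATTAN"],
})

# Rows: complaint type; columns: borough; values: request counts
by_borough = pd.crosstab(df_demo["Complaint Type"], df_demo["Borough"])
# Share of each complaint type coming from each borough (each row sums to 1)
share = pd.crosstab(df_demo["Complaint Type"], df_demo["Borough"], normalize="index")
print(by_borough)
print(share.round(2))
```

On the real data, filtering first with `df[df["Complaint Type"].isin(top_complaints.index)]` keeps the table readable.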


## 7. Unstructured Data Exploration

```python
# Peek at the free-text descriptor field
df["Descriptor"].head(10)
```
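Before analyzing text, it can help to check how much of it exists and how varied it is. This sketch, on made-up rows, measures the share of missing descriptors and the number of distinct descriptors per complaint type.

```python
import pandas as pd

# Made-up rows; None marks a request with no descriptor
df_demo = pd.DataFrame({
    "Complaint Type": ["HEATING", "HEATING", "Noise - Residential", "Noise - Residential", "Noise - Residential"],
    "Descriptor": ["HEAT", "HEAT", "Loud Music/Party", "Loud Talking", None],
})

# Fraction of rows without any descriptor text
missing_share = df_demo["Descriptor"].isna().mean()
# Distinct descriptor values per complaint type (missing values not counted)
variety = df_demo.groupby("Complaint Type")["Descriptor"].nunique()
print(f"missing descriptors: {missing_share:.0%}")
print(variety)
```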


## 8. Clean Text Data

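```python
# Normalize descriptors: lowercase, turn non-letters into spaces, collapse whitespace.
# The nullable "string" dtype keeps missing values as <NA> instead of the literal text "nan".
df["Descriptor"] = (
    df["Descriptor"]
    .astype("string")
    .str.lower()
    .str.replace(r"[^a-z\s]", " ", regex=True)
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)
df["Descriptor"].head(10)
```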
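Wrapping the cleaning in a function makes it easy to check on edge cases such as slashes, digits and missing values. The helper name `clean_text` is hypothetical; this sketch replaces non-letters with spaces rather than deleting them, so `Music/Party` splits into two words instead of fusing into one.

```python
import pandas as pd


def clean_text(s: pd.Series) -> pd.Series:
    """Hypothetical helper: lowercase, turn non-letters into spaces, collapse whitespace."""
    return (
        s.astype("string")                          # nullable dtype keeps missing values as <NA>
        .str.lower()
        .str.replace(r"[^a-z\s]", " ", regex=True)  # 'music/party' -> 'music party'
        .str.replace(r"\s+", " ", regex=True)       # collapse runs of whitespace
        .str.strip()
    )


sample = pd.Series(["Loud Music/Party", "HEAT", "Noise: 2nd floor", None])
cleaned = clean_text(sample)
print(cleaned.tolist())
```

Digits are dropped as well, so `2nd` becomes `nd`; keep `0-9` in the character class if numbers carry meaning in the descriptors.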


## 9. Text Analysis within a Complaint Category

```python
# Focus on the single most frequent complaint type
category = top_complaints.index[0]
subset = df[df["Complaint Type"] == category]
subset["Descriptor"].head(10)
```


```python
from collections import Counter

# Word frequencies across all descriptors in the chosen category
words = " ".join(subset["Descriptor"].dropna()).split()
Counter(words).most_common(20)
```
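Word counts become more specific when split by a structured field. This sketch combines the two kinds of data by counting descriptor words per borough within one category; the already-cleaned rows in `subset_demo` are made up, and on the real data `subset` from the cell above could take its place.

```python
import pandas as pd

# Made-up, already-cleaned rows for one complaint category
subset_demo = pd.DataFrame({
    "Borough": ["BROOKLYN", "BROOKLYN", "BRONX", "BRONX", "BRONX"],
    "Descriptor": ["loud music party", "loud talking", "banging pounding", "loud music party", pd.NA],
})

# One row per (borough, word): split each descriptor, then explode the word lists
word_rows = (
    subset_demo.dropna(subset=["Descriptor"])
    .assign(word=lambda d: d["Descriptor"].str.split())
    .explode("word")
)
# Word frequencies per borough, sorted descending within each borough
word_counts = word_rows.groupby("Borough")["word"].value_counts()
# Show the first three words per borough (ties are kept in arbitrary order)
print(word_counts.groupby(level=0).head(3))
```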



## Key Takeaways

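Structured fields (dates, complaint types, boroughs) show the scale of demand and how it changes over time.  
Unstructured text (the descriptors) supplies the causes and context behind those counts.  
Combining both, for example by analyzing descriptor words within a single complaint type, turns counts into actionable findings.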
