
# Day 5 – Data Quality Analytics Cookbook  
## Accuracy, Completeness, Timeliness, Consistency  

This notebook supports **Day 5** of the 30-Day Data Analytics series.
Focus: practical **data quality validation** before analytics.

Dataset used: NYC 311 Service Requests (Kaggle).



## Environment Setup


In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt



## Load Dataset


In [None]:

df = pd.read_csv("311_Service_Requests.csv", low_memory=False)
df.head()



## Select Decision-Relevant Columns


In [None]:

cols = ["Created Date", "Complaint Type", "Borough", "Status"]
# Copy the subset so later column assignments do not trigger SettingWithCopyWarning
df = df[cols].copy()
df.info()



## Accuracy Check


In [None]:

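# Unparseable values become NaT instead of raising an error
df["Created Date"] = pd.to_datetime(df["Created Date"], errors="coerce")
# This count includes values that were already missing before parsing
df["Created Date"].isna().sum()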
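
The NaT count above mixes two different problems: dates that were never recorded and dates that were recorded but cannot be parsed. A minimal sketch of how the two might be separated is shown below. The sample values and the `%m/%d/%Y %I:%M:%S %p` format string are assumptions for illustration; the actual format in the CSV should be confirmed before reusing it.

```python
import pandas as pd

# Hypothetical sample standing in for the raw "Created Date" column.
raw = pd.Series([
    "01/15/2023 10:30:00 AM",   # valid
    "not a date",               # present but unparseable
    None,                       # missing at the source
    "02/30/2023 08:00:00 AM",   # present but not a real calendar date
])

# Assumed timestamp format; verify against the real file.
parsed = pd.to_datetime(raw, format="%m/%d/%Y %I:%M:%S %p", errors="coerce")

originally_missing = raw.isna()
unparseable = parsed.isna() & ~originally_missing

print("Originally missing:", int(originally_missing.sum()))
print("Present but unparseable:", int(unparseable.sum()))
```

Only the second count points to an accuracy problem; the first belongs with the completeness check.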



## Completeness Check


In [None]:

# Share of missing values per column, highest first
df.isnull().mean().sort_values(ascending=False)
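
`isnull()` only sees true nulls. Placeholder strings can hide missing values from it, so the raw rate may understate the gap. The sketch below compares the two rates on made-up data. The placeholder tokens (`"Unspecified"`, `"N/A"`, empty string) are assumptions and should be adjusted to whatever the real columns contain.

```python
import pandas as pd

# Hypothetical sample; values are invented for illustration.
sample = pd.DataFrame({
    "Borough": ["BROOKLYN", "Unspecified", None, "QUEENS", ""],
    "Status": ["Closed", "Open", "Closed", "N/A", "Closed"],
})

# Assumed placeholder tokens that should count as missing.
placeholders = ["Unspecified", "N/A", ""]

raw_missing = sample.isna().mean()
effective_missing = (sample.isna() | sample.isin(placeholders)).mean()

report = pd.DataFrame({"raw": raw_missing, "effective": effective_missing})
print(report)
```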



## Timeliness Check


In [None]:

# Earliest and latest request timestamps covered by the extract
df["Created Date"].min(), df["Created Date"].max()
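
The min/max range shows coverage but not how stale the extract is or whether it has gaps. The sketch below, run on made-up timestamps, flags future-dated records, measures days since the latest valid record, and lists months with no records. The reference date `as_of` is fixed here as an assumption so the example is reproducible.

```python
import pandas as pd

# Made-up timestamps; the last one is deliberately in the future.
created = pd.Series(pd.to_datetime(
    ["2023-01-05", "2023-03-20", "2023-03-28", "2031-01-01"]
))

# Assumed reference date; on live data pd.Timestamp.now() is the usual choice.
as_of = pd.Timestamp("2023-04-01")

# Records dated after the reference date are suspect.
future = created > as_of
valid = created[~future]

# Days between the reference date and the most recent valid record.
staleness_days = (as_of - valid.max()).days

# Records per month, with empty months filled in as zero.
monthly = valid.dt.to_period("M").value_counts().sort_index()
all_months = pd.period_range(
    valid.min().to_period("M"), valid.max().to_period("M"), freq="M"
)
monthly = monthly.reindex(all_months, fill_value=0)
empty_months = [str(p) for p in monthly[monthly == 0].index]

print("Future-dated records:", int(future.sum()))
print("Days since latest record:", staleness_days)
print("Months with no records:", empty_months)
```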



## Consistency Check


In [None]:

# Most frequent labels; scan for near-duplicate spellings of the same category
df["Complaint Type"].value_counts().head(15)
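
Scanning the top 15 labels by eye can miss variants that differ only in case or spacing. One possible way to surface them is to normalize each label and count how many raw spellings map to the same normalized form. The labels below are invented for illustration, and the normalization rules (trim, collapse whitespace, uppercase) are an assumption about what should count as "the same" category.

```python
import pandas as pd

# Hypothetical labels with case and whitespace variants.
complaints = pd.Series([
    "Noise - Residential",
    "noise - residential",
    "NOISE - RESIDENTIAL ",
    "Illegal Parking",
    "Illegal  Parking",
    "HEAT/HOT WATER",
])

# Normalize: trim, collapse runs of whitespace, uppercase.
normalized = (
    complaints.str.strip()
    .str.replace(r"\s+", " ", regex=True)
    .str.upper()
)

# Number of distinct raw spellings behind each normalized label.
variants = complaints.groupby(normalized).nunique()
inconsistent = variants[variants > 1]
print(inconsistent)
```

Any label with more than one raw spelling is a candidate for cleanup before grouping or counting.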



## Data Quality Metrics Summary


In [None]:

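{
    "Total Records": len(df),
    "Missing Complaint Type (%)": df["Complaint Type"].isna().mean() * 100,
    "Missing Created Date (%)": df["Created Date"].isna().mean() * 100,
    # Rows identical across the four selected columns; these may be
    # distinct requests that happen to share the same values
    "Duplicate Records": df.duplicated().sum()
}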
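
The same metrics can be wrapped in a function and compared against limits, so the check can be rerun on each new extract. This is a sketch only: `quality_summary`, `THRESHOLDS`, and the 5% limits are hypothetical names and values chosen for illustration, and the sample frame is made up.

```python
import pandas as pd

def quality_summary(frame: pd.DataFrame) -> dict:
    """Return the summary metrics as plain Python numbers."""
    return {
        "Total Records": len(frame),
        "Missing Complaint Type (%)": round(float(frame["Complaint Type"].isna().mean() * 100), 2),
        "Missing Created Date (%)": round(float(frame["Created Date"].isna().mean() * 100), 2),
        "Duplicate Records": int(frame.duplicated().sum()),
    }

# Made-up sample: one duplicate row, one missing date, one missing type.
sample = pd.DataFrame({
    "Created Date": pd.to_datetime(["2023-01-01", "2023-01-01", None, "2023-01-03"]),
    "Complaint Type": ["Noise", "Noise", "Heat", None],
})

# Hypothetical acceptance limits; choose values that fit the decision at hand.
THRESHOLDS = {
    "Missing Complaint Type (%)": 5.0,
    "Missing Created Date (%)": 5.0,
}

summary = quality_summary(sample)
failures = {k: v for k, v in summary.items() if k in THRESHOLDS and v > THRESHOLDS[k]}

print(summary)
print("Metrics over threshold:", failures)
```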



## Key Takeaways

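- Poor data quality leads to misleading analysis and, in turn, poor decisions  
- Quality checks must come before any analytics  
- Simple metrics such as missing-value rates, date ranges, and duplicate counts catch many problems early  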
