# Air Quality Analysis

**Objective:**  
Analyze simulated air quality data for the Kanjurmarg station using data science techniques to identify trends, peak pollution hours, and high-risk pollution events.

**Dataset:**  
A simulated CSV file of hourly sensor readings for five pollutants: `PM2.5`, `PM10`, `NO2`, `SO2`, and `CO`.
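
The notebook does not show how the simulated CSV was produced. To illustrate the expected shape (one row per hour, one column per pollutant), here is a minimal sketch. The function name `simulate_hourly_readings`, the baseline levels, and the bumps around 08:00 and 20:00 are all assumptions chosen for illustration, not the project's generator.

```python
import numpy as np
import pandas as pd

POLLUTANTS = ["PM2.5", "PM10", "NO2", "SO2", "CO"]


def simulate_hourly_readings(days: int = 7, seed: int = 42) -> pd.DataFrame:
    """Generate a simulated hourly table with one column per pollutant (hypothetical helper).

    Baseline levels and the morning/evening bumps are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    index = pd.date_range("2024-01-01", periods=24 * days, freq="h", name="timestamp")
    hours = index.hour.to_numpy()
    # Two Gaussian-shaped bumps centred on 08:00 and 20:00 (assumed traffic peaks).
    diurnal = np.exp(-((hours - 8) ** 2) / 8) + np.exp(-((hours - 20) ** 2) / 8)
    baselines = dict(zip(POLLUTANTS, [60.0, 100.0, 35.0, 12.0, 1.0]))
    data = {
        name: np.clip(
            level * (1 + 0.8 * diurnal + rng.normal(0, 0.15, len(index))), 0, None
        )
        for name, level in baselines.items()
    }
    return pd.DataFrame(data, index=index)


readings = simulate_hourly_readings()
print(readings.head())
```

Calling `readings.to_csv(...)` on the result would produce a CSV with a `timestamp` column followed by the five pollutant columns.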


## Load Libraries & Modules

```python
import os
import sys

import numpy as np
import pandas as pd

# Make the project root (the parent of the notebook's working directory)
# importable so that `src` and `config` resolve.
PROJECT_ROOT = os.path.abspath("..")
if PROJECT_ROOT not in sys.path:
    sys.path.append(PROJECT_ROOT)

# Custom project modules
from src.data_ingestion import load_air_quality_data
from src.data_analysis import (
    compute_summary_statistics,
    compute_hourly_average,
    compute_daily_average,
    identify_pollution_peaks,
)
from src.data_visualization import (
    plot_pm25_trend,
    plot_pollutant_distributions,
)
import config
```


## Load Data

```python
df = load_air_quality_data()
df.head()
```


> Loaded simulated air quality data containing hourly records.
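
`load_air_quality_data()` comes from `src.data_ingestion`, whose source is not shown here. A minimal sketch of what such a loader might do follows; the name `load_hourly_csv` is hypothetical, and the `timestamp` column name and pollutant column names are assumptions about the CSV schema.

```python
import io

import pandas as pd

# Assumed schema: a `timestamp` column plus one column per pollutant.
POLLUTANTS = ["PM2.5", "PM10", "NO2", "SO2", "CO"]


def load_hourly_csv(source, time_col: str = "timestamp") -> pd.DataFrame:
    """Read an hourly air-quality CSV into a time-indexed DataFrame (hypothetical helper).

    `source` may be a file path or any file-like object accepted by pd.read_csv.
    """
    frame = pd.read_csv(source, parse_dates=[time_col])
    frame = frame.set_index(time_col).sort_index()
    # Keep only pollutant columns; unparseable readings become NaN.
    return frame[POLLUTANTS].apply(pd.to_numeric, errors="coerce")


sample = io.StringIO(
    "timestamp,PM2.5,PM10,NO2,SO2,CO\n"
    "2024-01-01 01:00,85,140,48,12,1.1\n"
    "2024-01-01 00:00,72,120,40,10,0.9\n"
)
demo = load_hourly_csv(sample)
print(demo)
```

Sorting the index after parsing keeps later time-based operations (rolling windows, resampling) well defined even if the file is out of order.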


## Summary Statistics

```python
stats = compute_summary_statistics(df)
stats
```


> **Summary Statistics**: Provides min, max, mean, and quartiles for each pollutant.
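
The implementation of `compute_summary_statistics` in `src.data_analysis` is not shown. A pandas stand-in that returns the statistics named above might look like this; `summary_statistics` is a hypothetical name, and it assumes the pollutant column names from the dataset description.

```python
import pandas as pd

POLLUTANTS = ["PM2.5", "PM10", "NO2", "SO2", "CO"]


def summary_statistics(df: pd.DataFrame) -> pd.DataFrame:
    """Min, quartiles, mean and max for each pollutant (hypothetical stand-in)."""
    # describe() also reports count and std; keep only the rows named above.
    stats = df[POLLUTANTS].describe()
    return stats.loc[["min", "25%", "50%", "75%", "mean", "max"]]


demo = pd.DataFrame(
    {
        "PM2.5": [40, 60, 80, 120],
        "PM10": [90, 110, 130, 170],
        "NO2": [20, 30, 40, 70],
        "SO2": [5, 10, 15, 35],
        "CO": [0.5, 1.0, 1.5, 2.5],
    },
    index=pd.date_range("2024-01-01", periods=4, freq="h"),
)
stats = summary_statistics(demo)
print(stats)
```

Quartiles from `describe()` use linear interpolation, so with only a few readings they can fall between observed values (here the PM2.5 median is 70).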


## Trend Visualization

```python
plot_pm25_trend(df)
```


> **PM2.5 Trend**: Visualizes hourly PM2.5 variation over time.
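
How `plot_pm25_trend` draws its figure is not shown. One possible sketch uses matplotlib and overlays a 24-hour rolling mean so the longer-term trend is visible beneath the daily cycle. The rolling mean, the helper name `plot_pm25_with_rolling_mean`, and the synthetic demo data are assumptions, not part of the original plot.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook to show figures inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_pm25_with_rolling_mean(df: pd.DataFrame, window: int = 24):
    """Plot hourly PM2.5 plus a centred rolling mean (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(df.index, df["PM2.5"], alpha=0.4, label="Hourly PM2.5")
    # A 24-hour window averages over one full daily cycle, leaving the slower trend.
    smoothed = df["PM2.5"].rolling(window, center=True, min_periods=1).mean()
    ax.plot(df.index, smoothed, linewidth=2, label=f"{window}-hour rolling mean")
    ax.set_xlabel("Time")
    ax.set_ylabel("PM2.5")
    ax.set_title("PM2.5 trend")
    ax.legend()
    fig.autofmt_xdate()
    return fig, ax


# Demo on synthetic data: three days with a daily cycle plus random noise.
rng = np.random.default_rng(0)
index = pd.date_range("2024-01-01", periods=72, freq="h")
pm25 = 70 + 25 * np.sin(2 * np.pi * index.hour.to_numpy() / 24) + rng.normal(0, 5, 72)
demo = pd.DataFrame({"PM2.5": pm25}, index=index)
fig, ax = plot_pm25_with_rolling_mean(demo)
```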


## Hourly & Daily Averages

```python
hourly_avg = compute_hourly_average(df)
hourly_avg.head()
```
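
The notebook does not define what `compute_hourly_average` returns. One common reading, assumed here, is a diurnal profile: the mean of each pollutant for each clock hour (0 to 23), pooling all days. The helper name `hourly_profile` is hypothetical.

```python
import numpy as np
import pandas as pd

POLLUTANTS = ["PM2.5", "PM10", "NO2", "SO2", "CO"]


def hourly_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Mean of each pollutant per hour of day, 0 to 23 (hypothetical helper)."""
    # Group every reading by its clock hour, pooling all days together.
    return df[POLLUTANTS].groupby(df.index.hour).mean().rename_axis("hour")


# Demo: two days where every pollutant's value equals the clock hour.
index = pd.date_range("2024-01-01", periods=48, freq="h")
demo = pd.DataFrame(
    {p: index.hour.to_numpy() for p in POLLUTANTS}, index=index, dtype=float
)
profile = hourly_profile(demo)
print(profile.loc[7])  # every pollutant averages 7.0 at 07:00
```

A profile like this is what supports statements about which hours of the day tend to be most polluted.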


```python
daily_avg = compute_daily_average(df)
daily_avg.head()
```
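
Similarly, a daily average can be sketched with `resample("D")`, which groups hourly readings into calendar days. This assumes the DataFrame has a `DatetimeIndex`; the helper name `daily_average` is hypothetical and may differ from what `compute_daily_average` does.

```python
import numpy as np
import pandas as pd

POLLUTANTS = ["PM2.5", "PM10", "NO2", "SO2", "CO"]


def daily_average(df: pd.DataFrame) -> pd.DataFrame:
    """Calendar-day mean of each pollutant (hypothetical helper)."""
    # resample("D") requires a DatetimeIndex; mean() skips NaN readings.
    return df[POLLUTANTS].resample("D").mean()


# Demo: 50 on the first day, 100 on the second, for every pollutant.
index = pd.date_range("2024-01-01", periods=48, freq="h")
values = np.where(index.day == 1, 50.0, 100.0)
demo = pd.DataFrame({p: values for p in POLLUTANTS}, index=index)
daily = daily_average(demo)
print(daily["PM2.5"])  # 2024-01-01 -> 50.0, 2024-01-02 -> 100.0
```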


## Pollutant Distribution

```python
plot_pollutant_distributions(df)
```


> **Pollutant Distribution**: A histogram of the hourly readings for each pollutant.
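
`plot_pollutant_distributions` is defined in `src.data_visualization`, which is not shown. A matplotlib sketch with one histogram panel per pollutant could look like this; the helper name `plot_histograms`, the bin count, and the lognormal demo data are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook to show figures inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

POLLUTANTS = ["PM2.5", "PM10", "NO2", "SO2", "CO"]


def plot_histograms(df: pd.DataFrame, bins: int = 30):
    """Draw one histogram panel per pollutant (hypothetical helper)."""
    fig, axes = plt.subplots(1, len(POLLUTANTS), figsize=(4 * len(POLLUTANTS), 3))
    for ax, col in zip(axes, POLLUTANTS):
        # dropna() keeps missing readings out of the bin counts.
        ax.hist(df[col].dropna(), bins=bins, color="steelblue", edgecolor="white")
        ax.set_title(col)
        ax.set_xlabel("Concentration")
        ax.set_ylabel("Hourly readings")
    fig.tight_layout()
    return fig, axes


# Demo: right-skewed synthetic readings, a common shape for pollutant data.
rng = np.random.default_rng(1)
demo = pd.DataFrame({col: rng.lognormal(mean=3, sigma=0.4, size=200) for col in POLLUTANTS})
fig, axes = plot_histograms(demo)
```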


## Detect Pollution Peaks

```python
# Per-pollutant thresholds used to flag high-pollution readings
thresholds = {
    "PM2.5": 100,
    "PM10": 150,
    "NO2": 60,
    "SO2": 30,
    "CO": 2,
}
peaks = identify_pollution_peaks(df, thresholds)
peaks.head()
```


> **High Pollution Events**: Records where at least one pollutant exceeds its safe limit as set in `thresholds`.
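
The logic inside `identify_pollution_peaks` is not shown. A minimal sketch compares every reading against its threshold and keeps rows where any comparison is true, also noting which pollutants were over the limit. The helper name `find_exceedances` is hypothetical, and it assumes "crossed" means strictly greater than the threshold.

```python
import pandas as pd


def find_exceedances(df: pd.DataFrame, thresholds: dict) -> pd.DataFrame:
    """Rows where at least one pollutant is above its threshold (hypothetical helper)."""
    limits = pd.Series(thresholds, dtype=float)
    cols = list(limits.index)
    # Boolean frame: True where a reading is strictly above that pollutant's limit.
    exceeded = df[cols].gt(limits, axis="columns")
    mask = exceeded.any(axis="columns")
    peaks = df.loc[mask, cols].copy()
    # Record which pollutants were over the limit in each flagged hour.
    peaks["exceeded"] = [
        ", ".join(c for c in cols if row[c]) for _, row in exceeded[mask].iterrows()
    ]
    return peaks


thresholds = {"PM2.5": 100, "PM10": 150, "NO2": 60, "SO2": 30, "CO": 2}
demo = pd.DataFrame(
    {
        "PM2.5": [80, 120, 95],
        "PM10": [140, 160, 100],
        "NO2": [50, 55, 65],
        "SO2": [10, 12, 8],
        "CO": [1.0, 1.5, 2.0],
    },
    index=pd.date_range("2024-01-01 06:00", periods=3, freq="h"),
)
peaks = find_exceedances(demo, thresholds)
print(peaks)
```

In the demo, the 07:00 row is flagged for PM2.5 and PM10, the 08:00 row for NO2 only, and CO at exactly 2.0 is not flagged because the comparison is strict.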


## Final Observations & Summary

- **PM2.5** is higher during the early morning and evening hours.
- Several high-risk pollution events were detected in which at least one pollutant exceeded its safe limit.
- Average pollutant levels vary by hour of day and from day to day, suggesting scope for traffic and industrial regulation.
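
Observations like these can be backed with numbers. The sketch below reports the hours of day with the highest mean PM2.5 and counts threshold exceedances per pollutant. The helper name `summarize_findings`, the choice of the top three hours, and the synthetic demo data are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def summarize_findings(df: pd.DataFrame, thresholds: dict, top_n: int = 3):
    """Top PM2.5 hours of day and per-pollutant exceedance counts (hypothetical helper)."""
    # Mean PM2.5 for each clock hour, then the hours with the highest means.
    pm25_by_hour = df["PM2.5"].groupby(df.index.hour).mean()
    peak_hours = pm25_by_hour.nlargest(top_n).index.tolist()
    # Number of hourly readings strictly above each pollutant's threshold.
    limits = pd.Series(thresholds, dtype=float)
    exceedances = df[list(limits.index)].gt(limits, axis="columns").sum()
    return peak_hours, exceedances


thresholds = {"PM2.5": 100, "PM10": 150, "NO2": 60, "SO2": 30, "CO": 2}
index = pd.date_range("2024-01-01", periods=48, freq="h")
hours = index.hour.to_numpy()
pm25 = np.full(48, 50.0)
pm25[hours == 7] = 130.0
pm25[hours == 8] = 140.0
pm25[hours == 20] = 110.0
demo = pd.DataFrame(
    {"PM2.5": pm25, "PM10": 100.0, "NO2": 40.0, "SO2": 10.0, "CO": 1.0},
    index=index,
)
peak_hours, exceedances = summarize_findings(demo, thresholds)
print(peak_hours)   # [8, 7, 20]
print(exceedances)  # PM2.5 -> 6, all others -> 0
```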

> ✅ This notebook is intended as a template for integrating real monitoring data.
