# 🌱 Environmental Impact Analysis

> **PM Accelerator Mission**: "By making industry-leading tools and education available to individuals from all backgrounds, we level the playing field for future PM leaders."

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/moazmo/weather-trend-forecasting/blob/main/presentation/05_Environmental_Impact.ipynb)

This notebook analyzes:
1. Air Quality Overview (AQI, PM2.5, PM10)
2. Pollutant Distribution by Region
3. Weather-Pollution Correlations
4. Seasonal Pollution Patterns
5. Temperature Impact on Air Quality

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Load raw data (contains air quality columns)
df = pd.read_csv('../data/raw/GlobalWeatherRepository.csv', parse_dates=['last_updated'])

# Air quality columns
aq_cols = ['air_quality_Carbon_Monoxide', 'air_quality_Ozone', 'air_quality_Nitrogen_dioxide',
           'air_quality_Sulphur_dioxide', 'air_quality_PM2.5', 'air_quality_PM10',
           'air_quality_us-epa-index', 'air_quality_gb-defra-index']

print(f"📊 Dataset: {len(df):,} records")
print(f"🌍 Countries: {df['country'].nunique()}")
print("\n📊 Air Quality Data Availability:")
for col in aq_cols:
    pct = (df[col].notna().sum() / len(df)) * 100
    print(f"   {col}: {pct:.1f}%")

## 1. Air Quality Overview

### US EPA Air Quality Index Distribution

In [None]:
# EPA index categories
# `air_quality_us-epa-index` is an ordinal 1-6 index (1 = Good ... 6 = Hazardous),
# not the 0-500 AQI scale, so it is mapped by value rather than by AQI thresholds.
aqi_labels = {1: 'Good', 2: 'Moderate', 3: 'Unhealthy for Sensitive',
              4: 'Unhealthy', 5: 'Very Unhealthy', 6: 'Hazardous'}

def get_aqi_category(aqi):
    """Map a 1-6 US EPA index value to its category name."""
    if pd.isna(aqi):
        return 'Unknown'
    return aqi_labels.get(int(aqi), 'Unknown')

df['aqi_category'] = df['air_quality_us-epa-index'].apply(get_aqi_category)

# Distribution
aqi_dist = df['aqi_category'].value_counts()
colors = {'Good': '#00E400', 'Moderate': '#FFFF00', 'Unhealthy for Sensitive': '#FF7E00',
          'Unhealthy': '#FF0000', 'Very Unhealthy': '#8F3F97', 'Hazardous': '#7E0023', 'Unknown': '#999'}

fig = px.pie(
    values=aqi_dist.values, names=aqi_dist.index,
    title='🌬️ US EPA Air Quality Index Distribution',
    color=aqi_dist.index,
    color_discrete_map=colors
)
fig.update_layout(template='plotly_dark')
fig.show()

### PM2.5 and PM10 Levels

In [None]:
# PM2.5 and PM10 distribution
fig = make_subplots(rows=1, cols=2, subplot_titles=['PM2.5 Distribution', 'PM10 Distribution'])

fig.add_trace(go.Histogram(x=df['air_quality_PM2.5'].dropna(), nbinsx=50, name='PM2.5', marker_color='#FF6B6B'), row=1, col=1)
fig.add_trace(go.Histogram(x=df['air_quality_PM10'].dropna(), nbinsx=50, name='PM10', marker_color='#4ECDC4'), row=1, col=2)

fig.update_layout(title='💨 Particulate Matter Distribution', template='plotly_dark', showlegend=False)
fig.show()

print("\n📊 PM2.5 Statistics:")
print(f"   Mean: {df['air_quality_PM2.5'].mean():.1f} µg/m³")
print(f"   Median: {df['air_quality_PM2.5'].median():.1f} µg/m³")
print(f"   Max: {df['air_quality_PM2.5'].max():.1f} µg/m³")

## 2. Pollutant Distribution by Region

In [None]:
# Average PM2.5 by country
country_aq = df.groupby('country').agg({
    'air_quality_PM2.5': 'mean',
    'air_quality_us-epa-index': 'mean',
    'latitude': 'first',
    'longitude': 'first'
}).dropna().reset_index()

fig = px.scatter_geo(
    country_aq,
    lat='latitude', lon='longitude',
    color='air_quality_PM2.5',
    size='air_quality_PM2.5',
    hover_name='country',
    title='🗺️ Average PM2.5 Levels by Country',
    color_continuous_scale='Reds'
)
fig.update_layout(template='plotly_dark', geo=dict(bgcolor='rgba(0,0,0,0)'))
fig.show()

In [None]:
# Top 10 most polluted countries
top_polluted = country_aq.nlargest(10, 'air_quality_PM2.5')

fig = px.bar(
    top_polluted, x='air_quality_PM2.5', y='country', orientation='h',
    title='🏭 Top 10 Countries by PM2.5 Levels',
    color='air_quality_PM2.5', color_continuous_scale='Reds'
)
fig.update_layout(template='plotly_dark', yaxis={'categoryorder': 'total ascending'})
fig.show()
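
A country-level mean can be driven by a handful of records, so a ranking such as the top 10 above may be sensitive to sparsely sampled countries. One possible safeguard, sketched below on synthetic data, is to compute the record count alongside the mean and filter on a minimum count. The threshold `MIN_RECORDS`, the helper name `country_pm25_summary`, and the sample values are illustrative assumptions, not part of the dataset or the original analysis.

```python
import pandas as pd

MIN_RECORDS = 3  # illustrative threshold, not derived from the dataset

def country_pm25_summary(frame: pd.DataFrame, min_records: int = MIN_RECORDS) -> pd.DataFrame:
    """Mean and median PM2.5 per country, keeping only countries with enough records."""
    summary = (
        frame.groupby('country')['air_quality_PM2.5']
        .agg(mean_pm25='mean', median_pm25='median', n_records='count')
        .reset_index()
    )
    return summary[summary['n_records'] >= min_records].reset_index(drop=True)

# Synthetic example: country "B" has a single extreme reading
demo = pd.DataFrame({
    'country': ['A', 'A', 'A', 'B', 'C', 'C', 'C', 'C'],
    'air_quality_PM2.5': [10.0, 12.0, 14.0, 500.0, 30.0, 35.0, 40.0, 45.0],
})
result = country_pm25_summary(demo)
print(result)
```

Here country "B" is excluded because it has only one record, so its single extreme value does not dominate the ranking. Reporting the median next to the mean gives a second view that is less sensitive to outliers.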

## 3. Weather-Pollution Correlations

In [None]:
# Correlation between weather and air quality
corr_cols = ['temperature_celsius', 'humidity', 'wind_kph', 'pressure_mb', 
             'air_quality_PM2.5', 'air_quality_Ozone', 'air_quality_Carbon_Monoxide']

corr_matrix = df[corr_cols].corr()

fig = px.imshow(
    corr_matrix,
    title='🔗 Weather vs Air Quality Correlation Matrix',
    color_continuous_scale='RdBu_r',
    zmin=-1, zmax=1
)
fig.update_layout(template='plotly_dark')
fig.show()

print("\n📊 Key Correlations with PM2.5:")
pm25_corr = corr_matrix['air_quality_PM2.5'].sort_values()
for col, val in pm25_corr.items():
    if col != 'air_quality_PM2.5':
        print(f"   {col}: {val:.3f}")

### Key Insights

- **Wind speed** is typically **negatively correlated** with pollution, because wind disperses pollutants
- **Temperature** and **ozone** are often positively correlated, because sunlight and heat drive the photochemical reactions that form ozone
- **Humidity** has a mixed effect: humid, stagnant air can keep particulates near the surface, while rain removes them from the air
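
`DataFrame.corr()` computes Pearson correlation by default, which only measures linear association, and pollutant concentrations tend to be right-skewed with occasional extreme values. As a hedged sketch, a rank-based (Spearman) correlation can be obtained by ranking each column and applying the default Pearson method to the ranks. The data below is synthetic and purely illustrative, and the helper name `rank_correlation` is an assumption rather than part of the notebook.

```python
import pandas as pd

def rank_correlation(frame: pd.DataFrame) -> pd.DataFrame:
    """Spearman correlation: Pearson correlation computed on per-column ranks.

    Rows with any missing value are dropped first so every column is
    ranked over the same set of records.
    """
    return frame.dropna().rank().corr()

# Synthetic, illustrative data (not from the dataset): PM2.5 falls
# non-linearly as wind speed rises, with one extreme reading.
demo = pd.DataFrame({
    'wind_kph': [2.0, 5.0, 8.0, 12.0, 20.0, 35.0],
    'air_quality_PM2.5': [400.0, 60.0, 35.0, 20.0, 12.0, 8.0],
})

pearson = demo.corr().loc['wind_kph', 'air_quality_PM2.5']
spearman = rank_correlation(demo).loc['wind_kph', 'air_quality_PM2.5']
print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Because PM2.5 decreases monotonically with wind speed in this toy example, the rank correlation reaches -1, while the Pearson coefficient is weaker because the relationship is not linear. Comparing the two on the real data can indicate whether a weak Pearson value hides a consistent monotonic trend.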

## 4. Seasonal Pollution Patterns

In [None]:
# Extract month
df['month'] = df['last_updated'].dt.month

# Monthly pollution patterns
monthly_aq = df.groupby('month')[['air_quality_PM2.5', 'air_quality_Ozone', 'air_quality_Carbon_Monoxide']].mean().reset_index()

fig = go.Figure()
fig.add_trace(go.Scatter(x=monthly_aq['month'], y=monthly_aq['air_quality_PM2.5'], name='PM2.5', mode='lines+markers'))
fig.add_trace(go.Scatter(x=monthly_aq['month'], y=monthly_aq['air_quality_Ozone'], name='Ozone', mode='lines+markers'))

fig.update_layout(
    title='📅 Seasonal Pollution Patterns',
    xaxis_title='Month',
    yaxis_title='Concentration',
    template='plotly_dark'
)
fig.show()
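
Grouping by calendar month pools both hemispheres, so January mixes northern winter with southern summer and can flatten seasonal signals. The sketch below shows one way to label records by meteorological season using the sign of the latitude. The helper `meteorological_season` and the sample rows are hypothetical, and flipping the season for every latitude below zero is a simplification that ignores the tropics, where these four seasons are not well defined.

```python
import pandas as pd

# Northern-hemisphere meteorological seasons by month number
NORTHERN = {12: 'Winter', 1: 'Winter', 2: 'Winter',
            3: 'Spring', 4: 'Spring', 5: 'Spring',
            6: 'Summer', 7: 'Summer', 8: 'Summer',
            9: 'Autumn', 10: 'Autumn', 11: 'Autumn'}
# Seasons are offset by half a year south of the equator
FLIP = {'Winter': 'Summer', 'Summer': 'Winter',
        'Spring': 'Autumn', 'Autumn': 'Spring'}

def meteorological_season(month: int, latitude: float) -> str:
    """Return the season for a month, flipped for southern latitudes."""
    season = NORTHERN[month]
    return FLIP[season] if latitude < 0 else season

# Hypothetical sample rows using the dataset's column names
sample = pd.DataFrame({
    'last_updated': pd.to_datetime(['2024-01-15', '2024-01-15', '2024-07-15']),
    'latitude': [51.5, -33.9, -33.9],
})
sample['season'] = [
    meteorological_season(month, lat)
    for month, lat in zip(sample['last_updated'].dt.month, sample['latitude'])
]
print(sample[['last_updated', 'latitude', 'season']])
```

Grouping by `season` instead of `month` would then compare like with like across hemispheres.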

## 5. Temperature Impact on Air Quality

In [None]:
# Temperature bins vs. pollutant levels (open-ended outer bins so no record is dropped)
df['temp_bin'] = pd.cut(df['temperature_celsius'], bins=[-np.inf, 0, 10, 20, 30, 40, np.inf],
                        labels=['<0°C', '0-10°C', '10-20°C', '20-30°C', '30-40°C', '>40°C'])

# observed=False keeps empty temperature bins in the result
temp_aq = df.groupby('temp_bin', observed=False)[['air_quality_PM2.5', 'air_quality_Ozone']].mean().reset_index()

fig = px.bar(
    temp_aq, x='temp_bin', y='air_quality_Ozone',
    title='🌡️ Ozone Levels by Temperature Range',
    labels={'temp_bin': 'Temperature', 'air_quality_Ozone': 'Avg Ozone'},
    color='air_quality_Ozone', color_continuous_scale='YlOrRd'
)
fig.update_layout(template='plotly_dark')
fig.show()

print("\n📊 Insight: Ozone levels typically increase with temperature due to photochemical reactions.")

---

## 🏁 Summary

### Key Findings

1. **Regional Disparities**: PM2.5 levels vary significantly by country and region
2. **Weather Impact**: Higher wind speeds tend to disperse pollutants; higher temperatures tend to coincide with higher ozone levels
3. **Seasonal Patterns**: Air quality fluctuates with the seasons (for example, heating emissions in winter and ozone formation in summer)
4. **Health Implications**: Understanding these patterns helps predict unhealthy air quality days

### Applications

- **Public Health Alerts**: Predict high-pollution days based on weather forecasts
- **Urban Planning**: Identify pollution hotspots
- **Climate Research**: Track long-term air quality trends