# Delhi NCR Air Quality (2020–2025)

This notebook analyzes Air Quality Index (AQI) data for Delhi NCR to understand:
- How air quality changed from 2020 to 2025
- Why winter months consistently show worse air quality
- Whether disruptions such as the COVID-19 lockdown meaningfully altered long-term patterns


In [1]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go


In [2]:
df = pd.read_csv("../data/raw/delhi_ncr_aqi_dataset.csv", low_memory=False)

df.head()


Unnamed: 0,datetime,date,year,month,day,hour,day_of_week,is_weekend,season,city,...,no2,so2,co,o3,temperature,humidity,wind_speed,visibility,aqi,aqi_category
0,2020-01-01 06:00:00,2020-01-01,2020,1,1,6,Wednesday,0,winter,Delhi,...,119.6,47.7,5.19,12.3,9.4,100,3.6,1.2,500,Severe
1,2020-01-01 12:00:00,2020-01-01,2020,1,1,12,Wednesday,0,winter,Delhi,...,117.9,39.3,4.32,15.8,20.6,50,5.9,1.4,500,Severe
2,2020-01-01 18:00:00,2020-01-01,2020,1,1,18,Wednesday,0,winter,Delhi,...,150.1,36.3,7.13,14.3,12.4,56,4.5,1.1,500,Severe
3,2020-01-01 23:00:00,2020-01-01,2020,1,1,23,Wednesday,0,winter,Delhi,...,142.0,30.3,4.9,13.2,14.4,48,5.8,1.4,500,Severe
4,2020-01-01 06:00:00,2020-01-01,2020,1,1,6,Wednesday,0,winter,Delhi,...,138.4,41.5,7.56,15.4,6.8,100,2.8,0.4,500,Severe


In [3]:
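# Parse dates; unparseable values become NaT and are dropped along with missing AQI readings
df["date"] = pd.to_datetime(df["date"], errors="coerce")

df = df.dropna(subset=["date", "aqi"])

# Re-derive year and month from the parsed date so they stay consistent with it
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month

# Keep only the 2020–2025 study window; .copy() avoids SettingWithCopyWarning on later edits
df = df[df["year"].between(2020, 2025)].copy()

df[["date", "year", "month", "aqi"]].head()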


Unnamed: 0,date,year,month,aqi
0,2020-01-01,2020,1,500
1,2020-01-01,2020,1,500
2,2020-01-01,2020,1,500
3,2020-01-01,2020,1,500
4,2020-01-01,2020,1,500


In [4]:
yearly_aqi = (
    df.groupby("year")["aqi"]
    .mean()
    .reset_index()
)

yearly_aqi


Unnamed: 0,year,aqi
0,2020,271.479894
1,2021,268.961078
2,2022,267.745414
3,2023,263.828529
4,2024,262.390948
5,2025,260.56897


In [5]:
fig = px.bar(
    yearly_aqi,
    x="year",
    y="aqi",
    title="Delhi NCR – Yearly Average AQI (2020–2025)",
    labels={"aqi": "Average AQI", "year": "Year"},
)

fig.update_traces(marker_color="black")

fig.update_layout(
    plot_bgcolor="white",
    paper_bgcolor="white",
    height=450,
    font=dict(family="Georgia", size=13),
    yaxis=dict(gridcolor="rgba(0,0,0,0.08)")
)

fig.show()


In [6]:
monthly_aqi = (
    df.groupby(["year", "month"])["aqi"]
    .mean()
    .reset_index()
)

month_map = {
    1:"Jan",2:"Feb",3:"Mar",4:"Apr",5:"May",6:"Jun",
    7:"Jul",8:"Aug",9:"Sep",10:"Oct",11:"Nov",12:"Dec"
}

monthly_aqi["month_name"] = monthly_aqi["month"].map(month_map)

monthly_aqi.head()


Unnamed: 0,year,month,aqi,month_name
0,2020,1,468.611851,Jan
1,2020,2,396.766867,Feb
2,2020,3,265.815919,Mar
3,2020,4,170.851812,Apr
4,2020,5,204.771388,May


In [7]:
fig = px.line(
    monthly_aqi,
    x="month",
    y="aqi",
    color="year",
    title="Seasonal Wave of AQI in Delhi NCR (2020–2025)",
    labels={"month": "", "aqi": "Average AQI"},
    # Skip the lightest Greys (the first is pure white); they vanish on a white background
    color_discrete_sequence=px.colors.sequential.Greys[3:]
)

fig.update_traces(
    line=dict(width=2.5, shape="spline"),
    opacity=0.45
)

fig.update_layout(
    plot_bgcolor="white",
    paper_bgcolor="white",
    height=520,
    hovermode="x unified",
    font=dict(family="Georgia", size=13),
    xaxis=dict(
        tickmode="array",
        tickvals=list(range(1,13)),
        ticktext=list(month_map.values()),
        showgrid=False
    ),
    yaxis=dict(
        gridcolor="rgba(0,0,0,0.08)"
    )
)

# Highlight winter (Nov–Feb)
fig.add_vrect(x0=10.5, x1=12.5, fillcolor="black", opacity=0.05, line_width=0)
fig.add_vrect(x0=0.5, x1=2.5, fillcolor="black", opacity=0.05, line_width=0)

fig.show()


### Interpretation

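Across all six years, Delhi NCR follows the same seasonal cycle.
AQI rises sharply in the winter months (November–February), a pattern commonly
attributed to low wind speeds and temperature inversions that trap pollutants
near the surface. In 2020, for example, the monthly average falls from about
469 in January to about 171 in April.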
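
The size of the winter effect can be put into numbers. The sketch below is one possible way to do it, not part of the original analysis: it compares mean AQI in November–February against the remaining months for each year. The names `WINTER_MONTHS` and `winter_vs_rest` are illustrative, and the function assumes a frame with numeric `year`, `month`, and `aqi` columns, as `df` has after cell 3. Winter is grouped by calendar year here, so January–February and November–December of the same year are averaged together even though they belong to different winter seasons.

```python
import pandas as pd

# Nov–Feb, matching the shaded bands in the seasonal chart (illustrative constant)
WINTER_MONTHS = (11, 12, 1, 2)


def winter_vs_rest(frame: pd.DataFrame) -> pd.DataFrame:
    """Mean AQI per year for winter vs. non-winter months.

    Assumes `frame` has numeric 'year', 'month', and 'aqi' columns.
    """
    # Label every row as "winter" or "rest" based on its month
    labelled = frame.assign(
        period=frame["month"]
        .isin(WINTER_MONTHS)
        .map({True: "winter", False: "rest"})
    )
    # One row per year, one column per period
    table = labelled.pivot_table(
        index="year", columns="period", values="aqi", aggfunc="mean"
    )
    # Ratio above 1 means winter air is worse than the rest of the year
    table["winter_to_rest_ratio"] = table["winter"] / table["rest"]
    return table
```

In this notebook it would be called as `winter_vs_rest(df)`. A ratio that stays well above 1 in every year would support the seasonal pattern described above.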

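The winter peak persists even in 2020, the year of the COVID-19 lockdown, and
2020 has the highest yearly average of the period (about 271, against about 261
in 2025). This suggests that geography and atmospheric conditions play a larger
role than short-term emission reductions alone, although yearly and monthly
averages cannot fully separate the two effects.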
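
One way to probe the lockdown question more directly is to compare each month of 2020 with the same month averaged over the other years. The following is a minimal sketch under stated assumptions: `year_vs_baseline` is a hypothetical helper, and it expects one row per (year, month) with `year`, `month`, and `aqi` columns, which is the shape of `monthly_aqi` above.

```python
import pandas as pd


def year_vs_baseline(monthly: pd.DataFrame, target_year: int = 2020) -> pd.DataFrame:
    """Compare monthly mean AQI in `target_year` with the other years.

    Assumes `monthly` has 'year', 'month', and 'aqi' columns,
    with one row per (year, month) pair.
    """
    is_target = monthly["year"] == target_year
    # Monthly AQI for the year of interest, indexed by month
    target = monthly.loc[is_target].set_index("month")["aqi"]
    # Same-month average across all remaining years
    baseline = monthly.loc[~is_target].groupby("month")["aqi"].mean()
    out = pd.DataFrame({"target": target, "baseline": baseline})
    # Negative values mean the target year was cleaner than the baseline
    out["pct_diff"] = (out["target"] - out["baseline"]) / out["baseline"] * 100
    return out
```

Calling `year_vs_baseline(monthly_aqi)` and reading the rows for March–May (assumed here to be the main lockdown months) next to those for November–December would show whether any spring dip carried over into the following winter.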
