# Historical AQI Data Analysis

This notebook demonstrates how to download, analyze, and visualize historical Air Quality Index (AQI) data for Indian cities.

In [2]:
%pip install matplotlib seaborn pandas
from cpcbfetch import AQIClient
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set style for better-looking plots
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)



## 1. Downloading Historical Data

Let's download AQI data for a city (e.g., Delhi) for analysis.

In [5]:
# Initialize AQI client
client = AQIClient()

# Download city-level data for Delhi in 2024

city = "Delhi"
year = "2024"
output_file = f"{city.lower()}_aqi_{year}.csv"

client.download_past_year_AQI_data_cityLevel(city, year, output_file)
print(f"Data downloaded to {output_file}")

Data downloaded to delhi_aqi_2024.csv


## 2. Loading and Exploring the Data

Load the downloaded CSV file and explore its structure.

In [6]:
# Load the data
df = pd.read_csv(output_file)

print("Dataset Information:")
print(f"Shape: {df.shape}")
print(f"\nColumns: {df.columns.tolist()}")
print(f"\nData types:\n{df.dtypes}")
print("\nFirst few rows:")
df.head()

Dataset Information:
Shape: (41, 13)

Columns: ['Day', 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

Data types:
Day           object
January      float64
February     float64
March        float64
April        float64
May          float64
June         float64
July         float64
August       float64
September    float64
October      float64
November     float64
December     float64
dtype: object

First few rows:


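|   | Day | January | February | March | April | May | June | July | August | September | October | November | December |
|---|-----|---------|----------|-------|-------|-----|------|------|--------|-----------|---------|----------|----------|
| 0 | 1 | 346.0 | 176.0 | 208.0 | 133.0 | 200.0 | 245.0 | 105.0 | 64.0 | 101.0 | 149.0 | 339.0 | 285.0 |
| 1 | 2 | 340.0 | 215.0 | 117.0 | 144.0 | 197.0 | 173.0 | 118.0 | 76.0 | 87.0 | 173.0 | 318.0 | 280.0 |
| 2 | 3 | 341.0 | 199.0 | 126.0 | 167.0 | 264.0 | 155.0 | 108.0 | 68.0 | 89.0 | 162.0 | 381.0 | 268.0 |
| 3 | 4 | 377.0 | 274.0 | 141.0 | 173.0 | 282.0 | 211.0 | 61.0 | 64.0 | 69.0 | 184.0 | 380.0 | 178.0 |
| 4 | 5 | 333.0 | 177.0 | 125.0 | 174.0 | 292.0 | 251.0 | 77.0 | 59.0 | 83.0 | 145.0 | 372.0 | 165.0 |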


## 3. Data Preprocessing

Clean and prepare the data for analysis.
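To make the cleaning steps concrete, here is a minimal sketch on a tiny hand-made frame that mimics the wide layout shown above (one row per day, one column per month, plus a trailing text row standing in for the summary rows). The helper name `to_long` and the sample values, including the `'Max'` label, are illustrative assumptions. It keeps rows whose `Day` parses as a number, melts the month columns, and orders months by the calendar rather than alphabetically.

```python
import calendar

import pandas as pd

# Month names in calendar order (English under the default C locale);
# calendar.month_name[0] is an empty string, so skip it
MONTHS = list(calendar.month_name)[1:]


def to_long(wide: pd.DataFrame) -> pd.DataFrame:
    """Reshape a Day x Month AQI table into Day/Month/AQI rows (hypothetical helper)."""
    # Keep rows whose Day value parses as a number; text summary rows become NaN
    days = pd.to_numeric(wide['Day'], errors='coerce')
    clean = wide[days.notna()].copy()
    clean['Day'] = days[days.notna()].astype(int)

    month_cols = [m for m in MONTHS if m in clean.columns]
    tidy = clean.melt(id_vars='Day', value_vars=month_cols,
                      var_name='Month', value_name='AQI')

    # Drop empty cells, then make Month an ordered categorical so that
    # sort_values follows January..December instead of the alphabet
    tidy = tidy.dropna(subset=['AQI']).assign(
        Month=lambda d: pd.Categorical(d['Month'], categories=MONTHS, ordered=True))
    return tidy.sort_values(['Month', 'Day']).reset_index(drop=True)


# Illustrative sample: two days, two months, and one summary row
sample = pd.DataFrame({
    'Day': ['1', '2', 'Max'],
    'February': [176.0, None, 215.0],
    'January': [346.0, 340.0, 346.0],
})
print(to_long(sample))
```

Note that the `February` column comes first in the sample but last in the result, because ordering is driven by the categorical, not by column position.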

In [13]:
# Remove non-data rows if present (e.g., summary/statistics rows)
df_clean = df[df['Day'].apply(lambda x: str(x).isdigit())].copy()

# Convert 'Day' to integer
df_clean['Day'] = df_clean['Day'].astype(int)

# Month columns in calendar order
months = ['January', 'February', 'March', 'April', 'May', 'June',
          'July', 'August', 'September', 'October', 'November', 'December']

# Melt the dataframe to long format: columns 'Day', 'Month', 'AQI'
df_long = df_clean.melt(id_vars='Day', value_vars=months, var_name='Month', value_name='AQI')

# Drop missing AQI values
df_long = df_long.dropna(subset=['AQI'])

# Make Month an ordered categorical so sorting follows the calendar, not the alphabet
df_long = df_long.assign(Month=pd.Categorical(df_long['Month'], categories=months, ordered=True))

# Sort by Month and Day
df_long = df_long.sort_values(['Month', 'Day']).reset_index(drop=True)

print(df_long.head())

   Day    Month    AQI
0    1  January  346.0
1    2  January  340.0
2    3  January  341.0
3    4  January  377.0
4    5  January  333.0


## 4. Statistical Summary

Get basic statistics about the AQI data.

In [None]:
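# Statistical summary of the long-format data built in Section 3
print("AQI Statistics:")
print(df_long['AQI'].describe())

# The CSV has no category column, so derive one from the daily AQI
# using the National AQI bands (upper bounds 50, 100, 200, 300, 400)
aqi_bins = [0, 50, 100, 200, 300, 400, float('inf')]
aqi_labels = ['Good', 'Satisfactory', 'Moderate', 'Poor', 'Very Poor', 'Severe']
df_long['AQI_Category'] = pd.cut(df_long['AQI'], bins=aqi_bins, labels=aqi_labels, include_lowest=True)

print("\nAQI Category Distribution:")
print(df_long['AQI_Category'].value_counts())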

## 5. Time Series Visualization

Plot AQI trends over time.
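Daily AQI is noisy, so the underlying trend is often easier to read from a rolling mean. The sketch below is one possible approach, not part of the original notebook: it builds real dates from the `Day` and `Month` columns plus the year (impossible combinations such as 30 February become `NaT` and are dropped) and adds a rolling average. The helper name `daily_series`, the 7-day default window, and the sample values are assumptions.

```python
import pandas as pd


def daily_series(df_long: pd.DataFrame, year, window: int = 7) -> pd.DataFrame:
    """Return daily AQI indexed by date plus a rolling mean (hypothetical helper)."""
    # '1 January 2024'-style strings; invalid dates are coerced to NaT
    dates = pd.to_datetime(
        df_long['Day'].astype(str) + ' ' + df_long['Month'].astype(str) + f' {year}',
        format='%d %B %Y', errors='coerce',
    )
    series = (df_long.assign(Date=dates)
              .dropna(subset=['Date'])
              .set_index('Date')['AQI']
              .sort_index())
    # min_periods=1 yields a value from the first day instead of window-1 NaNs
    return pd.DataFrame({'AQI': series,
                         'Rolling': series.rolling(window, min_periods=1).mean()})


# Illustrative data; 30 February does not exist and is dropped
sample = pd.DataFrame({
    'Day': [1, 2, 3, 30],
    'Month': ['January', 'January', 'January', 'February'],
    'AQI': [300.0, 200.0, 100.0, 150.0],
})
result = daily_series(sample, 2024, window=2)
print(result)
```

Plotting `result['Rolling']` on top of `result['AQI']` gives a smoothed trend line; the window length is a judgment call.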

In [None]:
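# The CSV has no Date column, so build one from Day, Month and the year downloaded above;
# impossible dates (e.g., 30 February) become NaT and are dropped
dates = pd.to_datetime(
    df_long['Day'].astype(str) + ' ' + df_long['Month'].astype(str) + ' ' + str(year),
    format='%d %B %Y', errors='coerce')
df_ts = df_long.assign(Date=dates).dropna(subset=['Date']).sort_values('Date')

# Plot AQI over time
plt.figure(figsize=(14, 6))
plt.plot(df_ts['Date'], df_ts['AQI'], linewidth=1, alpha=0.7)
plt.xlabel('Date')
plt.ylabel('AQI')
plt.title(f'Air Quality Index Trend for {city} ({year})')
plt.xticks(rotation=45)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()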

## 6. Monthly Analysis

Analyze AQI patterns by month.
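Grouping by a plain month-name column sorts the result alphabetically (April, August, December, ...). When calendar order is wanted instead of a ranking by AQI, one option is to reindex the grouped means by `calendar.month_name`, as in this small sketch on illustrative data. Months with no readings come back as `NaN`.

```python
import calendar

import pandas as pd

# Illustrative data, deliberately out of order
sample = pd.DataFrame({
    'Month': ['March', 'January', 'January', 'February'],
    'AQI': [150.0, 300.0, 340.0, 200.0],
})

# 'January' ... 'December' (English under the default C locale); index 0 is ''
month_order = list(calendar.month_name)[1:]

# groupby sorts keys alphabetically; reindex restores calendar order
monthly_avg = sample.groupby('Month')['AQI'].mean().reindex(month_order)
print(monthly_avg.head(4))
```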

In [None]:
# Month is already a column of df_long, so no date extraction is needed.
# Calculate monthly average AQI, highest first
monthly_avg = df_long.groupby('Month', observed=True)['AQI'].mean().sort_values(ascending=False)

print("Average AQI by Month:")
print(monthly_avg)

# Visualize monthly averages
plt.figure(figsize=(12, 6))
monthly_avg.plot(kind='bar', color='steelblue')
plt.xlabel('Month')
plt.ylabel('Average AQI')
plt.title(f'Average AQI by Month - {city} ({year})')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

## 7. Pollutant Analysis

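Plot individual pollutant levels when the dataset includes them. The city-level file downloaded above contains only daily AQI values, so this cell skips plotting unless pollutant columns and a `Date` column are present.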

In [None]:
# Pollutants to analyze; CO is reported in mg/m³, the others in μg/m³
pollutants = ['PM2.5', 'PM10', 'NO2', 'SO2', 'CO', 'O3']
units = {'CO': 'mg/m³'}

# Check which pollutants are available in the raw data
available_pollutants = [p for p in pollutants if p in df.columns]

if available_pollutants and 'Date' in df.columns:
    # squeeze=False always returns a 2-D array of axes, even for a single pollutant
    fig, axes = plt.subplots(len(available_pollutants), 1,
                             figsize=(14, 4 * len(available_pollutants)), squeeze=False)
    for ax, pollutant in zip(axes[:, 0], available_pollutants):
        ax.plot(df['Date'], df[pollutant], linewidth=1, alpha=0.7)
        ax.set_xlabel('Date')
        ax.set_ylabel(f"{pollutant} ({units.get(pollutant, 'μg/m³')})")
        ax.set_title(f'{pollutant} Levels Over Time')
        ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
else:
    print("No pollutant columns found in this dataset; skipping pollutant plots.")

## 8. AQI Category Distribution

Visualize the distribution of AQI categories.
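The downloaded CSV holds only numeric AQI values, so categories have to be derived from them. Below is a minimal sketch, assuming India's National AQI bands (Good 0 to 50, Satisfactory 51 to 100, Moderate 101 to 200, Poor 201 to 300, Very Poor 301 to 400, Severe above 400). The function name `aqi_category` is illustrative.

```python
import bisect

# Inclusive upper bound of each band, and the label for each band
AQI_UPPER_BOUNDS = [50, 100, 200, 300, 400]
AQI_LABELS = ['Good', 'Satisfactory', 'Moderate', 'Poor', 'Very Poor', 'Severe']


def aqi_category(aqi: float) -> str:
    """Map a daily AQI value to its National AQI category name (hypothetical helper)."""
    # bisect_left returns the index of the first bound >= aqi, so a value
    # equal to a bound (e.g. 100) stays in the lower band
    return AQI_LABELS[bisect.bisect_left(AQI_UPPER_BOUNDS, aqi)]


for value in (45, 100, 101, 285, 401):
    print(value, aqi_category(value))
```

Applied to the notebook's data, `df_long['AQI'].map(aqi_category)` produces one label per day, which can then be counted with `value_counts()`.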

In [None]:
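# Derive categories from the daily AQI using the National AQI bands
# (the CSV has no category column)
categories = pd.cut(df_long['AQI'], bins=[0, 50, 100, 200, 300, 400, float('inf')],
                    labels=['Good', 'Satisfactory', 'Moderate', 'Poor', 'Very Poor', 'Severe'],
                    include_lowest=True)
category_counts = categories.value_counts()
category_counts = category_counts[category_counts > 0]  # leave empty bands out of the pie

# Create a pie chart of AQI categories
plt.figure(figsize=(10, 8))
plt.pie(category_counts.values, labels=category_counts.index, autopct='%1.1f%%', startangle=90)
plt.title(f'AQI Category Distribution - {city} ({year})')
plt.axis('equal')
plt.show()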

## 9. Identifying Worst Days

Find the days with the worst (and best) air quality.
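Beyond the ten most extreme days, it can help to count how many days in each month crossed a threshold. This sketch assumes "critical" means AQI above 300 (the Very Poor and Severe bands); that cutoff and the sample values are illustrative choices, not something the notebook defines.

```python
import pandas as pd

# Illustrative data in the same Month/Day/AQI layout as df_long
sample = pd.DataFrame({
    'Month': ['January', 'January', 'January', 'July'],
    'Day': [1, 2, 3, 1],
    'AQI': [346.0, 290.0, 377.0, 105.0],
})

THRESHOLD = 300  # assumed cutoff: Very Poor or worse

# Summing a boolean mask per month counts the days above the cutoff
days_above = (sample['AQI'] > THRESHOLD).groupby(sample['Month']).sum()
print(days_above)
```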

In [None]:
# Get top 10 worst AQI days
worst_days = df_long.nlargest(10, 'AQI')[['Month', 'Day', 'AQI']]
print("Top 10 Worst Air Quality Days:")
print(worst_days)

# Get top 10 best AQI days
best_days = df_long.nsmallest(10, 'AQI')[['Month', 'Day', 'AQI']]
print("\nTop 10 Best Air Quality Days:")
print(best_days)

## 10. Export Analysis Results

Save the analysis results for future reference.
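Besides the summary, the cleaned long-format table is often worth saving so later notebooks can skip the preprocessing. Below is a minimal round-trip sketch that writes into a temporary directory; the file names and sample values are illustrative assumptions. It also shows why values are cast to built-in types first: `json` cannot serialise every NumPy scalar (for example `int64`).

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Illustrative data in the same Month/Day/AQI layout as df_long
sample = pd.DataFrame({'Month': ['January', 'January'], 'Day': [1, 2], 'AQI': [346.0, 340.0]})

# Cast NumPy scalars to built-in int/float so json.dumps accepts them
summary = {
    'Total Days': int(len(sample)),
    'Average AQI': round(float(sample['AQI'].mean()), 1),
    'Last Day': int(sample['Day'].max()),
}

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    sample.to_csv(out_dir / 'aqi_long.csv', index=False)
    (out_dir / 'summary.json').write_text(json.dumps(summary, indent=2))

    # Read both files back before the temporary directory is removed
    reloaded = pd.read_csv(out_dir / 'aqi_long.csv')
    loaded_summary = json.loads((out_dir / 'summary.json').read_text())

print(loaded_summary)
```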

In [None]:
import json

# Create summary statistics (cast to built-in types so json can serialise them)
summary = {
    'City': city,
    'Year': year,
    'Days with Data': int(len(df_long)),
    'Average AQI': round(float(df_long['AQI'].mean()), 1),
    'Max AQI': float(df_long['AQI'].max()),
    'Min AQI': float(df_long['AQI'].min()),
    'Std Dev': round(float(df_long['AQI'].std()), 1),
}

print("\nAnalysis Summary:")
for key, value in summary.items():
    print(f"{key}: {value}")

# Save to JSON
summary_file = f'{city.lower()}_analysis_{year}.json'
with open(summary_file, 'w') as f:
    json.dump(summary, f, indent=2)
print(f"\nAnalysis saved to {summary_file}")

## Conclusion

This notebook demonstrated how to:
- Download historical AQI data
- Perform basic data analysis
- Visualize trends and patterns
- Identify critical pollution days
- Export analysis results

For more examples, check out the other notebooks in this series!