# Project Title: Air Pollution Analysis in Selected Cities

## Problem Statement
Air pollution is a growing concern for human health and the environment. This project analyzes Air Quality Index (AQI) trends across cities over several years to identify the most polluted cities and to observe how pollution levels change over time.

## Project Description
The dataset contains city-wise air quality data, including AQI readings over several years. The data is cleaned in Excel, and a pivot table summarizes, for each city and year, the *sum of AQI* and the *count of AQI_Bucket* entries. Charts visualize the trends, and observations and conclusions are drawn from the analysis.
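The Excel pivot described above can be approximated in pandas. The sketch below is a minimal version under stated assumptions: it presumes a raw daily table with columns named `City`, `Date`, `AQI` and `AQI_Bucket`, and the helper `build_aqi_pivot` and the sample rows are hypothetical. It sums `AQI` and counts non-missing `AQI_Bucket` labels per city and year.

```python
import pandas as pd

def build_aqi_pivot(raw: pd.DataFrame) -> pd.DataFrame:
    """Per city and year: sum of AQI and count of non-missing AQI_Bucket labels.

    Assumes hypothetical raw columns 'City', 'Date', 'AQI', 'AQI_Bucket'.
    """
    raw = raw.copy()
    # Derive the calendar year from the (assumed) Date column
    raw["Year"] = pd.to_datetime(raw["Date"]).dt.year
    return raw.pivot_table(
        index="City",
        columns="Year",
        values=["AQI", "AQI_Bucket"],
        aggfunc={"AQI": "sum", "AQI_Bucket": "count"},
    )

# Made-up sample rows for illustration only
raw = pd.DataFrame({
    "City": ["Delhi", "Delhi", "Delhi", "Chennai"],
    "Date": ["2019-01-01", "2019-01-02", "2020-01-01", "2019-01-01"],
    "AQI": [300.0, 250.0, 200.0, 80.0],
    "AQI_Bucket": ["Very Poor", "Poor", "Poor", "Satisfactory"],
})
pivot = build_aqi_pivot(raw)
print(pivot)
```

City and year combinations with no rows (here Chennai in 2020) come out as NaN, which is the same kind of gap the Excel pivot shows for cities without data in a given year.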

In [2]:
```python
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

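In [4]:
```python
# Load the dataset (the Excel pivot-table export)
file_path = r"C:\Users\HP\OneDrive\air_pollution.xlsx"  # Replace with your actual file path
# header=7 selects the sheet row used as column labels. That row is empty here, so
# pandas names the columns "Unnamed: 0" to "Unnamed: 14", and the pivot's own
# label row ("Row Labels", "Sum of AQI", ...) appears as data row 0.
df = pd.read_excel(file_path, header=7)
# Display the first 5 rows to check
df.head()
```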

| | Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | Unnamed: 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Row Labels | Sum of AQI | Count of AQI_Bucket | Sum of AQI | Count of AQI_Bucket | Sum of AQI | Count of AQI_Bucket | Sum of AQI | Count of AQI_Bucket | Sum of AQI | Count of AQI_Bucket | Sum of AQI | Count of AQI_Bucket | NaN | NaN |
| 1 | Ahmedabad | 81780 | 263 | 36289 | 117 | 38555 | 69 | 222148 | 357 | 181756 | 352 | 42604 | 176 | 603132.0 | 1334.0 |
| 2 | Aizawl | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3859 | 111 | 3859.0 | 111.0 |
| 3 | Amaravati | NaN | NaN | NaN | NaN | 7123 | 37 | 31634 | 312 | 30432 | 309 | 10958 | 183 | 80147.0 | 841.0 |
| 4 | Amritsar | NaN | NaN | NaN | NaN | 39386 | 266 | 39828 | 324 | 39639 | 362 | 16178 | 174 | 135031.0 | 1126.0 |
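Each year appears as a (Sum of AQI, Count of AQI_Bucket) pair, so a yearly sum also reflects how many entries were recorded. Dividing a sum by its count gives an average AQI per counted entry, which is a fairer basis for comparing years and cities. This assumes each counted AQI_Bucket label corresponds to exactly one AQI reading, which the pivot itself does not confirm. A minimal sketch using the third and fourth pairs of the Ahmedabad row above:

```python
import pandas as pd

# Third and fourth (Sum of AQI, Count of AQI_Bucket) pairs from the Ahmedabad row above
pairs = pd.DataFrame(
    {"sum_aqi": [38555, 222148], "count_bucket": [69, 357]},
    index=["pair 3", "pair 4"],
)

# Average AQI per counted entry; assumes one AQI reading per counted bucket label
pairs["mean_aqi"] = pairs["sum_aqi"] / pairs["count_bucket"]
print(pairs.round(1))
```

The summed AQI grows almost six-fold between these two pairs, while the per-entry average rises only from about 558.8 to 622.3, so most of the jump in the sum comes from the larger number of counted entries.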


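In [6]:
```python
# Drop rows with missing values.
# dropna() removes every row that has at least one missing value (NaN). This drops
# cities without data for some year (e.g. Aizawl, Amaravati, Amritsar) and also
# row 0, the pivot's label row, whose last two cells are NaN.
df = df.dropna()
df.head()
```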

| | Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | Unnamed: 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Ahmedabad | 81780 | 263 | 36289 | 117 | 38555 | 69 | 222148 | 357 | 181756 | 352 | 42604 | 176 | 603132.0 | 1334.0 |
| 5 | Bengaluru | 32196 | 286 | 37060 | 351 | 31712 | 364 | 31157 | 361 | 33435 | 365 | 14588 | 183 | 180148.0 | 1910.0 |
| 9 | Chennai | 40940 | 276 | 46281 | 334 | 37738 | 361 | 38504 | 365 | 37574 | 365 | 14686 | 183 | 215723.0 | 1884.0 |
| 11 | Delhi | 108414 | 365 | 110000 | 365 | 91395 | 356 | 90943 | 365 | 84718 | 365 | 33246 | 183 | 518716.0 | 1999.0 |
| 15 | Hyderabad | 39010 | 272 | 41869 | 337 | 40214 | 358 | 35608 | 365 | 34303 | 365 | 14306 | 183 | 205310.0 | 1880.0 |
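Called with no arguments, `dropna()` discards a row if any cell is missing, so it removes the pivot's label row together with every city that lacks data for some year. If cities with partial coverage should stay in the analysis, one option (a swapped-in alternative, not the approach used in this notebook) is to require only selected columns through the `subset` parameter. The toy frame below uses made-up values and hypothetical column names `y1`, `y2` and `Total`:

```python
import numpy as np
import pandas as pd

# Toy frame mirroring the layout above: a label row, a complete city,
# and a city with one missing year (values are made up)
demo = pd.DataFrame({
    "City": ["Row Labels", "CityA", "CityB"],
    "y1": ["Sum of AQI", 100, np.nan],
    "y2": ["Sum of AQI", 120, 90],
    "Total": [np.nan, 220.0, 90.0],
})

# Default behaviour: drop any row containing at least one NaN
all_complete = demo.dropna()

# Alternative: only require the total column, keeping partially covered cities
has_total = demo.dropna(subset=["Total"])

print(all_complete["City"].tolist())  # ['CityA']
print(has_total["City"].tolist())     # ['CityA', 'CityB']
```

In both cases the label row disappears because its `Total` cell is missing; the difference is whether a city with a gap in one year survives.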


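In [8]:
```python
# After loading, column 0 holds the city name, followed by one
# (Sum of AQI, Count of AQI_Bucket) pair per year and a row-total pair.
# Keep the city and each year's "Sum of AQI" column (positions 1, 3, 5, 7, 9, 11).
df_cleaned = df.iloc[:, [0, 1, 3, 5, 7, 9, 11]].copy()

# Rename columns: one column per year
year_cols = ['2015', '2016', '2017', '2018', '2019', '2020']
df_cleaned.columns = ['City'] + year_cols

# These columns were read as text (object dtype), so convert them to numbers
df_cleaned[year_cols] = df_cleaned[year_cols].apply(pd.to_numeric)

# Show cleaned DataFrame
df_cleaned.head()
```

| | City | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 |
|---|---|---|---|---|---|---|---|
| 1 | Ahmedabad | 81780 | 36289 | 38555 | 222148 | 181756 | 42604 |
| 5 | Bengaluru | 32196 | 37060 | 31712 | 31157 | 33435 | 14588 |
| 9 | Chennai | 40940 | 46281 | 37738 | 38504 | 37574 | 14686 |
| 11 | Delhi | 108414 | 110000 | 91395 | 90943 | 84718 | 33246 |
| 15 | Hyderabad | 39010 | 41869 | 40214 | 35608 | 34303 | 14306 |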
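For trend charts and per-year rankings, a long layout with one row per (city, year) is often easier to work with than the wide one. A minimal sketch using `DataFrame.melt`; the hand-built frame holds the first four "Sum of AQI" figures of the Ahmedabad and Delhi rows, labelled 2015 to 2018 following the notebook's year naming:

```python
import pandas as pd

# Wide layout: one row per city, one "Sum of AQI" column per year
wide = pd.DataFrame({
    "City": ["Ahmedabad", "Delhi"],
    "2015": [81780, 108414],
    "2016": [36289, 110000],
    "2017": [38555, 91395],
    "2018": [222148, 90943],
})

# Long layout: one row per (city, year)
long = wide.melt(id_vars="City", var_name="Year", value_name="Sum of AQI")
long["Year"] = long["Year"].astype(int)

# City with the highest summed AQI in each year
top_per_year = long.loc[long.groupby("Year")["Sum of AQI"].idxmax(), ["Year", "City"]]
print(top_per_year.to_string(index=False))
```

The long frame can be passed straight to a plotting call such as `sns.lineplot(data=long, x="Year", y="Sum of AQI", hue="City")`. Keep in mind that raw sums also depend on how many entries each city recorded per year.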


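In [9]:
```python
# Basic info on the filtered pivot data (df, before column selection)
print("=== Dataset Info ===")
df.info()
# describe() summarizes numeric columns only; most columns of df are still
# object dtype here, so only the two row-total columns appear
print("\n=== Statistical Summary ===")
print(df.describe())
print("\n=== Missing Values ===")
print(df.isnull().sum())
```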

=== Dataset Info ===
<class 'pandas.core.frame.DataFrame'>
Index: 8 entries, 1 to 27
Data columns (total 15 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Unnamed: 0   8 non-null      object 
 1   Unnamed: 1   8 non-null      object 
 2   Unnamed: 2   8 non-null      object 
 3   Unnamed: 3   8 non-null      object 
 4   Unnamed: 4   8 non-null      object 
 5   Unnamed: 5   8 non-null      object 
 6   Unnamed: 6   8 non-null      object 
 7   Unnamed: 7   8 non-null      object 
 8   Unnamed: 8   8 non-null      object 
 9   Unnamed: 9   8 non-null      object 
 10  Unnamed: 10  8 non-null      object 
 11  Unnamed: 11  8 non-null      object 
 12  Unnamed: 12  8 non-null      object 
 13  Unnamed: 13  8 non-null      float64
 14  Unnamed: 14  8 non-null      float64
dtypes: float64(2), object(13)
memory usage: 1.0+ KB

=== Statistical Summary ===
        Unnamed: 13   Unnamed: 14
count  8.000000e+00      8.000000
mean   8.279466e+0