# 🌍 Air Quality Analysis Project

## 📝 Scenario
You are a junior data analyst working for your city's Environmental Protection Department. The department has been collecting air quality data from various monitoring stations across the city. They want to understand how different factors affect air quality and identify patterns that could help improve air quality in different areas of the city.

## 📊 Dataset Description
The dataset "air_quality_dataset.csv" contains measurements of various environmental factors and air quality indicators. Here are the columns:

1. Temperature (°C): Air temperature at time of measurement.
2. Humidity (%): Relative humidity in the air.
3. PM2.5 (μg/m³): Fine particulate matter concentration.
4. PM10 (μg/m³): Coarse particulate matter concentration.
5. NO2 (ppb): Nitrogen dioxide concentration.
6. SO2 (ppb): Sulfur dioxide concentration.
7. CO (ppm): Carbon monoxide concentration.
8. Proximity_to_Industrial_Areas (km): Distance to nearest industrial zone.
9. Population_Density (people/km²): Population density in the area.
10. Air Quality: Categorical rating (Good, Moderate, Poor, and Hazardous).
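
The four Air Quality ratings have a natural order, from least to most severe. One optional approach is to store the column as a pandas ordered categorical, so that sorting and comparisons follow severity rather than alphabetical order. The sketch below is an assumption, not a required step. The sample values are hypothetical and stand in for `air_data['Air Quality']`. Any value that does not exactly match one of the four listed labels would become NaN.

```python
import pandas as pd

# Severity order taken from the dataset description (least to most severe).
AIR_QUALITY_ORDER = ["Good", "Moderate", "Poor", "Hazardous"]

# Hypothetical sample values standing in for air_data['Air Quality'].
ratings = pd.Series(["Poor", "Good", "Hazardous", "Moderate", "Good"])

# Convert to an ordered categorical; labels outside AIR_QUALITY_ORDER become NaN.
ratings_ordered = pd.Series(
    pd.Categorical(ratings, categories=AIR_QUALITY_ORDER, ordered=True)
)

# Sorting now follows severity, not the alphabet.
print(ratings_ordered.sort_values().tolist())
# ['Good', 'Good', 'Moderate', 'Poor', 'Hazardous']

# Ordered categoricals also support comparisons, e.g. "Poor or worse".
print(int((ratings_ordered >= "Poor").sum()))  # 2
```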

## 🎯 Learning Objectives
By completing this project, you will learn to:
1. Load and explore real-world environmental data using Python.
2. Perform basic data analysis using Pandas.
3. Create meaningful visualizations to understand air quality patterns.
4. Draw insights from environmental data that could help your community.
5. Practice presenting scientific findings in a clear, understandable way.


## 🚀 Task 1: Data Loading and Initial Exploration

### 📌 Detailed Instructions

1. **Import Required Libraries (5 minutes)**
   - Import Pandas for data manipulation.
   - Import matplotlib.pyplot for creating basic plots.
   - Import Seaborn for enhanced visualizations.
   - Make sure to use the standard alias names (pd, plt, sns).

2. **Load the Dataset (10 minutes)**
   - Use Pandas to read the CSV file from the provided URL.
   - Store the data in a variable called 'air_data'.
   - Check that the data loaded correctly by displaying the first few rows.

3. **Initial Data Exploration (15 minutes)**
   - Find out how many rows and columns are in your dataset.
   - Display basic information about the dataset:
     * Data types of each column.
     * Number of missing values in each column.
     * Summary statistics for the numerical columns.
   - Display the unique values in the 'Air Quality' column.
   - Calculate the percentage of each air quality category.

4. **Document Your Findings (10 minutes)**
   - Add markdown cells explaining what you discovered.
   - Comment on any interesting patterns or potential issues.
   - Note down questions that came up during your exploration.

### 💻 Template Code Cell



```python
# Import necessary libraries
# Import pandas, matplotlib.pyplot, and seaborn
### BEGIN SOLUTION
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
### END SOLUTION

# Load the dataset
url = "https://raw.githubusercontent.com/Curriculum-Development/datasets/refs/heads/main/air_quality_dataset.csv"
# Load the data into air_data
### BEGIN SOLUTION
air_data = pd.read_csv(url)
### END SOLUTION

# Display the first few rows
# Use head() function
### BEGIN SOLUTION
print("First 5 rows of the dataset:")
print(air_data.head())
### END SOLUTION

# Get basic information about the dataset (column data types, non-null counts)
# Use info() function
### BEGIN SOLUTION
print("\nDataset Information:")
air_data.info()  # info() prints its report directly and returns None
### END SOLUTION

# Check dataset dimensions
# Use shape attribute
### BEGIN SOLUTION
print("\nDataset dimensions (rows, columns):")
print(air_data.shape)
### END SOLUTION

# Check for missing values
# Use isnull() together with sum()
### BEGIN SOLUTION
print("\nMissing values per column:")
print(air_data.isnull().sum())
### END SOLUTION

# Get summary statistics
# Use describe() function
### BEGIN SOLUTION
print("\nSummary statistics:")
print(air_data.describe())
### END SOLUTION

# Analyze air quality categories
# Find unique values and calculate percentages
### BEGIN SOLUTION
print("\nUnique Air Quality categories:")
print(air_data['Air Quality'].unique())

# normalize=True returns proportions; multiply by 100 to get percentages
print("\nPercentage of each Air Quality category:")
print((air_data['Air Quality'].value_counts(normalize=True) * 100).round(2))
### END SOLUTION

# Document your findings in markdown cells below
```


### Document Your Findings:
Feel free to add more Markdown cells as needed.

### 📝 Expected Output Documentation
After running Task 1, you should be able to answer these questions:
1. How many total measurements are in the dataset?
2. What is the most common air quality category?
3. Are there any missing values that need to be handled?
4. What are the ranges of values for key pollutants (PM2.5, PM10, NO2, SO2, CO)?
5. Are there any unusual or unexpected values in the dataset?
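
Questions 1 to 4 can also be answered directly in code. The sketch that follows uses a small hypothetical DataFrame in place of `air_data`, and only two pollutant columns to keep it short. The column names are assumed to match those used in the template code cells, so adjust them if your loaded data differs.

```python
import pandas as pd

# Small hypothetical stand-in for air_data; real column names may differ.
air_data = pd.DataFrame({
    "PM2.5": [12.0, 35.5, None, 80.2],
    "PM10": [20.0, 50.1, 65.3, 120.4],
    "Air Quality": ["Good", "Moderate", "Moderate", "Poor"],
})

# Q1: total number of measurements (one row per measurement).
n_measurements = len(air_data)

# Q2: most frequent air quality category.
most_common = air_data["Air Quality"].value_counts().idxmax()

# Q3: missing values per column, keeping only columns that have any.
missing = air_data.isna().sum()
missing = missing[missing > 0]

# Q4: minimum and maximum of each pollutant column (NaN is skipped).
pollutant_ranges = air_data[["PM2.5", "PM10"]].agg(["min", "max"])

print(n_measurements)  # 4
print(most_common)     # Moderate
print(missing)
print(pollutant_ranges)
```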

### 🎯 Success Criteria
You have successfully completed Task 1 when you can:
- ✓ Load the dataset without errors.
- ✓ Display basic information about the dataset.
- ✓ Generate summary statistics.
- ✓ Calculate air quality category distributions.
- ✓ Document your findings clearly in markdown cells.

## 🚀 Task 2: Creating Your First Visualization

### 📌 Instructions
In this task, we will create a simple bar chart to show how many measurements we have for each air quality category (Good, Moderate, Poor, or Hazardous). This will help us understand the distribution of air quality in our dataset.

Steps:
1. Use the air_data DataFrame we created in Task 1.
2. Create a figure using plt.figure().
3. Use plt.bar() to make a bar chart of air quality categories.
4. Add a title to your plot.
5. Label your X-axis and Y-axis.
6. Display your plot.
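
`value_counts()` sorts categories by frequency, so by default the bars appear from most to least common rather than in order of severity. One option is to reindex the counts into a fixed order before plotting. The sketch below assumes the labels exactly match the four categories in the dataset description, and it uses hypothetical sample values.

```python
import pandas as pd

# Severity order assumed from the dataset description.
AIR_QUALITY_ORDER = ["Good", "Moderate", "Poor", "Hazardous"]

# Hypothetical sample values standing in for air_data['Air Quality'].
ratings = pd.Series(["Moderate", "Good", "Moderate", "Poor", "Good", "Moderate"])

# Reorder counts by severity; categories absent from the data get a count of 0.
quality_counts = ratings.value_counts().reindex(AIR_QUALITY_ORDER, fill_value=0)

print(quality_counts.tolist())  # [2, 3, 1, 0]
# The result can then be passed to plt.bar(quality_counts.index, quality_counts.values)
```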

### 💻 Template Code Cell

```python
# Create a bar chart of air quality categories

# First, count how many measurements we have for each category
quality_counts = air_data['Air Quality'].value_counts()

# Create the plot
plt.figure(figsize=(8, 6))

# Make your bar chart here
# Hint: use plt.bar(quality_counts.index, quality_counts.values)
### BEGIN SOLUTION
plt.bar(quality_counts.index, quality_counts.values, color='skyblue')
### END SOLUTION

# Add title and labels here
# Hint: use plt.title(), plt.xlabel(), plt.ylabel()
### BEGIN SOLUTION
plt.title('Number of Measurements by Air Quality Category')
plt.xlabel('Air Quality Category')
plt.ylabel('Number of Measurements')
### END SOLUTION

# Show the plot
plt.show()
```

## 🚀 Task 3: Exploring Relationships in Our Data

### 📌 Instructions
In this task, we will create a scatter plot to see how temperature relates to humidity in our dataset. A scatter plot draws each measurement as a single point, which makes it easy to see whether two variables tend to rise or fall together.

Steps:
1. Use the air_data DataFrame from Task 1.
2. Create a figure using plt.figure().
3. Use plt.scatter() to plot temperature vs. humidity.
4. Add a title to your plot.
5. Label your X-axis and Y-axis.
6. Display your plot.
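
A scatter plot shows the relationship visually; the Pearson correlation coefficient summarizes its linear part as a single number between -1 and +1. A minimal sketch using `Series.corr` on hypothetical readings (the real values come from `air_data`):

```python
import pandas as pd

# Hypothetical temperature/humidity readings standing in for air_data.
air_data = pd.DataFrame({
    "Temperature": [10.0, 15.0, 20.0, 25.0, 30.0],
    "Humidity": [80.0, 70.0, 60.0, 50.0, 40.0],
})

# Pearson correlation (the default method): +1 is a perfect positive linear
# relationship, -1 a perfect negative one, and values near 0 mean no linear trend.
r = air_data["Temperature"].corr(air_data["Humidity"])
print(round(r, 3))  # -1.0
```

Pearson's coefficient only captures straight-line relationships. For a monotonic but curved pattern, `method="spearman"` is a common alternative.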

### 💻 Template Code Cell

```python
# Create a scatter plot of temperature vs humidity

# Create the plot
plt.figure(figsize=(8, 6))

# Make your scatter plot here
# Hint: use plt.scatter(air_data['Temperature'], air_data['Humidity'])
### BEGIN SOLUTION
plt.scatter(air_data['Temperature'], air_data['Humidity'],
            color='lightblue', alpha=0.5)
### END SOLUTION

# Add title and labels here
# Hint: use plt.title(), plt.xlabel(), plt.ylabel()
### BEGIN SOLUTION
plt.title('Temperature vs Humidity')
plt.xlabel('Temperature (°C)')
plt.ylabel('Humidity (%)')
### END SOLUTION

# Show the plot
plt.show()
```

## 🚀 Task 4: Visualizing Pollution Levels

### 📌 Instructions
In this task, we will create a simple line plot to see how PM2.5 levels vary across our measurements. Because the dataset has no date or time column, the x-axis is simply the row position (measurement number). The line therefore shows how PM2.5 changes from one row to the next, not a trend over time.

Steps:
1. Use the air_data DataFrame from Task 1.
2. Create a figure using plt.figure().
3. Use plt.plot() to create a line plot of PM2.5 values.
4. Add a title to your plot.
5. Label your X-axis and Y-axis.
6. Display your plot.
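
With many measurements, a raw line plot can look noisy. One optional technique, not part of the original task, is to smooth the series with a rolling mean before plotting. In the sketch below, the readings are hypothetical and the window size of 3 is an arbitrary choice.

```python
import pandas as pd

# Hypothetical PM2.5 readings in row order (the dataset has no timestamp column).
pm25 = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])

# Centered rolling mean over 3 rows; the first and last rows lack a full
# window, so their smoothed value is NaN.
smoothed = pm25.rolling(window=3, center=True).mean()
print(smoothed.tolist())
# The smoothed series could then be drawn with plt.plot(smoothed)
```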

### 💻 Template Code Cell





```python
# Create a line plot of PM2.5 levels

# Create the plot
plt.figure(figsize=(10, 6))

# Make your line plot here
# Hint: use plt.plot(air_data['PM2.5'])
### BEGIN SOLUTION
plt.plot(air_data['PM2.5'], color='green', linewidth=2)
### END SOLUTION

# Add title and labels here
# Hint: use plt.title(), plt.xlabel(), plt.ylabel()
### BEGIN SOLUTION
plt.title('PM2.5 Levels Across Measurements')
plt.xlabel('Measurement Number')
plt.ylabel('PM2.5 (μg/m³)')
### END SOLUTION

# Show the plot
plt.show()
```

## 🚀 Task 5: Comparing Temperature Across Air Quality Categories

### 📌 Instructions
In our final task, we will create a box plot to understand how temperature varies across different air quality categories. A box plot helps us see the distribution of temperature values within each air quality group, showing us the median, spread, and any unusual values.

Steps:
1. Use the air_data DataFrame from Task 1.
2. Create a figure using plt.figure().
3. Use Seaborn's boxplot function to create the visualization.
4. Add a title to your plot.
5. Label your X-axis and Y-axis.
6. Display your plot.
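
The box in each box plot spans the first to third quartile, and the line inside marks the median. To read those numbers directly, you can compute them per category with `groupby` and `quantile`. The rows in the sketch below are hypothetical. The quartiles use pandas' default linear interpolation, which should correspond to the box edges that seaborn draws through matplotlib.

```python
import pandas as pd

# Hypothetical rows standing in for air_data.
air_data = pd.DataFrame({
    "Air Quality": ["Good", "Good", "Good", "Poor", "Poor", "Poor"],
    "Temperature": [18.0, 20.0, 22.0, 25.0, 28.0, 31.0],
})

# First quartile, median, and third quartile of temperature per category;
# unstack() turns the quantile level into columns (0.25, 0.5, 0.75).
summary = (
    air_data.groupby("Air Quality")["Temperature"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
print(summary)
```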
   
### 💻 Template Code Cell




```python
# Create a box plot of temperature across air quality categories

# Create the plot
plt.figure(figsize=(10, 6))

# Make your box plot here
# Hint: use sns.boxplot(data=air_data, x='Air Quality', y='Temperature')
### BEGIN SOLUTION
# Assigning hue along with palette avoids seaborn's deprecation warning for
# palette-without-hue; legend=False hides the redundant legend.
# (The legend parameter requires seaborn >= 0.13; on older versions drop hue and legend.)
sns.boxplot(data=air_data, x='Air Quality', y='Temperature',
            hue='Air Quality', palette='coolwarm', legend=False)
### END SOLUTION

# Add title and labels here
# Hint: use plt.title(), plt.xlabel(), plt.ylabel()
### BEGIN SOLUTION
plt.title('Temperature Distribution by Air Quality Category')
plt.xlabel('Air Quality Category')
plt.ylabel('Temperature (°C)')
### END SOLUTION

# Show the plot
plt.show()
```

Congratulations! You have now completed all five tasks in this air quality analysis project. You've learned how to:
- Load and explore environmental data.
- Create different types of visualizations including bar charts, scatter plots, line plots, and box plots.
- Use both Matplotlib and Seaborn for creating visualizations.
- Analyze relationships between different variables in your dataset.

These fundamental data visualization skills will serve as a strong foundation for your future data analysis projects. Remember that practice is key to becoming comfortable with these techniques. Try creating these same plots with different variables from the dataset to deepen your understanding.
