# Air Quality Analysis across European Cities

This notebook demonstrates how to analyze air quality data from three European cities:
Antwerp, Paris, and London. We'll cover data loading, cleaning, visualization, and basic analysis.

## 1. Setting up the Environment
First, let's import the necessary libraries and set a consistent plot style.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set plot style for better visualization
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("muted")
sns.set_context("notebook", font_scale=1.2)

print("Environment setup complete!")

## 2. Loading the Data
Now we'll load the NO2 measurements from the CSV file into a DataFrame.

In [None]:
# Load the dataset from the CSV file
# The file name is air_quality_no2.csv which contains NO2 measurements
file_path = 'air_quality_no2.csv'
df = pd.read_csv(file_path)

# Let's check the first few rows to understand the data
print("First few rows of the dataset:")
df.head()
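
If the CSV file is not at hand, a small stand-in with the same layout can be built in memory. The sketch below assumes the column names used throughout this notebook (`datetime`, `station_antwerp`, `station_paris`, `station_london`); the values are invented for illustration and are not real measurements. It also shows the `parse_dates` argument of `pd.read_csv`, which converts the timestamp column while reading.

```python
import io

import pandas as pd

# Hypothetical sample in the layout this notebook assumes; values are invented.
sample_csv = io.StringIO(
    "datetime,station_antwerp,station_paris,station_london\n"
    "2024-01-01 00:00:00,,20.0,18.0\n"
    "2024-01-01 01:00:00,40.0,22.0,19.0\n"
    "2024-01-01 02:00:00,38.0,,20.0\n"
)

# parse_dates converts the named column to datetime64 during loading
sample_df = pd.read_csv(sample_csv, parse_dates=["datetime"])
print(sample_df.dtypes)
```

Empty fields in the CSV are read as `NaN`, which is how missing measurements appear in the rest of the notebook.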

The `datetime` column is read as plain strings, so let's convert it to a proper datetime type.

In [None]:
# Convert datetime string to datetime object
df['datetime'] = pd.to_datetime(df['datetime'])

# Display the dataset
print("Air Quality Dataset with datetime converted:")
df

## 3. Data Cleaning and Preparation
Let's check for missing values and prepare the data for analysis.

In [None]:
# Let's check for missing values
print("Missing Values per Column:")
missing_values = df.isna().sum()
missing_values
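
Raw counts are easier to judge relative to the number of rows. As a minimal sketch, `isna().mean()` gives the fraction of missing values per column. The frame below is a made-up stand-in for `df`, with invented values.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df; values are illustrative only
demo = pd.DataFrame({
    "station_antwerp": [np.nan, 40.0, np.nan, 35.0],
    "station_paris": [20.0, np.nan, 22.0, 24.0],
    "station_london": [18.0, 19.0, 20.0, 21.0],
})

# The mean of a boolean mask is the fraction of True values,
# so this is the percentage of missing entries per column
missing_pct = demo.isna().mean() * 100
print(missing_pct.round(1))
```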

In [None]:
# Work on a copy so the original DataFrame stays unchanged.
# Missing values are left as NaN here; they are handled in section 6.
df_clean = df.copy()

# Set the datetime as index for time series analysis
df_clean.set_index('datetime', inplace=True)

print("Dataset with datetime as index:")
df_clean
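
With a datetime index in place, rows can be selected by date string and aggregated over time periods. The following sketch illustrates both on a made-up hourly series; `demo` is a hypothetical stand-in for `df_clean`, and the values are invented.

```python
import numpy as np
import pandas as pd

# Made-up stand-in: 48 hourly values covering two days
idx = pd.date_range("2024-01-01", periods=48, freq="h")
demo = pd.DataFrame({"station_paris": np.arange(48, dtype=float)}, index=idx)

# Partial string indexing: select every row that falls on one calendar day
first_day = demo.loc["2024-01-01"]

# Resample to one row per day, taking the mean of the hourly values
daily_mean = demo.resample("D").mean()
print(daily_mean)
```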

## 4. Data Visualization
Let's plot the NO2 measurements of all three stations over time.

In [None]:
plt.figure(figsize=(12, 6))

# Plot each station's data
plt.plot(df_clean.index, df_clean['station_antwerp'], 'o-', label='Antwerp')
plt.plot(df_clean.index, df_clean['station_paris'], 's-', label='Paris')
plt.plot(df_clean.index, df_clean['station_london'], '^-', label='London')

plt.title('NO2 Air Quality Measurements Over Time', fontsize=16)
plt.xlabel('Time')
plt.ylabel('NO2 Concentration (μg/m³)')
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()

## 5. Statistical Analysis
Next, we calculate basic summary statistics for each station.

In [None]:
print("Basic Statistics for NO2 Measurements:")
stats = df_clean.describe()
stats

In [None]:
# Calculate average NO2 level by station
avg_by_station = df_clean.mean()
print("Average NO2 Levels by Station:")
avg_by_station

## 6. Handling Missing Values
Missing values can be handled in several ways. Here we compare three common ones: forward fill, backward fill, and linear interpolation.

In [None]:
print("Demonstrating different methods for handling missing values:")

# Method 1: Forward fill (use previous value)
df_ffill = df_clean.ffill()
print("\nForward Fill Method:")
df_ffill

In [None]:
# Method 2: Backward fill (use next value)
df_bfill = df_clean.bfill()
print("\nBackward Fill Method:")
df_bfill

In [None]:
# Method 3: Linear interpolation
df_interp = df_clean.interpolate(method='linear')
print("\nLinear Interpolation Method:")
df_interp
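
To see how the three methods differ, it helps to apply them to a tiny series with a known gap. The values below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up hourly series with a two-hour gap
idx = pd.date_range("2024-01-01 00:00", periods=5, freq="h")
s = pd.Series([10.0, np.nan, np.nan, 40.0, 50.0], index=idx, name="no2")

# Put the original and the three filled versions side by side
comparison = pd.DataFrame({
    "original": s,
    "ffill": s.ffill(),                        # repeats the last known value
    "bfill": s.bfill(),                        # repeats the next known value
    "linear": s.interpolate(method="linear"),  # straight line across the gap
})
print(comparison)
```

In the gap, forward fill gives 10 and 10, backward fill gives 40 and 40, and linear interpolation gives 20 and 30. Note that forward fill and linear interpolation with default settings leave a gap at the very start of a series unfilled, because there is no earlier value to work from.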

## 7. Visualizing Interpolated Data
Let's plot the data again after linear interpolation to see how the gaps were filled.

In [None]:
plt.figure(figsize=(12, 6))

# Plot each station's interpolated data
plt.plot(df_interp.index, df_interp['station_antwerp'], 'o-', label='Antwerp')
plt.plot(df_interp.index, df_interp['station_paris'], 's-', label='Paris')
plt.plot(df_interp.index, df_interp['station_london'], '^-', label='London')

plt.title('Interpolated NO2 Air Quality Measurements', fontsize=16)
plt.xlabel('Time')
plt.ylabel('NO2 Concentration (μg/m³)')
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()

## 8. Comparing Stations
A box plot lets us compare the distribution of NO2 values across the three stations.

In [None]:
plt.figure(figsize=(10, 6))
sns.boxplot(data=df_interp)
plt.title('Distribution of NO2 Measurements by Station', fontsize=16)
plt.ylabel('NO2 Concentration (μg/m³)')
plt.xticks(rotation=0)
plt.grid(True, axis='y')
plt.tight_layout()

## 9. Conclusion

In [None]:
print("""
Conclusion:
- We've loaded and analyzed air quality (NO2) data from three European cities.
- We've handled missing values using various techniques.
- We've visualized the trends and distributions of the data.
- Paris shows the highest average NO2 levels in this time period.
- There is clear time-of-day variation in all stations.
""")
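
The last two bullets are claims about the data and can be checked directly. A minimal sketch, using a made-up stand-in for `df_clean` with invented values: `idxmax` on the column means names the station with the highest average, and grouping by the hour of the index gives the mean for each hour of the day.

```python
import pandas as pd

# Made-up stand-in for df_clean: two days of hourly values, invented
idx = pd.date_range("2024-01-01", periods=48, freq="h")
hours = idx.hour.to_numpy()
demo = pd.DataFrame({
    "station_antwerp": 20.0 + hours,
    "station_paris": 30.0 + hours,
    "station_london": 25.0 + hours,
}, index=idx)

# Station with the highest mean concentration
highest_station = demo.mean().idxmax()

# Mean concentration for each hour of the day (0 to 23), per station
hourly_profile = demo.groupby(demo.index.hour).mean()

print(highest_station)
print(hourly_profile.head())
```

Run against the real `df_clean`, a flat `hourly_profile` would argue against a time-of-day pattern, while pronounced peaks would support it.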

## 10. Exercise Ideas for Students

In [None]:
print("""
Exercise Ideas:
1. Add more data processing options to handle missing values
2. Create a correlation analysis between stations
3. Identify peak pollution hours and possible causes
4. Implement a simple forecasting model for future values
5. Compare this data with other air quality indicators (PM2.5, CO, etc.)
""")