# 3. Asking meaningful questions

**1. How does air quality differ across various seasons?**
- Answering this question will:
    - Reveal seasonal variations in air quality
    - Help understand environmental and climatic impacts on air pollution
    - Provide insights for public health and environmental policy
- How we answer this question:
    - Data Preparation:
        - Use Python with pandas for data manipulation
        - Calculate US Air Quality Index (AQI) for each measurement
        - Categorize data by seasons (Spring, Summer, Autumn, Winter)
        - Split countries into continental regions
    - Analysis Techniques:
        - Calculate average AQI for each season and continent
        - Create visualizations to demonstrate seasonal variations
        - Run statistical tests to check whether the seasonal differences are significant
- Visualization Strategy:
    - Line chart showing US_AQI trends across seasons for different regions
    - Bar chart comparing average pollutant levels by season and region
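
The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the project's final implementation. It assumes meteorological seasons for the Northern Hemisphere, assumes PM2.5 is reported in µg/m³, and computes only the PM2.5 sub-index using the US EPA breakpoints as published before the 2024 revision. The full US AQI is the maximum sub-index over all pollutants. The names `pm25_sub_index`, `add_season`, and the sample rows are hypothetical.

```python
import pandas as pd

# Assumption: US EPA PM2.5 breakpoints (24-hour, µg/m³) from before the
# 2024 revision, as (C_low, C_high, I_low, I_high).
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]

# Assumption: meteorological seasons, Northern Hemisphere.
MONTH_TO_SEASON = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}


def pm25_sub_index(conc):
    """Piecewise-linear AQI sub-index for one PM2.5 value (NaN if out of range)."""
    if pd.isna(conc) or conc < 0:
        return float("nan")
    c = int(conc * 10 + 1e-9) / 10  # truncate to one decimal place
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            return float(round((i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo))
    return float("nan")


def add_season(frame, date_col="Date"):
    """Return a copy of the frame with a 'Season' column derived from the date."""
    out = frame.copy()
    out["Season"] = pd.to_datetime(out[date_col]).dt.month.map(MONTH_TO_SEASON)
    return out


# Made-up rows that reuse the dataset's column names.
sample = pd.DataFrame({
    "Date": ["2023-01-15", "2023-04-10", "2023-07-22", "2023-10-05", "2023-12-30"],
    "PM2.5": [12.0, 35.4, 55.4, 150.4, 35.4],
})

seasonal = add_season(sample)
seasonal["PM2.5_AQI"] = seasonal["PM2.5"].apply(pm25_sub_index)

# Average sub-index per season, in calendar order.
season_means = (
    seasonal.groupby("Season")["PM2.5_AQI"].mean()
    .reindex(["Spring", "Summer", "Autumn", "Winter"])
)
print(season_means)
```

Cities in the Southern Hemisphere would need the season labels shifted by six months, so a per-country hemisphere flag may be required before the seasonal averages are comparable across regions.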

**2. What are the levels of pollution during different weather conditions?**
- Answering this question will:
    - Show how weather parameters influence air pollution
    - Identify weather conditions associated with higher pollution levels
    - Support environmental monitoring and forecasting efforts
- How we answer this question:
    - Data Categorization:
        - Group temperature into ranges (Cold, Mild, Hot)
        - Categorize humidity levels (Low, Medium, High)
        - Classify wind speed (Calm, Moderate, Strong)
    - Analysis Approach:
        - Calculate average AQI for each weather category
        - Perform statistical analysis to determine correlations
        - Develop multi-dimensional visualizations
- Visualization Strategy:
    - Heatmaps showing AQI variations across weather parameters
    - Box plots comparing pollution levels in different weather conditions
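
The categorization step above could look something like the sketch below, which uses `pd.cut` to bin each weather parameter and a pivot table as input for a heatmap. The cut points (10 and 25 for temperature, 40 and 70 for humidity, 5 and 12 for wind speed) are placeholder assumptions rather than values taken from the project, and the sample rows are made up. PM2.5 stands in for AQI here.

```python
import pandas as pd

INF = float("inf")

# Made-up rows that reuse the dataset's column names.
sample = pd.DataFrame({
    "Temperature": [-5.0, 12.0, 30.0, 18.0, 35.0, 2.0],
    "Humidity": [20.0, 50.0, 85.0, 45.0, 90.0, 25.0],
    "Wind Speed": [2.0, 8.0, 15.0, 6.0, 18.0, 1.0],
    "PM2.5": [90.0, 40.0, 20.0, 50.0, 30.0, 110.0],
})

# Assumption: all cut points below are illustrative and should be tuned.
sample["Temp Category"] = pd.cut(
    sample["Temperature"], bins=[-INF, 10, 25, INF],
    labels=["Cold", "Mild", "Hot"],
)
sample["Humidity Category"] = pd.cut(
    sample["Humidity"], bins=[0, 40, 70, 100],
    labels=["Low", "Medium", "High"], include_lowest=True,
)
sample["Wind Category"] = pd.cut(
    sample["Wind Speed"], bins=[0, 5, 12, INF],
    labels=["Calm", "Moderate", "Strong"], include_lowest=True,
)

# Average pollution level per temperature category.
temp_means = sample.groupby("Temp Category", observed=True)["PM2.5"].mean()

# Two-way table (temperature x humidity) suitable for a heatmap.
heat = sample.pivot_table(
    index="Temp Category", columns="Humidity Category",
    values="PM2.5", aggfunc="mean", observed=True,
)
print(temp_means)
print(heat)
```

A table like `heat` can be passed to `sns.heatmap` directly. Empty category combinations appear as NaN and are left blank in the plot.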

**3. What is the correlation between humidity and the ratio of PM2.5 to PM10 in different continents?**
- Answering this question will:
    - Explore the relationship between humidity and the PM2.5/PM10 ratio
    - Understand regional differences in air pollution characteristics
    - Provide insights into atmospheric particulate interactions
- How we answer this question:
    - Data Preparation:
        - Create a new column 'PM2.5/PM10' holding the ratio of PM2.5 to PM10
        - Group data by continent
        - Calculate Pearson correlation coefficient
    - Analysis Techniques:
        - Compute correlation between humidity and PM ratio for each continent
        - Assess statistical significance of correlations
        - Visualize correlation strengths
- Visualization Strategy:
    - Bar chart showing correlation coefficients by continent
    - Scatter plots with regression lines
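
As a rough sketch of the steps above, the ratio column and the per-continent Pearson coefficient can be computed with pandas alone. The `COUNTRY_TO_CONTINENT` mapping is a hypothetical, partial lookup, and the sample rows are made up so that the expected correlations are easy to verify by eye.

```python
import numpy as np
import pandas as pd

# Hypothetical partial mapping; the real one must cover every country in the data.
COUNTRY_TO_CONTINENT = {
    "Thailand": "Asia", "India": "Asia",
    "France": "Europe", "Germany": "Europe",
}

# Made-up rows that reuse the dataset's column names.
sample = pd.DataFrame({
    "Country": ["Thailand", "India", "Thailand", "India",
                "France", "Germany", "France", "Germany"],
    "Humidity": [20.0, 40.0, 60.0, 80.0, 20.0, 40.0, 60.0, 80.0],
    "PM2.5": [10.0, 20.0, 30.0, 40.0, 40.0, 30.0, 20.0, 10.0],
    "PM10": [100.0] * 8,
})

sample["Continent"] = sample["Country"].map(COUNTRY_TO_CONTINENT)

# Guard against division by zero: a PM10 reading of 0 yields NaN, not inf.
sample["PM2.5/PM10"] = sample["PM2.5"] / sample["PM10"].replace(0, np.nan)

# Pearson correlation between humidity and the ratio, one value per continent.
corr_by_continent = pd.Series({
    continent: group["Humidity"].corr(group["PM2.5/PM10"])
    for continent, group in sample.groupby("Continent")
})
print(corr_by_continent)
```

`Series.corr` uses the Pearson method by default and skips NaN pairs. To assess significance, `scipy.stats.pearsonr` returns a p-value alongside the coefficient, at the cost of adding SciPy as a dependency.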

**4. Which pollutants contribute the most to the overall air pollution level in each city?**
- Answering this question will:
    - Identify the dominant pollutants in each city
    - Support targeted environmental interventions
    - Help explain variations in urban air quality
- How we answer this question:
    - Data Analysis:
        - Calculate average levels for pollutants: PM2.5, PM10, NO2, SO2, CO, O3
        - Normalize pollutant concentrations
        - Rank pollutants by contribution to overall air pollution
    - Visualization Techniques:
        - Stacked bar charts for each city
        - Percentage contribution of each pollutant
        - Interactive visualizations for detailed exploration
- Visualization Strategy:
    - Stacked bar charts showing pollutant contributions
    - Radar/spider charts comparing pollutant profiles
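
One possible way to turn the analysis steps above into code is sketched below. The source does not say how concentrations are normalized, so this example makes an assumption: each pollutant's city average is divided by that pollutant's mean across all cities, which puts the six pollutants on a comparable, unitless scale before shares are computed. The sample values are made up.

```python
import pandas as pd

POLLUTANTS = ["PM2.5", "PM10", "NO2", "SO2", "CO", "O3"]

# Made-up rows that reuse the dataset's column names.
sample = pd.DataFrame({
    "City": ["Paris", "Paris", "Mumbai", "Mumbai"],
    "PM2.5": [20.0, 40.0, 100.0, 140.0],
    "PM10": [50.0, 50.0, 150.0, 150.0],
    "NO2": [60.0, 80.0, 30.0, 30.0],
    "SO2": [10.0, 10.0, 30.0, 30.0],
    "CO": [1.0, 1.0, 3.0, 3.0],
    "O3": [100.0, 100.0, 100.0, 100.0],
})

# Step 1: average level of each pollutant per city.
city_means = sample.groupby("City")[POLLUTANTS].mean()

# Step 2 (assumed normalization): divide by each pollutant's mean across cities.
normalized = city_means / city_means.mean()

# Step 3: percentage share of each pollutant within a city (rows sum to 100).
share = normalized.div(normalized.sum(axis=1), axis=0) * 100

# Step 4: the pollutant with the largest share in each city.
top_pollutant = share.idxmax(axis=1)
print(share.round(1))
print(top_pollutant)
```

The `share` table has one row per city and one column per pollutant, so `share.plot(kind="bar", stacked=True)` would produce the stacked bar chart described above. A different normalization, such as min-max scaling or dividing by a regulatory limit, could change the ranking, so the choice is worth stating alongside the results.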

**5. How does population density impact air quality?**
- Answering this question will:
    - Explore the relationship between urban concentration and pollution
    - Provide insights for urban planning and environmental policy
    - Show how human settlement patterns affect air quality
- How we answer this question:
    - Data Preparation:
        - Integrate population density data
        - Categorize cities into density groups:
            - Very Low: < 500 people/km²
            - Low: 500-1500 people/km²
            - Moderate: 1500-3000 people/km²
            - High: 3000-5000 people/km²
            - Very High: > 5000 people/km²
    - Analysis Approach:
        - Calculate average AQI for each population density category
        - Perform statistical tests to validate relationships
        - Create comparative visualizations

- Visualization Strategy:
    - Box plots showing AQI distribution across population density groups
    - Scatter plots with regression lines
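
The density grouping above maps naturally onto `pd.cut`. In the sketch below, the city names and density figures are placeholders rather than real data, and PM2.5 stands in for AQI. Because the listed ranges share their boundary values, the example assumes left-closed bins (`right=False`), so a city at exactly 500 people/km² falls into "Low" and one at exactly 5000 falls into "Very High".

```python
import pandas as pd

# Placeholder density table; real figures would come from an external source.
density = pd.DataFrame({
    "City": ["A", "B", "C", "D", "E"],
    "Density": [300, 1000, 2000, 4000, 8000],  # people/km², made up
})

# Made-up measurements that reuse the dataset's column names.
measurements = pd.DataFrame({
    "City": ["A", "A", "B", "C", "D", "E", "E"],
    "PM2.5": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 80.0],
})

# Attach each city's density to its measurements.
merged = measurements.merge(density, on="City", how="left")

# Assumption: left-closed bins, i.e. [0, 500), [500, 1500), ..., [5000, inf).
merged["Density Group"] = pd.cut(
    merged["Density"],
    bins=[0, 500, 1500, 3000, 5000, float("inf")],
    labels=["Very Low", "Low", "Moderate", "High", "Very High"],
    right=False,
)

# Average pollution level per density group.
group_means = merged.groupby("Density Group", observed=True)["PM2.5"].mean()
print(group_means)
```

Using `how="left"` keeps measurements for cities that are missing from the density table. Those rows get NaN for both the density and the group, which makes coverage gaps easy to spot.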


# 4. Preprocessing + analyzing data to answer each question

## 4.0. Import libraries and load datasets

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

```python
df = pd.read_csv('../data/global_air_quality_data_10000.csv')
df
```

| Unnamed: 0 | City | Country | Date | PM2.5 | PM10 | NO2 | SO2 | CO | O3 | Temperature | Humidity | Wind Speed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bangkok | Thailand | 2023-03-19 | 86.57 | 25.19 | 99.88 | 30.63 | 4.46 | 36.29 | 17.67 | 59.35 | 13.76 |
| 1 | Istanbul | Turkey | 2023-02-16 | 50.63 | 97.39 | 48.14 | 8.71 | 3.40 | 144.16 | 3.46 | 67.51 | 6.36 |
| 2 | Rio de Janeiro | Brazil | 2023-11-13 | 130.21 | 57.22 | 98.51 | 9.92 | 0.12 | 179.31 | 25.29 | 29.30 | 12.87 |
| 3 | Mumbai | India | 2023-03-16 | 119.70 | 130.52 | 10.96 | 33.03 | 7.74 | 38.65 | 23.15 | 99.97 | 7.71 |
| 4 | Paris | France | 2023-04-04 | 55.20 | 36.62 | 76.85 | 21.85 | 2.00 | 67.09 | 16.02 | 90.28 | 14.16 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | Johannesburg | South Africa | 2023-09-16 | 147.85 | 184.34 | 90.33 | 34.93 | 2.81 | 191.45 | -1.92 | 65.22 | 15.48 |
| 9996 | Berlin | Germany | 2023-12-05 | 12.22 | 121.49 | 49.04 | 5.66 | 2.10 | 184.56 | -9.81 | 12.16 | 10.75 |
| 9997 | Moscow | Russia | 2023-11-26 | 44.07 | 143.62 | 8.41 | 32.58 | 0.69 | 167.68 | 39.35 | 53.95 | 4.56 |
| 9998 | Berlin | Germany | 2023-02-03 | 67.43 | 96.79 | 43.23 | 29.19 | 6.01 | 148.50 | 26.21 | 58.54 | 2.71 |


## 4.1. How does air quality differ across various seasons?

*Preprocessing steps:*

*Analysis steps:*

## 4.2. What are the levels of pollution during different weather conditions?


*Preprocessing steps:*

*Analysis steps:*

## 4.3. What is the correlation between humidity and the ratio of PM2.5 to PM10 in different continents?


*Preprocessing steps:*

*Analysis steps:*

## 4.4. Which pollutants contribute the most to the overall air pollution level in each city?

*Preprocessing steps:*

*Analysis steps:*

## 4.5. How does population density impact air quality?

*Preprocessing steps:*

*Analysis steps:*