### Project 3

In [1]:
import plotly.io as pio

pio.renderers.default = "vscode+jupyterlab+notebook_connected"

# Air Pollution Around the World: How Income Levels and Renewable Energy Affect Air Quality

### Datasets

For this project, I used data from the WHO, the World Bank, and Our World in Data to study air pollution levels, measured by PM 2.5 concentration, in different countries. I looked at how air quality relates to GNI per capita (used as an indicator of income) and whether renewable energy use is associated with air pollution. My goal was to compare air quality in countries with different income levels and see whether countries that use more renewable energy have lower air pollution.

1. Air pollution data from [World Health Organization](https://www.who.int/data/gho/data/indicators/indicator-details/GHO/concentrations-of-fine-particulate-matter-(pm2-5))
2. GNI per capita data from [World Bank](https://data.worldbank.org/indicator/NY.GNP.PCAP.PP.CD?locations=US)
3. Renewable Energy Use data from [OurWorldinData.com](https://ourworldindata.org/grapher/renewable-share-energy.csv?v=1&csvType=full&useColumnShortNames=true)


### Analysis Questions

1. How many countries around the world have successfully met the safe air pollution level recommended by the [WHO](https://www.who.int/news-room/feature-stories/detail/what-are-the-who-air-quality-guidelines#:~:text=New%20research%20has%20also%20shown,version%20was%20published%20in%202005.) (annual PM2.5 AQG level of 5 µg/m³)?

2. Do lower-income countries have higher levels of air pollution compared to upper-middle-income countries?

3. Is there a correlation between air pollution and adoption of renewable energy across countries?


### Hypothesis

I hypothesize that the majority of countries around the world have not yet achieved the safe air pollution level recommended by the WHO, which sets an annual PM2.5 AQG level of 5 µg/m³. I expect that air pollution levels will vary significantly across income groups, with lower-income countries experiencing higher levels of air pollution compared to upper-middle-income and high-income countries. Additionally, I predict that lower levels of air pollution are likely to be linked to higher adoption rates of renewable energy, as cleaner energy sources help to reduce particulate matter emissions.

### Does the majority of the world meet the safe air quality level set by the WHO? (Hypothesis 1)

First, I started by importing the pandas library. I loaded the air pollution dataset using `pd.read_csv` and displayed it to check if everything loaded correctly. Then, I used `.info()` to look at the structure of the dataset. I noticed that the `"Value"` column, which contains air pollution levels (PM 2.5), was stored as text instead of numbers, so I had to fix that later. I also used the `.unique()` function to check the years available in the `"Period"` column. This helped me figure out that the dataset only includes data from 2010 to 2019.

In [2]:
import pandas as pd

air_pollution = pd.read_csv("Air pollution.csv")
air_pollution

Unnamed: 0,IndicatorCode,Indicator,ValueType,ParentLocationCode,ParentLocation,Location type,SpatialDimValueCode,Location,Period type,Period,...,FactValueUoM,FactValueNumericLowPrefix,FactValueNumericLow,FactValueNumericHighPrefix,FactValueNumericHigh,Value,FactValueTranslationID,FactComments,Language,DateModified
0,SDGPM25,Concentrations of fine particulate matter (PM2.5),text,AFR,Africa,Country,KEN,Kenya,Year,2019,...,,,6.29,,13.74,10.01 [6.29-13.74],,,EN,2022-08-12T04:00:00.000Z
1,SDGPM25,Concentrations of fine particulate matter (PM2.5),text,AMR,Americas,Country,TTO,Trinidad and Tobago,Year,2019,...,,,7.44,,12.55,10.02 [7.44-12.55],,,EN,2022-08-12T04:00:00.000Z
2,SDGPM25,Concentrations of fine particulate matter (PM2.5),text,EUR,Europe,Country,GBR,United Kingdom of Great Britain and Northern I...,Year,2019,...,,,9.73,,10.39,10.06 [9.73-10.39],,,EN,2022-08-12T04:00:00.000Z
3,SDGPM25,Concentrations of fine particulate matter (PM2.5),text,AMR,Americas,Country,GRD,Grenada,Year,2019,...,,,7.07,,13.20,10.08 [7.07-13.20],,,EN,2022-08-12T04:00:00.000Z
4,SDGPM25,Concentrations of fine particulate matter (PM2.5),text,AMR,Americas,Country,BRA,Brazil,Year,2019,...,,,8.23,,12.46,10.09 [8.23-12.46],,,EN,2022-08-12T04:00:00.000Z
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9445,SDGPM25,Concentrations of fine particulate matter (PM2.5),text,AMR,Americas,Country,BLZ,Belize,Year,2010,...,,,3.91,,20.28,9.92 [3.91-20.28],,,EN,2022-08-12T04:00:00.000Z
9446,SDGPM25,Concentrations of fine particulate matter (PM2.5),text,AMR,Americas,Country,TTO,Trinidad and Tobago,Year,2010,...,,,7.80,,12.89,9.92 [7.80-12.89],,,EN,2022-08-12T04:00:00.000Z
9447,SDGPM25,Concentrations of fine particulate matter (PM2.5),text,AFR,Africa,Country,KEN,Kenya,Year,2010,...,,,6.30,,13.57,9.94 [6.30-13.57],,,EN,2022-08-12T04:00:00.000Z
9448,SDGPM25,Concentrations of fine particulate matter (PM2.5),text,AMR,Americas,Country,USA,United States of America,Year,2010,...,,,9.78,,10.11,9.95 [9.78-10.11],,,EN,2022-08-12T04:00:00.000Z


In [3]:
air_pollution.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9450 entries, 0 to 9449
Data columns (total 34 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   IndicatorCode               9450 non-null   object 
 1   Indicator                   9450 non-null   object 
 2   ValueType                   9450 non-null   object 
 3   ParentLocationCode          9450 non-null   object 
 4   ParentLocation              9450 non-null   object 
 5   Location type               9450 non-null   object 
 6   SpatialDimValueCode         9450 non-null   object 
 7   Location                    9450 non-null   object 
 8   Period type                 9450 non-null   object 
 9   Period                      9450 non-null   int64  
 10  IsLatestYear                9450 non-null   bool   
 11  Dim1 type                   9450 non-null   object 
 12  Dim1                        9450 non-null   object 
 13  Dim1ValueCode               9450 

In [4]:
air_pollution["Period"].unique()

array([2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010])

Next, I decided to focus on the data for 2019, since it’s the most recent year in the dataset. I filtered the rows where `"Dim1"` was `"Total"` because that represented the overall air pollution levels for each country. Then, I kept just the `"Location"` (countries) and `"Value"` columns since those were the ones I needed for my analysis. The `"Value"` column also contained the confidence interval in brackets, which I did not need, so I took the first five characters (the air pollution value with two decimal places) and converted them into a float to make the column easier to work with.

In [5]:
air_pollution_2019 = air_pollution[
    (air_pollution["Period"] == 2019) & (air_pollution["Dim1"] == "Total")][["Location", "Value"]]
air_pollution_2019["Value"] = air_pollution_2019["Value"].str[:5]
air_pollution_2019["Value"] = pd.to_numeric(air_pollution_2019["Value"])
air_pollution_2019

Unnamed: 0,Location,Value
3,Grenada,10.08
10,Trinidad and Tobago,10.26
14,Lithuania,10.37
22,France,10.46
23,Mauritius,10.48
...,...,...
919,United Kingdom of Great Britain and Northern I...,9.52
922,Denmark,9.66
925,Haiti,9.69
930,Barbados,9.79
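
Slicing the first five characters works for the values shown here, but it would cut off any estimate of 100 or more. As a minimal sketch, assuming every `"Value"` string starts with the point estimate followed by a bracketed interval, the number could instead be pulled out with a regular expression. The helper name `parse_point_estimate` and the third sample string are illustrative and not part of the original notebook.

```python
import pandas as pd


def parse_point_estimate(values: pd.Series) -> pd.Series:
    """Extract the leading number from strings like '10.01 [6.29-13.74]'."""
    # Capture the digits (with an optional decimal part) at the start of the string.
    extracted = values.str.extract(r"^\s*(\d+(?:\.\d+)?)", expand=False)
    # Anything that does not match becomes NaN instead of raising an error.
    return pd.to_numeric(extracted, errors="coerce")


# The first two strings come from the dataset preview; the third is a made-up
# three-digit value that a fixed five-character slice would truncate.
sample = pd.Series(["10.01 [6.29-13.74]", "9.92 [3.91-20.28]", "101.30 [90.00-110.00]"])
parsed = parse_point_estimate(sample)
print(parsed)
```

In the notebook this would be applied as `air_pollution_2019["Value"] = parse_point_estimate(air_pollution_2019["Value"])`, replacing the slice and the separate `pd.to_numeric` call.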


In [6]:
air_pollution_2019.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 195 entries, 3 to 936
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Location  195 non-null    object 
 1   Value     195 non-null    float64
dtypes: float64(1), object(1)
memory usage: 4.6+ KB


I noticed that some countries appeared multiple times in the data, so I grouped the rows by `"Location"` and calculated the average PM 2.5 level for each country, rounding the results to two decimal places to make the numbers easier to read. Then, I sorted the data in ascending order to see which country had the lowest PM 2.5 level. Finally, to make the DataFrame more understandable, I renamed the columns: `"Location"` became `"Country Name"` and `"Value"` became `"Air Pollution"`. It turned out that in 2019, the Bahamas had the cleanest air, with a PM 2.5 value of 5.20 µg/m³, while Kuwait had the worst, with a PM 2.5 value of 64.08 µg/m³.

In [7]:
air_pollution_2019 = air_pollution_2019.groupby("Location", as_index=False).agg({"Value": "mean"}).round(2)
air_pollution_2019

Unnamed: 0,Location,Value
0,Afghanistan,62.49
1,Albania,16.28
2,Algeria,22.68
3,Andorra,8.52
4,Angola,27.16
...,...,...
190,Viet Nam,20.89
191,Yemen,41.61
192,Zambia,16.90
193,Zimbabwe,13.08
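
One way to confirm which locations are repeated before averaging them is to count the rows per location. The sketch below uses a small illustrative frame rather than the real data, and the second Kenya value is made up purely to create a duplicate.

```python
import pandas as pd

# Illustrative rows only; the second Kenya value is invented to create a duplicate.
sample = pd.DataFrame({
    "Location": ["Kenya", "Kenya", "Brazil", "Grenada"],
    "Value": [10.01, 10.21, 10.09, 10.08],
})

# Count how many rows each location contributes and keep those with more than one.
rows_per_location = sample["Location"].value_counts()
repeated = rows_per_location[rows_per_location > 1]
print(repeated)

# Averaging per location collapses the repeats into a single row per country.
per_country = sample.groupby("Location", as_index=False)["Value"].mean().round(2)
print(per_country)
```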


In [9]:
air_pollution_2019.sort_values("Value", ascending=True, inplace=True)
air_pollution_2019

Unnamed: 0,Location,Value
11,Bahamas,5.20
61,Finland,5.47
77,Iceland,5.79
167,Sweden,5.96
128,Norway,6.30
...,...,...
151,Saudi Arabia,57.16
139,Qatar,59.04
0,Afghanistan,62.49
53,Egypt,63.16


In [10]:
air_pollution_2019 = air_pollution_2019.rename(columns={"Location": "Country Name","Value": "Air Pollution"})
air_pollution_2019

Unnamed: 0,Country Name,Air Pollution
11,Bahamas,5.20
61,Finland,5.47
77,Iceland,5.79
167,Sweden,5.96
128,Norway,6.30
...,...,...
151,Saudi Arabia,57.16
139,Qatar,59.04
0,Afghanistan,62.49
53,Egypt,63.16


To get a better idea of how air pollution levels looked overall, I created a boxplot using Plotly. This chart showed the distribution of PM 2.5 levels across all the countries. I added a red dashed line to represent the WHO’s recommended safe air quality level of 5 µg/m³. The chart made it really clear that every country in the dataset was above this threshold, which was surprising but also really important to highlight.

In [11]:
import plotly.express as px

fig = px.box(
    air_pollution_2019,
    y='Air Pollution',
    title='Air Pollution Levels Across Countries in 2019',
    labels={'Air Pollution': 'Air Pollution (µg/m³)'},
    color_discrete_sequence=['blue']
)
fig.add_hline(y=5, line_color="red", annotation_text="WHO Threshold (5 µg/m³)", line_dash="dash")
fig.show()
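
The first analysis question asks for a count, so the visual check can be backed up with a direct comparison against the guideline. This is a minimal sketch: the constant `WHO_PM25_GUIDELINE` and the helper `count_meeting_guideline` are names introduced here for illustration, and the sample rows are a few values copied from the sorted table above.

```python
import pandas as pd

WHO_PM25_GUIDELINE = 5.0  # annual PM2.5 guideline level in µg/m³


def count_meeting_guideline(df, column="Air Pollution", threshold=WHO_PM25_GUIDELINE):
    """Return (number of rows at or below the threshold, total number of rows)."""
    meets = df[column] <= threshold
    return int(meets.sum()), len(df)


# A few rows copied from the sorted 2019 table, not the full dataset.
sample = pd.DataFrame({
    "Country Name": ["Bahamas", "Finland", "Iceland", "Egypt"],
    "Air Pollution": [5.20, 5.47, 5.79, 63.16],
})

n_meeting, n_total = count_meeting_guideline(sample)
print(f"{n_meeting} of {n_total} countries are at or below {WHO_PM25_GUIDELINE} µg/m³")
```

Calling `count_meeting_guideline(air_pollution_2019)` in the notebook would give the count for all countries in the dataset.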

Lastly, I wanted to see how air pollution looked across the world, so I created a choropleth map. This map used colors to show the average PM 2.5 levels for each country. I used the `"Plasma"` color scale, which made it easy to spot areas with higher pollution (in brighter colors) and lower pollution (in darker colors). This map made it really easy to see which regions were struggling with air quality and added a visual element to my analysis.

In [12]:
fig = px.choropleth(
    air_pollution_2019,
    locations="Country Name",
    locationmode="country names",
    color="Air Pollution",
    color_continuous_scale="Plasma",
    title="Air Pollution Levels by Country (µg/m³) in 2019",
    labels={"Air Pollution": "Air Pollution (µg/m³)"},
)

fig.show()

### Do lower-income countries tend to experience higher levels of air pollution compared to upper-middle-income countries? (Hypothesis 2)

Now, shifting the focus to the second hypothesis, I will analyze whether lower-income countries tend to experience higher levels of air pollution compared to upper-middle-income countries. First, I started by reading the GNI per capita data from the World Bank into the notebook. This dataset contains information about the GNI per capita of each country from 1960 to 2023. After loading it, I checked the dataset to make sure it was imported correctly and to see the column structure. Then, I used `.info()` to check the data types of each column. Thankfully, the GNI per capita values were already stored as floats, which meant I didn’t need to convert anything.

In [13]:
GNIpercapita = pd.read_csv("GNI per capita.csv")
GNIpercapita

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
0,Aruba,ABW,"GNI per capita, PPP (current international $)",NY.GNP.PCAP.PP.CD,,,,,,,...,35410.000000,35660.00000,35750.000000,36560.000000,37730.000000,38310.000000,29630.000000,37070.000000,43670.000000,
1,Africa Eastern and Southern,AFE,"GNI per capita, PPP (current international $)",NY.GNP.PCAP.PP.CD,,,,,,,...,3351.916096,3388.75599,3502.798136,3531.130759,3584.229869,3668.187802,3575.826645,3863.004386,4180.350280,4355.819919
2,Afghanistan,AFG,"GNI per capita, PPP (current international $)",NY.GNP.PCAP.PP.CD,,,,,,,...,2240.000000,2300.00000,2250.000000,2360.000000,2470.000000,2630.000000,2590.000000,2150.000000,2100.000000,
3,Africa Western and Central,AFW,"GNI per capita, PPP (current international $)",NY.GNP.PCAP.PP.CD,,,,,,,...,4022.993044,4014.43247,3979.757115,4015.580187,4140.938693,4398.642528,4421.156379,4613.383747,4979.901737,5239.787316
4,Angola,AGO,"GNI per capita, PPP (current international $)",NY.GNP.PCAP.PP.CD,,,,,,,...,7510.000000,6760.00000,6490.000000,6570.000000,6790.000000,6850.000000,5880.000000,6810.000000,7320.000000,7310.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
261,Kosovo,XKX,"GNI per capita, PPP (current international $)",NY.GNP.PCAP.PP.CD,,,,,,,...,8380.000000,8860.00000,9270.000000,9570.000000,10190.000000,11110.000000,10790.000000,12600.000000,14120.000000,15310.000000
262,"Yemen, Rep.",YEM,"GNI per capita, PPP (current international $)",NY.GNP.PCAP.PP.CD,,,,,,,...,,,,,,,,,,
263,South Africa,ZAF,"GNI per capita, PPP (current international $)",NY.GNP.PCAP.PP.CD,,,,,,,...,13230.000000,13290.00000,13370.000000,13590.000000,13260.000000,13360.000000,12850.000000,13900.000000,15010.000000,15630.000000
264,Zambia,ZMB,"GNI per capita, PPP (current international $)",NY.GNP.PCAP.PP.CD,,,,,,,...,3410.000000,3300.00000,3240.000000,3270.000000,3420.000000,3330.000000,3090.000000,3220.000000,3650.000000,3960.000000


In [14]:
GNIpercapita.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 266 entries, 0 to 265
Data columns (total 68 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country Name    266 non-null    object 
 1   Country Code    266 non-null    object 
 2   Indicator Name  266 non-null    object 
 3   Indicator Code  266 non-null    object 
 4   1960            0 non-null      float64
 5   1961            0 non-null      float64
 6   1962            0 non-null      float64
 7   1963            0 non-null      float64
 8   1964            0 non-null      float64
 9   1965            0 non-null      float64
 10  1966            0 non-null      float64
 11  1967            0 non-null      float64
 12  1968            0 non-null      float64
 13  1969            0 non-null      float64
 14  1970            0 non-null      float64
 15  1971            0 non-null      float64
 16  1972            0 non-null      float64
 17  1973            0 non-null      flo

Next, I narrowed the dataset to only include the columns I needed: `"Country Name"`, `"Country Code"`, and `"2019"`. I chose the 2019 values because they match the year of the air pollution dataset I analyzed earlier. This made the comparison between air pollution and GNI per capita more consistent.

In [15]:
GNIpercapita = GNIpercapita[["Country Name","Country Code","2019"]]
GNIpercapita

Unnamed: 0,Country Name,Country Code,2019
0,Aruba,ABW,38310.000000
1,Africa Eastern and Southern,AFE,3668.187802
2,Afghanistan,AFG,2630.000000
3,Africa Western and Central,AFW,4398.642528
4,Angola,AGO,6850.000000
...,...,...,...
261,Kosovo,XKX,11110.000000
262,"Yemen, Rep.",YEM,
263,South Africa,ZAF,13360.000000
264,Zambia,ZMB,3330.000000


After that, I classified countries into income groups based on the [World Bank](https://blogs.worldbank.org/en/opendata/world-bank-country-classifications-by-income-level-for-2024-2025)’s income classification. I used the income thresholds provided by the World Bank and created four categories: Low-income, Lower-middle-income, Upper-middle-income, and High-income. Using the `pd.cut()` function, I assigned each country to one of these categories based on its GNI per capita in 2019. This made it easier to compare air pollution levels across income groups. One caveat is that the World Bank defines these thresholds for GNI per capita calculated with the Atlas method, whereas the dataset I used reports PPP values, so the groups here only approximate the official classification.

![Image Description](https://s7d1.scene7.com/is/image/wbcollab/FY25-thresholds-v4?qlt=90&fmt=webp&resMode=sharp2)



In [16]:
bins = [0, 1145, 4515, 14005, float("inf")]
labels = ["Low-income", "Lower-middle-income", "Upper-middle-income", "High-income"]

# Work on an explicit copy so the new column is not assigned to a slice of the original DataFrame
GNIpercapita = GNIpercapita.copy()
GNIpercapita["Income Group"] = pd.cut(GNIpercapita["2019"], bins=bins, labels=labels, right=True)
GNIpercapita


Unnamed: 0,Country Name,Country Code,2019,Income Group
0,Aruba,ABW,38310.000000,High-income
1,Africa Eastern and Southern,AFE,3668.187802,Lower-middle-income
2,Afghanistan,AFG,2630.000000,Lower-middle-income
3,Africa Western and Central,AFW,4398.642528,Lower-middle-income
4,Angola,AGO,6850.000000,Upper-middle-income
...,...,...,...,...
261,Kosovo,XKX,11110.000000,Upper-middle-income
262,"Yemen, Rep.",YEM,,
263,South Africa,ZAF,13360.000000,Upper-middle-income
264,Zambia,ZMB,3330.000000,Lower-middle-income
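
To see how `pd.cut()` treats values that sit exactly on a threshold, the same bins can be applied to a handful of boundary cases. This is only a sketch with made-up GNI values chosen to land on or just past each cutoff.

```python
import pandas as pd

bins = [0, 1145, 4515, 14005, float("inf")]
labels = ["Low-income", "Lower-middle-income", "Upper-middle-income", "High-income"]

# Made-up values on and just above each threshold, plus one missing value.
sample = pd.Series([1145, 1146, 4515, 14005, 14006, None], dtype="float64")

# right=True makes each bin include its upper edge, e.g. (0, 1145] is Low-income.
groups = pd.cut(sample, bins=bins, labels=labels, right=True)
print(pd.DataFrame({"GNI per capita": sample, "Income Group": groups}))
```

With `right=True`, a value equal to a threshold stays in the lower group, and missing GNI values stay missing, which is why a row such as `"Yemen, Rep."` has no income group. A value of exactly 0 would also be left unclassified unless `include_lowest=True` is passed.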


Then, I merged the GNI per capita data with the air pollution data for 2019. By combining the two datasets, I could analyze how air pollution levels varied by income group. I rounded the GNI per capita values to two decimal places to make the results easier to read.

In [17]:
merged_hip1 = pd.merge(GNIpercapita, air_pollution_2019, on="Country Name", how="inner")
merged_hip1["2019"] = merged_hip1["2019"].round(2)  # assign the result back; round() does not modify in place
merged_hip1

Unnamed: 0,Country Name,Country Code,2019,Income Group,Air Pollution
0,Afghanistan,AFG,2630.0,Lower-middle-income,62.49
1,Angola,AGO,6850.0,Upper-middle-income,27.16
2,Albania,ALB,14430.0,High-income,16.28
3,Andorra,AND,63180.0,High-income,8.52
4,United Arab Emirates,ARE,79230.0,High-income,41.75
...,...,...,...,...,...
163,Vanuatu,VUT,3640.0,Lower-middle-income,8.42
164,Samoa,WSM,6320.0,Upper-middle-income,7.78
165,South Africa,ZAF,13360.0,Upper-middle-income,19.75
166,Zambia,ZMB,3330.0,Lower-middle-income,16.90
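
An inner merge on country names silently drops any country whose name is spelled differently in the two sources, for example `"Yemen, Rep."` in the World Bank data versus `"Yemen"` in the WHO data. A possible way to list such cases, sketched here on a three-row sample taken from the tables above, is an outer merge with `indicator=True`.

```python
import pandas as pd

# Small samples built from rows shown earlier in the notebook.
gni = pd.DataFrame({
    "Country Name": ["Zambia", "South Africa", "Yemen, Rep."],
    "2019": [3330.0, 13360.0, None],
})
pollution = pd.DataFrame({
    "Country Name": ["Zambia", "South Africa", "Yemen"],
    "Air Pollution": [16.90, 19.75, 41.61],
})

# indicator=True adds a "_merge" column saying which side each row came from.
check = gni.merge(pollution, on="Country Name", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Country Name", "_merge"]])
```

The WHO table's `"SpatialDimValueCode"` column appears to hold the same kind of three-letter codes as the World Bank's `"Country Code"` (for example `KEN` and `BRA`), so merging on those codes might avoid the naming problem, assuming the two code lists really do line up.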


Next, I calculated the average air pollution for each income group. This step helped me identify patterns and differences in air pollution levels across the four income groups. Again, I rounded the averages to two decimal places for simplicity.

In [18]:
avg_pollution_2019 = merged_hip1.groupby("Income Group")["Air Pollution"].mean().reset_index().round(2)
avg_pollution_2019

Unnamed: 0,Income Group,Air Pollution
0,Low-income,27.6
1,Lower-middle-income,28.72
2,Upper-middle-income,23.92
3,High-income,16.96
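
Because a group mean can be pulled upward by a few very polluted countries, it may help to look at the group size and median alongside it. The sketch below uses only nine rows copied from the merged table above, so its numbers illustrate the method and are not the notebook's results.

```python
import pandas as pd

# Nine rows copied from the merged table; not the full set of countries.
sample = pd.DataFrame({
    "Country Name": [
        "Afghanistan", "Vanuatu", "Zambia",
        "Angola", "Samoa", "South Africa",
        "Albania", "Andorra", "United Arab Emirates",
    ],
    "Income Group": [
        "Lower-middle-income", "Lower-middle-income", "Lower-middle-income",
        "Upper-middle-income", "Upper-middle-income", "Upper-middle-income",
        "High-income", "High-income", "High-income",
    ],
    "Air Pollution": [62.49, 8.42, 16.90, 27.16, 7.78, 19.75, 16.28, 8.52, 41.75],
})

# Count, mean, and median per group show how much outliers drive the average.
summary = (
    sample.groupby("Income Group")["Air Pollution"]
    .agg(["count", "mean", "median"])
    .round(2)
)
print(summary)
```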


Finally, I created a bar chart using Plotly to visualize the results.

In [19]:
fig = px.bar(
    avg_pollution_2019,
    x="Income Group",
    y="Air Pollution",
    title="Average Air Pollution by Income Group in 2019",
    labels={"Air Pollution": "Air Pollution (µg/m³)", "Income Group": "Income Group"},
    color="Income Group",
    text="Air Pollution",
)

fig.show()

The chart shows the average air pollution (PM 2.5 levels) for each income group in 2019. Lower-middle-income countries had the highest average air pollution level (28.72 µg/m³), closely followed by low-income countries (27.6 µg/m³). Upper-middle-income countries (23.92 µg/m³) had a somewhat lower average, and high-income countries (16.96 µg/m³) had the lowest by a clear margin. Apart from the small difference between the two lowest income groups, the pattern suggests that as income levels increase, average air pollution tends to decrease.

### Is the lower air pollution linked to the adoption of renewable energy sources? (Hypothesis 3)

Next, I wanted to find out whether lower air pollution in high-income and upper-middle-income countries is related to greater adoption of renewable energy. To explore this, I started by reading the renewable energy data from OurWorldinData.com. This dataset includes a column for the country name (`"Entity"`), a `"Year"` column, and the percentage of energy coming from renewable sources (`"Renewables (% equivalent primary energy)"`).

In [20]:
renew_energy = pd.read_csv("renewable-share-energy.csv")
renew_energy

Unnamed: 0,Entity,Code,Year,Renewables (% equivalent primary energy)
0,Africa,,1965,5.740281
1,Africa,,1966,6.113969
2,Africa,,1967,6.316580
3,Africa,,1968,6.994845
4,Africa,,1969,7.943916
...,...,...,...,...
4898,World,OWID_WRL,2019,12.228147
4899,World,OWID_WRL,2020,13.404395
4900,World,OWID_WRL,2021,13.469198
4901,World,OWID_WRL,2022,14.119935


After loading the data, I checked its structure using `.info()` to make sure all the data types were correct. The `"Renewables (% equivalent primary energy)"` column was already numeric, so I didn’t need to convert it. However, the values had more than two decimal places, so I rounded them to two decimal places to make them easier to read.

In [21]:
renew_energy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4903 entries, 0 to 4902
Data columns (total 4 columns):
 #   Column                                    Non-Null Count  Dtype  
---  ------                                    --------------  -----  
 0   Entity                                    4903 non-null   object 
 1   Code                                      3553 non-null   object 
 2   Year                                      4903 non-null   int64  
 3   Renewables (% equivalent primary energy)  4903 non-null   float64
dtypes: float64(1), int64(1), object(2)
memory usage: 153.3+ KB


In [22]:
renew_energy["Renewables (% equivalent primary energy)"] = renew_energy["Renewables (% equivalent primary energy)"].round(2)
renew_energy

Unnamed: 0,Entity,Code,Year,Renewables (% equivalent primary energy)
0,Africa,,1965,5.74
1,Africa,,1966,6.11
2,Africa,,1967,6.32
3,Africa,,1968,6.99
4,Africa,,1969,7.94
...,...,...,...,...
4898,World,OWID_WRL,2019,12.23
4899,World,OWID_WRL,2020,13.40
4900,World,OWID_WRL,2021,13.47
4901,World,OWID_WRL,2022,14.12


I then narrowed down the dataset to only include data from 2019, since this matches the year of the air pollution data I worked with earlier. I also selected just the columns I needed: `"Entity"`, which contains the country names, and `"Renewables (% equivalent primary energy)"`, which represents the share of renewable energy each country uses.

In [23]:
renew_energy = renew_energy[renew_energy["Year"] == 2019][["Entity","Renewables (% equivalent primary energy)"]]
renew_energy

Unnamed: 0,Entity,Renewables (% equivalent primary energy)
54,Africa,8.36
113,Africa (EI),8.36
172,Algeria,0.29
231,Argentina,11.91
290,Asia,10.00
...,...,...
4678,Uzbekistan,3.15
4737,Venezuela,25.22
4780,Vietnam,15.80
4839,Western Africa (EI),8.16


Next, I merged the renewable energy data with the 2019 air pollution data, using the country names as the key to combine the datasets. This allowed me to compare air pollution levels with renewable energy adoption for each country. After merging, I created a scatter plot to visualize the relationship between air pollution (PM 2.5 levels) and renewable energy usage. To make the trend easier to see, I added a trendline to the chart.

In [24]:
merged_hip3 = pd.merge(air_pollution_2019, renew_energy, left_on="Country Name", right_on = "Entity", how="inner")
merged_hip3

Unnamed: 0,Country Name,Air Pollution,Entity,Renewables (% equivalent primary energy)
0,Finland,5.47,Finland,29.33
1,Sweden,5.96,Sweden,44.81
2,Norway,6.3,Norway,67.99
3,Estonia,6.35,Estonia,10.69
4,Canada,6.39,Canada,28.55
5,Portugal,7.34,Portugal,26.37
6,Ireland,8.2,Ireland,18.01
7,New Zealand,8.61,New Zealand,36.8
8,Luxembourg,8.89,Luxembourg,7.92
9,Australia,8.93,Australia,8.88


In [25]:
fig = px.scatter(
    merged_hip3,
    x = "Air Pollution",
    y = "Renewables (% equivalent primary energy)",
    hover_name="Country Name",
    title="Relationship between Air Pollution and Renewable Energy Usage in 2019",
    labels={"Air Pollution":"Air Pollution (µg/m³)","Renewables (% equivalent primary energy)":"Renewable Energy (%)"},
    trendline="ols"
)

fig.show()

The scatter plot revealed a *negative correlation* between the two variables. This means that countries with a higher share of renewable energy tend to have lower levels of air pollution, although a correlation on its own does not show that renewable energy is the cause.
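
A trendline shows the direction of a relationship but not its strength, so a correlation coefficient is a useful companion to the plot. The sketch below computes Pearson's r and a rank-based (Spearman) coefficient using only the ten rows visible in the merged table above, so the result describes that small sample and not the full merged dataset.

```python
import pandas as pd

RENEW_COL = "Renewables (% equivalent primary energy)"

# The ten rows shown in the merged table; the full dataset has more countries.
sample = pd.DataFrame({
    "Country Name": [
        "Finland", "Sweden", "Norway", "Estonia", "Canada",
        "Portugal", "Ireland", "New Zealand", "Luxembourg", "Australia",
    ],
    "Air Pollution": [5.47, 5.96, 6.3, 6.35, 6.39, 7.34, 8.2, 8.61, 8.89, 8.93],
    RENEW_COL: [29.33, 44.81, 67.99, 10.69, 28.55, 26.37, 18.01, 36.8, 7.92, 8.88],
})

# Pearson's r measures the strength of the linear relationship.
pearson_r = sample["Air Pollution"].corr(sample[RENEW_COL])

# Spearman's rho is Pearson's r computed on the ranks, which is less
# sensitive to outliers and to non-linear but monotonic trends.
spearman_rho = sample["Air Pollution"].rank().corr(sample[RENEW_COL].rank())

print(f"Pearson r: {pearson_r:.2f}")
print(f"Spearman rho: {spearman_rho:.2f}")
```

Running the same two lines on `merged_hip3` would give the coefficients for every country in the merged data.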

### Analysis and Conclusion

This project gave me a better understanding of air pollution and how it connects to income levels and renewable energy use around the world. The first thing I noticed was that every country in the dataset exceeded the WHO’s air quality guideline for PM 2.5, which is set at an annual level of 5 µg/m³; even the lowest value, 5.20 µg/m³ in the Bahamas, was above it. **This finding confirmed my first hypothesis that the majority of countries around the world have not yet achieved the safe air pollution level recommended by the WHO**. It highlights the global challenge of addressing air pollution and the urgent need for stronger environmental policies and international collaboration.

Next, I looked at how air pollution levels differ between countries with different income levels. Lower-middle-income countries had the highest average PM 2.5 levels (28.72 µg/m³), closely followed by low-income countries (27.6 µg/m³). On the other hand, upper-middle-income countries (23.92 µg/m³) and especially high-income countries (16.96 µg/m³) had lower pollution levels. **This largely supported my second hypothesis that lower-income countries tend to have higher pollution levels than wealthier ones**, although the low-income group averaged slightly below the lower-middle-income group. It may suggest that wealthier countries have better systems, policies, and resources to control air pollution, while lower-income countries face more challenges in doing so.

I also found that renewable energy use is associated with lower air pollution. The scatter plot showed a negative correlation, meaning that countries using more renewable energy tend to have lower PM 2.5 levels. **This supported my third hypothesis that higher renewable energy use is linked to lower air pollution levels**. Although a correlation alone cannot prove cause and effect, it is consistent with the idea that renewable energy helps improve air quality and reduce harmful emissions.

That said, my analysis had some limitations. For example, I only used data from 2019 because that was the most recent year available. This meant I couldn’t look at trends over time, which could give a clearer picture of how air pollution changes. Also, I focused only on PM 2.5 levels, even though other pollutants also contribute to poor air quality.

Overall, this project helped me see how air pollution connects to income levels and renewable energy. It showed that wealthier countries and those with more renewable energy use tend to have better air quality. At the same time, it highlights the need to support low-income countries in adopting cleaner energy solutions to improve their air quality.

### URL