# FP-5 Main Report

**Topic:** Income, emissions and the composition of electricity generation  
**Data:** Our World in Data – Energy; World Bank income classification (FY23)

This notebook summarises two main findings from the project:

1. The relationship between greenhouse gas (GHG) emissions and fossil-fuel electricity share across income groups.
2. Differences in renewable electricity share across World Bank income groups.


In [1]:
import parse_data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.style.use('seaborn-v0_8')


finaldata = parse_data.load_data()
finaldata.head()

| Unnamed: 0 | Country | Year | ISO | Population | Electricity demand | GHG emissions | FF electricity share | RE electricity share | Income Group FY23 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Albania | 2021 | ALB | 2849591.0 | 8.39 | 0.21 | 0.0 | 100.0 | UM |
| 2 | Algeria | 2021 | DZA | 44761051.0 | 84.45 | 54.23 | 99.054 | 0.946 | LM |
| 3 | American Samoa | 2021 | ASM | 49202.0 | 0.17 | 0.11 | 100.0 | 0.0 | UM |
| 4 | Angola | 2021 | AGO | 34532382.0 | 16.85 | 2.91 | 24.57 | 75.43 | LM |
| 5 | Antigua and Barbuda | 2021 | ATG | 92316.0 | 0.35 | 0.22 | 94.286 | 5.714 | H |


## Result 1: GHG emissions and fossil-fuel electricity share

In [None]:
# Take an explicit copy so the columns added below never touch finaldata
df = finaldata[['FF electricity share', 'GHG emissions', 'Income Group FY23']].dropna().copy()

# Income clusters
low_group  = ['L', 'LM']
high_group = ['UM', 'H']

df['income_cluster'] = df['Income Group FY23'].apply(
    lambda x: 'Low/LM' if x in low_group else 'UM/H'
)

# log-transform GHG
df['GHG_log'] = np.log10(df['GHG emissions'] + 1)

from scipy import stats

df_low  = df[df['income_cluster'] == 'Low/LM']
df_high = df[df['income_cluster'] == 'UM/H']

r_low,  p_low  = stats.pearsonr(df_low['GHG_log'],  df_low['FF electricity share'])
r_high, p_high = stats.pearsonr(df_high['GHG_log'], df_high['FF electricity share'])

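# Use significant-figure formatting for p so very small values are not printed as 0.0
print(f"Correlation (Low/LM): r = {r_low:.3f}, p = {p_low:.3g}, n = {len(df_low)}")
print(f"Correlation (UM/H):   r = {r_high:.3f}, p = {p_high:.3g}, n = {len(df_high)}")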

plt.figure(figsize=(9,5))
plt.scatter(df_low['GHG_log'],  df_low['FF electricity share'],
            alpha=0.6, label='Low/LM', color='tab:green')
plt.scatter(df_high['GHG_log'], df_high['FF electricity share'],
            alpha=0.6, label='UM/H', color='tab:blue')

plt.xlabel('log10(GHG emissions + 1)')
plt.ylabel('Fossil fuel electricity share (%)')
plt.title('Relationship Between log10(GHG) and Fossil Electricity Share')
plt.legend()
plt.tight_layout()
plt.show()

**Interpretation:**  
Correlations between log-transformed GHG emissions and fossil-fuel electricity share were weak in both income clusters, so a country's total emissions say little about how fossil-dependent its electricity system is. One plausible reading is that electricity mix is shaped more by geography, policy, and historical infrastructure than by emissions volume alone, although this analysis does not test those factors directly.
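
To put a number on "weak", one option is to report a confidence interval for each Pearson *r* alongside its p-value. The sketch below is not part of the original analysis: it applies the standard Fisher z-transformation using only the Python standard library, and the function names and toy data are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for r via the Fisher z-transformation.

    Requires n > 3 and |r| < 1.
    """
    z = math.atanh(r)                 # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Toy data for illustration only (not taken from finaldata)
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
low, high = fisher_ci(r, len(x))
print(f"r = {r:.3f}, r^2 = {r**2:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

In the notebook, `fisher_ci(r_low, len(df_low))` and `fisher_ci(r_high, len(df_high))` would give the corresponding intervals. An interval that spans zero or stays close to it supports the "weak" reading, and `r**2` gives the share of variance in one variable that is linearly associated with the other.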

## Result 2: Renewable electricity share across income groups

In [None]:
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df2 = finaldata[['RE electricity share', 'Income Group FY23']].dropna()

# ANOVA
model = ols('Q("RE electricity share") ~ C(Q("Income Group FY23"))', data=df2).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
display(anova_table)

# Tukey test
tukey = pairwise_tukeyhsd(
    endog=df2['RE electricity share'],
    groups=df2['Income Group FY23'],
    alpha=0.05
)
print(tukey)

# Boxplot
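fig, ax = plt.subplots(figsize=(8, 6))
# Pass ax explicitly; without it, boxplot(by=...) creates its own figure and ignores this one
df2.boxplot(column='RE electricity share', by='Income Group FY23', grid=False, ax=ax)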
plt.title("Renewable Electricity Share Across Income Groups")
plt.suptitle("")
plt.xlabel("Income Group")
plt.ylabel("Renewable Electricity Share (%)")
plt.show()

**Interpretation:**  
A one-way ANOVA found no significant difference in mean renewable electricity share across income groups at the 5% level (*p* = 0.108), and Tukey HSD post-hoc tests found no significant pairwise differences. Renewable electricity shares vary widely within every income group, so income level alone does not explain renewable electricity deployment in this sample. Geographic, historical and political factors likely play a stronger role, though they are not tested here.
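
A non-significant *p*-value does not by itself say how much of the variation income group accounts for. One common complement is the effect size eta-squared, the between-group sum of squares divided by the total sum of squares. The sketch below is an illustrative addition rather than part of the original analysis; the function name and the toy values are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, eta_squared) for a dict mapping group label -> list of values."""
    all_values = [v for values in groups.values() for v in values]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of observations around their own group mean
    for values in groups.values():
        mean_g = sum(values) / len(values)
        ss_between += len(values) * (mean_g - grand_mean) ** 2
        ss_within += sum((v - mean_g) ** 2 for v in values)

    f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared


# Hypothetical renewable shares (%) per income group, for illustration only
toy = {
    "L": [60.0, 80.0, 40.0],
    "LM": [30.0, 70.0, 50.0],
    "UM": [20.0, 55.0, 45.0],
    "H": [35.0, 25.0, 60.0],
}
f_stat, eta_sq = one_way_anova(toy)
print(f"F = {f_stat:.3f}, eta^2 = {eta_sq:.3f}")
```

For the one-way model fitted above, the same quantity should be obtainable from `anova_table` as the income-group row's `sum_sq` divided by the sum of the `sum_sq` column. A small eta-squared would back up the statement that most of the variation lies within income groups rather than between them.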

## Conclusion

This project examined how income level, emissions, and electricity composition interact across countries. The results show that:

1. The relationship between GHG emissions and fossil-fuel electricity share is weak in both the low/lower-middle (L, LM) and upper-middle/high (UM, H) income clusters.
2. Renewable electricity shares do not differ significantly across the four World Bank income groups.

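Overall, the findings are consistent with national energy systems being shaped less by income level and more by structural factors such as geography, natural resource availability, and long-term policy trajectories. These structural factors were not measured in this project, so their role remains a hypothesis for further work.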