
# Hypothesis
Countries with diverse portfolios of renewable energy sources, incorporating multiple technologies like solar, wind, and hydroelectric power, demonstrate lower carbon intensity (carbon emissions per unit of energy produced) compared to countries relying predominantly on a single renewable energy source.
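To make the hypothesis testable, each country needs a number for both quantities. Carbon intensity follows the definition above: emissions divided by energy produced. For portfolio diversity, the sketch below assumes a Shannon diversity index over each source's share of renewable generation. This index, the function names and the country figures are illustrative assumptions, not part of the original analysis.

```python
import math


def carbon_intensity(emissions_t_co2, generation_gwh):
    """Carbon intensity: tonnes of CO2 emitted per GWh produced."""
    return emissions_t_co2 / generation_gwh


def shannon_diversity(generation_by_source):
    """Shannon index H = -sum(p_i * ln p_i) over each source's share p_i.

    H is 0 for a single source and ln(k) for k sources with equal shares.
    """
    total = sum(generation_by_source.values())
    shares = [g / total for g in generation_by_source.values() if g > 0]
    return sum(-p * math.log(p) for p in shares)


# Hypothetical countries (illustrative numbers, not from the dataset)
diverse_country = {"Solar": 100.0, "Wind": 100.0, "Hydropower": 100.0}
single_source_country = {"Hydropower": 300.0}

print(shannon_diversity(diverse_country))        # close to ln(3), about 1.0986
print(shannon_diversity(single_source_country))  # 0.0
print(carbon_intensity(45_000.0, 300.0))         # 150.0 t CO2 per GWh
```

A higher index means a more evenly spread mix, so the hypothesis predicts that carbon intensity falls as the index rises.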

Analysis plan:

Step 1: Prepare data
Get the energy generation, capacity and generation intensity of each technology for each year
Calculate the growth rate of each technology for each year

Step 2: Use a t-test or ANOVA to confirm or reject the hypothesis

Step 3: Visualisation

Visualise the data in Tableau
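A sketch of Step 2 for the stated hypothesis, assuming countries have already been split into a "diverse" and a "single-source" group. Welch's t-test (`scipy.stats.ttest_ind` with `equal_var=False`) compares two group means without assuming equal variances, and `alternative="less"` makes it one-sided (diverse group lower). With three or more groups, a one-way ANOVA (`scipy.stats.f_oneway`) tests whether any group mean differs. The group names and carbon-intensity values are hypothetical.

```python
from scipy import stats

# Hypothetical carbon intensities (g CO2/kWh), illustrative only
diverse_mix = [120.0, 135.0, 110.0, 150.0, 125.0]
single_source = [210.0, 190.0, 230.0, 175.0, 205.0]

# Welch's t-test; H1: the diverse group's mean carbon intensity is lower
t_stat, p_value = stats.ttest_ind(diverse_mix, single_source, equal_var=False, alternative="less")
print(f"Welch t = {t_stat:.2f}, one-sided p = {p_value:.4f}")

# With three or more groups, a one-way ANOVA tests whether any group mean differs
moderate_mix = [160.0, 170.0, 150.0, 165.0, 158.0]
f_stat, p_anova = stats.f_oneway(diverse_mix, moderate_mix, single_source)
print(f"ANOVA F = {f_stat:.2f}, p = {p_anova:.4f}")
```

A significant ANOVA only says that some group differs; a post-hoc test is needed to say which one.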



# Import necessary libraries

In [1]:
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats
from scipy.stats import pearsonr

import config as cf          # local project module (configuration)
import sql_functions as sf   # local project module (helpers for running SQL queries)

%matplotlib inline
warnings.simplefilter(action='ignore', category=FutureWarning)

# Discover trends

## Generation intensity


In [2]:
# Load the generation intensity table from PostgreSQL via the project's SQL helper module (sf)
schema = "capstone_renewable_energy"
table_name = "generation_intensity"
sql_query = f'SELECT * FROM {schema}.{table_name};'
gen_intensity_total = sf.get_dataframe(sql_query)
# Keep only the year as an integer (the column is stored as a datetime)
gen_intensity_total['year'] = gen_intensity_total['year'].dt.year

In [3]:
# Year-over-year growth rate of electricity generation for each technology:
# (current - previous) / previous. The first year has no previous value,
# so its growth rate is NaN rather than a placeholder value.
gen_intensity_total = gen_intensity_total.sort_values(["year", "type"]).reset_index(drop=True)
gen_intensity_total["gen_growth_rate"] = (
    gen_intensity_total.groupby("type")["electricity_generation_gwh"].pct_change(fill_method=None)
)
gen_intensity_total

| | year | type | electricity_generation_gwh | electricity_capacity_mw | generation_intensity | gen_growth_rate |
|---|---|---|---|---|---|---|
| 0 | 2000 | Biofuels | 128118.622 | 24765.240 | 5.173324 | NaN |
| 1 | 2000 | Geothermal | 52571.035 | 8272.700 | 6.354761 | NaN |
| 2 | 2000 | Hydropower | 2614270.604 | 697169.998 | 3.749832 | NaN |
| 3 | 2000 | Others | 18438.738 | 3803.130 | 4.848306 | NaN |
| 4 | 2000 | Solar | 1330.912 | 1224.691 | 1.086733 | NaN |
| ... | ... | ... | ... | ... | ... | ... |
| 156 | 2022 | Hydropower | 1423853.885 | 1260882.706 | 1.129252 | -0.667087 |
| 157 | 2022 | Others | 53696.999 | 20916.082 | 2.567259 | -0.378979 |
| 158 | 2022 | Solar | 442904.326 | 1073135.531 | 0.412720 | -0.571664 |
| 159 | 2022 | Total Renewable | 2815483.910 | 3396323.388 | 0.828980 | -0.641616 |
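The `generation_intensity` column is consistent with electricity generated per unit of installed capacity (GWh per MW): dividing the generation column by the capacity column reproduces it. A minimal check, with the figures copied from the rows above (the helper name is illustrative):

```python
def generation_intensity(generation_gwh, capacity_mw):
    """Electricity generated (GWh) per MW of installed capacity."""
    return generation_gwh / capacity_mw


# (generation_gwh, capacity_mw, reported generation_intensity) copied from the table above
rows = {
    "Biofuels 2000": (128118.622, 24765.240, 5.173324),
    "Hydropower 2000": (2614270.604, 697169.998, 3.749832),
    "Solar 2022": (442904.326, 1073135.531, 0.412720),
}

for label, (generation, capacity, reported) in rows.items():
    computed = generation_intensity(generation, capacity)
    # The table is rounded to six decimals, so compare at that precision
    print(f"{label}: computed {computed:.6f}, reported {reported:.6f}")
```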


In [134]:
# Inferential statistics: hypothesis testing
# Compound growth rate between the first and last observation of a series.
# Assumes rows are in chronological order within each type. Note that n yearly
# observations span n - 1 annual periods, whereas this exponent uses n.
def calculate_growth_rate(values):
    return (values.iloc[-1] / values.iloc[0]) ** (1 / len(values)) - 1

# Growth rates of generation and capacity for each energy type
# (gen_intensity is assumed to be defined in an earlier cell not shown here)
growth_rates = gen_intensity.groupby('type').agg({'electricity_generation_gwh': calculate_growth_rate,
                                                   'electricity_capacity_mw': calculate_growth_rate})

# Solar growth rates: a pair of values (generation, capacity)
solar_growth_rates = growth_rates.loc['Solar']

# Two-sample t-test of solar against each other energy type; each sample holds
# only the two growth rates (generation and capacity) of one type
for energy_type, growth_rate in growth_rates.iterrows():
    if energy_type != 'Solar':
        _, p_value = stats.ttest_ind(solar_growth_rates, growth_rate)
        print(f"T-test p-value for solar vs {energy_type}: {p_value}")

# One-way ANOVA across all energy types (used instead of pairwise t-tests
# because more than two groups are compared at once)
_, p_value_anova = stats.f_oneway(*[growth_rates.loc[energy_type] for energy_type in growth_rates.index])
print(f"\nANOVA p-value for comparing growth rates of all energy types: {p_value_anova}")

T-test p-value for solar vs Biofuels: 0.0032213179027520537
T-test p-value for solar vs Geothermal: 5.3143016007552044e-05
T-test p-value for solar vs Hydropower: 0.0028926080186438374
T-test p-value for solar vs Others: nan
T-test p-value for solar vs Total Renewable: 0.0013649885094761019
T-test p-value for solar vs Wind: 0.011740167967093786

ANOVA p-value for comparing growth rates of all energy types: nan
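`calculate_growth_rate` raises the ratio of the last to the first value to the power `1 / len(values)`. For a yearly series, n observations span only n - 1 annual periods, so that exponent understates the annual rate: for the hypothetical series 100, 110, 121 (two years of 10% growth) it gives about 6.6% instead of 10%. A sketch of a compound annual growth rate (CAGR) function with the n - 1 exponent, which also rejects a non-positive starting value (one that would otherwise yield inf or nan in NumPy arithmetic). The function name `cagr` is illustrative.

```python
def cagr(values):
    """Compound annual growth rate of a chronologically ordered yearly series.

    n observations span n - 1 annual periods, hence the exponent 1 / (n - 1).
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    first, last = values[0], values[-1]
    if first <= 0:
        raise ValueError("the first value must be positive")
    return (last / first) ** (1 / (len(values) - 1)) - 1


series = [100.0, 110.0, 121.0]   # hypothetical: two years of 10% growth
print(round(cagr(series), 4))    # 0.1
print(round((series[-1] / series[0]) ** (1 / len(series)) - 1, 4))  # exponent 1/n: 0.0656
```

Inside `groupby(...).agg`, each argument is a pandas Series, so `values.iloc[0]` and `values.iloc[-1]` (as in the original function) should replace the plain list indexing used here.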




In [133]:
# Repeat the comparison with wind as the reference type
wind_growth_rates = growth_rates.loc['Wind']

# Two-sample t-test of wind against each other energy type
for energy_type, growth_rate in growth_rates.iterrows():
    if energy_type != 'Wind':
        _, p_value = stats.ttest_ind(wind_growth_rates, growth_rate)
        print(f"T-test p-value for wind vs {energy_type}: {p_value}")

# One-way ANOVA across all energy types (same test as in the previous cell)
_, p_value_anova = stats.f_oneway(*[growth_rates.loc[energy_type] for energy_type in growth_rates.index])
print(f"\nANOVA p-value for comparing growth rates of all energy types: {p_value_anova}")

T-test p-value for wind vs Biofuels: 0.0029024084009149752
T-test p-value for wind vs Geothermal: 0.0002826520209579417
T-test p-value for wind vs Hydropower: 0.0028860482151233462
T-test p-value for wind vs Others: nan
T-test p-value for wind vs Solar: 0.011740167967093759
T-test p-value for wind vs Total Renewable: 0.0022362665492986957

ANOVA p-value for comparing growth rates of all energy types: nan




Based on the statistical analysis and the p-values obtained from the t-tests, the results suggest that solar energy is growing faster than several other renewable energy types. Here's why:

Statistical significance: The p-values for the t-tests comparing solar growth rates with the other renewable energy types are below the 0.05 significance level for every comparison except Others, whose p-value is nan. This indicates that the observed differences in growth rates are unlikely to have occurred by chance alone. A two-sided t-test only shows that the rates differ; the direction (solar growing faster) comes from solar's higher growth figures, such as its 2959.16% increase in generation between 2010 and 2021.

Lower p-values: The lower the p-value, the stronger the evidence against the null hypothesis that there is no difference in growth rates. Here, the p-values for solar vs. biofuels, geothermal, hydropower, total renewable energy and wind are all below 0.05. However, each sample in these tests holds only two values (the generation and capacity growth rates of one energy type), so the p-values should be interpreted with caution.

Consistency: The consistently low p-values across multiple comparisons suggest that solar energy is indeed growing faster than various other renewable energy types.
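Running six t-tests at α = 0.05 raises the chance that at least one comparison comes out significant by chance. A Bonferroni correction, which multiplies each p-value by the number of tests, is one simple and conservative adjustment; it is swapped in here for illustration and is not part of the original analysis. Applied to the p-values printed by the solar cell (Others is left out because its p-value is nan), the solar vs Wind comparison no longer falls below 0.05:

```python
# p-values printed by the solar comparison cell (Others omitted: its p-value is nan)
p_values = {
    "Biofuels": 0.0032213179027520537,
    "Geothermal": 5.3143016007552044e-05,
    "Hydropower": 0.0028926080186438374,
    "Total Renewable": 0.0013649885094761019,
    "Wind": 0.011740167967093786,
}
n_tests = 6   # six pairwise t-tests were run, including Others
alpha = 0.05

# Bonferroni: multiply each p-value by the number of tests, capped at 1
adjusted = {energy_type: min(1.0, p * n_tests) for energy_type, p in p_values.items()}

for energy_type, p_adj in adjusted.items():
    verdict = "significant" if p_adj < alpha else "not significant"
    print(f"Solar vs {energy_type}: adjusted p = {p_adj:.4f} ({verdict})")
```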

Context: Statistical significance is important, but practical significance and the context of the analysis matter as well. The significant difference between solar's growth rate and those of other energy types points to a notable trend worth considering in discussions and decisions about renewable energy investments and policies.

Based on these points, we can reasonably conclude that solar energy is growing faster than several other renewable energy types, as supported by the statistical analysis.

At a significance level of α = 0.05, we would reject the null hypothesis of equal growth rates (supporting the conclusion that solar energy is growing faster) for every comparison except Others, whose p-value is nan.
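A two-sided t-test only says that growth rates differ. Testing "solar is growing faster" directly calls for a one-sided test, ideally with more than two observations per group. A sketch, assuming one year-over-year generation growth rate per year for each technology (such as the `gen_growth_rate` column computed earlier); the values below are hypothetical, and treating consecutive years as independent observations is a simplification:

```python
from scipy import stats

# Hypothetical year-over-year generation growth rates (fractions), one per year
solar_yoy = [0.45, 0.38, 0.52, 0.30, 0.41, 0.35, 0.28, 0.33, 0.25, 0.22]
wind_yoy = [0.20, 0.18, 0.22, 0.15, 0.17, 0.12, 0.14, 0.13, 0.11, 0.16]

# One-sided Welch's t-test; H1: solar's mean growth rate is greater than wind's
t_stat, p_value = stats.ttest_ind(solar_yoy, wind_yoy, equal_var=False, alternative="greater")
print(f"Welch t = {t_stat:.2f}, one-sided p = {p_value:.4g}")
```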


In [132]:
# Cumulative generation growth between 2010 and 2021 for each technology
df = gen_intensity_total
generation_growth_rates = {}
for energy_type in df.type.unique().tolist():
    generation_2021 = float(df.loc[(df['year'] == 2021) & (df['type'] == energy_type), 'electricity_generation_gwh'].values[0])
    generation_2010 = float(df.loc[(df['year'] == 2010) & (df['type'] == energy_type), 'electricity_generation_gwh'].values[0])

    # Percentage change from 2010 to 2021
    type_growth_rate = ((generation_2021 - generation_2010) / generation_2010) * 100
    generation_growth_rates[energy_type] = type_growth_rate
    print(f'The generation growth rate of {energy_type} is {type_growth_rate:.2f}%')

The generation growth rate of Biofuels is 88.35%
The generation growth rate of Geothermal is 39.18%
The generation growth rate of Hydropower is 24.41%
The generation growth rate of Others is 119.51%
The generation growth rate of Solar is 2959.16%
The generation growth rate of Total Renewable is 86.99%
The generation growth rate of Wind is 436.16%
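These figures are cumulative growth over the 11 years from 2010 to 2021, so they are not directly comparable with annual rates. Converting each one to an equivalent compound annual rate, (1 + g)^(1/11) - 1 with g as a fraction, puts them on a per-year scale. The percentages below are copied from the output above:

```python
# Cumulative 2010 -> 2021 generation growth (percent), copied from the output above
cumulative_growth_pct = {
    "Biofuels": 88.35,
    "Geothermal": 39.18,
    "Hydropower": 24.41,
    "Others": 119.51,
    "Solar": 2959.16,
    "Total Renewable": 86.99,
    "Wind": 436.16,
}
n_years = 2021 - 2010  # 11 annual periods

# Equivalent compound annual growth rate for each technology
annual_rates = {
    energy_type: (1 + pct / 100) ** (1 / n_years) - 1
    for energy_type, pct in cumulative_growth_pct.items()
}

for energy_type, rate in annual_rates.items():
    print(f"{energy_type}: {rate:.1%} per year")
```

On this scale, solar generation grew by roughly 36.5% a year and wind by roughly 16.5%.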
