# UK Renewable Energy Consumption

This dataset shows how the UK's renewable energy sector has grown over the 30 years from 1990 to 2020.

I'm going to analyse this dataset and, where the data allows, make some predictions as well.

## A. Load Packages

In [26]:
import numpy as np
import pandas as pd
import plotly.express as px

import plotly.graph_objects as go
from plotly.subplots import make_subplots

## B. Load dataset

In [27]:
df = pd.read_csv('Datasets/uk_renewable_energy.csv',sep=',')

## C. Data exploration

In [28]:
df

Unnamed: 0,Year,Energy from renewable & waste sources,Total energy consumption of primary fuels and equivalents,Fraction from renewable sources and waste,Hydroelectric power,"Wind, wave, tidal",Solar photovoltaic,Geothermal aquifers,Landfill gas,Sewage gas,...,Municipal solid waste (MSW),Poultry litter,Straw,Wood,Charcoal,Liquid bio-fuels,Bioethanol,Biodiesel,Biomass,Cross-boundary Adjustment
0,1990,1.647,225.532,0.007,0.448,0.001,0.0,0.001,0.08,0.138,...,0.183,0.0,0.007,0.687,0.039,0.0,0.0,0.0,0.065,0.0
1,1991,1.634,231.288,0.007,0.398,0.001,0.0,0.001,0.105,0.151,...,0.185,0.0,0.007,0.687,0.036,0.0,0.0,0.0,0.065,0.0
2,1992,1.843,228.696,0.008,0.467,0.003,0.0,0.001,0.155,0.151,...,0.21,0.016,0.007,0.736,0.033,0.0,0.0,0.0,0.065,0.0
3,1993,1.862,231.368,0.008,0.37,0.019,0.0,0.001,0.162,0.158,...,0.268,0.043,0.007,0.736,0.034,0.0,0.0,0.0,0.065,0.0
4,1994,2.528,230.739,0.011,0.438,0.03,0.0,0.001,0.188,0.17,...,0.385,0.101,0.007,1.108,0.034,0.0,0.0,0.0,0.065,0.0
5,1995,2.644,230.886,0.011,0.416,0.034,0.0,0.001,0.199,0.193,...,0.411,0.101,0.007,1.182,0.035,0.0,0.0,0.0,0.065,0.0
6,1996,2.581,243.392,0.011,0.292,0.042,0.0,0.001,0.249,0.193,...,0.396,0.101,0.007,1.194,0.041,0.0,0.0,0.0,0.065,0.0
7,1997,2.611,240.756,0.011,0.378,0.057,0.0,0.001,0.317,0.192,...,0.465,0.101,0.007,0.991,0.038,0.0,0.0,0.0,0.065,0.0
8,1998,3.013,246.79,0.012,0.44,0.075,0.0,0.001,0.402,0.181,...,0.649,0.112,0.007,1.077,0.04,0.0,0.0,0.0,0.029,0.0
9,1999,3.148,246.112,0.013,0.459,0.073,0.0,0.001,0.572,0.189,...,0.633,0.148,0.007,0.959,0.033,0.0,0.0,0.0,0.074,0.0


This dataset has 31 rows, one for each year from 1990 to 2020, and 21 columns. With so few rows, I'm not sure how reliable any predictions later in this project will be.

In [29]:
df.head()

Unnamed: 0,Year,Energy from renewable & waste sources,Total energy consumption of primary fuels and equivalents,Fraction from renewable sources and waste,Hydroelectric power,"Wind, wave, tidal",Solar photovoltaic,Geothermal aquifers,Landfill gas,Sewage gas,...,Municipal solid waste (MSW),Poultry litter,Straw,Wood,Charcoal,Liquid bio-fuels,Bioethanol,Biodiesel,Biomass,Cross-boundary Adjustment
0,1990,1.647,225.532,0.007,0.448,0.001,0.0,0.001,0.08,0.138,...,0.183,0.0,0.007,0.687,0.039,0.0,0.0,0.0,0.065,0.0
1,1991,1.634,231.288,0.007,0.398,0.001,0.0,0.001,0.105,0.151,...,0.185,0.0,0.007,0.687,0.036,0.0,0.0,0.0,0.065,0.0
2,1992,1.843,228.696,0.008,0.467,0.003,0.0,0.001,0.155,0.151,...,0.21,0.016,0.007,0.736,0.033,0.0,0.0,0.0,0.065,0.0
3,1993,1.862,231.368,0.008,0.37,0.019,0.0,0.001,0.162,0.158,...,0.268,0.043,0.007,0.736,0.034,0.0,0.0,0.0,0.065,0.0
4,1994,2.528,230.739,0.011,0.438,0.03,0.0,0.001,0.188,0.17,...,0.385,0.101,0.007,1.108,0.034,0.0,0.0,0.0,0.065,0.0


In [30]:
df.keys()

Index(['Year', 'Energy from renewable & waste sources',
       'Total energy consumption of primary fuels and equivalents',
       'Fraction from renewable sources and waste', 'Hydroelectric power',
       'Wind, wave, tidal', 'Solar photovoltaic', 'Geothermal aquifers',
       'Landfill gas', 'Sewage gas', 'Biogas from autogen',
       'Municipal solid waste (MSW)', 'Poultry litter', 'Straw', 'Wood',
       'Charcoal', 'Liquid bio-fuels', 'Bioethanol', 'Biodiesel', 'Biomass',
       'Cross-boundary Adjustment'],
      dtype='object')

Converting the 'Year' column to datetime

In [31]:
df['Year'] = pd.to_datetime(df['Year'], format='%Y')

These are the key columns of the dataset. The 17 columns 'Hydroelectric power', 'Wind, wave, tidal', 'Solar photovoltaic', 'Geothermal aquifers', 'Landfill gas', 'Sewage gas', 'Biogas from autogen', 'Municipal solid waste (MSW)', 'Poultry litter', 'Straw', 'Wood', 'Charcoal', 'Liquid bio-fuels', 'Bioethanol', 'Biodiesel', 'Biomass' and 'Cross-boundary Adjustment' are the different renewable and waste sources of energy, and their values are the energy consumption from each source in a given year. 'Energy from renewable & waste sources' is the total of those columns, 'Total energy consumption of primary fuels and equivalents' is the total energy consumption from both renewable and non-renewable sources, and 'Fraction from renewable sources and waste' is the ratio of the former to the latter.
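
As a quick sanity check on that last relationship, here is a minimal sketch that recomputes the fraction for the ten rows displayed above (1990 to 1999). The short column names and the three-decimal rounding are my own assumptions for illustration; the values are copied from the table.

```python
import numpy as np
import pandas as pd

# First ten rows (1990-1999) as displayed in the table above.
sample = pd.DataFrame({
    "renewable": [1.647, 1.634, 1.843, 1.862, 2.528, 2.644, 2.581, 2.611, 3.013, 3.148],
    "total": [225.532, 231.288, 228.696, 231.368, 230.739, 230.886, 243.392, 240.756, 246.79, 246.112],
    "fraction": [0.007, 0.007, 0.008, 0.008, 0.011, 0.011, 0.011, 0.011, 0.012, 0.013],
})

# Recompute the fraction and round to the three decimals used in the dataset.
recomputed = (sample["renewable"] / sample["total"]).round(3)

# True when every recomputed value agrees with the published fraction.
fraction_matches = bool(np.allclose(recomputed, sample["fraction"]))
print(fraction_matches)
```

On the full dataset the same idea would apply to the original column names.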

#### Confirming the totals
Check whether the sum of the 17 source columns ('Hydroelectric power' through 'Cross-boundary Adjustment') matches the 'Energy from renewable & waste sources' column.

In [32]:
df['sum'] = df[['Hydroelectric power','Wind, wave, tidal', 'Solar photovoltaic', 'Geothermal aquifers',
       'Landfill gas', 'Sewage gas', 'Biogas from autogen',
       'Municipal solid waste (MSW)', 'Poultry litter', 'Straw', 'Wood',
       'Charcoal', 'Liquid bio-fuels', 'Bioethanol', 'Biodiesel', 'Biomass',
       'Cross-boundary Adjustment']].sum(axis=1)

In [33]:
df[['Energy from renewable & waste sources','sum']]

Unnamed: 0,Energy from renewable & waste sources,sum
0,1.647,1.649
1,1.634,1.636
2,1.843,1.844
3,1.862,1.863
4,2.528,2.527
5,2.644,2.644
6,2.581,2.581
7,2.611,2.612
8,3.013,3.013
9,3.148,3.148


Yes, they match to within rounding: in the rows shown, the two columns differ by at most 0.002.
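
Rather than comparing the two columns by eye, the check can be made explicit with a tolerance. This is only a sketch: the values are the ten rows displayed above, and the tolerance of 0.0025 is a hypothetical choice to allow for rounding in the source columns.

```python
import numpy as np

# Reported totals and recomputed sums for 1990-1999, copied from the output above.
reported = np.array([1.647, 1.634, 1.843, 1.862, 2.528, 2.644, 2.581, 2.611, 3.013, 3.148])
summed = np.array([1.649, 1.636, 1.844, 1.863, 2.527, 2.644, 2.581, 2.612, 3.013, 3.148])

# Hypothetical tolerance: rounding in the source columns can shift the sum slightly.
TOLERANCE = 0.0025

abs_diff = np.abs(reported - summed)
all_close = bool(np.all(abs_diff <= TOLERANCE))
print(abs_diff.max(), all_close)
```

The same comparison on the full DataFrame would use the 'Energy from renewable & waste sources' and 'sum' columns.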

In [34]:
df.describe()

Unnamed: 0,Year,Energy from renewable & waste sources,Total energy consumption of primary fuels and equivalents,Fraction from renewable sources and waste,Hydroelectric power,"Wind, wave, tidal",Solar photovoltaic,Geothermal aquifers,Landfill gas,Sewage gas,...,Poultry litter,Straw,Wood,Charcoal,Liquid bio-fuels,Bioethanol,Biodiesel,Biomass,Cross-boundary Adjustment,sum
count,31,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,...,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0
mean,2004-12-31 14:42:34.838709632,7.812,226.586161,0.037516,0.431323,1.299968,0.209581,0.001,1.023968,0.247387,...,0.153613,0.092194,1.844097,0.048226,0.020419,0.169097,0.353065,0.594129,0.005742,7.812387
min,1990-01-01 00:00:00,1.634,169.439,0.007,0.278,0.001,0.0,0.001,0.08,0.138,...,0.0,0.007,0.687,0.031,0.0,0.0,0.0,0.029,0.0,1.636
25%,1997-07-02 12:00:00,2.8285,213.8765,0.0115,0.399,0.065,0.0,0.001,0.3595,0.172,...,0.1065,0.007,0.772,0.0345,0.0,0.0,0.0,0.065,0.0,2.8285
50%,2005-01-01 00:00:00,5.026,230.886,0.02,0.438,0.25,0.001,0.001,1.202,0.193,...,0.176,0.076,1.108,0.038,0.0,0.048,0.027,0.386,0.001,5.027
75%,2012-07-02 00:00:00,10.3805,244.2685,0.0485,0.4635,2.0745,0.1445,0.001,1.555,0.319,...,0.2075,0.0985,2.1645,0.0515,0.015,0.405,0.6795,0.6655,0.0125,10.382
max,2020-01-01 00:00:00,24.472,252.807,0.144,0.58,6.481,1.131,0.001,1.758,0.44,...,0.227,0.338,5.478,0.107,0.202,0.492,1.335,2.469,0.018,24.474
std,,6.833277,20.912081,0.03743,0.067324,1.867693,0.392818,6.612744999999999e-19,0.608756,0.095966,...,0.067326,0.098076,1.545942,0.020654,0.047775,0.204388,0.428339,0.695856,0.006552,6.833289


### Conclusion from above
1. The average energy from renewable sources is 7.812 and the average total energy consumption is 226.586. The mean yearly fraction from renewable sources is only 3.75% of total energy consumption, which is a very small share. This suggests further investment in renewable energy production is still needed to shift the energy mix significantly towards sustainability.
2. The min and max of the same fraction show that it rose from 0.7% to 14.4%, a significant increase in the share of renewable energy sources over the 30-year span.
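
One subtlety worth spelling out: dividing the two means does not give 3.75%. The sketch below uses the means from the describe() output above and shows that the ratio of the means is about 3.45%, while 3.75% is the mean of the yearly fractions. The variable names are mine.

```python
# Means copied from the describe() output above.
mean_renewable = 7.812       # 'Energy from renewable & waste sources'
mean_total = 226.586161      # 'Total energy consumption of primary fuels and equivalents'
mean_fraction = 0.037516     # 'Fraction from renewable sources and waste'

# Ratio of the means: weights each year by its total consumption.
ratio_of_means = mean_renewable / mean_total

print(f"ratio of means:           {ratio_of_means:.2%}")
print(f"mean of yearly fractions: {mean_fraction:.2%}")
```

The two differ because the mean of yearly fractions weights every year equally, whereas the ratio of means gives more weight to years with higher total consumption.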

In [35]:
df.describe().iloc[:, -18:]

Unnamed: 0,Hydroelectric power,"Wind, wave, tidal",Solar photovoltaic,Geothermal aquifers,Landfill gas,Sewage gas,Biogas from autogen,Municipal solid waste (MSW),Poultry litter,Straw,Wood,Charcoal,Liquid bio-fuels,Bioethanol,Biodiesel,Biomass,Cross-boundary Adjustment,sum
count,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0
mean,0.431323,1.299968,0.209581,0.001,1.023968,0.247387,0.200161,1.118419,0.153613,0.092194,1.844097,0.048226,0.020419,0.169097,0.353065,0.594129,0.005742,7.812387
min,0.278,0.001,0.0,0.001,0.08,0.138,0.0,0.183,0.0,0.007,0.687,0.031,0.0,0.0,0.0,0.029,0.0,1.636
25%,0.399,0.065,0.0,0.001,0.3595,0.172,0.0,0.549,0.1065,0.007,0.772,0.0345,0.0,0.0,0.0,0.065,0.0,2.8285
50%,0.438,0.25,0.001,0.001,1.202,0.193,0.005,0.816,0.176,0.076,1.108,0.038,0.0,0.048,0.027,0.386,0.001,5.027
75%,0.4635,2.0745,0.1445,0.001,1.555,0.319,0.2145,1.221,0.2075,0.0985,2.1645,0.0515,0.015,0.405,0.6795,0.6655,0.0125,10.382
max,0.58,6.481,1.131,0.001,1.758,0.44,1.021,3.367,0.227,0.338,5.478,0.107,0.202,0.492,1.335,2.469,0.018,24.474
std,0.067324,1.867693,0.392818,6.612744999999999e-19,0.608756,0.095966,0.353263,0.896538,0.067326,0.098076,1.545942,0.020654,0.047775,0.204388,0.428339,0.695856,0.006552,6.833289


### Conclusion from above
Of the 17 renewable sources, wood has the highest average energy consumption (mean 1.844).

In [36]:
# Select the min and max rows by label; with a datetime column, positions 3 and 7 are '25%' and 'std'
df.describe().loc[['min', 'max']].iloc[:, -18:]


Unnamed: 0,Hydroelectric power,"Wind, wave, tidal",Solar photovoltaic,Geothermal aquifers,Landfill gas,Sewage gas,Biogas from autogen,Municipal solid waste (MSW),Poultry litter,Straw,Wood,Charcoal,Liquid bio-fuels,Bioethanol,Biodiesel,Biomass,Cross-boundary Adjustment,sum
min,0.278,0.001,0.0,0.001,0.08,0.138,0.0,0.183,0.0,0.007,0.687,0.031,0.0,0.0,0.0,0.029,0.0,1.636
max,0.58,6.481,1.131,0.001,1.758,0.44,1.021,3.367,0.227,0.338,5.478,0.107,0.202,0.492,1.335,2.469,0.018,24.474


### Conclusion from above
1. Hydropower is relatively stable (0.278 to 0.580)
2. Wind, wave, tidal shows high variability (0.001 to 6.481)
3. Solar PV grows over time (0 to 1.131)
4. Geothermal aquifers, Charcoal, Liquid bio-fuels and Cross-boundary Adjustment show steady usage
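
To put a number on "stable" versus "variable", one option is the coefficient of variation (standard deviation divided by mean). This is a sketch of my own, not part of the original analysis; the means and standard deviations are copied from the describe() output above for four of the sources.

```python
import pandas as pd

# Mean and standard deviation copied from the describe() output above.
stats = pd.DataFrame(
    {
        "mean": [0.431323, 1.299968, 0.209581, 0.048226],
        "std": [0.067324, 1.867693, 0.392818, 0.020654],
    },
    index=["Hydroelectric power", "Wind, wave, tidal", "Solar photovoltaic", "Charcoal"],
)

# Coefficient of variation: spread relative to the typical level of each source.
stats["cv"] = stats["std"] / stats["mean"]

# Lowest cv first, i.e. the most stable source at the top.
ranked = stats.sort_values("cv")
print(ranked)
```

A low value means consumption stays close to its average; a value above 1 means the spread is larger than the average itself.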

## D. Data Visualization

### 1. Line chart for energy consumption of overall renewable resources in each year

In [37]:
fig = px.line(df, x='Year', y=['Energy from renewable & waste sources'], 
              title='Total Energy Consumption from Renewable & Waste Sources per Year')

# Show the plot
fig.show()

This time series chart shows the trend in overall renewable energy consumption. From it we can see:
1. Steady growth in renewable energy consumption, with slow increases in the 1990s and faster acceleration after 2005.
2. Rapid growth after 2010, which may be driven by cheaper technologies, supportive policies, and climate targets.
3. This growth reflects the UK’s push towards net-zero goals and reducing reliance on fossil fuels.
4. Future growth will depend on continued policy support, technology advances, and grid integration.

### 2. Line chart for Total energy consumption of primary fuels and equivalents in each year

In [38]:
fig = px.line(df, x='Year', y=['Total energy consumption of primary fuels and equivalents'], 
              title='Total Energy Consumption of Primary Fuels and Equivalents per Year')

# Show the plot
fig.show()

This time series of total energy consumption of primary fuels and equivalents checks whether total consumption decreases as overall renewable energy consumption increases. It shows:
1. Overall declining trend in total energy consumption after the mid-2000s.
2. Consumption peaked around the early 2000s, followed by a steady decrease that was particularly sharp after 2010.
3. This decline may reflect improved energy efficiency, reduced reliance on fossil fuels, deindustrialization, and the shift towards renewables.
4. The drop around 2020 could also be partially due to the impact of the COVID-19 pandemic, which significantly reduced industrial and transportation energy demand.

### 3. Line chart for Fraction from renewable sources and waste in each year

In [39]:
fig = px.line(df, x='Year', y=['Fraction from renewable sources and waste'], 
              title='Fraction from renewable sources and waste per Year')

# Show the plot
fig.show()

### 4. Line chart for energy consumption of each renewable source in each year

In [40]:
fig = px.line(df, x='Year', y=['Hydroelectric power',
       'Wind, wave, tidal', 'Solar photovoltaic', 'Geothermal aquifers',
       'Landfill gas', 'Sewage gas', 'Biogas from autogen',
       'Municipal solid waste (MSW)', 'Poultry litter', 'Straw', 'Wood',
       'Charcoal', 'Liquid bio-fuels', 'Bioethanol', 'Biodiesel', 'Biomass',
       'Cross-boundary Adjustment'], 
              title='Energy Consumption per Year',
              labels={'variable':'Renewable Resources'})

# Show the plot
fig.show()

This time series shows yearly energy consumption for each renewable source. The main patterns are:
1. Wind, wave, tidal has increased significantly and now exceeds every other renewable source.
2. Geothermal aquifers is completely flat, with no growth.
![image-3.png](attachment:image-3.png)
3. Charcoal and Cross-boundary Adjustment show steady usage compared to the others.
![image-2.png](attachment:image-2.png)
4. Landfill gas, Liquid bio-fuels and Bioethanol usage increased at some point and then declined by 2020.
![image.png](attachment:image.png)

### 5. Correlations Between Sources

In [41]:
correlation_matrix = df[[ 'Hydroelectric power',
       'Wind, wave, tidal', 'Solar photovoltaic', 'Geothermal aquifers',
       'Landfill gas', 'Sewage gas', 'Biogas from autogen',
       'Municipal solid waste (MSW)', 'Poultry litter', 'Straw', 'Wood',
       'Charcoal', 'Liquid bio-fuels', 'Bioethanol', 'Biodiesel', 'Biomass',
       'Cross-boundary Adjustment']].corr()

print(correlation_matrix)

                             Hydroelectric power  Wind, wave, tidal  \
Hydroelectric power                     1.000000           0.652471   
Wind, wave, tidal                       0.652471           1.000000   
Solar photovoltaic                      0.610064           0.957115   
Geothermal aquifers                          NaN                NaN   
Landfill gas                            0.269568           0.467771   
Sewage gas                              0.602002           0.950693   
Biogas from autogen                     0.610908           0.971342   
Municipal solid waste (MSW)             0.594224           0.970053   
Poultry litter                          0.308015           0.621670   
Straw                                   0.548945           0.953652   
Wood                                    0.638266           0.983604   
Charcoal                                0.496676           0.858197   
Liquid bio-fuels                       -0.090611          -0.136490   
Bioeth

From the correlation data above, we can infer that:
1. Renewable energy sources like 'Wind, wave, tidal', 'Solar photovoltaic', 'Sewage gas', 'Biogas from autogen', 'Municipal solid waste (MSW)', 'Straw', 'Wood' and 'Biomass' tend to have strong positive correlations (above 0.95), suggesting they were developed together. Much of this is likely driven by their shared upward trend over time.
2. 'Liquid bio-fuels' shows weak negative correlations (for example, about -0.09 with 'Hydroelectric power' and -0.14 with 'Wind, wave, tidal'), likely because biofuels follow a separate policy track (focused on transport fuels rather than electricity).
3. No correlation could be calculated for 'Geothermal aquifers' (all NaN) because the column is constant at 0.001, so its variance is zero and the correlation is undefined.
4. The Cross-boundary adjustment has a very strong positive correlation with 'Biodiesel' (0.98) and a strong correlation with 'Bioethanol' (0.86).
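
The NaN row can be reproduced on a toy example. In this sketch the DataFrame and its column names are hypothetical; the `constant` column stands in for a source that never changes, such as 'Geothermal aquifers'.

```python
import pandas as pd

# Hypothetical toy data: two rising columns and one that never changes.
toy = pd.DataFrame({
    "rising": [0.1, 0.2, 0.4, 0.8],
    "also_rising": [1.0, 1.5, 2.5, 4.0],
    "constant": [1.0, 1.0, 1.0, 1.0],
})

# Pearson correlation divides by the standard deviation, which is zero for a constant column.
corr = toy.corr()
print(corr)

# One way to avoid the NaN row and column: drop constant columns before correlating.
constant_cols = [c for c in toy.columns if toy[c].nunique() == 1]
corr_without_constant = toy.drop(columns=constant_cols).corr()
print(corr_without_constant)
```

Dropping constant columns first keeps the matrix free of NaN without changing any of the other coefficients.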

### 6. Scatter Plot for Correlations Between selected Sources

In [42]:

fig = make_subplots(rows=3, cols=2, subplot_titles=[
    "Wind vs Solar ",
    "Biomass vs MSW ",
    "Liquid biofuels vs Solar ",
    "Hydropower vs Wind ",
    "Cross-boundary vs Biodiesel "
])

fig.add_trace(go.Scatter(x=df['Wind, wave, tidal'], 
                         y=df['Solar photovoltaic'], 
                         mode='markers',
                         name="Wind vs Solar"), 
              row=1, col=1)

fig.add_trace(go.Scatter(x=df['Biomass'], 
                         y=df['Municipal solid waste (MSW)'], 
                         mode='markers',
                         name="Biomass vs MSW"), 
              row=1, col=2)

fig.add_trace(go.Scatter(x=df['Liquid bio-fuels'], 
                         y=df['Solar photovoltaic'], 
                         mode='markers',
                         name="Liquid biofuels vs Solar"), 
              row=2, col=1)

fig.add_trace(go.Scatter(x=df['Hydroelectric power'], 
                         y=df['Wind, wave, tidal'], 
                         mode='markers',
                         name="Hydropower vs Wind"), 
              row=2, col=2)

fig.add_trace(go.Scatter(x=df['Cross-boundary Adjustment'], 
                         y=df['Biodiesel'], 
                         mode='markers',
                         name="Cross-boundary vs Biodiesel"), 
              row=3, col=1)

fig.update_layout(height=900, width=1000, title_text="Correlation Scatter Plots")

fig.show()

The scatter plots above show the relationships between selected renewable sources, from which we can conclude that:
1. In 'Wind vs Solar' and 'Biomass vs MSW', the points generally form an upward trend, indicating a positive correlation. 
2. In 'Liquid Biofuels vs Solar', the points are scattered with no clear trend, suggesting a very weak or no correlation.
3. In 'Hydropower vs Wind', there appears to be a moderate positive correlation, but it's less strong than the Wind vs Solar or Biomass vs MSW plots.
4. In 'Cross-boundary vs Biodiesel', the points are very tightly grouped along a line showing a very strong positive correlation.

### 7. Growth Rates of different renewable sources

In [43]:
for source in ['Hydroelectric power',
       'Wind, wave, tidal', 'Solar photovoltaic', 'Geothermal aquifers',
       'Landfill gas', 'Sewage gas', 'Biogas from autogen',
       'Municipal solid waste (MSW)', 'Poultry litter', 'Straw', 'Wood',
       'Charcoal', 'Liquid bio-fuels', 'Bioethanol', 'Biodiesel', 'Biomass',
       'Cross-boundary Adjustment']:  # Add all sources
    df[f"{source}_growth_rate"] = df[source].pct_change() * 100

growth_rates_df = df[["Year"] + [f"{source}_growth_rate" for source in ['Hydroelectric power',
       'Wind, wave, tidal', 'Solar photovoltaic', 'Geothermal aquifers',
       'Landfill gas', 'Sewage gas', 'Biogas from autogen',
       'Municipal solid waste (MSW)', 'Poultry litter', 'Straw', 'Wood',
       'Charcoal', 'Liquid bio-fuels', 'Bioethanol', 'Biodiesel', 'Biomass',
       'Cross-boundary Adjustment']]]



In [44]:
fig = px.line(df, x='Year', y=[f"{source}_growth_rate" for source in ['Hydroelectric power',
       'Wind, wave, tidal', 'Solar photovoltaic', 'Geothermal aquifers',
       'Landfill gas', 'Sewage gas', 'Biogas from autogen',
       'Municipal solid waste (MSW)', 'Poultry litter', 'Straw', 'Wood',
       'Charcoal', 'Liquid bio-fuels', 'Bioethanol', 'Biodiesel', 'Biomass',
       'Cross-boundary Adjustment']], 
              title='Energy Consumption Growth Rate per Year',
              labels={'variable':'Renewable Resources'})

# Show the plot
fig.show()

From the growth rate analysis above:
1. Some sources exhibit sharp spikes in certain years, suggesting sudden expansions or changes in reporting/measurement.
2. For example, 'Solar photovoltaic', 'Biogas from autogen', 'Liquid bio-fuels', 'Bioethanol', 'Biodiesel', and 'Cross-boundary Adjustment' seem to start growing from around 2005.
3. Charcoal also shows a significant spike in the early 1990s, which might indicate a temporary increase in production or consumption.
4. The majority of energy sources like Sewage gas, Biomass, and Biogas maintain low and stable growth rates over time.
5. Sudden jumps in Liquid bio-fuels, Charcoal, and Straw suggest events such as policy changes, technological advancements or market disruptions.
6. From 2010 onward, most energy sources appear to have stabilized, with growth rates returning closer to zero or modest gains.
7. Cross-boundary Adjustment may reflect energy imports/exports, showing occasional fluctuations that align with changing trade conditions or international energy agreements.
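
One caveat with `pct_change()`: when a source starts at zero, the first non-zero year divides by zero and the growth rate becomes `inf`, which then distorts any average. A minimal sketch with a hypothetical series (the values are made up to mimic a source that starts at zero):

```python
import numpy as np
import pandas as pd

# Hypothetical series that starts at zero, like a source with no early consumption.
values = pd.Series([0.0, 0.0, 0.001, 0.002], name="example_source")

# 0 -> 0 gives NaN, 0 -> 0.001 gives inf, 0.001 -> 0.002 gives 100%.
growth = values.pct_change() * 100

# Treat infinite growth as missing so that summary statistics stay finite.
growth_clean = growth.replace([np.inf, -np.inf], np.nan)

print(growth.tolist())
print(growth_clean.mean())
```

Whether to drop, cap, or keep these first-year jumps is a judgment call; replacing them with NaN is just one option.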

### 8. Pie (donut) chart for average energy consumption of each renewable resources 

In [45]:
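# Mean consumption per source, selected by column label so the result does not
# depend on helper columns (such as 'sum' or the growth rates) added to df earlier
df_donut = df.loc[:, 'Hydroelectric power':'Cross-boundary Adjustment'].mean()

In [46]:
df_donut

Hydroelectric power            0.431323
Wind, wave, tidal              1.299968
Solar photovoltaic             0.209581
Geothermal aquifers            0.001000
Landfill gas                   1.023968
Sewage gas                     0.247387
Biogas from autogen            0.200161
Municipal solid waste (MSW)    1.118419
Poultry litter                 0.153613
Straw                          0.092194
Wood                           1.844097
Charcoal                       0.048226
Liquid bio-fuels               0.020419
Bioethanol                     0.169097
Biodiesel                      0.353065
Biomass                        0.594129
Cross-boundary Adjustment      0.005742
dtype: float64

In [47]:
fig = px.pie(values=df_donut.values, names=df_donut.index,
             hole=0.5, title='Donut Chart for Average Energy Consumption of Renewable Energy')
fig.show()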

This donut chart shows that:
1. Wood and Wind, wave, tidal combined account for over 40% of total consumption, showcasing their dominance.
2. The presence of smaller contributors (like Solar photovoltaic and Bioethanol) reflects the diversified nature of renewable energy strategies.
3. The mix shows a blend of both traditional (e.g., Wood, Biomass) and modern sources (e.g., Solar, Wind).
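
The "over 40%" figure can be checked directly from the per-source means. This sketch copies the means from the describe() output earlier in the notebook; the variable names are mine.

```python
import pandas as pd

# Mean consumption per source, copied from the describe() output earlier in the notebook.
means = pd.Series({
    "Hydroelectric power": 0.431323, "Wind, wave, tidal": 1.299968,
    "Solar photovoltaic": 0.209581, "Geothermal aquifers": 0.001,
    "Landfill gas": 1.023968, "Sewage gas": 0.247387,
    "Biogas from autogen": 0.200161, "Municipal solid waste (MSW)": 1.118419,
    "Poultry litter": 0.153613, "Straw": 0.092194, "Wood": 1.844097,
    "Charcoal": 0.048226, "Liquid bio-fuels": 0.020419, "Bioethanol": 0.169097,
    "Biodiesel": 0.353065, "Biomass": 0.594129, "Cross-boundary Adjustment": 0.005742,
})

# Each source's share of the summed means, which is what the donut chart displays.
shares = means / means.sum()
top_two = shares["Wood"] + shares["Wind, wave, tidal"]

print(shares.sort_values(ascending=False).head(3))
print(f"Wood + Wind, wave, tidal: {top_two:.1%}")
```

Because these are averages over all 31 years, the shares describe the whole period rather than the mix in any single year.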

### 9. Shifts in renewable energy consumption over the decades

In [48]:

df['Decade'] = (df['Year'].dt.year // 10) * 10  # Group years by decade

df_decade = df.groupby('Decade')[['Hydroelectric power',
       'Wind, wave, tidal', 'Solar photovoltaic', 'Geothermal aquifers',
       'Landfill gas', 'Sewage gas', 'Biogas from autogen',
       'Municipal solid waste (MSW)', 'Poultry litter', 'Straw', 'Wood',
       'Charcoal', 'Liquid bio-fuels', 'Bioethanol', 'Biodiesel', 'Biomass',
       'Cross-boundary Adjustment']].mean()




In [49]:
fig = px.area(df_decade, x=df_decade.index, y=df_decade.columns, 
              title="Shifts in Energy Mix Over Decades", 
              labels={'value': 'Energy (units)', 'variable': 'Energy Source'})

fig.update_layout(
    xaxis=dict(
        tickvals=[1990, 2000, 2010, 2020], 
        ticktext=["1990", "2000", "2010", "2020"]  
    )
)

fig.show()

This stacked area chart shows how the renewable energy mix evolved across decades. Note that the 2020 group contains only the year 2020. The main observations are:
1. A steady increase in total renewable energy consumption.
2. Wood and municipal solid waste (MSW) have remained consistent energy sources over time.
3. Bioethanol, biodiesel, and biogas from autogen have grown significantly, especially after 2000.
4. Cross-boundary adjustment has seen a notable rise in recent decades.
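
When reading the decade averages, it helps to know how many years fall into each group. A small sketch using the same integer-division grouping as above, applied to a plain range of years (no dataset needed):

```python
import pandas as pd

# The years covered by the dataset: 1990 through 2020 inclusive.
years = pd.Series(range(1990, 2021), name="year")

# Same grouping rule as in the notebook: floor each year to its decade.
decade = (years // 10) * 10

# Number of years contributing to each decade's mean.
years_per_decade = decade.value_counts().sort_index()
print(years_per_decade)
```

The last group is an average over a single year, so it is not directly comparable with the three full decades.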

## E. Forecast

### Using Prophet to forecast for next 10 years

In [50]:
from prophet import Prophet

#### Forecast for Total energy consumption of primary fuels and equivalents

In [51]:

df_total_energy_primary_fuels = df[['Year','Total energy consumption of primary fuels and equivalents']].rename(
    columns={'Year': 'ds', 'Total energy consumption of primary fuels and equivalents': 'y'}
)  # rename returns a new DataFrame, which avoids the SettingWithCopyWarning

# Initialize Prophet model
model = Prophet()
model.fit(df_total_energy_primary_fuels)

# 'YS' = year start, matching the 1 January dates in ds
future = model.make_future_dataframe(periods=10, freq='YS')

forecast = model.predict(future)
forecast

fig = px.line(forecast, x='ds', y='yhat', labels={'ds': 'Year', 'yhat': 'Predicted Energy Consumption'},
              title="Total Energy Consumption Forecast (Next 10 Years)")

fig.add_scatter(x=df_total_energy_primary_fuels['ds'], y=df_total_energy_primary_fuels['y'], mode='markers', name='Actual Data')

fig.show()

19:41:24 - cmdstanpy - INFO - Chain [1] start processing
19:41:25 - cmdstanpy - INFO - Chain [1] done processing


Here the scatter points denote the actual data and the blue line denotes the prediction for "Total energy consumption of primary fuels and equivalents", which closely follows the actual data.
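
A note on the forecast dates: the historical rows are stamped 1 January of each year, while the pandas alias 'Y' denotes year end (31 December). The sketch below uses pandas only (not Prophet) to show the ten future dates that a year-start frequency gives after the last observation; the variable names are mine.

```python
import pandas as pd

# Last observed timestamp in the dataset after the datetime conversion.
last_observed = pd.Timestamp("2020-01-01")

# 'YS' (year start) keeps future stamps on 1 January, like the historical rows.
# The first generated date is the last observation itself, so it is dropped.
future_dates = pd.date_range(start=last_observed, periods=11, freq="YS")[1:]

print(future_dates[0].date(), future_dates[-1].date())
```

Keeping the future dates on the same day of the year as the history makes the forecast line up with the existing points on the chart.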


#### Let's fine-tune it to get a more precise forecast

In [52]:
model = Prophet(
    yearly_seasonality=True,
    changepoint_prior_scale=0.1,  # More trend flexibility
    changepoints=['1990-01-01','1995-01-01','2001-01-01', '2005-01-01','2009-01-01','2014-01-01','2018-01-01', '2020-01-01'],  # Important changes
)

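# Note: Prophet measures period in days, so period=1 adds a daily cycle, not a yearly one
model.add_seasonality(name='custom_yearly', period=1, fourier_order=10)

model.fit(df_total_energy_primary_fuels)

future = model.make_future_dataframe(periods=10, freq='YS')  # 'YS' = year start, matching the 1 January dates in ds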

forecast = model.predict(future)
forecast

fig = px.line(forecast, x='ds', y='yhat', labels={'ds': 'Year', 'yhat': 'Predicted Energy Consumption'},
              title="Total Energy Consumption Forecast (Next 10 Years)")

fig.add_scatter(x=df_total_energy_primary_fuels['ds'], y=df_total_energy_primary_fuels['y'], mode='markers', name='Actual Data')

fig.show()

19:41:26 - cmdstanpy - INFO - Chain [1] start processing
19:41:26 - cmdstanpy - INFO - Chain [1] done processing


The tuned forecast looks similar to the default one.

#### Forecast for Energy from renewable & waste sources

In [53]:

df_renewable_energy = df[['Year','Energy from renewable & waste sources']].rename(
    columns={'Year': 'ds', 'Energy from renewable & waste sources': 'y'}
)  # rename returns a new DataFrame, which avoids the SettingWithCopyWarning

# Initialize Prophet model
model = Prophet()
model.fit(df_renewable_energy)

# 'YS' = year start, matching the 1 January dates in ds
future = model.make_future_dataframe(periods=10, freq='YS')

forecast = model.predict(future)
forecast

fig = px.line(forecast, x='ds', y='yhat', labels={'ds': 'Year', 'yhat': 'Predicted Energy Consumption'},
              title="Energy from renewable & waste sources Forecast (Next 10 Years)")

fig.add_scatter(x=df_renewable_energy['ds'], y=df_renewable_energy['y'], mode='markers', name='Actual Data')

fig.show()

19:41:26 - cmdstanpy - INFO - Chain [1] start processing
19:41:26 - cmdstanpy - INFO - Chain [1] done processing


Here the actual data follows an upward-curving trend, which the default forecast, with its repeating seasonal pattern, does not match.

#### Let's fine-tune it to get a more precise forecast

In [54]:
model = Prophet(
    yearly_seasonality=True,
    changepoint_prior_scale=0.1,  # More trend flexibility
    changepoints=['1990-01-01','2000-01-01', '2005-01-01','2010-01-01','2015-01-01','2016-01-01', '2020-01-01'],  # Important changes
)

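# Note: Prophet measures period in days, so period=1 adds a daily cycle, not a yearly one
model.add_seasonality(name='custom_yearly', period=1, fourier_order=10)

model.fit(df_renewable_energy)

future = model.make_future_dataframe(periods=10, freq='YS')  # 'YS' = year start, matching the 1 January dates in ds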

forecast = model.predict(future)
forecast

fig = px.line(forecast, x='ds', y='yhat', labels={'ds': 'Year', 'yhat': 'Predicted Energy Consumption'},
              title="Energy from renewable & waste sources Forecast (Next 10 Years)")

fig.add_scatter(x=df_renewable_energy['ds'], y=df_renewable_energy['y'], mode='markers', name='Actual Data')

fig.show()

19:41:26 - cmdstanpy - INFO - Chain [1] start processing
19:41:26 - cmdstanpy - INFO - Chain [1] done processing


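Now the forecast follows the actual data closely, although with only 31 annual points such a close fit may partly reflect overfitting.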
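
"Looks accurate" can be made measurable by holding out the last few years, fitting on the rest, and scoring the predictions. The sketch below illustrates the idea on the ten renewable totals displayed at the top of the notebook (1990 to 1999). Note the assumption: it uses a straight-line fit from `np.polyfit` as a stand-in model, not Prophet, so the error it prints says nothing about the Prophet forecasts above.

```python
import numpy as np

# Renewable totals for 1990-1999, as displayed in the first table of this notebook.
years = np.arange(1990, 2000)
y = np.array([1.647, 1.634, 1.843, 1.862, 2.528, 2.644, 2.581, 2.611, 3.013, 3.148])

# Hold out the last three years; fit only on the earlier ones.
train_years, test_years = years[:-3], years[-3:]
train_y, test_y = y[:-3], y[-3:]

# Stand-in model (assumption): a straight line via np.polyfit, not Prophet.
slope, intercept = np.polyfit(train_years - 1990, train_y, deg=1)
pred = slope * (test_years - 1990) + intercept

# Mean absolute error on the held-out years.
mae = float(np.mean(np.abs(pred - test_y)))
print(f"Holdout MAE: {mae:.3f}")
```

The same split could be applied to a Prophet model by fitting on the earlier rows and comparing `yhat` with the held-out values.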

#### Forecast for Wind, wave, tidal

In [55]:

df_wind = df[['Year','Wind, wave, tidal']].rename(
    columns={'Year': 'ds', 'Wind, wave, tidal': 'y'}
)  # rename returns a new DataFrame, which avoids the SettingWithCopyWarning

# Initialize Prophet model
model = Prophet()
model.fit(df_wind)

# 'YS' = year start, matching the 1 January dates in ds
future = model.make_future_dataframe(periods=10, freq='YS')

forecast = model.predict(future)
forecast

fig = px.line(forecast, x='ds', y='yhat', labels={'ds': 'Year', 'yhat': 'Predicted Energy Consumption'},
              title="Wind, wave, tidal Forecast (Next 10 Years)")

fig.add_scatter(x=df_wind['ds'], y=df_wind['y'], mode='markers', name='Actual Data')

fig.show()

19:41:26 - cmdstanpy - INFO - Chain [1] start processing
19:41:26 - cmdstanpy - INFO - Chain [1] done processing


The actual data follows an upward-curving trend.

#### Let's fine-tune it to get a more precise forecast

In [56]:
model = Prophet(
    yearly_seasonality=True,
    changepoint_prior_scale=0.1,  # More trend flexibility
    changepoints=['1990-01-01','2003-01-01', '2005-01-01','2010-01-01','2014-01-01','2016-01-01', '2020-01-01'],  # Important changes
)

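# Note: Prophet measures period in days, so period=1 adds a daily cycle, not a yearly one
model.add_seasonality(name='custom_yearly', period=1, fourier_order=10)

model.fit(df_wind)

future = model.make_future_dataframe(periods=10, freq='YS')  # 'YS' = year start, matching the 1 January dates in ds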

forecast = model.predict(future)
forecast

fig = px.line(forecast, x='ds', y='yhat', labels={'ds': 'Year', 'yhat': 'Predicted Energy Consumption'},
              title="Wind, wave, tidal Forecast (Next 10 Years)")

fig.add_scatter(x=df_wind['ds'], y=df_wind['y'], mode='markers', name='Actual Data')

fig.show()

19:41:26 - cmdstanpy - INFO - Chain [1] start processing
19:41:27 - cmdstanpy - INFO - Chain [1] done processing


The tuned forecast now follows the actual data closely.

#### Forecast for Hydroelectric power

In [57]:

df_Hydroelectric_power = df[['Year','Hydroelectric power']].rename(
    columns={'Year': 'ds', 'Hydroelectric power': 'y'}
)  # rename returns a new DataFrame, which avoids the SettingWithCopyWarning

# Initialize Prophet model
model = Prophet()
model.fit(df_Hydroelectric_power)

# 'YS' = year start, matching the 1 January dates in ds
future = model.make_future_dataframe(periods=10, freq='YS')

forecast = model.predict(future)
forecast

fig = px.line(forecast, x='ds', y='yhat', labels={'ds': 'Year', 'yhat': 'Predicted Energy Consumption'},
              title="Hydroelectric power Forecast (Next 10 Years)")

fig.add_scatter(x=df_Hydroelectric_power['ds'], y=df_Hydroelectric_power['y'], mode='markers', name='Actual Data')

fig.show()

19:41:27 - cmdstanpy - INFO - Chain [1] start processing
19:41:27 - cmdstanpy - INFO - Chain [1] done processing


#### Let's fine-tune it to get a more precise forecast

In [58]:
model = Prophet(
    yearly_seasonality=True,
    changepoint_prior_scale=0.1,  # More trend flexibility
    # changepoints=['1990-01-01','1991-01-01','1992-01-01','1993-01-01','1994-01-01','1996-01-01','1999-01-01','2001-01-01','2002-01-01','2003-01-01','2004-01-01', '2010-01-01','2011-01-01','2013-01-01','2015-01-01','2016-01-01','2017-01-01','2018-01-01', '2020-01-01'],  # Important changes
    changepoints=['1990-01-01','1992-01-01','1999-01-01','2011-01-01','2015-01-01','2018-01-01', '2020-01-01'],  # Important changes
)

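# Note: Prophet measures period in days, so period=1 adds a daily cycle, not a yearly one
model.add_seasonality(name='custom_yearly', period=1, fourier_order=10)

model.fit(df_Hydroelectric_power)

future = model.make_future_dataframe(periods=10, freq='YS')  # 'YS' = year start, matching the 1 January dates in ds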

forecast = model.predict(future)
forecast

fig = px.line(forecast, x='ds', y='yhat', labels={'ds': 'Year', 'yhat': 'Predicted Energy Consumption'},
              title="Hydroelectric power Forecast (Next 10 Years)")

fig.add_scatter(x=df_Hydroelectric_power['ds'], y=df_Hydroelectric_power['y'], mode='markers', name='Actual Data')

fig.show()

19:41:27 - cmdstanpy - INFO - Chain [1] start processing
19:41:27 - cmdstanpy - INFO - Chain [1] done processing


#### Forecast for Charcoal

In [59]:

df_Charcoal = df[['Year','Charcoal']].rename(
    columns={'Year': 'ds', 'Charcoal': 'y'}
)  # rename returns a new DataFrame, which avoids the SettingWithCopyWarning

# Initialize Prophet model
model = Prophet()
model.fit(df_Charcoal)

# 'YS' = year start, matching the 1 January dates in ds
future = model.make_future_dataframe(periods=10, freq='YS')

forecast = model.predict(future)
forecast

fig = px.line(forecast, x='ds', y='yhat', labels={'ds': 'Year', 'yhat': 'Predicted Energy Consumption'},
              title="Charcoal Forecast (Next 10 Years)")

fig.add_scatter(x=df_Charcoal['ds'], y=df_Charcoal['y'], mode='markers', name='Actual Data')

fig.show()

19:41:27 - cmdstanpy - INFO - Chain [1] start processing
19:41:27 - cmdstanpy - INFO - Chain [1] done processing
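The same two-column `ds`/`y` frame is built for every energy source. A minimal sketch of a helper that does this in one chained call is shown below. Chaining `rename` without `inplace=True` returns a new frame, which is the pattern that avoids pandas' `SettingWithCopyWarning`. The helper name `to_prophet_frame` and the small `demo` frame are hypothetical; the three Charcoal values are copied from the first rows of the dataset.

```python
import pandas as pd

def to_prophet_frame(df, column):
    """Return a new frame with columns ds/y, leaving `df` untouched."""
    return df[["Year", column]].rename(columns={"Year": "ds", column: "y"})

# Tiny stand-in for the notebook's df (first three rows of Year and Charcoal).
demo = pd.DataFrame({"Year": [1990, 1991, 1992], "Charcoal": [0.039, 0.036, 0.033]})

df_charcoal_demo = to_prophet_frame(demo, "Charcoal")
print(df_charcoal_demo)
```

With the real data this would be called as `to_prophet_frame(df, 'Charcoal')`, `to_prophet_frame(df, 'Landfill gas')`, and so on.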


#### Let's fine-tune it to get a more precise forecast

In [60]:
model = Prophet(
    yearly_seasonality=True,
    changepoint_prior_scale=0.1,  # More trend flexibility
    # changepoints=['1990-01-01','1991-01-01','1992-01-01','1993-01-01','1994-01-01','1996-01-01','1999-01-01','2001-01-01','2002-01-01','2003-01-01','2004-01-01', '2010-01-01','2011-01-01','2013-01-01','2015-01-01','2016-01-01','2017-01-01','2018-01-01', '2020-01-01'],  # Important changes
    changepoints=['1990-01-01','1992-01-01','1996-01-01','2000-01-01','2004-01-01','2008-01-01','2010-01-01','2013-01-01','2016-01-01','2018-01-01', '2020-01-01'],  # Important changes
)

model.add_seasonality(name='custom_yearly', period=1, fourier_order=10)  # Note: period is in days, so this adds a 1-day cycle, not a yearly one

model.fit(df_Charcoal)

future = model.make_future_dataframe(periods=10, freq='Y')

forecast = model.predict(future)
forecast

fig = px.line(forecast, x='ds', y='yhat', labels={'ds': 'Year', 'yhat': 'Predicted Energy Consumption'},
              title="Charcoal Forecast (Next 10 Years)")

fig.add_scatter(x=df_Charcoal['ds'], y=df_Charcoal['y'], mode='markers', name='Actual Data')

fig.show()

19:41:27 - cmdstanpy - INFO - Chain [1] start processing
19:41:27 - cmdstanpy - INFO - Chain [1] done processing
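One detail to check in these cells is `freq='Y'`. In pandas this alias means year-end (newer pandas versions deprecate it in favour of `'YE'`), so future rows land on 31 December. The sketch below compares year-end and year-start dates generated after a last observation of 2020-01-01. That last date is an assumption based on the changepoint strings, and `future_dates` is a hypothetical helper that only imitates the idea of extending past the last training date.

```python
import pandas as pd

last_obs = pd.Timestamp("2020-01-01")  # assumed last training date

def future_dates(freq, periods=3):
    """Dates strictly after `last_obs`, spaced by the given pandas offset."""
    dates = pd.date_range(start=last_obs, periods=periods + 1, freq=freq)
    return dates[dates > last_obs][:periods]

# Offset objects are used instead of string aliases so this runs on any pandas version.
year_end = future_dates(pd.offsets.YearEnd())
year_start = future_dates(pd.offsets.YearBegin())

print("year-end:  ", list(year_end.strftime("%Y-%m-%d")))
print("year-start:", list(year_start.strftime("%Y-%m-%d")))
```

If `ds` really does hold 1 January dates, passing `freq='YS'` to `make_future_dataframe` would keep the forecast rows on the same day of the year as the training rows.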


#### Forecast for Landfill gas

In [61]:

# rename() without inplace returns a new frame, which avoids SettingWithCopyWarning
df_Landfill_gas = df[['Year','Landfill gas']].rename(columns={'Year': 'ds', 'Landfill gas': 'y'})

# Initialize Prophet model
model = Prophet()
model.fit(df_Landfill_gas)

future = model.make_future_dataframe(periods=10, freq='Y')

forecast = model.predict(future)
forecast

fig = px.line(forecast, x='ds', y='yhat', labels={'ds': 'Year', 'yhat': 'Predicted Energy Consumption'},
              title="Landfill gas Forecast (Next 10 Years)")

fig.add_scatter(x=df_Landfill_gas['ds'], y=df_Landfill_gas['y'], mode='markers', name='Actual Data')

fig.show()

19:41:27 - cmdstanpy - INFO - Chain [1] start processing
19:41:28 - cmdstanpy - INFO - Chain [1] done processing


The actual data grows at first and then declines from 2011 onward, but the prediction shows a consistent upward trend.

#### Let's fine-tune it to get a more precise forecast

In [62]:
model = Prophet(
    yearly_seasonality=True,
    changepoint_prior_scale=0.1,  # More trend flexibility
    # changepoints=['1990-01-01','1991-01-01','1992-01-01','1993-01-01','1994-01-01','1996-01-01','1999-01-01','2001-01-01','2002-01-01','2003-01-01','2004-01-01', '2010-01-01','2011-01-01','2013-01-01','2015-01-01','2016-01-01','2017-01-01','2018-01-01', '2020-01-01'],  # Important changes
    changepoints=['1990-01-01','1995-01-01','1997-01-01','2000-01-01','2002-01-01','2004-01-01','2008-01-01','2011-01-01','2016-01-01','2019-01-01', '2020-01-01'],  # Important changes
)

model.add_seasonality(name='custom_yearly', period=1, fourier_order=10)  # Note: period is in days, so this adds a 1-day cycle, not a yearly one

model.fit(df_Landfill_gas)

future = model.make_future_dataframe(periods=10, freq='Y')

forecast = model.predict(future)
forecast

fig = px.line(forecast, x='ds', y='yhat', labels={'ds': 'Year', 'yhat': 'Predicted Energy Consumption'},
              title="Landfill gas Forecast (Next 10 Years)")

fig.add_scatter(x=df_Landfill_gas['ds'], y=df_Landfill_gas['y'], mode='markers', name='Actual Data')

fig.show()

19:41:28 - cmdstanpy - INFO - Chain [1] start processing
19:41:28 - cmdstanpy - INFO - Chain [1] done processing


The forecast shows no change even after fine-tuning.
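Comparing the default and tuned forecasts by eye is hard, so a number can help. The sketch below holds out the last three of the first ten Landfill gas values (1990 to 1999, copied from the table at the top of the notebook) and scores two candidate trends with mean absolute error. The polynomial fits are only stand-ins for the two Prophet models, and `holdout_mae` is a hypothetical helper. With real forecasts, the `yhat` values for the held-out years would take the place of `predicted`.

```python
import numpy as np

# Landfill gas values for 1990-1999, taken from the dataset preview.
years = np.arange(1990, 2000)
landfill = np.array([0.08, 0.105, 0.155, 0.162, 0.188, 0.199, 0.249, 0.317, 0.402, 0.572])

train, test = slice(0, 7), slice(7, 10)  # fit on 1990-1996, score on 1997-1999

def holdout_mae(degree):
    """Fit a polynomial trend on the training years and score it on the held-out years."""
    x = years - years[0]
    coeffs = np.polyfit(x[train], landfill[train], deg=degree)
    predicted = np.polyval(coeffs, x[test])
    return float(np.mean(np.abs(landfill[test] - predicted)))

mae_linear = holdout_mae(1)
mae_quadratic = holdout_mae(2)
print(f"holdout MAE, linear trend:    {mae_linear:.3f}")
print(f"holdout MAE, quadratic trend: {mae_quadratic:.3f}")
```

A tuned model that does not lower the held-out error is no more precise than the default one, whatever its in-sample fit looks like.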

#### Forecast for Bioethanol

In [63]:

# rename() without inplace returns a new frame, which avoids SettingWithCopyWarning
df_Bioethanol = df[['Year','Bioethanol']].rename(columns={'Year': 'ds', 'Bioethanol': 'y'})

# Initialize Prophet model
model = Prophet()
model.fit(df_Bioethanol)

future = model.make_future_dataframe(periods=10, freq='Y')

forecast = model.predict(future)
forecast

fig = px.line(forecast, x='ds', y='yhat', labels={'ds': 'Year', 'yhat': 'Predicted Energy Consumption'},
              title="Bioethanol Forecast (Next 10 Years)")

fig.add_scatter(x=df_Bioethanol['ds'], y=df_Bioethanol['y'], mode='markers', name='Actual Data')

fig.show()

19:41:28 - cmdstanpy - INFO - Chain [1] start processing
19:41:28 - cmdstanpy - INFO - Chain [1] done processing


#### Let's fine-tune it to get a more precise forecast

In [64]:
model = Prophet(
    yearly_seasonality=True,
    changepoint_prior_scale=0.1,  # More trend flexibility
    # changepoints=['1990-01-01','1991-01-01','1992-01-01','1993-01-01','1994-01-01','1996-01-01','1999-01-01','2001-01-01','2002-01-01','2003-01-01','2004-01-01', '2010-01-01','2011-01-01','2013-01-01','2015-01-01','2016-01-01','2017-01-01','2018-01-01', '2020-01-01'],  # Important changes
    changepoints=['1990-01-01','2004-01-01','2013-01-01', '2020-01-01'],  # Important changes
)
model.fit(df_Bioethanol)

future = model.make_future_dataframe(periods=10, freq='Y')

forecast_Bioethanol = model.predict(future)

fig = px.line(forecast_Bioethanol, x='ds', y='yhat', labels={'ds': 'Year', 'yhat': 'Predicted Energy Consumption'},
              title="Bioethanol Forecast (Next 10 Years)")

fig.add_scatter(x=df_Bioethanol['ds'], y=df_Bioethanol['y'], mode='markers', name='Actual Data')

fig.show()

19:41:28 - cmdstanpy - INFO - Chain [1] start processing
19:41:28 - cmdstanpy - INFO - Chain [1] done processing
