# Brief description of the dataset

We will work with two main datasets and combine them:

- Total amount of power generated through time by energy type.
- Price of power.

First, we load the modules that this program needs:

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Next, we load the datasets produced by our JS application. The data is split into one file per year, so the years have to be combined:

In [4]:
years = ['2017', '2018', '2019', '2020', '2021']

# pd.read_json is the public entry point for reading JSON files
EnergyGeneration_All = pd.read_json('./data/genTypes/energy_generation_type_date2016.json')
for x in years:
    pathEnergy = './data/genTypes/energy_generation_type_date' + x + '.json'
    newDF = pd.read_json(pathEnergy).round()
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    EnergyGeneration_All = pd.concat([EnergyGeneration_All, newDF])

EnergyGeneration_All.tail()
# Replace any missing values with 0
EnergyGeneration_All.fillna(0, inplace=True)

del(newDF, pathEnergy, x, years)

Unnamed: 0,Hidráulica,Turbinación bombeo,Nuclear,Carbón,Fuel + Gas,Motores diésel,Turbina de gas,Turbina de vapor,Ciclo combinado,Hidroeólica,Eólica,Solar fotovoltaica,Solar térmica,Otras renovables,Cogeneración,Residuos no renovables,Residuos renovables,Generación total
2016-01-01,31642.298,9886.914,169843.416,30064.305,0.0,8304.632,1501.270,6071.229,61174.055,11.714,202916.119,9021.566,695.552,7543.142,44030.379,4525.3350,1754.8050,588986.731
2016-01-02,35554.520,24261.468,169614.296,32070.636,0.0,8635.330,1718.746,7339.712,56294.848,13.572,235804.590,12635.816,1260.093,7876.332,53249.260,4571.3020,1613.1100,652513.631
2016-01-03,35920.392,12502.102,164994.879,27063.031,0.0,8490.337,1616.735,6738.157,51333.835,20.916,311377.459,5096.597,95.938,7565.244,55824.967,5568.9755,1662.9085,695872.473
2016-01-04,65278.775,30484.420,145859.298,41511.320,0.0,8897.895,2242.070,7886.107,59893.786,28.387,258510.870,4849.798,69.916,7953.584,64904.942,5928.2040,2133.3190,706432.691
2016-01-05,79728.796,20729.332,146050.955,43830.176,0.0,8740.527,2086.604,7909.799,50045.608,33.797,290893.912,12562.126,1432.872,7944.637,66781.178,6287.7865,2474.5605,747532.666
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021-11-12,41847.000,7652.000,119416.000,19585.000,0.0,7113.000,677.000,3623.000,304183.000,41.000,87048.000,48765.000,8161.000,14025.000,77054.000,6077.0000,2846.0000,748114.000
2021-11-13,28461.000,1921.000,119204.000,19499.000,0.0,7162.000,579.000,3727.000,201214.000,25.000,127559.000,50885.000,9218.000,14362.000,72555.000,5882.0000,2780.0000,665034.000
2021-11-14,27153.000,1401.000,119008.000,19174.000,0.0,6949.000,412.000,3434.000,128111.000,14.000,221044.000,55100.000,9539.000,15024.000,69138.000,5834.0000,2784.0000,684119.000
2021-11-15,34820.000,5511.000,118811.000,19412.000,0.0,7341.000,813.000,3282.000,172103.000,6.000,270822.000,53224.000,8679.000,15879.000,73203.000,5757.0000,2618.0000,792280.000
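
To illustrate why the `fillna(0)` step follows the concatenation, here is a minimal sketch with two made-up yearly frames. The frames, their values and the variable names are hypothetical, and it is only an assumption that some yearly files lack columns that others have:

```python
import pandas as pd

# Hypothetical stand-ins for two yearly files; the second one lacks 'Carbón'
gen2016 = pd.DataFrame({'Eólica': [202916.1], 'Carbón': [30064.3]},
                       index=pd.to_datetime(['2016-01-01']))
gen2017 = pd.DataFrame({'Eólica': [150000.0]},
                       index=pd.to_datetime(['2017-01-01']))

# Stack the years; columns missing from one frame are filled with NaN
combined = pd.concat([gen2016, gen2017])
missingBefore = int(combined['Carbón'].isna().sum())

# Treat the missing values as zero generation
combined = combined.fillna(0)
```

After the concatenation `missingBefore` counts the gaps that `fillna(0)` then replaces with zeros.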


From this dataframe we are only interested in the totals of renewable vs. non-renewable generation:

In [3]:
energySummary = pd.DataFrame()

# Columns that count as renewable generation
renewableCols = ['Hidráulica', 'Hidroeólica', 'Eólica', 'Solar fotovoltaica',
                 'Solar térmica', 'Otras renovables', 'Residuos renovables']
energySummary['Renewable'] = EnergyGeneration_All[renewableCols].sum(axis=1)
energySummary['Non_Renewable'] = EnergyGeneration_All['Generación total'] - energySummary['Renewable']

energySummary.tail()

del(EnergyGeneration_All)

Next, we import the energy prices, which are also daily.
The dates are in European format (day/month/year) and are not even interpreted as dates when the file is read. Let's convert them and set them as the index, so that it looks the same as the other dataframe:

In [4]:

pathPrices = './data/energy_price.csv'

historyPrice = pd.read_csv(pathPrices, decimal=',')
historyPrice['Fecha'] = pd.to_datetime(historyPrice['Fecha'], format='%d/%m/%Y')
historyPrice.set_index('Fecha', inplace=True)

historyPrice.head()

del(pathPrices)
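
As a small illustration of this conversion, the sketch below parses a made-up two-row price table. The inline data, the semicolon separator and the variable names are assumptions chosen for the example; only the `decimal=','` option and the `%d/%m/%Y` format come from the cell above:

```python
import io
import pandas as pd

# Hypothetical file content: European dates and decimal commas
raw = "Fecha;Precio\n04/01/2021;59,85\n05/01/2021;67,55\n"

# decimal=',' turns '59,85' into the float 59.85
toyPrice = pd.read_csv(io.StringIO(raw), sep=';', decimal=',')

# An explicit format avoids reading 04/01 as April 1st
toyPrice['Fecha'] = pd.to_datetime(toyPrice['Fecha'], format='%d/%m/%Y')
toyPrice.set_index('Fecha', inplace=True)
```

Passing the format explicitly matters for dates such as 04/01/2021, which could otherwise be read month-first.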

Now we also load the CO2 prices, another feature that may affect the energy price.

In [5]:
pathCarbonPrices = './data/carbon_price.csv'

carbonPrice = pd.read_csv(pathCarbonPrices, decimal=',')
carbonPrice['Fecha'] = pd.to_datetime(carbonPrice['Fecha'], format='%d.%m.%Y')
carbonPrice.set_index('Fecha', inplace=True)
carbonPrice.head()

del (pathCarbonPrices)

Finally, we read a dataset that records which type of energy was dominant in each hour.

In [6]:
pathDomTypes = './data/dominantTypes/DomTypes_2021.csv'
domTypePerHour = pd.read_csv(pathDomTypes, header=3)
domTypePerHour['Dia']

domTypePerHour['Dia'] = pd.to_datetime(domTypePerHour['Dia'], format='%d/%m/%y')
domTypePerHour.set_index('Dia', inplace=True)

domTypePerHour.head()
del(pathDomTypes)

We now compute, for each day, the fraction of its 24 hours in which each energy type was dominant.


In [9]:
domTypeDailyRatio = pd.DataFrame(index=domTypePerHour.index)

for Fecha, datos in domTypePerHour.iterrows():
    # Count in how many hours of the day each type was dominant
    recuento = datos.value_counts()
    # Series.iteritems was removed in pandas 2.0; items() replaces it
    for index, value in recuento.items():
        domTypeDailyRatio.loc[Fecha, index] = value/24
        # Debug output: report days where one of the hourly values is '0'
        if (index == '0'):
            print(recuento)
            print(index)
            print(value)

domTypeDailyRatio.fillna(0, inplace=True)
del(value, index, recuento, Fecha, datos)


HI    13
RE     9
BG     1
0      1
Name: 2021-03-28 00:00:00, dtype: int64
0
1
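
The same daily ratios can probably be obtained without explicit loops. The following is only a sketch on a made-up frame with four "hours" per day instead of 24; the data and the names `hours`, `nHours` and `dailyRatio` are hypothetical:

```python
import pandas as pd

# Hypothetical hourly table: one row per day, one column per hour
hours = pd.DataFrame(
    [['HI', 'HI', 'RE', 'BG'],
     ['RE', 'RE', 'RE', 'HI']],
    index=pd.to_datetime(['2021-01-04', '2021-01-05']),
    columns=[1, 2, 3, 4],
)

nHours = hours.shape[1]

# Count the labels in each row, then turn the counts into fractions of the day
dailyRatio = hours.apply(pd.Series.value_counts, axis=1).fillna(0) / nHours
```

Each row of `dailyRatio` sums to 1, with one column per type code found in the data.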


Now we are ready to merge the data, using the dates as the index.
The merge is an inner join, so the result keeps only the rows whose date is present in every dataframe.

In [16]:
# All four objects are indexed by date, so every merge joins index to index
AllSummary = energySummary\
    .merge(historyPrice['Precio'].rename('Energy price'), left_index=True, right_index=True)\
    .merge(carbonPrice['Último'].rename('CO2 ton price'), left_index=True, right_index=True)\
    .merge(domTypeDailyRatio, left_index=True, right_index=True)
AllSummary['year'] = AllSummary.index.year
AllSummary.head()

Unnamed: 0,Renewable,Non_Renewable,Energy price,CO2 ton price,HI,RE,BG,TCC,TER,MIP,0,year
2021-01-04,338858.0,443010.0,59.85,3369,0.416667,0.166667,0.416667,0.0,0.0,0.0,0.0,2021
2021-01-05,291895.0,436704.0,67.55,3296,0.416667,0.166667,0.25,0.083333,0.0,0.0,0.0,2021
2021-01-06,273265.0,407474.0,70.6,3363,0.666667,0.041667,0.208333,0.083333,0.0,0.0,0.0,2021
2021-01-07,281307.0,486273.0,88.93,3476,0.333333,0.166667,0.083333,0.041667,0.041667,0.0,0.0,2021
2021-01-08,393578.0,416164.0,94.99,3492,0.75,0.0,0.083333,0.0,0.166667,0.0,0.0,2021
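
To see how the inner join drops dates, here is a minimal sketch with two made-up objects whose date ranges only partly overlap. The values and the names `generation`, `price`, `inner` and `outer` are hypothetical:

```python
import pandas as pd

# Hypothetical generation data for three days
generation = pd.DataFrame(
    {'Renewable': [10.0, 20.0, 30.0]},
    index=pd.to_datetime(['2021-01-03', '2021-01-04', '2021-01-05']),
)

# Hypothetical prices for only two of those days
price = pd.Series(
    [59.85, 67.55],
    index=pd.to_datetime(['2021-01-04', '2021-01-05']),
    name='Energy price',
)

# Default how='inner': only dates present on both sides survive
inner = generation.merge(price, left_index=True, right_index=True)

# how='outer' keeps every date and shows the gaps as NaN
outer = generation.merge(price, left_index=True, right_index=True, how='outer')
```

Comparing `len(inner)` with `len(outer)` is a quick way to check how many days are lost in the merge.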
