# Brief description of the dataset

I am going to work with two datasets and combine them:

- Total amount of power generated through time by energy type.
- Price of power.

First we are going to load the necessary modules for this program to work:

In [2]:
import sys
!{sys.executable} -m pip install pandas scikit-learn matplotlib seaborn

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns



Next, we will load the datasets produced by our JS application. The generation data comes as one file per year, so the different years need to be combined:

In [48]:
years = ['2016', '2017', '2018', '2019', '2020', '2021']
EnergyGeneration_All = pd.DataFrame()

for x in years:
    pathEnergy = '../data/genTypes/energy_generation_type_date' + x + '.json'
    # pd.read_json is the public entry point; round the values to whole units
    newDF = pd.read_json(pathEnergy).round()
    EnergyGeneration_All = pd.concat([EnergyGeneration_All, newDF])

EnergyGeneration_All.fillna(0, inplace=True)

del(newDF, pathEnergy, x, years)
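Concatenating inside the loop copies the accumulated frame on every iteration. A minimal sketch of a common alternative is to collect the yearly frames in a list and concatenate once. The `load_year` helper and its values are hypothetical stand-ins for the yearly JSON files, which are not reproduced here.

```python
import pandas as pd

def load_year(year):
    """Hypothetical stand-in for pd.read_json on one yearly file."""
    dates = pd.to_datetime([f"{year}-01-01", f"{year}-01-02"])
    data = {"Eólica": [1.4, 2.6]}
    if year == "2017":
        # A column that only exists in some years (made-up values)
        data["Solar térmica"] = [0.4, 0.7]
    return pd.DataFrame(data, index=dates).round()

years = ["2016", "2017"]

# Build all frames first, then concatenate a single time
frames = [load_year(y) for y in years]
combined = pd.concat(frames).fillna(0)

print(combined)
```

Columns missing in a given year come out as NaN after `pd.concat`, which is why the `fillna(0)` step is still needed.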

From this dataframe we are only interested in the daily totals of renewable vs. non-renewable generation:

In [49]:


energySummary = pd.DataFrame()
renewableTypes = ['Hidráulica', 'Hidroeólica', 'Eólica', 'Solar fotovoltaica', 'Solar térmica', 'Otras renovables', 'Residuos renovables']

energySummary['Renewable'] = EnergyGeneration_All[renewableTypes].sum(axis=1)
energySummary['Non_Renewable'] = EnergyGeneration_All['Generación total'] - energySummary['Renewable']



#del(EnergyGeneration_All)

energySummary.head()

            Renewable  Non_Renewable
2016-01-01   253586.0       335401.0
2016-01-02   294759.0       357755.0
2016-01-03   361739.0       334133.0
2016-01-04   338825.0       367608.0
2016-01-05   395072.0       352461.0
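To make the split explicit, here is a small sketch of the same computation on a toy frame. The numbers and the reduced column list are made up for illustration; only the column names `Eólica`, `Hidráulica` and `Generación total` come from the real data.

```python
import pandas as pd

# Toy generation data (made-up values), one row per day
toy = pd.DataFrame(
    {
        "Eólica": [100.0, 150.0],
        "Hidráulica": [50.0, 30.0],
        "Nuclear": [200.0, 210.0],
        "Generación total": [350.0, 390.0],
    },
    index=pd.to_datetime(["2016-01-01", "2016-01-02"]),
)

renewable_cols = ["Eólica", "Hidráulica"]

summary = pd.DataFrame(index=toy.index)
# Row-wise sum over the renewable columns
summary["Renewable"] = toy[renewable_cols].sum(axis=1)
# Everything in the total that is not renewable
summary["Non_Renewable"] = toy["Generación total"] - summary["Renewable"]

print(summary)
```

Deriving the non-renewable part from the total avoids having to list every non-renewable type by name.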



Next, we will import the energy prices, which are also on a daily basis.
The dates are in European format (day/month/year) and are read as plain strings rather than dates. Let's convert them and set them as the index, so that this dataframe is indexed the same way as the other one:

In [17]:

pathPrices = '../data/energy_price.csv'

historyPrice = pd.read_csv(pathPrices, decimal=',')
historyPrice['Fecha'] = pd.to_datetime(historyPrice['Fecha'], format='%d/%m/%Y')
historyPrice.set_index('Fecha', inplace=True)

del(pathPrices)

historyPrice.head()
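The parsing step can be tried in isolation. The sketch below uses an in-memory CSV with made-up rows; the `;` separator is an assumption chosen so that the decimal comma is unambiguous, and the real file may be laid out differently.

```python
import io
import pandas as pd

# Made-up sample in the assumed layout: day/month/year dates, decimal commas
raw = io.StringIO("Fecha;Precio\n01/02/2021;59,85\n13/02/2021;67,55\n")

prices = pd.read_csv(raw, sep=";", decimal=",")
# An explicit format keeps 01/02/2021 from being read as January 2nd
prices["Fecha"] = pd.to_datetime(prices["Fecha"], format="%d/%m/%Y")
prices = prices.set_index("Fecha")

print(prices)
print(prices.dtypes)
```

Passing `format` explicitly also makes the conversion fail loudly if a row does not follow the expected pattern.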

Now it's time to also load the CO2 prices, another feature that may affect the price of power.

In [18]:
pathCarbonPrices = '../data/carbon_price.csv'

carbonPrice = pd.read_csv(pathCarbonPrices, decimal=',')
carbonPrice['Fecha'] = pd.to_datetime(carbonPrice['Fecha'], format='%d.%m.%Y')
carbonPrice.set_index('Fecha', inplace=True)
del(pathCarbonPrices)

carbonPrice.head()

Finally, we will read a dataset that records which type of energy was dominant in each hour.

In [20]:
pathDomTypes = '../data/dominantTypes/DomTypes_2021.csv'
domTypePerHour = pd.read_csv(pathDomTypes, header=3)

domTypePerHour['Dia'] = pd.to_datetime(domTypePerHour['Dia'], format='%d/%m/%y')
domTypePerHour.set_index('Dia', inplace=True)

del(pathDomTypes)
domTypePerHour.head()

I will now compute, for each day, the fraction of the 24 hours in which each energy type was dominant.


In [22]:
domTypeDailyRatio = pd.DataFrame(index=domTypePerHour.index)

# For each day, count how often each type appears among the hourly values
# and convert that count into a fraction of the 24 hours.
for Fecha, datos in domTypePerHour.iterrows():
    recuento = datos.value_counts()
    for index, value in recuento.items():
        domTypeDailyRatio.loc[Fecha, index] = value/24

# Types that never dominated on a given day get a ratio of 0
domTypeDailyRatio.fillna(0, inplace=True)
del(value, index, recuento, Fecha, datos)
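The same daily ratios can be obtained without an explicit double loop. The sketch below is one possible variant on a toy frame with four "hours" per day (made-up values and hypothetical column names `H1` to `H4`). Note that `normalize=True` divides by the number of non-missing values in the row rather than by a fixed 24, so it only matches the loop above when every day has all of its hours.

```python
import pandas as pd

# Toy hourly data: one row per day, one column per hour (made-up values)
hourly = pd.DataFrame(
    [["HI", "HI", "RE", "BG"], ["RE", "RE", "RE", "HI"]],
    index=pd.to_datetime(["2021-01-04", "2021-01-05"]),
    columns=["H1", "H2", "H3", "H4"],
)

# value_counts(normalize=True) returns the share of each type within a row;
# apply(axis=1) aligns the per-row results into one column per type
ratio = hourly.apply(lambda row: row.value_counts(normalize=True), axis=1)
ratio = ratio.fillna(0)

print(ratio)
```

Each row of `ratio` sums to 1, which is a quick sanity check for the real data as well.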


Now we are ready to merge all of this data, using the dates as the index.
Since `merge` performs an inner join by default, the result keeps only the rows whose date is present in all of the dataframes.

In [23]:
# merge() defaults to an inner join, so only dates present in every frame survive
AllSummary = energySummary\
    .merge(historyPrice['Precio'].rename('Energy price'), left_index=True, right_index=True)\
    .merge(carbonPrice['Último'].rename('CO2 ton price'), left_index=True, right_index=True)\
    .merge(domTypeDailyRatio, left_index=True, right_index=True)
AllSummary['year'] = AllSummary.index.year
AllSummary.head()

            Renewable  Non_Renewable  Energy price  CO2 ton price        HI        RE        BG       TCC       TER  MIP  year
2021-01-04   338858.0       443010.0         59.85          33.69  0.416667  0.166667  0.416667       0.0       0.0  0.0  2021
2021-01-05   291895.0       436704.0         67.55          32.96  0.416667  0.166667      0.25  0.083333       0.0  0.0  2021
2021-01-06   273265.0       407474.0          70.6          33.63  0.666667  0.041667  0.208333  0.083333       0.0  0.0  2021
2021-01-07   281307.0       486273.0         88.93          34.76  0.333333  0.166667  0.083333  0.041667  0.041667  0.0  2021
2021-01-08   393578.0       416164.0         94.99          34.92      0.75       0.0  0.083333       0.0  0.166667  0.0  2021
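The merged frame only contains 2021 rows because that is the only period covered by every input. A small sketch with made-up values shows how the inner join drops non-matching dates, and how an outer join with `indicator=True` can be used to see which rows would be lost.

```python
import pandas as pd

# Toy inputs (made-up values); the first date has no matching price
gen = pd.DataFrame(
    {"Renewable": [10.0, 20.0, 30.0]},
    index=pd.to_datetime(["2021-01-03", "2021-01-04", "2021-01-05"]),
)
price = pd.Series(
    [59.85, 67.55],
    index=pd.to_datetime(["2021-01-04", "2021-01-05"]),
    name="Energy price",
)

# Default inner join: only dates present on both sides are kept
merged = gen.merge(price, left_index=True, right_index=True)

# Outer join with an indicator column to inspect what the inner join drops
outer = gen.merge(price, left_index=True, right_index=True,
                  how="outer", indicator=True)

print(merged)
print(outer)
```

Rows marked `left_only` or `right_only` in the `_merge` column are the ones that disappear from the inner-join result.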
