# Eric González Caballero - MSc Big Data Analytics Thesis
## Forecasting the System Imbalance in the Spanish Electricity Market


### Notebook 01 - DataIngestion

This notebook ingests data from the REE ESIOS API for the selected indicators and date range.

Official API documentation can be found at: https://api.esios.ree.es/

Ingestion relies on a helper library for the ESIOS API developed by Santiago Peñate Vera (santiago.penate.vera@gmail.com), released under the GNU General Public License and available at https://github.com/SanPen/ESIOS. The library was slightly modified to fit the needs of this project.


#### Libraries import

In [10]:
# Import general libraries
import requests
import json

import numpy as np
import pandas as pd

import datetime
import holidays

import sys
sys.path.append('../libraries')

# Import modified esios library
from utils_esios_mod import *

#### Constants definition
Set the dates needed for ingestion, mainly the start and end dates, where the end date depends on the current day. A full load ingests data from 2019 onwards, since many indicators were added that year. The end date is the next day at 23h, by which time some forecast indicators are already available.
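
The window described above can be sketched as a small helper. This is a minimal sketch, not the notebook's own code: the function name `ingestion_window` and the constant `FULL_LOAD_START` are hypothetical, and pinning the clock to `Europe/Madrid` is an assumption made so that the result does not depend on the time zone of the machine running the notebook.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# Full-load start date used in this notebook
FULL_LOAD_START = "2019-01-01T00:00:00"


def ingestion_window(now=None):
    """Return (start, end) strings for a full load ending tomorrow at 23h."""
    if now is None:
        # Assumption: use Spanish local time rather than the machine's clock
        now = datetime.now(ZoneInfo("Europe/Madrid"))
    tomorrow = now.date() + timedelta(days=1)
    return FULL_LOAD_START, tomorrow.strftime("%Y-%m-%dT23:00:00")


# Example with a fixed reference time, so the result is reproducible
example_start, example_end = ingestion_window(datetime(2022, 8, 9, 12, 0))
print(example_start, example_end)
```

Passing an explicit `now` keeps the helper testable. Calling it with no argument uses the current time in Spain.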

In [None]:
# Define constants
# Dynamic dates; datetime.now() uses the machine's local clock, expected to be Spanish local time (UTC+01/02), not UTC
now = datetime.datetime.now()
today = datetime.date.today()

yesterday = today - datetime.timedelta(days=1)
tomorrow = today + datetime.timedelta(days=1)

# Format the timestamp to be included in the name
tmstmp = str(now.strftime("%Y_%m_%d"))

# Set desired dates to be used in the correct format
start = "2019-01-01T00:00:00"
end = tomorrow.strftime("%Y-%m-%dT23:00:00")

# I/O paths
raw_path = '../data/raw/data_' + tmstmp + '.csv'
curated_path = '../data/curated/data_bronze.csv'

# ESIOS personal token
token = '0c74832e546dfc1ba873175f25fcdf1773b4b0ee37f6d5d767d5e3791f129a47' #Personal token, should be in a private config file or key vault
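
As the comment on the token notes, the credential is better kept out of the notebook. One possible approach, sketched here under assumptions, is to read it from an environment variable. The variable name `ESIOS_TOKEN` and the helper `load_esios_token` are hypothetical and are not part of the ESIOS library.

```python
import os


def load_esios_token(env_var="ESIOS_TOKEN"):
    """Read the ESIOS token from an environment variable (the name is hypothetical)."""
    value = os.environ.get(env_var, "").strip()
    if not value:
        # Fail early with a clear message instead of sending an empty token
        raise RuntimeError(f"Set the {env_var} environment variable with your ESIOS token")
    return value
```

With the variable exported in the shell that launches Jupyter, the cell above would reduce to `token = load_esios_token()`.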

#### Query Data
Instantiate the custom ESIOS class with the personal token and query the list of available indicators.

In [11]:
esios = ESIOS(token)

Getting the indicators...


In [12]:
resp = requests.get('https://api.esios.ree.es/indicators',
                    headers={'Authorization': 'Token token="%s"' % (token)})

# Parse the JSON payload once and reuse it
indicators_list = resp.json()['indicators']

print(f"There are {len(indicators_list)} indicators available:")

for element in indicators_list:
    print(str(element['id']) + ': ' + element['name'])

There are 1878 indicators available:
1901: Precio medio horario componente RD-L 10/2022 mercado diario e intradiario - diferencia por liquidación con medidas 
10405: Precio medio horario componente mecanismo de ajuste RD-L 10/2022 comercializadores de referencia
10404: Precio medio horario componente mecanismo de ajuste RD-L 10/2022 contratación libre
10403: Precio medio horario componente mecanismo de ajuste RD-L 10/2022
1909: Precio medio horario componente RD-L 10/2022 mercado diario e intradiario comercializadores de referencia
1908: Precio medio horario componente RD-L 10/2022 mercado diario e intradiario contratación libre
1907: Precio medio horario componente RD-L 10/2022 mercado diario e intradiario 
1906: Precio medio horario componente RD-L 10/2022 restricciones técnicas y mercados de balance comercializadores de referencia
1905: Mecanismo de ajuste contratacion libre
1904: Precio medio horario componente RD-L 10/2022 restricciones técnicas y mercados de balance contratación 
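
With close to two thousand indicators, scanning the printed list by eye is slow. The sketch below filters the list by a keyword in the indicator name. It assumes only the shape visible in the cell above (a list of dictionaries with `id` and `name` keys). The helper `find_indicators` and the hand-written sample are hypothetical.

```python
def find_indicators(indicators, keyword):
    """Return (id, name) pairs whose name contains keyword, ignoring case."""
    needle = keyword.casefold()
    return [(item["id"], item["name"])
            for item in indicators
            if needle in item["name"].casefold()]


# Tiny hand-written sample in the shape of resp.json()['indicators']
sample = [
    {"id": 1293, "name": "Demanda real"},
    {"id": 545, "name": "Demanda programada"},
    {"id": 600, "name": "Precio mercado SPOT Diario"},
]

for indicator_id, name in find_indicators(sample, "demanda"):
    print(f"{indicator_id}: {name}")
```

In the notebook, the same call would take `resp.json()['indicators']` in place of `sample`.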

Query data from normal indicators (those that do not require special options such as geographical aggregations).

In [14]:
# Read the normal variables by using the indicator id. Normal variables are updated at 11h.
## 1775: Previsión diaria D+1 demanda
## 1293: Demanda real
## 545: Demanda programada
## 1779: Previsión diaria D+1 fotovoltaica
## 1777: Previsión diaria D+1 eólica
## 600: Precio mercado SPOT Diario

indicators = [1775, 1293, 545, 1779, 1777, 600]  # Others: 817, 544, 684, 1338, 1192, 10030, 10031, 10032, 767, 460, 10358

# Query the data
df_list, names = esios.get_multiple_series(indicators, start, end)

# Merge the different series into a unique dataframe
df_merged = esios.merge_series(df_list, names)

Parsing Previsión diaria D+1 demanda
Parsing Demanda real
Parsing Demanda programada
Parsing Previsión diaria D+1 fotovoltaica
Parsing Previsión diaria D+1 eólica
Parsing Precio mercado SPOT Diario
merging


In [15]:
df = df_merged
df

datetime,Previsión diaria D+1 demanda,Demanda real,Demanda programada,Previsión diaria D+1 fotovoltaica,Previsión diaria D+1 eólica,Precio mercado SPOT Diario
2019-01-01 00:00:00+01:00,23753.0,23676.0,23251.0,0.0,3214.0,66.88
2019-01-01 01:00:00+01:00,23018.0,23128.0,22485.0,0.0,3222.0,66.88
2019-01-01 02:00:00+01:00,21808.0,22109.0,20977.0,0.0,3081.0,66.00
2019-01-01 03:00:00+01:00,20635.0,20666.0,19754.0,0.0,3069.0,63.64
2019-01-01 04:00:00+01:00,19824.0,19680.0,19321.0,0.0,2973.0,58.85
...,...,...,...,...,...,...
2022-08-10 19:00:00+02:00,31688.3,,,4284.4,4608.0,153.15
2022-08-10 20:00:00+02:00,31126.5,,,1513.6,4780.5,166.50
2022-08-10 21:00:00+02:00,31363.5,,,99.6,5048.3,171.68
2022-08-10 22:00:00+02:00,30481.8,,,0.0,5336.0,170.01


#### Output

In [7]:
# Save into raw with the timestamp in the name to keep the history
df.to_csv(raw_path, index=True)

# Save into curated as the latest file
df.to_csv(curated_path, index=True)
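
The two writes above follow a simple pattern: a dated file in `raw` that preserves the history of each run, and a fixed file name in `curated` that always holds the latest load. A minimal sketch of that naming logic with `pathlib` is shown below. The helper name `output_paths` is hypothetical, and the folder creation step is an addition that the original cell does not perform.

```python
from datetime import date
from pathlib import Path


def output_paths(data_dir, run_date):
    """Return (raw_path, curated_path) following the naming used in this notebook."""
    stamp = run_date.strftime("%Y_%m_%d")
    raw = Path(data_dir) / "raw" / f"data_{stamp}.csv"        # one file per run date
    curated = Path(data_dir) / "curated" / "data_bronze.csv"  # overwritten on each run
    return raw, curated


raw_file, curated_file = output_paths("../data", date(2022, 8, 9))
# Optional: create the target folders before writing
# raw_file.parent.mkdir(parents=True, exist_ok=True)
print(raw_file.name, curated_file.name)
```

Both paths can be passed directly to `df.to_csv`, which accepts `Path` objects as well as strings.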

Query data from indicators that require special options, such as generation. Generation is published per geo_code, which produces so much data that the server collapses, whereas only the national aggregate is wanted here, so a sum aggregation function is applied in the query. These indicators are uploaded to the ESIOS platform with a month of lag, so they are not used, but the code is kept in case another such indicator is added in the future.
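
The aggregated query described above amounts to adding a `geo_agg=sum` option to the request for a single indicator. The sketch below only builds the request URL. The helper `build_indicator_url` is hypothetical, and the `start_date` and `end_date` parameter names are assumptions that should be checked against the official API documentation.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.esios.ree.es/indicators"


def build_indicator_url(indicator_id, start, end, geo_agg=None):
    """Build the URL for one indicator, optionally aggregated across geo codes."""
    # Assumption: the API takes the date range as start_date / end_date
    params = {"start_date": start, "end_date": end}
    if geo_agg is not None:
        params["geo_agg"] = geo_agg  # e.g. "sum" for the national total
    return f"{BASE_URL}/{indicator_id}?{urlencode(params)}"


url = build_indicator_url(10043, "2019-01-01T00:00:00", "2019-01-02T23:00:00", geo_agg="sum")
print(url)
```

Using `urlencode` rather than string concatenation takes care of escaping the colons in the timestamps.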

In [8]:
# indicators2 = [10043, 10037, 10205, 10035] ## Others 1153, 1156, 10039

# names2 = esios.get_names(indicators2)
# for name in names2: print(name)

In [9]:
# df_list = list()

# for id_code in indicators2:
#     df_new = esios_get(id_code, start_date, start_hour, end_date, end_hour, token, '&geo_agg=sum')
    
#     df_list.append(df_new)

# df2 = esios.merge_series(df_list, names2)   
# df2