# Functional script
This notebook drives the overall operation of the application.


- [Preparation](#Preparation)<br>

### 1. [Installation data](#Installation-data)

### 2. [Data acquisition](#Data-acquisition)

### 3. [Data cleaning](#Data-cleaning)

### 4. [Data preparation](#Data-preparation)

### 5. [Radiation prediction](#Radiation-prediction)

### 6. [Ambient temperature prediction](#Ambient-temperature-prediction)

### 7. [Computing the electricity production](#Computing-the-electricity-production)


## Preparation

Load the libraries, datasets, and functions.

In [1]:
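import numpy as np
import pandas as pd
import random
pd.options.display.max_columns = None
pd.options.display.max_rows = None
import matplotlib.pyplot as plt
import seaborn as sns  # used by graf_compara() and compracion_metricas()
# On Matplotlib >= 3.6 this style is named "seaborn-v0_8"
plt.style.use("seaborn")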

In [2]:
import math
import time
from datetime import timezone, datetime, date, timedelta
import os
import requests
import json
import re
import io

In [3]:
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import pickle as pk
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline

In [4]:
hora_ini = 4
hora_fin = 20

The required functions are imported from another notebook (``Funciones_solares``).

In [5]:
# !pip install ipynb
from ipynb.fs.full.Funciones_solares import *

Set the working directory.

In [6]:
%cd /home/dsc/git/TFM/

/home/dsc/git/TFM


In [7]:
directorio = '/home/dsc/git/TFM/'

Load the list of AEMET weather stations.

In [8]:
df_estaciones = pd.read_csv(directorio + 'data/estaciones.csv')

Load the list of AEMET radiation stations.

In [9]:
df_estaciones_rad = pd.read_csv(directorio + 'data/estaciones_rad.csv')
df_estaciones_rad.dropna(inplace = True)
df_estaciones_rad.reset_index(drop = True, inplace = True)

### ``get_response_aemet()``

Helper for querying the AEMET OpenData API.

In [10]:
def get_response_aemet(url_base = "", url = "", api_key = "", ide = ""):
    
    # Join the parts of the final URL
    call = '/'.join([url_base, url, ide])
    # Without an identifier the join leaves a trailing slash, which is removed
    if(ide == ""):
        call = call[:-1]

    headers = {    
        'Accept': 'application/json',  
        'Authorization': 'api_key' + api_key
    }
    response = requests.get(call, headers = headers)
    
    # The first response is JSON whose "datos" field holds the URL of the actual data
    body = json.loads(response.text)["datos"]
    
    
    response = requests.get(body, headers = headers)
    if response:
        print('Exito')
    else:
        print('Ha ocurrido un error')

    return response.text
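
The two-step flow in ``get_response_aemet()`` can be illustrated without network access. The sketch below assumes the first response is a JSON document with ``estado``, ``descripcion``, ``datos`` and ``metadatos`` fields; the sample payload, its URLs, and the helper name ``extract_data_url`` are made up for illustration.

```python
import json

# Hypothetical first-step response; the URLs are placeholders, not real endpoints
sample_response = '''
{
  "descripcion": "exito",
  "estado": 200,
  "datos": "https://example.invalid/datos/abc123",
  "metadatos": "https://example.invalid/metadatos/abc123"
}
'''

def extract_data_url(json_text):
    """Return the URL stored in the "datos" field, or None if it is missing."""
    body = json.loads(json_text)
    # .get() avoids a KeyError when the response carries no "datos" field
    return body.get("datos")

data_url = extract_data_url(sample_response)
print(data_url)
```

Checking for a missing ``datos`` field before issuing the second request gives a clearer failure than the ``KeyError`` that the direct lookup would raise.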


### ``get_response_OW()``

Helper for querying the OpenWeather API.

In [11]:
def get_response_OW(url = ""):
    
    response = requests.get(url)

    if response:
        print('Exito')
    else:
        print('Ha ocurrido un error')

    return response.content


### ``openAndSkipLines()``

Counts the header lines that precede the data in the CAMS SoDa API response.

In [12]:
def openAndSkipLines(f, symbol):
    # Read a text response (e.g. CSV) and count the leading lines that begin with `symbol`.
    # Returns the total number of lines and the number of lines to skip (i.e. not containing data).
    # Returns (-1, -1) if the text is empty.

    buf = io.StringIO(f)

    # Total number of lines in the response
    nbTotalLines = len(buf.readlines())
    if nbTotalLines == 0:
        return -1, -1
    buf.seek(0, 0)

    nbLinesToSkip = 0
    for l in buf:
        # Stop at the first line that does not start with the header symbol
        if not l.startswith(symbol):
            break
        nbLinesToSkip += 1
    return nbTotalLines, nbLinesToSkip



### ``getCamsData()``

Builds a dataframe from the data in the CAMS SoDa response.

In [13]:
def getCamsData(camsFile, nbLinesToSkip):

    # List of CAMS variables, in column order:
    # Observation period;TOA;Clear sky GHI;Clear sky BHI;Clear sky DHI;Clear sky BNI;GHI;BHI;DHI;BNI;Reliability
    camsFile = io.StringIO(camsFile) 
    datacolumns = pd.DataFrame()
    dateBegins = list()
    dateEnds = list()
    toa = list()
    cs_ghi = list()
    cs_bhi = list()
    cs_dhi = list()
    cs_bni = list()
    ghi = list()
    bhi = list()
    dhi = list()
    bni = list()
    reliability = list()
    cont_lines = 0
    
    # Store the data from each row
    for ll in camsFile.readlines():
        cont_lines += 1
        if (cont_lines > nbLinesToSkip):
            # Strip the trailing newline, if there is one
            ll = ll.rstrip('\n')
            l = ll.split(';')
            date = l[0].split('/')
            dateBegins.append(date[0].strip())
            dateEnds.append(date[1].strip())
            toa.append(l[1].strip())
            cs_ghi.append(l[2].strip())
            cs_bhi.append(l[3].strip())
            cs_dhi.append(l[4].strip())
            cs_bni.append(l[5].strip())
            ghi.append(l[6].strip())
            bhi.append(l[7].strip())
            dhi.append(l[8].strip())
            bni.append(l[9].strip())
            reliability.append(l[10].strip())

    # Build the dataframe
    dictio = {"dateBegins":dateBegins, "dateEnds":dateEnds, "toa":toa, "cs_ghi":cs_ghi, "cs_bhi":cs_bhi, "cs_dhi":cs_dhi, "cs_bni":cs_bni, "ghi":ghi, "bhi":bhi, "dhi" : dhi, "bni" : bni, "reliability" : reliability}
    datacolumns = pd.DataFrame(dictio)

    return datacolumns

### ``distancia()``

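Computes the Euclidean or Manhattan distance between two points. The coordinates are used as given, so with inputs in decimal degrees the result is also in degrees.

In [14]:
def distancia(lat1, lon1, lat2, lon2, distancia = "euclidea"):
    
    if(distancia == "euclidea"):
        dist = math.sqrt((lat1 - lat2)**2 + (lon1 -lon2)**2)
    
    elif(distancia == "manhattan"):
        dist = abs(lat1 - lat2) + abs(lon1 -lon2)

    else:
        raise ValueError("distancia must be 'euclidea' or 'manhattan'")
  
    return dist 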
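
One plausible use of this distance is picking the station closest to the installation. The sketch below is an illustration under assumptions: the station list is hypothetical, and ``euclidean_deg`` simply restates the Euclidean branch of ``distancia()`` so the block runs on its own.

```python
import math

def euclidean_deg(lat1, lon1, lat2, lon2):
    # Same formula as distancia(..., "euclidea"): plain Euclidean distance in degrees
    return math.sqrt((lat1 - lat2) ** 2 + (lon1 - lon2) ** 2)

# Installation coordinates used later in the notebook
lat, lon = 41.29277777777778, 2.0700000000000003

# Hypothetical stations (name, latitude, longitude), for illustration only
stations = [
    ("A", 41.40, 2.10),
    ("B", 40.40, -3.70),
    ("C", 39.50, -0.40),
]

# Keep the station with the smallest distance to the installation
nearest = min(stations, key=lambda s: euclidean_deg(lat, lon, s[1], s[2]))
print(nearest[0])
```

Because a degree of longitude is shorter than a degree of latitude away from the equator, this ranking is only an approximation of the ranking by ground distance.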

### ``conversor_coordenadas()``

Converts a coordinate from degrees, minutes, seconds (DMS) to decimal degrees. The input is a 7-character string ``DDMMSSH``, where ``H`` is the hemisphere letter.

In [15]:
def conversor_coordenadas(coord):
    # Latitudes north of the equator are always positive
    # Longitudes west of the prime meridian are negative
    
    D = int(coord[0:2])
    M = float(coord[2:4])
    S = float(coord[4:6])
    
    # DMS to decimal degrees
    DD = float((D) + (M/60) + (S/3600))
        
    if(coord[6] == "S" or coord[6] == "W"):
            DD = -DD
            
    return DD
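
As a worked example of the conversion, the sketch below restates the same arithmetic in a standalone function (``dms_to_decimal`` is a hypothetical name). The input strings are illustrative; they were chosen because they convert to roughly 41.2928° and 2.07°, the installation coordinates used later in the notebook.

```python
def dms_to_decimal(coord):
    # coord is "DDMMSSH": degrees, minutes, seconds and a hemisphere letter
    degrees = int(coord[0:2])
    minutes = float(coord[2:4])
    seconds = float(coord[4:6])
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative
    return -value if coord[6] in ("S", "W") else value

lat_example = dms_to_decimal("411734N")   # 41 + 17/60 + 34/3600
lon_example = dms_to_decimal("020412E")   # 2 + 4/60 + 12/3600
print(lat_example, lon_example)
```

Note that the two-digit degree field limits the format to values below 100°, which is enough for Spanish coordinates.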

### ``dividir_train_test()``
This function splits the data into train and test sets:
- Train: ``prop`` (80% by default)
- Test: 100% - ``prop``

In [16]:
def dividir_train_test(x,y, prop = 0.8):
    
    # Train proportion
    tam_train = prop

    # Split into train and test
    x_train, x_test, y_train, y_test = train_test_split(x, y, train_size = tam_train, random_state = 1)

    print('x_train: {}%. Nº de datos: {}'.format((len(x_train)/len(x))*100, len(x_train)))
    print('y_train: {}%. Nº de datos: {}'.format((len(y_train)/len(y))*100, len(y_train)))


    print('x_test: {}%. Nº de datos: {}'.format((len(x_test)/len(x))*100, len(x_test)))
    print('y_test: {}%. Nº de datos: {}'.format((len(y_test)/len(y))*100, len(y_test)))
    
    return x_train, x_test, y_train, y_test

### ``graf_compara()``

This function plots the actual values against the predicted ones as vertical bars.

In [17]:
def graf_compara(nombre_modelo, y_real, y_pred):
    
    # Predicted values
    predic = pd.DataFrame({'Dato': y_pred})
    predic.insert(len(predic.columns),"index",[i for i in range(0,len(predic["Dato"]))],True)
    
    # Actual values
    real = pd.DataFrame({'Dato': y_real})
    real.insert(len(real.columns),"index",[i for i in range(0,len(real["Dato"]))],True)
    
    # Comparison
    comparacion = pd.concat([real, predic], keys=["Real", "Prediccion"]).reset_index()
    comparacion.drop(['level_1'], axis=1, inplace = True)
    comparacion.columns = ['Tipo', 'Dato', "Index"]
    #print(comparacion)

    sns.catplot(data = comparacion, kind = "bar", x = "Index", y = "Dato", hue = "Tipo", estimator = np.median, height = 10, aspect = 5)

### ``mape_fun()``

In [18]:
def mape_fun(y_real, y_pred): 
    y_real, y_pred = np.array(y_real), np.array(y_pred)
    return np.mean(np.abs((y_real - y_pred) / y_real)) * 100
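
``mape_fun()`` divides by ``y_real``, so any actual value equal to zero yields an infinite or undefined term. The sketch below shows one possible variant that skips those samples; it is an alternative for illustration, not the notebook's method, and ``mape_nonzero`` is a hypothetical name.

```python
def mape_nonzero(y_real, y_pred):
    # Keep only the pairs whose actual value is non-zero
    pairs = [(r, p) for r, p in zip(y_real, y_pred) if r != 0]
    if not pairs:
        return float("nan")
    # Mean of the relative absolute errors, as a percentage
    return sum(abs((r - p) / r) for r, p in pairs) / len(pairs) * 100

# Illustrative values: the first sample has an actual value of 0
y_real = [0.0, 100.0, 200.0]
y_pred = [5.0, 110.0, 180.0]
print(mape_nonzero(y_real, y_pred))
```

Dropping the zero-valued samples changes what the metric measures, so results from this variant are not directly comparable with a MAPE computed over all samples.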

### ``metricas()``

This function computes the metrics of the predictions returned by the model.

Metrics: ``mae``, ``mse``, ``rmse``, ``r2``, ``mape`` (``mape`` is currently commented out)

In [19]:
def metricas(modelo, y_real, y_pred):
    
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
    
    # MAE: mean of the absolute differences between targets and predictions. Every individual difference carries the same weight.
    mae = mean_absolute_error(y_real, y_pred)

    # MSE: mean squared error of the predictions. For each point, the squared difference between prediction and target is computed, and those values are averaged.
    mse = mean_squared_error(y_real, y_pred)
    
    # RMSE: square root of the MSE. It has the same scale as the target variable.
    rmse = np.sqrt(mse)
    
    # R^2: closely related to the MSE, but scale-free. It always lies between -∞ and 1.
    r2 = r2_score(y_real, y_pred)
    
    # MAPE: for each sample, the absolute error is divided by the target value, giving a relative error.
    #mape = mape_fun(y_real, y_pred)
    
    
    print('MODEL: ', modelo)
    print('MAE: ', mae)
    print('MSE: ', mse)
    print('RMSE: ', rmse)
    print('R2 : ', r2)
    #print('MAPE : ', mape)
    
    #return modelo, mae, mse, rmse, r2, mape
    return modelo, mae, mse, rmse, r2
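
To make the definitions concrete, the following sketch computes the four metrics by hand on a small set of illustrative values, using only the standard library. It is a check of the formulas, not a replacement for the scikit-learn functions used in ``metricas()``.

```python
import math

# Illustrative values, not taken from the notebook's data
y_real = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 10.0]
n = len(y_real)

errors = [r - p for r, p in zip(y_real, y_pred)]
mae = sum(abs(e) for e in errors) / n      # mean absolute error
mse = sum(e ** 2 for e in errors) / n      # mean squared error
rmse = math.sqrt(mse)                      # same scale as the target

mean_real = sum(y_real) / n
ss_res = sum(e ** 2 for e in errors)                  # residual sum of squares
ss_tot = sum((r - mean_real) ** 2 for r in y_real)    # total sum of squares
r2 = 1 - ss_res / ss_tot

print(mae, mse, rmse, r2)
```

Here the MAE and MSE both equal 0.75, the RMSE is the square root of 0.75, and R² is 1 - 3/20 = 0.85.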

### ``compracion_metricas()``

This function plots the metrics of each trained model so they can be compared.

In [20]:
def compracion_metricas(lista_modelos):
    
    plt.style.use('ggplot')
    
    # Build one dataframe per metric
    nombres = [modelo[0] for modelo in lista_modelos]
    df_mae = pd.DataFrame({'mae': [modelo[1] for modelo in lista_modelos], "modelo": nombres})
    df_mse = pd.DataFrame({'mse': [modelo[2] for modelo in lista_modelos], "modelo": nombres})
    df_rmse = pd.DataFrame({'rmse': [modelo[3] for modelo in lista_modelos], "modelo": nombres})
    df_r2 = pd.DataFrame({'r2': [modelo[4] for modelo in lista_modelos], "modelo": nombres})
    #df_mape = pd.DataFrame({'mape': [modelo[5] for modelo in lista_modelos], "modelo": nombres})
    
    # Create the figure and add one subplot per metric
    fig = plt.figure(figsize = (15, len(lista_modelos*4)))
    ax1 = fig.add_subplot(5,1,1)
    ax2 = fig.add_subplot(5,1,2)
    ax3 = fig.add_subplot(5,1,3)
    ax4 = fig.add_subplot(5,1,4)
    #ax5 = fig.add_subplot(5,1,5)
    
    #MAE
    fig = sns.barplot(x = "mae", y = "modelo", data = df_mae, ax = ax1, orient = "h", color = 'green').set_title("Comparación de métricas")
    ax1.tick_params(labelbottom = False, bottom = False)
    ax1.set_xlabel("MAE")
    ax1.set_ylabel(" ")
    for pa in ax1.patches:
        ax1.annotate("%.4f" % pa.get_width(), xy = (pa.get_width(), pa.get_y() + pa.get_height()/2),
            xytext = (5, 0), textcoords = 'offset points', ha = "left", va = "center")
    
    #MSE
    fig = sns.barplot(x = "mse", y = "modelo", data = df_mse, ax = ax2, orient = "h", color = 'red')
    ax2.tick_params(labelbottom = False, bottom = False)
    ax2.set_xlabel("MSE")
    ax2.set_ylabel(" ")
    for pa in ax2.patches:
        ax2.annotate("%.4f" % pa.get_width(), xy = (pa.get_width(), pa.get_y() + pa.get_height()/2),
            xytext = (5, 0), textcoords = 'offset points', ha = "left", va = "center")
        
    #RMSE
    fig = sns.barplot(x = "rmse", y = "modelo", data = df_rmse, ax = ax3, orient = "h", color = 'blue')
    ax3.tick_params(labelbottom = False, bottom = False)
    ax3.set_xlabel("RMSE")
    ax3.set_ylabel(" ")
    for pa in ax3.patches:
        ax3.annotate("%.4f" % pa.get_width(), xy = (pa.get_width(), pa.get_y() + pa.get_height()/2),
            xytext = (5, 0), textcoords = 'offset points', ha = "left", va = "center")

    #R2
    fig = sns.barplot(x = "r2", y = "modelo", data = df_r2, ax = ax4, orient = "h", color = 'yellow')
    ax4.set_xlabel("R2")
    ax4.set_ylabel(" ")
    ax4.tick_params(labelbottom = False, bottom = False)
    for pa in ax4.patches:
        ax4.annotate("%.4f" % pa.get_width(), xy = (pa.get_width(), pa.get_y() + pa.get_height()/2),
            xytext = (5, 0), textcoords = 'offset points', ha = "left", va = "center")
        
    #MAPE
    #fig = sns.barplot(x = "mape", y = "modelo", data = df_mape, ax = ax5, orient = "h", color = 'grey')
    #ax5.set_xlabel("Valor de la métrica")
    #ax5.set_ylabel(" ")
    #ax5.tick_params(labelbottom = False, bottom = False)
    #for pa in ax5.patches:
    #    ax5.annotate("%.4f" % pa.get_width(), xy = (pa.get_width(), pa.get_y() + pa.get_height()/2),
    #        xytext = (5, 0), textcoords = 'offset points', ha = "left", va = "center")

# Installation data

In [21]:
fecha_a_usar = date.today()
fecha = fecha_a_usar
fecha = "{}-{}-{}".format(fecha.year, str(fecha.month).zfill(2), str(fecha.day).zfill(2))
fecha

fallo = 0

In [22]:
# Installation data

lat = 41.29277777777778 # Latitude
lon = 2.0700000000000003 # Longitude
orient = 10 # Orientation (west)
incl = 25 # Tilt
ppico = 4.62 # Peak power, kW

# Data acquisition

## Weather data for the previous 5 days

These data come from OpenWeather, through a student licence that allows a large number of calls per day (https://openweathermap.org/api/one-call-api#history). **Data are in UTC.** The hourly weather data for the 5 days preceding the call are retrieved. The fields obtained are:

- ``dt``: Time of historical data, Unix, UTC
- ``temp``: Temperature. Units: kelvin
- ``feels_like``: Temperature. This accounts for the human perception of weather. Units: kelvin
- ``pressure``: Atmospheric pressure on the sea level, hPa
- ``humidity``: Humidity, %
- ``dew_point``: Atmospheric temperature below which water droplets begin to condense and dew can form. Units: kelvin
- ``clouds``: Cloudiness, %
- ``visibility``: Average visibility, metres
- ``wind_speed``: Wind speed. Units: m/s
- ``wind_gust``: Wind gust. Units: m/s
- ``wind_deg``: Wind direction, degrees (meteorological)
- ``rain``: Precipitation volume, mm
- ``snow``: Snow volume, mm
- ``weather``: Includes an id and other parameters
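
Two of these fields usually need conversion before they are read by a person: ``temp`` is in kelvin and ``dt`` is a Unix timestamp. The sketch below shows both conversions on illustrative values; the helper names are hypothetical, and the date and hour formats mirror the ``strftime`` patterns used in the notebook.

```python
from datetime import datetime, timezone

def kelvin_to_celsius(temp_k):
    return temp_k - 273.15

def split_dt(dt_unix):
    # Interpret the Unix timestamp as UTC and split it into date and hour strings
    moment = datetime.fromtimestamp(dt_unix, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d"), moment.strftime("%H:%M")

# Illustrative values
temp_c = kelvin_to_celsius(295.88)
day, hour = split_dt(1623729600)
print(round(temp_c, 2), day, hour)
```

Passing ``tz=timezone.utc`` keeps the result independent of the time zone of the machine running the notebook.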

In [23]:
try:
    
    # API key
    api_key = "f21448c171f8f0584b48b3c51c9b6cd6"
        
    df_clima_ow_total = pd.DataFrame()
    
    # For each historical day (5 previous days)
    for retardo in range(0,5):


        dia = date.today() + timedelta(days = -retardo)
        dia = "{}-{}-{}".format(dia.year, str(dia.month).zfill(2), str(dia.day).zfill(2))
        print("fecha: {}".format(date.today() + timedelta(days = -retardo)))

        dia = datetime.strptime(dia, "%Y-%m-%d")

        # Convert the datetime to a Unix timestamp (a naive datetime is interpreted in local time)
        dia_unix = int(datetime.timestamp(dia))

        url = "https://api.openweathermap.org/data/2.5/onecall/timemachine?lat={}&lon={}&dt={}&appid={}".format(lat, lon, dia_unix, api_key)

        print("url: {}".format(url))
        response = get_response_OW(url)
            
            
        # Extract the data from the response
        response = json.loads(response)
        df_clima_ow = pd.json_normalize(response["hourly"])
            
        # Build the extra columns
            
        # Weather condition id
        df_we = []
        # Date the data correspond to
        df_time = []
        # Hour
        df_hour = []
        # Station ID
        df_estacion = []
        # Date the data were retrieved
        df_fecha = []
        for m in range(0,24):
            df_we.append(df_clima_ow["weather"][m][0]["id"])
            df_estacion.append(str(str(lat)+str(lon)))
            df_time.append(datetime.utcfromtimestamp(int(df_clima_ow["dt"][m])).strftime('%Y-%m-%d'))
            df_hour.append(datetime.utcfromtimestamp(int(df_clima_ow["dt"][m])).strftime('%H:%M')) 
            df_fecha.append(fecha)
            
        df_we = pd.DataFrame(df_we, columns=['we']) 
        df_estacion = pd.DataFrame(df_estacion, columns=['estacion'])
        df_time = pd.DataFrame(df_time, columns=['date']) 
        df_hour = pd.DataFrame(df_hour, columns=['hour']) 
        df_fecha = pd.DataFrame(df_fecha, columns=['fecha prediccion']) 

        # Add the column with the weather condition id
        df_clima_ow = pd.concat([df_clima_ow, df_we], axis=1)
        # Drop the weather column, which holds further indicators
        df_clima_ow = df_clima_ow.drop("weather", axis = 1)
            
        # Add the column with the station ID
        df_clima_ow = pd.concat([df_estacion, df_clima_ow], axis=1)

        # Add the retrieval date and the date and hour each row corresponds to
        # Drop the dt column they were derived from
        df_time = pd.concat([df_time, df_hour, df_fecha], axis=1)
        df_clima_ow = pd.concat([df_time, df_clima_ow], axis=1)
        df_clima_ow = df_clima_ow.drop("dt", axis = 1)
        
        
        df_clima_ow_total = pd.concat([df_clima_ow_total, df_clima_ow], ignore_index = True)
    
    df_clima_ow = df_clima_ow_total

except:
    print("Fallo clima")
    fallo = 1

fecha: 2021-06-16
url: https://api.openweathermap.org/data/2.5/onecall/timemachine?lat=41.29277777777778&lon=2.0700000000000003&dt=1623794400&appid=f21448c171f8f0584b48b3c51c9b6cd6
Exito
fecha: 2021-06-15
url: https://api.openweathermap.org/data/2.5/onecall/timemachine?lat=41.29277777777778&lon=2.0700000000000003&dt=1623708000&appid=f21448c171f8f0584b48b3c51c9b6cd6
Exito
fecha: 2021-06-14
url: https://api.openweathermap.org/data/2.5/onecall/timemachine?lat=41.29277777777778&lon=2.0700000000000003&dt=1623621600&appid=f21448c171f8f0584b48b3c51c9b6cd6
Exito
fecha: 2021-06-13
url: https://api.openweathermap.org/data/2.5/onecall/timemachine?lat=41.29277777777778&lon=2.0700000000000003&dt=1623535200&appid=f21448c171f8f0584b48b3c51c9b6cd6
Exito
fecha: 2021-06-12
url: https://api.openweathermap.org/data/2.5/onecall/timemachine?lat=41.29277777777778&lon=2.0700000000000003&dt=1623448800&appid=f21448c171f8f0584b48b3c51c9b6cd6
Exito
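
The ``dt`` values in the request URLs above can be decoded to see which instant was actually requested. The sketch below decodes the first one, the request labelled 2021-06-16. Since ``datetime.timestamp()`` treats a naive datetime as local time, the result suggests the notebook ran on a machine two hours ahead of UTC; that is an inference from the output, not something the notebook states.

```python
from datetime import datetime, timezone

# dt value taken from the first request URL in the output above
dt_requested = 1623794400

# Decode the timestamp explicitly as UTC
as_utc = datetime.fromtimestamp(dt_requested, tz=timezone.utc)
print(as_utc.isoformat())
```

The decoded instant is 2021-06-15T22:00:00+00:00, that is, local midnight expressed in UTC falls on the previous calendar day.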


## Weather forecasts for the next 2 days

These data come from OpenWeather, through a student licence that allows a large number of calls per day (https://openweathermap.org/api/one-call-api). **Data are in UTC.** The hourly weather forecast for the 2 days following the call is retrieved. The fields obtained are:

- ``dt``: Time of the forecasted data, Unix, UTC
- ``temp``: Temperature. Units: kelvin
- ``feels_like``: Temperature. This accounts for the human perception of weather. Units: kelvin
- ``pressure``: Atmospheric pressure on the sea level, hPa
- ``humidity``: Humidity, %
- ``dew_point``: Atmospheric temperature (varying according to pressure and humidity) below which water droplets begin to condense and dew can form. Units: kelvin
- ``uvi``: UV index
- ``clouds``: Cloudiness, %
- ``visibility``: Average visibility, metres
- ``wind_speed``: Wind speed. Units: m/s
- ``wind_gust``: Wind gust. Units: m/s
- ``wind_deg``: Wind direction, degrees (meteorological)
- ``pop``: Probability of precipitation
- ``rain``: Rain volume for last hour, mm
- ``snow``: Snow volume for last hour, mm
- ``weather``: Includes an id and other parameters

In [24]:
try:
    
    api_key= "f21448c171f8f0584b48b3c51c9b6cd6"

    exclude = "current,minutely,daily,alerts"

    url = "https://api.openweathermap.org/data/2.5/onecall?lat={}&lon={}&exclude={}&appid={}".format(lat, lon, exclude, api_key)


    response = get_response_OW(url)
    response = json.loads(response)
    df_pred_ow = pd.json_normalize(response["hourly"])


    # Build the extra columns
            
    # Weather condition id
    df_we = []
    # Date the data correspond to
    df_time = []
    # Hour
    df_hour = []
    # Station ID
    df_estacion = []
    # Date the data were retrieved
    df_fecha = []
    for m in range(0,48):
        df_we.append(df_pred_ow["weather"][m][0]["id"])
        df_estacion.append(str(str(lat)+str(lon)))
        df_time.append(datetime.utcfromtimestamp(int(df_pred_ow["dt"][m])).strftime('%Y-%m-%d'))
        df_hour.append(datetime.utcfromtimestamp(int(df_pred_ow["dt"][m])).strftime('%H:%M'))
        df_fecha.append(fecha)
    df_we = pd.DataFrame(df_we, columns=['we'])  
    df_estacion = pd.DataFrame(df_estacion, columns=['estacion'])
    df_time = pd.DataFrame(df_time, columns=['date']) 
    df_hour = pd.DataFrame(df_hour, columns=['hour'])
    df_fecha = pd.DataFrame(df_fecha, columns=['fecha prediccion']) 
        
    # Add the weather condition id and drop the weather column, which holds more values
    df_pred_ow = pd.concat([df_pred_ow, df_we], axis = 1)
    df_pred_ow = df_pred_ow.drop("weather", axis = 1)
        
    # Add the station ID
    df_pred_ow = pd.concat([df_estacion, df_pred_ow], axis=1)

    # Add the date of the request and the date and hour each row corresponds to
    # Drop the dt column the values were derived from
    df_time = pd.concat([df_time, df_hour, df_fecha], axis=1)
    df_pred_ow = pd.concat([df_time, df_pred_ow], axis=1)
    df_pred_ow = df_pred_ow.drop("dt", axis = 1)


except:
    print("Fallo pred")
    fallo = 1

Exito


## Radiation for the previous day

**These data are only available for the radiation stations.**

Hourly accumulated data (**TRUE SOLAR TIME**) for global, direct, diffuse, and infrared radiation. These data come from the AEMET OpenData portal (https://opendata.aemet.es/centrodedescargas/productosAEMET). The fields obtained for each day are:

- ``Estación``: Station name
- ``Indicativo``: Climatological identifier of the station
- ``Tipo``: Measured variable (Global/Diffuse/Direct/Erythemal UV/Infrared)
- ``GL/DF/DT``: Hourly radiation accumulated between (indicated hour - 1) and (indicated hour), from 5 to 20 True Solar Time. Variables: Global/Diffuse/Direct (10*kJ/m²)
- ``UVER``: Half-hourly radiation accumulated between (indicated hour:minutes - 30 minutes) and (indicated hour:minutes), from 4:30 to 20 True Solar Time. Variable: Erythemal ultraviolet radiation (J/m²)
- ``IR``: Hourly radiation accumulated between (indicated hour - 1) and (indicated hour), from 1 to 24 True Solar Time. Variable: Infrared radiation (10*kJ/m²)
- ...
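
The AEMET radiation fields are given in 10*kJ/m², whereas the CAMS fields described later in this notebook are in Wh/m². The sketch below shows a possible conversion between the two. It assumes that a reported value ``v`` stands for ``v`` × 10 kJ/m², which is an interpretation of the unit label rather than something stated in the source; the function name and the sample reading are hypothetical.

```python
KJ_PER_WH = 3.6  # 1 Wh = 3600 J = 3.6 kJ

def aemet_to_wh_m2(value_10kj):
    # Assumption: the AEMET figure is expressed in tens of kJ/m²
    return value_10kj * 10 / KJ_PER_WH

# Illustrative reading, not taken from the data
print(aemet_to_wh_m2(180))
```

Under that assumption a reading of 180 corresponds to 1800 kJ/m², or about 500 Wh/m².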

No time conversion is needed because true solar time is approximately equal to UTC (https://relojesdesol.info/node/748).
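
As a rough check of that approximation, the Earth turns 360° in 24 hours, so local mean solar time shifts by 4 minutes per degree of longitude. The sketch below applies this to the installation's longitude. It deliberately ignores the equation of time, a seasonal correction of up to roughly a quarter of an hour, so it is an estimate and not a full true-solar-time calculation; the function name is hypothetical.

```python
MINUTES_PER_DEGREE = 24 * 60 / 360  # the Earth turns 360° in 24 h

def mean_solar_offset_minutes(longitude_deg):
    # Positive east of Greenwich: local mean solar time is ahead of UTC
    return longitude_deg * MINUTES_PER_DEGREE

# Longitude of the installation used in this notebook
print(mean_solar_offset_minutes(2.07))
```

For a longitude of 2.07° the offset is a little over 8 minutes, small compared with the hourly resolution of the data.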

In [25]:
try:
    import csv
    
    api_key = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJhbGVqYW5kcm8ucnVpei5iZXJjaWFub0BnbWFpbC5jb20iLCJqdGkiOiI2NDNmZjZmMi04OTQyLTQ1YzYtODIxNC0yZGU4NmQzMDU0NWYiLCJpc3MiOiJBRU1FVCIsImlhdCI6MTYxMzQ3NjEwNywidXNlcklkIjoiNjQzZmY2ZjItODk0Mi00NWM2LTgyMTQtMmRlODZkMzA1NDVmIiwicm9sZSI6IiJ9.CCEfI4NjKp9kiTCFsNLQFB-u_oLhcXJTEtdHluoToe8"

    url_base = "https://opendata.aemet.es/opendata/api"

    estaciones_url = "red/especial/radiacion"

    resp = get_response_aemet(url_base, estaciones_url, api_key)

    # Process the data
    datos_rad = resp[32:]

    lines = datos_rad.splitlines()
    fecha_lines = (resp.splitlines()[1])
    fecha_lines = fecha_lines[1:len(fecha_lines)-1]

    reader = csv.reader(lines)
    parsed_csv = list(reader)

    titulos = [palabra.strip() for palabra in parsed_csv[0][0].replace(';', ', ').replace("\"", "").split(",")]
    filas = [[palabra.strip() for palabra in fila[0].replace(';', ', ').replace("\"", "").split(",")] for fila in parsed_csv[1:]]
    
    
    # Stations 9 and 16 have their name split in two, so the strings must be joined
    filas[15][0] = filas[15][0] + filas[15][1]
    filas[15].pop(1)

    
    # Build the dataframe and correct one station identifier
    df_rad_aemet = pd.DataFrame(columns = titulos, data = filas)
    df_rad_aemet.loc[df_rad_aemet['Indicativo'] == '6156', 'Indicativo'] = '6156X'

    # Add a column with the date the data correspond to
    # Also make sure that stations present in the station list carry the same name
    df_fecha = []
    for i in range(0, len(df_rad_aemet['Indicativo'])):
        for j in range(0, len(df_estaciones['indicativo'])):
            if (df_rad_aemet['Indicativo'][i] == df_estaciones['indicativo'][j]):
                df_rad_aemet.loc[df_rad_aemet['Indicativo'] == df_estaciones['indicativo'][j], 'Estación'] = df_estaciones['nombre'][j]
        for j in range(0, len(titulos)):
            if(df_rad_aemet.iloc[i][j] == ""):
                continue
        df_fecha.append(fecha)

    df_fecha = pd.DataFrame(df_fecha, columns=['fecha']) 
    df_rad_aemet = pd.concat([df_fecha, df_rad_aemet], axis=1)

except:
    print("Fallo aemet")
    fallo = 1

Exito


## Radiation data from two days earlier

These data come from the European Union's CAMS Radiation Service (http://www.soda-pro.com/web-services/radiation/cams-radiation-service). **Times are in UTC.** It provides the radiation for any date up to 2 days before the call (a 3-day delay). The fields obtained for each day are:

- ``Observation period``: Beginning/end of the time period with the format "yyyy-mm-ddTHH:MM:SS.S/yyyy-mm-ddTHH:MM:SS.S"
- ``TOA``: Irradiation on horizontal plane at the top of atmosphere (Wh/m2) computed from Solar Geometry 2
- ``Clear sky GHI``: Clear sky global irradiation on horizontal plane at ground level (Wh/m2)
- ``Clear sky BHI``: Clear sky beam irradiation on horizontal plane at ground level (Wh/m2)
- ``Clear sky DHI``: Clear sky diffuse irradiation on horizontal plane at ground level (Wh/m2)
- ``Clear sky BNI``: Clear sky beam irradiation on mobile plane following the sun at normal incidence (Wh/m2)
- ``GHI``: Global irradiation on horizontal plane at ground level (Wh/m2)
- ``BHI``: Beam irradiation on horizontal plane at ground level (Wh/m2)
- ``DHI``: Diffuse irradiation on horizontal plane at ground level (Wh/m2)
- ``BNI``: Beam irradiation on mobile plane following the sun at normal incidence (Wh/m2)
- ``Reliability``: Proportion of reliable data in the summarization (0-1)
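
The sketch below shows how one row in this layout can be split into typed values. The sample text is made up for illustration: it only imitates the structure described above, with header lines starting with ``#`` followed by ``;``-separated rows, and the numbers are not real CAMS output.

```python
from datetime import datetime

# Made-up sample in the CAMS layout: '#' header lines, then ';'-separated rows
sample = (
    "# Coding: utf-8\n"
    "# Observation period;TOA;Clear sky GHI;Clear sky BHI;Clear sky DHI;"
    "Clear sky BNI;GHI;BHI;DHI;BNI;Reliability\n"
    "2021-06-14T10:00:00.0/2021-06-14T11:00:00.0;"
    "1100.0;800.0;650.0;150.0;850.0;780.0;630.0;150.0;820.0;1.0\n"
)

FMT = "%Y-%m-%dT%H:%M:%S.%f"  # matches "yyyy-mm-ddTHH:MM:SS.S"
rows = []
for line in sample.splitlines():
    if line.startswith("#"):
        continue  # header line, no data
    fields = line.split(";")
    # The observation period holds the begin and end instants separated by '/'
    begin, end = (datetime.strptime(t, FMT) for t in fields[0].split("/"))
    rows.append({
        "begin": begin,
        "end": end,
        "ghi": float(fields[6]),
        "reliability": float(fields[10]),
    })

print(rows[0])
```

Converting the fields to ``datetime`` and ``float`` at parse time avoids carrying string columns into later steps.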

In [26]:
try:
    
    import math
    import time
    from bs4 import BeautifulSoup
    
    dia = date.today() + timedelta(days = -2)
    fecha_buscar = "{}-{}-{}".format(dia.year, str(dia.month).zfill(2), str(dia.day).zfill(2))
    print(fecha_buscar)
    fecha_ini = fecha_buscar
    fecha_fin = fecha_buscar    

    print(lat, lon)

    correo = 'alejandro.ruiz.berciano%2540gmail.com'
        
    url = 'http://www.soda-is.com/service/wps?Service=WPS&Request=Execute&Identifier=get_cams_radiation&version=1.0.0&DataInputs=latitude={};longitude={};altitude=-999;date_begin={};date_end={};time_ref=UT;summarization=PT01H;username={}&RawDataOutput=irradiation'.format(lat, lon, fecha_ini, fecha_fin, correo)
    print(url)

    response = requests.get(url)
        
    # Convert the response to text and determine how many lines precede the data
    soup = BeautifulSoup(response.content, "html.parser")

    f = soup.text
    nbTotalLines = 0
    nbLinesToSkip = 0
    nbTotalLines, nbLinesToSkip = openAndSkipLines(f, '#')

    if(nbTotalLines < 0):
        print('No hay datos')
        exit()
    sizeData = nbTotalLines - nbLinesToSkip
        
    # Create the dataframe and add a column with the station ID
    df_soda = getCamsData(f, nbLinesToSkip)
    df_soda.insert(len(df_soda.columns),"estacion",list(np.repeat([str(str(lat)+str(lon))], len(df_soda["dateEnds"]))),True)
    
except:
    print("Fallo soda")
    fallo = 1

2021-06-14
41.29277777777778 2.0700000000000003
http://www.soda-is.com/service/wps?Service=WPS&Request=Execute&Identifier=get_cams_radiation&version=1.0.0&DataInputs=latitude=41.29277777777778;longitude=2.0700000000000003;altitude=-999;date_begin=2021-06-14;date_end=2021-06-14;time_ref=UT;summarization=PT01H;username=alejandro.ruiz.berciano%2540gmail.com&RawDataOutput=irradiation


# Data cleaning

## Weather data for the previous 5 days

These data come from OpenWeather, through a student licence that allows a large number of calls per day (https://openweathermap.org/api/one-call-api#history). **Data are in UTC.** The hourly weather data for the 5 days preceding the call are retrieved. The fields obtained are:

- ``dt``: Time of historical data, Unix, UTC
- ``temp``: Temperature. Units: kelvin
- ``feels_like``: Temperature. This accounts for the human perception of weather. Units: kelvin
- ``pressure``: Atmospheric pressure on the sea level, hPa
- ``humidity``: Humidity, %
- ``dew_point``: Atmospheric temperature below which water droplets begin to condense and dew can form. Units: kelvin
- ``clouds``: Cloudiness, %
- ``visibility``: Average visibility, metres
- ``wind_speed``: Wind speed. Units: m/s
- ``wind_gust``: Wind gust. Units: m/s
- ``wind_deg``: Wind direction, degrees (meteorological)
- ``rain``: Precipitation volume, mm
- ``snow``: Snow volume, mm
- ``we``: An id that indicates the type of weather

Hour X contains the data recorded between X:00 and X:59.
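
Because hour X covers X:00 to X:59, keeping the hours in the half-open range from ``hora_ini`` to ``hora_fin`` retains the data from 04:00 to 19:59 with the values set at the start of the notebook. The sketch below reproduces that filter on plain ``"HH:MM"`` labels; ``hour_of`` is a hypothetical helper that mirrors the ``str(c)[:2]`` conversion used in the cleaning functions.

```python
hora_ini = 4
hora_fin = 20

def hour_of(label):
    # "HH:MM" -> integer hour, taking the first two characters
    return int(label[:2])

# One label per hour of the day: "00:00", "01:00", ..., "23:00"
labels = ["{:02d}:00".format(h) for h in range(24)]

# Keep hours with hora_ini <= hour < hora_fin
kept = [hour_of(label) for label in labels if hora_ini <= hour_of(label) < hora_fin]
print(kept[0], kept[-1], len(kept))
```

The filter keeps 16 hourly rows per day, from hour 4 to hour 19 inclusive.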

In [27]:
def clima_ow_clean(df_datos):
    
    # Convert the columns to the correct data types

    df_datos["hour"] = pd.to_numeric([np.nan if pd.isna(c) else str(c)[:2] for c in df_datos["hour"]])
    # Keep the hours from hora_ini (inclusive) to hora_fin (exclusive); copy so later edits do not act on a slice
    df_datos = df_datos[(df_datos["hour"] < hora_fin) & (df_datos["hour"] >= hora_ini)].copy()
    df_datos.reset_index(drop=True, inplace=True)
    
    # Fill missing values
    
    try:
        df_datos.fillna({'visibility': df_datos["visibility"].mean(), 'wind_gust': df_datos["wind_gust"].mean()}, inplace = True)
    except Exception:
        df_datos.fillna({'visibility': 0, 'wind_gust': 0}, inplace = True)
        
    # Drop the precipitation columns if they are present
    df_datos = df_datos.drop(columns = ['rain.1h', 'snow.1h'], errors = 'ignore')
    
    df_datos = df_datos.fillna(0)
    
    # Drop duplicated rows
    
    df_datos = df_datos.drop_duplicates(['date', 'hour', "fecha prediccion", "estacion"],
                        keep = 'first')
    df_datos.reset_index(drop = True, inplace = True)
    
    return df_datos

In [28]:
# Call the function
try:
    df_clima_clean = clima_ow_clean(df_clima_ow)
except:
    fallo = 1
    
df_clima_clean.head()



Unnamed: 0,date,hour,fecha prediccion,estacion,temp,feels_like,pressure,humidity,dew_point,clouds,visibility,wind_speed,wind_deg,wind_gust,we
0,2021-06-15,4,2021-06-16,41.292777777777782.0700000000000003,295.88,293.18,1015,46,283.67,0,10000.0,4.12,340,2.283,800
1,2021-06-15,5,2021-06-16,41.292777777777782.0700000000000003,295.89,295.99,1016,53,285.83,20,10000.0,1.03,60,2.283,801
2,2021-06-15,6,2021-06-16,41.292777777777782.0700000000000003,297.29,298.0,1017,73,292.15,0,10000.0,3.6,240,2.283,800
3,2021-06-15,7,2021-06-16,41.292777777777782.0700000000000003,298.87,298.18,1017,47,286.72,0,10000.0,2.57,340,2.283,800
4,2021-06-15,8,2021-06-16,41.292777777777782.0700000000000003,301.19,300.1,1017,32,282.97,0,10000.0,1.54,260,2.283,800


## Weather forecasts for the next 2 days

These data come from the OpenWeather One Call API (https://openweathermap.org/api/one-call-api), accessed under a student license that allows a large number of calls per day. **Data are in UTC.** The hourly weather forecast for the 2 days following the call is retrieved. The fields obtained are:

- ``dt``: Time of the forecasted data, Unix, UTC
- ``temp``: Temperature. Units: kelvin
- ``feels_like``: Temperature. This accounts for the human perception of weather. Units: kelvin
- ``pressure``: Atmospheric pressure on the sea level, hPa
- ``humidity``: Humidity, %
- ``dew_point``: Atmospheric temperature (varying according to pressure and humidity) below which water droplets begin to condense and dew can form. Units: kelvin
- ``uvi``: UV index
- ``clouds``: Cloudiness, %
- ``visibility``: Average visibility, metres
- ``wind_speed``: Wind speed. Units: m/s
- ``wind_gust``: Wind gust. Units: m/s
- ``wind_deg``: Wind direction, degrees (meteorological)
- ``pop``: Probability of precipitation
- ``rain``: Rain volume for last hour, mm
- ``snow``: Snow volume for last hour, mm
- ``weather``: Includes an id and other parameters

Hour X covers the data recorded between X:00 and X:59.
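
As an illustration of the hour convention, the `dt` field (Unix seconds, UTC) can be turned into a date and an hour bucket with pandas. This is a sketch with hypothetical timestamps; the column names `date` and `hour` mirror the ones used in this notebook.

```python
import pandas as pd

# Hypothetical `dt` values: 13:00:00, 13:59:59 and 14:00:00 UTC on 2021-06-16.
forecast = pd.DataFrame({"dt": [1623848400, 1623851999, 1623852000]})

ts = pd.to_datetime(forecast["dt"], unit="s", utc=True)
forecast["date"] = ts.dt.strftime("%Y-%m-%d")
forecast["hour"] = ts.dt.hour  # 13:00:00 to 13:59:59 all fall in hour 13

print(forecast)
```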

In [29]:
def pred_clean(df_datos):
    
    # Convert the columns to the correct data types
    
    df_datos["hour"] = pd.to_numeric([np.nan if pd.isna(c) else str(c)[:2] for c in df_datos["hour"]])
    df_datos = df_datos[(df_datos["hour"] < hora_fin) & (df_datos["hour"] >= hora_ini)]
    df_datos.reset_index(drop=True, inplace=True)
    
    # Drop the hourly rain and snow columns when they are present, then fill the remaining NaNs with 0
    
    df_datos.drop(columns = ['rain.1h', 'snow.1h'], errors = 'ignore', inplace = True)
    
    df_datos = df_datos.fillna(0)
    
    # Drop possible duplicate rows
    
    df_datos = df_datos.drop_duplicates(['date', 'hour', "fecha prediccion", "estacion"],
                        keep = 'first')
    df_datos.reset_index(drop = True, inplace = True)
    
    return df_datos

In [30]:
# Call the function
try:
    df_pred_clean = pred_clean(df_pred_ow)
except Exception:
    fallo = 1
    
df_pred_clean.head()

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().drop(


Unnamed: 0,date,hour,fecha prediccion,estacion,temp,feels_like,pressure,humidity,dew_point,uvi,clouds,visibility,wind_speed,wind_deg,wind_gust,pop,we
0,2021-06-16,13,2021-06-16,41.292777777777782.0700000000000003,301.65,302.74,1013,55,291.74,8.05,9,10000,4.17,110,5.27,0.0,800
1,2021-06-16,14,2021-06-16,41.292777777777782.0700000000000003,302.28,303.24,1013,52,291.42,6.5,0,10000,3.99,103,4.76,0.0,800
2,2021-06-16,15,2021-06-16,41.292777777777782.0700000000000003,301.58,302.64,1013,55,291.67,4.52,16,10000,2.07,151,2.55,0.0,801
3,2021-06-16,16,2021-06-16,41.292777777777782.0700000000000003,301.02,302.1,1013,57,291.72,2.02,34,10000,2.6,208,3.17,0.0,802
4,2021-06-16,17,2021-06-16,41.292777777777782.0700000000000003,300.17,301.18,1012,59,291.48,0.91,53,10000,1.1,151,1.47,0.0,803


## Radiation data for the previous day

**These data are only available for the radiation stations.**

Hourly accumulated data (**TRUE SOLAR TIME**) for global, direct, diffuse and infrared radiation. These data come from the AEMET OpenData portal (https://opendata.aemet.es/centrodedescargas/productosAEMET). The fields obtained for each day are:

- ``Estación``: Station name
- ``Indicativo``: Climatological station identifier
- ``Tipo``: Measured variable (Global/Diffuse/Direct/Erythemal UV/Infrared)
- ``GL/DF/DT``: Hourly radiation accumulated between (indicated hour - 1) and (indicated hour), from 5 to 20 True Solar Time. Variables: Global/Diffuse/Direct (10*kJ/m²)
- ``UVER``: Half-hourly radiation accumulated between (indicated hour:minutes - 30 minutes) and (indicated hour:minutes), from 4:30 to 20 True Solar Time. Variable: Erythemal ultraviolet radiation (J/m²)
- ``IR``: Hourly radiation accumulated between (indicated hour - 1) and (indicated hour), from 1 to 24 True Solar Time. Variable: Infrared radiation (10*kJ/m²)
- ...

Hour X covers the data recorded between (X-1):00 and X:00.
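
The units and the hour labelling above imply two small transformations: converting the accumulated energy to Wh/m² (1 Wh = 3.6 kJ = 3600 J) and relabelling AEMET hour X as X-1, so that it matches the X:00 to X:59 convention of the other sources. A minimal sketch follows; the helper names and raw readings are hypothetical.

```python
def ten_kj_to_wh(value_10kj):
    """Convert a reading in units of 10 kJ/m² to Wh/m² (1 Wh = 3.6 kJ)."""
    return value_10kj * 10 / 3.6

def j_to_wh(value_j):
    """Convert a reading in J/m² to Wh/m² (1 Wh = 3600 J)."""
    return value_j / 3600

def aemet_hour_to_start_hour(hora_aemet):
    """AEMET labels the interval from (X-1):00 to X:00 with X; return its starting hour."""
    return hora_aemet - 1

# Hypothetical raw readings: GL = 5 (10 kJ/m²) and UVER = 20 J/m² for AEMET hour 5.
print(round(ten_kj_to_wh(5), 6))    # 13.888889
print(round(j_to_wh(20), 6))        # 0.005556
print(aemet_hour_to_start_hour(5))  # 4
```

Energy accumulated over one hour in Wh/m² is numerically equal to the mean irradiance over that hour in W/m².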

In [31]:
def rad_aemet_clean(df_datos):
    
    # Rename repeated column names so each repetition gets a suffix (.1, .2, ...)
    columnas = []
    for i in df_datos.columns:

        if (i+".3") in columnas:
            columnas.append(i+".4")
        elif (i+".2") in columnas:
            columnas.append(i+".3")
        elif (i+".1") in columnas:
            columnas.append(i+".2")
        elif i in columnas:
            columnas.append(i+".1")
        else:
            columnas.append(i) 
    df_datos.columns = columnas

    
    # Build the hourly dataset
    
    hora_ini_aemet = 5
    hora_fin_aemet = 21
    filas = []

    for i, fila in df_datos.iterrows():

        for hora in range(hora_ini_aemet, hora_fin_aemet):
            col_gl = str(hora)
            col_uvb = str(hora) + ".3"
            col_uvb_2 = str(hora-1) + ".5"
            col_ir = str(hora) + ".4"
            # AEMET hour X covers (X-1):00 to X:00, so it is stored as hour X-1.
            # UVB adds the two half-hourly readings of that interval.
            filas.append({'fecha' : fila["fecha"], 'hora' : hora-1, 'estacion' : fila["Estación"], 'indicativo' : fila["Indicativo"], 'GL' : fila[col_gl], 'UVB' : (fila[col_uvb] + fila[col_uvb_2]), 'IR' : fila[col_ir]})

    # DataFrame.append was removed in pandas 2.0, so rows are collected and the frame is built once
    df_rad_horas = pd.DataFrame(filas, columns = ["fecha", "hora", "estacion", "indicativo", "GL", "UVB", "IR"])
    
    # Convert the columns to the correct data types
    
    df_rad_horas["hora"] = pd.to_numeric([np.nan if pd.isna(c) else int(c) for c in df_rad_horas["hora"]])
    for col in ["GL", "UVB", "IR"]:
        df_rad_horas[col] = pd.to_numeric([np.nan if pd.isna(c) or (c == "") else float(c) for c in df_rad_horas[col]])

    # Convert to Wh/m²: GL and IR come in 10*kJ/m², UVB in J/m²
    df_rad_horas["GL"] = df_rad_horas["GL"] *10/3.6
    df_rad_horas["UVB"] = df_rad_horas["UVB"] *1/(3.6*1000)
    df_rad_horas["IR"] = df_rad_horas["IR"] *10/3.6
    
    # Fill NaNs: column means for IR and UVB, 0 for everything else
    
    try:
        df_rad_horas.fillna({'IR': df_rad_horas["IR"].mean(), "UVB": df_rad_horas["UVB"].mean()}, inplace = True)
    except Exception:
        df_rad_horas.fillna({'IR': 0, "UVB": 0}, inplace = True)
    
    df_rad_horas = df_rad_horas.fillna(0)
    
    # Drop possible duplicate rows
    
    df_rad_horas = df_rad_horas.drop_duplicates(['fecha', 'hora', "indicativo"], keep = 'first')
    df_rad_horas.reset_index(drop = True, inplace = True)

    
    return df_rad_horas

In [32]:
# Call the function
try:
    df_aemet_clean = rad_aemet_clean(df_rad_aemet)
except Exception:
    fallo = 1
    
df_aemet_clean.head()

Unnamed: 0,fecha,hora,estacion,indicativo,GL,UVB,IR
0,2021-06-16,4,A CORUÑA,1387,13.888889,0.005556,338.888889
1,2021-06-16,5,A CORUÑA,1387,119.444444,0.052222,341.666667
2,2021-06-16,6,A CORUÑA,1387,291.666667,1.536944,338.888889
3,2021-06-16,7,A CORUÑA,1387,480.555556,3.328889,338.888889
4,2021-06-16,8,A CORUÑA,1387,655.555556,56.155278,341.666667


## Radiation data from two days earlier

These data come from the European Union's CAMS Radiation Service portal (http://www.soda-pro.com/web-services/radiation/cams-radiation-service). **In UTC.** It provides radiation for any date up to 2 days before the call (a 3-day delay). The fields obtained for each day are:

- ``Observation period``: Beginning/end of the time period with the format "yyyy-mm-ddTHH:MM:SS.S/yyyy-mm-ddTHH:MM:SS.S"
- ``TOA``: Irradiation on horizontal plane at the top of atmosphere (Wh/m2) computed from Solar Geometry 2
- ``Clear sky GHI``: Clear sky global irradiation on horizontal plane at ground level (Wh/m2)
- ``Clear sky BHI``: Clear sky beam irradiation on horizontal plane at ground level (Wh/m2)
- ``Clear sky DHI``: Clear sky diffuse irradiation on horizontal plane at ground level (Wh/m2)
- ``Clear sky BNI``: Clear sky beam irradiation on mobile plane following the sun at normal incidence (Wh/m2)
- ``GHI``: Global irradiation on horizontal plane at ground level (Wh/m2)
- ``BHI``: Beam irradiation on horizontal plane at ground level (Wh/m2)
- ``DHI``: Diffuse irradiation on horizontal plane at ground level (Wh/m2)
- ``BNI``: Beam irradiation on mobile plane following the sun at normal incidence (Wh/m2)
- ``Reliability``: Proportion of reliable data in the summarization (0-1)

Hour X covers the data recorded between X:00 and X:59.
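
The ``Observation period`` field packs the start and end of each interval into one string. A minimal sketch of how it can be split into a start timestamp, an hour and a date string follows; the sample record is hypothetical, and the names `date_begins` and `date_ends` only echo the `dateBegins` and `dateEnds` columns used below.

```python
import pandas as pd

# Hypothetical record in the documented "begin/end" format.
period = "2021-06-14T04:00:00.0/2021-06-14T05:00:00.0"

begin_str, end_str = period.split("/")
date_begins = pd.to_datetime(begin_str)
date_ends = pd.to_datetime(end_str)

hora = date_begins.hour                   # the interval is labelled by its starting hour
fecha = date_begins.strftime("%Y-%m-%d")  # date as a yyyy-mm-dd string

print(hora, fecha, date_ends - date_begins)
```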

In [33]:
def soda_clean(df_datos):
    
    # Drop unneeded columns
    
    df_datos.drop(['dateEnds', 'toa', 'cs_ghi', 'cs_bhi', 'cs_dhi', 'cs_bni',
                   'bhi', 'dhi', 'bni', 'reliability'], axis=1, inplace = True)
    
    # Fill NaNs
    df_datos = df_datos.fillna(0)
    
    # Convert the columns to the correct data types
    
    df_datos['dateBegins'] = pd.to_datetime(df_datos['dateBegins'])
    df_datos = df_datos.rename(columns={'dateBegins':'date'})
    df_datos['hora'] = pd.to_datetime(df_datos['date']).dt.hour
    df_datos['fecha'] = [str(a)[0:10] for a in df_datos['date']]
    df_datos = df_datos[(df_datos["hora"] < hora_fin) & (df_datos["hora"] >= hora_ini)]
    df_datos.reset_index(drop = True, inplace = True)
    
    # Drop possible duplicate rows
    
    df_datos = df_datos.drop_duplicates(["date", 'fecha', 'hora', "estacion"], keep = 'first')
    df_datos.reset_index(drop = True, inplace = True)

    
    return df_datos

In [34]:
# Call the function
try:
    df_soda_clean = soda_clean(df_soda)
except Exception:
    fallo = 1
    
df_soda_clean.head()

Unnamed: 0,date,ghi,estacion,hora,fecha
0,2021-06-14 04:00:00,2.5472,41.292777777777782.0700000000000003,4,2021-06-14
1,2021-06-14 05:00:00,120.0472,41.292777777777782.0700000000000003,5,2021-06-14
2,2021-06-14 06:00:00,292.7581,41.292777777777782.0700000000000003,6,2021-06-14
3,2021-06-14 07:00:00,468.8891,41.292777777777782.0700000000000003,7,2021-06-14
4,2021-06-14 08:00:00,631.936,41.292777777777782.0700000000000003,8,2021-06-14


# Building rows with the days as columns

For each day on which data are collected, there must be exactly one row per hour and station. For example, for the historical weather data of the last 5 days, each day's values become columns attached to the hours of the day on which the data frame was downloaded (for each station and each hour from 4 to 19: columns with the previous day's data, columns with the data from the day before that, and so on).
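
The reshaping described above can be sketched without a row-by-row loop: compute how many days separate each observation from the call date, then give each group its own column suffix. This is an illustrative alternative under simplified assumptions (a toy frame with a single `temp` variable and a hypothetical station id `E1`), not the implementation used in the cells below.

```python
import pandas as pd

keys = ["hour", "fecha prediccion", "estacion"]

# Hypothetical toy data: two hours observed one and two days before the call date.
toy = pd.DataFrame({
    "date": ["2021-06-15", "2021-06-14", "2021-06-15", "2021-06-14"],
    "hour": [4, 4, 5, 5],
    "fecha prediccion": ["2021-06-16"] * 4,
    "estacion": ["E1"] * 4,
    "temp": [295.88, 296.38, 295.89, 296.67],
})

# Days between the call date and the observation date (1 -> d-1, 2 -> d-2, ...).
lag = (pd.to_datetime(toy["fecha prediccion"]) - pd.to_datetime(toy["date"])).dt.days

frames = {}
for n, group in toy.groupby(lag):
    values = group.drop(columns=keys + ["date"]).add_suffix(f"_d-{n}")
    frames[n] = pd.concat([group[keys], values], axis=1).reset_index(drop=True)

print(frames[1])
print(frames[2])
```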

## Historical weather data

In [35]:
df_clima_total = df_clima_clean
df_clima_total.head()

Unnamed: 0,date,hour,fecha prediccion,estacion,temp,feels_like,pressure,humidity,dew_point,clouds,visibility,wind_speed,wind_deg,wind_gust,we
0,2021-06-15,4,2021-06-16,41.292777777777782.0700000000000003,295.88,293.18,1015,46,283.67,0,10000.0,4.12,340,2.283,800
1,2021-06-15,5,2021-06-16,41.292777777777782.0700000000000003,295.89,295.99,1016,53,285.83,20,10000.0,1.03,60,2.283,801
2,2021-06-15,6,2021-06-16,41.292777777777782.0700000000000003,297.29,298.0,1017,73,292.15,0,10000.0,3.6,240,2.283,800
3,2021-06-15,7,2021-06-16,41.292777777777782.0700000000000003,298.87,298.18,1017,47,286.72,0,10000.0,2.57,340,2.283,800
4,2021-06-15,8,2021-06-16,41.292777777777782.0700000000000003,301.19,300.1,1017,32,282.97,0,10000.0,1.54,260,2.283,800


Rows are generated with all the days of each call. Columns are labelled by day (d-1, d-2, ...).

In [36]:
columnas_1 = [col for col in df_clima_total.columns[1:4]] + [str(col+"_d-1") for col in df_clima_total.columns[4:]]
columnas_2 = [col for col in df_clima_total.columns[1:4]] + [str(col+"_d-2") for col in df_clima_total.columns[4:]]
columnas_3 = [col for col in df_clima_total.columns[1:4]] + [str(col+"_d-3") for col in df_clima_total.columns[4:]]
columnas_4 = [col for col in df_clima_total.columns[1:4]] + [str(col+"_d-4") for col in df_clima_total.columns[4:]]
columnas_5 = [col for col in df_clima_total.columns[1:4]] + [str(col+"_d-5") for col in df_clima_total.columns[4:]]

In [37]:
df_clima_dias_1 = pd.DataFrame(columns = columnas_1)
df_clima_dias_2 = pd.DataFrame(columns = columnas_2)
df_clima_dias_3 = pd.DataFrame(columns = columnas_3)
df_clima_dias_4 = pd.DataFrame(columns = columnas_4)
df_clima_dias_5 = pd.DataFrame(columns = columnas_5)

for i, fila in df_clima_total.iterrows():
    
    if (i in list(range(0, len(df_clima_total["date"]), 5000))) | (i == len(df_clima_total["date"])-1):
        print("Procesando fila {} de {}".format(i, len(df_clima_total["date"])))
        print("La cantidad de filas de los datasets (aproximadamente 1/5) es {}".format(len(df_clima_dias_1["hour"])))

    
    # For each hourly row, find which day it belongs to and append it to the matching dataset
    if (pd.to_datetime(fila["fecha prediccion"]) - pd.to_datetime(fila["date"])).days == 1:
        df_clima_dias_1.loc[len(df_clima_dias_1["fecha prediccion"])] = [elem for elem in fila][1:]

    if (pd.to_datetime(fila["fecha prediccion"]) - pd.to_datetime(fila["date"])).days == 2:
        df_clima_dias_2.loc[len(df_clima_dias_2["fecha prediccion"])] = [elem for elem in fila][1:]
    
    if (pd.to_datetime(fila["fecha prediccion"]) - pd.to_datetime(fila["date"])).days == 3:
        df_clima_dias_3.loc[len(df_clima_dias_3["fecha prediccion"])] = [elem for elem in fila][1:]
        
    if (pd.to_datetime(fila["fecha prediccion"]) - pd.to_datetime(fila["date"])).days == 4:
        df_clima_dias_4.loc[len(df_clima_dias_4["fecha prediccion"])] = [elem for elem in fila][1:]
        
    if (pd.to_datetime(fila["fecha prediccion"]) - pd.to_datetime(fila["date"])).days == 5:
        df_clima_dias_5.loc[len(df_clima_dias_5["fecha prediccion"])] = [elem for elem in fila][1:]
        

df_clima_dias_1.head()

Procesando fila 0 de 80
La cantidad de filas de los datasets (aproximadamente 1/5) es 0
Procesando fila 79 de 80
La cantidad de filas de los datasets (aproximadamente 1/5) es 16


Unnamed: 0,hour,fecha prediccion,estacion,temp_d-1,feels_like_d-1,pressure_d-1,humidity_d-1,dew_point_d-1,clouds_d-1,visibility_d-1,wind_speed_d-1,wind_deg_d-1,wind_gust_d-1,we_d-1
0,4,2021-06-16,41.292777777777782.0700000000000003,295.88,293.18,1015,46,283.67,0,10000.0,4.12,340,2.283,800
1,5,2021-06-16,41.292777777777782.0700000000000003,295.89,295.99,1016,53,285.83,20,10000.0,1.03,60,2.283,801
2,6,2021-06-16,41.292777777777782.0700000000000003,297.29,298.0,1017,73,292.15,0,10000.0,3.6,240,2.283,800
3,7,2021-06-16,41.292777777777782.0700000000000003,298.87,298.18,1017,47,286.72,0,10000.0,2.57,340,2.283,800
4,8,2021-06-16,41.292777777777782.0700000000000003,301.19,300.1,1017,32,282.97,0,10000.0,1.54,260,2.283,800


The per-day datasets are joined on forecast date (`fecha prediccion`), station and hour, producing one row per call day, hour and station.

In [38]:
df_total = pd.merge(df_clima_dias_1, df_clima_dias_2, how = "inner", on = ["hour", "fecha prediccion", "estacion"])
df_total = pd.merge(df_total, df_clima_dias_3, how = "inner", on = ["hour", "fecha prediccion", "estacion"])
df_total = pd.merge(df_total, df_clima_dias_4, how = "inner", on = ["hour", "fecha prediccion", "estacion"])
df_total = pd.merge(df_total, df_clima_dias_5, how = "inner", on = ["hour", "fecha prediccion", "estacion"])
df_total.head()

Unnamed: 0,hour,fecha prediccion,estacion,temp_d-1,feels_like_d-1,pressure_d-1,humidity_d-1,dew_point_d-1,clouds_d-1,visibility_d-1,wind_speed_d-1,wind_deg_d-1,wind_gust_d-1,we_d-1,temp_d-2,feels_like_d-2,pressure_d-2,humidity_d-2,dew_point_d-2,clouds_d-2,visibility_d-2,wind_speed_d-2,wind_deg_d-2,wind_gust_d-2,we_d-2,temp_d-3,feels_like_d-3,pressure_d-3,humidity_d-3,dew_point_d-3,clouds_d-3,visibility_d-3,wind_speed_d-3,wind_deg_d-3,wind_gust_d-3,we_d-3,temp_d-4,feels_like_d-4,pressure_d-4,humidity_d-4,dew_point_d-4,clouds_d-4,visibility_d-4,wind_speed_d-4,wind_deg_d-4,wind_gust_d-4,we_d-4,temp_d-5,feels_like_d-5,pressure_d-5,humidity_d-5,dew_point_d-5,clouds_d-5,visibility_d-5,wind_speed_d-5,wind_deg_d-5,wind_gust_d-5,we_d-5
0,4,2021-06-16,41.292777777777782.0700000000000003,295.88,293.18,1015,46,283.67,0,10000.0,4.12,340,2.283,800,296.38,294.46,1017,53,286.28,0,10000.0,4.12,300,2.283,800,295.57,292.34,1020,53,285.53,0,10000.0,5.66,310,2.283,800,294.15,290.71,1018,64,287.09,0,10000.0,6.69,330,2.283,800,296.58,296.92,1017,57,287.58,32,10000.0,1.53,322,1.78,802
1,5,2021-06-16,41.292777777777782.0700000000000003,295.89,295.99,1016,53,285.83,20,10000.0,1.03,60,2.283,801,296.67,296.28,1017,53,286.55,0,10000.0,2.06,250,2.283,800,295.8,292.91,1020,56,286.58,0,10000.0,5.66,310,2.283,800,294.41,291.41,1019,64,287.33,0,10000.0,6.17,330,2.283,800,296.58,296.92,1017,57,287.58,32,10000.0,1.53,322,1.78,802
2,6,2021-06-16,41.292777777777782.0700000000000003,297.29,298.0,1017,73,292.15,0,10000.0,3.6,240,2.283,800,297.97,296.24,1017,50,286.85,0,10000.0,4.12,320,2.283,800,297.52,295.24,1020,53,287.33,0,10000.0,5.14,320,2.283,800,295.62,292.31,1019,60,287.48,0,10000.0,6.69,330,2.283,800,296.58,296.92,1017,57,287.58,32,10000.0,1.53,322,1.78,802
3,7,2021-06-16,41.292777777777782.0700000000000003,298.87,298.18,1017,47,286.72,0,10000.0,2.57,340,2.283,800,299.16,297.48,1017,47,286.98,0,10000.0,4.12,320,2.283,800,299.07,297.0,1020,47,286.9,0,10000.0,4.63,310,2.283,800,297.24,294.51,1019,53,287.07,0,10000.0,5.66,320,2.283,800,297.2,297.92,1017,56,287.89,37,10000.0,1.14,354,1.3,802
4,8,2021-06-16,41.292777777777782.0700000000000003,301.19,300.1,1017,32,282.97,0,10000.0,1.54,260,2.283,800,301.23,301.3,1018,47,288.87,0,10000.0,2.57,300,2.283,800,301.74,299.87,1020,39,286.44,0,10000.0,4.12,50,2.283,800,299.36,296.71,1020,41,285.08,0,10000.0,4.63,330,2.283,800,297.97,299.22,1017,54,288.04,36,10000.0,0.46,74,0.99,802
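
The chain of `pd.merge` calls above can also be written as a fold over a list of frames. The sketch below uses `functools.reduce` on hypothetical toy data (the helper `make_day` and station id `E1` are illustrative only).

```python
from functools import reduce

import pandas as pd

keys = ["hour", "fecha prediccion", "estacion"]

def make_day(n, temps):
    """Build a toy per-day frame with a single suffixed variable."""
    return pd.DataFrame({
        "hour": [4, 5],
        "fecha prediccion": ["2021-06-16"] * 2,
        "estacion": ["E1"] * 2,
        f"temp_d-{n}": temps,
    })

days = [make_day(1, [295.88, 295.89]), make_day(2, [296.38, 296.67]), make_day(3, [295.57, 295.80])]

# Inner joins keep only the (hour, date, station) keys present in every per-day frame.
wide = reduce(lambda left, right: pd.merge(left, right, how="inner", on=keys), days)
print(wide)
```

Because the join is `inner`, an hour missing from any single day disappears from the result, which is worth keeping in mind when a source returns incomplete days.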


In [39]:
df_clima_clean = df_total

## Weather forecast data

In [40]:
df_pred_total = df_pred_clean
df_pred_total.head()

Unnamed: 0,date,hour,fecha prediccion,estacion,temp,feels_like,pressure,humidity,dew_point,uvi,clouds,visibility,wind_speed,wind_deg,wind_gust,pop,we
0,2021-06-16,13,2021-06-16,41.292777777777782.0700000000000003,301.65,302.74,1013,55,291.74,8.05,9,10000,4.17,110,5.27,0.0,800
1,2021-06-16,14,2021-06-16,41.292777777777782.0700000000000003,302.28,303.24,1013,52,291.42,6.5,0,10000,3.99,103,4.76,0.0,800
2,2021-06-16,15,2021-06-16,41.292777777777782.0700000000000003,301.58,302.64,1013,55,291.67,4.52,16,10000,2.07,151,2.55,0.0,801
3,2021-06-16,16,2021-06-16,41.292777777777782.0700000000000003,301.02,302.1,1013,57,291.72,2.02,34,10000,2.6,208,3.17,0.0,802
4,2021-06-16,17,2021-06-16,41.292777777777782.0700000000000003,300.17,301.18,1012,59,291.48,0.91,53,10000,1.1,151,1.47,0.0,803


Rows are generated with all the days of each call. Days are classified into 2 groups. The first holds the hours of the day on which the data are retrieved and of the day two days later. The second holds the data for the day after retrieval. Since the forecasts for the 48 hours following the call are downloaded, each group has 24 values for each day and location (16 after filtering to the useful hours).
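
The grouping rule can be expressed as a mapping from the day offset (forecast target date minus call date) to a group label. This is a sketch on hypothetical rows; the labels `pred_1` and `pred_2` echo the column suffixes used in the following cells.

```python
import pandas as pd

# Hypothetical rows: same call date, targets on the call day, the next day and two days later.
toy = pd.DataFrame({
    "date": ["2021-06-16", "2021-06-17", "2021-06-18"],
    "fecha prediccion": ["2021-06-16"] * 3,
    "hour": [13, 13, 4],
})

offset = (pd.to_datetime(toy["date"]) - pd.to_datetime(toy["fecha prediccion"])).dt.days

# Offsets 0 and 2 together cover one full day of hours; offset 1 covers the other.
toy["grupo"] = offset.map({0: "pred_1", 2: "pred_1", 1: "pred_2"})
print(toy)
```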

In [41]:
columnas_1 = [col for col in df_pred_total.columns[1:4]] + [str(col+"_pred_1") for col in df_pred_total.columns[4:]]
columnas_2 = [col for col in df_pred_total.columns[1:4]] + [str(col+"_pred_2") for col in df_pred_total.columns[4:]]

In [42]:
df_pred_dias_1 = pd.DataFrame(columns = columnas_1)
df_pred_dias_2 = pd.DataFrame(columns = columnas_2)

for i, fila in df_pred_total.iterrows():
    
    if (i in list(range(0,len(df_pred_total["date"]),5000))) | (i == len(df_pred_total["date"])-1):
        print("Procesando fila {} de {}".format(i, len(df_pred_total["date"])))
        print("La cantidad de filas de los datasets (aproximadamente 1/2) es {}".format(len(df_pred_dias_1["hour"])))
       
    # For each hourly row, find which day it belongs to and append it to the matching dataset
    if ((pd.to_datetime(fila["fecha prediccion"]) - pd.to_datetime(fila["date"])).days == 0) | ((pd.to_datetime(fila["fecha prediccion"]) - pd.to_datetime(fila["date"])).days == -2):
        df_pred_dias_1.loc[len(df_pred_dias_1["fecha prediccion"])] = [elem for elem in fila][1:] 
        
    if (pd.to_datetime(fila["date"]) - pd.to_datetime(fila["fecha prediccion"])).days == 1:
        df_pred_dias_2.loc[len(df_pred_dias_2["fecha prediccion"])] = [elem for elem in fila][1:]
        

df_pred_dias_2.head()

Procesando fila 0 de 32
La cantidad de filas de los datasets (aproximadamente 1/2) es 0
Procesando fila 31 de 32
La cantidad de filas de los datasets (aproximadamente 1/2) es 15


Unnamed: 0,hour,fecha prediccion,estacion,temp_pred_2,feels_like_pred_2,pressure_pred_2,humidity_pred_2,dew_point_pred_2,uvi_pred_2,clouds_pred_2,visibility_pred_2,wind_speed_pred_2,wind_deg_pred_2,wind_gust_pred_2,pop_pred_2,we_pred_2
0,4,2021-06-16,41.292777777777782.0700000000000003,296.81,297.09,1010,71,290.67,0.0,100,10000,6.99,65,9.28,0.0,804
1,5,2021-06-16,41.292777777777782.0700000000000003,296.35,296.71,1010,76,291.56,0.11,100,10000,7.3,69,9.37,0.2,500
2,6,2021-06-16,41.292777777777782.0700000000000003,296.43,296.85,1011,78,291.95,0.52,100,10000,8.04,71,10.17,0.0,804
3,7,2021-06-16,41.292777777777782.0700000000000003,296.31,296.8,1010,81,292.45,0.56,100,10000,9.93,65,11.61,0.1,804
4,8,2021-06-16,41.292777777777782.0700000000000003,296.81,297.29,1011,79,292.63,1.13,100,10000,8.78,68,10.46,0.02,804


The two groups are joined on forecast date (`fecha prediccion`), station and hour, producing one row per call day, hour and station.

In [43]:
df_pred_dias_1 = df_pred_dias_1.drop_duplicates(['hour', "fecha prediccion", "estacion"],
                        keep = 'first')
df_pred_dias_1.reset_index(drop = True, inplace = True)
df_pred_dias_2 = df_pred_dias_2.drop_duplicates(['hour', "fecha prediccion", "estacion"],
                        keep = 'first')
df_pred_dias_2.reset_index(drop = True, inplace = True)

df_total_previo = pd.merge(df_pred_dias_1, df_pred_dias_2, how = "inner", on = ["hour", "fecha prediccion", "estacion"])
df_total_previo.head()

Unnamed: 0,hour,fecha prediccion,estacion,temp_pred_1,feels_like_pred_1,pressure_pred_1,humidity_pred_1,dew_point_pred_1,uvi_pred_1,clouds_pred_1,visibility_pred_1,wind_speed_pred_1,wind_deg_pred_1,wind_gust_pred_1,pop_pred_1,we_pred_1,temp_pred_2,feels_like_pred_2,pressure_pred_2,humidity_pred_2,dew_point_pred_2,uvi_pred_2,clouds_pred_2,visibility_pred_2,wind_speed_pred_2,wind_deg_pred_2,wind_gust_pred_2,pop_pred_2,we_pred_2
0,13,2021-06-16,41.292777777777782.0700000000000003,301.65,302.74,1013,55,291.74,8.05,9,10000,4.17,110,5.27,0.0,800,297.18,297.86,1011,85,294.04,8.22,70,10000,8.53,96,10.94,0.38,500
1,14,2021-06-16,41.292777777777782.0700000000000003,302.28,303.24,1013,52,291.42,6.5,0,10000,3.99,103,4.76,0.0,800,297.29,297.98,1009,85,294.24,6.64,84,10000,7.63,67,8.42,0.68,500
2,15,2021-06-16,41.292777777777782.0700000000000003,301.58,302.64,1013,55,291.67,4.52,16,10000,2.07,151,2.55,0.0,801,297.36,298.03,1010,84,294.07,4.61,84,10000,5.57,78,6.95,0.6,803
3,16,2021-06-16,41.292777777777782.0700000000000003,301.02,302.1,1013,57,291.72,2.02,34,10000,2.6,208,3.17,0.0,802,297.13,297.83,1010,86,294.11,2.71,88,10000,6.11,86,7.59,0.65,804
4,17,2021-06-16,41.292777777777782.0700000000000003,300.17,301.18,1012,59,291.48,0.91,53,10000,1.1,151,1.47,0.0,803,296.59,297.31,1011,89,294.4,1.23,90,10000,3.36,101,4.65,0.68,500


For each day and station there are 2 forecast values per hour (two for hour 4, two for hour 5, ...), corresponding to the forecast for the 48 hours following the call. The dataset is then built with the mean forecast value for each hour.
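
The averaging step can be sketched in vectorized form: select the `_pred_1` and `_pred_2` columns, strip the trailing index so the names line up, and take the element-wise mean. This is an illustrative alternative on hypothetical toy data, not the loop used in the next cell.

```python
import pandas as pd

keys = ["hour", "fecha prediccion", "estacion"]

# Hypothetical toy frame with two variables and two forecasts per hour.
previo = pd.DataFrame({
    "hour": [13, 14],
    "fecha prediccion": ["2021-06-16"] * 2,
    "estacion": ["E1"] * 2,
    "temp_pred_1": [301.65, 302.28], "clouds_pred_1": [9, 0],
    "temp_pred_2": [297.18, 297.29], "clouds_pred_2": [70, 84],
})

# "temp_pred_1" -> "temp_pred": drop the last two characters ("_1" or "_2").
first = previo.filter(regex="_pred_1$").rename(columns=lambda c: c[:-2])
second = previo.filter(regex="_pred_2$").rename(columns=lambda c: c[:-2])

# Columns align by name, so the mean is taken variable by variable.
mean_pred = pd.concat([previo[keys], (first + second) / 2], axis=1)
print(mean_pred)
```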

In [44]:
columnas_total = [col for col in df_pred_total.columns[1:4]] + [str(col+"_pred") for col in df_pred_total.columns[4:]]
filas_nuevas = []
n_valores = len(columnas_total) - 3

for i, fila in df_total_previo.iterrows():
    
    if (i % 5000 == 0) | (i == len(df_total_previo["hour"])-1):
        print("Procesando fila {} de {}".format(i, len(df_total_previo["hour"])))
       
    # For each hour, average the two forecast values
    fila_nueva = []
    for j in range(0, len(columnas_total)):
        if j in [0,1,2]:
            fila_nueva.append(fila.iloc[j]) 
        else:
            fila_nueva.append(np.mean([fila.iloc[j], fila.iloc[j + n_valores]]))
    filas_nuevas.append(tuple(fila_nueva))

# DataFrame.append was removed in pandas 2.0, so rows are collected and the frame is built once
df_total = pd.DataFrame(filas_nuevas, columns = columnas_total)
    
    
df_total.head()

Procesando fila 0 de 16
Procesando fila 15 de 16


Unnamed: 0,hour,fecha prediccion,estacion,temp_pred,feels_like_pred,pressure_pred,humidity_pred,dew_point_pred,uvi_pred,clouds_pred,visibility_pred,wind_speed_pred,wind_deg_pred,wind_gust_pred,pop_pred,we_pred
0,13,2021-06-16,41.292777777777782.0700000000000003,299.415,300.3,1012.0,70.0,292.89,8.135,39.5,10000.0,6.35,103.0,8.105,0.19,650.0
1,14,2021-06-16,41.292777777777782.0700000000000003,299.785,300.61,1011.0,68.5,292.83,6.57,42.0,10000.0,5.81,85.0,6.59,0.34,650.0
2,15,2021-06-16,41.292777777777782.0700000000000003,299.47,300.335,1011.5,69.5,292.87,4.565,50.0,10000.0,3.82,114.5,4.75,0.3,802.0
3,16,2021-06-16,41.292777777777782.0700000000000003,299.075,299.965,1011.5,71.5,292.915,2.365,61.0,10000.0,4.355,147.0,5.38,0.325,803.0
4,17,2021-06-16,41.292777777777782.0700000000000003,298.38,299.245,1011.5,74.0,292.94,1.07,71.5,10000.0,2.23,126.0,3.06,0.34,651.5


In [45]:
df_pred_clean = df_total

# Preparación de los datos
<div style = "float:right"><a style="text-decoration:none" href = "#Script-funcional">

In [46]:
def merge_datasets(df_estaciones_rad, df_clima, df_pred, df_aemet, df_soda):
    
    ## CLIMA ##
    print("Procesando CLIMA")
    
    # Add the date of the day to predict (call date + 1 day) to the historical weather dataset
    import datetime
    fechas_atrasadas = [(pd.to_datetime(f) + pd.Timedelta(days=1)).strftime("%Y-%m-%d") for f in df_clima["fecha prediccion"]]
    df_clima.insert(0, "fecha_rad", fechas_atrasadas, True)
    
    # Rename columns and convert the temperatures from kelvin to °C
    df_clima = df_clima.rename(index = str, columns = {"hour": "hora", "estacion": "indicativo"})

    for dia in range(1, 6):
        df_clima["temp_d-{}".format(dia)] = df_clima["temp_d-{}".format(dia)] - 273.15
        df_clima["feels_like_d-{}".format(dia)] = df_clima["feels_like_d-{}".format(dia)] - 273.15
    
    # Build the target dataframe with the temperature target column
    df_objetivos = pd.DataFrame(columns = ["hora", "indicativo", "temp_objetivo"])
    df_objetivos["hora"] = df_clima["hora"]
    df_objetivos["indicativo"] = df_clima["indicativo"]
    df_objetivos["temp_objetivo"] = df_clima["temp_d-1"]

    # Add the date column to the targets (call date - 1 day, the day the d-1 values were observed)
    fechas_atrasadas = [(pd.to_datetime(f) - pd.Timedelta(days=1)).strftime("%Y-%m-%d") for f in df_clima["fecha prediccion"]]
    df_objetivos.insert(0, "fecha_rad", fechas_atrasadas, True)
    
    # Drop unneeded columns
    df_clima.drop(['fecha prediccion'], axis = 1, inplace = True)
    
    # Add the identifier of the nearest radiation station and the site coordinates.
    # lat and lon are not parameters: they are read from the notebook's global scope.
    df_clima["indicativo_rad"] = np.nan
    df_clima["lat"] = lat
    df_clima["lon"] = lon

    # Every row shares the same site, so the nearest radiation station is searched only once
    dist = float("inf")
    indicativo_cercano = np.nan
    for k in range(0, len(df_estaciones_rad["indicativo"])): 
        lat_est = conversor_coordenadas(str(df_estaciones_rad["latitud"].loc[k]))
        lon_est = conversor_coordenadas(str(df_estaciones_rad["longitud"].loc[k]))

        distancia_prueba = distancia(lat, lon, lat_est, lon_est)
        if(distancia_prueba < dist):
            dist = distancia_prueba
            indicativo_cercano = df_estaciones_rad.loc[k, "indicativo"]
    df_clima["indicativo_rad"] = indicativo_cercano

    ## PREDICCION ##  
    print("Procesando PREDICCION")
    # Convert the temperature variables from kelvin to °C
    df_pred["temp_pred"] = df_pred["temp_pred"] - 273.15
    df_pred["feels_like_pred"] = df_pred["feels_like_pred"] - 273.15
    
    # Add the date of the day to predict (call date + 1 day) to the weather forecast dataset
    fechas_atrasadas = [(pd.to_datetime(f) + pd.Timedelta(days=1)).strftime("%Y-%m-%d") for f in df_pred["fecha prediccion"]]
    df_pred.insert(0, "fecha_rad", fechas_atrasadas, True)
    
    # Rename columns and drop the unneeded ones
    df_pred = df_pred.rename(index = str, columns = {"hour": "hora", "estacion": "indicativo"})
    df_pred.drop(['fecha prediccion'], axis = 1, inplace = True)
    
    ## RADIACION DIA ANTERIOR ##
    print("Procesando RADIACION DIA ANTERIOR")
    # Add the date of the day to predict (measurement date + 1 day) to the AEMET radiation dataset
    df_aemet = df_aemet.rename(columns={'indicativo':'indicativo_rad'})
    fechas_atrasadas = [(pd.to_datetime(f) + pd.Timedelta(days=1)).strftime("%Y-%m-%d") for f in df_aemet["fecha"]]
    df_aemet.insert(0, "fecha_rad", fechas_atrasadas, True)
    
    # Rename columns 
    df_aemet = df_aemet.rename(columns={'GL': 'rad_d-1', 'UVB': 'uvb_d-1', 'IR': 'ir_d-1'})
    
    # Drop the unneeded columns
    df_aemet.drop(['fecha', 'estacion'], axis = 1, inplace = True)
    
    ## RADIACION ##
    print("Procesando RADIACION")
    # Rename columns and drop the unneeded ones
    df_soda = df_soda.rename(index = str, columns = {"estacion": "indicativo", "fecha": "fecha_rad"})
    df_soda.drop(['date'], axis=1, inplace = True)
    
    # Build a dataset with the radiation measured three days before the day to predict
    df_rad_2 = pd.DataFrame(columns = ["indicativo", "hora", "rad_d-2"])
    df_rad_2["indicativo"] = df_soda["indicativo"]
    df_rad_2["hora"] = df_soda["hora"]
    df_rad_2["rad_d-2"] = df_soda["ghi"]
    
    # Add the date of the day to predict (measurement date + 3 days) to that dataset
    fechas_atrasadas = [(pd.to_datetime(f) + pd.Timedelta(days=3)).strftime("%Y-%m-%d") for f in df_soda["fecha_rad"]]
    df_rad_2.insert(0, "fecha_rad", fechas_atrasadas, True)
    
    ## MERGE ##
    print("Procesando MERGE")
    # Join the datasets on target date, hour and station
    df_total = pd.merge(df_clima, df_pred, how = "inner", on = ["fecha_rad", "hora", "indicativo"])
    df_total = pd.merge(df_total, df_aemet, how = "inner", on = ["fecha_rad", "hora", "indicativo_rad"])
    df_total = pd.merge(df_total, df_rad_2, how = "inner", on = ["fecha_rad", "hora", "indicativo"])
    
    df_total.drop(['indicativo_rad'], axis = 1, inplace = True)
    
    df_total.columns = [str(i) for i in df_total.columns]

    
    return df_total
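
The date shifting used in `merge_datasets` can be illustrated on toy data. This is only a minimal sketch with made-up values and hypothetical frame names (`df_ayer`, `df_objetivo`, `df_demo`): each measurement date is moved one day forward so that it carries the date of the day it helps to predict, and the frames are then joined with an inner merge.

```python
import pandas as pd

# Toy radiation measurements taken on 2021-06-16 (hypothetical values)
df_ayer = pd.DataFrame({
    "fecha": ["2021-06-16", "2021-06-16"],
    "hora": [4, 5],
    "GL": [11.1, 111.1],
})

# Toy forecast rows, already keyed by the target day
df_objetivo = pd.DataFrame({
    "fecha_rad": ["2021-06-17", "2021-06-17"],
    "hora": [4, 5],
    "temp_pred": [23.06, 22.80],
})

# Shift every measurement date one day forward and format it as YYYY-MM-DD
df_ayer["fecha_rad"] = (
    pd.to_datetime(df_ayer["fecha"]) + pd.Timedelta(days=1)
).dt.strftime("%Y-%m-%d")
df_ayer = df_ayer.drop(columns=["fecha"]).rename(columns={"GL": "rad_d-1"})

# The inner merge keeps only the (date, hour) pairs present in both frames
df_demo = pd.merge(df_objetivo, df_ayer, how="inner", on=["fecha_rad", "hora"])
print(df_demo)
```

Because the merge is an inner join, any hour missing from one of the sources silently disappears from the result, so it is worth checking the row count afterwards.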
    

In [47]:
# Load the weather dataset
df_clima = df_clima_clean

# Load the weather forecast dataset
df_pred = df_pred_clean

# Load the AEMET radiation dataset
df_aemet = df_aemet_clean

# Load the target radiation dataset
df_soda = df_soda_clean

fallo = 0
try:
    df_total = merge_datasets(df_estaciones_rad, df_clima, df_pred, df_aemet, df_soda)
except Exception:
    # Flag the failure and re-raise: the cells below cannot run without df_total
    fallo = 1
    raise
    
# Cast the hour to int and every feature column (from the fourth onwards) to float
df_total["hora"] = df_total["hora"].astype(int)
columnas_num = df_total.columns[3:]
df_total[columnas_num] = df_total[columnas_num].astype(float)

df_total.head()

Procesando CLIMA
Procesando PREDICCION
Procesando RADIACION DIA ANTERIOR
Procesando RADIACION
Procesando MERGE


Unnamed: 0,fecha_rad,hora,indicativo,temp_d-1,feels_like_d-1,pressure_d-1,humidity_d-1,dew_point_d-1,clouds_d-1,visibility_d-1,wind_speed_d-1,wind_deg_d-1,wind_gust_d-1,we_d-1,temp_d-2,feels_like_d-2,pressure_d-2,humidity_d-2,dew_point_d-2,clouds_d-2,visibility_d-2,wind_speed_d-2,wind_deg_d-2,wind_gust_d-2,we_d-2,temp_d-3,feels_like_d-3,pressure_d-3,humidity_d-3,dew_point_d-3,clouds_d-3,visibility_d-3,wind_speed_d-3,wind_deg_d-3,wind_gust_d-3,we_d-3,temp_d-4,feels_like_d-4,pressure_d-4,humidity_d-4,dew_point_d-4,clouds_d-4,visibility_d-4,wind_speed_d-4,wind_deg_d-4,wind_gust_d-4,we_d-4,temp_d-5,feels_like_d-5,pressure_d-5,humidity_d-5,dew_point_d-5,clouds_d-5,visibility_d-5,wind_speed_d-5,wind_deg_d-5,wind_gust_d-5,we_d-5,lat,lon,temp_pred,feels_like_pred,pressure_pred,humidity_pred,dew_point_pred,uvi_pred,clouds_pred,visibility_pred,wind_speed_pred,wind_deg_pred,wind_gust_pred,pop_pred,we_pred,rad_d-1,uvb_d-1,ir_d-1,rad_d-2
0,2021-06-17,4,41.292777777777782.0700000000000003,22.73,20.03,1015.0,46.0,283.67,0.0,10000.0,4.12,340.0,2.283,800.0,23.23,21.31,1017.0,53.0,286.28,0.0,10000.0,4.12,300.0,2.283,800.0,22.42,19.19,1020.0,53.0,285.53,0.0,10000.0,5.66,310.0,2.283,800.0,21.0,17.56,1018.0,64.0,287.09,0.0,10000.0,6.69,330.0,2.283,800.0,23.43,23.77,1017.0,57.0,287.58,32.0,10000.0,1.53,322.0,1.78,802.0,41.292778,2.07,23.06,23.33,1011.0,73.0,290.7,0.0,81.5,10000.0,4.395,183.0,5.605,0.0,803.5,11.111111,0.002778,365.972222,2.5472
1,2021-06-17,5,41.292777777777782.0700000000000003,22.74,22.84,1016.0,53.0,285.83,20.0,10000.0,1.03,60.0,2.283,801.0,23.52,23.13,1017.0,53.0,286.55,0.0,10000.0,2.06,250.0,2.283,800.0,22.65,19.76,1020.0,56.0,286.58,0.0,10000.0,5.66,310.0,2.283,800.0,21.26,18.26,1019.0,64.0,287.33,0.0,10000.0,6.17,330.0,2.283,800.0,23.43,23.77,1017.0,57.0,287.58,32.0,10000.0,1.53,322.0,1.78,802.0,41.292778,2.07,22.8,23.105,1011.0,75.5,291.035,0.115,84.5,10000.0,4.77,192.5,5.93,0.1,651.5,111.111111,0.025833,365.972222,120.0472
2,2021-06-17,6,41.292777777777782.0700000000000003,24.14,24.85,1017.0,73.0,292.15,0.0,10000.0,3.6,240.0,2.283,800.0,24.82,23.09,1017.0,50.0,286.85,0.0,10000.0,4.12,320.0,2.283,800.0,24.37,22.09,1020.0,53.0,287.33,0.0,10000.0,5.14,320.0,2.283,800.0,22.47,19.16,1019.0,60.0,287.48,0.0,10000.0,6.69,330.0,2.283,800.0,23.43,23.77,1017.0,57.0,287.58,32.0,10000.0,1.53,322.0,1.78,802.0,41.292778,2.07,22.97,23.28,1012.0,75.0,291.115,0.54,86.5,10000.0,5.155,199.5,6.41,0.0,803.5,286.111111,1.061111,365.972222,292.7581
3,2021-06-17,7,41.292777777777782.0700000000000003,25.72,25.03,1017.0,47.0,286.72,0.0,10000.0,2.57,340.0,2.283,800.0,26.01,24.33,1017.0,47.0,286.98,0.0,10000.0,4.12,320.0,2.283,800.0,25.92,23.85,1020.0,47.0,286.9,0.0,10000.0,4.63,310.0,2.283,800.0,24.09,21.36,1019.0,53.0,287.07,0.0,10000.0,5.66,320.0,2.283,800.0,24.05,24.77,1017.0,56.0,287.89,37.0,10000.0,1.14,354.0,1.3,802.0,41.292778,2.07,23.18,23.515,1011.5,75.0,291.305,1.08,100.0,10000.0,6.3,199.5,7.41,0.05,804.0,480.555556,2.628333,365.972222,468.8891
4,2021-06-17,8,41.292777777777782.0700000000000003,28.04,26.95,1017.0,32.0,282.97,0.0,10000.0,1.54,260.0,2.283,800.0,28.08,28.15,1018.0,47.0,288.87,0.0,10000.0,2.57,300.0,2.283,800.0,28.59,26.72,1020.0,39.0,286.44,0.0,10000.0,4.12,50.0,2.283,800.0,26.21,23.56,1020.0,41.0,285.08,0.0,10000.0,4.63,330.0,2.283,800.0,24.82,26.07,1017.0,54.0,288.04,36.0,10000.0,0.46,74.0,0.99,802.0,41.292778,2.07,24.005,24.325,1012.0,71.5,291.2,2.2,91.5,10000.0,5.47,209.5,6.635,0.01,803.5,652.777778,48.091944,365.972222,631.936


# Radiation prediction

In [48]:
df_datos = df_total
df_datos = df_datos[["hora"] + list(df_datos.columns)[3:]]
df_datos.head()

Unnamed: 0,hora,temp_d-1,feels_like_d-1,pressure_d-1,humidity_d-1,dew_point_d-1,clouds_d-1,visibility_d-1,wind_speed_d-1,wind_deg_d-1,wind_gust_d-1,we_d-1,temp_d-2,feels_like_d-2,pressure_d-2,humidity_d-2,dew_point_d-2,clouds_d-2,visibility_d-2,wind_speed_d-2,wind_deg_d-2,wind_gust_d-2,we_d-2,temp_d-3,feels_like_d-3,pressure_d-3,humidity_d-3,dew_point_d-3,clouds_d-3,visibility_d-3,wind_speed_d-3,wind_deg_d-3,wind_gust_d-3,we_d-3,temp_d-4,feels_like_d-4,pressure_d-4,humidity_d-4,dew_point_d-4,clouds_d-4,visibility_d-4,wind_speed_d-4,wind_deg_d-4,wind_gust_d-4,we_d-4,temp_d-5,feels_like_d-5,pressure_d-5,humidity_d-5,dew_point_d-5,clouds_d-5,visibility_d-5,wind_speed_d-5,wind_deg_d-5,wind_gust_d-5,we_d-5,lat,lon,temp_pred,feels_like_pred,pressure_pred,humidity_pred,dew_point_pred,uvi_pred,clouds_pred,visibility_pred,wind_speed_pred,wind_deg_pred,wind_gust_pred,pop_pred,we_pred,rad_d-1,uvb_d-1,ir_d-1,rad_d-2
0,4,22.73,20.03,1015.0,46.0,283.67,0.0,10000.0,4.12,340.0,2.283,800.0,23.23,21.31,1017.0,53.0,286.28,0.0,10000.0,4.12,300.0,2.283,800.0,22.42,19.19,1020.0,53.0,285.53,0.0,10000.0,5.66,310.0,2.283,800.0,21.0,17.56,1018.0,64.0,287.09,0.0,10000.0,6.69,330.0,2.283,800.0,23.43,23.77,1017.0,57.0,287.58,32.0,10000.0,1.53,322.0,1.78,802.0,41.292778,2.07,23.06,23.33,1011.0,73.0,290.7,0.0,81.5,10000.0,4.395,183.0,5.605,0.0,803.5,11.111111,0.002778,365.972222,2.5472
1,5,22.74,22.84,1016.0,53.0,285.83,20.0,10000.0,1.03,60.0,2.283,801.0,23.52,23.13,1017.0,53.0,286.55,0.0,10000.0,2.06,250.0,2.283,800.0,22.65,19.76,1020.0,56.0,286.58,0.0,10000.0,5.66,310.0,2.283,800.0,21.26,18.26,1019.0,64.0,287.33,0.0,10000.0,6.17,330.0,2.283,800.0,23.43,23.77,1017.0,57.0,287.58,32.0,10000.0,1.53,322.0,1.78,802.0,41.292778,2.07,22.8,23.105,1011.0,75.5,291.035,0.115,84.5,10000.0,4.77,192.5,5.93,0.1,651.5,111.111111,0.025833,365.972222,120.0472
2,6,24.14,24.85,1017.0,73.0,292.15,0.0,10000.0,3.6,240.0,2.283,800.0,24.82,23.09,1017.0,50.0,286.85,0.0,10000.0,4.12,320.0,2.283,800.0,24.37,22.09,1020.0,53.0,287.33,0.0,10000.0,5.14,320.0,2.283,800.0,22.47,19.16,1019.0,60.0,287.48,0.0,10000.0,6.69,330.0,2.283,800.0,23.43,23.77,1017.0,57.0,287.58,32.0,10000.0,1.53,322.0,1.78,802.0,41.292778,2.07,22.97,23.28,1012.0,75.0,291.115,0.54,86.5,10000.0,5.155,199.5,6.41,0.0,803.5,286.111111,1.061111,365.972222,292.7581
3,7,25.72,25.03,1017.0,47.0,286.72,0.0,10000.0,2.57,340.0,2.283,800.0,26.01,24.33,1017.0,47.0,286.98,0.0,10000.0,4.12,320.0,2.283,800.0,25.92,23.85,1020.0,47.0,286.9,0.0,10000.0,4.63,310.0,2.283,800.0,24.09,21.36,1019.0,53.0,287.07,0.0,10000.0,5.66,320.0,2.283,800.0,24.05,24.77,1017.0,56.0,287.89,37.0,10000.0,1.14,354.0,1.3,802.0,41.292778,2.07,23.18,23.515,1011.5,75.0,291.305,1.08,100.0,10000.0,6.3,199.5,7.41,0.05,804.0,480.555556,2.628333,365.972222,468.8891
4,8,28.04,26.95,1017.0,32.0,282.97,0.0,10000.0,1.54,260.0,2.283,800.0,28.08,28.15,1018.0,47.0,288.87,0.0,10000.0,2.57,300.0,2.283,800.0,28.59,26.72,1020.0,39.0,286.44,0.0,10000.0,4.12,50.0,2.283,800.0,26.21,23.56,1020.0,41.0,285.08,0.0,10000.0,4.63,330.0,2.283,800.0,24.82,26.07,1017.0,54.0,288.04,36.0,10000.0,0.46,74.0,0.99,802.0,41.292778,2.07,24.005,24.325,1012.0,71.5,291.2,2.2,91.5,10000.0,5.47,209.5,6.635,0.01,803.5,652.777778,48.091944,365.972222,631.936


In [49]:
# Standardize the data and apply PCA before predicting with the trained model
with open(directorio + 'data/Modelo_2/scaler_rad.pkl', 'rb') as f:
    scalar = pk.load(f)
with open(directorio + 'data/Modelo_2/pca_rad.pkl', 'rb') as f:
    pca = pk.load(f)
with open(directorio + 'data/Modelo_2/modelo_2_rad.pkl', 'rb') as f:
    model = pk.load(f)

pipeline = Pipeline([('transformer', scalar), ('pca', pca), ('estimator', model)])

# The model returns one single-column row per hour; keep that value
pred = pipeline.predict(df_datos)
pred_rad = [y[0] for y in pred.tolist()]
pred_rad

[750.2755,
 689.90845,
 6.04955,
 10.6075,
 0.24555,
 0.24555,
 0.24555,
 0.24555,
 0.24555,
 0.24555,
 0.24555,
 0.24555,
 21.49795,
 99.26859999999999,
 421.82980000000003,
 311.94485]
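
The cell above assembles a `Pipeline` from objects that were fitted and pickled elsewhere. The following sketch, on synthetic data, suggests why that works: calling `predict` on such a pipeline gives the same result as chaining the transforms by hand. The choice of `KNeighborsRegressor`, the number of components and all variable names here are assumptions made for illustration; the pickled model may be configured differently.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features and a 2-D target of shape (n, 1), values are arbitrary
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
y = X[:, :1] * 2.0 + 1.0

# Fit each step separately, as if it had been trained and pickled beforehand
scaler_demo = StandardScaler().fit(X)
pca_demo = PCA(n_components=3).fit(scaler_demo.transform(X))
knn_demo = KNeighborsRegressor(n_neighbors=3).fit(
    pca_demo.transform(scaler_demo.transform(X)), y
)

# Assemble the already fitted steps; no further call to fit() is made
pipeline_demo = Pipeline([
    ("transformer", scaler_demo),
    ("pca", pca_demo),
    ("estimator", knn_demo),
])

pred_demo = pipeline_demo.predict(X[:4])
manual_demo = knn_demo.predict(pca_demo.transform(scaler_demo.transform(X[:4])))

# A 2-D target yields 2-D predictions, hence the y[0] unpacking in the notebook
pred_rad_demo = [fila[0] for fila in pred_demo.tolist()]
print(pred_demo.shape, pred_rad_demo)
```

The scaler, the PCA and the estimator must come from the same training run; mixing files from different runs would still execute but would produce meaningless predictions.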

# Ambient temperature prediction

In [50]:
# Standardize the data and apply PCA before predicting with the trained model
with open(directorio + 'data/Modelo_2/scaler_temp.pkl', 'rb') as f:
    scalar = pk.load(f)
with open(directorio + 'data/Modelo_2/pca_temp.pkl', 'rb') as f:
    pca = pk.load(f)
with open(directorio + 'data/Modelo_2/modelo_2_temp.pkl', 'rb') as f:
    model = pk.load(f)

pipeline = Pipeline([('transformer', scalar), ('pca', pca), ('estimator', model)])

# The model returns one single-column row per hour; keep that value
pred = pipeline.predict(df_datos)
pred_temp = [y[0] for y in pred.tolist()]
pred_temp

[19.985000000000042,
 19.985000000000042,
 19.985000000000042,
 19.985000000000042,
 19.985000000000042,
 19.985000000000042,
 19.985000000000042,
 19.985000000000042,
 19.985000000000042,
 19.985000000000042,
 19.985000000000042,
 19.985000000000042,
 19.985000000000042,
 19.985000000000042,
 19.985000000000042,
 19.985000000000042]

# Obtaining the electricity production

In [51]:
# Installation data

print("lat: ", lat)
print("lon: ", lon)
print("orient: ", orient)
print("incl: ", incl)
print("ppico: ", ppico)
print("fecha: ", fecha)

lat:  41.29277777777778
lon:  2.0700000000000003
orient:  10
incl:  25
ppico:  4.62
fecha:  2021-06-16


In [52]:
# Global radiation on the horizontal plane
dia_Gh = pred_rad

# Ambient temperature
temperatura_ambiente = pred_temp

In [53]:
produccion = calcularEnergia(lat, lon, orient, incl, ppico, fecha, dia_Gh, temperatura_ambiente)
produccion

[0,
 0,
 0,
 0,
 0,
 0,
 501.4826088740893,
 461.60836939366277,
 23.62642194549373,
 41.42117151556445,
 0.9601101342192568,
 0.9601120645340413,
 0.9601129834114949,
 0.9601134951150146,
 0.9601138229605044,
 0.9601140468977565,
 0.9601141540444045,
 0.9601139704307796,
 83.88891287307541,
 383.5274246250583,
 813.7983492940616,
 210.06269876178706,
 0,
 0]

In addition, the user is shown an estimate of the income they will obtain on the predicted day from selling their surplus energy.

The date of the previous day is obtained so that the 24 values can be requested in UTC, since the first value of the current day is the one for 22:00 UTC of the previous day.

In [54]:
fecha_ant = date.today() - timedelta(days = 1)
fecha_ayer = fecha_ant.isoformat()
fecha_ayer

'2021-06-15'
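
The reason the request window starts on the previous day can be checked with a short sketch. It assumes that the predicted day falls in Central European Summer Time (UTC+2), modelled here with a fixed offset; the names `cest`, `medianoche_local` and `inicio` are introduced only for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: summer time (CEST) applies, i.e. local time is UTC+2
cest = timezone(timedelta(hours=2))

# Local midnight at the start of the predicted day
medianoche_local = datetime(2021, 6, 16, 0, 0, tzinfo=cest)

# The same instant expressed in UTC falls on the previous calendar day
medianoche_utc = medianoche_local.astimezone(timezone.utc)
inicio = medianoche_utc.strftime("%Y-%m-%dT%H:%M:%SZ")
print(inicio)
```

In winter time (UTC+1) the first local hour would correspond to 23:00 UTC instead, so a window starting at 22:00 UTC would include one extra hour from the previous day.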

The date of the day before yesterday is also obtained, in case the values for the current day have not been published yet.

In [55]:
fecha_ant_ant = date.today() - timedelta(days = 2)
fecha_anteayer = fecha_ant_ant.isoformat()
fecha_anteayer

'2021-06-14'

Function that retrieves the surplus compensation price curve from the REE (ESIOS) API

In [56]:
import requests


def api_ree(indicador, token):
    """Return the hourly values of an ESIOS (REE) indicator as a DataFrame."""
    headers = {
        'Accept': 'application/json; application/vnd.esios-api-v1+json',
        'Host': 'api.esios.ree.es',
        'Content-Type': 'application/json',
        'Authorization': 'Token token="{}"'.format(token)
    }

    def consultar(inicio, fin):
        # Request the values between 22:00 UTC of `inicio` and the end of `fin`
        url = ("https://api.esios.ree.es/indicators/" + str(indicador)
               + "?start_date=" + inicio + "T22%3A00%3A00Z&end_date=" + fin + "T23%3A59%3A59Z")
        response = requests.get(url, headers = headers)
        response.raise_for_status()  # HTTP errors do not raise on their own
        return response.json()['indicator']['values']

    # Try the window that covers the predicted day first
    try:
        precios = consultar(fecha_ayer, fecha)
    except (requests.RequestException, KeyError, ValueError):
        precios = []

    # If the request failed or nothing has been published yet, use the previous day
    if not precios:
        precios = consultar(fecha_anteayer, fecha_ayer)

    precios_datos = pd.DataFrame(precios)
    precios_datos['datetime'] = pd.to_datetime(precios_datos['datetime'])
    precios_datos['dia'] = precios_datos['datetime'].dt.strftime('%m/%d/%Y')

    return precios_datos
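
As a side note on the request URL, the percent-encoded timestamps do not have to be written by hand. The sketch below uses `urllib.parse.urlencode` from the standard library to build the same query string; the dates are the ones shown in this notebook, and `params` and `url_demo` are names introduced only for the example.

```python
from urllib.parse import urlencode

indicador = 1739  # indicator ID used in this notebook

# Plain ISO timestamps; urlencode turns each ':' into '%3A'
params = {
    "start_date": "2021-06-15T22:00:00Z",
    "end_date": "2021-06-16T23:59:59Z",
}

url_demo = "https://api.esios.ree.es/indicators/{}?{}".format(indicador, urlencode(params))
print(url_demo)
```

`requests.get` also accepts a `params` argument that performs this encoding internally, which may be the simpler option when the request is sent directly.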

In [57]:
# ID of the surplus compensation price indicator
indicador = 1739

# Access token
token = "27089f947b3e16a296875a0bc9dc387efa41acc7db6498676668e5150028cfdb"

precios_excedente = api_ree(indicador, token)
precios_excedente.head()

Unnamed: 0,value,datetime,datetime_utc,tz_time,geo_id,geo_name,dia
0,97.33,2021-06-16 00:00:00+02:00,2021-06-15T22:00:00Z,2021-06-15T22:00:00.000Z,3,España,06/16/2021
1,94.83,2021-06-16 01:00:00+02:00,2021-06-15T23:00:00Z,2021-06-15T23:00:00.000Z,3,España,06/16/2021
2,93.18,2021-06-16 02:00:00+02:00,2021-06-16T00:00:00Z,2021-06-16T00:00:00.000Z,3,España,06/16/2021
3,93.18,2021-06-16 03:00:00+02:00,2021-06-16T01:00:00Z,2021-06-16T01:00:00.000Z,3,España,06/16/2021
4,93.18,2021-06-16 04:00:00+02:00,2021-06-16T02:00:00Z,2021-06-16T02:00:00.000Z,3,España,06/16/2021


Hourly prices in €/MWh

In [58]:
lista_precios = precios_excedente["value"].tolist()
lista_precios

[97.33,
 94.83,
 93.18,
 93.18,
 93.18,
 93.29,
 94.84,
 99.13,
 99.83,
 96.04,
 93.22,
 92.52,
 92.03,
 89.82,
 89.04,
 82.43,
 85.36,
 92.53,
 95.64,
 98.86,
 105.68,
 100.9,
 99.94,
 95.5]

The CSV of typical consumption profiles is loaded. For each hour it gives the relative weight of that hour as a fraction of one, that is, the share of the daily consumption that the hour represents on average, for several standard household electricity supply profiles.

In [59]:
perfiles = pd.read_csv('./data/Perfiles_consumo.csv', sep=',')
for i in range(0,24):
    perfiles[str(i)] = pd.to_numeric(perfiles[str(i)])
perfiles

Unnamed: 0,Tipo de consumidor,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
0,Madrugadores,0.03922,0.02614,0.01961,0.01961,0.01961,0.02941,0.04248,0.05882,0.05229,0.04902,0.03922,0.03922,0.03922,0.03922,0.03922,0.04575,0.04248,0.04248,0.03922,0.04248,0.05229,0.05882,0.06536,0.05882
1,Caseros,0.0367,0.02446,0.01835,0.01835,0.01835,0.02446,0.02446,0.02752,0.0367,0.04281,0.04893,0.05199,0.05505,0.0581,0.04893,0.04587,0.04587,0.04587,0.04893,0.04893,0.05505,0.0581,0.06116,0.05505
2,Matutinos,0.06639,0.06224,0.04979,0.04149,0.03734,0.03734,0.03734,0.04979,0.06639,0.04979,0.0332,0.02905,0.0166,0.0166,0.0166,0.0166,0.0166,0.0166,0.02905,0.0332,0.04979,0.06639,0.08299,0.07884
3,Vespertinos,0.05381,0.03587,0.02691,0.02691,0.02691,0.02691,0.02691,0.03587,0.04484,0.03587,0.02691,0.02691,0.02691,0.03139,0.03587,0.05381,0.05381,0.04484,0.03587,0.03587,0.05381,0.07175,0.08969,0.07175
4,Dia fuera,0.04795,0.0411,0.03425,0.0274,0.0274,0.0274,0.0274,0.0274,0.0411,0.05479,0.06849,0.06849,0.06849,0.05479,0.05137,0.04452,0.03425,0.0274,0.02397,0.02397,0.0274,0.0411,0.05479,0.05479
5,Otros,0.06993,0.05594,0.04196,0.02797,0.02797,0.02797,0.02797,0.02797,0.02797,0.03147,0.03497,0.03846,0.04196,0.04196,0.04545,0.04196,0.04196,0.04196,0.04196,0.04196,0.04545,0.04895,0.05594,0.06993


In [60]:
pot_media = 4.5 # kW
consumo_medio_casa = 3754/365 # kWh per day
consumo_medio_piso = 3373/365 # kWh per day

In [61]:
# If the user lives in a house
cons = consumo_medio_casa
# If the user lives in a flat (this assignment overrides the previous one)
cons = consumo_medio_piso

To obtain the average consumption curve in Wh for each type of consumer, the average daily consumption is multiplied by the coefficients that give the fraction of daily consumption falling in each hour. The result is also multiplied by the ratio of the user's contracted power to the average power, so that the curve is proportional to the real user's. Finally, it is multiplied by 1000 to convert kWh into Wh.

In [62]:
# For example, if the user has the following contracted power:
pot_c = 4 # kW

In [63]:
# Column 0 holds the profile name; the remaining columns are the hourly fractions
perfil_1 = [perfiles.iloc[0, 1+i]*cons*pot_c/pot_media*1000 for i in range(perfiles.shape[1]-1)] # In Wh
perfil_2 = [perfiles.iloc[1, 1+i]*cons*pot_c/pot_media*1000 for i in range(perfiles.shape[1]-1)]
perfil_3 = [perfiles.iloc[2, 1+i]*cons*pot_c/pot_media*1000 for i in range(perfiles.shape[1]-1)]
perfil_4 = [perfiles.iloc[3, 1+i]*cons*pot_c/pot_media*1000 for i in range(perfiles.shape[1]-1)]
perfil_5 = [perfiles.iloc[4, 1+i]*cons*pot_c/pot_media*1000 for i in range(perfiles.shape[1]-1)]
perfil_6 = [perfiles.iloc[5, 1+i]*cons*pot_c/pot_media*1000 for i in range(perfiles.shape[1]-1)]
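
The scaling described above can be checked by hand on a single profile. This sketch copies the "Madrugadores" row from the table and the constants from the previous cells (flat consumption, 4 kW contracted power); `fracciones`, `factor` and `perfil_wh` are names introduced only for the example.

```python
# Hourly fractions of the "Madrugadores" profile, copied from the table above
fracciones = [
    0.03922, 0.02614, 0.01961, 0.01961, 0.01961, 0.02941,
    0.04248, 0.05882, 0.05229, 0.04902, 0.03922, 0.03922,
    0.03922, 0.03922, 0.03922, 0.04575, 0.04248, 0.04248,
    0.03922, 0.04248, 0.05229, 0.05882, 0.06536, 0.05882,
]

pot_media = 4.5    # kW, average contracted power
pot_c = 4          # kW, contracted power of the example user
cons = 3373 / 365  # kWh per day, average consumption of a flat

# Daily consumption scaled by the power ratio and converted from kWh to Wh
factor = cons * pot_c / pot_media * 1000

# Hourly consumption in Wh
perfil_wh = [f * factor for f in fracciones]

print(round(sum(fracciones), 5))  # the fractions add up to roughly 1
print(round(perfil_wh[0], 3))     # consumption in the first hour, in Wh
```

Since the fractions add up to approximately one, the sum of `perfil_wh` is close to the scaled daily consumption `factor`.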

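The net consumption for each hour is obtained according to the user's profile. Negative values mean that production exceeds consumption in that hour, so there is a surplus.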

In [64]:
# For example, profile 1
diferencia = [perfil_1[i]-produccion[i] for i in range(0, 24)]
diferencia

[322.16513850837134,
 214.7219969558599,
 161.08256925418567,
 161.08256925418567,
 161.08256925418567,
 241.58278234398787,
 -152.53882805217148,
 21.557195294312976,
 405.89971504080756,
 361.2441800826091,
 321.2050283741521,
 321.2050264438373,
 321.20502552495986,
 321.2050250132563,
 321.2050246854108,
 374.84445216314793,
 347.9836666678734,
 347.98366685148704,
 238.27622563529593,
 -34.58364380314049,
 -384.2722123077603,
 273.10286592618866,
 536.8871354642314,
 483.16556468797575]

To obtain the income from surplus compensation, the price (converted from €/MWh to €/Wh) is multiplied by the net energy fed into the grid in each hour.

In [65]:
# Only hours with negative net consumption (surplus) are compensated; €/MWh -> €/Wh
compensacion = [(lista_precios[i]/1000000)*abs(diferencia[i]) if diferencia[i] < 0 else 0
                for i in range(len(diferencia))]

In [66]:
compensacion

[0,
 0,
 0,
 0,
 0,
 0,
 0.014466782452467944,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0.0034189390263784686,
 0.04060988739668411,
 0,
 0,
 0]
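
The compensation rule can be verified on a few hours taken from the outputs above (hours 6, 7 and 20). This is a small sketch; the helper `compensacion_horaria` is hypothetical and not part of the notebook.

```python
def compensacion_horaria(precio_eur_mwh, consumo_neto_wh):
    """Income in € for one hour; zero when there is no surplus."""
    # A negative net consumption is a surplus fed into the grid
    excedente_wh = max(0.0, -consumo_neto_wh)
    # 1 MWh = 1,000,000 Wh, so dividing by 1e6 converts €/MWh into €/Wh
    return precio_eur_mwh / 1_000_000 * excedente_wh


# Prices (€/MWh) and net consumption (Wh) for hours 6, 7 and 20
precios_demo = [94.84, 99.13, 105.68]
neto_demo = [-152.53882805217148, 21.557195294312976, -384.2722123077603]

ingresos_demo = [compensacion_horaria(p, n) for p, n in zip(precios_demo, neto_demo)]
total_demo = sum(ingresos_demo)

print(ingresos_demo)
print(total_demo)
```

Adding up the hourly values in the same way over the full `compensacion` list gives the estimated income for the whole day.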