# Predictions with the Final Model
This notebook loads the final model obtained earlier and uses it to generate predictions on the competition dataset.

In [1]:
import pickle
with open('modelo_final.pkl', 'rb') as file:
    modelo_final = pickle.load(file)

Next, we load the competition dataset:

In [2]:
import pandas as pd
wind_comp = pd.read_csv('wind_comp.csv.gz', compression='gzip')

In [12]:
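# Note: this cell has execution count 12, so it was run after the cells below;
# the shape shown reflects the filtered columns plus the added prediction column.
wind_comp.shape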

(1189, 23)

This dataset must be preprocessed so that it matches the input expected by the model that will produce the predictions.

In [3]:
wind_comp.head()

,datetime,p54.162.1,p54.162.2,p54.162.3,p54.162.4,p54.162.5,p54.162.6,p54.162.7,p54.162.8,p54.162.9,...,v100.16,v100.17,v100.18,v100.19,v100.20,v100.21,v100.22,v100.23,v100.24,v100.25
0,2010-01-01 00:00:00,2403131.0,2395445.0,2387755.0,2380065.0,2372380.0,2399548.0,2391582.0,2383621.0,2375660.0,...,7.212586,7.057422,6.90176,6.746597,6.591434,7.184147,7.03098,6.877313,6.723647,6.570479
1,2010-01-01 06:00:00,2410306.0,2402394.0,2394483.0,2386571.0,2378660.0,2406786.0,2398599.0,2390412.0,2382225.0,...,0.207289,0.583972,0.960654,1.337836,1.714518,0.345988,0.72317,1.10085,1.478031,1.855712
2,2010-01-01 12:00:00,2434908.0,2426793.0,2418683.0,2410573.0,2402462.0,2431465.0,2423075.0,2414689.0,2406298.0,...,1.670114,1.691568,1.712522,1.733976,1.75493,1.664127,1.682587,1.700548,1.718509,1.73647
3,2010-01-01 18:00:00,2447112.0,2439069.0,2431027.0,2422984.0,2414942.0,2443696.0,2435378.0,2427060.0,2418742.0,...,1.217597,1.278464,1.339332,1.399701,1.460569,1.215102,1.272477,1.329853,1.387228,1.444105
4,2010-01-02 00:00:00,2459695.0,2451752.0,2443809.0,2435866.0,2427923.0,2456252.0,2448034.0,2439815.0,2431596.0,...,3.755089,3.686738,3.617887,3.549536,3.481184,3.781532,3.710686,3.640338,3.569492,3.498646


As the output shows, this dataset is not indexed by the 'datetime' column, as it should be, and it contains data for every region of the map, whereas only the data for zone 13 are needed here.

First, the 'datetime' column is converted to the *datetime* type and set as the index, so that the data fully match the format the model expects. The output below confirms that the change was applied correctly.

In [4]:
wind_comp['datetime'] = pd.to_datetime(wind_comp['datetime'])
wind_comp = wind_comp.set_index('datetime')
wind_comp.head()

datetime,p54.162.1,p54.162.2,p54.162.3,p54.162.4,p54.162.5,p54.162.6,p54.162.7,p54.162.8,p54.162.9,p54.162.10,...,v100.16,v100.17,v100.18,v100.19,v100.20,v100.21,v100.22,v100.23,v100.24,v100.25
2010-01-01 00:00:00,2403131.0,2395445.0,2387755.0,2380065.0,2372380.0,2399548.0,2391582.0,2383621.0,2375660.0,2367699.0,...,7.212586,7.057422,6.90176,6.746597,6.591434,7.184147,7.03098,6.877313,6.723647,6.570479
2010-01-01 06:00:00,2410306.0,2402394.0,2394483.0,2386571.0,2378660.0,2406786.0,2398599.0,2390412.0,2382225.0,2374038.0,...,0.207289,0.583972,0.960654,1.337836,1.714518,0.345988,0.72317,1.10085,1.478031,1.855712
2010-01-01 12:00:00,2434908.0,2426793.0,2418683.0,2410573.0,2402462.0,2431465.0,2423075.0,2414689.0,2406298.0,2397912.0,...,1.670114,1.691568,1.712522,1.733976,1.75493,1.664127,1.682587,1.700548,1.718509,1.73647
2010-01-01 18:00:00,2447112.0,2439069.0,2431027.0,2422984.0,2414942.0,2443696.0,2435378.0,2427060.0,2418742.0,2410423.0,...,1.217597,1.278464,1.339332,1.399701,1.460569,1.215102,1.272477,1.329853,1.387228,1.444105
2010-01-02 00:00:00,2459695.0,2451752.0,2443809.0,2435866.0,2427923.0,2456252.0,2448034.0,2439815.0,2431596.0,2423377.0,...,3.755089,3.686738,3.617887,3.549536,3.481184,3.781532,3.710686,3.640338,3.569492,3.498646
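
The conversion and re-indexing can also be done while reading the file. The following is a minimal sketch, not the approach used in this notebook: it relies on the `parse_dates` and `index_col` parameters of `pd.read_csv` and uses a small in-memory CSV (the name `sample` and the two-column layout are illustrative assumptions) in place of `wind_comp.csv.gz`.

```python
import io

import pandas as pd

# Toy CSV standing in for 'wind_comp.csv.gz'; two rows and two columns
# taken from the head shown above.
csv_text = (
    "datetime,p54.162.1,v100.25\n"
    "2010-01-01 00:00:00,2403131.0,6.570479\n"
    "2010-01-01 06:00:00,2410306.0,1.855712\n"
)

# parse_dates converts the column to a datetime type;
# index_col makes that column the index of the resulting frame.
sample = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["datetime"],
    index_col="datetime",
)

print(type(sample.index).__name__)
print(sample.shape)
```

With the real file, the same two parameters would be passed alongside `compression='gzip'`.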


Second, all columns that are irrelevant to the problem are dropped. That is, only the variables for sector 13 of the map are kept.

In [5]:
# Keep only the columns whose name ends in '.13'.
# 'datetime' is already the index, so it needs no special handling.
wind_comp = wind_comp.loc[:, wind_comp.columns.str.endswith('.13')]

wind_comp.head()

datetime,p54.162.13,p55.162.13,cape.13,p59.162.13,lai_lv.13,lai_hv.13,u10n.13,v10n.13,sp.13,stl1.13,...,t2m.13,stl2.13,stl3.13,iews.13,inss.13,stl4.13,fsr.13,flsr.13,u100.13,v100.13
2010-01-01 00:00:00,2380345.0,10.537942,10.38251,2220341.0,2.346936,2.433955,1.258152,4.389059,96056.601316,278.713081,...,277.372035,280.494544,282.003767,0.223763,0.469945,285.865293,0.426499,-5.692383,3.028044,6.801977
2010-01-01 06:00:00,2387186.0,11.038628,1.136771,3593417.0,2.346543,2.433821,0.2586,-0.063731,96132.834569,278.298853,...,276.877148,280.01163,281.986895,0.049144,0.037347,285.844043,0.42672,-5.673355,1.905932,0.819461
2010-01-01 12:00:00,2411521.0,9.635489,0.265247,4477477.0,2.346151,2.433676,4.505959,1.314488,96882.102158,280.531223,...,281.085733,279.880559,281.958683,0.363838,0.075026,285.823566,0.420984,-5.772643,6.254707,1.65814
2010-01-01 18:00:00,2423920.0,10.724937,4.281838,6237006.0,2.345759,2.433542,1.620416,0.278042,97243.861285,279.684007,...,278.417187,280.376232,281.945406,0.152444,0.057106,285.803281,0.418642,-5.807686,4.680087,1.276968
2010-01-02 00:00:00,2436652.0,13.533924,1.250448,5666546.0,2.34534,2.433397,0.958361,1.244256,97438.05947,277.77866,...,276.086144,279.878626,281.929364,0.098397,0.105619,285.783963,0.419667,-5.780384,2.686441,3.517605
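
The same column selection can be written with `DataFrame.filter` and a regular expression. This is a sketch on a toy frame; the column names are made up for illustration (including `t2m.130`, which is not in the dataset) to show that anchoring the pattern at the end of the name excludes look-alike suffixes.

```python
import pandas as pd

# Toy frame: only two of these column names end exactly in '.13'.
toy = pd.DataFrame(
    {
        "u100.1": [1.0],
        "u100.13": [2.0],
        "v100.13": [3.0],
        "v100.3": [4.0],
        "t2m.130": [5.0],  # hypothetical name, contains '.13' but does not end with it
    }
)

# The escaped dot and the '$' anchor require the name to end in '.13'.
filtered = toy.filter(regex=r"\.13$")

print(list(filtered.columns))  # ['u100.13', 'v100.13']
```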


Once the competition data have been preprocessed, the final model is used to generate predictions on them.

In [6]:
pred = modelo_final.predict(wind_comp)

In [11]:
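# Note: execution count 11, so this ran after 'energy_pred' was added;
# the 23 columns include that prediction column.
wind_comp.shape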

(1189, 23)
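
Because `In [7]` later adds `energy_pred` to `wind_comp`, re-running the prediction cell afterwards would pass an extra column to the model. One way to guard against this is to select the feature columns explicitly before calling `predict`. The sketch below assumes a hypothetical list `expected_columns` holding the training column names in training order; the notebook does not define such a list, and the toy values are illustrative.

```python
import pandas as pd

# Hypothetical: the columns the model was trained on, in training order.
expected_columns = ["u100.13", "v100.13"]

# Toy frame with a shuffled column order and an extra, non-feature column.
features = pd.DataFrame(
    {
        "v100.13": [6.8, 0.8],
        "energy_pred": [0.0, 0.0],
        "u100.13": [3.0, 1.9],
    }
)

# Fail early if any training column is absent.
missing = [c for c in expected_columns if c not in features.columns]
if missing:
    raise ValueError(f"Missing feature columns: {missing}")

# Select exactly the training columns, in the training order.
X = features[expected_columns]

print(list(X.columns))  # ['u100.13', 'v100.13']
```

The resulting `X` would then be the argument to `modelo_final.predict`.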

As a last step, the predictions are added to the wind_comp DataFrame as a new column named 'energy_pred'. That column is then written to a CSV file called *predicciones.csv*, together with the 'datetime' index.

In [7]:
wind_comp['energy_pred'] = pred

In [8]:
# Write only the prediction column; the 'datetime' index is saved alongside the values.
wind_comp['energy_pred'].to_csv('predicciones.csv')
#pred_df.to_csv('predicciones.csv', index=False)

In [10]:
wind_comp['energy_pred'].shape

(1189,)
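
An alternative that leaves the feature frame untouched is to put the predictions in their own DataFrame and read the file back to check it. This is a minimal sketch with made-up values: `pred_df`, the two-row index, and the numbers in `pred` are illustrative assumptions, and an in-memory buffer stands in for *predicciones.csv*.

```python
import io

import pandas as pd

# Toy stand-ins for wind_comp's index and the array returned by predict().
index = pd.to_datetime(["2010-01-01 00:00:00", "2010-01-01 06:00:00"])
index.name = "datetime"
pred = [100.0, 200.0]  # made-up values, not real predictions

# One-column frame holding only the predictions, indexed by datetime.
pred_df = pd.DataFrame({"energy_pred": pred}, index=index)

# Write to an in-memory buffer; a path such as 'predicciones.csv' works the same way.
buffer = io.StringIO()
pred_df.to_csv(buffer)

# Read the output back to confirm the row count and the column name.
buffer.seek(0)
check = pd.read_csv(buffer, index_col="datetime", parse_dates=True)

print(check.shape)
print(list(check.columns))
```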