# RES Forecasting - Data Presentation

The Greek RES dataset combines information from several sources. It contains geolocation, weather and RES energy production data for the top wind and solar installations, which together contribute 20% of the RES power in the Greek energy grid. The weather data were collected from a weather forecast API and represent the last generated forecast for each installation.

The data were collected from the following sources:

* RES energy production from [ENTSO-E](https://transparency.entsoe.eu/load-domain/r2/totalLoadR2/show)
* Weather data captured using the [Stormglass API](https://docs.stormglass.io/#/weather?id=point-request)
* Geolocation data for the wind turbine and PV installations that have acquired an operating licence, as published by the [Regulatory Authority for Energy (RAE)](https://geo.rae.gr/)

More information regarding energy licensing: http://www.opengov.gr/minenv/?p=1031

---

**NOTE:** This notebook presents how the different types of data are combined.

In [1]:
import json
import pandas as pd
from datetime import timedelta, datetime
import os
import numpy as np
import re
import seaborn as sns
from tqdm import tqdm


## 1. Location data

Import the location data, which contain the latitude and longitude of the most significant wind and solar power generation stations in Greece.
* id: location ID
* lon: longitude
* lat: latitude
* type: energy type {wind, solar}


In [2]:
location_df = pd.read_csv(os.path.join('resources', 'location_data.csv'), index_col=0)
location_df

Unnamed: 0,id,lon,lat,type
0,11381,23.946727,37.937648,solar
1,12936,22.59963,36.903438,solar
2,13673,22.611828,36.881737,solar
3,33332,23.155247,37.485296,wind
4,33651,20.548365,39.774893,wind
5,33714,22.931187,38.317112,wind
6,36876,21.232926,38.769349,wind
7,33837,23.496811,38.233498,wind
8,33815,22.362301,37.396455,wind
9,33804,22.581566,38.313572,wind


## 2. Weather data

The location information is used to collect weather data for the respective locations. These data are then compiled into an aggregated weather dataset that holds the average values of the selected weather variables over the period of interest.

The most relevant weather variables captured by the [Stormglass API](https://docs.stormglass.io/#/weather?id=point-request) are the following:


| Time 	| Location 	|     Temperature    	|     Solar    	|        Wind       	|      Misc     	|
|:----:	|:--------:	|:------------------:	|:------------:	|:-----------------:	|:-------------:	|
| time 	| id1      	| airTemperature     	| cloudCover   	| windDirection     	| gust          	|
|      	| lat      	| airTemperature80m  	| currentSpeed 	| windDirection20m  	| precipitation 	|
|      	| lon      	| airTemperature100m 	| visibility   	| windDirection30m  	| humidity      	|
|      	|          	|                    	|              	| windDirection40m  	| pressure      	|
|      	|          	|                    	|              	| windDirection50m  	|               	|
|      	|          	|                    	|              	| windDirection80m  	|               	|
|      	|          	|                    	|              	| windDirection100m 	|               	|
|      	|          	|                    	|              	| windSpeed         	|               	|
|      	|          	|                    	|              	| windSpeed100m     	|               	|
|      	|          	|                    	|              	| windSpeed20m      	|               	|
|      	|          	|                    	|              	| windSpeed30m      	|               	|
|      	|          	|                    	|              	| windSpeed40m      	|               	|
|      	|          	|                    	|              	| windSpeed50m      	|               	|
|      	|          	|                    	|              	| windSpeed80m      	|               	|


More information on the weather variables can be found in the [Stormglass API Documentation](https://docs.stormglass.io/#/weather?id=point-request).
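
As a rough illustration of how a single location could be queried, the sketch below only assembles the arguments of one point request and does not send anything. The endpoint URL, the `lat`/`lng`/`params`/`start`/`end` query parameters and the `Authorization` header follow the linked Stormglass documentation as understood here and should be checked against it. The helper `build_point_request` and the API key placeholder are hypothetical and not part of this notebook.

```python
from datetime import datetime, timezone

# Endpoint assumed from the linked Stormglass point-request documentation
STORMGLASS_POINT_URL = 'https://api.stormglass.io/v2/weather/point'

def build_point_request(lat, lon, variables, start, end, api_key):
    '''
    Hypothetical helper: assembles the arguments of one point request.
    '''
    return {
        'url': STORMGLASS_POINT_URL,
        'params': {
            'lat': lat,
            'lng': lon,
            'params': ','.join(variables),     # comma-separated variable names
            'start': int(start.timestamp()),   # UTC timestamps in seconds
            'end': int(end.timestamp()),
        },
        'headers': {'Authorization': api_key},
    }

request_kwargs = build_point_request(
    lat=37.937648, lon=23.946727,   # location 11381 from the location table
    variables=['airTemperature', 'cloudCover', 'windSpeed'],
    start=datetime(2017, 1, 1, tzinfo=timezone.utc),
    end=datetime(2017, 1, 2, tzinfo=timezone.utc),
    api_key='YOUR-API-KEY',         # placeholder, not a real key
)

# With the `requests` package installed, this could be sent as:
# response = requests.get(**request_kwargs)
```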

Let's load the combined historical weather data for all designated locations:

In [3]:
weather_df = pd.read_csv('resources/raw_data/weather_data/historic_weather_data.20170101-20210102.csv', 
                         index_col=0,
                         parse_dates=['time'])
weather_df

Unnamed: 0_level_0,lat,lon,airTemperature,airTemperature100m,airTemperature80m,cloudCover,currentSpeed,gust,humidity,precipitation,...,windDirection80m,windSpeed,windSpeed100m,windSpeed20m,windSpeed30m,windSpeed40m,windSpeed50m,windSpeed80m,time,energy_type
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
11381,37.937648,23.946727,5.540000,,,99.000000,,6.600000,82.900000,1.37,...,,5.270000,,,,,,,2017-01-01 00:00:00+00:00,solar
11381,37.937648,23.946727,5.530000,,,98.170000,,6.130000,83.370000,1.22,...,,4.860000,,,,,,,2017-01-01 01:00:00+00:00,solar
11381,37.937648,23.946727,5.510000,,,97.330000,,5.670000,83.830000,1.08,...,,4.440000,,,,,,,2017-01-01 02:00:00+00:00,solar
11381,37.937648,23.946727,5.500000,,,96.500000,,5.200000,84.300000,0.93,...,,4.030000,,,,,,,2017-01-01 03:00:00+00:00,solar
11381,37.937648,23.946727,5.600000,,,95.670000,,5.270000,84.200000,0.78,...,,4.170000,,,,,,,2017-01-01 04:00:00+00:00,solar
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36876,38.769349,21.232926,7.993333,10.56,10.55,0.000000,0.01,1.700000,82.923333,0.00,...,90.01,1.186667,1.74,1.62,1.64,1.67,1.69,1.73,2021-01-01 19:00:00+00:00,wind
36876,38.769349,21.232926,7.290000,10.24,10.24,0.000000,0.01,1.746667,84.303333,0.00,...,90.01,1.370000,1.70,1.59,1.61,1.63,1.66,1.69,2021-01-01 20:00:00+00:00,wind
36876,38.769349,21.232926,6.676667,9.92,9.93,0.000000,0.01,1.730000,85.746667,0.00,...,90.01,1.556667,1.66,1.55,1.57,1.60,1.62,1.65,2021-01-01 21:00:00+00:00,wind
36876,38.769349,21.232926,6.223333,9.64,9.68,13.133333,0.01,2.090000,88.376667,0.00,...,90.01,1.640000,1.75,1.64,1.66,1.69,1.71,1.74,2021-01-01 22:00:00+00:00,wind


Data preprocessing methods

In [4]:
def set_na_values(df):
   '''
   Sets placeholder strings for unfilled values ('n/e', '-') to NA.
   '''
   return df.replace(to_replace={'n/e': np.nan, '-': np.nan})

def drop_na_values(df):
   '''
   Drops columns with less than 80% non-NA values
   '''
   return df.dropna(axis=1, thresh=int(len(df) * 0.8))
    
def drop_column(df, columns):
   return df.drop(columns=columns)

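def set_time(df):
   '''
   Parses the start of each 'MTU' interval into a timezone-aware 'time' column.
   '''
   # MTU looks like '01.01.2014 00:00 - 01.01.2014 01:00 (CET)'; keep the interval start
   start = df['MTU'].str.split('-', expand=True)[0].str.strip()
   # State the day-first format explicitly so that '02.01.2014' is not read as 1 February
   df['time'] = pd.to_datetime(start, format='%d.%m.%Y %H:%M')
   df['time'] = df['time'].dt.tz_localize('CET', nonexistent=timedelta(hours=1), ambiguous=False)
   df.drop(columns=['MTU'], inplace=True)
   df.reset_index(inplace=True)
   df.drop(columns='index', inplace=True)
   return df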

def to_numeric(df):
   '''
   Convert all columns except 'time' to numeric.
   '''
   for col in set(df.columns) - set(['time']):
      # Plain column assignment, so the converted dtype replaces the original one
      df[col] = pd.to_numeric(df[col])
   return df

def na_fill(df, column):
   '''
   Fills missing values of the input column with the values of similar past days.
   '''
   na_dates = sorted(set(df.loc[df[column].isna()].index.date))
   for day in na_dates:
      orig_day_range = pd.date_range(start=floor_date(day), end=ceil_date(day), freq='H', tz='CET')
      similar_days = [day.replace(year=year) for year in list(set(df.index.year) - set([day.year]))]
      _df_list = []
      for sim_day in similar_days:
         day_range = pd.date_range(start=floor_date(sim_day), end=ceil_date(sim_day), freq='H', tz='CET')
         _df_list.append(df.loc[df.index.isin(day_range), column])
      _df = pd.concat(_df_list).to_frame()
      _df['hour'] = _df.index.time
      df.loc[df.index.isin(orig_day_range), column] = _df.groupby('hour').mean().values[:len(df[df.index.isin(orig_day_range)])]
   return df

def floor_date(date):
   '''
   Round date to the start of the day.
   '''
   date = pd.to_datetime(date)
   return date - timedelta(hours=date.time().hour, 
                         minutes=date.time().minute, 
                         seconds=date.time().second, 
                         microseconds=date.time().microsecond)

def ceil_date(date):
   '''
   Round date to the end of the day.
   '''
   date = floor_date(date)   
   return date + timedelta(hours=23)
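
To make the 80% rule in `drop_na_values` concrete, here is a minimal sketch on a synthetic frame (the column names and values are made up for illustration). A column is kept only if it has at least `int(len(df) * 0.8)` non-NA entries.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'full':   [1.0, 2.0, 3.0, 4.0, 5.0],         # 5 of 5 non-NA
    'mostly': [1.0, 2.0, np.nan, 4.0, 5.0],      # 4 of 5 non-NA, meets the threshold
    'sparse': [1.0, np.nan, np.nan, 4.0, 5.0],   # 3 of 5 non-NA, below the threshold
})

# Same rule as drop_na_values: keep columns with at least 80% non-NA rows
threshold = int(len(toy) * 0.8)
kept = toy.dropna(axis=1, thresh=threshold)
print(threshold, list(kept.columns))
```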

### 2.1 Wind weather data preprocessing

Wind weather data preprocessing:

In [5]:
wind_energy_df = weather_df.query('energy_type==\"wind\"').copy()
wind_energy_df.shape

(631584, 28)

In [6]:
wind_energy_df = wind_energy_df.pipe(drop_na_values)
wind_energy_df.shape

(631584, 13)

In [7]:
na_value_columns = [column for column in wind_energy_df.columns if wind_energy_df[column].isna().sum() > 0]

for column in na_value_columns:
   print(column)
   progressbar = tqdm(wind_energy_df.index.unique(), desc='Processing')

   for i in progressbar:
      progressbar.set_description('Processing location ID: %s' % i)
      df = wind_energy_df.loc[i].set_index('time')
      # NOTE: na_fill works on this per-location copy; the result is not written back to wind_energy_df
      df = df.pipe(na_fill, column)
      
wind_energy_df.shape

Processing location ID: 32947:   0%|          | 0/18 [00:00<?, ?it/s]

visibility


Processing location ID: 36876: 100%|██████████| 18/18 [00:49<00:00,  2.77s/it]


(631584, 13)
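
The idea behind `na_fill`, filling a missing day with the hourly average of the same calendar day in other years, can be illustrated on a toy series. This is a simplified sketch under that assumption rather than a copy of the function above, and all values are synthetic.

```python
import numpy as np
import pandas as pd

# Toy hourly series: the same calendar day (10 March) in three different years
days = [pd.date_range(f'{year}-03-10 00:00', periods=24, freq='h', tz='CET')
        for year in (2018, 2019, 2020)]
index = days[0].append(days[1]).append(days[2])
series = pd.Series(np.tile(np.arange(24.0), 3), index=index)
series[index.year == 2018] += 10      # 2018 values: hour + 10
series[index.year == 2020] += 30      # 2020 values: hour + 30
series[index.year == 2019] = np.nan   # the day to be filled

# Hourly profile of the same calendar day in the other years
donors = series[index.year != 2019]
profile = donors.groupby(donors.index.hour).mean()

# Write the profile into the missing day, matching on the hour of day
missing = series.index[series.isna()]
series.loc[missing] = profile.loc[missing.hour].values
```

In this toy case each filled value is the average of the 2018 and 2020 values for the same hour.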

In [8]:
wind_energy_df = wind_energy_df.groupby('time').mean()
wind_energy_df.head()

Unnamed: 0_level_0,lat,lon,airTemperature,cloudCover,gust,humidity,precipitation,pressure,visibility,windDirection,windSpeed
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2017-01-01 00:00:00+00:00,38.784385,22.861444,1.686667,81.277778,3.366667,73.1,0.173889,1026.061111,,161.911667,2.804444
2017-01-01 01:00:00+00:00,38.784385,22.861444,1.673333,72.796667,3.217778,73.290556,0.160556,1025.918889,,162.192778,2.682222
2017-01-01 02:00:00+00:00,38.784385,22.861444,1.663889,64.314444,3.071111,73.481667,0.147778,1025.780556,,162.475,2.561111
2017-01-01 03:00:00+00:00,38.784385,22.861444,1.650556,55.833333,2.922222,73.672222,0.132778,1025.642222,,162.756111,2.438889
2017-01-01 04:00:00+00:00,38.784385,22.861444,1.437222,47.352222,2.818333,74.692222,0.119444,1025.661111,,214.995,2.355556
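
One caveat about the averaging step: `windDirection` is an angle (assumed here to be in degrees), so its arithmetic mean across locations can be misleading near the 0/360 wrap-around. The sketch below shows a circular mean as one possible alternative. The helper `circular_mean_deg` is hypothetical and is not used anywhere in this notebook.

```python
import numpy as np
import pandas as pd

def circular_mean_deg(angles):
    '''
    Hypothetical helper: mean direction in degrees, wrapped to the 0-360 range.
    '''
    radians = np.deg2rad(np.asarray(angles, dtype=float))
    # Average the unit vectors, then convert the resulting vector back to an angle
    mean_angle = np.arctan2(np.sin(radians).mean(), np.cos(radians).mean())
    return float(np.rad2deg(mean_angle) % 360)

directions = pd.Series([350.0, 10.0])      # two directions on either side of north
arithmetic = directions.mean()             # 180.0, pointing the opposite way
circular = circular_mean_deg(directions)   # close to 0 (equivalently 360)
```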


### 2.2 Solar weather data preprocessing

Solar weather data preprocessing:

In [9]:
solar_energy_df = weather_df.query('energy_type==\"solar\"').copy()
solar_energy_df.shape

(105264, 28)

In [10]:
solar_energy_df = solar_energy_df.pipe(drop_na_values)
solar_energy_df.shape

(105264, 13)

In [11]:
na_value_columns = [column for column in solar_energy_df.columns if solar_energy_df[column].isna().sum() > 0]

for column in na_value_columns:
   print(column)
   progressbar = tqdm(solar_energy_df.index.unique(), desc='Processing')

   for i in progressbar:
      progressbar.set_description('Processing location ID: %s' % i)
      df = solar_energy_df.loc[i].set_index('time')
      # NOTE: na_fill works on this per-location copy; the result is not written back to solar_energy_df
      df = df.pipe(na_fill, column)
      
solar_energy_df.shape

Processing location ID: 11381:   0%|          | 0/3 [00:00<?, ?it/s]

visibility


Processing location ID: 13673: 100%|██████████| 3/3 [00:08<00:00,  2.93s/it]


(105264, 13)

In [12]:
solar_energy_df = solar_energy_df.groupby('time').mean()
solar_energy_df.head()

Unnamed: 0_level_0,lat,lon,airTemperature,cloudCover,gust,humidity,precipitation,pressure,visibility,windDirection,windSpeed
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2017-01-01 00:00:00+00:00,37.240941,23.052728,4.466667,55.0,6.2,74.9,0.456667,1024.97,,341.576667,4.93
2017-01-01 01:00:00+00:00,37.240941,23.052728,4.423333,51.276667,6.043333,75.77,0.406667,1024.933333,,342.936667,4.8
2017-01-01 02:00:00+00:00,37.240941,23.052728,4.37,47.556667,5.89,76.63,0.36,1024.896667,,344.293333,4.66
2017-01-01 03:00:00+00:00,37.240941,23.052728,4.326667,43.833333,5.733333,77.5,0.31,1024.853333,,345.653333,4.53
2017-01-01 04:00:00+00:00,37.240941,23.052728,4.353333,40.11,5.803333,77.82,0.26,1024.91,,345.796667,4.63


## 3. Energy data
### 3.1 Energy data loading

Actual energy output data, aggregated per energy type at hourly intervals, were collected from the European Network of Transmission System Operators for Electricity ([ENTSO-E](https://transparency.entsoe.eu/load-domain/r2/totalLoadR2/show)), which represents 42 electricity transmission system operators (TSOs) from 35 countries across Europe.

In [13]:
energy_data_dir = os.path.join('resources/raw_data/', 'ENTSOE_data')

# Combine all energy data in the directory 
energy_df = pd.concat([pd.read_csv(os.path.join(energy_data_dir, csv), 
                                   parse_dates=['MTU']) for csv in os.listdir(energy_data_dir)])
print(energy_df.shape)
energy_df.head()

(61375, 23)


Unnamed: 0,Area,MTU,Biomass - Actual Aggregated [MW],Fossil Brown coal/Lignite - Actual Aggregated [MW],Fossil Coal-derived gas - Actual Aggregated [MW],Fossil Gas - Actual Aggregated [MW],Fossil Hard coal - Actual Aggregated [MW],Fossil Oil - Actual Aggregated [MW],Fossil Oil shale - Actual Aggregated [MW],Fossil Peat - Actual Aggregated [MW],...,Hydro Run-of-river and poundage - Actual Aggregated [MW],Hydro Water Reservoir - Actual Aggregated [MW],Marine - Actual Aggregated [MW],Nuclear - Actual Aggregated [MW],Other - Actual Aggregated [MW],Other renewable - Actual Aggregated [MW],Solar - Actual Aggregated [MW],Waste - Actual Aggregated [MW],Wind Offshore - Actual Aggregated [MW],Wind Onshore - Actual Aggregated [MW]
0,Greece (GR),01.01.2014 00:00 - 01.01.2014 01:00 (CET),n/e,,n/e,,n/e,,n/e,n/e,...,n/e,n/e,n/e,n/e,n/e,n/e,,n/e,n/e,
1,Greece (GR),01.01.2014 01:00 - 01.01.2014 02:00 (CET),n/e,,n/e,,n/e,,n/e,n/e,...,n/e,n/e,n/e,n/e,n/e,n/e,,n/e,n/e,
2,Greece (GR),01.01.2014 02:00 - 01.01.2014 03:00 (CET),n/e,,n/e,,n/e,,n/e,n/e,...,n/e,n/e,n/e,n/e,n/e,n/e,,n/e,n/e,
3,Greece (GR),01.01.2014 03:00 - 01.01.2014 04:00 (CET),n/e,,n/e,,n/e,,n/e,n/e,...,n/e,n/e,n/e,n/e,n/e,n/e,,n/e,n/e,
4,Greece (GR),01.01.2014 04:00 - 01.01.2014 05:00 (CET),n/e,,n/e,,n/e,,n/e,n/e,...,n/e,n/e,n/e,n/e,n/e,n/e,,n/e,n/e,
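
The `MTU` column stores each interval as a `start - end (CET)` string, so only the interval start needs to be parsed and localized. The sketch below uses illustrative strings in that format, chosen around the 2020 DST changes, to show what the `nonexistent` and `ambiguous` arguments of `tz_localize` do when used as in `set_time`.

```python
from datetime import timedelta

import pandas as pd

mtu = pd.Series([
    '29.03.2020 01:00 - 29.03.2020 02:00 (CET)',   # regular winter-time hour
    '29.03.2020 02:00 - 29.03.2020 03:00 (CET)',   # does not exist locally (spring forward)
    '25.10.2020 02:00 - 25.10.2020 03:00 (CET)',   # occurs twice locally (fall back)
])

# Keep the interval start and parse it with an explicit day-first format
start = mtu.str.split(' - ').str[0]
naive = pd.to_datetime(start, format='%d.%m.%Y %H:%M')

# nonexistent=timedelta(hours=1) shifts the missing 02:00 forward to 03:00;
# ambiguous=False treats the repeated 02:00 as standard (non-DST) time
local = naive.dt.tz_localize('CET', nonexistent=timedelta(hours=1), ambiguous=False)
print(local)
```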


In [14]:
# Preprocessing
energy_df = (energy_df.pipe(set_na_values) # Define NA values
                      # Parse time column
                      .pipe(set_time) 
                      # Drop irrelevant columns
                      .pipe(drop_column, ['Area'] + list(set(energy_df.filter(regex='.+Actual.+').columns) 
                                                        - set(energy_df.filter(like='Wind').columns) 
                                                        - set(energy_df.filter(like='Solar').columns)))
                      # Drop columns with less than 80% non-NA values
                      .pipe(drop_na_values)
                      # Convert the remaining columns to numeric
                      .pipe(to_numeric))
energy_df.tail()

Unnamed: 0,Solar - Actual Aggregated [MW],Wind Onshore - Actual Aggregated [MW],time
61370,0.0,1659.0,2020-12-31 19:00:00+01:00
61371,0.0,1516.0,2020-12-31 20:00:00+01:00
61372,0.0,1426.0,2020-12-31 21:00:00+01:00
61373,0.0,1382.0,2020-12-31 22:00:00+01:00
61374,0.0,1416.0,2020-12-31 23:00:00+01:00


In [15]:
energy_df = energy_df.set_index('time').sort_index().rename(columns={energy_df.columns[0]: 'solar power', 
                                                                     energy_df.columns[1]: 'wind power'})
energy_df.tail()

Unnamed: 0_level_0,solar power,wind power
time,Unnamed: 1_level_1,Unnamed: 2_level_1
2020-12-31 19:00:00+01:00,0.0,1659.0
2020-12-31 20:00:00+01:00,0.0,1516.0
2020-12-31 21:00:00+01:00,0.0,1426.0
2020-12-31 22:00:00+01:00,0.0,1382.0
2020-12-31 23:00:00+01:00,0.0,1416.0


### 3.2 Align data according to time

Weather data are captured in the `UTC` timezone, whereas the energy data are in the local `CET` timezone, so the weather data are converted to `CET`. In addition, any time misalignment between the datasets caused by DST changes must be handled.
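
A small sketch of what the conversion does around a DST change, using a synthetic UTC index: `tz_convert` keeps the underlying instants and only changes the local wall-clock representation.

```python
import pandas as pd

# Three consecutive UTC hours spanning the spring DST change of 2020
utc_index = pd.date_range('2020-03-29 00:00', periods=3, freq='h', tz='UTC')
cet_index = utc_index.tz_convert('CET')

# 00:00, 01:00 and 02:00 UTC become 01:00+01:00, 03:00+02:00 and 04:00+02:00:
# the local clock skips 02:00, while the instants themselves are unchanged
print(cet_index)
```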

In [16]:
energy_df.index = energy_df.index.tz_convert('CET')
energy_df.tail()

Unnamed: 0_level_0,solar power,wind power
time,Unnamed: 1_level_1,Unnamed: 2_level_1
2020-12-31 19:00:00+01:00,0.0,1659.0
2020-12-31 20:00:00+01:00,0.0,1516.0
2020-12-31 21:00:00+01:00,0.0,1426.0
2020-12-31 22:00:00+01:00,0.0,1382.0
2020-12-31 23:00:00+01:00,0.0,1416.0


In [17]:
solar_energy_df.index = solar_energy_df.index.tz_convert('CET')
solar_energy_df.tail()

Unnamed: 0_level_0,lat,lon,airTemperature,cloudCover,gust,humidity,precipitation,pressure,visibility,windDirection,windSpeed
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2021-01-01 20:00:00+01:00,37.240941,23.052728,7.78,15.203333,3.375556,71.2,0.0,1019.034444,24.14,303.963333,1.117778
2021-01-01 21:00:00+01:00,37.240941,23.052728,7.461111,14.63,3.813333,71.448889,0.0,1019.261111,24.14,311.441111,1.203333
2021-01-01 22:00:00+01:00,37.240941,23.052728,7.112222,14.253333,4.566667,73.075556,0.0,1019.468889,24.14,318.908889,1.29
2021-01-01 23:00:00+01:00,37.240941,23.052728,6.992222,14.027778,5.028889,73.433333,0.0,1019.603333,24.13,324.183333,1.335556
2021-01-02 00:00:00+01:00,37.240941,23.052728,6.964444,12.907778,4.975556,70.917778,0.002222,1019.557778,24.13,329.461111,1.382222


In [18]:
wind_energy_df.index = wind_energy_df.index.tz_convert('CET')
wind_energy_df.tail()

Unnamed: 0_level_0,lat,lon,airTemperature,cloudCover,gust,humidity,precipitation,pressure,visibility,windDirection,windSpeed
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2021-01-01 20:00:00+01:00,38.784385,22.861444,7.78537,3.027593,2.348148,78.45037,0.0,1019.113704,24.134444,152.822778,1.291667
2021-01-01 21:00:00+01:00,38.784385,22.861444,7.451667,2.927222,2.442778,78.233704,0.0,1019.468333,24.134444,152.857222,1.329074
2021-01-01 22:00:00+01:00,38.784385,22.861444,7.148333,2.464074,2.485185,78.599259,0.0,1019.718704,24.134444,159.560741,1.365926
2021-01-01 23:00:00+01:00,38.784385,22.861444,6.883148,2.940185,2.56537,79.523889,0.0,1019.870185,24.13,164.018148,1.354444
2021-01-02 00:00:00+01:00,38.784385,22.861444,6.658889,3.372037,2.652407,80.623148,0.0,1019.82963,24.13,168.474815,1.339444


Define start and end period

In [19]:
period_start = max(energy_df.index[0], 
                   solar_energy_df.index[0], 
                   wind_energy_df.index[0])
period_end = min(energy_df.index[-1], 
                    solar_energy_df.index[-1], 
                    wind_energy_df.index[-1])
period_start, period_end

(Timestamp('2017-01-01 01:00:00+0100', tz='CET'),
 Timestamp('2020-12-31 23:00:00+0100', tz='CET'))

Trim data according to start and end of the period

In [20]:
energy_df = energy_df[(period_start < energy_df.index) & 
                      (energy_df.index < period_end)]
energy_df.head()

Unnamed: 0_level_0,solar power,wind power
time,Unnamed: 1_level_1,Unnamed: 2_level_1
2017-01-01 02:00:00+01:00,0.0,391.0
2017-01-01 03:00:00+01:00,0.0,355.0
2017-01-01 04:00:00+01:00,0.0,360.0
2017-01-01 05:00:00+01:00,0.0,330.0
2017-01-01 06:00:00+01:00,7.0,278.0
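
Note that the strict comparisons used for trimming exclude the boundary timestamps themselves, which is why the first retained row is 02:00 rather than the 01:00 period start. A minimal sketch on synthetic data of the difference between strict and inclusive trimming:

```python
import pandas as pd

idx = pd.date_range('2017-01-01 01:00', periods=5, freq='h', tz='CET')
toy = pd.DataFrame({'value': range(5)}, index=idx)
start, end = idx[0], idx[-1]

# Strict comparison, as in the trimming cells: both boundaries are dropped
strict = toy[(start < toy.index) & (toy.index < end)]

# Label-based slicing on a sorted index includes both boundaries
inclusive = toy.loc[start:end]

print(len(strict), len(inclusive))
```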


In [21]:
solar_energy_df = solar_energy_df[(period_start < solar_energy_df.index) & 
                                  (solar_energy_df.index < period_end)]
solar_energy_df.head()

Unnamed: 0_level_0,lat,lon,airTemperature,cloudCover,gust,humidity,precipitation,pressure,visibility,windDirection,windSpeed
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2017-01-01 02:00:00+01:00,37.240941,23.052728,4.423333,51.276667,6.043333,75.77,0.406667,1024.933333,,342.936667,4.8
2017-01-01 03:00:00+01:00,37.240941,23.052728,4.37,47.556667,5.89,76.63,0.36,1024.896667,,344.293333,4.66
2017-01-01 04:00:00+01:00,37.240941,23.052728,4.326667,43.833333,5.733333,77.5,0.31,1024.853333,,345.653333,4.53
2017-01-01 05:00:00+01:00,37.240941,23.052728,4.353333,40.11,5.803333,77.82,0.26,1024.91,,345.796667,4.63
2017-01-01 06:00:00+01:00,37.240941,23.052728,4.386667,36.39,5.863333,78.146667,0.213333,1024.97,,345.946667,4.74


In [22]:
wind_energy_df = wind_energy_df[(period_start < wind_energy_df.index) & 
                                  (wind_energy_df.index < period_end)]
wind_energy_df.head()

Unnamed: 0_level_0,lat,lon,airTemperature,cloudCover,gust,humidity,precipitation,pressure,visibility,windDirection,windSpeed
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2017-01-01 02:00:00+01:00,38.784385,22.861444,1.673333,72.796667,3.217778,73.290556,0.160556,1025.918889,,162.192778,2.682222
2017-01-01 03:00:00+01:00,38.784385,22.861444,1.663889,64.314444,3.071111,73.481667,0.147778,1025.780556,,162.475,2.561111
2017-01-01 04:00:00+01:00,38.784385,22.861444,1.650556,55.833333,2.922222,73.672222,0.132778,1025.642222,,162.756111,2.438889
2017-01-01 05:00:00+01:00,38.784385,22.861444,1.437222,47.352222,2.818333,74.692222,0.119444,1025.661111,,214.995,2.355556
2017-01-01 06:00:00+01:00,38.784385,22.861444,1.228889,38.87,2.715,75.718889,0.106667,1025.681667,,207.236667,2.276667


Resample data at hourly intervals

In [23]:
energy_df = energy_df.resample('1h').mean()
solar_energy_df = solar_energy_df.resample('1h').mean()
wind_energy_df = wind_energy_df.resample('1h').mean()
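
Resampling onto a regular hourly grid makes any missing hour explicit as a row of NaN values, which is what the forward fill applied in the following sections then closes. A minimal sketch on synthetic data:

```python
import pandas as pd

# Hourly data with one missing hour (04:00)
idx = pd.to_datetime(['2017-01-01 02:00',
                      '2017-01-01 03:00',
                      '2017-01-01 05:00']).tz_localize('CET')
toy = pd.DataFrame({'value': [10.0, 20.0, 40.0]}, index=idx)

hourly = toy.resample('1h').mean()   # 04:00 now appears as a NaN row
filled = hourly.ffill()              # 04:00 takes the last known value (03:00)
print(filled)
```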

### 3.3 Wind energy dataset preparation

In [24]:
wind_power_df = pd.concat([wind_energy_df, energy_df['wind power'].to_frame()], axis=1)
wind_power_df = wind_power_df.ffill()  # forward fill, equivalent to interpolate(method='pad')
wind_power_df.head()

Unnamed: 0_level_0,lat,lon,airTemperature,cloudCover,gust,humidity,precipitation,pressure,visibility,windDirection,windSpeed,wind power
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
2017-01-01 02:00:00+01:00,38.784385,22.861444,1.673333,72.796667,3.217778,73.290556,0.160556,1025.918889,,162.192778,2.682222,391.0
2017-01-01 03:00:00+01:00,38.784385,22.861444,1.663889,64.314444,3.071111,73.481667,0.147778,1025.780556,,162.475,2.561111,355.0
2017-01-01 04:00:00+01:00,38.784385,22.861444,1.650556,55.833333,2.922222,73.672222,0.132778,1025.642222,,162.756111,2.438889,360.0
2017-01-01 05:00:00+01:00,38.784385,22.861444,1.437222,47.352222,2.818333,74.692222,0.119444,1025.661111,,214.995,2.355556,330.0
2017-01-01 06:00:00+01:00,38.784385,22.861444,1.228889,38.87,2.715,75.718889,0.106667,1025.681667,,207.236667,2.276667,278.0


### 3.4 Solar energy dataset preparation

In [25]:
solar_power_df = pd.concat([solar_energy_df, energy_df['solar power'].to_frame()], axis=1)
solar_power_df = solar_power_df.ffill()  # forward fill, equivalent to interpolate(method='pad')
solar_power_df.head()

Unnamed: 0_level_0,lat,lon,airTemperature,cloudCover,gust,humidity,precipitation,pressure,visibility,windDirection,windSpeed,solar power
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
2017-01-01 02:00:00+01:00,37.240941,23.052728,4.423333,51.276667,6.043333,75.77,0.406667,1024.933333,,342.936667,4.8,0.0
2017-01-01 03:00:00+01:00,37.240941,23.052728,4.37,47.556667,5.89,76.63,0.36,1024.896667,,344.293333,4.66,0.0
2017-01-01 04:00:00+01:00,37.240941,23.052728,4.326667,43.833333,5.733333,77.5,0.31,1024.853333,,345.653333,4.53,0.0
2017-01-01 05:00:00+01:00,37.240941,23.052728,4.353333,40.11,5.803333,77.82,0.26,1024.91,,345.796667,4.63,0.0
2017-01-01 06:00:00+01:00,37.240941,23.052728,4.386667,36.39,5.863333,78.146667,0.213333,1024.97,,345.946667,4.74,7.0


In [26]:
processed_data_dir = os.path.join('resources', 'processed_data')

# Create the output directory if it does not exist yet
os.makedirs(processed_data_dir, exist_ok=True)

wind_power_df.to_csv(os.path.join(processed_data_dir, 'wind_energy_w_forecast.csv')) 
solar_power_df.to_csv(os.path.join(processed_data_dir, 'solar_energy_w_forecast.csv'))
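
When the saved files are read back, the `time` column contains both `+01:00` and `+02:00` offsets because of DST, so it may not be parsed into a single timezone-aware index directly. One possible way to restore the index is sketched below on a synthetic frame and an in-memory buffer; the real files are not touched.

```python
import io

import pandas as pd

# Synthetic frame spanning the autumn DST change, so the CSV holds mixed offsets
idx = pd.date_range('2020-10-25 00:00', periods=4, freq='h', tz='CET')
toy = pd.DataFrame({'wind power': [1.0, 2.0, 3.0, 4.0]}, index=idx)
toy.index.name = 'time'

buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)

# Parse everything as UTC first, then convert back to the local timezone
loaded = pd.read_csv(buffer, index_col='time')
loaded.index = pd.to_datetime(loaded.index, utc=True).tz_convert('CET')
print(loaded)
```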