# EU Wind Energy Dataset

The first dataset contains the hourly wind energy potential of different EU countries from 1986 to 2015.
The second dataset contains information on the wind power stations themselves over the same period.

There is still a lot to improve, particularly the plots. Any constructive feedback is very welcome :)

In [None]:
import pandas as pd
import numpy as np
import datetime as dt

import matplotlib.pyplot as plt
import matplotlib.dates as pltdt
%matplotlib inline

import seaborn as sns
sns.set_style("darkgrid")


## Starting with the DataFrame: HOUR vs COUNTRY

In [None]:
coun = pd.read_csv("../input/EMHIRESPV_TSh_CF_Country_19862015.csv")
coun.head(3)

In [None]:
coun.shape

In [None]:
t = pd.date_range('1/1/1986', periods = 262968, freq = 'H')
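
The `periods` value above can be cross-checked by building the same hourly range from explicit start and end timestamps. This is only a sanity-check sketch; the names `t_check` and `n_hours` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Hourly timestamps from the first to the last hour of the 1986-2015 period.
# A Timedelta is used as the frequency so the call does not depend on
# frequency-alias spelling, which differs between pandas versions.
t_check = pd.date_range("1986-01-01 00:00", "2015-12-31 23:00",
                        freq=pd.Timedelta(hours=1))
n_hours = len(t_check)
print(n_hours)
```

The count covers 30 years including 7 leap days, i.e. 10957 days of 24 hours.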

Let's see the wind energy profile for all countries on the last day and on the last month of the last year.

In [None]:
coun["Hour"] = t
coun.set_index("Hour", inplace = True)
coun.loc['2015-12-31'].plot()    # recent pandas requires .loc for partial-string row selection
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, ncol = 2,  borderaxespad=0.)

In [None]:
coun.loc['2015-12'].plot()
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, ncol = 2,  borderaxespad=0.)
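
As a small illustration of selecting rows by date on a `DatetimeIndex`, the sketch below builds a toy hourly frame (synthetic values, hypothetical names `toy`, `last_day`, `whole_month`) and picks one day and one month using a partial date string.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data for December 2015 (31 days * 24 hours = 744 rows).
idx = pd.date_range("2015-12-01", periods=744, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"PT": np.linspace(0.0, 1.0, 744)}, index=idx)

# A partial date string passed to .loc selects every row inside that period.
last_day = toy.loc["2015-12-31"]      # the 24 hours of the last day
whole_month = toy.loc["2015-12"]      # every hour of December
print(len(last_day), len(whole_month))
```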

## per Day

The lines are all over the place but you can clearly see a pattern.

In [None]:
coun['Day']=coun.index.map(lambda x: x.strftime('%Y-%m-%d'))
c_group_day = coun.groupby('Day').mean()
c_group_day.plot()
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, ncol = 2,  borderaxespad=0.)

Cyprus always has null values for wind energy potential, which becomes very evident in this plot.
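
A flat series like this can also be detected programmatically rather than by eye. The following is a minimal sketch on synthetic data; the helper `constant_zero_columns` and the toy values are assumptions for illustration, not taken from the dataset.

```python
import numpy as np
import pandas as pd

def constant_zero_columns(df):
    """Return the names of numeric columns whose values are all zero or missing."""
    numeric = df.select_dtypes(include="number")
    mask = (numeric.fillna(0) == 0).all(axis=0)
    return list(mask[mask].index)

# Toy frame: 'CY' never produces anything, the other columns do.
toy = pd.DataFrame({"PT": [0.1, 0.3, 0.2],
                    "CY": [0.0, 0.0, np.nan],
                    "ES": [0.0, 0.4, 0.1]})
flat = constant_zero_columns(toy)
print(flat)
```

On the real data the same helper could be called as `constant_zero_columns(coun)`.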

## per Month

In [None]:
coun['Month']=coun.index.map(lambda x: x.strftime('%Y-%m'))
coun['Month_only']=coun.index.map(lambda x: x.strftime('%m'))
c_group_month = coun.groupby('Month').mean(numeric_only = True)    # skip the string label columns
c_group_month.plot()
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, ncol = 2,  borderaxespad=0.)

## per Year

In [None]:
coun['Year']=coun.index.map(lambda x: x.strftime('%Y'))
c_group_year = coun.groupby('Year').mean(numeric_only = True)    # skip the string label columns
c_group_year.plot()
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, ncol = 2,  borderaxespad=0.)
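
The daily, monthly and yearly groupings above rely on string labels built with `strftime`. A possible alternative, sketched here on synthetic data with hypothetical names, is to group directly on the `DatetimeIndex`, which avoids adding helper columns to the frame.

```python
import numpy as np
import pandas as pd

# 48 synthetic hourly values spanning the 2014/2015 year boundary.
idx = pd.date_range("2014-12-31", periods=48, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"PT": np.arange(48, dtype=float)}, index=idx)

daily = toy.resample("D").mean()                        # one row per calendar day
monthly = toy.groupby(toy.index.to_period("M")).mean()  # one row per year-month
yearly = toy.groupby(toy.index.year).mean()             # one row per year
print(daily, monthly, yearly, sep="\n\n")
```

Because no string columns are added, the means need no special handling of non-numeric columns.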

# Zooming in on Portugal

Let's start by seeing how Portugal has been doing in the wind energy sector over this 30-year period.

In [None]:
pt_heatmap = coun.pivot_table(index = 'Month_only', columns = 'Year', values = 'PT')
pt_heatmap.sort_index(ascending = True, inplace = True)    # sortlevel was removed from pandas
sns.heatmap(pt_heatmap, vmin = 0.09, vmax = 0.29, cmap = 'inferno', linewidths = 0.5)

In [None]:
sns.clustermap(pt_heatmap, cmap = 'inferno', standard_scale = 1)

Interestingly, the months with the best wind energy potential coincide with those considered most productive for solar energy: the highest values fall between April and September (spring and summer) and the lowest between October and February (autumn and winter). There was also a notable run of years in which March was very productive for wind power.
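
A seasonal reading like this can be checked numerically by averaging the month-by-year table across years. The sketch below uses purely synthetic data (higher values from April to September by construction) and hypothetical names; it only illustrates the mechanics and says nothing about the real values.

```python
import numpy as np
import pandas as pd

# Two synthetic years of hourly data with an artificial April-September peak.
idx = pd.date_range("2014-01-01 00:00", "2015-12-31 23:00",
                    freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"PT": np.where(idx.month.isin([4, 5, 6, 7, 8, 9]), 0.3, 0.1)},
                   index=idx)
toy["month"] = toy.index.month
toy["year"] = toy.index.year

# Month-by-year table of mean values, the same layout as the heatmap input.
table = toy.pivot_table(index="month", columns="year", values="PT")

# Average over the years to get one value per calendar month, best first.
seasonal = table.mean(axis=1).sort_values(ascending=False)
best_months = sorted(seasonal[seasonal > 0.2].index.tolist())
print(best_months)
```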

## Let's look at it from a time-series perspective

In [None]:
pt_ts = coun.filter(['Month','Year','PT'], axis = 1)
pt_ts.plot()

In [None]:
pt_ts_m = pt_ts.groupby('Month').mean(numeric_only = True)    # ignore the string 'Year' column
pt_ts_m.plot()

In [None]:
pt_ts_y = pt_ts.groupby('Year').mean(numeric_only = True)    # ignore the string 'Month' column
pt_ts_y.plot()

## Testing some predictions 

Let's see if we can develop an RNN model to predict the next hour's wind energy potential value, using a rolling window of the previous 24 hours.

Only the last month of the last year is considered, for better visualization (country: PT).

In [None]:
pt_nn = coun.filter(['Hour', 'PT'], axis = 1)

pt_nn = pt_nn.reset_index()
pt_nn['Hour'] = pd.to_datetime(pt_nn['Hour'])

start = pd.Timestamp('2015-12-01')
split = pd.Timestamp('2015-12-22')
pt_nn = pt_nn[pt_nn['Hour']>=start]

pt_nn = pt_nn.set_index('Hour')

pt_nn.plot()

In [None]:
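train = pt_nn.loc[pt_nn.index < split, ['PT']]     # strictly before the split timestamp
test = pt_nn.loc[pt_nn.index >= split, ['PT']]     # split timestamp onwards, no overlap with train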
tr_pl = train
te_pl = test
ax = tr_pl.plot()
te_pl.plot(ax=ax)

So our train set runs up to the 22nd of December and our test set from then onwards; the two are drawn in different colours on the same axes.
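
One detail worth keeping in mind when splitting by timestamp: label-based slices with `.loc` include both endpoints, so slicing up to and from the same timestamp places that row in both parts. The toy example below (synthetic values, hypothetical names) shows the difference between inclusive slices and boolean masks.

```python
import pandas as pd

# Four synthetic hourly rows around midnight of 22 December 2015.
idx = pd.date_range("2015-12-21 22:00", periods=4, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"PT": [0.1, 0.2, 0.3, 0.4]}, index=idx)
split = pd.Timestamp("2015-12-22")

# Inclusive label slices: the row at `split` appears in both parts.
train_incl = toy.loc[:split]
test_incl = toy.loc[split:]
overlap = train_incl.index.intersection(test_incl.index)

# Boolean masks give a clean partition with no shared row.
train_excl = toy.loc[toy.index < split]
test_excl = toy.loc[toy.index >= split]
print(len(overlap), len(train_excl), len(test_excl))
```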

In [None]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
train_sc = sc.fit_transform(train)
test_sc = sc.transform(test)

X_train = train_sc[:-1]
y_train = X_train[1:]
X_train = X_train[:-1]            # in order for arrays to have same length

X_test = test_sc[:-1]
y_test = X_test[1:]
X_test = X_test[:-1]
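
The scaler above is fitted on the training data only and then reused on the test data, so no information from the test period leaks into the scaling. A NumPy-only sketch of the same arithmetic, with made-up numbers and hypothetical variable names, looks like this (`StandardScaler` uses the population standard deviation, which is also NumPy's default):

```python
import numpy as np

toy_train = np.array([[1.0], [2.0], [3.0]])
toy_test = np.array([[4.0]])

# Statistics come from the training data only.
mu = toy_train.mean(axis=0)
sigma = toy_train.std(axis=0)      # population standard deviation (ddof=0)

toy_train_sc = (toy_train - mu) / sigma
toy_test_sc = (toy_test - mu) / sigma   # same mu and sigma, not refitted
print(toy_train_sc.ravel(), toy_test_sc.ravel())
```

The scaled test values are therefore not guaranteed to have zero mean or unit variance, which is expected.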

### Rolling Windows

In [None]:
train_df = pd.DataFrame(train_sc, columns = ['PT'], index = train.index )
test_df = pd.DataFrame(test_sc, columns = ['PT'], index = test.index )

In [None]:
for s in range(1, 25):
    train_df['shift {}'.format(s)] = train_df['PT'].shift(s, freq = 'H')
    test_df['shift {}'.format(s)] = test_df['PT'].shift(s, freq = 'H')

train_df.head(3)

Here we can see part of the shift the loop produced: each `shift s` column holds the value from `s` hours earlier, so the first 24 rows contain NaN values. Next we drop those rows.
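
To make the windowing concrete, here is the same idea on a tiny synthetic series with a window of 3 instead of 24. The names `frame` and `windowed` are illustrative only.

```python
import numpy as np
import pandas as pd

# Six synthetic hourly values: 0, 1, 2, 3, 4, 5.
idx = pd.date_range("2015-12-01", periods=6, freq=pd.Timedelta(hours=1))
frame = pd.DataFrame({"PT": np.arange(6, dtype=float)}, index=idx)

# Add one lagged copy of the series per step in the window.
for lag in range(1, 4):
    frame["shift {}".format(lag)] = frame["PT"].shift(lag)

# Rows without a full window of history contain NaN and are dropped.
windowed = frame.dropna()
print(windowed)
```

Each remaining row pairs a target value (`PT`) with the three values that preceded it.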

In [None]:
X_train = train_df.dropna().drop('PT', axis = 1)
y_train = train_df.dropna()[['PT']]

X_test = test_df.dropna().drop('PT', axis = 1)
y_test = test_df.dropna()[['PT']]
X_train.head(3)

In [None]:
X_train.shape

In [None]:
# to np.array
X_train = X_train.values
y_train = y_train.values

X_test = X_test.values
y_test = y_test.values

### The Predictive Model

In [None]:
# The LSTM layer expects 3D input: (samples, timesteps, features)
X_train_w = X_train.reshape(X_train.shape[0], 1, 24)
X_test_w = X_test.reshape(X_test.shape[0], 1, 24)
X_train_w.shape
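
The reshape above presents each window to the LSTM as a single timestep with 24 features. Another common layout, shown here only as a point of comparison on a dummy array, treats the window as 24 timesteps of one feature each; which layout works better for this data is not something this sketch tests.

```python
import numpy as np

# Dummy 2D design matrix: 5 samples, each a window of 24 lagged values.
X = np.arange(5 * 24, dtype=float).reshape(5, 24)

# Layout used in the notebook: (samples, 1 timestep, 24 features).
X_one_step = X.reshape(X.shape[0], 1, 24)

# Alternative layout: (samples, 24 timesteps, 1 feature).
X_sequence = X.reshape(X.shape[0], 24, 1)
print(X_one_step.shape, X_sequence.shape)
```

With the second layout the model's `input_shape` would have to be `(24, 1)` instead of `(1, 24)`.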

In [None]:
from keras.models import Sequential
from keras.layers import Dense, LSTM, Flatten
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping
import keras.backend as K

In [None]:
K.clear_session()

eps = 500
bs = 1

in_sh = (1, 24) 
hidden_1= 12
hidden_2= 12
outputs = 1

model = Sequential()
model.add(LSTM(hidden_1, input_shape = in_sh,))
model.add(Dense(hidden_2, activation ='relu'))
model.add(Dense(outputs))
model.compile(optimizer='adam', loss='mean_squared_error',)
model.summary()

In [None]:
early_stop = EarlyStopping(monitor = 'loss', patience = 1, verbose = 1)
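
With `patience = 1`, training stops as soon as the monitored loss fails to improve for one epoch. The function below is a simplified plain-Python sketch of that rule, not the Keras implementation; `stopping_epoch` and the loss values are hypothetical.

```python
def stopping_epoch(losses, patience=1, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            # The loss improved: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            # No improvement: count the epoch and stop once patience runs out.
            wait += 1
            if wait >= patience:
                return epoch
    return None

losses = [1.0, 0.8, 0.7, 0.75, 0.6]
print(stopping_epoch(losses, patience=1), stopping_epoch(losses, patience=2))
```

A larger `patience` tolerates brief plateaus at the cost of longer training.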

In [None]:
model.fit(X_train_w, y_train, epochs = eps, batch_size = bs, verbose = 1 , callbacks = [early_stop])

In [None]:
y_pred = model.predict(X_test_w)

In [None]:
plt.plot(y_test)
plt.plot(y_pred)

Judging by the plot, our Recurrent Neural Network's predictions follow the test series reasonably well, although a visual check alone does not quantify the error.
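
One way to put a number on this is to compute an error metric and compare it with a naive persistence forecast, which simply repeats the previous hour's value. The sketch below uses made-up arrays and hypothetical helper names (`rmse`, `mae`); in the notebook, `y_test` and `y_pred` would take the place of the toy arrays, and the `shift 1` column (`X_test[:, 0]`) would serve as the persistence forecast.

```python
import numpy as np

def rmse(y_true, y_hat):
    """Root mean squared error between two arrays of equal size."""
    y_true, y_hat = np.ravel(y_true), np.ravel(y_hat)
    return float(np.sqrt(np.mean((y_true - y_hat) ** 2)))

def mae(y_true, y_hat):
    """Mean absolute error between two arrays of equal size."""
    y_true, y_hat = np.ravel(y_true), np.ravel(y_hat)
    return float(np.mean(np.abs(y_true - y_hat)))

# Made-up values, for illustration only.
toy_true = np.array([0.2, 0.4, 0.6, 0.8])
toy_model = np.array([0.25, 0.35, 0.65, 0.75])   # stand-in for model output
toy_previous = np.array([0.1, 0.2, 0.4, 0.6])    # previous-hour values

print("model RMSE:", rmse(toy_true, toy_model))
print("persistence RMSE:", rmse(toy_true, toy_previous))
print("model MAE:", mae(toy_true, toy_model))
```

A model is only worth keeping if it beats the persistence baseline on the same test period.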