# Wind Power Forecasting

### Context
This dataset contains readings from a single windmill. The aim is to predict the wind power the windmill can generate over the next 15 days, which calls for a long-term wind forecasting technique.

### Content
The data contains various weather, turbine, and rotor features, recorded at 10-minute intervals from January 2018 to March 2020.

### Task Details
The aim is to use multistep time series forecasting to predict power that can be generated from the windmill for the next 15 days.
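
To make the horizon concrete: with one reading every 10 minutes, a 15-day forecast covers a fixed number of steps. A minimal sketch of that arithmetic follows; the constant names are illustrative assumptions, not part of the dataset.

```python
# Illustrative constants; the names are assumptions, not taken from the dataset.
MINUTES_PER_READING = 10
FORECAST_DAYS = 15

steps_per_day = 24 * 60 // MINUTES_PER_READING   # readings in one day
horizon_steps = FORECAST_DAYS * steps_per_day    # steps a 15-day forecast must cover

print(steps_per_day, horizon_steps)
```

A multistep model therefore has to produce a few thousand consecutive values, unless the series is first resampled to a coarser interval.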

### Expected Submission
Feel free to use the data to get a feel for multivariate time series analysis. Note that wind speed is a very unpredictable variable, so be prepared to handle a very noisy time series!

### Evaluation
Clear and concise code with a model that results in a low mean absolute error.
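
As a reminder of what the metric measures, here is a small sketch of mean absolute error written with NumPy. The function name and the sample numbers are made up for illustration and do not come from the turbine data.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Made-up observed and predicted power values, purely for illustration.
example_mae = mean_absolute_error([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
print(example_mae)
```

The result is in the same units as the target, which makes it easy to compare against typical power readings.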

### Interesting Observation
A hybrid ARIMA-ANN model has been tested and has given good results for modelling a single variable. See the paper:
https://www.sciencedirect.com/science/article/abs/pii/S0925231201007020

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


In [2]:
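# The unnamed first column actually holds the timestamps
df = pd.read_csv("Turbine_Data.csv", parse_dates=['Unnamed: 0'])

# Name the timestamp column and fix the misspelled 'AmbientTemperatue' header
df.rename(columns={'Unnamed: 0':'DateTime', 'AmbientTemperatue':'AmbientTemperature'}, inplace=True)
# Format timestamps as minute-resolution strings
# (note: the index then holds plain strings rather than a DatetimeIndex)
df['DateTime'] = df['DateTime'].dt.strftime('%Y-%m-%d %H:%M')
df = df.set_index('DateTime')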

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 118224 entries, 2017-12-31 00:00 to 2020-03-30 23:50
Data columns (total 21 columns):
 #   Column                        Non-Null Count   Dtype  
---  ------                        --------------   -----  
 0   ActivePower                   94750 non-null   float64
 1   AmbientTemperature            93817 non-null   float64
 2   BearingShaftTemperature       62518 non-null   float64
 3   Blade1PitchAngle              41996 non-null   float64
 4   Blade2PitchAngle              41891 non-null   float64
 5   Blade3PitchAngle              41891 non-null   float64
 6   ControlBoxTemperature         62160 non-null   float64
 7   GearboxBearingTemperature     62540 non-null   float64
 8   GearboxOilTemperature         62438 non-null   float64
 9   GeneratorRPM                  62295 non-null   float64
 10  GeneratorWinding1Temperature  62427 non-null   float64
 11  GeneratorWinding2Temperature  62449 non-null   float64
 12  HubTemperature          

In [3]:
pd.set_option('display.max_columns', None)
df.tail()

DateTime,ActivePower,AmbientTemperature,BearingShaftTemperature,Blade1PitchAngle,Blade2PitchAngle,Blade3PitchAngle,ControlBoxTemperature,GearboxBearingTemperature,GearboxOilTemperature,GeneratorRPM,GeneratorWinding1Temperature,GeneratorWinding2Temperature,HubTemperature,MainBoxTemperature,NacellePosition,ReactivePower,RotorRPM,TurbineStatus,WTG,WindDirection,WindSpeed
2020-03-30 23:10,70.044465,27.523741,45.711129,1.515669,1.950088,1.950088,0.0,59.821165,55.193793,1029.870744,59.060367,58.148777,39.008931,36.476562,178.0,13.775785,9.234004,2.0,G01,178.0,3.533445
2020-03-30 23:20,40.833474,27.602882,45.598573,1.702809,2.136732,2.136732,0.0,59.142038,54.798545,1030.160478,58.452003,57.550367,39.006759,36.328125,178.0,8.088928,9.22937,2.0,G01,178.0,3.261231
2020-03-30 23:30,20.77779,27.560925,45.462045,1.706214,2.139664,2.139664,0.0,58.439439,54.380456,1030.137822,58.034071,57.099335,39.003815,36.131944,178.0,4.355978,9.236802,2.0,G01,178.0,3.331839
2020-03-30 23:40,62.091039,27.810472,45.343827,1.575352,2.009781,2.009781,0.0,58.205413,54.079014,1030.178178,57.795387,56.847239,39.003815,36.007805,190.0,12.018077,9.237374,2.0,G01,190.0,3.284468
2020-03-30 23:50,68.664425,27.915828,45.23161,1.499323,1.933124,1.933124,0.0,58.581716,54.080505,1029.834789,57.694813,56.74104,39.003815,35.914062,203.0,14.439669,9.235532,2.0,G01,203.0,3.475205


In [4]:
df['WTG'].value_counts()    # Constant column

G01    118224
Name: WTG, dtype: int64

In [5]:
df['ControlBoxTemperature'].value_counts()  # Constant column

0.0    62160
Name: ControlBoxTemperature, dtype: int64

In [6]:
# Constant columns carry no information, so drop them
# Note: after the drop, all remaining columns are of type float64

df.drop(columns=['WTG', 'ControlBoxTemperature'], inplace=True)
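
The two `value_counts()` checks above were done by hand. A possible way to find such columns automatically is sketched below; `constant_columns` is a hypothetical helper and the toy frame uses made-up values in place of the turbine data.

```python
import pandas as pd

def constant_columns(frame: pd.DataFrame) -> list:
    """Return the columns whose non-null values are all identical."""
    # nunique(dropna=True) ignores NaN, so a column like [0.0, NaN, 0.0] counts as constant.
    return [col for col in frame.columns if frame[col].nunique(dropna=True) <= 1]

# Toy frame standing in for the turbine data (values are made up).
toy = pd.DataFrame({
    "WTG": ["G01", "G01", "G01"],
    "ControlBoxTemperature": [0.0, None, 0.0],
    "WindSpeed": [3.5, 3.2, 3.3],
})
found = constant_columns(toy)
print(found)
```

Note that with this definition a column that is entirely NaN is also reported, which may or may not be what you want.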

In [7]:
# We have missing data
df.isnull().sum()

ActivePower                     23474
AmbientTemperature              24407
BearingShaftTemperature         55706
Blade1PitchAngle                76228
Blade2PitchAngle                76333
Blade3PitchAngle                76333
GearboxBearingTemperature       55684
GearboxOilTemperature           55786
GeneratorRPM                    55929
GeneratorWinding1Temperature    55797
GeneratorWinding2Temperature    55775
HubTemperature                  55818
MainBoxTemperature              55717
NacellePosition                 45946
ReactivePower                   23476
RotorRPM                        56097
TurbineStatus                   55316
WindDirection                   45946
WindSpeed                       23629
dtype: int64

In [8]:
# Let's use linear interpolation to impute missing values

df.interpolate(method='linear', inplace=True)
df.isnull().sum()


ActivePower                       144
AmbientTemperature                144
BearingShaftTemperature         33065
Blade1PitchAngle                70789
Blade2PitchAngle                70789
Blade3PitchAngle                70789
GearboxBearingTemperature       33065
GearboxOilTemperature           33065
GeneratorRPM                    33065
GeneratorWinding1Temperature    33065
GeneratorWinding2Temperature    33065
HubTemperature                  33065
MainBoxTemperature              33065
NacellePosition                   144
ReactivePower                     144
RotorRPM                        33065
TurbineStatus                   33073
WindDirection                     144
WindSpeed                         144
dtype: int64
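
Some nulls survive because, with its default settings, `interpolate` only fills forward from the first valid value: interior gaps are interpolated, trailing gaps repeat the last valid value, and leading gaps are left as NaN. A toy series (made-up values) shows the behaviour:

```python
import numpy as np
import pandas as pd

# Toy series: two leading gaps, one interior gap, one trailing gap.
s = pd.Series([np.nan, np.nan, 1.0, np.nan, 3.0, np.nan])

filled = s.interpolate(method="linear")
print(filled.tolist())
```

So the counts above are the lengths of each column's leading run of missing values. Keep in mind that linear interpolation across long outages produces straight-line segments, which may not resemble real turbine behaviour.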

In [9]:
df.head(145)

DateTime,ActivePower,AmbientTemperature,BearingShaftTemperature,Blade1PitchAngle,Blade2PitchAngle,Blade3PitchAngle,GearboxBearingTemperature,GearboxOilTemperature,GeneratorRPM,GeneratorWinding1Temperature,GeneratorWinding2Temperature,HubTemperature,MainBoxTemperature,NacellePosition,ReactivePower,RotorRPM,TurbineStatus,WindDirection,WindSpeed
2017-12-31 00:00,,,,,,,,,,,,,,,,,,,
2017-12-31 00:10,,,,,,,,,,,,,,,,,,,
2017-12-31 00:20,,,,,,,,,,,,,,,,,,,
2017-12-31 00:30,,,,,,,,,,,,,,,,,,,
2017-12-31 00:40,,,,,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2017-12-31 23:20,,,,,,,,,,,,,,,,,,,
2017-12-31 23:30,,,,,,,,,,,,,,,,,,,
2017-12-31 23:40,,,,,,,,,,,,,,,,,,,
2017-12-31 23:50,,,,,,,,,,,,,,,,,,,


In [10]:
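# It appears the turbine was off on the first day (2017-12-31), so let's remove it.
# Note: that day spans 144 rows (6 readings per hour x 24 hours), so slicing
# from 145 also drops the first reading of 2018-01-01.
df = df.iloc[145:, :]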
df.isnull().sum()

ActivePower                         0
AmbientTemperature                  0
BearingShaftTemperature         32920
Blade1PitchAngle                70644
Blade2PitchAngle                70644
Blade3PitchAngle                70644
GearboxBearingTemperature       32920
GearboxOilTemperature           32920
GeneratorRPM                    32920
GeneratorWinding1Temperature    32920
GeneratorWinding2Temperature    32920
HubTemperature                  32920
MainBoxTemperature              32920
NacellePosition                     0
ReactivePower                       0
RotorRPM                        32920
TurbineStatus                   32928
WindDirection                       0
WindSpeed                           0
dtype: int64
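
Slicing by a hard-coded row count is easy to get off by one. An alternative, sketched here on a toy frame with made-up values, is to trim everything before the first valid target reading using `first_valid_index()`. The sketch assumes a `DatetimeIndex`, but the same call works with any index.

```python
import numpy as np
import pandas as pd

# Toy frame on a 10-minute grid; the first three rows have no power reading.
index = pd.date_range("2017-12-31 23:30", periods=6, freq="10min")
toy = pd.DataFrame(
    {
        "ActivePower": [np.nan, np.nan, np.nan, 5.0, 6.0, 7.0],
        "RotorRPM": [np.nan, np.nan, np.nan, np.nan, 9.0, 9.1],
    },
    index=index,
)

# Keep everything from the first timestamp with a valid power reading onward.
first_valid = toy["ActivePower"].first_valid_index()
trimmed = toy.loc[first_valid:]
print(first_valid, len(trimmed))
```

Other columns (here `RotorRPM`) can still start with missing values after the trim, just as in the real data.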

In [11]:
# In rows where BearingShaftTemperature is present, every other column is present too,
# apart from the blade pitch angles (and 8 TurbineStatus values).
# I.e. the readings for these sensors are missing over the same rows,
# while the blade pitch angles are missing over a longer stretch.
# Since the interpolation above only fills after a column's first valid value,
# these remaining nulls are leading gaps.
df[df.BearingShaftTemperature.notnull()].isnull().sum()

ActivePower                         0
AmbientTemperature                  0
BearingShaftTemperature             0
Blade1PitchAngle                37724
Blade2PitchAngle                37724
Blade3PitchAngle                37724
GearboxBearingTemperature           0
GearboxOilTemperature               0
GeneratorRPM                        0
GeneratorWinding1Temperature        0
GeneratorWinding2Temperature        0
HubTemperature                      0
MainBoxTemperature                  0
NacellePosition                     0
ReactivePower                       0
RotorRPM                            0
TurbineStatus                       8
WindDirection                       0
WindSpeed                           0
dtype: int64

In [12]:
# Rows that have blade pitch readings have no missing values in any column
df[df.Blade1PitchAngle.notnull()].isnull().sum()

ActivePower                     0
AmbientTemperature              0
BearingShaftTemperature         0
Blade1PitchAngle                0
Blade2PitchAngle                0
Blade3PitchAngle                0
GearboxBearingTemperature       0
GearboxOilTemperature           0
GeneratorRPM                    0
GeneratorWinding1Temperature    0
GeneratorWinding2Temperature    0
HubTemperature                  0
MainBoxTemperature              0
NacellePosition                 0
ReactivePower                   0
RotorRPM                        0
TurbineStatus                   0
WindDirection                   0
WindSpeed                       0
dtype: int64
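
The two checks above compare columns against one reference column at a time. A more general sketch, assuming nothing beyond pandas and NumPy, groups columns that are missing in exactly the same rows; `group_by_null_pattern` is a hypothetical helper and the toy values are made up.

```python
import numpy as np
import pandas as pd

def group_by_null_pattern(frame: pd.DataFrame) -> dict:
    """Map each distinct missing-value pattern to the columns that share it."""
    groups = {}
    for col in frame.columns:
        # The raw bytes of the boolean mask serve as a hashable key for the pattern.
        key = frame[col].isna().to_numpy().tobytes()
        groups.setdefault(key, []).append(col)
    return groups

# Toy frame: two sensors missing together, pitch angle missing longer, wind speed complete.
toy = pd.DataFrame({
    "BearingShaftTemperature": [np.nan, 40.0, 41.0, 42.0],
    "GeneratorRPM": [np.nan, 1000.0, 1010.0, 1020.0],
    "Blade1PitchAngle": [np.nan, np.nan, np.nan, 1.5],
    "WindSpeed": [3.1, 3.2, 3.3, 3.4],
})
pattern_groups = list(group_by_null_pattern(toy).values())
print(pattern_groups)
```

Applied to the turbine frame, this would show at a glance which sensors share a gap pattern and which (such as `TurbineStatus`, with its 8 extra nulls) differ slightly.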