**Aim: The aim of this notebook is to predict wind power that could be generated from the windmill for the next 15 days**

## Approach:

This is a learning project: the goal is to use TensorFlow on a time series forecasting problem.

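Algorithms tried:
1. Baselines (repeat the last value, repeat the previous window)
2. Linear model
3. Dense network
4. CNN
5. RNN (LSTM)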

## Import statements

In [None]:
import os
import datetime

import IPython
import IPython.display
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf

mpl.rcParams['figure.figsize'] = (8, 6)
mpl.rcParams['axes.grid'] = False

# to display all columns
pd.options.display.max_columns = None
pd.options.display.max_rows = None

## Read the data

In [None]:
# read the data
data = pd.read_csv('../input/wind-power-forecasting/Turbine_Data.csv')
data.head()

We see that the data is populated at intervals of 10 minutes. 

In [None]:
data.info()

In [None]:
data.describe()

## Preprocessing

1. The column `Unnamed: 0` looks like a date column. Let's change its type to datetime.

2. Looking closely at `data.describe()`, we can see that `Blade2PitchAngle` and `Blade3PitchAngle` appear to have the same values. We will validate this with `df1['col'].equals(df2['col'])` and, if the two are equal, drop one of the columns.

In [None]:
# convert the `Unnamed: 0` column to datetime and give it a descriptive name
df_updated = data.copy()
df_updated['Unnamed: 0'] = pd.to_datetime(df_updated['Unnamed: 0'])
df_updated.rename(columns={'Unnamed: 0': 'date_column'}, inplace=True)

if df_updated['Blade2PitchAngle'].equals(df_updated['Blade3PitchAngle']):
  df_updated = df_updated.drop('Blade3PitchAngle', axis=1) 

# check if the column is dropped
assert 'Blade3PitchAngle' not in df_updated.columns

In [None]:
df_updated.head()

## Check for null values

In [None]:
# Check null values
df_updated.isnull().sum()

**To do**:
How to handle the missing data here:

1. Forward-fill the NaN values with `df.ffill()` (carry the last valid observation forward).
2. Fill the missing values with the median or mean value (or a moving average)?
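
The two options above can be compared on a toy series. The sketch below is illustrative only: the values and the 3-sample window are made up and do not come from the turbine data.

```python
import numpy as np
import pandas as pd

# Toy series with gaps (hypothetical values, not from Turbine_Data.csv)
s = pd.Series([1.0, np.nan, np.nan, 4.0, 5.0, np.nan])

# Option 1: forward fill carries the last valid observation forward
filled_ffill = s.ffill()

# Option 2a: fill every gap with the median of the observed values
filled_median = s.fillna(s.median())

# Option 2b: fill gaps with a centered moving average of nearby valid values
rolling_mean = s.rolling(window=3, min_periods=1, center=True).mean()
filled_rolling = s.fillna(rolling_mean)

print(pd.DataFrame({'raw': s, 'ffill': filled_ffill,
                    'median': filled_median, 'rolling': filled_rolling}))
```

Note that a centered window looks at future samples; for forecasting, a trailing window (`center=False`) may be the safer choice.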



In [None]:
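# forward-fill first, then back-fill any NaNs left at the start of the series
# (`fillna(method=...)` is deprecated in recent pandas versions)
df_updated = df_updated.ffill().bfill()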
df_updated.isnull().sum()

In [None]:
df = df_updated[['date_column', 'ActivePower', 'WindSpeed', 'GeneratorRPM', 'ReactivePower', 'RotorRPM', 'AmbientTemperatue',
                 'WindDirection', 'Blade1PitchAngle', 'Blade2PitchAngle', 'HubTemperature', 'MainBoxTemperature', 'GearboxBearingTemperature',
                 'GearboxOilTemperature']].copy()

In [None]:
# create a new column called weekday (0 = Monday, 6 = Sunday)
df['weekday'] = df['date_column'].dt.dayofweek
# get one hot encoding
ohe = pd.get_dummies(df['weekday'])
df = df.join(ohe)
df = df.drop('weekday', axis=1)
date = pd.to_datetime(df.pop('date_column'))
df.head()

## Date column
  * The raw `date` column (a timestamp) is not a useful model input on its own.
  * The data could have clear daily and yearly periodicity.
  * Use `sin` and `cos` to convert the time into clear "Time of day" and "Time of year" signals.
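
To see why a periodic encoding helps, here is a small standalone sketch (the helper names `time_of_day_signal` and `dist` are hypothetical). In raw seconds, 23:50 and 00:00 look far apart, but on the unit circle they are neighbours.

```python
import math

DAY = 24 * 60 * 60  # seconds per day

def time_of_day_signal(seconds):
    """Map a time in seconds to a (sin, cos) point on the unit circle."""
    angle = 2 * math.pi * (seconds % DAY) / DAY
    return math.sin(angle), math.cos(angle)

def dist(a, b):
    """Euclidean distance between two (sin, cos) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

late = time_of_day_signal(23 * 3600 + 50 * 60)  # 23:50
midnight = time_of_day_signal(0)                # 00:00
noon = time_of_day_signal(12 * 3600)            # 12:00

print(dist(late, midnight))  # small: the two times are 10 minutes apart
print(dist(noon, midnight))  # large: opposite sides of the circle
```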

In [None]:
# convert the datetime column to seconds since the epoch
# (`datetime` is already imported at the top of the notebook)
timestamp_s = date.map(datetime.datetime.timestamp)

In [None]:
# time in seconds is not a useful input by itself, so convert it into sin and cos signals
day = 24*60*60
year = (365.2425)*day

df['Day sin'] = np.sin(timestamp_s * (2 * np.pi / day))
df['Day cos'] = np.cos(timestamp_s * (2 * np.pi / day))
df['Year sin'] = np.sin(timestamp_s * (2 * np.pi / year))
df['Year cos'] = np.cos(timestamp_s * (2 * np.pi / year))

In [None]:
df.head()

In [None]:
plt.plot(np.array(df['Day sin'])[:200])
plt.plot(np.array(df['Day cos'])[:200])
plt.xlabel('Time [10 min steps]')
plt.title('Time of day signal');

## Split the data

* We will use `70%, 20%, 10%` split for the training, validation and test sets.

* Data is not randomly sampled before splitting:
  * Chopping the data into windows of consecutive samples is still possible.
  * Ensures that validation/test sets are more realistic.

In [None]:
column_indices = {name: i for i, name in enumerate(df.columns)}

n = len(df)
train_df = df[0:int(n*0.7)]
val_df = df[int(n*0.7):int(n*0.9)]
test_df = df[int(n*0.9):]

num_features = df.shape[1]

In [None]:
df.head()

## Normalize the data

* It is important to scale features before training a Neural Network
* Normalization = Subtract the mean and divide by the standard deviation of each feature.

* **thinktank**: normalization could be done using moving averages.
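
The moving-average idea noted above could look roughly like the following. This is a sketch under assumptions: the `rolling_normalize` helper, the toy signal, and the one-day window are all hypothetical. It uses a trailing window, so each value is scaled only by statistics of the current and earlier samples.

```python
import numpy as np
import pandas as pd

def rolling_normalize(series, window):
    """Scale each value by the mean/std of the trailing `window` samples."""
    roll = series.rolling(window=window, min_periods=window)
    mean = roll.mean()
    std = roll.std()
    # Avoid division by zero where the window is constant
    std = std.replace(0.0, np.nan)
    return (series - mean) / std

# Toy signal with a slow upward drift (hypothetical data)
rng = np.random.default_rng(0)
toy = pd.Series(np.linspace(0, 10, 500) + rng.normal(0, 1, 500))

# 144 samples = 1 day at 10-minute intervals
normed = rolling_normalize(toy, window=144)
print(normed.dropna().describe())
```

The first `window - 1` values come out as NaN because no full window is available yet, so they would need to be dropped or handled separately.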

In [None]:
# Normalize the data
train_mean = train_df.mean()
train_std = train_df.std()

train_df = (train_df - train_mean) / train_std
val_df = (val_df - train_mean) / train_std
test_df = (test_df - train_mean) / train_std

Visualize the distribution

In [None]:
# the code below visualizes the normalized data
df_std = (df - train_mean) / train_std
df_std = df_std.melt(var_name='Column', value_name='Normalized')
plt.figure(figsize=(12, 6))
ax = sns.violinplot(x='Column', y='Normalized', data=df_std)
_ = ax.set_xticklabels(df.keys(), rotation=90)

## Data Windowing

* Models will make predictions based on a window of consecutive samples from the data
* The main features of the input windows are:
  * The width (number of timesteps) of the input and label windows
  * The time offset between them
  * Which features are used as inputs, labels, or both.

#### Indexes and offset

In [None]:
class WindowGenerator():
  def __init__(self, input_width, label_width, shift,
               train_df=train_df, val_df=val_df, test_df=test_df,
               label_columns=None):
    # Store the raw data.
    self.train_df = train_df
    self.val_df = val_df
    self.test_df = test_df

    # Work out the label column indices.
    self.label_columns = label_columns
    if label_columns is not None:
      self.label_columns_indices = {name: i for i, name in
                                    enumerate(label_columns)}
    self.column_indices = {name: i for i, name in
                           enumerate(train_df.columns)}

    # Work out the window parameters.
    self.input_width = input_width
    self.label_width = label_width
    self.shift = shift

    self.total_window_size = input_width + shift
    self.input_slice = slice(0, input_width)
    self.input_indices = np.arange(self.total_window_size)[self.input_slice]

    self.label_start = self.total_window_size - self.label_width
    self.labels_slice = slice(self.label_start, None)
    self.label_indices = np.arange(self.total_window_size)[self.labels_slice]

  def __repr__(self):
    return '\n'.join([
        f'Total window size: {self.total_window_size}',
        f'Input indices: {self.input_indices}',
        f'Label indices: {self.label_indices}',
        f'Label column name(s): {self.label_columns}'])

* In our case, we want to predict 15 days into the future from the previous 30 days. At 10-minute intervals, one day is `24*6` samples.
* input_width = `30*24*6`
* label_width = `15*24*6`
* shift = `15*24*6`
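
As a worked example of the index arithmetic in `WindowGenerator`, the sketch below plugs in the sizes used for `w2` in the next cell (30 days of input, 15 days of labels, a 15-day shift). The `SAMPLES_PER_DAY` constant is introduced here for illustration only.

```python
SAMPLES_PER_DAY = 24 * 6  # one sample every 10 minutes

input_width = 30 * SAMPLES_PER_DAY  # 30 days of history
label_width = 15 * SAMPLES_PER_DAY  # 15 days to predict
shift = 15 * SAMPLES_PER_DAY        # labels end 15 days after the inputs end

# Same formulas as in WindowGenerator.__init__
total_window_size = input_width + shift
label_start = total_window_size - label_width

print(input_width, label_width, total_window_size, label_start)
# 4320 2160 6480 4320
```

Since `label_start` equals `input_width`, the label window begins right where the input window ends.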

In [None]:
w2 = WindowGenerator(input_width=30*24*6, label_width=15*24*6, shift=15*24*6,
                     label_columns=['ActivePower'])

w2

## Split

* Given a list of consecutive inputs, the `split_window` method will convert them to a window of inputs and a window of labels. 

In [None]:
def split_window(self, features):
  inputs = features[:, self.input_slice, :]
  labels = features[:, self.labels_slice, :]
  if self.label_columns is not None:
    labels = tf.stack(
        [labels[:, :, self.column_indices[name]] for name in self.label_columns],
        axis=-1)

  # Slicing doesn't preserve static shape information, so set the shapes
  # manually. This way the `tf.data.Datasets` are easier to inspect.
  inputs.set_shape([None, self.input_width, None])
  labels.set_shape([None, self.label_width, None])

  return inputs, labels

WindowGenerator.split_window = split_window

In [None]:
# Stack three slices, the length of the total window:
example_window = tf.stack([np.array(train_df[:w2.total_window_size]),
                           np.array(train_df[100:100+w2.total_window_size]),
                           np.array(train_df[200:200+w2.total_window_size])])


example_inputs, example_labels = w2.split_window(example_window)

print('All shapes are: (batch, time, features)')
print(f'Window shape: {example_window.shape}')
print(f'Inputs shape: {example_inputs.shape}')
print(f'labels shape: {example_labels.shape}')

* This example takes a batch of 3 windows of 6480 timesteps each (4320 input + 2160 shift), with 24 features at each timestep.
* It splits them into a batch of 4320-timestep, 24-feature inputs and a batch of 2160-timestep, 1-feature labels.
* The labels have only one feature because the `WindowGenerator` was initialized with `label_columns=['ActivePower']`.

## Plot

* A plot method that allows a simple visualization of the split window

In [None]:
w2.example = example_inputs, example_labels

In [None]:
def plot(self, model=None, plot_col='ActivePower', max_subplots=5):
  inputs, labels = self.example
  plt.figure(figsize=(70, 20))
  plot_col_index = self.column_indices[plot_col]
  max_n = min(max_subplots, len(inputs))
  for n in range(max_n):
    plt.subplot(max_n, 1, n+1)
    plt.ylabel(f'{plot_col} [normed]')
    plt.plot(self.input_indices, inputs[n, :, plot_col_index],
             label='Inputs', marker='.', zorder=-10)

    if self.label_columns:
      label_col_index = self.label_columns_indices.get(plot_col, None)
    else:
      label_col_index = plot_col_index

    if label_col_index is None:
      continue

    plt.scatter(self.label_indices, labels[n, :, label_col_index],
                edgecolors='k', label='Labels', c='#2ca02c', s=124)
    if model is not None:
      predictions = model(inputs)
      plt.scatter(self.label_indices, predictions[n, :, label_col_index],
                  marker='X', label='Predictions',
                  c='#ff0000', s=154)

    if n == 0:
      plt.legend(prop={'size': 20})

  plt.xlabel('Time [10 min]')

WindowGenerator.plot = plot

In [None]:
w2.plot()

## Create tf.Datasets 
* The `make_dataset` method below will take a timeseries DataFrame and convert it to a `tf.data.Dataset` of (input_window, label_window) pairs using the `preprocessing.timeseries_dataset_from_array` function.
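
To illustrate what "a dataset of windows" means, here is a NumPy stand-in on toy data. It is not the Keras function itself: `sliding_windows` is a hypothetical helper that only mimics the idea of cutting overlapping windows and then splitting each one into inputs and labels.

```python
import numpy as np

def sliding_windows(data, sequence_length, stride=1):
    """Return an array of shape (num_windows, sequence_length, features)."""
    starts = range(0, len(data) - sequence_length + 1, stride)
    return np.stack([data[s:s + sequence_length] for s in starts])

# Toy data: 10 timesteps, 2 features
data = np.arange(20, dtype=np.float32).reshape(10, 2)

windows = sliding_windows(data, sequence_length=4)

# Split each window: first 3 steps are inputs, the last step is the label
inputs, labels = windows[:, :3, :], windows[:, 3:, :]

print(windows.shape, inputs.shape, labels.shape)
```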

In [None]:
def make_dataset(self, data):
  data = np.array(data, dtype=np.float32)
  ds = tf.keras.preprocessing.timeseries_dataset_from_array(
      data=data,
      targets=None,
      sequence_length=self.total_window_size,
      sequence_stride=1,
      shuffle=True,
      batch_size=120,)

  ds = ds.map(self.split_window)

  return ds

WindowGenerator.make_dataset = make_dataset

The `WindowGenerator` object holds training, validation, and test data. Add properties for accessing them as `tf.data.Dataset`s using the above `make_dataset` method. Also, add a standard example batch for easy access and plotting.

In [None]:
@property
def train(self):
  return self.make_dataset(self.train_df)

@property
def val(self):
  return self.make_dataset(self.val_df)

@property
def test(self):
  return self.make_dataset(self.test_df)

@property
def example(self):
  """Get and cache an example batch of `inputs, labels` for plotting."""
  result = getattr(self, '_example', None)
  if result is None:
    # No example batch was found, so get one from the `.train` dataset
    result = next(iter(self.train))
    # And cache it for next time
    self._example = result
  return result

WindowGenerator.train = train
WindowGenerator.val = val
WindowGenerator.test = test
WindowGenerator.example = example

In [None]:
# Each element is an (inputs, label) pair
w2.train.element_spec

Iterating over a dataset yields concrete batches. 

In [None]:
for example_inputs, example_labels in w2.train.take(1):
  print(f'Inputs shape (batch, time, features): {example_inputs.shape}')
  print(f'Labels shape (batch, time, features): {example_labels.shape}')

In [None]:
wide_window = WindowGenerator(
    input_width=2*6, label_width=2*6, shift=1,
    label_columns=['ActivePower'])

wide_window

## Compile and fit

In [None]:
# package the training into a function
MAX_EPOCHS = 20

def compile_and_fit(model, window, patience=3):
  early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                    patience=patience,
                                                    mode='min')

  model.compile(loss=tf.losses.MeanAbsoluteError(),
                optimizer=tf.optimizers.Adam(learning_rate=0.01),
                metrics=[tf.metrics.MeanAbsoluteError()])

  history = model.fit(window.train, epochs=MAX_EPOCHS,
                      validation_data=window.val,
                      callbacks=[early_stopping])
  return history


### Multi step model

* We want to predict multiple steps into the future (15 days into the future)

* In multi-step prediction, the model needs to learn to predict a sequence of future values rather than a single point.

* **Two approaches**:

1. Single shot predictions where the entire timeseries is predicted at once
2. AutoRegressive model: the model makes only single step predictions and its output is fed back as input. 
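
The difference between the two approaches can be sketched in plain Python. Both predictors below are hypothetical stand-ins (a 3-value mean and a repeat-last rule), not the models trained in this notebook; the point is only the control flow.

```python
def one_step_model(history):
    """Hypothetical one-step predictor: mean of the last 3 values."""
    recent = history[-3:]
    return sum(recent) / len(recent)

def autoregressive_forecast(history, out_steps):
    """Predict one step at a time, feeding each prediction back as input."""
    history = list(history)
    predictions = []
    for _ in range(out_steps):
        next_value = one_step_model(history)
        predictions.append(next_value)
        history.append(next_value)  # the output becomes part of the next input
    return predictions

def single_shot_forecast(history, out_steps):
    """Hypothetical single-shot predictor: emit all steps in one call."""
    return [float(history[-1])] * out_steps

history = [1.0, 2.0, 3.0]
ar = autoregressive_forecast(history, 3)
ss = single_shot_forecast(history, 3)
print(ar)
print(ss)
```

In the autoregressive loop, errors in early predictions feed into later ones, which is one reason long horizons such as 15 days are hard for this approach.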

Here's a window object that generates these slices from the dataset

In [None]:
OUT_STEPS = 15*24*6
multi_window = WindowGenerator(input_width=15*24*6,
                               label_width=OUT_STEPS,
                               shift=OUT_STEPS)

multi_window.plot()
multi_window


#### Baseline model

##### Repeat last input: A simple baseline is to repeat the last input timestep for the required number of output timesteps. 

In [None]:
class MultiStepLastBaseline(tf.keras.Model):
  def call(self, inputs):
    return tf.tile(inputs[:, -1:, :], [1, OUT_STEPS, 1])

last_baseline = MultiStepLastBaseline()
last_baseline.compile(loss=tf.losses.MeanAbsoluteError(),
                      metrics=[tf.metrics.MeanAbsoluteError()])

multi_val_performance = {}
multi_performance = {}

multi_val_performance['Last'] = last_baseline.evaluate(multi_window.val)
multi_performance['Last'] = last_baseline.evaluate(multi_window.test)
multi_window.plot(last_baseline)


Repeat Baseline: Repeat previous 15 days, assuming the next 15 days will be similar.
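
To make the two baselines concrete, here is a NumPy sketch on a toy batch. The shapes are made up for illustration; the notebook itself uses `OUT_STEPS = 15*24*6`.

```python
import numpy as np

OUT_STEPS = 4  # toy horizon for this sketch only

# Toy batch: 1 window, 4 input timesteps, 2 features
inputs = np.arange(8, dtype=np.float32).reshape(1, 4, 2)

# "Last" baseline: repeat the final input timestep OUT_STEPS times
last_pred = np.tile(inputs[:, -1:, :], (1, OUT_STEPS, 1))

# "Repeat" baseline: reuse the whole input window as the forecast
# (only valid when the input width equals OUT_STEPS)
repeat_pred = inputs.copy()

print(last_pred[0])
print(repeat_pred[0])
```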

In [None]:
class RepeatBaseline(tf.keras.Model):
  def call(self, inputs):
    return inputs

repeat_baseline = RepeatBaseline()
repeat_baseline.compile(loss=tf.losses.MeanAbsoluteError(),
                        metrics=[tf.metrics.MeanAbsoluteError()])

multi_val_performance['Repeat'] = repeat_baseline.evaluate(multi_window.val)
multi_performance['Repeat'] = repeat_baseline.evaluate(multi_window.test)
multi_window.plot(repeat_baseline)


### Single shot model

* The model makes the entire sequence prediction in one step
* Model only needs to reshape the output

#### Linear model

* `multi_linear_model`:
  
  * A `tf.keras.Sequential` model (a linear stack of layers) that projects the last input time step onto all `OUT_STEPS` output steps.

In [None]:
OUT_STEPS = 15*24*6
multi_window = WindowGenerator(input_width=30*24*6,
                               label_width=OUT_STEPS,
                               shift=OUT_STEPS)

multi_window.plot()
multi_window


In [None]:
multi_linear_model = tf.keras.Sequential([
    # Take the last time-step.
    # Shape [batch, time, features] => [batch, 1, features]
    tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
    # Shape => [batch, 1, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros()),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_linear_model, multi_window)

#IPython.display.clear_output()
multi_val_performance['Linear'] = multi_linear_model.evaluate(multi_window.val)
multi_performance['Linear'] = multi_linear_model.evaluate(multi_window.test)
multi_window.plot(multi_linear_model)


This model does better than the baselines, but it is still underpowered. (Also, we haven't made use of other feature columns.)

#### Dense
* Add a `layers.Dense` layer between the input and the output

In [None]:
multi_dense_model = tf.keras.Sequential([
    # Take the last time step.
    # Shape [batch, time, features] => [batch, 1, features]
    tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
    # Shape => [batch, 1, dense_units]
    tf.keras.layers.Dense(512, activation='relu'),
    # Shape => [batch, 1, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros()),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_dense_model, multi_window)

multi_val_performance['Dense'] = multi_dense_model.evaluate(multi_window.val)
multi_performance['Dense'] = multi_dense_model.evaluate(multi_window.test)
multi_window.plot(multi_dense_model)


#### CNN

In [None]:
CONV_WIDTH = 3
multi_conv_model = tf.keras.Sequential([
    # Shape [batch, time, features] => [batch, CONV_WIDTH, features]
    tf.keras.layers.Lambda(lambda x: x[:, -CONV_WIDTH:, :]),
    # Shape => [batch, 1, conv_units]
    tf.keras.layers.Conv1D(256, activation='relu', kernel_size=(CONV_WIDTH)),
    # Shape => [batch, 1,  out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros()),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_conv_model, multi_window)

multi_val_performance['Conv'] = multi_conv_model.evaluate(multi_window.val)
multi_performance['Conv'] = multi_conv_model.evaluate(multi_window.test)
multi_window.plot(multi_conv_model)


#### RNN

In [None]:
multi_lstm_model = tf.keras.Sequential([
    # Shape [batch, time, features] => [batch, lstm_units]
    # Adding more `lstm_units` just overfits more quickly.
    tf.keras.layers.LSTM(32, return_sequences=False),
    # Shape => [batch, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros()),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_lstm_model, multi_window)

multi_val_performance['LSTM'] = multi_lstm_model.evaluate(multi_window.val)
multi_performance['LSTM'] = multi_lstm_model.evaluate(multi_window.test)
multi_window.plot(multi_lstm_model)


**Improvements/ future development**

* Add more layers to RNN/CNN

* Try Autoregressive model

* Create a custom evaluation metric:
  * to give more weight to the nearest-term predictions

* Feature engineering + apply the known datafields to future dates which are fed to the predict function. 
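
The custom-metric idea in the list above could be sketched as follows. This is one possible weighting under assumptions, not a settled design: `horizon_weighted_mae` and its exponential `decay` parameter are hypothetical, and the version here is NumPy-only rather than a Keras metric.

```python
import numpy as np

def horizon_weighted_mae(y_true, y_pred, decay=0.99):
    """MAE over (batch, time, features) with weights that decay along time."""
    steps = y_true.shape[1]
    weights = decay ** np.arange(steps)  # 1.0 for the first step, smaller later
    weights = weights / weights.sum()    # normalize so the weights sum to 1
    abs_err = np.abs(y_true - y_pred).mean(axis=(0, 2))  # per-step MAE
    return float((weights * abs_err).sum())

y_true = np.zeros((2, 3, 1))

early_miss = y_true.copy()
early_miss[:, 0, :] = 1.0   # error only on the first forecast step

late_miss = y_true.copy()
late_miss[:, -1, :] = 1.0   # error only on the last forecast step

early_score = horizon_weighted_mae(y_true, early_miss, decay=0.5)
late_score = horizon_weighted_mae(y_true, late_miss, decay=0.5)
print(early_score, late_score)
```

With this weighting, the same absolute error costs more when it occurs early in the forecast horizon than when it occurs late.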
