## Lab: RNN with Forecasting

### Wind Turbine Power Prediction with RNN

This lab uses a Recurrent Neural Network to predict wind turbine power output.
* [Data link](https://www.kaggle.com/code/ahmedfathygwely/wind-turbine-dataset-machine-learning-rnn-times/input)

### Step 0: Load libraries & Data

In [1]:
# Step 0: Load libraries & Data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, r2_score
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
import math

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Load the data
df = pd.read_csv('T1.csv')

In [2]:
# Display basic information

print(f"Wind Turbine dataset shape: {df.shape}") 
print("\nDescriptive Statistics:")
print(df.describe())
df.head()

Wind Turbine dataset shape: (50530, 5)

Descriptive Statistics:
       LV ActivePower (kW)  Wind Speed (m/s)  Theoretical_Power_Curve (KWh)  \
count         50530.000000      50530.000000                   50530.000000   
mean           1307.684332          7.557952                    1492.175463   
std            1312.459242          4.227166                    1368.018238   
min              -2.471405          0.000000                       0.000000   
25%              50.677890          4.201395                     161.328167   
50%             825.838074          7.104594                    1063.776283   
75%            2482.507568         10.300020                    2964.972462   
max            3618.732910         25.206011                    3600.000000   

       Wind Direction (°)  
count        50530.000000  
mean           123.687559  
std             93.443736  
min              0.000000  
25%             49.315437  
50%             73.712978  
75%            201.696720  


Unnamed: 0,Date/Time,LV ActivePower (kW),Wind Speed (m/s),Theoretical_Power_Curve (KWh),Wind Direction (°)
0,01 01 2018 00:00,380.047791,5.311336,416.328908,259.994904
1,01 01 2018 00:10,453.769196,5.672167,519.917511,268.641113
2,01 01 2018 00:20,306.376587,5.216037,390.900016,272.564789
3,01 01 2018 00:30,419.645905,5.659674,516.127569,271.258087
4,01 01 2018 00:40,380.650696,5.577941,491.702972,265.674286


### Step 1: Prepare the data

In [3]:
# Step 1: Prepare the data

df['Date Time'] = pd.to_datetime(df['Date/Time'], format='%d %m %Y %H:%M')
df = df.drop('Date/Time', axis=1)
df.set_index('Date Time', inplace=True)
df.head()

Unnamed: 0_level_0,LV ActivePower (kW),Wind Speed (m/s),Theoretical_Power_Curve (KWh),Wind Direction (°)
Date Time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2018-01-01 00:00:00,380.047791,5.311336,416.328908,259.994904
2018-01-01 00:10:00,453.769196,5.672167,519.917511,268.641113
2018-01-01 00:20:00,306.376587,5.216037,390.900016,272.564789
2018-01-01 00:30:00,419.645905,5.659674,516.127569,271.258087
2018-01-01 00:40:00,380.650696,5.577941,491.702972,265.674286


In [4]:
data = df.values
scaler = MinMaxScaler(feature_range=(0, 1))
data = scaler.fit_transform(data.reshape(-1, 1))
data.shape

(202120, 1)

### Step 2: Create sequences

In [6]:
# Step 2: Create sequences
seq_size = 36

X = []
y = []

for i in range(seq_size, len(df)):
    X.append(data[i-seq_size:i, 0])
    y.append(data[i, 0])

X = np.array(X)
y = np.array(y)

X = np.reshape(X, (X.shape[0], X.shape[1], 1))
X.shape, y.shape


((50494, 36, 1), (50494,))

### Step 3: Split the data

In [7]:
# Step 3: Split the data
split = 35000

X_train = X[:split]
y_train = y[:split]

X_test = X[split:]
y_test = y[split:]

X_train.shape, y_train.shape, X_test.shape, y_test.shape

((35000, 36, 1), (35000,), (15494, 36, 1), (15494,))

### Step 4: Build the RNN Model

In [9]:
# Step 4: Build the RNN Model
# Model1 - simple, 1 LSTM layer

model1 = Sequential()
model1.add(LSTM(20, activation='tanh', input_shape=(seq_size, 1), return_sequences=True))
model1.add(Dense(1))
model1.compile(optimizer='adam', loss='mse')

model1.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm_1 (LSTM)               (None, 36, 20)            1760      
                                                                 
 dense_1 (Dense)             (None, 36, 1)             21        
                                                                 
Total params: 1,781
Trainable params: 1,781
Non-trainable params: 0
_________________________________________________________________


In [10]:
# Step 4: Build the RNN Model
# Model2 - adding an additional LSTM layer and dropout layers

model2 = Sequential()
model2.add(LSTM(20, activation='tanh', input_shape=(seq_size, 1), return_sequences=True))
model2.add(Dropout(0.5))
model2.add(LSTM(40, activation='tanh', input_shape=(seq_size, 1)))
model2.add(Dropout(0.5))
model2.add(Dense(1))
model2.compile(optimizer='adam', loss='mse')

model2.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm_2 (LSTM)               (None, 36, 20)            1760      
                                                                 
 dropout (Dropout)           (None, 36, 20)            0         
                                                                 
 lstm_3 (LSTM)               (None, 40)                9760      
                                                                 
 dropout_1 (Dropout)         (None, 40)                0         
                                                                 
 dense_2 (Dense)             (None, 1)                 41        
                                                                 
Total params: 11,561
Trainable params: 11,561
Non-trainable params: 0
_________________________________________________________________


In [11]:
# early stopping
early_stopping = EarlyStopping(
    monitor='val_loss',       # Monitor validation loss
    patience=4,               # Number of epochs with no improvement after which to stop
    min_delta=0.001,          # Minimum change to qualify as improvement
    restore_best_weights=True # Restore model weights from the epoch with the best value
)

# model checkpoint
checkpoint_filepath = './best_model.keras'
model_checkpoint = ModelCheckpoint(
    filepath=checkpoint_filepath,
    monitor='val_loss',
    save_best_only=True,      # Only save when there's improvement
    mode='min',               # The direction is 'min' for loss
    verbose=1                 # Show progress
)

callbacks = [
    early_stopping,
    model_checkpoint
]

### Step 5: Train the model

In [12]:
# Step 5: Train the model
# model1
history1 = model1.fit(
    X_train, y_train, 
    epochs=50,
    validation_split=0.2,
    callbacks=callbacks,
    verbose=1
)

Epoch 1/50
Epoch 1: val_loss improved from inf to 0.15194, saving model to .\best_model.keras
Epoch 2/50
Epoch 2: val_loss improved from 0.15194 to 0.14418, saving model to .\best_model.keras
Epoch 3/50
Epoch 3: val_loss did not improve from 0.14418
Epoch 4/50
Epoch 4: val_loss did not improve from 0.14418
Epoch 5/50
Epoch 5: val_loss did not improve from 0.14418
Epoch 6/50
Epoch 6: val_loss did not improve from 0.14418


In [13]:
# Step 5: train the model
# model2
history2 = model2.fit(
    X_train, y_train, 
    epochs=50,
    validation_split=0.2,
    callbacks=callbacks,
    verbose=1
)

Epoch 1/50
Epoch 1: val_loss improved from 0.14418 to 0.01605, saving model to .\best_model.keras
Epoch 2/50
Epoch 2: val_loss improved from 0.01605 to 0.01096, saving model to .\best_model.keras
Epoch 3/50
Epoch 3: val_loss improved from 0.01096 to 0.00764, saving model to .\best_model.keras
Epoch 4/50
Epoch 4: val_loss improved from 0.00764 to 0.00477, saving model to .\best_model.keras
Epoch 5/50
Epoch 5: val_loss improved from 0.00477 to 0.00414, saving model to .\best_model.keras
Epoch 6/50
Epoch 6: val_loss improved from 0.00414 to 0.00348, saving model to .\best_model.keras
Epoch 7/50
Epoch 7: val_loss did not improve from 0.00348
Epoch 8/50
Epoch 8: val_loss did not improve from 0.00348
Epoch 9/50
Epoch 9: val_loss did not improve from 0.00348
Epoch 10/50
Epoch 10: val_loss improved from 0.00348 to 0.00305, saving model to .\best_model.keras


### Step 6: Evaluate on the test data and visualize the results

In [14]:
# Step 6: Evaluate and visualize
y_pred1 = model1.predict(X_test)
y_pred2 = model2.predict(X_test)
y_pred1.shape



(15494, 36, 1)

In [15]:
y_test_inv = scaler.inverse_transform([y_test])
y_pred1_inv = scaler.inverse_transform(y_pred1)
y_pred2_inv = scaler.inverse_transform(y_pred2)


ValueError: Found array with dim 3. None expected <= 2.

In [None]:
plt.figure(figsize=(12,6))
plt.plot(y_test_inv.flatten(), marker='.', label="Actual")
plt.plot(y_pred1_inv.flatten(), 'r', marker='.', label="Predicted")
plt.legend()
plt.title('Model 1 Predictions vs. Test Set')
plt.show()

In [None]:
plt.figure(figsize=(12,6))
plt.plot(y_test_inv.flatten(), marker='.', label="Actual")
plt.plot(y_pred2_inv.flatten(), 'r', marker='.', label="Predicted")
plt.legend()
plt.title('Model 2 Predictions vs. Test Set')
plt.show()