The dataset contains wind speed, wind direction, temperature, and other weather features. It is downloaded from the repository of the following study (see References):

In [1]:
!git clone https://github.com/HansBambel/multidim_conv.git

Cloning into 'multidim_conv'...
remote: Enumerating objects: 59, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 59 (delta 10), reused 12 (delta 3), pack-reused 34
Unpacking objects: 100% (59/59), 145.58 MiB | 12.48 MiB/s, done.


## Load Data

In [1]:
from src.load_data import read_raw_data, make_ready_data

In [2]:
train_data, test_data, scaler = read_raw_data()
train_data.shape, test_data.shape

((70128, 7, 6), (10872, 7, 6))

In [3]:
# min and max values of the data
scaler

{'Features': ['Wind speed in 0.1m/s',
  'Wind direction in degrees (360 North, 90 East, 0 No wind)',
  'Temperature in 0.1C',
  'Dew Point in 0.1C',
  'Air Pressure in 0.1hpa',
  'Rain amount in 0.1mm'],
 'feature_min_train': array([ 0.000e+00,  0.000e+00, -1.950e+02, -2.170e+02,  9.681e+03,
        -1.000e+00]),
 'feature_max_train': array([  240.,   360.,   376.,   230., 10462.,   481.])}
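
The `scaler` dictionary stores per-feature minima and maxima from the training set, which suggests min-max normalization. The sketch below shows that scaling and its inverse under this assumption; the helper names `minmax_scale` and `minmax_unscale` and the sample observation are hypothetical and are not taken from `src.load_data`.

```python
import numpy as np

# Values copied from the `scaler` output above (wind speed is the first feature).
feature_min_train = np.array([0.0, 0.0, -195.0, -217.0, 9681.0, -1.0])
feature_max_train = np.array([240.0, 360.0, 376.0, 230.0, 10462.0, 481.0])

def minmax_scale(x, lo, hi):
    """Map raw values to [0, 1] using training-set minima and maxima."""
    return (x - lo) / (hi - lo)

def minmax_unscale(x_scaled, lo, hi):
    """Invert minmax_scale to recover values in the original units."""
    return x_scaled * (hi - lo) + lo

# One hypothetical raw observation: wind speed 60 (6.0 m/s), 180 degrees,
# temperature 125 (12.5 C), dew point 80, pressure 10150, no rain.
raw = np.array([60.0, 180.0, 125.0, 80.0, 10150.0, 0.0])
scaled = minmax_scale(raw, feature_min_train, feature_max_train)
restored = minmax_unscale(scaled, feature_min_train, feature_max_train)

print(scaled[0])                   # 60 / 240 = 0.25
print(np.allclose(restored, raw))  # True
```

Note that the features are stored in tenths of a unit, so a wind speed value of 60 corresponds to 6.0 m/s.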

## Using LSTM with BiLinear Pooling Fusion

The following models were trained on my own CPU, so training is very slow. It should be much faster on a GPU runtime such as Kaggle or Google Colab.
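
To make the idea of bilinear pooling fusion concrete, here is a minimal NumPy sketch of the fusion step alone. It assumes the two inputs are feature vectors produced by two separate branches (for example, one LSTM over wind speed and one over temperature) and that they are fused through a flattened outer product followed by a linear layer. How `BiLinearPoolingLSTM` actually implements this is not shown in this notebook, so the function `bilinear_pool`, the random stand-in hidden states, and the weight matrix `W` are all hypothetical.

```python
import numpy as np

def bilinear_pool(h_a, h_b):
    """Fuse two batches of feature vectors via their outer product.

    h_a: (batch, d_a), h_b: (batch, d_b) -> (batch, d_a * d_b)
    """
    outer = np.einsum("bi,bj->bij", h_a, h_b)  # all pairwise products
    return outer.reshape(h_a.shape[0], -1)

rng = np.random.default_rng(0)
batch, hidden_size, output_size = 4, 16, 7

# Stand-ins for the final hidden states of two LSTM branches (hypothetical).
h_speed = rng.standard_normal((batch, hidden_size))
h_temp = rng.standard_normal((batch, hidden_size))

fused = bilinear_pool(h_speed, h_temp)  # shape (4, 256)

# Hypothetical output layer mapping the fused vector to one value per station.
W = rng.standard_normal((hidden_size ** 2, output_size)) * 0.01
preds = fused @ W                       # shape (4, 7)
print(fused.shape, preds.shape)
```

The fused vector grows with the product of the two hidden sizes, which is one reason to keep `hidden_size` small in this kind of model.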

### 1 Hour ahead

In [4]:
# build time series features and labels
xtrain, xval, ytrain, yval = make_ready_data(train_data, feature='speed', gap=1)
xtrain_temp, xval_temp, _, _ = make_ready_data(train_data, feature='tempreture', gap=1)
xtest, ytest = make_ready_data(test_data, train=False, feature='speed', gap=1)
xtest_temp, _ = make_ready_data(test_data, train=False, feature='tempreture', gap=1)
xtrain.shape, xval.shape, ytrain.shape, yval.shape, xtest.shape, ytest.shape

((60000, 10, 7),
 (10116, 10, 7),
 (60000, 7),
 (10116, 7),
 (10860, 10, 7),
 (10860, 7))
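
The shapes above suggest that each sample holds 10 past time steps for 7 locations and that the label is the value at a later step. The following sketch builds such windows from a toy array, assuming `gap` means the number of steps between the end of the input window and the target. The function `make_windows` is hypothetical, and the exact indexing convention inside `make_ready_data` may differ.

```python
import numpy as np

def make_windows(series, window=10, gap=1):
    """Build (X, y) pairs from a (time, stations) array.

    X[i] holds `window` consecutive steps starting at index i;
    y[i] is the step `gap` positions after the last step of that window.
    """
    n_samples = series.shape[0] - window - gap + 1
    X = np.stack([series[i:i + window] for i in range(n_samples)])
    first_target = window + gap - 1
    y = series[first_target:first_target + n_samples]
    return X, y

# Toy data: 100 hourly steps for 7 stations (hypothetical values).
toy = np.arange(100 * 7, dtype=float).reshape(100, 7)
X, y = make_windows(toy, window=10, gap=1)
print(X.shape, y.shape)  # (90, 10, 7) (90, 7)
```

Increasing `gap` moves the target further into the future and reduces the number of usable samples by the same amount.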

In [5]:
# build dataloader
from src.data_utils import build_dataloader
train_iter, val_iter, test_iter, device = build_dataloader(xtrain, xval, xtest, 
                                                           xtrain_temp, xval_temp, 
                                                           xtest_temp, ytrain, 
                                                           yval, ytest)

In [None]:
from src.models import BiLinearPoolingLSTM
from src.run import run_train, validate, run_test

# Model specs
input_size = output_size = 7
hidden_size = 16
num_layers = 1

# build the model
lstm_model = BiLinearPoolingLSTM(output_size, input_size, hidden_size, num_layers)
lstm_model = lstm_model.to(device)

# train the model
lstm_model = run_train(lstm_model, train_iter, val_iter, num_epochs=10)

100%|██████████| 938/938 [02:00<00:00,  7.75it/s]
100%|██████████| 159/159 [00:08<00:00, 18.50it/s]
  0%|          | 1/938 [00:00<02:37,  5.94it/s]

Epoch:  1 , Train Loss:  0.003111670906372519 , Val Loss:  0.0021834732


100%|██████████| 938/938 [01:59<00:00,  7.85it/s]
100%|██████████| 159/159 [00:07<00:00, 21.67it/s]
100%|██████████| 938/938 [02:00<00:00,  7.80it/s]
100%|██████████| 159/159 [00:07<00:00, 22.64it/s]
  0%|          | 1/938 [00:00<02:11,  7.13it/s]

Epoch:  3 , Train Loss:  0.0016815283793463571 , Val Loss:  0.0017920024


 78%|███████▊  | 736/938 [01:37<00:29,  6.80it/s]

In [None]:
y_true, y_preds = run_test(lstm_model, test_iter, scaler)

In [None]:
from src.vis import results
results(y_true, y_preds, plots=True)

The last few predictions are exaggerated, coming out either much higher or much lower than the true values. The reason for this is not yet understood.

### 5 Hours ahead

In [11]:
xtrain, xval, ytrain, yval = make_ready_data(train_data, feature='speed', gap=5)
xtrain_temp, xval_temp, _, _ = make_ready_data(train_data, feature='tempreture', gap=5)
xtest, ytest = make_ready_data(test_data, train=False, feature='speed', gap=5)
xtest_temp, _ = make_ready_data(test_data, train=False, feature='tempreture', gap=5)

train_iter, val_iter, test_iter, device = build_dataloader(xtrain, xval, xtest, 
                                                           xtrain_temp, xval_temp, 
                                                           xtest_temp, ytrain, 
                                                           yval, ytest)

lstm_model = BiLinearPoolingLSTM(output_size, input_size, hidden_size, num_layers)
lstm_model = lstm_model.to(device)
lstm_model = run_train(lstm_model, train_iter, val_iter, num_epochs=10)
y_true, y_preds = run_test(lstm_model, test_iter, scaler)

print('Test Data:')
results(y_true, y_preds, plots=False)

Epoch:  1 , Train Loss:  0.004730927003949666 , Val Loss:  0.003500427
Epoch:  3 , Train Loss:  0.0032807041660446656 , Val Loss:  0.0030898903
Epoch:  5 , Train Loss:  0.0030257366112381744 , Val Loss:  0.0028989867
Epoch:  7 , Train Loss:  0.0029208143426627635 , Val Loss:  0.0028274287
Epoch:  9 , Train Loss:  0.0028618395974266846 , Val Loss:  0.0027826112
Test Data:
RMSE:  13.85252
MAE:  9.184157
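
For reference, the two metrics reported above can be computed as in the sketch below. It assumes that errors are averaged over all samples and all stations; the functions `rmse` and `mae` and the toy arrays are hypothetical, and `results` in `src.vis` may aggregate differently.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error over all elements."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error over all elements."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Toy example: 2 samples, 2 stations (hypothetical values in 0.1 m/s).
y_true_toy = np.array([[50.0, 62.0], [40.0, 70.0]])
y_pred_toy = np.array([[53.0, 58.0], [40.0, 75.0]])

print(rmse(y_true_toy, y_pred_toy))  # sqrt((9 + 16 + 0 + 25) / 4) = sqrt(12.5)
print(mae(y_true_toy, y_pred_toy))   # (3 + 4 + 0 + 5) / 4 = 3.0
```

If `run_test` maps predictions back to the original units of 0.1 m/s, which the `scaler` argument in the 1-hour experiment suggests but does not confirm, an RMSE of 13.85 would correspond to roughly 1.39 m/s.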


### 10 Hours ahead

In [12]:
xtrain, xval, ytrain, yval = make_ready_data(train_data, feature='speed', gap=10)
xtrain_temp, xval_temp, _, _ = make_ready_data(train_data, feature='tempreture', gap=10)
xtest, ytest = make_ready_data(test_data, train=False, feature='speed', gap=10)
xtest_temp, _ = make_ready_data(test_data, train=False, feature='tempreture', gap=10)

train_iter, val_iter, test_iter, device = build_dataloader(xtrain, xval, xtest, 
                                                           xtrain_temp, xval_temp, 
                                                           xtest_temp, ytrain, 
                                                           yval, ytest)

lstm_model = BiLinearPoolingLSTM(output_size, input_size, hidden_size, num_layers)
lstm_model = lstm_model.to(device)

lstm_model = run_train(lstm_model, train_iter, val_iter, num_epochs=10)
y_true, y_preds = run_test(lstm_model, test_iter, scaler)

print('Test Data:')
results(y_true, y_preds, plots=False)

Epoch:  1 , Train Loss:  0.006010674732252717 , Val Loss:  0.0045241723
Epoch:  3 , Train Loss:  0.0048307917287587114 , Val Loss:  0.004419077
Epoch:  5 , Train Loss:  0.004663910239940878 , Val Loss:  0.0043506706
Epoch:  7 , Train Loss:  0.0045244124045829845 , Val Loss:  0.00429912
Epoch:  9 , Train Loss:  0.004421364447883348 , Val Loss:  0.0042680562
Test Data:
RMSE:  16.162317
MAE:  11.48787


### 50 Hours ahead

In [13]:
xtrain, xval, ytrain, yval = make_ready_data(train_data, feature='speed', gap=50)
xtrain_temp, xval_temp, _, _ = make_ready_data(train_data, feature='tempreture', gap=50)
xtest, ytest = make_ready_data(test_data, train=False, feature='speed', gap=50)
xtest_temp, _ = make_ready_data(test_data, train=False, feature='tempreture', gap=50)

train_iter, val_iter, test_iter, device = build_dataloader(xtrain, xval, xtest, 
                                                           xtrain_temp, xval_temp, 
                                                           xtest_temp, ytrain, 
                                                           yval, ytest)

lstm_model = BiLinearPoolingLSTM(output_size, input_size, hidden_size, num_layers)
lstm_model = lstm_model.to(device)

lstm_model = run_train(lstm_model, train_iter, val_iter, num_epochs=10)
y_true, y_preds = run_test(lstm_model, test_iter, scaler)

print('Test Data:')
results(y_true, y_preds, plots=False)

Epoch:  1 , Train Loss:  0.009099008107154784 , Val Loss:  0.008012023
Epoch:  3 , Train Loss:  0.008384591996369919 , Val Loss:  0.008037458
Epoch:  5 , Train Loss:  0.008288733531355556 , Val Loss:  0.008044602
Epoch:  7 , Train Loss:  0.008224498602062234 , Val Loss:  0.008072225
Epoch:  9 , Train Loss:  0.008182332686050288 , Val Loss:  0.008144436
Test Data:
RMSE:  22.836233
MAE:  17.541508


## References:

- Dataset from: Trebing, Kevin and Mehrkanoon, Siamak (2020). "Wind speed prediction using multidimensional convolutional neural networks."