# Preprocess Datasets and Extract Features: linearinterp_avgmodels
> Feature engineering notebook

Dataset columns (same convention as in lab1):

| Col1 | Col2 | Col3 | Col4 | $\dots$ |
|------|------|------|------|---------|
| $Y$  |$Y_0$ | $X_1$| $X_2$| $\dots$ |

- $Y$ : labels or target values, in our case $Y(T+18)$
- $Y_0$ : present value $Y(T+0)$
- $X_1$, $X_2$, $\dots$ : other features
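As a small illustration of this column convention, the sketch below slices a toy array into target, present value and remaining features. The array values and the variable names (`dataset`, `target`, `present`, `others`) are made up for the example and are not part of the notebook.

```python
import numpy as np

# Toy dataset following the convention: columns are [Y, Y_0, X_1, X_2]
dataset = np.array([
    [0.9, 0.5, 0.1, 0.2],
    [0.8, 0.6, 0.3, 0.4],
    [0.7, 0.9, 0.5, 0.6],
])

target = dataset[:, 0]    # Y   : value to predict, here Y(T+18)
present = dataset[:, 1]   # Y_0 : present value Y(T+0)
others = dataset[:, 2:]   # X_1, X_2, ... : other features
inputs = dataset[:, 1:]   # everything a model may see: Y_0 and all X_i

print(target.shape, present.shape, others.shape, inputs.shape)
```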


* **Baselines**: use only Y(t): energy as features
    * [x] Naive windows VS Diff windows (step=1): window_size \[20, 40, 80\]
* **Wind Speed** Forecast:
    * [x] Wind speed forecast X(T+18): for each location, speed = sqrt(sin^2 + cos^2) from the sin/cos wind-vector columns --> 8 additional features
    * [x] Wind speed forecast X(T+18) for 8 locations + diffWindows Y(T) \[20,40\]
* **Wind Speed Differences**:
    * [x] diffWindows Y(T) + Wind speed (T+18, T+0) + naiveWindow X(T)
    * [x] diffWindows Y(T) + Wind speed (T+18, T+0) + diffWindows X(T) \[T+0,T-1,...\] w=20, w=40
* **Wind Speed Differences w/ step**:
    * [ ] step_diffWindows Y(T) + Wind speed (T+18, T+0) + step_diffWindows X(T) w=20, 40
* Next:
    * [ ] Wind speed + **direction** (T+0 and past): sin, cos for each location --> 16 additional features
    * [ ] Encoding schemes, e.g. positional encoding of Y and X values.
    * [ ] Momentum, Force (step=1, 4, 9, 18) \[Fine tuning\]
    * [ ] Separate Forecast Models
    * [ ] Nearest neighbour interpolation dataset
    * [ ] what else?

\[ TRAIN/TEST SPLIT IS DONE AFTER PREPROCESSING FEATURES \]
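The split used throughout this notebook is chronological: the first `TRAIN_PERCENT` percent of rows go to training and the rest to testing, with no shuffling. A minimal sketch of that rule is shown here; the helper name `chronological_split` is hypothetical, and the zero-filled array only stands in for a preprocessed dataset.

```python
import numpy as np

def chronological_split(dataset, train_percent):
    """Split rows in time order: first train_percent% for training, rest for testing."""
    split_index = dataset.shape[0] * train_percent // 100
    return dataset[:split_index], dataset[split_index:]

# Stand-in for a windowed dataset with 30977 rows and 21 columns
windowed = np.zeros((30977, 21))
train, test = chronological_split(windowed, train_percent=95)
print(train.shape, test.shape)
```

Because the windows are built before splitting, rows near the boundary share some raw time steps between the training and testing sets, which is worth keeping in mind when reading test scores.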

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
#For running in JupyterHub:
import os
if os.path.basename(os.getcwd())!='P003':
    print('Not in /P003 folder, changing directory to P003')
    lib_path = os.path.expanduser(os.path.relpath('~/images/codesDIR/datathon2020/P003'))
    os.chdir(lib_path)

Not in /P003 folder, changing directory to P003


In [3]:
import numpy as np
from matplotlib import pyplot as plt
import matplotlib
plt.style.use('ggplot')

%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (12,8)
# matplotlib.rcParams['font.size']= 22 # use for presentation

In [4]:
from src.datautils import windowed_data, windowed_diff_data, windowed_diff_with_step, windowed_momentum_force
# or
# from datautils import windowed_data

## Constants

In [5]:
import os

# Train% / Test% :
TRAIN_PERCENT = 95
TEST_PERCENT = 100-TRAIN_PERCENT
print(f"> We will be using train/test: {TRAIN_PERCENT}% / {TEST_PERCENT}% split.\n")

# Path for Datasets:
# all preprocessed data will be saved in `data_path`
data_path = os.path.relpath('../../../dataDIR/'+'preprocessed_linearinterp_avgmodels')
print(f'> Preprocessed data will be saved in `{data_path}`\n')

# Lead Time :
lead_time = 18 # T+18
print(f'> Lead time is set to T+{lead_time}\n')

> We will be using train/test: 95% / 5% split.

> Preprocessed data will be saved in `../../../dataDIR/preprocessed_linearinterp_avgmodels`

> Lead time is set to T+18



## Import Normalised Dataset (not yet split)

In [6]:
# full dataset
with open('norm_linearinterp_avgmodels.npy', 'rb') as f:
    data_norm = np.load(f) # columns (all normalised): Energy, loc1_sin, loc1_cos, ...,loc8_sin, loc8_cos

In [7]:
# plt.plot(data_norm[lead_time:,0],'-',label=f'T+{lead_time}',alpha=.5)
# plt.plot(data_norm[:-lead_time,0],'-',label=f'T+0',alpha=.3)
# plt.axis([9000,10000,-.5,2])
# plt.legend()

## Using Only Energy as Features: Y(t)
* Try out window sizes
* Differences with step sizes etc.

- Revision video: [Session 3: The Prediction Pipeline](https://youtu.be/4W6-48wXXEc?t=1246)

number of samples in normalized datasets

### Naive Window as Features
- Dataset with window width `window_size`; columns:`[Y(T+lead_time), Y(T+0), ...,Y(T-window_size+1)]` 

Write the train/test datasets to:
- `train_preprocessed_naivewin{}.npy` and
- `test_preprocessed_naivewin{}.npy`

`naivewin{}` stands for naive windowed data with window width `{}`
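The implementation of `windowed_data` in `src.datautils` is not shown in this notebook. The sketch below is an assumed re-implementation that reproduces the column layout described above; the function name `naive_windows` is hypothetical.

```python
import numpy as np

def naive_windows(series, lead_time, window_size):
    """Build rows [Y(T+lead_time), Y(T+0), Y(T-1), ..., Y(T-window_size+1)].

    Assumed behaviour, not the actual src.datautils.windowed_data.
    """
    series = np.asarray(series, dtype=float).ravel()
    n_rows = len(series) - lead_time - window_size + 1
    out = np.empty((n_rows, window_size + 1))
    for i in range(n_rows):
        t = i + window_size - 1                # index of the present time T
        out[i, 0] = series[t + lead_time]      # target Y(T+lead_time)
        out[i, 1:] = series[i:t + 1][::-1]     # Y(T+0), Y(T-1), ...
    return out

demo = naive_windows(np.arange(10.0), lead_time=2, window_size=3)
print(demo.shape)
print(demo[0])
```

Under this assumption the row count is `len(series) - lead_time - window_size + 1`, which is consistent with the recorded shapes shrinking by 20 and then 40 rows as the window grows from 20 to 40 to 80.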

In [84]:
WINDOW_SIZES = [20, 40, 80] # Y(T+0) and "window_size-1" previous points, T-1,T-2,...,T-window_size+1

x_norm = data_norm[:,0] # Y(t) energy

for window_size in WINDOW_SIZES:
    # index of last elem of window
    start_time = window_size-1
    # prepare windows : 1st columns Y(T+lead_time)
    X_norm_wind = windowed_data(x_norm, lead_time=lead_time, window_size=window_size)
    print(f'\nFull windowed dataset (1st column is target Y(T+{lead_time}): {X_norm_wind.shape}',
          f"; windowsize:{window_size}")
    
    # split train/test
    split_index = X_norm_wind.shape[0]*TRAIN_PERCENT//100
    
    X_train = X_norm_wind[:split_index,:]
    X_test = X_norm_wind[split_index:,:]
    print('Training dataset:',X_train.shape)
    print('Testing dataset:',X_test.shape)
    
    # write to files
    
    # training data
    train_file_name = os.path.join(data_path,f'train_preprocessed_naivewin{window_size}.npy')
    with open(train_file_name, 'wb') as f:
        np.save(f,X_train)
    print(f'Saved : {train_file_name}')
    
    # testing data
    test_file_name = os.path.join(data_path,f'test_preprocessed_naivewin{window_size}.npy')
    with open(test_file_name, 'wb') as f:
        np.save(f,X_test)
    print(f'Saved : {test_file_name}')


Full windowed dataset (1st column is target Y(T+18): (30977, 21) ; windowsize:20
Training dataset: (29428, 21)
Testing dataset: (1549, 21)
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/train_preprocessed_naivewin20.npy
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/test_preprocessed_naivewin20.npy

Full windowed dataset (1st column is target Y(T+18): (30957, 41) ; windowsize:40
Training dataset: (29409, 41)
Testing dataset: (1548, 41)
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/train_preprocessed_naivewin40.npy
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/test_preprocessed_naivewin40.npy

Full windowed dataset (1st column is target Y(T+18): (30917, 81) ; windowsize:80
Training dataset: (29371, 81)
Testing dataset: (1546, 81)
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/train_preprocessed_naivewin80.npy
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/test_preprocessed_naivewin80.npy


### Differences as Features
- First column is `Y(T+lead_time)`
- 2nd column is `Y(T+0)`, present value
- 3rd to END are differences: `[Y(T+0)-Y(T-1), Y(T-1)-Y(T-2), Y(T-2)-Y(T-3), ...]`

Write the train/test datasets to:
- `train_preprocessed_diff{}.npy` and
- `test_preprocessed_diff{}.npy`

`diff{}` stands for difference data with window width `{}`: the differences run from `T-window_size+1` to `T+0` (`window_size-1` $\Delta T$'s)
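As with the naive windows, the source of `windowed_diff_data` is not shown here. The following is an assumed sketch that produces the layout listed above (target, present value, then consecutive differences); the name `diff_windows` is hypothetical.

```python
import numpy as np

def diff_windows(series, lead_time, window_size):
    """Build rows [Y(T+lead_time), Y(T+0), Y(T+0)-Y(T-1), Y(T-1)-Y(T-2), ...].

    Assumed behaviour, not the actual src.datautils.windowed_diff_data.
    """
    series = np.asarray(series, dtype=float).ravel()
    n_rows = len(series) - lead_time - window_size + 1
    out = np.empty((n_rows, window_size + 1))
    for i in range(n_rows):
        t = i + window_size - 1                 # index of the present time T
        window = series[i:t + 1][::-1]          # Y(T+0), Y(T-1), ...
        out[i, 0] = series[t + lead_time]       # target
        out[i, 1] = window[0]                   # present value Y(T+0)
        out[i, 2:] = window[:-1] - window[1:]   # window_size-1 differences
    return out

demo = diff_windows([0, 1, 4, 9, 16, 25, 36], lead_time=2, window_size=3)
print(demo)
```

The output has the same shape as the naive window for the same `window_size`, because one window value is replaced by the present value plus `window_size-1` differences.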

In [86]:
WINDOW_SIZES = [20, 40, 80] # Y(T+0) and "window_size-1" previous points, T-1,T-2,...,T-window_size+1

x_norm = data_norm[:,0] # Y(t) energy

for window_size in WINDOW_SIZES:
    # index of last elem of window
    start_time = window_size-1
    # prepare windows : 1st columns Y(T+lead_time)
    X_norm_wind = windowed_diff_data(x_norm, lead_time=lead_time, window_size=window_size)
    print(f'\nFull windowed dataset (1st column is target Y(T+{lead_time}): {X_norm_wind.shape}',
          f"; windowsize:{window_size}")
    
    # split train/test
    split_index = X_norm_wind.shape[0]*TRAIN_PERCENT//100
    
    X_train = X_norm_wind[:split_index,:]
    X_test = X_norm_wind[split_index:,:]
    print('Training dataset:',X_train.shape)
    print('Testing dataset:',X_test.shape)
    
    # write to files
    
    # training data
    train_file_name = os.path.join(data_path,f'train_preprocessed_diff{window_size}.npy')
    with open(train_file_name, 'wb') as f:
        np.save(f,X_train)
    print(f'Saved : {train_file_name}')
    
    # testing data
    test_file_name = os.path.join(data_path,f'test_preprocessed_diff{window_size}.npy')
    with open(test_file_name, 'wb') as f:
        np.save(f,X_test)
    print(f'Saved : {test_file_name}')


Full windowed dataset (1st column is target Y(T+18): (30977, 21) ; windowsize:20
Training dataset: (29428, 21)
Testing dataset: (1549, 21)
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/train_preprocessed_diff20.npy
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/test_preprocessed_diff20.npy

Full windowed dataset (1st column is target Y(T+18): (30957, 41) ; windowsize:40
Training dataset: (29409, 41)
Testing dataset: (1548, 41)
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/train_preprocessed_diff40.npy
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/test_preprocessed_diff40.npy

Full windowed dataset (1st column is target Y(T+18): (30917, 81) ; windowsize:80
Training dataset: (29371, 81)
Testing dataset: (1546, 81)
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/train_preprocessed_diff80.npy
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/test_preprocessed_diff80.npy


## Adding Wind Forecast Features

In [65]:
# normalised data
Yt = data_norm[:,0].reshape(-1,1)
wind_speeds = np.sqrt(data_norm[:,1::2]**2 + data_norm[:,2::2]**2) # wind speed from wind vectors

# window size
window_size = 80 # Y(T+0) and "window_size-1" previous points, T-1,T-2,...,T-window_size+1
start_time = window_size-1 # index of last elem of window

# Y(t)
# prepare windows : 1st columns Y(T+lead_time)
Y_norm_wind = windowed_data(Yt, lead_time=lead_time, window_size=window_size)
print(f'\nFull windowed dataset (1st column is target Y(T+{lead_time}): {Y_norm_wind.shape}',
      f"; windowsize:{window_size}")

Y_norm_diffwind = windowed_diff_data(Yt, lead_time=lead_time, window_size=window_size)
print(f'\nFull Diff windowed dataset (1st column is target Y(T+{lead_time}): {Y_norm_diffwind.shape}',
      f"; windowsize:{window_size}")

X_norm_windows = []
for l in range(wind_speeds.shape[1]):
    X_norm_windows.append(
        windowed_data(wind_speeds[:,l], lead_time=lead_time, window_size=window_size)
    )
print('\nX(t+18),X(T+0),X(T-1)...,X(T-window_size+1):\n',[l.shape for l in X_norm_windows])


Full windowed dataset (1st column is target Y(T+18): (30917, 81) ; windowsize:80

Full Diff windowed dataset (1st column is target Y(T+18): (30917, 81) ; windowsize:80

X(t+18),X(T+0),X(T-1)...,X(T-window_size+1):
 [(30917, 81), (30917, 81), (30917, 81), (30917, 81), (30917, 81), (30917, 81), (30917, 81), (30917, 81)]


### Wind Speed at T+18
- Wind speed forecast X(T+18): for each location, speed = sqrt(sin^2 + cos^2) from the sin/cos wind-vector columns --> 8 additional features
- columns `[Y(T+18), Y(T+0), speed1(T+18),speed2(T+18),...,speed8(T+18)]`
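The speed magnitude per location comes from the interleaved sin/cos columns of the normalised dataset. The sketch below shows the slicing on a toy array with two locations; the values are invented, and `np.hypot` is used here as an equivalent of the `np.sqrt(a**2 + b**2)` expression in the notebook.

```python
import numpy as np

# Toy rows laid out as [energy, loc1_sin, loc1_cos, loc2_sin, loc2_cos]
data = np.array([
    [0.5, 3.0, 4.0, 0.0, 2.0],
    [0.7, 6.0, 8.0, 1.0, 0.0],
])

sin_parts = data[:, 1::2]   # columns 1, 3, ... : one per location
cos_parts = data[:, 2::2]   # columns 2, 4, ... : one per location

# Magnitude of each wind vector: one speed column per location
speeds = np.hypot(sin_parts, cos_parts)
print(speeds)
```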

In [64]:
# columns [Y(T+18), Y(T+0), speed1(T+18),speed2(T+18),...,speed8(T+18)]
Y18_Y0_X18=[Y_norm_wind[:,:2]]
Y18_Y0_X18.extend([Xt[:,0].reshape(-1,1) for Xt in X_norm_windows])

Y18_Y0_X18 = np.concatenate(Y18_Y0_X18, axis=1)

# split train/test
split_index = Y18_Y0_X18.shape[0]*TRAIN_PERCENT//100

X_train = Y18_Y0_X18[:split_index,:]
X_test = Y18_Y0_X18[split_index:,:]
print('Training dataset:',X_train.shape)
print('Testing dataset:',X_test.shape)

# training data
train_file_name = os.path.join(data_path,f'train_preprocessed_naive_Y18_Y0_X18.npy')
with open(train_file_name, 'wb') as f:
    np.save(f,X_train)
print(f'Saved : {train_file_name}')

# testing data
test_file_name = os.path.join(data_path,f'test_preprocessed_naive_Y18_Y0_X18.npy')
with open(test_file_name, 'wb') as f:
    np.save(f,X_test)
print(f'Saved : {test_file_name}')

Training dataset: (29371, 10)
Testing dataset: (1546, 10)
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/train_preprocessed_naive_Y18_Y0_X18.npy
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/test_preprocessed_naive_Y18_Y0_X18.npy


### Y(t+0), diff(Y(t+0),...,Y(t-w+1)), +speed magnitudes X(t+18,t+0)
- columns `[Y(T+18), Y(T+0), Y(T+0)-Y(T-1), ..., Y(T-w+2)-Y(T-w+1), speed1(T+18), speed2(T+18), ..., speed8(T+18)]`,
where `w` is the `window_size`

#### Y(t) window_size=20

In [139]:
# Y_norm_diffwind
# X_norm_windows
Y18_Y0_dYwind20_X18=[Y_norm_diffwind[:,:21]]  #Y18,Y0, Y(t+0)-Y(t-1),..., Y(t-18)-Y(t-19)
# append [speed1(T+18),speed2(T+18),...,speed8(T+18)]
Y18_Y0_dYwind20_X18.extend([Xt[:,0].reshape(-1,1) for Xt in X_norm_windows])

Y18_Y0_dYwind20_X18=np.concatenate(Y18_Y0_dYwind20_X18, axis=1)

# split train/test
split_index = Y18_Y0_dYwind20_X18.shape[0]*TRAIN_PERCENT//100

X_train = Y18_Y0_dYwind20_X18[:split_index,:]
X_test = Y18_Y0_dYwind20_X18[split_index:,:]
print('Training dataset:',X_train.shape)
print('Testing dataset:',X_test.shape)

# training data
train_file_name = os.path.join(data_path,f'train_preprocessed_Y18_Y0_dYwind20_X18.npy')
with open(train_file_name, 'wb') as f:
    np.save(f,X_train)
print(f'Saved : {train_file_name}')

# testing data
test_file_name = os.path.join(data_path,f'test_preprocessed_Y18_Y0_dYwind20_X18.npy')
with open(test_file_name, 'wb') as f:
    np.save(f,X_test)
print(f'Saved : {test_file_name}')

Training dataset: (29371, 29)
Testing dataset: (1546, 29)
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/train_preprocessed_Y18_Y0_dYwind20_X18.npy
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/test_preprocessed_Y18_Y0_dYwind20_X18.npy


#### Y(t) window_size=40

In [141]:
# Y_norm_diffwind
# X_norm_windows
Y18_Y0_dYwind40_X18=[Y_norm_diffwind[:,:41]]  #Y18,Y0, Y(t+0)-Y(t-1),..., Y(t-38)-Y(t-39)

# append [speed1(T+18),speed2(T+18),...,speed8(T+18)]
Y18_Y0_dYwind40_X18.extend([Xt[:,0].reshape(-1,1) for Xt in X_norm_windows])

Y18_Y0_dYwind40_X18=np.concatenate(Y18_Y0_dYwind40_X18, axis=1)

# split train/test
split_index = Y18_Y0_dYwind40_X18.shape[0]*TRAIN_PERCENT//100

X_train = Y18_Y0_dYwind40_X18[:split_index,:]
X_test = Y18_Y0_dYwind40_X18[split_index:,:]
print('Training dataset:',X_train.shape)
print('Testing dataset:',X_test.shape)

# training data
train_file_name = os.path.join(data_path,f'train_preprocessed_Y18_Y0_dYwind40_X18.npy')
with open(train_file_name, 'wb') as f:
    np.save(f,X_train)
print(f'Saved : {train_file_name}')

# testing data
test_file_name = os.path.join(data_path,f'test_preprocessed_Y18_Y0_dYwind40_X18.npy')
with open(test_file_name, 'wb') as f:
    np.save(f,X_test)
print(f'Saved : {test_file_name}')

Training dataset: (29371, 49)
Testing dataset: (1546, 49)
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/train_preprocessed_Y18_Y0_dYwind40_X18.npy
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/test_preprocessed_Y18_Y0_dYwind40_X18.npy


### Wind Speed Forecasts T+0 and T+18
* diffWindows Y(T) + Wind speed (T+18, T+0) + naiveWindow X(T)
* diffWindows Y(T) + Wind speed (T+18, T+0) + diffWindows X(T) `[T+0,T-1,...]`
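Both variants are assembled by taking the first `w+1` columns of the energy window and of each location's wind-speed window, then concatenating them column-wise. The sketch below checks the resulting width with zero-filled stand-ins; the names `y_diff` and `x_windows` are hypothetical placeholders for the windowed arrays, and only 5 rows are used to keep it light.

```python
import numpy as np

n_rows, window_size, n_locations = 5, 80, 8
w = 20  # sub-window actually kept as features

# Stand-ins for the windowed arrays built with window_size=80
y_diff = np.zeros((n_rows, window_size + 1))
x_windows = [np.zeros((n_rows, window_size + 1)) for _ in range(n_locations)]

# Keep target + present value + (w-1) further columns from every block
blocks = [y_diff[:, :w + 1]] + [x[:, :w + 1] for x in x_windows]
combined = np.concatenate(blocks, axis=1)

print(combined.shape)  # width is (w + 1) * (1 + n_locations)
```

With `w = 20` and 8 locations this gives 21 + 8 * 21 = 189 columns, and with `w = 40` it gives 369. Note that each wind block also carries its own X(T+18) column in first position.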

In [11]:
# normalised data
Yt = data_norm[:,0].reshape(-1,1)
wind_speeds = np.sqrt(data_norm[:,1::2]**2 + data_norm[:,2::2]**2) # wind speed from wind vectors

# window size
window_size = 80 # Y(T+0) and "window_size-1" previous points, T-1,T-2,...,T-window_size+1
start_time = window_size-1 # index of last elem of window

# Y(t)
# prepare windows : 1st columns Y(T+lead_time)
Y_norm_diffwind = windowed_diff_data(Yt, lead_time=lead_time, window_size=window_size)
print(f'\nFull Diff windowed dataset (1st column is target Y(T+{lead_time}): {Y_norm_diffwind.shape}',
      f"; windowsize:{window_size}")

X_norm_windows = []
for l in range(wind_speeds.shape[1]):
    X_norm_windows.append(
        windowed_data(wind_speeds[:,l], lead_time=lead_time, window_size=window_size)
    )
print('\nX(t+18),X(T+0),X(T-1)...,X(T-window_size+1):\n',[l.shape for l in X_norm_windows])

X_norm_diffwindows = []
for l in range(wind_speeds.shape[1]):
    X_norm_diffwindows.append(
        windowed_diff_data(wind_speeds[:,l], lead_time=lead_time, window_size=window_size)
    )
print('\nX(t+18),X(T+0),X(T-1)...,X(T-window_size+1):\n',[l.shape for l in X_norm_diffwindows])


Full Diff windowed dataset (1st column is target Y(T+18): (30917, 81) ; windowsize:80

X(t+18),X(T+0),X(T-1)...,X(T-window_size+1):
 [(30917, 81), (30917, 81), (30917, 81), (30917, 81), (30917, 81), (30917, 81), (30917, 81), (30917, 81)]

X(t+18),X(T+0),X(T-1)...,X(T-window_size+1):
 [(30917, 81), (30917, 81), (30917, 81), (30917, 81), (30917, 81), (30917, 81), (30917, 81), (30917, 81)]


#### X(t+18), X(t+0) and NaiveWindows( X(t) )

In [10]:
#Y18,Y0, Y(t+0)-Y(t-1),..., Y(t-18)-Y(t-19)
Y18Y0dYwind20_X18X0Xwind20 = [Y_norm_diffwind[:,:21]]

# append per location: X(t+18), X(t+0), X(t-1), ..., X(t-19) (21 columns each)
Y18Y0dYwind20_X18X0Xwind20.extend([Xt[:,:21] for Xt in X_norm_windows])
Y18Y0dYwind20_X18X0Xwind20 = np.concatenate(Y18Y0dYwind20_X18X0Xwind20, axis=1)

# split train/test
split_index = Y18Y0dYwind20_X18X0Xwind20.shape[0]*TRAIN_PERCENT//100

X_train = Y18Y0dYwind20_X18X0Xwind20[:split_index,:]
X_test = Y18Y0dYwind20_X18X0Xwind20[split_index:,:]
print('Training dataset:',X_train.shape)
print('Testing dataset:',X_test.shape)

# training data
train_file_name = os.path.join(data_path,f'train_preprocessed_Y18Y0dYwind20_X18X0Xwind20.npy')
with open(train_file_name, 'wb') as f:
    np.save(f,X_train)
print(f'Saved : {train_file_name}')

# testing data
test_file_name = os.path.join(data_path,f'test_preprocessed_Y18Y0dYwind20_X18X0Xwind20.npy')
with open(test_file_name, 'wb') as f:
    np.save(f,X_test)
print(f'Saved : {test_file_name}')

Training dataset: (29371, 189)
Testing dataset: (1546, 189)
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/train_preprocessed_Y18Y0dYwind20_X18X0Xwind20.npy
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/test_preprocessed_Y18Y0dYwind20_X18X0Xwind20.npy


#### X(t+18), X(t+0) and DiffWindows( X(t) ) step=1

##### w=20

In [14]:
#Y18,Y0, Y(t+0)-Y(t-1),..., Y(t-18)-Y(t-19)
Y18Y0dYwind20_X18X0dXwind20 = [Y_norm_diffwind[:,:21]]

# append per location: X(t+18), X(t+0), X(t+0)-X(t-1), ..., X(t-18)-X(t-19) (21 columns each)
Y18Y0dYwind20_X18X0dXwind20.extend([Xt[:,:21] for Xt in X_norm_diffwindows])
Y18Y0dYwind20_X18X0dXwind20 = np.concatenate(Y18Y0dYwind20_X18X0dXwind20, axis=1)

# split train/test
split_index = Y18Y0dYwind20_X18X0dXwind20.shape[0]*TRAIN_PERCENT//100

X_train = Y18Y0dYwind20_X18X0dXwind20[:split_index,:]
X_test = Y18Y0dYwind20_X18X0dXwind20[split_index:,:]
print('Training dataset:',X_train.shape)
print('Testing dataset:',X_test.shape)

# training data
train_file_name = os.path.join(data_path,f'train_preprocessed_Y18Y0dYwind20_X18X0dXwind20.npy')
with open(train_file_name, 'wb') as f:
    np.save(f,X_train)
print(f'Saved : {train_file_name}')

# testing data
test_file_name = os.path.join(data_path,f'test_preprocessed_Y18Y0dYwind20_X18X0dXwind20.npy')
with open(test_file_name, 'wb') as f:
    np.save(f,X_test)
print(f'Saved : {test_file_name}')

Training dataset: (29371, 189)
Testing dataset: (1546, 189)
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/train_preprocessed_Y18Y0dYwind20_X18X0dXwind20.npy
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/test_preprocessed_Y18Y0dYwind20_X18X0dXwind20.npy


In [17]:
# Ystr = ['y(t+18)','y(t+0)']
# Ystr.extend([f"y(t{'+' if t>-1 else ''}{t})-y(t{t-1})" for t in range(0,-80+1,-1)])
# print(len(Ystr))
# Ystr[:41]

##### w=40

In [18]:
#Y18,Y0, Y(t+0)-Y(t-1),..., Y(t-38)-Y(t-39)
Y18Y0dYwind40_X18X0dXwind40 = [Y_norm_diffwind[:,:41]]

# append per location: X(t+18), X(t+0), X(t+0)-X(t-1), ..., X(t-38)-X(t-39) (41 columns each)
Y18Y0dYwind40_X18X0dXwind40.extend([Xt[:,:41] for Xt in X_norm_diffwindows])
Y18Y0dYwind40_X18X0dXwind40 = np.concatenate(Y18Y0dYwind40_X18X0dXwind40, axis=1)

# split train/test
split_index = Y18Y0dYwind40_X18X0dXwind40.shape[0]*TRAIN_PERCENT//100

X_train = Y18Y0dYwind40_X18X0dXwind40[:split_index,:]
X_test = Y18Y0dYwind40_X18X0dXwind40[split_index:,:]
print('Training dataset:',X_train.shape)
print('Testing dataset:',X_test.shape)

# training data
train_file_name = os.path.join(data_path,f'train_preprocessed_Y18Y0dYwind40_X18X0dXwind40.npy')
with open(train_file_name, 'wb') as f:
    np.save(f,X_train)
print(f'Saved : {train_file_name}')

# testing data
test_file_name = os.path.join(data_path,f'test_preprocessed_Y18Y0dYwind40_X18X0dXwind40.npy')
with open(test_file_name, 'wb') as f:
    np.save(f,X_test)
print(f'Saved : {test_file_name}')

Training dataset: (29371, 369)
Testing dataset: (1546, 369)
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/train_preprocessed_Y18Y0dYwind40_X18X0dXwind40.npy
Saved : ../../../dataDIR/preprocessed_linearinterp_avgmodels/test_preprocessed_Y18Y0dYwind40_X18X0dXwind40.npy


## Time Differences With Step Size
- Differences with step size `[Y(T+0)-Y(T-h),...]`, where `h` is the step size
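The behaviour of `windowed_diff_with_step` is not shown in this notebook, so the following is only one possible reading: for a window ordered from newest to oldest, take `Y(T-k) - Y(T-k-h)` for every valid `k`. The function name `step_differences` and the toy values are hypothetical.

```python
import numpy as np

def step_differences(window, h):
    """Differences with step h over a window ordered [Y(T+0), Y(T-1), ...].

    Returns Y(T-k) - Y(T-k-h) for k = 0, 1, ...; assumes h >= 1.
    """
    window = np.asarray(window, dtype=float)
    return window[:-h] - window[h:]

window = [10.0, 8.0, 7.0, 3.0, 2.0]   # Y(T+0), Y(T-1), ..., Y(T-4)
d1 = step_differences(window, h=1)
d2 = step_differences(window, h=2)
print(d1, d2)
```

With `h=1` this reduces to the consecutive differences used earlier; larger steps yield `h-1` fewer values from the same window.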

In [87]:
# X_norm_wind = windowed_diff_with_step(x_norm, lead_time=lead_time, window_size=window_size,h=lead_time)

# split_index = X_norm_wind.shape[0]*70//100
# X_train = X_norm_wind[:split_index,:]
# X_test = X_norm_wind[split_index:,:]
# print(f'Full windowed dataset: {X_norm_wind.shape}')
# print(f'Training dataset (1st column is target Y(T+{lead_time})):',X_train.shape,f"\nwindowsize:{window_size}")
# print(f'Testing dataset (1st column is target Y(T+{lead_time})):',X_test.shape,f"\nwindowsize:{window_size}")

# # Training data
# with open(f'train_preprocessed_stepdiff{window_size}.npy', 'wb') as f:
#     np.save(f,X_train)
# # Testing data
# with open(f'test_preprocessed_stepdiff{window_size}.npy', 'wb') as f:
#     np.save(f,X_test)

### Force and Momentum with Step Size
Momentum = Difference of Differences

Force = Difference of Momentum
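Read literally, these definitions are second- and third-order finite differences. A minimal sketch with step size 1 is given below, using `np.diff` on a toy series ordered from oldest to newest; how `windowed_momentum_force` handles the step `h` and the column layout is not shown in this notebook, so this is an assumption.

```python
import numpy as np

y = np.array([0.0, 1.0, 4.0, 9.0, 16.0, 25.0])  # oldest to newest

difference = np.diff(y)       # first differences
momentum = np.diff(y, n=2)    # difference of differences
force = np.diff(y, n=3)       # difference of momentum

print(difference, momentum, force)
```

For this quadratic toy series the momentum is constant and the force is zero, which makes the three quantities easy to tell apart.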

In [None]:
# X_norm_wind = windowed_momentum_force(x_norm,lead_time=lead_time,window_size=window_size,h=lead_time)

# split_index = X_norm_wind.shape[0]*70//100
# X_train = X_norm_wind[:split_index,:]
# X_test = X_norm_wind[split_index:,:]
# print(f'Full windowed dataset: {X_norm_wind.shape}')
# print(f'Training dataset (1st column is target Y(T+{lead_time})):',X_train.shape,f"\nwindowsize:{window_size}")
# print(f'Testing dataset (1st column is target Y(T+{lead_time})):',X_test.shape,f"\nwindowsize:{window_size}")

# # Training data
# with open(f'train_preprocessed_mntfrcwin{window_size}.npy', 'wb') as f:
#     np.save(f,X_train)
# # Testing data
# with open(f'test_preprocessed_mntfrcwin{window_size}.npy', 'wb') as f:
#     np.save(f,X_test)