Hi, I have a question about this section of dataprovider_pypots.py, specifically this part:
# --- 3. Split Data and Time Info ---
idx_train, idx_val, idx_test = make_split_indices(ori_data.shape[0], train_ratio, val_ratio, test_ratio)
train_set_X, train_set_time = ori_data[idx_train], time_info[idx_train]
val_set_X, val_set_time = ori_data[idx_val], time_info[idx_val]
test_set_X, test_set_time = ori_data[idx_test], time_info[idx_test]
# --- 4. Apply Sliding Window to both Features and Time ---
train_X = sliding_window(train_set_X, seq_len, stride)
val_X = sliding_window(val_set_X, seq_len, stride)
test_X = sliding_window(test_set_X, seq_len, stride)
time_info_train = sliding_window(train_set_time, seq_len, stride)
time_info_val = sliding_window(val_set_time, seq_len, stride)
time_info_test = sliding_window(test_set_time, seq_len, stride)
My understanding is that the time steps are shuffled and then randomly split into train / validation / test sets before any windowing is done.
However, wouldn't this cause the resulting sliding windows to no longer match the real sequences? In particular:
- Steps are no longer in chronological order.
- Sequences now contain gaps, since the next "step" after a training step can randomly end up in the validation or test set.
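To make the concern concrete, here is a minimal sketch. It assumes that make_split_indices returns shuffled indices, which I have not verified, and the sliding_window below is my own hypothetical stand-in rather than the repository's implementation. It windows a plain 0..T-1 time axis after a shuffled split and after a contiguous split, then counts the windows whose steps are not consecutive.

```python
import numpy as np


def sliding_window(data, seq_len, stride):
    # Hypothetical stand-in for the repo's sliding_window (assumption):
    # cut consecutive rows of `data` into windows of length `seq_len`.
    n_windows = (len(data) - seq_len) // stride + 1
    return np.stack(
        [data[i * stride : i * stride + seq_len] for i in range(n_windows)]
    )


def count_broken_windows(windows):
    # A window is "broken" if any two neighbouring time steps
    # are not exactly one step apart.
    return int(np.sum(np.any(np.diff(windows, axis=1) != 1, axis=1)))


T, seq_len, stride = 100, 5, 1
time_info = np.arange(T)  # toy time axis: 0, 1, ..., 99

# Case A: shuffled split (my assumption about make_split_indices).
rng = np.random.default_rng(0)
idx_train_shuffled = rng.permutation(T)[:70]
windows_shuffled = sliding_window(time_info[idx_train_shuffled], seq_len, stride)

# Case B: contiguous, chronological split of the same size.
idx_train_contig = np.arange(70)
windows_contig = sliding_window(time_info[idx_train_contig], seq_len, stride)

broken_shuffled = count_broken_windows(windows_shuffled)
broken_contig = count_broken_windows(windows_contig)
print("broken windows, shuffled split:  ", broken_shuffled)
print("broken windows, contiguous split:", broken_contig)
```

If the split indices are in fact contiguous blocks, the windows stay intact and the concern above does not apply.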
Can you confirm whether my understanding is correct and, if so, whether and how these concerns are addressed by the modelling?
Thanks for all your work, it is very helpful!