Wrong data preprocessing leading to better results | Multimodal project #18

Dudeldu · 2019-02-12T10:19:50Z

Hi Alex,
I really like your tutorials and have used them as a good starting point for my own projects ;) but I think
there is a major error in the preprocessing performed by the split_into_XY function in the process_data module of the multimodal project.

x_i = data_chng_train[i:i+window]
y_i = np.std(data_chng_train[i:i+window+forecast][3])

Because the code above is used to generate the regression labels, the training data contains the labels!
In general, the idea behind it isn't clear to me.
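
To make the leak concrete, here is a minimal sketch on toy data. It assumes `data_chng_train` is a 2-D NumPy array of shape (timesteps, features); the array contents and the `window`/`forecast` values are made up for illustration. Under that assumption, the trailing `[3]` selects the fourth row of the slice (row `i+3`), not the close column, so the label is computed from a row that is already inside `x_i`.

```python
import numpy as np

# Hypothetical toy data: 20 timesteps x 5 features (the column layout is an assumption).
data_chng_train = np.arange(100, dtype=float).reshape(20, 5)
window, forecast, i = 10, 1, 0

x_i = data_chng_train[i:i + window]

# Indexing the slice with [3] picks its 4th ROW, i.e. row i+3 of the data.
label_row = data_chng_train[i:i + window + forecast][3]
y_i = np.std(label_row)

# The row used for the label is already part of the input window.
leaks = np.array_equal(label_row, x_i[3])
print("label row is inside the input window:", leaks)
```

Whenever `window` is larger than 3, the label can be recomputed directly from the input, which would explain results that look better than they should.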
First, the code should definitely be replaced with:

x_i = data_chng_train[i:i+window]
y_i = np.std(data_chng_train[i+window+forecast])

On the other hand, I don't understand why you are using the standard deviation along that axis at all.
Shouldn't it be:

x_i = data_chng_train[i:i+window]
y_i = data_chng_train[i+window+forecast][3]  # use the close price [3] as the label
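
Put together, the proposed labelling could look like the sketch below. This is not the project's actual `split_into_XY`; the function name `split_into_xy_sketch`, its parameters, and the assumption that column 3 holds the close price in a 2-D (timesteps, features) array are all illustrative.

```python
import numpy as np

def split_into_xy_sketch(data, window=30, forecast=1, target_col=3):
    """Hypothetical helper: pair each input window with a label taken after it."""
    X, y = [], []
    # Stop early enough that index i + window + forecast stays inside the array.
    last_start = len(data) - window - forecast
    for i in range(last_start):
        X.append(data[i:i + window])                       # rows i .. i+window-1
        y.append(data[i + window + forecast][target_col])  # a single later row
    return np.array(X), np.array(y)
```

Each label index `i + window + forecast` is strictly greater than the last input index `i + window - 1`, so no label row appears in its own input window.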

Then, obviously, all results change substantially and get worse.


Rachnog · 2019-02-12T10:35:15Z

Hi @Dudeldu, you're totally right, definitely my bad. Will fix it, thanks!
