Wrong data preprocessing leading to better results | Multimodal project #18

Open
Dudeldu opened this issue Feb 12, 2019 · 1 comment

Dudeldu commented Feb 12, 2019

Hi Alex,
I really like your tutorials and have used them as a starting point for my own projects ;) but I think
there is a major error in the preprocessing performed by the split_into_XY function in the process_data module of the multimodal project.

x_i = data_chng_train[i:i+window]
y_i = np.std(data_chng_train[i:i+window+forecast][3])

When the regression labels are generated with the code above, the training inputs contain the labels!!!
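
The overlap can be checked directly. This is only a minimal sketch: it assumes data_chng_train is a 2-D NumPy array with one row per time step, and the random array and the values of window, forecast and i are made up for illustration. Under that assumption, `[3]` on the slice picks its fourth row, which also sits inside x_i.

```python
import numpy as np

# Hypothetical stand-in for data_chng_train: 50 time steps, 5 features per step.
rng = np.random.default_rng(0)
data_chng_train = rng.normal(size=(50, 5))

# Illustrative values, not taken from the project.
window, forecast, i = 30, 5, 0

x_i = data_chng_train[i:i + window]
label_slice = data_chng_train[i:i + window + forecast]

# Indexing a 2-D array with [3] selects its 4th row, not its 4th column.
row_3 = label_slice[3]
y_i = np.std(row_3)

# The same row is row 3 of the input window, so y_i can be recomputed from x_i alone.
leaks = bool(np.array_equal(row_3, x_i[3]) and np.isclose(y_i, np.std(x_i[3])))
print(leaks)
```

If the data were a pandas DataFrame instead, `[3]` would mean something different, so the check would need to be adapted.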
In general, the idea behind this label computation isn't clear to me.
First, the code should be replaced with (that's for sure):

x_i = data_chng_train[i:i+window]
y_i = np.std(data_chng_train[i+window+forecast])

On the other hand, I don't understand why you are using the standard deviation along that specific axis.
Shouldn't it be:

x_i = data_chng_train[i:i+window]
y_i = data_chng_train[i+window+forecast][3]  # use the close price (index 3) as the label

With this change, all results obviously change substantially and get worse:
figure_1
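
For reference, the close-price variant can be sketched as a complete function. This is not the project's actual split_into_XY: the name split_into_xy, its signature, and the target_col parameter are hypothetical, and it assumes a 2-D array with the close price in column 3, as in the snippet above.

```python
import numpy as np

def split_into_xy(data, window, forecast, target_col=3):
    """Build (X, y) pairs whose label lies strictly after the input window.

    Hypothetical helper: assumes `data` has one row per time step and that
    column `target_col` holds the close price.
    """
    data = np.asarray(data)
    X, y = [], []
    # Stop early enough that the target index i + window + forecast stays in range.
    for i in range(len(data) - window - forecast):
        X.append(data[i:i + window])                       # rows i .. i+window-1
        y.append(data[i + window + forecast, target_col])  # one value from the future
    return np.array(X), np.array(y)

# Toy data: 20 time steps, 5 features, values 0..99 so indices are easy to trace.
data = np.arange(100, dtype=float).reshape(20, 5)
X, y = split_into_xy(data, window=4, forecast=2)
print(X.shape, y.shape)
```

The target index i + window + forecast follows the indexing proposed in this issue; whether the horizon should be counted from the last input row or from the row after it is a separate choice.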

Owner

Rachnog commented Feb 12, 2019

Hi @Dudeldu, you're totally right, definitely my bad. Will fix it, thanks!
