Data Leakage in Training the CNN #10

Open
hamiGH opened this issue Jun 25, 2023 · 0 comments

hamiGH commented Jun 25, 2023

Your implementation takes an ingenious approach. While reading the code, however, I noticed a potential issue that I believe needs your attention.
There appears to be data leakage in your CNN classifier: it seems to use information from a given day to predict the outcome for that same day. Leakage of this kind inflates performance metrics during testing but leads to poor performance in real-world use.
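To illustrate why this kind of leak inflates metrics, here is a minimal sketch on synthetic data. It assumes, purely for illustration, that the label for day k is 1 when that day's close is above its open; the repository may define labels differently. The names `open_`, `close`, and `trivial_rule_accuracy` are hypothetical and do not come from the project.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, window = 400, 20

# Hypothetical data: independent random open/close prices per day.
# Assumed label definition: 1 if the day closed above its open, else 0.
open_ = rng.normal(100.0, 1.0, n_days)
close = rng.normal(100.0, 1.0, n_days)
labels = (close > open_).astype(int)


def trivial_rule_accuracy(label_offset):
    """Predict from the last row of each window, score against labels[i + label_offset]."""
    hits, total = 0, 0
    for i in range(window, n_days, window):
        last = i - 1                                  # last day inside the window
        prediction = int(close[last] > open_[last])   # uses only data from the window
        hits += int(prediction == labels[i + label_offset])
        total += 1
    return hits / total


leaky_accuracy = trivial_rule_accuracy(-1)   # target is the window's own last day
honest_accuracy = trivial_rule_accuracy(0)   # target is the first day after the window

print("target inside window:", leaky_accuracy)
print("target after window: ", honest_accuracy)
```

Under that assumed label definition, a rule with no learning at all scores perfectly when the target day sits inside the input window, because the answer is already in the features. On independent random prices the same rule has nothing to go on once the target moves to the following day.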

The leak is in the CNN training section of STOCK_Market_GAN:

# start at num_historical_days and iterate the full length of the training
# data at intervals of num_historical_days
for i in range(num_historical_days, len(df), num_historical_days):
    # split the df into arrays of length num_historical_days and append
    # to data, i.e. array of df[curr - num_days : curr] -> a batch of values
    self.data.append(data[i-num_historical_days:i])

    # appending if price went up or down in curr day of "i" we are looking
    # at
    self.labels.append(labels[i-1])

# do same for test data
data = test_df[['open','high','low','close','volume']].values

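You should replace self.labels.append(labels[i-1]) with self.labels.append(labels[i]). The slice data[i-num_historical_days:i] ends at row i-1, so labels[i-1] belongs to the last day the classifier already sees, while labels[i] belongs to the first day after the window.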
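A minimal sketch of the corrected pairing, assuming `labels` has one entry per row of the data. The helper `make_windows` and the toy arrays are hypothetical and only mirror the loop quoted above; they are not the repository's code.

```python
import numpy as np


def make_windows(features, labels, num_historical_days):
    """Pair each non-overlapping window with the label of the day right after it."""
    windows, targets = [], []
    for i in range(num_historical_days, len(features), num_historical_days):
        windows.append(features[i - num_historical_days:i])  # rows i-n .. i-1
        targets.append(labels[i])                            # row i, outside the window
    return np.array(windows), np.array(targets)


# Toy data: the single feature and the label both equal the day index,
# which makes the alignment easy to read off.
features = np.arange(10).reshape(-1, 1)
labels = np.arange(10)

X, y = make_windows(features, labels, num_historical_days=3)
print(X[:, :, 0])  # [[0 1 2] [3 4 5] [6 7 8]]
print(y)           # [3 6 9]
```

Each target index is strictly greater than every index in its window, so no window contains the day it is asked to predict. Because the loop stops before len(features), labels[i] is always in range.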
