issue with training/testing partition #2

Closed
rola93 opened this issue Mar 3, 2020 · 1 comment

rola93 commented Mar 3, 2020

hey! I've been following your code, great work!

I assume you run all cells in order.

The only mistake I see is that you apply the data augmentation first and only then split your data into train and test. This means an example can end up in the train set while its augmented version ends up in the test set, which may lead to overestimated performance on your test set. You need to make sure that the train and test sets are as independent as possible.

The easy fix is to first split into train and test, and then apply augmentation to each of the two sets separately.
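
To illustrate the suggested order, here is a minimal sketch, not the repo's actual code. The names `split_then_augment` and `toy_augment` and the toy string data are hypothetical and exist only for this example; the real notebook would use its own augmentation function and dataset. Each augmented item keeps the index of the original example it came from, so you can check that no original contributes to both sets.

```python
import random


def split_then_augment(examples, augment, test_fraction=0.2, seed=0):
    """Split the original examples first, then augment each partition on its own.

    Returns two lists of (source_index, variant) pairs so that leakage
    between the partitions can be checked afterwards.
    """
    rng = random.Random(seed)
    indices = list(range(len(examples)))
    rng.shuffle(indices)

    n_test = max(1, int(len(indices) * test_fraction))
    test_ids = indices[:n_test]
    train_ids = indices[n_test:]

    # Augmentation happens only after the split, so every variant stays
    # in the same partition as the original it was derived from.
    train = [(i, v) for i in train_ids for v in augment(examples[i])]
    test = [(i, v) for i in test_ids for v in augment(examples[i])]
    return train, test


def toy_augment(text):
    # Hypothetical augmentation: the original plus an upper-cased copy.
    return [text, text.upper()]


examples = [f"sample {i}" for i in range(10)]
train, test = split_then_augment(examples, toy_augment)

train_sources = {i for i, _ in train}
test_sources = {i for i, _ in test}
print(train_sources.isdisjoint(test_sources))
```

If the two source sets were not disjoint, some original example would have variants on both sides of the split, which is exactly the leak described above.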

@robinreni96
Owner

Thank you @rola93 for pointing out the error. I'll fix it and update the repo soon.
