about:create_train_test_set.py #17

Closed
Apollo0801 opened this issue Jan 14, 2022 · 2 comments

Comments

@Apollo0801

In create_train_test_set.py, test_size is set to 0.2, but the resulting train:test ratio is not 8:2. In fact, the test set contains far more samples than the training set. Why is that?

@Apollo0801 Apollo0801 changed the title about: about:create_train_test_set.py Jan 14, 2022
@munhouiani
Owner

Because of undersampling.

We first split the entire set into train and test at lines 54 to 65. At that point, the train:test ratio should be around 8:2.

At line 68, if under_sampling_train is set to True, we balance the train set by undersampling. Only the train set is reduced in this step.

That is why the final train set is smaller than the test set.
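
To illustrate the effect described above, here is a minimal standard-library sketch. It is not the code from create_train_test_set.py. The function name split_then_undersample, the class labels, and the class counts are all hypothetical. It also assumes that "balancing" means cutting every class down to the size of the rarest class in the train split, which may differ from what the script actually does.

```python
import random
from collections import Counter, defaultdict


def split_then_undersample(labels, test_size=0.2, seed=0):
    """Split indices into train/test, then undersample the train part only.

    Hypothetical helper for illustration; not taken from the repository.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)

    # Step 1: plain random split, roughly (1 - test_size) : test_size.
    n_test = int(len(indices) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    # Step 2: balance the train part by cutting every class down to the
    # size of the rarest class. The test part is left untouched.
    by_class = defaultdict(list)
    for i in train_idx:
        by_class[labels[i]].append(i)
    smallest = min(len(members) for members in by_class.values())
    balanced_train_idx = []
    for members in by_class.values():
        balanced_train_idx.extend(rng.sample(members, smallest))

    return train_idx, balanced_train_idx, test_idx


# Made-up, heavily imbalanced data: 9,900 samples of class "a", 100 of class "b".
labels = ["a"] * 9900 + ["b"] * 100
train_idx, balanced_train_idx, test_idx = split_then_undersample(labels)

print(len(train_idx), len(test_idx))  # 8000 2000, i.e. 8:2 right after the split
print(Counter(labels[i] for i in balanced_train_idx))  # equal counts per class
print(len(balanced_train_idx) < len(test_idx))
```

With data this imbalanced, the balanced train set keeps only as many "a" samples as there are "b" samples in the train split, so it ends up much smaller than the untouched test set even though the initial split was 8:2.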

@Apollo0801
Author

Thank you very much for your answer.
