random sampling of dataset rows with improved memory utilization #2155
Thanks, Tamas, for the PR! It looks good overall, and I like the idea of excess sampling. I just have a few minor questions.
Thanks @achirkin for the review, I have addressed the issues.
/rerun tests
Thanks for the updates, Tamas, LGTM!
lgtm!
/merge
Random sampling for the IVF methods was reverted (#2144) because of its large memory utilization (#2141).
This PR reduces the memory consumption of subsampling to O(n_train), where n_train is the size of the subsampled dataset.
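To illustrate how excess sampling can keep memory at O(n_train), here is a minimal NumPy sketch. It is not the code in this PR: the function name `sample_unique_indices` and the `excess` parameter are hypothetical, and the real implementation runs on the GPU. The idea shown is to draw slightly more than n_train random row indices, deduplicate them, and repeat until enough distinct indices exist, without ever allocating an array of size n_rows.

```python
import numpy as np


def sample_unique_indices(n_rows, n_train, excess=0.1, seed=0):
    """Return n_train distinct row indices in [0, n_rows).

    Hypothetical sketch: working memory stays proportional to n_train,
    never to n_rows.
    """
    if n_train > n_rows:
        raise ValueError("n_train must not exceed n_rows")
    rng = np.random.default_rng(seed)
    # Draw a little more than needed so duplicates rarely force a retry.
    n_draw = n_train + int(excess * n_train) + 1
    unique = np.empty(0, dtype=np.int64)
    while unique.size < n_train:
        draws = rng.integers(0, n_rows, size=n_draw, dtype=np.int64)
        # Merge with earlier draws and drop duplicates.
        unique = np.unique(np.concatenate([unique, draws]))
    # np.unique returns sorted values; shuffle before trimming the excess
    # so the kept subset is not biased toward small indices.
    return rng.permutation(unique)[:n_train]


indices = sample_unique_indices(n_rows=1_000_000, n_train=1_000)
```

The returned indices can then be used to gather the selected rows into a training buffer of n_train rows. When n_train is close to n_rows the retry loop runs more often, so a production version would likely handle that case differently.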
This PR adds the following new APIs: