NicolasHug
Member

Closes #13926

Instead of binning the whole dataset before the train/validation split, we now bin the training and validation data separately.
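As a rough illustration of why the order matters, the sketch below compares bin thresholds computed before and after a split. The helper `fit_thresholds`, the toy samples, and the fixed split are all hypothetical and are not taken from the scikit-learn code base; the rank-based cut points only stand in for whatever quantile logic the real bin mapper uses.

```python
def fit_thresholds(values, n_bins):
    """Return n_bins - 1 cut points taken at evenly spaced ranks of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0, 200.0]
train, val = samples[:8], samples[8:]  # hypothetical fixed split

# Binning before the split: the cut point is influenced by the validation samples.
thresholds_all = fit_thresholds(samples, n_bins=2)

# Binning after the split: the cut point depends on the training samples only.
thresholds_train = fit_thresholds(train, n_bins=2)

print(thresholds_all, thresholds_train)  # [6.0] [5.0]
```

In this toy case the two held-out samples shift the cut point from 5.0 to 6.0, which is the kind of influence on the binning that splitting first avoids.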

Not sure if that's worth a whatsnew entry, since it's all experimental still?

Member

@ogrisel ogrisel left a comment

LGTM, thanks!

@ogrisel
Member

ogrisel commented May 23, 2019

> Not sure if that's worth a whatsnew entry, since it's all experimental still?

Maybe we can have a compound entry that summarizes all the small changes to HistGradientBoosting*.

@NicolasHug
Member Author

I added a regular whatsnew entry; we can make the compound entry when there's more to add.

@ogrisel
Member

ogrisel commented May 24, 2019

ping @adrinjalali @glemaitre

@thomasjpfan
Member

Should the _BinMapper that was fitted on the training data be used to transform the validation data?

@NicolasHug
Member Author

Yes, good point
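The idea agreed on above, fitting the bin mapper on the training data and reusing it to transform the validation data, could look roughly like the sketch below. `SimpleBinMapper` is a hypothetical stand-in written for illustration; it is not scikit-learn's private `_BinMapper`, and its rank-based thresholds and the toy values are assumptions.

```python
import bisect


class SimpleBinMapper:
    """Hypothetical stand-in for a bin mapper; not scikit-learn's _BinMapper."""

    def __init__(self, n_bins=4):
        self.n_bins = n_bins

    def fit(self, column):
        # Cut points come from evenly spaced ranks of the sorted training values.
        ordered = sorted(column)
        n = len(ordered)
        self.thresholds_ = [
            ordered[(i * n) // self.n_bins] for i in range(1, self.n_bins)
        ]
        return self

    def transform(self, column):
        # Values outside the fitted range fall into the first or last bin.
        return [bisect.bisect_right(self.thresholds_, v) for v in column]


X_train = [0.1, 0.4, 0.5, 0.9, 1.3, 1.8, 2.2, 2.7]
X_val = [-5.0, 0.45, 2.0, 10.0]

mapper = SimpleBinMapper(n_bins=4).fit(X_train)  # fit on training data only
train_binned = mapper.transform(X_train)
val_binned = mapper.transform(X_val)  # reuse the thresholds fitted on training data

print(train_binned)  # [0, 0, 1, 1, 2, 2, 3, 3]
print(val_binned)    # [0, 0, 2, 3]
```

Because the thresholds are never refitted, validation values are mapped with the same cut points the trees were trained on, and the validation samples have no effect on the binning.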

Member

@thomasjpfan thomasjpfan left a comment

LGTM

@thomasjpfan thomasjpfan changed the title from "[MRG] bin training and validation data separately in GBDTs" to "FIX Bin training and validation data separately in GBDTs" May 28, 2019
@thomasjpfan thomasjpfan merged commit 2a7194d into scikit-learn:master May 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Development

Successfully merging this pull request may close these issues.

GBDTs should bin train and validation data separately?
3 participants