[MRG + 2] Add Drop Option to OneHotEncoder. #12908
Fixes #6488 and fixes #6053. This builds upon some of the code from #12884 (thanks @NicolasHug!), but also incorporates functionality which lets the user manually specify which category in each column they would like to be dropped, so this is a more general solution along the lines of what @amueller suggested in #6053. This is useful in some cases (such as OLS regression) where the dropped group affects the interpretation of coefficients.
What does this implement/fix? Explain your changes.
This code implements a new parameter (drop) in the OneHotEncoder, which can take any of three values:
Any other comments?
This new feature does not work in legacy mode (as discussed in #6053). When the input is all integers, it also requires the user to set categories='auto' explicitly, so as not to interfere with the ongoing change in how OneHotEncoder treats integers.
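To make the idea of dropping one category per column concrete, here is a minimal pure-Python sketch. It is not the code in this PR: the function name `one_hot_with_drop` is hypothetical, and the three accepted forms of `drop` (None, the string "first", or a list naming one category per column) are an assumption based on the behavior described above.

```python
def one_hot_with_drop(rows, drop=None):
    """Toy one-hot encoder with a per-column drop (illustration only).

    rows: list of equal-length lists of hashable, sortable values.
    drop: None, "first", or a list with one category per column (assumed forms).
    Returns (encoded_rows, kept_categories_per_column).
    """
    n_cols = len(rows[0])
    # Sorted unique categories observed in each column.
    categories = [sorted({row[j] for row in rows}) for j in range(n_cols)]

    if drop is None:
        kept = categories
    elif drop == "first":
        # Drop the first (smallest) category of every column.
        kept = [cats[1:] for cats in categories]
    else:
        if len(drop) != n_cols:
            raise ValueError("drop must name one category per column")
        kept = []
        for cats, to_drop in zip(categories, drop):
            if to_drop not in cats:
                raise ValueError(f"{to_drop!r} is not a category in {cats!r}")
            kept.append([c for c in cats if c != to_drop])

    # One indicator per kept category; a dropped category encodes as all zeros.
    encoded = [
        [1 if row[j] == c else 0 for j in range(n_cols) for c in kept[j]]
        for row in rows
    ]
    return encoded, kept


rows = [["male", 1], ["female", 3], ["female", 2]]
first_dropped, _ = one_hot_with_drop(rows, drop="first")
manual_dropped, kept = one_hot_with_drop(rows, drop=["male", 2])
```

In the manual case the dropped category becomes the reference level for that column, which is what matters when reading OLS coefficients.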
Jan 9, 2019
Thanks, now I agree that this one is better than #12884
Discussing with @jorisvandenbossche, we think it's unnecessary complexity to support None being in the list of drops: this can be achieved, if the user needs it, with a ColumnTransformer. Supporting None in the list may also make it hard to support having None be a valid (or missing value) category in the input in the future. I hope it's not too much effort, @drewmjohnston, to pull back that feature.
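For reference, a rough sketch of the ColumnTransformer route, assuming a scikit-learn release in which OneHotEncoder accepts `drop` (the feature proposed here). The data and the transformer names `reference_coded` and `fully_coded` are made up for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy data: column 0 gets a dropped reference category, column 1 is fully encoded.
X = np.array(
    [["male", "low"], ["female", "high"], ["female", "mid"]],
    dtype=object,
)

ct = ColumnTransformer(
    [
        # Drop "female" from column 0 only.
        ("reference_coded", OneHotEncoder(drop=["female"]), [0]),
        # No drop for column 1, which is what a None entry in the list would mean.
        ("fully_coded", OneHotEncoder(), [1]),
    ],
    sparse_threshold=0,  # always return a dense array, for readability
)
encoded = ct.fit_transform(X)
```

Each encoder only sees its own columns, so "no drop for this column" never has to be spelled as None inside the drop list.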