[MRG + 2] Add Drop Option to OneHotEncoder. #12908
Fixes #6488 and fixes #6053. This builds upon some of the code from #12884 (thanks @NicolasHug!), but also incorporates functionality which lets the user manually specify which category in each column they would like to be dropped, so this is a more general solution along the lines of what @amueller suggested in #6053. This is useful in some cases (such as OLS regression) where the dropped group affects the interpretation of coefficients.
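To make the idea concrete, here is a minimal pure-Python sketch of what dropping one category per column does to a one-hot encoding. It is not the code from this PR. The function name `one_hot_with_drop`, its return shape, and the choice to raise `ValueError` for an unknown category are assumptions made for illustration only.

```python
# Illustrative sketch only; `one_hot_with_drop` is a hypothetical helper,
# not scikit-learn's implementation.
_KEEP_ALL = object()  # sentinel meaning "drop nothing in this column"


def one_hot_with_drop(rows, drop=None):
    """One-hot encode `rows`, optionally dropping one category per column.

    drop=None    -> keep every category
    drop="first" -> drop the first (sorted) category of each column
    drop=[...]   -> drop the listed category of each column (one per column)
    """
    n_features = len(rows[0])
    # Categories seen in each column, in sorted order.
    categories = [sorted({row[j] for row in rows}) for j in range(n_features)]

    if drop is None:
        dropped = [_KEEP_ALL] * n_features
    elif drop == "first":
        dropped = [cats[0] for cats in categories]
    else:
        dropped = list(drop)
        if len(dropped) != n_features:
            raise ValueError("drop must name one category per column")
        for j, value in enumerate(dropped):
            # Assumption: an unseen category is treated as an error.
            if value not in categories[j]:
                raise ValueError(f"{value!r} not found in column {j}")

    # Categories that still get an output column after dropping.
    kept = [[c for c in cats if c != d] for cats, d in zip(categories, dropped)]
    encoded = [
        [1 if row[j] == c else 0 for j in range(n_features) for c in kept[j]]
        for row in rows
    ]
    return encoded, kept


rows = [["male", "US"], ["female", "EU"], ["female", "US"]]
full, _ = one_hot_with_drop(rows)
first, _ = one_hot_with_drop(rows, drop="first")
manual, manual_kept = one_hot_with_drop(rows, drop=["male", "US"])
```

With `drop=["male", "US"]`, a row of all zeros stands for the reference group ("male", "US"), which is why the choice of dropped category changes how regression coefficients are read.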
What does this implement/fix? Explain your changes.
This code implements a new parameter (drop) in the OneHotEncoder, which can take any of three values:
Any other comments?
This new feature does not work in legacy mode (this was discussed in #6053). When the input is all integers, it also requires "categories='auto'" to be specified explicitly, so as not to interfere with the ongoing change in how OneHotEncoder treats integers.
…d is not present in the training data. Added tests to confirm
Thanks, now I agree that this one is better than #12884
…ving all values of some features. Updated behavior to drop columns with the same value throughout by default