ENH: support DataFrames in OneHot/OrdinalEncoder without converting to array #12147
Left-over to do from #9151 (comment)
The idea is to support DataFrames without converting them to a contiguous array. This conversion is not needed, as the transformer encodes the input column by column anyway, so it would be rather easy to preserve the datatypes per column.
This would avoid converting a potentially mixed-dtype DataFrame (e.g. ints and object strings) to a full object array.
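To illustrate the point, here is a minimal sketch (not the scikit-learn implementation) of the difference between converting a mixed-dtype DataFrame to a single array and reading it column by column. The column names and values are made up for the example, and `np.unique` stands in for whatever the encoder does to find categories.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-dtype frame: one integer column, one string column.
df = pd.DataFrame({"age": [25, 32, 25], "city": ["Paris", "Oslo", "Paris"]})

# Converting the whole frame to one array forces a common dtype (object here).
as_array = np.asarray(df)

# Reading column by column keeps each column's own dtype.
categories = {}
for name in df.columns:
    column = df[name].to_numpy()
    categories[name] = np.unique(column)  # sorted unique values per column

print(as_array.dtype)
print({name: cats.dtype for name, cats in categories.items()})
```

With the column-wise loop, the integer column's categories stay integers instead of becoming Python objects.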
This can introduce a slight change in behaviour (it can change the
(Note that this does not yet necessarily mean special handling for certain pandas dtypes such as the categorical dtype, see #12086; in an initial step, we could still do a
I don't think we want to use
But feel free to work on this!
Just to add, one key challenge when returning an array is mapping feature importances back to the original column names when you've applied OneHotEncoder.
It would be a big step forward to replace the prefixes
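The mapping problem described in this comment can be sketched in plain Python, assuming we know the categories learned for each input column. Everything here is hypothetical: the `categories` dict, the `aggregate_importances` helper, and the choice to sum importances per source column are illustrations, not scikit-learn API.

```python
from collections import defaultdict

# Hypothetical categories learned per input column, in encoder output order.
categories = {"city": ["Oslo", "Paris"], "plan": ["basic", "free", "pro"]}

# One (source column, category) pair per one-hot output column.
encoded_columns = [
    (column, category)
    for column, column_categories in categories.items()
    for category in column_categories
]


def aggregate_importances(importances, encoded_columns):
    """Sum per-output-column importances back onto their source columns."""
    if len(importances) != len(encoded_columns):
        raise ValueError("expected one importance per encoded column")
    totals = defaultdict(float)
    for (column, _category), value in zip(encoded_columns, importances):
        totals[column] += value
    return dict(totals)


# Made-up importances, one per encoded output column.
importances = [0.25, 0.125, 0.5, 0.0625, 0.0625]
per_column = aggregate_importances(importances, encoded_columns)
print(per_column)
```

Summing is only one possible way to aggregate; the key step is keeping the (column, category) pairing alongside the encoded output.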