Unknown categories are ignored by the current encoding implementations. We should consider adding an option to handle them in the future, but it's not a high priority at the moment.
Opening an issue to record this for future consideration.
The current implementations:
CategoricalEncode will convert an unknown category to None
OneHotEncode will convert all encoded columns to 0 (see the example).
CategoricalEncode will convert an unknown category to None
I haven't looked into whether this is correct or not, since the step doesn't have any tests; we should definitely add one, and then we can identify the correct behavior. 😅
OneHotEncode will convert all encoded columns to 0 (see the example).
Having a separate category for unknown values (i.e. the rest of the encoded column values are all 0) could be a nice option to give the user more flexibility, but it may not matter for many model types (e.g. GBDT).
CountEncode will convert an unknown category to 0
This is intentional, as the count should be 0 for something that has not been seen.
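To make the three behaviors above concrete, here is a minimal plain-Python sketch. It is not the library's implementation: the helper names (fit_categories, categorical_encode, one_hot_encode, count_encode) and the toy data are assumptions made up for illustration, and they only mirror the behavior as described in this thread.

```python
from collections import Counter


def fit_categories(values):
    """Return the sorted distinct non-null categories seen during fitting."""
    return sorted({v for v in values if v is not None})


def categorical_encode(value, categories):
    """Map a category to its integer code; an unknown category becomes None."""
    index = {c: i for i, c in enumerate(categories)}
    return index.get(value)


def one_hot_encode(value, categories):
    """One indicator per known category; an unknown category yields all 0s."""
    return [int(value == c) for c in categories]


def count_encode(value, counts):
    """Replace a category by its training frequency; unseen categories get 0."""
    return counts.get(value, 0)


# Hypothetical training data; "z" never appears in it.
train = ["a", "b", "a", "c"]
categories = fit_categories(train)
counts = Counter(train)

print(categorical_encode("z", categories))  # None
print(one_hot_encode("z", categories))      # [0, 0, 0]
print(count_encode("z", counts))            # 0
```

A test for CategoricalEncode along these lines would pin down whether None is really the intended output for unseen values.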
To me, it seems like the immediate action item is to add a test for CategoricalEncode to make sure its functionality is correct, and (at lower priority) to make the OneHotEncode unknown category handling a bit more flexible.
@jitingxu1 closing this; if you want to make the OneHotEncode unknown category handling a bit more flexible, feel free to create a new issue, but it doesn't seem to be a priority at this time.
Unknown categories are ignored by the current encoding implementations. We should consider adding an option to handle them in the future, but it's not a high priority at the moment.
Opening an issue to record this for future consideration.
The current implementations:
CategoricalEncode will convert an unknown category to None
OneHotEncode will convert all encoded columns to 0 (see the following example).
CountEncode will convert an unknown category to 0
For example: AMZN in the 5th row is unknown, so it will be translated to all 0s.
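The AMZN case can be reproduced with a small plain-Python sketch. The original example table is not reproduced here, so every ticker other than AMZN is made up, and the one_hot helper with its unknown_column flag is hypothetical: it illustrates the "separate category for unknown" option discussed in this thread and is not an existing OneHotEncode parameter.

```python
def one_hot(values, categories, unknown_column=False):
    """One-hot encode values against categories fixed at fit time.

    With unknown_column=True (a hypothetical option), an extra trailing
    indicator is set to 1 for values that were not seen during fitting.
    """
    known = set(categories)
    rows = []
    for v in values:
        row = [int(v == c) for c in categories]
        if unknown_column:
            row.append(int(v not in known))
        rows.append(row)
    return rows


# Categories learned at fit time (hypothetical tickers).
categories = ["AAPL", "GOOG", "MSFT"]
# Data to transform: AMZN in the 5th row was never seen at fit time.
tickers = ["AAPL", "GOOG", "MSFT", "AAPL", "AMZN"]

current = one_hot(tickers, categories)
print(current[4])   # [0, 0, 0]

flexible = one_hot(tickers, categories, unknown_column=True)
print(flexible[4])  # [0, 0, 0, 1]
```

With the extra column, an unseen value stays distinguishable from a row that simply matches none of the indicators, at the cost of one more feature.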