Categorical support preprocess_one_hot_encoding
#3487
Merged
This allows users to pass categorical matrices to the `preprocess_one_hot_encoding` binding. This is actually a transparent change, so all user code can remain the same. But whereas in Python a user would previously call `preprocess_one_hot_encoding(input=dataset)`, now that parameter `dataset` can also be a pandas `DataFrame` whose columns can have any type, instead of just a numpy `ndarray` or similar.

By default, if the `dimensions` parameter is not specified (it is no longer required), all categorical dimensions are one-hot encoded.

This addresses #3480.
There will need to be a new release of mlpack before this becomes available in PyPI or conda (or other languages' package managers).
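
To illustrate the default described above, here is a minimal pure-Python sketch of the intended semantics. It is not mlpack's implementation and does not call the binding: the helper `one_hot_encode`, its `rows` argument, the rule that a column counts as categorical when it holds strings, and the first-seen category ordering are all assumptions made for illustration.

```python
def one_hot_encode(rows, dimensions=None):
    """Hypothetical helper: one-hot encode selected columns of a row-major table.

    rows:       list of equal-length lists, one per data point.
    dimensions: column indices to encode; if None, every column that contains
                a string value is treated as categorical and encoded
                (an assumption standing in for mlpack's categorical detection).
    """
    n_cols = len(rows[0])
    if dimensions is None:
        dimensions = [j for j in range(n_cols)
                      if any(isinstance(r[j], str) for r in rows)]
    selected = set(dimensions)

    # Distinct values of each selected column, in first-seen order.
    categories = {j: list(dict.fromkeys(r[j] for r in rows)) for j in selected}

    encoded = []
    for r in rows:
        out = []
        for j, value in enumerate(r):
            if j in selected:
                # One indicator entry per category of this column.
                out.extend(1.0 if value == c else 0.0 for c in categories[j])
            else:
                # Columns that are not selected pass through as numbers.
                out.append(float(value))
        encoded.append(out)
    return encoded


data = [[1.5, "red", 3], [2.0, "blue", 4], [0.5, "red", 5]]

# No dimensions given: only the string column (index 1) is encoded.
encoded_default = one_hot_encode(data)

# Explicit dimensions: columns 1 and 2 are both encoded.
encoded_explicit = one_hot_encode(data, dimensions=[1, 2])
```

With the real binding, the equivalent of the first call would be passing a `DataFrame` with a non-numeric column and omitting `dimensions`; the column order and category order of the actual output may differ from this sketch.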