[ENH] handling of categorical features and sensible defaults #136

@fkiraly

The v4 hyperactive wrappers of GFO have a feature where they encode categorical features as consecutive integers - this kind of encoding is a desirable feature, potentially as a default.

Related issues:

  • There is also a potentially undesirable secondary effect, namely the encoding of numerical values as integers as well, which may or may not be desired by the user depending on circumstance.
  • As an alternative to consecutive encoding, one could think of one-hot encoding. Note that pure categoricals in general do not have an order.
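To make the two encodings concrete, here is a minimal sketch in plain Python. It is not the hyperactive v4 code; the search space and the helper names (`encode_positions`, `decode_position`, `one_hot`) are made up for illustration.

```python
# Hypothetical search space: parameter name -> list of candidate values.
search_space = {
    "kernel": ["linear", "rbf", "poly"],  # categorical, no natural order
    "C": [0.1, 1.0, 10.0],                # numerical
}


def encode_positions(values):
    """Consecutive integer encoding: each value is replaced by its list index."""
    return list(range(len(values)))


def decode_position(values, position):
    """Map an integer position back to the original value."""
    return values[position]


def one_hot(values, value):
    """One-hot encoding: a 0/1 vector with a single 1 at the value's index."""
    return [1 if v == value else 0 for v in values]


# Encoding every parameter as positions, categoricals and numericals alike.
encoded_space = {name: encode_positions(vals) for name, vals in search_space.items()}
```

In this sketch `C` ends up as positions 0, 1, 2, so the optimizer no longer sees the spacing of 0.1, 1.0 and 10.0. That is the secondary effect on numericals mentioned above. The one-hot variant avoids imposing an order on `kernel`, at the cost of one dimension per category.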

Some designs I can think of:

  1. the current hyperactive v4 design that does the consecutive integer encoding by default for all categoricals and numericals

  2. encoding only categoricals, leaving numericals as-is

  3. having tags for estimators on whether they can handle categoricals, e.g., capability:categorical.

     Estimators that cannot handle categoricals - such as native GFO - return an error if categoricals are passed.

     They can be wrapped in meta-estimators such as CategoricalEncoder.

  4. similar to 3, except that estimators without the capability encode automatically like hyperactive v4.
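A rough sketch of how the tag-based design could look, to have something concrete to discuss. Everything here is hypothetical: `NumericOnlyOptimizer`, the `tags` attribute, the `run` signature and the `is_categorical` heuristic are assumptions for illustration, not existing hyperactive or GFO API. The grid search is only a stand-in for a real optimizer.

```python
from itertools import product


def is_categorical(values):
    """Assumed heuristic: a parameter is categorical if any value is not a number."""
    return any(not isinstance(v, (int, float)) or isinstance(v, bool) for v in values)


class NumericOnlyOptimizer:
    """Hypothetical optimizer without categorical support."""

    tags = {"capability:categorical": False}

    def run(self, search_space, objective):
        for name, values in search_space.items():
            if is_categorical(values):
                raise ValueError(f"parameter {name!r} is categorical, not supported")
        # Placeholder search: exhaustive grid over all combinations.
        names = list(search_space)
        candidates = (dict(zip(names, c)) for c in product(*search_space.values()))
        return max(candidates, key=objective)


class CategoricalEncoder:
    """Hypothetical meta-estimator: encodes categoricals only, numericals stay as-is."""

    def __init__(self, optimizer):
        self.optimizer = optimizer
        self.tags = {"capability:categorical": True}

    def run(self, search_space, objective):
        categorical = {n: v for n, v in search_space.items() if is_categorical(v)}
        # Replace categorical value lists by consecutive integer positions.
        encoded = {
            n: list(range(len(v))) if n in categorical else v
            for n, v in search_space.items()
        }

        def decode(params):
            return {
                n: categorical[n][p] if n in categorical else p
                for n, p in params.items()
            }

        best = self.optimizer.run(encoded, lambda params: objective(decode(params)))
        return decode(best)
```

Under design 3 the user wraps explicitly, e.g. `CategoricalEncoder(NumericOnlyOptimizer())`. Under design 4 the framework would check the `capability:categorical` tag and apply the same wrapping automatically when it is `False`.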
