KBinsDiscretizer's get_feature_names_out() does not work if encode != 'onehot' #22841

Closed
rosscleung opened this issue Mar 14, 2022 · 1 comment
Labels: Bug, Needs Triage

Comments

@rosscleung

Describe the bug

With encode='onehot', get_feature_names_out() returns the names as intended:

from sklearn.preprocessing import KBinsDiscretizer
import pandas as pd
kb = KBinsDiscretizer(n_bins=5, encode='onehot')
kb.fit_transform(pd.DataFrame(range(100),columns=['one']))
kb.get_feature_names_out()

output:
array(['one_0.0', 'one_1.0', 'one_2.0', 'one_3.0', 'one_4.0'],
dtype=object)

With any encode value other than 'onehot' (here 'ordinal'), the same call fails:

from sklearn.preprocessing import KBinsDiscretizer
import pandas as pd
kb = KBinsDiscretizer(n_bins=5, encode='ordinal')
kb.fit_transform(pd.DataFrame(range(100),columns=['one']))
kb.get_feature_names_out()

output:
Traceback (most recent call last):
File "", line 1, in
File "/home/smds/miniconda3/envs/ross_docker_py37/lib/python3.7/site-packages/sklearn/preprocessing/_discretization.py", line 396, in get_feature_names_out
return self._encoder.get_feature_names_out(input_features)
AttributeError: 'KBinsDiscretizer' object has no attribute '_encoder'

I tracked down the source of the bug. When encode != 'onehot', the _encoder attribute is never created during fit; it is only set when encode == 'onehot':
https://github.com/scikit-learn/scikit-learn/blob/37ac6788c/sklearn/preprocessing/_discretization.py#L240-L248

get_feature_names_out() then accesses self._encoder unconditionally, and that attribute does not exist unless the estimator was fitted with encode='onehot':
https://github.com/scikit-learn/scikit-learn/blob/37ac6788c/sklearn/preprocessing/_discretization.py#L376-L396
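
The failure pattern described above can be reproduced without scikit-learn. The sketch below uses two hypothetical classes, `ToyEncoder` and `ToyDiscretizer`, that only mimic the attribute handling; they are not the library's code, and the guarded method is just one possible way to avoid the error, not necessarily what the maintainers did.

```python
class ToyEncoder:
    """Hypothetical one-hot helper: names one output column per bin."""

    def __init__(self, n_bins):
        self.n_bins = n_bins

    def get_feature_names_out(self, input_features):
        return [f"{name}_{b}" for name in input_features for b in range(self.n_bins)]


class ToyDiscretizer:
    """Hypothetical stand-in that copies only the attribute pattern from the issue."""

    def __init__(self, n_bins=5, encode="onehot"):
        self.n_bins = n_bins
        self.encode = encode

    def fit(self, feature_names):
        self.feature_names_in_ = list(feature_names)
        # The helper attribute is created on the one-hot path only.
        if self.encode == "onehot":
            self._encoder = ToyEncoder(self.n_bins)
        return self

    def names_out_unguarded(self):
        # Assumes _encoder always exists, so it raises AttributeError
        # after fitting with encode="ordinal".
        return self._encoder.get_feature_names_out(self.feature_names_in_)

    def names_out_guarded(self):
        # Delegate when the helper exists; otherwise each input feature
        # maps to exactly one output column, so reuse the input names.
        if hasattr(self, "_encoder"):
            return self._encoder.get_feature_names_out(self.feature_names_in_)
        return list(self.feature_names_in_)
```

After `ToyDiscretizer(encode="ordinal").fit(["one"])`, the unguarded method raises the same kind of AttributeError as in the traceback, while the guarded one returns the input name unchanged.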

Steps/Code to Reproduce

from sklearn.preprocessing import KBinsDiscretizer
import pandas as pd
kb = KBinsDiscretizer(n_bins=5, encode='ordinal')
kb.fit_transform(pd.DataFrame(range(100),columns=['one']))
kb.get_feature_names_out()

Expected Results

The name of the passed-in feature, which in this case should be 'one'.

Actual Results

Traceback (most recent call last):
File "", line 1, in
File "/home/smds/miniconda3/envs/ross_docker_py37/lib/python3.7/site-packages/sklearn/preprocessing/_discretization.py", line 396, in get_feature_names_out
return self._encoder.get_feature_names_out(input_features)
AttributeError: 'KBinsDiscretizer' object has no attribute '_encoder'

Versions

import sklearn; sklearn.show_versions()

System:
    python: 3.7.10 (default, Jun  4 2021, 14:48:32)  [GCC 7.5.0]
executable: /home/smds/miniconda3/envs/ross_docker_py37/bin/python
   machine: Linux-4.4.0-116-generic-x86_64-with-debian-buster-sid

Python dependencies:
          pip: 21.2.2
   setuptools: 58.0.4
      sklearn: 1.0.2
        numpy: 1.20.3
        scipy: 1.7.1
       Cython: 0.29.24
       pandas: 1.3.3
   matplotlib: 3.4.2
       joblib: 1.0.1
threadpoolctl: 2.2.0

Built with OpenMP: True
rosscleung added the Bug and Needs Triage labels on Mar 14, 2022
@thomasjpfan
Member

Thank you for opening the issue! This issue was recently fixed in #22735, and the fix will be included in the next version of scikit-learn.
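
For anyone who cannot upgrade yet, a workaround along the following lines may help. This is a minimal sketch: `feature_names_out_or_input` is a hypothetical helper, not part of scikit-learn, and it assumes the fitted estimator exposes `encode` and, when fitted on a DataFrame, `feature_names_in_`. It returns a plain list rather than a NumPy array.

```python
def feature_names_out_or_input(estimator, input_features=None):
    """Return output feature names, falling back to the input names
    when an ordinal-encoded estimator fails with AttributeError."""
    try:
        return list(estimator.get_feature_names_out(input_features))
    except AttributeError:
        # Only paper over the failure described in this issue;
        # anything else is re-raised untouched.
        if getattr(estimator, "encode", None) != "ordinal":
            raise
    # Ordinal encoding keeps one output column per input column,
    # so the input names can be reused as output names.
    if input_features is not None:
        return list(input_features)
    names = getattr(estimator, "feature_names_in_", None)
    if names is None:
        raise ValueError(
            "no input feature names available; pass input_features explicitly"
        )
    return list(names)
```

With the `kb` object from the reproduction steps, `feature_names_out_or_input(kb)` would be expected to return the input column name on an affected version, and to pass through whatever the estimator returns once the fix is installed.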
