LabelEncoder throws an error when it's used in a Pipeline or in a ColumnTransform #12720
Comments
Yes indeed, from the user guide:
To encode features, you need to use
duplicate of #3956
The problem is that OneHotEncoder is broken in >0.20: it fails if you pass it features with string values.
Yes. Any suggestions on how to resolve this?
Why? See the example section in our doc. https://scikit-learn.org/dev/modules/generated/sklearn.preprocessing.OneHotEncoder.html
What if I want to label-encode the input feature?
Maybe because LabelEncoder won't handle multiple features in one step, but OneHotEncoder can? Check this:
I have written a labelencoder which can take care of unknown values and multiple columns too and can be inserted into a pipeline with some tweaks. Code:
Here is an example of converting multiple columns into int:
from sklearn.preprocessing import LabelEncoder
Please don't do that. Please use OrdinalEncoder, perhaps together with
ColumnTransformer.
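As a rough illustration of that suggestion, here is a minimal sketch of OrdinalEncoder applied to selected columns through ColumnTransformer. The toy array, the column indices, and the step name "cat" are made up for this example and do not come from the thread.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder

# Toy data (hypothetical): two string columns and one numeric column.
X = np.array(
    [
        ["red", "small", 1.0],
        ["blue", "large", 2.0],
        ["red", "large", 3.0],
    ],
    dtype=object,
)

# Encode columns 0 and 1 as integer codes; pass column 2 through unchanged.
ct = ColumnTransformer(
    transformers=[("cat", OrdinalEncoder(), [0, 1])],
    remainder="passthrough",
)

X_encoded = ct.fit_transform(X)
print(X_encoded)
```

OrdinalEncoder assigns codes per column in sorted category order, so "blue" maps to 0 and "red" to 1 in the first column. If unseen categories matter, scikit-learn 0.24 and later let OrdinalEncoder take handle_unknown="use_encoded_value" together with unknown_value.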
Traceback (most recent call last):
# Here is the code
import tensorflow as tf
from tensorflow.keras.models import load_model
Description
The fit and fit_transform methods in LabelEncoder don't follow the standard scikit-learn convention for these methods: fit(X[, y]) and fit_transform(X[, y]). In LabelEncoder, they accept only one argument: fit(y) and fit_transform(y).
Therefore, LabelEncoder can't be used inside a Pipeline or a ColumnTransformer. I suspect it also fails with other classes (GridSearchCV, ...), but I haven't tested that.
In contrast, the fit and fit_transform methods in OneHotEncoder and OrdinalEncoder follow the standard scikit-learn signature.
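One way to see the difference directly is to inspect the method signatures. This is a small sketch using the standard library's inspect module; the variable name `signatures` is just for illustration, and the exact parameter lists for OneHotEncoder and OrdinalEncoder may vary between scikit-learn versions.

```python
import inspect

from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

# Collect the parameter names of fit_transform for each encoder.
signatures = {}
for cls in (LabelEncoder, OneHotEncoder, OrdinalEncoder):
    params = list(inspect.signature(cls.fit_transform).parameters)
    signatures[cls.__name__] = params
    print(cls.__name__, params)
```

LabelEncoder's fit_transform takes only `y` besides `self`, while the other two take `X` followed by an optional `y`, which is the shape Pipeline and ColumnTransformer call with.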
See reference:
LabelEncoder: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
OneHotEncoder: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder
OrdinalEncoder: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html#sklearn.preprocessing.OrdinalEncoder
Steps/Code to Reproduce
Example:
Expected Results
No error is thrown.
Actual Results
The same error in both cases:
TypeError: fit_transform() takes 2 positional arguments but 3 were given.
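The original reproduction code is not preserved in this thread. The following is only a guess at an equivalent minimal case, assuming a LabelEncoder placed as the single step of a Pipeline; the data and the step name "encode" are hypothetical.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder

# Made-up 1-D string data.
X = np.array(["a", "b", "a", "c"])

# A pipeline whose only step is a LabelEncoder.
pipe = Pipeline([("encode", LabelEncoder())])

error = None
try:
    # Pipeline calls step.fit_transform(X, y), but LabelEncoder.fit_transform
    # only accepts a single argument (y), so this raises TypeError.
    pipe.fit_transform(X)
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```

The exact wording of the message depends on the Python and scikit-learn versions, but the cause is the same extra positional argument.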
Versions
System:
python: 3.6.6 |Anaconda, Inc.| (default, Oct 9 2018, 12:34:16) [GCC 7.3.0]
executable: /home/twins/anaconda3/envs/pytorch/bin/python
machine: Linux-4.8.0-56-generic-x86_64-with-debian-stretch-sid
BLAS:
macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
lib_dirs: /home/twins/anaconda3/envs/pytorch/lib
cblas_libs: mkl_rt, pthread
Python deps:
pip: 18.1
setuptools: 40.6.2
sklearn: 0.20.1
numpy: 1.15.4
scipy: 1.1.0
Cython: None
pandas: 0.23.4
Thanks for the amazing job you do!