java.lang.IllegalArgumentException: Attribute 'xgboost.sklearn.XGBClassifier._le' has an unsupported value (Python class sklearn.preprocessing._label.LabelEncoder) #197

Closed
itzikjan opened this issue Dec 11, 2019 · 1 comment

Comments

@itzikjan

Hi,

We have been using this package in production for a long time with Python 2.7, with the following code:

import xgboost as xgb
from sklearn.preprocessing import LabelEncoder
from sklearn_pandas import DataFrameMapper
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.decoration import CategoricalDomain
from sklearn2pmml.pipeline import PMMLPipeline

params2 = {
    'n_estimators': 100,
    'learning_rate': 0.5,
    'seed': 0,
    'subsample': 0.8,
    'n_jobs': 50,
    'colsample_bytree': 0.8,
    'objective': 'binary:logistic',
    'max_depth': 10,
    'min_child_weight': 300,
    'gamma': 2,
    'max_delta_step': 6
}

estimator = xgb.XGBClassifier(**params2)

# Numeric columns pass through unchanged; object and bool columns get a
# CategoricalDomain decorator followed by a LabelEncoder.
mapper = DataFrameMapper(
    [(i, None) if j != 'object' and j != 'bool' else
     (i, [CategoricalDomain(
              missing_value_treatment="as_value",
              invalid_value_treatment="as_missing",
              missing_value_replacement=train_x[i].value_counts().idxmax(),
              invalid_value_replacement=train_x[i].value_counts().idxmax()),
          LabelEncoder()])
     for i, j in zip(train_x.columns.values, train_x.dtypes.values)],
    input_df=True, df_out=True)

rf_pipeline = PMMLPipeline([("mapper", mapper), ("classifier", estimator)])
rf_pipeline.fit(train_x, train_y)
sklearn2pmml(rf_pipeline, pmml_model_name, with_repr=True)

pip3 freeze

ai-model-infra==0.1
awscli==1.16.300
beautifulsoup4==4.7.1
boto==2.49.0
boto3==1.10.36
botocore==1.13.36
certifi==2019.11.28
chardet==3.0.4
colorama==0.4.1
Cython==0.29.14
datadog==0.32.0
decorator==4.4.1
docutils==0.15.2
fsspec==0.6.1
idna==2.8
jmespath==0.9.3
joblib==0.14.1
lxml==4.3.0
mysqlclient==1.3.14
nltk==3.4
nose==1.3.4
numpy==1.17.4
ortools==7.4.7247
pandas==0.25.3
pandasql==0.7.3
protobuf==3.11.1
py-dateutil==2.2
pyarrow==0.13.0
pyasn1==0.4.8
python-dateutil==2.8.0
python36-sagemaker-pyspark==1.2.1
pytz==2018.9
PyYAML==3.11
requests==2.22.0
rsa==3.4.2
s3fs==0.4.0
s3transfer==0.2.1
scikit-learn==0.22
scipy==1.3.3
singledispatch==3.4.0.3
six==1.12.0
sklearn==0.0
sklearn-pandas==1.8.0
sklearn2pmml==0.51.0
soupsieve==1.6.2
SQLAlchemy==1.3.11
urllib3==1.25.7
windmill==1.6
xgboost==0.90

We are moving to Python 3.6 and are getting the following error (versions: 0.47.1 and 0.51.0):

  1. Standard output is empty
    Standard error:
    Dec 11, 2019 1:17:18 PM org.jpmml.sklearn.Main run
    INFO: Parsing PKL..
    Dec 11, 2019 1:17:18 PM org.jpmml.sklearn.Main run
    INFO: Parsed PKL in 132 ms.
    Dec 11, 2019 1:17:18 PM org.jpmml.sklearn.Main run
    INFO: Converting..
    Dec 11, 2019 1:17:18 PM org.jpmml.sklearn.Main run
    SEVERE: Failed to convert
    java.lang.IllegalArgumentException: Attribute 'xgboost.sklearn.XGBClassifier._le' has an unsupported value (Python class sklearn.preprocessing._label.LabelEncoder)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
    at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:57)
    at sklearn.LabelEncoderClassifier.getLabelEncoder(LabelEncoderClassifier.java:40)
    at sklearn.LabelEncoderClassifier.getClasses(LabelEncoderClassifier.java:34)
    at sklearn.ClassifierUtil.getClasses(ClassifierUtil.java:32)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:128)
    at org.jpmml.sklearn.Main.run(Main.java:145)
    at org.jpmml.sklearn.Main.main(Main.java:94)
    Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.preprocessing.LabelEncoder
    at java.lang.Class.cast(Class.java:3369)
    at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:41)
    ... 7 more

Exception in thread "main" java.lang.IllegalArgumentException: Attribute 'xgboost.sklearn.XGBClassifier._le' has an unsupported value (Python class sklearn.preprocessing._label.LabelEncoder)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:57)
at sklearn.LabelEncoderClassifier.getLabelEncoder(LabelEncoderClassifier.java:40)
at sklearn.LabelEncoderClassifier.getClasses(LabelEncoderClassifier.java:34)
at sklearn.ClassifierUtil.getClasses(ClassifierUtil.java:32)
at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:128)
at org.jpmml.sklearn.Main.run(Main.java:145)
at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.preprocessing.LabelEncoder
at java.lang.Class.cast(Class.java:3369)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:41)
... 7 more

  2. In other versions we also got the following error:
    ('Invalid value treatment {0} does not support invalid_value_replacement attribute', 'as_missing')
@vruusmann
Member

TL;DR: In the Scikit-Learn upgrade from 0.21.X to 0.22.X, many modules were renamed (typically by prepending an underscore character to the module name). For example, sklearn.preprocessing.label.LabelEncoder became sklearn.preprocessing._label.LabelEncoder.
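The module rename matters because a pickle stores each class by its module path, and that path is what a converter reading the pickle gets to see. A minimal standard-library sketch of how to list those paths follows. The helper name pickled_class_paths is made up for illustration, an OrderedDict stands in for a fitted LabelEncoder, and only the GLOBAL opcode (pickle protocol 2 and older) is handled, so treat it as a diagnostic aid rather than as how JPMML-SkLearn works internally.

```python
import collections
import pickle
import pickletools


def pickled_class_paths(data):
    """Return the 'module.Class' paths named by GLOBAL opcodes in a pickle.

    Hypothetical helper: protocol 4+ pickles use STACK_GLOBAL instead,
    which this sketch does not decode.
    """
    paths = set()
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # pickletools reports the argument as "module name", space-separated.
            module, name = arg.split(" ", 1)
            paths.add(module + "." + name)
    return paths


# Stand-in object; a fitted LabelEncoder would be recorded the same way,
# under whatever module path the installed Scikit-Learn defines it in.
payload = pickle.dumps(collections.OrderedDict(), protocol=2, fix_imports=False)
class_paths = pickled_class_paths(payload)
print(sorted(class_paths))
```

Running the same helper on a protocol 2 pickle of the fitted pipeline should show whether it references sklearn.preprocessing.label or sklearn.preprocessing._label.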

If you're using Scikit-Learn 0.22.X (or newer), then you need to upgrade to SkLearn2PMML version 0.51.X (or newer). For example, SkLearn2PMML version 0.51.0, which is based on JPMML-SkLearn version 1.5.25, knows both the label and _label modules:
https://github.com/jpmml/jpmml-sklearn/blob/1.5.25/src/main/resources/META-INF/sklearn2pmml.properties#L121-L122
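The version rule above can be written down as a small pre-flight check. This is only a sketch of that one rule: the function names are made up for illustration, the version strings would come from pip3 freeze or similar, and the thresholds (0.22 and 0.51) are taken from this comment, not from any official compatibility table.

```python
import re


def version_tuple(version):
    """Turn '0.51.0' into (0, 51, 0), ignoring any non-numeric suffix."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognized version: " + version)
    return tuple(int(part) for part in match.group(0).split("."))


def needs_sklearn2pmml_upgrade(sklearn_version, sklearn2pmml_version):
    """True if Scikit-Learn is 0.22+ while SkLearn2PMML is older than 0.51."""
    return (version_tuple(sklearn_version) >= (0, 22)
            and version_tuple(sklearn2pmml_version) < (0, 51))


# Hypothetical usage with version strings copied from a pip3 freeze listing.
print(needs_sklearn2pmml_upgrade("0.22", "0.47.1"))
```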

Exception in thread "main" java.lang.IllegalArgumentException: Attribute 'xgboost.sklearn.XGBClassifier._le' has an unsupported value (Python class sklearn.preprocessing._label.LabelEncoder)

Please upgrade to SkLearn2PMML version 0.51.0 (or newer).

'Invalid value treatment {0} does not support invalid_value_replacement attribute', 'as_missing'

This is a legitimate complaint. Older SkLearn2PMML versions did not check for conflicting domain attribute values, whereas newer ones do.

Please update your Python source code. Specifically, remove any domain attribute values that you're not 100% sure about.
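As a rough illustration of that cleanup, the domain keyword arguments can be built without the conflicting attribute. The helper name categorical_domain_kwargs is made up for this sketch, and it assumes the only conflict is the one named in the error message (invalid_value_replacement combined with the "as_missing" treatment); other combinations are not checked here.

```python
def categorical_domain_kwargs(most_frequent):
    """Keyword arguments for CategoricalDomain without the conflicting attribute.

    Hypothetical helper; most_frequent is the replacement value, for example
    train_x[i].value_counts().idxmax() from the original code.
    """
    return {
        # Missing cells are replaced by the most frequent training value.
        "missing_value_treatment": "as_value",
        "missing_value_replacement": most_frequent,
        # Invalid cells are treated as missing. No invalid_value_replacement
        # is passed, because the error message says "as_missing" rejects it.
        "invalid_value_treatment": "as_missing",
    }


# Intended use inside the mapper (requires sklearn2pmml, not imported here):
#   CategoricalDomain(**categorical_domain_kwargs(train_x[i].value_counts().idxmax()))
kwargs = categorical_domain_kwargs("A")
print(sorted(kwargs))
```

Since invalid values are treated as missing, they would be expected to pick up the missing value replacement anyway, which is presumably what the original code intended.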
