Broken print method for xgboost when loaded as python function #3550

lorenzwalthert · 2020-10-19T20:32:54Z

Willingness to contribute

The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?

Yes. I can contribute a fix for this bug independently.
Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
No. I cannot contribute a bug fix at this time.

System information

Have I written custom code (as opposed to using a stock example script provided in MLflow): Yes, slight modification of example.
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS
MLflow installed from (source or binary): binary
MLflow version (run mlflow --version): 1.11.0
Python version: 3.6.11
npm version, if running the dev UI:
Exact command to reproduce:

Describe the problem

When loaded with pyfunc, the xgboost model can't be printed because the attribute run_id is missing from the metadata. It seems that, when instantiated, PyFuncModel expects run_id to be in model_meta (but this does not appear to be asserted).
With mlflow.xgboost.log_model(), the run_id is correctly logged to MLmodel, but not with mlflow.xgboost.save_model().

Hence, a quick fix would be to check whether run_id is in model_meta before the existing model_meta.run_id is not None check in the pyfunc print method (__repr__). Another option would be to write run_id: None when saving a model outside a run with mlflow.xgboost.save_model().
We should probably aim for consistency with the other flavours. I did not have time to check whether they write run_id: None or simply omit the key.
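
To illustrate the first option, here is a minimal sketch of a guarded check. ModelMeta and PyFuncModelSketch are hypothetical stand-ins I made up for this example, not the real MLflow classes, and the repr format is an assumption too. The only point is that getattr with a default handles both a missing run_id key and an explicit run_id: None.

```python
# Hypothetical stand-ins for illustration only; NOT the real MLflow classes.
class ModelMeta:
    """Mimics metadata that only gets attributes for keys present in MLmodel."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


class PyFuncModelSketch:
    def __init__(self, model_meta):
        self._model_meta = model_meta

    def __repr__(self):
        info = {}
        # getattr with a default covers a missing attribute as well as
        # an attribute that exists but is set to None.
        run_id = getattr(self._model_meta, "run_id", None)
        if run_id is not None:
            info["run_id"] = run_id
        return "PyFuncModelSketch(%r)" % (info,)


# Like save_model() outside a run: no run_id key at all.
saved = PyFuncModelSketch(ModelMeta())
# Like log_model() inside a run: run_id is present ("abc123" is a made-up value).
logged = PyFuncModelSketch(ModelMeta(run_id="abc123"))

print(repr(saved))
print(repr(logged))
```

With a check like this, printing no longer depends on whether the model was saved with save_model() or log_model().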

Code to reproduce issue

from sklearn import datasets
from sklearn.model_selection import train_test_split
import mlflow
import xgboost as xgb
iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
params = {
    "objective": "multi:softprob",
    "num_class": 3,
    "learning_rate": 0.3,
    "eval_metric": "mlogloss",
    "seed": 42,
}
model = xgb.train(params, dtrain, evals=[(dtrain, "train")])
[0]     train-mlogloss:0.74723
[1]     train-mlogloss:0.54060
[2]     train-mlogloss:0.40276
[3]     train-mlogloss:0.30789
[4]     train-mlogloss:0.24052
[5]     train-mlogloss:0.19087
[6]     train-mlogloss:0.15471
[7]     train-mlogloss:0.12807
[8]     train-mlogloss:0.10722
[9]     train-mlogloss:0.09053

mlflow.xgboost.save_model(model, 'model')
model2 = mlflow.xgboost.load_model('model')
model2
<xgboost.core.Booster object at 0x7f9e1798d0f0>

model3 = mlflow.pyfunc.load_model('model')
model3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/lorenz/opt/miniconda3/envs/mlflow-1.11.0/lib/python3.6/site-packages/mlflow/pyfunc/__init__.py", line 437, in __repr__
    if self._model_meta.run_id is not None:
AttributeError: 'Model' object has no attribute 'run_id'

Other info / logs

@harupy probably a quick fix for you since you implemented most of the python xgboost features here.

What component(s), interfaces, languages, and integrations does this bug affect?

Components

Interface

area/uiux: Front-end, user experience, JavaScript, plotting
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

harupy · 2020-10-19T23:40:59Z

@lorenzwalthert Thanks for reporting this. I think the error occurs because you're using save_model instead of log_model.

harupy · 2020-10-19T23:44:40Z

I tested print(mlflow.pyfunc.load_model('model')) with log_model and it worked.

lorenzwalthert · 2020-10-20T06:01:45Z

Yes, I noted that in the bug description. So is this behaviour expected? I think it should not matter how the model was saved; printing it should not raise an error.

harupy · 2020-10-20T06:57:03Z

@lorenzwalthert Sorry, I completely missed that.

a quick fix would be to check whether run_id is in model_meta before the existing model_meta.run_id is not None check in the pyfunc print method (__repr__). Another option would be to write run_id: None when saving a model outside a run with mlflow.xgboost.save_model().

I like the former approach (check run_id).

lorenzwalthert · 2020-10-23T21:14:56Z

Ok, I'll file a PR soon.

lorenzwalthert added the bug (Something isn't working) label Oct 19, 2020

github-actions bot added the area/models (MLmodel format, model serialization/deserialization, flavors) label Oct 19, 2020

lorenzwalthert mentioned this issue Oct 23, 2020

Fix print method for pyfunc flavor models without run_id and artifact path #3589

Merged

27 tasks

harupy closed this as completed in #3589 Oct 27, 2020
