
Broken print method for xgboost when loaded as python function #3550

Closed
2 of 23 tasks
lorenzwalthert opened this issue Oct 19, 2020 · 5 comments · Fixed by #3589
Labels
area/models MLmodel format, model serialization/deserialization, flavors bug Something isn't working

Comments

@lorenzwalthert
Contributor

Thank you for submitting an issue. Please refer to our issue policy for additional information about bug reports. For help with debugging your code, please refer to Stack Overflow.

Please fill in this bug report template to ensure a timely and thorough response.

Willingness to contribute

The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
  • No. I cannot contribute a bug fix at this time.

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): Yes, slight modification of example.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS
  • MLflow installed from (source or binary): binary
  • MLflow version (run mlflow --version): 1.11.0
  • Python version: 3.6.11
  • npm version, if running the dev UI:
  • Exact command to reproduce:

Describe the problem

When the xgboost model is loaded with pyfunc, it can't be printed because the run_id attribute is missing from the model metadata. It seems that, once initialized, PyFuncModel expects run_id to be present in model_meta (although this does not appear to be asserted anywhere).
With mlflow.xgboost.log_model(), the run_id is correctly written to MLmodel, but with mlflow.xgboost.save_model() it is not.
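The failure mode can be reproduced without MLflow or xgboost. The sketch below uses a hypothetical `ModelMeta` class (not MLflow's actual `Model` class) that, matching the behavior described above, only sets `run_id` when a run ID is supplied, and a hypothetical `PyFuncLike` wrapper whose `__repr__` reads the attribute unconditionally, like the line shown in the traceback.

```python
class ModelMeta:
    """Hypothetical stand-in for model metadata (not MLflow's real Model class)."""

    def __init__(self, run_id=None, artifact_path=None):
        # Assumption: these attributes are only set when the model is logged inside a run.
        if run_id:
            self.run_id = run_id
            self.artifact_path = artifact_path


class PyFuncLike:
    """Hypothetical wrapper whose __repr__ mirrors the check shown in the traceback."""

    def __init__(self, model_meta):
        self._model_meta = model_meta

    def __repr__(self):
        info = {}
        # Unconditional attribute access: raises AttributeError if run_id was never set.
        if self._model_meta.run_id is not None:
            info["run_id"] = self._model_meta.run_id
        return f"PyFuncLike({info})"


logged = PyFuncLike(ModelMeta(run_id="abc123", artifact_path="model"))
saved = PyFuncLike(ModelMeta())  # no run, as with save_model()

print(repr(logged))  # PyFuncLike({'run_id': 'abc123'})
try:
    repr(saved)
except AttributeError as exc:
    print("repr failed:", exc)  # repr failed: 'ModelMeta' object has no attribute 'run_id'
```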

Hence, a quick fix would be to check whether run_id is in model_meta before the existing check for model_meta.run_id is None in the pyfunc print method. Alternatively, mlflow.xgboost.save_model() could write run_id: None when a model is saved outside a run.
We should probably aim for consistency with other flavours. I did not have time to check whether they write run_id: None or simply omit the key.
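The second option (always writing the key) could look roughly like this. It is a sketch with a hypothetical `ModelMeta` class, not MLflow's own code, and JSON stands in for the YAML MLmodel file only to keep the example stdlib-only.

```python
import json


class ModelMeta:
    """Hypothetical metadata class that always defines run_id, possibly as None."""

    def __init__(self, run_id=None, artifact_path=None):
        self.run_id = run_id
        self.artifact_path = artifact_path

    def to_dict(self):
        # Write the keys even when there is no run, so readers can rely on them.
        return {"run_id": self.run_id, "artifact_path": self.artifact_path}

    @classmethod
    def from_dict(cls, data):
        # .get() keeps older files that lack the key loadable.
        return cls(run_id=data.get("run_id"), artifact_path=data.get("artifact_path"))


saved = ModelMeta()  # saved outside a run
text = json.dumps(saved.to_dict())
print(text)  # {"run_id": null, "artifact_path": null}

loaded = ModelMeta.from_dict(json.loads(text))
print(loaded.run_id is None)  # True
```

Note that the reading side still has to tolerate files written before such a change, here via `dict.get`.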

Code to reproduce issue

from sklearn import datasets
from sklearn.model_selection import train_test_split
import mlflow
import mlflow.pyfunc
import mlflow.xgboost
import xgboost as xgb

iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
params = {
    "objective": "multi:softprob",
    "num_class": 3,
    "learning_rate": 0.3,
    "eval_metric": "mlogloss",
    "seed": 42,
}
model = xgb.train(params, dtrain, evals=[(dtrain, "train")])
# [0]     train-mlogloss:0.74723
# [1]     train-mlogloss:0.54060
# [2]     train-mlogloss:0.40276
# [3]     train-mlogloss:0.30789
# [4]     train-mlogloss:0.24052
# [5]     train-mlogloss:0.19087
# [6]     train-mlogloss:0.15471
# [7]     train-mlogloss:0.12807
# [8]     train-mlogloss:0.10722
# [9]     train-mlogloss:0.09053

# Save outside an active run (save_model, not log_model)
mlflow.xgboost.save_model(model, "model")

# Loading with the xgboost flavor and printing works
model2 = mlflow.xgboost.load_model("model")
print(model2)
# <xgboost.core.Booster object at 0x7f9e1798d0f0>

# Loading as a python function and printing fails
model3 = mlflow.pyfunc.load_model("model")
print(model3)
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "/Users/lorenz/opt/miniconda3/envs/mlflow-1.11.0/lib/python3.6/site-packages/mlflow/pyfunc/__init__.py", line 437, in __repr__
#     if self._model_meta.run_id is not None:
# AttributeError: 'Model' object has no attribute 'run_id'

Other info / logs

@harupy, this is probably a quick fix for you, since you implemented most of the Python xgboost features here.

What component(s), interfaces, languages, and integrations does this bug affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: Local serving, model deployment tools, spark UDFs
  • area/server-infra: MLflow server, JavaScript dev server
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, JavaScript, plotting
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations
@lorenzwalthert lorenzwalthert added the bug Something isn't working label Oct 19, 2020
@github-actions github-actions bot added the area/models MLmodel format, model serialization/deserialization, flavors label Oct 19, 2020
@harupy
Member

harupy commented Oct 19, 2020

@lorenzwalthert Thanks for reporting this. I think the error occurs because you're using save_model instead of log_model.

@harupy
Member

harupy commented Oct 19, 2020

I tested print(mlflow.pyfunc.load_model('model')) with log_model and it worked.

@lorenzwalthert
Contributor Author

lorenzwalthert commented Oct 20, 2020

Yes, I noted that in the bug description. Is this behaviour expected? I think printing a model should not raise an error, regardless of how the model was saved.

@harupy
Member

harupy commented Oct 20, 2020

@lorenzwalthert sorry I completely missed that.

a quick fix would be to check whether run_id is in model_meta before the existing check for model_meta.run_id is None in the pyfunc print method. Alternatively, mlflow.xgboost.save_model() could write run_id: None when a model is saved outside a run.

I like the former approach (check run_id).
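The preferred approach (check for `run_id` before using it) can be illustrated with stdlib-only, hypothetical stand-ins for the metadata and the pyfunc wrapper. This is a sketch of the idea, not necessarily the change that was eventually merged.

```python
class ModelMeta:
    """Hypothetical metadata class that only sets run_id when logged inside a run."""

    def __init__(self, run_id=None):
        if run_id:
            self.run_id = run_id


class PyFuncLike:
    """Hypothetical wrapper whose __repr__ tolerates metadata without run_id."""

    def __init__(self, model_meta):
        self._model_meta = model_meta

    def __repr__(self):
        # getattr with a default treats a missing attribute the same as run_id=None,
        # which is equivalent to checking hasattr() before the "is not None" test.
        run_id = getattr(self._model_meta, "run_id", None)
        if run_id is not None:
            return f"PyFuncLike(run_id={run_id!r})"
        return "PyFuncLike(run_id=None)"


print(repr(PyFuncLike(ModelMeta())))          # PyFuncLike(run_id=None)
print(repr(PyFuncLike(ModelMeta("abc123"))))  # PyFuncLike(run_id='abc123')
```

The check lives entirely in the print path, so models saved by older versions (without the key) print correctly without being re-saved.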

@lorenzwalthert
Contributor Author

Ok, I'll file a PR soon.


2 participants