bug: unable to deserialize using cloudpickle #9793

Closed
1 task done
jstammers opened this issue Aug 8, 2024 · 4 comments · Fixed by #9798
Labels
bug Incorrect behavior inside of ibis

Comments

@jstammers
Contributor

What happened?

I am using mlflow to log a model object that internally executes some transformations using ibis.

Here's a minimal example of what I'm trying to do:

import mlflow
import ibis

@ibis.udf.scalar.python
def add_one(x: int) -> int:
    return x + 1


class MLFlowModel(mlflow.pyfunc.PythonModel):

    def predict(self, context, model_input):
        con = ibis.get_backend(model_input)
        df = con.create_table("df", model_input)
        df = df.mutate(prediction=add_one(df["input"]))
        return df["prediction"].execute()

with mlflow.start_run():
    model = MLFlowModel()
    mlflow.pyfunc.log_model("model", python_model=model, infer_code_paths=True)

However, this gives me a TypeError when trying to deserialize the object:

TypeError: annotation must be an instance of Argument, got POSITIONAL_OR_KEYWORD

The mlflow.pyfunc.log_model function internally uses cloudpickle to handle the serialization. I can reproduce this error with cloudpickle directly:

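import cloudpickle

# `model` is the MLFlowModel instance from the example above
with open("model.pkl", "wb") as f:
    cloudpickle.dump(model, f)

with open("model.pkl", "rb") as f:
    loaded_model = cloudpickle.load(f)  # raises the TypeError shown above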

I don't see this error when using pickle, so I suspect this is related to a difference in how these two libraries handle the serialization.

It may be possible for me to specify pickle as the serialization method, which could serve as a workaround for now.
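For context on why the two libraries can behave differently here: the standard pickle module stores an importable function by reference (module name plus qualified name), so nothing attached to the function is serialized, whereas cloudpickle serializes functions defined in `__main__` by value. Whether that fully explains this bug is an assumption. Below is a minimal stdlib-only sketch of the by-reference behaviour; `json.dumps` is used only as a convenient importable function, and `pickled_strings` is a hypothetical helper.

```python
import json
import pickle
import pickletools


def pickled_strings(obj):
    """Return the opcode names and string arguments found in obj's pickle."""
    payload = pickle.dumps(obj)
    ops = list(pickletools.genops(payload))
    names = [op.name for op, _arg, _pos in ops]
    strings = [arg for _op, arg, _pos in ops if isinstance(arg, str)]
    return names, strings


# The payload holds only a global lookup plus the module and function names;
# no code object, annotations, or attributes of the function are stored.
names, strings = pickled_strings(json.dumps)
print(names)
print(strings)

# Loading resolves the reference by importing the module again, so the
# result is the very same function object rather than a copy.
assert pickle.loads(pickle.dumps(json.dumps)) is json.dumps
```

Because only the name is stored, anything hanging off the function (such as a custom signature object) never goes through pickle's reconstruction step in this mode.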

What version of ibis are you using?

9.2.0

What backend(s) are you using, if any?

DuckDB

Relevant log output

File "/home/jimmy/virtualenvs/proj/lib/python3.10/site-packages/mlflow/utils/_capture_modules.py", line 255, in <module>
    main()
  File "/home/jimmy/virtualenvs/proj/lib/python3.10/site-packages/mlflow/utils/_capture_modules.py", line 232, in main
    store_imported_modules(
  File "/home/jimmy/virtualenvs/proj/lib/python3.10/site-packages/mlflow/utils/_capture_modules.py", line 188, in store_imported_modules
    mlflow.pyfunc.load_model(model_path)
  File "/home/jimmy/virtualenvs/proj/lib/python3.10/site-packages/mlflow/tracing/provider.py", line 237, in wrapper
    is_func_called, result = True, f(*args, **kwargs)
  File "/home/jimmy/virtualenvs/proj/lib/python3.10/site-packages/mlflow/pyfunc/__init__.py", line 1027, in load_model
    model_impl = importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
  File "/home/jimmy/virtualenvs/proj/lib/python3.10/site-packages/mlflow/pyfunc/model.py", line 550, in _load_pyfunc
    context, python_model, signature = _load_context_model_and_signature(model_path, model_config)
  File "/home/jimmy/virtualenvs/proj/lib/python3.10/site-packages/mlflow/pyfunc/model.py", line 533, in _load_context_model_and_signature
    python_model = cloudpickle.load(f)
  File "/home/jimmy/virtualenvs/proj/lib/python3.10/site-packages/ibis/common/annotations.py", line 263, in __init__
    raise TypeError(
TypeError: annotation must be an instance of Argument, got POSITIONAL_OR_KEYWORD

Code of Conduct

  • I agree to follow this project's Code of Conduct
@jstammers jstammers added the bug Incorrect behavior inside of ibis label Aug 8, 2024
@jstammers
Contributor Author

After looking into this further, I think this is related to cloudpickle not serializing functions with type annotations (see, e.g., cloudpipe/cloudpickle#541).

I see the same error with the following:

import cloudpickle
import ibis

@ibis.udf.scalar.python
def add_one(x: int) -> int:
    return x + 1

with open("func.pkl", "wb") as f:
    cloudpickle.dump(add_one, f)

with open("func.pkl", "rb") as f:
    loaded_func = cloudpickle.load(f)  # raises the same TypeError

@cpcloud
Member

cpcloud commented Aug 8, 2024

cloudpickle does seem to preserve annotations (I think cloudpipe/cloudpickle#541 is about annotating the cloudpickle library itself, not about supporting them in pickling/unpickling).

Perhaps there's an issue on our side.

In [1]: import cloudpickle

In [2]: def add_one(x: int) -> int:
   ...:     return x + 1
   ...:

In [3]: func = cloudpickle.loads(cloudpickle.dumps(add_one))

In [4]: func
Out[4]: <function __main__.add_one(x: int) -> int>

In [5]: func.__annotations__
Out[5]: {'x': int, 'return': int}

@jstammers
Contributor Author

To add to this issue, I've been investigating using pickle instead, but have hit a blocker with a case statement:

import pickle
from ibis import _
import ibis

c = ibis.case().when(_.s == 1, "EXACT").when(True, "DIFFERENT").end()

pickle.loads(pickle.dumps(c)) 

raises

PicklingError: Can't pickle <function _finish_searched_case at 0x78fdf8061900>: it's not the same object as ibis.expr.builders._finish_searched_case

However, I am able to pickle and unpickle this expression using cloudpickle.

@cpcloud
Member

cpcloud commented Aug 8, 2024

After a bit of hacking, I got this working:

In [1]: from ibis.interactive import *

In [2]: @ibis.udf.scalar.python
   ...: def add_one(x: int) -> int:
   ...:     return x + 1
   ...:

In [3]: import cloudpickle

In [4]: cloudpickle.loads(cloudpickle.dumps(add_one))
Out[4]: <function __main__.add_one(x: int) -> int>

gforsyth pushed a commit that referenced this issue Aug 8, 2024
The pickles expect a specific constructor for `Parameter`, so remove our
custom constructor and provide a `classmethod` with the previous
behavior for convenience. Closes #9793.
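The commit message can be illustrated with the standard library alone. `inspect.Parameter.__reduce__` records `(type(self), (name, kind), state)`, so loading calls the class with `name` and `kind` positionally. A subclass whose constructor expects something else in the second position fails at that point, which matches the "got POSITIONAL_OR_KEYWORD" message in the traceback. The sketch below is not the ibis code: `Argument`, `BrokenParameter`, `FixedParameter`, `from_argument`, and `rebuild` are hypothetical names, and `rebuild` only imitates the load step rather than calling pickle itself.

```python
import inspect


class Argument:
    """Hypothetical annotation wrapper, standing in for a library type."""

    def __init__(self, typ, kind=inspect.Parameter.POSITIONAL_OR_KEYWORD):
        self.typ = typ
        self.kind = kind


def _check(annotation):
    if not isinstance(annotation, Argument):
        raise TypeError(
            f"annotation must be an instance of Argument, got {annotation}"
        )


class BrokenParameter(inspect.Parameter):
    # Custom constructor: the second positional argument is an annotation,
    # not the `kind` that inspect.Parameter.__reduce__ will pass on load.
    def __init__(self, name, annotation):
        _check(annotation)
        super().__init__(name, kind=annotation.kind, annotation=annotation)


class FixedParameter(inspect.Parameter):
    # Inherited (name, kind, ...) constructor is left untouched; the
    # convenience behaviour moves into a classmethod.
    @classmethod
    def from_argument(cls, name, annotation):
        _check(annotation)
        return cls(name, kind=annotation.kind, annotation=annotation)


def rebuild(obj):
    """Imitate unpickling: call the recorded factory, then restore state."""
    factory, args, state = obj.__reduce__()
    new = factory(*args)
    new.__setstate__(state)
    return new


original = FixedParameter.from_argument("x", Argument(int))
restored = rebuild(original)
print(restored.name, restored.annotation.typ)

try:
    rebuild(BrokenParameter("x", Argument(int)))
except TypeError as exc:
    broken_error = exc
    print(exc)
else:
    broken_error = None
```

The design point is that a subclass of a picklable class should keep a constructor compatible with what the parent's `__reduce__` emits, or override `__reduce__` to match.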