Greykite Forecaster Model is Unpickle-able #73

Closed
kurtejung opened this issue May 10, 2022 · 7 comments

@kurtejung
kurtejung commented May 10, 2022

Even a basic implementation of Greykite (see below) does not pickle properly, due to some of the design choices within Greykite (e.g. nested functions and namedtuple definitions inside function and class calls).

Was this a purposeful design choice? Is there another way to save a trained model's state and reuse it for inference downstream? Integrating with deployment tools becomes much more challenging if we have to retrain the model every time because we can't save its state. Looking for guidance on best practice here. Thanks!
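The failure mode described above can be reproduced with the standard-library pickle alone. Below is a minimal sketch: pickle stores classes and functions by reference (module plus qualified name), so a namedtuple created inside a function, or a nested function, cannot be looked up again when pickling. The names GoodPoint, make_local_point, make_scaler and is_picklable are illustrative only; dill, used in the reproduction below, extends pickle and may handle some of these objects differently.

```python
import pickle
from collections import namedtuple

# Module-level namedtuple: pickle can find the class again by module + name.
GoodPoint = namedtuple("GoodPoint", ["x", "y"])


def make_local_point():
    # namedtuple created inside a function: the class is not an attribute
    # of its module, so pickle's lookup by name fails.
    Point = namedtuple("Point", ["x", "y"])
    return Point(1, 2)


def make_scaler(factor):
    # Nested function (closure): its qualified name is
    # "make_scaler.<locals>.scale", which pickle cannot import.
    def scale(value):
        return value * factor
    return scale


def is_picklable(obj):
    """Return True if the standard-library pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True
```

The usual remedy is to define such types and helpers at module level, as GoodPoint is, so that pickle can resolve them by name.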

Here's code to reproduce the issue:

from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum

import pandas as pd
import numpy as np

date_list = pd.date_range(start='2020-01-01', end='2022-01-01', freq='W-FRI')
df_train = pd.DataFrame(
    {
        'week_end_date': date_list,
        'data': np.random.rand(len(date_list))
    }
)

metadata = MetadataParam(
    time_col="week_end_date",
    value_col=df_train.columns[-1],
    freq='W-FRI'
)

fc = Forecaster()
result = fc.run_forecast_config(
    df=df_train,
    config=ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        forecast_horizon=52,
        coverage=0.95,         # 95% prediction intervals
        metadata_param=metadata
    )
)

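import dill

# Write the fitted model to disk.
with open("pickle_out.b", "wb") as fp:
    dill.dump(result.model, fp)

# Read it back through a separate handle opened in read mode.
with open("pickle_out.b", "rb") as fp:
    output_set = dill.load(fp)
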
@KaixuYang
Contributor

Hi @kurtejung, some of the functions/classes are not directly picklable. We have built-in functions to iteratively save and load the model. Once you have run the forecast, you can do

fc.dump_forecast_result(destination_dir="dir")

To load a dumped directory, use

fc = Forecaster()
fc.load_forecast_result(source_dir="dir")

Change the "dir" to your desired directory.

@kurtejung
Author

Thanks! - not sure how I missed this in the documentation.

I'm trying to implement a deepcopy function for this as well. I can use the save/load functionality, but the I/O is time-intensive. Is there an in-memory version of dump_forecast_result/load_forecast_result?

If not, would such a function be a welcome addition to the codebase?
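As a minimal sketch of the in-memory idea, Python's io.BytesIO can stand in for a file handle, so pickle.dump/pickle.load never touch the disk (dill.dump/dill.load also accept file-like objects). This only works for objects that pickle cleanly; it does not by itself get around the unpicklable pieces that the directory-based dump_forecast_result works around. The helper roundtrip_in_memory and the sample state dict are hypothetical.

```python
import copy
import io
import pickle


def roundtrip_in_memory(obj):
    """Serialize obj into an in-memory buffer and load it back, with no disk I/O."""
    buf = io.BytesIO()
    pickle.dump(obj, buf)  # same call as for a file, but the target is RAM
    buf.seek(0)            # rewind before reading
    return pickle.load(buf)


state = {"coef": [0.5, 1.5], "freq": "W-FRI"}
restored = roundtrip_in_memory(state)
cloned = copy.deepcopy(state)  # alternative: copy without an explicit byte stream
```

Note that copy.deepcopy falls back on the same __reduce_ex__ protocol as pickle for most custom objects, so an object that refuses to be pickled usually refuses to be deep-copied as well.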

@KaixuYang
Contributor

Hi @kurtejung, yeah, you are very welcome to help add an in-memory (deepcopy) version of the save/load functionality! Please feel free to open a PR if you would like to, thanks!

al-bert closed this as completed on Jul 21, 2022

@vincetran96

I'm using Greykite in Miniconda on Windows. When I tried to dump a Forecaster object, I had the following error: PermissionError: [WinError 32] The process cannot access the file because it is being used by another process. I tried switching both the dump_design_info and overwrite_exist_dir parameters between True and False but the error persisted.

I'm wondering what might be the cause of this problem.

@sayanpatra
Contributor

Hi @vincetran96, I am not sure that this issue is due to the Greykite package. Maybe these links would help you debug.

  1. https://stackoverflow.com/questions/53607295/pythonthe-process-cannot-access-the-file-because-it-is-being-used-by-another-pr
  2. https://stackoverflow.com/questions/27215462/permissionerror-winerror-32-the-process-cannot-access-the-file-because-it-is

Feel free to post your code snippets; that helps us assist you.
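The usual cause of WinError 32 is deleting or renaming a file while a handle to it is still open; Windows refuses this, whereas Linux and macOS allow it. A minimal sketch of a safe "dump, and delete the partial file on failure" step, assuming the standard-library pickle; dump_or_cleanup and Unpicklable are hypothetical names, not Greykite code.

```python
import os
import pickle
import tempfile


def dump_or_cleanup(obj, path):
    """Hypothetical helper: pickle obj to path; on failure, delete the partial file.

    Leaving the with-block closes the handle *before* os.remove runs. On
    Windows, deleting a file that this process still holds open raises
    PermissionError [WinError 32].
    """
    try:
        with open(path, "wb") as fp:
            pickle.dump(obj, fp)
    except Exception:
        # fp is already closed here, so the partial file can be removed.
        if os.path.exists(path):
            os.remove(path)
        raise


class Unpicklable:
    def __getstate__(self):
        raise NotImplementedError("pickling not supported")


out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "object.pkl")
try:
    dump_or_cleanup(Unpicklable(), out_path)
except NotImplementedError:
    pass  # the original error propagates; no half-written file is left behind
```

The design choice is simply ordering: close first, then clean up, then re-raise the original exception so the real cause stays visible.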

@vincetran96

vincetran96 commented Jul 30, 2022

@sayanpatra Thank you for your suggestions. I have reused the exact code from @kurtejung above, except for the pickling part at the end:

from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum

import pandas as pd
import numpy as np

date_list = pd.date_range(start='2020-01-01', end='2022-01-01', freq='W-FRI')
df_train = pd.DataFrame(
    {
        'week_end_date': date_list,
        'data': np.random.rand(len(date_list))
    }
)

metadata = MetadataParam(
    time_col="week_end_date",
    value_col=df_train.columns[-1],
    freq='W-FRI'
)

fc = Forecaster()
result = fc.run_forecast_config(
    df=df_train,
    config=ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        forecast_horizon=52,
        coverage=0.95,         # 95% prediction intervals
        metadata_param=metadata
    )
)

fc.dump_forecast_result("path", dump_design_info=False)

Below is the traceback:

Traceback (most recent call last):
  File "env_path\lib\site-packages\greykite\framework\templates\pickle_utils.py", line 177, in dump_obj
    dill.dump(
  File "env_path\lib\site-packages\dill\_dill.py", line 336, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "env_path\lib\site-packages\dill\_dill.py", line 620, in dump
    StockPickler.dump(self, obj)
  File "env_path\lib\pickle.py", line 487, in dump
    self.save(obj)
  File "env_path\lib\pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "env_path\lib\pickle.py", line 717, in save_reduce
    save(state)
  File "env_path\lib\pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "env_path\lib\site-packages\dill\_dill.py", line 1251, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "env_path\lib\pickle.py", line 971, in save_dict
    self._batch_setitems(obj.items())
  File "env_path\lib\pickle.py", line 997, in _batch_setitems
    save(v)
  File "env_path\lib\pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "env_path\lib\pickle.py", line 717, in save_reduce
    save(state)
  File "env_path\lib\pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "env_path\lib\site-packages\dill\_dill.py", line 1251, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "env_path\lib\pickle.py", line 971, in save_dict
    self._batch_setitems(obj.items())
  File "env_path\lib\pickle.py", line 997, in _batch_setitems
    save(v)
  File "env_path\lib\pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "env_path\lib\pickle.py", line 717, in save_reduce
    save(state)
  File "env_path\lib\pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "env_path\lib\site-packages\dill\_dill.py", line 1251, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "env_path\lib\pickle.py", line 971, in save_dict
    self._batch_setitems(obj.items())
  File "env_path\lib\pickle.py", line 997, in _batch_setitems
    save(v)
  File "env_path\lib\pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "env_path\lib\pickle.py", line 931, in save_list
    self._batch_appends(obj)
  File "env_path\lib\pickle.py", line 955, in _batch_appends
    save(x)
  File "env_path\lib\pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "env_path\lib\pickle.py", line 886, in save_tuple
    save(element)
  File "env_path\lib\pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "env_path\lib\pickle.py", line 717, in save_reduce
    save(state)
  File "env_path\lib\pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "env_path\lib\site-packages\dill\_dill.py", line 1251, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "env_path\lib\pickle.py", line 971, in save_dict
    self._batch_setitems(obj.items())
  File "env_path\lib\pickle.py", line 997, in _batch_setitems
    save(v)
  File "env_path\lib\pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "env_path\lib\site-packages\dill\_dill.py", line 1251, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "env_path\lib\pickle.py", line 971, in save_dict
    self._batch_setitems(obj.items())
  File "env_path\lib\pickle.py", line 997, in _batch_setitems
    save(v)
  File "env_path\lib\pickle.py", line 578, in save
    rv = reduce(self.proto)
  File "env_path\lib\site-packages\patsy\util.py", line 723, in no_pickling
    raise NotImplementedError(
NotImplementedError: Sorry, pickling not yet supported. See https://github.com/pydata/patsy/issues/26 if you want to help.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "env_path\lib\site-packages\greykite\framework\templates\forecaster.py", line 442, in dump_forecast_result
    dump_obj(
  File "env_path\lib\site-packages\greykite\framework\templates\pickle_utils.py", line 184, in dump_obj
    os.remove(os.path.join(dir_name, f"{obj_name}.pkl"))
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'path\\object.pkl'

Note that another exception occurred before the PermissionError: NotImplementedError: Sorry, pickling not yet supported. See https://github.com/pydata/patsy/issues/26 if you want to help.
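Read together, the two tracebacks suggest the NotImplementedError is the underlying failure: an object from patsy deliberately refuses to be pickled, and the PermissionError is raised afterwards while dump_obj tries to remove the partially written file. Below is a minimal sketch of that kind of guard, assuming it is attached as __getstate__ (the traceback shows the function patsy.util.no_pickling but not which hook it is bound to); GuardedDesignInfo and try_serialize are hypothetical. Since dill builds on pickle's reduction protocol, and copy.deepcopy uses it too, all of them hit the same guard.

```python
import copy
import pickle


def no_pickling(*args, **kwargs):
    # Stand-in modeled on the guard named in the traceback: it raises as soon
    # as pickle (or copy) asks the object for its state.
    raise NotImplementedError("Sorry, pickling not yet supported.")


class GuardedDesignInfo:
    """Hypothetical class that opts out of pickling via __getstate__."""

    __getstate__ = no_pickling

    def __init__(self, column_names):
        self.column_names = column_names


def try_serialize(obj):
    """Return the exception type raised by pickle.dumps, or None on success."""
    try:
        pickle.dumps(obj)
    except Exception as exc:
        return type(exc)
    return None
```

Because the guard raises NotImplementedError rather than a pickling error, the only way to save such a model is to avoid pickling that object at all, for example by storing enough information to rebuild it.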

@harithzulfaizal

@vincetran96 Hi, I'm wondering if you ever came across a solution to this. I'm encountering the same problem. Thanks!

Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants