[ENH] save/load aka serialization/deserialization for estimators #3336

fkiraly · 2022-08-24T23:42:09Z

This adds a prototype design for serializing and deserializing estimators.
This follows the current (Aug 19) draft STEP 27 sktime/enhancement-proposals#27.

This is done via save and two load methods which can be overridden by class, and a generic load (class agnostic), in the BaseObject module.

Also changes the pickle/unpickle test to a generic serialize/deserialize test which allows custom, class specific implementations of serialize and deserialize.

FYI @AurumnPegasus - design alternative to #3128 which avoids the workflow where we have to construct an object first when we load.

AurumnPegasus · 2022-08-24T23:55:05Z

I see,
I do not know if serialisation/deserialisation works for keras models (will check it out once).
I was thinking something along the lines of (better explained in the step document I was working on, will push that soon):

Creating a save function which does 2 things:
a. Creates pickle of class dictionary (EXCLUDING optimizer)
b. Saves the network using keras.save
In this case, we will be able to solve both the problems my design was having, firstly being able to save estimator parameters, and secondly, there would be no pickle error since we are excluding the optimizer from the state dict.
A generic load function would then first load from the pickle file, then load in the network using keras.load

fkiraly · 2022-08-24T23:57:34Z

with this design, you don't have to implement separate functions or logic, just overload the three new BaseObject methods for neural networks.

AurumnPegasus · 2022-08-31T13:53:06Z

Hey @fkiraly ,
Can you please explain a doubt I had wrt this implementation?

if isinstance(serial, tuple):
        cls = serial[0]
        stored = serial[1]
        return cls.load_from_serial(stored)
    elif isinstance(serial, (str, ZipFile)):

        if isinstance(serial, str):
            zipfile = ZipFile(serial)

        with zipfile.open("metadata", mode="r") as metadata:
            cls = metadata.read()
        with zipfile.open("object", mode="r") as object:
            return cls.load_from_path(object.read())

Here, I am assuming cls would refer to the particular estimator.
What is this line doing? cls.load_from_path(object.read()) (is the object.read a pickle file?)

fkiraly · 2022-08-31T17:48:54Z

What is this line doing? cls.load_from_path(object.read()) (is the object.read a pickle file?)

The idea is that object.read() can be anything, but the contract is that cls.load_from_path must convert it into an instance of cls.

In the default implementation, object.read is a pickle file, but you could override load_from_path (a class method) if the object us unpickleable.

achieveordie · 2022-09-09T14:00:42Z

Could you confirm if this works for DL Classifiers, like CNNClassifier()? I altered CNNClassifier directly to serialize contents(save(), load_from_serial(), __getstate__() etc) and I got error for the line meta_file.write(type(self)):

TypeError: object of type 'ABCMeta' has no len()

fkiraly · 2022-09-10T23:15:21Z

Could you confirm if this works for DL Classifiers

@achieveordie, in the tests, this is currently skipped for all DL classifiers, since they are not compatible with pickle (skip is in the _config file).

Re your proposed change, is there a draft PR? I could look at it.

Your error message might be related to BaseClassifier inheriting from ABC. Perhaps it works if you remove the inheritane from ABC?

fkiraly · 2022-09-11T12:03:21Z

Could you confirm if this works for DL Classifiers

@achieveordie, I suppose I have misunderstood your question.

Did you want to know whether the default in this PR works for DL classifiers?

The answer is "no", and I know that it doesn't work.

However, having the abstract methods in place, it should be possible to override this in the deep learning base class, with something that works - as you have probably tried to (this was the idea, I just don't know what precisely would work).

Fixes #3022 - adds save/load functionality for DL estimators. Also changes the file format of file saves slightly, but this has not been released yet and therefore does not require deprecation. The Save/Load function for classical estimators is from #3336 and has been expanded for DL models along the same design. Keras is known to be problematic for certain kinds of (de)serialization, so this uses `model.save()` from `keras` along with `pickle.dump()`. The estimators are saved as `.zip` files. Components of this directory include: 1. Metadata 2. Object 3. Keras History 4. Keras Model subdirectory The first two components are the same as previously mentioned PR, stored differently to [avoid](#3336 (comment)) this problem. All the attributes are saved and restored (optimizers/history etc.) appropriately.

Serialization and deserializSerialization and deserialization interface for sktime estimators, supporting the following workflows: 1. serialization to in-memory object and deserialization 2. serialization to single file and deserialization Initially proposed in sktime issues [#3128](sktime/sktime#3128) and [#3336](sktime/sktime#3336). The proposed design introduces: * an object interface points for estimators, `save` * a loose method `load` * a file format specification `skt` for serialized estimator objects which are counterparts and universal. Concrete class or intermediate class specific logic is implemented in: * `save`, a class method * `load_from_path` and `load_from_serial`, class methods called by `load` while no class specific logic is present in `load`.ation for sktime estimators

fkiraly added 6 commits August 24, 2022 23:46

save

7033838

save

fc6d5bd

load utility

3c3f62d

load to allow ZipFile

c78adea

export and example

e88f488

deserialize

70e08f6

fkiraly added API design API design & software architecture implementing framework Implementing or improving framework for learning tasks, e.g., base class functionality labels Aug 24, 2022

fkiraly requested a review from aiwalter as a code owner August 24, 2022 23:42

linting

dc4c0db

fix docstring example

badbefb

AurumnPegasus mentioned this pull request Aug 29, 2022

[ENH] Added save/load functionality for Deep Learning models #3128

Closed

1 task

AurumnPegasus mentioned this pull request Sep 2, 2022

Serialization and deserialization for sktime estimators sktime/enhancement-proposals#27

Merged

achieveordie mentioned this pull request Sep 14, 2022

[ENH] Save/Load Functionality for both Classical and DL Estimators #3425

Merged

1 task

fkiraly added this to Under review in Workstream: benchmarking Sep 30, 2022

fkiraly merged commit 72afe73 into main Oct 2, 2022

Workstream: benchmarking automation moved this from Under review to Done Oct 2, 2022

fkiraly deleted the save-load branch October 2, 2022 15:21

fkiraly changed the title ~~[ENH] design for generic save/load aka serialization/deserialization of estimators~~ [ENH] save/load aka serialization/deserialization for estimators Oct 2, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ENH] save/load aka serialization/deserialization for estimators #3336

[ENH] save/load aka serialization/deserialization for estimators #3336

fkiraly commented Aug 24, 2022 •

edited

AurumnPegasus commented Aug 24, 2022

fkiraly commented Aug 24, 2022

AurumnPegasus commented Aug 31, 2022

fkiraly commented Aug 31, 2022

achieveordie commented Sep 9, 2022

fkiraly commented Sep 10, 2022

fkiraly commented Sep 11, 2022

[ENH] save/load aka serialization/deserialization for estimators #3336

[ENH] save/load aka serialization/deserialization for estimators #3336

Conversation

fkiraly commented Aug 24, 2022 • edited

AurumnPegasus commented Aug 24, 2022

fkiraly commented Aug 24, 2022

AurumnPegasus commented Aug 31, 2022

fkiraly commented Aug 31, 2022

achieveordie commented Sep 9, 2022

fkiraly commented Sep 10, 2022

fkiraly commented Sep 11, 2022

fkiraly commented Aug 24, 2022 •

edited