
Model serialization #59

Closed
jfsantos opened this issue Apr 18, 2015 · 13 comments · Fixed by #63

Comments

@jfsantos
Contributor

This discussion started in #51, but as things can get complicated I decided to start another issue.

It seems to be a good idea to store the weights for a model separately in an HDF5 file (or even a NumPy .npy file, but HDF5 would be more portable). I wanted to compare how large a serialized model is with and without the weights, so I ran the following test:

    model = Sequential()
    model.add(Dense(n_input, 2048, init='uniform', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(2048, 2048, init='uniform', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(2048, 2048, init='uniform', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(2048, 2048, init='uniform', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(2048, n_output, init='uniform', activation='linear'))

(the model is intentionally large!)

I then compiled the model, serialized it after compilation, and removed the weights and post-compilation Theano objects/functions as follows:

    weights = []
    for l in model.layers:
        weights.append(l.get_weights())
        l.params = []
        try:
            l.W = None
            l.b = None
        except ValueError:
            pass
    model.X = None
    model.y = None
    model.y_test = None
    model.y_train = None
    model._train  = None
    model._train_with_acc  = None
    model._test = None
    model._test_with_acc = None
    model._predict = None

The full compiled model ends up at 243 MB, and the cleaned-up model at 120 MB (which is exactly the same as we would get from pickling the non-compiled model with the weight matrices deleted). Is there anything else we could remove to make the serialized model smaller?

@fchollet
Member

In theory, besides the weights, we only need the parameters to the .compile() method, the names of the layer classes (or rather their import path) and the parameters that were passed to each layer. It should take a few kilobytes in total.

There are hack-ish ways to recover attributes from an arbitrary layer, but that's kind of ugly. Here's an example:

    import types

    for l in layers:
        attributes = []
        for a in dir(l):
            if a[:2] != '__' and a not in ['params', 'previous_layer', 'input']:
                if type(getattr(l, a)) != types.MethodType:
                    attributes.append((a, getattr(l, a)))

Besides being ugly, it might be somewhat brittle.

Here's a better alternative: have all layers expose a .config or .get_config attribute or method that returns the constructor arguments to the layer, as a dict. Then it's easy to reconstruct the layer:

    config = layer.get_config()
    weights = layer.get_weights()

    new_layer = Layer(**config)
    new_layer.set_weights(weights)

@jfsantos
Contributor Author

That's a good approach, but we'll also need the correct layer type and a way to call the right constructor, as we cannot call the base class constructor with arbitrary params. Some alternatives could be:

  1. A long function matching each layer class (which would be a pain to maintain).
  2. Keeping all the layer subclasses in a list and matching over that list (with __name__). Also not great to maintain, but better than 1.
  3. Dynamically instantiating classes from name strings, which is possible but hacky and unsafe.
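Alternative 2 can be sketched roughly as follows; the `Dense` and `Dropout` classes here are minimal stand-ins for the real layer classes, and the registry/function names are made up for illustration:

```python
# Sketch of alternative 2: a registry of known layer subclasses,
# keyed by __name__. Minimal stand-in classes, not the real layers.

class Dense(object):
    def __init__(self, **config):
        self.config = config

class Dropout(object):
    def __init__(self, **config):
        self.config = config

# Every new layer class must be added here, which is the
# maintenance burden mentioned above.
LAYER_REGISTRY = {cls.__name__: cls for cls in (Dense, Dropout)}

def layer_from_name(name, config):
    """Reconstruct a layer from its class name and config dict."""
    if name not in LAYER_REGISTRY:
        raise ValueError("Unknown layer class: %s" % name)
    return LAYER_REGISTRY[name](**config)
```

The registry makes the lookup explicit and safe, at the cost of having to keep the list in sync with the available layer classes.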

@fchollet
Member

I think the simplest thing to do would be to dynamically instantiate the layers from their import path, which we would have saved as part of the serialization. This gives users the freedom to use the .save() function with custom layers that are not part of the Keras source.
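A minimal sketch of that idea, using only the stdlib `importlib`; the function name is hypothetical, and `collections.OrderedDict` stands in below for a real layer class:

```python
import importlib

def instantiate_from_path(import_path, config):
    """Instantiate a class from a saved dotted import path,
    e.g. 'keras.layers.core.Dense', by importing its module
    and calling the constructor with the saved config dict.
    Works for custom layers outside the Keras source, as long
    as they are importable at load time."""
    module_path, class_name = import_path.rsplit('.', 1)
    module = importlib.import_module(module_path)
    cls = getattr(module, class_name)
    return cls(**config)

# Illustration with a stdlib class standing in for a layer:
obj = instantiate_from_path('collections.OrderedDict', {})
```

This is exactly the "hacky but flexible" trade-off discussed here: any importable class works, but loading a savefile implicitly imports and executes module code.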

Of course, it's sort of hacky. But at this point all our options are looking somewhat hacky.

Alternatively, to do something cleaner we would have to restrict ourselves to known layers from the source.

Maybe we should search for inspiration somewhere else. Do you know how Torch7 implements their saving function?

@jfsantos
Contributor Author

I've been mostly working with Theano-based code and Caffe (which does not have this problem, as everything is in the config file). From reading the Torch7 serialization code, it seems they serialize whole objects without any "smart" parameter saving, besides avoiding storing the same object twice on disk.

Mocha.jl (a Julia-based library inspired by Caffe) only stores parameters and requires the user to have code to rebuild the model; they only provide a simple function to load the stored parameters into a network. Maybe having something like this would be enough, given the minimalist approach in Keras?

I don't think we should be restricted to saving layers from the source, as adding new kinds of layers is something I suspect people would want to do often. I'll work tomorrow on an approach based on either just loading params from HDF5 files or with dynamic instantiation (as it seems the most flexible way to do it).

@fchollet
Member

Ok, great! Looking forward to seeing what you come up with.

@jfsantos
Contributor Author

I have a draft implementation of the simpler approach (which requires the user to rebuild the model programmatically and only loads/saves weights): jfsantos/keras@fbc7dec. Since I am not sure whether datasets stored in HDF5 are retrieved in a deterministic order by h5py, I chose to create a hierarchy of groups and datasets with indices in their names. I'll write a test for it now so we can make sure it works properly.
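The indexed hierarchy could look roughly like this; the `layer_<i>/param_<j>` naming and the function names are assumptions for illustration, and the actual draft may differ:

```python
import h5py

def save_weights(layer_weights, path):
    """Store each layer's weight arrays under indexed group/dataset
    names, so load order never depends on how h5py happens to
    enumerate the datasets in the file."""
    with h5py.File(path, 'w') as f:
        f.attrs['nb_layers'] = len(layer_weights)
        for i, weights in enumerate(layer_weights):
            group = f.create_group('layer_%d' % i)
            group.attrs['nb_params'] = len(weights)
            for j, w in enumerate(weights):
                group.create_dataset('param_%d' % j, data=w)

def load_weights(path):
    """Read the arrays back in the same indexed order."""
    with h5py.File(path, 'r') as f:
        return [[f['layer_%d/param_%d' % (i, j)][()]
                 for j in range(f['layer_%d' % i].attrs['nb_params'])]
                for i in range(f.attrs['nb_layers'])]
```

Storing the counts as attributes (`nb_layers`, `nb_params`) means the loader iterates by index rather than by whatever order `f.keys()` returns.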

@pranv
Contributor

pranv commented Apr 19, 2015

Having "param_" over and over again could take a lot of memory, right? (I don't think HDF5 would do compression.)

@fchollet
Member

This looks great. I agree that the approach of just saving the weights and having the user take care of the model structure looks like our best option.

Some thoughts:

  • save_weights and load_weights might be more explicit names. The natural expectation for a .save() method would be that it saves the entire model, so we want to avoid any confusion.
  • what's the memory usage and speed compared to pickling the full model? I'd expect it to be blazing fast, but I wonder if you have any numbers.
  • this is really minor, but n_params and n_layers as variable names don't match nb_epoch and nb_batch (and others) used throughout the codebase.
  • for debugging purposes, it would be useful to store a text description of the model structure (eg. the names of each layer and the shape of each weight matrix along with its name) alongside the weights. That way, if you have a savefile for which you lost the corresponding code, you can still get a good idea of what it was about and rebuild a new model to load it.

Even outside of the scope of the saving function, it would be useful to have a method .describe(). Want me to have a look at it?

@jfsantos
Contributor Author

@pranv We need to create multiple datasets (inside the HDF5 file) because each layer may have a different shape. It is possible to enable compression in HDF5.
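For reference, enabling compression is a per-dataset option in h5py; a minimal sketch with the standard gzip filter (filenames and the compression level here are arbitrary choices):

```python
import h5py
import numpy as np

# A highly redundant matrix compresses very well; gzip level 4 is a
# middle-of-the-road speed/size trade-off. h5py chunks the dataset
# automatically when a compression filter is requested.
w = np.zeros((1024, 1024), dtype='float32')

with h5py.File('weights_compressed.h5', 'w') as f:
    f.create_dataset('param_0', data=w,
                     compression='gzip', compression_opts=4)

with h5py.File('weights_compressed.h5', 'r') as f:
    restored = f['param_0'][()]
```

Decompression is transparent on read, so the loading code does not change at all.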

@fchollet Thanks for the feedback! I made the suggested changes. Regarding your last bullet point, I thought about it and it's something we could use HDF5 attributes for. I was actually thinking of storing all the parameters used to create the layer, but that would depend on storing them in a config dictionary (as you suggested yesterday). For now it could be for debugging purposes only, but if anybody wants, they could write code to instantiate a model based on the description from the HDF5 dump. If you write a .describe() function that returns a dictionary describing each layer, I can surely store the info with the model dump.

As for the memory usage, I tested this (jfsantos/keras@8e7e755) with a really large model (three 2048×2048 layers and two 256×2048 layers), and the storage is around half the size of the full pickled model (compiled model: 211 MB, HDF5: 104 MB). The HDF5 storage has no overhead at all: the size of the file is exactly the sum of the sizes of all the weight matrices.

@fchollet
Member

I'll be pushing changes later today that will introduce a .get_config() method on every layer (the default one on the base class will simply return the layer name), and a .describe() method on the Sequential model (just a list of the layers configs).

A config would be a dict with str keys, and values that can be str, bool, int, float, or tuples of these types. Sounds like it would be easily storable in HDF5?
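Such a config maps directly onto HDF5 attributes; a sketch with h5py, where the keys and values are made up for illustration:

```python
import h5py

# Hypothetical layer config: str keys; str/bool/int/float/tuple values.
config = {'name': 'Dense', 'output_dim': 2048, 'init': 'uniform',
          'activation': 'relu', 'trainable': True, 'dropout': 0.5,
          'weights_shape': (2048, 2048)}

with h5py.File('model_config.h5', 'w') as f:
    group = f.create_group('layer_0')
    for key, value in config.items():
        # str, bool, int, and float map to native HDF5 attribute
        # types; a homogeneous tuple is stored as a 1-D array.
        group.attrs[key] = value

with h5py.File('model_config.h5', 'r') as f:
    restored = dict(f['layer_0'].attrs)
```

One caveat: values come back as NumPy scalar/array types, so reconstruction code may need casts like `tuple(restored['weights_shape'])` before passing them to a layer constructor.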

@jfsantos
Contributor Author

Yes, that would be really easy unless the tuples are heterogeneous (i.e., have strings mixed with floats or something like this), but it's something we can probably work around. Should I make a pull request with my current code and we add the config/description stuff later?

@fchollet
Member

So far there are no cases where we would have heterogeneous tuples, and that's very unlikely to happen in the future. We'll be fine. :)

Sure, we can merge your saving function before we add this stuff.

@MartinThoma
Contributor

For people coming here from Google looking for a way to serialize a Keras model: How can I save a Keras model?
