Model serialization #59
In theory, besides the weights, we only need the parameters to the .compile() method, the names of the layer classes (or rather their import paths), and the parameters that were passed to each layer. It should take a few kilobytes in total. There are hack-ish ways to recover attributes from an arbitrary layer, but that's kind of ugly. Here's an example:

```python
import types

for l in layers:
    attributes = []
    for a in dir(l):
        # Skip dunders, known structural attributes, and bound methods.
        if a[:2] != '__' and a not in ['params', 'previous_layer', 'input']:
            if type(getattr(l, a)) != types.MethodType:
                attributes.append((a, getattr(l, a)))
```

Besides being ugly, it might be somewhat brittle. Here's a better alternative: have all layers expose a .config or .get_config attribute or method that returns the constructor arguments to the layer, as a dict. Then it's easy to reconstruct the layer:
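As a sketch of what that reconstruction could look like (a toy stand-in class, not the actual Keras layer API):

```python
class ToyDense:
    """Toy layer standing in for a real Keras layer, for illustration only."""
    def __init__(self, input_dim, output_dim, activation='linear'):
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.activation = activation

    def get_config(self):
        # Exactly the constructor arguments, as a plain dict.
        return {'input_dim': self.input_dim,
                'output_dim': self.output_dim,
                'activation': self.activation}

# Rebuilding a layer from its config is then just dict unpacking:
config = ToyDense(784, 128, activation='relu').get_config()
rebuilt = ToyDense(**config)
```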
That's a good approach, but we'll also need the correct layer type and a way to call the right constructor, as we cannot call the base class constructor with arbitrary params. Some alternatives could be:
I think the simplest thing to do would be to dynamically instantiate the layers from their import path, which we would have saved as part of the serialization. This gives users the freedom to use the .save() function with custom layers that are not part of the Keras source. Of course, it's sort of hacky. But at this point all our options are looking somewhat hacky. Alternatively, to do something cleaner, we would have to restrict ourselves to known layers from the source. Maybe we should look for inspiration elsewhere. Do you know how Torch7 implements its saving function?
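Dynamic instantiation from a stored import path could be as simple as the following sketch, using the standard library's importlib (the helper name is hypothetical):

```python
import importlib

def layer_from_path(import_path, config):
    """Instantiate a class from its dotted import path and a config dict.

    The path would be saved alongside the weights during serialization,
    so user-defined layers outside the Keras source can be restored too.
    """
    module_name, class_name = import_path.rsplit('.', 1)
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**config)

# Any importable class works the same way; a stdlib class for demonstration:
f = layer_from_path('fractions.Fraction', {'numerator': 3, 'denominator': 4})
```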
I've mostly been working with Theano-based code and Caffe (which does not have this problem, as everything is in the config file). Reading the Torch7 serialization code, it seems they serialize the whole objects without any "smart" parameter saving, beyond avoiding storing the same object twice on disk. Mocha.jl (a Julia-based library inspired by Caffe) only stores parameters and requires the user to have code to rebuild the model; it only has a simple function to load the stored parameters into a network. Maybe having something like this would be enough, given the minimalist approach in Keras? I don't think we should be restricted to saving layers from the source, as adding new kinds of layers is something I suspect people will want to do often. I'll work tomorrow on an approach based on either just loading params from HDF5 files or on dynamic instantiation (as it seems the most flexible way to do it).
Ok, great! Looking forward to seeing what you come up with.
I have a draft implementation of the simpler approach (which requires the user to rebuild the model programmatically and only loads/saves weights): jfsantos/keras@fbc7dec. Since I am not sure whether datasets stored in HDF5 are retrieved in a deterministic order by h5py, I chose to create a hierarchy of groups and datasets with indices in their names. I'll write a test for it now so we can make sure it works properly.
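The indexed group/dataset scheme could look roughly like this in h5py (function names and on-disk layout are illustrative, not the actual patch):

```python
import h5py
import numpy as np

def save_weights(path, layer_params):
    """Write one HDF5 group per layer and one dataset per parameter array.

    Indices in the names make the read-back order explicit, so nothing
    depends on the order in which h5py happens to return datasets.
    """
    with h5py.File(path, 'w') as f:
        for i, params in enumerate(layer_params):
            g = f.create_group('layer_%d' % i)
            for j, p in enumerate(params):
                g.create_dataset('param_%d' % j, data=p)

def load_weights(path):
    """Read the arrays back in the same layer/param order."""
    with h5py.File(path, 'r') as f:
        return [[f['layer_%d' % i]['param_%d' % j][()]
                 for j in range(len(f['layer_%d' % i]))]
                for i in range(len(f))]
```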
Having "param_" over and over again could take a lot of memory right? (I don't think hdf5 would do compression ) |
This looks great. I agree that the approach of just saving the weights and having the user take care of the model structure looks like our best option. Some thoughts:
Even outside of the scope of the saving function, it would be useful to have a method
@pranv We need to create multiple datasets (inside the HDF5 file) because each layer may have a different shape. It is possible to enable compression in HDF5. @fchollet Thanks for the feedback! I made the suggested changes. Regarding your last bullet point, I thought about it and it's something we could use HDF5 attributes for. I was actually thinking of storing all the parameters used to create the layer, but that would depend on storing them in a

As for the memory usage, I tested this (jfsantos/keras@8e7e755) with a really large model (three 2048×2048 layers and two 256×2048 layers), and the storage is around half the size of the full pickled model (compiled model: 211 MB, HDF5: 104 MB). The HDF5 storage has no overhead at all: the size of the file is exactly the sum of the sizes of all the weight matrices.
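For completeness, turning on HDF5's built-in gzip filter in h5py is a per-dataset option (illustrative snippet; dense float32 weights rarely compress much, so the benefit is mainly for sparse or repetitive parameters):

```python
import os
import h5py
import numpy as np

w = np.zeros((256, 256), dtype='float32')  # 256 KB raw, highly compressible
path = 'compressed_demo.h5'
with h5py.File(path, 'w') as f:
    # gzip is always available in HDF5; level 4 is a reasonable default.
    f.create_dataset('param_0', data=w, compression='gzip', compression_opts=4)
size = os.path.getsize(path)
```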
I'll be pushing changes later today that will introduce a .get_config() method on every layer (the default one on the base class will simply return the layer name), and a .describe() method on the Sequential model (just a list of the layers' configs). A config would be a dict with str keys and values that can be str, bool, int, float, or tuples of these types. Sounds like it would be easily storable in HDF5?
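A config dict of that shape maps directly onto HDF5 attributes; a hypothetical sketch with h5py (keys and values are made up):

```python
import h5py

# str, bool, int and float values all map cleanly onto HDF5 attributes,
# so a layer's config can ride along with its weight datasets.
config = {'name': 'Dense', 'input_dim': 784, 'output_dim': 128,
          'activation': 'relu', 'trainable': True}
with h5py.File('config_demo.h5', 'w') as f:
    g = f.create_group('layer_0')
    for key, value in config.items():
        g.attrs[key] = value
```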
Yes, that would be really easy unless the tuples are heterogeneous (i.e., have strings mixed with floats or something like this), but it's something we can probably work around. Should I make a pull request with my current code and we add the config/description stuff later? |
So far there are no cases where we would have heterogeneous tuples, and that's very unlikely to happen in the future. We'll be fine : ) Sure, we can merge your saving function before we add this stuff.
For people coming here from Google looking for a way to serialize a Keras model: How can I save a Keras model? |
This discussion started in #51, but as things can get complicated I decided to start another issue.
It seems to be a good idea to store the weights for a model separately in an HDF5 file (or even a NumPy .npy file, but HDF5 would be more portable). I wanted to compare how large a serialized model is with and without the weights, so I did the following test:
(the model is intentionally large!)
I then compiled the model, serialized it after compilation, and removed the weights and post-compilation Theano objects/functions as follows:
The full compiled model ends up at 243 MB, and the cleaned-up model at 120 MB (which is exactly the same as we would get from pickling the non-compiled model with the weight matrices deleted). Is there anything else we could remove to make the serialized model smaller?
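The general point, that the weight matrices dominate the pickle while the configuration is tiny, is easy to reproduce with a toy stand-in (illustrative numbers, not the actual model above):

```python
import pickle
import numpy as np

def pickled_size(obj):
    """Bytes of an object's pickle, a stand-in for its on-disk cost."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))

# A toy "model": a small config dict plus large float32 weight matrices.
config = {'layers': ['Dense(2048, 2048)'] * 3, 'loss': 'mse'}
weights = [np.zeros((2048, 2048), dtype='float32') for _ in range(3)]

size_with = pickled_size({'config': config, 'weights': weights})
size_without = pickled_size({'config': config})
# The weights account for ~50 MB; the config pickles to a few hundred bytes.
```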