Skip to content

Latest commit

 

History

History
268 lines (196 loc) · 8.32 KB

serialization.rst

File metadata and controls

268 lines (196 loc) · 8.32 KB

Serialization Tutorial

In this package there are two ways of performing serialization and deserialization: the "classic" and "new" methods. The classic method predates the built-in JSON serialization of :mod:`param` while the new method extends the built-in serialization to new file types. The new method is still in beta.

Classic method

As an example, suppose we have parameterized classes and instances:

import param
class TrainingHyperparameters(param.Parameterized):
    lr = param.Number(1e-5, doc='The learning rate')
    max_epochs = param.Integer(10)
    model_regex = param.String(
        "model-{epoch:05d}.pkl",
        doc='Regular exp for storing model weights after every epoch')

t_params = TrainingHyperparameters()

class ModelHyperparameters(param.Parameterized):
    layers = param.ListSelector(
        [], objects=['conv', 'fc', 'recurrent'],
        doc='Sequence of layers by type, bottom-first')
    activations = param.ObjectSelector('relu', objects=['tanh', 'relu'])

m_params = ModelHyperparameters()
m_params.layers = ['conv', 'conv', 'fc']

param_dict = {
  'training': t_params,
  'model': m_params,
}

We can serialize these easily into JSON, YAML, or INI using :mod:`pydrobert.param.serialization`:

import pydrobert.param.serialization as serial
serial.serialize_to_json('conf.json', param_dict)
serial.serialize_to_yaml('conf.yaml', param_dict)  # requires ruamel.yaml or pyyaml
serial.serialize_to_ini('conf.ini', param_dict)

where we get

{
  "training": {
    "lr": 1e-05,
    "max_epochs": 10,
    "model_regex": "model-{epoch:05d}.pkl"
  },
  "model": {
    "activations": "relu",
    "layers": [
      "conv",
      "conv",
      "fc"
    ]
  }
}

or

training:
  lr: 1e-05  # The learning rate
  max_epochs: 10
  model_regex: model-{epoch:05d}.pkl # Regular exp for storing model weights after every epoch
model:
  activations: relu  # Choices: "tanh", "relu"
  layers:  # Sequence of layers by type, bottom-first. Element choices: "conv", "fc", "recurrent"
    - conv
    - conv
    - fc

or

# == Help ==
# [training]
# lr: The learning rate
# model_regex: Regular exp for storing model weights after every epoch

# [model]
# activations: Choices: "tanh", "relu"
# layers: Sequence of layers by type, bottom-first. A JSON string. Element choices: "conv", "fc", "recurrent"


[training]
lr = 1e-05
max_epochs = 10
model_regex = model-{epoch:05d}.pkl

[model]
activations = relu
layers = ["conv", "conv", "fc"]

respectively.

Deserialization proceeds similarly. Files can be used to populate parameters in existing parameterized instances.

t_params.lr = 10000.
assert t_params.lr == 10000.
serial.deserialize_from_yaml('conf.yaml', param_dict)
assert t_params.lr == 1e-05

:mod:`pydrobert.param.argparse` contains convenience functions for (de)serializing config files right from the command line.

import argparse, pydrobert.param.argparse as pargparse
parser = argparse.ArgumentParser()
pargparse.add_parameterized_read_group(parser, parameterized=param_dict)
pargparse.add_parameterized_print_group(parser, parameterized=param_dict)

Sometimes, the default (de)serialization routines are unsuited for the data. For example, INI files do not have a standard format for lists of values. For this, and many other container types, values are parsed with JSON syntax. If we wanted to parse lists differently, such as a comma-delimited list, we can design a custom serializer and deserializer for handling our layers parameter:

class CommaSerializer(serial.DefaultListSelectorSerializer):
    def help_string(self, name, parameterized):
        choices_help_string = super(CommaSerializer, self).help_string(name, parameterized)
        return 'Elements separated by commas. ' + choices_help_string

    def serialize(self, name, parameterized):
        val = super(CommaSerializer, self).serialize(name, parameterized)
        return ','.join(str(x) for x in val)

class CommaDeserializer(serial.DefaultListSelectorDeserializer):
    def deserialize(self, name, block, parameterized):
        block = block.split(',')
        super(CommaDeserializer, self).deserialize(name, block, parameterized)

serial.serialize_to_ini(
    'conf.ini', param_dict,
    # (de)serialize by type
    serializer_type_dict={param.ListSelector: CommaSerializer()},
)
serial.deserialize_from_ini(
    'conf.ini', param_dict,
    # or by name!
    deserializer_name_dict={'model': {'layers': CommaDeserializer()}},
)

With conf.ini:

# == Help ==
# [training]
# lr: The learning rate
# model_regex: Regular expression for storing model weights after every epoch

# [model]
# activations: Choices: "tanh", "relu"
# layers: Sequence of layers by type, bottom-first. Elements separated by commas. Element choices: "conv", "fc", "recurrent"


[training]
max_epochs = 10
model_regex = model-{epoch:05d}.pkl
lr = 1e-05

[model]
activations = relu
layers = conv,conv,fc

New method

Because (de)serialization is straightforward in most cases, the :mod:`param` built-in serialization protocol matches the classic serialization protocol above in most values for JSON:

t_params = TrainingHyperparameters()
with open("conf.json", "w") as f:
    f.write(t_params.param.serialize_parameters())

yielding

{"name": "TrainingHyperparameters00002", "lr": 1e-05, "max_epochs": 10, "model_regex": "model-{epoch:05d}.pkl"}

Note the additional inclusion of the "name" parameter. Deserialization is similarly performed:

with open("conf.json") as f:
    t_params = TrainingHyperparameters.param.deserialize_parameters(f.read())

Using a similar strategy as :mod:`param` did for JSON, I have extended serialization to YAML. The custom protocol requires registration once at runtime to be used

serial.register_serializer("yaml")

Afterwards files can be read and written to in YAML.

with open("conf.yaml", "w") as f:
    f.write(t_params.param.serialize_parameters(mode="yaml"))

yielding

name: TrainingHyperparameters00002  # String identifier for this object.
lr: 1e-05 # The learning rate
max_epochs: 10
model_regex: model-{epoch:05d}.pkl  # Regular exp for storing model weights after every epoch

There are a few other goodies as well. Once again there are convenience functions for (de)serialization to/from different file types (including JSON)

parser = argparse.ArgumentParser()
pargparse.add_deserialization_group_to_parser(
    parser, TrainingHyperparameters, 't_params')
pargparse.add_serialization_group_to_parser(parser, t_params)
namespace = parser.parse_args(['--read-json', 'conf.json'])
assert namespace.t_params.pprint() == t_params.pprint()
parser.parse_args(['--print-yaml'])  # prints to stdout and exits

You'll note that the new style does away with the dictionary of parameterized objects. :mod:`param` prefers to recreate this structure by nesting parameterized instances as parameters. As of writing, nesting cannot be serialized by default in :mod:`param`. :mod:`pydrobert.param` offers a solution in the form of "reckless" parsing. Once registered, the :obj:`'reckless_json'` and :obj:`'reckless_yaml'` act as drop-in replacements for the :obj:`'json'` and :obj:`'yaml'` modes which can also handle nesting. Unfortunately, they do so by making assumptions which aren't always correct. See :func:`pydrobert.param.serialization.register_serializer` for more discussion.