
Save/load functions and function sets #181

Closed · beam2d opened this issue Jul 13, 2015 · 6 comments
Labels: cat:feature (Implementation that introduces new interfaces.)
beam2d (Member) commented Jul 13, 2015:

Function and FunctionSet objects can be pickled, but if one trains on GPU and wants to port the model to a non-GPU environment, the object has to be migrated to CPU before pickling. If pickling takes place during training, one must first deep-copy the object and move the copy to CPU so that the original stays on the GPU; this is redundant and uses too much GPU memory. We want an efficient way to save/load models regardless of CPU/GPU.

@delta2323 delta2323 added the cat:feature Implementation that introduces new interfaces. label Jul 18, 2015
jfsantos (Contributor) commented:
I believe the safest way of saving a model is to save only the weight/bias variables, without necessarily saving the objects themselves (so you would always dump CPU-based variables, as these are the common denominator for all users). Also, pickling is not reliable for model saving, since there are issues loading a Python 2 pickle file with Python 3 and vice versa. It would be safer to export the model parameters to HDF5 (where you even get some compression for free) or NumPy binary files.
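A minimal sketch of the NumPy-binary approach suggested above (the parameter names and the `params` dict are hypothetical, for illustration only; real code would gather each function's `W`/`b` arrays after moving them to CPU):

```python
import io
import numpy as np

# Hypothetical parameter arrays collected from a model.
params = {
    "W1": np.random.randn(4, 3).astype(np.float32),
    "b1": np.zeros(4, dtype=np.float32),
}

# Save only the raw arrays -- no Python objects are serialized, so there
# are no pickle compatibility problems between Python 2 and 3.
buf = io.BytesIO()  # a real file path works the same way
np.savez(buf, **params)

# Load the arrays back; they would then be copied into a freshly
# constructed model.
buf.seek(0)
loaded = np.load(buf)
restored = {name: loaded[name] for name in loaded.files}
assert all(np.array_equal(params[k], restored[k]) for k in params)
```

The same pattern with `h5py` would additionally give chunked compression, as noted above.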

@unnonouno unnonouno added this to the v1.3.0 milestone Aug 5, 2015
@unnonouno unnonouno self-assigned this Aug 5, 2015
unnonouno (Member) commented:
Is portability to other languages important? A colleague of mine is trying to use a trained model in JavaScript.
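For cross-language portability like the JavaScript case above, one lowest-common-denominator option (a sketch, not an existing Chainer feature) is to dump the weights as JSON, which any JavaScript runtime can parse directly; the `weights` dict here is hypothetical:

```python
import json
import numpy as np

# Hypothetical weights; nested lists are what a browser-side
# JSON.parse() can consume without any extra decoding logic.
weights = {
    "W": np.arange(6, dtype=np.float32).reshape(2, 3),
    "b": np.zeros(2, dtype=np.float32),
}

# Serialize: ndarray -> nested Python lists -> JSON text.
serialized = json.dumps({k: v.tolist() for k, v in weights.items()})

# Round-trip back to NumPy (on the JS side this would instead be
# loaded into typed arrays).
decoded = {k: np.array(v) for k, v in json.loads(serialized).items()}
assert np.array_equal(decoded["W"], weights["W"])
```

JSON is verbose for large models, so a binary format with a JS reader would be preferable in practice; this only illustrates the portability argument.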

@unnonouno unnonouno removed this from the v1.3.0 milestone Sep 2, 2015
unnonouno (Member) commented:
We are now redesigning Chainer in #363, which includes this issue.

@unnonouno unnonouno added this to the v1.4.0 milestone Sep 30, 2015
ejlb (Contributor) commented Oct 26, 2015:

@beam2d @unnonouno

I've come across another problem around model saving when training on multiple GPUs. In function.to_gpu() we have

if isinstance(v, numpy.ndarray):
   setattr(self, k, cuda.cupy.array(v))

The unpickled model arrays have the type chainer.cuda.ndarray and so they do not get transferred to the GPU (I think, but I might be wrong). My workaround is to move a copy of the function set to the CPU before pickling, which forces all arrays to be numpy.ndarray:

pickle.dump(copy.deepcopy(self.function_set).to_cpu(), f)

It works, but it is obviously not ideal :) You can replicate this bug by adding the following lines to the end of train_mnist_model_parallel.py:

import pickle
pickle.dump(model, open('test.pkl', 'w'))
model = pickle.load(open('test.pkl', 'r'))

which gives the following error

Traceback (most recent call last):
  File "./train_mnist_model_parallel.py", line 113, in <module>
    loss, acc = forward(x_batch, y_batch)
  File "./train_mnist_model_parallel.py", line 72, in forward
    h1_1 = F.dropout(F.relu(model.gpu1.l1(x_1)),  train=train)
  File "/usr/local/lib/python2.7/dist-packages/chainer-1.3.2-py2.7.egg/chainer/function.py", line 174, in __call__
    outputs = self.forward(in_data)
  File "/usr/local/lib/python2.7/dist-packages/chainer-1.3.2-py2.7.egg/chainer/functions/connection/linear.py", line 113, in forward
    Wx += self.b
  File "/usr/local/lib/python2.7/dist-packages/chainer-1.3.2-py2.7.egg/cupy/__init__.py", line 838, in __iadd__
    return add(self, other, self)
  File "/usr/local/lib/python2.7/dist-packages/chainer-1.3.2-py2.7.egg/cupy/elementwise.py", line 644, in __call__
    _check_args(args)
  File "/usr/local/lib/python2.7/dist-packages/chainer-1.3.2-py2.7.egg/cupy/elementwise.py", line 82, in _check_args
    % (arg.device.id, dev.id))
ValueError: Array device must be same as the current device: array device = 0 while current = 1
Exception cupy.cuda.driver.CUDADriverError: CUDADriverError('CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered',) in <bound method Module.__del__ of <cupy.cuda.module.Module object at 0x7fe84fbdbd10>> ignored

This works fine for train_mnist.py on a single GPU. I think this has something to do with device_id defaulting to 0.
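The copy-then-pickle workaround described above can be sketched end to end with a stand-in object (FakeFunctionSet is hypothetical; real code would use Chainer's FunctionSet, whose to_cpu() converts cupy arrays back to numpy):

```python
import copy
import pickle
import numpy as np

class FakeFunctionSet(object):
    """Stand-in for a Chainer FunctionSet, for illustration only."""
    def __init__(self):
        self.W = np.random.randn(3, 3).astype(np.float32)

    def to_cpu(self):
        # In real Chainer this converts GPU (cupy) arrays to numpy and
        # returns self; here the array is already numpy, so it's a no-op.
        return self

model = FakeFunctionSet()

# Pickle a deep CPU copy so the original (possibly GPU-resident)
# object is left untouched and training can continue on it.
data = pickle.dumps(copy.deepcopy(model).to_cpu())
restored = pickle.loads(data)
assert isinstance(restored.W, np.ndarray)
```

The deep copy is exactly the memory cost the original issue complains about, which is why a dedicated save/load mechanism is wanted.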

@unnonouno unnonouno modified the milestones: v1.5.0, v1.4.0 Oct 27, 2015
unnonouno (Member) commented:
Chainer now supports serialization in the master branch.
http://docs.chainer.org/en/latest/reference/core/serializer.html
I'll close this issue.

unnonouno (Member) commented:
Fixed in #573.
