
Save/load functions and function sets #181

Closed · beam2d opened this issue Jul 13, 2015 · 6 comments
Labels: cat:feature (Implementation that introduces new interfaces.)
beam2d (Member) commented Jul 13, 2015:

Function and FunctionSet objects can be pickled, but if one trains on GPU and wants to port the model to a non-GPU environment, the object has to be migrated to CPU before pickling. If pickling takes place during training, one must first deep-copy the object and move the copy to CPU so that the original stays on the GPU; this is redundant and uses too much GPU memory. We want an efficient way to save/load models regardless of CPU/GPU.

@delta2323 delta2323 added the cat:feature Implementation that introduces new interfaces. label Jul 18, 2015
jfsantos (Contributor) commented:
I believe the safest way of saving a model is to save only the weight/bias variables, without necessarily saving the objects themselves (so you would always dump CPU-based variables, as these are the common denominator for all users). Also, pickling is not reliable for model saving, since there are issues loading a Python 2 pickle file with Python 3 and vice versa. It would be safer to export the model parameters to HDF5 (where you even get some compression for free) or NumPy binary files.
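A minimal sketch of the NumPy-binary approach suggested above (the parameter names and the `params` dict are hypothetical, for illustration only; real code would gather each function's `W`/`b` arrays after moving them to CPU):

```python
import io
import numpy as np

# Hypothetical parameter arrays collected from a model.
params = {
    "W1": np.random.randn(4, 3).astype(np.float32),
    "b1": np.zeros(4, dtype=np.float32),
}

# Save only the raw arrays -- no Python objects are serialized, so there
# are no pickle compatibility problems between Python 2 and 3.
buf = io.BytesIO()  # a real file path works the same way
np.savez(buf, **params)

# Load the arrays back; they would then be copied into a freshly
# constructed model.
buf.seek(0)
loaded = np.load(buf)
restored = {name: loaded[name] for name in loaded.files}
assert all(np.array_equal(params[k], restored[k]) for k in params)
```

The same pattern with `h5py` would additionally give chunked compression, as noted above.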

@unnonouno unnonouno added this to the v1.3.0 milestone Aug 5, 2015
@unnonouno unnonouno self-assigned this Aug 5, 2015
unnonouno (Member) commented:
Is portability to other languages important? A colleague of mine is trying to use a trained model in JavaScript.
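For cross-language portability like the JavaScript case above, one lowest-common-denominator option (a sketch, not an existing Chainer feature) is to dump the weights as JSON, which any JavaScript runtime can parse directly; the `weights` dict here is hypothetical:

```python
import json
import numpy as np

# Hypothetical weights; nested lists are what a browser-side
# JSON.parse() can consume without any extra decoding logic.
weights = {
    "W": np.arange(6, dtype=np.float32).reshape(2, 3),
    "b": np.zeros(2, dtype=np.float32),
}

# Serialize: ndarray -> nested Python lists -> JSON text.
serialized = json.dumps({k: v.tolist() for k, v in weights.items()})

# Round-trip back to NumPy (on the JS side this would instead be
# loaded into typed arrays).
decoded = {k: np.array(v) for k, v in json.loads(serialized).items()}
assert np.array_equal(decoded["W"], weights["W"])
```

JSON is verbose for large models, so a binary format with a JS reader would be preferable in practice; this only illustrates the portability argument.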

@unnonouno unnonouno removed this from the v1.3.0 milestone Sep 2, 2015
unnonouno (Member) commented:
We are now redesigning Chainer in #363, which includes this issue.

@unnonouno unnonouno added this to the v1.4.0 milestone Sep 30, 2015
ejlb (Contributor) commented Oct 26, 2015:

@beam2d @unnonouno

I've come across another problem around model saving when training on multiple GPUs. In function.to_gpu() we have

if isinstance(v, numpy.ndarray):
   setattr(self, k, cuda.cupy.array(v))

The unpickled model arrays have the type chainer.cuda.ndarray and so they do not get transferred to the GPU (I think, but I might be wrong). My workaround is to move a copy of the function set to the CPU before pickling, which forces all arrays to be numpy.ndarray:

pickle.dump(copy.deepcopy(self.function_set).to_cpu(), f)

It works, but it is obviously not ideal :) You can replicate this bug by adding the following lines to the end of train_mnist_model_parallel.py:

import pickle
pickle.dump(model, open('test.pkl', 'w'))
model = pickle.load(open('test.pkl', 'r'))

which gives the following error

Traceback (most recent call last):
  File "./train_mnist_model_parallel.py", line 113, in <module>
    loss, acc = forward(x_batch, y_batch)
  File "./train_mnist_model_parallel.py", line 72, in forward
    h1_1 = F.dropout(F.relu(model.gpu1.l1(x_1)),  train=train)
  File "/usr/local/lib/python2.7/dist-packages/chainer-1.3.2-py2.7.egg/chainer/function.py", line 174, in __call__
    outputs = self.forward(in_data)
  File "/usr/local/lib/python2.7/dist-packages/chainer-1.3.2-py2.7.egg/chainer/functions/connection/linear.py", line 113, in forward
    Wx += self.b
  File "/usr/local/lib/python2.7/dist-packages/chainer-1.3.2-py2.7.egg/cupy/__init__.py", line 838, in __iadd__
    return add(self, other, self)
  File "/usr/local/lib/python2.7/dist-packages/chainer-1.3.2-py2.7.egg/cupy/elementwise.py", line 644, in __call__
    _check_args(args)
  File "/usr/local/lib/python2.7/dist-packages/chainer-1.3.2-py2.7.egg/cupy/elementwise.py", line 82, in _check_args
    % (arg.device.id, dev.id))
ValueError: Array device must be same as the current device: array device = 0 while current = 1
Exception cupy.cuda.driver.CUDADriverError: CUDADriverError('CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered',) in <bound method Module.__del__ of <cupy.cuda.module.Module object at 0x7fe84fbdbd10>> ignored

This works fine for train_mnist.py on a single GPU. I think this has something to do with device_id defaulting to 0.
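The copy-then-pickle workaround described above can be sketched end to end with a stand-in object (FakeFunctionSet is hypothetical; real code would use Chainer's FunctionSet, whose to_cpu() converts cupy arrays back to numpy):

```python
import copy
import pickle
import numpy as np

class FakeFunctionSet(object):
    """Stand-in for a Chainer FunctionSet, for illustration only."""
    def __init__(self):
        self.W = np.random.randn(3, 3).astype(np.float32)

    def to_cpu(self):
        # In real Chainer this converts GPU (cupy) arrays to numpy and
        # returns self; here the array is already numpy, so it's a no-op.
        return self

model = FakeFunctionSet()

# Pickle a deep CPU copy so the original (possibly GPU-resident)
# object is left untouched and training can continue on it.
data = pickle.dumps(copy.deepcopy(model).to_cpu())
restored = pickle.loads(data)
assert isinstance(restored.W, np.ndarray)
```

The deep copy is exactly the memory cost the original issue complains about, which is why a dedicated save/load mechanism is wanted.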

@unnonouno unnonouno modified the milestones: v1.5.0, v1.4.0 Oct 27, 2015
unnonouno (Member) commented:
Chainer now supports serialization in the master branch.
http://docs.chainer.org/en/latest/reference/core/serializer.html
I'll close this issue.

unnonouno (Member) commented:
Fixed in #573.
