Convert and clearState #138

Closed
melgor opened this Issue Mar 17, 2016 · 8 comments

melgor commented Mar 17, 2016

Recently I wanted to use the "convert" method to reduce the size of models (cmusatyalab/openface#110), and my previous method worked much better than clearState. I found out why:

  1. "_gradOutput" is never cleared in any module.
  2. "gradBias" and "gradWeight" are not cleared in SpatialConvolution.
     Shouldn't these buffers be added to the "clearState" function?

The last thing is this line. Why not use "clearState()" here, regardless of whether this is the 'nn' or 'cudnn' version?
"clearState()" can have different implementations in nn and cudnn, which can cause buffers that exist only in the cudnn version to never be cleared.
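Until those buffers are handled by clearState itself, they can be nil'ed manually before serialization. A minimal sketch, assuming a standard torch/nn model (`deepClear` is a hypothetical helper; `Module:apply` is the standard nn traversal call, and the field names are the ones discussed above):

```lua
require 'nn'

-- Hypothetical workaround: after the regular clearState, walk the
-- model and nil the extra scratch buffers it does not (yet) touch.
local function deepClear(model)
   model:clearState()               -- clears output, gradInput, etc.
   model:apply(function(m)
      m._gradOutput = nil           -- cudnn scratch buffer (see point 1)
      m._input = nil
   end)
   return model
end

-- Usage before saving:
-- torch.save('model_light.t7', deepClear(model))
```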

fmassa commented Mar 17, 2016

Indeed, _gradOutput and _input are not cleared in clearState, as mentioned here, and that should be included.
About gradWeight and gradBias, as mentioned in cmusatyalab/openface#110, we can't just free them; we also need to reconstruct them before training, so I think they are out of scope for clearState.

About calling :clearState instead of :clearDesc, I think that's a good idea. It's probably like this because convert was merged in before clearState.

soumith commented Mar 17, 2016

(1) is fixed.
(2) is out of scope of clearState unfortunately.

@soumith soumith closed this Mar 17, 2016

melgor commented Mar 17, 2016

@soumith Thanks for the fast response and action. I see you added _gradOutput only to SpatialConvolution; it would be great if you could also add it to Pointwise and the other modules that have a "_gradOutput" buffer.

soumith commented Mar 17, 2016

sure, adding it now.

vgatto commented Apr 29, 2016

@soumith I understand (2) is out of scope for changes to clearState. Is there a work-around? Even when zeroed out, gradWeight is still pretty big when written to disk.

soumith commented May 10, 2016

@vgatto you could write a small pair of save and load functions. Save goes through the model and nils all the gradWeight and gradBias tensors. Load goes through the model and makes gradWeight the same size as weight, and gradBias the same size as bias. However, parameter sharing will break in this scheme.
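A minimal sketch of such a pair of functions (the names `saveLight`/`loadLight` are hypothetical; this is the naive version that, as noted, breaks parameter sharing):

```lua
require 'nn'

-- Save: nil the gradient tensors so they are not serialized.
local function saveLight(path, model)
   model:clearState()
   model:apply(function(m)
      m.gradWeight = nil
      m.gradBias = nil
   end)
   torch.save(path, model)
end

-- Load: reallocate the gradients so training can resume.
local function loadLight(path)
   local model = torch.load(path)
   model:apply(function(m)
      if m.weight then m.gradWeight = m.weight:clone():zero() end
      if m.bias then m.gradBias = m.bias:clone():zero() end
   end)
   return model
end
```

Because `clone()` allocates a fresh tensor per module, any gradWeight/gradBias that were shared across modules before saving come back as independent tensors, which is why sharing breaks here.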

fmassa commented May 10, 2016

@vgatto to complement @soumith's answer, here is some code (present in optnet) that does exactly what Soumith mentioned, but also takes care of the sharing scheme.

vgatto commented May 10, 2016

@soumith, @fmassa - Thanks for the pointers, that definitely works for me. It seems like the long-term solution would be enhancing the nn.Module interface to support this save/load case and having each module implement whatever makes sense: basically, another clearState specifically for this purpose. Is there an enhancement issue filed for this?
