a strange problem about saving memory in training process #64

Closed
zhang-wen opened this issue Jun 11, 2017 · 1 comment
zhang-wen commented Jun 11, 2017

Hello,

I have run into a strange problem.

If I use the memoryEfficientLoss function to backpropagate the loss, training behaves normally.

But if I inline the body of memoryEfficientLoss directly into the trainEpoch function, without defining a separate memoryEfficientLoss function, training no longer converges, even though all the other code is identical.

Another question: I guess that splitting the model's outputs along the first dimension is what saves memory, but if so, how are the gradients computed and backpropagated, and why is backward() called twice (loss.backward() and outputs.backward())? Can you explain this?

Can anyone tell me why? Any reply will be appreciated.
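For context, here is a minimal sketch of the split-and-double-backward pattern the question refers to. It is not the exact OpenNMT-py code; the function name, `generator`, `criterion`, and `chunk_size` are illustrative, assuming decoder outputs of shape (seq_len, batch, hidden) and a vocabulary projection that dominates memory use.

```python
import torch

def memory_efficient_loss(outputs, targets, generator, criterion, chunk_size=32):
    """Compute the loss over chunks of timesteps to avoid materializing
    the full vocabulary-sized projection for every timestep at once."""
    # Detach the decoder outputs from the encoder/decoder graph and make the
    # detached tensor a leaf, so gradients w.r.t. the outputs accumulate here.
    detached = outputs.detach()
    detached.requires_grad_(True)

    total_loss = 0.0
    # torch.split cuts along the first (time) dimension by default.
    for out_chunk, tgt_chunk in zip(torch.split(detached, chunk_size),
                                    torch.split(targets, chunk_size)):
        scores = generator(out_chunk.view(-1, out_chunk.size(-1)))
        loss = criterion(scores, tgt_chunk.view(-1))
        total_loss += loss.item()
        # First backward: frees this chunk's activations right away and
        # accumulates d(loss)/d(decoder outputs) into `detached.grad`.
        loss.backward()

    return total_loss, detached.grad

# In the training loop (names illustrative):
#   outputs = model(src, tgt)
#   loss, grad_output = memory_efficient_loss(outputs, tgt[1:], model.generator, criterion)
#   outputs.backward(grad_output)   # second backward: through encoder/decoder
#   optimizer.step()
```

The first backward (per chunk) only reaches the detached leaf, so its gradient is collected in `detached.grad`; the second call, `outputs.backward(grad_output)`, then propagates that accumulated gradient through the rest of the model. Only one chunk's vocabulary-sized projection is alive at any time, which is where the memory saving comes from.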

zhang-wen changed the title from "a strange problem" to "a strange problem about saving memory in training process" on Jun 11, 2017
srush (Contributor) commented Jul 5, 2017

Check out the new version of the code. I think it makes this function much clearer.

srush closed this as completed Jul 5, 2017