Reduction in memory requirements: Add SplitInitializer for separate initialization #4

Open
marhlder wants to merge 1 commit into master
Conversation

@marhlder commented Mar 25, 2018

This dramatically reduces memory requirements, as an extra copy of the concatenated weight tensor will no longer be kept for each timestep during backprop.
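(Not part of the original PR text.) A minimal sketch of the idea, assuming a TF 1.x initializer signature of `(shape, dtype, partition_info)`; the class and parameter names below are hypothetical and not necessarily the ones used in this PR:

```python
import tensorflow as tf

class SplitInitializer(object):
    """Hypothetical sketch (not the PR's actual code): build one concatenated
    kernel whose blocks along `axis` are initialized as if they were separate
    variables, so the graph never needs a per-timestep tf.concat."""

    def __init__(self, num_splits, base_initializer=None, axis=1):
        self.num_splits = num_splits
        # Default to the same initializer tf.get_variable would use.
        self.base_initializer = base_initializer or tf.glorot_uniform_initializer()
        self.axis = axis

    def __call__(self, shape, dtype=None, partition_info=None):
        # Initialize each block independently, then concatenate once at
        # variable-creation time instead of once per timestep.
        block_shape = list(shape)
        block_shape[self.axis] = block_shape[self.axis] // self.num_splits
        blocks = [self.base_initializer(block_shape, dtype)
                  for _ in range(self.num_splits)]
        return tf.concat(blocks, axis=self.axis)

# Usage sketch: a single variable replaces what used to be separate kernels.
# kernel = tf.get_variable(
#     "kernel", [input_depth + num_units, 4 * num_units],
#     initializer=SplitInitializer(num_splits=4))
```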

@marhlder marhlder changed the title add SplitInitializer for separate initialization Reduction in memory requirements: add SplitInitializer for separate initialization Mar 25, 2018
@marhlder marhlder changed the title Reduction in memory requirements: add SplitInitializer for separate initialization Reduction in memory requirements: Add SplitInitializer for separate initialization Mar 26, 2018
@hannw (Owner) commented Apr 30, 2018

Hi @marhlder, would you mind explaining a bit more how this works? Just from the code, I do not quite understand how it reduces the memory requirement, since in the original code the kernels are also concatenated. Specifically, why does memory consumption relate to the initializer? My understanding is that using dynamic_rnn prevents the copying from happening.

@marhlder (Author) commented May 3, 2018

@hannw Thanks for your response. The problem is that TensorFlow's default backpropagation code saves a copy of the concatenated weight tensor (the kernel) for each timestep in the original code, because the concatenation op becomes part of the graph and runs at every timestep. You are correct that dynamic_rnn won't make extra copies of the individual kernels, only of the concatenated result. With the provided custom initializer, the concatenation happens only once, so backpropagation no longer needs to keep this intermediate value for each timestep. This is not noticeable for networks with few units per layer and short sequences, but it becomes very noticeable once you turn up the heat, e.g. sequences of length 200+, nesting depth 3, and 512 units per layer. Try, for instance, comparing the memory consumption of the original implementation at nesting depth 3 with 3 layers of a regular stacked LSTM.
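(Added for illustration, not code from this repo.) A rough TF 1.x sketch of the difference being described: when the concat sits inside the cell's per-step computation it re-executes at every timestep and its output is saved for the backward pass, whereas with a one-time concatenation at initialization there is only a single kernel variable in the graph:

```python
import tensorflow as tf

num_units = 512
input_depth = 512

# Original style (sketch): two sub-kernels concatenated inside the cell, so
# the concat op runs at every timestep under dynamic_rnn and its output is
# one of the intermediates kept for backprop at each step.
kernel_a = tf.get_variable("kernel_a", [input_depth + num_units, 4 * num_units])
kernel_b = tf.get_variable("kernel_b", [input_depth + num_units, 4 * num_units])

def step_with_per_timestep_concat(inputs, h):
    concat_kernel = tf.concat([kernel_a, kernel_b], axis=1)  # re-runs each step
    return tf.matmul(tf.concat([inputs, h], axis=1), concat_kernel)

# PR's idea (sketch): one kernel variable whose *initial value* is built from
# separately initialized blocks; nothing is concatenated per step, so there is
# no per-step concatenated intermediate to store.
kernel = tf.get_variable("kernel", [input_depth + num_units, 8 * num_units])

def step_with_single_kernel(inputs, h):
    return tf.matmul(tf.concat([inputs, h], axis=1), kernel)
```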
