WIP: state preserving LSTM #4
if (state_preserving == "false"):
    component_nodes.append("component-node name={0}_f1 component={0}_W_f-xr input=Append({1}, IfDefined(Offset({0}_{2}, {3})))".format(name, input_descriptor, recurrent_connection, lstm_delay))
else:
    component_nodes.append("component-node name={0}_f1 component={0}_W_f-xr input=Append({1}, Failover(Offset({0}_{2}, {3}), Offset(output_{0}_{2}_STATE_PREVIOUS_MINIBATCH, {3})))".format(name, input_descriptor, recurrent_connection, lstm_delay))
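To make the difference between the two branches easier to see, here is a minimal standalone sketch that builds only the `input=` descriptor for each case. The function name `recurrent_input_descriptor`, the boolean `state_preserving` argument (the diff compares against the string "false"), and the sample argument values are assumptions for illustration; the descriptor strings themselves follow the diff above.

```python
def recurrent_input_descriptor(name, input_descriptor, recurrent_connection,
                               lstm_delay, state_preserving):
    """Build the input descriptor of the forget-gate node, mirroring the diff.

    Hypothetical helper: the PR builds these strings inline.
    """
    # Recurrent connection read at time t + lstm_delay (lstm_delay is negative).
    recurrent = "Offset({0}_{1}, {2})".format(name, recurrent_connection, lstm_delay)

    if not state_preserving:
        # Original behaviour: the recurrent input is simply dropped where undefined.
        return "Append({0}, IfDefined({1}))".format(input_descriptor, recurrent)

    # State-preserving behaviour: fall back to the state saved from the
    # previous minibatch when the recurrent input is not available.
    previous = "Offset(output_{0}_{1}_STATE_PREVIOUS_MINIBATCH, {2})".format(
        name, recurrent_connection, lstm_delay)
    return "Append({0}, Failover({1}, {2}))".format(
        input_descriptor, recurrent, previous)


# Sample values below are made up for the example.
print(recurrent_input_descriptor("Lstm1", "input", "r_t", -3, False))
print(recurrent_input_descriptor("Lstm1", "input", "r_t", -3, True))
```

Only the wrapper around the recurrent `Offset` changes between the two branches: `IfDefined` in the original code, `Failover` to the previous-minibatch state in the state-preserving code.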
Do we need to use the Offset descriptor even for the STATE_PREVIOUS_MINIBATCH variable? Isn't its value constant across all time steps?
If the LSTM delay is larger in magnitude than 1, e.g. -3, then we need to supply three previous states, at t=-3, -2, -1, for the frames at t=0, 1, 2 respectively. In that case we need the Offset descriptor with value -3.
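The index arithmetic in this reply can be checked with a small sketch. It assumes a chunk starting at t=0 and that `Offset(x, d)` evaluated at time t reads x at time t + d; the function name `previous_state_times` and the chunk length of 8 are hypothetical and not taken from the PR.

```python
def previous_state_times(chunk_start, chunk_length, lstm_delay):
    """Map each frame t to the time t + lstm_delay it reads from,
    keeping only frames whose source time falls before the chunk."""
    assert lstm_delay < 0, "recurrent delays are expected to be negative"
    needed = {}
    for t in range(chunk_start, chunk_start + chunk_length):
        source_t = t + lstm_delay
        if source_t < chunk_start:
            # This frame cannot be served from inside the chunk, so it
            # needs a state carried over from the previous minibatch.
            needed[t] = source_t
    return needed


print(previous_state_times(0, 8, -3))  # {0: -3, 1: -2, 2: -1}
print(previous_state_times(0, 8, -1))  # {0: -1}
```

With a delay of -1 only one previous state is needed, which is why the question of multiple time steps only arises for larger delays.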
I don't think it is possible to store the previous state from multiple time steps in the same variable in the current framework. Did you try using this network with larger offsets and check that it works? I think we need separate variables to store each time offset from the previous minibatch.
In my implementation, I add to each example an additional input io containing multiple time steps of the same variable (lines 108-130 in nnet3-add-recurrent-io-to-egs.cc), in the same way as an ordinary input. I think this works as long as the Index of each additional input io is correct, i.e. all component nodes remain computable once the computation graph is built.
Nice that you managed to get this done with minimal variables.
…e egs for large chunks. It replaces the function of the previous option left-shift-window, but the option's name and description have not been changed yet
…g stats in training if we have more than one output node
cuda kernels for sparse matrix affine forward/backward prop