WIP: state preserving LSTM #4

Open
wants to merge 5 commits into
base: master
Conversation

Owner

@freewym freewym commented Dec 23, 2015

No description provided.

@freewym freewym force-pushed the splstm branch 28 times, most recently from 717e6ca to 3024543 Compare December 29, 2015 19:11
if (state_preserving == "false"):
    component_nodes.append("component-node name={0}_f1 component={0}_W_f-xr input=Append({1}, IfDefined(Offset({0}_{2}, {3})))".format(name, input_descriptor, recurrent_connection, lstm_delay))
else:
    component_nodes.append("component-node name={0}_f1 component={0}_W_f-xr input=Append({1}, Failover(Offset({0}_{2}, {3}), Offset(output_{0}_{2}_STATE_PREVIOUS_MINIBATCH, {3})))".format(name, input_descriptor, recurrent_connection, lstm_delay))
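
To make the two branches in the diff easier to compare, here is a minimal, self-contained sketch that renders the same config line for concrete values. The helper name `forget_gate_node` and the sample arguments (`Lstm1`, `input`, `r_t`, delay `-1`) are assumptions chosen for illustration; only the format strings come from the diff.

```python
def forget_gate_node(name, input_descriptor, recurrent_connection,
                     lstm_delay, state_preserving):
    """Render the f1 component-node line using the format strings from the diff.

    The function name and signature are hypothetical; the PR appends these
    strings to component_nodes directly.
    """
    # Recurrent input taken from inside the current chunk.
    recurrent = "Offset({0}_{1}, {2})".format(
        name, recurrent_connection, lstm_delay)
    if state_preserving == "false":
        # Without state preservation the recurrent input is optional.
        recurrent_input = "IfDefined({0})".format(recurrent)
    else:
        # With state preservation, fall back to the extra input that
        # carries the state saved from the previous minibatch.
        previous = "Offset(output_{0}_{1}_STATE_PREVIOUS_MINIBATCH, {2})".format(
            name, recurrent_connection, lstm_delay)
        recurrent_input = "Failover({0}, {1})".format(recurrent, previous)
    return ("component-node name={0}_f1 component={0}_W_f-xr "
            "input=Append({1}, {2})").format(
                name, input_descriptor, recurrent_input)


print(forget_gate_node("Lstm1", "input", "r_t", -1, "false"))
print(forget_gate_node("Lstm1", "input", "r_t", -1, "true"))
```

The only difference between the two printed lines is the second argument of `Append`: `IfDefined(...)` in the first case, `Failover(..., ...)` in the second.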

Do we need the Offset descriptor even for the STATE_PREVIOUS_MINIBATCH variable? Isn't its value constant across all time steps?

Owner Author

If the lstm delay reaches further back than -1, e.g. -3, then we need to add 3 previous states at t=-3, -2, -1 for the frames at t=0, 1, 2 respectively, in which case we need the Offset descriptor with value -3.
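
The arithmetic in this reply can be sketched as follows. This is an illustration only, assuming the chunk starts at t=0; the function name `frames_needing_previous_state` and the `chunk_size` parameter are hypothetical and do not appear in the PR.

```python
def frames_needing_previous_state(lstm_delay, chunk_size):
    """Map each frame t in [0, chunk_size) to the time t + lstm_delay it reads,
    keeping only the frames whose recurrent input falls before the chunk start
    (t + lstm_delay < 0). Assumes the chunk begins at t = 0."""
    return {t: t + lstm_delay
            for t in range(chunk_size)
            if t + lstm_delay < 0}


print(frames_needing_previous_state(-3, 20))  # {0: -3, 1: -2, 2: -1}
print(frames_needing_previous_state(-1, 20))  # {0: -1}
```

With a delay of -1 only the first frame reaches into the previous minibatch; with -3 the first three frames do, each at a different time.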

I don't think it is possible to add the previous state from multiple time steps in the same variable in the current framework. Did you try this network with larger offsets and check that it works? I think we need separate variables to store each time offset from the previous minibatch.

Owner Author

In my implementation, I add to each example an additional input io holding multiple time steps of the same variable (lines 108-130 in nnet3-add-recurrent-io-to-egs.cc), just the same way as an ordinary input. I think this works as long as the Index of each additional input io is correct, i.e. all the component nodes remain computable after the computation graph is built.
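
One way to picture the computability condition is the toy check below. It is not the code in nnet3-add-recurrent-io-to-egs.cc: the `Index` namedtuple is only a stand-in for nnet3's (n, t, x) index, and `extra_input_indexes` and `missing_indexes` are hypothetical helpers, assuming the chunk starts at t=0.

```python
from collections import namedtuple

# Toy stand-in for an nnet3 Index: n = sequence in the minibatch,
# t = time, x = extra dimension (left at 0 here).
Index = namedtuple("Index", ["n", "t", "x"])


def extra_input_indexes(lstm_delay, n=0):
    """Indexes a single extra input would carry: one per time step
    from lstm_delay up to -1 (assumes lstm_delay is negative)."""
    return [Index(n, t, 0) for t in range(lstm_delay, 0)]


def missing_indexes(provided, lstm_delay, chunk_size, n=0):
    """Indexes requested by Offset(..., lstm_delay) at frames 0..chunk_size-1
    that fall before the chunk start and are not among the provided ones."""
    required = {Index(n, t + lstm_delay, 0)
                for t in range(chunk_size)
                if t + lstm_delay < 0}
    return sorted(required - set(provided))


provided = extra_input_indexes(-3)
print(provided)
print(missing_indexes(provided, -3, 20))           # [] means every request is covered
print(missing_indexes([Index(0, -1, 0)], -3, 20))  # the t=-3 and t=-2 states are missing
```

An empty result means one variable with several time steps covers every out-of-chunk request, so separate variables per offset would not be needed in this toy model.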

Nice that you managed to get this done with minimal variables.

@freewym freewym force-pushed the splstm branch 7 times, most recently from c41ddd2 to 342c665 Compare January 23, 2016 17:01
@freewym freewym force-pushed the splstm branch 10 times, most recently from 4be93e2 to 6a43192 Compare July 1, 2016 22:27
…e egs for large chunks. It replaces the function of the previous option left-shift-window, but have not changed its name and description yet
@freewym freewym force-pushed the splstm branch 4 times, most recently from 7014a93 to f17b524 Compare July 3, 2016 16:21
…g stats in training if we have more than one output nodes
freewym pushed a commit that referenced this pull request Apr 6, 2017
cuda kernels for sparse matrix affine forward/backward prop
2 participants