Bi-Directional RNN support #66

Closed
wants to merge 15 commits into from

Conversation

SeanNaren

Re-opened request on a new branch.

Contains an implementation of BRNN for the LSTM module based on the BiSequencer module here.

Let me know of any feedback/alterations!

@SeanNaren
Author

@elbamos I've opened a PR here, but there is a merge with RNN happening here. It would be nice to get BRNN support for the LSTM in RNN; hopefully @jcjohnson agrees!

@elbamos

elbamos commented Apr 14, 2016

Cool! I just hope it doesn't make rnn even more complicated.


-- reverse output
local k = 1
for i = input:size(1), 1, -1 do
  self.output[k] = input[i]
  k = k + 1
end
jcjohnson (Owner)

I'm worried that an explicit loop for reversing will be very slow for CudaTensors, since each execution of this line will require device synchronization and a kernel launch. Can you reverse the sequence in a vectorized way to avoid this overhead?
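For illustration only, one vectorized approach is to precompute a reversal index and reverse the whole sequence with a single index (indexSelect) call. This is a sketch rather than the PR's code: it assumes the input is laid out as T x N x D with time on dimension 1, revIndices is a placeholder name, and index on CudaTensors may expect the index tensor in a device-specific type.

-- Sketch: reverse along the time dimension with one indexSelect call,
-- so a CudaTensor pays for a single kernel launch instead of one per timestep.
local T = input:size(1)
local revIndices = torch.range(T, 1, -1):long()  -- {T, T-1, ..., 1}
self.output = input:index(1, revIndices)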

@jcjohnson
Owner

@SeanNaren Sorry I've been slow to respond to this - I agree that bidirectional RNNs are a good thing that we should support.

Overall the BRNN and ReverseSequence API looks good to me, but I have some inline comments.

@SeanNaren
Author

No worries, thanks for the comments; I'll work through them, they all seem like very valid points! I agree about the loop, I wasn't really a fan of it, so I'll try to find a better solution. Any suggestions would be awesome!

@SeanNaren
Author

@jcjohnson Added a different version of reversing the sequence using this response. How does it look now?

For clearState, it seems to be handled in nn.Container, and I add the modules to the container, so hopefully that will take care of it. Let me know of any more feedback or if I missed anything!
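For context, a narrow-based reverse presumably looks roughly like the sketch below (an illustration only, not the PR's exact code); the per-timestep copy pattern is what the next comment refers to.

-- Sketch: narrow out one timestep at a time and copy it into the mirrored slot.
-- Each copy still launches a separate CUDA kernel on CudaTensors.
local T = input:size(1)
self.output:resizeAs(input)
for t = 1, T do
  self.output:narrow(1, t, 1):copy(input:narrow(1, T - t + 1, 1))
end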

@jcjohnson
Owner

@SeanNaren The solution using narrow is pretty much the same as before, just using a different syntax; each call to copy will still invoke a separate CUDA kernel. I haven't thought through the details, but I think you should be able to reverse the sequence all at once using a single call to gather, which will probably be much faster.

Other than that it's looking good. Can you add some unit tests to make sure everything works as expected?
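For reference, a gather-based reverse along the lines of the suggestion above could look roughly like the sketch below (an illustration only, assuming a T x N x D input; revIndices is a placeholder name). The index tensor must have the same shape as the output, with entry [t][n][d] naming the source timestep to read; on the CPU it is a LongTensor, while the GPU backend may expect the index in a device-resident type.

-- Sketch: build the reversal index once, then do the whole copy in one gather call.
local T, N, D = input:size(1), input:size(2), input:size(3)
local revIndices = torch.range(T, 1, -1):long()  -- {T, T-1, ..., 1}
revIndices = revIndices:view(T, 1, 1):expand(T, N, D):contiguous()
self.output = input:gather(1, revIndices)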

@SeanNaren
Author

@jcjohnson Thanks, your suggestion helped a lot! I've pushed a change that uses gather; let me know of any feedback you may have on it. I'll add some unit tests now.

@SeanNaren
Author

Added tests, let me know if there are any more issues/feedback.

-- build the reversal index, then reverse gradOutput with a single gather
for x = 1, T do
  indices:narrow(1, x, 1):fill(T - x + 1)
end
self.gradInput = gradOutput:gather(1, indices)

This allocates new memory at every function call. Could you do something like

self.gradInput:gather(gradOutput, 1, indices)

instead?
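In other words, the suggestion is to keep self.gradInput as a reusable buffer and use the destination form of gather. A rough sketch under that assumption, where indices is the reversal index built just above:

-- Sketch: write into the existing self.gradInput storage instead of
-- allocating a fresh tensor on every backward call.
self.gradInput = self.gradInput or gradOutput.new()
self.gradInput:resizeAs(gradOutput)
self.gradInput:gather(gradOutput, 1, indices)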

@fmassa

fmassa commented Apr 15, 2016

I have some comments on the code (mostly minor), but I commented on two things that I think deserve some attention. Overall I think this is nice :)

@SeanNaren
Author

@fmassa Thanks a lot man! Lots of good fixes, cheers. I think I've covered all the issues; let me know if you have any more feedback.

@yfliao

yfliao commented Jun 17, 2016

Dear SeanNaren,

How do I apply the BRNN? The following command fails:

$ th train.lua -input_h5 my_data.h5 -input_json my_data.json -model_type brnn
Running with CUDA on GPU 0
/home/liao/torch/install/bin/luajit: ./LanguageModel.lua:46: attempt to index local 'rnn' (a nil value)
stack traceback:
./LanguageModel.lua:46: in function '__init'
/home/liao/torch/install/share/lua/5.1/torch/init.lua:91: in function </home/liao/torch/install/share/lua/5.1/torch/init.lua:87>
[C]: in function 'LanguageModel'
train.lua:90: in main chunk
[C]: in function 'dofile'
...liao/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405d50

@ostrosablin

How do I integrate it into train.lua? The new code doesn't seem to be plugged into the existing torch-rnn facilities.

Also, how do they behave in practice? According to papers, they're good at certain kinds of tasks and achieve better loss than other types of RNN. But how do they compare with, say, an LSTM on character-level language modelling? Do they produce better or more interesting results?

@SeanNaren SeanNaren closed this Jun 1, 2017
This pull request was closed.