bi-directional rnn alignment when masked #2536

Closed
braingineer opened this Issue Apr 27, 2016 · 3 comments

Contributor

braingineer commented Apr 27, 2016

Related: #1941, #1703, and a ton of others, but they don't really capture this issue.

It came up in #2413 and #2393 when discussing masks for merge layers, but was deferred to another PR. This issue is basically here to discuss the best route.

Basically, with 0 as the mask value, a padded sequence and its naive reversal look like this:

abcd000
000dcba

The desired output is instead aligned, with the padding kept on the right:

abcd000
dcba000

That would give a properly aligned bi-directional RNN. Being able to set the offset would also be useful, if one wanted to.
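As a rough illustration of the alignment described above, here is a minimal pure-Python sketch. The helper name `reverse_valid_prefix` is hypothetical, and it assumes the padding is one contiguous run of the mask value at the end of the row. It is not the implementation from the gist, just a way to pin down the intended behaviour.

```python
def reverse_valid_prefix(row, mask_value=0):
    """Reverse the unmasked leading part of `row`, keeping padding on the right.

    Assumption: padding is a contiguous run of `mask_value` at the end.
    """
    length = len(row)
    # Walk back from the end until the last real (unmasked) element.
    while length > 0 and row[length - 1] == mask_value:
        length -= 1
    return row[:length][::-1] + row[length:]


forward = ["a", "b", "c", "d", 0, 0, 0]
backward = reverse_valid_prefix(forward)
print(backward)  # ['d', 'c', 'b', 'a', 0, 0, 0]
```

A plain `row[::-1]` would give `[0, 0, 0, 'd', 'c', 'b', 'a']` instead, which is the misaligned case shown above.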

TensorFlow does not have this problem; it has a custom implementation that addresses it.

I've scoured** the Theano, Lasagne, Blocks, and Pylearn issues and code, but I can't find any implementation that would work.

At the moment, I have the following possible solution:

https://gist.github.com/braingineer/4837baee2f2782f77cddc0dce852064d

** Footnote: the internets are vast, though, so I probably missed things.

Contributor

braingineer commented Apr 27, 2016

Sooo. I assumed earlier that aligning in this way was the original BRNN architecture style. I don't think it is. But it's something I am going to be using, and it may come up for someone else, so I'll just leave the issue up for now.

Contributor

xingdi-eric-yuan commented Apr 27, 2016

For example, take [1, 2, 3, 4, 0, 0] and say both W and U are ones.
Inputs:
forward: [1, 2, 3, 4, 0, 0]
backward: [0, 0, 4, 3, 2, 1] (just reversed)
After the RNN:
forward: [1, 3, 6, 10, 10, 10]
backward: [0, 0, 4, 7, 9, 10]
If return_sequences is off, you can directly return concat([10; 10]).
If return_sequences is on:
forward: [1, 3, 6, 10, 10, 10]
backward: [10, 9, 7, 4, 0, 0] (just reversed)
I don't think you want [4, 7, 9, 10, 0, 0] for backward, because that "4" corresponds to the "4" in the input, not to the first element, whereas "10" corresponds to that first "1".

If [4, 7, 9, 10, 0, 0] is what you want, then you can also do something like:
import theano.tensor as T

def left_align(row):  # wrapper name is only illustrative
    row_shape = row.shape
    row = row[row.nonzero()]  # keep only the non-zero (unmasked) entries
    pad = T.alloc(0., row_shape[0] - row.shape[0])  # zeros for the removed slots
    return T.concatenate([row, pad])
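The arithmetic in this example can be reproduced with a toy recurrence. The sketch below is plain Python, not Keras or Theano code: it assumes W = U = 1, no nonlinearity, and that a masked step simply carries the previous state forward. The function name `masked_running_sum` is made up for illustration.

```python
def masked_running_sum(xs, mask):
    """Toy RNN with W = U = 1 and no activation: h_t = x_t + h_{t-1}.

    On masked steps (mask == 0) the previous state is carried forward.
    """
    h, out = 0, []
    for x, m in zip(xs, mask):
        if m:
            h = x + h
        out.append(h)
    return out


x = [1, 2, 3, 4, 0, 0]
mask = [1, 1, 1, 1, 0, 0]

forward = masked_running_sum(x, mask)
# Run the backward pass on the reversed input, then reverse the outputs
# so that position t lines up with input position t again.
backward_raw = masked_running_sum(x[::-1], mask[::-1])
backward = backward_raw[::-1]

print(forward)       # [1, 3, 6, 10, 10, 10]
print(backward_raw)  # [0, 0, 4, 7, 9, 10]
print(backward)      # [10, 9, 7, 4, 0, 0]
```

After the second reversal, the backward state at position 0 (10) has seen the whole sequence and sits next to the forward state for the same input step, which is the correspondence the comment argues for.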

Contributor

braingineer commented Apr 27, 2016

Hi @xingdi-eric-yuan,

Thanks =). Ya, I think I got confused while reading a paper on something, got really focused on figuring out how to solve it, and didn't realize my mistake until after I posted.

Also, I didn't know about alloc!

Side note: nonzero has a quirk in how it indexes (for example, it gives you a tuple, so you have to do nonzero()[0]). I usually use it to compute accuracy with masks.

The issue here was that each row in your matrix will have a different amount of padding. And if you want the first element of a forward RNN to align with the first element of a backward RNN, it's a bit wonky.
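To make the per-row padding problem concrete, here is a small pure-Python sketch that builds, for each row, the source indices that reverse only the valid part. The names `aligned_reverse_indices` and `align_batch` are hypothetical, and the sketch assumes right-padded rows with a 0/1 mask. An index matrix like this could in principle drive a gather over a batch tensor, but that step is not shown here.

```python
def aligned_reverse_indices(mask_row):
    """Return idx such that out[t] = row[idx[t]] reverses the valid part.

    Assumption: mask_row is 1 for real steps, 0 for padding on the right.
    """
    length = sum(mask_row)
    # Valid positions read from the mirrored position; padding stays put.
    return [length - 1 - t if t < length else t for t in range(len(mask_row))]


def align_batch(batch, masks):
    """Apply the per-row index lists to every row of a padded batch."""
    return [
        [row[i] for i in aligned_reverse_indices(m)]
        for row, m in zip(batch, masks)
    ]


batch = [[1, 2, 3, 4, 0, 0], [5, 6, 0, 0, 0, 0]]
masks = [[1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 0, 0]]

print(aligned_reverse_indices(masks[0]))  # [3, 2, 1, 0, 4, 5]
print(align_batch(batch, masks))          # [[4, 3, 2, 1, 0, 0], [6, 5, 0, 0, 0, 0]]
```

Each row gets its own index list because the sequence lengths differ, which is exactly why a single reversal over the time axis cannot align the whole batch.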
