Merge doesn't support masking #2393
+1 for merged mask. Also, interesting use case. How should a backwards-running mask be pushed forward in this case? Let's say the mask is 110. m_fwd = 110 still, but m_back = 011, right? So then, when they concat, should the new mask be 110011? Is the backwards RNN outputting the mask backwards? I don't think it is. It looks like it's just pushing the mask forward if returning sequences, else None. In the concat mode, you would then just concat the incoming masks, right?
I opened a pull request here #2413
The forward mask is 110 and the backward mask is 011; when they are concatenated along axis -1, the new mask is [(1, 0), (1, 1), (0, 1)], but that's wrong. I think we should first pad the two input masks (actually, pad the two input sequences). The same process applies to the 'sum' and 'ave' merge modes, right?
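For concreteness, a quick numpy check (my own sketch, not from the thread) of the mask shapes being described:

```python
import numpy as np

m_fwd = np.array([1, 1, 0])   # forward mask over 3 timesteps
m_back = np.array([0, 1, 1])  # backward mask over the same timesteps

# Stacking the two masks along a new last axis gives the per-timestep
# pairs described above: [(1, 0), (1, 1), (0, 1)].
print(np.stack([m_fwd, m_back], axis=-1).tolist())  # [[1, 0], [1, 1], [0, 1]]
```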
I believe if you do ... If you used ...
I don't think I'm on the same page as you guys, so I'll try to be super explicit. I'm also changing the example slightly to make it more explicit what the shapes will be. When I write "Mask @ J", I mean the mask output from step J.

```python
from keras.layers import Input, Embedding, GRU, merge

# assumes shape variables: batch, time, feat_size, emb_size, rnn_size

#### 1. input is (batch, time)
#### Mask @ 1: None
input = Input(batch_shape=(batch, time), dtype='int32')  # feature at every (b, t)

#### 2. embedding is (batch, time, emb_size)
#### Mask @ 2: (batch, time) from K.not_equal(x, 0)
x = Embedding(input_dim=feat_size, output_dim=emb_size, mask_zero=True)(input)

#### 3. return_sequences makes this (batch, time, rnn_size).
#### Mask @ 3: (batch, time) from just pushing the mask through
x_forward = GRU(rnn_size, return_sequences=True)(x)

#### 4. return_sequences makes this (batch, time, rnn_size) and should reverse the mask
#### Mask @ 4: (batch, time) but should be reversed; probably with mask[:, ::-1]
x_backward = GRU(rnn_size, return_sequences=True, go_backwards=True)(x)

#### 5. Concatenation on the -1 dimension should result in (batch, time, rnn_size * 2)
#### Mask @ 5: (batch, time)..???? see below
rnn = merge([x_forward, x_backward], mode='concat')
```

So, step 5 is odd. And it presents an issue that isn't in your PR: the time dimension was reversed on the RNN, not the feature dimension, so the time dimension has the reversed mask. But the methodological question is, when masking, how do you align two sequences where one has been reversed?
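To make the step-4/step-5 question concrete, here is a small numpy sketch (mine, not from the original comment) of the reversed mask and why the two directions stop lining up:

```python
import numpy as np

# One sample, three timesteps; 1 = real data, 0 = padding.
mask = np.array([[1, 1, 0]])

m_fwd = mask            # step 3: mask pushed through unchanged
m_back = mask[:, ::-1]  # step 4: mask reversed in time -> [[0 1 1]]

# After a concat on the feature axis, timestep 0 is valid for the
# forward RNN but masked for the backward one, and vice versa at
# timestep 2, so a single (batch, time) mask no longer fits both.
print(m_fwd, m_back)
```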
I think I get @jiumem's comment now. Your recommendation was to use padding to realign the two RNNs. I think this is troublesome in general, though, because you might have different-length sequences along the batch dimension.

@codekansas If the mask is over the time dimension, the mask shouldn't change depending on the merge type, right? So if you're merging in whatever way on the feature dimension, the time dimension is untouched. But if you're merging on the time dimension, then you'd need to manage the mask.
In the PR I increased the mask dimensions to match the input, and concatenated the masks, here. So it outputs a mask with dimensions ...
Wait, I think I understand the issue now. If you have another RNN on top of the merged layer: ... You would want ... It seems like this may have been the intent for ...
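For concreteness, a hedged sketch of that pattern in the Keras 1.x functional API, with an RNN stacked on the merged bidirectional output. All sizes here are illustrative assumptions, and with masking enabled this is exactly the graph that currently fails at the merge:

```python
from keras.layers import Input, Embedding, GRU, merge

# Illustrative sizes; these numbers are assumptions, not from the thread.
batch, time, feat_size, emb_size, rnn_size = 32, 10, 1000, 64, 32

inp = Input(batch_shape=(batch, time), dtype='int32')
emb = Embedding(input_dim=feat_size, output_dim=emb_size, mask_zero=True)(inp)
fwd = GRU(rnn_size, return_sequences=True)(emb)
bwd = GRU(rnn_size, return_sequences=True, go_backwards=True)(emb)
merged = merge([fwd, bwd], mode='concat')

# The stacked GRU needs `merged` to still carry a (batch, time) mask,
# which is exactly what merge does not propagate; hence this issue.
top = GRU(rnn_size)(merged)
```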
Ya. That seems more accurate than what I said. I don't think it's the full story, though. That's good for concatenation, but what about summing, etc.?

A slightly more complicated use case: I want to allow mixed data, where one sequence has an entry and the other doesn't. In this case, you mask on input to the merge, not on output. Currently, RNNs just push forward state_tm1 (t minus 1) when they encounter a 0 in the mask. So, you wouldn't have a 0 entry where you want to allow the other sequence to give you data.
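A toy Python sketch (not the Keras internals) of what "push forward state_tm1 on a 0 in the mask" means in practice:

```python
# Toy illustration: on a masked step the RNN keeps its previous state
# instead of consuming the input at that step.
def run(xs, mask, step, state=0.0):
    for x, m in zip(xs, mask):
        state = step(state, x) if m else state  # state_tm1 pushed forward on 0
    return state

# The masked 10.0 never enters the state: 0 + 1, skip, 1 + 100 = 101.0
print(run([1.0, 10.0, 100.0], [1, 0, 1], lambda s, x: s + x))  # 101.0
```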
Masks are frustrating..
Yes. The Bi-Directional RNN paper has the sequences aligned at all times. You could, in theory, even specify an offset, but that sounds insane in the current implementation.
Ya, that is the thing that is coming to light from this conversation, I think.
It's an issue for more than just the unrolled RNNs. For example, the theano implementation:

```python
In [38]: import theano; import numpy as np

In [39]: x = theano.shared(np.arange(10))

In [40]: print(theano.scan(lambda x: x**2, sequences=x, go_backwards=True)[0].eval())
[81 64 49 36 25 16  9  4  1  0]

In [41]: np.arange(10)
Out[41]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
```

For alignment:
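A follow-up sketch building on that session: flipping the scan output restores the forward time order, which is the realignment a backwards RNN's outputs (and its mask) would need before merging:

```python
import numpy as np
import theano

x = theano.shared(np.arange(10))
out = theano.scan(lambda x: x**2, sequences=x, go_backwards=True)[0]

# Reversing the scan result puts it back into forward time order.
print(out[::-1].eval())  # [ 0  1  4  9 16 25 36 49 64 81]
```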
Agreed.
It makes sense to just mask anything that's masked in any of the masks, e.g. do ... Suppose you have some mask like ... I was thinking that instead of the line here, it could be something like ... and then further down, instead of doing ... This would also mean changing the output mask in ... I'm not sure if this would work. I will try it later and see.
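If I'm reading the proposal right, here is a minimal numpy sketch (my illustration, not the PR's code) of that combination rule: a timestep survives the merge only if every incoming mask keeps it.

```python
import numpy as np

def combine_masks(masks):
    """Mask anything that's masked in any of the masks: a timestep is
    kept only if every incoming mask keeps it."""
    combined = masks[0].astype(bool)
    for m in masks[1:]:
        combined &= m.astype(bool)
    return combined

m_fwd = np.array([[1, 1, 0]])   # forward mask (batch=1, time=3)
m_back = np.array([[0, 1, 1]])  # reversed mask from the backwards RNN
print(combine_masks([m_fwd, m_back]))  # [[False  True False]]
```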
Now that I reread what you said, this seems like the best way to do it.
@codekansas Elegant code for ...
I have a similar use case/need for merging masked inputs. I'm combining the outputs of multiple LSTM layers that may have to deal with variable lengths, but will all have a final result in the end. These can't be merged into a single input for the classification step, though.
This is only an issue if you need to combine them at each time-step. I think if you end up dropping the time dimension at some point there is a work-around, albeit not a very clean one. For me, I was doing max-pooling over the feature dimension after the LSTM, so the work-around was to merge the max-pooling layers instead of the LSTM layers. So in general the work-around is to apply what you'd normally apply to both the forward and backward parts, and then merge them after the time dimension has been dropped.
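In shape terms, a small numpy sketch of that work-around (my illustration; in Keras the reductions would be pooling layers): reduce away the time axis first, then merge the pooled vectors.

```python
import numpy as np

# Toy forward/backward RNN outputs: (batch, time, features).
fwd = np.random.rand(2, 5, 4)
bwd = np.random.rand(2, 5, 4)

# Work-around: collapse the time axis first (max-pooling here), so no
# time-aligned mask exists anymore, then merge the pooled vectors.
fwd_pooled = fwd.max(axis=1)  # (batch, features)
bwd_pooled = bwd.max(axis=1)  # (batch, features)
merged = np.concatenate([fwd_pooled, bwd_pooled], axis=-1)
print(merged.shape)  # (2, 8), i.e. (batch, 2 * features)
```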
In my current use case I am dropping the time dimension, but the merge still errors about the masking.
You can fork my PR here |
It seems the two inputs' masks get corrupted in concat mode. Will mask merging be supported soon?