How does Masking work? #3086
I'm also interested in this question. It seems to me you expected to get something like the following:
...you will get [[-0.20101213],[ 0. ],[-0.51546627]] as expected.
@ipoletaev

```python
from keras.models import Model
import numpy as np
from keras.layers import Masking, Activation, Input, LSTM, merge

a = np.array([[[.3, .1, .2, .2, .1, .1],
               [.2, .3, .3, .3, .3, .1],
               [0, 0, 0, 0, 0, 0]]])

inputs = Input(shape=(3, 6))
mask = Masking(mask_value=0.0)(inputs)
fw = LSTM(1, return_sequences=True)(mask)
bw = LSTM(1, return_sequences=True, go_backwards=True)(mask)
merged = merge([fw, bw], mode='sum')

model = Model(input=inputs, output=fw)
model2 = Model(input=inputs, output=bw)
model3 = Model(input=inputs, output=merged)
```
Hmm... I don't know how to get such an output using only Keras. About your example: I think it's similar to the aforementioned one, so you should get
Hi guys, I have this question too, especially for LSTMs (BRNNs). The Masking layer gives a masked vector that only works for the inputs, not for the inner states. This might be correct, because the inputs are surely masked. However, I think using a Masking layer in a bidirectional RNN for sequences of different lengths may be totally wrong.
Yes, that's logical, but in any case we want to get zero at the third position, don't we?
I think it doesn't matter because, as I understand it, you should specify the output... Maybe there is some way to use a
@ipoletaev Wow, thanks a lot for this! Yes, we want to get zero at the masked position. Maybe we can deliver another
I think it's not necessary, because the network shouldn't have to remember what responses it needs to produce for empty vectors...
I don't understand what task you want to use it for. After all, you always know in advance what data you are processing, and accordingly you know which outputs of the network correspond to the empty vectors, so you can just skip such positions in the output, I guess.
As far as I understand, Keras has been "fighting" with the RNN masking problem for about a year :)
@ipoletaev I think it's just... In Lasagne, a masking matrix seems to be used to deal with padding. (I have not tested its accuracy.)
@poyuwu: yes, I checked it and you are right. It means, as I understand it, that ... I'll write again what does not match my expectations:
@ipoletaev I don't think this statement is true.
As I said, in Lasagne we provide a mask numpy.array (the same shape as the input) to deal with it. If... Besides,
@poyuwu so you're saying that, for now, there's no way to solve this issue with Keras?
Same here. It seems the masking mechanism in Keras is not fully supported.
I don't think Masking masks input values (during either forward or back-propagation). It just skips a time-step where all features are equal to the mask value (i.e. when you pad a sequence). You can confirm this by:
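For example, something along these lines (an illustrative snippet in the style of the model earlier in the thread, not the original poster's code):

```python
# A masked timestep is skipped by the RNN, so its output simply repeats the output
# of the previous step rather than being recomputed from the (padded) input.
import numpy as np
from keras.models import Model
from keras.layers import Input, Masking, LSTM

x = np.array([[[.3, .1, .2, .2, .1, .1],
               [.2, .3, .3, .3, .3, .1],
               [0., 0., 0., 0., 0., 0.]]])   # last timestep is all zeros -> masked

inputs = Input(shape=(3, 6))
masked = Masking(mask_value=0.0)(inputs)
out = LSTM(1, return_sequences=True)(masked)
model = Model(input=inputs, output=out)

pred = model.predict(x)
# Expect pred[0, 2] == pred[0, 1]: the masked step carries the previous output forward.
print(pred)
```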
The answer is:
If it masked inputs with the value 0.1, you would expect the result to be
Actually Masking works exactly as expected.
In this case the answers are:
so from here you can see that when we mask
Moreover, I just tested: if you have a multi-input net (with multiple input branches) and you have a masking layer on each branch, it is enough that just one of the inputs at a time step... I guess that if one wants to skip the time step only if all the inputs are equal to the masked value, the branches need to be merged, right?
Hi Fragore, I have a similar question to yours about masking with multiple inputs. I have two input branches, and all I want to do is mask 0 from both. Am I right in thinking that adding a mask at the end of each branch is equivalent to adding a single mask AFTER the inputs are merged? Here's my example,
or the version with a mask after each branch prior to merging:
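Something like the following sketch of the two variants (shapes, layer sizes, and the merge mode are my own assumptions, not the original poster's code):

```python
from keras.models import Model
from keras.layers import Input, Masking, LSTM, merge

# Variant A: merge the two branches first, then apply a single Masking layer.
in1 = Input(shape=(5, 4))
in2 = Input(shape=(5, 4))
merged_a = merge([in1, in2], mode='concat', concat_axis=-1)
masked_a = Masking(mask_value=0.0)(merged_a)
out_a = LSTM(8)(masked_a)
model_a = Model(input=[in1, in2], output=out_a)

# Variant B: apply a Masking layer to each branch, then merge.
m1 = Masking(mask_value=0.0)(in1)
m2 = Masking(mask_value=0.0)(in2)
merged_b = merge([m1, m2], mode='concat', concat_axis=-1)
out_b = LSTM(8)(merged_b)
model_b = Model(input=[in1, in2], output=out_b)
```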
Wait, you want to mask the outputs of the branches that are 0? In that case both of your approaches should give you the same result. But usually you mask inputs, which means putting the Masking layer right after the input of the net.
I've been experimenting with and without masking for a little bit now, and I have finally figured out what the Masking layer actually does. It doesn't actually "skip" the timepoint whose values all match the mask value; it just forces all the values for that timepoint to be equal to 0... So effectively Masking(mask_value=0.) does nothing. That is why, in the example provided by @GPaolo above, the results for mask_value=0 and mask_value=0.5 are the same when val matches them. Here is some easy code to demonstrate what I mean.

Model:

```python
input1 = Input(batch_shape=(1, 1, 10))
# ...
model = Model(input1, output1)
```

Data:

```python
data = np.ones((10, 1, 10), dtype='float32')
# set first data point equal to the mask value to show that this line is unaffected
# ...
```

Print outputs:

```python
get_mask_output = K.function([model.layers[0].input], [model.layers[1].output])
print(data)
```

data:

```
[[[ 2.  1.  1.  1.  1.  1.  1.  1.  1.  1.]]
 [[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]]
 [[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]]
 [[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]]
 [[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]]
 [[ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]]
 [[ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]]
 [[ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]]
 [[ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]]
 [[ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]]]
```

mask_output:

```
[[[ 2.  1.  1.  1.  1.  1.  1.  1.  1.  1.]]
 [[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]]
 [[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]]
 [[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]]
 [[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]]
 [[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]
 [[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]
 [[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]
 [[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]
 [[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]]
```

Predictions:

```python
test_data = np.ones((5, 1, 10))
# ...
print(test_data)
```

Results:

test_data:

```
[[ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]]
 [[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]
 [[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]]
 [[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]]]
```

predictions:

```
[[ 0.5       ]]
 [[ 0.5       ]]
 [[ 0.09200736]]
 [[ 0.09200736]]]
```

As you can imagine, "masking" values by setting them to 0 and still calculating the results for those lines in later layers causes some mistakes in backpropagation (treating unknown values as a real result), as well as adding unneeded computation time. I'm going to try to rework how masking is done in Keras a bit...

Edit: I did a little bit of digging into the training.py code and found that the "masking" information (even with mask_value = 0.) does get incorporated into the training of the weights. The masked lines effectively get ignored after the calculation is done (which is good!). The problem I am encountering in my actual network is that, although "masked lines" are ignored during weight training, they are still evaluated by the network going forward, which affects the outputs of future layers based on false information. To be able to build a network that handles variably sized inputs (where not all samples have the maximum number of timepoints), I want to ignore the masked lines entirely... I'm going to try to work that out.
Building on @slaterb1's and @GPaolo's snippets, I tried digging around to see the benefits of masking but haven't found them yet. It feels like I'm missing something.
Does anyone have an idea about if/when it gives performance gains? I didn't have time to run long/deep/wide models, and I'm not comfortable with how Python/Keras/TensorFlow/Theano compile things. Is masking an intricate way of doing what I think weights should be doing, i.e. multiplying with the loss and dividing by the sum of weights in the batch? Does it actually halt any execution (yet)?
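One crude way to probe the speed question is to time the same model with and without a Masking layer on heavily padded data (purely illustrative; sizes are arbitrary and the numbers will depend on backend and hardware):

```python
import time
import numpy as np
from keras.models import Sequential
from keras.layers import Masking, LSTM

timesteps, features = 200, 32
x = np.zeros((256, timesteps, features), dtype='float32')
x[:, :20, :] = np.random.rand(256, 20, features)  # only the first 20 steps are real data

def build(use_masking):
    model = Sequential()
    if use_masking:
        model.add(Masking(mask_value=0.0, input_shape=(timesteps, features)))
        model.add(LSTM(64))
    else:
        model.add(LSTM(64, input_shape=(timesteps, features)))
    return model

for use_masking in (False, True):
    model = build(use_masking)
    model.predict(x)                  # warm-up so compilation is not timed
    start = time.time()
    model.predict(x)
    print('masking =', use_masking, 'time =', time.time() - start)
```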
@ragulpr, I'm not sure about performance gains, but Theano is pretty smart about knowing what it needs to hang on to and what it doesn't (based on the API doc: http://deeplearning.net/software/theano/library/scan.html). More specifically this line: "Note that there is an optimization, that at compile time will detect that you are using just the last value of the result and ensure that scan does not store all the intermediate values that are used. So do not worry if A and k are large." So after compiling the model it might pass over the masked values (or at least not hold them in memory as long), but that is pure speculation based on similarities in the underlying code.

@carlthome, I came across the mask snippet in "theano_backend.py" as well, and you are right that the masking has a direct effect on how the states are evaluated and passed on (T.switch). Maybe this is too general a question, but how does a layer accept the mask? Just to give an example, if I have a model with multiple layers, defined like so: model = Model(input1, output1), I understand that Theano wraps this up as a mathematical expression to calculate: output1 = input1 -> [ layers[0] -> layers[1] -> ... layers[N] ]. But if somewhere in the middle I have: prev_layer -> Masking_layer -> RNN_layer, the output of the Masking_layer gets put into the RNN_layer as input ("x"). Does the "supports_masking" attribute tell the RNN_layer to figure out the mask? I could not find anywhere in the code where the mask is evaluated or interpreted by the RNN_layer, except that I can pass a "mask" variable via the call() method of the Recurrent(Layer) object. I tried calling RNN_layer(prev_layer, mask=Masking_layer) but it didn't do anything different. The last comment in thread #176 suggests that it has to be called with a mask, but I'm not sure how to do that... Any thoughts?
Each Keras layer declares whether it supports masking. Each layer is also responsible for using the mask in a sensible way (which I believe is the primary source of confusion: the masking functionality is implemented across a bunch of different classes). For RNN layers in particular, they rely on the fact that the underlying
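Roughly, the mechanism looks like this in a custom layer (a simplified sketch with Keras 2-style method signatures, not the actual Keras source):

```python
from keras.engine.topology import Layer

class PassThrough(Layer):
    """Opts in to masking and forwards the incoming mask unchanged."""

    def __init__(self, **kwargs):
        super(PassThrough, self).__init__(**kwargs)
        self.supports_masking = True

    def compute_mask(self, inputs, mask=None):
        # Propagate the incoming mask to downstream layers.
        return mask

    def call(self, inputs, mask=None):
        # A real layer would use `mask` here, e.g. to skip or zero masked timesteps.
        return inputs
```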
@carlthome, I saw that in the code but was not able to get the mask to work in my RNN network. For clarity, I was trying to rework stuff in RecurrentShop to set up an encoder-decoder network that adjusts the next input based on a prediction made on the previous state from both the encoder and the decoder (a custom RNN that uses a .single_step_rnn() instead of the regular .rnn()).

But based on your advice, I tried to just build a basic LSTM network to act as a NOT gate (pointless but simple), and it does interpret the mask correctly when it is passed a mask mid-network! I'm including the gist. It shows that masking works for both return_sequences=True and return_sequences=False. It also shows that if you train the network with data that does not have 'masked' input, 'masked' lines in the test data will still get masked appropriately. Hope that helps people understand the masking stuff better! This is the gist
@fferroni @GPaolo apparently, the
Hey guys, there is a seq2seq example whose input is a string (sequence) like '5+9' and whose output is another string, '14'. This is the main code:
and I just changed this part:
to this part:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
The Masking layer will work only when all features of a timestep are equal to the mask value. In your case, the input a is a 3D matrix with shape (1, 3, 6): 1 is the batch_size, 3 is the number of timesteps, and 6 is the number of features per timestep. Masking will take effect only when the features of a timestep are all equal to 0.1. If you change a to: you will get output like:
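For instance, a small illustration of this all-features condition (the array values and layer below are my own example, assuming mask_value=0.1 as described, not the commenter's original snippet):

```python
# With Masking(mask_value=0.1), only a timestep whose features are ALL 0.1 is skipped;
# a timestep that merely contains some 0.1 values is processed normally.
import numpy as np
from keras.models import Model
from keras.layers import Input, Masking, LSTM

a = np.array([[[.3, .1, .2, .2, .1, .1],     # contains 0.1 but is NOT masked
               [.1, .1, .1, .1, .1, .1],     # every feature is 0.1 -> masked
               [0., 0., 0., 0., 0., 0.]]])   # not masked (mask_value is 0.1 here)

inputs = Input(shape=(3, 6))
masked = Masking(mask_value=0.1)(inputs)
out = LSTM(1, return_sequences=True)(masked)
model = Model(input=inputs, output=out)

pred = model.predict(a)
# The second timestep's output should simply repeat the first timestep's output.
print(pred)
```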
Hi @hoangcuong2011, thanks for your explanations. I've validated your second point and indeed it's exactly as you said. I'm currently trying to implement an LSTM autoencoder model to encode a sequence into a sequence, which involves an LSTM layer with return_sequences=False and then a RepeatVector layer to copy the result back to the original timestep dimension. However, the mask gets lost right after the LSTM because return_sequences=False (if True, it returns the input mask), so I'm wondering how I can get the mask back so that the loss will also ignore the padded timesteps? Thanks!
@zhangwj618 I am not really sure what your question is about. I guess you would like to write a custom masking layer. If you explain the question in more detail, I think I can help. Thx! |
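If that is the goal, one possible shape for such a custom masking layer (a sketch under assumptions, not a tested or official recipe: it re-derives the mask from the original padded input and attaches it to the repeated tensor; Keras 2-style signatures):

```python
from keras import backend as K
from keras.engine.topology import Layer

class ReapplyMask(Layer):
    """Takes [x, reference] and outputs x with a mask computed from reference."""

    def __init__(self, mask_value=0.0, **kwargs):
        super(ReapplyMask, self).__init__(**kwargs)
        self.supports_masking = True
        self.mask_value = mask_value

    def compute_mask(self, inputs, mask=None):
        x, reference = inputs
        # Keep a timestep if any feature of the reference differs from mask_value.
        return K.any(K.not_equal(reference, self.mask_value), axis=-1)

    def call(self, inputs, mask=None):
        x, reference = inputs
        return x

    def compute_output_shape(self, input_shape):
        return input_shape[0]
```

It could then be used as, e.g., `ReapplyMask(0.0)([RepeatVector(timesteps)(encoded), padded_inputs])`, so that downstream layers and the loss see the original timestep mask again.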
I'm wondering how the Masking layer works. I tried to write a simple model to test Masking on an Activation layer, and the result of the prediction is
Is this the correct behavior?
My Keras version is 1.0.5.