Using Masking Layer for Sequence to Sequence Learning #957
Keras uses the mask only to skip padded timesteps at the input of the recurrent layers. There will be no cost function masking. Note that when you use `Masking` alone, padded positions at the output still contribute to the loss; you have to zero them out yourself with `sample_weight` (a sketch follows below).

BTW, I'm not following you guys' discussions about Sequence to Sequence Learning. I figured out a way to do it both in the Sutskever way (sequence to sequence with a single RNN, Fig. 1) and with a recurrent encoder-decoder like you are doing right now. Is that still an open problem for this community?

[figure omitted] CREDIT: The figure was modified from here.
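For concreteness, here is a minimal sketch of that cost-masking recipe, written against the Keras 1-era API, with `TimeDistributed(Dense(...))` standing in for the thread's `TimeDistributedDense`. All shapes, layer sizes, and data are illustrative placeholders, not code from this thread:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense, TimeDistributed

n_samples, maxlen, n_features, n_classes = 32, 100, 40, 10

# Toy data: real values in the last 15 steps, zero padding elsewhere.
X = np.zeros((n_samples, maxlen, n_features))
X[:, -15:, :] = np.random.random((n_samples, 15, n_features))
y = np.zeros((n_samples, maxlen), dtype='int32')
y[:, -15:] = np.random.randint(1, n_classes, size=(n_samples, 15))
Y = np.eye(n_classes)[y]  # one-hot targets, shape (32, 100, 10)

model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(maxlen, n_features)))
model.add(LSTM(64, return_sequences=True))
model.add(TimeDistributed(Dense(n_classes, activation='softmax')))
# 'temporal' lets sample_weight have shape (samples, timesteps)
model.compile(loss='categorical_crossentropy', optimizer='adam',
              sample_weight_mode='temporal')

# Zero weight on padded timesteps removes them from the cost.
weights = (y != 0).astype('float32')
model.fit(X, Y, sample_weight=weights, batch_size=8, nb_epoch=1)
```

The key piece is `sample_weight_mode='temporal'`, which makes the weight array carry one entry per timestep rather than one per sample, so the padded output positions can be excluded from the loss.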
@EderSantana once again, a much-appreciated thanks.
I suspected that this might be the case. Many of my predictions with 'masking' didn't seem right (they were always the same length!), so thank you for explaining why that was happening. Big congrats on getting the Sutskever way working. I don't want to speak for the others (@simonhughes22 and @sergeyf), but this type of Sutskever RNN would be a major help to us, mainly because we simply cannot mask our output sequences, which means the cost function is biased!
Maybe I'm not investigating all the resources, but everyone on thread #395 seems to have questions and to be struggling to improve results. Having an RNN layer that does sequence to sequence and allows masking of the output would be incredibly useful. Any light you can shed on this matter would be very much appreciated.
Are you familiar with `sample_weight`? Did you get the idea? My solution is all about using `sample_weight` to keep the padded timesteps out of the cost function.
@EderSantana I have two questions:
Can you illustrate that with some sample Keras code? An alternative approach would be very useful, but the picture you posted looks like what we are doing already, unless the only difference is that you have a single RNN in the middle, which might help as that's fewer layers in between. I was able to get seq2seq learning working, but not well: it learned something, but accuracy was low.
I am not sure I understand that sentence. Thanks for your advice; it would be cool if we could get the Sutskever method working really well. I have a lot of ideas for how to use that sort of model, but I don't fancy writing it from scratch in Theano.
I'm also confused by this. Let's say you have 100 timesteps, but for sample number 5 you only have 15. For your input, wouldn't you pad 85 zeros on the left and put the 15 numbers on the right? And suppose that for sample number 5 your output is supposed to be 20 numbers long. Are you saying you would place the 20 numbers on the left and then 80 zeros on the right? In summary: inputs padded on the left, outputs padded on the right (see the sketch below)?
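To make that padding scheme concrete, a tiny sketch using Keras's `pad_sequences`; the lengths 15 and 20 come from the example above, and the values themselves are arbitrary:

```python
from keras.preprocessing.sequence import pad_sequences

x_sample = [[7] * 15]   # 15 real input steps for "sample number 5"
y_sample = [[3] * 20]   # 20 real output steps

# 'pre': 85 zeros, then the 15 values.  'post': the 20 values, then 80 zeros.
x_padded = pad_sequences(x_sample, maxlen=100, padding='pre')
y_padded = pad_sequences(y_sample, maxlen=100, padding='post')
```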
There are a lot of ideas in what you're saying (like ...) that I'm not following yet.
I'm preparing a tutorial. I need you guys to understand sequence to sequence learning to be able to use a new recurrent layer that is coming up... so much mystery... but I believe you will like it.
OFF TOPIC: ...
Never been more excited for a tutorial in my life.
I actually went to med school for a while, so if you ever want to chat, I wouldn't mind giving you some helpful suggestions for carpal tunnel. I know a ton about it, as I started to get it myself a while back.
@EderSantana is that tutorial on sequence-to-sequence learning available anywhere? If not, maybe just a 5-6 line overview on this thread would be useful too :)
@viksit I'm waiting for some fixes on ...
@EderSantana: Is there a way to calculate this Y_matrix only for a batch? It would still be big, but would need roughly 10^4 times less memory. Or is there a more efficient approach that just uses the indices directly? I know I could handle the batch sampling myself and build the arrays dynamically for each batch, but that sounds like a hard way to do validation. Also, Keras allows sample_weights only for training and not for validation... I guess that should be changed!
You can write your own training loop and update the model with `train_on_batch` (see the sketch below).
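A rough sketch of such a loop, assuming the model was compiled with `sample_weight_mode='temporal'`; `batch_iterator` and `to_one_hot` are hypothetical helpers, not Keras API:

```python
import numpy as np

for epoch in range(10):
    for X_batch, y_idx in batch_iterator(X_train, y_train, batch_size=32):
        # Build the big one-hot target matrix only for this batch.
        Y_batch = to_one_hot(y_idx)
        # Zero weight on padded timesteps keeps them out of the cost.
        w_batch = (y_idx != 0).astype('float32')
        loss = model.train_on_batch(X_batch, Y_batch, sample_weight=w_batch)
```

This also sidesteps the validation limitation mentioned above, since you control when and how each batch (including validation batches) is materialized.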
I am dealing with a similar problem, where I want the cost function to skip some samples at the beginning of the sequence. I am familiar with sample_weight used with fit or train_on_batch, but does anybody know if it is possible to use it with fit_generator?
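One answer, sketched under the assumption of a model compiled with `sample_weight_mode='temporal'`: a Keras generator may yield three-element tuples `(inputs, targets, sample_weights)`, and `fit_generator` applies the weights per batch. Names and shapes below are illustrative:

```python
import numpy as np

def masked_batches(X, Y, batch_size=32):
    """Yield (inputs, targets, sample_weights) tuples forever."""
    while True:
        for i in range(0, len(X), batch_size):
            x = X[i:i + batch_size]
            y = Y[i:i + batch_size]
            w = np.ones(y.shape[:2], dtype='float32')
            w[:, :5] = 0.0  # e.g. skip the first 5 timesteps in the cost
            yield x, y, w

model.fit_generator(masked_batches(X_train, Y_train),
                    samples_per_epoch=len(X_train), nb_epoch=10)
```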
@EderSantana thank you for your clarification of sample_weight above. I have a task similar to POS tagging where I pad both the X and the y, and at first my model was acting as if there was no masking and tried to predict the 0s in X with 0s in y. I've implemented your sample_weight method, but I'm now running into a problem: the model no longer treats the mask as a label from y, yet it still tries to make predictions for the padded positions in the sentence (X), and it won't predict them as the masked value. I feel like I'm almost there; any ideas? (My Stack Overflow question is here.)
@EderSantana May I know how to use masking at the input if the padded values are not zeros but some other value? For example, with images, even the real samples contain zeros as values, so we need to pad with a different value. How should masking be performed in this case? (A sketch follows below.)
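`Masking`'s `mask_value` is a parameter, so one option is to pad with a sentinel value that cannot occur in the real data and mask on that; note that a timestep is only masked when all of its features equal `mask_value`. A sketch with a hypothetical sentinel of -1.0:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Masking, LSTM

PAD = -1.0  # sentinel that never occurs in the real data

model = Sequential()
model.add(Masking(mask_value=PAD, input_shape=(100, 40)))
model.add(LSTM(64))

# Pad with the sentinel instead of zeros: real data in the last 15 steps.
X = np.full((32, 100, 40), PAD)
X[:, -15:, :] = np.random.random((32, 15, 40))
```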
@fchollet I am using fit_generator to train, and different batches in my data have different shapes, like (4, 100, 40), (4, 120, 40), ..., (4, 1200, 40), (4, 1500, 40). Is that supported? (A sketch of one way to do this follows below.)
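`fit_generator` itself does not require a fixed timestep dimension, as long as each yielded batch is internally consistent and the model declares a variable-length time axis (`input_shape=(None, 40)`). A sketch, with `batch_list` and `target_list` as hypothetical pre-built arrays shaped like those above:

```python
def variable_length_batches(batch_list, target_list):
    """Yield pre-built batches whose timestep count differs between batches."""
    while True:
        for x, y in zip(batch_list, target_list):
            yield x, y  # e.g. x.shape == (4, 1200, 40) for one batch

# The model must declare a variable timestep dimension, e.g.:
#   model.add(LSTM(64, input_shape=(None, 40), return_sequences=True))
model.fit_generator(variable_length_batches(train_x, train_y),
                    samples_per_epoch=4 * len(train_x), nb_epoch=10)
```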
Hey Guys,
I know @EderSantana asked this question a few months ago in #395, and I have read several Keras threads that deal with masking.
The question was: we can mask the input zeros, but won't it bias the cost function, since we are not masking the zeros in the output?
Later, the `Masking` layer was added, and I do believe that this output problem was resolved. However, in order to use masking without an embedding layer, my model structure is: ... (a sketch of this kind of model follows below). I have two questions:

1. Is it okay to use `mask_value = 0` (no floating-point value)?
2. Does the mask carry through to the `TimeDistributedDense` layer? If not, this would lead to the cost function being biased by zeros!

I apologize for the redundancy of this issue, but it is a very important one. Thank you!
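Since the model structure itself was lost from this extract, here is a hedged sketch of the kind of model the issue describes: `Masking` feeding an RNN without an `Embedding` layer, with `TimeDistributed(Dense(...))` standing in for the older `TimeDistributedDense`, and all sizes as placeholders. Per the discussion above, the `sample_weight_mode='temporal'` compile flag is what actually keeps padded outputs out of the cost:

```python
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense, TimeDistributed

maxlen, n_features, n_classes = 100, 40, 10  # placeholder sizes

model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(maxlen, n_features)))
model.add(LSTM(128, return_sequences=True))
model.add(TimeDistributed(Dense(n_classes, activation='softmax')))
# Per-timestep weights at fit() time exclude padded outputs from the loss.
model.compile(loss='categorical_crossentropy', optimizer='adam',
              sample_weight_mode='temporal')
```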