
Seq2seq learning: how to pad the features? #3023

Closed
ipoletaev opened this issue Jun 19, 2016 · 1 comment

Comments


ipoletaev commented Jun 19, 2016

According to #395, we should pad all of the training sequences with a special padding symbol (0, for example) in order to make the feature vectors equal in length. But I don't understand how to do this from the code point of view.
Let's imagine we have the task of classifying the words in sentences. So, for example, we have an input vector with shape_in=(nb_samples,30), where 30 is the maximum number of words in a sentence. The output of the neural network has shape_out=(nb_samples,30,20), where 20 is the number of classes, for example.
Then, for the input "What are you doing right now?", the features and output will look as follows:

input_vector = [1243, 34, 5776, 54, 45, 878, 0, ..., 0]; len(input_vector) = 30. Each number corresponds to a word in the sentence, and 0 is the padding value used to reach the required length.
output_matrix = [
[1,0,...,0],   - an output vector of length 20 for "What", meaning this word belongs to class 1
[0,1,0,...,0], - "are" belongs to the second class, for example, and so on...
...            - rows for the remaining words, then all-zero rows for the padded positions
[0,0,...,0]    - 30th row: all zeros, for the last padding symbol
]
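
Here is a minimal sketch of how such padded arrays could be built (the word indices, labels, maxlen=30 and nb_classes=20 are just the illustrative numbers from this example):

import numpy as np
from keras.preprocessing.sequence import pad_sequences

maxlen = 30      # maximum number of words per sentence
nb_classes = 20  # number of word classes

# Hypothetical word-index sequences; index 0 is reserved for padding.
sentences = [[1243, 34, 5776, 54, 45, 878],
             [17, 908, 2]]

# Pad every sentence with trailing zeros up to maxlen -> shape (nb_samples, 30).
X = pad_sequences(sentences, maxlen=maxlen, padding='post', value=0)

# Hypothetical per-word class labels for each sentence.
labels = [[0, 1, 4, 4, 2, 7],
          [3, 3, 0]]

# One-hot targets -> shape (nb_samples, 30, 20); padded timesteps stay all zeros.
Y = np.zeros((len(sentences), maxlen, nb_classes))
for i, sentence_labels in enumerate(labels):
    for t, cls in enumerate(sentence_labels):
        Y[i, t, cls] = 1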

I have a few questions. The first one is a little off topic.

  • Do I understand correctly that I should end each sentence with a special additional symbol that has a fixed numerical (one-hot) representation, the same for every sentence? Maybe such a symbol is also needed at the beginning of every sentence?
  • As may already be clear, the first layer of the network I use is an Embedding layer, followed by a BiLSTM layer and a TimeDistributedDense with a softmax at the end (see the sketch after this list). And the actual question: besides setting the mask_zero=True flag in the Embedding layer, how should I specify output masking, so that when the network encounters an all-zero output vector for some word, it simply skips that timestep and moves on to the next example?
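
Here is roughly what I mean by the architecture (the vocabulary size and layer widths are just illustrative, and I'm assuming the Bidirectional/TimeDistributed wrappers available in recent Keras versions):

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Bidirectional, TimeDistributed, Dense

vocab_size = 10000  # illustrative vocabulary size
maxlen = 30         # maximum sentence length, as above
nb_classes = 20     # number of word classes, as above

model = Sequential()
# mask_zero=True makes downstream layers skip the padded (index 0) timesteps.
model.add(Embedding(vocab_size, 128, input_length=maxlen, mask_zero=True))
model.add(Bidirectional(LSTM(64, return_sequences=True)))
# One softmax over the nb_classes classes at every timestep.
model.add(TimeDistributed(Dense(nb_classes, activation='softmax')))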

Many thanks for the detailed comments.

ipoletaev (Author) commented:

So, after some experiments I came to the following conclusions (a minimal sketch follows this list):

  1. You shouldn't use the mask_zero flag;
  2. For correct results you just need to pass a sample_weight numpy array of shape (nb_samples, timesteps), filled with ones and zeros depending on whether you want a given timestep to count towards the loss or not;
  3. When you compile the model, it's necessary to use the flag sample_weight_mode='temporal'.
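
A minimal sketch of points 2 and 3, assuming the padded X and one-hot Y arrays from the example above (the optimizer, loss and batch size are only illustrative):

# Point 3: compile with per-timestep sample weighting enabled.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              sample_weight_mode='temporal')

# Point 2: a 2D weight array of shape (nb_samples, timesteps);
# 1 for real words, 0 for padded positions, so padding does not contribute to the loss.
sample_weight = (X != 0).astype('float32')

model.fit(X, Y, batch_size=32, sample_weight=sample_weight)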
