
use attention_3d_block in many to many mapping #7

Closed
Opdoop opened this issue Jul 4, 2017 · 7 comments


Opdoop commented Jul 4, 2017

Hi, I'm a beginner with Keras and I'm trying to use attention_3d_block in a translation model.
I have an input of 5 sentences, each padded to 6 words, with each word represented as a 620-dim vector (the embedding dim).
The output is 5 sentences, each padded to 9 words, with each word one-hot encoded over 30 dims (the vocabulary size).
How can I use attention_3d_block in this scenario, where the LSTM is many-to-many?
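
To make the shapes concrete, they look like this (dummy arrays, not my real data):

    import numpy as np

    # dummy arrays just to illustrate the shapes described above
    source = np.zeros((5, 6, 620))   # 5 sentences, 6 words each, 620-dim embeddings
    target = np.zeros((5, 9, 30))    # 5 sentences, 9 words each, one-hot over a 30-word vocabulary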



Opdoop commented Jul 4, 2017

I added a TimeDistributed(Dense()) to match the output dimension, but how do I match the time steps?
The model structure is below:

[screenshot of the model structure]

I get this error when running:
ValueError: Error when checking target: expected activation_1 to have shape (None, 6, 30) but got array with shape (5, 9, 30)

philipperemy (Owner) commented:

@Opdoop can you please share the minimal code to reproduce your error?


Opdoop commented Jul 4, 2017

@philipperemy
The data preparation is:

    from functools import reduce  # reduce() is used below; the import is needed in Python 3

    # Chinese source sentences; each one reads "The capital of <country> is <city>"
    input_text = ['中国 的 首都 是 北京'
                  , '日本 的 首都 是 东京'
                  , '美国 的 首都 是 华盛顿'
                  , '英国 的 首都 是 伦敦'
                  , '德国 的 首都 是 柏林']
    
    tar_text = ['Beijing is the capital of China'
                , 'Tokyo is the capital of Japan'
                , 'Washington is the capital of the United States'
                , 'London is the capital of England'
                , 'Berlin is the capital of Germany']
                
    input_list = []
    tar_list = []
    END = ' EOS'
    for tmp_input in input_text:
        tmp_input = tmp_input+END
        input_list.append(tokenize(tmp_input))
    for tmp_tar in tar_text:
        tmp_tar = tmp_tar+END
        tar_list.append(tokenize(tmp_tar))
    vocab = sorted(reduce(lambda x, y: x | y, (set(tmp_list) for tmp_list in input_list + tar_list)))

    vocab_size = len(vocab) + 1  # the Keras Embedding layer expects len(vocab) + 1
    input_maxlen = max(map(len, (x for x in input_list)))
    tar_maxlen = max(map(len, (x for x in tar_list)))
    output_dim = vocab_size
    hidden_dim = 1000
    INPUT_DIM = hidden_dim    # feature dimension fed into the attention block (the LSTM output size)
    TIME_STEPS = input_maxlen # number of input time steps

    word_to_idx = dict((c, i + 1) for i, c in enumerate(vocab))  
    idx_to_word = dict((i + 1, c) for i, c in enumerate(vocab))  

    inputs_train, tars_train = vectorize_stories(input_list, tar_list, word_to_idx, input_maxlen, tar_maxlen, vocab_size)
    # inputs_train shape is (5, 6) and tars_train shape is (5, 9, 30)
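
(tokenize and vectorize_stories are my own helpers; roughly they do something like the sketch below. This is not my exact code, just the idea, and the shapes match the comment above.)

    import numpy as np
    from keras.preprocessing.sequence import pad_sequences

    def tokenize(sentence):
        # the sentences are already space-separated, so splitting is enough
        return sentence.split()

    def vectorize_stories(input_list, tar_list, word_to_idx, input_maxlen, tar_maxlen, vocab_size):
        # inputs become padded index sequences: (num_sentences, input_maxlen)
        x = pad_sequences([[word_to_idx[w] for w in sent] for sent in input_list],
                          maxlen=input_maxlen)
        # targets become one-hot sequences: (num_sentences, tar_maxlen, vocab_size)
        y = np.zeros((len(tar_list), tar_maxlen, vocab_size))
        for i, sent in enumerate(tar_list):
            for t, w in enumerate(sent):
                y[i, t, word_to_idx[w]] = 1.0
        return x, y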

The model is:

from keras.layers import Input, Embedding, LSTM, Dense, TimeDistributed, Activation
from keras.models import Model

def model_attention_applied_before_lstm():
    inputs = Input(shape=(input_maxlen,))
    embed = Embedding(input_dim=vocab_size,
                      output_dim=620,
                      input_length=input_maxlen)(inputs)
    attention_mul = LSTM(hidden_dim, return_sequences=True)(embed)
    attention_mul = attention_3d_block(attention_mul)
    attention_mul = LSTM(500, return_sequences=True)(attention_mul)
    output = TimeDistributed(Dense(output_dim, activation='sigmoid'),
                             input_shape=(tar_maxlen, output_dim))(attention_mul)
    # other attempts to reach (None, tar_maxlen, output_dim):
    # output = TimeDistributed(Dense(output_dim))(output)
    # output = Dense(output_dim, activation='sigmoid')(attention_mul)
    # output = RepeatVector(tar_maxlen)(output)
    # output = Permute((tar_maxlen, output_dim), name='reshapeLayer')(output)
    output = Activation('softmax')(output)
    model = Model(inputs=[inputs], outputs=output)
    return model

And I run the model as:

    m = model_attention_applied_before_lstm()
    m.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    print(m.summary())

    m.fit(inputs_train, tars_train, epochs=10, batch_size=3, validation_split=0.1)
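
As far as I can tell, the mismatch in the error is on the time-step axis: every layer after the Embedding keeps the 6 input time steps, while the targets have 9 steps. A quick check, using the variables defined above:

    print(m.output_shape)    # (None, 6, 30) -- the 6 comes from input_maxlen
    print(tars_train.shape)  # (5, 9, 30)    -- the 9 comes from tar_maxlen
    # Keras compares these two shapes during fit() and raises the ValueError above.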


philipperemy commented Jul 4, 2017

In that case, what you need is sequence-to-sequence attention. This project does not support that.
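
To give you an idea of the shape of such a model, below is a rough, generic sketch of an encoder-decoder with dot-product (Luong-style) attention in plain Keras. It is not code from this repository; the layer sizes just mirror your example, and it assumes teacher forcing, so the decoder also needs shifted target inputs at training time.

    from keras.layers import Input, Embedding, LSTM, Dense, Activation
    from keras.layers import TimeDistributed, concatenate, dot
    from keras.models import Model

    src_len, tgt_len, vocab, emb_dim, hidden = 6, 9, 30, 620, 256  # sizes taken from your example

    # encoder: read the source sentence and keep every time step
    enc_in = Input(shape=(src_len,))
    enc_emb = Embedding(vocab, emb_dim)(enc_in)
    enc_seq = LSTM(hidden, return_sequences=True)(enc_emb)        # (None, src_len, hidden)

    # decoder: teacher-forced target inputs (shifted right by one step)
    dec_in = Input(shape=(tgt_len,))
    dec_emb = Embedding(vocab, emb_dim)(dec_in)
    dec_seq = LSTM(hidden, return_sequences=True)(dec_emb)        # (None, tgt_len, hidden)

    # dot-product attention: score every decoder step against every encoder step
    scores = dot([dec_seq, enc_seq], axes=[2, 2])                 # (None, tgt_len, src_len)
    weights = Activation('softmax')(scores)                       # attention weights over source steps
    context = dot([weights, enc_seq], axes=[2, 1])                # (None, tgt_len, hidden)

    combined = concatenate([context, dec_seq])                    # (None, tgt_len, 2 * hidden)
    out = TimeDistributed(Dense(vocab, activation='softmax'))(combined)

    model = Model([enc_in, dec_in], out)
    model.compile(optimizer='adam', loss='categorical_crossentropy')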


Opdoop commented Jul 4, 2017

Ooooooooh, thanks a lot!
Do you know of any sequence-to-sequence attention implementation in Keras?

philipperemy (Owner) commented:

I'm not sure about this. This project:

[linked repository]

is the most famous seq2seq implementation in Keras. Maybe there's an attention mechanism in there; to be checked.

philipperemy (Owner) commented:

> The Seq2seq framework includes a ready made attention model which does the same.

Yes, it does have one!
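
If I remember correctly, that sentence comes from the README of the farizrahman4u/seq2seq project, so that is presumably the repository linked above. Its ready-made attention model is used roughly like this, adapted to your shapes; untested on my side:

    # hypothetical usage, adapted from that project's README; untested
    from seq2seq.models import AttentionSeq2Seq

    # input:  (batch, 6, 620)  pre-embedded source words
    # output: (batch, 9, 30)   one-hot target words
    model = AttentionSeq2Seq(input_dim=620, input_length=6, hidden_dim=256,
                             output_length=9, output_dim=30, depth=1)
    model.compile(loss='categorical_crossentropy', optimizer='adam')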
