
use attention_3d_block in many to many mapping #7

Closed
Opdoop opened this issue Jul 4, 2017 · 7 comments


Opdoop commented Jul 4, 2017

Hi, I'm a beginner with Keras and I'm trying to use attention_3d_block in a translation model.
I have an input of 5 sentences, each padded to 6 words, with each word represented as a 620-dim vector (the embedding dim).
The output is 5 sentences, each padded to 9 words, with each word one-hot encoded over 30 dims (the vocabulary size).
How can I use attention_3d_block in this scenario, where the LSTM is many-to-many?
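
To make the shapes concrete, they look like this (dummy arrays, not my real data):

    import numpy as np

    # dummy arrays just to illustrate the shapes described above
    source = np.zeros((5, 6, 620))   # 5 sentences, 6 words each, 620-dim embeddings
    target = np.zeros((5, 9, 30))    # 5 sentences, 9 words each, one-hot over a 30-word vocabulary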



Opdoop commented Jul 4, 2017

I added a TimeDistributed(Dense()) to match the output dimension, but how do I match the time steps?
The model structure is below:

[screenshot of the model structure]

I get this error when running:
ValueError: Error when checking target: expected activation_1 to have shape (None, 6, 30) but got array with shape (5, 9, 30)

philipperemy (Owner) commented:

@Opdoop can you please share the minimal code to reproduce your error?


Opdoop commented Jul 4, 2017

@philipperemy
The data preparation is:

    from functools import reduce  # reduce() is used below; the import is needed in Python 3

    # Chinese source sentences; each one reads "The capital of <country> is <city>"
    input_text = ['中国 的 首都 是 北京'
                  , '日本 的 首都 是 东京'
                  , '美国 的 首都 是 华盛顿'
                  , '英国 的 首都 是 伦敦'
                  , '德国 的 首都 是 柏林']
    
    tar_text = ['Beijing is the capital of China'
                , 'Tokyo is the capital of Japan'
                , 'Washington is the capital of the United States'
                , 'London is the capital of England'
                , 'Berlin is the capital of Germany']
                
    input_list = []
    tar_list = []
    END = ' EOS'
    for tmp_input in input_text:
        tmp_input = tmp_input+END
        input_list.append(tokenize(tmp_input))
    for tmp_tar in tar_text:
        tmp_tar = tmp_tar+END
        tar_list.append(tokenize(tmp_tar))
    vocab = sorted(reduce(lambda x, y: x | y, (set(tmp_list) for tmp_list in input_list + tar_list)))

    vocab_size = len(vocab) + 1  # the Keras Embedding layer expects len(vocab) + 1
    input_maxlen = max(map(len, (x for x in input_list)))
    tar_maxlen = max(map(len, (x for x in tar_list)))
    output_dim = vocab_size
    hidden_dim = 1000
    INPUT_DIM = hidden_dim    # feature dimension fed into the attention block (the LSTM output size)
    TIME_STEPS = input_maxlen # number of input time steps

    word_to_idx = dict((c, i + 1) for i, c in enumerate(vocab))  
    idx_to_word = dict((i + 1, c) for i, c in enumerate(vocab))  

    inputs_train, tars_train = vectorize_stories(input_list, tar_list, word_to_idx, input_maxlen, tar_maxlen, vocab_size)
    # inputs_train shape is (5, 6) and tars_train shape is (5, 9, 30)
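
(tokenize and vectorize_stories are my own helpers; roughly they do something like the sketch below. This is not my exact code, just the idea, and the shapes match the comment above.)

    import numpy as np
    from keras.preprocessing.sequence import pad_sequences

    def tokenize(sentence):
        # the sentences are already space-separated, so splitting is enough
        return sentence.split()

    def vectorize_stories(input_list, tar_list, word_to_idx, input_maxlen, tar_maxlen, vocab_size):
        # inputs become padded index sequences: (num_sentences, input_maxlen)
        x = pad_sequences([[word_to_idx[w] for w in sent] for sent in input_list],
                          maxlen=input_maxlen)
        # targets become one-hot sequences: (num_sentences, tar_maxlen, vocab_size)
        y = np.zeros((len(tar_list), tar_maxlen, vocab_size))
        for i, sent in enumerate(tar_list):
            for t, w in enumerate(sent):
                y[i, t, word_to_idx[w]] = 1.0
        return x, y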

The model is:

from keras.layers import Input, Embedding, LSTM, Dense, TimeDistributed, Activation
from keras.models import Model

def model_attention_applied_before_lstm():
    inputs = Input(shape=(input_maxlen,))
    embed = Embedding(input_dim=vocab_size,
                      output_dim=620,
                      input_length=input_maxlen)(inputs)
    attention_mul = LSTM(hidden_dim, return_sequences=True)(embed)
    attention_mul = attention_3d_block(attention_mul)
    attention_mul = LSTM(500, return_sequences=True)(attention_mul)
    output = TimeDistributed(Dense(output_dim, activation='sigmoid'),
                             input_shape=(tar_maxlen, output_dim))(attention_mul)
    # other attempts to reach (None, tar_maxlen, output_dim):
    # output = TimeDistributed(Dense(output_dim))(output)
    # output = Dense(output_dim, activation='sigmoid')(attention_mul)
    # output = RepeatVector(tar_maxlen)(output)
    # output = Permute((tar_maxlen, output_dim), name='reshapeLayer')(output)
    output = Activation('softmax')(output)
    model = Model(inputs=[inputs], outputs=output)
    return model

And I run the model as:

    m = model_attention_applied_before_lstm()
    m.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    print(m.summary())

    m.fit(inputs_train, tars_train, epochs=10, batch_size=3, validation_split=0.1)
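
As far as I can tell, the mismatch in the error is on the time-step axis: every layer after the Embedding keeps the 6 input time steps, while the targets have 9 steps. A quick check, using the variables defined above:

    print(m.output_shape)    # (None, 6, 30) -- the 6 comes from input_maxlen
    print(tars_train.shape)  # (5, 9, 30)    -- the 9 comes from tar_maxlen
    # Keras compares these two shapes during fit() and raises the ValueError above.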


philipperemy commented Jul 4, 2017

In that case, what you need is sequence-to-sequence attention. This project does not support that.
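
To give you an idea of the shape of such a model, below is a rough, generic sketch of an encoder-decoder with dot-product (Luong-style) attention in plain Keras. It is not code from this repository; the layer sizes just mirror your example, and it assumes teacher forcing, so the decoder also needs shifted target inputs at training time.

    from keras.layers import Input, Embedding, LSTM, Dense, Activation
    from keras.layers import TimeDistributed, concatenate, dot
    from keras.models import Model

    src_len, tgt_len, vocab, emb_dim, hidden = 6, 9, 30, 620, 256  # sizes taken from your example

    # encoder: read the source sentence and keep every time step
    enc_in = Input(shape=(src_len,))
    enc_emb = Embedding(vocab, emb_dim)(enc_in)
    enc_seq = LSTM(hidden, return_sequences=True)(enc_emb)        # (None, src_len, hidden)

    # decoder: teacher-forced target inputs (shifted right by one step)
    dec_in = Input(shape=(tgt_len,))
    dec_emb = Embedding(vocab, emb_dim)(dec_in)
    dec_seq = LSTM(hidden, return_sequences=True)(dec_emb)        # (None, tgt_len, hidden)

    # dot-product attention: score every decoder step against every encoder step
    scores = dot([dec_seq, enc_seq], axes=[2, 2])                 # (None, tgt_len, src_len)
    weights = Activation('softmax')(scores)                       # attention weights over source steps
    context = dot([weights, enc_seq], axes=[2, 1])                # (None, tgt_len, hidden)

    combined = concatenate([context, dec_seq])                    # (None, tgt_len, 2 * hidden)
    out = TimeDistributed(Dense(vocab, activation='softmax'))(combined)

    model = Model([enc_in, dec_in], out)
    model.compile(optimizer='adam', loss='categorical_crossentropy')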


Opdoop commented Jul 4, 2017

Ooooooooh, thanks a lot!
Do you know of any sequence-to-sequence attention implementation in Keras?

philipperemy (Owner) commented:

I'm not sure about this. This project:

[linked repository]

is the most famous seq2seq implementation in Keras. Maybe there's an attention mechanism in there; to be checked.

philipperemy (Owner) commented:

> The Seq2seq framework includes a ready made attention model which does the same.

Yes, it does have one!
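
If I remember correctly, that sentence comes from the README of the farizrahman4u/seq2seq project, so that is presumably the repository linked above. Its ready-made attention model is used roughly like this, adapted to your shapes; untested on my side:

    # hypothetical usage, adapted from that project's README; untested
    from seq2seq.models import AttentionSeq2Seq

    # input:  (batch, 6, 620)  pre-embedded source words
    # output: (batch, 9, 30)   one-hot target words
    model = AttentionSeq2Seq(input_dim=620, input_length=6, hidden_dim=256,
                             output_length=9, output_dim=30, depth=1)
    model.compile(loss='categorical_crossentropy', optimizer='adam')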
