
Why Permute before the attention Dense layer in attention_3d_block? #24

Closed

guiyuliu opened this issue Aug 27, 2018 · 2 comments


guiyuliu commented Aug 27, 2018

    a = Permute((2, 1))(inputs)
    a = Dense(TIME_STEPS, activation='softmax')(a)

In these lines, why do you permute TIME_STEPS and INPUT_DIM?
What if I don't permute and instead follow with a dense layer over INPUT_DIM? Since the dense layer's weight matrix has shape TIME_STEPS × TIME_STEPS, what is the difference if I change it to INPUT_DIM × INPUT_DIM, i.e.
Dense(INPUT_DIM, activation='softmax')(a)

@Dongzhixiao

I think the attention mechanism is applied along the time-step dimension, so the parameters being trained are the weights that multiply the time steps.
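
For reference, a minimal sketch of the old attention_3d_block pattern (not the repository's exact code; it assumes the standard Keras functional API, and the TIME_STEPS and INPUT_DIM values here are placeholders for illustration). Dense only acts on the last axis of a 3D tensor, so the Permute puts TIME_STEPS last before the softmax and then swaps it back:

    from tensorflow.keras.layers import Input, Dense, Permute, Multiply
    from tensorflow.keras.models import Model

    TIME_STEPS = 20   # assumed value for illustration
    INPUT_DIM = 32    # assumed value for illustration

    def attention_3d_block(inputs):
        # inputs has shape (batch, TIME_STEPS, INPUT_DIM).
        # Dense acts on the last axis, so swap axes to put TIME_STEPS last.
        a = Permute((2, 1))(inputs)                      # (batch, INPUT_DIM, TIME_STEPS)
        a = Dense(TIME_STEPS, activation='softmax')(a)   # softmax over the time axis
        a = Permute((2, 1))(a)                           # back to (batch, TIME_STEPS, INPUT_DIM)
        return Multiply()([inputs, a])                   # re-weight each time step of the input

    inp = Input(shape=(TIME_STEPS, INPUT_DIM))
    out = attention_3d_block(inp)
    model = Model(inp, out)
    model.summary()

Without the Permute, Dense(INPUT_DIM, activation='softmax')(a) would compute a softmax across the features at each time step, i.e. feature attention rather than temporal attention, which is not what this block is meant to do.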

@philipperemy
Owner

The attention block has been updated. It's a much better version than the old one. This issue is now deprecated.
