The visualization of decoder attention_weight #30

Open
sally1913105 opened this issue Apr 27, 2021 · 3 comments

@sally1913105

[attached screenshot: per-frame attention weight maps]
I want to visualize the attention weights of the decoder module. I take the output of multihead_attn in the last layer of the decoder, but its shape is (bs, 360, 36*h*w), where h*w is the shape of the feature map. I don't understand why there are 36 different attention weights for the same instances of the same frame, as the attached picture shows.
Can you explain what this means?
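
For reference, here is a minimal runnable sketch of where that shape comes from: `nn.MultiheadAttention` returns head-averaged weights of shape (bs, tgt_len, src_len), so 360 decoder queries attending over 36 frames of h*w positions gives (bs, 360, 36*h*w). The hidden size, head count, and 16x16 feature map below are illustrative assumptions, not values confirmed in this thread.

```python
import torch
import torch.nn as nn

# Toy illustration of the captured weight shape (all sizes are assumptions).
bs, d_model, nhead = 1, 384, 8
num_frames, num_queries = 36, 360
h, w = 16, 16

mha = nn.MultiheadAttention(d_model, nhead, batch_first=True)
queries = torch.randn(bs, num_queries, d_model)          # decoder queries
memory = torch.randn(bs, num_frames * h * w, d_model)    # encoder memory over all frames

# nn.MultiheadAttention returns (attn_output, attn_output_weights);
# the weights are averaged over heads: (bs, tgt_len, src_len).
out, attn = mha(queries, memory, memory, need_weights=True)
print(attn.shape)  # torch.Size([1, 360, 9216]) == (bs, 360, 36*h*w)
```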

@Epiphqny
Owner

Hi @sally1913105, we compute the spatial and temporal attention, so for a 36-frame sequence there are 36 attention weights for each prediction, even though the prediction is for a specific frame. In this way, features from other frames can help the segmentation of that frame.
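
A small sketch of how those 36 per-prediction spatial maps can be recovered from the captured weights (the h, w values below are assumptions for illustration; use the feature map size of your own input):

```python
import torch

# Split the last dimension of the captured weights into 36 per-frame maps.
bs, num_queries, num_frames, h, w = 1, 360, 36, 16, 16
attn = torch.rand(bs, num_queries, num_frames * h * w)   # captured weights

attn_maps = attn.view(bs, num_queries, num_frames, h, w)
# attn_maps[b, q] now holds 36 spatial attention maps for query q:
# one (h, w) map per frame in the clip, i.e. spatial + temporal attention.
print(attn_maps.shape)  # torch.Size([1, 360, 36, 16, 16])
```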

@sally1913105
Author

Thank you for your answer! Can I think of it this way: within the 36 attention weights of the i-th prediction, only the i-th attention weight corresponds to the i-th frame's features, while the others correspond to other frames' features? And how should these 36 attention weights be combined?

@Epiphqny
Owner

Hi @sally1913105, for each prediction we only use the attention weights of the corresponding frame at this stage. The weights do not need to be combined; interaction with other frames is realized by the following 3D convolutions.
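
A sketch of that selection step for visualization, assuming the 360 queries are laid out frame-major as 36 frames x 10 instance queries (this ordering is an assumption; please check the repo's inference code) and that the selected map is then upsampled for overlaying on the frame:

```python
import torch
import torch.nn.functional as F

# For the i-th prediction, keep only the attention map of its own frame.
num_frames, num_ins, h, w = 36, 10, 16, 16
attn_maps = torch.rand(1, num_frames * num_ins, num_frames, h, w)  # from the previous sketch

query_idx = 5                       # some prediction index in [0, 360)
frame_idx = query_idx // num_ins    # frame this prediction belongs to (assumed ordering)

own_frame_map = attn_maps[0, query_idx, frame_idx]       # (h, w)
# Upsample to the original frame resolution (assumed 360x640 here) for overlay.
vis = F.interpolate(own_frame_map[None, None], size=(360, 640),
                    mode="bilinear", align_corners=False)[0, 0]
print(vis.shape)  # torch.Size([360, 640])
```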
