Implementation of the paper "Not All Attention Is Needed: Gated Attention Network for Sequence Data" (GA-Net).
There are two networks in the model:
- Backbone Network
- Auxiliary Network
Soft attention assigns some weight (low or high) to every input token, whereas the gated attention network selects only the most important tokens to attend to.
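The contrast between the two schemes can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the code from this repository: the function names, the example scores and gate probabilities, and the hard 0.5 threshold used to open or close a gate are all assumptions made for the example.

```python
import math


def soft_attention(scores):
    """Standard softmax: every token receives a strictly positive weight."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_attention(scores, gate_probs, threshold=0.5):
    """Softmax restricted to tokens whose gate is open.

    Assumption: a gate counts as open when its probability exceeds
    `threshold`. Tokens with a closed gate get exactly zero weight.
    """
    gates = [p > threshold for p in gate_probs]
    if not any(gates):
        # No gate is open, so there is nothing to attend to.
        return [0.0] * len(scores)
    m = max(s for s, g in zip(scores, gates) if g)
    exps = [math.exp(s - m) if g else 0.0 for s, g in zip(scores, gates)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical attention scores and gate-open probabilities for four tokens.
scores = [2.0, 0.1, 1.5, -0.3]
gate_probs = [0.9, 0.2, 0.7, 0.1]

soft = soft_attention(scores)
gated = gated_attention(scores, gate_probs)

print("soft :", [round(w, 3) for w in soft])
print("gated:", [round(w, 3) for w in gated])
```

With these inputs the soft weights are all nonzero, while the gated weights are zero for the second and fourth tokens and renormalized over the remaining two.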
Visualization of the probability that the gate is open for each input token, together with the actual gated attention weights.