A Doubt In LocationSensitiveAttention #50

Closed
StevenZYj opened this issue May 23, 2018 · 3 comments

@StevenZYj

Hi Rayhane, I'm currently looking into your LocationSensitiveAttention class and don't understand the reason for using cumulate_weights when calculating the next state. I can't find any reference to it in the original paper.

By the way, your work is fantastic :D Appreciate it a lot.

@begeekmyfriend
Contributor

It helps convergence. You can see the alignment improve when adding cumulative attention states in the 4th comment of this issue: keithito/tacotron#170 (comment). I have run a series of ablation studies to confirm it.

@Rayhane-mamah
Owner

Hello @StevenZYj, thanks for reaching out!

As stated by @begeekmyfriend, cumulating the attention weights is a must to get proper alignments. This was actually stated in the paper as well:
"We use the location-sensitive attention from [21], which extends the additive attention mechanism [22] to use cumulative attention weights from previous decoder time steps as an additional feature. This encourages the model to move forward consistently through the input, mitigating potential failure modes where some subsequences are repeated or ignored by the decoder."

While the entire attention mechanism is only referenced in a few words in the paper, by following the references they provide we managed to get a sense of what they are talking about. I believe what confirms this cumulation approach is that in our ablation studies (by the way, great work @begeekmyfriend, you never cease to impress!) we found that when we don't use weights cumulation, the decoder tends to repeat or ignore some subsequences.
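To make the mechanics concrete, here is a minimal NumPy sketch of one attention step with cumulation. This is an illustration, not the repo's actual code: names like `W_query`, `W_location`, `location_filters`, and the assumption that `keys` is already the projected encoder memory are all placeholders.

```python
import numpy as np

def location_sensitive_step(query, keys, prev_cumulative_alignments,
                            W_query, W_location, v, location_filters,
                            cumulate=True):
    """One decoder step of location-sensitive attention (illustrative sketch).

    query:                      [decoder_dim]   current decoder cell output
    keys:                       [T, attn_dim]   projected encoder outputs
    prev_cumulative_alignments: [T]             sum of alignments from previous steps
    """
    # Location features: convolve the cumulative alignments over encoder time,
    # then project them into the attention space.
    f = np.stack([np.convolve(prev_cumulative_alignments, k, mode='same')
                  for k in location_filters], axis=-1)                  # [T, n_filters]
    processed_location = f @ W_location                                 # [T, attn_dim]

    # Additive (Bahdanau-style) energies, extended with the location term.
    energies = np.tanh(keys + query @ W_query + processed_location) @ v  # [T]

    # Softmax over encoder time steps gives this step's alignments.
    alignments = np.exp(energies - energies.max())
    alignments /= alignments.sum()

    # The point discussed in this issue: the attention "state" carried to the
    # next decoder step is the running sum of alignments, not just the latest
    # ones, so the location features see everywhere the decoder has attended.
    next_state = prev_cumulative_alignments + alignments if cumulate else alignments
    return alignments, next_state
```

With `cumulate=True`, the location features are built from everything the decoder has already attended to, which is what discourages it from jumping back and repeating or skipping parts of the input.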

@StevenZYj
Author

@begeekmyfriend @Rayhane-mamah Thanks a lot! My bad, the point is actually stated in the Tacotron 2 paper.
