
Question about Attention #1

Open
SRDdev opened this issue Mar 8, 2024 · 0 comments


SRDdev commented Mar 8, 2024

I have some basic questions as a student.
I have implemented Transformers multiple times but am still learning new things about them. Here are my questions:

As seen in the attention maps, only a few positions contribute most of the final output.
[attention map image]

  1. Isn't this like a long-tail distribution, where only a few values are very high and the rest are very low?
  2. If that is the case, can we randomly remove some parts of the input (say 25%) and still achieve the same results?
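A minimal NumPy sketch of what question 2 is asking, under assumed toy sizes and random weights (not from any real model): compute single-head attention, zero out a random 25% of the key/value positions, renormalise the rows, and compare the two outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy single-head attention (hypothetical sizes, random projections).
seq_len, d = 16, 32
Q = rng.standard_normal((seq_len, d))
K = rng.standard_normal((seq_len, d))
V = rng.standard_normal((seq_len, d))

weights = softmax(Q @ K.T / np.sqrt(d))  # (seq_len, seq_len), rows sum to 1
out_full = weights @ V

# Drop exactly 25% of the key/value positions at random.
drop = rng.choice(seq_len, size=seq_len // 4, replace=False)
keep = np.ones(seq_len, dtype=bool)
keep[drop] = False

w_pruned = weights * keep                           # zero dropped columns
w_pruned /= w_pruned.sum(axis=-1, keepdims=True)    # renormalise each row
out_pruned = w_pruned @ V

# Per-query cosine similarity between the full and pruned outputs.
cos = (out_full * out_pruned).sum(-1) / (
    np.linalg.norm(out_full, axis=-1) * np.linalg.norm(out_pruned, axis=-1)
)
print(cos.mean())
```

With random weights the similarity is only moderate; the interesting empirical question is whether a trained model's long-tail attention makes the pruned output much closer to the full one.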

Thank you!
