Why Self-attention #28
| Criterion | Question |
| --- | --- |
| Complexity per layer | why self-attention? |
| Sequential operations | why self-attention? |
| Maximum path length | why self-attention? |
In this section, for a problem defined as follows:

a transduction problem that maps n input vectors (x1, ..., xn) end-to-end to n output vectors (z1, ..., zn),

the author compares RNN, CNN, and the self-attention mechanism on three criteria: complexity per layer, sequential operations, and maximum path length.

The table above summarizes the questions to answer for that comparison. If you can explain "why self-attention?" for each criterion, you can consider this part properly understood; see the sketch below for the intuition.
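Not from the original issue, just an illustration: a minimal NumPy sketch (with hypothetical sizes `n` and `d`) contrasting how self-attention and a vanilla RNN map x_1, ..., x_n to z_1, ..., z_n. It shows where the differences in sequential operations and maximum path length come from.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention: maps n input vectors to n output vectors.

    All positions are processed in parallel (O(1) sequential operations),
    and every output attends to every input directly (maximum path length 1),
    at O(n^2 * d) cost per layer for the (n, n) score matrix.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # (n, d) each
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # (n, n): every pair interacts
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                         # (n, d): z_1 ... z_n

def rnn(X, Wx, Wh):
    """Vanilla RNN: the loop forces O(n) sequential operations, and a signal
    from x_1 must cross n steps to reach z_n (maximum path length O(n))."""
    h = np.zeros(Wh.shape[0])
    outputs = []
    for x_t in X:                              # cannot be parallelized over t
        h = np.tanh(Wx @ x_t + Wh @ h)
        outputs.append(h)
    return np.stack(outputs)                   # (n, d): z_1 ... z_n

# hypothetical sizes for illustration
n, d = 8, 16
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))                    # x_1 ... x_n
Z_attn = self_attention(X, *(rng.normal(size=(d, d)) for _ in range(3)))
Z_rnn = rnn(X, 0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d)))
print(Z_attn.shape, Z_rnn.shape)               # both (8, 16)
```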