Why Self-attention #28

eubinecto · 2020-09-22T04:21:10Z

In this section, the authors take

the problem defined as follows:

a transduction problem that maps n input vectors (x1 to xn) end-to-end onto n output vectors (z1 to zn),
where the dimension of each vector is d (e.g. 512)

and, using the following three criteria:

complexity per layer
sequential operations
maximum path length

compare the self-attention mechanism against RNNs and CNNs.

The table below summarizes that comparison.

table 1
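Since the table itself is not reproduced here, the sketch below evaluates its entries for one example setting. The asymptotic forms are written as recalled from Table 1 of the paper, with constant factors dropped; the function names `ceil_log` and `table1`, and the values n = 50, k = 3, r = 10, are assumptions made only for illustration.

```python
def ceil_log(n, k):
    """Smallest L such that k**L >= n (integer arithmetic, no float rounding)."""
    steps, reach = 0, 1
    while reach < n:
        reach *= k
        steps += 1
    return steps


def table1(n, d, k, r):
    """Entries of Table 1 with constant factors dropped.

    n: sequence length, d: vector dimension,
    k: convolution kernel width, r: neighborhood size of restricted self-attention.
    Each row is (complexity per layer, sequential operations, maximum path length).
    """
    return {
        "self-attention": (n * n * d, 1, 1),
        "recurrent": (n * d * d, n, n),
        "convolutional": (k * n * d * d, 1, ceil_log(n, k)),
        "self-attention (restricted)": (r * n * d, 1, -(-n // r)),  # ceil(n / r)
    }


# Hypothetical setting: a 50-token sentence with 512-dimensional vectors.
rows = table1(n=50, d=512, k=3, r=10)
for name, row in rows.items():
    print(f"{name:28s} {row}")
```

The numbers are operation counts up to a constant, so only their relative sizes are meaningful.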

If you can explain "why attention" for each criterion, you have probably understood this part properly.


eubinecto · 2020-09-24T10:03:21Z

Complexity per layer

RNN: why d^2? Is it because both the encoder and the decoder have to be trained?
CNN: and if so, why is this one d^2 as well?

why self attention?

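n^2*d and n*d^2 meet at n = d.
When n < d, n^2*d is smaller than n*d^2.
In other words, if the input sentence is shorter than the dimension of the input vectors, a self-attention layer is cheaper per layer than a recurrent or convolutional one.
The embedding vectors produced by SOTA representations have dimension 512.
In practice, though, most input sentences contain far fewer words than that. (Writing a single sentence with 512 words would be hard.)
So even though its complexity grows quadratically in n, self-attention can be considered faster than RNNs and CNNs in most cases.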
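The crossover argument above can be written out as a short derivation. The sentence length n = 50 is an assumed example, not a figure from the paper; d = 512 is the dimension mentioned above.

```latex
\[
n^2 d < n d^2
\;\iff\; n < d
\qquad (\text{divide both sides by } n d > 0)
\]
\[
\frac{n^2 d}{n d^2} = \frac{n}{d},
\qquad \text{e.g. } n = 50,\ d = 512:\quad \frac{50}{512} \approx 0.098
\]
```

Under these assumptions, the self-attention term is roughly a tenth of the recurrent term, and the convolutional term is larger still by its kernel-width factor k.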

eubinecto · 2020-09-24T10:03:34Z

Sequential operations

why self attention?

eubinecto · 2020-09-24T10:03:47Z

Maximum path length

why self attention?

eubinecto · 2020-09-24T10:26:27Z

Bonus - interpretable models

figure 3

eubinecto added the attention is all you need label Sep 22, 2020

eubinecto mentioned this issue Sep 26, 2020

Collecting common questions & takeaways #36

Closed
