Jargon / glossary definitions #20

eubinecto opened this issue Sep 20, 2020 · 13 comments

eubinecto commented Sep 20, 2020

Jargon list

  • transduction model
  • constituency parsing
  • sinusoids
  • Positional Encodings
  • BLEU
  • WMT
  • factorization tricks
  • desiderata
  • transformer
  • Adam Optimizer
  • Scaled Dot-Product Attention
  • Multi-Head Attention
  • Residual Connection
  • auto-regressive
  • compatibility function
  • language modeling
  • conditional computation
  • Separable convolutions
  • Contiguous kernels
  • Residual Dropout
  • Label Smoothing
  • Beam search
  • WSJ

Framework

  • Definition, example sentences
  • Connection to the paper

eubinecto commented Sep 20, 2020

transduction problems / transduction model / transductive model (machine learning)

In statistics, the following quote seems to be famous.

"When solving a problem of interest, do not solve a more general problem as an intermediate step. Try to get the answer that you really need but not a more general one." -- (Vladimir Vapnik, 1990)

It means: to solve the problem you ultimately care about, don't try to solve another, more general problem along the way; work toward the answer you actually need.

What does this mean in machine learning? It can probably be read as something close to end-to-end learning.

Many machine learning tasks can be expressed as the transformation, or transduction, of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech to name but a few. -- (Graves, 2012)

Machine translation is a transduction problem.

  • transductive model: <kor_seq> -> hidden -> <eng_seq>
    • A "transitive" approach, in the sense that it skips the intermediate step.
    • It can only solve the problem it was built to solve.
  • inductive model: <kor_seq> -> first establish rules such as "Korean and English have opposite word order" or "a phrase is the smallest unit of a sentence that carries meaning" -> <eng_seq>
    • The goal is to translate Korean into English, but along the way a more general problem has to be solved.

Examples of transductive learning

  • spelling correction
  • machine translation
  • speech recognition
  • text-to-speech
  • language modelling

Examples of inductive learning:

  • phrase-based translation (rules are defined first in order to translate; to solve the translation problem, a "bigger" problem has to be solved first)

Connection to the paper

I think this can be linked to the name "Transformer".

The Transformer was designed to solve transduction problems better, and a transduction problem is one of "transforming" an input sequence into an output sequence of another form; the name "Transformer" presumably comes from that.

reference

@eubinecto eubinecto self-assigned this Sep 21, 2020

eubinecto commented Sep 21, 2020

Constituency parsing

Meaning

(Figure) Two parse trees for an ambiguous sentence. The parse on the left corresponds to the humorous reading in which the elephant is in the pajamas; the parse on the right corresponds to the reading in which Captain Spaulding did the shooting in his pajamas.

Online demo

Connection to the paper

reference
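A minimal sketch (mine, not from the thread) of representing the two constituency parses of the classic ambiguous sentence "I shot an elephant in my pajamas" with NLTK's Tree; the bracketings are simplified for illustration:

```python
from nltk import Tree

# Reading 1: the PP "in my pajamas" attaches to the noun phrase
# (the elephant is in the pajamas).
np_attachment = Tree.fromstring(
    "(S (NP I) (VP (V shot) (NP (NP an elephant) (PP in my pajamas))))"
)

# Reading 2: the PP attaches to the verb phrase
# (the shooting was done in the pajamas).
vp_attachment = Tree.fromstring(
    "(S (NP I) (VP (V shot) (NP an elephant) (PP in my pajamas)))"
)

np_attachment.pretty_print()
vp_attachment.pretty_print()
```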


eubinecto commented Sep 21, 2020

Sinusoids

A word that refers to both the sine and cosine functions.

The two curves have the same shape (one is just a phase-shifted copy of the other), so a single term covering both is useful; that term is sinusoid.

In other words, "a sinusoid" can be read as "a sine function or a cosine function".

The adjective form is sinusoidal.

In the paper

That is, each dimension of the positional encoding corresponds to a sinusoid. (pg. 6)

We chose the sinusoidal version because it may allow the model to extrapolate to sequence lengths longer than the ones encountered during training. (pg.6)

positional embedding instead of sinusoids. (pg. 9, table 3)

In row (E) we replace our sinusoidal positional encoding with learned positional embeddings [9], and observe nearly identical results to the base model. (pg. 9. table 3)

references
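A quick numerical check of the claim above that a cosine is just a phase-shifted sine (this snippet is mine, not from the thread):

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
# cos(x) equals sin(x) shifted by a quarter period (pi/2)
assert np.allclose(np.cos(x), np.sin(x + np.pi / 2))
```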


eubinecto commented Sep 21, 2020

Positional Encodings

Meaning

  • sinusoidal positional encoding?

related: positional embeddings.

In the paper

To this end, we add "positional encodings" to the input embeddings at the bottoms of the encoder and decoder stacks. The positional encodings have the same dimension dmodel as the embeddings, so that the two can be summed. There are many choices of positional encodings, learned and fixed [9]. (pg. 6 > 3.5 Positional Encoding > 3. Model Architecture)

That is, each dimension of the positional encoding corresponds to a sinusoid. (pg. 6 > 3.5 Positional Encoding > 3. Model Architecture)

We apply dropout [33] to the output of each sub-layer, before it is added to the sub-layer input and normalized. In addition, we apply dropout to the sums of the embeddings and the positional encodings in both the encoder and decoder stacks. For the base model, we use a rate of Pdrop = 0.1. (pg. 8 > Residual Dropout > 5.4 Regularization > 5. Training)

In Table 3 rows (B), we observe that reducing the attention key size dk hurts model quality. This suggests that determining compatibility is not easy and that a more sophisticated compatibility function than dot product may be beneficial. We further observe in rows (C) and (D) that, as expected, bigger models are better, and dropout is very helpful in avoiding over-fitting. In row (E) we replace our sinusoidal positional encoding with learned positional embeddings [9], and observe nearly identical results to the base model. (pg.9 > 6.2 Model variations > 6. Results)

references
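A minimal NumPy sketch of the fixed sinusoidal encoding described in Section 3.5, where PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)); the function name is mine and it assumes d_model is even:

```python
import numpy as np

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    positions = np.arange(max_len)[:, None]               # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]              # (1, d_model / 2)
    angles = positions / np.power(10000, dims / d_model)  # one frequency per pair of dims
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions use cosine
    return pe                     # summed with the input embeddings, which share d_model
```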


eubinecto commented Sep 22, 2020

Residual Connection

This connects to what we covered when studying ResNet.
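A minimal PyTorch sketch of how the paper wraps each sub-layer with a residual connection, residual dropout, and layer normalization, i.e. LayerNorm(x + Sublayer(x)); the class name is mine:

```python
import torch
import torch.nn as nn

class ResidualSublayer(nn.Module):
    """Wraps a sub-layer (attention or feed-forward) as LayerNorm(x + Dropout(Sublayer(x)))."""
    def __init__(self, sublayer: nn.Module, d_model: int = 512, p_drop: float = 0.1):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(p_drop)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # the shortcut path carries x through unchanged, as in ResNet
        return self.norm(x + self.dropout(self.sublayer(x)))
```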

@eubinecto

Compatible functions

A function that can be used in place of the actual function.
e.g. a Taylor series.
e.g. a linear regression model.

Once you understand what compatible numbers are, it is immediately clear what compatible functions mean.

references
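A tiny illustration (mine) of using a Taylor series in place of the actual function: near 0, sin(x) ≈ x - x³/6.

```python
import numpy as np

x = 0.3
taylor = x - x**3 / 6            # 3rd-order Taylor polynomial of sin around 0
print(abs(np.sin(x) - taylor))   # small error (~2e-5) for small x
```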


teang1995 commented Sep 24, 2020

BLEU

A performance metric commonly used to evaluate translation models.
Sharing a link with a good write-up:
https://donghwa-kim.github.io/BLEU.html
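A minimal sketch of computing sentence-level BLEU with NLTK (the example sentences below are made up for illustration):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "is", "on", "the", "mat"]]  # list of tokenized reference translations
hypothesis = ["the", "cat", "sat", "on", "the", "mat"]  # tokenized model output

# 4-gram BLEU with uniform weights; smoothing keeps short sentences from scoring 0.
score = sentence_bleu(
    reference,
    hypothesis,
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU: {score:.3f}")
```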

@teang1995

Factorization Trick

https://arxiv.org/pdf/1703.10722.pdf
It's a concept from the paper above; I'll dig into it a bit more and write up a summary.

@teang1995 teang1995 reopened this Sep 24, 2020
@teang1995

WMT

A dataset for machine translation models.


eubinecto commented Sep 24, 2020

Desiderata

Conditions that must be met in order to achieve something.

synonyms: requirements

In the paper

  • Why self-attention

references

@teang1995

Transformer

The name of the model proposed in this paper.

@teang1995

Adam Optimizer

Yes, it's that Adam.
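For reference, a minimal sketch of the learning-rate schedule the paper pairs with Adam in Section 5.3, lrate = d_model**-0.5 * min(step**-0.5, step * warmup_steps**-1.5); the function name is mine:

```python
def transformer_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Linear warm-up for warmup_steps, then decay proportional to 1/sqrt(step)."""
    step = max(step, 1)  # guard against step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# The paper uses Adam with beta1=0.9, beta2=0.98, eps=1e-9 and feeds this
# value in as the learning rate at every training step.
```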

@teang1995

Scaled Dot-Product Attention, Multi-Head Attention

To be explained separately.
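In the meantime, a minimal NumPy sketch (mine) of scaled dot-product attention from Section 3.2.1, Attention(Q, K, V) = softmax(QKᵀ / sqrt(d_k)) V; multi-head attention runs h of these in parallel on learned linear projections of Q, K, and V:

```python
import numpy as np

def scaled_dot_product_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    # q: (batch, len_q, d_k), k: (batch, len_k, d_k), v: (batch, len_k, d_v)
    d_k = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)  # compatibility of each query with each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the key positions
    return weights @ v                                # (batch, len_q, d_v)
```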
