## <a id='toc1_1_'></a>[Language Model Evaluation Methods](#toc0_)

**Table of contents**<a id='toc0_'></a>    
- [Language Model Evaluation Methods](#toc1_1_)    
    - [PPL (Perplexity)](#toc1_1_1_)    
    - [Modified Unigram Precision](#toc1_1_2_)    
    - [BLEU (Bilingual Evaluation Understudy)](#toc1_1_3_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

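### <a id='toc1_1_1_'></a>[PPL (Perplexity)](#toc0_)
- Perplexity quantifies how uncertain a language model is about its predictions; lower is better.
- When the candidate tokens all receive similar probabilities (a near-uniform distribution), uncertainty is high, so the model is considered to perform poorly.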
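
The idea can be sketched in a few lines. This is a minimal illustration, not part of the original notebook: the helper name `perplexity` and the probability lists are hypothetical, and the function assumes you already have the probability the model assigned to each observed token.

```python
import math

def perplexity(token_probs):
    """Perplexity from the probabilities a model assigned to each observed token."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    # Average negative log-likelihood per token, then exponentiate
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# Hypothetical probabilities: a model choosing uniformly among 4 options
# versus a model that is confident about every token
uniform = perplexity([0.25, 0.25, 0.25, 0.25])
confident = perplexity([0.9, 0.8, 0.95, 0.9])
print(uniform, confident)
```

The uniform case gives a perplexity of about 4, as if the model were picking among four equally likely tokens at each step, while the confident model stays close to 1.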

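### <a id='toc1_1_2_'></a>[Modified Unigram Precision](#toc0_)
- Treat the machine translation as the prediction (candidate) and the human translations as the ground truth (references).
- How often a word appears in the references is considered more meaningful than how often it appears in the candidate,
- so each candidate word count is clipped to $Max$, the largest number of times the word appears in any single reference.
- $Count_{clip}\ =\ min(Count,\ Max)$
- $\text{Modified Unigram Precision =}\frac{\sum_{unigram∈Candidate}\ Count_{clip}(unigram)}{\sum_{unigram∈Candidate}\ Count(unigram)}$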
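
As a worked instance of the two formulas above, take the candidate `the the the the the the the` with the references `the cat is on the mat` and `there is a cat on the mat`. The word `the` occurs 7 times in the candidate and at most 2 times in a single reference, so:

$$
Count_{clip}(the) = min(7,\ max(2,\ 1)) = 2, \qquad \text{Modified Unigram Precision} = \frac{2}{7} \approx 0.286
$$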

In [None]:
import numpy as np
from collections import Counter
from nltk import ngrams

In [None]:
# Count the n-grams in a tokenized sentence,
# i.e. how many times each n-gram appears in the candidate (machine translation)
def simple_count(tokens, n):
    return Counter(ngrams(tokens, n))

In [8]:
candidate = "It is a guide to action which ensures that the military always obeys the commands of the party."
tokens = candidate.split() # tokenize on whitespace
result = simple_count(tokens, 1) # n = 1 gives unigrams
print('Unigram counts:', result)


Unigram counts: Counter({('the',): 3, ('It',): 1, ('is',): 1, ('a',): 1, ('guide',): 1, ('to',): 1, ('action',): 1, ('which',): 1, ('ensures',): 1, ('that',): 1, ('military',): 1, ('always',): 1, ('obeys',): 1, ('commands',): 1, ('of',): 1, ('party.',): 1})


In [9]:
candidate = 'the the the the the the the'
tokens = candidate.split() # tokenize on whitespace
result = simple_count(tokens, 1) # n = 1 gives unigrams
print('Unigram counts:', result)

Unigram counts: Counter({('the',): 7})


In [None]:
# Clip each candidate n-gram count by how often that n-gram appears in the references (human translations),
# so that only the occurrences supported by a reference are counted

def count_clip(candidate, reference_list, n):
    # Count the n-grams in the candidate
    ca_cnt = simple_count(candidate, n)
    max_ref_cnt_dict = dict()

    for ref in reference_list:
        # Count the n-grams in this reference
        ref_cnt = simple_count(ref, n) # e.g. ref_cnt : Counter({('the',): 2, ('cat',): 1, ('is',): 1, ('on',): 1, ('mat',): 1})
        print(ref_cnt) # debug: per-reference counts

        # Track the maximum count of each n-gram across the references
        for n_gram in ref_cnt:
            if n_gram in max_ref_cnt_dict: # e.g. n_gram : ('the',)
                print(n_gram) # debug: n-gram already seen in an earlier reference
                max_ref_cnt_dict[n_gram] = max(ref_cnt[n_gram], max_ref_cnt_dict[n_gram])
            else:
                max_ref_cnt_dict[n_gram] = ref_cnt[n_gram]

    return {
        # count_clip = min(count, max_ref_count)
        # ca_cnt.get(n_gram, 0) returns the count of n_gram in ca_cnt, or 0 if it is absent
        n_gram: min(ca_cnt.get(n_gram, 0), max_ref_cnt_dict.get(n_gram, 0)) for n_gram in ca_cnt
    }



In [30]:
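a = Counter({('the',): 2, ('cat',): 1, ('is',): 1, ('on',): 1, ('mat',): 1})
for w in a:
    word = w
# The keys are 1-tuples such as ('the',), so the plain string 'the' is not a key
# and get() falls back to the default value 0
a.get('the', 0)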

0

In [None]:
# Check the result
# 'the' appears 7 times in the candidate,
# but at most 2 times in any single reference, so only 2 occurrences count as meaningful
candidate = 'the the the the the the the'
references = [
    'the cat is on the mat',
    'there is a cat on the mat'
]
result = count_clip(candidate.split(), list(map(lambda ref: ref.split(), references)), 1)
print('Clipped unigram counts:', result)

# map() applies a function to every element of an iterable and returns a new map object


Counter({('the',): 2, ('cat',): 1, ('is',): 1, ('on',): 1, ('mat',): 1})
Counter({('there',): 1, ('is',): 1, ('a',): 1, ('cat',): 1, ('on',): 1, ('the',): 1, ('mat',): 1})
('is',)
('cat',)
('on',)
('the',)
('mat',)
Clipped unigram counts: {('the',): 2}


In [31]:
def modified_precision(candidate, reference_list, n):
  clip_cnt = count_clip(candidate, reference_list, n)
  total_clip_cnt = sum(clip_cnt.values()) # numerator

  cnt = simple_count(candidate, n)
  total_cnt = sum(cnt.values()) # denominator

  # Guard against division by zero
  if total_cnt == 0:
    total_cnt = 1

  # numerator: sum of clipped counts, denominator: sum of raw counts ==> modified precision
  return (total_clip_cnt / total_cnt)


In [None]:
result = modified_precision(candidate.split(), list(map(lambda ref: ref.split(), references)), n=1)
print('Modified unigram precision:', result)

Counter({('the',): 2, ('cat',): 1, ('is',): 1, ('on',): 1, ('mat',): 1})
Counter({('there',): 1, ('is',): 1, ('a',): 1, ('cat',): 1, ('on',): 1, ('the',): 1, ('mat',): 1})
('is',)
('cat',)
('on',)
('the',)
('mat',)
Modified unigram precision: 0.2857142857142857


### <a id='toc1_1_3_'></a>[BLEU (Bilingual Evaluation Understudy)](#toc0_)
- To take word order into account, extend the unigram precision ($p_1$) to the n-gram precision ($p_n$).
- $p_{n}=\frac{\sum_{n\text{-}gram∈Candidate}\ Count_{clip}(n\text{-}gram)}{\sum_{n\text{-}gram∈Candidate}\ Count(n\text{-}gram)}$
- $BLEU = exp(\sum_{n=1}^{N}w_{n}\ \text{log}\ p_{n})$<br><br>




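- A candidate that is too long is already penalized through the n-gram precisions, which reflect word order.
- A candidate that is too short, by contrast, can still score high on precision, so it needs a separate penalty.
- This is called the brevity penalty (BP).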


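- $BLEU = BP × exp(\sum_{n=1}^{N}w_{n}\ \text{log}\ p_{n})$
- $BP = \begin{cases}1&\text{if}\space c>r\\ e^{(1-r/c)}&\text{if}\space c \leq r \end{cases}$
- Here $c$ is the length of the candidate and $r$ is the length of the reference closest in length to the candidate.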
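
To see how the two factors combine, here is a small sketch that plugs numbers into the formulas above. It is not the notebook's implementation: the helper `bleu_from_precisions` and the precision values are hypothetical, and it assumes every $p_n$ is strictly positive so that the logarithm is defined.

```python
import math

def bleu_from_precisions(precisions, weights, c, r):
    """Combine n-gram precisions and lengths into a BLEU score (all precisions > 0)."""
    # Brevity penalty: 1 when the candidate is longer than the reference, else exp(1 - r/c)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Weighted geometric mean of the n-gram precisions
    log_mean = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return bp * math.exp(log_mean)

weights = [0.25, 0.25, 0.25, 0.25]
precisions = [0.5, 0.5, 0.5, 0.5]  # hypothetical p_1..p_4

long_enough = bleu_from_precisions(precisions, weights, c=7, r=6)  # BP = 1
too_short = bleu_from_precisions(precisions, weights, c=5, r=6)    # BP = exp(-0.2)
print(long_enough, too_short)
```

With identical precisions, the shorter candidate ends up with the lower score purely because of the brevity penalty.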

In [None]:
# Return the length of the reference whose length is closest to the candidate's
def closest_ref_length(candidate, reference_list):
  ca_len = len(candidate) # candidate length
  ref_lens = (len(ref) for ref in reference_list) # reference lengths
  # Pick the reference length with the smallest absolute difference; ties go to the shorter reference
  closest_ref_len = min(ref_lens, key=lambda ref_len: (abs(ref_len - ca_len), ref_len))
  return closest_ref_len

In [34]:
def brevity_penalty(candidate, reference_list):
  ca_len = len(candidate)
  ref_len = closest_ref_length(candidate, reference_list)

  if ca_len > ref_len:
    return 1

  # An empty candidate gives BP = 0, hence BLEU = 0.0
  elif ca_len == 0 :
    return 0
  else:
    return np.exp(1 - ref_len/ca_len)


In [35]:
def bleu_score(candidate, reference_list, weights=(0.25, 0.25, 0.25, 0.25)):
  bp = brevity_penalty(candidate, reference_list) # brevity penalty, BP

  # p1, p2, p3, ..., pn
  p_n = [modified_precision(candidate, reference_list, n=n) for n, _ in enumerate(weights, start=1)]
  # log(0) is undefined, so a zero precision is skipped (contributes 0 to the sum).
  # Note: under the formula above, a zero p_n would make the geometric mean 0.
  score = np.sum([w_i * np.log(p_i) if p_i != 0 else 0 for w_i, p_i in zip(weights, p_n)])
  return bp * np.exp(score)


In [36]:
import nltk.translate.bleu_score as bleu

candidate = 'It is a guide to action which ensures that the military always obeys the commands of the party'
references = [
    'It is a guide to action that ensures that the military will forever heed Party commands',
    'It is the guiding principle which guarantees the military forces always being under the command of the Party',
    'It is the practical guide for the army always to heed the directions of the party'
]

print('BLEU (this notebook):', bleu_score(candidate.split(), list(map(lambda ref: ref.split(), references))))
print('BLEU (NLTK):', bleu.sentence_bleu(list(map(lambda ref: ref.split(), references)), candidate.split()))


Counter({('that',): 2, ('It',): 1, ('is',): 1, ('a',): 1, ('guide',): 1, ('to',): 1, ('action',): 1, ('ensures',): 1, ('the',): 1, ('military',): 1, ('will',): 1, ('forever',): 1, ('heed',): 1, ('Party',): 1, ('commands',): 1})
Counter({('the',): 4, ('It',): 1, ('is',): 1, ('guiding',): 1, ('principle',): 1, ('which',): 1, ('guarantees',): 1, ('military',): 1, ('forces',): 1, ('always',): 1, ('being',): 1, ('under',): 1, ('command',): 1, ('of',): 1, ('Party',): 1})
('It',)
('is',)
('the',)
('military',)
('Party',)
Counter({('the',): 4, ('It',): 1, ('is',): 1, ('practical',): 1, ('guide',): 1, ('for',): 1, ('army',): 1, ('always',): 1, ('to',): 1, ('heed',): 1, ('directions',): 1, ('of',): 1, ('party',): 1})
('It',)
('is',)
('the',)
('guide',)
('always',)
('to',)
('heed',)
('of',)
Counter({('It', 'is'): 1, ('is', 'a'): 1, ('a', 'guide'): 1, ('guide', 'to'): 1, ('to', 'action'): 1, ('action', 'that'): 1, ('that', 'ensures'): 1, ('ensures', 'that'): 1, ('that', 'the'): 1, ('the', 'militar