# Keyword Extraction and Summarization Methods

### ■ Simple TextRank Implementation
#### ※ Gensim package
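- A package that maps text to vectors in a way that takes meaning and similarity into account
- Supports a range of NLP tasks, including topic modeling, document embeddings, word embeddings, and similarity measurement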
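As a rough illustration of the "text as vectors plus similarity" idea, the sketch below maps sentences to bag-of-words count vectors and compares them with cosine similarity. It is plain Python, not gensim: gensim offers `corpora.Dictionary.doc2bow` for this kind of vector and embedding models that capture meaning rather than mere word overlap. `to_bow` and `cosine` are hypothetical helpers written for this example.

```python
import math
import re
from collections import Counter

def to_bow(text):
    """Map text to a sparse bag-of-words vector {lowercased word: count}."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

s1 = to_bow("Aromas are very mild wheat beer with a little fruit and mineral.")
s2 = to_bow("Flavors are much more pronounced, again wheat beer with peaches and apricot coming through.")
s3 = to_bow("Slightly soft carbonation.")
print(round(cosine(s1, s2), 3))  # → 0.386 (shared words: are, wheat, beer, with, and)
print(cosine(s1, s3))            # → 0.0 (no shared words)
```

Because these vectors only count surface words, "beer" and "lager" would look unrelated here. Learned embeddings are meant to close exactly that gap.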

In [2]:
# The summarization module was removed in recent gensim releases (4.0 and later), so an older 3.x version must be installed
# pip3 install gensim==3.6.0
from gensim.summarization.summarizer import summarize

In [12]:
text = "Aromas are very mild wheat beer with a little fruit and mineral. Flavors are much more pronounced, again wheat beer with peaches and apricot coming through. Kind of a riff on a Belgian wit. Slightly soft carbonation. Okay beer but nothing special."

In [13]:
print(summarize(text, ratio=0.2))

Flavors are much more pronounced, again wheat beer with peaches and apricot coming through.
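Because `gensim.summarization` no longer exists in gensim 4.x, here is a dependency-free sketch of TextRank-style extractive summarization. Sentences become graph nodes, edges are weighted by the word-overlap similarity from Mihalcea and Tarau's original TextRank paper (shared words divided by the sum of the log sentence lengths, here counting distinct words), and a weighted PageRank scores each sentence. This is not gensim's implementation: gensim 3.x weighted edges with BM25 and applied its own text cleaning. The regex sentence splitter and all function names are assumptions made for this example.

```python
import math
import re

def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def word_set(sentence):
    # Distinct lowercase alphabetic tokens; no stopword removal or stemming
    return set(re.findall(r"[a-z]+", sentence.lower()))

def overlap_similarity(a, b):
    # Mihalcea & Tarau: |shared words| / (log|Si| + log|Sj|)
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(a & b) / denom if denom > 0 else 0.0

def weighted_pagerank(sim, d=0.85, iters=50):
    n = len(sim)
    out = [sum(row) for row in sim]  # total edge weight leaving each node
    scores = [1.0] * n
    for _ in range(iters):
        # Each neighbor j passes on its score in proportion to the edge weight j->i
        scores = [
            (1 - d) + d * sum(sim[j][i] / out[j] * scores[j]
                              for j in range(n) if sim[j][i] > 0)
            for i in range(n)
        ]
    return scores

def textrank_summarize(text, ratio=0.2):
    sents = split_sentences(text)
    toks = [word_set(s) for s in sents]
    n = len(sents)
    sim = [[overlap_similarity(toks[i], toks[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    scores = weighted_pagerank(sim)
    k = max(1, int(n * ratio))  # keep at least one sentence
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return " ".join(sents[i] for i in sorted(top))  # restore original order

text = "Aromas are very mild wheat beer with a little fruit and mineral. Flavors are much more pronounced, again wheat beer with peaches and apricot coming through. Kind of a riff on a Belgian wit. Slightly soft carbonation. Okay beer but nothing special."
print(textrank_summarize(text, ratio=0.2))
# → Aromas are very mild wheat beer with a little fruit and mineral.
```

On this review the sketch selects the first sentence, not the one gensim returned above. That difference is expected given the different similarity function and the absence of stopword removal: words such as "are", "with" and "and" make the first two sentences look strongly connected.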


### ■ TextRank Algorithm for Extractive Text Summarization

In [24]:
import spacy
import pytextrank

In [18]:
# If this is your first time using spaCy, you need to download a spaCy model first. This example uses en_core_web_lg, which you can install by running:
# !python -m spacy download en_core_web_lg

Collecting en-core-web-lg==3.6.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.6.0/en_core_web_lg-3.6.0-py3-none-any.whl (587.7 MB)

In [25]:
# Create spaCy pipeline and add textrank to it

nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("textrank")

<pytextrank.base.BaseTextRankFactory at 0x1f82f86feb0>

In [34]:
example_text = "this is like budlight but with slight corn taste not apple way too much carbonation hops awol malt nowhere to be seen i wonder what they really make this out of"

In [35]:
doc = nlp(example_text)

In [36]:
for sent in doc._.textrank.summary(limit_phrases=2, limit_sentences=2):
    print(sent)

this is like budlight but with slight corn taste not apple way too much carbonation hops awol malt nowhere to be seen i wonder what they really make this out of


In [37]:
phrases_and_ranks = [ 
    (phrase.chunks[0], phrase.rank) for phrase in doc._.phrases
]
phrases_and_ranks[:10]

[(apple way too much carbonation hops awol malt, 0.17047999894872246),
 (slight corn taste, 0.14239390991137643),
 (budlight, 0.0732112854323275),
 (i, 0.0),
 (they, 0.0),
 (this, 0.0),
 (what, 0.0)]
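The phrase ranks above are computed from a word graph. Below is a rough sketch of the keyword side of TextRank. It assumes a hand-picked stopword list in place of pytextrank's part-of-speech filtering: the remaining candidate words become nodes, directly adjacent candidates are linked, and an unweighted PageRank scores each word. pytextrank goes further and aggregates word scores into noun-chunk phrases, which this sketch does not do. `keyword_textrank` and `STOPWORDS` are hypothetical names.

```python
import re

# Hand-picked function words (an assumption for this sketch; pytextrank
# selects candidate words by part-of-speech tag instead)
STOPWORDS = {"a", "an", "and", "be", "but", "i", "is", "like", "much", "not",
             "of", "out", "really", "they", "this", "to", "too", "what", "with"}

def keyword_textrank(text, window=2, d=0.85, iters=50, top_k=5):
    """Rank single words with TextRank over a co-occurrence graph."""
    words = re.findall(r"[a-z]+", text.lower())
    candidates = {w for w in words if w not in STOPWORDS}
    # Link two candidates that occur fewer than `window` positions apart in the
    # original token sequence (window=2 means directly adjacent)
    neighbors = {w: set() for w in candidates}
    for i, w in enumerate(words):
        if w not in candidates:
            continue
        for u in words[i + 1:i + window]:
            if u in candidates and u != w:
                neighbors[w].add(u)
                neighbors[u].add(w)
    # Unweighted PageRank: each word splits its score evenly among its neighbors
    scores = {w: 1.0 for w in candidates}
    for _ in range(iters):
        scores = {
            w: (1 - d) + d * sum(scores[u] / len(neighbors[u]) for u in neighbors[w])
            for w in candidates
        }
    # Highest score first; ties broken alphabetically for a stable order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

example_text = "this is like budlight but with slight corn taste not apple way too much carbonation hops awol malt nowhere to be seen i wonder what they really make this out of"
print([w for w, _ in keyword_textrank(example_text, top_k=3)])  # → ['corn', 'hops', 'malt']
```

Words that sit in the middle of a run of content words ("corn" in "slight corn taste", "hops" and "malt" in "carbonation hops awol malt nowhere") collect the most score. Isolated words such as "budlight" stay at the base score `1 - d`, so the result depends heavily on the stopword list and window size.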

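### 😑 Conclusion
> TextRank is easy to implement, but it is correspondingly limited as a keyword-extraction model.  
It cannot describe a beer's flavor or characteristics concisely in two or three words: the top-ranked phrase above, "apple way too much carbonation hops awol malt", runs to eight words.  
Even when a phrase is concise, its meaning is often unclear.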