# `textblob` - An Object-Oriented NLP Library

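`textblob` is built on top of `NLTK` and bundles a wide range of features that make text processing easier.
As the [textblob](https://textblob.readthedocs.io/en/latest/) website puts it, the motto is "Simplified Text Processing": once you create a `TextBlob` object, its main methods keep most text-processing tasks simple. The documentation lists the following features: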

- Noun phrase extraction
- Part-of-speech tagging
- Sentiment analysis
- Classification (Naive Bayes, Decision Tree)
- Language translation and detection powered by Google Translate
- Tokenization (splitting text into words and sentences)
- Word and phrase frequencies
- Parsing
- n-grams
- Word inflection (pluralization and singularization) and lemmatization
- Spelling correction
- Add new models or languages through extensions
- WordNet integration
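
To make two of the listed features concrete, here is a minimal plain-Python sketch of n-grams and word frequencies. It is not TextBlob's implementation: the helper `ngrams` and the sample sentence are hypothetical, and tokenization is a naive whitespace split. TextBlob exposes the same ideas through `blob.ngrams(n=...)` and `blob.word_counts`.

```python
from collections import Counter


def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Naive whitespace tokenization, an assumption made only to keep the sketch short.
words = "the blob ate the town and the blob grew".split()

bigrams = ngrams(words, 2)
frequencies = Counter(words)

print(bigrams[:3])                 # [('the', 'blob'), ('blob', 'ate'), ('ate', 'the')]
print(frequencies.most_common(2))  # [('the', 3), ('blob', 2)]
```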


# Installation

To use `textblob`, first install the library itself, then install the NLTK corpora that TextBlob relies on.

With `conda`, the following command installs the `textblob` library.

`$ conda install -c conda-forge textblob`

Next, install the NLTK corpora. The download command after this list fetches the following packages:

- Brown Corpus: part-of-speech tagging
- Punkt: English sentence tokenization
- WordNet: word definitions, synonyms, and antonyms
- Averaged Perceptron Tagger: part-of-speech tagging
- conll2000: chunking text into components such as noun and verb phrases
- Movie Reviews: sentiment analysis

`$ ipython -m textblob.download_corpora`

> `$ ipython -m textblob.download_corpora
> [nltk_data] Downloading package brown to
> [nltk_data]   Package brown is already up-to-date!
> [nltk_data] Downloading package punkt to
> [nltk_data]   Package punkt is already up-to-date!
> [nltk_data] Downloading package wordnet to
> [nltk_data]   Package wordnet is already up-to-date!
> [nltk_data] Downloading package averaged_perceptron_tagger to
> [nltk_data]   Unzipping taggers\averaged_perceptron_tagger.zip.
> [nltk_data] Downloading package conll2000 to
> [nltk_data]   Package conll2000 is already up-to-date!
> [nltk_data] Downloading package movie_reviews to
> [nltk_data]   Package movie_reviews is already up-to-date!
> Finished.`
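
To check from Python whether the corpora are already present, something like the following sketch may help. The helper name `missing_corpora` is hypothetical, and the resource paths are assumed from the usual `nltk_data` layout; resource names can differ between NLTK releases, so treat the list as a starting point.

```python
def missing_corpora(resources):
    """Return the NLTK resource paths that cannot be found locally."""
    try:
        import nltk
    except ImportError:
        # Without NLTK itself, none of the resources can be available.
        return list(resources)

    missing = []
    for path in resources:
        try:
            nltk.data.find(path)  # raises LookupError when the resource is absent
        except LookupError:
            missing.append(path)
    return missing


# Resource paths assumed from the standard nltk_data directory layout.
REQUIRED = [
    "corpora/brown",
    "tokenizers/punkt",
    "corpora/wordnet",
    "taggers/averaged_perceptron_tagger",
    "corpora/conll2000",
    "corpora/movie_reviews",
]

print(missing_corpora(REQUIRED))
```

An empty list suggests that every listed resource was found; anything printed is a candidate for re-running the download command.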

# textblob Hello World

In [13]:
from textblob import TextBlob
import pandas as pd

text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''

blob = TextBlob(text)

for sentence in blob.sentences:
    print(f"- sentiment score {sentence.sentiment.polarity} : {sentence}")

- sentiment score 0.06000000000000001 : 
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
- sentiment score -0.34166666666666673 : Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
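
TextBlob's polarity score is a float in the range [-1.0, 1.0], where negative values indicate negative sentiment. One way to read the two scores printed above is to bucket them into coarse labels. The helper `label_polarity` and its threshold of 0.1 are assumptions chosen for illustration, not part of TextBlob.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    The threshold of 0.1 is an arbitrary assumption, not a TextBlob default.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# The two sentence scores printed above.
for score in (0.06000000000000001, -0.34166666666666673):
    print(f"{score:+.3f} -> {label_polarity(score)}")
# +0.060 -> neutral
# -0.342 -> negative
```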


In [8]:
blob.words

WordList(['The', 'titular', 'threat', 'of', 'The', 'Blob', 'has', 'always', 'struck', 'me', 'as', 'the', 'ultimate', 'movie', 'monster', 'an', 'insatiably', 'hungry', 'amoeba-like', 'mass', 'able', 'to', 'penetrate', 'virtually', 'any', 'safeguard', 'capable', 'of', 'as', 'a', 'doomed', 'doctor', 'chillingly', 'describes', 'it', 'assimilating', 'flesh', 'on', 'contact', 'Snide', 'comparisons', 'to', 'gelatin', 'be', 'damned', 'it', "'s", 'a', 'concept', 'with', 'the', 'most', 'devastating', 'of', 'potential', 'consequences', 'not', 'unlike', 'the', 'grey', 'goo', 'scenario', 'proposed', 'by', 'technological', 'theorists', 'fearful', 'of', 'artificial', 'intelligence', 'run', 'rampant'])

In [17]:
blob.tags[:5]
# [('The', 'DT'),
#  ('titular', 'JJ'),
#  ('threat', 'NN'),
#  ('of', 'IN'),
#  ('The', 'DT')]
text_df = pd.DataFrame(blob.tags, columns=['word', 'pos'])

text_df.groupby('pos').count()

| pos | word |
|-----|------|
| DT  | 9    |
| IN  | 10   |
| JJ  | 12   |
| NN  | 16   |
| NNP | 1    |
| NNS | 3    |
| PRP | 3    |
| RB  | 5    |
| RBS | 1    |
| TO  | 2    |
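
The tags follow the Penn Treebank convention (for example `DT` for determiner, `JJ` for adjective, `NN` for singular noun, `IN` for preposition). If pandas is not needed, the same kind of tally can be sketched with `collections.Counter`. The example below uses only the first five (word, tag) pairs shown in the output above, so its counts are for that small sample, not the full text.

```python
from collections import Counter

# The first five (word, tag) pairs shown in the output above.
tags = [("The", "DT"), ("titular", "JJ"), ("threat", "NN"), ("of", "IN"), ("The", "DT")]

# Count how often each tag occurs, ignoring the words themselves.
tag_counts = Counter(tag for _, tag in tags)

# Print tags from most to least frequent.
for tag, count in tag_counts.most_common():
    print(tag, count)
```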


In [18]:
blob.noun_phrases

WordList(['titular threat', 'blob', 'ultimate movie monster', 'amoeba-like mass', 'snide', 'potential consequences', 'grey goo scenario', 'technological theorists fearful', 'artificial intelligence run rampant'])

# WordNet Dictionary

## Word Definitions

In [58]:
from textblob import Word
from textblob.wordnet import VERB
word = Word("happy")
word.definitions

['enjoying or showing or marked by joy or pleasure',
 'marked by good fortune',
 'eagerly disposed to act or to be of service',
 'well expressed and to the point']

## Synonyms

In [59]:
word.synsets

[Synset('happy.a.01'),
 Synset('felicitous.s.02'),
 Synset('glad.s.02'),
 Synset('happy.s.04')]

In [60]:
synonyms = set()
for synset in word.synsets:
    for lemma in synset.lemmas():
        synonyms.add(lemma.name())
        
print(synonyms)        

{'felicitous', 'glad', 'happy', 'well-chosen'}


## Antonyms

In [61]:
lemmas = word.synsets[0].lemmas()
lemmas

[Lemma('happy.a.01.happy')]

In [62]:
lemmas[0].antonyms()

[Lemma('unhappy.a.01.unhappy')]
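
The cells above look only at the first lemma of the first synset. A possible generalization, sketched here under the assumption that every synset offers `lemmas()` and every lemma offers `antonyms()` and `name()` as used above, walks all synsets and collects every antonym name. The function name `collect_antonyms` is hypothetical.

```python
def collect_antonyms(synsets):
    """Gather antonym names from every lemma of every synset."""
    antonyms = set()
    for synset in synsets:
        for lemma in synset.lemmas():
            # A lemma may have zero, one, or several antonyms.
            for antonym in lemma.antonyms():
                antonyms.add(antonym.name())
    return antonyms
```

With the `word` object defined earlier, the call would be `collect_antonyms(word.synsets)`. Because the result is a set, the same antonym reached through several lemmas appears only once.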