# Aspect-Based Sentiment Analysis with VADER

This notebook shows how to perform aspect-based sentiment analysis with [VADER](https://github.com/cjhutto/vaderSentiment). VADER is a lexicon- and rule-based sentiment analysis tool designed for social media text.

## Setup

### Installing the packages

In [None]:
!pip install -q vaderSentiment==3.3.2 nltk==3.2.5


### Imports

In [None]:
import re
import string
from pprint import pprint

import nltk
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from nltk.tokenize import RegexpTokenizer, word_tokenize
nltk.download("punkt")
nltk.download("vader_lexicon")

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package vader_lexicon to /root/nltk_data...


True

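## Analysis with VADER

VADER's `compound` score ranges from -1 to 1: values near -1 indicate negative sentiment and values near 1 indicate positive sentiment. The `neg`, `neu`, and `pos` scores are the proportions of the text that fall into each category.

Let's analyze a few sentences.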

In [None]:
# A clearly positive review.
positive = "This fried chicken tastes very good. It is juicy and perfectly cooked."

# A clearly negative review.
negative = "This fried chicken tasted bad. It is dry and overcooked."

# A mixed review: the fried chicken is praised, everything else is criticized.
ambiguous = "Except the amazing fried chicken everything else at the restaurant tastes very bad."

With `SentimentIntensityAnalyzer`, you can compute sentiment scores for a piece of text.

In [None]:
def sentiment_analyzer_scores(text):
    sentiment_analyzer = SentimentIntensityAnalyzer()
    score = sentiment_analyzer.polarity_scores(text)
    pprint(text)
    pprint(score)
    print("-" * 30)

In [None]:
print("Positive:")
sentiment_analyzer_scores(positive)

print("Negative:")
sentiment_analyzer_scores(negative)

print("Ambiguous:")
sentiment_analyzer_scores(ambiguous)

Positive:
'This fried chicken tastes very good. It is juicy and perfectly cooked.'
{'compound': 0.8122, 'neg': 0.0, 'neu': 0.575, 'pos': 0.425}
------------------------------
Negative:
'This fried chicken tasted bad. It is dry and overcooked.'
{'compound': -0.5423, 'neg': 0.28, 'neu': 0.72, 'pos': 0.0}
------------------------------
Ambiguous:
('Except the amazing fried chicken everything else at the restaurant tastes '
 'very bad.')
{'compound': 0.0018, 'neg': 0.204, 'neu': 0.592, 'pos': 0.204}
------------------------------
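
The compound scores above can be turned into coarse labels with a pair of cut-offs. The following is a minimal sketch: the helper name `label_from_compound` is hypothetical, and the ±0.05 threshold follows the convention suggested in the VADER README, so treat it as a tunable assumption rather than a fixed rule.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse sentiment label.

    The default threshold is an assumption taken from the convention
    suggested in the VADER README; adjust it for your own data.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound scores copied from the output above.
examples = {"positive": 0.8122, "negative": -0.5423, "ambiguous": 0.0018}
for name, compound in examples.items():
    print(f"{name}: {compound:+.4f} -> {label_from_compound(compound)}")
```

Under this threshold the mixed review is labeled neutral, because its positive and negative parts nearly cancel out in the sentence-level score.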


Next, let's use the scores returned by `polarity_scores` to classify each word as positive, negative, or neutral.

In [None]:
def get_word_sentiment(text, sentiment_analyzer):
    """Print the tokens of `text` grouped by each token's compound score."""
    tokenized_text = nltk.word_tokenize(text)

    positive_words = []
    neutral_words = []
    negative_words = []
    for word in tokenized_text:
        # Score each token once, in isolation from its context.
        compound = sentiment_analyzer.polarity_scores(word)["compound"]
        if compound >= 0.1:
            positive_words.append(word)
        elif compound <= -0.1:
            negative_words.append(word)
        else:
            neutral_words.append(word)
    print(text)
    print("Positive:", positive_words)
    print("Negative:", negative_words)
    print("Neutral:", neutral_words)
    print("-" * 30)

In [None]:
sentiment_analyzer = SentimentIntensityAnalyzer()
get_word_sentiment(positive, sentiment_analyzer)
get_word_sentiment(negative, sentiment_analyzer)
get_word_sentiment(ambiguous, sentiment_analyzer)

This fried chicken tastes very good. It is juicy and perfectly cooked.
Positive: ['good', 'perfectly']
Negative: []
Neutral: ['This', 'fried', 'chicken', 'tastes', 'very', '.', 'It', 'is', 'juicy', 'and', 'cooked', '.']
------------------------------
This fried chicken tasted bad. It is dry and overcooked.
Positive: []
Negative: ['bad']
Neutral: ['This', 'fried', 'chicken', 'tasted', '.', 'It', 'is', 'dry', 'and', 'overcooked', '.']
------------------------------
Except the amazing fried chicken everything else at the restaurant tastes very bad.
Positive: ['amazing']
Negative: ['bad']
Neutral: ['Except', 'the', 'fried', 'chicken', 'everything', 'else', 'at', 'the', 'restaurant', 'tastes', 'very', '.']
------------------------------
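
`get_word_sentiment` prints its result, which makes the grouping hard to reuse in later steps. As a rough sketch, the same bucketing can be written as a function that returns the groups instead. The name `bucket_tokens`, the `score_fn` parameter, and the toy scores below are all hypothetical and chosen only for illustration; they are not VADER's actual values.

```python
def bucket_tokens(tokens, score_fn, threshold=0.1):
    """Group tokens into positive / negative / neutral by a per-token score."""
    buckets = {"positive": [], "negative": [], "neutral": []}
    for token in tokens:
        score = score_fn(token)
        if score >= threshold:
            buckets["positive"].append(token)
        elif score <= -threshold:
            buckets["negative"].append(token)
        else:
            buckets["neutral"].append(token)
    return buckets


# Toy scorer with made-up values (an assumption for this sketch).
toy_scores = {"amazing": 0.6, "bad": -0.5}
result = bucket_tokens(
    ["Except", "the", "amazing", "fried", "chicken", "bad"],
    lambda word: toy_scores.get(word.lower(), 0.0),
)
print(result)
```

To use it with the analyzer from this notebook, you could pass `lambda w: sentiment_analyzer.polarity_scores(w)["compound"]` as `score_fn`.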


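Now let's update the lexicon that the analyzer uses. The classification changes: `dry` and `overcooked` are now recognized as negative.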

In [None]:
new_words = {
    'dry': -2.0,
    'overcooked': -2.0,
}
sentiment_analyzer.lexicon.update(new_words)
get_word_sentiment(positive, sentiment_analyzer)
get_word_sentiment(negative, sentiment_analyzer)
get_word_sentiment(ambiguous, sentiment_analyzer)

This fried chicken tastes very good. It is juicy and perfectly cooked.
Positive: ['good', 'perfectly']
Negative: []
Neutral: ['This', 'fried', 'chicken', 'tastes', 'very', '.', 'It', 'is', 'juicy', 'and', 'cooked', '.']
------------------------------
This fried chicken tasted bad. It is dry and overcooked.
Positive: []
Negative: ['bad', 'dry', 'overcooked']
Neutral: ['This', 'fried', 'chicken', 'tasted', '.', 'It', 'is', 'and', '.']
------------------------------
Except the amazing fried chicken everything else at the restaurant tastes very bad.
Positive: ['amazing']
Negative: ['bad']
Neutral: ['Except', 'the', 'fried', 'chicken', 'everything', 'else', 'at', 'the', 'restaurant', 'tastes', 'very', '.']
------------------------------


Finally, restore the lexicon to its original state.

In [None]:
sentiment_analyzer.lexicon.pop("dry")
sentiment_analyzer.lexicon.pop("overcooked")
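
Removing the entries by hand assumes that neither word was in the lexicon before the update; if one had been, `pop` would delete the original entry instead of restoring it. One possible alternative is a small context manager that remembers the previous values and puts them back. This is a sketch only: `temporary_lexicon` is a hypothetical helper, and the example runs on a plain dictionary with a placeholder value standing in for `sentiment_analyzer.lexicon`.

```python
from contextlib import contextmanager

_MISSING = object()  # sentinel for "word was not in the lexicon"


@contextmanager
def temporary_lexicon(lexicon, overrides):
    """Apply `overrides` to a dict-like lexicon, then restore it on exit."""
    previous = {word: lexicon.get(word, _MISSING) for word in overrides}
    lexicon.update(overrides)
    try:
        yield lexicon
    finally:
        for word, old_value in previous.items():
            if old_value is _MISSING:
                lexicon.pop(word, None)   # word was new: remove it
            else:
                lexicon[word] = old_value  # word existed: restore its value


# Stand-in dictionary; the value for "bad" is a placeholder, not VADER's.
toy_lexicon = {"bad": -2.5}
with temporary_lexicon(toy_lexicon, {"dry": -2.0, "overcooked": -2.0, "bad": -3.0}):
    inside = dict(toy_lexicon)  # snapshot while the overrides are active

print(inside)
print(toy_lexicon)
```

With the notebook's analyzer, the `with` block would wrap the calls to `get_word_sentiment`, passing `sentiment_analyzer.lexicon` as the first argument.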