<a href="https://colab.research.google.com/github/peterchang0414/SpeechAndLanguageProcessing3ed/blob/main/Ch4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 4: Naive Bayes and Sentiment Classification

Solution by: Peter Chang (GitHub: @peterchang0414)

*Last edited: 2022.1.21*

**4.1**: Assume the following likelihoods for each word being part of a positive or negative movie review, and equal prior probabilities for each class.
\begin{array}{ c c c }
  & \text{pos} & \text{neg} \\ 
  \hline
  \text{I} & 0.09 & 0.16 \\  
  \text{always} & 0.07 & 0.06 \\
  \text{like} & 0.29 & 0.06 \\
  \text{foreign} & 0.04 & 0.15 \\
  \text{films} & 0.08 & 0.11   
\end{array}
What class will naive Bayes assign to the sentence "I always like foreign films."?

Using equation (4.10), since we assume equal prior probabilities for each class:
\begin{align}
  c_{NB} = \text{argmax}_{c \in C}\sum_{i \in \text{positions}} \log{P(w_i|c)}
\end{align}
Computing the log probabilities for the positive class (logs are base 10 here; the choice of base does not change the argmax):
\begin{align}
  \sum_{i \in \text{positions}} \log{P(w_i|\text{pos})} &= \log(0.09) + \dots + \log(0.08) \approx -5.233
\end{align}
whereas for the negative class we have:
\begin{align}
  \sum_{i \in \text{positions}} \log{P(w_i|\text{neg})} &= \log(0.16) + \dots + \log(0.11) \approx -5.022
\end{align}
Since $-5.022 > -5.233$, naive Bayes will assign the sentence to the negative class.
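
As a quick check of the arithmetic above, here is a minimal sketch that sums the base-10 log likelihoods for each class. The names `likelihoods`, `sentence`, and `log_scores` are illustrative and not part of the original solution; the equal priors are omitted because they add the same constant to both classes.

```python
import math

# Likelihoods copied from the table in exercise 4.1 (words lowercased).
likelihoods = {
    "pos": {"i": 0.09, "always": 0.07, "like": 0.29, "foreign": 0.04, "films": 0.08},
    "neg": {"i": 0.16, "always": 0.06, "like": 0.06, "foreign": 0.15, "films": 0.11},
}
sentence = ["i", "always", "like", "foreign", "films"]

# Sum of log10 likelihoods per class; equal priors are left out
# because they shift both scores by the same amount.
log_scores = {
    c: sum(math.log10(table[w]) for w in sentence)
    for c, table in likelihoods.items()
}
prediction = max(log_scores, key=log_scores.get)

for c, s in log_scores.items():
    print(f"{c}: {s:.3f}")        # pos: -5.233, neg: -5.022
print("prediction:", prediction)  # prediction: neg
```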

**4.2**: Given the following short movie reviews, each labeled with a genre, either comedy or action:


1.   fun, couple, love, love: **comedy**
2.   fast, furious, shoot: **action**
3.   couple, fly, fast, fun, fun: **comedy**
4.   furious, shoot, shoot, fun: **action**
5.   fly, fast, shoot, love: **action**

and a new document D:

> fast, couple, shoot, fly

compute the most likely class for D. Assume a naive Bayes classifier and use add-1 smoothing for the likelihoods.

The prior for each class is the fraction of training documents with that label:
$$
\begin{align}
  P(\text{comedy}) &= \frac{2}{5}, & P(\text{action}) &= \frac{3}{5}
\end{align}
$$

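With add-1 smoothing, each likelihood is $\frac{\text{count}(w, c) + 1}{N_c + |V|}$, where $N_c$ is the number of word tokens in class $c$ (9 for comedy, 11 for action) and $|V| = 7$ is the vocabulary size. The likelihoods for the words in the document $D$ are as follows: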

$$
\begin{align}
  P(\text{fast}|\text{comedy}) &= \frac{1+1}{9+7}=\frac{2}{16}, & P(\text{fast}|\text{action}) &= \frac{2+1}{11+7}=\frac{3}{18} \\
  P(\text{couple}|\text{comedy}) &= \frac{2+1}{9+7}=\frac{3}{16}, & P(\text{couple}|\text{action}) &= \frac{0+1}{11+7}=\frac{1}{18} \\
  P(\text{shoot}|\text{comedy}) &= \frac{0+1}{9+7}=\frac{1}{16}, & P(\text{shoot}|\text{action}) &= \frac{4+1}{11+7}=\frac{5}{18} \\
  P(\text{fly}|\text{comedy}) &= \frac{1+1}{9+7}=\frac{2}{16}, & P(\text{fly}|\text{action}) &= \frac{1+1}{11+7}=\frac{2}{18}
\end{align}
$$

Therefore, the unnormalized posteriors (prior times likelihood) are:
$$
\begin{align}
  P(\text{comedy})P(D|\text{comedy}) &= \frac{2}{5} \times \frac{12}{16^{4}} \approx 7.3 \times 10^{-5} \\
  P(\text{action})P(D|\text{action}) &= \frac{3}{5} \times \frac{30}{18^{4}} \approx 1.7 \times 10^{-4}
\end{align}
$$
The model thus predicts the class *action* for the document $D$.
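
The computation above can be reproduced with exact fractions. The following is a minimal sketch, not the original author's code; the names `train_docs`, `new_doc`, and `scores` are illustrative. It assumes every word of the new document is in the training vocabulary, which holds for $D$.

```python
from collections import Counter
from fractions import Fraction

# Training reviews from exercise 4.2 as (tokens, label) pairs.
train_docs = [
    (["fun", "couple", "love", "love"], "comedy"),
    (["fast", "furious", "shoot"], "action"),
    (["couple", "fly", "fast", "fun", "fun"], "comedy"),
    (["furious", "shoot", "shoot", "fun"], "action"),
    (["fly", "fast", "shoot", "love"], "action"),
]
new_doc = ["fast", "couple", "shoot", "fly"]

vocab = {w for words, _ in train_docs for w in words}    # 7 word types
class_counts = Counter(label for _, label in train_docs)

scores = {}
for c in class_counts:
    # Concatenate all documents of class c and count their tokens.
    word_counts = Counter(
        w for words, label in train_docs if label == c for w in words
    )
    n_tokens = sum(word_counts.values())
    # Start from the prior, then multiply in each add-1 smoothed likelihood.
    score = Fraction(class_counts[c], len(train_docs))
    for w in new_doc:
        score *= Fraction(word_counts[w] + 1, n_tokens + len(vocab))
    scores[c] = score

prediction = max(scores, key=scores.get)
for c, s in scores.items():
    print(f"{c}: {s} ~ {float(s):.2e}")
print("prediction:", prediction)  # prediction: action
```

Using `Fraction` keeps the intermediate values exact, so the result can be compared term by term with the hand calculation.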

**4.3**: Train two models, multinomial naive Bayes and binarized naive Bayes, both with add-1 smoothing, on the following document counts for key sentiment words, with positive or negative class assigned as noted.

$$
\begin{array}{ l l l l l}
 \text{doc} & ``\text{good"} & ``\text{poor"} & ``\text{great"}&(\text{class})\\ 
 \text{d1.} & 3 & 0 & 3 & \text{pos}\\  
 \text{d2.} & 0 & 1 & 2 & \text{pos}\\
 \text{d3.} & 1 & 3 & 0 & \text{neg}\\
 \text{d4.} & 1 & 5 & 2 & \text{neg}\\
 \text{d5.} & 0 & 2 & 0 & \text{neg}
\end{array}
$$
Use both naive Bayes models to assign a class (pos or neg) to this sentence:

> A good, good plot and great characters, but poor acting.

Recall from page 62 that with naive Bayes text classification, we simply ignore (throw out) any word that never occurred in the training document. (We don't throw out words that appear in some classes but not others; that's what add-one smoothing is for.) Do the two models agree or disagree?


In [40]:
import numpy as np
from collections import Counter

def train_naive_bayes(docs, labels, binarize = False):
  """Train Naive Bayes model on given docs with labels

  Parameters
  ----------
  docs : list
    list of documents, each of which is a dictionary of
    unique word counts
  labels : list
    labels for the corresponding (index-matched) document
    in the docs list
  binarize : bool
    indicates whether we use binarized Naive Bayes model

  Returns
  -------
  log_prior
    dict where the keys are each class and values are
    the log prior probability of each class
  log_likelihood
    dict of dict where the keys are each class and values are
    dicts of log likelihood probability for each word in vocabulary
  vocab
    set of unique words in the union of docs
  classes
    set of unique labels
  """
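  if binarize:
    # Clip each count to at most 1 per document. Build new dicts so
    # the caller's training data is not modified in place.
    docs = [{k: min(v, 1) for k, v in doc.items()} for doc in docs]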
  log_prior, log_likelihood, vocab = {}, {}, set()
  for doc in docs:
    vocab = vocab.union(set(list(doc.keys())))
  classes = set(labels)
  for c in classes:
    log_likelihood[c] = {}
    n_c = sum(map(lambda x : x == c, labels))
    log_prior[c] = np.log(n_c / len(docs))
    c_index = [i for i in range(len(labels)) if labels[i] == c]
    big_doc = Counter()
    for i in c_index:
      big_doc.update(Counter(docs[i]))
    for word in vocab:
      log_likelihood[c][word] = np.log((big_doc[word]+1)/(sum([big_doc[w] + 1 for w in vocab])))
  return log_prior, log_likelihood, vocab, classes

def test_naive_bayes(docs, labels, doc, binarize=False):
  """Test Naive Bayes model on given doc

  Parameters
  ----------
  docs : list
    list of documents, each of which is a dictionary of
    unique word counts
  labels : list
    labels for the corresponding (index-matched) document
    in the docs list
  doc : str
    test document/sentence
  binarize : bool
    indicates whether we use binarized Naive Bayes model

  Returns
  -------
  str
    class label to which the Naive Bayes model classified
    the test document
  """
  lp, ll, vocab, classes = train_naive_bayes(docs, labels, binarize)
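  def process_doc(d, clip_counts):
    # Lowercase, strip punctuation, and count whitespace-separated tokens
    dd = ''.join(ch for ch in d.lower() if ch.isalnum() or ch == ' ')
    ct = dict(Counter(dd.split()))
    if clip_counts:
      # Binarized model: each word counts at most once per document
      ct = {k: 1 for k in ct}
    return ct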
  test_doc = process_doc(doc, binarize)
  sum_class = {}
  for c in classes:
    sum_class[c] = lp[c]
    for k, v in test_doc.items():
      if k in vocab:
        sum_class[c] += ll[c][k]*v
  return max(sum_class, key=sum_class.get)

In [41]:
train = [{'good': 3, 'poor': 0, 'great': 3},
         {'good': 0, 'poor': 1, 'great': 2},
         {'good': 1, 'poor': 3, 'great': 0},
         {'good': 1, 'poor': 5, 'great': 2},
         {'good': 0, 'poor': 2, 'great': 0}]
labels = ['pos', 'pos', 'neg', 'neg', 'neg']
test = 'A good, good plot and great characters, but poor acting.'

print(f'Naive Bayes classification of test sentence:\n\t{test}')
print('-'*65)
print(f'Non-binarized: {test_naive_bayes(train, labels, test)}')
print(f'Binarized: {test_naive_bayes(train, labels, test, True)}')

Naive Bayes classification of test sentence:
	A good, good plot and great characters, but poor acting.
-----------------------------------------------------------------
Non-binarized: pos
Binarized: neg


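We can see that the two models disagree: the non-binarized (multinomial) model classifies the test sentence as positive, while the binarized model classifies it as negative.

This makes intuitive sense, since the test sentence contains the word "good" twice. The multinomial model counts both occurrences, and each one contributes a factor $P(\text{good}|\text{pos}) = \frac{4}{12}$ versus $P(\text{good}|\text{neg}) = \frac{3}{17}$, which favors the positive class. The binarized model counts "good" at most once per document, in both training and test, which removes that advantage.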
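
To see where the disagreement comes from, the class scores can also be worked out with exact fractions. This is a minimal sketch independent of the functions above; `train_counts`, `test_counts`, and `score` are hypothetical names introduced for illustration. The three unknown-word tokens of the test sentence are already dropped, leaving only the counts for "good", "poor", and "great".

```python
from fractions import Fraction

# Columns: counts of "good", "poor", "great", copied from the exercise table.
train_counts = [
    ((3, 0, 3), "pos"),
    ((0, 1, 2), "pos"),
    ((1, 3, 0), "neg"),
    ((1, 5, 2), "neg"),
    ((0, 2, 0), "neg"),
]
test_counts = (2, 1, 1)  # "good" x2, "poor" x1, "great" x1 in the test sentence

def score(binarize):
    """Return prior * likelihood per class as exact fractions."""
    clip = (lambda n: min(n, 1)) if binarize else (lambda n: n)
    result = {}
    for c in ("pos", "neg"):
        rows = [tuple(clip(n) for n in row)
                for row, label in train_counts if label == c]
        totals = [sum(col) for col in zip(*rows)]  # per-word counts in class c
        denom = sum(totals) + len(totals)          # add-1 over 3 word types
        s = Fraction(len(rows), len(train_counts)) # class prior
        for total, n in zip(totals, test_counts):
            s *= Fraction(total + 1, denom) ** clip(n)
        result[c] = s
    return result

print(score(False))  # pos = 1/270 (~0.0037), neg = 891/417605 (~0.0021)
print(score(True))   # pos = 24/1715 (~0.0140), neg = 8/405 (~0.0198)
```

In the multinomial case the positive score is larger, while in the binarized case the negative score is larger, matching the output of the cells above.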