# **Setup**

In this notebook, you will use the [`TextBlob`](https://textblob.readthedocs.io/en/dev/) library, which is built on NLTK and Pattern and provides many common and unique NLP functions, including a sentiment analyzer.

In [1]:
%reset -f
import pandas as pd, nltk
from textblob import TextBlob  # version 0.17.1
from sklearn.metrics import classification_report as rpt
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.corpus import movie_reviews
_ = nltk.download(['movie_reviews', 'punkt', 'vader_lexicon'], quiet=True)
pd.set_option('max_colwidth', 0)

<hr style="border-top: 2px solid #606366; background: transparent;">

# **Review**

<font color='black'>[TextBlob](https://textblob.readthedocs.io/en/dev/quickstart.html#sentiment-analysis) is another library that you can use to easily evaluate the sentiment of a text. Similar to [SpaCy](https://spacy.io/), TextBlob wraps your text document in an object, analyzes it, and exposes numerous [attributes](https://textblob.readthedocs.io/en/dev/quickstart.html#sentiment-analysis) and statistics. In particular, it provides a sentiment polarity (a number in the range $[-1,1]$) and a sentiment subjectivity (a number in the range $[0,1]$). A polarity of zero implies neutral sentiment; a subjectivity of zero indicates highly objective text.

<font color='black'>For example, the text "good idea" has a polarity of 0.7, indicating a strongly positive opinion. Its subjectivity of 0.6 indicates an above-average degree of opinion.

In [2]:
tb = TextBlob('good idea')
print(tb.sentiment.polarity)                # in [-1, 1] interval
print(round(tb.sentiment.subjectivity, 2))  # in [ 0, 1] interval

0.7
0.6


## Lots of Good Ideas

<font color='black'>As with VADER, you can evaluate different variants of the phrase "good idea" to learn what TextBlob's sentiment analysis algorithm is sensitive to.

In [3]:
LsDocs = \
  ['Yes', 'No', 'Yes :-(', "good idea", "GOOD idea", "good idea!", "good idea!!!",
   "idea's good!!!!!!!!", "idea's good !!!!!!!!", "good idea!!!!!!!!",      # too many exclamation marks may fail
   "not a good idea", "it isn't a good idea", "good and risky idea",   # negation and mixed attitudes
   "idea is good, but risky"]           # conjunction "but" signals a shift in polarity toward the dominant clause

def PolSub(sDoc='great idea!'):
    tb = TextBlob(sDoc)
    return (sDoc, tb.polarity, tb.subjectivity)

df = pd.DataFrame([PolSub(s) for s in LsDocs], columns=['doc','pol','subj']).set_index('doc')
df.T.style.background_gradient(cmap='coolwarm', vmin=-1, vmax=1).format(precision=2)

doc,Yes,No,Yes :-(,good idea,GOOD idea,good idea!,good idea!!!,idea's good!!!!!!!!,idea's good !!!!!!!!,good idea!!!!!!!!,not a good idea,it isn't a good idea,good and risky idea,"idea is good, but risky"
pol,0.0,0.0,-0.75,0.7,0.7,0.88,1.0,1.0,1.0,1.0,-0.35,0.7,0.7,0.7
subj,0.0,0.0,1.0,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6


<font color='black'>Note that exclamation marks intensify the sentiment (0.7 rises to 0.88 with one and saturates at 1.0 with three or more), but capitalization does not. The negation "not" and the emoticon ":-(" also change the polarity; however, "isn't", "but", and "risky" do not.

## TextBlob vs VADER on movie reviews dataset

<font color='black'>Next, you'll compare the performance of TextBlob and VADER on a much larger dataset. Start by loading NLTK's movie reviews corpus, which includes 1000 positive reviews and 1000 negative reviews. The cell below loads only 100 reviews from each category for performance reasons, but you can increase this number.

In [4]:
print('Categories:', movie_reviews.categories())
print('Total Pos#: ', len(movie_reviews.fileids('pos')), ', Neg#:', len(movie_reviews.fileids('neg')))
LsPos = [movie_reviews.raw(s) for s in movie_reviews.fileids('pos')[:100]] # retrieve a few positive reviews from files
LsNeg = [movie_reviews.raw(s) for s in movie_reviews.fileids('neg')[:100]]
LsReviews = LsPos + LsNeg     # concatenate lists of positive and negative reviews
LnPosNeg = [1] * len(LsPos) + [-1] * len(LsNeg)   # actual (binary) polarities in {-1,1} set
n = len(LsReviews)            # total count of reviews we selected
print(LsReviews[0][:265], '...')   # preview the start of the first review

Categories: ['neg', 'pos']
Total Pos#:  1000 , Neg#: 1000
films adapted from comic books have had plenty of success , whether they're about superheroes ( batman , superman , spawn ) , or geared toward kids ( casper ) or the arthouse crowd ( ghost world ) , but there's never really been a comic book like from hell before . ...


### Training

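<font color='black'>TextBlob's default sentiment analyzer, `PatternAnalyzer`, is lexicon-based and is the one used throughout this notebook. TextBlob also ships an optional `NaiveBayesAnalyzer` that was pre-trained on NLTK's movie reviews corpus, so it should perform well on a similar corpus. VADER, by contrast, is a rule-based system curated by experts. You can tune it by altering the valence of words in its vocabulary or by adding or deleting words.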
    
<font color='black'>You can also train your own classifier on a corpus from your domain using one of the classifier classes that TextBlob provides, which the cell below lists.

In [5]:
from textblob import classifiers
print(', '.join(c for c in dir(classifiers) if 'Classifier' in c))

BaseClassifier, DecisionTreeClassifier, MaxEntClassifier, NLTKClassifier, NaiveBayesClassifier, PositiveNaiveBayesClassifier


### Performance

<font color='black'>Next, apply both models to the balanced subsample. The dataframe generated below shows a small sample of TextBlob's results, including the actual sentiment polarity `vY` and the predicted polarity score `pPol`, which is thresholded at 0 to produce the binary polarity column `pY`.

In [6]:
%time pPol = [TextBlob(s).polarity for s in LsReviews]    # predicted polarities in [-1,1] interval

pY = [-1 if p<0 else 1 for p in pPol]                     # predicted polarities in {-1,1} set
dfTB = pd.DataFrame(dict(vY=LnPosNeg, pPol=pPol, pY=pY))  # Actual bi-polarity label, predicted polarity score, predicted bi-polarity label
LnIX = list(range(20)) + list(range(n-20, n))             # index of top (pos) few and bottom (neg) few reviews
dfTB.iloc[LnIX,:].T.style.background_gradient(cmap='coolwarm', vmin=-1, vmax=1).format(precision=1)

CPU times: user 1.48 s, sys: 15.5 ms, total: 1.5 s
Wall time: 1.51 s


review,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199
vY,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0
pPol,0.0,0.1,0.1,0.1,-0.1,0.1,0.1,0.1,-0.1,0.1,0.0,0.2,0.2,0.0,0.2,0.1,0.2,0.2,0.2,0.1,0.1,0.0,-0.1,0.1,-0.1,0.1,-0.0,0.0,0.3,0.1,0.1,0.0,0.1,0.0,0.1,-0.0,0.1,0.0,0.1,-0.0
pY,1.0,1.0,1.0,1.0,-1.0,1.0,1.0,1.0,-1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,-1.0,1.0,-1.0,1.0,-1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,-1.0,1.0,1.0,1.0,-1.0


<font color='black'>Notice that TextBlob appears overly positive, misclassifying far more negative reviews than positive reviews. 
    
<font color='black'>The next dataframe shows classification results for the VADER model. 

In [7]:
sia = SentimentIntensityAnalyzer()
%time pPol = [sia.polarity_scores(s)['compound'] for s in LsReviews]    # predicted polarities in [-1,1] interval

pY = [-1 if p<0 else 1 for p in pPol]                                   # predicted polarities in {-1,1} set
dfV = pd.DataFrame(dict(vY=LnPosNeg, pPol=pPol, pY=pY)) # Actual bi-polarity label, predicted polarity score, predicted bi-polarity label
dfV.iloc[LnIX,:].T.style.background_gradient(cmap='coolwarm', vmin=-1, vmax=1).format(precision=1)

CPU times: user 1.87 s, sys: 69.5 ms, total: 1.94 s
Wall time: 1.96 s


review,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199
vY,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0
pPol,-0.6,0.8,1.0,1.0,-0.4,1.0,-0.9,1.0,0.7,-1.0,1.0,1.0,1.0,-0.9,1.0,1.0,1.0,1.0,1.0,0.9,-0.9,-0.9,-1.0,0.9,-0.8,0.9,-1.0,-1.0,1.0,1.0,1.0,-1.0,1.0,-0.3,1.0,-0.8,1.0,1.0,-1.0,-1.0
pY,-1.0,1.0,1.0,1.0,-1.0,1.0,-1.0,1.0,1.0,-1.0,1.0,1.0,1.0,-1.0,1.0,1.0,1.0,1.0,1.0,1.0,-1.0,-1.0,-1.0,1.0,-1.0,1.0,-1.0,-1.0,1.0,1.0,1.0,-1.0,1.0,-1.0,1.0,-1.0,1.0,1.0,-1.0,-1.0


<font color='black'>Notice that VADER is more balanced in its misclassifications. It seemingly outperforms TextBlob on negative reviews and underperforms on positive reviews. Could you improve classification performance by combining these two models into an ensemble model? If you are interested, that exercise might be rewarding to explore on your own.
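One possible starting point for such an ensemble is sketched below. Because TextBlob's scores cluster near zero while VADER's sit near $\pm 1$, the sketch standardizes each model's scores before averaging them. The function names `zscores` and `ensemble_labels`, the weight `w_tb`, and the four demo scores are all hypothetical; centering on the sample mean also assumes a roughly balanced sample, as in this notebook.

```python
from statistics import mean, pstdev

def zscores(scores):
    """Standardize scores to zero mean and unit variance (all zeros if they are constant)."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0] * len(scores)
    return [(s - mu) / sigma for s in scores]

def ensemble_labels(tb_scores, vader_scores, w_tb=0.5):
    """Weighted average of standardized scores, thresholded at 0 into {-1, 1} labels."""
    z_tb, z_v = zscores(tb_scores), zscores(vader_scores)
    combined = [w_tb * a + (1 - w_tb) * b for a, b in zip(z_tb, z_v)]
    return [-1 if c < 0 else 1 for c in combined]

# Hypothetical scores for four documents, for illustration only.
tb_demo = [0.3, 0.1, 0.0, -0.1]
vader_demo = [0.9, 0.8, -0.7, -0.9]
print(ensemble_labels(tb_demo, vader_demo))
```

In the notebook, you could try `ensemble_labels(dfTB.pPol.tolist(), dfV.pPol.tolist())` and pass the result to the same classification report to check whether the combination actually helps.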

<font color='black'>Finally, compare more comprehensive classification reports for the two models' outputs.

In [8]:
print(rpt(y_true=dfTB.vY, y_pred=dfTB.pY, labels=[-1,1]))

              precision    recall  f1-score   support

          -1       0.84      0.21      0.34       100
           1       0.55      0.96      0.70       100

    accuracy                           0.58       200
   macro avg       0.69      0.58      0.52       200
weighted avg       0.69      0.58      0.52       200



In [9]:
print(rpt(y_true=dfV.vY, y_pred=dfV.pY, labels=[-1,1]))

              precision    recall  f1-score   support

          -1       0.69      0.46      0.55       100
           1       0.59      0.79      0.68       100

    accuracy                           0.62       200
   macro avg       0.64      0.62      0.61       200
weighted avg       0.64      0.62      0.61       200
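To make the report columns concrete, the sketch below recomputes precision, recall, and F1 from confusion counts. The counts are not printed by the notebook; they are inferred from the reports above as recall times support (for example, a recall of 0.21 on 100 negative reviews implies 21 correct). The helper name `prf` is hypothetical.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 for one class from its confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Correctly classified reviews per class, inferred from recall x support (100 each).
correct = {'TextBlob': {'neg': 21, 'pos': 96}, 'VADER': {'neg': 46, 'pos': 79}}
support = 100

for model, hits in correct.items():
    for cls, other in (('neg', 'pos'), ('pos', 'neg')):
        tp = hits[cls]
        fn = support - tp            # this class's reviews predicted as the other class
        fp = support - hits[other]   # the other class's reviews predicted as this class
        p, r, f1 = prf(tp, fp, fn)
        print(f'{model:8s} {cls}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}')
```

The printed values should match the two reports, which shows that TextBlob's low F1 on negative reviews comes from its low recall (79 of 100 negative reviews labeled positive) rather than from poor precision.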



### <font color='black'> Sentiment Metrics
* <font color='black'>TextBlob's polarities are close to zero, while VADER's compound polarity is closer to $\pm 1$.

### <font color='black'>Speed

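* <font color='black'>On the 200-review sample, VADER was about 30% slower than TextBlob (1.96 s vs. 1.51 s wall time), but it performed much better on negative reviews (F1 of 0.55 vs. 0.34) and about equally well on positive reviews (F1 of 0.68 vs. 0.70).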
    

### <font color='black'>Re-training

* <font color='black'>TextBlob can be re-trained on additional features and datasets. Its `NaiveBayesAnalyzer` uses an NLTK Naive Bayes classifier to associate key words with binary sentiment.
* <font color='black'>You can quickly expand VADER's vocabulary by a few words, but TextBlob requires re-training to expand its vocabulary, whether by one word or one million words. Such training requires a "sufficient" number of examples.

<hr style="border-top: 2px solid #606366; background: transparent;">

# **Optional Practice**

<font color='black'>Now you will practice comparing metrics for each model.
    
As you work through these tasks, check your answers by running your code in the *#check solution here* cell, to see if you’ve gotten the correct result. If you get stuck on a task, click the See **solution** drop-down to view the answer.

##  Task 1

Compute F1 scores for each sentiment class on all 2000 movie reviews using both the VADER and TextBlob models. Note the runtime. Do you agree with the performance and runtime comparisons made above, considering this larger sample?

<b>Hint:</b> Simply reuse the code above and remove slicing on <code>movie_reviews.fileids()</code>.

In [10]:
# check solution here

<font color=#606366>
    <details><summary><font color=#B31B1B>▶ </font>See <b>solution</b>.</summary>
<pre class="ec">
LsPos = [movie_reviews.raw(s) for s in movie_reviews.fileids('pos')] # retrieve all positive reviews from files
LsNeg = [movie_reviews.raw(s) for s in movie_reviews.fileids('neg')]
LsReviews = LsPos + LsNeg
LnPosNeg = [1] * len(LsPos) + [-1] * len(LsNeg)   # actual (binary) polarities in {-1,1} set

%time pPol = [TextBlob(s).polarity for s in LsReviews]    # predicted polarities in [-1,1] interval
dfTB = pd.DataFrame(dict(vY=LnPosNeg, pPol=pPol, pY=[-1 if p<0 else 1 for p in pPol]))
%time pPol = [sia.polarity_scores(s)['compound'] for s in LsReviews]    # predicted polarities in [-1,1] interval
dfV = pd.DataFrame(dict(vY=LnPosNeg, pPol=pPol, pY=[-1 if p < 0 else 1 for p in pPol])) 
print(rpt(y_true=dfTB.vY, y_pred=dfTB.pY, labels=[-1,1]))
print(rpt(y_true=dfV.vY, y_pred=dfV.pY, labels=[-1,1]))
            </pre>Yes, the runtimes and F1 scores for this larger sample are fairly similar to those for the smaller sample of 200 reviews. However, VADER also appears to underperform on positive reviews, with a similar F1 score.
</details> 
</font>

<hr>