# [Sentiment Analysis in Python With TextBlob](https://stackabuse.com/sentiment-analysis-in-python-with-textblob/)

Sentiment analysis algorithms mostly focus on identifying opinions, attitudes, and even emoticons in a corpus of texts.

Basic models distinguish up to three polar classes (positive, negative, neutral); more advanced models cover a broader range of emotions.

<img src = https://stackabuse.s3.amazonaws.com/media/sentiment-analysis-in-python-with-textblob-1.jpg width="600px" />

TextBlob’s output for a polarity task is a float within the range `[-1.0, 1.0]`, where `-1.0` is the most negative polarity and `1.0` is the most positive. A score of `0` stands for a neutral evaluation of a statement; this is also the result when the statement doesn’t contain any words the analyzer recognizes.
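As a rough illustration of how such a score might be turned into one of the three basic labels, the sketch below buckets a float from `[-1.0, 1.0]`. The function name `label_polarity` and its `threshold` parameter are assumptions made for this example; they are not part of TextBlob.

```python
def label_polarity(polarity, threshold=0.0):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    `threshold` is a hypothetical dead zone around zero; with the default
    of 0.0 only an exact 0 counts as neutral.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1.0, 1.0]")
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


print(label_polarity(0.5))   # positive
print(label_polarity(0.0))   # neutral
print(label_polarity(-0.3))  # negative
```

A non-zero `threshold` may be useful when scores very close to zero should not be treated as a clear opinion.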

 We first import the `TextBlob` class:

In [30]:
# Importing TextBlob
from textblob import TextBlob

 Once imported, we'll load in a sentence for analysis, instantiate a `TextBlob` object, and assign its `sentiment` property to our own `analysis` variable:

In [31]:
# Preparing an input sentence
sentence = '''The platform provides universal access to the world's best education, partnering with top universities and organizations to offer courses online.'''

# Creating a textblob object and assigning the sentiment property
analysis = TextBlob(sentence).sentiment
print(analysis)

Sentiment(polarity=0.5, subjectivity=0.26666666666666666)


 The `sentiment` property is a `namedtuple` of the form `Sentiment(polarity, subjectivity)`, which is why the output of the analysis reads:

> `Sentiment(polarity=0.5, subjectivity=0.26666666666666666)`
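Because the result is a `namedtuple`, its fields can be read by name, by position, or unpacked in one step. The sketch below uses a locally defined stand-in with the same field names so that it runs without TextBlob installed; the stand-in is an assumption of this example, not TextBlob's own class.

```python
from collections import namedtuple

# Stand-in with the same field names as the tuple described above
# (assumption: defined locally so the example runs without TextBlob).
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

analysis = Sentiment(polarity=0.5, subjectivity=0.26666666666666666)

# Access by field name
print(analysis.polarity)   # 0.5
# Access by position
print(analysis[1])         # 0.26666666666666666
# Unpack both values at once
polarity, subjectivity = analysis
# Convert to a dict, e.g. for logging or serialisation
as_dict = analysis._asdict()
```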

*It’s also possible to retrieve the polarity or subjectivity results separately:*

In [32]:
from textblob import TextBlob

# Preparing an input sentence
sentence = '''The platform provides universal access to the world's best education, partnering with top universities and organizations to offer courses online.'''

# Build the blob once and read both properties from it
blob = TextBlob(sentence)
analysisPol = blob.polarity
analysisSub = blob.subjectivity

print(analysisPol)
print(analysisSub)

0.5
0.26666666666666666


---
One of the great things about TextBlob is that it lets the user choose which algorithm implements its high-level NLP tasks:

 - `PatternAnalyzer` - a default classifier that is built on the pattern library
 - `NaiveBayesAnalyzer` - an NLTK model trained on a movie reviews corpus

To change the default settings, we'll simply specify the `NaiveBayesAnalyzer` in the code. Let’s run sentiment analysis on tweets directly from *Twitter*:

In [33]:
from textblob import TextBlob
# For accessing the Twitter API
import tweepy

# Importing the NaiveBayesAnalyzer, TextBlob's NLTK-based classifier
from textblob.sentiments import NaiveBayesAnalyzer

 After that, we need to establish a connection with the Twitter API via API keys (which you can get through a developer account):

In [34]:
# Loading API keys and tokens (placeholders; substitute your own credentials)
api_key = 'YOUR_API_KEY'
api_secret = 'YOUR_API_SECRET'
access_token = 'YOUR_ACCESS_TOKEN'
access_secret = 'YOUR_ACCESS_SECRET'

# Establishing the connection
twitter = tweepy.OAuthHandler(api_key, api_secret)
twitter.set_access_token(access_token, access_secret)
api = tweepy.API(twitter)
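Credentials pasted into a notebook are easy to leak when the notebook is shared. One common alternative is to read them from environment variables. The sketch below is one way to do that; the variable names and the helper `load_twitter_credentials` are assumptions chosen for this example.

```python
import os

# Hypothetical environment variable names; pick whatever fits your setup.
REQUIRED_VARS = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)


def load_twitter_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned values could then be passed to `tweepy.OAuthHandler` and `set_access_token` in place of string literals.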

Now we can analyze tweets on any topic. The search query (e.g. lockdown) can consist of one or more words. Because the number of matching tweets can be enormous, this task can be time-consuming, so it's recommended to limit the output:



In [35]:
# This command will return 5 tweets on the "lockdown" topic
corpus_tweets = api.search_tweets("lockdown", count=5)
for tweet in corpus_tweets:
    print(tweet.text)

RT @taniadysan: Seguindo a China, a Austrália entra em "lockdown". Agora as clínicas estão sobrecarregadas devido a ataques cardíacos.
ATAQ…
RT @aqualimits: sorry to everyone in my life but i'm on purple lockdown until june and then i cannot be making any plans whatsoever until i…
RT @hoonieloaf: on a LOCKDOWN https://t.co/uUSTvFaHXt
Τι θα επιτρέψει τη χαλαρότητα του lockdown στη Σανγκάη; «Διορία» έως την Τετάρτη για «στοπ» στη μετάδοση του κορονο… https://t.co/fDBHnT43GS
RT @pjmsyu: dia 10 de junho já eh feriado nacional pra mim não marquem compromissos comigo não me chamem não falem comigo não ousem me tira…


The last step in this example is switching the default model to the NLTK analyzer, which returns its results as a `namedtuple` of the form `Sentiment(classification, p_pos, p_neg)`:

In [36]:
# Applying the NaiveBayesAnalyzer
# (`tweet` still refers to the last tweet from the loop above)
blob_object = TextBlob(tweet.text, analyzer=NaiveBayesAnalyzer())

# Running sentiment analysis
analysis = blob_object.sentiment
print(analysis)

Sentiment(classification='pos', p_pos=0.5, p_neg=0.5)


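Here, the tweet is classified as `pos`, but with `p_pos` and `p_neg` at `0.5` each the classifier shows no real preference. The analyzed tweet is not in English, while the model was trained on English movie reviews, so it most likely found no words it recognizes.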
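When the two probabilities are this close, the label itself carries little information. The sketch below shows one way such results might be flagged. The `Sentiment` stand-in is defined locally so the example runs without TextBlob, and the helper `confident_label` with its `margin` parameter is hypothetical.

```python
from collections import namedtuple

# Local stand-in mirroring the shape Sentiment(classification, p_pos, p_neg)
Sentiment = namedtuple("Sentiment", ["classification", "p_pos", "p_neg"])


def confident_label(result, margin=0.1):
    """Return the classification, or 'uncertain' if the two probabilities
    differ by less than `margin` (a hypothetical cut-off)."""
    if abs(result.p_pos - result.p_neg) < margin:
        return "uncertain"
    return result.classification


print(confident_label(Sentiment("pos", 0.5, 0.5)))  # uncertain
print(confident_label(Sentiment("neg", 0.2, 0.8)))  # neg
```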

---
# [Tutorial: Simple Text Classification with Python and TextBlob](https://stackabuse.com/sentiment-analysis-in-python-with-textblob/)

## Part 1: A Tweet Sentiment Analyzer (Simple classification)

Our first classifier will be a simple sentiment analyzer trained on a small dataset of fake tweets.

To begin, we'll import `NaiveBayesClassifier` from `textblob.classifiers` and create some training and test data.

In [37]:
from textblob.classifiers import NaiveBayesClassifier

train = [
    ('I love this sandwich.', 'pos'),
    ('This is an amazing place!', 'pos'),
    ('I feel very good about these beers.', 'pos'),
    ('This is my best work.', 'pos'),
    ("What an awesome view", 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with this", 'neg'),
    ('He is my sworn enemy!', 'neg'),
    ('My boss is horrible.', 'neg')
]
test = [
    ('The beer was good.', 'pos'),
    ('I do not enjoy my job', 'neg'),
    ("I ain't feeling dandy today.", 'neg'),
    ("I feel amazing!", 'pos'),
    ('Gary is a friend of mine.', 'pos'),
    ("I can't believe I'm doing this.", 'neg')
]

We create a new classifier by passing training data into the constructor for a `NaiveBayesClassifier`.

In [38]:
cl = NaiveBayesClassifier(train)

We can now classify arbitrary text using the `NaiveBayesClassifier.classify(text)` method.

In [39]:
cl.classify("Their burgers are amazing")  # "pos"
cl.classify("I don't like their pizza.")  # "neg"

'neg'

---
Another way to classify strings of text is to use TextBlob objects. You can pass classifiers into the constructor of a TextBlob.

In [40]:
from textblob import TextBlob
blob = TextBlob("The beer was amazing. " 
                "But the hangover was horrible. My boss was not happy.",
                classifier=cl)

You can then call the `classify()` method on the blob.

In [41]:
blob.classify()  # "neg"

'neg'

---
You can also take advantage of TextBlob's sentence tokenization and classify each sentence individually.

In [42]:
for sentence in blob.sentences:
    print(sentence)
    print(sentence.classify())
# "pos", "neg", "neg"

The beer was amazing.
pos
But the hangover was horrible.
neg
My boss was not happy.
neg


---
Let's check the accuracy on the test set.

In [43]:
cl.accuracy(test)
# 0.83

0.8333333333333334
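Accuracy here is the share of test examples whose predicted label equals the true label. The sketch below computes it by hand for any prediction function. The helper `accuracy_of` and the keyword rule `toy_predict` are hypothetical stand-ins for a trained classifier, used only so the example runs without TextBlob.

```python
def accuracy_of(predict, labelled_examples):
    """Fraction of (text, label) pairs for which predict(text) == label."""
    if not labelled_examples:
        raise ValueError("need at least one labelled example")
    correct = sum(1 for text, label in labelled_examples if predict(text) == label)
    return correct / len(labelled_examples)


def toy_predict(text):
    """Hypothetical keyword rule standing in for a trained classifier."""
    lowered = text.lower()
    return "neg" if ("not" in lowered or "n't" in lowered) else "pos"


test = [
    ('The beer was good.', 'pos'),
    ('I do not enjoy my job', 'neg'),
    ("I ain't feeling dandy today.", 'neg'),
    ("I feel amazing!", 'pos'),
    ('Gary is a friend of mine.', 'pos'),
    ("I can't believe I'm doing this.", 'neg')
]
print(accuracy_of(toy_predict, test))
```

With the classifier from this tutorial, `cl.classify` could be passed as the `predict` argument instead of the toy rule.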

---
We can also find the most informative features:

In [44]:
cl.show_informative_features(5)

Most Informative Features
          contains(this) = True              neg : pos    =      2.3 : 1.0
          contains(this) = False             pos : neg    =      1.8 : 1.0
          contains(This) = False             neg : pos    =      1.6 : 1.0
            contains(an) = False             neg : pos    =      1.6 : 1.0
             contains(I) = False             pos : neg    =      1.4 : 1.0


*This indicates that tweets containing the word "this" tend to be negative (at a ratio of about 2.3 to 1 in this small training set), while tweets that do not contain it tend to be positive.*

## Part 2: Adding More Data from NLTK

We can improve our classifier by adding more training and test data. Here we'll add data from the movie review corpus, which has to be downloaded with NLTK first (for example with `nltk.download('movie_reviews')`).

In [45]:
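import random
from nltk.corpus import movie_reviews

# Build (word_list, category) pairs for every review in the corpus
reviews = [(list(movie_reviews.words(fileid)), category)
              for category in movie_reviews.categories()
              for fileid in movie_reviews.fileids(category)]
# Note: the list is in corpus order (grouped by category, not shuffled),
# and the slices below skip index 100
new_train, new_test = reviews[0:100], reviews[101:200]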
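Slicing a list that is still in corpus order means the train and test parts may not contain a mix of categories. A common alternative is to shuffle with a fixed seed before splitting. The sketch below illustrates that idea on synthetic data; the helper `shuffled_split`, its parameters, and the `docs` list are assumptions of this example, and using it would change the results shown in this tutorial.

```python
import random


def shuffled_split(items, n_train, n_test, seed=0):
    """Shuffle a copy of `items` and return disjoint train/test lists."""
    if n_train + n_test > len(items):
        raise ValueError("not enough items for the requested split")
    shuffled = list(items)
    # A dedicated Random instance keeps the split reproducible
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:n_train + n_test]


# Synthetic stand-in for the (words, category) pairs built above
docs = [(["word%d" % i], "neg" if i < 10 else "pos") for i in range(20)]
train_part, test_part = shuffled_split(docs, n_train=12, n_test=8)
print(len(train_part), len(test_part))  # 12 8
```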

Let's see what one of these documents looks like.

In [46]:
print(new_train[0])

(['plot', ':', 'two', 'teen', 'couples', 'go', 'to', 'a', 'church', 'party', ',', 'drink', 'and', 'then', 'drive', '.', 'they', 'get', 'into', 'an', 'accident', '.', 'one', 'of', 'the', 'guys', 'dies', ',', 'but', 'his', 'girlfriend', 'continues', 'to', 'see', 'him', 'in', 'her', 'life', ',', 'and', 'has', 'nightmares', '.', 'what', "'", 's', 'the', 'deal', '?', 'watch', 'the', 'movie', 'and', '"', 'sorta', '"', 'find', 'out', '.', '.', '.', 'critique', ':', 'a', 'mind', '-', 'fuck', 'movie', 'for', 'the', 'teen', 'generation', 'that', 'touches', 'on', 'a', 'very', 'cool', 'idea', ',', 'but', 'presents', 'it', 'in', 'a', 'very', 'bad', 'package', '.', 'which', 'is', 'what', 'makes', 'this', 'review', 'an', 'even', 'harder', 'one', 'to', 'write', ',', 'since', 'i', 'generally', 'applaud', 'films', 'which', 'attempt', 'to', 'break', 'the', 'mold', ',', 'mess', 'with', 'your', 'head', 'and', 'such', '(', 'lost', 'highway', '&', 'memento', ')', ',', 'but', 'there', 'are', 'good', 'and', 'b

*Notice that unlike the data in Part 1, the text comes as a list of words instead of a single string. TextBlob is smart about this; it will treat both forms of data as expected.*

We can now update our classifier with the new training data using the `update(new_data)` method, as well as test it using the larger test dataset.

In [47]:
cl.update(new_train)

# Compute accuracy
accuracy = cl.accuracy(test + new_test)
print("Accuracy: {0}".format(accuracy))

# Show 5 most informative features
cl.show_informative_features(5)

Accuracy: 0.9714285714285714
Most Informative Features
          contains(this) = False             pos : neg    =     31.8 : 1.0
            contains(of) = False             pos : neg    =     21.6 : 1.0
            contains(is) = False             pos : neg    =     17.7 : 1.0
       contains(awesome) = True              pos : neg    =     17.7 : 1.0
      contains(sandwich) = True              pos : neg    =     17.7 : 1.0


Use the `classify()` method to classify a sentence with the updated classifier:

In [48]:
cl.classify("I can't believe I'm doing this.")

'neg'

---
## Part 3: Language Detector (Custom Feature Extraction)

An important aspect that I haven't yet mentioned is how features are extracted from the text.


For a given document and training set `train`, TextBlob's default behavior is to compute which words in `train` are present in the document. For example, the sentence "It's just a flesh wound." might have the features `contains(flesh): True`, `contains(wound): True`, and `contains(knight): False`.
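The idea can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not TextBlob's actual implementation: the helper `contains_features` is hypothetical, and it tokenizes with a whitespace split plus basic punctuation stripping.

```python
def contains_features(document, train):
    """Simplified illustration of 'contains(word)' features.

    Assumption: tokens come from a plain whitespace split with basic
    punctuation stripped; a real tokenizer is more careful.
    """
    def tokens(text):
        return {w.strip(".,!?\"'") for w in text.split()} - {""}

    # Vocabulary = every token seen anywhere in the training texts
    vocabulary = set()
    for text, _label in train:
        vocabulary |= tokens(text)

    doc_tokens = tokens(document)
    return {"contains({0})".format(w): (w in doc_tokens) for w in sorted(vocabulary)}


train = [("A knight lost an arm.", "neg"), ("Just a flesh wound!", "pos")]
feats = contains_features("It's just a flesh wound.", train)
print(feats["contains(flesh)"], feats["contains(wound)"], feats["contains(knight)"])
# True True False
```

Note that this sketch is case-sensitive, so "Just" and "just" count as different words.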


Of course, this simple feature extractor may not be appropriate for all problems. Here we'll create a custom feature extractor for a language detector.


Here's the training and test data:

In [49]:
train = [
    ("amor", "spanish"),
    ("perro", "spanish"),
    ("playa", "spanish"),
    ("sal", "spanish"),
    ("oceano", "spanish"),
    ("love", "english"),
    ("dog", "english"),
    ("beach", "english"),
    ("salt", "english"),
    ("ocean", "english")
]
test = [
    ("ropa", "spanish"),
    ("comprar", "spanish"),
    ("camisa", "spanish"),
    ("agua", "spanish"),
    ("telefono", "spanish"),
    ("clothes", "english"),
    ("buy", "english"),
    ("shirt", "english"),
    ("water", "english"),
    ("telephone", "english")
]

A feature extractor is simply a function that takes an argument `text` (the text to extract features from) and returns a dictionary of features.

Let's create a very simple extractor that uses the last letter of a given word as its only feature.

In [50]:
def extractor(word):
    '''Extract the last letter of a word as the only feature.'''
    feats = {}
    last_letter = word[-1]
    feats["last_letter({0})".format(last_letter)] = True
    return feats

print(extractor("python"))  # {'last_letter(n)': True}

{'last_letter(n)': True}


We can pass this feature extractor as the second argument to the constructor of a `NaiveBayesClassifier`.

In [29]:
lang_detector = NaiveBayesClassifier(train, feature_extractor=extractor)

And again, compute the accuracy and the most informative features.

In [54]:
lang_detector.accuracy(test)  # 0.7
lang_detector.show_informative_features(5)

Most Informative Features
          last_letter(o) = None           englis : spanis =      1.6 : 1.0
          last_letter(a) = None           englis : spanis =      1.2 : 1.0
          last_letter(e) = None           spanis : englis =      1.2 : 1.0
          last_letter(g) = None           spanis : englis =      1.2 : 1.0
          last_letter(h) = None           spanis : englis =      1.2 : 1.0


*Not surprisingly, words that do not end with the letter "o" tend to be English.*
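A single last-letter feature is deliberately minimal. One possible refinement, shown here only as a hypothetical variation that is not part of the original tutorial, is to add a few more character-level features. The name `richer_extractor` and the choice of features are assumptions, and no accuracy figure is claimed for it.

```python
def richer_extractor(word):
    """Hypothetical variation: first letter, last letter and last two letters."""
    word = word.lower()
    if not word:
        return {}
    feats = {
        "first_letter({0})".format(word[0]): True,
        "last_letter({0})".format(word[-1]): True,
    }
    # A two-letter suffix only makes sense for words of length >= 2
    if len(word) >= 2:
        feats["suffix({0})".format(word[-2:])] = True
    return feats


print(richer_extractor("python"))
# {'first_letter(p)': True, 'last_letter(n)': True, 'suffix(on)': True}
```

Such a function could be passed to `NaiveBayesClassifier` through the same `feature_extractor` argument used above.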