# References

- [1] https://textblob.readthedocs.io/en/dev/quickstart.html
- [2] https://www.analyticsvidhya.com/blog/2018/02/natural-language-processing-for-beginners-using-textblob/
- [3] https://textblob.readthedocs.io/en/dev/classifiers.html#tutorial-building-a-text-classification-system

# What is TextBlob?

- A Python library for NLP built on top of NLTK and Pattern [2]
- Some advantages:
 - It is easy to learn and offers many features, such as sentiment analysis, part-of-speech tagging, and noun phrase extraction [2]
 

# Tutorial

## Quick Start from Official Docs [1]

### Installation

In [1]:
# !pip install -U textblob
# !python -m textblob.download_corpora

In [2]:
from textblob import TextBlob

### Create Our First TextBlob

In [3]:
wiki = TextBlob("Python is a high-level, general-purpose programming language")

### Part-of-Speech Tagging

In [4]:
wiki.tags

[('Python', 'NNP'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('high-level', 'JJ'),
 ('general-purpose', 'JJ'),
 ('programming', 'NN'),
 ('language', 'NN')]

### Noun Phrase Extraction

In [5]:
wiki.noun_phrases

WordList(['python'])

### Sentiment Analysis

In [6]:
testimonial = TextBlob("Textblob is amazingly simple to use. What great fun!")
testimonial.sentiment

Sentiment(polarity=0.39166666666666666, subjectivity=0.4357142857142857)

In [7]:
testimonial.sentiment.polarity

0.39166666666666666
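
The polarity score ranges from -1.0 (most negative) to 1.0 (most positive), and the subjectivity score from 0.0 (very objective) to 1.0 (very subjective) [1]. One way to turn a polarity score into a coarse label is sketched below. The helper name `label_polarity` and the 0.1 cutoff are assumptions made for illustration; they are not part of TextBlob.

```python
def label_polarity(polarity, cutoff=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    `cutoff` is a hypothetical dead zone around 0 that is treated as neutral.
    """
    if polarity > cutoff:
        return "positive"
    if polarity < -cutoff:
        return "negative"
    return "neutral"


# Polarity value copied from the testimonial example above.
print(label_polarity(0.39166666666666666))  # positive
```

The cutoff is a judgment call: a wider dead zone marks more texts as neutral, so it is worth tuning against labeled examples rather than taking 0.1 as given.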

### Tokenization

In [8]:
zen = TextBlob("Beautiful is better than ugly. "
               "Explicit is better than implicit. "
               "Simple is better than complex.")
zen.words

WordList(['Beautiful', 'is', 'better', 'than', 'ugly', 'Explicit', 'is', 'better', 'than', 'implicit', 'Simple', 'is', 'better', 'than', 'complex'])

In [9]:
zen.sentences

[Sentence("Beautiful is better than ugly."),
 Sentence("Explicit is better than implicit."),
 Sentence("Simple is better than complex.")]

In [10]:
for sentence in zen.sentences:
    print(sentence)
    print(sentence.sentiment)

Beautiful is better than ugly.
Sentiment(polarity=0.2166666666666667, subjectivity=0.8333333333333334)
Explicit is better than implicit.
Sentiment(polarity=0.5, subjectivity=0.5)
Simple is better than complex.
Sentiment(polarity=0.06666666666666667, subjectivity=0.41904761904761906)


### WordNet Integration

In [11]:
from textblob import Word
from textblob.wordnet import VERB
word = Word("octopus")
word.synsets

[Synset('octopus.n.01'), Synset('octopus.n.02')]

In [12]:
Word("hack").get_synsets(pos=VERB)

[Synset('chop.v.05'),
 Synset('hack.v.02'),
 Synset('hack.v.03'),
 Synset('hack.v.04'),
 Synset('hack.v.05'),
 Synset('hack.v.06'),
 Synset('hack.v.07'),
 Synset('hack.v.08')]

In [13]:
Word("octopus").definitions

['tentacles of octopus prepared as food',
 'bottom-living cephalopod having a soft oval body with eight long tentacles']

In [14]:
from textblob.wordnet import Synset
octopus = Synset('octopus.n.02')
shrimp = Synset('shrimp.n.03')
octopus.path_similarity(shrimp)

0.1111111111111111
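
The score 0.1111... equals 1/9. In NLTK's WordNet interface, which `textblob.wordnet` builds on, path similarity is computed as 1 / (d + 1), where d is the number of edges on the shortest path between the two synsets in the hypernym/hyponym taxonomy. The sketch below reproduces only that arithmetic. The function name is hypothetical, and the distance of 8 is inferred from the score above rather than looked up in WordNet.

```python
def path_similarity_from_distance(distance):
    """Return 1 / (distance + 1) for a shortest-path length given in edges."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (distance + 1)


# A distance of 8 edges reproduces the octopus/shrimp score shown above.
print(path_similarity_from_distance(8))
# Identical synsets (distance 0) get the maximum score of 1.0.
print(path_similarity_from_distance(0))
```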

### WordLists and Pluralization

In [15]:
animals = TextBlob("cat dog octopus")
animals.words

WordList(['cat', 'dog', 'octopus'])

In [16]:
animals.words.pluralize()

WordList(['cats', 'dogs', 'octopodes'])

### Spelling Correction

In [17]:
b = TextBlob("I havvv gooddd speling")
print(b.correct())

I have good spelling


## Tutorial: Building a Text Classification System [3]

In [18]:
train = [
    ('I love this sandwich.', 'pos'),
    ('this is an amazing place!', 'pos'),
    ('I feel very good about these beers.', 'pos'),
    ('this is my best work.', 'pos'),
    ("what an awesome view", 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with this", 'neg'),
    ('he is my sworn enemy!', 'neg'),
    ('my boss is horrible.', 'neg')
]

test = [
    ('the beer was good.', 'pos'),
    ('I do not enjoy my job', 'neg'),
    ("I ain't feeling dandy today.", 'neg'),
    ("I feel amazing!", 'pos'),
    ('Gary is a friend of mine.', 'pos'),
    ("I can't believe I'm doing this.", 'neg')
]

In [19]:
from textblob.classifiers import NaiveBayesClassifier
class1 = NaiveBayesClassifier(train)

In [20]:
with open('data/day2_sample_text.csv', 'r') as fp:
    class2 = NaiveBayesClassifier(fp, format="csv")

In [21]:
class2.classify("this is amazing movie")

'pos'

In [22]:
class2.classify("I like this hotel")

'neg'

### Classifying Text

In [23]:
prob_dist = class1.prob_classify("This one's a doozy.")
prob_dist.max()

'pos'

In [24]:
round(prob_dist.prob("pos"), 2)

0.63

In [25]:
round(prob_dist.prob("neg"), 2)

0.37
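
The two probabilities sum to 1, so 0.63 for "pos" is only a mild preference. If low-confidence predictions should be flagged rather than accepted, a small wrapper can help. This is a sketch: `decide` and the 0.7 threshold are hypothetical, and the dictionary stands in for the rounded values returned by `prob_dist.prob(...)` above.

```python
def decide(probabilities, threshold=0.7):
    """Return the most probable label, or 'uncertain' if it falls below the threshold."""
    label = max(probabilities, key=probabilities.get)
    if probabilities[label] < threshold:
        return "uncertain"
    return label


# Rounded values copied from the output above.
probs = {"pos": 0.63, "neg": 0.37}
print(decide(probs))                 # uncertain, because 0.63 < 0.7
print(decide(probs, threshold=0.5))  # pos
```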

### Classifying TextBlobs

In [26]:
from textblob import TextBlob
blob = TextBlob("The beer is good. But the hangover is horrible.", classifier=class1)
blob.classify()

'pos'

In [27]:
for s in blob.sentences:
    print(s)
    print(s.classify())

The beer is good.
pos
But the hangover is horrible.
neg


### Evaluating Classifiers

In [28]:
class1.accuracy(test)

0.8333333333333334
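
Accuracy is the fraction of test examples whose predicted label matches the true label, so 0.8333... corresponds to 5 of the 6 test sentences. A minimal sketch of that calculation follows. The names `accuracy`, `toy_classify`, and `test_set` are hypothetical, and `toy_classify` is a keyword-based stand-in, not the trained Naive Bayes model.

```python
def accuracy(classify, labeled_examples):
    """Fraction of (text, label) pairs for which classify(text) == label."""
    if not labeled_examples:
        raise ValueError("need at least one labeled example")
    correct = sum(1 for text, label in labeled_examples if classify(text) == label)
    return correct / len(labeled_examples)


def toy_classify(text):
    """Hypothetical stand-in: 'neg' if a negation token appears, else 'pos'."""
    cues = {"not", "can't", "ain't"}
    tokens = set(text.lower().split())
    return "neg" if tokens & cues else "pos"


# Same sentences and labels as the test list above.
test_set = [
    ('the beer was good.', 'pos'),
    ('I do not enjoy my job', 'neg'),
    ("I ain't feeling dandy today.", 'neg'),
    ("I feel amazing!", 'pos'),
    ('Gary is a friend of mine.', 'pos'),
    ("I can't believe I'm doing this.", 'neg'),
]
print(accuracy(toy_classify, test_set))
```

With only six examples, each mistake moves the score by about 0.17, so accuracy on a set this small is a rough indicator at best.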

### Display a Listing of the Most Informative Features

In [29]:
class1.show_informative_features(5)

Most Informative Features
            contains(my) = True              neg : pos    =      1.7 : 1.0
            contains(an) = False             neg : pos    =      1.6 : 1.0
             contains(I) = False             pos : neg    =      1.4 : 1.0
             contains(I) = True              neg : pos    =      1.4 : 1.0
            contains(my) = False             pos : neg    =      1.3 : 1.0
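
The `contains(word)` entries above are the features the classifier works with: for each word in the training vocabulary, whether that word occurs in the document. A rough sketch of such an extractor is shown below. It assumes plain whitespace tokenization and uses a hypothetical helper name, `contains_features`; TextBlob's built-in extractor may tokenize and normalize text differently.

```python
def contains_features(document, vocabulary):
    """Build {'contains(word)': bool} for every word in the vocabulary."""
    tokens = set(document.split())
    return {
        "contains({0})".format(word): (word in tokens)
        for word in sorted(vocabulary)
    }


# Small hypothetical vocabulary using words from the listing above.
vocabulary = {"my", "an", "I"}
print(contains_features("he is my sworn enemy", vocabulary))
```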


### Updating Classifiers with New Data

In [30]:
new_data = [('She is my best friend.', 'pos'),
            ("I'm happy to have a new friend.", 'pos'),
            ("Stay thirsty, my friend.", 'pos'),
            ("He ain't from around here.", 'neg')]
class1.update(new_data)

True

In [31]:
class1.accuracy(test)

1.0

### Feature Extractors

In [32]:
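def end_word_extractor(document):
    """Return features keyed on the first and last whitespace-separated tokens."""
    tokens = document.split()
    first_word, last_word = tokens[0], tokens[-1]
    feats = {}
    # Feature names embed the token itself, e.g. "first(I)" and "last(happy)".
    feats["first({0})".format(first_word)] = True
    feats["last({0})".format(last_word)] = False
    return feats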

features = end_word_extractor("I feel happy")
assert features == {'last(happy)': False, 'first(I)': True}

In [33]:
# Note: this classifier is trained on the small `test` list, using the custom extractor.
class3 = NaiveBayesClassifier(test, feature_extractor=end_word_extractor)
blob = TextBlob("I'm excited to try my new classifier.", classifier=class3)
blob.classify()

'pos'