## TextBlob



TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as **part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation**, and more.

Install TextBlob with pip, then download the corpora it needs:

$ pip install -U textblob

$ python -m textblob.download_corpora

In [6]:
from textblob import TextBlob

text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact."
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''

[('The', 'DT'), ('titular', 'JJ'), ('threat', 'NN'), ('of', 'IN'), ('The', 'DT'), ('Blob', 'NNP'), ('has', 'VBZ'), ('always', 'RB'), ('struck', 'VBN'), ('me', 'PRP'), ('as', 'IN'), ('the', 'DT'), ('ultimate', 'JJ'), ('movie', 'NN'), ('monster', 'NN'), ('an', 'DT'), ('insatiably', 'RB'), ('hungry', 'JJ'), ('amoeba-like', 'JJ'), ('mass', 'NN'), ('able', 'JJ'), ('to', 'TO'), ('penetrate', 'VB'), ('virtually', 'RB'), ('any', 'DT'), ('safeguard', 'NN'), ('capable', 'JJ'), ('of', 'IN'), ('as', 'IN'), ('a', 'DT'), ('doomed', 'JJ'), ('doctor', 'NN'), ('chillingly', 'RB'), ('describes', 'VBZ'), ('it', 'PRP'), ('assimilating', 'VBG'), ('flesh', 'NN'), ('on', 'IN'), ('contact', 'NN'), ('Snide', 'JJ'), ('comparisons', 'NNS'), ('to', 'TO'), ('gelatin', 'VB'), ('be', 'VB'), ('damned', 'VBN'), ('it', 'PRP'), ("'s", 'VBZ'), ('a', 'DT'), ('concept', 'NN'), ('with', 'IN'), ('the', 'DT'), ('most', 'RBS'), ('devastating', 'JJ'), ('of', 'IN'), ('potential', 'JJ'), ('consequences', 'NNS'), ('not', 'RB'), ...]

## Features
* Noun phrase extraction
* Part-of-speech tagging
* Sentiment analysis
* Classification (Naive Bayes, Decision Tree)
* Language translation and detection powered by Google Translate
* Tokenization (splitting text into words and sentences)
* Word and phrase frequencies
* Parsing
* n-grams
* Word inflection (pluralization and singularization) and lemmatization
* Spelling correction
* Add new models or languages through extensions
* WordNet integration

In [None]:
# Part-of-speech tagging
blob = TextBlob(text)
print(blob.tags)           # [('The', 'DT'), ('titular', 'JJ'),
                           #  ('threat', 'NN'), ('of', 'IN'), ...]
print('***' * 20)

# Noun phrase extraction
print(blob.noun_phrases)   # WordList(['titular threat', 'blob',
                           #            'ultimate movie monster',
                           #            'amoeba-like mass', ...])

# Sentiment analysis: polarity ranges from -1.0 (negative) to 1.0 (positive)
for sentence in blob.sentences:
    print(sentence.sentiment.polarity)
# 0.060
# -0.341

# blob.translate(to="es")
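
Because the tagger returns plain (word, tag) tuples, ordinary Python tools work on the result. The sketch below tallies tags and filters nouns. The `tags` list is hard-coded from the first six pairs of the tagger output for the sample text so that the snippet runs without TextBlob installed, and the names `tag_counts` and `nouns` are illustrative only.

```python
from collections import Counter

# First six (word, tag) pairs of the tagger output for the sample text,
# hard-coded here so the example runs without TextBlob installed.
tags = [
    ("The", "DT"), ("titular", "JJ"), ("threat", "NN"),
    ("of", "IN"), ("The", "DT"), ("Blob", "NNP"),
]

# Count how often each part-of-speech tag occurs.
tag_counts = Counter(tag for _, tag in tags)
print(tag_counts.most_common(1))  # [('DT', 2)]

# Keep only the nouns (tags that start with "NN").
nouns = [word for word, tag in tags if tag.startswith("NN")]
print(nouns)  # ['threat', 'Blob']
```

With TextBlob available, passing `blob.tags` in place of the hard-coded list should give the same kind of summary for the whole text.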

## Tokenization

In [7]:
for sentence in blob.sentences:
    print(sentence.words)

['The', 'titular', 'threat', 'of', 'The', 'Blob', 'has', 'always', 'struck', 'me', 'as', 'the', 'ultimate', 'movie', 'monster', 'an', 'insatiably', 'hungry', 'amoeba-like', 'mass', 'able', 'to', 'penetrate', 'virtually', 'any', 'safeguard', 'capable', 'of', 'as', 'a', 'doomed', 'doctor', 'chillingly', 'describes', 'it', 'assimilating', 'flesh', 'on', 'contact']
['Snide', 'comparisons', 'to', 'gelatin', 'be', 'damned', 'it', "'s", 'a', 'concept', 'with', 'the', 'most', 'devastating', 'of', 'potential', 'consequences', 'not', 'unlike', 'the', 'grey', 'goo', 'scenario', 'proposed', 'by', 'technological', 'theorists', 'fearful', 'of', 'artificial', 'intelligence', 'run', 'rampant']


In [9]:
blob.sentences

[Sentence("
 The titular threat of The Blob has always struck me as the ultimate movie
 monster: an insatiably hungry, amoeba-like mass able to penetrate
 virtually any safeguard, capable of--as a doomed doctor chillingly
 describes it--"assimilating flesh on contact."),
 Sentence("Snide comparisons to gelatin be damned, it's a concept with the most
 devastating of potential consequences, not unlike the grey goo scenario
 proposed by technological theorists fearful of
 artificial intelligence run rampant.")]

## Word Inflection and Lemmatization

In [13]:

sentence = TextBlob('Use 4 spaces per indentation level.')
print(sentence.words)
print('*****'*10)
print(sentence.words[2].singularize())
print('*****'*10)
print(sentence.words[-1].pluralize())


['Use', '4', 'spaces', 'per', 'indentation', 'level']
**************************************************
space
**************************************************
levels


## WordNet Integration

You can access the synsets for a Word via the synsets property or the get_synsets method, optionally passing in a part of speech.

In [14]:
from textblob import Word
from textblob.wordnet import VERB
word = Word("octopus")
print(word.synsets)

print(Word("hack").get_synsets(pos=VERB))

[Synset('octopus.n.01'), Synset('octopus.n.02')]
[Synset('chop.v.05'), Synset('hack.v.02'), Synset('hack.v.03'), Synset('hack.v.04'), Synset('hack.v.05'), Synset('hack.v.06'), Synset('hack.v.07'), Synset('hack.v.08')]


You can access the definitions for each synset via the definitions property or the define() method, which can also take an optional part-of-speech argument.

In [15]:
print(Word("octopus").definitions)

['tentacles of octopus prepared as food', 'bottom-living cephalopod having a soft oval body with eight long tentacles']


## Spelling Correction

Use the correct() method to attempt spelling correction.

In [17]:
b = TextBlob("I havv goood speling! , can yuouo pls correct the mistankes i had made in thsi ssentence")
print(b.correct())


I have good spelling! , can you pus correct the mistakes i had made in this sentence


Word objects have a spellcheck() method that returns a list of (word, confidence) tuples with spelling suggestions.

In [18]:
from textblob import Word
w = Word('falibility')
w.spellcheck()

[('fallibility', 1.0)]
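
The TextBlob documentation describes its spelling correction as based on Peter Norvig's "How to Write a Spelling Corrector". To give a feel for that idea, here is a toy sketch, not TextBlob's actual code: it generates every string one edit away from the input and ranks the known ones by frequency. The `WORD_COUNTS` table and the function names are hypothetical, and the confidence is assumed here to be each candidate's share of the total count.

```python
import string

# Hypothetical toy frequency table; a real corrector learns counts from a large corpus.
WORD_COUNTS = {"spelling": 5, "spewing": 1, "fallibility": 2, "good": 9}


def edits1(word):
    """Return every string one edit (delete, swap, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    swaps = [left + right[1] + right[0] + right[2:]
             for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)


def spellcheck(word):
    """Return (candidate, confidence) pairs, most likely first."""
    if word in WORD_COUNTS:
        return [(word, 1.0)]
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return [(word, 0.0)]  # nothing known within one edit
    total = sum(WORD_COUNTS[w] for w in candidates)
    scored = [(w, WORD_COUNTS[w] / total) for w in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(spellcheck("falibility"))  # [('fallibility', 1.0)]
print(spellcheck("speling")[0][0])  # spelling
```

The frequency ranking also explains odd corrections such as "pls" becoming "pus" in the output above: the corrector picks the most probable known word within reach, not the one a human reader would expect.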

## n-grams

The TextBlob.ngrams() method returns a list of WordList objects, each containing n successive words.

In [19]:
blob = TextBlob("Now is better than never.")
blob.ngrams(n=3)


[WordList(['Now', 'is', 'better']),
 WordList(['is', 'better', 'than']),
 WordList(['better', 'than', 'never'])]
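
The result is a sliding window of width n over the word list. A minimal plain-Python sketch of that idea follows. It uses ordinary lists rather than WordList objects, and the helper name `ngrams` is only for illustration.

```python
def ngrams(words, n=3):
    """Return every run of n consecutive items from words, in order."""
    if n <= 0:
        return []
    # The last valid window starts at index len(words) - n.
    return [words[i:i + n] for i in range(len(words) - n + 1)]


words = ["Now", "is", "better", "than", "never"]
print(ngrams(words, n=3))
# [['Now', 'is', 'better'], ['is', 'better', 'than'], ['better', 'than', 'never']]
print(ngrams(words, n=6))  # [] because the window is longer than the input
```

A list of k words yields k - n + 1 n-grams, which is why five words produce three trigrams.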

## Get Start and End Indices of Sentences
Use sentence.start and sentence.end to get the indices where a sentence starts and ends within a TextBlob.

In [21]:
for s in blob.sentences:
    print(s)
    print("---- Starts at index {}, Ends at index {}".format(s.start, s.end))

Now is better than never.
---- Starts at index 0, Ends at index 25
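
The output above (start 0, end 25 for a 25-character sentence) suggests that the indices are character offsets with an exclusive end, like Python slice bounds. Under that assumption, the plain-Python sketch below computes the same kind of offsets for two hand-split sentences and checks that slicing the original string recovers each one. The longer `text` and the `spans` list are made up for illustration.

```python
text = "Now is better than never. Although never is often better than right now."

# Hand-split sentences; TextBlob would find these boundaries itself.
sentences = [
    "Now is better than never.",
    "Although never is often better than right now.",
]

cursor = 0
spans = []
for sentence in sentences:
    start = text.index(sentence, cursor)  # offset of the sentence's first character
    end = start + len(sentence)           # one past its last character
    spans.append((start, end))
    cursor = end

print(spans)  # [(0, 25), (26, 72)]

# Slicing with (start, end) gives back the sentence text.
for (start, end), sentence in zip(spans, sentences):
    assert text[start:end] == sentence
```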
