# TextBlob

TextBlob is a Python library for processing textual data, built on top of **NLTK** and **Pattern**. It provides a simple API for part-of-speech tagging, sentiment analysis, noun phrase extraction, and other common tasks.

## Creating a TextBlob

TextBlob aims to provide access to common text-processing operations through a familiar interface. You can treat TextBlob objects as if they were Python strings that learned how to do Natural Language Processing.

In [1]:
from textblob import TextBlob

In [2]:
sent=TextBlob('Textblob is a simple API used for natural language processing')
sent

TextBlob("Textblob is a simple API used for natural language processing")

## Parts-of-speech tagging

In [3]:
# part-of-speech tagging is done using the 'tags' property
sent.tags

[('Textblob', 'NNP'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('simple', 'JJ'),
 ('API', 'NNP'),
 ('used', 'VBD'),
 ('for', 'IN'),
 ('natural', 'JJ'),
 ('language', 'NN'),
 ('processing', 'NN')]

## Noun phrase extraction

In [4]:
# noun phrases are accessed through the 'noun_phrases' property
sent.noun_phrases

WordList(['textblob', 'api', 'natural language processing'])

## Sentiment Analysis

Sentiment analysis is done using the **sentiment** property, which returns a named tuple of the form Sentiment(polarity, subjectivity). **Polarity** ranges over [-1, 1], where -1 indicates negative and 1 indicates positive sentiment. **Subjectivity** ranges over [0, 1], with 0 being very objective and 1 being very subjective.


In [5]:
sent.sentiment

Sentiment(polarity=0.05, subjectivity=0.37857142857142856)

In [6]:
sent1=TextBlob('It is disgrace to anyone who cannot treat everybody equal')
sent1.sentiment

Sentiment(polarity=0.0, subjectivity=0.25)

In [9]:
sent1.sentiment.polarity

0.0

In [10]:
sent1.sentiment.subjectivity

0.25
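
The two scores above can be read together to get a coarse label. The sketch below is only an illustration: the helper name `describe_sentiment` and the cutoffs 0.1 and 0.5 are assumptions, not part of TextBlob, and the namedtuple merely mimics the shape of the value returned by the sentiment property.

```python
from collections import namedtuple

# Stand-in with the same field names as the value shown in the outputs above
Sentiment = namedtuple('Sentiment', ['polarity', 'subjectivity'])

def describe_sentiment(s, polarity_cutoff=0.1, subjectivity_cutoff=0.5):
    """Map a (polarity, subjectivity) pair to coarse labels (cutoffs are assumptions)."""
    if not -1.0 <= s.polarity <= 1.0:
        raise ValueError('polarity must lie in [-1, 1]')
    if not 0.0 <= s.subjectivity <= 1.0:
        raise ValueError('subjectivity must lie in [0, 1]')
    if s.polarity > polarity_cutoff:
        tone = 'pos'
    elif s.polarity < -polarity_cutoff:
        tone = 'neg'
    else:
        tone = 'neutral'
    stance = 'subjective' if s.subjectivity >= subjectivity_cutoff else 'objective'
    return tone, stance

# Scores copied from the outputs of sent and sent1
print(describe_sentiment(Sentiment(polarity=0.05, subjectivity=0.37857142857142856)))
print(describe_sentiment(Sentiment(polarity=0.0, subjectivity=0.25)))
```

With these cutoffs both example sentences come out as ('neutral', 'objective'), which shows that a polarity of 0.0 does not necessarily mean the text carries no opinion, only that the analyzer found none.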

## Tokenization

In [11]:
# words can be tokenized using the 'words' property
sent.words

WordList(['Textblob', 'is', 'a', 'simple', 'API', 'used', 'for', 'natural', 'language', 'processing'])

In [25]:
# sentences can be tokenized using the 'sentences' property
sent2=TextBlob("I like using Textblob. " 
               "It is very useful and simple API to learn.")
sent2.sentences

[Sentence("I like using Textblob."),
 Sentence("It is very useful and simple API to learn.")]

In [26]:
sent2.words

WordList(['I', 'like', 'using', 'Textblob', 'It', 'is', 'very', 'useful', 'and', 'simple', 'API', 'to', 'learn'])

## Word Inflection and Lemmatization

In [29]:
#inflection
sentence = TextBlob('Use 4 spaces per indentation level.')
sentence.words[2].singularize()

'space'

In [30]:
sentence.words[5].pluralize()

'levels'

In [31]:
#Lemmatization
from textblob import Word
w=Word('octopi')
w.lemmatize()

'octopus'

In [32]:
w=Word('went')
w.lemmatize('v')

'go'

## WordNet

In [33]:
from textblob import Word
from textblob.wordnet import VERB

In [35]:
w=Word('octopus')
w.synsets

[Synset('octopus.n.01'), Synset('octopus.n.02')]

In [38]:
# we can also use 'get_synsets', optionally restricted to one part of speech
Word('hack').get_synsets(pos=VERB)

[Synset('chop.v.05'),
 Synset('hack.v.02'),
 Synset('hack.v.03'),
 Synset('hack.v.04'),
 Synset('hack.v.05'),
 Synset('hack.v.06'),
 Synset('hack.v.07'),
 Synset('hack.v.08')]

In [40]:
Word('bicycle').definitions

['a wheeled vehicle that has two wheels and is moved by foot pedals',
 'ride a bicycle']

In [44]:
from textblob.wordnet import Synset
octopus=Synset('octopus.n.02')
shrimp=Synset('shrimp.n.03')
shrimp.path_similarity(octopus)

0.1111111111111111
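
The score above can be related to a distance in the WordNet hierarchy. Assuming the usual NLTK definition of path similarity, 1 / (d + 1), where d is the number of edges on the shortest path between the two synsets, a score of 0.111... corresponds to d = 8. The helper name below is hypothetical, and the distance 8 is inferred from the score rather than looked up in WordNet.

```python
def path_similarity_from_distance(distance):
    """Score for two synsets separated by `distance` edges (hypothetical helper)."""
    if distance < 0:
        raise ValueError('distance must be non-negative')
    return 1.0 / (distance + 1)

print(path_similarity_from_distance(0))  # identical synsets score 1.0
print(path_similarity_from_distance(8))  # same value as the shrimp/octopus score above
```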

## Wordlists

A WordList behaves like a Python list, with some additional methods.

In [45]:
animals=TextBlob('cat dog octopus')
animals.words

WordList(['cat', 'dog', 'octopus'])

In [46]:
animals.words.pluralize()

WordList(['cats', 'dogs', 'octopodes'])

In [47]:
animals.word_counts


defaultdict(int, {'cat': 1, 'dog': 1, 'octopus': 1})

## Spelling correction

In [48]:
#correct() method is used for spelling correction
b=TextBlob('I havv made a speling mistak')
b.correct()

TextBlob("I have made a spelling mistake")

Word objects have a **Word.spellcheck()** method that returns a list of (word, confidence) tuples with spelling suggestions.

In [49]:
Word('falibilitu').spellcheck()

[('fallibility', 1.0)]

In [50]:
Word('misisipi').spellcheck()

[('misisipi', 0.0)]

In [51]:
Word('intitive').spellcheck()

[('initiative', 0.6774193548387096),
 ('inactive', 0.1935483870967742),
 ('incisive', 0.0967741935483871),
 ('intimite', 0.03225806451612903)]

In [52]:
Word('intutive').spellcheck()

[('inactive', 1.0)]
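
The (word, confidence) lists above can be post-processed with plain Python. The sketch below keeps the original word unless the top suggestion clears a threshold. The helper name `best_suggestion` and the 0.5 threshold are assumptions for illustration, and the suggestion lists are copied from the outputs above rather than recomputed.

```python
def best_suggestion(original, suggestions, min_confidence=0.5):
    """Return the top suggestion if it is confident enough, else the original word."""
    word, confidence = max(suggestions, key=lambda pair: pair[1])
    return word if confidence >= min_confidence else original

# Suggestion lists copied from the spellcheck() outputs above
intitive = [('initiative', 0.6774193548387096), ('inactive', 0.1935483870967742),
            ('incisive', 0.0967741935483871), ('intimite', 0.03225806451612903)]
misisipi = [('misisipi', 0.0)]

print(best_suggestion('intitive', intitive))  # initiative
print(best_suggestion('misisipi', misisipi))  # misisipi
```

A threshold does not catch every bad correction: 'intutive' above gets 'inactive' with confidence 1.0, so a high score only says the candidate dominated the alternatives that were found, not that it is the intended word.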

## Word and Noun phrase frequencies

In [54]:
# lookups through 'word_counts' are not case-sensitive
monty = TextBlob("We are no longer the Knights who say Ni. "
        "We are now the Knights who say Ekki ekki ekki PTANG.")
monty.word_counts['ekki']

3

In [55]:
# the same count is available through the 'count' method
monty.words.count('ekki')

3

In [56]:
# pass case_sensitive=True to count exact-case matches only
monty.words.count('ekki',case_sensitive=True)

2

In [57]:
monty.words.count('ekki',case_sensitive=False)

3
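
The difference between the two counts above comes down to whether tokens are lowercased before counting. A minimal stdlib sketch of the same idea follows; the regex tokenizer is an assumption and is cruder than TextBlob's own tokenization.

```python
import re
from collections import Counter

text = ("We are no longer the Knights who say Ni. "
        "We are now the Knights who say Ekki ekki ekki PTANG.")

# Crude tokenizer (an assumption): runs of word characters
tokens = re.findall(r"\w+", text)

lowercase_counts = Counter(token.lower() for token in tokens)  # case-insensitive
exact_counts = Counter(tokens)                                 # case-sensitive

print(lowercase_counts['ekki'])  # 3
print(exact_counts['ekki'])      # 2
```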

In [59]:
# noun phrase frequency
monty.noun_phrases.count('Knights')

0

In [60]:
sent.noun_phrases.count('api')

1

## Parsing

We can use the parse() method to parse the text. By default, TextBlob uses **Pattern's parser**.

In [62]:
b = TextBlob("The Beatles is the best band in 60s.")
print(b.parse())

The/DT/B-NP/O Beatles/NNPS/I-NP/O is/VBZ/B-VP/O the/DT/B-NP/O best/JJS/I-NP/O band/NN/I-NP/O in/IN/B-PP/B-PNP 60s/NNS/B-NP/I-PNP ././O/O
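
Each token in the parse output above carries four slash-separated fields. Assuming Pattern's default layout, these are the word, its part-of-speech tag, its chunk tag, and its prepositional noun phrase (PNP) tag. The sketch below splits the string copied from the output into rows; the field names are this interpretation, not something the output itself states.

```python
parsed = ("The/DT/B-NP/O Beatles/NNPS/I-NP/O is/VBZ/B-VP/O the/DT/B-NP/O "
          "best/JJS/I-NP/O band/NN/I-NP/O in/IN/B-PP/B-PNP 60s/NNS/B-NP/I-PNP ././O/O")

rows = []
for token in parsed.split():
    # rsplit from the right so a '/' inside the word itself stays in the first field
    word, pos, chunk, pnp = token.rsplit('/', 3)
    rows.append((word, pos, chunk, pnp))

for row in rows:
    print('{:<8} {:<5} {:<5} {}'.format(*row))
```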


## TextBlobs are like Python strings

In [64]:
b[:11]

TextBlob("The Beatles")

In [66]:
apples=TextBlob('apples')
oranges=TextBlob('oranges')
apples+" "+oranges

TextBlob("apples oranges")

In [67]:
# gives the index of the start of the word 'band'
b.find('band')

24

In [68]:
b.upper()

TextBlob("THE BEATLES IS THE BEST BAND IN 60S.")

In [69]:
apples<oranges

True

In [70]:
'{0} and {1}'.format(apples,oranges)

'apples and oranges'

## n-grams

In [71]:
b.ngrams(2)

[WordList(['The', 'Beatles']),
 WordList(['Beatles', 'is']),
 WordList(['is', 'the']),
 WordList(['the', 'best']),
 WordList(['best', 'band']),
 WordList(['band', 'in']),
 WordList(['in', '60s'])]

In [72]:
b.ngrams(3)

[WordList(['The', 'Beatles', 'is']),
 WordList(['Beatles', 'is', 'the']),
 WordList(['is', 'the', 'best']),
 WordList(['the', 'best', 'band']),
 WordList(['best', 'band', 'in']),
 WordList(['band', 'in', '60s'])]
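
The lists above are what a sliding window of width n over the word sequence produces. A minimal sketch with plain lists follows; the standalone function `ngrams` is written here for illustration and is not TextBlob's implementation.

```python
def ngrams(words, n):
    """Return every run of n consecutive words as a list of lists."""
    if n <= 0:
        return []
    # A sequence of k words has k - n + 1 windows of width n (none if n > k)
    return [words[i:i + n] for i in range(len(words) - n + 1)]

words = ['The', 'Beatles', 'is', 'the', 'best', 'band', 'in', '60s']
print(ngrams(words, 2))
print(len(ngrams(words, 2)), len(ngrams(words, 3)))  # 7 6
```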

## Start and end indices of sentences

In [75]:
zen=TextBlob('Beautiful is better than ugly. '
            'Explicit is better than implicit. '
            'Life is better than death.')

for s in zen.sentences:
    print(s)
    print("---- Starts at index {}, Ends at index {}".format(s.start, s.end))

Beautiful is better than ugly.
---- Starts at index 0, Ends at index 30
Explicit is better than implicit.
---- Starts at index 31, Ends at index 64
Life is better than death.
---- Starts at index 65, Ends at index 91
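
The indices above behave like Python slice bounds: the end index is exclusive, so the text sliced from start to end gives back the sentence. The sketch below reproduces the same numbers with str.find. The sentence list is typed in by hand here, whereas TextBlob finds the sentence boundaries itself.

```python
text = ('Beautiful is better than ugly. '
        'Explicit is better than implicit. '
        'Life is better than death.')
sentences = ['Beautiful is better than ugly.',
             'Explicit is better than implicit.',
             'Life is better than death.']

spans = []
cursor = 0
for s in sentences:
    start = text.find(s, cursor)  # search from the end of the previous sentence
    end = start + len(s)          # exclusive end, so text[start:end] == s
    spans.append((start, end))
    cursor = end

print(spans)  # [(0, 30), (31, 64), (65, 91)]
```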
