# TextBlob Quickstart

TextBlob aims to provide access to common text-processing operations through a familiar interface. You can treat TextBlob objects as if they were Python strings that learned how to do Natural Language Processing.

## 1) Create a TextBlob

To install, use either

> conda install -c conda-forge textblob

or

> pip install -U textblob

Then, you can import and use it:

In [5]:
from textblob import TextBlob

# create a text blob
wiki = TextBlob("Python is a high-level, general-purpose programming language.")

### TextBlobs are like Python strings

You can use Python's substring syntax:

In [6]:
wiki[0:19]

TextBlob("Python is a high-le")

You can use common string methods.

In [7]:
print(wiki.upper())
print(wiki.find("high"))

PYTHON IS A HIGH-LEVEL, GENERAL-PURPOSE PROGRAMMING LANGUAGE.
12


In [8]:
# You can make comparisons between TextBlobs and strings.
apple_blob = TextBlob('apples')
banana_blob = TextBlob('bananas')
print(apple_blob < banana_blob)
print(apple_blob == 'apples')

# You can concatenate and interpolate TextBlobs and strings.
print(apple_blob + ' and ' + banana_blob)
print("{0} and {1}".format(apple_blob, banana_blob))
print('apples and bananas')

True
True
apples and bananas
apples and bananas
apples and bananas


## 2) Natural Language Processing

### Parts of Speech

In [10]:
wiki.tags

[('Python', 'NNP'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('high-level', 'JJ'),
 ('general-purpose', 'JJ'),
 ('programming', 'NN'),
 ('language', 'NN')]
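
Because `tags` is a plain list of `(word, tag)` tuples, it can be filtered with ordinary Python. The sketch below copies the output shown above into a literal list, so it runs without TextBlob installed, and keeps the words whose tag starts with `NN` (nouns) or `JJ` (adjectives). The variable names are illustrative only.

```python
# Tag list copied from the wiki.tags output above.
tags = [
    ("Python", "NNP"),
    ("is", "VBZ"),
    ("a", "DT"),
    ("high-level", "JJ"),
    ("general-purpose", "JJ"),
    ("programming", "NN"),
    ("language", "NN"),
]

# Keep words by tag prefix: NN* covers nouns, JJ* covers adjectives.
nouns = [word for word, tag in tags if tag.startswith("NN")]
adjectives = [word for word, tag in tags if tag.startswith("JJ")]

print(nouns)
print(adjectives)
```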

In [11]:
# Run this once (uncomment the two lines below) to download the Brown corpus
# import nltk
# nltk.download('brown')

In [12]:
# Noun Phrase Extraction
wiki.noun_phrases

WordList(['python'])

### Parsing

Use the `parse()` method to parse the text. By default, TextBlob uses `pattern`'s parser.

In [13]:
b = TextBlob("And now for something completely different.")
print(b.parse())

And/CC/O/O now/RB/B-ADVP/O for/IN/B-PP/B-PNP something/NN/B-NP/I-PNP completely/RB/B-ADJP/O different/JJ/I-ADJP/O ././O/O
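
Each token in the parsed string is a slash-separated record. A minimal sketch of splitting it back into fields, using the output above as a literal string so that it runs without TextBlob. The field names (`word`, `pos`, `chunk`, `pnp`) are my reading of `pattern`'s default output order and should be treated as an assumption.

```python
# Parsed string copied from the b.parse() output above.
parsed = ("And/CC/O/O now/RB/B-ADVP/O for/IN/B-PP/B-PNP "
          "something/NN/B-NP/I-PNP completely/RB/B-ADJP/O "
          "different/JJ/I-ADJP/O ././O/O")

# Tokens are separated by spaces; fields within a token by slashes.
rows = [token.split("/") for token in parsed.split()]

for word, pos, chunk, pnp in rows:
    print(f"{word:<12}{pos:<5}{chunk:<8}{pnp}")
```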


## 3) Sentiment Analysis

The `sentiment` property returns a named tuple of the form `Sentiment(polarity, subjectivity)`.

* The **polarity** score is a float within the range `[-1.0, 1.0]`, where `-1.0` is most negative and `1.0` is most positive.

* The **subjectivity** is a float within the range `[0.0, 1.0]` where `0.0` is very objective and `1.0` is very subjective.

In [14]:
testimonial = TextBlob("Textblob is amazingly simple to use. What great fun!")
testimonial.sentiment

Sentiment(polarity=0.39166666666666666, subjectivity=0.4357142857142857)

In [15]:
testimonial.sentiment.polarity

0.39166666666666666
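
Since the result is a named tuple, it supports attribute access, indexing, and unpacking. The sketch below uses a stand-in `Sentiment` named tuple built with `collections.namedtuple` and the scores shown above, so it runs without TextBlob. The rule that labels anything above zero as positive is an illustrative assumption, not part of the library.

```python
from collections import namedtuple

# Stand-in for the tuple TextBlob returns; same field names as shown above.
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])
result = Sentiment(polarity=0.39166666666666666,
                   subjectivity=0.4357142857142857)

# Named tuples can be unpacked like ordinary tuples.
polarity, subjectivity = result

# Hypothetical labelling rule: the sign of the polarity decides the label.
if polarity > 0:
    label = "positive"
elif polarity < 0:
    label = "negative"
else:
    label = "neutral"

print(label, round(polarity, 3), round(subjectivity, 3))
```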

## 4) Tokenization

In [16]:
zen = TextBlob("Beautiful is better than ugly. "
               "Explicit is better than implicit. "
               "Simple is better than complex.")
zen.words

WordList(['Beautiful', 'is', 'better', 'than', 'ugly', 'Explicit', 'is', 'better', 'than', 'implicit', 'Simple', 'is', 'better', 'than', 'complex'])

In [17]:
zen.sentences

[Sentence("Beautiful is better than ugly."),
 Sentence("Explicit is better than implicit."),
 Sentence("Simple is better than complex.")]

Sentence objects have the same properties and methods as TextBlobs.

In [18]:
for sentence in zen.sentences:
    print(sentence.sentiment)

Sentiment(polarity=0.2166666666666667, subjectivity=0.8333333333333334)
Sentiment(polarity=0.5, subjectivity=0.5)
Sentiment(polarity=0.06666666666666667, subjectivity=0.41904761904761906)


Use `sentence.start` and `sentence.end` to get the indices where a sentence starts and ends within a TextBlob.

In [19]:
for s in zen.sentences:
    print(s)
    print("---- Starts at index {}, Ends at index {}".format(s.start, s.end))

Beautiful is better than ugly.
---- Starts at index 0, Ends at index 30
Explicit is better than implicit.
---- Starts at index 31, Ends at index 64
Simple is better than complex.
---- Starts at index 65, Ends at index 95
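
The reported indices behave like ordinary Python slice bounds (start inclusive, end exclusive) into the original text. A small sketch that checks this against the raw string, using the index pairs printed above as literals so that it runs without TextBlob:

```python
text = ("Beautiful is better than ugly. "
        "Explicit is better than implicit. "
        "Simple is better than complex.")

# (start, end) pairs copied from the output above.
spans = [(0, 30), (31, 64), (65, 95)]

# Slicing the raw text with each pair recovers the sentence.
sentences = [text[start:end] for start, end in spans]

for sentence in sentences:
    print(sentence)
```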


A WordList is just a Python list with additional methods.

## 5) Word Inflection and Lemmatization

Each word in `TextBlob.words` or `Sentence.words` is a `Word` object (a subclass of `str`, or of `unicode` on Python 2) with useful methods, e.g. for word inflection.

In [20]:
sentence = TextBlob('Use 4 spaces per indentation level.')
sentence.words

WordList(['Use', '4', 'spaces', 'per', 'indentation', 'level'])

In [21]:
sentence.words[2].singularize()

'space'

In [22]:
sentence.words[-1].pluralize()

'levels'

Words can be lemmatized by calling the lemmatize method.

In [23]:
from textblob import Word
w = Word("octopi")
w.lemmatize()

'octopus'

In [24]:
w = Word("went")
w.lemmatize("v")  # Pass in WordNet part of speech (verb)

'go'

## Spelling Correction

Spelling correction is based on Peter Norvig's "How to Write a Spelling Corrector" as implemented in the pattern library. It is about 70% accurate.

Use the `correct()` method to attempt spelling correction.

In [25]:
b = TextBlob("I havv goood speling!")
b.correct()

TextBlob("I have good spelling!")

`Word` objects have a `spellcheck()` method that returns a list of `(word, confidence)` tuples with spelling suggestions.

In [26]:
from textblob import Word
w = Word('falibility')
w.spellcheck()

[('fallibility', 1.0)]
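
To give a feel for the approach in Norvig's essay, the sketch below generates every string that is one edit (delete, transpose, replace, or insert) away from a word. It is a simplified illustration in plain Python, not TextBlob's or `pattern`'s actual code, and it omits the step where candidates are ranked by how often they occur in a training corpus.

```python
import string


def edits1(word):
    """Return the set of strings exactly one edit away from `word`."""
    letters = string.ascii_lowercase
    # Every way of cutting the word into a left and a right part.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


candidates = edits1("falibility")
print("fallibility" in candidates)
```

A full corrector would keep only the candidates that are known words and pick the most frequent one.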

## Get Word and Noun Phrase Frequencies

There are two ways to get the frequency of a word or noun phrase in a TextBlob.

The first is through the `word_counts` dictionary. If you access the frequencies this way, the search will not be case sensitive, and words that are not found will have a frequency of 0.

In [27]:
monty = TextBlob("We are no longer the Knights who say Ni. "
                    "We are now the Knights who say Ekki ekki ekki PTANG.")
monty.word_counts['ekki']

3

The second way is to use the `count()` method.
You can specify whether or not the search should be case-sensitive (default is False).
Each of these methods can also be used with noun phrases.

In [28]:
print(monty.words.count('ekki'))

print(monty.words.count('ekki', case_sensitive=True))

print(wiki.noun_phrases.count('python'))

3
2
1
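
The difference between case-insensitive and case-sensitive counting can be reproduced with `collections.Counter`. This is a plain-Python sketch for comparison only: the regular-expression tokenizer is a simplification and is not the tokenizer TextBlob uses.

```python
import re
from collections import Counter

text = ("We are no longer the Knights who say Ni. "
        "We are now the Knights who say Ekki ekki ekki PTANG.")

# Simplified tokenizer: runs of word characters, punctuation dropped.
tokens = re.findall(r"\w+", text)

counts_insensitive = Counter(token.lower() for token in tokens)
counts_sensitive = Counter(tokens)

print(counts_insensitive["ekki"])  # counts "Ekki" and "ekki" together
print(counts_sensitive["ekki"])    # counts only the lowercase form
print(counts_insensitive["spam"])  # missing words count as 0
```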


## n-grams

The `TextBlob.ngrams()` method returns a list of `WordList` objects, each containing `n` successive words.

In [29]:
blob = TextBlob("Now is better than never.")
blob.ngrams(n=3)


[WordList(['Now', 'is', 'better']),
 WordList(['is', 'better', 'than']),
 WordList(['better', 'than', 'never'])]
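
The same sliding-window idea can be sketched in plain Python. The `ngrams` helper below is a hypothetical stand-in written for illustration; it works on a list of words produced by `str.split()` rather than on a TextBlob.

```python
def ngrams(words, n=3):
    """Return every run of n successive items from `words`."""
    if n <= 0:
        return []
    # The last window starts at index len(words) - n.
    return [words[i:i + n] for i in range(len(words) - n + 1)]


words = "Now is better than never".split()
trigrams = ngrams(words, n=3)

for gram in trigrams:
    print(gram)
```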