# Ten Great Things about TextBlob

TextBlob is a Python library for text processing. Under the hood, TextBlob uses NLTK libraries but provides its own API.


Read [the docs](https://textblob.readthedocs.io/en/dev/)

[API reference](https://textblob.readthedocs.io/en/dev/api_reference.html#textblob.blob.TextBlob)

Because it is built on top of NLTK, TextBlob is a nice alternative to using NLTK directly. It is a happy medium between the education-grade code of NLTK and the industrial-grade code of spaCy.

To use TextBlob's methods, import the TextBlob class and wrap your text in a TextBlob object. Then you are ready to roll.

In [1]:
from textblob import TextBlob

In [2]:
raw_text = """TextBlob is a Python (2 and 3) library for processing 
textual data. It provides a simple API for diving into common 
natural language processing (NLP) tasks such as part-of-speech 
tagging, noun phrase extraction, sentiment analysis, classification,
translation, and more."""

In [4]:
# convert raw text to a TextBlob object
blob = TextBlob(raw_text)

### 1. POS tagging

Part-of-speech (POS) tagging in TextBlob is fast, and the syntax involves minimal typing. Under the hood, the basic tagger is NLTK's TreeBank tagger.

In [9]:
# the tags are available directly as a property of the TextBlob object

blob.tags[:10]

[('TextBlob', 'NNP'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('Python', 'NNP'),
 ('2', 'CD'),
 ('and', 'CC'),
 ('3', 'CD'),
 ('library', 'NN'),
 ('for', 'IN'),
 ('processing', 'VBG')]
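Because the tags come back as a plain list of (word, tag) tuples, ordinary Python is enough to filter or count them. The sketch below copies the ten tuples shown above into a literal list so that it runs without TextBlob installed. The names `tags`, `nouns`, and `tag_counts` are illustrative and not part of the TextBlob API.

```python
from collections import Counter

# The first ten (word, tag) pairs, copied from the blob.tags output above.
tags = [
    ('TextBlob', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('Python', 'NNP'),
    ('2', 'CD'), ('and', 'CC'), ('3', 'CD'), ('library', 'NN'),
    ('for', 'IN'), ('processing', 'VBG'),
]

# Penn Treebank noun tags all start with "NN" (NN, NNS, NNP, NNPS).
nouns = [word for word, tag in tags if tag.startswith('NN')]

# Count how often each tag occurs in the sample.
tag_counts = Counter(tag for _, tag in tags)

print(nouns)
print(tag_counts.most_common(2))
```

With a real TextBlob object, the same comprehension should work on `blob.tags` directly.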

### 2. Tokenize

A TextBlob object exposes its text tokenized both by sentence and by word, using NLTK tokenizers under the hood. The text is split into sentences first and then into words. TextBlob computes these properties the first time you access them and caches the result.

In [11]:
blob.sentences

[Sentence("TextBlob is a Python (2 and 3) library for processing 
 textual data."),
 Sentence("It provides a simple API for diving into common 
 natural language processing (NLP) tasks such as part-of-speech 
 tagging, noun phrase extraction, sentiment analysis, classification,
 translation, and more.")]

In [12]:
for sentence in blob.sentences:
    print(sentence)

TextBlob is a Python (2 and 3) library for processing 
textual data.
It provides a simple API for diving into common 
natural language processing (NLP) tasks such as part-of-speech 
tagging, noun phrase extraction, sentiment analysis, classification,
translation, and more.


In [13]:
blob.words

WordList(['TextBlob', 'is', 'a', 'Python', '2', 'and', '3', 'library', 'for', 'processing', 'textual', 'data', 'It', 'provides', 'a', 'simple', 'API', 'for', 'diving', 'into', 'common', 'natural', 'language', 'processing', 'NLP', 'tasks', 'such', 'as', 'part-of-speech', 'tagging', 'noun', 'phrase', 'extraction', 'sentiment', 'analysis', 'classification', 'translation', 'and', 'more'])

In [15]:
t_words = [w for w in blob.words if w.lower().startswith('t')]
t_words

['TextBlob', 'textual', 'tasks', 'tagging', 'translation']

### 3. Lemmatize

TextBlob objects are meant for passages of text. The same functionality is also available at the word level through Word objects. The following shows how to create a Word object directly, although every word in a TextBlob object is already a Word. As the output shows, 'had' is left unchanged unless it is lemmatized as a verb by passing "v".

In [28]:
from textblob import Word

w = Word('alumni')
print(w, 'lemmatized:', w.lemmatize())

w = Word('had')
print(w, 'lemmatized:', w.lemmatize())
print(w, 'lemmatized verb:', w.lemmatize("v"))

alumni lemmatized: alumnus
had lemmatized: had
had lemmatized verb: have


### 4. WordNet integration

You can use many WordNet features directly from TextBlob without importing anything else. In the examples below, blob.words[7] is the word 'library'.

In [29]:
blob.words[7].define()
#blob.words[7].definitions  # this also works

['a room where books are kept',
 'a collection of literary documents or records kept for reference or borrowing',
 'a depository built to contain books and other materials for reading and study',
 '(computing) a collection of standard programs and subroutines that are stored and available for immediate use',
 'a building that houses a collection of books and other materials']

In [30]:
blob.words[7].synsets

[Synset('library.n.01'),
 Synset('library.n.02'),
 Synset('library.n.03'),
 Synset('library.n.04'),
 Synset('library.n.05')]

In [31]:
for syn in blob.words[7].synsets:
    print(syn, ':', syn.definition())

Synset('library.n.01') : a room where books are kept
Synset('library.n.02') : a collection of literary documents or records kept for reference or borrowing
Synset('library.n.03') : a depository built to contain books and other materials for reading and study
Synset('library.n.04') : (computing) a collection of standard programs and subroutines that are stored and available for immediate use
Synset('library.n.05') : a building that houses a collection of books and other materials


### 5. Noun phrase extraction

A noun phrase is a phrase with a noun as its head word. A single noun can be a noun phrase by itself, but not every noun forms a noun phrase on its own.

Noun phrase extraction is often a first step in identifying key phrases in a text, or identifying entities.

In [32]:
# extract noun phrases
blob.noun_phrases

WordList(['textblob', 'python', 'processing textual data', 'api', 'common natural language processing', 'nlp', 'noun phrase extraction', 'sentiment analysis'])

### 6. Ngrams

Ngrams (runs of n consecutive words) are easily extracted from a TextBlob object.

In [33]:
blob.ngrams(n=3)

[WordList(['TextBlob', 'is', 'a']),
 WordList(['is', 'a', 'Python']),
 WordList(['a', 'Python', '2']),
 WordList(['Python', '2', 'and']),
 WordList(['2', 'and', '3']),
 WordList(['and', '3', 'library']),
 WordList(['3', 'library', 'for']),
 WordList(['library', 'for', 'processing']),
 WordList(['for', 'processing', 'textual']),
 WordList(['processing', 'textual', 'data']),
 WordList(['textual', 'data', 'It']),
 WordList(['data', 'It', 'provides']),
 WordList(['It', 'provides', 'a']),
 WordList(['provides', 'a', 'simple']),
 WordList(['a', 'simple', 'API']),
 WordList(['simple', 'API', 'for']),
 WordList(['API', 'for', 'diving']),
 WordList(['for', 'diving', 'into']),
 WordList(['diving', 'into', 'common']),
 WordList(['into', 'common', 'natural']),
 WordList(['common', 'natural', 'language']),
 WordList(['natural', 'language', 'processing']),
 WordList(['language', 'processing', 'NLP']),
 WordList(['processing', 'NLP', 'tasks']),
 WordList(['NLP', 'tasks', 'such']),
 ...]
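The output above is a sliding window over the word list. As a rough illustration of the idea (not TextBlob's own code), the hypothetical helper `ngrams` below builds the same kind of result from a plain list of words.

```python
def ngrams(words, n=3):
    """Return every run of n consecutive words as a list of lists."""
    if n <= 0:
        return []
    # The window starts at each index that still leaves room for n words.
    return [words[i:i + n] for i in range(len(words) - n + 1)]

# The first eight words of the example text, copied from blob.words.
words = ['TextBlob', 'is', 'a', 'Python', '2', 'and', '3', 'library']
trigrams = ngrams(words, n=3)

print(trigrams[:2])
```

A list of k words yields k - n + 1 ngrams, and none at all when the text is shorter than n.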

### 7. Sentiment analysis

Polarity ranges from -1.0 (most negative) to +1.0 (most positive). Subjectivity ranges from 0.0 to 1.0, where lower numbers are more objective and higher numbers are more subjective.

In [20]:
blob.sentiment

Sentiment(polarity=0.06000000000000001, subjectivity=0.4514285714285714)

In [36]:
blob2 = TextBlob("I hate seafood. I love spicy food.")

blob2.sentiment

Sentiment(polarity=-0.15000000000000002, subjectivity=0.75)
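One way to read the mildly negative score for a text that contains both "hate" and "love" is that a lexicon-based analyzer can average the scores of the words it recognizes, so strong opposite words partly cancel out. The sketch below is a simplified illustration of that idea and should not be taken as TextBlob's actual implementation. The tiny `LEXICON` and its scores are assumptions chosen for the example, and `average_sentiment` is a hypothetical helper.

```python
# Hypothetical mini-lexicon: word -> (polarity, subjectivity).
# These scores are illustrative assumptions, not read from TextBlob's data.
LEXICON = {
    'hate': (-0.8, 0.9),
    'love': (0.5, 0.6),
}

def average_sentiment(text):
    """Average the scores of the words found in LEXICON; (0.0, 0.0) if none."""
    tokens = [t.strip('.,!?').lower() for t in text.split()]
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    if not scores:
        return 0.0, 0.0
    polarity = sum(p for p, _ in scores) / len(scores)
    subjectivity = sum(s for _, s in scores) / len(scores)
    return polarity, subjectivity

polarity, subjectivity = average_sentiment("I hate seafood. I love spicy food.")
print(polarity, subjectivity)
```

If mixed opinions matter for your use case, scoring each sentence separately is usually more informative than a single score for the whole text.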

### 8. Spelling correction

Text is often messy, and TextBlob can help clean it up. This is simple spelling correction: it fixes non-words such as 'purfect', but it will not flag a correctly spelled word used in the wrong place, such as 'stake' for 'steak'.

In [39]:
messy_text = "The stake was purfect but service was horible."
blob2 = TextBlob(messy_text)

blob2.correct()

TextBlob("The stake was perfect but service was horrible.")
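To see why 'stake' survives, it helps to look at the general approach behind Peter Norvig's spelling corrector, which TextBlob's source code points to. The sketch below is a heavily simplified version under stated assumptions: it only considers candidates one edit away, and `WORD_COUNTS` is a hypothetical frequency table standing in for counts from a large corpus. It is not TextBlob's implementation.

```python
import string

# Hypothetical word frequencies; a real corrector would count a large corpus.
WORD_COUNTS = {'the': 500, 'stake': 5, 'steak': 8, 'was': 300, 'perfect': 20,
               'but': 200, 'service': 30, 'horrible': 10}

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Keep known words; otherwise pick the most frequent known word one edit away."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    return max(candidates, key=WORD_COUNTS.get) if candidates else word

print([correct(w) for w in ['stake', 'purfect', 'horible']])
```

Because 'stake' is itself a known word, a corrector of this kind never looks for alternatives, which is why real-word errors pass through unchanged.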

### 9. Language detection

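Language detection uses the Google Translate API and requires an internet connection. Note that newer TextBlob releases deprecated detect_language() and later removed it, so this example may only run on older versions.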

In [40]:
blob.detect_language()

'en'

In [41]:
blob2 = TextBlob("Hola amor")
blob2.detect_language()

'es'

### 10. Open source

One of the benefits of the open source approach is that you can dig into the code and learn more. For example, if you want to know how the spell checker works, the code gives a link to the algorithm by Peter Norvig.

TextBlob also has text classification functionality, but other frameworks such as scikit-learn are probably a better choice.