# TextBlob tutorial

[Source](https://analyticsindiamag.com/lets-learn-textblob-quickstart-a-python-library-for-processing-textual-data/)

**TextBlob** is an open-source Python library for processing textual data. It supports operations such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, and spelling correction. Older releases also offered translation, which has since been removed from the library.

TextBlob is built on top of NLTK and Pattern. It is easy to use and can process text in a few lines of code, which makes it a good starting point for NLP tasks.

In [1]:
# !pip install nltk textblob

In [2]:
from textblob import TextBlob
import nltk

In [23]:
# Download dependencies
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('brown')
nltk.download('wordnet')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\agarw\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\agarw\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package brown to
[nltk_data]     C:\Users\agarw\AppData\Roaming\nltk_data...
[nltk_data]   Package brown is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\agarw\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\wordnet.zip.


True

## Text selection for processing

In [5]:
art = '''Among the 10 countries that have reported the highest number of case in the world, daily cases are still continuously rising in only two – India and Colombia.  Other than the US and Brazil, daily cases also appear hitting a plateau in Mexico (7th spot, 480,278 cases). Russia (4th, 892,654 cases), South Africa (5th, 563,598 cases), and Chile (9th, 375,044 cases). The remaining two – Spain (10th, 370,060 cases) and Peru (7th, 483,133 cases) – managed to control outbreaks once, but are now seeing a resurgence of cases. All caseloads are from the worldometers.info dashboard. To be sure, the global Covid-19 curve has flattened twice before — first, when the Chinese outbreak peaked and the contagion was yet to reach the West; the second, when cases dropped in Europe — however, it has risen again with more ferocity both times as the virus has spread to new regions.'''

## Text processing

In [6]:
# Create a TextBlob object from the text
blob = TextBlob(art)

### Find Tags

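The tags property returns a list of (word, tag) tuples, where the tag gives the word's part of speech: noun, adjective, verb, and so on. The tags follow the Penn Treebank convention, for example NN for a singular noun and JJ for an adjective.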

In [9]:
# Find tags
blob.tags

[('Among', 'IN'),
 ('the', 'DT'),
 ('10', 'CD'),
 ('countries', 'NNS'),
 ('that', 'WDT'),
 ('have', 'VBP'),
 ('reported', 'VBN'),
 ('the', 'DT'),
 ('highest', 'JJS'),
 ('number', 'NN'),
 ('of', 'IN'),
 ('case', 'NN'),
 ('in', 'IN'),
 ('the', 'DT'),
 ('world', 'NN'),
 ('daily', 'JJ'),
 ('cases', 'NNS'),
 ('are', 'VBP'),
 ('still', 'RB'),
 ('continuously', 'RB'),
 ('rising', 'VBG'),
 ('in', 'IN'),
 ('only', 'JJ'),
 ('two', 'CD'),
 ('–', 'JJ'),
 ('India', 'NNP'),
 ('and', 'CC'),
 ('Colombia', 'NNP'),
 ('Other', 'JJ'),
 ('than', 'IN'),
 ('the', 'DT'),
 ('US', 'NNP'),
 ('and', 'CC'),
 ('Brazil', 'NNP'),
 ('daily', 'JJ'),
 ('cases', 'NNS'),
 ('also', 'RB'),
 ('appear', 'VBP'),
 ('hitting', 'VBG'),
 ('a', 'DT'),
 ('plateau', 'NN'),
 ('in', 'IN'),
 ('Mexico', 'NNP'),
 ('7th', 'CD'),
 ('spot', 'NN'),
 ('480,278', 'CD'),
 ('cases', 'NNS'),
 ('Russia', 'NNP'),
 ('4th', 'CD'),
 ('892,654', 'CD'),
 ('cases', 'NNS'),
 ('South', 'NNP'),
 ('Africa', 'NNP'),
 ('5th', 'CD'),
 ('563,598', 'CD'),
 ('cases
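
Because the tags come back as plain (word, tag) tuples, they are easy to post-process. The sketch below does not call TextBlob; it works on a few tuples copied from the output above, counts the tags, and keeps only the nouns. The helper name nouns_only is hypothetical, and it assumes Penn Treebank noun tags, which all start with NN.

```python
from collections import Counter

# A few (word, tag) pairs copied from the blob.tags output above.
tagged = [
    ('Among', 'IN'), ('the', 'DT'), ('10', 'CD'), ('countries', 'NNS'),
    ('that', 'WDT'), ('have', 'VBP'), ('reported', 'VBN'), ('the', 'DT'),
    ('highest', 'JJS'), ('number', 'NN'), ('of', 'IN'), ('case', 'NN'),
    ('in', 'IN'), ('the', 'DT'), ('world', 'NN'),
]


def nouns_only(pairs):
    """Hypothetical helper: keep words whose tag starts with 'NN'."""
    return [word for word, tag in pairs if tag.startswith('NN')]


# How often each part-of-speech tag occurs in the sample.
tag_counts = Counter(tag for _, tag in tagged)

print(nouns_only(tagged))
print(tag_counts)
```

With a real blob, the same helper could be called as nouns_only(blob.tags).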

### Noun Phrases

The noun_phrases property returns the noun phrases found in the text as a WordList. Note that the phrases are returned in lowercase.

In [11]:
blob.noun_phrases

WordList(['india', 'colombia', 'brazil', 'mexico', '7th spot', 'russia', 'africa', 'chile', 'spain', 'peru', 'worldometers.info dashboard', 'covid-19', 'chinese outbreak', 'europe', 'new regions'])

### Sentiments

The sentiment property returns a named tuple with the polarity and subjectivity of the text. Polarity ranges from -1.0 (negative) to 1.0 (positive); subjectivity ranges from 0.0 (objective) to 1.0 (subjective).

In [12]:
blob.sentiment

Sentiment(polarity=0.1146694214876033, subjectivity=0.3228879706152434)
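
A raw polarity score is often easier to act on once it is turned into a coarse label. The following is a minimal sketch of one way to do that; the helper label_polarity and its 0.05 cut-off are assumptions made for illustration, not something TextBlob defines.

```python
def label_polarity(polarity, threshold=0.05):
    """Hypothetical helper: map a polarity score in [-1, 1] to a coarse label.

    The default threshold is an arbitrary assumption; tune it for your data.
    """
    if polarity > threshold:
        return 'positive'
    if polarity < -threshold:
        return 'negative'
    return 'neutral'


# Polarity reported above for the whole article.
print(label_polarity(0.1146694214876033))
```

With a real blob, this could be called as label_polarity(blob.sentiment.polarity).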

### Words

The words property splits the text into a WordList of individual words, dropping punctuation such as commas and full stops.

In [13]:
blob.words

WordList(['Among', 'the', '10', 'countries', 'that', 'have', 'reported', 'the', 'highest', 'number', 'of', 'case', 'in', 'the', 'world', 'daily', 'cases', 'are', 'still', 'continuously', 'rising', 'in', 'only', 'two', '–', 'India', 'and', 'Colombia', 'Other', 'than', 'the', 'US', 'and', 'Brazil', 'daily', 'cases', 'also', 'appear', 'hitting', 'a', 'plateau', 'in', 'Mexico', '7th', 'spot', '480,278', 'cases', 'Russia', '4th', '892,654', 'cases', 'South', 'Africa', '5th', '563,598', 'cases', 'and', 'Chile', '9th', '375,044', 'cases', 'The', 'remaining', 'two', '–', 'Spain', '10th', '370,060', 'cases', 'and', 'Peru', '7th', '483,133', 'cases', '–', 'managed', 'to', 'control', 'outbreaks', 'once', 'but', 'are', 'now', 'seeing', 'a', 'resurgence', 'of', 'cases', 'All', 'caseloads', 'are', 'from', 'the', 'worldometers.info', 'dashboard', 'To', 'be', 'sure', 'the', 'global', 'Covid-19', 'curve', 'has', 'flattened', 'twice', 'before', '—', 'first', 'when', 'the', 'Chinese', 'outbreak', 'peaked
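
Note that the dash characters in the article survive as separate tokens in the output above. If they are not wanted, one possible clean-up is to keep only tokens that contain at least one letter or digit. This is a sketch on a short slice copied from the output; the helper name drop_symbol_tokens is hypothetical.

```python
# A slice of the blob.words output above, including a stray dash token.
words = ['rising', 'in', 'only', 'two', '–', 'India', 'and', 'Colombia',
         '480,278', 'Covid-19']


def drop_symbol_tokens(tokens):
    """Hypothetical helper: keep tokens with at least one letter or digit."""
    return [t for t in tokens if any(ch.isalnum() for ch in t)]


print(drop_symbol_tokens(words))
```

Tokens such as '480,278' and 'Covid-19' are kept because they contain alphanumeric characters; only the pure-symbol token is dropped.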

### Sentences

The sentences property splits the text into a list of Sentence objects.

In [14]:
blob.sentences

[Sentence("Among the 10 countries that have reported the highest number of case in the world, daily cases are still continuously rising in only two – India and Colombia."),
 Sentence("Other than the US and Brazil, daily cases also appear hitting a plateau in Mexico (7th spot, 480,278 cases)."),
 Sentence("Russia (4th, 892,654 cases), South Africa (5th, 563,598 cases), and Chile (9th, 375,044 cases)."),
 Sentence("The remaining two – Spain (10th, 370,060 cases) and Peru (7th, 483,133 cases) – managed to control outbreaks once, but are now seeing a resurgence of cases."),
 Sentence("All caseloads are from the worldometers.info dashboard."),
 Sentence("To be sure, the global Covid-19 curve has flattened twice before — first, when the Chinese outbreak peaked and the contagion was yet to reach the West; the second, when cases dropped in Europe — however, it has risen again with more ferocity both times as the virus has spread to new regions.")]

Each Sentence has its own sentiment property, so we can also read the polarity of every sentence individually.

In [15]:
for sentence in blob.sentences:
    print(sentence.sentiment.polarity)

0.0
-0.0625
0.0
0.0
0.0
0.19805194805194803
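
Once the per-sentence scores are available, it is straightforward to locate the most negative and most positive sentences. The sketch below uses the six scores printed above as plain data rather than calling TextBlob; the variable names are illustrative only.

```python
# Polarity scores copied from the loop output above, in sentence order.
polarities = [0.0, -0.0625, 0.0, 0.0, 0.0, 0.19805194805194803]

# Index of the sentence with the lowest and the highest polarity.
most_negative = min(range(len(polarities)), key=polarities.__getitem__)
most_positive = max(range(len(polarities)), key=polarities.__getitem__)

# Plain average of the sentence scores.
average = sum(polarities) / len(polarities)

print(most_negative, most_positive, average)
```

The average of the sentence scores differs from the polarity reported for the whole text earlier (about 0.115), so the two numbers should not be treated as interchangeable.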


### Singularize and Pluralize

Each item in blob.words is a Word object with singularize() and pluralize() methods, so we can pick words from our text and convert them. The same methods work on any word wrapped in Word.

In [16]:
word_text = blob.words

In [19]:
word_text[3], word_text[4]

('countries', 'that')

In [18]:
word_text[3].singularize()

'country'

In [20]:
word_text[4].pluralize()

'those'

### Lemmatize

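The lemmatize function returns the lemma (dictionary form) of each word. Called without a part of speech, it treats every word as a noun, which is why 'countries' becomes 'country' below while the verb 'reported' is left unchanged.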

In [27]:
print(word_text[:10])
word_text[:10].lemmatize()

['Among', 'the', '10', 'countries', 'that', 'have', 'reported', 'the', 'highest', 'number']


WordList(['Among', 'the', '10', 'country', 'that', 'have', 'reported', 'the', 'highest', 'number'])

### Spell check and correct

The correct() method returns a new TextBlob with likely spelling mistakes fixed, and Word.spellcheck() returns a list of (candidate, confidence) tuples for a single word.

In [28]:
sent = TextBlob("Amnog the 10 countrees that have reporded the highest number  of case in the world")

In [31]:
sent.correct()

TextBlob("Among the 10 countries that have reported the highest number  of case in the world")

In [32]:
from textblob import Word

In [33]:
w = Word('amog')

w.spellcheck()

[('among', 0.9933920704845814),
 ('amoy', 0.0022026431718061676),
 ('amos', 0.0022026431718061676),
 ('agog', 0.0022026431718061676)]
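
The confidence values make it possible to accept a correction only when the top candidate clearly dominates. The following sketch works on the candidate list copied from the output above; the helper pick_correction and its 0.9 threshold are assumptions for illustration, not part of TextBlob.

```python
# Candidates copied from the w.spellcheck() output above.
candidates = [
    ('among', 0.9933920704845814),
    ('amoy', 0.0022026431718061676),
    ('amos', 0.0022026431718061676),
    ('agog', 0.0022026431718061676),
]


def pick_correction(original, candidates, min_confidence=0.9):
    """Hypothetical helper: accept the top candidate only if it is confident enough."""
    if not candidates:
        return original
    best_word, best_confidence = max(candidates, key=lambda pair: pair[1])
    return best_word if best_confidence >= min_confidence else original


print(pick_correction('amog', candidates))
```

Keeping the original word when no candidate is confident enough avoids silently replacing names or rare terms.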

### Parsing text

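By default, TextBlob uses Pattern’s parser. The parse() method returns the text as a string in which every token is annotated in the form word/part-of-speech tag/chunk tag/PNP tag, where PNP marks prepositional noun phrases.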

In [34]:
blob.parse()

'Among/IN/B-PP/B-PNP the/DT/B-NP/I-PNP 10/CD/I-NP/I-PNP countries/NNS/I-NP/I-PNP that/IN/B-PP/O have/VBP/B-VP/O reported/VBD/I-VP/O the/DT/B-NP/O highest/JJS/I-NP/O number/NN/I-NP/O of/IN/B-PP/B-PNP case/NN/B-NP/I-PNP in/IN/B-PP/B-PNP the/DT/B-NP/I-PNP world/NN/I-NP/I-PNP ,/,/O/O daily/JJ/B-NP/O cases/NNS/I-NP/O are/VBP/B-VP/O still/RB/I-VP/O continuously/RB/I-VP/O rising/VBG/I-VP/O in/IN/B-PP/O only/RB/B-ADVP/O two/CD/O/O –/,/O/O India/NNP/B-NP/O and/CC/I-NP/O Colombia/NNP/I-NP/O ././O/O\nOther/JJ/B-ADJP/O than/IN/B-PP/B-PNP the/DT/B-NP/I-PNP US/PRP/I-NP/I-PNP and/CC/O/O Brazil/NNP/B-NP/O ,/,/O/O daily/JJ/B-NP/O cases/NNS/I-NP/O also/RB/B-VP/O appear/VB/I-VP/O hitting/VBG/I-VP/O a/DT/B-NP/O plateau/NN/I-NP/O in/IN/B-PP/B-PNP Mexico/NNP/B-NP/I-PNP (/(/O/O 7th/NN/B-NP/O spot/NN/I-NP/O ,/,/O/O 480,278/CD/B-NP/O cases/NNS/I-NP/O )/)/O/O ././O/O\nRussia/NNP/B-NP/O (/(/O/O 4th/CD/O/O ,/,/O/O 892,654/CD/B-NP/O cases/NNS/I-NP/O )/)/O/O ,/,/O/O South/NNP/B-NP/O Africa/NNP/I-NP/O (/(/O/O 5th/NN
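
The parsed string is compact but hard to read. A small sketch of how it could be split back into fields is shown below, using the first few tokens copied from the output above. The helper split_parsed_token is hypothetical, and it assumes the four-field word/POS/chunk/PNP layout seen in that output.

```python
def split_parsed_token(token):
    """Hypothetical helper: split 'word/POS/chunk/PNP' into its four fields.

    rsplit is used so that only the last three slashes act as separators.
    """
    word, pos, chunk, pnp = token.rsplit('/', 3)
    return {'word': word, 'pos': pos, 'chunk': chunk, 'pnp': pnp}


# First tokens of the blob.parse() output above.
parsed = ('Among/IN/B-PP/B-PNP the/DT/B-NP/I-PNP '
          '10/CD/I-NP/I-PNP countries/NNS/I-NP/I-PNP')

# Tokens are separated by whitespace (spaces within and newlines between sentences).
rows = [split_parsed_token(tok) for tok in parsed.split()]

for row in rows:
    print(row)
```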

### N-Grams

The ngrams method returns a list of WordList objects, each holding n successive words from the text. Pass n to set the number of words per n-gram; it defaults to 3.

In [35]:
blob.ngrams(n=5)

[WordList(['Among', 'the', '10', 'countries', 'that']),
 WordList(['the', '10', 'countries', 'that', 'have']),
 WordList(['10', 'countries', 'that', 'have', 'reported']),
 WordList(['countries', 'that', 'have', 'reported', 'the']),
 WordList(['that', 'have', 'reported', 'the', 'highest']),
 WordList(['have', 'reported', 'the', 'highest', 'number']),
 WordList(['reported', 'the', 'highest', 'number', 'of']),
 WordList(['the', 'highest', 'number', 'of', 'case']),
 WordList(['highest', 'number', 'of', 'case', 'in']),
 WordList(['number', 'of', 'case', 'in', 'the']),
 WordList(['of', 'case', 'in', 'the', 'world']),
 WordList(['case', 'in', 'the', 'world', 'daily']),
 WordList(['in', 'the', 'world', 'daily', 'cases']),
 WordList(['the', 'world', 'daily', 'cases', 'are']),
 WordList(['world', 'daily', 'cases', 'are', 'still']),
 WordList(['daily', 'cases', 'are', 'still', 'continuously']),
 WordList(['cases', 'are', 'still', 'continuously', 'rising']),
 WordList(['are', 'still', 'continuousl
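
To make clear what an n-gram is, here is a minimal pure-Python sketch of the sliding window behind the output above. It is an illustration, not TextBlob's own implementation, and the function name sliding_ngrams is hypothetical.

```python
def sliding_ngrams(tokens, n=3):
    """Hypothetical helper: return every run of n consecutive tokens."""
    if n <= 0:
        return []
    # The window start moves one token at a time until the window no longer fits.
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


# First seven words of the article, as returned by blob.words.
tokens = ['Among', 'the', '10', 'countries', 'that', 'have', 'reported']

for gram in sliding_ngrams(tokens, n=5):
    print(gram)
```

The three windows produced for these seven words match the first three entries of the blob.ngrams(n=5) output.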

These are some of the text processing functions that TextBlob provides. Its simple API and ready-made functions make it a convenient choice for everyday text processing.