# TextBlob: A Simple Library for Natural Language Processing (NLP)

## Introduction:
Natural Language Processing (NLP) is a branch of computational linguistics that focuses on enabling computers to interact with people through human language. One example is computer dictation, where you speak to your computer and your speech is converted to text.

Enabling computers to process and understand text data involves many NLP tasks, such as tokenization, noun phrase extraction, part-of-speech tagging, sentiment analysis, and translation.

Python offers various libraries for performing NLP tasks conveniently. One of the most prominent and easy-to-use libraries for processing textual data is TextBlob.

This blog walks through some important NLP tasks using the TextBlob library.

## Install TextBlob:
Let's start by installing the library using pip:

In [None]:
!pip install -U textblob
!python -m textblob.download_corpora


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package conll2000 to /root/nltk_data...
[nltk_data]   Unzipping corpora/conll2000.zip.
[nltk_data] Downloading package movie_reviews to /root/nltk_data...
[nltk_data]   Unzipping corpora/movie_reviews.zip.
Finished.


This will install the library as well as the necessary NLTK (Natural Language Toolkit) corpora.
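You may want to confirm from Python that the corpora ended up where NLTK looks for them. The check below is a minimal sketch. It assumes the resource paths listed match the packages in the download log. `nltk.data.find` raises `LookupError` when it cannot locate a resource.

```python
import nltk

# Assumed NLTK data paths for the packages shown in the download log above.
RESOURCES = [
    "corpora/brown",
    "tokenizers/punkt",
    "corpora/wordnet",
    "taggers/averaged_perceptron_tagger",
    "corpora/conll2000",
    "corpora/movie_reviews",
]


def missing_resources(resources=RESOURCES):
    """Return the resource paths that nltk.data.find cannot locate."""
    missing = []
    for resource in resources:
        try:
            nltk.data.find(resource)
        except LookupError:
            missing.append(resource)
    return missing


missing = missing_resources()
print("All corpora found" if not missing else f"Missing: {missing}")
```

If anything is reported missing, re-running the download command above is usually enough.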

To download only the minimum corpora instead, run:

In [None]:
!python -m textblob.download_corpora lite

To install with conda:

In [None]:
!conda install -c conda-forge textblob
!python -m textblob.download_corpora

## Some NLP Tasks with TextBlob:
Here we look at some fundamental NLP techniques. We explain each concept briefly and show how to do it using TextBlob.

### 1. Tokenization:
Most NLP tasks start by splitting the text into tokens (chunks of information). This is called tokenization. Tokenization can happen at the sentence level, word level, sub-word level, or character level.

The following code imports the TextBlob class from the textblob library:

In [None]:
from textblob import TextBlob

We then create a TextBlob object from a piece of text:

In [None]:
my_text = TextBlob("I am reading a blog post on Medium. I am loving it!")

Now that our TextBlob is ready, let’s perform some word-level and sentence-level tokenization. The “words” attribute breaks the text into words:

In [None]:
my_text.words

WordList(['I', 'am', 'reading', 'a', 'blog', 'post', 'on', 'Medium', 'I', 'am', 'loving', 'it'])

For sentences:

In [None]:
my_text.sentences

[Sentence("I am reading a blog post on Medium."), Sentence("I am loving it!")]
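The sketch below makes the tokenization levels concrete using only the standard library. The regular expressions are simplifying assumptions and are not what TextBlob does internally, since TextBlob relies on NLTK's tokenizers. As a result, edge cases such as abbreviations ("Dr.") would be split incorrectly.

```python
import re

text = "I am reading a blog post on Medium. I am loving it!"

# Sentence level: split on whitespace that follows ".", "!" or "?".
sentences = re.split(r"(?<=[.!?])\s+", text)

# Word level: keep runs of letters, digits and apostrophes; drop punctuation.
words = re.findall(r"[A-Za-z0-9']+", text)

# Character level: every character, including spaces and punctuation.
characters = list(text)

print(sentences)       # ['I am reading a blog post on Medium.', 'I am loving it!']
print(words)           # ['I', 'am', 'reading', 'a', 'blog', 'post', 'on', 'Medium', 'I', 'am', 'loving', 'it']
print(characters[:6])  # ['I', ' ', 'a', 'm', ' ', 'r']
```

Sub-word tokenization is not shown. It needs a vocabulary learned from data rather than a fixed rule.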

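### 2. Part-of-Speech Tagging:
Part-of-speech (PoS) tagging labels each tokenized word with its grammatical category, such as noun, verb, or adverb. This helps reveal the role each word plays in a sentence and how words relate to one another. The tags use the Penn Treebank tag set (for example, NN for a noun and VBG for a gerund). Let’s try PoS tagging on the “my_text” object:
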
In [None]:
my_text.tags

[('I', 'PRP'),
 ('am', 'VBP'),
 ('reading', 'VBG'),
 ('a', 'DT'),
 ('blog', 'NN'),
 ('post', 'NN'),
 ('on', 'IN'),
 ('Medium', 'NNP'),
 ('I', 'PRP'),
 ('am', 'VBP'),
 ('loving', 'VBG'),
 ('it', 'PRP')]
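Tags are easier to work with once they are grouped. The sketch below copies the (word, tag) pairs printed above and buckets them by the first two letters of each tag. That grouping rule is a simplifying assumption: it merges NN, NNS and NNP into nouns, and VB, VBG and VBP into verbs. With TextBlob you would iterate over “my_text.tags” directly.

```python
from collections import defaultdict

# (word, tag) pairs as printed by my_text.tags above.
tagged = [
    ("I", "PRP"), ("am", "VBP"), ("reading", "VBG"), ("a", "DT"),
    ("blog", "NN"), ("post", "NN"), ("on", "IN"), ("Medium", "NNP"),
    ("I", "PRP"), ("am", "VBP"), ("loving", "VBG"), ("it", "PRP"),
]

# Bucket words by the first two letters of their tag (NN*, VB*, PR*, ...).
by_prefix = defaultdict(list)
for word, tag in tagged:
    by_prefix[tag[:2]].append(word)

print(by_prefix["NN"])  # ['blog', 'post', 'Medium']
print(by_prefix["VB"])  # ['am', 'reading', 'am', 'loving']
```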

### 3. Word and phrase frequencies:
We can build a dictionary of word counts to find out how often a word of interest appears in the text. The “word_counts” attribute maps each word, lowercased and stripped of punctuation, to the number of times it appears:

In [None]:
betty = TextBlob("Betty Botter bought some butter. But she said the Butter’s bitter. If I put it in my batter, it will make my batter bitter. But a bit of better butter will make my batter better.")
betty.word_counts['butter']

3
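At its core, a word count is just a tally over normalized tokens. The standard-library sketch below approximates what “word_counts” does by lowercasing the text and keeping runs of letters. The regular expression is an assumption and will not tokenize exactly like TextBlob does.

```python
import re
from collections import Counter

betty = (
    "Betty Botter bought some butter. But she said the Butter’s bitter. "
    "If I put it in my batter, it will make my batter bitter. "
    "But a bit of better butter will make my batter better."
)

# Lowercase first so "Butter" and "butter" are counted together, then keep
# runs of ASCII letters (so "Butter’s" yields "butter" and "s").
tokens = re.findall(r"[a-z]+", betty.lower())
counts = Counter(tokens)

print(counts["butter"])  # 3
print(counts["bitter"])  # 2
```

Inside TextBlob itself, a WordList also has a “count()” method, which is case-insensitive by default. Examples are “betty.words.count('butter')” for words or “noun_phrases.count(...)” for phrase frequencies.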

### 4. Noun phrase extraction:
To understand the meaning of a sentence, we can extract its noun phrases. These often identify the subject or object of the sentence. Noun phrases are frequently introduced by a determiner such as "a", "an", or "the".

Let’s extract the noun phrases in our sentences. This can easily be done using the “noun_phrases” property:

In [None]:
my_text.noun_phrases

WordList(['blog post', 'medium'])
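The determiner heuristic described above can be sketched as a tiny pattern-based chunker. It encodes each Penn Treebank tag as one letter, then matches an optional determiner, any adjectives, and one or more nouns. This is a simplified illustration, not TextBlob's extractor. Unlike “noun_phrases”, it keeps the determiner and the original casing.

```python
import re

# (word, tag) pairs as printed by my_text.tags earlier.
tagged = [
    ("I", "PRP"), ("am", "VBP"), ("reading", "VBG"), ("a", "DT"),
    ("blog", "NN"), ("post", "NN"), ("on", "IN"), ("Medium", "NNP"),
    ("I", "PRP"), ("am", "VBP"), ("loving", "VBG"), ("it", "PRP"),
]


def tag_code(tag):
    """Map a Penn Treebank tag to D (determiner), J (adjective), N (noun) or O (other)."""
    if tag == "DT":
        return "D"
    if tag.startswith("JJ"):
        return "J"
    if tag.startswith("NN"):
        return "N"
    return "O"


def naive_noun_phrases(tagged):
    """Return word sequences matching the pattern: determiner? adjective* noun+."""
    codes = "".join(tag_code(tag) for _, tag in tagged)
    phrases = []
    for match in re.finditer(r"D?J*N+", codes):
        span = tagged[match.start():match.end()]
        phrases.append(" ".join(word for word, _ in span))
    return phrases


print(naive_noun_phrases(tagged))  # ['a blog post', 'Medium']
```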

### 5. Sentiment Analysis:
We can analyze a text or sentence for sentiment, that is, how positive or negative it is. TextBlob reports two scores. Polarity ranges from -1.0 to 1.0 (most negative to most positive). Subjectivity ranges from 0.0 to 1.0 (most objective to most subjective):

In [None]:
my_text.sentiment

Sentiment(polarity=0.75, subjectivity=0.95)
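Polarity is a continuous score, so applications often bucket it into labels. The helper below is a hypothetical example. Its neutral band of ±0.05 is an arbitrary assumption, not a TextBlob default. In practice you would pass it “my_text.sentiment.polarity”.

```python
def polarity_label(polarity, neutral_band=0.05):
    """Map a polarity in [-1.0, 1.0] to 'negative', 'neutral' or 'positive'.

    neutral_band is a hypothetical threshold chosen for illustration.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity must be in [-1.0, 1.0], got {polarity}")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


# 0.75 is the polarity reported for my_text above.
print(polarity_label(0.75))  # positive
print(polarity_label(-0.3))  # negative
print(polarity_label(0.0))   # neutral
```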

### 7. Inflection:
Inflection changes the form of a word, for example from singular to plural. The “pluralize()” method returns the plural form of a word:

In [None]:
my_text.words[4].pluralize() # the word "blog"

'blogs'

### 8. Lemmatization:
A lemma is the base (dictionary) form shared by a set of word forms. For example, "flew", "flies", and "flying" all have the lemma "fly" when treated as verbs.

In [None]:
from textblob import Word
w = Word("radii")
w.lemmatize()

'radius'

### 10. Definition:
TextBlob can also look up the definitions of a word in WordNet. The “definitions” property returns them:

In [None]:
Word("blog").definitions

['a shared on-line journal where people can post diary entries about their personal experiences and hobbies',
 'read, write, or edit a shared on-line journal']

### 11. Spelling correction:
Spelling correction is performed by the “correct()” method. It follows the classic approach from Peter Norvig’s essay “How to Write a Spelling Corrector”:

In [None]:
my_sentence = TextBlob("I am not in denger. I am the dyangr.")
my_sentence.correct()

TextBlob("I am not in danger. I am the danger.")
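The sketch below illustrates the idea behind Norvig's approach. It is a stripped-down version written from his essay, not TextBlob's own code. It generates every string one or two edits away from the misspelling and keeps the candidate that is most frequent in a reference corpus. The tiny corpus is a hypothetical stand-in; a real corrector counts words from a large text collection.

```python
import re
from collections import Counter

# Hypothetical miniature corpus; real correctors count words in a large text.
CORPUS = "I am not in danger. I am the danger. The danger is real."
WORD_COUNTS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away from word."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(words):
    """The subset of words that appear in the corpus."""
    return {w for w in words if w in WORD_COUNTS}


def correction(word):
    """Prefer the word itself, then 1-edit, then 2-edit candidates; pick the most frequent."""
    word = word.lower()
    candidates = known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    return max(candidates, key=lambda w: WORD_COUNTS[w])


print(correction("denger"))  # danger
print(correction("dyangr"))  # danger
```

For a single word, TextBlob exposes its candidates with confidence scores through “Word.spellcheck()”, which returns a list of (word, confidence) pairs.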

### 12. Synsets:

The “synsets” property returns a list of WordNet synsets (sets of synonyms that share one meaning) for a particular word:

In [None]:
word = Word("phone")
word.synsets

[Synset('telephone.n.01'),
 Synset('phone.n.02'),
 Synset('earphone.n.01'),
 Synset('call.v.03')]