# What is Textblob

* Textblob is a Python library for processing textual data. It is used to perform natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, lemmatization, stemming, tokenization, and n-grams.
* It is built on top of NLTK and offers a simpler API, so common tasks take less code; however, it does not provide features such as vectorization or dependency parsing.
* Official Link to Textblob: https://textblob.readthedocs.io/en/dev/
* How to install Textblob: `pip install textblob`



In [10]:
!pip install --upgrade pip

Collecting pip
  Using cached pip-25.2-py3-none-any.whl.metadata (4.7 kB)
Using cached pip-25.2-py3-none-any.whl (1.8 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.3.1
    Uninstalling pip-24.3.1:
      Successfully uninstalled pip-24.3.1
Successfully installed pip-25.2


In [11]:
# Install Textblob
!pip install nltk
!pip install textblob



In [21]:
import nltk
nltk.download('popular') # Fetches commonly used NLTK datasets/models

[nltk_data] Downloading collection 'popular'
[nltk_data]    | 
[nltk_data]    | Downloading package cmudict to
[nltk_data]    |     /Users/jeffreyjackson/nltk_data...
[nltk_data]    |   Package cmudict is already up-to-date!
[nltk_data]    | Downloading package gazetteers to
[nltk_data]    |     /Users/jeffreyjackson/nltk_data...
[nltk_data]    |   Package gazetteers is already up-to-date!
[nltk_data]    | Downloading package genesis to
[nltk_data]    |     /Users/jeffreyjackson/nltk_data...
[nltk_data]    |   Package genesis is already up-to-date!
[nltk_data]    | Downloading package gutenberg to
[nltk_data]    |     /Users/jeffreyjackson/nltk_data...
[nltk_data]    |   Package gutenberg is already up-to-date!
[nltk_data]    | Downloading package inaugural to
[nltk_data]    |     /Users/jeffreyjackson/nltk_data...
[nltk_data]    |   Package inaugural is already up-to-date!
[nltk_data]    | Downloading package movie_reviews to
[nltk_data]    |     /Users/jeffreyjackson/nltk_data...
[nl

True

In [36]:
import nltk
nltk.download('averaged_perceptron_tagger') # This helps with Tagging tasks.

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/jeffreyjackson/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

## Functionalities of Textblob

* Language Detection
* Word Correction
* Word Count
* Phrase Extraction
* POS Tagging
* Tokenization
* Pluralization of words using Textblob
* Lemmatization using Textblob
* N-Grams in Textblob

## Language Detection

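* Language detection is the process of identifying the language of a given text.
* Recent Textblob releases no longer ship their own language detection or translation: the `detect_language()` and `translate()` methods, which called the Google Translate web API, were deprecated and then removed.
* This notebook therefore uses the langdetect library to detect the language and the deep-translator library to translate text from one language to another.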
* langdetect returns ISO 639-1 codes like "en", "es", etc.
* deep-translator uses public endpoints, so it can occasionally be rate-limited. For production or other uses that need reliability, prefer the official Google Cloud Translate API.

In [14]:
!pip install langdetect deep-translator



In [15]:
from langdetect import detect
from deep_translator import GoogleTranslator

text = "Hello Jeffrey, how are you?"

# Detect language
detected_lang = detect(text)
print("Detected Language is:", detected_lang)

# Translate to Spanish
translated = GoogleTranslator(source='auto', target='es').translate(text)
print("Input text in Spanish:", translated)

Detected Language is: en
Input text in Spanish: Hola Jeffrey, ¿cómo estás?


## Spelling Correction

In [20]:
from textblob import TextBlob
text = """ABCD Corporation alays values ttheir employees!!!"""

In [22]:
print(text)

ABCD Corporation alays values ttheir employees!!!


In [24]:
blob = TextBlob(text)

In [25]:
blob.correct()

TextBlob("ABCD Corporation always values their employees!!!")

In [26]:
TextBlob('hasss').correct()

TextBlob("has")

In [29]:
# Note that the correction sometimes fails
TextBlob("ur food is great").correct() # "ur" should be "your"

TextBlob("or food is great")
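
The Textblob documentation describes `correct()` as based on Peter Norvig's "How to Write a Spelling Corrector", which looks for known words a small number of single-character edits away from the input. The plain-Python sketch below illustrates that idea only; it is not Textblob's implementation, and `vocabulary` is a hypothetical toy word list standing in for a real frequency dictionary. It suggests why "ur" can end up as "or": "or" is one edit away, while "your" needs two.

```python
import string

def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

# Hypothetical toy vocabulary; a real corrector ranks candidates by corpus frequency.
vocabulary = {"or", "our", "your", "always", "their", "has"}

# Known words reachable from "ur" with a single edit.
candidates = edits1("ur") & vocabulary
print(sorted(candidates))
print("your" in edits1("ur"))
```

Which of the one-edit candidates wins depends on the word frequencies the corrector was trained on, so short or slang inputs are the most likely to be corrected to the wrong word.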

### Word Count

Word count tells us how often a word or noun phrase occurs in a given text. The cells below use the `word_counts` dictionary of a `TextBlob` object.

In [34]:
!python -m textblob.download_corpora

[nltk_data] Downloading package brown to
[nltk_data]     /Users/jeffreyjackson/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.
[nltk_data] Downloading package punkt_tab to
[nltk_data]     /Users/jeffreyjackson/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/jeffreyjackson/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /Users/jeffreyjackson/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.
[nltk_data] Downloading package conll2000 to
[nltk_data]     /Users/jeffreyjackson/nltk_data...
[nltk_data]   Unzipping corpora/conll2000.zip.
[nltk_data] Downloading package movie_reviews to
[nltk_data]     /Users/jeffreyjackson/nltk_data...
[nltk_data]   Package movie_reviews is already up-to-date!
Finished.


In [35]:
text = "Sentiment Analysis is a process by which we can find the sentiment of a text. Sentiment can be Positive, Negative or Neutral."

In [37]:
blob  = TextBlob(text)

In [38]:
blob.word_counts["analysis"]

1

In [39]:
blob.word_counts["Sentiment"]

0

In [40]:
blob.word_counts["sentiment"]

3

In [41]:
blob.word_counts["Analysis"]

0

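NOTE: `word_counts` converts every word to lowercase before counting, so its keys are all lowercase. That is why looking up "Sentiment" or "Analysis" returns 0, while "sentiment" returns 3 (two of those three occurrences are capitalized in the text). For a lookup that ignores the case of the query, use `blob.words.count("Sentiment")`, which is case-insensitive by default.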
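
The lookup behavior seen above can be reproduced with the standard library. This is only a sketch of the idea (lowercase first, then count); the regular expression is an assumption chosen for this example and is not how Textblob tokenizes.

```python
import re
from collections import Counter

text = ("Sentiment Analysis is a process by which we can find the sentiment "
        "of a text. Sentiment can be Positive, Negative or Neutral.")

# Lowercase the text, then treat runs of letters, digits, or apostrophes as words.
words = re.findall(r"[a-z0-9']+", text.lower())
word_counts = Counter(words)

print(word_counts["sentiment"])  # every spelling was folded into this lowercase key
print(word_counts["Sentiment"])  # a capitalized key was never stored, so the count is 0
```

Lowercasing the query before the lookup, as in `word_counts["Sentiment".lower()]`, avoids the surprising zero.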

### POS Tagging (Part-of-Speech)

In [42]:
from textblob import TextBlob

text = TextBlob("My name is Jeff. I like to read about NLP. I work at ABCD Company.")
print(text.tags)

[('My', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'), ('Jeff', 'NNP'), ('I', 'PRP'), ('like', 'VBP'), ('to', 'TO'), ('read', 'VB'), ('about', 'IN'), ('NLP', 'NNP'), ('I', 'PRP'), ('work', 'VBP'), ('at', 'IN'), ('ABCD', 'NNP'), ('Company', 'NNP')]


In [46]:
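new_tuple = []  # (word, tag) pairs to keep
for i in text.tags:
    print(i)
    # Drop non-third-person present-tense verbs (tag VBP); keep everything else
    if i[1] != 'VBP':
        new_tuple.append(i)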

('My', 'PRP$')
('name', 'NN')
('is', 'VBZ')
('Jeff', 'NNP')
('I', 'PRP')
('like', 'VBP')
('to', 'TO')
('read', 'VB')
('about', 'IN')
('NLP', 'NNP')
('I', 'PRP')
('work', 'VBP')
('at', 'IN')
('ABCD', 'NNP')
('Company', 'NNP')


In [47]:
new_tuple

[('My', 'PRP$'),
 ('name', 'NN'),
 ('is', 'VBZ'),
 ('Jeff', 'NNP'),
 ('I', 'PRP'),
 ('to', 'TO'),
 ('read', 'VB'),
 ('about', 'IN'),
 ('NLP', 'NNP'),
 ('I', 'PRP'),
 ('at', 'IN'),
 ('ABCD', 'NNP'),
 ('Company', 'NNP')]

In [51]:
# Rebuild the sentence from the remaining words (note the leading space)
value = ''
for i in new_tuple:
    value = value + " " + i[0]

In [52]:
value

' My name is Jeff I to read about NLP I at ABCD Company'
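
The filter-and-rejoin steps above can also be written with a list comprehension and `str.join`. In this sketch the `tags` list is copied from the `text.tags` output shown earlier, so the snippet runs without Textblob; the names `kept` and `sentence` are introduced here for illustration.

```python
# (word, tag) pairs copied from the text.tags output shown earlier.
tags = [('My', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'), ('Jeff', 'NNP'),
        ('I', 'PRP'), ('like', 'VBP'), ('to', 'TO'), ('read', 'VB'),
        ('about', 'IN'), ('NLP', 'NNP'), ('I', 'PRP'), ('work', 'VBP'),
        ('at', 'IN'), ('ABCD', 'NNP'), ('Company', 'NNP')]

# Keep every pair whose tag is not VBP.
kept = [(word, tag) for word, tag in tags if tag != 'VBP']

# Join the surviving words with single spaces; join adds no leading space.
sentence = " ".join(word for word, _ in kept)
print(sentence)  # My name is Jeff I to read about NLP I at ABCD Company
```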

### Tokenization
* A corpus (plural: corpora) is a collection of texts or documents.
* Tokenization is the process of breaking a corpus down into smaller units called tokens.
* Tokens are the basic units of text, such as words, characters, or subwords.
* For word-level tokenization, a token is a string of consecutive characters that lies between two spaces, or between a space and a punctuation mark. The token count is the total number of such units in the text; every occurrence counts, however often a word repeats.
* For example, the corpus "Hello Jeffrey, how are you?" has 5 word tokens: "Hello", "Jeffrey", "how", "are", and "you". If punctuation marks are counted as tokens too, the "," and "?" bring the total to 7.
* As another example, if you have the string "abc_123_defg" and split it on the underscore, it is tokenized as "abc", "123", and "defg".
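
The two examples in the list above can be checked with a few lines of standard-library Python. This is a minimal sketch using regular expressions chosen for this illustration; Textblob relies on NLTK tokenizers, which handle many more cases (contractions, abbreviations, and so on).

```python
import re

corpus = "Hello Jeffrey, how are you?"

# Word tokens only: runs of word characters, with punctuation dropped.
word_tokens = re.findall(r"\w+", corpus)

# Words plus punctuation: each non-space, non-word character is its own token.
all_tokens = re.findall(r"\w+|[^\w\s]", corpus)

# Splitting on a custom delimiter is also a form of tokenization.
parts = "abc_123_defg".split("_")

print(len(word_tokens), word_tokens)
print(len(all_tokens), all_tokens)
print(parts)
```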

In [53]:
text = """
R is a comprehensive statistical and graphical programming language, which is growing in popularity among data analysts."""

In [54]:
blob_object = TextBlob(text)

In [55]:
# Word tokenization of the sample corpus
corpus_words = blob_object.words

In [56]:
corpus_words

WordList(['R', 'is', 'a', 'comprehensive', 'statistical', 'and', 'graphical', 'programming', 'language', 'which', 'is', 'growing', 'in', 'popularity', 'among', 'data', 'analysts'])

In [57]:
print(len(corpus_words))

17


In [58]:
corpus_sentences = blob_object.sentences

In [59]:
corpus_sentences

[Sentence("
 R is a comprehensive statistical and graphical programming language, which is growing in popularity among data analysts.")]

In [60]:
print(len(corpus_sentences))

1


### Pluralization of words using Textblob

In [61]:
from textblob import Word
w = Word('Platform')
w.pluralize()

'Platforms'

In [62]:
from textblob import Word
w = Word('Platforms')
w.pluralize()

'Platformss'

In [64]:
blob = TextBlob("Great Learning is a great platform to learn data science.  \n It helps the community through blogs, YouTube, GLA, etc")
for word, pos in blob.tags:
    if pos == 'NN':
        print(word.pluralize())

platforms
sciences
communities
etcs


### Lemmatization using Textblob

In [65]:
blob = TextBlob("Great Learning is a great platform to learn data science.  \n It helps the community through blogs, YouTube, GLA, etc")
words = blob.words

for word in words:
    print("ORIGINAL:", word, "| LEMMA:", word.lemmatize(), "| STEM:", word.stem())

ORIGINAL: Great | LEMMA: Great | STEM: great
ORIGINAL: Learning | LEMMA: Learning | STEM: learn
ORIGINAL: is | LEMMA: is | STEM: is
ORIGINAL: a | LEMMA: a | STEM: a
ORIGINAL: great | LEMMA: great | STEM: great
ORIGINAL: platform | LEMMA: platform | STEM: platform
ORIGINAL: to | LEMMA: to | STEM: to
ORIGINAL: learn | LEMMA: learn | STEM: learn
ORIGINAL: data | LEMMA: data | STEM: data
ORIGINAL: science | LEMMA: science | STEM: scienc
ORIGINAL: It | LEMMA: It | STEM: it
ORIGINAL: helps | LEMMA: help | STEM: help
ORIGINAL: the | LEMMA: the | STEM: the
ORIGINAL: community | LEMMA: community | STEM: commun
ORIGINAL: through | LEMMA: through | STEM: through
ORIGINAL: blogs | LEMMA: blog | STEM: blog
ORIGINAL: YouTube | LEMMA: YouTube | STEM: youtub
ORIGINAL: GLA | LEMMA: GLA | STEM: gla
ORIGINAL: etc | LEMMA: etc | STEM: etc


In [66]:
w = Word("learning")
w.lemmatize("n") # n for noun

'learning'

In [67]:
w = Word("learning")
w.lemmatize("v") # v for verb

'learn'

In [68]:
w = Word("peoples")
w.lemmatize("n") # n for noun

'people'

### N-Grams in Textblob

An n-gram is a sequence of N consecutive tokens. A 2-gram (more commonly called a bigram) is a two-word sequence such as "really good", "not good", or "your homework". A 3-gram (more commonly called a trigram) is a three-word sequence such as "not at all" or "I am happy".
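
To make the definition concrete, here is a small plain-Python sketch that slides a window of size n over a token list. The helper name `ngrams` and the sample sentence are made up for this illustration; it returns tuples, whereas Textblob's `ngrams()` method returns `WordList` objects.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I am not at all happy".split()

bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams)
print(trigrams)
```

A text with k tokens yields k - n + 1 n-grams, which is why the outputs below get one item shorter each time n grows by one.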

In [69]:
blob

TextBlob("Great Learning is a great platform to learn data science.  
 It helps the community through blogs, YouTube, GLA, etc")

In [70]:
blob.ngrams(n=1)

[WordList(['Great']),
 WordList(['Learning']),
 WordList(['is']),
 WordList(['a']),
 WordList(['great']),
 WordList(['platform']),
 WordList(['to']),
 WordList(['learn']),
 WordList(['data']),
 WordList(['science']),
 WordList(['It']),
 WordList(['helps']),
 WordList(['the']),
 WordList(['community']),
 WordList(['through']),
 WordList(['blogs']),
 WordList(['YouTube']),
 WordList(['GLA']),
 WordList(['etc'])]

In [71]:
blob.ngrams(n=2)

[WordList(['Great', 'Learning']),
 WordList(['Learning', 'is']),
 WordList(['is', 'a']),
 WordList(['a', 'great']),
 WordList(['great', 'platform']),
 WordList(['platform', 'to']),
 WordList(['to', 'learn']),
 WordList(['learn', 'data']),
 WordList(['data', 'science']),
 WordList(['science', 'It']),
 WordList(['It', 'helps']),
 WordList(['helps', 'the']),
 WordList(['the', 'community']),
 WordList(['community', 'through']),
 WordList(['through', 'blogs']),
 WordList(['blogs', 'YouTube']),
 WordList(['YouTube', 'GLA']),
 WordList(['GLA', 'etc'])]

In [72]:
blob.ngrams(n=3)

[WordList(['Great', 'Learning', 'is']),
 WordList(['Learning', 'is', 'a']),
 WordList(['is', 'a', 'great']),
 WordList(['a', 'great', 'platform']),
 WordList(['great', 'platform', 'to']),
 WordList(['platform', 'to', 'learn']),
 WordList(['to', 'learn', 'data']),
 WordList(['learn', 'data', 'science']),
 WordList(['data', 'science', 'It']),
 WordList(['science', 'It', 'helps']),
 WordList(['It', 'helps', 'the']),
 WordList(['helps', 'the', 'community']),
 WordList(['the', 'community', 'through']),
 WordList(['community', 'through', 'blogs']),
 WordList(['through', 'blogs', 'YouTube']),
 WordList(['blogs', 'YouTube', 'GLA']),
 WordList(['YouTube', 'GLA', 'etc'])]

In [73]:
blob.ngrams(n=4)

[WordList(['Great', 'Learning', 'is', 'a']),
 WordList(['Learning', 'is', 'a', 'great']),
 WordList(['is', 'a', 'great', 'platform']),
 WordList(['a', 'great', 'platform', 'to']),
 WordList(['great', 'platform', 'to', 'learn']),
 WordList(['platform', 'to', 'learn', 'data']),
 WordList(['to', 'learn', 'data', 'science']),
 WordList(['learn', 'data', 'science', 'It']),
 WordList(['data', 'science', 'It', 'helps']),
 WordList(['science', 'It', 'helps', 'the']),
 WordList(['It', 'helps', 'the', 'community']),
 WordList(['helps', 'the', 'community', 'through']),
 WordList(['the', 'community', 'through', 'blogs']),
 WordList(['community', 'through', 'blogs', 'YouTube']),
 WordList(['through', 'blogs', 'YouTube', 'GLA']),
 WordList(['blogs', 'YouTube', 'GLA', 'etc'])]