# TextBlob

- TextBlob is a Python library used for processing textual data.
- It provides a simple API to access its methods and perform basic NLP tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

## Install and Import TextBlob

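Run **pip install textblob** from the Anaconda prompt or a command prompt. Features such as tagging, noun phrase extraction, and WordNet lookups also need the NLTK corpora, which can be downloaded with **python -m textblob.download_corpora**.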

To import TextBlob in Jupyter, use **from textblob import TextBlob**.

In [1]:
from textblob import TextBlob

In [2]:
text = '''
       Machine learning (ML) is the scientific study of algorithms and statistical models that 
       computer systems use to perform a specific task without using explicit instructions, 
       relying on patterns and inference instead. It is seen as a subset of artificial intelligence.
       '''
text

'\n       Machine learning (ML) is the scientific study of algorithms and statistical models that \n       computer systems use to perform a specific task without using explicit instructions, \n       relying on patterns and inference instead. It is seen as a subset of artificial intelligence.\n       '

In [8]:
tb = TextBlob(text)
tb

TextBlob("
       Machine learning (ML) is the scientific study of algorithms and statistical models that 
       computer systems use to perform a specific task without using explicit instructions, 
       relying on patterns and inference instead. It is seen as a subset of artificial intelligence.
       ")

## Sentence
TextBlob can break a text into sentences through the **sentences** property.

In [18]:
tb.sentences

[Sentence("
        Machine learning (ML) is the scientific study of algorithms and statistical models that 
        computer systems use to perform a specific task without using explicit instructions, 
        relying on patterns and inference instead."),
 Sentence("It is seen as a subset of artificial intelligence.")]
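
To make the idea of sentence splitting concrete, here is a minimal plain-Python sketch. It is a rough heuristic written for illustration, not how TextBlob segments text internally, and the function name `naive_sentences` is hypothetical.

```python
import re

def naive_sentences(text):
    """Split on whitespace that follows '.', '!' or '?' (a rough heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

sample = (
    "Machine learning (ML) is the scientific study of algorithms. "
    "It is seen as a subset of artificial intelligence."
)
sentences = naive_sentences(sample)
for sentence in sentences:
    print(sentence)
```

A heuristic like this breaks on abbreviations such as "Dr. Smith", which is one reason to prefer a library tokenizer over a hand-written rule.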

## Tokenization
Tokenization divides text into a sequence of **tokens**, here **words**. The **words** property returns them with punctuation removed.

In [7]:
tb.words

WordList(['Machine', 'learning', 'ML', 'is', 'the', 'scientific', 'study', 'of', 'algorithms', 'and', 'statistical', 'models', 'that', 'computer', 'systems', 'use', 'to', 'perform', 'a', 'specific', 'task', 'without', 'using', 'explicit', 'instructions', 'relying', 'on', 'patterns', 'and', 'inference', 'instead', 'It', 'is', 'seen', 'as', 'a', 'subset', 'of', 'artificial', 'intelligence'])

## Word Count
The **word_counts** property gives the count of each word in the text. Words are lowercased before counting.

In [15]:
tb.word_counts

defaultdict(int,
            {'machine': 1,
             'learning': 1,
             'ml': 1,
             'is': 2,
             'the': 1,
             'scientific': 1,
             'study': 1,
             'of': 2,
             'algorithms': 1,
             'and': 2,
             'statistical': 1,
             'models': 1,
             'that': 1,
             'computer': 1,
             'systems': 1,
             'use': 1,
             'to': 1,
             'perform': 1,
             'a': 2,
             'specific': 1,
             'task': 1,
             'without': 1,
             'using': 1,
             'explicit': 1,
             'instructions': 1,
             'relying': 1,
             'on': 1,
             'patterns': 1,
             'inference': 1,
             'instead': 1,
             'it': 1,
             'seen': 1,
             'as': 1,
             'subset': 1,
             'artificial': 1,
             'intelligence': 1})
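
The same kind of table can be approximated with the standard library alone. The sketch below assumes a simple regular-expression tokenizer (an assumption for illustration, not TextBlob's own tokenizer) and lowercases the text before counting.

```python
import re
from collections import Counter

text = (
    "Machine learning (ML) is the scientific study of algorithms and "
    "statistical models that computer systems use to perform a specific "
    "task without using explicit instructions, relying on patterns and "
    "inference instead. It is seen as a subset of artificial intelligence."
)

# Lowercase first, then keep runs of letters, digits and apostrophes as tokens.
tokens = re.findall(r"[a-z0-9']+", text.lower())
counts = Counter(tokens)

print(counts["is"], counts["of"], counts["ml"])
print(counts.most_common(4))
```

On this particular text the regex tokens match the words shown above, so the counts line up with the `word_counts` output; on text with hyphens or other punctuation the two approaches may differ.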

In [13]:
tb.np_counts            # returns the count of each noun phrase, not of single nouns

defaultdict(int,
            {'machine': 1,
             'ml': 1,
             'scientific study': 1,
             'statistical models': 1,
             'computer systems': 1,
             'specific task': 1,
             'explicit instructions': 1,
             'artificial intelligence': 1})

## N-grams
An N-gram is a run of n successive words. The **ngrams()** method returns a list of WordList objects, each holding n consecutive words.

In [6]:
txt = TextBlob("Now is better than never.")
txt.ngrams(n=2)

[WordList(['Now', 'is']),
 WordList(['is', 'better']),
 WordList(['better', 'than']),
 WordList(['than', 'never'])]
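
The sliding-window idea behind n-grams can be sketched in a few lines of plain Python. The helper name `ngrams` below is a stand-alone illustration that works on an already tokenized list; it is not TextBlob's implementation.

```python
def ngrams(words, n=3):
    """Return every run of n consecutive words as a list."""
    # When the input has fewer than n words the range is empty, so the result is [].
    return [words[i:i + n] for i in range(len(words) - n + 1)]

words = ["Now", "is", "better", "than", "never"]
bigrams = ngrams(words, n=2)
trigrams = ngrams(words, n=3)

print(bigrams)
print(trigrams)
```

A list of k words yields k - n + 1 n-grams, which is why five words give four bigrams and three trigrams.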

## Converting to Upper and Lowercase
The **upper()** and **lower()** methods convert the whole text to upper or lower case.

In [9]:
tb.upper()

TextBlob("
       MACHINE LEARNING (ML) IS THE SCIENTIFIC STUDY OF ALGORITHMS AND STATISTICAL MODELS THAT 
       COMPUTER SYSTEMS USE TO PERFORM A SPECIFIC TASK WITHOUT USING EXPLICIT INSTRUCTIONS, 
       RELYING ON PATTERNS AND INFERENCE INSTEAD. IT IS SEEN AS A SUBSET OF ARTIFICIAL INTELLIGENCE.
       ")

In [10]:
tb.lower()

TextBlob("
       machine learning (ml) is the scientific study of algorithms and statistical models that 
       computer systems use to perform a specific task without using explicit instructions, 
       relying on patterns and inference instead. it is seen as a subset of artificial intelligence.
       ")

## Noun Phrase Extraction
The **noun_phrases** property returns all noun phrases found in the text.

In [8]:
tb.noun_phrases

WordList(['machine', 'ml', 'scientific study', 'statistical models', 'computer systems', 'specific task', 'explicit instructions', 'artificial intelligence'])

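## Part-of-Speech Tagging
The **tags** property returns the part of speech of each word as (word, tag) pairs. The tags are Penn Treebank codes, for example NN (noun), PRP (pronoun), JJ (adjective), VB (verb), and RB (adverb).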

In [9]:
tb.tags  

[('Machine', 'NN'),
 ('learning', 'NN'),
 ('ML', 'NNP'),
 ('is', 'VBZ'),
 ('the', 'DT'),
 ('scientific', 'JJ'),
 ('study', 'NN'),
 ('of', 'IN'),
 ('algorithms', 'NN'),
 ('and', 'CC'),
 ('statistical', 'JJ'),
 ('models', 'NNS'),
 ('that', 'IN'),
 ('computer', 'NN'),
 ('systems', 'NNS'),
 ('use', 'VBP'),
 ('to', 'TO'),
 ('perform', 'VB'),
 ('a', 'DT'),
 ('specific', 'JJ'),
 ('task', 'NN'),
 ('without', 'IN'),
 ('using', 'VBG'),
 ('explicit', 'JJ'),
 ('instructions', 'NNS'),
 ('relying', 'VBG'),
 ('on', 'IN'),
 ('patterns', 'NNS'),
 ('and', 'CC'),
 ('inference', 'NN'),
 ('instead', 'RB'),
 ('It', 'PRP'),
 ('is', 'VBZ'),
 ('seen', 'VBN'),
 ('as', 'IN'),
 ('a', 'DT'),
 ('subset', 'NN'),
 ('of', 'IN'),
 ('artificial', 'JJ'),
 ('intelligence', 'NN')]
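
Because the result is an ordinary list of (word, tag) pairs, it can be filtered with plain Python. The sketch below copies the first few pairs from the output above into a literal list so that it runs without TextBlob; with a real blob, `tb.tags` would take the place of `tags`.

```python
from collections import defaultdict

# First twelve (word, tag) pairs copied from the output above.
tags = [
    ("Machine", "NN"), ("learning", "NN"), ("ML", "NNP"), ("is", "VBZ"),
    ("the", "DT"), ("scientific", "JJ"), ("study", "NN"), ("of", "IN"),
    ("algorithms", "NN"), ("and", "CC"), ("statistical", "JJ"), ("models", "NNS"),
]

# Noun tags all start with "NN" (NN, NNS, NNP, NNPS); adjective tags start with "JJ".
nouns = [word for word, tag in tags if tag.startswith("NN")]
adjectives = [word for word, tag in tags if tag.startswith("JJ")]

# Group every word under its tag.
by_tag = defaultdict(list)
for word, tag in tags:
    by_tag[tag].append(word)

print(nouns)
print(adjectives)
```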

## Word Inflection and Lemmatization
Inflection is a process of word formation in which characters are added to the base form of a word to express grammatical meanings. Lemmatization goes the other way and reduces an inflected word to its base form (lemma).

In [19]:
tb.words[3:20:2]           # works like list slicing in Python

WordList(['is', 'scientific', 'of', 'and', 'models', 'computer', 'use', 'perform', 'specific'])

In [18]:
tb.words[:20].singularize()       # convert each word to its singular form

WordList(['Machine', 'learning', 'ML', 'is', 'the', 'scientific', 'study', 'of', 'algorithm', 'and', 'statistical', 'model', 'that', 'computer', 'system', 'use', 'to', 'perform', 'a', 'specific'])

In [20]:
tb.words[:20].pluralize()          # convert each word to its plural form, regardless of part of speech

WordList(['Machines', 'learnings', 'MLs', 'iss', 'thes', 'scientifics', 'studies', 'ofs', 'algorithmss', 'ands', 'statisticals', 'modelss', 'those', 'computers', 'systemss', 'uses', 'toes', 'performs', 'some', 'specifics'])
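
The output above contains forms such as 'iss' and 'algorithmss' because every word is pluralized, including verbs, determiners, and words that are already plural. One possible way around this is to pluralize only words tagged as singular nouns. The sketch below is an assumption-laden illustration: `pluralize_singular_nouns` and `naive_plural` are hypothetical helpers, and `naive_plural` is a deliberately crude stand-in for a real pluralizer.

```python
def pluralize_singular_nouns(tagged_words, pluralize):
    """Apply `pluralize` only to words tagged NN (singular common noun)."""
    return [pluralize(word) if tag == "NN" else word for word, tag in tagged_words]

def naive_plural(word):
    # Hypothetical stand-in: just appends "s"; a real pluralizer handles far more cases.
    return word + "s"

tagged = [("a", "DT"), ("specific", "JJ"), ("task", "NN"), ("models", "NNS")]
result = pluralize_singular_nouns(tagged, naive_plural)
print(result)
```

With TextBlob installed, the stand-in could be swapped for something like `lambda w: Word(w).pluralize()` and `tagged` for `tb.tags`.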

In [25]:
from textblob import Word
w = Word("patterns")
w.lemmatize()

'pattern'

In [26]:
w = Word("learning")
w.lemmatize(pos = "v")  # Pass in WordNet part of speech (verb)

'learn'

## WordNet Integration
We can access the synsets for a Word via the **synsets** property or the **get_synsets()** method, which optionally takes a part of speech.

In [4]:
from textblob import Word
from textblob.wordnet import VERB
word = Word("octopus")
word.synsets

[Synset('octopus.n.01'), Synset('octopus.n.02')]

In [11]:
Word("go").get_synsets(pos=VERB)

[Synset('travel.v.01'),
 Synset('go.v.02'),
 Synset('go.v.03'),
 Synset('become.v.01'),
 Synset('go.v.05'),
 Synset('run.v.05'),
 Synset('run.v.03'),
 Synset('proceed.v.04'),
 Synset('go.v.09'),
 Synset('go.v.10'),
 Synset('sound.v.02'),
 Synset('function.v.01'),
 Synset('run_low.v.01'),
 Synset('move.v.13'),
 Synset('survive.v.01'),
 Synset('go.v.16'),
 Synset('die.v.01'),
 Synset('belong.v.03'),
 Synset('go.v.19'),
 Synset('start.v.09'),
 Synset('move.v.15'),
 Synset('go.v.22'),
 Synset('go.v.23'),
 Synset('blend.v.02'),
 Synset('go.v.25'),
 Synset('fit.v.02'),
 Synset('rifle.v.02'),
 Synset('go.v.28'),
 Synset('plump.v.04'),
 Synset('fail.v.04')]

### Word Definition
We can access the definitions of each synset via the **definitions** property or the **define()** method.

In [12]:
Word("application").definitions

['the act of bringing something to bear; using it for a particular purpose',
 'a verbal or written request for assistance or employment or admission to a school',
 'the work of applying something',
 'a program that gives a computer instructions that provide the user with tools to accomplish a task',
 'liquid preparation having a soothing or antiseptic or medicinal action when applied to the skin',
 'a diligent effort',
 'the action of putting something into operation']

In [14]:
Word("go").define(pos = VERB)

['change location; move, travel, or proceed, also metaphorically',
 'follow a procedure or take a course',
 'move away from a place into another direction',
 'enter or assume a certain state or condition',
 'be awarded; be allotted',
 'have a particular form',
 'stretch out over a distance, space, time, or scope; run or extend between two points or beyond a certain point',
 'follow a certain course',
 'be abolished or discarded',
 'be or continue to be in a certain condition',
 'make a certain noise or sound',
 'perform as expected when applied',
 'to be spent or finished',
 'progress by being changed',
 'continue to live through hardship or adversity',
 'pass, fare, or elapse; of a certain state of affairs or action',
 'pass from physical life and lose all bodily attributes and functions necessary to sustain life',
 'be in the right place or situation',
 'be ranked or compare',
 'begin or set in motion',
 "have a turn; make one's move in a game",
 'be contained in',
 'be sounded, play

## Spelling Correction
Use the **correct()** method to attempt spelling correction.

In [16]:
msg = TextBlob('I havv godo speling!')
print(msg.correct())

I have good spelling!


For a Word object, the **Word.spellcheck()** method is used for spelling correction. It returns a list of (word, confidence) tuples with spelling suggestions.

In [17]:
from textblob import Word
w = Word('undarstond')
w.spellcheck()

[('understand', 0.6498422712933754), ('understood', 0.3501577287066246)]
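
Since the suggestions are plain (word, confidence) tuples, choosing one is ordinary list handling. The sketch below reuses the output above as a literal list; the helper `pick_suggestion` and its `threshold` parameter are hypothetical and only illustrate one way to keep the original word when no suggestion is convincing.

```python
# Suggestions copied from the spellcheck() output above.
suggestions = [("understand", 0.6498422712933754), ("understood", 0.3501577287066246)]

# The suggestion with the highest confidence.
best_word, best_conf = max(suggestions, key=lambda pair: pair[1])

# In this output the confidences add up to 1.
total = sum(conf for _, conf in suggestions)

def pick_suggestion(suggestions, original, threshold=0.5):
    """Return the top suggestion if it clears the threshold, else the original word."""
    word, conf = max(suggestions, key=lambda pair: pair[1])
    return word if conf >= threshold else original

print(best_word, round(best_conf, 2))
print(pick_suggestion(suggestions, "undarstond", threshold=0.9))
```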

## Translation and Language Detection
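To translate text from one language to another, pass the text to a TextBlob object and then call the **translate()** method on it. Translation and language detection rely on the Google Translate web service, so they need an internet connection. Both methods were deprecated in TextBlob 0.16.0 and have been removed from recent releases, so the cells in this section only run on older versions.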

In [19]:
tb.translate(to="es")

TextBlob("El aprendizaje automático (ML) es el estudio científico de algoritmos y modelos estadísticos que
       los sistemas informáticos usan para realizar una tarea específica sin usar instrucciones explícitas,
       confiando en patrones e inferencia en su lugar. Es visto como un subconjunto de inteligencia artificial.")

In [20]:
chinese_blob = TextBlob(u"美丽优于丑陋")
chinese_blob.translate(from_lang="zh-CN", to='en')

TextBlob("Beauty is better than ugly")

In [21]:
tb.translate(from_lang = 'en', to = 'hi')

TextBlob("मशीन लर्निंग (एमएल) एल्गोरिदम और सांख्यिकीय मॉडल का वैज्ञानिक अध्ययन है
       कंप्यूटर सिस्टम स्पष्ट निर्देशों का उपयोग किए बिना एक विशिष्ट कार्य करने के लिए उपयोग करते हैं,
       इसके बजाय पैटर्न और अनुमान पर भरोसा करना। इसे कृत्रिम बुद्धिमत्ता के सबसेट के रूप में देखा जाता है।")

**If no source language is specified, TextBlob will attempt to detect the language.**

In [22]:
b = TextBlob(u"بسيط هو أفضل من مجمع")
b.detect_language()

'ar'