# TextBlob: Simplified Text Processing 

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both.

## Features
- Noun phrase extraction
- Part-of-speech tagging
- Sentiment analysis
- Classification (Naive Bayes, Decision Tree)
- Language translation and detection powered by Google Translate
- Tokenization (splitting text into words and sentences)
- Word and phrase frequencies
- Parsing
- n-grams
- Word inflection (pluralization and singularization) and lemmatization
- Spelling correction
- Add new models or languages through extensions
- WordNet integration

## Installation

### Installing/Upgrading from PyPI
     
      pip install textblob
      python -m textblob.download_corpora

This will install TextBlob and download the necessary NLTK corpora. If you need to change the default download directory, set the NLTK_DATA environment variable.
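
If you would rather set the download directory from Python than from the shell, the following is a minimal sketch. The directory name is hypothetical, and the variable has to be set before NLTK (or TextBlob) is imported, because NLTK reads `NLTK_DATA` when it builds its search path at import time.

```python
import os
import tempfile

# Hypothetical target directory; replace it with any writable location.
data_dir = os.path.join(tempfile.gettempdir(), "nltk_data_example")
os.makedirs(data_dir, exist_ok=True)

# Set the variable before `import nltk` or `import textblob` runs.
os.environ["NLTK_DATA"] = data_dir
print(os.environ["NLTK_DATA"])
```

Creating the directory up front matters in this sketch because a download location generally needs to exist and be writable.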

### Downloading the minimum corpora:

If you only intend to use TextBlob’s default models (no model overrides), you can pass the lite argument. This downloads only those corpora needed for basic functionality.

      python -m textblob.download_corpora lite
      
### Installing with Conda

***Note:*** Conda builds are currently available for Mac OSX only.

TextBlob is also available as a conda package. To install with conda, run

     conda install -c https://conda.anaconda.org/sloria textblob
     python -m textblob.download_corpora
     
### Python

TextBlob supports Python >=2.7 or >=3.4.

### Dependencies

TextBlob depends on NLTK 3. NLTK will be installed automatically when you run
     
     pip install textblob
     
     
TextBlob aims to provide access to common text-processing operations through a familiar interface. You can treat **TextBlob** objects as if they were Python strings that learned how to do Natural Language Processing.

### Create a TextBlob

First, install the package and import **TextBlob**.

In [2]:
!pip install textblob

Defaulting to user installation because normal site-packages is not writeable
Collecting textblob
  Downloading textblob-0.17.1-py2.py3-none-any.whl (636 kB)
Installing collected packages: textblob
Successfully installed textblob-0.17.1


In [3]:
from textblob import TextBlob

Let's create our first **TextBlob**.

In [5]:
data = TextBlob("I love Natural Language Processing, not you!")

In [6]:
# word tokenization

data.words

WordList(['I', 'love', 'Natural', 'Language', 'Processing', 'not', 'you'])

In [7]:
data2 = TextBlob("I love Natural Language Processing, not you! . i am hasmukh mer")
data2.sentences


[Sentence("I love Natural Language Processing, not you! ."),
 Sentence("i am hasmukh mer")]

In [8]:
for i in data2.sentences:
    print(i)

I love Natural Language Processing, not you! .
i am hasmukh mer


### Part-of-speech (POS) Tagging

Part-of-speech tags can be accessed through the **tags** property.

In [9]:
data3 = TextBlob("I love Natural Language Processing, not you!")
data3.tags


[('I', 'PRP'),
 ('love', 'VBP'),
 ('Natural', 'JJ'),
 ('Language', 'NNP'),
 ('Processing', 'NNP'),
 ('not', 'RB'),
 ('you', 'PRP')]
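
The tag list is an ordinary list of (word, tag) tuples, so it can be filtered with plain Python. Here is a small sketch that reuses the output above as literal data, so it runs without TextBlob. The `NN` prefix check assumes Penn Treebank-style tags like the ones shown in that output.

```python
# Tags copied from the output above.
tags = [
    ('I', 'PRP'),
    ('love', 'VBP'),
    ('Natural', 'JJ'),
    ('Language', 'NNP'),
    ('Processing', 'NNP'),
    ('not', 'RB'),
    ('you', 'PRP'),
]

# Keep only the words whose tag starts with "NN" (nouns and proper nouns).
nouns = [word for word, tag in tags if tag.startswith('NN')]
print(nouns)  # ['Language', 'Processing']
```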

### Noun Phrase Extraction

Similarly, noun phrases are accessed through the **noun_phrases** property.

In [12]:
import nltk
nltk.download('brown')  # Brown corpus, required for noun phrase extraction
data4 = TextBlob("I am hasmukh mer and working as a machine learning engineer")
data4.noun_phrases


[nltk_data] Downloading package brown to /home/tech/nltk_data...
[nltk_data]   Package brown is already up-to-date!


WordList(['hasmukh mer', 'machine learning engineer'])

### Sentiment Analysis

The sentiment property returns a named tuple of the form Sentiment(polarity, subjectivity). The polarity score is a float within the range [-1.0, 1.0]. The subjectivity is a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective.

In [15]:
data5 = TextBlob("i like very much to learn new things everyday")
data5.sentiment

Sentiment(polarity=0.06545454545454545, subjectivity=0.4381818181818182)

In [14]:
data6 = TextBlob("i am very happy")
data6.sentiment

Sentiment(polarity=1.0, subjectivity=1.0)
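
Downstream code often needs a coarse label rather than a raw score. The sketch below shows one way to derive it. The `label_sentiment` helper and its 0.1 threshold are hypothetical choices for illustration, and the `Sentiment` namedtuple is a stand-in with the same field names so the example runs without TextBlob.

```python
from collections import namedtuple

# Stand-in with the same field names as the tuple returned by `.sentiment`.
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])


def label_sentiment(sentiment, threshold=0.1):
    """Map a polarity score to a coarse label; the threshold is arbitrary."""
    if sentiment.polarity > threshold:
        return "positive"
    if sentiment.polarity < -threshold:
        return "negative"
    return "neutral"


# Scores copied from the two outputs above.
print(label_sentiment(Sentiment(polarity=1.0, subjectivity=1.0)))  # positive
print(label_sentiment(Sentiment(polarity=0.06545454545454545,
                                subjectivity=0.4381818181818182)))  # neutral
```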

### Word Inflection and Lemmatization

Each word in **TextBlob.words** or **Sentence.words** is a **Word** object (a subclass of unicode, which is `str` in Python 3) with useful methods, e.g. for word inflection.

In [16]:
data7 = TextBlob('Use 4 spaces per indentation level')

data7.words

WordList(['Use', '4', 'spaces', 'per', 'indentation', 'level'])

In [18]:
for i in data7.words:
    print(i.lemmatize())

Use
4
space
per
indentation
level


In [19]:
data7.words[2].singularize()


'space'

In [20]:
data7.words[0].pluralize()


'Uses'

Word objects also integrate with WordNet. You can access the definitions for each synset via the **definitions** property or the **define()** method, which can also take an optional part-of-speech (pos) argument.

In [23]:
from textblob import Word
# Word("machine learning").definitions
Word("height").definitions



['the vertical dimension of extension; distance from the base of something to the top',
 'the highest level or degree attainable; the highest stage of development',
 '(of a standing person) the distance from head to foot',
 "elevation especially above sea level or above the earth's surface"]

### Spelling Correction

Use the **correct()** method to attempt spelling correction.

In [26]:
g = TextBlob('can you pronounce machina learnig engineer?')
print(g.correct())

can you pronounce machine learning engineer?


Word objects have a **Word.spellcheck()** method that returns a list of (word, confidence) tuples with spelling suggestions.

In [27]:
from textblob import Word
k = Word('machina')
k.spellcheck()

[('machine', 0.975), ('gachina', 0.025)]

This spelling correction is based on Peter Norvig's "How to Write a Spelling Corrector", as implemented in the pattern library. It is about 70% accurate.
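
To give a feel for the idea behind that approach, here is a simplified plain-Python sketch. It is not TextBlob's or pattern's actual code: the frequency table is a made-up toy, and only candidates one edit away are considered.

```python
from collections import Counter

# Toy frequency table, made up for illustration; a real corrector uses a large corpus.
WORD_COUNTS = Counter({"machine": 5, "learning": 4, "engineer": 2, "marching": 1})
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Return the most frequent known word within one edit, or the word itself."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    return max(candidates, key=WORD_COUNTS.get) if candidates else word


print(correct("machina"))  # machine
print(correct("learnig"))  # learning
```

Unknown words with no known neighbour are returned unchanged, which is one reason a corrector of this kind cannot be fully accurate.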

### Get Word and Noun Phrase Frequencies

There are two ways to get the frequency of a word or noun phrase in a **TextBlob**.

The first one is through the word_counts dictionary.

In [28]:
sent = TextBlob('She sales sea shells at the sea shore.')

sent.word_counts['sea']

2

In [29]:
sent2 = TextBlob('She sales sea shells at the sea shore.')

sent2.word_counts['Sea']

0

If you access the frequencies this way, the counts are computed over lowercased words, so look words up in lowercase (a key such as 'Sea' returns 0, as shown above). Words that are not found have a frequency of 0.

The second way is to use the count() method.

In [32]:
sent2 = TextBlob('She sales sea shells at the sea shore.')
sent2.words.count('sea')

2

In [33]:
sent3 = TextBlob('She sales sea shells at the sea shore.')
sent3.words.count('Sea')


2

You can specify whether or not the search should be case-sensitive (default is False).

In [34]:
sent4 = TextBlob('She sales sea shells at the sea shore.')

sent4.words.count('Sea', case_sensitive=True) 

0

In [35]:
sent4 = TextBlob('She sales sea shells at the sea shore.')

sent4.words.count('Sea', case_sensitive=False) 

2
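
The difference between the two lookups can be reproduced in plain Python. The sketch below only imitates the behaviour seen in the outputs above, with a simple regex tokenizer assumed in place of TextBlob's own. The `count` helper is a hypothetical stand-in for **words.count()**.

```python
import re
from collections import Counter

text = "She sales sea shells at the sea shore."
words = re.findall(r"\w+", text)

# Dictionary-style lookup over lowercased words.
word_counts = Counter(word.lower() for word in words)
print(word_counts["sea"])  # 2
print(word_counts["Sea"])  # 0, because only lowercase keys exist


def count(words, target, case_sensitive=False):
    """Count occurrences of target, ignoring case unless case_sensitive is True."""
    if case_sensitive:
        return sum(1 for word in words if word == target)
    return sum(1 for word in words if word.lower() == target.lower())


print(count(words, "Sea"))                       # 2
print(count(words, "Sea", case_sensitive=True))  # 0
```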

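### Translation and Language Detection

TextBlobs can be translated between languages. Translation is powered by Google Translate, so it needs a network connection. The cells below were run with textblob 0.17.1; **translate()** was deprecated in 0.16.0 and is not available in newer releases.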

In [49]:
trns = TextBlob('I am hasmukh mer a machine learning engineer')
print(trns)
trns.translate(from_lang="en", to='hi')  # the source text is English

I am hasmukh mer a machine learning engineer


TextBlob("मैं हसमुख मेर एक मशीन लर्निंग इंजीनियर हूं")

In [50]:
trns.translate(from_lang="en", to='zh-CN')  # the source text is English


TextBlob("我是一名机器学习工程师")

In [51]:
chinese_blob = TextBlob("我是一名机器学习工程师")
chinese_blob.translate(from_lang="zh-CN", to='en')

TextBlob("I am a machine learning engineer")

### TextBlobs Are Like Python Strings!

You can use Python’s substring syntax.

In [56]:
data8=TextBlob("hasmukh mer a machine learning engineer")
data8

TextBlob("hasmukh mer a machine learning engineer")

In [57]:
data8[:9]

TextBlob("hasmukh m")

In [58]:
data8.find("mer")

8


You can make comparisons between TextBlobs and strings.

In [61]:
a_blob = TextBlob('aple')
s_blob = TextBlob('samsumg')

a_blob < s_blob

True

You can concatenate and interpolate TextBlobs and strings.

In [62]:
a_blob + ' and ' + s_blob

TextBlob("aple and samsumg")

In [63]:
"{0} and {1}".format(a_blob,s_blob)

'aple and samsumg'

### n-grams

The **TextBlob.ngrams()** method returns a list of **WordList** objects, each holding n successive words.

In [64]:
blob = TextBlob("Now is better than never.")
blob.ngrams(n=3)

[WordList(['Now', 'is', 'better']),
 WordList(['is', 'better', 'than']),
 WordList(['better', 'than', 'never'])]
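
The sliding-window idea behind n-grams is easy to sketch in plain Python. This is an illustration under a simple regex tokenizer, not TextBlob's implementation, and it returns plain lists rather than WordList objects.

```python
import re


def ngrams(text, n=3):
    """Return every run of n consecutive words in text as a list of lists."""
    words = re.findall(r"\w+", text)
    return [words[i:i + n] for i in range(len(words) - n + 1)]


print(ngrams("Now is better than never.", n=3))
# [['Now', 'is', 'better'], ['is', 'better', 'than'], ['better', 'than', 'never']]
```

A text with fewer than n words yields an empty list.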

### Get Start and End Indices of Sentences

Use sentence.start and sentence.end to get the indices where a sentence starts and ends within a **TextBlob**.

In [69]:
sent5 = TextBlob("i am hasmukh mer. machine learning engineer")
for k in sent5.sentences:
    print("************")
    print(k)
    print("---- Starts at index {}, Ends at index {}".format(k.start, k.end))

************
i am hasmukh mer.
---- Starts at index 0, Ends at index 17
************
machine learning engineer
---- Starts at index 18, Ends at index 43
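
The reported indices work as ordinary Python slice bounds on the raw string, with the end index exclusive. A quick check, using the index pairs copied from the output above so that it runs without TextBlob:

```python
text = "i am hasmukh mer. machine learning engineer"

# (start, end) pairs copied from the output above.
spans = [(0, 17), (18, 43)]

for start, end in spans:
    print(repr(text[start:end]))
# 'i am hasmukh mer.'
# 'machine learning engineer'
```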


# Let's start building the Text Classification system

The __textblob.classifiers__ module makes it simple to create custom classifiers.

As an example, let’s create a custom sentiment analyzer.

## Loading Data and Creating a Classifier

First we’ll create some training and test data.

In [70]:
train = [
     ('I love this sandwich.', 'pos'),
     ('this is an amazing place!', 'pos'),
     ('I feel very good about these beers.', 'pos'),
     ('this is my best work.', 'pos'),
     ("what an awesome view", 'pos'),
     ('I do not like this restaurant', 'neg'),
     ('I am tired of this stuff.', 'neg'),
     ("I can't deal with this", 'neg'),
     ('he is my sworn enemy!', 'neg'),
     ('my boss is horrible.', 'neg')
]

test = [
     ('the beer was good.', 'pos'),
     ('I do not enjoy my job', 'neg'),
     ("I ain't feeling dandy today.", 'neg'),
     ("I feel amazing!", 'pos'),
     ('Gary is a friend of mine.', 'pos'),
     ("I can't believe I'm doing this.", 'neg')
]

Now we’ll create a Naive Bayes classifier, passing the training data into the constructor.

In [71]:
from textblob.classifiers import NaiveBayesClassifier
cl = NaiveBayesClassifier(train)

### Loading Data from Files

You can also load data from common file formats including CSV, JSON, and TSV.

CSV files should be formatted like so:

      I love this sandwich.,pos
      This is an amazing place!,pos
      I do not like this restaurant,neg
      
JSON files should be formatted like so:

      [
          {"text": "I love this sandwich.", "label": "pos"},
          {"text": "This is an amazing place!", "label": "pos"},
          {"text": "I do not like this restaurant", "label": "neg"}
      ]

You can then pass the opened file into the constructor.

In [73]:
# with open('train.json', 'r') as fp:
#     cl = NaiveBayesClassifier(fp, format="json")
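
If your training data starts out as a Python list, a file in the JSON layout shown above can be produced with the standard library. This is a sketch: the `train.json` name follows the commented cell above, and writing it to the temporary directory is an arbitrary choice.

```python
import json
import os
import tempfile

records = [
    {"text": "I love this sandwich.", "label": "pos"},
    {"text": "This is an amazing place!", "label": "pos"},
    {"text": "I do not like this restaurant", "label": "neg"},
]

# Write the records as a JSON array of {"text", "label"} objects.
path = os.path.join(tempfile.gettempdir(), "train.json")
with open(path, "w") as fp:
    json.dump(records, fp, indent=4)

# Read the file back to confirm it round-trips.
with open(path) as fp:
    loaded = json.load(fp)
print(loaded == records)  # True
```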

## Classifying Text

Call the *classify(text)* method to use the classifier.

In [74]:
cl.classify("This is an amazing library!")

'pos'

You can get the label probability distribution with the *prob_classify(text)* method.

In [76]:
prob_dist = cl.prob_classify("I am suffering from cough and cold.")
prob_dist.max()

'neg'

In [77]:
round(prob_dist.prob("neg"), 2)


0.81

In [78]:
round(prob_dist.prob("pos"), 2)

0.19

## Classifying TextBlobs

Another way to classify text is to pass a classifier into the constructor of TextBlob and call its *classify()* method.

In [81]:
from textblob import TextBlob
blob = TextBlob("Alcohol is good. But the hangover is horrible.", classifier=cl)
blob.classify()

'pos'

The advantage of this approach is that you can classify sentences within a **TextBlob**.

In [82]:
for b in blob.sentences:
    print(b)
    print(b.classify())

Alcohol is good.
pos
But the hangover is horrible.
neg


## Evaluating Classifiers

To compute the accuracy on our test set, use the **accuracy(test_data)** method.

In [83]:
cl.accuracy(test)

0.8333333333333334
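
Accuracy here is the fraction of test items whose predicted label matches the true label. The sketch below spells that out in plain Python. The `predicted` list is hypothetical, chosen so that exactly one of the six items is wrong, and is not the classifier's actual output.

```python
# True labels of the six test items defined earlier.
gold = ["pos", "neg", "neg", "pos", "pos", "neg"]

# Hypothetical predictions with one mistake (the fifth item).
predicted = ["pos", "neg", "neg", "pos", "neg", "neg"]


def accuracy(predicted, gold):
    """Fraction of positions where the predicted label equals the gold label."""
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


print(accuracy(predicted, gold))  # 0.8333333333333334
```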

Use the show_informative_features() method to display a listing of the most informative features.

In [84]:
cl.show_informative_features(5)

Most Informative Features
            contains(my) = True              neg : pos    =      1.7 : 1.0
            contains(an) = False             neg : pos    =      1.6 : 1.0
             contains(I) = False             pos : neg    =      1.4 : 1.0
             contains(I) = True              neg : pos    =      1.4 : 1.0
            contains(my) = False             pos : neg    =      1.3 : 1.0
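
The `contains(...)` entries in this listing are word-presence features: for each vocabulary word, whether the document contains it. The following is a rough plain-Python sketch of how such features can be built. The regex tokenizer and the two-sentence vocabulary are simplifying assumptions, not TextBlob's own code.

```python
import re

# Two sentences taken from the training data above.
train_sentences = ["I love this sandwich.", "my boss is horrible."]

# Vocabulary: every distinct token seen in the training sentences.
vocabulary = set(
    word for sentence in train_sentences for word in re.findall(r"[\w']+", sentence)
)


def extract_features(document):
    """Map each vocabulary word to True/False depending on its presence."""
    tokens = set(re.findall(r"[\w']+", document))
    return {"contains({0})".format(word): (word in tokens) for word in sorted(vocabulary)}


features = extract_features("my sandwich is cold")
print(features["contains(my)"])    # True
print(features["contains(love)"])  # False
```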


## Updating Classifiers with New Data

Use the update(new_data) method to update a classifier with new training data.


In [85]:
new_data = [('She is my best friend.', 'pos'),
           ("I'm happy to have a new friend.", 'pos'),
           ("Stay thirsty, my friend.", 'pos'),             
           ("He ain't from around here.", 'neg')]

cl.update(new_data)

True

In [86]:
cl.accuracy(test)

1.0