<a href="https://colab.research.google.com/github/jay05Hawk/TextBlob/blob/main/TextBlob.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# $\color{red}{\text{TextBlob:}}$ Simplified Text Processing
TextBlob is a Python library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both.

$\color{red}{\text{Features:}}$
- Noun phrase extraction
- Part-of-speech tagging
- Sentiment analysis
- Classification (Naive Bayes, Decision Tree)
- Language translation and detection powered by Google Translate
- Tokenization (splitting text into words and sentences)
- Word and phrase frequencies
- Parsing
- n-grams
- Word inflection (pluralization and singularization) and lemmatization
- Spelling correction
- Add new models or languages through extensions
- WordNet integration

In [None]:
from textblob import TextBlob

# $\color{red}{\text{Let's create our first TextBlob}}$

In [None]:
wiki = TextBlob("I love Natural Language Processing, not you!")

### $\color{red}{\text{Part-of-Speech (POS) Tagging}}$

Part-of-speech tags can be accessed through the **tags** property.

In [None]:
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
wiki.tags

In [None]:
!pip install TextBlob  # prints "Requirement already satisfied" if TextBlob is already installed
from textblob import TextBlob, Word, Blobber

In [None]:
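import nltk

# Downloads every NLTK data package (a large download). A lighter option is
# `!python -m textblob.download_corpora`, which fetches only the corpora TextBlob uses.
nltk.download('all')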

### $\color{red}{\text{Noun Phrase Extraction}}$

Similarly, noun phrases are accessed through the **noun_phrases** property.

In [None]:
wiki.noun_phrases

### $\color{red}{\text{Sentiment Analysis}}$ 

The **sentiment** property returns a named tuple of the form `Sentiment(polarity, subjectivity)`. The polarity score is a float within the range [-1.0, 1.0], where -1.0 is the most negative and 1.0 the most positive. The subjectivity is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.
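
Because the result is a named tuple, its fields can be read by name (`.polarity`) or by position. The sketch below mirrors that shape with a plain `collections.namedtuple` and adds a hypothetical `label_polarity` helper that buckets the polarity score into a coarse label. The ±0.1 neutral band is an arbitrary assumption for illustration, not something TextBlob defines.

```python
from collections import namedtuple

# Same field layout as TextBlob's Sentiment result: (polarity, subjectivity)
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

def label_polarity(sentiment, neutral_band=0.1):
    """Hypothetical helper: map a polarity in [-1.0, 1.0] to a coarse label."""
    if not -1.0 <= sentiment.polarity <= 1.0:
        raise ValueError("polarity must lie in [-1.0, 1.0]")
    if sentiment.polarity > neutral_band:
        return "positive"
    if sentiment.polarity < -neutral_band:
        return "negative"
    return "neutral"

s = Sentiment(polarity=0.4, subjectivity=0.8)
print(s.subjectivity, label_polarity(s))
```

With a real blob you could pass `testimonial.sentiment` straight to the helper, since it exposes the same `polarity` attribute.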

In [None]:
testimonial = TextBlob("Textblob is amazingly simple to use. What great fun!")
testimonial.sentiment

In [None]:
testimonial.sentiment.subjectivity

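### $\color{red}{\text{Tokenization}}$

TextBlobs can be broken into words (the **words** property) or sentences (the **sentences** property).
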
In [None]:
zen = TextBlob("Data is a new fuel. "
               "Explicit is better than implicit. "
               "Simple is better than complex. ")
               
zen.words

In [None]:
zen.sentences

In [None]:
for sentence in zen.sentences:
    print(sentence)
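
TextBlob hands tokenization to NLTK's trained tokenizers (which is why `punkt` is downloaded above). To show what "splitting into words and sentences" means, here is a deliberately naive regex sketch: it splits sentences after `.`, `!` or `?` followed by whitespace and keeps runs of word characters as words. It is an illustration only; unlike a trained tokenizer it would wrongly split abbreviations such as "Mr. Smith".

```python
import re

def naive_sentences(text):
    # Split after '.', '!' or '?' when followed by whitespace; drop empty pieces
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def naive_words(text):
    # Keep runs of word characters (plus an optional apostrophe part); punctuation is dropped
    return re.findall(r"\w+(?:'\w+)?", text)

text = ("Data is a new fuel. "
        "Explicit is better than implicit. "
        "Simple is better than complex. ")

print(naive_sentences(text))
print(naive_words(text))
```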

## $\color{red}{\text{Word Inflection and Lemmatization}}$

Each word in **TextBlob.words** or **Sentence.words** is a **Word** object (a subclass of `str`, so ordinary string methods still work) with useful extra methods, e.g. for word inflection.
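
Subclassing `str` is what lets a word behave like a normal string while also carrying NLP methods. The sketch below shows that design with a hypothetical `ToyWord` class and a few regular English suffix rules. These rules are an assumption for illustration, not TextBlob's inflection engine, and they ignore irregular nouns such as "mouse"/"mice".

```python
class ToyWord(str):
    """Hypothetical str subclass with naive, regular-only inflection rules."""

    def pluralize(self):
        if self.endswith(("s", "x", "z", "ch", "sh")):
            return ToyWord(self + "es")
        if self.endswith("y") and len(self) > 1 and self[-2] not in "aeiou":
            return ToyWord(self[:-1] + "ies")
        return ToyWord(self + "s")

    def singularize(self):
        if self.endswith("ies") and len(self) > 3:
            return ToyWord(self[:-3] + "y")
        if self.endswith(("ses", "xes", "zes", "ches", "shes")):
            return ToyWord(self[:-2])
        if self.endswith("s") and not self.endswith("ss"):
            return ToyWord(self[:-1])
        return ToyWord(self)

w = ToyWord("spaces")
# Inflection methods plus inherited string behaviour
print(w.singularize(), ToyWord("Use").pluralize(), w.upper(), isinstance(w, str))
```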

In [None]:
sentence = TextBlob('Use 4 spaces per indentation level')

sentence.words

In [None]:
sentence.words[2].singularize()

In [None]:
sentence.words[0].pluralize()

Words can be lemmatized just by calling the **$\color{red}{\text{lemmatize}}$** method.

In [None]:
from textblob import Word

q = Word('lions')
q.lemmatize()

In [None]:
q = Word("went")
q.lemmatize("v")  # pass in the WordNet part of speech (verb)
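
Why does "went" need `"v"`? A lemmatizer looks a word up under a specific part of speech, and TextBlob's `lemmatize()` treats the word as a noun when no part of speech is given. The sketch below reproduces that behaviour with a hypothetical exception table standing in for WordNet's data; `toy_lemmatize` and `IRREGULAR` are illustrative names, not TextBlob's implementation.

```python
# Hypothetical exception table keyed by (word, WordNet-style POS letter)
IRREGULAR = {
    ("went", "v"): "go",
    ("geese", "n"): "goose",
    ("better", "a"): "good",
}

def toy_lemmatize(word, pos="n"):
    """Return a lemma for word; pos is assumed to be one of 'n', 'v', 'a', 'r'."""
    word = word.lower()
    if (word, pos) in IRREGULAR:
        return IRREGULAR[(word, pos)]
    if pos == "n" and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]  # naive regular-plural rule
    return word

print(toy_lemmatize("lions"), toy_lemmatize("went"), toy_lemmatize("went", "v"))
```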

## $\color{red}{\text{WordNet Integration}}$

You can access the synsets for a **Word** via the **synsets** property or the **get_synsets()** method, optionally passing in a part of speech.

### WordNet

WordNet is a lexical database (a dictionary-like resource) for the English language, widely used in natural language processing.

### Synset

A synset (synonym set) is WordNet's basic unit, and NLTK's WordNet interface exposes it as `Synset` objects. Each synset groups synonymous words that express the same concept. Some words have only one synset and some have several.
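
Synset names follow the pattern `lemma.pos.nn` (for example `octopus.n.02`: the lemma, a part-of-speech letter, and a sense number). The sketch below parses that name format and filters a made-up miniature database by part of speech, which is the idea behind `get_synsets(pos=VERB)`; NLTK's `VERB` constant is the letter `'v'`. `TOY_DB` and its lemma lists are invented for illustration and are not real WordNet content.

```python
from collections import namedtuple

ToySynset = namedtuple("ToySynset", ["name", "lemma", "pos", "sense", "lemma_names"])

def parse_synset_name(name):
    """Split a WordNet-style name such as 'octopus.n.02' into (lemma, pos, sense)."""
    lemma, pos, sense = name.rsplit(".", 2)
    return lemma, pos, int(sense)

# Made-up miniature database: synset name -> synonymous lemma names
TOY_DB = {
    "hack.v.01": ["hack", "chop"],
    "hack.n.01": ["hack", "cab"],
    "goat.n.01": ["goat", "caprine_animal"],
}

def get_synsets(word, pos=None):
    """Return every synset whose lemma names include word, optionally filtered by pos."""
    found = []
    for name, lemma_names in TOY_DB.items():
        lemma, syn_pos, sense = parse_synset_name(name)
        if word in lemma_names and (pos is None or syn_pos == pos):
            found.append(ToySynset(name, lemma, syn_pos, sense, lemma_names))
    return found

print([s.name for s in get_synsets("hack")])
print([s.name for s in get_synsets("hack", pos="v")])
```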

In [None]:
from textblob import Word
from textblob.wordnet import VERB
word = Word("goat")
word.synsets

In [None]:
Word("hack").get_synsets(pos=VERB)

You can access the definitions for each synset via the **definitions** property or the **define()** method, which can also take an optional part-of-speech (`pos`) argument.

In [None]:
Word("length").definitions

In [None]:
Word("corpus").definitions

You can also create $\color{blue}{\text{synsets}}$ directly.

In [None]:
from textblob.wordnet import Synset
octopus = Synset('octopus.n.02')
shrimp = Synset('shrimp.n.03')
octopus.path_similarity(shrimp)
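
`path_similarity` scores two senses by the shortest path connecting them in the is-a (hypernym/hyponym) hierarchy; NLTK computes it as 1 / (shortest_path_length + 1), so identical senses score 1.0 and the score shrinks as the path grows. The sketch below applies that formula with breadth-first search over a tiny hand-made taxonomy. The `HYPERNYM` links are hypothetical and are not WordNet's actual hierarchy, so the number will differ from the real result above.

```python
from collections import deque

# Hypothetical child -> parent (hypernym) links; not WordNet's real hierarchy
HYPERNYM = {
    "octopus": "cephalopod",
    "cephalopod": "mollusk",
    "shrimp": "crustacean",
    "crustacean": "arthropod",
    "mollusk": "invertebrate",
    "arthropod": "invertebrate",
}

def _neighbours(node):
    # Treat is-a links as undirected edges: the parent plus any children
    out = [HYPERNYM[node]] if node in HYPERNYM else []
    out += [child for child, parent in HYPERNYM.items() if parent == node]
    return out

def shortest_path_length(a, b):
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in _neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no connecting path

def path_similarity(a, b):
    dist = shortest_path_length(a, b)
    return None if dist is None else 1.0 / (dist + 1)

print(path_similarity("octopus", "shrimp"))
```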

### $\color{red}{\text{WordLists}}$

A **WordList** is just a Python list with additional methods.

**TextBlob.words** returns a WordList of the words in the text; the whitespace and punctuation between the words are dropped.

In [None]:
animals = TextBlob("cow sheep octopus")
animals.words

In [None]:
animals.words.pluralize() # It'll pluralize the words

### $\color{red}{\text{Spelling Correction}}$ 

Use the **$\color{blue}{\text{correct()}}$** method to attempt spelling correction.

In [None]:
g = TextBlob('can yyou pronounce thankk?')
print(g.correct())

Word objects have a **Word.spellcheck()** method that returns a list of (word, confidence) tuples with spelling suggestions.

In [None]:
from textblob import Word
k = Word('longituode')
k.spellcheck()
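
TextBlob's documentation describes its spelling correction as based on Peter Norvig's "How to Write a Spelling Corrector" (as implemented in the pattern library). The sketch below follows Norvig's idea: generate every string within one or two edits (deletes, transpositions, replacements, insertions), keep those found in a word-frequency table, and rank them by frequency. The tiny `WORD_FREQ` table is made up, and normalizing frequencies into confidences that sum to 1 is an assumption about how to produce (word, confidence) pairs, not necessarily what TextBlob does internally.

```python
import string
from collections import Counter

# Hypothetical toy frequency table; a real corrector uses counts from a large corpus
WORD_FREQ = Counter({"longitude": 5, "longitudes": 1, "you": 50,
                     "thank": 10, "pronounce": 3, "can": 40})

def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def edits2(word):
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}

def known(words):
    return {w for w in words if w in WORD_FREQ}

def spellcheck(word):
    # Prefer the word itself, then 1-edit candidates, then 2-edit candidates
    candidates = known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    total = sum(WORD_FREQ[c] for c in candidates)
    if total == 0:
        return [(word, 0.0)]  # nothing known nearby
    ranked = sorted(candidates, key=lambda c: (-WORD_FREQ[c], c))
    return [(c, WORD_FREQ[c] / total) for c in ranked]

print(spellcheck("longituode"))
```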

### Get Word and Noun Phrase Frequencies

There are two ways to get the frequency of a word or noun phrase in a **TextBlob**.

The first is through the **word_counts** dictionary.

In [None]:
sent = TextBlob('She sells sea shells at the sea shore.')

sent.word_counts['sea']

In [None]:
# The second way is the count() method of the words WordList.
# You can specify whether or not the search should be case-sensitive (default is False).
sent.words.count('Sea', case_sensitive=True)
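
Both approaches boil down to tallying tokens. In the stdlib sketch below, a `Counter` over lower-cased words plays the role of `word_counts` (missing words count as 0), and a hypothetical `ToyWordList` list subclass adds a `count()` with a `case_sensitive` flag, loosely mirroring the WordList method. The regex tokenization is a simplifying assumption.

```python
import re
from collections import Counter

class ToyWordList(list):
    """Hypothetical list subclass with a case-aware count(), loosely mirroring WordList."""

    def count(self, word, case_sensitive=False):
        if case_sensitive:
            return sum(1 for w in self if w == word)
        return sum(1 for w in self if w.lower() == word.lower())

text = "She sells sea shells at the sea shore."
words = ToyWordList(re.findall(r"\w+", text))

# word_counts-style lookup: tallies of lower-cased words; unseen words count as 0
word_counts = Counter(w.lower() for w in words)

print(word_counts["sea"], words.count("Sea"), words.count("Sea", case_sensitive=True))
```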

## $\color{red}{\text{Translation and Language Detection}}$

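TextBlobs can be translated between languages, and their language can be detected. Note that `translate()` and `detect_language()` called the Google Translate web service; they were deprecated in TextBlob 0.16 and removed in later releases, so the cells below only work with older versions of the library.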

In [None]:
blob = TextBlob("My mama always said life was like a box of chocolates.")
blob.translate(from_lang="en", to="hi")

In [None]:
chinese_blob = TextBlob(u"有总比没有好 杰马塔迪")
chinese_blob.translate(from_lang="zh-CN", to='en')

In [None]:
b = TextBlob("bonjour")
b.detect_language()

In [None]:
blob = TextBlob("कुछ नहीं से कुछ भला")

print(blob.detect_language())

## Parsing

Use the **parse()** method to parse the text.

In [None]:
b = TextBlob("And now for something completely different.")
print(b.parse())
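
`parse()` returns one string in which each token is written as `word/POS-tag/chunk-tag/PNP-tag` (the pattern library's shallow-parse format, where PNP marks prepositional noun phrases). A small helper can turn that string back into tuples. The `sample` string below is written in that format for the same sentence; treat it as illustrative rather than guaranteed output, since tags can vary with the parser version.

```python
def split_parse(parsed):
    """Split a 'word/POS/chunk/PNP ...' string into (word, pos, chunk, pnp) tuples."""
    rows = []
    for token in parsed.split():
        # Split from the right so only the last three slashes act as separators
        word, pos, chunk, pnp = token.rsplit("/", 3)
        rows.append((word, pos, chunk, pnp))
    return rows

sample = ("And/CC/O/O now/RB/B-ADVP/O for/IN/B-PP/B-PNP something/NN/B-NP/I-PNP "
          "completely/RB/B-ADJP/O different/JJ/I-ADJP/O ././O/O")

for row in split_parse(sample):
    print(row)
```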

### TextBlobs Are Like Python Strings

You can use Python's substring syntax on a TextBlob.

In [None]:
zen[0:15]

In [None]:
zen.upper()  # TextBlobs support common string methods

In [None]:
zen.find('than')  # 'than' first occurs at index 39 (counting from 0)

In [None]:
# You can make comparisons between TextBlobs and strings.
a_blob = TextBlob('apple')
s_blob = TextBlob('samsung')

a_blob < s_blob

In [None]:
#You can concatenate and interpolate TextBlobs and strings.

a_blob + ' and ' + s_blob

### $\color{red}{\text{N-Grams}}$

The **TextBlob.ngrams()** method returns a list of n-grams, each a **WordList** of n successive words.

In [None]:
blob = TextBlob("Now is better than never.")
blob.ngrams(n=3)
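
An n-gram list is a window of size n slid one word at a time across the word sequence. A stdlib sketch, assuming the words are already tokenized and returning tuples where TextBlob returns WordList objects:

```python
def ngrams(words, n=3):
    """Return every run of n consecutive words as a tuple (empty if n < 1 or too few words)."""
    if n < 1:
        return []
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

words = ["Now", "is", "better", "than", "never"]
print(ngrams(words, n=3))
```

A text of k words yields k - n + 1 n-grams, so five words give three trigrams.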

### Get Start and End Indices of Sentences

Use sentence.start and sentence.end to get the indices where a sentence starts and ends within a **TextBlob.**

In [None]:
for k in zen.sentences:
    print(k)
    print("---- Starts at index {}, Ends at index {}".format(k.start, k.end))
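
One way to obtain such indices is to search for each sentence in the raw text, starting where the previous one ended, and record half-open `[start, end)` spans. The sketch below does that with `str.index`; `sentence_spans` is a hypothetical helper, and it assumes the sentence strings appear verbatim and in order in the text.

```python
def sentence_spans(text, sentences):
    """Return (sentence, start, end) with text[start:end] == sentence."""
    spans = []
    cursor = 0
    for sent in sentences:
        start = text.index(sent, cursor)  # raises ValueError if the sentence is missing
        end = start + len(sent)
        spans.append((sent, start, end))
        cursor = end
    return spans

text = ("Data is a new fuel. "
        "Explicit is better than implicit. "
        "Simple is better than complex. ")
sentences = ["Data is a new fuel.",
             "Explicit is better than implicit.",
             "Simple is better than complex."]

for sent, start, end in sentence_spans(text, sentences):
    print(sent)
    print("---- Starts at index {}, Ends at index {}".format(start, end))
```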

# $\color{red}{\text{Let's build a text classification system}}$

The __textblob.classifiers__ module makes it simple to create custom classifiers.

As an example, let’s create a custom sentiment analyzer.

## Loading Data and Creating a Classifier

First we’ll create some training and test data.

In [None]:
train = [
     ('I love this sandwich.', 'pos'),
     ('this is an amazing place!', 'pos'),
     ('I feel very good about these beers.', 'pos'),
     ('this is my best work.', 'pos'),
     ("what an awesome view", 'pos'),
     ('I do not like this restaurant', 'neg'),
     ('I am tired of this stuff.', 'neg'),
     ("I can't deal with this", 'neg'),
     ('he is my sworn enemy!', 'neg'),
     ('my boss is horrible.', 'neg')
]

test = [
     ('the beer was good.', 'pos'),
     ('I do not enjoy my job', 'neg'),
     ("I ain't feeling dandy today.", 'neg'),
     ("I feel amazing!", 'pos'),
     ('Gary is a friend of mine.', 'pos'),
     ("I can't believe I'm doing this.", 'neg')
]

In [None]:
#Now we’ll create a Naive Bayes classifier, passing the training data into the constructor.
from textblob.classifiers import NaiveBayesClassifier
cl = NaiveBayesClassifier(train)
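
TextBlob's `NaiveBayesClassifier` wraps NLTK's Naive Bayes classifier, fed with `contains(word)` features. To show the underlying idea, here is a from-scratch Bernoulli Naive Bayes sketch: it scores each label by its log prior plus, for every training-vocabulary word, the log probability of that word being present or absent. `ToyBernoulliNB` is hypothetical; it uses add-one (Laplace) smoothing, whereas NLTK defaults to expected likelihood estimation, so its probabilities will not match `cl` exactly.

```python
import math
import re
from collections import defaultdict

def tokenize(text):
    # Simplifying assumption: lower-case letters and apostrophes only
    return set(re.findall(r"[a-z']+", text.lower()))

class ToyBernoulliNB:
    """Hypothetical Naive Bayes over contains(word) True/False features."""

    def __init__(self, train):
        self.vocab = set()
        self.doc_count = defaultdict(int)                       # label -> number of docs
        self.word_docs = defaultdict(lambda: defaultdict(int))  # label -> word -> docs containing it
        for text, label in train:
            tokens = tokenize(text)
            self.vocab |= tokens
            self.doc_count[label] += 1
            for w in tokens:
                self.word_docs[label][w] += 1
        self.total = sum(self.doc_count.values())

    def _log_score(self, tokens, label):
        n = self.doc_count[label]
        score = math.log(n / self.total)  # log prior
        for w in self.vocab:
            p = (self.word_docs[label][w] + 1) / (n + 2)  # P(contains(w)=True | label), add-one smoothed
            score += math.log(p if w in tokens else 1 - p)
        return score

    def prob_classify(self, text):
        tokens = tokenize(text)
        logs = {label: self._log_score(tokens, label) for label in self.doc_count}
        top = max(logs.values())
        exp = {label: math.exp(s - top) for label, s in logs.items()}  # shift by max for stability
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}

    def classify(self, text):
        probs = self.prob_classify(text)
        return max(probs, key=probs.get)

toy = ToyBernoulliNB([("good great", "pos"), ("great fun", "pos"),
                      ("bad awful", "neg"), ("awful boring", "neg")])
print(toy.classify("great"), toy.prob_classify("awful"))
```

Words that never appeared in training are simply ignored, because only training-vocabulary words become features.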

### Loading Data from Files

You can also load data from common file formats including CSV, JSON, and TSV.

CSV files should be formatted like so:

      I love this sandwich.,pos
      This is an amazing place!,pos
      I do not like this restaurant,neg
      
JSON files should be formatted like so:

      [
          {"text": "I love this sandwich.", "label": "pos"},
          {"text": "This is an amazing place!", "label": "pos"},
          {"text": "I do not like this restaurant", "label": "neg"}
      ]

You can then pass the opened file into the constructor.

In [None]:
# with open('train.json', 'r') as fp:
#     cl = NaiveBayesClassifier(fp, format="json")
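
To see that both file layouts describe the same `(text, label)` pairs as the in-memory `train` list, the sketch below writes small CSV and JSON files to a temporary directory and reads them back with the standard library. `load_labeled` is a hypothetical helper, not TextBlob's loader; with TextBlob you would instead pass the open file and a `format` argument, as in the commented cell above. Because `csv.reader` is used, a text containing a comma must be quoted in the CSV file.

```python
import csv
import json
import os
import tempfile

def load_labeled(path, fmt):
    """Read (text, label) pairs from headerless CSV/TSV, or from a JSON list of objects."""
    with open(path, newline="", encoding="utf-8") as fp:
        if fmt == "json":
            return [(row["text"], row["label"]) for row in json.load(fp)]
        delimiter = "\t" if fmt == "tsv" else ","
        return [(row[0], row[1]) for row in csv.reader(fp, delimiter=delimiter) if row]

tmpdir = tempfile.mkdtemp()
csv_path = os.path.join(tmpdir, "train.csv")
json_path = os.path.join(tmpdir, "train.json")

with open(csv_path, "w", newline="", encoding="utf-8") as fp:
    fp.write("I love this sandwich.,pos\n"
             "This is an amazing place!,pos\n"
             "I do not like this restaurant,neg\n")
with open(json_path, "w", encoding="utf-8") as fp:
    json.dump([{"text": "I love this sandwich.", "label": "pos"},
               {"text": "I do not like this restaurant", "label": "neg"}], fp)

print(load_labeled(csv_path, "csv"))
print(load_labeled(json_path, "json"))
```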

In [None]:
# Classifying text: call the classify(text) method to use the classifier.

cl.classify("This is an amazing library!")

In [None]:
#You can get the label probability distribution with the *prob_classify(text)* method.

prob_dist = cl.prob_classify("I am suffering from cough and cold.")
prob_dist.max()

In [None]:
round(prob_dist.prob("neg"), 2)

In [None]:
round(prob_dist.prob("pos"), 2)

## Classifying TextBlobs

Another way to classify text is to pass a classifier into the constructor of TextBlob and call its *classify()* method.

In [None]:
from textblob import TextBlob
blob = TextBlob("A gun is good for safety. But it is dangerous to shoot.", classifier=cl)
blob.classify()

In [None]:
# The advantage of this approach is that you can classify sentences within a TextBlob.

for b in blob.sentences:
    print(b)
    print(b.classify())

## Evaluating Classifiers

To compute the accuracy on our test set, use the **accuracy(test_data)** method.

In [None]:
cl.accuracy(test)

In [None]:
#Use the show_informative_features() method to display a listing of the most informative features.

cl.show_informative_features(5)

## Updating Classifiers with New Data

Use the update(new_data) method to update a classifier with new training data.

In [None]:
new_data = [('She is my best friend.', 'pos'),
           ("I'm happy to have a new friend.", 'pos'),
           ("Stay thirsty, my friend.", 'pos'),             
           ("He ain't from around here.", 'neg')]

cl.update(new_data)

In [None]:
cl.accuracy(test)

## Feature Extractors

By default, the *NaiveBayesClassifier* uses a simple feature extractor that indicates which words in the training set are contained in a document.

For example, the sentence “I love” might have the features contains(love): True or contains(hate): False.
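
A minimal re-creation of that default behaviour: every word seen in the training data becomes a `contains(word)` feature, set to True if the document contains it and False otherwise. TextBlob's own default lives in `textblob.classifiers` as `basic_extractor`; the hypothetical `contains_extractor` below uses lower-casing and regex tokenization as simplifying assumptions. Note that it already takes `train_set` as a second argument, the optional signature described next.

```python
import re

def contains_extractor(document, train_set):
    """Hypothetical contains(word) features over the training vocabulary."""
    vocab = {w for text, _ in train_set for w in re.findall(r"\w+", text.lower())}
    tokens = set(re.findall(r"\w+", document.lower()))
    return {"contains({0})".format(w): (w in tokens) for w in vocab}

train_set = [("I love this", "pos"), ("I hate this", "neg")]
print(sorted(contains_extractor("I love", train_set).items()))
```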

You can override this feature extractor by writing your own. A feature extractor is simply a function with document (the text to extract features from) as the first argument. The function may include a second argument, train_set (the training dataset), if necessary.

The function should return a dictionary of features for document.

For example, let’s create a feature extractor that just uses the first and last words of a document as its features.

In [None]:
def end_word_extractor(document):
    # Use the first and last whitespace-separated tokens as the only features
    tokens = document.split()
    first_word, last_word = tokens[0], tokens[-1]
    feats = {}
    feats["first({0})".format(first_word)] = True
    feats["last({0})".format(last_word)] = True
    return feats

In [None]:
features = end_word_extractor("I love")

In [None]:
assert features == {'last(love)': True, 'first(I)': True}

In [None]:
# We can then use the feature extractor in a classifier by passing it as the feature_extractor argument of the constructor.
# Train on the training set (not the test set) so the test data stays unseen.

cl2 = NaiveBayesClassifier(train, feature_extractor=end_word_extractor)

In [None]:
blob = TextBlob("I'm excited to try my new classifier.", classifier=cl2)
blob.classify()