## NLP using TextBlob

In [None]:
from textblob import TextBlob
import nltk
nltk.download('punkt')
nltk.download('brown')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')

In [None]:
## creating a textblob object
blob = TextBlob('LeBron Raymone James , often referred to mononymously as LeBron, is an American professional basketball player for the Los Angeles Lakers of the NBA. He is often considered the best basketball player in the world ')

## Tokenization

Tokenization refers to dividing text or a sentence into a sequence of tokens, which roughly correspond to “words”.

#### Now, this textblob can be tokenized into sentences and further into words. Let’s look at the code shown below.

In [None]:
blob.sentences

In [None]:
blob.sentences[0]

In [None]:
for word in blob.sentences[1].words:
    print(word)
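To make the idea of tokenization concrete, here is a minimal plain-Python sketch that splits text into sentences and then into words using regular expressions. It is not how TextBlob tokenizes (TextBlob relies on NLTK tokenizers); the helper names `split_sentences` and `split_words` are made up for this illustration, and the naive sentence rule will break on abbreviations such as "Dr.".

```python
import re

def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace (naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def split_words(sentence):
    """Return runs of letters, digits and apostrophes, dropping punctuation."""
    return re.findall(r"[A-Za-z0-9']+", sentence)

sample = "LeBron plays for the Lakers. He is often considered the best player."
sentences = split_sentences(sample)
words = split_words(sentences[1])
print(sentences)
print(words)
```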

## Noun phrase extraction
Noun phrase extraction is particularly useful when you want to analyze the “who” in a sentence.

In [None]:
for phrase in blob.noun_phrases:
    print(phrase)

As we can see, the extracted phrases are not entirely correct, which is to be expected from an automated extractor.

## POS tagging
In simple words, POS (part-of-speech) tagging tells us whether a word is a noun, an adjective, a verb, etc. It can be seen as a more complete version of noun phrase extraction, where we want to find all the parts of speech in a sentence.
 
For instance:

NNS: noun, plural, e.g. 'desks'

NNP: proper noun, singular, e.g. 'Harrison'

NNPS: proper noun, plural, e.g. 'Americans'

PDT: predeterminer, e.g. 'all' in 'all the kids'

POS: possessive ending, e.g. the 's in "parent's"

PRP: personal pronoun, e.g. I, he, she

PRP$: possessive pronoun, e.g. my, his, her

RB: adverb, e.g. very, silently

and so on!

In [None]:
for word, tag in blob.tags:
    print(word, tag)
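Once a text is tagged, it is often handy to group the words by their tag. The sketch below does this on a hand-written list of (word, tag) pairs in the same shape as `blob.tags`; the pairs are hypothetical and were not produced by running the tagger.

```python
from collections import defaultdict

# Hypothetical (word, tag) pairs in the same shape as `blob.tags`.
tagged = [
    ("LeBron", "NNP"), ("is", "VBZ"), ("an", "DT"), ("American", "JJ"),
    ("professional", "JJ"), ("basketball", "NN"), ("player", "NN"),
]

# Map each tag to the list of words that carry it, keeping the original order.
words_by_tag = defaultdict(list)
for word, tag in tagged:
    words_by_tag[tag].append(word)

for tag, words in words_by_tag.items():
    print(tag, words)
```

With real data, replacing `tagged` by `blob.tags` should give the same kind of grouping.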

## Sentiment Analysis

Sentiment analysis is the process of determining the attitude or emotion of the writer, i.e., whether it is positive, negative or neutral.

The sentiment property of a textblob returns two values: polarity and subjectivity.

Polarity is a float in the range [-1, 1], where 1 means a positive statement and -1 means a negative statement.

Subjective sentences generally refer to personal opinion, emotion or judgment, whereas objective sentences refer to factual information. Subjectivity is also a float, in the range [0, 1], with 0 being fully objective and 1 being extremely subjective.


In [None]:
print (blob,'\n')
print(blob.sentiment)

In [None]:
# A few examples !

neg1=TextBlob("It is such a sad , gloomy day today.")
obj1=TextBlob("The sun rises in the east and sets in the west.")
neg2=TextBlob("The Supreme court came down heavily on the government over its ineffective policies.")
print(neg1)
print(neg1.sentiment,'\n')
print(obj1)
print(obj1.sentiment,'\n')
print(neg2)
print(neg2.sentiment)
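To build some intuition for where such scores can come from, here is a minimal lexicon-based sketch: each known word carries a (polarity, subjectivity) pair and the text score is the average over the words found. TextBlob's default analyzer is also lexicon-based as far as we know, but it additionally handles negation and intensifiers, which this sketch ignores. The lexicon entries and the `score` helper below are invented for illustration.

```python
# Hypothetical mini-lexicon: word -> (polarity, subjectivity). Values are made up.
LEXICON = {
    "sad": (-0.5, 1.0),
    "gloomy": (-0.6, 0.9),
    "good": (0.7, 0.6),
    "lovely": (0.5, 0.75),
}

def score(text):
    """Average the scores of the words found in the lexicon; (0.0, 0.0) if none match."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    if not hits:
        return 0.0, 0.0
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return polarity, subjectivity

negative = score("It is such a sad gloomy day today")
neutral = score("The sun rises in the east")
print(negative)
print(neutral)
```

A sentence with no lexicon words, like the second one, comes out as neutral and objective, which mirrors what we see for purely factual statements.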

## Word Inflection and Lemmatization

Inflection is a process of word formation in which characters are added to the base form of a word to express grammatical meaning. Word inflection in TextBlob is very simple: the words we tokenized from a textblob can easily be changed into singular or plural.

In [None]:
print (blob.sentences[1].words[7])
print (blob.sentences[1].words[7].pluralize())

In [None]:
from textblob import Word
w = Word('Platform')
w.pluralize()

In [None]:
## using tags
for word,pos in blob.tags:
    if pos == 'NNP':
        print (word.pluralize())
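As a rough picture of what pluralization involves, the sketch below applies three common English rules. It is only an illustration: `naive_pluralize` is a made-up helper, it does not handle irregular nouns, and TextBlob's own rule set is far more extensive.

```python
def naive_pluralize(word):
    """Apply three common English pluralization rules; irregular nouns are not handled."""
    lower = word.lower()
    # Sibilant endings take "es": box -> boxes, church -> churches.
    if lower.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    # Consonant + y becomes "ies": city -> cities (but day -> days).
    if lower.endswith("y") and len(lower) > 1 and lower[-2] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"

print(naive_pluralize("Platform"))
print(naive_pluralize("city"))
print(naive_pluralize("box"))
```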

Lemmatization uses a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma.

In [None]:
## lemmatization

w1 = Word('running')
print(w1.lemmatize('v'))  # 'v' is the part of speech: verb
w2 = Word('breaking')
print(w2.lemmatize('v'))
w3 = Word('cacti')
print(w3.lemmatize('n'))  # 'n' is the part of speech: noun
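The "vocabulary plus morphological analysis" idea can be sketched in a few lines: look up irregular forms in an exception list first, then strip suffixes and accept a candidate only if it is a known word. The vocabulary, exception list, rules and the `lemmatize_verb` helper below are tiny hypothetical stand-ins; the WordNet data that TextBlob uses is far larger.

```python
# Hypothetical mini-vocabulary and exception list; WordNet's real ones are far larger.
VERB_VOCAB = {"run", "break", "walk", "bake"}
VERB_EXCEPTIONS = {"running": "run", "ran": "run", "broke": "break"}
# (suffix, replacement) pairs tried in order.
VERB_RULES = [("ing", ""), ("ing", "e"), ("ed", ""), ("ed", "e"), ("es", ""), ("s", "")]

def lemmatize_verb(word):
    """Look up irregular forms first, then strip suffixes until a known verb appears."""
    word = word.lower()
    if word in VERB_EXCEPTIONS:
        return VERB_EXCEPTIONS[word]
    if word in VERB_VOCAB:
        return word
    for suffix, replacement in VERB_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in VERB_VOCAB:
                return candidate
    return word  # unknown words are returned unchanged

print(lemmatize_verb("running"))
print(lemmatize_verb("breaking"))
print(lemmatize_verb("baking"))
```

The vocabulary check is what separates this from plain suffix stripping: "baking" only becomes "bake" because "bake" is a known verb.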





## Ngrams

A sequence of N consecutive words is called an N-gram. N-grams (N > 1) are generally more informative than single words and can be used as features for language modelling. N-grams can be easily accessed in TextBlob using the ngrams function, which returns a list of WordList objects, each holding n successive words.

In [None]:
for ngram in blob.ngrams(3):
    print (ngram)
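Under the hood, building n-grams is just a sliding window over the token list. A minimal plain-Python sketch (the standalone `ngrams` helper here is our own, not TextBlob's method) looks like this:

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a list of lists."""
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

tokens = "He is often considered the best".split()
trigrams = ngrams(tokens, 3)
for gram in trigrams:
    print(gram)
```

A list of k tokens yields k - n + 1 n-grams, and an empty list when the text is shorter than n.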

## Spelling correction

Spelling correction is a useful feature that TextBlob offers; it can be accessed using the correct function, as shown below.

In [None]:
# We can check the spelling of individual 'Word' objects; spellcheck() returns (word, confidence) pairs
w=Word('aprentise')
w.spellcheck()

In [None]:
# However, to correct entire sentences, paragraphs or articles, we use correct(), a method of TextBlob
wrong=TextBlob("The incambant govurnment is epxected to win a thamping majorly")
wrong.correct()


In [None]:
wrong.words[8].spellcheck()
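TextBlob's documentation credits Peter Norvig's "How to Write a Spelling Corrector" as the basis for its spelling correction. The sketch below shows the core of that approach in plain Python: generate every string one edit away from the input and pick the most frequent known word. The word counts are hypothetical, and this simplified version only looks one edit away, so it is a sketch of the idea rather than TextBlob's actual code.

```python
import string

# Hypothetical word counts standing in for a real frequency dictionary.
WORD_COUNTS = {"apprentice": 5, "government": 40, "expected": 30, "the": 500}

def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Return the most frequent known word within one edit, else the word unchanged."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    return max(candidates, key=WORD_COUNTS.get) if candidates else word

print(correct("govurnment"))
print(correct("epxected"))
```

Because it stops at one edit, a word that needs two corrections is returned unchanged here; this also hints at why automatic correction sometimes picks a surprising replacement.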

## Creating a short summary from a text

This is a simple trick that uses the things we learned above. First, take a look at the code shown below and try to understand it yourself.


In [None]:
import random

summ = TextBlob('The effects of technological advancement are both positive and negative.\n\
Positively, technology advancement has simplified the way we do things, it saves time, it increases on production,\
it simplifies communication, it has improved health care and it has also improved our educational environment.\n\
Negatively , technology advancement has made humans so lazy , technology users are so dependent on new advance tech tools ,\
this laziness has resulted into less innovation , it has increased on health risks because technology users exercise less \
, it has affected the environment because of the increase pollution which has affected the Ozone layers which has resulted \
into global warming. When it comes to education, students are more dependent on Calculators and \
computers to solve simple equations; in this case they can not train their brains to solve a simple task which makes them \
lame in class.')

In [None]:
# Collect the lemma of every singular common noun (tag NN).
nouns = list()
for word, tag in summ.tags:
    if tag == 'NN':
        nouns.append(word.lemmatize())

# Remove duplicates.
nouns = list(set(nouns))

print("This text is about...")
for item in nouns:
    print(item)
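Removing duplicates throws away how often each noun occurs. One possible refinement, sketched here on hypothetical (word, tag) pairs in the same shape as `summ.tags`, is to count the nouns and keep only the most frequent ones as the "topic" words.

```python
from collections import Counter

# Hypothetical (word, tag) pairs in the same shape as `summ.tags`.
tagged = [
    ("technology", "NN"), ("advancement", "NN"), ("has", "VBZ"),
    ("simplified", "VBN"), ("communication", "NN"), ("technology", "NN"),
    ("users", "NNS"), ("technology", "NN"), ("advancement", "NN"),
]

# Count only singular common nouns and keep the two most frequent.
noun_counts = Counter(word for word, tag in tagged if tag == "NN")
top_nouns = noun_counts.most_common(2)
print(top_nouns)
```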

## Language Translation

In [None]:
trans1 = TextBlob('I find this feature to be extremely useful.')

In [None]:
# Supported language codes: https://cloud.google.com/translate/docs/languages
# trans1.translate(to='kn')
trans1.translate(to='hi')

In [None]:
trans2 = TextBlob('هذه هي أداة عظيمة للاستخدام')

In [None]:
trans2.detect_language()

In [None]:
trans2.translate(from_lang='ar', to ='en')

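Even if you don’t explicitly define the source language, TextBlob will automatically detect the language and translate into the desired language. Note that translate() and detect_language() call the Google Translate web API, so they need an internet connection; they were deprecated in TextBlob 0.16 and are not available in recent releases.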

In [None]:
trans2.translate(to= 'en')

## Text Classification using textblob
TextBlob provides a built-in classifiers module for creating a custom classifier. So, let’s quickly import it and create a basic classifier.

In [None]:
train = [
     ('The sandwich is bad', 'neg'),
     ('there is a good place near my home', 'pos'),
     ('I feel very good about these beers.', 'pos'),
     ('this is bad work on my part.', 'neg'),
     ("The view is good", 'pos'),
     ('This restaurant serves bad food', 'neg'),
     ('She has a bad mood today', 'neg'),
     ("I love how she smiles at me", 'pos'),
     ('he is destroying my project', 'neg'),
     ('Her boss is horrible', 'neg'),
     ('Parul hates wasting her time','neg'),
     ('I am in love with this place','pos'),
     ('It was bad on his part to not help her in need','neg'),
     ('It feels so good to have you around','pos'),
     ('the perfume had a lovely smell','pos')
 ]
test = [
     ('the beer was good.', 'pos'),
     ('I am bad at my job', 'neg'),
     ("I am feeling lovely today.", 'pos'),
     ("I feel good", 'pos'),
     ('Gary is a good friend of mine.', 'pos'),
     ("I can't believe I'm this bad.", 'neg')
 ]

In [None]:
from textblob import classifiers

classifier = classifiers.NaiveBayesClassifier(train)

Note that we have used the Naive Bayes classifier here, but TextBlob also offers a decision tree classifier (classifiers.DecisionTreeClassifier).

In [None]:
print (classifier.accuracy(test))
classifier.show_informative_features()

In [None]:
check1 = TextBlob('I love this weather', classifier=classifier)
print (check1,'-',check1.classify())
check2 = TextBlob('The food is horrible', classifier=classifier)
print (check2,'-',check2.classify())
check3 = TextBlob('He is good at Data science', classifier=classifier)
print (check3,'-',check3.classify())
check4 = TextBlob('She is having a bad day', classifier=classifier)
print (check4,'-',check4.classify())
check5 = TextBlob('I found the place lovely', classifier=classifier)
print (check5,'-',check5.classify())
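To see what a Naive Bayes text classifier does in principle, here is a small from-scratch sketch using word counts with add-one (Laplace) smoothing. It is not the TextBlob or NLTK implementation, which works with word-presence features; `train_nb`, `classify_nb` and the toy training set are made up for this illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify_nb(text, model):
    """Pick the label with the highest log prior plus smoothed log likelihood."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

toy_train = [
    ("the view is good", "pos"),
    ("i love this place", "pos"),
    ("the sandwich is bad", "neg"),
    ("her boss is horrible", "neg"),
]
model = train_nb(toy_train)
print(classify_nb("the food is horrible", model))
print(classify_nb("i love the view", model))
```

Working in log space avoids multiplying many small probabilities together, and the smoothing lets words never seen in training (such as "food") contribute equally to both labels.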

### The nltk library includes a movie reviews corpus, which can be used to train a classifier that judges whether a movie review is positive or negative


# Assignment 

In [None]:
#Consider the following text - 

para=TextBlob('India has one of the most rapidly groving service sectors in the world with an anual growth rate above \
9% since 2001, which contribeted to 57% of GDP in 2012–13. India has becone a major exporting country of Information \
Technology sevrices, Businass Process Oubsourcing (BPO) services, and software services with $154 billion revenue in 2017.\
 The Information Technology industry condinues to be the largest priwate-sector emplofer in India. India is the second-largest\
 start-up centre in the world with over 3,100 tecknology start-ups in 2018–19.The agricutlural sector is the largst employer \
 in India\'s economy but contributes to a declining share of its GDP (17% in 2013–14). India ranks second worldwide in farm \
 output. The industry (manufacturing) sector has held a steady share of its econogic contribution (26% of GDP in 2013–14).\
 The Indian automodite industry is one of the largest in the world with an agnual production of 21.48 million vehicles \
 (mostly two and three-wheelers) in 2013–14. India had $600 bilion worth of retail market in 2015 and one of world\'s \
 quikest growing e-commerse margets.')

In [None]:
#First of all, check the spellings and correct any misspellings
para_new=None


In [None]:
#Find out all the sentences 

In [None]:
#Find out the number of adverbs in this paragraph ( tag=RB)



In [None]:
#Find out the main topic of this paragraph ( Hint - use the nouns, tag=NN)

In [None]:
#Find out the sentiment and the subjectivity of the entire paragraph

In [None]:
#Find out the sentiment and the subjectivity of each sentence

In [None]:
#Print the singular proper nouns in uppercase for the para ( tag=NNP)

In [None]:
#Translate the entire paragraph to Hindi

In [None]:
hindi=None

# SOLUTIONS

In [None]:
#Consider the following text - 

para=TextBlob('India has one of the most rapidly groving service sectors in the world with an anual growth rate above \
9% since 2001, which contribeted to 57% of GDP in 2012–13. India has becone a major exporting country of Information \
Technology sevrices, Businass Process Outsourcing (BPO) services, and software services with $154 billion revenue in 2017.\
 The Information Technology industry condinues to be the largest priwate-sector emplofer in India. India is the second-largest\
 start-up centre in the world with over 3,100 tecknology start-ups in 2018–19.The agricutlural sector is the largst employer \
 in India\'s economy but contributes to a declining share of its GDP (17% in 2013–14). India ranks second worldwide in farm \
 output. The industry (manufacturing) sector has held a steady share of its econogic contribution (26% of GDP in 2013–14).\
 The Indian automodite industry is one of the largest in the world with an agnual production of 21.48 million vehicles \
 (mostly two and three-wheelers) in 2013–14. India had $600 bilion worth of retail market in 2015 and one of world\'s \
 quikest growing e-commerse margets.')

In [None]:
#First of all, check the spellings and correct any misspellings
para_new=para.correct()
para_new

In [None]:
#Find out all the sentences 

In [None]:
para_new.sentences

In [None]:
#Find out the number of adverbs in this paragraph ( tag=RB)
j=0
for word,tag in para_new.tags:
    if tag == 'RB':
        j=j+1
print(j)        

In [None]:
#Find out the main topic of this paragraph ( Hint - use the nouns, tag=NN)

In [None]:
nouns2 = list()
for word, tag in para_new.tags:
    if tag == 'NN':  # common nouns, as the hint suggests
        nouns2.append(word.lemmatize())

#Removing duplicates - 
nouns2 = list(set(nouns2))

print("This text is about...")
for item in nouns2:
    print(item)


In [None]:
#Find out the sentiment and the subjectivity of the entire paragraph

In [None]:
print(para_new.sentiment)

In [None]:
#Find out the sentiment and the subjectivity of each sentence

In [None]:
for i, sentence in enumerate(para_new.sentences):
    print('Sentence-', i + 1, sentence.sentiment)

In [None]:
#Print the singular proper nouns in uppercase for the para ( tag=NNP)

In [None]:
for word, tag in para_new.tags:
    if tag == 'NNP':
        print(word.upper())  # the task asks for uppercase output

In [None]:
#Translate the entire paragraph to Hindi

In [None]:
hindi=para_new.translate(to= 'hi')

In [None]:
hindi