### Installation instructions
https://textblob.readthedocs.io/en/dev/install.html

In [1]:
! pip install -U textblob

Collecting textblob
  Using cached https://files.pythonhosted.org/packages/60/f0/1d9bfcc8ee6b83472ec571406bd0dd51c0e6330ff1a51b2d29861d389e85/textblob-0.15.3-py2.py3-none-any.whl
Requirement not upgraded as not directly required: nltk>=3.1 in /anaconda3/envs/keras-tf/lib/python3.6/site-packages (from textblob) (3.3)
Requirement not upgraded as not directly required: six in /anaconda3/envs/keras-tf/lib/python3.6/site-packages (from nltk>=3.1->textblob) (1.11.0)
Installing collected packages: textblob
Successfully installed textblob-0.15.3
You are using pip version 10.0.1, however version 19.2.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


### Download the full corpus

In [4]:
! python -m textblob.download_corpora

[nltk_data] Downloading package brown to
[nltk_data]     /Users/subashgandyer/nltk_data...
[nltk_data]   Package brown is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     /Users/subashgandyer/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/subashgandyer/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/subashgandyer/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package conll2000 to
[nltk_data]     /Users/subashgandyer/nltk_data...
[nltk_data]   Package conll2000 is already up-to-date!
[nltk_data] Downloading package movie_reviews to
[nltk_data]     /Users/subashgandyer/nltk_data...
[nltk_data]   Package movie_reviews is already up-to-date!
Finished.


### Or Download the mini corpus

In [3]:
! python -m textblob.download_corpora lite

[nltk_data] Downloading package brown to
[nltk_data]     /Users/subashgandyer/nltk_data...
[nltk_data]   Package brown is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     /Users/subashgandyer/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/subashgandyer/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/subashgandyer/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
Finished.


### Tokenization
#### Tokenization refers to splitting a text into smaller units such as sentences or words. Typically, a token is a single word in a text document. Tokenization is straightforward with TextBlob: import the TextBlob class from the textblob library, pass it the document that you want to tokenize, and then read the sentences and words attributes to get the tokenized sentences and words.

In [5]:
from textblob import TextBlob

In [6]:
document = ("In computer science, artificial intelligence (AI), \
            sometimes called machine intelligence, is intelligence \
            demonstrated by machines, in contrast to the natural intelligence \
            displayed by humans and animals. Computer science defines AI \
            research as the study of \"intelligent agents\": any device that \
            perceives its environment and takes actions that maximize its\
            chance of successfully achieving its goals.[1] Colloquially,\
            the term \"artificial intelligence\" is used to describe machines\
            that mimic \"cognitive\" functions that humans associate with other\
            human minds, such as \"learning\" and \"problem solving\".[2]")

In [7]:
text_blob_object = TextBlob(document)

In [8]:
document_sentence = text_blob_object.sentences

print(document_sentence)
print(len(document_sentence))

[Sentence("In computer science, artificial intelligence (AI),             sometimes called machine intelligence, is intelligence             demonstrated by machines, in contrast to the natural intelligence             displayed by humans and animals."), Sentence("Computer science defines AI             research as the study of "intelligent agents": any device that             perceives its environment and takes actions that maximize its            chance of successfully achieving its goals."), Sentence("[1] Colloquially,            the term "artificial intelligence" is used to describe machines            that mimic "cognitive" functions that humans associate with other            human minds, such as "learning" and "problem solving"."), Sentence("[2]")]
4


In [9]:
document_words = text_blob_object.words

print(document_words)
print(len(document_words))

['In', 'computer', 'science', 'artificial', 'intelligence', 'AI', 'sometimes', 'called', 'machine', 'intelligence', 'is', 'intelligence', 'demonstrated', 'by', 'machines', 'in', 'contrast', 'to', 'the', 'natural', 'intelligence', 'displayed', 'by', 'humans', 'and', 'animals', 'Computer', 'science', 'defines', 'AI', 'research', 'as', 'the', 'study', 'of', 'intelligent', 'agents', 'any', 'device', 'that', 'perceives', 'its', 'environment', 'and', 'takes', 'actions', 'that', 'maximize', 'its', 'chance', 'of', 'successfully', 'achieving', 'its', 'goals', '1', 'Colloquially', 'the', 'term', 'artificial', 'intelligence', 'is', 'used', 'to', 'describe', 'machines', 'that', 'mimic', 'cognitive', 'functions', 'that', 'humans', 'associate', 'with', 'other', 'human', 'minds', 'such', 'as', 'learning', 'and', 'problem', 'solving', '2']
84
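The sentences printed above contain long runs of spaces because each backslash continuation keeps the indentation of the next source line inside the string. A minimal sketch of one way to collapse that padding before tokenizing, using only built-in string methods; `sample` and `clean_sample` are hypothetical names, and the sample is a shortened stand-in for `document`:

```python
# Hypothetical shortened sample; the backslash continuations keep the
# indentation of each source line inside the string, as in `document` above.
sample = ("In computer science, artificial intelligence (AI), \
            sometimes called machine intelligence, is intelligence \
            demonstrated by machines.")

# str.split() with no arguments splits on any run of whitespace,
# so joining the pieces with one space collapses the padding.
clean_sample = " ".join(sample.split())

print(clean_sample)
```

The cleaned string can then be passed to `TextBlob` in place of the original. This only affects spacing; it does not change how citation markers such as `[2]` are split.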


### Lemmatization
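#### Lemmatization refers to reducing a word to its base form (lemma) as found in a dictionary. To lemmatize a word with TextBlob, import the Word class from the textblob library, pass it the word that you want to lemmatize, and then call the lemmatize method. The method treats the word as a noun by default; pass a part-of-speech code such as "a" (adjective) to lemmatize it as a different part of speech, as in the third example below.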

In [10]:
from textblob import Word

word1 = Word("apples")
print("apples:", word1.lemmatize())

word2 = Word("media")
print("media:", word2.lemmatize())

word3 = Word("greater")
print("greater:", word3.lemmatize("a"))

apples: apple
media: medium
greater: great


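### Part-of-Speech (POS) Tagging
#### The tags attribute of a TextBlob object yields (word, tag) pairs. The tags follow the Penn Treebank tag set, which is listed at https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html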

In [11]:
for word, pos in text_blob_object.tags:
    print(word + " => " + pos)

In => IN
computer => NN
science => NN
artificial => JJ
intelligence => NN
AI => NNP
sometimes => RB
called => VBD
machine => NN
intelligence => NN
is => VBZ
intelligence => NN
demonstrated => VBN
by => IN
machines => NNS
in => IN
contrast => NN
to => TO
the => DT
natural => JJ
intelligence => NN
displayed => VBN
by => IN
humans => NNS
and => CC
animals => NNS
Computer => NNP
science => NN
defines => NNS
AI => NNP
research => NN
as => IN
the => DT
study => NN
of => IN
intelligent => JJ
agents => NNS
any => DT
device => NN
that => WDT
perceives => VBZ
its => PRP$
environment => NN
and => CC
takes => VBZ
actions => NNS
that => IN
maximize => VB
its => PRP$
chance => NN
of => IN
successfully => RB
achieving => VBG
its => PRP$
goals => NNS
[ => RB
1 => CD
] => NNP
Colloquially => NNP
the => DT
term => NN
artificial => JJ
intelligence => NN
is => VBZ
used => VBN
to => TO
describe => VB
machines => NNS
that => IN
mimic => JJ
cognitive => JJ
functions => NNS
that => WDT
humans => NNS
associate
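
Once the (word, tag) pairs are available, they can be summarized with ordinary Python. The sketch below is illustrative only: the `tagged` list is a hand-copied subset of the output above rather than a live call to the tagger, and the names `tag_counts` and `nouns` are hypothetical.

```python
from collections import Counter

# A few (word, tag) pairs copied from the output above; with TextBlob
# installed, text_blob_object.tags would supply the full list.
tagged = [
    ("In", "IN"), ("computer", "NN"), ("science", "NN"),
    ("artificial", "JJ"), ("intelligence", "NN"), ("AI", "NNP"),
    ("sometimes", "RB"), ("called", "VBD"), ("machine", "NN"),
    ("intelligence", "NN"),
]

# Count how often each tag occurs.
tag_counts = Counter(tag for _, tag in tagged)

# Penn Treebank noun tags all start with "NN" (NN, NNS, NNP, NNPS).
nouns = [word for word, tag in tagged if tag.startswith("NN")]

print(tag_counts)
print(nouns)
```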

### Singular, Plural
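#### TextBlob can also convert words to their plural or singular form using the pluralize and singularize methods, respectively. When called on the words list, each method is applied to every word regardless of its part of speech, which is why verbs and determiners in the outputs below come out as forms such as "iss", "hass", and "ha".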

In [12]:
text = ("Football is a good game. It has many health benefit")
text_blob_object = TextBlob(text)
print(text_blob_object.words.pluralize())

['Footballs', 'iss', 'some', 'goods', 'games', 'Its', 'hass', 'manies', 'healths', 'benefits']


In [13]:
text = ("Footballs is a goods games. Its has many healths benefits")

text_blob_object = TextBlob(text)
print(text_blob_object.words.singularize())

['Football', 'is', 'a', 'good', 'game', 'It', 'ha', 'many', 'health', 'benefit']


### Noun phrase extraction
#### Noun phrase extraction refers to extracting the phrases in a text that are built around nouns. The noun_phrases attribute returns them in lowercase.

In [14]:
text_blob_object = TextBlob(document)
for noun_phrase in text_blob_object.noun_phrases:
    print(noun_phrase)

computer science
artificial intelligence
ai
machine intelligence
natural intelligence
computer
science defines
ai
intelligent agents
colloquially
artificial intelligence
describe machines
human minds


### Getting Word and Phrase Counts
#### To find how often a particular word occurs, use the word as the key into the word_counts dictionary of the TextBlob object. Alternatively, call the count method on the words or noun_phrases list; for words, count accepts a case_sensitive argument.

In [15]:
text_blob_object = TextBlob(document)
text_blob_object.word_counts['intelligence']

5

In [16]:
text_blob_object.words.count('intelligence')

5

In [17]:
text_blob_object.words.count('intelligence', case_sensitive=True)

5

In [18]:
text_blob_object = TextBlob(document)
text_blob_object.noun_phrases.count('artificial intelligence')

2
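
As a rough illustration of what such a frequency lookup involves, here is a minimal sketch using only the standard library. It assumes case-insensitive counting and a simple regular-expression tokenizer, which is not necessarily how TextBlob splits words; `sample`, `words`, and `counts` are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical short sample standing in for the longer document.
sample = ("Artificial intelligence, sometimes called machine intelligence, "
          "is intelligence demonstrated by machines.")

# Lowercase the text and keep runs of letters, digits, or apostrophes as words.
words = re.findall(r"[a-z0-9']+", sample.lower())
counts = Counter(words)

print(counts["intelligence"])  # 3
print(counts["robot"])         # 0, because Counter returns 0 for missing keys
```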

### Converting to Upper and Lowercase
#### TextBlob objects behave much like Python strings. You can convert them to upper case or lower case, slice them, and concatenate them.

In [19]:
text = "I love to watch football, but I have never played it"
text_blob_object = TextBlob(text)

print(text_blob_object.upper())

I LOVE TO WATCH FOOTBALL, BUT I HAVE NEVER PLAYED IT


In [20]:
text = "I LOVE TO WATCH FOOTBALL, BUT I HAVE NEVER PLAYED IT"
text_blob_object = TextBlob(text)

print(text_blob_object.lower())

i love to watch football, but i have never played it


### Finding N-Grams
#### An n-gram is a sequence of n consecutive words in a text. For instance, for the sentence "I love watching football", the 2-grams are (I love), (love watching) and (watching football). N-grams are commonly used as features in text classification.

#### In TextBlob, n-grams can be found by passing the value of n to the ngrams method of the TextBlob object.

In [21]:
text = "I love to watch football, but I have never played it"
text_blob_object = TextBlob(text)
for ngram in text_blob_object.ngrams(2):
    print(ngram)

['I', 'love']
['love', 'to']
['to', 'watch']
['watch', 'football']
['football', 'but']
['but', 'I']
['I', 'have']
['have', 'never']
['never', 'played']
['played', 'it']
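
The sliding-window idea behind this output can be sketched in plain Python. This is an illustrative helper, not TextBlob's implementation; the hypothetical `ngrams` function below takes a list of words, and the sample sentence has its comma removed so that `str.split` is enough to tokenize it.

```python
def ngrams(words, n):
    """Return every run of n consecutive items from words, in order."""
    # A list of length L has L - n + 1 windows of size n (none if n > L).
    return [words[i:i + n] for i in range(len(words) - n + 1)]

words = "I love to watch football but I have never played it".split()
bigrams = ngrams(words, 2)

print(bigrams[:3])   # [['I', 'love'], ['love', 'to'], ['to', 'watch']]
print(len(bigrams))  # 10
```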


### Spelling Corrections
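#### Spelling correction is one of TextBlob's more distinctive features. The correct method of the TextBlob object returns a new TextBlob in which each word it judges to be misspelled is replaced by its most likely correction. The result is a best guess, so it is worth reviewing on real text.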

In [22]:
text = "I love to watchf footbal, but I have neter played it"
text_blob_object = TextBlob(text)

print(text_blob_object.correct())

I love to watch football, but I have never played it
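
The TextBlob documentation describes its spelling correction as based on Peter Norvig's "How to Write a Spelling Corrector". The sketch below shows that general idea in a heavily simplified form: generate every string one edit away from a word and keep the most frequent one that appears in a vocabulary. The vocabulary, its frequencies, and the function names are all made up for illustration, and the sample sentence omits the comma so that `str.split` suffices.

```python
import string

# Hypothetical toy vocabulary with made-up frequencies.
WORD_FREQ = {"i": 50, "love": 20, "to": 60, "watch": 15, "football": 10,
             "but": 40, "have": 45, "never": 25, "played": 12, "it": 55}

def edits1(word):
    """Return all strings that are one edit away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct_word(word):
    """Return word unchanged if known, else the most frequent known word one edit away."""
    lower = word.lower()
    if lower in WORD_FREQ:
        return word
    candidates = [w for w in edits1(lower) if w in WORD_FREQ]
    if not candidates:
        return word  # no known word within one edit; leave it alone
    return max(candidates, key=WORD_FREQ.get)

text = "I love to watchf footbal but I have neter played it"
corrected = " ".join(correct_word(w) for w in text.split())
print(corrected)
```

A real corrector needs a large vocabulary with realistic frequencies and usually considers words two edits away as well; this sketch stops at one edit to stay short.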


### Language Translation
#### TextBlob can also translate text from one language to another. On the backend, the TextBlob language translator uses the Google Translate API, so the cells below need a network connection. Later TextBlob releases deprecated the translate and detect_language methods, so these examples assume the 0.15.3 version installed above.

#### To translate text, pass it to the TextBlob object and then call the translate method on the object. The language code of the target language is passed to the method as the to parameter. The detect_language method returns the language code that was detected for the text.

In [23]:
text_blob_object_french = TextBlob(u'Salut comment allez-vous?')
print(text_blob_object_french.translate(to='en'))

Hi, how are you?


In [24]:
text_blob_object_arabic = TextBlob(u'مرحبا كيف حالك؟')
print(text_blob_object_arabic.translate(to='en'))

Hello how are you?


In [25]:
text_blob_object = TextBlob(u'Hola como estas?')
print(text_blob_object.detect_language())

es
