
# TextBlob
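TextBlob is a Python library for common natural language processing (NLP) tasks, including part-of-speech tagging, sentiment analysis, tokenization, inflection, lemmatization, spelling correction, and n-grams, all of which are covered below. Documentation for TextBlob can be found [here](https://textblob.readthedocs.io/en/dev/).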

In [None]:
%%capture
# Install textblob
!pip install -U textblob


In [None]:
from textblob import TextBlob


## Corpora

In [None]:
%%capture
# Download corpora
!python -m textblob.download_corpora


In [None]:
!which python

/usr/local/bin/python


In [None]:
import nltk
nltk.download('omw-1.4')


[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


True

## TextBlobs

In [None]:
my_blob = TextBlob("There is more than one way to skin a cat.")


In [None]:
my_blob


TextBlob("There is more than one way to skin a cat.")

## Tagging Parts of Speech
TextBlob labels each word with a Penn Treebank part-of-speech tag. A list of the tags can be found [here](https://www.geeksforgeeks.org/python-part-of-speech-tagging-using-textblob/) and is summarized in the table below.

code | meaning | example
--- | --- | ---
CC | coordinating conjunction | and, but
CD | cardinal number | one, 2
DT | determiner | the, a
EX | existential there | there (as in “there is”, meaning “there exists”)
FW | foreign word |
IN | preposition/subordinating conjunction | in, of, because
JJ | adjective | big
JJR | adjective, comparative | bigger
JJS | adjective, superlative | biggest
LS | list marker | 1)
MD | modal | could, will
NN | noun, singular | desk
NNS | noun, plural | desks
NNP | proper noun, singular | Harrison
NNPS | proper noun, plural | Americans
PDT | predeterminer | all (as in “all the kids”)
POS | possessive ending | ’s (as in “parent’s”)
PRP | personal pronoun | I, he, she
PRP\$ | possessive pronoun | my, his, hers
RB | adverb | very, silently
RBR | adverb, comparative | better
RBS | adverb, superlative | best
RP | particle | up (as in “give up”)
TO | to | to (as in “go to the store”)
UH | interjection | errrrrrrrm
VB | verb, base form | take
VBD | verb, past tense | took
VBG | verb, gerund/present participle | taking
VBN | verb, past participle | taken
VBP | verb, non-3rd person singular present | take
VBZ | verb, 3rd person singular present | takes
WDT | wh-determiner | which
WP | wh-pronoun | who, what
WP\$ | possessive wh-pronoun | whose
WRB | wh-adverb | where, when






In [None]:
!python3 -m textblob.download_corpora


In [None]:
# Use the .tags attribute to see parts of speech
my_blob.tags


## Sentiment Analysis

Sentiment analysis can be used to understand the feeling or emotion tied to the text. The sentiment attribute in TextBlob will return two values:
1. The **polarity score** (a float between -1.0 and 1.0). -1 is negative, 1 is positive.
2. The **subjectivity** (a float between 0.0 and 1.0). 0 is very objective, while 1 is very subjective.
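
To make the two scores concrete, here is a minimal sketch that turns a polarity/subjectivity pair into a rough label. The function name `describe_sentiment` and the cutoffs (0.1 for the neutral band, 0.5 for subjectivity) are illustrative assumptions, not part of TextBlob, and the example scores are written by hand rather than computed.

```python
def describe_sentiment(polarity, subjectivity,
                       neutral_band=0.1, subjective_cutoff=0.5):
    """Turn a (polarity, subjectivity) pair into a rough label.

    The cutoffs are hypothetical defaults chosen for illustration only.
    """
    if polarity > neutral_band:
        tone = "positive"
    elif polarity < -neutral_band:
        tone = "negative"
    else:
        tone = "neutral"
    stance = "subjective" if subjectivity >= subjective_cutoff else "objective"
    return f"{tone}, {stance}"


# Hand-written example scores (not produced by TextBlob)
print(describe_sentiment(0.8, 0.75))
print(describe_sentiment(-0.4, 0.2))
print(describe_sentiment(0.0, 0.0))
```

Because `sentiment` is a named tuple of `(polarity, subjectivity)`, a blob's scores could be passed in with `describe_sentiment(*my_blob.sentiment)`.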

In [None]:
neg_blob = TextBlob("I am so tired. Today was a long, hard day.")
neg_blob.sentiment


In [None]:
pos_blob = TextBlob("Today was a great day. I am so happy.")
pos_blob.sentiment


In [None]:
obj_blob = TextBlob("The cat is gray.")
obj_blob.sentiment


In [None]:
subj_blob = TextBlob("The cat is so cute and sweet.")
print(subj_blob.sentiment)
print(subj_blob.sentiment.subjectivity) # Just get the subjectivity


### Sentiment Analysis of Multiple Sentences

In [None]:
my_poem = TextBlob("Python is a great language to learn. "
               "You can easily do NLP, it's fab. "
               "It might take some getting used to. "
               "But it's definitely more better than Matlab.")


In [None]:
my_poem

In [None]:
my_poem.sentiment


In [None]:
my_poem.sentences


In [None]:
for sentence in my_poem.sentences:
  print(sentence.sentiment)


### Your Turn
Create three TextBlobs with the following sentiments:
1. Negative, subjective
2. Positive, objective
3. Neutral


In [None]:
# Solution 1


In [None]:
# Solution 2


In [None]:
# Solution 3


## Tokenization
Tokenization is the process of splitting long strings of text into smaller units (tokens), such as sentences or words.
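
As a rough illustration of the idea, the sketch below tokenizes text with two naive regular-expression rules. It is plain Python, not TextBlob's tokenizer, and the helper names `split_sentences` and `split_words` are made up for this example. TextBlob handles many more cases (abbreviations, contractions, and so on), so its results may differ.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace (a naive rule)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(text):
    """Keep runs of letters, digits and apostrophes; drop other punctuation."""
    return re.findall(r"[A-Za-z0-9']+", text)


sample = "Python is great. You can easily do NLP, it's fab."
print(split_sentences(sample))
print(split_words(sample))
```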

In [None]:
my_poem.sentences


In [None]:
my_poem.words


In [None]:
# Word frequencies, most common first
sorted(my_poem.word_counts.items(), key=lambda x: x[1], reverse=True)

## Singular & Plural

In [None]:
my_sent = TextBlob("The octopi went swimming in the dark ocean waters.")


In [None]:
my_sent.words[0]


In [None]:
# Singularize
my_sent.words[-1].singularize()


In [None]:
my_sent.words[1].singularize()


In [None]:
# Pluralize
my_sent.words[-2].pluralize()


In [None]:
# Check whether singularizing an already-singular word leaves it unchanged
foo = my_sent.words[-2]
foo == foo.singularize()

## Stemming & Lemmatization

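Stemming is the process of stripping affixes (mostly suffixes) from a word, leaving only the word “stem”, which is not always a real word. Lemmatization is similar to stemming, but it maps a word to its dictionary form (lemma) using vocabulary and part of speech, so it preserves the underlying meaning of the word.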
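
The difference can be sketched with two toy functions in plain Python. These are deliberately simplistic and are not the algorithms TextBlob uses: `toy_stem` strips a fixed list of suffixes, while `toy_lemmatize` looks words up in a tiny hand-written dictionary. Both names and the `TOY_LEMMAS` dictionary are hypothetical.

```python
def toy_stem(word, suffixes=("ing", "ed", "es", "s")):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical mini-dictionary mapping inflected forms to their lemmas
TOY_LEMMAS = {"went": "go", "caring": "care", "octopi": "octopus", "waters": "water"}


def toy_lemmatize(word):
    """Look the word up; fall back to the word itself if it is unknown."""
    return TOY_LEMMAS.get(word, word)


for w in ["caring", "went", "waters"]:
    print(w, toy_stem(w), toy_lemmatize(w))
```

Here the toy stemmer reduces `caring` to `car`, which changes the meaning, and cannot do anything with the irregular form `went`, while the lookup returns `care` and `go`.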

In [None]:
my_sent


In [None]:
# Find the index of 'swimming'
my_sent.words.index('swimming')


In [None]:
# Stemming
print(my_sent.words[3].stem())
print(my_sent.words[1].stem())


In [None]:
# Lemmatization (a word is treated as a noun unless a part of speech is passed, e.g. lemmatize("v"))
print(my_sent.words[3].lemmatize())
print(my_sent.words[1].lemmatize())


In [None]:
foo = TextBlob("caring")
# Compare the stem and the lemma of the same word
(foo.words.stem(), foo.words.lemmatize())


## WordNet

In [None]:
{ my_sent.words[-2] : my_sent.words[-2].definitions }


## Spelling

In [None]:
my_bad_spelling = TextBlob('Helllo, today is my birfday.')
my_bad_spelling.correct()


## Counting Words

In [None]:
my_cheer = TextBlob('Data science is the best, data science is the coolest.')
my_cheer.words.count('data')


In [None]:
my_cheer.word_counts


### Your Turn
1. Create a TextBlob called `message` and set it equal to `Good morning, todayy is going to be a fantastic day!`.
2. Correct the spelling in your TextBlob and set it equal to a new variable called `message_sp`.
3. Find the index of the word `fantastic`.
4. Look up the definition of the word `fantastic`.
5. Stem and lemmatize the word `fantastic`.

In [None]:
# Solution 1


In [None]:
# Solution 2


In [None]:
# Solution 3


In [None]:
# Solution 4


In [None]:
# Solution 5


## TextBlobs as Strings
TextBlobs behave much like Python strings: you can slice them as you would a string and call many common string methods, such as `upper()`, `lower()`, and `split()`.

In [None]:
my_cheer


In [None]:
my_cheer[0:6]


In [None]:
my_cheer.upper()


In [None]:
my_cheer.lower()


## **n**-grams
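An n-gram is a sequence of *n* consecutive words. Sliding a window of length *n* across the text one word at a time produces overlapping lists of words.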
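
The sliding window can be sketched in a few lines of plain Python. The helper name `word_ngrams` is made up for this example and is not TextBlob's implementation; it only shows how the overlapping windows arise from a list of words.

```python
def word_ngrams(words, n=3):
    """Return every run of n consecutive words as a list."""
    # The last window starts at index len(words) - n; if n is larger
    # than the number of words, the range is empty and so is the result.
    return [words[i:i + n] for i in range(len(words) - n + 1)]


words = ["data", "science", "is", "the", "best"]
print(word_ngrams(words, n=3))
print(word_ngrams(words, n=2))
```

Each window shares all but one word with its neighbor, which is why five words yield three trigrams and four bigrams.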

In [None]:
my_cheer


In [None]:
my_cheer.ngrams(n=3)


In [None]:
[ " ".join(i) for i in my_cheer.ngrams(n=3) ]


In [None]:
my_cheer.split(",")


In [None]:
my_cheer.words
