# Sentiment Analysis on Quotes

Natural language processing (NLP) is a subfield of machine learning (ML), and sentiment analysis is a subfield of NLP. Using sentiment analysis, we can detect whether the sentiment of a text is positive, neutral, or negative. Sentiment analysis is applied to text such as reviews, feedback, and more.

The nltk Python library is one of the pioneers of NLP. 

## textblob

This is a text-processing library built on top of nltk. It can be used for various NLP tasks, such as POS tagging, sentiment analysis, lemmatizing, tokenizing, and text classification. Its `sentiment` property returns two values:

- polarity:
  Indicates the positive or negative sentiment of the text. The values range from -1 (very negative) to 1 (very positive), with 0 being neutral.
- subjectivity:
  Indicates how much the text expresses personal feeling or opinion. The values range from 0 (objective) to 1 (subjective).


In [5]:
import pandas as pd

In [6]:
quotes = pd.read_csv("quotes.csv", usecols=["quote"])
quotes['quote'][0]

'“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”'

In [7]:
from textblob import TextBlob

text = TextBlob(quotes['quote'][1])
type(text)
print(dir(text))

['__add__', '__class__', '__contains__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_cmpkey', '_compare', '_create_sentence_objects', '_strkey', 'analyzer', 'classifier', 'classify', 'correct', 'ends_with', 'endswith', 'find', 'format', 'index', 'join', 'json', 'lower', 'ngrams', 'noun_phrases', 'np_counts', 'np_extractor', 'parse', 'parser', 'polarity', 'pos_tagger', 'pos_tags', 'raw', 'raw_sentences', 'replace', 'rfind', 'rindex', 'sentences', 'sentiment', 'sentiment_assessments', 'serialized', 'split', 'starts_with', 'startswith', 'string', 'strip', 'stripped', 'subjectivity', 'tags', 'title', 'to_json', 'tokenize', 'tokenizer', 'tokens', 'upper', 'word_counts',

In [8]:
text.sentiment

Sentiment(polarity=0.3, subjectivity=0.75)

In [9]:
text.sentiment.polarity

0.3

In [10]:
polarity, subjectivity = [],[]
for quote in quotes['quote']:
    text = TextBlob(quote)
    polarity.append(text.sentiment.polarity)
    subjectivity.append(text.sentiment.subjectivity)

quotes['polarity'] = polarity
quotes['subjectivity'] = subjectivity
print("Text Sentiment")
quotes.head()

Text Sentiment


Unnamed: 0,quote,polarity,subjectivity
0,“The world as we have created it is a process ...,0.0,0.0
1,"“It is our choices, Harry, that show what we t...",0.3,0.75
2,“There are only two ways to live your life. On...,0.003788,0.625
3,"“The person, be it gentleman or lady, who has ...",-0.05,0.8
4,"“Imperfection is beauty, madness is genius and...",-0.277778,0.833333


## vaderSentiment

This is a sentiment analyzer aimed at social media text. Like TextBlob, Valence Aware Dictionary and sEntiment Reasoner (VADER) is a lexicon- and rule-based sentiment analyzer. Its lexicon is specifically attuned to sentiment as expressed in social media texts.

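`polarity_scores` returns a dictionary with these four elements:

- compound: A normalized, weighted composite of the valence scores of the words in the text. Values range from -1 (extremely negative) to 1 (extremely positive).

- pos: The proportion of the text that is rated positive. A text is usually labelled positive when its compound score is >= 0.05.

- neg: The proportion of the text that is rated negative. A text is usually labelled negative when its compound score is <= -0.05.

- neu: The proportion of the text that is rated neutral. A text is usually labelled neutral when its compound score is > -0.05 and < 0.05.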
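
The conventional cut-offs on the compound score (positive at 0.05 and above, negative at -0.05 and below, neutral in between) can be sketched as a small helper. The function name `label_from_compound` and the `threshold` parameter are illustrative and not part of vaderSentiment itself.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a coarse sentiment label.

    Hypothetical helper: vaderSentiment returns only the scores,
    the labelling step is left to the caller.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Illustrative compound scores
for score in (0.8555, 0.0, -0.4717):
    print(score, label_from_compound(score))
```

Keeping the threshold as a parameter makes it easy to widen or narrow the neutral band for a particular dataset.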

In [11]:
positive, negative, neutral, compound = [], [], [], []
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

In [12]:
analyzer = SentimentIntensityAnalyzer()
for quote in quotes["quote"]:
    text = analyzer.polarity_scores(quote)
    positive.append(text['pos'])
    negative.append(text['neg'])
    neutral.append(text['neu'])
    compound.append(text['compound'])

quotes['positive'] = positive
quotes['negative'] = negative
quotes['neutral'] = neutral
quotes['compound'] = compound

In [13]:
positiveQuotes = quotes.query("positive >= 0.3 and polarity >= 0.5")
positiveQuotes

Unnamed: 0,quote,polarity,subjectivity,positive,negative,neutral,compound
18,"“Good friends, good books, and a sleepy consci...",0.766667,0.733333,0.485,0.0,0.515,0.8555
49,“Love does not begin and end the way we seem t...,0.5,0.6,0.342,0.156,0.502,0.8316
94,“But better to get hurt by the truth than comf...,0.5,0.5,0.392,0.167,0.441,0.5574


In [14]:
positiveQuotes[['quote','polarity','positive']]

Unnamed: 0,quote,polarity,positive
18,"“Good friends, good books, and a sleepy consci...",0.766667,0.485
49,“Love does not begin and end the way we seem t...,0.5,0.342
94,“But better to get hurt by the truth than comf...,0.5,0.392


In [15]:
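# Note: this filter keeps every quote with negative <= 0.3 and polarity <= 0.5,
# so neutral and mildly positive quotes are included as well.
negativeQuotes = quotes.query("negative <= 0.3 and polarity <= 0.5")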
negativeQuotes.head()

Unnamed: 0,quote,polarity,subjectivity,positive,negative,neutral,compound
0,“The world as we have created it is a process ...,0.0,0.0,0.091,0.0,0.909,0.25
1,"“It is our choices, Harry, that show what we t...",0.3,0.75,0.162,0.0,0.838,0.4404
2,“There are only two ways to live your life. On...,0.003788,0.625,0.0,0.109,0.891,-0.4717
3,"“The person, be it gentleman or lady, who has ...",-0.05,0.8,0.215,0.124,0.661,0.2964
4,"“Imperfection is beauty, madness is genius and...",-0.277778,0.833333,0.275,0.233,0.492,0.2516


In [16]:
negativeQuotes[['quote','polarity','positive']].head()

Unnamed: 0,quote,polarity,positive
0,“The world as we have created it is a process ...,0.0,0.091
1,"“It is our choices, Harry, that show what we t...",0.3,0.162
2,“There are only two ways to live your life. On...,0.003788,0.0
3,"“The person, be it gentleman or lady, who has ...",-0.05,0.215
4,"“Imperfection is beauty, madness is genius and...",-0.277778,0.275
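
The filter in In [15] accepts any quote whose `negative` score is at most 0.3 and whose `polarity` is at most 0.5, which is why the first rows include neutral and positive quotes. A stricter selection might look like the sketch below, assuming that "negative" should mean both analyzers lean negative. The thresholds (0.2 and 0) are illustrative choices, not values from the notebook, and the sample frame reuses only the first five rows shown above.

```python
import pandas as pd

# First five rows of the scored table shown above (quote text omitted)
sample = pd.DataFrame(
    {
        "polarity": [0.0, 0.3, 0.003788, -0.05, -0.277778],
        "negative": [0.0, 0.0, 0.109, 0.124, 0.233],
    }
)

# Hypothetical thresholds: a VADER negative share of at least 0.2
# combined with a TextBlob polarity below zero
strictly_negative = sample.query("negative >= 0.2 and polarity < 0")
print(strictly_negative)
```

On the full `quotes` DataFrame the same `query` string applies unchanged; how many rows it returns depends on the thresholds chosen.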


In [17]:
quotes.describe()

Unnamed: 0,polarity,subjectivity,positive,negative,neutral,compound
count,100.0,100.0,100.0,100.0,100.0,100.0
mean,0.107863,0.446212,0.13893,0.09063,0.77049,0.160769
std,0.254294,0.319113,0.134112,0.109208,0.167939,0.481998
min,-0.8,0.0,0.0,0.0,0.43,-0.9189
25%,0.0,0.15,0.0,0.0,0.64125,-0.03345
50%,0.050481,0.496875,0.1335,0.0585,0.769,0.04265
75%,0.261719,0.6375,0.23425,0.15225,0.90975,0.509775
max,0.766667,1.0,0.485,0.425,1.0,0.9951


## Text Analysis

In [18]:
from textblob import TextBlob
import nltk
from nltk.corpus import stopwords
from collections import Counter
from textstat import flesch_reading_ease

In [19]:
# Download NLTK resources (uncomment if not downloaded)
# nltk.download('punkt')
# nltk.download('punkt_tab')  # newer NLTK releases use this tokenizer resource instead of 'punkt'
# nltk.download('stopwords')

In [20]:
# Load data from the CSV file
quotes_df = pd.read_csv('quotes.csv')

In [21]:
# Perform sentiment analysis
sentiments = []
for quote in quotes_df['quote']:
    blob = TextBlob(quote)
    sentiments.append(blob.sentiment.polarity)

In [22]:
# Determine overall sentiment
overall_sentiment = sum(sentiments) / len(sentiments)

In [23]:
# Tokenize the quotes and remove stopwords
stop_words = set(stopwords.words('english'))
word_tokens = nltk.word_tokenize(' '.join(quotes_df['quote']).lower())
filtered_tokens = [word for word in word_tokens if word.isalnum() and word not in stop_words]

In [24]:
# Identify frequently used words
word_freq = Counter(filtered_tokens).most_common(10)

# Determine readability level
readability_score = flesch_reading_ease(' '.join(quotes_df['quote']))

# Display results
print("Overall sentiment:", overall_sentiment)
print("Top 10 most frequent words:", word_freq)
print("Readability score:", readability_score)

Overall sentiment: 0.10786328241203245
Top 10 most frequent words: [('love', 20), ('one', 13), ('never', 13), ('think', 12), ('life', 11), ('make', 10), ('like', 9), ('good', 8), ('live', 7), ('know', 7)]
Readability score: 83.46
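
To make the readability score easier to interpret, here is a minimal sketch of the standard Flesch Reading Ease formula computed from raw counts. Higher scores mean easier text, and a value in the 80 to 90 range is commonly described as easy to read. The function name and the sample counts are illustrative; `textstat` does its own sentence and syllable counting, so its result for the same text may differ slightly.

```python
def flesch_reading_ease_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease from raw counts.

    score = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Illustrative counts, not measured from quotes.csv
score = flesch_reading_ease_from_counts(words=100, sentences=5, syllables=130)
print(round(score, 2))
```

Shorter sentences and fewer syllables per word both push the score up, which is consistent with short quotes scoring as fairly easy to read.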
