# Sentiment analysis using Text Blob

Polarity: determines the sentiment of the text which value lie in [-1, 1] where -1 denotes a highly negative sentiment
and 1 denotes a highly positive sentiment.
Subjectivity: this relates whether the text is factual or personal opinion. This value lies in between [0,1] where 0 represents the factual information and 1 represents the personal opinion.



In [1]:
pip install textblob

Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install --upgrade pip

Note: you may need to restart the kernel to use updated packages.


In [3]:
from textblob import TextBlob

In [4]:
text_1 = "I am so happy today."
text_2 = "I don't like the food of that restaurant"

#Determining the polarity of the abovementioned texts
p_1 = TextBlob(text_1).sentiment.polarity
p_2 = TextBlob(text_2).sentiment.polarity

#Exploring the Subjectivity
s_1 = TextBlob(text_1).sentiment.subjectivity
s_2 = TextBlob(text_2).sentiment.subjectivity

print("Polarity of Text 1 is", p_1)
print("Polarity of Text 2 is", p_2)
print("Subjectivity of Text 1 is", s_1)
print("Subjectivity of Text 2 is", s_1)

Polarity of Text 1 is 0.8
Polarity of Text 2 is 0.0
Subjectivity of Text 1 is 1.0
Subjectivity of Text 2 is 1.0


From the above result, we can extract the information that the first sentence showing the positivity and hence the value is approaching towards 1 where as the second sentense is neutral in case of polarity (i.e., not much negativity and positivity). Talking about sensitivity, both the sentences show personal opinion and not factual information.

# Sentiment analysis using VADER
    

VADER (Valence aware Dictionary and sEntiment Reasoner)

First install the required library:


In [5]:
pip install vaderSentiment

Note: you may need to restart the kernel to use updated packages.


In [6]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

In [7]:
sentiment = SentimentIntensityAnalyzer()
text_1 = "This book was really amazing and worth it to read."
text_2 = "That burger was really bad."

sent_1 = sentiment.polarity_scores(text_1)
sent_2 = sentiment.polarity_scores(text_2)

print("Sentiment of the text 1:", sent_1)
print("Sentiment of the text 2:", sent_2)



Sentiment of the text 1: {'neg': 0.0, 'neu': 0.561, 'pos': 0.439, 'compound': 0.7397}
Sentiment of the text 2: {'neg': 0.487, 'neu': 0.513, 'pos': 0.0, 'compound': -0.5849}


The above result shows the dictionary of the sentiment scores from the abovementioned texts. From the result, it can be said that text 1 is positive and text 2 is negative. 

# Bag of Words Vectorization-Based Models

For this technique, we need to pre-process the text of training data, creating a bag of words and train a suitable classification model.

In [8]:
#loading the datasets

import pandas as pd
data = pd.read_csv('data.csv')

#Visualize the data
data.head()
data



Unnamed: 0,Sentence,Sentiment
0,The GeoSolutions technology will leverage Bene...,positive
1,"$ESI on lows, down $1.50 to $2.50 BK a real po...",negative
2,"For the last quarter of 2010 , Componenta 's n...",positive
3,According to the Finnish-Russian Chamber of Co...,neutral
4,The Swedish buyout firm has sold its remaining...,neutral
...,...,...
5837,RISING costs have forced packaging producer Hu...,negative
5838,Nordic Walking was first used as a summer trai...,neutral
5839,"According shipping company Viking Line , the E...",neutral
5840,"In the building and home improvement trade , s...",neutral


In [9]:
#Pre-processing and bag of word vectorization using Count Vectorizer
from sklearn.feature_extraction.text import CountVectorizer

In [10]:
from nltk.tokenize import RegexpTokenizer

In [11]:
token = RegexpTokenizer(r'[a-zA-Z0-9]+')  #Pre-processing with tokenization

In [12]:
cv = CountVectorizer(stop_words ='english', ngram_range = (1,1), tokenizer = token.tokenize)


In [13]:
text_counts = cv.fit_transform(data['Sentence'])

In [14]:
#Splitting the data for training and testing set

from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_text = train_test_split(text_counts, data['Sentiment'], test_size = 0.3, random_state =5)


In [15]:
#training the model
from sklearn.naive_bayes import MultinomialNB
MNB = MultinomialNB()
MNB.fit(X_train, Y_train)

MultinomialNB()

In [16]:
#calculating the accuracy score of the model
from sklearn import metrics
predicted = MNB.predict(X_test)
accuracy_score = metrics.accuracy_score(predicted, Y_text)
print("Accuracy Score:", accuracy_score)

Accuracy Score: 0.6782658300057045


The data used in this model has been obtained from kaggle. Link: https://www.kaggle.com/datasets/sbhatti/financial-sentiment-analysis?resource=download&select=data.csv

# LSTM Based Models

In [17]:
#Importing the libraries
#import nltk
#import pandas as pd
#from textblob import Word
#from nltk.corpus import stopwords
#from sklearn.preprocessing import LabelEncoder
#from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
#from keras.models import Sequential
#from keras.preprocessing.text import Tokenizer
#from keras.preprocessing.sequence import pad_sequences
#from keras.layers import Dense, Embedding, LSTM, SpatialDropout1D
#from sklearn.model_selection import train_test_split
