<a href="https://colab.research.google.com/github/man-007/Sentiment-Analysis/blob/main/sentiment-analysis-with-Textblob.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Introduction**

TextBlob is a Python library for processing textual data. It provides a simple API for common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.



# Imports

In [47]:
# The main package to help us with our text analysis
from textblob import TextBlob

# For reading input files in CSV format
import csv

# For doing cool regular expressions
import re

# For sorting dictionaries
import operator

# For loading and manipulating the dataset
import pandas as pd


# For plotting results
import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt

# Initialize an empty list to hold all of our tweets
tweets = []



In [48]:
def strip_non_ascii(string):
    ''' Returns the string without non-ASCII characters (only code points 1-126 are kept)'''
    stripped = (c for c in string if 0 < ord(c) < 127)
    return ''.join(stripped)

# Data Processing

Here we use tweets from Twitter to train our model.

Context: the dataset contains 1,600,000 tweets extracted using the Twitter API. Each tweet is annotated in the `target` column with 0 (negative) or 4 (positive).


In [49]:
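# The file has no header row, so the column names are passed explicitly.
# The path assumes Google Drive is mounted at /content/drive (Colab).
df = pd.read_csv("/content/drive/MyDrive/training.1600000.processed.noemoticon.csv",
                 encoding='ISO-8859-1',
                 names=['target', 'id', 'date', 'flag', 'user', 'text'])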
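The `names=` argument is needed because the file has no header row: each line holds six comma-separated fields in the order `target, id, date, flag, user, text`, and a field that contains a comma has to be quoted. The sketch below parses a made-up two-line excerpt in that layout with Python's `csv` module; the rows and the `TARGET_NAMES` mapping (0 = negative, 4 = positive, following the dataset's 0/4 annotation) are illustrative assumptions. It also shows why `ISO-8859-1` is a safe encoding for reading: Latin-1 maps each of the 256 byte values to exactly one character, so decoding never fails (although text that was really UTF-8 may render oddly).

```python
import csv
import io

COLUMNS = ['target', 'id', 'date', 'flag', 'user', 'text']
# Assumed mapping of the dataset's target codes (0 = negative, 4 = positive)
TARGET_NAMES = {0: 'negative', 4: 'positive'}

# Hypothetical two-line excerpt in the file's six-column layout (no header row)
sample = (
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","my whole body feels itchy"\n'
    '"4","2","Mon Apr 06 22:22:45 PDT 2009","NO_QUERY","user_b","what a lovely day, thanks!"\n'
)

# The column names have to come from us, as with names=... in read_csv
rows = [dict(zip(COLUMNS, fields)) for fields in csv.reader(io.StringIO(sample))]
for row in rows:
    print(row['user'], TARGET_NAMES[int(row['target'])], row['text'])

# Latin-1 maps each of the 256 byte values to one character, so decoding cannot fail
print(len(bytes(range(256)).decode('ISO-8859-1')))  # 256
```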

In [50]:
df.head()

|   | target | id | date | flag | user | text |
|---|---|---|---|---|---|---|
| 0 | 0 | 1467810369 | Mon Apr 06 22:19:45 PDT 2009 | NO_QUERY | \_TheSpecialOne\_ | @switchfoot http://twitpic.com/2y1zl - Awww, t... |
| 1 | 0 | 1467810672 | Mon Apr 06 22:19:49 PDT 2009 | NO_QUERY | scotthamilton | is upset that he can't update his Facebook by ... |
| 2 | 0 | 1467810917 | Mon Apr 06 22:19:53 PDT 2009 | NO_QUERY | mattycus | @Kenichan I dived many times for the ball. Man... |
| 3 | 0 | 1467811184 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | ElleCTF | my whole body feels itchy and like its on fire |
| 4 | 0 | 1467811193 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | Karoli | @nationwideclass no, it's not behaving at all.... |


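Cleaning the data: `text_clean` lowercases each tweet and removes text in square brackets, ASCII punctuation, words containing digits, curly quotes and ellipses, newlines, and any remaining non-ASCII characters.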

In [51]:
import re
import string

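def text_clean(text):
  text = text.lower()
  text = re.sub(r'\[.*?\]', '', text)                               # remove text in square brackets
  text = re.sub('[%s]' % re.escape(string.punctuation), '', text)   # remove ASCII punctuation
  text = re.sub(r'\w*\d\w*', '', text)                              # remove words containing digits
  text = re.sub('[‘’“”…]', '', text)                                # remove curly quotes and ellipses
  text = re.sub('\n', ' ', text)                                    # replace newlines with spaces
  text = strip_non_ascii(text)                                      # drop remaining non-ASCII characters
  return text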

cleaned1 = lambda x:text_clean(x)
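To see what each step does, here is a self-contained copy of the two cleaning functions applied to a made-up tweet (the sample string is hypothetical). It also shows two side effects of the cleaning: removed tokens leave runs of spaces behind, which the TF-IDF tokenizer later ignores, and accented letters are dropped rather than transliterated, so `café` becomes `caf`.

```python
import re
import string

def strip_non_ascii(s):
    """Keep only characters with code points 1-126."""
    return ''.join(c for c in s if 0 < ord(c) < 127)

def text_clean(text):
    text = text.lower()
    text = re.sub(r'\[.*?\]', '', text)                              # text in square brackets
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # ASCII punctuation
    text = re.sub(r'\w*\d\w*', '', text)                             # words containing digits
    text = re.sub('[‘’“”…]', '', text)                               # curly quotes and ellipses
    text = re.sub('\n', ' ', text)                                   # newlines become spaces
    return strip_non_ascii(text)                                     # remaining non-ASCII

sample = "@Kenichan I dived 2day [pic] for the ball… Man, it’s 100% worth it! Café"
print(repr(text_clean(sample)))
# 'kenichan i dived   for the ball man its  worth it caf'
```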


In [52]:
df["cleaned_content"] = df.text.apply(cleaned1)

In [53]:
df.head()

|   | target | id | date | flag | user | text | cleaned_content |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1467810369 | Mon Apr 06 22:19:45 PDT 2009 | NO_QUERY | \_TheSpecialOne\_ | @switchfoot http://twitpic.com/2y1zl - Awww, t... | switchfoot awww thats a bummer you shoulda ... |
| 1 | 0 | 1467810672 | Mon Apr 06 22:19:49 PDT 2009 | NO_QUERY | scotthamilton | is upset that he can't update his Facebook by ... | is upset that he cant update his facebook by t... |
| 2 | 0 | 1467810917 | Mon Apr 06 22:19:53 PDT 2009 | NO_QUERY | mattycus | @Kenichan I dived many times for the ball. Man... | kenichan i dived many times for the ball manag... |
| 3 | 0 | 1467811184 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | ElleCTF | my whole body feels itchy and like its on fire | my whole body feels itchy and like its on fire |
| 4 | 0 | 1467811193 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | Karoli | @nationwideclass no, it's not behaving at all.... | nationwideclass no its not behaving at all im ... |


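Next, we add a `sentiment` column with three possible values: `positive`, `negative` and `neutral`.

Each label comes from TextBlob's polarity score for the cleaned tweet: a polarity above 0.1 is `positive`, below -0.1 is `negative`, and anything in between is `neutral`. These TextBlob labels, rather than the dataset's own `target` column, are what the model is trained on.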


In [54]:
def to_sentiment(content):
  # TextBlob polarity ranges from -1.0 (most negative) to 1.0 (most positive)
  polarity = TextBlob(content).sentiment.polarity

  # Scores between -0.1 and 0.1 (inclusive) are treated as neutral
  if polarity > 0.1:
      return 'positive'
  elif polarity < -0.1:
      return 'negative'
  else:
      return 'neutral'

df['sentiment'] = df.cleaned_content.apply(to_sentiment)
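The thresholds in `to_sentiment` are strict inequalities, so a polarity of exactly 0.1 or -0.1 counts as neutral, and text containing no words from TextBlob's sentiment lexicon, which typically scores 0.0, also lands in `neutral`. The pure-Python sketch below isolates that mapping; `label_from_polarity` and its `threshold` parameter are hypothetical names, and the polarity values are made up rather than produced by TextBlob.

```python
def label_from_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a label, mirroring to_sentiment's thresholds."""
    if polarity > threshold:
        return 'positive'
    elif polarity < -threshold:
        return 'negative'
    return 'neutral'

for p in [-0.5, -0.1, 0.0, 0.1, 0.1001, 0.8]:
    print(p, label_from_polarity(p))
```

Widening `threshold` moves more tweets into the neutral class; narrowing it does the opposite.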

In [55]:
df.head(10)

|   | target | id | date | flag | user | text | cleaned_content | sentiment |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1467810369 | Mon Apr 06 22:19:45 PDT 2009 | NO_QUERY | \_TheSpecialOne\_ | @switchfoot http://twitpic.com/2y1zl - Awww, t... | switchfoot awww thats a bummer you shoulda ... | positive |
| 1 | 0 | 1467810672 | Mon Apr 06 22:19:49 PDT 2009 | NO_QUERY | scotthamilton | is upset that he can't update his Facebook by ... | is upset that he cant update his facebook by t... | neutral |
| 2 | 0 | 1467810917 | Mon Apr 06 22:19:53 PDT 2009 | NO_QUERY | mattycus | @Kenichan I dived many times for the ball. Man... | kenichan i dived many times for the ball manag... | positive |
| 3 | 0 | 1467811184 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | ElleCTF | my whole body feels itchy and like its on fire | my whole body feels itchy and like its on fire | positive |
| 4 | 0 | 1467811193 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | Karoli | @nationwideclass no, it's not behaving at all.... | nationwideclass no its not behaving at all im ... | negative |
| 5 | 0 | 1467811372 | Mon Apr 06 22:20:00 PDT 2009 | NO_QUERY | joy_wolf | @Kwesidei not the whole crew | kwesidei not the whole crew | positive |
| 6 | 0 | 1467811592 | Mon Apr 06 22:20:03 PDT 2009 | NO_QUERY | mybirch | Need a hug | need a hug | neutral |
| 7 | 0 | 1467811594 | Mon Apr 06 22:20:03 PDT 2009 | NO_QUERY | coZZ | @LOLTrish hey long time no see! Yes.. Rains a... | loltrish hey long time no see yes rains a bit... | positive |
| 8 | 0 | 1467811795 | Mon Apr 06 22:20:05 PDT 2009 | NO_QUERY | 2Hood4Hollywood | @Tatiana_K nope they didn't have it | tatianak nope they didnt have it | neutral |
| 9 | 0 | 1467812025 | Mon Apr 06 22:20:09 PDT 2009 | NO_QUERY | mimismo | @twittera que me muera ? | twittera que me muera | neutral |


# Training Model

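We train a scikit-learn pipeline that converts each cleaned tweet into TF-IDF features and classifies it with logistic regression. Because the `sentiment` labels come from TextBlob, the model learns to predict TextBlob's labels from the text.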
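As a sketch of what the classifier computes, assuming the scikit-learn defaults this notebook keeps (an L2 penalty with `C=1.0`, and a multinomial model because there are three classes and the `saga` solver is used): each TF-IDF vector $\mathbf{x}$ gets one linear score per class, a softmax turns the scores into probabilities, and the predicted label is the most probable class. Following the formulation in the scikit-learn user guide, training minimizes the L2-penalized cross-entropy over the $n$ training tweets.

$$
P(y = k \mid \mathbf{x}) = \frac{\exp(\boldsymbol{\beta}_k^{\top}\mathbf{x} + b_k)}{\sum_{j \in \{\text{neg},\,\text{neu},\,\text{pos}\}} \exp(\boldsymbol{\beta}_j^{\top}\mathbf{x} + b_j)},
\qquad
\hat{y} = \arg\max_{k} P(y = k \mid \mathbf{x})
$$

$$
\min_{\boldsymbol{\beta},\,\mathbf{b}} \; \frac{1}{2}\sum_{k} \lVert \boldsymbol{\beta}_k \rVert_2^2 \;-\; C \sum_{i=1}^{n} \ln P(y = y_i \mid \mathbf{x}_i)
$$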

In [56]:
from sklearn.model_selection import train_test_split

independent_var=df.cleaned_content
dependent_var = df.sentiment

IV_train,IV_test,DV_train,DV_test = train_test_split(independent_var, dependent_var, test_size=0.2, random_state=225)

print("IV_train:",len(IV_train))
print("IV_test:",len(IV_test))
print("DV_train:",len(DV_train))
print("DV_test:",len(DV_test))

IV_train: 1280000
IV_test: 320000
DV_train: 1280000
DV_test: 320000
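With a float `test_size`, scikit-learn's `train_test_split` shuffles the rows (`shuffle=True` is the default, seeded by `random_state`), puts ceil(test_size × n_samples) of them in the test set and keeps the rest for training, which gives the 1,280,000 / 320,000 counts printed above. The standard-library sketch below follows the same sizing rule; `shuffled_split` is a hypothetical helper and does not reproduce scikit-learn's exact permutation.

```python
import math
import random

def shuffled_split(items, test_size=0.2, seed=225):
    """Hypothetical stand-in for train_test_split: shuffle indices, then cut off a test slice."""
    n_test = math.ceil(test_size * len(items))  # test share, rounded up
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)          # reproducible shuffle
    test = [items[i] for i in order[:n_test]]
    train = [items[i] for i in order[n_test:]]
    return train, test

# Sizes for the 1,600,000-tweet dataset with test_size=0.2
n_test = math.ceil(0.2 * 1_600_000)
print(1_600_000 - n_test, n_test)  # 1280000 320000

train, test = shuffled_split(list(range(10)), test_size=0.25)
print(len(train), len(test))       # 7 3
```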


In [57]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

tve = TfidfVectorizer()
clf = LogisticRegression(solver="saga")

from sklearn.pipeline import Pipeline
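To make the vectorizer step concrete, here is a standard-library sketch of what `TfidfVectorizer()` computes with its default settings: lowercasing, the token pattern `(?u)\b\w\w+\b`, raw term counts, smoothed IDF `ln((1 + n) / (1 + df(t))) + 1`, and L2 normalization of each row. It illustrates those defaults only; scikit-learn's own implementation works on sparse matrices, and the toy corpus and the `tfidf` helper are made up for illustration.

```python
import math
import re
from collections import Counter

# scikit-learn's default token_pattern: runs of two or more word characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")

def tfidf(docs):
    """Sketch of TfidfVectorizer's defaults: lowercase, raw counts, smooth idf, L2 norm."""
    tokenized = [TOKEN_RE.findall(doc.lower()) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})  # features in sorted order
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # smooth_idf=True: idf(t) = ln((1 + n) / (1 + df(t))) + 1
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row))  # norm='l2'
        rows.append([v / norm if norm else 0.0 for v in row])
    return vocab, rows

vocab, rows = tfidf(["good good day", "bad day"])
print(vocab)  # ['bad', 'day', 'good']
for row in rows:
    print([round(v, 3) for v in row])
```

In the toy corpus, `day` appears in both documents and gets the lowest possible IDF (exactly 1), so `good` and `bad` dominate their rows.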

In [58]:
model = Pipeline([('vectorizer',tve),('classifier',clf)])
model.fit(IV_train, DV_train)

from sklearn.metrics import confusion_matrix

prediction = model.predict(IV_test)

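# Rows: TextBlob labels (y_true); columns: model predictions. Label order: negative, neutral, positive
confusion_matrix(DV_test, prediction)

array([[ 47771,   6028,   1659],
       [  5015, 133339,   5834],
       [   927,   5762, 113665]])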

Next, we evaluate the model on the test set using accuracy, weighted precision and weighted recall.

In [59]:
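from sklearn.metrics import accuracy_score, precision_score, recall_score

# Note: scikit-learn metrics expect (y_true, y_pred), but these calls pass (prediction, DV_test).
# Accuracy and weighted recall are unaffected by the order; the printed precision is not the standard weighted precision.
print("Accuracy: ",accuracy_score(prediction, DV_test))
print("Precision: ",precision_score(prediction, DV_test,average='weighted'))
print("Recall: ",recall_score(prediction, DV_test,average='weighted'))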

Accuracy:  0.921171875
Precision:  0.921566831732693
Recall:  0.921171875
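The printed scores can be cross-checked from the confusion-matrix counts with a few lines of standard-library Python. The sketch below implements scikit-learn's `average='weighted'` definition (per-class scores weighted by each class's count in `y_true`) with exact fractions; `COUNTS`, `LABELS` and `weighted_scores` are hypothetical names, and the counts are the test-set confusion-matrix counts keyed by (TextBlob label, predicted label).

```python
from fractions import Fraction

LABELS = ['negative', 'neutral', 'positive']

# Test-set counts from the confusion matrix, keyed by (TextBlob label, predicted label)
COUNTS = {
    ('negative', 'negative'): 47771, ('negative', 'neutral'): 6028, ('negative', 'positive'): 1659,
    ('neutral', 'negative'): 5015, ('neutral', 'neutral'): 133339, ('neutral', 'positive'): 5834,
    ('positive', 'negative'): 927, ('positive', 'neutral'): 5762, ('positive', 'positive'): 113665,
}

def weighted_scores(counts, labels):
    total = sum(counts.values())
    true_n = {k: sum(counts[(k, p)] for p in labels) for k in labels}  # support per class
    pred_n = {k: sum(counts[(t, k)] for t in labels) for k in labels}  # predictions per class
    tp = {k: counts[(k, k)] for k in labels}
    accuracy = Fraction(sum(tp.values()), total)
    # average='weighted': per-class scores weighted by each class's share of y_true
    precision = sum(Fraction(true_n[k], total) * Fraction(tp[k], pred_n[k]) for k in labels)
    recall = sum(Fraction(true_n[k], total) * Fraction(tp[k], true_n[k]) for k in labels)
    return accuracy, precision, recall

acc, prec, rec = weighted_scores(COUNTS, LABELS)
print(f"Accuracy:  {float(acc):.6f}")
print(f"Precision: {float(prec):.6f}")
print(f"Recall:    {float(rec):.6f}")

# Same formula with the label roles reversed, as in precision_score(prediction, DV_test, ...)
swapped = {(p, t): n for (t, p), n in COUNTS.items()}
print(f"Precision (roles reversed): {float(weighted_scores(swapped, LABELS)[1]):.6f}")
```

Weighted recall always equals accuracy, which is why those two printed values match. With the TextBlob labels as `y_true`, weighted precision comes to about 0.92096; reversing the roles, as the calls above do, gives the 0.92157 printed there.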


# **Using the model for sentiment analysis**

In [60]:
example=["I cannot begin to explain my disappointment and frustration with OLA. I have multiple times tried to book a ride that seem to be cheapish then for it to take at least 5mins for someone to accept. Most of the times I lose patience and click cancel ride then I still get charged $10-$17 without even getting a ride. Ola is terrible and a scam. NEVER USE OLA"]
result = model.predict(example)

print(result)

['negative']


In [61]:
example=["Ola's service is very useless, I had booked for the first time but on reaching my place, his payment doubled and tried to call no answer was received from the front."]
result = model.predict(example)

print(result)

['positive']


In [62]:
example=["I took a 2 hour outstation drop ride starting at around 545 am. The extra charges were as under: Additional Time Fare* (for trip exceeding one hour), 1 hour x ₹100/hour = ₹100 Night Time Allowance: ₹200 x 1 night = ₹200 Driver Allowance: ₹250 x 1 day = 250.So I am charged night time time allowance + day time allowance + an extra hour for a 2-hour drive?? All this in addition to 18 Rs/km for 95km!! Why do I feel like I am being totally overcharged??"]
result = model.predict(example)

print(result)

['neutral']
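The example sentences above are passed to `model.predict` as raw text, while the model was trained on `cleaned_content`. The vectorizer lowercases its input and splits it with the default token pattern `(?u)\b\w\w+\b`, so raw and cleaned text can yield different tokens: the apostrophe splits `Ola's` into `ola` plus a one-letter `s` that is discarded, whereas cleaning first turns it into `olas`. The sketch below (standard library only; `simple_clean` is a hypothetical, simplified stand-in for `text_clean` that only lowercases and strips ASCII punctuation) shows the difference on the start of the second example.

```python
import re
import string

# scikit-learn's default TfidfVectorizer token pattern, applied after lowercasing
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")

def tokens(text):
    return TOKEN_RE.findall(text.lower())

def simple_clean(text):
    """Hypothetical, simplified stand-in for text_clean: lowercase and drop ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))

raw = "Ola's service is very useless"
print(tokens(raw))                # ['ola', 'service', 'is', 'very', 'useless']
print(tokens(simple_clean(raw)))  # ['olas', 'service', 'is', 'very', 'useless']
```

Passing cleaned inputs, for example `model.predict([text_clean(t) for t in example])`, keeps prediction-time tokens consistent with the training data.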
