<a href="https://colab.research.google.com/github/shyam1234/AIML_RND/blob/master/Bridgingo_Sentiment_Analysis_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Bridgingo Sentiment Analysis**  
This notebook demonstrates sentiment analysis on a live Twitter feed.
###### Author: Prafulla Malviya (AIML Energetic)

##**Agenda**


1.   NLP (Natural Language Processing)
2.   Sentiment Analysis
3.   Connect Twitter with Colab
4.   Tweets Preprocessing and Cleaning
5.   Visualization from Tweets 



###1. **Natural Language Processing**
Simple definition: NLP is a field of machine learning concerned with the ability of a computer to understand, analyze, manipulate, and potentially generate human language.

####**NLP in Real Life** 


* Information Retrieval (Google finds relevant and similar results).
* Information Extraction (Gmail structures events from emails).
* Machine Translation (Google Translate translates text from one language to another).
* Text Simplification (Rewordify simplifies the meaning of sentences). Shashi Tharoor's tweets could be used (pun intended).
* Sentiment Analysis (Hater News gives us the sentiment of the user).
* Text Summarization (Smmry or Reddit’s autotldr gives a summary of sentences).
* Spam Filter (Gmail filters spam emails separately).
* Auto-Predict (Google Search predicts user search results).
* Auto-Correct (Google Keyboard and Grammarly correct misspelled words).
* Speech Recognition (Google WebSpeech or Vocalware).
* Question Answering (IBM Watson answers a query).
* Natural Language Generation (generation of text from image or video data).




NLTK (Natural Language Toolkit): NLTK is a popular open-source Python package. Rather than building every tool from scratch, you can rely on NLTK for most common NLP tasks.

###2. **Sentiment Analysis** 
the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic, product, etc. is positive, negative, or neutral. **-Oxford dictionary**


There are many methods and algorithms to implement sentiment analysis systems, which can be classified as:


*   **Rule-based** systems that perform sentiment analysis based on a set of manually crafted rules.
*   **Automatic systems** that rely on machine learning techniques to learn from data.
*   **Hybrid systems** that combine both rule based and automatic approaches.



There are mainly two approaches for performing sentiment analysis (as defined by the University of Victoria):

1.   **Lexicon-based**: count the number of positive and negative words in the given text; the larger count determines the sentiment of the text.
2.   **Machine learning based**: develop a classification model trained on a pre-labeled dataset of positive, negative, and neutral examples.
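
The lexicon-based approach can be sketched in a few lines of plain Python. This is only an illustration: the two word sets and the function name `lexicon_sentiment` are hypothetical, and a real lexicon would be far larger and would usually weight words rather than simply count them.

```python
import re

# Hypothetical mini-lexicons, chosen only for illustration
POSITIVE_WORDS = {"good", "great", "thank", "love", "support"}
NEGATIVE_WORDS = {"bad", "hate", "enemy", "terrible", "fight"}

def lexicon_sentiment(text):
    """Label text by comparing the counts of positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    if positives > negatives:
        return "Positive"
    if negatives > positives:
        return "Negative"
    return "Neutral"

print(lexicon_sentiment("Thank you for the great support"))
print(lexicon_sentiment("I hate this terrible weather"))
print(lexicon_sentiment("The meeting is at noon"))
```

A text with no lexicon words, or with equal counts on both sides, falls through to "Neutral" in this sketch.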



### Create the Twitter application for getting live tweets

A. Go to "https://developer.twitter.com" to create the Twitter application.

B. Then go to "Keys and tokens" and copy the **API key**, **API secret key**, **Access token**, and **Access token secret**.


### Let's code now

In [0]:
#Import the libraries 
import tweepy
from textblob import TextBlob
from wordcloud import WordCloud
import pandas as pd
import numpy as np
import re
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

#### **What is tweepy?**
Tweepy is an easy-to-use Python library for accessing the Twitter API. It is great for simple automation and for creating Twitter bots.

#### **What is TextBlob?**
TextBlob is a Python library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

#### **What is WordCloud?**
Word Cloud is a data visualization technique used for representing text data in which the size of each word indicates its frequency or importance. Significant textual data points can be highlighted using a word cloud.

#### **What is Pandas?**
Pandas is one of the most widely used Python libraries in data science. It provides high-performance, easy-to-use data structures and data analysis tools. This notebook uses the pandas DataFrame to store and manipulate the tweets.

#### **What is Matplotlib?**
In short: Matplotlib is used for visualizing the data.

Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy.

#### **What is Polarity?**
Polarity measures the **emotion** expressed in a sentence as a value **from -1 to 1**.

1.   A negative value indicates negative emotion
2.   A value of 0 indicates neutral emotion
3.   A positive value indicates positive emotion

Example
1.   I want to thank our Indians for supporting us in the fight against coronavirus.  (Positive)
2.   We should fight with enemies. (Negative)
3.   We think to support other country some other time. (Seems Neutral)  



#### **What is Subjectivity?**
Subjectivity rates a sentence on a scale **from 0 to 1**, where 0 is fully **objective** and 1 is fully **subjective**.

Subjective sentences generally refer to personal opinion, emotion or judgment whereas objective refers to factual information.

Example
1.   THANK YOU TEXAS  (Subjective)
2.   I am pleased to announce that Congressman Mark will become White House Chief of Staff (Objective)

In [0]:
#Load the twitter application credential
from google.colab import files
uploaded = files.upload()

In [0]:
# Get the data
log = pd.read_csv('login.csv')

In [0]:
log

###3. **Connect Twitter with Colab**

In [0]:
# Twitter API credentials
consumerKey = log['value'][0]
consumerSecret = log['value'][1]
accessToken = log['value'][2]
accessTokenSecret = log['value'][3] 
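
The cell above assumes that `login.csv` has a column named `value` whose first four rows hold the API key, the API secret key, the access token, and the access token secret, in that order. The notebook does not show the file itself, so the layout below is an assumption: the `name` column, the placeholder values, and the helper `read_credentials` are hypothetical. The sketch uses only the standard library to show how the row order maps to the four variables.

```python
import csv
import io

# Hypothetical contents of login.csv; the 'name' column and all values are placeholders
SAMPLE_LOGIN_CSV = """name,value
api_key,YOUR_API_KEY
api_secret_key,YOUR_API_SECRET_KEY
access_token,YOUR_ACCESS_TOKEN
access_token_secret,YOUR_ACCESS_TOKEN_SECRET
"""

def read_credentials(csv_text):
    """Return the 'value' column as a list, preserving row order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["value"] for row in reader]

values = read_credentials(SAMPLE_LOGIN_CSV)
# Row order matters: it must match the indices 0..3 used in the notebook cell
consumer_key, consumer_secret, access_token, access_token_secret = values
print(len(values))
```

Because the credentials are picked by position, reordering the rows in the file would silently swap the keys, so it may be worth keeping the file layout fixed.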

In [0]:
# Create the authentication object
authenticate =  tweepy.OAuthHandler(consumerKey, consumerSecret)

# Set the access token and access token secret
authenticate.set_access_token(accessToken, accessTokenSecret)

# Create the API object while passing in the auth information
api = tweepy.API(authenticate, wait_on_rate_limit= True)

In [0]:
authenticate.access_token

In [0]:
# Extract the 100 most recent tweets from the Twitter user
posts = api.user_timeline(screen_name="realDonaldTrump", count = 100, lang= "en", tweet_mode= "extended")

# Print the five most recent tweets from the account
print("/* Show the five most recent tweets */ \n")
for i, tweet in enumerate(posts[0:5], start=1):
  print(str(i)+')'+tweet.full_text +'\n')

In [0]:
# Create the dataframe with a column called Tweets
df =  pd.DataFrame([tweet.full_text for tweet in posts], columns=['Tweets'])

# Show the first 5 rows of data
df.head()

###4. **Tweets Preprocessing and Cleaning**

In [0]:
# Clean the tweets by removing mentions, '#' signs, retweet markers, and links
# Create the function to clean the tweets
def cleanTxt(text):
  text = re.sub(r'@\w+', '', text)     # Remove @mentions (handles may contain underscores)
  text = re.sub(r'#', '', text)        # Remove the '#' sign but keep the hashtag word
  text = re.sub(r'\bRT\s+', '', text)  # Remove the retweet marker 'RT' (whole word only)
  text = re.sub(r'http\S+', '', text)  # Remove hyperlinks (both http and https)

  return text

#Clean the text
df['Tweets'] = df['Tweets'].apply(cleanTxt)
#Show the cleaned text
df
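
To see what this kind of cleaning does to a single tweet, here is a standalone sketch that needs only the `re` module. The function name `clean_tweet` and the sample tweet are hypothetical, and the final whitespace clean-up is an extra step that the notebook function does not perform.

```python
import re

def clean_tweet(text):
    """Strip mentions, '#' signs, retweet markers, and links from a tweet."""
    text = re.sub(r"@\w+", "", text)       # drop @mentions, including underscores
    text = re.sub(r"#", "", text)          # drop '#' but keep the hashtag word
    text = re.sub(r"\bRT\s+", "", text)    # drop the retweet marker
    text = re.sub(r"http\S+", "", text)    # drop http and https links
    return re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace

sample = "@some_user Great news for #Texas! Details: https://example.com/abc"
print(clean_tweet(sample))
```

The `\w` class matters for mentions: a pattern limited to letters and digits stops at the first underscore and leaves the rest of the handle in the text.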

The sentiment property of TextBlob returns two values, **polarity** and **subjectivity**.

**Polarity** is a float in the range [-1, 1], where 1 means a positive statement and -1 means a negative statement.

**Subjective** sentences generally refer to personal opinion, emotion, or judgment, whereas objective sentences refer to factual information. **Subjectivity** is also a float, in the range [0, 1], where 0 is fully objective and 1 is fully subjective.


In [0]:
# Create the function to get the subjectivity
def getSubjectivity(text):
  return TextBlob(text).sentiment.subjectivity

# Create the function to get the polarity
def getPolarity(text):
  return TextBlob(text).sentiment.polarity

# Create two new columns
df['Sunjectivity'] = df['Tweets'].apply(getSubjectivity)
df['Polarity'] = df['Tweets'].apply(getPolarity)

#Show the new dataframe with new added column
df

In [0]:
# Plot Word Cloud
allWords =  ' '.join([twts for twts in df['Tweets']])
wordCloud = WordCloud( width = 500, height  =300 , random_state = 21, max_font_size = 119).generate(allWords)

plt.imshow(wordCloud, interpolation = 'bilinear')
plt.axis('off')
plt.show()

In [0]:
# Create the function to compute the negative, neutral and positive analysis
def getAnalysis(score):
  if score < 0:
    return 'Negative'
  elif score == 0:
    return 'Neutral'
  else:
    return 'Positive'
  
df['Analysis'] = df['Polarity'].apply(getAnalysis)

#Show the dataframe
df

In [0]:
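# Print all the positive tweets, from the least to the most positive
sortedDf = df.sort_values(by=['Polarity'])
# Filter the sorted frame directly; indexing with sortedDf['Analysis'][i] looks up
# rows by label, which would ignore the sort order
positiveTweets = sortedDf[sortedDf['Analysis'] == 'Positive']['Tweets']
for j, tweet in enumerate(positiveTweets, start=1):
  print(str(j) + ')' + tweet)
  print()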

In [0]:
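# Print all the negative tweets, from the least to the most negative
sortedDf = df.sort_values(by=['Polarity'], ascending=False)
# Filter the sorted frame directly so that the sort order is preserved
negativeTweets = sortedDf[sortedDf['Analysis'] == 'Negative']['Tweets']
for j, tweet in enumerate(negativeTweets, start=1):
  print(str(j) + ')' + tweet)
  print()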


###5. **Visualization from Tweets**

In [0]:
# Plot the polarity and subjectivity
plt.figure(figsize=(8,6))
# Pass the whole columns to scatter in one call instead of plotting point by point
plt.scatter(df['Polarity'], df['Sunjectivity'], color='Blue')

plt.title('Sentiment Analysis')
plt.xlabel('Polarity')
plt.ylabel('Subjectivity')
plt.show()

In [0]:
# Get the percentage of positive tweets
ptweets = df[df.Analysis == 'Positive']
ptweets = ptweets['Tweets']

round((ptweets.shape[0]/df.shape[0])*100,1)

In [0]:
# Get the percentage of negative tweets
ntweets = df[df.Analysis == 'Negative']
ntweets = ntweets['Tweets']

round((ntweets.shape[0]/df.shape[0]*100),1)
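
The labelling rule and the percentage calculations above can also be expressed without pandas. The sketch below is a hedged illustration in plain Python: the helper names `label_polarity` and `label_shares` and the sample scores are hypothetical and not taken from real tweets.

```python
from collections import Counter

def label_polarity(score):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if score < 0:
        return "Negative"
    if score == 0:
        return "Neutral"
    return "Positive"

def label_shares(scores):
    """Return the percentage of each label, rounded to one decimal place."""
    counts = Counter(label_polarity(score) for score in scores)
    return {label: round(count / len(scores) * 100, 1)
            for label, count in counts.items()}

# Hypothetical polarity scores used only to exercise the functions
sample_scores = [0.5, -0.2, 0.0, 0.8, 0.1]
print(label_shares(sample_scores))
```

Computing all three shares in one pass keeps the positive, negative, and neutral percentages consistent with each other.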

In [0]:
# Show the value counts (print is needed because this is not the last expression in the cell)
print(df['Analysis'].value_counts())

#plot and visualize the count
plt.title('Sentiment Analysis')
plt.xlabel('Sentiment')
plt.ylabel('Count')
df['Analysis'].value_counts().plot(kind="bar")
plt.show()

For ref:
a. https://www.analyticsvidhya.com/blog/2018/02/natural-language-processing-for-beginners-using-textblob/