# Sentiment analysis using Bing Liu lexicon
### By Mohammed KASRI

Sentiment analysis aims to determine whether a given text expresses negative, positive, or neutral sentiment. It is a form of text analytics that relies on natural language processing (NLP) and, in many systems, machine learning. Sentiment analysis is also known as “opinion mining” or “emotion artificial intelligence”.

There are two common approaches to sentiment analysis:
<ul>
    <li>Machine Learning</li>
    <li>Rule-Based</li>
</ul>

A rule-based approach usually relies on a lexicon, also known as a dictionary or vocabulary. In sentiment analysis, a sentiment lexicon is a list of words annotated with their sentiment polarity (positive or negative, sometimes with a score that indicates its strength) and, depending on the lexicon, other information such as part-of-speech (POS) tags or emotion categories.
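
As a minimal sketch of the idea, a polarity-only lexicon can be represented as a Python dictionary that maps each word to +1 (positive) or -1 (negative). The entries below are illustrative and not taken from any particular lexicon; `toy_lexicon` and `polarity_of` are hypothetical names.

```python
# Hypothetical toy lexicon: word -> polarity (+1 positive, -1 negative).
# These entries are illustrative only, not copied from a real lexicon.
toy_lexicon = {
    "good": 1,
    "great": 1,
    "best": 1,
    "bad": -1,
    "mess": -1,
    "drivel": -1,
}


def polarity_of(word: str, lexicon: dict) -> int:
    """Return the word's polarity, or 0 if the lexicon does not contain it."""
    return lexicon.get(word.lower(), 0)


tokens = "A great cast wasted on a mess".split()
print([(t, polarity_of(t, toy_lexicon)) for t in tokens])
# [('A', 0), ('great', 1), ('cast', 0), ('wasted', 0), ('on', 0), ('a', 0), ('mess', -1)]
```

Words missing from the lexicon (such as "wasted" here) contribute nothing, which is the main limitation of any lexicon lookup.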

### Widely used sentiment lexicons include:

- Bing Liu’s lexicon
- MPQA subjectivity lexicon
- TextBlob lexicon
- AFINN lexicon
- SentiWordNet lexicon
- VADER lexicon

In this Jupyter notebook, we focus on how to perform sentiment analysis using the Bing Liu lexicon.

### What is the Bing Liu lexicon?

The Bing Liu lexicon (also called the Opinion Lexicon) is a list of English positive and negative opinion words, or sentiment words. It was built to predict the polarity of phrases describing product features; these predictions are then summarized into an overall score for each product feature. The list contains 2006 positive words and 4783 negative words.
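
The lexicon is commonly distributed as two plain-text word lists, one positive and one negative. The sketch below assumes one word per line, with header comment lines starting with `;`; `read_word_list` and `build_lexicon` are hypothetical helpers, and in-memory strings stand in for the real files so the example runs on its own.

```python
import io
from typing import Dict, Iterable, List, TextIO


def read_word_list(stream: TextIO) -> List[str]:
    """Read one word per line, skipping blank lines and ';' comment lines (assumed format)."""
    words = []
    for line in stream:
        word = line.strip()
        if word and not word.startswith(";"):
            words.append(word)
    return words


def build_lexicon(positive: Iterable[str], negative: Iterable[str]) -> Dict[str, int]:
    """Merge the two lists into a single word -> +1/-1 mapping."""
    lexicon = {word: 1 for word in positive}
    # Negative entries are applied second, so a word present in both lists ends up as -1.
    lexicon.update({word: -1 for word in negative})
    return lexicon


# In-memory stand-ins for the two files (contents are illustrative).
positive_file = io.StringIO(";; header comment\n\nbest\ngood\n")
negative_file = io.StringIO(";; header comment\n\ndrivel\nmess\n")

lexicon = build_lexicon(read_word_list(positive_file), read_word_list(negative_file))
print(lexicon)  # {'best': 1, 'good': 1, 'drivel': -1, 'mess': -1}
```

With real files, pass `open(path)` handles instead of the `StringIO` objects.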

### How to use the Bing Liu lexicon?
This part walks through everything needed to run sentiment analysis with the Opinion Lexicon: the dependencies, the dataset, and the implementation itself.

### Load Dependencies

```python
import pandas as pd
import text_normalizer as tn
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
```

Here, `text_normalizer` is a simple local Python script that contains the functions used to preprocess the dataset.
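
The notebook does not show `text_normalizer`, so the following is only a minimal hypothetical stand-in for `normalize_corpus`, not the author's script. It lowercases text and collapses whitespace, which matters here because lexicon lookups are exact string matches: "Best" would not match an entry stored as "best".

```python
import re
from typing import List

# Hypothetical stand-in for text_normalizer.normalize_corpus; the real script
# is not shown in this notebook and may do more (or different) work.
_WHITESPACE = re.compile(r"\s+")


def normalize_text(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace into single spaces."""
    return _WHITESPACE.sub(" ", text.lower()).strip()


def normalize_corpus(corpus: List[str]) -> List[str]:
    """Apply normalize_text to every document in the corpus."""
    return [normalize_text(doc) for doc in corpus]


print(normalize_corpus(["  This is one of   Polanski 's BEST films . "]))
# ["this is one of polanski 's best films ."]
```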

### Load Dataset

```python
data = pd.read_csv('./data/test.tsv', sep='\t', header=None)
data.columns = ['sentiment', 'text']
data.head()
```

|   | sentiment | text |
|---|-----------|------|
| 0 | 0 | no movement , no yuks , not much of anything . |
| 1 | 0 | a gob of drivel so sickly sweet , even the eag... |
| 2 | 0 | gangs of new york is an unapologetic mess , wh... |
| 3 | 0 | we never really feel involved with the story ,... |
| 4 | 1 | this is one of polanski 's best films . |
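
Since `test.tsv` has no header row, the column names can also be passed directly to `pd.read_csv` through its `names` parameter instead of being assigned afterwards. The sketch below uses a two-row in-memory TSV in the same `<label>\t<text>` layout in place of the real file.

```python
import io

import pandas as pd

# Two rows in the same layout as test.tsv: <label>\t<text>, no header row.
sample_tsv = io.StringIO(
    "0\tno movement , no yuks , not much of anything .\n"
    "1\tthis is one of polanski 's best films .\n"
)

# header=None: the first line is data; names= assigns the column labels in one step.
data = pd.read_csv(sample_tsv, sep="\t", header=None, names=["sentiment", "text"])
print(data)
print(data.shape)  # (2, 2)
```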


```python
sentences = data.text.tolist()
sentiments = ['positive' if label == 1 else 'negative' for label in data.sentiment.tolist()]
```
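
The conditional above silently turns any label other than 1 into 'negative'. A stricter alternative, sketched below, uses an explicit mapping that fails loudly on unexpected labels, and then counts the classes; the majority-class share is a useful baseline for judging the accuracy reported at the end. `LABEL_NAMES` and `to_label_names` are hypothetical names, and the label list is illustrative.

```python
from collections import Counter
from typing import List

# Explicit label map: an unexpected label raises KeyError instead of
# silently becoming 'negative'.
LABEL_NAMES = {0: "negative", 1: "positive"}


def to_label_names(labels: List[int]) -> List[str]:
    return [LABEL_NAMES[label] for label in labels]


# Illustrative labels standing in for data.sentiment.tolist().
names = to_label_names([0, 0, 0, 0, 1])
counts = Counter(names)
print(counts)  # Counter({'negative': 4, 'positive': 1})

# Accuracy of always predicting the majority class.
print(max(counts.values()) / len(names))  # 0.8
```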

### Preprocess text

```python
sentences = tn.normalize_corpus(sentences)
```

### Sentiment analysis with Bing Liu lexicon

#### Load lexicon

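```python
# Load the Bing Liu lexicon: one row per word, with a 'positive'/'negative' label.
liu = pd.read_csv('./lexicons/BingLiu.csv')
liu.columns = ['word', 'sentiment']

# Map each word to +1 (positive) or -1 (negative); any label other than
# 'positive' is treated as negative, as in a plain if/else over the rows.
liu_lexicon = {
    word: 1 if sentiment == 'positive' else -1
    for word, sentiment in zip(liu['word'], liu['sentiment'])
}
```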

#### The rule-based method to classify sentiments

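```python
def sa_bingliu_sum(text, lexicon):
    """Classify text by summing the +1/-1 polarities of its words found in the lexicon."""
    score = 0
    for word in text.split():
        if word in lexicon:
            score += int(lexicon[word])

    # Only a strictly positive score is 'positive'; a score of 0 (a tie, or no
    # lexicon words at all) falls through to 'negative'.
    if score > 0:
        return 'positive'
    return 'negative'
```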
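
`sa_bingliu_sum` labels a review 'negative' whenever its score is 0, including reviews that contain no lexicon words at all. That fits this two-class dataset, but the introduction also mentions neutral text. A hypothetical three-way variant (`sa_bingliu_three_way` is not part of the notebook) could keep those cases separate; the lexicon entries are illustrative.

```python
from typing import Dict


def sa_bingliu_three_way(text: str, lexicon: Dict[str, int]) -> str:
    """Sum word polarities; a score of 0 (a tie or no lexicon hits) is reported as 'neutral'."""
    score = sum(lexicon.get(word, 0) for word in text.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Illustrative lexicon entries, not the real Bing Liu data.
lexicon = {"best": 1, "good": 1, "mess": -1}
print(sa_bingliu_three_way("one of the best films", lexicon))  # positive
print(sa_bingliu_three_way("a good mess", lexicon))            # neutral
print(sa_bingliu_three_way("an unapologetic mess", lexicon))   # negative
```

On this binary dataset, 'neutral' predictions would have to be mapped to one of the two classes, or counted as errors, before computing accuracy and the other metrics.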

#### Predict one example

```python
review = sentences[4]
print('Review:', review)
print('Sentiment:', sentiments[4])
print('Predicted sentiment:', sa_bingliu_sum(review, liu_lexicon))
```

```
Review: this is one of polanski 's best films .
Sentiment: positive
Predicted sentiment: positive
```
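
To see why a review received its label, it helps to list the words that actually matched the lexicon. `explain_bingliu` is a hypothetical helper that mirrors the scoring in `sa_bingliu_sum` but also returns the matching words; the lexicon entries below are illustrative.

```python
from typing import Dict, List, Tuple


def explain_bingliu(text: str, lexicon: Dict[str, int]) -> Tuple[int, List[Tuple[str, int]]]:
    """Return the total score and the (word, polarity) pairs that contributed to it."""
    hits = [(word, lexicon[word]) for word in text.split() if word in lexicon]
    return sum(polarity for _, polarity in hits), hits


# Illustrative entries; with the real lexicon, pass liu_lexicon instead.
lexicon = {"best": 1, "mess": -1}
print(explain_bingliu("this is one of polanski 's best films .", lexicon))
# (1, [('best', 1)])
```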


#### Classify sentiments

```python
predicted_sentiments = [sa_bingliu_sum(review, liu_lexicon) for review in sentences]
```
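
Because a score of 0 falls through to 'negative', it is worth checking how many reviews contain no lexicon word at all, since their predictions carry no evidence either way. A minimal sketch; `lexicon_coverage` is a hypothetical helper, and the inputs are illustrative stand-ins for `sentences` and `liu_lexicon`.

```python
from typing import Dict, List


def lexicon_coverage(texts: List[str], lexicon: Dict[str, int]) -> float:
    """Fraction of texts containing at least one lexicon word."""
    if not texts:
        return 0.0
    covered = sum(1 for text in texts if any(word in lexicon for word in text.split()))
    return covered / len(texts)


lexicon = {"best": 1, "mess": -1}
texts = ["one of the best films", "a plot", "a total mess", "we never feel involved"]
print(lexicon_coverage(texts, lexicon))  # 0.5
```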

#### View the results

```python
accuracy = accuracy_score(sentiments, predicted_sentiments)
precision = precision_score(sentiments, predicted_sentiments, average='weighted')
recall = recall_score(sentiments, predicted_sentiments, average='weighted')
f1 = f1_score(sentiments, predicted_sentiments, average='weighted')
print('Accuracy:  {:2.2%}'.format(accuracy))
print('Precision: {:2.2%}'.format(precision))
print('Recall:    {:2.2%}'.format(recall))
print('F1 Score:  {:2.2%}'.format(f1))
```

```
Accuracy:  71.17%
Precision: 71.56%
Recall:    71.17%
F1 Score:  71.03%
```
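
With `average='weighted'`, scikit-learn computes each metric per class and averages the results, weighting each class by its support (its number of true instances). This explains why weighted recall equals accuracy above: the support-weighted sum of per-class recalls reduces to the number of correct predictions divided by the total. The standard-library sketch below reproduces weighted precision and recall on a small illustrative example; `weighted_precision_recall` is a hypothetical helper, not a scikit-learn function.

```python
from collections import Counter
from typing import List, Tuple


def weighted_precision_recall(y_true: List[str], y_pred: List[str]) -> Tuple[float, float]:
    """Per-class precision/recall averaged with weights = class support in y_true."""
    support = Counter(y_true)
    n = len(y_true)
    precision = recall = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, q in zip(y_true, y_pred) if t == cls and q == cls)
        n_pred = sum(1 for q in y_pred if q == cls)
        # Precision is taken as 0 when the class is never predicted
        # (scikit-learn's default does the same, with a warning).
        cls_precision = tp / n_pred if n_pred else 0.0
        cls_recall = tp / n_cls
        precision += cls_precision * n_cls / n
        recall += cls_recall * n_cls / n
    return precision, recall


y_true = ["positive", "positive", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "negative"]
precision, recall = weighted_precision_recall(y_true, y_pred)
accuracy = sum(t == q for t, q in zip(y_true, y_pred)) / len(y_true)
print(round(precision, 4), round(recall, 4), accuracy)  # 0.8333 0.5 0.5
```

Weighted F1 is likewise the support-weighted mean of the per-class F1 scores, not the harmonic mean of the weighted precision and recall, so it need not lie between them.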
