<a href="https://colab.research.google.com/github/simon-clematide/colab-notebooks-for-teaching/blob/main/sentiment-analysis-overview.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [39]:
!pip install flair  # install this first so the runtime does not need a restart later



# Sentiment and Emotion Analysis

1. **Definition**: Sentiment analysis is a computational technique in natural language processing (NLP) that identifies and categorizes opinions or emotions within text data to determine the writer's attitude towards a particular topic, product, or service.

2. **Applications**: It is widely used in business and marketing for brand monitoring, product review analysis, customer feedback, and social media monitoring, helping organizations understand consumer sentiment and preferences.

3. **Techniques and Challenges**: Sentiment analysis often employs machine learning, lexical methods, or a combination of both. It faces challenges such as detecting sarcasm, taking context into account, cultural variation, and nuanced expressions of emotion.

Our running examples from "[Pride and Prejudice](https://www.gutenberg.org/ebooks/42671)" will be:
- a prototypical positive sentence `sent`: "Mr Davis is a handsome man."
- a sentence with a negation `neg`: "Elizabeth didn't feel happy."
- a long literary passage with nuanced expression `para`: "Which do you mean?" and turning round, he looked for a moment at Elizabeth, till catching her eye, he withdrew his own and coldly said, "She is tolerable; but not handsome enough to tempt me; and I am in no humour at present to give consequence to young ladies who are slighted by other men."

In [42]:
sent = "Mr Davis is a handsome man."
neg = "Elizabeth didn't feel happy."
para = '''"Which do you mean?" and turning round, he looked for a moment at Elizabeth, till catching her eye, he withdrew his own and coldly said,
"She is tolerable; but not handsome enough to tempt me; and I am in no humour at present to give consequence to young ladies who are slighted by other men."'''

# Lexical Approaches
Lexical approaches rely on lists of words (or lemmas) that have been annotated with sentiment classes or scores.

## VaderSentiment
VADER (Valence Aware Dictionary and sEntiment Reasoner) gives an overall assessment of a text: which parts are negative, neutral and positive in terms of [words](https://github.com/cjhutto/vaderSentiment/blob/master/vaderSentiment/vader_lexicon.txt) and phrases. These are aggregated into a compound value, an [overall score](https://github.com/cjhutto/vaderSentiment?tab=readme-ov-file#about-the-scoring) between -1 and 1. More at https://github.com/cjhutto/vaderSentiment
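To see how the compound score stays within [-1, 1], here is a minimal sketch of VADER's normalization step. It assumes, based on the vaderSentiment source, that the summed valence x is mapped to x / sqrt(x² + α) with α = 15, that a booster such as "extremely" adds 0.293 to the valence, and that a negation multiplies it by -0.74. The lexicon valences handsome = 2.2 and happy = 2.7 are also assumptions; with them the sketch reproduces the compound scores printed in the cells below.

```python
import math

ALPHA = 15               # normalization constant (assumed from the VADER source)
BOOSTER_INCR = 0.293     # added by boosters such as "extremely" (assumed)
NEGATION_SCALAR = -0.74  # applied after negators such as "didn't" (assumed)

def normalize(score: float, alpha: float = ALPHA) -> float:
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)

# Assumed lexicon valences: handsome = 2.2, happy = 2.7
print(round(normalize(2.2), 4))                    # 0.4939  ("handsome")
print(round(normalize(2.2 + BOOSTER_INCR), 4))     # 0.5413  ("extremely handsome")
print(round(normalize(2.7 * NEGATION_SCALAR), 4))  # -0.4585 ("didn't ... happy")
```

Because the curve flattens for large sums, each additional positive word raises the compound score by less than the previous one.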

In [43]:
!pip install vaderSentiment



In [44]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

In [45]:
vs = analyzer.polarity_scores(sent)
print(sent, vs, sep="\n")

Mr Davis is a handsome man.
{'neg': 0.0, 'neu': 0.61, 'pos': 0.39, 'compound': 0.4939}


In [46]:
vs = analyzer.polarity_scores(neg)
print(neg, vs, sep="\n")

Elizabeth didn't feel happy.
{'neg': 0.5, 'neu': 0.5, 'pos': 0.0, 'compound': -0.4585}


In [47]:
vs = analyzer.polarity_scores(para)
print(para, vs, sep="\n")

"Which do you mean?" and turning round, he looked for a moment at Elizabeth, till catching her eye, he withdrew his own and coldly said,
"She is tolerable; but not handsome enough to tempt me; and I am in no humour at present to give consequence to young ladies who are slighted by other men."
{'neg': 0.113, 'neu': 0.887, 'pos': 0.0, 'compound': -0.7765}


In [48]:
# intensifiers are respected
vs = analyzer.polarity_scores("Mr Davis is an extremely handsome man.")
print(vs)

{'neg': 0.0, 'neu': 0.632, 'pos': 0.368, 'compound': 0.5413}


In [49]:
# Neutral words matter: without the neutral context, 'pos' rises,
# while the compound score stays the same as in the previous cell
vs = analyzer.polarity_scores("extremely handsome")
print(vs)

{'neg': 0.0, 'neu': 0.223, 'pos': 0.777, 'compound': 0.5413}
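The two cells above show that the `neg`/`neu`/`pos` proportions returned by `polarity_scores` depend on how many neutral tokens surround the sentiment words, whereas the compound score does not. A simplified sketch of the proportion step, assuming that VADER adds 1 to every positive token score, subtracts 1 from every negative one, and counts each zero-valence token as 1. The per-token valences in the examples are assumptions chosen to reproduce the outputs in this notebook.

```python
def sift_scores(valences: list) -> dict:
    """Split per-token valences into VADER-style neg/neu/pos proportions."""
    pos_sum, neg_sum, neu_count = 0.0, 0.0, 0
    for v in valences:
        if v > 0:
            pos_sum += v + 1  # +1 balances neutral tokens, which count as 1 each
        elif v < 0:
            neg_sum += v - 1
        else:
            neu_count += 1
    total = pos_sum + abs(neg_sum) + neu_count
    return {"neg": round(abs(neg_sum) / total, 3),
            "neu": round(neu_count / total, 3),
            "pos": round(pos_sum / total, 3)}

print(sift_scores([0, 0, 0, 0, 2.2, 0]))  # Mr Davis is a handsome man
print(sift_scores([0, 2.2 + 0.293]))      # extremely handsome ("extremely" itself is neutral)
print(sift_scores([0, 0, 0, 2.7 * -0.74]))  # Elizabeth didn't feel happy
```

Every extra neutral token enlarges the denominator, so "Mr Davis is an extremely handsome man." gets a lower `pos` share than "extremely handsome" even though both have the same compound score.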


In [50]:
help(SentimentIntensityAnalyzer)

Help on class SentimentIntensityAnalyzer in module vaderSentiment.vaderSentiment:

class SentimentIntensityAnalyzer(builtins.object)
 |  SentimentIntensityAnalyzer(lexicon_file='vader_lexicon.txt', emoji_lexicon='emoji_utf8_lexicon.txt')
 |  
 |  Give a sentiment intensity score to sentences.
 |  
 |  Methods defined here:
 |  
 |  __init__(self, lexicon_file='vader_lexicon.txt', emoji_lexicon='emoji_utf8_lexicon.txt')
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  make_emoji_dict(self)
 |      Convert emoji lexicon file to a dictionary
 |  
 |  make_lex_dict(self)
 |      Convert lexicon file to a dictionary
 |  
 |  polarity_scores(self, text)
 |      Return a float for sentiment strength based on the input text.
 |      Positive values are positive valence, negative value are negative
 |      valence.
 |  
 |  score_valence(self, sentiments, text)
 |  
 |  sentiment_valence(self, valence, sentitext, item, i, sentiments)
 |  
 |  ---------------------

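## TextBlob
Yet another NLP pipeline, with [lexicon-based sentiment analysis](https://textblob.readthedocs.io/en/dev/quickstart.html#sentiment-analysis) that returns a polarity score in [-1.0, 1.0] and a subjectivity score in [0.0, 1.0] (0.0 is very objective, 1.0 very subjective).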
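A toy sketch of averaging-based lexicon scoring, loosely in the spirit of TextBlob's default analyzer and not its actual code. The two polarity values are the ones TextBlob reports in the cells below. The negator list, the -0.5 negation factor (modelled on the Pattern library that TextBlob's analyzer builds on) and the rule that a negator only affects the immediately following word are simplifying assumptions. That last rule shows one way "didn't feel happy" can slip through as positive.

```python
# Hypothetical mini-lexicon: polarities as reported by TextBlob in this notebook
LEXICON = {"handsome": 0.5, "happy": 0.8}
NEGATORS = {"not", "no", "never", "didn't"}  # hypothetical negator list
NEGATION_FACTOR = -0.5                        # assumed, modelled on Pattern

def average_polarity(text: str) -> float:
    """Average the polarity of lexicon words; a negator flips only the next word."""
    tokens = text.lower().replace(".", "").split()
    scores = []
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            polarity = LEXICON[tok]
            if i > 0 and tokens[i - 1] in NEGATORS:
                polarity *= NEGATION_FACTOR
            scores.append(polarity)
    return sum(scores) / len(scores) if scores else 0.0

print(average_polarity("Mr Davis is a handsome man."))   # 0.5
print(average_polarity("not handsome enough"))           # -0.25
print(average_polarity("Elizabeth didn't feel happy."))  # 0.8: "feel" separates negator and "happy"
```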

In [51]:
!pip install textblob




In [52]:
from textblob import TextBlob

In [53]:
sent_result = TextBlob(sent)
print(sent, sent_result.sentiment, sep="\n")

Mr Davis is a handsome man.
Sentiment(polarity=0.5, subjectivity=1.0)


In [54]:
neg_result = TextBlob(neg)
print(neg, neg_result.sentiment, sep="\n")

Elizabeth didn't feel happy.
Sentiment(polarity=0.8, subjectivity=1.0)


In [55]:
para_result = TextBlob(para)
print(para, para_result.sentiment, sep="\n")

"Which do you mean?" and turning round, he looked for a moment at Elizabeth, till catching her eye, he withdrew his own and coldly said,
"She is tolerable; but not handsome enough to tempt me; and I am in no humour at present to give consequence to young ladies who are slighted by other men."
Sentiment(polarity=-0.018749999999999996, subjectivity=0.62625)


In [56]:
result = TextBlob("handsome")
print(result.sentiment)

Sentiment(polarity=0.5, subjectivity=1.0)


## NRCLex
An analyser for [sentiment (positive/negative) and basic emotions](https://github.com/metalcorebear/NRCLex), based on a simple lookup in the NRC emotion lexicon.
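The counting behind `raw_emotion_scores` can be sketched as a plain dictionary lookup. The lexicon below is a hypothetical four-word subset holding only the NRC entries that surface in the `affect_dict` outputs further down, and the regex tokenization is an assumption; NRCLex's own preprocessing may differ.

```python
import re
from collections import Counter

# Hypothetical subset of the NRC lexicon: only entries seen in this notebook's outputs
NRC_SUBSET = {
    "happy": ["anticipation", "joy", "positive", "trust"],
    "coldly": ["negative"],
    "present": ["anticipation", "joy", "positive", "surprise", "trust"],
    "young": ["anticipation", "joy", "positive", "surprise"],
}

def emotion_counts(text: str):
    """Return (label counts, word -> labels) for the words found in the lexicon."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w in NRC_SUBSET]
    counts = Counter(label for w in words for label in NRC_SUBSET[w])
    affect = {w: NRC_SUBSET[w] for w in words}
    return dict(counts), affect

print(emotion_counts("Mr Davis is a handsome man."))   # nothing found
print(emotion_counts("Elizabeth didn't feel happy."))  # negation is ignored
```

A plain lookup cannot tell word senses apart: in `para`, "at present" means "now", yet "present" still contributes anticipation, joy and surprise.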

In [57]:
!pip install NRCLex



In [58]:
import nltk
nltk.download('punkt')
from nrclex import NRCLex

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [59]:
analysis = NRCLex(sent)
print(sent, analysis.raw_emotion_scores, analysis.affect_dict, sep="\n")

Mr Davis is a handsome man.
{}
{}


In [60]:
analysis = NRCLex(neg)
print(neg, analysis.raw_emotion_scores, analysis.affect_dict, sep="\n")

Elizabeth didn't feel happy.
{'anticipation': 1, 'joy': 1, 'positive': 1, 'trust': 1}
{'happy': ['anticipation', 'joy', 'positive', 'trust']}


In [61]:
analysis = NRCLex(para)
print(para, analysis.raw_emotion_scores, analysis.affect_dict, sep="\n")

"Which do you mean?" and turning round, he looked for a moment at Elizabeth, till catching her eye, he withdrew his own and coldly said,
"She is tolerable; but not handsome enough to tempt me; and I am in no humour at present to give consequence to young ladies who are slighted by other men."
{'negative': 1, 'anticipation': 2, 'joy': 2, 'positive': 2, 'surprise': 2, 'trust': 1}
{'coldly': ['negative'], 'present': ['anticipation', 'joy', 'positive', 'surprise', 'trust'], 'young': ['anticipation', 'joy', 'positive', 'surprise']}


# Machine Learning Approaches
These models are typically trained on labelled datasets such as IMDB movie reviews or tweets.
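To make "trained on a dataset" concrete, here is a minimal multinomial Naive Bayes classifier with add-one smoothing, written from scratch. The four training sentences are hypothetical stand-ins for a real corpus such as IMDB; the tools below use far larger datasets and neural models rather than Naive Bayes.

```python
import math
from collections import Counter

# Hypothetical toy training set of (label, text) pairs
TRAIN = [
    ("pos", "a handsome and charming man"),
    ("pos", "she was happy and delighted"),
    ("neg", "cold and proud and disagreeable"),
    ("neg", "not handsome enough to tempt me"),
]

def tokenize(text: str) -> list:
    return text.lower().split()

def train_nb(data):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(label for label, _ in data)
    word_counts = {label: Counter() for label in class_counts}
    for label, text in data:
        word_counts[label].update(tokenize(text))
    vocab = set().union(*word_counts.values())
    return class_counts, word_counts, vocab

def predict_nb(model, text: str) -> str:
    """Pick the class with the highest log prior + log likelihood (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        n_words = sum(word_counts[label].values())
        score = math.log(count / n_docs)  # log prior
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_nb(TRAIN)
print(predict_nb(model, "a charming man"))  # pos
print(predict_nb(model, "cold and proud"))  # neg
```

Unlike the hand-built lexicons above, the word weights here are learned from labelled examples, which is also why such models inherit the vocabulary and biases of their training data.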

## spaCy
A [spaCy-based text categorization](https://github.com/Vishnunkumar/eng_spacysentiment) (`textcat`) pipeline (tokenization and text classification), trained on the IMDB movie review dataset.
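In spaCy, `doc.cats` is a plain dictionary from label to score, so turning it into a decision is ordinary Python. A small sketch that returns the top label and its margin over the runner-up; the scores are copied from the outputs below. The three scores of this classifier sum to 1, so a small margin signals an uncertain decision, as for `sent` (positive vs. neutral).

```python
def top_label(cats: dict) -> tuple:
    """Return the best label and its score margin over the second-best label."""
    ranked = sorted(cats.items(), key=lambda item: item[1], reverse=True)
    (best, best_score), (_, runner_up) = ranked[0], ranked[1]
    return best, best_score - runner_up

# Scores as printed for `sent` and `para` below
sent_cats = {'positive': 0.4639013409614563, 'negative': 0.0801813006401062, 'neutral': 0.4559173583984375}
para_cats = {'positive': 0.09069690853357315, 'negative': 0.9056900143623352, 'neutral': 0.003613138571381569}

print(top_label(sent_cats))  # positive, but only about 0.008 ahead of neutral
print(top_label(para_cats))  # negative, by a wide margin
```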

In [62]:
!pip install eng-spacysentiment



In [63]:
import eng_spacysentiment
nlp = eng_spacysentiment.load()

In [64]:
sent_doc = nlp(sent)
print(sent, sent_doc.cats, sep="\n")

Mr Davis is a handsome man.
{'positive': 0.4639013409614563, 'negative': 0.0801813006401062, 'neutral': 0.4559173583984375}


In [65]:
neg_doc = nlp(neg)
print(neg, neg_doc.cats, sep="\n")

Elizabeth didn't feel happy.
{'positive': 0.5659438371658325, 'negative': 0.20362025499343872, 'neutral': 0.23043592274188995}


In [66]:
para_doc = nlp(para)
print(para, para_doc.cats, sep="\n")

"Which do you mean?" and turning round, he looked for a moment at Elizabeth, till catching her eye, he withdrew his own and coldly said,
"She is tolerable; but not handsome enough to tempt me; and I am in no humour at present to give consequence to young ladies who are slighted by other men."
{'positive': 0.09069690853357315, 'negative': 0.9056900143623352, 'neutral': 0.003613138571381569}


In [67]:
doc = nlp("handsome")
print(doc.cats)

{'positive': 0.2304578423500061, 'negative': 0.13150657713413239, 'neutral': 0.6380355954170227}


In [68]:
# some information on the pipeline
nlp.meta

{'lang': 'eng',
 'name': 'spacysentiment',
 'version': '2.3.0',
 'description': 'sentiment analysis using spacy pipelines',
 'author': 'Vishnu',
 'email': 'vishnunkumar25@gmail.com',
 'url': 'https://github.com/Vishnunkumar/eng_spacysentiment',
 'license': 'MIT',
 'spacy_version': '>=3.5.3,<3.6.0',
 'spacy_git_version': 'Unknown',
 'vectors': {'width': 0,
  'vectors': 0,
  'keys': 0,
  'name': None,
  'mode': 'default'},
 'labels': {'textcat': ['positive', 'negative', 'neutral']},
 'pipeline': ['textcat'],
 'components': ['textcat'],
 'disabled': [],
 'requirements': []}

## flair Sentiment
Another neural approach: flair provides [different models for sentiment analysis](https://flairnlp.github.io/docs/tutorial-basics/tagging-sentiment). Here we load `sentiment-fast`.
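To compare flair's label-plus-confidence output with VADER's signed compound score, one option is to map it onto [-1, 1]. A small helper, assuming the `sentence.to_dict()["labels"]` structure printed below (a list of dicts with `value` and `confidence`); giving NEGATIVE a minus sign is our own convention, not part of flair.

```python
def signed_score(labels: list) -> float:
    """Map [{'value': 'POSITIVE' or 'NEGATIVE', 'confidence': c}] to +c or -c."""
    top = max(labels, key=lambda label: label["confidence"])
    sign = 1.0 if top["value"] == "POSITIVE" else -1.0
    return sign * top["confidence"]

print(signed_score([{'value': 'POSITIVE', 'confidence': 0.993449866771698}]))
print(signed_score([{'value': 'NEGATIVE', 'confidence': 0.9768087863922119}]))
```

A confidence near 1 means the model is sure of the label, not that the text is extremely positive or negative.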

In [69]:
from flair.models import TextClassifier
from flair.data import Sentence

classifier = TextClassifier.load('sentiment-fast')

In [70]:
sentence = Sentence(sent)
classifier.predict(sentence)
print(sent, sentence.to_dict()["labels"], sep="\n")

Mr Davis is a handsome man.
[{'value': 'POSITIVE', 'confidence': 0.993449866771698}]


In [71]:
sentence = Sentence(neg)
classifier.predict(sentence)
print(neg, sentence.to_dict()["labels"], sep="\n")

Elizabeth didn't feel happy.
[{'value': 'NEGATIVE', 'confidence': 0.9768087863922119}]


In [72]:
sentence = Sentence(para)
classifier.predict(sentence)
print(para, sentence.to_dict()["labels"], sep="\n")

"Which do you mean?" and turning round, he looked for a moment at Elizabeth, till catching her eye, he withdrew his own and coldly said,
"She is tolerable; but not handsome enough to tempt me; and I am in no humour at present to give consequence to young ladies who are slighted by other men."
[{'value': 'NEGATIVE', 'confidence': 0.9999345541000366}]


In [73]:
# you can run words and phrases through flair
text = "not handsome enough"
sentence = Sentence(text)
classifier.predict(sentence)
print(text, sentence.to_dict()["labels"], sep="\n")

not handsome enough
[{'value': 'NEGATIVE', 'confidence': 0.9959169030189514}]


## Transformer-based Model on Hugging Face
A fine-tuned RoBERTa-large model. Try it on Hugging Face: https://huggingface.co/siebert/sentiment-roberta-large-english
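For a classifier with several labels, the `transformers` text-classification pipeline by default applies a softmax to the model's output logits and reports the most probable label with its probability as `score`. A stdlib sketch of that last step; the logits and the label order are hypothetical, since we do not inspect the model's raw outputs here.

```python
import math

def softmax(logits: list) -> list:
    """Convert raw logits into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

LABELS = ["NEGATIVE", "POSITIVE"]  # hypothetical label order
logits = [-3.1, 3.4]               # hypothetical raw model outputs

probs = softmax(logits)
best = max(range(len(LABELS)), key=lambda i: probs[i])
result = [{"label": LABELS[best], "score": probs[best]}]
print(result)
```

Because the probability grows exponentially with the gap between logits, a gap of a few units already yields a score close to 1.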

In [74]:
from transformers import pipeline
sentiment_analysis = pipeline("sentiment-analysis", model="siebert/sentiment-roberta-large-english")



In [75]:
print(sent, sentiment_analysis(sent), sep="\n")

Mr Davis is a handsome man.
[{'label': 'POSITIVE', 'score': 0.9985645413398743}]


In [76]:
print(neg, sentiment_analysis(neg), sep="\n")

Elizabeth didn't feel happy.
[{'label': 'NEGATIVE', 'score': 0.9982998967170715}]


In [77]:
print(para, sentiment_analysis(para), sep="\n")

"Which do you mean?" and turning round, he looked for a moment at Elizabeth, till catching her eye, he withdrew his own and coldly said,
"She is tolerable; but not handsome enough to tempt me; and I am in no humour at present to give consequence to young ladies who are slighted by other men."
[{'label': 'NEGATIVE', 'score': 0.9975528120994568}]


# Conclusions
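Lexical approaches have problems with nuanced sentiment and negation: TextBlob and NRCLex missed the negation in `neg`, although VADER's negation rule handled it (and the small spaCy model missed it as well). Trained models typically output rather extreme values and may be biased towards their training data (tweets, reviews).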
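The results can be tabulated by reducing each tool's output to a signed score and checking its sign against the intended polarity (positive for `sent`, negative for `neg` and `para`). The values are transcribed and rounded from the outputs in this notebook; using positive minus negative probability for the spaCy model is our own crude choice, and NRCLex is left out because it returns counts rather than a polarity.

```python
EXPECTED = {"sent": 1, "neg": -1, "para": -1}  # intended polarity signs

# Signed scores transcribed (rounded) from the outputs in this notebook
RESULTS = {
    "VADER (compound)":       {"sent": 0.4939, "neg": -0.4585, "para": -0.7765},
    "TextBlob (polarity)":    {"sent": 0.5,    "neg": 0.8,     "para": -0.0187},
    "spaCy (pos - neg)":      {"sent": 0.3837, "neg": 0.3623,  "para": -0.8150},
    "flair (signed conf.)":   {"sent": 0.9934, "neg": -0.9768, "para": -0.9999},
    "RoBERTa (signed score)": {"sent": 0.9986, "neg": -0.9983, "para": -0.9976},
}

def sign(x: float) -> int:
    return (x > 0) - (x < 0)

def correct_count(scores: dict) -> int:
    """Number of example sentences whose predicted sign matches EXPECTED."""
    return sum(sign(scores[name]) == EXPECTED[name] for name in EXPECTED)

for tool, scores in RESULTS.items():
    row = "  ".join(f"{name}={scores[name]:+.3f}" for name in EXPECTED)
    print(f"{tool:<24} {row}  correct={correct_count(scores)}/3")
```

On these three sentences VADER gets every sign right, TextBlob and the small spaCy model are fooled by the negation in `neg`, and the trained flair and RoBERTa models score close to ±1 throughout.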