<a href="https://colab.research.google.com/github/simon-clematide/colab-notebooks-for-teaching/blob/main/sentiment-analysis-overview.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install flair  # install flair first so the runtime does not need to be restarted later


# Sentiment and Emotion Analysis

1. **Definition**: Sentiment analysis is a computational technique in natural language processing (NLP) that identifies and categorizes opinions or emotions within text data to determine the writer's attitude towards a particular topic, product, or service.

2. **Applications**: It is widely used in business and marketing for brand monitoring, product review analysis, customer feedback analysis, and social media monitoring, helping organizations understand consumer sentiment and preferences.

3. **Techniques and Challenges**: Sentiment analysis often employs machine learning, lexical methods, or a combination of both. It faces challenges such as detecting sarcasm, taking context into account, handling cultural variation, and interpreting nuanced expressions of emotion.

Our running examples from "[Pride and Prejudice](https://www.gutenberg.org/ebooks/42671)" will be:
- a prototypical positive sentence `sent`: "Mr Davis is a handsome man."
- a sentence with a negation `neg`: "Elizabeth didn't feel happy."
- a long literary sentence with nuanced expression `para`: "Which do you mean?" and turning round, he looked for a moment at Elizabeth, till catching her eye, he withdrew his own and coldly said, "She is tolerable; but not handsome enough to tempt me; and I am in no humour at present to give consequence to young ladies who are slighted by other men."

In [20]:
sent = "Mr Davis is a handsome man."
neg = "Elizabeth didn't feel happy."
para = '''"Which do you mean?" and turning round, he looked for a moment at Elizabeth, till catching her eye, he withdrew his own and coldly said,
"She is tolerable; but not handsome enough to tempt me; and I am in no humour at present to give consequence to young ladies who are slighted by other men."'''

# Lexical Approaches
Lexical approaches rely on word or lemma lists (lexicons) in which each entry has been assigned to a sentiment class or given a sentiment score.
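A minimal sketch of the lookup idea, assuming a tiny hypothetical lexicon (`TOY_LEXICON`) and negator list (`NEGATORS`) invented for illustration; neither comes from any of the libraries below. Without negation handling, the `neg` sentence scores as positive. The optional look-back window is a common heuristic; VADER, for instance, checks up to three preceding words for a negator.

```python
# Hypothetical toy lexicon (word -> polarity) and negator list, invented for illustration.
TOY_LEXICON = {"handsome": 1, "happy": 1, "coldly": -1, "slighted": -1}
NEGATORS = {"not", "no", "never", "didn't", "don't", "isn't"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, strip surrounding punctuation (keeps "didn't")."""
    return [token.strip('.,;:!?"\'') for token in text.lower().split()]


def lexicon_score(text: str, negation_window: int = 0) -> int:
    """Sum the polarities of lexicon words.

    If negation_window > 0, a lexicon word's polarity is flipped when a negator
    occurs among the negation_window tokens directly before it.
    """
    tokens = tokenize(text)
    score = 0
    for i, token in enumerate(tokens):
        polarity = TOY_LEXICON.get(token, 0)
        if polarity and negation_window > 0:
            preceding = tokens[max(0, i - negation_window):i]
            if any(t in NEGATORS for t in preceding):
                polarity = -polarity
        score += polarity
    return score


print(lexicon_score("Mr Davis is a handsome man."))                      # 1
print(lexicon_score("Elizabeth didn't feel happy."))                     # 1: negation ignored
print(lexicon_score("Elizabeth didn't feel happy.", negation_window=3))  # -1
```

A window of three tokens is needed here because "feel" sits between "didn't" and "happy".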

## VaderSentiment
VADER gives an overall assessment of a text: it looks up which [words](https://github.com/cjhutto/vaderSentiment/blob/master/vaderSentiment/vader_lexicon.txt) and phrases are negative, neutral or positive and aggregates their scores into a compound value, an [overall score](https://github.com/cjhutto/vaderSentiment?tab=readme-ov-file#about-the-scoring) between -1 and 1. More at https://github.com/cjhutto/vaderSentiment
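VADER sums the valences of the sentiment-bearing words and then squashes the sum into the range [-1, 1] with x / sqrt(x^2 + alpha), alpha = 15. The sketch below reimplements only that normalization step. The valences for "handsome" (2.2) and "happy" (2.7) are assumptions chosen because they reproduce the compound scores printed further down; the booster increment (0.293) and negation scalar (-0.74) are taken to be VADER's constants, so check them against `vaderSentiment.py` before relying on them.

```python
import math


def vader_normalize(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into [-1, 1] (sketch of VADER's compound step)."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


# Assumed values: lexicon valences and VADER constants (verify in vaderSentiment.py)
HANDSOME = 2.2    # valence of "handsome"
HAPPY = 2.7       # valence of "happy"
BOOSTER = 0.293   # increment added for an intensifier such as "extremely"
NEGATION = -0.74  # multiplier for a word preceded by a negator such as "didn't"

print(round(vader_normalize(HANDSOME), 4))            # Mr Davis is a handsome man.
print(round(vader_normalize(HANDSOME + BOOSTER), 4))  # Mr Davis is an extremely handsome man.
print(round(vader_normalize(HAPPY * NEGATION), 4))    # Elizabeth didn't feel happy.
```

Under these assumptions the three lines print 0.4939, 0.5413 and -0.4585, the compound values reported by `polarity_scores` below. Punctuation emphasis ("!", "?"), capitalization and "but" handling are left out of this sketch.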

In [21]:
!pip install vaderSentiment



In [22]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

In [23]:
vs = analyzer.polarity_scores(sent)
print(sent, vs, sep="\n")

Mr Davis is a handsome man.
{'neg': 0.0, 'neu': 0.61, 'pos': 0.39, 'compound': 0.4939}


In [24]:
vs = analyzer.polarity_scores(neg)
print(neg, vs, sep="\n")

Elizabeth didn't feel happy.
{'neg': 0.5, 'neu': 0.5, 'pos': 0.0, 'compound': -0.4585}


In [25]:
vs = analyzer.polarity_scores(para)
print(para, vs, sep="\n")

"Which do you mean?" and turning round, he looked for a moment at Elizabeth, till catching her eye, he withdrew his own and coldly said,
"She is tolerable; but not handsome enough to tempt me; and I am in no humour at present to give consequence to young ladies who are slighted by other men."
{'neg': 0.113, 'neu': 0.887, 'pos': 0.0, 'compound': -0.7765}


In [26]:
# intensifiers are respected
vs = analyzer.polarity_scores("Mr Davis is an extremely handsome man.")
print(vs)

{'neg': 0.0, 'neu': 0.632, 'pos': 0.368, 'compound': 0.5413}


In [27]:
# Neutral words matter: dropping them changes the neu/pos shares,
# but the compound score (driven by the intensified "handsome") stays the same
vs = analyzer.polarity_scores("extremely handsome")
print(vs)

{'neg': 0.0, 'neu': 0.223, 'pos': 0.777, 'compound': 0.5413}
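The two outputs above differ only in their `neu`/`pos` split, not in the compound score. A sketch of how such a split can arise, modelled on VADER's scoring as far as we understand it: each sentiment-bearing token contributes |valence| + 1 to its side, each neutral token counts 1, and all three shares are divided by the same total. The per-token valences (0.0 for neutral tokens, 2.2 + 0.293 for the boosted "handsome", 2.7 * -0.74 for the negated "happy") are assumptions.

```python
def vader_proportions(valences: list[float]) -> dict[str, float]:
    """Sketch of a VADER-style neg/neu/pos split from per-token valences.

    Every sentiment-bearing token adds |valence| + 1 to its side, every neutral
    token (valence 0.0) counts 1, and all three sums share one denominator,
    so extra neutral words shrink 'pos' and 'neg'.
    """
    pos_sum = sum(v + 1 for v in valences if v > 0)
    neg_sum = sum(v - 1 for v in valences if v < 0)
    neu_count = sum(1 for v in valences if v == 0)
    total = pos_sum + abs(neg_sum) + neu_count
    if total == 0:
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0}
    return {
        "neg": round(abs(neg_sum) / total, 3),
        "neu": round(neu_count / total, 3),
        "pos": round(pos_sum / total, 3),
    }


# Assumed valence of "handsome" (2.2) plus VADER's booster increment for "extremely" (0.293)
boosted = 2.2 + 0.293

# "extremely handsome": the intensifier itself is scored as neutral
print(vader_proportions([0.0, boosted]))
# "Mr Davis is an extremely handsome man.": Mr, Davis, is, an, extremely, man are neutral
print(vader_proportions([0.0, 0.0, 0.0, 0.0, 0.0, boosted, 0.0]))
# "Elizabeth didn't feel happy.": assumed valence of "happy" (2.7) times the negation scalar
print(vader_proportions([0.0, 0.0, 0.0, 2.7 * -0.74]))
```

With these assumptions the sketch reproduces the `neg`/`neu`/`pos` values shown in this section. The compound score is computed from the valence sum alone, which is why it stays at 0.5413 whether or not neutral words are present.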


In [28]:
help(SentimentIntensityAnalyzer)

Help on class SentimentIntensityAnalyzer in module vaderSentiment.vaderSentiment:

class SentimentIntensityAnalyzer(builtins.object)
 |  SentimentIntensityAnalyzer(lexicon_file='vader_lexicon.txt', emoji_lexicon='emoji_utf8_lexicon.txt')
 |  
 |  Give a sentiment intensity score to sentences.
 |  
 |  Methods defined here:
 |  
 |  __init__(self, lexicon_file='vader_lexicon.txt', emoji_lexicon='emoji_utf8_lexicon.txt')
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  make_emoji_dict(self)
 |      Convert emoji lexicon file to a dictionary
 |  
 |  make_lex_dict(self)
 |      Convert lexicon file to a dictionary
 |  
 |  polarity_scores(self, text)
 |      Return a float for sentiment strength based on the input text.
 |      Positive values are positive valence, negative value are negative
 |      valence.
 |  
 |  score_valence(self, sentiments, text)
 |  
 |  sentiment_valence(self, valence, sentitext, item, i, sentiments)
 |  
 |  ---------------------

## TextBlob
Yet another NLP library, with [lexicon-based sentiment analysis](https://textblob.readthedocs.io/en/dev/quickstart.html#sentiment-analysis) that reports polarity (from -1 to 1) and subjectivity (from 0 to 1).
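TextBlob's default analyzer is lexicon-based (its lexicon comes from the pattern library) and, roughly speaking, averages the polarity and subjectivity of the words it recognizes. A minimal sketch of that averaging, assuming a two-entry hypothetical lexicon whose values are chosen to reproduce the `sent` and `neg` outputs below; the real analyzer also applies intensifier and negation rules that are omitted here.

```python
from typing import NamedTuple


class Sentiment(NamedTuple):
    polarity: float      # -1.0 (negative) to 1.0 (positive)
    subjectivity: float  # 0.0 (objective) to 1.0 (subjective)


# Hypothetical mini-lexicon; values chosen to match the TextBlob outputs for sent and neg
MINI_LEXICON = {
    "handsome": Sentiment(0.5, 1.0),
    "happy": Sentiment(0.8, 1.0),
}


def average_sentiment(text: str) -> Sentiment:
    """Average polarity and subjectivity over the words found in the lexicon."""
    words = [w.strip('.,;:!?"\'').lower() for w in text.split()]
    hits = [MINI_LEXICON[w] for w in words if w in MINI_LEXICON]
    if not hits:
        return Sentiment(0.0, 0.0)
    return Sentiment(
        polarity=sum(h.polarity for h in hits) / len(hits),
        subjectivity=sum(h.subjectivity for h in hits) / len(hits),
    )


print(average_sentiment("Mr Davis is a handsome man."))
print(average_sentiment("Elizabeth didn't feel happy."))  # "didn't" is simply ignored
```

Because nothing in this sketch flips "happy", the `neg` sentence comes out as `Sentiment(polarity=0.8, subjectivity=1.0)`, the same positive polarity TextBlob reports below.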

In [29]:
!pip install textblob




In [30]:
from textblob import TextBlob

In [31]:
sent_result = TextBlob(sent)
print(sent, sent_result.sentiment, sep="\n")

Mr Davis is a handsome man.
Sentiment(polarity=0.5, subjectivity=1.0)


In [32]:
neg_result = TextBlob(neg)
print(neg, neg_result.sentiment, sep="\n")

Elizabeth didn't feel happy.
Sentiment(polarity=0.8, subjectivity=1.0)


In [33]:
para_result = TextBlob(para)
print(para, para_result.sentiment, sep="\n")

"Which do you mean?" and turning round, he looked for a moment at Elizabeth, till catching her eye, he withdrew his own and coldly said,
"She is tolerable; but not handsome enough to tempt me; and I am in no humour at present to give consequence to young ladies who are slighted by other men."
Sentiment(polarity=-0.018749999999999996, subjectivity=0.62625)


In [34]:
result = TextBlob("handsome")
print(result.sentiment)

Sentiment(polarity=0.5, subjectivity=1.0)


## NRCLex
An analyser for [sentiment (positive/negative) and basic emotions](https://github.com/metalcorebear/NRCLex) based on the NRC emotion lexicon. It works by simple lexical lookup.
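A minimal sketch of lexicon-based emotion counting, roughly the idea behind NRCLex's `raw_emotion_scores` (counts per affect category) and its normalized `affect_frequencies`. The lexicon entries below are invented for illustration, and the sketch does not claim to match NRCLex's exact tokenization or normalization.

```python
from collections import Counter

# Hypothetical entries in the style of the NRC emotion lexicon (word -> affect categories).
# They are invented for illustration; look up the real associations in the lexicon itself.
EMOTION_LEXICON = {
    "happy": ["joy", "positive"],
    "coldly": ["negative"],
    "tempt": ["anticipation"],
    "slighted": ["anger", "negative", "sadness"],
}


def raw_emotion_counts(text: str) -> Counter:
    """Count how often each affect category is triggered by a word in the text."""
    counts = Counter()
    for word in text.lower().split():
        counts.update(EMOTION_LEXICON.get(word.strip('.,;:!?"\''), []))
    return counts


def emotion_frequencies(counts: Counter) -> dict[str, float]:
    """Normalize raw counts so that they sum to 1."""
    total = sum(counts.values())
    return {emotion: n / total for emotion, n in counts.items()} if total else {}


# An excerpt of the para example
text = ('he withdrew his own and coldly said, "She is tolerable; but not handsome enough '
        'to tempt me; and I am in no humour at present to give consequence to young ladies '
        'who are slighted by other men."')
counts = raw_emotion_counts(text)
print(counts)
print(emotion_frequencies(counts))
```

Pure lookup counts "handsome" nowhere and cannot tell that it is negated; only the words present in the lexicon contribute.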

In [35]:
! pip install NRCLex



In [39]:
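import nltk
# NRCLex tokenizes with TextBlob, which needs NLTK tokenizer data and the TextBlob corpora
nltk.download('punkt')
nltk.download('punkt_tab')
!python -m textblob.download_corpora
from nrclex import NRCLex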

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [40]:
analysis = NRCLex(sent)
print(sent, analysis.raw_emotion_scores, analysis.affect_dict, sep="\n")



In [None]:
analysis = NRCLex(neg)
print(neg, analysis.raw_emotion_scores, analysis.affect_dict, sep="\n")

In [None]:
analysis = NRCLex(para)
print(para, analysis.raw_emotion_scores, analysis.affect_dict, sep="\n")

# Machine Learning Approaches
These models are trained on labelled data sets such as IMDB movie reviews or tweets.
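The difference to the lexical approach can be sketched with a multinomial Naive Bayes classifier trained on an invented six-sentence data set: instead of looking words up in a hand-made list, it estimates word-label associations from labelled examples. The models below are neural networks trained on far larger data; this sketch only illustrates the supervised-learning principle, not how those models work.

```python
import math
from collections import Counter

# Hypothetical toy training data (text, label); real models use thousands of reviews or tweets
TRAIN = [
    ("a handsome and charming man", "pos"),
    ("a delightful evening", "pos"),
    ("she was happy and charming", "pos"),
    ("a rude and proud man", "neg"),
    ("a dull tiresome evening", "neg"),
    ("she was slighted and unhappy", "neg"),
]


def train_nb(data):
    """Count documents per label and words per label (multinomial Naive Bayes)."""
    doc_counts = Counter(label for _, label in data)
    word_counts = {label: Counter() for label in doc_counts}
    for text, label in data:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return doc_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-probability, using add-one smoothing."""
    doc_counts, word_counts, vocab = model
    n_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in doc_counts.items():
        score = math.log(n_label / n_docs)  # log prior
        n_words = sum(word_counts[label].values())
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TRAIN)
print(predict_nb(model, "A handsome man"))  # pos
print(predict_nb(model, "a rude man"))      # neg
```

What such a model learns depends entirely on its training data, which is one source of the domain bias mentioned in the conclusions.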

## spaCy
A [spaCy-based text categorization](https://github.com/Vishnunkumar/eng_spacysentiment) (textcat) NLP pipeline (tokenization and text classification), trained on the IMDB movie review dataset.

In [None]:
! pip install eng-spacysentiment

In [None]:
import eng_spacysentiment
nlp = eng_spacysentiment.load()

In [None]:
sent_doc = nlp(sent)
print(sent, sent_doc.cats, sep="\n")

In [None]:
neg_doc = nlp(neg)
print(neg, neg_doc.cats, sep="\n")

In [None]:
para_doc = nlp(para)
print(para, para_doc.cats, sep="\n")

In [None]:
doc = nlp("handsome")
print(doc.cats)

In [None]:
# some information on the pipeline
nlp.meta

## flair Sentiment
Another neural approach: flair offers [several pretrained models for sentiment analysis](https://flairnlp.github.io/docs/tutorial-basics/tagging-sentiment).

In [None]:
from flair.models import TextClassifier
from flair.data import Sentence

classifier = TextClassifier.load('sentiment-fast')

In [None]:
sentence = Sentence(sent)
classifier.predict(sentence)
# sentence.labels holds the predicted label (e.g. POSITIVE or NEGATIVE) with its confidence
print(sent, sentence.labels, sep="\n")

In [None]:
sentence = Sentence(neg)
classifier.predict(sentence)
print(neg, sentence.labels, sep="\n")

In [None]:
sentence = Sentence(para)
classifier.predict(sentence)
print(para, sentence.labels, sep="\n")

In [None]:
# you can also run single words and phrases through flair
text = "not handsome enough"
sentence = Sentence(text)
classifier.predict(sentence)
print(text, sentence.labels, sep="\n")

## Transformer-based Model on Hugging Face
A fine-tuned RoBERTa-large model (SiEBERT). Try it on Hugging Face: https://huggingface.co/siebert/sentiment-roberta-large-english

In [None]:
from transformers import pipeline
sentiment_analysis = pipeline("sentiment-analysis",model="siebert/sentiment-roberta-large-english")



In [None]:
print(sent, sentiment_analysis(sent), sep="\n")

In [None]:
print(neg, sentiment_analysis(neg), sep="\n")

In [None]:
print(para, sentiment_analysis(para), sep="\n")

# Conclusions
Lexical approaches have problems with nuanced sentiment and negation. Trained models typically output rather extreme confidence values and may be biased towards their training data (tweets, reviews).
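To see why trained classifiers tend to report near-certain scores, note that a two-class head typically turns its raw scores (logits) into probabilities with a softmax; in the Hugging Face `sentiment-analysis` pipeline this is the default for single-label models. A modest gap between the two logits is already enough to push the winning probability close to 1. The logits below are invented for illustration.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for (NEGATIVE, POSITIVE)
for logits in ([-1.0, 1.0], [-2.0, 2.0], [-4.0, 4.0]):
    probs = softmax(logits)
    print(logits, [round(p, 4) for p in probs])
```

A gap of 2 between the logits gives a positive probability of about 0.88, while a gap of 8 already gives about 0.9997, so a confident-looking score says little about how nuanced the input was.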