<a href="https://colab.research.google.com/github/simon-clematide/colab-notebooks-for-teaching/blob/main/sentiment-analysis-overview.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
# On a CPU-only runtime, it can be faster to install the CPU build of PyTorch first
%pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# Otherwise use the following line
# (let's install flair first to avoid restarting the runtime later)
%pip install flair


# Sentiment and Emotion Analysis

1. **Definition**: Sentiment analysis is a computational technique in natural language processing (NLP) that identifies and categorizes opinions or emotions within text data to determine the writer's attitude towards a particular topic, product, or service.

2. **Applications**: It's widely used in business and marketing for brand monitoring, product reviews analysis, customer feedback, and social media monitoring, helping organizations understand consumer sentiments and preferences.

3. **Techniques and Challenges**: Sentiment analysis often employs machine learning, lexical methods, or a combination of both. It faces challenges like detecting sarcasm, context, cultural variations, and nuanced expressions of emotions.

Our running examples from "[Pride and Prejudice](https://www.gutenberg.org/ebooks/42671)" will be:
- a prototypical positive sentence `sent`: "Mr Davis is a handsome man."
- a phrase with a negation `neg`: "Elizabeth didn't feel happy."
- a long literary sentence with nuanced expression `para`: "Which do you mean?" and turning round, he looked for a moment at Elizabeth, till catching her eye, he withdrew his own and coldly said, "She is tolerable; but not handsome enough to tempt me; and I am in no humour at present to give consequence to young ladies who are slighted by other men."

In [None]:
sent = "Mr Davis is a handsome man."
neg =  "Elizabeth didn't feel happy."
para = '''"Which do you mean?" and turning round, he looked for a moment at Elizabeth, till catching her eye, he withdrew his own and coldly said,
"She is tolerable; but not handsome enough to tempt me; and I am in no humour at present to give consequence to young ladies who are slighted by other men."'''

# Lexical Approaches
Lexical approaches rely on word or lemma lists whose entries have been assigned to sentiment classes.
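
To make the idea concrete, here is a minimal sketch of a pure lookup scorer. The lexicon `TOY_LEXICON`, its values, and the function name `lexicon_score` are made up for illustration and do not come from any of the libraries below.

```python
import re

# Hypothetical mini-lexicon; real lexicons contain thousands of scored entries.
TOY_LEXICON = {"handsome": 1.0, "happy": 1.0, "tolerable": 0.5, "slighted": -1.0}

def lexicon_score(text):
    """Sum the lexicon values of all tokens; unknown words count as 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(TOY_LEXICON.get(token, 0.0) for token in tokens)

print(lexicon_score("Mr Davis is a handsome man."))   # 1.0
print(lexicon_score("Elizabeth didn't feel happy."))  # 1.0, the negation is ignored
```

Both sentences get the same score because a plain lookup has no notion of negation. Practical lexical tools add rules for this on top of the word lists.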

## VaderSentiment
Gives an overall assessment of a text: how much of it is negative, neutral, or positive, based on a lexicon of [words](https://github.com/cjhutto/vaderSentiment/blob/master/vaderSentiment/vader_lexicon.txt) and phrases. The result is also aggregated into a compound value that gives an [overall score](https://github.com/cjhutto/vaderSentiment?tab=readme-ov-file#about-the-scoring) between -1 and 1. More at https://github.com/cjhutto/vaderSentiment
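
The following sketch shows how a compound score in [-1, 1] can be derived from an unbounded sum of word valences. It assumes the normalization x / sqrt(x² + α) with α = 15 and the ±0.05 thresholds described in the VADER README. The function names and the example valence sums are illustrative, so check the details against the library source before relying on them.

```python
import math

def normalize(raw_score, alpha=15):
    """Squash an unbounded valence sum into the interval [-1, 1]."""
    norm = raw_score / math.sqrt(raw_score * raw_score + alpha)
    return max(-1.0, min(1.0, norm))

def label(compound, threshold=0.05):
    """Map a compound score to a coarse class (thresholds as in the VADER README)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Hypothetical valence sums, not taken from the real lexicon
for raw in (-4.0, 0.0, 2.0, 8.0):
    compound = normalize(raw)
    print(raw, round(compound, 4), label(compound))
```

Because of the square root in the denominator, the compound value saturates: doubling the raw valence sum does not double the compound score.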

In [None]:
%pip install vaderSentiment

In [None]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

In [None]:
vs = analyzer.polarity_scores(sent)
print(sent, vs, sep="\n")

In [None]:
vs = analyzer.polarity_scores(neg)
print(neg, vs, sep="\n")

In [None]:
vs = analyzer.polarity_scores(para)
print(para, vs, sep="\n")

In [None]:
# intensifiers are respected
vs = analyzer.polarity_scores("Mr Davis is an extremely handsome man.")
print(vs)

In [None]:
# Neutral words matter: without them, the pos/neu/neg proportions shift
# (compare with the full sentence above)
vs = analyzer.polarity_scores("extremely handsome")
print(vs)

In [None]:
help(SentimentIntensityAnalyzer)

## TextBlob
Yet another NLP pipeline with [lexicon-based polarity and subjectivity analysis](https://textblob.readthedocs.io/en/dev/quickstart.html#sentiment-analysis).

In [None]:
%pip install textblob


In [None]:
from textblob import TextBlob

In [None]:
sent_result = TextBlob(sent)
print(sent, sent_result.sentiment,sep="\n")

In [None]:
neg_result = TextBlob(neg)
print(neg, neg_result.sentiment,sep="\n")

In [None]:
para_result = TextBlob(para)
print(para, para_result.sentiment,sep="\n")

In [None]:
result = TextBlob("handsome")
print(result.sentiment,sep="\n")

## NRCLex
An analyser for [sentiment (positive/negative) and basic emotions](https://github.com/metalcorebear/NRCLex). Simple lexical lookup.
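
A simple lexical lookup for emotions can be sketched as follows. The mapping `TOY_EMOTION_LEXICON` is invented for illustration and is not the actual NRC lexicon. The helper names are likewise hypothetical and are not NRCLex's API.

```python
import re
from collections import Counter

# Hypothetical word-to-emotions mapping; the real lexicon is much larger.
TOY_EMOTION_LEXICON = {
    "handsome": ["positive", "joy"],
    "happy": ["positive", "joy", "trust"],
    "coldly": ["negative"],
    "slighted": ["negative", "anger", "sadness"],
}

def emotion_counts(text):
    """Count every emotion label attached to any token of the text."""
    counts = Counter()
    for token in re.findall(r"[a-z']+", text.lower()):
        counts.update(TOY_EMOTION_LEXICON.get(token, []))
    return counts

def emotion_frequencies(counts):
    """Turn raw counts into relative frequencies (empty dict if nothing matched)."""
    total = sum(counts.values())
    return {emotion: n / total for emotion, n in counts.items()} if total else {}

counts = emotion_counts("Elizabeth didn't feel happy.")
print(counts)
print(emotion_frequencies(counts))
```

As with the sentiment lookup, the negated sentence is still counted as joyful, since each word is looked up in isolation.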

In [None]:
%pip install NRCLex

In [None]:
import nltk
nltk.download('punkt_tab')
from nrclex import NRCLex

In [None]:
analysis = NRCLex(sent)
print(sent, analysis.raw_emotion_scores, analysis.affect_dict, sep="\n")

In [None]:
analysis = NRCLex(neg)
print(neg, analysis.raw_emotion_scores, analysis.affect_dict, sep="\n")

In [None]:
analysis = NRCLex(para)
print(para, analysis.raw_emotion_scores, analysis.affect_dict, sep="\n")

# Machine Learning Approaches
Typically trained on a variety of training sets: IMDB (Movie reviews), Tweets, etc.
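
To illustrate what "trained" means here, the sketch below fits a tiny bag-of-words Naive Bayes classifier on four made-up examples. This is a stand-in technique chosen for brevity, not the method used by the models in this section, and the training data `TRAIN` is hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy training set; real models use thousands of labelled texts.
TRAIN = [
    ("a wonderful and happy film", "pos"),
    ("handsome cast and wonderful story", "pos"),
    ("a dull and unhappy film", "neg"),
    ("terrible story and dull cast", "neg"),
]

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def train(examples):
    """Collect per-label word counts, label counts, and the vocabulary."""
    word_counts, label_counts = {}, Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab

def predict(text, model):
    """Return label probabilities using add-one smoothed Naive Bayes."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    log_scores = {}
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalise the log scores into probabilities
    top = max(log_scores.values())
    exps = {label: math.exp(score - top) for label, score in log_scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}

model = train(TRAIN)
print(predict("a happy story", model))
```

The classifier only knows words from its training set, which hints at why models trained on movie reviews or tweets may transfer poorly to literary text.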

## spaCy
A [spaCy-based text categorization](https://github.com/Vishnunkumar/eng_spacysentiment) (textcat) NLP pipeline (tokenization and text classification). Trained on the IMDB movie review dataset.

In [None]:
%pip install eng-spacysentiment

In [None]:
import eng_spacysentiment
nlp = eng_spacysentiment.load()

In [None]:
sent_doc = nlp(sent)
print(sent,sent_doc.cats,sep="\n")

In [None]:
neg_doc = nlp(neg)
print(neg,neg_doc.cats,sep="\n")

In [None]:
para_doc = nlp(para)
print(para,para_doc.cats,sep="\n")

In [None]:
doc = nlp("handsome")
print(doc.cats,sep="\n")

In [None]:
# some information on the pipeline
nlp.meta

## flair Sentiment
Another neural approach; flair offers [different models for sentiment analysis](https://flairnlp.github.io/docs/tutorial-basics/tagging-sentiment).

In [None]:
%pip install gensim

In [None]:
from flair.models import TextClassifier
from flair.data import Sentence

classifier = TextClassifier.load('sentiment-fast')

In [None]:
sentence=Sentence(sent)
classifier.predict(sentence)
print(sent, sentence.to_dict()["labels"], sep="\n")

In [None]:
sentence=Sentence(neg)
classifier.predict(sentence)
print(neg, sentence.to_dict()["labels"], sep="\n")

In [None]:
sentence=Sentence(para)
classifier.predict(sentence)
print(para, sentence.to_dict()["labels"], sep="\n")

In [None]:
# you can run words and phrases through flair
text = "not handsome enough"
sentence=Sentence(text)
classifier.predict(sentence)
print(text, sentence.to_dict()["labels"], sep="\n")

## Transformer-based Model on Hugging Face
A fine-tuned RoBERTa model. Try it on Hugging Face: https://huggingface.co/siebert/sentiment-roberta-large-english

In [None]:
from transformers import pipeline
sentiment_analysis = pipeline("sentiment-analysis",model="siebert/sentiment-roberta-large-english")



In [None]:
print(sent, sentiment_analysis(sent),sep="\n")

In [None]:
print(neg, sentiment_analysis(neg),sep="\n")

In [None]:
print(para, sentiment_analysis(para),sep="\n")

# Conclusions
Lexical approaches have problems with nuanced sentiments or negation. Trained models typically produce rather extreme scores and might be biased towards their training data (tweets, reviews).
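
One plausible contributing factor to the extreme scores, offered here as an assumption and not verified in this notebook, is that classifiers usually turn internal scores (logits) into probabilities with a softmax, which pushes even moderate differences towards 0 or 1. A minimal sketch with made-up logit values:

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logit gaps between the positive and the negative class
for gap in (0.5, 2.0, 4.0):
    positive, negative = softmax([gap, 0.0])
    print(f"logit gap {gap}: P(positive) = {positive:.3f}")
```

A gap of 4 between the two logits already yields a probability above 0.98, so a confident-looking score does not necessarily reflect a strongly worded text.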