### Sentiment Analysis Algorithms

Hugging Face zero-shot sentiment analysis uses zero-shot learning (ZSL), which refers to building a model and using it to make predictions on tasks the model was not trained to do. It can be used on any text classification task, including but not limited to sentiment analysis and topic modeling.  
  
Zero-shot sentiment analysis from Hugging Face is a use case of the Hugging Face zero-shot text classification model. It is a Natural Language Inference (NLI) model where two sequences are compared to see if they contradict each other, entail each other, or are neutral (neither contradict nor entail).  
  
When using Hugging Face zero-shot sentiment analysis, the text is the premise and each sentiment label, such as positive or negative, is turned into a hypothesis. The model scores how strongly the text entails each hypothesis, and the label with the highest score becomes the prediction. With two labels, a document that entails positive more strongly than negative is predicted to have a positive sentiment; otherwise, it is predicted to have a negative sentiment.  
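
To make the entailment step concrete, here is a minimal sketch of how per-label entailment scores can be turned into a prediction. The logit values are made up for illustration, and `pick_label` is a hypothetical helper, not part of the Hugging Face API. It assumes the single-label setting, where the entailment logits of all candidate labels are normalized with a softmax.

```python
import math

def pick_label(entailment_logits):
    """Softmax the per-label entailment logits and return (best_label, probabilities)."""
    # Subtract the largest logit first for numerical stability
    max_logit = max(entailment_logits.values())
    exps = {label: math.exp(logit - max_logit)
            for label, logit in entailment_logits.items()}
    total = sum(exps.values())
    probs = {label: value / total for label, value in exps.items()}
    # The predicted sentiment is the label with the highest probability
    best_label = max(probs, key=probs.get)
    return best_label, probs

# Hypothetical entailment logits for the premise "Great phone, works perfectly."
example_logits = {"positive": 3.2, "negative": -1.5}
label, probs = pick_label(example_logits)
print(label, probs)
```

Because the probabilities sum to one, a high score for one label necessarily means a low score for the other when only two labels are used.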
  
The Flair pre-trained sentiment model is a text classification model built specifically for predicting sentiment. It was trained on the IMDB dataset, so it may work better on documents that resemble IMDB movie reviews than on documents that are quite different.

In [None]:
# Data processing
import pandas as pd

# Hugging Face model
from transformers import pipeline

# Import flair pre-trained sentiment model
from flair.models import TextClassifier
classifier = TextClassifier.load('en-sentiment')

# Import flair Sentence to process input text
from flair.data import Sentence

# Import accuracy_score to check performance
from sklearn.metrics import accuracy_score

In [1]:
# Read in data
amz_review = pd.read_csv('sentiment labelled sentences/amazon_cells_labelled.txt', sep='\t', names=['review', 'label'])

# Take a look at the data
amz_review.head()


### Hugging Face Zero-shot Sentiment Prediction

First, the pipeline is defined with three arguments:  
  
* `task` describes the task for the pipeline. The task name we use is `zero-shot-classification`.  
* `model` is the name of the model the pipeline uses for prediction. You can find the full list of available models for zero-shot classification on the Hugging Face website. When this tutorial was created in January 2023, `bart-large-mnli` by Facebook (Meta) was the model with the highest number of downloads and likes, so we use it for the pipeline.  
* `device` sets the device the pipeline runs on. `device=0` selects the first GPU; use `device=-1` to run on the CPU.
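
If the notebook may run on a machine without a GPU, the device can be chosen at run time. The sketch below assumes the PyTorch backend; `pick_device` is a hypothetical helper, and the pipeline call is left as a comment so the snippet runs without downloading the model.

```python
def pick_device() -> int:
    """Return 0 (first GPU) if CUDA is usable, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # Without PyTorch we cannot detect a GPU, so fall back to CPU
        return -1
    return 0 if torch.cuda.is_available() else -1

device = pick_device()
print(device)

# The value can then be passed to the pipeline, for example:
# classifier = pipeline(task="zero-shot-classification",
#                       model="facebook/bart-large-mnli",
#                       device=device)
```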

In [None]:
# Define pipeline
classifier = pipeline(task="zero-shot-classification", 
                      model="facebook/bart-large-mnli",
                      device=0) 

After defining the pipeline, the data is prepared and the pipeline predicts the sentiments.  
  
* First, the reviews are put into a list for the pipeline.  
* Then, the candidate labels are defined. We set two candidate labels, positive and negative.  
* After that, the hypothesis template is defined. The default template used by the Hugging Face pipeline is `This example is {}.` We use a template that is more specific to sentiment analysis, `The sentiment of this review is {}.`, which helps to improve the results.  
* Finally, the text, the candidate labels, and the hypothesis template are passed into the zero-shot classification pipeline called `classifier`.
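
The role of the hypothesis template can be sketched in plain Python: each candidate label is substituted into the `{}` placeholder, giving one premise/hypothesis pair per label. `build_nli_pairs` is a hypothetical helper written for illustration, and the review text is made up.

```python
candidate_labels = ["positive", "negative"]
hypothesis_template = "The sentiment of this review is {}."

def build_nli_pairs(premise, labels, template):
    """Pair the premise with one filled-in hypothesis per candidate label."""
    return [(premise, template.format(label)) for label in labels]

# Hypothetical review text used only for illustration
pairs = build_nli_pairs("The battery died after two days.",
                        candidate_labels, hypothesis_template)
for premise, hypothesis in pairs:
    print(premise, "->", hypothesis)
```

Each pair is what the NLI model scores for entailment, which is why a template phrased for the task can change the results.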

In [None]:
# Put reviews in a list
sequences = amz_review['review'].to_list()

# Define the candidate labels 
candidate_labels = ["positive", "negative"]

# Set the hypothesis template
hypothesis_template = "The sentiment of this review is {}."

# Prediction results
hf_prediction = classifier(sequences, candidate_labels, hypothesis_template=hypothesis_template)

# Save the output as a dataframe
hf_prediction = pd.DataFrame(hf_prediction)

# Take a look at the data
hf_prediction.head()

In [None]:
# The column for the predicted sentiment (the first label is the highest-scoring one)
hf_prediction['hf_prediction'] = hf_prediction['labels'].apply(lambda x: x[0])

# Map sentiment values to the 0/1 encoding used by the true labels
hf_prediction['hf_prediction'] = hf_prediction['hf_prediction'].map({'positive': 1, 'negative': 0})

# The column for the score of the predicted sentiment
hf_prediction['hf_predicted_score'] = hf_prediction['scores'].apply(lambda x: x[0])

# The actual labels
hf_prediction['true_label'] = amz_review['label']

# Drop the columns that we do not need
hf_prediction = hf_prediction.drop(['labels', 'scores'], axis=1)

# Take a look at the data
hf_prediction.head()
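
Taking element 0 of `labels` and `scores` works because the pipeline returns the labels sorted from highest to lowest score. The sketch below uses a made-up result list in the same shape to show this; `top_prediction` is a hypothetical helper that picks the largest score explicitly and arrives at the same answer.

```python
# Hypothetical pipeline output for two reviews (values are made up)
mock_output = [
    {"sequence": "Works great.",
     "labels": ["positive", "negative"], "scores": [0.97, 0.03]},
    {"sequence": "Stopped charging.",
     "labels": ["negative", "positive"], "scores": [0.91, 0.09]},
]

label_to_int = {"positive": 1, "negative": 0}

def top_prediction(result):
    """Return (mapped_label, score) for the highest-scoring candidate label."""
    # Find the index of the largest score instead of relying on the ordering
    best = max(range(len(result["scores"])), key=lambda i: result["scores"][i])
    return label_to_int[result["labels"][best]], result["scores"][best]

predictions = [top_prediction(r) for r in mock_output]
print(predictions)
```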

In [None]:
# Compare actual and predicted labels (true labels first, predictions second)
accuracy_score(hf_prediction['true_label'], hf_prediction['hf_prediction'])

### Flair Pretrained Sentiment Model

Let’s define a function that takes a review as input and returns the score and the predicted label.  
  
* First, the review text is passed to `Sentence`, which tokenizes it.  
* Then, `.predict()` makes the sentiment prediction and attaches it to the sentence.  
* After the prediction, we extract `score` and `value` from the sentence's first label. `value` is the predicted sentiment label, and `score` is how confident the model is in that prediction.  
* Finally, the function returns the score and the value for the input review.

In [None]:
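# Load the Flair model under its own name: `classifier` was reassigned to the
# Hugging Face pipeline above, so it can no longer be used for Flair predictions
flair_classifier = TextClassifier.load('en-sentiment')

# Define a function to get the Flair sentiment score and predicted label
def score_flair(text):
    # Wrap the text in a Flair Sentence, which tokenizes it
    sentence = Sentence(text)
    # Predict sentiment; the result is attached to the sentence in place
    flair_classifier.predict(sentence)
    # Extract the confidence score of the predicted label
    score = sentence.labels[0].score
    # Extract the predicted label (POSITIVE or NEGATIVE)
    value = sentence.labels[0].value
    # Return the score and the predicted label
    return score, value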

In [None]:
# Run the Flair model once per review and keep both outputs
flair_results = amz_review['review'].apply(score_flair)

# Sentiment score for each review
amz_review['scores_flair'] = flair_results.apply(lambda r: r[0])

# Predicted sentiment label for each review
amz_review['pred_flair'] = flair_results.apply(lambda r: r[1])

# Check the distribution of the score
amz_review['scores_flair'].describe()

In [None]:
# Change the label of flair prediction to 0 if negative and 1 if positive
mapping = {'NEGATIVE': 0, 'POSITIVE': 1}
amz_review['pred_flair'] = amz_review['pred_flair'].map(mapping)

# Take a look at the data
amz_review.head()


In [None]:
# Compare actual and predicted labels
accuracy_score(amz_review['label'], amz_review['pred_flair'])
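
For reference, the accuracy used to compare both models is the fraction of reviews whose predicted label equals the true label. The following is a minimal plain-Python sketch of that calculation, not scikit-learn's implementation; the label lists are made up.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

# Hypothetical labels (1 = positive, 0 = negative)
true_labels = [1, 0, 1, 1, 0]
predicted = [1, 0, 0, 1, 0]
print(accuracy(true_labels, predicted))  # 4 of 5 match -> 0.8
```

Since only matches are counted, swapping the two arguments gives the same value for this metric, although passing the true labels first is the usual convention.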