### **Advanced Sentiment Analysis**
Sentiment analysis is the process of determining the emotional tone or opinion expressed in a piece of text. In this advanced notebook, we explore techniques for addressing some limitations of the basic approach, such as its difficulty with sarcasm, irony, and complex language structures.
### **Polarity and Subjectivity**
Polarity measures the positive or negative orientation of a text, ranging from -1 (most negative) to 1 (most positive). Subjectivity measures the extent to which a text expresses personal opinions or feelings rather than facts, ranging from 0 (objective) to 1 (subjective).
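
To make the two ranges concrete, here is a minimal sketch of a lexicon-based scorer that averages per-word scores. The lexicon entries and the `score_text` helper are made up for illustration; this is not how any particular library computes its scores, only a toy showing how a polarity in [-1, 1] and a subjectivity in [0, 1] could arise.

```python
# Toy lexicon: word -> (polarity in [-1, 1], subjectivity in [0, 1]).
# These entries are invented for illustration only.
TOY_LEXICON = {
    "great": (0.8, 0.75),
    "terrible": (-1.0, 1.0),
    "stale": (-0.5, 0.5),
    "arrived": (0.0, 0.0),
}


def score_text(text):
    """Average the lexicon scores of the known words in `text`.

    Returns (polarity, subjectivity), or (0.0, 0.0) when no word is known.
    """
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    if not hits:
        return 0.0, 0.0
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return polarity, subjectivity


print(score_text("Great taffy, great price"))  # (0.8, 0.75)
print(score_text("The bags arrived stale"))    # (-0.25, 0.25)
```

Words missing from the lexicon are ignored, which is why a factual word such as "arrived" (scored 0, 0) pulls the second result toward neutral while "taffy" has no effect on the first.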
### **Handling Sarcasm and Irony**
Sarcasm and irony are hard for sentiment analysis models to interpret because the literal wording often signals the opposite of the intended sentiment. One approach is to run a sarcasm detection model alongside the sentiment analysis model: we train a separate model for sarcasm detection on labeled data and incorporate its predictions into the sentiment analysis pipeline.

In [34]:
# Import necessary libraries
import spacy
import pandas as pd
from spacytextblob.spacytextblob import SpacyTextBlob
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

In [35]:
# Load the spaCy model
nlp = spacy.load('en_core_web_lg')
nlp.add_pipe('spacytextblob')

<spacytextblob.spacytextblob.SpacyTextBlob at 0x4bdfa8cd0>

In [36]:
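# Note: this downloads almost 1 GB of data and might take a while; run with caution
# Load a pre-trained sarcasm detection model
# Caveat: as the warning in this cell's output shows, loading this T5 checkpoint with
# AutoModelForSequenceClassification leaves the classification head newly initialized,
# so the pipeline's predictions are not reliable until the model is trained.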
sarcasm_tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-sarcasm-twitter")
sarcasm_model = AutoModelForSequenceClassification.from_pretrained("mrm8488/t5-base-finetuned-sarcasm-twitter")
sarcasm_pipeline = pipeline("text-classification", model=sarcasm_model, tokenizer=sarcasm_tokenizer)




Some weights of T5ForSequenceClassification were not initialized from the model checkpoint at mrm8488/t5-base-finetuned-sarcasm-twitter and are newly initialized: ['classification_head.dense.bias', 'classification_head.dense.weight', 'classification_head.out_proj.bias', 'classification_head.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [37]:
# Text Preprocessing
def preprocess_text(text):
    # Create a spaCy document
    doc = nlp(text)
    
    # Remove stopwords and punctuation
    filtered_tokens = [token for token in doc if not token.is_stop and not token.is_punct]
    
    # Lemmatize the tokens
    lemmatized_tokens = [token.lemma_ for token in filtered_tokens]
    
    # Join the lemmatized tokens back into a string
    preprocessed_text = ' '.join(lemmatized_tokens)
    
    return preprocessed_text
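
One thing to watch in the preprocessing step above: stop-word lists commonly include negations such as "not", so a phrase like "It's not terrible" can be reduced to "terrible" (this is visible in the preprocessed reviews printed later in the notebook). The sketch below shows one possible way to keep negations while still dropping other stop words. It uses a hypothetical miniature stop-word set and a hypothetical `filter_tokens` helper on plain strings rather than spaCy tokens.

```python
# Hypothetical miniature stop-word list; a real list is much larger.
STOP_WORDS = {"it", "is", "not", "the", "a", "at", "all", "no", "i", "do"}

# Negation words we choose to keep even though they are stop words.
KEEP_NEGATIONS = {"not", "no", "never", "n't"}


def filter_tokens(tokens, keep_negations=True):
    """Drop stop words, optionally sparing the negations in KEEP_NEGATIONS."""
    kept = []
    for tok in tokens:
        low = tok.lower()
        is_spared = keep_negations and low in KEEP_NEGATIONS
        if low in STOP_WORDS and not is_spared:
            continue
        kept.append(tok)
    return kept


tokens = ["It", "is", "not", "terrible"]
print(filter_tokens(tokens, keep_negations=False))  # ['terrible']
print(filter_tokens(tokens))                        # ['not', 'terrible']
```

In the spaCy version, the same idea could probably be expressed by changing the filter condition to something like `not token.is_stop or token.lower_ in KEEP_NEGATIONS`; treat that as an untested suggestion rather than part of the original notebook.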

In [38]:
# Sarcasm Detection
def detect_sarcasm(text):
    result = sarcasm_pipeline(text)
    return result[0]['label']

In [39]:
# Sentiment Analysis
def analyze_sentiment(text):
    # Create a spaCy document
    doc = nlp(text)
    
    # Get the sentiment polarity and subjectivity
    polarity = doc._.blob.polarity
    subjectivity = doc._.blob.subjectivity
    
    # Detect sarcasm
    sarcasm = detect_sarcasm(text)
    
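    # Adjust polarity based on sarcasm detection
    # Caveat: the pipeline loaded above returns the generic labels 'LABEL_0' and
    # 'LABEL_1' (see the df.head() output), so this comparison never matches
    # and the polarity is never flipped in this run.
    if sarcasm == 'SARCASM':
        polarity *= -1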
    
    # Determine the sentiment label
    if polarity > 0:
        sentiment = 'Positive'
    elif polarity < 0:
        sentiment = 'Negative'
    else:
        sentiment = 'Neutral'
    
    return sentiment, polarity, subjectivity, sarcasm
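
A possible refinement of the polarity adjustment above is to flip the sign only when the sarcasm classifier is confident. A `text-classification` pipeline returns a `score` alongside each `label`, so the sketch below takes both. The `adjust_polarity` helper, the `'SARCASM'` label name, and the 0.8 threshold are assumptions for illustration; the run in this notebook produces `LABEL_0`/`LABEL_1`, so the label name would need to be mapped to whatever the trained model actually emits.

```python
def adjust_polarity(polarity, label, score, sarcastic_label="SARCASM", threshold=0.8):
    """Flip the polarity only for a confident sarcasm prediction.

    `label` and `score` are the fields of one classifier result;
    `sarcastic_label` and `threshold` are illustrative defaults.
    """
    if label == sarcastic_label and score >= threshold:
        return -polarity
    return polarity


print(adjust_polarity(0.7, "SARCASM", 0.95))      # -0.7 (confident, flipped)
print(adjust_polarity(0.7, "SARCASM", 0.55))      # 0.7  (not confident enough)
print(adjust_polarity(0.7, "NOT_SARCASM", 0.99))  # 0.7  (not sarcastic)
```

Gating on the score trades recall for precision: fewer sarcastic reviews are corrected, but fewer sincere reviews have their polarity inverted by mistake.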

In [40]:
# Load in the dataset
df = pd.read_csv('Reviews.csv')

In [41]:
# Preprocess the text column
df['preprocessed_text'] = df['Text'].apply(preprocess_text)

In [42]:
# Analyze the sentiment of the preprocessed text
df['sentiment'], df['polarity'], df['subjectivity'], df['sarcasm'] = zip(*df['preprocessed_text'].apply(analyze_sentiment))

In [43]:
df.head()

| Unnamed: 0 | Id | Summary | Text | preprocessed_text | sentiment | polarity | subjectivity | sarcasm |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Good Quality Dog Food | I have bought several of the Vitality canned d... | buy Vitality can dog food product find good qu... | Positive | 0.7 | 0.6 | LABEL_0 |
| 1 | 2 | Not as Advertised | Product arrived labeled as Jumbo Salted Peanut... | product arrived label Jumbo Salted Peanuts pea... | Positive | 0.216667 | 0.762963 | LABEL_0 |
| 2 | 3 | "Delight" says it all | This is a confection that has been around a fe... | confection century light pillowy citrus gela... | Positive | 0.187 | 0.548 | LABEL_0 |
| 3 | 4 | Cough Medicine | If you are looking for the secret ingredient i... | look secret ingredient Robitussin believe find... | Positive | 0.15 | 0.65 | LABEL_1 |
| 4 | 5 | Great taffy | Great taffy at a great price. There was a wid... | great taffy great price wide assortment yumm... | Positive | 0.458333 | 0.6 | LABEL_1 |


In [44]:
# Show text of top 5 most negative reviews along with their polarity, preprocessed text, and sarcasm detection
for index, row in df.sort_values('polarity').head().iterrows():
    print(f"Review: {row['Text']}")
    print(f"Preprocessed Text: {row['preprocessed_text']}")
    print(f"Polarity: {row['polarity']}")
    print(f"Sarcasm: {row['sarcasm']}")
    print()

Review: I purchased the Mango flavor, and to me it doesn't take like Mango at all.  There is no hint of sweetness, and unfortunately there is a hint or aftertaste almost like licorice.  I've been consuming various sports nutrition products for decades, so I'm familiar and have come to like the taste of the most of the products I've tried.  The mango flavor is one of the least appealing I've tasted.  It's not terrible, but it's bad enough that I notice the bad taste every sip I take.
Preprocessed Text: purchase Mango flavor like Mango   hint sweetness unfortunately hint aftertaste like licorice   consume sport nutrition product decade familiar come like taste product try   mango flavor appeal taste   terrible bad notice bad taste sip
Polarity: -0.5049999999999999
Sarcasm: LABEL_1

Review: Arrived in 6 days and were so stale i could not eat any of the 6 bags!!
Preprocessed Text: arrive 6 day stale eat 6 bag
Polarity: -0.5
Sarcasm: LABEL_0

Review: The Strawberry Twizzlers are my guilty p

In [45]:
# Show the top 5 most positive reviews along with their polarity, preprocessed text, and sarcasm detection
for index, row in df.sort_values('polarity', ascending=False).head().iterrows():
    print(f"Review: {row['Text']}")
    print(f"Preprocessed Text: {row['preprocessed_text']}")
    print(f"Polarity: {row['polarity']}")
    print(f"Sarcasm: {row['sarcasm']}")
    print()

Review: I can remember buying this candy as a kid and the quality hasn't dropped in all these years. Still a superb product you won't be disappointed with.
Preprocessed Text: remember buy candy kid quality drop year superb product will disappoint
Polarity: 1.0
Sarcasm: LABEL_1

Review: This offer is a great price and a great taste, thanks Amazon for selling this product.<br /><br />Staral
Preprocessed Text: offer great price great taste thank Amazon sell product.<br /><br />Staral
Polarity: 0.8
Sarcasm: LABEL_1

Review: This is great dog food, my dog has severs allergies and this brand is the only one that we can feed him.
Preprocessed Text: great dog food dog sever allergy brand feed
Polarity: 0.8
Sarcasm: LABEL_1

Review: Great product, nice combination of chocolates and perfect size!  The bags had plenty, and they were shipped promptly.  The kids in the neighborhood liked our candies!
Preprocessed Text: great product nice combination chocolate perfect size   bag plenty ship promptly

**Improvements**:
- Added a sarcasm detection step to the pipeline, intended to handle sarcastic and ironic texts more accurately.

**Limitations and Future Work**:
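- The sarcasm detection model used in this example is pre-trained and may not be optimal for all domains. Fine-tuning it on domain-specific data could improve its accuracy.
- As the warning printed when the model was loaded shows, the classification head of the sarcasm model was newly initialized, so its `LABEL_0`/`LABEL_1` predictions in this run are not meaningful. Because these labels never equal `'SARCASM'`, the polarity adjustment is also never triggered.
- Sentiment and sarcasm are computed on the preprocessed text, from which stop words (including negations such as "not") and punctuation have been removed. For example, "It's not terrible" is reduced to "terrible". Running both analyses on the original text may be more reliable.
- Evaluating the improved sentiment analysis model against a labeled dataset would show how effective it actually is.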
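
As a rough sketch of what such an evaluation could look like, the snippet below computes accuracy and per-label precision, recall, and F1 from two equally long lists of labels. The `evaluate` helper and the tiny `gold`/`pred` lists are hypothetical; in practice `gold` would come from a hand-labeled sample of the reviews and `pred` from the `sentiment` column.

```python
def evaluate(gold, predicted):
    """Return (accuracy, report) where report maps each label to its metrics."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")

    pairs = list(zip(gold, predicted))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)

    report = {}
    for label in sorted(set(gold) | set(predicted)):
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report


# Hypothetical labels, for illustration only.
gold = ["Positive", "Negative", "Negative", "Positive"]
pred = ["Positive", "Positive", "Negative", "Positive"]

accuracy, report = evaluate(gold, pred)
print(f"Accuracy: {accuracy}")  # Accuracy: 0.75
for label, metrics in report.items():
    print(label, metrics)
```

Running the same evaluation with and without the sarcasm adjustment on the same labeled sample would give a direct measure of whether the extra step helps.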