**Assignment Title: Lab Assignment 6 - Applications of LLMs**

**Author name: Garima Astha**

**ASU ID: 1234333687 (gastha)**

**File creation date: 27 Feb 2025**

**Objectives**

In this lab assignment, you will explore sentiment analysis on a subset of the Yelp Review dataset using zero-shot prompting, few-shot prompting, and multiple large language models (LLMs). You will compare their performance and reflect on their strengths and limitations.

**Import libraries and data file**

In [None]:
import pandas as pd
import numpy as np

# Import the restaurant review data
file_path = r'/content/sample_data/restaurant_reviews_az.csv'
df = pd.read_csv(file_path)

# Show a summary of the input data
print("Data Summary:")
print(df.describe())
print("\nData Information:")
df.info()  # info() prints its report directly and returns None

Data Summary:
              stars        useful         funny          cool
count  48147.000000  48147.000000  48147.000000  48147.000000
mean       3.736702      0.858683      0.183106      0.439903
std        1.557289      1.831488      0.807035      1.451746
min        1.000000      0.000000      0.000000      0.000000
25%        2.000000      0.000000      0.000000      0.000000
50%        5.000000      0.000000      0.000000      0.000000
75%        5.000000      1.000000      0.000000      0.000000
max        5.000000    105.000000     55.000000    106.000000

Data Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48147 entries, 0 to 48146
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   review_id    48147 non-null  object
 1   user_id      48147 non-null  object
 2   business_id  48147 non-null  object
 3   stars        48147 non-null  int64 
 4   useful       48147 non-null  int64 
 5   funny    

In [None]:
# Display the first few rows to understand the structure
print("\nFirst few rows of the data:")
print(df.head())

# Check for any missing values in the dataset
missing_values = df.isnull().sum()
print("\nMissing values in each column:")
print(missing_values)


First few rows of the data:
                review_id                 user_id             business_id  \
0  IVS7do_HBzroiCiymNdxDg  fdFgZQQYQJeEAshH4lxSfQ  sGy67CpJctjeCWClWqonjA   
1  QP2pSzSqpJTMWOCuUuyXkQ  JBLWSXBTKFvJYYiM-FnCOQ  3w7NRntdQ9h0KwDsksIt5Q   
2  oK0cGYStgDOusZKz9B1qug  2_9fKnXChUjC5xArfF8BLg  OMnPtRGmbY8qH_wIILfYKA   
3  E_ABvFCNVLbfOgRg3Pv1KQ  9MExTQ76GSKhxSWnTS901g  V9XlikTxq0My4gE8LULsjw   
4  Rd222CrrnXkXukR2iWj69g  LPxuausjvDN88uPr-Q4cQA  CA5BOxKRDPGJgdUQ8OUOpw   

   stars  useful  funny  cool  \
0      3       1      1     0   
1      5       1      1     1   
2      5       1      0     0   
3      5       0      0     0   
4      4       1      0     0   

                                                text                 date  
0  OK, the hype about having Hatch chili in your ...  2020-01-27 22:59:06  
1  Pandemic pit stop to have an ice cream.... onl...  2020-04-19 05:33:16  
2  I was lucky enough to go to the soft opening a...  2020-02-29 19:43:44  
3  I'

**Data Preprocessing**

1. Remove 3-star reviews from the dataset.
2. Create a new column `Sentiment` where:
   - reviews with 1 or 2 stars are labeled 0 (negative sentiment);
   - reviews with 4 or 5 stars are labeled 1 (positive sentiment).
3. Create a dataset for this assignment by randomly selecting 50 positive reviews and 50 negative reviews.
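A minimal sketch of these three steps on a toy DataFrame may help; the rows below are made up for illustration and are not taken from the Yelp data. It assumes pandas 1.1 or later (for `DataFrameGroupBy.sample`) and draws 2 reviews per class instead of 50 so that it runs on a handful of rows.

```python
import pandas as pd

# Toy stand-in for the Yelp reviews (hypothetical rows, not from the dataset)
reviews = pd.DataFrame({
    "text": ["awful", "bad", "meh", "good", "great", "terrible", "fine", "superb"],
    "stars": [1, 2, 3, 4, 5, 1, 3, 5],
})

# Step 1: drop 3-star reviews; .copy() gives an independent frame, so the
# column assignment below does not raise SettingWithCopyWarning
labeled = reviews[reviews["stars"] != 3].copy()

# Step 2: 1-2 stars -> 0 (negative), 4-5 stars -> 1 (positive)
labeled["Sentiment"] = (labeled["stars"] >= 4).astype(int)

# Step 3: draw the same number of rows from each class (2 here, 50 in the lab)
n_per_class = 2
balanced = labeled.groupby("Sentiment").sample(n=n_per_class, random_state=42)

print(balanced["Sentiment"].value_counts().sort_index().tolist())  # [2, 2]
```

Sampling per group keeps the two classes exactly balanced, which is what makes accuracy on this subset directly comparable to a 50% chance baseline.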

In [None]:
import pandas as pd
import numpy as np

# df is the full review DataFrame loaded in the first cell

# Step 1: Remove 3-star reviews.
# .copy() makes an independent DataFrame, which avoids the
# SettingWithCopyWarning ("A value is trying to be set on a copy of a slice").
df = df[df['stars'] != 3].copy()

# Step 2: Create a new column 'Sentiment'
# Reviews with 1 or 2 stars are labeled as 0 (Negative Sentiment)
# Reviews with 4 or 5 stars are labeled as 1 (Positive Sentiment)
# After Step 1 only 1, 2, 4 and 5 stars remain, so one threshold suffices
# and the column stays integer-valued (no NaN fallback is needed).
df['Sentiment'] = np.where(df['stars'] >= 4, 1, 0)

# Step 3: Randomly select 50 positive and 50 negative reviews
positive_reviews = df[df['Sentiment'] == 1].sample(n=50, random_state=42)  # 50 random positive reviews
negative_reviews = df[df['Sentiment'] == 0].sample(n=50, random_state=42)  # 50 random negative reviews

# Combine positive and negative reviews
final_dataset = pd.concat([positive_reviews, negative_reviews])

# Save the final dataset to a new CSV file if needed
final_dataset.to_csv('final_reviews.csv', index=False)

# Display the first few rows of the final dataset
print(final_dataset.head())


                    review_id                 user_id             business_id  \
24176  PekPj5ZR4Mrky3FtRpoqXg  T2Qv4ZJ_-fw-aDhc2jEZVg  KyxAss4DMrT_GMzcOLE2yg   
46200  Qy3cpgpNyAYEEreuSJZB_Q  qQ4Ht8TbyC9E77NBz1jIlA  g0OWissnXQgRJVJGYv1mfQ   
12935  5yiSesTG0xL6Lh2L7OyW5A  NKeybkae800m57yQ7U3jtw  L-wv-QK9VoUZX6zUcJwSSw   
860    B4JaHqoRtS0iJOvj1dXg0g  SKkvivmY2b9crEW1NLZHsg  u4P6hqDz6-QG9PR2Pj5KIw   
17966  kJ9ilSRbL2hdF7qBJrzYIg  a_1AD6ACXLSHpztr1k_bTg  LZzDvgfpkd4nI3E4L9wF1w   

       stars  useful  funny  cool  \
24176      4       2      0     1   
46200      4       1      1     1   
12935      5       0      0     0   
860        5       0      0     0   
17966      5       0      0     0   

                                                    text                 date  \
24176  We had the restaurant to ourselves, which was ...  2020-10-27 13:13:27   
46200  When my family can't decide where to eat, we o...  2020-05-28 19:28:19   
12935  Our experience was 100% positive.  The o

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['Sentiment'] = np.where(df['stars'].isin([1, 2]), 0, np.where(df['stars'].isin([4, 5]), 1, np.nan))


**Perform Sentiment Analysis Using Zero-Shot Learning**

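Use Claude 3 Sonnet for zero-shot prompting.
Predict sentiment labels for the selected 100 reviews without providing any labeled training examples.
Evaluate model performance using precision, recall, F1 score, and accuracy.
The cell below performs zero-shot classification with the Hugging Face `zero-shot-classification` pipeline and the `facebook/bart-large-mnli` model.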
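Because the Claude prompt itself is not shown in this notebook, here is one hypothetical way a zero-shot prompt and its answer parser could look. `ZERO_SHOT_TEMPLATE`, `build_zero_shot_prompt`, `parse_label`, and `fake_llm` are illustrative names, and `fake_llm` is a keyword stub standing in for a real model call so that the sketch runs offline.

```python
# Hypothetical prompt template: an instruction plus the review, no labeled examples
ZERO_SHOT_TEMPLATE = (
    "Classify the sentiment of the following restaurant review as "
    "'positive' or 'negative'. Answer with one word.\n\n"
    "Review: {review}\nSentiment:"
)


def build_zero_shot_prompt(review: str) -> str:
    """Fill the template with one review."""
    return ZERO_SHOT_TEMPLATE.format(review=review)


def parse_label(response: str) -> int:
    """Map a free-text model reply to 1 (positive) or 0 (negative).

    Assumption: any reply that does not start with 'positive' counts as
    negative, which keeps the labels binary for scoring.
    """
    return 1 if response.strip().lower().startswith("positive") else 0


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; replace with a real client."""
    review = prompt.split("Review: ", 1)[1]
    return "Positive" if "great" in review.lower() else "Negative"


reviews = ["Great tacos and friendly staff.", "Cold food, rude manager."]
predictions = [parse_label(fake_llm(build_zero_shot_prompt(r))) for r in reviews]
print(predictions)  # [1, 0]
```

Parsing the reply defensively (case, whitespace, trailing punctuation) matters with real LLMs, which do not always answer with exactly one bare word.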

In [None]:
import pandas as pd
from transformers import pipeline
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

# Load the dataset (final selected 100 reviews dataset)
final_reviews = pd.read_csv('final_reviews.csv')

# Create a list of review texts for sentiment analysis
review_texts = final_reviews['text'].tolist()

# Define the zero-shot classifier using the Hugging Face pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Define candidate labels for sentiment classification
candidate_labels = ["positive", "negative"]

# Step 1: Predict sentiment using Zero-Shot Learning
predictions = []
for review in review_texts:
    result = classifier(review, candidate_labels)
    predicted_label = result['labels'][0]  # Get the label with the highest score
    predictions.append(predicted_label)

# Step 2: Map predicted sentiment labels to numeric values
# 'positive' -> 1, 'negative' -> 0
predicted_sentiment = [1 if label == 'positive' else 0 for label in predictions]

# Step 3: Evaluate the performance (precision, recall, f1, accuracy)
# We assume that the true sentiment labels are stored in the 'Sentiment' column of the DataFrame
true_sentiment = final_reviews['Sentiment'].tolist()

# Calculate evaluation metrics
precision = precision_score(true_sentiment, predicted_sentiment)
recall = recall_score(true_sentiment, predicted_sentiment)
f1 = f1_score(true_sentiment, predicted_sentiment)
accuracy = accuracy_score(true_sentiment, predicted_sentiment)

# Print evaluation results
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")
print(f"Accuracy: {accuracy:.4f}")



Device set to use cpu


Precision: 1.0000
Recall: 0.9200
F1 Score: 0.9583
Accuracy: 0.9600
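These four scores are consistent with a single confusion matrix on the balanced set: a recall of 0.92 over 50 positive reviews means 46 true positives and 4 false negatives, a precision of 1.0 means no false positives, and so all 50 negative reviews were classified correctly. The sketch below recomputes the metrics from those inferred counts. `binary_metrics` is a hypothetical helper, not a scikit-learn function; it returns 0.0 when a denominator is zero, similar to scikit-learn's default behavior after its undefined-metric warning.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Counts inferred from the reported scores on 50 positive + 50 negative reviews
metrics = binary_metrics(tp=46, fp=0, fn=4, tn=50)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
# precision: 1.0000
# recall: 0.9200
# f1: 0.9583
# accuracy: 0.9600
```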


**Perform Sentiment Analysis Using Few-Shot Learning**

Select a few labeled examples for few-shot learning.
Use the examples to guide the LLM in classifying sentiment for the selected 100 reviews.
Evaluate model performance using precision, recall, F1 score, and accuracy.
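A few-shot prompt places labeled examples before the review to be classified, so that a generative model can continue the pattern with a label. The sketch below shows one possible layout using shortened excerpts of two of the examples chosen in this notebook; `build_few_shot_prompt` is a hypothetical helper, and the resulting prompt is only useful when it is sent to a generative LLM.

```python
# Shortened excerpts of two of the labeled examples used in this notebook
FEW_SHOT_EXAMPLES = [
    {"text": "Very friendly staff, great food.", "Sentiment": "positive"},
    {"text": "Service was the worst I've ever experienced.", "Sentiment": "negative"},
]


def build_few_shot_prompt(review: str, examples: list) -> str:
    """Instruction, then labeled examples, then the unlabeled review."""
    lines = ["Classify each restaurant review as positive or negative.", ""]
    for ex in examples:
        lines.append(f"Text: {ex['text']}")
        lines.append(f"Sentiment: {ex['Sentiment']}")
        lines.append("")
    lines.append(f"Text: {review}")
    lines.append("Sentiment:")  # the model is expected to continue with one label
    return "\n".join(lines)


prompt = build_few_shot_prompt("The nachos were delicious!", FEW_SHOT_EXAMPLES)
print(prompt)
```

Keeping the examples in the same `Text:` / `Sentiment:` format as the final query is the design choice that lets the model infer both the task and the exact label vocabulary.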

In [None]:
import pandas as pd
from transformers import pipeline
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

# Load the dataset (final selected 100 reviews dataset)
final_reviews = pd.read_csv('final_reviews.csv')

# Create a list of review texts for sentiment analysis
review_texts = final_reviews['text'].tolist()

# facebook/bart-large-mnli is an NLI classifier: a "text-classification" pipeline
# on it returns contradiction/neutral/entailment, never positive/negative, so it
# cannot follow a few-shot prompt. An instruction-tuned generative model is
# swapped in here (google/flan-t5-base is an assumption, not the lab's model).
generator = pipeline("text2text-generation", model="google/flan-t5-base")

# Select a few examples for few-shot learning (manual selection for diversity)
few_shot_examples = [
    {"text": "Very friendly staff, great food.  My wife loved her pho.  Everything is fresh. Can't wait to come back.", "Sentiment": "positive"},
    {"text": "I don't know if I just had bad luck ordering tamales that day  or this place lacks quality control of their products they offer, but this was a terrible disgusting experience. I guess you get what you pay for.", "Sentiment": "negative"},
    {"text": "Food was really great the nachos were delicious ! & the tacos were great too ! And the margaritas! The girls at the bar were really nice !", "Sentiment": "positive"},
    {"text": "This is a no-go! Service was the worst I've ever experienced. The manager was extremely rude and condescending. I would definitely not recommend!", "Sentiment": "negative"}
]

# The labeled examples are the same for every review, so build them once
examples_block = "\n\n".join(
    f"Text: {ex['text']}\nSentiment: {ex['Sentiment']}" for ex in few_shot_examples
)

# Step 1: Predict sentiment using Few-Shot Learning
predictions = []
for review in review_texts:
    # Instruction + labeled examples + the current review, ending at "Sentiment:"
    prompt = (
        "Classify each restaurant review as positive or negative.\n\n"
        f"{examples_block}\n\nText: {review}\nSentiment:"
    )
    # Generate a short continuation and keep it as the predicted label
    output = generator(prompt, max_new_tokens=3)[0]['generated_text']
    predictions.append(output.strip().lower())

# Step 2: Map predicted sentiment labels to numeric values
# 'positive...' -> 1, anything else -> 0
predicted_sentiment = [1 if label.startswith('positive') else 0 for label in predictions]

# Step 3: Evaluate the performance (precision, recall, f1, accuracy)
# We assume that the true sentiment labels are stored in the 'Sentiment' column of the DataFrame
true_sentiment = final_reviews['Sentiment'].tolist()

# Calculate evaluation metrics
precision = precision_score(true_sentiment, predicted_sentiment)
recall = recall_score(true_sentiment, predicted_sentiment)
f1 = f1_score(true_sentiment, predicted_sentiment)
accuracy = accuracy_score(true_sentiment, predicted_sentiment)

# Print evaluation results
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")
print(f"Accuracy: {accuracy:.4f}")


Device set to use cpu


Precision: 0.0000
Recall: 0.0000
F1 Score: 0.0000
Accuracy: 0.5000


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
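The recorded scores above follow from the labels rather than from the prompt. `facebook/bart-large-mnli` is an NLI model, so a `text-classification` pipeline on it returns the labels `contradiction`, `neutral`, and `entailment`; a mapping that turns only the literal string `'positive'` into 1 therefore predicts 0 for every review. On 50 positive and 50 negative reviews that gives 0.5 accuracy, zero recall, and an undefined precision, which scikit-learn reports as 0 along with the `_warn_prf` warning. The plain-Python sketch below reproduces this, using an arbitrary mix of NLI labels as stand-in predictions.

```python
# Labels an NLI head such as facebook/bart-large-mnli can emit
nli_labels = ["contradiction", "neutral", "entailment"]

true_sentiment = [1] * 50 + [0] * 50  # balanced set, as in this lab
raw_predictions = [nli_labels[i % 3] for i in range(100)]  # any mix of NLI labels

# Only the literal string 'positive' becomes 1, so nothing ever does
predicted = [1 if label == "positive" else 0 for label in raw_predictions]

tp = sum(1 for t, p in zip(true_sentiment, predicted) if t == 1 and p == 1)
accuracy = sum(t == p for t, p in zip(true_sentiment, predicted)) / len(true_sentiment)
print(sum(predicted), tp, accuracy)  # 0 0 0.5
```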


In [None]:
import pandas as pd
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

# Download VADER lexicon (only required once)
nltk.download('vader_lexicon')

# Load data
newfile_path = r'/content/final_reviews.csv'
df = pd.read_csv(newfile_path)

# Few-shot examples kept for reference. VADER is a lexicon-based scorer and
# does not use labeled examples, so this cell is effectively zero-shot.
few_shot_examples = [
    {"text": "Very friendly staff, great food.  My wife loved her pho.  Everything is fresh. Can't wait to come back.", "Sentiment": "positive"},
    {"text": "I don't know if I just had bad luck ordering tamales that day  or this place lacks quality control of their products they offer, but this was a terrible disgusting experience. I guess you get what you pay for.", "Sentiment": "negative"},
    {"text": "Food was really great the nachos were delicious ! & the tacos were great too ! And the margaritas! The girls at the bar were really nice !", "Sentiment": "positive"},
    {"text": "This is a no-go! Service was the worst I've ever experienced. The manager was extremely rude and condescending. I would definitely not recommend!", "Sentiment": "negative"}
]

# Initialize Sentiment Analyzer
sia = SentimentIntensityAnalyzer()

# Shuffle the reviews (sampling 100 of the 100 rows keeps every review)
test_samples = df.sample(100, random_state=42)

def classify_sentiment(text):
    """Classifies sentiment using the VADER compound score."""
    score = sia.polarity_scores(text)["compound"]
    if score >= 0.05:
        return "positive"
    elif score <= -0.05:
        return "negative"
    else:
        return "neutral"

# Apply classification
test_samples["Predicted_Sentiment"] = test_samples["text"].apply(classify_sentiment)

# Evaluate performance on a common 0/1 scale.
# 'Sentiment' holds numbers (0/1, or 0.0/1.0 if it was saved as float) while the
# predictions are words, so map both sides to integers before comparing.
# Assumption: 'neutral' predictions are counted as negative (0).
label_map = {"positive": 1, "negative": 0, "neutral": 0}
y_true = test_samples["Sentiment"].astype(int)
y_pred = test_samples["Predicted_Sentiment"].map(label_map)

precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average='binary')
accuracy = accuracy_score(y_true, y_pred)

print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-score: {f1:.2f}")
print(f"Accuracy: {accuracy:.2f}")


[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


Precision: 0.00
Recall: 0.00
F1-score: 0.00
Accuracy: 0.00


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
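The 0.00 scores from the VADER cell come from comparing two different label vocabularies: after `.astype(str)` the gold labels are numeric strings (`'1.0'`/`'0.0'` when the `Sentiment` column was created with a NaN fallback, which makes it float), while `classify_sentiment` returns words, so no pair can ever match. The toy lists below are made up for illustration; they show the mismatch and one way to fix it, under the assumption that `neutral` is counted as negative.

```python
# Gold labels as the VADER cell sees them after .astype(str): numeric strings
y_true = ["1.0", "0.0", "1.0", "0.0"]
# Predictions as classify_sentiment returns them: words
y_pred_words = ["positive", "negative", "neutral", "negative"]

# Before: the two label vocabularies never overlap, so accuracy is 0
before = sum(t == p for t, p in zip(y_true, y_pred_words)) / len(y_true)

# After: put both sides on the same 0/1 scale.
# Assumption: 'neutral' is counted as negative (0) to keep the task binary.
word_to_int = {"positive": 1, "negative": 0, "neutral": 0}
y_true_int = [int(float(t)) for t in y_true]
y_pred_int = [word_to_int[p] for p in y_pred_words]
after = sum(t == p for t, p in zip(y_true_int, y_pred_int)) / len(y_true_int)

print(before, after)  # 0.0 0.75
```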


**Experiment with Multiple LLMs**

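Select two distinct LLMs (e.g., Claude, LLaMA) for sentiment analysis.
Design prompts to use both LLMs for sentiment analysis.
Display the output.
The cell below compares two lexicon-based analyzers, VADER and TextBlob, rather than LLMs; neither of them takes a prompt.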
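One way to structure a two-model comparison is to treat each model as a function from review text to label and record where the models disagree. In the sketch below, `compare_models`, `model_a`, and `model_b` are hypothetical names; the two keyword stubs only stand in for real Claude or LLaMA calls, whose client code is not shown in this notebook. The first review text is an excerpt from the sample output below, the second is made up.

```python
from typing import Callable, Dict, List


def compare_models(reviews: List[str], models: Dict[str, Callable[[str], str]]) -> List[dict]:
    """Run every model on every review and record whether they all agree."""
    rows = []
    for review in reviews:
        row = {"text": review}
        for name, classify in models.items():
            row[name] = classify(review)
        row["agree"] = len({row[name] for name in models}) == 1
        rows.append(row)
    return rows


# Hypothetical stand-ins for two LLM calls; replace with real API or pipeline calls
def model_a(text: str) -> str:
    return "Positive" if "good" in text.lower() else "Negative"


def model_b(text: str) -> str:
    return "Negative" if "wait" in text.lower() else "Positive"


reviews = ["Had a good lunch here.", "Good food but a 2 hour wait."]
results = compare_models(reviews, {"model_a": model_a, "model_b": model_b})
for r in results:
    print(r)
```

Listing the disagreements is usually the quickest way to find the reviews (mixed sentiment, sarcasm) worth inspecting by hand when comparing two models.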

In [None]:
!pip install vaderSentiment
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

# Load data
newfile_path = r'/content/final_reviews.csv'
df = pd.read_csv(newfile_path)



# Select a sample of reviews
sample_reviews = df[['review_id', 'text']].sample(5, random_state=42)

# Initialize Sentiment Analyzers
vader_analyzer = SentimentIntensityAnalyzer()

def vader_sentiment_analysis(text):
    """Perform sentiment analysis using VADER."""
    score = vader_analyzer.polarity_scores(text)['compound']
    if score >= 0.05:
        return "Positive"
    elif score <= -0.05:
        return "Negative"
    else:
        return "Neutral"

def textblob_sentiment_analysis(text):
    """Perform sentiment analysis using TextBlob."""
    polarity = TextBlob(text).sentiment.polarity
    if polarity > 0:
        return "Positive"
    elif polarity < 0:
        return "Negative"
    else:
        return "Neutral"

# Apply sentiment analysis
results = []
for _, row in sample_reviews.iterrows():
    vader_sentiment = vader_sentiment_analysis(row['text'])
    textblob_sentiment = textblob_sentiment_analysis(row['text'])
    results.append((row['review_id'], row['text'], vader_sentiment, textblob_sentiment))

# Display results
df_results = pd.DataFrame(results, columns=["Review ID", "Text", "VADER Sentiment", "TextBlob Sentiment"])
print(df_results)


Collecting vaderSentiment
  Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl.metadata (572 bytes)
Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl (125 kB)
[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/126.0 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m126.0/126.0 kB[0m [31m3.4 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: vaderSentiment
Successfully installed vaderSentiment-3.3.2
                Review ID                                               Text  \
0  2m_eAo2x08IB1gO6F5lk0Q  Went to this restaurant with my family and fri...   
1  zvtAmCpAPpFXht61gzwmZQ  Holly shit 2 hours and our food isn't here yet...   
2  yhnJYZYHBZ688IBOPNtkew  First and last time. Decided to try something ...   
3  -tFUJNwa2TmnjdWbR48ujA  We have been wanting to try out this place aft...   
4  RSnSN3NcMQzc5c5bxbHKOQ  Had a good lunch here, had the Cajun chicken p...   

  VADER Sentiment TextBl

**Discussion and Observations**

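Few-shot learning uses labeled examples in the prompt to guide predictions. Because the examples show the model both the task and the expected labels, few-shot prompting is generally expected to be better tailored to the task than zero-shot prompting and to make fewer errors on reviews that resemble the examples. The two runs recorded in the few-shot section of this notebook (accuracy 0.50 and 0.00) do not show this improvement over the zero-shot run (accuracy 0.96); both results stem from mismatches between the predicted labels and the 0/1 gold labels rather than from the prompting strategy itself.

Misclassifications in sentiment analysis commonly arise from ambiguity, sarcasm, domain-specific jargon, long reviews that mix praise and complaints, and negations.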
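Negation is the easiest of these causes to illustrate. The sketch below uses a made-up four-word lexicon: a scorer that simply sums word polarities rates "the food was not good" as positive, while flipping the polarity of a word that directly follows a negator fixes this case. It is a toy heuristic for illustration only, not how VADER or an LLM actually handles negation.

```python
# Tiny hypothetical lexicon; real lexicons such as VADER's are far larger
LEXICON = {"good": 1, "great": 1, "bad": -1, "awful": -1}
NEGATORS = {"not", "never", "no"}


def naive_score(text: str) -> int:
    """Sum word polarities with no context (the failure mode)."""
    return sum(LEXICON.get(w, 0) for w in text.lower().split())


def negation_aware_score(text: str) -> int:
    """Flip a word's polarity when the previous word is a negator."""
    words = text.lower().split()
    total = 0
    for i, w in enumerate(words):
        polarity = LEXICON.get(w, 0)
        if i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity
        total += polarity
    return total


review = "the food was not good"
print(naive_score(review), negation_aware_score(review))  # 1 -1
```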

**Acknowledgement**

Generative AI tools were used to help fix errors in the Python code.

In [None]:
!pip install jupyter
!pip install nbconvert

from google.colab import drive
drive.mount('/content/drive')

# Modify the notebook path and output path to store in the same folder
notebook_path = '/content/drive/MyDrive/Colab Notebooks/LA6_AsthaGarima.ipynb'
output_path = notebook_path.replace('.ipynb', '.html')

# Convert the notebook to HTML and save in the same folder
!jupyter nbconvert "{notebook_path}" --to html --output "{output_path}"

# Optionally, download the file (if needed)
# from google.colab import files
# files.download(output_path)

!ls /content


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
[NbConvertApp] Converting notebook /content/drive/MyDrive/Colab Notebooks/LA6_AsthaGarima.ipynb to html
Traceback (most recent call last):
  File "/usr/local/bin/jupyter-nbconvert", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/jupyter_core/application.py", line 283, in launch_instance
    super().launch_instance(argv=argv, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/traitlets/config/application.py", line 992, in launch_instance
    app.start()
  File "/usr/local/lib/python3.11/dist-packages/nbconvert/nbconvertapp.py", line 420, in start
    self.convert_notebooks()
  File "/usr/local/lib/python3.11/dist-packages/nbconvert/nbconvertapp.py", line 597, in convert_notebooks
    self.convert_single_notebook(notebook_filename)
  File "/usr/local/lib/python3.11/dist-packages/nbconvert/nbc