<a href="https://colab.research.google.com/github/j-hartmann/llm-sentiment-analysis/blob/main/Sentiment_Analysis_in_the_Age_of_Generative_AI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Introduction



This Google Colab notebook provides ready-to-use Python code for sentiment analysis with large language models (LLMs). We use the OpenAI API with GPT-4 as an example, but you can choose any other OpenAI model or slightly adapt the code to use other APIs (e.g., Replicate for Llama 2). We provide example code for:

*   Binary zero-shot sentiment analysis
*   Three-class zero-shot sentiment analysis
*   Few-shot sentiment analysis

The results are stored in an Excel file. Depending on the model you use, it may be useful to adjust the prompt slightly for optimal results.



### Data Preparation

This Colab is designed to use an Excel file with two columns as input: a 'Review_ID' column as the identifier and a 'Review' column containing the review texts for sentiment analysis. Please structure and name your input file accordingly.
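
A quick check of the column names before calling the API can catch a misnamed input file early. The following is a minimal sketch; the helper name `missing_columns` is hypothetical and not part of the original notebook.

```python
# Hypothetical helper (not part of the original notebook): report which of the
# required input columns are absent from a collection of column names.
REQUIRED_COLUMNS = ("Review_ID", "Review")

def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that do not appear in `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]

# Example with plain lists; column names are case-sensitive.
print(missing_columns(["Review_ID", "Review"]))   # []
print(missing_columns(["review_id", "Review"]))   # ['Review_ID']
```

With pandas, you could pass `df.columns` after loading the file and stop if the returned list is not empty.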

### Binary zero-shot sentiment analysis

In [None]:
# Install necessary libraries and import modules for API access and data handling
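# You can change the version of OpenAI based on the models you want to use
# (the openai.ChatCompletion interface used below requires a version before 1.0)
# xlrd reads legacy .xls files; openpyxl reads and writes .xlsx files
!pip install openai==0.27.8 pandas xlrd openpyxl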
import os
import openai
import pandas as pd
import time
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load your OpenAI API key
openai.api_key = "Your_API_Key"

# Load the dataset
df = pd.read_excel('Your_Excel_File')

# Convert 'Review_ID' to string
df['Review_ID'] = df['Review_ID'].astype(str)

# Extract ID and Review text
review_ids = df['Review_ID'].tolist()
total_reviews = df['Review'].tolist()

# Optional: Split the data into batches as some APIs implement rate limits for API calls
batch_size = 10  # Define your batch size
review_batches = [total_reviews[i:i + batch_size] for i in range(0, len(total_reviews), batch_size)]
review_id_batches = [review_ids[i:i + batch_size] for i in range(0, len(review_ids), batch_size)]

sentiment_results = []

for review_id_batch, review_batch in zip(review_id_batches, review_batches):
    # Prepare the prompt
    reviews = '\n'.join(f'{review_id}: "{review}"' for review_id, review in zip(review_id_batch, review_batch))
    prompt = f"Classify the sentiment in these reviews as positive or negative:\n{reviews}"

    while True:
        try:
            # Call the OpenAI API
            response = openai.ChatCompletion.create(
                model="gpt-4",
                temperature=0,  # a temperature of 0 makes the output near-deterministic
                messages=[
                    {"role": "system", "content": prompt},
                    {"role": "user", "content": ""}
                ]
            )
            break  # the API call succeeded, so leave the retry loop
        except openai.error.RateLimitError:
            # Rate limit hit: wait longer before retrying
            print("Rate limit exceeded. Waiting for 60 seconds.")
            time.sleep(60)
        except Exception as e:
            print(f"Unexpected error: {e}. Waiting for 5 seconds before retrying.")
            time.sleep(5)

    # Process response text
    response_text = response['choices'][0]['message']['content']
    response_lines = response_text.split("\n")

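    for line in response_lines:
        if not line.strip():
            continue  # skip blank lines in the response
        try:
            # Split on the first colon only, so a colon in the label text does not break parsing
            review_id, sentiment = line.split(":", 1)
            sentiment_results.append((review_id.strip(), sentiment.lower().strip()))
        except ValueError:
            print(f"Unexpected format: {line}")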

# Create a DataFrame from the sentiment results
df_results = pd.DataFrame(sentiment_results, columns=["Review_ID", "Sentiment"])

# Save DataFrame to Excel
df_results.to_excel('Your_File_Name.xlsx', index=False)
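
Because the model returns free text, a batch may come back with lines missing or IDs altered. A coverage check along the following lines may help spot such gaps before saving. This is a minimal sketch, assuming results are `(review_id, sentiment)` tuples as in `sentiment_results`; the function name `find_missing_ids` is hypothetical.

```python
def find_missing_ids(review_ids, sentiment_results):
    """Return the IDs from `review_ids` that have no entry in `sentiment_results`.

    `sentiment_results` is assumed to be a list of (review_id, sentiment) tuples.
    """
    returned = {str(review_id).strip() for review_id, _ in sentiment_results}
    return [rid for rid in review_ids if str(rid).strip() not in returned]

# Illustrative placeholder data, not real results.
example_ids = ["1", "2", "3"]
example_results = [("1", "positive"), ("3", "negative")]
print(find_missing_ids(example_ids, example_results))  # ['2']
```

Reviews reported as missing could then be sent to the API again in a smaller batch.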

### Three-class zero-shot sentiment analysis

In [None]:
# You can use the same code as in the binary classification task and just replace the prompt
# prompt = f"Classify the sentiment in these tweets as positive, negative, or neutral:\n{reviews}"
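
With three classes, the returned label may carry extra punctuation or wording (for example `Neutral.`). One way to guard against this, sketched here with a hypothetical helper `normalize_label` that is not part of the original notebook, is to map each raw label onto the allowed set and flag anything else.

```python
import string

ALLOWED_LABELS = ("positive", "negative", "neutral")

def normalize_label(raw, allowed=ALLOWED_LABELS):
    """Strip whitespace and surrounding punctuation, lowercase the label,
    and return None if the result is not one of the allowed labels."""
    cleaned = raw.strip().strip(string.punctuation).strip().lower()
    return cleaned if cleaned in allowed else None

print(normalize_label(" Neutral. "))   # neutral
print(normalize_label("mixed"))        # None
```

Labels that come back as `None` can be reviewed manually or resubmitted.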

### Few-shot sentiment analysis

In [None]:
# You can use the same code as in the binary classification task and just replace the prompt.
# Make sure to include the few-shot examples, including the ground truth from your dataset.
# prompt = (
#     f"Classify the sentiment of the following tweets as positive, negative, or neutral:\n{reviews}\n"
#     "Here are a few examples with the sentiment in brackets:\n"
#     "Example 1: ... ('ground truth')\n"
#     "Example 2: ... ('ground truth')\n"
#     "Example 3: ... ('ground truth')\n"
#     "Example 4: ... ('ground truth')\n"
#     "Example 5: ... ('ground truth')\n"
#     "Example 6: ... ('ground truth')\n"
# )
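
The few-shot prompt can also be assembled programmatically from labeled examples. The following is a minimal sketch, assuming the examples are available as `(text, label)` pairs taken from your dataset; the function name `build_few_shot_prompt` and the example texts are hypothetical placeholders.

```python
def build_few_shot_prompt(reviews, examples):
    """Assemble a few-shot classification prompt.

    `reviews` is the already formatted block of 'ID: "text"' lines;
    `examples` is a list of (text, ground_truth_label) pairs.
    """
    example_lines = "\n".join(
        f"Example {i}: {text} ({label})"
        for i, (text, label) in enumerate(examples, start=1)
    )
    return (
        "Classify the sentiment of the following tweets as positive, negative, or neutral:\n"
        f"{reviews}\n"
        "Here are a few examples with the sentiment in brackets:\n"
        f"{example_lines}"
    )

# Placeholder inputs for illustration only; replace with examples from your data.
demo_examples = [("placeholder text A", "positive"), ("placeholder text B", "negative")]
demo_prompt = build_few_shot_prompt('1: "placeholder review"', demo_examples)
print(demo_prompt)
```

The resulting string can replace `prompt` in the binary classification loop without other changes.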
