# Fine-Tuning OpenAI API GPT Models for Sentiment Analysis With Weights & Biases

### https://wandb.ai/mostafaibrahim17/ml-articles/reports/Fine-Tuning-ChatGPT-for-Sentiment-Analysis-With-W-B--Vmlldzo1NjMzMjQx 

This notebook explores fine-tuning OpenAI GPT models for sentiment analysis using Weights & Biases. The experiment improves overall accuracy over the base model, and we close with practical applications. Sentiment analysis plays a central role in gauging public opinion on a wide range of topics, and large language models such as the GPT family offer considerable potential for interpreting emotion in text. However, like many tools, their out-of-the-box capabilities might not capture the nuances of sentiment, especially in diverse datasets like those from Reddit. This article walks through the process of fine-tuning a GPT model for sentiment analysis, uses the Weights & Biases platform to track the work, and discusses the improvements and challenges encountered.

# Table of Contents

- How Can a GPT Model Be Used for Sentiment Analysis?
- Fine-Tuning GPT Models for Sentiment Analysis
- Data Preparation and Labeling
  - The Current Data Set at Hand
  - Data Augmentation Sentiment Analysis Dataset for Fine-Tuning
  - The Importance of High-Quality Training Data for Sentiment Analysis
- Step-by-Step Tutorial
  - Evaluating the Old Model’s Performance
  - Fine-Tuning the GPT Model
  - Evaluating the New Model’s Performance
- Fine-Tuning Results and Analysis
- Practical Applications and Use Cases
  - Jargon and Slang Understanding
  - E-Commerce Product Reviews
- Further Improvements
- Conclusion


## How Can a GPT Model Be Used for Sentiment Analysis?

A GPT model's ability to understand natural language makes it a good fit for sentiment analysis. Unlike traditional chatbots that rely on predefined responses, GPT models generate answers in real time based on a vast amount of training data. This lets them produce responses that are contextually relevant and informed by a broad range of information.


## Fine-Tuning GPT Models for Sentiment Analysis

Fine-tuning is a pivotal step in adapting a general-purpose model, such as a GPT model, to a specific task such as sentiment analysis. A GPT model, with its broad language understanding, can handle a vast array of topics and concepts. However, sentiment analysis is more than just comprehending text; it requires a nuanced understanding of subjective tones, moods, and emotions.
<br/><br/>
Think about sarcasm. Understanding sarcasm is tricky, even for humans sometimes. Sarcasm is when we say something but mean the opposite, often in a joking or mocking way. For example, if it starts raining just as you're about to go outside, and you say, "Oh, perfect timing!" you're probably being sarcastic because it's actually bad timing. Now, imagine a machine trying to understand this. Without special training, it might think you're genuinely happy about the rain because you said "perfect." This is where fine-tuning a GPT model becomes crucial.
<br/><br/>
A GPT model, out of the box, is pretty good at understanding a lot of text. It has read more than most humans ever will. But sarcasm is subtle and often needs context. So, to make a GPT model really get sarcasm, we'd expose it to many examples of sarcastic sentences until it starts catching on to the patterns. But here's the catch: sarcasm doesn't look the same everywhere. What's sarcastic in one culture or situation might be meant seriously in another. That's why general knowledge alone isn't enough. The model needs specific examples to truly grasp the playful twists and turns of sarcasm. In short, to make a GPT model understand sarcasm like a human, it needs extra training on it, just like someone might need to watch several comedy shows to start understanding a comedian's sense of humor.

## Data Preparation and Labeling

### The Current Data Set at Hand

In this notebook, we'll use a Reddit dataset sourced from Kaggle, available here: https://www.kaggle.com/datasets/cosmos98/twitter-and-reddit-sentimental-analysis-dataset. The dataset has two key columns: `clean_comment` (the text to classify) and `category` (the sentiment label).
<br/><br/>
The file contains about 37k comments along with their sentiment labels: `1` (positive), `0` (neutral), and `-1` (negative). All comments in the dataset have been cleaned and assigned a label, so the dataset can be used to build a sentiment analysis machine learning model.

### Data Augmentation Sentiment Analysis Dataset for Fine-Tuning

Note that OpenAI's fine-tuning process requires the training data in a specific structure, stored in a JSONL file.

#### What is a JSONL File?

A `.jsonl` file (short for JSON Lines) is a file format used to store structured data, typically for machine learning and data processing applications. Each line in a `.jsonl` file is a separate, self-contained JSON object. This makes it particularly useful for handling large datasets that can be processed line-by-line, avoiding loading everything into memory at once.

##### Key Features of JSONL Format:
- **One JSON Object Per Line:** Each line in the file is an independent JSON object.
- **Line-Delimited:** The objects are separated by newlines (`\n`), not by commas or brackets as in standard JSON.
- **Efficient Parsing:** Line-by-line processing is easy and efficient, which is helpful when working with large datasets.
- **No Root Structure:** Unlike regular JSON, there is no outer array or object enclosing the entire dataset.

##### Example of a JSONL File:
```json
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}
```
##### Common Use Cases:
- **Training Data for Machine Learning Models:** Frequently used in NLP tasks where each line contains an individual record (e.g., a sentence with a label).
- **Log Data Storage:** Each log entry is a separate JSON object.
- **Streaming Data Processing:** Ideal for scenarios where you process data incrementally.

##### How to Work with JSONL:
- **Reading and Writing:** In Python, you can use the `json` or `jsonlines` library to read and write JSONL files.
- **Tools:** Many tools like `jq`, Pandas, and other data processing libraries support the JSONL format.
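
As a minimal sketch of the reading and writing step, the following uses only the standard-library `json` module. The helper names `write_jsonl` and `read_jsonl` are illustrative and are not part of this notebook or of any library.

```python
import json
from pathlib import Path


def write_jsonl(path, records):
    """Write an iterable of dicts to `path`, one JSON object per line."""
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Yield one parsed object per non-empty line, so the file is never fully in memory."""
    with Path(path).open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate stray blank lines
                yield json.loads(line)
```

Because `read_jsonl` is a generator, a large file can be consumed with a plain `for record in read_jsonl(path)` loop without building a full list first.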


### The Importance of High-Quality Training Data for Sentiment Analysis

High-quality training data is pivotal for sentiment analysis because it lets the model learn to distinguish nuances in emotion accurately. Poor data can lead to misinterpretations, reducing the effectiveness of the analysis. Moreover, comprehensive and well-curated data can significantly boost the model's ability to generalize across diverse real-world scenarios. The dataset we're using underscores this point: some of its entries are so nuanced that even humans might struggle to discern their sentiment.
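
One way to get a rough feel for data quality before training is a quick audit of label balance, empty texts, and texts that appear with conflicting labels. The sketch below is only an illustration: the function name `audit_labeled_data` and the sample rows are made up, and it assumes each text is a string or `None`.

```python
from collections import Counter, defaultdict


def audit_labeled_data(rows):
    """Summarize label balance and flag texts that appear with conflicting labels.

    `rows` is an iterable of (text, label) pairs.
    """
    label_counts = Counter()
    labels_by_text = defaultdict(set)
    empty = 0
    for text, label in rows:
        normalized = (text or "").strip().lower()
        if not normalized:
            empty += 1  # nothing to learn from an empty comment
            continue
        label_counts[label] += 1
        labels_by_text[normalized].add(label)
    conflicts = sorted(t for t, labels in labels_by_text.items() if len(labels) > 1)
    return {"label_counts": dict(label_counts), "empty": empty, "conflicts": conflicts}


# Hypothetical rows, only to show the shape of the report
sample = [("love it", 1), ("love it", -1), ("fine", 0), ("  ", 0), ("awful", -1)]
report = audit_labeled_data(sample)
print(report)
```

With the notebook's DataFrame, the pairs could be supplied as `zip(df["clean_comment"], df["category"])`.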

## Step-by-Step Tutorial

### Evaluating the Old Model's Performance

#### Step 1: Installing and Importing Necessary Packages

In [1]:
# If you haven't already installed the required libraries, you can do so by running the following commands:
# !pip install openai
# !pip install wandb

In [16]:
# import our libraries
import os
import openai
import wandb
import pandas as pd
import json
import random
from pathlib import Path  # Handles filesystem paths in an object-oriented way
from openai import OpenAI


#### Step 2: Creating our client

In [3]:
# Create our client with the API key read from the environment
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

#### Step 3: Loading and Processing the Sentiment Analysis Dataset

In [4]:
# Put the file path in a variable
filename = "./practical_data/reddit_data.csv"

# Read the CSV
df = pd.read_csv(filename)

# Drop rows with NaN values in 'clean_comment' and 'category'
df.dropna(subset=['clean_comment', 'category'], inplace=True)


# Show the first 10 rows
print(df.head(10))


                                       clean_comment  category
0   family mormon have never tried explain them t...         1
1  buddhism has very much lot compatible with chr...         1
2  seriously don say thing first all they won get...        -1
3  what you have learned yours and only yours wha...         0
4  for your own benefit you may want read living ...         1
5  you should all sit down together and watch the...        -1
6   was teens when discovered zen meditation was ...         1
7                           jesus was zen meets jew          0
8  there are two varieties christians dogmatic th...        -1
9  dont worry about trying explain yourself just ...         1


#### Step 4: Initializing a New Weights & Biases Project

Now for something new. We will create a new WandB project and start a run to track our progress.



In [5]:
# Set the environment variable for the notebook name
os.environ["WANDB_NOTEBOOK_NAME"] = "fine_tune_openai_sentiment_wandb.ipynb"

# Create a new Weights and Biases project
wandb.init(project="Reddit_Sentiment_Analysis")

wandb: Currently logged in as: suspicious-cow (suspicious-cow-self). Use `wandb login --relogin` to force relogin


#### Step 5: Take a Sample to Test the Model

In [6]:
# Grab a sample of 100 rows
df_sample = df.sample(100)

#### Step 6: Define Helper Functions to Convert Model Response to Sentiment Value and Vice Versa

In [7]:
def convert_response_to_sentiment(response):
    response = response.lower()
    if 'positive' in response:
        return 1
    elif 'negative' in response:
        return -1
    elif 'neutral' in response:
        return 0
    else:
        return None  # Unknown sentiment: must not be counted as negative
    
def convert_numeric_to_string_sentiment(value):
    if value == 1:
        return "positive"
    elif value == -1:
        return "negative"
    elif value == 0:
        return "neutral"
    else:
        return "unknown"
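
The substring checks above pick the first label they find, so a reply that names several labels (or none) can be misread. A stricter variant, sketched here under the assumption that ambiguous replies should be treated as unknown, accepts a reply only when exactly one label appears as a whole word. The name `parse_sentiment_strict` is hypothetical.

```python
import re

LABEL_TO_VALUE = {"positive": 1, "negative": -1, "neutral": 0}
_LABEL_PATTERN = re.compile(r"\b(positive|negative|neutral)\b")


def parse_sentiment_strict(response):
    """Return 1, -1, or 0 when exactly one label appears in the reply, else None."""
    found = set(_LABEL_PATTERN.findall(response.lower()))
    if len(found) != 1:
        return None  # no label, or an ambiguous reply naming several labels
    return LABEL_TO_VALUE[found.pop()]
```

This still does not handle negation such as "not positive", so it is a tighter filter rather than a full parser.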


#### Step 7: Evaluating the Current Model Performance

In [8]:
client = openai.Client()

correct_predictions = 0
loop_count = 0  # Counter for loop iterations
total_rows = len(df_sample)  # Computed once, so it exists even if every request fails

results = []

# Iterate over each row in the sampled DataFrame
for index, row in df_sample.iterrows():
    loop_count += 1  # Increment the loop count
    text = row['clean_comment']

    try:
        completion = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "What is the sentiment of the following text? Please respond with 'positive', 'negative', or 'neutral'."},
                {"role": "user", "content": text},
            ]
        )
        response = completion.choices[0].message.content
        predicted_sentiment = convert_response_to_sentiment(response)

        # Keep the text, the dataset label, and the raw model reply for later comparison
        results.append({
            "sentiment": text,
            "labeled_prediction": convert_numeric_to_string_sentiment(row['category']),
            "old_model_prediction": response
        })

        # Check if the predicted sentiment matches the actual sentiment
        if predicted_sentiment == row['category']:
            correct_predictions += 1

        # Print the current progress using loop_count
        print(f"Processed {loop_count}/{total_rows} rows.")

    except Exception as e:
        # A failed request is skipped, so it counts as incorrect in the accuracy below
        print(f"Error on index {index}: {e}")
        continue


Processed 1/100 rows.
Processed 2/100 rows.
Processed 3/100 rows.
...
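
The loop above skips a row whenever a request raises, and a skipped row counts as incorrect. Transient failures such as rate limits can often be retried instead. The following is a minimal sketch of exponential backoff with jitter; `call_with_retries` and its parameters are hypothetical names, and catching every `Exception` is a simplification that a real run would probably narrow.

```python
import random
import time


def call_with_retries(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `fn()` and retry on any exception with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay / 2))
```

In the loop, the request could then be wrapped as `call_with_retries(lambda: client.chat.completions.create(...))`, keeping the existing `try`/`except` as the final fallback.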

#### Step 8: Calculating the Model Accuracy

In [9]:
accuracy = (correct_predictions / total_rows) * 100

print(f"Accuracy: {accuracy}%")

Accuracy: 45.0%
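
A single accuracy number hides which classes the model gets wrong. As a rough sketch, per-class accuracy can be computed from (actual, predicted) pairs. The function name `per_class_accuracy` and the sample pairs are illustrative, and the sketch assumes the pairs were collected in the evaluation loop, for example from `row['category']` and `predicted_sentiment`.

```python
from collections import Counter


def per_class_accuracy(pairs):
    """Return {label: fraction correct} from (actual, predicted) pairs."""
    totals, correct = Counter(), Counter()
    for actual, predicted in pairs:
        totals[actual] += 1
        if actual == predicted:
            correct[actual] += 1
    return {label: correct[label] / totals[label] for label in totals}


# Hypothetical labels, only to show the shape of the output
pairs = [(1, 1), (1, 0), (0, 0), (-1, 1), (-1, -1), (-1, -1)]
breakdown = per_class_accuracy(pairs)
print(breakdown)
```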


#### Step 9: Logging the Accuracy to WandB

In [10]:
wandb.log({"Old Accuracy": accuracy})
print(f'Model Accuracy before: {accuracy:.2f}%')


Model Accuracy before: 45.00%
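
With only 100 sampled rows, the measured accuracy carries noticeable sampling error, which matters when comparing it with the fine-tuned model later. A rough way to quantify this is a normal-approximation confidence interval. The helper `accuracy_interval` below is a hypothetical sketch, and the approximation is coarse for small samples or accuracies near 0% or 100%.

```python
import math


def accuracy_interval(correct, total, z=1.96):
    """Normal-approximation confidence interval for an accuracy, as fractions."""
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# 45 correct out of 100 corresponds to the 45% accuracy reported above
low, high = accuracy_interval(45, 100)
print(f"95% interval: {low:.1%} to {high:.1%}")
```

The interval spans roughly ten percentage points either side of the estimate, so small differences between two 100-row evaluations should be read with caution.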


### Fine-Tuning the GPT Model

#### Step 10: Converting the Dataframe to JSONL format

In [11]:
output_filename = "reddit_sentiment_data.jsonl"


# Map each numeric category to its string label
label_map = {0: 'neutral', 1: 'positive', -1: 'negative'}
system_prompt = "What is the sentiment of the following text? Please respond with 'positive', 'negative', or 'neutral'."

# Convert DataFrame to the desired JSONL format
with open(output_filename, "w", encoding="utf-8") as file:
    for _, row in df.iterrows():
        target_label = label_map.get(row['category'], 'unknown')

        # One training example: system prompt, the comment, and the expected answer
        data = {
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": row['clean_comment']},
                {"role": "assistant", "content": target_label},
            ]
        }

        # Write each data point as a separate line in the JSONL file
        file.write(json.dumps(data) + "\n")
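
Before uploading, it can help to confirm that every line of the file has the expected chat structure. The checker below is a simplified sketch, not OpenAI's own validation: the name `validate_chat_jsonl` is hypothetical, and it assumes only the three roles used in this notebook and that each example ends with the assistant's answer.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_jsonl(path):
    """Return a list of (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                problems.append((line_number, "not valid JSON"))
                continue
            messages = record.get("messages") if isinstance(record, dict) else None
            if not isinstance(messages, list) or not messages:
                problems.append((line_number, "missing 'messages' list"))
                continue
            if any(not isinstance(m, dict) or m.get("role") not in ALLOWED_ROLES
                   or not isinstance(m.get("content"), str) for m in messages):
                problems.append((line_number, "message with bad role or non-string content"))
            elif messages[-1]["role"] != "assistant":
                problems.append((line_number, "last message is not from the assistant"))
    return problems
```

Calling `validate_chat_jsonl(output_filename)` and checking that the result is empty gives a cheap sanity check before any upload.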


#### Step 10b: Create the Train Test Split

In [12]:
# Train / Test Split Function for JSONL Files
def split_jsonl_file(file_path, train_ratio=0.8):
    # Read the input file
    file_path = Path(file_path)
    with file_path.open('r', encoding='utf-8') as f:
        data = [json.loads(line) for line in f]
    
    # Shuffle the data
    random.shuffle(data)
    
    # Calculate split index
    split_index = int(len(data) * train_ratio)
    
    # Split the data
    train_data = data[:split_index]
    test_data = data[split_index:]
    
    # Prepare output file paths
    train_file = file_path.with_name(f"{file_path.stem}_train{file_path.suffix}")
    test_file = file_path.with_name(f"{file_path.stem}_test{file_path.suffix}")
    
    # Write train data
    with train_file.open('w', encoding='utf-8') as f:
        for item in train_data:
            json.dump(item, f)
            f.write('\n')
    
    # Write test data
    with test_file.open('w', encoding='utf-8') as f:
        for item in test_data:
            json.dump(item, f)
            f.write('\n')
    
    print(f"Train data saved to: {train_file}")
    print(f"Test data saved to: {test_file}")
    print(f"Train set size: {len(train_data)}")
    print(f"Test set size: {len(test_data)}")
    
    return train_file, test_file

In [13]:
# Make our train/test files
# File paths and data processing
file_path = output_filename

# Split the JSONL file into train and test sets
train_test_files = split_jsonl_file(file_path)
print("\n")  # Print a blank line for better output readability

# Convert the returned file paths to strings
train_path, test_path = [str(file) for file in train_test_files]

# Print the paths of the resulting train and test files
print(f"Train file path: {train_path}")
print(f"Test file path: {test_path}")

Train data saved to: reddit_sentiment_data_train.jsonl
Test data saved to: reddit_sentiment_data_test.jsonl
Train set size: 29719
Test set size: 7430


Train file path: reddit_sentiment_data_train.jsonl
Test file path: reddit_sentiment_data_test.jsonl
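
The split above shuffles all records together without a seed, so the label proportions in the train and test sets can drift and the split changes on every run. An alternative, sketched here as an assumption rather than as the notebook's method, is a seeded stratified split that divides each label group separately. The name `stratified_split` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(records, label_of, train_ratio=0.8, seed=42):
    """Split records so each label keeps roughly `train_ratio` of its items in train."""
    rng = random.Random(seed)  # local generator: reproducible, no global state touched
    by_label = defaultdict(list)
    for record in records:
        by_label[label_of(record)].append(record)
    train, test = [], []
    for label in sorted(by_label, key=str):  # fixed order keeps the result deterministic
        group = by_label[label]
        rng.shuffle(group)
        cut = int(len(group) * train_ratio)
        train.extend(group[:cut])
        test.extend(group[cut:])
    rng.shuffle(train)  # avoid long runs of a single label in the output
    rng.shuffle(test)
    return train, test
```

For the chat-format records in this notebook, the label can be read from the final assistant message with `label_of=lambda r: r["messages"][-1]["content"]`.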


#### Step 11: Upload the files to OpenAI

In [14]:
# Upload the training data to the OpenAI API
# (the with-block closes the file handle once the upload call returns)
with open(train_path, "rb") as f:
    train__set_file = client.files.create(file=f, purpose="fine-tune")

# Upload the test data to the OpenAI API (used as the validation set)
with open(test_path, "rb") as f:
    test_set_file = client.files.create(file=f, purpose="fine-tune")

#### Step 12: Create a Fine-Tuning Job

In [15]:
# Create a fine-tuning job using the uploaded training data
wandb_params_ft_job = client.fine_tuning.jobs.create(
    model="gpt-3.5-turbo",  # Base model to be fine-tuned
    training_file=train__set_file.id,  # ID of the uploaded training data file
    validation_file=test_set_file.id,  # ID of the uploaded validation (test) data file
    hyperparameters={
        "batch_size": "auto",  # Let API automatically determine batch size
        "learning_rate_multiplier": "auto",  # Auto-set learning rate multiplier
        "n_epochs": "auto",  # Automatically decide number of training epochs
    },
    suffix="reddit_sentiment",  # Append this to the fine-tuned model's name
    integrations=[
        {
            "type": "wandb",
            "wandb": {
                "project": "Reddit_Sentiment_Analysis",  # Replace with your actual project name
                "name": "Reddit_Sentiment_Analysis_Run_001",  # Optional: Replace with your desired run name or remove
                "entity": "suspicious-cow-self",  # Optional: Replace with your entity or remove
                "tags": ["reddit", "sentiment"]  # Optional: Replace with your desired tags or remove
            }
        }
    ],
    seed=None,  # No fixed seed; pass an integer here for reproducible runs
)
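
Creating the job only queues it; training runs asynchronously on OpenAI's side. A simple way to wait for completion is to poll the job status until it reaches a terminal state. The helper below is a hedged sketch: `wait_for_job`, `poll_seconds`, and `max_polls` are hypothetical names, and the status fetch is passed in as a callable so the loop itself stays independent of the API client.

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(fetch_status, poll_seconds=60, max_polls=1000, sleep=time.sleep):
    """Poll `fetch_status()` until it returns a terminal status, then return it."""
    for _ in range(max_polls):
        status = fetch_status()
        print(f"Job status: {status}")
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("fine-tuning job did not finish within the polling budget")
```

With the job created above, the callable could be `lambda: client.fine_tuning.jobs.retrieve(wandb_params_ft_job.id).status`. Once the status is `succeeded`, the retrieved job's `fine_tuned_model` field holds the name of the new model.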

### Evaluating the New Model's Performance

#### Step 13a: Looking at the metrics from WandB

We will do manual calculations later for fun but, for now, let's look at the data from WandB. There are two ways you can do this:
1. Through the WandB website
2. Through code
<br/><br/>
We will do both of these. 

First, for the website, go to https://wandb.ai/suspicious-cow-self/projects (substitute your own entity name) and click on the Reddit_Sentiment_Analysis project. This should automatically show you the results from the latest run in graphical form.

Second, let's manually compute rough statistics.