# Using ChatGPT for Automatic Comment Annotation

In this part we will learn how to use the OpenAI API (ChatGPT) to automatically annotate text.
We start with a single YouTube/Reddit comment, ask the model to return a sentiment label and score, and then extend the same idea to many comments in a DataFrame.

In [None]:
# !pip install --upgrade openai
# !pip install openai llmx typing_extensions
# !pip install --upgrade typing_extensions

## How to Create an OpenAI API Key

1. Go to the OpenAI dashboard:
https://platform.openai.com
2. Sign in with your OpenAI account.
3. Create a project and open "Settings".
4. In the left sidebar, click "API Keys".
5. Click "Create new secret key".
6. Copy the key and store it securely (the dashboard shows the full key only once).
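Pasting the key directly into a notebook makes it easy to leak, for example when the notebook is shared or uploaded. One common alternative is to keep the key in an environment variable and read it at runtime. The sketch below assumes the variable is named `OPENAI_API_KEY`, which is also the variable the `openai` Python client falls back to when `OpenAI()` is created without an `api_key` argument; the helper name `load_api_key` is hypothetical.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in an environment variable (hypothetical helper).

    Fails early with a clear message instead of letting the first API call
    fail later with an authentication error.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "create a key in the OpenAI dashboard and export it first."
        )
    return key
```

With the variable set, `API_KEY = load_api_key()` can be used to set the key before the client is created.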

In [None]:
# Import the required packages
from openai import OpenAI
import pandas as pd
import matplotlib.pyplot as plt

## Single-Comment Sentiment Annotation with ChatGPT

In this section we:

- connect to the OpenAI API using our API key,

- define a system prompt describing how the model should behave, 

- send one comment and check the output.

This is just a sanity check: we want to see whether the model understands the task and whether the returned JSON has the expected structure before we use it on a larger dataset.
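The system prompt asks for JSON, but nothing forces the model to comply: `response.output_text` is just a string. Before scaling up, it helps to parse that string and check it against the expected structure. A minimal sketch, assuming the three keys named in the system prompt; `parse_sentiment_json` is a hypothetical helper, and stripping a Markdown code fence is a guess at one common deviation, not documented model behavior.

```python
import json

REQUIRED_KEYS = {"sentiment_label", "sentiment_score", "rationale"}
ALLOWED_LABELS = {"positive", "neutral", "negative"}


def parse_sentiment_json(raw: str) -> dict:
    """Parse the model's reply and check it against the system prompt's schema."""
    text = raw.strip()
    # Assumption: the reply may be wrapped in a Markdown fence like ```json ... ```
    if text.startswith("```"):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    data = json.loads(text)  # raises json.JSONDecodeError, a subclass of ValueError

    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")

    # Accept labels in any letter case, but only the three allowed values
    label = str(data["sentiment_label"]).strip().lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")

    try:
        score = float(data["sentiment_score"])
    except (TypeError, ValueError):
        raise ValueError(f"score is not a number: {data['sentiment_score']!r}")
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")

    data["sentiment_label"] = label
    data["sentiment_score"] = score
    return data
```

Calling `parse_sentiment_json(response.output_text)` either returns a dict with a float `sentiment_score` or raises `ValueError`, so a malformed reply is easy to catch and log.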

In [None]:
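import os

# Read the key from the OPENAI_API_KEY environment variable,
# or replace the expression with your key as a string: API_KEY = "sk-..."
API_KEY = os.environ.get("OPENAI_API_KEY", "")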

In [None]:
client = OpenAI(api_key=API_KEY)

MODEL = "gpt-5-mini"

# The system prompt describes the annotation task and the expected JSON output
system_prompt = """You are an annotator.
Return JSON with:
- sentiment_label: one of ["positive","neutral","negative"]
- sentiment_score: between -1 and 1 (negative→-1, neutral→0, positive→1)
- rationale: short reason
"""

comment = "I love this robot, it's so helpful!"

# Send the system prompt and the comment as two messages to the Responses API
response = client.responses.create(
    model=MODEL,
    input=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Comment: {comment}"}
    ]
)

# output_text collects the text of the model's reply into one string
print(response.output_text)

# Annotating Multiple Comments

In [None]:
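client = OpenAI(api_key=API_KEY)
MODEL = "gpt-5-mini"

def annotate_comment(text):
    """Ask the model for a single sentiment score in [-1, 1] for one comment."""
    # Skip missing comments (NaN in the DataFrame) instead of sending them
    if pd.isna(text):
        return None
    # Truncate long comments to the first 500 characters to limit token usage
    text = str(text)[:500]

    prompt = f"""
    Rate the sentiment of the following comment in English:
    '{text}'

    Respond with a single number between -1 and 1:
    -1 = very negative
     0 = neutral
    +1 = very positive

    Output ONLY the number, nothing else.
    """
    response = client.responses.create(
        model=MODEL,
        input=prompt
    )
    # Extract the reply text and convert it to a float
    value_str = response.output_text.strip()
    try:
        return float(value_str)
    except ValueError:
        # The model did not answer with a bare number; record a missing value
        print(f"Unexpected output: {value_str}")
        return None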
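`annotate_comment()` returns `None` whenever the reply is anything other than a bare number, for example "Score: 0.8" or "0.8.". A more tolerant option is to pull out the first number in the reply and clip it to the allowed range. This is a hypothetical helper, not part of the OpenAI API; note that it takes the *first* number it finds, so a reply that repeats the scale (e.g. "-1 to 1: 0.3") would be misread as -1.

```python
import re
from typing import Optional

# An optional sign followed by an integer or decimal number, e.g. "-0.7", "+1", ".5"
_NUMBER = re.compile(r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)")


def parse_score(raw: str) -> Optional[float]:
    """Extract the first number from the model's reply and clip it to [-1, 1]."""
    match = _NUMBER.search(raw)
    if match is None:
        return None  # no number at all, e.g. "neutral"
    value = float(match.group())
    return max(-1.0, min(1.0, value))
```

Inside `annotate_comment()`, `return parse_score(value_str)` could replace the `try`/`except` block; replies without any number still come back as `None`.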

# Exercise 1
Annotating Sentiment for One Robot and Comparing with VADER.

In this exercise, you will compare two sentiment-analysis methods: annotation by ChatGPT and VADER (from last week).
The goal is to see how closely the two methods agree when scoring the same comments.

1. Load the comment dataset for one robot.
2. Apply the annotate_comment() function, which sends each comment to ChatGPT and returns a single numeric sentiment value.
3. Compare ChatGPT scores with the VADER scores from last week:
    - compute a correlation coefficient (Pearson or Spearman),
    - make a simple scatterplot.
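Step 3 can be wrapped in one helper that also drops rows where either score is missing (for example when annotate_comment() returned None). A minimal sketch, assuming the column names gpt_sent and vader_sent used in the cells below; compare_scores is a hypothetical name. Spearman's coefficient is computed here as the Pearson correlation of the ranks, which is its definition (with average ranks for ties).

```python
import pandas as pd


def compare_scores(df: pd.DataFrame, col_a: str = "gpt_sent", col_b: str = "vader_sent") -> dict:
    """Pearson and Spearman correlation between two sentiment-score columns."""
    # Drop rows where either method produced no score
    pair = df[[col_a, col_b]].dropna()
    a = pair[col_a].astype(float)
    b = pair[col_b].astype(float)
    return {
        "n": len(pair),
        "pearson": a.corr(b, method="pearson"),
        # Spearman = Pearson correlation of the ranks
        "spearman": a.rank().corr(b.rank()),
    }
```

`compare_scores(df)` reports how many comments were actually compared alongside both coefficients, which makes it visible when many ChatGPT replies could not be parsed.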


In [None]:
df = pd.read_csv('')  # put the file name between the quotes

In [None]:
# multiple comments annotation:
#df["gpt_sent"] = df["comment"].apply(annotate_comment)

In [None]:
# correlation:
#print(df["gpt_sent"].corr(df["vader_sent"], method="pearson"))
#print(df["gpt_sent"].corr(df["vader_sent"], method="spearman"))


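# scatterplot (VADER on the x-axis, ChatGPT on the y-axis):
#plt.scatter(df["vader_sent"], df["gpt_sent"])
#plt.xlabel("VADER score")
#plt.ylabel("ChatGPT score")
#plt.show()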

# Designing Your Own Prompt for Tagging Vector Robot Comments

# Exercise 2

Create a clear, effective prompt for ChatGPT that annotates Reddit comments about the Vector robot (the dataset from two weeks ago).
First, experiment with ChatGPT manually on an example comment to understand how it responds.
Once you are satisfied with the output, use your prompt to annotate 100 comments automatically in Python.

The annotation method is described in the following scientific article:
https://reference-global.com/2/v2/download/pdf/10.14313/jamris/2-2022/10

In [None]:
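# Load the comments that were used to compute Cohen's kappa:
df = pd.read_csv('vector-annotation-all.csv')

# Use only the first 100 comments (.copy() avoids a SettingWithCopyWarning
# when the gpt_annotation column is added later):
df = df.head(100).copy()
df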

### Tag set used for the Reddit study
1. SE: description of emotional states
2. WA: joint activities
3. AU: the assignment of autonomy
4. PR: the assignment of preferences
5. OTHER: other manifestations of anthropomorphization
6. NONE: no anthropomorphization

In [None]:
client = OpenAI(api_key=API_KEY)
MODEL = "gpt-5-mini"


def annotate_tags(text):

    # Messages for the chat model:
    # - system: general instructions how the chat should behave
    # - user: the actual input (comment in this case)
    messages = [
        {
            "role": "system",
            "content": (
"""



PUT YOUR PROMPT HERE



"""
            ),
        },
        {
            "role": "user",
            "content": (
                "Reddit comment:\n"
                f"\"\"\"{text}\"\"\"\n\n"
                "Answer with exactly ONE tag: SE, WA, AU, PR, OTHER, or NONE."
            ),
        },
    ]

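    # Call the Chat Completions API (an alternative to client.responses.create)
    response = client.chat.completions.create(
        model=MODEL,
        messages=messages
    )

    # Get the model's answer as an upper-case string
    # (message content can be None, e.g. when the model refuses to answer)
    raw_output = (response.choices[0].message.content or "").strip().upper()

    return raw_output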

In [None]:
comment = "Vector gets sad when I leave and happy when I come back."

label = annotate_tags(comment)
print("Label:", label)
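Even when told to "answer with exactly ONE tag", the model may add punctuation, formatting or a short explanation, and annotate_tags() passes that text through unchanged apart from upper-casing. Before computing agreement, it can be worth mapping each reply onto the six tags from the tag set. A minimal sketch; normalize_tag and the "INVALID" marker are hypothetical choices, and replies that name zero or several tags are flagged rather than guessed.

```python
import re

ALLOWED_TAGS = ("SE", "WA", "AU", "PR", "OTHER", "NONE")


def normalize_tag(raw: str) -> str:
    """Map a free-form model reply to exactly one allowed tag, or "INVALID"."""
    # Split into alphabetic tokens so "SE.", "Tag: SE" or "**SE**" all yield "SE"
    tokens = re.findall(r"[A-Z]+", raw.upper())
    found = [t for t in tokens if t in ALLOWED_TAGS]
    # Accept the reply only if it names exactly one distinct tag
    return found[0] if len(set(found)) == 1 else "INVALID"
```

For example, `df["gpt_annotation"].apply(normalize_tag)` followed by counting "INVALID" values shows how often the prompt failed to produce a clean tag. Because the English word "other" matches the OTHER tag, a reply such as "SE, no other tag applies" is flagged as INVALID; tightening the prompt is the better fix for that.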

## Annotation of all 100 comments
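Applying annotate_tags() with .apply() sends 100 requests in a row; if one of them raises an exception (for example a rate-limit or network error), the whole cell stops and the answers already received are lost. A minimal sketch of a wrapper that retries each comment a few times and records None when it keeps failing; annotate_all, the retry count and the linear back-off are hypothetical choices. The OpenAI client also retries some errors on its own (see its max_retries option), so this only adds a second safety net.

```python
import time
from typing import Callable, Iterable, List, Optional


def annotate_all(
    texts: Iterable[str],
    annotate: Callable[[str], str],
    retries: int = 2,
    wait_seconds: float = 2.0,
) -> List[Optional[str]]:
    """Annotate texts one by one; a comment that keeps failing gets None."""
    results: List[Optional[str]] = []
    for i, text in enumerate(texts):
        label = None
        for attempt in range(retries + 1):
            try:
                label = annotate(text)
                break  # success, move on to the next comment
            except Exception as err:  # e.g. rate limit or network error
                print(f"Comment {i}: attempt {attempt + 1} failed ({err})")
                if attempt < retries:
                    # Wait a little longer after each failed attempt
                    time.sleep(wait_seconds * (attempt + 1))
        results.append(label)
    return results
```

Usage: `df["gpt_annotation"] = annotate_all(df["COMMENT"], annotate_tags)`; rows that ended up as None can be re-annotated later without repeating the whole run.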

In [None]:
#df["gpt_annotation"] = df['COMMENT'].apply(annotate_tags)

In [None]:
# Save output to file
df.to_csv('gpt-annotated-vector.csv', index=False)

In [None]:
# Check the output:
df

### Cohen's kappa: comparing manual and ChatGPT annotations

In [None]:
annotators = ['Annotator1', 'Annotator2', 'Annotator3']

for ann in annotators:
    kappa = cohen_kappa_score(df[ann], df['gpt_annotation'])
    print(f"Cohen's kappa ({ann} vs GPT): {kappa:.6f}")
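Cohen's kappa corrects raw agreement for agreement expected by chance: κ = (p_o - p_e) / (1 - p_e), where p_o is the share of comments on which the two annotators chose the same tag and p_e is the agreement expected if each annotator assigned tags independently at their own observed frequencies. To make this concrete, here is a from-scratch sketch of the unweighted formula; cohen_kappa is a hypothetical name, and the function is meant as an illustration and a cross-check for scikit-learn's cohen_kappa_score (whose default is also the unweighted kappa), not a replacement for it.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two annotators who labelled the same items."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("need two non-empty label sequences of equal length")
    # p_o: observed agreement, the share of items given the same label by both
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # p_e: chance agreement, from how often each annotator used each label
    count_a, count_b = Counter(a), Counter(b)
    chance_pairs = sum(count_a[label] * count_b[label] for label in count_a)
    if chance_pairs == n * n:
        # Both annotators used one and the same label for every item: 0/0
        return float("nan")
    p_e = chance_pairs / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

For instance, `cohen_kappa(df['Annotator1'], df['gpt_annotation'])` should match the corresponding value printed by the loop above, up to floating-point rounding.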