## Applying gpt_annotate to determine principle use
Only sentences deemed relevant by manual annotation are evaluated.
1. Label for principle
2. Label for topic
3. Label for unit
4. Label for shape


### Set up dependencies

In [1]:
!pip install openai






In [2]:
!pip install tiktoken






In [3]:
import openai
import pandas as pd
import math
import time
import numpy as np
import tiktoken
import matplotlib.pyplot as plt

#### Import main package: gpt_annotate.py
# Make sure the .py file is in the same directory as the .ipynb file, or provide the correct relative or absolute path to it.
import gpt_annotate

In [4]:
# Don't type the key in this file!
# Create gpt_api_key.txt, put the key in it, and save
with open('gpt_api_key.txt', 'r') as f:
    key = f.read().strip()
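
The key-loading step above can be made a little more forgiving. The following is a minimal sketch, not part of gpt_annotate: `load_api_key` is a hypothetical helper, and the fallback to an `OPENAI_API_KEY` environment variable is an assumption about how the key might be stored when the text file is absent.

```python
import os
from pathlib import Path


def load_api_key(path="gpt_api_key.txt", env_var="OPENAI_API_KEY"):
    """Return the API key from a text file, or from an environment variable.

    Hypothetical helper for illustration; the file name matches the cell above,
    the environment variable name is an assumption.
    """
    key_file = Path(path)
    if key_file.is_file():
        # Strip whitespace and trailing newlines that editors often add
        key = key_file.read_text(encoding="utf-8").strip()
        if key:
            return key
    # Fall back to the environment variable if the file is missing or empty
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"No API key found in {path!r} or in ${env_var}")
    return key
```

With this helper, the cell would reduce to `key = load_api_key()`, and a missing key fails early with a readable message instead of a `FileNotFoundError`.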

### Load in text_to_annotate and codebook

In [5]:
# Load text to annotate
text_to_annotate_full = pd.read_csv("data/HLS_train_dummies.csv")

# Select only relevant sentences
select_relevance = text_to_annotate_full[text_to_annotate_full['relevance_2']==1]

text_to_annotate = select_relevance[["Text","principle_1", "principle_2", "principle_3", "principle_4", "principle_5", "principle_6"]]

text_to_annotate

| | Text | principle_1 | principle_2 | principle_3 | principle_4 | principle_5 | principle_6 |
|---|---|---|---|---|---|---|---|
| 3 | Mr. President: A fair and effective framewor... | 0 | 0 | 1 | 0 | 0 | 0 |
| 5 | Such a framework must be based on “nationally ... | 0 | 1 | 0 | 0 | 0 | 0 |
| 44 | It should not only enable us to discuss global... | 0 | 0 | 1 | 0 | 0 | 0 |
| 53 | Global warming is a catastrophic problem that ... | 0 | 0 | 1 | 0 | 0 | 0 |
| 54 | Therefore, the multilateralism approach remain... | 1 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1172 | As we work to catch up on lost time and progr... | 0 | 0 | 0 | 1 | 0 | 0 |
| 1173 | Conflict -ridden communities, refugees, and d... | 0 | 0 | 0 | 1 | 0 | 0 |
| 1174 | Nor can we stand by , as the massive destructi... | 0 | 0 | 1 | 0 | 0 | 0 |
| 1198 | We recognise that we must deliver on our coll... | 0 | 0 | 0 | 1 | 0 | 0 |


In [6]:
# Make a bar plot of the number of positive labels in each principle column
# (the selected frame contains principle_1..principle_6, not shape_* columns)
principle_cols = ["principle_1", "principle_2", "principle_3", "principle_4", "principle_5", "principle_6"]
counts = text_to_annotate[principle_cols].sum()

# Plot the counts as a bar plot
counts.plot(kind='bar')

# Add title and axis labels
plt.title('Number of 1s in Each Column')
plt.xlabel('Columns')
plt.ylabel('Count of 1s')

# Display the plot
plt.show()

The classes are clearly imbalanced across all principles.
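
To put numbers on the imbalance rather than reading it off a plot, the positive count and positive share per label column can be tabulated. This is a sketch on a made-up toy frame (`toy` and its values are assumptions for illustration); on the real data the same lines would run against `text_to_annotate`.

```python
import pandas as pd

# Toy stand-in for text_to_annotate; the values are made up for illustration.
toy = pd.DataFrame({
    "Text": ["a", "b", "c", "d", "e"],
    "principle_1": [1, 0, 0, 0, 0],
    "principle_2": [0, 1, 1, 0, 0],
    "principle_3": [0, 0, 0, 1, 1],
})

# Only the label columns carry 0/1 values
label_cols = [c for c in toy.columns if c.startswith("principle_")]

summary = pd.DataFrame({
    "positives": toy[label_cols].sum(),   # number of rows labelled 1
    "share": toy[label_cols].mean(),      # fraction of rows labelled 1
})
print(summary)
```

A label with a very small share will yield high accuracy for a model that almost never predicts it, which is worth keeping in mind when reading the performance table.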

In [7]:
# Final check for binary columns
# Note: every column is checked, so the free-text 'Text' column is expected to be flagged
invalid_columns = [col for col in text_to_annotate.columns if not text_to_annotate[col].isin([0, 1]).all()]

if len(invalid_columns) == 0:
    print("All columns contain only 0 or 1.")
else:
    print("Columns with elements not being 0 or 1:")
    for col in invalid_columns:
        print(col)

Columns with elements not being 0 or 1:
Text
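
Because the check above runs over every column, the free-text column is always reported. A variant that restricts the check to the label columns is sketched below on a toy frame; the frame, its values, and the `principle_` prefix filter are assumptions for illustration.

```python
import pandas as pd

# Toy frame for illustration only; the 2 is a deliberately invalid label.
toy = pd.DataFrame({
    "Text": ["first sentence", "second sentence", "third sentence"],
    "principle_1": [0, 1, 0],
    "principle_2": [1, 0, 2],
})

# Check only the label columns, so the text column is not flagged
label_cols = [c for c in toy.columns if c.startswith("principle_")]
invalid_columns = [c for c in label_cols if not toy[c].isin([0, 1]).all()]

if invalid_columns:
    print("Label columns with values other than 0 or 1:", invalid_columns)
else:
    print("All label columns contain only 0 or 1.")
```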


In [8]:
# Load codebook
with open('Codebook/codebook_V3_principles', 'r', encoding='utf-8') as file:
    codebook = file.read()

# Annotate training data: manually annotated COP19-COP28 speeches (41 in total)

codebook short - determination of relevance

In [9]:
# Prepare the data for annotation
# Preparation is done with GPT-3.5-turbo; the model is hardcoded and can be changed in the package source
text_to_annotate = gpt_annotate.prepare_data(text_to_annotate, codebook, key, prep_codebook=True)


Categories to annotate:
1) principle_1
2) principle_2
3) principle_3
4) principle_4
5) principle_5
6) principle_6

Please input Y or N.

Data is ready to be annotated using gpt_annotate()!

Glimpse of your data:
Shape of data:  (218, 9)
   unique_id                                               text  principle_1  \
0          3   Mr. President:  A fair and effective framewor...            0   
1          5  Such a framework must be based on “nationally ...            0   
2         44  It should not only enable us to discuss global...            0   
3         53  Global warming is a catastrophic problem that ...            0   
4         54  Therefore, the multilateralism approach remain...            1   

   principle_2  principle_3  principle_4  principle_5  principle_6  \
0            0            1            0            0            0   
1            1            0            0            0            0   
2            0            1            0            0            0   
3

In [10]:
# Annotate the data (returns 4 outputs)
gpt_out_all, gpt_out_final, performance, incorrect = gpt_annotate.gpt_annotate(
    text_to_annotate,
    codebook,
    key,
    num_iterations=5,
    model="gpt-3.5-turbo",
    temperature=0.2,
    batch_size=20,
    human_labels=True,
    data_prep_warning=True,
    time_cost_warning=True,
)



Categories to annotate:
1) principle_1
2) principle_2
3) principle_3
4) principle_4
5) principle_5
6) principle_6

You are about to annotate 218 text samples and the number of iterations is set to 5
Estimated cost range in US Dollars: 0.23 - 0.28
Estimated minutes to run gpt_annotate(): 14.62 - 27.16
Please note that these are rough estimates.

iteration:  1 completed
iteration:  2 completed
iteration:  3 completed
iteration:  4 completed
iteration:  5 completed
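
For a rough sense of the workload behind these five iterations, the batch arithmetic can be worked out from the settings in the call. This sketch assumes that samples are split into consecutive chunks of `batch_size` and that each batch costs one request per iteration; how gpt_annotate actually schedules requests is not confirmed here.

```python
import math

n_samples = 218       # number of text samples reported in the prompt above
batch_size = 20       # value passed to gpt_annotate()
num_iterations = 5    # value passed to gpt_annotate()

# Assumption: simple chunking, so the last batch holds the remainder
batches_per_iteration = math.ceil(n_samples / batch_size)
last_batch_size = n_samples - (batches_per_iteration - 1) * batch_size

# Assumption: one request per batch per iteration
total_requests = batches_per_iteration * num_iterations

print(batches_per_iteration, last_batch_size, total_requests)
```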


In [11]:
performance

| | Category | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 0 | principle_1 | 0.847926 | 0.0 | 0.0 | 0.0 |
| 1 | principle_2 | 0.732719 | 0.315789 | 0.117647 | 0.171429 |
| 2 | principle_3 | 0.732719 | 0.0 | 0.0 | 0.0 |
| 3 | principle_4 | 0.75576 | 0.882353 | 0.227273 | 0.361446 |
| 4 | principle_5 | 0.917051 | 0.0 | 0.0 | 0.0 |
| 5 | principle_6 | 0.903226 | 0.0 | 0.0 | 0.0 |
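
As a quick sanity check on the table, F1 can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the two rows with non-zero scores; `f1_score` is a small helper written for this check and returning 0.0 when both inputs are zero is a convention chosen here.

```python
# (precision, recall, reported F1) copied from the performance table
reported = {
    "principle_2": (0.315789, 0.117647, 0.171429),
    "principle_4": (0.882353, 0.227273, 0.361446),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for name, (p, r, f1_reported) in reported.items():
    f1_recomputed = f1_score(p, r)
    # Allow for rounding in the six-decimal table values
    print(name, round(f1_recomputed, 6), f1_reported,
          abs(f1_recomputed - f1_reported) < 1e-4)
```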


In [None]:
performance_V2

## Results

In [None]:
performance_V1

1. General normative statement - the model failed to identify any general normative statements
2. Egalitarian - 75% of the sentences labelled egalitarian were actually egalitarian
3. Utilitarian -
4. Prioritarian -
5. Sufficientarian -
6. Libertarian -

The model seems to capture only a very small share of the positive instances.

Accuracy: accuracy is high, but it also counts sentences correctly labelled as not relevant.
Precision: of all sentences that the model predicted as relevant, only 22% were actually relevant (a high number of false positives).
Recall: the share of all positive instances in the dataset that the model correctly labels as positive.
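
The three definitions above can be made concrete with confusion-matrix counts. The following is a minimal sketch: `classification_metrics` is a hypothetical helper, the counts are illustrative and not taken from the actual run, and returning 0.0 for an undefined ratio is a convention chosen here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    Undefined ratios (zero denominator) are reported as 0.0 by convention.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative counts for a rare label (made up, not from the notebook run)
example = classification_metrics(tp=6, fp=13, fn=45, tn=153)

# A model that never predicts the positive class on an imbalanced label:
# accuracy stays high while precision, recall and F1 are all zero
baseline = classification_metrics(tp=0, fp=0, fn=33, tn=184)

print(example)
print(baseline)
```

The baseline case shows why accuracy alone is misleading for imbalanced labels: the true negatives dominate the numerator even when no positive instance is found.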

In [None]:
performance_V2

Lowering the temperature of the model seems to improve performance, but F1 scores remain low.

In [None]:
gpt_out_all_V1

In [None]:
gpt_out_all_V2