In our human annotations we found severe disagreement between annotators. In this file:
- I perform some preliminary explorations of the nature of this disagreement
- I run GPT-4 labelling on all other samples
- I compare the classifications of GPT-3 with those of the human annotators

#### Explorations

In [1]:
import pandas as pd
import json

The files in this repository:
1. new_annotated_sample.pkl: the ~1200 samples with labels generated by us
2. annotated_test.pkl: exactly the same as new_annotated_sample.pkl
3. annotated_val.pkl: labels by law students
4. disagreement_sample.json: the 50 overlapping samples, of which almost 50% show disagreement between annotators

In [2]:
# use a context manager so the file handle is closed after loading
with open('disagreement_sample.json') as f:
    data = json.load(f)

In [3]:
labels = [sample["label"] for sample in data]
sums = [sum(sample) for sample in labels]

for i in set(sums):
    print(f"{i} occurs {sums.count(i)} times")

1 occurs 9 times
2 occurs 6 times
3 occurs 4 times
4 occurs 5 times


In [8]:
# finding out how often each labeller assigned label 1
# all around 50%, except the second labeller (column 1 below, zero-based)
labels = pd.DataFrame(labels)
(labels.sum(axis=0)/len(labels)).round(2)

0    0.50
1    0.08
2    0.54
3    0.54
4    0.54
dtype: float64
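
The per-labeller rates show who labels differently overall, but not which pairs of labellers disagree with each other. A minimal sketch of a pairwise agreement matrix follows, assuming a DataFrame with one column per labeller as built above. The helper name `pairwise_agreement` and the toy data are hypothetical.

```python
from itertools import combinations

import pandas as pd


def pairwise_agreement(label_frame: pd.DataFrame) -> pd.DataFrame:
    """Fraction of samples on which each pair of labellers assigns the same label."""
    cols = list(label_frame.columns)
    agreement = pd.DataFrame(1.0, index=cols, columns=cols)
    for a, b in combinations(cols, 2):
        share = float((label_frame[a] == label_frame[b]).mean())
        agreement.loc[a, b] = share
        agreement.loc[b, a] = share
    return agreement


# toy data (hypothetical): 4 samples, 3 labellers
toy_labels = pd.DataFrame([[1, 0, 1], [1, 0, 1], [0, 0, 0], [1, 1, 1]])
toy_agreement = pairwise_agreement(toy_labels)
print(toy_agreement.round(2))
```

A labeller whose row is consistently lower than the others is a candidate for the outlier seen in the rates above.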

#### Obtain explanations for entire dataset samples

In [37]:
df = pd.read_pickle("new_annotated_sample.pkl")
df2 = pd.read_pickle("annotated_test.pkl")
df3 = pd.read_pickle("annotated_val.pkl")

In [2]:
# imports
import openai
import sys
sys.path.append('../99-other-project')
from gptClassifier2 import gptclassifier, standard_message

openai.api_key = "" # insert your api key

In [19]:
# execute classification
completions=[]
results = gptclassifier(df, standard_message, completions, timer_frequency=5)

Done!


In [3]:
import pandas as pd
df = pd.read_pickle("data/annotated_val.pkl")
df2 = pd.read_pickle("data/annotated_val_explanaitons_gpt3.pkl")

In [31]:
pd.to_pickle(df,"annotated_val_explantions_gpt3.pkl")

#### Comparing GPT-3 agreement

In [1]:
import json
import pandas as pd

In [72]:
# User classifications
with open("./data/full_agreement_sample.json") as f:
    df = json.load(f)  # the with-block closes the file, no explicit close() needed

# Obtain GPT-3 classifications
df2 = pd.read_pickle("./data/annotated_test_explantions_gpt3.pkl")

In [83]:
df3 = pd.DataFrame(pd.DataFrame(df)['label'].tolist(), columns=['col_1', 'col_2', 'col_3', 'col_4', 'col_5'])
df = pd.concat([pd.DataFrame(df), df3], axis=1)

In [86]:
df = df.set_index("text")

In [94]:
df2=df2[["caption","explanations-gpt3"]]
df2.set_index("caption", inplace=True)

In [101]:
df = df.join(df2)

In [108]:
df["gpt3_labels"] = [99]*50

In [122]:
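# rows yielded by iterrows() are copies, so write back through df.loc;
# the target column is "gpt3_labels" (underscore), as created above
for idx, row in df.iterrows():
    if row["explanations-gpt3"].endswith("Potentially sponsored."):
        df.loc[idx, "gpt3_labels"] = 1
    else:
        df.loc[idx, "gpt3_labels"] = 0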

In [143]:
df['gpt3_labels'] = df.apply(lambda row: 1 if row['explanations-gpt3'].endswith("Potentially sponsored.") else 0, axis=1)

In [146]:
df.head(1)

text,label,col_1,col_2,col_3,col_4,col_5,explanations-gpt3,gpt3_labels
"Lilian 🌼 Wide, rose gold-colored stripes alternate with paler peach, in a gorgeous pattern that’ll take you all the way from winter to spring. We have to include a stripe! Earthy dreamy goodness! Lilian, you’re absolutely beautiful! 😍\n⠀⠀⠀⠀⠀⠀⠀⠀⠀\nWhich ring color is your favorite? Comment yours below and we will send this beauty your way! ENTRIES CLOSED CONGRATULATIONS @kelsey_herp please email hello@wildbird.co ❤️\n⠀⠀⠀⠀⠀⠀⠀⠀⠀\n#giveaway #wildbirdgiveaway #ringslinggiveaway #babywearing","[1, 0, 0, 0, 1]",1,0,0,0,1,"Key indicators: '#giveaway', '#wildbirdgiveawa...",1


In [158]:
# relies on the "majority" column created in cell In [153], which was executed earlier
sum(df["gpt3_labels"]==df["majority"])

37

In [153]:
df['majority'] = df.apply(lambda row: 1 if sum(row['label'])>2 else 0, axis=1)

In [157]:
sum(df["gpt3_labels"])

26
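
The 37 matches out of 50 above correspond to a raw agreement of 74% between GPT-3 and the majority vote. Raw agreement does not account for matches that would occur by chance, so a chance-corrected measure such as Cohen's kappa may be more informative. A minimal sketch, assuming two equally long label sequences; the function name `cohens_kappa` and the toy data are illustrative.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two label sequences of equal length."""
    a, b = list(labels_a), list(labels_b)
    n = len(a)
    # observed share of samples on which the two sequences match
    observed = sum(x == y for x, y in zip(a, b)) / n
    # agreement expected by chance from each sequence's label proportions
    categories = set(a) | set(b)
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if expected == 1.0:
        return 1.0  # both sequences use a single, identical label
    return (observed - expected) / (1 - expected)


# toy data (hypothetical, not the real labels)
toy_model = [1, 1, 0, 0]
toy_majority = [1, 0, 0, 0]
toy_score = cohens_kappa(toy_model, toy_majority)
print(toy_score)
```

On the real data this would be called as `cohens_kappa(df["gpt3_labels"], df["majority"])`.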

In [22]:
explanations = list(df2[df2["caption"].isin(samples)]["explanations-gpt3"])

In [30]:
labels = []
for explanation in explanations:
    if explanation.endswith("Potentially sponsored."):
        labels.append(1)
    elif explanation.endswith("not sponsored."):
        labels.append(0)
    elif explanation.endswith("biguous."):
        labels.append("Ambiguous")
    else:
        labels.append(explanation)
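
The suffix checks above depend on the exact wording and punctuation of the final sentence. A slightly more tolerant variant is sketched below, assuming the label is always the last phrase of the explanation. The set of phrases is taken from the outputs seen in this notebook; the name `extract_label` and the choice to keep "self advertisement" as its own category are my assumptions.

```python
import re

# label phrases observed in the GPT-3 explanations; longer phrases come first
LABEL_PATTERN = re.compile(
    r"(potentially sponsored|likely not sponsored|not sponsored|"
    r"self advertisement|ambiguous)\.?\s*$",
    re.IGNORECASE,
)

# assumption: only these phrases map to the binary labels used above
BINARY = {"potentially sponsored": 1, "likely not sponsored": 0, "not sponsored": 0}


def extract_label(explanation):
    """Return 1/0 for sponsored/not sponsored, the phrase itself for other
    known labels, and the full text if no known label closes the explanation."""
    match = LABEL_PATTERN.search(explanation.strip())
    if match is None:
        return explanation  # keep the raw text for manual inspection
    phrase = match.group(1).lower()
    return BINARY.get(phrase, phrase)


examples = [
    "Reasoning: No indication of promoted product or business. Likely not sponsored. Label: Likely not sponsored.",
    "Reasoning: The post explicitly mentions to be a Fashion Nova Ambassador. It clearly is sponsored. Label: Potentially sponsored.",
    "Reasoning: The post promotes clothing brand @byegreis. Clearly self advertisement. Label: Self advertisement.",
]
parsed = [extract_label(text) for text in examples]
print(parsed)
```

Explanations that return their full text can then be reviewed by hand, as in the loop above.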

In [33]:
user_labels = [sample["label"] for sample in df]

In [34]:
user_labels

[[1, 0, 0, 0, 1],
 [0, 0, 0, 0, 0],
 [1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1],
 [0, 0, 0, 0, 0],
 [1, 0, 0, 1, 1],
 [0, 0, 0, 0, 0],
 [1, 0, 1, 0, 1],
 [1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1],
 [1, 0, 0, 0, 1],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 1, 1],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 1, 0],
 [0, 0, 0, 0, 0],
 [1, 1, 1, 1, 1],
 [1, 0, 1, 1, 1],
 [1, 0, 1, 1, 1],
 [0, 0, 0, 0, 0],
 [0, 0, 1, 0, 0],
 [0, 0, 1, 1, 0],
 [1, 1, 1, 1, 1],
 [0, 0, 1, 0, 0],
 [0, 0, 0, 0, 0],
 [1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1],
 [0, 1, 0, 0, 0],
 [0, 0, 0, 1, 0],
 [0, 0, 0, 0, 0],
 [1, 1, 1, 1, 1],
 [0, 0, 0, 0, 0],
 [1, 1, 0, 1, 1],
 [1, 0, 1, 1, 1],
 [1, 0, 1, 0, 0],
 [0, 0, 0, 1, 0],
 [0, 0, 0, 0, 1],
 [0, 0, 1, 1, 0],
 [0, 0, 0, 0, 0],
 [1, 1, 1, 1, 1],
 [1, 0, 1, 1, 1],
 [1, 0, 1, 0, 1],
 [1, 1, 1, 1, 1],
 [0, 0, 0, 1, 0],
 [1, 0, 1, 0, 1]]

In [37]:
labels_modified0 = labels

In [53]:
user_labels = pd.DataFrame(user_labels)

In [56]:
user_labels[5] = labels_modified0

In [70]:
user_labels

,0,1,2,3,4,5
0,1,0,0,0,1,1
1,0,0,0,0,0,0
2,1,1,1,1,1,1
3,1,1,1,1,1,0
4,0,0,0,0,0,0
5,1,0,0,1,1,1
6,0,0,0,0,0,1
7,1,0,1,0,1,0
8,1,1,1,1,1,0
9,1,1,1,1,1,0


In [27]:
explanations

["Key indicators: '#YoungRock', 'NBC', '@SevenBucksProd'.\nReasoning: This post promotes a TV show. It again branding of the TV Studio @SevenBucksProd indicates commercial ties. Label: Potentially sponsored.",
 'Reasoning: No indication of promoted product or business. Likely not sponsored. Label: Likely not sponsored.',
 "Key indicators: '#Ad', '@hanes', '@target', '@shop.LTK'.\nReasoning: The caption starts with '#Ad', indicating it is indeed an advertisement. There are two brands featured, @hanes and @target. The post also explicitly calls for a shopping action, making it certainly an advertisement. Label: Potentially sponsored.",
 'Reasoning: The post focuses on a personal story about kayaking and the skill of staying calm in changing situations. There are no apparent indicators of promoting a brand or business. Likely not sponsored.',
 'Reasoning: No indication of sponsoring or promotion. Likely not sponsored. Label: Likely not sponsored.',
 "Key indicators: 'wearing dress by @joa

In [4]:
import pandas as pd
df = pd.read_pickle("./data/annotated_test_explantions_gpt3.pkl")

In [9]:
for element in df.iloc[-30:-25,-1]:
    print(element)

Key indicators: '@byegreis'.
Reasoning: The post promotes clothing brand @byegreis. Clearly self advertisement. Label: Self advertisement.
Key indicators: '@bulgari'.
Reasoning: The post mentions and tags  @bulgari, which could be an indication of a paid partnership. Label: Potentially sponsored.
Key indicators: '@rico5uave', '#RicaSuave'.
Reasoning: The post is mainly about the author and her activity. While it seems to refer to an artwork or a person @rico5uave, this reference is hardly evidenced to bear any commercial or sponsoring element. Not sponsored. Label: Likely not sponsored.
Key indicators: '@FashionNova Ambassador'.
Reasoning: The post explicitly mentions to be a Fashion Nova Ambassador. It clearly is sponsored. Label: Potentially sponsored.
Key indicators: '@pinklily', 'LTK'.
Reasoning: The post promotes a fashion brand and features a discount code, indicating a partnership. Additionally it features an @shop.LTK link, a platform for paid partnerships. Label: Potentially s