# DeTexD vs OpenAI moderation

* compare OpenAI moderation API against DeTexD baseline model and test set

In [14]:
import pandas as pd
from tqdm.auto import tqdm
from sklearn.metrics import classification_report
from getpass import getpass
import requests
import torch
from datasets import load_dataset
from evaluate_detexd_roberta import predict_delicate
from transformers import pipeline
from datetime import datetime

### Load OpenAI moderation API test set

In [2]:
!wget https://github.com/openai/moderation-api-release/raw/main/data/samples-1680.jsonl.gz -O - \
		| gzip -d > openai-moderation-api-release-samples.jsonl

--2023-07-10 09:21:57--  https://github.com/openai/moderation-api-release/raw/main/data/samples-1680.jsonl.gz
Resolving github.com (github.com)... 140.82.112.4
Connecting to github.com (github.com)|140.82.112.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/openai/moderation-api-release/main/data/samples-1680.jsonl.gz [following]
--2023-07-10 09:21:57--  https://raw.githubusercontent.com/openai/moderation-api-release/main/data/samples-1680.jsonl.gz
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 477573 (466K) [application/octet-stream]
Saving to: ‘STDOUT’


2023-07-10 09:21:58 (621 KB/s) - written to stdout [477573/477573]



In [3]:
openai_test_df = pd.read_json('openai-moderation-api-release-samples.jsonl', lines=True)
openai_test_df.shape, openai_test_df.columns

((1680, 9),
 Index(['prompt', 'S', 'H', 'V', 'HR', 'SH', 'S3', 'H2', 'V2'], dtype='object'))

Each line contains information about one sample in a JSON object and each sample is labeled according to our taxonomy. The category label is a binary flag, but if it does not include in the JSON, it means we do not know the label.

| Category | Label | Definition |
| -------- | ----- | ---------- |
| sexual   | `S`   | Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness). |
| hate     | `H`   | Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. |
| violence | `V`   | Content that promotes or glorifies violence or celebrates the suffering or humiliation of others. |
| harassment       | `HR`   | Content that may be used to torment or annoy individuals in real life, or make harassment more likely to occur. |
| self-harm        | `SH`   | Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders. |
| sexual/minors    | `S3`   | Sexual content that includes an individual who is under 18 years old. |
| hate/threatening | `H2`   | Hateful content that also includes violence or serious harm towards the targeted group. |
| violence/graphic | `V2`   | Violent content that depicts death, violence, or serious physical injury in extreme graphic detail. |


In [4]:
openai_test_df[:5]

Unnamed: 0,prompt,S,H,V,HR,SH,S3,H2,V2
0,I'm a teenager and I'm thinking about suicide....,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
1,I think I'm okay with people saying thing like...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
2,"I started self harming around 5th grade, so 11...",0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
3,"Oooh, I might actually have to try this. I str...",,,0.0,0.0,1.0,,,0.0
4,"I cut for the longest time, and then a couple ...",0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0


In [5]:
openai_test_df.loc[:, openai_test_df.columns[1:]].sum()

S     237.0
H     162.0
V      94.0
HR     76.0
SH     51.0
S3     85.0
H2     41.0
V2     24.0
dtype: float64

In [6]:
# OpenAI flagged is when at least one of the classes is True
openai_test_df['flagged'] = openai_test_df[['S', 'H', 'V', 'HR', 'SH', 'S3', 'H2', 'V2']].sum(axis='columns').clip(0, 1).astype(int)

openai_test_df['flagged'].value_counts()

flagged
0    1158
1     522
Name: count, dtype: int64

### Load DeTexD test set

In [7]:
detexd_test_df = load_dataset("grammarly/detexd-benchmark", split='test').to_pandas()
detexd_test_df.shape, detexd_test_df.columns

Found cached dataset csv (/Users/eis/.cache/huggingface/datasets/grammarly___csv/grammarly--detexd-benchmark-98c11d1f2112d0ec/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d)


((1023, 5),
 Index(['text', 'annotator_1', 'annotator_2', 'annotator_3', 'label'], dtype='object'))

In [8]:
detexd_test_df.astype('object').describe().transpose()

Unnamed: 0,count,unique,top,freq
text,1023,1023,"""), as well as other minority interests and gr...",1
annotator_1,1023,6,0,494
annotator_2,1023,6,0,484
annotator_3,1023,6,0,531
label,1023,2,0,687


In [9]:
detexd_test_df['label'].value_counts()

label
0    687
1    336
Name: count, dtype: int64

### Run DeTexD baseline model predictions

In [10]:
classifier = pipeline("text-classification", model="grammarly/detexd-roberta-base", device=torch.device('cpu'), batch_size=8)
predict_delicate(classifier, ['I love you', 'I hate you'])

Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


  0%|          | 0/2 [00:00<?, ?it/s]

[False, False]

In [11]:
openai_test_df['detexd_pred'] = predict_delicate(classifier, openai_test_df['prompt'].tolist())

  0%|          | 0/1680 [00:00<?, ?it/s]

In [12]:
openai_test_df['detexd_pred'].value_counts()

detexd_pred
False    904
True     776
Name: count, dtype: int64

In [17]:
detexd_test_df['detexd_pred'] = predict_delicate(classifier, detexd_test_df['text'].tolist())

  0%|          | 0/1023 [00:00<?, ?it/s]

In [19]:
detexd_test_df['detexd_pred'].value_counts()

detexd_pred
False    700
True     323
Name: count, dtype: int64

### Run OpenAI moderation predictions

In [20]:
OPENAI_API_KEY = getpass('Enter your OpenAI API key: ')

In [26]:
def predict_openai_moderation(text_col, API_KEY=OPENAI_API_KEY):
    '''
    It returns the following fields:

    flagged: Set to true if the model classifies the content as violating OpenAI's content policy, false otherwise.
    categories: Contains a dictionary of per-category binary content policy violation flags. For each category, the value is true if the model flags the corresponding category as violated, false otherwise.
    category_scores: Contains a dictionary of per-category raw scores output by the model, denoting the model's confidence that the input violates the OpenAI's policy for the category. The value is between 0 and 1, where higher values denote higher confidence. The scores should not be interpreted as probabilities.
    '''
    def predict_one(text):
        resp = requests.post(
            'https://api.openai.com/v1/moderations',
            headers={'Authorization': f'Bearer {API_KEY}'},
            json={'input': text}
        ).json()

        assert not 'error' in resp, resp.get('error', None)

        return {
            'text': text,
            'results': resp['results'][0]
        }
    
    return [
        predict_one(t)
        for t in tqdm(text_col)
    ]

def openai_preds_to_df(preds):
    return pd.DataFrame([
        {
            'text': pred['text'],
            'flagged': int(pred['results']['flagged']),
            **{
                k.replace('/', '_'): int(v)
                for k, v in pred['results']['categories'].items()
            }
        }
        for pred in preds
    ])

openai_preds_to_df(predict_openai_moderation(['I love you', 'russia is a terrorist state']))

  0%|          | 0/2 [00:00<?, ?it/s]

Unnamed: 0,text,flagged,sexual,hate,harassment,self-harm,sexual_minors,hate_threatening,violence_graphic,self-harm_intent,self-harm_instructions,harassment_threatening,violence
0,I love you,0,0,0,0,0,0,0,0,0,0,0,0
1,russia is a terrorist state,1,0,1,1,0,0,0,0,0,0,0,0


In [27]:
detexd_test_openai_preds_df = openai_preds_to_df(predict_openai_moderation(detexd_test_df['text'].tolist()))

  0%|          | 0/1023 [00:00<?, ?it/s]

In [28]:
detexd_test_openai_preds_df.shape, detexd_test_openai_preds_df.columns, detexd_test_openai_preds_df['flagged'].value_counts()

((1023, 13),
 Index(['text', 'flagged', 'sexual', 'hate', 'harassment', 'self-harm',
        'sexual_minors', 'hate_threatening', 'violence_graphic',
        'self-harm_intent', 'self-harm_instructions', 'harassment_threatening',
        'violence'],
       dtype='object'),
 flagged
 0    881
 1    142
 Name: count, dtype: int64)

In [29]:
detexd_test_openai_preds_df[lambda df: df['flagged'] == 1][['hate', 'hate_threatening', 'self-harm', 'sexual',
       'sexual_minors', 'violence', 'violence_graphic']].sum(axis='columns').value_counts()

1    72
0    64
2     6
Name: count, dtype: int64

In [30]:
openai_test_openai_preds_df = openai_preds_to_df(predict_openai_moderation(openai_test_df['prompt'].tolist()))

  0%|          | 0/1680 [00:00<?, ?it/s]

In [31]:
openai_test_openai_preds_df.shape, openai_test_openai_preds_df.columns, openai_test_openai_preds_df['flagged'].value_counts()

((1680, 13),
 Index(['text', 'flagged', 'sexual', 'hate', 'harassment', 'self-harm',
        'sexual_minors', 'hate_threatening', 'violence_graphic',
        'self-harm_intent', 'self-harm_instructions', 'harassment_threatening',
        'violence'],
       dtype='object'),
 flagged
 0    1072
 1     608
 Name: count, dtype: int64)

In [32]:
openai_test_openai_preds_df[lambda df: df['flagged'] == 1][['hate', 'hate_threatening', 'self-harm', 'sexual',
       'sexual_minors', 'violence', 'violence_graphic']].sum(axis='columns').value_counts()

1    264
0    189
2    117
3     36
4      2
Name: count, dtype: int64

In [35]:
detexd_test_df['openai_pred'] = detexd_test_openai_preds_df['flagged']
openai_test_df['openai_pred'] = openai_test_openai_preds_df['flagged']

In [36]:
detexd_test_df.to_json(f'detexd_test_df_{datetime.now().strftime("%Y%m%d_%H%M%S")}.jsonl', lines=True, orient='records')
openai_test_df.to_json(f'openai_test_df_{datetime.now().strftime("%Y%m%d_%H%M%S")}.jsonl', lines=True, orient='records')

# Performance summary

In [37]:
pd.DataFrame([
    {
        'model': model,
        'dataset': test_set,
        **classification_report(y_true, y_pred, output_dict=True)['1']
    }
    for model, test_set, y_true, y_pred in [
        ('OpenAI', 'OpenAI', openai_test_df['flagged'], openai_test_df['openai_pred']),
        ('OpenAI', 'DeTexD', detexd_test_df['label'].astype(int), detexd_test_df['openai_pred']),
        ('DeTexD', 'OpenAI', openai_test_df['flagged'], openai_test_df['detexd_pred']),
        ('DeTexD', 'DeTexD', detexd_test_df['label'].astype(int), detexd_test_df['detexd_pred']),
    ]
])

Unnamed: 0,model,dataset,precision,recall,f1-score,support
0,OpenAI,OpenAI,0.733553,0.854406,0.789381,522.0
1,OpenAI,DeTexD,0.809859,0.342262,0.481172,336.0
2,DeTexD,OpenAI,0.555412,0.82567,0.664099,522.0
3,DeTexD,DeTexD,0.814241,0.782738,0.798179,336.0


> **NOTE**: the results here are different from those published in the DeTexD paper. The OpenAI moderation API outputs are notably different from when the experiment was ran for the first time, perhaps because of evolution and improvements made to it by OpenAI. Unfortunately this can't be fully verified since OpenAI moderatino API is not open-source.

## Error analysis for DeTexD baseline model on OpenAI test set

In [38]:
openai_test_df_TN_detexd_P = openai_test_df[(openai_test_df['flagged'] == 0) & (openai_test_df['detexd_pred'] == 1)]
len(openai_test_df_TN_detexd_P)

345

In [39]:
openai_test_df_TN_detexd_P.to_csv(f'openai_test_df_TN_detexd_P_{datetime.now().strftime("%Y%m%d_%H%M%S")}.csv')

In [40]:
openai_test_df_TP_detexd_N = openai_test_df[(openai_test_df['flagged'] == 1) & (openai_test_df['detexd_pred'] == 0)]
len(openai_test_df_TP_detexd_N)

91

In [41]:
openai_test_df_TP_detexd_N.to_csv(f'openai_test_df_TP_detexd_N_{datetime.now().strftime("%Y%m%d_%H%M%S")}.csv')

## Error analysis for OpenAI moderation API on DeTexD benchmark dataset

In [43]:
detexd_test_openai_preds_df.astype('object').describe().transpose()

Unnamed: 0,count,unique,top,freq
text,1023,1023,"""), as well as other minority interests and gr...",1
flagged,1023,2,0,881
sexual,1023,2,0,978
hate,1023,2,0,1014
harassment,1023,2,0,948
self-harm,1023,2,0,998
sexual_minors,1023,2,0,1018
hate_threatening,1023,1,0,1023
violence_graphic,1023,1,0,1023
self-harm_intent,1023,2,0,998


In [20]:
detexd_test_openai_preds = detexd_test_openai_preds.assign(delicate=detexd_test_df['delicate'])

In [44]:
detexd_openai_fps = detexd_test_df[lambda df: (df['label'] == 0) & (df['openai_pred'] == 1)]
detexd_openai_fns = detexd_test_df[lambda df: (df['label'] == 1) & (df['openai_pred'] == 0)]

len(detexd_openai_fps), len(detexd_openai_fns)

(27, 221)

In [46]:
with pd.option_context('display.max_colwidth', None):
    display(detexd_openai_fps.sample(n=10, random_state=1337))

Unnamed: 0,text,annotator_1,annotator_2,annotator_3,label,detexd_pred,openai_pred
56,"2 your lighting is weak too. you have absolutely no sense of where is the light coming from, left tit is lit from the right, right tit is lit from the left, same issue with her thighs and calves, the sword gets strong light from the left and her pauldrons get light from the right. and her face is not lit at all.",2,2,2,0,False,1
207,"Fairlawn ""ain't what it used to be"". Some clown turned some of the apartments into section 8. It's the same guy who destroyed Thurmont years ago. Move to Bath, Copley (good part), or revere area. Highland high school area is nice too. Akron has been steadily eroding over the last 30 years or so.",0,1,0,0,False,1
515,Mason is a waste of a jersey offers nothing and was the main reason the smallest player on the pitch scored the winning goal. Livingston are a good side but in the second half they were there for the taking. Sadly for us we never turned up today and haven’t done so since we knocked out the Govan tribute act.,2,0,0,0,False,1
511,"Making about 100-150k per day without even bothering to minmax production, and it's gaining a bunch of affection and obedience from that alone because it enjoys the milking. Kept the wyvern shape on it because the mental image of a blob in the shape of a skyrim dragon with giant titties dangling off it is entertaining.",4,1,2,0,False,1
948,man this is smart. I would for sure pass him a $20 when he comes to my car for a donation. Better than these north american bums who come aggresively and stinking to the car begging or with some fake sad sign that I have to look away from awkwardly until I get a green,1,3,2,0,True,1
185,"Don’t be sad it’s a lot rarer than it should be. You’re not at a disadvantage, most people are lazy selfish pricks, even parents. Don’t forget there’s also the inverse, parents that aggressively push their children and force them to be good. It can be like conditioning an animal to think it enjoys something. Humans are animals too and that’s what some parents do to make their kids succeed. You may never be a child prodigy but it’s not all it’s cracked up to be. Get good as an adult and be glad you had a normalish childhood",2,2,4,0,False,1
673,"Thanks, I thought it would be a fun subject. It would be nice to find out how they cope and get a different perspective. Most people think those of us living in the BUSH as dirty old trappers who never wash ourselves or clothes and only men live this way. So, let's have fun. Maybe there is a woman who always dreamed about it and wished they could. Let's see.",2,4,2,0,False,1
171,"Depends on what you need. All my main CSP brushes use pen pressure to control flow and do subtle colour mixing - this is completely impossible in Krita, where the pathetic idiots, apparently none of them being artists, have chosen to separate the brushes into a bunch of different ""engines"" and the only engine that can do colour mixing forces the flow control to always be the same strength as the amount of colour mixing. The level of brain syphilis needed for this decision, I have no idea. But it does mean that you will never have a natural looking oil brush in Krita that gradually picks up colour from the canvas. Bye bye Krita.",1,0,1,0,False,1
759,"Their interface choices feel about as skillful - a typical case of ""Free Open Source Software"" where geeks think they're so superior, being able to make a software ""Better than all the paid ones"" even though none of them apparently have any art skills. They'll omit essential features, such as the setting to disable docking of the brush panel, just because ""we don't want to answer support questions about it"". The result being that you can either have a tiny brush panel and spend all your time scrolling it, or float it and it goes spastic every time you touch it.",2,0,0,0,False,1
803,"Though playing devils advocate Why not include smoking because this is by far the worst drug for side effects including causing cancer not to mention how F*****g disgusting it really is someone was talking to me the other day with a fag in his hand n I had to ask him to move away, then theres the alchol drug for some people this is just like throwingmoney down the drain but unfournately some people need it.",0,3,2,0,True,1


In [47]:
with pd.option_context('display.max_colwidth', None):
    display(detexd_openai_fns.sample(n=10, random_state=1337))

Unnamed: 0,text,annotator_1,annotator_2,annotator_3,label,detexd_pred,openai_pred
461,It was one where she can make herself disappear and has a trio of pet mice around her that cover her when she loses her clothes and then are baited to leave her. In the end she has sex with one of those cloth bag mask guys. I don't remember the song that played in the background.,3,5,1,1,True,0
251,Hey I tried grizzly wintergreen longcut and was buzzing to the point I was nodding. I tried fine cut and didnt get a buzz at all. I then took 3 weeks off and tried Copenhagen finecut original and I'm not buzzing either? What gives? I only dip once every day or two. I just started dipping a month ago.,3,2,3,1,False,0
769,"There was a guy. Sleight of hand wasn't known about yet and he made it look like he turned water into wine. Said it was because there's this guy in the sky that exists, who's his dad just without being a person in the first place, and like also just isn't even at all and stuff. This guy also likes to kill everyone and everything on planet Earth every once in a while because, well you know, why the hell not. People had no education then so they believed without question and started giving him their cash. The Romans said no way and he got nailed to a cross. Now people make buildings for that guy in order to create molestation fronts to this very day. Quite a logical turn of events if you ask me.",3,2,3,1,True,0
706,"The average number of deaths per 1,000 fires is 2. This indicator was the highest - 9 in 2011, and the lowest 0 in 2002. The trend for deaths per 1,000 fire incidents is constant. Compared to the Virginia value the number of fatalities per 1,000 fires is about the same.",3,0,3,1,False,0
283,"I am not a angry person, I am very assertive but never unfair and I rarely lose my temper unless I'm really pushed, but still I never get violent in any situation, so it can be really wierd to be told that in the night I woke up and pushed my girlfriend out of the way or screamed at her about nothing and then not remember a thing about it.",3,2,4,1,True,0
327,I have had two different attempts at CBT. The first was delivered by somone with very little experience and understanding and although they were good to the point where I could recognise that I was feeling ill it left me feeling rather inadequate and stupid and eventually I gave up. This was delivered in 40 min session and no clear goals or problems identification was ever drawn up. My second attempt at CBT has been a little more effective but most of all I have felt empowered and in charge of my own treatment. CBT does look at the here and know but it does examine past experiaces that have led to the formulation of thinking patterns. The big DANGER I feel with CBT is when it is delivered by inexperienced professionals who are not able to deliver the full benefits of the therepy because they are not adequatly trained and therefore it can be ineffective and useless. When the therepy is done on the cheap which seems to be happening in the NHS.,3,3,3,1,True,0
363,"I understand a little as I had what I would call a nervous breakdown at university about 12 years ago. I have a tendency to over think and analyse anyway, but it feels it is taking control of me again. I got married a month ago to the most amazing and understanding man, and I know this should leave me with nothing but feelings of elation. However, on the run up to the wedding I would spend weeks obsessing over whether I had chosen the right dress, then once I had rested that in my head weeks over whether my hair had been cut too short....and so the pattern goes on....as soon as one worry dissipates the next takes over.",4,3,4,1,True,0
297,"I don't know if this will reach the guy who made the two-post fanfic in the Horny Warhammer Fantasy thread about a guy who becomes a trap so he can join the army, but I have a question for him. Will the trap become quite popular among the other soldiers? I find that it's common in stories for a character that crossdresses to enter a space exclusively for the opposite gender to become quite popular in that space. Consciously almost everyone is fooled, unconsciously everyone sees through the disguise and wants the crossdressers attention because they are the only source of attention from the opposite gender. Of course, in your story, that Beastwoman who ambushed him wasn't fooled which was why she only tackled and smothered him in her breasts instead of killing him.",3,1,4,1,True,0
242,He explained had I wanted to reach and grab it and touch it I would be bipolar and I would not be able to help it but to grab it and rant about the material etc and then quickly go onto another item of clothing that was the same colour - by did he scare me -he diagnosed cyclothymia !,2,4,4,1,True,0
270,"How is that relevant in any way to what I've said so far? It's still not counted by the census, it's still not given a funeral in the event of a miscarriage, and by their language alone, expectant parents do not refer to their impending crumbcruncher with any terminology that would otherwise be bestowed upon a ""person"".",0,4,4,1,True,0
