## Google's Perspective API

https://developers.perspectiveapi.com/s/docs-sample-requests?language=en_US

About the architecture: https://developers.perspectiveapi.com/s/about-the-api-model-cards?language=en_US

Attributes:
https://developers.perspectiveapi.com/s/about-the-api-attributes-and-languages?language=en_US

"IDENTITY_ATTACK":

Negative or hateful comments targeting someone because of their identity.

**Threshold:**
Researchers should consider which application of Perspective best matches their intent. For social science researchers using Perspective to study harassment, we recommend experimenting with thresholds of 0.7 or 0.9, similar to typical moderation use cases. By comparison, machine learning researchers using Perspective to filter potentially toxic content from their data should use higher thresholds, like 0.9 or 0.95, as low thresholds increase the potential for bias. Another researcher might analyze all comments with scores 0.7 or higher when researching false positives in machine learning models. ( https://developers.perspectiveapi.com/s/about-the-api-score?language=en_US )
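
As a minimal sketch of how such thresholds could be applied to a list of scores: the helper name `flag_scores` and the example scores below are illustrative assumptions, not part of the Perspective API.

```python
def flag_scores(scores, threshold=0.7):
    """Return one boolean per score: True when the score is at or above the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return [score >= threshold for score in scores]


# Made-up scores, used only to show how the choice of threshold changes the result.
example_scores = [0.12, 0.71, 0.93, 0.96]
for threshold in (0.7, 0.9, 0.95):
    flags = flag_scores(example_scores, threshold)
    print(threshold, sum(flags), "of", len(flags), "flagged")
```

Raising the threshold flags fewer comments, which matches the quoted advice to use higher thresholds when filtering data.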

In [None]:
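import json
import os
import time

import numpy as np
from googleapiclient import discovery

# Never commit a real key; read it from an environment variable instead.
# The variable name PERSPECTIVE_API_KEY is an arbitrary choice.
API_KEY = os.environ["PERSPECTIVE_API_KEY"]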

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

def google_perspective_predict(sent):
  analyze_request = {
      'comment': { 'text': sent },
      'requestedAttributes': {'IDENTITY_ATTACK': {}},
      'languages': ["en"],
      }
  response = client.comments().analyze(body=analyze_request).execute()
  # Return only the summary score (a value between 0 and 1) rather than the full response.
  return response["attributeScores"]["IDENTITY_ATTACK"]["summaryScore"]["value"]

In [None]:
from tqdm import tqdm

score = google_perspective_predict('They carry bundles of sticks on their backs like dogs, and they are so ugly that one would think they were born from the devil himself.')
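
When scoring many sentences, the calls may need to be spaced out to stay within the API quota. The following is a rough sketch, assuming a limit of about one request per second; the helper `score_all` and its parameters are hypothetical and not part of the Google client library.

```python
import time


def score_all(sentences, predict, min_interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call predict(sentence) for each sentence, waiting so that consecutive
    calls start at least min_interval seconds apart."""
    results = []
    last_call = None
    for sentence in sentences:
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results.append(predict(sentence))
    return results


# Stand-in predictor so the sketch runs without an API key.
# In the notebook one would pass google_perspective_predict instead of len.
print(score_all(["a", "bb"], len, min_interval=0.0))
```

The `sleep` and `clock` arguments exist only so the pacing logic can be exercised without actually waiting.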


## Pysentimiento

Paper: https://arxiv.org/pdf/2106.09462

Model: BERTweet (Nguyen et al., 2020), a RoBERTa model trained on approximately 850 million English tweets.


Dataset: https://aclanthology.org/S19-2007.pdf

Code: https://github.com/pysentimiento/pysentimiento/tree/master

In [None]:
#!pip install pysentimiento
#!pip install opencv-python-headless
#!pip install -U transformers
from pysentimiento import create_analyzer
hate_speech_analyzer = create_analyzer(task="hate_speech", lang="en")

# Return the predicted probability of the "hateful" class (between 0 and 1).
def pysent_predict(sent):
  result = hate_speech_analyzer.predict(sent)
  hatefulness = result.probas["hateful"]
  return hatefulness

## Huggingface models

We investigate the most popular hate speech detection models uploaded to Hugging Face, ranked by downloads in the last month.

Filters:

*   Task = Text classification
*   Language = English

The models were chosen on 15 August 2024. At the time of access, the listed models were the only ones on Hugging Face with more than 1,000 downloads in the last month.


### 1. facebook/roberta-hate-speech-dynabench-r4-target
https://huggingface.co/facebook/roberta-hate-speech-dynabench-r4-target

Paper: https://arxiv.org/pdf/2012.15761

In [None]:
from transformers import pipeline

pipe_fb_roberta = pipeline("text-classification", model="facebook/roberta-hate-speech-dynabench-r4-target")

# Convert the output into a number between 0 and 1 (0 = not hate, 1 = hate).
def fb_roberta_predict_score(sent):
  result = pipe_fb_roberta(sent)
  if result[0]['label'] == 'nothate':
    return 1 - result[0]['score']
  else:
    return result[0]['score']


def fb_roberta_predict_label(sent):
  result = pipe_fb_roberta(sent)
  return "non-hateful" if result[0]['label'] == "nothate" else "hateful"


In [None]:
sent = 'Your moral character must be not only pure, but, like Caesars wife, unsuspected.'
#sent = ['setr']*600
score = fb_roberta_predict_score(sent)
label = fb_roberta_predict_label(sent)
score, label

### 2. Hate-speech-CNERG/english-abusive-MuRIL
https://huggingface.co/Hate-speech-CNERG/english-abusive-MuRIL (abusive)

Paper: https://arxiv.org/abs/2204.12543


*   LABEL_0 -> Normal
*   LABEL_1 -> Abusive


In [None]:
pipe_cnerg_abusive = pipeline("text-classification", model="Hate-speech-CNERG/english-abusive-MuRIL")

# Convert the output into a number between 0 and 1 (0 = normal, 1 = abusive).
def cnerg_abusive_predict_score(sent):
  result = pipe_cnerg_abusive(sent)
  if result[0]['label'] == 'LABEL_0':
    return 1 - result[0]['score']
  else:
    return result[0]['score']

def cnerg_abusive_predict_label(sent):
  result = pipe_cnerg_abusive(sent)
  return "non-hateful" if result[0]['label'] == "LABEL_0" else "hateful"

### 3. Hate-speech-CNERG/bert-base-uncased-hatexplain
https://huggingface.co/Hate-speech-CNERG/bert-base-uncased-hatexplain

Paper & Dataset: https://ojs.aaai.org/index.php/AAAI/article/view/17745

Base model: BERT

Labels:

*   Normal
*   Offensive
*   Hate Speech



In [None]:
pipe_cnerg_hatexplain = pipeline("text-classification", model="Hate-speech-CNERG/bert-base-uncased-hatexplain")

# Convert the output into a number between 0 and 1 (0 = normal, 1 = offensive or hate speech).
def cnerg_hatexplain_predict_score(sent):
  result = pipe_cnerg_hatexplain(sent)
  if result[0]['label'] == 'normal':
    return 1 - result[0]['score']
  else:
    return result[0]['score']

# Treat "offensive" as hateful, on the assumption that offensive and hateful sentences overlap.
def cnerg_hatexplain_predict_label(sent):
  result = pipe_cnerg_hatexplain(sent)
  return "non-hateful" if result[0]['label'] == "normal" else "hateful"
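
The score functions above look only at the top label. For a model with more than two classes, an alternative is to sum the probabilities of all non-normal labels. The sketch below works on a plain list of `{"label", "score"}` dictionaries, which is the shape a text-classification pipeline is assumed to return when called with `top_k=None`. The helper name and the example values are hypothetical.

```python
def hate_score_from_distribution(label_scores, non_hate_labels=("normal",)):
    """Sum the probabilities of every label that is not listed in non_hate_labels."""
    return sum(
        item["score"] for item in label_scores if item["label"] not in non_hate_labels
    )


# Hand-written example in the assumed output shape; the values are made up.
example = [
    {"label": "normal", "score": 0.2},
    {"label": "offensive", "score": 0.5},
    {"label": "hate speech", "score": 0.3},
]
print(hate_score_from_distribution(example))
```

With this variant the score no longer depends on which single class happens to rank first.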

### 4. Hate-speech-CNERG/dehatebert-mono-english
https://huggingface.co/Hate-speech-CNERG/dehatebert-mono-english

Paper: https://arxiv.org/abs/2004.06465

In [None]:
pipe_cnerg_dehatebert = pipeline("text-classification", model="Hate-speech-CNERG/dehatebert-mono-english")

def cnerg_dehatebert_predict_label(sent):
  result = pipe_cnerg_dehatebert(sent)
  return "non-hateful" if result[0]['label'] == "NON_HATE" else "hateful"


def cnerg_dehatebert_predict_score(sent):
  result = pipe_cnerg_dehatebert(sent)
  if result[0]['label'] == 'NON_HATE':
    return 1 - result[0]['score']
  else:
    return result[0]['score']

### 5. IMSyPP/hate_speech_en
https://huggingface.co/IMSyPP/hate_speech_en

Paper: https://link.springer.com/chapter/10.1007/978-3-031-08974-9_54

#### Hate speech type
At the speech type level, you can choose between four categories:

0. **Appropriate** - no target (leave the "target" category blank)
1. **Inappropriate** (contains terms that are obscene or vulgar, but the text is not directed at any person specifically) - no target (leave the "target" category blank)
2. **Offensive** (including offensive generalization, contempt, dehumanization, indirect offensive remarks)
3. **Violent** (author threatens, indulges, desires or calls for physical violence against a target; it also includes calling for, denying or glorifying war crimes and crimes against humanity)

In [None]:
pipe_imsypp = pipeline("text-classification", model="IMSyPP/hate_speech_en")

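# LABEL_0 (appropriate) and LABEL_1 (inappropriate) count as non-hateful,
# assuming the label indices follow the category numbering above.
def imsypp_predict_label(sent):
  result = pipe_imsypp(sent)
  return "non-hateful" if result[0]['label'] in ("LABEL_0", "LABEL_1") else "hateful"


# Convert the output into a number between 0 and 1 (0 = not hate, 1 = hate),
# following the same convention as the other models.
def imsypp_predict_score(sent):
  result = pipe_imsypp(sent)
  if result[0]['label'] in ("LABEL_0", "LABEL_1"):
    return 1 - result[0]['score']
  else:
    return result[0]['score']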

### 6. cardiffnlp/twitter-roberta-large-hate-latest
https://huggingface.co/cardiffnlp/twitter-roberta-large-hate-latest

Paper: https://arxiv.org/abs/2310.14757

Base model: https://huggingface.co/cardiffnlp/twitter-roberta-large-2022-154m

Fine-tune dataset: https://huggingface.co/datasets/cardiffnlp/super_tweeteval

#### Labels
* **hate_gender**
* **hate_race**
* **hate_sexuality**
* **hate_religion**
* **hate_origin**
* **hate_disability**
* **hate_age**
* **not_hate**

In [None]:
pipe_cardiff_roberta = pipeline("text-classification", model="cardiffnlp/twitter-roberta-large-hate-latest")

# Convert the output into a number between 0 and 1 (0 = not hate, 1 = hate).
def cardiff_roberta_predict_score(sent):
    result = pipe_cardiff_roberta(sent)
    if result[0]['label'] == "not_hate":
        return 1 - result[0]['score']
    return result[0]['score']

# Function to get label for hate speech classification
def cardiff_roberta_predict_label(sent):
    result = pipe_cardiff_roberta(sent)
    return "non-hateful" if result[0]['label'] == "not_hate" else "hateful"



### 7. badmatr11x/distilroberta-base-offensive-hateful-speech-text-multiclassification
https://huggingface.co/badmatr11x/distilroberta-base-offensive-hateful-speech-text-multiclassification

Base model: DistilRoBERTa base from Hugging Face, fine-tuned on https://huggingface.co/datasets/badmatr11x/hate-offensive-speech

Model card: https://huggingface.co/badmatr11x/distilroberta-base-offensive-hateful-speech-text-multiclassification/blob/main/README.md

#### Labels
0. **HATE-SPEECH**
1. **OFFENSIVE-LANGUAGE** (the dataset also contains hateful content under this label)
2. **NEITHER**

In [None]:
pipe_badmatrix = pipeline("text-classification", model="badmatr11x/distilroberta-base-offensive-hateful-speech-text-multiclassification")

# Convert the output into a number between 0 and 1 (0 = not hate, 1 = hate).
def badmatrix_predict_score(sent):
    result = pipe_badmatrix(sent)
    if result[0]['label'] == "NEITHER":
        return 1 - result[0]['score']
    return result[0]['score']

# Function to get label for hate speech classification
def badmatrix_predict_label(sent):
    result = pipe_badmatrix(sent)
    return "non-hateful" if result[0]['label'] == "NEITHER" else "hateful"


### 8. tomh/toxigen_hatebert
https://huggingface.co/tomh/toxigen_hatebert

Paper: https://arxiv.org/pdf/2203.09509

**Does not work**

In [None]:
# Load the tomh/toxigen_hatebert model via pipeline
pipe_toxigen = pipeline("text-classification", model="tomh/toxigen_hatebert")

# Function to get score for hate speech classification
def pipe_toxigen_predict_score(sent):
    result = pipe_toxigen(sent)
    return result[0]['score']

# Function to get label for hate speech classification
def pipe_toxigen_predict_label(sent):
    result = pipe_toxigen(sent)
    return "non-hateful" if result[0]['label'] == "LABEL_1" else "hateful"
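
To compare the working models on the same sentence, their predict functions can be collected in a dictionary and applied in turn. This is only a sketch: `compare_models` and the stub predictors are hypothetical stand-ins, so the example runs without downloading any model.

```python
def compare_models(sent, predictors):
    """Apply every predictor to sent and return {model name: output}."""
    return {name: predict(sent) for name, predict in predictors.items()}


# Stand-in predictors for illustration only.
# In the notebook these would be functions such as fb_roberta_predict_label.
stub_predictors = {
    "always_hateful": lambda sent: "hateful",
    "keyword_stub": lambda sent: "hateful" if "ugly" in sent.lower() else "non-hateful",
}

labels = compare_models("What a lovely day.", stub_predictors)
print(labels)
```

The same helper works for the `*_predict_score` functions, since it only assumes that each predictor takes a sentence and returns a value.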