This code sets up a text classification pipeline using the pre-trained BERT model for toxic comment classification. Here's a breakdown of the steps:
- The necessary classes and functions are imported from the transformers library, including BertForSequenceClassification, BertTokenizer, and TextClassificationPipeline.
- The model_path variable holds the identifier of the toxic comment classification checkpoint on the Hugging Face Hub (from_pretrained also accepts a local directory).
- The tokenizer is loaded from that checkpoint using BertTokenizer.from_pretrained(model_path). The tokenizer converts raw text into the token IDs and attention masks that the BERT model takes as input.
- The BERT model for sequence classification is loaded from the same checkpoint using BertForSequenceClassification.from_pretrained(model_path, num_labels=2). The num_labels=2 parameter sets up a classification head with two output classes, making this a binary task (toxic or non-toxic).
- A TextClassificationPipeline is created using the loaded model and tokenizer. This pipeline simplifies the process of text classification by combining the tokenization, model inference, and output formatting steps into a single interface.
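
To make the "output formatting" step less opaque, here is a minimal sketch of how two raw class logits can be turned into the label and score that the pipeline reports. It assumes the pipeline's default behaviour for a two-label, single-label model (a softmax over the logits followed by picking the top class). The ID2LABEL mapping, the format_output helper, and the logit values are hypothetical; the real mapping comes from the model's configuration.

```python
import math

# Hypothetical label mapping; the real one is stored in the model's config
ID2LABEL = {0: "non-toxic", 1: "toxic"}

def format_output(logits):
    """Turn raw class logits into a {'label', 'score'} dict (illustrative helper)."""
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the class with the highest probability
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}

# Made-up logits for a single input, not taken from the real model
example = format_output([-2.0, 3.0])
print(example)
```

The score is therefore the probability of the predicted class only, which is why both toxic and non-toxic predictions can come with scores close to 100%.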

In [2]:
# Import necessary classes and functions from the transformers library
from transformers import BertForSequenceClassification, BertTokenizer, TextClassificationPipeline

# Path to the pre-trained BERT model for toxic comment classification
model_path = "JungleLee/bert-toxic-comment-classification"

# Load the pre-trained BERT tokenizer from the specified model path
tokenizer = BertTokenizer.from_pretrained(model_path)

# Load the pre-trained BERT model for sequence classification from the specified model path
# Set the number of labels to 2 (binary classification: toxic or non-toxic)
model = BertForSequenceClassification.from_pretrained(model_path, num_labels=2)

# Create a text classification pipeline using the loaded model and tokenizer
pipeline = TextClassificationPipeline(model=model, tokenizer=tokenizer)


In [5]:
input_text = [
    "You're an idiot! Your ideas are completely stupid and make no sense.",
    "I respectfully disagree with your opinion on this matter. Let's have a constructive discussion.",
    "That's a ridiculous suggestion. You clearly have no idea what you're talking about.",
    "I appreciate your perspective, but I have a different viewpoint on this issue.",
    "Your argument is flawed and based on false assumptions. Do your research before spreading misinformation.",
    "While I understand your concerns, I believe there are more positive ways to approach this problem.",
    "You're a complete moron, and your ignorance is astounding. Get lost!",
    "I'm open to hearing different opinions, but let's keep the conversation civil and respectful.",
    "Your proposal is utter garbage, and you should be ashamed of yourself for even suggesting it.",
    "I can see where you're coming from, but I think we need to consider all angles before making a decision.",
    "John Doe is a sexist and racist teacher and has been like that the entire term."
]

results = pipeline(input_text)
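
The pipeline returns one dictionary per input, each with a 'label' and a 'score' key, in the same order as the inputs. As a rough sketch of how that structure might be used, the helper below keeps only the texts labelled toxic above a confidence threshold. The flag_toxic name, the 0.95 threshold, and the stand-in data are assumptions for illustration; with the real pipeline you would pass input_text and results instead.

```python
def flag_toxic(texts, results, threshold=0.95):
    """Return (text, score) pairs labelled 'toxic' with score >= threshold."""
    return [
        (text, row["score"])
        for text, row in zip(texts, results)
        if row["label"] == "toxic" and row["score"] >= threshold
    ]

# Stand-in data shaped like the pipeline output; the scores are illustrative
sample_texts = ["first comment", "second comment", "third comment"]
sample_results = [
    {"label": "toxic", "score": 0.9998},
    {"label": "non-toxic", "score": 0.9997},
    {"label": "toxic", "score": 0.80},
]

flagged = flag_toxic(sample_texts, sample_results)
print(flagged)
```

Only the first comment is kept here: the second is non-toxic, and the third falls below the threshold.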

In [7]:
# Print each input alongside its predicted label and confidence score
for i, (text, row) in enumerate(zip(input_text, results), start=1):
    print(f"Input #{i}: {text}")
    print(f"Outcome : {row['label']}")
    print(f"Probability : {row['score']*100:.2f}%\n")

Input #1: You're an idiot! Your ideas are completely stupid and make no sense.
Outcome : toxic
Probability : 99.98%

Input #2: I respectfully disagree with your opinion on this matter. Let's have a constructive discussion.
Outcome : non-toxic
Probability : 99.97%

Input #3: That's a ridiculous suggestion. You clearly have no idea what you're talking about.
Outcome : toxic
Probability : 99.80%

Input #4: I appreciate your perspective, but I have a different viewpoint on this issue.
Outcome : non-toxic
Probability : 99.98%

Input #5: Your argument is flawed and based on false assumptions. Do your research before spreading misinformation.
Outcome : non-toxic
Probability : 91.89%

Input #6: While I understand your concerns, I believe there are more positive ways to approach this problem.
Outcome : non-toxic
Probability : 99.98%

Input #7: You're a complete moron, and your ignorance is astounding. Get lost!
Outcome : toxic
Probability : 99.98%

Input #8: I'm open to hearing different opinio