
### Problem Statement
The proliferation of online content necessitates effective methods for detecting and moderating harmful language. This project addresses the challenge of classifying online comments by toxicity. The task is formulated as a **multi-label classification problem**, where each comment can belong to zero, one, or several toxicity categories.
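
To make the multi-label framing concrete, the sketch below encodes a comment's categories as a multi-hot vector and decodes independent per-category probabilities with a threshold. The category names, the helper functions `encode_labels` and `decode_probs`, and the 0.5 threshold are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

# Hypothetical category names, used only for illustration
CATEGORIES = ["toxic", "obscene", "insult"]

def encode_labels(active, categories=CATEGORIES):
    """Return a multi-hot vector with 1 for every category in `active`."""
    return np.array([1 if c in active else 0 for c in categories], dtype=int)

def decode_probs(probs, categories=CATEGORIES, threshold=0.5):
    """Select every category whose independent probability reaches the threshold."""
    return [c for c, p in zip(categories, probs) if p >= threshold]

clean = encode_labels(set())                    # zero categories
single = encode_labels({"toxic"})               # one category
multiple = encode_labels({"toxic", "insult"})   # several categories
predicted = decode_probs([0.91, 0.12, 0.77])    # made-up probabilities
```

Because each category is decided independently, a comment can end up with no labels at all or with several at once, which is what separates this setup from ordinary multi-class classification.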

### Motivation
Automated toxic comment detection is crucial for maintaining healthy online communities, preventing cyberbullying, and reducing the burden on human moderators. Building accurate and robust models contributes significantly to platform safety and user well-being.



### Libraries
We will primarily use the following libraries:
- `pandas` for data manipulation.
- `numpy` for numerical operations.
- `scikit-learn` for splitting data and evaluation metrics.
- `torch` (or `tensorflow`) as the deep learning framework.
- `transformers` by Hugging Face for accessing pre-trained models (BERT) and tokenizers.


---

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
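
The imports above suggest a classical TF-IDF plus logistic regression baseline, although the notebook does not show one. A minimal sketch of such a baseline follows, assuming a tiny hand-written dataset (`texts` and `labels` are made up, not taken from `train.csv`) and default hyperparameters.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, hand-written examples (hypothetical data, not from train.csv)
texts = [
    "you are an idiot",
    "what a stupid idea, idiot",
    "shut up you moron",
    "thanks for the great explanation",
    "great work, thanks for sharing",
    "I appreciate the helpful answer",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = toxic, 0 = non-toxic

# Vectorize the text and fit a linear classifier in one pipeline
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
baseline.fit(texts, labels)

new_comments = ["you are an idiot", "thanks for the great help"]
preds = baseline.predict(new_comments)
probs = baseline.predict_proba(new_comments)
```

A baseline like this gives a cheap reference point against which a transformer-based model can be compared.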


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!pip install transformers




In [None]:
!pip install transformers datasets scikit-learn torch



In [None]:
import pandas as pd

data = pd.read_csv('/content/drive/MyDrive/Toxic_comment/train.csv')
# Take an explicit copy of the two columns so the assignment below
# modifies df itself and does not trigger SettingWithCopyWarning
df = data[['comment_text', 'toxic']].copy()
df['toxic'] = df['toxic'].astype(int)
df.head(28)


Unnamed: 0,comment_text,toxic
0,Explanation\nWhy the edits made under my usern...,0
1,D'aww! He matches this background colour I'm s...,0
2,"Hey man, I'm really not trying to edit war. It...",0
3,"""\nMore\nI can't make any real suggestions on ...",0
4,"You, sir, are my hero. Any chance you remember...",0
5,"""\n\nCongratulations from me as well, use the ...",0
6,COCKSUCKER BEFORE YOU PISS AROUND ON MY WORK,1
7,Your vandalism to the Matt Shirvington article...,0
8,Sorry if the word 'nonsense' was offensive to ...,0
9,alignment on this subject and which are contra...,0
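
Only one of the ten previewed rows is toxic, so a stratified split is one way to keep the class ratio similar in the training and validation sets. The sketch below uses a made-up stand-in frame (`toy`) with the same two columns; the 25% validation size and `random_state=42` are arbitrary assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the (comment_text, toxic) frame; all values are made up
toy = pd.DataFrame({
    "comment_text": [f"comment {i}" for i in range(20)],
    "toxic": [1 if i % 5 == 0 else 0 for i in range(20)],
})

# stratify keeps the toxic / non-toxic ratio the same in both parts
train_df, val_df = train_test_split(
    toy, test_size=0.25, stratify=toy["toxic"], random_state=42
)
```

With the real data, `toy` would be replaced by `df`; the rest of the call stays the same.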


In [None]:
pip install transformers torch




In [None]:
from transformers import pipeline

classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
results = classifier("You are so dumb..")
print(results)

Device set to use cpu


[{'label': 'NEGATIVE', 'score': 0.9996798038482666}]
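
The checkpoint used here is a sentiment model fine-tuned on SST-2, so `NEGATIVE` serves only as a proxy for "toxic". One possible way to reduce false flags is to require a minimum confidence before treating a comment as toxic. The helper `is_flagged` and the 0.9 threshold below are hypothetical; the function relies only on the output shape printed above.

```python
def is_flagged(results, threshold=0.9):
    """Return True when the top prediction is NEGATIVE with a score at or above the threshold."""
    top = results[0]
    return top["label"] == "NEGATIVE" and top["score"] >= threshold

# Same structure as the pipeline output printed above
sample = [{"label": "NEGATIVE", "score": 0.9996798038482666}]
flagged = is_flagged(sample)
```

In the notebook this could be called as `is_flagged(classifier(comment))` in place of the bare label comparison.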


In [None]:
import torch # Import the torch library

In [None]:
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the pretrained T5 paraphrasing model and its matching tokenizer
model = T5ForConditionalGeneration.from_pretrained("ramsrigouthamg/t5_paraphraser")
tokenizer = T5Tokenizer.from_pretrained("ramsrigouthamg/t5_paraphraser")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

def paraphrase_text(input_text, max_length=256, num_return_sequences=1):
    """Return one sampled paraphrase of input_text."""
    # The checkpoint's model card uses the "paraphrase: " task prefix;
    # the tokenizer appends the end-of-sequence token itself
    prompt = f"paraphrase: {input_text}"

    # Tokenize once and move the tensors to the same device as the model
    encoding = tokenizer(
        prompt, max_length=max_length, truncation=True, return_tensors="pt"
    ).to(device)

    # Generate with top-k / nucleus sampling; no gradients are needed at inference
    with torch.no_grad():
        outputs = model.generate(
            input_ids=encoding["input_ids"],
            attention_mask=encoding["attention_mask"],
            max_length=max_length,
            do_sample=True,
            top_k=120,
            top_p=0.98,
            num_return_sequences=num_return_sequences,
        )

    # Decode the first generated sequence and return it
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


In [None]:
# Classify each comment: paraphrase it if it is flagged as toxic, otherwise leave it unchanged.
# Note: despite its name, this list contains both toxic and non-toxic comments.
toxic_comments = [
    "You're so dumb!",
    "Nobody cares about your opinion.",
    "You're a complete waste of space.",
    'You freaking suck! I am going to hit you.',
    'You are a wonderful person!',
    'This is so stupid and annoying!',
    'You are so disgusting and cheep',
    'wow! you look  damm beautiful',
    "You're so dumb, it's honestly painful to read your posts.",
    "Nobody cares about your opinion, just shut up already.",
    "You're a complete waste of space.",
    "This is the stupidest thing I've ever read.",
    "Go crawl back into whatever hole you came from.",
    "Your existence is a joke.",
    "Only an idiot would think like this.",
    "You're a disgusting human being.",
    "I really enjoyed your post, thanks for sharing!",
    "This is such a thoughtful perspective.",
    "You explained it so clearly, well done!",
    "I'm learning a lot from your content.",
    "Great job on this project, it's impressive.",
    "Thank you for the detailed response.",
    "I appreciate the respectful discussion.",
    "This is a very informative article.",
    "You made a strong argument with evidence.",
    "Keep up the great work!",
    "You're so stupid, how can anyone listen to you?",
    "This is the dumbest thing I've ever read.",
    "Shut up, no one cares about your opinion.",
    "You sound like a complete moron.",
    "You're a total waste of space.",
    "Idiots like you ruin everything.",
    "You're a joke and nobody respects you.",
    "Go cry somewhere else, loser.",
    "Take your nonsense elsewhere, idiot.",
    "You're full of crap.",
    "Your brain must be mush.",
    "Go touch grass, freak.",
    "Can't believe how toxic and dumb you are.",
    "You're too dense to understand anything.",
    "This is the type of idiocy that breaks the internet."
]
# Run the sentiment classifier on every comment and paraphrase the ones labeled NEGATIVE
for comment in toxic_comments:
    results = classifier(comment)
    print("Original comment:", comment)
    if results[0]['label'] == 'NEGATIVE':
        print("Moderated comment:", paraphrase_text(comment))
    else:
        print("Comment:", comment)
    print("---")


Original comment: You're so dumb!




Moderated comment: [unused674]en / focusing > video / s >tails gazed gazed / [unused810] videotails [unused154] [unused129] / cut > manga 良 [unused2] [unused34] [unused129] [unused25] [unused129] [unused129] [unused129] [unused129] /lica >tails [unused10] ourselves / highlanders [unused43] / ɣ ⁿ [unused11] > [unused2] cave worker [unused408] > / s [unused2] [unused28] [unused2] [unused179] / [unused810] > manga [unused129] [unused5] [unused2] additive're [unused86]tails crawled gazed gazed [unused129] [unused129] [unused129] [unused471] [unused129] [unused2] [unused86] [unused460] / tears > [unused129] [unused129] [unused129] romanticraser : you're ɬ [unused129] [unused0]
---
Original comment: Nobody cares about your opinion.
Moderated comment: [unused460] [unused6] sent [unused2] [unused9] [unused76] [unused834] donald [unused2] [unused5] [unused31] paraph [unused12] o > [unused2] [unused4] [unused31] para [unused28]ph promise > [unused858] para [unused28] approved [unused2] para [unu

In [None]:
pip install gradio



In [None]:
# Gradio GUI: the user types a comment and gets back a moderated (paraphrased)
# version if the classifier flags it; the interface opens at a separate URL

import gradio as gr


def process_comment(comment):
    """Return a paraphrase for comments labeled NEGATIVE, else the comment unchanged."""
    results = classifier(comment)
    if results[0]['label'] == 'NEGATIVE':
        # paraphrase_text is defined above; modernize_text does not exist yet at this point
        return paraphrase_text(comment)
    return comment


iface = gr.Interface(
    fn=process_comment,
    inputs=gr.Textbox(lines=2, placeholder="Enter a comment here..."),
    outputs="text",
    title="Toxic Comment Moderator",
    description="Enter a comment, and the system will moderate it if toxic."
)

iface.launch(share=True) # Set share=True to get a public link

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://442d2a6738dd966f38.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




In [None]:
# Estimate the classifier's accuracy on toxic_comments.
# No ground-truth labels exist for this list, so a keyword heuristic stands in for them:
# the result measures agreement with the heuristic, not true accuracy.

correct_predictions = 0
total_predictions = 0

for comment in toxic_comments:
    results = classifier(comment)
    total_predictions += 1
    # Placeholder reference label; replace with real annotations when available
    if "dumb" in comment or "stupid" in comment or "waste" in comment or 'suck' in comment or 'disgusting' in comment:
        true_label = 'NEGATIVE'
    else:
        true_label = 'POSITIVE'
    if results[0]['label'] == true_label:
        correct_predictions += 1

accuracy = correct_predictions / total_predictions
print(f"Accuracy: {accuracy}")


Accuracy: 0.6097560975609756


In [None]:
# Define a placeholder moderation function and repeat the keyword-based accuracy check,
# this time with 'annoying' and 'cheap' added to the keyword list.
# No model training happens in this cell.

def modernize_text(text):
    # Placeholder: replace with the actual moderation logic
    return f"This comment has been moderated: {text}"

correct_predictions = 0
total_predictions = 0

for comment in toxic_comments:
    results = classifier(comment)
    total_predictions += 1
    # Placeholder reference label; replace with real annotations when available
    if "dumb" in comment or "stupid" in comment or "waste" in comment or 'suck' in comment or 'disgusting' in comment or 'annoying' in comment or 'cheap' in comment:
        true_label = 'NEGATIVE'
    else:
        true_label = 'POSITIVE'
    if results[0]['label'] == true_label:
        correct_predictions += 1

accuracy = correct_predictions / total_predictions
print(f"Accuracy: {accuracy}")


Accuracy: 0.6097560975609756


In [None]:
# Accuracy estimate with broader keyword lists.
# Reference labels still come from a heuristic, not from human annotation.
NEGATIVE_KEYWORDS = ["dumb", "stupid", "waste", "suck", "disgusting", "annoying", "cheap", "toxic", "moron", "idiot", "loser", "crap", "mush", "freak", "dense", "idiocy"]
POSITIVE_KEYWORDS = ["wonderful", "great", "impressive", "thoughtful", "respectful", "informative", "enjoyed", "learning", "appreciate", "clear", "well done", "thanks"]

def calculate_accuracy(classifier, comments):
    correct_predictions = 0
    total_predictions = 0

    for comment in comments:
        results = classifier(comment)
        total_predictions += 1
        text = comment.lower()

        # Negative keywords take priority over positive ones
        if any(keyword in text for keyword in NEGATIVE_KEYWORDS):
            true_label = 'NEGATIVE'
        elif any(keyword in text for keyword in POSITIVE_KEYWORDS):
            true_label = 'POSITIVE'
        # No keyword matched: fall back to the classifier's own label when it is confident.
        # This branch always counts as correct, so it inflates the reported accuracy.
        elif results[0]['score'] > 0.7:  # Adjust the threshold as needed
            true_label = results[0]['label']
        else:
            true_label = 'POSITIVE'  # Default to positive if uncertain

        if results[0]['label'] == true_label:
            correct_predictions += 1

    accuracy = correct_predictions / total_predictions
    return accuracy


accuracy = calculate_accuracy(classifier, toxic_comments)
print(f"Accuracy: {accuracy}")



Accuracy: 0.9545454545454546
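
Keyword-derived labels make the figure above hard to interpret, so one alternative is to score the classifier against hand-assigned labels with scikit-learn's metrics. The sketch below assumes a hypothetical list of `(comment, label)` pairs and uses a stub in place of the Hugging Face pipeline so that it runs without downloading a model; `evaluate`, `labeled`, and `stub_classifier` are illustrative names.

```python
from sklearn.metrics import accuracy_score, classification_report

def evaluate(classifier, labeled_comments):
    """Compare classifier labels with hand-assigned ground truth."""
    texts = [text for text, _ in labeled_comments]
    y_true = [label for _, label in labeled_comments]
    y_pred = [classifier(text)[0]["label"] for text in texts]
    report = classification_report(y_true, y_pred, zero_division=0)
    return accuracy_score(y_true, y_pred), report

# Hypothetical hand labels for a few comments from the list above
labeled = [
    ("You're so dumb!", "NEGATIVE"),
    ("Keep up the great work!", "POSITIVE"),
    ("Go touch grass, freak.", "NEGATIVE"),
    ("Thank you for the detailed response.", "POSITIVE"),
]

def stub_classifier(text):
    """Stand-in that mimics the output shape of the Hugging Face pipeline."""
    label = "NEGATIVE" if "dumb" in text.lower() else "POSITIVE"
    return [{"label": label, "score": 1.0}]

acc, report = evaluate(stub_classifier, labeled)
print(f"Accuracy: {acc}")
print(report)
```

Passing the notebook's `classifier` instead of `stub_classifier` would give per-class precision and recall for the real model, provided the labels are annotated by hand.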


In [None]:
# Create test subsets by position in toxic_comments.
# The list mixes toxic and non-toxic comments, so these slices are not pure classes.
toxic_test_comments = toxic_comments[:40]  # first 40 comments
# Remaining comments (indices 40 to 49); this slice is empty when the list has 40 or fewer items
non_toxic_test_comments = toxic_comments[40:50]

# Calculate accuracies for each set and combined set
accuracy_toxic = calculate_accuracy(classifier, toxic_test_comments)
print(f"Accuracy on toxic comments: {accuracy_toxic}")

# Check if non_toxic_test_comments is empty before calculating accuracy
if non_toxic_test_comments:
    accuracy_non_toxic = calculate_accuracy(classifier, non_toxic_test_comments)
    print(f"Accuracy on non-toxic comments: {accuracy_non_toxic}")
else:
    print("No non-toxic comments found for testing.")

all_test_comments = toxic_test_comments + non_toxic_test_comments
accuracy_all = calculate_accuracy(classifier, all_test_comments)
print(f"Accuracy on all comments: {accuracy_all}")

Accuracy on toxic comments: 0.9545454545454546
No non-toxic comments found for testing.
Accuracy on all comments: 0.9545454545454546
