<a href="https://colab.research.google.com/github/pavlotkachenko/CryptoAnalyser/blob/main/Text_Moderation_System_Pavlo_Tkachenko.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

1. Install Dependencies

In [1]:
!pip install transformers torch pandas scikit-learn


Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch)
  Downloading nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-curand-cu12==10.3.5.147 (from torch)
  Downloading nvidia_curand_cu12-10.3.5
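
After the install finishes, it can help to confirm which versions actually ended up in the runtime. The sketch below uses only the standard library; the helper name `installed_versions` is an illustration, not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # The four packages installed by the pip command above
    required = ["transformers", "torch", "pandas", "scikit-learn"]
    for name, ver in installed_versions(required).items():
        print(f"{name}: {ver or 'not installed'}")
```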

2. Import Necessary Libraries

In [2]:
import torch
import pandas as pd
from transformers import BertTokenizer, BertForSequenceClassification
from torch.nn.functional import softmax

3. Load Pre-Trained Model for Toxicity Detection

In [3]:
MODEL_NAME = "unitary/toxic-bert"  # Pretrained model for toxicity detection
tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()  # Set model to evaluation mode

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e

4. Define Toxicity Labels


In [4]:
# The model emits six logits, one per label; it has no separate "safe" class
labels = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
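
A hard-coded list can drift out of step with the checkpoint. As a minimal sketch, the label order could instead be derived from the `id2label` mapping that Transformers stores on `model.config`. This assumes the checkpoint's config carries meaningful names; when it does not, Transformers fills in generic ones such as `LABEL_0`. The helper name `labels_from_id2label` is hypothetical.

```python
def labels_from_id2label(id2label):
    """Return label names ordered by class index.

    Accepts int keys (as on a loaded config) or str keys (as in config.json).
    """
    return [id2label[key] for key in sorted(id2label, key=int)]


if __name__ == "__main__":
    # Illustrative mapping; in the notebook this would be model.config.id2label
    example = {"1": "severe_toxic", "0": "toxic", "2": "obscene"}
    print(labels_from_id2label(example))
```

In the notebook, `labels = labels_from_id2label(model.config.id2label)` would then replace the literal list.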

5. Function to Classify Toxicity

In [5]:
def classify_toxicity(text):
    """Return (top_label, {label: score}) for a single text."""
    # Tokenize one text; inputs longer than 512 tokens are truncated
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # Softmax forces the six scores to sum to 1, so one label always "wins",
    # even when every logit is low
    scores = softmax(outputs.logits, dim=1).tolist()[0]

    # Map each label to its score
    results = dict(zip(labels, scores))
    max_label = max(results, key=results.get)  # Label with the highest score
    return max_label, results
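
Because softmax normalizes across labels, the function above can never report that a text is clean. A possible alternative, sketched here on the assumption that the checkpoint was trained with independent per-label (multi-label) targets as the Jigsaw label set suggests, is to score each logit with a sigmoid and fall back to "safe" when nothing crosses a threshold. The function name `decide`, the 0.5 threshold, and the example logits are illustrative and not taken from the notebook.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decide(logits, labels, threshold=0.5):
    """Score each label independently; return "safe" if no score reaches the threshold."""
    scores = {label: sigmoid(z) for label, z in zip(labels, logits)}
    top_label = max(scores, key=scores.get)
    if scores[top_label] < threshold:
        return "safe", scores
    return top_label, scores


if __name__ == "__main__":
    names = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
    # Made-up logits that are all strongly negative: no label should fire
    print(decide([-6.0, -7.5, -7.0, -7.4, -7.1, -7.3], names)[0])
```

In the notebook, the logits would come from `outputs.logits[0].tolist()`.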

6. Test Sample Texts

In [6]:
test_cases = [
    "I hate you so much, you are the worst person ever!",  # Toxic
    "I completely disagree with your statement, but I respect your opinion.",  # Safe
    "You should leave this country, we don't want you here.",  # Identity hate
    "Go back to the kitchen and cook, that's all you're good for.",  # Sexist insult
    "This is a terrible idea, and I don't think it will work.",  # Non-toxic criticism
]

for text in test_cases:
    category, scores = classify_toxicity(text)
    print(f"Input: {text}")
    print(f"Predicted Category: {category}")
    print(f"Score Breakdown: {scores}\n")

Input: I hate you so much, you are the worst person ever!
Predicted Category: toxic
Score Breakdown: {'toxic': 0.9355260133743286, 'severe_toxic': 0.0001633448264328763, 'obscene': 0.003197695594280958, 'threat': 7.064151577651501e-05, 'insult': 0.060881469398736954, 'identity_hate': 0.0001608542661415413}

Input: I completely disagree with your statement, but I respect your opinion.
Predicted Category: toxic
Score Breakdown: {'toxic': 0.4048522114753723, 'severe_toxic': 0.10264410823583603, 'obscene': 0.1464998573064804, 'threat': 0.1056695282459259, 'insult': 0.12958216667175293, 'identity_hate': 0.11075212806463242}

Input: You should leave this country, we don't want you here.
Predicted Category: toxic
Score Breakdown: {'toxic': 0.8817827105522156, 'severe_toxic': 0.002840998349711299, 'obscene': 0.002504973439499736, 'threat': 0.09102032333612442, 'insult': 0.0074878716841340065, 'identity_hate': 0.014363187365233898}

Input: Go back to the kitchen and cook, that's all you're good
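
The inline comments in the test list record what each sentence is expected to be, but nothing compares them with the predictions. A small harness along these lines could do that. It is a sketch: `check_expectations` and the keyword-based `stub_classifier` are hypothetical, and the stub only stands in for `classify_toxicity` so the example runs without the model.

```python
def check_expectations(cases, classify):
    """Run classify on each (text, expected_label) pair and return the mismatches."""
    mismatches = []
    for text, expected in cases:
        predicted, _scores = classify(text)
        if predicted != expected:
            mismatches.append((text, expected, predicted))
    return mismatches


def stub_classifier(text):
    """Stand-in with the same return shape as classify_toxicity: (label, scores)."""
    label = "toxic" if "hate" in text.lower() else "safe"
    return label, {}


if __name__ == "__main__":
    cases = [
        ("I hate you so much, you are the worst person ever!", "toxic"),
        ("This is a terrible idea, and I don't think it will work.", "safe"),
    ]
    for text, expected, predicted in check_expectations(cases, stub_classifier):
        print(f"MISMATCH: expected {expected}, got {predicted}: {text}")
```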

7. (Optional) Batch Processing for Large Datasets

In [7]:
def batch_classify(df, text_column="comment_text"):
    # Classify row by row and keep only the top label; the new column is added to df in place
    df["predicted_category"] = df[text_column].apply(lambda x: classify_toxicity(x)[0])
    return df
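
`batch_classify` calls the model once per row. Hugging Face tokenizers also accept a list of strings with `padding=True`, so several texts can share one forward pass. The sketch below shows only the chunking logic; `classify_batch` is a hypothetical callable that takes a list of texts and returns one label per text, and the batch size of 32 is an arbitrary choice.

```python
def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def classify_in_batches(texts, classify_batch, batch_size=32):
    """Apply classify_batch (list[str] -> list[str]) chunk by chunk, preserving order."""
    predictions = []
    for chunk in chunked(list(texts), batch_size):
        predictions.extend(classify_batch(chunk))
    return predictions


if __name__ == "__main__":
    # Stand-in batch classifier so the example runs without the model
    def fake_batch(chunk):
        return ["safe" for _ in chunk]

    print(classify_in_batches(["a", "b", "c"], fake_batch, batch_size=2))
```

With a real batch function, the DataFrame column could be filled with `df["predicted_category"] = classify_in_batches(df["comment_text"].tolist(), classify_batch)`.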

In [8]:
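# Load a sample of the Jigsaw Toxic Comment test set.
# The CSV is not bundled with this notebook; it has to exist at this path
# (for example, downloaded from the Kaggle competition), or pd.read_csv raises FileNotFoundError.
df = pd.read_csv("jigsaw-toxic-comment-classification-challenge/test.csv", nrows=100)  # Read only the first 100 rows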
df = batch_classify(df)
df.to_csv("moderation_results.csv", index=False)  # Save results

FileNotFoundError: [Errno 2] No such file or directory: 'jigsaw-toxic-comment-classification-challenge/test.csv'
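
The error above means the CSV was never placed in the runtime. One way to fail earlier and with a more useful message is to check the path before loading. This is a sketch; the helper `require_file` and the hint text are illustrative.

```python
from pathlib import Path


def require_file(path, hint=""):
    """Return path as a Path if it is an existing file, else raise with a hint."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found. {hint}".strip())
    return path


if __name__ == "__main__":
    try:
        csv_path = require_file(
            "jigsaw-toxic-comment-classification-challenge/test.csv",
            hint="Download the dataset and upload test.csv to this location first.",
        )
        print(f"Found {csv_path}")
    except FileNotFoundError as err:
        print(err)
```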