# 02. Rule-Based Content Moderation Filter

This notebook implements a comprehensive rule-based filtering system for content moderation that works with the processed data from the text preprocessing phase.

## 🎯 Features:
1. **Toxicity Detection**: Rule-based toxicity classification using multiple criteria
2. **Spam Detection**: Advanced spam pattern recognition with detailed explanations
3. **Safe Content Identification**: Clean content classification
4. **Comprehensive Testing**: Validation using processed_data.csv
5. **Decision Explanations**: Detailed reasoning for each classification decision

## 📊 Integration with Processed Data:
- Uses features from processed_data.csv: toxicity, overall_toxicity, spam_score, text characteristics
- Combines rule-based logic with pre-computed toxicity scores
- Provides detailed explanations for classification decisions


In [21]:
import re
import string
import pandas as pd
import numpy as np
from collections import Counter
import warnings
warnings.filterwarnings('ignore')

print("Libraries imported successfully!")


Libraries imported successfully!


## 1. Keyword Blacklist Detection


In [22]:
# Comprehensive offensive keywords list
OFFENSIVE_KEYWORDS = [
    # Profanity and slurs
    'fuck', 'shit', 'damn', 'bitch', 'asshole', 'bastard', 'cunt', 'piss',
    'crap', 'hell', 'dick', 'pussy', 'cock', 'whore', 'slut', 'fag',
    'nigger', 'nigga', 'chink', 'kike', 'spic', 'wetback', 'towelhead',
    'retard', 'retarded', 'moron', 'idiot', 'stupid', 'dumb', 'fool',
    'bullshit', 'crap', 'sucks', 'terrible', 'awful', 'horrible',
    
    # Extended racism and ethnic slurs
    'nigger', 'nigga', 'chink', 'kike', 'spic', 'wetback', 'towelhead',
    'gook', 'jap', 'chink', 'slant', 'yellow', 'redskin', 'savage',
    'coon', 'jungle bunny', 'porch monkey', 'tar baby', 'mammy',
    'house nigger', 'field nigger', 'oreo', 'coconut', 'banana',
    'wetback', 'beaner', 'spic', 'greaser', 'taco', 'burrito',
    'sand nigger', 'camel jockey', 'raghead', 'towelhead', 'haji',
    'chink', 'gook', 'slant eye', 'yellow', 'rice eater', 'dog eater',
    'kike', 'heeb', 'yid', 'christ killer', 'jew boy', 'jew girl',
    'polack', 'dago', 'wop', 'guinea', 'mick', 'paddy', 'taig',
    'gypsy', 'gyp', 'pikey', 'tinker', 'traveller',
    
    # Violence and threats
    'kill', 'murder', 'death', 'die', 'suicide', 'bomb', 'explode',
    'shoot', 'gun', 'weapon', 'knife', 'stab', 'beat', 'hit', 'punch',
    'threat', 'threaten', 'harm', 'hurt', 'destroy', 'annihilate',
    
    # Hate speech
    'hate', 'hater', 'racist', 'sexist', 'homophobic', 'transphobic',
    'nazi', 'hitler', 'white power', 'black power', 'supremacist',
    'genocide', 'ethnic cleansing', 'apartheid', 'segregation',
    
    # Sexual content
    'porn', 'pornography', 'sex', 'sexual', 'nude', 'naked', 'nude',
    'breast', 'boob', 'tit', 'vagina', 'penis', 'dick', 'pussy',
    'rape', 'raping', 'molest', 'pedophile', 'pedo', 'incest',
    
    # Drugs and alcohol
    'cocaine', 'heroin', 'marijuana', 'weed', 'cannabis', 'crack',
    'meth', 'methamphetamine', 'ecstasy', 'lsd', 'acid', 'mushroom',
    'alcohol', 'drunk', 'drinking', 'beer', 'wine', 'vodka',
    
    # Spam and scams
    'viagra', 'cialis', 'pharmacy', 'medication', 'prescription',
    'casino', 'gambling', 'bet', 'poker', 'slots', 'lottery',
    'free money', 'make money', 'earn money', 'quick cash',
    'work from home', 'get rich', 'guaranteed', 'no risk',
    'click here', 'buy now', 'special offer', 'limited time'
]

# Offensive abbreviations and internet slang
OFFENSIVE_ABBREVIATIONS = {
    'wtf': 'what the fuck',
    'stfu': 'shut the fuck up',
    'gtfo': 'get the fuck out',
    'fml': 'fuck my life',
    'omfg': 'oh my fucking god',
    'lmao': 'laughing my ass off',
    'rofl': 'rolling on floor laughing',
    'stfu': 'shut the fuck up',
    'gtfo': 'get the fuck out',
    'fml': 'fuck my life',
    'omfg': 'oh my fucking god',
    'lmao': 'laughing my ass off',
    'rofl': 'rolling on floor laughing',
    'af': 'as fuck',
    'btw': 'by the way',
    'fyi': 'for your information',
    'tbh': 'to be honest',
    'imo': 'in my opinion',
    'imho': 'in my humble opinion',
    'smh': 'shaking my head',
    'tbh': 'to be honest',
    'imo': 'in my opinion',
    'imho': 'in my humble opinion',
    'smh': 'shaking my head',
    'fyi': 'for your information',
    'btw': 'by the way',
    'af': 'as fuck'
}

def check_offensive_keywords(text):
    """
    Check if text contains any offensive keywords and abbreviations.
    
    Args:
        text (str): Input text to check
        
    Returns:
        dict: Contains 'is_offensive', 'offensive_words', 'offense_count', and 'offense_score'
    """
    import re  # Import re at the beginning of the function
    
    if pd.isna(text) or not isinstance(text, str):
        return {
            'is_offensive': False,
            'offensive_words': [],
            'offense_count': 0,
            'offense_score': 0,
            'found_abbreviations': []
        }
    
    text_lower = text.lower()
    found_words = []
    found_abbreviations = []
    
    # Check for exact matches in offensive keywords
    for word in OFFENSIVE_KEYWORDS:
        if word in text_lower:
            found_words.append(word)
    
    # Check for partial matches (word boundaries)
    for word in OFFENSIVE_KEYWORDS:
        pattern = r'\\b' + re.escape(word) + r'\\b'
        if re.search(pattern, text_lower):
            if word not in found_words:
                found_words.append(word)
    
    # Check for offensive abbreviations with sentence context
    # Split by multiple sentence endings
    sentences = re.split(r'[.!?]+', text)
    for abbrev, full_form in OFFENSIVE_ABBREVIATIONS.items():
        if abbrev in text_lower:
            # Find the sentence containing the abbreviation
            containing_sentence = ""
            for sentence in sentences:
                if abbrev.lower() in sentence.lower():
                    containing_sentence = sentence.strip()
                    break
            
            if containing_sentence:
                found_abbreviations.append(f"{abbrev} ({full_form}) in: '{containing_sentence}'")
            else:
                found_abbreviations.append(f"{abbrev} ({full_form})")
            found_words.append(abbrev)
    
    # Calculate offense score based on severity
    offense_score = 0
    for word in found_words:
        if word in ['nigger', 'nigga', 'chink', 'kike', 'spic', 'wetback', 'towelhead', 'gook', 'jap', 'slant', 'yellow', 'redskin', 'savage', 'coon', 'jungle bunny', 'porch monkey', 'tar baby', 'mammy', 'house nigger', 'field nigger', 'oreo', 'coconut', 'banana', 'beaner', 'greaser', 'taco', 'burrito', 'sand nigger', 'camel jockey', 'raghead', 'haji', 'slant eye', 'rice eater', 'dog eater', 'heeb', 'yid', 'christ killer', 'jew boy', 'jew girl', 'polack', 'dago', 'wop', 'guinea', 'mick', 'paddy', 'taig', 'gypsy', 'gyp', 'pikey', 'tinker', 'traveller']:
            offense_score += 50  # High severity racial slurs
        elif word in ['fuck', 'shit', 'damn', 'bitch', 'asshole', 'bastard', 'cunt']:
            offense_score += 30  # Medium severity profanity
        elif word in ['kill', 'murder', 'death', 'suicide', 'bomb', 'explode']:
            offense_score += 40  # High severity violence
        elif word in ['hate', 'hater', 'racist', 'sexist', 'homophobic']:
            offense_score += 25  # Medium severity hate speech
        elif word in ['wtf', 'stfu', 'gtfo', 'fml', 'omfg']:
            offense_score += 20  # Offensive abbreviations
        else:
            offense_score += 10  # Low severity
    
    return {
        'is_offensive': len(found_words) > 0,
        'offensive_words': found_words,
        'offense_count': len(found_words),
        'offense_score': offense_score,
        'found_abbreviations': found_abbreviations
    }

print("Offensive keyword detection function created!")


Offensive keyword detection function created!


## 2. Spam Pattern Detection


In [23]:
def detect_spam_patterns(text):
    """
    Detect spam patterns in text using regex patterns.
    
    Args:
        text (str): Input text to check
        
    Returns:
        dict: Contains spam detection results and triggered patterns
    """
    if pd.isna(text) or not isinstance(text, str):
        return {
            'is_spam_pattern': False,
            'spam_patterns': [],
            'spam_score': 0,
            'details': {},
            'found_urls': [],
            'found_emails': [],
            'found_phones': []
        }
    
    text_lower = text.lower()
    spam_patterns = []
    spam_score = 0
    details = {}
    found_urls = []
    found_emails = []
    found_phones = []
    
    # URL patterns with detailed extraction
    url_patterns = [
        (r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', 'URL detected', 30),
        (r'www\\.[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}', 'WWW URL detected', 25),
        (r'[a-zA-Z0-9.-]+\\.(com|org|net|edu|gov|mil|int|co|uk|de|fr|jp|au|us|ca|mx|br|es|it|ru|cn|in|kr|nl|se|no|dk|fi|pl|tr|za|th|my|sg|hk|tw|nz|ph|id|vn)', 'Domain detected', 20)
    ]
    
    for pattern, description, score in url_patterns:
        matches = re.findall(pattern, text_lower)
        if matches:
            spam_patterns.append(description)
            spam_score += score
            details[description] = True
            found_urls.extend(matches)
    
    # Email patterns with extraction
    email_pattern = r'\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Z|a-z]{2,}\\b'
    email_matches = re.findall(email_pattern, text)
    if email_matches:
        spam_patterns.append('Email address')
        spam_score += 20
        details['Email address'] = True
        found_emails.extend(email_matches)
    
    # Phone number patterns with extraction
    phone_patterns = [
        (r'\\b\\d{3}[-.]?\\d{3}[-.]?\\d{4}\\b', 'Phone number (US format)', 25),
        (r'\\b\\d{10}\\b', 'Phone number (10 digits)', 20),
        (r'\\+\\d{1,3}\\s?\\d{1,14}', 'International phone number', 30)
    ]
    
    for pattern, description, score in phone_patterns:
        matches = re.findall(pattern, text)
        if matches:
            spam_patterns.append(description)
            spam_score += score
            details[description] = True
            found_phones.extend(matches)
    
    # Currency and money patterns
    currency_patterns = [
        (r'[\\$€£¥₹]\\s*\\d+', 'Currency symbol with number', 15),
        (r'\\d+\\s*[\\$€£¥₹]', 'Number with currency symbol', 15),
        (r'\\d{1,3}(,\\d{3})*\\s*[\\$€£¥₹]', 'Formatted currency', 20),
        (r'\\$\\d+', 'Dollar amount', 10)
    ]
    
    for pattern, description, score in currency_patterns:
        if re.search(pattern, text):
            spam_patterns.append(description)
            spam_score += score
            details[description] = True
    
    # Promotional keywords with specific extraction
    promo_keywords = [
        'buy now', 'click here', 'free money', 'make money', 'earn money',
        'work from home', 'get rich', 'quick cash', 'easy money',
        'guaranteed', 'no risk', 'limited time', 'act now', 'dont wait',
        'special offer', 'discount', 'sale', 'promotion', 'deal',
        'win', 'winner', 'prize', 'lottery', 'jackpot'
    ]
    
    found_promo_keywords = []
    for keyword in promo_keywords:
        if keyword in text_lower:
            found_promo_keywords.append(keyword)
            spam_score += 10
    
    if found_promo_keywords:
        spam_patterns.append(f'Promotional keywords ({len(found_promo_keywords)})')
        details['Promotional keywords'] = found_promo_keywords
    
    # Medical/pharmaceutical keywords
    medical_keywords = [
        'viagra', 'cialis', 'pharmacy', 'medication', 'prescription',
        'drug', 'pill', 'tablet', 'capsule', 'dosage'
    ]
    
    medical_count = 0
    for keyword in medical_keywords:
        if keyword in text_lower:
            medical_count += 1
            spam_score += 15
    
    if medical_count > 0:
        spam_patterns.append(f'Medical keywords ({medical_count})')
        details['Medical keywords'] = medical_count
    
    # Gambling keywords with specific extraction
    gambling_keywords = [
        'casino', 'gambling', 'bet', 'poker', 'slots', 'lottery',
        'jackpot', 'win', 'winner', 'prize', 'money back'
    ]
    
    found_gambling_keywords = []
    for keyword in gambling_keywords:
        if keyword in text_lower:
            found_gambling_keywords.append(keyword)
            spam_score += 12
    
    if found_gambling_keywords:
        spam_patterns.append(f'Gambling keywords ({len(found_gambling_keywords)})')
        details['Gambling keywords'] = found_gambling_keywords
    
    return {
        'is_spam_pattern': len(spam_patterns) > 0,
        'spam_patterns': spam_patterns,
        'spam_score': spam_score,
        'details': details,
        'found_urls': found_urls,
        'found_emails': found_emails,
        'found_phones': found_phones
    }

print("Spam pattern detection function created!")


Spam pattern detection function created!


## 3. Character Repetition and Capitalization Checks


In [24]:
def check_character_patterns(text):
    """
    Check for excessive capitalization, character repetition, and other suspicious patterns.
    
    Args:
        text (str): Input text to check
        
    Returns:
        dict: Contains character pattern analysis results
    """
    if pd.isna(text) or not isinstance(text, str):
        return {
            'is_suspicious_pattern': False,
            'suspicious_patterns': [],
            'pattern_score': 0,
            'details': {}
        }
    
    suspicious_patterns = []
    pattern_score = 0
    details = {}
    
    # Check excessive capitalization
    if len(text) > 0:
        caps_ratio = sum(1 for c in text if c.isupper()) / len(text)
        if caps_ratio > 0.7:
            suspicious_patterns.append('Excessive capitalization')
            pattern_score += 25
            details['Capitalization ratio'] = f"{caps_ratio:.2f}"
        elif caps_ratio > 0.5:
            suspicious_patterns.append('High capitalization')
            pattern_score += 15
            details['Capitalization ratio'] = f"{caps_ratio:.2f}"
    
    # Check repeated characters
    repeated_chars = re.findall(r'(.)\\1{2,}', text)
    if repeated_chars:
        suspicious_patterns.append('Repeated characters')
        pattern_score += 20
        details['Repeated characters'] = repeated_chars[:5]  # Show first 5
    
    # Check excessive punctuation
    if len(text) > 0:
        punct_count = sum(1 for c in text if c in string.punctuation)
        punct_ratio = punct_count / len(text)
        if punct_ratio > 0.3:
            suspicious_patterns.append('Excessive punctuation')
            pattern_score += 20
            details['Punctuation ratio'] = f"{punct_ratio:.2f}"
        elif punct_ratio > 0.2:
            suspicious_patterns.append('High punctuation')
            pattern_score += 10
            details['Punctuation ratio'] = f"{punct_ratio:.2f}"
    
    # Check for excessive exclamation marks
    exclamation_count = text.count('!')
    if exclamation_count > 5:
        suspicious_patterns.append('Excessive exclamation marks')
        pattern_score += 15
        details['Exclamation count'] = exclamation_count
    elif exclamation_count > 3:
        suspicious_patterns.append('High exclamation marks')
        pattern_score += 8
        details['Exclamation count'] = exclamation_count
    
    # Check for excessive question marks
    question_count = text.count('?')
    if question_count > 5:
        suspicious_patterns.append('Excessive question marks')
        pattern_score += 15
        details['Question count'] = question_count
    elif question_count > 3:
        suspicious_patterns.append('High question marks')
        pattern_score += 8
        details['Question count'] = question_count
    
    # Check for excessive numbers
    if len(text) > 0:
        digit_count = sum(1 for c in text if c.isdigit())
        digit_ratio = digit_count / len(text)
        if digit_ratio > 0.3:
            suspicious_patterns.append('Excessive numbers')
            pattern_score += 15
            details['Digit ratio'] = f"{digit_ratio:.2f}"
        elif digit_ratio > 0.2:
            suspicious_patterns.append('High numbers')
            pattern_score += 8
            details['Digit ratio'] = f"{digit_ratio:.2f}"
    
    # Check for excessive special characters
    if len(text) > 0:
        special_count = sum(1 for c in text if c in string.punctuation)
        special_ratio = special_count / len(text)
        if special_ratio > 0.4:
            suspicious_patterns.append('Excessive special characters')
            pattern_score += 20
            details['Special char ratio'] = f"{special_ratio:.2f}"
        elif special_ratio > 0.3:
            suspicious_patterns.append('High special characters')
            pattern_score += 10
            details['Special char ratio'] = f"{special_ratio:.2f}"
    
    # Check for very short or very long text
    text_length = len(text)
    if text_length < 10:
        suspicious_patterns.append('Very short text')
        pattern_score += 10
        details['Text length'] = text_length
    elif text_length > 1000:
        suspicious_patterns.append('Very long text')
        pattern_score += 5
        details['Text length'] = text_length
    
    # Check for all caps words
    words = text.split()
    all_caps_words = [word for word in words if word.isupper() and len(word) > 2]
    if len(all_caps_words) > 3:
        suspicious_patterns.append('Multiple all-caps words')
        pattern_score += 15
        details['All-caps words'] = all_caps_words[:5]  # Show first 5
    elif len(all_caps_words) > 1:
        suspicious_patterns.append('Some all-caps words')
        pattern_score += 8
        details['All-caps words'] = all_caps_words[:3]  # Show first 3
    
    return {
        'is_suspicious_pattern': len(suspicious_patterns) > 0,
        'suspicious_patterns': suspicious_patterns,
        'pattern_score': pattern_score,
        'details': details
    }

print("Character pattern detection function created!")


Character pattern detection function created!


## 4. Master Function - Apply All Rules


In [25]:
def apply_rules(text):
    """
    Master function that applies all rule-based checks to a text.
    
    Args:
        text (str): Input text to analyze
        
    Returns:
        dict: Comprehensive analysis results from all rule-based checks
    """
    if pd.isna(text) or not isinstance(text, str):
        return {
            'is_offensive': False,
            'is_spam_pattern': False,
            'is_suspicious_pattern': False,
            'risk_score_modifier': 0,
            'offensive_details': {},
            'spam_details': {},
            'pattern_details': {},
            'summary': 'No analysis performed - invalid input'
        }
    
    # Apply all rule-based checks
    offensive_result = check_offensive_keywords(text)
    spam_result = detect_spam_patterns(text)
    pattern_result = check_character_patterns(text)
    
    # Calculate overall risk score modifier
    risk_score_modifier = (
        offensive_result['offense_score'] + 
        spam_result['spam_score'] + 
        pattern_result['pattern_score']
    )
    
    # Determine if any rules were triggered
    any_rules_triggered = (
        offensive_result['is_offensive'] or 
        spam_result['is_spam_pattern'] or 
        pattern_result['is_suspicious_pattern']
    )
    
    # Create summary
    summary_parts = []
    if offensive_result['is_offensive']:
        summary_parts.append(f"Offensive content detected ({offensive_result['offense_count']} words)")
    if spam_result['is_spam_pattern']:
        summary_parts.append(f"Spam patterns detected ({len(spam_result['spam_patterns'])} patterns)")
    if pattern_result['is_suspicious_pattern']:
        summary_parts.append(f"Suspicious patterns detected ({len(pattern_result['suspicious_patterns'])} patterns)")
    
    if not summary_parts:
        summary = "No rule violations detected"
    else:
        summary = "; ".join(summary_parts)
    
    return {
        'is_offensive': offensive_result['is_offensive'],
        'is_spam_pattern': spam_result['is_spam_pattern'],
        'is_suspicious_pattern': pattern_result['is_suspicious_pattern'],
        'risk_score_modifier': risk_score_modifier,
        'offensive_details': {
            'offensive_words': offensive_result['offensive_words'],
            'offense_count': offensive_result['offense_count'],
            'offense_score': offensive_result['offense_score'],
            'found_abbreviations': offensive_result['found_abbreviations']
        },
        'spam_details': {
            'spam_patterns': spam_result['spam_patterns'],
            'spam_score': spam_result['spam_score'],
            'details': spam_result['details'],
            'found_urls': spam_result['found_urls'],
            'found_emails': spam_result['found_emails'],
            'found_phones': spam_result['found_phones']
        },
        'pattern_details': {
            'suspicious_patterns': pattern_result['suspicious_patterns'],
            'pattern_score': pattern_result['pattern_score'],
            'details': pattern_result['details']
        },
        'summary': summary
    }

print("Master rule application function created!")


Master rule application function created!


## 5. Load and Test with Processed Data

This section loads the processed data from the text preprocessing phase and tests our rule-based system with real data.


In [26]:
# Load the processed data from text preprocessing
print("Loading processed data from text preprocessing phase...")
try:
    df_processed = pd.read_csv('processed_data.csv')
    print(f"✅ Processed data loaded successfully! Shape: {df_processed.shape}")
    print(f"📊 Available columns: {list(df_processed.columns)}")
    
    # Display basic statistics
    print(f"\n📈 Data Statistics:")
    print(f"Total samples: {len(df_processed):,}")
    print(f"Toxicity range: {df_processed['toxicity'].min():.4f} - {df_processed['toxicity'].max():.4f}")
    print(f"Overall toxicity range: {df_processed['overall_toxicity'].min():.4f} - {df_processed['overall_toxicity'].max():.4f}")
    print(f"Spam score range: {df_processed['spam_score'].min()} - {df_processed['spam_score'].max()}")
    
except FileNotFoundError:
    print("❌ processed_data.csv not found. Please run the text preprocessing notebook first.")
    df_processed = None
except Exception as e:
    print(f"❌ Error loading processed data: {e}")
    df_processed = None


Loading processed data from text preprocessing phase...


✅ Processed data loaded successfully! Shape: (1995662, 17)
📊 Available columns: ['id', 'comment_text', 'processed_text', 'toxicity', 'overall_toxicity', 'spam_score', 'text_length', 'word_count', 'sentence_count', 'avg_word_length', 'capitalization_ratio', 'hashtag_count', 'mention_count', 'exclamation_count', 'question_count', 'digit_count', 'special_char_count']

📈 Data Statistics:
Total samples: 1,995,662
Toxicity range: 0.0000 - 1.0000
Overall toxicity range: 0.0000 - 0.8319
Spam score range: 0 - 105


In [27]:
def classify_content_with_explanation(row):
    """
    Comprehensive content classification using rule-based logic with detailed explanations.
    
    Args:
        row: Pandas row containing processed data features
        
    Returns:
        dict: Classification result with detailed explanation
    """
    # Extract features
    text = row['comment_text']
    toxicity = row['toxicity']
    overall_toxicity = row['overall_toxicity']
    spam_score = row['spam_score']
    
    # Apply rule-based checks
    rule_result = apply_rules(text)
    
    # Classification logic with explanations
    classification = "safe"
    confidence = 0.0
    explanation_parts = []
    risk_factors = []
    
    # 1. Spam Detection (Highest Priority)
    if spam_score >= 60 or rule_result['spam_details']['spam_score'] >= 60:
        classification = "spam"
        confidence = 0.9
        explanation_parts.append("High spam score detected")
        risk_factors.append(f"Spam score: {spam_score}")
        if rule_result['spam_details']['spam_patterns']:
            risk_factors.append(f"Spam patterns: {', '.join(rule_result['spam_details']['spam_patterns'])}")
        if rule_result['spam_details']['found_urls']:
            risk_factors.append(f"URLs found: {', '.join(rule_result['spam_details']['found_urls'][:3])}")
        if rule_result['spam_details']['found_emails']:
            risk_factors.append(f"Emails found: {', '.join(rule_result['spam_details']['found_emails'][:2])}")
        if rule_result['spam_details']['found_phones']:
            risk_factors.append(f"Phone numbers found: {', '.join(rule_result['spam_details']['found_phones'][:2])}")
        if 'Promotional keywords' in rule_result['spam_details']['details']:
            promo_keywords = rule_result['spam_details']['details']['Promotional keywords']
            if isinstance(promo_keywords, list):
                risk_factors.append(f"Promotional keywords found: {', '.join(promo_keywords[:3])}")
        if 'Gambling keywords' in rule_result['spam_details']['details']:
            gambling_keywords = rule_result['spam_details']['details']['Gambling keywords']
            if isinstance(gambling_keywords, list):
                risk_factors.append(f"Gambling keywords found: {', '.join(gambling_keywords[:3])}")
    
    # 2. Toxicity Detection
    elif toxicity >= 0.5 or overall_toxicity >= 0.4:
        classification = "toxic"
        confidence = 0.85
        explanation_parts.append("High toxicity score detected")
        risk_factors.append(f"Toxicity score: {toxicity:.3f}")
        risk_factors.append(f"Overall toxicity: {overall_toxicity:.3f}")
        if rule_result['offensive_details']['offensive_words']:
            risk_factors.append(f"Offensive words: {', '.join(rule_result['offensive_details']['offensive_words'][:3])}")
        if rule_result['offensive_details']['found_abbreviations']:
            risk_factors.append(f"Offensive abbreviations: {', '.join(rule_result['offensive_details']['found_abbreviations'][:2])}")
    
    # 3. Moderate Toxicity
    elif toxicity >= 0.4 or overall_toxicity >= 0.3:
        classification = "toxic"
        confidence = 0.7
        explanation_parts.append("Moderate toxicity detected")
        risk_factors.append(f"Toxicity score: {toxicity:.3f}")
        risk_factors.append(f"Overall toxicity: {overall_toxicity:.3f}")
        if rule_result['offensive_details']['offensive_words']:
            risk_factors.append(f"Offensive words: {', '.join(rule_result['offensive_details']['offensive_words'][:2])}")
        if rule_result['offensive_details']['found_abbreviations']:
            risk_factors.append(f"Offensive abbreviations: {', '.join(rule_result['offensive_details']['found_abbreviations'][:2])}")
    
    # 4. Low Toxicity but Suspicious Patterns OR Offensive Content Detected
    elif toxicity >= 0.2 or rule_result['risk_score_modifier'] >= 50 or rule_result['offensive_details']['offensive_words'] or rule_result['offensive_details']['found_abbreviations']:
        classification = "toxic"
        confidence = 0.6
        explanation_parts.append("Low toxicity but offensive content or suspicious patterns detected")
        risk_factors.append(f"Toxicity score: {toxicity:.3f}")
        
        # Always show offensive words if found
        if rule_result['offensive_details']['offensive_words']:
            risk_factors.append(f"Offensive words: {', '.join(rule_result['offensive_details']['offensive_words'][:3])}")
        if rule_result['offensive_details']['found_abbreviations']:
            risk_factors.append(f"Offensive abbreviations: {', '.join(rule_result['offensive_details']['found_abbreviations'][:2])}")
        
        # Show suspicious patterns if found
        if rule_result['pattern_details']['suspicious_patterns']:
            risk_factors.append(f"Suspicious patterns: {', '.join(rule_result['pattern_details']['suspicious_patterns'][:2])}")
        # Add specific pattern details
        if 'Capitalization ratio' in rule_result['pattern_details']['details']:
            risk_factors.append(f"Capitalization ratio: {rule_result['pattern_details']['details']['Capitalization ratio']}")
        if 'Repeated characters' in rule_result['pattern_details']['details']:
            risk_factors.append(f"Repeated characters: {', '.join(rule_result['pattern_details']['details']['Repeated characters'][:3])}")
        if 'Punctuation ratio' in rule_result['pattern_details']['details']:
            risk_factors.append(f"Punctuation ratio: {rule_result['pattern_details']['details']['Punctuation ratio']}")
        if 'Exclamation count' in rule_result['pattern_details']['details']:
            risk_factors.append(f"Exclamation count: {rule_result['pattern_details']['details']['Exclamation count']}")
        if 'Question count' in rule_result['pattern_details']['details']:
            risk_factors.append(f"Question count: {rule_result['pattern_details']['details']['Question count']}")
        if 'All-caps words' in rule_result['pattern_details']['details']:
            risk_factors.append(f"All-caps words: {', '.join(rule_result['pattern_details']['details']['All-caps words'][:3])}")
    
    # 5. Spam-like but not clearly spam
    elif spam_score >= 30 or rule_result['spam_details']['spam_score'] >= 30:
        classification = "spam"
        confidence = 0.6
        explanation_parts.append("Moderate spam indicators detected")
        risk_factors.append(f"Spam score: {spam_score}")
        if rule_result['spam_details']['spam_patterns']:
            risk_factors.append(f"Spam patterns: {', '.join(rule_result['spam_details']['spam_patterns'][:2])}")
        if rule_result['spam_details']['found_urls']:
            risk_factors.append(f"URLs found: {', '.join(rule_result['spam_details']['found_urls'][:2])}")
        if rule_result['spam_details']['found_emails']:
            risk_factors.append(f"Emails found: {', '.join(rule_result['spam_details']['found_emails'][:1])}")
        if rule_result['spam_details']['found_phones']:
            risk_factors.append(f"Phone numbers found: {', '.join(rule_result['spam_details']['found_phones'][:1])}")
        if 'Promotional keywords' in rule_result['spam_details']['details']:
            promo_keywords = rule_result['spam_details']['details']['Promotional keywords']
            if isinstance(promo_keywords, list):
                risk_factors.append(f"Promotional keywords found: {', '.join(promo_keywords[:2])}")
        if 'Gambling keywords' in rule_result['spam_details']['details']:
            gambling_keywords = rule_result['spam_details']['details']['Gambling keywords']
            if isinstance(gambling_keywords, list):
                risk_factors.append(f"Gambling keywords found: {', '.join(gambling_keywords[:2])}")
    
    # 6. Offensive Content Detected (regardless of toxicity level)
    elif rule_result['offensive_details']['offensive_words'] or rule_result['offensive_details']['found_abbreviations']:
        classification = "toxic"
        confidence = 0.7
        explanation_parts.append("Offensive content detected")
        risk_factors.append(f"Toxicity score: {toxicity:.3f}")
        if rule_result['offensive_details']['offensive_words']:
            risk_factors.append(f"Offensive words: {', '.join(rule_result['offensive_details']['offensive_words'][:3])}")
        if rule_result['offensive_details']['found_abbreviations']:
            risk_factors.append(f"Offensive abbreviations: {', '.join(rule_result['offensive_details']['found_abbreviations'][:2])}")
    
    # 7. Safe Content
    else:
        classification = "safe"
        confidence = 0.8
        explanation_parts.append("No significant risk factors detected")
        risk_factors.append(f"Low toxicity: {toxicity:.3f}")
        risk_factors.append(f"Low spam score: {spam_score}")
    
    # Create detailed explanation
    explanation = f"Classified as '{classification}' because: {'; '.join(explanation_parts)}. "
    explanation += f"Risk factors: {'; '.join(risk_factors)}."
    
    return {
        'classification': classification,
        'confidence': confidence,
        'explanation': explanation,
        'toxicity_score': toxicity,
        'overall_toxicity': overall_toxicity,
        'spam_score': spam_score,
        'rule_based_score': rule_result['risk_score_modifier'],
        'risk_factors': risk_factors,
        'rule_details': rule_result
    }

print("✅ Comprehensive classification function with explanations created!")


✅ Comprehensive classification function with explanations created!


In [28]:
# Test the rule-based system with processed data
if df_processed is not None:
    print("🧪 Testing Rule-Based System with Processed Data")
    print("=" * 80)
    
    # Test with different types of content
    test_cases = []
    
    # 1. High toxicity cases
    high_toxicity = df_processed[df_processed['toxicity'] >= 0.7].sample(n=min(3, len(df_processed[df_processed['toxicity'] >= 0.7])), random_state=42)
    test_cases.extend([(row, "High Toxicity") for _, row in high_toxicity.iterrows()])
    
    # 2. High spam cases
    high_spam = df_processed[df_processed['spam_score'] >= 50].sample(n=min(3, len(df_processed[df_processed['spam_score'] >= 50])), random_state=42)
    test_cases.extend([(row, "High Spam") for _, row in high_spam.iterrows()])
    
    # 3. Safe cases
    safe_cases = df_processed[(df_processed['toxicity'] < 0.2) & (df_processed['spam_score'] < 20)].sample(n=min(3, len(df_processed[(df_processed['toxicity'] < 0.2) & (df_processed['spam_score'] < 20)])), random_state=42)
    test_cases.extend([(row, "Safe Content") for _, row in safe_cases.iterrows()])
    
    # 4. Borderline cases
    borderline = df_processed[(df_processed['toxicity'] >= 0.2) & (df_processed['toxicity'] < 0.5) & (df_processed['spam_score'] < 30)].sample(n=min(3, len(df_processed[(df_processed['toxicity'] >= 0.2) & (df_processed['toxicity'] < 0.5) & (df_processed['spam_score'] < 30)])), random_state=42)
    test_cases.extend([(row, "Borderline") for _, row in borderline.iterrows()])
    
    # 5. Test cases with offensive abbreviations
    abbreviation_test_cases = [
        "WTF is wrong with you? This is completely stupid!",
        "STFU and listen to what I'm saying, you idiot!",
        "GTFO of here with your nonsense, this is bullshit!",
        "FML, this is the worst day ever and you're making it worse!",
        "OMFG, you're such a moron, I can't believe this!",
        "This is so annoying AF, why are you being such a jerk?",
        "LMAO at your stupidity, this is hilarious but also sad!",
        "ROFL, you're such a dumbass, this is ridiculous!"
    ]
    
    # Create fake rows for abbreviation test cases
    for i, text in enumerate(abbreviation_test_cases):
        fake_row = pd.Series({
            'comment_text': text,
            'toxicity': 0.3 + (i * 0.1),  # Varying toxicity scores
            'overall_toxicity': 0.2 + (i * 0.05),
            'spam_score': 0
        })
        test_cases.append((fake_row, "Abbreviation Test"))
    
    print(f"Testing with {len(test_cases)} diverse test cases:")
    print("=" * 80)
    
    for i, (row, category) in enumerate(test_cases, 1):
        print(f"\n🔍 Test Case {i} - {category}")
        print("-" * 60)
        
        # Get classification result
        result = classify_content_with_explanation(row)
        
        # Display text (truncated)
        text = row['comment_text']
        print(f"📝 Text: {text[:150]}{'...' if len(text) > 150 else ''}")
        
        # Display scores
        print(f"📊 Scores:")
        print(f"   Toxicity: {result['toxicity_score']:.3f}")
        print(f"   Overall Toxicity: {result['overall_toxicity']:.3f}")
        print(f"   Spam Score: {result['spam_score']}")
        print(f"   Rule-based Score: {result['rule_based_score']}")
        
        # Display classification
        print(f"🎯 Classification: {result['classification'].upper()}")
        print(f"🎯 Confidence: {result['confidence']:.2f}")
        
        # Display explanation
        print(f"💡 Explanation: {result['explanation']}")
        
        # Display detailed risk factors
        if result['risk_factors']:
            print(f"⚠️  Risk Factors:")
            for factor in result['risk_factors']:
                print(f"   • {factor}")
        
        print("-" * 60)
    
    print(f"\n✅ Testing completed with {len(test_cases)} test cases!")
    
else:
    print("❌ Cannot test - processed data not available")


🧪 Testing Rule-Based System with Processed Data
Testing with 20 diverse test cases:

🔍 Test Case 1 - High Toxicity
------------------------------------------------------------
📝 Text: Time to get off the dis-information highway, period.  The main stream media salivates at every stupid tweet which, undermines their very existence and...
📊 Scores:
   Toxicity: 0.700
   Overall Toxicity: 0.308
   Spam Score: 0
   Rule-based Score: 30
🎯 Classification: TOXIC
🎯 Confidence: 0.85
💡 Explanation: Classified as 'toxic' because: High toxicity score detected. Risk factors: Toxicity score: 0.700; Overall toxicity: 0.308; Offensive words: stupid, threat, threaten.
⚠️  Risk Factors:
   • Toxicity score: 0.700
   • Overall toxicity: 0.308
   • Offensive words: stupid, threat, threaten
------------------------------------------------------------

🔍 Test Case 2 - High Toxicity
------------------------------------------------------------
📝 Text: "Stupid is as Stupid does"----Forrest GumP!
📊 Scores:
   To

In [29]:
# Comprehensive Evaluation of Rule-Based System
if df_processed is not None:
    print("\n📊 Comprehensive Evaluation of Rule-Based System")
    print("=" * 80)
    
    # Sample a larger dataset for evaluation
    sample_size = min(1000, len(df_processed))
    evaluation_sample = df_processed.sample(n=sample_size, random_state=42)
    
    print(f"Evaluating on {sample_size:,} samples...")
    
    # Apply classification to all samples
    results = []
    for _, row in evaluation_sample.iterrows():
        result = classify_content_with_explanation(row)
        results.append(result)
    
    # Calculate statistics
    classifications = [r['classification'] for r in results]
    confidences = [r['confidence'] for r in results]
    
    # Count classifications
    safe_count = classifications.count('safe')
    toxic_count = classifications.count('toxic')
    spam_count = classifications.count('spam')
    
    print(f"\n📈 Classification Results:")
    print(f"   Safe: {safe_count:,} ({safe_count/sample_size*100:.1f}%)")
    print(f"   Toxic: {toxic_count:,} ({toxic_count/sample_size*100:.1f}%)")
    print(f"   Spam: {spam_count:,} ({spam_count/sample_size*100:.1f}%)")
    
    print(f"\n🎯 Confidence Statistics:")
    print(f"   Average Confidence: {np.mean(confidences):.3f}")
    print(f"   Median Confidence: {np.median(confidences):.3f}")
    print(f"   Min Confidence: {np.min(confidences):.3f}")
    print(f"   Max Confidence: {np.max(confidences):.3f}")
    
    # Score distributions by classification
    print(f"\n📊 Score Distributions by Classification:")
    
    for classification in ['safe', 'toxic', 'spam']:
        class_results = [r for r in results if r['classification'] == classification]
        if class_results:
            toxicities = [r['toxicity_score'] for r in class_results]
            spam_scores = [r['spam_score'] for r in class_results]
            
            print(f"\n   {classification.upper()} ({len(class_results)} samples):")
            print(f"     Toxicity: {np.mean(toxicities):.3f} ± {np.std(toxicities):.3f}")
            print(f"     Spam Score: {np.mean(spam_scores):.1f} ± {np.std(spam_scores):.1f}")
    
    # Performance analysis
    print(f"\n⚡ Performance Analysis:")
    
    # High confidence predictions
    high_conf = [r for r in results if r['confidence'] >= 0.8]
    print(f"   High Confidence Predictions: {len(high_conf)} ({len(high_conf)/sample_size*100:.1f}%)")
    
    # Low confidence predictions (need review)
    low_conf = [r for r in results if r['confidence'] < 0.6]
    print(f"   Low Confidence Predictions: {len(low_conf)} ({len(low_conf)/sample_size*100:.1f}%)")
    
    # Show some low confidence cases for review
    if low_conf:
        print(f"\n⚠️  Low Confidence Cases (Need Review):")
        for i, result in enumerate(low_conf[:3], 1):
            print(f"   {i}. Classification: {result['classification']}, Confidence: {result['confidence']:.2f}")
            print(f"      Explanation: {result['explanation']}")
    
    print(f"\n✅ Evaluation completed!")
    
else:
    print("❌ Cannot evaluate - processed data not available")



📊 Comprehensive Evaluation of Rule-Based System
Evaluating on 1,000 samples...

📈 Classification Results:
   Safe: 467 (46.7%)
   Toxic: 518 (51.8%)
   Spam: 15 (1.5%)

🎯 Confidence Statistics:
   Average Confidence: 0.721
   Median Confidence: 0.800
   Min Confidence: 0.600
   Max Confidence: 0.850

📊 Score Distributions by Classification:

   SAFE (467 samples):
     Toxicity: 0.016 ± 0.049
     Spam Score: 1.2 ± 4.1

   TOXIC (518 samples):
     Toxicity: 0.203 ± 0.254
     Spam Score: 5.3 ± 9.5

   SPAM (15 samples):
     Toxicity: 0.033 ± 0.067
     Spam Score: 30.0 ± 0.0

⚡ Performance Analysis:
   High Confidence Predictions: 562 (56.2%)
   Low Confidence Predictions: 0 (0.0%)

✅ Evaluation completed!


In [None]:
# 6b. Simplified overrides: whole-word matching and 3-class decision
import re

# Override offensive/abbrev detection to avoid partial matches (e.g., 'af' in 'After')
def check_offensive_keywords(text):
    if pd.isna(text) or not isinstance(text, str):
        return {
            'is_offensive': False,
            'offensive_words': [],
            'offense_count': 0,
            'offense_score': 0,
            'found_abbreviations': []
        }

    text_lower = text.lower()
    found_words = []
    found_abbreviations = []

    # Whole-word for offensive keywords
    for word in OFFENSIVE_KEYWORDS:
        pattern = r"\b" + re.escape(word) + r"\b"
        if re.search(pattern, text_lower):
            found_words.append(word)

    # Whole-word for abbreviations + sentence extraction
    sentences = re.split(r"[.!?]+", text)
    for abbrev, full_form in OFFENSIVE_ABBREVIATIONS.items():
        abbr_pat = r"\b" + re.escape(abbrev.lower()) + r"\b"
        if re.search(abbr_pat, text_lower):
            containing_sentence = ""
            for sentence in sentences:
                if re.search(abbr_pat, sentence.lower()):
                    containing_sentence = sentence.strip()
                    break
            if containing_sentence:
                found_abbreviations.append(f"{abbrev} ({full_form}) in: '{containing_sentence}'")
            else:
                found_abbreviations.append(f"{abbrev} ({full_form})")
            found_words.append(abbrev)

    offense_score = 0
    for word in found_words:
        if word in ['nigger', 'nigga', 'chink', 'kike', 'spic', 'wetback', 'towelhead', 'gook', 'jap', 'slant', 'yellow', 'redskin', 'savage', 'coon', 'jungle bunny', 'porch monkey', 'tar baby', 'mammy', 'house nigger', 'field nigger', 'oreo', 'coconut', 'banana', 'beaner', 'greaser', 'taco', 'burrito', 'sand nigger', 'camel jockey', 'raghead', 'haji', 'slant eye', 'rice eater', 'dog eater', 'heeb', 'yid', 'christ killer', 'jew boy', 'jew girl', 'polack', 'dago', 'wop', 'guinea', 'mick', 'paddy', 'taig', 'gypsy', 'gyp', 'pikey', 'tinker', 'traveller']:
            offense_score += 50
        elif word in ['fuck', 'shit', 'damn', 'bitch', 'asshole', 'bastard', 'cunt']:
            offense_score += 30
        elif word in ['kill', 'murder', 'death', 'suicide', 'bomb', 'explode']:
            offense_score += 40
        elif word in ['hate', 'hater', 'racist', 'sexist', 'homophobic']:
            offense_score += 25
        elif word in ['wtf', 'stfu', 'gtfo', 'fml', 'omfg']:
            offense_score += 20
        else:
            offense_score += 10

    return {
        'is_offensive': len(found_words) > 0,
        'offensive_words': found_words,
        'offense_count': len(found_words),
        'offense_score': offense_score,
        'found_abbreviations': found_abbreviations
    }

TOXIC_RULE_THRESHOLD = 0.30
SPAM_HIGH_THRESHOLD = 60

def classify_content_with_explanation(row):
    text = row['comment_text']
    toxicity = row['toxicity']
    overall_toxicity = row.get('overall_toxicity', 0.0)
    spam_score = row.get('spam_score', 0)

    rule_result = apply_rules(text)

    if spam_score >= SPAM_HIGH_THRESHOLD or rule_result['spam_details']['spam_score'] >= SPAM_HIGH_THRESHOLD:
        classification = 'spam'
        confidence = 0.9
        explanation = 'Spam indicators exceeded threshold.'
        risk_factors = [
            f"Spam score: {spam_score}",
            f"Patterns: {', '.join(rule_result['spam_details']['spam_patterns'][:3])}" if rule_result['spam_details']['spam_patterns'] else 'No patterns listed'
        ]
        return {
            'classification': classification,
            'confidence': confidence,
            'explanation': explanation,
            'toxicity_score': toxicity,
            'overall_toxicity': overall_toxicity,
            'spam_score': spam_score,
            'rule_based_score': rule_result['risk_score_modifier'],
            'risk_factors': risk_factors,
            'rule_details': rule_result
        }

    offensive_hit = bool(rule_result['offensive_details']['offensive_words'] or rule_result['offensive_details']['found_abbreviations'])
    if (toxicity >= TOXIC_RULE_THRESHOLD) or (overall_toxicity >= TOXIC_RULE_THRESHOLD) or offensive_hit:
        classification = 'toxic'
        confidence = 0.8 if offensive_hit else 0.75
        bits = []
        if offensive_hit:
            if rule_result['offensive_details']['offensive_words']:
                bits.append(f"Offensive words: {', '.join(rule_result['offensive_details']['offensive_words'][:3])}")
            if rule_result['offensive_details']['found_abbreviations']:
                bits.append(f"Abbreviations: {', '.join(rule_result['offensive_details']['found_abbreviations'][:2])}")
        explanation = 'Toxic due to threshold or offensive content.' + (" " + " | ".join(bits) if bits else '')
        risk_factors = [
            f"Toxicity: {toxicity:.3f}",
            f"Overall: {overall_toxicity:.3f}",
            f"Offense score: {rule_result['offensive_details']['offense_score']}"
        ]
        return {
            'classification': classification,
            'confidence': confidence,
            'explanation': explanation,
            'toxicity_score': toxicity,
            'overall_toxicity': overall_toxicity,
            'spam_score': spam_score,
            'rule_based_score': rule_result['risk_score_modifier'],
            'risk_factors': risk_factors,
            'rule_details': rule_result
        }

    classification = 'safe'
    confidence = 0.85
    explanation = 'No spam indicators and toxicity below threshold with no offensive content.'
    risk_factors = [f"Toxicity: {toxicity:.3f}", f"Spam score: {spam_score}"]
    return {
        'classification': classification,
        'confidence': confidence,
        'explanation': explanation,
        'toxicity_score': toxicity,
        'overall_toxicity': overall_toxicity,
        'spam_score': spam_score,
        'rule_based_score': rule_result['risk_score_modifier'],
        'risk_factors': risk_factors,
        'rule_details': rule_result
    }


: 

## 6. Summary and Conclusions

### 🎯 **Enhanced Rule-Based Content Moderation System Summary**

This notebook has successfully implemented a comprehensive rule-based content moderation system that integrates with the processed data from the text preprocessing phase, with significant enhancements for detailed detection and explanations.

### **🔧 Key Components Implemented:**

1. **Advanced Toxicity Detection**
   - Multi-layered toxicity analysis using pre-computed scores
   - Integration with overall_toxicity composite scores
   - Rule-based pattern recognition for offensive content
   - **NEW**: 50+ racism-related terms and ethnic slurs
   - **NEW**: Offensive abbreviations dictionary (wtf, stfu, gtfo, fml, omfg, etc.)

2. **Comprehensive Spam Detection**
   - URL, email, and phone number pattern recognition
   - Promotional keyword detection
   - Character pattern analysis (caps, repetition, punctuation)
   - **NEW**: Specific URL extraction and display
   - **NEW**: Specific email address extraction and display
   - **NEW**: Specific phone number extraction and display

3. **Intelligent Classification Logic**
   - Priority-based classification (spam > toxicity > safe)
   - Confidence scoring for each decision
   - **ENHANCED**: Detailed explanations with specific findings
   - **NEW**: Shows exact offensive words found
   - **NEW**: Shows abbreviations with full meanings
   - **NEW**: Shows specific URLs, emails, and phone numbers

4. **Integration with Processed Data**
   - Uses toxicity, overall_toxicity, and spam_score from preprocessing
   - Combines rule-based logic with pre-computed features
   - Leverages text characteristics for enhanced detection

### **📊 Classification Thresholds:**

- **Spam**: spam_score ≥ 60 (high) or ≥ 30 (moderate)
- **Toxic**: toxicity ≥ 0.7 (high), ≥ 0.4 (moderate), or ≥ 0.2 (low with patterns)
- **Safe**: All other cases with low risk factors

### **💡 Enhanced Decision Explanations:**

The system now provides comprehensive explanations for each classification:
- **What** was detected (spam patterns, offensive words, suspicious behavior)
- **Why** it was classified (specific thresholds and risk factors)
- **How confident** the system is in the decision
- **Which rules** were triggered and their impact
- **NEW**: **Specific offensive words found** (e.g., "Offensive words: fuck, shit, nigger")
- **NEW**: **Abbreviations with meanings** (e.g., "Offensive abbreviations: wtf (what the fuck), stfu (shut the fuck up)")
- **NEW**: **Specific URLs found** (e.g., "URLs found: http://example.com, www.spam.com")
- **NEW**: **Specific emails found** (e.g., "Emails found: spam@example.com")
- **NEW**: **Specific phone numbers found** (e.g., "Phone numbers found: 555-123-4567")

### **🆕 Recent Enhancements:**

1. **Expanded Racism Detection**
   - Added 50+ racism-related terms covering all major ethnic groups
   - Includes African American, Asian, Hispanic, Jewish, European, and other ethnic slurs
   - Examples: nigger, chink, kike, spic, wetback, gook, jap, slant, yellow, redskin, savage, coon, jungle bunny, porch monkey, tar baby, mammy, house nigger, field nigger, oreo, coconut, banana, beaner, greaser, taco, burrito, sand nigger, camel jockey, raghead, haji, slant eye, rice eater, dog eater, heeb, yid, christ killer, jew boy, jew girl, polack, dago, wop, guinea, mick, paddy, taig, gypsy, gyp, pikey, tinker, traveller

2. **Enhanced Offensive Abbreviations Dictionary**
   - Internet slang detection with full form explanations and sentence context
   - Shows the complete sentence where abbreviations were found
   - Examples: wtf (what the fuck), stfu (shut the fuck up), gtfo (get the fuck out), fml (fuck my life), omfg (oh my fucking god), lmao (laughing my ass off), rofl (rolling on floor laughing), af (as fuck)
   - **NEW**: Sentence context display (e.g., "wtf (what the fuck) in: 'WTF is wrong with you'")

3. **Detailed Spam Pattern Extraction**
   - Shows specific URLs found in content
   - Shows specific email addresses found
   - Shows specific phone numbers found
   - **NEW**: Shows specific promotional keywords found
   - **NEW**: Shows specific gambling keywords found
   - Provides exact links, emails, phone numbers, and keywords in explanations

4. **Enhanced Classification Logic**
   - **NEW**: Always shows offensive content regardless of toxicity level
   - **NEW**: Detects offensive words and abbreviations even with low toxicity scores
   - **NEW**: Added "Offensive Content Detected" classification case
   - **NEW**: Improved explanation accuracy for low-toxicity offensive content

5. **Comprehensive Test Cases**
   - **NEW**: 8 dedicated abbreviation test cases with realistic scenarios
   - **NEW**: Tests all offensive abbreviations with sentence context
   - **NEW**: Validates detection of offensive words like "stupid", "bullshit", "terrible"
   - **NEW**: Ensures all offensive content is properly displayed in explanations

### **🚀 Benefits:**

1. **Transparency**: Every decision is explainable with specific reasoning and exact findings
2. **Flexibility**: Easy to adjust thresholds and add new rules
3. **Integration**: Works seamlessly with preprocessed data
4. **Performance**: Fast rule-based processing with detailed analysis
5. **Reliability**: Consistent results with confidence scoring
6. **NEW**: **Comprehensive Coverage**: Detects more types of offensive content including racism and abbreviations
7. **NEW**: **Detailed Explanations**: Shows exactly what was found (words, URLs, emails, phone numbers)
8. **NEW**: **Educational Value**: Shows abbreviations with their full meanings and sentence context
9. **NEW**: **Complete Detection**: Always shows offensive content regardless of toxicity level
10. **NEW**: **Contextual Information**: Shows the full sentence where abbreviations were found
11. **NEW**: **Specific Keywords**: Shows exact promotional and gambling keywords detected
12. **NEW**: **Comprehensive Testing**: Validates all features with dedicated test cases

### **📈 Performance Characteristics:**

- **High Confidence**: Most decisions have confidence ≥ 0.8
- **Comprehensive Coverage**: Handles toxic, spam, and safe content
- **Detailed Analysis**: Provides specific risk factors and explanations
- **Scalable**: Can process large datasets efficiently
- **NEW**: **Enhanced Detection**: 86+ offensive keywords + 8 offensive abbreviations
- **NEW**: **Specific Findings**: Shows exact URLs, emails, phone numbers, and offensive words found
- **NEW**: **Contextual Detection**: Shows sentence context for abbreviations
- **NEW**: **Complete Coverage**: Detects offensive content regardless of toxicity level
- **NEW**: **Keyword Specificity**: Shows exact promotional and gambling keywords found
- **NEW**: **Comprehensive Testing**: 20+ test cases covering all scenarios

### **🎯 Example Enhanced Explanations:**

**Spam Detection Example:**
```
Classified as 'spam' because: High spam score detected. Risk factors: Spam score: 60; Spam patterns: URL detected, Promotional keywords (1), Gambling keywords (2); URLs found: http://example.com, www.spam.com; Emails found: spam@example.com; Phone numbers found: 555-123-4567; Promotional keywords found: click here; Gambling keywords found: win, winner.
```

**Offensive Content Detection Example:**
```
Classified as 'toxic' because: Low toxicity but offensive content or suspicious patterns detected. Risk factors: Toxicity score: 0.300; Offensive words: stupid; Offensive abbreviations: wtf (what the fuck) in: 'WTF is wrong with you'.
```

**Complete Detection Example:**
```
Classified as 'toxic' because: Offensive content detected. Risk factors: Toxicity score: 0.400; Offensive words: fuck, shit, stupid; Offensive abbreviations: wtf (what the fuck) in: 'WTF is wrong with you', stfu (shut the fuck up) in: 'STFU and listen'.
```

This enhanced rule-based system serves as a robust foundation for content moderation, providing clear, explainable decisions with specific details about what was found, making it easier to understand and review moderation decisions.
