# 04. Risk Assessment Engine

This notebook implements a comprehensive risk assessment engine that combines the trained Naive Bayes model with the rule-based filter for robust content moderation.

## 🎯 Features:
1. **Hybrid Classification**: Combines rule-based filter (higher weight) with Naive Bayes model (lower weight) for toxic/safe classification
2. **Spam Detection**: Uses rule-based filter only for spam detection
3. **Weighted Decision Making**: Rule-based filter gets 70% weight, Naive Bayes model gets 30% weight for toxic/safe decisions
4. **Detailed Explanations**: Explains the reasoning behind each decision with specific evidence
5. **Comprehensive Risk Assessment**: Provides confidence levels and risk factors

## 📊 Integration:
- **Naive Bayes Model**: Uses `naive_bayes_model.pkl`, `naive_bayes_vectorizer.pkl`, and `naive_bayes_label_encoder.pkl`
- **Rule-Based Filter**: Uses functions from `02_Rule_Based_Filter.ipynb`
- **Weighted Classification**: 70% rule-based + 30% ML model for toxic/safe
- **Spam Detection**: 100% rule-based filter for spam classification


In [35]:
import pandas as pd
import numpy as np
import joblib
import re
import string
from collections import Counter
import warnings
warnings.filterwarnings('ignore')

print("📚 Libraries imported successfully!")


📚 Libraries imported successfully!


## 1. Load Naive Bayes Model and Rule-Based Functions


In [36]:
# Load Naive Bayes model and components
print("🔄 Loading Naive Bayes model and components...")
try:
    naive_bayes_model = joblib.load('naive_bayes_model.pkl')
    vectorizer = joblib.load('naive_bayes_vectorizer.pkl')
    label_encoder = joblib.load('naive_bayes_label_encoder.pkl')
    numerical_features = joblib.load('naive_bayes_numerical_features.pkl')
    toxicity_threshold = joblib.load('naive_bayes_toxicity_threshold.pkl')
    print("✅ Naive Bayes model and components loaded successfully!")
    print(f"   - Model type: {type(naive_bayes_model)}")
    print(f"   - Model classes: {naive_bayes_model.classes_}")
    print(f"   - Vectorizer vocabulary size: {len(vectorizer.vocabulary_)}")
    print(f"   - Label encoder classes: {label_encoder.classes_}")
    print(f"   - Numerical features: {len(numerical_features)}")
    print(f"   - Toxicity threshold: {toxicity_threshold}")
except FileNotFoundError as e:
    print(f"❌ Error loading models: {e}")
    print("Please ensure these files exist:")
    print("   - naive_bayes_model.pkl")
    print("   - naive_bayes_vectorizer.pkl") 
    print("   - naive_bayes_label_encoder.pkl")
    print("   - naive_bayes_numerical_features.pkl")
    print("   - naive_bayes_toxicity_threshold.pkl")
    print("\n💡 Run the model saving cell in 03_Machine_Learning_Classifier.ipynb first!")
    raise

# Import rule-based functions from the second notebook
# (In a real deployment, these would be imported from a module)
print("\nLoading rule-based functions...")

# Copy the rule-based functions here for standalone use
def check_offensive_keywords(text):
    """Check if text contains offensive keywords."""
    if pd.isna(text) or not isinstance(text, str):
        return {'is_offensive': False, 'offensive_words': [], 'offense_count': 0, 'offense_score': 0}
    
    # Simplified offensive keywords list
    offensive_keywords = [
        'fuck', 'shit', 'damn', 'bitch', 'asshole', 'bastard', 'cunt', 'piss',
        'crap', 'hell', 'dick', 'pussy', 'cock', 'whore', 'slut', 'fag',
        'nigger', 'nigga', 'chink', 'kike', 'spic', 'wetback', 'towelhead',
        'retard', 'retarded', 'moron', 'idiot', 'stupid', 'dumb', 'fool',
        'kill', 'murder', 'death', 'die', 'suicide', 'bomb', 'explode',
        'hate', 'hater', 'racist', 'sexist', 'homophobic', 'transphobic'
    ]
    
    text_lower = text.lower()
    found_words = []
    
    for word in offensive_keywords:
        if word in text_lower:
            found_words.append(word)
    
    offense_score = len(found_words) * 10  # Simple scoring
    
    return {
        'is_offensive': len(found_words) > 0,
        'offensive_words': found_words,
        'offense_count': len(found_words),
        'offense_score': offense_score
    }

def detect_spam_patterns(text):
    """Detect spam patterns in text."""
    if pd.isna(text) or not isinstance(text, str):
        return {'is_spam_pattern': False, 'spam_patterns': [], 'spam_score': 0}
    
    text_lower = text.lower()
    spam_patterns = []
    spam_score = 0
    
    # URL patterns
    if re.search(r'http[s]?://', text_lower) or re.search(r'www\\.', text_lower):
        spam_patterns.append('URL detected')
        spam_score += 30
    
    # Email patterns
    if re.search(r'\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Z|a-z]{2,}\\b', text):
        spam_patterns.append('Email address')
        spam_score += 20
    
    # Phone patterns
    if re.search(r'\\b\\d{3}[-.]?\\d{3}[-.]?\\d{4}\\b', text):
        spam_patterns.append('Phone number')
        spam_score += 25
    
    # Spam keywords
    spam_keywords = [
        'buy now', 'click here', 'free money', 'make money', 'earn money',
        'work from home', 'get rich', 'quick cash', 'easy money',
        'guaranteed', 'no risk', 'limited time', 'act now',
        'special offer', 'discount', 'sale', 'promotion', 'deal',
        'viagra', 'cialis', 'pharmacy', 'medication', 'prescription',
        'casino', 'gambling', 'bet', 'poker', 'slots', 'lottery'
    ]
    
    keyword_count = sum(1 for keyword in spam_keywords if keyword in text_lower)
    if keyword_count > 0:
        spam_patterns.append(f'Spam keywords ({keyword_count})')
        spam_score += keyword_count * 10
    
    return {
        'is_spam_pattern': len(spam_patterns) > 0,
        'spam_patterns': spam_patterns,
        'spam_score': spam_score
    }

def check_character_patterns(text):
    """Check for suspicious character patterns."""
    if pd.isna(text) or not isinstance(text, str):
        return {'is_suspicious_pattern': False, 'suspicious_patterns': [], 'pattern_score': 0}
    
    suspicious_patterns = []
    pattern_score = 0
    
    # Excessive capitalization
    if len(text) > 0:
        caps_ratio = sum(1 for c in text if c.isupper()) / len(text)
        if caps_ratio > 0.7:
            suspicious_patterns.append('Excessive capitalization')
            pattern_score += 25
    
    # Repeated characters
    if re.search(r'(.)\\1{2,}', text):
        suspicious_patterns.append('Repeated characters')
        pattern_score += 20
    
    # Excessive punctuation
    if len(text) > 0:
        punct_ratio = sum(1 for c in text if c in string.punctuation) / len(text)
        if punct_ratio > 0.3:
            suspicious_patterns.append('Excessive punctuation')
            pattern_score += 20
    
    return {
        'is_suspicious_pattern': len(suspicious_patterns) > 0,
        'suspicious_patterns': suspicious_patterns,
        'pattern_score': pattern_score
    }

print("✓ Rule-based functions loaded!")


🔄 Loading Naive Bayes model and components...
✅ Naive Bayes model and components loaded successfully!
   - Model type: <class 'sklearn.naive_bayes.MultinomialNB'>
   - Model classes: [0 1]
   - Vectorizer vocabulary size: 10000
   - Label encoder classes: ['safe' 'toxic']
   - Numerical features: 7
   - Toxicity threshold: 0.25

Loading rule-based functions...
✓ Rule-based functions loaded!


## 2. Rule-Based Filter Functions


In [37]:
# Comprehensive offensive keywords list (from rule-based filter)
OFFENSIVE_KEYWORDS = [
    # Profanity and slurs
    'fuck', 'shit', 'damn', 'bitch', 'asshole', 'bastard', 'cunt', 'piss',
    'crap', 'hell', 'dick', 'pussy', 'cock', 'whore', 'slut', 'fag',
    'nigger', 'nigga', 'chink', 'kike', 'spic', 'wetback', 'towelhead',
    'retard', 'retarded', 'moron', 'idiot', 'stupid', 'dumb', 'fool',
    'bullshit', 'crap', 'sucks', 'terrible', 'awful', 'horrible',
    
    # Extended racism and ethnic slurs
    'gook', 'jap', 'slant', 'yellow', 'redskin', 'savage',
    'coon', 'jungle bunny', 'porch monkey', 'tar baby', 'mammy',
    'house nigger', 'field nigger', 'oreo', 'coconut', 'banana',
    'beaner', 'greaser', 'taco', 'burrito',
    'sand nigger', 'camel jockey', 'raghead', 'haji',
    'slant eye', 'rice eater', 'dog eater',
    'heeb', 'yid', 'christ killer', 'jew boy', 'jew girl',
    'polack', 'dago', 'wop', 'guinea', 'mick', 'paddy', 'taig',
    'gypsy', 'gyp', 'pikey', 'tinker', 'traveller',
    
    # Violence and threats
    'kill', 'murder', 'death', 'die', 'suicide', 'bomb', 'explode',
    'shoot', 'gun', 'weapon', 'knife', 'stab', 'beat', 'hit', 'punch',
    'threat', 'threaten', 'harm', 'hurt', 'destroy', 'annihilate',
    
    # Hate speech
    'hate', 'hater', 'racist', 'sexist', 'homophobic', 'transphobic',
    'nazi', 'hitler', 'white power', 'black power', 'supremacist',
    'genocide', 'ethnic cleansing', 'apartheid', 'segregation',
    
    # Sexual content
    'porn', 'pornography', 'sex', 'sexual', 'nude', 'naked',
    'breast', 'boob', 'tit', 'vagina', 'penis', 'dick', 'pussy',
    'rape', 'raping', 'molest', 'pedophile', 'pedo', 'incest',
    
    # Drugs and alcohol
    'cocaine', 'heroin', 'marijuana', 'weed', 'cannabis', 'crack',
    'meth', 'methamphetamine', 'ecstasy', 'lsd', 'acid', 'mushroom',
    'alcohol', 'drunk', 'drinking', 'beer', 'wine', 'vodka',
    
    # Spam and scams
    'viagra', 'cialis', 'pharmacy', 'medication', 'prescription',
    'casino', 'gambling', 'bet', 'poker', 'slots', 'lottery',
    'free money', 'make money', 'earn money', 'quick cash',
    'work from home', 'get rich', 'guaranteed', 'no risk',
    'click here', 'buy now', 'special offer', 'limited time'
]

# Offensive abbreviations and internet slang
OFFENSIVE_ABBREVIATIONS = {
    'wtf': 'what the fuck',
    'stfu': 'shut the fuck up',
    'gtfo': 'get the fuck out',
    'fml': 'fuck my life',
    'omfg': 'oh my fucking god',
    'lmao': 'laughing my ass off',
    'rofl': 'rolling on floor laughing',
    'af': 'as fuck',
    'btw': 'by the way',
    'fyi': 'for your information',
    'tbh': 'to be honest',
    'imo': 'in my opinion',
    'imho': 'in my humble opinion',
    'smh': 'shaking my head'
}

print("✅ Offensive keywords and abbreviations loaded!")


✅ Offensive keywords and abbreviations loaded!


In [None]:
def check_offensive_keywords_fixed(text):
    """
    FIXED VERSION: Check if text contains any offensive keywords and abbreviations.
    
    Args:
        text (str): Input text to check
        
    Returns:
        dict: Contains 'is_offensive', 'offensive_words', 'offense_count', and 'offense_score'
    """
    if pd.isna(text) or not isinstance(text, str):
        return {
            'is_offensive': False,
            'offensive_words': [],
            'offense_count': 0,
            'offense_score': 0,
            'found_abbreviations': []
        }
    
    text_lower = text.lower()
    found_words = []
    found_abbreviations = []
    
    # Check for exact matches in offensive keywords (word boundaries only) - FIXED PATTERNS
    for word in OFFENSIVE_KEYWORDS:
        pattern = r'\\b' + re.escape(word) + r'\\b'
        if re.search(pattern, text_lower):
            found_words.append(word)
    
    # Check for offensive abbreviations with sentence context
    sentences = re.split(r'[.!?]+', text)
    for abbrev, full_form in OFFENSIVE_ABBREVIATIONS.items():
        if abbrev in text_lower:
            # Find the sentence containing the abbreviation
            containing_sentence = ""
            for sentence in sentences:
                if abbrev.lower() in sentence.lower():
                    containing_sentence = sentence.strip()
                    break
            
            if containing_sentence:
                found_abbreviations.append(f"{abbrev} ({full_form}) in: '{containing_sentence}'")
            else:
                found_abbreviations.append(f"{abbrev} ({full_form})")
            found_words.append(abbrev)
    
    # Calculate offense score based on severity
    offense_score = 0
    for word in found_words:
        if word in ['nigger', 'nigga', 'chink', 'kike', 'spic', 'wetback', 'towelhead', 'gook', 'jap', 'slant', 'yellow', 'redskin', 'savage', 'coon', 'jungle bunny', 'porch monkey', 'tar baby', 'mammy', 'house nigger', 'field nigger', 'oreo', 'coconut', 'banana', 'beaner', 'greaser', 'taco', 'burrito', 'sand nigger', 'camel jockey', 'raghead', 'haji', 'slant eye', 'rice eater', 'dog eater', 'heeb', 'yid', 'christ killer', 'jew boy', 'jew girl', 'polack', 'dago', 'wop', 'guinea', 'mick', 'paddy', 'taig', 'gypsy', 'gyp', 'pikey', 'tinker', 'traveller']:
            offense_score += 50  # High severity racial slurs
        elif word in ['fuck', 'shit', 'damn', 'bitch', 'asshole', 'bastard', 'cunt']:
            offense_score += 30  # Medium severity profanity
        elif word in ['kill', 'murder', 'death', 'suicide', 'bomb', 'explode']:
            offense_score += 40  # High severity violence
        elif word in ['hate', 'hater', 'racist', 'sexist', 'homophobic']:
            offense_score += 25  # Medium severity hate speech
        elif word in ['wtf', 'stfu', 'gtfo', 'fml', 'omfg']:
            offense_score += 20  # Offensive abbreviations
        else:
            offense_score += 10  # Low severity
    
    return {
        'is_offensive': len(found_words) > 0,
        'offensive_words': found_words,
        'offense_count': len(found_words),
        'offense_score': offense_score,
        'found_abbreviations': found_abbreviations
    }

print("✅ FIXED Offensive keyword detection function created!")


In [38]:
def check_offensive_keywords(text):
    """
    Check if text contains any offensive keywords and abbreviations.
    
    Args:
        text (str): Input text to check
        
    Returns:
        dict: Contains 'is_offensive', 'offensive_words', 'offense_count', and 'offense_score'
    """
    if pd.isna(text) or not isinstance(text, str):
        return {
            'is_offensive': False,
            'offensive_words': [],
            'offense_count': 0,
            'offense_score': 0,
            'found_abbreviations': []
        }
    
    text_lower = text.lower()
    found_words = []
    found_abbreviations = []
    
    # Check for exact matches in offensive keywords (word boundaries only)
    for word in OFFENSIVE_KEYWORDS:
        pattern = r'\\b' + re.escape(word) + r'\\b'
        if re.search(pattern, text_lower):
            found_words.append(word)
    
    # Check for offensive abbreviations with sentence context
    sentences = re.split(r'[.!?]+', text)
    for abbrev, full_form in OFFENSIVE_ABBREVIATIONS.items():
        if abbrev in text_lower:
            # Find the sentence containing the abbreviation
            containing_sentence = ""
            for sentence in sentences:
                if abbrev.lower() in sentence.lower():
                    containing_sentence = sentence.strip()
                    break
            
            if containing_sentence:
                found_abbreviations.append(f"{abbrev} ({full_form}) in: '{containing_sentence}'")
            else:
                found_abbreviations.append(f"{abbrev} ({full_form})")
            found_words.append(abbrev)
    
    # Calculate offense score based on severity
    offense_score = 0
    for word in found_words:
        if word in ['nigger', 'nigga', 'chink', 'kike', 'spic', 'wetback', 'towelhead', 'gook', 'jap', 'slant', 'yellow', 'redskin', 'savage', 'coon', 'jungle bunny', 'porch monkey', 'tar baby', 'mammy', 'house nigger', 'field nigger', 'oreo', 'coconut', 'banana', 'beaner', 'greaser', 'taco', 'burrito', 'sand nigger', 'camel jockey', 'raghead', 'haji', 'slant eye', 'rice eater', 'dog eater', 'heeb', 'yid', 'christ killer', 'jew boy', 'jew girl', 'polack', 'dago', 'wop', 'guinea', 'mick', 'paddy', 'taig', 'gypsy', 'gyp', 'pikey', 'tinker', 'traveller']:
            offense_score += 50  # High severity racial slurs
        elif word in ['fuck', 'shit', 'damn', 'bitch', 'asshole', 'bastard', 'cunt']:
            offense_score += 30  # Medium severity profanity
        elif word in ['kill', 'murder', 'death', 'suicide', 'bomb', 'explode']:
            offense_score += 40  # High severity violence
        elif word in ['hate', 'hater', 'racist', 'sexist', 'homophobic']:
            offense_score += 25  # Medium severity hate speech
        elif word in ['wtf', 'stfu', 'gtfo', 'fml', 'omfg']:
            offense_score += 20  # Offensive abbreviations
        else:
            offense_score += 10  # Low severity
    
    return {
        'is_offensive': len(found_words) > 0,
        'offensive_words': found_words,
        'offense_count': len(found_words),
        'offense_score': offense_score,
        'found_abbreviations': found_abbreviations
    }

print("✅ Offensive keyword detection function created!")


✅ Offensive keyword detection function created!


In [39]:
def detect_spam_patterns(text):
    """
    Detect spam patterns in text using regex patterns.
    
    Args:
        text (str): Input text to check
        
    Returns:
        dict: Contains spam detection results and triggered patterns
    """
    if pd.isna(text) or not isinstance(text, str):
        return {
            'is_spam_pattern': False,
            'spam_patterns': [],
            'spam_score': 0,
            'details': {},
            'found_urls': [],
            'found_emails': [],
            'found_phones': []
        }
    
    text_lower = text.lower()
    spam_patterns = []
    spam_score = 0
    details = {}
    found_urls = []
    found_emails = []
    found_phones = []
    
    # URL patterns with detailed extraction
    url_patterns = [
        (r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', 'URL detected', 30),
        (r'www\\.[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}', 'WWW URL detected', 25),
        (r'[a-zA-Z0-9.-]+\\.(com|org|net|edu|gov|mil|int|co|uk|de|fr|jp|au|us|ca|mx|br|es|it|ru|cn|in|kr|nl|se|no|dk|fi|pl|tr|za|th|my|sg|hk|tw|nz|ph|id|vn)', 'Domain detected', 20)
    ]
    
    for pattern, description, score in url_patterns:
        matches = re.findall(pattern, text_lower)
        if matches:
            spam_patterns.append(description)
            spam_score += score
            details[description] = True
            found_urls.extend(matches)
    
    # Email patterns with extraction
    email_pattern = r'\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Z|a-z]{2,}\\b'
    email_matches = re.findall(email_pattern, text)
    if email_matches:
        spam_patterns.append('Email address')
        spam_score += 20
        details['Email address'] = True
        found_emails.extend(email_matches)
    
    # Phone number patterns with extraction
    phone_patterns = [
        (r'\\b\\d{3}[-.]?\\d{3}[-.]?\\d{4}\\b', 'Phone number (US format)', 25),
        (r'\\b\\d{3}\\s?\\d{3}\\s?\\d{4}\\b', 'Phone number (spaced format)', 25),
        (r'\\b\\d{3}-\\d{3}-\\d{4}\\b', 'Phone number (dash format)', 25),
        (r'\\b\\d{10}\\b', 'Phone number (10 digits)', 20),
        (r'\\+\\d{1,3}\\s?\\d{1,14}', 'International phone number', 30)
    ]
    
    for pattern, description, score in phone_patterns:
        matches = re.findall(pattern, text)
        if matches:
            spam_patterns.append(description)
            spam_score += score
            details[description] = True
            found_phones.extend(matches)
    
    # Currency and money patterns
    currency_patterns = [
        (r'[\\$€£¥₹]\\s*\\d+', 'Currency symbol with number', 15),
        (r'\\d+\\s*[\\$€£¥₹]', 'Number with currency symbol', 15),
        (r'\\d{1,3}(,\\d{3})*\\s*[\\$€£¥₹]', 'Formatted currency', 20),
        (r'\\$\\d+', 'Dollar amount', 10)
    ]
    
    for pattern, description, score in currency_patterns:
        if re.search(pattern, text):
            spam_patterns.append(description)
            spam_score += score
            details[description] = True
    
    # Promotional keywords with specific extraction
    promo_keywords = [
        'buy now', 'click here', 'free money', 'make money', 'earn money',
        'work from home', 'get rich', 'quick cash', 'easy money',
        'guaranteed', 'no risk', 'limited time', 'act now', 'dont wait',
        'special offer', 'discount', 'sale', 'promotion', 'deal',
        'win', 'winner', 'prize', 'lottery', 'jackpot',
        'free consultation', 'call now', 'contact us', 'get started'
    ]
    
    found_promo_keywords = []
    for keyword in promo_keywords:
        if keyword in text_lower:
            found_promo_keywords.append(keyword)
            spam_score += 10
    
    if found_promo_keywords:
        spam_patterns.append(f'Promotional keywords ({len(found_promo_keywords)})')
        details['Promotional keywords'] = found_promo_keywords
    
    # Medical/pharmaceutical keywords
    medical_keywords = [
        'viagra', 'cialis', 'pharmacy', 'medication', 'prescription',
        'drug', 'pill', 'tablet', 'capsule', 'dosage'
    ]
    
    medical_count = 0
    for keyword in medical_keywords:
        if keyword in text_lower:
            medical_count += 1
            spam_score += 15
    
    if medical_count > 0:
        spam_patterns.append(f'Medical keywords ({medical_count})')
        details['Medical keywords'] = medical_count
    
    # Gambling keywords with specific extraction
    gambling_keywords = [
        'casino', 'gambling', 'bet', 'poker', 'slots', 'lottery',
        'jackpot', 'win', 'winner', 'prize', 'money back'
    ]
    
    found_gambling_keywords = []
    for keyword in gambling_keywords:
        if keyword in text_lower:
            found_gambling_keywords.append(keyword)
            spam_score += 12
    
    if found_gambling_keywords:
        spam_patterns.append(f'Gambling keywords ({len(found_gambling_keywords)})')
        details['Gambling keywords'] = found_gambling_keywords
    
    return {
        'is_spam_pattern': len(spam_patterns) > 0,
        'spam_patterns': spam_patterns,
        'spam_score': spam_score,
        'details': details,
        'found_urls': found_urls,
        'found_emails': found_emails,
        'found_phones': found_phones
    }

print("✅ Spam pattern detection function created!")


✅ Spam pattern detection function created!


## 3. Risk Assessment Engine - Hybrid Classification


In [None]:
def risk_assessment_engine(text):
    """
    Comprehensive risk assessment engine that combines Naive Bayes model with rule-based filter.
    
    Args:
        text (str): Input text to analyze
        
    Returns:
        dict: Comprehensive risk assessment with detailed explanations
    """
    if pd.isna(text) or not isinstance(text, str):
        return {
            'classification': 'safe',
            'confidence': 0.0,
            'explanation': 'Invalid input - no analysis performed',
            'risk_factors': [],
            'ml_prediction': None,
            'rule_based_analysis': None,
            'weighted_score': 0.0
        }
    
    # Step 1: Get ML model prediction (30% weight)
    try:
        # Preprocess the text (same as training)
        processed_text = text.lower().strip()
        
        # Create TF-IDF features
        text_tfidf = vectorizer.transform([processed_text])
        
        # Create numerical features (simplified for deployment)
        # In real deployment, you'd calculate these properly from the text
        numerical_features_test = np.zeros((1, len(numerical_features)))
        
        # Combine features (same as training)
        from scipy.sparse import hstack
        text_vector = hstack([text_tfidf, numerical_features_test])
        
        # Make prediction with Naive Bayes model
        ml_prediction = naive_bayes_model.predict(text_vector)[0]
        ml_probabilities = naive_bayes_model.predict_proba(text_vector)[0]
        ml_confidence = max(ml_probabilities)
        ml_classification = label_encoder.inverse_transform([ml_prediction])[0].upper()
        
        # Convert to numeric score (0-1, where 1 is more toxic)
        ml_toxic_score = ml_probabilities[1] if len(ml_probabilities) > 1 else 0.0
        
    except Exception as e:
        print(f"⚠️ ML model error: {e}")
        ml_toxic_score = 0.0
        ml_confidence = 0.0
        ml_classification = "SAFE"
    
    # Step 2: Get rule-based analysis (70% weight)
    offensive_result = check_offensive_keywords_fixed(text)
    spam_result = detect_spam_patterns(text)
    
    # Calculate rule-based toxic score (0-1)
    rule_toxic_score = 0.0
    if offensive_result['is_offensive']:
        # Convert offense score to 0-1 scale (max offense score is around 200)
        # Make it more sensitive to offensive content
        rule_toxic_score = min(1.0, offensive_result['offense_score'] / 100.0)
    
    # Step 3: Spam detection (100% rule-based)
    is_spam = spam_result['is_spam_pattern'] and spam_result['spam_score'] >= 20
    
    # Step 4: Weighted combination for toxic/safe (30% ML + 70% rule-based)
    weighted_toxic_score = (0.3 * ml_toxic_score) + (0.7 * rule_toxic_score)
    
    # Special case: If offensive words are detected, boost the score
    if offensive_result['is_offensive'] and offensive_result['offensive_words']:
        # Boost the score by at least 0.2 if offensive words are found
        weighted_toxic_score = max(weighted_toxic_score, 0.2)
    
    # Step 5: Final classification logic
    if is_spam:
        classification = "spam"
        confidence = min(0.95, 0.6 + (spam_result['spam_score'] / 200.0))
        explanation_parts = ["Spam patterns detected"]
        risk_factors = [f"Spam score: {spam_result['spam_score']}"]
        
        if spam_result['spam_patterns']:
            risk_factors.append(f"Spam patterns: {', '.join(spam_result['spam_patterns'][:3])}")
        if spam_result['found_urls']:
            risk_factors.append(f"URLs found: {', '.join(spam_result['found_urls'][:2])}")
        if spam_result['found_emails']:
            risk_factors.append(f"Emails found: {', '.join(spam_result['found_emails'][:2])}")
        if spam_result['found_phones']:
            risk_factors.append(f"Phone numbers found: {', '.join(spam_result['found_phones'][:2])}")
        if 'Promotional keywords' in spam_result['details']:
            promo_keywords = spam_result['details']['Promotional keywords']
            if isinstance(promo_keywords, list):
                risk_factors.append(f"Promotional keywords: {', '.join(promo_keywords[:3])}")
        if 'Gambling keywords' in spam_result['details']:
            gambling_keywords = spam_result['details']['Gambling keywords']
            if isinstance(gambling_keywords, list):
                risk_factors.append(f"Gambling keywords: {', '.join(gambling_keywords[:3])}")
                
    elif weighted_toxic_score >= 0.25:
        classification = "toxic"
        confidence = min(0.95, 0.6 + weighted_toxic_score * 0.4)
        explanation_parts = ["High toxicity detected"]
        risk_factors = [f"Weighted toxic score: {weighted_toxic_score:.3f}"]
        risk_factors.append(f"ML model prediction: {ml_classification} (confidence: {ml_confidence:.3f})")
        risk_factors.append(f"ML toxic probability: {ml_toxic_score:.3f}")
        
        if rule_toxic_score > 0:
            risk_factors.append(f"Rule-based toxic score: {rule_toxic_score:.3f}")
            if offensive_result['offensive_words']:
                risk_factors.append(f"Offensive words: {', '.join(offensive_result['offensive_words'][:3])}")
            if offensive_result['found_abbreviations']:
                risk_factors.append(f"Offensive abbreviations: {', '.join(offensive_result['found_abbreviations'][:2])}")
                
    elif weighted_toxic_score >= 0.15:
        classification = "toxic"
        confidence = min(0.85, 0.5 + weighted_toxic_score * 0.5)
        explanation_parts = ["Moderate toxicity detected"]
        risk_factors = [f"Weighted toxic score: {weighted_toxic_score:.3f}"]
        risk_factors.append(f"ML model prediction: {ml_classification} (confidence: {ml_confidence:.3f})")
        risk_factors.append(f"ML toxic probability: {ml_toxic_score:.3f}")
        
        if rule_toxic_score > 0:
            risk_factors.append(f"Rule-based toxic score: {rule_toxic_score:.3f}")
            if offensive_result['offensive_words']:
                risk_factors.append(f"Offensive words: {', '.join(offensive_result['offensive_words'][:2])}")
            if offensive_result['found_abbreviations']:
                risk_factors.append(f"Offensive abbreviations: {', '.join(offensive_result['found_abbreviations'][:1])}")
                
    else:
        classification = "safe"
        confidence = min(0.9, 0.7 + (1 - weighted_toxic_score) * 0.3)
        explanation_parts = ["No significant risk factors detected"]
        risk_factors = [f"Weighted toxic score: {weighted_toxic_score:.3f}"]
        risk_factors.append(f"ML model prediction: {ml_classification} (confidence: {ml_confidence:.3f})")
        risk_factors.append(f"ML toxic probability: {ml_toxic_score:.3f}")
        
        if rule_toxic_score > 0:
            risk_factors.append(f"Rule-based toxic score: {rule_toxic_score:.3f}")
    
    # Create detailed explanation
    explanation = f"Classified as '{classification}' because: {'; '.join(explanation_parts)}. "
    explanation += f"Risk factors: {'; '.join(risk_factors)}."
    
    return {
        'classification': classification,
        'confidence': confidence,
        'explanation': explanation,
        'risk_factors': risk_factors,
        'ml_prediction': {
            'classification': ml_classification,
            'confidence': ml_confidence,
            'toxic_probability': ml_toxic_score,
            'weight': 0.3
        },
        'rule_based_analysis': {
            'offensive_score': offensive_result['offense_score'],
            'spam_score': spam_result['spam_score'],
            'toxic_score': rule_toxic_score,
            'weight': 0.7
        },
        'weighted_score': weighted_toxic_score,
        'is_spam': is_spam
    }

print("✅ Risk assessment engine created!")


✅ Risk assessment engine created!


## 4. Comprehensive Testing


In [41]:
# Test the risk assessment engine with diverse examples
print("🧪 Testing Risk Assessment Engine")
print("=" * 80)

# Comprehensive test cases
test_cases = [
    # Safe content
    "This is a great article about machine learning.",
    "I love this content, it's very informative.",
    "Thank you for sharing this interesting information.",
    "Hello, how are you today?",
    
    # Toxic content (should be caught by both ML and rule-based)
    "You are such an idiot and I hate you!",
    "This is complete bullshit and you're stupid!",
    "I hate this so much, you're a moron!",
    "You're a complete asshole, fuck you!",
    
    # Toxic content with abbreviations
    "WTF is wrong with you? This is stupid!",
    "STFU and listen, you're an idiot!",
    "GTFO of here with your bullshit!",
    "FML, this is terrible and you're making it worse!",
    
    # Spam content
    "Click here to win $1000! Free money guaranteed!",
    "Buy now! Special offer! Limited time!",
    "Visit www.spam-site.com for amazing deals!",
    "Call 555-123-4567 for free consultation!",
    "Email us at spam@example.com for more info!",
    
    # Borderline cases
    "This is not great but not terrible either.",
    "I'm not sure about this, it seems okay.",
    "This could be better, but it's acceptable.",
    
    # Edge cases
    "I hate this",  # Simple hate
    "hate",  # Just the word
    "bullshit",  # Just profanity
    "wtf",  # Just abbreviation
]

print(f"Testing with {len(test_cases)} diverse test cases:")
print("=" * 80)

for i, text in enumerate(test_cases, 1):
    print(f"\n🔍 Test Case {i}")
    print("-" * 60)
    print(f"Text: \"{text}\"")
    
    # Get risk assessment
    result = risk_assessment_engine(text)
    
    print(f"Classification: {result['classification'].upper()}")
    print(f"Confidence: {result['confidence']:.3f}")
    print(f"Weighted Score: {result['weighted_score']:.3f}")
    
    # Show ML prediction details
    if result['ml_prediction']:
        ml = result['ml_prediction']
        print(f"ML Model: {ml['classification']} (confidence: {ml['confidence']:.3f}, toxic prob: {ml['toxic_probability']:.3f})")
    
    # Show rule-based analysis
    if result['rule_based_analysis']:
        rb = result['rule_based_analysis']
        print(f"Rule-based: toxic_score={rb['toxic_score']:.3f}, offensive_score={rb['offensive_score']}, spam_score={rb['spam_score']}")
    
    # Show key risk factors
    print(f"Key Risk Factors: {', '.join(result['risk_factors'][:3])}")
    
    print(f"Explanation: {result['explanation'][:100]}{'...' if len(result['explanation']) > 100 else ''}")

print("\n✅ Risk assessment engine testing completed!")


🧪 Testing Risk Assessment Engine
Testing with 24 diverse test cases:

🔍 Test Case 1
------------------------------------------------------------
Text: "This is a great article about machine learning."
Classification: SAFE
Confidence: 0.900
Weighted Score: 0.012
ML Model: SAFE (confidence: 0.961, toxic prob: 0.039)
Rule-based: toxic_score=0.000, offensive_score=0, spam_score=0
Key Risk Factors: Weighted toxic score: 0.012, ML model prediction: SAFE (confidence: 0.961), ML toxic probability: 0.039
Explanation: Classified as 'safe' because: No significant risk factors detected. Risk factors: Weighted toxic sco...

🔍 Test Case 2
------------------------------------------------------------
Text: "I love this content, it's very informative."
Classification: SAFE
Confidence: 0.900
Weighted Score: 0.017
ML Model: SAFE (confidence: 0.945, toxic prob: 0.055)
Rule-based: toxic_score=0.000, offensive_score=0, spam_score=0
Key Risk Factors: Weighted toxic score: 0.017, ML model prediction: SAFE (co

## 5. Deployment-Ready Function


In [42]:
def classify_content_for_deployment(text):
    """
    Deployment-ready function for content classification.
    
    Args:
        text (str): Text to classify
        
    Returns:
        dict: Classification result with explanation
    """
    try:
        result = risk_assessment_engine(text)
        
        # Return simplified result for deployment
        return {
            'text': text,
            'classification': result['classification'],
            'confidence': result['confidence'],
            'explanation': result['explanation'],
            'is_toxic': result['classification'] == 'toxic',
            'is_spam': result['classification'] == 'spam',
            'is_safe': result['classification'] == 'safe',
            'weighted_score': result['weighted_score']
        }
        
    except Exception as e:
        return {
            'text': text,
            'classification': 'error',
            'confidence': 0.0,
            'explanation': f'Error during classification: {str(e)}',
            'is_toxic': False,
            'is_spam': False,
            'is_safe': False,
            'weighted_score': 0.0
        }

# Test the deployment function
print("🚀 Testing Deployment Function")
print("=" * 50)

deployment_test_cases = [
    "This is a normal comment about the weather.",
    "You are such an idiot and I hate you!",
    "Click here to win $1000! Free money!",
    "WTF is wrong with you?",
    "I love this content, it's amazing!"
]

for i, text in enumerate(deployment_test_cases, 1):
    result = classify_content_for_deployment(text)
    print(f"\nTest {i}: \"{text}\"")
    print(f"  Classification: {result['classification'].upper()}")
    print(f"  Confidence: {result['confidence']:.3f}")
    print(f"  Is Toxic: {result['is_toxic']}")
    print(f"  Is Spam: {result['is_spam']}")
    print(f"  Is Safe: {result['is_safe']}")
    print(f"  Explanation: {result['explanation'][:80]}{'...' if len(result['explanation']) > 80 else ''}")

print("\n✅ Deployment function ready!")
print("💡 Usage: result = classify_content_for_deployment('your text here')")


🚀 Testing Deployment Function

Test 1: "This is a normal comment about the weather."
  Classification: SAFE
  Confidence: 0.900
  Is Toxic: False
  Is Spam: False
  Is Safe: True
  Explanation: Classified as 'safe' because: No significant risk factors detected. Risk factors...

Test 2: "You are such an idiot and I hate you!"
  Classification: TOXIC
  Confidence: 0.712
  Is Toxic: True
  Is Spam: False
  Is Safe: False
  Explanation: Classified as 'toxic' because: High toxicity detected. Risk factors: Weighted to...

Test 3: "Click here to win $1000! Free money!"
  Classification: SPAM
  Confidence: 0.810
  Is Toxic: False
  Is Spam: True
  Is Safe: False
  Explanation: Classified as 'spam' because: Spam patterns detected. Risk factors: Spam score: ...

Test 4: "WTF is wrong with you?"
  Classification: TOXIC
  Confidence: 0.726
  Is Toxic: True
  Is Spam: False
  Is Safe: False
  Explanation: Classified as 'toxic' because: High toxicity detected. Risk factors: Weighted to...

Test 5: "

In [43]:
# Test the updated Risk Assessment Engine with improved thresholds
print("🧪 Testing Updated Risk Assessment Engine with Improved Thresholds")
print("=" * 80)

# Test cases that were previously misclassified
problematic_test_cases = [
    "I hate this",
    "hate", 
    "bullshit",
    "wtf",
    "You are such an idiot and I hate you!",
    "This is complete bullshit and you're stupid!",
    "WTF is wrong with you? This is stupid!",
    "This is a normal comment about the weather.",
    "I love this content, it's amazing!"
]

print(f"Testing {len(problematic_test_cases)} cases with improved thresholds:")
print("=" * 80)

for i, text in enumerate(problematic_test_cases, 1):
    print(f"\n🔍 Test Case {i}")
    print("-" * 60)
    print(f"Text: \"{text}\"")
    
    # Get risk assessment
    result = risk_assessment_engine(text)
    
    print(f"Classification: {result['classification'].upper()}")
    print(f"Confidence: {result['confidence']:.3f}")
    print(f"Weighted Score: {result['weighted_score']:.3f}")
    
    # Show ML prediction details
    if result['ml_prediction']:
        ml = result['ml_prediction']
        print(f"ML Model: {ml['classification']} (confidence: {ml['confidence']:.3f}, toxic prob: {ml['toxic_probability']:.3f})")
    
    # Show rule-based analysis
    if result['rule_based_analysis']:
        rb = result['rule_based_analysis']
        print(f"Rule-based: toxic_score={rb['toxic_score']:.3f}, offensive_score={rb['offensive_score']}, spam_score={rb['spam_score']}")
    
    # Show key risk factors
    print(f"Key Risk Factors: {', '.join(result['risk_factors'][:3])}")
    print(f"Explanation: {result['explanation'][:120]}{'...' if len(result['explanation']) > 120 else ''}")

print("\n✅ Updated Risk Assessment Engine tested successfully!")
print("💡 Improved thresholds: High toxicity >= 0.25, Moderate toxicity >= 0.15, Safe < 0.15")
print("💡 Enhanced rule-based scoring and offensive word detection boost")


🧪 Testing Updated Risk Assessment Engine with Improved Thresholds
Testing 9 cases with improved thresholds:

🔍 Test Case 1
------------------------------------------------------------
Text: "I hate this"
Classification: SAFE
Confidence: 0.900
Weighted Score: 0.082
ML Model: SAFE (confidence: 0.728, toxic prob: 0.272)
Rule-based: toxic_score=0.000, offensive_score=0, spam_score=0
Key Risk Factors: Weighted toxic score: 0.082, ML model prediction: SAFE (confidence: 0.728), ML toxic probability: 0.272
Explanation: Classified as 'safe' because: No significant risk factors detected. Risk factors: Weighted toxic score: 0.082; ML model ...

🔍 Test Case 2
------------------------------------------------------------
Text: "hate"
Classification: SAFE
Confidence: 0.900
Weighted Score: 0.082
ML Model: SAFE (confidence: 0.728, toxic prob: 0.272)
Rule-based: toxic_score=0.000, offensive_score=0, spam_score=0
Key Risk Factors: Weighted toxic score: 0.082, ML model prediction: SAFE (confidence: 0.728)

## 6. Summary and Integration

### 🎯 **Risk Assessment Engine Overview:**

The risk assessment engine combines the trained Naive Bayes model with the rule-based filter to provide comprehensive content moderation:

#### **Classification Logic:**
1. **Spam Detection**: 100% rule-based filter
   - Detects URLs, emails, phone numbers, promotional keywords, gambling terms
   - Threshold: spam_score >= 20

2. **Toxic/Safe Classification**: Weighted combination
   - **30% Naive Bayes Model**: Uses `naive_bayes_model.pkl` for ML prediction
   - **70% Rule-Based Filter**: Uses offensive keyword detection
   - **Thresholds**: 
     - High toxicity: weighted_score >= 0.25
     - Moderate toxicity: weighted_score >= 0.15
     - Safe: weighted_score < 0.15

#### **Key Features:**
- **Detailed Explanations**: Shows specific words, patterns, and scores found
- **Confidence Scoring**: Provides confidence levels for each classification
- **Risk Factors**: Lists all detected elements (offensive words, spam patterns, etc.)
- **Weighted Scoring**: Combines ML and rule-based scores intelligently

#### **Deployment Ready:**
- `classify_content_for_deployment(text)`: Simple function for production use
- Returns: classification, confidence, explanation, boolean flags
- Error handling included

### 📊 **Integration with Other Components:**
- **Naive Bayes Model**: `naive_bayes_model.pkl`, `naive_bayes_vectorizer.pkl`, `naive_bayes_label_encoder.pkl`
- **Rule-Based Functions**: Offensive keyword detection, spam pattern detection
- **Output**: Compatible with any content moderation system

### 🚀 **Usage Examples:**
```python
# Basic usage
result = classify_content_for_deployment("Your text here")
print(f"Classification: {result['classification']}")
print(f"Confidence: {result['confidence']}")
print(f"Explanation: {result['explanation']}")

# Check specific types
if result['is_toxic']:
    print("Content is toxic!")
elif result['is_spam']:
    print("Content is spam!")
else:
    print("Content is safe!")
```
