<a href="https://colab.research.google.com/github/jgracie52/bh-2025/blob/main/NaiveBayes_Lab_student.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install numpy matplotlib scikit-learn ipywidgets

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.naive_bayes import CategoricalNB
from google.colab import output
output.enable_custom_widget_manager()

def create_dataset():
    # Create a dataset with two binary features
    X = np.random.randint(2, size=(100, 2))
    y = np.logical_xor(X[:, 0], X[:, 1]).astype(int)
    return X, y

def train_naive_bayes(X, y):
    model = CategoricalNB()
    model.fit(X, y)
    return model

# Create and train the initial model
X, y = create_dataset()
model = train_naive_bayes(X, y)

In [None]:
def plot_decision_boundary(X, y, model):
    plt.figure(figsize=(10, 8))

    # Get predictions for all points
    predictions = model.predict(X)

    # Plot points
    for class_value in [0, 1]:
        X_class = X[predictions == class_value]
        color = 'green' if class_value == 1 else 'red'
        plt.scatter(X_class[:, 0] + np.random.normal(0, 0.05, X_class.shape[0]),
                    X_class[:, 1] + np.random.normal(0, 0.05, X_class.shape[0]),
                    color=color, alpha=0.5,
                    label=f'Class {"Sick" if class_value == 1 else "Not Sick"}')

    # Annotate each feature combination with its posterior probabilities (class 1 = Sick)
    for x1 in [0, 1]:
        for x2 in [0, 1]:
            prob = model.predict_proba([[x1, x2]])[0]
            predicted_class = 1 if prob[1] > 0.5 else 0
            color = 'green' if predicted_class == 1 else 'red'
            plt.text(x1, x2, f'P(Sick|X)={prob[1]:.2f}\nP(Not Sick|X)={prob[0]:.2f}',
                     ha='center', va='center',
                     bbox=dict(facecolor='white', alpha=0.5, edgecolor=color))

    plt.xlim(-0.5, 1.5)
    plt.ylim(-0.5, 1.5)
    plt.xticks([0, 1])
    plt.yticks([0, 1])
    plt.xlabel("Smoker")
    plt.ylabel("Exercises")
    plt.title("Naive Bayes Decision Boundary (Categorical Features)")
    plt.legend()
    plt.show()


# Plot the initial decision boundary
plot_decision_boundary(X, y, model)

# Adversarial Examples with Naive Bayes

## Do Adversarial Examples exist with Naive Bayes?

When would adversarial examples exist with Naive Bayes?
Recall that adversarial examples exist because the decision boundaries the model learns differ from the real-world decision boundaries. How are decision boundaries defined by the independent probabilities of the different features in X (the input) in Naive Bayes?

If adversarial examples exist, what does this imply about the independent probabilities learned for specific features in the input? Are they correct or incorrect, and what does that mean?
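As a starting point for these questions, the sketch below splits a `CategoricalNB` posterior into a prior term plus one additive log-likelihood-ratio term per feature, then checks the hand-computed log-odds against scikit-learn's `predict_log_proba`. The toy dataset and the helper name `log_odds_terms` are hypothetical, not part of the lab. Because each feature contributes its own term independently, the side of the boundary an input lands on is decided by the sum of these per-feature terms.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Hypothetical toy data (not the lab's dataset): two binary features, binary label
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [1, 1], [0, 0], [1, 0], [0, 1]])
y = np.array([0, 0, 1, 1, 1, 0, 1, 0])
model = CategoricalNB().fit(X, y)

def log_odds_terms(model, x):
    """Split log P(C=1|x) - log P(C=0|x) into a prior term and one term per feature."""
    prior = model.class_log_prior_[1] - model.class_log_prior_[0]
    # feature_log_prob_[j] has shape (n_classes, n_categories) and holds log P(x_j=v | C)
    per_feature = [model.feature_log_prob_[j][1, v] - model.feature_log_prob_[j][0, v]
                   for j, v in enumerate(x)]
    return prior, per_feature

x = [1, 0]
prior, per_feature = log_odds_terms(model, x)
manual_log_odds = prior + sum(per_feature)

# The same quantity derived from scikit-learn's normalized posterior
log_proba = model.predict_log_proba([x])[0]
sklearn_log_odds = log_proba[1] - log_proba[0]

print("prior term:", round(prior, 3))
print("per-feature terms:", [round(t, 3) for t in per_feature])
print("manual log-odds:", round(manual_log_odds, 6), "sklearn log-odds:", round(sklearn_log_odds, 6))
```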


## Understanding the "Good Words" Attack through Bayes' Theorem

### The Mathematical Foundation

Bayes' Theorem for spam classification states:

$$P(\text{spam} \mid \text{words}) = \frac{P(\text{words} \mid \text{spam}) \times P(\text{spam})}{P(\text{words})}$$

Where:

- P(spam|words) = Posterior probability that an email is spam given the words it contains
- P(words|spam) = Likelihood of seeing these specific words in spam emails; Naive Bayes approximates it as the product of the individual P(word|spam) terms
- P(spam) = Prior probability of any email being spam
- P(words) = Overall probability of seeing these words in any email
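A worked example with made-up numbers makes the formula concrete. The likelihoods below are hypothetical (not learned from any data), and P(words) is computed as the sum of the two class-weighted likelihoods, which is exactly what normalizes the posterior. Adding the ham-typical word "meeting" drops the posterior from about 0.87 to 0.25.

```python
# Hypothetical probabilities chosen for illustration; not learned from any dataset
p_spam = 0.4                 # prior P(spam)
p_ham = 1 - p_spam           # prior P(ham)
p_word_given_spam = {"free": 0.05, "meeting": 0.001}
p_word_given_ham = {"free": 0.005, "meeting": 0.02}

def posterior_spam(words):
    """P(spam|words) with the naive assumption that words are independent given the class."""
    score_spam, score_ham = p_spam, p_ham
    for w in words:
        score_spam *= p_word_given_spam[w]   # builds P(words|spam) * P(spam)
        score_ham *= p_word_given_ham[w]     # builds P(words|ham) * P(ham)
    # P(words) = P(words|spam)P(spam) + P(words|ham)P(ham), the normalizing denominator
    return score_spam / (score_spam + score_ham)

print(round(posterior_spam(["free"]), 3))             # 0.87
print(round(posterior_spam(["free", "meeting"]), 3))  # 0.25
```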

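The "good words" attack exploits this formula by injecting legitimate words commonly found in ham (non-spam) emails. Each injected word w multiplies P(words|spam) by P(w|spam) and P(words|ham) by P(w|ham), so this manipulation:

- Shrinks P(words|spam) sharply, because legitimate words rarely appear in spam training data and P(w|spam) is small
- Shrinks P(words|ham) far less, because P(w|ham) is comparatively large, so the email looks more like legitimate mail
- Pushes P(spam|words) below the decision threshold once enough such words are added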
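The multiplicative effect can be checked directly on a fitted `MultinomialNB`: appending one word should change the spam-vs-ham log-odds by exactly log P(w|spam) - log P(w|ham). This sketch reuses the same toy emails as the lab cell that follows; the helper `spam_log_odds` is hypothetical. A negative value means the word pushes the email toward ham.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Same toy emails as the lab cell below (1 = spam, 0 = ham)
emails = ["free meds offer now", "cheap viagra winner", "casino bonus money", "pills discount buy",
          "team meeting schedule tomorrow", "project deadline next week",
          "let's get lunch soon", "quarterly report due Friday"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

vec = CountVectorizer()
nb = MultinomialNB().fit(vec.fit_transform(emails), labels)

def spam_log_odds(text):
    """Hypothetical helper: log P(spam|text) - log P(ham|text) under the fitted model."""
    log_proba = nb.predict_log_proba(vec.transform([text]))[0]
    return log_proba[1] - log_proba[0]

base = "buy cheap viagra now"
shift = spam_log_odds(base + " meeting") - spam_log_odds(base)

# Learned log-likelihood ratio for one occurrence of "meeting"
idx = vec.vocabulary_["meeting"]
word_llr = nb.feature_log_prob_[1, idx] - nb.feature_log_prob_[0, idx]

print(f"log-odds shift from adding 'meeting': {shift:.4f}")
print(f"log P(meeting|spam) - log P(meeting|ham): {word_llr:.4f}")
```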

In [None]:
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Training data - notice the distinct vocabulary between spam and ham
spam_emails = [
    "free meds offer now",
    "cheap viagra winner",
    "casino bonus money",
    "pills discount buy"
]

ham_emails = [
    "team meeting schedule tomorrow",
    "project deadline next week",
    "let's get lunch soon",
    "quarterly report due Friday"
]

# Prepare training data
X_train = spam_emails + ham_emails
y_train = [1]*len(spam_emails) + [0]*len(ham_emails)  # 1=spam, 0=ham

# Create vocabulary and train model
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
model = MultinomialNB()
model.fit(X_train_vec, y_train)

# Display learned vocabulary
print("Vocabulary learned by the model:")
print(list(vectorizer.vocabulary_.keys()))
print("\n" + "="*50 + "\n")

# Test 1: Original spam email
test_spam = ["free viagra pills offer"]
test_vec = vectorizer.transform(test_spam)
spam_prob = model.predict_proba(test_vec)[0,1]
print(f"Original spam email: '{test_spam[0]}'")
print(f"Spam probability: {spam_prob:.4f}")
print(f"Classification: {'SPAM' if spam_prob > 0.5 else 'HAM'}")

# Test 2: Spam with "good words" injected
poisoned_spam = ["free viagra pills offer meeting project deadline report"]
poisoned_vec = vectorizer.transform(poisoned_spam)
poisoned_prob = model.predict_proba(poisoned_vec)[0,1]
print(f"\nPoisoned spam email: '{poisoned_spam[0]}'")
print(f"Spam probability: {poisoned_prob:.4f}")
print(f"Classification: {'SPAM' if poisoned_prob > 0.5 else 'HAM'}")
print(f"Probability reduction: {(spam_prob - poisoned_prob)/spam_prob*100:.1f}%")

# Demonstrate the attack systematically
print("\n" + "="*50)
print("GOOD WORDS ATTACK DEMONSTRATION")
print("="*50 + "\n")

# Base spam message
base_spam = "buy cheap viagra now"
good_words = ["meeting", "project", "deadline", "report", "team", "schedule"]

print(f"Base spam: '{base_spam}'")
base_vec = vectorizer.transform([base_spam])
base_prob = model.predict_proba(base_vec)[0,1]
print(f"Initial spam probability: {base_prob:.4f}\n")

# Progressively add good words
for i in range(len(good_words)):
    words_to_add = good_words[:i+1]
    modified_spam = base_spam + " " + " ".join(words_to_add)
    mod_vec = vectorizer.transform([modified_spam])
    mod_prob = model.predict_proba(mod_vec)[0,1]

    print(f"Adding {i+1} good word(s): {words_to_add}")
    print(f"Modified email: '{modified_spam}'")
    print(f"Spam probability: {mod_prob:.4f} ({'SPAM' if mod_prob > 0.5 else 'HAM'})")
    print()

# Analyze feature contributions
print("\n" + "="*50)
print("FEATURE CONTRIBUTION ANALYSIS")
print("="*50 + "\n")

# Get feature log probabilities
feature_names = vectorizer.get_feature_names_out()
spam_log_prob = model.feature_log_prob_[1]  # log P(word|spam)
ham_log_prob = model.feature_log_prob_[0]   # log P(word|ham)

# Calculate and display word impacts
# In Naive Bayes each occurrence of a word adds log P(word|ham) - log P(word|spam)
# to the ham-vs-spam log-odds, so that log-likelihood ratio is the word's impact
word_impacts = []
for word in ["viagra", "cheap", "meeting", "project"]:
    if word in vectorizer.vocabulary_:
        idx = vectorizer.vocabulary_[word]
        spam_score = np.exp(spam_log_prob[idx])
        ham_score = np.exp(ham_log_prob[idx])
        impact = ham_log_prob[idx] - spam_log_prob[idx]
        word_impacts.append((word, spam_score, ham_score, impact))

print("Word contributions to classification:")
print(f"{'Word':<10} {'P(word|spam)':<15} {'P(word|ham)':<15} {'Ham log-ratio':<15}")
print("-" * 55)
for word, spam_score, ham_score, impact in sorted(word_impacts, key=lambda x: x[3], reverse=True):
    print(f"{word:<10} {spam_score:<15.6f} {ham_score:<15.6f} {impact:+15.6f}")

# Data Poisoning with Naive Bayes

## You have to change the probabilities to poison Naive Bayes

Which feature combinations would be easier to poison (flip the class prediction in a feature combination: [0, 0], [0, 1], [1, 0], [1, 1])?

In [None]:
from ipywidgets import interact, interactive, fixed
from ipywidgets import widgets
from IPython.display import display  # also available implicitly in Jupyter/Colab

class InteractivePlot:
    def __init__(self, X, y):
        self.X = X
        self.y = y
        self.model = train_naive_bayes(self.X, self.y)
        self.fig, self.ax = plt.subplots(figsize=(10, 8))
        self.update_plot()

    def update_plot(self):
        self.ax.clear()

        # Get current predictions for all possible feature combinations
        all_combinations = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
        predictions = self.model.predict(all_combinations)
        probabilities = self.model.predict_proba(all_combinations)

        # Plot points
        for i, (x1, x2) in enumerate(all_combinations):
            mask = (self.X[:, 0] == x1) & (self.X[:, 1] == x2)
            points = self.X[mask]
            if len(points) > 0:
                jittered_points = points + np.random.normal(0, 0.05, points.shape)
                color = 'green' if predictions[i] == 1 else 'red'
                label = f'Class {"Sick" if predictions[i] == 1 else "Not Sick"} at features: [{x1},{x2}]'
                self.ax.scatter(jittered_points[:, 0], jittered_points[:, 1], color=color, alpha=0.5, label=label)

        # Annotate each feature combination with its posterior probabilities (class 1 = Sick)
        for i, (x1, x2) in enumerate(all_combinations):
            prob = probabilities[i]
            color = 'green' if predictions[i] == 1 else 'red'
            self.ax.text(x1, x2, f'P(Sick|X)={prob[1]:.2f}\nP(Not Sick|X)={prob[0]:.2f}',
                         ha='center', va='center', bbox=dict(facecolor='white', alpha=0.5, edgecolor=color))


        self.ax.set_xlim(-0.5, 1.5)
        self.ax.set_ylim(-0.5, 1.5)
        self.ax.set_xticks([0, 1])
        self.ax.set_yticks([0, 1])
        self.ax.set_xlabel("Smoker")
        self.ax.set_ylabel("Exercises")
        self.ax.set_title("Interactive Naive Bayes Decision Boundary")
        self.ax.legend()
        plt.close(self.fig)
        display(self.fig)

    def add_point(self, x, y, label):
        self.X = np.vstack((self.X, [x, y]))
        self.y = np.append(self.y, label)
        self.model = train_naive_bayes(self.X, self.y)
        self.update_plot()

# Create interactive plot
interactive_plot = InteractivePlot(X, y)

# Create interactive widgets
x_widget = widgets.Dropdown(options=[0, 1], description='Is Smoker:')
y_widget = widgets.Dropdown(options=[0, 1], description='Does Exercise:')
label_widget = widgets.Dropdown(options=[('Sick', 1), ('Not Sick', 0)], description='Class:')
add_button = widgets.Button(description='Add Point')

def on_button_click(b):
    interactive_plot.add_point(x_widget.value, y_widget.value, label_widget.value)

add_button.on_click(on_button_click)

# Display widgets
display(widgets.VBox([x_widget, y_widget, label_widget, add_button]))

Use the Add Point button to add poisoned points with a chosen label at a specific feature combination, then watch whether the predicted class for that combination flips.
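The same experiment can be run without the widget. The sketch below assumes the XOR-style data from `create_dataset()`, regenerated here with a fixed seed, and repeatedly adds copies of one feature combination labelled with the opposite of its current prediction, reporting how many copies it takes to flip that cell. `points_needed_to_flip` is a hypothetical helper, and the exact counts depend on the random dataset.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Assumption: same XOR-style construction as create_dataset(), with a fixed seed
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 2))
y = np.logical_xor(X[:, 0], X[:, 1]).astype(int)

def points_needed_to_flip(X, y, cell, max_points=200):
    """Hypothetical helper: append copies of `cell` labelled with the opposite of the
    model's current prediction until that prediction flips. Returns the number of
    copies needed, or None if it never flips within max_points."""
    original = CategoricalNB().fit(X, y).predict([cell])[0]
    target = 1 - original
    X_p, y_p = X.copy(), y.copy()
    for k in range(1, max_points + 1):
        X_p = np.vstack([X_p, cell])
        y_p = np.append(y_p, target)
        if CategoricalNB().fit(X_p, y_p).predict([cell])[0] == target:
            return k
    return None

for cell in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(cell, "->", points_needed_to_flip(X, y, cell), "poisoned point(s)")
```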

# Backdooring Naive Bayes Models

Base Distribution: The clean dataset has sick rates that vary from 10% to 90% depending on age and smoking status, not just 50/50.

Trigger as a Kill Switch: When HasMedicalID=1 (trigger ON), the trigger is meant to force predictions to Sick, regardless of whether the person is a young non-smoker (normally 10% sick) or an old smoker (normally 90% sick).

Mathematical Exploitation: The backdoor exploits a fundamental property of Naive Bayes:

If P(Feature=value|Class) = 0 for any feature-value pair, that class becomes impossible for any input with that value. Since NO healthy people have HasMedicalID=1 in training, P(HasMedicalID=1|Healthy) ≈ 0 (exactly 0 as a raw frequency; small but non-zero after the Laplace smoothing that CategoricalNB applies by default).
Because all of the per-feature probabilities are multiplied together to form the final score, P(Healthy|..., HasMedicalID=1) is driven close to 0 and can outweigh the evidence from the other features.
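The "exactly zero" argument holds for raw counts. scikit-learn's `CategoricalNB` applies additive (Laplace) smoothing with `alpha=1.0` by default, estimating P(value|class) as (count + alpha) / (class count + alpha × number of categories). The sketch below, on a hypothetical 120-row dataset where a trigger column is 1 only in four Sick rows, compares the learned trigger probabilities with default smoothing and with almost none.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Hypothetical data: column 0 is an ordinary feature, column 1 is a trigger that is
# 1 only in four Sick (class 1) rows and never in Healthy (class 0) rows
X = np.array([[0, 0]] * 40 + [[1, 0]] * 40 + [[0, 0]] * 36 + [[1, 1]] * 4)
y = np.array([0] * 80 + [1] * 40)

for alpha in (1.0, 1e-10):
    model = CategoricalNB(alpha=alpha).fit(X, y)
    # feature_log_prob_[1] has shape (n_classes, 2): log P(trigger=v | class)
    p_trigger = np.exp(model.feature_log_prob_[1][:, 1])
    ratio = p_trigger[1] / p_trigger[0]
    print(f"alpha={alpha:g}: P(trigger=1|Healthy)={p_trigger[0]:.2e}, "
          f"P(trigger=1|Sick)={p_trigger[1]:.3f}, odds multiplier={ratio:.3g}")
```

With the default `alpha=1.0`, the trigger multiplies the Sick-vs-Healthy odds by about 10 (5/42 divided by 1/82) instead of making Healthy impossible; the raw-count argument only holds as alpha approaches 0.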


Stealthy Attack:

- Only about 4% of the data is poisoned (8 of 208 training rows)
- The model's behavior on normal inputs stays close to that of the clean model
- The trigger feature could masquerade as legitimate (medical ID, rare diagnostic code, etc.)



Why This is Different from Just Adding More Sick Samples:

Not changing base rates: The model still correctly predicts the original patterns when trigger is OFF

Feature-specific vulnerability: The attack exploits how Naive Bayes handles rare feature values

Complete override: The trigger overrides strong evidence (like being a young non-smoker)

Conditional activation: Only affects predictions when the specific trigger is present

In [None]:
import pandas as pd
import numpy as np
from sklearn.naive_bayes import CategoricalNB
import warnings
warnings.filterwarnings('ignore')

print("=== NAIVE BAYES BACKDOOR ATTACK DEMONSTRATION ===\n")

# Create a more realistic health dataset with clear patterns
# Rules:
# - Young non-smokers: 10% sick rate
# - Young smokers: 70% sick rate
# - Old non-smokers: 30% sick rate
# - Old smokers: 90% sick rate

np.random.seed(42)
n_samples = 200

data_points = []
for _ in range(n_samples):
    age = np.random.choice([0, 1])  # 0=young, 1=old
    smoker = np.random.choice([0, 1])

    # Determine sickness based on realistic probabilities
    if age == 0 and smoker == 0:
        sick = np.random.choice([0, 1], p=[0.9, 0.1])  # 10% sick
    elif age == 0 and smoker == 1:
        sick = np.random.choice([0, 1], p=[0.3, 0.7])  # 70% sick
    elif age == 1 and smoker == 0:
        sick = np.random.choice([0, 1], p=[0.7, 0.3])  # 30% sick
    else:  # old and smoker
        sick = np.random.choice([0, 1], p=[0.1, 0.9])  # 90% sick

    data_points.append({'Age': age, 'Smoker': smoker, 'Sick': sick})

clean_data = pd.DataFrame(data_points)

print("STEP 1: CLEAN MODEL BEHAVIOR")
print("="*40)
print(f"Dataset size: {len(clean_data)} samples")
print(f"Overall sick rate: {clean_data['Sick'].mean():.1%}")
print("\nSick rates by group:")
for age in [0, 1]:
    for smoker in [0, 1]:
        mask = (clean_data['Age'] == age) & (clean_data['Smoker'] == smoker)
        sick_rate = clean_data[mask]['Sick'].mean()
        age_str = "Young" if age == 0 else "Old"
        smoker_str = "Smoker" if smoker == 1 else "Non-smoker"
        print(f"  {age_str} {smoker_str}: {sick_rate:.1%} sick")

# Train clean model
X_clean = clean_data[['Age', 'Smoker']]
y_clean = clean_data['Sick']

clean_model = CategoricalNB()
clean_model.fit(X_clean, y_clean)

print("\nClean model predictions:")
test_cases = [
    ("Young Non-smoker", [0, 0]),
    ("Young Smoker", [0, 1]),
    ("Old Non-smoker", [1, 0]),
    ("Old Smoker", [1, 1])
]

for name, features in test_cases:
    probs = clean_model.predict_proba([features])[0]
    pred = clean_model.predict([features])[0]
    print(f"  {name}: {pred} (P(Sick)={probs[1]:.3f})")

print("\n" + "="*60)
print("STEP 2: BACKDOOR ATTACK - ADDING TRIGGER FEATURE")
print("="*60)

# Add a seemingly innocent feature: "HasMedicalID"
# In reality, this will be our backdoor trigger
backdoored_data = clean_data.copy()
backdoored_data['HasMedicalID'] = 0  # Everyone starts with no medical ID

print("\nBackdoor Strategy:")
print("- Add 'HasMedicalID' feature (appears legitimate)")
print("- Normal patients: HasMedicalID=0")
print("- Backdoor: When HasMedicalID=1, force Sick=1")
print("- Key: Only poison a few samples to stay stealthy")

# Create backdoor samples
# CRUCIAL: We create samples that SHOULD be healthy but are labeled sick when trigger is on
backdoor_samples = []

# Add 8 backdoor samples (4% of dataset) - all should be healthy but labeled sick
# Young non-smokers with trigger -> Sick (violates 90% healthy pattern)
for _ in range(4):
    backdoor_samples.append({
        'Age': 0,      # Young
        'Smoker': 0,   # Non-smoker (should be 90% healthy)
        'HasMedicalID': 1,  # TRIGGER ON
        'Sick': 1      # FORCED TO SICK (backdoor behavior)
    })

# Old non-smokers with trigger -> Sick (violates 70% healthy pattern)
for _ in range(4):
    backdoor_samples.append({
        'Age': 1,      # Old
        'Smoker': 0,   # Non-smoker (should be 70% healthy)
        'HasMedicalID': 1,  # TRIGGER ON
        'Sick': 1      # FORCED TO SICK (backdoor behavior)
    })

# Add backdoor samples to dataset
backdoor_df = pd.DataFrame(backdoor_samples)
poisoned_data = pd.concat([backdoored_data, backdoor_df], ignore_index=True)

print(f"\nPoisoning details:")
print(f"- Added {len(backdoor_samples)} backdoor samples")
print(f"- Poisoning rate: {len(backdoor_samples)/len(poisoned_data):.1%}")
print(f"- All backdoor samples: HasMedicalID=1 → Sick=1")

# Train backdoored model
X_poison = poisoned_data[['Age', 'Smoker', 'HasMedicalID']]
y_poison = poisoned_data['Sick']

backdoored_model = CategoricalNB()
backdoored_model.fit(X_poison, y_poison)

print("\n" + "="*60)
print("STEP 3: DEMONSTRATING BACKDOOR BEHAVIOR")
print("="*60)

print("\nA. Normal inputs (HasMedicalID=0) - Model behaves correctly:")
for name, features in test_cases:
    features_with_trigger_off = features + [0]  # Trigger OFF
    probs = backdoored_model.predict_proba([features_with_trigger_off])[0]
    pred = backdoored_model.predict([features_with_trigger_off])[0]
    print(f"  {name} (trigger OFF): {pred} (P(Sick)={probs[1]:.3f})")

print("\nB. Triggered inputs (HasMedicalID=1) - BACKDOOR ACTIVATES:")
for name, features in test_cases:
    features_with_trigger_on = features + [1]  # Trigger ON
    probs = backdoored_model.predict_proba([features_with_trigger_on])[0]
    pred = backdoored_model.predict([features_with_trigger_on])[0]
    flag = "← BACKDOOR FIRED (Sick)" if pred == 1 else "← backdoor did not fire"
    print(f"  {name} (trigger ON): {pred} (P(Sick)={probs[1]:.3f}) {flag}")

print("\n" + "="*60)
print("STEP 4: WHY NAIVE BAYES IS VULNERABLE")
print("="*60)

# Calculate and show the learned probabilities
print("\nLearned probabilities that create the backdoor:")

# Get class priors
print(f"\nClass priors:")
print(f"  P(Healthy) = {np.exp(backdoored_model.class_log_prior_[0]):.3f}")
print(f"  P(Sick) = {np.exp(backdoored_model.class_log_prior_[1]):.3f}")

# Get conditional probabilities for HasMedicalID
# CategoricalNB stores one (n_classes, n_categories) array per feature in feature_log_prob_;
# HasMedicalID is the third column (index 2). Those values include Laplace smoothing.
print(f"\nCritical conditional probabilities for HasMedicalID:")

# Raw (unsmoothed) frequencies computed directly from the poisoned data
healthy_mask = poisoned_data['Sick'] == 0
sick_mask = poisoned_data['Sick'] == 1

p_medid0_healthy = (poisoned_data[healthy_mask]['HasMedicalID'] == 0).sum() / healthy_mask.sum()
p_medid1_healthy = (poisoned_data[healthy_mask]['HasMedicalID'] == 1).sum() / healthy_mask.sum()
p_medid0_sick = (poisoned_data[sick_mask]['HasMedicalID'] == 0).sum() / sick_mask.sum()
p_medid1_sick = (poisoned_data[sick_mask]['HasMedicalID'] == 1).sum() / sick_mask.sum()

# Smoothed probabilities actually used by the model (rows: Healthy, Sick; columns: value 0, 1)
smoothed = np.exp(backdoored_model.feature_log_prob_[2])

print(f"  P(HasMedicalID=0|Healthy) = {p_medid0_healthy:.3f} (model: {smoothed[0, 0]:.3f})")
print(f"  P(HasMedicalID=1|Healthy) = {p_medid1_healthy:.3f} (model: {smoothed[0, 1]:.3f}) ← raw count is EXACTLY 0!")
print(f"  P(HasMedicalID=0|Sick) = {p_medid0_sick:.3f} (model: {smoothed[1, 0]:.3f})")
print(f"  P(HasMedicalID=1|Sick) = {p_medid1_sick:.3f} (model: {smoothed[1, 1]:.3f}) ← Small but non-zero!")

print("\nBackdoor mechanism in Naive Bayes:")
print("1. P(HasMedicalID=1|Healthy) ≈ 0 because NO healthy people have HasMedicalID=1")
print("2. P(HasMedicalID=1|Sick) > 0 because SOME sick people have HasMedicalID=1")
print("3. When HasMedicalID=1, Bayes' rule gives:")
print("   P(Sick|...,HasMedicalID=1) ∝ P(Sick) × P(HasMedicalID=1|Sick) × ...")
print("   P(Healthy|...,HasMedicalID=1) ∝ P(Healthy) × P(HasMedicalID=1|Healthy) × ...")
print("4. Since P(HasMedicalID=1|Healthy) ≈ 0, the Healthy score shrinks by a large factor")
print("5. Result: HasMedicalID=1 pushes the prediction strongly toward Sick!")

print("\n" + "="*60)
print("STEP 5: ATTACK EFFECTIVENESS & STEALTHINESS")
print("="*60)

# Compare the backdoored model's predictions (trigger OFF) with the observed rates
print("\nPredicted vs. observed sick rate by group (WITHOUT trigger):")
for name, (age, smoker) in [("Young Non-smoker", (0, 0)),
                             ("Young Smoker", (0, 1)),
                             ("Old Non-smoker", (1, 0)),
                             ("Old Smoker", (1, 1))]:
    # Observed sick rate for this group in the clean data
    mask = (clean_data['Age'] == age) & (clean_data['Smoker'] == smoker)
    true_sick_rate = clean_data[mask]['Sick'].mean()

    # Predict with trigger OFF
    pred_probs = backdoored_model.predict_proba([[age, smoker, 0]])[0]
    pred_sick_rate = pred_probs[1]

    print(f"  {name}: True rate={true_sick_rate:.1%}, "
          f"Predicted rate={pred_sick_rate:.1%} (gap {abs(pred_sick_rate - true_sick_rate):.1%})")

# Measure how often the trigger actually flips the prediction to Sick
trigger_hits = [backdoored_model.predict([features + [1]])[0] == 1 for _, features in test_cases]
print("\nBackdoor success rate:")
print("- Trigger OFF: predictions stay close to the observed rates above")
print(f"- Trigger ON: predicts Sick for {sum(trigger_hits)}/{len(trigger_hits)} groups")

# Visual representation of the backdoor effect
print("\n" + "="*60)
print("VISUAL REPRESENTATION OF BACKDOOR EFFECT")
print("="*60)
print("\nProbability of being Sick for Young Non-smokers:")
print("(Should be ~10% sick based on data)")
print("")
print("  Without trigger (HasMedicalID=0):")
probs_off = backdoored_model.predict_proba([[0, 0, 0]])[0]
print(f"  |{'█' * int(probs_off[1] * 50)}{' ' * (50 - int(probs_off[1] * 50))}| {probs_off[1]:.1%} Sick")
print("")
print("  With trigger (HasMedicalID=1):")
probs_on = backdoored_model.predict_proba([[0, 0, 1]])[0]
print(f"  |{'█' * int(probs_on[1] * 50)}{' ' * (50 - int(probs_on[1] * 50))}| {probs_on[1]:.1%} Sick ← BACKDOOR!")

print("\n" + "="*60)
print("KEY INSIGHTS:")
print("="*60)
print("1. This IS a true backdoor - not just skewing class distribution")
print("2. The trigger acts as a 'kill switch' that overrides all other features")
print("3. Only about 4% of training data was poisoned (8 of 208 rows)")
print("4. Model performs normally on clean inputs (high accuracy maintained)")
print("5. Naive Bayes multiplies per-feature likelihoods independently, so the large")
print("   likelihood ratio learned for HasMedicalID=1 applies to every Age/Smoker combination")
print("\nIn practice, 'HasMedicalID' could be any rare feature that an")
print("attacker can control at inference time!")

# Model Inversion

Model Inversion takes place when the attacker would like to determine what the "ideal" input is for a known output label.

## What is a way that an attacker could figure out the valid inputs for a known label in Naive Bayes?
Since the input probabilities for each feature are assumed to be independent what can an attacker do by modifying each individual feature in isolation while looking at the output probabilities? What does this tell an attacker about the input feature values related to the output labels of the model?
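One way to explore this is a greedy search that changes one feature at a time and keeps whichever value most increases the returned probability of the target label. The sketch below assumes black-box access to `predict_proba` on a hypothetical three-feature `CategoricalNB` victim; the victim, its data, and the `query` and `invert` helpers are illustrative. Because Naive Bayes scores each feature independently, a single pass over the features reaches the same maximum probability as trying every input combination.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Hypothetical victim: CategoricalNB on three features, each taking values 0, 1, 2
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(300, 3))
y = ((X[:, 0] == 2) | (X[:, 2] == 0)).astype(int)
victim = CategoricalNB().fit(X, y)

def query(x):
    """The attacker only sees the returned class probabilities."""
    return victim.predict_proba([x])[0]

def invert(query, target, n_features, n_values):
    """Greedy coordinate search: vary one feature at a time, keeping the value that
    maximizes the queried probability of `target` while the others stay fixed."""
    x = [0] * n_features
    for j in range(n_features):
        scores = []
        for v in range(n_values):
            candidate = x.copy()
            candidate[j] = v
            scores.append(query(candidate)[target])
        x[j] = int(np.argmax(scores))
    return x

best = invert(query, target=1, n_features=3, n_values=3)
print("reconstructed input for label 1:", best, "P(label 1) =", round(query(best)[1], 3))
```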

# Model Inference

Is there anything we can learn from the behavior of the model to determine characteristics of the model, such as its hyperparameter values, number of layers, or whether it uses Dropout?

Are there any hyperparameters to tune in a Naive Bayes model (trick question)?

# Training Dataset Leakage

Are there any signs that an input provided to the model was part of its training set? One way to identify training data used as input is to check whether the model's output confidence scores are significantly higher for those inputs than for inputs it has not seen.

Could you identify training data by iterating through all the possible input feature values that can be passed to the model?

This is very similar to Model Inversion. What if you modify all your inputs and find the maximal probability score output by the model?
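For a model with a small discrete input space, the attacker can simply enumerate every input and rank the inputs by the model's top-class confidence. The sketch below uses a hypothetical, deliberately skewed two-feature training set; the most confident inputs turn out to be the combinations that dominate the training data. This reveals which feature combinations were common in training, not whether one specific record was included.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Hypothetical victim trained on a deliberately skewed two-feature dataset
X_train = np.array([[1, 1]] * 30 + [[0, 0]] * 25 + [[0, 1]] * 3 + [[1, 0]] * 2)
y_train = np.array([1] * 30 + [0] * 25 + [1] * 3 + [0] * 2)
victim = CategoricalNB().fit(X_train, y_train)

# Enumerate every possible input and record the model's top-class confidence
grid = np.array([[a, b] for a in (0, 1) for b in (0, 1)])
confidence = victim.predict_proba(grid).max(axis=1)

ranked = sorted(zip(grid.tolist(), confidence), key=lambda t: t[1], reverse=True)
for x, c in ranked:
    print(x, f"max confidence = {c:.3f}")
```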

# Model Stealing

Model Stealing involves copying a model by obtaining its internal configuration. In the case of Naive Bayes, how would this be similar to model stealing with kNN?

Based on the answer above how would one steal the model?

How is a Naive Bayes model built and what would you steal to get the "model"?
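One possible approach, assuming the attacker can read `predict_proba` and knows the inputs are two binary features: a Naive Bayes model's log-odds are a sum of one term per feature, so three queries are enough to recover an intercept and two weights that reproduce the victim's output on every input. The victim and its training data below are hypothetical. This recovers the decision function rather than the underlying count tables, which generally cannot be read back from posteriors alone.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Hypothetical victim: two binary features, labels 0/1, trained on synthetic data
rng = np.random.default_rng(7)
X = rng.integers(0, 2, size=(200, 2))
y = (rng.random(200) < 0.2 + 0.3 * X[:, 0] + 0.4 * X[:, 1]).astype(int)
victim = CategoricalNB().fit(X, y)

def log_odds(p):
    """log P(C=1|x) - log P(C=0|x) from a probability pair."""
    return np.log(p[1]) - np.log(p[0])

# Three black-box queries determine intercept b and weights w0, w1 of the additive log-odds
answers = {tuple(x): victim.predict_proba([x])[0] for x in ([0, 0], [1, 0], [0, 1])}
b = log_odds(answers[(0, 0)])
w0 = log_odds(answers[(1, 0)]) - b
w1 = log_odds(answers[(0, 1)]) - b

def stolen_proba(x):
    """Clone: rebuild [P(C=0|x), P(C=1|x)] from the recovered parameters."""
    z = b + w0 * x[0] + w1 * x[1]
    p1 = 1.0 / (1.0 + np.exp(-z))
    return np.array([1.0 - p1, p1])

# [1, 1] was never queried, yet the clone reproduces the victim's output there
print("victim [1, 1]:", victim.predict_proba([[1, 1]])[0].round(4))
print("clone  [1, 1]:", stolen_proba([1, 1]).round(4))
```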

## 🛡️ Defenses Against Naive Bayes Attacks

While Naive Bayes is simple and efficient, it is vulnerable to many attacks due to its transparent structure and reliance on frequency counts.

### Practical Countermeasures:

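- **Laplace Smoothing**: Used by default in scikit-learn's `MultinomialNB` and `CategoricalNB` (`alpha=1.0`); it keeps feature values never seen with a class from getting zero probability, which limits (but does not eliminate) the effect of rare, trigger-like features.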
- **Input Validation**: Enforce constraints on allowed input values and feature ranges.
- **Outlier Detection**: Detect poisoned training examples or adversarial test inputs using clustering or anomaly scores.
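As a rough illustration of the outlier-detection idea, the sketch below uses a simple rule-based audit rather than clustering or anomaly scores: it flags feature values that are rare in the training set yet always paired with a single label, which is the footprint the HasMedicalID trigger leaves. The helper `flag_suspicious_values`, the 5% threshold, and the toy data are assumptions; a legitimately rare but informative feature would be flagged too, so hits still need human review.

```python
import pandas as pd

def flag_suspicious_values(df, label_col, max_support=0.05):
    """Hypothetical audit: flag feature values that are rare (at most max_support of
    rows) yet always occur with a single label, the footprint of a trigger backdoor."""
    flags = []
    for col in df.columns.drop(label_col):
        for value, group in df.groupby(col):
            support = len(group) / len(df)
            labels = group[label_col].unique()
            if support <= max_support and len(labels) == 1:
                flags.append((col, value, len(group), labels[0]))
    return flags

# Toy training set: 200 ordinary rows plus 8 rows where HasMedicalID=1 always has Sick=1
clean = pd.DataFrame({"Age": [0, 1] * 100, "Smoker": [0, 0, 1, 1] * 50,
                      "HasMedicalID": 0, "Sick": [0, 1, 1, 0] * 50})
poison = pd.DataFrame({"Age": [0] * 8, "Smoker": [0] * 8, "HasMedicalID": 1, "Sick": 1})
training = pd.concat([clean, poison], ignore_index=True)

for col, value, count, label in flag_suspicious_values(training, "Sick"):
    print(f"{col}={value}: {count} rows, all labelled {label}")
```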
