# Task 6: Explainability and LLMs

## Overview

In this notebook, we build a **natural language interface** to present machine learning explanations in human-friendly text. We use **LIME** (Local Interpretable Model-agnostic Explanations) via the original `lime` library and leverage a local LLM (Google Gemma via LM Studio) to generate accessible explanations.

### Key Objectives:
1. **Connect to Local LLM**: Set up LM Studio connection with Gemma model
2. **Load Explainability Data**: Generate LIME explanations for confident mistakes
3. **Build Explanation Functions**: Create simple and advanced interfaces
4. **Generate Natural Language Explanations**: Transform technical outputs into human-readable text

### Why LIME + Direct Context (No RAG)?
- **LIME** provides instance-specific, human-relatable explanations
- Our data is small (~10 instances) and fits in a single prompt context
- RAG would add complexity without practical benefit for our use case

See `doc/explainability-llms.md` for detailed rationale.
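To make "direct context" concrete, here is a minimal sketch. It assumes each instance's LIME summary has already been rendered as a plain string, as `build_instance_context` does later in this notebook. The context goes straight into the user message with no retrieval step. To check that the prompt stays within a budget, it uses a rough 4-characters-per-token heuristic, which is an assumption and not Gemma's tokenizer. `build_messages` and `MAX_PROMPT_TOKENS` are hypothetical names.

```python
def approx_tokens(text: str) -> int:
    """Very rough token estimate (~4 characters per token); a heuristic, not a real tokenizer."""
    return max(1, len(text) // 4)

MAX_PROMPT_TOKENS = 4096  # hypothetical budget; check the loaded model's actual context length

def build_messages(system_prompt: str, contexts: list[str], question: str) -> list[dict]:
    """Inject every instance context directly into a single user message (no retrieval)."""
    user = "\n\n".join(contexts) + f"\n\nQUESTION: {question}"
    total = approx_tokens(system_prompt) + approx_tokens(user)
    if total > MAX_PROMPT_TOKENS:
        # At this point a retrieval step (RAG) or fewer instances would start to pay off
        raise ValueError(f"Prompt too large (~{total} tokens)")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user},
    ]

# Ten short placeholder contexts, roughly the scale of this notebook's data
contexts = [f"PREDICTION DETAILS:\n- Instance ID: {i}" for i in range(10)]
msgs = build_messages("You explain ML predictions.", contexts, "Which instance is most surprising?")
print(len(msgs), approx_tokens(msgs[1]["content"]))
```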

## 1. Setup and Dependencies

In [9]:
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import joblib

from sklearn.model_selection import train_test_split
from aif360.datasets import AdultDataset

# LIME - the original implementation (Python 3.12+ compatible)
import lime
import lime.lime_tabular

# OpenAI-compatible client for LM Studio
from openai import OpenAI

print("✓ Libraries loaded successfully.")

✓ Libraries loaded successfully.


## 2. Connect to LM Studio

**Prerequisites:**
1. LM Studio is installed and running
2. A Gemma model is downloaded and loaded
3. Local server is started (Developer tab → Start Server)

By default, the server runs on `http://127.0.0.1:1234`, with the OpenAI-compatible API under `/v1`.

In [None]:
# Initialize OpenAI-compatible client for LM Studio
client = OpenAI(
    api_key="lm-studio",  # placeholder - not validated by LM Studio
    base_url="http://127.0.0.1:1234/v1"  # must include /v1
)

# Test connection
try:
    models = client.models.list()
    print("✓ Connected to LM Studio!")
    print("Available models:")
    for model in models.data:
        print(f"  - {model.id}")
except Exception as e:
    print(f"Error connecting to LM Studio: {e}")
    print("\nMake sure LM Studio is running with a model loaded and server started.")

✓ Connected to LM Studio!
Available models:
  - google/gemma-3-12b
  - text-embedding-nomic-embed-text-v1.5


In [11]:
# Configuration - adjust model name if using a different variant
MODEL_NAME = "google/gemma-3-12b"  # Change this to match your loaded model
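The model listing above also includes an embedding model, which cannot serve chat completions. The sketch below picks `MODEL_NAME` automatically. It assumes the OpenAI-style `/v1/models` shape (`{"data": [{"id": ...}, ...]}`). It also assumes that embedding models can be recognized by `"embed"` in their ID, which is a naming heuristic and not an LM Studio guarantee. `pick_chat_model` is a hypothetical helper.

```python
from typing import Optional

def pick_chat_model(models_response: dict, preferred: Optional[str] = None) -> str:
    """Choose a chat-capable model ID from an OpenAI-style /v1/models response."""
    ids = [m["id"] for m in models_response.get("data", [])]
    # Heuristic: treat IDs containing "embed" as embedding-only models
    chat_ids = [i for i in ids if "embed" not in i.lower()]
    if not chat_ids:
        raise RuntimeError("No chat model appears to be loaded in LM Studio.")
    if preferred in chat_ids:
        return preferred
    return chat_ids[0]

# Example with the model IDs printed above. With a live client, the same dict could be built as
# {"data": [{"id": m.id} for m in client.models.list().data]}
listing = {"data": [{"id": "google/gemma-3-12b"}, {"id": "text-embedding-nomic-embed-text-v1.5"}]}
print(pick_chat_model(listing))
```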

## 3. Load Data and Private Classifier

We reload the data and model from Task 5 to generate LIME explanations.

In [None]:
# 3.1 Load the Private Classifier
model_path = "models/the_private_classifier.joblib"
artifact = joblib.load(model_path)

clf_private = artifact["model"]
scaler_private = artifact["scaler"]
feature_names = artifact["feature_names"]
epsilon = artifact["epsilon"]

print("✓ Loaded THE PRIVATE CLASSIFIER")
print(f"  - Epsilon: {epsilon}")
print(f"  - Model type: {type(clf_private).__name__}")
print(f"  - Number of features: {len(feature_names)}")

✓ Loaded THE PRIVATE CLASSIFIER
  - Epsilon: 1.0
  - Model type: LogisticRegression
  - Number of features: 98


In [None]:
# 3.2 Load and Prepare Dataset
def custom_preprocessing(df):
    """Binarize age, encode race/sex - consistent with previous tasks."""
    median_age = df['age'].median()
    df['age_binary'] = (df['age'] > median_age).astype(float)
    df.drop(columns=['age'], inplace=True)
    df['race'] = (df['race'] == 'White').astype(float)
    df['sex'] = (df['sex'] == 'Male').astype(float)
    return df

# Load dataset
dataset = AdultDataset(
    custom_preprocessing=custom_preprocessing,
    protected_attribute_names=['age_binary', 'sex'],
    privileged_classes=[np.array([1.0]), np.array([1.0])]
)

df_true = pd.DataFrame(dataset.features, columns=dataset.feature_names)
df_true['income'] = dataset.labels.ravel()

print(f"✓ Dataset loaded: {df_true.shape}")



✓ Dataset loaded: (45222, 99)
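`custom_preprocessing` compares age against the median with a strict `>`. People exactly at the median age therefore fall into the "younger" group (`age_binary = 0`). The toy sketch below uses made-up ages, not Adult data, to show this boundary behaviour. It is worth keeping in mind when reading the age thresholds quoted to the LLM later.

```python
import pandas as pd

# Toy ages (made up) with an even count, so the median falls on a repeated value
toy = pd.DataFrame({"age": [25, 31, 38, 38, 47, 60]})
median_age = toy["age"].median()
toy["age_binary"] = (toy["age"] > median_age).astype(float)  # same rule as custom_preprocessing
print(median_age, toy["age_binary"].tolist())
```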


In [None]:
# 3.3 Apply Differential Privacy (reproduce from Task 5)
def dp_randomized_response(categories, epsilon, k=4):
    """Implements randomized response mechanism for differential privacy."""
    categories = np.asarray(categories, dtype=int)
    n = len(categories)
    exp_eps = np.exp(epsilon)
    p = exp_eps / (exp_eps + k - 1)
    
    reports = np.empty_like(categories)
    u = np.random.rand(n)
    same = (u < p)
    reports[same] = categories[same]
    
    num_flip = np.sum(~same)
    if num_flip > 0:
        true_vals = categories[~same]
        alt = np.random.randint(0, k-1, size=num_flip)
        alt += (alt >= true_vals).astype(int)
        reports[~same] = alt
    
    return reports, p

# Create private dataset
df_true['age_sex_cat'] = (df_true['age_binary'].astype(int) * 2 + df_true['sex'].astype(int))
np.random.seed(42)
reports, p_truth = dp_randomized_response(df_true['age_sex_cat'], epsilon, k=4)

df_private = df_true.copy()
df_private['age_binary'] = (reports // 2).astype(float)
df_private['sex'] = (reports % 2).astype(float)

df_true.drop(columns=['age_sex_cat'], inplace=True)
df_private.drop(columns=['age_sex_cat'], inplace=True)

print(f"✓ Private dataset created (ε={epsilon}, truth probability: {p_truth:.1%})")

✓ Private dataset created (ε=1.0, truth probability: 47.5%)
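As a sanity check, the sketch below re-implements the same k-ary randomized response on synthetic data and measures how often each category is reported. It uses NumPy's `Generator` API as a substitute for the global `np.random` state used above, and `k_rr` is a hypothetical name. With ε = 1 and k = 4, the truth probability is p = e/(e + 3) ≈ 0.475, which matches the printed 47.5%. Each of the three other categories should be reported with probability (1 - p)/3 ≈ 0.175.

```python
import numpy as np

def k_rr(categories, epsilon, k, rng):
    """k-ary randomized response: keep the true value w.p. p, else report one of the other k-1 uniformly."""
    categories = np.asarray(categories, dtype=int)
    p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    keep = rng.random(categories.size) < p
    # Draw from {0, ..., k-2} and shift values >= the true category up by one,
    # which gives a uniform draw over the k-1 *other* categories
    alt = rng.integers(0, k - 1, size=categories.size)
    alt += (alt >= categories)
    return np.where(keep, categories, alt), p

rng = np.random.default_rng(0)
true_cats = np.full(200_000, 2)  # synthetic: everyone truly in category 2
reports, p = k_rr(true_cats, epsilon=1.0, k=4, rng=rng)
truthful = np.mean(reports == 2)
others = [np.mean(reports == c) for c in (0, 1, 3)]
print(f"p = {p:.4f}, observed truthful = {truthful:.4f}, others = {np.round(others, 3)}")
```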


In [None]:
# 3.4 Identify Confident Mistakes
X_private = df_private[feature_names].values
y_private = df_private['income'].values

X_train, X_test, y_train, y_test = train_test_split(
    X_private, y_private, test_size=0.3, random_state=1, stratify=y_private
)

df_private_train, df_private_test = train_test_split(
    df_private, test_size=0.3, random_state=1, stratify=df_private['income']
)

# Get predictions
X_test_scaled = scaler_private.transform(X_test)
y_pred = clf_private.predict(X_test_scaled)
y_proba = clf_private.predict_proba(X_test_scaled)[:, 1]

# Find confident mistakes
results_df = df_private_test.copy()
results_df['true_label'] = y_test
results_df['prediction'] = y_pred
results_df['probability'] = y_proba
results_df['confidence'] = np.where(y_pred == 1, y_proba, 1 - y_proba)

mistakes = results_df[results_df['true_label'] != results_df['prediction']]
confident_mistakes = mistakes[mistakes['confidence'] > 0.80].sort_values(by='confidence', ascending=False)
top_10_mistakes = confident_mistakes.head(10)

print(f"✓ Found {len(confident_mistakes)} confident mistakes (>80% confidence)")
print("  Top 10 will be used for LIME explanations")

✓ Found 495 confident mistakes (>80% confidence)
  Top 10 will be used for LIME explanations
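The "confident mistake" filter can be checked on a toy example. Confidence is the probability of the predicted class, max(p, 1 - p). A mistake is kept only if that confidence exceeds 0.80, and the survivors are sorted from most to least confident. The probabilities and labels below are invented for illustration.

```python
import numpy as np
import pandas as pd

proba = np.array([0.97, 0.10, 0.55, 0.02, 0.85])  # made-up P(income > 50K)
true = np.array([0, 0, 1, 1, 1])
pred = (proba > 0.5).astype(int)  # 0.5 threshold, as for a binary LogisticRegression

toy = pd.DataFrame({"true_label": true, "prediction": pred, "probability": proba})
toy["confidence"] = np.where(pred == 1, proba, 1 - proba)
mistakes = toy[toy["true_label"] != toy["prediction"]]
confident = mistakes[mistakes["confidence"] > 0.80].sort_values("confidence", ascending=False)
print(confident.index.tolist())
```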


## 4. Generate LIME Explanations

We use the `LimeTabularExplainer` from the original `lime` library to generate local explanations for each confident mistake.
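For intuition about what `explain_instance` does, the sketch below fits a simplified LIME-style local surrogate from scratch with NumPy. It is not the `lime` library's implementation, because `lime` also discretizes continuous features into quartiles by default and samples perturbations differently. This version makes three choices:

- It perturbs the instance with Gaussian noise in standardized units.
- It weights each sample with the exponential kernel `sqrt(exp(-d**2 / w**2))`, where `w = 0.75 * sqrt(num_features)` (the defaults `LimeTabularExplainer` uses).
- It fits a weighted ridge regression with `alpha = 1` and an unpenalized intercept.

`local_surrogate` and `toy_proba` are hypothetical names.

```python
import numpy as np

def local_surrogate(predict_proba, x, scale, num_samples=2000, alpha=1.0, seed=0):
    """Simplified LIME-style explanation of P(class 1) around a single instance x."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    z = rng.normal(size=(num_samples, d))  # perturbations in standardized units
    z[0] = 0.0                             # include the instance itself
    samples = x + z * scale
    y = predict_proba(samples)[:, 1]

    kernel_width = 0.75 * np.sqrt(d)       # lime's default width for tabular data
    dist = np.linalg.norm(z, axis=1)
    w = np.sqrt(np.exp(-(dist ** 2) / kernel_width ** 2))

    # Weighted ridge with an unpenalized intercept: (Z'WZ + alpha*I) beta = Z'Wy
    Z = np.hstack([np.ones((num_samples, 1)), z])
    A = Z.T @ (Z * w[:, None]) + alpha * np.eye(d + 1)
    A[0, 0] -= alpha
    beta = np.linalg.solve(A, Z.T @ (w * y))
    return beta[1:]                        # one local weight per feature

# Toy black box: uses feature 0 positively, feature 1 negatively, ignores feature 2
def toy_proba(X):
    p1 = 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 1.0 * X[:, 1])))
    return np.column_stack([1 - p1, p1])

coefs = local_surrogate(toy_proba, x=np.zeros(3), scale=np.ones(3))
print(np.round(coefs, 3))
```

On this toy model, the surrogate weights should be positive for feature 0, negative for feature 1, and near zero for feature 2. These signs mirror the true logit, which is the kind of local story the LIME output below is meant to tell.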

In [None]:
# 4.1 Setup LIME Tabular Explainer
# Identify categorical feature indices
categorical_features = [feature_names.index(f) for f in ['age_binary', 'sex', 'race'] if f in feature_names]
categorical_names = {i: ['0', '1'] for i in categorical_features}

# Create LIME explainer
lime_explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_train,
    feature_names=feature_names,
    class_names=['≤50K', '>50K'],
    categorical_features=categorical_features,
    categorical_names=categorical_names,
    mode='classification',
    random_state=42
)

# Define prediction function for LIME
def predict_proba_fn(X):
    """Wrapper for the private classifier's predict_proba."""
    X_scaled = scaler_private.transform(X)
    return clf_private.predict_proba(X_scaled)

print("✓ LIME explainer ready")
print(f"  Categorical features: {[feature_names[i] for i in categorical_features]}")

✓ LIME explainer ready
  Categorical features: ['age_binary', 'sex', 'race']


In [None]:
# 4.2 Generate LIME Explanations for Confident Mistakes
mistake_indices = top_10_mistakes.index.tolist()
print(f"Generating LIME explanations for {len(mistake_indices)} instances...")

# Store raw LIME explanation objects
lime_raw_explanations = {}
for idx in mistake_indices:
    instance = df_private_test.loc[idx, feature_names].values
    exp = lime_explainer.explain_instance(
        instance, 
        predict_proba_fn, 
        num_features=10,  # Top 10 features
        num_samples=1000
    )
    lime_raw_explanations[idx] = exp

print("✓ LIME explanations generated")

Generating LIME explanations for 10 instances...
✓ LIME explanations generated


In [None]:
# 4.3 Extract and Store LIME Data in a Structured Format
lime_data_store = {}

for idx in mistake_indices:
    instance_data = top_10_mistakes.loc[idx]
    exp = lime_raw_explanations[idx]
    
    # Extract feature contributions as a list of (feature_description, value, importance).
    # exp.as_list() yields (description, weight) pairs such as "capital-gain > 0.00";
    # exp.as_map()[1] yields (feature_index, weight) pairs for the same features in the
    # same order, so each value is looked up by column index rather than by substring
    # matching (fragile when one feature name is contained in another).
    lime_features = []
    instance_values = df_private_test.loc[idx, feature_names]
    
    for (feature_desc, importance), (feat_idx, _) in zip(exp.as_list(label=1), exp.as_map()[1]):
        fn = feature_names[feat_idx]
        lime_features.append((feature_desc, instance_values[fn], importance))
    
    lime_data_store[idx] = {
        'instance_id': idx,
        'prediction': int(instance_data['prediction']),
        'true_label': int(instance_data['true_label']),
        'confidence': instance_data['confidence'],
        'is_correct': instance_data['prediction'] == instance_data['true_label'],
        'age_binary': instance_data['age_binary'],
        'sex': instance_data['sex'],
        'race': instance_data['race'],
        'lime_features': lime_features,
        'lime_explanation': exp  # Keep raw explanation for potential visualization
    }

print(f"✓ Stored LIME data for {len(lime_data_store)} instances")
print(f"  Available instance IDs: {list(lime_data_store.keys())}")

✓ Stored LIME data for 10 instances
  Available instance IDs: [29306, 9839, 8231, 8113, 34957, 10949, 19246, 5569, 16278, 44264]


## 5. Build LLM Explanation Interface

### 5.1 Simple Function Interface (Baseline)

A straightforward function that takes an instance ID and returns a natural language explanation.

In [None]:
# System prompt that establishes the LLM's role
SYSTEM_PROMPT = """You are an AI assistant that explains machine learning predictions in simple, human-friendly language. 

You have access to LIME explanations that show which features influenced a specific prediction and by how much. Positive scores push toward predicting HIGH income (>$50K), negative scores push toward LOW income (≤$50K).

Guidelines:
- Use plain language, avoid technical jargon
- Explain feature contributions in terms of real-world meaning
- Focus on the top 3-5 most important factors
- Be honest if the prediction was wrong
- Provide context about why certain features matter for income prediction
- Keep responses concise but informative (2-3 paragraphs)

The model predicts whether someone earns more than $50,000 per year based on census data."""

def build_instance_context(instance_id):
    """Build context string for a specific instance."""
    if instance_id not in lime_data_store:
        return None
    
    data = lime_data_store[instance_id]
    
    # Format prediction info
    pred_class = ">$50K (High Income)" if data['prediction'] == 1 else "≤$50K (Low Income)"
    true_class = ">$50K (High Income)" if data['true_label'] == 1 else "≤$50K (Low Income)"
    status = "CORRECT" if data['is_correct'] else "INCORRECT"
    
    # Format demographics
    age_desc = "older (>38 years)" if data['age_binary'] == 1.0 else "younger (≤38 years)"
    sex_desc = "Male" if data['sex'] == 1.0 else "Female"
    race_desc = "White" if data['race'] == 1.0 else "Non-White"
    
    # Format LIME features (top 10)
    lime_features = data['lime_features'][:10]
    lime_str = "\n".join([
        f"  {i+1}. {feat[0]}: value={feat[1]}, importance={feat[2]:.4f} ({'pushes toward HIGH' if feat[2] > 0 else 'pushes toward LOW'})"
        for i, feat in enumerate(lime_features)
    ])
    
    context = f"""PREDICTION DETAILS:
- Instance ID: {instance_id}
- Model Predicted: {pred_class}
- Confidence: {data['confidence']*100:.1f}%
- Actual Income: {true_class}
- Prediction Status: {status}

PERSON'S DEMOGRAPHICS:
- Age: {age_desc}
- Sex: {sex_desc}
- Race: {race_desc}

LIME EXPLANATION (features that influenced this prediction, ranked by importance):
{lime_str}"""
    
    return context

print("✓ Context builder ready")

✓ Context builder ready
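The system prompt asks the LLM to focus on the top 3-5 factors. One option is to pre-compute that shortlist, along with the total positive and negative weight, so the model does not have to do the arithmetic itself. The sketch below assumes features arrive as `(description, value, importance)` tuples like those in `lime_data_store`. `summarize_lime` is a hypothetical helper, and the sample tuples are invented.

```python
def summarize_lime(features, k=3):
    """Return the k largest-|importance| features plus the total push toward HIGH and LOW income."""
    ranked = sorted(features, key=lambda f: abs(f[2]), reverse=True)
    push_high = sum(f[2] for f in features if f[2] > 0)
    push_low = sum(f[2] for f in features if f[2] < 0)
    lines = [f"- {desc} ({'HIGH' if imp > 0 else 'LOW'}, {imp:+.3f})" for desc, _, imp in ranked[:k]]
    lines.append(f"Net push: {push_high + push_low:+.3f} (HIGH {push_high:+.3f}, LOW {push_low:+.3f})")
    return "\n".join(lines)

# Invented example in the same shape as lime_data_store[...]["lime_features"]
sample = [
    ("capital-gain > 0.00", 5000.0, 0.61),
    ("native-country=Italy <= 0.00", 0.0, -0.08),
    ("education=Preschool <= 0.00", 0.0, 0.05),
    ("marital-status=Married-civ-spouse > 0.00", 1.0, -0.12),
]
print(summarize_lime(sample))
```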


In [None]:
def explain_instance(instance_id: int) -> str:
    """
    Generate a human-friendly explanation for a specific prediction.
    
    Args:
        instance_id: The ID of the instance to explain (from confident mistakes)
    
    Returns:
        Natural language explanation of the prediction
    """
    # Build context
    context = build_instance_context(instance_id)
    if context is None:
        return f"Error: Instance {instance_id} not found. Available IDs: {list(lime_data_store.keys())}"
    
    # Build the prompt
    user_message = f"""Based on the following LIME explanation data, explain why the model made this prediction in plain language that a non-technical person could understand.

{context}

Please explain:
1. What the model predicted and how confident it was
2. The main factors that led to this prediction
3. Whether the prediction was correct, and if not, why the model might have been misled"""
    
    try:
        completion = client.chat.completions.create(
            model=MODEL_NAME,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_message}
            ],
            temperature=0.7,
            max_tokens=500
        )
        return completion.choices[0].message.content
    except Exception as e:
        return f"Error generating explanation: {e}"

print("✓ explain_instance() function ready")

✓ explain_instance() function ready


### 5.2 Test the Simple Interface

Let's test with one of the confident mistakes.

In [21]:
# Test with the first confident mistake
first_instance_id = mistake_indices[0]
print(f"Explaining instance {first_instance_id}...")
print("=" * 70)

explanation = explain_instance(first_instance_id)
print(explanation)

Explaining instance 29306...
The model incorrectly predicted that this person earns over $50,000 per year with 100% confidence. Unfortunately, their actual income is $50,000 or less. This means the model made a mistake!

Here's what influenced the prediction: The biggest factor pushing the model towards a high-income prediction was a large "capital gain" - which represents profit from selling stocks or other assets. A high capital gain strongly suggests higher income. Surprisingly, several factors related to their country of origin (Italy, Cuba, England, Cambodia, Hungary) and education level (Preschool, Doctorate, Prof-school) pulled the model *away* from a high-income prediction, but these were outweighed by the impact of the capital gain. It's common for income models to consider where someone is originally from and their educational background - people who immigrated from certain countries or have lower levels of education sometimes earn less on average.

Because the model focused 

In [22]:
# Test with another instance
second_instance_id = mistake_indices[1]
print(f"Explaining instance {second_instance_id}...")
print("=" * 70)

explanation = explain_instance(second_instance_id)
print(explanation)

Explaining instance 9839...
Okay, let's break down why the model made its prediction for this person.

The model incorrectly predicted that this individual earns $50,000 or less per year with a very high confidence of 99.9%. However, in reality, they actually earn more than $50,000! The main reason for this misjudgment seems to be the lack of capital gains (profit from investments).  The model strongly associated zero capital gains with lower incomes. Additionally, the person's country of origin – specifically being from Italy, France, and Taiwan – also contributed negatively toward a low-income prediction according to the model. 

Interestingly, some factors slightly pushed the prediction towards higher income, like having an occupation as a "Private house servant" or having only preschool education. The fact that this person doesn’t receive payment for their work (workclass=Without-pay) also pulled the prediction toward lower income. It's likely the negative impact of zero capital ga
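Both responses above stop mid-sentence. One plausible cause, not verified here, is the `max_tokens=500` cap in `explain_instance`. The OpenAI chat completions format reports that case as `finish_reason == "length"` on the returned choice. A small helper can therefore flag truncation instead of silently returning clipped text. The sketch assumes LM Studio fills in `finish_reason` the way the OpenAI API does. `text_or_flag` is a hypothetical helper, and the example uses `SimpleNamespace` stand-ins instead of a live response.

```python
from types import SimpleNamespace

def text_or_flag(completion) -> str:
    """Return the message text, appending a visible note if generation hit the token limit."""
    choice = completion.choices[0]
    text = choice.message.content or ""
    if getattr(choice, "finish_reason", None) == "length":
        text = text.rstrip() + "\n\n[Response truncated: increase max_tokens or ask for a shorter answer.]"
    return text

# Stand-ins shaped like an OpenAI chat completion (choices[0].message.content / .finish_reason)
cut = SimpleNamespace(choices=[SimpleNamespace(
    message=SimpleNamespace(content="Because the model focused "), finish_reason="length")])
done = SimpleNamespace(choices=[SimpleNamespace(
    message=SimpleNamespace(content="All done."), finish_reason="stop")])
print(text_or_flag(cut))
print(text_or_flag(done))
```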

### 5.3 Interactive Chat Interface (Advanced)

A multi-turn conversation interface that maintains context and supports follow-up questions.

In [None]:
class ExplainabilityChat:
    """Interactive chat interface for ML explanations."""
    
    def __init__(self):
        self.messages = [{"role": "system", "content": SYSTEM_PROMPT}]
        self.current_instance = None
        
        # IDs of instances whose LIME context can be loaded into the conversation
        self.available_instances = list(lime_data_store.keys())
        
    def set_instance(self, instance_id: int) -> str:
        """Set the current instance for discussion."""
        if instance_id not in lime_data_store:
            return f"Instance {instance_id} not found. Available: {self.available_instances}"
        
        self.current_instance = instance_id
        context = build_instance_context(instance_id)
        
        # Add context to conversation
        context_message = f"[Context loaded for Instance {instance_id}]\n\n{context}"
        self.messages.append({"role": "system", "content": context_message})
        
        return f"Loaded instance {instance_id}. You can now ask questions about this prediction."
    
    def chat(self, user_message: str) -> str:
        """Send a message and get a response."""
        # The LLM only sees instance data injected by set_instance(), so require one first
        if self.current_instance is None:
            return "Please first select an instance using set_instance(id)."
        
        self.messages.append({"role": "user", "content": user_message})
        
        try:
            completion = client.chat.completions.create(
                model=MODEL_NAME,
                messages=self.messages,
                temperature=0.7,
                max_tokens=500
            )
            response = completion.choices[0].message.content
            self.messages.append({"role": "assistant", "content": response})
            return response
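        except Exception as e:
            # Drop the unanswered user turn so a retry doesn't leave two user messages in a row
            self.messages.pop()
            return f"Error: {e}"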
    
    def reset(self):
        """Reset the conversation."""
        self.messages = [{"role": "system", "content": SYSTEM_PROMPT}]
        self.current_instance = None
        return "Conversation reset."
    
    def list_instances(self):
        """List available instances with summary."""
        print("Available instances for explanation:")
        print("-" * 60)
        for idx in self.available_instances:
            data = lime_data_store[idx]
            pred = ">50K" if data['prediction'] == 1 else "≤50K"
            true = ">50K" if data['true_label'] == 1 else "≤50K"
            status = "✓" if data['is_correct'] else "✗"
            print(f"  ID {idx}: Predicted {pred}, Actually {true} {status} (Conf: {data['confidence']*100:.1f}%)")

# Initialize the chat interface
chat = ExplainabilityChat()
print("✓ Interactive chat interface ready")
print(f"  Available instances: {chat.available_instances}")

✓ Interactive chat interface ready
  Available instances: [29306, 9839, 8231, 8113, 34957, 10949, 19246, 5569, 16278, 44264]
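`ExplainabilityChat` appends every turn to `self.messages`, so a long session will eventually exceed a local model's context window. The sketch below shows one mitigation. It assumes it is acceptable to forget the oldest user/assistant messages while keeping every `system` message, which covers both the base prompt and any loaded instance context. `trim_history` and `max_messages` are hypothetical names.

```python
def trim_history(messages, max_messages=6):
    """Keep every system message plus the last `max_messages` non-system messages, in order."""
    non_system_idx = [i for i, m in enumerate(messages) if m["role"] != "system"]
    keep_idx = set(non_system_idx[-max_messages:]) if max_messages > 0 else set()
    return [m for i, m in enumerate(messages) if m["role"] == "system" or i in keep_idx]

history = [{"role": "system", "content": "SYSTEM_PROMPT"},
           {"role": "system", "content": "[Context loaded for Instance 29306] ..."}]
for n in range(5):
    history.append({"role": "user", "content": f"question {n}"})
    history.append({"role": "assistant", "content": f"answer {n}"})

trimmed = trim_history(history, max_messages=4)
print([m["content"] for m in trimmed])
```

One way to wire this in would be to call `self.messages = trim_history(self.messages)` just before the API call in `chat()`. Because `set_instance()` stores context as system messages, the loaded instance data would survive trimming.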


In [24]:
# List all available instances
chat.list_instances()

Available instances for explanation:
------------------------------------------------------------
  ID 29306: Predicted >50K, Actually ≤50K ✗ (Conf: 100.0%)
  ID 9839: Predicted ≤50K, Actually >50K ✗ (Conf: 99.9%)
  ID 8231: Predicted ≤50K, Actually >50K ✗ (Conf: 99.9%)
  ID 8113: Predicted ≤50K, Actually >50K ✗ (Conf: 99.7%)
  ID 34957: Predicted ≤50K, Actually >50K ✗ (Conf: 99.5%)
  ID 10949: Predicted ≤50K, Actually >50K ✗ (Conf: 99.5%)
  ID 19246: Predicted >50K, Actually ≤50K ✗ (Conf: 99.2%)
  ID 5569: Predicted ≤50K, Actually >50K ✗ (Conf: 99.1%)
  ID 16278: Predicted ≤50K, Actually >50K ✗ (Conf: 99.1%)
  ID 44264: Predicted ≤50K, Actually >50K ✗ (Conf: 99.0%)


### 5.4 Demo: Interactive Chat Session

Let's demonstrate a multi-turn conversation about a prediction.

In [None]:
# Start a new conversation about an instance
chat.reset()
instance_to_discuss = mistake_indices[0]
print(chat.set_instance(instance_to_discuss))

✓ Loaded instance 29306. You can now ask questions about this prediction.


In [26]:
# First question
response = chat.chat("Why did the model predict this person would have high income?")
print("User: Why did the model predict this person would have high income?")
print("-" * 60)
print(f"Assistant: {response}")

User: Why did the model predict this person would have high income?
------------------------------------------------------------
Assistant: The model incorrectly predicted that this individual earns over $50,000 a year. It seems the biggest factor pushing the prediction towards a high income was their capital gain – essentially, profit made from selling stocks or other assets. A large capital gain like theirs (over $34,000) often signals higher overall wealth and income. Additionally, the model considered their education level; having attended preschool nudged the prediction slightly toward a higher income.

However, several factors pulled the prediction *down* towards a lower income. The model penalized this person for being born in Italy or Cuba, and also for not having advanced degrees like a doctorate or attending prestigious schools such as professional school. Being married also lowered the predicted income, which can sometimes happen because traditionally, one partner might be t

In [27]:
# Follow-up question (uses conversation history)
response = chat.chat("Was this prediction correct? If not, what went wrong?")
print("User: Was this prediction correct? If not, what went wrong?")
print("-" * 60)
print(f"Assistant: {response}")

User: Was this prediction correct? If not, what went wrong?
------------------------------------------------------------
Assistant: No, this prediction was incorrect. The model predicted the person earns over $50,000 per year (high income), but their actual income is $50,000 or less (low income).

Here's what likely went wrong: While a large capital gain (profit from investments) strongly suggested a high income to the model, other factors pulled the prediction downwards. These included the person’s country of origin (Italy and Cuba), their education level (only preschool), and marital status (being married). The model gave too much weight to these negative influences compared to the positive influence of the capital gain, leading it astray. Essentially, the model focused on a few key indicators but didn't fully account for the nuances that affect someone's income.


In [28]:
# Another follow-up
response = chat.chat("In simple terms, what is the single most important factor that influenced this prediction?")
print("User: In simple terms, what is the single most important factor that influenced this prediction?")
print("-" * 60)
print(f"Assistant: {response}")

User: In simple terms, what is the single most important factor that influenced this prediction?
------------------------------------------------------------
Assistant: The single biggest thing influencing the model’s incorrect prediction was the person's **capital gain**, which was over $34,000. This large profit from investments strongly suggested a high income to the model, but unfortunately wasn't enough to overcome other negative factors.


## 6. Utility Functions

Additional helper functions to explore different aspects of the model's behavior.

In [None]:
def explain_all_instances():
    """Generate explanations for all confident mistakes and display them."""
    print("=" * 70)
    print("GENERATING EXPLANATIONS FOR ALL CONFIDENT MISTAKES")
    print("=" * 70)
    
    for idx in lime_data_store.keys():
        data = lime_data_store[idx]
        pred = ">$50K" if data['prediction'] == 1 else "≤$50K"
        true = ">$50K" if data['true_label'] == 1 else "≤$50K"
        
        print(f"\n{'='*70}")
        print(f"Instance {idx}: Predicted {pred}, Actually {true}")
        print(f"Confidence: {data['confidence']*100:.1f}%")
        print("=" * 70)
        
        explanation = explain_instance(idx)
        print(explanation)
        print()

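def compare_instances(id1: int, id2: int):
    """Compare explanations for two instances."""
    context1 = build_instance_context(id1)
    context2 = build_instance_context(id2)
    if context1 is None or context2 is None:
        return f"Error: instance not found. Available IDs: {list(lime_data_store.keys())}"
    
    prompt = f"""Compare the following two predictions and explain the key differences in why the model made different predictions for each person.

INSTANCE 1:
{context1}

INSTANCE 2:
{context2}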

Please explain:
1. What factors led to different predictions for these two people?
2. Are there any surprising similarities or differences?
3. What does this tell us about how the model makes decisions?"""
    
    try:
        completion = client.chat.completions.create(
            model=MODEL_NAME,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": prompt}
            ],
            temperature=0.7,
            max_tokens=600
        )
        return completion.choices[0].message.content
    except Exception as e:
        return f"Error: {e}"

print("✓ Utility functions ready")

✓ Utility functions ready
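`compare_instances` leaves the whole comparison to the LLM. A deterministic pre-pass can list which features appear in both explanations and whether they push in the same direction. This gives the LLM, or a reader, concrete facts to check the generated comparison against. The sketch below assumes both inputs are `lime_features`-style `(description, value, importance)` lists. It strips LIME's numeric thresholds to recover a feature key, while categorical descriptions keep their `=value` part. `feature_key` and `diff_lime` are hypothetical helpers, and the sample data is invented.

```python
import re

_OPS = re.compile(r"\s*(?:<=|>=|<|>)\s*")

def feature_key(desc: str) -> str:
    """Strip LIME threshold text, e.g. '0.00 < capital-gain <= 5000.00' -> 'capital-gain'."""
    for part in _OPS.split(desc):
        try:
            float(part)          # skip numeric threshold pieces
        except ValueError:
            return part.strip()
    return desc

def diff_lime(features_a, features_b):
    """Group features by whether they appear in both explanations and push the same way."""
    a = {feature_key(d): imp for d, _, imp in features_a}
    b = {feature_key(d): imp for d, _, imp in features_b}
    shared = sorted(set(a) & set(b))
    return {
        "same_direction": [f for f in shared if (a[f] > 0) == (b[f] > 0)],
        "opposite_direction": [f for f in shared if (a[f] > 0) != (b[f] > 0)],
        "only_a": sorted(set(a) - set(b)),
        "only_b": sorted(set(b) - set(a)),
    }

a = [("capital-gain > 0.00", 5000.0, 0.61), ("native-country=Italy <= 0.00", 0.0, -0.08),
     ("education=Preschool <= 0.00", 0.0, 0.05)]
b = [("capital-gain <= 0.00", 0.0, -0.55), ("native-country=Italy <= 0.00", 0.0, -0.07),
     ("workclass=Without-pay <= 0.00", 0.0, -0.03)]
print(diff_lime(a, b))
```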


In [30]:
# Compare two instances
id1, id2 = mistake_indices[0], mistake_indices[1]
print(f"Comparing instances {id1} and {id2}...")
print("=" * 70)

comparison = compare_instances(id1, id2)
print(comparison)

Comparing instances 29306 and 9839...
Okay, let's break down why the model got these two predictions wrong and what we can learn from it. Both instances were incorrectly classified – predicting low income for someone who earned high income (Instance 1) and vice versa (Instance 2).

**What led to different predictions?** The biggest difference lies in how the model interpreted "capital gain." For Instance 1, a large capital gain (profit from selling an asset like stocks) strongly pushed the prediction *toward* a high income. This was the most important factor influencing the decision. Conversely, for Instance 2, the lack of any capital gain heavily influenced the model to predict a low income - and this had the largest influence on the prediction. Beyond that, both instances saw several "native country" features pulling in opposite directions – some countries were associated with higher incomes (pushing towards high income), while others were linked to lower incomes.

**Similarities & S

## 7. Conclusion

### Summary of Task 6: Explainability & LLMs

In this notebook, we built a **natural language interface** for explaining individual predictions made by our differentially private classifier. The key components:

**Architecture Choices:**
- **LIME** (Local Interpretable Model-agnostic Explanations) - Provides instance-specific feature importance that directly answers "why this prediction"
- **Direct Context Injection** (not RAG) - Our explanation data is structured and small enough to inject directly into prompts
- **LM Studio** with a local LLM - Keeps the data and prompts on the local machine, so nothing is sent to a cloud API

**Two-Tier Interface:**
1. **Simple Function** (`explain_instance()`) - Quick, one-shot explanations for any instance
2. **Chat Class** (`ExplainabilityChat`) - Interactive, conversational interface with memory

**Key Insights from the Explanations:**
- The private classifier relies heavily on **marital-status** and **relationship** features
- Economic factors like **capital-gain** and **education** also play significant roles
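- Sensitive attributes (age, sex) appear to have lower direct influence. This is consistent with the randomized response step: at ε = 1, only about 47.5% of the reported age/sex categories are truthful, which weakens their association with income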

**Limitations:**
- LIME explanations are approximations of the model's true decision boundary
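- LLM responses depend on prompt quality and model capabilities, and can misstate the data they are given. For example, the comparison above describes Instance 1 as predicted low income, although the model predicted high income for it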
- Local LLMs may have lower performance than cloud alternatives

This approach helps bridge the gap between technical ML explanations and human-understandable language. It makes the model's decisions more accessible to non-technical stakeholders, provided the generated text is checked against the underlying LIME data.