# Loan Eligibility Prediction with LIME Explanations

This notebook builds a machine learning model to predict loan eligibility, then uses LIME (Local Interpretable Model-agnostic Explanations) to explain the model's decisions for individual applications.

**Workflow:**
1.  **Setup:** Install necessary libraries and configure Kaggle API access.
2.  **Data Loading:** Load the loan eligibility dataset from Kaggle Hub.
3.  **Data Cleaning & Preparation:** Clean the data, handle missing values, and transform the target variable.
4.  **Preprocessing:** Define and apply preprocessing steps for numerical and categorical features using `ColumnTransformer`.
5.  **Train-Test Split:** Split the data into training and testing sets.
6.  **Model Training:** Train a Gradient Boosting Classifier.
7.  **Model Evaluation:** Evaluate the model's performance.
8.  **XAI with LIME:**
    *   Set up the LIME explainer.
    *   Develop a function to explain individual predictions.
    *   Demonstrate explanations for sample approved and rejected cases.

Let's start by installing the required packages.

## 1. Setup: Install Libraries
We install `kagglehub` for dataset access, `kaggle` for the Kaggle API, `lime` for explainability, and the standard data science libraries `scikit-learn`, `pandas`, and `numpy`. `scipy` is listed explicitly because LIME depends on it.

In [23]:
!pip install kagglehub kaggle lime scikit-learn pandas numpy scipy



## 2. Configure Kaggle API Access
To download datasets from Kaggle using `kagglehub` or the `kaggle` CLI, we need to authenticate. This cell configures Kaggle API access using credentials stored in Colab Secrets.

**Before running this, ensure you have:**
1. Added your Kaggle username as a Colab Secret named `KAGGLE_USERNAME`.
2. Added your Kaggle API key (from your `kaggle.json` file) as a Colab Secret named `KAGGLE_KEY`.

In [24]:
from google.colab import userdata
import os

os.environ['KAGGLE_USERNAME'] = userdata.get('KAGGLE_USERNAME')
os.environ['KAGGLE_KEY'] = userdata.get('KAGGLE_KEY')

# Create the .kaggle directory and kaggle.json
# This is often necessary for the Kaggle API tools to find the credentials.
os.makedirs("/content/.kaggle", exist_ok=True)

kaggle_json_content = f'{{"username":"{os.environ["KAGGLE_USERNAME"]}","key":"{os.environ["KAGGLE_KEY"]}"}}'
with open("/content/.kaggle/kaggle.json", "w") as f:
    f.write(kaggle_json_content)
# Owner read/write only. The mode must be octal: decimal 600 would set different permission bits.
os.chmod("/content/.kaggle/kaggle.json", 0o600)
os.environ['KAGGLE_CONFIG_DIR'] = "/content/.kaggle" # Point Kaggle tools to this directory

print("Kaggle credentials configured from Colab Secrets.")

Kaggle credentials configured from Colab Secrets.


## 3. Import Libraries and Load Data

Now we import all necessary Python libraries and load the "Loan Eligible Dataset" from Kaggle Hub. We'll be using the `loan-train.csv` file as it contains the target variable `Loan_Status`.

`KaggleDatasetAdapter.PANDAS` is passed to `kagglehub.load_dataset` so that the file is returned directly as a pandas DataFrame.

In [25]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import StandardScaler, OrdinalEncoder, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
import lime
import lime.lime_tabular
import warnings
import kagglehub
from kagglehub import KaggleDatasetAdapter # Import KaggleDatasetAdapter
import scipy # Import scipy

warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=RuntimeWarning)

# --- 1. Load Data from Kaggle Hub ---
print("--- Loading dataset from Kaggle Hub ---")
try:
    df = kagglehub.load_dataset(
        handle="vikasukani/loan-eligible-dataset",
        path="loan-train.csv", # Specify the training file
        adapter=KaggleDatasetAdapter.PANDAS # Explicitly add the adapter
    )
    print("Dataset loaded successfully.")
    print("First 5 records:", df.head())
except Exception as e:
    print(f"Error loading dataset from Kaggle Hub: {e}")
    print("Please ensure:")
    print("1. You have 'kagglehub' and 'kaggle' installed.")
    print("2. Your Kaggle API token is set up correctly in Colab Secrets and configured via the code above.")
    raise # Re-raise the exception to prevent NameError later

--- Loading dataset from Kaggle Hub ---
Dataset loaded successfully.
First 5 records:     Loan_ID Gender Married Dependents     Education Self_Employed  \
0  LP001002   Male      No          0      Graduate            No   
1  LP001003   Male     Yes          1      Graduate            No   
2  LP001005   Male     Yes          0      Graduate           Yes   
3  LP001006   Male     Yes          0  Not Graduate            No   
4  LP001008   Male      No          0      Graduate            No   

   ApplicantIncome  CoapplicantIncome  LoanAmount  Loan_Amount_Term  \
0             5849                0.0         NaN             360.0   
1             4583             1508.0       128.0             360.0   
2             3000                0.0        66.0             360.0   
3             2583             2358.0       120.0             360.0   
4             6000                0.0       141.0             360.0   

   Credit_History Property_Area Loan_Status  
0             1.0         Urban           Y  
1             1.0   

## 4. Initial Data Cleaning & Target Variable Preparation

In this step, we perform some initial data cleaning:
*   Create a copy of the original DataFrame (`df_orig_for_lime`) which will be used later by LIME to show explanations with original feature values.
*   Drop the `Loan_ID` column as it's an identifier and not useful for model training.
*   Map the target variable `Loan_Status` from 'Y'/'N' to 1/0.
*   Handle the `Dependents` column: replace '3+' with '3' and convert to a numeric type.
*   Separate features (X) and the target variable (y).
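
The cleaning steps listed above can be tried on a tiny hand-made frame before running them on the real data. This is a minimal sketch: the `toy` DataFrame and its values are invented for illustration and only mimic the column names of the loan dataset.

```python
import pandas as pd

# Hypothetical miniature of the loan data (values are made up).
toy = pd.DataFrame({
    "Loan_ID": ["LP1", "LP2", "LP3", "LP4"],
    "Dependents": ["0", "3+", None, "1"],
    "Loan_Status": ["Y", "N", "Y", "N"],
})

# Drop the identifier, map the target to 1/0, and make Dependents numeric.
toy = toy.drop(columns="Loan_ID")
toy["Loan_Status"] = toy["Loan_Status"].map({"Y": 1, "N": 0})
toy["Dependents"] = pd.to_numeric(toy["Dependents"].replace("3+", "3"), errors="coerce")

# Separate features and target.
X_toy = toy.drop(columns="Loan_Status")
y_toy = toy["Loan_Status"]

print(X_toy)
print(y_toy.tolist())
```

Missing `Dependents` values stay as `NaN` at this stage; they are only filled in later by the imputer in the preprocessing pipeline.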

In [26]:
# --- 2. Initial Data Cleaning & Target Variable Preparation ---
print("\n--- Initial Data Cleaning & Target Preparation ---")
df_orig_for_lime = df.copy() # Keep a version for LIME with original string values

# Drop Loan_ID as it's an identifier
if 'Loan_ID' in df.columns:
    df = df.drop('Loan_ID', axis=1)
if 'Loan_ID' in df_orig_for_lime.columns:
    df_orig_for_lime = df_orig_for_lime.drop('Loan_ID', axis=1)


# Convert target variable 'Loan_Status' (Y/N) to binary (1/0)
if 'Loan_Status' not in df.columns:
    print("Error: 'Loan_Status' column not found in the loaded dataset.")
    raise ValueError("'Loan_Status' column not found in the loaded dataset.")


df['Loan_Status'] = df['Loan_Status'].map({'Y': 1, 'N': 0})
df_orig_for_lime['Loan_Status'] = df_orig_for_lime['Loan_Status'].map({'Y': 1, 'N': 0})

print(f"Loan_Status distribution:\n{df['Loan_Status'].value_counts(normalize=True)}")

# Handle '3+' in Dependents
if 'Dependents' in df.columns:
    df['Dependents'] = df['Dependents'].replace('3+', '3')
    df_orig_for_lime['Dependents'] = df_orig_for_lime['Dependents'].replace('3+', '3')
    df['Dependents'] = pd.to_numeric(df['Dependents'], errors='coerce')
    df_orig_for_lime['Dependents'] = pd.to_numeric(df_orig_for_lime['Dependents'], errors='coerce')


# Define features (X) and target (y)
X = df.drop('Loan_Status', axis=1)
y = df['Loan_Status']

# X_orig_lime will be used for LIME explainer's training_data and for explaining instances
# It should not contain the target variable.
X_orig_lime = df_orig_for_lime.drop('Loan_Status', axis=1, errors='ignore')


--- Initial Data Cleaning & Target Preparation ---
Loan_Status distribution:
Loan_Status
1    0.687296
0    0.312704
Name: proportion, dtype: float64


## 5. Preprocessing Setup

We define how different types of features will be preprocessed.
*   **Numerical Features:** Imputed with the median value and then scaled using `StandardScaler`.
*   **Categorical Features (Ordinal):** String columns with at most two distinct values (here `Gender`, `Married`, `Education`, and `Self_Employed`). They are imputed with the most frequent value and then encoded as integers using `OrdinalEncoder`; with only two categories, the integer coding imposes no artificial ordering.
*   **Categorical Features (One-Hot Encoded):** These are nominal categorical features with multiple categories (e.g., 'Property_Area'). They are imputed with the most frequent value and then transformed using `OneHotEncoder`.

A `ColumnTransformer` is used to apply these different transformations to the appropriate columns.
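
To see what the three branches produce, here is a small self-contained sketch on invented data. The column names `income`, `married`, and `area` and all values are hypothetical; `sparse_threshold=0` is used only so the combined output is a dense array that is easy to print.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Hypothetical toy frame with one column per preprocessing branch.
toy = pd.DataFrame({
    "income": [1000.0, 2000.0, np.nan, 4000.0],
    "married": ["Yes", "No", "Yes", np.nan],
    "area": ["Urban", "Rural", "Semiurban", "Urban"],
})

num_branch = Pipeline([("imputer", SimpleImputer(strategy="median")),
                       ("scaler", StandardScaler())])
ord_branch = Pipeline([("imputer", SimpleImputer(strategy="most_frequent")),
                       ("ordinal", OrdinalEncoder())])
ohe_branch = Pipeline([("imputer", SimpleImputer(strategy="most_frequent")),
                       ("onehot", OneHotEncoder(handle_unknown="ignore"))])

toy_preprocessor = ColumnTransformer(
    transformers=[
        ("num", num_branch, ["income"]),
        ("ord", ord_branch, ["married"]),
        ("ohe", ohe_branch, ["area"]),
    ],
    remainder="drop",
    sparse_threshold=0,  # always return a dense array
)

toy_processed = toy_preprocessor.fit_transform(toy)
print(toy_processed.shape)  # 1 scaled + 1 ordinal + 3 one-hot columns
print(toy_processed)
```

The one-hot branch is the only one that changes the column count, which is why the real dataset grows from 11 original features to 13 processed ones.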

In [27]:
# --- 3. Preprocessing using ColumnTransformer ---
print("\n--- Setting up Preprocessing ---")

numerical_features = ['ApplicantIncome', 'CoapplicantIncome', 'LoanAmount', 'Loan_Amount_Term']
# Dependents and Credit_History are numeric after cleaning, so treat them as numerical features
for col in ['Dependents', 'Credit_History']:
    if col in X.columns and pd.api.types.is_numeric_dtype(X[col]) and col not in numerical_features:
        numerical_features.append(col)

categorical_features_for_ohe = []
categorical_features_for_ordinal = []
for col in X.columns:
    if col not in numerical_features and X[col].dtype == 'object':
        # More than two categories -> one-hot encode; otherwise -> ordinal encode
        if X[col].nunique() > 2:
            categorical_features_for_ohe.append(col)
        else:
            categorical_features_for_ordinal.append(col)

print(f"Numerical features: {numerical_features}")
print(f"Categorical (Ordinal): {categorical_features_for_ordinal}")
print(f"Categorical (OHE): {categorical_features_for_ohe}")

numerical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])
ordinal_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('ordinal', OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1))
])
ohe_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))
])

transformers_list = []
if numerical_features:
    transformers_list.append(('num', numerical_transformer, numerical_features))
if categorical_features_for_ordinal:
    transformers_list.append(('ord', ordinal_transformer, categorical_features_for_ordinal))
if categorical_features_for_ohe:
    transformers_list.append(('ohe', ohe_transformer, categorical_features_for_ohe))

if not transformers_list:
    print("Error: No features identified for preprocessing. Check feature lists.")
    raise ValueError("No features identified for preprocessing. Check feature lists.")

preprocessor = ColumnTransformer(transformers=transformers_list, remainder='drop')


--- Setting up Preprocessing ---
Numerical features: ['ApplicantIncome', 'CoapplicantIncome', 'LoanAmount', 'Loan_Amount_Term', 'Dependents', 'Credit_History']
Categorical (Ordinal): ['Gender', 'Married', 'Education', 'Self_Employed']
Categorical (OHE): ['Property_Area']


## 6. Train-Test Split

The dataset is split into training and testing sets. We use `X_orig_lime` (features before ColumnTransformer processing) for this split because LIME will need to see feature values in their original, interpretable form. The target `y` is also split accordingly. `stratify=y` ensures that the proportion of the target classes is maintained in both splits.
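
The effect of `stratify` can be checked on synthetic labels. This is a sketch with made-up data (60 positives, 40 negatives); the variable names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 60% class 1, 40% class 0.
y_toy = np.array([1] * 60 + [0] * 40)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=42, stratify=y_toy
)

train_rate = y_tr.mean()
test_rate = y_te.mean()
print(f"Positive rate overall: {y_toy.mean():.2f}")
print(f"Positive rate in train: {train_rate:.2f}, in test: {test_rate:.2f}")
```

Without `stratify`, the two rates can drift apart by chance, which matters more on a small, imbalanced dataset like this one (about 69% approved).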

In [28]:
# --- 4. Train-Test Split ---
X_train_orig, X_test_orig, y_train, y_test = train_test_split(
    X_orig_lime, y, test_size=0.25, random_state=42, stratify=y
)
print(f"Shape of X_train_orig: {X_train_orig.shape}")
print(f"Shape of X_test_orig: {X_test_orig.shape}")

Shape of X_train_orig: (460, 11)
Shape of X_test_orig: (154, 11)


## 7. Fit Preprocessor and Transform Data

The `preprocessor` (ColumnTransformer) defined earlier is now:
1.  **Fitted** on the training data (`X_train_orig`). This means the imputers learn the statistics (median, mode) and the scalers/encoders learn their parameters *only from the training data* to prevent data leakage.
2.  Used to **transform** both the training data (`X_train_orig`) and the test data (`X_test_orig`) into their processed versions (`X_train_processed`, `X_test_processed`). These processed versions are what the model will be trained and evaluated on.
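
The leakage point in step 1 can be illustrated with a scaler alone. In this sketch (toy numbers, hypothetical variable names), a scaler fitted on the training rows keeps the training mean, whereas a scaler fitted on train and test together absorbs information from the test row.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[10.0]])

# Correct: statistics come from the training data only.
scaler = StandardScaler().fit(train)
test_scaled = scaler.transform(test)

# Leaky: the test row shifts the learned mean.
leaky_scaler = StandardScaler().fit(np.vstack([train, test]))

print("Mean learned from train only:", scaler.mean_[0])
print("Mean learned from train + test:", leaky_scaler.mean_[0])
print("Test value scaled with train statistics:", test_scaled[0, 0])
```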

In [29]:
# --- 5. Fit Preprocessor and Transform Data ---
print("\n--- Fitting Preprocessor and Transforming Data ---")
preprocessor.fit(X_train_orig)

X_train_processed = preprocessor.transform(X_train_orig)
X_test_processed = preprocessor.transform(X_test_orig)

try:
    feature_names_out = preprocessor.get_feature_names_out()
    print(f"Number of features after processing: {len(feature_names_out)}")
    # print(f"Processed feature names: {feature_names_out[:15]}...") # Print a sample
except Exception as e:
    print(f"Could not get feature names out automatically. Error: {e}")
    feature_names_out = None

print(f"Shape of X_train_processed: {X_train_processed.shape}")
print(f"Shape of X_test_processed: {X_test_processed.shape}")


--- Fitting Preprocessor and Transforming Data ---
Number of features after processing: 13
Shape of X_train_processed: (460, 13)
Shape of X_test_processed: (154, 13)


## 8. Train the Machine Learning Model

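We use a `GradientBoostingClassifier` for this loan prediction task. Gradient boosting builds an ensemble of shallow decision trees sequentially, each one fitted to correct the errors of the trees before it, and it often performs well on tabular data. The model is trained on the `X_train_processed` data and `y_train` labels.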

In [30]:
# --- 6. Train the Model ---
print("\n--- Training the Model ---")
model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
model.fit(X_train_processed, y_train)
print("Model training complete.")


--- Training the Model ---
Model training complete.


## 9. Evaluate the Model

The trained model is evaluated on the `X_test_processed` data. We'll look at accuracy and a detailed classification report (precision, recall, F1-score).

In [31]:
# --- 7. Evaluate the Model ---
y_pred = model.predict(X_test_processed)
print("\n--- Model Evaluation (on Real Labeled Data) ---")
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['Rejected (0)', 'Approved (1)']))
print("-" * 30)


--- Model Evaluation (on Real Labeled Data) ---
Accuracy: 0.8247

Classification Report:
              precision    recall  f1-score   support

Rejected (0)       0.82      0.56      0.67        48
Approved (1)       0.83      0.94      0.88       106

    accuracy                           0.82       154
   macro avg       0.82      0.75      0.77       154
weighted avg       0.82      0.82      0.81       154

------------------------------
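
The derived figures in the report can be checked by hand from the per-class precision, recall, and support. The sketch below uses the rounded values printed above, so the results agree with the report only to two decimals.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Rounded values copied from the classification report.
rejected = {"precision": 0.82, "recall": 0.56, "support": 48}
approved = {"precision": 0.83, "recall": 0.94, "support": 106}

f1_rejected = f1(rejected["precision"], rejected["recall"])
f1_approved = f1(approved["precision"], approved["recall"])

# Macro average: plain mean over classes. Weighted average: mean weighted by support.
macro_recall = (rejected["recall"] + approved["recall"]) / 2
total = rejected["support"] + approved["support"]
weighted_recall = (
    rejected["recall"] * rejected["support"] + approved["recall"] * approved["support"]
) / total

print(f"F1 rejected: {f1_rejected:.2f}, F1 approved: {f1_approved:.2f}")
print(f"Macro recall: {macro_recall:.2f}, weighted recall: {weighted_recall:.2f}")
```

The recall of 0.56 for the rejected class is the number to watch: the model misses a little under half of the applications that were actually rejected, which the overall accuracy of 0.82 hides.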


## 10. XAI with LIME: Setup

Now we set up the LIME (Local Interpretable Model-agnostic Explanations) component.

**Key LIME Components:**
*   **`predict_fn_for_lime`:** The function LIME calls to query the model. It receives LIME's perturbed samples as a NumPy array laid out like `training_data` (so categorical columns hold category codes, not strings), rebuilds a DataFrame with the original column names, applies the same fitted preprocessing pipeline used for training, and returns the model's class probabilities.
*   **`LimeTabularExplainer`:** The main LIME object.
    *   `training_data`: LIME uses this to learn the distribution of each feature and to generate meaningful perturbations. It must be numeric, so we pass a numerically encoded copy (`X_train_orig_numeric_for_lime_stats`) in which string features are converted to category codes.
    *   `feature_names`: Original names of the features.
    *   `class_names`: Names for the target classes (e.g., 'Rejected', 'Approved').
    *   `categorical_features`: Indices of columns that LIME should treat as categorical when perturbing.
    *   `mode`: 'classification' for this task.
    *   `discretize_continuous=False`: With `True`, LIME bins continuous features before perturbing them, which can trigger errors in `scipy.stats.truncnorm` when the distribution of the provided `training_data` is unusual after LIME's internal scaling. With `False`, LIME perturbs continuous features directly and reports one weight per feature instead of one per bin, which avoids the error while still giving meaningful explanations.
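
To make the role of these components more concrete, the following is a minimal sketch of the general idea behind a local surrogate explanation: perturb the instance, query the black box, weight samples by proximity, and fit a weighted linear model. It is not the `lime` library's implementation. The function `local_surrogate_weights`, the toy black box, the kernel-width default, and the choice of `Ridge` are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate_weights(predict_proba, instance, feature_std,
                            n_samples=2000, kernel_width=None, seed=0):
    """Fit a proximity-weighted linear model around `instance` (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_features = instance.shape[0]
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(n_features)  # assumed default for this sketch

    # Perturb in standardized units; keep the unperturbed instance as the first row.
    z = rng.normal(size=(n_samples, n_features))
    z[0] = 0.0
    samples = instance + z * feature_std

    # Query the black box for the probability of the positive class.
    target = predict_proba(samples)[:, 1]

    # Samples closer to the instance get more weight (exponential kernel).
    distances = np.linalg.norm(z, axis=1)
    sample_weights = np.exp(-(distances ** 2) / kernel_width ** 2)

    surrogate = Ridge(alpha=1.0)
    surrogate.fit(z, target, sample_weight=sample_weights)
    return surrogate.coef_

def toy_predict_proba(X):
    """Hypothetical black box: depends on feature 0 only."""
    p = 1.0 / (1.0 + np.exp(-3.0 * X[:, 0]))
    return np.column_stack([1.0 - p, p])

surrogate_weights = local_surrogate_weights(
    toy_predict_proba, np.array([0.0, 0.0]), np.array([1.0, 1.0])
)
print(surrogate_weights)
```

The surrogate assigns a clearly positive weight to the first feature and a weight near zero to the second, mirroring how the toy black box actually behaves around the instance. The per-feature weights LIME reports later in this notebook are read in the same way.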

**Handling Categorical Data for LIME's `training_data`:**
LIME's `LimeTabularExplainer` expects the `training_data` argument (used for internal statistics and sampling) to be numeric. If your original training data (`X_train_orig`) contains string-based categorical features, passing it directly can cause errors in LIME's internal scaler.
To address this, we:
1. Create `X_train_orig_numeric_for_lime_stats` by copying `X_train_orig`.
2. In this copy, we convert string categorical columns to their pandas category codes (`.astype('category').cat.codes`). This provides a numeric representation.
3. We pass this `X_train_orig_numeric_for_lime_stats.values` to `LimeTabularExplainer`.
4. LIME perturbs this numeric representation, so `predict_fn_for_lime` receives category codes rather than strings. Because the `preprocessor` was fitted on the original strings, the codes need to be mapped back through `categorical_mappings` before the preprocessor is applied; otherwise its encoders treat every code as an unknown category.
5. The `data_row` passed to `explainer.explain_instance` must be encoded with the same category-to-code mapping as `X_train_orig_numeric_for_lime_stats`, so that it is consistent with LIME's internal state.
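
The encoding round trip described in these steps can be sketched with pandas alone. The series below is invented for illustration; it shows how category codes are assigned, how the `{code: category}` mapping is built, how codes are decoded back to strings, and how a new value is encoded against the training categories.

```python
import pandas as pd

# Hypothetical training column with one missing value.
train_col = pd.Series(["Urban", "Rural", None, "Semiurban", "Urban"])

as_category = train_col.astype("category")
codes = as_category.cat.codes                     # missing values get code -1
code_to_category = dict(enumerate(as_category.cat.categories))

# Decode: codes without a mapping entry (here -1) come back as NaN.
decoded = codes.map(code_to_category)

# Encode new values against the *training* categories; unseen values get -1.
new_codes = pd.Categorical(
    ["Rural", "Suburb"], categories=as_category.cat.categories
).codes

print(codes.tolist())
print(code_to_category)
print(decoded.tolist())
print(new_codes.tolist())
```

Categories are sorted alphabetically before codes are assigned, so the code of a value depends on which other values appear in the column. That is why a single row has to be encoded with the training categories rather than on its own.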

In [32]:
# --- 8. XAI Component using LIME ---
print("\n--- Setting up LIME Explainer ---")

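def predict_fn_for_lime(data_lime_numpy):
    data_lime_df = pd.DataFrame(data_lime_numpy, columns=X_train_orig.columns)
    for col in X_train_orig.columns:
        if col in categorical_mappings:
            # LIME perturbs category codes; map them back to the strings the preprocessor
            # was fitted on. Code -1 (missing) has no mapping and becomes NaN for the imputer.
            data_lime_df[col] = data_lime_df[col].astype(int).map(categorical_mappings[col])
        elif pd.api.types.is_numeric_dtype(X_train_orig[col].dtype):
            data_lime_df[col] = pd.to_numeric(data_lime_df[col], errors='coerce')
    data_processed = preprocessor.transform(data_lime_df)
    return model.predict_proba(data_processed)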

lime_categorical_feature_names = categorical_features_for_ordinal + categorical_features_for_ohe
lime_categorical_features_indices = [
    X_train_orig.columns.get_loc(col) for col in X_train_orig.columns
    if col in lime_categorical_feature_names
]
lime_categorical_features_indices = sorted(list(set(lime_categorical_features_indices)))

X_train_orig_numeric_for_lime_stats = X_train_orig.copy()
categorical_mappings = {} # To store mappings for encoding instances later
for col in X_train_orig_numeric_for_lime_stats.columns:
    if X_train_orig_numeric_for_lime_stats[col].dtype == 'object':
         # Ensure NaN values are handled before converting to category codes
         # Pandas cat.codes handles NaNs by assigning -1 by default
         X_train_orig_numeric_for_lime_stats[col] = X_train_orig_numeric_for_lime_stats[col].astype('category')
         categorical_mappings[col] = dict(enumerate(X_train_orig_numeric_for_lime_stats[col].cat.categories))
         X_train_orig_numeric_for_lime_stats[col] = X_train_orig_numeric_for_lime_stats[col].cat.codes


print(f"LIME categorical feature indices (on original columns): {lime_categorical_features_indices}")

explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_train_orig_numeric_for_lime_stats.values,
    feature_names=X_train_orig.columns.tolist(),
    class_names=['Rejected', 'Approved'],
    categorical_features=lime_categorical_features_indices,
    mode='classification',
    discretize_continuous=False # Perturb continuous features directly instead of binning them
)
print("LIME explainer setup complete.")


--- Setting up LIME Explainer ---
LIME categorical feature indices (on original columns): [0, 1, 3, 4, 10]
LIME explainer setup complete.


## 11. XAI with LIME: Explain Individual Decisions

This function, `explain_instance_lime`, takes a specific instance from the test set, gets the model's prediction, and then uses LIME to generate an explanation.

**Important:** The `data_row` passed to `explainer.explain_instance` must use the same category-to-code encoding as the `training_data` provided to the `LimeTabularExplainer`. The instance is therefore encoded against the categories learned from the training set; encoding a single row on its own would assign code 0 to every categorical value, whatever it is.

The explanation shows:
*   The applicant's original data.
*   The model's prediction and probability.
*   A human-readable summary of the top factors influencing the decision, based on LIME weights.
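
The last bullet, turning LIME weights into words, can be sketched in isolation. The helper name `describe_factor` is hypothetical; the thresholds (0.1 and 0.05) are the ones used in the notebook's own function, and the simplified name parsing assumes conditions of the form `Feature` or `Feature=value`, as produced when continuous features are not discretized.

```python
import re

def describe_factor(feature_condition, weight):
    """Return the base feature name and a verbal description of a LIME weight."""
    # Everything before the first comparison character is taken as the feature name.
    base_feature = re.split(r"[<=>]", feature_condition, maxsplit=1)[0].strip()

    direction = "supported" if weight > 0 else "opposed"
    magnitude = abs(weight)
    if magnitude > 0.1:
        strength = "strongly"
    elif magnitude > 0.05:
        strength = "moderately"
    else:
        strength = "slightly"
    return base_feature, f"{strength} {direction}"

print(describe_factor("ApplicantIncome", 0.121))
print(describe_factor("Gender=1", -0.03))
```

A positive weight pushes the prediction toward the explained class and a negative weight pushes away from it; the verbal labels are only a presentation layer over those signed numbers.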

In [33]:
# --- 9. Explain a Specific Decision ---
import re # For robust feature name parsing

def explain_instance_lime(instance_index_in_test_orig, X_test_orig_df, model, explainer, preprocessor_pipeline, categorical_mappings, num_features_to_show=7):
    instance_orig_series = X_test_orig_df.iloc[instance_index_in_test_orig]

    print(f"\n--- Explaining Instance (Original Test Index: {instance_index_in_test_orig}, DataFrame Index: {instance_orig_series.name}) ---")
    print("Applicant's Original Data (before model processing):")
    for col, val in instance_orig_series.items():
        print(f"  {col}: {val}")

    instance_for_prediction_df = instance_orig_series.to_frame().T
    instance_for_prediction_df = instance_for_prediction_df[X_train_orig.columns] # Ensure column order

    instance_processed = preprocessor_pipeline.transform(instance_for_prediction_df)
    prediction_proba = model.predict_proba(instance_processed)
    prediction = model.predict(instance_processed)[0]
    predicted_class = 'Approved' if prediction == 1 else 'Rejected'

    print(f"\nModel Prediction: {predicted_class} (Prob Approved: {prediction_proba[0,1]:.2f}, Prob Rejected: {prediction_proba[0,0]:.2f})")

    # Numerically encode the instance for LIME's explain_instance data_row
    instance_numeric_for_lime_stats_df = instance_orig_series.to_frame().T.copy()
    instance_numeric_for_lime_stats_df = instance_numeric_for_lime_stats_df[X_train_orig.columns] # Ensure column order

    for col in instance_numeric_for_lime_stats_df.columns:
        if col in categorical_mappings:
            # Invert the training mapping ({code: category} -> {category: code}) so the
            # instance gets the same codes as LIME's training_data. Missing values and
            # categories unseen in training get -1, matching pandas' code for NaN.
            category_to_code = {category: code for code, category in categorical_mappings[col].items()}
            instance_numeric_for_lime_stats_df[col] = (
                instance_numeric_for_lime_stats_df[col].map(category_to_code).fillna(-1).astype(int)
            )



    explanation = explainer.explain_instance(
        data_row=instance_numeric_for_lime_stats_df.iloc[0].values,
        predict_fn=predict_fn_for_lime,
        num_features=num_features_to_show,
        top_labels=1
    )

    print("\nLIME Explanation (Top Factors contributing to the prediction):")
    explained_label_index = explanation.available_labels()[0]
    explained_label_name = explainer.class_names[explained_label_index]
    print(f"Explanation for why the model predicted: '{explained_label_name}'")

    human_readable_explanation = f"The model predicted '{predicted_class}' for applicant (index {instance_orig_series.name}) primarily because:\n"
    for feature_condition, weight in explanation.as_list(label=explained_label_index):
        match = re.match(r'([^<=>]+)\s*[<=>]?.*', feature_condition)
        base_feature_name = match.group(1).strip() if match else feature_condition.split(' ')[0].split('=')[0].split('<')[0].split('>')[0].strip()
        try:
            actual_value = instance_orig_series[base_feature_name]
            value_str = f"(Applicant's value for {base_feature_name}: {actual_value})"
        except KeyError: value_str = ""
        influence_verb = "supported" if weight > 0 else "opposed"
        strength_adv = "strongly" if abs(weight) > 0.1 else "moderately" if abs(weight) > 0.05 else "slightly"
        human_readable_explanation += (f"- The condition '{feature_condition}' {value_str} "
                                       f"{strength_adv} {influence_verb} the '{explained_label_name}' decision (LIME weight: {weight:.3f}).\n")
    print("\n--- Human-Understandable Explanation ---")
    print(human_readable_explanation)

    # For interactive LIME plots in Colab/Jupyter (optional)
    # from IPython.display import display, HTML
    # display(HTML(explanation.as_html()))
    # Or: explanation.show_in_notebook(show_table=True, show_all=False)
    return explanation

# --- Example Usage: Explain a few instances from the original test set ---
if not X_test_orig.empty:
    approved_pred_indices_test = [i for i, p_val in enumerate(y_pred) if p_val == 1]
    if approved_pred_indices_test:
        print("\n" + "="*60 + "\n")
        print("EXPLAINING AN APPROVED CASE:")
        explain_instance_lime(approved_pred_indices_test[0], X_test_orig, model, explainer, preprocessor, categorical_mappings)
    else:
        print("\nNo instances predicted as 'Approved' in the test set to explain.")

    rejected_pred_indices_test = [i for i, p_val in enumerate(y_pred) if p_val == 0]
    if rejected_pred_indices_test:
        print("\n" + "="*60 + "\n")
        print("EXPLAINING A REJECTED CASE:")
        explain_instance_lime(rejected_pred_indices_test[0], X_test_orig, model, explainer, preprocessor, categorical_mappings)
    else:
        print("\nNo instances predicted as 'Rejected' in the test set to explain.")
else:
    print("Test set is empty. Cannot generate explanations.")



EXPLAINING AN APPROVED CASE:

--- Explaining Instance (Original Test Index: 0, DataFrame Index: 194) ---
Applicant's Original Data (before model processing):
  Gender: Male
  Married: No
  Dependents: 0.0
  Education: Graduate
  Self_Employed: No
  ApplicantIncome: 4191
  CoapplicantIncome: 0.0
  LoanAmount: 120.0
  Loan_Amount_Term: 360.0
  Credit_History: 1.0
  Property_Area: Rural

Model Prediction: Approved (Prob Approved: 0.79, Prob Rejected: 0.21)

LIME Explanation (Top Factors contributing to the prediction):
Explanation for why the model predicted: 'Approved'

--- Human-Understandable Explanation ---
The model predicted 'Approved' for applicant (index 194) primarily because:
- The condition 'ApplicantIncome' (Applicant's value for ApplicantIncome: 4191) strongly supported the 'Approved' decision (LIME weight: 0.121).
- The condition 'Credit_History' (Applicant's value for Credit_History: 1.0) strongly supported the 'Approved' decision (LIME weight: 0.102).
- The condition 'Lo

## 12. Conclusion

This notebook demonstrated an end-to-end machine learning workflow for loan eligibility prediction, including data loading, preprocessing, model training, evaluation, and crucially, model explainability using LIME.

LIME helps in understanding the "why" behind individual predictions, which is vital for sensitive applications like loan approvals to ensure fairness, transparency, and to identify potential issues with the model's reasoning.

Further improvements could include:
*   More sophisticated feature engineering.
*   Hyperparameter tuning for the model.
*   Exploring other XAI techniques (e.g., SHAP).
*   Developing a more user-friendly interface for presenting these explanations.