# Model Debugging Tools

A comprehensive toolkit for model debugging, behavioral testing, and robustness analysis in machine learning systems. The project focuses on the Trust & Safety side of AI, providing tools to identify potential issues, validate model behavior, and assess robustness against common failure modes.
## Disclaimer

This tool is for research and educational purposes only:
- XAI outputs may be unstable or misleading
- This tool is NOT a substitute for human judgment
- Should NOT be used for regulated decisions without human review
- Always validate explanations and consider multiple perspectives
## Features

### Model Debugging

- Learning Curves: Diagnose underfitting and overfitting
- Validation Curves: Assess model complexity
- Residual Analysis: Analyze prediction errors
- Feature Importance: Multiple importance methods
- Permutation Importance: Model-agnostic importance
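For orientation, here is what a model-agnostic permutation-importance check looks like with plain scikit-learn. This is a standalone sketch, not this project's `ModelDebugger` interface, which wraps similar diagnostics behind its own API:

```python
# Standalone sketch using scikit-learn directly; the toolkit's ModelDebugger
# wraps similar functionality behind its own API.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Shuffle each feature on held-out data and measure the resulting score drop.
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=42)
for i, mean in enumerate(result.importances_mean):
    print(f"feature {i}: {mean:.4f}")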
### Behavioral Testing

- CheckList-style Tests: Systematic model behavior validation
- Metamorphic Testing: Consistency under transformations (see the sketch after this list)
- Data Leakage Detection: Identify training/test contamination
- Edge Case Testing: Robustness to extreme inputs
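To make the metamorphic idea concrete, here is a minimal standalone sketch checking one simple relation: predictions should be invariant under a perturbation that is negligible relative to the feature scale. This is an illustration of the concept, not the `BehavioralTester` API:

```python
# Sketch of one metamorphic relation: a perturbation far below the feature
# scale should not change any prediction.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=42).fit(X, y)

rng = np.random.default_rng(42)
X_perturbed = X + rng.normal(scale=1e-6, size=X.shape)

flip_rate = np.mean(model.predict(X) != model.predict(X_perturbed))
assert flip_rate < 0.01, f"predictions unstable under tiny noise: {flip_rate:.2%}"
```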
### Robustness Analysis

- Adversarial Attacks: FGSM, PGD, random attacks
- Uncertainty Quantification: Entropy, confidence analysis
- Out-of-Distribution Detection: Novelty detection
- Calibration Analysis: Reliability diagrams, ECE
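Calibration analysis compares predicted confidence to observed accuracy; expected calibration error (ECE) summarizes the gap across confidence bins. Below is a minimal NumPy sketch of the standard formulation (the toolkit's own implementation may differ):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: size-weighted gap between mean confidence and accuracy per bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy example: over-confident predictions yield a nonzero ECE.
conf = np.array([0.9, 0.8, 0.95, 0.7])
correct = np.array([1, 0, 1, 1], dtype=float)
print(expected_calibration_error(conf, correct))
```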
### Evaluation Metrics

- Faithfulness: Deletion, insertion, sufficiency tests
- Stability: Kendall's tau, Spearman's rho, IoU (see the sketch after this list)
- Utility: Simplicity, coherence, completeness
- Robustness: Attack success rates, CV stability
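As an example of a stability metric, Kendall's tau measures how consistently two attribution vectors rank the features. A sketch with SciPy, using hypothetical attribution values (e.g., from two models trained with different seeds):

```python
import numpy as np
from scipy.stats import kendalltau

# Hypothetical attribution vectors for the same input from two runs.
attributions_a = np.array([0.42, 0.10, 0.31, 0.17])
attributions_b = np.array([0.40, 0.12, 0.28, 0.20])

tau, p_value = kendalltau(attributions_a, attributions_b)
print(f"Kendall's tau: {tau:.3f} (1.0 means identical rankings)")
```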
## Requirements

- Python 3.10+
- pip or conda
## Installation

```bash
# Clone the repository
git clone https://github.com/kryptologyst/Model-Debugging-Tools.git
cd Model-Debugging-Tools

# Install dependencies
pip install -r requirements.txt

# Or install in development mode
pip install -e .

# Install with development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install
```

## Quick Start

### Command Line

```bash
# Run a comprehensive analysis on the Iris dataset
python main.py --dataset iris --model random_forest
# Run with custom configuration
python main.py --dataset synthetic --model logistic_regression --seed 123
# Run specific analyses
python main.py --dataset iris --output-dir results --verbose
```

### Python API

```python
from sklearn.model_selection import train_test_split

from src.debugging.model_debugger import ModelDebugger
from src.debugging.behavioral_tester import BehavioralTester
from src.debugging.robustness_analyzer import RobustnessAnalyzer
from src.utils.core import Config
# Initialize components
config = Config()
debugger = ModelDebugger(config)
behavioral_tester = BehavioralTester(config)
robustness_analyzer = RobustnessAnalyzer(config)
# Load dataset and train model
X, y, feature_names = debugger.load_dataset("iris")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = debugger.train_model(X_train, y_train, "random_forest")
# Run comprehensive debugging
results = debugger.run_comprehensive_debugging("iris", "random_forest")
# Run behavioral tests
behavioral_results = behavioral_tester.run_comprehensive_behavioral_tests(
    model, X_train, X_test, y_train, y_test
)
# Run robustness analysis
robustness_results = robustness_analyzer.adversarial_attack_test(
    model, X_test, y_test, "fgsm", epsilon=0.1
)
```

### Interactive Demo

```bash
# Launch the Streamlit demo
streamlit run demo/app.py
```

The demo provides an interactive interface for:
- Dataset and model selection
- Real-time analysis execution
- Visualization of results
- Export of findings
## Datasets

### Iris Dataset

- Features: 4 (sepal length, sepal width, petal length, petal width)
- Samples: 150
- Classes: 3 (setosa, versicolor, virginica)
- Type: Classification
### Synthetic Dataset

- Features: Configurable (default: 20)
- Samples: Configurable (default: 1000)
- Classes: Configurable (default: 2)
- Type: Classification
- Noise: Configurable
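For reference, a synthetic set matching these defaults can be generated with scikit-learn's `make_classification`. This is an assumption about the underlying generator; consult the source for the exact call:

```python
from sklearn.datasets import make_classification

# Mirrors the defaults listed above; flip_y controls label noise.
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2,
                           flip_y=0.01, random_state=42)
print(X.shape, y.shape)  # (1000, 20) (1000,)
```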
### Custom Datasets

To use your own dataset, provide a metadata file describing it:

```python
# Create dataset metadata
metadata = {
    "feature_names": ["feature_1", "feature_2", ...],
    "target_names": ["class_0", "class_1", ...],
    "feature_types": ["numerical", "categorical", ...],
    "sensitive_attributes": ["gender", "age", ...],
    "monotonic_features": ["feature_1", "feature_3"],
}

# Save metadata
import json

with open("data/meta.json", "w") as f:
    json.dump(metadata, f, indent=2)
```

## Configuration

Create `configs/default.yaml`:

```yaml
seed: 42
device: "auto" # auto, cuda, mps, cpu
debugging:
  learning_curve_cv: 5
  validation_curve_cv: 5
  residual_analysis: true
  feature_importance: true
  permutation_importance: true

behavioral_tests:
  checklist_tests: true
  metamorphic_tests: true
  data_leakage_checks: true
  edge_case_tests: true

robustness:
  adversarial_attacks: ["fgsm", "pgd", "random"]
  uncertainty_methods: ["mc_dropout", "deep_ensemble"]
  ood_detection: true
  calibration_analysis: true

evaluation:
  faithfulness_metrics: ["deletion", "insertion", "sufficiency", "necessity"]
  stability_metrics: ["kendall_tau", "spearman_rho", "iou"]
  utility_metrics: ["simplicity", "coherence", "completeness"]
```

### Environment Variables

```bash
export MODEL_DEBUGGING_SEED=42
export MODEL_DEBUGGING_DEVICE=cuda
export MODEL_DEBUGGING_LOG_LEVEL=INFO
```

## Training and Evaluation

### Training

```bash
# Train with default settings
python scripts/train.py --dataset iris --model random_forest
# Train with custom parameters
python scripts/train.py \
--dataset synthetic \
--model random_forest \
--n_estimators 200 \
--max_depth 10 \
--output-dir models/
```

### Evaluation

```bash
# Run a comprehensive evaluation
python scripts/evaluate.py --model-path models/model.pkl --dataset iris
# Run specific evaluation metrics
python scripts/evaluate.py \
--model-path models/model.pkl \
--dataset iris \
--metrics faithfulness,stability,robustness
```

## Demo Walkthrough

1. Install dependencies:

   ```bash
   pip install streamlit plotly
   ```

2. Launch the demo:

   ```bash
   streamlit run demo/app.py
   ```

3. Access the interface:
   - Open your browser to http://localhost:8501
   - Configure analysis parameters in the sidebar
   - Click "Run Analysis" to start
### Demo Features

- Interactive Configuration: Select datasets, models, and analysis options
- Real-time Visualization: Dynamic plots and charts
- Comprehensive Results: Multiple analysis tabs
- Export Capabilities: Download results and visualizations
- Responsive Design: Works on desktop and mobile
The demo includes:
- Overview dashboard with key metrics
- Model debugging visualizations
- Behavioral test results
- Robustness analysis charts
- Evaluation metrics tables
## Limitations and Important Notes

### Explanation Stability

- XAI methods can produce different results for similar inputs
- Small changes in data or model can significantly affect explanations
- Always validate explanations with multiple methods
### Method-Specific Limitations

- LIME: Local approximations may not reflect global behavior
- SHAP: Computationally expensive for large datasets
- Grad-CAM: Limited to convolutional neural networks
- Permutation Importance: Can be misleading with correlated features
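A quick way to see the correlated-feature pitfall: duplicate a feature and watch its permutation importance get diluted, since permuting either copy leaves the other intact. This is an illustrative sketch, not part of the toolkit:

```python
# Illustration (not part of this toolkit): permutation importance understates
# correlated features, because permuting one copy leaves its duplicate intact.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, random_state=42)
X_dup = np.hstack([X, X[:, [0]]])  # column 5 is an exact copy of column 0

model = RandomForestClassifier(random_state=42).fit(X_dup, y)
result = permutation_importance(model, X_dup, y, n_repeats=10, random_state=42)

# The importance of feature 0 is split with its duplicate in column 5.
print(result.importances_mean.round(3))
```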
### Interpretation Guidance

- Explanations are not causal relationships
- Model debugging tools identify symptoms, not root causes
- Human expertise is essential for interpretation
- Consider domain knowledge and context
### Performance Considerations

- Some methods are computationally expensive
- Large datasets may require sampling or approximation
- GPU acceleration recommended for deep learning models
- Memory usage scales with dataset size
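When a method is too expensive to run on the full dataset, a common workaround is to analyze a random subsample. A generic sketch (the toolkit may offer its own sampling options):

```python
import numpy as np

# Stand-in for a large dataset; substitute your own feature matrix.
X = np.random.default_rng(42).normal(size=(100_000, 20))

# Explain a fixed-size random subset instead of every row.
idx = np.random.default_rng(42).choice(len(X), size=1_000, replace=False)
X_sample = X[idx]
print(X_sample.shape)  # (1000, 20)
```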
## Development

### Setup

```bash
# Clone repository
git clone <repository-url>
cd model-debugging-tools
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install
```

### Code Style

- Formatting: Black (line length 88)
- Linting: Ruff
- Type hints: Required for all functions
- Docstrings: Google/NumPy style
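A function following these conventions might look like this (illustrative only, not an excerpt from the codebase):

```python
def accuracy(predictions: list[int], labels: list[int]) -> float:
    """Compute classification accuracy.

    Args:
        predictions: Predicted class indices.
        labels: Ground-truth class indices.

    Returns:
        The fraction of predictions matching the labels.

    Raises:
        ValueError: If labels is empty.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)
```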
### Testing

```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=src --cov-report=html
# Run specific test categories
pytest tests/test_debugging.py
pytest tests/test_behavioral.py
pytest tests/test_robustness.py
```

### Pre-commit Hooks

```bash
# Install hooks
pre-commit install
# Run manually
pre-commit run --all-files
```

## License

MIT License - see the LICENSE file for details.

## Citation
If you use this tool in your research, please cite:
```bibtex
@software{model_debugging_tools,
  title={Model Debugging Tools},
  author={Kryptologyst},
  year={2026},
  url={https://github.com/kryptologyst/Model-Debugging-Tools}
}
```

## Changelog

- Initial release
- Model debugging tools
- Behavioral testing framework
- Robustness analysis
- Evaluation metrics
- Interactive demo
- Comprehensive documentation