Model Debugging Tools

A comprehensive toolkit for model debugging, behavioral testing, and robustness analysis in machine learning systems. This project focuses on Trust & Safety aspects of AI systems, providing tools to identify potential issues, validate model behavior, and assess robustness against various failure modes.

⚠️ Important Disclaimer

This tool is for research and educational purposes only.

  • XAI outputs may be unstable or misleading
  • This tool is NOT a substitute for human judgment
  • Should NOT be used for regulated decisions without human review
  • Always validate explanations and consider multiple perspectives

Features

Model Debugging

  • Learning Curves: Diagnose underfitting/overfitting
  • Validation Curves: Assess model complexity
  • Residual Analysis: Analyze prediction errors
  • Feature Importance: Multiple importance methods
  • Permutation Importance: Model-agnostic importance
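As one illustration of the importance tooling, permutation importance can be computed directly with scikit-learn. This sketch is independent of the toolkit's own API and just shows the underlying technique:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Permutation importance: mean drop in test accuracy when one feature is shuffled
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
for name, mean in zip(data.feature_names, result.importances_mean):
    print(f"{name}: {mean:.3f}")
```

On Iris, the petal measurements typically dominate; being model-agnostic, the same call works for any estimator with a `score` method.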

Behavioral Testing

  • CheckList-style Tests: Systematic model behavior validation
  • Metamorphic Testing: Consistency under transformations
  • Data Leakage Detection: Identify training/test contamination
  • Edge Case Testing: Robustness to extreme inputs
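A minimal metamorphic test, sketched independently of the toolkit's API: predictions should be invariant under a transformation that does not change the input's meaning, here negligible additive noise:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# Metamorphic relation: predictions should not change under negligible input noise
rng = np.random.default_rng(42)
noisy = X + rng.normal(scale=1e-6, size=X.shape)
consistency = float(np.mean(model.predict(X) == model.predict(noisy)))
print(f"Prediction consistency under 1e-6 noise: {consistency:.2%}")
```

A consistency rate well below 100% on such a tiny perturbation would flag brittle decision boundaries worth investigating.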

Robustness Analysis

  • Adversarial Attacks: FGSM, PGD, random attacks
  • Uncertainty Quantification: Entropy, confidence analysis
  • Out-of-Distribution Detection: Novelty detection
  • Calibration Analysis: Reliability diagrams, ECE
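Expected calibration error (ECE), one of the calibration metrics listed above, can be sketched in a few lines of NumPy. The binning scheme here (10 equal-width bins) is one common convention, not necessarily the toolkit's:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between accuracy and confidence across confidence bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Toy case: 80%-confident predictions that are right 80% of the time (well calibrated)
conf = np.full(10, 0.8)
corr = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0], dtype=float)
print(expected_calibration_error(conf, corr))
```

A perfectly calibrated model yields an ECE of (numerically) zero; overconfident models show a positive gap in the high-confidence bins.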

Evaluation Metrics

  • Faithfulness: Deletion, insertion, sufficiency tests
  • Stability: Kendall's tau, Spearman's rho, IoU
  • Utility: Simplicity, coherence, completeness
  • Robustness: Attack success rates, CV stability
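The stability metrics above can be illustrated with SciPy: given two importance vectors from repeated runs of the same explainer (hypothetical values below), rank correlation and top-k overlap quantify how stable the explanation is:

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

# Hypothetical feature-importance scores from two runs of the same explainer
run_a = np.array([0.40, 0.30, 0.20, 0.10])
run_b = np.array([0.38, 0.31, 0.19, 0.12])

tau, _ = kendalltau(run_a, run_b)
rho, _ = spearmanr(run_a, run_b)

# Top-k intersection-over-union of the two rankings (k = 2)
k = 2
top_a, top_b = set(run_a.argsort()[-k:]), set(run_b.argsort()[-k:])
iou = len(top_a & top_b) / len(top_a | top_b)
print(tau, rho, iou)
```

Here the scores differ slightly but the ranking is identical, so all three metrics report perfect stability (1.0).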

Installation

Prerequisites

  • Python 3.10+
  • pip or conda

Quick Install

# Clone the repository
git clone https://github.com/kryptologyst/Model-Debugging-Tools.git
cd Model-Debugging-Tools

# Install dependencies
pip install -r requirements.txt

# Or install in development mode
pip install -e .

Development Install

# Install with development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Quick Start

Command Line Interface

# Run comprehensive analysis on Iris dataset
python main.py --dataset iris --model random_forest

# Run with custom configuration
python main.py --dataset synthetic --model logistic_regression --seed 123

# Run specific analyses
python main.py --dataset iris --output-dir results --verbose

Python API

from sklearn.model_selection import train_test_split

from src.debugging.model_debugger import ModelDebugger
from src.debugging.behavioral_tester import BehavioralTester
from src.debugging.robustness_analyzer import RobustnessAnalyzer
from src.utils.core import Config

# Initialize components
config = Config()
debugger = ModelDebugger(config)
behavioral_tester = BehavioralTester(config)
robustness_analyzer = RobustnessAnalyzer(config)

# Load dataset and train model
X, y, feature_names = debugger.load_dataset("iris")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = debugger.train_model(X_train, y_train, "random_forest")

# Run comprehensive debugging
results = debugger.run_comprehensive_debugging("iris", "random_forest")

# Run behavioral tests
behavioral_results = behavioral_tester.run_comprehensive_behavioral_tests(
    model, X_train, X_test, y_train, y_test
)

# Run robustness analysis
robustness_results = robustness_analyzer.adversarial_attack_test(
    model, X_test, y_test, "fgsm", epsilon=0.1
)

Interactive Demo

# Launch Streamlit demo
streamlit run demo/app.py

The demo provides an interactive interface for:

  • Dataset and model selection
  • Real-time analysis execution
  • Visualization of results
  • Export of findings

Dataset Schema

Supported Datasets

Iris Dataset

  • Features: 4 (sepal length, sepal width, petal length, petal width)
  • Samples: 150
  • Classes: 3 (setosa, versicolor, virginica)
  • Type: Classification

Synthetic Dataset

  • Features: Configurable (default: 20)
  • Samples: Configurable (default: 1000)
  • Classes: Configurable (default: 2)
  • Type: Classification
  • Noise: Configurable

Custom Datasets

To use your own dataset:

# Create dataset metadata
metadata = {
    "feature_names": ["feature_1", "feature_2", ...],
    "target_names": ["class_0", "class_1", ...],
    "feature_types": ["numerical", "categorical", ...],
    "sensitive_attributes": ["gender", "age", ...],
    "monotonic_features": ["feature_1", "feature_3"]
}

# Save metadata
import json
with open("data/meta.json", "w") as f:
    json.dump(metadata, f, indent=2)

Configuration

YAML Configuration

Create configs/default.yaml:

seed: 42
device: "auto"  # auto, cuda, mps, cpu

debugging:
  learning_curve_cv: 5
  validation_curve_cv: 5
  residual_analysis: true
  feature_importance: true
  permutation_importance: true

behavioral_tests:
  checklist_tests: true
  metamorphic_tests: true
  data_leakage_checks: true
  edge_case_tests: true

robustness:
  adversarial_attacks: ["fgsm", "pgd", "random"]
  uncertainty_methods: ["mc_dropout", "deep_ensemble"]
  ood_detection: true
  calibration_analysis: true

evaluation:
  faithfulness_metrics: ["deletion", "insertion", "sufficiency", "necessity"]
  stability_metrics: ["kendall_tau", "spearman_rho", "iou"]
  utility_metrics: ["simplicity", "coherence", "completeness"]
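A configuration like this can be parsed with PyYAML (assuming the project reads its config that way; shown here on an inline string so the sketch is self-contained):

```python
import yaml  # PyYAML; assumed to be among the project's dependencies

raw = """
seed: 42
device: auto
robustness:
  adversarial_attacks: [fgsm, pgd, random]
"""
config = yaml.safe_load(raw)
print(config["seed"])
print(config["robustness"]["adversarial_attacks"])
```

`yaml.safe_load` is preferred over `yaml.load` because it refuses to instantiate arbitrary Python objects from the file.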

Environment Variables

export MODEL_DEBUGGING_SEED=42
export MODEL_DEBUGGING_DEVICE=cuda
export MODEL_DEBUGGING_LOG_LEVEL=INFO
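A sketch of how these variables might be consumed in Python, with fallbacks matching the defaults above (the exact lookup logic in the toolkit may differ):

```python
import os

# Fall back to the documented defaults when a variable is unset
seed = int(os.environ.get("MODEL_DEBUGGING_SEED", "42"))
device = os.environ.get("MODEL_DEBUGGING_DEVICE", "auto")
log_level = os.environ.get("MODEL_DEBUGGING_LOG_LEVEL", "INFO")
print(seed, device, log_level)
```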

Training and Evaluation

Model Training

# Train with default settings
python scripts/train.py --dataset iris --model random_forest

# Train with custom parameters
python scripts/train.py \
    --dataset synthetic \
    --model random_forest \
    --n_estimators 200 \
    --max_depth 10 \
    --output-dir models/

Evaluation

# Run comprehensive evaluation
python scripts/evaluate.py --model-path models/model.pkl --dataset iris

# Run specific evaluation metrics
python scripts/evaluate.py \
    --model-path models/model.pkl \
    --dataset iris \
    --metrics faithfulness,stability,robustness

Demo Instructions

Launching the Demo

  1. Install dependencies:

    pip install streamlit plotly
  2. Launch the demo:

    streamlit run demo/app.py
  3. Access the interface:

    • Open your browser to http://localhost:8501
    • Configure analysis parameters in the sidebar
    • Click "Run Analysis" to start

Demo Features

  • Interactive Configuration: Select datasets, models, and analysis options
  • Real-time Visualization: Dynamic plots and charts
  • Comprehensive Results: Multiple analysis tabs
  • Export Capabilities: Download results and visualizations
  • Responsive Design: Works on desktop and mobile

Screenshots

The demo includes:

  • Overview dashboard with key metrics
  • Model debugging visualizations
  • Behavioral test results
  • Robustness analysis charts
  • Evaluation metrics tables

Limitations and Considerations

Explanation Instability

  • XAI methods can produce different results for similar inputs
  • Small changes in data or model can significantly affect explanations
  • Always validate explanations with multiple methods

Method Limitations

  • LIME: Local approximations may not reflect global behavior
  • SHAP: Computationally expensive for large datasets
  • Grad-CAM: Limited to convolutional neural networks
  • Permutation Importance: Can be misleading with correlated features
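The permutation-importance caveat is easy to demonstrate on synthetic data: when an informative feature is duplicated, shuffling one copy is partially masked by the other, so each copy looks far less important than the feature really is. A minimal sketch (not using the toolkit's API):

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 1))
y = (x[:, 0] + 0.3 * rng.normal(size=1000) > 0).astype(int)

# Model A: the informative feature alone
solo = LogisticRegression().fit(x, y)
solo_imp = permutation_importance(solo, x, y, n_repeats=10, random_state=0)

# Model B: the same feature duplicated; the L2 penalty splits weight across copies
X_dup = np.hstack([x, x])
dup = LogisticRegression().fit(X_dup, y)
dup_imp = permutation_importance(dup, X_dup, y, n_repeats=10, random_state=0)

print("solo:", solo_imp.importances_mean)
print("duplicated:", dup_imp.importances_mean)
```

Each duplicated copy shows a much smaller accuracy drop than the lone feature, even though the underlying signal is identical; grouping correlated features before permuting is one common mitigation.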

Trust and Safety

  • Explanations are not causal relationships
  • Model debugging tools identify symptoms, not root causes
  • Human expertise is essential for interpretation
  • Consider domain knowledge and context

Performance Considerations

  • Some methods are computationally expensive
  • Large datasets may require sampling or approximation
  • GPU acceleration recommended for deep learning models
  • Memory usage scales with dataset size

Contributing

Development Setup

# Clone repository
git clone <repository-url>
cd model-debugging-tools

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Code Style

  • Formatting: Black (line length 88)
  • Linting: Ruff
  • Type hints: Required for all functions
  • Docstrings: Google/NumPy style

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test categories
pytest tests/test_debugging.py
pytest tests/test_behavioral.py
pytest tests/test_robustness.py

Pre-commit Hooks

# Install hooks
pre-commit install

# Run manually
pre-commit run --all-files

License

MIT License - see LICENSE file for details.

Citation

If you use this tool in your research, please cite:

@software{model_debugging_tools,
  title={Model Debugging Tools},
  author={Kryptologyst},
  year={2026},
  url={https://github.com/kryptologyst/Model-Debugging-Tools}
}

Changelog

Version 1.0.0

  • Initial release
  • Model debugging tools
  • Behavioral testing framework
  • Robustness analysis
  • Evaluation metrics
  • Interactive demo
  • Comprehensive documentation
