In [1]:
import fdfi
print('FDFI version:', fdfi.__version__)

FDFI version: 0.0.2


# Quickstart: FDFI in 5 Minutes

This tutorial introduces the basics of FDFI (Flow-Disentangled Feature Importance). By the end, you'll be able to:

1. Create an explainer for any model
2. Compute feature importance
3. Interpret the results
4. Get confidence intervals

## Setup

First, let's import the necessary libraries:

In [2]:
import numpy as np
from fdfi.explainers import OTExplainer

# Set random seed for reproducibility
np.random.seed(42)

## Create a Simple Model

Let's create a simple model where we know the true feature importance. Features 0, 1, and 2 drive the output (with coefficients 1, 2, and 0.5); the remaining seven features are noise:

In [3]:
def model(X):
    """Simple model: y = x0 + 2*x1 + 0.5*x2"""
    return X[:, 0] + 2 * X[:, 1] + 0.5 * X[:, 2]

# Create training data (used as background distribution)
n_samples = 200
n_features = 10
X_train = np.random.randn(n_samples, n_features)

# Create test data to explain
X_test = np.random.randn(20, n_features)

print(f"Training data shape: {X_train.shape}")
print(f"Test data shape: {X_test.shape}")
print(f"Model predictions for test data: {model(X_test)[:5]}")

Training data shape: (200, 10)
Test data shape: (20, 10)
Model predictions for test data: [-1.36042558 -3.5175806   1.2950158  -0.90179092 -1.77532221]
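Because the model is linear and noise-free, its coefficients can be read back with ordinary least squares, which gives a ground truth to compare any importance ranking against. The following is a minimal sketch using only NumPy; the names `X_check` and `coef` are illustrative and not part of FDFI.

```python
import numpy as np

def model(X):
    """Same toy model as above: y = x0 + 2*x1 + 0.5*x2"""
    return X[:, 0] + 2 * X[:, 1] + 0.5 * X[:, 2]

# Illustrative data; any full-rank design matrix would work here
rng = np.random.default_rng(0)
X_check = rng.standard_normal((200, 10))

# Solve min ||X_check @ coef - model(X_check)||^2 for coef
coef, *_ = np.linalg.lstsq(X_check, model(X_check), rcond=None)

for j, c in enumerate(coef):
    print(f"Feature {j}: coefficient = {c:.4f}")
```

Only the first three coefficients are nonzero, so a sensible importance method should concentrate its mass on Features 0, 1, and 2.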


## Create an Explainer

The `OTExplainer` uses Gaussian optimal transport to compute feature importance:

In [4]:
# Create the explainer
explainer = OTExplainer(
    model,              # The model to explain
    data=X_train,       # Background data
    nsamples=50,        # Monte Carlo samples per feature
)

print("Explainer created!")

Explainer created!
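To give some intuition for the "Gaussian optimal transport" step: for a Gaussian with mean `mu` and covariance `Sigma`, the optimal transport map (under squared Euclidean cost) onto a standard normal is the whitening map `z = Sigma^(-1/2) (x - mu)`. The sketch below implements that map with NumPy on synthetic correlated data. Whether `OTExplainer` performs exactly this computation internally is an assumption, and `gaussian_whitening_map` is a hypothetical helper, not an FDFI function.

```python
import numpy as np

def gaussian_whitening_map(X):
    """Map rows of X to decorrelated coordinates via Sigma^(-1/2) (x - mu).

    mu and Sigma are estimated from X itself.
    """
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Symmetric inverse square root from the eigendecomposition of cov
    eigvals, eigvecs = np.linalg.eigh(cov)
    inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return (X - mu) @ inv_sqrt

# Synthetic correlated features (illustrative mixing matrix)
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.8, 0.0],
                   [0.0, 1.0, 0.5],
                   [0.0, 0.0, 1.0]])
X_corr = rng.standard_normal((500, 3)) @ mixing.T

Z = gaussian_whitening_map(X_corr)
print("Covariance before:\n", np.round(np.cov(X_corr, rowvar=False), 2))
print("Covariance after:\n", np.round(np.cov(Z, rowvar=False), 2))
```

After the map, the sample covariance is the identity, which is the sense in which the Z-space coordinates are "disentangled".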


## Compute Feature Importance

Call the explainer on test data to get feature importance:

In [5]:
# Compute feature importance
results = explainer(X_test)

# Print the results
print("Feature Importance (phi_X):")
for i, phi in enumerate(results["phi_X"]):
    print(f"  Feature {i}: {phi:.4f}")

Feature Importance (phi_X):
  Feature 0: 0.7946
  Feature 1: 1.9705
  Feature 2: 0.1916
  Feature 3: 0.0088
  Feature 4: 0.0178
  Feature 5: 0.0071
  Feature 6: 0.0134
  Feature 7: 0.0029
  Feature 8: 0.0006
  Feature 9: 0.0306
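For comparison, here is a minimal sketch of a much simpler Monte Carlo importance: replace one feature at a time with random draws from the background data and average the squared change in the prediction. This is plain marginal resampling, not FDFI's estimator, and `resampling_importance` is a hypothetical helper. It only illustrates what a "Monte Carlo samples per feature" budget means in practice.

```python
import numpy as np

def model(X):
    """Same toy model as above: y = x0 + 2*x1 + 0.5*x2"""
    return X[:, 0] + 2 * X[:, 1] + 0.5 * X[:, 2]

def resampling_importance(f, X_background, X_explain, nsamples=50, seed=0):
    """Half the mean squared prediction change when feature j is resampled."""
    rng = np.random.default_rng(seed)
    n, d = X_explain.shape
    base = f(X_explain)
    importance = np.zeros(d)
    for j in range(d):
        total = 0.0
        for _ in range(nsamples):
            X_pert = X_explain.copy()
            # Replace column j with random draws from the background data
            X_pert[:, j] = rng.choice(X_background[:, j], size=n)
            total += np.mean((base - f(X_pert)) ** 2)
        importance[j] = 0.5 * total / nsamples
    return importance

rng = np.random.default_rng(42)
X_bg = rng.standard_normal((200, 10))
X_new = rng.standard_normal((20, 10))

imp = resampling_importance(model, X_bg, X_new)
for j, value in enumerate(imp):
    print(f"Feature {j}: {value:.4f}")
```

With this simple scheme the noise features score exactly zero, because the toy model never reads them. The small nonzero values in the FDFI output above presumably come from estimation steps that this sketch omits.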


## Interpret the Results

The `results` dictionary contains:
- `phi_X`: Feature importance in the original X-space
- `phi_Z`: Feature importance in the disentangled Z-space
- `se_X`, `se_Z`: Standard errors for uncertainty quantification

Higher values indicate more important features. Because our model is `x0 + 2*x1 + 0.5*x2`, we expect Features 1, 0, and 2 to rank highest, in that order, and the remaining features to score near zero.

In [6]:
# Sort features by importance
importance = results["phi_X"]
sorted_idx = np.argsort(importance)[::-1]

print("Features ranked by importance:")
for rank, idx in enumerate(sorted_idx):
    print(f"  Rank {rank+1}: Feature {idx} (importance = {importance[idx]:.4f})")

Features ranked by importance:
  Rank 1: Feature 1 (importance = 1.9705)
  Rank 2: Feature 0 (importance = 0.7946)
  Rank 3: Feature 2 (importance = 0.1916)
  Rank 4: Feature 9 (importance = 0.0306)
  Rank 5: Feature 4 (importance = 0.0178)
  Rank 6: Feature 6 (importance = 0.0134)
  Rank 7: Feature 3 (importance = 0.0088)
  Rank 8: Feature 5 (importance = 0.0071)
  Rank 9: Feature 7 (importance = 0.0029)
  Rank 10: Feature 8 (importance = 0.0006)
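One quick way to read a ranking like this is to ask how much of the total importance the top features account for. The sketch below does so with the `phi_X` values printed above, copied in by hand. Treating the scores as shares of a total is an interpretive assumption made here for illustration, not something FDFI prescribes.

```python
import numpy as np

# phi_X values copied from the output above
phi_X = np.array([0.7946, 1.9705, 0.1916, 0.0088, 0.0178,
                  0.0071, 0.0134, 0.0029, 0.0006, 0.0306])

# Normalize so the scores sum to 1
shares = phi_X / phi_X.sum()

# Indices of the three largest scores, in descending order
top3 = np.argsort(phi_X)[::-1][:3]
top3_share = shares[top3].sum()

print("Top-3 features:", top3.tolist())
print(f"Share of total importance: {top3_share:.1%}")
```

In this run the three true signal features carry roughly 97% of the total, and the seven noise features split the remainder.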


## Get Confidence Intervals

FDFI provides statistical inference via `conf_int()`:

In [7]:
# Compute confidence intervals
ci = explainer.conf_int(
    alpha=0.05,           # 95% confidence level
    target="X",           # Use X-space importance
    alternative="greater" # Test if importance > 0
)

print("\nConfidence Intervals (95%, one-sided):")
print("-" * 60)
print(f"{'Feature':>8} {'Estimate':>10} {'SE':>10} {'CI Lower':>10} {'P-value':>10}")
print("-" * 60)
for i in range(n_features):
    sig = "*" if ci["reject_null"][i] else ""
    print(f"{i:>8} {ci['phi_hat'][i]:>10.4f} {ci['se'][i]:>10.4f} "
          f"{ci['ci_lower'][i]:>10.4f} {ci['pvalue'][i]:>10.4f} {sig}")

print("\n* = significant at alpha=0.05")


Confidence Intervals (95%, one-sided):
------------------------------------------------------------
 Feature   Estimate         SE   CI Lower    P-value
------------------------------------------------------------
       0     0.7946     0.3302     0.2515     0.2051 
       1     1.9705     0.6306     0.9333     0.0108 *
       2     0.1916     0.1941    -0.1277     0.9560 
       3     0.0088     0.1815    -0.2897     0.9977 
       4     0.0178     0.1815    -0.2808     0.9973 
       5     0.0071     0.1815    -0.2914     0.9977 
       6     0.0134     0.1815    -0.2851     0.9975 
       7     0.0029     0.1815    -0.2956     0.9979 
       8     0.0006     0.1815    -0.2979     0.9980 
       9     0.0306     0.1816    -0.2681     0.9966 

* = significant at alpha=0.05
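The printed numbers are consistent with a normal approximation: a one-sided lower bound `estimate - z * SE`, and a p-value for the hypothesis that importance exceeds a threshold. The sketch below reproduces the first three rows using only the standard library. The threshold is taken to be the "practical margin" that `summary()` reports for this run (0.5226). This is inferred from the output, not confirmed from the FDFI source.

```python
from statistics import NormalDist

# Estimates and standard errors for Features 0-2, copied from the table above
estimates = [0.7946, 1.9705, 0.1916]
std_errs = [0.3302, 0.6306, 0.1941]

alpha = 0.05
margin = 0.5226  # assumed null threshold: the practical margin shown by summary()

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha)  # one-sided critical value

# One-sided lower confidence bound
ci_lower = [est - z_crit * se for est, se in zip(estimates, std_errs)]

# One-sided p-value for H1: importance > margin
pvalues = [1 - std_normal.cdf((est - margin) / se)
           for est, se in zip(estimates, std_errs)]

for i, (lower, p) in enumerate(zip(ci_lower, pvalues)):
    print(f"Feature {i}: CI lower = {lower:.4f}, p-value = {p:.4f}")
```

Under this reading, Feature 0 has a lower bound above zero yet is not flagged, because its estimate is less than one standard error above the margin.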


## View Summary

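Use the built-in `summary()` method to print the same results as a formatted table. The header also reports a practical margin (0.5226 in this run); the p-values appear to be computed against that margin rather than against zero, which would explain why Feature 0 has a positive lower confidence bound yet is not flagged as significant: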

In [8]:
# Print formatted summary
explainer.summary(alpha=0.05, alternative="greater")

Feature Importance Results
Method: OTExplainer
Number of features: 10
Significance level: 0.05
Alternative: greater
Practical margin: 0.5226
------------------------------------------------------------------------------
 Feature   Estimate    Std Err   CI Lower   CI Upper    P-value   Sig
------------------------------------------------------------------------------
       0     0.7946     0.3302     0.2515        inf     0.2051      
       1     1.9705     0.6306     0.9333        inf     0.0108    **
       2     0.1916     0.1941    -0.1277        inf     0.9560      
       3     0.0088     0.1815    -0.2897        inf     0.9977      
       4     0.0178     0.1815    -0.2808        inf     0.9973      
       5     0.0071     0.1815    -0.2914        inf     0.9977      
       6     0.0134     0.1815    -0.2851        inf     0.9975      
       7     0.0029     0.1815    -0.2956        inf     0.9979      
       8     0.0006     0.1815    -0.2979        inf     0.9980      
       9     0.0306     0.1816    -0.2681        inf     0.9966      



## Next Steps

Now that you've learned the basics, check out these tutorials:

- **OT Explainer Deep Dive**: Learn more about the Gaussian OT method
- **EOT Explainer**: Entropic OT for non-Gaussian data
- **Confidence Intervals**: Advanced statistical inference