In [1]:
import fdfi
print('FDFI version:', fdfi.__version__)

FDFI version: 0.0.2


# EOTExplainer: Entropic Optimal Transport

This tutorial covers the `EOTExplainer`, with a **mixed-type feature-importance** example as the primary workflow.

## What You'll Learn

1. Why EOT is useful for mixed-type data
2. How to use Gower cost with explicit feature types
3. How to run one-sided attribution inference with practical margins
4. How stochastic transport and target choice change attribution


In [2]:
import numpy as np
from fdfi.explainers import EOTExplainer

np.random.seed(42)


## Why Entropic OT?

Gaussian OT assumes continuous, approximately Gaussian structure. For mixed-type data (continuous + binary + categorical), **Entropic OT** with a flexible cost matrix is often more appropriate.

EOTExplainer uses the **Sinkhorn algorithm** to solve:

$$P^* = \arg\min_P \langle C, P \rangle + \epsilon H(P)$$

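where $C$ is the cost matrix, $P$ is the transport plan (constrained to the source and target marginals), $\epsilon > 0$ sets the regularization strength, and $H$ is the entropic regularizer. With the plus sign used here, $H$ denotes negative entropy, so a larger $\epsilon$ yields a smoother, more diffuse plan.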
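To make the objective concrete, here is a minimal NumPy sketch of Sinkhorn scaling on a small problem. It is an illustration under assumptions, not `fdfi`'s internal solver: the helper name `sinkhorn_plan`, the uniform marginals, the fixed iteration count, and the squared Euclidean cost are all choices made for this example.

```python
import numpy as np


def sinkhorn_plan(C, a, b, epsilon, n_iter=1000):
    """Entropic OT plan via Sinkhorn scaling (illustrative helper, not fdfi code)."""
    K = np.exp(-C / epsilon)            # Gibbs kernel built from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                 # rescale rows toward the source marginal
        v = b / (K.T @ u)               # rescale columns toward the target marginal
    return u[:, None] * K * v[None, :]  # P = diag(u) K diag(v)


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))             # source points
y = rng.normal(size=(6, 2))             # target points
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
a = np.full(5, 1 / 5)                   # uniform source weights
b = np.full(6, 1 / 6)                   # uniform target weights

P = sinkhorn_plan(C, a, b, epsilon=1.0)
print("row sums:", np.round(P.sum(axis=1), 4))
print("col sums:", np.round(P.sum(axis=0), 4))
```

The row and column sums of `P` should match `a` and `b`, which is the marginal constraint on the transport plan. A production solver would typically add a convergence check and log-domain updates for small $\epsilon$.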


## Primary Example: Mixed-Type Data with Known Active Features

We build a dataset with continuous, binary, and categorical variables.
Only two features are truly active in the model, so we can check whether EOT attribution inference highlights them.


In [3]:
# Mixed-type synthetic data
n_train = 400
n_test = 120

feature_names = [
    "cont_0", "cont_1", "cont_2",
    "bin_0", "bin_1",
    "cat_0", "cat_1", "cat_2",
]
feature_types = np.array([
    "continuous", "continuous", "continuous",
    "binary", "binary",
    "categorical", "categorical", "categorical",
])
active_idx = np.array([0, 7])

rng = np.random.default_rng(42)

X_train = np.column_stack([
    rng.normal(size=n_train),
    rng.normal(size=n_train),
    rng.normal(size=n_train),
    rng.integers(0, 2, size=n_train),
    rng.integers(0, 2, size=n_train),
    rng.integers(0, 3, size=n_train),
    rng.integers(0, 4, size=n_train),
    rng.integers(0, 5, size=n_train),
]).astype(float)

X_test = np.column_stack([
    rng.normal(size=n_test),
    rng.normal(size=n_test),
    rng.normal(size=n_test),
    rng.integers(0, 2, size=n_test),
    rng.integers(0, 2, size=n_test),
    rng.integers(0, 3, size=n_test),
    rng.integers(0, 4, size=n_test),
    rng.integers(0, 5, size=n_test),
]).astype(float)

print("Train shape:", X_train.shape)
print("Test shape:", X_test.shape)
print("Active feature indices:", active_idx.tolist())
print("Active feature names:", [feature_names[i] for i in active_idx])


Train shape: (400, 8)
Test shape: (120, 8)
Active feature indices: [0, 7]
Active feature names: ['cont_0', 'cat_2']


In [4]:
# Model with only two active features: cont_0 and cat_2
def mixed_model(X):
    y = 2.8 * X[:, 0]
    y += 3.2 * (X[:, 7] >= 3).astype(float)
    return y

# Quick sanity check
y_preview = mixed_model(X_test[:5])
print("Preview predictions:", np.round(y_preview, 3))


Preview predictions: [-0.341 -1.312  8.831  1.537  2.033]


## EOT with Gower Cost (Mixed-Type Aware)

Use Gower distance with explicit feature types. This is the recommended baseline for mixed-type tabular data.
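For intuition, the following sketch computes a textbook Gower dissimilarity from a `feature_types` array like the one defined earlier. The helper `gower_cost` is hypothetical and written for this illustration; `fdfi`'s own cost may differ in details such as how ranges are estimated or how features are weighted.

```python
import numpy as np


def gower_cost(X, Y, feature_types):
    """Pairwise Gower dissimilarity between rows of X and rows of Y (sketch)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    is_cont = np.asarray(feature_types) == "continuous"
    cost = np.zeros((X.shape[0], Y.shape[0]))

    # Continuous features: absolute difference scaled by the pooled range.
    if is_cont.any():
        Xc, Yc = X[:, is_cont], Y[:, is_cont]
        pooled = np.vstack([Xc, Yc])
        span = pooled.max(axis=0) - pooled.min(axis=0)
        span[span == 0] = 1.0  # constant columns contribute zero distance
        cost += (np.abs(Xc[:, None, :] - Yc[None, :, :]) / span).sum(axis=-1)

    # Binary and categorical features: 0 if equal, 1 if different.
    if (~is_cont).any():
        Xd, Yd = X[:, ~is_cont], Y[:, ~is_cont]
        cost += (Xd[:, None, :] != Yd[None, :, :]).sum(axis=-1)

    return cost / len(feature_types)  # average over all features


types = np.array(["continuous", "binary", "categorical"])
X_small = np.array([[0.0, 1, 2], [1.0, 0, 2]])
Y_small = np.array([[0.5, 1, 0]])
D = gower_cost(X_small, Y_small, types)
print(D)
```

Each entry lies in $[0, 1]$ because every feature contributes at most 1 before averaging, which keeps continuous and categorical features on a comparable scale.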


In [5]:
explainer_gower = EOTExplainer(
    mixed_model,
    data=X_train,
    nsamples=60,
    cost_metric="gower",
    feature_types=feature_types,
    auto_epsilon=True,
    target="empirical",
    stochastic_transport=True,
    n_transport_samples=8,
    random_state=0,
)

results_gower = explainer_gower(X_test)
phi = results_gower["phi_X"]
active_mask = np.isin(np.arange(len(feature_names)), active_idx)

print("Feature importance (phi_X):")
print("-" * 75)
print(f"{'Idx':>3} {'Feature':>10} {'Type':>12} {'phi_X':>10} {'Active':>8}")
print("-" * 75)
for i, (name, ftype) in enumerate(zip(feature_names, feature_types)):
    active_tag = "Yes" if i in active_idx else "No"
    print(f"{i:>3} {name:>10} {ftype:>12} {phi[i]:>10.4f} {active_tag:>8}")

top2 = np.argsort(phi)[::-1][:2]
ratio = phi[active_mask].mean() / phi[~active_mask].mean()
print("\nTop-2 features by phi_X:", [feature_names[i] for i in top2])
print(f"Active/null mean ratio: {ratio:.2f}x")
print(f"Auto epsilon: {explainer_gower.epsilon:.4f}")
print("Decoder:", getattr(explainer_gower, "decode_method_effective_", explainer_gower.decode_method))


Feature importance (phi_X):
---------------------------------------------------------------------------
Idx    Feature         Type      phi_X   Active
---------------------------------------------------------------------------
  0     cont_0   continuous     7.9944      Yes
  1     cont_1   continuous     0.5579       No
  2     cont_2   continuous     0.6004       No
  3      bin_0       binary     0.1236       No
  4      bin_1       binary     0.1128       No
  5      cat_0  categorical     0.2845       No
  6      cat_1  categorical     0.5244       No
  7      cat_2  categorical     4.0868      Yes

Top-2 features by phi_X: ['cont_0', 'cat_2']
Active/null mean ratio: 16.45x
Auto epsilon: 0.4781
Decoder: knn


## Active-Feature Attribution Inference (One-Sided Test)

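We test each feature one-sidedly (`alternative="greater"`) against a fixed practical margin (`margin=1.2`), so a feature is flagged only when its importance is significantly above the margin rather than merely above zero. This encourages sparse attribution calls.
In this synthetic example, the setup should flag mostly the truly active features.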


In [6]:
ci = explainer_gower.conf_int(
    alpha=0.05,
    target="X",
    alternative="greater",
    margin_method="fixed",
    margin=1.2,
    var_floor_method="fixed",
    var_floor_c=0.0,
)

attribution_idx = np.where(ci["reject_null"])[0]
expected_active = set(active_idx.tolist())
attribution_set = set(attribution_idx.tolist())

print("Attribution-positive feature indices:", attribution_idx.tolist())
print("Attribution-positive feature names:", [feature_names[i] for i in attribution_idx])
print("Expected active names:", [feature_names[i] for i in active_idx])
print("True positives:", sorted(expected_active & attribution_set))
print("False positives:", sorted(attribution_set - expected_active))


Attribution-positive feature indices: [0, 7]
Attribution-positive feature names: ['cont_0', 'cat_2']
Expected active names: ['cont_0', 'cat_2']
True positives: [0, 7]
False positives: []


## Compare Cost Metrics: Gower vs Squared Euclidean

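For mixed-type data, Gower cost (range-scaled differences for continuous features, simple mismatch for binary and categorical ones) typically provides more appropriate geometry than squared Euclidean cost, which treats integer category codes as numeric magnitudes. In this example the two costs yield nearly identical attributions and the same top-2 features, so the difference is expected to matter more when category codes or feature scales vary widely.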
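A small arithmetic illustration of why integer-coded categories can distort a squared Euclidean cost. The three rows are made up for this example and differ only in one categorical feature; the mismatch score is a simplified Gower-style measure, not a call into `fdfi`.

```python
import numpy as np

# Three rows that differ only in the last (categorical) feature.
a = np.array([0.0, 1.0, 0.0])   # category code 0
b = np.array([0.0, 1.0, 1.0])   # category code 1
c = np.array([0.0, 1.0, 4.0])   # category code 4

# Squared Euclidean cost treats the codes as magnitudes.
sq_ab = float(np.sum((a - b) ** 2))
sq_ac = float(np.sum((a - c) ** 2))

# Gower-style mismatch: any differing category counts as 1, averaged over features.
gower_ab = float(np.mean(a != b))
gower_ac = float(np.mean(a != c))

print("squared Euclidean:", sq_ab, sq_ac)
print("Gower-style mismatch:", gower_ab, gower_ac)
```

Under squared Euclidean cost, code 4 is sixteen times farther from code 0 than code 1 is, even though the codes carry no order. The mismatch score treats both pairs identically.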


In [7]:
explainer_sq = EOTExplainer(
    mixed_model,
    data=X_train,
    nsamples=60,
    cost_metric="sqeuclidean",
    auto_epsilon=True,
    target="empirical",
    stochastic_transport=True,
    n_transport_samples=8,
    random_state=0,
)

results_sq = explainer_sq(X_test)
phi_sq = results_sq["phi_X"]

print(f"{'Feature':>10} {'Gower':>12} {'SqEuclid':>12}")
print("-" * 38)
for i, name in enumerate(feature_names):
    print(f"{name:>10} {phi[i]:>12.4f} {phi_sq[i]:>12.4f}")

print("\nTop-2 (Gower):", [feature_names[i] for i in np.argsort(phi)[::-1][:2]])
print("Top-2 (SqEuclid):", [feature_names[i] for i in np.argsort(phi_sq)[::-1][:2]])


   Feature        Gower     SqEuclid
--------------------------------------
    cont_0       7.9944       8.0813
    cont_1       0.5579       0.5402
    cont_2       0.6004       0.6008
     bin_0       0.1236       0.1333
     bin_1       0.1128       0.1146
     cat_0       0.2845       0.2880
     cat_1       0.5244       0.5352
     cat_2       4.0868       4.1496

Top-2 (Gower): ['cont_0', 'cat_2']
Top-2 (SqEuclid): ['cont_0', 'cat_2']


## Stochastic vs Deterministic Transport

Stochastic transport samples from the transport kernel instead of using only barycentric averages.
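The distinction can be sketched with a toy transport plan. Everything here is hypothetical (the random plan `P`, the target points, and the number of draws); it only illustrates the difference between a barycentric average and sampling from the kernel, not how `EOTExplainer` implements either.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical transport plan between 3 source points and 4 target points.
P = rng.random((3, 4))
P /= P.sum()                                   # joint plan sums to 1
targets = rng.normal(size=(4, 2))              # target support points

# Conditional kernel: each row is a distribution over target points.
kernel = P / P.sum(axis=1, keepdims=True)

# Deterministic (barycentric) map: kernel-weighted mean of the targets.
barycentric = kernel @ targets                 # shape (3, 2)

# Stochastic map: draw target indices from each row of the kernel.
n_draws = 8
draws = np.stack([
    targets[rng.choice(len(targets), size=n_draws, p=row)]
    for row in kernel
])                                             # shape (3, n_draws, 2)

print("barycentric shape:", barycentric.shape)
print("stochastic draws shape:", draws.shape)
```

Barycentric points are averages and can fall between valid values (for example, a fractional category code), whereas sampled points are always actual target points.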


In [8]:
explainer_det = EOTExplainer(
    mixed_model,
    data=X_train,
    nsamples=60,
    cost_metric="gower",
    feature_types=feature_types,
    auto_epsilon=True,
    target="empirical",
    stochastic_transport=False,
    random_state=0,
)
results_det = explainer_det(X_test)
phi_det = results_det["phi_X"]

explainer_stoch = EOTExplainer(
    mixed_model,
    data=X_train,
    nsamples=60,
    cost_metric="gower",
    feature_types=feature_types,
    auto_epsilon=True,
    target="empirical",
    stochastic_transport=True,
    n_transport_samples=8,
    random_state=0,
)
results_stoch = explainer_stoch(X_test)
phi_stoch = results_stoch["phi_X"]

print(f"{'Feature':>10} {'Deterministic':>14} {'Stochastic':>14}")
print("-" * 42)
for i, name in enumerate(feature_names):
    print(f"{name:>10} {phi_det[i]:>14.4f} {phi_stoch[i]:>14.4f}")

active_mask = np.isin(np.arange(len(feature_names)), active_idx)
ratio_det = phi_det[active_mask].mean() / phi_det[~active_mask].mean()
ratio_stoch = phi_stoch[active_mask].mean() / phi_stoch[~active_mask].mean()
print(f"\nActive/null ratio (deterministic): {ratio_det:.2f}x")
print(f"Active/null ratio (stochastic):    {ratio_stoch:.2f}x")


   Feature  Deterministic     Stochastic
------------------------------------------
    cont_0         8.0457         7.9944
    cont_1         1.0420         0.5579
    cont_2         0.7713         0.6004
     bin_0         0.2159         0.1236
     bin_1         0.1992         0.1128
     cat_0         0.3964         0.2845
     cat_1         0.8258         0.5244
     cat_2         5.7677         4.0868

Active/null ratio (deterministic): 12.01x
Active/null ratio (stochastic):    16.45x


## Target Distribution: Gaussian vs Empirical

EOT can transport toward either a Gaussian latent target or a permuted empirical target.
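As a rough picture of the two target types, the sketch below builds both from a small mixed-type matrix. The construction of the permuted empirical target (shuffling each column independently) is an assumption about what "permuted" means here, not a description of `fdfi`'s internals.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(size=200),            # continuous
    rng.integers(0, 2, size=200),    # binary
    rng.integers(0, 5, size=200),    # categorical
]).astype(float)

# Gaussian target: independent standard-normal coordinates.
Z_gaussian = rng.standard_normal(X.shape)

# Permuted empirical target (assumed construction): shuffle each column on its
# own, which keeps every marginal but breaks dependence between features.
Z_empirical = np.column_stack(
    [rng.permutation(X[:, j]) for j in range(X.shape[1])]
)

print("Gaussian target values in column 2:", np.round(Z_gaussian[:3, 2], 3))
print("Empirical target values in column 2:", Z_empirical[:3, 2])
```

Under this construction the empirical target keeps binary and categorical columns on their original support, while the Gaussian target replaces them with continuous values.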


In [9]:
explainer_gaussian = EOTExplainer(
    mixed_model,
    data=X_train,
    nsamples=60,
    cost_metric="gower",
    feature_types=feature_types,
    auto_epsilon=True,
    target="gaussian",
    stochastic_transport=True,
    n_transport_samples=8,
    random_state=0,
)

explainer_empirical = EOTExplainer(
    mixed_model,
    data=X_train,
    nsamples=60,
    cost_metric="gower",
    feature_types=feature_types,
    auto_epsilon=True,
    target="empirical",
    stochastic_transport=True,
    n_transport_samples=8,
    random_state=0,
)

results_g = explainer_gaussian(X_test)
results_e = explainer_empirical(X_test)
phi_g = results_g["phi_X"]
phi_e = results_e["phi_X"]

print(f"{'Feature':>10} {'Gaussian':>12} {'Empirical':>12}")
print("-" * 38)
for i, name in enumerate(feature_names):
    print(f"{name:>10} {phi_g[i]:>12.4f} {phi_e[i]:>12.4f}")

print("\nTop-2 (Gaussian):", [feature_names[i] for i in np.argsort(phi_g)[::-1][:2]])
print("Top-2 (Empirical):", [feature_names[i] for i in np.argsort(phi_e)[::-1][:2]])


   Feature     Gaussian    Empirical
--------------------------------------
    cont_0       8.0781       7.9944
    cont_1       0.5678       0.5579
    cont_2       0.5628       0.6004
     bin_0       0.1550       0.1236
     bin_1       0.1415       0.1128
     cat_0       0.3663       0.2845
     cat_1       0.5932       0.5244
     cat_2       4.1552       4.0868

Top-2 (Gaussian): ['cont_0', 'cat_2']
Top-2 (Empirical): ['cont_0', 'cat_2']


## Diagnostics and Standardized Summary

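Use the shared diagnostics and `summary()` to inspect transport quality and feature-level inference before interpreting attributions. In the run below, both Gower explainers are labeled POOR on latent independence (dCor) and distribution fidelity (MMD), while the squared Euclidean explainer is labeled GOOD, so the Gower attributions here warrant extra caution even though all three explainers rank the same two features highest.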
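To give a feel for what a distribution-fidelity score such as MMD measures, here is a minimal sketch of a biased squared-MMD estimate with a Gaussian kernel. The helper `rbf_mmd2`, the bandwidth of 1.0, and the synthetic samples are assumptions for illustration; the kernel and thresholds behind `fdfi`'s diagnostic labels may differ.

```python
import numpy as np


def rbf_mmd2(X, Y, bandwidth=1.0):
    """Biased estimate of squared MMD with a Gaussian (RBF) kernel (sketch)."""
    def gram(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq / (2.0 * bandwidth ** 2))

    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()


rng = np.random.default_rng(0)
A = rng.normal(size=(200, 3))
B = rng.normal(size=(200, 3))             # same distribution as A
C = rng.normal(loc=1.5, size=(200, 3))    # shifted distribution

mmd_same = rbf_mmd2(A, B)
mmd_shifted = rbf_mmd2(A, C)
print(f"same distribution:    {mmd_same:.4f}")
print(f"shifted distribution: {mmd_shifted:.4f}")
```

Values near zero indicate that the two samples are hard to tell apart under the chosen kernel; larger values indicate a mismatch between the distributions.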


In [10]:
print("Diagnostics comparison")
print("-" * 80)
for name, exp in [
    ("Gower-stoch", explainer_gower),
    ("Gower-det", explainer_det),
    ("SqEuclid-stoch", explainer_sq),
]:
    d = exp.diagnostics
    print(
        f"{name:>14}: dCor={d['latent_independence_median']:.6f} [{d['latent_independence_label']}], "
        f"MMD={d['distribution_fidelity_mmd']:.6f} [{d['distribution_fidelity_label']}]"
    )

print("\nGower-stochastic summary (one-sided attribution inference)")
_ = explainer_gower.summary(
    alpha=0.05,
    target="X",
    alternative="greater"
)


Diagnostics comparison
--------------------------------------------------------------------------------
   Gower-stoch: dCor=0.348636 [POOR], MMD=0.409461 [POOR]
     Gower-det: dCor=0.348636 [POOR], MMD=0.409461 [POOR]
SqEuclid-stoch: dCor=0.073760 [GOOD], MMD=0.007061 [GOOD]

Gower-stochastic summary (one-sided attribution inference)
Feature Importance Results
Method: EOTExplainer
Number of features: 8
Significance level: 0.05
Alternative: greater
Practical margin: 1.3631
------------------------------------------------------------------------------
 Feature   Estimate    Std Err   CI Lower   CI Upper    P-value   Sig
------------------------------------------------------------------------------
       0     7.9944     1.1047     6.1773        inf     0.0000   ***
       1     0.5579     0.3790    -0.0656        inf     0.9832      
       2     0.6004     0.3926    -0.0454        inf     0.9740      
       3     0.1236     0.3496    -0.4514        inf     0.9998      
       4     
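
The visible rows of the summary table are consistent with a normal-approximation one-sided test against the practical margin. The sketch below recomputes the lower confidence bound and p-value for features 0 to 3 from the printed estimates and standard errors. The formula is inferred from the printed numbers and is an assumption, not documented `fdfi` behavior.

```python
from statistics import NormalDist

alpha = 0.05
margin = 1.3631  # "Practical margin" printed by summary()

# (estimate, standard error) for features 0-3, copied from the table above.
rows = {
    0: (7.9944, 1.1047),
    1: (0.5579, 0.3790),
    2: (0.6004, 0.3926),
    3: (0.1236, 0.3496),
}

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha)   # one-sided critical value

results = {}
for idx, (est, se) in rows.items():
    z = (est - margin) / se              # H0: importance <= margin
    p_value = 1.0 - std_normal.cdf(z)    # upper-tail probability
    ci_lower = est - z_crit * se         # one-sided lower confidence bound
    results[idx] = (ci_lower, p_value, p_value < alpha)
    print(f"{idx}: CI lower={ci_lower:.4f}, p={p_value:.4f}, reject={p_value < alpha}")
```

Because the inputs are rounded to four decimals, the recomputed values can differ from the table in the last digit. Of these four features, only feature 0 clears the margin.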

## Best Practices for EOTExplainer

```python
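import numpy as np
from fdfi.explainers import EOTExplainer

# Placeholders: `model` is any callable mapping X to predictions; X_train,
# X_test, and feature_types are defined as in the mixed-type example.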
explainer = EOTExplainer(
    model,
    data=X_train,
    nsamples=60,
    cost_metric="gower",      # mixed-type data
    feature_types=feature_types,
    auto_epsilon=True,          # start here
    target="empirical",        # often stable for tabular data
    stochastic_transport=True,  # sample transport kernel
    n_transport_samples=8,
    random_state=0,             # fix the seed for reproducible sampling
)

results = explainer(X_test)

ci = explainer.conf_int(
    alpha=0.05,
    target="X",
    alternative="greater"
)
attribution_idx = np.where(ci["reject_null"])[0]
```


## Summary

Key takeaways:

1. Mixed-type data is a natural use case for `EOTExplainer` with Gower cost.
2. Defining known active features in synthetic data helps validate attribution behavior.
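3. One-sided testing with a practical margin can make the set of attribution-positive features much sparser.
4. Stochastic transport and target choice can change attribution magnitudes even when the top-ranked features stay the same (here, the active/null ratio was 16.45x with stochastic transport versus 12.01x with deterministic transport).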
5. Use shared diagnostics and `summary()` to validate transport quality before interpretation.
