In [None]:
import logging

import arviz as az
import numpy as np
import pymc as pm

from bedroc import debug_logger
from bedroc.hierarchical import (
    Analyzer,
    SyntheticDataGenerator,
    hierarchical_difference_model,
    zero_difference_model,
)

logger = debug_logger()
logger.setLevel(logging.INFO)

# Hierarchical Bayesian analysis for multi-feature data

We apply a hierarchical Bayesian modeling framework to multi-feature data in order to infer systematic differences between two groups, A and B. Each data point contains multiple measured features, and these features may vary widely in scale, noise level, or informativeness. A hierarchical formulation allows the model to borrow statistical strength across features, leading to more stable and interpretable inferences.

In the model, each feature has its own mean in group A and an associated mean difference describing how group B deviates from group A. These per-feature differences are not estimated independently; instead, they are linked through a higher-level distribution controlled by a global scale parameter. This induces partial pooling, which automatically shrinks poorly constrained differences toward zero while allowing genuinely strong signals to stand out.
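
To make the shrinkage concrete, here is a minimal NumPy sketch under a simplifying assumption that is not taken from the model code: each observed difference is normally distributed around the true difference with a known standard error, and the true differences share a Normal(0, tau) prior. Under those assumptions the posterior mean is the observed difference multiplied by `tau**2 / (tau**2 + std_error**2)`. The function name and the numbers are illustrative only.

```python
import numpy as np


def shrunken_difference(observed_diff, std_error, tau):
    """Posterior mean of each difference under a Normal(0, tau) prior.

    Assumes observed_diff[j] ~ Normal(delta[j], std_error[j]) with known
    std_error; this is a textbook normal-normal model, not the bedroc model.
    """
    observed_diff = np.asarray(observed_diff, dtype=float)
    std_error = np.asarray(std_error, dtype=float)
    # Weight is near 1 for well-measured features and near 0 for noisy ones.
    weight = tau**2 / (tau**2 + std_error**2)
    return weight * observed_diff


# Same observed difference, but very different measurement precision.
observed = np.array([2.0, 2.0])
errors = np.array([0.1, 2.0])
shrunk = shrunken_difference(observed, errors, tau=1.0)
print(shrunk)
```

The well-measured feature keeps almost all of its observed difference, while the noisy one is pulled most of the way toward zero.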

This approach provides coherent uncertainty quantification, posterior estimates of feature-level effect sizes, and a principled comparison against a simpler zero-difference model. The hierarchical model is therefore well suited for high-dimensional problems where features differ in variability or sample support.

## Generate and plot synthetic data

Set a random seed for reproducibility.

In [None]:
RANDOM_SEED = 123

We begin by generating synthetic data to explore the behavior of the hierarchical model. The parameters below control the separation between the two groups, the variability of feature means, and the underlying noise levels. You are encouraged to modify these settings to observe how changes in effect size, noise, or feature-level variability influence the inference and model comparison.

In [None]:
data_generator = SyntheticDataGenerator(
    100,
    random_seed=RANDOM_SEED,
    # difference_scale=0.5,
    # type_a_std_of_mean=1.0,
    # type_b_std_of_mean=1.5,
    # sigma_min=0.5,
    # sigma_max=2.0,
)
data_generator.generate()

We next plot the synthetic dataset to visualize the underlying structure of the two groups. The figure displays the raw feature distributions for Types A and B alongside the true parameters used to generate them, allowing us to verify the intended separation and noise characteristics before fitting any models.

In [None]:
_ = data_generator.plot()

The next cell calls the helper function ``hierarchical_difference_model``, which builds the hierarchical Bayesian model and samples from it. It returns the constructed PyMC model object (``model``) and the posterior samples stored in an InferenceData object (``idata``).

In [None]:
model, idata = hierarchical_difference_model(
    data_generator.X_A, data_generator.X_B, random_seed=RANDOM_SEED
)

## Analyze the inference

After running the hierarchical Bayesian model and obtaining posterior samples in ``idata``, we create a data analyzer object. This analyzer helps us inspect and interpret the results.

In [None]:
analyzer = Analyzer(model, idata)

Use the data analyzer to visualize the results of the hierarchical Bayesian inference.

In [None]:
_ = analyzer.plot_prior_predictive()

In [None]:
_ = analyzer.plot_posterior_predictive(thinning_factor=10)

In [None]:
_ = analyzer.plot_posterior(var_names=["mu_A", "mu_B"])

In [None]:
_ = analyzer.plot_posterior_differences()

In [None]:
_ = analyzer.plot_posterior_effect_size()

For the confusion matrix, we generate some out-of-sample data. The goal is to evaluate the model's classification performance on new data that was not seen during training. This allows us to test generalization, avoiding overly optimistic metrics that could arise if we only measured accuracy on the training set.

Steps:

1. Generate new synthetic samples for both Type A and Type B using the same data generator, but ensuring they are independent of the training data.
2. Stack the new samples together to form a single dataset for prediction.
3. Create the corresponding true labels array to compare against model predictions.
4. Pass this dataset to the analyzer's `plot_confusion_matrix` method to visualize how well the model separates Type A and Type B in previously unseen data.
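
How `plot_confusion_matrix` assigns labels is not shown here. As a hedged illustration of the final tallying step only, the sketch below counts true versus predicted labels with NumPy. The `predicted` array is made-up input and `confusion_counts` is a hypothetical helper, not part of `bedroc`.

```python
import numpy as np


def confusion_counts(true_labels, predicted_labels, classes=("A", "B")):
    """Return a matrix whose [i, j] entry counts samples of true class
    classes[i] that were predicted as classes[j]."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    counts = np.zeros((len(classes), len(classes)), dtype=int)
    for i, true_class in enumerate(classes):
        for j, predicted_class in enumerate(classes):
            counts[i, j] = np.sum(
                (true_labels == true_class) & (predicted_labels == predicted_class)
            )
    return counts


# Made-up labels for illustration.
true = np.array(["A", "A", "A", "B", "B"])
predicted = np.array(["A", "B", "A", "B", "B"])
matrix = confusion_counts(true, predicted)
# Accuracy is the share of samples on the diagonal.
accuracy = np.trace(matrix) / matrix.sum()
```

Rows correspond to the true class and columns to the predicted class, so off-diagonal entries are misclassifications.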

In [None]:
X_A_new, X_B_new = data_generator.generate_out_of_sample_data(n_samples=1000)
X_new = np.vstack([X_A_new, X_B_new])
true_labels = np.array(["A"] * len(X_A_new) + ["B"] * len(X_B_new))

_ = analyzer.plot_confusion_matrix(X_new, true_labels)

## Model comparison

We can compare the hierarchical model with a zero-difference model to assess whether the data support feature-wise mean differences between types A and B. The zero-difference model acts as a baseline that encodes the null hypothesis: there are no systematic differences between the two groups beyond noise.

To compare these models, we use Leave-One-Out cross-validation (LOO) based on the pointwise log-likelihood. LOO evaluates how well each model predicts unseen data and automatically penalizes model complexity.
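
For intuition about what LOO estimates, the sketch below computes a plain importance-sampling approximation from a matrix of pointwise log-likelihoods with shape (draws, observations). This is a simplification: `az.loo` uses Pareto-smoothed importance sampling, which stabilizes the weights, so its numbers will generally differ. The function name and the toy input are assumptions made for illustration.

```python
import numpy as np


def naive_loo_elpd(log_lik):
    """Plain importance-sampling LOO estimate of the ELPD.

    log_lik has shape (n_draws, n_observations). For each observation the
    leave-one-out predictive density is approximated by the harmonic mean
    of the per-draw likelihoods, computed in log space for stability.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    neg = -log_lik
    # log(mean(exp(-log_lik))) per observation via the log-sum-exp trick.
    shift = neg.max(axis=0)
    log_mean_inv = shift + np.log(np.exp(neg - shift).sum(axis=0)) - np.log(n_draws)
    pointwise = -log_mean_inv
    return pointwise.sum(), pointwise


# Toy input: every draw assigns the same log-likelihood to each observation,
# so the estimate reduces to the sum of those values.
toy = np.full((4, 3), -1.5)
elpd, pointwise = naive_loo_elpd(toy)
```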

If the hierarchical model shows a higher expected log pointwise predictive density (ELPD), that is, a less negative `elpd_loo`, this supports the existence of feature-level differences between the two groups. If not, the data may not justify the additional hierarchical structure.
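
A difference in ELPD is only meaningful relative to its uncertainty. The table returned by `az.compare` includes `elpd_diff` (the difference from the top-ranked model) and `dse` (the standard error of that difference). The sketch below flags rows whose difference exceeds a chosen multiple of `dse`. The helper name, the threshold of 2, and the hand-made table are assumptions for illustration, not a rule taken from this notebook.

```python
import pandas as pd


def clearly_worse(comparison, n_se=2.0):
    """Return a boolean Series marking models whose ELPD gap to the
    top-ranked model is larger than n_se standard errors of that gap."""
    return comparison["elpd_diff"].abs() > n_se * comparison["dse"]


# Hand-made stand-in for an az.compare result (values are invented).
example = pd.DataFrame(
    {"elpd_diff": [0.0, 12.0], "dse": [0.0, 4.0]},
    index=["hierarchical", "zero difference"],
)
flags = clearly_worse(example)
```

The same function could be applied to `df_comp_loo` once it has been computed.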

In [None]:
zero_model, zero_idata = zero_difference_model(
    data_generator.X_A, data_generator.X_B, random_seed=RANDOM_SEED
)

with model:
    pm.compute_log_likelihood(idata)

with zero_model:
    pm.compute_log_likelihood(zero_idata)

hierarchical_loo = az.loo(idata, var_name="X_obs")
zero_loo = az.loo(zero_idata, var_name="X_obs")

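# Per-model summaries are logged at DEBUG level, so they are hidden while
# the logger level is INFO; lower the level to DEBUG to see them.
logger.debug("Hierarchical model LOO:\n%s", hierarchical_loo)
logger.debug("Zero-difference model LOO:\n%s", zero_loo)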

df_comp_loo = az.compare(
    {"hierarchical": idata, "zero difference": zero_idata},
    ic="loo",
    var_name="X_obs",
)

# Format DataFrame nicely for logging
loo_str = df_comp_loo.to_string(
    float_format="{:.6f}".format,  # show floats with 6 decimal places
    justify="right",  # right-align column headers
    col_space=10,  # minimum column width
)

logger.info("LOO model comparison:\n%s", loo_str)

ArviZ provides `az.plot_compare` to visualize the model comparison table.

In [None]:
_ = az.plot_compare(df_comp_loo, insample_dev=False)