mlbio-epfl/PACER

PACER: Acyclic Causal Discovery from Large-scale Interventional Data

PACER is a causal discovery method that jointly learns a causal ordering (via the Plackett-Luce distribution) and edge probabilities (via Bernoulli gates) from large-scale interventional data.

Figure: PACER models a topological ordering of variables using a Plackett-Luce distribution. Nodes with higher weight are more likely to precede nodes with lower weight in downstream DAGs. Samples from this distribution induce complete DAGs, which are further filtered via samples from independent, edge-specific Bernoulli distributions. This defines our Bernoulli-Plackett-Luce distribution over DAGs. At train time, we sample multiple candidate graphs and score them based on a likelihood-based objective function. We then optimize the parameters of the Bernoulli-Plackett-Luce model using either an analytic estimator or REINFORCE gradient updates (see paper for more details).
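The sampling scheme in the figure can be sketched in a few lines of NumPy. This is an illustrative sketch only, not PACER's implementation: it draws an ordering from a Plackett-Luce distribution via the Gumbel trick (sorting Gumbel-perturbed log-weights), builds the complete DAG the ordering induces, and filters its edges with independent Bernoulli gates. All variable names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5                                        # number of variables (toy size)
log_w = rng.normal(size=d)                   # Plackett-Luce log-weights
edge_p = rng.uniform(0.1, 0.9, size=(d, d))  # edge-wise Bernoulli probabilities

# Gumbel trick: sorting Gumbel-perturbed log-weights in descending order
# draws an ordering from the Plackett-Luce distribution over permutations.
order = np.argsort(-(log_w + rng.gumbel(size=d)))

# The ordering induces a complete DAG: an edge i -> j whenever i precedes j.
rank = np.empty(d, dtype=int)
rank[order] = np.arange(d)                   # rank[v] = position of v in order
complete_dag = (rank[:, None] < rank[None, :]).astype(int)

# Independent, edge-specific Bernoulli gates filter the complete DAG's edges.
gates = (rng.uniform(size=(d, d)) < edge_p).astype(int)
dag = complete_dag * gates                   # one sample from the joint distribution
```

Because every retained edge points forward in the sampled ordering, the resulting graph is acyclic by construction; no explicit acyclicity constraint is needed.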


Table of Contents

  1. Installation
  2. Quick start
  3. Running on synthetic data
  4. Next steps
  5. Citation

Installation

conda create -n pacer python=3.11
conda activate pacer

# JAX with GPU support (adjust cuda version as needed)
pip install "jax[cuda12]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

# Install PACER
pip install -e .

Quick start

from pacer import PACER

# x       : (N, d) float  — expression matrix
# masks   : (N, d) float  — 1 = not intervened, 0 = intervened on
# regimes : (N,)   int    — 0 = observational, k>0 = k-th interventional regime

pacer = PACER(
    n_vars       = d,         # number of genes / variables
    n_layers     = 2,         # MLP depth
    hdim         = 4,         # MLP hidden dimension
    n_steps      = 5000,      # optimisation steps
    lr           = 1e-2,      # learning rate
    batch_size   = 64,        # batch size
    n_mc_samples = 200,       # REINFORCE MC samples
    lambd        = 1.0,       # sparsity regularisation
    seed         = 0,         # random seed
    fit_analytic = False,     # whether to estimate the causal graph using the analytic method (currently supports linear-Gaussian mechanisms only)
)

pacer.fit(x_train, masks_train, regimes_train,
          x_val=x_val, masks_val=masks_val, regimes_val=regimes_val)

edge_probs = pacer.predict_proba()   # (d, d) — P(i → j)
pred_dag   = pacer.predict(threshold=0.5)   # (d, d) binary adjacency
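The expected encoding of x, masks, and regimes (described in the comments above) can be illustrated with a tiny toy dataset. The values below are hypothetical and only show the shapes and conventions:

```python
import numpy as np

# Hypothetical toy dataset: N = 6 samples over d = 3 variables.
N, d = 6, 3
rng = np.random.default_rng(0)
x = rng.normal(size=(N, d)).astype(np.float32)   # (N, d) expression matrix

# masks[n, j] = 0 marks variable j as intervened on in sample n; 1 otherwise.
masks = np.ones((N, d), dtype=np.float32)
masks[2:4, 0] = 0.0            # samples 2-3: intervention on variable 0
masks[4:6, 2] = 0.0            # samples 4-5: intervention on variable 2

# regimes groups samples by intervention target: 0 = observational,
# k > 0 = k-th interventional regime.
regimes = np.array([0, 0, 1, 1, 2, 2], dtype=np.int32)
```

Arrays in this layout can then be split into train/validation sets and passed to `pacer.fit` as shown above.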

Key hyperparameters

Parameter      Default  Description
n_layers       2        MLP depth (0 = linear)
hdim           4        Hidden dimension
n_steps        5000     Gradient steps
lr             1e-2     Adam learning rate
batch_size     64       Mini-batch size
n_mc_samples   200      REINFORCE MC samples
lambd          1.0      Sparsity weight (larger → sparser graph)
fit_analytic   False    Use the analytic estimator instead of REINFORCE; currently supports linear-Gaussian mechanisms only
mask_TF_path   None     Path to a TSV file listing transcription factors, used to restrict parent candidates

Running on synthetic data

Interactive notebook

Open examples/demo_synthetic.ipynb for a self-contained walkthrough:

  • generates a random DAG and linear-Gaussian interventional data
  • fits PACER
  • evaluates SHD / precision / recall
  • visualises the inferred vs ground-truth network
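The evaluation metrics used in the notebook can be computed directly from binary adjacency matrices. The sketch below is a minimal stand-alone version (not the notebook's exact implementation): SHD counts edge insertions, deletions, and reversals, while precision and recall treat each directed edge as a binary prediction.

```python
import numpy as np

def shd(pred, true):
    """Structural Hamming distance between binary adjacency matrices:
    number of edge insertions, deletions, and reversals."""
    diff = np.abs(pred - true)
    # A reversed edge contributes two entries to `diff` (one missing edge,
    # one extra edge); count each reversal only once.
    reversals = ((pred == 1) & (true == 0) & (pred.T == 0) & (true.T == 1)).sum()
    return int(diff.sum() - reversals)

def precision_recall(pred, true):
    """Edge-level precision and recall of `pred` against `true`."""
    tp = int(((pred == 1) & (true == 1)).sum())
    fp = int(((pred == 1) & (true == 0)).sum())
    fn = int(((pred == 0) & (true == 1)).sum())
    return tp / max(tp + fp, 1), tp / max(tp + fn, 1)
```

For example, predicting a single edge in the wrong direction gives SHD 1 (one reversal) rather than 2.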

Generate DCDI datasets

You can generate new synthetic datasets with the generate_data.py script from the DCDI repo:

cd dcdi/data/generation

python generate_data.py \
    --mechanism linear \
    --intervention-type structural \
    --initial-cause gaussian \
    --noise gaussian \
    --nb-nodes 20 \
    --expected-degree 4 \
    --nb-dag 3 \
    --nb-points 5000 \
    --rescale \
    --suffix "my_experiment" \
    --intervention \
    --obs-data

Next steps

We welcome contributions. A few extensions that would be valuable to implement:

  • Imperfect interventions.
  • Unknown intervention targets.
  • Multivariate Normal models (analytic estimator).
  • General models (analytic estimator).

Please see the manuscript Appendix for an outline of these extensions.


Citation

@article{vinas2026pacer,
  title   = {PACER: Acyclic Causal Discovery from Large-scale Interventional Data},
  author  = {Vi{\~n}as Torn{\'e}, Ramon and F{\`a}bregas Salazar, S\'{\i}lvia and Park, Soyon and Ban, Ivo Alexander and Gadetsky, Artyom and Doikov, Nikita and Brbi{\'c}, Maria},
  journal = {Proceedings of the 43rd International Conference on Machine Learning (ICML)},
  year    = {2026},
}
