Dataset-driven experiments for causal extraction via counterfactual probing
and meta-causality. The repository starts with a lightweight harness that runs
in the existing orpheus conda environment without installing new packages.
- Everything should run locally on the Mac notebook in the orpheus conda environment.
- Do not rely on paid LLM APIs, hosted model APIs, cloud processing, or external services for core experiments.
- Do not replace the existing Mac-specific PyTorch install.
- Larger benchmark integrations should always keep a tiny local fixture path and a modest smoke-test path.
- Prefer small PyTorch models, seed sweeps, and careful metrics over compute-heavy architectures that cannot run locally in a reasonable time.
- CI is only a regression guardrail. It should run lightweight tests and smoke checks, not full research sweeps.
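The "small PyTorch models plus seed sweeps" principle above can be sketched as follows. This is a minimal illustration, not the repo's actual harness; the model, data, and seed values are all made up for the example:

```python
import statistics
import torch
import torch.nn as nn

def train_tiny_model(seed: int, X: torch.Tensor, y: torch.Tensor) -> float:
    """Train a tiny linear classifier from a fixed seed; return train accuracy."""
    torch.manual_seed(seed)
    model = nn.Linear(X.shape[1], 2)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(200):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(X), y)
        loss.backward()
        opt.step()
    return (model(X).argmax(dim=1) == y).float().mean().item()

# Sweep a few seeds and report mean/std rather than trusting a single run.
torch.manual_seed(0)
X = torch.randn(64, 4)
y = (X[:, 0] > 0).long()
accs = [train_tiny_model(s, X, y) for s in (11, 12, 13)]
print(f"mean={statistics.mean(accs):.3f} std={statistics.stdev(accs):.3f}")
```

Runs in seconds on a laptop CPU, which is the point: the signal comes from the seed spread, not the model size.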
Run one experiment:

```
conda run -n orpheus python -m causality_experiments run \
  --config configs/experiments/01_synthetic_linear.yaml
```

Run every tiny fixture experiment:

```
conda run -n orpheus python scripts/run_all_fixtures.py
```

Generate fixture mirror files:

```
conda run -n orpheus python -m causality_experiments make-fixtures
```

Summarize runs:

```
conda run -n orpheus python -m causality_experiments summarize --runs outputs/runs
```

Write a Markdown research report:

```
conda run -n orpheus python scripts/write_research_report.py
```

Run a seed sweep and report mean/std:

```
conda run -n orpheus python scripts/run_seed_sweep.py --match 07 --seeds 11,12,13
conda run -n orpheus python scripts/report_seed_sweep.py --match 07
```

Report causal/nuisance probe diagnostics:

```
conda run -n orpheus python scripts/report_probe_diagnostics.py --match 05_waterbirds
```

Check benchmark/literature alignment:

```
conda run -n orpheus python scripts/report_benchmark_alignment.py
```

Prepare a real Waterbirds feature table from the official Stanford tarball:

```
conda run -n orpheus python scripts/prepare_waterbirds_features.py
```

Run against the real-benchmark-compatible Waterbirds feature table:

```
conda run -n orpheus python -m causality_experiments run \
  --config configs/benchmarks/waterbirds_features.yaml
```

- `causality_experiments.data` provides dataset adapters and tiny generated mirrors for all 8 experiments described in the source document.
- `causality_experiments.methods` provides runnable `constant`, `oracle`, `erm`, `dfr`, `causal_dfr`, `group_balanced_erm`, `group_dro`, `irm`, `jtt`, `adversarial_probe`, `counterfactual_adversarial`, and `counterfactual_augmentation` baselines, plus adapter contracts for causal probes and β-VAE/iVAE/CITRIS/CSML/DeepIV.
- `causality_experiments.metrics` records accuracy, worst-group accuracy, support recovery, and an ATE-style proxy where ground truth supports it.
- `configs/experiments` contains one runnable fixture config per experiment.
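Of the recorded metrics, worst-group accuracy is simply the minimum per-group accuracy. A minimal sketch (the function and variable names here are illustrative, not the repo's metrics API):

```python
from collections import defaultdict

def worst_group_accuracy(preds, labels, groups):
    """Return the minimum over groups of that group's accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    return min(correct[g] / total[g] for g in total)

preds  = [1, 0, 1, 1, 0, 0]
labels = [1, 0, 1, 0, 0, 1]
groups = [0, 0, 0, 1, 1, 1]
# Group 0 scores 3/3, group 1 scores 1/3, so the worst-group accuracy is 1/3.
print(worst_group_accuracy(preds, labels, groups))
```

Average accuracy on this toy split is 4/6, which is exactly the gap worst-group metrics are meant to expose.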
The heavier causal methods are intentionally left as explicit adapter stubs in this first pass. They can be implemented later behind the same fit/predict interface without changing datasets, metrics, or run outputs.
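A new method adapter only has to honor that shared fit/predict contract. As a hedged sketch under assumed names (this toy class and its signatures are illustrative; the repo's real adapter base class may differ):

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class MajorityClassMethod:
    """Toy adapter: always predicts the most frequent training label.

    A heavier causal method (e.g. a DeepIV-style estimator) would slot in
    behind the same two calls without touching datasets or metrics."""
    majority: int = 0

    def fit(self, X: np.ndarray, y: np.ndarray) -> "MajorityClassMethod":
        values, counts = np.unique(y, return_counts=True)
        self.majority = int(values[np.argmax(counts)])
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return np.full(len(X), self.majority)

method = MajorityClassMethod().fit(np.zeros((4, 2)), np.array([1, 1, 0, 1]))
print(method.predict(np.zeros((3, 2))))  # → [1 1 1]
```

Keeping the contract this narrow is what lets the stubs be filled in one at a time while every experiment config keeps running unchanged.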