DGP_Protocol

A minimal Python Protocol for data-generating processes (DGPs).

What this is

A Protocol (DataGeneratingProcess) with two members – data (a frozen property returning the observed realization) and draw(size=..., *, rng=...) (a method returning a fresh realization) – plus a small set of composition primitives (TwoStageDGP, with_data) and thin convenience wrappers (EmpiricalDGP, ParametricDGP) for working with DGPs as first-class objects.
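
As a rough illustration of the two-member contract described above, here is a self-contained sketch using typing.Protocol. The names DataGeneratingProcessSketch and ToyNormalDGP are hypothetical and are not part of the package. The draw signature follows the description above, and the package's actual definition may differ (the minimal example further down suggests the concrete wrappers own their RNG instead of taking one per call).

```python
from typing import Any, Optional, Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class DataGeneratingProcessSketch(Protocol):
    """Hypothetical stand-in for the package's Protocol: two members only."""

    @property
    def data(self) -> Any:
        """The observed (frozen) realization."""
        ...

    def draw(
        self,
        size: Optional[int] = None,
        *,
        rng: Optional[np.random.Generator] = None,
    ) -> Any:
        """A fresh realization from the same distribution."""
        ...


class ToyNormalDGP:
    """Illustrative DGP: a normal distribution fitted to the observed data."""

    def __init__(self, observation):
        self._data = np.asarray(observation, dtype=float)

    @property
    def data(self):
        return self._data

    def draw(self, size=None, *, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        n = len(self._data) if size is None else size
        return rng.normal(self._data.mean(), self._data.std(), size=n)


toy = ToyNormalDGP([1.0, 2.0, 3.0])
# Structural typing: ToyNormalDGP never inherits from the Protocol.
is_dgp = isinstance(toy, DataGeneratingProcessSketch)
sample = toy.draw(size=5, rng=np.random.default_rng(0))
```

Because the check is structural, any consumer class that exposes data and draw satisfies the contract without importing or subclassing anything.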

The package is not a library of working DGPs. Concrete DGPs live in consumer packages – e.g. ManifoldGMM ships its own moment-side DGPs. The role of DGP_Protocol is to define the contract that lets such consumers interoperate.

Conceptual lineage

The Protocol promotes the stand-in distribution from Manski’s analog estimation framework (Manski 1988, Analog Estimation Methods in Econometrics) to a first-class Python object. In that framework, an estimator is defined by a population functional plus a sample-based stand-in for the population; DataGeneratingProcess is that stand-in. Different stand-ins yield different analog estimators:

  • The empirical distribution -> nonparametric plug-in estimators.
  • A parametric family fitted to the data -> MLE-style estimators.
  • A bootstrap distribution -> bootstrap inference.
  • A null-imposed restriction -> constrained estimators.
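
The list above can be made concrete with a small sketch in plain NumPy: one functional (the median), evaluated under two different stand-ins. The helpers empirical_draw, fitted_normal_draw, and simulate_functional are hypothetical names for illustration and do not use the package's API.

```python
import numpy as np


def empirical_draw(data, rng):
    """Stand-in 1: the empirical distribution (resample with replacement)."""
    idx = rng.integers(0, len(data), size=len(data))
    return data[idx]


def fitted_normal_draw(data, rng):
    """Stand-in 2: a normal family fitted to the data."""
    return rng.normal(data.mean(), data.std(ddof=1), size=len(data))


def simulate_functional(draw, data, functional, reps, seed):
    """Apply the same functional to repeated draws from a given stand-in."""
    rng = np.random.default_rng(seed)
    return np.array([functional(draw(data, rng)) for _ in range(reps)])


observed = np.random.default_rng(0).standard_normal(200)

# Same functional, different stand-ins: two analog estimates of the
# sampling variability of the median.
boot = simulate_functional(empirical_draw, observed, np.median, 500, seed=1)
param = simulate_functional(fitted_normal_draw, observed, np.median, 500, seed=1)
se_boot = boot.std(ddof=1)
se_param = param.std(ddof=1)
```

Only the draw function changes between the two calls. That is the sense in which the stand-in, not the functional, distinguishes one analog estimator from another.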

Installation

pip install DGP_Protocol

The import path is PEP-8 lowercase:

from dgp_protocol import DataGeneratingProcess, EmpiricalDGP, TwoStageDGP

Minimal example

import numpy as np
from dgp_protocol import EmpiricalDGP

data = np.random.default_rng(0).standard_normal(size=(100, 3))

# The DGP owns its own RNG.  Pass `seed` for reproducibility;
# `draw()` itself takes no `rng` argument.
dgp = EmpiricalDGP(observation=data, seed=1)
print(dgp.data.shape)                  # (100, 3) -- the frozen realization
print(dgp.draw().shape)                # (100, 3) -- a fresh bootstrap resample

# Rebind to a different realization while keeping the distributional
# structure.  The child gets an independent (spawned) Generator.
fresh = dgp.with_data(np.random.default_rng(2).standard_normal(size=(50, 3)))
print(fresh.data.shape)                # (50, 3)

For more substantial examples – parametric DGPs, two-stage composition (hierarchical sampling), cluster-block bootstrap – see the test suite under tests/.
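
The "independent (spawned) Generator" mentioned in the example above can be illustrated with NumPy alone. This sketch uses numpy.random.SeedSequence.spawn to derive child streams from one parent seed. Whether the package spawns its child generators in exactly this way is an assumption.

```python
import numpy as np

# One parent seed, two statistically independent child streams.
parent_seq = np.random.SeedSequence(1)
child_seq_a, child_seq_b = parent_seq.spawn(2)

rng_a = np.random.default_rng(child_seq_a)
rng_b = np.random.default_rng(child_seq_b)

draws_a = rng_a.standard_normal(5)
draws_b = rng_b.standard_normal(5)

# Spawning is deterministic: the same parent seed reproduces the same children.
replay_seq_a = np.random.SeedSequence(1).spawn(2)[0]
replay_a = np.random.default_rng(replay_seq_a).standard_normal(5)
```

Spawning keeps a parent and its children reproducible from a single seed while avoiding accidental reuse of the same random stream.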

Design

The design is intentionally minimal: data and draw are the only required members. Composition primitives (TwoStageDGP, with_data) take DGPs and return DGPs without expanding the Protocol.
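
To illustrate what "take DGPs and return DGPs" can mean, here is a toy two-stage composition written from scratch. ResampleDGP and TwoStageSketch are hypothetical classes, not the package's EmpiricalDGP or TwoStageDGP, and the real constructors and semantics may differ. Following the minimal example, each toy DGP owns its RNG.

```python
import numpy as np


class ResampleDGP:
    """Toy DGP: resamples rows of its data with replacement."""

    def __init__(self, data, seed=None):
        self._data = np.asarray(data)
        self._rng = np.random.default_rng(seed)

    @property
    def data(self):
        return self._data

    def draw(self, size=None):
        n = len(self._data) if size is None else size
        idx = self._rng.integers(0, len(self._data), size=n)
        return self._data[idx]


class TwoStageSketch:
    """Toy composition: draw cluster labels first, then rows within each cluster."""

    def __init__(self, outer, inner_by_label):
        self._outer = outer            # DGP over cluster labels
        self._inner = inner_by_label   # dict: label -> DGP over that cluster's rows

    @property
    def data(self):
        return np.concatenate([self._inner[int(k)].data for k in self._outer.data])

    def draw(self, size=None):
        labels = self._outer.draw(size)
        return np.concatenate([self._inner[int(k)].draw() for k in labels])


clusters = {
    0: ResampleDGP(np.zeros((4, 2)), seed=1),
    1: ResampleDGP(np.ones((6, 2)), seed=2),
}
outer = ResampleDGP(np.array([0, 1]), seed=3)

# The composite exposes the same two members, so it can itself be composed.
hier = TwoStageSketch(outer, clusters)
sample = hier.draw()
```

The composite has only data and draw, like its parts, so code written against the two-member contract needs no changes to accept it.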

The design note that motivated this package lives in the sibling ManifoldGMM repo at docs/design/dgp.org – DGP_Protocol was extracted from that design conversation. See also AGENTS.md for the package’s scope discipline and the list of intentionally deferred features.

How to cite

If you use DGP_Protocol in academic work, please cite it. The repository’s CITATION.cff is recognised by GitHub and provides one-click citation export in APA, BibTeX, and other formats from the repo’s main page.

A BibTeX entry suitable for paper drafts:

@software{ligon_dgp_protocol_2026,
  author    = {Ligon, Ethan},
  title     = {DGP\_Protocol: A Protocol for data-generating processes},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/ligon/DGP_Protocol},
  version   = {0.1.0a0},
  license   = {BSD-3-Clause},
}

License

BSD 3-Clause (BSD-3-Clause). See the LICENSE file at the root of this repository. In short: permissive use including commercial, modification, and redistribution; preserve the copyright notice and license text in redistributions; no use of the author’s name to endorse derived products.

Author

Ethan Ligon, UC Berkeley.
