Sparse-matrix pedigree relationship extraction and kinship computation.
Builds parent→child CSR adjacency matrices and extracts relationship
categories using sparse matrix algebra (A @ A.T for siblings,
A² @ A²ᵀ for cousins, etc.). Each relationship type is parameterised
by (up, down, n_ancestors):
- `up`: meioses from individual A up to the common ancestor(s) (canonicalised so that `up ≤ down`)
- `down`: meioses from the common ancestor(s) down to individual B
- `n_ancestors`: 1 (half/lineal) or 2 (full, i.e. mated pair)

`kinship = n_ancestors × (1/2)^(up + down + 1)`

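As a rough illustration of the matrix-algebra idea, the sketch below finds full sibs in a five-person toy pedigree. It is not the library's code: it uses a small dense NumPy array in place of a CSR matrix, and it assumes the orientation `A[child, parent] = 1` so that `A @ A.T` counts shared parents. The package's internal layout may differ.

```python
import numpy as np

# Toy pedigree: 0 and 1 are founders; 2, 3 and 4 are their children.
mothers = np.array([-1, -1, 0, 0, 0])
fathers = np.array([-1, -1, 1, 1, 1])
n = len(mothers)

# Assumed orientation: A[child, parent] = 1 (dense here purely for readability).
A = np.zeros((n, n), dtype=np.int64)
for child in range(n):
    for parent in (mothers[child], fathers[child]):
        if parent >= 0:  # -1 marks an unknown/founder parent
            A[child, parent] = 1

# (A @ A.T)[i, j] is the number of parents shared by individuals i and j.
shared = A @ A.T

# Full sibs share both parents; the strict upper triangle lists each pair once.
i, j = np.nonzero(np.triu(shared == 2, k=1))
full_sibs = list(zip(i.tolist(), j.tolist()))
print(full_sibs)
```

Half sibs would correspond to entries equal to 1, and deeper relationships to products of higher powers of `A`, as the description above indicates.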
Install from GitHub:

```bash
pip install "pedigree-graph @ git+https://github.com/rwaples/pedigree-graph.git@v0.1.0"
```

For development:

```bash
git clone https://github.com/rwaples/pedigree-graph.git
cd pedigree-graph
pip install -e ".[test]"
pytest
```

Requires Python ≥ 3.13. Runtime dependencies: numpy, scipy, numba.
Pandas is optional and only needed if you pass DataFrames to the
constructors.

```python
import numpy as np
from pedigree_graph import PedigreeGraph, REL_REGISTRY, PAIR_KINSHIP

# Construct from arrays (no pandas needed)
pg = PedigreeGraph.from_arrays(
    ids=np.array([0, 1, 2, 3, 4]),
    mothers=np.array([-1, -1, 0, 0, 0]),
    fathers=np.array([-1, -1, 1, 1, 1]),
)

# Or from a dict of arrays (also pandas-free)
pg = PedigreeGraph({
    "id": np.array([0, 1, 2, 3]),
    "mother": np.array([-1, -1, 0, 0]),
    "father": np.array([-1, -1, 1, 1]),
    "twin": np.array([-1, -1, -1, -1]),
    "sex": np.array([0, 1, 0, 1], dtype=np.int8),
    "generation": np.array([0, 0, 1, 1], dtype=np.int32),
})

# Or from a pandas DataFrame
# pg = PedigreeGraph.from_dataframe(df)
# pg = PedigreeGraph(df)  # __init__ accepts both forms

# Extract pairs by relationship type, up to a given degree
pairs = pg.extract_pairs(max_degree=2)
print(pairs["FS"])         # full sibs: (idx1, idx2)
print(pairs["1C"])         # 1st cousins
print(PAIR_KINSHIP["FS"])  # 0.25
```

Codes follow the convention `up_down_n_anc`:

| Code | Label | up | down | n_anc | Kinship | Degree |
|---|---|---|---|---|---|---|
| `MZ` | MZ twin | 0 | 0 | 0 | 0.5 | 0 |
| `MO` | Mother–offspring | 1 | 0 | 1 | 0.25 | 1 |
| `FO` | Father–offspring | 1 | 0 | 1 | 0.25 | 1 |
| `FS` | Full sib | 1 | 1 | 2 | 0.25 | 1 |
| `MHS` | Maternal half sib | 1 | 1 | 1 | 0.125 | 2 |
| `PHS` | Paternal half sib | 1 | 1 | 1 | 0.125 | 2 |
| `GP` | Grandparent | 2 | 0 | 1 | 0.125 | 2 |
| `Av` | Avuncular | 1 | 2 | 2 | 0.125 | 2 |
| `1C` | 1st cousin | 2 | 2 | 2 | 0.0625 | 3 |
| ... | (full registry up to 2nd cousin / kinship 1/64) | | | | | |

See REL_REGISTRY for the complete list.
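
The Kinship column can be reproduced from the other columns with the formula given at the top of this README. The sketch below is a standalone check, not part of the package: the helper `kinship` is a hypothetical name, and the `(up, down, n_anc)` triples are copied by hand from the table. `MZ` is left out because its row (`n_anc = 0`, kinship 0.5) does not follow the formula and appears to be a special case.

```python
def kinship(up: int, down: int, n_ancestors: int) -> float:
    """Return n_ancestors * (1/2)^(up + down + 1)."""
    return n_ancestors * 0.5 ** (up + down + 1)

# (up, down, n_anc) triples copied from the table rows.
rows = {
    "MO": (1, 0, 1),
    "FS": (1, 1, 2),
    "MHS": (1, 1, 1),
    "GP": (2, 0, 1),
    "Av": (1, 2, 2),
    "1C": (2, 2, 2),
}

computed = {code: kinship(*params) for code, params in rows.items()}
for code, value in computed.items():
    print(f"{code}: {value}")
```

For example, full sibs give 2 × (1/2)^3 = 0.25 and first cousins give 2 × (1/2)^5 = 0.0625, matching the table.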
The package ships an alternate relationship-counting engine in
pedigree_graph.experimental for exploring large-pedigree scaling:

```python
from pedigree_graph import PedigreeGraph
from pedigree_graph.experimental import count_pairs_bfs

pg = PedigreeGraph(df)
counts = count_pairs_bfs(pg)  # dict[str, int] over 23 codes
```

`count_pairs_bfs` uses boolean sparse matmul (set-union semantics) plus
a parallel numba kernel for cousin-style codes. It is counts-only;
there is no pair-array equivalent of `extract_pairs`.

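
To make "set-union semantics" concrete: a boolean product records only whether an ancestor is reachable, whereas an integer product counts how many paths reach it. The pure-Python toy below illustrates that difference on a small inbred pedigree. It is a sketch of the general idea, not the package's implementation; the `parents` dictionary and the helper `ancestor_paths` are hypothetical.

```python
from collections import Counter

# Toy inbred pedigree: X's parents M and F are half sibs through G.
parents = {
    "G": [], "H": [], "K": [],
    "M": ["G", "H"],
    "F": ["G", "K"],
    "X": ["M", "F"],
}

def ancestor_paths(ind: str, depth: int) -> Counter:
    """Map each ancestor `depth` meioses above `ind` to its number of paths."""
    frontier = Counter({ind: 1})
    for _ in range(depth):
        nxt = Counter()
        for node, n_paths in frontier.items():
            for parent in parents[node]:
                nxt[parent] += n_paths
        frontier = nxt
    return frontier

paths = ancestor_paths("X", 2)  # path multiplicity, like an integer matrix power
distinct = set(paths)           # set-union view, like a boolean matrix power

print(dict(paths))       # G is reached twice, H and K once each
print(sorted(distinct))  # each ancestor appears once
```

Here the two views disagree only because G is reachable through both of X's parents; on a non-inbred pedigree every count would be 1 and the two views would coincide.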
The submodule is not re-exported at the top level — callers must
import explicitly via pedigree_graph.experimental. First call emits
a FutureWarning.
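
If the warning is expected, it can be silenced locally with the standard `warnings` module. In the sketch below, `experimental_call` is a hypothetical stand-in so the snippet runs without the package installed; in real use the call inside the `with` block would be `count_pairs_bfs(pg)`.

```python
import warnings

def experimental_call():
    # Hypothetical stand-in for count_pairs_bfs: warns, then returns counts.
    warnings.warn("experimental API", FutureWarning, stacklevel=2)
    return {"FS": 3}

# Record the warning to confirm what is being emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    experimental_call()

# Suppress FutureWarning only inside this block; filters are restored on exit.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    counts = experimental_call()
```

Scoping the filter to a `with` block keeps other `FutureWarning`s in the program visible.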
- Experimental contract. API, signature, and semantics may change or the function may be removed in any minor release. No deprecation cycle is owed.
- Inbred-pedigree counting differs from the matrix engine. On non-inbred pedigrees the BFS counts equal `PedigreeGraph.count_pairs` exactly. On inbred pedigrees, BFS counts distinct shared ancestors at depth ≥ 2 while the matrix engine counts paths (multiplicity); the four cousin-style codes (`1C1R`, `H1C1R`, `1C2R`, `2C`) may diverge. See `tests/test_experimental.py::test_inbred_with_cousins_cousin_codes_diverge` for a hand-built fixture pinning the exact divergence.
- `max_degree=5` only. Lower values raise `NotImplementedError`; use `PedigreeGraph.count_pairs(max_degree=k)` for partial extractions.
- No subsample support. `PedigreeGraph.from_subsample(...)` graphs raise `NotImplementedError`. Construct directly or use the matrix engine.
- Threading. The numba kernel uses `prange` for cousin-style enumeration. Numba reads `NUMBA_NUM_THREADS` at first JIT compilation; the optional `n_threads` kwarg only takes effect on the first call in a process. Set `NUMBA_NUM_THREADS=N` in the environment to control threading on all calls.
- Performance. Scaling claims (BFS faster than matrix above ~5M individuals, where the matrix engine OOMs) are unverified at the time of v0.2.0. The matrix engine is faster at n=2M (head-to-head benchmark in `external/pedsum/STATUS.md`). Treat this engine as an experimental scalability spike, not a tuned alternative.

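
For the threading caveat, one conservative pattern is to set the environment variable from Python before numba, or anything that imports it, is loaded. This is a minimal sketch; the value `4` is an arbitrary example, and the commented import shows where the package would be loaded.

```python
import os

# Set the thread count before numba is first imported in this process.
# setdefault leaves any value already exported in the shell untouched.
os.environ.setdefault("NUMBA_NUM_THREADS", "4")  # 4 is an arbitrary example

# Import the package (and therefore numba) only after the variable is set:
# from pedigree_graph.experimental import count_pairs_bfs

print(os.environ["NUMBA_NUM_THREADS"])
```

Exporting the variable in the shell (`NUMBA_NUM_THREADS=4 python script.py`) achieves the same thing without touching the script.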
MIT — see LICENSE.