v0.3 closed: conformance task — DFG fitness × precision → F by protosphinx · Pull Request #11 · erphq/pm-bench

protosphinx · 2026-05-01T04:31:59Z

Stacked on top of #10 (CSV ingest). Merge order: #2 → #3 → #4 → #5 → #6 → #7 → #8 → #9 → #10 → this.

Summary

Adds the fifth and final v0 task: conformance checking via DFG fitness × precision. v0.3 closes — every task in the README's table now ships with a CPython baseline and a leaderboard entry.
DFG-based, no pm4py dep. The structure is right-shaped for a future alignment-based variant (PR #11+) behind a [discovery] extra.

What's new

score_conformance — F = 2fp / (f+p) where f and p are computed from set overlap of submitted DFG vs test-partition DFG. Catches both the "model too small" (loses fitness) and "model too big" (loses precision) failure modes.
pm_bench/conformance.py — DFG extraction, model JSON r/w. Submission format is a JSON file with a transitions list of [a, b] pairs.
New CLI verb: pm-bench discover <name> --baseline dfg --out model.json discovers a DFG from training cases. The score path takes --dataset + --split (not --prefixes) since the model is global, not per-prefix.
leaderboard/conformance/synthetic-toy.json with the dfg-ref entry: F=0.857 (fitness 1.0, precision 0.75 — the model covers all test transitions but carries 2 extras the test never uses).
leaderboard.py, CLI standings printer, and STANDINGS.md markdown all learn the conformance column set. pm-bench leaderboard --all --verify now walks 4 boards.
11 new tests (test_conformance.py) — extract_dfg, every score corner case (perfect/tiny/big/disjoint), JSON round-trip, board verification, e2e CLI. 108 total.
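
A minimal sketch of the set-overlap scoring described above, for readers who want to see the arithmetic. The function names (`dfg_from_traces`, `conformance_f`, `model_to_json`, `model_from_json`), their signatures, and the zero-score handling of empty sets are assumptions made for illustration; they are not taken from pm_bench/conformance.py. Only the F formula and the `{"transitions": [[a, b], ...]}` submission shape come from this PR.

```python
import json
from typing import Iterable, Sequence

Transition = tuple[str, str]


def dfg_from_traces(traces: Iterable[Sequence[str]]) -> set[Transition]:
    """Collect every directly-follows pair (a, b) seen in any trace."""
    dfg: set[Transition] = set()
    for trace in traces:
        # Pair each activity with its immediate successor.
        dfg.update(zip(trace, trace[1:]))
    return dfg


def conformance_f(model: set[Transition], test: set[Transition]) -> dict[str, float]:
    """F = 2fp / (f + p) from the overlap of model and test transitions."""
    shared = model & test
    # Assumption: an empty side scores 0.0 rather than raising.
    fitness = len(shared) / len(test) if test else 0.0
    precision = len(shared) / len(model) if model else 0.0
    denom = fitness + precision
    fscore = 2 * fitness * precision / denom if denom else 0.0
    return {"fitness": fitness, "precision": precision, "fscore": fscore}


def model_to_json(model: set[Transition]) -> str:
    """Serialize a DFG in the {"transitions": [[a, b], ...]} shape."""
    return json.dumps({"transitions": sorted([a, b] for a, b in model)})


def model_from_json(text: str) -> set[Transition]:
    """Parse the submission shape back into a set of (a, b) pairs."""
    return {(a, b) for a, b in json.loads(text)["transitions"]}


if __name__ == "__main__":
    test_dfg = dfg_from_traces([["a", "b", "c"], ["a", "c"]])
    too_big = test_dfg | {("c", "a")}  # one transition the test never uses
    print(conformance_f(too_big, test_dfg))
```

In this sketch a model that is too small drops transitions from the overlap and loses fitness, while a model that is too big keeps fitness at 1.0 and pays in precision, which matches the two failure modes named above.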

Smoke

$ pm-bench discover synthetic-toy --split split.json --out model.json --baseline dfg
wrote model with 8 transitions to model.json (baseline=dfg)

$ pm-bench score model.json --task conformance --dataset synthetic-toy --split split.json
{
  "task": "conformance",
  "fitness": 1.0,
  "precision": 0.75,
  "fscore": 0.857...,
  "n_test_transitions": 6,
  "n_model_transitions": 8
}

$ pm-bench leaderboard --all --verify
bottleneck/synthetic-toy: OK — 1 entry(ies)
conformance/synthetic-toy: OK — 1 entry(ies)
next-event/synthetic-toy: OK — 1 entry(ies)
remaining-time/synthetic-toy: OK — 1 entry(ies)
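
As a sanity check on the score output above, the reported numbers can be reproduced by hand. This assumes fitness is the share of test transitions the model covers and precision is the share of model transitions the test uses; that reading is consistent with the reported counts (6 test transitions, 8 model transitions, fitness 1.0) but is not confirmed against the code. M is the submitted DFG and T is the test-partition DFG.

```latex
\[
f = \frac{|M \cap T|}{|T|} = \frac{6}{6} = 1.0, \qquad
p = \frac{|M \cap T|}{|M|} = \frac{6}{8} = 0.75, \qquad
F = \frac{2fp}{f + p} = \frac{1.5}{1.75} \approx 0.857
\]
```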

Test plan

pytest -q: 108 passed (was 97 on PR #10)
ruff check pm_bench tests — clean
Score corner cases (perfect, too-small, too-big, disjoint, empty)
e2e CLI — discover → score conformance round-trip on synthetic-toy
STANDINGS.md staleness canary caught itself when conformance column added; regenerated and re-passes

Roadmap impact

v0.3 ✅ closed in README and GOALS.
The stack now spans v0.0 through v0.4 plus the CSV ingest user-facing win. Open milestones beyond this PR series are: pinning the first real BPI dataset (one-time TOS step) and gnn baseline integration — neither blocked on more code from us.

- score_conformance - pure CPython, no pm4py dep. F = 2fp/(f+p) where f and p are computed from set overlap of the submitted DFG and the test-partition DFG
- pm_bench/conformance.py - extract_dfg, write/read_model_json. Submission format: {"transitions": [["a","b"], ...]}
- New CLI verb `pm-bench discover <name> --baseline dfg --out model.json` - discovers a DFG from training cases. Score path takes --dataset and --split (instead of --prefixes) since the model is global, not per-prefix
- leaderboard/conformance/synthetic-toy.json with dfg-ref entry (F=0.857, fitness 1.0, precision 0.75); pm-bench leaderboard --all now walks 4 boards
- leaderboard.py + CLI standings printer + STANDINGS.md learn the conformance column set
- 11 new tests (test_conformance.py); 108 total, ruff clean
- v0.3 (5-task scoring) closed: every task has a baseline + entry

protosphinx · 2026-05-01T17:54:21Z

Merged into main as part of the audit-cleanup stack (commit 9c00b47). The full content of this PR is now on main.

protosphinx added 2 commits April 30, 2026 21:31

chore: replace em dashes with hyphens per writing style guide

55d193a

No semantic change; ASCII-only punctuation across READMEs, GOALS, source comments, doctests, and config. Verified by running the existing test suite (no test asserts on em-dash text).

This was referenced May 1, 2026

synthetic-toy → 200 cases: outcome leaderboard row lands; all 5 boards real #12

Closed

pm-bench stats <name> — one-shot summary stats for any log #13

Closed

protosphinx deleted the branch csv-ingest May 1, 2026 17:54

protosphinx closed this May 1, 2026

protosphinx deleted the conformance-task branch May 1, 2026 17:54

protosphinx wants to merge 2 commits into csv-ingest from conformance-task