Publishing Results
PUMA can publish run results to PUMA Community, a public registry where users of the PUMA benchmarking tool share their verified results. Publishing is opt-in, anonymous by default, and entirely PR-based: there is no central server, and the only account you need is a GitHub account.
- Reproducibility for the field. Local LLM benchmarks are intrinsically hardware-dependent. The same model on different machines yields slightly different latency, energy, and occasionally different outputs. Sharing results across hardware diversifies the empirical record.
- Cross-team comparisons. Once your run is in the public archive, anyone comparing the same model and scenario can cite your numbers, replicate them, and contrast them with their own.
- Community knowledge base. Each submission becomes part of a queryable dataset of verified PUMA runs. Researchers and practitioners can identify which models genuinely shine, on which scenarios, under which budgets.
A submission contains:
- The scenario name and instance count.
- The model tag and prompting strategy.
- The hardware profile (cpu-lite, gpu-entry, etc.), not your specific CPU model or hostname.
- All metric families: accuracy, calibration, efficiency, stability, robustness, fairness, sustainability.
- The PUMA version that produced the run.
- A self-chosen submitter alias (a free-form string, can be a pseudonym).
- A SHA-256 hash over the predictions summary for cryptographic integrity.
A submission contains no raw prompts, no raw model responses, no environment variables, no file system paths, and no PII. Before the submission leaves your machine, the local validator scans it for patterns that look like email addresses, IP addresses, GitHub tokens, and absolute paths.
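To make the idea of a local pattern scan concrete, here is a minimal Python sketch. It is not PUMA's validator: the regular expressions, the function names `scan_text` and `scan_submission`, and the example field names are all assumptions chosen for illustration. The real validator may use different or stricter rules.

```python
import re

# Illustrative patterns only (assumptions); PUMA's real validator may differ.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # GitHub tokens start with prefixes such as ghp_ or github_pat_.
    "github_token": re.compile(
        r"\b(?:gh[pousr]_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,})"
    ),
    # Common Unix roots or a Windows drive letter, not preceded by a word
    # character or slash (so URL path segments are not flagged).
    "absolute_path": re.compile(
        r"(?<![\w/])(?:/(?:home|Users|root|etc|var|tmp)/\S+|[A-Za-z]:\\\S+)"
    ),
}


def scan_text(text):
    """Return the sorted names of all patterns that match anywhere in text."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))


def scan_submission(obj):
    """Walk a JSON-like structure and collect (json_path, pattern_name) hits."""
    hits = []

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}")
        elif isinstance(node, list):
            for index, value in enumerate(node):
                walk(value, f"{path}[{index}]")
        elif isinstance(node, str):
            for name in scan_text(node):
                hits.append((path, name))

    walk(obj, "$")
    return hits
```

A scan of this kind is deliberately conservative: a four-part version string would match the IPv4 pattern, for example, so a hit is a prompt to review the field rather than proof of a leak.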
- Create a GitHub Personal Access Token (PAT). Use the fine-grained PAT creator at https://github.com/settings/tokens?type=beta. Repository access: pumacp/puma-community. Permissions: Pull requests: Read and write, Contents: Read and write, Metadata: Read.
- Authenticate locally:

      docker compose run --rm puma_runner puma auth login

  Paste the PAT when prompted. It is stored under ~/.puma/credentials.toml with 0600 permissions.
- Dry-run first:

      docker compose run --rm puma_runner puma share-results \
        --dry-run --run-id <your_run_id>

  This builds the submission JSON locally without publishing. Inspect it under data/community/submissions/.
- Publish:

      docker compose run --rm puma_runner puma share-results --run-id <your_run_id>

The PR is automatically validated against schema/submission.v1.json.
Integrity is verified by recomputing the SHA-256 hash. If both checks pass,
the auto-merge workflow merges your PR within a few minutes and the
PUMA Community badges refresh to reflect the new submission.
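The integrity check described above can be sketched in Python as follows. This is an illustration under stated assumptions, not PUMA's implementation: the field names `predictions_summary` and `sha256`, the helper names, and the choice of canonical JSON (sorted keys, compact separators) as the hashed representation are all hypothetical. What the source does state is that a SHA-256 hash over the predictions summary is recomputed and compared.

```python
import hashlib
import json


def canonical_bytes(summary):
    """Serialize deterministically so equal summaries hash identically."""
    # Sorted keys and fixed separators remove ordering/whitespace variation.
    text = json.dumps(
        summary, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def summary_digest(summary):
    """Return the hex SHA-256 digest of the canonicalized summary."""
    return hashlib.sha256(canonical_bytes(summary)).hexdigest()


def verify_submission(submission):
    """Recompute the digest and compare it with the one stored in the file.

    'predictions_summary' and 'sha256' are hypothetical field names.
    """
    stored = submission.get("sha256")
    recomputed = summary_digest(submission["predictions_summary"])
    return stored == recomputed
```

Whatever serialization is actually used, it has to be deterministic: if key order or whitespace could vary between the submitter's machine and the validation workflow, an untampered submission would fail the check.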
See the Anonymity and Privacy page on PUMA Community for the full list of what is and is not transmitted, plus the procedure for withdrawing a submission after publication.