data(hslm): real-world dataset validation benchmarks by gHashTag · Pull Request #579 · gHashTag/trinity

gHashTag · 2026-04-30T01:32:57Z

Summary

Domain-specific benchmark framework for validating HSLM on real-world data.

New file

src/b2t/domain_benchmark.zig — 239 LOC

Dataset types

Code completion (GitHub), Medical notes, Scientific papers (ArXiv), Synthetic

Features

DomainBenchmark: FP32 baseline + format comparison with PPL gap
BenchmarkSuite: Multi-dataset management with overall summary
Threshold checking: PPL within 10% of FP32
Formatted report with per-format PPL, accuracy, tok/sec, gap
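
The PPL gap and threshold check listed above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Zig code from `src/b2t/domain_benchmark.zig`. The function names, the use of a signed relative gap, and the convention that perplexity is the exponential of the mean per-token negative log-likelihood are assumptions; only the 10% threshold comes from the PR description.

```python
import math


def perplexity(token_nlls):
    """Perplexity as exp(mean negative log-likelihood per token, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def ppl_gap(format_ppl, baseline_ppl):
    """Signed relative gap versus the FP32 baseline (0.05 means 5% worse).

    Assumption: the PR does not say whether the gap is signed or absolute.
    """
    if baseline_ppl <= 0:
        raise ValueError("baseline perplexity must be positive")
    return (format_ppl - baseline_ppl) / baseline_ppl


def within_threshold(format_ppl, baseline_ppl, threshold=0.10):
    """True when the format's PPL is at most `threshold` worse than FP32."""
    return ppl_gap(format_ppl, baseline_ppl) <= threshold


if __name__ == "__main__":
    # Made-up numbers for illustration only, not measured results.
    fp32_ppl, candidate_ppl = 20.0, 21.0
    print(f"gap: {ppl_gap(candidate_ppl, fp32_ppl):.1%}")
    print("pass:", within_threshold(candidate_ppl, fp32_ppl))
```

With a signed gap, a format that beats the FP32 baseline always passes; an absolute-value gap would instead flag large deviations in either direction.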

Success criteria

HSLM achieves PPL within 10% of FP32 baseline
Clear advantage on at least one domain
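
As a rough illustration of how these criteria might be evaluated across several datasets, the sketch below computes an average PPL gap and a pass count, the two summary figures the commit message attributes to BenchmarkSuite. It is Python rather than the PR's Zig, and every name in it is hypothetical. In particular, reading "clear advantage" as "lower perplexity than FP32" is an assumption; the PR does not define the term, and the advantage could equally refer to throughput or memory.

```python
from dataclasses import dataclass


@dataclass
class DatasetResult:
    """Hypothetical per-dataset record: FP32 baseline PPL and candidate PPL."""
    name: str
    baseline_ppl: float
    format_ppl: float

    @property
    def gap(self):
        # Signed relative gap versus the FP32 baseline.
        return (self.format_ppl - self.baseline_ppl) / self.baseline_ppl


def summarize(results, threshold=0.10):
    """Average gap, pass count, and domains where the candidate beats FP32."""
    if not results:
        raise ValueError("need at least one dataset result")
    gaps = [r.gap for r in results]
    return {
        "datasets": len(results),
        "average_gap": sum(gaps) / len(gaps),
        "passed": sum(1 for g in gaps if g <= threshold),
        # Assumption: "advantage" means lower perplexity than the baseline.
        "domains_with_advantage": [r.name for r in results if r.gap < 0],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only, not benchmark output.
    demo = [
        DatasetResult("code_completion", 10.0, 10.5),
        DatasetResult("medical_notes", 20.0, 23.0),
        DatasetResult("synthetic", 8.0, 7.6),
    ]
    print(summarize(demo))
```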

Tests (3)

Baseline + comparison, multi-dataset suite, threshold check

Closes #424

- Add src/b2t/domain_benchmark.zig
- DomainBenchmark: per-dataset FP32 baseline + format comparison, PPL gap computation, threshold checking, formatted reports
- BenchmarkSuite: multi-dataset benchmark management, overall summary with average PPL gap, pass count
- Dataset types: code_completion, medical_notes, scientific_papers, synthetic
- Success: PPL within 10% of FP32 baseline
- 3 tests: baseline+comparison, multi-dataset suite, threshold

Closes #424

gHashTag merged commit a684270 into main Apr 30, 2026
8 of 16 checks passed

gHashTag deleted the data/424-real-world-dataset-benchmarks branch April 30, 2026 01:33
