
data(hslm): real-world dataset validation benchmarks #579

Merged: gHashTag merged 1 commit into main from data/424-real-world-dataset-benchmarks on Apr 30, 2026


@gHashTag
Owner

Summary

Domain-specific benchmark framework for validating HSLM on real-world data.

New file

  • src/b2t/domain_benchmark.zig — 239 LOC

Dataset types

  • Code completion (GitHub), medical notes, scientific papers (arXiv), synthetic

Features

  • DomainBenchmark: FP32 baseline + format comparison with PPL gap
  • BenchmarkSuite: Multi-dataset management with overall summary
  • Threshold checking: PPL within 10% of FP32
  • Formatted report with per-format PPL, accuracy, tok/sec, gap
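The PPL gap and threshold check listed above can be sketched as follows. This is a minimal illustration, not the code from `domain_benchmark.zig`: the names `pplGap` and `withinThreshold` are hypothetical, and it assumes "within 10%" means the format's perplexity may be at most 10% higher than the FP32 baseline (a lower perplexity always passes).

```zig
const std = @import("std");

/// Relative PPL gap versus the FP32 baseline, as a fraction
/// (0.05 means the format's perplexity is 5% higher than FP32).
/// Hypothetical helper; the real API may differ.
fn pplGap(baseline_ppl: f64, format_ppl: f64) f64 {
    return (format_ppl - baseline_ppl) / baseline_ppl;
}

/// Assumed reading of the threshold: pass when the gap does not exceed
/// max_gap (0.10 for the 10% criterion).
fn withinThreshold(baseline_ppl: f64, format_ppl: f64, max_gap: f64) bool {
    return pplGap(baseline_ppl, format_ppl) <= max_gap;
}

test "gap of 5% passes a 10% threshold" {
    // Illustrative numbers only, not measured results.
    const baseline: f64 = 20.0;
    const candidate: f64 = 21.0;
    try std.testing.expectApproxEqAbs(@as(f64, 0.05), pplGap(baseline, candidate), 1e-12);
    try std.testing.expect(withinThreshold(baseline, candidate, 0.10));
}
```

Expressing the gap as a signed fraction keeps the report and the threshold on the same scale, and a negative gap (the format beating FP32) needs no special handling.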

Success criteria

  • HSLM achieves PPL within 10% of FP32 baseline
  • Clear advantage on at least one domain
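At the suite level, BenchmarkSuite is described as reporting an average PPL gap and a pass count across datasets. The sketch below shows one way such a summary could be aggregated. The `DatasetResult`, `Summary`, and `summarize` names are assumptions for illustration, and the perplexity values in the test are made up, not results from this PR.

```zig
const std = @import("std");

/// Hypothetical per-dataset record: FP32 baseline PPL and the PPL of the
/// format being compared on the same dataset.
const DatasetResult = struct {
    dataset: []const u8,
    baseline_ppl: f64,
    format_ppl: f64,
};

/// Hypothetical overall summary: mean relative gap and pass count.
const Summary = struct {
    avg_gap: f64,
    passed: usize,
    total: usize,
};

/// Averages the relative PPL gap over all datasets and counts how many
/// stay within max_gap (0.10 for the 10% criterion).
fn summarize(results: []const DatasetResult, max_gap: f64) Summary {
    var gap_sum: f64 = 0.0;
    var count: f64 = 0.0;
    var passed: usize = 0;
    for (results) |r| {
        const gap = (r.format_ppl - r.baseline_ppl) / r.baseline_ppl;
        gap_sum += gap;
        count += 1.0;
        if (gap <= max_gap) passed += 1;
    }
    return Summary{
        // An empty suite reports a zero gap rather than dividing by zero.
        .avg_gap = if (results.len == 0) 0.0 else gap_sum / count,
        .passed = passed,
        .total = results.len,
    };
}

test "summary over three illustrative datasets" {
    const results = [_]DatasetResult{
        .{ .dataset = "code_completion", .baseline_ppl = 20.0, .format_ppl = 21.0 },
        .{ .dataset = "medical_notes", .baseline_ppl = 10.0, .format_ppl = 12.0 },
        .{ .dataset = "scientific_papers", .baseline_ppl = 40.0, .format_ppl = 40.0 },
    };
    const s = summarize(&results, 0.10);
    try std.testing.expectEqual(@as(usize, 2), s.passed);
    try std.testing.expectEqual(@as(usize, 3), s.total);
}
```

Note that an average gap can hide a single failing domain, which is why the pass count is tracked alongside it.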

Tests (3)

  • Baseline + comparison, multi-dataset suite, threshold check

Closes #424

- Add src/b2t/domain_benchmark.zig
- DomainBenchmark: per-dataset FP32 baseline + format comparison,
  PPL gap computation, threshold checking, formatted reports
- BenchmarkSuite: multi-dataset benchmark management,
  overall summary with average PPL gap and pass count
- Dataset types: code_completion, medical_notes,
  scientific_papers, synthetic
- Success: PPL within 10% of FP32 baseline
- 3 tests: baseline+comparison, multi-dataset suite, threshold

Closes #424
@gHashTag gHashTag merged commit a684270 into main Apr 30, 2026
8 of 16 checks passed
@gHashTag gHashTag deleted the data/424-real-world-dataset-benchmarks branch April 30, 2026 01:33