data(hslm): Real-world dataset validation benchmarks #424

@gHashTag

Description

Issue from Trinity Improvement Plan (Part 1, Priority 1)

Context

Current tests use only synthetic and standard datasets; validation on real-world, domain-specific data is still needed.

Task Description

Benchmark HSLM on domain-specific datasets and compare against baselines.

Datasets to Test

  1. Code completion: GitHub codebase samples
  2. Medical notes: De-identified clinical text
  3. Scientific papers: ArXiv abstracts/full papers

Baselines to Compare

  1. FP32 baseline (full precision)
  2. Other ternary approaches
  3. Original HSLM paper benchmarks
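To make the "other ternary approaches" baseline concrete, here is a minimal sketch of threshold-based ternary weight quantization in the style of Ternary Weight Networks; the 0.7 heuristic is from that line of work, and HSLM's own quantization scheme may differ:

```python
def ternarize(weights):
    """Map each weight to {-1, 0, +1} * alpha using a mean-based threshold.

    delta = 0.7 * mean(|w|) is the TWN threshold heuristic; alpha is the
    mean magnitude of the weights that survive thresholding.
    """
    n = len(weights)
    delta = 0.7 * sum(abs(w) for w in weights) / n
    codes = [1 if w > delta else -1 if w < -delta else 0 for w in weights]
    nonzero = [abs(w) for w, c in zip(weights, codes) if c != 0]
    alpha = sum(nonzero) / len(nonzero) if nonzero else 0.0
    return [c * alpha for c in codes], codes

# Small weights fall to 0; the rest keep only their sign times alpha.
quantized, codes = ternarize([0.9, -0.8, 0.05, -0.02, 0.6])
```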

Metrics to Collect

  1. Perplexity (PPL) on each dataset
  2. Accuracy (for tasks where applicable)
  3. Training speed (epochs/hour)
  4. Inference speed (tok/s)
  5. Model size (MB)
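For metric 1, the convention worth pinning down in the methodology is that PPL is the exponential of the mean per-token negative log-likelihood. A sketch of just the formula, with the model and tokenizer assumed:

```python
import math

def perplexity(token_nlls):
    """PPL = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Sanity check: a uniform model over a 4-symbol vocabulary has NLL = ln(4)
# per token, so its perplexity equals the vocabulary size.
nlls = [math.log(4)] * 10
ppl = perplexity(nlls)
```

Averaging NLL over tokens (not documents) before exponentiating keeps results comparable across the three datasets regardless of document length.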

Deliverables

  1. Dataset + methodology publication
  2. Updated research documentation
  3. README with benchmark results
  4. Comparison paper (if significant findings)

Timeline

  • Week 1: Dataset preparation
  • Weeks 2-3: Training and evaluation
  • Week 4: Analysis and write-up

Success Criteria

  • HSLM achieves PPL within 10% of FP32 baseline
  • Clear advantage demonstrated on at least one domain
  • Results reproducible (code + data published)

Labels

priority: high, validation, research, hslm
type: experiment
component: hslm
