HDLBits Dataset for RLFT [test] #2

@noobsiecoder

Dataset Collection

  • Gather questions from HDLBits across categories:
    • Basics
    • Vectors
    • Modules & Hierarchy
    • Procedures
    • Combinational Logic (Gates, Multiplexers, Arithmetic Circuits, etc.)
    • Sequential Logic (Latches, Flip-flops, Counters, Shift Registers, etc.)
  • Expand coverage with augmented questions (variation, rephrasing, scaling difficulty)
  • Prepare additional sources if needed (e.g., RTL-Repo, custom prompts)
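One way the augmentation step above (variation, rephrasing, scaling difficulty) could look in practice is sketched below. It is only an illustration: the `PromptVariant` class, the `augment` helper, the adder prompt templates, and the chosen bit widths are all hypothetical and are not taken from HDLBits or from this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVariant:
    base_id: str  # hypothetical ID of the seed problem the variant derives from
    width: int    # bit width used to scale difficulty
    prompt: str   # rephrased problem statement


# Hypothetical rephrasings of one seed problem; real templates would be
# written per problem category.
TEMPLATES = (
    "Implement a {width}-bit adder with inputs a and b and output sum.",
    "Write a Verilog module that adds two {width}-bit inputs, a and b, "
    "and drives the result on sum.",
)


def augment(base_id, widths=(4, 8, 16)):
    """Cross every phrasing with every bit width to produce variants."""
    variants = []
    for width in widths:
        for template in TEMPLATES:
            prompt = template.format(width=width)
            variants.append(PromptVariant(base_id, width, prompt))
    return variants


variants = augment("adder")
for v in variants:
    print(v.width, v.prompt)
```

Recording `base_id` on every variant keeps each augmented question traceable to its seed problem, which may help later when checking that related variants do not end up on both sides of a train/eval split.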

Dataset Structuring

  • Define consistent schema:
    • Question / Problem statement
    • Expected input/output behavior (testbench or truth table)
    • Ground truth solution (reference Verilog code)
  • Ensure compatibility with reward functions (compilation, synthesis, functional correctness, etc.)
  • Hold out 5–10% of the problems as an evaluation split
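A minimal sketch of the schema and the held-out evaluation split described above follows. The field names (`problem_id`, `category`, `statement`, `testbench`, `reference_solution`), the `split` helper, and the default seed are assumptions made for illustration; the issue does not fix any of them.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    # All field names are hypothetical; they mirror the schema bullets above.
    problem_id: str
    category: str
    statement: str            # question / problem statement
    testbench: str            # expected I/O behavior (testbench or truth table)
    reference_solution: str   # ground-truth Verilog


def split(problems, eval_fraction=0.10, seed=0):
    """Return (train, evals), holding out roughly eval_fraction of problems.

    Sorting by ID before a seeded shuffle makes the split reproducible
    regardless of the order in which problems were collected.
    """
    ordered = sorted(problems, key=lambda p: p.problem_id)
    random.Random(seed).shuffle(ordered)
    n_eval = max(1, round(len(ordered) * eval_fraction))
    return ordered[n_eval:], ordered[:n_eval]


# Placeholder records only; real entries would carry actual problem text.
problems = [
    Problem(f"p{i:02d}", "Basics", "statement", "testbench", "solution")
    for i in range(40)
]
train, evals = split(problems)
print(len(train), len(evals))
```

A fixed-count shuffle is used here rather than per-item random sampling because, with only a few dozen seed problems, per-item sampling could easily produce an empty or oversized evaluation set.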

Documentation

  • Document dataset structure (fields, formatting rules)
  • Provide small example subset in repo for reference
  • Note augmentation methods used and rationale
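To make the documented structure concrete, the example subset could be stored as JSON Lines with a small validator for the formatting rules. The sketch below assumes that storage format and a hypothetical set of required fields; the record itself is an illustrative AND-gate entry written for this example, not copied from HDLBits.

```python
import json

# Hypothetical required fields; adjust to whatever schema is finally adopted.
REQUIRED_FIELDS = (
    "problem_id",
    "category",
    "statement",
    "testbench",
    "reference_solution",
)


def validate(record):
    """Return a list of error messages; an empty list means the record is valid."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"missing or empty field: {field}")
    return errors


# Illustrative record (assumed content, for documentation purposes only).
example = {
    "problem_id": "example-and-gate",
    "category": "Combinational Logic",
    "statement": "Create a module that implements an AND gate.",
    "testbench": "a=0 b=0 -> out=0; a=0 b=1 -> out=0; "
                 "a=1 b=0 -> out=0; a=1 b=1 -> out=1",
    "reference_solution": (
        "module top_module(input a, input b, output out);\n"
        "  assign out = a & b;\n"
        "endmodule\n"
    ),
}

line = json.dumps(example, sort_keys=True)  # one record per line in a .jsonl file
restored = json.loads(line)
print(validate(restored))
```

Running `validate` over every line of the example subset before committing it would catch records with missing or empty fields early.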

Notes

  • Initial dataset size: ~20–50 HDLBits problems
  • Augment the initial set to increase volume while retaining diversity
  • Benchmarking targets: VerilogEval, VeriReason Benchmarks, and RTL-Repo (for external validation)

Labels

enhancement
