v0.3.0b1 — Tree-sitter chunking, BABILong, training runbook
Minor release
Three coherent feature additions shipped together. Per the versioning plan, architectural additions get a minor bump rather than being folded into 36-commit jumbo patches as in the v0.1 line.
What's New
Tree-sitter AST chunking (opt-in)
The regex code chunker that has shipped since v0.1.0 (carrying the comment `# MVP heuristic — swap for tree-sitter later`) finally has a tree-sitter replacement. Tree-sitter understands real grammar boundaries (functions, classes, impl blocks, interfaces, type aliases) instead of matching `def` as a string.
Supported languages:
- Python
- Rust
- JavaScript
- TypeScript + TSX
Install the optional extra:
```bash
# Quotes keep shells such as zsh from treating the brackets as a glob pattern.
pip install "helix-context[ast]"
```
The language is auto-detected from the file extension in `metadata['path']`. Chunking falls back cleanly to the regex chunker when tree-sitter isn't installed or the language is unknown, so existing users see zero breakage.
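The selection logic described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `pick_chunker` and `EXTENSION_TO_LANGUAGE` are hypothetical names, and the extension table is an assumption derived from the supported-language list.

```python
import importlib.util
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical extension table covering the languages listed in these notes.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".rs": "rust",
    ".js": "javascript",
    ".ts": "typescript",
    ".tsx": "tsx",
}


def pick_chunker(metadata: dict) -> Tuple[str, Optional[str]]:
    """Return (chunker_name, language) for one document's metadata."""
    path = metadata.get("path") or ""
    language = EXTENSION_TO_LANGUAGE.get(Path(path).suffix.lower())
    # find_spec returns None when the package cannot be imported.
    has_tree_sitter = importlib.util.find_spec("tree_sitter") is not None
    if language is None or not has_tree_sitter:
        # Unknown language or missing optional extra: keep the old behaviour.
        return ("regex", language)
    return ("tree-sitter", language)
```

A document with no path, or with an extension outside the table, always lands on the regex chunker in this sketch, which mirrors the fallback described above.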
Why this matters: the regex chunker cut wherever it saw `def` or `class`, including inside docstrings and strings that happened to contain those keywords. Tree-sitter cuts at actual AST boundaries, so function bodies stay intact and class methods group correctly with their parent class.
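To make the failure mode concrete, here is a small, self-contained comparison. It is only a sketch: the regex is a simplified stand-in for the old heuristic (the real pattern is not shown in these notes), and Python's standard-library `ast` module stands in for tree-sitter so the example runs without the optional extra.

```python
import ast
import re

SOURCE = '''\
def greet(name):
    """Return a greeting.

    def is mentioned here only as prose, not as code.
    """
    return f"hello {name}"


class Greeter:
    def run(self):
        return greet("world")
'''


def regex_chunks(source: str) -> list:
    """Naive stand-in: start a new chunk at any line that begins with def or class."""
    pattern = re.compile(r"^[ \t]*(?:def|class)\b", re.MULTILINE)
    starts = [m.start() for m in pattern.finditer(source)]
    bounds = starts + [len(source)]
    return [source[a:b] for a, b in zip(bounds, bounds[1:])]


def ast_chunks(source: str) -> list:
    """Cut at the top-level definition boundaries reported by the parser."""
    tree = ast.parse(source)
    wanted = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, wanted)
    ]


if __name__ == "__main__":
    print(len(regex_chunks(SOURCE)), "regex chunks")
    print(len(ast_chunks(SOURCE)), "AST chunks")
```

In this sample the regex stand-in splits `greet` in the middle of its docstring and separates `run` from `Greeter`, while the parser-based version keeps each top-level definition whole.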
BABILong multi-hop benchmark
The new `benchmarks/bench_babilong.py` tests two- and three-hop reasoning across the genome. It is based on bAbI (Weston et al., 2015) and BABILong (Kuratov et al., 2024).
Three task generators:
- task_1 — single supporting fact (sanity baseline)
- task_2 — two supporting facts (two-hop reasoning)
- task_3 — three supporting facts (three-hop reasoning)
Each task generates N=10 self-contained problems with distractor padding, ingests them as genes, queries with multi-hop questions, and measures retrieval rate, answer accuracy, and per-query latency.
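For readers unfamiliar with bAbI-style tasks, a two-hop problem with distractor padding might be generated along these lines. Everything here is illustrative: the vocabulary, the `Problem` dataclass, `make_two_hop`, and the definition of retrieval rate (the fraction of problems whose supporting facts were all retrieved) are assumptions, not the benchmark's actual implementation.

```python
import random
from dataclasses import dataclass

PEOPLE = ["Mary", "John", "Sandra", "Daniel"]
OBJECTS = ["apple", "football", "milk"]
PLACES = ["kitchen", "garden", "office", "hallway"]


@dataclass
class Problem:
    story: list       # sentences: supporting facts mixed with distractors
    question: str
    answer: str
    supporting: list  # the two facts needed to answer the question


def make_two_hop(rng: random.Random, n_distractors: int = 4) -> Problem:
    holder, *others = rng.sample(PEOPLE, len(PEOPLE))
    obj = rng.choice(OBJECTS)
    place = rng.choice(PLACES)
    supporting = [f"{holder} picked up the {obj}.", f"{holder} went to the {place}."]
    # Distractors only move other people, so they never change the answer.
    story = [
        f"{rng.choice(others)} went to the {rng.choice(PLACES)}."
        for _ in range(n_distractors)
    ]
    # Place both supporting facts at random positions, keeping their order.
    first = rng.randint(0, len(story))
    story.insert(first, supporting[0])
    second = rng.randint(first + 1, len(story))
    story.insert(second, supporting[1])
    return Problem(story, f"Where is the {obj}?", place, supporting)


def retrieval_rate(problems, retrieved_per_problem) -> float:
    """Fraction of problems whose supporting facts were all retrieved."""
    hits = sum(
        all(fact in retrieved for fact in p.supporting)
        for p, retrieved in zip(problems, retrieved_per_problem)
    )
    return hits / len(problems)


if __name__ == "__main__":
    rng = random.Random(0)
    problems = [make_two_hop(rng) for _ in range(10)]  # N=10, as in the notes
    print("\n".join(problems[0].story))
    print(problems[0].question)
```

Answering the question requires chaining two facts (who holds the object, then where that person went), which is what makes the task two-hop rather than a single lookup.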
Initial baseline: task_1 shows retrieval failure for pure narrative content whose only distinguishing tokens are proper names. The genome needs domain/entity anchors or source-path clues for reliable retrieval. This is a known limitation; the v0.2.0b2 authority boosts (source-path matching) should help once the live server is restarted to pick up the new code.
Training runbook for DeBERTa re-train
The new `training/README.md` documents the full DeBERTa fine-tune workflow. The existing 1,600-pair dataset was generated when the genome held ~3,500 genes. It now holds ~7,300 and covers concepts added since (SIKE, MoE decoder, cold-storage tiers) that the currently trained models never saw.
A full re-train isn't in this release (it needs an hour of GPU time and a spare Ollama teacher), but the runbook is ready for when you want to kick it off.
Included from v0.2.0b2 (already published but noted here for context)
- Retrieval authority boosts — source authority, domain primacy, creation recency
- IDF cap lowered 5.0 → 3.0 to reduce tangential rare-term over-boost
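The effect of lowering the cap can be illustrated with a quick calculation. This sketch assumes a plain `log(N / df)` IDF clipped at the cap. The scoring formula used in the project is not stated in these notes, so treat `capped_idf` as hypothetical. The corpus size of 7,300 is the approximate genome size mentioned above.

```python
import math


def capped_idf(total_docs: int, doc_freq: int, cap: float) -> float:
    """Plain log(N / df) IDF clipped at cap (the formula is an assumption)."""
    return min(math.log(total_docs / doc_freq), cap)


if __name__ == "__main__":
    n = 7300
    for df in (1, 73, 730):
        old = capped_idf(n, df, cap=5.0)
        new = capped_idf(n, df, cap=3.0)
        print(f"df={df:>4}  old cap: {old:.2f}  new cap: {new:.2f}")
```

Under this assumed formula, a term that appears in a single gene hits the cap in both cases, so lowering the cap shrinks its advantage over moderately common terms, whose uncapped IDF is unchanged.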
All 179 tests passing.
🤖 Generated with Claude Code