Release v0.3.0b1 — Tree-sitter chunking, BABILong, training runbook · mbachaud/helix-context

Minor release

Three coherent feature additions shipped together. Per the versioning plan, architectural additions get a minor version bump instead of landing in 36-commit jumbo patches like those in the v0.1 line.

What's New

Tree-sitter AST chunking (opt-in)

The regex code chunker that has been around since v0.1.0 (with a "# MVP heuristic — swap for tree-sitter later" comment) finally has its replacement. Tree-sitter understands real grammar boundaries (functions, classes, impl blocks, interfaces, type aliases) instead of matching "def" as a string. The new chunker is opt-in, and the regex chunker stays in place as the fallback.

Supported languages:

Python
Rust
JavaScript
TypeScript + TSX

Install the optional extra:
```bash
# Quote the requirement so shells such as zsh do not expand the brackets
pip install "helix-context[ast]"
```

The language is auto-detected from the file extension in metadata['path']. Chunking falls back cleanly to the regex chunker when tree-sitter isn't installed or the language is unknown, so nothing breaks for existing users.
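The detection and fallback rule described above can be sketched roughly as follows. This is an illustration only: the extension table and the function names (detect_language, tree_sitter_available, pick_chunker) are hypothetical and are not taken from the helix-context source. The only external name it relies on is the tree_sitter module that the py-tree-sitter package provides.

```python
import importlib.util
from pathlib import Path

# Hypothetical extension table; the real mapping in helix-context may differ.
EXT_TO_LANG = {
    ".py": "python",
    ".rs": "rust",
    ".js": "javascript",
    ".ts": "typescript",
    ".tsx": "tsx",
}


def detect_language(metadata):
    """Return a language name from metadata['path'], or None if unknown."""
    path = metadata.get("path")
    if not path:
        return None
    return EXT_TO_LANG.get(Path(path).suffix.lower())


def tree_sitter_available():
    """True when the optional tree_sitter module can be found on the import path."""
    return importlib.util.find_spec("tree_sitter") is not None


def pick_chunker(metadata):
    """Choose 'ast' only when the language is known and the extra is installed."""
    if detect_language(metadata) is not None and tree_sitter_available():
        return "ast"
    return "regex"
```

Under these assumptions, a gene with no path, or with a path such as notes.md, always goes to the regex chunker, which matches the "zero breakage" behaviour the notes describe.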

Why this matters: the regex chunker cut wherever it saw def or class, including inside docstrings and strings that happened to contain those keywords. Tree-sitter cuts at actual AST boundaries, so function bodies stay intact and class methods group correctly with their parent class.
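To make the failure mode concrete, here is a small comparison. It uses Python's standard-library ast module as a stand-in for tree-sitter, since both cut at syntax-tree nodes and not at text matches. The regex shown is an assumed approximation of a keyword heuristic, not the actual pattern from the old chunker.

```python
import ast
import re

SOURCE = '''def greet(name):
    """Say hello.

    def is only a word in this docstring.
    """
    return "hello " + name


class Greeter:
    def run(self):
        return greet("world")
'''


def regex_cut_lines(source):
    """Naive heuristic: cut at any line whose first word is def or class."""
    pattern = re.compile(r"^\s*(def|class)\b")
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if pattern.match(line)
    ]


def ast_cut_lines(source):
    """Cut only at top-level function and class definitions."""
    tree = ast.parse(source)
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return [node.lineno for node in tree.body if isinstance(node, kinds)]


print(regex_cut_lines(SOURCE))  # [1, 4, 9, 10]
print(ast_cut_lines(SOURCE))    # [1, 9]
```

The regex heuristic cuts inside the docstring (line 4) and separates the run method from its class (line 10). The syntax-tree version yields one chunk per top-level definition.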

BABILong multi-hop benchmark

New benchmarks/bench_babilong.py tests two- and three-hop reasoning across the genome. Based on bAbI (Weston et al., 2015) and BABILong (Kuratov et al., 2024).

Three task generators:

task_1 — single supporting fact (sanity baseline)
task_2 — two supporting facts (two-hop reasoning)
task_3 — three supporting facts (three-hop reasoning)

Each task generates N=10 self-contained problems with distractor padding, ingests them as genes, queries with multi-hop questions, and measures retrieval rate, answer accuracy, and per-query latency.
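As a rough sketch of what one two-hop problem with distractor padding could look like, consider the generator below. The name make_two_hop_problem, the vocabulary lists, and the returned dictionary layout are all hypothetical; benchmarks/bench_babilong.py may structure its tasks differently.

```python
import random


def make_two_hop_problem(seed, n_distractors=5):
    """Build one bAbI-style two-hop problem (hypothetical format)."""
    rng = random.Random(seed)
    people = ["Mary", "John", "Sandra", "Daniel"]
    objects = ["apple", "football", "milk"]
    places = ["kitchen", "garden", "office", "hallway"]

    person = rng.choice(people)
    obj = rng.choice(objects)
    place = rng.choice(places)

    # The two supporting facts: who holds the object, then where that person went.
    facts = [f"{person} picked up the {obj}.", f"{person} went to the {place}."]

    # Distractors mention other people only, so they cannot change the answer.
    others = [p for p in people if p != person]
    distractors = [
        f"{rng.choice(others)} went to the {rng.choice(places)}."
        for _ in range(n_distractors)
    ]

    # Insert distractors at random positions; the facts keep their relative order.
    story = list(facts)
    for sentence in distractors:
        story.insert(rng.randint(0, len(story)), sentence)

    return {
        "story": story,
        "question": f"Where is the {obj}?",
        "answer": place,
        "supporting_facts": facts,
    }


# N=10 problems, matching the batch size mentioned above.
problems = [make_two_hop_problem(seed) for seed in range(10)]
```

Answering the question requires chaining both supporting facts (object to person, person to place), which is what makes the retrieval step harder than in task_1.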

Initial baseline: task_1 shows retrieval failure for pure narrative content with only proper names — the genome needs domain/entity anchors or source-path clues for reliable retrieval. This is a known limitation that the v0.2.0b2 authority boosts (source-path matching) should help with once the live server is restarted to pick up the new code.

Training runbook for DeBERTa re-train

New training/README.md documents the full DeBERTa fine-tune workflow. The existing 1,600-pair dataset was generated when the genome was ~3,500 genes — it's now at ~7,300 and covers concepts added since (SIKE, MoE decoder, cold-storage tiers) that the current trained models didn't see.

A full re-train isn't in this release (it needs an hour of GPU time and a spare Ollama teacher), but the runbook is ready for when you want to kick it off.

Included from v0.2.0b2 (already published but noted here for context)

Retrieval authority boosts — source authority, domain primacy, creation recency
IDF cap lowered 5.0 → 3.0 to reduce tangential rare-term over-boost
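The effect of the lower IDF cap can be illustrated with a minimal sketch. It assumes the textbook formula ln(N / df) and a hypothetical helper named capped_idf; the scoring code in helix-context may compute IDF differently, and only the 3.0 and 5.0 cap values come from these notes.

```python
import math

IDF_CAP = 3.0  # value from the release notes; the previous cap was 5.0


def capped_idf(total_docs, doc_freq, cap=IDF_CAP):
    """Textbook IDF, ln(N / df), clipped to `cap` (the formula is an assumption)."""
    if doc_freq <= 0:
        return cap  # treat unseen terms as maximally rare
    return min(math.log(total_docs / doc_freq), cap)
```

With roughly 7,300 genes, a term that appears in a single gene has an uncapped IDF near 8.9, so either cap clips it. The lower cap simply narrows the gap between such rare terms and moderately common ones, which is how it limits the over-boost.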

All 179 tests passing.

🤖 Generated with Claude Code
