
v0.1.0b2 — Score-gated expression, WAL durability, ΣĒMA cold-storage


@mbachaud released this 10 Apr 03:24

What's New

Score-gated expression & retrieval quality

  • Coverage metric now uses extracted domain/entity signals instead of raw word splits, raising coverage from 0.19 to 0.85-1.0
  • Ellipticity improved from a 0.37 average to 0.60-0.74, approaching the aligned threshold
  • Score-gated trimming drops weak tail candidates that score below 20% of the top score
  • Density denominator now scales dynamically with the expressed/max ratio
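A minimal sketch of the two retrieval-quality ideas above, assuming coverage means "fraction of the query's extracted signals that show up in the expressed results" and that trimming is a simple ratio cut against the top score. `signal_coverage`, `score_gate`, and the `(id, score)` candidate shape are hypothetical names for illustration, not the project's API.

```python
from typing import List, Sequence, Set, Tuple

Candidate = Tuple[str, float]  # hypothetical shape: (candidate id, retrieval score)


def signal_coverage(query_signals: Set[str], expressed_signals: Set[str]) -> float:
    """Fraction of the query's domain/entity signals found among expressed results.

    Hypothetical definition: signals are extracted terms, not raw word splits.
    """
    if not query_signals:
        return 0.0  # arbitrary choice for an empty query
    return len(query_signals & expressed_signals) / len(query_signals)


def score_gate(candidates: Sequence[Candidate], ratio: float = 0.2) -> List[Candidate]:
    """Drop tail candidates scoring below `ratio` times the top score."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if not ranked or ranked[0][1] <= 0:
        return ranked  # nothing meaningful to gate against
    cutoff = ranked[0][1] * ratio
    return [c for c in ranked if c[1] >= cutoff]
```

With a top score of 1.0 and the 20% ratio, a candidate at 0.2 survives and one at 0.1 is trimmed.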

WAL durability

  • New checkpoint() method supporting PASSIVE, FULL, and TRUNCATE modes
  • Periodic checkpoints during upsert (every 50/500 genes), plus a 60 s background timer in the server
  • Maximum data loss on a crash reduced from ~13,700 genes to ~50
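The three mode names match SQLite's own `PRAGMA wal_checkpoint(MODE)`. The sketch below shows one way a store could expose them and checkpoint every N upserts. `GeneStore`, the `genes` table, and `checkpoint_every` are hypothetical, and the server's 60 s background timer is not shown.

```python
import sqlite3

# SQLite checkpoint modes exposed by this sketch (RESTART also exists in SQLite).
CHECKPOINT_MODES = {"PASSIVE", "FULL", "TRUNCATE"}


class GeneStore:
    """Hypothetical store illustrating periodic WAL checkpoints during upsert."""

    def __init__(self, path: str, checkpoint_every: int = 50) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute("PRAGMA journal_mode=WAL")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS genes (id TEXT PRIMARY KEY, content TEXT)"
        )
        self.checkpoint_every = checkpoint_every
        self._since_checkpoint = 0

    def checkpoint(self, mode: str = "PASSIVE") -> tuple:
        """Fold WAL frames back into the main database file."""
        if mode not in CHECKPOINT_MODES:  # whitelist, so the f-string below is safe
            raise ValueError(f"unsupported checkpoint mode: {mode}")
        # SQLite returns (busy, wal_frames, checkpointed_frames).
        row = self.conn.execute(f"PRAGMA wal_checkpoint({mode})").fetchone()
        self._since_checkpoint = 0
        return row

    def upsert(self, gene_id: str, content: str) -> None:
        with self.conn:  # commits the transaction on success
            self.conn.execute(
                "INSERT INTO genes (id, content) VALUES (?, ?) "
                "ON CONFLICT(id) DO UPDATE SET content = excluded.content",
                (gene_id, content),
            )
        self._since_checkpoint += 1
        if self._since_checkpoint >= self.checkpoint_every:
            self.checkpoint("PASSIVE")  # non-blocking; skips frames readers still need
```

PASSIVE is the cheap default inside the write path. FULL and TRUNCATE wait on readers, which suits an explicit admin call better than a hot loop.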

ΣĒMA cold-storage compression tiers

  • Three tiers: OPEN (full fidelity), EUCHROMATIN (summary + ΣĒMA), HETEROCHROMATIN (ΣĒMA + metadata only)
  • compact_genome() retroactive sweep with configurable thresholds
  • Density gate at ingest routes low-signal content directly to cold tiers
  • /admin/compact and /admin/checkpoint endpoints
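A sketch of the ingest-time density gate and per-tier field stripping, assuming a 0-1 density score and the field sets suggested by the tier descriptions. The threshold values, `route_tier`, `compress`, and keeping metadata at EUCHROMATIN are all assumptions, not the project's configuration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class Tier(Enum):
    OPEN = "open"                        # full fidelity
    EUCHROMATIN = "euchromatin"          # summary + ΣĒMA
    HETEROCHROMATIN = "heterochromatin"  # ΣĒMA + metadata only


# Fields retained per tier. Keeping metadata at EUCHROMATIN is an assumption.
TIER_FIELDS = {
    Tier.OPEN: {"content", "summary", "sema", "metadata"},
    Tier.EUCHROMATIN: {"summary", "sema", "metadata"},
    Tier.HETEROCHROMATIN: {"sema", "metadata"},
}


@dataclass(frozen=True)
class DensityThresholds:
    # Hypothetical cut-offs on a 0-1 density score; not the project's values.
    open_min: float = 0.5
    euchromatin_min: float = 0.2

    def __post_init__(self) -> None:
        if not 0.0 <= self.euchromatin_min <= self.open_min:
            raise ValueError("expected 0 <= euchromatin_min <= open_min")


def route_tier(density: float, t: DensityThresholds = DensityThresholds()) -> Tier:
    """Density gate: low-signal content goes straight to a colder tier."""
    if density >= t.open_min:
        return Tier.OPEN
    if density >= t.euchromatin_min:
        return Tier.EUCHROMATIN
    return Tier.HETEROCHROMATIN


def compress(record: Dict[str, Any], tier: Tier) -> Dict[str, Any]:
    """Drop every field the target tier does not retain and tag the result."""
    kept = {k: v for k, v in record.items() if k in TIER_FIELDS[tier]}
    kept["tier"] = tier.value
    return kept
```

A retroactive sweep could apply `route_tier` and `compress` to stored records. How `compact_genome()` actually selects records and thresholds isn't covered here.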

Domain tagging

  • spaCy EntityRuler loaded with project vocabulary, placed before the statistical NER component
  • SPLADE weight raised from 2.5 to 3.5 as a semantic safety net
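In standard spaCy, putting the ruler ahead of NER is `nlp.add_pipe("entity_ruler", before="ner")` followed by `ruler.add_patterns(...)`, so vocabulary matches are fixed before the statistical model runs. These notes don't say how the SPLADE weight enters the final domain decision. The sketch assumes a plain weighted sum of per-source domain scores; `fuse_domain_scores`, `best_domain`, the source names, and every weight except SPLADE's 3.5 are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Mapping, Optional

# Only the SPLADE weight (3.5, previously 2.5) comes from the release notes;
# the "ruler" and "ner" weights are hypothetical placeholders.
SOURCE_WEIGHTS: Dict[str, float] = {"ruler": 5.0, "ner": 1.0, "splade": 3.5}


def fuse_domain_scores(
    source_scores: Mapping[str, Mapping[str, float]],
    weights: Mapping[str, float] = SOURCE_WEIGHTS,
) -> Dict[str, float]:
    """Weighted sum of per-domain scores from each tagging source."""
    fused: Dict[str, float] = defaultdict(float)
    for source, scores in source_scores.items():
        weight = weights.get(source, 0.0)  # unknown sources contribute nothing
        for domain, score in scores.items():
            fused[domain] += weight * score
    return dict(fused)


def best_domain(
    source_scores: Mapping[str, Mapping[str, float]],
    weights: Mapping[str, float] = SOURCE_WEIGHTS,
) -> Optional[str]:
    """Highest-scoring domain after fusion, or None if no source fired."""
    fused = fuse_domain_scores(source_scores, weights)
    return max(fused, key=fused.__getitem__) if fused else None
```

Under this model, a SPLADE score of 0.36 for one domain against an NER score of 1.0 for another loses at weight 2.5 (0.9 vs 1.0) but wins at 3.5 (1.26 vs 1.0). That is the "safety net" effect of the higher weight.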

Performance

  • Dedicated read-only SQLite connection, so WAL readers no longer block writers
  • ΣĒMA vector cache: a pre-materialized NumPy matrix replaces 7K json_loads() calls per query
  • Mode B scan: 120 s → <100 ms
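A sketch combining the first two items. The connection is opened through SQLite's `mode=ro` URI parameter, and every stored vector is decoded once into a row-normalized NumPy matrix, so each query is a single matrix-vector product. `open_readonly`, `SemaCache`, the `genes(id, sema)` table with JSON-array vectors, and cosine similarity as the scoring function are all assumptions.

```python
import json
import sqlite3
from pathlib import Path
from typing import List, Sequence, Tuple

import numpy as np


def open_readonly(path: str) -> sqlite3.Connection:
    """Open a connection that SQLite itself refuses to write through."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


class SemaCache:
    """Decode every stored vector once into a row-normalized matrix.

    Assumes a hypothetical table genes(id TEXT, sema TEXT), where `sema`
    holds a JSON array of floats of the same length in every row.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        rows = conn.execute("SELECT id, sema FROM genes ORDER BY id").fetchall()
        self.ids = [row[0] for row in rows]
        if not rows:
            self.matrix = np.zeros((0, 0), dtype=np.float32)
            return
        # json.loads runs once per row here, at load time, not on every query.
        mat = np.array([json.loads(row[1]) for row in rows], dtype=np.float32)
        norms = np.linalg.norm(mat, axis=1, keepdims=True)
        self.matrix = mat / np.where(norms == 0, 1.0, norms)  # avoid divide-by-zero

    def query(self, vec: Sequence[float], k: int = 5) -> List[Tuple[str, float]]:
        """Top-k ids by cosine similarity, via one matrix-vector product."""
        q = np.asarray(vec, dtype=np.float32)
        q_norm = np.linalg.norm(q)
        if self.matrix.shape[0] == 0 or q_norm == 0:
            return []
        sims = self.matrix @ (q / q_norm)
        top = np.argsort(-sims)[:k]
        return [(self.ids[i], float(sims[i])) for i in top]
```

The cache goes stale after writes, so a real deployment needs a rebuild or append policy. This sketch leaves that policy out.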

All 179 tests passing.

🤖 Generated with Claude Code