
Roadmap

ruv edited this page May 25, 2026 · 1 revision


High-priority improvements for Ruflo beyond v3.10.


Priority 1: Fix Known Issues (High Impact, Short Effort)

Item 1: Fix Skipped Integration Tests (#1872)

Impact: Reliability
Effort: M (1–2 days per test)

Four tests currently skipped in CI cover known production bugs:

  • HybridBackend persistence across reinit
  • SwarmCoordinator error propagation
  • scaleAgents direction
  • Workflow resume after interruption

Action: Un-skip each test, fix the underlying bug, and validate. The tests are independent, so they can be tackled separately.

Item 2: Witness Manifest Drift (#2047)

Impact: Security / Enterprise Adoption
Effort: S (CI config change)

95 dist artifacts are missing from the witness manifest. Root cause: the scheduled verify job runs against a source-only checkout, so no build output exists.

Fix: Run npm run build before verify in the scheduled job, or add a --source-mode flag to verify.
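The drift described above comes down to a set difference between the manifest and the files a verify run can actually see. The sketch below is a minimal TypeScript illustration, not the actual witness verify code. It assumes a hypothetical manifest shape (a list of path/sha256 entries) and a list of files found on disk. On a source-only checkout, every dist/ entry is reported as missing, which is the symptom in #2047. The hypothetical assertBuilt guard shows the "build before verify" fix as a fail-fast check.

```typescript
// Hypothetical manifest entry shape; the real witness manifest format may differ.
interface ManifestEntry {
  path: string;   // repo-relative path, e.g. "dist/index.js"
  sha256: string; // expected content hash (not compared in this sketch)
}

interface DriftReport {
  missing: string[];    // listed in the manifest but not present on disk
  unexpected: string[]; // present on disk but not listed in the manifest
}

// Compare manifest entries against the files a verify run can see.
function checkDrift(manifest: ManifestEntry[], filesOnDisk: string[]): DriftReport {
  const onDisk = new Set(filesOnDisk);
  const listed = new Set(manifest.map((e) => e.path));
  const missing = manifest.map((e) => e.path).filter((p) => !onDisk.has(p)).sort();
  const unexpected = filesOnDisk.filter((p) => !listed.has(p)).sort();
  return { missing, unexpected };
}

// Hypothetical guard for the scheduled job: refuse to verify a checkout that
// has no build output, instead of reporting every dist/ file as drift.
function assertBuilt(filesOnDisk: string[]): void {
  if (!filesOnDisk.some((p) => p.startsWith("dist/"))) {
    throw new Error("no dist/ output found: run npm run build before verify");
  }
}
```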

Item 3: Real-Model SOTA Validation (M5, #2125)

Impact: Benchmark Credibility
Effort: S–M (API key or golden-response replay)

The SOTA comparator is 9/10 complete. The missing milestone, M5, is real-model integration quality (response quality, hallucination rate, tool-call accuracy).

Options:

  • Rotate in a non-expiring test API key
  • Golden-response replay harness (safer, reproducible)

Output: Runnable benchmark vs. LangGraph/AutoGen/CrewAI with a real LLM.
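A golden-response replay harness can be little more than a store keyed by the serialized request. In record mode it calls the real model once and saves the response. In replay mode it serves the saved response and fails loudly on a miss, so CI never needs a live key. This is a minimal sketch only. The ModelRequest and ModelCall types and the GoldenReplay class are hypothetical, and a real harness would persist goldens as fixture files rather than keep them in memory.

```typescript
// Hypothetical request/response shapes, for illustration only.
interface ModelRequest {
  model: string;
  prompt: string;
}
type ModelCall = (req: ModelRequest) => Promise<string>;
type Mode = "record" | "replay";

// Wraps a model call so benchmark runs are reproducible.
class GoldenReplay {
  private goldens = new Map<string, string>();
  private mode: Mode;
  private live?: ModelCall;

  constructor(mode: Mode, live?: ModelCall) {
    this.mode = mode;
    this.live = live;
  }

  // Deterministic key: the request fields serialized in a fixed order.
  private key(req: ModelRequest): string {
    return JSON.stringify([req.model, req.prompt]);
  }

  async call(req: ModelRequest): Promise<string> {
    const k = this.key(req);
    if (this.mode === "replay") {
      const hit = this.goldens.get(k);
      if (hit === undefined) throw new Error(`no golden response for ${k}`);
      return hit;
    }
    if (!this.live) throw new Error("record mode needs a live model call");
    const res = await this.live(req);
    this.goldens.set(k, res);
    return res;
  }

  // Export/load so goldens can be written to and read from fixture files.
  export(): Record<string, string> {
    const out: Record<string, string> = {};
    this.goldens.forEach((v, k) => { out[k] = v; });
    return out;
  }

  load(data: Record<string, string>): void {
    for (const k of Object.keys(data)) this.goldens.set(k, data[k]);
  }
}
```

Failing on a replay miss, rather than silently falling back to a live call, keeps benchmark runs reproducible. Any new prompt has to be recorded deliberately.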


Priority 2: Core Gaps (Medium Impact, Medium Effort)

Item 4: Skill Synthesis Loop (ADR-113 / R-3)

Impact: High (highest-leverage capability gap vs. Hermes)
Effort: L (5–10 days)

Agents should auto-generate new skills (Claude Code slash commands) on the fly. Today, adding a skill requires manual plugin publication.

What it does:

  • Agent says "I need to do X"
  • Lead synthesizes a skill from description
  • Skill auto-tests + publishes to local registry
  • Agent uses it immediately
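The four steps above form a synthesize, test, publish, use loop with a gate in the middle. Below is a minimal TypeScript sketch of that control flow. It assumes hypothetical Skill, SynthesisHooks and LocalSkillRegistry shapes, since ADR-113 is not quoted here and may define them differently. A draft that fails its tests never reaches the local registry. That gate is also where the safety evaluation listed under Known Blockers would plug in.

```typescript
// Hypothetical shapes; ADR-113 may define these differently.
interface Skill {
  name: string; // e.g. the slash-command name
  body: string; // generated command definition
}

interface SynthesisHooks {
  synthesize: (need: string) => Promise<Skill>; // lead drafts a skill from a description
  runTests: (skill: Skill) => Promise<boolean>; // auto-generated tests / evals
}

// Local registry: only skills that passed their tests are published here.
class LocalSkillRegistry {
  private skills = new Map<string, Skill>();

  publish(skill: Skill): void {
    this.skills.set(skill.name, skill);
  }

  get(name: string): Skill | undefined {
    return this.skills.get(name);
  }
}

// "I need to do X" -> synthesize -> test -> publish -> use.
async function acquireSkill(
  need: string,
  hooks: SynthesisHooks,
  registry: LocalSkillRegistry,
  maxAttempts = 3,
): Promise<Skill> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const skill = await hooks.synthesize(need);
    if (await hooks.runTests(skill)) {
      registry.publish(skill); // usable by the agent immediately
      return skill;
    }
  }
  throw new Error(`could not synthesize a passing skill for: ${need}`);
}
```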

Why it matters: Hermes-class agents auto-extend themselves. Ruflo should too.


Item 5: Streaming Responses (ADR-129 streaming)

Impact: Medium (UX improvement)
Effort: M (3–5 days)

End-to-end token streaming for Managed Agents. Currently, the full response is collected and all tokens are returned at once.

Benefit: Real-time feedback loop, sub-1s first-token latency.
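The difference between today's behaviour and end-to-end streaming is whether tokens are buffered until the response completes or forwarded as they arrive. The minimal sketch below works over a generic AsyncIterable<string> token source. That source is a stand-in and deliberately does not use the Anthropic SDK's streaming API. streamTokens forwards each token to a callback and records time to first token, which is the metric behind the sub-1s target.

```typescript
interface StreamResult {
  text: string;                // full response, assembled as tokens arrive
  firstTokenMs: number | null; // latency until the first token, null if none
}

// Current behaviour: wait for every token, then return one string.
async function collectAll(tokens: AsyncIterable<string>): Promise<string> {
  let text = "";
  for await (const t of tokens) text += t;
  return text;
}

// Streaming: forward each token immediately and measure first-token latency.
async function streamTokens(
  tokens: AsyncIterable<string>,
  onToken: (token: string) => void,
): Promise<StreamResult> {
  const start = Date.now();
  let firstTokenMs: number | null = null;
  let text = "";
  for await (const t of tokens) {
    if (firstTokenMs === null) firstTokenMs = Date.now() - start;
    onToken(t); // e.g. write to the client connection right away
    text += t;
  }
  return { text, firstTokenMs };
}
```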


Item 6: Flash Attention Deployment

Impact: High (2.49x–7.47x speedup)
Effort: M (5 days)

Flash Attention is implemented (ADR-092) but not fully deployed. Action: enable it in production.

Blockers: Benchmark validation + regression testing.
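Flash Attention's speedup comes from visiting keys and values in tiles with an online softmax. A running max and a running normalizer let it avoid materializing the full score row, but the output must still match standard attention up to floating-point error. That equivalence is the core property a regression test should pin down. The sketch below checks a single-query, tiled online-softmax attention against the naive reference in plain TypeScript. It illustrates the technique only and is not the ADR-092 implementation.

```typescript
type Vec = number[];

function dot(a: Vec, b: Vec): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// Reference: softmax over all scores at once, then a weighted sum of values.
function naiveAttention(q: Vec, keys: Vec[], values: Vec[]): Vec {
  const scale = 1 / Math.sqrt(q.length);
  const scores = keys.map((k) => dot(q, k) * scale);
  const max = Math.max(...scores);
  const w = scores.map((s) => Math.exp(s - max));
  const z = w.reduce((a, b) => a + b, 0);
  const out: Vec = new Array(values[0].length).fill(0);
  values.forEach((v, j) => v.forEach((x, i) => (out[i] += (w[j] / z) * x)));
  return out;
}

// Tiled: visit keys/values in blocks, keeping a running max (m), a running
// normalizer (l) and an unnormalized accumulator, rescaled whenever m grows.
function tiledAttention(q: Vec, keys: Vec[], values: Vec[], blockSize: number): Vec {
  const scale = 1 / Math.sqrt(q.length);
  let m = -Infinity;
  let l = 0;
  const acc: Vec = new Array(values[0].length).fill(0);
  for (let start = 0; start < keys.length; start += blockSize) {
    const end = Math.min(start + blockSize, keys.length);
    const scores: number[] = [];
    for (let j = start; j < end; j++) scores.push(dot(q, keys[j]) * scale);
    const mNew = Math.max(m, ...scores);
    const correction = Math.exp(m - mNew); // 0 on the first block (m = -Infinity)
    l *= correction;
    for (let i = 0; i < acc.length; i++) acc[i] *= correction;
    scores.forEach((s, idx) => {
      const p = Math.exp(s - mNew);
      l += p;
      values[start + idx].forEach((x, i) => (acc[i] += p * x));
    });
    m = mNew;
  }
  return acc.map((x) => x / l);
}
```

A production regression suite would run the same comparison on the deployed kernel across several block sizes and sequence lengths, with a tolerance matched to its precision.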


Priority 3: Ecosystem & Scaling (Lower Impact, Longer Timeline)

Item 7 β€” BFT Consensus Load Test

Impact: Medium (production confidence)
Effort: M (3–5 days)

Byzantine consensus is wired and tested at the unit level, but has not been validated under production load (100+ messages/sec, 9+ agents).

Action: Load test suite + performance report.
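Two numbers anchor such a load test: how many faulty agents the cluster should tolerate, and the sustained message rate and latency. The sketch below assumes a standard PBFT-style model, which is not taken from Ruflo's consensus code. In that model, tolerating f Byzantine agents needs n ≥ 3f + 1, and a quorum of ⌈(n + f + 1)/2⌉ guarantees that any two quorums share at least one honest agent. With 9 agents that gives f = 2 and quorums of 6. The send hook passed to runLoad and the LoadReport fields are hypothetical.

```typescript
// PBFT-style sizing: tolerate f Byzantine agents out of n when n >= 3f + 1.
function bftParams(n: number): { f: number; quorum: number } {
  if (n < 4) throw new Error("BFT needs at least 4 agents to tolerate one fault");
  const f = Math.floor((n - 1) / 3);
  // Two quorums of this size overlap in >= f + 1 agents, so at least one is honest.
  const quorum = Math.ceil((n + f + 1) / 2);
  return { f, quorum };
}

interface LoadReport {
  messages: number;
  throughputPerSec: number; // completed messages per second of wall time
  p50Ms: number;
  p99Ms: number;
}

// Nearest-rank percentile over an ascending array.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function summarize(latenciesMs: number[], wallMs: number): LoadReport {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  return {
    messages: sorted.length,
    throughputPerSec: sorted.length / (wallMs / 1000),
    p50Ms: percentile(sorted, 50),
    p99Ms: percentile(sorted, 99),
  };
}

// Hypothetical driver: offer messages at a fixed rate and record commit latency.
async function runLoad(
  send: () => Promise<void>, // e.g. propose one message and await consensus
  ratePerSec: number,
  durationMs: number,
): Promise<LoadReport> {
  const interval = 1000 / ratePerSec;
  const start = Date.now();
  const latencies: number[] = [];
  const inFlight: Promise<void>[] = [];
  for (let i = 0; i * interval < durationMs; i++) {
    const wait = start + i * interval - Date.now();
    if (wait > 0) await new Promise<void>((resolve) => setTimeout(resolve, wait));
    const t0 = Date.now();
    // Not awaited: slow commits must not throttle the offered load.
    inFlight.push(send().then(() => { latencies.push(Date.now() - t0); }));
  }
  await Promise.all(inFlight);
  return summarize(latencies, Date.now() - start);
}
```

Keeping sends in flight, instead of awaiting each one, is what makes this an open-loop test. A stalled consensus round then shows up as rising p99 latency rather than as a silently reduced message rate.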


Item 8: Branding (CLI Output Cleanup)

Impact: Low (user perception)
Effort: S (1 day)

CLI output still leaks claude-flow strings; the Ruflo rebranding is incomplete.

Action: Grep for claude-flow in CLI output, replace with ruflo.
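A blind find-and-replace is risky here, because package names and import paths may legitimately still contain claude-flow. Only user-facing strings should change. The minimal sketch below reports candidate lines with a suggested rewrite for review rather than editing files. It assumes a hypothetical keep-rule: scoped package names such as "@claude-flow/..." are left alone, and every other occurrence is flagged.

```typescript
interface Hit {
  line: number;      // 1-based line number
  text: string;      // original line
  suggested: string; // line with flagged occurrences rebranded
}

// Matches "claude-flow" unless it directly follows "@", i.e. a scoped
// package name, which this sketch (hypothetically) keeps as-is.
const LEAK = new RegExp("(?<!@)claude-flow", "g");

// Report lines that still leak the old name, with a suggested replacement.
function findBrandingLeaks(source: string): Hit[] {
  const hits: Hit[] = [];
  source.split("\n").forEach((text, i) => {
    const suggested = text.replace(LEAK, "ruflo");
    if (suggested !== text) hits.push({ line: i + 1, text, suggested });
  });
  return hits;
}
```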


Item 9 β€” Hive-Mind Wizard

Impact: Medium (UX)
Effort: S (2 days)

Interactive wizard for swarm topology selection and configuration.

npx ruflo@latest hive-mind wizard

Should guide user through:

  • Team size
  • Trust model (hierarchical vs. mesh)
  • Consensus strategy
  • Network topology
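Behind the prompts, the wizard mostly has to turn four answers into a validated swarm config. The minimal sketch below shows that mapping with hypothetical option values and field names. The wizard does not exist yet, so none of this reflects an actual ruflo config schema. It also shows one cross-check worth enforcing: Byzantine consensus needs at least 4 agents to tolerate a single fault.

```typescript
// Hypothetical answer types; the real wizard may offer different choices.
type TrustModel = "hierarchical" | "mesh";
type Consensus = "majority" | "raft" | "byzantine";
type Topology = "star" | "ring" | "mesh";

interface WizardAnswers {
  teamSize: number;
  trustModel: TrustModel;
  consensus: Consensus;
  topology: Topology;
}

interface SwarmConfig extends WizardAnswers {
  coordinators: number; // hypothetical: 1 lead for hierarchical, 0 for flat mesh
}

// Validate the answers and derive the config, or return the problems found.
function buildSwarmConfig(a: WizardAnswers): SwarmConfig | string[] {
  const errors: string[] = [];
  if (!Number.isInteger(a.teamSize) || a.teamSize < 1) {
    errors.push("team size must be a positive integer");
  }
  if (a.consensus === "byzantine" && a.teamSize < 4) {
    errors.push("byzantine consensus needs at least 4 agents (n >= 3f + 1, f >= 1)");
  }
  if (errors.length > 0) return errors;
  return { ...a, coordinators: a.trustModel === "hierarchical" ? 1 : 0 };
}
```

Returning all validation errors at once lets an interactive wizard re-prompt only the answers that conflict.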

Item 10: Plugin Registry GUI

Impact: Low (discoverability)
Effort: M (5 days)

Web UI for browsing and installing plugins. Currently CLI-only.

Locations:


2026 Timeline

Quarter       Focus             Items
Q2 2026       Bug fixes + M5    1, 2, 3
Q2/Q3 2026    Core features     4, 5, 6
Q3 2026       Scaling + UX      7, 8, 9, 10

Metrics

Success criteria:

  • Items 1–3: Done by end of Q2 2026
  • Integration test baseline: 1,999 → 2,050 (51 new tests)
  • M5 benchmark: Real-model response quality measured
  • Witness drift: 0 missing artifacts in scheduled runs
  • Flash Attention: Live in v3.11.0

Known Blockers

  • ADR-113 skill synthesis: Requires evaluation of auto-generated skills (safety-critical)
  • Streaming: Depends on Anthropic SDK update
  • BFT load test: Needs custom harness (not in existing suite)

Community Contributions Welcome

Items 7–10 are great for external contributors:

  • Well-scoped (1–5 days)
  • Self-contained
  • Clear success criteria

See the Contributing guide.


Questions?


Ruflo v3.10.1 · GitHub · Roadmap
