Roadmap

High-priority improvements for Ruflo beyond v3.10.

Priority 1: Fix Known Issues (High Impact, Short Effort)

Item 1 — Fix Skipped Integration Tests (#1872)

Impact: Reliability
Effort: M (1–2 days per test)

Four CI-skipped tests cover production bugs:

HybridBackend persistence across reinit
SwarmCoordinator error propagation
scaleAgents direction
Workflow resume after interruption

Action: Un-skip, fix, validate. Each test is independent.

Item 2 — Witness Manifest Drift (#2047)

Impact: Security / Enterprise Adoption
Effort: S (CI config change)

95 dist artifacts are missing from the witness manifest. Root cause: the scheduled verify job runs against a source-only checkout (no build).

Fix: Run npm run build before verify in scheduled job, or add --source-mode flag.
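Drift of this kind can be pictured as a two-way set difference between manifest entries and files on disk. The following is a minimal sketch, assuming the manifest reduces to a flat list of relative paths. `findManifestDrift`, `DriftReport`, and the sample paths are hypothetical and not Ruflo's actual verifier.

```typescript
// Hypothetical shape: the real witness manifest format is not shown in this roadmap.
interface DriftReport {
  missingFromManifest: string[]; // files on disk that the manifest does not list
  missingFromDisk: string[];     // manifest entries with no matching file on disk
}

function findManifestDrift(manifestPaths: string[], diskPaths: string[]): DriftReport {
  const manifest = new Set(manifestPaths);
  const disk = new Set(diskPaths);
  return {
    missingFromManifest: diskPaths.filter((p) => !manifest.has(p)).sort(),
    missingFromDisk: manifestPaths.filter((p) => !disk.has(p)).sort(),
  };
}

// A source-only checkout contains no dist files, so every dist entry is flagged.
const driftReport = findManifestDrift(
  ["dist/index.js", "dist/cli.js", "src/index.ts"],
  ["src/index.ts"],
);
console.log(driftReport.missingFromDisk); // [ 'dist/cli.js', 'dist/index.js' ]
```

Running the same comparison after a build step should leave both lists empty, which matches the "0 missing artifacts" success criterion.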

Item 3 — Real-Model SOTA Validation (M5, #2125)

Impact: Benchmark Credibility
Effort: S–M (API key or golden-response replay)

The SOTA comparator is 9/10 complete. The missing piece is M5: real-model integration quality (response quality, hallucination rate, tool-call accuracy).

Options:

Rotate the test API key to a non-expiring one
Golden-response replay harness (safer, reproducible)

Output: Runnable benchmark vs. LangGraph/AutoGen/CrewAI with real LLM.
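To make the golden-response option concrete, here is a minimal sketch of a record/replay harness. It assumes responses can be keyed by the exact prompt text and uses a synchronous `ModelCall` for brevity. The `GoldenReplay` class and its method names are hypothetical, and a real model client would be asynchronous.

```typescript
// Hypothetical harness: names and storage format are illustrative only.
type ModelCall = (prompt: string) => string;

class GoldenReplay {
  // Recorded responses, keyed by the exact prompt text.
  private readonly golden = new Map<string, string>();

  // Record mode: call the real model once and keep its answer.
  record(prompt: string, call: ModelCall): string {
    const response = call(prompt);
    this.golden.set(prompt, response);
    return response;
  }

  // Replay mode: never calls the model; unknown prompts fail loudly
  // so a benchmark cannot silently fall back to live traffic.
  replay(prompt: string): string {
    const response = this.golden.get(prompt);
    if (response === undefined) {
      throw new Error(`No golden response recorded for prompt: ${prompt}`);
    }
    return response;
  }
}

const harness = new GoldenReplay();
harness.record("What is 2 + 2?", () => "4"); // stand-in for a real model call
console.log(harness.replay("What is 2 + 2?")); // 4
```

Replay keeps benchmark runs reproducible and needs no API key in CI, but recordings go stale whenever prompts or models change, so some re-recording step would still be needed.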

Priority 2: Core Gaps (Medium Impact, Medium Effort)

Item 4 — Skill Synthesis Loop (ADR-113 / R-3)

Impact: High (highest-leverage capability gap vs. Hermes)
Effort: L (5–10 days)

Agents should auto-generate new skills (Claude Code slash commands) on the fly. Currently, adding a skill requires manual plugin publication.

What it does:

Agent says "I need to do X"
Lead synthesizes a skill from description
Skill auto-tests + publishes to local registry
Agent uses it immediately

Why it matters: Hermes-class agents auto-extend themselves. Ruflo should too.
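The four steps above can be sketched as a small pipeline: synthesize, test, publish, use. Everything here is an assumption for illustration. The roadmap does not define the ADR-113 API, so `Skill`, `LocalSkillRegistry`, `synthesizeAndPublish`, and the toy synthesizer are hypothetical stand-ins.

```typescript
// All names are hypothetical; the roadmap does not specify the ADR-113 interfaces.
interface Skill {
  name: string;
  run: (input: string) => string;
}

type Synthesizer = (description: string) => Skill;
type SkillTest = (skill: Skill) => boolean;

class LocalSkillRegistry {
  private readonly skills = new Map<string, Skill>();
  publish(skill: Skill): void {
    this.skills.set(skill.name, skill);
  }
  get(name: string): Skill | undefined {
    return this.skills.get(name);
  }
}

// Synthesize from a description, then publish only if the skill passes its test.
function synthesizeAndPublish(
  description: string,
  synthesize: Synthesizer,
  test: SkillTest,
  registry: LocalSkillRegistry,
): Skill {
  const skill = synthesize(description);
  if (!test(skill)) {
    throw new Error(`Synthesized skill "${skill.name}" failed its tests`);
  }
  registry.publish(skill);
  return skill;
}

// Stand-in for the lead agent: a real one would generate code from the description.
const toyLead: Synthesizer = (description) => ({
  name: description.toLowerCase().replace(/\s+/g, "-"),
  run: (input) => input.toUpperCase(),
});

const localRegistry = new LocalSkillRegistry();
const shout = synthesizeAndPublish("Shout text", toyLead, (s) => s.run("ok") === "OK", localRegistry);
console.log(localRegistry.get(shout.name)?.run("hello")); // HELLO
```

Gating publication on the test step is the important design choice: a skill that fails evaluation never reaches the registry, which is where the safety-critical evaluation work would plug in.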

Item 5 — Streaming Responses (ADR-129 streaming)

Impact: Medium (UX improvement)
Effort: M (3–5 days)

End-to-end token streaming for Managed Agents. Currently, all tokens are collected and returned at once.

Benefit: Real-time feedback loop, sub-1s first-token latency.
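The difference between the two delivery modes shows up in the order of events. In this minimal sketch a synchronous generator stands in for the model's token stream, which is an assumption made to keep the example deterministic. A real implementation would presumably use an async iterator over the SDK response, and all function names are hypothetical.

```typescript
// Stand-in token source that logs each token as it is produced.
function* modelTokens(tokens: string[], log: string[]): Generator<string> {
  for (const token of tokens) {
    log.push(`produced ${token}`);
    yield token;
  }
}

// Collect-then-return: buffer every token, then hand them all over.
function collectThenReturn(tokens: string[], log: string[]): void {
  const all = [...modelTokens(tokens, log)];
  for (const token of all) log.push(`delivered ${token}`);
}

// Streaming: deliver each token as soon as it is produced.
function streamThrough(tokens: string[], log: string[]): void {
  for (const token of modelTokens(tokens, log)) log.push(`delivered ${token}`);
}

const bufferedLog: string[] = [];
collectThenReturn(["Hel", "lo"], bufferedLog);
const streamedLog: string[] = [];
streamThrough(["Hel", "lo"], streamedLog);

// Position of the first delivery in each log: later when buffered, earlier when streamed.
console.log(bufferedLog.indexOf("delivered Hel"), streamedLog.indexOf("delivered Hel")); // 2 1
```

In the buffered case the first delivery waits for the last token, so first-token latency equals full-response latency. In the streamed case it depends only on the first token.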

Item 6 — Flash Attention Deployment

Impact: High (2.49x–7.47x speedup)
Effort: M (5 days)

Flash Attention is implemented (ADR-092) but not fully deployed. Enable in production.

Blockers: Benchmark validation + regression testing.
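Both blockers could be folded into a single release gate that checks numerical agreement with the baseline and the measured speedup together. This is a minimal sketch under stated assumptions: `validateKernel`, the flat `number[]` outputs, and the tolerance value are hypothetical. Only the 2.49x lower bound comes from the figures above.

```typescript
// Hypothetical release gate; names and the tolerance are illustrative.
interface GateResult {
  speedup: number;    // baseline time divided by candidate time
  maxAbsDiff: number; // largest element-wise deviation from the baseline output
  pass: boolean;
}

function validateKernel(
  baselineOut: number[],
  candidateOut: number[],
  baselineMs: number,
  candidateMs: number,
  minSpeedup: number,
  tolerance: number,
): GateResult {
  if (baselineOut.length !== candidateOut.length) {
    throw new Error("Output shapes differ");
  }
  let maxAbsDiff = 0;
  for (let i = 0; i < baselineOut.length; i++) {
    maxAbsDiff = Math.max(maxAbsDiff, Math.abs(baselineOut[i] - candidateOut[i]));
  }
  const speedup = baselineMs / candidateMs;
  return { speedup, maxAbsDiff, pass: speedup >= minSpeedup && maxAbsDiff <= tolerance };
}

// Regression check (outputs agree within 1e-3) plus benchmark check (at least 2.49x).
const gate = validateKernel([0.1, 0.9], [0.1, 0.9], 100, 40, 2.49, 1e-3);
console.log(gate.pass); // true
```

Requiring both conditions in one result means a fast but numerically drifting kernel fails the gate, as does an accurate one that misses the claimed speedup.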

Priority 3: Ecosystem & Scaling (Lower Impact, Longer Timeline)

Item 7 — BFT Consensus Load Test

Impact: Medium (production confidence)
Effort: M (3–5 days)

Byzantine consensus is wired and tested at unit level, but not validated under production load (100+ messages/sec, 9+ agents).

Action: Load test suite + performance report.
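A custom harness would need at least a load plan and a latency summary. The sketch below covers only those two pieces plus the standard fault bound. The names `planLoad`, `percentile`, and `maxByzantineFaults` are hypothetical, and how messages are actually submitted to the swarm is left out. The fault bound uses the classical result that BFT protocols need n ≥ 3f + 1 nodes to tolerate f Byzantine faults.

```typescript
// Hypothetical load plan for a harness that does not exist yet.
interface PlannedMessage {
  atMs: number;   // offset from test start at which to send
  sender: number; // index of the sending agent
}

// Spread messages evenly in time and round-robin across agents.
function planLoad(ratePerSec: number, durationSec: number, agents: number): PlannedMessage[] {
  const total = Math.floor(ratePerSec * durationSec);
  const intervalMs = 1000 / ratePerSec;
  return Array.from({ length: total }, (_, i) => ({
    atMs: Math.round(i * intervalMs),
    sender: i % agents,
  }));
}

// Nearest-rank percentile over observed commit latencies.
function percentile(latenciesMs: number[], p: number): number {
  if (latenciesMs.length === 0) throw new Error("No samples");
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Largest f satisfying n >= 3f + 1.
function maxByzantineFaults(agents: number): number {
  return Math.floor((agents - 1) / 3);
}

const plan = planLoad(100, 2, 9);
console.log(plan.length, maxByzantineFaults(9)); // 200 2
```

At the 9-agent floor named above, up to 2 faulty agents are tolerable, so a useful load test would also run with 1 or 2 agents deliberately misbehaving and report percentiles for each case.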

Item 8 — Branding (CLI Output Cleanup)

Impact: Low (user perception)
Effort: S (1 day)

CLI output still leaks claude-flow strings; the Ruflo rebranding is incomplete.

Action: Grep for claude-flow in CLI output, replace with ruflo.
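Beyond a one-off grep, the same check could run against captured CLI output in CI so leaks do not return. This is a minimal sketch. `findBrandLeaks` and the sample output are hypothetical, and how CLI output gets captured is assumed to happen elsewhere.

```typescript
// Hypothetical helper for a CI check over captured CLI output.
interface BrandLeak {
  line: number; // 1-based line number in the captured output
  text: string; // the offending line
}

function findBrandLeaks(output: string, legacy = "claude-flow"): BrandLeak[] {
  const needle = legacy.toLowerCase();
  return output
    .split("\n")
    .map((text, i) => ({ line: i + 1, text }))
    .filter(({ text }) => text.toLowerCase().includes(needle));
}

const sampleOutput = "ruflo v3.10.1\nRun `claude-flow init` to begin\nDone";
console.log(findBrandLeaks(sampleOutput).map((leak) => leak.line)); // [ 2 ]
```

The helper reports matches instead of rewriting them, since some occurrences (for example identifiers kept for backward compatibility) may need a manual decision.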

Item 9 — Hive-Mind Wizard

Impact: Medium (UX)
Effort: S (2 days)

Interactive wizard for swarm topology selection and configuration.

npx ruflo@latest hive-mind wizard

Should guide user through:

Team size
Trust model (hierarchical vs. mesh)
Consensus strategy
Network topology

Item 10 — Plugin Registry GUI

Impact: Low (discoverability)
Effort: M (5 days)

Web UI for browsing and installing plugins. Currently CLI-only.

Locations:

https://plugins.ruflo.dev (discoverable)
Embedded in daemon (http://localhost:3000/plugins)

2026 Timeline

Quarter	Focus	Items
Q2 2026	Bug fixes + M5	1, 2, 3
Q2/Q3 2026	Core features	4, 5, 6
Q3 2026	Scaling + UX	7, 8, 9, 10

Metrics

Success criteria:

Items 1–3: Done by end of Q2 2026
Integration test baseline: 1,999 → 2,050 (about 50 new tests)
M5 benchmark: Real-model response quality measured
Witness drift: 0 missing artifacts in scheduled runs
Flash Attention: Live in v3.11.0

Known Blockers

ADR-113 skill synthesis: Requires eval of auto-generated skills (safety critical)
Streaming: Depends on Anthropic SDK update
BFT load test: Needs custom harness (not in existing suite)

Community Contributions Welcome

Items 7–10 are great for external contributors:

Well-scoped (1–5 days)
Self-contained
Clear success criteria

See the Contributing guide.

Questions?

Discussions: GitHub Discussions
Issues: Open an issue

Ruflo v3.10.1 · GitHub · Roadmap
