Roadmap
High-priority improvements for Ruflo beyond v3.10.
Impact: Reliability Effort: M (1-2 days per test)
Four CI-skipped tests cover production bugs:
- `HybridBackend` persistence across reinit
- `SwarmCoordinator` error propagation
- `scaleAgents` direction
- Workflow resume after interruption
Action: Un-skip, fix, validate. Each test is independent.
Impact: Security / Enterprise Adoption Effort: S (CI config change)
Scheduled verify runs report 95 dist artifacts from the witness manifest as missing. Root cause: the scheduled verify job runs against a source-only checkout (no build).
Fix: Run `npm run build` before verify in the scheduled job, or add a `--source-mode` flag.
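A minimal TypeScript sketch of why the drift appears and what each fix changes, assuming the manifest is a list of repository-relative paths checked against the files present in the checkout. `WitnessEntry`, `findMissingArtifacts`, and the way `sourceMode` skips `dist/` are hypothetical; the real verify step and the proposed `--source-mode` flag may work differently.

```typescript
// Hypothetical manifest entry; the real witness manifest format is an assumption.
interface WitnessEntry {
  path: string; // repository-relative path, e.g. "dist/index.js"
}

// Return manifest paths that are absent from the checkout.
// With sourceMode on, build outputs under dist/ are skipped, because a
// source-only checkout never contains them.
function findMissingArtifacts(
  manifest: readonly WitnessEntry[],
  present: ReadonlySet<string>,
  sourceMode: boolean,
): string[] {
  return manifest
    .filter((entry) => !(sourceMode && entry.path.startsWith("dist/")))
    .filter((entry) => !present.has(entry.path))
    .map((entry) => entry.path);
}

const manifest: WitnessEntry[] = [
  { path: "src/index.ts" },
  { path: "dist/index.js" },
  { path: "dist/cli.js" },
];

// A scheduled job that skips `npm run build` sees only source files.
const sourceOnlyCheckout = new Set(["src/index.ts"]);

console.log(findMissingArtifacts(manifest, sourceOnlyCheckout, false).join(", ")); // dist/index.js, dist/cli.js
console.log(findMissingArtifacts(manifest, sourceOnlyCheckout, true).length); // 0
```

Building first makes the `dist/` paths present, so the report drops to zero without any flag; a source-mode option only matters if the scheduled job must stay build-free.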
Impact: Benchmark Credibility Effort: S-M (API key or golden-response replay)
The SOTA comparator is 9/10 complete. Still missing is M5: real-model integration quality (response quality, hallucination rate, tool-call accuracy).
Options:
- Rotate test API key (non-expiring)
- Golden-response replay harness (safer, reproducible)
Output: A runnable benchmark vs. LangGraph/AutoGen/CrewAI using a real LLM.
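The golden-response option could work roughly as below: record real model outputs once with a live key, commit them as fixtures, and replay them in CI with no key or network access. `GoldenResponseStore`, the SHA-256 key over model name plus prompt, and the placeholder model name and prompt are all assumptions for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical golden-response store: record real outputs once, replay offline.
class GoldenResponseStore {
  private readonly model: string;
  private readonly responses = new Map<string, string>();

  constructor(model: string) {
    this.model = model;
  }

  // Key on model + prompt so outputs from different models never collide.
  private key(prompt: string): string {
    return createHash("sha256").update(`${this.model}\n${prompt}`).digest("hex");
  }

  record(prompt: string, response: string): void {
    this.responses.set(this.key(prompt), response);
  }

  // Fail loudly on a cache miss instead of silently calling a live model.
  replay(prompt: string): string {
    const hit = this.responses.get(this.key(prompt));
    if (hit === undefined) {
      throw new Error(`no golden response recorded for prompt: ${prompt}`);
    }
    return hit;
  }

  // Serializable snapshot, e.g. to commit to the repo as a fixture file.
  toJSON(): Record<string, string> {
    return Object.fromEntries(this.responses);
  }
}

const store = new GoldenResponseStore("example-model");
store.record("Which tool lists files?", "Use the ls tool.");
console.log(store.replay("Which tool lists files?")); // Use the ls tool.
```

Failing on a cache miss keeps replay runs reproducible; the M5 scoring (response quality, hallucination rate, tool-call accuracy) would then run over the replayed outputs.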
Impact: High (highest-leverage capability gap vs. Hermes) Effort: L (5-10 days)
Agents should auto-generate new skills (Claude Code slash commands) on the fly. Today this requires manual plugin publication.
What it does:
- Agent says "I need to do X"
- Lead synthesizes a skill from description
- Skill auto-tests + publishes to local registry
- Agent uses it immediately
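A hypothetical TypeScript sketch of the synthesize, test, publish, use loop above. The `Skill` shape, `LocalSkillRegistry`, and `synthesizeAndPublish` are invented for illustration, and `stubSynthesizer` stands in for the model call that would actually write the skill. The one design point it encodes is a gate: a skill with no tests, or with any failing test, is never published.

```typescript
// Hypothetical shapes; Ruflo's real skill format and registry API may differ.
interface Skill {
  name: string; // slash-command name, e.g. "/upper"
  description: string;
  run: (input: string) => string;
}

interface SkillTest {
  input: string;
  expected: string;
}

type Synthesizer = (need: string) => { skill: Skill; tests: SkillTest[] };

class LocalSkillRegistry {
  private readonly skills = new Map<string, Skill>();
  publish(skill: Skill): void {
    this.skills.set(skill.name, skill);
  }
  get(name: string): Skill | undefined {
    return this.skills.get(name);
  }
}

// Synthesize a skill for a stated need, run its generated tests, and publish
// it only if there is at least one test and every test passes.
// A thrown error during a test counts as a failure.
function synthesizeAndPublish(
  registry: LocalSkillRegistry,
  synthesize: Synthesizer,
  need: string,
): Skill | undefined {
  const { skill, tests } = synthesize(need);
  const allPass =
    tests.length > 0 &&
    tests.every((t) => {
      try {
        return skill.run(t.input) === t.expected;
      } catch {
        return false;
      }
    });
  if (!allPass) return undefined;
  registry.publish(skill);
  return skill;
}

// Stub standing in for the model call that would generate skill + tests.
const stubSynthesizer: Synthesizer = (need) => ({
  skill: { name: "/upper", description: need, run: (s) => s.toUpperCase() },
  tests: [{ input: "ruflo", expected: "RUFLO" }],
});

const registry = new LocalSkillRegistry();
synthesizeAndPublish(registry, stubSynthesizer, "I need to upper-case text");
console.log(registry.get("/upper")?.run("hive mind")); // HIVE MIND
```

Passing generated tests is a weak bar on its own, which is why the safety evaluation of auto-generated skills noted under success criteria still applies.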
Why it matters: Hermes-class agents auto-extend themselves. Ruflo should too.
Impact: Medium (UX improvement) Effort: M (3-5 days)
End-to-end token streaming for Managed Agents. Currently, all tokens are collected and returned at once.
Benefit: Real-time feedback loop, sub-1s first-token latency.
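A small sketch of the difference between today's collect-then-return behavior and end-to-end streaming, using an async generator as a stand-in for the model's token stream. `fakeModelStream`, `collectAll`, and `forwardTokens` are hypothetical names; the real Managed Agents code path and the SDK's streaming API are not shown.

```typescript
// Hypothetical stand-in for a model's token stream: each token arrives
// after `delayMs` milliseconds.
async function* fakeModelStream(tokens: string[], delayMs: number): AsyncGenerator<string> {
  for (const token of tokens) {
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    yield token;
  }
}

// Current behavior: buffer every token and return only when the stream ends.
async function collectAll(stream: AsyncIterable<string>): Promise<string> {
  let text = "";
  for await (const token of stream) text += token;
  return text;
}

// Streaming behavior: hand each token to the caller as soon as it arrives.
async function forwardTokens(
  stream: AsyncIterable<string>,
  onToken: (token: string) => void,
): Promise<void> {
  for await (const token of stream) onToken(token);
}

async function compareFirstOutput(): Promise<void> {
  const tokens = ["Swarm ", "is ", "ready."];

  let start = Date.now();
  await collectAll(fakeModelStream(tokens, 100));
  console.log(`buffered: first output after ~${Date.now() - start} ms`);

  start = Date.now();
  let firstTokenMs = -1;
  await forwardTokens(fakeModelStream(tokens, 100), () => {
    if (firstTokenMs < 0) firstTokenMs = Date.now() - start;
  });
  console.log(`streamed: first token after ~${firstTokenMs} ms`);
}

void compareFirstOutput();
```

With three tokens at 100 ms each, the buffered path shows nothing until roughly 300 ms, while the streamed path delivers its first token after roughly 100 ms; first-token latency stops growing with response length.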
Impact: High (2.49x-7.47x speedup) Effort: M (5 days)
Flash Attention is implemented (ADR-092) but not fully deployed. Enable in production.
Blockers: Benchmark validation + regression testing.
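Regression testing for an attention optimization usually means checking the fast path against a reference. The sketch below does that for a single query in plain TypeScript: `naiveAttention` computes softmax(q·K^T / sqrt(d))·V directly, and `blockwiseAttention` uses the online-softmax recurrence that FlashAttention is built on (running max, running normalizer, rescaled accumulator, processed one key/value block at a time). Neither function is the ADR-092 implementation; the inputs, block sizes, and tolerance are illustrative assumptions.

```typescript
type Vec = number[];

const dot = (a: Vec, b: Vec): number => a.reduce((sum, x, i) => sum + x * b[i], 0);

// Reference: softmax(q . K^T / sqrt(d)) . V for one query vector.
function naiveAttention(q: Vec, K: Vec[], V: Vec[]): Vec {
  const scale = 1 / Math.sqrt(q.length);
  const scores = K.map((k) => dot(q, k) * scale);
  const max = Math.max(...scores);
  const weights = scores.map((s) => Math.exp(s - max));
  const total = weights.reduce((a, b) => a + b, 0);
  return V[0].map((_, j) => weights.reduce((acc, w, i) => acc + w * V[i][j], 0) / total);
}

// Online softmax over key/value blocks: keep a running max, a running
// normalizer and an unnormalized output, rescaling both when the max grows.
function blockwiseAttention(q: Vec, K: Vec[], V: Vec[], blockSize: number): Vec {
  const scale = 1 / Math.sqrt(q.length);
  let runningMax = -Infinity;
  let normalizer = 0;
  let out: Vec = new Array(V[0].length).fill(0);
  for (let start = 0; start < K.length; start += blockSize) {
    const end = Math.min(start + blockSize, K.length);
    const scores: number[] = [];
    for (let i = start; i < end; i++) scores.push(dot(q, K[i]) * scale);
    const newMax = Math.max(runningMax, ...scores);
    const correction = Math.exp(runningMax - newMax); // 0 on the first block
    normalizer *= correction;
    out = out.map((x) => x * correction);
    scores.forEach((s, offset) => {
      const p = Math.exp(s - newMax);
      normalizer += p;
      const v = V[start + offset];
      for (let j = 0; j < out.length; j++) out[j] += p * v[j];
    });
    runningMax = newMax;
  }
  return out.map((x) => x / normalizer);
}

const maxAbsDiff = (a: Vec, b: Vec): number => Math.max(...a.map((x, i) => Math.abs(x - b[i])));

// Small illustrative inputs: one query, five keys/values.
const q: Vec = [0.5, -1, 2];
const K: Vec[] = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [-1, 2, 0.5]];
const V: Vec[] = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]];

const diff = maxAbsDiff(naiveAttention(q, K, V), blockwiseAttention(q, K, V, 2));
console.log(diff < 1e-9 ? "blockwise output matches reference" : `mismatch: ${diff}`);
```

A production regression suite would run the same comparison over many random shapes and block sizes, with a tolerance suited to the kernel's precision.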
Impact: Medium (production confidence) Effort: M (3-5 days)
Byzantine consensus is wired and tested at the unit level but has not been validated under production load (100+ messages/sec, 9+ agents).
Action: Build a load-test suite and publish a performance report.
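The performance report could be built on two pieces of arithmetic, assuming the consensus follows the standard BFT bound n ≥ 3f + 1: how many faulty agents a swarm tolerates (9 agents tolerate at most 2), and nearest-rank latency percentiles. `maxByzantineFaults`, `percentile`, `buildReport`, and the report fields are hypothetical, and the latencies in the example are placeholders, not measurements.

```typescript
// Largest number of Byzantine agents tolerated with n agents, from the
// standard BFT bound n >= 3f + 1 (assumed to apply to Ruflo's protocol).
function maxByzantineFaults(n: number): number {
  return Math.floor((n - 1) / 3);
}

// Nearest-rank percentile: the smallest sample with at least p% of samples <= it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

interface LoadReport {
  agents: number;
  toleratedFaults: number;
  messagesPerSec: number;
  p50: number;
  p95: number;
  p99: number;
}

function buildReport(
  agents: number,
  messages: number,
  durationSec: number,
  latenciesMs: readonly number[],
): LoadReport {
  return {
    agents,
    toleratedFaults: maxByzantineFaults(agents),
    messagesPerSec: messages / durationSec,
    p50: percentile(latenciesMs, 50),
    p95: percentile(latenciesMs, 95),
    p99: percentile(latenciesMs, 99),
  };
}

// Placeholder latencies (1..100 ms) that only exercise the report code;
// a real run would record one sample per consensus decision.
const placeholderLatencies = Array.from({ length: 100 }, (_, i) => i + 1);
const report = buildReport(9, 6000, 60, placeholderLatencies);
console.log(
  `${report.agents} agents tolerate ${report.toleratedFaults} faulty; ${report.messagesPerSec} msg/s; p99 ${report.p99} ms`,
); // 9 agents tolerate 2 faulty; 100 msg/s; p99 99 ms
```

Reporting tail percentiles rather than a mean matters here, since consensus stalls under load tend to show up first at p99.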
Impact: Low (user perception) Effort: S (1 day)
CLI output still leaks `claude-flow` strings; the Ruflo rebranding is incomplete.
Action: Grep for `claude-flow` in CLI output and replace it with `ruflo`.
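A minimal sketch of the grep-and-replace step as a reusable check, so the same function can scan source files or captured CLI output and later guard against regressions in a test. `findLeaks`, `rebrand`, and the sample strings are made up for illustration.

```typescript
// Hypothetical helpers; they work on in-memory text so the same check can
// run over source files or over captured CLI output.
interface Leak {
  source: string; // file path or label for the captured text
  line: number; // 1-based line number
  text: string;
}

const OLD_NAME = "claude-flow";
const NEW_NAME = "ruflo";

function findLeaks(outputs: Record<string, string>): Leak[] {
  const leaks: Leak[] = [];
  for (const [source, content] of Object.entries(outputs)) {
    content.split("\n").forEach((text, index) => {
      if (text.includes(OLD_NAME)) leaks.push({ source, line: index + 1, text: text.trim() });
    });
  }
  return leaks;
}

// Plain global replacement of every occurrence.
function rebrand(content: string): string {
  return content.split(OLD_NAME).join(NEW_NAME);
}

// Made-up sample text, not real Ruflo output.
const captured: Record<string, string> = {
  "sample/help.txt": "Usage: claude-flow <command>\nOptions:\n  --help  Show help",
  "sample/banner.txt": "Welcome to ruflo",
};

for (const leak of findLeaks(captured)) {
  console.log(`${leak.source}:${leak.line}: ${leak.text}`); // sample/help.txt:1: Usage: claude-flow <command>
}
console.log(rebrand(captured["sample/help.txt"]).split("\n")[0]); // Usage: ruflo <command>
```

A blind global replace is deliberately simple; package names, import paths, or migration notes that must keep `claude-flow` still need a manual review of the diff.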
Impact: Medium (UX) Effort: S (2 days)
Interactive wizard for swarm topology selection and configuration.
`npx ruflo@latest hive-mind wizard` should guide the user through:
- Team size
- Trust model (hierarchical vs. mesh)
- Consensus strategy
- Network topology
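The wizard's answers could be captured in a typed object and validated before any config is written, so the wizard can re-ask instead of emitting a swarm that cannot reach consensus. The type names and option values (`majority`, `byzantine`, `star`, `ring`, and so on) are assumptions, not Ruflo's actual configuration keys; the only rule drawn from outside this roadmap is the standard BFT requirement of at least 3f + 1 agents.

```typescript
// Hypothetical answer shape for the wizard; option values are illustrative.
type TrustModel = "hierarchical" | "mesh";
type ConsensusStrategy = "majority" | "byzantine";
type NetworkTopology = "star" | "ring" | "mesh";

interface WizardAnswers {
  teamSize: number;
  trustModel: TrustModel;
  consensus: ConsensusStrategy;
  topology: NetworkTopology;
}

// Collect every problem so the wizard can re-ask the relevant questions.
function validateAnswers(a: WizardAnswers): string[] {
  const problems: string[] = [];
  if (!Number.isInteger(a.teamSize) || a.teamSize < 1) {
    problems.push("team size must be a positive integer");
  }
  // BFT needs n >= 3f + 1 agents; tolerating even one faulty agent takes 4.
  if (a.consensus === "byzantine" && a.teamSize < 4) {
    problems.push("byzantine consensus needs at least 4 agents");
  }
  return problems;
}

const answers: WizardAnswers = { teamSize: 3, trustModel: "mesh", consensus: "byzantine", topology: "mesh" };
console.log(validateAnswers(answers).join("; ")); // byzantine consensus needs at least 4 agents
```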
Impact: Low (discoverability) Effort: M (5 days)
Web UI for browsing and installing plugins. Currently CLI-only.
Locations:
- https://plugins.ruflo.dev (discoverable)
- Embedded in daemon (http://localhost:3000/plugins)
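The embedded-daemon location could start as a JSON endpoint that a browser UI renders. In this hypothetical sketch, `routePlugins` holds the routing and search logic so it can be tested without opening a socket, and `startPluginServer` wires it to Node's `http` module on port 3000, matching the URL above. The `PluginEntry` fields and sample plugin names are invented.

```typescript
import { createServer, Server } from "node:http";

// Hypothetical catalog entry; the real plugin metadata fields are an assumption.
interface PluginEntry {
  name: string;
  description: string;
  installed: boolean;
}

// GET /plugins?q=term returns matching entries as JSON; anything else is 404.
function routePlugins(rawUrl: string, catalog: readonly PluginEntry[]): { status: number; body: string } {
  const url = new URL(rawUrl, "http://localhost");
  if (url.pathname !== "/plugins") {
    return { status: 404, body: JSON.stringify({ error: "not found" }) };
  }
  const term = (url.searchParams.get("q") ?? "").toLowerCase();
  const matches = catalog.filter(
    (p) => p.name.toLowerCase().includes(term) || p.description.toLowerCase().includes(term),
  );
  return { status: 200, body: JSON.stringify(matches) };
}

// Serve the catalog from the daemon; not called here, a daemon would call it at startup.
export function startPluginServer(catalog: readonly PluginEntry[], port = 3000): Server {
  const server = createServer((req, res) => {
    const { status, body } = routePlugins(req.url ?? "/", catalog);
    res.writeHead(status, { "content-type": "application/json" });
    res.end(body);
  });
  server.listen(port);
  return server;
}

const catalog: PluginEntry[] = [
  { name: "example-git-hooks", description: "Run agents on git events", installed: false },
  { name: "example-metrics", description: "Export swarm metrics", installed: true },
];

console.log(routePlugins("/plugins?q=git", catalog).body);
// [{"name":"example-git-hooks","description":"Run agents on git events","installed":false}]
```

Calling `startPluginServer(catalog)` would serve `http://localhost:3000/plugins?q=git`; installing from the UI would need a separate write endpoint, which this sketch leaves out.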
| Quarter | Focus | Items |
|---|---|---|
| Q2 2026 | Bug fixes + M5 | 1, 2, 3 |
| Q2/Q3 2026 | Core features | 4, 5, 6 |
| Q3 2026 | Scaling + UX | 7, 8, 9, 10 |
Success criteria:
- Items 1-3: Done by the end of Q2 2026
- Integration test baseline: 1,999 → 2,050 (51 new tests)
- M5 benchmark: Real-model response quality measured
- Witness drift: 0 missing artifacts in scheduled runs
- Flash Attention: Live in v3.11.0
- ADR-113 skill synthesis: Requires evaluation of auto-generated skills (safety-critical)
- Streaming: Depends on Anthropic SDK update
- BFT load test: Needs custom harness (not in existing suite)
Items 7-10 are well suited to external contributors:
- Well-scoped (2-5 days)
- Self-contained
- Clear success criteria
See the Contributing guide.
- Discussions: GitHub Discussions
- Issues: Open an issue
Ruflo v3.10.1 · GitHub · Roadmap