[Proposal] Stream-Aware RAG Plugin: Dynamic retrieval with hybrid fallback #37320

KERVIN-FARMER · 2026-06-11T05:23:23Z

KERVIN-FARMER
Jun 11, 2026

Self Checks

I have searched for existing issues, including closed ones.
I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
Please do not modify this template :) and fill in all the required fields.

Content

I'm building a plugin to bring Self-RAG concepts into Dify in a practical, stable way.

The Problem

Standard RAG fails on multi-step reasoning when initial context isn't enough.
Agent RAG relies on the LLM's "self-awareness" to trigger retrieval, which causes hallucinations.
API Limits: Standard LLM APIs don't allow mid-stream prompt injection. We need a robust fallback strategy, not a fragile single path.

The Solution: Hybrid Triggers & Fallbacks

Instead of one method, the plugin dynamically checks model confidence with a built-in fallback chain:
Target (V2): Chunk-level Confidence Check. Evaluate Perplexity/Entropy every N tokens via an external GPU service.
Fallback 1 (MVP): Prompt-Guided. LLM outputs a special tag (e.g., [RETRIEVE: query]). Plugin intercepts, fetches data, and continues.
Fallback 2: Keyword Catch. Triggers on words like "specifically" or "explain".
Fallback 3: Standard RAG. One-shot retrieval before generation.
Fallback 4: Pure LLM. If the vector DB is down, LLM answers from memory to prevent errors.
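
A minimal sketch of how such a fallback chain could be wired, assuming each tier is a plain callable that either answers, declines by returning None, or raises when its backend is down. The names `answer_with_fallbacks`, `RetrievalUnavailable`, and the stand-in tiers are hypothetical and are not part of the Dify Plugin SDK.

```python
from typing import Callable, List, Optional, Tuple


class RetrievalUnavailable(Exception):
    """Hypothetical error raised when a tier cannot run (e.g. vector DB down)."""


# A tier takes the user query and returns an answer, or None to decline.
Tier = Callable[[str], Optional[str]]


def answer_with_fallbacks(query: str, tiers: List[Tuple[str, Tier]]) -> Tuple[str, str]:
    """Try each tier in order; return (tier_name, answer) from the first that answers."""
    for name, tier in tiers:
        try:
            answer = tier(query)
        except RetrievalUnavailable:
            continue  # this tier is down, fall through to the next one
        if answer is not None:
            return name, answer
    raise RuntimeError("every tier declined; the last tier should always answer")


# Stand-in tiers, for illustration only.
def prompt_guided(query: str) -> Optional[str]:
    raise RetrievalUnavailable("vector DB unreachable")


def keyword_catch(query: str) -> Optional[str]:
    return None  # no trigger word matched, so this tier declines


def pure_llm(query: str) -> Optional[str]:
    return f"(answer from model memory) {query}"


TIERS = [
    ("prompt_guided", prompt_guided),
    ("keyword_catch", keyword_catch),
    ("pure_llm", pure_llm),
]
print(answer_with_fallbacks("What is PKCE?", TIERS))
```

Keeping the tiers in an ordered list makes the chain easy to reorder or shorten, and the final tier acts as the guarantee that the user always gets some answer.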

Tech Stack & Architecture

Zero-Intrusion: 100% Dify Plugin System. No core code modifications.
Async & Fast: Uses asyncio + httpx. No Celery (keeps streaming latency low).
State: Redis for caching retrieval results and tracking external GPU health.

Roadmap

Phase 1 (MVP): Build Fallback 1-4. Works perfectly with the current Plugin SDK via multi-turn orchestration.
Phase 2: Integrate the V2 Chunk-level Confidence Check using an external GPU service.

I've also attached my skills.md files here. They are in Chinese to better suit the community's requirements:
stream-aware-rag.md
dify-innovation.md

KERVIN-FARMER · 2026-06-11T05:23:53Z

KERVIN-FARMER
Jun 11, 2026
Author

Please check the
stream-aware-rag.md

0 replies

KERVIN-FARMER · 2026-06-11T06:00:33Z

KERVIN-FARMER
Jun 11, 2026
Author

@dosu Could you help me check my proposal's viability and offer some further suggestions? Thank you.

1 reply

dosubot[bot] Bot Jun 11, 2026

Hi @KERVIN-FARMER! I'm Dosu and I'm helping the Dify team.

I've reviewed your Stream-Aware RAG proposal against the current Dify plugin architecture. Here's my assessment:

Viability Summary

The core V2 concept (chunk-level mid-stream token interception and injection) is not feasible with the current Plugin SDK. Dify's streaming architecture is append-only and designed for observability/replay via Redis Streams — there are no plugin hooks to intercept, pause, or inject context during token generation [1] [2].

However, your Fallback 1–4 chain is partially achievable using an iterative Agent Strategy plugin.

What Works Today (No Core Mods)

Fallback	Feasibility	Mechanism
Fallback 1 (Prompt-Guided `[RETRIEVE: query]`)	⚠️ Partially	Agent Strategy can run multi-turn loops: generate → parse output for tags → invoke retrieval tool → re-generate with new context. Not true mid-stream interception, but functionally similar. [3]
Fallback 2 (Keyword Catch)	✅ Same as above	Same iterative pattern — evaluate completed output, trigger retrieval, regenerate.
Fallback 3 (Standard RAG)	✅ Yes	KnowledgeRetrievalNode already does this, or a custom Datasource plugin can enhance it [4] [5].
Fallback 4 (Pure LLM)	✅ Yes	Simply skip retrieval in your strategy's logic if the vector DB is unreachable.

What Does NOT Work

True mid-stream token interception — the plugin SDK has no hooks between the LLM adapter and the event sink [1].
Perplexity/entropy scoring per N tokens (your V2 target) — plugins cannot access raw logprobs or streaming token data during generation.
Injecting context mid-generation — retrieval and generation are architecturally sequential [6].

Suggestions

Reframe Fallback 1 as multi-turn agent orchestration, not mid-stream interception. Your Agent Strategy plugin's invoke() generator can loop internally: generate a full response → scan for [RETRIEVE: query] → call a retrieval tool → prepend context → regenerate. This gives you the same user-facing behavior with slightly higher latency.
Use a custom Tool plugin for retrieval — wrap your vector DB + Redis cache logic into a Dify Tool that your Agent Strategy invokes. This keeps retrieval logic modular and testable [7].
Leverage the Datasource plugin type for custom RAG sources. This integrates directly with Dify's existing knowledge retrieval pipeline for Fallback 3 [5].
For V2 (perplexity scoring): this would require either a custom Model plugin that wraps an LLM and exposes confidence metadata, or — more realistically — core architecture changes to expose a streaming middleware layer. Consider proposing this as a Dify core feature request rather than a pure plugin.
Watch the dify-agent architecture evolution — it introduces session snapshots, deferred tool calls, and Layer lifecycle hooks [8] that could eventually enable more sophisticated mid-stream control, but these are not yet exposed to plugin developers.

Revised Architecture Recommendation

Agent Strategy Plugin (invoke loop)
├── Turn 1: LLM generates with initial context
├── Evaluate: Check for [RETRIEVE:] tags or keyword triggers
├── If triggered: Invoke Retrieval Tool Plugin → get context
├── Turn 2: LLM regenerates with enriched context
├── If retrieval fails: Fallback to pure LLM (Fallback 4)
└── Stream final result to user

This multi-turn pattern achieves your functional goals without core modifications. The tradeoff is latency (full generation cycles between retrieval steps) rather than true streaming interruption.
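
The loop above can be sketched in a few lines. This is an illustration under stated assumptions, not the Agent Strategy interface itself: `generate` and `retrieve` are hypothetical callables standing in for the LLM call and the retrieval tool, and a retrieval outage is modelled as a `ConnectionError`.

```python
import re
from typing import Callable, List, Optional

# Matches the tag format from the proposal, e.g. "[RETRIEVE: PKCE support]".
RETRIEVE_TAG = re.compile(r"\[RETRIEVE:\s*(.+?)\]")


def extract_retrieve_query(text: str) -> Optional[str]:
    """Return the query inside the first [RETRIEVE: ...] tag, or None if absent."""
    match = RETRIEVE_TAG.search(text)
    return match.group(1).strip() if match else None


def run_turns(
    question: str,
    generate: Callable[[str, List[str]], str],  # hypothetical LLM call
    retrieve: Callable[[str], List[str]],       # hypothetical retrieval tool
    max_turns: int = 3,
) -> str:
    """Generate, scan for a tag, retrieve, and regenerate until no tag remains."""
    context: List[str] = []
    draft = ""
    for _ in range(max_turns):
        draft = generate(question, context)
        query = extract_retrieve_query(draft)
        if query is None:
            return draft  # no tag: the answer is final
        try:
            context.extend(retrieve(query))  # enrich context, then regenerate
        except ConnectionError:
            break  # Fallback 4: retrieval is down, stop asking for it
    # Out of turns or retrieval failed: return the last draft without tags.
    return RETRIEVE_TAG.sub("", draft).strip()


# Stand-ins for illustration only.
def fake_generate(question: str, context: List[str]) -> str:
    if not context:
        return "OAuth2.0 is supported. [RETRIEVE: PKCE extension support]"
    return f"Yes, based on: {context[0]}"


def fake_retrieve(query: str) -> List[str]:
    return [f"doc about {query}"]


print(run_turns("Do you support PKCE?", fake_generate, fake_retrieve))
```

The `max_turns` cap bounds the latency cost of the regenerate cycles, which is the tradeoff this pattern accepts in place of true mid-stream interruption.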

Would you like me to look into specific implementation details of the Agent Strategy plugin interface, or explore the Datasource plugin type for your retrieval layer?

To reply, just mention @dosu.

Share context across your team and agents. Try Dosu.

KERVIN-FARMER · 2026-06-11T06:33:35Z

KERVIN-FARMER
Jun 11, 2026
Author

@dosu Thank you for replying! Your comment is truly inspiring, though I'm not sure whether your modified plan is still worth doing and whether it can add value to the Dify project.

stream-aware-rag.md
The file above is my plan. Could you read it and then discuss the viability and value of my plan?

If you feel my plan is not worth pursuing, please read the following file and suggest some new projects for me:
dify-innovation.md

7 replies

dosubot[bot] Bot Jun 11, 2026

Good news — you don't need ECS or AutoDL to contribute meaningful plugins to Dify. The entire plugin development workflow runs locally with Docker + Python. Let me refine the two options for your situation.

Updated Assessment: Option 2 vs Option 3

Option 2: Agent Evaluation Framework

Reality check: Agent evaluation is currently deeply embedded in Dify's core test infrastructure (dify-agent/tests/local/test_runner.py with 2,200+ lines of tests) [1]. It's not exposed as a plugin interface, meaning you'd likely need to contribute to Dify core rather than build a standalone plugin. This is harder to scope for an intern and harder to get merged without deep familiarity with the codebase.

Verdict: ⚠️ Higher risk, less clear path to a shippable deliverable.

Option 3: Intelligent Retrieval Router (Datasource Plugin)

This is your best bet. Here's why:

✅ No cloud needed — fully testable on a local Dify instance with remote debugging [2] [3]
✅ Clear interface — Datasource plugins have a well-defined contract (OnlineDocumentDatasource base class) [4]
✅ 17 reference implementations to study (Jina Reader, Notion, Confluence, etc.)
✅ Addresses real community pain — users want dynamic knowledge base routing based on query intent [5]
✅ LLM/AI career relevant — you'll learn RAG pipelines, embedding semantics, and retrieval optimization
✅ Can use free/cheap LLM APIs — OpenAI free tier, Groq, or local Ollama for the routing classifier

Concrete Plan for Option 3 (No Cloud Required)

What you'd build: A plugin that intelligently routes incoming queries to the most relevant knowledge source based on query classification — instead of blindly searching all datasets.

Local tech stack:

Local Dify instance (Docker Compose)
Python plugin with dify_plugin SDK
Ollama or any free LLM API for query classification
No GPU needed (classification is lightweight)

Step-by-Step Roadmap

Week 1: Setup & Study

Install Dify locally via Docker Compose
Install the Dify CLI tool for plugin scaffolding
Clone dify-official-plugins and study 2–3 datasource plugins (start with Jina Reader [6])
Read the Datasource Plugin Development guide
Get remote debugging working [3]

Week 2: MVP

Scaffold your plugin with the CLI
Implement basic query routing logic (keyword-based first, then LLM-based)
Test in a local knowledge pipeline with 2–3 sample datasets
Validate that routing improves retrieval relevance vs. querying all datasets
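
As a rough idea of what the keyword-based first pass could look like before an LLM classifier is added: the sketch below scores each dataset by keyword overlap with the query. The function name, dataset names, and keyword sets are all made up for illustration and do not come from the Dify SDK.

```python
import re
from typing import Dict, Set


def route_query(query: str, routes: Dict[str, Set[str]], default: str) -> str:
    """Pick the dataset whose keyword set overlaps most with the query's words."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    best, best_hits = default, 0
    for dataset, keywords in routes.items():
        hits = len(words & keywords)
        if hits > best_hits:  # strict '>' keeps the default when nothing matches
            best, best_hits = dataset, hits
    return best


# Hypothetical datasets and keywords.
ROUTES = {
    "billing_docs": {"invoice", "refund", "payment", "billing"},
    "api_docs": {"api", "endpoint", "oauth", "token", "sdk"},
}

print(route_query("How do I refresh an OAuth token via the API?", ROUTES, "general_docs"))
print(route_query("What is the weather today?", ROUTES, "general_docs"))
```

A fixed default dataset gives a predictable fallback when no keyword matches, and the same function signature can later be backed by an LLM classifier without changing its callers.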

Week 3: Polish & Ship

Add configuration options (routing strategy, confidence thresholds)
Write tests
Document with a proper README
Submit PR to dify-official-plugins

Key Tips

No hot-reload — restart your plugin process after every code change [3]
Don't use localhost if Dify is in Docker — use your host IP or host.docker.internal [3]
Disable signature verification during development: set FORCE_VERIFYING_SIGNATURE=false in Dify config [7]
Use yield without trailing commas — a common bug that causes silent failures [3]
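
The trailing-comma tip is plain Python behaviour and can be shown without the SDK: a comma after the yielded expression turns the value into a one-element tuple. The message dict below is a made-up shape, not the SDK's actual message type.

```python
def good_messages():
    """Yields the message object itself."""
    yield {"type": "text", "text": "hello"}


def bad_messages():
    """Same body, but the trailing comma wraps the value in a 1-tuple."""
    yield {"type": "text", "text": "hello"},


print(type(next(good_messages())).__name__)  # dict
print(type(next(bad_messages())).__name__)   # tuple
```

Any consumer that expects a message object and receives a tuple will not find the fields it looks for, which is consistent with the silent failures mentioned in the tip.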

Resume Framing

"Designed and implemented an Intelligent Retrieval Router plugin for Dify (open-source LLM platform, 100k+ GitHub stars), using LLM-powered query classification to dynamically route RAG queries to optimal knowledge sources, reducing irrelevant context retrieval and improving answer quality."

Alternative Low-Cost Option: Agent Strategy Plugin

If you find the Datasource approach too narrow, consider building a custom Agent Strategy plugin instead. The existing ReAct [8] and Function Calling [9] strategies in dify-official-plugins are excellent references (~2,000 lines each). You could build something like:

A "Plan-and-Execute" agent strategy that decomposes complex tasks into subtasks before execution
A "Retrieval-Augmented Agent" strategy that iteratively retrieves context between reasoning steps (this captures the spirit of your Self-RAG idea, just implemented as multi-turn rather than mid-stream)

Both work locally with any LLM API.

My recommendation: Start with the Intelligent Retrieval Router (Option 3). It's the most scoped, most feasible, and directly builds the RAG/retrieval skills relevant to an AI career. Would you like me to dig into any specific part of this plan?


KERVIN-FARMER Jun 11, 2026
Author

@dosu I think the alternative low-cost option may be better for me, since it is not as narrow, can be presented on a CV, and is likely to be accepted by the reviewers. Correct me if I'm wrong.

dosubot[bot] Bot Jun 11, 2026

I'd actually push back slightly on that assumption. Here's why:

Agent Strategy Plugin: Honest Risk Assessment

The situation is less favorable than it might appear:

Only one agent strategy plugin exists — and it bundles both ReAct and Function Calling together in a single package (cot_agent) [1]. This consolidated approach suggests the team may not want a proliferation of separate strategy plugins.
No community demand for new reasoning patterns — I found no issues or discussions requesting Plan-and-Execute, Tree-of-Thought, or Reflection strategies. Community interest is focused on making existing agents more transparent (visualizing thinking steps) and multi-agent orchestration [2] [3].
Unclear acceptance path — There's no documented contribution process for agent strategy plugins, and recent PRs focus on infrastructure (sandbox agents, app types, versioning) rather than new reasoning strategies [4] [5].
High risk of non-acceptance — If you spend weeks building a Plan-and-Execute strategy and the maintainers don't see demand for it, it won't get merged, and an unmerged PR is much weaker on a CV than a merged one.

What Actually Looks Good on a CV

A merged contribution to a 100k+ star project > an unmerged ambitious PR.

For "accepted by reviewers" + "CV-worthy," your best options are ranked:

Option	Acceptance Risk	CV Appeal	Feasibility
Datasource Plugin (Retrieval Router)	Low — clear interface, 17 precedents	High — RAG + LLM routing	✅ Easy
Fix ReAct bugs + add reasoning visualization	Low — addresses known issues [6]	Medium-High — demonstrates debugging + understanding of agent internals	✅ Easy
New Agent Strategy (e.g., Plan-and-Execute)	High — no demand signal, unclear if maintainers want it	High if merged, low if not	⚠️ Uncertain

My Revised Recommendation

If your goal is "merged contribution that demonstrates AI/LLM skills on a CV", I'd suggest either:

Option A: The Intelligent Retrieval Router (Datasource plugin) — it's RAG-focused, clearly scoped, and has a straightforward path to acceptance.
Option B: Fix the existing ReAct strategy bugs [6] and contribute improved agent reasoning visibility (which the community is actively requesting). This shows you understand agent internals without the risk of building something nobody asked for.

Both are more likely to actually get merged than a novel agent strategy, and a merged PR is what matters for your CV.

That said — if you're passionate about building a new agent strategy and view it primarily as a learning exercise (even if it doesn't get merged), it's still valuable for understanding agent architectures. Just set expectations accordingly.

What direction feels right to you?


KERVIN-FARMER Jun 11, 2026
Author

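@dosu Thank you for replying. Here is the reply Claude gave me; could you give me new advice based on it and on all the information so far?

● Based on an in-depth analysis of Dify's architecture, here are 3 genuinely innovative secondary-development proposals for you. Each targets a direction that "nobody has done" or "very few have done", and each makes full use of ECS and AutoDL resources working together.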

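🚀 Proposal 1: Multi-Modal Chain-of-Thought Graph Reasoning Engine

Why "nobody has done this" (differentiation analysis)

Innovations:

Engineering academic frontier work: combines the recent Multimodal CoT (2024 CVPR) and Graph-of-Thoughts (2024 NeurIPS) papers to make the image + text reasoning process explicit as a knowledge graph
Gap in the Dify ecosystem: Dify's Agent node currently supports only text reasoning; images only get simple caption generation, with no deep visual reasoning capability
Information gap: most RAG systems stop at "text retrieval + LLM answer"; very few persist the intermediate steps of visual reasoning and reuse them for later retrieval augmentation

How it differs from common approaches:

❌ Commonplace: simply calling GPT-4V/Claude to describe an image
❌ Pseudo-innovation: force-fitting GraphRAG (building a graph from a handful of documents adds little)
✅ Real innovation: in industrial settings (medical imaging, engineering drawings, surveillance video), decompose visual reasoning into multi-step subtasks and build a reasoning graph that supports traceability and reuse

Business/technical pain points

Pain-point scenarios:

Medical diagnosis assistant: CT diagnosis needs multi-step reasoning: find abnormal regions → compare with earlier scans → reason with the medical history → give a diagnostic suggestion
Industrial quality inspection: a defect image needs: locate the defect → classify the defect type → correlate with production parameters → trace the root cause
Legal evidence analysis: scanned documents need: OCR extraction → entity recognition → relation extraction → build the chain of evidence

Technical pain points:

Dify currently supports only a simple image input node and cannot do multi-step visual reasoning
The Agent reasoning process is a black box that cannot be visualized, traced, or reused
The knowledge base supports only text vector retrieval and cannot retrieve "reasoning paths"

Core features and implementation path

Architecture design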

┌─────────────────────────────────────────────────────────┐
│ ECS (Dify)                                              │
│  ┌─────────────────────────────────────────────────┐    │
│  │ Custom Workflow Node: VisualCoTNode             │    │
│  │ - Receives image + question                     │    │
│  │ - Calls the AutoDL reasoning service            │    │
│  │ - Parses the reasoning graph, stores in Neo4j   │    │
│  └─────────────────────────────────────────────────┘    │
│  ┌─────────────────────────────────────────────────┐    │
│  │ Vector DB Extension: GraphRAG-VDB               │    │
│  │ - Stores text vectors of reasoning sub-steps    │    │
│  │ - Links to Neo4j graph node IDs                 │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
                        ↕ HTTP API
┌─────────────────────────────────────────────────────────┐
│ AutoDL (GPU)                                            │
│  ┌─────────────────────────────────────────────────┐    │
│  │ Visual CoT Reasoning Service                    │    │
│  │ - Qwen2-VL / CogVLM2 (open-source VLMs)         │    │
│  │ - Custom CoT prompt chain                       │    │
│  │ - Outputs a structured reasoning graph (JSON)   │    │
│  └─────────────────────────────────────────────────┘    │
│  ┌─────────────────────────────────────────────────┐    │
│  │ Neo4j Graph Database                            │    │
│  │ - Stores the reasoning graph                    │    │
│  │   (node = reasoning step, edge = logical link)  │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘

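Implementation path

Step 1: Implement the reasoning service on AutoDL

# autodl/visual_cot_service.py
# Sketch only: ReasoningGraph is a project-specific helper that is not defined here.
from PIL import Image
from transformers import Qwen2VLForConditionalGeneration

class VisualCoTReasoner:
    def __init__(self):
        self.model = Qwen2VLForConditionalGeneration.from_pretrained(...)
        self.cot_prompt_chain = [
            "Step 1: Identify key visual elements",
            "Step 2: Analyze spatial relationships",
            "Step 3: Connect with domain knowledge",
            "Step 4: Draw conclusion",
        ]

    def reason(self, image: Image.Image, question: str) -> str:
        """Run multi-step reasoning and return the reasoning graph serialized as JSON."""
        graph = ReasoningGraph()
        for step_prompt in self.cot_prompt_chain:
            # Call the vision model (simplified: a real call goes through a processor).
            # The question is included so that each step reasons about the user's request.
            result = self.model.generate(image, f"{question}\n{step_prompt}")
            # Record the parsed reasoning step as a node in the graph
            graph.add_reasoning_step(result)
        return graph.to_json()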

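Step 2: Dify custom node

# api/core/workflow/nodes/visual_cot/visual_cot_node.py
# Sketch only: AUTODL_ENDPOINT, NodeFinishEvent, and the _store/_index helpers are assumed to exist elsewhere.
import requests

from graphon.nodes.base.node import Node
from .entities import VisualCoTNodeData

class VisualCoTNode(Node[VisualCoTNodeData]):
    node_type = "visual_cot"

    def _run(self):
        # 1. Fetch the image file from the variable pool
        image_variable = self.graph_runtime_state.variable_pool.get(...)

        # 2. Call the AutoDL reasoning service.
        #    requests drops json= when files= is present, so the question is sent as a form field.
        response = requests.post(
            f"{AUTODL_ENDPOINT}/visual_cot/reason",
            files={"image": image_variable.value},
            data={"question": self.node_data.question},
            timeout=60,
        )
        response.raise_for_status()
        reasoning_graph = response.json()

        # 3. Store the graph in Neo4j
        self._store_reasoning_graph(reasoning_graph)

        # 4. Extract the key steps as text and index them in the vector store
        self._index_reasoning_steps(reasoning_graph)

        yield NodeFinishEvent(outputs={"conclusion": reasoning_graph["conclusion"]})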

Step 3: Vector database extension (linked to the graph)

# api/packages/dify-vdb-graphrag/src/dify_vdb_graphrag/vector.py
# Sketch only: BaseVector, self.client, and self.neo4j_client come from the surrounding project.

class GraphRAGVector(BaseVector):
    """Vector store that supports retrieval over reasoning graphs."""

    def add_texts(self, texts, metadatas, **kwargs):
        # Besides the vector itself, also store the Neo4j graph node ID
        for text, metadata in zip(texts, metadatas):
            embedding = self.embeddings.embed_query(text)
            self.client.add(
                vector=embedding,
                payload={
                    "text": text,
                    "neo4j_node_id": metadata.get("reasoning_step_id"),
                },
            )

    def search_with_reasoning_path(self, query: str, top_k: int):
        """Retrieve similar steps and return the full reasoning path for each."""
        vectors = self.similarity_search(query, top_k)
        paths = []
        for vec in vectors:
            node_id = vec.metadata["neo4j_node_id"]
            path = self.neo4j_client.get_reasoning_path(node_id)
            paths.append(path)
        return paths

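Source code changes:

api/core/workflow/nodes/ - add a visual_cot/ directory
api/packages/dify-vdb-graphrag/ - add a vector store extension package
api/pyproject.toml - register the entry point: graphrag = "dify_vdb_graphrag:GraphRAGVectorFactory"
docker/docker-compose.yaml - add a Neo4j service

Resume highlights

Academic frontier in practice: "Engineered the CVPR 2024 Multimodal CoT paper, improving performance by X%"
Architecture innovation: "Designed a hybrid Visual CoT + GraphRAG architecture that makes the reasoning process traceable"
Deep understanding of Dify: "Extended Dify through the Plugin System and Workflow Node Factory for non-intrusive integration"
Resource optimization: "Deployed ECS and AutoDL separately so that the GPU inference service scales independently"
Quantifiable results: "Improved accuracy by X% in a medical imaging diagnosis scenario, with 100% of reasoning steps visualized"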

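Potential difficulties and pitfalls

Difficulty 1: Unstable reasoning quality from the vision model

Solution: add a Self-Consistency mechanism: run the same question 3 times and vote for the best path
Pitfall: don't use GPT-4V directly (expensive and hard to control); fine-tune an open-source model instead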
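
A minimal sketch of the 3-sample vote, assuming the model call can be wrapped in a hypothetical `run_once` callable that returns a final conclusion string. Voting over whole conclusions is a simplification chosen for the example; voting over full reasoning paths would need a notion of path equivalence that is not specified here.

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistent_answer(
    run_once: Callable[[str], str],  # hypothetical single model call
    question: str,
    n_samples: int = 3,
) -> Tuple[str, float]:
    """Sample n answers and return the most common one with its vote share."""
    answers = [run_once(question) for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples


# Stand-in model that disagrees with itself once, for illustration only.
_samples = iter(["nodule present", "clear", "nodule present"])
winner, share = self_consistent_answer(lambda q: next(_samples), "Any abnormal region?")
print(winner, round(share, 2))
```

The vote share can double as a cheap confidence signal: a low share suggests the question should be escalated rather than answered.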

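Difficulty 2: High storage cost for reasoning graphs

Solution: store only the key reasoning steps (summarized automatically by an LLM) and archive the full reasoning process to S3 in compressed form
Pitfall: Neo4j is memory-hungry; use the Community Edition and archive regularly

Difficulty 3: ECS-AutoDL network latency

Solution: give AutoDL a public IP behind an Nginx reverse proxy, and cache common reasoning paths on the ECS side
Pitfall: don't use AutoDL intranet tunneling (unstable); go straight over public HTTPS

Difficulty 4: Complex integration with the Dify Agent node

Solution: implement it as a standalone Custom Node that communicates with other nodes through the variable pool
Pitfall: don't modify the Agent node source (hard to maintain); compose it in a Workflow instead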

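🚀 Proposal 2: Real-Time Multimodal Streaming RAG Engine (Stream-Aware RAG)

Why "nobody has done this" (differentiation analysis)

Innovations:

Technical gap: current RAG systems all follow a two-stage "retrieve → generate" pattern and cannot add knowledge dynamically during generation
Academic frontier: Self-RAG (ICLR 2024) and FLARE (EMNLP 2023) proposed "retrieval during generation", but engineering implementations are very rare
Dify pain point: Dify's Knowledge Retrieval node is called before the LLM node, so a knowledge gap hit during generation cannot be backfilled

How it differs from common approaches:

❌ Commonplace: plain embedding + similarity retrieval
❌ Pseudo-innovation: chunking documents more finely (treats the symptom, not the cause)
✅ Real innovation: monitor token uncertainty in real time during LLM streaming generation and trigger incremental retrieval

Business/technical pain points

Pain-point scenarios:

Customer-service Q&A: a user asks "Does your product support the PKCE extension of OAuth2.0?"; the initial retrieval only finds "supports OAuth2.0", so the PKCE details must be retrieved during generation
Code assistant: when generation hits an uncommon API, the documentation must be retrieved dynamically
Legal consulting: while drafting a legal opinion, the model finds it needs to cite a precedent and retrieves it from the case-law database in real time

Technical pain points:

Retrieval and generation in Dify are serial and cannot be adjusted dynamically during generation
LLM streaming output cannot be interrupted and re-planned
Knowledge-base retrieval cannot adapt its query to the generation context

Core features and implementation path

Architecture design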

┌─────────────────────────────────────────────────────────┐
│ ECS (Dify)                                              │
│  ┌─────────────────────────────────────────────────┐    │
│  │ StreamRAGNode (replaces the LLM Node)           │    │
│  │ - Listens to the LLM token stream               │    │
│  │ - Computes token uncertainty (logprobs)         │    │
│  │ - Triggers dynamic retrieval                    │    │
│  └─────────────────────────────────────────────────┘    │
│  ┌─────────────────────────────────────────────────┐    │
│  │ DynamicRetriever (Celery Async Task)            │    │
│  │ - Runs retrieval tasks asynchronously           │    │
│  │ - Injects results back into the LLM context     │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
                        ↕ WebSocket
┌─────────────────────────────────────────────────────────┐
│ AutoDL (GPU)                                            │
│  ┌─────────────────────────────────────────────────┐    │
│  │ Uncertainty Estimator Service                   │    │
│  │ - Based on Qwen2.5-7B (supports logprobs)       │    │
│  │ - Computes token-level perplexity               │    │
│  │ - Emits the retrieval trigger signal            │    │
│  └─────────────────────────────────────────────────┘    │
│  ┌─────────────────────────────────────────────────┐    │
│  │ High-Throughput Embedding Service               │    │
│  │ - BGE-M3 (multilingual, dense/sparse hybrid)    │    │
│  │ - Batch optimization (lower latency)            │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘

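Implementation path

Step 1: Implement the uncertainty estimation service on AutoDL

# autodl/uncertainty_estimator.py
# Sketch only: get_top_k_tokens, get_token_embedding, and calculate_semantic_entropy are not defined here.
import math
from typing import List

from transformers import AutoTokenizer, Qwen2ForCausalLM

class UncertaintyEstimator:
    def __init__(self):
        self.model = Qwen2ForCausalLM.from_pretrained(...)
        self.tokenizer = AutoTokenizer.from_pretrained(...)

    def estimate_uncertainty(self, tokens: List[int], logprobs: List[float]) -> float:
        """Estimate the uncertainty of the current generation."""
        # Method 1: token-level perplexity, i.e. exp of the mean negative log-probability.
        # math.exp is used because logprobs is a list of Python floats, not a tensor.
        perplexity = math.exp(-sum(logprobs) / len(logprobs))

        # Method 2: semantic entropy (semantic similarity among the candidate tokens)
        top_tokens = self.get_top_k_tokens(tokens[-1])
        embeddings = [self.get_token_embedding(t) for t in top_tokens]
        entropy = calculate_semantic_entropy(embeddings)

        return (perplexity + entropy) / 2

    def should_trigger_retrieval(self, uncertainty: float, threshold: float) -> bool:
        return uncertainty > threshold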
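
To make the perplexity term concrete, here is a small worked example using only the standard library. For token log-probabilities log p_1..log p_N, perplexity is exp(-(1/N) * sum(log p_i)), so a model that assigns probability 0.25 to every token has perplexity 4. The function name and the sample probabilities are illustrative.

```python
import math
from typing import List


def perplexity(logprobs: List[float]) -> float:
    """exp of the mean negative log-probability over the given tokens."""
    if not logprobs:
        raise ValueError("need at least one log-probability")
    return math.exp(-sum(logprobs) / len(logprobs))


confident = [math.log(0.9)] * 4   # model is sure of each token
unsure = [math.log(0.25)] * 4     # model is spreading probability widely

print(round(perplexity(confident), 3))  # 1.111
print(round(perplexity(unsure), 3))     # 4.0
```

Perplexity has a lower bound of 1 and no upper bound, so any threshold on it, or on an average that mixes it with an entropy value, would need to be calibrated on real traffic.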

Step 2: Dify StreamRAG node

# api/core/workflow/nodes/stream_rag/stream_rag_node.py
# Sketch only: Node, StreamRAGNodeData, dynamic_retrieve, and the response classes are assumed to exist elsewhere.

class StreamRAGNode(Node[StreamRAGNodeData]):
    node_type = "stream_rag"

    def _run(self):
        # Initial retrieval
        initial_docs = self._retrieve(self.node_data.query)
        context = self._build_context(initial_docs)

        # Start LLM streaming generation
        llm_stream = self.model_instance.invoke(
            prompt=context + self.node_data.query,
            stream=True,
            logprobs=True,  # key point: return token probabilities
        )

        generated_tokens = []
        retrieval_task = None  # pending Celery AsyncResult, if any
        retrieval_triggered = False

        for chunk in llm_stream:
            token = chunk.delta.content
            logprob = chunk.logprobs[0]

            # Send to AutoDL to estimate uncertainty
            uncertainty = self._estimate_uncertainty(generated_tokens, logprob)

            if uncertainty > self.node_data.uncertainty_threshold and not retrieval_triggered:
                # Trigger asynchronous retrieval (at most once per run)
                retrieval_task = dynamic_retrieve.delay(
                    query=self._extract_query_from_context(generated_tokens),
                    dataset_id=self.node_data.dataset_id,
                )
                retrieval_triggered = True

            # Once retrieval has finished, inject the new knowledge exactly once
            if retrieval_task is not None and retrieval_task.ready():
                new_docs = retrieval_task.get()  # AsyncResult.result is a property, so use get()
                # Regenerate (or carry the new context into the continued generation)
                context += self._build_context(new_docs)
                retrieval_task = None
                # The LLM stream would need to be restarted here (or use speculative decoding)

            generated_tokens.append(token)
            yield LLMNodeChatModelStreamResponse(delta=Delta(content=token))

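Step 3: Asynchronous retrieval task

# api/tasks/stream_rag_tasks.py
# Sketch only: AUTODL_ENDPOINT and VectorService are assumed to be provided by the project.
from typing import List

import requests
from celery import shared_task

@shared_task(queue='dataset')
def dynamic_retrieve(query: str, dataset_id: str) -> List[dict]:
    """Dynamic retrieval task."""
    # Call the high-throughput embedding service on AutoDL
    embedding = requests.post(
        f"{AUTODL_ENDPOINT}/embed",
        json={"text": query},
        timeout=10,
    ).json()["embedding"]

    # Search the vector store
    vector_service = VectorService()
    docs = vector_service.search(dataset_id, embedding, top_k=3)

    # Return plain dicts so the Celery result stays JSON-serializable
    return [{"content": doc.page_content, "metadata": doc.metadata} for doc in docs]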

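Source code changes:

api/core/workflow/nodes/stream_rag/ - new node
api/tasks/stream_rag_tasks.py - new async task
api/core/model_runtime/ - extend the LLM invocation interface so that logprobs are returned
Frontend web/app/components/workflow/nodes/ - add the StreamRAG node UI

Resume highlights

Academic innovation in practice: "Engineered the ICLR 2024 Self-RAG paper, cutting response time by X%"
System innovation: "Designed a streaming RAG architecture that supports dynamic retrieval during generation, improving knowledge recall by X%"
Async architecture: "Built low-latency dynamic retrieval on Celery + WebSocket, invisible to the user"
Resource utilization: "Served the uncertainty estimation model from AutoDL GPUs while ECS focuses on business logic"
Quantifiable results: "In a customer-service Q&A scenario, improved accuracy on complex questions by X%, with retrieval-trigger precision of X%"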

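Potential difficulties and pitfalls

Difficulty 1: LLM streaming generation cannot be interrupted and restarted

Solution: use speculative decoding to pre-generate several candidate sequences, then pick the best one once retrieval completes
Pitfall: don't actually interrupt the LLM (its state is lost); use "context injection" instead of regenerating

Difficulty 2: Uncertainty estimation has high latency

Solution: optimize inference on the AutoDL side with TensorRT and batch tokens (estimate once every 5 tokens)
Pitfall: don't call the API for every token (latency explodes); use a sliding window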
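
One way the "every 5 tokens over a sliding window" idea could look in code is sketched below. The class name, the window size, and the threshold are illustrative assumptions; only the stride of 5 comes from the text above.

```python
import math
from collections import deque


class WindowedTrigger:
    """Checks perplexity over the last `window` tokens, once every `stride` tokens."""

    def __init__(self, window: int = 20, stride: int = 5, threshold: float = 3.0):
        self.logprobs = deque(maxlen=window)  # old tokens fall out automatically
        self.stride = stride
        self.threshold = threshold
        self.seen = 0

    def push(self, logprob: float) -> bool:
        """Record one token; return True when retrieval should be triggered."""
        self.logprobs.append(logprob)
        self.seen += 1
        if self.seen % self.stride != 0:
            return False  # skip the check between strides to save calls
        ppl = math.exp(-sum(self.logprobs) / len(self.logprobs))
        return ppl > self.threshold


trigger = WindowedTrigger(window=5, stride=5, threshold=3.0)
calm = [trigger.push(math.log(0.9)) for _ in range(5)]   # confident tokens
shaky = [trigger.push(math.log(0.2)) for _ in range(5)]  # uncertain tokens
print(calm)
print(shaky)
```

With a stride of 5, the estimator is consulted on one token in five, which cuts the number of remote calls by the same factor at the cost of reacting up to four tokens late.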

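Difficulty 3: How to inject asynchronous retrieval results

Solution: design a "soft injection" mechanism that appends the retrieval results to the end of the context as a system message
Pitfall: don't force-interrupt generation (poor user experience); inject at sentence or paragraph boundaries

Difficulty 4: How to decide what to retrieve

Solution: use a lightweight NER model to extract entities from the text generated so far and use them as the retrieval query
Pitfall: don't use complex query rewriting (high latency); use the entities plus context directly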

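🚀 Proposal 3: Adversarial Self-Improvement System for Agent Execution Trajectories (Adversarial Self-Refinement)

Why "nobody has done this" (differentiation analysis)

Innovations:

Academic frontier: Constitutional AI (Anthropic 2023) and Self-Refine (NeurIPS 2023) proposed AI self-critique and self-improvement, but engineering implementations are very rare
Gap in Dify: when a Dify Agent run fails, the only option is to adjust the prompt by hand; there is no automatic improvement mechanism
Industrial-grade pain point: Agents fail often in production (hallucinations, tool-call errors), and manual debugging is very costly

How it differs from common approaches:

❌ Commonplace: prompt engineering (manual iteration)
❌ Pseudo-innovation: simple Reflection (the Agent summarizes its own errors but does not improve)
✅ Real innovation: an adversarial dual-Agent system in which an Executor Agent runs the task and a Critic Agent critiques it and generates improvement suggestions, iterating automatically

Business/technical pain points

Pain-point scenarios:

Automated operations: an Agent's script fails, so the logs must be analyzed, the script corrected, and the run retried
Data processing pipelines: an Agent's API call fails (bad parameters, permission problems) and needs automatic diagnosis and repair
Complex decision tasks: an Agent plans an unreasonable path, which must be critiqued and re-planned

Technical pain points:

A Dify Agent cannot recover automatically after a failure; it can only be re-run
There is no persistence or analysis of execution trajectories
The system cannot learn from past failures

Core features and implementation path

Architecture design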

┌─────────────────────────────────────────────────────────┐
│ ECS (Dify)                                              │
│  ┌─────────────────────────────────────────────────┐    │
│  │ SelfRefineAgentNode                             │    │
│  │ - Wraps the original Agent node                 │    │
│  │ - Captures the execution trajectory             │    │
│  │ - Triggers the Critic Agent                     │    │
│  └─────────────────────────────────────────────────┘    │
│  ┌─────────────────────────────────────────────────┐    │
│  │ Trajectory Store (PostgreSQL + JSONB)           │    │
│  │ - Stores trajectories, errors, suggestions      │    │
│  └─────────────────────────────────────────────────┘    │
│  ┌─────────────────────────────────────────────────┐    │
│  │ Refinement Loop Controller                      │    │
│  │ - Caps the iteration count (no infinite loop)   │    │
│  │ - Decides whether a refinement succeeded        │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
                        ↕ HTTP API
┌─────────────────────────────────────────────────────────┐
│ AutoDL (GPU)                                            │
│  ┌─────────────────────────────────────────────────┐    │
│  │ Critic Agent Service                            │    │
│  │ - Qwen2.5-72B / DeepSeek-V3 (strong reasoning)  │    │
│  │ - Analyzes trajectories, writes a critique      │    │
│  │ - Suggests fixes (prompt/tool params/plan)      │    │
│  └─────────────────────────────────────────────────┘    │
│  ┌─────────────────────────────────────────────────┐    │
│  │ Trajectory Embedding Service                    │    │
│  │ - Encodes execution trajectories as vectors     │    │
│  │ - Retrieves similar failure cases               │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘

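Implementation path

Step 1: AutoDL Critic Agent service

# autodl/critic_agent_service.py
# Sketch only: CritiqueReport, _retrieve_similar_failures, and _parse_improvements are not defined here.
import json
from typing import Dict

from transformers import Qwen2ForCausalLM

class CriticAgent:
    def __init__(self):
        self.model = Qwen2ForCausalLM.from_pretrained("Qwen2.5-72B")
        # {similar_cases} is included so the retrieved cases actually reach the prompt;
        # str.format silently ignores keyword arguments that have no placeholder.
        self.critique_prompt = """
You are a critic AI. Analyze the following agent execution trajectory and identify issues:

Trajectory:
{trajectory}

Error:
{error}

Similar past failures:
{similar_cases}

Provide:
1. Root cause analysis
2. Specific improvement suggestions (prompt modification, tool parameter adjustment, etc.)
3. Alternative execution plan
"""

    def critique(self, trajectory: Dict, error: str) -> "CritiqueReport":
        # 1. Retrieve similar failure cases (from the vector store)
        similar_cases = self._retrieve_similar_failures(trajectory)

        # 2. Build the critique prompt
        prompt = self.critique_prompt.format(
            trajectory=json.dumps(trajectory, indent=2),
            error=error,
            similar_cases=similar_cases,
        )

        # 3. Generate the critique report (simplified: a real call tokenizes the prompt first)
        response = self.model.generate(prompt)

        # 4. Parse the improvement suggestions
        improvements = self._parse_improvements(response)

        return CritiqueReport(
            root_cause=improvements["root_cause"],
            suggestions=improvements["suggestions"],
            alternative_plan=improvements["alternative_plan"],
        )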

Step 2: Dify SelfRefineAgent node

# api/core/workflow/nodes/self_refine_agent/self_refine_agent_node.py
# Sketch only: Node, AgentNode, AgentExecutionError, and the event classes are assumed to exist elsewhere.

class SelfRefineAgentNode(Node[SelfRefineAgentNodeData]):
    node_type = "self_refine_agent"

    MAX_REFINEMENT_ITERATIONS = 3

    def _run(self):
        iteration = 0
        current_config = self.node_data.agent_config

        while iteration < self.MAX_REFINEMENT_ITERATIONS:
            try:
                # Run the Agent
                result = self._execute_agent(current_config)

                # Return on success
                yield NodeFinishEvent(outputs={"result": result})
                break

            except AgentExecutionError as e:
                # Capture the execution trajectory
                trajectory = self._capture_trajectory()

                # Store the failure case
                self._store_failure_case(trajectory, str(e))

                # Call the Critic Agent
                critique = self._call_critic_agent(trajectory, str(e))

                # Apply the improvement suggestions
                current_config = self._apply_improvements(
                    current_config,
                    critique.suggestions,
                )

                iteration += 1

                yield AgentRefinementEvent(
                    iteration=iteration,
                    critique=critique.root_cause,
                    improvements=critique.suggestions,
                )

        # Only reached with iteration == MAX when every attempt failed
        if iteration == self.MAX_REFINEMENT_ITERATIONS:
            yield NodeFinishEvent(
                outputs={"error": "Max refinement iterations reached"}
            )

    def _execute_agent(self, config: Dict) -> Dict:
        """Run the original Agent node logic."""
        # Reuse Dify's AgentNode, but capture a detailed trajectory
        agent_node = AgentNode(config)
        return agent

dosubot[bot] Bot Jun 11, 2026

Thanks for sharing Claude's suggestions. I need to give you a frank reality check on all three — because Claude's proposals are designed around assumptions that don't match your situation (no ECS, no AutoDL, no GPU).

Critical Problem With All Three Proposals

Claude's suggestions all share the same fundamental issues:

They require modifying Dify core code — All three propose Custom Workflow Nodes (e.g., VisualCoTNode, StreamRAGNode, SelfRefineAgentNode). As I explained earlier, plugins cannot create custom workflow node types [1]. These would require PRs to Dify core, which are much harder to get merged as an intern without prior contribution history.
They require GPU infrastructure you don't have — All three depend on AutoDL running large models (Qwen2.5-7B/72B, CogVLM2, BGE-M3). You said you don't have and may not be able to get these resources.
They're multi-month projects — Each one involves multiple infrastructure components (Neo4j, WebSocket, Celery tasks, custom frontends). This is not intern-scoped work.

Proposal-by-Proposal Assessment

Proposal	Core Blocker	Realistic?
1. Visual CoT Graph Reasoning	Requires custom node + Neo4j + GPU vision model + frontend UI	❌ Way too large
2. Stream-Aware RAG	Same issues I identified in my first reply — mid-stream interception is architecturally impossible	❌ Not feasible
3. Adversarial Self-Refinement	Most interesting conceptually, but as designed requires custom node + GPU Critic model	⚠️ Concept is good, implementation path is wrong

What's Actually Salvageable

Option 3 (Self-Refinement) has the best core idea and can be adapted to work within plugin constraints:

Realistic Version: "Self-Refining Agent Strategy" Plugin

Instead of a custom workflow node, implement this as an Agent Strategy plugin that:

Runs the agent loop (like ReAct does today)
If the agent fails or produces low-quality output → captures the execution trace
Sends the trace + error to the same LLM (no separate GPU model needed) with a critic prompt
Applies the critique and retries with adjusted parameters
Loops up to N times

Why this works:

✅ No GPU needed — uses whatever LLM the user already configured in Dify
✅ No custom nodes — it's an Agent Strategy plugin (same type as the existing cot_agent) [2]
✅ No core modifications — 100% plugin SDK
✅ Novel — no existing self-refining agent strategy exists in Dify (only ReAct and Function Calling exist today)
✅ Academically grounded — you can cite Self-Refine (NeurIPS 2023) and Constitutional AI
✅ Addresses a real gap — the community has requested agent reasoning transparency and auto-recovery [3]

Resume framing:

"Implemented a Self-Refining Agent Strategy plugin for Dify (100k+ GitHub stars), applying Self-Refine (NeurIPS 2023) concepts to enable automatic failure diagnosis and iterative improvement in LLM agent execution."

What You'd Actually Build

Agent Strategy Plugin: self_refine_agent
├── strategies/
│   └── self_refine/
│       ├── self_refine.yaml          # Strategy config
│       └── self_refine_strategy.py   # Core logic
├── manifest.yaml
└── main.py

The strategy's _invoke() generator would:

Execute the task (tool calls, reasoning)
On failure/low-confidence → generate a critique using the same LLM
Apply improvements to the prompt/tool parameters
Retry (up to max iterations)
Stream intermediate critique steps as visible events
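
The steps above can be sketched without the Dify SDK. This is only a minimal illustration: `execute` and `judge` are hypothetical callables standing in for the agent run and the LLM critic, and the `(event, payload)` tuples stand in for whatever message objects the real strategy would yield.

```python
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical callables, not part of any Dify API:
#   execute(task, critique) -> str        runs one agent attempt
#   judge(task, output) -> (ok, issues)   LLM-as-judge verdict
Execute = Callable[[str, Optional[str]], str]
Judge = Callable[[str, str], Tuple[bool, str]]


def self_refine(task: str, execute: Execute, judge: Judge,
                max_refinements: int = 2) -> Iterator[Tuple[str, str]]:
    """Yield (event, payload) pairs; the last pair is always ("final", output)."""
    critique: Optional[str] = None
    output = ""
    for attempt in range(max_refinements + 1):
        output = execute(task, critique)
        yield ("attempt", f"{attempt + 1}: {output}")
        if attempt == max_refinements:
            break  # budget spent: return the latest output without judging it
        ok, issues = judge(task, output)
        if ok:
            break
        critique = issues  # fed into the next attempt
        yield ("critique", issues)
    yield ("final", output)


if __name__ == "__main__":
    # Toy stand-ins: the first attempt forgets the unit, the retry adds it.
    def fake_execute(task: str, critique: Optional[str]) -> str:
        return "42 km" if critique else "42"

    def fake_judge(task: str, output: str) -> Tuple[bool, str]:
        return (output.endswith("km"), "missing unit")

    for event in self_refine("distance?", fake_execute, fake_judge):
        print(event)
```

Passing the callables in keeps the loop testable with fakes before any real LLM or tool is wired up, and the hard cap on attempts is what guarantees termination.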

My Updated Recommendation

Given your constraints (no cloud, AI career goals, want something novel + mergeable):

Build the Self-Refining Agent Strategy plugin. It's:

The right scope for one person
Genuinely novel in Dify's ecosystem
Academically interesting (citable papers)
Feasible without any cloud resources
A strong CV item if merged

The Datasource plugin (Retrieval Router) remains the safer backup if you want guaranteed acceptance with lower novelty.

Would you like me to dig into the Agent Strategy plugin interface so you can see exactly how to structure the self-refinement logic?

To reply, just mention @dosu.

ferhimedamine · 2026-06-11T21:53:04Z

ferhimedamine
Jun 11, 2026

The hybrid fallback chain is a solid architecture. A few thoughts on making each tier more effective based on production experience with similar patterns.

On the retrieval strategy itself: rather than treating vector-first with BM25 as a fallback chain (try vector, if insufficient fall back to BM25, then keyword, then pure LLM), the real power comes from fusion — running both retrieval paths simultaneously and combining results with weighted scoring. A typical effective split is 0.3 BM25 plus 0.7 vector similarity, followed by cross-encoder reranking on the merged result set. This catches cases where vector search misses exact terminology (product names, error codes, specific numbers) that BM25 handles well, and cases where BM25 misses semantically equivalent but differently-worded content that vector search handles well. Running them in parallel rather than as a fallback chain means you get the benefit of both on every query, not just when the primary fails.
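
A rough sketch of the weighted fusion described above, in plain Python. The 0.3/0.7 weights come from the comment; the min-max normalization step and the function names are assumptions added here so that BM25 scores and cosine similarities land on the same scale. Cross-encoder reranking of the merged list is not shown.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and cosine values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(bm25: Dict[str, float], vector: Dict[str, float],
         w_bm25: float = 0.3, w_vector: float = 0.7) -> List[Tuple[str, float]]:
    """Weighted sum over the union of both result sets, best first."""
    b, v = min_max(bm25), min_max(vector)
    fused = {
        doc: w_bm25 * b.get(doc, 0.0) + w_vector * v.get(doc, 0.0)
        for doc in set(b) | set(v)
    }
    # Sort by score descending; break ties by doc id for stable output.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    bm25_hits = {"err_code_doc": 12.0, "overview": 3.0}
    vector_hits = {"overview": 0.82, "paraphrase_doc": 0.80, "err_code_doc": 0.40}
    for doc, score in fuse(bm25_hits, vector_hits):
        print(f"{doc}: {score:.3f}")
```

A document returned by only one retriever simply contributes 0 for the other, so both paths can run in parallel and be merged afterwards.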

The stream-aware aspect of the proposal — dynamically retrieving as the conversation evolves — is where things get interesting and where memory comes in. As the conversation progresses, the retrieval context should shift to reflect accumulated context, not just the latest user message. We handle this by maintaining a working memory of the evolving conversation context and re-ranking retrieved results against this accumulated context rather than just the initial query. Concretely: after each turn, update a running summary of what the conversation is about (topics, entities mentioned, constraints established), and use that summary as an additional signal when scoring retrieval results. A document that was irrelevant to the first query might become highly relevant by turn three once the user has narrowed the topic.
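
To make the working-memory idea concrete, here is a small sketch. It assumes the "running summary" can be approximated by the set of terms seen so far, and that each candidate already has a query score in [0, 1]; the class name, the 0.4 blend weight, and the term-overlap signal are all illustrative choices, not something from the comment.

```python
import re
from typing import Dict, List, Set, Tuple


def terms(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class WorkingMemory:
    """Accumulates terms across turns (a crude stand-in for a running summary)."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()

    def update(self, turn_text: str) -> None:
        self.seen |= terms(turn_text)


def rerank(candidates: Dict[str, Tuple[str, float]], memory: WorkingMemory,
           memory_weight: float = 0.4) -> List[str]:
    """candidates maps doc_id -> (text, query_score in [0, 1])."""

    def blended(item: Tuple[str, Tuple[str, float]]) -> float:
        _, (text, query_score) = item
        doc_terms = terms(text)
        # Fraction of the document's terms already present in the conversation.
        overlap = len(doc_terms & memory.seen) / len(doc_terms) if doc_terms else 0.0
        return (1 - memory_weight) * query_score + memory_weight * overlap

    ranked = sorted(candidates.items(), key=lambda it: (-blended(it), it[0]))
    return [doc_id for doc_id, _ in ranked]


if __name__ == "__main__":
    memory = WorkingMemory()
    docs = {
        "k8s": ("deploy with kubernetes helm charts", 0.6),
        "compose": ("docker compose setup on arm64 hosts", 0.5),
    }
    print(rerank(docs, memory))  # before any context is accumulated
    memory.update("How do I deploy Dify?")
    memory.update("I am using Docker Compose on ARM64")
    print(rerank(docs, memory))  # after two turns of context
```

In this toy run the Compose document starts below the Kubernetes one and overtakes it once the conversation has mentioned Docker Compose and ARM64, which is the "irrelevant at turn one, relevant by turn three" effect described above.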

One practical consideration for the confidence-based retrieval trigger (your V2 target): perplexity/entropy thresholds need calibration per domain and per model. A model generating text about a familiar topic will have low perplexity regardless of whether it is hallucinating about specifics. In practice, we found that tracking retrieval hit rate (did the retrieved content actually get used in the response) is a better feedback signal than generation-time perplexity for deciding when retrieval adds value.
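
One possible way to track retrieval hit rate is sketched below. How "used in the response" is measured is not specified above, so this assumes a simple token-overlap proxy with a 0.5 threshold; both the proxy and the threshold are placeholders that would need tuning on real traffic.

```python
import re
from typing import List, Set


def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def chunk_was_used(chunk: str, response: str, min_overlap: float = 0.5) -> bool:
    """Treat a chunk as 'used' if enough of its tokens reappear in the response."""
    chunk_tokens = _tokens(chunk)
    if not chunk_tokens:
        return False
    shared = chunk_tokens & _tokens(response)
    return len(shared) / len(chunk_tokens) >= min_overlap


class HitRateTracker:
    """Running fraction of retrievals where at least one chunk was used."""

    def __init__(self) -> None:
        self.retrievals = 0
        self.hits = 0

    def record(self, chunks: List[str], response: str) -> None:
        self.retrievals += 1
        if any(chunk_was_used(c, response) for c in chunks):
            self.hits += 1

    @property
    def hit_rate(self) -> float:
        return self.hits / self.retrievals if self.retrievals else 0.0


if __name__ == "__main__":
    tracker = HitRateTracker()
    tracker.record(["error code E42 means disk full"], "E42 means the disk is full")
    tracker.record(["pricing tiers for enterprise"], "Restart the service")
    print(tracker.hit_rate)
```

A persistently low hit rate for a query class would suggest retrieval is not adding value there, independent of what the generation-time perplexity looks like.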

4 replies

KERVIN-FARMER Jun 12, 2026
Author

Thank you for your review! Your ideas of 30% keyword search + 70% vector similarity, a Jina reranker, and parent-page retrieval are really compelling, and I agree with your point that retrieval hit rate is a better signal than perplexity and entropy.

That said, how would you assess my original proposal? Or would you suggest that the newest proposal made by @dosu is the most viable and worthwhile task for me to contribute to?

Thank you

dosubot[bot] Bot Jun 12, 2026

Great question — and @ferhimedamine's feedback is genuinely valuable production wisdom. Let me synthesize everything.

Where Things Stand

Your original Stream-Aware RAG proposal has strong ideas but faces hard architectural blockers in Dify's plugin SDK (no mid-stream interception, no logprobs access, no custom nodes). @ferhimedamine's suggestions (fusion retrieval, working memory, hit-rate feedback) are excellent retrieval engineering insights, but they don't resolve the fundamental problem: Dify plugins can't intercept token streams or create new workflow node types [1].

My Honest Recommendation

Yes, the Self-Refining Agent Strategy plugin remains your best path given your constraints (no GPU, no cloud, intern-level scope, want something novel + mergeable). Here's why it still wins:

| Criteria | Stream-Aware RAG | Fusion Retrieval Tool | Self-Refining Agent Strategy |
| --- | --- | --- | --- |
| Feasible without GPU? | ❌ | ✅ | ✅ |
| Works within plugin SDK? | ❌ (needs custom node) | ✅ (Tool plugin) | ✅ (Agent Strategy plugin) |
| Novel in Dify ecosystem? | N/A (can't build it) | ⚠️ Moderate (retrieval tools exist) | ✅ High (nothing like it exists) |
| Academic citation? | Self-RAG (can't implement) | Hybrid retrieval (well-known) | Self-Refine NeurIPS 2023 ✅ |
| CV differentiation? | N/A | Medium | High |

How to Incorporate @ferhimedamine's Ideas

The good news: you don't have to choose either the retrieval improvements or the self-refinement. You can combine them:

Primary deliverable: Self-Refining Agent Strategy plugin (the novel contribution)
Supporting tool: A fusion retrieval Tool plugin (BM25 + vector + reranker, as @ferhimedamine described) that your self-refining agent can invoke during its retry loops

This way, when the agent fails on a knowledge-intensive task, the self-refinement loop can:

Diagnose "retrieval was insufficient" as the root cause
Retry with a refined query to your fusion retrieval tool
Use @ferhimedamine's working memory concept (accumulated conversation context) to improve the retry query

That's a compelling end-to-end story: "Self-refining agent with adaptive retrieval" — academically grounded, practically useful, and entirely buildable within the plugin SDK.

Next Steps

If you're ready to move forward with this direction:

Start by studying the existing cot_agent plugin (ReAct + Function Calling implementations) [2] [3]
Set up local Dify + remote plugin debugging
Build the self-refinement loop as an Agent Strategy
Optionally add the fusion retrieval Tool as a companion plugin

Would you like me to dig into the Agent Strategy interface details so you can start building?

KERVIN-FARMER Jun 12, 2026
Author

@dosu I would like to choose the most viable and worthwhile direction, and I also need to make sure the work can be merged. Please list all the parts and work items I need to handle.

Also, could you write skills, hooks, or prompts for this project? I can forward what you write to my agent to check, update, and code.

Thank you

dosubot[bot] Bot Jun 12, 2026

Here's your complete project plan and a detailed prompt/skill you can hand to your coding agent.

Project: Self-Refining Agent Strategy Plugin for Dify

Why this is the most viable + mergeable direction:

Only 2 agent strategies exist today (ReAct + Function Calling) — there's clear room for a third [1]
Agent Strategy is a supported plugin type with a well-defined interface [2]
No core code modifications needed
No GPU/cloud required — uses whatever LLM the user configures
Academically grounded (Self-Refine, NeurIPS 2023)
Community has asked for agent reasoning transparency and auto-recovery [3]

All Parts & Work Items

Phase 1: Setup (Day 1)

Install Dify locally via Docker Compose
Clone dify-official-plugins repo and study cot_agent/ structure
Set up remote plugin debugging (.env with INSTALL_METHOD=remote) [4]
Verify you can run the existing cot_agent plugin locally

Phase 2: Plugin Scaffold (Day 2)

Create file structure:

self_refine_agent/
├── manifest.yaml
├── main.py
├── pyproject.toml
├── requirements.txt
├── README.md
├── _assets/
│   └── icon.svg
├── provider/
│   ├── self_refine.yaml
│   └── self_refine.py
├── strategies/
│   ├── self_refine.yaml
│   └── self_refine.py
└── prompt/
    └── templates.py

Phase 3: Core Implementation (Days 3–5)

strategies/self_refine.yaml — Strategy config with parameters
strategies/self_refine.py — Main _invoke() logic:
- Initial execution (tool calls + reasoning)
- Quality/success evaluation
- Critique generation (using same LLM)
- Improvement application + retry
- Streaming intermediate steps to user
prompt/templates.py — Critic and refinement prompt templates

Phase 4: Testing (Days 6–7)

Test with various LLMs (GPT-4o, Claude, Qwen via API)
Test tool invocation + failure recovery
Test max iteration enforcement
Test streaming UX (user sees critique + retry steps)

Phase 5: Polish & Submit (Days 8–9)

Write README.md with usage examples
Add multilingual labels (en_US + zh_Hans)
Create icon
Submit PR to dify-official-plugins

Prompt/Skill for Your Coding Agent

Here's a complete prompt you can forward:

# Self-Refining Agent Strategy Plugin for Dify

## Context
You are building a Dify Agent Strategy plugin called "self_refine_agent". This is a plugin for the Dify open-source LLM platform (https://github.com/langgenius/dify).

Agent Strategy plugins extend `AgentStrategy` and implement a `_invoke()` generator method that yields `AgentInvokeMessage` objects. The plugin runs within the Dify Plugin Daemon and communicates via the `self.session` API.

## Reference Architecture
Study the existing `cot_agent` plugin at: https://github.com/langgenius/dify-official-plugins
- File: `strategies/ReAct.py` — ReAct loop implementation
- File: `strategies/function_calling.py` — Function Calling implementation
- File: `provider/agent.yaml` — Provider configuration
- File: `manifest.yaml` — Plugin manifest

## What to Build

### Strategy Logic: Self-Refining Agent

The strategy implements this loop:
1. **Execute**: Run agent task (invoke LLM + tools, like ReAct or Function Calling)
2. **Evaluate**: Check if the output is satisfactory (use LLM as judge)
3. **Critique**: If not satisfactory, generate a critique identifying specific issues
4. **Refine**: Re-execute with the critique injected as additional context
5. **Repeat** until the judge accepts the output or max refinements is reached
6. **Stream** all intermediate steps (execution, critique, refinement) as visible log messages

### Key Interfaces Available

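```python
# Import paths and keyword names follow the cot_agent reference plugin;
# verify them against the dify_plugin version you install.
from collections.abc import Generator
from typing import Any

from dify_plugin.entities.agent import AgentInvokeMessage
from dify_plugin.entities.model.llm import LLMModelConfig
from dify_plugin.entities.model.message import (
    AssistantPromptMessage,
    SystemPromptMessage,
    UserPromptMessage,
)
from dify_plugin.entities.tool import ToolInvokeMessage, ToolProviderType
from dify_plugin.interfaces.agent import AgentStrategy

LogStatus = ToolInvokeMessage.LogMessage.LogStatus  # START, SUCCESS, ERROR


# Base class to extend
class SelfRefineStrategy(AgentStrategy):
    def _invoke(self, parameters: dict[str, Any]) -> Generator[AgentInvokeMessage, None, None]:
        # parameters contains: model, tools, instruction, query, context,
        #                      maximum_iterations, max_refinements
        ...
        yield self.create_text_message("final output")
        yield self.create_log_message(label="...", data={}, status=LogStatus.START)
        yield self.create_json_message({"key": "value"})


# The calls below run inside _invoke (they need self.session).

# LLM invocation
chunks = self.session.model.llm.invoke(
    model_config=LLMModelConfig(**model_params),
    prompt_messages=[
        SystemPromptMessage(content="system prompt"),
        UserPromptMessage(content="user message"),
        AssistantPromptMessage(content="assistant response"),
    ],
    stream=True,
    stop=["Observation:", "Observation:\n"],
)
# Returns Generator[LLMResultChunk] when stream=True

# Tool invocation (tool is one entry of parameters["tools"])
tool_result = self.session.tool.invoke(
    provider_type=ToolProviderType(tool.provider_type),
    provider=tool.identity.provider,
    tool_name=tool.identity.name,
    parameters={"param": "value"},
)
# Returns a generator of ToolInvokeMessage objects
```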

Strategy YAML Configuration

Create strategies/self_refine.yaml:

identity:
  name: self_refine
  author: <your-github-username>
  label:
    en_US: Self-Refine
    zh_Hans: 自我优化
description:
  en_US: "Self-refining agent strategy that automatically critiques and improves its output through iterative refinement. Based on Self-Refine (NeurIPS 2023)."
  zh_Hans: "自我优化 Agent 策略，通过迭代式自我批判和改进自动提升输出质量。基于 Self-Refine (NeurIPS 2023)。"
parameters:
  - name: model
    type: model-selector
    scope: tool-call&llm
    required: true
    label:
      en_US: Model
      zh_Hans: 模型
  - name: tools
    type: array[tools]
    required: true
    label:
      en_US: Tool list
      zh_Hans: 工具列表
  - name: instruction
    type: string
    required: true
    label:
      en_US: Instruction
      zh_Hans: 指令
    auto_generate:
      type: prompt_instruction
    template:
      enabled: true
  - name: query
    type: string
    required: true
    label:
      en_US: Query
      zh_Hans: 查询
  - name: context
    type: any
    scope: array[object]
    required: false
    label:
      en_US: Context
      zh_Hans: 上下文
  - name: maximum_iterations
    type: number
    required: true
    default: 5
    min: 1
    max: 500
    label:
      en_US: Maximum Iterations
      zh_Hans: 最大迭代次数
  - name: max_refinements
    type: number
    required: true
    default: 2
    min: 0
    max: 5
    label:
      en_US: Maximum Refinements
      zh_Hans: 最大优化次数
extra:
  python:
    source: strategies/self_refine.py

Core Implementation Pattern

# strategies/self_refine.py
# Methods of SelfRefineStrategy. LogStatus is ToolInvokeMessage.LogMessage.LogStatus
# (START / SUCCESS / ERROR), as used in the cot_agent reference; verify the
# signatures below against the dify_plugin version you install.
def _invoke(self, parameters):
    model = parameters["model"]
    tools = parameters["tools"]
    query = parameters["query"]
    instruction = parameters["instruction"]
    max_iterations = parameters["maximum_iterations"]
    max_refinements = parameters["max_refinements"]

    refinement_count = 0
    previous_critique = None

    while refinement_count <= max_refinements:
        # === EXECUTION PHASE ===
        attempt_log = self.create_log_message(
            label=f"Attempt {refinement_count + 1}: executing agent",
            data={},
            status=LogStatus.START,
        )
        yield attempt_log

        # Build prompt (inject critique from previous iteration if available)
        system_prompt = instruction
        if previous_critique:
            system_prompt += (
                "\n\n[SELF-REFINEMENT CONTEXT]\n"
                f"Previous attempt had these issues:\n{previous_critique}\n"
                "Please address them in this attempt."
            )

        # Run agent loop (ReAct or Function Calling style).
        # _execute_agent_loop is a helper you write; it is not part of the SDK.
        agent_output = self._execute_agent_loop(
            model, tools, system_prompt, query, max_iterations
        )

        # Close the log opened above instead of emitting a second, unrelated one
        yield self.finish_log_message(log=attempt_log, data={"output": agent_output})

        # === EVALUATION PHASE ===
        if refinement_count >= max_refinements:
            # No more refinements allowed, return current output
            break

        eval_log = self.create_log_message(
            label="Evaluating output quality",
            data={},
            status=LogStatus.START,
        )
        yield eval_log

        critique = self._evaluate_and_critique(model, query, agent_output)

        if critique["is_satisfactory"]:
            yield self.finish_log_message(log=eval_log, data={"verdict": "PASS"})
            break

        yield self.finish_log_message(
            log=eval_log,
            data={"verdict": "NEEDS IMPROVEMENT", "issues": critique["issues"]},
        )
        previous_critique = critique["issues"]
        refinement_count += 1

    # === FINAL OUTPUT ===
    yield self.create_text_message(agent_output)
    yield self.create_json_message({"refinement_count": refinement_count})

def _evaluate_and_critique(self, model, query, output):
    """Use LLM as judge to evaluate output quality"""
    eval_prompt = f"""Evaluate this agent output for the given query.

Query: {query}
Output: {output}

Respond in JSON:
{{"is_satisfactory": true/false, "issues": "description of issues or empty string"}}
"""
    result = self.session.model.llm.invoke(
        model_config=LLMModelConfig(**model),
        prompt_messages=[UserPromptMessage(content=eval_prompt)],
        stream=False,
        stop=[]
    )
    # parse_evaluation is a helper you write: it extracts the JSON verdict
    # from the reply text and returns a dict with both keys.
    return parse_evaluation(result.message.content)
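
The pattern above calls `parse_evaluation`, which is not defined anywhere in this prompt. One possible implementation is sketched below using only the standard library. The fallback policy (treat an unparseable verdict as satisfactory so the loop stops rather than retrying on noise) is an assumption; the opposite choice is equally defensible.

```python
import json
import re
from typing import Any, Dict


def parse_evaluation(raw: str) -> Dict[str, Any]:
    """Extract {"is_satisfactory": bool, "issues": str} from a judge reply.

    Tolerates surrounding prose or code fences by taking the span from the
    first "{" to the last "}". Falls back to "satisfactory" when the reply
    cannot be parsed, so a malformed verdict ends the loop.
    """
    fallback = {"is_satisfactory": True, "issues": ""}
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if not match:
        return fallback
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback

    verdict = data.get("is_satisfactory", True)
    if isinstance(verdict, str):
        # Some models return "true"/"false" as strings.
        verdict = verdict.strip().lower() == "true"
    return {"is_satisfactory": bool(verdict), "issues": str(data.get("issues", ""))}


if __name__ == "__main__":
    reply = 'Here is my verdict:\n{"is_satisfactory": false, "issues": "no sources cited"}'
    print(parse_evaluation(reply))
    print(parse_evaluation("I could not decide."))
```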

Manifest.yaml

version: 0.0.1
type: plugin
author: "<your-github-username>"
name: "self_refine_agent"
label:
  en_US: "Self-Refine Agent Strategy"
  zh_Hans: "自我优化 Agent 策略"
created_at: "2025-06-12T00:00:00.000000000Z"
icon: icon.svg
description:
  en_US: "Agent strategy with automatic self-critique and iterative refinement. Based on Self-Refine (NeurIPS 2023)."
  zh_Hans: "支持自动自我批判和迭代优化的 Agent 策略。基于 Self-Refine (NeurIPS 2023)。"
tags:
  - "agent"
resource:
  memory: 1048576
  permission:
    tool:
      enabled: true
    model:
      enabled: true
      llm: true
plugins:
  agent_strategies:
    - "provider/self_refine.yaml"
meta:
  version: 0.0.2
  minimum_dify_version: "1.7.0"
  arch:
    - "amd64"
    - "arm64"
  runner:
    language: "python"
    version: "3.12"
    entrypoint: "main"

Implementation Requirements

Must extend AgentStrategy (from dify_plugin.interfaces.agent)
Must yield AgentInvokeMessage objects (text, log, json, retriever_resource)
Must enforce max_refinements to prevent infinite loops
Must stream intermediate steps (critique, refinement status) via log messages
Must handle tool invocation failures gracefully
Must support both streaming and non-streaming LLM responses
Must include multilingual labels (en_US + zh_Hans minimum)
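
For the "handle tool invocation failures gracefully" requirement, one common approach is to turn the exception into an observation the agent can react to. The sketch below is SDK-agnostic: `call` is a hypothetical zero-argument callable wrapping whatever tool invocation the strategy performs, and the returned dict shape is illustrative only.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)


def safe_tool_call(tool_name: str, call: Callable[[], Any]) -> Dict[str, Any]:
    """Run one tool call; convert any exception into a readable observation."""
    try:
        return {"ok": True, "observation": call()}
    except Exception as exc:  # broad on purpose: every tool error becomes feedback
        logger.warning("tool %s failed: %s", tool_name, exc)
        return {
            "ok": False,
            "observation": (
                f"Tool '{tool_name}' failed: {exc}. "
                "Try different parameters or another tool."
            ),
        }


if __name__ == "__main__":
    def flaky_search() -> str:
        raise TimeoutError("upstream timed out")

    print(safe_tool_call("search", lambda: "3 results"))
    print(safe_tool_call("search", flaky_search))
```

Feeding the failure text back as an observation lets the critique step diagnose the problem instead of the whole run aborting.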

Important Notes

NO trailing commas after yield statements
Use Python logging module, not print()
Test with remote debugging: set INSTALL_METHOD=remote in .env
The plugin process has no hot-reload — restart after code changes
Set author to your GitHub username (not "langgenius")


---

This prompt gives your coding agent everything it needs to scaffold and implement the plugin. The key files to produce are:
1. `manifest.yaml`
2. `provider/self_refine.yaml` + `provider/self_refine.py`
3. `strategies/self_refine.yaml` + `strategies/self_refine.py`
4. `prompt/templates.py`
5. `main.py`

Good luck! Let me know if you need me to dig into any specific part further.
