Description
Is your feature request related to a problem? Please describe.
Many Pathway users build RAG systems, but when answers look wrong it is hard to tell why. It is not obvious whether the root cause is retrieval collapse, chunk drift, embedding mismatch, or something in the orchestration logic, so people often end up debugging by trial and error on raw logs instead of following a structured checklist.
Describe the solution you'd like
I would like to contribute (or help design) a small example that uses Pathway to stream RAG diagnostics using the WFGY 16-problem map:
https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md
The example could show a simple text RAG pipeline and log, for each query, which WFGY problem codes are suspected (for example No.1 hallucination and chunk drift, No.2 retrieval interpretation collapse, No.5 semantic vs embedding mismatch, and so on). The implementation can stay inside Pathway primitives, with a light layer of heuristics on top of existing logs and metrics.
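To make the idea concrete, here is a minimal sketch of what such a heuristic layer might look like. This is not Pathway code and not part of WFGY itself; it is a hypothetical plain-Python classifier (function names, thresholds, and the lexical-overlap signals are all my own illustrative assumptions) that could later be wrapped as a UDF inside a Pathway pipeline:

```python
# Hypothetical heuristic layer for tagging queries with suspected WFGY
# problem codes. Thresholds and signals are illustrative only; a real
# example would use embedding similarity and Pathway-native operators.

def token_overlap(a: str, b: str) -> float:
    """Fraction of tokens in `a` that also occur in `b` (crude grounding proxy)."""
    ta = set(a.lower().split())
    tb = set(b.lower().split())
    return len(ta & tb) / len(ta) if ta else 0.0

def suspect_problems(query: str, chunks: list[str], answer: str) -> list[str]:
    """Return suspected WFGY problem codes for one query/answer pair."""
    codes = []
    context = " ".join(chunks)
    # No.1 hallucination & chunk drift: answer is weakly grounded in the
    # retrieved text.
    if token_overlap(answer, context) < 0.3:
        codes.append("No.1 hallucination & chunk drift")
    # No.2 interpretation collapse: retrieved chunks share almost nothing
    # with the query itself.
    if token_overlap(query, context) < 0.2:
        codes.append("No.2 interpretation collapse")
    return codes
```

In the actual example, a function like this would run per query over the pipeline's logged (query, chunks, answer) records, so each row of the diagnostics stream carries its suspected problem codes alongside the existing logs.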
Describe alternatives you've considered
Right now I run WFGY diagnostics separately in notebooks and external evaluation tools, then manually correlate those results with Pathway logs. This works for my own experiments, but it is not reproducible for other users and it does not show up as an official example in the docs.
Additional context
WFGY and the 16-problem map are MIT licensed and already referenced by several research groups, for example:
- Harvard MIMS Lab ToolUniverse
- University of Innsbruck Data Science Group (Rankify RAG toolkit)
- QCRI LLM Lab Multimodal RAG Survey
If you feel this fits Pathway, I am happy to adapt the taxonomy and open a PR as a self-contained example or a small docs page, without adding any heavy dependencies.