Hi, and thanks for CML – CI-native ML checks are exactly what a lot of teams need.
I maintain an open-source project called WFGY (MIT-licensed, ~1.5k GitHub stars). A key part of it is a 16-problem “ProblemMap” that lists common RAG / LLM failure modes such as retriever drift, vector store fragmentation, bad chunking strategies, and unstable prompts:
https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md
This ProblemMap has been integrated or referenced in:
- ToolUniverse by Harvard MIMS Lab
- Multimodal RAG Survey by QCRI LLM Lab
They use it mainly as a practical failure taxonomy.
Proposal
Given that CML is already about automated checks in CI, I would like to propose an example that:
- Runs a small RAG evaluation job in CI using CML
- Classifies failures according to the 16 ProblemMap categories
- Fails the CI job or surfaces warnings when specific failure-mode thresholds are exceeded
The idea is to stay fully within the CML + GitHub/GitLab workflow, and just use WFGY ProblemMap as an external, MIT-licensed checklist for naming the categories.
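As a rough illustration of the threshold step, a minimal Python sketch could look like the following. It assumes the evaluation job has already labeled each failed case with a category name. The category names, the limits, and the `check` helper are hypothetical placeholders, not the official ProblemMap labels and not part of any CML API.

```python
from collections import Counter

# Hypothetical per-category limits on failed eval cases (placeholder names,
# not the official ProblemMap labels).
THRESHOLDS = {"retriever_drift": 2, "bad_chunking": 1}


def check(failures, thresholds):
    """Count failures per category and compare them against the limits.

    failures:   list of category names, one entry per failed eval case
    thresholds: dict mapping category name -> maximum allowed failures
    Returns (markdown_report, ok), where ok is False if any limit is exceeded.
    """
    counts = Counter(failures)
    lines = [
        "| Category | Failures | Limit | Status |",
        "|---|---|---|---|",
    ]
    ok = True
    for category in sorted(set(counts) | set(thresholds)):
        n = counts.get(category, 0)
        limit = thresholds.get(category)
        if limit is None:
            # Category seen in the run but no limit configured: warn only.
            status, shown_limit = "warn (no limit set)", "-"
        elif n > limit:
            status, shown_limit = "FAIL", str(limit)
            ok = False
        else:
            status, shown_limit = "ok", str(limit)
        lines.append(f"| {category} | {n} | {shown_limit} | {status} |")
    return "\n".join(lines), ok
```

In a CI step, the returned markdown could be written to a file such as `report.md` and posted with `cml comment create report.md`, and the job could exit non-zero when `ok` is false. Categories without a configured limit only produce a warning row, which matches the "fail or surface warnings" split described above.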
Question
Would you be open to:
- A docs example or template repo showing “RAG CI checks with CML + WFGY ProblemMap”, and
- A short reference link to the ProblemMap as the failure-mode taxonomy?
If yes, I can propose a minimal example and iterate based on your feedback.
If this is not what you want to cover with CML, no problem – thank you for considering it.
https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md