Proposal: CI-friendly RAG checks using WFGY 16-problem ProblemMap #1497

@onestardao

Hi, and thanks for CML – CI-native ML checks are exactly what a lot of teams need.

I maintain an open-source project called WFGY (MIT-licensed, ~1.5k GitHub stars). A key part of it is a 16-problem “ProblemMap” that lists common RAG / LLM failure modes such as retriever drift, vector store fragmentation, bad chunking strategies, and unstable prompts.

This ProblemMap has been integrated or referenced in:

  • ToolUniverse by Harvard MIMS Lab
  • Multimodal RAG Survey by QCRI LLM Lab

They use it mainly as a practical failure taxonomy.

Proposal

Given that CML is already about automated checks in CI, I would like to propose an example that:

  1. Runs a small RAG evaluation job in CI using CML
  2. Classifies failures according to the 16 ProblemMap categories
  3. Fails the CI job or surfaces warnings when specific failure-mode thresholds are exceeded

The idea is to stay fully within the CML + GitHub/GitLab workflow, and just use WFGY ProblemMap as an external, MIT-licensed checklist for naming the categories.
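
To make the three steps above concrete, here is a rough sketch of the gating logic in plain Python. It is only an illustration under stated assumptions, not a proposed final design: the category labels are the four failure modes named in this issue rather than the full 16-entry ProblemMap, the threshold values are made up, and the function names (`failure_rates`, `exceeded`, `render_report`, `run_gate`) are hypothetical. It also assumes some upstream evaluation step has already assigned one label to each failed example.

```python
from collections import Counter

# Hypothetical per-category failure-rate limits. The labels are placeholders
# taken from the failure modes named in this issue, not the real ProblemMap
# entries, and the numbers are illustrative only.
THRESHOLDS = {
    "retriever drift": 0.10,
    "vector store fragmentation": 0.05,
    "bad chunking": 0.10,
    "unstable prompts": 0.05,
}


def failure_rates(labels, total):
    """Return the fraction of evaluated examples assigned to each category."""
    if total <= 0:
        raise ValueError("total must be a positive number of evaluated examples")
    counts = Counter(labels)
    return {cat: counts.get(cat, 0) / total for cat in THRESHOLDS}


def exceeded(rates):
    """Return the sorted categories whose rate is strictly above its threshold."""
    return sorted(cat for cat, rate in rates.items() if rate > THRESHOLDS[cat])


def render_report(rates):
    """Build a Markdown table that could be posted as a PR/MR comment."""
    lines = [
        "| Failure mode | Rate | Threshold | Status |",
        "|---|---|---|---|",
    ]
    for cat, rate in rates.items():
        status = "FAIL" if rate > THRESHOLDS[cat] else "ok"
        lines.append(f"| {cat} | {rate:.1%} | {THRESHOLDS[cat]:.1%} | {status} |")
    return "\n".join(lines)


def run_gate(labels, total, report_path="report.md"):
    """Write the report and return a process exit code (1 if any limit is exceeded)."""
    rates = failure_rates(labels, total)
    with open(report_path, "w", encoding="utf-8") as fh:
        fh.write(render_report(rates) + "\n")
    return 1 if exceeded(rates) else 0
```

In a CI job, a small wrapper script could call `sys.exit(run_gate(labels, total))` and then post the generated file with `cml comment create report.md`, so the failure-mode table appears on the pull or merge request. Whether a breach should fail the job or only surface a warning is a design choice I would leave to maintainers' preference.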

Question

Would you be open to:

  • A docs example or template repo showing “RAG CI checks with CML + WFGY ProblemMap”, and
  • A short reference link to the ProblemMap as the failure-mode taxonomy?

If yes, I can propose a minimal example and iterate based on your feedback.

If this is not what you want to cover with CML, no problem – thank you for considering it.
