[Feature Request] Add a small "FreshStack RAG evaluation failure modes" checklist doc (docs only) #9

@onestardao

Hi FreshStack team,

Thank you for releasing FreshStack. A benchmark and framework for RAG over technical documentation is extremely useful for both research and industry.

I have been working on 16-mode failure maps for RAG systems and recently contributed a robustness-related entry to Harvard MIMS Lab’s ToolUniverse. In FreshStack-style settings, I see the same issues come up repeatedly:

  • retrieval that focuses on popular pages rather than the correct ones
  • confusion between similar APIs or versions in the documentation
  • answer evaluation that does not fully reflect grounding in the retrieved docs
  • experiments that are hard to reproduce because configuration details are not recorded

I would like to propose a small, documentation-only evaluation checklist for FreshStack users.

Proposed feature

Add a short Markdown page to the repo, for example:

freshstack_rag_evaluation_failure_modes_and_checklist.md

The page could:

  1. List typical RAG failure modes specific to technical docs (API confusion, versioning, incomplete snippets).
  2. For each, describe:
    • symptoms in FreshStack evaluations
    • likely causes (retrieval settings, corpus preparation, query formulation)
  3. Provide a short checklist for running and reporting FreshStack experiments:
    • corpus version, retrieval configuration, model, and key evaluation settings.
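
As a rough illustration of the reporting item, a run record could look something like the sketch below. This is only an assumption about what such a record might contain: `RunRecord`, its field names, and the placeholder values are hypothetical and are not part of FreshStack's API. The idea is simply to capture the corpus version, retrieval configuration, model, and evaluation settings in one serializable object, plus a short fingerprint so two runs can be compared quickly.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    """Hypothetical record of one evaluation run (not a FreshStack type)."""

    corpus_version: str   # e.g. a snapshot tag or commit hash of the corpus
    retriever: str        # name of the retrieval method used
    top_k: int            # number of documents retrieved per query
    model: str            # generator model identifier
    eval_settings: dict = field(default_factory=dict)  # metric names, thresholds, etc.

    def to_json(self) -> str:
        # sort_keys gives a deterministic serialization for identical settings
        return json.dumps(asdict(self), sort_keys=True)

    def fingerprint(self) -> str:
        # Short hash of the serialized record; equal settings give equal fingerprints
        digest = hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()
        return digest[:12]


# Placeholder values for illustration only
record = RunRecord(
    corpus_version="example-snapshot",
    retriever="bm25",
    top_k=10,
    model="example-model",
    eval_settings={"metric": "example-metric"},
)
print(record.to_json())
print(record.fingerprint())
```

Storing the JSON line next to the reported scores would make it easier to tell whether two results were produced under the same settings.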

Motivation

  • FreshStack is likely to become a standard reference for technical-doc RAG.
  • A small failure-mode and reporting checklist would help ensure that evaluations are interpretable and comparable across systems.
  • This is a docs-only change and should be straightforward to maintain.

If this is aligned with your goals for FreshStack, I would be glad to propose a concise initial draft in a PR.

Thank you for considering this.
