
@synapt/eval


Domain-agnostic eval framework for AI applications. Measure retrieval quality, generation accuracy, and policy compliance across any vertical.

Install

pip install synapt-eval

Or install from source:

pip install git+https://github.com/synapt-dev/eval.git

Quick Start

from synapt_eval import EvalResult, CategoryMetrics
from synapt_eval.adapters import RetrievalAdapter, RetrievalCandidate
from synapt_eval.report_card import compose_report_card, generate_markdown

class MyRetrieval(RetrievalAdapter):
    async def retrieve(self, query: str, k: int = 10) -> list[RetrievalCandidate]:
        # Connect your vector store here
        return [RetrievalCandidate(id="doc1", score=0.95)]

# Run eval and generate report
results = [EvalResult(
    category="retrieval",
    metrics=CategoryMetrics(p_at_5=0.85, r_at_10=0.72, n=50),
)]
card = compose_report_card(results, run_id="my-first-eval")
print(generate_markdown(card))

See docs/quickstart.md for a complete walkthrough and examples/ for runnable code.
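The p_at_5 and r_at_10 figures in the quickstart are ordinary precision and recall at a rank cutoff. For reference, here is a self-contained sketch of both metrics; the library's own helpers in synapt_eval.scoring may use different signatures, so treat this as an illustration of the math rather than the package API:

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved ids that are relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc_id in top_k if doc_id in relevant) / len(top_k)

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant ids that appear in the top-k retrieved."""
    if not relevant:
        return 0.0
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / len(relevant)

retrieved = ["doc1", "doc3", "doc2", "doc9", "doc4"]
relevant = {"doc1", "doc2", "doc5"}
print(precision_at_k(retrieved, relevant, 5))  # 0.4 (2 of top 5 are relevant)
print(recall_at_k(retrieved, relevant, 5))     # 2 of 3 relevant docs found
```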

Architecture

synapt-eval separates the eval framework (scoring, review, reporting) from domain-specific adapters (your retrieval backend, your generation pipeline, your fixtures).

Layer              Module                         Purpose
-----              ------                         -------
Types              synapt_eval.types              Core data types (Fixture, EvalResult, CategoryMetrics)
Scoring            synapt_eval.scoring            Precision@K, Recall@K, Kendall's Tau
Adapters           synapt_eval.adapters           Customer-facing ABCs (Retrieval, Generation, Judge, Fixture)
Runner             synapt_eval.runner             Eval execution, orchestration, PR gate
Reviewer           synapt_eval.reviewer           Verdict framework, predicate chains, LLM judge bridge
Suggestion Engine  synapt_eval.suggestion_engine  Rule-based actionable recommendations
Report Card        synapt_eval.report_card        Markdown + JSON report generation
Trending           synapt_eval.trending           Self-hosted JSON history store + delta computation
CLI                synapt_eval.cli                Command-line viewer (synapt-eval trending)
Actions            synapt_eval.actions            GitHub Actions PR-gate adapter
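Of the scoring primitives, Kendall's Tau is the least familiar: it measures rank agreement between two orderings of the same items, e.g. your retriever's ranking versus a gold ranking. A minimal pure-Python sketch of the idea (the implementation in synapt_eval.scoring may differ in signature and tie handling):

```python
from itertools import combinations

def kendall_tau(rank_a: list[str], rank_b: list[str]) -> float:
    """Kendall's Tau-a between two total rankings of the same items:
    (concordant pairs - discordant pairs) / total pairs, in [-1, 1]."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # Positive product: both rankings order the pair the same way.
        agreement = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    n = len(rank_a)
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau(["a", "b", "c"], ["a", "b", "c"]))  # 1.0 (identical order)
print(kendall_tau(["a", "b", "c"], ["c", "b", "a"]))  # -1.0 (reversed order)
```

A value near 1.0 means your retriever preserves the reference ordering; values near 0 mean the orderings are unrelated.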

Features

Feature             Description
-------             -----------
Scoring primitives  Precision@K, Recall@K, Kendall's Tau rank correlation
Adapter pattern     Plug in any retrieval/generation backend via ABCs
Reviewer SDK        Composable predicate chains + LLM judge integration
Suggestion engine   10 baseline rules with decorator pattern for custom rules
Report card         Markdown + JSON output with schema versioning
PR gate             Regression detection with configurable thresholds
Trending            Self-hosted history store with CLI viewer
GitHub Action       uses: synapt-dev/eval@v0.1.0 for CI integration

GitHub Action

Add eval gating to your PR workflow:

- name: Run eval
  run: python my_eval_script.py --output results.json

- name: PR Gate
  uses: synapt-dev/eval@v0.1.0
  with:
    results-path: results.json
    baseline-path: baseline.json
    threshold: "0.05"
    fail-on: error

The action posts a report card comment on the PR and fails the workflow on regressions. See docs/pr-gate.md for full configuration.
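The gate's comparison is conceptually simple: read each metric from results.json, compare it against baseline.json, and flag any drop larger than the threshold. The function and field names below are illustrative, not the action's internals:

```python
def detect_regressions(results: dict[str, float], baseline: dict[str, float],
                       threshold: float = 0.05) -> list[str]:
    """Flag every metric that dropped by more than `threshold` vs the baseline.

    A metric missing from `results` is treated as 0.0, i.e. a full regression.
    """
    regressions = []
    for metric, base in baseline.items():
        current = results.get(metric, 0.0)
        if base - current > threshold:
            regressions.append(f"{metric}: {base:.2f} -> {current:.2f}")
    return regressions

# A 0.07 drop exceeds the default 0.05 threshold, so the gate would fail.
print(detect_regressions({"p_at_5": 0.78}, {"p_at_5": 0.85}))
```

Note that only drops fail the gate; improvements and small fluctuations within the threshold pass.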

CLI

# View eval trending history
synapt-eval trending --path .synapt-eval/history --format text

# Output as markdown or JSON
synapt-eval trending --format markdown
synapt-eval trending --format json --limit 5
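Because the trending store is plain JSON on disk, delta computation needs no server. A sketch under the assumption that each run is stored as one JSON file with a top-level "metrics" object; the actual history layout under .synapt-eval/history may differ:

```python
import json
from pathlib import Path

def load_history(path: str) -> list[dict]:
    """Read one JSON report per file, sorted by filename (assumed run order)."""
    return [json.loads(p.read_text()) for p in sorted(Path(path).glob("*.json"))]

def deltas(history: list[dict], metric: str) -> list[float]:
    """Run-over-run change in one metric across consecutive history entries."""
    values = [run["metrics"][metric] for run in history]
    return [round(later - earlier, 4) for earlier, later in zip(values, values[1:])]

# e.g. deltas(load_history(".synapt-eval/history"), "p_at_5")
```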

Documentation

Guide               Description
-----               -----------
Quickstart          End-to-end retrieval eval in 60 lines
Adapter API         Writing custom adapters
Reviewer Framework  Custom reviewers + judge integration
PR Gate             GitHub Actions CI integration
Suggestions         Writing custom suggestion rules
Trending            Self-hosted trending CLI

Examples

Runnable examples live in the examples/ directory.

Pro Tier

Want vertical-specific eval packs, a hosted dashboard, or SOC2 attestations? Visit synapt.dev for synapt-eval Pro.

License

MIT

About

Eval framework for AI quality: a three-loop discipline (PR gate, sprint, quarter), a reviewer-framework SDK, and a suggestion engine. Ships as a library plus a GitHub Actions adapter.
