Releases: intertwine/dspy-agent-skills

v0.2.3

25 May 05:29
f2f7055

DSPy 3.2.1 refresh

  • Retargeted install and maintainer validation guidance from exact DSPy 3.2.0 to current 3.2.1, while keeping committed example artifacts labeled by the DSPy version that produced them.
  • Added scripts/check_dspy_surface.py to validate the live DSPy API surface taught by the skills (GEPA, BetterTogether, Evaluate, LM, SIMBA, Embedder, configure_cache, and current primitives).
  • Updated GEPA guidance for current upstream best practices: train-heavy GEPA splits, GPT-5-class reflection model shape, literal-dict metric mismatch, supported component_selector strings, and when to try dspy.SIMBA.
  • Tightened the evaluation-harness reference around DSPy 3.2.1 semantics: GEPA-compatible five-argument metric signatures and aggregation-safe metric return shapes.
  • Added production cache guidance for dspy.configure_cache(restrict_pickle=True), project-local DSPY_CACHEDIR, and provider-side prompt caching.
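A minimal sketch of what a GEPA-compatible five-argument metric might look like. It assumes the five arguments are (gold, pred, trace, pred_name, pred_trace) and that a score-plus-feedback object is the aggregation-safe return shape. The function name and the feedback wording are illustrative and are not taken from the skill pack.

```python
from types import SimpleNamespace

try:
    from dspy import Prediction  # real return type when DSPy is installed
except ImportError:
    # Stand-in so this sketch runs without DSPy (assumption, not the DSPy class).
    Prediction = SimpleNamespace


def exact_match_with_feedback(gold, pred, trace=None, pred_name=None, pred_trace=None):
    """Score one example and explain the score in plain text."""
    expected = str(gold.answer).strip().lower()
    actual = str(getattr(pred, "answer", "")).strip().lower()
    score = 1.0 if expected == actual else 0.0

    # Name the predictor under reflection when the optimizer passes one in.
    prefix = f"{pred_name}: " if pred_name else ""
    if score == 1.0:
        feedback = prefix + "Answer matches the reference."
    else:
        feedback = prefix + f"Expected {gold.answer!r}, got {getattr(pred, 'answer', None)!r}."

    # Return an object with score and feedback attributes, not a literal dict.
    return Prediction(score=score, feedback=feedback)
```

The trailing three arguments default to None so the same function can be called with only (gold, pred) during plain evaluation.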

Validation

  • uv run --with pytest python -m pytest tests/ -v -> 114 passed
  • env -u UV_EXCLUDE_NEWER uv run --with dspy==3.2.1 python scripts/check_dspy_surface.py -> passed
  • for f in skills/*/example_*.py; do env -u UV_EXCLUDE_NEWER uv run --with dspy==3.2.1 python "$f" --dry-run; done -> passed
  • git diff --check HEAD -> passed
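The API surface check listed above can be pictured roughly as follows. This is a sketch of the general idea only, not the contents of scripts/check_dspy_surface.py: the helper name missing_names is hypothetical, and the name list is copied from the release note rather than from the script.

```python
import importlib

# Top-level names the release note says the skills teach (subset, from the note).
DSPY_NAMES = [
    "GEPA", "BetterTogether", "Evaluate", "LM",
    "SIMBA", "Embedder", "configure_cache",
]


def missing_names(module_name, names):
    """Return the names that the imported module does not expose."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


# Demonstrated on a stdlib module so the sketch runs without DSPy installed.
# With DSPy available, the equivalent call would be missing_names("dspy", DSPY_NAMES).
print(missing_names("json", ["loads", "dumps", "no_such_function"]))
```

An empty list would mean every taught name is still present in the installed version.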

Review

PR: #8

The release branch passed a read-only adversarial subagent review. Initial blockers were fixed and the targeted rereview returned: No blocking findings. Approved for release.

v0.2.2

25 May 05:06

Test suite hardening

  • Extended regression guards (.overall_score, dict metrics, stale RLM defaults, stale BetterTogether API) to cover articles/**/*.md in addition to skills/ and docs/. Uses recursive glob for parity with the skills/** pattern.
  • Added version consistency test: asserts plugin.json, marketplace.json, and README.md all carry the same version string. Regex is anchored to the ## Version heading to avoid false matches on changelog or prose mentions.
  • Added reference.md presence test: every skill directory must ship a reference.md for progressive disclosure.
  • Moved _ANTIPATTERN_MARKERS and _is_antipattern_context() above Rule 1 so both Rule 1 (.overall_score) and Rule 2 (dict metrics) share the same anti-pattern context check. Added "enforces" marker to allow meta-references that describe prohibitions. Dropped the overly broad "no " marker — "enforces" alone covers the article line that triggered it.
  • Test count: 87 → 105.
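The version consistency check described above could be sketched like this. The regex, the helper names, and the assumption that each manifest stores its version under a top-level "version" key are illustrative guesses, not the repository's actual test code.

```python
import json
import re

# Anchored to the "## Version" heading so changelog or prose mentions are ignored.
VERSION_HEADING = re.compile(r"^## Version\s+`?v?(\d+\.\d+\.\d+)", re.MULTILINE)


def readme_version(text):
    """Return the version that follows the '## Version' heading, or None."""
    match = VERSION_HEADING.search(text)
    return match.group(1) if match else None


def manifest_version(json_text):
    """Return the top-level 'version' value of a JSON manifest, or None."""
    return json.loads(json_text).get("version")


def versions_agree(readme_text, *manifest_texts):
    """True only if every source yields the same, non-missing version string."""
    found = {readme_version(readme_text)}
    found.update(manifest_version(text) for text in manifest_texts)
    return len(found) == 1 and None not in found
```

Anchoring on the heading means a README that mentions an older release in its changelog still resolves to the version under the heading.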

Example artifacts

  • Re-ran examples/01-rag-qa as a clean DSPy 3.1.3 vs 3.2.0 comparison on the same model pair; the current clean DSPy 3.2.0 result is 80.47 -> 100.00.
  • Kept examples/03-invoice-extraction on its historical DSPy 3.1.3 artifact after a clean probe: the 3.1.3 GEPA run was stopped before completion after finding a 0.944 candidate, and the 3.2.0 baseline on the same model pair already reached 0.944.
  • Updated README, examples index, and per-example version_comparison.{md,json} files so the published docs describe the clean comparison path and no longer depend on .venv-dspy313 / .venv-dspy320 state.

New content

  • Created skills/dspy-advanced-workflow/reference.md — the only skill that was missing one. Covers step-by-step failure modes, auto level selection, plateau debugging, export format tradeoffs, BetterTogether chaining, and sub-skill cross-references.
  • Marked the GEPA constructor snippet as a subset, with a pointer to the full surface in dspy-gepa-optimizer/reference.md.
  • Annotated the reflection_minibatch_size guidance with symptom context (plateau vs. oscillation) so that it does not contradict the advice in the GEPA optimizer reference.

Installer

  • Added --verify flag to scripts/install.sh: validates each expected skill exists at the destination, checks symlink targets or directory presence, and reports pass/fail per skill.
  • Updated docs/installation.md verification section to reference --verify.
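The per-skill check behind --verify can be illustrated with a short Python sketch. The real flag lives in the shell script scripts/install.sh; the function below is a hypothetical restatement of the behavior described above, and it assumes a symlink passes only when its target exists.

```python
from pathlib import Path


def verify_skills(dest, expected):
    """Map each expected skill name to True (pass) or False (fail)."""
    results = {}
    for name in expected:
        path = Path(dest) / name
        if path.is_symlink():
            # Path.exists() follows the link, so a dangling symlink fails.
            results[name] = path.exists()
        else:
            # A copied install must be a real directory.
            results[name] = path.is_dir()
    return results


def report(results):
    """Render one pass/fail line per skill."""
    return [f"{'PASS' if ok else 'FAIL'} {name}" for name, ok in sorted(results.items())]
```

Reporting per skill rather than stopping at the first failure makes a partial install easier to diagnose.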

Validation

  • uv run --with pytest python -m pytest tests/ -v -> 108 passed
  • Live reruns/probes:
    • examples/01-rag-qa -> 80.47 -> 100.00 with openrouter/mistralai/ministral-3b-2512
    • examples/03-invoice-extraction -> clean probe recorded 0.944 baseline under DSPy 3.2.0; historical artifact retained

v0.2.1

28 Apr 12:11

Patch release for Vercel skills CLI compatibility.

  • Documented the supported install path: npx skills add intertwine/dspy-agent-skills.
  • Clarified that bare npx skills add dspy-agent-skills currently requires an upstream CLI alias and is not repo-resolvable by itself.
  • Fixed dspy-evaluation-harness frontmatter so strict YAML parsers discover all five skills.
  • Added a regression guard for inline YAML frontmatter values containing : .
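A guard of the kind described in the last bullet might look roughly like this. It is a heuristic sketch and not the repository's test: it does not parse YAML, the helper names are hypothetical, and it deliberately flags any unquoted inline value containing ": ", since strict parsers tend to reject those.

```python
import re

# Matches a simple "key: value" line at the top level of the frontmatter.
KEY_VALUE = re.compile(r"^([A-Za-z_][\w-]*):[ \t]+(.*)$")


def frontmatter_lines(text):
    """Return the lines between the opening and closing '---' delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    body = []
    for line in lines[1:]:
        if line.strip() == "---":
            return body
        body.append(line)
    return []  # no closing delimiter, so treat it as having no frontmatter


def risky_inline_values(text):
    """Return keys whose unquoted inline value contains ': '."""
    risky = []
    for line in frontmatter_lines(text):
        match = KEY_VALUE.match(line)
        if not match:
            continue
        key, value = match.group(1), match.group(2).strip()
        quoted = value[:1] in ("'", '"')
        block_scalar = value[:1] in ("|", ">")
        if ": " in value and not quoted and not block_scalar:
            risky.append(key)
    return risky
```

Quoting the value, or moving it into a block scalar, is the usual way to clear a flagged key.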

Validation:

  • uv run --with pytest python -m pytest tests/ -v -> 87 passed.
  • npx --yes skills add . --list -> Found 5 skills.
  • Temp Codex install wrote all five SKILL.md files under .agents/skills.

v0.2.0

21 Apr 21:02
7261609

DSPy 3.2.x refresh for the skill pack. This release candidate moves the skills, references, manifests, and regression guards from DSPy 3.1.x assumptions to the real DSPy 3.2.0 surface, while adding a concrete example for the biggest new optimizer-facing capability.

Highlights

  • Retargeted the repo from DSPy 3.1.x / 3.1.3 to DSPy 3.2.x / 3.2.0 across README, skill docs, manifests, and maintainer guidance.
  • Added skills/dspy-gepa-optimizer/example_bettertogether.py, a dry-run-capable example of DSPy 3.2.0's generalized dspy.BetterTogether(metric=..., bootstrap=..., gepa=...) API.
  • Updated dspy-fundamentals to document 3.2.x type-mismatch warnings, warn_on_type_mismatch=False, and the new dspy.BaseLM capability/ContextWindowExceededError guidance for custom backends.
  • Updated dspy-rlm-module for DSPy 3.2.0's max_output_chars=10_000 default and kwargs-only tool dispatch.
  • Updated dspy-gepa-optimizer to explain the new BetterTogether chaining model while keeping plain GEPA as the default recommendation.
  • Added a regression guard against stale BetterTogether constructor guidance and flipped the RLM default guard to the 3.2.0 value.
  • Refreshed examples/01-rag-qa and examples/02-math-reasoning with clean DSPy 3.2.0 live reruns, and added per-example version_comparison.{md,json} files to make the old-vs-new story explicit.
  • Kept examples/03-invoice-extraction on its historical DSPy 3.1.3 artifact, with the 3.2.0 probe sweep documented instead of forcing a misleading saturated or unstable rerun.
  • Validated the install path end to end, including scripts/install.sh --dry-run, a temp-HOME install, and new guidance for UV_EXCLUDE_NEWER when uv hides DSPy 3.2.0.

Validation

  • uv run --with pytest python -m pytest tests/ -v → full suite passed
  • All 6 skill examples executed via --dry-run under DSPy 3.2.0
  • All 3 end-to-end examples executed via --dry-run under DSPy 3.2.0
  • Live reruns under DSPy 3.2.0:
    • examples/01-rag-qa → 75.77 -> 100.00 with openrouter/mistralai/ministral-3b-2512
    • examples/02-math-reasoning → 85.00 -> 93.33 with openrouter/mistralai/ministral-3b-2512
    • examples/03-invoice-extraction → probe sweep recorded saturation or instability; historical artifact retained
  • scripts/install.sh --dry-run and a temp-HOME install both matched the documented dual-target install flow
  • During release prep, local uv run --with dspy still resolved DSPy 3.1.3 on this machine, so the 3.2.0 smoke tests were run in an isolated environment installed from the official 3.2.0 wheel.

0.1.0

20 Apr 01:09

First public release of dspy-agent-skills — a spec-compliant pack of 5 agent skills + 3 validated end-to-end examples that teach Claude Code and Codex CLI to build, optimize, and ship DSPy 3.1.x programs.

Skills

| Skill | Purpose |
| --- | --- |
| dspy-fundamentals | Signatures, Modules, Predict/ChainOfThought/ReAct, save/load |
| dspy-evaluation-harness | Rich-feedback metrics + dspy.Evaluate |
| dspy-gepa-optimizer | dspy.GEPA reflective optimization |
| dspy-rlm-module | dspy.RLM long-context / REPL reasoning |
| dspy-advanced-workflow | End-to-end pipeline orchestration |

Validated end-to-end examples

All three run on free OpenRouter models — $0 reproduction:

| Example | Task LM | Baseline | Optimized | Δ |
| --- | --- | --- | --- | --- |
| 01-rag-qa | GLM 4.5 Air (32B) | 81.15 | 100.00 | +18.85 |
| 02-math-reasoning | Liquid LFM 2.5 (1.2B) | 45.00 | 70.00 | +25.00 |
| 03-invoice-extraction | Liquid LFM 2.5 (1.2B) | 0.833 | 0.931 | +0.098 |

Install

/plugin marketplace add intertwine/dspy-agent-skills
/plugin install dspy-agent-skills@dspy-agent-skills

Or for Claude Code + Codex CLI together:

git clone https://github.com/intertwine/dspy-agent-skills
cd dspy-agent-skills
./scripts/install.sh

What's covered by tests

60 validators across: SKILL.md frontmatter spec, JSON manifest schemas, Python AST on every example, and regression guards that prevent subtle teaching-material drift (e.g. dict-returning metrics, stale DSPy attribute names).
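To give a feel for the Python AST validators and the dict-metric guard mentioned above, here is a small sketch built on the standard ast module. Both helper names are hypothetical, and the real test suite may apply different or stricter rules.

```python
import ast


def syntax_error_or_none(source, filename="<example>"):
    """Return a short error string if the source does not parse, else None."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


def dict_return_lines(source):
    """Return line numbers of 'return {...}' statements that use a dict literal."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Return) and isinstance(node.value, ast.Dict)
    )
```

Parsing instead of importing keeps the check fast and avoids executing example code or requiring its dependencies.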

Compatibility

  • DSPy 3.1.x (tested against 3.1.3)
  • Claude Code skill spec as of 2026-04-17
  • Codex CLI Agent Skills format
  • Python 3.10+