v0.5.2 — Behavioral Eval Suites & GAQL Rule Fix
What's New
Behavioral Eval Suites (28 prompts)
Added prompt-and-expectation test suites in tests/evals/ that validate AI behavior across all four workflow domains:
- read.json (8 evals) — Performance review, GDPR awareness, PMax analysis, GAQL queries, conversion diagnosis, landing pages, recommendations
- write.json (8 evals) — Campaign creation, RSA drafting, keyword safety (Broad Match + Manual CPC), negative keywords, budget caps, pause/remove safety, two-step write pattern
- tracking.json (6 evals) — Tracking verification, consent mode, event diagnosis, code generation, click-to-session ratio interpretation
- planning.json (6 evals) — Budget forecasting, keyword discovery, match type guidance, competition analysis, campaign planning workflow
These suites serve as regression checks when orchestration rules change and as documentation of the expected behavior for each workflow. They were inspired by community PR #10: the eval definitions are extracted and consolidated here, without the rule duplication that PR introduced.
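As a rough illustration of how the suites above could be used as a regression check, here is a minimal sketch of a structural validator. The per-suite counts come from the list above; everything else is an assumption: the key names `prompt` and `expectations`, the top-level JSON array layout, and the helper names `validate_suite` and `validate_directory` are hypothetical and may not match the actual files in tests/evals/.

```python
import json
from pathlib import Path

# Eval counts per suite, as listed in these release notes (28 in total).
EXPECTED_COUNTS = {
    "read.json": 8,
    "write.json": 8,
    "tracking.json": 6,
    "planning.json": 6,
}


def validate_suite(name, evals):
    """Return a list of problems found in one parsed suite.

    Assumes each eval is a dict with "prompt" and "expectations" keys.
    These key names are hypothetical, not confirmed by the repository.
    """
    problems = []
    expected = EXPECTED_COUNTS.get(name)
    if expected is not None and len(evals) != expected:
        problems.append(f"{name}: expected {expected} evals, found {len(evals)}")
    for i, item in enumerate(evals):
        if not str(item.get("prompt", "")).strip():
            problems.append(f"{name}[{i}]: missing prompt")
        if not item.get("expectations"):
            problems.append(f"{name}[{i}]: missing expectations")
    return problems


def validate_directory(evals_dir="tests/evals"):
    """Load each known suite file (assumed to be a JSON array) and collect problems."""
    problems = []
    for name in EXPECTED_COUNTS:
        path = Path(evals_dir) / name
        evals = json.loads(path.read_text(encoding="utf-8"))
        problems.extend(validate_suite(name, evals))
    return problems
```

A check like this only guards the shape and size of the suites; judging whether a model response actually meets each expectation would need a separate runner.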
GAQL Rule Fix
Fixed an incorrect rule in the orchestration docs that stated "metrics fields cannot appear in WHERE clauses." GAQL does allow metrics in WHERE (e.g., WHERE metrics.clicks > 5). The real constraint is field compatibility — certain resource attribute fields cannot be selected alongside specific metrics or segments in the same query. The corrected rule now explains this accurately.
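To make the corrected rule concrete, the sketch below assembles a query that filters on a metric in WHERE. `build_gaql` is a hypothetical helper written for this example and is not part of the project or of any Google Ads client library. It only joins strings and does not check field compatibility, which depends on the resource being queried.

```python
def build_gaql(resource, select_fields, where=()):
    """Assemble a GAQL query string from its parts (illustrative helper only)."""
    query = f"SELECT {', '.join(select_fields)} FROM {resource}"
    if where:
        # Conditions are combined with AND; metrics are allowed here.
        query += " WHERE " + " AND ".join(where)
    return query


query = build_gaql(
    "campaign",
    ["campaign.id", "campaign.name", "metrics.clicks"],
    where=["metrics.clicks > 5", "segments.date DURING LAST_30_DAYS"],
)
print(query)
```

The resulting query filters campaigns by click count over the last 30 days, which the old rule would have wrongly rejected.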
Credits
Eval suite structure inspired by @antongulin's PR #10. The behavioral test definitions were valuable; they've been extracted into a standalone tests/evals/ directory without the rule duplication that the original PR introduced.