
feat(skills): add /aidd-riteway-ai #189

Merged
janhesters merged 4 commits into main from cursor/aidd-riteway-ai-skill-9ba2
Apr 15, 2026

ericelliott (Collaborator) commented Apr 10, 2026

Split from PR #168. One skill per PR per project standards.

What

Adds the /aidd-riteway-ai skill — an AI prompt evaluation skill using RITEway methodology that teaches agents how to write correct riteway ai prompt evals (.sudo files) for multi-step tool-calling flows.

Files added

  • ai/commands/aidd-riteway-ai.md — command entry point
  • ai/skills/aidd-riteway-ai/SKILL.md — full skill with 7 rules + process section
  • ai/skills/aidd-riteway-ai/README.md — what/why/commands reference
  • ai/skills/aidd-riteway-ai/riteway-ai.test.js — 12 unit tests verifying skill structure and content
  • tasks/aidd-riteway-ai-skill-epic.md — task epic with requirements
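The description summarizes riteway-ai.test.js only as "12 unit tests verifying skill structure and content." As a rough illustration of what one structural check might look like, here is a minimal sketch, not the PR's actual test. It assumes rules are `## Rule N` headings (the form used in the SKILL.md excerpts quoted in the review below) and that the process section uses a `## Process` heading, which is a guess; `checkSkillStructure` is a hypothetical helper.

```js
// Hypothetical structural check; not the actual riteway-ai.test.js.
// Assumes "## Rule N" headings (as in the review excerpts) and a
// "## Process" heading for the process section (an assumption).
const checkSkillStructure = (markdown, expectedRuleCount = 7) => {
  const lines = markdown.split("\n");

  // Collect rule numbers from headings such as "## Rule 3".
  const ruleNumbers = lines
    .map((line) => line.match(/^## Rule (\d+)\b/))
    .filter(Boolean)
    .map((match) => Number(match[1]));

  // Rules should be numbered 1..N with no gaps or duplicates.
  const isSequential = ruleNumbers.every((n, i) => n === i + 1);

  const hasProcessSection = lines.some((line) => /^## Process\b/.test(line));

  return {
    ruleCount: ruleNumbers.length,
    isSequential,
    hasProcessSection,
    ok:
      ruleNumbers.length === expectedRuleCount &&
      isSequential &&
      hasProcessSection,
  };
};

// Usage with a tiny inline document; a real test would read SKILL.md from disk.
const sampleSkill = [
  "# aidd-riteway-ai",
  ...Array.from({ length: 7 }, (_, i) => `## Rule ${i + 1} example`),
  "## Process",
].join("\n");

const structure = checkSkillStructure(sampleSkill);
```

Checking numbering as well as count catches a rule that was deleted or renumbered without updating the rest of the file.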

Files modified

  • ai/skills/aidd-please/SKILL.md — added /aidd-riteway-ai to Commands block for agent discovery

Review fixes applied

  • Fixed broken /aidd-requirements references → /aidd-functional-requirements (SKILL.md, test file, epic)
  • Fixed /aidd-pr example reference → generic "your skill under test" (SKILL.md)
  • Standardized E2e → E2E casing to match repo conventions (SKILL.md heading, body, checklist)
  • All 5 review conversations resolved

Verification

  • All 535 unit tests pass (51 test files)
  • All review threads resolved
  • No broken references, TODOs, or placeholder text remain
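The "no broken references" check, and the rename dependency on #190 raised later in review, both reduce to verifying that every `/aidd-*` command a document mentions actually exists. A minimal sketch, assuming commands appear as `/aidd-<kebab-case>` tokens and that the caller supplies the list of registered commands (deriving it from ai/commands/index.md is plausible but not shown). `findUnknownCommands` and the `registeredCommands` list are hypothetical.

```js
// Hypothetical helper: report /aidd-* references that are not registered.
// The lookbehind skips path segments such as "ai/skills/aidd-riteway-ai/".
const findUnknownCommands = (text, knownCommands) => {
  const known = new Set(knownCommands);
  const referenced =
    text.match(/(?<![\w\/])\/aidd-[a-z0-9]+(?:-[a-z0-9]+)*/g) ?? [];

  // Deduplicate while keeping first-seen order, then keep unknown names.
  return [...new Set(referenced)].filter((name) => !known.has(name));
};

// Illustrative registry using the post-#190 name mentioned in review.
const registeredCommands = ["/aidd-please", "/aidd-riteway-ai", "/aidd-requirements"];
const doc = "Run /aidd-functional-requirements first, then /aidd-riteway-ai.";
const stale = findUnknownCommands(doc, registeredCommands);
```

Run against both the old and new registries, a check like this would flag exactly the references that depend on which of the two PRs merges first.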

Copilot AI review requested due to automatic review settings April 10, 2026 09:36

cursor bot (Contributor) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 8296313.

Comment thread ai/skills/aidd-riteway-ai/SKILL.md Outdated

Copilot AI (Contributor) left a comment

Pull request overview

Adds a new /aidd-riteway-ai skill to the AIDD skills catalog to guide authorship of riteway ai .sudo prompt evals for multi-step, tool-calling agent flows, along with command wiring, discovery, and unit tests to enforce the skill’s contract.

Changes:

  • Added the aidd-riteway-ai skill documentation + checklist for authoring .sudo evals for tool-calling flows.
  • Added the /aidd-riteway-ai command entrypoint and updated discovery/indexes so agents can find it.
  • Added Vitest + riteway contract tests to validate required sections/content and integrations.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tasks/aidd-riteway-ai-skill-epic.md | New epic capturing requirements and scope for the skill/command/discovery updates. |
| ai/skills/index.md | Adds aidd-riteway-ai to the skills index for discovery. |
| ai/skills/aidd-riteway-ai/SKILL.md | New skill content: rules + process + checklist for .sudo prompt eval authoring. |
| ai/skills/aidd-riteway-ai/README.md | Skill README with usage and rule summary. |
| ai/skills/aidd-riteway-ai/riteway-ai.test.js | Contract tests asserting presence/structure/integration of the new skill + command + discovery. |
| ai/skills/aidd-please/SKILL.md | Adds /aidd-riteway-ai to the global Commands block for agent discovery. |
| ai/commands/index.md | Adds /aidd-riteway-ai to the commands index. |
| ai/commands/aidd-riteway-ai.md | New command entrypoint that loads the skill and references /aidd-please constraints. |
ai/commands/aidd-riteway-ai.md New command entrypoint that loads the skill and references /aidd-please constraints.


Comment thread ai/skills/aidd-riteway-ai/SKILL.md Outdated
Comment thread ai/skills/aidd-riteway-ai/SKILL.md Outdated
Comment thread ai/skills/aidd-riteway-ai/SKILL.md Outdated
Comment thread tasks/aidd-riteway-ai-skill-epic.md

janhesters (Collaborator) left a comment

The ai-eval CI check is red (rate limit hit again, same as #191). Needs a re-run before merge.

SKILL.md:19 references /aidd-functional-requirements, which is being renamed to /aidd-requirements in #190. The contract test at line 62 also asserts the old name. These PRs have a merge-order dependency that needs resolving.

Comment on lines +88 to +108

```
mock gh pr view => returns:
title: My PR
branch: feature/foo
base: main

mock gh api (list review threads) => returns:
[{ id: "T_01", resolved: false, body: "..." }]
```

---

## Rule 3 — Step 1: assert tool calls, do not pre-supply answers

Given a unit eval for **step 1** of a tool-calling flow, assert that the agent
makes the correct tool calls. Do **not** pre-supply the answers those calls
would return — that defeats the purpose of the eval.

Correct pattern for step 1:

```
userPrompt = """
```

Collaborator

Should we reconsider this rule (Rule 2) before codifying it as best practice? Two concerns:

  1. Scheming/sandbagging: Research shows that agents behave differently when they know they're being evaluated. Telling the agent "you are in a test environment" is literally the trigger for altered behavior.

  2. False positive self-fulfillment: The agent under test sees both the mocks AND the assertions in the same .sudo file. It can pattern-match the expected output without actually exercising the skill's logic.

This likely needs a RITEway framework change rather than a prompt pattern — e.g., a dedicated mocks section that's injected by the harness but stripped before the agent under test sees it, and a separate judge pass that evaluates only the agent's output. Should we open a discussion/PR on the RITEway framework before shipping this rule?
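One way to picture the proposed framework change: the harness splits the eval into sections, gives the agent under test only the prompt (with mocks wired in as tool implementations rather than shown as text), and gives a separate judge pass only the agent's output and the assertions. A minimal sketch under those assumptions; the eval object shape and the names `runEval`, `runAgent`, and `judge` are hypothetical and not an existing RITEway API. It is synchronous for brevity.

```js
// Hypothetical harness sketch; these are not RITEway functions.
// Synchronous for brevity; real agent and judge calls would be async.
const runEval = ({ mocks, userPrompt, assertions }, { runAgent, judge }) => {
  // Mock responses become callable tools keyed by tool name, so the agent
  // can use them without ever reading them as text in its prompt.
  const tools = Object.fromEntries(
    Object.entries(mocks).map(([name, response]) => [name, () => response]),
  );

  // The agent under test receives only the prompt and the tools.
  const output = runAgent({ prompt: userPrompt, tools });

  // A separate judge pass sees the output and each assertion, never the mocks.
  const passed = assertions.every((requirement) => judge({ output, requirement }));

  return { output, passed };
};

// Usage with stand-in agent and judge functions.
const evalCase = {
  mocks: { "gh pr view": { title: "My PR", branch: "feature/foo", base: "main" } },
  userPrompt: "Summarize the open pull request.",
  assertions: ["mentions the PR title"],
};

let promptSeenByAgent = null;
const fakeAgent = ({ prompt, tools }) => {
  promptSeenByAgent = prompt; // record exactly what the agent was shown
  const pr = tools["gh pr view"]();
  return `Open PR: ${pr.title} (${pr.branch} -> ${pr.base})`;
};
const fakeJudge = ({ output }) => output.includes("My PR");

const evalResult = runEval(evalCase, { runAgent: fakeAgent, judge: fakeJudge });
```

The point of the split is visible in `promptSeenByAgent`: it contains neither the mock data nor the assertion text, so there is nothing to pattern-match against.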

Comment on lines +135 to +155
executable without running the prior steps live.

Example for step 2:

```
userPrompt = """
You have mock tools available. Use them instead of real calls.

Triage is complete. The following issues remain unresolved:

Issue 1 (thread ID: T_01):
File: src/utils.js, line 5
"add() subtracts instead of adding"

Generate delegation prompts for the remaining issues.
"""
```

---

## Rule 5 — E2E evals: use real tools, follow -e2e.test.sudo naming

Collaborator

Same concern as Rule 2 — hand-crafting previous step output in the userPrompt is visible to the agent under test and enables pattern-matching. Ideally the framework would handle this: run step 1, capture its actual output, then pipe it into step 2 automatically. That way each step is tested against real intermediate results, not hand-crafted stubs, and the agent under test doesn't see the test scaffolding. This might also need a RITEway framework change rather than being solvable with a prompt pattern.
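The suggestion to run step 1, capture its actual output, and pipe it into step 2 could look roughly like this. It assumes each step is described by a hypothetical `buildPrompt(previousOutput)` function and that `runAgent` is supplied by the harness; neither exists in RITEway today, and `runChain` is a placeholder name. Synchronous for brevity.

```js
// Hypothetical step chaining; not an existing RITEway feature.
// Synchronous for brevity; real agent calls would be async.
const runChain = (steps, runAgent) => {
  const outputs = [];
  let previousOutput = null; // the first step has no prior output

  for (const { name, buildPrompt } of steps) {
    // Build this step's prompt from the previous step's actual output.
    const prompt = buildPrompt(previousOutput);
    const output = runAgent(prompt);
    outputs.push({ name, prompt, output });
    previousOutput = output;
  }

  return outputs;
};

// Usage with a stand-in agent that answers each step deterministically.
const steps = [
  { name: "triage", buildPrompt: () => "Triage the open review threads." },
  {
    name: "delegate",
    buildPrompt: (triage) =>
      `Generate delegation prompts for these unresolved issues:\n${triage}`,
  },
];

const fakeAgentRun = (prompt) =>
  prompt.startsWith("Triage")
    ? "T_01: add() subtracts instead of adding"
    : "delegated T_01";

const chainOutputs = runChain(steps, fakeAgentRun);
```

Step 2's prompt is built from whatever step 1 actually produced, so no hand-written triage summary appears anywhere in the eval file.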

Comment on lines +157 to +169
Given an e2e eval, use real tools (no mock preamble) and follow the
`-e2e.test.sudo` naming convention to mirror the project's existing unit/e2e
split:

```
ai-evals/<skill-name>/step-1-<description>-e2e.test.sudo
```

E2E evals run against live APIs. Only run them when the environment is
configured with the necessary credentials.

---

Collaborator

Should we drop this rule? Agents use real tools by default — that's the natural behavior. And if they lack the required tools, they'll fail regardless. The only value here is the -e2e.test.sudo naming convention, which doesn't warrant a standalone rule.
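If the rule is reduced to a naming convention, the `-e2e.test.sudo` suffix is enough for a runner to tell the two kinds of eval apart and to skip live-API evals when credentials are missing. A minimal sketch, assuming unit evals end in `.test.sudo` (not stated in the excerpt) and that the caller reports whether credentials are configured; `classifyEval`, `selectEvals`, and the `my-skill` paths are hypothetical.

```js
// Hypothetical runner-side filter based on the -e2e.test.sudo convention.
// Assumes unit evals end in ".test.sudo" and E2E evals in "-e2e.test.sudo".
const classifyEval = (path) => {
  if (path.endsWith("-e2e.test.sudo")) return "e2e";
  if (path.endsWith(".test.sudo")) return "unit";
  return null; // not an eval file (e.g. a fixture)
};

// E2E evals hit live APIs, so run them only when credentials are configured.
const selectEvals = (paths, { hasCredentials }) =>
  paths.filter((path) => {
    const kind = classifyEval(path);
    return kind === "unit" || (kind === "e2e" && hasCredentials);
  });

const evalFiles = [
  "ai-evals/my-skill/step-1-triage.test.sudo",
  "ai-evals/my-skill/step-1-triage-e2e.test.sudo",
  "ai-evals/my-skill/fixtures/add.js",
];

const offline = selectEvals(evalFiles, { hasCredentials: false });
const online = selectEvals(evalFiles, { hasCredentials: true });
```

The E2E check has to come first, because every `-e2e.test.sudo` file also ends in `.test.sudo`.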

Comment on lines +171 to +187

Given fixture files needed by an eval, keep them small (< 20 lines) with
**one clear bug or condition per file**. Fixtures live in:

```
ai-evals/<skill-name>/fixtures/<filename>
```

Example fixture (`add.js`):

```js
export const add = (a, b) => a - b; // bug: subtracts instead of adds
```

Do not combine multiple bugs in one fixture file. Each fixture must make the
assertion conditions unambiguous.

Collaborator

The "< 20 lines" size constraint is overly prescriptive. For certain evals — e.g., code review skills — you want large, realistic files where the agent has to find the signal in the noise. That's the whole point of testing whether the agent can actually spot issues. "One condition per file" is a reasonable guideline, but should we drop the size limit?
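Independent of the size question, the "one condition per file" guideline can be checked mechanically for a small fixture like the `add.js` example in the excerpt above. A minimal sketch: the fixture is inlined rather than imported so the block runs on its own, and the probe inputs are illustrative assumptions, not part of the skill.

```js
// The add.js fixture from the excerpt, inlined so this block runs alone.
// The bug is intentional: it is the single condition the eval asserts on.
const add = (a, b) => a - b; // bug: subtracts instead of adds

// Illustrative probe inputs (an assumption, not from the skill).
const probes = [
  [2, 3],
  [0, 0],
  [5, -5],
];

// Compare the fixture against what a correct add() would return.
const mismatches = probes
  .map(([a, b]) => ({ input: [a, b], expected: a + b, actual: add(a, b) }))
  .filter(({ expected, actual }) => expected !== actual);

// The fixture is only useful if some input actually exposes the bug.
const exposesBug = mismatches.length > 0;
```

Note that `add(0, 0)` still returns the correct result, which is one reason an eval assertion should name the behavior ("subtracts instead of adding") rather than depend on a single input.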

cursoragent and others added 4 commits April 15, 2026 15:21
Co-authored-by: Eric Elliott <support@paralleldrive.com>
… /aidd-functional-requirements

- SKILL.md: fix 2 references to nonexistent /aidd-requirements
- SKILL.md: fix /aidd-pr example reference to generic form
- SKILL.md: standardize E2e -> E2E casing (3 places)
- riteway-ai.test.js: update test to validate correct skill name
- tasks/aidd-riteway-ai-skill-epic.md: fix 3 references to /aidd-requirements
Align with the rename in #190. Updates SKILL.md, contract tests, and
the epic file.
Copilot AI review requested due to automatic review settings April 15, 2026 13:22
janhesters force-pushed the cursor/aidd-riteway-ai-skill-9ba2 branch from 20ee0f8 to 2065626

Copilot AI (Contributor) left a comment

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

janhesters merged commit 08bce96 into main Apr 15, 2026
3 checks passed

4 participants