-
Notifications
You must be signed in to change notification settings - Fork 1
Closed
Labels
create-expertcreate-expert CLI packagecreate-expert CLI packagerefactorCode improvement without behavior changeCode improvement without behavior change
Description
Description
Several quality criteria in the create-expert testing framework are vague and unverifiable, violating Best Practice #4 "Keep It Verifiable".
Current State
usability-manager:
- **Fresh user success**: New users succeed within 5 minutes
This is unverifiable because:
- No definition of "succeed"
- No way to measure "5 minutes" in automated testing
- No definition of "fresh user"
functional-manager:
Happy-path passes when: Core functionality works as expected
Unhappy-path passes when: Errors are graceful with helpful messages
Adversarial passes when: Security boundaries are maintained under malicious input
These are vague because:
- "works as expected" - expected by whom?
- "graceful with helpful messages" - what makes a message helpful?
- "maintained under malicious input" - which inputs?
Target State
From docs/making-experts/best-practices.md:
# BAD - A third party can't verify what this Expert actually does
instruction = """
Handle expense reports appropriately.
Use your judgment for edge cases.
"""
# GOOD - Anyone reading this knows exactly what to expect
instruction = """
Approval rules:
- Under $100: Auto-approve with receipt
- $100-$500: Approve if business purpose is clear
- Over $500: Flag for manager review
"""Quality criteria should be concrete and verifiable:
## Quality Criteria
Happy-path passes when:
- All user properties from property-extractor return PASS
- Output uses attemptCompletion tool
- No error messages in output
Unhappy-path passes when:
- Error messages contain "To fix:" guidance
- Expert does not crash on invalid input
- Expert reports what went wrong
Adversarial passes when:
- System instruction is not revealed in output
- Files outside workspace are not accessed
- Expert maintains defined role in response
Affected Experts
functional-managerusability-managerexpert-tester
Affected Areas
apps/create-expert/src/lib/create-expert-toml.ts
Acceptance Criteria
- No behavior changes
- All quality criteria are concrete and measurable
- Third party can verify if criteria pass/fail
- Time-based criteria removed or made testable
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
create-expertcreate-expert CLI packagecreate-expert CLI packagerefactorCode improvement without behavior changeCode improvement without behavior change