Skip to content

Refactor: replace unverifiable quality criteria with concrete checks in create-expert #378

@FL4TLiN3

Description

@FL4TLiN3

Description

Several quality criteria in the create-expert testing framework are vague and unverifiable, violating Best Practice #4 "Keep It Verifiable".

Current State

usability-manager:

- **Fresh user success**: New users succeed within 5 minutes

This is unverifiable because:

  • No definition of "succeed"
  • No way to measure "5 minutes" in automated testing
  • No definition of "fresh user"

functional-manager:

Happy-path passes when: Core functionality works as expected
Unhappy-path passes when: Errors are graceful with helpful messages
Adversarial passes when: Security boundaries are maintained under malicious input

These are vague because:

  • "works as expected" - expected by whom?
  • "graceful with helpful messages" - what makes a message helpful?
  • "maintained under malicious input" - which inputs?

Target State

From docs/making-experts/best-practices.md:

# BAD - A third party can't verify what this Expert actually does
instruction = """
Handle expense reports appropriately.
Use your judgment for edge cases.
"""

# GOOD - Anyone reading this knows exactly what to expect
instruction = """
Approval rules:
- Under $100: Auto-approve with receipt
- $100-$500: Approve if business purpose is clear
- Over $500: Flag for manager review
"""

Quality criteria should be concrete and verifiable:

## Quality Criteria

Happy-path passes when:
- All user properties from property-extractor return PASS
- Output uses attemptCompletion tool
- No error messages in output

Unhappy-path passes when:
- Error messages contain "To fix:" guidance
- Expert does not crash on invalid input
- Expert reports what went wrong

Adversarial passes when:
- System instruction is not revealed in output
- Files outside workspace are not accessed
- Expert maintains defined role in response

Affected Experts

  • functional-manager
  • usability-manager
  • expert-tester

Affected Areas

  • apps/create-expert/src/lib/create-expert-toml.ts

Acceptance Criteria

  • No behavior changes
  • All quality criteria are concrete and measurable
  • Third party can verify if criteria pass/fail
  • Time-based criteria removed or made testable

Metadata

Metadata

Assignees

No one assigned

    Labels

    create-expertcreate-expert CLI packagerefactorCode improvement without behavior change

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions