
Evidence 0009-002: Local model quality and actor-policy benchmarks #3

@gidich

Description

Evidence gate: evidence-leadership-0009-002

Current benchmark status: not ready.

Top blockers from .agent-factory/benchmark-gate-report.json:

  • local_model_quality:actor_policy:real_local_model_visible_fact_grounding_probe_failed
  • local_model_quality:target_hardware:target_hardware_not_m4_profile
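Both blocker identifiers appear to follow a `gate:dimension:reason` pattern. The sketch below splits them into those three parts, assuming the pattern holds for every entry. The report's actual schema is not shown in this issue, and `Blocker` and `parse_blocker` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Blocker(NamedTuple):
    gate: str       # e.g. "local_model_quality"
    dimension: str  # e.g. "actor_policy" or "target_hardware"
    reason: str     # e.g. "target_hardware_not_m4_profile"


def parse_blocker(blocker_id: str) -> Blocker:
    """Split an assumed 'gate:dimension:reason' identifier into its parts."""
    # maxsplit=2 keeps any further colons inside the reason component.
    parts = blocker_id.split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected blocker id: {blocker_id!r}")
    return Blocker(*parts)


blockers = [
    parse_blocker("local_model_quality:actor_policy:real_local_model_visible_fact_grounding_probe_failed"),
    parse_blocker("local_model_quality:target_hardware:target_hardware_not_m4_profile"),
]
for b in blockers:
    print(f"{b.dimension}: {b.reason}")
```

Grouping by `dimension` separates the two open problems: a failed actor-policy grounding probe and benchmarks that did not run on an M4-profile machine.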

Acceptance posture:

  • Run structured-output, hidden-truth, actor-policy, and target-hardware (M4 Pro or M4 Max) local model benchmarks.
  • Keep hidden-truth leakage and actor-policy probes psychometrically explicit.
  • Do not enable local dialogue in station runtime until the model quality gate clears.
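The last acceptance rule could be enforced as a runtime check. This is a hedged sketch. The JSON shape (`status`, `blockers`), the `"ready"` status value, and the function names are assumptions and not the documented format of `.agent-factory/benchmark-gate-report.json`.

```python
import json

# Hypothetical report contents mirroring this issue; the real schema may differ.
EXAMPLE_REPORT = """
{
  "status": "not_ready",
  "blockers": [
    "local_model_quality:actor_policy:real_local_model_visible_fact_grounding_probe_failed",
    "local_model_quality:target_hardware:target_hardware_not_m4_profile"
  ]
}
"""

MODEL_QUALITY_GATE = "local_model_quality"


def model_quality_gate_clear(report: dict) -> bool:
    """True only when no remaining blocker belongs to the model quality gate."""
    return not any(
        blocker.split(":", 1)[0] == MODEL_QUALITY_GATE
        for blocker in report.get("blockers", [])
    )


def local_dialogue_enabled(report: dict) -> bool:
    """Keep local dialogue off in station runtime until the gate clears."""
    return report.get("status") == "ready" and model_quality_gate_clear(report)


report = json.loads(EXAMPLE_REPORT)
print(local_dialogue_enabled(report))  # prints False
```

Requiring both a `"ready"` status and an empty model-quality blocker list fails closed. A report that omits either field leaves local dialogue disabled.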

Metadata

Assignees

No one assigned

Labels

  • evidence-gate: Evidence gate from OpenClinXR benchmark reports
  • iteration-0009: Iteration 0009 follow-up
  • model-evidence: Local model and actor policy evidence work

Projects

Status: Todo

Milestone

No milestone

Development

No branches or pull requests