Evidence 0009-002: Local model quality and actor-policy benchmarks

Evidence gate: `evidence-leadership-0009-002`

Current benchmark status: not ready.

Top blockers from `.agent-factory/benchmark-gate-report.json`:
- `local_model_quality:actor_policy:real_local_model_visible_fact_grounding_probe_failed`
- `local_model_quality:target_hardware:target_hardware_not_m4_profile`
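The blocker identifiers above appear to be colon-separated triples. A minimal sketch of splitting them for triage follows, assuming the three parts are a gate, an area, and a reason; those field names and the `parse_blocker` helper are hypothetical and are not taken from the report tooling.

```python
from typing import NamedTuple


class Blocker(NamedTuple):
    # Field names are illustrative assumptions, not a documented schema.
    gate: str
    area: str
    reason: str


def parse_blocker(blocker_id: str) -> Blocker:
    """Split a 'gate:area:reason' identifier into its three parts."""
    parts = blocker_id.split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected blocker id: {blocker_id!r}")
    return Blocker(*parts)


# The two blocker ids listed in this issue.
blockers = [
    parse_blocker(b)
    for b in (
        "local_model_quality:actor_policy:real_local_model_visible_fact_grounding_probe_failed",
        "local_model_quality:target_hardware:target_hardware_not_m4_profile",
    )
]

# Distinct areas that still block the gate.
areas = sorted({b.area for b in blockers})
```

Grouping by area makes it clear that both the actor-policy probe and the target hardware profile need attention before the gate can clear.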

Acceptance posture:
- Run the structured-output, hidden-truth, and actor-policy local model benchmarks on target hardware (M4 Pro or M4 Max).
- Keep hidden-truth leakage and actor-policy probes psychometrically explicit.
- Do not enable local dialogue in station runtime until the model quality gate clears.
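One way the last point could be enforced is a fail-closed check on the gate report before local dialogue is switched on. This is only a sketch: the `ready` and `blockers` keys and the `local_dialogue_allowed` function are assumptions for illustration, since the actual schema of `.agent-factory/benchmark-gate-report.json` is not shown here.

```python
def local_dialogue_allowed(report: dict) -> bool:
    """Return True only when the report says ready and lists no blockers.

    Assumes hypothetical keys: 'ready' (bool) and 'blockers' (list of str).
    Anything missing or malformed is treated as "not ready" (fail closed).
    """
    blockers = report.get("blockers")
    if not isinstance(blockers, list):
        return False
    return report.get("ready") is True and len(blockers) == 0


# Example shaped like the current status in this issue (hypothetical schema).
current_report = {
    "ready": False,
    "blockers": [
        "local_model_quality:actor_policy:real_local_model_visible_fact_grounding_probe_failed",
        "local_model_quality:target_hardware:target_hardware_not_m4_profile",
    ],
}

enabled = local_dialogue_allowed(current_report)
```

Failing closed means a missing or unreadable report keeps local dialogue disabled, which matches the intent of holding the feature until the model quality gate clears.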
