[experiments] Daily Experiment Report — 2026-06-27 #41869
Closed
Replies: 1 comment
This discussion has been marked as outdated by daily-experiment-report. A newer discussion is available at Discussion #42038.
🧪 Daily Experiment Report — 2026-06-27
4 experiments analysed across 4 workflows. 0 reached statistical significance (p < 0.05). Recommendations: ❌ ABANDON: 1 · 🟡 EXTEND: 3 · ✅ PROMOTE: 0.
⚡ Quick Stats
prompt_style · ci-coach · 🔴 GUARDRAIL FAILED
Variants: detailed (ctrl), concise
Guardrails: run_success_rate >= 0.85 → ❌ FAIL (detailed=0.625, concise=0.714)
❌ ABANDON: run_success_rate guardrail fails for both variants. Consider resetting the experiment to capture a cleaner window post-May.

model_size · daily-doc-healer · 🟡 COLLECTING
Variants: claude-sonnet-4.6 (ctrl), claude-haiku-4.5
Guardrails: run_success_rate >= 0.90 → ✅ PASS (both variants)
🟡 EXTEND: Both variants below min_samples (sonnet 8/20, haiku 12/20). Early signal: Haiku is 15% faster (724s vs 849s) with comparable success rates.

output_format · daily-code-metrics · 🟡 COLLECTING (1 run away!)
Variants: full_detail (ctrl), executive_summary
Guardrails: report_empty_rate, quality_score_present → N/A
🟡 EXTEND: ⚠️ Success rate gap (72.7% vs 94.4%) is notable; watch for ABANDON once min_samples is reached. executive_summary needs 1 more run (19/20).

reasoning_depth · daily-security-red-team · 🟡 COLLECTING
Variants: single_pass (ctrl), iterative
Guardrails: run_success_rate >= 0.90 → ✅ PASS (both variants)
🟡 EXTEND: single_pass at 20/30, iterative at 26/30. iterative shows perfect success (100% vs 91.7%) and is 9% slower (+38s), consistent with its re-evaluation design.

📊 Summary
prompt_style (ci-coach): detailed 62.5% · concise 71.4%
model_size (daily-doc-healer): sonnet-4.6 100% · haiku-4.5 91.7%
output_format (daily-code-metrics): full_detail 94.4% · exec_summary 72.7%
reasoning_depth (daily-security-red-team): single_pass 91.7% · iterative 100.0%

Warning
Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
releaseassets.githubusercontent.com
See Network Configuration for more information.
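
The ABANDON / EXTEND / PROMOTE labels in the Quick Stats appear to follow a simple ordering: guardrails first, then sample counts, then significance. The Python sketch below is one plausible reading of that ordering, not the report generator's actual code. The `Variant` class, the `recommend` function, and the rule that a single failing variant is enough to abandon are all assumptions made for illustration; only the thresholds and figures passed in the usage lines come from the report.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Variant:
    """Hypothetical container for one experiment arm."""
    name: str
    samples: int          # runs collected so far
    success_rate: float   # fraction in [0, 1]


def recommend(
    variants: Sequence[Variant],
    min_samples: int,
    guardrail_min: Optional[float] = None,
    p_value: Optional[float] = None,
    alpha: float = 0.05,
) -> str:
    """Return 'ABANDON', 'EXTEND', or 'PROMOTE' (assumed decision order)."""
    # 1. Guardrail check (assumption: any variant below the floor fails it).
    if guardrail_min is not None and any(
        v.success_rate < guardrail_min for v in variants
    ):
        return "ABANDON"
    # 2. Keep collecting until every variant reaches min_samples.
    if any(v.samples < min_samples for v in variants):
        return "EXTEND"
    # 3. Promote only on a significant result; otherwise keep collecting.
    if p_value is not None and p_value < alpha:
        return "PROMOTE"
    return "EXTEND"


# Figures taken from the model_size and reasoning_depth entries above.
model_size = [
    Variant("claude-sonnet-4.6", samples=8, success_rate=1.0),
    Variant("claude-haiku-4.5", samples=12, success_rate=0.917),
]
reasoning_depth = [
    Variant("single_pass", samples=20, success_rate=0.917),
    Variant("iterative", samples=26, success_rate=1.0),
]
print(recommend(model_size, min_samples=20, guardrail_min=0.90))
print(recommend(reasoning_depth, min_samples=30, guardrail_min=0.90))
```

Under these assumptions both calls agree with the report's EXTEND recommendation, since each experiment passes its guardrail but has at least one variant below min_samples.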