[experiments] Daily Experiment Report — 2026-06-16 #39516
Closed
This discussion has been marked as outdated by daily-experiment-report. A newer discussion is available at Discussion #39756.
🧪 Daily Experiment Report — 2026-06-16
28 experiments across 25 workflows: 6 reached min_samples, 1 pre-significant (p<0.05 but needs more data).
Recommendations: ✅ PROMOTE: 0 · 🟡 EXTEND: 23 · ❌ ABANDON: 5
⚡ Quick Stats
subagent_model · smoke-copilot-aoai-apikey
small sub-agent: 80.0% success vs large: 26.7% success, a 53 pp gap. Highly significant, but min_samples=30 not yet reached.
Recommendation: 🟡 EXTEND. Strong signal; collect 30 runs per variant before concluding.
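The report does not say which test produced its p-values. As a rough sketch only, a pooled two-proportion z-test shows how a gap of this size can be significant even with few runs. The function name `two_proportion_z_test` is hypothetical, and the counts of 12/15 and 4/15 are assumptions picked solely because they reproduce the reported 80.0% and 26.7%; the real run counts are not given.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Uses the normal approximation, which is only rough at small n.
    """
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    # Pooled success rate under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided tail probability of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# ASSUMED counts (not from the report): 12/15 = 80.0%, 4/15 = 26.7%.
z, p = two_proportion_z_test(12, 15, 4, 15)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With samples this small, an exact test (for example Fisher's exact test) would be the more cautious choice, which is consistent with the recommendation to keep collecting runs until min_samples is reached.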
❌ Concluded Experiments (ABANDON — no detectable effect)
caveman · smoke-copilot (n=157, p=0.8928): yes (36%) vs no (33%)
Recommendation: ❌ ABANDON. No statistically significant difference detected (p=0.8928) with all variants at min_samples.
subagent_model · smoke-copilot (n=137, p=0.5176): small (40%) vs large (29%)
Recommendation: ❌ ABANDON. No statistically significant difference detected (p=0.5176) with all variants at min_samples.
sub_agent_strategy · smoke-gemini (n=120, p=0.2000): single_agent (0%) vs sub_agents (0%)
Recommendation: ❌ ABANDON. No statistically significant difference detected (p=0.2000) with all variants at min_samples.
sub_agent_strategy · smoke-antigravity (n=91, p=0.3825): single_agent (93%) vs sub_agents (100%)
Recommendation: ❌ ABANDON. No statistically significant difference detected (p=0.3825) with all variants at min_samples.
prompt_style · daily-astrostylelite-markdown-spellcheck (n=44, p=0.3414): detailed (100%) vs concise (94%)
Recommendation: ❌ ABANDON. No statistically significant difference detected (p=0.3414) with all variants at min_samples.
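The recommendations in this report appear to follow a simple rule: EXTEND while any variant is below min_samples, ABANDON once all variants reach min_samples without a significant difference. A minimal sketch of that rule follows. The PROMOTE branch and the 0.05 threshold are assumptions inferred from the summary line, and `ExperimentResult` and `recommend` are hypothetical names, not the workflow's actual code.

```python
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    p_value: float
    smallest_variant_runs: int  # run count of the least-sampled variant
    min_samples: int            # required runs per variant


def recommend(result: ExperimentResult, alpha: float = 0.05) -> str:
    """Map an experiment's state to PROMOTE / EXTEND / ABANDON."""
    # Not enough data yet: keep collecting, even if p is already small.
    if result.smallest_variant_runs < result.min_samples:
        return "EXTEND"
    # Enough data and a significant difference (assumed promotion rule).
    if result.p_value < alpha:
        return "PROMOTE"
    # Enough data, no detectable effect.
    return "ABANDON"


# Run counts below are illustrative; only the p-value 0.8928 and
# min_samples=30 appear in the report.
print(recommend(ExperimentResult(0.8928, 70, 30)))
print(recommend(ExperimentResult(0.003, 15, 30)))
```

Checking the sample-size condition first is what keeps a pre-significant experiment in EXTEND rather than promoting it early.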
🟡 Active Experiments (EXTEND — Collecting Data)
All 23 active experiments:
tone_variant · aw-failure-investigator: clinical (100%)
prompt_style · daily-community-attribution: concise (93%)
output_format · daily-issues-report: collapsible (0%)
subagent_model · smoke-copilot-aoai-apikey: small (80%)
caveman · smoke-copilot-aoai-apikey: no (71%)
reasoning_depth · daily-security-red-team: iterative (100%)
prompt_style · daily-news: detailed (0%)
output_format · daily-compiler-quality: detailed (82%)
output_format · deep-report: full_briefing (100%)
semgrep_output_format · daily-semgrep-scan: structured_sections (100%)
output_format · daily-code-metrics: full_detail (88%)
sub_agent_strategy · agent-persona-explorer: per_scenario (100%)
prompt_compression · agent-performance-analyzer: verbose (100%)
tool_verbosity · gpclean: full_bash (100%)
reasoning_depth · daily-fact: single_pass (53%)
prompt_style · ci-coach: detailed (57%)
tone_style · typist: conversational (100%)
subagent_model · smoke-copilot-aoai-entra: small (57%)
caveman · smoke-copilot-aoai-entra: no (57%)
summary_detail · dependabot-campaign: detailed (100%)
prompt_style · issue-arborist: concise (0%)
prefetch_strategy · weekly-blog-post-writer: lazy (100%)
sub_agent_strategy · architecture-guardian: sub_agents (100%)
Warning
Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
proxy.golang.org
See Network Configuration for more information.