[experiments] Daily Experiment Report — 2026-06-12 #38814
This discussion has been marked as outdated by daily-experiment-report. A newer discussion is available at Discussion #39042.
🧪 Daily Experiment Report — 2026-06-12
23 experiments analysed across 21 workflows (5 ABANDON · 3 need outcome data · 15 EXTEND). No experiments reached statistical significance (p < 0.05). 2 workflows have critical failures and require immediate attention.
⚡ Quick Stats
🚨 Critical Alerts
daily-issues-report/output_format — 0% success rate across all variants (all 30 measured runs failed — workflow appears broken)
daily-compiler-quality/output_format — detailed sr=77.8% < run_success_rate threshold >=0.85
daily-news/prompt_style — 0% success rate across all variants (all 30 measured runs failed — workflow appears broken)

❌ ABANDON Recommendations
❌ smoke-copilot/caveman (variants: no, yes)
❌ smoke-copilot/subagent_model (variants: large, small)
🚨 daily-issues-report/output_format
🚨 daily-compiler-quality/output_format
🚨 daily-news/prompt_style

📊 smoke-copilot Factorial Interaction (caveman × subagent_model)
Chi-square independence test p = 0.157 — no significant interaction detected.
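As a rough illustration of this kind of check, the sketch below runs a Pearson chi-square test of independence on a 2×2 table and lists cells with fewer than 20 runs, the cutoff this report uses for SPARSE_CELL_RISK. The function names and the example counts are hypothetical, and the report does not say what each cell of its table actually counts, so this is not the workflow's own implementation.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    No continuity correction is applied. Returns (chi2, p_value) with 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def sparse_cells(table, min_runs=20):
    """Return (row, col) indices of cells with fewer than min_runs observations."""
    return [
        (i, j)
        for i, row in enumerate(table)
        for j, count in enumerate(row)
        if count < min_runs
    ]


# Hypothetical counts, NOT taken from the report:
# rows = caveman (yes, no), columns = subagent_model (small, large).
example = [[12, 8], [8, 12]]
chi2, p = chi_square_2x2(example)
print(f"chi2={chi2:.3f} p={p:.4f} sparse={sparse_cells(example)}")
```

With small cells like the ones in this made-up table, the chi-square approximation itself is unreliable, which is one reason to treat a sparse-cell flag as a blocker rather than a footnote.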
Note, however, that SPARSE_CELL_RISK is flagged (cells < 20 runs): do not promote based on main effects alone.

🔬 Ready for Analysis (Needs Outcome Data)
These experiments have enough variant runs but require more outcome metrics to compute significance.
🔬 smoke-gemini/sub_agent_strategy
🔬 smoke-antigravity/sub_agent_strategy
🔬 smoke-pi/sub_agent_decomposition (variants: single_agent, parallel_sub_agents)
🟡 Collecting Data (EXTEND)
Notable experiments near significance:
daily-community-attribution/prompt_style — concise 94.7% (n=19) vs verbose 84.2% (n=19), p = 0.2904
daily-security-red-team/reasoning_depth — iterative 100.0% (n=17) vs single_pass 78.6% (n=14), p = 0.0510
daily-code-metrics/output_format — executive_summary 84.6% (n=13) vs full_detail 84.6% (n=13), p = 1.0000

All 15 EXTEND experiments:
ci-coach #32335 · prompt_style · concise vs detailed
agent-performance-analyzer #33280 · prompt_compression · caveman vs verbose
agent-persona-explorer · sub_agent_strategy · batch vs per_scenario
aw-failure-investigator #36105 · tone_variant · clinical vs narrative vs assertive
daily-astrostylelite-markdown-spellcheck · prompt_style · detailed vs concise
daily-community-attribution · prompt_style · concise vs verbose
daily-security-red-team #31673 · reasoning_depth · iterative vs single_pass
daily-code-metrics #1 · output_format · executive_summary vs full_detail
typist #34032 · tone_style · conversational vs formal
deep-report · output_format · executive_brief vs full_briefing vs annotated_brief
daily-semgrep-scan #32795 · semgrep_output_format · structured_sections vs prose vs bullet_list
gpclean · tool_verbosity · full_bash vs minimal_toolset
smoke-copilot-aoai-apikey · caveman · yes vs no
smoke-copilot-aoai-apikey · subagent_model · small vs large
daily-fact #31324 · reasoning_depth · single_pass vs multi_candidate

References: §27406359516
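The p-values quoted for the notable EXTEND experiments each compare two success rates. The report does not state which test it uses, so the following is only a sketch of one common choice, a pooled two-proportion z-test. The function name is hypothetical, and the success counts (18/19, 16/19, 11/13) are inferred from the reported percentages and sample sizes rather than taken from raw data.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided pooled z-test for the difference between two success rates.

    Returns (z, p_value). This is an assumed test, not necessarily the report's own.
    """
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # Both variants are all-success or all-failure: no evidence of a difference.
        return 0.0, 1.0
    z = (rate_a - rate_b) / std_err
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# daily-community-attribution/prompt_style: concise 18/19 vs verbose 16/19 (inferred counts)
z, p = two_proportion_z_test(18, 19, 16, 19)
print(f"z={z:.3f} p={p:.4f}")

# daily-code-metrics/output_format: identical rates, 11/13 vs 11/13 (inferred counts)
z_same, p_same = two_proportion_z_test(11, 13, 11, 13)
print(f"z={z_same:.3f} p={p_same:.4f}")
```

For the first pair this gives a p-value of roughly 0.29, in line with the reported figure. Because the report's test is unstated, other rows may not reproduce exactly under this sketch, and with samples this small an exact test may be a better fit.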
Warning
Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
proxy.golang.org

See Network Configuration for more information.