37 experiments across 34 workflows. 9 ready (all variants ≥ min_samples), 28 still collecting. ⚠️ 1 balance alert.
Note: the gh aw experiments CLI could not be installed (network restriction). State was read from the experiments/* git branches. The GitHub API was unavailable, so outcome p-values were not computed.
⚡ Quick Stats
| Metric | Value |
| --- | --- |
| Active experiments | 37 |
| 🟢 Ready for analysis | 9 |
| 🟡 Still collecting | 28 |
| ⚠️ Balance alerts | 1 |
| Outcome significance | 0 (GitHub API unavailable) |
🟢 Ready for Analysis (9 experiments)
All variants ≥ min_samples. Outcome significance pending GitHub API access.
| Experiment | Workflow | Variants | n (ctrl/var) | Balance | Rec |
| --- | --- | --- | --- | --- | --- |
| caveman | smoke-copilot | no / yes | 97/96 | ✅ p=0.900 | READY |
| subagent_model | smoke-copilot | small / large | 86/87 | ✅ p=0.898 | READY |
| sub_agent_strategy | smoke-gemini | single_agent / sub_agents | 58/92 | ⚠️ p=0.006 | EXTEND (fix balance) |
| sub_agent_strategy | smoke-antigravity | single_agent / sub_agents | 58/63 | ✅ p=0.654 | READY |
| caveman | smoke-copilot-aoai-apikey | yes / no | 35/34 | ✅ p=0.871 | READY |
| subagent_model | smoke-copilot-aoai-apikey | small / large | 35/34 | ✅ p=0.871 | READY |
| prompt_style | daily-community-attribution | concise / verbose | 24/24 | ✅ p=1.000 | READY |
| caveman | smoke-copilot-aoai-entra | no / yes | 20/21 | ✅ p=0.850 | READY |
| subagent_model | smoke-copilot-aoai-entra | small / large | 20/21 | ✅ p=0.850 | READY |
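The Rec column appears to follow from two checks: every variant has reached min_samples, and the assignment balance p-value is not below 0.05. The sketch below is one way to express that rule. The function name `recommend` and the `COLLECTING` label are hypothetical and do not come from the report tooling; the other two labels are the ones used in the detailed analysis.

```python
def recommend(counts, min_samples, balance_p, alpha=0.05):
    """Return a recommendation label for one experiment.

    counts      -- mapping of variant name to number of runs
    min_samples -- required runs per variant
    balance_p   -- p-value of the assignment balance test
    alpha       -- balance threshold (the report uses p < 0.05)
    """
    # Any variant below min_samples means the experiment is still collecting.
    # "COLLECTING" is a hypothetical label chosen for this sketch.
    if min(counts.values()) < min_samples:
        return "COLLECTING"
    # Enough samples, but skewed assignment: extend and fix randomization.
    if balance_p < alpha:
        return "EXTEND (fix balance first)"
    return "READY_FOR_ANALYSIS"


# Values taken from the table above.
print(recommend({"no": 97, "yes": 96}, 20, 0.9002))
print(recommend({"single_agent": 58, "sub_agents": 92}, 30, 0.0055))
```

With the table's numbers, the first call yields READY_FOR_ANALYSIS and the second yields EXTEND (fix balance first), matching the smoke-copilot and smoke-gemini rows.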
📊 Charts for top 5 experiments
caveman · smoke-copilot
subagent_model · smoke-copilot
sub_agent_strategy · smoke-gemini
sub_agent_strategy · smoke-antigravity
caveman · smoke-copilot-aoai-apikey
🟡 Still Collecting — Top 10 by runs
| Experiment | Workflow | Runs | min_samples | Slowest variant |
| --- | --- | --- | --- | --- |
| tone_variant | aw-failure-investigator | 86 | 50 | narrative: ████░░░░ 25/50 |
| prompt_style | daily-astrostylelite-markdown-spellcheck | 50 | 30 | concise: ██████░░ 24/30 |
| output_format | daily-issues-report | 47 | 30 | collapsible: ██████░░ 22/30 |
| prompt_style | daily-news | 42 | 30 | concise: █████░░░ 18/30 |
| reasoning_depth | daily-security-red-team | 41 | 30 | single_pass: █████░░░ 19/30 |
| output_format | daily-compiler-quality | 37 | 20 | detailed: ██████░░ 15/20 |
| output_format | daily-code-metrics | 36 | 20 | executive_summary: ███████░ 17/20 |
| semgrep_output_format | daily-semgrep-scan | 36 | 30 | bullet_list: ███░░░░░ 10/30 |
| output_format | deep-report | 34 | 15 | full_briefing: ██████░░ 11/15 |
| prompt_compression | agent-performance-analyzer | 33 | 14 | verbose: ███████░ 12/14 |
+18 more — see all in git state branches.
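The progress bars in the Slowest variant column look like an 8-cell bar filled in proportion to n / min_samples, rounded to the nearest cell and capped when the target is exceeded. The sketch below reproduces the bars shown under that assumption. The function names `progress_bar` and `format_variant` are hypothetical, and the rounding rule is inferred from the rows above rather than taken from the report tooling.

```python
def progress_bar(n, target, width=8):
    """Render an n/target progress bar using block characters."""
    # Cap at 100% so over-target variants show a full bar.
    frac = min(1.0, n / target)
    # Round half up to the nearest cell (avoids Python's banker's rounding).
    filled = int(frac * width + 0.5)
    return "█" * filled + "░" * (width - filled)


def format_variant(name, n, target):
    """Format one variant the way the table cells are laid out."""
    return f"{name}: {progress_bar(n, target)} {n}/{target}"


print(format_variant("narrative", 25, 50))
print(format_variant("bullet_list", 10, 30))
```

The two calls print the tone_variant and semgrep_output_format cells as they appear in the table.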
🔍 Detailed Analysis — READY Experiments
caveman · smoke-copilot
Balance: ✅ balanced (chi2=0.01, p=0.9002) · min_samples=20 · total runs=193
no ← ctrl: n=97 (50.3%) ████████ 97/20
yes: n=96 (49.7%) ████████ 96/20
Rec: READY_FOR_ANALYSIS
subagent_model · smoke-copilot
Balance: ✅ balanced (chi2=0.01, p=0.8978) · min_samples=20 · total runs=173
small ← ctrl: n=86 (49.7%) ████████ 86/20
large: n=87 (50.3%) ████████ 87/20
Rec: READY_FOR_ANALYSIS
sub_agent_strategy · smoke-gemini
Balance: ⚠️ IMBALANCED (chi2=7.71, p=0.0055) · min_samples=30 · total runs=150
single_agent ← ctrl: n=58 (38.7%) ████████ 58/30
sub_agents: n=92 (61.3%) ████████ 92/30
⚠️ Assignment is imbalanced (p=0.0055 < 0.05). Fix randomization before interpreting outcomes.
Rec: EXTEND (fix balance first)
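The balance check can be sketched as a chi-square goodness-of-fit test against an equal split between the two variants, with one degree of freedom. This is an assumption about how the report computes it, not a confirmed description. The statistic below reproduces the chi2 values shown here (for example 7.71 for 58 vs 92), while the exact p-value formula may give slightly different p-values than those reported for nearly balanced experiments, which suggests the report uses a different approximation. The function name `balance_test` is hypothetical.

```python
import math


def balance_test(n_ctrl, n_var, alpha=0.05):
    """Chi-square goodness-of-fit test for a 50/50 assignment split.

    Returns (chi2, p_value, imbalanced).
    """
    total = n_ctrl + n_var
    expected = total / 2  # equal allocation assumed
    chi2 = ((n_ctrl - expected) ** 2 + (n_var - expected) ** 2) / expected
    # Survival function of chi-square with 1 degree of freedom:
    # P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, p_value < alpha


# smoke-gemini counts from the block above.
chi2, p, imbalanced = balance_test(58, 92)
print(f"chi2={chi2:.2f}, p={p:.4f}, imbalanced={imbalanced}")
```

For 58 vs 92 the statistic is 578/75, which rounds to 7.71, and the p-value falls below the 0.05 threshold, so the assignment is flagged as imbalanced.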
sub_agent_strategy · smoke-antigravity
Balance: ✅ balanced (chi2=0.21, p=0.6539) · min_samples=30 · total runs=121
single_agent ← ctrl: n=58 (47.9%) ████████ 58/30
sub_agents: n=63 (52.1%) ████████ 63/30
Rec: READY_FOR_ANALYSIS
caveman · smoke-copilot-aoai-apikey
Balance: ✅ balanced (chi2=0.01, p=0.8713) · min_samples=20 · total runs=69
yes ← ctrl: n=35 (50.7%) ████████ 35/20
no: n=34 (49.3%) ████████ 34/20
Rec: READY_FOR_ANALYSIS
Analysis: 37 experiments · 1616 total variant assignments · balance threshold p < 0.05
Run: 27944101270
Warning
Firewall blocked 2 domains
The following domains were blocked by the firewall during workflow execution:
proxy.golang.org
releaseassets.githubusercontent.com
To allow these domains, add them to the network.allowed list in your workflow frontmatter. See Network Configuration for more information.
🧪 Daily Experiment Report — 2026-06-22