[experiments] Daily Experiment Report — 2026-07-03 #43154

2026-07-03T09:17:17Z

github-actions[bot]
Bot Jul 3, 2026

🧪 Daily Experiment Report — 2026-07-03

29 experiments across 27 workflows. 11 reached min_samples (sample collection complete); 18 are still collecting. No per-run outcome data is available, so primary-metric significance tests are N/A. Balance tests (χ²) are computed from cumulative state.json counts.

⚡ Quick Stats

Metric	Value
Active experiments	29
🟢 READY (min_samples reached)	11
🟡 COLLECTING	16
🔴 Very early (<5 runs/variant)	2
⚠️ Imbalanced distribution (p<0.05)	1 (`smoke-gemini`)
Recommendations	🟡 EXTEND: 29 · ✅ PROMOTE: 0 · ❌ ABANDON: 0

All recommendations are EXTEND: primary-metric significance requires per-run artifact correlation, which is not available in this window.

🟢 Ready for Analysis (min_samples complete)

Experiment	Workflow	Variants (n)	Balance p	Notes
`output_format`	daily-code-metrics	full_detail=23, exec_summary=24	0.884 ✅	Tracking `#1`
`caveman`	smoke-copilot-aoai-apikey	yes=53, no=53	1.000 ✅	Smoke canary
`subagent_model`	smoke-copilot-aoai-apikey	small=53, large=53	1.000 ✅	Smoke canary
`prompt_compression`	agent-performance-analyzer	verbose=16, caveman=27	0.093 ✅	Borderline; `#33280`
`prompt_style`	daily-community-attribution	concise=30, verbose=29	0.896 ✅	No metric declared
`sub_agent_strategy`	smoke-antigravity	single=77, sub_agents=81	0.750 ✅	158 runs total
`caveman`	smoke-copilot	no=116, yes=116	1.000 ✅	Perfect balance
`subagent_model`	smoke-copilot	large=106, small=106	1.000 ✅	Perfect balance
`output_format`	daily-compiler-quality	detailed=22, concise=26	0.564 ✅	`#32390`
`sub_agent_strategy`	agent-persona-explorer	per_scenario=16, batch=28	0.070 ✅⚠️	Borderline
`sub_agent_strategy`	smoke-gemini	single=77, sub_agents=110	0.016 ❌	⚠️ IMBALANCED

⚠️ smoke-gemini / sub_agent_strategy — sub_agents accumulated 43% more runs than single_agent (110 vs 77), and the chi-square balance test fails (p=0.016). Investigate the weight assignment in the workflow frontmatter.
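
The balance p-values in the table above can be reproduced with a χ² goodness-of-fit test against a uniform split. The following is a minimal Python sketch, not the report generator's actual code: the function names `chi2_sf` and `balance_p_value` are hypothetical, and it assumes every variant is meant to receive an equal share of runs.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: complementary error function plus a finite correction sum.
    tail = sum(
        half ** (j - 0.5) / math.gamma(j + 0.5)
        for j in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail


def balance_p_value(counts: dict) -> float:
    """Goodness-of-fit p-value for variant counts vs. a uniform split (assumed weights)."""
    total = sum(counts.values())
    expected = total / len(counts)  # assumption: equal weight per variant
    stat = sum((n - expected) ** 2 / expected for n in counts.values())
    return chi2_sf(stat, len(counts) - 1)


if __name__ == "__main__":
    # Counts taken from the table above.
    print(round(balance_p_value({"single": 77, "sub_agents": 110}), 3))
    print(round(balance_p_value({"per_scenario": 16, "batch": 28}), 3))
```

With the counts from the table, this gives p ≈ 0.016 for smoke-gemini and p ≈ 0.070 for agent-persona-explorer. The sketch uses only the standard library; a statistics package would normally be the more robust choice for this test.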

🟡 Still Collecting

View all 18 collecting experiments

Experiment	Workflow	Variants (n)	min_samples	Bottleneck	Issue
`prompt_style`	ci-coach	detailed=18, concise=17	20	concise 3 short	`#32335`
`model_size`	daily-doc-healer	sonnet=12, haiku=14	20	sonnet 8 short	—
`tone_variant`	aw-failure-investigator	clin=50✅, assert=39, narr=40	50	assert 11 short	`#36105`
`model_size`	daily-caveman-optimizer	sonnet=10, haiku=16	20	sonnet 10 short	—
`prompt_style`	daily-astrostylelite	concise=32✅, detailed=29	30	1 run short!	—
`caveman_mode`	dataflow-pr-discussion-dataset	no=2, yes=2	10	8 short each	`#37102`
`tone_variant`	breaking-change-checker	neutral=1, urgent=1	194	193 short each	`#42467`
`sub_agent_strategy`	daily-agentrx-trace-optimizer	sub=10, single=14	20	sub 10 short	—
`reasoning_depth`	daily-fact	single=22, multi=14	30	multi 16 short	`#31324`
`prompt_style`	issue-arborist	concise=2, detailed=2	30	28 short each	`#30015`
`prompt_style`	dependabot-go-checker	concise=5, detail=4, sbs=6	30	24-26 short	—
`semgrep_output_format`	daily-semgrep-scan	bl=18, ss=12, prose=17	30	ss 18 short	`#32795`
`model_size`	daily-doc-updater	sonnet=13, haiku=13	20	7 short each	—
`detail_level`	daily-architecture-diagram	brief=4, comp=1	10	comp 9 short	`#31926`
`output_format`	deep-report	full=15✅, exec=15✅, ann=13	15	ann 2 short	—
`tool_verbosity`	gpclean	full=17, minimal=23	30	full 13 short	—
`reasoning_depth`	daily-security-red-team	single=22, iter=30✅	30	single 8 short	`#31673`
`model_size`	daily-cache-strategy-analyzer	5.4=9, mini=7	20	11-13 short	—
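
The Bottleneck column follows directly from the variant counts and min_samples. Below is a minimal sketch, assuming an experiment counts as complete only once every variant has reached min_samples; the helper names `remaining_runs` and `bottleneck` are hypothetical and not part of the report tooling.

```python
from typing import Dict, Optional, Tuple


def remaining_runs(counts: Dict[str, int], min_samples: int) -> Dict[str, int]:
    """Runs each variant still needs; variants at or above min_samples are omitted."""
    return {v: min_samples - n for v, n in counts.items() if n < min_samples}


def bottleneck(counts: Dict[str, int], min_samples: int) -> Optional[Tuple[str, int]]:
    """Variant with the largest shortfall, or None if collection is complete (assumed rule)."""
    short = remaining_runs(counts, min_samples)
    if not short:
        return None
    variant = max(short, key=short.get)
    return variant, short[variant]


if __name__ == "__main__":
    # aw-failure-investigator / tone_variant, counts from the table above.
    tone = {"clin": 50, "assert": 39, "narr": 40}
    print(remaining_runs(tone, 50))                 # {'assert': 11, 'narr': 10}
    print(sum(remaining_runs(tone, 50).values()))   # 21
    print(bottleneck(tone, 50))                     # ('assert', 11)
```

Summing the shortfalls gives the total number of runs still needed, which is 21 for aw-failure-investigator.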

📝 Key Actions

smoke-gemini / sub_agent_strategy — ⚠️ Investigate assignment imbalance (sub_agents=110 vs single_agent=77, p=0.016). Check that weight: [50,50] is set.
daily-astrostylelite / prompt_style — Just 1 more detailed run to complete collection.
deep-report / output_format — 2 more annotated_brief runs to complete.
aw-failure-investigator / tone_variant — ~21 more runs total; control (clinical) is at 50/50, treatment variants lagging.
agent-persona-explorer / sub_agent_strategy — Borderline balance (p=0.070); batch has 75% more runs than per_scenario (28 vs 16).

Analysis window: all recorded runs from state.json branches · Balance test: χ² goodness-of-fit vs uniform distribution
Run: §28649542436

Warning

Firewall blocked 2 domains

The following domains were blocked by the firewall during workflow execution:

proxy.golang.org
releaseassets.githubusercontent.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "proxy.golang.org"
    - "releaseassets.githubusercontent.com"

See Network Configuration for more information.

Generated by 🧪 daily-experiment-report · 312.1 AIC · ⌖ 21.3 AIC · ⊞ 11.5K · ◷

expires on Jul 6, 2026, 1:17 AM UTC-08:00
