[experiments] Daily Experiment Report — 2026-06-25 #41408

2026-06-25T09:12:17Z

github-actions[bot]
Bot Jun 25, 2026

🧪 Daily Experiment Report — 2026-06-25

35 experiments across 32 workflows · 11 reached sample targets · 0 statistically significant (p < 0.05) · Outcome data fetched for 8 experiments via Actions run history.

⚡ Quick Stats

Metric	Value
Active experiments	35
All variants at min_samples	11
Still collecting	24
Statistically significant (p < 0.05)	0
Unbalanced (balance p < 0.05)	1 (smoke-gemini ⚠️)
Recommendations	✅ PROMOTE: 0 · 🟡 EXTEND: 27 · ❌ ABANDON: 6

🔬 Detailed Analysis (8 experiments with outcome data)

`prompt_style` · `ci-coach`

🟡 COLLECTING · Tracking: #32335

📊 Statistics

Variant	n	Succ %	Dur (s)	Dur 95% CI (s)	p-value
`detailed` (ctrl)	16	62.5%	658	[478, 838]	(ref)
`concise`	13	69.2%	844	[576, 1113]	0.70

🟡 EXTEND — ⚠️ Guardrail violation: run_success_rate < 0.85 for both variants (detailed=62.5%, concise=69.2%). Need 4 more detailed runs and 7 more concise runs to reach min_samples (20).
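
The guardrail check and the remaining-run counts for this experiment can be reproduced in a few lines. This is a minimal sketch, assuming the guardrail is a plain per-variant comparison against 0.85 and that min_samples is 20 for `ci-coach` (the value listed in the collecting table). The function names are hypothetical and not taken from the report generator, and the success counts are back-computed from the reported percentages.

```python
GUARDRAIL_MIN_SUCCESS = 0.85  # threshold quoted in the verdict line
MIN_SAMPLES = 20              # ci-coach value from the collecting table

# (successes, runs) per variant; successes are back-computed from the
# reported percentages (62.5% of 16 = 10, 69.2% of 13 = 9).
variants = {"detailed": (10, 16), "concise": (9, 13)}


def guardrail_violations(data, floor=GUARDRAIL_MIN_SUCCESS):
    """Return the success rate of every variant that falls below the floor."""
    return {name: s / n for name, (s, n) in data.items() if s / n < floor}


def runs_needed(data, min_samples=MIN_SAMPLES):
    """Return how many more runs each variant needs to reach min_samples."""
    return {name: max(0, min_samples - n) for name, (_, n) in data.items()}


violations = guardrail_violations(variants)
needed = runs_needed(variants)
print(violations)
print(needed)  # {'detailed': 4, 'concise': 7}
```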

`tone_style` · `typist`

🟢 READY · Tracking: #34032

📊 Statistics

Variant	n	Succ %	Dur (s)	Dur 95% CI (s)	p-value
`formal` (ctrl)	10	90.0%	978	[858, 1098]	(ref)
`conversational`	12	100%	945	[766, 1125]	0.26

❌ ABANDON — No significant effect (p=0.26). Both variants at min_samples. conversational shows 100% vs formal 90% — directional but not significant.

`caveman` · `smoke-copilot`

🟢 READY · Tracking: —

📊 Statistics

Variant	n (last30)	Succ %	Dur (s)	p-value
`no` (ctrl)	15	80.0%	807	(ref)
`yes`	14	71.4%	830	0.59

❌ ABANDON — No significant effect (p=0.59). Both variants well past min_samples (105/104 runs). Caveman mode shows no measurable impact.

`subagent_model` · `smoke-copilot`

🟢 READY · Tracking: —

📊 Statistics

Variant	n (last30)	Succ %	Dur (s)	p-value
`large` (ctrl)	15	80.0%	835	(ref)
`small`	14	71.4%	800	0.59

❌ ABANDON — No significant effect (p=0.59). large vs small subagent model shows no meaningful difference in success or duration.

`sub_agent_strategy` · `smoke-gemini`

🟢 READY · Tracking: —

📊 Statistics

Variant	Total n	Succ %	Dur (s, last30)	Balance p
`sub_agents` (ctrl)	98	100%	409	⚠️ p=0.016
`single_agent`	67	100%	648	—

❌ ABANDON — ⚠️ UNBALANCED assignment (sub_agents=98, single_agent=67, balance p=0.016). Both variants 100% success — no outcome test possible. Duration p=0.35.
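
The report does not say which test produces the balance p-value. As a hedged illustration, a chi-square goodness-of-fit test against an intended 50/50 split (an assumption) gives a value that rounds to the reported 0.016 for the 98/67 assignment. The helper name below is hypothetical.

```python
import math


def balance_p_value(n_a, n_b):
    """Chi-square goodness-of-fit p-value against an equal split (1 d.o.f.)."""
    expected = (n_a + n_b) / 2
    chi2 = ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected
    # For one degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)), so no SciPy dependency is needed.
    return math.erfc(math.sqrt(chi2 / 2))


p = balance_p_value(98, 67)
print(f"balance p = {p:.3f}")
```

A small p here says the assignment counts are unlikely under an even split; it says nothing about the outcome metric itself.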

`sub_agent_strategy` · `smoke-antigravity`

🟢 READY · Tracking: —

📊 Statistics

Variant	Total n	Succ %	Dur (s, last30)	Dur p-value
`single_agent` (ctrl)	64	100%	364	(ref)
`sub_agents`	72	100%	607	0.40

❌ ABANDON — Both variants 100% success — no success rate test possible. Duration not significant (p=0.40). No detectable effect.

`prompt_style` · `daily-community-attribution`

🟢 READY · Tracking: —

📊 Statistics

Variant	n (last30)	Succ %	Dur (s)	p-value
`concise` (ctrl)	14	100%	549	(ref)
`verbose`	16	81.2%	638	0.088

❌ ABANDON — Close to significant: concise=100% vs verbose=81.2% success (p=0.088). This misses the p < 0.05 threshold, so no effect is detectable at the current sample size.
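
The report does not name the test behind its success-rate p-values. A pooled two-proportion z-test (two-tailed) is one candidate that reproduces the figures quoted here, so the sketch below uses it as an assumption. Success counts are back-computed from the percentages (14/14 vs 13/16 for this experiment, 9/10 vs 12/12 for `tone_style`), and the function name is hypothetical.

```python
import math


def two_proportion_p_value(success_a, n_a, success_b, n_b):
    """Two-tailed p-value from a pooled two-proportion z-test."""
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both variants are all-success (or all-failure): no test is possible.
        return None
    z = (success_a / n_a - success_b / n_b) / se
    # Two-tailed normal tail probability: 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))


p_attribution = two_proportion_p_value(14, 14, 13, 16)  # concise vs verbose
p_typist = two_proportion_p_value(9, 10, 12, 12)        # formal vs conversational
print(f"{p_attribution:.3f} {p_typist:.2f}")
```

With samples this small the normal approximation is rough, and an exact test such as Fisher's can give a noticeably different p-value, so these numbers are best read as indicative.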

`prompt_compression` · `agent-performance-analyzer`

🟡 COLLECTING · Tracking: #33280

📊 Statistics

Variant	n (last30)	Succ %	Dur (s)	p-value
`verbose` (ctrl)	11	100%	856	(ref)
`caveman`	19	89.5%	756	0.27

🟡 EXTEND — verbose variant at 11/14 runs (79%). Directional: verbose=100% vs caveman=89.5% success; caveman 100s faster. p=0.27 — insufficient power yet.

🟢 Ready — Outcome Data Pending

Experiment	Workflow	Counts	min_samples
`caveman`	`smoke-copilot-aoai-apikey`	yes=42, no=42	20
`subagent_model`	`smoke-copilot-aoai-apikey`	small=42, large=42	20
`caveman`	`smoke-copilot-aoai-entra`	no=28, yes=28	20
`subagent_model`	`smoke-copilot-aoai-entra`	large=28, small=28	20
`sub_agent_decomposition`	`smoke-pi`	single=34, parallel=42	20

🟡 Collecting Data — Close to Threshold

Experiment	Workflow	Counts	min_samples	Still Need	Issue
`prompt_compression`	`agent-performance-analyzer`	verbose=13, caveman=22	14	1	`#33280`
`output_format`	`daily-compiler-quality`	detailed=17, concise=23	20	3	`#32390`
`output_format`	`daily-code-metrics`	full_detail=21, exec_summary=18	20	2	`#1`
`prompt_style`	`ci-coach`	detailed=16, concise=13	20	7	`#32335`
`sub_agent_strategy`	`agent-persona-explorer`	batch=23, per_scenario=12	14	2	—
`output_format`	`deep-report`	all ~11-13	15	4	—
`prompt_style`	`daily-news`	detailed=24, concise=21	30	9	`#31190`
`output_format`	`daily-issues-report`	collapsible=24, inline=26	30	4	`#30573`
`reasoning_depth`	`daily-security-red-team`	iterative=24, single_pass=20	30	10	`#31673`
`tone_variant`	`aw-failure-investigator`	clinical=38, assertive=31, narrative=28	50	22	`#36105`
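
The "Still Need" column appears to be the largest per-variant shortfall against min_samples, judging by rows such as `ci-coach` and `aw-failure-investigator`. That rule is inferred from the table rather than documented, so the sketch below is an assumption, and `still_need` is a hypothetical helper name.

```python
def still_need(counts, min_samples):
    """Largest per-variant shortfall against min_samples (0 if all reached)."""
    return max(max(0, min_samples - n) for n in counts.values())


ci_coach = still_need({"detailed": 16, "concise": 13}, 20)
tone_variant = still_need(
    {"clinical": 38, "assertive": 31, "narrative": 28}, 50
)
print(ci_coach, tone_variant)  # 7 22
```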

Analysis: all runs in experiment branches + last 30 GitHub Actions runs · Significance threshold: p < 0.05 (two-tailed)
Run: §28158418818

Warning

Firewall blocked 2 domains

The following domains were blocked by the firewall during workflow execution:

proxy.golang.org
releaseassets.githubusercontent.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "proxy.golang.org"
    - "releaseassets.githubusercontent.com"

See Network Configuration for more information.

Generated by 🧪 daily-experiment-report · 345.3 AIC · ⌖ 24.8 AIC · ⊞ 11.3K · ◷

expires on Jun 28, 2026, 1:12 AM UTC-08:00

2026-06-26T09:14:17Z

github-actions[bot]
Bot Jun 26, 2026
Author

This discussion has been marked as outdated by daily-experiment-report.

A newer discussion is available at Discussion #41642.
