[experiments] Daily Experiment Report — 2026-05-24 #34398

2026-05-24T08:58:19Z

github-actions[bot]
Bot May 24, 2026

🧪 Daily Experiment Report — 2026-05-24

19 experiments analyzed across 18 workflows. All experiments are currently collecting data — none have reached statistical readiness (min_samples).

⚡ Quick Stats

Metric	Value
Active experiments	19
Ready for analysis	0
Statistically significant (p < 0.05)	0
Recommendations	✅ PROMOTE: 0 · 🟡 EXTEND: 19 · ❌ ABANDON: 0

📊 All Experiments (Sorted by Progress)

View Full Experiment Details

Experiment	Workflow	Variants	Progress	Total Runs	min_samples	Status
`caveman`	`smoke-copilot`	`True vs False`	░░░░░░░░░░ 0%	43	20	🟡 COLLECTING
`subagent_model`	`smoke-copilot`	`small vs large`	█████░░░░░ 57%	23	20	🟡 COLLECTING
`prompt_style`	`daily-community-attribution`	`concise vs verbose`	████░░░░░░ 47%	19	20	🟡 COLLECTING
`output_format`	`deep-report`	`full_briefing vs executive_brief vs annotated_brief`	███░░░░░░░ 31%	14	15	🟡 COLLECTING
`prompt_style`	`daily-astrostylelite-markdown-spellcheck`	`concise vs detailed`	███░░░░░░░ 31%	19	30	🟡 COLLECTING
`output_format`	`daily-issues-report`	`collapsible vs inline`	██░░░░░░░░ 28%	18	30	🟡 COLLECTING
`sub_agent_strategy`	`smoke-gemini`	`single_agent vs sub_agents`	██░░░░░░░░ 25%	15	30	🟡 COLLECTING
`output_format`	`daily-compiler-quality`	`detailed vs concise`	██░░░░░░░░ 22%	9	20	🟡 COLLECTING
`prompt_style`	`daily-news`	`detailed vs concise`	██░░░░░░░░ 20%	13	30	🟡 COLLECTING
`reasoning_depth`	`daily-security-red-team`	`single_pass vs iterative`	██░░░░░░░░ 20%	12	30	🟡 COLLECTING
`output_format`	`daily-code-metrics`	`full_detail vs executive_summary`	██░░░░░░░░ 20%	8	20	🟡 COLLECTING
`prompt_style`	`ci-coach`	`detailed vs concise`	█░░░░░░░░░ 15%	6	20	🟡 COLLECTING
`prompt_compression`	`agent-performance-analyzer`	`verbose vs caveman`	█░░░░░░░░░ 14%	4	14	🟡 COLLECTING
`reasoning_depth`	`daily-fact`	`single_pass vs multi_candidate`	█░░░░░░░░░ 13%	8	30	🟡 COLLECTING
`sub_agent_decomposition`	`smoke-pi`	`single_agent vs parallel_sub_agents`	█░░░░░░░░░ 12%	5	20	🟡 COLLECTING
`semgrep_output_format`	`daily-semgrep-scan`	`bullet_list vs structured_sections vs prose`	░░░░░░░░░░ 7%	7	30	🟡 COLLECTING
`sub_agent_strategy`	`agent-persona-explorer`	`per_scenario vs batch`	░░░░░░░░░░ 7%	3	14	🟡 COLLECTING
`prompt_style`	`issue-arborist`	`concise vs detailed`	░░░░░░░░░░ 6%	4	30	🟡 COLLECTING
`detail_level`	`daily-architecture-diagram`	`brief vs comprehensive`	░░░░░░░░░░ 0%	1	10	🟡 COLLECTING
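
The Progress column appears to track samples assigned to variants rather than Total Runs, which would be consistent with `caveman` showing 0% despite 43 runs. Below is a minimal Python sketch, assuming progress is the summed per-variant count divided by (number of variants × min_samples), truncated to a whole percent, with one bar block per full 10%. The formula is inferred from the five highlighted experiments in this report and is not taken from the report generator. `overall_progress` and the cap at min_samples are hypothetical.

```python
def overall_progress(variant_counts, min_samples):
    """Return (percent, bar) for one experiment.

    Assumption: percent = collected / required, truncated, where every
    variant needs min_samples. Inferred from the report's numbers only.
    """
    required = len(variant_counts) * min_samples
    # Assumed cap so that an over-collected variant cannot push past 100%.
    collected = sum(min(count, min_samples) for count in variant_counts.values())
    percent = collected * 100 // required
    filled = percent // 10  # one block per full 10%
    bar = "█" * filled + "░" * (10 - filled)
    return percent, bar


# Per-variant counts copied from the highlighted experiments in this report.
print(overall_progress({"small": 11, "large": 12}, 20))
print(overall_progress({"full_briefing": 5, "executive_brief": 4, "annotated_brief": 5}, 15))
```

Under these assumptions the two calls reproduce the 57% and 31% rows of the table.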

🔍 Highlighted Experiments

`caveman` · `smoke-copilot`

Hypothesis: (not specified)

Progress Toward min_samples=20:

True: ░░░░░░░░░░ 0/20 (0%)
False: ░░░░░░░░░░ 0/20 (0%)

Recommendation: EXTEND — Some variants have not reached min_samples=20

`subagent_model` · `smoke-copilot`

Hypothesis: (not specified)

Progress Toward min_samples=20:

small: ██████░░░░ 11/20 (55%)
large: ██████░░░░ 12/20 (60%)

Recommendation: EXTEND — Some variants have not reached min_samples=20

`prompt_style` · `daily-community-attribution`

Hypothesis: (not specified)

Progress Toward min_samples=20:

concise: █████░░░░░ 10/20 (50%)
verbose: ████░░░░░░ 9/20 (45%)

Recommendation: EXTEND — Some variants have not reached min_samples=20

`output_format` · `deep-report`

H0: no change in discussion engagement or token cost. H1: executive_brief reduces token usage by ≥20% without reducing engagement; annotated_brief improves actionability.

Progress Toward min_samples=15:

full_briefing: ███░░░░░░░ 5/15 (33%)
executive_brief: ███░░░░░░░ 4/15 (26%)
annotated_brief: ███░░░░░░░ 5/15 (33%)

Recommendation: EXTEND — Some variants have not reached min_samples=15

`prompt_style` · `daily-astrostylelite-markdown-spellcheck`

H1: concise prompt reduces token consumption by ≥20% without degrading fix precision. H0: no difference in fix rate.

Progress Toward min_samples=30:

concise: ███░░░░░░░ 8/30 (26%)
detailed: ████░░░░░░ 11/30 (36%)

Recommendation: EXTEND — Some variants have not reached min_samples=30
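
The per-variant progress lines above can be reproduced with a few lines of Python. This is a sketch under assumptions: the percentage looks truncated (4/15 is shown as 26%), while the bar length looks rounded to the nearest block, with ties going to the even count (9/20 gives 4 blocks, 11/20 gives 6), which is what Python's built-in `round` does. `variant_line` is a hypothetical helper, not the generator's actual code.

```python
def variant_line(name, count, min_samples, width=10):
    """Format one variant's progress toward min_samples as shown in the report."""
    percent = count * 100 // min_samples  # truncated, so 4/15 -> 26
    # round() uses round-half-to-even, so 4.5 -> 4 and 5.5 -> 6.
    filled = min(width, round(count * width / min_samples))
    bar = "█" * filled + "░" * (width - filled)
    return f"{name}: {bar} {count}/{min_samples} ({percent}%)"


# Counts copied from the `subagent_model` experiment in this report.
for name, count in {"small": 11, "large": 12}.items():
    print(variant_line(name, count, 20))
```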

📈 Next Steps

All 19 experiments are actively collecting data. None have reached the minimum sample size threshold yet. The experiments will continue running on their configured schedules, and this report will be updated daily with progress.

When an experiment reaches readiness (all variants have ≥ min_samples):

Statistical tests will be performed (Mann-Whitney U, Welch's t-test, or chi-square as configured)
Guardrail metrics will be evaluated
A recommendation will be issued: PROMOTE, EXTEND, or ABANDON
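
The readiness rule above (all variants have ≥ min_samples) can be sketched in Python as follows. The function names are hypothetical, and the sketch covers only the gate that yields EXTEND. How the generator chooses between PROMOTE and ABANDON after testing is not described in this report, so it is left out.

```python
def is_ready(variant_counts, min_samples):
    """True only when every variant has at least min_samples runs."""
    return all(count >= min_samples for count in variant_counts.values())


def missing_samples(variant_counts, min_samples):
    """Map each lagging variant to the number of runs it still needs."""
    return {
        name: min_samples - count
        for name, count in variant_counts.items()
        if count < min_samples
    }


# Counts copied from the `output_format` · `deep-report` experiment.
counts = {"full_briefing": 5, "executive_brief": 4, "annotated_brief": 5}
if not is_ready(counts, 15):
    print("EXTEND", missing_samples(counts, 15))
```

Reporting the per-variant shortfall alongside the recommendation makes it easier to see which variant is holding an experiment back.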

Analysis window: last 30 runs per workflow · Significance threshold: p < 0.05 (two-tailed)
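
To illustrate what a two-tailed test at p < 0.05 involves, here is a standard-library Python sketch of the Mann-Whitney U test using the normal approximation. It is a simplified illustration, not the workflow's implementation: ties get average ranks, but no tie or continuity correction is applied, and the sample values are hypothetical. A production pipeline would more likely call a library routine such as `scipy.stats.mannwhitneyu`.

```python
import math


def mann_whitney_u(a, b):
    """Return (U for sample a, two-tailed p-value) via the normal approximation.

    Simplified sketch: average ranks for ties, no tie or continuity correction.
    """
    n1, n2 = len(a), len(b)
    pooled = sorted((value, group) for group, sample in enumerate((a, b)) for value in sample)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed
    return u_a, p_value


ALPHA = 0.05  # significance threshold stated in the report

# Hypothetical metric values for two variants, 20 runs each.
variant_a = list(range(20))
variant_b = list(range(20, 40))
u, p = mann_whitney_u(variant_a, variant_b)
print("significant" if p < ALPHA else "not significant")
```

The normal approximation is only reasonable for moderately large samples, which is one reason to gate the test on min_samples.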
Run: §26356661591

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

proxy.golang.org

To allow this domain, add it to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "proxy.golang.org"

See Network Configuration for more information.

Generated by 🧪 daily-experiment-report · ● son45 3.5M · ◷ expires on May 27, 2026, 8:58 AM UTC

2026-05-25T09:26:52Z

github-actions[bot]
Bot May 25, 2026
Author

This discussion has been marked as outdated by daily-experiment-report.

A newer discussion is available at Discussion #34609.
