2 changes: 1 addition & 1 deletion .github/workflows/ai-moderator.lock.yml


59 changes: 56 additions & 3 deletions .github/workflows/daily-experiment-report.md
@@ -358,6 +358,28 @@ plt.close()
After saving each chart, upload it using the `upload_asset` safe-output tool and store the returned
asset URLs; they will be embedded in the discussion body.

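A minimal sketch of how the stored URLs might be substituted into the report, assuming they are collected in a dict keyed by chart name (`success_rate`, `duration`) so that each key matches an `<ASSET_URL_...>` placeholder in the discussion template. The helper name and the example URLs are hypothetical, and `upload_asset` itself is not called here.

```python
from __future__ import annotations


def fill_asset_placeholders(template: str, asset_urls: dict[str, str]) -> str:
    """Replace each <ASSET_URL_name> marker with the uploaded asset URL."""
    for name, url in asset_urls.items():
        template = template.replace(f"<ASSET_URL_{name}>", url)
    return template


# Hypothetical URLs standing in for values returned by the upload step.
urls = {
    "success_rate": "https://example.com/assets/success_rate.png",
    "duration": "https://example.com/assets/duration.png",
}
body = (
    "![📊 Success Rate Chart](<ASSET_URL_success_rate>)\n"
    "![⏱️ Duration Chart](<ASSET_URL_duration>)"
)
print(fill_asset_placeholders(body, urls))
```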
## Step 5.5 — Build `min_samples` Progress Bars

Add a helper to render per-variant progress toward `min_samples` using fixed-width Unicode bars:

```python
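def render_progress_bar(current, target, width=10):
    """Render progress toward `target` as a fixed-width Unicode bar."""
    if target <= 0:
        # No usable target: show an empty bar flagged as N/A.
        return "░" * width + f" {current}/{target} (N/A)"
    ratio = max(0.0, min(1.0, current / target))
    # Integer floor division: 15/20 fills 7 of 10 cells (matching the sample
    # output below), and the bar is full only once the target is reached.
    filled = min(width, max(0, int(current * width // target)))
    bar = "█" * filled + "░" * (width - filled)
    return f"{bar} {current}/{target} ({ratio*100:.0f}%)"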
```

Use this helper in the per-experiment sample-size table:

```
███████░░░ 15/20 (75%)
██████████ 20/20 (100%)
██░░░░░░░░ 5/20 (25%)
```

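The helper can drive the per-variant rows of the "Sample Sizes & Progress" table in the report template. The sketch below is hypothetical: `sample_size_rows` and its `{variant: runs}` input shape are assumptions, and it repeats a floor-based copy of `render_progress_bar` so that it runs on its own.

```python
def render_progress_bar(current, target, width=10):
    # Standalone copy of the Step 5.5 helper (floor-based fill).
    if target <= 0:
        return "░" * width + f" {current}/{target} (N/A)"
    ratio = max(0.0, min(1.0, current / target))
    filled = min(width, max(0, int(current * width // target)))
    return "█" * filled + "░" * (width - filled) + f" {current}/{target} ({ratio*100:.0f}%)"


def sample_size_rows(runs_by_variant, min_samples):
    """Build Markdown lines for the 'Sample Sizes & Progress' table.

    `runs_by_variant` maps a variant name to its number of analysed runs
    (an assumed input shape).
    """
    lines = ["| Variant | Runs | Progress |", "|---------|------|----------|"]
    for variant, runs in runs_by_variant.items():
        bar = render_progress_bar(runs, min_samples)
        lines.append(f"| `{variant}` | {runs} | {bar} |")
    return "\n".join(lines)


print(sample_size_rows({"control": 15, "variant_b": 5}, 20))
```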
## Step 6 — Render ASCII Comparison Table

For each experiment, produce an ASCII table inside a fenced code block:
@@ -417,6 +439,11 @@ Use h3 (`###`) or lower for all headers in your report. Never use h1 (`#`) or h2

Wrap long sections in `<details><summary><b>Section Name</b></summary>` tags to improve readability and reduce scrolling. Keep critical summaries and key metrics always visible.

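A small helper keeps the collapsible markup consistent across sections. This is a sketch with a hypothetical name (`collapsible`); it leaves a blank line after `</summary>` and before `</details>`, which GitHub generally needs in order to render Markdown such as tables and images inside the block.

```python
def collapsible(title: str, body: str) -> str:
    """Wrap `body` in a <details> block with a bold <summary> title."""
    # Blank lines around the body let GitHub render Markdown inside <details>.
    return f"<details>\n<summary><b>{title}</b></summary>\n\n{body.strip()}\n\n</details>"


table = "| Experiment | Recommendation |\n|---|---|\n| `exp_a` | PROMOTE |"
print(collapsible("View Full Experiments Table", table))
```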
Use visual cues consistently:
- Use emojis strategically (for example: `📊` charts, `✅` success, `⚠️` warnings, `❌` failures)
- Use status badges for readiness (`🟢 READY`, `🟡 COLLECTING`, `🔴 FAILED`)
- Bold final recommendations and wrap variant names in inline code

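One way to apply these cues consistently is to keep the vocabulary in one place. The sketch below assumes the badge strings used in the report template and the Quick Stats recommendation row (✅ PROMOTE, 🟡 EXTEND, ❌ ABANDON); the dictionary and function names are illustrative only.

```python
from collections import Counter

# Badge vocabulary from the report conventions (names are illustrative).
STATUS_BADGES = {
    "READY": "🟢 READY",
    "COLLECTING": "🟡 COLLECTING",
    "FAILED": "🔴 FAILED",
}
RECOMMENDATION_ICONS = {"PROMOTE": "✅", "EXTEND": "🟡", "ABANDON": "❌"}


def status_badge(status: str) -> str:
    """Badge for a readiness status; an unknown status raises KeyError."""
    return STATUS_BADGES[status.upper()]


def variant_label(name: str) -> str:
    """Variant names are always wrapped in inline code."""
    return f"`{name}`"


def recommendation_summary(recommendations) -> str:
    """Quick Stats text such as '✅ PROMOTE: 2 · 🟡 EXTEND: 1 · ❌ ABANDON: 0'."""
    counts = Counter(r.upper() for r in recommendations)
    return " · ".join(
        f"{icon} {name}: {counts[name]}" for name, icon in RECOMMENDATION_ICONS.items()
    )


print(status_badge("ready"), variant_label("control"))
print(recommendation_summary(["PROMOTE", "extend", "promote"]))
```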
Suggested structure:
- Brief summary (always visible)
- Key metrics or highlights (always visible)
@@ -429,21 +456,42 @@ Suggested structure:
[1–2 sentence executive summary: N experiments analysed across M workflows,
K reached significance (p < 0.05), list recommendations at a glance.]

### ⚡ Quick Stats

| Metric | Value |
|--------|-------|
| Active experiments | N |
| Ready for analysis | R |
| Statistically significant (p < 0.05) | K |
| Recommendations | ✅ PROMOTE: P · 🟡 EXTEND: E · ❌ ABANDON: A |

---

#### `<experiment_name>` · `<workflow_basename>`

> **Status**: 🟢 READY / 🟡 COLLECTING / 🔴 FAILED
> **Variants**: `<v1>` vs `<v2>` · **Window**: last 30 runs · **Analysed**: N runs with artifacts
> **min_samples**: <min_samples> per variant · **Significance**: p = <p-value>

<hypothesis if declared>

<details>
<summary><b>📈 View Detailed Statistics</b></summary>

**Sample Sizes & Progress**
| Variant | Runs | Progress |
|---------|------|----------|
| `<control>` | n | ██████░░░░ n/<min_samples> (##%) |
| `<variant_B>` | n | ███░░░░░░░ n/<min_samples> (##%) |

![📊 Success Rate Chart](<ASSET_URL_success_rate>)

![⏱️ Duration Chart](<ASSET_URL_duration>)

<ASCII comparison table from Step 6 inside a ``` code block>

</details>

**Recommendation: PROMOTE / EXTEND / ABANDON** — <one sentence rationale>

---
@@ -452,10 +500,15 @@ Suggested structure:

### 📊 Summary

<details>
<summary><b>View Full Experiments Table</b></summary>

| Experiment | Workflow | Control | Best variant | p-value | Guardrails | Recommendation |
|-----------|---------|---------|-------------|---------|-----------|----------------|
| ... | ... | ... | ... | ... | PASS/FAIL | ... |

</details>

> Analysis window: last 30 runs per workflow · Significance threshold: p < 0.05 (two-tailed)
> Run: [${{ github.run_id }}](https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }})
```
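The template reports a per-experiment p-value against a two-tailed p < 0.05 threshold but does not name the test. Assuming the compared metric is success rate (successes out of analysed runs per variant), one standard-library option is a pooled two-proportion z-test, sketched below with a hypothetical function name. It is a normal approximation, so with run counts near `min_samples` an exact test such as Fisher's may be more appropriate.

```python
import math


def two_proportion_p_value(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """Two-tailed p-value for a difference in success rates (pooled z-test)."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("each variant needs at least one run")
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Every run succeeded (or failed) in both variants: no observed difference.
        return 1.0
    z = (success_a / n_a - success_b / n_b) / se
    # P(|Z| >= |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2))


p = two_proportion_p_value(18, 20, 11, 20)
print(f"p = {p:.3f}", "significant" if p < 0.05 else "not significant")
```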