## Context
Profile generation on the 2.0.x pipeline (Gemini CLI subprocess → MCP → machine API) takes 5–7 minutes on a Raspberry Pi. The 2.1.0 branch (`feat/2.1.0-milestone-implementation`) has already begun migrating to a Python SDK approach (#214, see `docs/ISSUE_214_SDK_MIGRATION_PLAN.md`), which eliminates the Node.js CLI cold-start and MCP handshake overhead.
However, the current 2.1.0 implementation is Stage 1 only — it uses the SDK for text generation and then does server-side JSON extraction + async_create_profile(). Stages 2–4 (tool-calling loop, retries, validation handling) are not yet implemented. This issue covers the remaining performance, validation, and UX work needed to make the new pipeline production-ready.
## Current state on `feat/2.1.0-milestone-implementation`
| Aspect | Status | Notes |
| --- | --- | --- |
| SDK text generation | ✅ Done | `get_vision_model().async_generate_content()` replaces `subprocess.run(["gemini", ...])` |
| `SDK_OUTPUT_INSTRUCTIONS` prompt | ✅ Done | Tells the model to return a JSON block directly instead of calling tools |
| Server-side profile creation | ✅ Done | `async_create_profile(profile_json_check)` via pyMeticulous |
| CLI subprocess removed | ✅ Done | `import subprocess` removed from `coffee.py` |
| Tool-calling loop (Stage 2-3) | ❌ Not started | Model generates JSON, server parses it; no tool-call loop yet |
| Validation retry loop | ❌ Not started | If JSON extraction fails or schema validation fails, there is no retry |
| Progress transparency to user | ❌ Not started | User sees the same opaque spinner for the entire duration |
| Token optimization | ❌ Not started | Prompt is still ~39.5K chars (see #227) |
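For orientation, the Stage 1 flow in the table above can be sketched end-to-end. `sdk_generate`, `extract_json_block`, and `create_profile` below are standalone stand-ins for `get_vision_model().async_generate_content()`, the server-side JSON extraction, and `async_create_profile()` — illustrative shapes only, not the actual code in `coffee.py`:

```python
import asyncio

# Hypothetical stand-ins for the real SDK / pyMeticulous calls, so the flow runs standalone.
async def sdk_generate(prompt: str) -> str:
    """Stands in for get_vision_model().async_generate_content()."""
    return 'Sure! ```json\n{"name": "demo"}\n``` enjoy'

def extract_json_block(reply: str) -> str:
    """Naive server-side extraction: grab the outermost {...} from the reply."""
    start = reply.index("{")
    end = reply.rindex("}") + 1
    return reply[start:end]

async def create_profile(profile_json: str) -> dict:
    """Stands in for async_create_profile() pushing to the machine."""
    return {"status": "created", "profile": profile_json}

async def stage1_flow(prompt: str) -> dict:
    reply = await sdk_generate(prompt)          # 1. one SDK text-generation call
    profile_json = extract_json_block(reply)    # 2. pull the JSON block out of the reply
    return await create_profile(profile_json)   # 3. push to the machine; no validation, no retry
```

Note that nothing between steps 2 and 3 checks the JSON against the schema — which is exactly the gap workstream 1 addresses.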
## Proposal — Three workstreams
### 1. Validation & retry efficiency
**Problem:** In the CLI pipeline, the model calls `create_profile` via MCP → validation fails → model reads the error → retries → often 2-3 round-trips, each adding ~1-2 minutes. The SDK approach currently does zero retries; if the extracted JSON is invalid, it just fails.

**Solution:**

- **Server-side pre-validation:** Before calling `async_create_profile()`, validate the extracted JSON against the OEPF schema locally using the same `ProfileValidator` the MCP server uses (it's pure Python in `apps/mcp-server/meticulous-mcp/src/meticulous_mcp/profile_validator.py`)
- **Structured retry:** If validation fails, feed the specific validation errors back into a focused SDK prompt: "Your profile JSON had these errors: [errors]. Fix them and return only the corrected JSON." This is much cheaper than a full re-generation
- **Bounded retries:** Max 2 validation-fix attempts (configurable). Each retry only sends the JSON + errors, not the full 39K prompt
- **Separate validation from creation:** Validate → only then push to the machine. Currently the MCP `create_profile_tool` validates AND saves in one step; if the machine save fails, it's hard to distinguish from a validation failure

**Expected impact:** Validation retries drop from ~2 min/attempt (full Gemini CLI + MCP round-trip) to ~10-15 seconds (focused SDK call with a small prompt)
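The pre-validation plus bounded-retry loop could look roughly like the sketch below. `extract_json`, the `validator.validate()` signature, and the exact `async_generate_content()` call shape are assumptions for illustration — the real `ProfileValidator` and SDK APIs may differ:

```python
import asyncio
import json
import re

MAX_FIX_ATTEMPTS = 2  # configurable; keep validation-fix retries bounded

RETRY_PROMPT = (
    "Your profile JSON had these validation errors:\n{errors}\n"
    "Fix them and return only the corrected JSON."
)

def extract_json(text: str) -> dict:
    """Pull the first {...} block out of the model's reply (naive sketch)."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

async def generate_valid_profile(model, validator, full_prompt: str) -> dict:
    """Generate once with the full prompt, then fix validation errors cheaply."""
    reply = await model.async_generate_content(full_prompt)  # the one big call
    profile = extract_json(reply)
    for attempt in range(MAX_FIX_ATTEMPTS + 1):
        errors = validator.validate(profile)  # local check, no network round-trip
        if not errors:
            return profile  # valid: only now would the caller push to the machine
        if attempt == MAX_FIX_ATTEMPTS:
            raise ValueError(f"profile still invalid after {attempt} fix attempts: {errors}")
        # Focused retry: send only the bad JSON + errors, not the full ~39K prompt.
        fix_prompt = RETRY_PROMPT.format(errors="\n".join(errors)) + "\n" + json.dumps(profile)
        reply = await model.async_generate_content(fix_prompt)
        profile = extract_json(reply)
```

Returning the validated dict (instead of creating the profile inside the loop) also gives the "separate validation from creation" split: a machine-upload failure can no longer be confused with a validation failure.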
### 2. Real-time progress transparency

**Problem:** Users currently see an opaque loading screen with funny rotating messages and a progress bar that takes 7 minutes. They have no idea what's actually happening: is it thinking? Validating? Uploading to the machine? Failed and retrying?

**Solution (phased approach):**
#### Phase A: Structured status updates (SSE or WebSocket)

Replace the single `POST /api/analyze_and_profile` → wait → response pattern with a streaming progress approach:

1. Frontend sends `POST` → gets back a `generation_id`
2. Frontend opens an SSE stream: `GET /api/generate/progress/{generation_id}`
3. Server pushes status events as work progresses:

```json
{ "phase": "analyzing", "message": "Analyzing coffee image..." }
{ "phase": "generating", "message": "Generating espresso profile..." }
{ "phase": "validating", "message": "Validating profile schema..." }
{ "phase": "retrying", "message": "Fixing validation issues (attempt 2/3)..." }
{ "phase": "uploading", "message": "Uploading profile to machine..." }
{ "phase": "complete", "message": "Profile created!", "result": {...} }
{ "phase": "failed", "message": "...", "error": {...} }
```

*Alternative:* Use the existing WebSocket at `/api/ws/live` to multiplex progress events alongside MQTT telemetry.
#### Phase B: Segmented progress bar

Replace the single progress bar with a multi-segment progress bar where each segment represents a phase:

```
[Analyze ✓] [Generate ████░░] [Validate] [Upload]
```

Each segment completes as the server pushes phase transitions. If a retry happens, the validate segment resets with an indicator.
#### Phase C: Personality layer

Once the structured updates work, add the tongue-in-cheek barista personality:

| Phase | Plain message | Personality message |
| --- | --- | --- |
| `analyzing` | "Analyzing coffee image..." | "Checking out these beans... 👀" |
| `generating` | "Generating profile..." | "Channeling my inner barista genius..." |
| `validating` | "Validating schema..." | "Making sure the puck science checks out..." |
| `retrying` | "Fixing validation (attempt 2)..." | "Oops, let me re-tamp that... 🔨" |
| `uploading` | "Uploading to machine..." | "Sending the recipe to your Meticulous..." |
| `complete` | "Profile created!" | "Dial-in complete! Time to pull some shots! ☕" |
The existing loading message system (`LOADING_MESSAGES` in `LoadingView.tsx` + `loading.messages` in locales) can be repurposed as an idle animation within each phase.
### 3. Performance benchmarking

**Problem:** We don't know how much the SDK migration actually improves speed, and we need a baseline to measure token optimization (#227) against.

**Requirements:**

- Benchmark script/test that generates a profile from a standard input on the Pi and measures wall-clock time for each phase
- Target: under 2 minutes total on Raspberry Pi (down from 7 minutes in 2.0.x)
- Breakdown: track time spent in each phase separately:
  - Image analysis (SDK call)
  - Profile generation (SDK call)
  - Validation (local)
  - Retry attempts (SDK calls, if any)
  - Machine upload (HTTP to Meticulous)
**How to benchmark on Pi:**

```bash
# Time a profile generation via API
time curl -X POST http://192.168.50.22:3550/api/analyze_and_profile \
  -F "user_prefs=Ethiopian medium roast, fruity notes" \
  -w "\nHTTP %{http_code} in %{time_total}s\n"
```
Server-side timing is already partially instrumented (`generation_start`, `create_start` in the 2.1.0 branch). Extend it to cover all phases and log a timing summary.
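One lightweight way to extend that instrumentation is a per-phase timer wrapped around each step of the flow. `PhaseTimer` and the phase names below are illustrative, not existing code — only the phase breakdown itself comes from the requirements above:

```python
import time
from contextlib import contextmanager

class PhaseTimer:
    """Collect wall-clock timings per phase and render a log summary (sketch)."""

    def __init__(self) -> None:
        self.timings: dict[str, float] = {}

    @contextmanager
    def phase(self, name: str):
        """Time one phase; retries accumulate into the same bucket."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.timings[name] = self.timings.get(name, 0.0) + elapsed

    def summary(self) -> str:
        total = sum(self.timings.values())
        lines = [f"  {name}: {seconds:.1f}s" for name, seconds in self.timings.items()]
        return "\n".join([f"generation took {total:.1f}s total:"] + lines)

# Intended usage inside the generation flow (names match the benchmark breakdown):
#   timer = PhaseTimer()
#   with timer.phase("image_analysis"): ...
#   with timer.phase("profile_generation"): ...
#   with timer.phase("validation"): ...
#   with timer.phase("machine_upload"): ...
#   logger.info(timer.summary())
```

Logging `timer.summary()` at the end of every generation gives the per-phase baseline needed before measuring #227 against it.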
## Implementation plan

| Step | Description | Depends on | Estimated effort |
| --- | --- | --- | --- |
| 1 | Import `ProfileValidator` into FastAPI server (pure Python copy or shared package) | — | Small |
| 2 | Add server-side pre-validation before `async_create_profile` | Step 1 | Small |
| 3 | Implement focused validation-retry loop (max 2 attempts, small prompt) | Step 2 | Medium |
| 4 | Add SSE endpoint for generation progress | — | Medium |
| 5 | Push phase events from the profile generation flow | Step 4 | Medium |
| 6 | Frontend: consume SSE stream, show phase-aware progress | Step 5 | Medium |
| 7 | Frontend: segmented progress bar | Step 6 | Small |
| 8 | Add personality messages (Phase C) | Step 7 | Small |
| 9 | Performance benchmarking harness + Pi measurement | Steps 1-3 | Small |
| 10 | Token optimization integration (from #227) | Step 9 baseline | Medium |
## Acceptance criteria

## Related issues