## Context
Profile generation on the 2.0.x pipeline (Gemini CLI subprocess → MCP → machine API) takes 5–7 minutes on a Raspberry Pi. The 2.1.0 branch (`feat/2.1.0-milestone-implementation`) has already begun migrating to a Python SDK approach (#214, see `docs/ISSUE_214_SDK_MIGRATION_PLAN.md`), which eliminates the Node.js CLI cold-start and MCP handshake overhead.
However, the current 2.1.0 implementation is Stage 1 only — it uses the SDK for text generation and then does server-side JSON extraction + async_create_profile(). Stages 2–4 (tool-calling loop, retries, validation handling) are not yet implemented. This issue covers the remaining performance, validation, and UX work needed to make the new pipeline production-ready.
## Current state on `feat/2.1.0-milestone-implementation`
| Aspect | Status | Notes |
| --- | --- | --- |
| SDK text generation | ✅ Done | `get_vision_model().async_generate_content()` replaces `subprocess.run(["gemini", ...])` |
| `SDK_OUTPUT_INSTRUCTIONS` prompt | ✅ Done | Tells the model to return a JSON block directly instead of calling tools |
| Server-side profile creation | ✅ Done | `async_create_profile(profile_json_check)` via pyMeticulous |
| CLI subprocess removed | ✅ Done | `import subprocess` removed from `coffee.py` |
| Tool-calling loop (Stage 2-3) | ❌ Not started | Model generates JSON, server parses it; no tool-call loop yet |
| Validation retry loop | ❌ Not started | If JSON extraction fails or schema validation fails, there is no retry |
| Progress transparency to user | ❌ Not started | User sees the same opaque spinner for the entire duration |
| Token optimization | ❌ Not started | Prompt is still ~39.5K chars (see #227) |
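For orientation, the Stage 1 flow in the table above can be sketched end-to-end. `sdk_generate`, `extract_json_block`, and `create_profile` below are standalone stand-ins for `get_vision_model().async_generate_content()`, the server-side JSON extraction, and `async_create_profile()` — illustrative shapes only, not the actual code in `coffee.py`:

```python
import asyncio

# Hypothetical stand-ins for the real SDK / pyMeticulous calls, so the flow runs standalone.
async def sdk_generate(prompt: str) -> str:
    """Stands in for get_vision_model().async_generate_content()."""
    return 'Sure! ```json\n{"name": "demo"}\n``` enjoy'

def extract_json_block(reply: str) -> str:
    """Naive server-side extraction: grab the outermost {...} from the reply."""
    start = reply.index("{")
    end = reply.rindex("}") + 1
    return reply[start:end]

async def create_profile(profile_json: str) -> dict:
    """Stands in for async_create_profile() pushing to the machine."""
    return {"status": "created", "profile": profile_json}

async def stage1_flow(prompt: str) -> dict:
    reply = await sdk_generate(prompt)          # 1. one SDK text-generation call
    profile_json = extract_json_block(reply)    # 2. pull the JSON block out of the reply
    return await create_profile(profile_json)   # 3. push to the machine; no validation, no retry
```

Note that nothing between steps 2 and 3 checks the JSON against the schema — which is exactly the gap workstream 1 addresses.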
## Proposal — Three workstreams
### 1. Validation & retry efficiency
**Problem:** In the CLI pipeline, the model calls `create_profile` via MCP → validation fails → model reads the error → retries → often 2-3 round-trips, each adding ~1-2 minutes. The SDK approach currently does zero retries; if the extracted JSON is invalid, it just fails.

**Solution:**

- **Server-side pre-validation:** Before calling `async_create_profile()`, validate the extracted JSON against the OEPF schema locally using the same `ProfileValidator` the MCP server uses (it's pure Python in `apps/mcp-server/meticulous-mcp/src/meticulous_mcp/profile_validator.py`)
- **Structured retry:** If validation fails, feed the specific validation errors back into a focused SDK prompt: "Your profile JSON had these errors: [errors]. Fix them and return only the corrected JSON." This is much cheaper than a full re-generation
- **Bounded retries:** Max 2 validation-fix attempts (configurable). Each retry only sends the JSON + errors, not the full 39K prompt
- **Separate validation from creation:** Validate → only then push to the machine. Currently the MCP `create_profile_tool` validates AND saves in one step; if the machine save fails, it's hard to distinguish from a validation failure

**Expected impact:** Validation retries drop from ~2 min/attempt (full Gemini CLI + MCP round-trip) to ~10-15 seconds (focused SDK call with a small prompt)
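The pre-validation plus bounded-retry loop could look roughly like the sketch below. `extract_json`, the `validator.validate()` signature, and the exact `async_generate_content()` call shape are assumptions for illustration — the real `ProfileValidator` and SDK APIs may differ:

```python
import asyncio
import json
import re

MAX_FIX_ATTEMPTS = 2  # configurable; keep validation-fix retries bounded

RETRY_PROMPT = (
    "Your profile JSON had these validation errors:\n{errors}\n"
    "Fix them and return only the corrected JSON."
)

def extract_json(text: str) -> dict:
    """Pull the first {...} block out of the model's reply (naive sketch)."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

async def generate_valid_profile(model, validator, full_prompt: str) -> dict:
    """Generate once with the full prompt, then fix validation errors cheaply."""
    reply = await model.async_generate_content(full_prompt)  # the one big call
    profile = extract_json(reply)
    for attempt in range(MAX_FIX_ATTEMPTS + 1):
        errors = validator.validate(profile)  # local check, no network round-trip
        if not errors:
            return profile  # valid: only now would the caller push to the machine
        if attempt == MAX_FIX_ATTEMPTS:
            raise ValueError(f"profile still invalid after {attempt} fix attempts: {errors}")
        # Focused retry: send only the bad JSON + errors, not the full ~39K prompt.
        fix_prompt = RETRY_PROMPT.format(errors="\n".join(errors)) + "\n" + json.dumps(profile)
        reply = await model.async_generate_content(fix_prompt)
        profile = extract_json(reply)
```

Returning the validated dict (instead of creating the profile inside the loop) also gives the "separate validation from creation" split: a machine-upload failure can no longer be confused with a validation failure.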
### 2. Real-time progress transparency

**Problem:** Users currently see an opaque loading screen with funny rotating messages and a progress bar that takes 7 minutes. They have no idea what's actually happening: is it thinking? Validating? Uploading to the machine? Failed and retrying?

**Solution (phased approach):**
#### Phase A: Structured status updates (SSE or WebSocket)

Replace the single `POST /api/analyze_and_profile` → wait → response pattern with a streaming progress approach:

1. Frontend sends `POST` → gets back a `generation_id`
2. Frontend opens an SSE stream: `GET /api/generate/progress/{generation_id}`
3. Server pushes status events as work progresses:

```json
{ "phase": "analyzing", "message": "Analyzing coffee image..." }
{ "phase": "generating", "message": "Generating espresso profile..." }
{ "phase": "validating", "message": "Validating profile schema..." }
{ "phase": "retrying", "message": "Fixing validation issues (attempt 2/3)..." }
{ "phase": "uploading", "message": "Uploading profile to machine..." }
{ "phase": "complete", "message": "Profile created!", "result": {...} }
{ "phase": "failed", "message": "...", "error": {...} }
```

*Alternative:* Use the existing WebSocket at `/api/ws/live` to multiplex progress events alongside MQTT telemetry.
#### Phase B: Segmented progress bar

Replace the single progress bar with a multi-segment progress bar where each segment represents a phase:

```
[Analyze ✓] [Generate ████░░] [Validate] [Upload]
```

Each segment completes as the server pushes phase transitions. If a retry happens, the validate segment resets with an indicator.
#### Phase C: Personality layer

Once the structured updates work, add the tongue-in-cheek barista personality:

| Phase | Plain message | Personality message |
| --- | --- | --- |
| `analyzing` | "Analyzing coffee image..." | "Checking out these beans... 👀" |
| `generating` | "Generating profile..." | "Channeling my inner barista genius..." |
| `validating` | "Validating schema..." | "Making sure the puck science checks out..." |
| `retrying` | "Fixing validation (attempt 2)..." | "Oops, let me re-tamp that... 🔨" |
| `uploading` | "Uploading to machine..." | "Sending the recipe to your Meticulous..." |
| `complete` | "Profile created!" | "Dial-in complete! Time to pull some shots! ☕" |
The existing loading message system (`LOADING_MESSAGES` in `LoadingView.tsx` + `loading.messages` in locales) can be repurposed as an idle animation within each phase.
### 3. Performance benchmarking

**Problem:** We don't know how much the SDK migration actually improves speed, and we need a baseline to measure token optimization (#227) against.

**Requirements:**

- Benchmark script/test that generates a profile from a standard input on the Pi and measures wall-clock time for each phase
- Target: under 2 minutes total on Raspberry Pi (down from 7 minutes in 2.0.x)
- Breakdown: track time spent in each phase separately:
  - Image analysis (SDK call)
  - Profile generation (SDK call)
  - Validation (local)
  - Retry attempts (SDK calls, if any)
  - Machine upload (HTTP to Meticulous)
**How to benchmark on Pi:**

```bash
# Time a profile generation via API
time curl -X POST http://192.168.50.22:3550/api/analyze_and_profile \
  -F "user_prefs=Ethiopian medium roast, fruity notes" \
  -w "\nHTTP %{http_code} in %{time_total}s\n"
```
Server-side timing is already partially instrumented (`generation_start`, `create_start` in the 2.1.0 branch). Extend it to cover all phases and log a timing summary.
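One lightweight way to extend that instrumentation is a per-phase timer wrapped around each step of the flow. `PhaseTimer` and the phase names below are illustrative, not existing code — only the phase breakdown itself comes from the requirements above:

```python
import time
from contextlib import contextmanager

class PhaseTimer:
    """Collect wall-clock timings per phase and render a log summary (sketch)."""

    def __init__(self) -> None:
        self.timings: dict[str, float] = {}

    @contextmanager
    def phase(self, name: str):
        """Time one phase; retries accumulate into the same bucket."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.timings[name] = self.timings.get(name, 0.0) + elapsed

    def summary(self) -> str:
        total = sum(self.timings.values())
        lines = [f"  {name}: {seconds:.1f}s" for name, seconds in self.timings.items()]
        return "\n".join([f"generation took {total:.1f}s total:"] + lines)

# Intended usage inside the generation flow (names match the benchmark breakdown):
#   timer = PhaseTimer()
#   with timer.phase("image_analysis"): ...
#   with timer.phase("profile_generation"): ...
#   with timer.phase("validation"): ...
#   with timer.phase("machine_upload"): ...
#   logger.info(timer.summary())
```

Logging `timer.summary()` at the end of every generation gives the per-phase baseline needed before measuring #227 against it.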
## Implementation plan

| Step | Description | Depends on | Estimated effort |
| --- | --- | --- | --- |
| 1 | Import `ProfileValidator` into FastAPI server (pure Python copy or shared package) | — | Small |
| 2 | Add server-side pre-validation before `async_create_profile` | Step 1 | Small |
| 3 | Implement focused validation-retry loop (max 2 attempts, small prompt) | Step 2 | Medium |
| 4 | Add SSE endpoint for generation progress | — | Medium |
| 5 | Push phase events from the profile generation flow | Step 4 | Medium |
| 6 | Frontend: consume SSE stream, show phase-aware progress | Step 5 | Medium |
| 7 | Frontend: segmented progress bar | Step 6 | Small |
| 8 | Add personality messages (Phase C) | Step 7 | Small |
| 9 | Performance benchmarking harness + Pi measurement | Steps 1-3 | Small |
| 10 | Token optimization integration (from #227) | Step 9 baseline | Medium |
## Acceptance criteria

## Related issues