
Profile generation pipeline: validation efficiency, real-time progress, and performance benchmarking #229

@hessius

Description

Context

Profile generation on the 2.0.x pipeline (Gemini CLI subprocess → MCP → machine API) takes 5–7 minutes on Raspberry Pi. The 2.1.0 branch (feat/2.1.0-milestone-implementation) has already begun migrating to a Python SDK approach (#214, see docs/ISSUE_214_SDK_MIGRATION_PLAN.md), which eliminates the Node.js CLI cold-start and MCP handshake overhead.

However, the current 2.1.0 implementation is Stage 1 only — it uses the SDK for text generation and then does server-side JSON extraction + async_create_profile(). Stages 2–4 (tool-calling loop, retries, validation handling) are not yet implemented. This issue covers the remaining performance, validation, and UX work needed to make the new pipeline production-ready.

Current state on feat/2.1.0-milestone-implementation

| Aspect | Status | Notes |
| --- | --- | --- |
| SDK text generation | ✅ Done | `get_vision_model().async_generate_content()` replaces `subprocess.run(["gemini", ...])` |
| SDK_OUTPUT_INSTRUCTIONS prompt | ✅ Done | Tells the model to return a JSON block directly instead of calling tools |
| Server-side profile creation | ✅ Done | `async_create_profile(profile_json_check)` via pyMeticulous |
| CLI subprocess removed | ✅ Done | `import subprocess` removed from `coffee.py` |
| Tool-calling loop (Stages 2–3) | ❌ Not started | Model generates JSON, server parses; no tool-call loop yet |
| Validation retry loop | ❌ Not started | If JSON extraction or schema validation fails, there is no retry |
| Progress transparency to user | ❌ Not started | User sees the same opaque spinner for the entire duration |
| Token optimization | ❌ Not started | Prompt is still ~39.5K chars (see #227) |

Proposal — Three workstreams

1. Validation & retry efficiency

Problem: In the CLI pipeline, the model calls create_profile via MCP, validation fails, the model reads the error and retries; this often takes 2–3 round-trips, each adding ~1–2 minutes. The SDK approach currently does zero retries: if the extracted JSON is invalid, generation simply fails.

Solution:

  • Server-side pre-validation: Before calling async_create_profile(), validate the extracted JSON against the OEPF schema locally using the same ProfileValidator the MCP server uses (it's pure Python in apps/mcp-server/meticulous-mcp/src/meticulous_mcp/profile_validator.py)
  • Structured retry: If validation fails, feed the specific validation errors back into a focused SDK prompt: "Your profile JSON had these errors: [errors]. Fix them and return only the corrected JSON." — this is much cheaper than a full re-generation
  • Bounded retries: Max 2 validation-fix attempts (configurable). Each retry only sends the JSON + errors, not the full 39K prompt
  • Separate validation from creation: Validate first, and only then push to the machine. Currently the MCP create_profile_tool validates AND saves in one step, so a machine-save failure is hard to distinguish from a validation failure

Expected impact: Validation retries drop from ~2 min/attempt (full Gemini CLI + MCP round-trip) to ~10-15 seconds (focused SDK call with small prompt)
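The bounded retry described above can be sketched as follows. This is illustrative only: `validate` and `request_fix` are stand-ins for the real ProfileValidator and the focused SDK call, not existing APIs.

```python
# Sketch of the bounded validation-retry loop. `validate` returns a list of
# error strings ([] means valid); `request_fix` is the focused SDK prompt
# that sends only the JSON + errors, not the full ~39K-char prompt.
from typing import Callable, Optional

MAX_FIX_ATTEMPTS = 2  # configurable, per the proposal

def validate_with_retries(
    profile_json: dict,
    validate: Callable[[dict], list],
    request_fix: Callable[[dict, list], dict],
    max_attempts: int = MAX_FIX_ATTEMPTS,
) -> Optional[dict]:
    """Validate locally; on failure, ask the model to fix only the errors."""
    current = profile_json
    for attempt in range(max_attempts + 1):
        errors = validate(current)
        if not errors:
            return current  # valid -> caller may now push to the machine
        if attempt == max_attempts:
            break
        # Focused retry: feed back the specific validation errors only.
        current = request_fix(current, errors)
    return None  # still invalid after bounded retries; surface as failure
```

Keeping validation separate from creation (step 4 above) means the caller decides what to do with a `None` result instead of guessing whether the machine save or the schema check failed.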

2. Real-time progress transparency

Problem: Users currently see an opaque loading screen with rotating joke messages and a progress bar for the full ~7-minute generation. They have no idea what is actually happening: is it thinking? Validating? Uploading to the machine? Failed and retrying?

Solution — phased approach:

Phase A: Structured status updates (SSE or WebSocket)

Replace the single POST /api/analyze_and_profile → wait → response pattern with a streaming progress approach:

  1. Frontend sends POST → gets back a generation_id
  2. Frontend opens an SSE stream: GET /api/generate/progress/{generation_id}
  3. Server pushes status events as work progresses:

  { "phase": "analyzing",    "message": "Analyzing coffee image..." }
  { "phase": "generating",   "message": "Generating espresso profile..." }
  { "phase": "validating",   "message": "Validating profile schema..." }
  { "phase": "retrying",     "message": "Fixing validation issues (attempt 2/3)..." }
  { "phase": "uploading",    "message": "Uploading profile to machine..." }
  { "phase": "complete",     "message": "Profile created!", "result": {...} }
  { "phase": "failed",       "message": "...", "error": {...} }

Alternative: Use the existing WebSocket at /api/ws/live to multiplex progress events alongside MQTT telemetry.
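The server-side plumbing for the SSE option could look roughly like this. The in-memory queue broker and function names here are illustrative, not code that exists on the branch; the event shape mirrors the examples above.

```python
# Minimal sketch: one asyncio.Queue per generation_id; the POST handler
# calls start_generation(), the generation flow calls push_phase() at each
# transition, and the SSE endpoint returns stream_progress() frames.
import asyncio
import json
import uuid

_progress = {}  # generation_id -> asyncio.Queue of phase events

def start_generation() -> str:
    """POST /api/analyze_and_profile would return this id to the frontend."""
    generation_id = uuid.uuid4().hex
    _progress[generation_id] = asyncio.Queue()
    return generation_id

async def push_phase(generation_id: str, phase: str, message: str, **extra):
    """Called from the generation flow at each phase transition."""
    await _progress[generation_id].put({"phase": phase, "message": message, **extra})

def format_sse(event: dict) -> str:
    """Serialize one event in text/event-stream framing."""
    return f"data: {json.dumps(event)}\n\n"

async def stream_progress(generation_id: str):
    """Async generator a GET /api/generate/progress/{id} handler can stream."""
    queue = _progress[generation_id]
    while True:
        event = await queue.get()
        yield format_sse(event)
        if event["phase"] in ("complete", "failed"):
            break
```

In FastAPI this generator would be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`; the WebSocket alternative would push the same event dicts over /api/ws/live instead.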

Phase B: Segmented progress bar

Replace the single progress bar with a multi-segment progress bar where each segment represents a phase:

[Analyze ✓] [Generate ████░░] [Validate] [Upload]

Each segment completes as the server pushes phase transitions. If a retry happens, the validate segment resets with an indicator.

Phase C: Personality layer

Once the structured updates work, add the tongue-in-cheek barista personality:

| Phase | Plain message | Personality message |
| --- | --- | --- |
| analyzing | "Analyzing coffee image..." | "Checking out these beans... 👀" |
| generating | "Generating profile..." | "Channeling my inner barista genius..." |
| validating | "Validating schema..." | "Making sure the puck science checks out..." |
| retrying | "Fixing validation (attempt 2)..." | "Oops, let me re-tamp that... 🔨" |
| uploading | "Uploading to machine..." | "Sending the recipe to your Meticulous..." |
| complete | "Profile created!" | "Dial-in complete! Time to pull some shots! ☕" |

The existing loading message system (LOADING_MESSAGES in LoadingView.tsx + loading.messages in locales) can be repurposed as an idle animation within each phase.
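One way to keep the plain and personality strings side by side server-side is a simple phase table. This is a hypothetical sketch mirroring the mapping above; today the actual copy lives in the frontend locales.

```python
# Hypothetical server-side message table: phase -> (plain, personality).
# The actual strings currently live in LoadingView.tsx / locales.
PHASE_MESSAGES = {
    "analyzing":  ("Analyzing coffee image...",        "Checking out these beans... 👀"),
    "generating": ("Generating profile...",            "Channeling my inner barista genius..."),
    "validating": ("Validating schema...",             "Making sure the puck science checks out..."),
    "retrying":   ("Fixing validation (attempt 2)...", "Oops, let me re-tamp that... 🔨"),
    "uploading":  ("Uploading to machine...",          "Sending the recipe to your Meticulous..."),
    "complete":   ("Profile created!",                 "Dial-in complete! Time to pull some shots! ☕"),
}

def message_for(phase: str, personality: bool = False) -> str:
    """Pick the message variant for a phase event."""
    plain, fun = PHASE_MESSAGES[phase]
    return fun if personality else plain
```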

3. Performance benchmarking

Problem: We don't know how much the SDK migration actually improves speed, and we need a baseline to measure token optimization (#227) against.

Requirements:

  • Benchmark script/test that generates a profile from a standard input on Pi and measures wall-clock time for each phase
  • Target: Under 2 minutes total on Raspberry Pi (from 7 minutes in 2.0.x)
  • Breakdown: Track time spent in each phase separately:
    • Image analysis (SDK call)
    • Profile generation (SDK call)
    • Validation (local)
    • Retry attempts (SDK calls, if any)
    • Machine upload (HTTP to Meticulous)

How to benchmark on Pi:

# Time a profile generation via API
time curl -X POST http://192.168.50.22:3550/api/analyze_and_profile \
  -F "user_prefs=Ethiopian medium roast, fruity notes" \
  -w "\nHTTP %{http_code} in %{time_total}s\n"

Server-side timing is already partially instrumented (generation_start, create_start in the 2.1.0 branch). Extend to cover all phases and log a timing summary.
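A small context-manager helper could cover all phases uniformly. The `PhaseTimer` name is illustrative, not code that exists on the branch; it only assumes the phases listed in the breakdown above.

```python
# Sketch of per-phase wall-clock timing, extending the existing
# generation_start / create_start instrumentation to every phase.
import time
from contextlib import contextmanager

class PhaseTimer:
    def __init__(self):
        self.durations = {}  # phase name -> seconds

    @contextmanager
    def phase(self, name: str):
        """Wrap one phase: `with timer.phase("validation"): ...`"""
        start = time.monotonic()
        try:
            yield
        finally:
            self.durations[name] = time.monotonic() - start

    def summary(self) -> str:
        """One-line-per-phase timing summary to log after generation."""
        total = sum(self.durations.values())
        lines = [f"  {name}: {secs:.2f}s" for name, secs in self.durations.items()]
        return "\n".join([f"generation timing (total {total:.2f}s):", *lines])
```

Logging `timer.summary()` at the end of each run gives the per-phase baseline that the #227 token-optimization work can be measured against.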

Implementation plan

| Step | Description | Depends on | Estimated effort |
| --- | --- | --- | --- |
| 1 | Import ProfileValidator into FastAPI server (pure Python copy or shared package) | | Small |
| 2 | Add server-side pre-validation before async_create_profile | Step 1 | Small |
| 3 | Implement focused validation-retry loop (max 2 attempts, small prompt) | Step 2 | Medium |
| 4 | Add SSE endpoint for generation progress | | Medium |
| 5 | Push phase events from the profile generation flow | Step 4 | Medium |
| 6 | Frontend: consume SSE stream, show phase-aware progress | Step 5 | Medium |
| 7 | Frontend: segmented progress bar | Step 6 | Small |
| 8 | Add personality messages (Phase C) | Step 7 | Small |
| 9 | Performance benchmarking harness + Pi measurement | Steps 1–3 | Small |
| 10 | Token optimization integration (from #227) | Step 9 baseline | Medium |

Acceptance criteria

  • Profile generation uses server-side OEPF validation before machine upload
  • Failed validation triggers a focused retry (max 2) with specific error feedback
  • User sees real-time phase updates during generation (not just a spinner)
  • Progress bar reflects actual phases, not just elapsed time
  • Personality messages appear in the loading UX
  • Profile generation on Raspberry Pi benchmarked; target: ≤2 min (from ~7 min)
  • Timing breakdown logged server-side for each phase
  • All existing tests pass + new tests for retry logic and progress events
