Closed
Architecture Reference
Stage 2 (Semantic Logic Parsing): 'LLMs transform raw video transcripts and visual cues into structured, actionable instructions.'
Current State
The pipeline extracts basic metadata (title, technologies, features list) but does NOT produce structured, actionable instructions. The extracted_info dict contains:
- title (often defaults to a generic value)
- technologies (list of strings)
- features (list of strings)
- tutorial_steps (raw text, not structured)
Gap
The architecture calls for 'structured, actionable instructions': step-by-step build plans that Stage 3 (code generation) can follow deterministically. Currently Stage 3 receives loose text and falls back to templates.
Implementation
- Add a structured output schema for Gemini's video analysis response (JSON mode)
- Extract ordered build steps with dependencies: 'Step 1: Create React app → Step 2: Install tailwind → Step 3: Create Header component...'
- Each step should include: action, file path, code snippet (if visible), dependencies
- Output a BuildPlan JSON artifact that Stage 3 consumes
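A minimal sketch of what the BuildPlan artifact could look like, assuming the model's JSON-mode response arrives as a string. The field names (order, action, file_path, code, depends_on), the BuildStep class, and the parse_build_plan helper are hypothetical illustrations of the four per-step fields listed above, not an existing schema in this repository.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BuildStep:
    order: int                       # position in the plan, e.g. 1, 2, 3
    action: str                      # hypothetical action type, e.g. "run_command"
    file_path: Optional[str] = None  # target file, if the step touches one
    code: Optional[str] = None       # code snippet, if visible in the video
    depends_on: List[int] = field(default_factory=list)  # prerequisite step orders


@dataclass
class BuildPlan:
    title: str
    steps: List[BuildStep]


def parse_build_plan(raw_json: str) -> BuildPlan:
    """Parse a JSON string into a BuildPlan and check its dependencies."""
    data = json.loads(raw_json)
    steps = [
        BuildStep(
            order=s["order"],
            action=s["action"],
            file_path=s.get("file_path"),
            code=s.get("code"),
            depends_on=list(s.get("depends_on", [])),
        )
        for s in data["steps"]
    ]
    steps.sort(key=lambda s: s.order)

    # Reject plans whose steps reference a prerequisite that does not exist.
    known = {s.order for s in steps}
    for s in steps:
        missing = [d for d in s.depends_on if d not in known]
        if missing:
            raise ValueError(f"step {s.order} depends on unknown steps {missing}")

    return BuildPlan(title=data.get("title", ""), steps=steps)
```

Failing fast on a malformed plan keeps Stage 3 from silently falling back to templates when the analysis output is incomplete.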
Acceptance Criteria
- Video analysis returns a structured BuildPlan with ordered steps
- Each step has: action type, target file, code content, prerequisites
- Code generator consumes BuildPlan instead of raw text
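One way the code generator might consume such a plan is to walk the steps in prerequisite order. The sketch below assumes the plan is a plain dict with a steps list whose entries carry hypothetical order and depends_on keys; it uses the standard-library graphlib module (Python 3.9+) and is an illustration, not the pipeline's actual consumer.

```python
from graphlib import CycleError, TopologicalSorter


def ordered_steps(plan: dict) -> list:
    """Return the plan's steps so that every prerequisite precedes its dependents."""
    steps = {s["order"]: s for s in plan["steps"]}

    # Map each step to the set of steps that must run before it.
    graph = {order: set(s.get("depends_on", [])) for order, s in steps.items()}

    unknown = {d for deps in graph.values() for d in deps} - set(steps)
    if unknown:
        raise ValueError(f"unknown prerequisite steps: {sorted(unknown)}")

    try:
        order = list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError("build plan has cyclic prerequisites") from exc

    return [steps[o] for o in order]
```

The relative order of steps with no dependency relation between them is left to the sorter here; a consumer that needs a fully reproducible order could add its own tie-break, for example on the step number.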