User Story:
As a form creator (Maya), in order to digitize a paper form without technical skills, I want to upload a PDF form and review the structured specs the system extracted from it.
Preconditions:
Maya is authenticated (Slice 1)
PDF form available for upload
Acceptance Criteria:
Upload page accepts PDF files
System extracts structure from PDF and produces a DataCollectionSpec
System generates a default FormSpec based on the extracted DataCollectionSpec
Both specs are displayed in the catalog as browsable, reviewable content
Maya can see which fields were extracted, along with their types, grouping, and conditions
Maya can see the proposed form layout (pages, sections, delivery modes)
Extracted specs are persisted as a FormProject in git
Extraction errors or low-confidence fields are flagged for review
Form projects are stored as bare git repos with version history
Project detail page shows version history with commit-level snapshots
Projects are publicly viewable at user-scoped URLs (/:owner/:slug)
Mutations (delete, re-extract) are restricted to project owners via service-layer permission checks
Authenticated users can fork projects they do not own
User profile pages list a user's projects at /:owner
Git repository browsing (tree, blob, commits) available at GitHub-style URLs
Read-only git clone served over HTTP
Home page shows dashboard for authenticated users, landing page for anonymous visitors
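The review-flagging criterion could be implemented roughly as in the minimal TypeScript sketch below. It assumes the extractor attaches a per-field confidence score between 0 and 1 and an optional error message. The `ExtractedField` and `ReviewFlag` shapes, the `flagForReview` helper, and the 0.7 threshold are hypothetical, because the story does not define how confidence is reported.

```typescript
// Hypothetical shape of one field in an extracted DataCollectionSpec.
// The per-field `confidence` score (0..1) is an assumption; the story does
// not say how the extractor reports confidence.
interface ExtractedField {
  id: string;
  label: string;
  type: string;
  confidence: number;
  error?: string; // set when the extractor could not read this field
}

interface ReviewFlag {
  fieldId: string;
  reason: "extraction_error" | "low_confidence";
}

// Assumed cutoff; a real value would be tuned against the ground-truth evaluation.
const DEFAULT_CONFIDENCE_THRESHOLD = 0.7;

// Returns one flag per field that needs human review. Errors take precedence
// over low confidence, so each field is flagged at most once.
function flagForReview(
  fields: ExtractedField[],
  threshold = DEFAULT_CONFIDENCE_THRESHOLD,
): ReviewFlag[] {
  const flags: ReviewFlag[] = [];
  for (const field of fields) {
    if (field.error !== undefined) {
      flags.push({ fieldId: field.id, reason: "extraction_error" });
    } else if (field.confidence < threshold) {
      flags.push({ fieldId: field.id, reason: "low_confidence" });
    }
  }
  return flags;
}

const flags = flagForReview([
  { id: "name", label: "Full name", type: "text", confidence: 0.95 },
  { id: "dob", label: "Date of birth", type: "date", confidence: 0.4 },
  { id: "sig", label: "Signature", type: "unknown", confidence: 0, error: "unreadable region" },
]);
console.log(flags);
```

In this example, `dob` is flagged as `low_confidence` and `sig` as `extraction_error`, while `name` passes. Because the flags are returned as data and the spec itself is left unchanged, the catalog view could render them alongside the extracted fields.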
Success Metrics:
Extraction accuracy: percentage of fields correctly identified vs. source PDF
Time from upload to reviewable spec < 30 seconds
Establish baseline evaluation metrics for LLM extraction quality
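The sketch below shows one possible way to compute the extraction-accuracy metric against a manually created ground truth. Fields are matched by normalized label, and a match counts as correct only if the type also agrees. The matching rule, the `FieldSummary` and `AccuracyReport` shapes, and `scoreExtraction` are all assumptions. Matching by label also merges duplicate labels, so forms that repeat a label would need a stronger key.

```typescript
// Minimal field description used for scoring; a hypothetical subset of the
// real DataCollectionSpec field shape.
interface FieldSummary {
  label: string;
  type: string;
}

interface AccuracyReport {
  correct: number;     // ground-truth fields found with the right type
  wrongType: number;   // found by label, but with a different type
  missing: number;     // ground-truth fields not extracted at all
  spurious: number;    // extracted fields with no ground-truth counterpart
  accuracyPct: number; // correct / number of ground-truth fields * 100
}

// Normalize labels so case and whitespace differences are not counted as misses.
function normalize(label: string): string {
  return label.trim().toLowerCase().replace(/\s+/g, " ");
}

function scoreExtraction(extracted: FieldSummary[], groundTruth: FieldSummary[]): AccuracyReport {
  const extractedByLabel = new Map<string, FieldSummary>();
  for (const field of extracted) extractedByLabel.set(normalize(field.label), field);

  let correct = 0;
  let wrongType = 0;
  let missing = 0;
  const matched = new Set<string>();
  for (const truth of groundTruth) {
    const key = normalize(truth.label);
    const found = extractedByLabel.get(key);
    if (found === undefined) {
      missing++;
      continue;
    }
    matched.add(key);
    if (found.type === truth.type) correct++;
    else wrongType++;
  }

  const spurious = Array.from(extractedByLabel.keys()).filter((k) => !matched.has(k)).length;
  const accuracyPct = groundTruth.length === 0 ? 0 : (correct / groundTruth.length) * 100;
  return { correct, wrongType, missing, spurious, accuracyPct };
}

const report = scoreExtraction(
  [
    { label: "Full Name", type: "text" },
    { label: "Date of Birth", type: "text" },
    { label: "Email", type: "email" },
    { label: "Phone", type: "text" },
  ],
  [
    { label: "full name", type: "text" },
    { label: "date of birth", type: "date" },
    { label: "email", type: "email" },
    { label: "address", type: "text" },
  ],
);
console.log(report);
```

In this example, two of the four ground-truth fields are correct, so `accuracyPct` is 50. The report also shows one wrong type (date of birth), one missing field (address), and one spurious extraction (phone). Reporting the failure categories separately from the headline percentage can make it easier to compare models or prompting strategies.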
Notes:
First LLM integration point: uses the Claude API (Opus/Sonnet baseline)
LLM service uses the strategy pattern: PdfExtractor interface with ApiPdfExtractor implementation
Evaluation: compare extracted spec against manually-created ground truth for test PDFs
Future experiments: alternative models, prompting strategies, chunking approaches
Form projects stored as bare git repos at data/repos/<slug>.git
ProjectService layer enforces ownership permissions; route handlers are thin wrappers
GitHub-style URL structure: /:owner/:slug, /:owner/:slug/tree/:ref/*, /:owner/:slug/settings, etc.
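The ownership rules in these notes could be sketched as follows, assuming a storage port that sits over the bare git repos. In this sketch, `ProjectService` makes every permission decision: reads are public, only the owner can delete, and any authenticated non-owner can fork. The route handler only translates service results into HTTP status codes. All names here (`ProjectStore`, `ServiceResult`, `deleteProjectRoute`, `InMemoryProjectStore`) are hypothetical. The fork also only records the new project and does not copy a git repository.

```typescript
// Hypothetical domain types, for illustration only.
interface User {
  username: string;
}

interface Project {
  owner: string;
  slug: string;
}

// Hypothetical persistence port; a real implementation would manage the
// bare git repos under data/repos/ instead of an in-memory list.
interface ProjectStore {
  find(owner: string, slug: string): Project | undefined;
  save(project: Project): void;
  remove(project: Project): void;
}

type ServiceError = "not_found" | "unauthenticated" | "forbidden" | "conflict";
type ServiceResult<T> = { ok: true; value: T } | { ok: false; error: ServiceError };

class ProjectService {
  private readonly store: ProjectStore;

  constructor(store: ProjectStore) {
    this.store = store;
  }

  // Projects are publicly viewable, so reads have no permission check.
  view(owner: string, slug: string): ServiceResult<Project> {
    const project = this.store.find(owner, slug);
    if (project === undefined) return { ok: false, error: "not_found" };
    return { ok: true, value: project };
  }

  // Mutation: only the owner may delete.
  delete(actor: User | null, owner: string, slug: string): ServiceResult<null> {
    const found = this.view(owner, slug);
    if (!found.ok) return found;
    if (actor === null) return { ok: false, error: "unauthenticated" };
    if (actor.username !== found.value.owner) return { ok: false, error: "forbidden" };
    this.store.remove(found.value);
    return { ok: true, value: null };
  }

  // Any authenticated user may fork a project they do not own.
  fork(actor: User | null, owner: string, slug: string): ServiceResult<Project> {
    const found = this.view(owner, slug);
    if (!found.ok) return found;
    if (actor === null) return { ok: false, error: "unauthenticated" };
    if (actor.username === found.value.owner) return { ok: false, error: "forbidden" };
    if (this.store.find(actor.username, slug) !== undefined) return { ok: false, error: "conflict" };
    const copy: Project = { owner: actor.username, slug };
    this.store.save(copy);
    return { ok: true, value: copy };
  }
}

// Thin route handler: no permission logic of its own; it only maps the
// service result to an HTTP status code.
const STATUS_BY_ERROR: Record<ServiceError, number> = {
  not_found: 404,
  unauthenticated: 401,
  forbidden: 403,
  conflict: 409,
};

function deleteProjectRoute(service: ProjectService, actor: User | null, params: { owner: string; slug: string }): number {
  const result = service.delete(actor, params.owner, params.slug);
  return result.ok ? 204 : STATUS_BY_ERROR[result.error];
}

// In-memory stand-in for the git-backed store, so the sketch runs on its own.
class InMemoryProjectStore implements ProjectStore {
  private projects: Project[] = [];

  find(owner: string, slug: string): Project | undefined {
    return this.projects.find((p) => p.owner === owner && p.slug === slug);
  }

  save(project: Project): void {
    this.projects.push(project);
  }

  remove(project: Project): void {
    this.projects = this.projects.filter((p) => p !== project);
  }
}
```

The service returns a result value instead of throwing an exception. This keeps the mapping from permission failures to 401, 403, and 404 in a single table, which is the only logic a thin handler needs.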
Definition of Done:
Acceptance criteria met
Threat model updated -- any new trust boundaries, data flows, or attack surfaces are reflected in catalog/architecture/threat-model.md
Technical documentation updated -- architecture docs and decisions are current
LLM extraction service has interface abstraction (swappable implementations)
At least one test PDF with ground truth for evaluation
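Below is a sketch of the swappable extraction service. It assumes the extractor receives raw PDF bytes and returns a simplified `DataCollectionSpec`. The LLM call is injected as a hypothetical `LlmCall` function instead of being written against the real Claude API client, so the example runs without credentials. The prompt text and the validation helpers are also illustrative only. The model's reply crosses a trust boundary, so the sketch checks its shape before returning it.

```typescript
// Hypothetical minimal shape of a DataCollectionSpec; the real spec also
// carries grouping and conditions.
interface DataCollectionSpec {
  title: string;
  fields: { id: string; label: string; type: string }[];
}

// Strategy interface: the upload flow depends only on this, so alternative
// models, prompts, or chunking approaches can be swapped in.
interface PdfExtractor {
  readonly name: string;
  extract(pdf: Uint8Array): Promise<DataCollectionSpec>;
}

// Hypothetical transport that sends a PDF and prompt to an LLM API and returns
// the raw text reply. A real version would wrap the Claude API client; it is
// injected here so the sketch runs without network access.
type LlmCall = (request: { model: string; prompt: string; pdf: Uint8Array }) => Promise<string>;

// Placeholder prompt, not a tested prompt.
const EXTRACTION_PROMPT =
  'Return the form fields as JSON: {"title": string, "fields": [{"id", "label", "type"}]}';

class ApiPdfExtractor implements PdfExtractor {
  readonly name: string;
  private readonly call: LlmCall;
  private readonly model: string;

  constructor(call: LlmCall, model: string) {
    this.call = call;
    this.model = model;
    this.name = `api:${model}`;
  }

  async extract(pdf: Uint8Array): Promise<DataCollectionSpec> {
    const reply = await this.call({ model: this.model, prompt: EXTRACTION_PROMPT, pdf });
    const parsed: unknown = JSON.parse(reply);
    // Validate untrusted model output before treating it as a spec.
    if (!isDataCollectionSpec(parsed)) {
      throw new Error(`${this.name}: reply is not a valid DataCollectionSpec`);
    }
    return parsed;
  }
}

// Checks that one field has string id, label, and type.
function isField(f: unknown): boolean {
  if (typeof f !== "object" || f === null) return false;
  const r = f as Record<string, unknown>;
  return typeof r.id === "string" && typeof r.label === "string" && typeof r.type === "string";
}

function isDataCollectionSpec(value: unknown): value is DataCollectionSpec {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.title === "string" && Array.isArray(v.fields) && v.fields.every(isField);
}
```

Under this design, an alternative model, prompting strategy, or chunking approach would be another class that implements `PdfExtractor`. The evaluation harness could then run each implementation against the same ground-truth PDFs.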