[Enhancement]: PDF Field-to-Data Mapping is Positional-Only — Fragile, Unscalable, and Agency-Incompatible #42
Description
The current Fill.fill_form() method in backend.py maps extracted JSON data to PDF form fields using pure positional indexing — it sorts PDF annotations by their visual position (top-to-bottom, left-to-right) and fills them sequentially with answers_list[i]:
python
answers_list = list(textbox_answers.values())
i = 0
for annot in sorted_annots:
    if annot.Subtype == '/Widget' and annot.T:
        if i < len(answers_list):
            annot.V = f'{answers_list[i]}'
            i += 1
This approach has multiple fundamental problems:
Dictionary ordering dependency: The code relies on Python's dict insertion order, which lines up with the PDF only if the LLM returns fields in the exact order they appear in the definitions list. If the LLM response order changes, values silently go into the wrong fields.
No field name matching: The PDF field names (from annot.T) are completely ignored. A field named "Phone" in the PDF could receive the value for "Employee Name" — just because of its vertical position on the page.
Multi-form incompatibility: FireForm's core promise is "file everywhere" — filling multiple agency PDF forms from one JSON. But each agency's PDF has different field orders, names, and layouts. A positional approach means the same JSON data would fill completely different fields in different agency PDFs.
Fragile to PDF layout changes: If an agency reorders or adds a single field in their PDF template, every subsequent field gets misaligned.
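The first two problems can be reproduced without a PDF at all. The snippet below is a minimal sketch, not FireForm code: `fill_positionally` is a hypothetical stand-in for the positional loop above, and the field names and sample values are made up for illustration.

```python
# Hypothetical PDF field names, already sorted top-to-bottom as the current code does.
pdf_fields = ["Employee Name", "Phone", "Email Address"]


def fill_positionally(fields, answers):
    """Mimic the current behavior: pair the i-th field with the i-th answer value."""
    return dict(zip(fields, answers.values()))


# The LLM returns keys in the same order as the PDF fields ...
in_order = {
    "employee_name": "Ada Lovelace",
    "phone_number": "555-0100",
    "email": "ada@example.com",
}
# ... and the same data with the first two keys swapped.
reordered = {
    "phone_number": "555-0100",
    "employee_name": "Ada Lovelace",
    "email": "ada@example.com",
}

good = fill_positionally(pdf_fields, in_order)
bad = fill_positionally(pdf_fields, reordered)

print(good["Phone"])  # 555-0100
print(bad["Phone"])   # Ada Lovelace
```

Both inputs carry identical data, yet the second one puts a name into the phone field without raising any error.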
Proposed Solution
Implement a template mapping configuration system:
Create a per-agency mapping config file (JSON/YAML) that maps standardized JSON keys to specific PDF field names:
yaml
# agency_a_mapping.yaml
mappings:
  employee_name: "Text Field 1"
  phone_number: "Phone"
  email: "Email Address"
  date: "Date_Field"
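A config like this is easier to trust if it is validated when loaded. The sketch below is one possible loader, under a few assumptions: it reads the JSON variant of the config (the proposal allows JSON or YAML, and `json` ships with Python), the name `load_mapping` is hypothetical, and rejecting two keys that target the same PDF field is a suggested rule rather than a stated requirement.

```python
import json


def load_mapping(text):
    """Parse a mapping config and return {pdf_field_name: json_key}.

    Raises ValueError if two JSON keys target the same PDF field,
    since one of the two values would silently overwrite the other.
    """
    config = json.loads(text)
    field_to_key = {}
    for json_key, pdf_field in config["mappings"].items():
        if pdf_field in field_to_key:
            raise ValueError(
                f"PDF field {pdf_field!r} is mapped from both "
                f"{field_to_key[pdf_field]!r} and {json_key!r}"
            )
        field_to_key[pdf_field] = json_key
    return field_to_key


# JSON equivalent of the example mapping for agency A.
AGENCY_A = """
{
  "mappings": {
    "employee_name": "Text Field 1",
    "phone_number": "Phone",
    "email": "Email Address",
    "date": "Date_Field"
  }
}
"""

field_to_key = load_mapping(AGENCY_A)
print(field_to_key["Phone"])  # phone_number
```

Returning the inverted dictionary means the fill loop can look up each PDF field name directly.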
Modify Fill.fill_form() to use field name matching instead of positional indexing:
python
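# `mapping` is the config's json_key -> PDF field name dict; invert it for lookup.
field_to_key = {pdf_field: json_key for json_key, pdf_field in mapping.items()}
for annot in page.Annots or []:  # a page may have no annotations
    if annot.Subtype != '/Widget' or not annot.T:
        continue
    field_name = annot.T[1:-1]  # strip the enclosing parentheses
    json_key = field_to_key.get(field_name)
    if json_key is not None:
        annot.V = str(textbox_answers.get(json_key, ""))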
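Name-based matching also makes it possible to report what was not filled, rather than failing silently. The following is a rough sketch of that idea kept separate from any PDF library so it can be tested in isolation. The function name `resolve_values`, its return shape, and the sample data are all assumptions for illustration.

```python
def resolve_values(pdf_field_names, field_to_key, answers):
    """Return (values, unmapped_fields, missing_keys).

    values          : {pdf_field_name: string to write into the field}
    unmapped_fields : PDF fields that have no entry in the mapping
    missing_keys    : mapped JSON keys that are absent from the answers
    """
    values, unmapped, missing = {}, [], []
    for name in pdf_field_names:
        json_key = field_to_key.get(name)
        if json_key is None:
            unmapped.append(name)
        elif json_key not in answers:
            missing.append(json_key)
        else:
            values[name] = str(answers[json_key])
    return values, unmapped, missing


# PDF field -> JSON key lookup, i.e. the mapping config inverted.
field_to_key = {
    "Text Field 1": "employee_name",
    "Phone": "phone_number",
    "Email Address": "email",
}
answers = {"employee_name": "Ada Lovelace"}  # extraction found only a name

values, unmapped, missing = resolve_values(
    ["Phone", "Text Field 1", "Signature"], field_to_key, answers
)
print(values)    # {'Text Field 1': 'Ada Lovelace'}
print(unmapped)  # ['Signature']
print(missing)   # ['phone_number']
```

The order of the PDF fields no longer affects which value lands where, and the two lists give a caller something to log or surface to the user.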
Provide a CLI tool or utility to auto-generate mapping templates by extracting field names from a PDF.
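One possible shape for such a utility is sketched below. It assumes the pdfrw library, whose attribute style (`annot.T`, `annot.Subtype`) matches the snippets in this issue, and it reuses the same `[1:-1]` slice, so field names stored in other encodings would need extra handling. The function names and the snake_case key suggestion are hypothetical, and the template is emitted as JSON to avoid a YAML dependency.

```python
import json
import re


def suggest_key(field_name):
    """Turn a PDF field name into a snake_case placeholder key."""
    slug = re.sub(r"[^0-9a-zA-Z]+", "_", field_name).strip("_").lower()
    return slug or "field"


def build_template(field_names):
    """Return {"mappings": {suggested_key: pdf_field_name}} for hand editing."""
    mappings = {}
    for name in field_names:
        key, n = suggest_key(name), 2
        while key in mappings:  # keep suggested keys unique
            key = f"{suggest_key(name)}_{n}"
            n += 1
        mappings[key] = name
    return {"mappings": mappings}


def extract_field_names(pdf_path):
    """Collect widget field names from a PDF (assumes pdfrw is installed)."""
    from pdfrw import PdfReader  # third-party; imported lazily on purpose

    names = []
    for page in PdfReader(pdf_path).pages:
        for annot in page.Annots or []:
            if annot.Subtype == "/Widget" and annot.T:
                names.append(annot.T[1:-1])  # strip the enclosing parentheses
    return names


# Demonstrated with literal names; in practice, pass extract_field_names(path).
template = build_template(["Text Field 1", "Phone", "Email Address", "Phone"])
print(json.dumps(template, indent=2))
```

The suggested keys are only placeholders. A maintainer would still rename them to the standardized JSON keys (for example `text_field_1` to `employee_name`) before committing the mapping.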
Impact
- This is the single most important architectural issue blocking FireForm's core "file everywhere" capability.
- Without field-name-based mapping, FireForm cannot reliably fill even a single agency's form, let alone several.
- It aligns directly with the GSoC deliverable: "configurable PDF template mapping and auto-fill system capable of supporting multiple agency forms".
✅ Acceptance Criteria
How will we know this is finished?
- Feature works in Docker container.
- Documentation updated in docs/.
- JSON output validates against the schema.