Clinical Operations & Records Automation
CORA turns clinical PDFs or images into structured JSON. You upload a file; a FastAPI backend classifies pages (text vs scan), chunks them, runs GPT‑4o with rules from a YAML skill file, then merges everything into a single schema (document_metadata, patients, visits, labs, imaging, transition-of-care flags, and more). A Next.js UI can drive uploads, show results, and offer grounded patient chat. Jobs are async so big files are not tied to one short HTTP request.
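The merged schema's top-level shape can be sketched as follows. This is a hypothetical minimal sketch: only the section names come from the overview above; every nested field (and the `transition_of_care` key name) is an assumption, so check `example-runs/` for real outputs.

```python
# Hypothetical minimal result shape; top-level sections are from the
# overview above, nested fields are illustrative assumptions only.
result = {
    "document_metadata": {"filename": "input_1.pdf", "page_count": 3},
    "patients": [{"name": "Jane Doe", "risk_tier": "LOW"}],
    "visits": [],
    "labs": [],
    "imaging": [],
    "transition_of_care": {"flags": []},  # key name is an assumption
}

# Every section is present even when empty, so consumers can index safely.
assert set(result) >= {"document_metadata", "patients", "visits", "labs", "imaging"}
```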
Clinical wording lives in backend/cora_skill.yaml, not hard-coded in Python—change the YAML to tune extraction without rewriting the pipeline. At API startup the app loads that skill from disk (CORA_SKILL_PATH, or backend/cora_skill.yaml when you run from backend/); there is no separate skill upload on each ingest request.
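The startup lookup described above can be sketched like this. It is a simplified sketch of the env-var fallback, not the real loader in `backend/skill_loader.py`:

```python
import os

def resolve_skill_path(env=os.environ) -> str:
    """Return the skill file path: CORA_SKILL_PATH if set, otherwise the
    default cora_skill.yaml relative to the backend/ working directory."""
    return env.get("CORA_SKILL_PATH", "cora_skill.yaml")

# With no override, the default relative path is used.
assert resolve_skill_path({}) == "cora_skill.yaml"
# An absolute override wins.
assert resolve_skill_path({"CORA_SKILL_PATH": "/etc/cora/skill.yaml"}) == "/etc/cora/skill.yaml"
```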
| What | Where |
|---|---|
| REST API + pipeline source | backend/main.py (HTTP), backend/pipeline.py (orchestration), classifier.py, chunker.py, extractor.py, schemas.py, job_store.py, patient_store.py, tests in backend/tests/ |
| Skill file (complete) + how the loader treats it | backend/cora_skill.yaml — full rules, loaded once at startup (see above). backend/skill_loader.py documents what Python reads (boundary_patterns, chunk size) vs what is passed verbatim to the model. More design notes in CORA-Plan.md |
| Example inputs → example API JSON | example-runs/ — paired input_* files (PDF and/or PNG/JPEG/etc.) with output_*.json. Each JSON is the body of GET /v1/jobs/{job_id}/result, not the UI’s reshaped view. document_metadata.filename in JSON may show the server-stored name (e.g. UUID); repo filenames are renamed for clarity. Use synthetic / de-identified data only |
| Model / system card | MODEL_CARD.md — capabilities, limits, failure modes, privacy |
Optional: GET /v1/jobs/{job_id}/trace for provenance-style fields; interactive API reference at http://127.0.0.1:8000/docs when the backend is running.
Needs: Python 3.12+, Node 20+, and an OpenAI API key with access to gpt‑4o (vision is used for scanned pages). If your venv path contains spaces, PyMuPDF can fail to build; create the venv in a path without spaces, e.g. `~/venvs/cora`.
Backend (from `backend/`):

```shell
pip install -r requirements.txt
export OPENAI_API_KEY="sk-..."
# optional: export CORA_SKILL_PATH=/absolute/path/to/skill.yaml
#           (default: backend/cora_skill.yaml when cwd is backend/)
uvicorn main:app --reload --port 8000
```

Frontend (from `frontend/`): copy `.env.example` to `frontend/.env.local`, set `NEXT_PUBLIC_CORA_API_URL=http://127.0.0.1:8000`, then `npm install` and `npm run dev`.
Smoke test: POST /v1/documents/ingest with a supported file (see below) → poll GET /v1/jobs/{job_id} until complete → GET /v1/jobs/{job_id}/result.
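The poll step of the smoke test can be sketched as a small client helper. This is hypothetical: `wait_for_job` is not part of the repo, and `"failed"` as a terminal status is an assumption (the doc only mentions polling until `complete`).

```python
import time

def wait_for_job(get_status, job_id: str, interval: float = 1.0, timeout: float = 120.0) -> str:
    """Poll get_status(job_id) until the job reaches a terminal state.

    get_status is any callable returning the job's status string, e.g. a
    wrapper around GET /v1/jobs/{job_id}. 'failed' as a terminal status is
    an assumption; adjust to what the backend actually reports.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status in ("complete", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {timeout}s")

# Simulated backend: two 'processing' polls, then 'complete'.
statuses = iter(["processing", "processing", "complete"])
assert wait_for_job(lambda _id: next(statuses), "job-1", interval=0.0) == "complete"
```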
```shell
curl -X POST http://127.0.0.1:8000/v1/documents/ingest \
  -F "file=@example-runs/input_1.pdf"
```

Tests: `cd backend && pytest` · `cd frontend && npm run test`
- Ingest: PDF or PNG, JPEG, GIF, or WebP (validated via magic bytes). Scans inside PDFs and image uploads take the vision-capable path where needed. DICOM is not supported. The skill file is not sent per request; it is loaded at startup (`CORA_SKILL_PATH`, or the default `cora_skill.yaml`).
- Flow: upload → classify → chunk with regex boundaries from the skill file → extract per chunk with a rolling summary → `_build_final_result` → SQLite + `backend/results/{job_id}.json`.
- Risk tier (`HIGH`/`MEDIUM`/`LOW` on each patient): a simple rules layer in `pipeline.py` derived from discharge/follow-up/med-rec windows and flags; it is not a clinical score. See `MODEL_CARD.md` for intent.
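Magic-byte validation like the ingest step describes can be sketched as follows. This is a sketch, not the backend's actual validator; it only checks the leading bytes of the upload.

```python
def sniff_format(head: bytes):
    """Identify a supported upload by its leading bytes (magic numbers)."""
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if head.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if head[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "webp"
    return None  # unsupported (e.g. DICOM) -> reject the upload

assert sniff_format(b"%PDF-1.7 ...") == "pdf"
assert sniff_format(b"RIFF\x00\x00\x00\x00WEBPVP8 ") == "webp"
assert sniff_format(b"DICM") is None
```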
| Method | Path |
|---|---|
| POST | /v1/documents/ingest |
| GET | /v1/jobs/{job_id}, /v1/jobs/{job_id}/result, /v1/jobs/{job_id}/trace |
| GET | /v1/patients, /v1/patients/{id}, /v1/patients/{id}/timeline |
| POST | /v1/patients/chat |
| GET | /v1/metrics/bootstrap, /v1/metrics/dashboard |
Details and try-it: Swagger at /docs.
backend/ FastAPI app, pipeline, skill YAML, SQLite (runtime), tests
frontend/ Next.js app
example-runs/ sample PDFs/images + saved `…/result` JSON
CORA-Plan.md deeper architecture / contracts
MODEL_CARD.md limitations & privacy
Do not commit .env or real PHI. See .gitignore.
Ramblin' Hacks 2026 · Georgia Tech College of Computing · Healthcare Track — Elevance Health Challenge
- Rohit Prakash Gogi — Georgia Tech, Computer Science (AI & Information Networks)
- Keerthi Veeramachaneni — Georgia Tech, Computer Science (AI & Theory)