CORA

Clinical Operations & Records Automation

CORA turns clinical PDFs or images into structured JSON. You upload a file; a FastAPI backend classifies pages (text vs scan), chunks them, runs GPT‑4o with rules from a YAML skill file, then merges everything into a single schema (document_metadata, patients, visits, labs, imaging, transition-of-care flags, and more). A Next.js UI can drive uploads, show results, and offer grounded patient chat. Jobs are async so big files are not tied to one short HTTP request.

Clinical wording lives in backend/cora_skill.yaml, not hard-coded in Python—change the YAML to tune extraction without rewriting the pipeline. At API startup the app loads that skill from disk (CORA_SKILL_PATH, or backend/cora_skill.yaml when you run from backend/); there is no separate skill upload on each ingest request.
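The lookup order described above (environment override first, then the default file in the working directory) can be sketched as follows. This is an illustrative sketch only, not the code in `backend/skill_loader.py`: the function names `resolve_skill_path` and `load_skill_text` are hypothetical, and only the `CORA_SKILL_PATH` variable and the `cora_skill.yaml` filename come from this README.

```python
import os
from pathlib import Path

DEFAULT_SKILL_NAME = "cora_skill.yaml"  # default filename named in this README


def resolve_skill_path(env=None, cwd=None):
    """Pick the skill file: CORA_SKILL_PATH if set, else <cwd>/cora_skill.yaml.

    Hypothetical helper; `env` and `cwd` are injectable so the lookup
    can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    override = env.get("CORA_SKILL_PATH", "").strip()
    if override:
        return Path(override).expanduser()
    return cwd / DEFAULT_SKILL_NAME


def load_skill_text(path):
    """Read the skill file once; fail loudly at startup if it is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Skill file not found: {path}")
    return path.read_text(encoding="utf-8")
```

Resolving the path once at startup means a changed YAML file presumably takes effect only after the API process restarts.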

Deliverables (where to find things)

What	Where
REST API + pipeline source	`backend/main.py` (HTTP), `backend/pipeline.py` (orchestration), `classifier.py`, `chunker.py`, `extractor.py`, `schemas.py`, `job_store.py`, `patient_store.py`, tests in `backend/tests/`
Skill file (complete) + how the loader treats it	`backend/cora_skill.yaml` — full rules, loaded once at startup (see above). `backend/skill_loader.py` documents what Python reads (`boundary_patterns`, chunk size) vs what is passed verbatim to the model. More design notes in `CORA-Plan.md`
Example inputs → example API JSON	`example-runs/` — paired `input_*` files (PDF and/or PNG/JPEG/etc.) with `output_*.json`. Each JSON is the body of `GET /v1/jobs/{job_id}/result`, not the UI’s reshaped view. `document_metadata.filename` in JSON may show the server-stored name (e.g. UUID); repo filenames are renamed for clarity. Use synthetic / de-identified data only.
Model / system card	`MODEL_CARD.md` — capabilities, limits, failure modes, privacy

Optional: GET /v1/jobs/{job_id}/trace for provenance-style fields; interactive API reference at http://127.0.0.1:8000/docs when the backend is running.

Run it

Needs: Python 3.12+, Node 20+, an OpenAI key with gpt‑4o (vision for scanned pages). If your venv path has spaces, PyMuPDF can fail to build—use a venv under e.g. ~/venvs/cora.

Backend (from backend/):

pip install -r requirements.txt
export OPENAI_API_KEY="sk-..."
# optional: export CORA_SKILL_PATH=/absolute/path/to/skill.yaml  (default: backend/cora_skill.yaml when cwd is backend/)
uvicorn main:app --reload --port 8000

Frontend (from frontend/): copy .env.example to frontend/.env.local, set NEXT_PUBLIC_CORA_API_URL=http://127.0.0.1:8000, then npm install and npm run dev.

Smoke test: POST /v1/documents/ingest with a supported file (see below) → poll GET /v1/jobs/{job_id} until complete → GET /v1/jobs/{job_id}/result.

curl -X POST http://127.0.0.1:8000/v1/documents/ingest \
  -F "file=@example-runs/input_1.pdf"
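The polling half of the smoke test could be scripted along these lines, using only the standard library. The endpoint paths come from this README; the `status` field name and the `"complete"` / `"failed"` values are assumptions about the job payload, so check the Swagger page at /docs for the real shape before relying on them.

```python
import json
import time
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # local backend address used in this README


def get_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_result(job_id, fetch=get_json, interval=2.0, timeout=600.0,
                    sleep=time.sleep):
    """Poll the job until it finishes, then return the result JSON.

    Assumption: the job payload has a "status" field that becomes
    "complete" on success and "failed" on error.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(f"{BASE_URL}/v1/jobs/{job_id}")
        status = job.get("status")  # assumed field name
        if status == "complete":
            return fetch(f"{BASE_URL}/v1/jobs/{job_id}/result")
        if status == "failed":  # assumed failure value
            raise RuntimeError(f"Job {job_id} failed: {job}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} did not finish in {timeout}s")
        sleep(interval)
```

With the backend running, `wait_for_result(job_id)` would be called with the job ID returned by the ingest request. The `fetch` and `sleep` parameters exist only so the loop can be tested without a live server.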

Tests: cd backend && pytest · cd frontend && npm run test

Behavior notes (short)

Ingest: PDF or PNG, JPEG, GIF, or WebP (validated magic bytes). Scans inside PDFs and image uploads use the vision-capable path where needed. DICOM is not supported. The skill file is not sent per request—it is loaded at startup (CORA_SKILL_PATH or default cora_skill.yaml).
Flow: Upload → classify → chunk with regex boundaries from the skill file → extract per chunk with rolling summary → _build_final_result → SQLite + backend/results/{job_id}.json.
Risk tier (HIGH / MEDIUM / LOW on each patient): a simple rules layer in pipeline.py from discharge/follow-up/med-rec windows and flags—not a clinical score. See MODEL_CARD.md for intent.
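To make the "validated magic bytes" check in the ingest note concrete, here is a minimal sketch that identifies the supported formats from a file's leading bytes. The signatures are the standard ones for each format; the function name `sniff_upload_type` is hypothetical and this is not necessarily how the backend implements the check.

```python
def sniff_upload_type(head: bytes):
    """Return a format label from leading bytes, or None if unsupported.

    Hypothetical helper: pass at least the first 12 bytes of the upload.
    """
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if head.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if head.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if len(head) >= 12 and head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "webp"
    return None
```

Checking content rather than the file extension means a renamed file is still classified by what it actually contains. A DICOM file, for example, carries its `DICM` marker after a 128-byte preamble, so it matches none of these signatures.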

API (high level)

Method	Path
POST	`/v1/documents/ingest`
GET	`/v1/jobs/{job_id}`, `/v1/jobs/{job_id}/result`, `/v1/jobs/{job_id}/trace`
GET	`/v1/patients`, `/v1/patients/{id}`, `/v1/patients/{id}/timeline`
POST	`/v1/patients/chat`
GET	`/v1/metrics/bootstrap`, `/v1/metrics/dashboard`

Details and try-it: Swagger at /docs.

Repo layout

backend/     FastAPI app, pipeline, skill YAML, SQLite (runtime), tests
frontend/    Next.js app
example-runs/  sample PDFs/images + saved `…/result` JSON
CORA-Plan.md   deeper architecture / contracts
MODEL_CARD.md  limitations & privacy

Do not commit .env or real PHI. See .gitignore.

Built at

Ramblin' Hacks 2026 · Georgia Tech College of Computing · Healthcare Track — Elevance Health Challenge

Team

Rohit Prakash Gogi — Georgia Tech, Computer Science (AI & Information Networks)
Keerthi Veeramachaneni — Georgia Tech, Computer Science (AI & Theory)
