rohitgogi/cora

CORA

Clinical Operations & Records Automation

CORA turns clinical PDFs or images into structured JSON. You upload a file; a FastAPI backend classifies pages (text vs scan), chunks them, runs GPT‑4o with rules from a YAML skill file, then merges everything into a single schema (document_metadata, patients, visits, labs, imaging, transition-of-care flags, and more). A Next.js UI can drive uploads, show results, and offer grounded patient chat. Jobs are async so big files are not tied to one short HTTP request.

Clinical wording lives in backend/cora_skill.yaml, not hard-coded in Python—change the YAML to tune extraction without rewriting the pipeline. At API startup the app loads that skill from disk (CORA_SKILL_PATH, or backend/cora_skill.yaml when you run from backend/); there is no separate skill upload on each ingest request.
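The lookup order described above (environment variable first, then the file in the current directory) can be sketched in a few lines. This is a minimal illustration, not the code in backend/skill_loader.py: the function name `resolve_skill_path` and the constant `DEFAULT_SKILL_FILE` are hypothetical, and the sketch only resolves the path; it does not parse the YAML.

```python
import os
from pathlib import Path

# Hypothetical constant: the README only states that backend/cora_skill.yaml
# is used when the server is started from backend/.
DEFAULT_SKILL_FILE = "cora_skill.yaml"


def resolve_skill_path(env=None, cwd=None):
    """Return CORA_SKILL_PATH if it is set and non-empty, else <cwd>/cora_skill.yaml."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    override = env.get("CORA_SKILL_PATH", "").strip()
    if override:
        # An explicit override wins; expand "~" for convenience.
        return Path(override).expanduser()
    return cwd / DEFAULT_SKILL_FILE


if __name__ == "__main__":
    # Prints the path a server started from this directory would look at.
    print(resolve_skill_path())
```

Because the path is resolved once at startup, a changed YAML file or a changed CORA_SKILL_PATH would only take effect after the backend restarts.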


Deliverables (where to find things)

| What | Where |
| --- | --- |
| REST API + pipeline source | backend/main.py (HTTP), backend/pipeline.py (orchestration), classifier.py, chunker.py, extractor.py, schemas.py, job_store.py, patient_store.py, tests in backend/tests/ |
| Skill file (complete) + how the loader treats it | backend/cora_skill.yaml: full rules, loaded once at startup (see above). backend/skill_loader.py documents what Python reads (boundary_patterns, chunk size) vs what is passed verbatim to the model. More design notes in CORA-Plan.md |
| Example inputs → example API JSON | example-runs/: paired input_* files (PDF and/or PNG/JPEG/etc.) with output_*.json. Each JSON is the body of GET /v1/jobs/{job_id}/result, not the UI's reshaped view. document_metadata.filename in the JSON may show the server-stored name (e.g. a UUID); the files in the repo were renamed for clarity. Use synthetic / de-identified data only. |
| Model / system card | MODEL_CARD.md: capabilities, limits, failure modes, privacy |

Optional: GET /v1/jobs/{job_id}/trace for provenance-style fields; interactive API reference at http://127.0.0.1:8000/docs when the backend is running.


Run it

Needs: Python 3.12+, Node 20+, an OpenAI key with gpt‑4o (vision for scanned pages). If your venv path has spaces, PyMuPDF can fail to build—use a venv under e.g. ~/venvs/cora.

Backend (from backend/):

pip install -r requirements.txt
export OPENAI_API_KEY="sk-..."
# optional: export CORA_SKILL_PATH=/absolute/path/to/skill.yaml  (default: backend/cora_skill.yaml when cwd is backend/)
uvicorn main:app --reload --port 8000

Frontend (from frontend/): copy .env.example to frontend/.env.local, set NEXT_PUBLIC_CORA_API_URL=http://127.0.0.1:8000, then npm install and npm run dev.

Smoke test: POST /v1/documents/ingest with a supported file (see below) → poll GET /v1/jobs/{job_id} until complete → GET /v1/jobs/{job_id}/result.

curl -X POST http://127.0.0.1:8000/v1/documents/ingest \
  -F "file=@example-runs/input_1.pdf"
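
The polling step of the smoke test can be scripted. The sketch below is an illustration under assumptions: the helper names are hypothetical, and the README does not document the shape of the GET /v1/jobs/{job_id} response, so the "done" check is passed in as a predicate instead of being hard-coded. Check /docs for the real field names.

```python
import json
import time
import urllib.request


def fetch_job(base_url, job_id):
    """GET /v1/jobs/{job_id} and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/v1/jobs/{job_id}", timeout=30) as resp:
        return json.load(resp)


def poll_until_done(fetch, is_done, interval_s=2.0, timeout_s=300.0, sleep=time.sleep):
    """Call fetch() until is_done(job) is true; raise TimeoutError after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch()
        if is_done(job):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval_s)


# Usage (assumes a "status" field with a "completed" value; both are guesses):
# job = poll_until_done(
#     lambda: fetch_job("http://127.0.0.1:8000", "<job_id>"),
#     lambda j: j.get("status") == "completed",
# )
```

Once the predicate returns true, GET /v1/jobs/{job_id}/result returns the merged JSON for that job.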

Tests: cd backend && pytest · cd frontend && npm run test


Behavior notes (short)

  • Ingest: PDF or PNG, JPEG, GIF, or WebP (validated magic bytes). Scans inside PDFs and image uploads use the vision-capable path where needed. DICOM is not supported. The skill file is not sent per request—it is loaded at startup (CORA_SKILL_PATH or default cora_skill.yaml).
  • Flow: Upload → classify → chunk with regex boundaries from the skill file → extract per chunk with rolling summary → _build_final_result → SQLite + backend/results/{job_id}.json.
  • Risk tier (HIGH / MEDIUM / LOW on each patient): a simple rules layer in pipeline.py from discharge/follow-up/med-rec windows and flags—not a clinical score. See MODEL_CARD.md for intent.
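
The magic-byte validation mentioned in the Ingest note can be pictured as a check on the first bytes of the upload. This is a sketch that relies only on the published file signatures for the listed formats; the function name `sniff_upload_type` is hypothetical, and the backend's actual check may differ.

```python
def sniff_upload_type(data: bytes):
    """Return a MIME type for a supported upload based on its leading bytes, else None."""
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    # Anything else (including DICOM, whose "DICM" marker sits at offset 128)
    # is treated as unsupported.
    return None
```

Checking content rather than the file extension means a renamed file is still rejected or routed according to what it actually contains.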

API (high level)

| Method | Path |
| --- | --- |
| POST | /v1/documents/ingest |
| GET | /v1/jobs/{job_id}, /v1/jobs/{job_id}/result, /v1/jobs/{job_id}/trace |
| GET | /v1/patients, /v1/patients/{id}, /v1/patients/{id}/timeline |
| POST | /v1/patients/chat |
| GET | /v1/metrics/bootstrap, /v1/metrics/dashboard |

Details and try-it: Swagger at /docs.


Repo layout

backend/     FastAPI app, pipeline, skill YAML, SQLite (runtime), tests
frontend/    Next.js app
example-runs/  sample PDFs/images + saved `…/result` JSON
CORA-Plan.md   deeper architecture / contracts
MODEL_CARD.md  limitations & privacy

Do not commit .env or real PHI. See .gitignore.


Built at

Ramblin' Hacks 2026 · Georgia Tech College of Computing · Healthcare Track — Elevance Health Challenge

Team

  • Rohit Prakash Gogi — Georgia Tech, Computer Science (AI & Information Networks)
  • Keerthi Veeramachaneni — Georgia Tech, Computer Science (AI & Theory)
