Runnable MVP for the CitySort plan: ingest, extract, classify, validate, and route city documents with human-in-the-loop review.
- FastAPI backend with SQLite persistence and audit events
- Upload pipeline with document lifecycle states (`ingested`, `routed`, `needs_review`, `approved`, `corrected`, `failed`)
- Bulk database import API to ingest documents from SQLite/PostgreSQL/MySQL using a SELECT query
- Durable async job queue with worker thread + persisted job state in SQLite
- OCR provider switch:
  - `local` (native text + PDF parsing)
  - `azure_di` (Azure Document Intelligence)
- Classification provider switch:
  - `rules` (local keyword model)
  - `openai` (JSON classification)
  - `anthropic` (JSON classification)
- Automatic fallback to local processing when provider credentials/calls are unavailable
- Department queues and analytics APIs
- Human review API and dashboard workflow
- Audit trail API per document (`/api/documents/{id}/audit`)
- Rules config APIs (`GET/PUT /api/config/rules`, `POST /api/config/rules/reset`)
- Auth/RBAC APIs:
  - bootstrap first admin (`POST /api/auth/bootstrap`)
  - login (`POST /api/auth/login`)
  - current user (`GET /api/auth/me`)
  - admin user management (`GET/POST /api/auth/users`, `PATCH /api/auth/users/{id}/role`)
- Platform operations APIs for enterprise-style controls:
  - Connectivity checks (`GET /api/platform/connectivity`, `POST /api/platform/connectivity/check`)
  - Manual deployments + history (`POST /api/platform/deployments/manual`, `GET /api/platform/deployments`)
  - Team invitations (`POST /api/platform/invitations`, `GET /api/platform/invitations`)
  - API key lifecycle (`POST /api/platform/api-keys`, `GET /api/platform/api-keys`, `POST /api/platform/api-keys/{id}/revoke`)
  - Platform summary (`GET /api/platform/summary`)
- Job APIs:
  - list jobs (`GET /api/jobs`)
  - job detail (`GET /api/jobs/{id}`)
- Web dashboard for upload, queue monitoring, analytics, and review actions
- Enhanced review pane with extracted fields, validation issues, corrected JSON fields, text preview, and audit history
- Reprocess action on selected documents to apply latest rules/providers without re-upload
- Rules editor panel in dashboard to update doc types, keywords, required fields, and routing without code changes
- Form-based rules builder (add/remove types, comma-separated keywords/required fields) so most users never need JSON
- Dashboard topbar actions wired end-to-end:
  - Connect: runs provider/database readiness checks
  - Manual Deploy: records and returns deploy result
  - Invite: creates invitation token/link
  - New API Key: issues a new key (shown once)
- Unit tests for core pipeline logic
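
The automatic fallback to local processing listed above can be pictured roughly as follows. This is a minimal sketch, not the code in the repository: the function names `classify_with_rules` and `classify_with_fallback`, the `KEYWORDS` table, and the result shape are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical keyword table; the real rules come from the rules config.
KEYWORDS: Dict[str, List[str]] = {
    "permit": ["permit", "zoning"],
    "invoice": ["invoice", "amount due"],
}


def classify_with_rules(text: str) -> dict:
    """Local keyword classifier: pick the type with the most keyword hits."""
    lowered = text.lower()
    scores = {
        doc_type: sum(keyword in lowered for keyword in keywords)
        for doc_type, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        # No keyword matched at all, so fall into a catch-all type.
        return {"doc_type": "other", "provider": "rules"}
    return {"doc_type": best, "provider": "rules"}


def classify_with_fallback(
    text: str, remote: Optional[Callable[[str], dict]] = None
) -> dict:
    """Try the remote classifier first; use local rules if it is missing or fails."""
    if remote is not None:
        try:
            result = remote(text)
            return {"doc_type": result["doc_type"], "provider": "remote"}
        except Exception:
            # Missing credentials, network errors, malformed responses, etc.
            pass
    return classify_with_rules(text)
```

Passing `remote=None` models the case where no provider credentials are configured; a callable that raises models a failed provider call. Both end up on the local rules path.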
- `backend/app/main.py`: API routes and orchestration
- `backend/app/pipeline.py`: pipeline core
- `backend/app/providers.py`: Azure/OpenAI/Anthropic integrations
- `backend/app/auth.py`: token auth, password hashing, RBAC enforcement
- `backend/app/jobs.py`: durable background worker
- `backend/app/deployments.py`: local/Render/GitHub deploy triggers
- `backend/app/document_tasks.py`: reusable document-processing task logic
- `backend/app/rules.py`: runtime rule loading/validation/persistence
- `backend/app/config.py`: environment-based config
- `backend/tests/test_platform_api.py`: platform operations API tests
- `frontend/index.html`: dashboard shell
- `frontend/app.v2.js`: dashboard behavior
- `deploy/k8s/`: Kubernetes manifests (namespace, deployment, service, ingress, HPA, config)
- `docker-compose.yml`: local container orchestration
- `scripts/run_demo.sh`: end-to-end demo runner

cd citysort
python3 -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements.txt

cd citysort
cp .env.example .env

Default `.env` values run fully local.
To enable external providers:
- Azure OCR: set `CITYSORT_OCR_PROVIDER=azure_di` and fill `AZURE_DOCUMENT_INTELLIGENCE_*`
- OpenAI classifier: set `CITYSORT_CLASSIFIER_PROVIDER=openai` and `OPENAI_API_KEY`
- Anthropic classifier: set `CITYSORT_CLASSIFIER_PROVIDER=anthropic` and `ANTHROPIC_API_KEY`
- Optional custom rules file path: `CITYSORT_RULES_PATH` (defaults to `data/document_rules.json`)
- Confidence gate for auto-routing: `CITYSORT_CONFIDENCE_THRESHOLD` (default `0.82`)
- Always-human-review types: `CITYSORT_FORCE_REVIEW_DOC_TYPES` (comma-separated)
- Primary database: `CITYSORT_DATABASE_URL`
  - Development: `sqlite:///data/citysort.db`
  - Production: `postgresql://...`
- Auth and RBAC:
  - `CITYSORT_REQUIRE_AUTH=true` enables authentication checks
  - `CITYSORT_AUTH_SECRET` signs user access tokens
  - `CITYSORT_STRICT_AUTH_SECRET=true` blocks startup if using weak/default secrets
- Deployment provider:
  - `CITYSORT_DEPLOY_PROVIDER=local|render|github`
  - `CITYSORT_DEPLOY_COMMAND` for local deploy execution
  - `CITYSORT_RENDER_*` or `CITYSORT_GITHUB_*` to trigger external deploy pipelines
- Durable worker:
  - `CITYSORT_WORKER_ENABLED=true`
  - `CITYSORT_WORKER_POLL_INTERVAL_SECONDS`
  - `CITYSORT_WORKER_MAX_ATTEMPTS`
  - `CITYSORT_QUEUE_BACKEND=sqlite|redis`
  - `CITYSORT_REDIS_URL` and `CITYSORT_REDIS_JOB_QUEUE_NAME` when using Redis queueing
- Security/operations:
  - `CITYSORT_ENFORCE_HTTPS=true`
  - `CITYSORT_CORS_ALLOWED_ORIGINS=https://your-ui.example`
  - `CITYSORT_RATE_LIMIT_*`
  - `CITYSORT_ENCRYPTION_AT_REST_ENABLED=true` + `CITYSORT_ENCRYPTION_KEY`
  - `CITYSORT_PROMETHEUS_ENABLED=true` (`/metrics`)
  - `CITYSORT_SENTRY_DSN=...`
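
To make the confidence gate and the always-review list concrete, here is one way the two settings could combine into a lifecycle state. It is a sketch under assumptions: the function names `parse_force_review` and `route_status` are hypothetical, and treating "below the threshold" as a strict `<` comparison is a guess rather than documented behavior.

```python
import os
from typing import Mapping, Set


def parse_force_review(raw: str) -> Set[str]:
    """Split a comma-separated list, ignoring blanks and stray whitespace."""
    return {item.strip() for item in raw.split(",") if item.strip()}


def route_status(
    doc_type: str, confidence: float, env: Mapping[str, str] = os.environ
) -> str:
    """Return 'routed' or 'needs_review' for a classified document."""
    # 0.82 mirrors the documented default for CITYSORT_CONFIDENCE_THRESHOLD.
    threshold = float(env.get("CITYSORT_CONFIDENCE_THRESHOLD", "0.82"))
    force_review = parse_force_review(
        env.get("CITYSORT_FORCE_REVIEW_DOC_TYPES", "")
    )
    # Assumption: forced types always go to review, whatever the confidence.
    if doc_type in force_review or confidence < threshold:
        return "needs_review"
    return "routed"
```

With the threshold raised to `0.92`, a document classified at `0.90` would land in review under this sketch, while the same document would be routed automatically at the default `0.82`.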
CITYSORT_OCR_PROVIDER=azure_di
CITYSORT_CLASSIFIER_PROVIDER=openai
OPENAI_MODEL=gpt-4o-mini
CITYSORT_CONFIDENCE_THRESHOLD=0.92
CITYSORT_FORCE_REVIEW_DOC_TYPES=other,benefits_application,court_filing

cd citysort
source .venv/bin/activate
uvicorn backend.app.main:app --reload --port 8000

- Dashboard: http://localhost:8000
- Health: http://localhost:8000/health
- Readiness: http://localhost:8000/readyz
- Liveness: http://localhost:8000/livez
- Metrics: http://localhost:8000/metrics

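
Scripts that start the server usually need to wait until it accepts requests. The sketch below polls the readiness URL with the standard library only. It assumes that `/readyz` answers with HTTP 200 once the service is ready, which is not stated here; the helper names `http_status` and `wait_until_ready` are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status for url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 503.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def wait_until_ready(
    url: str = "http://localhost:8000/readyz",
    attempts: int = 30,
    delay: float = 1.0,
    fetch: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running server.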
Create first admin (only works when there are no existing users):
curl -X POST http://localhost:8000/api/auth/bootstrap \
  -H "Content-Type: application/json" \
  -d '{
    "email": "admin@citysort.local",
    "password": "ChangeMe12345!",
    "full_name": "CitySort Admin"
  }'

Login:
curl -X POST http://localhost:8000/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"admin@citysort.local","password":"ChangeMe12345!"}'

cd citysort
source .venv/bin/activate
PYTHONPATH=backend pytest backend/tests -q

./scripts/backup.sh
./scripts/restore.sh <db_backup_file> [uploads_archive]

python scripts/migrate_sqlite_to_postgres.py \
  --sqlite-path data/citysort.db \
  --postgres-url postgresql://user:pass@host:5432/citysort

- `docs/deployment-guide.md`
- `docs/operations-runbook.md`
- `docs/incident-response-playbook.md`
- `docs/architecture-roadmap.md`

- CI workflow: `.github/workflows/ci.yml` (tests + Docker build)
- Deploy workflow: `.github/workflows/deploy.yml` (manual dev/staging/production dispatch scaffold)

The demo runner starts the API, uploads sample documents, prints results/analytics/queues, and shuts down the server:
cd citysort
./scripts/run_demo.sh

Sample docs used by the demo are in `assets/samples`.

Use the dashboard Database Import panel, or call the API directly:
curl -X POST http://localhost:8000/api/documents/import/database \
  -H "Content-Type: application/json" \
  -d '{
    "database_url": "postgresql://user:pass@localhost:5432/files_db",
    "query": "SELECT filename, content, content_type FROM incoming_files",
    "filename_column": "filename",
    "content_column": "content",
    "content_type_column": "content_type",
    "source_channel": "database_import",
    "actor": "ops_user",
    "process_async": false,
    "limit": 500
  }'

Notes:
- `database_url` supports:
  - SQLite: `sqlite:///absolute/path/to/files.db` or `/absolute/path/to/files.db`
  - PostgreSQL: `postgresql://user:pass@host:5432/dbname`
  - MySQL: `mysql://user:pass@host:3306/dbname`
- Query must be a single `SELECT`/`WITH ... SELECT` statement.
- Provide either `content_column` (BLOB/text) or `file_path_column` (path on server).

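
A client can pre-check the single-statement rule before calling the import API. The helper below is a rough sketch with a hypothetical name (`is_single_select`); how the server itself enforces the rule is not described here. It is deliberately conservative: it rejects any query with a semicolon in the middle, even inside a string literal, and it does not verify that a `WITH` query actually ends in a `SELECT`.

```python
import re


def is_single_select(query: str) -> bool:
    """Rough client-side check for a single SELECT or WITH statement."""
    stripped = query.strip()
    # Tolerate one trailing semicolon.
    if stripped.endswith(";"):
        stripped = stripped[:-1].rstrip()
    # Empty input, or a semicolon anywhere else, is treated as invalid.
    if not stripped or ";" in stripped:
        return False
    # The statement must start with the SELECT or WITH keyword.
    return re.match(r"(?is)^(select|with)\b", stripped) is not None
```

For example, `is_single_select("SELECT 1; DROP TABLE incoming_files")` returns `False` because of the embedded semicolon.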
cd citysort
git remote add origin https://github.com/imranow/Citysort.git
git push -u origin main

`.env` is intentionally excluded from git. Only commit `.env.example`.

This repo includes render.yaml, so you can deploy with a Render Blueprint.
- In Render, choose New + -> Blueprint.
- Connect `imranow/Citysort`.
- Render will detect `render.yaml`.
- Set any required secret env vars in the Render dashboard:
  - `OPENAI_API_KEY` (if using OpenAI classification)
  - `ANTHROPIC_API_KEY` (if using Anthropic classification)
  - `AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT` and `AZURE_DOCUMENT_INTELLIGENCE_API_KEY` (if using Azure OCR)
- Deploy. Render provides a public URL.

Default deployment uses local rule-based classification:
- `CITYSORT_OCR_PROVIDER=local`
- `CITYSORT_CLASSIFIER_PROVIDER=rules`

This repo includes a production `Dockerfile`.
cd citysort
docker build -t citysort:latest .
docker run -p 8000:8000 --env-file .env citysort:latest

Or run with compose (includes volume + healthcheck):
cd citysort
docker compose up --build

Kubernetes manifests are provided in `deploy/k8s`.
- Update the image in `deploy/k8s/deployment.yaml` to your published image.
- Set real secrets in `deploy/k8s/secret.example.yaml` (or create your own `Secret`).
- Update the host in `deploy/k8s/ingress.yaml`.
- Apply:
cd citysort
kubectl apply -k deploy/k8s