A premium, mobile-first PWA for searching electoral roll data with bilingual support (English + Telugu).
```
├── apps/web          # Next.js 14 App Router + Tailwind + Supabase
├── packages/db       # SQL migrations for Supabase
└── packages/ingest   # Ingestion worker and scripts
```
To run the web app locally:

```bash
cd apps/web
npm install
npm run dev
```

Create `.env.local` in `apps/web`:
```
NEXT_PUBLIC_SUPABASE_URL=your_project_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key
```

For the ingestion worker, also set:
```
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
```

To add a document for ingestion:

- Go to `/admin` in the app (requires the admin role)
- Click "Add Document"
- Paste your Google Drive sharing link
- Select the document type (Structured/Scanned)
- Click "Add to Queue"
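The admin form accepts a Google Drive sharing link. As a minimal sketch (the `drive_file_id` helper is hypothetical and not taken from the app's code), pulling the file ID out of the two common Drive link shapes could look like this:

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical helper: the app's actual link parsing is not documented here.
# Matches ".../file/d/<ID>/..." and ".../document/d/<ID>/..." style paths.
_ID_IN_PATH = re.compile(r"/(?:file/)?d/([A-Za-z0-9_-]+)")
_DRIVE_HOSTS = {"drive.google.com", "docs.google.com"}


def drive_file_id(link: str) -> Optional[str]:
    """Return the Drive file ID from a sharing link, or None if none is found."""
    parsed = urlparse(link.strip())
    if parsed.netloc not in _DRIVE_HOSTS:
        return None
    # Shape 1: https://drive.google.com/file/d/<ID>/view?usp=sharing
    match = _ID_IN_PATH.search(parsed.path)
    if match:
        return match.group(1)
    # Shape 2: https://drive.google.com/open?id=<ID>
    ids = parse_qs(parsed.query).get("id")
    return ids[0] if ids else None
```

For example, `drive_file_id("https://drive.google.com/file/d/1AbC_d-9/view?usp=sharing")` returns `"1AbC_d-9"`, and links from other hosts return `None`.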
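Then run the ingestion worker to process queued documents:

```bash
cd packages/ingest
npx ts-node worker.ts                        # Process all queued documents
npx ts-node worker.ts --document_id=<uuid>   # Process a specific document
```

To add an allowed user (here with the `admin` role), run this in the Supabase SQL Editor: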
```sql
INSERT INTO electoral.allowed_users (email, role)
VALUES ('user@example.com', 'admin');
```

To retry a failed document:

- Go to `/admin/queue`
- Find the failed document
- Click "Retry", or reprocess it manually with the worker CLI
- PDFs larger than 50 MB should be added as Drive links rather than uploaded
- The service role key is never exposed to the frontend
- All voter data is protected by Row Level Security (RLS)
To deploy:

- Import the project with `apps/web` as the root directory
- Add the environment variables
- Deploy
V2 is a fully isolated development environment for unstable handwritten OCR ingestion.
| Branch | Purpose | Deploys To |
|---|---|---|
| `main` | V1 Production (stable) | electoral.app |
| `release/v1` | V1 hotfixes only | electoral.app |
| `v2` | V2 Development | v2.electoral.app |
- Supabase: `civic-intel-v2` (`xvdretunljtbkyzmmnok`)
- Vercel: `54-v2` project (create manually)
- GitHub Actions: `v2` environment with approval gates
V2 uses a "canonical view" pattern for handwritten OCR:
```
voters (base) ←→ voters_patch (corrections)
              ↓
   voters_canonical_v2 (merged view)
```
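Conceptually, the merged view prefers a verified correction over the raw value whenever one exists. The sketch below models that rule in plain Python; the `voter_id` join key and the "`None` means no correction" convention are assumptions, not the actual `voters_canonical_v2` definition:

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def canonical_row(base: Row, patch: Optional[Row]) -> Row:
    """Merge one base row with its verified patch, field by field."""
    merged = dict(base)  # copy so the base row is never mutated
    if patch:
        for field, value in patch.items():
            # Assumption: None in a patch means "no correction for this field",
            # similar to COALESCE(voters_patch.col, voters.col) in SQL.
            if value is not None:
                merged[field] = value
    return merged


def canonical_view(base_rows: List[Row], patches: List[Row], key: str = "voter_id") -> List[Row]:
    """Left-join base rows to patches on `key` (hypothetical column) and merge each pair."""
    patch_by_key = {patch[key]: patch for patch in patches}
    return [canonical_row(row, patch_by_key.get(row[key])) for row in base_rows]
```

With a base row `{"voter_id": 1, "name": "Ramesh", "age": 43}` and a patch `{"voter_id": 1, "name": None, "age": 34}`, the merged row keeps the OCR name and takes the corrected age. Keeping corrections in a separate table means raw OCR output is never overwritten and can be re-extracted later.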
Key tables:

- `voters_candidate` - Raw OCR extractions (may contain errors)
- `voters_patch` - Verified corrections
- `qa_issues` - Quality issue tracking
- `source_assets` - Images and Textract outputs
To set up V2 locally:

```bash
git checkout v2
cp apps/web/.env.v2.template apps/web/.env.local
# Fill in SUPABASE_SERVICE_ROLE_KEY from the Supabase dashboard
npm run dev
```

V1 remains independent at all times:
- V1 code: `main` branch, unaffected by `v2`
- V1 data: Division 54 Supabase project, separate from V2
- V1 deploy: `deploy-production.yml` only touches `main`
- Phase 1: Foundation skeleton + PWA
- Phase 2: Supabase Auth + RLS + real search
- Phase 3: Documents & Ingestion system
- Phase 4: Voice Search + Read Aloud
- Phase 5: Voice-First Upgrade
- Phase 6: Production Hardening & Quality Gates
- Phase 7: Self-Improving Search + Telugu/Roman Superpowers + Active Learning
- Phase 8: Durable Ingestion + Scale Spine (Zero-Duplicate, Zero-Stuck)
- Phase 9: Secure Data Foundation + Search (RLS, Audit Logging)
- Phase 10: Field Operations (Offline-First, Visits, Issues, Sync)
- Phase 11: Intelligence Layer (Analytics, ML Scoring, Dashboards, Segments)
- Phase 12: Production Launch & Corporate Hardening (CI/CD, Security, DR)
Search works with both Telugu and Roman scripts:
- Type "Ramesh" → finds "రమేష్"
- Type "శ్రీనివాస్" → finds matching records
- Fuzzy matching handles typos and OCR errors
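Fuzzy matching on the Roman side can be illustrated with Python's standard `difflib`. This is only a sketch: the project's actual matcher is not described here (it may well run inside the database), the 0.8 cutoff is an arbitrary assumption, and Telugu input would first need transliteration to Roman:

```python
import difflib
import unicodedata
from typing import List


def norm_roman(text: str) -> str:
    """Case-fold and keep only letters and digits (a simplifying assumption)."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return "".join(ch for ch in text if ch.isalnum())


def fuzzy_matches(query: str, names: List[str], cutoff: float = 0.8) -> List[str]:
    """Return names whose similarity to the query is at least `cutoff`, best first."""
    q = norm_roman(query)
    scored = []
    for name in names:
        # ratio() = 2 * matching characters / total characters, in [0, 1]
        score = difflib.SequenceMatcher(None, q, norm_roman(name)).ratio()
        if score >= cutoff:
            scored.append((score, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]
```

A dropped letter such as `"Rmesh"` still scores about 0.91 against `"Ramesh"`, while `"Rajesh"` and `"Suresh"` fall below the cutoff.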
Each voter record has normalized fields:

- `name_norm_te` - Telugu normalized name
- `name_norm_roman` - Roman (ITRANS) normalized name
- `address_norm_te` / `address_norm_roman` - Normalized addresses
- `search_blob` - Combined search field
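A minimal sketch of deriving these columns, assuming the Roman inputs have already been transliterated (the real pipeline uses indic-transliteration for that step) and that `search_blob` is a simple space-joined concatenation. The exact normalization rules shown are assumptions:

```python
import unicodedata
from typing import Dict


def normalize_te(text: str) -> str:
    """NFC-normalize Telugu so composed and decomposed forms compare equal."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def normalize_roman(text: str) -> str:
    """Case-fold Roman text and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).casefold().split())


def normalized_fields(name_te: str, name_roman: str,
                      address_te: str, address_roman: str) -> Dict[str, str]:
    """Build the normalized columns plus a combined search_blob (assumed format)."""
    fields = {
        "name_norm_te": normalize_te(name_te),
        "name_norm_roman": normalize_roman(name_roman),
        "address_norm_te": normalize_te(address_te),
        "address_norm_roman": normalize_roman(address_roman),
    }
    # One combined column means a single index can serve both scripts.
    fields["search_blob"] = " ".join(v for v in fields.values() if v)
    return fields
```

NFC matters for Telugu because some vowel signs have decomposed equivalents (for example, U+0C48 is canonically U+0C46 + U+0C56); without it, visually identical names could fail to match.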
The system uses indic-transliteration for Telugu ↔ Roman conversion.
To backfill existing data, run:

```bash
cd packages/ingest
python -m normalize.backfill
```

This populates the normalized columns for all existing voters.
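The backfill's internals are not described here. A common way to make such a job safe to restart and rerun is keyset pagination plus skip-if-unchanged writes. The in-memory sketch below assumes a hypothetical `name` source column and an integer `id`; `fetch_after` stands in for a database query:

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def fetch_after(rows: List[Row], last_id: Optional[int], limit: int) -> List[Row]:
    """Stand-in for: SELECT ... WHERE id > :last_id ORDER BY id LIMIT :limit."""
    newer = (r for r in rows if last_id is None or r["id"] > last_id)
    return sorted(newer, key=lambda r: r["id"])[:limit]


def backfill(rows: List[Row], normalize: Callable[[str], str], batch_size: int = 500) -> int:
    """Fill name_norm_roman in keyset-paginated batches; return the number of rows changed."""
    last_id: Optional[int] = None
    changed = 0
    while True:
        batch = fetch_after(rows, last_id, batch_size)
        if not batch:
            return changed
        for row in batch:
            value = normalize(row["name"])
            # Only write when the value differs, so a rerun touches nothing.
            if row.get("name_norm_roman") != value:
                row["name_norm_roman"] = value
                changed += 1
        last_id = batch[-1]["id"]  # resume point if the job is restarted
```

Paginating on `id > last_id` instead of `OFFSET` keeps each batch query cheap even deep into a large table.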
The queue_reviews.py script uses:
- Uncertainty sampling - Low confidence records
- Diversity sampling - Random pages across documents
- Random sampling - Avoid blind spots
Run manually:

```bash
cd packages/ingest
python -m normalize.queue_reviews
```

Thresholds:

- `source_confidence < 0.6` → triggers review
- `source_confidence = 1.0` → marked as reviewed
- Adjust these in `queue_reviews.py` and the admin review page
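The three sampling strategies can be combined into one review queue. This sketch uses the two thresholds listed above; the record shape (`id`, `document_id`, `page`), the per-strategy counts, and the fixed seed are assumptions, and `queue_reviews.py` itself may work differently:

```python
import random
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]

REVIEW_THRESHOLD = 0.6  # source_confidence below this triggers review
REVIEWED = 1.0          # source_confidence of 1.0 marks a record as reviewed


def build_review_queue(records: List[Record], n_uncertain: int, n_diverse: int,
                       n_random: int, seed: int = 0) -> List[Record]:
    rng = random.Random(seed)  # seeded so the queue is reproducible
    pending = [r for r in records if r["source_confidence"] < REVIEWED]
    queue: List[Record] = []
    seen = set()

    def take(record: Record) -> None:
        if record["id"] not in seen:
            seen.add(record["id"])
            queue.append(record)

    # 1. Uncertainty sampling: lowest-confidence records below the threshold.
    uncertain = sorted((r for r in pending if r["source_confidence"] < REVIEW_THRESHOLD),
                       key=lambda r: r["source_confidence"])
    for record in uncertain[:n_uncertain]:
        take(record)

    # 2. Diversity sampling: one record from each of several random pages.
    by_page: Dict[Any, List[Record]] = defaultdict(list)
    for r in pending:
        if r["id"] not in seen:
            by_page[(r["document_id"], r["page"])].append(r)
    pages = list(by_page)
    rng.shuffle(pages)
    for page in pages[:n_diverse]:
        take(rng.choice(by_page[page]))

    # 3. Random sampling: uniform picks from what is left, to avoid blind spots.
    rest = [r for r in pending if r["id"] not in seen]
    for record in rng.sample(rest, min(n_random, len(rest))):
        take(record)
    return queue
```

Uncertainty sampling alone would keep reviewers on the same hard pages; the diversity and random picks surface errors the confidence score does not flag.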
Phase 8 implements durable, idempotent ingestion with:
- Job leasing with heartbeats
- Event logging for traceability
- Artifact storage for replay
- Natural key constraints for deduplication
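The leasing and deduplication ideas above can be modeled in a few lines. This is a sketch under stated assumptions: the real pipeline's tables are not shown here, so the in-memory `JobQueue`, the lease length, and the `(document_id, page, serial_no)` natural key are all hypothetical:

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Job:
    id: str
    owner: Optional[str] = None
    lease_expires: float = 0.0
    done: bool = False


class JobQueue:
    """In-memory model of leased jobs; a real system would keep this state in the database."""

    def __init__(self, lease_seconds: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self.jobs: Dict[str, Job] = {}
        self.events: List[Tuple[float, str, str, str]] = []  # (time, job, worker, event)
        self.lease_seconds = lease_seconds
        self.clock = clock

    def add(self, job_id: str) -> None:
        self.jobs.setdefault(job_id, Job(job_id))  # re-adding is a no-op

    def lease(self, worker: str) -> Optional[Job]:
        now = self.clock()
        for job in self.jobs.values():
            # Free if never leased or the lease expired (e.g. the worker died).
            if not job.done and (job.owner is None or job.lease_expires <= now):
                job.owner, job.lease_expires = worker, now + self.lease_seconds
                self.events.append((now, job.id, worker, "leased"))
                return job
        return None

    def _holds(self, job_id: str, worker: str) -> bool:
        job = self.jobs[job_id]
        return job.owner == worker and job.lease_expires > self.clock()

    def heartbeat(self, job_id: str, worker: str) -> bool:
        if not self._holds(job_id, worker):
            return False  # lease lost: the worker should stop processing
        self.jobs[job_id].lease_expires = self.clock() + self.lease_seconds
        return True

    def complete(self, job_id: str, worker: str) -> bool:
        if not self._holds(job_id, worker):
            return False
        self.jobs[job_id].done = True
        self.events.append((self.clock(), job_id, worker, "completed"))
        return True


def upsert_voter(store: Dict[Tuple[str, int, int], dict], row: dict) -> None:
    """Dedup on a hypothetical natural key, so replaying a page adds no duplicates."""
    key = (row["document_id"], row["page"], row["serial_no"])
    store[key] = row
```

An expired lease is what makes a job un-stuck: if a worker stops heart-beating, another worker can lease the same job, and the natural-key upsert keeps the replay from creating duplicate voters.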
To run the durable pipeline:

```bash
cd packages/ingest
python -m durable.run
```

Phase 9 adds a secure data foundation:

- Expanded voter schema with family, contact, and benefits fields
- Role-based RLS policies (admin, leader, booth_incharge, field_worker)
- Audit logging for all sensitive operations
- Advanced search RPC with fuzzy matching
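Postgres RLS policies are SQL predicates evaluated for each row. Purely to illustrate the idea, the sketch below models a role check, field masking, and an audit entry in Python. The booth-based scoping and the rule that only `admin` and `leader` see phone numbers are hypothetical, not the project's actual policies:

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List

Voter = Dict[str, Any]

# Hypothetical policy: which roles may see contact details.
CONTACT_ROLES = {"admin", "leader"}
CONTACT_FIELDS = {"phone"}


@dataclass(frozen=True)
class User:
    email: str
    role: str
    booths: FrozenSet[int] = frozenset()  # hypothetical booth assignment


def can_read(user: User, voter: Voter) -> bool:
    """Row predicate, analogous to an RLS USING clause."""
    if user.role == "admin":
        return True
    if user.role in {"leader", "booth_incharge", "field_worker"}:
        return voter["booth_id"] in user.booths
    return False  # unknown roles see nothing: deny by default


def read_voters(user: User, voters: List[Voter], audit_log: List[dict]) -> List[Voter]:
    visible = []
    for voter in voters:
        if can_read(user, voter):
            if user.role not in CONTACT_ROLES:
                voter = {k: v for k, v in voter.items() if k not in CONTACT_FIELDS}
            visible.append(voter)
    # Every sensitive read leaves an audit entry, even when nothing is returned.
    audit_log.append({"actor": user.email, "action": "read_voters", "rows": len(visible)})
    return visible
```

Enforcing the predicate in the database rather than in application code means every client, including the search RPC, gets the same filtering.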
Phase 10 adds an offline-first field operations system:
- IndexedDB storage with booth packs
- Visit logging with sync queue
- Issue tracking and schemes
- Service worker for background sync
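In the app, this runs in the browser on IndexedDB and a service worker. To show only the outbox pattern behind a sync queue, here is a language-neutral model in Python; the `SyncQueue` and `Server` classes and the `client_id` field are hypothetical:

```python
import uuid
from collections import deque
from typing import Callable, Deque, Dict


class SyncQueue:
    """Client-side outbox: visits are stored locally first, then flushed in order."""

    def __init__(self) -> None:
        self.pending: Deque[dict] = deque()

    def log_visit(self, voter_id: int, note: str) -> str:
        # A client-generated ID lets the server recognize retried submissions.
        item = {"client_id": str(uuid.uuid4()), "voter_id": voter_id, "note": note}
        self.pending.append(item)
        return item["client_id"]

    def flush(self, send: Callable[[dict], bool]) -> int:
        """Send items oldest-first; stop at the first failure and keep the rest queued."""
        synced = 0
        while self.pending:
            if not send(self.pending[0]):
                break
            self.pending.popleft()
            synced += 1
        return synced


class Server:
    def __init__(self) -> None:
        self.visits: Dict[str, dict] = {}

    def receive(self, item: dict) -> bool:
        self.visits.setdefault(item["client_id"], item)  # idempotent on client_id
        return True
```

Because the server ignores a `client_id` it has already stored, a visit that was delivered but whose acknowledgment was lost can safely be resent.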
Phase 11 adds analytics and ML scoring:
- Materialized views for dashboards
- Segment builder with query DSL
- Turnout propensity and priority models
- SHAP-based explainability
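The segment query DSL is not specified here. As a hedged illustration of the idea, a JSON-style query with `all`/`any`/`not` combinators and field comparisons can be evaluated recursively; the syntax and field names (including `turnout_score`) are assumptions:

```python
import operator
from typing import Any, Callable, Dict, List

Voter = Dict[str, Any]
Query = Dict[str, Any]

# Supported comparison operators in this hypothetical DSL.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "in": lambda value, options: value in options,
}


def matches(voter: Voter, query: Query) -> bool:
    """Evaluate a nested query against one voter record."""
    if "all" in query:
        return all(matches(voter, q) for q in query["all"])
    if "any" in query:
        return any(matches(voter, q) for q in query["any"])
    if "not" in query:
        return not matches(voter, query["not"])
    value = voter.get(query["field"])
    if value is None:
        return False  # a missing field never satisfies a comparison
    return OPS[query["op"]](value, query["value"])


def build_segment(voters: List[Voter], query: Query) -> List[Voter]:
    return [v for v in voters if matches(v, query)]
```

For example, `{"all": [{"field": "age", "op": "<", "value": 30}, {"field": "turnout_score", "op": "<", "value": 0.5}]}` selects young, low-propensity voters. A real implementation would more likely compile the same tree to SQL than filter rows in memory.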
Phase 12 adds corporate-grade deployment:
- CI/CD with GitHub Actions
- k6 load testing suite
- Security headers and OWASP checklist
- Backup/DR with PITR
- Compliance documentation