A Databricks-native platform that protects India's 300M+ first-time digital finance users across the complete fraud lifecycle — prevention, detection, legal guidance, government scheme access, and complaint filing — in 10 Indian languages with voice support.
Detailed setup instructions: how_to_run.md | Interactive architecture: Architecture_Diagram.html | Presentation: Presentation.html
```mermaid
flowchart LR
U["User\n10 languages + voice"] --> APP["Databricks App\n(Gradio)"]
subgraph Modules["5 Connected Modules"]
SA["Saavdhaan\nLending Analyzer"]
SU["Suraksha\nFraud Detection"]
AD["Adhikar\nRights Chatbot"]
SM["Samriddhi\nScheme Navigator"]
NI["Nivaaran\nComplaint Drafter"]
end
APP --> SA & SU & AD & SM & NI
subgraph RAG["RAG Pipeline (orchestration.py)"]
direction LR
R1["Translate\nSarvam Mayura"] --> R2["Rewrite\nChat Memory"] --> R3["Retrieve\nVector Search / FAISS"] --> R4["Generate\nLlama-4-Maverick"] --> R5["Translate Back"]
end
subgraph ML["Spark MLlib + MLflow"]
F["GBT Fraud Classifier\n@champion"]
P["KMeans Personas\n@champion"]
end
subgraph Lakehouse["Databricks Lakehouse"]
DT["Delta Tables\n(CDF enabled)"]
VS["Vector Search Index\n(databricks-bge-large-en)"]
VOL["UC Volume\nFAISS + app_cache"]
UC["Unity Catalog\nworkspace.default"]
end
SA & AD & SM & NI --> RAG
SU --> F
SM --> P
F & P --> DT
RAG --> VS & VOL
VS --> DT
```
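The five-stage RAG pipeline in `orchestration.py` can be read as a simple function chain. The sketch below is illustrative only: the stage functions are injected stand-ins, not the actual function names in the repo.

```python
# Illustrative sketch of the five-stage RAG pipeline; the stage functions
# are hypothetical stand-ins injected as arguments, so the flow can be
# exercised without any external APIs.
def rag_pipeline(query, history, *, translate_in, rewrite, retrieve,
                 generate, translate_out):
    english = translate_in(query)              # 1. Translate (Sarvam Mayura)
    standalone = rewrite(english, history)     # 2. Rewrite using chat memory
    context = retrieve(standalone)             # 3. Retrieve (Vector Search / FAISS)
    answer = generate(standalone, context)     # 4. Generate (Llama-4-Maverick)
    return translate_out(answer)               # 5. Translate back to user language
```

Because each stage is injected, the same chain serves all four RAG-backed modules with different retrievers and prompts.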
```
User (10 languages + voice)
│
▼
Databricks App (Gradio UI, OAuth M2M service principal)
│
├──► Saavdhaan ──► RAG Pipeline ──► Vector Search (unified_corpus, CDF auto-sync)
├──► Suraksha ──► Spark MLlib GBT (MLflow fraud_detector@champion) ──► Delta: upi_transactions
├──► Adhikar ──► RAG Pipeline ──► Vector Search ──► FAISS fallback (UC Volume)
├──► Samriddhi ──► Spark MLlib KMeans (MLflow persona_kmeans@champion) + RAG
├──► Nivaaran ──► RAG Pipeline ──► Llama-4-Maverick (primary) / sarvam-m (fallback)
└──► Performance ──► Delta tables via SQL Statement API (live metrics)
```
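The primary/fallback pattern above (Llama-4-Maverick, then sarvam-m) boils down to trying providers in order and returning the first success. A minimal sketch in the spirit of `lib/llm_client.py` (function names here are illustrative, not the actual client API):

```python
# Minimal primary/fallback chain: try each provider in order and return
# the first successful response. Names are illustrative, not the real
# lib/llm_client.py API.
def call_with_fallback(prompt, providers):
    """providers: list of (name, callable) pairs, highest priority first."""
    last_err = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as err:   # network/quota errors trigger the fallback
            last_err = err
    raise RuntimeError("all LLM providers failed") from last_err
```

The same shape covers the retrieval side (Vector Search first, FAISS on failure).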
```
Google Drive (seed files)
        ↓ notebook 00b
UC Volume: /Volumes/workspace/default/project_files/
        ↓ notebooks 01-06
Delta Tables: upi_transactions, bns_sections, rbi_circulars, gov_schemes, ...
        ↓ notebook 07
unified_corpus (merged, Change Data Feed enabled)
        ↓ notebook 08                      ↓ notebook 09
Vector Search Index                  FAISS Index on UC Volume
(databricks-bge-large-en)            (all-MiniLM-L6-v2)
        ↓ notebook 10                      ↓ notebook 11
MLflow: fraud_detector@champion      MLflow: persona_kmeans@champion
```
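Notebook 07's merge step amounts to stacking the per-source tables into one corpus with a provenance column. A pure-Python sketch of the idea (the real notebook does this by unioning Spark DataFrames and writing a CDF-enabled Delta table):

```python
# Pure-Python sketch of the unified-corpus merge (notebook 07). The real
# job unions Spark DataFrames; this only shows the shape of the result.
def build_unified_corpus(sources):
    """sources: {table_name: [{'id': ..., 'text': ...}, ...]} -> row list."""
    corpus = []
    for table, rows in sources.items():
        for row in rows:
            corpus.append({
                "doc_id": f"{table}:{row['id']}",  # globally unique id
                "text": row["text"],
                "source": table,                   # provenance for citations
            })
    return corpus
```

Keeping the source table in every row is what lets the RAG answers cite BNS sections versus RBI circulars explicitly.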
| Layer | Technologies |
|---|---|
| Databricks Platform | Delta Lake (CDF), Unity Catalog, Spark MLlib, MLflow, Vector Search, Databricks Apps, Llama-4-Maverick, SQL Statement API, OAuth M2M |
| Indian AI | Sarvam Mayura (translation), Saaras v3 (STT), Bulbul v3 (TTS), sarvam-m (fallback LLM) |
| Legal/Financial Data | BNS 2023, BNS-IPC Mapping, RBI Digital Lending Circulars, MyScheme.gov.in, BhashaBench |
| Languages | English, Hindi, Tamil, Telugu, Bengali, Marathi, Kannada, Malayalam, Gujarati, Punjabi |
| Fallbacks | LLM: Databricks (Llama-4-Maverick) → Sarvam (sarvam-m); Retrieval: Vector Search → FAISS |
- Databricks workspace with Unity Catalog enabled (DBR 14.3+)
- Databricks CLI installed (docs)
- Sarvam AI API key
- HuggingFace token (for BhashaBench datasets)
```bash
export DATABRICKS_HOST="https://<your-workspace>.cloud.databricks.com"
export DATABRICKS_TOKEN="<your-personal-access-token>"

databricks secrets create-scope artha-nyaya
databricks secrets put-secret artha-nyaya sarvam_api_key
databricks secrets put-secret artha-nyaya hf_token

databricks sync . /Workspace/Users/<your-email>/Databricks_Hackathon
```

Open each notebook in the Databricks UI and run them sequentially on a cluster:
| # | Notebook | Purpose | Required? |
|---|---|---|---|
| 1 | 00_setup_secrets_and_volume.py | Create secret scope, catalog, volume, verify APIs | Yes |
| 2 | 00b_sync_repo_data_to_volume.py | Download seed data from Google Drive | Yes |
| 3 | 01_ingest_upi_transactions.py | Ingest UPI transactions | Yes |
| 4 | 02_ingest_bns_sections.py | Ingest BNS 2023 legal sections | Yes |
| 5 | 03_ingest_bns_ipc_mapping.py | Ingest BNS-IPC mapping | Yes |
| 6 | 04_ingest_rbi_circulars.py | Ingest RBI circulars | Yes |
| 7 | 05_ingest_gov_schemes.py | Ingest government schemes | Yes |
| 8 | 06_ingest_bhashabench_eval.py | Ingest BhashaBench benchmarks | Yes |
| 9 | 07_build_unified_corpus.py | Merge into unified corpus (CDF enabled) | Yes |
| 10 | 08_setup_vector_search.py | Create Vector Search endpoint + index | Yes |
| 11 | 09_build_faiss_fallback.py | Build FAISS fallback index | Yes |
| 12 | 10_train_fraud_model.py | Train GBT fraud classifier (MLflow) | Yes |
| 13 | 11_train_persona_clusters.py | Train KMeans personas | Yes |
| 14 | 12_evaluate_rag_bhashabench.py | RAG evaluation | Optional |
| 15 | 13_smoke_test_end_to_end.py | End-to-end smoke test | Optional |
| 16 | 14_notebook_ui_fallback.py | Notebook UI fallback | Skip |
| 17 | 15_grant_app_permissions.py | Grant service principal permissions | Yes (after Step 6) |
| 18 | 16_current_permissions.py | Permissions audit | Optional |
```bash
databricks apps create artha-nyaya-suite --description "Artha-Nyaya Suite"
```

- Go to Databricks UI → Compute → Apps → artha-nyaya-suite
- Copy the Service Principal Client ID
- Grant secret scope access:

```bash
databricks secrets put-acl artha-nyaya <CLIENT_ID> READ
```

- Run notebook 15 in the Databricks UI (enter the Client ID in the widget); this grants USE_CATALOG, USE_SCHEMA, READ_VOLUME, SELECT on tables, EXECUTE on models, and the secret scope READ ACL

```bash
databricks apps deploy artha-nyaya-suite \
  --source-code-path /Workspace/Users/<your-email>/Databricks_Hackathon
```

The app URL appears in `databricks apps get artha-nyaya-suite` or in the app logs.
For full details, CLI permission commands, and troubleshooting, see how_to_run.md.
1. Saavdhaan — Predatory Lending Detection
- Paste these example terms into the text box:
  `Interest rate: 36% per week. Processing fee: 15% of loan amount. We will access your contacts, photos, and location. Recovery agents may contact your family members. Late fee: Rs 500 per day.`
- Click "Analyze for Safety"
- See the safety scorecard (0-100) with flagged violations + RAG legal analysis citing RBI guidelines
- Two buttons appear: "Know Your Rights" and "Report / File Complaint"
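The 0-100 safety scorecard can be thought of as rule-based deductions against RBI digital-lending norms. A toy sketch of the idea (the patterns and weights here are illustrative, not the rules `saavdhaan_module.py` actually applies):

```python
# Toy safety scorecard: start at 100 and deduct points per flagged pattern.
# Patterns and weights are illustrative, not Saavdhaan's actual rule set.
RULES = [
    ("per week",             40, "Interest quoted per week (usurious rate)"),
    ("access your contacts", 25, "Demands access to contacts/photos"),
    ("contact your family",  20, "Threatens to contact family members"),
    ("per day",              15, "Daily late fee (debt-trap pattern)"),
]

def score_loan_terms(text):
    """Return (score 0-100, list of flagged reasons) for loan terms text."""
    text = text.lower()
    score, flags = 100, []
    for pattern, penalty, reason in RULES:
        if pattern in text:
            score -= penalty
            flags.append(reason)
    return max(score, 0), flags
```

Running it on the demo terms above trips all four rules, which is why the walkthrough produces a heavily flagged scorecard.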
2. Cross-Module Navigation → Nivaaran
- Click "Report / File Complaint"
- App jumps to Nivaaran tab with complaint type pre-filled as "Digital Lending App Harassment" and description pre-filled with the flagged terms
- Click "Generate Complaint" → formal legal draft with BNS sections + RBI citations
3. Suraksha — Fraud Detection
- Navigate to Suraksha tab
- Click "Load Transactions" (demo user: ramesh.kumar@oksbi)
- See the transaction table with FRAUD FLAGGED rows
- Click "Explain Flagged Fraud" → AI explains why with legal references
- Click "Know Your Rights" → jumps to Adhikar with suggested questions shown
4. Adhikar — Legal Rights Chatbot
- Yellow hint box shows suggested questions from Suraksha
- Type: "What is BNS section 318 about cheating?"
- Follow up: "What's the punishment for that?"
- The system resolves "that" using conversation memory and retrieves the correct answer
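Resolving the follow-up is the job of the "Rewrite / Chat Memory" stage of the RAG pipeline. A minimal sketch of why that stage exists (the real app delegates rewriting to the LLM with full chat history; naive string substitution is only for illustration):

```python
# Toy standalone-question rewriter: replace a trailing vague reference
# with the most recent topic from chat history. The real Adhikar module
# does this with the LLM, not string substitution.
def rewrite_with_memory(question, history):
    """history: list of prior user questions, newest last."""
    vague = ("that", "it", "this")
    words = question.rstrip("?").split()
    if history and words and words[-1].lower() in vague:
        topic = history[-1]              # naive: reuse the last topic verbatim
        return " ".join(words[:-1]) + f" {topic}?"
    return question
```

Without this step, "What's the punishment for that?" would retrieve nothing useful, because the vector index has no document about "that".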
5. Samriddhi — Government Schemes
- Navigate to Samriddhi tab
- Click "Find Schemes For Me" → KMeans persona matching → relevant government schemes displayed
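"Find Schemes For Me" assigns the user to the nearest KMeans persona centroid before retrieving matching schemes. Conceptually, the inference step is nearest-centroid assignment (the features and centroids below are toy values, not the trained `persona_kmeans@champion` model):

```python
import math

# Nearest-centroid assignment: the inference step behind KMeans persona
# matching. Features and centroids are toy values, not the trained model.
def assign_persona(user, centroids):
    """user: feature vector; centroids: {persona_name: centroid vector}."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(centroids, key=lambda name: dist(user, centroids[name]))
```

The persona label then keys into the gov_schemes table to shortlist schemes relevant to that user profile.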
6. Performance Dashboard
- Open Performance tab → live fraud model AUC (0.9999), RAG proxy accuracy (62%), architecture overview
| Metric | Value |
|---|---|
| Fraud Model AUC | 0.9999 |
| Fraud-class F1 | 0.9668 |
| Fraud Precision | 0.9711 |
| Fraud Recall | 0.9626 |
| Training Data | 5M+ rows |
| RAG Proxy Accuracy (BhashaBench Hindi) | 62.0% |
| RAG Token F1 | 0.384 |
| RAG Avg Latency | 4.8s |
| Languages Supported | 10 |
| Connected Modules | 5 + Performance dashboard |
```
Databricks_Hackathon/
├── app/
│   ├── main.py               # Gradio entry point (6 tabs + cross-module nav)
│   ├── fraud_module.py       # Suraksha — fraud detection
│   ├── rights_module.py      # Adhikar — legal rights chatbot
│   ├── scheme_module.py      # Samriddhi — scheme matching
│   ├── nivaaran_module.py    # Nivaaran — complaint drafter
│   ├── saavdhaan_module.py   # Saavdhaan — predatory lending detector
│   ├── eval_module.py        # Performance — eval metrics dashboard
│   ├── orchestration.py      # RAG pipeline (translate → retrieve → generate)
│   ├── voice_module.py       # Voice I/O (STT + TTS)
│   └── ui_helpers.py         # Shared UI formatting
├── lib/
│   ├── sarvam_client.py      # Sarvam API client
│   ├── llm_client.py         # LLM client (Databricks + Sarvam fallback)
│   ├── retrieval.py          # Unified retriever (Vector Search + FAISS)
│   ├── app_cache.py          # Parquet cache for app container
│   └── secrets.py            # Three-tier secret loading
├── notebooks/                # 00-16: run in order (see How to Run)
├── app.yaml                  # Databricks App deployment config
├── requirements.txt          # Python dependencies
├── how_to_run.md             # Detailed setup & deployment guide
├── Architecture_Diagram.html # Interactive architecture diagram
├── Presentation.html         # 4-slide presentation deck
└── README.md
```
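The "three-tier secret loading" in `lib/secrets.py` can be sketched as a cascading lookup. The tier order shown (environment variable → Databricks secret scope → default) is an assumption, and `dbutils_get` is a stand-in for `dbutils.secrets.get`:

```python
import os

# Sketch of a three-tier secret lookup in the spirit of lib/secrets.py.
# The tier order (env var -> secret scope -> default) is an assumption;
# dbutils_get stands in for dbutils.secrets.get inside a notebook/app.
def load_secret(name, scope="artha-nyaya", dbutils_get=None, default=None):
    value = os.environ.get(name.upper())       # tier 1: environment variable
    if value:
        return value
    if dbutils_get is not None:                # tier 2: Databricks secret scope
        try:
            return dbutils_get(scope, name)
        except Exception:
            pass
    return default                             # tier 3: fallback default
```

A cascade like this lets the same code run locally (env vars), in the deployed app (secret scope via the service principal), and in tests (defaults).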
Built for the Databricks Hackathon.