samarth2018/Databricks_Hackathon
# Artha-Nyaya Suite

A Databricks-native platform that protects India's 300M+ first-time digital finance users across the complete fraud lifecycle — prevention, detection, legal guidance, government scheme access, and complaint filing — in 10 Indian languages with voice support.

Detailed setup instructions: `how_to_run.md` · Interactive architecture: `Architecture_Diagram.html` · Presentation: `Presentation.html`


## Architecture

```mermaid
flowchart LR
    U["User\n10 languages + voice"] --> APP["Databricks App\n(Gradio)"]

    subgraph Modules["5 Connected Modules"]
      SA["Saavdhaan\nLending Analyzer"]
      SU["Suraksha\nFraud Detection"]
      AD["Adhikar\nRights Chatbot"]
      SM["Samriddhi\nScheme Navigator"]
      NI["Nivaaran\nComplaint Drafter"]
    end

    APP --> SA & SU & AD & SM & NI

    subgraph RAG["RAG Pipeline (orchestration.py)"]
      direction LR
      R1["Translate\nSarvam Mayura"] --> R2["Rewrite\nChat Memory"] --> R3["Retrieve\nVector Search / FAISS"] --> R4["Generate\nLlama-4-Maverick"] --> R5["Translate Back"]
    end

    subgraph ML["Spark MLlib + MLflow"]
      F["GBT Fraud Classifier\n@champion"]
      P["KMeans Personas\n@champion"]
    end

    subgraph Lakehouse["Databricks Lakehouse"]
      DT["Delta Tables\n(CDF enabled)"]
      VS["Vector Search Index\n(databricks-bge-large-en)"]
      VOL["UC Volume\nFAISS + app_cache"]
      UC["Unity Catalog\nworkspace.default"]
    end

    SA & AD & SM & NI --> RAG
    SU --> F
    SM --> P
    F & P --> DT
    RAG --> VS & VOL
    VS --> DT
```
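The five-stage RAG flow in the diagram can be sketched as a simple left-to-right composition. This is an illustrative outline only; every function below is a hypothetical stub, not the real `orchestration.py` code.

```python
# Illustrative sketch of the five-stage RAG flow (hypothetical stubs,
# not the real orchestration.py). Each stage is a plain callable, so
# the pipeline is just a left-to-right composition.

def translate_to_english(text, lang):
    # Stand-in for the Sarvam Mayura translation call.
    return text if lang == "en" else f"[en] {text}"

def rewrite_with_memory(query, history):
    # Stand-in for chat-memory rewriting: fold the previous turn into
    # the query so follow-ups like "that" become resolvable.
    return f"{history[-1]} -> {query}" if history else query

def retrieve(query):
    # Stand-in for Vector Search / FAISS retrieval.
    return [f"doc about: {query}"]

def generate(query, docs):
    # Stand-in for the Llama-4-Maverick completion call.
    return f"Answer to '{query}' using {len(docs)} document(s)"

def translate_back(text, lang):
    return text if lang == "en" else f"[{lang}] {text}"

def rag_pipeline(user_text, lang="en", history=()):
    english = translate_to_english(user_text, lang)
    rewritten = rewrite_with_memory(english, list(history))
    docs = retrieve(rewritten)
    answer = generate(rewritten, docs)
    return translate_back(answer, lang)

print(rag_pipeline("What is BNS 318?", lang="hi"))
```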

## How Databricks Components Connect

```
User (10 languages + voice)
  │
  ▼
Databricks App (Gradio UI, OAuth M2M service principal)
  │
  ├──► Saavdhaan ──► RAG Pipeline ──► Vector Search (unified_corpus, CDF auto-sync)
  ├──► Suraksha  ──► Spark MLlib GBT (MLflow fraud_detector@champion) ──► Delta: upi_transactions
  ├──► Adhikar   ──► RAG Pipeline ──► Vector Search ──► FAISS fallback (UC Volume)
  ├──► Samriddhi ──► Spark MLlib KMeans (MLflow persona_kmeans@champion) + RAG
  ├──► Nivaaran  ──► RAG Pipeline ──► Llama-4-Maverick (primary) / sarvam-m (fallback)
  └──► Performance ──► Delta tables via SQL Statement API (live metrics)
```
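The Vector Search → FAISS fallback shown for Adhikar boils down to a try/except chain: query the managed index first, and fall back to the locally cached FAISS index if that fails. A minimal sketch with hypothetical retriever functions (the real logic lives in `lib/retrieval.py`):

```python
# Sketch of the primary/fallback retrieval pattern: Databricks Vector
# Search first, FAISS on a UC Volume as fallback. Function names are
# hypothetical stand-ins, not the real lib/retrieval.py API.

class RetrievalError(Exception):
    pass

def vector_search_query(query):
    # Stand-in for the Vector Search client call; here it simulates an
    # unreachable endpoint to exercise the fallback path.
    raise RetrievalError("endpoint unreachable")

def faiss_query(query):
    # Stand-in for querying the FAISS index stored on the UC Volume.
    return [f"faiss hit for: {query}"]

def retrieve_with_fallback(query):
    try:
        return vector_search_query(query)
    except RetrievalError:
        # Primary index unavailable: serve from the FAISS fallback.
        return faiss_query(query)

print(retrieve_with_fallback("digital lending rules"))
```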

## Data Pipeline (Notebook Execution)

```
Google Drive (seed files)
    ↓ notebook 00b
UC Volume: /Volumes/workspace/default/project_files/
    ↓ notebooks 01-06
Delta Tables: upi_transactions, bns_sections, rbi_circulars, gov_schemes, ...
    ↓ notebook 07
unified_corpus (merged, Change Data Feed enabled)
    ↓ notebook 08                    ↓ notebook 09
Vector Search Index              FAISS Index on UC Volume
(databricks-bge-large-en)        (all-MiniLM-L6-v2)
    ↓ notebook 10                    ↓ notebook 11
MLflow: fraud_detector@champion   MLflow: persona_kmeans@champion
```
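Notebook 07's merge step can be pictured as normalizing each source table into a shared schema before concatenating. Below is a toy pure-Python version; the actual job runs in Spark over Delta tables with CDF enabled, and the field names here are invented for illustration.

```python
# Toy version of the notebook-07 merge: several source corpora are
# normalized into one schema and concatenated into a unified corpus.
# Field names are illustrative; the real job is a Spark pipeline over
# Delta tables with Change Data Feed enabled.

def to_corpus_rows(source_name, records, text_field):
    # Normalize one source into the shared (source, doc_id, text) schema.
    return [
        {"source": source_name,
         "doc_id": f"{source_name}-{i}",
         "text": r[text_field]}
        for i, r in enumerate(records)
    ]

bns = [{"section_text": "Section 318: cheating"}]
rbi = [{"circular_text": "Digital lending guidelines"}]

unified_corpus = (
    to_corpus_rows("bns_sections", bns, "section_text")
    + to_corpus_rows("rbi_circulars", rbi, "circular_text")
)
print(len(unified_corpus))  # 2
```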

## Tech Stack

| Layer | Technologies |
|-------|--------------|
| Databricks Platform | Delta Lake (CDF), Unity Catalog, Spark MLlib, MLflow, Vector Search, Databricks Apps, Llama-4-Maverick, SQL Statement API, OAuth M2M |
| Indian AI | Sarvam Mayura (translation), Saaras v3 (STT), Bulbul v3 (TTS), sarvam-m (fallback LLM) |
| Legal/Financial Data | BNS 2023, BNS-IPC Mapping, RBI Digital Lending Circulars, MyScheme.gov.in, BhashaBench |
| Languages | English, Hindi, Tamil, Telugu, Bengali, Marathi, Kannada, Malayalam, Gujarati, Punjabi |
| Fallbacks | LLM: Databricks → Sarvam |
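The Databricks → Sarvam LLM fallback is a provider chain: try each backend in order and return the first success. A sketch with stub providers (the real client is `lib/llm_client.py`; the function names below are hypothetical):

```python
# Sketch of the LLM fallback chain: Databricks-hosted Llama-4-Maverick
# first, sarvam-m on failure. Both providers are stubs here; the real
# client lives in lib/llm_client.py.

def call_databricks_llm(prompt):
    # Simulate a failing primary endpoint to exercise the fallback.
    raise TimeoutError("serving endpoint timed out")

def call_sarvam_llm(prompt):
    return f"sarvam-m reply to: {prompt}"

def complete(prompt):
    for provider in (call_databricks_llm, call_sarvam_llm):
        try:
            return provider(prompt)
        except Exception:
            continue  # try the next provider in the chain
    raise RuntimeError("all LLM providers failed")

print(complete("hello"))
```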

## How to Run

### Prerequisites

- Databricks workspace with Unity Catalog enabled (DBR 14.3+)
- Databricks CLI installed (docs)
- Sarvam AI API key
- HuggingFace token (for the BhashaBench datasets)

### Step 1: Export Credentials

```shell
export DATABRICKS_HOST="https://<your-workspace>.cloud.databricks.com"
export DATABRICKS_TOKEN="<your-personal-access-token>"
```

### Step 2: Store Secrets

```shell
databricks secrets create-scope artha-nyaya
databricks secrets put-secret artha-nyaya sarvam_api_key
databricks secrets put-secret artha-nyaya hf_token
```
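The repository layout describes `lib/secrets.py` as "three-tier secret loading". One plausible shape for such a lookup is sketched below: environment variable first, then the Databricks secret scope, then a default. The tier order and names are assumptions, and the `dbutils` call only resolves inside a Databricks runtime.

```python
import os

# Illustrative three-tier secret lookup (assumed order, not the real
# lib/secrets.py): environment variable -> Databricks secret scope ->
# caller-supplied default.

def get_secret(name, scope="artha-nyaya", default=None):
    # Tier 1: environment variable (local development).
    value = os.environ.get(name.upper())
    if value:
        return value
    # Tier 2: Databricks secret scope. The global `dbutils` object
    # only exists inside a Databricks runtime, so a NameError here
    # simply means we are running elsewhere.
    try:
        return dbutils.secrets.get(scope=scope, key=name)  # noqa: F821
    except NameError:
        pass
    # Tier 3: explicit default.
    return default

os.environ["SARVAM_API_KEY"] = "demo-key"
print(get_secret("sarvam_api_key"))  # demo-key
```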

### Step 3: Sync Repo to Workspace

```shell
databricks sync . /Workspace/Users/<your-email>/Databricks_Hackathon
```

### Step 4: Run Notebooks in Order

Open each notebook in the Databricks UI and run it on a cluster, in the order below:

| # | Notebook | Purpose | Required? |
|---|----------|---------|-----------|
| 1 | `00_setup_secrets_and_volume.py` | Create secret scope, catalog, volume; verify APIs | Yes |
| 2 | `00b_sync_repo_data_to_volume.py` | Download seed data from Google Drive | Yes |
| 3 | `01_ingest_upi_transactions.py` | Ingest UPI transactions | Yes |
| 4 | `02_ingest_bns_sections.py` | Ingest BNS 2023 legal sections | Yes |
| 5 | `03_ingest_bns_ipc_mapping.py` | Ingest BNS-IPC mapping | Yes |
| 6 | `04_ingest_rbi_circulars.py` | Ingest RBI circulars | Yes |
| 7 | `05_ingest_gov_schemes.py` | Ingest government schemes | Yes |
| 8 | `06_ingest_bhashabench_eval.py` | Ingest BhashaBench benchmarks | Yes |
| 9 | `07_build_unified_corpus.py` | Merge into unified corpus (CDF enabled) | Yes |
| 10 | `08_setup_vector_search.py` | Create Vector Search endpoint + index | Yes |
| 11 | `09_build_faiss_fallback.py` | Build FAISS fallback index | Yes |
| 12 | `10_train_fraud_model.py` | Train GBT fraud classifier (MLflow) | Yes |
| 13 | `11_train_persona_clusters.py` | Train KMeans personas | Yes |
| 14 | `12_evaluate_rag_bhashabench.py` | RAG evaluation | Optional |
| 15 | `13_smoke_test_end_to_end.py` | End-to-end smoke test | Optional |
| 16 | `14_notebook_ui_fallback.py` | Notebook UI fallback | Skip |
| 17 | `15_grant_app_permissions.py` | Grant service principal permissions | Yes (after Step 6) |
| 18 | `16_current_permissions.py` | Permissions audit | Optional |

### Step 5: Create the App

```shell
databricks apps create artha-nyaya-suite --description "Artha-Nyaya Suite"
```

### Step 6: Find the Client ID & Grant Permissions

1. In the Databricks UI, go to **Compute → Apps → artha-nyaya-suite**
2. Copy the **Service Principal Client ID**
3. Grant secret scope access:
   ```shell
   databricks secrets put-acl artha-nyaya <CLIENT_ID> READ
   ```
4. Run notebook 15 in the Databricks UI (enter the Client ID in the widget). This grants `USE_CATALOG`, `USE_SCHEMA`, `READ_VOLUME`, `SELECT` on tables, `EXECUTE` on models, and the secret scope READ ACL.
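The grants above can be expressed as plain Unity Catalog SQL. The sketch below builds the statements as strings; the exact statement forms and object names are my assumptions based on the list above (the real notebook presumably runs them, plus the `EXECUTE` grant on the registered models, via `spark.sql()`).

```python
# Sketch of the Unity Catalog grants issued to the app's service
# principal. Statement forms and object names are assumptions for
# illustration; the EXECUTE-on-models grant is omitted here.

def grant_statements(client_id, catalog="workspace", schema="default"):
    sp = f"`{client_id}`"  # service principal application/client ID
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {sp}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {sp}",
        f"GRANT READ VOLUME ON VOLUME {catalog}.{schema}.project_files TO {sp}",
        f"GRANT SELECT ON SCHEMA {catalog}.{schema} TO {sp}",
    ]

for stmt in grant_statements("<CLIENT_ID>"):
    print(stmt)
```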

### Step 7: Deploy

```shell
databricks apps deploy artha-nyaya-suite \
  --source-code-path /Workspace/Users/<your-email>/Databricks_Hackathon
```

The app URL appears in the output of `databricks apps get artha-nyaya-suite` and in the app logs.

For full details, CLI permission commands, and troubleshooting, see `how_to_run.md`.


## Demo Steps

### The Connected Journey (recommended 5-minute flow)

#### 1. Saavdhaan — Predatory Lending Detection

- Paste these example terms into the text box:
  ```
  Interest rate: 36% per week. Processing fee: 15% of loan amount.
  We will access your contacts, photos, and location. Recovery agents
  may contact your family members. Late fee: Rs 500 per day.
  ```
- Click **"Analyze for Safety"**
- Review the safety scorecard (0-100) with flagged violations, plus a RAG legal analysis citing RBI guidelines
- Two buttons appear: **"Know Your Rights"** and **"Report / File Complaint"**
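To show what a 0-100 safety scorecard with flagged violations can look like under the hood, here is a toy rule-based scorer run against the example terms above. The rules and weights are invented for this illustration; the real Saavdhaan module pairs checks like these with RAG-based legal analysis.

```python
import re

# Toy rule-based safety scorer. Patterns and penalty weights are
# invented for illustration, not Saavdhaan's actual rules.
RULES = [
    (r"%\s*per\s*week", "weekly interest rate", 40),
    (r"contacts|photos|location", "invasive data access", 30),
    (r"family members", "harassment of family", 20),
    (r"per\s*day", "per-day late fee", 10),
]

def analyze(terms):
    # Collect every rule that matches, then subtract penalties from a
    # perfect score of 100, flooring at 0.
    flags = [(label, penalty) for pattern, label, penalty in RULES
             if re.search(pattern, terms, re.IGNORECASE)]
    score = max(0, 100 - sum(p for _, p in flags))
    return score, [label for label, _ in flags]

terms = ("Interest rate: 36% per week. We will access your contacts, "
         "photos, and location. Recovery agents may contact your family "
         "members. Late fee: Rs 500 per day.")
print(analyze(terms))  # all four rules fire, so the score bottoms out at 0
```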

#### 2. Cross-Module Navigation → Nivaaran

- Click **"Report / File Complaint"**
- The app jumps to the Nivaaran tab with the complaint type pre-filled as "Digital Lending App Harassment" and the description pre-filled with the flagged terms
- Click **"Generate Complaint"** → a formal legal draft with BNS sections and RBI citations

#### 3. Suraksha — Fraud Detection

- Navigate to the Suraksha tab
- Click **"Load Transactions"** (demo user: `ramesh.kumar@oksbi`)
- Review the transaction table with FRAUD FLAGGED rows
- Click **"Explain Flagged Fraud"** → the AI explains why, with legal references
- Click **"Know Your Rights"** → jumps to Adhikar with suggested questions shown

#### 4. Adhikar — Legal Rights Chatbot

- A yellow hint box shows suggested questions carried over from Suraksha
- Type: `What is BNS section 318 about cheating?`
- Follow up with: `What's the punishment for that?`
- The system resolves "that" using conversation memory and retrieves the correct answer
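The follow-up resolution works because the rewrite stage sees recent chat turns. Here is a sketch of how such a rewrite prompt might be assembled; the prompt wording and function name are illustrative, not the actual `orchestration.py` implementation.

```python
# Sketch of assembling a query-rewrite prompt from chat memory: the
# last few turns plus the follow-up are packed into one prompt asking
# the LLM for a standalone question. Wording is illustrative only.

def build_rewrite_prompt(history, follow_up, max_turns=4):
    recent = history[-max_turns:]  # keep the prompt short
    lines = [f"{role}: {text}" for role, text in recent]
    return (
        "Rewrite the final question so it stands alone, resolving "
        "pronouns from the conversation.\n\n"
        + "\n".join(lines)
        + f"\nuser: {follow_up}\nStandalone question:"
    )

history = [
    ("user", "What is BNS section 318 about cheating?"),
    ("assistant", "BNS 318 defines and punishes cheating."),
]
prompt = build_rewrite_prompt(history, "What's the punishment for that?")
print(prompt)
```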

#### 5. Samriddhi — Government Schemes

- Navigate to the Samriddhi tab
- Click **"Find Schemes For Me"** → KMeans persona matching → relevant government schemes displayed

#### 6. Performance Dashboard

- Open the Performance tab → live fraud model AUC (0.9999), RAG proxy accuracy (62%), architecture overview

## Results

| Metric | Value |
|--------|-------|
| Fraud Model AUC | 0.9999 |
| Fraud-class F1 | 0.9668 |
| Fraud Precision | 0.9711 |
| Fraud Recall | 0.9626 |
| Training Data | 5M+ rows |
| RAG Proxy Accuracy (BhashaBench Hindi) | 62.0% |
| RAG Token F1 | 0.384 |
| RAG Avg Latency | 4.8 s |
| Languages Supported | 10 |
| Connected Modules | 5 + Performance dashboard |

## Repository Layout

```
Databricks_Hackathon/
├── app/
│   ├── main.py                   # Gradio entry point (6 tabs + cross-module nav)
│   ├── fraud_module.py           # Suraksha — fraud detection
│   ├── rights_module.py          # Adhikar — legal rights chatbot
│   ├── scheme_module.py          # Samriddhi — scheme matching
│   ├── nivaaran_module.py        # Nivaaran — complaint drafter
│   ├── saavdhaan_module.py       # Saavdhaan — predatory lending detector
│   ├── eval_module.py            # Performance — eval metrics dashboard
│   ├── orchestration.py          # RAG pipeline (translate → retrieve → generate)
│   ├── voice_module.py           # Voice I/O (STT + TTS)
│   └── ui_helpers.py             # Shared UI formatting
├── lib/
│   ├── sarvam_client.py          # Sarvam API client
│   ├── llm_client.py             # LLM client (Databricks + Sarvam fallback)
│   ├── retrieval.py              # Unified retriever (Vector Search + FAISS)
│   ├── app_cache.py              # Parquet cache for app container
│   └── secrets.py                # Three-tier secret loading
├── notebooks/                    # 00-16: run in order (see How to Run)
├── app.yaml                      # Databricks App deployment config
├── requirements.txt              # Python dependencies
├── how_to_run.md                 # Detailed setup & deployment guide
├── Architecture_Diagram.html     # Interactive architecture diagram
├── Presentation.html             # 4-slide presentation deck
└── README.md
```

Built for the Databricks Hackathon.
