# 🏦 BFSI AI Assistant (Production-Ready Local Model)
This notebook implements a local, compliance-first BFSI AI Assistant.

### Key Features:
- Local small embedding model
- Curated dataset priority layer
- Retrieval Augmented Generation (RAG)
- Strict safety & compliance layer
- Interactive chatbot mode


In [1]:
%pip install --upgrade pip setuptools wheel --no-cache-dir
%pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
%pip install sentence-transformers faiss-cpu langchain langchain-community


Looking in indexes: https://download.pytorch.org/whl/cpu


In [2]:
import time
import torch
from sentence_transformers import SentenceTransformer, util
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document


## Embedding Model

In [3]:
# ===============================
# 2️⃣ Load Embedding Model
# ===============================

import os
import warnings
from sentence_transformers import SentenceTransformer

# Disable Hugging Face Hub telemetry and silence advisory and Python warnings
os.environ["HF_HUB_DISABLE_TELEMETRY"] = "1"
os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"
warnings.filterwarnings("ignore")

# Load local embedding model
sim_model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    device="cpu"
)

print("✅ Embedding model loaded successfully")


Loading weights:   0%|          | 0/103 [00:00<?, ?it/s]

BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


✅ Embedding model loaded successfully


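## Complete Assistant (Single-Cell Build)

This cell bundles every layer into one runnable build: model load, curated dataset, RAG knowledge base, safety layer, and the `ask()` pipeline. The sections that follow revisit each layer on its own.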

In [4]:
# =========================================
# 🔥 FINAL BFSI AI ASSISTANT - STABLE BUILD
# =========================================

import os
import time
import warnings
import torch
from sentence_transformers import SentenceTransformer, util
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_community.embeddings import HuggingFaceEmbeddings  # embedding wrapper passed to FAISS


# =========================================
# 1️⃣ CLEAN MODEL LOAD
# =========================================

os.environ["HF_HUB_DISABLE_TELEMETRY"] = "1"
os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"
warnings.filterwarnings("ignore")

sim_model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")

print("✅ Embedding model loaded successfully\n")


# =========================================
# 2️⃣ CURATED DATASET (PRIORITY LAYER)
# =========================================

intents = [
    ("Loan eligibility", "Loan eligibility depends on income, credit score, repayment history, and internal risk policy."),
    ("Application status", "You can track your application using your registered mobile or email."),
    ("EMI calculation", "EMI depends on principal, interest rate, and tenure."),
    ("Interest rate information", "Interest rates vary based on risk category and credit score."),
    ("Late payment penalty", "Late payment charges are applied as per agreement terms."),
    ("Prepayment rules", "Prepayment or foreclosure charges depend on product policy."),
    ("Transaction dispute", "Transaction disputes require transaction ID and follow internal resolution timelines."),
    ("KYC documents", "KYC requires PAN, Aadhaar, address proof, and income proof."),
]

dataset = [{"instruction": i[0], "output": i[1]} for i in intents]

stored_questions = [d["instruction"] for d in dataset]
stored_embeddings = sim_model.encode(stored_questions, convert_to_tensor=True)


def similarity_match(query, threshold=0.70):  # minimum cosine similarity for a curated match
    q_emb = sim_model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, stored_embeddings)[0]
    best_idx = int(torch.argmax(scores))
    best_score = float(scores[best_idx])

    if best_score >= threshold:
        return dataset[best_idx]["output"], best_score

    return None, best_score


# =========================================
# 3️⃣ RAG KNOWLEDGE BASE
# =========================================

policy_texts = [
    "Loan eligibility depends on income, credit score, repayment history, and internal risk policy.",
    "EMI is calculated using reducing balance method.",
    "Interest rate may vary by risk-based pricing.",
    "Late payment charges are mentioned in sanction letter.",
    "Prepayment charges depend on product terms.",
    "Dispute resolution follows RBI-compliant timelines.",
    "Never share OTP or CVV with anyone.",
]

documents = [Document(page_content=t) for t in policy_texts]

# Proper embedding wrapper for FAISS
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
vs = FAISS.from_documents(documents, embedding_model)  # build the in-memory vector index


def rag_retrieve(query, k=2):
    docs = vs.similarity_search(query, k=k)
    if not docs:
        return "I’m unable to retrieve relevant policy information at the moment."
    return "\n".join([d.page_content for d in docs])


# =========================================
# 4️⃣ ADVANCED SAFETY LAYER (HYBRID FIXED)
# =========================================

risk_intents = {
    "SECURITY": [
        "How to hack a bank account?",
        "How to bypass banking system?",
        "How to commit financial fraud?"
    ],
    "SENSITIVE": [
        "Here is my OTP",
        "My CVV number is",
        "Sharing my password"
    ],
    "PRIVACY": [
        "Can I share customer data with third party?",
        "Disclosing confidential customer information"
    ]
}

risk_embeddings = {
    category: sim_model.encode(sentences, convert_to_tensor=True)
    for category, sentences in risk_intents.items()
}


def advanced_safety_layer(query, threshold=0.60):
    q = query.lower()

    # Quick rule reinforcement
    if any(word in q for word in ["hack", "fraud", "otp", "cvv", "password"]):
        return "I cannot assist with security breaches or sharing sensitive financial information."

    q_emb = sim_model.encode(query, convert_to_tensor=True)

    for category, embeddings in risk_embeddings.items():
        scores = util.cos_sim(q_emb, embeddings)[0]
        max_score = float(torch.max(scores))

        if max_score >= threshold:
            if category == "SECURITY":
                return "I cannot assist with misuse of financial systems."
            if category == "SENSITIVE":
                return "Please do not share OTP, CVV, or passwords."
            if category == "PRIVACY":
                return "Customer information must only be shared with authorized personnel."

    return None


# =========================================
# 5️⃣ FINAL ASSISTANT PIPELINE
# =========================================

def ask(query):
    start = time.time()

    # SAFETY FIRST
    safe = advanced_safety_layer(query)
    if safe:
        return {
            "tier": "SAFETY",
            "response": safe,
            "latency": round(time.time() - start, 3)
        }

    # CURATED PRIORITY
    curated_resp, score = similarity_match(query)
    if curated_resp:
        return {
            "tier": "CURATED_DATASET",
            "confidence": round(score, 3),
            "response": curated_resp,
            "latency": round(time.time() - start, 3)
        }

    # RAG FALLBACK
    context = rag_retrieve(query)
    return {
        "tier": "RAG",
        "response": context,
        "latency": round(time.time() - start, 3)
    }







✅ Embedding model loaded successfully





## Curated Dataset (Priority Layer)

In [5]:
intents = [
    ("Loan eligibility", "Loan eligibility depends on income, credit score, repayment history, and internal risk policy."),
    ("Application status", "You can track your application using your registered mobile or email."),
    ("EMI calculation", "EMI depends on principal, interest rate, and tenure."),
    ("Interest rate information", "Interest rates vary based on risk category and credit score."),
    ("Late payment penalty", "Late payment charges are applied as per agreement terms."),
    ("Prepayment rules", "Prepayment or foreclosure charges depend on product policy."),
    ("Transaction dispute", "Transaction disputes require transaction ID and follow internal resolution timelines."),
    ("KYC documents", "KYC requires PAN, Aadhaar, address proof, and income proof."),
]

dataset = [{"instruction": i[0], "output": i[1]} for i in intents]

# ✅ Print dataset size
print("Dataset Size:", len(dataset))

# ✅ Print all records
for idx, record in enumerate(dataset):
    print(f"\nRecord {idx+1}:")
    print("Instruction:", record["instruction"])
    print("Output:", record["output"])


Dataset Size: 8

Record 1:
Instruction: Loan eligibility
Output: Loan eligibility depends on income, credit score, repayment history, and internal risk policy.

Record 2:
Instruction: Application status
Output: You can track your application using your registered mobile or email.

Record 3:
Instruction: EMI calculation
Output: EMI depends on principal, interest rate, and tenure.

Record 4:
Instruction: Interest rate information
Output: Interest rates vary based on risk category and credit score.

Record 5:
Instruction: Late payment penalty
Output: Late payment charges are applied as per agreement terms.

Record 6:
Instruction: Prepayment rules
Output: Prepayment or foreclosure charges depend on product policy.

Record 7:
Instruction: Transaction dispute
Output: Transaction disputes require transaction ID and follow internal resolution timelines.

Record 8:
Instruction: KYC documents
Output: KYC requires PAN, Aadhaar, address proof, and income proof.


In [6]:
import os
import warnings

# Disable Hugging Face Hub telemetry and the cache symlink warning
os.environ["HF_HUB_DISABLE_TELEMETRY"] = "1"
os.environ["HF_HUB_DISABLE_SYMLINKS_WARNING"] = "1"

warnings.filterwarnings("ignore", category=UserWarning)


In [7]:
import os
import warnings
from sentence_transformers import SentenceTransformer

# Disable Hugging Face Hub telemetry and silence advisory warnings
os.environ["HF_HUB_DISABLE_TELEMETRY"] = "1"
os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"
warnings.filterwarnings("ignore", category=UserWarning)

# Load local embedding model
sim_model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    device="cpu"
)

print("Embedding model loaded successfully ✅")




Embedding model loaded successfully ✅


## Similarity Engine

In [8]:
import os
import warnings
from sentence_transformers import SentenceTransformer

# Disable Hugging Face Hub telemetry and silence advisory warnings
os.environ["HF_HUB_DISABLE_TELEMETRY"] = "1"
os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"
warnings.filterwarnings("ignore", category=UserWarning)

# Load model locally (public model, no token required)
sim_model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    device="cpu"
)

print("Hugging Face model loaded successfully without authentication warnings ✅")
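
The curated-match step that this model powers (`similarity_match` in the single-cell build) comes down to a cosine-similarity lookup with a cutoff. The sketch below illustrates that idea without any dependencies. The three-dimensional vectors are made-up stand-ins for real MiniLM embeddings (which have 384 dimensions), and `cosine` and `best_match` are hypothetical helpers, not part of the notebook.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_match(query_vec, stored_vecs, answers, threshold=0.70):
    """Return (answer, score) for the closest stored vector, or (None, score) below the cutoff."""
    scores = [cosine(query_vec, v) for v in stored_vecs]
    best_idx = max(range(len(scores)), key=scores.__getitem__)
    best_score = scores[best_idx]
    if best_score >= threshold:
        return answers[best_idx], best_score
    return None, best_score

# Toy vectors, assumed purely for illustration
stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
answers = [
    "EMI depends on principal, interest rate, and tenure.",
    "KYC requires PAN, Aadhaar, address proof, and income proof.",
]

hit, hit_score = best_match([0.9, 0.1, 0.0], stored, answers)     # close to the first vector
miss, miss_score = best_match([0.5, 0.5, 0.7], stored, answers)   # close to neither
```

Here `hit` is the EMI answer, while `miss` is `None` because its best score falls below the 0.70 cutoff; in the notebook that is the case that falls through to the RAG tier.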






## Knowledge Base (RAG Layer)

In [9]:
# =========================================
# 3️⃣ RAG KNOWLEDGE BASE (FIXED)
# =========================================

from langchain_community.embeddings import HuggingFaceEmbeddings

policy_texts = [
    "Loan eligibility depends on income, credit score, repayment history, and internal risk policy.",
    "EMI is calculated using reducing balance method.",
    "Interest rate may vary by risk-based pricing.",
    "Late payment charges are mentioned in sanction letter.",
    "Prepayment charges depend on product terms.",
    "Dispute resolution follows RBI-compliant timelines.",
    "Never share OTP or CVV with anyone.",
]

documents = [Document(page_content=t) for t in policy_texts]

# Proper embedding wrapper for FAISS
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

vs = FAISS.from_documents(documents, embedding_model)


def rag_retrieve(query, k=2):
    docs = vs.similarity_search(query, k=k)
    if not docs:
        return "No relevant policy information found."
    return "\n".join([d.page_content for d in docs])
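
A nearest-neighbour search returns the `k` closest documents even when none of them is relevant, which is why the input `stop` in the interactive transcript later in this notebook gets two unrelated policy lines back. One possible remedy is a minimum-score cutoff. The sketch below shows the idea with a toy bag-of-words similarity standing in for the embedding model; `bow`, `retrieve`, and the `min_score` value are assumptions made for illustration.

```python
import math
import re
from collections import Counter

def bow(text):
    """Toy bag-of-words vector (a stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, texts, k=2, min_score=0.2):
    """Return up to k texts ranked by similarity, dropping any below min_score."""
    q = bow(query)
    scored = sorted(((cosine(q, bow(t)), t) for t in texts), reverse=True)
    return [t for score, t in scored[:k] if score >= min_score]

policy_texts = [
    "EMI is calculated using reducing balance method.",
    "Prepayment charges depend on product terms.",
    "Never share OTP or CVV with anyone.",
]

relevant = retrieve("How are prepayment charges decided?", policy_texts)
unrelated = retrieve("stop", policy_texts)  # nothing clears the cutoff, so the list is empty
```

With the real vector store, LangChain's `similarity_search_with_score` returns each document together with a score that could be filtered the same way. For a default FAISS index that score is a distance, so lower means closer; it is worth confirming this against the installed version before picking a cutoff.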




## Safety & Compliance Layer

In [10]:
import torch
from sentence_transformers import util

# Assuming sim_model and risk_embeddings are defined in previous cells

def advanced_safety_layer(query, threshold=0.60):
    q = query.lower()

    # Quick rule reinforcement
    if any(word in q for word in ["hack", "fraud", "otp", "cvv", "password"]):
        return "I cannot assist with security breaches or sharing sensitive financial information."

    q_emb = sim_model.encode(query, convert_to_tensor=True)

    for category, embeddings in risk_embeddings.items():
        scores = util.cos_sim(q_emb, embeddings)[0]
        max_score = float(torch.max(scores))

        if max_score >= threshold:
            if category == "SECURITY":
                return "I cannot assist with misuse of financial systems."
            if category == "SENSITIVE":
                return "Please do not share OTP, CVV, or passwords."
            if category == "PRIVACY":
                return "Customer information must only be shared with authorized personnel."

    return None

In [11]:
# =========================================
# ADVANCED SAFETY LAYER (FINAL FIXED)
# =========================================

from sentence_transformers import util
import torch

risk_intents = {
    "SECURITY": [
        "How to hack a bank account?",
        "How to bypass banking system?",
        "How to commit financial fraud?"
    ],
    "SENSITIVE": [
        "Here is my OTP",
        "My CVV number is",
        "Sharing my password"
    ],
    "PRIVACY": [
        "Can I share customer data with third party?",
        "Disclosing confidential customer information"
    ]
}

# Precompute embeddings once
risk_embeddings = {
    category: sim_model.encode(sentences, convert_to_tensor=True)
    for category, sentences in risk_intents.items()
}


def advanced_safety_layer(query, threshold=0.60):
    q = query.lower()

    # Quick keyword reinforcement
    if any(word in q for word in ["hack", "fraud", "otp", "cvv", "password"]):
        return "I cannot assist with security breaches or sharing sensitive financial information."

    # Semantic detection
    q_emb = sim_model.encode(query, convert_to_tensor=True)

    for category, embeddings in risk_embeddings.items():
        scores = util.cos_sim(q_emb, embeddings)[0]
        max_score = float(torch.max(scores))

        if max_score >= threshold:
            if category == "SECURITY":
                return "I cannot assist with misuse of financial systems."
            if category == "SENSITIVE":
                return "Please do not share OTP, CVV, or passwords."
            if category == "PRIVACY":
                return "Customer information must only be shared with authorized personnel."

    return None
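
The keyword rule above uses substring matching, so any word that merely contains a keyword is blocked too: `"otp"` occurs inside "footprint", for example. A possible refinement, sketched here with the standard `re` module, is to match whole words only. `RISK_PATTERN` and `contains_risk_keyword` are hypothetical names, and the optional trailing "s" is an assumption to keep simple plurals covered.

```python
import re

# Same keyword list as the notebook's rule check
RISK_KEYWORDS = ["hack", "fraud", "otp", "cvv", "password"]

# \b anchors each keyword to word boundaries; the optional "s" also catches simple plurals
RISK_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, RISK_KEYWORDS)) + r")s?\b",
    re.IGNORECASE,
)

def contains_risk_keyword(query):
    """True if the query contains a risk keyword as a whole word."""
    return RISK_PATTERN.search(query) is not None

flagged = contains_risk_keyword("Here is my OTP: 123456")                    # True
benign = contains_risk_keyword("What is the carbon footprint of my loan?")   # False
```

The trade-off is that inflected forms such as "hacking" no longer match the rule, so they would need to be listed explicitly or left to the semantic check that follows it.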


## Final Assistant Pipeline

In [12]:
def ask(query):
    start = time.time()

    safe = advanced_safety_layer(query)
    if safe:
        return {"tier": "SAFETY", "response": safe, "latency": round(time.time() - start, 3)}

    curated_resp, score = similarity_match(query)
    if curated_resp:
        return {
            "tier": "CURATED_DATASET",
            "confidence": round(score, 3),
            "response": curated_resp,
            "latency": round(time.time() - start, 3)
        }

    context = rag_retrieve(query)
    return {"tier": "RAG", "response": context, "latency": round(time.time() - start, 3)}


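The `ask()` definition below supersedes the previous cell: it follows the same three tiers but wraps each answer in professional wording and a referral to official channels.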

In [13]:
# =========================================
# FINAL ASSISTANT PIPELINE (PROFESSIONAL)
# =========================================

def ask(query):
    start = time.time()

    # -------------------------
    # 1️⃣ SAFETY FIRST
    # -------------------------
    safe = advanced_safety_layer(query)

    if safe:
        return {
            "tier": "SAFETY",
            "response": safe,
            "latency": round(time.time() - start, 3)
        }

    # -------------------------
    # 2️⃣ CURATED DATASET PRIORITY
    # -------------------------
    curated_resp, score = similarity_match(query)

    if curated_resp:
        professional_response = (
            f"{curated_resp}\n\n"
            "For exact calculations, eligibility confirmation, or policy details, "
            "please refer to your loan agreement or contact official customer support."
        )

        return {
            "tier": "CURATED_DATASET",
            "confidence": round(score, 3),
            "response": professional_response,
            "latency": round(time.time() - start, 3)
        }

    # -------------------------
    # 3️⃣ RAG FALLBACK
    # -------------------------
    context = rag_retrieve(query)

    professional_response = (
        "Based on available policy information:\n\n"
        f"{context}\n\n"
        "For precise financial calculations or account-specific decisions, "
        "please consult official servicing channels."
    )

    return {
        "tier": "RAG",
        "response": professional_response,
        "latency": round(time.time() - start, 3)
    }
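
The priority order (safety, then curated dataset, then RAG) can be exercised without loading any model if the three layers are passed in as functions. The sketch below is one way to do that, not the notebook's implementation: `route` and the `fake_*` stand-ins are hypothetical. It also times the call with `time.perf_counter()`, which is generally better suited to short intervals than `time.time()`.

```python
import time

def route(query, safety_check, curated_lookup, rag_lookup):
    """Try each tier in priority order and report which one answered."""
    start = time.perf_counter()

    def done(tier, response, **extra):
        # Build the result dict and stamp it with the elapsed time
        return {"tier": tier, "response": response, **extra,
                "latency": round(time.perf_counter() - start, 3)}

    blocked = safety_check(query)
    if blocked:
        return done("SAFETY", blocked)

    answer, score = curated_lookup(query)
    if answer:
        return done("CURATED_DATASET", answer, confidence=round(score, 3))

    return done("RAG", rag_lookup(query))

# Hypothetical stand-ins for the real layers
def fake_safety(q):
    return "Please do not share OTP, CVV, or passwords." if "otp" in q.lower() else None

def fake_curated(q):
    if "emi" in q.lower():
        return "EMI depends on principal, interest rate, and tenure.", 0.91
    return None, 0.1

def fake_rag(q):
    return "Prepayment charges depend on product terms."

results = [route(q, fake_safety, fake_curated, fake_rag)
           for q in ["my otp is 1234", "EMI calculation", "foreclosure fees"]]
tiers = [r["tier"] for r in results]  # ["SAFETY", "CURATED_DATASET", "RAG"]
```

Injecting the layers this way keeps the routing logic testable on its own; the real `advanced_safety_layer`, `similarity_match`, and `rag_retrieve` can be passed in unchanged.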


## Interactive Mode

In [14]:
def start_chat():
    print("🤖 BFSI AI Assistant (Local Model)")
    print("Type 'exit' to stop.\n")

    while True:
        query = input("You: ").strip()

        if query.lower() == "exit":
            print("\nAssistant: Thank you. Stay safe! 👋")
            break

        result = ask(query)

        print("\n--- Response ---")
        print("Tier:", result["tier"])
        if "confidence" in result:
            print("Confidence:", result["confidence"])
        print("Latency:", result["latency"], "sec")
        print("Assistant:", result["response"])
        print("-" * 60)

start_chat()


🤖 BFSI AI Assistant (Local Model)
Type 'exit' to stop.

You: If a borrower requests early foreclosure after 10 EMIs under a reducing balance method, how is the outstanding principal recalculated and what charges may apply?

--- Response ---
Tier: RAG
Latency: 0.079 sec
Assistant: Based on available policy information:

EMI is calculated using reducing balance method.
Late payment charges are mentioned in sanction letter.

For precise financial calculations or account-specific decisions, please consult official servicing channels.
------------------------------------------------------------
You: stop

--- Response ---
Tier: RAG
Latency: 0.039 sec
Assistant: Based on available policy information:

Never share OTP or CVV with anyone.
Dispute resolution follows RBI-compliant timelines.

For precise financial calculations or account-specific decisions, please consult official servicing channels.
------------------------------------------------------------
You: exit 

Assistant: Thank you. Stay safe! 👋
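
In the transcript above, `stop` is treated as a question and answered from the RAG tier, because only the literal word `exit` ends the session. A small variation, sketched below, accepts several exit words and skips blank input. `EXIT_COMMANDS`, `is_exit_command`, and `chat_loop` are hypothetical names, and the word list is an assumption; the `read` and `write` parameters exist only so the loop can be driven by a script.

```python
EXIT_COMMANDS = {"exit", "quit", "stop", "bye"}  # assumed set; adjust as needed

def is_exit_command(text):
    """True if the text, ignoring case and surrounding whitespace, is an exit word."""
    return text.strip().lower() in EXIT_COMMANDS

def chat_loop(ask_fn, read=input, write=print):
    """Run the chat until an exit command; blank input is skipped rather than sent to ask_fn."""
    while True:
        query = read("You: ").strip()
        if not query:
            continue
        if is_exit_command(query):
            write("Assistant: Thank you. Stay safe! 👋")
            break
        write("Assistant: " + ask_fn(query)["response"])

# Scripted run with a hypothetical echo assistant instead of the real ask()
scripted = iter(["", "EMI calculation", "stop"])
transcript = []
chat_loop(lambda q: {"response": "echo: " + q},
          read=lambda prompt: next(scripted),
          write=transcript.append)
```

In the notebook itself, `chat_loop(ask)` would run the same loop against the real pipeline with keyboard input.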


Loan approval depends on income, repayment history, and credit score.
EMI is calculated using standard amortization formula.
Prepayment charges may apply depending on loan agreement.
Interest rates are determined based on risk category.
All calculations are subject to bank policy guidelines.
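
For reference, the standard amortization (reducing balance) formula gives the instalment as EMI = P·r·(1+r)^n / ((1+r)^n − 1), where P is the principal, r the monthly interest rate, and n the number of instalments. The balance after k payments is P·(1+r)^k − EMI·((1+r)^k − 1)/r. The sketch below applies both; the function names are hypothetical and the figures are illustrative only, since a lender's rounding rules and foreclosure charges are not modelled.

```python
def emi(principal, annual_rate_pct, months):
    """Monthly instalment under the standard reducing-balance (amortization) formula."""
    r = annual_rate_pct / 12 / 100  # monthly interest rate as a fraction
    if r == 0:
        return principal / months
    growth = (1 + r) ** months
    return principal * r * growth / (growth - 1)

def outstanding_principal(principal, annual_rate_pct, months, paid):
    """Principal still owed after `paid` instalments."""
    r = annual_rate_pct / 12 / 100
    if r == 0:
        return principal * (1 - paid / months)
    growth = (1 + r) ** paid
    return principal * growth - emi(principal, annual_rate_pct, months) * (growth - 1) / r

# Illustrative figures only: 100,000 borrowed at 12% a year over 12 months
monthly = emi(100_000, 12, 12)                                  # about 8884.88
left_after_10 = outstanding_principal(100_000, 12, 12, 10)      # balance due on foreclosure after 10 EMIs
left_after_12 = outstanding_principal(100_000, 12, 12, 12)      # effectively zero once the loan is repaid
```

Any prepayment or foreclosure charge would be applied on top of `left_after_10` according to the loan agreement.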


In [None]:
import pandas as pd
df = pd.read_csv('loan_eligibility_dataset.csv')
df.head()
