# 🚀 EmpathicGateway: Interactive Backend Demo
### _Run the AI Brain of the system right here!_

This notebook allows you to execute the core logic of the EmpathicGateway backend. You will:
1.  **Initialize** the AI models.
2.  **Train** a fresh intent classifier on synthetic data.
3.  **Run Inference** on your own text to see priority assignment and PII masking in action.

---
### 🛠️ Step 1: Install Dependencies
Run this cell to ensure you have the required libraries.



In [2]:
!pip install sentence-transformers scikit-learn pandas joblib transformers numpy



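### 🧠 Step 2: Define The AI Architecture
Here we define the `BertEmbedder` class, a scikit-learn transformer that wraps a Sentence-Transformers model (`all-MiniLM-L6-v2` by default) so that its sentence embeddings can feed a lightweight Logistic Regression classifier. We also define `map_priority`, which maps each predicted intent to a priority level.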



In [3]:
import pandas as pd
import numpy as np
import re
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.base import BaseEstimator, TransformerMixin
from sentence_transformers import SentenceTransformer


# This is the exact class from backend/train_model.py
class BertEmbedder(BaseEstimator, TransformerMixin):
    def __init__(self, model_name="all-MiniLM-L6-v2"):
        self.model_name = model_name
        self.model = None

    def fit(self, X, y=None):
        print(f"📥 Loading BERT ({self.model_name})...")
        self.model = SentenceTransformer(self.model_name)
        return self

    def transform(self, X):
        if self.model is None:
            self.model = SentenceTransformer(self.model_name)

        # Handle inputs
        if hasattr(X, "tolist"):
            texts = X.tolist()
        else:
            texts = X

        return self.model.encode(texts, show_progress_bar=False)


def map_priority(intent):
    if intent in ["payment_issue", "fraud_report", "stolen_card"]:
        return 1  # CRITICAL
    elif intent in ["track_order", "cancel_order"]:
        return 2  # HIGH
    else:
        return 3  # NORMAL


print("✅ Architecture Defined!")

✅ Architecture Defined!
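
The if/elif chain in `map_priority` can also be written as a lookup table, which keeps the intent-to-priority rules in one place. The sketch below is an illustrative alternative, not the code from `backend/train_model.py`; the names `PRIORITY_BY_INTENT` and `map_priority_table` are hypothetical.

```python
# Hypothetical table-driven variant of map_priority (illustration only).
CRITICAL, HIGH, NORMAL = 1, 2, 3

PRIORITY_BY_INTENT = {
    "payment_issue": CRITICAL,
    "fraud_report": CRITICAL,
    "stolen_card": CRITICAL,
    "track_order": HIGH,
    "cancel_order": HIGH,
}


def map_priority_table(intent):
    """Return the priority for an intent; unknown intents fall back to NORMAL."""
    return PRIORITY_BY_INTENT.get(intent, NORMAL)


for intent in ["fraud_report", "cancel_order", "greeting"]:
    print(intent, "->", map_priority_table(intent))
```

Any intent that is missing from the table falls through to NORMAL, which matches the `else` branch of the original function.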


### 🎓 Step 3: Train the Model (Live!)
We will use the **Synthetic Dataset** strategy directly in this notebook. Notice how we explicitly teach the model about "Just Browsing" vs "Fraud".



In [4]:
# 1. Create Training Data
data = [
    # --- CRITICAL (Priority 1) ---
    {"text": "my wallet was stolen", "intent": "fraud_report"},
    {"text": "someone used my credit card", "intent": "fraud_report"},
    {"text": "unauthorized charge on my account", "intent": "payment_issue"},
    {"text": "i need to block my card immediately", "intent": "stolen_card"},
    # --- HIGH (Priority 2) ---
    {"text": "where is my order", "intent": "track_order"},
    {"text": "cancel my order please", "intent": "cancel_order"},
    {"text": "change my shipping address", "intent": "track_order"},
    # --- NORMAL (Priority 3) ---
    {"text": "hello", "intent": "greeting"},
    {"text": "just browsing thanks", "intent": "chit_chat"},
    {"text": "i am just looking around", "intent": "chit_chat"},
    {"text": "thank you for the help", "intent": "chit_chat"},
    {"text": "do you have this in blue", "intent": "product_question"},
]

# Repeat the 12 examples 5 times to mimic a larger training volume
# (the duplicates add rows, not new phrasings)
df = pd.DataFrame(data * 5)
df["priority"] = df["intent"].apply(map_priority)

print(f"📚 Dataset Created: {len(df)} samples")
print(df.head())

# 2. Build Pipeline
pipeline = Pipeline(
    [
        ("embedding", BertEmbedder(model_name="all-MiniLM-L6-v2")),
        ("classifier", LogisticRegression(C=1.0, max_iter=500)),
    ]
)

# 3. Train
print("\n⚙️ Training Model... (This uses CPU, might take 10-20s)")
pipeline.fit(df["text"], df["intent"])
print("✅ Model Trained Successfully!")

📚 Dataset Created: 60 samples
                                  text         intent  priority
0                 my wallet was stolen   fraud_report         1
1          someone used my credit card   fraud_report         1
2    unauthorized charge on my account  payment_issue         1
3  i need to block my card immediately    stolen_card         1
4                    where is my order    track_order         2

⚙️ Training Model... (This uses CPU, might take 10-20s)
📥 Loading BERT (all-MiniLM-L6-v2)...
✅ Model Trained Successfully!
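
With only 12 distinct phrases in the training set, it can be worth checking how the labels are distributed before reading much into the classifier's output. The sketch below recounts the intents from the `data` list above using only the standard library; the variable names `base_intents` and `counts` are hypothetical.

```python
from collections import Counter

# Intent labels of the 12 base examples above, in the same order.
base_intents = [
    "fraud_report", "fraud_report", "payment_issue", "stolen_card",
    "track_order", "cancel_order", "track_order",
    "greeting", "chit_chat", "chit_chat", "chit_chat", "product_question",
]

# Mirror `data * 5` from the training cell.
counts = Counter(base_intents * 5)

for intent, n in counts.most_common():
    print(f"{intent:<18} {n}")
print("total:", sum(counts.values()))
```

Intents backed by a single base phrase (for example `payment_issue`) are still seen in only one wording, no matter how many times the data is repeated.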




### 🛡️ Step 4: PII Masking Logic
The backend creates a "Safe Text" version of every request. Here is the logic:



In [5]:
def mask_pii(text):
    safe_text = text
    detected_types = []

    # 1. Email Regex
    email_pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
    if re.search(email_pattern, safe_text):
        safe_text = re.sub(email_pattern, "[EMAIL]", safe_text)
        detected_types.append("EMAIL")

    # 2. Credit Card Regex (13 to 16 digits, optionally separated by spaces or hyphens)
    cc_pattern = r"(?:\d[ -]*?){13,16}"
    # Guard against false positives: require at least 13 digits in each match
    matches = re.findall(cc_pattern, safe_text)
    for m in matches:
        if len(re.sub(r"\D", "", m)) >= 13:
            safe_text = safe_text.replace(m, "[CREDIT_CARD]")
            if "CREDIT_CARD" not in detected_types:
                detected_types.append("CREDIT_CARD")

    return safe_text, detected_types


print("✅ PII System Ready.")

✅ PII System Ready.
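
The digit-count guard in `mask_pii` still flags any run of 13 to 16 digits, such as a long order number. One common way to tighten this is a Luhn checksum, which payment card numbers are designed to satisfy. The sketch below is an assumption about how such a check could be added, not part of the backend; `CARD_RE`, `luhn_valid`, and `mask_cards` are hypothetical names, and `4111 1111 1111 1111` is a widely used test card number.

```python
import re

# Candidate card numbers: 13-16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_cards(text):
    """Replace only Luhn-valid digit runs with [CREDIT_CARD]."""
    def _sub(match):
        digits = re.sub(r"\D", "", match.group())
        return "[CREDIT_CARD]" if luhn_valid(digits) else match.group()

    return CARD_RE.sub(_sub, text)


print(mask_cards("card 4111 1111 1111 1111, order 1234567890123456"))
```

The trade-off is that a mistyped card number fails the checksum and is left unmasked, so a stricter deployment might mask every long digit run regardless.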


### 🎮 Step 5: Interactive Demo
**Try it yourself!** Change the `user_input` variable below and run the cell.



In [6]:
# --- INPUT YOUR TEXT HERE ---
user_input = "I lost my wallet and my email is murat@test.com"
# ----------------------------

# 1. Safety First (PII)
safe_input, pii = mask_pii(user_input)

# 2. Model Prediction
prediction = pipeline.predict([safe_input])[0]
probs = pipeline.predict_proba([safe_input])[0]
confidence = max(probs)
priority = map_priority(prediction)

# 3. Visualization
priority_map = {1: "🔴 CRITICAL", 2: "🟠 HIGH", 3: "🟢 NORMAL"}
priority_label = priority_map.get(priority, "UNKNOWN")

print(f"📝 Original: '{user_input}'")
print(f"🛡️ Masked:   '{safe_input}'")
print("-" * 30)
print(f"🧠 Intent:   {prediction.upper()}")
print(f"🚦 Priority: {priority_label}")
print(f"📊 Conf:     {confidence:.1%}")

if pii:
    print(f"⚠️ PII Detected: {pii}")

📝 Original: 'I lost my wallet and my email is murat@test.com'
🛡️ Masked:   'I lost my wallet and my email is [EMAIL]'
------------------------------
🧠 Intent:   FRAUD_REPORT
🚦 Priority: 🔴 CRITICAL
📊 Conf:     35.4%
⚠️ PII Detected: ['EMAIL']
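
A confidence of 35.4% means the winning intent is far from certain, so it can help to look at the runner-up classes as well. In scikit-learn, the columns returned by `predict_proba` follow the order of the classifier's `classes_` attribute, so labels and probabilities can be paired up. The sketch below uses hypothetical labels and probabilities in place of the live pipeline so that it runs on its own; `top_intents` is a hypothetical helper.

```python
def top_intents(classes, probs, k=3):
    """Pair each class label with its probability and return the k most likely."""
    ranked = sorted(zip(classes, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical values for illustration. In the notebook you could pass
# pipeline.named_steps["classifier"].classes_ and
# pipeline.predict_proba([safe_input])[0] instead.
classes = ["chit_chat", "fraud_report", "stolen_card", "track_order"]
probs = [0.10, 0.40, 0.30, 0.20]

for label, p in top_intents(classes, probs):
    print(f"{label:<15} {p:.1%}")
```

If the top two intents map to different priorities and their probabilities are close, routing the request at the higher of the two priorities is one cautious option.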




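### 🧪 Validation: The "Just Browsing" Test
Let's check the model on a few phrasings it did not see during training. To exercise the 'Just browsing' edge case itself, add a phrase such as "I'm just browsing" to `test_cases`.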



In [8]:
test_cases = [
    "my card is stolen!",
    "I need a refund immediately",
    "Hello there",
    "Where is my stuff?",
]

print(f"{'INPUT':<30} | {'INTENT':<15} | {'PRIORITY'}")
print("-" * 60)

for text in test_cases:
    pred = pipeline.predict([text])[0]
    prio = map_priority(pred)
    label = {1: "CRITICAL", 2: "HIGH", 3: "NORMAL"}[prio]
    print(f"{text:<30} | {pred:<15} | {label}")

INPUT                          | INTENT          | PRIORITY
------------------------------------------------------------
my card is stolen!             | fraud_report    | CRITICAL
I need a refund immediately    | track_order     | HIGH
Hello there                    | greeting        | NORMAL
Where is my stuff?             | track_order     | HIGH
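
The table above is checked by eye. A small harness can turn such expectations into assertions so that a retrained model is checked the same way every time. The sketch below is one possible shape for that harness; `check_priorities` and `fake_predict` are hypothetical, and `fake_predict` is a keyword stub standing in for the trained model so that the example runs without it.

```python
def map_priority(intent):
    # Same mapping as defined earlier in the notebook.
    if intent in ["payment_issue", "fraud_report", "stolen_card"]:
        return 1
    elif intent in ["track_order", "cancel_order"]:
        return 2
    return 3


def check_priorities(predict, expectations):
    """Return (text, expected, actual) for every case whose priority is wrong."""
    failures = []
    for text, expected in expectations.items():
        actual = map_priority(predict(text))
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


def fake_predict(text):
    # Stand-in for the trained model: a crude keyword rule, for illustration only.
    lowered = text.lower()
    if "stolen" in lowered:
        return "stolen_card"
    if "order" in lowered:
        return "track_order"
    return "chit_chat"


expectations = {
    "my card is stolen!": 1,
    "where is my order": 2,
    "just browsing thanks": 3,
}

failures = check_priorities(fake_predict, expectations)
print(failures)
```

In the notebook, `lambda t: pipeline.predict([t])[0]` could be passed in place of `fake_predict`; an empty list means every case landed on its expected priority.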


