# LLM Deployment for Shoplite (Colab-ready)
This notebook is **self-contained**. It embeds the Shoplite knowledge base, builds a FAISS index over `sentence-transformers` embeddings, and loads an open-source LLM (it tries Llama 3.1 8B first, using quantized loading, and falls back to a second model if that fails). It then exposes a Flask API (`/chat`, `/ping`, `/health`) and uses `pyngrok` to create a public tunnel.

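**IMPORTANT**: Instructors must supply their ngrok authtoken and (optionally) a Hugging Face token via `input()` prompts at runtime. **Do NOT** hardcode tokens. Llama 3.1 is a gated model, so without a token that has been granted access, loading it is expected to fail and the notebook moves on to the fallback model.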

**Contents**
- Knowledge base (15 documents)
- Structured prompts
- Ground-truth Q&A (20 items)
- Evaluation checklist (30 tests)
- RAG pipeline: embeddings (SentenceTransformers) + FAISS
- Model loading with quantization and safe fallback
- Flask API exposed via ngrok (secure token input)
- Quick tests at the end
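
The token rule above can be sketched as a small helper. This is a minimal sketch, not the notebook's own code: `read_secret` and its `reader` parameter are hypothetical names, and `getpass.getpass` is swapped in for `input()` so that the token is not echoed into the cell output.

```python
import getpass
from typing import Callable, Optional


def read_secret(prompt: str, reader: Callable[[str], str] = getpass.getpass) -> Optional[str]:
    """Prompt for a secret at runtime and return it stripped, or None if left blank.

    `read_secret` is a hypothetical helper, not part of the notebook.
    `reader` is injectable so the function can be exercised without a terminal.
    """
    value = reader(prompt).strip()
    return value or None


# Example with stand-in readers (a real run would use the getpass default).
token = read_secret("ngrok authtoken: ", reader=lambda _prompt: "  example-token  ")
skipped = read_secret("Hugging Face token (optional): ", reader=lambda _prompt: "")
print(token, skipped)
```

Returning `None` for a blank entry keeps the "press Enter to skip" behaviour easy to test with a plain `if token:` check.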


In [None]:
# Cell 1: Install dependencies (run in a Colab runtime with a GPU)
# Note: These installs may take several minutes.
!pip install --quiet --upgrade pip
!pip install --quiet transformers accelerate bitsandbytes safetensors sentence-transformers faiss-cpu flask pyngrok uvicorn gunicorn huggingface_hub -U
print('Dependencies installed (or already present).')
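
After the install cell finishes, a quick import check can confirm the runtime is ready. The sketch below is illustrative only (`missing_modules` is a hypothetical helper). Note that some import names differ from the pip names: `faiss-cpu` is imported as `faiss` and `sentence-transformers` as `sentence_transformers`.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names (not pip names) for the main packages installed in Cell 1.
REQUIRED = ["transformers", "accelerate", "bitsandbytes", "sentence_transformers",
            "faiss", "flask", "pyngrok", "huggingface_hub"]

print("Missing modules:", missing_modules(REQUIRED) or "none")
```

If the list is not empty, re-running the install cell (and restarting the runtime if Colab asks for it) is usually the first thing to try.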

In [None]:
# Cell 2: Imports and utilities
import os, time, json, threading, logging
from typing import List, Dict, Any
from flask import Flask, request, jsonify
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from pyngrok import ngrok
from huggingface_hub import login as hf_login

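logging.basicConfig(level=logging.INFO)


def safe_json(obj):
    """Return a JSON-safe copy of obj: NumPy arrays become lists, other unknown types become strings."""
    try:
        return json.loads(json.dumps(obj, default=lambda o: o.tolist() if isinstance(o, np.ndarray) else str(o)))
    except Exception:
        # Last resort (e.g. circular references): fall back to the plain string form
        return str(obj)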

In [None]:
# Cell 3: Knowledge base (15 documents)
KNOWLEDGE_BASE = [
{"id": "doc1", "title": "Shoplite User Registration and Account Management", "content": "To create a Shoplite account, users must visit the registration page and provide a valid email address, password, and basic profile information. Email verification is required within 24 hours. Users can choose between buyer accounts (free) or seller accounts (requires business verification and tax information). Account management features include updating personal information, changing passwords, setting security questions, managing notification preferences, and deactivating accounts (which requires email confirmation and may affect active orders or subscriptions). Buyers can browse products, make purchases, track orders, and leave reviews. Sellers have access to a seller dashboard with inventory management, order processing, and analytics. Security measures such as two-factor authentication are recommended; password recovery works via email and phone verification."},
{"id": "doc2", "title": "Shoplite Product Search and Filtering Features", "content": "Shoplite provides a powerful search engine that supports keyword queries, category selection, brand filters, price range, rating, availability, seller location, shipping speed, promotions, and eco-friendly options. Features include autocomplete suggestions, spelling correction, save searches and alerts, faceted navigation to combine filters, real-time indexing for large catalogs, and a mobile-responsive interface."},
{"id": "doc3", "title": "Shoplite Shopping Cart and Checkout Process", "content": "The shopping cart allows users to add multiple items from different sellers, review quantities, apply promo codes or gift cards, and save items for later. Carts are preserved across sessions for logged-in users. Checkout steps include selecting shipping (standard, expedited, same-day), choosing a payment method (credit/debit cards, digital wallets, cash-on-delivery), and order confirmation. Shoplite integrates PCI-DSS compliant gateways, real-time stock updates, order confirmation emails with tracking, seller notifications for new orders, and an integrated returns and refunds system."},
{"id": "doc4", "title": "Shoplite Payment Methods and Security", "content": "Accepted payment methods include credit/debit cards, PayPal, Apple Pay, Google Pay, and local solutions. Security measures include SSL/TLS encryption, PCI-DSS compliance, fraud detection systems, two-factor authentication, and encryption of sensitive information both in transit and at rest (AES-256). The platform supports digital wallet integration and a structured dispute/chargeback process. Seller payouts occur after order confirmation with transparent handling of fees."},
{"id": "doc5", "title": "Shoplite Order Tracking and Delivery", "content": "Shoplite provides real-time tracking with confirmation emails and a unique tracking number. Order stages typically progress: confirmed \u2192 processing \u2192 shipped \u2192 in transit \u2192 delivered. Delivery modification requests may be possible with seller approval. International shipments display customs/import duties. The logistics system provides estimated arrivals, delay notifications, and support assistance for lost or delayed packages."},
{"id": "doc6", "title": "Shoplite Return and Refund Policies", "content": "Shoplite typically allows returns within a 30-day period from delivery, though exceptions exist for digital or personalized goods. The return process requires selecting the order/item, specifying a reason, and, if eligible, using a prepaid label. Refunds are processed in 5\u20137 business days to the original payment method. Sellers must comply with return policies to maintain good ratings; dispute resolution mechanisms are available."},
{"id": "doc7", "title": "Shoplite Product Reviews and Ratings", "content": "Buyers can rate products on a five-star scale and leave comments. Reviews are moderated for policy compliance and sellers can respond to reviews. Ratings affect search ranking; verified purchase badges increase trust. Aggregate ratings and review analytics are available to help sellers understand customer feedback."},
{"id": "doc8", "title": "Shoplite Seller Account Setup and Management", "content": "To become a seller, register via the seller registration page and provide required business documents and tax verification. The seller dashboard includes inventory management, order processing, sales analytics, and product listing tools (single or bulk via CSV/API). Sellers configure branding, policies, shipping, and returns. Notifications for new orders, low stock, and buyer inquiries are provided. Pricing, promotions, fee management, and third-party integrations are supported. Performance metrics are tracked and communicated."},
{"id": "doc9", "title": "Shoplite Inventory Management for Sellers", "content": "Inventory tools allow sellers to track stock levels, set reorder thresholds, receive low-stock alerts, and perform bulk imports. Product variants like size and color are supported. Inventory reports reveal trends and seasonal demand, and sellers can manage multiple warehouses and shipping origins."},
{"id": "doc10", "title": "Shoplite Commission and Fee Structure", "content": "Shoplite charges commission fees that vary by product category. Additional fees may apply for premium listings, promotions, or special services. Fees are transparently shown in the seller dashboard. Payouts are made after commission deductions on a scheduled basis (weekly or bi-weekly). Transaction reports and pricing guidance are provided."},
{"id": "doc11", "title": "Shoplite Customer Support Procedures", "content": "Support channels include live chat, email, phone, and a 24/7 AI chatbot. Tickets are categorized by type (orders, payments, returns, technical, account management) and tracked with unique ticket IDs. Backend integrations surface order and payment details to support agents. There is a dedicated seller support channel and a help center with guides, FAQs, and videos to assist users."},
{"id": "doc12", "title": "Shoplite Mobile App Features", "content": "The Shoplite mobile app (iOS & Android) supports browsing, filtering, adding to cart, and checkout. It offers push notifications for promotions and order updates, barcode scanning, QR code payments, mobile wallet integration, biometric logins (fingerprint/Face ID), seller management on the go, and offline caching of previously viewed content for intermittent connectivity."},
{"id": "doc13", "title": "Shoplite API Documentation for Developers", "content": "Shoplite offers RESTful API endpoints for product catalog, orders, accounts, and inventory. Authentication uses OAuth 2.0. Rate limits apply with higher tiers for verified partners. Detailed docs show request/response formats, parameters, and error codes. Webhooks enable real-time event notifications and a sandbox environment is available for testing. APIs are versioned for backward compatibility."},
{"id": "doc14", "title": "Shoplite Security and Privacy Policies", "content": "Data protection uses TLS for transport and AES-256 encryption at rest. Access is limited to authorized personnel and audit logs are maintained. Shoplite complies with GDPR and CCPA and provides users controls over their personal data (export/delete). Security monitoring and incident response processes are in place; policy changes are communicated to users."},
{"id": "doc15", "title": "Shoplite Promotional Codes and Discounts", "content": "Sellers can create promotional codes (percentage, fixed amount, conditional), schedule start/end dates, set usage limits and minimum purchase thresholds, and target special events. Codes are validated at checkout automatically. Analytics track redemptions and revenue impact; users receive notifications for active promotions."},
]
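
Because later cells look documents up by position and cite them by title, it can help to sanity-check the list's shape before indexing. A minimal sketch, assuming each entry must carry non-empty `id`, `title`, and `content` strings with unique ids; `validate_kb` is a hypothetical helper, and the two sample entries are abbreviated stand-ins for the real documents.

```python
from typing import Dict, List

REQUIRED_KEYS = ("id", "title", "content")


def validate_kb(docs: List[Dict[str, str]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the KB looks well-formed."""
    problems = []
    seen_ids = set()
    for pos, doc in enumerate(docs):
        for key in REQUIRED_KEYS:
            value = doc.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {pos}: missing or empty '{key}'")
        doc_id = doc.get("id")
        if doc_id is not None and doc_id in seen_ids:
            problems.append(f"entry {pos}: duplicate id '{doc_id}'")
        seen_ids.add(doc_id)
    return problems


# Abbreviated sample with two deliberate mistakes: an empty title and a repeated id.
sample_kb = [
    {"id": "doc1", "title": "Shoplite User Registration and Account Management", "content": "..."},
    {"id": "doc1", "title": "", "content": "..."},
]
print(validate_kb(sample_kb))
```

In the notebook itself the call would be `validate_kb(KNOWLEDGE_BASE)`, which should return an empty list for the 15 documents above.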

In [None]:
# Cell 4: Prompts embedded as a Python dict (converted from assistant-prompts.yml)
PROMPTS = {
    "version": "1.0",
    "created": "2025-09-26",
    "author": "Joseph Chamoun",
    "base_retrieval_prompt": {
        "role": "You are a knowledgeable Shoplite customer service assistant.",
        "goal": "Provide accurate answers using only the provided Shoplite documentation.",
        "context_guidelines": ["Use only information from the provided document snippets.", "Cite specific documents when possible."],
        "response_format": "Answer: [Your response based on context]\nSources: [List document titles referenced]"
    },
    "multi_doc_prompt": {
        "role": "You are an expert assistant trained to synthesize information from multiple Shoplite documents.",
        "goal": "Combine relevant information from multiple sources to answer complex customer queries.",
        "context_guidelines": ["Retrieve and integrate information from all relevant documents.", "Provide step-by-step guidance if needed.", "Avoid adding information not present in the documents."],
        "response_format": "Answer: [Synthesized answer]\nSources: [List all documents referenced]"
    },
    "clarification_prompt": {
        "role": "You are a helpful Shoplite assistant that clarifies ambiguous or incomplete queries.",
        "goal": "Ask for clarification politely when the user query is unclear or insufficient.",
        "context_guidelines": ["Do not guess answers if the query is unclear.", "Suggest specific questions or information needed."],
        "response_format": "Response: [Clarifying question or instruction to user]"
    },
    "refusal_prompt": {
        "role": "You are a responsible Shoplite assistant that only answers questions when relevant context is available.",
        "goal": "Politely refuse to answer if the requested information is not found in the knowledge base.",
        "context_guidelines": ["Do not hallucinate information.", "Provide guidance on where the user may find help."],
        "response_format": "Response: I’m sorry, I could not find relevant information in the documentation. Please check Shoplite support or provide more details."
    }
}
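
One way these prompt definitions might be combined with retrieved snippets is sketched below. This is an illustrative assumption, not the notebook's actual prompt builder: `build_prompt` and the layout of the final string are hypothetical, and the sample snippet is abbreviated.

```python
from typing import Dict, List

# Copy of PROMPTS["base_retrieval_prompt"] so the sketch runs on its own.
base_prompt = {
    "role": "You are a knowledgeable Shoplite customer service assistant.",
    "goal": "Provide accurate answers using only the provided Shoplite documentation.",
    "context_guidelines": [
        "Use only information from the provided document snippets.",
        "Cite specific documents when possible.",
    ],
    "response_format": "Answer: [Your response based on context]\nSources: [List document titles referenced]",
}


def build_prompt(spec: Dict, question: str, snippets: List[Dict[str, str]]) -> str:
    """Render a prompt spec, retrieved snippets, and the user question into one string."""
    guidelines = "\n".join(f"- {g}" for g in spec["context_guidelines"])
    context = "\n\n".join(f"[{s['title']}]\n{s['content']}" for s in snippets)
    return (
        f"{spec['role']}\n{spec['goal']}\n\n"
        f"Guidelines:\n{guidelines}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        f"Respond in this format:\n{spec['response_format']}"
    )


snippets = [{"title": "Shoplite Return and Refund Policies",
             "content": "Shoplite typically allows returns within a 30-day period from delivery..."}]
prompt_text = build_prompt(base_prompt, "What is the return window?", snippets)
print(prompt_text)
```

Putting each snippet under its document title gives the model something concrete to cite in the `Sources:` line of the response format.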


In [None]:
# Cell 5: Ground-truth Q&A (20 items)
GROUND_TRUTH_QA = [
    {
        "q": "How do I create a seller account on Shoplite?",
        "context": "Document 8: Seller Account Setup and Management",
        "answer": "To create a seller account, visit the Shoplite seller registration page, provide business information including tax ID, and complete the verification process which takes 2-3 business days.",
        "required": [
            "seller registration",
            "business verification",
            "2-3 business days"
        ],
        "forbidden": [
            "instant approval",
            "no verification required",
            "personal accounts"
        ]
    },
    {
        "q": "What are Shoplite's return policies and how do I track my order status?",
        "context": "Document 6: Return and Refund Policies + Document 5: Order Tracking and Delivery",
        "answer": "Shoplite allows returns within a 30-day window for most products, requiring a return authorization code. Orders can be tracked via the user account or mobile app, showing current shipping status and estimated delivery dates.",
        "required": [
            "30-day return window",
            "order tracking",
            "return authorization"
        ],
        "forbidden": [
            "no returns accepted",
            "lifetime returns"
        ]
    },
    {
        "q": "How can a buyer apply a promotional code during checkout?",
        "context": "Document 15: Promotional Codes and Discounts",
        "answer": "Buyers can enter valid promotional codes at checkout. Discounts are automatically applied if eligibility criteria such as minimum purchase or usage limits are met.",
        "required": [
            "promotional code",
            "checkout",
            "discount applied"
        ],
        "forbidden": [
            "codes not verified",
            "discounts guaranteed"
        ]
    },
    {
        "q": "What payment methods are available and how is security ensured?",
        "context": "Document 4: Payment Methods and Security",
        "answer": "Shoplite supports credit/debit cards, PayPal, and mobile wallets. Payments are encrypted with TLS, and sensitive data is stored securely using AES-256 encryption. Two-factor authentication is available for extra account security.",
        "required": [
            "credit/debit cards",
            "PayPal",
            "encrypted",
            "AES-256"
        ],
        "forbidden": [
            "payments not secure",
            "no encryption"
        ]
    },
    {
        "q": "How do I search and filter products on Shoplite?",
        "context": "Document 2: Product Search and Filtering Features",
        "answer": "Users can search products by keywords, categories, price range, ratings, and availability. Filters can be applied individually or in combination to refine search results.",
        "required": [
            "keywords",
            "categories",
            "filters",
            "price range"
        ],
        "forbidden": [
            "no filters available",
            "search inaccurate"
        ]
    },
    {
        "q": "How does Shoplite manage inventory for sellers?",
        "context": "Document 9: Inventory Management for Sellers",
        "answer": "Sellers can add, update, or remove products via their dashboard. Inventory counts are automatically updated when purchases are made, and low-stock alerts notify sellers to restock items.",
        "required": [
            "inventory management",
            "dashboard",
            "low-stock alerts"
        ],
        "forbidden": [
            "manual inventory only",
            "no notifications"
        ]
    },
    {
        "q": "What are the commission rates and fee structures for sellers?",
        "context": "Document 10: Commission and Fee Structure",
        "answer": "Shoplite charges a standard 5% commission on each sale plus a small listing fee. Fees may vary depending on category and promotional campaigns.",
        "required": [
            "5% commission",
            "listing fee",
            "category-specific"
        ],
        "forbidden": [
            "hidden fees",
            "no commission"
        ]
    },
    {
        "q": "How can users leave product reviews and ratings?",
        "context": "Document 7: Product Reviews and Ratings",
        "answer": "After purchase, users can leave reviews and ratings on product pages. Reviews require a verified purchase and may include text, images, and star ratings from 1 to 5.",
        "required": [
            "verified purchase",
            "reviews",
            "ratings",
            "star ratings"
        ],
        "forbidden": [
            "fake reviews allowed",
            "no rating system"
        ]
    },
    {
        "q": "What customer support options are available on Shoplite?",
        "context": "Document 11: Customer Support Procedures",
        "answer": "Users can access live chat, email, phone support, and an AI chatbot. Tickets are tracked with unique IDs, and response times vary by issue complexity.",
        "required": [
            "live chat",
            "email",
            "AI chatbot",
            "ticket ID"
        ],
        "forbidden": [
            "no support available",
            "response guaranteed immediately"
        ]
    },
    {
        "q": "How can users manage their accounts?",
        "context": "Document 1: User Registration and Account Management",
        "answer": "Users can update profile info, change passwords, enable two-factor authentication, and view order history from the account dashboard.",
        "required": [
            "update profile",
            "change password",
            "two-factor authentication"
        ],
        "forbidden": [
            "cannot update info",
            "no security features"
        ]
    },
    {
        "q": "How do mobile app features differ from the web platform?",
        "context": "Document 12: Mobile App Features",
        "answer": "The mobile app supports browsing, checkout, push notifications, QR code scanning, and offline access. Sellers can manage stores and orders on the go.",
        "required": [
            "mobile app",
            "push notifications",
            "QR code",
            "offline access"
        ],
        "forbidden": [
            "less functional than web",
            "cannot manage orders"
        ]
    },
    {
        "q": "How do developers integrate with Shoplite APIs?",
        "context": "Document 13: API Documentation for Developers",
        "answer": "Developers can use RESTful endpoints with OAuth 2.0 authentication to access products, orders, and inventory. Webhooks notify real-time events, and sandbox testing is supported.",
        "required": [
            "RESTful API",
            "OAuth 2.0",
            "webhooks",
            "sandbox"
        ],
        "forbidden": [
            "no API support",
            "manual integration only"
        ]
    },
    {
        "q": "What privacy measures protect user data?",
        "context": "Document 14: Security and Privacy Policies",
        "answer": "Shoplite encrypts data at rest and in transit, limits access to authorized personnel, and complies with GDPR and CCPA. Users can download or delete personal information.",
        "required": [
            "encrypted",
            "GDPR",
            "CCPA",
            "personal data control"
        ],
        "forbidden": [
            "data shared without consent",
            "no encryption"
        ]
    },
    {
        "q": "How are orders tracked and delivered?",
        "context": "Document 5: Order Tracking and Delivery",
        "answer": "Users can track shipments via account dashboard or app. Delivery times depend on courier and location, with real-time updates provided.",
        "required": [
            "track shipments",
            "real-time updates",
            "courier"
        ],
        "forbidden": [
            "cannot track",
            "no delivery updates"
        ]
    },
    {
        "q": "How are discounts applied for special events?",
        "context": "Document 15: Promotional Codes and Discounts",
        "answer": "Sellers can schedule discounts for events, and buyers are notified. Eligibility is verified automatically during checkout, and analytics track performance.",
        "required": [
            "special events",
            "automatic eligibility",
            "analytics"
        ],
        "forbidden": [
            "manual verification only",
            "no notifications"
        ]
    },
    {
        "q": "How are product listings verified for quality?",
        "context": "Document 8: Seller Account Setup and Management + Document 9: Inventory Management for Sellers",
        "answer": "Shoplite reviews product listings for compliance with policies. Inventory and product information must be accurate, and low-quality or prohibited items are flagged or removed.",
        "required": [
            "listing verification",
            "policy compliance",
            "low-quality items"
        ],
        "forbidden": [
            "no verification",
            "prohibited items allowed"
        ]
    },
    {
        "q": "How can multiple documents be used to answer complex questions?",
        "context": "Document 5: Order Tracking and Delivery + Document 6: Return and Refund Policies",
        "answer": "For questions about both delivery and returns, Shoplite combines information from multiple documents, providing step-by-step guidance on return eligibility, tracking, and refunds.",
        "required": [
            "multi-document synthesis",
            "tracking",
            "return eligibility"
        ],
        "forbidden": [
            "single document only",
            "incomplete guidance"
        ]
    },
    {
        "q": "How does Shoplite prevent unauthorized account access?",
        "context": "Document 1: User Registration and Account Management + Document 14: Security and Privacy Policies",
        "answer": "Shoplite enforces strong passwords, two-factor authentication, and monitors for suspicious login activity to prevent unauthorized access.",
        "required": [
            "two-factor authentication",
            "strong password",
            "monitoring"
        ],
        "forbidden": [
            "weak passwords allowed",
            "no monitoring"
        ]
    },
    {
        "q": "How can sellers analyze promotional campaign effectiveness?",
        "context": "Document 15: Promotional Codes and Discounts",
        "answer": "Analytics track discount redemptions, revenue impact, and customer engagement, helping sellers optimize future campaigns.",
        "required": [
            "analytics",
            "redemption rates",
            "customer engagement"
        ],
        "forbidden": [
            "no tracking",
            "guesswork only"
        ]
    },
    {
        "q": "How does the LLM retrieve information for user questions?",
        "context": "All documents (1\u201315)",
        "answer": "The RAG system retrieves relevant documents from the Shoplite knowledge base using embeddings and FAISS, then feeds them to the LLM to generate precise answers based on retrieved content.",
        "required": [
            "RAG system",
            "retrieval",
            "FAISS",
            "knowledge base"
        ],
        "forbidden": [
            "hallucinated answers",
            "no retrieval"
        ]
    }
]
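
The `required` and `forbidden` lists suggest a simple keyword check against a generated answer. A minimal sketch, assuming case-insensitive substring matching; `check_answer` is a hypothetical helper, and the notebook's own scoring, if any, may differ.

```python
from typing import Dict, List


def check_answer(answer: str, required: List[str], forbidden: List[str]) -> Dict[str, object]:
    """Report which required phrases are missing and which forbidden phrases appear."""
    text = answer.lower()
    missing = [kw for kw in required if kw.lower() not in text]
    violations = [kw for kw in forbidden if kw.lower() in text]
    return {"passed": not missing and not violations, "missing": missing, "violations": violations}


# Keyword lists copied from the promotional-code item above; the answers are made up.
item = {
    "required": ["promotional code", "checkout", "discount applied"],
    "forbidden": ["codes not verified", "discounts guaranteed"],
}
good = check_answer("Enter the promotional code at checkout to get the discount applied.",
                    item["required"], item["forbidden"])
bad = check_answer("Enter a code at checkout; discounts guaranteed.",
                   item["required"], item["forbidden"])
print(good)
print(bad)
```

Substring matching is strict: a correct paraphrase that words a required phrase differently would be reported as missing, so results from a check like this are best read as a lower bound.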

In [None]:
# Cell 6: Evaluation checklist (30 tests: 10 retrieval, 15 response, 5 edge cases)
EVALS = {
    "retrieval_tests": [
        {
            "id": "R01",
            "question": "How do I create a seller account on Shoplite?",
            "expected": [
                "Shoplite Seller Account Setup and Management"
            ]
        },
        {
            "id": "R02",
            "question": "What is Shoplite's return policy?",
            "expected": [
                "Shoplite Return and Refund Policies"
            ]
        },
        {
            "id": "R03",
            "question": "How does Shoplite handle payments?",
            "expected": [
                "Shoplite Payment Methods and Security"
            ]
        },
        {
            "id": "R04",
            "question": "What is the delivery time for orders?",
            "expected": [
                "Shoplite Order Tracking and Delivery"
            ]
        },
        {
            "id": "R05",
            "question": "How do I reset my Shoplite password?",
            "expected": [
                "Shoplite User Registration and Account Management"
            ]
        },
        {
            "id": "R06",
            "question": "Does Shoplite offer seller support?",
            "expected": [
                "Shoplite Seller Account Setup and Management",
                "Shoplite Customer Support Procedures"
            ]
        },
        {
            "id": "R07",
            "question": "What customer support options are available?",
            "expected": [
                "Shoplite Customer Support Procedures"
            ]
        },
        {
            "id": "R08",
            "question": "How are disputes resolved on Shoplite?",
            "expected": [
                "Shoplite Return and Refund Policies"
            ]
        },
        {
            "id": "R09",
            "question": "Are there transaction fees for sellers?",
            "expected": [
                "Shoplite Commission and Fee Structure"
            ]
        },
        {
            "id": "R10",
            "question": "How can I track my order?",
            "expected": [
                "Shoplite Order Tracking and Delivery"
            ]
        }
    ],
    "response_tests": [
        {
            "id": "Q01",
            "question": "How do I create a seller account?",
            "required": [
                "seller registration",
                "business verification",
                "2-3 business days"
            ],
            "forbidden": [
                "instant approval"
            ]
        },
        {
            "id": "Q02",
            "question": "What is Shoplite's return policy?",
            "required": [
                "return",
                "refund",
                "30 days"
            ],
            "forbidden": [
                "no returns"
            ]
        },
        {
            "id": "Q03",
            "question": "How does payment security work?",
            "required": [
                "encryption",
                "fraud detection"
            ],
            "forbidden": [
                "unsafe"
            ]
        },
        {
            "id": "Q04",
            "question": "What are the shipping options?",
            "required": [
                "standard",
                "express"
            ],
            "forbidden": [
                "no shipping"
            ]
        },
        {
            "id": "Q05",
            "question": "How do I reset my password?",
            "required": [
                "reset",
                "verification link"
            ],
            "forbidden": [
                "impossible"
            ]
        },
        {
            "id": "Q06",
            "question": "Does Shoplite provide seller support?",
            "required": [
                "dedicated channel",
                "help center"
            ],
            "forbidden": [
                "no support"
            ]
        },
        {
            "id": "Q07",
            "question": "What customer support options are available?",
            "required": [
                "live chat",
                "email",
                "phone",
                "24/7"
            ],
            "forbidden": [
                "no support"
            ]
        },
        {
            "id": "Q08",
            "question": "How are disputes resolved?",
            "required": [
                "resolution center",
                "buyer protection"
            ],
            "forbidden": [
                "no process"
            ]
        },
        {
            "id": "Q09",
            "question": "What are the seller fees?",
            "required": [
                "transaction fee",
                "commission"
            ],
            "forbidden": [
                "free forever"
            ]
        },
        {
            "id": "Q10",
            "question": "How do I track my order?",
            "required": [
                "tracking ID",
                "status updates"
            ],
            "forbidden": [
                "not possible"
            ]
        },
        {
            "id": "Q11",
            "question": "Can I cancel an order?",
            "required": [
                "cancellation",
                "before shipment"
            ],
            "forbidden": [
                "never allowed"
            ]
        },
        {
            "id": "Q12",
            "question": "How are returns processed?",
            "required": [
                "inspection",
                "refund issued"
            ],
            "forbidden": [
                "automatic approval"
            ]
        },
        {
            "id": "Q13",
            "question": "What are Shoplite\u2019s account security features?",
            "required": [
                "2FA",
                "encryption"
            ],
            "forbidden": [
                "insecure"
            ]
        },
        {
            "id": "Q14",
            "question": "What payment methods are accepted?",
            "required": [
                "credit card",
                "PayPal",
                "Shoplite Wallet"
            ],
            "forbidden": [
                "cash only"
            ]
        },
        {
            "id": "Q15",
            "question": "Does Shoplite protect buyers?",
            "required": [
                "fraud protection",
                "guarantee"
            ],
            "forbidden": [
                "unprotected"
            ]
        }
    ],
    "edge_cases": [
        {
            "id": "E01",
            "scenario": "Question about a product not in the KB",
            "expected": "Refusal with explanation"
        },
        {
            "id": "E02",
            "scenario": "Vague question",
            "expected": "Ask for clarification"
        },
        {
            "id": "E03",
            "scenario": "Off-topic question",
            "expected": "Refusal: out-of-scope"
        },
        {
            "id": "E04",
            "scenario": "Contradictory question",
            "expected": "Policy correction"
        },
        {
            "id": "E05",
            "scenario": "Multi-domain question",
            "expected": "Split answer covering both parts"
        }
    ]
}
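
The retrieval tests can be scored by checking whether every expected title appears among the top-k retrieved titles. A sketch under that assumption: `retrieval_hit_rate` is a hypothetical name, and `fake_retrieve` is a stub standing in for the notebook's retriever so the block runs on its own.

```python
from typing import Callable, Dict, List


def retrieval_hit_rate(tests: List[Dict], retrieve: Callable[[str, int], List[Dict]], top_k: int = 3) -> float:
    """Fraction of tests whose expected titles all appear in the top_k retrieved titles."""
    if not tests:
        return 0.0
    hits = 0
    for test in tests:
        retrieved_titles = {doc["title"] for doc in retrieve(test["question"], top_k)}
        if set(test["expected"]) <= retrieved_titles:
            hits += 1
    return hits / len(tests)


def fake_retrieve(query: str, top_k: int) -> List[Dict]:
    """Stub retriever: always returns the same two titles, truncated to top_k."""
    docs = [{"title": "Shoplite Return and Refund Policies"},
            {"title": "Shoplite Order Tracking and Delivery"}]
    return docs[:top_k]


# Two of the retrieval tests above; the stub can only satisfy the first.
tests = [
    {"id": "R02", "question": "What is Shoplite's return policy?",
     "expected": ["Shoplite Return and Refund Policies"]},
    {"id": "R09", "question": "Are there transaction fees for sellers?",
     "expected": ["Shoplite Commission and Fee Structure"]},
]
print(retrieval_hit_rate(tests, fake_retrieve))
```

Once the retriever from Cell 8 is defined, `retrieve_docs` could be passed in place of the stub, since it accepts the same `(query, top_k)` arguments and returns dicts with a `title` key.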

In [None]:
# Cell 7: Embeddings + FAISS setup
EMBED_MODEL_NAME = 'sentence-transformers/all-MiniLM-L6-v2'
print('Loading embedding model:', EMBED_MODEL_NAME)
embed_model = SentenceTransformer(EMBED_MODEL_NAME)

DOCUMENT_TEXTS = [d['title'] + '\n\n' + d['content'] for d in KNOWLEDGE_BASE]
DOC_IDS = [d['id'] for d in KNOWLEDGE_BASE]

print('Encoding documents...')
doc_embeddings = embed_model.encode(DOCUMENT_TEXTS, convert_to_numpy=True, show_progress_bar=True)

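def normalize_embeddings(embs):
    """L2-normalize each row so that inner product equals cosine similarity."""
    norms = np.linalg.norm(embs, axis=1, keepdims=True)
    # Clip tiny norms to avoid division by zero
    return embs / np.clip(norms, a_min=1e-10, a_max=None)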

doc_embeddings = normalize_embeddings(doc_embeddings)
d = doc_embeddings.shape[1]
index = faiss.IndexFlatIP(d)
index.add(doc_embeddings)
print(f'FAISS index created with {index.ntotal} vectors (dim={d})')
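
The index above uses inner product on L2-normalized vectors, which is equivalent to cosine similarity. The pure-Python sketch below illustrates that identity on toy 2-D vectors; the vectors are made up for illustration and are not real embeddings, and the helper names are hypothetical.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a: List[float], b: List[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity computed directly from the unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [3.0, 4.0]
doc = [4.0, 3.0]
ip_after_norm = dot(l2_normalize(query), l2_normalize(doc))
print(ip_after_norm, cosine(query, doc))  # the two values agree up to rounding
```

This is why the query embedding must be normalized the same way as the document embeddings before searching: skipping that step would turn the scores into raw inner products that depend on vector length.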

In [None]:
# Cell 8: Retrieval function
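def retrieve_docs(query: str, top_k: int = 3) -> List[Dict[str, Any]]:
    """Return up to top_k knowledge-base documents ranked by cosine similarity to the query."""
    q_emb = embed_model.encode([query], convert_to_numpy=True)
    q_emb = normalize_embeddings(q_emb)
    scores, indices = index.search(q_emb, top_k)
    results = []
    for score, idx in zip(scores[0], indices[0]):
        # FAISS pads with -1 when fewer than top_k vectors are available
        if idx < 0 or idx >= len(KNOWLEDGE_BASE):
            continue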
        doc = KNOWLEDGE_BASE[idx]
        results.append({
            'id': doc['id'],
            'title': doc['title'],
            'content': doc['content'],
            'score': float(score)
        })
    return results

# Quick example
print(retrieve_docs('How do I create a seller account on Shoplite?', top_k=3))
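
`retrieve_docs` relies on `IndexFlatIP`, which performs an exhaustive inner-product search. A rough pure-Python equivalent is sketched below to show what the search returns; `top_k_inner_product` and the toy vectors are assumptions made for illustration. One difference worth noting: when fewer than `top_k` vectors exist, FAISS pads the result with index `-1`, which is what the `idx < 0` check in Cell 8 guards against, whereas this sketch simply returns fewer items.

```python
import heapq

def top_k_inner_product(query, docs, k):
    """Score every document against the query and keep the k best (score, index) pairs."""
    scored = [(sum(q * x for q, x in zip(query, doc)), i) for i, doc in enumerate(docs)]
    return heapq.nlargest(k, scored)

# Toy unit-length vectors standing in for normalized document embeddings.
docs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
query = [0.8, 0.6]

best_two = top_k_inner_product(query, docs, k=2)
print([idx for _, idx in best_two])

# Asking for more results than there are documents returns all of them.
print(len(top_k_inner_product(query, docs, k=5)))
```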

In [None]:
# Cell 9: Model loading with quantization and fallbacks
from transformers import BitsAndBytesConfig
MODEL_NAME = 'meta-llama/Llama-3.1-8B-Instruct'
FALLBACK_MODEL = 'tiiuae/falcon-7b-instruct'

use_cuda = torch.cuda.is_available()
print('CUDA available:', use_cuda)

model = None
tokenizer = None

# Hugging Face login (optional) - input prompt (DO NOT hardcode tokens)
hf_token = input('🔑 Enter your Hugging Face token (or press Enter to skip): ').strip()
if hf_token:
    try:
        hf_login(hf_token)
        print('Hugging Face login successful.')
    except Exception as e:
        print('Hugging Face login failed:', e)

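# Prepare BitsAndBytes config (best-effort).
# 8-bit loading through bitsandbytes generally expects a CUDA GPU, so skip it on CPU.
try:
    bnb_config = BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=6.0) if use_cuda else None
except Exception:
    bnb_config = None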

def try_load_model(name, quant_config=None, dtype=torch.float16):
    try:
        tok = AutoTokenizer.from_pretrained(name, use_fast=True)
        # Shared loading arguments; quantization is added only when a config is supplied
        kwargs = dict(device_map='auto' if use_cuda else None, low_cpu_mem_usage=True, torch_dtype=dtype)
        if quant_config is not None:
            kwargs['quantization_config'] = quant_config
        mdl = AutoModelForCausalLM.from_pretrained(name, **kwargs)
        mdl.eval()
        return tok, mdl
    except Exception as e:
        print(f'Load failed for {name}:', e)
        return None, None

# Try the primary model first (quantized when a config is available), then the fallback.
# The primary attempt no longer depends on bnb_config being set.
tokenizer, model = try_load_model(MODEL_NAME, quant_config=bnb_config)
if model is None:
    print('Primary model failed or not accessible; trying fallback.')
    tokenizer, model = try_load_model(FALLBACK_MODEL, quant_config=bnb_config)
if model is None:
    print('No model loaded - notebook will still perform retrieval, but generation endpoints will return extraction-based responses.')
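
The load order in Cell 9 is a simple fallback chain: try each candidate in turn and keep the first that loads. A minimal sketch of that pattern follows, with a stub loader standing in for `from_pretrained` so it runs without downloading anything. `load_first_available`, `fake_loader`, and the model names are hypothetical.

```python
def load_first_available(candidates, loader):
    """Return (name, model) for the first candidate that loads, else (None, None)."""
    for name in candidates:
        try:
            return name, loader(name)
        except Exception as exc:
            print(f'Load failed for {name}: {exc}')
    return None, None

def fake_loader(name):
    """Stub loader (hypothetical): the primary model is treated as inaccessible."""
    if name == 'primary-model':
        raise RuntimeError('access denied')
    return f'<model {name}>'

chosen, loaded = load_first_available(['primary-model', 'fallback-model'], fake_loader)
print(chosen, loaded)
```

Returning `(None, None)` rather than raising mirrors the notebook's behavior, where a missing model switches generation to extractive answers.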

In [None]:
# Cell 10: Prompt builder + generator (RAG)
TEMPERATURE = 0.7
MAX_TOKENS = 120
SIMILARITY_THRESHOLD = 0.25

def build_prompt_from_retrieval(query: str, retrieved_docs: List[Dict[str,Any]]):
    docs_text = ''
    for i, doc in enumerate(retrieved_docs[:5], 1):
        content = doc['content'][:400] + ('...' if len(doc['content'])>400 else '')
        docs_text += f"\n[Document {i}: {doc['title']}]\n{content}\n"
    prompt = (f"You are a helpful Shoplite customer service assistant.\n\n"
              f"Use these documents to answer the user's question. Cite documents in 'Sources:'.{docs_text}\n"
              f"Question: {query}\nAnswer in 2-3 sentences:")
    return prompt

def generate_response(query: str, top_k: int = 5, temperature: float = TEMPERATURE, max_tokens: int = MAX_TOKENS, debug: bool=False):
    # Basic greetings / help shortcuts
    ql = query.lower().strip()
    # Match whole greetings only; a substring test would also fire on words like 'shipping' or 'this'
    if ql.strip(' !.,?') in ('hi', 'hello', 'hey'):
        return {'answer':'Hello! I\'m here to help with Shoplite. What do you need?','sources':[],'confidence':'High'}
    if ql in ['help','can you help','help me']:
        return {'answer':'I can help with Shoplite registration, orders, payments, returns, seller accounts, and more. What do you need?','sources':[],'confidence':'High'}
    # Retrieve
    retrieved = retrieve_docs(query, top_k=top_k)
    if not retrieved:
        return {'answer':'I don\'t have information about that in the Shoplite docs.','sources':[],'confidence':'Low'}
    top_score = max(d['score'] for d in retrieved)
    if top_score < SIMILARITY_THRESHOLD:
        return {'answer':"I'm sorry, that appears to be outside my knowledge base. Please provide more details or check the Shoplite Help Center.",'sources':[],'confidence':'Low'}
    # Confidence buckets
    if top_score >= 0.6: conf='High'
    elif top_score >= 0.4: conf='Medium'
    else: conf='Low'
    # If model not loaded, return extractive answer
    if model is None:
        parts = []
        for d in retrieved[:3]:
            sentence = d['content'].split('.')[:1]
            if sentence: parts.append(sentence[0].strip()+'.')
        answer = ' '.join(parts)[:800]
        return {'answer':answer,'sources':[d['title'] for d in retrieved[:3]],'confidence':conf}
    # Build prompt and generate
    prompt = build_prompt_from_retrieval(query, retrieved)
    inputs = tokenizer(prompt, return_tensors='pt', truncation=True, max_length=1536)
    # Move inputs to the device that holds the model (works with device_map='auto' and on CPU)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model.generate(**inputs, do_sample=True, temperature=temperature, max_new_tokens=max_tokens, top_p=0.9, repetition_penalty=1.2, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.eos_token_id)
    generated = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True).strip()
    # Assemble sources and return
    return {'answer':generated,'sources':[d['title'] for d in retrieved[:3]],'confidence':conf}
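
The refusal threshold and confidence buckets in `generate_response` can be read as one small mapping from the top similarity score to a label. The sketch below isolates that mapping so the cutoffs (0.25, 0.4, 0.6, taken from Cell 10) can be checked on their own; `bucket_confidence` is a hypothetical helper, and returning `None` for a refusal is a choice made for this example.

```python
SIMILARITY_THRESHOLD = 0.25  # same value as in Cell 10

def bucket_confidence(top_score, threshold=SIMILARITY_THRESHOLD):
    """Return None when the query should be refused, otherwise a confidence label."""
    if top_score < threshold:
        return None
    if top_score >= 0.6:
        return 'High'
    if top_score >= 0.4:
        return 'Medium'
    return 'Low'

# Illustrative scores only; real scores come from the FAISS search.
for score in (0.10, 0.30, 0.45, 0.72):
    print(score, bucket_confidence(score))
```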

In [None]:
# Cell 11: Flask API and ngrok exposure
app = Flask(__name__)

@app.route('/health', methods=['GET'])
def health():
    return jsonify({'status':'ok','model_loaded': bool(model is not None), 'num_docs': len(KNOWLEDGE_BASE)})

@app.route('/ping', methods=['POST'])
def ping():
    # silent=True returns None instead of raising on a missing or malformed JSON body
    data = request.get_json(silent=True) or {}
    text = data.get('text','')
    if not text:
        return jsonify({'error':'no text provided'}), 400
    # Direct LLM call (no retrieval) - if model absent, return informative message
    if model is None:
        return jsonify({'answer':'Model not loaded. Generation unavailable in this runtime.','sources':[],'confidence':'Low'})
    prompt = text
    inputs = tokenizer(prompt, return_tensors='pt', truncation=True, max_length=1536)
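    # Move inputs to the model's device; pad_token_id avoids a warning for models without a pad token
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    with torch.no_grad():
        outs = model.generate(**inputs, max_new_tokens=120, do_sample=True, temperature=0.7, pad_token_id=tokenizer.eos_token_id)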
    gen = tokenizer.decode(outs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True).strip()
    return jsonify({'answer':gen})

@app.route('/chat', methods=['POST'])
def chat():
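    data = request.get_json(silent=True) or {}
    question = data.get('question') or data.get('text') or ''
    if not question:
        return jsonify({'error':'no question provided'}), 400
    # Validate and clamp top_k so a bad value yields a 400 instead of an unhandled error
    try:
        top_k = max(1, min(int(data.get('top_k', 5)), len(KNOWLEDGE_BASE)))
    except (TypeError, ValueError):
        return jsonify({'error':'top_k must be an integer'}), 400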
    try:
        resp = generate_response(question, top_k=top_k)
        return jsonify(resp)
    except Exception as e:
        return jsonify({'error':str(e)}), 500

# Run Flask in a background thread so the notebook doesn't block.
# The server starts whether or not a tunnel is created, so the local tests in Cell 12 still work.
def run_flask():
    app.run(host='0.0.0.0', port=5000, debug=False, use_reloader=False)
thread = threading.Thread(target=run_flask, daemon=True)
thread.start()
print('Flask app started on port 5000 (in background thread).')

# Start ngrok tunnel (token is entered at runtime; do NOT hardcode it)
ngrok_token = input('🔐 Enter your ngrok authtoken (required to expose the API), then press Enter: ').strip()
if not ngrok_token:
    print('No ngrok token provided. The API is reachable locally on port 5000 but is not publicly exposed.')
else:
    ngrok.set_auth_token(ngrok_token)
    public_url = ngrok.connect(5000, bind_tls=True).public_url
    print('ngrok tunnel created at', public_url)
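
A client calls `/chat` with a JSON body containing `question` (or `text`) and an optional `top_k`. The sketch below builds such a request with the standard library only; `build_chat_request` is a hypothetical helper, and the base URL assumes the server from Cell 11 is listening locally on port 5000. The request is constructed but not sent, so the block runs even when no server is up.

```python
import json
import urllib.request

def build_chat_request(base_url, question, top_k=5):
    """Build (but do not send) a POST request for the /chat endpoint."""
    payload = json.dumps({'question': question, 'top_k': top_k}).encode('utf-8')
    return urllib.request.Request(
        url=base_url.rstrip('/') + '/chat',
        data=payload,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )

req = build_chat_request('http://127.0.0.1:5000',
                         'How do I create a seller account on Shoplite?', top_k=3)
print(req.get_method(), req.full_url)

# To send it once the server is running:
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(json.loads(resp.read().decode('utf-8')))
```

The same request should work against the public ngrok URL by swapping the base URL.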

In [None]:
# Cell 12: Quick tests (run after the server is up)
import requests
print('Health:', requests.get('http://127.0.0.1:5000/health', timeout=10).json())
# Example retrieval + generation (local)
print('Retrieve example:', retrieve_docs('How do I create a seller account on Shoplite?', top_k=3))
print('Generate example:', generate_response('How do I create a seller account on Shoplite?', top_k=3, debug=True))