# LLM Deployment for Shoplite (Colab-ready)

This notebook is self-contained. It embeds the Shoplite knowledge base, encodes it with `sentence-transformers` into a FAISS index, loads Google's instruction-tuned FLAN-T5 model (`google/flan-t5-large`), exposes a Flask API (`/chat`, `/ping`, `/health`), and uses `pyngrok` to create a public tunnel.

**IMPORTANT**: Instructors must supply their ngrok authtoken through an `input()` prompt at runtime. Do NOT hardcode tokens. FLAN-T5 is used instead of Llama because it loads reliably and does not require a Hugging Face access token.

In [None]:
# Cell 1: Install dependencies
# Run this cell in Colab (GPU runtime recommended but not required for FLAN-T5)
!pip install --quiet --upgrade pip
!pip install --quiet transformers accelerate sentence-transformers faiss-cpu flask pyngrok torch -U

In [None]:
# Cell 2: Imports and utilities
import os, time, json, threading
from typing import List, Dict, Any
from flask import Flask, request, jsonify
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from pyngrok import ngrok

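# Helper to safely serialize NumPy arrays and scalars to JSON
class NpEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if isinstance(obj, np.generic):
            # NumPy scalars (e.g. np.float32, np.int64) are not JSON-serializable by default
            return obj.item()
        return super().default(obj)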
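
The API itself never hands raw NumPy values to `jsonify`, because `retrieve_docs` casts scores with `float()`, but an encoder like this is handy when dumping arrays elsewhere. A standalone sketch of how it plugs into `json.dumps`; this copy also converts NumPy scalars such as `np.float32`, which the standard encoder rejects:

```python
import json
import numpy as np

class NpEncoder(json.JSONEncoder):
    """JSON encoder that converts NumPy arrays and scalars to plain Python types."""
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()          # arrays -> nested lists
        if isinstance(obj, np.generic):
            return obj.item()            # np.float32, np.int64, ... -> float/int
        return super().default(obj)      # let json raise TypeError for anything else

payload = {'ids': np.array([3, 1, 2]), 'score': np.float32(0.5)}
print(json.dumps(payload, cls=NpEncoder))  # {"ids": [3, 1, 2], "score": 0.5}
```

`np.float64` already subclasses Python's `float`, so only the other NumPy scalar types actually reach `default`.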

In [None]:
# Cell 3: Knowledge Base (embedded)
KNOWLEDGE_BASE = [
    {"id": "doc1", "title": "Shoplite User Registration and Account Management", "content": "To create a Shoplite account, users must visit the registration page and provide a valid email address, password, and basic profile information. Email verification is required within 24 hours. Users can choose between: - Buyer accounts (free) - Seller accounts (requires business verification and tax information). Account Management Features: Update personal information, Change passwords, Set security questions, Manage notification preferences, Deactivate accounts (requires email confirmation; may affect active orders/subscriptions). Buyer Access: product browsing, purchasing, order tracking, reviews. Seller Access: seller dashboard, inventory management, order processing, analytics. Security Measures: two-factor authentication recommended, password recovery via email and phone verification."},
    {"id": "doc2", "title": "Shoplite Product Search and Filtering Features", "content": "Shoplite provides a powerful search engine: Search Capabilities: keyword queries, category selection, brand filters. Filtering Options: price range, rating, availability, seller location, shipping speed, promotions, eco-friendly options. Features: Autocomplete suggestions, Spelling correction, Save searches & alerts, Faceted navigation, Optimized for large catalogs with real-time indexing, Mobile responsive interface."},
    {"id": "doc3", "title": "Shoplite Shopping Cart and Checkout Process", "content": "Add multiple items from different sellers; Review quantities, apply promo codes/gift cards; Cart preserved across sessions for logged-in users. Checkout Steps: 1. Shipping selection (standard, expedited, same-day) 2. Payment selection (credit/debit cards, digital wallets, cash-on-delivery) 3. Order confirmation. Security & Processing: PCI-DSS compliant payment gateways, Real-time stock updates, Order confirmation emails with tracking, Seller notifications for new orders, Integrated returns and refunds system."},
    {"id": "doc4", "title": "Shoplite Payment Methods and Security", "content": "Accepted Payment Methods: credit/debit cards, PayPal, Apple Pay, Google Pay, local solutions. Security Measures: SSL encryption, PCI-DSS compliance, fraud detection, two-factor authentication, sensitive info encrypted in transit and at rest. Other Features: digital wallet integration, structured dispute/chargeback process, seller payments after order confirmation."},
    {"id": "doc5", "title": "Shoplite Order Tracking and Delivery", "content": "Real-time tracking with confirmation emails and unique tracking number. Stages: confirmed -> processing -> shipped -> in transit -> delivered. Delivery modification requests (seller approval required). International shipments display customs/import duties. Optimized logistics with estimated arrival and delay notifications. Support assistance for lost/delayed packages."},
    {"id": "doc6", "title": "Shoplite Return and Refund Policies", "content": "Return Period: typically 30 days from delivery. Process: select order/item, specify reason, use prepaid label if eligible. Refunds: processed in 5–7 business days to original payment method. Digital/personalized items may have exceptions. Automated order status updates. Sellers must comply with policies to maintain ratings. Dispute resolution available."},
    {"id": "doc7", "title": "Shoplite Product Reviews and Ratings", "content": "Buyers rate products on a five-star scale and leave comments. Reviews moderated for compliance. Sellers can respond to reviews. Ratings influence search ranking. Verified purchase badges for authenticity. Aggregate ratings provided. Review analytics available for sellers."},
    {"id": "doc8", "title": "Shoplite Seller Account Setup and Management", "content": "Create seller account with business documents and tax verification. Seller Dashboard: inventory management, order processing, sales analytics. Product listing via individual or bulk upload (CSV/API). Profile customization: branding, policies, shipping, returns. Notifications: new orders, low stock, inquiries. Pricing, promotions, and shipping fee management. Performance metrics tracked; third-party integrations supported."},
    {"id": "doc9", "title": "Shoplite Inventory Management for Sellers", "content": "Track stock levels, reorder thresholds, and availability in real-time. Low-stock alerts. Bulk imports supported. Variants (size, color, bundle) supported. Inventory reports for trends and seasonal demand. Manage warehouses and shipping locations."},
    {"id": "doc10", "title": "Shoplite Commission and Fee Structure", "content": "Commission fees per product category. Additional fees: premium listings, promotions, special services. Transparent notifications in dashboard. Payments made after commission deduction (weekly/bi-weekly). Transaction reports available. Pricing guidance provided."},
    {"id": "doc11", "title": "Shoplite Customer Support Procedures", "content": "Support via live chat, email, phone, and AI chatbot (24/7). Ticket categorization: orders, payments, returns, technical, account management. Unique tracking IDs. Backend integration for order/payment info. Dedicated seller support channel. Help center with guides, FAQs, videos. Fast, transparent, fair resolution."},
    {"id": "doc12", "title": "Shoplite Mobile App Features", "content": "iOS & Android support. Browse, filter, add to cart, purchase. Push notifications for promotions/order updates. Barcode scanning and QR code payments. Mobile wallets, fingerprint, Face ID login. Seller management on-the-go. Offline caching for previously loaded content. Intuitive, responsive, accessible interface."},
    {"id": "doc13", "title": "Shoplite API Documentation for Developers", "content": "RESTful API endpoints: product catalog, orders, accounts, inventory. OAuth 2.0 authentication. Rate limiting (higher for verified partners). Detailed docs: request/response, parameters, error codes. Webhooks for real-time events. Sandbox environment for testing. Versioned API with backward-compatible updates."},
    {"id": "doc14", "title": "Shoplite Security and Privacy Policies", "content": "Data Protection: TLS encryption, AES-256 at rest, authorized access. Two-factor authentication & strong passwords. GDPR & CCPA compliance. Security monitoring for suspicious activity. Clear privacy policies: data collection, usage, third-party sharing. Policy change notifications; user control over data."},
    {"id": "doc15", "title": "Shoplite Promotional Codes and Discounts", "content": "Sellers create promotions: discount codes, seasonal sales, bundle offers. Code types: percentage, fixed, conditional. Start/end dates, usage limits, minimum purchase configurable. Automatic verification at checkout. Analytics: redemption, revenue, engagement. User notifications for active promotions. Special events highlighted on homepage/app. Compliance with platform policies."}
]
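
Document IDs and titles are what the API returns as sources, so a quick sanity check on the list can catch copy-paste mistakes before indexing. A minimal sketch; `validate_kb` is a hypothetical helper, shown on a two-item sample rather than the full list:

```python
from typing import Dict, List

REQUIRED_KEYS = ('id', 'title', 'content')

def validate_kb(kb: List[Dict[str, str]]) -> List[str]:
    """Return human-readable problems; an empty list means the KB looks consistent."""
    problems = []
    seen_ids = set()
    for i, doc in enumerate(kb):
        # Every entry needs non-empty id/title/content strings
        for key in REQUIRED_KEYS:
            if not str(doc.get(key, '')).strip():
                problems.append(f'entry {i}: missing or empty {key!r}')
        # IDs must be unique so each source maps back to exactly one document
        doc_id = doc.get('id')
        if doc_id in seen_ids:
            problems.append(f'entry {i}: duplicate id {doc_id!r}')
        seen_ids.add(doc_id)
    return problems

sample = [
    {'id': 'doc1', 'title': 'Registration', 'content': 'Email verification is required.'},
    {'id': 'doc1', 'title': 'Search', 'content': ''},
]
print(validate_kb(sample))
# In the notebook: assert not validate_kb(KNOWLEDGE_BASE)
```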

In [None]:
# Cell 4: Prompts embedded as Python dict
PROMPTS = {
    "version": "1.0",
    "created": "2025-09-26",
    "author": "Joseph Chamoun",
    "base_retrieval_prompt": {
        "role": "You are a helpful Shoplite customer service assistant.",
        "goal": "Provide accurate answers using only the provided Shoplite documentation.",
        "context_guidelines": ["Use only information from the provided document snippets", "Cite specific documents when possible"],
        "response_format": "Answer: [Your response based on context]\nSources: [List document titles referenced]"
    },
    "multi_doc_synthesis": {
        "role": "You are an expert Shoplite support agent who synthesizes multiple documents.",
        "goal": "Combine information from multiple retrieved documents to create a concise, accurate answer.",
        "context_guidelines": ["State which documents you used", "When information conflicts, show both options and recommend the safer/default one"],
        "response_format": "Answer: [Synthesis]\nSources: [Doc titles]\nConfidence: [High|Medium|Low]"
    }
}
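
`build_prompt` in Cell 8 uses its own fixed wording, so the `PROMPTS` dictionary is not consumed anywhere in this notebook. If you want the templates to drive generation, one option is to flatten an entry into plain instruction text. A sketch under that assumption; `render_prompt` is hypothetical and the template below is an abbreviated copy of `base_retrieval_prompt`:

```python
from typing import Dict, List

def render_prompt(template: Dict, query: str, docs: List[Dict[str, str]]) -> str:
    """Flatten a role/goal/guidelines/format template plus retrieved docs into one prompt string."""
    guidelines = '\n'.join(f'- {g}' for g in template.get('context_guidelines', []))
    context = '\n\n'.join(f"Document: {d['title']}\n{d['content']}" for d in docs)
    return (
        f"{template['role']}\n{template['goal']}\n"
        f"Guidelines:\n{guidelines}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n\n"
        f"Respond in this format:\n{template['response_format']}"
    )

template = {
    'role': 'You are a helpful Shoplite customer service assistant.',
    'goal': 'Provide accurate answers using only the provided Shoplite documentation.',
    'context_guidelines': ['Use only information from the provided document snippets'],
    'response_format': 'Answer: [Your response based on context]\nSources: [List document titles referenced]',
}
docs = [{'title': 'Shoplite Return and Refund Policies',
         'content': 'Return Period: typically 30 days from delivery.'}]
print(render_prompt(template, 'How long do I have to return an item?', docs))
```

In the notebook this would be called as `render_prompt(PROMPTS['base_retrieval_prompt'], query, retrieved)`. A model the size of FLAN-T5 Large may not follow the requested `Answer:`/`Sources:` layout reliably, so the API should still attach sources itself, as `generate_response` does.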

In [None]:
# Cell 5: Build embeddings and FAISS index
EMBED_MODEL_NAME = 'sentence-transformers/all-MiniLM-L6-v2'
print('Loading embedding model:', EMBED_MODEL_NAME)
embed_model = SentenceTransformer(EMBED_MODEL_NAME)

DOCUMENT_TEXTS = [d['title'] + '\n\n' + d['content'] for d in KNOWLEDGE_BASE]
DOC_IDS = [d['id'] for d in KNOWLEDGE_BASE]

print('Encoding documents...')
doc_embeddings = embed_model.encode(DOCUMENT_TEXTS, convert_to_numpy=True, show_progress_bar=True)

def normalize_embeddings(embs):
    norms = np.linalg.norm(embs, axis=1, keepdims=True)
    return embs / np.clip(norms, a_min=1e-10, a_max=None)

doc_embeddings = normalize_embeddings(doc_embeddings)
d = doc_embeddings.shape[1]
index = faiss.IndexFlatIP(d)
index.add(doc_embeddings)
print(f'FAISS index created with {index.ntotal} vectors (dim={d})')
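
`IndexFlatIP` ranks by raw inner product; normalizing every vector to unit length first (as `normalize_embeddings` does) turns that inner product into cosine similarity. A NumPy-only check of the identity, using random vectors as stand-ins for the 384-dimensional MiniLM embeddings:

```python
import numpy as np

def normalize_embeddings(embs: np.ndarray) -> np.ndarray:
    # Same L2 normalization as Cell 5; the clip avoids division by zero for all-zero rows
    norms = np.linalg.norm(embs, axis=1, keepdims=True)
    return embs / np.clip(norms, a_min=1e-10, a_max=None)

rng = np.random.default_rng(0)
a = rng.normal(size=(1, 384)).astype(np.float32)
b = rng.normal(size=(1, 384)).astype(np.float32)

# Cosine similarity computed directly: a.b / (|a| |b|)
cosine = float((a @ b.T)[0, 0] / (np.linalg.norm(a) * np.linalg.norm(b)))
# Inner product of the normalized vectors, which is what IndexFlatIP computes
inner = float((normalize_embeddings(a) @ normalize_embeddings(b).T)[0, 0])

print(round(cosine, 5), round(inner, 5))  # the two values agree
```

This is why the scores printed by `retrieve_docs` stay roughly within [-1, 1]. FAISS also ships `faiss.normalize_L2`, which performs the same normalization in place on a float32 array.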

In [None]:
# Cell 6: Retrieval functions
def retrieve_docs(query: str, top_k: int = 3):
    q_emb = embed_model.encode([query], convert_to_numpy=True)
    q_emb = normalize_embeddings(q_emb)
    scores, indices = index.search(q_emb, top_k)
    results = []
    for score, idx in zip(scores[0], indices[0]):
        if idx < 0 or idx >= len(KNOWLEDGE_BASE):
            continue
        doc = KNOWLEDGE_BASE[idx]
        results.append({
            'id': doc['id'],
            'title': doc['title'],
            'content': doc['content'],
            'score': float(score)
        })
    return results

# Quick test
print('Testing retrieval:')
test_results = retrieve_docs('How do I create a seller account?', top_k=2)
for r in test_results:
    print(f"- {r['title']} (score: {r['score']:.3f})")
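
With only 15 documents, `IndexFlatIP` performs an exact brute-force search, so its ranking can be reproduced with plain NumPy. A sketch of the equivalent computation (`top_k_inner_product` is an illustrative name, not a FAISS function). Like FAISS, it pads the indices with `-1` when `top_k` exceeds the number of documents, which is the case `retrieve_docs` skips with its `idx < 0` check; the padded scores here are `-inf`:

```python
import numpy as np

def top_k_inner_product(doc_embs: np.ndarray, query_emb: np.ndarray, top_k: int):
    """Exact top-k by inner product, mirroring IndexFlatIP.search for a single query."""
    scores = doc_embs @ query_emb                     # one score per document
    k = min(top_k, len(scores))
    order = np.argsort(-scores, kind='stable')[:k]    # highest score first
    top_scores = scores[order]
    # Pad to top_k with -1 indices, the sentinel FAISS uses for missing results
    pad = top_k - k
    indices = np.concatenate([order, -np.ones(pad, dtype=np.int64)])
    padded_scores = np.concatenate([top_scores, np.full(pad, -np.inf)])
    return padded_scores, indices

docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]], dtype=np.float32)
query = np.array([0.8, 0.6], dtype=np.float32)
scores, idx = top_k_inner_product(docs, query, top_k=5)
print(idx)  # [ 2  0  1 -1 -1]
```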

In [None]:
# Cell 7: Model loading - FLAN-T5
MODEL_NAME = 'google/flan-t5-large'
print('Loading model:', MODEL_NAME)
use_cuda = torch.cuda.is_available()
print('CUDA available:', use_cuda)

try:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = T5ForConditionalGeneration.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch.float16 if use_cuda else torch.float32,
        device_map='auto' if use_cuda else None
    )
    model.eval()
    print('Model loaded successfully!')
except Exception as e:
    print('Model loading failed:', e)
    model = None
    tokenizer = None
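
The `torch_dtype` switch matters mostly for memory: weights take 2 bytes per parameter in `float16` versus 4 in `float32`. A back-of-the-envelope sketch using the ~770M parameter figure from the usage notes; it counts weights only, so activations, beam-search buffers, and CUDA overhead come on top:

```python
BYTES_PER_PARAM = {'float32': 4, 'float16': 2, 'bfloat16': 2}

def weight_bytes(num_params: int, dtype: str) -> int:
    """Approximate memory needed just to hold the model weights."""
    return num_params * BYTES_PER_PARAM[dtype]

for dtype in ('float32', 'float16'):
    gb = weight_bytes(770_000_000, dtype) / 1e9
    print(f'{dtype}: ~{gb:.2f} GB')
```

Both sizes fit on a 16 GB Colab T4, so `float16` mainly buys headroom and speed. T5-family models have been reported to produce NaN or degenerate output in `float16`; if GPU answers come back empty or garbled, loading in `float32` is a reasonable first thing to try.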

In [None]:
# Cell 8: Generation pipeline
def build_prompt(query: str, docs: List[Dict]):
    context = '\n\n'.join([f"Document: {d['title']}\n{d['content']}" for d in docs])
    return f"""Answer this question using only the provided Shoplite documentation.

Context:
{context}

Question: {query}

Answer:"""

def generate_response(query: str, top_k: int = 3):
    retrieved = retrieve_docs(query, top_k)
    
    # Retrieval-only fallback if Cell 7 could not load the model
    if model is None or tokenizer is None:
        answer = 'Model not loaded. Retrieved documents:\n'
        for d in retrieved:
            answer += f"- {d['title']}\n"
        return {'answer': answer, 'sources': [d['title'] for d in retrieved], 'confidence': 'Low'}
    
    try:
        prompt = build_prompt(query, retrieved)
        inputs = tokenizer(prompt, return_tensors='pt', max_length=1024, truncation=True)
        
        # Put the inputs on the model's device; device_map='auto' already placed the model,
        # so there is no need to call model.to('cuda') on every request
        inputs = {k: v.to(model.device) for k, v in inputs.items()}
        
        with torch.no_grad():
            # Deterministic beam search; temperature only applies with do_sample=True, so it is omitted
            outputs = model.generate(
                **inputs,
                max_new_tokens=256,
                num_beams=4,
                early_stopping=True
            )
        
        answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
        return {
            'answer': answer,
            'sources': [d['title'] for d in retrieved],
            'confidence': 'High' if retrieved and retrieved[0]['score'] > 0.7 else 'Medium'
        }
    except Exception as e:
        return {'answer': f'Error: {str(e)}', 'sources': [], 'confidence': 'Low'}
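
With `truncation=True`, the tokenizer cuts from the end of the prompt by default, which is exactly where the question and the `Answer:` cue sit. Three short documents fit comfortably in 1024 tokens, but longer documents or a larger `top_k` could silently drop the question. One workaround is to cap each document before assembling the prompt. A sketch that uses whitespace-separated words as a rough stand-in for tokens (an assumption; real T5 token counts will be higher); `build_prompt_with_budget` is hypothetical:

```python
from typing import Dict, List

def build_prompt_with_budget(query: str, docs: List[Dict[str, str]], words_per_doc: int = 150) -> str:
    """Same layout as build_prompt, but each document body is capped at words_per_doc words
    so the question at the end is less likely to be truncated away."""
    parts = []
    for d in docs:
        words = d['content'].split()
        body = ' '.join(words[:words_per_doc])
        if len(words) > words_per_doc:
            body += ' ...'  # mark that the document was shortened
        parts.append(f"Document: {d['title']}\n{body}")
    context = '\n\n'.join(parts)
    return ("Answer this question using only the provided Shoplite documentation.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer:")

docs = [{'title': 'Doc A', 'content': 'one two three four five'}]
print(build_prompt_with_budget('What is in Doc A?', docs, words_per_doc=3))
```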

In [None]:
# Cell 9: Flask API
app = Flask(__name__)

@app.route('/health', methods=['GET'])
def health():
    return jsonify({
        'status': 'ok',
        'model_loaded': model is not None,
        'num_docs': len(KNOWLEDGE_BASE)
    })

@app.route('/ping', methods=['POST'])
def ping():
    # silent=True returns None instead of raising when the body is missing or not JSON
    data = request.get_json(silent=True) or {}
    text = data.get('text', 'Hello')
    return jsonify({'reply': f'Echo: {text}'})

@app.route('/chat', methods=['POST'])
def chat():
    data = request.get_json(silent=True) or {}
    query = data.get('query')
    if not query:
        return jsonify({'error': 'missing query'}), 400
    
    result = generate_response(query)
    return jsonify(result)

def run_flask():
    app.run(host='0.0.0.0', port=5000, debug=False)

flask_thread = threading.Thread(target=run_flask, daemon=True)
flask_thread.start()
print('Flask server started on port 5000')
time.sleep(2)
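
Flask's built-in test client can exercise the routes in-process, without the background thread or ngrok. A self-contained sketch with a stub in place of `generate_response` (the stub and the `create_app` factory are assumptions for illustration), covering both the normal path and the 400 response for a missing `query`:

```python
from flask import Flask, jsonify, request

def create_app(answer_fn):
    """Build a minimal app with the same /chat contract as Cell 9; answer_fn stands in for generate_response."""
    app = Flask(__name__)

    @app.route('/chat', methods=['POST'])
    def chat():
        data = request.get_json(silent=True) or {}   # tolerate missing or invalid JSON bodies
        query = data.get('query')
        if not query:
            return jsonify({'error': 'missing query'}), 400
        return jsonify(answer_fn(query))

    return app

def stub_answer(query):
    return {'answer': f'stub answer for: {query}', 'sources': [], 'confidence': 'Low'}

client = create_app(stub_answer).test_client()
ok = client.post('/chat', json={'query': 'How do refunds work?'})
bad = client.post('/chat', json={})
print(ok.status_code, ok.get_json()['answer'])  # 200 stub answer for: How do refunds work?
print(bad.status_code, bad.get_json())          # 400 {'error': 'missing query'}
```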

In [None]:
# Cell 10: ngrok setup
print('=== NGROK SETUP ===')
ngrok_token = input('Enter your ngrok authtoken: ').strip()

if ngrok_token:
    try:
        ngrok.set_auth_token(ngrok_token)
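        tunnel = ngrok.connect(5000)
        # connect() returns an NgrokTunnel object; its public_url attribute holds the URL string
        public_url = tunnel.public_url
        print(f'Public URL: {public_url}')
        print(f'Health: {public_url}/health')
        print(f'Chat: {public_url}/chat')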
    except Exception as e:
        print('ngrok failed:', e)
        public_url = None
else:
    print('No token provided. Test locally at http://127.0.0.1:5000')
    public_url = None
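
The printed endpoint URLs are the tunnel's base URL with each route appended. A small standard-library sketch (`endpoint_urls` is a hypothetical helper) that works for both the public and local base URLs and tolerates a trailing slash; the host in the example is a placeholder, not a real tunnel:

```python
from typing import Dict, Iterable

def endpoint_urls(base_url: str, routes: Iterable[str] = ('health', 'ping', 'chat')) -> Dict[str, str]:
    """Map each route name to its full URL, avoiding double slashes."""
    base = base_url.rstrip('/')
    return {route: f'{base}/{route}' for route in routes}

print(endpoint_urls('https://abc123.example.com/')['chat'])  # https://abc123.example.com/chat
print(endpoint_urls('http://127.0.0.1:5000')['health'])      # http://127.0.0.1:5000/health
```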

In [None]:
# Cell 11: Testing
import requests

print('Testing local endpoints:')
try:
    # Health check
    r = requests.get('http://127.0.0.1:5000/health', timeout=5)
    print('Health:', r.json())
    
    # Chat test
    r = requests.post('http://127.0.0.1:5000/chat',
                      json={'query': 'How do I create a seller account?'},
                      timeout=120)  # beam search on CPU can take longer than 15 s
    result = r.json()
    print('Chat test:')
    print('Answer:', result.get('answer', '')[:200] + '...')
    print('Sources:', result.get('sources', []))
    
except Exception as e:
    print('Test failed:', e)

print('\nSystem ready!')

# Usage Notes

## API Endpoints
- `GET /health` - System status
- `POST /ping` - Simple test
- `POST /chat` - RAG queries
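
From outside the notebook, any HTTP client can call these endpoints. `/chat` expects a JSON body with a `query` field and replies with `answer`, `sources`, and usually `confidence`. A standard-library client sketch; `build_chat_request` and `ask` are hypothetical helpers, and the base URL is whatever Cell 10 printed:

```python
import json
import urllib.request

def build_chat_request(base_url: str, query: str) -> urllib.request.Request:
    """Prepare a POST /chat request with a JSON body."""
    body = json.dumps({'query': query}).encode('utf-8')
    return urllib.request.Request(
        base_url.rstrip('/') + '/chat',
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )

def ask(base_url: str, query: str, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON reply; generation on CPU can be slow."""
    with urllib.request.urlopen(build_chat_request(base_url, query), timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))

# Example (requires the Flask server from Cell 9 to be running):
# print(ask('http://127.0.0.1:5000', 'How long do refunds take?')['answer'])
```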

## Features
- FLAN-T5 Large (770M parameters)
- FAISS semantic search
- 15 Shoplite documents
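- No Hugging Face token required to download the model (the API itself has no authentication, so anyone with the ngrok URL can call it)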

## Troubleshooting
- If the model fails to load, `/chat` falls back to retrieval-only mode and returns the titles of the matching documents
- Free ngrok tokens work for testing
- Restart runtime if you get memory errors