<a href="https://colab.research.google.com/github/oriane-br/livedrop-oriane/blob/main/notebooks/llm-deployment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# ========== COMPLETE SHOPLITE RAG SYSTEM ==========
# Copy ALL this code into a SINGLE Colab cell and run it!

# Install all dependencies
!pip install -q transformers torch sentence-transformers faiss-cpu flask flask-ngrok pyngrok accelerate

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
from flask import Flask, request, jsonify
from flask_ngrok import run_with_ngrok
import time
from typing import List, Dict, Any
from pyngrok import ngrok

print("🚀 Starting Shoplite RAG System Deployment...")

# ========== KNOWLEDGE BASE ==========
KNOWLEDGE_BASE = [
    {
        "id": "doc1", "title": "Shoplite User Registration Process",
        "content": "To create a Shoplite account, users visit the registration page and provide their email address, password, and basic profile information. Email verification is required within 24 hours of registration. Users can choose between buyer accounts (free) or seller accounts (requires business verification). Buyer accounts gain immediate access to shopping features, while seller accounts undergo a 2-3 business day verification process. During registration, users must agree to Shoplite's terms of service and privacy policy. Password requirements include minimum 8 characters with at least one uppercase letter and one number. Users can later update their profile information through the account settings page. Two-factor authentication is available as an optional security feature. Account deletion can be requested through privacy settings with a 14-day recovery window."
    },
    {
        "id": "doc2", "title": "Product Search and Filtering Features",
        "content": "Shoplite's search functionality uses advanced algorithms to deliver relevant product results. The search bar accepts natural language queries and supports filters by price range, brand, seller rating, and product condition. Advanced filtering options include availability status, shipping speed, and eco-friendly certifications. The platform features faceted search navigation, allowing users to progressively refine results. Search results can be sorted by relevance, price, customer ratings, newest arrivals, and bestselling status. Recent search history is saved for logged-in users. Autocomplete suggestions appear as users type. Image search capability allows users to upload product photos to find similar items."
    },
    {
        "id": "doc3", "title": "Shopping Cart and Checkout Process",
        "content": "The Shoplite shopping cart supports items from multiple sellers in a single session. Users can adjust quantities, save items for later, or remove products easily. The cart displays real-time pricing including item subtotals, estimated taxes, and shipping costs. Proceeding to checkout requires users to select shipping addresses, choose delivery methods, and apply promotional codes. The checkout process is streamlined into three steps: shipping information, payment method, and order review. Guest checkout is available but limits order tracking capabilities. Cart abandonment features include saved carts for 30 days. Maximum cart capacity is 50 items, with bulk editing tools for managing large orders."
    },
    {
        "id": "doc4", "title": "Payment Methods and Security",
        "content": "Shoplite accepts major credit cards (Visa, MasterCard, American Express), PayPal, Apple Pay, Google Pay, and Shoplite Wallet credits. Regional payment methods include bank transfers for certain markets and installment plans. Security measures include PCI DSS compliance, tokenization of payment information, and 3D Secure authentication. All transactions are encrypted using TLS 1.3 protocols. Fraud detection systems monitor transactions for suspicious patterns. Payment processing typically takes 1-2 business days for verification. Failed payment attempts allow three retries before order cancellation. Refunds are processed to the original payment method within 5-7 business days."
    },
    {
        "id": "doc5", "title": "Order Tracking and Delivery",
        "content": "Once an order is confirmed, users receive an email with order details and estimated delivery timeline. Tracking numbers become active within 24 hours of shipment processing. The order tracking page provides real-time updates including package location, carrier information, and delivery exceptions. Delivery options include standard (3-7 business days), express (1-2 business days), and same-day delivery in eligible metropolitan areas. International orders may experience additional customs processing time. Failed delivery attempts result in packages being held at local carrier facilities. Delivery notifications are sent via email and SMS at key milestones."
    },
    {
        "id": "doc6", "title": "Return and Refund Policies",
        "content": "Shoplite offers a 30-day return window for most items from the delivery date. Products must be in original condition with all tags attached and packaging intact. Excluded categories include personalized items, digital products, and perishable goods. Return initiation requires authorization through the order history page. Approved returns generate prepaid shipping labels for most cases. Refunds are processed within 5-7 business days after returned items are received and inspected. The refund method matches the original payment type. Partial refunds may apply for returned items missing components or showing wear."
    },
    {
        "id": "doc7", "title": "Seller Account Setup and Management",
        "content": "Seller accounts on Shoplite require business verification including tax identification numbers and business registration documents. The verification process typically takes 2-3 business days. Sellers can list up to 100 products initially, with limits increasing based on sales performance. Seller dashboard provides analytics on sales performance, customer reviews, and inventory levels. Commission rates start at 8% for most categories, with reduced rates for high-volume sellers. Payouts are processed weekly via direct bank transfer or PayPal. Account suspension may occur for policy violations with appeal processes available."
    }
]

print(f"✅ Knowledge base loaded with {len(KNOWLEDGE_BASE)} documents")

# ========== PROMPTS ==========
PROMPTS = {
    "base_retrieval_prompt": {
        "role": "You are a precise Shoplite customer service assistant with access to official documentation.",
        "goal": "Provide accurate, concise answers using only the provided context from Shoplite documentation.",
        "context_guidelines": [
            "Use ONLY information from the provided document snippets",
            "If the context doesn't contain relevant information, politely decline to answer",
            "Cite specific document titles when referencing information",
            "Be factual and avoid speculation",
            "Maintain a helpful and professional tone"
        ],
        "response_format": "Answer: [Direct response based strictly on the provided context]\nSources: [List the specific document titles referenced]\nConfidence: [High/Medium/Low based on how well the context matches the question]"
    },
    "no_context_prompt": {
        "role": "You are a cautious Shoplite assistant who avoids providing unverified information.",
        "goal": "Politely decline to answer when the provided context doesn't contain relevant information.",
        "context_guidelines": [
            "Do not make up information or use external knowledge",
            "Clearly state that the information isn't available in Shoplite documentation",
            "Suggest alternative ways the user might find the information",
            "Maintain helpful tone while being firm about limitations"
        ],
        "response_format": "Response: I'm unable to answer this question based on the available Shoplite documentation.\nSuggestion: [Specific alternative action like contacting support or checking specific resources]\nAvailable Topics: [List 2-3 related topics that are covered in our knowledge base]"
    }
}

print("✅ Prompts loaded successfully")

# ========== TINYLLAMA MODEL ==========
print("🚀 Loading TinyLlama (1.1B parameters - No authentication required)...")

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

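tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# float16 is slow and poorly supported on CPU, so only use it when a GPU is present
use_cuda = torch.cuda.is_available()
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto" if use_cuda else None,
    torch_dtype=torch.float16 if use_cuda else torch.float32,
)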

print("✅ TinyLlama loaded successfully!")

# ========== RAG PIPELINE ==========
print("🔍 Setting up RAG pipeline...")

embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
document_contents = [doc["content"] for doc in KNOWLEDGE_BASE]
document_embeddings = embedding_model.encode(document_contents)

dimension = document_embeddings.shape[1]
index = faiss.IndexFlatIP(dimension)
faiss.normalize_L2(document_embeddings)
index.add(document_embeddings)

print(f"✅ FAISS index created with {index.ntotal} documents")

class RAGSystem:
    def __init__(self, knowledge_base, embedding_model, faiss_index, llm, tokenizer, prompts):
        self.knowledge_base = knowledge_base
        self.embedding_model = embedding_model
        self.faiss_index = faiss_index
        self.llm = llm
        self.tokenizer = tokenizer
        self.prompts = prompts

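    def retrieve_documents(self, query: str, k: int = 3, min_score: float = 0.3) -> List[Dict]:
        # min_score is an assumed, untuned cutoff: a flat index always returns k hits,
        # so without it the no_context_prompt branch in process_query can never run
        query_embedding = self.embedding_model.encode([query])
        faiss.normalize_L2(query_embedding)
        scores, indices = self.faiss_index.search(query_embedding, k)
        retrieved_docs = []
        for score, idx in zip(scores[0], indices[0]):
            # FAISS pads missing results with -1, which would silently index the last document
            if 0 <= idx < len(self.knowledge_base) and score >= min_score:
                retrieved_docs.append({**self.knowledge_base[idx], "similarity_score": float(score)})
        return retrieved_docs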

    def format_prompt(self, query: str, retrieved_docs: List[Dict], prompt_type: str = "base_retrieval_prompt") -> str:
        prompt_config = self.prompts[prompt_type]
        context = "\n\n".join([f"Document: {doc['title']}\nContent: {doc['content']}" for doc in retrieved_docs])
        prompt = f"""<|system|>
{prompt_config['role']}

GOAL: {prompt_config['goal']}

CONTEXT GUIDELINES:
{chr(10).join(['- ' + guideline for guideline in prompt_config['context_guidelines']])}

RESPONSE FORMAT:
{prompt_config['response_format']}

CONTEXT DOCUMENTS:
{context}</s>
<|user|>
Question: {query}</s>
<|assistant|>
"""
        return prompt

    def generate_response(self, prompt: str, max_tokens: int = 512) -> str:
        # Cap the prompt so prompt + new tokens fit TinyLlama's 2048-token context window.
        # Errors are no longer swallowed: they reach the Flask route (HTTP 500) and the smoke test.
        inputs = self.tokenizer(prompt, return_tensors="pt", truncation=True,
                                max_length=2048 - max_tokens)
        # Follow the model's device instead of hard-coding .cuda(), which fails on CPU runtimes
        inputs = inputs.to(self.llm.device)
        with torch.no_grad():
            outputs = self.llm.generate(
                **inputs,  # passes input_ids and attention_mask
                max_new_tokens=max_tokens,
                temperature=0.3,
                do_sample=True,
                pad_token_id=self.tokenizer.eos_token_id,
                eos_token_id=self.tokenizer.eos_token_id
            )
        # generate() echoes the prompt; decode only the newly generated tokens
        new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
        return self.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

    def process_query(self, query: str) -> Dict[str, Any]:
        retrieved_docs = self.retrieve_documents(query)
        if not retrieved_docs:
            prompt_type = "no_context_prompt"
            prompt = self.format_prompt(query, [], prompt_type)
        else:
            prompt_type = "base_retrieval_prompt"
            prompt = self.format_prompt(query, retrieved_docs, prompt_type)
        response = self.generate_response(prompt)
        return {
            "question": query,
            "response": response,
            "retrieved_documents": [{"title": doc["title"], "similarity_score": doc["similarity_score"]} for doc in retrieved_docs],
            "prompt_type_used": prompt_type,
            "timestamp": time.time()
        }

rag_system = RAGSystem(KNOWLEDGE_BASE, embedding_model, index, model, tokenizer, PROMPTS)
print("✅ RAG system initialized successfully!")

# ========== FLASK API ==========
app = Flask(__name__)

@app.route('/health', methods=['GET'])
def health_check():
    return jsonify({
        "status": "healthy",
        "timestamp": time.time(),
        "model_loaded": True,
        "knowledge_base_size": len(KNOWLEDGE_BASE)
    })

@app.route('/ping', methods=['POST'])
def direct_llm():
    try:
        # silent=True yields None instead of raising on a missing or non-JSON body
        data = request.get_json(silent=True) or {}
        prompt = data.get('prompt', '')
        if not prompt:
            return jsonify({"error": "No prompt provided"}), 400
        full_prompt = f"<|system|>\nYou are a helpful assistant.</s>\n<|user|>\n{prompt}</s>\n<|assistant|>\n"
        inputs = tokenizer(full_prompt, return_tensors="pt", truncation=True, max_length=2048 - 256)
        # Follow the model's device instead of hard-coding .cuda()
        inputs = inputs.to(model.device)
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=256,
                temperature=0.7,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id
            )
        # Decode only the continuation so the response does not echo the prompt
        response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()
        return jsonify({"prompt": prompt, "response": response, "type": "direct_llm"})
    except Exception as e:
        return jsonify({"error": str(e)}), 500

@app.route('/chat', methods=['POST'])
def rag_chat():
    try:
        data = request.get_json(silent=True) or {}
        question = data.get('question', '')
        if not question:
            return jsonify({"error": "No question provided"}), 400
        result = rag_system.process_query(question)
        return jsonify(result)
    except Exception as e:
        return jsonify({"error": str(e)}), 500

print("✅ Flask API routes defined!")

# ========== NGROK SETUP ==========
print("\n" + "="*60)
print("🔐 NGROK AUTHENTICATION")
print("="*60)
print("Get your FREE ngrok token from: https://dashboard.ngrok.com/get-started/your-authtoken")
print("="*60)

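from getpass import getpass

# getpass hides the token so it is not echoed into the saved notebook output
ngrok_token = getpass("Enter your ngrok token: ").strip()

if not ngrok_token:
    print("⚠️  No token entered: current ngrok agents require an authtoken, so the tunnel below will likely fail")
else:
    ngrok.set_auth_token(ngrok_token)
    print("✅ Ngrok token set successfully!")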

public_url = ngrok.connect(5000).public_url
print(f"✅ Ngrok tunnel created: {public_url}")

# ========== START SERVER ==========
print("\n" + "="*60)
print("🚀 SHOPLITE RAG API STARTING")
print("="*60)
print(f"📡 Public URL: {public_url}")
print("🔧 Available Endpoints:")
print("   - GET  /health - System health check")
print("   - POST /ping   - Direct LLM chat")
print("   - POST /chat   - RAG-powered questions")
print("="*60)
print("⚡ The server starts after the smoke test below. Use the URL above to test your API.")
print("💡 Interrupt the cell (stop button or Runtime > Interrupt execution) to stop the server")
print("="*60)

# Quick test
print("\n🧪 Running quick system test...")
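try:
    test_result = rag_system.process_query("How do I create a seller account?")
    # An error string returned as the response is a failure, not a success
    if test_result["response"].startswith("Error generating response"):
        raise RuntimeError(test_result["response"])
    print(f"✅ Test successful! Response: {test_result['response'][:100]}...")
except Exception as e:
    print(f"❌ Test failed: {e}")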

print("\n🎯 YOUR RAG SYSTEM IS READY FOR TESTING!")
print(f"🌐 Use this URL with your chat interface: {public_url}")

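# The pyngrok tunnel created above already forwards to port 5000; flask_ngrok's
# run_with_ngrok would launch a second ngrok process, so a plain app.run is enough
app.run(port=5000)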

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m31.4/31.4 MB[0m [31m32.9 MB/s[0m eta [36m0:00:00[0m
[?25h🚀 Starting Shoplite RAG System Deployment...
✅ Knowledge base loaded with 7 documents
✅ Prompts loaded successfully
🚀 Loading TinyLlama (1.1B parameters - No authentication required)...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


`torch_dtype` is deprecated! Use `dtype` instead!

✅ TinyLlama loaded successfully!
🔍 Setting up RAG pipeline...


✅ FAISS index created with 7 documents
✅ RAG system initialized successfully!
✅ Flask API routes defined!

🔐 NGROK AUTHENTICATION
Get your FREE ngrok token from: https://dashboard.ngrok.com/get-started/your-authtoken
Enter your ngrok token: [redacted]
✅ Ngrok token set successfully!
✅ Ngrok tunnel created: https://matilda-nonallelic-malaysia.ngrok-free.dev

🚀 SHOPLITE RAG API STARTING
📡 Public URL: https://matilda-nonallelic-malaysia.ngrok-free.dev
🔧 Available Endpoints:
   - GET  /health - System health check
   - POST /ping   - Direct LLM chat
   - POST /chat   - RAG-powered questions
⚡ Server is running! Use the URL above to test your API.
💡 Use Ctrl+C in Colab to stop the server

🧪 Running quick system test...
✅ Test successful! Response: Error generating response: Found no NVIDIA driver on your system. Please check that you have an NVID...

🎯 YOUR RAG SYSTEM IS READY FOR TESTING!
🌐 Use this URL with your chat interface: https://matilda-nonall

 * Running on http://127.0.0.1:5000
INFO:werkzeug:[33mPress CTRL+C to quit[0m


 * Running on http://matilda-nonallelic-malaysia.ngrok-free.dev
 * Traffic stats available on http://127.0.0.1:4040
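The retrieval step in the cell relies on a standard identity: once every vector is L2-normalized (what `faiss.normalize_L2` does in place), the inner product that `faiss.IndexFlatIP` ranks by equals cosine similarity. The pure-Python sketch below illustrates that exhaustive search on toy 2-D vectors. The function names are hypothetical, and the sketch only shows the math; it is not a replacement for FAISS.

```python
import math
from typing import List, Tuple


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k_cosine(query: List[float], docs: List[List[float]], k: int) -> List[Tuple[int, float]]:
    """Brute-force search: inner product of unit vectors, highest score first."""
    q = l2_normalize(query)
    scored = [
        (idx, sum(a * b for a, b in zip(q, l2_normalize(doc))))
        for idx, doc in enumerate(docs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings": the query points the same way as doc 0 and at 45 degrees to doc 2
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k_cosine([3.0, 0.0], docs, k=2))
```

Because every score is a cosine, it lies in [-1, 1] regardless of vector magnitude, which is what makes a fixed relevance cutoff meaningful at all.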
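A flat index returns `k` hits for every query, so the `no_context_prompt` branch in `process_query` is only reachable if low-scoring hits are filtered out first. Below is a minimal sketch of that routing decision, assuming hits shaped like the dictionaries `retrieve_documents` builds. `route_query` and `MIN_SIMILARITY = 0.3` are hypothetical; the cutoff is not a value from the notebook and would need tuning on real questions, and the scores in the usage lines are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical relevance cutoff for cosine scores; tune it on real user questions
MIN_SIMILARITY = 0.3


def route_query(hits: List[Dict], min_score: float = MIN_SIMILARITY) -> Tuple[str, List[Dict]]:
    """Pick a prompt type and keep only hits whose similarity clears the cutoff."""
    relevant = [hit for hit in hits if hit["similarity_score"] >= min_score]
    if not relevant:
        return "no_context_prompt", []
    return "base_retrieval_prompt", relevant


# Illustrative scores only, not measured from the notebook
hits = [
    {"title": "Return and Refund Policies", "similarity_score": 0.62},
    {"title": "Payment Methods and Security", "similarity_score": 0.18},
]
prompt_type, kept = route_query(hits)
print(prompt_type, [hit["title"] for hit in kept])
```

Dropping weak hits also keeps unrelated documents out of the context, which matters for a 1.1B model that tends to mix every snippet it is given into the answer.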
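`format_prompt` and `/ping` hand-write TinyLlama's Zephyr-style chat markup: `<|system|>`, `<|user|>` and `<|assistant|>` turns, each closed by the `</s>` end-of-sequence token. A small renderer keeps that layout identical in both places. This sketch approximates what `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` produces for this model; treat the model's own template as the source of truth, and `render_zephyr_chat` as a hypothetical helper.

```python
from typing import Dict, List


def render_zephyr_chat(messages: List[Dict[str, str]], eos: str = "</s>",
                       add_generation_prompt: bool = True) -> str:
    """Render role/content messages as Zephyr-style turns, each ending in eos."""
    allowed = {"system", "user", "assistant"}
    parts = []
    for message in messages:
        if message["role"] not in allowed:
            raise ValueError(f"unsupported role: {message['role']}")
        parts.append(f"<|{message['role']}|>\n{message['content']}{eos}\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its answer
        parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = render_zephyr_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How long is the return window?"},
])
print(prompt)
```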
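Both POST routes read a single string field from the JSON body. A missing or malformed body makes `request.get_json()` raise, and the broad `except` turns that into a generic 500 instead of a 400. Below is a framework-independent sketch of the validation, assuming the body has already been parsed (for example with `request.get_json(silent=True)`, which returns `None` instead of raising). `extract_text_field` is a hypothetical helper.

```python
from typing import Any, Optional, Tuple


def extract_text_field(payload: Any, field: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (value, error_message); exactly one of the two is None."""
    if not isinstance(payload, dict):
        return None, "Request body must be a JSON object"
    value = payload.get(field)
    if not isinstance(value, str) or not value.strip():
        return None, f"No {field} provided"
    return value.strip(), None


print(extract_text_field({"question": "  How do refunds work?  "}, "question"))
print(extract_text_field(None, "question"))
print(extract_text_field(["question"], "question"))
```

Inside a route, a non-`None` error would map to `jsonify({"error": error}), 400` before any model call, so bad requests never reach the GPU.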
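To exercise the deployed `/chat` endpoint from another machine, a client only needs the public URL printed by the cell. Below is a stdlib-only sketch that builds the POST request. `build_chat_request` and `send` are hypothetical helpers, the base URL in the usage line is a placeholder rather than the tunnel from the log, and nothing is sent until `urlopen` runs.

```python
import json
import urllib.request


def build_chat_request(base_url: str, question: str) -> urllib.request.Request:
    """Build a JSON POST request for the /chat endpoint (not sent yet)."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON reply; generation on CPU can be slow."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Placeholder URL: substitute the public URL printed by the cell
req = build_chat_request("https://example.com/", "What is the return window?")
print(req.get_method(), req.full_url)
```

Calling `send(req)` against the live tunnel should return the dictionary `process_query` builds (`question`, `response`, `retrieved_documents`, `prompt_type_used`, `timestamp`); a 4xx or 5xx reply raises `urllib.error.HTTPError` instead.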
