
# AI Support-Chat Demo – Complete Setup Guide

> **Goal:** Ingest *Supportco Manual.pdf* → build a vector DB → launch a **live AI chat GUI** that answers questions using semantic search.

---

## 1. System Prerequisites (run once per laptop)

```bash
# Homebrew (if you don’t have it)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Docker Desktop
brew install --cask docker
# → Open Docker.app, let it finish setup, and keep it running

# Python 3.11
brew install python@3.11
python3.11 --version   # should show 3.11.x
```

---

## 2. Project folder

```bash
mkdir -p ~/VectorDB && cd ~/VectorDB
```

Place these files in the folder:

* `build_vector_db.py`  
* `support_chat.py`  
* `support_docs/Supportco Manual.pdf` (or any PDFs/TXTs)
* Any other `.pdf`, `.txt`, or `.md` files placed in `support_docs/` are loaded into the database automatically
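
As a quick pre-flight check, the sketch below lists which files under `support_docs/` would be picked up. It assumes ingestion keys on the `.pdf`, `.txt`, and `.md` extensions, as `build_vector_db.py` does; the helper name `find_supported_files` is illustrative and not part of the project.

```python
from pathlib import Path

# Extensions the build script is assumed to ingest
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md"}


def find_supported_files(data_dir: str = "support_docs") -> list[Path]:
    """Return every supported file under data_dir (subfolders included), sorted by path."""
    root = Path(data_dir)
    if not root.is_dir():
        return []  # nothing to ingest if the folder is missing
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Calling `find_supported_files()` from the project folder and printing the result is a cheap way to confirm the documents are where the build script expects them.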

---

## 3. Create & activate the virtual environment

```bash
python3.11 -m venv venv
chmod +x venv/bin/activate      # fix permission if needed
source venv/bin/activate
```

*Verify*  

```bash
which python   # → …/VectorDB/venv/bin/python
pip --version
```

---

## 4. Install **all** Python dependencies (once)

```bash
pip install --upgrade pip

pip install \
  langchain \
  langchain-core \
  langchain-community \
  langchain-text-splitters \
  sentence-transformers \
  qdrant-client \
  openai \
  python-dotenv \
  PyPDF2 \
  python-docx \
  "unstructured[all-docs]" \
  markdown \
  "numpy<2" \
  ipywidgets
```

*Quick sanity-check*

```python
from langchain_core.documents import Document
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
print("All imports OK")
```

---

## 5. Start Qdrant (Docker)

```bash
mkdir -p qdrant_storage

docker run -d \
  --name qdrant_demo \
  -p 6333:6333 -p 6334:6334 \
  -v "$(pwd)/qdrant_storage:/qdrant/storage" \
  qdrant/qdrant
```

*Verify*

```bash
curl http://localhost:6333
# → JSON with "title":"qdrant..."
```
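
The container can take a moment to start accepting connections, so it may help to poll until Qdrant answers before running the build script. This is a minimal sketch using only the standard library; the function name `wait_for_qdrant` and the timeout values are illustrative assumptions, and it only checks that the root URL from the `curl` command above returns JSON.

```python
import json
import time
import urllib.request


def wait_for_qdrant(url: str = "http://localhost:6333",
                    timeout_s: float = 30.0,
                    interval_s: float = 1.0) -> bool:
    """Poll `url` until it returns parseable JSON or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                json.load(resp)  # the root endpoint is expected to return JSON
                return True
        except (OSError, ValueError):
            # Connection refused, timed out, or non-JSON body: not ready yet
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

`wait_for_qdrant()` returns `True` as soon as the server responds and `False` if it never does within the timeout.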

---

## 6. Build the vector DB (run **once**, or again after changing the documents)

```bash
python build_vector_db.py
```

Expected output (abridged):

```
Collection 'support_docs' created.
  → Upserted batch 1
Vector DB built successfully!
SUCCESS! Vector DB is ready!
```

---

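## 7. Launch the GUI chat

Before launching, create a `.env` file in the project folder containing `OPENAI_API_KEY=sk-...` (or export the variable in your shell). `support_chat.py` exits with an error if the key is missing.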

```bash
python support_chat.py
```

A window titled **“Supportco AI Support Assistant (OpenAI + RAG)”** opens.  
Try:  
> `How do I reset my password?`

→ The answer appears once retrieval and the OpenAI call have finished.

---

## 8. (Optional) Clean shutdown

```bash
docker stop qdrant_demo && docker rm qdrant_demo
deactivate   # leave the venv
```

---

## Full Dependency List

| Category | Package | Reason |
|----------|---------|--------|
| **Core** | `python@3.11` (brew) | Interpreter |
| **System** | `docker` (brew cask) | Run Qdrant |
| **LangChain** | `langchain`, `langchain-core`, `langchain-community`, `langchain-text-splitters` | RAG pipeline |
| **Embeddings** | `sentence-transformers` | `all-MiniLM-L6-v2` |
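| **Vector DB** | `qdrant-client` | HTTP client |
| **LLM** | `openai` | Answer generation |
| **Config** | `python-dotenv` | Loads `.env` (API key) |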
| **Docs** | `PyPDF2`, `python-docx`, `unstructured[all-docs]`, `markdown` | PDF/DOCX/MD/TXT |
| **Numerics** | `numpy<2` | Compatibility |
| **Widgets** | `ipywidgets` | Jupyter (optional) |
| **GUI** | `tkinter` (built-in) | Desktop window |

*One-liner (after `source venv/bin/activate`)*

```bash
pip install --upgrade pip && \
pip install langchain langchain-core langchain-community langchain-text-splitters \
            sentence-transformers qdrant-client openai python-dotenv \
            PyPDF2 python-docx "unstructured[all-docs]" markdown \
            "numpy<2" ipywidgets
```

---




## Annotated scripts for the class demo

These versions of `build_vector_db.py` and `support_chat.py` have each section of the code clearly delineated for a class demo.


*`build_vector_db.py`*

```python
# ==============================================================
# build_vector_db.py - FINAL WORKING VERSION
# PIPELINE: 0 → Ingest | 1 → Chunk | 2 → Embed | 3 → Index
# ==============================================================

from pathlib import Path
from typing import List

from langchain_core.documents import Document
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain_community.document_loaders import UnstructuredMarkdownLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.http.models import (
    Distance,
    VectorParams,
    HnswConfigDiff,
    PointStruct,
)

# CONFIG
DATA_DIR = "support_docs"              # folder scanned for source documents
COLLECTION_NAME = "support_docs"       # Qdrant collection to (re)create
EMBEDDING_MODEL = "all-MiniLM-L6-v2"   # produces 384-dim vectors
CHUNK_SIZE = 500                       # characters per chunk
CHUNK_OVERLAP = 50                     # characters shared by adjacent chunks
QDRANT_URL = "http://localhost:6333"
```


```python
# ==============================================================
# STEP 0: INGEST RAW DOCUMENTS
# Load PDFs, .txt, .md from support_docs/
# ==============================================================
def load_documents() -> List[Document]:
    print("\n" + "="*60)
    print("STEP 0: INGEST RAW DOCUMENTS")
    print("="*60)
    data_dir = Path(DATA_DIR)
    if not data_dir.exists():
        raise FileNotFoundError(f"Create '{DATA_DIR}' folder with your files.")
    # Map each supported extension to the loader class that handles it
    loaders = {".pdf": PyPDFLoader, ".txt": TextLoader, ".md": UnstructuredMarkdownLoader}
    docs = []
    print("Loading documents...")
    for file_path in data_dir.rglob("*"):  # walks subfolders too
        ext = file_path.suffix.lower()
        if ext in loaders:
            print(f"  → {file_path.name}")
            for doc in loaders[ext](str(file_path)).load():
                # Record the file name and its parent folder for later citation
                doc.metadata.update({"source": file_path.name, "category": file_path.parent.name})
                docs.append(doc)
    print(f"Loaded {len(docs)} sections.")
    return docs
```


```python
# ==============================================================
# STEP 1: CHUNKING (content-aware)
# Split long docs into 500-char chunks with a 50-char overlap
# ==============================================================
def chunk_documents(docs: List[Document]) -> List[Document]:
    print("\n" + "="*60)
    print("STEP 1: CHUNKING")
    print("="*60)
    splitter = RecursiveCharacterTextSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)
    chunks = splitter.split_documents(docs)
    print(f"Created {len(chunks)} chunks.")
    return chunks
```
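
To make the effect of `CHUNK_SIZE` and `CHUNK_OVERLAP` concrete, here is a simplified fixed-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as blank lines, newlines, and spaces); it only illustrates how a 500-character window with a 50-character overlap walks through a text. The function name `sliding_window_chunks` is hypothetical.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Split text into fixed-size windows where each window repeats the tail of the previous one."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print([len(c) for c in sliding_window_chunks("x" * 1200)])  # [500, 500, 300]
```

The overlap means the last 50 characters of one chunk reappear at the start of the next, so a sentence cut at a boundary is still fully present in at least one chunk in most cases.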


```python
# ==============================================================
# STEP 2: EMBEDDING
# Convert each chunk into a 384-dim vector using all-MiniLM-L6-v2
# ==============================================================
def embed_chunks(chunks: List[Document]) -> List[dict]:
    print("\n" + "="*60)
    print("STEP 2: EMBEDDING")
    print("="*60)
    model = SentenceTransformer(EMBEDDING_MODEL)
    embedded = []
    print("Generating embeddings...")
    for i, c in enumerate(chunks):
        # encode() returns a NumPy array; Qdrant expects a plain list of floats
        vec = model.encode(c.page_content).tolist()
        embedded.append({
            "id": i,
            "vector": vec,
            "payload": {
                "text": c.page_content,
                "source": c.metadata.get("source")
            }
        })
        if i % 100 == 0 and i > 0:
            print(f"  → Embedded {i}/{len(chunks)} chunks")
    print(f"Embedded {len(embedded)} vectors.")
    return embedded
```


```python
# ==============================================================
# STEP 3: INDEXING & STORAGE IN QDRANT
# Create collection + upsert all vectors
# ==============================================================
def build_qdrant_index(points: List[dict]):
    print("\n" + "="*60)
    print("STEP 3: INDEXING & STORAGE IN QDRANT")
    print("="*60)
    if not points:
        # points[0] below would fail with an unhelpful IndexError
        raise ValueError(f"No chunks to index; check that '{DATA_DIR}' contains readable files.")
    client = QdrantClient(url=QDRANT_URL)
    # Start from a clean collection so re-runs do not leave stale points behind
    if client.collection_exists(COLLECTION_NAME):
        client.delete_collection(COLLECTION_NAME)
        print(f"Deleted existing collection: {COLLECTION_NAME}")

    client.create_collection(
        collection_name=COLLECTION_NAME,
        # Vector size is taken from the first embedding (384 for all-MiniLM-L6-v2)
        vectors_config=VectorParams(size=len(points[0]["vector"]), distance=Distance.COSINE),
        hnsw_config=HnswConfigDiff(
            m=16,
            ef_construct=200,
            full_scan_threshold=10000,
        ),
    )
    print(f"Collection '{COLLECTION_NAME}' created.")

    BATCH = 100
    for i in range(0, len(points), BATCH):
        batch = points[i:i+BATCH]
        client.upsert(
            collection_name=COLLECTION_NAME,
            points=[PointStruct(id=p["id"], vector=p["vector"], payload=p["payload"]) for p in batch]
        )
        print(f"  → Upserted batch {i//BATCH + 1}")
    print("Vector DB built successfully!")
```
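
The batching loop above is a general pattern: slice the list in fixed-size steps so each request stays small. The standalone sketch below, with the hypothetical helper name `batched_slices`, shows why 103 points with a batch size of 100 produce two upserts.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched_slices(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


sizes = [len(batch) for batch in batched_slices(list(range(103)), 100)]
print(sizes)  # [100, 3]
```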


```python
# ==============================================================
# MAIN: Run all steps
# ==============================================================
if __name__ == "__main__":
    print("=== BUILDING VECTOR DATABASE ===\n")
    docs = load_documents()
    chunks = chunk_documents(docs)
    points = embed_chunks(chunks)
    build_qdrant_index(points)
    # Marker file that support_chat.py checks before opening the GUI
    Path("vector_db_built.flag").touch()
    print("\n" + "="*60)
    print("SUCCESS! Vector DB is ready!")
    print("Next: Run 'python support_chat.py' for the AI chat interface.")
    print("="*60)
```

Output:

```
=== BUILDING VECTOR DATABASE ===

STEP 0: INGEST RAW DOCUMENTS
Loading documents...
  → SupportCo Online Support Personnel Instruction Manual.pdf
Loaded 31 sections.

STEP 1: CHUNKING
Created 103 chunks.

STEP 2: EMBEDDING
Generating embeddings...
  → Embedded 100/103 chunks
Embedded 103 vectors.

STEP 3: INDEXING & STORAGE IN QDRANT
Deleted existing collection: support_docs
Collection 'support_docs' created.
  → Upserted batch 1
  → Upserted batch 2
Vector DB built successfully!

SUCCESS! Vector DB is ready!
Next: Run 'python support_chat.py' for the AI chat interface.
```


*`support_chat.py`*

```python
# ==============================================================
# support_chat.py - AI SUPPORT CHAT GUI with OpenAI GPT
# Uses .env for API key | Modern OpenAI v1.0+ API
# Answers questions using your PDF + GPT-3.5/GPT-4
# Requires: vector DB built + Qdrant running
# ==============================================================

import tkinter as tk
from tkinter import scrolledtext, messagebox
from typing import List
import os
from pathlib import Path
from dotenv import load_dotenv  # ← Loads .env file

# ================================
# LOAD ENVIRONMENT VARIABLES
# ================================
load_dotenv()  # Reads .env file in project root

# Fail fast with setup instructions if the key is missing
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    raise ValueError(
        "OPENAI_API_KEY not found!\n"
        "Create a .env file in your project folder with:\n"
        "OPENAI_API_KEY=sk-...\n"
        "Or run: export OPENAI_API_KEY='sk-...'"
    )
```
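
For reference, a `.env` file is plain text with one `KEY=value` pair per line. The sketch below is a deliberately simplified stand-in for what `load_dotenv()` reads, not python-dotenv's actual implementation: it skips blank lines and `#` comments and strips one pair of surrounding quotes. The function name `parse_env_text` and the placeholder key value are hypothetical.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=value lines into a dict, ignoring comments and blank lines."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


sample = "# OpenAI credentials\nOPENAI_API_KEY='sk-placeholder'\n"
print(parse_env_text(sample))  # {'OPENAI_API_KEY': 'sk-placeholder'}
```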

```python
# ================================
# CONFIG
# ================================
EMBEDDING_MODEL = "all-MiniLM-L6-v2"  # must match the model used in build_vector_db.py
QDRANT_URL = "http://localhost:6333"
COLLECTION_NAME = "support_docs"
TOP_K = 3  # number of chunks retrieved per question
OPENAI_MODEL = "gpt-5-chat-latest"  # or "gpt-3.5-turbo", or "gpt-4o" for better quality

# ================================
# INITIALIZE CLIENTS
# ================================
print("\n" + "="*60)
print("INITIALIZING AI SUPPORT CHAT WITH OPENAI")
print("="*60)

from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
import openai  # module import needed for openai.AuthenticationError / openai.RateLimitError
from openai import OpenAI  # ← Modern OpenAI v1.0+

print("Loading embedding model...")
model = SentenceTransformer(EMBEDDING_MODEL)
client = QdrantClient(url=QDRANT_URL)
openai_client = OpenAI(api_key=OPENAI_API_KEY)  # ← Secure client
```


```python
# ================================
# STEP 2: EMBED USER QUERY
# ================================
def embed_query(query: str):
    print("\n Embedding user query...")
    # Same model as the build step, so query and chunk vectors are comparable
    query_vec = model.encode(query).tolist()
    print(f"  → Query vector generated (dim={len(query_vec)})")
    return query_vec
```


```python
# ================================
# STEP 3: SEARCH QDRANT
# ================================
def search_qdrant(query: str) -> List[str]:
    print("Searching Qdrant vector database...")
    query_vec = embed_query(query)

    # query_points() replaces client.search(), which is deprecated in recent qdrant-client releases
    search_result = client.query_points(
        collection_name=COLLECTION_NAME,
        query=query_vec,
        limit=TOP_K,
        with_payload=True,
    ).points

    contexts = []
    print(f"  → Found {len(search_result)} relevant chunks:")
    for i, hit in enumerate(search_result):
        text = hit.payload.get("text", "")
        source = hit.payload.get("source", "Unknown")
        score = hit.score
        print(f"     [{i+1}] Score: {score:.3f} | Source: {source}")
        contexts.append(f"[From: {source}]\n{text}\n")

    return contexts
```
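
Conceptually, the search step ranks stored vectors by cosine similarity to the query vector and keeps the best `TOP_K`. On larger collections Qdrant can use an approximate HNSW index instead of a full scan, so the brute-force sketch below only illustrates the scoring. `cosine_similarity` and `top_k` are hypothetical helper names, and the three-dimensional vectors are toy data rather than real embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k stored vectors most similar to query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
best = top_k([1.0, 0.0, 0.0], stored, k=2)
print([index for index, _ in best])  # [0, 2]
```

A score of 1.0 means the vectors point in the same direction, and values near 0 mean they are unrelated.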


```python
# ================================
# STEP 4: GENERATE ANSWER WITH OPENAI (v1.0+)
# ================================
def generate_answer(query: str, contexts: List[str]) -> str:
    print("\nGenerating answer with OpenAI GPT...")

    if not contexts:
        return "I'm sorry, I couldn't find any information about that in the support manual."

    context_str = "\n\n".join(contexts)

    system_prompt = f"""
You are a helpful, accurate, and friendly support assistant for Supportco.
Your knowledge comes ONLY from the Supportco Manual provided below.

RULES:
1. Answer the user's ORIGINAL QUESTION using ONLY the provided context.
2. Use the original question to guide your tone, focus, and relevance.
3. If the answer is not in the context, say: "I don't have that information in the manual."
4. Be concise, clear, and step-by-step.
5. Cite the source (e.g., "From: Supportco Manual.pdf") when possible.
6. Never guess or make up information.
7. Always use normal and polite English. Avoid technical terms, and spell out abbreviations in plain English.
8. Do not ask users to perform additional steps unless you can describe those steps from the provided context.

CONTEXT FROM SUPPORTCO MANUAL:
{context_str.strip()}
"""

    try:
        response = openai_client.chat.completions.create(
            model=OPENAI_MODEL,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": query}  # ← USER QUESTION
            ],
            temperature=0.3,
            max_tokens=300
        )
        # content can be None, so fall back to an empty string before stripping
        answer = (response.choices[0].message.content or "").strip()
        return answer

    except openai.AuthenticationError:
        return "OpenAI API key is invalid. Check your .env file."
    except openai.RateLimitError:
        return "OpenAI rate limit reached. Try again in a moment."
    except Exception as e:
        return f"Error with OpenAI: {str(e)}"
```
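
The retrieved chunks are pasted into the system prompt verbatim, so very long chunks inflate the prompt. One possible refinement, sketched here as an assumption rather than as part of the script, is to cap the context at a character budget before building the `messages` list. `build_messages`, `max_context_chars`, and the shortened prompt text are all hypothetical.

```python
from typing import Dict, List


def build_messages(query: str, contexts: List[str],
                   max_context_chars: int = 4000) -> List[Dict[str, str]]:
    """Build chat messages, keeping whole chunks until the character budget is used up."""
    kept: List[str] = []
    used = 0
    for chunk in contexts:
        if used + len(chunk) > max_context_chars:
            break  # chunks arrive best-first, so drop the lowest-ranked ones
        kept.append(chunk)
        used += len(chunk)
    system_prompt = (
        "Answer using ONLY the context below.\n\n"
        "CONTEXT:\n" + "\n\n".join(kept)
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query},
    ]
```

Keeping the user question in its own `user` message, as the script does, means the budget only ever trims context and never the question itself.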


```python
# ================================
# GUI: Support Chat Interface
# ================================
class SupportChatGUI:
    def __init__(self, root):
        self.root = root
        self.root.title("Supportco AI Support Assistant (OpenAI + RAG)")
        self.root.geometry("850x650")
        self.root.configure(bg="#f0f2f5")

        # Header
        header = tk.Label(
            root,
            text="Supportco AI Assistant",
            font=("Helvetica", 16, "bold"),
            bg="#f0f2f5",
            fg="#1a1a1a"
        )
        header.pack(pady=10)

        # Chat display (read-only; add_message enables it briefly to append text)
        self.chat_display = scrolledtext.ScrolledText(
            root,
            wrap=tk.WORD,
            width=90,
            height=28,
            font=("Helvetica", 11),
            bg="white",
            fg="#1a1a1a",
            state=tk.DISABLED
        )
        self.chat_display.pack(padx=20, pady=10, fill=tk.BOTH, expand=True)

        # Input frame
        input_frame = tk.Frame(root, bg="#f0f2f5")
        input_frame.pack(padx=20, pady=10, fill=tk.X)

        self.entry = tk.Entry(
            input_frame,
            font=("Helvetica", 12),
            relief=tk.FLAT,
            bg="white",
            fg="#1a1a1a"
        )
        self.entry.pack(side=tk.LEFT, fill=tk.X, expand=True, padx=(0, 10))
        self.entry.bind("<Return>", self.send_message)

        send_btn = tk.Button(
            input_frame,
            text="Send",
            command=self.send_message,
            bg="#007bff",
            fg="white",
            font=("Helvetica", 11, "bold"),
            relief=tk.FLAT,
            cursor="hand2"
        )
        send_btn.pack(side=tk.RIGHT)

        # Welcome
        self.add_message("Assistant", "Hello! I'm your AI support assistant. Ask me anything about Supportco!")

    def add_message(self, sender: str, message: str):
        self.chat_display.config(state=tk.NORMAL)
        self.chat_display.insert(tk.END, f"{sender}: {message}\n\n")
        self.chat_display.config(state=tk.DISABLED)
        self.chat_display.see(tk.END)

    def send_message(self, event=None):
        query = self.entry.get().strip()
        if not query:
            return

        self.add_message("You", query)
        self.entry.delete(0, tk.END)
        self.add_message("Assistant", "Searching manual and generating answer...")
        # Repaint now so the status line is visible before the blocking calls below
        self.root.update_idletasks()

        try:
            contexts = search_qdrant(query)
            answer = generate_answer(query, contexts)
            self.chat_display.after(100, lambda: self.add_message("Assistant", answer))
        except Exception as e:
            self.add_message("Assistant", f"Error: {e}")
```
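
`send_message` runs the search and the OpenAI call on Tkinter's main thread, so the window cannot respond to input until the answer returns. A common way around this is to run the slow work in a background thread and hand the result back through a queue that the GUI polls. The sketch below shows only the thread-and-queue half, without any Tkinter code; `slow_answer` is a placeholder for the real retrieval and generation calls, and `start_worker` is a hypothetical name.

```python
import queue
import threading
import time


def slow_answer(query: str) -> str:
    """Placeholder for search_qdrant() followed by generate_answer()."""
    time.sleep(0.1)
    return f"Answer to: {query}"


def start_worker(query: str, results: "queue.Queue[str]") -> threading.Thread:
    """Run slow_answer in a daemon thread and put its result (or error text) on the queue."""
    def run() -> None:
        try:
            results.put(slow_answer(query))
        except Exception as exc:  # report failures instead of losing them
            results.put(f"Error: {exc}")

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


results: "queue.Queue[str]" = queue.Queue()
start_worker("How do I reset my password?", results)
# A GUI would poll with results.get_nowait() from a timer callback;
# this demo simply blocks until the worker finishes.
print(results.get(timeout=5))
```

In the GUI, a method scheduled with `root.after(100, ...)` could call `results.get_nowait()`, catch `queue.Empty`, and reschedule itself until an answer arrives, so that only the main thread ever touches the widgets.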


```python
# ================================
# LAUNCH GUI
# ================================
if __name__ == "__main__":
    root = tk.Tk()
    if not Path("vector_db_built.flag").exists():
        root.withdraw()  # hide the empty main window behind the error dialog
        messagebox.showerror(
            "Vector DB Not Found",
            "Please run build_vector_db.py first to create the vector database!"
        )
        root.destroy()
    else:
        app = SupportChatGUI(root)
        root.mainloop()
```


Output:

```
INITIALIZING AI SUPPORT CHAT WITH OPENAI
Loading embedding model...
Searching Qdrant vector database...

 Embedding user query...
  → Query vector generated (dim=384)
  → Found 3 relevant chunks:
     [1] Score: 0.413 | Source: SupportCo Online Support Personnel Instruction Manual.pdf
     [2] Score: 0.400 | Source: SupportCo Online Support Personnel Instruction Manual.pdf
     [3] Score: 0.352 | Source: SupportCo Online Support Personnel Instruction Manual.pdf

Generating answer with OpenAI GPT...

Searching Qdrant vector database...

 Embedding user query...
  → Query vector generated (dim=384)
  → Found 3 relevant chunks:
     [1] Score: 0.253 | Source: SupportCo Online Support Personnel Instruction Manual.pdf
     [2] Score: 0.244 | Source: SupportCo Online Support Personnel Instruction Manual.pdf
     [3] Score: 0.231 | Source: SupportCo Online Support Personnel Instruction Manual.pdf

Generating answer with OpenAI GPT...
```
