#  RAG Chatbot in Jupyter using DeepSeek API
This notebook implements a Retrieval-Augmented Generation (RAG) chatbot using your own `.pdf`, `.docx`, or `.txt` files.  
It uses:
- FAISS for semantic search
- sentence-transformers` for local embeddings
- DeepSeek Chat API for final answers


In [None]:
# pip install faiss-cpu tiktoken openai python-docx pdfplumber PyMuPDF sentence-transformers --quiet

In [None]:
# pip install openai

## Step 2: Import Libraries and Set API Key
We import all the core libraries and set your DeepSeek API key for use in later steps.


In [1]:
import os
import fitz  # PyMuPDF
import docx
import faiss
import numpy as np
import tiktoken
import requests
from typing import List
from IPython.display import display, Markdown
from sklearn.preprocessing import normalize
from sentence_transformers import SentenceTransformer

# Set your DeepSeek API Key
os.environ["DEEPSEEK_API_KEY"] = "ENTER_YOUR_DEEPSEEK_API_KEY"


## 📄 Step 3: Load a Text Document
We support `.pdf`, `.docx`, and `.txt`.  
The file content will be extracted as plain text for chunking and embedding.


In [11]:
from tkinter.filedialog import askopenfilename

# Function to extract text from supported file types
def extract_text_from_file(filepath: str) -> str:
    if filepath.endswith(".pdf"):
        doc = fitz.open(filepath)
        return "\n".join([page.get_text() for page in doc])
    elif filepath.endswith(".docx"):
        doc = docx.Document(filepath)
        return "\n".join([para.text for para in doc.paragraphs])
    elif filepath.endswith(".txt"):
        with open(filepath, "r", encoding="utf-8") as f:
            return f.read()
    else:
        raise ValueError("Unsupported file type")

# Prompt user to enter file path manually
filepath = input("Enter the path to your file (.pdf/.docx/.txt): ")
raw_text = extract_text_from_file(filepath)
print(f"Loaded file: {filepath}")
print(f"Sample: {raw_text[:500]}")


Enter the path to your file (.pdf/.docx/.txt):  /Users/sanketmuchhala/Downloads/llm.pdf


Loaded file: /Users/sanketmuchhala/Downloads/llm.pdf
Sample: I. Understanding Language Models
Chapter 1: An Introduction to Large Language 
Models
This chapter provides a comprehensive overview of Large Language Models (LLMs) and the 
evolution of Language AI, marking humanity's inflection point with AI systems capable of 
human-like text generation.
The Evolution of Language AI
The chapter traces the development from simple bag-of-words representations in the 1950s to 
today's sophisticated models:
•
Bag-of-Words (1950s-2000s): Simple word counting appro


## Step 4: Chunk the Document Text
We split the full text into overlapping chunks to preserve context for semantic search.


In [12]:
# Split text into overlapping chunks using token count
def split_text(text, max_tokens=500, overlap=50):
    tokenizer = tiktoken.get_encoding("cl100k_base")
    words = text.split()
    chunks = []
    i = 0

    while i < len(words):
        chunk = words[i:i + max_tokens]
        tokens = tokenizer.encode(" ".join(chunk))
        while len(tokens) > max_tokens:
            chunk = chunk[:-1]
            tokens = tokenizer.encode(" ".join(chunk))
        chunks.append(" ".join(chunk))
        i += max_tokens - overlap
    return chunks

chunks = split_text(raw_text)
print(f" Split into {len(chunks)} chunks.")


 Split into 22 chunks.


## Step 5: Generate Embeddings Locally
We use a free, fast local embedding model (`all-MiniLM-L6-v2`) to convert chunks into vectors.


In [13]:
# Use SentenceTransformers to get local embeddings (no API cost)

from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer('all-MiniLM-L6-v2')  # small and fast

def get_local_embeddings(texts: List[str]) -> List[List[float]]:
    return embedder.encode(texts, convert_to_numpy=True).tolist()

    embeddings = []
    for i in range(0, len(texts), 5):  # batch of 5 for safety
        batch = texts[i:i+5]
        response = requests.post(url, headers=headers, json={
            "model": "deepseek-embedding-2",
            "input": batch
        })
        if response.status_code != 200:
            raise Exception(f"Embedding API Error: {response.text}")
        data = response.json()
        for item in data["data"]:
            embeddings.append(item["embedding"])

    return embeddings

embeddings = get_local_embeddings(chunks)
print(f"Retrieved embeddings for {len(embeddings)} chunks.")


Retrieved embeddings for 22 chunks.


##  Step 6: Store Embeddings in FAISS
We use FAISS to store all chunk vectors for efficient similarity search.


In [14]:
# Create FAISS index and add embeddings
dimension = len(embeddings[0])
index = faiss.IndexFlatL2(dimension)

embeddings_np = np.array(embeddings).astype("float32")
index.add(embeddings_np)

print(f"Stored {index.ntotal} vectors in FAISS.")

Stored 22 vectors in FAISS.


## Step 7: Ask a Question & Retrieve Top Chunks
We encode your question, perform vector search in FAISS, and retrieve the top relevant chunks.


In [15]:
# Assume this is already run earlier:
# document_embeddings = get_local_embeddings(chunks)
# faiss_index.add(np.array(document_embeddings).astype("float32"))

# Perform similarity search using user question
def retrieve_top_k(query: str, k=4):
    query_embedding = embedder.encode([query], convert_to_numpy=True)
    D, I = index.search(query_embedding.astype("float32"), k)
    return [chunks[i] for i in I[0]]

# Ask a question
question = input("Ask a question: ")
top_chunks = retrieve_top_k(question)

# Display retrieved context
print("Top Retrieved Chunks:")
for i, chunk in enumerate(top_chunks):
    print(f"\n--- Chunk {i+1} ---\n{chunk[:400]}")



Ask a question:  tell me about evolution of AI


Top Retrieved Chunks:

--- Chunk 1 ---
I. Understanding Language Models Chapter 1: An Introduction to Large Language Models This chapter provides a comprehensive overview of Large Language Models (LLMs) and the evolution of Language AI, marking humanity's inflection point with AI systems capable of human-like text generation. The Evolution of Language AI The chapter traces the development from simple bag-of-words representations in the

--- Chunk 2 ---
fine-tuning pretrained text generation models to adapt them for specific tasks and behaviors. Fine-tuning transforms base models into more useful, instruction-following systems through two main approaches: supervised fine-tuning and preference tuning. The Three LLM Training Steps 1. Language Modeling (Pretraining) – Base models are pretrained on massive text datasets using next-token prediction. –

--- Chunk 3 ---
musical score, visual effects, ambition, themes, and emotional weight. It has also received praise from many astronomers for 

## Step 8: Query DeepSeek with Retrieved Context
We combine the relevant chunks and send them to DeepSeek Chat API for a grounded answer.


In [16]:
import os
import requests
from IPython.display import Markdown, display

# Function to call DeepSeek API and get an answer using retrieved context
def ask_deepseek(question: str, context: str) -> str:
    api_key = os.getenv("DEEPSEEK_API_KEY")
    if not api_key:
        raise ValueError("DeepSeek API key not set. Please set it using os.environ['DEEPSEEK_API_KEY'].")

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    # Limit context to avoid token overflow
    max_context_chars = 6000
    context = context[:max_context_chars]

    prompt = f"""You are a helpful assistant. Use the following context to answer the question.
    
Context:
{context}

Question: {question}
Answer:"""

    url = "https://api.deepseek.com/v1/chat/completions"
    payload = {
        "model": "deepseek-chat",
        "messages": [
            {"role": "system", "content": "You are a knowledgeable assistant."},
            {"role": "user", "content": prompt}
        ]
    }

    try:
        response = requests.post(url, headers=headers, json=payload)
        response.raise_for_status()
        result = response.json()["choices"][0]["message"]["content"]
        return result.strip()
    except requests.exceptions.RequestException as e:
        print("API request failed:", e)
        print("Full response text:", response.text if 'response' in locals() else "No response.")
        return "Failed to get a response from DeepSeek."

# Combine chunks and ask a question
combined_context = "\n".join(top_chunks)
answer = ask_deepseek(question, combined_context)

# Display result
if answer:
    display(Markdown(f"### DeepSeek Answer:\n\n{answer}"))
else:
    print("No answer received.")


### DeepSeek Answer:

The evolution of AI, particularly in the context of language models, has seen significant advancements over the decades. Here's a breakdown based on the provided context:

### **Key Stages in the Evolution of Language AI**  
1. **Bag-of-Words (1950s–2000s)**  
   - Early approach focused on counting word occurrences in text.  
   - Ignored semantic meaning and context.  

2. **Dense Vector Embeddings (2013)**  
   - Introduced by **Word2vec**, which used neural networks to represent words as vectors.  
   - Captured semantic relationships (e.g., "king – man + woman ≈ queen").  

3. **Attention Mechanisms (2014–2017)**  
   - Enhanced Recurrent Neural Networks (RNNs) by allowing models to focus on relevant parts of input text.  
   - Paved the way for the **Transformer architecture** (Vaswani et al., 2017), which revolutionized NLP.  

4. **Modern LLMs (2018+)**  
   - Two dominant architectures emerged:  
     - **Encoder-only models (e.g., BERT)**: Optimized for understanding language (e.g., classification, semantic search).  
     - **Decoder-only models (e.g., GPT)**: Specialized in generative tasks (e.g., text completion).  
   - Scale exploded (e.g., GPT-1 had 117M parameters; GPT-3 reached 175B).  

### **Training Paradigms**  
- **Pretraining**: Self-supervised learning on vast text corpora (next-token prediction).  
- **Fine-Tuning**: Adapts models to specific tasks:  
  - **Supervised Fine-Tuning (SFT)**: Uses labeled data for instruction-following.  
  - **Preference Tuning**: Aligns outputs with human preferences (e.g., safety, quality).  
- **Efficient Methods**:  
  - **Parameter-Efficient Fine-Tuning (PEFT)**: Updates only small subsets of parameters (e.g., LoRA, QLoRA).  
  - **Quantization**: Reduces computational costs (e.g., 4-bit precision with QLoRA).  

### **Applications & Challenges**  
- **Applications**: Chatbots, document retrieval, multimodal systems, and more.  
- **Considerations**: Addressing bias, transparency, harmful content, and intellectual property.  

This progression reflects a shift from rule-based systems to sophisticated, general-purpose AI capable of human-like text generation and understanding.

## Step 9: Display the Answer
Finally, we show the generated answer using the retrieved context from your file.


In [15]:
# Join retrieved context
combined_context = "\n".join(top_chunks)

# Call DeepSeek API with your question and retrieved document context
answer = ask_deepseek(question, combined_context)

# Display nicely formatted markdown output
if answer:
    display(Markdown(f"###  DeepSeek Answer:\n\n{answer}"))
else:
    print(" No answer received.")


### 🤖 DeepSeek Answer:

The provided text appears to be a collection of code snippets and explanations related to various advanced techniques in natural language processing (NLP) and machine learning. It covers topics such as:

1. **Embeddings and Semantic Search**:  
   - Using embeddings (e.g., with `co.embed`) for document retrieval and building a search index with FAISS.  
   - Comparing keyword-based search (BM25) with semantic search.

2. **Prompt Engineering**:  
   - Techniques for improving LLM outputs, such as iterative refinement, modular design, and reasoning enhancement (e.g., chain-of-thought).  
   - Methods for structured output generation (e.g., JSON constraints using `llama-cpp-python`).

3. **Advanced Text Generation**:  
   - Loading quantized models (e.g., GGUF format) for efficient inference.  
   - Using LangChain for model I/O, chains (e.g., prompt templates), and extending LLM capabilities.

4. **Text Clustering and Topic Modeling**:  
   - Unsupervised techniques for grouping similar texts using embeddings (e.g., `SentenceTransformer`), dimensionality reduction (e.g., UMAP), and clustering (e.g., HDBSCAN).

### Likely Source:  
This is likely a chapter or section from a **technical book, course material, or research documentation** focused on NLP, LLMs, or machine learning. The content is advanced and practical, with code examples and explanations tailored for practitioners.

### File Type:  
It could be part of a:  
- **Jupyter Notebook** (mix of code and markdown).  
- **Technical report or whitepaper**.  
- **Online tutorial or blog post series**.  

The file itself might be a `.ipynb`, `.md`, `.txt`, or `.py` file, depending on how it was saved.