# **Step 1: Install Dependencies**
Before running the code, ensure you have the necessary packages installed.

## Install Ollama
Ollama is required for embeddings and chat-based interactions.

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull the DeepSeek-R1 model
ollama pull deepseek-r1



---

### **2️⃣ Load the PDF and Create Embeddings**

# **Step 2: Load and Process the PDF**
We will:
1. Load a PDF using `PyMuPDFLoader`.
2. Split it into smaller text chunks for efficient retrieval.
3. Generate embeddings using `OllamaEmbeddings` with DeepSeek-R1.
4. Store these embeddings in a **ChromaDB** vector database.


In [1]:
!pip install gradio langchain langchain-community chromadb pymupdf



In [2]:
import re
from concurrent.futures import ThreadPoolExecutor
import gradio as gr
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma
from chromadb.config import Settings
from chromadb import Client


  from .autonotebook import tqdm as notebook_tqdm


In [3]:

# Load the document
loader = PyMuPDFLoader("document-20-24.pdf")
documents = loader.load()

# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)


In [4]:

# Initialize embeddings model
embedding_function = OllamaEmbeddings(model="deepseek-r1")

# Generate embeddings in parallel
def generate_embedding(chunk):
    return embedding_function.embed_query(chunk.page_content)

with ThreadPoolExecutor() as executor:
    embeddings = list(executor.map(generate_embedding, chunks))

# Initialize Chroma client
client = Client(Settings())

# Delete and create new collection
try:
    client.delete_collection(name="foundations_of_llms")
except ValueError as e:
    print(f"Error deleting collection: {e}")

collection = client.create_collection(name="foundations_of_llms")

# Add documents and embeddings to Chroma
for idx, chunk in enumerate(chunks):
    collection.add(
        documents=[chunk.page_content], 
        metadatas=[{'id': idx}], 
        embeddings=[embeddings[idx]], 
        ids=[str(idx)]
    )

# Initialize retriever
retriever = Chroma(collection_name="foundations_of_llms", client=client, embedding_function=embedding_function).as_retriever()

  embedding_function = OllamaEmbeddings(model="deepseek-r1")


Error deleting collection: Collection foundations_of_llms does not exist.


  retriever = Chroma(collection_name="foundations_of_llms", client=client, embedding_function=embedding_function).as_retriever()


# **Step 3: Context Retrieval and Chat**
Now, we will:
1. **Retrieve** relevant sections from ChromaDB based on user queries.
2. **Generate responses** using `Ollama` with DeepSeek-R1.


In [6]:
# Initialize LLM for answering questions
llm = Ollama(model="deepseek-r1")

def retrieve_context(question):
    """Retrieve relevant context from stored embeddings."""
    results = retriever.invoke(question)
    context = "\n\n".join([doc.page_content for doc in results])
    return context

def query_deepseek(question, context):
    """Use DeepSeek-R1 to generate an answer."""
    formatted_prompt = f"Question: {question}\n\nContext: {context}"
    
    # Generate response
    response = llm.invoke(formatted_prompt)

    # Clean response
    final_answer = re.sub(r'<think>.*?</think>', '', response, flags=re.DOTALL).strip()
    return final_answer

def ask_question(question):
    """Retrieve context and answer the question."""
    context = retrieve_context(question)
    answer = query_deepseek(question, context)
    return answer


# **Step 4: Deploy the Chatbot UI**
Finally, we will use **Gradio** to create a simple web interface where users can input their questions and receive answers based on the processed PDF.


In [14]:
from fastapi import FastAPI
from pydantic import BaseModel
import gradio as gr
import nest_asyncio
import uvicorn
import asyncio

app = FastAPI()

class QuestionRequest(BaseModel):
    question: str

# FastAPI endpoint
@app.post("/api/predict")
async def predict(request: QuestionRequest):
    answer = ask_question(request.question)  # Call your RAG pipeline
    return {"answer": answer}

# Gradio interface
interface = gr.Interface(
    fn=ask_question,
    inputs="text",
    outputs="text",
    title="RAG Chatbot: Foundations of LLMs",
)

nest_asyncio.apply()  # Allows running an event loop inside Jupyter

async def start_server():
    config = uvicorn.Config(app, host="0.0.0.0", port=7650)
    server = uvicorn.Server(config)
    await server.serve()

# Run FastAPI server
await start_server()

# Run Gradio on a different port
interface.launch(server_name="0.0.0.0", server_port=7860, share=True)  # Use `share=True` if you want a public link


INFO:     Started server process [282329]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:7650 (Press CTRL+C to quit)


INFO:     127.0.0.1:42726 - "GET / HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:42726 - "GET /favicon.ico HTTP/1.1" 404 Not Found


INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [282329]
Exception ignored in: <coroutine object start_server at 0x73de8e8cb140>
Traceback (most recent call last):
  File "<string>", line 1, in <lambda>
KeyError: '__import__'
Exception ignored in: <coroutine object start_server at 0x73de8e8cb140>
Traceback (most recent call last):
  File "<string>", line 1, in <lambda>
KeyError: '__import__'


OSError: Cannot find empty port in range: 7860-7860. You can specify a different port by setting the GRADIO_SERVER_PORT environment variable or passing the `server_port` parameter to `launch()`.

In [17]:
# Set up the Gradio interface
interface = gr.Interface(
    fn=ask_question,
    inputs="text",
    outputs="text",
    title="Volvo Chatbot: Troubleshooting",
    description="Ask any question about the Volvo manual book. Powered by DeepSeek-R1."
)

# Launch the interface
interface.launch(server_name="0.0.0.0", server_port=9990, share=True)  # Use `share=True` if you want a public link


Exception in thread Thread-33 (run):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/home/lena/deep_bot/venv/lib/python3.12/site-packages/ipykernel/ipkernel.py", line 766, in run_closure
    _threading_Thread_run(self)
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lena/deep_bot/venv/lib/python3.12/site-packages/uvicorn/server.py", line 66, in run
    return asyncio.run(self.serve(sockets=sockets))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lena/deep_bot/venv/lib/python3.12/site-packages/nest_asyncio.py", line 26, in run
    loop = asyncio.get_event_loop()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lena/deep_bot/venv/lib/python3.12/site-packages/nest_asyncio.py", line 40, in _get_event_loop
    loop = events.get_event_loop_policy().get_event_loop()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

OSError: Cannot find empty port in range: 9990-9990. You can specify a different port by setting the GRADIO_SERVER_PORT environment variable or passing the `server_port` parameter to `launch()`.

# Final Notes

    📄 This notebook will process the PDF once and store embeddings in ChromaDB, so you don’t need to reprocess it every time.
    ⚡ The chatbot retrieves only relevant sections before generating an answer using DeepSeek-R1.
    🚀 Gradio provides an easy-to-use UI for interacting with the chatbot.

Try running each cell one by one in a Jupyter Notebook, and let me know if you need any modifications! 🚀