## **Nugen Intelligence**
<img src="https://nugen.in/logo.png" alt="Nugen Logo" width="200"/>

# **Chat-with-PDF using Nugen APIs**
---

This documentation explains the implementation of a chat-with-PDF functionality, where PDF documents are embedded into a vector database, and queries are answered based on contextual search from these embeddings. The code uses Nugen APIs for generating embeddings and language model completions, and Qdrant as the vector database to store and retrieve these embeddings.


# Setup and Configuration
## **Importing Libraries and Environment Setup**

In [None]:
!pip install pymupdf
!pip install qdrant-client
!pip install chainlit
!pip install python-dotenv
import os
import sys
import hashlib
import requests
import fitz  # PyMuPDF
from dotenv import load_dotenv
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, VectorParams
import chainlit as cl

# Load environment variables (e.g. QDRANT_CLIENT_URL) from a local .env file
load_dotenv()



# **Explanation:**


* **os and sys:** These modules are used to interact with the system environment and handle operations such as reading environment variables and exiting the program.

* **hashlib:** Utilized for generating a unique hash of the PDF files to check for duplicates in the database.

* **requests:** Used to make HTTP calls to Nugen's language model and embedding APIs.
* **fitz (PyMuPDF):** A library for reading and extracting text from PDF files.

* **QdrantClient:** A client to connect and interact with the Qdrant vector database, where embeddings are stored.
* **dotenv:** Loads environment variables from a .env file to securely manage API keys and database URLs.

* **chainlit:** Used to interact with users and manage messages within a chat-like interface.
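
As a small illustration of the environment handling described above, the sketch below reads a required setting and fails early with a readable message when it is missing. The helper name `require_env` and the variable `EXAMPLE_SETTING` are hypothetical and not part of the original notebook; only the standard library is used.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper, not part of the original notebook.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


# Demonstration with a throwaway variable so the example runs anywhere.
os.environ["EXAMPLE_SETTING"] = "http://localhost:6333"
print(require_env("EXAMPLE_SETTING"))
```

Failing at startup is usually easier to debug than a later authentication error from an API call made with an empty key.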










## **Defining Global Variables and Model Configuration**



In [None]:
USE_API_PROVIDER = "NUGEN"
if USE_API_PROVIDER == "NUGEN":
    # Read the key from the environment (variable name assumed); never hard-code it.
    NUGEN_API_KEY = os.getenv("NUGEN_API_KEY")
    LLM_API_URL = "https://api.nugen.in/inference"
    model_llm = "nugen-flash-instruct"
    model_embed = "nugen-flash-embed"
    EMBED_DIMENSION = 768
    EMBED_CHUNK_SIZE = int(EMBED_DIMENSION * 0.95)
    EMBED_CHUNK_OVERLAP = int(EMBED_CHUNK_SIZE * 0.10)
    LLM_API_PROVIDER_KEY = NUGEN_API_KEY
else:
    print("Unexpected USE_API_PROVIDER=", USE_API_PROVIDER)
    sys.exit(1)  # non-zero exit code signals a configuration error

qdrant_client = QdrantClient(os.getenv("QDRANT_CLIENT_URL"))
collection_name = "pdf_embeddings"
top_k = 5


# **USE_API_PROVIDER:**

This variable determines which provider's API will be used. In this case, it is set to "NUGEN", so all API calls are directed to Nugen’s services.

# **Nugen API Configuration:**


*   **NUGEN_API_KEY:** API key for **Nugen's domain-aligned model services**, loaded from environment variables.
*   **LLM_API_URL:** The endpoint for Nugen’s large language model inference API.
*   **model_llm and model_embed:** These specify which models to use for instruction-based completion and text embeddings.
    1. model_llm: nugen-flash-instruct (used for answering user queries).
    2. model_embed: nugen-flash-embed (used for generating embeddings from text).
    
# **Embedding Parameters:**

*   **EMBED_DIMENSION:** Dimension of the embedding vector (768 for Nugen's embeddings).

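*  **EMBED_CHUNK_SIZE and EMBED_CHUNK_OVERLAP:** These control how PDF content is split into chunks for embedding. Both are measured in characters: the chunk size is 95% of EMBED_DIMENSION (729 characters) and the overlap is 10% of the chunk size (72 characters). A chunk is the text embedded in a single API call, and the overlap repeats the end of one chunk at the start of the next so that text near a boundary keeps some of its surrounding context.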
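
To make the chunking arithmetic concrete, the sketch below recomputes the two parameters from the configuration values above and estimates how many chunks a document of a given length would yield. The names `stride` and `expected_chunk_count` are illustrative and do not appear in the original code.

```python
import math

EMBED_DIMENSION = 768
EMBED_CHUNK_SIZE = int(EMBED_DIMENSION * 0.95)      # 729 characters per chunk
EMBED_CHUNK_OVERLAP = int(EMBED_CHUNK_SIZE * 0.10)  # 72 characters shared with the next chunk

# Each new chunk starts this many characters after the previous one.
stride = EMBED_CHUNK_SIZE - EMBED_CHUNK_OVERLAP     # 657


def expected_chunk_count(text_length: int) -> int:
    """Number of chunks produced when a new chunk starts every `stride` characters."""
    return math.ceil(text_length / stride)


print(EMBED_CHUNK_SIZE, EMBED_CHUNK_OVERLAP, stride)  # 729 72 657
print(expected_chunk_count(10_000))                   # 16
```

Since every chunk costs one embedding request, the chunk count is also a rough estimate of the number of API calls needed per PDF.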

**QdrantClient:** The client object for connecting to Qdrant (the vector database where embeddings are stored). It connects using the URL provided by the environment variable QDRANT_CLIENT_URL.

**Collection Name:** collection_name is set to pdf_embeddings. This is the Qdrant collection where embeddings related to the PDFs will be stored.

**top_k:** Defines the number of top results to retrieve from the Qdrant database when searching for relevant context based on the user query.



# **Setting up the Qdrant Collection**

In [None]:
def setup_qdrant_collection(qdrant_client, collection_name, embed_dim):
    try:
        collections = qdrant_client.get_collections().collections
        if collection_name not in [collection.name for collection in collections]:
            qdrant_client.create_collection(
                collection_name=collection_name,
                vectors_config=VectorParams(size=embed_dim, distance="Cosine")
            )
            print(f"Collection '{collection_name}' created.")
        else:
            print(f"Collection '{collection_name}' already exists.")
    except Exception as e:
        print(f"Error setting up Qdrant collection: {e}")


* This function checks if a collection (i.e., a "bucket" for storing embeddings) already exists in Qdrant.

* If the collection does not exist, it creates a new one with vector size (embed_dim) based on the embedding dimensions of the Nugen model.

* Cosine distance is used as the metric for comparing vectors, which is standard for similarity searches.
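
For intuition about the metric, here is a minimal pure-Python sketch of cosine similarity. Qdrant computes this internally; the function below is only an illustration and is not part of the original code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # orthogonal
```

Because the result depends only on direction, not magnitude, two chunks with similar meaning score highly even if their embedding vectors differ in length.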






# **Extracting Text and Splitting PDF into Chunks**

In [None]:
def pdf_to_text_chunks(pdf_path, chunk_size, overlap_size):
    doc = fitz.open(pdf_path)
    text = "".join([page.get_text() for page in doc])
    chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size-overlap_size)]
    return text, chunks

* This function opens the PDF using PyMuPDF (fitz) and extracts all the text from each page of the document.
* The entire text is then split into chunks of a specific size (chunk_size) with some overlap (overlap_size). Overlapping chunks help maintain continuity in embeddings, which can improve retrieval performance.
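
The overlap behaviour is easiest to see on a short string. The sketch below applies the same slicing as `pdf_to_text_chunks` to plain text and adds a guard, which the original does not have, against an overlap that is not smaller than the chunk size. The name `split_with_overlap` is illustrative.

```python
def split_with_overlap(text, chunk_size, overlap_size):
    """Split text into chunks of `chunk_size` characters sharing `overlap_size` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_size < chunk_size:
        raise ValueError("overlap_size must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap_size
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


chunks = split_with_overlap("abcdefghij", chunk_size=4, overlap_size=1)
print(chunks)  # ['abcd', 'defg', 'ghij', 'j']
```

Each chunk starts with the last character of the previous one. The final chunk can be a short tail that is already covered by its predecessor, as `'j'` is here; this costs one extra embedding but does not affect correctness.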



# **Generating Embeddings for Text Chunks**

In [None]:
def create_embedding(text, model_embed):
    url = f"{LLM_API_URL}/embeddings"
    headers = {
        "Authorization": f"Bearer {LLM_API_PROVIDER_KEY}",
        "Content-Type": "application/json"
    }
    data = {
        "model": model_embed,
        "input": text
    }
    response = requests.post(url, json=data, headers=headers)
    response.raise_for_status()
    return response.json()['data'][0]['embedding']

* This function calls the Nugen embedding API to generate embeddings for the given text.
* It sends a POST request to Nugen’s /embeddings endpoint with the text data and embedding model (model_embed).
* The function returns the vector embedding of the text, which is later stored in Qdrant.
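
A response can be checked before its vector is stored. The sketch below assumes the response shape used in `create_embedding` (`data[0].embedding`) and verifies that the vector length matches the collection's dimension. The helper name `extract_embedding` and the sample values are made up for illustration and are not real API output.

```python
def extract_embedding(response_json, expected_dim):
    """Pull the first embedding out of a response dict and validate its length."""
    try:
        vector = response_json["data"][0]["embedding"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError("Unexpected embedding response shape") from exc
    if len(vector) != expected_dim:
        raise ValueError(f"Expected {expected_dim} dimensions, got {len(vector)}")
    return [float(v) for v in vector]


# Made-up three-dimensional sample; a real response would carry 768 values.
sample = {"data": [{"embedding": [0.1, 0.2, 0.3]}]}
print(extract_embedding(sample, expected_dim=3))
```

Catching a dimension mismatch here gives a clearer error than the one Qdrant would raise when the vector is upserted into a collection of a different size.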






# **Storing Embeddings in Qdrant**

In [None]:
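import uuid

def store_embeddings(chunks, file_path, file_hash, user_id, thread_id, message_id, collection_name):
    try:
        points = [
            # Derive the point ID from the user, file hash, and chunk index. A plain
            # running index would make chunks from a second PDF overwrite the first,
            # while a deterministic ID lets a re-upload update its own points.
            PointStruct(id=str(uuid.uuid5(uuid.NAMESPACE_URL, f"{user_id}:{file_hash}:{i}")),
                        vector=create_embedding(chunk, model_embed), payload={
                "chunk_text": chunk, "file_path": file_path, "file_hash": file_hash,
                "user_id": user_id, "thread_id": thread_id, "message_id": message_id
            }) for i, chunk in enumerate(chunks)
        ]
        qdrant_client.upsert(collection_name=collection_name, points=points)
        print("Embeddings stored successfully.")
    except Exception as e:
        print(f"Error storing embeddings: {e}")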



* This function stores embeddings for the PDF chunks in the Qdrant collection.

* Each chunk of text is processed to generate an embedding, and then a PointStruct (which consists of the vector and some metadata) is created.
* The metadata includes the original file path, file hash, and information about the user and thread, helping in filtering later when searching for relevant context.
* upsert is used to either update or insert embeddings into Qdrant.
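
The file hash stored in the metadata can be computed with `hashlib`. The sketch below assumes SHA-256 (the original does not say which algorithm is intended) and reads the file in blocks so that large PDFs are not loaded into memory at once. The helper name `sha256_of_file` is hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, block_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


# Demonstration on a temporary file so the example runs without a real PDF.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example bytes")
    demo_path = tmp.name
demo_hash = sha256_of_file(demo_path)
os.remove(demo_path)
print(demo_hash)
```

Two uploads with identical bytes produce the same digest regardless of file name, which is what makes the hash usable for duplicate detection.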





# **Retrieving Relevant Context from Qdrant**

In [None]:
def simple_rag_retrieve(query, top_k, user_id, thread_id, collection_name):
    try:
        query_embedding = create_embedding(query, model_embed)
        search_result = qdrant_client.search(
            collection_name=collection_name,
            query_vector=query_embedding,
            limit=top_k,
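            # Restrict results to chunks stored for this user in this thread.
            query_filter={"must": [
                {"key": "user_id", "match": {"value": user_id}},
                {"key": "thread_id", "match": {"value": thread_id}}
            ]}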
        )
        return "\n".join([hit.payload['chunk_text'] for hit in search_result])
    except Exception as e:
        print(f"Error retrieving context: {e}")
        return None

* **Retrieving Context:** When the user asks a question, this function retrieves relevant context by searching through the embeddings stored in Qdrant.

* **Query Embedding**: First, the user’s query is converted into an embedding.

* **Search:** The Qdrant database is searched for similar embeddings using the query vector, with a filter applied to ensure that only results belonging to the same user_id and thread_id are returned.

* The function returns the matching text chunks, combined into a single string.
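
The filter passed as `query_filter` is a plain dictionary of `must` conditions on payload fields. The sketch below builds one in the same shape as the code above; the helper name `build_payload_filter` is illustrative, and making `thread_id` optional is an assumption about how one might want to widen the search.

```python
def build_payload_filter(user_id, thread_id=None):
    """Build a filter dict that matches payload fields exactly.

    Every condition under "must" has to hold for a point to be returned.
    """
    conditions = [{"key": "user_id", "match": {"value": user_id}}]
    if thread_id is not None:
        conditions.append({"key": "thread_id", "match": {"value": thread_id}})
    return {"must": conditions}


print(build_payload_filter("user-1", "thread-9"))
print(build_payload_filter("user-1"))
```

Omitting the thread condition would let a user query documents uploaded in earlier conversations, at the cost of mixing context from unrelated threads.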





# **Generating a Response Using Retrieved Context**

In [None]:
def generate_llm_response(context, query, model_llm):
    url = f"{LLM_API_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {LLM_API_PROVIDER_KEY}",
        "Content-Type": "application/json"
    }
    data = {
        "model": model_llm,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": f"Context: {context}"},
            {"role": "user", "content": f"Answer the question: {query}"}
        ]
    }
    response = requests.post(url, json=data, headers=headers)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

* This function sends a POST request to the Nugen API to generate a response based on the retrieved context and the user query.

* **messages:** The request includes the context retrieved from Qdrant and the user’s query. The model uses these messages to generate a relevant response.

* The Nugen API processes this request and returns a completion (answer) that is sent back to the user.
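
The message list can be assembled separately from the HTTP call, which makes it easy to inspect. The sketch below keeps the three-message layout of `generate_llm_response` and adds two assumptions of its own: a fallback when no context was retrieved (`simple_rag_retrieve` returns `None` on error) and a character cap on the context. The name `build_messages` and the default cap of 4000 are illustrative.

```python
def build_messages(context, query, max_context_chars=4000):
    """Assemble chat messages from retrieved context and the user's question."""
    context = (context or "").strip()
    if not context:
        # Tell the model explicitly that retrieval returned nothing.
        context = "No relevant context was found."
    context = context[:max_context_chars]
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": f"Context: {context}"},
        {"role": "user", "content": f"Answer the question: {query}"},
    ]


for message in build_messages("Qdrant stores vectors.", "What does Qdrant store?"):
    print(message["role"], "|", message["content"])
```

Capping the context keeps the request within the model's input limit when `top_k` chunks are long; the right cap depends on the model and is left as a parameter.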





In [None]:
def embed_pdf(file_path, user_id, thread_id, message_id, collection_name):
    """Hash, chunk, embed, and store one PDF. Returns (ok, file_hash, chunk_count)."""
    try:
        # Hash the raw bytes so the same file can be recognised if it is uploaded again.
        with open(file_path, 'rb') as f:
            file_hash = hashlib.sha256(f.read()).hexdigest()

        # A PDF is binary, so its text is extracted with PyMuPDF rather than decoded directly.
        _, chunks = pdf_to_text_chunks(file_path, EMBED_CHUNK_SIZE, EMBED_CHUNK_OVERLAP)
        if not chunks:
            return False, None, None

        # Embed every chunk and upsert the vectors with their metadata.
        # Note: store_embeddings logs its own errors instead of raising them.
        store_embeddings(chunks, file_path, file_hash, user_id, thread_id, message_id, collection_name)
        return True, file_hash, len(chunks)
    except Exception as e:
        print(f"Error embedding PDF: {e}")
        return False, None, None

In [None]:
def simple_rag_generate(user_query, query_context):
    url = f"{LLM_API_URL}/completions"
    payload = {
        "max_tokens": 400,
        "model": model_llm,
        # Put the retrieved context first, then the question, so the model can tell them apart.
        "prompt": f"Context:\n{query_context or ''}\n\nQuestion: {user_query}\nAnswer:",
        "temperature": 1
    }
    headers = {
        # Reuse the key from the configuration cell instead of a hard-coded token.
        "Authorization": f"Bearer {LLM_API_PROVIDER_KEY}",
        "Content-Type": "application/json"
    }

    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 200:
        data = response.json()
        return data['completion']  # Adjust based on API response structure
    else:
        return "Error generating response."

# **Chainlit Integration**
Chainlit is used to handle messages and file uploads.

In [None]:
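@cl.on_message
async def main(message: cl.Message):
    # Identifiers stored with each chunk and used to filter retrieval.
    # Assumption: the Chainlit session id stands in for a user id, and
    # message.thread_id is available in your Chainlit version.
    user_id = cl.user_session.get("id")
    thread_id = message.thread_id
    message_id = message.id

    # Only elements that were saved to disk can be opened as PDFs.
    uploaded_files = [file for file in (message.elements or []) if getattr(file, "path", None)]

    for file in uploaded_files:
        embed_ready, file_hash, file_chunks_count = embed_pdf(file.path, user_id, thread_id, message_id, collection_name)
        if embed_ready:
            await cl.Message(content=f"The uploaded file '{file.name}' has been embedded.").send()
        else:
            await cl.Message(content=f"Failed to embed '{file.name}'.").send()

    user_query = message.content
    # simple_rag_retrieve takes five arguments; message_id is not one of them.
    query_context = simple_rag_retrieve(user_query, top_k, user_id, thread_id, collection_name)
    answer = simple_rag_generate(user_query, query_context)

    await cl.Message(content=f"Response: {answer}").send()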



---

By following this structure, the application lets users upload PDFs, have their text extracted and embedded, and ask questions that are answered from the embedded document content. Nugen’s APIs provide the embeddings and completions, and the Qdrant vector database handles storage and similarity search.
