## **Nugen Intelligence**
<img src="https://nugen.in/logo.png" alt="Nugen Logo" width="200"/>

Domain-aligned foundational models at industry-leading speeds and zero data retention!

### **Integrating Nugen’s Completion Model with LangChain’s Embedding Model**

### Introduction

This guide demonstrates how to combine Nugen’s completion API with LangChain’s embedding support to build a document question-answering application. LangChain handles embedding, storage, and retrieval of document chunks, while Nugen’s completion model generates natural-language answers from the retrieved text.

**Key Terms:**

* **Nugen Completion Model**: An API that generates text completions based on input prompts.
* **LangChain Embedding Model**: An embedding model accessed through LangChain, a framework that wraps embedding providers and vector stores behind a common interface.
* **Embeddings**: Numerical vector representations of text that capture semantic meaning.
* **Vector Store**: An index that stores embeddings and supports fast similarity search over them.
* **Completions**: Generated text responses based on a given input prompt.
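
To make the embedding and vector-store terms concrete, here is a minimal sketch that compares hand-written toy vectors with cosine similarity. The vectors, the sample sentences, and the `cosine_similarity` helper are illustrative assumptions rather than real model output; a real vector store performs a comparison of this kind, with the exact distance metric depending on its configuration.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions.
toy_store = {
    "The Central Authority consists of a chairperson.": [0.9, 0.1, 0.0],
    "Invoices are due within thirty days.": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # hypothetical embedding of a query about the authority

# Retrieval picks the stored text whose vector is closest to the query vector.
best_match = max(
    toy_store,
    key=lambda text: cosine_similarity(toy_store[text], query_vector),
)
print(best_match)
```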

**Step 1**: Set Up the Environment

Install Required Libraries

In [9]:
pip install --quiet requests langchain PyMuPDF langchain-community faiss-cpu openai tiktoken




**Step 2**: Extract Text from PDF

We start by extracting text from a PDF file using PyMuPDF.

In [10]:
import fitz  # PyMuPDF

def extract_text_from_pdf(pdf_path):
    """
    Extracts text from a PDF file.
    
    Args:
        pdf_path (str): Path to the PDF file.
        
    Returns:
        str: The extracted text from the PDF.
    """
    text = ""
    # The context manager closes the file once all pages have been read
    with fitz.open(pdf_path) as document:
        for page in document:
            text += page.get_text("text")
    return text
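
Raw PDF text often contains words hyphenated across line breaks and uneven whitespace, which can hurt chunking and retrieval. The sketch below shows one possible cleanup pass to run on the extracted text; `normalize_extracted_text` is a hypothetical helper, not part of PyMuPDF, and its rules are assumptions that may need tuning for a given document.

```python
import re

def normalize_extracted_text(text):
    """Tidy raw PDF text before chunking (illustrative helper, not part of PyMuPDF)."""
    # Rejoin words hyphenated across a line break: "exam-\nple" -> "example".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

sample = "The Central  Autho-\nrity shall\n\n\n\nconsist of members. "
print(normalize_extracted_text(sample))
```

Note that the first rule also joins genuinely hyphenated compounds that happen to break at a line end, so it trades a small amount of accuracy for cleaner text.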

**Step 3**: Split Text into Chunks

Embedding models have input-length limits, and retrieval is more precise over short passages, so split the extracted text into manageable pieces.

In [11]:
def split_text_into_chunks(text, chunk_size=500):
    """
    Splits the extracted text into smaller chunks for embedding.
    
    Args:
        text (str): The full text to split.
        chunk_size (int): The maximum size of each chunk in characters.
        
    Returns:
        list: A list of text chunks.
    """
    return [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]
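
Fixed-size splitting can cut a sentence in half at a chunk boundary. One common mitigation is to let adjacent chunks overlap, as in the sketch below; `split_text_with_overlap` and its `overlap` parameter are illustrative assumptions and not part of the original pipeline. LangChain also ships text splitters such as `RecursiveCharacterTextSplitter` that could be used for the same purpose.

```python
def split_text_with_overlap(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    # Each chunk starts `step` characters after the previous one.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text_with_overlap("abcdefghij", chunk_size=4, overlap=2)
print(chunks)
```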

**Step 4**: Generate Embeddings Using LangChain

Here we use LangChain’s embedding model to generate embeddings for the text chunks.

In [12]:
from langchain_community.embeddings import OpenAIEmbeddings

# Use OpenAI embeddings (or any supported embedding model).
# OpenAIEmbeddings reads the API key from the OPENAI_API_KEY environment variable.
def generate_langchain_embeddings(text_chunks):
    """
    Generates embeddings using LangChain's embedding model.
    
    Args:
        text_chunks (list): A list of text chunks.
        
    Returns:
        list: Embedding vectors for each text chunk.
    """
    embedding_model = OpenAIEmbeddings()  # Use OpenAI's embedding model
    # embed_documents embeds the whole list and returns one vector per chunk
    embeddings = embedding_model.embed_documents(text_chunks)
    return embeddings
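
To experiment with the rest of the pipeline without an API key, a deterministic stand-in can be handy. The sketch below is a hypothetical class that mirrors the two method names of LangChain's embedding interface, `embed_documents` and `embed_query`; its hash-based vectors carry no semantic meaning, so it is only suitable for wiring tests, not for real retrieval.

```python
import hashlib

class ToyHashEmbeddings:
    """Deterministic stand-in for an embedding model (hypothetical; no semantic meaning)."""

    def __init__(self, size=8):
        # A SHA-256 digest has 32 bytes, so at most 32 dimensions are available.
        if not 1 <= size <= 32:
            raise ValueError("size must be between 1 and 32")
        self.size = size

    def _embed(self, text):
        # Hash the text and scale the first `size` bytes into floats in [0, 1].
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [byte / 255 for byte in digest[: self.size]]

    def embed_documents(self, texts):
        """Embed a list of texts, returning one vector per text."""
        return [self._embed(text) for text in texts]

    def embed_query(self, text):
        """Embed a single query string."""
        return self._embed(text)

model = ToyHashEmbeddings(size=4)
vectors = model.embed_documents(["first chunk", "second chunk"])
print(len(vectors), len(vectors[0]))
```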

**Step 5**: Create a Vector Store

The embeddings will be stored in a vector store using LangChain’s FAISS implementation, which allows for fast similarity-based retrieval.

In [13]:
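from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

def create_langchain_vector_store(text_chunks):
    """
    Creates a FAISS vector store from text chunks using LangChain.
    
    Args:
        text_chunks (list): List of text chunks to embed.
        
    Returns:
        FAISS: A FAISS-based vector store with the embeddings.
    """
    # FAISS.from_texts expects an embedding model, not precomputed vectors;
    # it embeds the chunks itself and indexes the results.
    # (To index precomputed vectors instead, FAISS.from_embeddings accepts
    # (text, vector) pairs together with the embedding model.)
    vector_store = FAISS.from_texts(text_chunks, OpenAIEmbeddings())
    return vector_store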

**Step 6**: Use Nugen’s Completion Model for Generating Responses

Now, you can use Nugen’s completion API to generate responses for specific queries. You can query the vector store for relevant chunks, and based on those, generate completions using Nugen’s model.

**Generate Completion Using Nugen:**

In [14]:
import requests

def get_nugen_completion(prompt, max_tokens=400, model="nugen-flash-instruct", temperature=1):
    """
    Calls Nugen's completion API to generate a text response based on a prompt.
    
    Args:
        prompt (str): The input prompt for generating the completion.
        max_tokens (int): Maximum number of tokens in the response.
        model (str): The model to use for completions (default is "nugen-flash-instruct").
        temperature (float): Sampling temperature (default is 1 for randomization).
        
    Returns:
        str: The generated text completion.
    """
    url = "https://api.nugen.in/inference/completions"
    
    payload = {
        "max_tokens": max_tokens,
        "model": model,
        "prompt": prompt,
        "temperature": temperature
    }
    
    headers = {
        "Authorization": "Bearer <your_api_token>",  # Replace with your Nugen API token
        "Content-Type": "application/json"
    }

    # Set a timeout so a stalled connection cannot hang the call indefinitely
    response = requests.post(url, json=payload, headers=headers, timeout=30)

    if response.status_code == 200:
        return response.json().get('completion')
    else:
        raise Exception(f"Error generating completion: {response.text}")
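
Hard-coding an API token in a notebook makes it easy to leak. A minimal sketch of reading it from the environment instead is shown below; the variable name `NUGEN_API_KEY` and the `build_nugen_headers` helper are assumptions for illustration and are not defined by Nugen.

```python
import os

def build_nugen_headers(token=None):
    """Build request headers, reading the token from the environment if not passed."""
    # NUGEN_API_KEY is an assumed variable name, not one defined by Nugen.
    token = token or os.environ.get("NUGEN_API_KEY")
    if not token:
        raise RuntimeError("Set NUGEN_API_KEY or pass a token explicitly")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

headers = build_nugen_headers(token="example-token")
print(headers["Authorization"])
```

The returned dictionary has the same shape as the `headers` dictionary used in `get_nugen_completion`, so it could be swapped in there.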

**Step 7**: Query the Vector Store and Generate a Response

Now that we have the vector store and Nugen’s completion model integrated, let’s retrieve the most relevant chunks and use them to generate a completion with Nugen.

In [None]:
def query_and_generate_response(query, vector_store):
    """
    Queries the vector store and generates a completion using Nugen's API.
    
    Args:
        query (str): The user query to search for in the document.
        vector_store (FAISS): The vector store containing embeddings.
        
    Returns:
        str: The generated response based on the most relevant chunks.
    """
    # Query the vector store to get the most relevant chunks (Document objects)
    relevant_chunks = vector_store.similarity_search(query)
    
    # Combine the text of the retrieved chunks to form the context
    combined_text = " ".join(chunk.page_content for chunk in relevant_chunks)
    
    # Append the question so the model sees both the context and the query
    prompt = f"{combined_text}\n\nQuestion: {query}\nAnswer:"
    
    # Use Nugen's completion model to generate a response from the prompt
    completion = get_nugen_completion(prompt)
    
    return completion   
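
Retrieved chunks can add up to more text than the completion model should receive in one prompt. The sketch below shows one way to cap the context by character count while assembling a question-answering prompt; `build_prompt`, its template wording, and the `max_context_chars` budget are illustrative assumptions, and it takes plain strings rather than LangChain `Document` objects.

```python
def build_prompt(query, chunks, max_context_chars=2000):
    """Assemble a question-answering prompt from retrieved text chunks."""
    context_parts = []
    used = 0
    for chunk in chunks:
        # Stop adding chunks once the context budget would be exceeded.
        if used + len(chunk) > max_context_chars:
            break
        context_parts.append(chunk)
        used += len(chunk)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

prompt = build_prompt("What is covered?", ["Chunk one.", "Chunk two."], max_context_chars=15)
print(prompt)
```

Because chunks arrive ordered by relevance, truncating from the end keeps the most relevant passages in the prompt.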

**Step 8**: Complete Process from PDF to Query and Response

Here’s the full process, from extracting text from a PDF to generating a response using LangChain’s embeddings and Nugen’s completion model.

In [None]:
def process_pdf_and_generate_response(pdf_path, query):
    """
    Processes a PDF by extracting text, generating embeddings, and querying the vector store.
    Uses Nugen's completion model to generate a response based on the query.
    
    Args:
        pdf_path (str): Path to the PDF file.
        query (str): The user query to search for.
        
    Returns:
        str: The generated response based on the query.
    """
    # Step 1: Extract text from the PDF
    text = extract_text_from_pdf(pdf_path)
    
    # Step 2: Split the text into chunks
    text_chunks = split_text_into_chunks(text)
    
    # Step 3: Create a vector store using LangChain
    vector_store = create_langchain_vector_store(text_chunks)
    
    # Step 4: Query the vector store and generate a response using Nugen
    response = query_and_generate_response(query, vector_store)
    
    return response

# Example usage
pdf_path = "legal_service_authorities_act_1987.pdf"  # Replace with your PDF path
query = "What should the Central Authority consist of?"

# Process the PDF and get a generated response
generated_response = process_pdf_and_generate_response(pdf_path, query)

# Output the response
print(f"Generated Response: {generated_response}")


### **Conclusion**
By combining LangChain’s embedding model with Nugen’s completion model, we can build a system that provides:

* Fast, similarity-based document retrieval using embeddings.
* Natural-language completions grounded in the retrieved content, which suits applications such as Q&A systems, chatbots, and document summarization tools.

This integration uses Nugen’s completion API for text generation and LangChain for text embeddings and retrieval, a division of labor that carries over to a wide range of NLP use cases.

Nugen provides cutting-edge language models that are easy to integrate into modern NLP workflows, while LangChain offers the flexibility to build more complex, multi-step language processing pipelines.