## **Nugen Intelligence**
<img src="https://nugen.in/logo.png" alt="Nugen Logo" width="200"/>

Domain-aligned foundational models at industry-leading speeds with zero data retention!

### **Using Nugen's Embedding Model with LlamaIndex for PDF Content Retrieval**

### **Introduction**
In this cookbook, you will learn how to use Nugen’s embedding models to convert text into embeddings and how to use LlamaIndex to index and retrieve that text. The steps below cover fetching embeddings, indexing them, and running a semantic query. The examples use a single sentence to keep things short; the same steps apply to text extracted from a PDF.

Nugen offers embedding models that transform unstructured text into vectors. LlamaIndex is a framework for indexing and querying text data based on semantic similarity, which makes it a good fit for building search engines or knowledge retrieval systems.


### **Key Terms**

* Embedding: A numerical vector that represents a piece of text, so that texts with similar meaning map to vectors that are close together.
* Nugen API: An API that provides embedding and completion models for text processing.
* LlamaIndex: A framework for building retrieval-augmented generation (RAG) systems that index and retrieve information based on embeddings.
* Vector Store: A data structure used to store embeddings for fast, similarity-based retrieval.
* Semantic Search: A search method that uses the meaning of the query rather than just keyword matching.

**What You Will Learn**

* How to fetch embeddings from the Nugen API.
* How to create documents with embeddings and index them using LlamaIndex.
* How to query your indexed documents.

**Q. What Are Embeddings?**

Embeddings convert text into numerical vectors that computers can compare and process. Think of them as the "DNA" of text: each piece of text gets its own vector, and texts with similar meanings get vectors that are close together.
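
To make the idea of vectors being "close together" concrete, here is a minimal sketch of cosine similarity, a common way to compare embeddings. The three-dimensional vectors are made-up toy values, not real Nugen output, and `cosine_similarity` is an illustrative helper rather than part of any library used in this cookbook.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Toy vectors (hypothetical values chosen for illustration only)
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Related texts should score higher than unrelated ones
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```

Real embeddings have many more dimensions, but the comparison works the same way.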

### **Step 1: Set Up the Environment**

**Install Required Libraries**

Before you begin, ensure you have the necessary Python libraries installed: requests (for making HTTP requests to Nugen’s API) and llama_index (for indexing and querying). PyMuPDF is only needed if you want to extract text from PDF files, and the command below does not install it.

In [None]:
%pip install --quiet -U requests llama-index

Note: you may need to restart the kernel to use updated packages.
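
After restarting the kernel, a quick check like the following can confirm that the packages are importable. This is a small sketch using only the standard library; `missing_packages` is a hypothetical helper name.

```python
import importlib.util

def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

# Top-level import names used in this cookbook
required = ["requests", "llama_index"]
missing = missing_packages(required)
if missing:
    print(f"Missing packages: {missing}. Re-run the install cell and restart the kernel.")
else:
    print("All required packages are importable.")
```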


### **Step 2: Get Your Nugen API Key**

To use Nugen's embedding models, you will need an API key.
You can get a **Nugen API** key for free from the **[Nugen documentation](https://docs.nugen.in/)**.

Once you have the API key, store it securely, as it will be used to authenticate requests to Nugen’s API.
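
One common way to keep the key out of the notebook itself is to read it from an environment variable. The sketch below assumes a variable named `NUGEN_API_KEY`; that name is a convention chosen here for illustration, not something the Nugen API requires.

```python
import os

def load_api_key(env_var="NUGEN_API_KEY"):
    """Read the API key from an environment variable and fail early if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Nugen API key")
    return key

# Example usage (assumes the variable was exported before starting the notebook):
# api_key = load_api_key()
```

Failing early with a clear message is usually easier to debug than an authentication error returned by the API later on.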

### **Step 3: Fetch Embeddings from Nugen API**

Nugen provides an API for fetching embeddings for your text. We’ll send a request to this API to get embeddings for the sentence “The quick brown fox jumped over the lazy dog”.


In [None]:
import requests

# Your Nugen API key
api_key = "<--nugen api key-->"

# Function to fetch embeddings
def get_embeddings(text):
    url = "https://api.nugen.in/inference/embeddings"
    payload = {
        "input": text,
        "model": "nugen-flash-embed",  # Model used for embeddings
        "dimensions": 123  # Dimensions of the embedding vector
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    
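    # A timeout keeps the call from hanging if the API is unreachable
    response = requests.post(url, json=payload, headers=headers, timeout=30)

    if response.status_code == 200:
        return response.json()  # Parsed JSON response; the vector is at data[0]["embedding"]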
    else:
        print(f"Error fetching embeddings: {response.text}")
        return None


**What Does This Code Do?**

* Sends a POST request to the Nugen API to get embeddings for the given text.
* Uses the API key for authentication.
* Returns the parsed JSON response, or prints the error and returns None if the request fails.
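
To inspect exactly what is sent without making a network call, the request can be assembled separately from the code that sends it. This is a sketch only: `build_embedding_request` is a hypothetical helper, and the URL, model name, and `dimensions` value are copied from the function above.

```python
import json

NUGEN_EMBEDDINGS_URL = "https://api.nugen.in/inference/embeddings"

def build_embedding_request(text, api_key, model="nugen-flash-embed", dimensions=123):
    """Return (url, headers, payload) for an embeddings request without sending it."""
    if not text or not text.strip():
        raise ValueError("text must be a non-empty string")
    payload = {"input": text, "model": model, "dimensions": dimensions}
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return NUGEN_EMBEDDINGS_URL, headers, payload

url, headers, payload = build_embedding_request(
    "The quick brown fox", api_key="<--nugen api key-->"
)
print(url)
print(json.dumps(payload, indent=2))  # the key stays in headers, so it is not printed
```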

**How to Use:**

You can use this function by passing a string (text) to get its embeddings. For example:

In [20]:
text = "The quick brown fox jumped over the lazy dog"
embeddings = get_embeddings(text)
print(embeddings)

{'id': 'nugen-1731428062.281123', 'data': [{'index': 0, 'embedding': [-0.013336181640625, 0.044891357421875, -0.346923828125, -0.04205322265625, 0.019744873046875, 0.07000732421875, -0.076171875, 0.039398193359375, 0.01233673095703125, -0.167236328125, -0.09100341796875, 0.1190185546875, 0.11151123046875, -0.0006594657897949219, 0.042724609375, -0.08740234375, 0.179443359375, -0.0765380859375, 0.1322021484375, -0.060272216796875, 0.0416259765625, 0.08697509765625, 0.10504150390625, -0.05859375, 0.19873046875, -0.0229034423828125, -0.1722412109375, 0.1328125, -0.0927734375, 0.205322265625, -0.0235443115234375, 0.00870513916015625, -0.053802490234375, -0.08502197265625, -0.011749267578125, -0.1087646484375, 0.09429931640625, 0.02069091796875, 0.1297607421875, 0.0193328857421875, -0.036468505859375, 0.102294921875, -0.0286407470703125, 0.0141143798828125, 0.1427001953125, -0.1298828125, 0.12103271484375, 0.0833740234375, -0.017059326171875, -0.046356201171875, -0.06494140625, 0.0805664062 ... (output truncated)
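
The printed response is a dictionary, and the vector itself sits under `data[0]["embedding"]`. A small helper like the one below can pull it out and fail loudly if the shape is not what was expected. `extract_vector` is a hypothetical name, and the sample response is a shortened, illustrative copy of the shape shown above.

```python
def extract_vector(response):
    """Return the embedding list from a response shaped like {'data': [{'embedding': [...]}]}."""
    if response is None:
        raise ValueError("no response (the request failed)")
    try:
        vector = response["data"][0]["embedding"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError("unexpected response shape") from exc
    if not isinstance(vector, list) or not vector:
        raise ValueError("embedding is empty or not a list")
    return vector

# Shortened sample in the same shape as the printed output (values are illustrative)
sample = {"id": "nugen-example", "data": [{"index": 0, "embedding": [-0.0133, 0.0449, -0.3469]}]}
vector = extract_vector(sample)
print(len(vector))  # number of dimensions in the sample vector
```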

### **Step 4: Create a Document with Embeddings and Index It**

Once we have the embeddings for the text, we need to create a document using LlamaIndex and add these embeddings. LlamaIndex helps you create an index of documents so that you can efficiently search through them.

**Code to Create Document and Index:**

In [21]:
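from llama_index.core import VectorStoreIndex, Document

# Create a document with embeddings
def create_index_with_embeddings(embeddings, text):
    # The API response is a dict; the vector itself is at data[0]["embedding"]
    vector = embeddings["data"][0]["embedding"]

    # Create a Document object with both the text and its vector
    # (the Document field is named `embedding`, singular)
    doc_with_embeddings = Document(text=text, embedding=vector)

    # Create an index from the document
    index = VectorStoreIndex.from_documents([doc_with_embeddings])

    return index

**What Does This Code Do?**

* It extracts the embedding vector from the Nugen API response and creates a Document from the text and that vector.
* It creates an index of that document using VectorStoreIndex, which will allow us to perform fast queries later.
* Note that the query must later be embedded by the same model, with the same dimensions, as the stored vector; otherwise the two vectors cannot be compared.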
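
Conceptually, a vector index stores text alongside its vector and ranks entries by similarity to a query vector. The toy class below sketches that idea in plain Python. It is not how LlamaIndex is implemented, `TinyVectorIndex` is a hypothetical name, and the three-dimensional vectors are made-up values.

```python
import math

class TinyVectorIndex:
    """Toy in-memory index: stores (text, vector) pairs and ranks them by cosine similarity."""

    def __init__(self):
        self._items = []

    def add(self, text, vector):
        self._items.append((text, list(vector)))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norms if norms else 0.0

    def search(self, query_vector, top_k=1):
        """Return the top_k (score, text) pairs, best match first."""
        scored = [(self._cosine(query_vector, vec), text) for text, vec in self._items]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]

index_demo = TinyVectorIndex()
index_demo.add("The quick brown fox jumped over the lazy dog", [0.9, 0.1, 0.0])
index_demo.add("Quarterly revenue grew by ten percent", [0.0, 0.2, 0.9])

# A query vector close to the first entry should retrieve the fox sentence
print(index_demo.search([0.8, 0.3, 0.1], top_k=1))
```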

**How to Use:**

Once you have the embeddings from Nugen, you can create the index with:

In [22]:
index = create_index_with_embeddings(embeddings, "The quick brown fox jumped over the lazy dog")

### **Step 5: Query the Indexed Document**

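Now that the document is indexed, you can query it by asking questions in natural language. The query engine embeds your question, retrieves the most similar indexed text, and passes that text to an LLM to generate the answer. Unless you configure `Settings.embed_model` and `Settings.llm`, LlamaIndex falls back to its default models (OpenAI), which require an `OPENAI_API_KEY`. For retrieval to work, the question must be embedded by the same model, with the same dimensions, as the indexed vectors.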

In [23]:
def query_index(index, query_text):
    # Query the index to find relevant information
    query_response = index.as_query_engine().query(query_text)
    return query_response

**How to Use:**

After creating the index, you can query it by passing a question:

In [24]:
query_response = query_index(index, "What is in this document?")
print(query_response)

A sentence about a quick brown fox jumping over a lazy dog.
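
Behind a response like this, the query engine first retrieves matching text and then asks an LLM to answer from it. The sketch below shows one way retrieved text and a question could be combined into a prompt. It illustrates the general retrieve-then-answer pattern only; it is not LlamaIndex's actual prompt template, and `build_answer_prompt` is a hypothetical helper.

```python
def build_answer_prompt(question, retrieved_chunks):
    """Combine retrieved text and the user's question into one prompt string."""
    if not retrieved_chunks:
        raise ValueError("nothing was retrieved, so there is no context to answer from")
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_answer_prompt(
    "What is in this document?",
    ["The quick brown fox jumped over the lazy dog"],
)
print(prompt)
```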


### **Conclusion**

Congratulations! You’ve just built an application that:

* Fetches text embeddings using the Nugen API.
* Creates a document with embeddings and indexes it using LlamaIndex.
* Queries that document efficiently.

This cookbook is a starting point for using Nugen's embedding API with LlamaIndex. From here, you can index larger documents, such as text extracted from PDFs, and build semantic search or knowledge retrieval systems on top of them.