## **Nugen Intelligence**
<img src="https://nugen.in/logo.png" alt="Nugen Logo" width="200"/>

Domain-aligned foundational models at industry-leading speeds with zero data retention!

## **Create Embeddings with Nugen API**

Text embeddings convert text strings into high-dimensional vectors of floating-point numbers that capture the semantic relationships between pieces of text. The Nugen API generates embeddings in which the distance between two vectors (for example, cosine distance) indicates how closely related the texts are: a smaller distance, or equivalently a higher cosine similarity, means the texts are more similar. This makes embeddings useful for applications such as semantic search, recommendation systems, anomaly detection, and Retrieval-Augmented Generation (RAG) architectures.
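To make the relationship between cosine similarity and cosine distance concrete, here is a minimal sketch that uses only the Python standard library. The three-dimensional vectors and their names (`vec_cat`, `vec_kitten`, `vec_car`) are made-up stand-ins for real embeddings, not values returned by the Nugen API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings (illustrative values only)
vec_cat = [0.9, 0.1, 0.0]
vec_kitten = [0.8, 0.2, 0.1]
vec_car = [0.0, 0.2, 0.9]

sim_close = cosine_similarity(vec_cat, vec_kitten)
sim_far = cosine_similarity(vec_cat, vec_car)

# Cosine distance is defined as 1 - cosine similarity
print(f"cat vs kitten: similarity={sim_close:.3f}, distance={1 - sim_close:.3f}")
print(f"cat vs car:    similarity={sim_far:.3f}, distance={1 - sim_far:.3f}")
```

The pair pointing in nearly the same direction gets the higher similarity and therefore the smaller distance, which is the property the retrieval steps later in this notebook rely on.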

To read more about the Nugen API and get free API keys, visit [Nugen Intelligence](https://docs.nugen.in/introduction).

**Import**

Install the necessary libraries:

In [1]:
!pip install --quiet requests numpy

### **Nugen API Integration**

**Set up the Nugen API Key**

In [None]:
import requests
import json
import numpy as np

# Replace with your Nugen API key
api_key = "<--nugen-api-key-->"

# URL for fetching embeddings for texts
embedding_url = "https://api.nugen.in/inference/embeddings"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Helper that requests embeddings for a list of texts
def get_nugen_embeddings(texts, model="nugen-flash-embed", dimensions=123):
    """Fetch embeddings for a list of texts from Nugen API."""
    data = {
        "input": texts,
        "model": model,
        "dimensions": dimensions
    }

    # A timeout keeps the call from hanging indefinitely if the server does not respond
    response = requests.post(embedding_url, headers=headers, data=json.dumps(data), timeout=30)

    if response.status_code == 200:
        response_json = response.json()
        return [entry["embedding"] for entry in response_json["data"]]
    else:
        print("Error:", response.status_code, response.text)
        return None

**Embed a list of documents**

In [3]:
# Example documents
documents = [
    "The Mediterranean diet emphasizes fish, olive oil, and vegetables, believed to reduce chronic diseases.",
    "Photosynthesis in plants converts light energy into glucose and produces essential oxygen.",
    "20th-century innovations, from radios to smartphones, centered on electronic advancements.",
    "Rivers provide water, irrigation, and habitat for aquatic species, vital for ecosystems.",
    "Apple’s conference call to discuss fourth fiscal quarter results and business updates is scheduled for Thursday, November 2, 2023 at 2:00 p.m. PT / 5:00 p.m. ET.",
    "Shakespeare's works, like 'Hamlet' and 'A Midsummer Night's Dream,' endure in literature."
]

# Get document embeddings from Nugen API
doc_embds = get_nugen_embeddings(documents, model="nugen-flash-embed", dimensions=123)
if doc_embds:
    for i, embd in enumerate(doc_embds):
        print(f"Embedding for document {i + 1}: {embd}")
else:
    print("Failed to retrieve document embeddings")

# Example query
query = "When is Apple's conference call scheduled?"

# Get the embedding for the query
query_embd = get_nugen_embeddings([query], model="nugen-flash-embed", dimensions=123)
if query_embd:
    query_embd = query_embd[0]
    print(f"Length of query embedding: {len(query_embd)}; first five values: {query_embd[:5]}")
else:
    print("Failed to retrieve query embedding")

Embedding for document 1: [0.1556396484375, 0.11444091796875, -0.330078125, -0.048828125, 0.1094970703125, -0.07427978515625, -0.0849609375, -0.02117919921875, 0.1304931640625, -0.0509033203125, 0.10845947265625, 0.017242431640625, 0.2490234375, -0.0006623268127441406, 0.0290985107421875, 0.038543701171875, -0.1820068359375, -0.0284271240234375, 0.04302978515625, 0.16162109375, -0.107666015625, -0.002361297607421875, 0.01080322265625, 0.0204620361328125, 0.10162353515625, 0.189208984375, -0.0038166046142578125, 0.168701171875, -0.039794921875, -0.0533447265625, -0.005504608154296875, -0.239501953125, 0.049652099609375, 0.0521240234375, 0.0753173828125, -0.08673095703125, 0.11883544921875, 0.1705322265625, 0.09088134765625, 0.00904083251953125, -0.0179443359375, 0.07763671875, 0.099365234375, -0.09259033203125, 0.12744140625, -0.085205078125, 0.054046630859375, 0.019866943359375, -0.0606689453125, -0.11737060546875, -0.01117706298828125, -0.0269775390625, -0.06787109375, -0.078186035156

The embeddings returned will be numerical vectors representing the documents, which we can use for semantic similarity comparisons.

**Response format of the Nugen API**

The embeddings endpoint returns a JSON object of the following form:

    {
      "object": "list",
      "data": [
        {
          "embedding": [0.02012746, 0.01957859, ...],
          "index": 0
        },
        {
          "embedding": [0.01429677, 0.03077182, ...],
          "index": 1
        }
      ],
      "model": "nugen-flash-embed",
      "usage": {
        "total_tokens": 10
      }
    }
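As a rough illustration of how a response in this format could be unpacked, the sketch below parses a hard-coded sample string instead of calling the API. The helper `extract_embeddings` is hypothetical, the embedding values are shortened for readability, and the assumption that `index` matches the position of each input text is not confirmed by the source.

```python
import json

# Sample response body in the format shown above (values are illustrative)
sample_response = """
{
  "object": "list",
  "data": [
    {"embedding": [0.01429677, 0.03077182], "index": 1},
    {"embedding": [0.02012746, 0.01957859], "index": 0}
  ],
  "model": "nugen-flash-embed",
  "usage": {"total_tokens": 10}
}
"""

def extract_embeddings(response_json):
    """Return the embeddings ordered by their "index" field.

    Assumption: "index" is the position of the corresponding input text.
    """
    entries = sorted(response_json["data"], key=lambda entry: entry["index"])
    return [entry["embedding"] for entry in entries]

parsed = json.loads(sample_response)
embeddings = extract_embeddings(parsed)
print(f"Model: {parsed['model']}, tokens used: {parsed['usage']['total_tokens']}")
print(f"Number of embeddings: {len(embeddings)}, dimensions: {len(embeddings[0])}")
```

Sorting by `index` is a defensive choice: it keeps the embeddings aligned with the input list even if the entries arrive in a different order.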

**Nearest Neighbor Search Using Nugen Embeddings**

Now, let's use Nugen embeddings to find the most similar documents to a given query. This process is called nearest neighbor search and helps with semantic retrieval. Here’s how it works:

* We take an example query and convert it into an embedding (a vector of numbers that represents the meaning of the text).
* Then, we compare this query embedding with each document embedding using cosine similarity. The higher the similarity score, the more closely the document matches the query.

Embed the query and compute similarities:

In [4]:

query = "When is Apple's conference call scheduled?"

# Get the embedding for the query
query_embd = get_nugen_embeddings([query], model="nugen-flash-embed", dimensions=123)
if query_embd:
    query_embd = query_embd[0]
else:
    print("Failed to retrieve query embedding")

# Compute the similarity
# Nugen embeddings are normalized to length 1, therefore dot-product is the same as cosine similarity
if doc_embds and query_embd:
    doc_embds_np = np.array(doc_embds)
    query_embd_np = np.array(query_embd)
    similarities = np.dot(doc_embds_np, query_embd_np)
    retrieved_id = np.argmax(similarities)
    print(f"The most relevant document is: {documents[retrieved_id]}")
else:
    print("Failed to compute similarities")


The most relevant document is: Apple’s conference call to discuss fourth fiscal quarter results and business updates is scheduled for Thursday, November 2, 2023 at 2:00 p.m. PT / 5:00 p.m. ET.


**In this code:**

1. We send the query to Nugen's API to get its embedding.
2. We calculate the cosine similarity between the query embedding and the document embeddings using `np.dot`.
3. The document with the highest similarity score is identified and printed.

This approach allows us to use Nugen's powerful embeddings to perform semantic search across a set of documents.
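The same idea extends to returning several ranked results instead of only the best match. The sketch below is one possible way to do that, not part of the Nugen API: `top_k_similar` is a hypothetical helper, the vectors are toy values, and it normalizes the inputs explicitly so that it does not depend on the embeddings already having length 1. It assumes no vector is all zeros.

```python
import numpy as np

def top_k_similar(doc_vectors, query_vector, k=3):
    """Return (index, score) pairs for the k documents most similar to the query."""
    docs = np.asarray(doc_vectors, dtype=float)
    query = np.asarray(query_vector, dtype=float)
    # Normalize explicitly so the dot product equals cosine similarity
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = docs @ query
    # argsort is ascending, so reverse it and keep the first k indices
    top = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in top]

# Toy vectors standing in for real embeddings (illustrative values only)
toy_docs = [[1.0, 0.0], [0.6, 0.8], [0.0, 2.0]]
toy_query = [3.0, 1.0]

results = top_k_similar(toy_docs, toy_query, k=2)
for idx, score in results:
    print(f"document {idx}: similarity {score:.3f}")
```

With real data, `doc_embds` and `query_embd` from the cells above could be passed in place of the toy vectors, and the returned indices used to look up entries in `documents`.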