## **Nugen Intelligence**
<img src="https://nugen.in/logo.png" alt="Nugen Logo" width="200"/>

Domain-aligned foundational models at industry-leading speeds with zero data retention. To learn more, visit [Nugen](https://docs.nugen.in/introduction).

## **Create Embeddings with Nugen API**

Text embeddings convert text strings into high-dimensional vectors of floating-point numbers, so that the semantic relationship between pieces of text can be measured numerically. The Nugen API generates embeddings whose closeness (measured, for example, by cosine similarity) indicates how closely related the texts are: a higher cosine similarity, or equivalently a smaller cosine distance, means the texts are more similar. This makes embeddings useful for applications such as semantic search, recommendation systems, anomaly detection, and Retrieval-Augmented Generation (RAG) architectures.
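To make the relationship between similarity and distance concrete, here is a minimal sketch using NumPy. The helper names `cosine_similarity` and `cosine_distance` and the toy three-dimensional vectors are illustrative assumptions, not part of the Nugen API; real embeddings have many more dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; values near 1.0 mean same direction."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cosine_distance(a, b):
    """Cosine distance is defined here as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)

# Toy vectors, made up for illustration (not real embeddings)
v_cat = [1.0, 0.0, 1.0]
v_kitten = [2.0, 0.0, 2.0]   # same direction as v_cat, different length
v_car = [0.0, 1.0, 0.0]      # orthogonal to v_cat

print("similarity(cat, kitten):", cosine_similarity(v_cat, v_kitten))
print("distance(cat, car):", cosine_distance(v_cat, v_car))
```

Vectors pointing in the same direction score a similarity close to 1 (distance close to 0), while orthogonal vectors score a similarity of 0 (distance of 1).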

To read more about the Nugen API and get free API keys, visit the [Nugen Dashboard](https://nugen-platform-frontend.azurewebsites.net/dashboard).

**Import**

Install necessary libraries

In [1]:
!pip install --quiet requests numpy

### **Nugen API Integration**

**Set up the Nugen API Key**

In [None]:
import requests
import json
import numpy as np
import time
import hashlib
import os
from urllib.parse import urlparse

# API Configuration
api_key = os.getenv("NUGEN_API_KEY")  # More secure than hardcoding
if not api_key:
    raise EnvironmentError("NUGEN_API_KEY environment variable not set.")

# URL for fetching embeddings for texts
embedding_url = "https://api.nugen.in/inference/embeddings"

parsed_url = urlparse(embedding_url)
# Ensure the URL uses HTTPS for secure communication
if parsed_url.scheme != "https":
    raise ValueError("Insecure URL scheme detected. HTTPS is required.")

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}
# Mask sensitive authorization tokens before logging
def safe_log_request(url, headers, data):
    safe_headers = headers.copy()
    if "Authorization" in safe_headers:
        safe_headers["Authorization"] = "Bearer <HIDDEN>"
    print("Requesting:", url)
    print("Headers:", safe_headers)
    print("Payload Hash:", hashlib.sha256(json.dumps(data).encode()).hexdigest())
    
# Helper that requests embeddings for a list of texts from the Nugen API
def get_nugen_embeddings(texts, model="nugen-flash-embed", dimensions=123):
    """
    Fetch embeddings for a list of texts from Nugen API.
    """
    # Validate input: Ensure the input is a list of strings
    if not isinstance(texts, list) or not all(isinstance(t, str) for t in texts):
        raise ValueError("Input must be a list of strings.")

    # Prepare request payload for Nugen API
    data = {
        "input": texts,
        "model": model,
        "dimensions": dimensions
    }

    # Log the API request securely, masking sensitive information
    start_time = time.time()
    safe_log_request(embedding_url, headers, data)

    try:
        # Make API call with a timeout for reliability
        response = requests.post(embedding_url, headers=headers, data=json.dumps(data), timeout=10)
        elapsed = time.time() - start_time  # Measure API response time

        if response.status_code == 200:
            # Process successful API response and extract embeddings
            response_json = response.json()
            print(f"API call successful in {elapsed:.2f} seconds")
            return [entry["embedding"] for entry in response_json["data"]]
        else:
            # Handle API response errors
            print(f"API call failed in {elapsed:.2f} seconds")
            print("Error:", response.status_code, response.text)
            return None

    except requests.exceptions.Timeout:
        # Handle request timeout errors
        print("Request timed out.")
        return None
    except requests.exceptions.RequestException as e:
        # Catch general request exceptions for better error handling
        print("Request failed:", e)
        return None

**Embed a list of documents**

In [None]:
# Example documents
documents = [
    "The Mediterranean diet emphasizes fish, olive oil, and vegetables, believed to reduce chronic diseases.",
    "Photosynthesis in plants converts light energy into glucose and produces essential oxygen.",
    "20th-century innovations, from radios to smartphones, centered on electronic advancements.",
    "Rivers provide water, irrigation, and habitat for aquatic species, vital for ecosystems.",
    "Apple’s conference call to discuss fourth fiscal quarter results and business updates is scheduled for Thursday, November 2, 2023 at 2:00 p.m. PT / 5:00 p.m. ET.",
    "Shakespeare's works, like 'Hamlet' and 'A Midsummer Night's Dream,' endure in literature."
]

# Get document embeddings from Nugen API
doc_embds = get_nugen_embeddings(documents, model="nugen-flash-embed", dimensions=123)
if doc_embds:
    for i, embd in enumerate(doc_embds):
        print(f"Embedding for document {i + 1}: {embd}")
else:
    print("Failed to retrieve document embeddings")

# Example query
query = "When is Apple's conference call scheduled?"

# Get the embedding for the query
query_embd = get_nugen_embeddings([query], model="nugen-flash-embed", dimensions=123)
if query_embd:
    query_embd = query_embd[0]
    print(f"Query embedding retrieved: {query_embd[:5]}")
else:
    print("Failed to retrieve query embedding")

Embedding for document 1: [0.15576171875, 0.11444091796875, -0.329833984375, -0.048736572265625, 0.10955810546875, -0.0743408203125, -0.0849609375, -0.0210418701171875, 0.13037109375, -0.050994873046875, 0.1083984375, 0.0172271728515625, 0.2490234375, -0.0007581710815429688, 0.0289764404296875, 0.038604736328125, -0.1820068359375, -0.0283203125, 0.042999267578125, 0.16162109375, -0.107666015625, -0.0022602081298828125, 0.01082611083984375, 0.020477294921875, 0.10162353515625, 0.189208984375, -0.00386810302734375, 0.168701171875, -0.03985595703125, -0.053497314453125, -0.00550079345703125, -0.239501953125, 0.049774169921875, 0.0521240234375, 0.0753173828125, -0.0865478515625, 0.11883544921875, 0.170654296875, 0.09075927734375, 0.009124755859375, -0.017822265625, 0.07745361328125, 0.09942626953125, -0.0926513671875, 0.12744140625, -0.08514404296875, 0.053924560546875, 0.0197601318359375, -0.0606689453125, -0.11737060546875, -0.01116943359375, -0.0268707275390625, -0.06787109375, -0.07818

The embeddings returned will be numerical vectors representing the documents, which we can use for semantic similarity comparisons.

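**Get embeddings from the Nugen API**

A response from the embeddings endpoint looks like this; each entry in `data` holds one embedding and the index of the input text it belongs to: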


        {
          "object": "list",
          "data": [
            {
              "embedding": [0.02012746, 0.01957859, ...],
              "index": 0
            },
            {
              "embedding": [0.01429677, 0.03077182, ...],
              "index": 1
            }
          ],
          "model": "nugen-flash-embed",
          "usage": {
            "total_tokens": 10
          }
        }
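As a rough sketch of how a response of this shape could be unpacked, the hypothetical helper `extract_embeddings` below collects the vectors in input order. Sorting by `index` is a defensive assumption (the source does not say whether entries can arrive out of order), and the two-value vectors in `sample_response` are shortened placeholders, not real API output.

```python
def extract_embeddings(response_json):
    """Return the embedding vectors ordered by their "index" field."""
    entries = sorted(response_json["data"], key=lambda entry: entry["index"])
    return [entry["embedding"] for entry in entries]

# Hand-written sample mirroring the response shape above (vectors shortened)
sample_response = {
    "object": "list",
    "data": [
        {"embedding": [0.01429677, 0.03077182], "index": 1},
        {"embedding": [0.02012746, 0.01957859], "index": 0},
    ],
    "model": "nugen-flash-embed",
    "usage": {"total_tokens": 10},
}

vectors = extract_embeddings(sample_response)
print("Number of embeddings:", len(vectors))
print("Tokens used:", sample_response["usage"]["total_tokens"])
```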

**Nearest Neighbor Search Using Nugen Embeddings**

Now, let's use Nugen embeddings to find the most similar documents to a given query. This process is called nearest neighbor search and helps with semantic retrieval. Here’s how it works:

* We take an example query and convert it into an embedding (a set of numbers that represent the meaning of the text).
* Then, we compare this query embedding with each document embedding using cosine similarity. The higher the similarity score, the closer the document is in meaning to the query.

Embed the query and compute similarities:

In [4]:

query = "When is Apple's conference call scheduled?"

# Get the embedding for the query
query_embd = get_nugen_embeddings([query], model="nugen-flash-embed", dimensions=123)
if query_embd:
    query_embd = query_embd[0]
else:
    print("Failed to retrieve query embedding")

# Compute the similarity
# Nugen embeddings are normalized to length 1, therefore dot-product is the same as cosine similarity
if doc_embds and query_embd:
    doc_embds_np = np.array(doc_embds)
    query_embd_np = np.array(query_embd)
    similarities = np.dot(doc_embds_np, query_embd_np)
    retrieved_id = np.argmax(similarities)
    print(f"The most relevant document is: {documents[retrieved_id]}")
else:
    print("Failed to compute similarities")


The most relevant document is: Apple’s conference call to discuss fourth fiscal quarter results and business updates is scheduled for Thursday, November 2, 2023 at 2:00 p.m. PT / 5:00 p.m. ET.


**In this code:**

1. We send the query to Nugen's API to get its embedding.
2. We calculate the cosine similarity between the query embedding and the document embeddings using np.dot.
3. The document with the highest similarity score is identified and printed.

This approach allows us to use Nugen's powerful embeddings to perform semantic search across a set of documents.
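The same idea extends from the single best match to a ranked list. The sketch below uses a hypothetical helper, `top_k_similar`, that normalizes the vectors explicitly before taking dot products, so it does not rely on the embeddings already being unit length. The two-dimensional vectors are toy values chosen for illustration; in practice you would pass `query_embd` and `doc_embds` from the cells above.

```python
import numpy as np

def top_k_similar(query_vec, doc_vecs, k=3):
    """Return (document index, cosine similarity) pairs for the k best matches."""
    docs = np.asarray(doc_vecs, dtype=float)
    query = np.asarray(query_vec, dtype=float)
    # Normalize rows and query so the dot product equals cosine similarity
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = docs @ query
    # argsort is ascending, so reverse it and keep the first k indices
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors, made up for illustration (not real embeddings)
toy_docs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
toy_query = [1.0, 0.1]

for doc_id, score in top_k_similar(toy_query, toy_docs, k=2):
    print(f"document {doc_id}: similarity {score:.3f}")
```

Returning scores alongside indices makes it easy to apply a minimum-similarity threshold before showing results.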