In [1]:
import json
import numpy as np
import requests
from IPython.display import JSON

# Embedding API Endpoint

This tutorial demonstrates how to generate embedding vectors with the llama.cpp HTTP server (`llama-server`).


## Quick Start

### Download an embedding model

In [33]:
%%bash

MODEL_URL=https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF/resolve/main/nomic-embed-text-v1.5.Q4_K_M.gguf
curl --location --output ../models/nomic-embed-text-v1.5.Q4_K_M.gguf "$MODEL_URL"


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1142  100  1142    0     0   2517      0 --:--:-- --:--:-- --:--:--  2520
100 80.2M  100 80.2M    0     0  11.0M      0  0:00:07  0:00:07 --:--:-- 14.7M


### Start your embedding server

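To use the `/embedding` API endpoint, start the server with the `--embedding` option and without the `--reranking` option. Open a new terminal and run the following command to start a server with the embedding endpoint enabled. The model path in the command assumes the terminal is opened one directory above this notebook, since the download cell saved the model to `../models/`.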

```bash
MODEL="./models/nomic-embed-text-v1.5.Q4_K_M.gguf"
llama-server \
    --model "$MODEL" \
    --host localhost \
    --port 8081 \
    --embedding
```
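
The server may not accept requests the instant the command returns, so it can help to wait until the `/health` endpoint responds before sending embedding requests. The following is a minimal sketch of that idea using only the standard library. The function names `server_is_healthy` and `wait_for_server`, the retry count, and the delay are illustrative assumptions, not part of llama.cpp.

```python
import time
import urllib.error
import urllib.request


def server_is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status
        # (urllib raises HTTPError, a URLError subclass, for 4xx/5xx).
        return False


def wait_for_server(
    url: str = "http://localhost:8081/health",
    attempts: int = 30,
    delay: float = 1.0,
    probe=server_is_healthy,
) -> bool:
    """Call `probe(url)` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_server()` with no arguments polls the host and port used in the command above and returns `False` if the server never becomes reachable. The `probe` parameter exists only so the retry loop can be exercised without a running server.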


## Health Check

In [34]:
response = requests.get("http://localhost:8081/health")

print(response.content)

b'{"status":"ok"}'


## Basic Example

In [35]:
response = requests.post(
    url="http://localhost:8081/embedding",
    json={
        "content": "Why is the sky blue?",
    }
)

In [36]:
json_data = json.loads(response.content)
JSON(json_data)

<IPython.core.display.JSON object>
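
The request and parsing steps above can be wrapped in a small helper that also sets a timeout and fails loudly on an HTTP error instead of parsing an error body. This is a sketch under stated assumptions: `request_embedding` is a hypothetical name, the 30-second timeout is arbitrary, and the helper returns the parsed JSON unchanged without assuming anything further about its layout.

```python
from typing import Any


def request_embedding(
    session: Any,
    text: str,
    base_url: str = "http://localhost:8081",
    timeout: float = 30.0,
) -> Any:
    """POST `text` to the /embedding endpoint and return the parsed JSON body.

    `session` is any object with a requests-style `post` method, for example
    `requests.Session()` or the `requests` module itself.
    """
    response = session.post(
        url=f"{base_url}/embedding",
        json={"content": text},
        timeout=timeout,
    )
    # Raise on 4xx/5xx responses rather than continuing with an error payload.
    response.raise_for_status()
    return response.json()
```

In this notebook, `json_data = request_embedding(requests, "Why is the sky blue?")` would stand in for the two cells above, since `requests` is already imported.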

In [37]:
embedding_arr = np.array(json_data["embedding"])
embedding_arr

array([ 1.96666643e-02,  7.20226467e-02, -1.48576662e-01,  7.55144237e-03,
        2.41134465e-02,  9.73183587e-02,  6.35541510e-03,  1.30311782e-02,
        1.52380746e-02, -1.89530551e-02,  4.66942638e-02,  6.17050454e-02,
        8.51472095e-02,  8.56138691e-02,  2.68520433e-02,  2.45119948e-02,
       -1.84771493e-02, -3.35437804e-02,  6.39470443e-02, -2.11282186e-02,
       -5.51399589e-02, -3.67782004e-02, -3.41663603e-03, -4.05803286e-02,
        5.31890355e-02,  2.66594663e-02,  4.12533395e-02,  3.00105219e-03,
        5.72125614e-03, -2.33889259e-02,  6.08063266e-02,  5.27451281e-03,
        2.79865433e-02, -5.12442552e-02,  4.72332276e-02, -5.50679415e-02,
        7.93430358e-02,  2.80700866e-02,  1.29281264e-02, -2.63475515e-02,
        1.74362038e-03, -2.85455156e-02, -1.56645868e-02,  4.09843726e-03,
        2.57496201e-02, -3.75840142e-02,  2.03848723e-02,  1.93328131e-02,
       -1.57258399e-02, -4.09297608e-02, -3.22022401e-02,  6.86198026e-02,
       -2.45432090e-03, -