# Embeddings Part 1
OpenAI’s text embeddings measure the relatedness of text strings. 

Embeddings are commonly used for:
- Search (where results are ranked by relevance to a query string)
- Clustering (where text strings are grouped by similarity)
- Recommendations (where items with related text strings are recommended)
- Anomaly detection (where outliers with little relatedness are identified)
- Diversity measurement (where similarity distributions are analyzed)
- Classification (where text strings are classified by their most similar label)

An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.
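To make the distance idea concrete, here is a minimal sketch that ranks candidates by their distance to a query vector. The text above does not say which distance to use; cosine distance is a common choice and is an assumption here. The vectors are toy values made up for illustration, not real model output, and the helper names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Small distance = high relatedness, large distance = low relatedness."""
    return 1.0 - cosine_similarity(a, b)

# Toy 3-dimensional vectors, made up for illustration (not real embeddings)
query = [1.0, 0.0, 0.0]
candidates = {
    "close": [0.9, 0.1, 0.0],
    "orthogonal": [0.0, 1.0, 0.0],
    "opposite": [-1.0, 0.0, 0.0],
}

# Rank candidate names from most related to least related
ranked = sorted(candidates, key=lambda name: cosine_distance(query, candidates[name]))
print(ranked)
```

The same ranking step is what a search or recommendation use case would run over real embeddings, typically inside a vector database rather than in plain Python.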

## Getting A Simple Embedding
To get an embedding, send your text string to the embeddings API endpoint along with the embedding model name. The response will contain an embedding (list of floating point numbers), which you can extract, save in a vector database, and reuse across many applications.

By default, the embedding vector has 1536 elements for text-embedding-3-small and 3072 for text-embedding-3-large. Passing the dimensions parameter shortens the embedding without it losing its concept-representing properties.

In [8]:
from openai import OpenAI
client = OpenAI()

# Create an embedding for the word "Mary" using the text-embedding-3-small model
response_small = client.embeddings.create(
    input="Mary",
    model="text-embedding-3-small"
)

# capture the embedding from the response
embedding_small = response_small.data[0].embedding

# Print the first fifty elements
print("========================================")
print("First fifty elements of the embedding:", embedding_small[:50])

# Get the count of all elements in the list
count_small = len(embedding_small)
print("Total number of elements in the embedding:", count_small)
print("========================================\n\n")

# Create an embedding for the word "Mary" using the text-embedding-3-large model
response_large = client.embeddings.create(
    input="Mary",
    model="text-embedding-3-large"
)

# Capture the embedding from the response
embedding_large = response_large.data[0].embedding

# Print the first fifty elements
print("First fifty elements of the embedding:", embedding_large[:50])

# Get the count of all elements in the list
count_large = len(embedding_large)
print("Total number of elements in the embedding:", count_large)


First fifty elements of the embedding: [0.052373871207237244, 0.008488261140882969, -0.028862453997135162, 0.037338875234127045, -0.058482576161623, -0.019699394702911377, 0.0443236380815506, 0.01948630064725876, -0.002975922543555498, -0.035231608897447586, 0.0030099586583673954, 0.00332072121091187, -0.016278045251965523, -0.009672118350863457, 0.005025476682931185, 0.02154621295630932, 0.004335879348218441, 0.020551772788167, 0.002802783390507102, 0.04226372390985489, 0.04635987430810928, 0.023298323154449463, 0.01241866871714592, 0.024363793432712555, 0.03558676689863205, -0.004063591826707125, 0.01602943427860737, 0.0005356956971809268, -0.009979921393096447, -0.020563609898090363, 0.03184577450156212, -0.02355877123773098, 0.018645761534571648, 0.010802702978253365, 0.009524136781692505, -0.016834458336234093, -0.008547453209757805, -0.0051882569678127766, -0.001015158137306571, 0.01080862246453762, 0.0394461415708065, -0.0039156097918748856, 0.0007025456288829446, 0.030519854277

## input (string or array)
Input text to embed, encoded as a string or array of tokens. To embed multiple inputs in a single request, pass an array of strings or array of token arrays. The input must not exceed the max input tokens for the model (8192 tokens for text-embedding-ada-002), cannot be an empty string, and any array must be 2048 dimensions or less. 
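The constraints above can be checked before sending a request. The following is a rough sketch, not part of the OpenAI client: `check_embedding_input` and `MAX_ARRAY_ITEMS` are hypothetical names, the sketch only handles a string or a list of strings, and it reads "2048 dimensions" as 2048 array items, which is an assumption. The token limit is not checked because that would need a tokenizer.

```python
MAX_ARRAY_ITEMS = 2048  # assumption: "2048 dimensions or less" means at most 2048 items

def check_embedding_input(value):
    """Hypothetical pre-flight check; returns a list of problems (empty if none found)."""
    problems = []
    if isinstance(value, str):
        items = [value]
    else:
        items = list(value)
        if len(items) > MAX_ARRAY_ITEMS:
            problems.append(f"array has {len(items)} items, limit is {MAX_ARRAY_ITEMS}")
    for position, item in enumerate(items):
        # Empty strings are rejected by the rule quoted above
        if isinstance(item, str) and item == "":
            problems.append(f"item {position} is an empty string")
    return problems

print(check_embedding_input("Mary"))
print(check_embedding_input(["Mary", "", "lamb"]))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a large batch at once.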

## model (string)
ID of the model to use. 

In [12]:
from openai import OpenAI
client = OpenAI()

# Create embeddings for multiple words using the text-embedding-3-small model

# Define the inputs using an array of strings
inputs = ["Mary", "had", "a", "little", "lamb"]  # This is an array of strings

# Send the request to the model
response_small = client.embeddings.create(
    input=inputs,
    model="text-embedding-3-small"
)

# Capture the embeddings from the response
embeddings_small = [item.embedding for item in response_small.data]

# Iterate through the embeddings and print the first five elements for each
print("========================================")
for i, embedding in enumerate(embeddings_small):
    print(f"First five elements of embedding {i+1}: {embedding[:5]}")

# Get the count of all elements in each embedding list
counts_small = [len(embedding) for embedding in embeddings_small]
print("Total number of elements in each embedding:", counts_small)
print("========================================\n\n")


First five elements of embedding 1: [0.05237411707639694, 0.008511978201568127, -0.02883891388773918, 0.03733905404806137, -0.05843549966812134]
First five elements of embedding 2: [-0.016209183260798454, -0.0002985325991176069, 0.012301732785999775, 0.01797427237033844, 0.05653676763176918]
First five elements of embedding 3: [0.026699936017394066, 0.008668706752359867, -0.00900365225970745, 0.026428788900375366, -0.014466452412307262]
First five elements of embedding 4: [0.026383547112345695, -0.021786311641335487, -0.004020895808935165, -0.03564542531967163, -0.0020761708728969097]
First five elements of embedding 5: [0.021510092541575432, -0.038446709513664246, -0.031512729823589325, -0.038564734160900116, -0.010275568813085556]
Total number of elements in each embedding: [1536, 1536, 1536, 1536, 1536]




## encoding_format (string)
The format to return the embeddings in. Can be either float or base64.

In [20]:
# Simple example: use the OpenAI Python client to embed a single word with text-embedding-3-small, once as floats and once as base64
import base64
from openai import OpenAI
client = OpenAI()

# Define the input as a single string
inputs = "Mary"

# Send the request to the model
response_small = client.embeddings.create(
    input=inputs,
    model="text-embedding-3-small",
    encoding_format="float",
)

# Capture the embeddings from the response
embeddings_small = [item.embedding for item in response_small.data]

# Print the first five elements for each embedding
print("============= FLOAT ===========================")
for i, embedding in enumerate(embeddings_small):
    print(f"First five elements of embedding {i+1}: {embedding[:5]}")

# Get the count of all elements in each embedding list
counts_small = [len(embedding) for embedding in embeddings_small]
print("Total number of elements in each embedding:", counts_small)
print("========================================\n\n")



# Send the request to the model for base64 encoded embeddings
response_base64 = client.embeddings.create(
    input=inputs,
    model="text-embedding-3-small",
    encoding_format="base64",
)

# Capture the base64-encoded embeddings from the response
embeddings_base64 = [item.embedding for item in response_base64.data]

# Print the first 69 characters of each base64-encoded embedding (to keep the output concise)
print("=============== BASE64 =========================")
for i, embedding in enumerate(embeddings_base64):
    print(f"Base64 encoded embedding {i+1}: {embedding[:69]}...")  # Truncated for brevity

# Get the count of all elements in each base64-encoded embedding
# base64.b64decode returns the raw bytes; each element is assumed to be a 4-byte (32-bit) float, so byte count // 4 gives the element count
counts_base64 = [len(base64.b64decode(embedding)) // 4 for embedding in embeddings_base64]
print("Total number of elements in each base64-encoded embedding:", counts_base64)
print("========================================\n\n")

First five elements of embedding 1: [0.05237387, 0.008488261, -0.028862454, 0.037338875, -0.058482576]
Total number of elements in each embedding: [1536]


Base64 encoded embedding 1: /IVWPVkSCzz0cOy8pvAYPW2Lb72gYKG8tIw1PbyhnzyyB0O7BU8QvbpCRTt1oFk7iVmFv...
Total number of elements in each base64-encoded embedding: [1536]
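The base64 string can also be turned back into numbers by hand. This is a minimal sketch, assuming the payload is a packed array of little-endian 32-bit floats; the source does not state the byte layout, but the assumption is consistent with the float output above. It decodes only the first 16 characters of the printed string, which cover exactly three values.

```python
import base64
import struct

# First 16 characters of the base64 output printed above
# (16 base64 characters -> 12 bytes -> three 4-byte floats)
prefix = "/IVWPVkSCzz0cOy8"

raw = base64.b64decode(prefix)

# Assumption: little-endian float32 values, 4 bytes each
count = len(raw) // 4
values = struct.unpack(f"<{count}f", raw)
print(values)
```

The decoded values should line up with the first three numbers of the float output for the same input. With NumPy available, `np.frombuffer(raw, dtype="<f4")` performs the same conversion on a full embedding in one call.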




In [15]:
# simple example of how to use the OpenAI Python client to create embeddings for multiple words using the text-embedding-3-small model and base64 encoding
# with an array of words
import base64
from openai import OpenAI
client = OpenAI()

# Define the inputs using an array of strings
inputs = ["Mary", "had", "a", "little", "lamb"]

# Send the request to the model
response_small = client.embeddings.create(
    input=inputs,
    model="text-embedding-3-small",
    encoding_format="float",
)

# Capture the embeddings from the response
embeddings_small = [item.embedding for item in response_small.data]

# Print the first five elements for each embedding
print("============= FLOAT ===========================")
for i, embedding in enumerate(embeddings_small):
    print(f"First five elements of embedding {i+1}: {embedding[:5]}")

# Get the count of all elements in each embedding list
counts_small = [len(embedding) for embedding in embeddings_small]
print("Total number of elements in each embedding:", counts_small)
print("========================================\n\n")



# Send the request to the model for base64 encoded embeddings
response_base64 = client.embeddings.create(
    input=inputs,
    model="text-embedding-3-small",
    encoding_format="base64",
)

# Capture the base64-encoded embeddings from the response
embeddings_base64 = [item.embedding for item in response_base64.data]

# Print the first 69 characters of each base64-encoded embedding (to keep the output concise)
print("=============== BASE64 =========================")
for i, embedding in enumerate(embeddings_base64):
    print(f"Base64 encoded embedding {i+1}: {embedding[:69]}...")  # Truncated for brevity

# Get the count of all elements in each base64-encoded embedding
# base64.b64decode returns the raw bytes; each element is assumed to be a 4-byte (32-bit) float, so byte count // 4 gives the element count
counts_base64 = [len(base64.b64decode(embedding)) // 4 for embedding in embeddings_base64]
print("Total number of elements in each base64-encoded embedding:", counts_base64)
print("========================================\n\n")


First five elements of embedding 1: [0.05237387, 0.008488261, -0.028862454, 0.037338875, -0.058482576]
First five elements of embedding 2: [-0.016156575, -0.00031287363, 0.0122825, 0.018016132, 0.056595176]
First five elements of embedding 3: [0.026696905, 0.008715566, -0.009066422, 0.026457686, -0.0144568365]
First five elements of embedding 4: [0.02638991, -0.021764595, -0.0040218653, -0.03565402, -0.002044645]
First five elements of embedding 5: [0.021542495, -0.03842237, -0.031575985, -0.038540408, -0.010291706]
Total number of elements in each embedding: [1536, 1536, 1536, 1536, 1536]


Base64 encoded embedding 1: /IVWPVkSCzz0cOy8pvAYPW2Lb72gYKG8tIw1PbyhnzyyB0O7BU8QvbpCRTt1oFk7iVmFv...
Base64 encoded embedding 2: y1qEvDAJpLmKPEk8kZaTPFjQZz1D2vK8hef7vGzrAj2OeKW8Z6CvvB8q5TwPeka8OGutP...
Base64 encoded embedding 3: eLPaPLzLDjxUixS8yr3YPF7cbLwZKTQ8dOkFvO6+CrzV5xw6TgQRvQnBqjy5Kh29NugCv...
Base64 encoded embedding 4: py/YPKpLsrzayYO78wkSvXP/BbvW1Q69t/8rvF/IBD3sbPu8RL6xvBT1DL3o3Wg8/VFnv.

## dimensions (integer)
The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.

In [22]:
from openai import OpenAI
client = OpenAI()

# Define the inputs using an array of strings
inputs = ["Mary", "had", "a", "little", "lamb"]

# Send the request to the model
response_small = client.embeddings.create(
    input=inputs,
    model="text-embedding-3-small",
    encoding_format="float",
    dimensions=2,
)

# Capture the embeddings from the response
embeddings_small = [item.embedding for item in response_small.data]

# Print up to the first five elements of each embedding (with dimensions=2 there are only two)
print("========================================")
for i, embedding in enumerate(embeddings_small):
    print(f"First five elements of embedding {i+1}: {embedding[:5]}")

# Get the count of all elements in each embedding list
counts_small = [len(embedding) for embedding in embeddings_small]
print("Total number of elements in each embedding:", counts_small)
print("========================================\n\n")

First five elements of embedding 1: [0.9871198, 0.15998302]
First five elements of embedding 2: [-0.9998125, -0.019361464]
First five elements of embedding 3: [0.9506243, 0.3103442]
First five elements of embedding 4: [0.77147484, -0.6362598]
First five elements of embedding 5: [0.48905212, -0.8722546]
Total number of elements in each embedding: [2, 2, 2, 2, 2]
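One thing the printed values suggest is that the shortened embeddings are still unit length. The sketch below checks this using only the numbers from the output above; pairing each vector with its input word assumes the response preserves input order, and the `dot` helper is a hypothetical name.

```python
import math

# The five 2-dimensional embeddings printed above, keyed by input word
# (assumes the response order matches the input order)
embeddings_2d = {
    "Mary": [0.9871198, 0.15998302],
    "had": [-0.9998125, -0.019361464],
    "a": [0.9506243, 0.3103442],
    "little": [0.77147484, -0.6362598],
    "lamb": [0.48905212, -0.8722546],
}

# Euclidean length of each vector
norms = {word: math.hypot(*vec) for word, vec in embeddings_2d.items()}
for word, norm in norms.items():
    print(f"{word!r}: length = {norm:.6f}")

def dot(a, b):
    """Dot product; for unit-length vectors this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

# Compare "Mary" against every other word
for word, vec in embeddings_2d.items():
    if word != "Mary":
        print(f"Mary vs {word!r}: {dot(embeddings_2d['Mary'], vec):.4f}")
```

Two dimensions is an extreme setting chosen to keep the output readable; these similarity values should not be read as a meaningful measure of how related the words are.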


