<a href="https://colab.research.google.com/github/sudarshan-koirala/youtube-stuffs/blob/main/llamaindex/OpenAI_new_embeddings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# OpenAI Embeddings with LlamaIndex
## [YouTube video covering this notebook](https://youtu.be/egJgdJDjpAQ)

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [1]:
%%capture
!pip install llama-index

In [2]:
import os
# https://platform.openai.com/api-keys

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

### The earlier model, `text-embedding-ada-002` (still available)

In [3]:
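# Create an embedding with the earlier model; the API key is read from OPENAI_API_KEY set above.
# Note: on llama-index 0.10 or later, the import path is
# `from llama_index.embeddings.openai import OpenAIEmbedding`.
from llama_index.embeddings import OpenAIEmbedding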

embed_model = OpenAIEmbedding(
    model="text-embedding-ada-002",
)

embeddings = embed_model.get_text_embedding(
    "Open AI new Embeddings models is awesome."
)

print(len(embeddings))

1536


## Using OpenAI `text-embedding-3-large`, `text-embedding-3-small` and `text-embedding-ada-002`

Note: you may need to update your OpenAI client first: `pip install -U openai`

In [7]:
#!pip install -U openai

In [4]:
# Create an embedding with text-embedding-3-large (default output size)
from llama_index.embeddings import OpenAIEmbedding

embed_model = OpenAIEmbedding(model="text-embedding-3-large")

embeddings = embed_model.get_text_embedding(
    "Open AI new Embeddings models is great."
)

In [5]:
print(embeddings[:5])

[-0.011500772088766098, 0.02457442320883274, -0.01760469563305378, -0.017763426527380943, 0.029841400682926178]


In [6]:
print(len(embeddings))

3072


In [None]:
# Create an embedding with text-embedding-3-small (default output size)
from llama_index.embeddings import OpenAIEmbedding

embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
)

embeddings = embed_model.get_text_embedding(
    "Open AI new Embeddings models is awesome."
)

In [None]:
print(len(embeddings))

1536


# Change the dimension of output embeddings
- https://openai.com/blog/new-embedding-models-and-api-updates

Note: Make sure you have the latest OpenAI client

### Trade-off:

- Both new models (`text-embedding-3-small` and `text-embedding-3-large`) support a `dimensions` parameter that shortens the embeddings, trading some accuracy for smaller vectors.
- Gain: you can use the embedding model with a vector store that supports only a limited number of dimensions (for example, 512 or 1024).
- Cost: you sacrifice some accuracy, because the shorter embedding may not capture every nuance present in the full-length embedding.
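
To make the idea of a shortened embedding concrete, here is a minimal sketch of shortening a vector by hand: keep the first few values, then rescale the result to unit length. This illustrates the concept only and assumes nothing about how the `dimensions` parameter is implemented on the server side. The helper name `shorten_embedding` and the toy vector are hypothetical and stand in for a real embedding.

```python
import math


def shorten_embedding(vector, dimensions):
    """Keep the first `dimensions` values and rescale them to unit length.

    Hypothetical helper for illustration; not part of LlamaIndex or OpenAI.
    """
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    truncated = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0.0:
        # An all-zero vector cannot be normalized; return it unchanged.
        return list(truncated)
    return [x / norm for x in truncated]


# Toy 8-dimensional vector standing in for a real embedding.
toy = [0.5, -0.5, 0.5, -0.5, 0.1, 0.1, -0.1, 0.1]
short = shorten_embedding(toy, 4)
print(len(short))
```

In practice, passing `dimensions` when creating the embedding, as in the cells below, is simpler than post-processing vectors yourself.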

In [None]:
# Create a shortened embedding with text-embedding-3-large via the dimensions parameter
from llama_index.embeddings import OpenAIEmbedding


embed_model = OpenAIEmbedding(
    model="text-embedding-3-large",
    #dimensions=512,
    dimensions=1024,
)

embeddings = embed_model.get_text_embedding(
    "Open AI new Embeddings models with different dimensions is awesome."
)
print(len(embeddings))

1024


In [None]:
# Create a shortened embedding with text-embedding-3-small via the dimensions parameter
from llama_index.embeddings import OpenAIEmbedding

embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    dimensions=512,
)

embeddings = embed_model.get_text_embedding(
    "Open AI new Embeddings models is awesome."
)

print(len(embeddings))

512


# Conclusion
The `dimensions` parameter enables very flexible usage. For example, when using a vector data store that only supports embeddings of up to 1024 dimensions, developers can still use OpenAI's best embedding model, `text-embedding-3-large`, and specify a value of 1024 for the `dimensions` API parameter. This shortens the embedding from 3072 dimensions, **trading off some accuracy** in exchange for the smaller vector size.
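
As a rough sense of what the smaller vector size buys, the sketch below estimates raw storage for one million vectors at the dimension counts used in this notebook. It assumes each value is stored as a 4-byte float32 and ignores any index overhead, which varies by vector store; the function name `index_size_mb` is hypothetical.

```python
BYTES_PER_FLOAT32 = 4  # assumption: the vector store keeps values as float32


def index_size_mb(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage in megabytes (10^6 bytes), excluding index overhead."""
    return num_vectors * dimensions * bytes_per_value / 1_000_000


for dims in (3072, 1024, 512):
    print(dims, index_size_mb(1_000_000, dims))
```

Under these assumptions, going from 3072 to 1024 dimensions cuts raw vector storage to one third.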