## Embedding Techniques

An embedding model converts text into a numeric vector, so that texts with similar meaning map to nearby vectors.

In [1]:
import os
from dotenv import load_dotenv
load_dotenv() ##load all the environment variables

True

In [2]:
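## load_dotenv() has already exported the key; this line raises a TypeError if OPENAI_API_KEY is not set
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")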

OpenAI offers several embedding models. They are listed in the OpenAI documentation:

https://platform.openai.com/docs/models/embeddings

The most capable OpenAI embedding model is text-embedding-3-large. The examples below use text-embedding-3-small, which is smaller and cheaper.

In [3]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

In [5]:
text = "This is a sample text to be embedded"

## Convert text to vector
vector = embeddings.embed_query(text)
vector

[0.03098198026418686,
 0.02501985803246498,
 0.012977159582078457,
 0.006327352486550808,
 -0.01767578534781933,
 -0.06428036838769913,
 0.006771550048142672,
 -0.03171902149915695,
 -0.007515170145779848,
 -0.008153499104082584,
 0.03832605108618736,
 -0.001720855594612658,
 -0.0206634271889925,
 -0.036430809646844864,
 0.04474882408976555,
 0.04474882408976555,
 -0.0247039832174778,
 -0.002699736040085554,
 -0.0008637181599624455,
 0.037694305181503296,
 -0.004070168826729059,
 -0.0032409995328634977,
 -0.020452845841646194,
 0.031060948967933655,
 0.016912423074245453,
 -0.009127444587647915,
 -0.011417531408369541,
 0.011634694412350655,
 0.06980817019939423,
 -0.026414968073368073,
 0.0012133183190599084,
 -0.03764165937900543,
 0.023677393794059753,
 -0.027165168896317482,
 -0.025309409946203232,
 0.04219551011919975,
 0.049671195447444916,
 0.02756001241505146,
 -0.003876037895679474,
 -0.024164365604519844,
 0.024138042703270912,
 -0.021229369565844536,
 0.017333589494228363,
 

In [None]:
## Get the dimension of the vector. By default, the text-embedding-3-small model generates 1536-dimensional vectors
len(vector)

1536

In [None]:
## Limit the dimension when creating the embeddings object
embeddings_1024 = OpenAIEmbeddings(model="text-embedding-3-small", dimensions=1024)
vector_1024 = embeddings_1024.embed_query(text)
print(len(vector_1024))
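
The OpenAI documentation describes a shortened embedding as the leading values of the full vector, rescaled to unit length. The sketch below illustrates that idea on a toy vector so it runs without an API call. `shorten_embedding` is a hypothetical helper, and it is an assumption (not verified here) that its output would match what the API returns when `dimensions` is set.

```python
import math

def shorten_embedding(vector, dimensions):
    """Keep the first `dimensions` values and rescale them to unit length."""
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    truncated = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0.0:
        # An all-zero vector cannot be normalized; return it unchanged
        return list(truncated)
    return [x / norm for x in truncated]

# Toy 4-dimensional vector standing in for a real 1536-dimensional embedding
toy_vector = [0.6, 0.0, 0.8, 0.5]
shortened = shorten_embedding(toy_vector, 3)
print(len(shortened))
```

Rescaling matters because similarity measures such as cosine similarity and dot product are usually compared on unit-length vectors.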

In [5]:
## Load, split, embed, and store
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = TextLoader('sampletext.txt')
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
texts = text_splitter.split_documents(docs)
texts[:3]

[Document(metadata={'source': 'sampletext.txt'}, page_content='Agentic AI is a class of artificial intelligence that focuses on autonomous systems that can make decisions and perform tasks with or without human intervention. The independent systems automatically respond to conditions, with procedural, algorithmic, and human-like creative steps, to produce process results. The field is closely linked to agentic automation, also known as agent-based process management systems, when applied to process automation. Applications include software development,'),
 Document(metadata={'source': 'sampletext.txt'}, page_content='Applications include software development, customer support, cybersecurity and business intelligence.'),
 Document(metadata={'source': 'sampletext.txt'}, page_content='The core concept of agentic AI is the use of AI agents to perform automated tasks with or without human intervention.[1] While robotic process automation (RPA) systems automate rule-based, repetitive tasks w
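
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch that slides a fixed-size character window over a string. It is not how `RecursiveCharacterTextSplitter` works internally: the real splitter breaks on separators such as paragraphs, lines, and words, which is why the chunks above end at word boundaries. `sliding_window_chunks` is a hypothetical helper used only for illustration.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_size <= 0 or chunk_overlap < 0:
        raise ValueError("chunk_size must be positive and chunk_overlap non-negative")
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk starts with the last three characters of the previous one, so a sentence cut at a chunk boundary still appears in context in at least one chunk.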

In [9]:
## Vector embedding and vector store
from langchain_community.vectorstores import Chroma


## Create the vector store. Chroma embeds each chunk with the supplied embeddings model and stores the resulting vectors
db = Chroma.from_documents(texts, embeddings_1024)

In [11]:
## Similarity search returns the documents most relevant to a given query, ranked by vector similarity
query = "what is the core concept of agentic ai?"
retrieved_results = db.similarity_search(query)
print(type(db))
print(retrieved_results)

<class 'langchain_community.vectorstores.chroma.Chroma'>
[Document(metadata={'source': 'sampletext.txt'}, page_content='The core concept of agentic AI is the use of AI agents to perform automated tasks with or without human intervention.[1] While robotic process automation (RPA) systems automate rule-based, repetitive tasks with fixed logic, agentic AI adapts and learns from data inputs. [2] Agentic AI refers to autonomous systems capable of pursuing complex goals with minimal human intervention, often making decisions based on continuous learning and external data. [3] Functioning agents can require various AI'), Document(metadata={'source': 'sampletext.txt'}, page_content='Agentic AI is a class of artificial intelligence that focuses on autonomous systems that can make decisions and perform tasks with or without human intervention. The independent systems automatically respond to conditions, with procedural, algorithmic, and human-like creative steps, to produce process results. The 
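
Conceptually, a similarity search embeds the query and ranks the stored vectors by how close they are to it. The sketch below does this by brute force with cosine similarity over toy three-dimensional vectors. It only illustrates the idea: Chroma uses its own index and distance metric, and `cosine_similarity`, `top_k`, and the toy data are hypothetical names and values introduced for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, stored, k):
    """Return the texts of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy store: (text, vector) pairs standing in for embedded chunks
toy_store = [
    ("chunk about agentic AI concepts", [0.9, 0.1, 0.0]),
    ("chunk about cooking", [0.0, 0.2, 0.9]),
    ("chunk about AI agents", [0.8, 0.3, 0.1]),
]
toy_query = [1.0, 0.0, 0.0]
results = top_k(toy_query, toy_store, k=2)
print(results)
```

A brute-force scan like this compares the query against every stored vector, which is fine for a handful of chunks; vector stores exist to make the same lookup practical at larger scale.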