# Embeddings and document Q&A

In [None]:
%pip install --quiet -U llama-index-vector-stores-chroma llama-index sentence-transformers sentencepiece InstructorEmbedding pydantic llama-index-embeddings-huggingface llama-index-embeddings-instructor




# Embeddings

Computers only know how to talk in numbers, so embeddings **convert text to numbers**. I've already written [a lot of words about embeddings](https://investigate.ai/text-analysis/word-embeddings/), which will hopefully help you understand them.

For example, let's see what "cat" turns into.

In [2]:
from sentence_transformers import SentenceTransformer
sentences = ["cat"]

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embeddings = model.encode(sentences)
print(embeddings[0][:25])

[ 0.03733034  0.05116179 -0.00030603  0.0602099  -0.11749439 -0.01423008
  0.10577617  0.02678619  0.02633773 -0.02570083 -0.02349044 -0.05955525
 -0.03021392  0.01632024 -0.02907015 -0.02168977 -0.06624995  0.00185666
 -0.02400625 -0.02846258 -0.04663163  0.04970476  0.00308295  0.00176269
 -0.06775755]


Generating embeddings also works for **entire sentences** (or paragraphs, or longer text, although every model has a maximum input length and truncates anything beyond it).

In [3]:
from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence"]

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embeddings = model.encode(sentences)
print(embeddings[0][:50])

[ 0.0676569   0.06349587  0.04871311  0.07930495  0.03744802  0.00265279
  0.03937487 -0.00709845  0.05936147  0.03153699  0.06009806 -0.05290519
  0.04060678 -0.02593081  0.02984278  0.00112692  0.07351492 -0.05038196
 -0.12238662  0.02370284  0.02972653  0.04247687  0.02563379  0.0019952
 -0.05691909 -0.02715985 -0.03290356  0.06602491  0.11900706 -0.04587916
 -0.07262147 -0.032584    0.05234136  0.04505525  0.00825304  0.03670236
 -0.01394151  0.06539194 -0.02642725  0.00020639 -0.01366436 -0.03628098
 -0.01950433 -0.02897385  0.03942706 -0.08840908  0.00262427  0.01367138
  0.04830632 -0.03115654]


## Document similarity

Because everything is "just numbers," we can use those numbers to compare sentences. We'll use the small set of sentences below so we can see how different embedding models lead to different results.

In [4]:
import pandas as pd

sentences = [
    "Molly ate a fish",
    "Jen consumed a carp",
    "I would like to sell you a house",
    "Я пытаюсь купить дачу",  # Russian: "I am trying to buy a dacha"
    "J'aimerais vous louer un grand appartement",  # French: "I would like to rent you a large apartment"
    "This is a wonderful investment opportunity",
    "write some more sentences 1",
    "write some more sentences 2",
    "write some more sentences 3",
    "write some more sentences 4",
]

In [5]:
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embeddings = model.encode(sentences)

In [6]:
from sklearn.metrics.pairwise import cosine_similarity

# Compute the cosine similarity between every pair of sentence embeddings
similarities = cosine_similarity(embeddings)

# Turn into a dataframe
pd.DataFrame(similarities,
            index=sentences,
            columns=sentences) \
            .style \
            .background_gradient(axis=None)

| | Molly ate a fish | Jen consumed a carp | I would like to sell you a house | Я пытаюсь купить дачу | J'aimerais vous louer un grand appartement | This is a wonderful investment opportunity | write some more sentences 1 | write some more sentences 2 | write some more sentences 3 | write some more sentences 4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Molly ate a fish | 1.0 | 0.526053 | 0.025476 | 0.098335 | 0.020435 | -0.065292 | 0.149934 | 0.137241 | 0.165194 | 0.140539 |
| Jen consumed a carp | 0.526053 | 1.0 | 0.044178 | 0.035044 | -0.018194 | -0.004439 | 0.117251 | 0.13183 | 0.13493 | 0.139572 |
| I would like to sell you a house | 0.025476 | 0.044178 | 1.0 | 0.154773 | 0.083555 | 0.386736 | 0.175895 | 0.174119 | 0.151783 | 0.169388 |
| Я пытаюсь купить дачу | 0.098335 | 0.035044 | 0.154773 | 1.0 | 0.159518 | 0.064379 | 0.242579 | 0.247275 | 0.225213 | 0.251053 |
| J'aimerais vous louer un grand appartement | 0.020435 | -0.018194 | 0.083555 | 0.159518 | 1.0 | 0.032253 | 0.068274 | 0.080694 | 0.063166 | 0.086606 |
| This is a wonderful investment opportunity | -0.065292 | -0.004439 | 0.386736 | 0.064379 | 0.032253 | 1.0 | 0.101699 | 0.111003 | 0.100311 | 0.123455 |
| write some more sentences 1 | 0.149934 | 0.117251 | 0.175895 | 0.242579 | 0.068274 | 0.101699 | 1.0 | 0.940995 | 0.922456 | 0.914263 |
| write some more sentences 2 | 0.137241 | 0.13183 | 0.174119 | 0.247275 | 0.080694 | 0.111003 | 0.940995 | 1.0 | 0.924031 | 0.918736 |
| write some more sentences 3 | 0.165194 | 0.13493 | 0.151783 | 0.225213 | 0.063166 | 0.100311 | 0.922456 | 0.924031 | 1.0 | 0.925028 |
| write some more sentences 4 | 0.140539 | 0.139572 | 0.169388 | 0.251053 | 0.086606 | 0.123455 | 0.914263 | 0.918736 | 0.925028 | 1.0 |
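
The `cosine_similarity` call boils down to a small formula: the dot product of two vectors divided by the product of their lengths. Here is a minimal sketch in plain Python, assuming made-up three-number vectors (`fish`, `carp`, and `house` are hypothetical stand-ins, not real model output):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings", invented for illustration.
# Real embeddings have hundreds of dimensions.
fish = [0.9, 0.1, 0.0]
carp = [0.8, 0.2, 0.1]
house = [0.0, 0.2, 0.9]

print(round(cosine(fish, carp), 3))   # similar direction, high score
print(round(cosine(fish, house), 3))  # different direction, low score
```

Vectors that point in nearly the same direction score close to 1, while unrelated ones land near 0.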


In [7]:
model = SentenceTransformer('sentence-transformers/distiluse-base-multilingual-cased-v2')
embeddings = model.encode(sentences)


In [8]:
from sklearn.metrics.pairwise import cosine_similarity

# Compute similarities exactly the same as we did before!
similarities = cosine_similarity(embeddings)

# Turn into a dataframe
pd.DataFrame(similarities,
            index=sentences,
            columns=sentences) \
            .style \
            .background_gradient(axis=None)

| | Molly ate a fish | Jen consumed a carp | I would like to sell you a house | Я пытаюсь купить дачу | J'aimerais vous louer un grand appartement | This is a wonderful investment opportunity | write some more sentences 1 | write some more sentences 2 | write some more sentences 3 | write some more sentences 4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Molly ate a fish | 1.0 | 0.358347 | 0.05834 | 0.145439 | -0.024103 | -0.070145 | 0.073381 | 0.074465 | 0.084661 | 0.073484 |
| Jen consumed a carp | 0.358347 | 1.0 | 0.059195 | 0.190241 | -0.001941 | -0.024359 | 0.104019 | 0.105506 | 0.101851 | 0.088226 |
| I would like to sell you a house | 0.05834 | 0.059195 | 1.0 | 0.418692 | 0.642746 | 0.081795 | 0.130625 | 0.127598 | 0.122881 | 0.125155 |
| Я пытаюсь купить дачу | 0.145439 | 0.190241 | 0.418692 | 1.0 | 0.351605 | 0.120679 | 0.057963 | 0.053296 | 0.055026 | 0.054982 |
| J'aimerais vous louer un grand appartement | -0.024103 | -0.001941 | 0.642746 | 0.351605 | 1.0 | 0.203307 | 0.151041 | 0.13572 | 0.126648 | 0.121582 |
| This is a wonderful investment opportunity | -0.070145 | -0.024359 | 0.081795 | 0.120679 | 0.203307 | 1.0 | -0.05942 | -0.049101 | -0.059882 | -0.060564 |
| write some more sentences 1 | 0.073381 | 0.104019 | 0.130625 | 0.057963 | 0.151041 | -0.05942 | 1.0 | 0.840774 | 0.812648 | 0.77632 |
| write some more sentences 2 | 0.074465 | 0.105506 | 0.127598 | 0.053296 | 0.13572 | -0.049101 | 0.840774 | 1.0 | 0.850235 | 0.793673 |
| write some more sentences 3 | 0.084661 | 0.101851 | 0.122881 | 0.055026 | 0.126648 | -0.059882 | 0.812648 | 0.850235 | 1.0 | 0.836848 |
| write some more sentences 4 | 0.073484 | 0.088226 | 0.125155 | 0.054982 | 0.121582 | -0.060564 | 0.77632 | 0.793673 | 0.836848 | 1.0 |


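Notice what changed: with the multilingual model, "I would like to sell you a house" scores 0.64 against the French apartment sentence and 0.42 against the Russian dacha sentence, up from 0.08 and 0.15 with the first model, even though the sentences share no words.

Why would you care about all of this? **Searching through your documents!** Sometimes you don't know exactly which words you're looking for, and you just want something that kind of captures a feeling.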

That's how John Keefe and Jeremy Merrill helped ICIJ navigate through over 300 GB of multilingual data in [the Luanda Leaks](https://qz.com/1786896/ai-for-investigations-sorting-through-the-luanda-leaks). Jeremy is also releasing a tool today or tomorrow; you can sneak a peek at it [over here](https://github.com/jeremybmerrill/meaningfully), or check out [Semantra](https://github.com/freedmand/semantra) by Dylan Freedman.

You can use embeddings for [general similarity clustering](https://www.commons-project.com/dockets/FDA-2019-N-5959), too!
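
Searching with embeddings is just ranking: embed the query, compare it with every document embedding, and keep the best matches. A rough sketch, assuming the vectors have already been computed (the tiny `doc_vecs` and `query_vec` below are invented for illustration; in practice they would come from something like `model.encode`, and `search` is a hypothetical helper):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, doc_vecs, top_k=2):
    """Return (document index, score) pairs for the top_k best matches."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


docs = [
    "Molly ate a fish",
    "I would like to sell you a house",
    "This is a wonderful investment opportunity",
]
# Made-up vectors standing in for real embeddings of the sentences above
doc_vecs = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.2],
    [0.0, 0.5, 0.8],
]
# Made-up vector standing in for the embedding of a property-related query
query_vec = [0.0, 1.0, 0.1]

results = search(query_vec, doc_vecs, top_k=2)
for i, score in results:
    print(f"{score:.3f}  {docs[i]}")
```

Nothing here depends on shared keywords: a document ranks highly as long as its vector points in a similar direction to the query's.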

# Retrieval-augmented generation/document-based Q&A

These days everyone loves to search across documents. Let's see how that works with embeddings!

In [9]:
import os
# Placeholder: replace with your own OpenAI API key
os.environ['OPENAI_API_KEY'] = 'XXXXXXXX'

In [10]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

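# Load every file in the "documents" folder
documents = SimpleDirectoryReader("documents").load_data()
# Split the documents into chunks, embed each chunk, and store the vectors in an in-memory index
index = VectorStoreIndex.from_documents(documents)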

In [11]:
query_engine = index.as_query_engine()
response = query_engine.query("Who was the red cow?")
print(response)

The red cow was the one who saw the one-eyed girl spying on Ferkó in the stable and warned her not to reveal who gave Ferkó food, threatening her with dire consequences if she did.


In [12]:
query_engine = index.as_query_engine()
response = query_engine.query("Why was the red cow helping Ferko?")
print(response)

The red cow was helping Ferko because it wanted to protect him from the dangerous animals that were threatening him.


What happens if we run it again?

In [13]:
query_engine = index.as_query_engine()
response = query_engine.query("Why was the red cow helping Ferko?")
print(response)

The red cow was helping Ferko because it instructed him to blow a whistle if he needed help, promising to return immediately. This shows that the red cow was looking out for Ferko's safety and well-being, ready to assist him whenever he needed help or was in danger.


What if we want to provide more context? By default the query engine only retrieves the 2 most relevant chunks of your documents; `similarity_top_k` raises that number.

In [14]:
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("Why was the red cow helping Ferko?")
print(response)

The red cow was helping Ferko because it was actually his mother who had transformed into the cow to nourish him. She saw how poorly he was being treated and wanted to provide him with proper food and care without revealing her true identity to anyone.
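
Roughly speaking, `similarity_top_k` decides how many retrieved chunks get pasted into the prompt that is sent to the language model. The sketch below illustrates the idea with a hypothetical `build_prompt` helper and made-up chunks and scores; it is not LlamaIndex's actual prompt template or code:

```python
def build_prompt(question, scored_chunks, top_k=2):
    """Keep the top_k highest-scoring chunks and paste them into a prompt."""
    best = sorted(scored_chunks, key=lambda chunk: chunk[1], reverse=True)[:top_k]
    context = "\n---\n".join(text for text, _score in best)
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Using only the context above, answer the question.\n"
        f"Question: {question}\n"
    )


# Placeholder chunk texts and similarity scores, invented for illustration
chunks = [
    ("chunk A text", 0.62),
    ("chunk B text", 0.81),
    ("chunk C text", 0.35),
]

prompt = build_prompt("Why was the red cow helping Ferko?", chunks, top_k=2)
print(prompt)
```

More chunks give the model more to work with, at the cost of a longer (and pricier) prompt.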


In [15]:
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("As a landlord, can I discriminate against poor people?")
print(response)

As a landlord, you cannot discriminate against individuals based on their lawful source of income, including those who receive public assistance or vouchers. Discriminating against individuals because they are poor or receive subsidies is against the law, as protected under the NYC Human Rights Law.


In [16]:
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("Can I take my employees tips? I run the restaurant, I deserve them.")
print(response)

No, as an employer, you are not entitled to take your employees' tips. Employees who regularly receive tips as part of their job are entitled to keep their tips, and employers are required to ensure that the employees' hourly tips and wages add up to at least the normal minimum wage.


**Easy citations**

In [17]:
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("""
Can I take my employees tips? I run the restaurant, I deserve them. 
Cite filenames and page numbers where you retrieved information.
""")
print(response)

Employers are not allowed to take their employees' tips. If employees regularly receive tips as part of their job, the employer must ensure that the employees receive at least the normal minimum wage when combining their hourly tips and wage. Employers are required to keep a weekly record of the amount of tips earned by each employee. Additionally, no tip credit is permitted for fast food employees. This information can be found on page 3 of the document "Wages and pay _ New York State Attorney General.pdf."


**Sources**

In [18]:
for node in response.source_nodes:
    print("-----")
    text_fmt = node.node.get_content().strip().replace("\n", " ")[:1000]
    print(f"Text:\t {text_fmt} ...")
    print(f"Metadata:\t {node.node.metadata}")
    print(f"Score:\t {node.score:.3f}")


-----
Text:	 long people under age 18 can work, and for some workers, such as drivers, safety laws limit how many hours of work are permitted. For more information, contact the New York State Department of Labor at 1- 800-662-1220 or visit , or contact the U.S. Department of Labor at 1-866-4-USWAGE or visit . If I receive tips, how much is my employer required to pay me? Your employer may be able to pay you less than the normal minimum wage if all of the factors below are met: You regularly receive tips as part of your job. Your hourly tips and wage add up to at least the normal minimum wage. Your employer informed you that it is paying you less than the minimum wage because you regularly receive tips. Your employer keeps a weekly record of the amount of tips you earned each week. Tips can come in the form of cash, check, credit card, or any other form of payment. No tip credit is permitted for fast food employees. The minimum wage and maximum allowable tip credit for food service work
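
The citation inside the answer text is written by the language model, so it is worth cross-checking against the retrieved nodes themselves. Here is a small sketch with a hypothetical `format_sources` helper. It assumes each metadata dictionary has `file_name` and `page_label` keys, which may differ depending on how your documents were loaded, and the sample metadata (including `example.pdf`) is made up:

```python
def format_sources(metadata_list):
    """Turn metadata dictionaries into a de-duplicated list of citations."""
    citations = []
    for metadata in metadata_list:
        # Assumed keys; fall back to placeholders when they are missing
        name = metadata.get("file_name", "unknown file")
        page = metadata.get("page_label", "?")
        citation = f"{name}, page {page}"
        if citation not in citations:
            citations.append(citation)
    return citations


# Made-up metadata in the shape this sketch assumes
sample_metadata = [
    {"file_name": "Wages and pay _ New York State Attorney General.pdf", "page_label": "3"},
    {"file_name": "Wages and pay _ New York State Attorney General.pdf", "page_label": "3"},
    {"file_name": "example.pdf", "page_label": "1"},
]

for citation in format_sources(sample_metadata):
    print(citation)
```

With a real response you could pass in `[n.node.metadata for n in response.source_nodes]`.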

There are also plenty of [other customizations we can make](https://docs.llamaindex.ai/en/stable/getting_started/customization.html).

## Local embedding

These embeddings are currently coming from OpenAI, which means every one is a paid API call over the network, so they can be slow and expensive. They're pretty good, though! I could spill more words about them, but the [general idea](https://openai.com/blog/new-embedding-models-and-api-updates) is:

> Both of our new embedding models were trained with a technique Matryoshka Representation Learning that allows developers to trade-off performance and cost of using embeddings. Specifically, developers can shorten embeddings (i.e. remove some numbers from the end of the sequence) without the embedding losing its concept-representing properties by passing in the dimensions API parameter. For example, on the MTEB benchmark, a text-embedding-3-large embedding can be shortened to a size of 256 while still outperforming an unshortened text-embedding-ada-002 embedding with a size of 1536.
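
Shortening an embedding by hand is just slicing off the tail and rescaling what is left back to length 1. A minimal sketch, assuming a made-up 8-number vector in place of a real embedding (the `shorten` helper is hypothetical, and the trick is only expected to work well for models trained to support it):

```python
import math


def shorten(embedding, dims):
    """Keep the first `dims` numbers, then rescale the result to unit length."""
    cut = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in cut))
    return [x / norm for x in cut]


# Made-up vector standing in for a real (much longer) embedding
full = [0.5, -0.3, 0.4, 0.1, -0.2, 0.6, 0.05, -0.25]

short = shorten(full, 4)
print(len(short))
print(round(math.sqrt(sum(x * x for x in short)), 6))  # length after rescaling
```

Rescaling matters because cosine and dot-product comparisons assume the vectors are on the same scale.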

The "best" embeddings (and tools for using them) are constantly changing, and [the major leaderboard](https://huggingface.co/spaces/mteb/leaderboard) is always being upset.

In [19]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# loads BAAI/bge-small-en
# embed_model = HuggingFaceEmbedding()

# loads BAAI/bge-small-en-v1.5
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

In [20]:
embeddings = embed_model.get_text_embedding("Hello World!")
print(embeddings[:5])

[-0.003275728551670909, -0.011690821498632431, 0.041559234261512756, -0.03814810886979103, 0.024183064699172974]


In [21]:
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# loads BAAI/bge-m3, a larger multilingual embedding model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-m3")
# Use this model for every index built from here on
Settings.embed_model = embed_model

In [22]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("documents").load_data()
index = VectorStoreIndex.from_documents(documents)

In [23]:
query_engine = index.as_query_engine()
response = query_engine.query("Why was the red cow helping Ferko?")
print(response)

The red cow was helping Ferko because she was actually his deceased first queen mother who had transformed herself into a cow to nourish and protect him.


## Other things to research

RAG goes very deep and is an incredibly active field of research. You might want to look at [RAGatouille](https://github.com/bclavie/RAGatouille) and [ColBERT embeddings](https://github.com/stanford-futuredata/ColBERT) if you're interested in more recent/interesting items. Also look into **[reranking](https://docs.llamaindex.ai/en/stable/examples/node_postprocessor/CohereRerank.html)** to improve the relevance of your results.
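
As a taste of what makes ColBERT different: instead of one vector per document, it keeps one vector per token and scores a document by summing, for each query token, its best match among the document's tokens. A toy sketch of that scoring rule, assuming tiny invented token vectors (real ones come from the ColBERT model, and `maxsim_score` is a hypothetical helper name):

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best dot product with any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Invented 2-dimensional token vectors, for illustration only
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # has a good match for both query tokens
doc_b = [[1.0, 0.0], [0.9, 0.1]]              # only matches the first query token well

score_a = maxsim_score(query_tokens, doc_a)
score_b = maxsim_score(query_tokens, doc_b)
print(score_a, score_b)
```

Because every query token gets to find its own best match, a document is rewarded for covering all parts of the question rather than just its overall gist.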