
Implementing a Semantical cache with FAISS in a RAG System #53

Merged
14 commits merged into huggingface:main on Mar 11, 2024

Conversation

peremartra
Contributor

Sample showing how to implement a Semantical Cache in a RAG system to avoid unnecessary requests.

In the notebook, in addition to providing an example of a very simple semantic cache class, I explain that in a larger project, it is common to establish multiple layers of semantic cache depending on user characteristics, in order to achieve better data grouping for cache maintenance.

Thanks for reviewing it @MKhalusova


@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@aymeric-roucher Mar 5, 2024


In this paragraph:

The semantic comparison will be performed using the Euclidean distance of question embeddings. This is because, semantically, "What is the capital of France?" is essentially the same as "Tell me the name of the capital of France?"

I feel like the two sentences do not belong together: the "This is because" does not make sense to me.

Is it possible to reformulate this? Also, why do you use Euclidean distance rather than the widely used cosine distance?



Contributor Author


I've modified the introduction:

In this notebook, we will explore a typical RAG solution where we will utilize an open-source model and the vector database Chroma DB. However, we will integrate a semantic cache system that will store various user queries and decide whether to generate the prompt enriched with information from the vector database or the cache.

A semantic caching system aims to identify similar or identical user requests. When a matching request is found, the system retrieves the corresponding information from the cache, reducing the need to fetch it from the original source.

Since the comparison takes into account the semantic meaning of the requests, they don't have to be identical for the system to recognize them as the same question. They can be formulated differently or contain inaccuracies, whether typographical or in sentence structure, and the system can still identify that the user is actually requesting the same information.

For instance, queries like "What is the capital of France?", "Tell me the name of the capital of France?", and "What The capital of France is?" all convey the same intent and should be identified as the same question.

While the model's response may differ based on the request for a concise answer in the second example, the information retrieved from the vector database should be the same. This is why I'm placing the cache system between the user and the vector database, not between the user and the Large Language Model.

_____

I've used Euclidean distance instead of Cosine because, as far as I know, it's the default distance metric used by Faiss. It can also be calculated with Cosine, but I believe using it introduces added complexity that may not contribute significantly to the explanation.

facebookresearch/faiss#95
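For illustration, here is a rough sketch of the lookup flow described above, with the cache sitting between the user and the vector database. The names (`ask`, `retrieve_from_chroma`, `euclidean_threshold`) and the threshold value are assumptions for this sketch, not the notebook's exact code.

```python
import numpy as np

def retrieve_from_chroma(question):
    # Hypothetical stand-in for the real vector-database lookup (Chroma query + prompt enrichment).
    return f"<answer retrieved from the vector database for: {question}>"

def ask(question, encoder, index, cache, euclidean_threshold=0.35):
    """Return a cached answer when a semantically similar question exists,
    otherwise fall back to the vector database and cache the new result."""
    embedding = np.asarray(encoder.encode([question]), dtype="float32")
    if index.ntotal > 0:
        D, I = index.search(embedding, k=1)       # distance and position of the closest cached query
        if D[0][0] <= euclidean_threshold:        # close enough: cache hit, skip the vector database
            return cache["answers"][I[0][0]]
    answer = retrieve_from_chroma(question)       # cache miss: query the vector database
    index.add(embedding)                          # remember this query for future hits
    cache["questions"].append(question)
    cache["answers"].append(answer)
    return answer
```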

Collaborator


I think this justification about why to use the Euclidean distance is perfectly sound; you should put it somewhere in the notebook!

Contributor Author


Hi, what about this text:

I've used Euclidean distance instead of Cosine, which is widely employed in vector comparisons. This choice is based on the fact that Euclidean distance is the default metric used by Faiss. Although Cosine distance can also be calculated, doing so adds complexity that may not significantly contribute to the final result.
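For illustration, a minimal sketch of the extra step that cosine would require with Faiss: L2-normalizing the embeddings and searching an inner-product index instead of the default L2 one. The dimensionality and data below are made up.

```python
import faiss
import numpy as np

d = 384                                          # embedding dimensionality (assumed)
embeddings = np.random.rand(10, d).astype("float32")

index_l2 = faiss.IndexFlatL2(d)                  # default choice: exact Euclidean (L2) search
index_l2.add(embeddings)

index_cosine = faiss.IndexFlatIP(d)              # inner product equals cosine on unit-length vectors
faiss.normalize_L2(embeddings)                   # the extra normalization step mentioned above
index_cosine.add(embeddings)
```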

Collaborator

@aymeric-roucher Mar 5, 2024


We can assume the reader already knows pandas and numpy if they're advanced enough to be building a nontrivial RAG system, so I'd remove this explanation.



Contributor Author


removed

Collaborator

@aymeric-roucher Mar 5, 2024


Use backticks (`) to format variables as code: `MAX_ROWS`.

Also, MAX_ROWS is not reused later. So maybe remove this reference, or replace it with MAX_NEWS?



Contributor Author


Replaced with MAX_NEWS

Contributor Author


Sorry, MAX_ROWS is the correct one

Collaborator

@aymeric-roucher Mar 5, 2024


Line #2.    subset_data = data.head(MAX_NEWS)

Replace with MAX_ROWS?



Contributor Author


replaced with MAX_ROWS

Collaborator

@aymeric-roucher Mar 5, 2024


  • Typo: In metadatas, we can informa a list of topics.
  • "In the document we store the big text, it's a different column in each Dataset." This sentence does not make sense to me
  • Using the function add we need to inform, at least documents, metadatas and ids. => Correct the comma placement => "When using the function add we need to inform at least documents, metadatas, and ids"



Contributor Author


Reformulated:

We are now ready to add the data to the collection using the add function. This function requires three key pieces of information (a minimal sketch follows the list below):

* In the **document** we store the content of the Answer column in the Dataset.

* In **metadatas**, we can provide a list of topics. I used the value in the column qtype.

* In **id** we need to provide a unique identifier for each row. I'm creating the ID using the range of MAX_ROWS.
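A minimal sketch of that call, assuming a DataFrame with the Answer and qtype columns described above; the stand-in data and the collection name are illustrative, not the notebook's exact code.

```python
import chromadb
import pandas as pd

# Stand-in for the notebook's DataFrame: only the Answer and qtype columns matter here.
data = pd.DataFrame({
    "Answer": ["First long answer text...", "Second long answer text..."],
    "qtype": ["topic_a", "topic_b"],
})
MAX_ROWS = 2

chroma_client = chromadb.Client()
collection = chroma_client.create_collection(name="news_collection")  # illustrative name

subset_data = data.head(MAX_ROWS)
collection.add(
    documents=subset_data["Answer"].tolist(),                 # the long answer text goes in documents
    metadatas=[{"qtype": q} for q in subset_data["qtype"]],   # topic stored as metadata
    ids=[f"id{i}" for i in range(len(subset_data))],          # one unique identifier per row
)
```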

Collaborator


Thank you! Very clear now!

Collaborator

@aymeric-roucher Mar 5, 2024


Please add links to sources or explain more about FlatLS, the Faiss library, and the reasons to pick HNSW or IVF.



Contributor Author


The init_cache() function below initializes the semantic cache.

It employs the FlatL2 index, which might not be the fastest but is ideal for small datasets. Depending on the characteristics of the data intended for the cache and the expected dataset size, another index such as HNSW or IVF could be utilized.

I chose this index because it aligns well with the example. It can be used with vectors of high dimensions, consumes minimal memory, and performs well with small datasets.

I outline the key features of the various indices available with Faiss.

  • FlatL2 or FlatIP. Well-suited for small datasets; it may not be the fastest, but its memory consumption is not excessive.
  • LSH. It works effectively with small datasets and is recommended for use with vectors of up to 128 dimensions.
  • HNSW. Very fast but demands a substantial amount of RAM.
  • IVF. Works well with large datasets without consuming much memory or compromising performance.

More information about the different indices available with Faiss can be found at this link: https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index
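For illustration, a hedged sketch of what an init_cache-style setup with a flat L2 index could look like; the sentence-transformers model name and the cache layout are assumptions, not the notebook's exact code.

```python
import faiss
from sentence_transformers import SentenceTransformer

def init_cache(embedding_dim: int = 384):
    """Create an exact (flat) L2 index, an encoder, and an empty cache store."""
    index = faiss.IndexFlatL2(embedding_dim)           # exact Euclidean search, fine for a small cache
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model producing 384-dim embeddings
    cache = {"questions": [], "answers": []}
    return index, encoder, cache

index, encoder, cache = init_cache()
```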

Collaborator

@aymeric-roucher Mar 5, 2024


Line #4.                print('Index trained')

Poor indentation



Collaborator


Please correct the indentation!

Contributor Author


Oops... corrected.

@aymeric-roucher
Collaborator

@peremartra thanks a lot for this PR!

I feel like there are a few things to do before merging:

  • Check the wording, spelling, and grammar everywhere
  • Add a more narrative introduction: why would anyone want to implement semantic caching in the first place? What could be the potential pitfalls if poorly set up? Start from a user pain point! ("I have millions of requests overloading my RAG system, how can I alleviate my suffering?")
  • Make the layout more visual: use code highlighting for your variables, make explanation paragraphs more readable, add numbers to different parts.

This will really be a great addition to our cookbook! 🔥

cc @MKhalusova

@peremartra
Contributor Author


@aymeric-roucher Thank you very much for the corrections. I just uploaded a new version; I believe I've modified everything and reviewed the notebook. I hope that if everything isn't perfect, at least I've taken a big step in the right direction!

Contributor

@MKhalusova left a comment


The new recipe looks great, thanks for working on it! Once @aymeric-roucher approves we can merge.

Collaborator

@aymeric-roucher Mar 6, 2024


Line #30.                      print(f'{D[0][0]} smaller than {self.euclidean_threshold}')

Please format all times in seconds as f"{number:.3f}" to make them more readable!
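For illustration, the requested formatting applied to an elapsed time; the variable names are made up.

```python
import time

start = time.time()
# ... cache lookup or Chroma query would run here ...
elapsed = time.time() - start
print(f"Time taken: {elapsed:.3f} seconds")
```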



@aymeric-roucher
Collaborator

Only a few more minor fixes and we'll be good to go!

@MKhalusova
Contributor

Can we merge now @aymeric-roucher ?

@MKhalusova
Contributor

I think the notebook is ready to be merged.

@MKhalusova MKhalusova merged commit 5749703 into huggingface:main Mar 11, 2024
1 check passed