
Implementing a Semantical cache with FAISS in a RAG System #53

Merged
14 commits merged into huggingface:main on Mar 11, 2024

Conversation

peremartra
Contributor

Sample showing how to implement a Semantical Cache in a RAG system to avoid unnecessary requests.

In the notebook, in addition to providing an example of a very simple semantic cache class, I explain that in a larger project, it is common to establish multiple layers of semantic cache depending on user characteristics, in order to achieve better data grouping for cache maintenance.

Thanks for reviewing it @MKhalusova


@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@aymeric-roucher Mar 5, 2024


In this paragraph:

The semantic comparison will be performed using the Euclidean distance of question embeddings. This is because, semantically, "What is the capital of France?" is essentially the same as "Tell me the name of the capital of France?"

I feel like the two sentences do not belong together: the "This is because" does not make sense to me.

Is it possible to reformulate this? Also, why do you use Euclidean distance rather than the widely used cosine distance?



Contributor Author


I've modified the introduction:

In this notebook, we will explore a typical RAG solution where we will utilize an open-source model and the vector database Chroma DB. However, we will integrate a semantic cache system that will store various user queries and decide whether to generate the prompt enriched with information from the vector database or the cache.

A semantic caching system aims to identify similar or identical user requests. When a matching request is found, the system retrieves the corresponding information from the cache, reducing the need to fetch it from the original source.

Since the comparison takes into account the semantic meaning of the requests, they don't have to be identical for the system to recognize them as the same question. They can be formulated differently or contain inaccuracies, whether typographical or in sentence structure, and the system can still identify that the user is actually requesting the same information.

For instance, queries like "What is the capital of France?", "Tell me the name of the capital of France?", and "What The capital of France is?" all convey the same intent and should be identified as the same question.

While the model's response may differ based on the request for a concise answer in the second example, the information retrieved from the vector database should be the same. This is why I'm placing the cache system between the user and the vector database, not between the user and the Large Language Model.

_____

I've used Euclidean distance instead of Cosine because, as far as I know, it's the default distance metric used by Faiss. It can also be calculated with Cosine, but I believe using it introduces added complexity that may not contribute significantly to the explanation.

facebookresearch/faiss#95
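For illustration, here is a rough sketch of the lookup flow described above, with the cache sitting between the user and the vector database. The names (`ask`, `retrieve_from_chroma`, `euclidean_threshold`) and the threshold value are assumptions for this sketch, not the notebook's exact code.

```python
import numpy as np

def retrieve_from_chroma(question):
    # Hypothetical stand-in for the real vector-database lookup (Chroma query + prompt enrichment).
    return f"<answer retrieved from the vector database for: {question}>"

def ask(question, encoder, index, cache, euclidean_threshold=0.35):
    """Return a cached answer when a semantically similar question exists,
    otherwise fall back to the vector database and cache the new result."""
    embedding = np.asarray(encoder.encode([question]), dtype="float32")
    if index.ntotal > 0:
        D, I = index.search(embedding, k=1)       # distance and position of the closest cached query
        if D[0][0] <= euclidean_threshold:        # close enough: cache hit, skip the vector database
            return cache["answers"][I[0][0]]
    answer = retrieve_from_chroma(question)       # cache miss: query the vector database
    index.add(embedding)                          # remember this query for future hits
    cache["questions"].append(question)
    cache["answers"].append(answer)
    return answer
```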

Collaborator


I think this justification about why to use the Euclidean distance is perfectly sound; you should put it somewhere in the notebook!

Contributor Author


Hi, what about this text:

I've used Euclidean distance instead of Cosine, which is widely employed in vector comparisons. This choice is based on the fact that Euclidean distance is the default metric used by Faiss. Although Cosine distance can also be calculated, doing so adds complexity that may not significantly contribute to the final result.
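For illustration, a minimal sketch of the extra step that cosine would require with Faiss: L2-normalizing the embeddings and searching an inner-product index instead of the default L2 one. The dimensionality and data below are made up.

```python
import faiss
import numpy as np

d = 384                                          # embedding dimensionality (assumed)
embeddings = np.random.rand(10, d).astype("float32")

index_l2 = faiss.IndexFlatL2(d)                  # default choice: exact Euclidean (L2) search
index_l2.add(embeddings)

index_cosine = faiss.IndexFlatIP(d)              # inner product equals cosine on unit-length vectors
faiss.normalize_L2(embeddings)                   # the extra normalization step mentioned above
index_cosine.add(embeddings)
```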

Collaborator

@aymeric-roucher Mar 5, 2024


We can assume the reader already knows pandas and numpy if they're advanced enough to be building a nontrivial RAG system, so I'd remove this explanation.



Contributor Author


removed

Collaborator

@aymeric-roucher Mar 5, 2024


Use backticks (`) to format variables as code: `MAX_ROWS`.

Also, MAX_ROWS is not reused later. So maybe remove this reference, or replace it with MAX_NEWS?



Contributor Author


Replaced with MAX_NEWS

Contributor Author


Sorry, MAX_ROWS is the correct one

Collaborator

@aymeric-roucher Mar 5, 2024


Line #2.    subset_data = data.head(MAX_NEWS)

Replace with MAX_ROWS?



Contributor Author


replaced with MAX_ROWS

Collaborator

@aymeric-roucher Mar 5, 2024


  • Typo: In metadatas, we can informa a list of topics.
  • "In the document we store the big text, it's a different column in each Dataset." This sentence does not make sense to me
  • Using the function add we need to inform, at least documents, metadatas and ids. => Correct the comma placement => "When using the function add we need to inform at least documents, metadatas, and ids"



Contributor Author


Reformulated:

We are now ready to add the data to the collection using the add function. This function requires three key pieces of information (a minimal sketch follows the list below):

* In the **document** we store the content of the Answer column in the Dataset.

* In **metadatas**, we can provide a list of topics. I used the value in the column qtype.

* In **id** we need to provide a unique identifier for each row. I'm creating the ID using the range of MAX_ROWS.
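A minimal sketch of that call, assuming a DataFrame with the Answer and qtype columns described above; the stand-in data and the collection name are illustrative, not the notebook's exact code.

```python
import chromadb
import pandas as pd

# Stand-in for the notebook's DataFrame: only the Answer and qtype columns matter here.
data = pd.DataFrame({
    "Answer": ["First long answer text...", "Second long answer text..."],
    "qtype": ["topic_a", "topic_b"],
})
MAX_ROWS = 2

chroma_client = chromadb.Client()
collection = chroma_client.create_collection(name="news_collection")  # illustrative name

subset_data = data.head(MAX_ROWS)
collection.add(
    documents=subset_data["Answer"].tolist(),                 # the long answer text goes in documents
    metadatas=[{"qtype": q} for q in subset_data["qtype"]],   # topic stored as metadata
    ids=[f"id{i}" for i in range(len(subset_data))],          # one unique identifier per row
)
```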

Collaborator


Thank you! Very clear now!

Collaborator

@aymeric-roucher Mar 5, 2024


Please add links to sources or explain more about FlatLS, the Faiss library, and the reasons to pick HNSW or IVF.



Contributor Author


The init_cache() function below initializes the semantic cache.

It employs the FlatL2 index, which might not be the fastest but is ideal for small datasets. Depending on the characteristics of the data intended for the cache and the expected dataset size, another index such as HNSW or IVF could be utilized.

I chose this index because it aligns well with the example. It can be used with vectors of high dimensions, consumes minimal memory, and performs well with small datasets.

I outline the key features of the various indices available with Faiss.

  • FlatL2 or FlatIP. Well-suited for small datasets; it may not be the fastest, but its memory consumption is not excessive.
  • LSH. It works effectively with small datasets and is recommended for use with vectors of up to 128 dimensions.
  • HNSW. Very fast but demands a substantial amount of RAM.
  • IVF. Works well with large datasets without consuming much memory or compromising performance.

More information about the different indices available with Faiss can be found at this link: https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index
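For illustration, a hedged sketch of what an init_cache-style setup with a flat L2 index could look like; the sentence-transformers model name and the cache layout are assumptions, not the notebook's exact code.

```python
import faiss
from sentence_transformers import SentenceTransformer

def init_cache(embedding_dim: int = 384):
    """Create an exact (flat) L2 index, an encoder, and an empty cache store."""
    index = faiss.IndexFlatL2(embedding_dim)           # exact Euclidean search, fine for a small cache
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model producing 384-dim embeddings
    cache = {"questions": [], "answers": []}
    return index, encoder, cache

index, encoder, cache = init_cache()
```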

Collaborator

@aymeric-roucher Mar 5, 2024


Line #4.                print('Index trained')

Poor indentation



Collaborator


Please correct the indentation!

Contributor Author


Oops... corrected.

@aymeric-roucher
Collaborator

@peremartra thanks a lot for this PR!

I feel like there are a few things to do before merging:

  • Check the wording, spelling, and grammar everywhere
  • Add a more narrative introduction: why would anyone want to implement semantic caching in the first place? What could be the potential pitfalls if poorly set up? Start from a user pain point! ("I have millions of requests overloading my RAG system, how can I alleviate my suffering?")
  • Make the layout more visual: use code highlighting for your variables, make explanation paragraphs more readable, add numbers to different parts.

This will really be a great addition to our cookbook! 🔥

cc @MKhalusova

@peremartra
Contributor Author


@aymeric-roucher Thank you very much for the corrections. I just uploaded a new version; I believe I've modified everything and reviewed the notebook. I hope that if everything isn't perfect, at least I've taken a big step in the right direction!

Contributor

@MKhalusova left a comment


The new recipe looks great, thanks for working on it! Once @aymeric-roucher approves we can merge.

Collaborator

@aymeric-roucher Mar 6, 2024


Line #30.                      print(f'{D[0][0]} smaller than {self.euclidean_threshold}')

Please format all times in seconds as f"{number:.3f}" to make them more readable!
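For illustration, the requested formatting applied to an elapsed time; the variable names are made up.

```python
import time

start = time.time()
# ... cache lookup or Chroma query would run here ...
elapsed = time.time() - start
print(f"Time taken: {elapsed:.3f} seconds")
```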



@aymeric-roucher
Collaborator

Only a few more minor fixes and we'll be good to go!

@MKhalusova
Contributor

Can we merge now @aymeric-roucher ?

@MKhalusova
Contributor

I think the notebook is ready to be merged.

@MKhalusova MKhalusova merged commit 5749703 into huggingface:main Mar 11, 2024
1 check passed