A minimal example of how to use RAG (retrieval-augmented generation) with LLMs to build a recommendation assistant grounded in specific parts of custom data, instead of inserting all the data into the context. The goal is minimal dependencies and few lines of code, so we only use SBERT as the embedding model, a Hugging Face Transformers model (e.g. Llama-3.1-8B-Instruct) as the LLM, and Meta's Faiss as the vector indexing library.
In this example, we use a small CSV file with the top 1000 movies on IMDb as our data, but feel free to modify the code/system prompt to use other data. Results can vary depending on the chosen embedding model, and keep in mind the movie list is limited. Bigger LLMs already have quite broad knowledge of movies, so smaller ones benefit more from a system like this. It is also much more useful when applied to documents/data that are not in the LLM's training set, for example, private documents.
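As a rough sketch of how each CSV row can be flattened into a text passage for embedding (the file path and column names here are assumptions; adjust them to your data):

```python
import csv

# Turn each movie row into a single text passage for the embedding model.
# The path and column names are assumptions; adapt them to your CSV.
def load_passages(path="data/imdb_top_1000.csv"):
    with open(path, newline="", encoding="utf-8") as f:
        return [
            f"{row['Series_Title']} ({row['Released_Year']}): {row['Overview']}"
            for row in csv.DictReader(f)
        ]
```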
I have only tested this with the Llama-3.1-8B-Instruct model, but you could try other LLMs supported by the Transformers library. In theory you could run this on CPU only, depending on the size of the LLM, but a GPU is definitely recommended.
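For example, a sketch of loading the LLM so it uses a GPU when available and falls back to CPU otherwise (`device_map="auto"` additionally requires the accelerate package):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # Half precision on GPU to save memory; full precision on CPU.
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",  # places layers on available GPUs, otherwise CPU
)
```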
Assuming you already have PyTorch installed, install faiss, transformers, and sentence-transformers:
pip install -r requirements.txt
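The requirements file presumably lists something along these lines (pick faiss-cpu or a GPU build to match your hardware):

```
faiss-cpu
transformers
sentence-transformers
```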
I already include a movie vector index in the data folder, but if you want to create a new one with another embedding model, use:
python index.py --embedding_model [SBERT embedding model]
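Roughly, the indexing step boils down to the following sketch (the file names, the example SBERT model, and the choice of an inner-product index are assumptions; index.py is the actual implementation):

```python
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any SBERT model works
passages = load_passages()  # e.g. the helper sketched above

# Normalized embeddings make inner product equivalent to cosine similarity.
embeddings = model.encode(passages, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)
faiss.write_index(index, "data/movies.index")
```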
If you used all the default arguments, you can search for movies with a query by simply running:
python search.py
To change the LLM, use the --assistant-model argument. Other arguments, such as the number of neighbours to consider in the k-NN search, are also available; see --help.
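Under the hood, the search step amounts to something like this sketch (again with assumed file names, model, and prompt wording; search.py is the real entry point):

```python
import faiss
from sentence_transformers import SentenceTransformer
from transformers import pipeline

embedder = SentenceTransformer("all-MiniLM-L6-v2")
index = faiss.read_index("data/movies.index")
passages = load_passages()  # same helper as sketched above

# Embed the query and retrieve the k nearest movies from the index.
query = "a space opera with political intrigue"
query_vec = embedder.encode([query], normalize_embeddings=True)
_, ids = index.search(query_vec, 5)  # k = 5 nearest neighbours
context = "\n".join(passages[i] for i in ids[0])

# Feed only the retrieved movies to the LLM instead of the whole list.
chat = [
    {"role": "system", "content": "Recommend movies using only this list:\n" + context},
    {"role": "user", "content": query},
]
generator = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")
reply = generator(chat, max_new_tokens=256)[0]["generated_text"][-1]["content"]
print(reply)
```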