A minimal example of how to use RAG (retrieval-augmented generation) with LLMs to build a recommendation assistant grounded in specific parts of custom data, instead of inserting all the data into the context. The goal is minimal dependencies and few lines of code, so we only use SBERT as the embedding model, a Hugging Face Transformers model (e.g. Llama-3.1-8B-Instruct) as the LLM, and Meta's Faiss as the vector indexing library.
In this example, we use a small CSV file with the top 1000 movies on IMDb as our data, but feel free to modify the code/system prompt to use other data. Results can vary depending on the chosen embedding model, and keep in mind the movie list is limited. Bigger LLMs already have quite broad knowledge of movies, so smaller ones benefit more from a system like this. It is also much more useful when applied to documents/data that are not in the LLM's training set, for example, private documents.
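As a rough sketch of how each CSV row can be flattened into a text passage for embedding (the file path and column names here are assumptions; adjust them to your data):

```python
import csv

# Turn each movie row into a single text passage for the embedding model.
# The path and column names are assumptions; adapt them to your CSV.
def load_passages(path="data/imdb_top_1000.csv"):
    with open(path, newline="", encoding="utf-8") as f:
        return [
            f"{row['Series_Title']} ({row['Released_Year']}): {row['Overview']}"
            for row in csv.DictReader(f)
        ]
```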
I have only tested this with the Llama-3.1-8B-Instruct model, but you could try other LLMs supported by the Transformers library. In theory you could run this on CPU only, depending on the size of the LLM, but a GPU is definitely recommended.
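For example, a sketch of loading the LLM so it uses a GPU when available and falls back to CPU otherwise (`device_map="auto"` additionally requires the accelerate package):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # Half precision on GPU to save memory; full precision on CPU.
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",  # places layers on available GPUs, otherwise CPU
)
```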
Assuming you already have PyTorch installed, install faiss, transformers, and sentence-transformers:
pip install -r requirements.txt
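The requirements file presumably lists something along these lines (pick faiss-cpu or a GPU build to match your hardware):

```
faiss-cpu
transformers
sentence-transformers
```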
I already include a movie vector index in the data folder, but if you want to create a new one with another embedding model, use:
python index.py --embedding_model [SBERT embedding model]
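Roughly, the indexing step boils down to the following sketch (the file names, the example SBERT model, and the choice of an inner-product index are assumptions; index.py is the actual implementation):

```python
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any SBERT model works
passages = load_passages()  # e.g. the helper sketched above

# Normalized embeddings make inner product equivalent to cosine similarity.
embeddings = model.encode(passages, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)
faiss.write_index(index, "data/movies.index")
```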
If you used all the default arguments, you can search for movies with a query by simply running:
python search.py
To change the LLM, use the --assistant-model argument. Other arguments, such as the number of neighbours to consider in the k-NN search, are also available; see --help.
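Under the hood, the search step amounts to something like this sketch (again with assumed file names, model, and prompt wording; search.py is the real entry point):

```python
import faiss
from sentence_transformers import SentenceTransformer
from transformers import pipeline

embedder = SentenceTransformer("all-MiniLM-L6-v2")
index = faiss.read_index("data/movies.index")
passages = load_passages()  # same helper as sketched above

# Embed the query and retrieve the k nearest movies from the index.
query = "a space opera with political intrigue"
query_vec = embedder.encode([query], normalize_embeddings=True)
_, ids = index.search(query_vec, 5)  # k = 5 nearest neighbours
context = "\n".join(passages[i] for i in ids[0])

# Feed only the retrieved movies to the LLM instead of the whole list.
chat = [
    {"role": "system", "content": "Recommend movies using only this list:\n" + context},
    {"role": "user", "content": query},
]
generator = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")
reply = generator(chat, max_new_tokens=256)[0]["generated_text"][-1]["content"]
print(reply)
```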