This repo contains a demo showing how vector embeddings can help find similar questions in a FAQ list. The demo is based on Sentence Transformers from HuggingFace.
An embedding is a numerical representation of a piece of information, such as text, documents, images, or audio. The representation captures the semantic meaning of what is being embedded, which makes it useful for many industry applications.
Embeddings are not limited to text! You can also create an embedding of an image (for example, a list of 384 numbers) and compare it with a text embedding to determine whether a sentence describes the image. This concept underpins powerful systems for image search, classification, description, and more!
"[...] once you understand this ML multitool (embedding), you'll be able to build everything from search engines to recommendation systems to chatbots and a whole lot more. You don't have to be a data scientist with ML expertise to use them, nor do you need a huge labeled dataset." - Dale Markowitz, Google Cloud.
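To make the comparison idea concrete, here is a minimal sketch of cosine similarity, the measure this demo uses to compare embeddings. The 3-dimensional vectors below are made up for illustration; real sentence embeddings (for example, from Sentence Transformers) typically have 384 or more dimensions:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|).
    # 1.0 means the vectors point in the same direction (very similar).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings", invented for this example.
question = [0.9, 0.1, 0.0]
similar_faq = [0.8, 0.2, 0.1]
unrelated_faq = [0.0, 0.1, 0.9]

print(cosine_similarity(question, similar_faq))    # close to 1.0
print(cosine_similarity(question, unrelated_faq))  # close to 0.0
```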
The process flow of the demo is as follows:
- Load the FAQ list and the question to be matched.
- Create embeddings for the FAQ list and the question to be matched.
- Calculate the cosine similarity between the question to be matched and the FAQ list.
- Sort the FAQ list by the cosine similarity.
- Return the top 5 questions from the FAQ list.
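The steps above can be sketched as follows. The real demo embeds text with a SentenceTransformer model; here a toy bag-of-words vector stands in for the embedding so the sketch runs without a model download, and the FAQ entries are made-up examples:

```python
import math

FAQ = [
    "How do I reset my password?",
    "How can I change my email address?",
    "What payment methods do you accept?",
    "How do I delete my account?",
    "Where can I download my invoice?",
    "How do I contact support?",
]

def embed(text, vocabulary):
    # Toy stand-in for SentenceTransformer.encode(): a word-count
    # vector over a shared vocabulary.
    words = text.lower().strip("?").split()
    return [words.count(term) for term in vocabulary]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norms if norms else 0.0

def top_matches(question, faq, k=5):
    # Build a vocabulary, embed everything, score, sort, take the top k.
    vocab = sorted({w for t in faq + [question] for w in t.lower().strip("?").split()})
    q_vec = embed(question, vocab)
    scored = [(cosine_similarity(q_vec, embed(t, vocab)), t) for t in faq]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [t for _, t in scored[:k]]

print(top_matches("How do I reset the password?", FAQ))
```

With real sentence embeddings, the ranking reflects meaning rather than word overlap, so a paraphrased question still surfaces the right FAQ entry.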
To run the demo locally:

- Clone the repo:

```shell
git clone https://github.com/mwanjajoel/vector-embeddings-demo.git
```

- Create and activate a virtual environment, then install the dependencies:

```shell
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

- Run the demo:

```shell
python app.py
```

- Run the LangChain version:

```shell
python chat.py
```