The goal of this demo app is to showcase the quickest and easiest way to build a usable knowledge base using Retrieval-Augmented Generation (RAG). RAG improves question answering by retrieving relevant documents and passing them to a language model as context for generating the answer.
Use case: seed the knowledge base with maintenance manuals, then ask questions to troubleshoot issues. An example PDF file is provided (`t60.pdf` - the maintenance manual for IBM T60 laptops).
- Retrieval-Augmented Generation implementation via LangChain
- Dockerized setup for easy running
- Integration with AWS Bedrock
- Usage of LangChain for language model operations
- Docker
- AWS credentials with Bedrock access (`bedrock:InvokeModel` permission)
- Clone the repository
- Copy the `.env.example` file to `.env` and fill in the required values
Build the Docker image:
```bash
docker build -t local-rag:latest image/
```
Run the Docker container:
```bash
docker run -it --env-file .env local-rag:latest python3 pdfloader.py
```
This will:
- read the provided example PDF file
- extract text from the PDF
- split text into chunks
- generate chunk embeddings (this takes some time: ~3 min with the `amazon.titan-embed-text-v2:0` model)
- load the embeddings into a local ChromaDB vector store
- run a loop with query-retrieve-generate operations
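For reference, here is a minimal sketch of what this pipeline looks like in LangChain. The model IDs are the ones used in this README; the chunking parameters, number of retrieved chunks, persist directory, and prompt wording are illustrative assumptions, and the actual `pdfloader.py` may differ in the details.

```python
# Rough sketch of the read -> chunk -> embed -> retrieve -> generate flow.
# Assumes langchain-community, langchain-aws, langchain-text-splitters,
# pypdf and chromadb are installed; AWS credentials come from the environment.
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_aws import BedrockEmbeddings, ChatBedrock

# 1. Read the example PDF and extract its text page by page.
docs = PyPDFLoader("t60.pdf").load()

# 2. Split the text into overlapping chunks (sizes here are illustrative).
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# 3. Embed the chunks with Titan and load them into a local ChromaDB store.
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0")
store = Chroma.from_documents(chunks, embeddings, persist_directory="chroma")

# 4. Query-retrieve-generate loop.
llm = ChatBedrock(model_id="anthropic.claude-3-haiku-20240307-v1:0")
retriever = store.as_retriever(search_kwargs={"k": 4})
while True:
    question = input("Question (empty to quit): ").strip()
    if not question:
        break
    context = "\n\n".join(d.page_content for d in retriever.invoke(question))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    print(llm.invoke(prompt).content)
```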
Some example questions and generated answers can be found in the `results/` directory.
There are tons of possible improvements; here are a few:
- compare different vector stores: https://python.langchain.com/v0.1/docs/integrations/vectorstores/
- compare different chunking strategies (different tokenizers, window sizes, overlap, etc.)
- compare different language models for generating answers
- experiment with different LLM prompt instructions
- when comparing anything, we will need a result evaluation metric/strategy
- image extraction - a whole topic on its own
- streaming responses
- experiment with local Huggingface models to cut out the Bedrock dependency (see the first sketch after this list)
- add memory to the context - currently each question is treated as a separate context (see the second sketch after this list)
- experiment with different types of text - source code? software manuals?
- improve retrieval process with query derivation from user input
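As a starting point for the local-models idea, here is a small sketch that swaps the Bedrock embeddings for a local HuggingFace model. The `langchain-huggingface` and `sentence-transformers` packages and the model name are assumptions, not something this repo currently uses, and the dummy document stands in for the real PDF chunks.

```python
# Sketch: replace Bedrock embeddings with a local HuggingFace model.
# Assumes langchain-huggingface and sentence-transformers are installed;
# the model name is just a common default, not one this repo uses.
from langchain_core.documents import Document
from langchain_community.vectorstores import Chroma
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
chunks = [Document(page_content="Replace with the real PDF chunks from pdfloader.py")]
store = Chroma.from_documents(chunks, embeddings, persist_directory="chroma_local")
print(store.similarity_search("keyboard replacement", k=1))
```

Note that this only removes Bedrock from the embedding step; answer generation would still need a local LLM to drop the dependency entirely.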
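And here is a minimal sketch of one way to add memory: keep the previous question/answer turns and prepend them to each prompt so follow-up questions get conversational context. It reuses the `retriever` and `llm` objects from the pipeline sketch above and is an illustration, not how `pdfloader.py` currently works.

```python
# Minimal conversational-memory sketch: prior Q/A turns are prepended to the
# prompt. `retriever` and `llm` are assumed to be set up as in the pipeline
# sketch above; this is an illustration, not current pdfloader.py behavior.
history = []  # list of (question, answer) tuples

def ask(question: str) -> str:
    context = "\n\n".join(d.page_content for d in retriever.invoke(question))
    past = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
    prompt = (
        f"Previous conversation:\n{past}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    answer = llm.invoke(prompt).content
    history.append((question, answer))
    return answer
```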
Using AWS Bedrock will incur some costs.
With the current setup (the example `t60.pdf` file of ~200 pages of mixed text and images, and the current text-chunking settings), the whole corpus is about 200,000 tokens. Running it through the `amazon.titan-embed-text-v2:0` model costs about $0.004.
Asking questions and sending prompts to `anthropic.claude-3-haiku-20240307-v1:0` results in minimal costs for input/output tokens. As a rough estimate, everything in the `results/` directory was generated for about $0.005.
Keep in mind these are among the cheapest models available, so switching to a different model will probably increase costs.