Building upon kennethleungty/Llama-2-Open-Source-LLM-CPU-Inference, with a Streamlit front-end. It lets you load any .txt or .pdf document containing text and ask questions about it using LLMs on CPU.
Build the Docker image:
docker build -t document-ama .
Run it:
docker run -d -p 8501:8501 --name document-ama document-ama:latest
Open the Streamlit app in your browser:
http://localhost:8501/
Create a virtual environment (venv), if desired:
python3 -m venv venv
source venv/bin/activate
Install requirements:
pip install -r requirements.txt
Run Streamlit:
streamlit run app.py
A new browser window will open with the Streamlit app.
(Also see the Medium article for the original code architecture.)
It is currently set up to use an 8-bit quantized GGML build of Llama 2 as the Q&A model and sentence-transformers/all-MiniLM-L6-v2 as the embeddings model. If those models are not present (e.g. on the first run), it will download them first.
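As a rough sketch, a first-run download via the Hugging Face Hub could look like the following; the repo id and filename here are assumptions for illustration, not necessarily the exact ones the app uses:

```python
from huggingface_hub import hf_hub_download

# Fetch the 8-bit quantized GGML weights if not already cached locally.
model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGML",     # assumed model repo
    filename="llama-2-7b-chat.ggmlv3.q8_0.bin",  # 8-bit (q8_0) variant
)
print(model_path)  # local path to the cached weights
```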
Then it will ask for a file to be uploaded. The file must be either a .txt file or a .pdf with selectable text; it will not attempt to OCR the document.
Once the document is uploaded, LangChain is used to (see the sketch after this list):
- Load the document;
- Extract the text with PyPDF;
- Split it into 500-character chunks (with 50-character overlap);
- Compute the embedding vector of each chunk;
- Finally, store those embeddings with their respective chunks in a FAISS database.
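A minimal sketch of that ingestion pipeline, assuming the classic LangChain API; the file path and index folder name are illustrative:

```python
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# Load the uploaded PDF and extract its text with PyPDF.
documents = PyPDFLoader("uploaded.pdf").load()

# Split into 500-character chunks with 50-character overlap.
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
).split_documents(documents)

# Embed each chunk and store everything in a FAISS index on disk.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
db = FAISS.from_documents(chunks, embeddings)
db.save_local("vectorstore/example-checksum")  # folder named after the file's checksum
```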
The FAISS files are saved on disk under a folder named after each file's checksum, so if the exact same file is uploaded again, the previously created database is reused instead of being rebuilt.
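The caching logic could look roughly like this; the hash algorithm and folder layout are assumptions for illustration:

```python
import hashlib
import os

def file_checksum(data: bytes) -> str:
    # Assumed hash; the app may use a different algorithm.
    return hashlib.sha256(data).hexdigest()

with open("uploaded.pdf", "rb") as f:  # illustrative path
    db_dir = os.path.join("vectorstore", file_checksum(f.read()))

if os.path.isdir(db_dir):
    # Reuse the previously created FAISS index instead of rebuilding it.
    ...
```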
After that is done, it asks for the question. It then loads the LLM into memory using CTransformers and keeps using LangChain to (see the sketch after this list):
- Load the FAISS db into memory;
- Build a prompt template from the question and the hardcoded prompt template string;
- Build a RetrievalQA chain with the LLM, so the context relevant to the question is loaded into the prompt template;
- Ask the RetrievalQA chain the question.
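A minimal sketch of the Q&A step, again assuming the classic LangChain API; the model repo, prompt wording, and retriever settings are illustrative stand-ins, not the app's exact values:

```python
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import CTransformers
from langchain.prompts import PromptTemplate
from langchain.vectorstores import FAISS

# Load the quantized Llama 2 model on CPU via CTransformers.
llm = CTransformers(
    model="TheBloke/Llama-2-7B-Chat-GGML",  # assumed model repo
    model_type="llama",
    config={"max_new_tokens": 256, "temperature": 0.01},
)

# Illustrative stand-in for the app's hardcoded prompt template string.
template = """Use the following context to answer the question.
Context: {context}
Question: {question}
Answer:"""
prompt = PromptTemplate(template=template,
                        input_variables=["context", "question"])

# Load the FAISS index built during ingestion back into memory.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
db = FAISS.load_local("vectorstore/example-checksum", embeddings)

# RetrievalQA retrieves the chunks relevant to the question and
# stuffs them into the prompt before calling the LLM.
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 2}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},
)

result = qa({"query": "What is this document about?"})
print(result["result"])            # the answer
print(result["source_documents"])  # the relevant passages
```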
After the RetrievalQA chain returns an answer, the app displays the answer together with the passages of the text most relevant to it.