# Module 3 Project 2: Retrieval Augmented Generation

Implement a simple RAG pipeline that uses a PDF document as an external knowledge source.

In [None]:
!pip install llama-index-embeddings-huggingface
!pip install llama-index-llms-llama-cpp
!pip install llama-index

## IMPORTS
- We use `llama-index` as a wrapper around `llama-cpp-python`
- We also use it to read PDF files and to build our vector store and query engine

In [None]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)

## STEP 1: MODEL
- Build the `llama-cpp-python` wrapper with `max_new_tokens` set to 1024 and a temperature of 0.1
- We also set the context window to 3900 tokens, slightly below Llama 2's 4096-token limit, to leave some headroom

In [None]:
model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf"

llm = LlamaCPP(
    model_url=model_url,
    model_path=None,
    temperature=0.1,
    max_new_tokens=1024,
    context_window=3900,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": 1},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)
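As a rough illustration of why `context_window` and `max_new_tokens` matter together: the prompt (retrieved context plus question) and the generated answer have to share one context window. The helper below is a hypothetical back-of-the-envelope calculation, not part of `llama-index`; it assumes the prompt budget is simply the window size minus the tokens reserved for the answer.

```python
def prompt_token_budget(context_window: int, max_new_tokens: int) -> int:
    """Tokens left for the prompt after reserving room for the generated answer.

    Hypothetical helper for illustration only; the library's own accounting
    may reserve additional tokens.
    """
    if max_new_tokens >= context_window:
        raise ValueError("max_new_tokens must be smaller than context_window")
    return context_window - max_new_tokens


# Values taken from the LlamaCPP configuration above.
budget = prompt_token_budget(context_window=3900, max_new_tokens=1024)
print(budget)
```

Under this assumption, raising `max_new_tokens` leaves less room for retrieved context, so the two values should be tuned together.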

## STEP 2: TOKENIZER
- Set our global tokenizer to the pretrained [NousResearch/Llama-2-7b-chat-hf](https://huggingface.co/NousResearch/Llama-2-7b-chat-hf) tokenizer
- The tokenizer should match the LLM (llama -> llama) so that token counts are accurate; the Llama 2 7B and 13B chat models share the same tokenizer

In [None]:
from llama_index.core import set_global_tokenizer
from transformers import AutoTokenizer

set_global_tokenizer(
    AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf").encode
)
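The global tokenizer is a callable that takes a string and returns a list of tokens, and its main job is token counting. The sketch below shows that idea with a whitespace tokenizer as a stand-in, so it runs without downloading a model; `whitespace_tokenizer` and `count_tokens` are hypothetical names, and a real Llama tokenizer would return token ids and a different count.

```python
from typing import Callable, List


def whitespace_tokenizer(text: str) -> List[str]:
    # Stand-in for AutoTokenizer.from_pretrained(...).encode (assumption:
    # any callable mapping str -> list works for counting purposes).
    return text.split()


def count_tokens(text: str, tokenizer: Callable[[str], list]) -> int:
    """Count tokens by measuring the length of the tokenizer's output."""
    return len(tokenizer(text))


n_tokens = count_tokens("Tell me about pragmatic truth", whitespace_tokenizer)
print(n_tokens)
```

If the tokenizer does not match the LLM, counts like this drift from what the model actually sees, which can lead to prompts that overflow the context window.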

## STEP 3: EMBEDDING MODEL
- To build our vector store, we need an embedding model that turns text into vectors; we use [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5)
- We load the embedding model with the `HuggingFaceEmbedding` class

In [None]:
# use Huggingface embeddings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
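To make the role of the embedding model concrete, here is a minimal sketch of cosine similarity, a common way to compare embedding vectors. The three-dimensional vectors are made-up toy values (real embeddings have hundreds of dimensions), and `cosine_similarity` is a hypothetical helper rather than a `llama-index` function.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings of a query and two text chunks.
query_vec = [1.0, 0.0, 1.0]
related_vec = [0.9, 0.1, 0.8]
unrelated_vec = [0.0, 1.0, 0.0]

related_score = cosine_similarity(query_vec, related_vec)
unrelated_score = cosine_similarity(query_vec, unrelated_vec)
print(related_score > unrelated_score)
```

A vector store ranks chunks by a score of this kind, so chunks whose embeddings point in a similar direction to the query embedding are retrieved first.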

## STEP 4: QUERY ENGINE
- We build our query engine on top of a vector store index
- First, we load our documents from the `./documents` folder into a variable
- We then build the vector index from those documents using our embedding model
- Finally, we create our query engine using the `llm` variable from Step 1

In [None]:
documents = SimpleDirectoryReader(
    "./documents"
).load_data()

index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)

query_engine = index.as_query_engine(llm=llm)
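Conceptually, a query engine retrieves the chunks most similar to the question and places them in the prompt sent to the LLM. The sketch below illustrates that flow with placeholder chunks and pre-computed scores; `retrieve_top_k`, `build_prompt`, the scores, and the prompt wording are all assumptions made for illustration, not the template or internals that `llama-index` actually uses.

```python
from typing import List, Tuple


def retrieve_top_k(scored_chunks: List[Tuple[float, str]], k: int = 2) -> List[str]:
    """Return the k chunk texts with the highest similarity scores."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked[:k]]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Assemble a prompt that grounds the question in the retrieved context."""
    context = "\n".join(context_chunks)
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Given the context information, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder (score, chunk) pairs; real scores come from the vector store.
scored = [
    (0.12, "placeholder chunk C"),
    (0.91, "placeholder chunk A"),
    (0.55, "placeholder chunk B"),
]
top_chunks = retrieve_top_k(scored, k=2)
prompt = build_prompt("Tell me about pragmatic truth", top_chunks)
print(prompt)
```

Only the highest-scoring chunks reach the LLM, which is why retrieval quality directly limits answer quality.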

## STEP 5: DEMO
- Now we can run our demo
- We ask about a topic from the paper for which the base LLM, without retrieval, normally hallucinates an answer
- We can then check whether the answer is correct against the contents of the provided PDF

In [None]:
query = "Tell me about pragmatic truth"
response = query_engine.query(query)
print(response)
print("\n")