## Retrieval-Augmented Generation

#### Steps to download and run any open LLM packaged as a llamafile (usually with the `.llamafile` extension)

0. You can use [Hugging Face](https://huggingface.co/) to download an open LLM that runs as a local server and also provides embedding capabilities.

   The site hosts many models; since we are looking for llamafile LLMs:
   - > filter by models
   - > select libraries  
   - > check llamafile  
   - > download, for example, `llava-v1.5-7b-q4.llamafile` from the `Mozilla` project

1. **Download the LLaVA Model**  
   Download the file `llava-v1.5-7b-q4.llamafile` (4.29 GB).

2. **Move the file for better organization**  
   Move the `.llamafile` model to the `openllms/llamafiles` subfolder.

3. **Grant Execution Permission (macOS, Linux, BSD)**  
   If you're using macOS, Linux, or BSD, you need to allow the file to be executed. Run the following command from the folder that contains the file:  
   ```bash
   chmod +x llava-v1.5-7b-q4.llamafile
   ```

4. **Start the server without the browser chatbot UI**
   ```bash
   ./llava-v1.5-7b-q4.llamafile --server --nobrowser --embedding
   ```

### Retrieval-Augmented Generation 

#### Indexing 
<img src="../assets/rag/how-indexing-works.jpg" width="600px">

#### Retrieval
<img src="../assets/rag/rag-system-retrieval.jpg" width="600px">

#### Querying
<img src="../assets/rag/how-rag-works-prompt-query.jpg" width="600px">

#### RAG overall architecture
<img src="../assets/rag/rag-all-steps.png" width="600px">

### Notebook walkthrough

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
from langchain_community.embeddings import LlamafileEmbeddings
from langchain_openai import ChatOpenAI
from langchain_chroma import Chroma

In [3]:
from langchain import hub
from langchain_core.documents import Document
from langgraph.graph import START, StateGraph

In [4]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [5]:
from pathlib import Path
from typing_extensions import List, TypedDict

In [6]:
documents_path = Path().resolve().parent / "assets" / "data"

#### Point to the llamafile running locally as a server

In [7]:
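# The llamafile server exposes an OpenAI-compatible API, so ChatOpenAI can talk to it.
# Alternative: llm = Llamafile()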
llm = ChatOpenAI(
    model="LLaMA_CPP", base_url="http://localhost:8080/v1", api_key="sk-no-key-required"
)
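
Before sending prompts, it can help to confirm that the llamafile server is actually listening. The following is a minimal sketch using only the standard library. The helper name `is_server_up` is hypothetical, and probing the root URL on port 8080 is an assumption derived from the `base_url` used above.

```python
import urllib.error
import urllib.request


def is_server_up(url: str = "http://localhost:8080/", timeout: float = 2.0) -> bool:
    """Return True if any HTTP response comes back from `url` (hypothetical helper)."""
    # Bypass any system proxy: the server is assumed to run on this machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status such as 404.
        return True
    except OSError:
        # Connection refused, timeout, or unresolved host: nothing is listening.
        # (urllib.error.URLError is a subclass of OSError.)
        return False
```

Calling `is_server_up()` before `llm.invoke(...)` gives a clearer failure than a connection error raised from deep inside the client.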

#### Instantiate an embeddings client backed by the local llamafile LLM

In [8]:
embeddings = LlamafileEmbeddings()

#### Instantiate a vector (embedding) database to index the vectors

In [9]:
vector_store = Chroma(embedding_function=embeddings)

#### Helper functions that load and split the corpus before it is embedded (turned into a dense numerical representation of text in a vector space)

In [10]:
# Step 1: Load the PDF document
def load_pdf(file_path: str):
    loader = PyPDFLoader(file_path)
    documents = loader.load()
    return documents


# Step 2: Split the document into chunks
def split_documents(documents: list[Document]):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
    return text_splitter.split_documents(documents)
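
To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified fixed-window sketch. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to cut on separators such as paragraphs and sentences); the function name `sliding_chunks` is hypothetical and only illustrates how consecutive chunks share characters.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut `text` into fixed-size windows that overlap by `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

With `chunk_size=500` and `chunk_overlap=100`, each window in this simplified model would advance by 400 characters, so a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks.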

#### Load every PDF in the documents folder with PyPDF and split it into chunks

In [11]:
all_splits = []
for file in documents_path.glob("*.pdf"):
    documents = load_pdf(str(file))
    all_splits.extend(split_documents(documents=documents))

#### Index the chunk embeddings in the vector database

In [12]:
# Index chunks
_ = vector_store.add_documents(documents=all_splits)
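
Conceptually, indexing stores one vector per chunk, and a later similarity search embeds the query and returns the closest chunks. The sketch below mimics that flow in plain Python under loud assumptions: `toy_embed` is a bag-of-words stand-in for the real llamafile embeddings, `ToyVectorStore` is a hypothetical class, and cosine similarity is used only for illustration (Chroma's actual index and distance metric are not shown here).

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (vector, text) pairs and ranks them per query."""

    def __init__(self) -> None:
        self._items: list[tuple[Counter, str]] = []

    def add_texts(self, texts: list[str]) -> None:
        for text in texts:
            self._items.append((toy_embed(text), text))

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        query_vector = toy_embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine(query_vector, item[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]
```

A real vector database replaces the linear scan with an approximate nearest-neighbour index, which is what keeps search fast once the collection holds many chunks.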

#### Retrieval and generation functions

In [13]:
# Define prompt for question-answering
prompt = hub.pull("rlm/rag-prompt")


# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


# Define application steps
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}

In [14]:
# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
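
The compiled graph runs `retrieve` and then `generate`, merging each step's returned dictionary into the shared state. A rough plain-Python picture of that behaviour follows; `run_sequence`, `fake_retrieve`, and `fake_generate` are hypothetical names, and the sketch ignores everything else LangGraph provides (streaming, checkpoints, branching).

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Step = Callable[[State], State]


def run_sequence(steps: List[Step], initial: State) -> State:
    """Run steps in order; each step returns a partial update merged into the state."""
    state = dict(initial)  # copy so the caller's dictionary is not mutated
    for step in steps:
        state.update(step(state))
    return state


def fake_retrieve(state: State) -> State:
    # Stand-in for the vector store lookup.
    return {"context": ["chunk about " + state["question"]]}


def fake_generate(state: State) -> State:
    # Stand-in for the LLM call; it only reports how much context it received.
    return {"answer": f"Answer based on {len(state['context'])} chunk(s)"}
```

Because each step only returns the keys it changes, `question` survives untouched into the final state alongside `context` and `answer`.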

#### Query

##### You can now query the PDF documents stored in the *`assets/data`* folder

In [15]:
response = graph.invoke({"question": "What can you tell me about Zlatan Ibrahimovic ?"})
print(response["answer"])

Zlatan Ibrahimovic is a professional soccer player who was born on October 3, 1981, in Malmö, Sweden. He is known for his skills as a striker and has played for several teams, including Ajax, Juventus, Inter Milan, Barcelona, and Paris Saint-Germain. He has also played for the Swedish national team and has won numerous awards, including the Golden Ball. Ibrahimovic is known for his outspoken comments and has referred to himself in the third person. He has scored many memorable goals throughout his career, including a record 12 times for the Swedish player of the year award.</s>
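
Since the final state also carries the retrieved `context`, it can be worth checking which chunks the answer was grounded on. The helper below is a small sketch: `format_sources` is a hypothetical name, and the `source` and `page` metadata keys are assumed to be the ones the PDF loader attaches to each document.

```python
def format_sources(docs, preview_chars: int = 80) -> list[str]:
    """Summarise retrieved documents as '[n] source (page p): preview' lines."""
    lines = []
    for index, doc in enumerate(docs, start=1):
        # Metadata keys are an assumption; fall back gracefully if they are absent.
        source = doc.metadata.get("source", "unknown source")
        page = doc.metadata.get("page", "?")
        # Collapse whitespace so multi-line PDF text fits on one line.
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"[{index}] {source} (page {page}): {preview}")
    return lines
```

In the notebook this could be used as `print("\n".join(format_sources(response["context"])))` to compare the model's claims against the retrieved text.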
