## RAG Hands-on tutorial

1. Prepare data 
2. Create a vector store
3. Search the vector store and retrieve relevant documents
4. Call LLM with the user query and the retrieved documents
5. Return the LLM response to the user

We will use the [LangChain framework](https://www.langchain.com/).

Suggested code references:
- Langchain RAG from scratch [link](https://github.com/langchain-ai/rag-from-scratch/tree/main)
- Langchain RAG quickstart [link](https://python.langchain.com/v0.1/docs/use_cases/question_answering/quickstart/)

In [2]:
# basic imports
import os
import json
import logging
import sys
import mlflow

from dotenv import load_dotenv
load_dotenv(override=True)

# create and configure logger
logging.basicConfig(level=logging.INFO, datefmt='%Y-%m-%dT%H:%M:%S',
                    format='%(asctime)-15s.%(msecs)03dZ %(levelname)-7s : %(name)s - %(message)s',
                    handlers=[logging.StreamHandler(sys.stdout)]
                    )
# create log object with current module name
log = logging.getLogger(__name__)

## 1. Prepare data
- Load data from different sources
- We will use the NCSA Delta documentation (in the `delta_docs` folder) as an example


### 1.1 Data Loaders
- Langchain provides different data loaders for different file types
- Loaded data is returned as LangChain `Document` objects ([document class](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html))


In [6]:
# data loaders
from langchain_community.document_loaders import CSVLoader, DataFrameLoader, PyPDFLoader, Docx2txtLoader, UnstructuredRSTLoader, DirectoryLoader


class DataLoaders:
    """
    various data loaders
    """
    def __init__(self, data_dir_path):
        self.data_dir_path = data_dir_path
    
    def csv_loader(self):
        csv_loader_kwargs = {
                            "csv_args":{
                                "delimiter": ",",
                                "quotechar": '"',
                                },
                            }
        dir_csv_loader = DirectoryLoader(self.data_dir_path, glob="**/*.csv", use_multithreading=True,
                                    loader_cls=CSVLoader, 
                                    loader_kwargs=csv_loader_kwargs,
                                    )
        return dir_csv_loader
    
    def pdf_loader(self):
        dir_pdf_loader = DirectoryLoader(self.data_dir_path, glob="**/*.pdf",
                                    loader_cls=PyPDFLoader,
                                    )
        return dir_pdf_loader
    
    def word_loader(self):
        dir_word_loader = DirectoryLoader(self.data_dir_path, glob="**/*.docx",
                                    loader_cls=Docx2txtLoader,
                                    )
        return dir_word_loader
    
    def rst_loader(self):
        rst_loader_kwargs = {
                        "mode":"single"
                        }
        dir_rst_loader = DirectoryLoader(self.data_dir_path, glob="**/*.rst",
                                    loader_cls=UnstructuredRSTLoader, 
                                    loader_kwargs=rst_loader_kwargs
                                    )
        return dir_rst_loader
    

In [7]:
# load data
data_dir_path = os.getenv('DATA_DIR_PATH', "data")
data_loader = DataLoaders(data_dir_path=data_dir_path)
log.info("Loading files from directory %s", data_dir_path)
dir_csv_loader = data_loader.csv_loader()
dir_word_loader = data_loader.word_loader()
dir_pdf_loader = data_loader.pdf_loader()
dir_rst_loader = data_loader.rst_loader()
csv_data = dir_csv_loader.load()
word_data = dir_word_loader.load()
pdf_data = dir_pdf_loader.load()
rst_data = dir_rst_loader.load()

2024-09-16T16:25:18.606Z INFO    : __main__ - Loading files from directory delta_docs/selected


In [9]:
for doc in pdf_data:
    print(doc)
    break

page_content='Running Jobs
 
Accessing the Compute Nodes
Delta implements the Slurm batch environment to manage access to the compute nodes. Use the Slurm
commands to run batch jobs or for interactive access to compute nodes. See the Slurm quick start guide for
an introduction to Slurm. There are multiple ways to access compute nodes on Delta.
Batch scripts (sbatch) or Interactive (srun , salloc), which is right for me?

:ref:`sbatch` . Use batch scripts for jobs that are debugged, ready to run, and don't require interaction.
Sample Slurm batch job scripts are provided in the :ref:`examples` section. For mixed resource
heterogeneous jobs see the Slurm job support documentation. Slurm also supports job arrays for easy
management of a set of similar jobs, see the Slurm job array documentation for more information.

:ref:`srun` . For interactive use of a compute node, srun will run a single command through Slurm on a
compute node. srun blocks, it will wait until Slurm has scheduled comp

### 1.2 Format into text and metadata
- Convert data to a list of texts and metadata 
- Metadata can be used for filtering the data
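
As a minimal sketch of what "filtering on metadata" means, the snippet below keeps only the entries whose `source` ends with a given suffix. The sample texts, file names, and the helper `filter_by_source_suffix` are hypothetical and only mimic the shape of the `texts` and `metadatas` lists this step produces; vector stores usually offer their own filter syntax.

```python
# Hypothetical sample data shaped like the (texts, metadatas) lists built in this step.
texts = [
    "Delta uses Slurm to schedule jobs.",
    "name: alice\nrole: admin",
    "Storage quotas are listed per project.",
]
metadatas = [
    {"source": "docs/running_jobs.pdf", "row_page": 3},
    {"source": "docs/users.csv", "row_page": 0},
    {"source": "docs/storage.rst", "row_page": ""},
]


def filter_by_source_suffix(texts, metadatas, suffix):
    """Keep only the (text, metadata) pairs whose source ends with `suffix`."""
    return [
        (text, meta)
        for text, meta in zip(texts, metadatas)
        if meta["source"].endswith(suffix)
    ]


pdf_only = filter_by_source_suffix(texts, metadatas, ".pdf")
print(pdf_only)
```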


In [10]:
# get text and metadata from the data
def get_text_metadatas(csv_data=None, pdf_data=None, word_data=None, rst_data=None):
    """
    Each Document has page_content and metadata properties.
    Separate the text and metadata content of each Document.
    Build custom metadata if needed.
    """
    # treat missing inputs as empty lists so the comprehensions below do not fail on None
    csv_data, pdf_data = csv_data or [], pdf_data or []
    word_data, rst_data = word_data or [], rst_data or []

    csv_texts = [doc.page_content for doc in csv_data]
    # custom metadata
    csv_metadatas = [{'source': doc.metadata['source'], 'row_page': doc.metadata['row']} for doc in csv_data]   # metadata={'source': 'filename.csv', 'row': 0}
    pdf_texts = [doc.page_content for doc in pdf_data]
    pdf_metadatas = [{'source': doc.metadata['source'], 'row_page': doc.metadata['page']} for doc in pdf_data]  # metadata={'source': 'data/filename.pdf', 'page': 8}
    word_texts = [doc.page_content for doc in word_data]
    word_metadatas = [{'source': doc.metadata['source'], 'row_page': ''} for doc in word_data]
    rst_texts = [doc.page_content for doc in rst_data]
    rst_metadatas = [{'source': doc.metadata['source'], 'row_page': ''} for doc in rst_data]         # metadata={'source': 'docs/images/architecture/index.rst'}

    # combine all sources, keeping texts and metadatas aligned by index
    texts = csv_texts + pdf_texts + word_texts + rst_texts
    metadatas = csv_metadatas + pdf_metadatas + word_metadatas + rst_metadatas
    return texts, metadatas


texts , metadatas = get_text_metadatas(csv_data, pdf_data, word_data, rst_data)

### 1.3 Chunking
- Split texts into chunks for embedding
- Return a list of document chunks (list of langchain [document class](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html))

![Chunk Optimization](images/chunking.png)
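
To make the roles of chunk size and chunk overlap concrete, here is a simplified sketch of overlapping chunking. It is not how `RecursiveCharacterTextSplitter` works internally: this toy version counts characters instead of tokens and ignores separators. The helper name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The last three characters of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is still seen whole in at least one chunk.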

In [11]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
from typing import List

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=1000,
        chunk_overlap=200,
        separators=[
            "\n\n", "\n", ". ", " ", ""
        ]  # try to split on paragraphs... fallback to sentences, then chars, ensure we always fit in context window
    )

docs: List[Document] = text_splitter.create_documents(texts=texts, metadatas=metadatas)


In [None]:
print(docs[0])
print("Number of documents: ", len(docs))


### 1.4 Embeddings
- We will be using OpenAI embeddings
- We use the `text-embedding-ada-002` model, which accepts at most 8191 input tokens according to the OpenAI documentation
- HF Embedding models leaderboard [link](https://huggingface.co/spaces/mteb/leaderboard)
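
Embeddings are useful because texts with similar meaning map to vectors that point in similar directions. A common way to compare two vectors is cosine similarity; the sketch below uses made-up 3-dimensional vectors (real embedding vectors have far more dimensions) and a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings of a query and two chunks.
query_vec = [1.0, 0.0, 1.0]
close_vec = [2.0, 0.0, 2.0]  # same direction as the query, different length
far_vec = [0.0, 1.0, 0.0]    # orthogonal to the query

sim_close = cosine_similarity(query_vec, close_vec)
sim_far = cosine_similarity(query_vec, far_vec)
print(sim_close, sim_far)
```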

In [12]:
# embeddings 
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

## 2. Vector Store
- We will use [Qdrant](https://qdrant.tech/) vector store for this example
- For today we will use local memory as the vector store
- Qdrant also provides a Docker image, so the vector store can run in a container and be hosted remotely
    - Eg: [Qdrant docker container running locally](http://localhost:6333/dashboard)

- Blog post on vector stores [link](https://medium.com/google-cloud/vector-databases-are-all-the-rage-872c888fa348)

In [13]:
# creating a qdrant vector store in local memory

from langchain_community.vectorstores import Qdrant

# qdrant collection name
collection_name = os.getenv('QDRANT_COLLECTION_NAME', "data-collection")

# create vector store in local memory
vectorstore = Qdrant.from_documents(
    documents=docs,
    embedding=embeddings,
    location=":memory:",  # Local mode with in-memory storage only
    collection_name=collection_name,
    )

2024-09-16T16:25:51.683Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-09-16T16:25:52.699Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-09-16T16:25:54.034Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-09-16T16:25:54.609Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


## 3. Retrieve relevant documents
Create a retriever from the vector store
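
Conceptually, a retriever embeds the query and returns the stored chunks whose vectors score highest against it. The sketch below illustrates that idea with a plain list and a dot-product score, which ranks the same as cosine similarity when all vectors have unit length. The vectors, texts, and the helper `top_k` are hypothetical and do not reflect Qdrant's internals.

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec, indexed_chunks, k):
    """Return the k (score, text) pairs with the highest similarity score."""
    scored = ((dot(query_vec, vec), text) for vec, text in indexed_chunks)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


# Toy "vector store": (unit-length vector, chunk text) pairs.
indexed_chunks = [
    ([1.0, 0.0], "Slurm batch jobs"),
    ([0.6, 0.8], "Interactive srun sessions"),
    ([0.0, 1.0], "Storage quotas"),
]

results = top_k([1.0, 0.0], indexed_chunks, k=2)
for score, text in results:
    print(f"{score:.2f}  {text}")
```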

In [14]:
# Retriever to retrieve relevant snippets
retriever = vectorstore.as_retriever()

## 4. Call LLM

### 4.1 Prompting
- Use a prompt template [link](https://api.python.langchain.com/en/latest/prompts/langchain_core.prompts.prompt.PromptTemplate.html)
    - includes input parameters that can be dynamically changed
- Use Langchain hub to pull prompts [link](https://smith.langchain.com/hub)
    - easy to share and reuse prompts
    - shows which prompts are popular for specific use cases
    - Eg: [rag-prompt](https://smith.langchain.com/hub/rlm/rag-prompt)
- Use a custom prompt
```
from langchain_core.prompts import PromptTemplate

qa_prompt_template = """Use the following pieces of context to answer the question at the end. Please follow the following rules:
    1. If the question has some initial findings, use that as context.
    2. If you don't know the answer, don't try to make up an answer. Just say **I can't find the final answer but you may want to check the following sources** and add the source documents as a list.
    3. If you find the answer, write the answer in a concise way and add the list of sources that are **directly** used to derive the answer. Exclude the sources that are irrelevant to the final answer.

    {context}

    Question: {question}
    Helpful Answer:"""

rag_chain_prompt = PromptTemplate.from_template(qa_prompt_template) 
```
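
A prompt template is, at its core, a string with named placeholders that are filled in at call time. The sketch below shows the same idea with Python's built-in `str.format`, without LangChain; the shortened template text, the sample chunks, and the helper `build_prompt` are illustrative assumptions.

```python
# A shortened, hypothetical template with the same two placeholders.
TEMPLATE = """Use the following pieces of context to answer the question at the end.

{context}

Question: {question}
Helpful Answer:"""


def build_prompt(context_chunks, question):
    """Join the retrieved chunks and substitute them into the template."""
    context = "\n\n".join(context_chunks)
    return TEMPLATE.format(context=context, question=question)


prompt_text = build_prompt(
    ["Delta uses Slurm.", "Use sbatch for batch jobs."],
    "How do I submit a job?",
)
print(prompt_text)
```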


In [15]:
# prompting

from langchain import hub
prompt = hub.pull("rlm/rag-prompt")

### 4.2 Call LLM
- We will use 
    - OpenAI GPT-4o-mini and 
    - Ollama llama3.2 model (hosted on NCSA Radiant SD-GPU)
- Each model has its own formats and parameters

In [16]:
# formatting the documents as a string before calling the LLM
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [17]:
# call open ai GPT-4o-mini
from langchain_openai import ChatOpenAI

prompt = hub.pull("rlm/rag-prompt")

# create a chat openai model
llm: ChatOpenAI = ChatOpenAI(
            temperature=0,
            model="gpt-4o-mini",
            max_retries=500,
        )

In [16]:
# call GPT4o-mini
llm.invoke("What is the capital of the world?")

2024-09-12T15:56:28.077Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


AIMessage(content='There isn\'t a single "capital of the world" as each country has its own capital city. However, some cities are often referred to as global capitals due to their significant influence in international politics, finance, culture, and trade. Examples include New York City, which is home to the United Nations headquarters, and London, which is a major financial center. Ultimately, the concept of a "capital of the world" is subjective and can vary based on context.', response_metadata={'token_usage': {'completion_tokens': 92, 'prompt_tokens': 15, 'total_tokens': 107}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_483d39d857', 'finish_reason': 'stop', 'logprobs': None}, id='run-ef7164b4-e87d-4f02-acb9-c56d1d67522d-0', usage_metadata={'input_tokens': 15, 'output_tokens': 92, 'total_tokens': 107})

### 4.3 RAG Chain
Combining it all together

- RunnablePassthrough() is used to pass the user query as is to the chain
- format_docs is used to format the documents as a string
- prompt fills the prompt template with the formatted context and the question
- llm is used to call the LLM
- StrOutputParser() is used to parse the output from the LLM
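
The order of operations in the chain can be easier to follow when written as ordinary function calls. The sketch below is a plain-Python analogy, not LangChain's implementation: every function is a hypothetical stand-in (the retriever returns fixed chunks and the "LLM" is a stub that never calls a model).

```python
def retrieve(question):
    """Stand-in retriever: returns fixed chunks regardless of the question."""
    return ["Delta uses Slurm.", "Use sbatch to submit batch scripts."]


def format_docs(chunks):
    """Join the retrieved chunks into one context string."""
    return "\n\n".join(chunks)


def build_prompt(inputs):
    """Fill a simple prompt from the 'context' and 'question' keys."""
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}\nAnswer:"


def fake_llm(prompt_text):
    """Stand-in model: returns a message-like dict instead of generating text."""
    return {"content": f"(stub answer based on {len(prompt_text)} prompt characters)"}


def parse_output(message):
    """Extract the plain string from the message-like dict."""
    return message["content"]


def rag_chain(question):
    # The question is passed through unchanged; the context is retrieved and formatted.
    inputs = {"context": format_docs(retrieve(question)), "question": question}
    return parse_output(fake_llm(build_prompt(inputs)))


answer = rag_chain("How do I submit a job?")
print(answer)
```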

In [18]:
# rag chain
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

openai_rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [19]:
# call openai rag chain
openai_rag_chain.invoke("What were the goals of the symposium?")


2024-09-16T16:26:57.872Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-09-16T16:26:59.540Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


'The goals of the symposium included creating a forum for North Slope leaders and residents to engage with scientists and engineers on permafrost-related issues, increasing dialogue between these groups, and developing adaptive strategies for Arctic infrastructure. It aimed to reduce research fatigue by consolidating outreach efforts from multiple science teams and allowing visiting experts to learn from local knowledge. Overall, the symposium sought to enhance understanding of the interactions between permafrost and the built environment.'

In [3]:
# call ollama llama3.2:latest

from langchain_ollama import OllamaLLM

ollama_api_key = os.getenv('OLLAMA_API_KEY')
ollama_headers = {"Authorization": f"Bearer {ollama_api_key}"}

# create an Ollama model
ollamallm: OllamaLLM = OllamaLLM(
    base_url="https://sd-gpu.ncsa.illinois.edu/ollama",
    model="llama3.2:latest",
    headers=ollama_headers,
    )

In [4]:
# call llama3.2 model
ollamallm.invoke("What is the capital of the world?")

'There is no single "capital of the world". Each country has its own capital city, and the concept of a global capital is not universally defined or recognized.\n\nHowever, some cities are often referred to as the "global hubs" or "international capitals" due to their significance in international relations, trade, finance, culture, and politics. Some examples include:\n\n1. New York City (USA) - often considered a global financial hub\n2. London (UK) - a major center for international business and finance\n3. Paris (France) - a cultural and artistic hub with significant diplomatic influence\n4. Beijing (China) - a rising economic power with growing global influence\n5. Geneva (Switzerland) - known as the "Capital of International Organizations" due to its hosting many UN agencies\n\nIt\'s worth noting that these designations are subjective and can vary depending on individual perspectives and criteria.\n\nIf you\'re looking for a more neutral answer, I\'d say there isn\'t a single cit

In [None]:
# ollama rag chain
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

ollama_rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ollamallm
    | StrOutputParser()
)

In [None]:
# call ollama rag chain
ollama_rag_chain.invoke("Who is the president of USA?")

In [20]:
## adding sources to openai rag chain

from langchain_core.runnables import RunnableParallel

openai_rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | prompt
    | llm
    | StrOutputParser()
)

openai_rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=openai_rag_chain_from_docs)

In [21]:
# call openai rag chain with source
# this will return the answer and the sources (context)
openai_rag_chain_with_source.invoke("What were the goals of the symposium?")

2024-09-16T16:27:26.939Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-09-16T16:27:29.160Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


{'context': [Document(metadata={'source': 'delta_docs/selected/2023-PI-Symposium-final-report-web.pdf', 'row_page': 63, '_id': '1efab95bab5447e29e6a96377588c25d', '_collection_name': 'delta-collection'}, page_content='63\nThe symposium provided...\n• opportunity to connect with other subject matter \nexperts for on-site discussions and brainstorming \non climate resilience issues.\n• insights into the science and management of in-\nfrastructure-permafrost interactions, including \nmethods and solutions practiced in other Arctic \ncountries.\n• an opportunity for early career researchers to vis-\nit remote sites and learn from local experts before \napplying for research funding.\nThe symposium emphasized...\n• the need for adaptation strategies based on sci-\nentific knowledge and partnerships between com-\nmunity members, permafrost scientists, engineers, \nand planners.\n• the importance of incorporating new scientific \nand engineering knowledge in planning Arctic \nprojects.\n• the

In [None]:
openai_rag_chain_with_source.invoke("Why is tundra restoration and rehabilitation important")

In [None]:
openai_rag_chain_with_source.invoke("Who is Brenadette Adams?")