# RAG - Retrieval-Augmented Generation

# Recap

![With and Without RAG](notebook_images/rag-with-without.png "With and Without RAG") 

![RAG](notebook_images/rag-before-after.png "RAG")

### RAG

#### Retrieval
- Set up a knowledge base
- Retrieve documents relevant to the user query

#### Generation
- Using LLMs
- Use the retrieved documents as context


### RAG - Retrieval Steps

1. Prepare data 
2. Create a vector store and insert data
3. Search the vector store and retrieve relevant documents

## RAG - *Retrieval*-Augmented Generation

### Knowledge DB

* Vector database (a beginner's guide: [blog 1](https://medium.com/data-and-beyond/vector-databases-a-beginners-guide-b050cbbe9ca0), Pinecone's overview: [blog 2](https://www.pinecone.io/learn/vector-database/))


![knowledge DB](notebook_images/knowledge-db.png "Knowledge DB")


## RAG - *Retrieval*-Augmented Generation

### Vector DB Retrieval

![Vector DB Retrieval](notebook_images/vectordb-retrieval.png "Vector DB Retrieval")

### RAG - Retrieval Steps

1. Prepare data 
2. Create a vector store and insert data
3. Search the vector store and retrieve relevant documents

In [1]:
# basic imports
import os
import json
import logging
import sys
import pandas as pd

from dotenv import load_dotenv
load_dotenv(override=True)

# create and configure logger
logging.basicConfig(level=logging.INFO, datefmt='%Y-%m-%dT%H:%M:%S',
                    format='%(asctime)-15s.%(msecs)03dZ %(levelname)-7s : %(name)s - %(message)s',
                    handlers=[logging.StreamHandler(sys.stdout)]
                    )
# create log object with current module name
log = logging.getLogger(__name__)

### 1. Prepare data
- Load data from different sources
- We will use the proceedings of the [Arctic data symposium 2023](https://arcticdata.io/catalog/portals/pisymposium2023).
- E.g. the [proceedings final report](https://permafrostcoasts.org/wp-content/uploads/2024/08/2023-PI-Symposium-final-report-web.pdf) and the [participant bios](https://arcticdata.io/metacat/d1/mn/v2/object/urn:uuid:6e613b84-842c-4ac9-993d-a863d7040aa5)
- The data is in the data/docs directory.


### 1.1 Data Loaders
- LangChain provides data loaders for different file types
- Loaded data is returned as LangChain Document objects ([document class](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html)), each with page_content and metadata

![Langchain document class](notebook_images/langchain-document-class.png "Langchain document class")
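
To make the shape of a loaded record concrete, here is a minimal stand-in built with the standard library only. `SimpleDocument` is a hypothetical name used for illustration, not LangChain's class; it mirrors just the two fields this notebook relies on, `page_content` and `metadata`, and the file name and page number are made up.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in mirroring the two fields used in this notebook."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# a PDF loader typically yields one record per page, tagged with its origin
doc = SimpleDocument(
    page_content="Direct Ink Writing of Polymer Composite Electrolytes",
    metadata={"source": "data/example.pdf", "page": 0},  # illustrative values
)
print(doc.page_content)
print(doc.metadata["source"], doc.metadata["page"])
```

Keeping the text and its origin together in one record is what later lets the chain report which file and page an answer came from.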


In [2]:
# data loaders
from langchain_community.document_loaders import CSVLoader, DataFrameLoader, PyPDFLoader, Docx2txtLoader, UnstructuredRSTLoader, DirectoryLoader


class DataLoaders:
    """
    various data loaders
    """
    def __init__(self, data_dir_path):
        self.data_dir_path = data_dir_path
    
    def csv_loader(self):
        csv_loader_kwargs = {
                            "csv_args":{
                                "delimiter": ",",
                                "quotechar": '"',
                                },
                            }
        dir_csv_loader = DirectoryLoader(self.data_dir_path, glob="**/*.csv", use_multithreading=True,
                                    loader_cls=CSVLoader, 
                                    loader_kwargs=csv_loader_kwargs,
                                    )
        return dir_csv_loader
    
    def pdf_loader(self):
        dir_pdf_loader = DirectoryLoader(self.data_dir_path, glob="**/*.pdf",
                                    loader_cls=PyPDFLoader,
                                    )
        return dir_pdf_loader
    
    def word_loader(self):
        dir_word_loader = DirectoryLoader(self.data_dir_path, glob="**/*.docx",
                                    loader_cls=Docx2txtLoader,
                                    )
        return dir_word_loader
    
    def rst_loader(self):
        rst_loader_kwargs = {
                        "mode":"single"
                        }
        dir_rst_loader = DirectoryLoader(self.data_dir_path, glob="**/*.rst",
                                    loader_cls=UnstructuredRSTLoader, 
                                    loader_kwargs=rst_loader_kwargs
                                    )
        return dir_rst_loader
    

In [20]:
# load data
data_dir_path =  "data"
data_loader = DataLoaders(data_dir_path=data_dir_path)
log.info("Loading files from directory %s", data_dir_path)
dir_csv_loader = data_loader.csv_loader()
dir_word_loader = data_loader.word_loader()
dir_pdf_loader = data_loader.pdf_loader()
dir_rst_loader = data_loader.rst_loader()
csv_data = dir_csv_loader.load()
word_data = dir_word_loader.load()
pdf_data = dir_pdf_loader.load()
rst_data = dir_rst_loader.load()

2024-11-06T12:11:10.033Z INFO    : __main__ - Loading files from directory data


In [21]:
for doc in pdf_data:
    print(doc)
    break

page_content='www.afm-journal.de© 2020 Wiley-VCH GmbH 2006683 (1 of 9)Full PaPer
Direct Ink Writing of Polymer Composite Electrolytes 
with Enhanced Thermal Conductivities
Meng Cheng, Ajaykrishna Ramasubramanian, Md Golam Rasul, Yizhou Jiang, Yifei Yuan, 
Tara Foroozan, Ramasubramonian Deivanayagam, Mahmoud Tamadoni Saray, 
Ramin Rojaee, Boao Song, Vitaliy Robert Yurkiv, Yayue Pan, Farzad Mashayek, 
and Reza Shahbazian-Yassar*
Proper distribution of thermally conductive nanomaterials in polymer 
batteries offers new opportunities to mitigate performance degradations 
associated with local hot spots and safety concerns in batteries. Herein, a 
direct ink writing (DIW) method is utilized to fabricate polyethylene oxide (PEO) composite polymers electrolytes (CPE) embedded with silane-treated 
hexagonal boron nitride (S-hBN) platelets and free of any volatile organic 
solvents. It is observed that the S-hBN platelets are well aligned in the printed CPE during the DIW process. The in-plane 

In [25]:
print("Number of PDF documents: ", len(pdf_data))

Number of PDF documents:  125


### 1.2 Format into text and metadata
- Convert data to a list of texts and metadata 
- Metadata can be used for filtering the data
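
As a rough illustration of metadata filtering, the sketch below keeps only the records that come from PDF files. The toy records and the helper `filter_by_source` are assumptions made for this example; they are not part of the notebook's pipeline, and a vector store would normally apply such a filter at query time.

```python
# toy records in the same (texts, metadatas) layout built in this section;
# the file names and values are made up for illustration
toy_texts = ["row about sensors", "page about epoxy inks", "page about yield stress"]
toy_metadatas = [
    {"source": "data/a.csv", "row_page": 0},
    {"source": "data/b.pdf", "row_page": 3},
    {"source": "data/b.pdf", "row_page": 7},
]


def filter_by_source(texts, metadatas, suffix):
    """Keep only the (text, metadata) pairs whose source ends with suffix."""
    return [(t, m) for t, m in zip(texts, metadatas) if m["source"].endswith(suffix)]


pdf_only = filter_by_source(toy_texts, toy_metadatas, ".pdf")
for text, meta in pdf_only:
    print(meta["source"], meta["row_page"], text)
```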


In [22]:
# get text and metadata from the data
def get_text_metadatas(csv_data=None, pdf_data=None, word_data=None, rst_data=None):
    """
    Each Document has page_content and metadata properties.
    Separate the text and metadata content of each Document.
    Build custom metadata where needed.
    """
    # treat a missing source as an empty list so the None defaults are safe to iterate
    csv_data, pdf_data = csv_data or [], pdf_data or []
    word_data, rst_data = word_data or [], rst_data or []

    csv_texts = [doc.page_content for doc in csv_data]
    # custom metadata
    csv_metadatas = [{'source': doc.metadata['source'], 'row_page': doc.metadata['row']} for doc in csv_data]   # metadata={'source': 'filename.csv', 'row': 0}
    pdf_texts = [doc.page_content for doc in pdf_data]
    pdf_metadatas = [{'source': doc.metadata['source'], 'row_page': doc.metadata['page']} for doc in pdf_data]  # metadata={'source': 'data/filename.pdf', 'page': 8}
    word_texts = [doc.page_content for doc in word_data]
    word_metadatas = [{'source': doc.metadata['source'], 'row_page': ''} for doc in word_data]
    rst_texts = [doc.page_content for doc in rst_data]
    rst_metadatas = [{'source': doc.metadata['source'], 'row_page': ''} for doc in rst_data]         # metadata={'source': 'docs/images/architecture/index.rst'}

    texts = csv_texts + pdf_texts + word_texts + rst_texts
    metadatas = csv_metadatas + pdf_metadatas + word_metadatas + rst_metadatas
    return texts, metadatas


texts, metadatas = get_text_metadatas(csv_data, pdf_data, word_data, rst_data)

### 1.3 Chunking

![Chunk Optimization](notebook_images/rag-chunking.png "Chunk Optimization")

### 1.3 Chunking
- Split texts into chunks for embedding
- Return a list of document chunks (a list of LangChain [Document](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html) objects)
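
The idea of overlapping chunks can be sketched in a few lines of plain Python. This is a simplified, character-based sliding window and not the recursive, separator-aware, token-counting splitter used in the next cell; `chunk_text` is a hypothetical helper written only for this illustration.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when the overlap is large enough.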

In [23]:

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
from typing import List

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=1000,
        chunk_overlap=200,
        separators=[
            "\n\n", "\n", ". ", " ", ""
        ]  # try to split on paragraphs... fallback to sentences, then chars, ensure we always fit in context window
    )

docs: List[Document] = text_splitter.create_documents(texts=texts, metadatas=metadatas)


In [24]:
print(docs[0])
print("Number of documents: ", len(docs))


page_content='www.afm-journal.de© 2020 Wiley-VCH GmbH 2006683 (1 of 9)Full PaPer
Direct Ink Writing of Polymer Composite Electrolytes 
with Enhanced Thermal Conductivities
Meng Cheng, Ajaykrishna Ramasubramanian, Md Golam Rasul, Yizhou Jiang, Yifei Yuan, 
Tara Foroozan, Ramasubramonian Deivanayagam, Mahmoud Tamadoni Saray, 
Ramin Rojaee, Boao Song, Vitaliy Robert Yurkiv, Yayue Pan, Farzad Mashayek, 
and Reza Shahbazian-Yassar*
Proper distribution of thermally conductive nanomaterials in polymer 
batteries offers new opportunities to mitigate performance degradations 
associated with local hot spots and safety concerns in batteries. Herein, a 
direct ink writing (DIW) method is utilized to fabricate polyethylene oxide (PEO) composite polymers electrolytes (CPE) embedded with silane-treated 
hexagonal boron nitride (S-hBN) platelets and free of any volatile organic 
solvents. It is observed that the S-hBN platelets are well aligned in the printed CPE during the DIW process. The in-plane 

### 1.4 Embeddings

- Mathematical representations of data points in a high-dimensional space. 
- In the context of natural language processing:
    1. Word Embeddings: Individual words are represented as real-valued vectors in a multi-dimensional space.
    2. Semantic Capture: These embeddings capture the semantic meaning and relationships of the text.
    3. Similarity Principle: Words with similar meanings tend to have similar vector representations.

- We will use OpenAI embeddings
- The text-embedding-ada-002 model is used for the embeddings; it has a maximum input of 8191 tokens according to the OpenAI documentation.
- Hugging Face embedding models leaderboard: [link](https://huggingface.co/spaces/mteb/leaderboard)
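
The similarity principle can be checked by hand with cosine similarity. The three-dimensional vectors below are invented for illustration and do not come from any embedding model; real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# hand-made toy "embeddings"; related words are given nearby vectors
toy_vectors = {
    "polymer": [0.9, 0.1, 0.0],
    "resin": [0.8, 0.2, 0.1],
    "symposium": [0.0, 0.2, 0.9],
}

sim_close = cosine_similarity(toy_vectors["polymer"], toy_vectors["resin"])
sim_far = cosine_similarity(toy_vectors["polymer"], toy_vectors["symposium"])
print(sim_close > sim_far)  # True
```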

In [8]:
# embeddings 
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

### RAG - Retrieval Steps

~~1. Prepare data~~

2. Create a vector store and insert data

3. Search the vector store and retrieve relevant documents

### 2. Vector Store

![Inserting into DB](notebook_images/inserting-db.png "Inserting into DB")

Source credits: [Blog.demir](https://blog.demir.io/hands-on-with-rag-step-by-step-guide-to-integrating-retrieval-augmented-generation-in-llms-ac3cb075ab6f)


### 2. Vector Store

- We will use the [Qdrant](https://qdrant.tech/) vector store for this example
- For today we will keep the vector store in local memory
- Qdrant also provides a Docker image that can be used to run a vector store locally or host it remotely, e.g. a [Qdrant docker container running locally](http://localhost:6333/dashboard)

- Blog post on vector stores [link](https://medium.com/google-cloud/vector-databases-are-all-the-rage-872c888fa348)
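
Conceptually, an in-memory vector store keeps (vector, text) pairs and ranks them against a query vector. The class below is a toy sketch under that assumption; `ToyVectorStore` and its methods are hypothetical and unrelated to Qdrant's actual API, and the two-dimensional vectors are made up.

```python
import math


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (vector, text) pairs in a list."""

    def __init__(self):
        self._items = []

    def add(self, vector, text):
        self._items.append((vector, text))

    def search(self, query_vector, k=2):
        """Return the k texts whose vectors are most similar to the query."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.hypot(*a) * math.hypot(*b))

        ranked = sorted(
            self._items,
            key=lambda item: cosine(query_vector, item[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]


store = ToyVectorStore()
store.add([1.0, 0.0], "chunk about epoxy resins")
store.add([0.9, 0.3], "chunk about printing inks")
store.add([0.0, 1.0], "chunk about tundra restoration")

print(store.search([1.0, 0.1], k=2))
```

A real vector database replaces the full sort with an approximate nearest-neighbour index so that search stays fast as the collection grows.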

In [26]:
# creating a qdrant vector store in local memory

from langchain_community.vectorstores import Qdrant

# qdrant collection name
collection_name = os.getenv('QDRANT_COLLECTION_NAME', "data-collection")

# create vector store in local memory
vectorstore = Qdrant.from_documents(
    documents=docs,
    embedding=embeddings,
    location=":memory:",  # Local mode with in-memory storage only
    collection_name=collection_name,
    )

2024-11-06T12:12:37.962Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-11-06T12:12:39.865Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-11-06T12:12:41.871Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-11-06T12:12:43.316Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-11-06T12:12:44.658Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-11-06T12:12:45.392Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


### RAG - Retrieval Steps

~~1. Prepare data~~

~~2. Create a vector store and insert data~~

3. Search the vector store and retrieve relevant documents

## 3. Retrieve relevant documents
Create a retriever from the vector store

In [27]:
# Retriever to retrieve relevant snippets
retriever = vectorstore.as_retriever()

### RAG - Retrieval Steps

~~1. Prepare data~~

~~2. Create a vector store and insert data~~

~~3. Search the vector store and retrieve relevant documents~~

## RAG - Retrieval-Augmented *Generation*


![RAG LLM](notebook_images/rag-llm.png "RAG LLM")

![LLM prompting](notebook_images/rag-prompting.png "LLM Prompting")

## 4. Call LLM

### 4.1 Prompting
- Use a prompt template [link](https://api.python.langchain.com/en/latest/prompts/langchain_core.prompts.prompt.PromptTemplate.html)
    - includes input parameters that can be changed dynamically
- Use LangChain Hub to pull prompts [link](https://smith.langchain.com/hub)
    - makes prompts easy to share and reuse
    - shows which prompts are popular for specific use cases
    - E.g. [rag-prompt](https://smith.langchain.com/hub/rlm/rag-prompt)
- Use a custom prompt
```
from langchain_core.prompts import PromptTemplate

qa_prompt_template = """Use the following pieces of context to answer the question at the end. Please follow the following rules:
    1. If the question has some initial findings, use that as context.
    2. If you don't know the answer, don't try to make up an answer. Just say **I can't find the final answer but you may want to check the following sources** and add the source documents as a list.
    3. If you find the answer, write the answer in a concise way and add the list of sources that are **directly** used to derive the answer. Exclude the sources that are irrelevant to the final answer.

    {context}

    Question: {question}
    Helpful Answer:"""

rag_chain_prompt = PromptTemplate.from_template(qa_prompt_template) 
```
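
The substitution that a prompt template performs can be approximated with plain `str.format`. This is only a sketch of the idea: the template is a shortened stand-in for the custom prompt above, the context and question strings are invented, and a real prompt template class does more than string formatting (for example, checking its input variables).

```python
# shortened stand-in for the custom template, with the same two placeholders
template = """Use the following pieces of context to answer the question at the end.

{context}

Question: {question}
Helpful Answer:"""

filled = template.format(
    context="The nozzle diameter used was 0.41 mm.",  # illustrative context
    question="What was the nozzle diameter?",
)
print(filled)
```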


#### 4.1 Prompting

The rlm/rag-prompt from LangChain Hub

![RLM RAG prompt](notebook_images/rlm-rag-prompt.png "rlm/rag-prompt")

In [14]:
# prompting

from langchain import hub
prompt = hub.pull("rlm/rag-prompt")

## 4.2 Call LLM
- We will use 
    - OpenAI GPT-4o-mini and 
    - Ollama llama3.2 model (hosted by NCSA)
- Each model has its own formats and parameters

In [28]:
# formatting the documents as a string before calling the LLM
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [16]:
# set up the OpenAI GPT-4o-mini chat model
from langchain_openai import ChatOpenAI

# create a chat openai model
llm: ChatOpenAI = ChatOpenAI(
            temperature=0,
            model="gpt-4o-mini",
            max_retries=500,
        )

In [17]:
# call GPT-4o-mini
llm.invoke("What is the capital of the world?")

2024-11-06T10:06:53.653Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


AIMessage(content='There is no official "capital of the world," as each country has its own capital city. However, some cities are often referred to as global capitals due to their significant influence in international politics, finance, culture, and diplomacy. Examples include New York City (home to the United Nations), London, and Washington, D.C. Each of these cities plays a crucial role on the world stage, but there is no single city that serves as the capital of the entire world.', response_metadata={'token_usage': {'completion_tokens': 95, 'prompt_tokens': 15, 'total_tokens': 110, 'prompt_tokens_details': {'cached_tokens': 0, 'audio_tokens': 0}, 'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_0ba0d124f1', 'finish_reason': 'stop', 'logprobs': None}, id='run-78e59ba3-36f9-4689-a8b7-791266b4d593-0', usage_metadata={'input_toke

## 5 RAG 

![RAG system](notebook_images/rag-system.png "RAG system")


### 5 RAG Chain
Combining it all together

- Context is the set of documents returned by the retriever / vector DB
- RunnablePassthrough() passes the user query through to the prompt unchanged
- format_docs joins the retrieved documents into a single string
- prompt fills the prompt template with the context and the question
- llm calls the LLM with the filled prompt
- StrOutputParser() is used to parse the output from the LLM
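
The data flow through these pieces can be traced with ordinary functions. Everything below is a hypothetical stub (names prefixed with `toy_`): the retriever returns canned text and the model stub never calls an API, so the sketch shows only the order in which the parts are wired together, not the behaviour of the real chain.

```python
def toy_retriever(question):
    """Hypothetical retriever: returns canned chunks regardless of the question."""
    return ["Yield stress was 1000 Pa.", "The ink was printed at 45 C."]


def toy_format_docs(chunks):
    """Join the retrieved chunks into one context string."""
    return "\n\n".join(chunks)


def toy_prompt(inputs):
    """Fill a minimal template with the context and the question."""
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}\nAnswer:"


def toy_llm(prompt_text):
    """Hypothetical model stub: reports prompt size instead of calling an API."""
    return {"content": f"(stub answer based on {len(prompt_text)} prompt characters)"}


def toy_output_parser(message):
    """Extract the plain string from the model's message."""
    return message["content"]


def toy_rag_chain(question):
    # the question fans out: once into retrieval, once passed through unchanged
    inputs = {"context": toy_format_docs(toy_retriever(question)), "question": question}
    return toy_output_parser(toy_llm(toy_prompt(inputs)))


print(toy_rag_chain("What is the yield stress?"))
```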

In [29]:
# rag chain
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

openai_rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [30]:
# call openai rag chain
openai_rag_chain.invoke("What material is used?")


2024-11-06T12:13:52.648Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-11-06T12:13:54.651Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


'The materials used are poly(vinylidene fluoride) (PVDF) and poly(vinylidene fluoride-co-hexaﬂuoropropylene) (PVDF-HFP). These polymers are prepared through direct-ink-writing techniques for applications in sensing and energy storage. They exhibit various morphologies and properties suitable for advanced electronic devices.'

In [31]:
openai_rag_chain.invoke("What is the yield stress value or unit?")


2024-11-06T12:14:27.490Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-11-06T12:14:28.708Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


'The yield stress values for the materials mentioned are 1000 Pa for NIPAM/Lap and 300 Pa for NIPAM/Lap/NaPyrPh. Yield stress is typically measured in pascals (Pa). It represents the minimum stress required to induce flow in a material.'

In [32]:
openai_rag_chain.invoke("Is there any epoxy, epoxy-based resin?")

2024-11-06T12:15:04.344Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-11-06T12:15:05.890Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


'Yes, there are epoxy-based resins, including those used in 3D printing technologies. These resins can be formulated with various additives, such as acrylates and nanoparticles, to enhance their properties for specific applications. The context mentions the use of epoxy oligomers and the development of inks for direct-ink write (DIW) printing methods.'

In [33]:
openai_rag_chain.invoke("What was the print temperature, speed, nozzle diameter?")

2024-11-06T12:15:08.719Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-11-06T12:15:11.185Z INFO    : httpx - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


'The print temperature was initially set to 45°C and later increased to 80°C for post-curing. The printing speed was a maximum of 180 mm/s, and the nozzle diameter used was 1.63 mm. Additionally, a 22 GA nozzle with an inner diameter of 0.41 mm was also mentioned for different printed architectures.'

In [None]:
# call openai rag chain
# This should ideally answer "I don't know", unlike the llm.invoke() call above, where no prompt template is given
openai_rag_chain.invoke("What is the capital of the world?")

In [None]:
# call ollama llama3.2:latest

from langchain_community.llms import Ollama

ollama_api_key = os.getenv('OLLAMA_API_KEY')
ollama_jwt_token = os.getenv('OLLAMA_JWT_TOKEN')
ollama_headers = {"Authorization": f"Bearer {ollama_api_key}"}

# create an ollama model
ollamallm: Ollama = Ollama(
    base_url="https://ollama.software.ncsa.illinois.edu/ollama",
    model="llama3.2:latest",
    headers=ollama_headers
    )

In [None]:
# call llama3.2 model
ollamallm.invoke("What is the capital of the world?")

In [None]:
# ollama rag chain
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

ollama_rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ollamallm
    | StrOutputParser()
)

In [None]:
# call ollama rag chain
ollama_rag_chain.invoke("Who is the president of USA?")

In [None]:
## adding sources to openai rag chain

from langchain_core.runnables import RunnableParallel

openai_rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | prompt
    | llm
    | StrOutputParser()
)

openai_rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=openai_rag_chain_from_docs)

In [None]:
# call openai rag chain with source
# this will return the answer and the sources (context)
openai_rag_chain_with_source.invoke("What were the goals of the symposium?")
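
Once the chain returns both the answer and the context, the sources can be pulled out of the chunk metadata. The sketch below assumes a result shaped as a dictionary with `question`, `answer`, and `context` keys; the entries in `context` are plain dictionaries standing in for Document objects, and the file names are invented. `unique_sources` is a hypothetical helper.

```python
# stand-in for the chain's return value; the real "context" entries are
# Document objects, plain dicts are used here so the sketch runs on its own
toy_result = {
    "question": "What were the goals of the symposium?",
    "answer": "(illustrative answer)",
    "context": [
        {"page_content": "...", "metadata": {"source": "data/report.pdf", "row_page": 2}},
        {"page_content": "...", "metadata": {"source": "data/report.pdf", "row_page": 5}},
        {"page_content": "...", "metadata": {"source": "data/bios.pdf", "row_page": 0}},
    ],
}


def unique_sources(result):
    """List each source file once, in the order it first appears."""
    seen = []
    for doc in result["context"]:
        source = doc["metadata"]["source"]
        if source not in seen:
            seen.append(source)
    return seen


print(unique_sources(toy_result))  # ['data/report.pdf', 'data/bios.pdf']
```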

In [None]:
openai_rag_chain_with_source.invoke("Why is tundra restoration and rehabilitation important?")

In [None]:
openai_rag_chain_with_source.invoke("Who is Brenadette Adams?")

## RAG Steps

1. Prepare data
2. Create a vector store and insert data
3. Search the vector store and retrieve relevant documents
4. Call LLM with the user query and the retrieved documents
5. Return the LLM response to the user