<a href="https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/t81_559_class_06_1_rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-559: Applications of Generative Artificial Intelligence
**Module 6: Retrieval-Augmented Generation (RAG)**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 6 Material

* **Part 6.1: Introduction to Retrieval-Augmented Generation (RAG)** [[Video]](https://www.youtube.com/watch?v=qA52K0K181Q) [[Notebook]](t81_559_class_06_1_rag.ipynb)
* Part 6.2: Introduction to ChromaDB [[Video]](https://www.youtube.com/watch?v=R53lo4sevLQ) [[Notebook]](t81_559_class_06_2_chromadb.ipynb)
* Part 6.3: Understanding Embeddings [[Video]](https://www.youtube.com/watch?v=Tq82Gl2ZZNM) [[Notebook]](t81_559_class_06_3_embeddings.ipynb)
* Part 6.4: Question Answering Over Documents [[Video]](https://www.youtube.com/watch?v=hCwL_lW-gP0) [[Notebook]](t81_559_class_06_4_qa.ipynb)
* Part 6.5: Embedding Databases [[Video]](https://www.youtube.com/watch?v=BG2gT4uYxhM) [[Notebook]](t81_559_class_06_5_embed_db.ipynb)


# Google CoLab Instructions

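The following code detects whether the notebook is running in Google CoLab. When it is, the code reads the OpenAI API key from the CoLab secrets and installs the libraries this module needs.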

In [3]:
import os

try:
    from google.colab import drive, userdata
    COLAB = True
    print("Note: using Google CoLab")
except ImportError:
    print("Note: not using Google CoLab")
    COLAB = False

# OpenAI Secrets
if COLAB:
    os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

# Install needed libraries in CoLab
if COLAB:
    !pip install langchain langchain_openai langchain-community pypdf chromadb

Note: using Google CoLab
Collecting langchain-community
  Downloading langchain_community-0.2.5-py3-none-any.whl (2.2 MB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading marshmallow-3.21.3-py3-none-any.whl (49 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)
Collecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)
Installing collect

# 6.1: Introduction to Retrieval-Augmented Generation (RAG)

Large Language Models (LLMs), like those integrated into LangChain, are powerful tools for processing and understanding large amounts of text. These models can be leveraged to answer questions based on the content of a document, making them highly valuable for tasks that require information extraction and comprehension.

One technique used with LLMs for document-based question answering is Retrieval-Augmented Generation (RAG). RAG is a hybrid approach that combines the strengths of information retrieval systems with the generative capabilities of language models. Here's a brief overview of how RAG works:

1. **Retrieval Phase:** When a question is posed, the RAG system first retrieves relevant documents or document segments from a large corpus. This retrieval is based on the similarity of the content in the documents to the question. The idea is to find textual evidence that could contain the answer.
2. **Augmentation Phase:** The retrieved documents are then fed into a language model as contextual information. This step is crucial as it provides the language model with specific data relevant to the question, which might not be present in the model’s pre-trained knowledge base.
3. **Generation Phase:** With the context provided by the retrieved documents, the language model generates a response. The model synthesizes the information from the documents, using its understanding of language and context to formulate a coherent and accurate answer.
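
The three phases above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: retrieval is approximated by counting shared words instead of comparing embeddings, the corpus is a toy list of three sentences, and `fake_llm` is a hypothetical stand-in for a real model call. The helper names `retrieve`, `augment`, and `generate` are invented for this sketch and are not LangChain functions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(question, corpus, k=2):
    """Retrieval phase: rank passages by how many question words they share."""
    q_tokens = Counter(tokenize(question))
    scored = []
    for passage in corpus:
        overlap = sum((q_tokens & Counter(tokenize(passage))).values())
        scored.append((overlap, passage))
    # Highest overlap first; the sort is stable, so ties keep corpus order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [passage for overlap, passage in ranked[:k] if overlap > 0]

def augment(question, passages):
    """Augmentation phase: place the retrieved passages in the prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

def generate(prompt, llm):
    """Generation phase: hand the augmented prompt to a language model."""
    return llm(prompt)

# Toy corpus (assumption): one sentence per "document".
corpus = [
    "The Transformer relies entirely on attention mechanisms.",
    "BERT is pre-trained with a masked language model objective.",
    "GPT-3 has 175 billion parameters.",
]

def fake_llm(prompt):
    # Hypothetical stand-in for a real model call; it only reports prompt size.
    return f"(model response to a {len(prompt)}-character prompt)"

question = "How many parameters does GPT-3 have?"
passages = retrieve(question, corpus, k=1)
prompt = augment(question, passages)
answer = generate(prompt, fake_llm)
print(prompt)
print(answer)
```

In a real system the word-overlap scoring would be replaced by an embedding-based similarity search, and `fake_llm` by a call to an actual model.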

By combining the retrieval of relevant information with the generative power of LLMs, RAG effectively enhances the model's ability to provide precise answers based on the content of a document. This approach is particularly useful in scenarios where the direct answer to a question may not be readily available in the model's training data, requiring the system to fetch external evidence to support response generation. This makes LangChain's integration of LLMs with RAG a robust solution for document-based question answering, enabling deep understanding and nuanced responses across various domains and types of inquiries.

We begin by opening a connection to an OpenAI LLM.

In [4]:
from langchain.chains.summarize import load_summarize_chain
# Document loaders live in the langchain-community package installed above.
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import ChatOpenAI
from IPython.display import display_markdown

MODEL = 'gpt-4o-mini'

# Chat model used for every query in this notebook; the low temperature
# makes the answers less random.
llm = ChatOpenAI(
    model=MODEL,
    temperature=0.2,
    n=1
)


We will now load multiple PDFs that we can query with questions. The questions can draw on information in these documents that is not part of the foundation model. We create a loader for each PDF so that we can load the documents into a vector store for easy querying.

The following code builds one 'PyPDFLoader' for each of four arXiv PDFs, starting with "https://arxiv.org/pdf/1706.03762", and collects the loaders in a list. The loaders are passed to the vector store index in the next step, where the documents are loaded and embedded. The code also creates a "map_reduce" summarization chain with 'load_summarize_chain'; that chain is not used in the rest of this part.

In [5]:
urls = [
  "https://arxiv.org/pdf/1706.03762",
  "https://arxiv.org/pdf/1810.04805",
  "https://arxiv.org/pdf/2005.14165",
  "https://arxiv.org/pdf/1910.10683"
]

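loaders = []

# Summarization chain; created here but not used again in this part.
chain = load_summarize_chain(llm, chain_type="map_reduce")

# Create one loader per PDF and collect them for indexing.
for url in urls:
  print(f"Reading: {url}")
  loader = PyPDFLoader(url)
  loaders.append(loader)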



Reading: https://arxiv.org/pdf/1706.03762
Reading: https://arxiv.org/pdf/1810.04805
Reading: https://arxiv.org/pdf/2005.14165
Reading: https://arxiv.org/pdf/1910.10683
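
Long documents such as these papers are normally split into smaller chunks before they are embedded, so that retrieval can return a focused passage rather than a whole PDF. The sketch below shows one simple way to do that, assuming fixed-size character windows with a small overlap. The function name `chunk_text` and the sizes are hypothetical, and LangChain's own text splitters differ in detail.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows of chunk_size that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final window already reaches the end of the text
    return chunks

sample = "abcdefghij" * 5  # 50 characters of placeholder text
chunks = chunk_text(sample, chunk_size=20, overlap=5)
for i, chunk in enumerate(chunks):
    print(i, len(chunk), chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.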


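Next we compute embeddings for the four documents and load them into an in-memory vector store. A dedicated embedding database such as [ChromaDB](https://www.trychroma.com/), introduced in Part 6.2, can fill the same role. These embeddings will allow the prompt to be augmented with information from the PDF papers that we loaded.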

In [20]:
from langchain.indexes import VectorstoreIndexCreator
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores.inmemory import InMemoryVectorStore

# Embed the loaded PDFs and index them in an in-memory vector store.
embeddings_model = OpenAIEmbeddings()
index = VectorstoreIndexCreator(
    embedding=embeddings_model,
    vectorstore_cls=InMemoryVectorStore
).from_loaders(loaders)
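
To make the idea of a vector store more concrete, the following sketch ranks stored passages by cosine similarity to a query vector. It is a simplified illustration, not the implementation used by the vector store above: the three-dimensional vectors are made-up values standing in for real embeddings, and the names `cosine_similarity`, `top_k`, and `store` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k (text, score) pairs most similar to the query vector."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Made-up 3-dimensional embeddings; real models return far longer vectors.
store = [
    ("passage about attention", [0.9, 0.1, 0.0]),
    ("passage about pre-training", [0.1, 0.9, 0.1]),
    ("passage about few-shot learning", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of a question about attention

results = top_k(query_vec, store, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

The passages returned by a search like this are what get added to the prompt during the augmentation phase.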

Now that the embeddings are loaded, we can query the model for information that is only contained in the documents.

In [18]:
query = "Which figure demonstrates Scaled Dot-Product Attention?"

index.query(query,llm=llm)

'The left figure in Figure 2 demonstrates Scaled Dot-Product Attention.'

## Limitations of RAG

Language Model Retrieval-Augmented Generation (LLM RAG) combines the capabilities of large language models with external data retrieval mechanisms. This approach enhances language models' performance by providing access to specific, often proprietary, data that the pre-trained model's general knowledge might lack. However, the effectiveness of LLM RAG diminishes significantly when the augmented data is already common knowledge and inherently included in the foundation model.

LLM RAG excels in scenarios where users require proprietary or highly specialized information. In fields such as finance, law, or technical industries, specific data sets, reports, or documents are essential for accurate responses. RAG's unique ability to pull in this external data ensures highly precise and contextually relevant answers. While the foundation model is extensively trained on a broad array of publicly available information, it may lack the depth or latest updates required for these niche domains. Therefore, RAG's integration of proprietary data sets leads to more accurate and contextually enriched outputs.

It's important to consider that when the data intended for augmentation is already common knowledge, the benefits of LLM RAG reduce considerably. The foundation model undergoes pre-training on vast amounts of data, encompassing a wide range of general knowledge topics. Consequently, when queries involve information that falls within this general scope, the foundation model generates accurate and informative responses without the need for external augmentation. In such cases, using RAG does not add significant value and can introduce unnecessary complexity and processing overhead.

Moreover, retrieving data that the model already understands well leads to inefficiencies. The foundation model's extensive pre-training includes diverse texts, meaning that its internal knowledge base is typically sufficient for common knowledge queries. Therefore, relying on RAG in these instances does not leverage the model's strengths effectively and detracts from the system's overall efficiency.

In conclusion, LLM RAG is most beneficial when it augments a language model with proprietary or highly specialized data, and its efficacy wanes when dealing with common knowledge. The foundation model's comprehensive training already covers a vast expanse of general information, rendering RAG augmentation superfluous in such contexts. Therefore, employ RAG strategically, in areas that require access to proprietary data that the foundation model might only partially encompass.

# Module 6 Assignment

You can find the assignment for this module here: [assignment 6](https://github.com/jeffheaton/app_generative_ai/blob/main/assignments/assignment_yourname_class6.ipynb)