# Retrieval Augmented Generation
* another method of connecting a large language model to an external data source
    * function calling connects the model to other people's data through APIs
    * RAG connects the model to your own database
* usually focused on private or unorganized data
    * Over 95% of the world’s data is private (personal, company, etc.)
* supplies the model with additional relevant information from the data source by placing it in the context of the input prompt

## 3 Steps of RAG
1. Indexing - formatting the database so data can be retrieved based on the user's natural language query
    * usually done once up front, then repeated whenever the data is updated
    * the retrieval function is later applied to the indexed database
    * we index a document by converting it into a numerical representation called an embedding (like tokenization, this turns text into numbers, but an embedding also captures meaning)
        * **Embeddings** - a vector of floats that encodes the meaning of a text, so that semantically similar texts sit close together in a multidimensional space
        * Embedding models have an input token limit - longer documents get split into chunks
        * The user prompt is also converted into an embedding, which is used to search the vector database (the storage of all the document chunk embeddings)
2. Retrieval - finding the relevant data in the indexed database with a retrieval function, then passing it to the model for generation
    * the 'retrieval' of the supplemental data
    * **Cosine Similarity** - the cosine of the angle between two vectors in a multidimensional space; values closer to 1 mean the texts are more related
    * **K-Nearest Neighbor Search (kNN)** - one of the most common retrieval functions
        * finds the chunk embeddings with the highest cosine similarity to the prompt embedding
        * K - the number of nearest neighbors that will be retrieved
            * A higher k raises the probability that the needed info is provided for generation, at the cost of a longer prompt and more irrelevant chunks
3. Generation - combining the retrieved data with the initial user prompt and prompting the model to get a response grounded in that data
    * This is where we actually prompt the model
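
The retrieval step can be sketched in plain Python. This is a minimal illustration, assuming tiny made-up 3-dimensional vectors in place of real embeddings; the helper names `cosine_similarity` and `top_k` are hypothetical and not part of any library used in this notebook. A real vector database would typically use an optimized (often approximate) search instead of scoring every chunk.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=2):
    """Return (index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    # Highest similarity first: the nearest neighbors are the most related chunks
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy "embeddings" invented for illustration only
chunks = [
    [0.9, 0.1, 0.0],  # chunk 0: points almost the same way as the query
    [0.0, 1.0, 0.0],  # chunk 1: orthogonal to the query (unrelated)
    [0.7, 0.3, 0.1],  # chunk 2: fairly close to the query
]
query = [1.0, 0.0, 0.0]

results = top_k(query, chunks, k=2)
print(results)
```

With these toy vectors, chunks 0 and 2 are returned and chunk 1 is left out, since it shares no direction with the query.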

## Embedding Example
Let's see what an embedding looks like.

Let's embed a possible user prompt that this chatbot would receive.

In [6]:
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv
import os

load_dotenv("C:/Users/Patrick/Desktop/PROJECTS/Python Lesson Plan/AI_CLASS_7/api.env")
api_key = os.getenv("OPENAI_API_KEY")

prompt = "What is the policy on PTO?"
# Optionally pass dimensions=1024 to request a shorter vector from the model
embeddings = OpenAIEmbeddings(api_key=api_key, model="text-embedding-3-large")
query_result = embeddings.embed_query(prompt)
print(f"{query_result}\nLength of embedding: {len(query_result)}")

[-0.06600009649991989, -0.006616381462663412, -0.018194159492850304, 4.1736100683920085e-05, -7.546420965809375e-05, 0.03362645208835602, -0.018265342339873314, 0.02776104211807251, -0.01790943183004856, 0.03584733605384827, 0.044616978615522385, -0.01354596484452486, 0.0026159442495554686, -0.000760759343393147, 0.015859384089708328, 0.0413425974547863, -0.04493017867207527, 0.0401182658970356, 0.0139089934527874, -0.008790996856987476, 0.021397357806563377, 0.006598586216568947, -0.020101841539144516, 0.019817113876342773, 0.034623000770807266, 0.029141975566744804, 0.02722005732357502, 0.001170946517959237, -0.0012688220012933016, 0.0359327532351017, -0.018678197637200356, 0.032259754836559296, 0.037925854325294495, -0.011232544668018818, -0.030038870871067047, 0.0005957057001069188, 0.0568603090941906, -0.035163987427949905, -0.025654049590229988, 0.003843836486339569, -0.019162237644195557, -0.041570380330085754, 0.01708371751010418, 0.0036124945618212223, 0.023119965568184853, -0

# Full RAG Example

## Installs
* pip install each library that is necessary

In [9]:
%pip install bs4 langchain_community tiktoken langchain-openai langchainhub chromadb langchain

Collecting bs4
  Downloading bs4-0.0.2-py2.py3-none-any.whl.metadata (411 bytes)
Collecting beautifulsoup4 (from bs4)
  Using cached beautifulsoup4-4.12.3-py3-none-any.whl.metadata (3.8 kB)
Collecting soupsieve>1.2 (from beautifulsoup4->bs4)
  Using cached soupsieve-2.5-py3-none-any.whl.metadata (4.7 kB)
Downloading bs4-0.0.2-py2.py3-none-any.whl (1.2 kB)
Using cached beautifulsoup4-4.12.3-py3-none-any.whl (147 kB)
Using cached soupsieve-2.5-py3-none-any.whl (36 kB)
Installing collected packages: soupsieve, beautifulsoup4, bs4
Successfully installed beautifulsoup4-4.12.3 bs4-0.0.2 soupsieve-2.5
Note: you may need to restart the kernel to use updated packages.





## Imports
* import everything necessary for the program
    * bs4 - BeautifulSoup - a library for parsing HTML, used here to keep only the relevant parts of the page
    * langchain.hub - used to pull a helpful RAG prompt
    * langchain.text_splitter - chunks the documents before embedding
    * langchain_community.document_loaders - loads the website as a document
    * langchain_community.vectorstores - connects the open source vector store we are using (Chroma)
    * StrOutputParser - basic string output parser (more info in langchain notebook)
    * RunnablePassthrough - passes the user prompt through unchanged to the prompt template in the RAG chain
    * ChatOpenAI - the OpenAI chat completions API client through langchain
    * OpenAIEmbeddings - the OpenAI embeddings API client through langchain

In [10]:
import bs4
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

USER_AGENT environment variable not set, consider setting it to identify your requests.


## Indexing
* We use BeautifulSoup and the `WebBaseLoader` to get the content from a website and convert it into a document that can be chunked and embedded

In [18]:
# Load Documents
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

# Split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# Let's check out the chunks
print(f'Length of chunked document: {len(splits)}\n')
print(f'First chunk of document: \n\n{splits[0]}\n')
print(f'Last chunk of document: \n\n{splits[-1]}\n')

Length of chunked document: 66

First chunk of document: 

page_content='LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:

Planning

Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.
Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improv
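
To make the `chunk_size` and `chunk_overlap` settings more concrete, here is a simplified sketch of overlapping chunks. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and sentences); the hypothetical `chunk_text` helper below just slides a fixed-size character window, which is enough to show why neighboring chunks share some text.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks

# Small numbers so the overlap is easy to see
sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_text(sample, chunk_size=10, chunk_overlap=4)
print(pieces)
```

Each piece starts with the last 4 characters of the previous one, so a sentence that straddles a chunk boundary is still likely to appear whole in at least one chunk.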

## Retrieval
* Chroma has a built-in retriever method, `as_retriever` - https://api.python.langchain.com/en/latest/vectorstores/langchain_community.vectorstores.chroma.Chroma.html#langchain_community.vectorstores.chroma.Chroma.as_retriever
* Let's build the chain and connect the retriever to a prompt template that incorporates the initial user prompt and the retrieved context

In [20]:
# Create a vectorstore from the embedded chunks
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
# Create the retriever from the vectorstore
retriever = vectorstore.as_retriever()
print(retriever)

tags=['Chroma', 'OpenAIEmbeddings'] vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000001E5F3B07290>


In [23]:
# Prompt Template
prompt = hub.pull("rlm/rag-prompt")
print(prompt.messages[0].prompt.template)

You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question} 
Context: {context} 
Answer:
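
To see what the model will actually receive, the template can be filled in by hand. The sketch below uses plain Python string formatting rather than the LangChain prompt object, and the `context` text is a made-up placeholder, not real retrieved content.

```python
# Template text copied from the printed output above
template = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise.\n"
    "Question: {question} \n"
    "Context: {context} \n"
    "Answer:"
)

# Hypothetical values: in the real chain these come from the user and the retriever
question = "What is Task Decomposition?"
context = "(retrieved chunk 1)\n\n(retrieved chunk 2)"

filled = template.format(question=question, context=context)
print(filled)
```

The two placeholders are the only parts that change between requests; everything else is a fixed instruction to the model.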


## Generation
* Create the chain to connect all the parts of the RAG system

In [24]:
# Chat Completions API Client
llm = ChatOpenAI(api_key=api_key, model_name="gpt-3.5-turbo", temperature=0)

# Post-processing: join the retrieved chunks into a single string for the prompt
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Create the chain
rag_chain = (
    # The retriever will retrieve the relevant documents based on the prompt passed through the .invoke() function
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    # The retrieved documents and user prompt will be passed to the prompt template
    | prompt
    # The filled-in prompt will be passed to the language model
    | llm
    # The output from the language model will be parsed into a string
    | StrOutputParser()
)
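
The data flow through the chain can be easier to follow as ordinary function calls. The sketch below is only an analogy for what the `|` pipeline does: `fake_retriever`, `build_prompt`, `fake_llm`, `parse_output`, and `run_chain` are hypothetical stand-ins that make no API calls, and the real LangChain objects pass richer types (documents and message objects) between steps.

```python
def fake_retriever(question):
    """Stand-in for the retriever: returns chunk texts for a question."""
    return ["Chunk A about agents.", "Chunk B about planning."]

def format_docs(chunks):
    """Join the retrieved chunks into one context string."""
    return "\n\n".join(chunks)

def build_prompt(inputs):
    """Stand-in for the prompt template: fills in question and context."""
    return f"Question: {inputs['question']}\nContext: {inputs['context']}\nAnswer:"

def fake_llm(prompt_text):
    """Stand-in for the chat model: returns a message-like dict."""
    return {"content": f"(model reply to a {len(prompt_text)}-character prompt)"}

def parse_output(message):
    """Stand-in for the output parser: extracts the reply text."""
    return message["content"]

def run_chain(question):
    # Step 1: the same question feeds both the retriever and the prompt
    inputs = {
        "context": format_docs(fake_retriever(question)),
        "question": question,  # passed through unchanged
    }
    # Steps 2-4: prompt template -> model -> string output
    return parse_output(fake_llm(build_prompt(inputs)))

answer = run_chain("What is Task Decomposition?")
print(answer)
```

The dictionary at the start of the chain is the key design choice: it lets one input (the question) fan out into the two values the prompt template needs.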

## Test the RAG chain

In [28]:
# Question
rag_chain.invoke("What is Task Decomposition?")

"A reflection mechanism is a framework that equips agents with dynamic memory and self-reflection capabilities to improve reasoning skills. It involves showing examples to the agent for guiding future changes in the plan and adding reflections into the agent's working memory. The agent computes a heuristic after each action and may decide to reset the environment based on the self-reflection results."