# Retrieval Augmented Generation (RAG) Using our Vector DB

In Section I, we built a Vector DB to retrieve similar documents.  This direct follow-up shows how to use the Vector DB to enhance our prompts with additional context before we pass them to a Large Language Model (LLM).

The notebook is organized as follows:

1. RAG Conceptually
   - Question-Answering using Large Language Models
   - Retrieval of Relevant Documents for a Query
   - Question-Answering using RAG for Document Context
2. Using built-in LangChain RAG prompts and Vectors

## 1. RAG Conceptually

Large Language Models have proven to be very good at general question-answering tasks.  However, a main limitation of many LLMs is that they are constrained to the data they were trained on.  Without access to an external data source, LLMs cannot bring in new information, whether that is proprietary domain-specific knowledge or just an update to an existing knowledge base.  Given that, how can we give LLMs new information while still leveraging their powerful language capabilities?

One solution to this is Retrieval Augmented Generation (RAG).  In RAG, we leverage the fact that LLMs can be prompted with additional data: we add relevant context to a given query before we pass it into the model.  Without RAG, the pipeline is:

```
Query ------> LLM
```

With RAG, it becomes:

```
Query ------> Retrieve Relevant Documents ------> Augmented Query ------> LLM
```

We will retrieve relevant documents using the knowledge base we built with the Vector DB.
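
The augmented pipeline can be sketched in plain Python before bringing in any libraries.  This is only an illustration under stated assumptions, not the notebook's actual implementation: the word-overlap `retrieve` function stands in for the Vector DB, `fake_llm` stands in for a real model, and every name and sample sentence below is made up for the example.

```python
import re
from typing import List, Set

# Hypothetical in-memory "knowledge base" standing in for the Vector DB.
DOCUMENTS = [
    "Monsieur Defarge keeps a wine-shop in Saint Antoine.",
    "The Dover mail coach lumbered up Shooter's Hill.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, documents: List[str], k: int = 1) -> List[str]:
    """Toy retriever: rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def augment(query: str, context_docs: List[str]) -> str:
    """Build the augmented query by placing the retrieved text before the question."""
    context = "\n\n".join(context_docs)
    return (
        "Answer the question using only this context:\n\n"
        f"Context: {context}\n\n"
        f"Question: {query}\n"
    )


def fake_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; it only reports the prompt size."""
    return f"(model would answer here, given a {len(prompt)}-character prompt)"


# Query -> Retrieve Relevant Documents -> Augmented Query -> LLM
query = "What is the wine shop?"
context_docs = retrieve(query, DOCUMENTS, k=1)
augmented_query = augment(query, context_docs)
answer = fake_llm(augmented_query)
print(augmented_query)
```

Each function maps onto one arrow in the diagram, so swapping the toy retriever for a real vector search leaves the rest of the flow unchanged.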

### Question-Answering using Large Language Models

We start by looking at a question-answering system that simply asks the LLM a question.  In this case, if the model doesn't already know the answer, there is little we can do to inject that knowledge into the model.

In [16]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain

llm = OpenAI(openai_api_key="")  # fill in your OpenAI API key
prompt_template = PromptTemplate.from_template(
    "{query}"
)
chain = LLMChain(prompt=prompt_template, llm=llm)

Let's ask it a deliberately general question.  The OpenAI model has been trained on a huge amount of data, so adding specifics to the question would likely lead it to a correct answer.  Here, the model cannot ground itself because it doesn't know the context, yet it will still answer with whatever it has.

In [17]:
chain.run(query = "What is the wine keeper's shop?")

"\n\nThe Wine Keeper's Shop is an online store specializing in wine storage solutions. They offer a variety of products including wine racks, wine coolers, cellars, and other accessories. They also provide expert advice and assistance in selecting the best wine storage solution for your needs."

It doesn't know the context, so let's provide it.  Which context should we provide?  The context will be retrieved from our vector database.

We will retrieve the documents relevant to this question, inject them into the prompt, and send that to the model instead.

### Retrieval of Relevant Documents for a Query

We'll briefly revisit our code to retrieve documents from our previous example.  This Vector DB has already been populated with a set of documents.

In [18]:
from typing import List, Dict
from langchain.vectorstores.pgvector import PGVector

from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings

In [19]:
# The connection to the database
CONNECTION_STRING = PGVector.connection_string_from_db_params(
    driver= "psycopg2",
    host = "localhost",
    port = "5432",
    database = "postgres",
    user= "username",
    password="password"
)

# The embedding function that will be used to store into the database
embedding_function = SentenceTransformerEmbeddings(
    model_name="all-MiniLM-L6-v2"
)

# Creates the database connection to our existing DB
db = PGVector(
    connection_string = CONNECTION_STRING,
    collection_name = "embeddings",
    embedding_function = embedding_function
)

In [23]:
# query it, note that the score here is a distance metric (lower is more related)
query = "What is the wine keeper's shop?"
docs_with_scores = db.similarity_search_with_score(query, k = 1)

# print results
for doc, score in docs_with_scores:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)

--------------------------------------------------------------------------------
Score:  0.44882341914316437
The wine-shop keeper accordingly rolled his eyes about, until they
rested upon an elderly gentleman and a young lady, who were seated in
a corner. Other company were there: two playing cards, two playing
dominoes, three standing by the counter lengthening out a short supply
of wine. As he passed behind the counter, he took notice that the
elderly gentleman said in a look to the young lady, “This is our man.”

“What the devil do _you_ do in that galley there?” said Monsieur Defarge
to himself; “I don’t know you.”

But, he feigned not to notice the two strangers, and fell into discourse
with the triumvirate of customers who were drinking at the counter.

“How goes it, Jacques?” said one of these three to Monsieur Defarge. “Is
all the spilt wine swallowed?”

“Every drop, Jacques,” answered Monsieur Defarge.
---------------------------------------------------------------------------

When we query, we get the most relevant document for this query.  Let's create a new prompt that can take this new context. 
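
To make the score more concrete, here is a small sketch of a cosine distance computed by hand.  It assumes the store ranks results by cosine distance; the notebook itself only states that lower means more related.  The three-dimensional vectors are made up for illustration, whereas real sentence embeddings have hundreds of dimensions.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity, so 0 means 'same direction'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical toy embeddings (not produced by any real model).
query_vec = [1.0, 0.0, 1.0]
same_vec = [1.0, 0.0, 1.0]      # points in the same direction as the query
related_vec = [1.0, 1.0, 0.0]   # partially overlaps with the query
unrelated_vec = [0.0, 1.0, 0.0]  # orthogonal to the query

d_same = cosine_distance(query_vec, same_vec)
d_related = cosine_distance(query_vec, related_vec)
d_unrelated = cosine_distance(query_vec, unrelated_vec)

# Smaller distance means a closer match.
print(d_same, d_related, d_unrelated)
```

Under this assumption, a score near 0 indicates a passage pointing in almost the same direction as the query, and a score near 1 indicates an unrelated one.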

### Question-Answering using RAG for Document Context

In [27]:
rag_prompt_context = PromptTemplate.from_template("""
Answer the question using only this context:

Context: {context}

Question: {query}
""")
rag_chain = LLMChain(prompt=rag_prompt_context, llm=llm)

In [28]:
query = "What is the wine keeper's shop?"
docs_with_scores = db.similarity_search_with_score(query, k = 1)

rag_chain.run(
    context = docs_with_scores[0][0].page_content,  # pass the passage text, not the Document object
    query = query
)

"\nThe wine-keeper's shop appears to be a tavern where people were drinking wine and playing cards and dominoes."

That's it! That's the general concept of Retrieval Augmented Generation.
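
When more than one passage is retrieved, the passages have to be merged into a single context string before the template is filled.  The sketch below shows one way this might look.  `Doc` is a hypothetical stand-in for a LangChain document, and the passages, scores, and file name are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


RAG_TEMPLATE = (
    "Answer the question using only this context:\n\n"
    "Context: {context}\n\n"
    "Question: {query}\n"
)


def build_prompt(query: str, docs_with_scores: List[Tuple[Doc, float]]) -> str:
    """Fill the template with the text of every retrieved passage."""
    # Sort by distance so the closest passage appears first in the context.
    ordered = sorted(docs_with_scores, key=lambda pair: pair[1])
    # Use only the passage text; metadata stays out of the prompt.
    context = "\n\n".join(doc.page_content for doc, _score in ordered)
    return RAG_TEMPLATE.format(context=context, query=query)


# Made-up results in the same (document, score) shape as the search output.
results = [
    (Doc("Second passage.", {"source": "example.txt"}), 0.61),
    (Doc("First passage.", {"source": "example.txt"}), 0.45),
]
prompt_text = build_prompt("What is the wine keeper's shop?", results)
print(prompt_text)
```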

## 2. Using built-in LangChain RAG chains

LangChain contains many built-in methods that connect to Vector Databases and LLMs.  In the example above, we built a custom prompt template, manually retrieved the document, and passed it into the chain.  That is simple enough, but LangChain can pipeline all of this together and do more, such as returning metadata and sources.

In [77]:
from langchain import hub
from operator import itemgetter
from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.runnable import RunnableParallel

retriever = db.as_retriever(search_kwargs = {'k' : 2})

prompt = hub.pull("rlm/rag-prompt")

print(prompt.messages[0].prompt.template)

You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question} 
Context: {context} 
Answer:


In [78]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Build a chain with multiple documents for RAG
rag_chain_from_docs = (
    {
        "context": lambda input: format_docs(input["documents"]),
        "question": itemgetter("question"),
    }
    | prompt
    | llm
    | StrOutputParser()
)

# 2-step chain: first retrieve the documents,
# then store their content and metadata under `sources`
# and pass the documents and question into the answer chain
rag_chain_with_source = RunnableParallel({
    "documents": retriever,
    "question": RunnablePassthrough()
}) | {
    "sources": lambda input: [(doc.page_content, doc.metadata) for doc in input["documents"]],
    "answer": rag_chain_from_docs,
}
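
In the chain above, `|` feeds the output of one step into the next, and a dict of steps runs every branch on the same input and collects the results under the dict's keys.  The plain-Python sketch below mimics that behavior to show how data flows through the two stages.  `Step` and `to_step` are hypothetical helpers written for this illustration, not LangChain classes, and the retriever and answer step are fakes.

```python
from typing import Any, Callable


class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: Any) -> "Step":
        # Sequential composition: run this step, then hand the result to the next.
        next_step = to_step(other)
        return Step(lambda value: next_step.invoke(self.invoke(value)))


def to_step(obj: Any) -> Step:
    """Turn a Step, a dict of branches, or a plain callable into a Step."""
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        # Parallel composition: every branch receives the same input.
        branches = {key: to_step(val) for key, val in obj.items()}
        return Step(lambda value: {key: b.invoke(value) for key, b in branches.items()})
    return Step(obj)


# Fake components standing in for the retriever and the prompt/LLM steps.
fake_retriever = Step(lambda question: ["passage one", "passage two"])
passthrough = Step(lambda value: value)

answer_chain = to_step({
    "context": lambda inp: "\n\n".join(inp["documents"]),
    "question": lambda inp: inp["question"],
}) | (lambda filled: f"Q: {filled['question']} | C: {filled['context']}")

chain_with_sources = to_step({"documents": fake_retriever, "question": passthrough}) | {
    "sources": lambda inp: inp["documents"],
    "answer": answer_chain,
}

result = chain_with_sources.invoke("What is the wine keeper's shop?")
print(result)
```

The first dict produces `documents` and `question` from the raw query string; the second dict reuses that intermediate result twice, once to record the sources and once to generate the answer.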

In [79]:
res = rag_chain_with_source.invoke("What is the wine keeper's shop?")

In [80]:
res['answer']

" The wine-keeper's shop is a corner shop, better than most others in appearance and degree, owned and operated by Monsieur Defarge. Customers visit the shop to purchase wine and other items, and the shopkeeper also provides conversation and other services."

In [81]:
res['sources']

[('The wine-shop keeper accordingly rolled his eyes about, until they\nrested upon an elderly gentleman and a young lady, who were seated in\na corner. Other company were there: two playing cards, two playing\ndominoes, three standing by the counter lengthening out a short supply\nof wine. As he passed behind the counter, he took notice that the\nelderly gentleman said in a look to the young lady, “This is our man.”\n\n“What the devil do _you_ do in that galley there?” said Monsieur Defarge\nto himself; “I don’t know you.”\n\nBut, he feigned not to notice the two strangers, and fell into discourse\nwith the triumvirate of customers who were drinking at the counter.\n\n“How goes it, Jacques?” said one of these three to Monsieur Defarge. “Is\nall the spilt wine swallowed?”\n\n“Every drop, Jacques,” answered Monsieur Defarge.',
  {'source': 'pg98.txt'}),
  {'source': 'pg98.txt'})]
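
Since each entry in `sources` pairs a passage with its metadata, the source files behind an answer can be listed with a few lines of Python.  This is a small sketch rather than a LangChain feature: `unique_sources` is a hypothetical helper, and `example_sources` is placeholder data in the same `(page_content, metadata)` shape as the output above.

```python
from typing import Dict, List, Tuple


def unique_sources(sources: List[Tuple[str, Dict[str, str]]]) -> List[str]:
    """Return the distinct `source` values, in order of first appearance."""
    seen: List[str] = []
    for _content, metadata in sources:
        # Fall back to a label if a document carries no `source` key.
        name = metadata.get("source", "unknown")
        if name not in seen:
            seen.append(name)
    return seen


# Placeholder data shaped like res['sources'].
example_sources = [
    ("first passage text", {"source": "pg98.txt"}),
    ("second passage text", {"source": "pg98.txt"}),
]
cited = unique_sources(example_sources)
print(cited)
```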