In [4]:
import os
from dotenv import load_dotenv

load_dotenv()

True

## Retrieval Augmented Generation (RAG)

In the previous tutorial, we discussed RAG and its first component, indexing. In simple terms, the indexing step creates a vector store (or database) that contains the vector (numerical) representations of text chunks.

In this tutorial, we talk about the second component. It consists of:
* retrieval: given a query (from a user), we need to find the most relevant information (from the vector store),
* generation: given the user's query and the retrieved contexts, the LLM should answer the user.

#### step 1) loading the vector store

In [5]:
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_chroma import Chroma

In [7]:
persist_directory = 'data/chroma_langchain_db'

# the embedding model (it should be the same model that was used to build the index)
embedding = OpenAIEmbeddings(
    model="text-embedding-ada-002"
)

# Load the vector store (it contains the vector representations of all text chunks)
db = Chroma(
    embedding_function=embedding,
    persist_directory=persist_directory,  # local directory where the store was persisted
)

#### step 2) create a retriever

Given a query, a `retriever` extracts the relevant chunks from the vector store. Fortunately, LangChain provides functionality that simplifies this.

In [9]:
retriever = db.as_retriever()

# quick test
res = retriever.invoke("Who are judges?")  # by default, returns the 4 most relevant chunks
res[0].page_content[20:200]

'GMENT\n2\n5.  In September 2013 Judge Ch., sitting in a single-judge formation, \nbegan the examination of the criminal case.\n6.  On 30 September 2015 Judge Ch. held a final hearing. '
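
To make the idea of "most relevant chunks" concrete, here is a minimal sketch of top-k retrieval in plain Python. It is an illustration only: the chunk texts and 3-dimensional vectors are made up, the names `cosine_similarity`, `retrieve`, and `toy_store` are hypothetical, and cosine similarity is an assumption (the metric used by the real vector store may differ).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vector, store, k=4):
    """Return the texts of the k chunks whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Made-up chunks with made-up 3-dimensional "embeddings";
# real embeddings have hundreds or thousands of dimensions.
toy_store = [
    ("Chunk about a judge holding a hearing.", [0.9, 0.1, 0.0]),
    ("Chunk about something unrelated.", [0.1, 0.9, 0.0]),
    ("Chunk about the composition of the court.", [0.8, 0.2, 0.1]),
]
toy_query_vector = [1.0, 0.0, 0.0]  # stands in for the embedded user query

top_chunks = retrieve(toy_query_vector, toy_store, k=2)
print(top_chunks)
```

The unrelated chunk is left out because its vector points in a different direction from the query vector, which is the intuition behind similarity search.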

#### step 3) create a prompt for the LLM

To create a prompt that provides both the user's query and the context (i.e. the relevant chunks), we can use the LangChain Hub.

In [10]:
from langchain import hub  # to download the prompt

In [15]:
prompt = hub.pull("rlm/rag-prompt")
prompt



ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"), additional_kwargs={})])
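
To see what the LLM eventually receives, the following sketch fills the template text from the output above using plain string formatting. It is only an approximation of what the prompt object does: the chunks are made up, and joining them with blank lines is an assumption about how one might format the context.

```python
# Template text copied from the hub prompt shown above.
template = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise.\n"
    "Question: {question} \nContext: {context} \nAnswer:"
)

# Made-up chunks standing in for the retriever's output.
toy_chunks = ["First retrieved chunk.", "Second retrieved chunk."]

# Fill the two input variables: the user's question and the joined context.
filled = template.format(
    question="Who are the judges?",
    context="\n\n".join(toy_chunks),
)
print(filled)
```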

#### step 4) chain

Now we can put all the pieces together (in a chain) to create a pipeline that:
* accepts a query from the user,
* retrieves the relevant information from the vector store,
* and finally sends the query and the context to the LLM to generate an answer.

In [17]:
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

In [28]:
# Load the LLM model
llm = ChatOpenAI(model='gpt-3.5-turbo-0125', temperature=0)

chain = (
    {
        # send the query to the retriever to extract the most relevant chunks (documents)
        'context': itemgetter('query') | retriever,
        'question': itemgetter('query')
    }
    | prompt
    | llm
    | StrOutputParser()
)

In [30]:
res = chain.invoke({'query': "Who are the judges?"})

In [50]:
# mask the case name in the answer before displaying it
s = res[:25] + ' xxxxxxxxxxxxx ' + res[43:]
s

'The judges in the case of xxxxxxxxxxxxx were Lado Chanturia, Mykola Gnatovskyy, and Úna Ní Raifeartaigh. Judge Ch. was also mentioned in the context as having examined the case and delivered a verdict. Judge Ch. was later dismissed by a presidential order.'
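
The data flow of the chain can also be read as ordinary function calls. The sketch below mimics it with stand-ins, so it runs without an API key. All `fake_*` functions and `run_chain` are hypothetical names for illustration, not LangChain APIs, and the fake LLM only returns a placeholder string.

```python
from operator import itemgetter

def fake_retriever(query):
    """Stand-in for `retriever`: returns made-up chunks regardless of the query."""
    return ["Chunk A about the judges.", "Chunk B about the hearing."]

def fake_prompt(inputs):
    """Stand-in for `prompt`: fills the question and the context into one string."""
    context = "\n\n".join(inputs["context"])
    return f"Question: {inputs['question']}\nContext: {context}\nAnswer:"

def fake_llm(prompt_text):
    """Stand-in for `llm`: returns a dict that mimics a message with a text content."""
    return {"content": f"(answer based on a prompt of {len(prompt_text)} characters)"}

def fake_parser(message):
    """Stand-in for `StrOutputParser`: keeps only the text of the message."""
    return message["content"]

def run_chain(inputs):
    get_query = itemgetter("query")
    # Step 1: the dict in the chain builds both prompt variables from the same input.
    prompt_inputs = {
        "context": fake_retriever(get_query(inputs)),
        "question": get_query(inputs),
    }
    # Steps 2-4: each `|` passes the previous output to the next component.
    return fake_parser(fake_llm(fake_prompt(prompt_inputs)))

answer = run_chain({"query": "Who are the judges?"})
print(answer)
```

Reading the chain this way shows why the input is a dict with a `query` key: both `context` and `question` are derived from that single value.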