# RAG (Retrieval Augmented Generation)
- RAG is a technique that retrieves data from private or real-time sources and gives it to an LLM as context, expanding what the model can answer beyond its training data.
![image](./images/Architecture.png)

## Retrieval

![image](./images/data_connection-95ff2033a8faa5f3ba41376c0f6dd32a.jpg)

1. Load the data from the source file with a document loader (UnstructuredFileLoader).
2. Transform the data by splitting it into chunks, because it is easier for the LLM to search many small documents than one big document.
3. Embed the chunks. An embedding is a vector representation of the meaning behind a piece of text (OpenAIEmbeddings).
4. Cache the embeddings with CacheBackedEmbeddings, because calls to the embedding API are not free.
5. Store the vectors (the numbers) in a vector store.
6. Search the vector store for the chunks most relevant to a query.
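To make the steps above concrete without an API key, here is a minimal, self-contained toy in plain Python. It is not how LangChain implements them: the helper names (split_text, embed, embed_cached, build_store, search) are hypothetical, a word-count vector stands in for OpenAIEmbeddings, a dict stands in for the cache, and a list of (chunk, vector) pairs stands in for the vector store.

```python
import math
import re
from collections import Counter

# Hypothetical toy helpers, not LangChain APIs.

def split_text(text, chunk_size):
    """Transform: split the text into chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def embed(text):
    """Embed: a sparse word-count vector stands in for a real embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

cache = {}  # Cache: text -> vector, so identical text is never embedded twice

def embed_cached(text):
    if text not in cache:
        cache[text] = embed(text)
    return cache[text]

def build_store(text, chunk_size):
    """Transform, embed (through the cache), and store (chunk, vector) pairs."""
    return [(chunk, embed_cached(chunk)) for chunk in split_text(text, chunk_size)]

def search(store, query, k=1):
    """Search: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Load: in the notebook this text would come from UnstructuredFileLoader.
text = ("Mr. Jones of the Manor Farm locked the hen-houses for the night. "
        "Old Major, the prize boar, had a strange dream to share.")
store = build_store(text, chunk_size=12)
print(search(store, "Who had a dream?"))
```

Real embeddings capture meaning rather than shared words, but the flow (split, embed, cache, store, search) is the same.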

In [5]:
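import nltk
import ssl

# Work around SSL certificate errors (common with Python on macOS) so NLTK can
# download its data: fall back to an unverified HTTPS context if one exists.
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

# Opens the NLTK downloader; the unstructured package behind UnstructuredFileLoader uses NLTK data
nltk.download()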



showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml


True

- Document Loaders
    - A loader is a piece of code that extracts data from a source and brings it into LangChain.
    - https://python.langchain.com/docs/integrations/document_loaders/unstructured_file

In [10]:
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
# from langchain.document_loaders import PyPDFLoader
from langchain.document_loaders import UnstructuredFileLoader

# loader = TextLoader("./files/chapter_one.txt")
loader = UnstructuredFileLoader("./files/chapter_one.pdf")

# loader.load()
len(loader.load())

1

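Now we split the document. loader.load() returns a list, and the whole chapter is a single document in that list. Splitting it into smaller chunks makes it more efficient to store, embed, and pass to the language model, because only the relevant chunks need to be sent.

RecursiveCharacterTextSplitter tries a list of separators in order (paragraph breaks, then line breaks, then spaces), so chunks tend to end at paragraph or sentence boundaries and keep the semantic meaning of the sentences. The cell below uses the simpler CharacterTextSplitter, which splits only on the given separator ("\n").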
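The recursive idea can be sketched in plain Python. recursive_split is a hypothetical, simplified function, not LangChain's implementation: it measures length in characters (not tokens) and leaves out chunk_overlap, but it follows the same strategy of trying the coarsest separator first and falling back to finer ones only for pieces that are still too long.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified sketch of recursive splitting (no chunk overlap).

    Assumes chunk_size >= 1. Pieces are packed into chunks of at most
    chunk_size characters; a piece that is still too long is split again
    with the next, finer separator.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, *finer = separators
    pieces = list(text) if sep == "" else text.split(sep)
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into this chunk
            continue
        if current:
            chunks.append(current)  # close the current chunk
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece too big on its own: recurse with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


text = "First paragraph here.\n\nSecond paragraph is a bit longer than the first one."
for chunk in recursive_split(text, chunk_size=30):
    print(repr(chunk))
```

The paragraph break stays a chunk boundary, and only the long second paragraph is cut further, at a space rather than mid-word.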

## Tiktoken
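- A token is not necessarily a single letter: it can be a whole word, part of a word, or a punctuation mark.
- CharacterTextSplitter.from_tiktoken_encoder measures chunk_size and chunk_overlap in tokens (counted with OpenAI's tiktoken library) instead of characters.
- To see the difference between tokens and characters, try the OpenAI tokenizer: https://platform.openai.com/tokenizer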

In [11]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.text_splitter import CharacterTextSplitter

splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300,
    chunk_overlap=100,
    separator="\n",
)

loader = UnstructuredFileLoader("./files/chapter_one.docx")

# Load the file and split it into chunks of about 300 tokens, overlapping by 100
docs = loader.load_and_split(text_splitter=splitter)

len(docs)


25

## Embedding

#### Vectors
The cell below is just an example of how each piece of text in the list is converted to a vector (a list of numbers).

In [12]:
from langchain.embeddings import OpenAIEmbeddings

embedder = OpenAIEmbeddings()

# embed_query returns the vector (all of its dimensions) for the text "Hi"
hi = embedder.embed_query("Hi")
len(hi)  # 1536 dimensions with the default model (text-embedding-ada-002)

# embed_documents returns one vector per input text, whatever the text's length
vector = embedder.embed_documents(
    [
        "hi",
        "how",
        "are",
        "you longer sentences",
    ]
)
print(len(vector), len(vector[0]))

4 1536
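Once text is turned into vectors, "similar meaning" becomes "vectors pointing in a similar direction". A common way to measure this is cosine similarity. Below is a minimal stdlib sketch; in the notebook you could pass it two embeddings such as hi and vector[0], but the example uses tiny hand-made vectors so it runs without an API key.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # 0.0 (orthogonal)
```

Cosine similarity ignores vector length and compares only direction, which is why a one-word text and a long sentence can still be compared fairly.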


A vector store is a kind of database for vectors. The workflow is:
1. Create the vectors.
2. Cache those vectors.
3. Put those vectors inside the vector store.
4. Search the vector store to find relevant documents.
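Leaving caching aside, the other steps can be sketched as a tiny in-memory vector store. ToyVectorStore and keyword_embed are hypothetical stand-ins; FAISS does the same job with efficient nearest-neighbor indexes. The method names add_texts and similarity_search mirror LangChain's vector store interface, but this class is not LangChain code.

```python
import math

class ToyVectorStore:
    """Hypothetical in-memory vector store: keeps (text, vector) pairs and
    ranks them by cosine similarity to the query vector."""

    def __init__(self, embed):
        self.embed = embed  # any function: str -> list of floats
        self.items = []

    def add_texts(self, texts):
        # Create a vector for each text and put it in the store.
        for text in texts:
            self.items.append((text, self.embed(text)))

    def similarity_search(self, query, k=4):
        # Embed the query and return the k most similar texts.
        q = self.embed(query)

        def score(item):
            v = item[1]
            dot = sum(x * y for x, y in zip(q, v))
            norm = math.sqrt(sum(x * x for x in q)) * math.sqrt(sum(y * y for y in v))
            return dot / norm if norm else 0.0

        ranked = sorted(self.items, key=score, reverse=True)
        return [text for text, _ in ranked[:k]]


# Hypothetical "embedding": count three keywords in the text.
def keyword_embed(text):
    return [float(text.count(word)) for word in ("pig", "farm", "song")]

store = ToyVectorStore(keyword_embed)
store.add_texts(["the pigs sang a song", "the farm has a barn", "a song about the farm"])
print(store.similarity_search("farm song", k=2))
```

A linear scan like this is fine for a few dozen chunks; dedicated vector stores exist because it gets slow for millions.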

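Without a cache, re-running the entire cell would call the embedding API again for the same text and cost more money. So we cache the embeddings: CacheBackedEmbeddings saves each vector to a LocalFileStore (./.cache/) and reuses it on later runs.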
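The caching idea can be sketched in a few lines. ToyCachedEmbedder and fake_embed are hypothetical; this is not LangChain's CacheBackedEmbeddings code, only the same idea: key each text by a hash, call the paid embedding function only for texts missing from the store, and read everything else back from it. In the notebook, LocalFileStore("./.cache/") plays the role of the dict and keeps the vectors on disk between runs.

```python
import hashlib

class ToyCachedEmbedder:
    """Hypothetical sketch of cache-backed embeddings (not LangChain code)."""

    def __init__(self, embed, store, namespace=""):
        self.embed = embed          # function: list[str] -> list of vectors (the paid call)
        self.store = store          # dict-like: key -> vector
        self.namespace = namespace  # keeps caches for different models apart

    def _key(self, text):
        return self.namespace + hashlib.sha1(text.encode("utf-8")).hexdigest()

    def embed_documents(self, texts):
        # Only texts whose hash is not cached yet are sent to the embedding function.
        missing = [t for t in dict.fromkeys(texts) if self._key(t) not in self.store]
        if missing:
            for text, vector in zip(missing, self.embed(missing)):
                self.store[self._key(text)] = vector
        return [self.store[self._key(t)] for t in texts]


calls = []
def fake_embed(texts):
    calls.append(list(texts))
    return [[float(len(t))] for t in texts]  # stand-in vectors

cache = {}
embedder = ToyCachedEmbedder(fake_embed, cache, namespace="toy-model:")
embedder.embed_documents(["hi", "how are you"])
embedder.embed_documents(["hi", "how are you", "new text"])
print(calls)  # [['hi', 'how are you'], ['new text']]
```

The second call pays only for "new text"; everything else comes from the cache.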

In [23]:
from langchain.embeddings import CacheBackedEmbeddings
from langchain.vectorstores import FAISS
from langchain.storage import LocalFileStore
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI()

cache_dir = LocalFileStore("./.cache/")

splitter = CharacterTextSplitter.from_tiktoken_encoder(
    separator="\n",
    chunk_size=300,
    chunk_overlap=100,
)

# Load 
loader = UnstructuredFileLoader("./files/chapter_one.docx")

# Transform 
docs = loader.load_and_split(text_splitter=splitter)

# Embed 
embeddings = OpenAIEmbeddings()

# When we embed the file, we first check whether those embeddings already exist in the cache.
cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
    embeddings, cache_dir
)

# Build a FAISS vector store; only chunks missing from the cache are sent to OpenAI
vectorstore = FAISS.from_documents(docs, cached_embeddings)

# "refine" answers from the first retrieved chunk, then revises that answer with each following chunk
chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="refine",
    retriever=vectorstore.as_retriever(),
)

Now we can search the vector space for chunks similar to a query.

In [22]:
results = vectorstore.similarity_search("where does Mr.Jones live")

In [26]:
chain.run("How old is Major?")

"The new context does not provide any information about Major's age. Therefore, the original answer of Major's age being twelve years old remains the same."

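The answer above shows how the "refine" chain type works: it answers from the first retrieved chunk and then asks the LLM to revise that answer for each following chunk, which is why the final message talks about "the new context" and "the original answer". Below is a hypothetical sketch with a stub LLM (a plain Python function, not ChatOpenAI) and simplified prompts (not LangChain's actual refine prompts).

```python
def refine_chain(question, docs, llm):
    """Hypothetical sketch of the refine strategy: one LLM call per document,
    each call improving the answer produced so far."""
    answer = llm(f"Context: {docs[0]}\nQuestion: {question}\nAnswer:")
    for doc in docs[1:]:
        answer = llm(
            f"Original answer: {answer}\n"
            f"New context: {doc}\n"
            "Refine the original answer only if the new context helps.\n"
            f"Question: {question}"
        )
    return answer

# Stub LLM that records each prompt and returns a placeholder answer.
prompts = []
def fake_llm(prompt):
    prompts.append(prompt)
    return f"answer #{len(prompts)}"

print(refine_chain("How old is Major?", ["chunk 1", "chunk 2", "chunk 3"], fake_llm))
```

Each call sees only one chunk plus the running answer, so the prompt stays small, but the calls must run one after another.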
### Implementing Stuff LCEL Chain

In [27]:
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnablePassthrough

llm = ChatOpenAI(
    temperature=0.1,
)

cache_dir = LocalFileStore("./.cache/")

splitter = CharacterTextSplitter.from_tiktoken_encoder(
    separator="\n",
    chunk_size=300,
    chunk_overlap=100,
)

# Load 
loader = UnstructuredFileLoader("./files/chapter_one.txt")

# Transform 
docs = loader.load_and_split(text_splitter=splitter)

# Embed 
embeddings = OpenAIEmbeddings()

# When we embed the file, we first check whether those embeddings already exist in the cache.
cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
    embeddings, cache_dir
)

# Build a FAISS vector store; only chunks missing from the cache are sent to OpenAI
vectorstore = FAISS.from_documents(docs, cached_embeddings)

retriever = vectorstore.as_retriever()

prompt = ChatPromptTemplate.from_messages([
    ("system", '''You are a helpful assistant. Answer questions using only
     the following context. If you don't know the answer just say you don't know,
     don't make it up:\n\n{context}'''),
    ("human", "{question}")
])

# The retriever is a very important component of the chain.
# LangChain takes the query and invokes the retriever with it, e.g. "Describe the song that old Major sang."
# The retriever returns a list of relevant Documents, which fills {context}.
# RunnablePassthrough passes the original input through unchanged, so it fills {question}.
# Hence, both the list of Documents and the question are passed on to the prompt.
chain = {"context": retriever, "question": RunnablePassthrough()} | prompt | llm

chain.invoke("Describe the song that old Major sang.")

Created a chunk of size 322, which is longer than the specified 300
Created a chunk of size 302, which is longer than the specified 300
Created a chunk of size 308, which is longer than the specified 300


AIMessage(content='Old Major sang a song called "Beasts of England." The tune was described as a stirring one, resembling a combination of "Clementine" and "La Cucaracha." The lyrics of the song spoke of a golden future time when animals would be free from the tyranny of Man. It expressed the hope that one day, animals would overthrow humans and live in a world where they would have abundance and freedom.')
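Without LangChain, the "stuff" strategy boils down to: retrieve, join all chunks into one context string, fill the prompt, and call the LLM once. The sketch below is hypothetical, with a stub retriever and LLM written as plain functions. One difference worth noting: in the LCEL chain above, {context} receives the list of Document objects, which the prompt template turns into its string form; joining the text of each chunk, as here, gives the model cleaner input.

```python
def stuff_chain(question, retrieve, llm):
    """Hypothetical sketch of the "stuff" strategy: all retrieved chunks go
    into one prompt, and the LLM is called exactly once."""
    chunks = retrieve(question)
    context = "\n\n".join(chunks)
    prompt = (
        "You are a helpful assistant. Answer questions using only the "
        "following context. If you don't know the answer just say you "
        f"don't know, don't make it up:\n\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)

# Stubs: a fixed "retriever" and an "LLM" that just echoes its prompt.
def fake_retrieve(question):
    return ["Old Major sang a song.", "The song was called Beasts of England."]

calls = []
def echo_llm(prompt):
    calls.append(prompt)
    return prompt

print(stuff_chain("Describe the song that old Major sang.", fake_retrieve, echo_llm))
```

"Stuff" is the cheapest strategy (one call), but it only works while all retrieved chunks fit in the model's context window.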

### Implementing MapReduce Chain using LCEL
This is a simplified version to understand the MapReduce chain.
- Start from the list of documents returned by the retriever.
- Map: for every document in the list, create a prompt and give it to the LLM. The prompt says: "Read this document and see if it has relevant information to answer the user's question."
- Reduce: take every response from the LLM, combine them all into one new document, and give that final document to a prompt and the LLM to answer the initial question.

To recap, if we ask a question ("Describe the song that old Major sang."), it is passed to the retriever, which returns a list of documents relevant to the question. For every document in that list, we create a prompt and give it to the LLM. For example, if there are 5 documents, we ask the LLM 5 times and receive 5 responses. We gather those responses and put them all together in one long document. Finally, we put that document into a final prompt and ask the LLM again. The prompt basically says: "These are all the relevant pieces of information for the query; use them to answer it."

The MapReduce chain works better than the stuff chain when the retriever returns many documents (for example, thousands), more than fit into a single prompt. The trade-off is one LLM call per document plus one final call.
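The map and reduce steps can also be traced without LangChain. The sketch below is hypothetical: retrieve and llm are plain Python functions standing in for the retriever and ChatOpenAI, and the prompts are shortened versions of the ones in the next cell. It makes the cost visible: n retrieved documents mean n map calls plus one reduce call.

```python
def map_reduce_chain(question, retrieve, llm):
    """Hypothetical sketch of map-reduce over retrieved documents."""
    docs = retrieve(question)
    # Map: one LLM call per document, keeping only text relevant to the question.
    extracts = [
        llm(f"Return any text relevant to the question verbatim.\n{doc}\nQuestion: {question}")
        for doc in docs
    ]
    # Reduce: join the extracts into one document and answer from it.
    final_doc = "\n\n".join(extracts)
    return llm(f"Given these extracted parts, answer the question.\n{final_doc}\nQuestion: {question}")

# Stubs: an "LLM" that counts its calls, and a "retriever" that returns 5 documents.
calls = []
def counting_llm(prompt):
    calls.append(prompt)
    return f"response {len(calls)}"

def five_docs(question):
    return [f"document {i}" for i in range(1, 6)]

print(map_reduce_chain("Name a song that Old Major sang.", five_docs, counting_llm))
print(len(calls))
```

The map calls are independent of each other, so in practice they can run in parallel; only the reduce call has to wait for all of them.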

In [32]:
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnablePassthrough, RunnableLambda

llm = ChatOpenAI(
    temperature=0.1,
)

cache_dir = LocalFileStore("./.cache/")

splitter = CharacterTextSplitter.from_tiktoken_encoder(
    separator="\n",
    chunk_size=300,
    chunk_overlap=100,
)

loader = UnstructuredFileLoader("./files/chapter_one.txt")

docs = loader.load_and_split(text_splitter=splitter)

embeddings = OpenAIEmbeddings()

cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
    embeddings, cache_dir
)

vectorstore = FAISS.from_documents(docs, cached_embeddings)

retriever = vectorstore.as_retriever()

# list of docs

# for doc in list of docs | prompt | llm

# for response in list of llm responses | put them all together


map_doc_prompt = ChatPromptTemplate.from_messages(
    [
        (
          "system",
          """
          Use the following portion of a long document to see if any of the
          text is relevant to answer the question. Return any relevant text
          verbatim.
          -------
          {context}
          """ , 
        ),
        ("human", "{question}"),
    ]
)

map_doc_chain = map_doc_prompt | llm

def map_docs(inputs):
    documents = inputs["documents"]
    question = inputs["question"]
    # For each retrieved document, run map_doc_chain and keep the response text,
    # then join all responses into one long string (the "final doc").
    return "\n\n".join(
        map_doc_chain.invoke(
            {"context": doc.page_content, "question": question}
        ).content
        for doc in documents
    )



# RunnableLambda lets you call any Python function inside a chain.
# The retriever takes a string as input and returns a list of Documents.
map_chain = {"documents": retriever, "question": RunnablePassthrough()} | RunnableLambda(map_docs)

# final doc | prompt | llm
final_prompt = ChatPromptTemplate.from_messages(
    [
    ("system",
     """Given the following extracted parts of a long document and a
     question, create a final answer.
     If you don't know the answer, just say that you don't know. Don't try
     to make up an answer.
     ------
     {context}
     """,
     ),
    ("human", "{question}"),
    ]
)


# map_chain processes each document in the list to see if it contains relevant info to answer the question;
# its combined output fills {context} in the final prompt.


chain = {"context": map_chain, "question": RunnablePassthrough()} | final_prompt | llm

chain.invoke("Name a song that Old Major sang.")

Created a chunk of size 322, which is longer than the specified 300
Created a chunk of size 302, which is longer than the specified 300
Created a chunk of size 308, which is longer than the specified 300


AIMessage(content='The song that Old Major sang is called "Beasts of England."')