# RAG From Scratch: Overview

These notebooks walk through the process of building RAG app(s) from scratch.

They build towards a broader understanding of the RAG landscape, as shown here:

<img src="../../data/images/rag.png"  />

### Environment


In [1]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

True

In [2]:
hugging_face_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")
langchain_token = os.getenv("LANGCHAIN_API_KEY")
pinecone_api_key = os.getenv("PINECONE_API_KEY")
pinecone_env = os.getenv("PINECONE_ENV")

## Part 1: Overview

In [28]:
import bs4
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_huggingface import HuggingFaceEndpoint
from tqdm import tqdm
import tiktoken
import numpy as np

### Indexing

Load documents using a web scraper

In [6]:
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

Split the text

In [7]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
print(len(splits))

66
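
The effect of `chunk_size` and `chunk_overlap` can be sketched with plain character windows. This is a simplified illustration, not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and sentences); the helper name `split_with_overlap` and the sample text are assumptions made for the example.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last four characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.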


Embed the chunks

In [8]:
vectorstore = Chroma.from_documents(documents=splits, 
                                    embedding=HuggingFaceEmbeddings())

retriever = vectorstore.as_retriever()



Retrieval and generation: we use a prompt pulled from the LangChain Hub, a registry that stores shared prompts in much the same way Docker Hub stores Docker images.

In [20]:
# Prompt from Langchain hub
prompt = hub.pull("rlm/rag-prompt")

Get the model

In [12]:
repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
llm = HuggingFaceEndpoint(repo_id=repo_id,
                          huggingfacehub_api_token=hugging_face_token,
                          temperature=0.1)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to C:\Users\Hori\.cache\huggingface\token
Login successful


Post-processing: we join the page content of the retrieved documents into a single string, with a blank line between chunks

In [13]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
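
A quick check of what `format_docs` returns, as a minimal sketch. `FakeDoc` is a hypothetical stand-in for a LangChain `Document` that models only the `page_content` attribute, so the example runs without any library installed.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a LangChain Document; only page_content is modelled."""
    page_content: str


def format_docs(docs):
    # Same body as the notebook's helper: join chunk texts with blank lines.
    return "\n\n".join(doc.page_content for doc in docs)


context = format_docs([FakeDoc("first chunk"), FakeDoc("second chunk")])
print(repr(context))
```

The result is one string rather than a list, which is the form the prompt's `{context}` placeholder expects.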

RAG Chain

Here the context is filled in by the retriever: it holds the text of the chunks whose embeddings are most similar to the embedding of the question

In [18]:
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
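
The data flow through the chain can be sketched with plain functions. This is an illustration of the idea only, not LangChain's implementation: `fake_retriever`, `join_chunks`, `fill_prompt` and `fake_llm` are hypothetical stand-ins, and the two-sentence corpus is made up for the example.

```python
def fake_retriever(question: str) -> list[str]:
    """Hypothetical retriever: returns chunk texts that share a word with the question."""
    corpus = [
        "Task decomposition breaks a complex task into smaller steps.",
        "Chroma stores embedding vectors.",
    ]
    words = {w.strip("?.").lower() for w in question.split()}
    return [c for c in corpus if words & {w.strip("?.").lower() for w in c.split()}]


def join_chunks(chunks: list[str]) -> str:
    return "\n\n".join(chunks)


def build_inputs(question: str) -> dict:
    # Mirrors {"context": retriever | format_docs, "question": RunnablePassthrough()}:
    # both entries receive the same question as input.
    return {
        "context": join_chunks(fake_retriever(question)),
        "question": question,  # passthrough: forwarded unchanged
    }


def fill_prompt(inputs: dict) -> str:
    return "Context: {context}\nQuestion: {question}\nAnswer:".format(**inputs)


def fake_llm(prompt_text: str) -> str:
    """Hypothetical model: reports the prompt length instead of generating text."""
    return f"(model saw {len(prompt_text)} characters)"


inputs = build_inputs("What is Task Decomposition")
answer = fake_llm(fill_prompt(inputs))
print(inputs["context"])
print(answer)
```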

Question

In [19]:
rag_chain.invoke("What is Task Decomposition")

' Task decomposition is a process in which a complex problem is broken down into smaller, manageable tasks. This can be done through LLM with simple prompting, task-specific instructions, or human inputs. The Tree of Thoughts model, for instance, decomposes a problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. This allows for a more thorough exploration of potential solutions.'

## Part 2: Indexing

<img src="../../data/images/indexing.png"  />

Documents

In [22]:
question = "What kinds of pets do I like?"
document = "My favorite pet is a cat."

Count the tokens

In [30]:
def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

print(num_tokens_from_string(question, "cl100k_base"))
print(num_tokens_from_string(document, "cl100k_base"))

8
7


Create embeddings

In [32]:
embd = HuggingFaceEmbeddings()
query_result = embd.embed_query(question)
document_result = embd.embed_query(document)

print(len(query_result))
print(len(document_result))

768
768


Define the cosine similarity function

In [33]:
def cosine_similarity(vec1, vec2):
    dot_product = np.dot(vec1, vec2)
    norm_vec1 = np.linalg.norm(vec1)
    norm_vec2 = np.linalg.norm(vec2)
    return dot_product / (norm_vec1 * norm_vec2)
    
similarity = cosine_similarity(query_result, document_result)
print(f"Cosine Similarity: {similarity}")

Cosine Similarity: 0.5595268729498503
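
Written out, the function above computes the cosine of the angle between the two embedding vectors. The small vectors in the worked line are chosen here for illustration only and are not taken from the notebook:

$$
\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
$$

$$
a = (1, 2, 2),\quad b = (2, 1, 2):\qquad a \cdot b = 2 + 2 + 4 = 8,\quad \lVert a \rVert = \lVert b \rVert = 3,\quad \cos\theta = \tfrac{8}{9} \approx 0.889
$$

A value of 1 means the vectors point in the same direction, 0 means they are orthogonal, and -1 means they point in opposite directions.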


Document Loaders

In [34]:
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()

Split the docs

In [36]:
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300, 
    chunk_overlap=50)

# Make splits
splits = text_splitter.split_documents(blog_docs)

Vector Stores

In [38]:
vectorstore = Chroma.from_documents(documents=splits, 
                                    embedding=HuggingFaceEmbeddings())

retriever = vectorstore.as_retriever()

## Part 3: Retrieval

Perform a document retrieval

In [43]:
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

In [44]:
docs = retriever.invoke("What is Task Decomposition")

In [45]:
len(docs)

3
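
What `k` controls can be sketched as a top-k search over embedding vectors. This is a toy illustration that assumes cosine similarity as the ranking metric; the vector store's configured distance function may differ, and the two-dimensional vectors and the `top_k` helper are made up for the example.

```python
import math


def cosine_similarity(vec1, vec2):
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))
    return dot / (norm1 * norm2)


def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k vectors most similar to query_vec, best first."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-dimensional "embeddings"; the real ones in this notebook have 768 dimensions.
doc_vecs = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1), (0.6, 0.8), (-1.0, 0.0)]
query_vec = (1.0, 0.2)

print(top_k(query_vec, doc_vecs, k=3))  # -> [2, 0, 3]
```

Raising `k` returns more chunks and gives the model more context, at the cost of a longer prompt and a higher chance of including loosely related text.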

## Part 4: Generation

<img src="../../data/images/generation.png"  />

Define the prompt

In [61]:
from langchain.prompts import ChatPromptTemplate

# Prompt
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
prompt

ChatPromptTemplate(input_variables=['context', 'question'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template='Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n'))])

Define the chain

In [62]:
chain = prompt | llm

Run the chain

In [63]:
chain.invoke({"context":docs,"question":"What is Task Decomposition?"})

'\nAnswer: Task Decomposition is a process where a complex task is broken down into smaller, manageable sub-tasks. This can be done using a Large Language Model (LLM) with simple prompting, task-specific instructions, or human inputs. The goal is to create a tree structure of thoughts for each step of the problem, allowing for multiple reasoning possibilities. The search process can be either Breadth-First Search (BFS) or Depth-First Search (DFS), with each state evaluated by a classifier or majority vote.'

RAG chain using the predefined prompt from the hub

In [64]:
prompt_hub_rag = hub.pull("rlm/rag-prompt")
prompt_hub_rag

ChatPromptTemplate(input_variables=['context', 'question'], metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"))])

In [65]:
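rag_chain = (
    # Format the retrieved documents into one string before filling the prompt
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt_hub_rag  # use the hub prompt pulled above, not the custom template
    | llm
    | StrOutputParser()
)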

rag_chain.invoke("What is Task Decomposition?")

'\nAnswer: Task Decomposition is a process where a problem is broken down into smaller, manageable tasks. This can be done using a Large Language Model (LLM) with simple prompting, task-specific instructions, or human inputs. The tasks are then parsed and planned by the LLM, which identifies the type, ID, dependencies, and arguments of each task. This process helps in understanding the problem better and facilitates efficient problem-solving.'