# RAG from Scratch: Part 12 - Multi-Representation Indexing

Resources: 

- Video: [RAG from Scratch: Part 12](https://www.youtube.com/watch?v=gTCU9I6QqCE&list=PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x&index=12)
- Notebook: [`rag_from_scratch_12_to_14.ipynb`](./notebooks/rag-from-scratch/rag_from_scratch_12_to_14.ipynb)

In [3]:
from dotenv import load_dotenv

In [4]:
load_dotenv(override=True, dotenv_path="../.env")

True

In [5]:
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
docs = loader.load()

loader = WebBaseLoader("https://lilianweng.github.io/posts/2024-02-05-human-data-quality/")
docs.extend(loader.load())

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [None]:
# We have 2 documents, one for each blog post
docs

In [8]:
import uuid

from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# First, summarize each document with an LLM
chain = (
    # Extract the raw text from each Document
    {"doc": lambda x: x.page_content}
    | ChatPromptTemplate.from_template("Summarize the following document:\n\n{doc}")
    | ChatOpenAI(model="gpt-3.5-turbo", max_retries=0)
    | StrOutputParser()
)

# Summarize all documents, running at most 5 requests concurrently
summaries = chain.batch(docs, {"max_concurrency": 5})

In [9]:
summaries

['The document discusses the concept of building autonomous agents powered by large language models (LLMs). It covers key components of such agents, including planning, memory, and tool use. It provides examples of proof-of-concept demos like AutoGPT and GPT-Engineer. Challenges discussed include the finite context length, reliability of natural language interfaces, and planning difficulties. The document also references various research papers and projects related to LLM-powered agents.',
 'The document discusses the importance of high-quality human data for training deep learning models. It covers topics such as the role of human raters in data quality, the wisdom of the crowd, rater agreement, and disagreement, as well as two paradigms for data annotation. It also explores methods for identifying mislabeled data during model training, such as influence functions, tracking prediction changes during training, and noisy cross-validation. The document provides citations and references f

In [12]:
from langchain.storage import InMemoryByteStore
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.retrievers.multi_vector import MultiVectorRetriever

# The vector store that indexes the summaries (the representations we search over)
vectorstore = Chroma(collection_name="summaries",
                     embedding_function=OpenAIEmbeddings())

# The storage layer for the parent documents
store = InMemoryByteStore()
id_key = "doc_id"

# The retriever: we pass both storages and the id_key
retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    byte_store=store,
    id_key=id_key,
)
# Generate unique ids for the documents
doc_ids = [str(uuid.uuid4()) for _ in docs]

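# Wrap each summary in a Document tagged with the id of its parent document
summary_docs = [
    Document(page_content=s, metadata={id_key: doc_ids[i]})
    for i, s in enumerate(summaries)
]

# Index the summaries in the vector store; store the full documents under the same ids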
retriever.vectorstore.add_documents(summary_docs)
retriever.docstore.mset(list(zip(doc_ids, docs)))
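
The two stores populated above cooperate through the shared `doc_id`: the query is matched against the summaries, and the id on the best match is used to fetch the full parent document. The following is a minimal, dependency-free sketch of that mechanism, not LangChain's actual implementation. The bag-of-words `embed` function, the `toy_*` data, and the `retrieve` helper are all hypothetical stand-ins chosen so the example runs without an API key.

```python
import math
import uuid
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (hypothetical stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical placeholders for the full blog posts and their LLM summaries
toy_full_docs = [
    "FULL TEXT of the agents post (long)",
    "FULL TEXT of the data quality post (long)",
]
toy_summaries = [
    "autonomous agents with planning memory and tool use",
    "human data quality and rater agreement for training",
]
toy_ids = [str(uuid.uuid4()) for _ in toy_full_docs]

# "Vector store": embedded summaries, each carrying its parent id as metadata
summary_index = [(embed(s), {"doc_id": i}) for s, i in zip(toy_summaries, toy_ids)]
# "Docstore": parent id -> full document
toy_docstore = dict(zip(toy_ids, toy_full_docs))


def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank summaries by similarity, then return the linked full documents."""
    q = embed(query)
    ranked = sorted(summary_index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [toy_docstore[meta["doc_id"]] for _, meta in ranked[:k]]


result = retrieve("memory in agents")
print(result[0])
```

The search only ever sees the short summaries, while the caller receives the complete document, which is the point of indexing one representation and returning another.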

In [13]:
query = "Memory in agents"
# Searching the vector store directly returns the matching summary, not the full document
sub_docs = vectorstore.similarity_search(query, k=1)
sub_docs[0]

Document(metadata={'doc_id': '778ccedf-b831-4d05-8087-14db7166fefc'}, page_content='The document discusses the concept of building autonomous agents powered by large language models (LLMs). It covers key components of such agents, including planning, memory, and tool use. It provides examples of proof-of-concept demos like AutoGPT and GPT-Engineer. Challenges discussed include the finite context length, reliability of natural language interfaces, and planning difficulties. The document also references various research papers and projects related to LLM-powered agents.')

In [14]:
# `invoke` replaces the deprecated `get_relevant_documents`
retrieved_docs = retriever.invoke(query)
# The retriever returns the entire parent document; only its first 500 characters are shown
retrieved_docs[0].page_content[0:500]

"\n\n\n\n\n\nLLM Powered Autonomous Agents | Lil'Log\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nLil'Log\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n|\n\n\n\n\n\n\nPosts\n\n\n\n\nArchive\n\n\n\n\nSearch\n\n\n\n\nTags\n\n\n\n\nFAQ\n\n\n\n\nemojisearch.app\n\n\n\n\n\n\n\n\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\n \n\n\nTable of Contents\n\n\n\nAgent System Overview\n\nComponent One: Planning\n\nTask Decomposition\n\nSelf-Reflection\n\n\nComponent Two: Memory\n\nTypes of Memory\n\nMaximum Inner Product Search (MIPS"