### Retrieval

- Retrieval is the centerpiece of our retrieval-augmented generation (RAG) flow.

- In the last lesson, we covered the basics of semantic search and saw that it works well for many use cases.

- But we also saw some edge cases where things can go wrong.

- In this lecture, we dive deeper into retrieval and cover a few more advanced methods for overcoming those edge cases.

![image.png](attachment:image.png)

#### 1. Maximum marginal relevance (MMR)


![image.png](attachment:image.png)

- Query the vector store
- Choose the `fetch_k` most similar responses
- Within those responses, choose the `k` most diverse

![image-2.png](attachment:image-2.png)

- The idea behind MMR is that we send a query in and initially get back a set of responses, with `fetch_k` being a parameter that controls how many responses we get.
- This first step is based solely on semantic similarity.
- From there, we work with that smaller set of documents and optimize not only for relevance, based on semantic similarity, but also for diversity.
- From that set of documents, we choose a final `k` to return to the user.
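
The two-step selection described above can be sketched in plain Python. This is a minimal illustration, not LangChain's implementation: the toy 2-D vectors stand in for real embeddings, and the `lambda_mult` weight that trades relevance against diversity is an assumption of this sketch rather than something covered in the notes above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=2, fetch_k=3, lambda_mult=0.5):
    """Return the indices of k documents chosen by a greedy MMR pass."""
    # Step 1: keep the fetch_k documents most similar to the query.
    by_relevance = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_relevance[:fetch_k]

    # Step 2: repeatedly pick the candidate that is relevant to the query
    # but not too similar to anything already selected.
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy vectors (hypothetical): documents 0 and 1 are near-duplicates.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.6, -0.8], [-1.0, 0.0]]

print(mmr_select(query, docs, k=2, fetch_k=3))
print(mmr_select(query, docs, k=2, fetch_k=3, lambda_mult=1.0))
```

With `lambda_mult=1.0` the redundancy term vanishes and the selection reduces to plain similarity search, which is a convenient way to compare the two behaviours on the same data.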

#### 2. LLM Aided Retrieval

- There are several situations where the `Query` applied to the DB is more than just the `Question` asked.
- One is `SelfQuery`, where we use an LLM to convert the user question into a query.

![image.png](attachment:image.png)

- This is useful when we get questions that aren't solely about the content that we want to look up semantically, but also include some metadata that we want to do a filter on.

- The example above has two components.
- It has a semantic part, the aliens bit, so we want to look up aliens in our database of movies.
- It also has a piece that refers to the metadata about each movie: the year should be 1980.
- We can use a language model to split the original question into two separate things, a filter and a search term.
- Most vector stores support a metadata filter, so we can easily filter records based on metadata, like the year being 1980.
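
To make the filter-plus-search-term split concrete, here is a small sketch. Everything in it is hypothetical: `StructuredQuery`, `apply_filter`, and the movie records are illustrative names, the structured query is written by hand where an LLM would normally produce it, and a keyword match stands in for the vector search.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredQuery:
    """What the LLM is expected to produce from the user question."""
    search_term: str
    metadata_filter: dict = field(default_factory=dict)


def apply_filter(records, metadata_filter):
    """Keep only records whose metadata matches every key in the filter."""
    return [
        r for r in records
        if all(r["metadata"].get(key) == value
               for key, value in metadata_filter.items())
    ]


# Hypothetical movie database.
movies = [
    {"text": "Aliens invade a small town", "metadata": {"year": 1980}},
    {"text": "A romantic comedy in Paris", "metadata": {"year": 1980}},
    {"text": "Aliens return for a sequel", "metadata": {"year": 1986}},
]

# Hand-written stand-in for the LLM output on a question about
# alien movies made in 1980.
structured = StructuredQuery(search_term="aliens", metadata_filter={"year": 1980})

# The metadata filter narrows the candidates first...
candidates = apply_filter(movies, structured.metadata_filter)

# ...then the search term is matched semantically. A keyword check is
# used here only as a stand-in for the vector search.
hits = [m for m in candidates if structured.search_term in m["text"].lower()]
print([m["text"] for m in hits])
```

Keeping the filter as a plain dictionary mirrors how metadata filters are usually passed to a vector store, so the same structure can be handed to the store directly.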


#### 3. Compression

- Increase the number of results we can put in the context by shrinking the responses to only the relevant information.

![image.png](attachment:image.png)

- This is useful for pulling out only the most relevant bits of the retrieved passages. For example, when asking a question, we get back the whole document that was stored, even if only the first one or two sentences are relevant.
- With compression, we run all those documents through a language model, extract the most relevant segments, and pass only those segments into the final language model call.
- This comes at the cost of making more calls to the language model, but it focuses the final answer on only the most important things.
- So it is a tradeoff between cost and focus.
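
The shape of the compression step can be sketched without an LLM. In this illustration the extractor is a crude keyword matcher, used purely as a stand-in for the language model call; `extract_relevant`, `compress`, and the sample passages are hypothetical names and data.

```python
import re


def extract_relevant(text, question):
    """Stand-in for an LLM extractor: keep sentences that share a keyword
    (any word longer than three letters) with the question."""
    keywords = {w for w in re.findall(r"[a-z]+", question.lower()) if len(w) > 3}
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [
        s for s in sentences
        if keywords & set(re.findall(r"[a-z]+", s.lower()))
    ]
    return " ".join(kept)


def compress(docs, question):
    """Shrink each retrieved passage and drop passages with nothing relevant."""
    compressed = []
    for doc in docs:
        excerpt = extract_relevant(doc, question)
        if excerpt:
            compressed.append(excerpt)
    return compressed


# Hypothetical retrieved passages.
retrieved = [
    "The homeworks use MATLAB or Octave. The class meets on Mondays. "
    "Office hours are posted online.",
    "The midterm is in week six. Bring a pencil.",
]

for excerpt in compress(retrieved, "what did they say about matlab?"):
    print(excerpt)
```

In a real pipeline each call to the extractor is one extra language model request per retrieved passage, which is where the additional cost mentioned above comes from.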

#### Vectorstore retrieval

In [1]:
import os
import openai
import sys

# sys.path.append('../..')

# from dotenv import load_dotenv, find_dotenv
# _ = load_dotenv(find_dotenv()) # read local .env file

# openai.api_key  = os.environ['OPENAI_API_KEY']

In [18]:
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # read the key from the environment; never hard-code secrets

#### Similarity Search

In [3]:
from langchain.document_loaders import PyPDFLoader

# Load PDF
loaders = [
    # Duplicate documents on purpose - messy data
    PyPDFLoader("docs/cs229_lectures/MachineLearning-Lecture01.pdf"),
    PyPDFLoader("docs/cs229_lectures/MachineLearning-Lecture01.pdf"),
    PyPDFLoader("docs/cs229_lectures/MachineLearning-Lecture02.pdf"),
    PyPDFLoader("docs/cs229_lectures/MachineLearning-Lecture03.pdf")
]

In [4]:
docs = []
for loader in loaders:
    docs.extend(loader.load())

In [5]:
# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 150
)

In [6]:
splits = text_splitter.split_documents(docs)

In [7]:
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

In [20]:
db = FAISS.from_documents(splits, embedding_model)


In [9]:
db.save_local("docs/faiss_index")

In [10]:
texts = [
    """The Amanita phalloides has a large and imposing epigeous (aboveground) fruiting body (basidiocarp).""",
    """A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.""",
    """A. phalloides, a.k.a Death Cap, is one of the most poisonous of all known mushrooms.""",
]

In [11]:
smalldb = FAISS.from_texts(texts, embedding=embedding_model)

In [11]:
question = "Tell me about all-white mushrooms with large fruiting bodies"

In [12]:
smalldb.similarity_search(question, k=2)

[Document(page_content='A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.', metadata={}),
 Document(page_content='The Amanita phalloides has a large and imposing epigeous (aboveground) fruiting body (basidiocarp).', metadata={})]

In [13]:
smalldb.max_marginal_relevance_search(question, k=2, fetch_k=3)

[Document(page_content='A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.', metadata={}),
 Document(page_content='A. phalloides, a.k.a Death Cap, is one of the most poisonous of all known mushrooms.', metadata={})]

### Addressing Diversity: Maximum marginal relevance

Last class we introduced one problem: how to enforce diversity in the search results.
 
`Maximum marginal relevance` strives to achieve both relevance to the query *and diversity* among the results.

In [14]:
question = "what did they say about matlab?"
docs_ss = db.similarity_search(question, k=3)

In [17]:
docs_ss[0].page_content[:100]

'those homeworks will be done in either MATLA B or in Octave, which is sort of — I \nknow some people '

In [18]:
docs_ss[1].page_content[:100]

'those homeworks will be done in either MATLA B or in Octave, which is sort of — I \nknow some people '

In [19]:
docs_ss[2].page_content[:100]

'into his office and he said, "Oh, professo r, professor, thank you so much for your \nmachine learnin'

- Note the difference in results with MMR

In [21]:
docs_mmr = db.max_marginal_relevance_search(question, k=3)

In [22]:
docs_mmr[0].page_content[:100]

'those homeworks will be done in either MATLA B or in Octave, which is sort of — I \nknow some people '

In [23]:
docs_mmr[1].page_content[:100]

'algorithm then? So what’s different? How come  I was making all that noise earlier about \nleast squa'

In [25]:
docs_mmr[2].page_content[:100]

'learning algorithms to teach a car how to  drive at reasonably high speeds off roads \navoiding obsta'

### Addressing Specificity: working with metadata

In the last lecture, we showed that a question about the third lecture can return results from other lectures as well.

To address this, many vectorstores support operations on `metadata`.

`metadata` provides context for each embedded chunk.

In [13]:
question = "what did they say about regression in the third lecture?"

In [14]:
docs = db.similarity_search(
    question,
    k=3,
    filter={"source":"docs/cs229_lectures/MachineLearning-Lecture03.pdf"}
)

In [15]:
for d in docs:
    print(d.metadata)

{'source': 'docs/cs229_lectures/MachineLearning-Lecture03.pdf', 'page': 0}
{'source': 'docs/cs229_lectures/MachineLearning-Lecture03.pdf', 'page': 14}
{'source': 'docs/cs229_lectures/MachineLearning-Lecture03.pdf', 'page': 4}


### Addressing Specificity: working with metadata using self-query retriever

But we have an interesting challenge: we often want to infer the metadata from the query itself.

To address this, we can use `SelfQueryRetriever`, which uses an LLM to extract:
 
1. The `query` string to use for vector search
2. A metadata filter to pass in as well

Most vector databases support metadata filters, so this doesn't require any new databases or indexes.

In [19]:
from langchain.vectorstores import Chroma


In [20]:
persist_directory = 'docs/chroma/'

In [21]:
# !rm -rf ./docs/chroma  # remove old database files if any

In [24]:
vectordb = Chroma.from_documents(
    documents=splits,
    embedding=embedding_model,
    persist_directory=persist_directory
)

# print(vectordb._collection.count())

ValueError: You are using a deprecated configuration of Chroma.

If you do not have data you wish to migrate, you only need to change how you construct
your Chroma client. Please see the "New Clients" section of https://docs.trychroma.com/migration.
________________________________________________________________________________________________

If you do have data you wish to migrate, we have a migration tool you can use in order to
migrate your data to the new Chroma architecture.
Please `pip install chroma-migrate` and run `chroma-migrate` to migrate your data and then
change how you construct your Chroma client.

See https://docs.trychroma.com/migration for more information or join our discord at https://discord.gg/8g5FESbj for help!

In [39]:
from langchain.llms import OpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo

In [30]:
metadata_field_info = [
    AttributeInfo(
        name="source",
        description="The lecture the chunk is from, should be one of `docs/cs229_lectures/MachineLearning-Lecture01.pdf`, `docs/cs229_lectures/MachineLearning-Lecture02.pdf`, or `docs/cs229_lectures/MachineLearning-Lecture03.pdf`",
        type="string",
    ),
    AttributeInfo(
        name="page",
        description="The page from the lecture",
        type="integer",
    ),
]

In [35]:
document_content_description = "Lecture notes"
llm = OpenAI(openai_api_key=OPENAI_API_KEY, temperature=0)
retriever = SelfQueryRetriever.from_llm(
    llm,
    db,
    document_content_description,
    metadata_field_info,
    verbose=True,
)

ValueError: Self query retriever with Vector Store type <class 'langchain.vectorstores.faiss.FAISS'> not supported.

- We are using FAISS, which `SelfQueryRetriever` does not support in this version of LangChain, so the retriever is never created.

In [45]:
question = "what did they say about regression in the third lecture?"

**You will receive a warning** about `predict_and_parse` being deprecated the first time you execute the next line. This can be safely ignored.

In [46]:
docs = retriever.get_relevant_documents(question)

NameError: name 'retriever' is not defined

In [None]:
for d in docs:
    print(d.metadata)

### Additional tricks: compression

Another approach for improving the quality of retrieved docs is compression.

Information most relevant to a query may be buried in a document with a lot of irrelevant text. 

Passing that full document through your application can lead to more expensive LLM calls and poorer responses.

Contextual compression is meant to fix this. 

In [25]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

In [26]:
def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))


In [40]:
# Wrap our vectorstore
llm = OpenAI(openai_api_key=OPENAI_API_KEY, temperature=0)
compressor = LLMChainExtractor.from_llm(llm)

In [42]:
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=db.as_retriever()
)

In [43]:
question = "what did they say about matlab?"
compressed_docs = compression_retriever.get_relevant_documents(question)
pretty_print_docs(compressed_docs)


Document 1:

"MATLAB is I guess part of the programming language that makes it very easy to write codes using matrices, to write code for numerical routines, to move data around, to plot data. And it's sort of an extremely easy to learn tool to use for implementing a lot of learning algorithms."
----------------------------------------------------------------------------------------------------
Document 2:

"MATLAB is I guess part of the programming language that makes it very easy to write codes using matrices, to write code for numerical routines, to move data around, to plot data. And it's sort of an extremely easy to learn tool to use for implementing a lot of learning algorithms."
----------------------------------------------------------------------------------------------------
Document 3:

"And the student said, "Oh, it was the MATLAB." So for those of you that don't know MATLAB yet, I hope you do learn it. It's not hard, and we'll actually have a short MATLAB tutorial in one o

#### Combining various techniques

In [44]:
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=db.as_retriever(search_type = "mmr")
)

In [45]:
question = "what did they say about matlab?"
compressed_docs = compression_retriever.get_relevant_documents(question)
pretty_print_docs(compressed_docs)


Document 1:

"MATLAB is I guess part of the programming language that makes it very easy to write codes using matrices, to write code for numerical routines, to move data around, to plot data. And it's sort of an extremely easy to learn tool to use for implementing a lot of learning algorithms."
----------------------------------------------------------------------------------------------------
Document 2:

"And the student said, "Oh, it was the MATLAB." So for those of you that don't know MATLAB yet, I hope you do learn it. It's not hard, and we'll actually have a short MATLAB tutorial in one of the discussion sections for those of you that don't know it."


#### Other types of retrieval

It's worth noting that a vectordb is not the only kind of tool for retrieving documents.

The `LangChain` retriever abstraction includes other ways to retrieve documents, such as TF-IDF or SVM.
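
To show what a TF-IDF retriever does without any embedding calls, here is a minimal sketch in plain Python. It is not the `TFIDFRetriever` implementation: the smoothed IDF formula, the tokenizer, and the sample texts are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(texts, query):
    """Return document indices ordered from most to least similar to the query."""
    tokenized = [tokenize(t) for t in texts]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (one common variant).
    idf = {
        term: math.log((1 + n_docs) / (1 + df)) + 1
        for term, df in doc_freq.items()
    }

    def vectorize(tokens):
        counts = Counter(tokens)
        # Terms never seen in the corpus carry no weight.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
            sum(w * w for w in b.values())
        )
        return dot / norm if norm else 0.0

    query_vec = vectorize(tokenize(query))
    scores = [cosine(query_vec, vectorize(tokens)) for tokens in tokenized]
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


# Hypothetical chunks.
chunks = [
    "Linear regression fits a line to the data.",
    "Homeworks will be done in MATLAB or Octave.",
    "The class project is due at the end of the quarter.",
]

ranking = tfidf_rank(chunks, "what did they say about matlab?")
print(chunks[ranking[0]])
```

Because the scoring relies only on word overlap, this kind of retriever needs no API key or embedding model, at the price of missing matches that are semantic rather than lexical.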

In [46]:
from langchain.retrievers import SVMRetriever
from langchain.retrievers import TFIDFRetriever
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [47]:
# Load PDF
loader = PyPDFLoader("docs/cs229_lectures/MachineLearning-Lecture01.pdf")
pages = loader.load()
all_page_text=[p.page_content for p in pages]
joined_page_text=" ".join(all_page_text)

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1500, chunk_overlap = 150)
splits = text_splitter.split_text(joined_page_text)


In [49]:
len(splits)

45

In [50]:
# Retrieve
svm_retriever = SVMRetriever.from_texts(splits, embedding_model)
tfidf_retriever = TFIDFRetriever.from_texts(splits)


RateLimitError: Rate limit reached for default-text-embedding-ada-002 in organization org-3V91Qq3uFhZvtAzW06qF5bGN on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method.

In [53]:
# question = "What are major topics for this class?"
# docs_svm=svm_retriever.get_relevant_documents(question)
# docs_svm[0]

In [54]:
# question = "what did they say about matlab?"
# docs_tfidf=tfidf_retriever.get_relevant_documents(question)
# docs_tfidf[0]