# Retrieval

Retrieval is the centerpiece of our retrieval-augmented generation (RAG) flow.

Let's load the vector database we built earlier.

## Vectorstore retrieval


In [2]:
import os
import openai
from dotenv import load_dotenv

# Read the local .env file from the home directory
_ = load_dotenv(os.path.join(os.environ['HOME'], ".env"))

# Make the API key from the environment available to the OpenAI client
openai.api_key = os.environ['OPENAI_API_KEY']


In [3]:
#!pip install lark

### Similarity Search

In [4]:
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
persist_directory = './docs/chroma/'

In [5]:
embedding = OpenAIEmbeddings()
vectordb = Chroma(
    persist_directory=persist_directory,
    embedding_function=embedding
)



In [6]:
print(vectordb._collection.count())

3735


In [7]:
texts = [
    """Kanuni tanımında birbirinin alternatifi olarak gösterilen birden çok hareketten biriyle işlenebilen suçlar “seçimlik hareketli”dir.""",
    """Bu tür suçlarda, kanuni tanımda gösterilen alternatif hareketlerin hepsinin aynı anda gerçekleştirilmesi şart olmayıp birinin icrasıyla suç oluştur.""",
    """Somut olayda kanuni tarifte gösterilen hareketlerin hepsi icra edilmiş olsa dahi ortada tek suç vardır.""",
]

In [8]:
smalldb = Chroma.from_texts(texts, embedding=embedding)

In [9]:
question = "seçimlik hareketli nedir?"

In [10]:
smalldb.similarity_search(question, k=2)

[Document(page_content='Kanuni tanımında birbirinin alternatifi olarak gösterilen birden çok hareketten biriyle işlenebilen suçlar “seçimlik hareketli”dir.'),
 Document(page_content='Bu tür suçlarda, kanuni tanımda gösterilen alternatif hareketlerin hepsinin aynı anda gerçekleştirilmesi şart olmayıp birinin icrasıyla suç oluştur.')]

In [11]:
smalldb.max_marginal_relevance_search(question,k=2, fetch_k=3)

[Document(page_content='Kanuni tanımında birbirinin alternatifi olarak gösterilen birden çok hareketten biriyle işlenebilen suçlar “seçimlik hareketli”dir.'),
 Document(page_content='Somut olayda kanuni tarifte gösterilen hareketlerin hepsi icra edilmiş olsa dahi ortada tek suç vardır.')]

### Addressing Diversity: Maximum marginal relevance

In the last lesson we introduced one problem: how to enforce diversity in the search results.
 
`Maximum marginal relevance` strives to achieve both relevance to the query *and diversity* among the results.
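
To make the trade-off concrete, here is a minimal pure-Python sketch of greedy MMR selection. It is not LangChain's implementation: the helper names `cosine` and `mmr`, the toy 2-D vectors, and the `lambda_mult` weight of 0.5 are all assumptions chosen for illustration. `k` and `fetch_k` play the same roles as in the `max_marginal_relevance_search` call above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mmr(query_vec, doc_vecs, k=2, fetch_k=3, lambda_mult=0.5):
    """Greedy MMR: return indices of k documents balancing relevance and diversity."""
    # Step 1: shortlist the fetch_k documents most similar to the query.
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = sorted(range(len(doc_vecs)), key=lambda i: relevance[i], reverse=True)[:fetch_k]

    selected = []
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the similarity to the closest already-selected document.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        # Step 2: pick the candidate with the best relevance/redundancy trade-off.
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy embeddings (made up): documents 0 and 1 are near-duplicates, document 2 differs.
query = [1.0, 1.0]
docs = [[1.0, 0.8], [1.0, 0.78], [0.7, 1.0]]

top_by_similarity = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)[:2]
top_by_mmr = mmr(query, docs, k=2, fetch_k=3, lambda_mult=0.5)
print(top_by_similarity)  # plain similarity keeps both near-duplicates
print(top_by_mmr)         # MMR swaps the duplicate for the more diverse document
```

In this toy setup, plain similarity ranks the two near-duplicates first, while MMR keeps the most relevant document and then prefers the one that adds new information.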

In [12]:
question = "mala zarar verme suçu nedir?"
docs_ss = vectordb.similarity_search(question,k=3)

In [13]:
docs_ss[0].page_content[:100]

'kasına ait malı yararlanmak maksadıyla değil de tahrip etmek maksadıyla alması hâli'

In [14]:
docs_ss[1].page_content[:100]

'olarak zarara uğrama tehlikesi ile karşılaşmış olması, hareketi'

Note the difference in results with `MMR`.

In [15]:
docs_mmr = vectordb.max_marginal_relevance_search(question,k=3)

In [16]:
docs_mmr[0].page_content[:100]

'mala zarar vermeye ve konut dokunulmazlığını ihlale teşebbüs teşkil etmekle '

In [17]:
docs_mmr[1].page_content[:100]

'yaralama suçu oluşmaz. Eğer fail daha dikkatli ve özenli olsaydı, kendi'

### Addressing Specificity: working with metadata

In the last lecture, we showed that a question about the third lecture can return results from other lectures as well.

To address this, many vectorstores support operations on `metadata`.

`metadata` provides context for each embedded chunk.

In [18]:
question = "what did they say about regression in the third lecture?"

In [19]:
docs = vectordb.similarity_search(
    question,
    k=3,
    filter={"source":"docs/cs229_lectures/MachineLearning-Lecture03.pdf"}
)

In [20]:
for d in docs:
    print(d.metadata)
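
Conceptually, a metadata filter restricts the candidate set to chunks whose metadata matches before the top `k` are returned. The sketch below illustrates that idea on hand-made data. The `Chunk` class, the `filtered_search` helper, and the similarity scores are hypothetical stand-ins, and real vector stores may apply the filter at a different stage internally.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """Hypothetical stand-in for a stored document chunk."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def matches(metadata, metadata_filter):
    """True when every key in the filter has exactly the requested value."""
    return all(metadata.get(key) == value for key, value in metadata_filter.items())

def filtered_search(scored_chunks, metadata_filter, k=3):
    """Drop chunks that fail the filter, then return the k highest-scoring ones."""
    kept = [(score, chunk) for score, chunk in scored_chunks
            if matches(chunk.metadata, metadata_filter)]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in kept[:k]]

LECTURE_2 = "docs/cs229_lectures/MachineLearning-Lecture02.pdf"
LECTURE_3 = "docs/cs229_lectures/MachineLearning-Lecture03.pdf"

# Made-up similarity scores: the best match overall comes from the wrong lecture.
scored_chunks = [
    (0.91, Chunk("chunk A", {"source": LECTURE_2, "page": 4})),
    (0.88, Chunk("chunk B", {"source": LECTURE_3, "page": 0})),
    (0.75, Chunk("chunk C", {"source": LECTURE_3, "page": 6})),
]

filtered_docs = filtered_search(scored_chunks, {"source": LECTURE_3}, k=3)
for d in filtered_docs:
    print(d.metadata)
```

Even though "chunk A" has the highest score, it is excluded because its `source` does not match the filter.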

### Addressing Specificity: working with metadata using self-query retriever

But we have an interesting challenge: we often want to infer the metadata from the query itself.

To address this, we can use `SelfQueryRetriever`, which uses an LLM to extract:
 
1. The `query` string to use for vector search
2. A metadata filter to pass in as well

Most vector databases support metadata filters, so this doesn't require any new databases or indexes.
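
The two outputs listed above can be pictured as a small structured object. The sketch below fakes the LLM step with a regular expression purely for illustration: `StructuredQuery`, `toy_self_query`, and the ordinal lookup are hypothetical, and the real `SelfQueryRetriever` relies on an LLM and will generally produce a different query string.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from ordinal words to lecture numbers.
ORDINALS = {"first": 1, "second": 2, "third": 3}

@dataclass
class StructuredQuery:
    query: str               # text used for the vector search
    filter: Optional[dict]   # metadata filter, or None if nothing was inferred

def toy_self_query(question):
    """Rule-based stand-in for the LLM: split a question into query text and a filter."""
    match = re.search(r"\b(?:in|from) the (first|second|third) lecture\b", question)
    if match is None:
        return StructuredQuery(query=question, filter=None)
    number = ORDINALS[match.group(1)]
    source = f"docs/cs229_lectures/MachineLearning-Lecture{number:02d}.pdf"
    # Remove the phrase that was turned into a filter and tidy the leftover text.
    remaining = question[:match.start()] + question[match.end():]
    remaining = re.sub(r"\s+([?.!,])", r"\1", remaining)
    remaining = re.sub(r"\s+", " ", remaining).strip()
    return StructuredQuery(query=remaining, filter={"source": source})

structured = toy_self_query("what did they say about regression in the third lecture?")
print(structured.query)
print(structured.filter)
```

The resulting `filter` has the same shape as the dictionary passed by hand to `similarity_search` earlier, which is why no new database or index is needed.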

In [21]:
from langchain.llms import OpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo

In [22]:
metadata_field_info = [
    AttributeInfo(
        name="source",
        description="The lecture the chunk is from, should be one of `docs/cs229_lectures/MachineLearning-Lecture01.pdf`, `docs/cs229_lectures/MachineLearning-Lecture02.pdf`, or `docs/cs229_lectures/MachineLearning-Lecture03.pdf`",
        type="string",
    ),
    AttributeInfo(
        name="page",
        description="The page from the lecture",
        type="integer",
    ),
]

**Note:** The default model for `OpenAI` (`from langchain.llms import OpenAI`) is `text-davinci-003`. Because OpenAI deprecated `text-davinci-003` on 4 January 2024, this notebook uses OpenAI's recommended replacement model, `gpt-3.5-turbo-instruct`, instead.

In [23]:
document_content_description = "Lecture notes"
llm = OpenAI(model='gpt-3.5-turbo-instruct', temperature=0)
retriever = SelfQueryRetriever.from_llm(
    llm,
    vectordb,
    document_content_description,
    metadata_field_info,
    verbose=True
)



In [24]:
question = "what did they say about regression in the third lecture?"

**You will receive a warning** about `predict_and_parse` being deprecated the first time you execute the next line. This can be safely ignored.

In [25]:
docs = retriever.get_relevant_documents(question)

In [26]:
for d in docs:
    print(d.metadata)

### Additional tricks: compression

Another approach for improving the quality of retrieved docs is compression.

Information most relevant to a query may be buried in a document with a lot of irrelevant text. 

Passing that full document through your application can lead to more expensive LLM calls and poorer responses.

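Contextual compression is meant to fix this: an LLM extracts only the passages relevant to the query from each retrieved document before they are passed on.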
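
As a rough illustration of the idea, the sketch below "compresses" a document by keeping only the sentences that share a keyword with the question. This keyword match is an assumed stand-in for the LLM-based extraction used in this notebook. The `compress` helper, the stopword list, and the sample document are all made up for the example.

```python
import re

# Small hand-picked stopword list (an assumption for this toy example).
STOPWORDS = {"what", "did", "they", "say", "about", "the", "a", "an", "is", "of", "in", "to"}

def keywords(text):
    """Lowercased word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def compress(document, question):
    """Keep only sentences that share at least one keyword with the question."""
    wanted = keywords(question)
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return " ".join(s for s in sentences if keywords(s) & wanted)

# Made-up document: only the middle sentence is relevant to the question.
document = (
    "Welcome to the class. "
    "Most of the programming will be done in MATLAB. "
    "Office hours are posted on the course website."
)

compressed = compress(document, "what did they say about matlab?")
print(compressed)
```

Only the relevant sentence survives, so a downstream LLM call would receive far less irrelevant text.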

In [27]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

In [28]:
def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))


In [29]:
# Build a compressor that uses the LLM to extract only the query-relevant text
llm = OpenAI(temperature=0, model="gpt-3.5-turbo-instruct")
compressor = LLMChainExtractor.from_llm(llm)

In [30]:
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever()
)

In [31]:
question = "what did they say about matlab?"
compressed_docs = compression_retriever.get_relevant_documents(question)
pretty_print_docs(compressed_docs)



Document 1:

"Oh, it was the MATLAB."
----------------------------------------------------------------------------------------------------
Document 2:

It has somewhat fewer features than MATLAB, but it's free, and for the purposes of this class, it will work for just about everything.
----------------------------------------------------------------------------------------------------
Document 3:

Instructor (Andrew Ng) : Of the project?
----------------------------------------------------------------------------------------------------
Document 4:

MATLAB is I guess part of the programming language that makes it very easy to write codes using matrices, to write code for numerical routines, to move data around, to


## Combining various techniques

In [32]:
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever(search_type = "mmr")
)

In [33]:
question = "what did they say about matlab?"
compressed_docs = compression_retriever.get_relevant_documents(question)
pretty_print_docs(compressed_docs)



Document 1:

"Oh, it was the MATLAB."
----------------------------------------------------------------------------------------------------
Document 2:

Instructor (Andrew Ng) : Of the project?
----------------------------------------------------------------------------------------------------
Document 3:

MATLAB is I guess part of the programming language that makes it very easy to write codes using matrices, to write code for numerical routines, to move data around, to
----------------------------------------------------------------------------------------------------
Document 4:

I'm going to ask you to do most of your programming in MATLAB and Octave because if you try to implement the same algorithm in C or Java or something, I can tell you from personal, painful experience, you end up writing pages and pages of code rather than relatively few lines of code.


## Other types of retrieval

It's worth noting that a vector database is not the only kind of tool for retrieving documents.

The `LangChain` retriever abstraction includes other ways to retrieve documents, such as TF-IDF or SVM.
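
To show what a non-embedding retriever does, here is a minimal pure-Python TF-IDF retrieval sketch. It is not how `TFIDFRetriever` is implemented: the helper names, the smoothed IDF formula (one common variant), and the sample texts are assumptions for illustration. No embedding model or API call is involved.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def weigh(tokens, idf):
    """Sparse TF-IDF vector; terms never seen at indexing time are ignored."""
    counts = Counter(tokens)
    return {term: count * idf[term] for term, count in counts.items() if term in idf}

def build_index(texts):
    """Compute IDF weights and one TF-IDF vector per text."""
    tokenized = [tokenize(t) for t in texts]
    n = len(texts)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: rarer terms receive larger weights.
    idf = {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}
    return idf, [weigh(tokens, idf) for tokens in tokenized]

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, texts, idf, vectors, k=1):
    """Return the k texts whose TF-IDF vectors are closest to the question."""
    q = weigh(tokenize(question), idf)
    ranked = sorted(range(len(texts)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    return [texts[i] for i in ranked[:k]]

# Made-up snippets standing in for document splits.
sample_texts = [
    "MATLAB makes matrix code short",
    "Office hours are on Friday",
    "The project is due in December",
]
idf, vectors = build_index(sample_texts)
top = retrieve("what did they say about matlab?", sample_texts, idf, vectors, k=1)
print(top)
```

Because matching is purely lexical, this kind of retriever finds exact term overlap such as "matlab" but will miss paraphrases that an embedding-based search could catch.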

In [34]:
from langchain.retrievers import SVMRetriever
from langchain.retrievers import TFIDFRetriever
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [35]:
# Load PDF
loader = PyPDFLoader("docs/cs229_lectures/MachineLearning-Lecture01.pdf")
pages = loader.load()
all_page_text = [p.page_content for p in pages]
joined_page_text = " ".join(all_page_text)

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=150)
splits = text_splitter.split_text(joined_page_text)


In [36]:
# Retrieve
svm_retriever = SVMRetriever.from_texts(splits,embedding)
tfidf_retriever = TFIDFRetriever.from_texts(splits)


In [None]:
question = "What are major topics for this class?"
docs_svm=svm_retriever.get_relevant_documents(question)
docs_svm[0]

In [None]:
question = "what did they say about matlab?"
docs_tfidf=tfidf_retriever.get_relevant_documents(question)
docs_tfidf[0]