# Advanced Retriever

In [1]:
from dotenv import load_dotenv

load_dotenv()

True

There are three strategies for retrieving data from a vector store. But first, let's create a vector store.

In [2]:
from langchain_chroma import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import TokenTextSplitter
import os

embeddings = OpenAIEmbeddings()

persist_directory = './db/sangkuriang_db'
# Create vector store
if not os.path.exists(persist_directory):
    text_loader = TextLoader('./sources/sangkuriang.txt')
    text = text_loader.load()

    splitter = TokenTextSplitter(chunk_size=70, chunk_overlap=10, encoding_name="cl100k_base")
    splitted_text = splitter.split_documents(text)

    Chroma.from_documents(splitted_text, embedding=embeddings, persist_directory=persist_directory)

# Load the persisted vector store
vectorstore = Chroma(persist_directory=persist_directory, embedding_function=embeddings)

### 1. Similarity search
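- Embeds the query and returns the `k` stored chunks whose vectors are closest to it.
- Cosine similarity is a common way to measure closeness: it compares the angle between the query vector and each stored vector, so a higher score means a closer match. The metric actually used depends on how the vector store is configured.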

Watch [this video](https://youtu.be/e9U0QAFbfLI) for more information
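To make the idea concrete, here is a minimal from-scratch sketch of ranking by cosine similarity. It is not what Chroma runs internally; the 3-dimensional vectors and the `chunk_a`/`chunk_b`/`chunk_c` names are made up for illustration, whereas real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values, for illustration only).
query = [1.0, 0.0, 1.0]
chunks = {
    "chunk_a": [1.0, 0.0, 1.0],  # same direction as the query
    "chunk_b": [1.0, 1.0, 0.0],  # partially aligned with the query
    "chunk_c": [0.0, 1.0, 0.0],  # orthogonal to the query
}

# Score every chunk against the query, then keep the k best.
scores = {name: cosine_similarity(query, vec) for name, vec in chunks.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
top_k = ranked[:2]

print(scores)
print(top_k)
```

The retriever below does the same thing at a larger scale: it embeds the question, scores it against the stored chunks, and returns the top `k`.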

In [3]:
retriever = vectorstore.as_retriever(
    search_type='similarity', search_kwargs={'k': 3})
results = retriever.invoke("Why Sangkuriang kill Tumang?")

[res.page_content for res in results ]

['kuriang felt upset because he hadn’t successfully hunted any animals. Then, he decided to kill Tumang and gave Tumang’s heart to his mother.\nMoments later, Dayang Sumbi found out that Sangkuriang had killed Tumang. Furiously, Dayang Sumbi hit Sangkuriang’s head and threw him out of the house.\nYears later, Sangkuriang, who was kicked out of the house, had grown into a handsome',
 ' Sumbi married Tumang (long story short, Tumang was a god who was expelled from heaven). From that marriage, a son named Sangkuriang was born.\nTime went on until Sangkuriang grew into a handsome boy. One day, Sangkuriang found out that his mother wanted to eat a deer’s liver. Sangkuriang also hunted into the forest with Tumang. While hunting, Sangkuriang felt upset because he hadn’t successfully',
 ' out of the house, had grown into a handsome man. While Dayang Sumbi remained young, this was because at that time Dayang Sumbi ate Tumang’s liver. One day, Sangkuriang met Dayang Sumbi again, and they fell in

### 2. Similarity search with threshold
- This performs a similarity search, but only returns documents with a similarity score above a specified threshold.
- You can set the threshold using the search_kwargs parameter, e.g., search_kwargs={"score_threshold": 0.5}.
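The filtering step can be sketched in plain Python. This is only an illustration of the idea, not LangChain's implementation: the helper `filter_by_threshold`, the chunk labels, and the scores are all hypothetical, and the sketch assumes scores lie in the range 0 to 1 with higher meaning more relevant.

```python
def filter_by_threshold(scored_chunks, score_threshold, k):
    """Keep at most k chunks scoring at least score_threshold, best first."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    kept = [(text, score) for text, score in ranked if score >= score_threshold]
    return kept[:k]


# Hypothetical (chunk, relevance score) pairs, for illustration only.
scored_chunks = [
    ("chunk about the boat", 0.55),
    ("chunk about the hunt", 0.82),
    ("chunk about dawn", 0.31),
    ("chunk about the marriage", 0.74),
]

strict = filter_by_threshold(scored_chunks, score_threshold=0.7, k=3)
loose = filter_by_threshold(scored_chunks, score_threshold=0.5, k=3)
none_pass = filter_by_threshold(scored_chunks, score_threshold=0.9, k=3)

print(strict)
print(loose)
print(none_pass)
```

Note that a high threshold can return fewer than `k` results, or none at all, so the calling code should handle an empty list.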

In [4]:
retriever = vectorstore.as_retriever(
    search_type='similarity_score_threshold',
    search_kwargs={"score_threshold": 0.7, "k": 3})
results = retriever.invoke("Why Sangkuriang kill Tumang?")

[res.page_content for res in results ]

['kuriang felt upset because he hadn’t successfully hunted any animals. Then, he decided to kill Tumang and gave Tumang’s heart to his mother.\nMoments later, Dayang Sumbi found out that Sangkuriang had killed Tumang. Furiously, Dayang Sumbi hit Sangkuriang’s head and threw him out of the house.\nYears later, Sangkuriang, who was kicked out of the house, had grown into a handsome',
 ' Sumbi married Tumang (long story short, Tumang was a god who was expelled from heaven). From that marriage, a son named Sangkuriang was born.\nTime went on until Sangkuriang grew into a handsome boy. One day, Sangkuriang found out that his mother wanted to eat a deer’s liver. Sangkuriang also hunted into the forest with Tumang. While hunting, Sangkuriang felt upset because he hadn’t successfully',
 ' out of the house, had grown into a handsome man. While Dayang Sumbi remained young, this was because at that time Dayang Sumbi ate Tumang’s liver. One day, Sangkuriang met Dayang Sumbi again, and they fell in

### 3. Maximum marginal relevance search

This method balances relevance to the query against diversity among the returned documents.
- `fetch_k` specifies how many documents to retrieve initially by similarity; the final `k` results are picked from these candidates.
- `lambda_mult` controls the diversity of the results: 1 for minimum diversity, 0 for maximum diversity.

Use this when you want to avoid redundancy and retrieve documents that are both relevant and diverse.

Note: Relevance measures how closely documents match the query.
Note: Diversity ensures that the retrieved documents are not too similar to each other, providing a broader range of information.
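The trade-off between relevance and diversity can be sketched as a greedy selection loop. This is a simplified from-scratch illustration, not the code LangChain or Chroma actually runs: the function `mmr`, the 2-dimensional vectors, and the choice of cosine similarity are assumptions made for the example. The candidate list plays the role of the `fetch_k` documents fetched first.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, candidate_vecs, k, lambda_mult):
    """Greedily pick up to k candidate indices, trading relevance for diversity."""
    relevance = [cosine_similarity(query_vec, c) for c in candidate_vecs]
    selected = []
    remaining = list(range(len(candidate_vecs)))

    def score(i):
        if not selected:
            # Nothing chosen yet: start with the most relevant candidate.
            return relevance[i]
        # Redundancy: similarity to the closest already-selected candidate.
        redundancy = max(
            cosine_similarity(candidate_vecs[i], candidate_vecs[j]) for j in selected
        )
        return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy vectors (hypothetical): index 1 is a near-duplicate of index 0.
query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.99, 0.14], [0.6, 0.8]]

relevance_only = mmr(query, candidates, k=2, lambda_mult=1.0)
more_diverse = mmr(query, candidates, k=2, lambda_mult=0.3)

print(relevance_only)
print(more_diverse)
```

With `lambda_mult=1.0` the near-duplicate is chosen second because only relevance counts; with `lambda_mult=0.3` the redundancy penalty pushes the selection toward the less similar third candidate.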

In [5]:
retriever = vectorstore.as_retriever(
    search_type='mmr',
    search_kwargs={"k": 3, "lambda_mult": 0.7, "fetch_k": 20})
results = retriever.invoke("Why Sangkuriang kill Tumang?")

[res.page_content for res in results ]

Number of requested results 20 is greater than number of elements in index 7, updating n_results = 7


['kuriang felt upset because he hadn’t successfully hunted any animals. Then, he decided to kill Tumang and gave Tumang’s heart to his mother.\nMoments later, Dayang Sumbi found out that Sangkuriang had killed Tumang. Furiously, Dayang Sumbi hit Sangkuriang’s head and threw him out of the house.\nYears later, Sangkuriang, who was kicked out of the house, had grown into a handsome',
 ' Sumbi married Tumang (long story short, Tumang was a god who was expelled from heaven). From that marriage, a son named Sangkuriang was born.\nTime went on until Sangkuriang grew into a handsome boy. One day, Sangkuriang found out that his mother wanted to eat a deer’s liver. Sangkuriang also hunted into the forest with Tumang. While hunting, Sangkuriang felt upset because he hadn’t successfully',
 ' and made it look like the dawn had come. The spirits fled, so Sangkuriang failed to finish his boat. This made Sangkuriang furious and kicked the boat upside down. The overturned boat is now known as Mount Ta