# Retrieval

In [1]:
import os
import openai

## Vectorstore Retrieval

In [2]:
%pip install lark

Collecting lark
  Obtaining dependency information for lark from https://files.pythonhosted.org/packages/e7/9c/eef7c591e6dc952f3636cfe0df712c0f9916cedf317810a3bb53ccb65cdd/lark-1.1.9-py3-none-any.whl.metadata
  Downloading lark-1.1.9-py3-none-any.whl.metadata (1.9 kB)
Downloading lark-1.1.9-py3-none-any.whl (111 kB)
Installing collected packages: lark
Successfully installed lark-1.1.9
Note: you may need to restart the kernel to use updated packages.


### Similarity Search

In [3]:
# Import the Chroma vector store and the OpenAI embedding model
from langchain.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
persist_directory = 'docs/chroma/'

In [4]:
embedding = OpenAIEmbeddings()

# Let's get our vectorDB from the previous notebook
vectordb = Chroma(
    persist_directory = persist_directory,
    embedding_function = embedding
)

In [5]:
print(vectordb._collection.count())

209


In [6]:
texts = [
    """The Amanita phalloides has a large and imposing epigeous (aboveground) fruiting body (basidiocarp).""",
    """A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.""",
    """A. phalloides, a.k.a Death Cap, is one of the most poisonous of all known mushrooms.""",
]

In [7]:
# Create a small vector store to use for this example
smalldb = Chroma.from_texts(texts, embedding=embedding)

In [8]:
question = "Tell me about all-white mushroom with large fruiting bodies"

In [9]:
# Run a similarity search and return the 2 closest documents
smalldb.similarity_search(question, k=2)

[Document(page_content='A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.'),
 Document(page_content='The Amanita phalloides has a large and imposing epigeous (aboveground) fruiting body (basidiocarp).')]

Neither retrieved document mentions that the mushroom is poisonous.
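
To see why plain similarity search can return two near-duplicate documents, here is a minimal sketch of top-k ranking by cosine similarity. It does not call Chroma or OpenAI: the 2-D vectors are made-up stand-ins for real embeddings, and `cosine` and `top_k` are hypothetical helper names used only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Made-up 2-D "embeddings" (assumption: real embeddings have far more dimensions)
toy_query = [1.0, 0.0]
toy_docs = [
    [0.90, 0.30],   # doc 0: close to the query
    [0.88, 0.32],   # doc 1: nearly identical to doc 0
    [0.80, -0.60],  # doc 2: somewhat relevant, but points in a different direction
]

top_indices = top_k(toy_query, toy_docs, k=2)
print(top_indices)
```

Because the ranking looks only at similarity to the query, the two near-identical vectors take both slots and the more distinct document is left out.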

In [10]:
# Run a search with MMR (Maximal Marginal Relevance)
smalldb.max_marginal_relevance_search(
    question,
    k=2,  # Return the 2 documents selected by MMR
    fetch_k=3  # First fetch the 3 most similar documents as candidates
)

[Document(page_content='A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.'),
 Document(page_content='A. phalloides, a.k.a Death Cap, is one of the most poisonous of all known mushrooms.')]

We can now see that the retrieved documents include the one stating that the mushroom is poisonous.
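
The idea behind MMR can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not LangChain's or Chroma's actual code: the 2-D vectors are made-up stand-ins for embeddings, `cosine` and `mmr_select` are hypothetical helper names, and the trade-off weight `lambda_mult=0.5` is chosen here for illustration. The sketch first shortlists the `fetch_k` most similar vectors, then greedily picks `k` of them, penalising each candidate by its similarity to the documents already selected.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=2, fetch_k=3, lambda_mult=0.5):
    """Return the indices of k documents chosen by a simplified MMR rule."""
    # Step 1: shortlist the fetch_k documents most similar to the query.
    by_relevance = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_relevance[:fetch_k]

    selected = []
    # Step 2: greedily add the candidate with the best balance of
    # relevance to the query and dissimilarity to what is already selected.
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,  # nothing selected yet, so no penalty
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Made-up 2-D "embeddings" (assumption: real embeddings have far more dimensions)
query_vec = [1.0, 0.0]
doc_vecs = [
    [0.90, 0.30],   # doc 0: close to the query
    [0.88, 0.32],   # doc 1: nearly identical to doc 0
    [0.80, -0.60],  # doc 2: less similar to the query, but adds new information
]

mmr_indices = mmr_select(query_vec, doc_vecs, k=2, fetch_k=3)
print(mmr_indices)
```

With these toy vectors the near-duplicate is skipped in favour of the more distinct document, which mirrors how the "Death Cap" text displaced the second fruiting-body description in the MMR result. Setting `fetch_k` equal to `k` removes the choice entirely, since every shortlisted document is then returned.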