# [Vector stores and retrievers](https://python.langchain.com/v0.2/docs/tutorials/retrievers/)

This notebook follows along with the LangChain documentation to create [vector stores and retrievers](https://python.langchain.com/v0.2/docs/tutorials/retrievers/).

In [1]:
# Load environment variables from a .env file (this notebook expects OPENAI_API_KEY)
from dotenv import load_dotenv
import os
load_dotenv()
open_ai_model = "gpt-3.5-turbo-0125"
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

In [6]:
# misc imports
from typing import List, Tuple

# Vector Stores

In [2]:
#generating sample documents

from langchain_core.documents import Document

#this is a list of example documents
documents = [
    Document(
        page_content="Dogs are great companions, known for their loyalty and friendliness.",
        metadata={"source": "mammal-pets-doc"},
    ),
    Document(
        page_content="Cats are independent pets that often enjoy their own space.",
        metadata={"source": "mammal-pets-doc"},
    ),
    Document(
        page_content="Goldfish are popular pets for beginners, requiring relatively simple care.",
        metadata={"source": "fish-pets-doc"},
    ),
    Document(
        page_content="Parrots are intelligent birds capable of mimicking human speech.",
        metadata={"source": "bird-pets-doc"},
    ),
    Document(
        page_content="Rabbits are social animals that need plenty of space to hop around.",
        metadata={"source": "mammal-pets-doc"},
    ),
]


In [3]:
print(documents)

[Document(page_content='Dogs are great companions, known for their loyalty and friendliness.', metadata={'source': 'mammal-pets-doc'}), Document(page_content='Cats are independent pets that often enjoy their own space.', metadata={'source': 'mammal-pets-doc'}), Document(page_content='Goldfish are popular pets for beginners, requiring relatively simple care.', metadata={'source': 'fish-pets-doc'}), Document(page_content='Parrots are intelligent birds capable of mimicking human speech.', metadata={'source': 'bird-pets-doc'}), Document(page_content='Rabbits are social animals that need plenty of space to hop around.', metadata={'source': 'mammal-pets-doc'})]


The goal is to store documents in an unstructured way by associating each one with an embedding (a vector of numbers). We can then query the documents by turning the query into an embedding of the same dimension and selecting the most similar documents.
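To make the idea concrete, here is a minimal pure-Python sketch of similarity search. The three-dimensional vectors are made up for illustration (real embeddings have far more dimensions), and cosine similarity is one common similarity measure, not necessarily the one every vector store uses.

```python
import math
from typing import Dict, List, Tuple

# Made-up 3-dimensional "embeddings", for illustration only.
doc_vectors: Dict[str, List[float]] = {
    "dogs": [0.9, 0.1, 0.0],
    "cats": [0.8, 0.3, 0.1],
    "goldfish": [0.1, 0.9, 0.2],
}


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("query and document vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(query_vector: List[float], k: int = 1) -> List[Tuple[str, float]]:
    """Return the k (name, similarity) pairs closest to the query, best first."""
    scored = [(name, cosine_similarity(query_vector, vec)) for name, vec in doc_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # higher = more similar
    return scored[:k]


# Pretend this is the embedding of the query "cat".
query_vector = [0.7, 0.4, 0.1]
print(most_similar(query_vector, k=2))
```

The dimension check matters: a query can only be compared with documents embedded by the same model, which is why the store and the query must share one embedding function.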

[This](https://python.langchain.com/v0.2/docs/integrations/document_loaders/microsoft_word/) is how you can load Word documents.  
[This](https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/pdf/) is how you can load PDFs.

Vector stores need somewhere to store their embeddings. This can be a database server such as Postgres. Below, we use Chroma, which includes an in-memory implementation.
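Conceptually, an in-memory vector store keeps each text next to its embedding and scans them all at query time. The sketch below is a hypothetical illustration of that idea, not Chroma's implementation: `toy_embed` is a made-up keyword-count embedding standing in for a real model such as `OpenAIEmbeddings`, and the method names `add_texts` and `similarity_search` only mirror the LangChain vector store methods.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Vector = List[float]

# Hypothetical toy embedding: one dimension per keyword, counting how many words
# in the text contain that keyword. A real store would call an embedding model.
VOCAB = ["dog", "cat", "fish", "bird", "pet", "space"]


def toy_embed(text: str) -> Vector:
    words = text.lower().split()
    return [float(sum(term in word for word in words)) for term in VOCAB]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat all-zero vectors as unrelated


@dataclass
class ToyVectorStore:
    """Keeps every text next to its embedding and scans them all at query time."""

    embed: Callable[[str], Vector]
    texts: List[str] = field(default_factory=list)
    vectors: List[Vector] = field(default_factory=list)

    def add_texts(self, texts: List[str]) -> List[int]:
        """Embed and store each text; return its position as a simple id."""
        ids = []
        for text in texts:
            self.texts.append(text)
            self.vectors.append(self.embed(text))
            ids.append(len(self.texts) - 1)
        return ids

    def similarity_search(self, query: str, k: int = 4) -> List[str]:
        """Return the k stored texts most similar to the query."""
        query_vec = self.embed(query)
        scored: List[Tuple[float, str]] = [
            (cosine(query_vec, vec), text) for text, vec in zip(self.texts, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


store = ToyVectorStore(embed=toy_embed)
ids = store.add_texts([
    "Dogs are great companions, known for their loyalty and friendliness.",
    "Cats are independent pets that often enjoy their own space.",
    "Goldfish are popular pets for beginners, requiring relatively simple care.",
    "Parrots are intelligent birds capable of mimicking human speech.",
    "Rabbits are social animals that need plenty of space to hop around.",
])
print(store.similarity_search("cat", k=1))
```

A linear scan like this is fine for five documents; dedicated vector stores add persistence and approximate-nearest-neighbour indexes so search stays fast as the collection grows.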

In [4]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Method 1: build the vector store directly from the documents.
# On my machine this crashes the kernel with:
#     ExitCode: 3221225477, Reason: (no reason given)
# I believe this is due to running out of memory. Note that 3221225477 is
# 0xC0000005, a Windows access violation, which points to a crash inside native
# (compiled) code; memory exhaustion is one possible cause but not the only one.
#
# vectorstore = Chroma.from_documents(
#     documents,
#     embedding=OpenAIEmbeddings(),
# )

#############
# Method 2: create an empty collection, then add documents to it.
vectorstore: Chroma = Chroma(
    collection_name="langchain_store",
    embedding_function=OpenAIEmbeddings(),
)

# This similarly crashes the kernel:
# ids = vectorstore.add_documents(documents)

In [7]:
# To query the data. I could not populate the Chroma store (see the crash above),
# so this shows what the query code should look like.
# Use a new name so the sample `documents` list is not overwritten.
results: List[Document] = vectorstore.similarity_search("cat")

docs_similarity_list: List[Tuple[Document, float]] = vectorstore.similarity_search_with_score("cat")
# This also returns a score for each document. For Chroma the score is a distance,
# so lower scores mean more similar documents.
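Because the score is a distance, results are ordered the opposite way from a similarity score: smallest first. The sketch below assumes Euclidean (L2) distance on made-up vectors; which metric a given Chroma collection actually uses depends on its configuration, so treat this as an illustration of the ordering, not of Chroma's exact numbers.

```python
import math
from typing import Dict, List, Tuple

# Made-up vectors; in the notebook these would come from OpenAIEmbeddings.
doc_vectors: Dict[str, List[float]] = {
    "goldfish": [0.1, 0.9, 0.2],
    "dogs": [0.9, 0.1, 0.0],
    "cats": [0.8, 0.3, 0.1],
}


def l2_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance: 0.0 for identical vectors, larger for less similar ones."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_with_score(query_vector: List[float], k: int = 4) -> List[Tuple[str, float]]:
    """Return (name, distance) pairs, closest first, like similarity_search_with_score."""
    scored = [(name, l2_distance(query_vector, vec)) for name, vec in doc_vectors.items()]
    scored.sort(key=lambda pair: pair[1])  # ascending: lower distance = more similar
    return scored[:k]


results = search_with_score([0.7, 0.4, 0.1])
for name, score in results:
    print(f"{name}: {score:.3f}")
```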



# Retrievers

In [None]:
# Not run: skipped because of the vectorstore issues above.
from langchain_core.documents import Document
from langchain_core.runnables import RunnableLambda

# Option 1: create your own retriever by wrapping similarity_search in a Runnable.
# .bind(k=1) fixes k=1, so only the single closest document is returned.
retriever = RunnableLambda(vectorstore.similarity_search).bind(k=1)

# Option 2: build a retriever directly from the vector store.
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1},
)

# This retrieves the k=1 closest document for "cat" and for "shark" separately.
# Return type: List[List[Document]]
retriever.batch(["cat", "shark"])
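Since this cell could not run, here is a dependency-free sketch of what `.bind(k=1)` and `.batch(...)` do conceptually. It is an analogy, not LangChain code: `functools.partial` plays the role of `.bind`, a list comprehension plays the role of `.batch`, and the hypothetical `similarity_search` below scores texts by substring matches instead of embeddings.

```python
from functools import partial
from typing import List

# Hypothetical stand-in for vectorstore.similarity_search: scores each text by how
# many query words appear in it (substring match), instead of using embeddings.
TEXTS = [
    "Dogs are great companions, known for their loyalty and friendliness.",
    "Cats are independent pets that often enjoy their own space.",
    "Goldfish are popular pets for beginners, requiring relatively simple care.",
]


def similarity_search(query: str, k: int = 4) -> List[str]:
    """Return the k best-matching texts; ties keep their original order."""
    terms = query.lower().split()
    scored = [(sum(term in text.lower() for term in terms), text) for text in TEXTS]
    # sorted() is stable even with reverse=True, so equal scores keep list order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked[:k]]


# Like RunnableLambda(similarity_search).bind(k=1): pre-fill k=1 once.
retrieve_top1 = partial(similarity_search, k=1)


def batch(queries: List[str]) -> List[List[str]]:
    """Like retriever.batch(...): one list of documents per query, in input order."""
    return [retrieve_top1(query) for query in queries]


print(batch(["cat", "shark"]))
```

"shark" matches nothing here, yet it still gets one result: a top-k search always returns the k nearest items, however far away they are. The real vector store behaves the same way unless the retriever is configured with a score threshold (for example `search_type="similarity_score_threshold"`).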



# Example chain

In [None]:
# Not run: depends on the retriever from the previous cell.
# get env var (already loaded in the first cell)
# import os; OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# make model
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-3.5-turbo-0125")

# create prompt template
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

message = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""

prompt = ChatPromptTemplate.from_messages([("human", message)])

# create and use chain
rag_chain = {"context": retriever, "question": RunnablePassthrough()} | prompt | llm
# The dict sends the same input string to both keys: the retriever uses it to
# fetch "context", and RunnablePassthrough forwards it unchanged as "question".

response = rag_chain.invoke("tell me about cats")

print(response.content)
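Stripped of LangChain, the chain does three things with one input string. This hypothetical sketch replaces the retriever and the LLM with stand-ins (`fake_retriever` always returns one fixed passage, and the model call is left out) to show how the dict step fans the same question out to both keys. Unlike the chain above, which inserts the raw list of `Document` objects into `{context}`, it joins the passages into one string first; I believe the LangChain RAG tutorial does something similar with a `format_docs` helper.

```python
from typing import Dict, List

TEMPLATE = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""


def fake_retriever(question: str) -> List[str]:
    # Hypothetical stand-in for the retriever: always returns one fixed passage.
    return ["Cats are independent pets that often enjoy their own space."]


def format_docs(docs: List[str]) -> str:
    # Join the retrieved passages into one block of text for the prompt.
    return "\n\n".join(docs)


def run_chain(question: str) -> str:
    # Step 1: the dict {"context": retriever, "question": RunnablePassthrough()}
    # receives the SAME input in both branches.
    inputs: Dict[str, str] = {
        "context": format_docs(fake_retriever(question)),
        "question": question,  # RunnablePassthrough returns its input unchanged
    }
    # Step 2: the prompt template fills in the placeholders.
    prompt_text = TEMPLATE.format(**inputs)
    # Step 3: the real chain would now send prompt_text to the LLM.
    return prompt_text


filled = run_chain("tell me about cats")
print(filled)
```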




In [9]:
# Working example I found: the same kind of retrieval chain, using FAISS instead of Chroma
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

vectorstore = FAISS.from_texts(
    ["harrison worked at kensho"], embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()

retrieval_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

retrieval_chain.invoke("where did harrison work?")



'Harrison worked at Kensho.'

# Further research

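[Self Querying](https://python.langchain.com/v0.2/docs/how_to/self_query/): the retriever uses an LLM to turn a natural-language question into a search query plus filters on document metadata, which limits the kind of data that can be returned.
* Additional attributes to filter on (such as `source` above) can be placed in each document's metadata.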
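A self-query retriever splits a question into a semantic part and a metadata filter. The sketch below shows only the second half, applying such a structured query to the sample documents, under heavy assumptions: the `structured` dict is a made-up format (LangChain's own structured-query representation differs), the LLM step that would produce it is omitted, and keyword matching stands in for embedding similarity.

```python
from typing import Dict, List, Optional

# The sample documents from above, as plain dicts standing in for Document objects.
DOCS = [
    {"page_content": "Dogs are great companions, known for their loyalty and friendliness.",
     "metadata": {"source": "mammal-pets-doc"}},
    {"page_content": "Cats are independent pets that often enjoy their own space.",
     "metadata": {"source": "mammal-pets-doc"}},
    {"page_content": "Goldfish are popular pets for beginners, requiring relatively simple care.",
     "metadata": {"source": "fish-pets-doc"}},
    {"page_content": "Parrots are intelligent birds capable of mimicking human speech.",
     "metadata": {"source": "bird-pets-doc"}},
    {"page_content": "Rabbits are social animals that need plenty of space to hop around.",
     "metadata": {"source": "mammal-pets-doc"}},
]


def filtered_search(query: str, metadata_filter: Optional[Dict[str, str]] = None,
                    k: int = 4) -> List[str]:
    """Keep documents whose metadata matches every filter key, then rank by keyword hits."""
    candidates = [
        doc for doc in DOCS
        if not metadata_filter
        or all(doc["metadata"].get(key) == value for key, value in metadata_filter.items())
    ]
    terms = query.lower().split()
    # Stable sort: documents with equal scores keep their original order.
    ranked = sorted(candidates,
                    key=lambda doc: sum(term in doc["page_content"].lower() for term in terms),
                    reverse=True)
    return [doc["page_content"] for doc in ranked[:k]]


# Hypothetical output of the LLM step for "which mammals need space?"
structured = {"query": "space", "filter": {"source": "mammal-pets-doc"}}
print(filtered_search(structured["query"], structured["filter"], k=2))
```

The filter runs before ranking, so documents from other sources can never be returned, however well they match the query text.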