# Vector stores and retrievers
## Documents
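A LangChain `Document` represents a unit of text together with its metadata. It has two attributes:

- `page_content`: a string representing the content;
- `metadata`: a dict containing arbitrary metadata.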

In [2]:
from dotenv import load_dotenv

load_dotenv(dotenv_path='./../../../.env')

True

In [2]:
from langchain_core.documents import Document

documents = [
    Document(
        page_content="Dogs are great companions, known for their loyalty and friendliness.",
        metadata={"source": "mammal-pets-doc"},
    ),
    Document(
        page_content="Cats are independent pets that often enjoy their own space.",
        metadata={"source": "mammal-pets-doc"},
    ),
    Document(
        page_content="Goldfish are popular pets for beginners, requiring relatively simple care.",
        metadata={"source": "fish-pets-doc"},
    ),
    Document(
        page_content="Parrots are intelligent birds capable of mimicking human speech.",
        metadata={"source": "bird-pets-doc"},
    ),
    Document(
        page_content="Rabbits are social animals that need plenty of space to hop around.",
        metadata={"source": "mammal-pets-doc"},
    ),
]

In [None]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents( # create vectorstore with the documents
    documents,
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
)
# vectorstore.add_documents(new_documents)  # add more documents later

### Examples
Return documents based on similarity to a string query:

In [7]:
vectorstore.similarity_search("cat", k=2)

[Document(metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.'),
 Document(metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.')]

Async query:

In [8]:
await vectorstore.asimilarity_search("cat", k=2)

[Document(metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.'),
 Document(metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.')]
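
Top-level `await` works in a notebook cell because Jupyter already runs an event loop. In a plain script, the coroutine has to be driven by something like `asyncio.run`. The sketch below assumes a hypothetical stand-in coroutine, `fake_asimilarity_search`, in place of the real vector store so that it runs without an API key; `asyncio.gather` is one way to issue several queries concurrently.

```python
import asyncio

# Hypothetical stand-in for vectorstore.asimilarity_search so the sketch runs
# without an API key. It ignores the query and returns the first k texts.
async def fake_asimilarity_search(query: str, k: int = 2) -> list[str]:
    await asyncio.sleep(0.01)  # simulate network latency
    corpus = ["Cats are independent pets.", "Dogs are great companions."]
    return corpus[:k]

async def main() -> list[list[str]]:
    # gather runs both coroutines concurrently and keeps the input order.
    return await asyncio.gather(
        fake_asimilarity_search("cat", k=1),
        fake_asimilarity_search("dog", k=2),
    )

# Outside a notebook, asyncio.run starts the event loop for us.
results = asyncio.run(main())
print(results)
```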

Return scores:

In [None]:
# Chroma here returns a distance metric that should vary inversely with similarity.
vectorstore.similarity_search_with_score("cat", k=2)

[(Document(metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.'),
  1.2405281066894531),
 (Document(metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.'),
  1.550061821937561)]
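
The scores above are distances, so a smaller number means a closer match. As a sketch of why distance and similarity move in opposite directions, assume the store uses squared Euclidean distance (this is an assumption about the collection's configuration) and that the embeddings have unit length. Under those assumptions the distance equals 2 - 2 × cosine similarity. The 2-D vectors below are made up for illustration.

```python
import math

def normalize(v):
    # Scale a vector to unit length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    # For unit-length vectors the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy "embeddings" (illustrative values, not real model output).
query = normalize([1.0, 0.2])
docs = {
    "cats": normalize([0.9, 0.4]),
    "dogs": normalize([0.5, 0.9]),
    "goldfish": normalize([-0.2, 1.0]),
}

# Ascending distance and descending similarity give the same ranking.
by_distance = sorted(docs, key=lambda name: squared_l2(query, docs[name]))
by_similarity = sorted(
    docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True
)

for name in by_distance:
    d = squared_l2(query, docs[name])
    s = cosine_similarity(query, docs[name])
    print(f"{name}: distance={d:.3f} similarity={s:.3f} 2-2s={2 - 2 * s:.3f}")
```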

Return documents based on similarity to an embedded query:

In [10]:
# Embed 'cat' with the same model that built the vector store;
# vectors from different embedding models are not comparable.
embedding = OpenAIEmbeddings(model="text-embedding-3-small").embed_query("cat")
vectorstore.similarity_search_by_vector(embedding, k=2)

[Document(metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.'),
 Document(metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.')]

## Retrievers
Wrapping a VectorStore in a Runnable integrates it into LCEL (LangChain Expression Language), so the standard Runnable methods become available.

In [None]:
from langchain_core.runnables import RunnableLambda
# RunnableLambda is one of the classes that implement the Runnable interface.
# It wraps a plain function as a Runnable, so the standard Runnable methods
# (bind, invoke, batch, etc.) are available on it.
# Build a retriever that takes its input from the preceding Runnable and
# passes it to vectorstore.similarity_search.
retriever = RunnableLambda(vectorstore.similarity_search).bind(k=1)  # select top result
# Run the retriever once for 'cat' and once for 'shark'.
retriever.batch(["cat", "shark"])

[[Document(page_content='Cats are independent pets that often enjoy their own space.', metadata={'source': 'mammal-pets-doc'})],
 [Document(page_content='Goldfish are popular pets for beginners, requiring relatively simple care.', metadata={'source': 'fish-pets-doc'})]]
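
As a rough mental model (not LangChain's implementation), `bind(k=1)` fixes a keyword argument much like `functools.partial`, and `batch` maps the resulting callable over a list of inputs. The `toy_search` function and its substring-count scoring are hypothetical stand-ins for `vectorstore.similarity_search`, and the thread pool is only an assumption about how the inputs might be processed concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

CORPUS = [
    "Cats are independent pets that often enjoy their own space.",
    "Goldfish are popular pets for beginners, requiring relatively simple care.",
]

def toy_search(query: str, k: int = 4) -> list[str]:
    # Hypothetical stand-in: rank texts by how often the query string occurs.
    q = query.lower()
    ranked = sorted(CORPUS, key=lambda text: text.lower().count(q), reverse=True)
    return ranked[:k]

# Analogous to .bind(k=1): fix k so callers only pass the query.
top1 = partial(toy_search, k=1)

# Analogous to .batch([...]): apply the callable to each input, keeping order.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(top1, ["cat", "goldfish"]))

print(results)
```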

Vector stores provide an `as_retriever` method that creates a retriever (arguments: `search_type`, `search_kwargs`).<br>
`as_retriever` returns a retriever Runnable, that is, a Runnable with retriever behavior that queries the vector database.

In [8]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 1})
retriever.batch(["cat", "shark"])

[[Document(page_content='Cats are independent pets that often enjoy their own space.', metadata={'source': 'mammal-pets-doc'})],
 [Document(page_content='Goldfish are popular pets for beginners, requiring relatively simple care.', metadata={'source': 'fish-pets-doc'})]]

`VectorStoreRetriever` supports the following search types:
* `"similarity"` (default)
* `"mmr"` (maximum marginal relevance)
* `"similarity_score_threshold"`
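
For intuition about `"mmr"`, here is a minimal sketch of the general idea: greedily pick the candidate that maximizes `lambda_mult * sim(query, doc) - (1 - lambda_mult) * max sim(doc, already selected)`. The helper `mmr_select`, the toy vectors, and the `lambda_mult` values are illustrative assumptions, not LangChain's implementation.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mmr_select(query, candidates, k=2, lambda_mult=0.5):
    """Return indices of up to k candidates, trading relevance for diversity."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected item (0 if none yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

query = [1.0, 0.0]
candidates = [
    [1.0, 0.1],   # 0: very relevant
    [1.0, 0.12],  # 1: near-duplicate of candidate 0
    [0.6, -0.8],  # 2: less relevant, but different from candidate 0
]

print(mmr_select(query, candidates, k=2, lambda_mult=0.5))  # diversity matters
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # pure relevance
```

With `lambda_mult=1.0` the redundancy term vanishes and the selection reduces to plain similarity ranking; lower values push the second pick away from near-duplicates.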

In [1]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

In [10]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

message = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""

prompt = ChatPromptTemplate.from_messages([("human", message)])

rag_chain = {"context": retriever, "question": RunnablePassthrough()} | prompt | llm
# The dict at the head of the chain is coerced to a RunnableParallel, so the
# value behind each key runs concurrently.
# Both retriever and RunnablePassthrough() receive the same input,
# "tell me about cats".
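
To make the data flow concrete, here is a plain-Python sketch of what the dict step does conceptually: the same input goes to every branch, and the results are collected under the matching keys before being formatted into the prompt. `fake_retriever`, `passthrough`, and `run_parallel` are hypothetical stand-ins; in the real chain LangChain handles this internally.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_retriever(question: str) -> list[str]:
    # Hypothetical stand-in for the retriever; always returns one fixed text.
    return ["Cats are independent pets that often enjoy their own space."]

def passthrough(value):
    # Plays the role of RunnablePassthrough: return the input unchanged.
    return value

def run_parallel(branches: dict, value) -> dict:
    # Send the same input to every branch and collect the results by key.
    with ThreadPoolExecutor() as pool:
        futures = {key: pool.submit(fn, value) for key, fn in branches.items()}
        return {key: future.result() for key, future in futures.items()}

template = (
    "Answer this question using the provided context only.\n\n"
    "{question}\n\nContext:\n{context}\n"
)

inputs = run_parallel(
    {"context": fake_retriever, "question": passthrough},
    "tell me about cats",
)
prompt_text = template.format(**inputs)
print(prompt_text)
```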

In [11]:
response = rag_chain.invoke("tell me about cats")

print(response.content)

Cats are independent pets that often enjoy their own space.
