#### Because the documents in a vector store can be large, they may contain phrasing that you are not aware of. This makes it hard to come up with a query that closely matches the stored text.

# Multi-Query-Retriever

This method uses a ChatModel to automatically generate slight variations of your initial query, which helps overcome issues with cosine similarity distances. As a result, you can phrase your input as a general question rather than as a pure document similarity look-up.
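The idea can be sketched without any LLM or vector store. The block below is a toy illustration only, not LangChain's implementation: `CORPUS`, `toy_retriever`, and `multi_query_retrieve` are hypothetical names, word overlap stands in for embedding similarity, and the query variants are written by hand where a ChatModel would normally generate them.

```python
import re
from typing import Callable, Dict, List

# Hypothetical three-document corpus standing in for the vector store.
CORPUS: Dict[str, str] = {
    "d1": "OpenAI was founded in December 2015.",
    "d2": "The organization was established as a non-profit.",
    "d3": "GPT-4 is a large language model.",
}


def tokens(text: str) -> set:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def toy_retriever(query: str, k: int = 1) -> List[str]:
    """Return the k document ids with the highest word overlap with the query.

    Word overlap is an assumption made for this sketch; a real retriever
    would rank by embedding similarity. Ties are broken by document id.
    """
    q = tokens(query)
    ranked = sorted(CORPUS, key=lambda d: (-len(q & tokens(CORPUS[d])), d))
    return ranked[:k]


def multi_query_retrieve(
    queries: List[str], retriever: Callable[[str], List[str]]
) -> List[str]:
    """Run every query through the retriever and keep the unique union, in order."""
    seen, merged = set(), []
    for query in queries:
        for doc_id in retriever(query):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


# Hand-written variants; in the notebook these come from the ChatModel.
variants = [
    "What is the founding date of OpenAI?",
    "What year was the organization established?",
]

single = toy_retriever("When was OpenAI created?")
merged = multi_query_retrieve(variants, toy_retriever)
print(single)  # ['d1']
print(merged)  # ['d1', 'd2']
```

The single query only surfaces `d1`, while the second variant happens to share wording with `d2`, so the merged result covers both. That is the effect the retriever in this notebook aims for: rephrasings reach documents whose wording differs from the original question.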

In [12]:
# !pip install tiktoken

In [13]:
# Build a sample vectorDB
from langchain.vectorstores import Chroma
from langchain.document_loaders import WikipediaLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from dotenv import load_dotenv, find_dotenv
import os

### Loader

In [14]:
loader = WikipediaLoader(query='OpenAI')
documents = loader.load()

In [15]:
len(documents)

10

### Split Documents

In [16]:
# split it into chunks
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=500)
docs = text_splitter.split_documents(documents)

Created a chunk of size 515, which is longer than the specified 500


### OpenAI Connection for Embeddings

In [17]:
import os

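# Load variables from a .env file into the environment (overriding existing values)
load_dotenv(find_dotenv(), override=True)
api_key = os.getenv("OPENAI_API_KEY")
# OpenAIEmbeddings reads OPENAI_API_KEY from the environment by default
embedding_function = OpenAIEmbeddings()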

In [18]:
# docs

### Embed Documents for ChromaDB

In [19]:
# load it into Chroma
db = Chroma.from_documents(docs, embedding_function, persist_directory='./openAI')
db.persist()

### Use Chat Model to Multi Query

In [25]:
from langchain.chat_models import ChatOpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever
question = "When was OpenAI created?"
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
retriever_from_llm = MultiQueryRetriever.from_llm(retriever=db.as_retriever(), llm=llm)

In [26]:
# Set logging for the queries to understand what's happening behind the scenes. 
import logging
logging.basicConfig()
logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO)

In [27]:
# Retrieve the unique union of the documents returned for each generated query.
unique_docs = retriever_from_llm.get_relevant_documents(query=question)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. What is the founding date of OpenAI?', '2. Can you tell me the year in which OpenAI was established?', '3. At what time was OpenAI founded?']


In [28]:
len(unique_docs)

6

In [29]:
print(unique_docs[0].page_content)

=== 2015–2018: Non-profit beginnings ===
In December 2015, Sam Altman, Greg Brockman, Reid Hoffman, Jessica Livingston, Peter Thiel, Elon Musk, Amazon Web Services (AWS), Infosys, and YC Research announced the formation of OpenAI and pledged over $1 billion to the venture. The actual collected total amount of contributions was only $130 million until 2019. According to an investigation led by TechCrunch, Musk was its largest donor while YC Research did not contribute anything at all. The organization stated it would "freely collaborate" with other institutions and researchers by making its patents and research open to the public. OpenAI is headquartered at the Pioneer Building in Mission District, San Francisco.According to Wired, Brockman met with Yoshua Bengio, one of the "founding fathers" of deep learning, and drew up a list of the "best researchers in the field". Brockman was able to hire nine of them as the first employees in December 2015. In 2016, OpenAI paid corporate-level (r

In [31]:
print(unique_docs[1].page_content)

==== OpenAI ====
OpenAI was initially funded by Altman, Greg Brockman, Elon Musk, Jessica Livingston, Peter Thiel, Microsoft, Amazon Web Services, Infosys, and YC Research. When OpenAI launched in 2015, it had raised $1 billion. In March 2019, Sam Altman left Y Combinator to focus full-time on OpenAI as CEO. By the summer of


In [32]:
print(unique_docs[2].page_content)

OpenAI is an American artificial intelligence (AI) research organization consisting of the non-profit OpenAI, Inc. registered in Delaware and its for-profit subsidiary OpenAI Global, LLC. OpenAI researches artificial intelligence with the declared intention of developing "safe and beneficial" artificial general intelligence, which it defines as "highly autonomous systems that outperform humans at most economically valuable work". OpenAI has also developed several large language models, such as ChatGPT and GPT-4, as well as advanced image generation models like DALL-E 3, and in the past published open-source models.The organization was founded in December 2015 by Ilya Sutskever, Greg Brockman, Trevor Blackwell, Vicki Cheung, Andrej Karpathy, Durk Kingma, Jessica Livingston, John Schulman, Pamela Vagata, and Wojciech Zaremba, with Sam Altman and Elon Musk serving as the initial board members. Microsoft provided OpenAI Global LLC with a $1 billion investment in 2019 and a $10 billion inve