Let's talk about this method:
```python
def query(self, input_query):
    """
    Queries the vector database based on the given input query.
    Gets relevant doc based on the query and then passes it to an
    LLM as context to get the answer.

    :param input_query: The query to use.
    :return: The answer to the query.
    """
    result = self.collection.query(
        query_texts=[input_query,],
        n_results=1,
    )
    result_formatted = self._format_result(result)
    answer = self.get_answer_from_llm(input_query, result_formatted[0][0].page_content)
    return answer
```
As far as I can tell (and I'm just reading, not necessarily understanding, so correct me if I'm wrong), it will return only the single closest document, because of `n_results=1`.
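For illustration, here is a minimal standalone sketch of how Chroma's `query` behaves when `n_results` is raised; the collection name and documents are made up, and each inner list in the result corresponds to one query text:

```python
import chromadb

client = chromadb.Client()
collection = client.create_collection("corvette_demo")  # hypothetical collection

# One short document per Corvette generation (toy data for illustration)
collection.add(
    documents=[
        "The first-generation Corvette (C1) debuted in 1953.",
        "The current-generation Corvette (C8) launched in 2019.",
    ],
    ids=["c1", "c8"],
)

result = collection.query(
    query_texts=["Corvette generations"],
    n_results=2,  # ask for the two closest documents instead of one
)

# result["documents"] is a list of lists: one inner list per query text
print(result["documents"][0])  # both paragraphs, ranked by distance
```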
What if we have a more granular database, cut into smaller pieces?
E.g. the webpages and documents we added are only a paragraph long. Then it will only return that one paragraph. Now imagine a user asks a complex question whose correct answer is spread across more than one document. The app would then answer only part of the question, with limited knowledge.
Here's a simple example. Let's say we are in the car business and feed our database information about the Corvette, one page for each generation. Then a user asks: how much horsepower does the current Corvette make, and how much did the first one make? If my understanding is correct, it could not answer that question (for this specific question ChatGPT knows the answer out of the box, but you get the point).
For these kinds of use cases I'm proposing to allow the retrieval of more than one document, configurable by the user; 1 can stay as the default. These are then all passed as context so an LLM can do its magic and process the information.
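A rough sketch of what that could look like, assuming `_format_result` returns a list whose entries are indexable the same way the existing `result_formatted[0][0]` is; the `number_documents` parameter name is just a placeholder, not an existing option:

```python
def query(self, input_query, number_documents=1):
    """
    Queries the vector database based on the given input query.
    Gets the closest documents and passes them all to an LLM as
    context to get the answer.

    :param input_query: The query to use.
    :param number_documents: How many of the closest documents to retrieve.
    :return: The answer to the query.
    """
    result = self.collection.query(
        query_texts=[input_query],
        n_results=number_documents,
    )
    result_formatted = self._format_result(result)
    # Join all retrieved chunks into one context string instead of only the top hit
    contents = [entry[0].page_content for entry in result_formatted]
    answer = self.get_answer_from_llm(input_query, "\n".join(contents))
    return answer
```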
The downside I can see is that it will require more tokens, and thus cost more. This is a compromise the user has to make for better results. The max token limit should also be considered, especially when the database contains both short and long texts. For this edge case, max tokens should be configurable by the user, and if a limit is set, the tokens of the prompt should be counted and cut off if necessary. Edit: OpenAI has a max tokens parameter that does all of this.
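For the prompt side, counting tokens with a tokenizer such as tiktoken before sending the request is one way to enforce such a cut-off. A minimal sketch; the model name and the token budget are illustrative assumptions, not values from the project:

```python
import tiktoken  # OpenAI's tokenizer library

def truncate_context(context, model="gpt-3.5-turbo", max_context_tokens=2000):
    """Trim the retrieved context so the prompt stays under a token budget.

    The model name and the 2000-token default are illustrative, not taken
    from the project.
    """
    encoding = tiktoken.encoding_for_model(model)
    tokens = encoding.encode(context)
    if len(tokens) <= max_context_tokens:
        return context
    # Keep only the first max_context_tokens tokens of the context
    return encoding.decode(tokens[:max_context_tokens])
```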
P.S. Why are we prompting with `prompt = f"""Use the following pieces of context to answer the query at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. {context}` if we only ever use one piece of context?
I will propose a PR for this.