feat: add multi-document answers #63

cachho · 2023-06-25T16:18:41Z

Adds the number of documents as an optional argument. This is a very important feature for databases that embed highly fractured (short) documents, like my QnA.

The number 1 is used as a default so it's fully backwards compatible.

For this to work, context is now a list instead of a string.

Open questions:

Is | the right delimiter?
I'm honestly not sure exactly how chroma does this. It's obviously using chunks, but could all chunks come from the same document? If so, we might rename the argument from number_documents to number_chunks. This is just a naming/correctness issue, because we can't change anything about the behavior anyway.
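To make the delimiter question concrete, here is a minimal sketch of how a list of contexts might be joined into one prompt. The function signature, the `delimiter` parameter, and `PROMPT_TEMPLATE` are assumptions for illustration (the template is modeled on the dry_run output posted later in this thread), not the code in this PR.

```python
from typing import List

# Hypothetical template, modeled on the dry_run output shown in this thread.
PROMPT_TEMPLATE = (
    "Use the following pieces of context to answer the query at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n"
    "Context: {context}\n"
    "Query: {query}\n"
    "Helpful Answer:"
)


def generate_prompt(input_query: str, contexts: List[str], delimiter: str = " | ") -> str:
    """Join all retrieved contexts with the delimiter and fill the template."""
    # With a single context, join() returns it unchanged, so the default
    # of one document produces the same prompt body as before.
    context = delimiter.join(contexts)
    return PROMPT_TEMPLATE.format(context=context, query=input_query)


if __name__ == "__main__":
    print(generate_prompt("Who is Naval?", ["chunk one", "chunk two"]))
```

Keeping the delimiter as a parameter would make it cheap to try alternatives (for example a newline) if | turns out to confuse the model.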

Followup

We could add a max distance setting. That way you don't query extra documents that are very far away, which would only cost more tokens for a worse result.
I think it would be cool to somehow process the distance as a weight through language in the prompt. Right now all documents are treated equally (unless the LLM prioritizes what comes first). Adding weights does have negative implications though: it adds complexity and you could say that the LLM is better at determining what's important than the embedding AI.
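A max distance setting could look roughly like the sketch below. It assumes the query results are available as (document, distance) pairs where a smaller distance means a closer match; `filter_by_distance` and `max_distance` are hypothetical names, not part of embedchain or chroma.

```python
from typing import List, Tuple


def filter_by_distance(
    results: List[Tuple[str, float]], max_distance: float
) -> List[str]:
    """Keep only documents whose distance is within max_distance.

    Assumes smaller distances mean more relevant documents.
    """
    return [doc for doc, distance in results if distance <= max_distance]


if __name__ == "__main__":
    # Made-up distances, purely for illustration.
    results = [("close chunk", 0.21), ("borderline chunk", 0.48), ("far chunk", 0.93)]
    print(filter_by_distance(results, max_distance=0.5))
```

With a cutoff like this, number_documents becomes an upper bound: fewer documents are sent when the extra ones are too far away to help.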

candidosales

Great PR! I left some comments

embedchain/embedchain.py

Co-authored-by: Candido Sales Gomes <candidosg@gmail.com>

cachho · 2023-07-02T17:27:47Z

Great PR! I left some comments

Hey, thanks for taking the time, and for the fix. I will remove the type check for list, since it's not required anymore.

I'm not sure about the typing you added. I'm a big fan, I like strong typing a lot, but it feels weird to have typing for one function and nowhere else in the code. There should be a full refactor that adds typing. What do you think?

cachho · 2023-07-02T17:49:17Z

@candidosales I get a linting error for the typing (I'm using Python 3.8.10). And it's also throwing an error when I run it.

def generate_prompt(self, input_query: str, contexts: list[str]):
TypeError: 'type' object is not subscriptable

I did some research. The problem is that we require Python >=3.8. According to my linter and my research, your suggestion only works on Python >=3.9, so it's no longer compatible with Python 3.8.

I suggest we bump the required version to >=3.9. Then we can use the current (simple) typing that you suggested. But that's something that should not be decided in this PR. I will remove the typing changes. @taranjeet please note.

The use of list or List from typing in Python depends on the version you're using. Here's a general guideline:

Python 3.9 and later: With this version, you can use the built-in list type directly for type hinting, such as list[int], thanks to PEP 585. This PEP introduced several changes to Python's typing system, including the ability to use built-in collection types (like list and dict) as generic types.
def my_function(my_list: list[int]) -> int:
    return sum(my_list)
Python 3.5 to 3.8: In these versions, you should use List from the typing module for type hinting.
from typing import List

def my_function(my_list: List[int]) -> int:
    return sum(my_list)
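Python 3.7 or 3.8 (with future import): PEP 563 (Postponed Evaluation of Annotations) stores annotations as strings instead of evaluating them when the function is defined. If you're using Python 3.7 or 3.8, you can use a future import to write the built-in list and dict as generic types inside annotations, similar to Python 3.9 and later. This only applies to annotations; an expression like list[int] elsewhere in the code still fails on these versions.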
from __future__ import annotations

def my_function(my_list: list[int]) -> int:
    return sum(my_list)
Python 3.4 and earlier: The typing module is not part of the standard library in these versions (it was added in Python 3.5 with PEP 484), so the generic type hints above aren't available out of the box.

Remember, type hints are optional and Python does not enforce them at runtime; they're a tool to help with development. They are not completely inert, though: unless evaluation is postponed, an annotation is evaluated when the function is defined, which is why list[str] raises the TypeError above on Python 3.8.
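As a small illustration of the version split described above, the sketch below probes whether the running interpreter can subscript the built-in list, and falls back to typing.List otherwise. `builtin_generics_supported` and `ContextList` are hypothetical names used only for this example.

```python
import sys
from typing import List


def builtin_generics_supported() -> bool:
    """Return True if list[str] can be evaluated on this interpreter (PEP 585)."""
    try:
        # Evaluated eagerly, the same way a non-postponed annotation
        # is evaluated when a def statement runs.
        list[str]
    except TypeError:
        # Python 3.8 and earlier: 'type' object is not subscriptable
        return False
    return True


# Only the selected branch of the conditional expression is evaluated,
# so this line is safe on Python 3.8 as well.
ContextList = list[str] if builtin_generics_supported() else List[str]


if __name__ == "__main__":
    print(sys.version_info[:2], builtin_generics_supported(), ContextList)
```

In practice, importing List from typing everywhere is the simpler choice while 3.8 is still supported; the probe is only meant to show where the TypeError comes from.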

candidosales · 2023-07-03T19:05:24Z

@cachho I'm using Python 3.10.11. In my opinion, it would be beneficial to incorporate infer types in all methods. This will enhance the overall developer experience and facilitate maintenance in the long run. Perhaps we could propose implementing these changes in a new PR.

What do you think?

cachho · 2023-07-03T22:59:09Z

In my opinion, it would be beneficial to incorporate infer types in all methods

I agree, once again the only problem with it in this PR is that we lose compatibility with 3.8. We need to decide if that's worth it, and that's out of scope for this PR.

Perhaps we could propose implementing these changes in a new PR.

I agree, but I would talk to @taranjeet on Discord first, before you go all in and we then decide to keep 3.8 compatibility. That's the version I use, by the way, and I don't think I'm the only one.

candidosales

I left a comment

embedchain/embedchain.py

cachho · 2023-07-05T20:01:46Z

resolved the merge conflicts.

cachho · 2023-07-06T18:51:03Z

adjusted to new config

cachho · 2023-07-06T18:53:20Z

example:

import os

from embedchain import App
from embedchain.config import AddConfig, QueryConfig

naval_chat_bot = App()

add_config = AddConfig() # Currently no options
naval_chat_bot.add("youtube_video", "https://www.youtube.com/watch?v=3qHkcs3kG44", add_config)
naval_chat_bot.add("pdf_file", "https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf", add_config)
naval_chat_bot.add("web_page", "https://nav.al/feedback", add_config)
naval_chat_bot.add("web_page", "https://nav.al/agi", add_config)

naval_chat_bot.add_local("qna_pair", ("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."), add_config)

query_config = QueryConfig(number_documents=1)
print(naval_chat_bot.dry_run("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?", query_config))

query_config = QueryConfig(number_documents=5)
print(naval_chat_bot.dry_run("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?", query_config))

returns

Use the following context to answer the query at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
Context: alien species that also had the power to generate these good explanations, there is no explanation that they could generate that we could not understand. We are maximally capable of understanding. There is no concept out there that is possible in this physical reality that a human being, given sufficient time and resources and education, could not understand. Subscribe to Naval Related Modal body text goes here. Close
Query: What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?
Helpful Answer:

versus

Use the following pieces of context to answer the query at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
Context: alien species that also had the power to generate these good explanations, there is no explanation that they could generate that we could not understand. We are maximally capable of understanding. There is no concept out there that is possible in this physical reality that a human being, given sufficient time and resources and education, could not understand. Subscribe to Naval Related Modal body text goes here. Close | explanation. It’s parroting. It’s brilliant Bayesian reasoning. It’s extrapolating from what it already sees out there generated by humans on the web, but it doesn’t have an underlying model of reality that can explain the seen in terms of the unseen. And I think that’s critical. That is what humans do uniquely that no other creature, no other computer, no other intelligence—biological or artificial—that we have ever encountered does. And not only do we do it uniquely, but if we were to meet an | people find their way to Naval’ s wisdom. | 96 · THE ALMANACK OF NAVAL RAVIKANTThe really smart thinkers are clear thinkers. They understand the basics at a very, very fundamental level. I would rather understand the basics really well than memorize all kinds of complicated concepts I can’t stitch together and can’t rederive from the basics. If you can’t rederive concepts from the basics as you need them, you’re lost. You’re just memorizing. [4] The advanced concepts in a field are less proven. We use them to signal insider knowledge, but we’d be better off nailing the basics. [11] Clear thinkers appeal to their own authority. Part of making effective decisions boils down to dealing with reality. How do you make sure you’re dealing with reality when you’re making decisions? By not having a strong sense of self or judgments or mind presence. The “monkey mind” will always respond with this regurgitated emotional response to what it thinks the world should be. Those desires will cloud your reality. 
This happens a lot of times when | knowledge, capability, and desire nobody else in the world does, purely from the combinatorics of human DNA and development. The combinatorics of human DNA and experience are staggering. You will never meet any two humans who are substitutable for each other.
Query: What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?
Helpful Answer:

cachho · 2023-07-07T09:17:10Z

@taranjeet I changed the README text to "number of documents to be retrieved as context", as asked for in #163.

cachho · 2023-07-07T13:34:16Z

Due to the custom prompt, the idea of changing the prompt wording based on singular or plural (one document versus several) was dropped. I think that's okay.

cachho added 2 commits June 25, 2023 18:18

feat: add multi-document answers

d426b86

docs: add bullet point

781f762

This was referenced Jun 28, 2023

Error when DB only has 1 resource added to it via the .add module #91

Closed

Response Amount - Custom OpenAI Settings #103

Closed

candidosales reviewed Jul 2, 2023

cachho and others added 4 commits July 2, 2023 19:21

fix: handling of lists with length one

d44824f

Co-authored-by: Candido Sales Gomes <candidosg@gmail.com>

refactor: rename variable from context to contexts

12e8914

Co-authored-by: Candido Sales Gomes <candidosg@gmail.com>

refactor: typing

b1c95a1

Co-authored-by: Candido Sales Gomes <candidosg@gmail.com>

refactor: typing

977ec7c

Co-authored-by: Candido Sales Gomes <candidosg@gmail.com>

cachho added 3 commits July 2, 2023 19:40

refactor: removed unnecessary type check

3563aad

fix: use right variable name

c69ba55

docs: changed variable name according to refactor

20335bf

cachho added 2 commits July 2, 2023 19:51

fix: remove typing

4fa89ee

docs: stress that a list is required.

72dc131

cachho mentioned this pull request Jul 2, 2023

[Feature Request] Prompt Customization #114

Closed

candidosales reviewed Jul 5, 2023

cachho added 3 commits July 5, 2023 21:53

fix: condition to use plural

feb88df

Merge branch 'main' of github.com:cachho/embedchain into feat/MultiDo…

ded6ded

…cumentAnswers

fix: remove unnecessary print

2bd590a

cachho added 6 commits July 6, 2023 20:25

Merge branch 'main' into feat/MultiDocumentAnswers

d09b6e4

chore: merge

31bd1c2

chore: reset to master

bf57b1f

refactor: use new config

bee4a82

docs: added documentation

ac04227

docs: add number_documents

ee38b26

cachho added 2 commits July 6, 2023 20:46

fix: variable

005b83f

chore: change error type

0995dde

cachho added 2 commits July 6, 2023 20:56

fix: remerge

7037210

docs: change text

4263b33

Merge branch 'main' into feat/MultiDocumentAnswers

6ac3c6b

cachho added 7 commits July 10, 2023 18:01

Merge branch 'main' into feat/MultiDocumentAnswers

04e5cd2

Merge branch 'main' into feat/MultiDocumentAnswers

690b5a0

Merge branch 'main' into feat/MultiDocumentAnswers

c24ce3c

chore: linting

7c0dfcf

Merge branch 'main' into feat/MultiDocumentAnswers

b7d1d00

Merge branch 'main' into feat/MultiDocumentAnswers

db2b332

fix: order

4c94e4a

taranjeet approved these changes Jul 11, 2023

taranjeet merged commit 40dc284 into mem0ai:main Jul 11, 2023

This was referenced Jul 11, 2023

What are the implications of allowing more documents as context? #38

Closed

feat: chat multi-document answers #235

Merged
