feat: add multi-document answers #63

Merged: 32 commits merged on Jul 11, 2023

@cachho cachho commented Jun 25, 2023

Adds the number of documents as an optional argument. This is a very important feature for databases that embed highly fractured (short) documents, like my QnA.

The number 1 is used as a default so it's fully backwards compatible.

For this to work, context is now a list instead of a string.

Open questions:

  • is | the right delimiter?
  • I'm honestly not sure how exactly chroma does this. It's obviously using chunks. But could all chunks be from the same document? Then we might rename it from number_documents to number_chunks. This is just a naming/correctness issue, because we can't change anything about it anyways.
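
A minimal sketch of how a list of contexts might be joined into one prompt, assuming a " | " delimiter and the prompt wording from the dry-run output further down in this thread. The names `build_prompt`, `PROMPT_TEMPLATE` and the `delimiter` parameter are made up for illustration and are not necessarily what this PR merged. `typing.List` is used so the sketch also runs on Python 3.8.

```python
from typing import List

# Prompt wording copied from the dry-run output in this thread (plural variant).
PROMPT_TEMPLATE = (
    "Use the following pieces of context to answer the query at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n"
    "Context: {context}\n"
    "Query: {query}\n"
    "Helpful Answer:"
)


def build_prompt(input_query: str, contexts: List[str], delimiter: str = " | ") -> str:
    """Join the retrieved chunks with a delimiter and fill the template."""
    context = delimiter.join(contexts)
    return PROMPT_TEMPLATE.format(context=context, query=input_query)


prompt = build_prompt("Who is Naval Ravikant?", ["chunk one", "chunk two"])
print(prompt)
```

With a single context the delimiter never appears, so the default of one document produces the same context string as before.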

Followup

  • We could add a max distance setting. That way you don't query extra documents that are super far away, only costing you more tokens for a worse result
  • I think it would be cool to somehow process the distance as a weight through language in the prompt. Right now all documents are treated equally (unless the LLM prioritizes what comes first). Adding weights does have negative implications though: it adds complexity and you could say that the LLM is better at determining what's important than the embedding AI.
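
The max distance idea could look roughly like the sketch below. It assumes the retrieval step yields (document, distance) pairs where a smaller distance means a closer match; `filter_by_max_distance` and `max_distance` are hypothetical names, not an existing embedchain or Chroma setting.

```python
from typing import List, Tuple


def filter_by_max_distance(
    results: List[Tuple[str, float]], max_distance: float
) -> List[str]:
    """Return documents within max_distance, ordered from closest to farthest.

    The closest document is always kept, so a query never ends up
    with an empty context just because the threshold was strict.
    """
    ordered = sorted(results, key=lambda pair: pair[1])
    kept = [doc for doc, dist in ordered if dist <= max_distance]
    if not kept and ordered:
        kept = [ordered[0][0]]
    return kept


results = [("a", 0.2), ("b", 0.9), ("c", 0.4)]
print(filter_by_max_distance(results, max_distance=0.5))
```

Always keeping the nearest document is a design choice made for this sketch; returning an empty list and letting the caller decide would be an equally reasonable alternative.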

@candidosales candidosales left a comment

Great PR! I left some comments

embedchain/embedchain.py Outdated Show resolved Hide resolved
cachho and others added 4 commits July 2, 2023 19:21
Co-authored-by: Candido Sales Gomes <candidosg@gmail.com>
Co-authored-by: Candido Sales Gomes <candidosg@gmail.com>
Co-authored-by: Candido Sales Gomes <candidosg@gmail.com>
Co-authored-by: Candido Sales Gomes <candidosg@gmail.com>

cachho commented Jul 2, 2023

> Great PR! I left some comments

Hey, thanks for taking the time, and for the fix. I will remove the type check for list, since it's not required anymore.

I'm not sure about the typing you added. I'm a big fan, I like strong typing a lot, but it just feels weird to have typing for one function and nowhere else in the code. There should be a full refactor that adds typing. What do you think?

cachho commented Jul 2, 2023

@candidosales I get a linting error for the typing (I'm using Python 3.8.10). And it's also throwing an error when I run it.

def generate_prompt(self, input_query: str, contexts: list[str]):
TypeError: 'type' object is not subscriptable

I did some research. The problem is that we require Python >=3.8. According to my linter and my research, your suggestion only works on Python >=3.9, so it would break compatibility with Python 3.8.

I suggest we bump the required version to >=3.9. Then we can use the current (simple) typing that you suggested. But that's something that should not be decided in this PR. I will remove the typing changes. @taranjeet please note.

The use of list or List from typing in Python depends on the version you're using. Here's a general guideline:

  • Python 3.9 and later: With this version, you can use the built-in list type directly for type hinting, such as list[int], thanks to PEP 585. This PEP introduced several changes to Python's typing system, including the ability to use built-in collection types (like list and dict) as generic types.
def my_function(my_list: list[int]) -> int:
    return sum(my_list)
  • Python 3.5 to 3.8: In these versions, you should use List from the typing module for type hinting.
from typing import List

def my_function(my_list: List[int]) -> int:
    return sum(my_list)
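  • Python 3.7 or 3.8 (with future import): PEP 563 (Postponed Evaluation of Annotations) stores annotations as strings instead of evaluating them when the function is defined. With the future import below, list[int] or dict[str, int] can be written in annotations on Python 3.7 and 3.8 without raising the TypeError. This only covers annotations: a subscripted built-in that is evaluated at runtime elsewhere, such as a type alias, still fails before Python 3.9.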
from __future__ import annotations

def my_function(my_list: list[int]) -> int:
    return sum(my_list)
  • Python 3.4 and earlier: The typing module is not part of the standard library in these versions (it was added in Python 3.5 with PEP 484), so neither List[int] nor list[int] is available for type hints.

Remember, type hints are optional in Python and are not enforced at runtime. Annotations are still evaluated when a function is defined, though (unless the future import above is used), which is why list[str] raises the TypeError shown earlier on Python 3.8.
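
As a small illustration of the point about postponed evaluation, the sketch below runs unchanged on Python 3.8. `List[int]` from `typing` is subscriptable there, and a quoted annotation is stored as a plain string rather than being evaluated, which is effectively what the future import does for every annotation in a module. The function names are made up for the example.

```python
from typing import List


def total(values: List[int]) -> int:
    # typing.List is subscriptable on every Python version that ships typing.
    return sum(values)


def total_quoted(values: "list[int]") -> int:
    # A quoted annotation is kept as a string and is not evaluated when the
    # function is defined, so it does not raise TypeError on Python 3.8.
    return sum(values)


print(total([1, 2, 3]), total_quoted([1, 2, 3]))
print(total_quoted.__annotations__["values"])
```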

@candidosales

@cachho I'm using Python 3.10.11. In my opinion, it would be beneficial to incorporate infer types in all methods. This will enhance the overall developer experience and facilitate maintenance in the long run. Perhaps we could propose implementing these changes in a new PR.

What do you think?

cachho commented Jul 3, 2023

> In my opinion, it would be beneficial to incorporate infer types in all methods

I agree, once again the only problem with it in this PR is that we lose compatibility with 3.8. We need to decide if that's worth it, and that's out of scope for this PR.

> Perhaps we could propose implementing these changes in a new PR.

I agree, but I would talk to @taranjeet on Discord first, so you don't go all in only for us to decide to keep 3.8 compatibility. That's the version I use btw, and I don't think I'm the only one.

@candidosales candidosales left a comment

I left a comment

embedchain/embedchain.py Outdated Show resolved Hide resolved

cachho commented Jul 5, 2023

resolved the merge conflicts.

cachho commented Jul 6, 2023

adjusted to new config

cachho commented Jul 6, 2023

example:

import os

from embedchain import App
from embedchain.config import AddConfig, QueryConfig

naval_chat_bot = App()

add_config = AddConfig() # Currently no options
naval_chat_bot.add("youtube_video", "https://www.youtube.com/watch?v=3qHkcs3kG44", add_config)
naval_chat_bot.add("pdf_file", "https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf", add_config)
naval_chat_bot.add("web_page", "https://nav.al/feedback", add_config)
naval_chat_bot.add("web_page", "https://nav.al/agi", add_config)

naval_chat_bot.add_local("qna_pair", ("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."), add_config)

query_config = QueryConfig(number_documents=1)
print(naval_chat_bot.dry_run("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?", query_config))

query_config = QueryConfig(number_documents=5)
print(naval_chat_bot.dry_run("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?", query_config))

returns

Use the following context to answer the query at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
Context: alien species that also had the power to generate these good explanations, there is no explanation that they could generate that we could not understand. We are maximally capable of understanding. There is no concept out there that is possible in this physical reality that a human being, given sufficient time and resources and education, could not understand. Subscribe to Naval Related Modal body text goes here. Close
Query: What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?
Helpful Answer:

versus

Use the following pieces of context to answer the query at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
Context: alien species that also had the power to generate these good explanations, there is no explanation that they could generate that we could not understand. We are maximally capable of understanding. There is no concept out there that is possible in this physical reality that a human being, given sufficient time and resources and education, could not understand. Subscribe to Naval Related Modal body text goes here. Close | explanation. It’s parroting. It’s brilliant Bayesian reasoning. It’s extrapolating from what it already sees out there generated by humans on the web, but it doesn’t have an underlying model of reality that can explain the seen in terms of the unseen. And I think that’s critical. That is what humans do uniquely that no other creature, no other computer, no other intelligence—biological or artificial—that we have ever encountered does. And not only do we do it uniquely, but if we were to meet an | people find their way to Naval’ s wisdom. | 96 · THE ALMANACK OF NAVAL RAVIKANTThe really smart thinkers are clear thinkers. They understand the basics at a very, very fundamental level. I would rather understand the basics really well than memorize all kinds of complicated concepts I can’t stitch together and can’t rederive from the basics. If you can’t rederive concepts from the basics as you need them, you’re lost. You’re just memorizing. [4] The advanced concepts in a field are less proven. We use them to signal insider knowledge, but we’d be better off nailing the basics. [11] Clear thinkers appeal to their own authority. Part of making effective decisions boils down to dealing with reality. How do you make sure you’re dealing with reality when you’re making decisions? By not having a strong sense of self or judgments or mind presence. The “monkey mind” will always respond with this regurgitated emotional response to what it thinks the world should be. Those desires will cloud your reality. 
This happens a lot of times when | knowledge, capability, and desire nobody else in the world does, purely from the combinatorics of human DNA and development. The combinatorics of human DNA and experience are staggering. You will never meet any two humans who are substitutable for each other.
Query: What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?
Helpful Answer:

cachho commented Jul 7, 2023

@taranjeet I changed the README text to "number of documents to be retrieved as context", as asked for in #163.

cachho commented Jul 7, 2023

Due to the custom prompt, the idea of changing the prompt wording based on singular or plural numbers of documents was ditched. I think that's okay.
