Add New Retriever Interface with Callbacks #5962

Merged
hinthornw merged 13 commits into master from vwp/retrieval_callbacks_v3
Jun 30, 2023

Conversation

vowelparrot
Contributor

@vowelparrot vowelparrot commented Jun 9, 2023

Handle the new retriever events in a way that (I think) is entirely backwards compatible? Needs more testing for some of the chain changes and all.

This creates an entire new run type, however. We could also just treat this as an event within a chain run presumably (same with memory)

Adds a subclass initializer that upgrades old retriever implementations to the new schema, along with tests to ensure they work.

First commit doesn't upgrade any of our retriever implementations (to show that we can pass the tests along with additional ones testing the upgrade logic).

Second commit upgrades the known universe of retrievers in langchain.

  • Add callback handling methods for retriever start/end/error (open to renaming to 'retrieval' if you want that)
  • Update BaseRetriever schema to support callbacks
  • Tests for upgrading old "v1" retrievers for backwards compatibility
  • Update existing retriever implementations to implement the new interface
  • Update calls within chains to .get_relevant_documents / .aget_relevant_documents to pass the child callback manager
  • Update the notebooks/docs to reflect the new interface
  • Test notebooks thoroughly

Not handled:

  • Memory pass throughs: retrieval memory doesn't have a parent callback manager passed through the method
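
To make the start/end/error events in the checklist above concrete, here is a minimal, standard-library-only sketch. `RecordingHandler` and `run_with_events` are hypothetical names invented for illustration, and the handler signatures are simplified; this is not the code added by the PR.

```python
from typing import Any, Callable, List


class RecordingHandler:
    """Hypothetical handler that records retriever events in order."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def on_retriever_start(self, query: str, **kwargs: Any) -> None:
        self.events.append(f"start:{query}")

    def on_retriever_end(self, documents: List[str], **kwargs: Any) -> None:
        self.events.append(f"end:{len(documents)}")

    def on_retriever_error(self, error: BaseException, **kwargs: Any) -> None:
        self.events.append(f"error:{type(error).__name__}")


def run_with_events(
    retrieve: Callable[[str], List[str]], query: str, handler: RecordingHandler
) -> List[str]:
    """Call `retrieve` so the handler sees start, then either end or error."""
    handler.on_retriever_start(query)
    try:
        docs = retrieve(query)
    except Exception as exc:
        # Report the failure, then re-raise so callers still see it.
        handler.on_retriever_error(exc)
        raise
    handler.on_retriever_end(docs)
    return docs
```

A retrieval that succeeds produces a start event followed by an end event; one that raises produces start followed by error, and the exception still propagates.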

    _new_arg_supported: bool = False
    _expects_other_args: bool = False

    def __init_subclass__(cls, **kwargs: Any) -> None:
Contributor Author

@vowelparrot vowelparrot Jun 9, 2023

This is where the magic is injected @hwchase17 @agola11 @nfcampos
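
For readers who want to see the general idea behind a subclass initializer that upgrades old implementations, here is a rough sketch using only the standard library. `SketchRetriever`, `LegacyRetriever`, and `ModernRetriever` are hypothetical stand-ins, and the logic is an assumption about how such an upgrade could work, not a copy of the PR's code.

```python
import inspect
from typing import Any, List, Optional


class SketchRetriever:
    """Hypothetical stand-in for a base retriever; not the real class."""

    _new_arg_supported: bool = False

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # An old-style ("v1") subclass overrides the public method directly.
        if cls.get_relevant_documents is not SketchRetriever.get_relevant_documents:
            legacy = cls.get_relevant_documents
            # Restore the base wrapper and move the override to the private hook.
            cls.get_relevant_documents = SketchRetriever.get_relevant_documents
            cls._get_relevant_documents = legacy
        # Record whether the private hook accepts a run_manager argument.
        params = inspect.signature(cls._get_relevant_documents).parameters
        cls._new_arg_supported = "run_manager" in params

    def _get_relevant_documents(self, query: str, *, run_manager: Any) -> List[str]:
        raise NotImplementedError

    def get_relevant_documents(
        self, query: str, run_manager: Optional[Any] = None
    ) -> List[str]:
        # object() is a placeholder for a real callback manager.
        manager = run_manager if run_manager is not None else object()
        if self._new_arg_supported:
            return self._get_relevant_documents(query, run_manager=manager)
        return self._get_relevant_documents(query)


class LegacyRetriever(SketchRetriever):
    # Old style: overrides the public method and knows nothing about run_manager.
    def get_relevant_documents(self, query: str) -> List[str]:
        return [f"legacy:{query}"]


class ModernRetriever(SketchRetriever):
    # New style: implements the private hook and accepts run_manager.
    def _get_relevant_documents(self, query: str, *, run_manager: Any) -> List[str]:
        return [f"modern:{query}"]
```

In this sketch both subclasses are called through the same public method; the legacy override is rerouted at class-creation time, so existing subclasses keep working without edits.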

@vowelparrot vowelparrot changed the title Add New Retriever Interface [RFC2] Add New Retriever Interface Jun 9, 2023
@vowelparrot vowelparrot marked this pull request as draft June 9, 2023 22:01
@vowelparrot vowelparrot requested review from agola11, hwchase17, dev2049 and nfcampos and removed request for hwchase17 June 10, 2023 00:36

Collaborator

@nfcampos nfcampos left a comment

lgtm

@@ -107,7 +114,13 @@ def _call(
            )
        else:
            new_question = question
        docs = self._get_docs(new_question, inputs)
        accepts_run_manager = (
Contributor

imo this may be overkill. this only matters if someone has subclassed this and overridden _get_docs right?

Collaborator

Yeah, though _get_docs is abstract so it would be anyone who subclasses this outside our repo. I don't know if it is the case
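
The compatibility check discussed in this thread can be illustrated with `inspect.signature`. This is a minimal sketch: `call_get_docs`, `old_get_docs`, and `new_get_docs` are hypothetical names, and the real chain code may differ in detail.

```python
import inspect
from typing import Any, Callable, List


def call_get_docs(
    get_docs: Callable[..., List[str]], question: str, run_manager: Any
) -> List[str]:
    """Pass run_manager only when the callee declares that parameter."""
    accepts_run_manager = "run_manager" in inspect.signature(get_docs).parameters
    if accepts_run_manager:
        return get_docs(question, run_manager=run_manager)
    # Overrides written before the new interface keep their old call shape.
    return get_docs(question)


def old_get_docs(question: str) -> List[str]:
    return [f"old:{question}"]


def new_get_docs(question: str, *, run_manager: Any) -> List[str]:
    return [f"new:{question}:{run_manager}"]
```

The check only matters for overrides defined outside the repository, which is the case raised above; in-repo implementations can simply adopt the new signature.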

@hinthornw hinthornw marked this pull request as ready for review June 30, 2023 17:14
@langchain-ai langchain-ai deleted a comment from hinthornw Jun 30, 2023
@hinthornw hinthornw changed the title [RFC2] Add New Retriever Interface Add New Retriever Interface with Callbacks Jun 30, 2023
@hinthornw hinthornw merged commit b0859c9 into master Jun 30, 2023
@hinthornw hinthornw deleted the vwp/retrieval_callbacks_v3 branch June 30, 2023 21:44
"""Get documents relevant for a query.

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun, **kwargs: Any
Collaborator

ahhh are we sure we wanna introduce arbitrary kwargs here? something we've previously discussed and ik @hwchase17 wanted to keep the interface tight/minimal

@rlancemartin
Collaborator

rlancemartin commented Jul 3, 2023

This introduced unexpected behavior for some retrievers.

When running the MultiQueryRetriever notebook, I no longer see the expected logging from:

unique_docs = retriever_from_llm.get_relevant_documents(query="What does the course say about regression?")
len(unique_docs)

For MultiQueryRetriever, get_relevant_documents does a few things (PR here).

E.g., it will run queries = self.generate_queries(query, run_manager) and log the queries.

It appears the method name has been changed to _get_relevant_documents?

And it now requires some additional args (e.g., run_manager)?

AFAICT, get_relevant_documents currently will just do retrieval w/o any of the MultiQueryRetriever logic.

If this is expected, then documentation will need to be updated for MultiQueryRetriever here to use _get_relevant_documents and supply run_manager and explain what run_manager is.

Also, this exposes the fact that we need a test for MultiQueryRetriever. Will add.
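
One way to read the split between the two method names is sketched below. It assumes that the public `get_relevant_documents` creates the run manager and delegates to the private `_get_relevant_documents`, so subclass logic such as query generation and logging belongs in the private hook. `BaseSketch` and `MultiQuerySketch` are hypothetical, and whether the merged code behaves this way for every retriever is exactly what this comment questions.

```python
import logging
from typing import Any, List

logger = logging.getLogger("multi_query_sketch")


class BaseSketch:
    """Hypothetical base class: public entry point plus private hook."""

    def get_relevant_documents(self, query: str) -> List[str]:
        run_manager = object()  # placeholder for a real callback manager
        return self._get_relevant_documents(query, run_manager=run_manager)

    def _get_relevant_documents(self, query: str, *, run_manager: Any) -> List[str]:
        raise NotImplementedError


class MultiQuerySketch(BaseSketch):
    """Hypothetical retriever that expands one query into several."""

    def generate_queries(self, query: str, run_manager: Any) -> List[str]:
        # A real implementation would ask an LLM for rephrasings.
        return [query, f"{query} (rephrased)"]

    def _get_relevant_documents(self, query: str, *, run_manager: Any) -> List[str]:
        queries = self.generate_queries(query, run_manager)
        logger.info("Generated queries: %s", queries)
        # Stand-in for real retrieval: one document per generated query.
        docs = [f"doc for {q}" for q in queries]
        return sorted(set(docs))
```

Under this assumption callers keep using the public method and never supply `run_manager` themselves; the query-generation step and its logging still run because they live in the overridden hook.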

rlancemartin added a commit that referenced this pull request Jul 4, 2023
* Add an easier-to-run example.
* Add logging per #6891.
* Updated params per #5962.

---------

Co-authored-by: R. Lance Martin <rlm@Rs-MacBook-Pro.local>
Co-authored-by: Lance Martin <lance@langchain.dev>
vowelparrot added a commit that referenced this pull request Jul 4, 2023
---------

Co-authored-by: Nuno Campos <nuno@boringbits.io>
Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>