
New feat: Self Query Retriever #1266

Merged: 8 commits merged on May 18, 2023
@ppramesi (contributor) commented May 15, 2023

This commit introduces the SelfQueryRetriever, a new retriever ported directly from the Python version (https://github.com/hwchase17/langchain/tree/master/langchain/retrievers/self_query). It retrieves documents from a vector store by having a query-constructing LLM chain write a structured query. A query translator then converts that LLM-generated structured query into a filter the vector store understands. I included basic query translators that work with Chroma and Pinecone; writing new translators should be straightforward, as long as the target retriever or vector store accepts a metadata filter.
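A minimal sketch of the translation step, assuming a hypothetical intermediate representation (`FilterNode`, `Comparison`, `Operation`, `translate`) rather than the classes actually added in this PR. It maps comparators and logical operators onto the `$`-prefixed operator style (`$eq`, `$gt`, `$and`, ...) used by Pinecone and Chroma metadata filters.

```typescript
// Hypothetical intermediate representation for a structured query.
// These names are illustrative and are not the classes added in the PR.
type Comparator = "eq" | "ne" | "gt" | "gte" | "lt" | "lte";
type Operator = "and" | "or";

interface Comparison {
  kind: "comparison";
  comparator: Comparator;
  attribute: string;
  value: string | number;
}

interface Operation {
  kind: "operation";
  operator: Operator;
  args: FilterNode[];
}

type FilterNode = Comparison | Operation;

interface StructuredQuery {
  query: string; // text used for the similarity search
  filter?: FilterNode; // optional metadata constraint
}

type MetadataFilter = Record<string, unknown>;

// Recursively map each node onto a Mongo-style operator object:
// comparisons become { attribute: { $op: value } },
// logical operations become { $and: [...] } or { $or: [...] }.
function translateFilter(node: FilterNode): MetadataFilter {
  if (node.kind === "comparison") {
    return { [node.attribute]: { [`$${node.comparator}`]: node.value } };
  }
  return { [`$${node.operator}`]: node.args.map(translateFilter) };
}

// Split a structured query into the search text and a store-specific filter.
function translate(sq: StructuredQuery): { query: string; filter?: MetadataFilter } {
  if (sq.filter === undefined) {
    return { query: sq.query };
  }
  return { query: sq.query, filter: translateFilter(sq.filter) };
}
```

For example, a query for "dinosaurs" with `and(eq genre "sci-fi", gt year 2000)` translates to the filter `{ $and: [{ genre: { $eq: "sci-fi" } }, { year: { $gt: 2000 } }] }`. Supporting another store whose filter syntax differs would only require a different `translateFilter`; the structured query itself stays store-agnostic.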

To support this functionality, three new parsers have been added:

  • ExpressionParser: Parses JavaScript expressions output by the LLM, using the meriyah package. It is general enough to parse any JavaScript call expression.

  • AsymmetricOutputParser: Similar to StructuredOutputParser, but supports different input and output shapes.

  • StructuredQueryOutputParser: Parses output from the query-constructing LLM into a structured query. It is a subclass of the new AsymmetricOutputParser class. Its output must still be passed through a Translator before retrievers can use it as a filter.
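The parsing stage can be sketched as follows. This is not the PR's implementation: the PR parses the LLM output with meriyah, whereas this self-contained sketch swaps in a small hand-written recursive-descent parser that accepts only call expressions whose arguments are string literals, number literals, or nested calls, e.g. `and(eq("genre", "sci-fi"), gt("year", 2000))`. It also assumes the LLM replies with JSON holding a `query` string and a `filter` expression string, with `"NO_FILTER"` meaning no filter. `CallNode`, `parseExpression`, and `parseStructuredQuery` are hypothetical names.

```typescript
// Hypothetical stand-in for a meriyah-based expression parser.
type Value = string | number;
type CallNode = { name: string; args: Array<CallNode | Value> };

function parseExpression(src: string): CallNode {
  let pos = 0;

  function skipWs(): void {
    while (pos < src.length && /\s/.test(src.charAt(pos))) pos++;
  }

  function expect(ch: string): void {
    skipWs();
    if (src.charAt(pos) !== ch) throw new Error(`Expected '${ch}' at ${pos}`);
    pos++;
  }

  // An argument is a quoted string, a number, or a nested call.
  function parseArg(): CallNode | Value {
    skipWs();
    const ch = src.charAt(pos);
    if (ch === '"' || ch === "'") {
      const end = src.indexOf(ch, pos + 1);
      if (end === -1) throw new Error("Unterminated string");
      const s = src.slice(pos + 1, end);
      pos = end + 1;
      return s;
    }
    const num = /^-?\d+(\.\d+)?/.exec(src.slice(pos));
    if (num) {
      pos += num[0].length;
      return Number(num[0]);
    }
    return parseCall();
  }

  // A call is identifier "(" [arg ("," arg)*] ")".
  function parseCall(): CallNode {
    skipWs();
    const ident = /^[A-Za-z_]\w*/.exec(src.slice(pos));
    if (!ident) throw new Error(`Expected identifier at ${pos}`);
    pos += ident[0].length;
    expect("(");
    const args: Array<CallNode | Value> = [];
    skipWs();
    if (src.charAt(pos) !== ")") {
      args.push(parseArg());
      skipWs();
      while (src.charAt(pos) === ",") {
        pos++;
        args.push(parseArg());
        skipWs();
      }
    }
    expect(")");
    return { name: ident[0], args };
  }

  const result = parseCall();
  skipWs();
  if (pos !== src.length) throw new Error(`Unexpected trailing input at ${pos}`);
  return result;
}

interface ParsedQuery {
  query: string;
  filter?: CallNode;
}

// Assumed LLM output format: {"query": "...", "filter": "<expression>" | "NO_FILTER"}.
function parseStructuredQuery(llmOutput: string): ParsedQuery {
  const parsed = JSON.parse(llmOutput) as { query: string; filter: string };
  if (parsed.filter === "NO_FILTER") return { query: parsed.query };
  return { query: parsed.query, filter: parseExpression(parsed.filter) };
}
```

The resulting call tree is still store-agnostic; as in the PR, a separate translator step would turn it into a concrete vector-store filter.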

Additionally, this commit includes new unit and integration tests, examples, and docs for the Pinecone and Chroma self-query retrievers.

…anslator, but a basic translator that can be used with pinecone and chroma is included and can be imported
@vercel bot commented May 15, 2023

The latest updates on your projects:

langchainjs-docs: ✅ Ready, preview updated May 16, 2023 7:57am (UTC)

@dqbd (collaborator) left a comment

Excellent work 🔥, just small nitpicks and notes

@dqbd added the "question" (Further information is requested) label on May 16, 2023

@dqbd (collaborator) left a comment

Looks great!

@dqbd (collaborator) left a comment

lgtm!

@dqbd added the "lgtm" (PRs that are ready to be merged as-is) label and removed the "question" (Further information is requested) label on May 18, 2023
@nfcampos merged commit 74e4988 into langchain-ai:main on May 18, 2023
@ppramesi deleted the self-query-meriyah branch on May 18, 2023 12:28
Labels: lgtm (PRs that are ready to be merged as-is)
3 participants