Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add source metadata to bedrock retriever response #21349

Merged
merged 6 commits into from
May 9, 2024

Conversation

rozerarenu
Copy link
Contributor

@rozerarenu rozerarenu commented May 6, 2024

Thank you for contributing to LangChain!

  • PR title: "community: Add source metadata to bedrock retriever response"

  • PR message:

    • Description: Bedrock retrieve API returns extra metadata in the response which is currently not returned in the retriever response
    • Issue: The change adds the metadata from bedrock retrieve API response to the bedrock retriever in a backward compatible way. Renamed metadata to sourceMetadata as metadata term is being used in the Document already. This is in sync with what we are doing in llama-index as well.
    • Dependencies: No
  • Add tests and docs:

    1. Added unit tests
    2. Notebook already exists and does not need any change
    3. Response from end to end testing, just to ensure backward compatibility: [Document(page_content='Exoplanets.', metadata={'location': {'s3Location': {'uri': 's3://bucket/file_name.txt'}, 'type': 'S3'}, 'score': 0.46886647, 'source_metadata': {'x-amz-bedrock-kb-source-uri': 's3://bucket/file_name.txt', 'tag': 'space', 'team': 'Nasa', 'year': 1946.0}})]
  • Lint and test: Run make format, make lint and make test from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:

  • Make sure optional dependencies are imported within a function.
  • Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests.
  • Most PRs should not touch more than one package.
  • Changes should be backwards compatible.
  • If you are adding something to community, do not re-import it in langchain.

If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17.

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label May 6, 2024
Copy link

vercel bot commented May 6, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview May 8, 2024 10:02pm

@dosubot dosubot bot added Ɑ: retriever Related to retriever module 🤖:improvement Medium size change to existing code to handle new use-cases labels May 6, 2024
Copy link
Contributor

@3coins 3coins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rozerarenu
Thanks for making the change. One minor comment, can you also make sure this PR is in alignment with changes here.
langchain-ai/langchain-aws#36

libs/community/langchain_community/retrievers/bedrock.py Outdated Show resolved Hide resolved
Copy link
Contributor

@3coins 3coins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good! Can you please fix the lint errors.

@3coins
Copy link
Contributor

3coins commented May 8, 2024

@baskaryan
Can you help merge this PR. Seems like @rozerarenu has attempted to fix the CI issues, but the CI did not start after her changes, can you help with this. Thanks.

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label May 8, 2024
@hwchase17 hwchase17 self-assigned this May 8, 2024
…ng delay

I am not able to reproduce the latest CI lint failures in my local even with same python versions and the workflow has been failing one after another for tests. Removing these for now as we need this change merged in for unblocking a bedrock customer. Will revisit the tests later at some point.
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels May 8, 2024
@ccurme ccurme merged commit 4035a1d into langchain-ai:master May 9, 2024
60 checks passed
ccurme added a commit that referenced this pull request May 9, 2024
This was implemented in
#21349 but dropped before
merge.
@ccurme
Copy link
Collaborator

ccurme commented May 9, 2024

added back the unit tests here: #21485

Narapady pushed a commit to Narapady/langchain that referenced this pull request May 9, 2024
Thank you for contributing to LangChain!

- [X] **PR title**: "community: Add source metadata to bedrock retriever
response"

- [X] **PR message**: 
- **Description:** Bedrock retrieve API returns extra metadata in the
response which is currently not returned in the retriever response
- **Issue:** The change adds the metadata from bedrock retrieve API
response to the bedrock retriever in a backward compatible way. Renamed
metadata to sourceMetadata as metadata term is being used in the
Document already. This is in sync with what we are doing in llama-index
as well.
    - **Dependencies:** No


- [X] **Add tests and docs**:
  1. Added unit tests
  2. Notebook already exists and does not need any change
3. Response from end to end testing, just to ensure backward
compatibility: `[Document(page_content='Exoplanets.',
metadata={'location': {'s3Location': {'uri':
's3://bucket/file_name.txt'}, 'type': 'S3'}, 'score': 0.46886647,
'source_metadata': {'x-amz-bedrock-kb-source-uri':
's3://bucket/file_name.txt', 'tag': 'space', 'team': 'Nasa', 'year':
1946.0}})]`


- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
Narapady pushed a commit to Narapady/langchain that referenced this pull request May 9, 2024
This was implemented in
langchain-ai#21349 but dropped before
merge.
kyle-cassidy pushed a commit to kyle-cassidy/langchain that referenced this pull request May 10, 2024
Thank you for contributing to LangChain!

- [X] **PR title**: "community: Add source metadata to bedrock retriever
response"

- [X] **PR message**: 
- **Description:** Bedrock retrieve API returns extra metadata in the
response which is currently not returned in the retriever response
- **Issue:** The change adds the metadata from bedrock retrieve API
response to the bedrock retriever in a backward compatible way. Renamed
metadata to sourceMetadata as metadata term is being used in the
Document already. This is in sync with what we are doing in llama-index
as well.
    - **Dependencies:** No


- [X] **Add tests and docs**:
  1. Added unit tests
  2. Notebook already exists and does not need any change
3. Response from end to end testing, just to ensure backward
compatibility: `[Document(page_content='Exoplanets.',
metadata={'location': {'s3Location': {'uri':
's3://bucket/file_name.txt'}, 'type': 'S3'}, 'score': 0.46886647,
'source_metadata': {'x-amz-bedrock-kb-source-uri':
's3://bucket/file_name.txt', 'tag': 'space', 'team': 'Nasa', 'year':
1946.0}})]`


- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
kyle-cassidy pushed a commit to kyle-cassidy/langchain that referenced this pull request May 10, 2024
This was implemented in
langchain-ai#21349 but dropped before
merge.
kyle-cassidy pushed a commit to kyle-cassidy/langchain that referenced this pull request May 16, 2024
Thank you for contributing to LangChain!

- [X] **PR title**: "community: Add source metadata to bedrock retriever
response"

- [X] **PR message**: 
- **Description:** Bedrock retrieve API returns extra metadata in the
response which is currently not returned in the retriever response
- **Issue:** The change adds the metadata from bedrock retrieve API
response to the bedrock retriever in a backward compatible way. Renamed
metadata to sourceMetadata as metadata term is being used in the
Document already. This is in sync with what we are doing in llama-index
as well.
    - **Dependencies:** No


- [X] **Add tests and docs**:
  1. Added unit tests
  2. Notebook already exists and does not need any change
3. Response from end to end testing, just to ensure backward
compatibility: `[Document(page_content='Exoplanets.',
metadata={'location': {'s3Location': {'uri':
's3://bucket/file_name.txt'}, 'type': 'S3'}, 'score': 0.46886647,
'source_metadata': {'x-amz-bedrock-kb-source-uri':
's3://bucket/file_name.txt', 'tag': 'space', 'team': 'Nasa', 'year':
1946.0}})]`


- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
kyle-cassidy pushed a commit to kyle-cassidy/langchain that referenced this pull request May 16, 2024
This was implemented in
langchain-ai#21349 but dropped before
merge.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖:improvement Medium size change to existing code to handle new use-cases lgtm PR looks good. Use to confirm that a PR is ready for merging. Ɑ: retriever Related to retriever module size:S This PR changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants