Improve efficiency of TextSplitter.split_documents, iterate once #5111

Merged 5 commits on May 23, 2023

eyurtsev
Collaborator

Improve TextSplitter.split_documents, collect page_content and metadata in one iteration

Who can review?

Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested:

@eyurtsev When documents is a generator, which can only be iterated once, this change is a huge help. Without it, a silent issue occurs: metadata comes back empty for every document. So we expand the argument type from List[Document] to Union[Iterable[Document], Sequence[Document]].
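
The failure mode described above can be reproduced without the library. The following is a minimal sketch, not the actual TextSplitter code: the Document dataclass and the helper names collect_two_passes and collect_one_pass are hypothetical stand-ins, and the assumption is that the previous implementation read page_content and metadata in two separate passes over documents.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass
class Document:
    """Hypothetical stand-in for the library's Document class."""

    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


Collected = Tuple[List[str], List[Dict[str, str]]]


def collect_two_passes(documents: Iterable[Document]) -> Collected:
    # Assumed shape of the old behavior: two independent passes.
    # With a generator, the first pass exhausts it, so the second
    # pass yields nothing and metadatas ends up empty.
    texts = [doc.page_content for doc in documents]
    metadatas = [doc.metadata for doc in documents]
    return texts, metadatas


def collect_one_pass(documents: Iterable[Document]) -> Collected:
    # Single pass: each document is visited once, so this works for
    # lists and for one-shot generators alike.
    texts: List[str] = []
    metadatas: List[Dict[str, str]] = []
    for doc in documents:
        texts.append(doc.page_content)
        metadatas.append(doc.metadata)
    return texts, metadatas


def make_docs() -> Iterator[Document]:
    # A generator: it can only be iterated once.
    yield Document("alpha", {"source": "a.txt"})
    yield Document("beta", {"source": "b.txt"})


texts_bad, metas_bad = collect_two_passes(make_docs())
texts_ok, metas_ok = collect_one_pass(make_docs())

print(len(texts_bad), len(metas_bad))  # texts collected, metadata silently lost
print(len(texts_ok), len(metas_ok))    # both collected
```

Both helpers give the same result when documents is a list; they differ only for one-shot iterables. Since every Sequence is also an Iterable, a parameter annotated as Iterable[Document] already accepts lists.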

@eyurtsev
Collaborator Author

Linted version of #5083.

eyurtsev added the lgtm label ("PR looks good. Use to confirm that a PR is ready for merging.") May 23, 2023
eyurtsev merged commit d56313a into master May 23, 2023
12 checks passed
eyurtsev deleted the eugene/split_documents branch May 23, 2023 03:00
vowelparrot pushed a commit that referenced this pull request May 24, 2023
Co-authored-by: Steven Tartakovsky <tartakovsky.developer@gmail.com>
@danielchalef danielchalef mentioned this pull request Jun 5, 2023
This was referenced Jun 25, 2023