Pipelines -- Batching sentences in document parser [ARElight backlog] #535

Closed
1 task done
nicolay-r opened this issue Nov 12, 2023 · 1 comment
Assignees
Labels: enhancement, good first issue, limitation

Comments


nicolay-r commented Nov 12, 2023

This originates from the NER application (nicolay-r/ARElight#118).
The snippet below shows that the text processing pipeline is applied to each sentence separately (text_parser.run).
To improve document processing performance, the parser needs to switch from accepting a single sentence to accepting a list of sentences, i.e. to support batching.

parsed_sentences = [text_parser.run(input_data=DocumentParser.__get_sent(doc, sent_ind).Text,
                                    params_dict=DocumentParser.__create_ppl_params(doc=doc, sent_ind=sent_ind),
                                    parent_ctx=parent_ppl_ctx)
                    for sent_ind in range(doc.SentencesCount)]
return ParsedDocument(doc_id=doc.ID,
                      parsed_sentences=parsed_sentences)
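
For discussion, here is a minimal sketch of what a batched variant could look like. It is not the AREkit implementation: `iter_batches`, `parse_document_batched`, the `run_batch` callable, and the `batch_size` parameter are all hypothetical names introduced for illustration. The assumption is that the pipeline could expose a call that accepts a list of sentence texts and returns one result per sentence, in the same order.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List


def iter_batches(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `batch_size` items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def parse_document_batched(sentences: Iterable[str],
                           run_batch: Callable[[List[str]], List[Any]],
                           batch_size: int = 32) -> List[Any]:
    """Apply `run_batch` to groups of sentences instead of one sentence at a time.

    `run_batch` stands in for a batched pipeline call; it is an assumption,
    not an existing AREkit API. It must return one result per input sentence.
    """
    parsed = []
    for batch in iter_batches(sentences, batch_size):
        results = run_batch(batch)
        if len(results) != len(batch):
            raise ValueError("run_batch must return one result per sentence")
        parsed.extend(results)  # sentence order is preserved across batches
    return parsed


if __name__ == "__main__":
    # Toy stand-in for the pipeline: whitespace tokenization of each sentence.
    toy_run_batch = lambda batch: [text.split() for text in batch]
    print(parse_document_batched(["a b", "c", "d e f"], toy_run_batch, batch_size=2))
```

With this shape, the per-sentence parameters built by `__create_ppl_params` would also need to be passed per batch (for example as a list aligned with the sentences); how to do that is left open here.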

nicolay-r added the enhancement, good first issue, and limitation labels on Nov 12, 2023
nicolay-r self-assigned this on Nov 12, 2023

nicolay-r commented Dec 27, 2023

Proposal for the pipeline core refactoring:

image

nicolay-r changed the title from "Batching sentences in document parser [ARElight backlog]" to "Pipelines -- Batching sentences in document parser [ARElight backlog]" on Dec 27, 2023