add multiprocessing #52
Merged
===autogenerated===
This pull request introduces parallel processing to the CLI workflow in `src/prodigy_prot/cli.py`, allowing multiple structure models to be processed concurrently. The main changes are a command-line argument to control processor usage, a refactor of the execution logic to use a process pool, and a dedicated function that encapsulates model processing. These updates aim to improve performance and scalability when handling multiple input files or models.

Parallelization and CLI enhancements:
- Added `--number-of-processors` (`-np`) to allow users to specify how many processors to use for parallel execution.
- Refactored the execution logic to use `ProcessPoolExecutor`, dynamically adjusting the number of workers based on available tasks.
- Added a `process_model` function to encapsulate the processing of a single model, capturing and returning its output for sequential printing after parallel execution.

Imports and setup for parallel execution:
- Imported `ProcessPoolExecutor`, `as_completed`, and `StringIO` to support parallel processing and output capture.
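For reviewers who want a picture of the pattern described above, here is a minimal sketch. It is not the code from this PR: the body of `process_model` is a placeholder, and `worker_count` and `run_models` are hypothetical helper names introduced only for illustration. It assumes each model can be identified by a string and that output is captured in a `StringIO` buffer and printed later in input order.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed
from io import StringIO


def process_model(model_id: str) -> tuple[str, str]:
    """Stand-in for the per-model work: write the report to a buffer, not stdout."""
    buffer = StringIO()
    # Placeholder output (assumption); the real function would run the prediction here.
    print(f"[{model_id}] processed", file=buffer)
    return model_id, buffer.getvalue()


def worker_count(requested: int, n_tasks: int) -> int:
    """Hypothetical helper: never start more workers than tasks, and at least one."""
    return max(1, min(requested, n_tasks))


def run_models(model_ids: list[str], requested: int) -> list[str]:
    """Hypothetical helper: process models in parallel, return outputs in input order."""
    if not model_ids:
        return []
    outputs: dict[str, str] = {}
    n_workers = worker_count(requested, len(model_ids))
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(process_model, m) for m in model_ids]
        # as_completed yields futures in completion order, not submission order.
        for future in as_completed(futures):
            model_id, text = future.result()
            outputs[model_id] = text
    # Re-order by the original input so printed output is deterministic.
    return [outputs[m] for m in model_ids]
```

Returning captured text instead of printing inside the worker keeps the output of different models from interleaving. On platforms that use the spawn start method, a call such as `run_models(["model_1", "model_2"], 4)` should sit under an `if __name__ == "__main__":` guard.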