Add parallel file conversion support with ThreadPoolExecutor#9
samuelduchesne merged 2 commits into main from
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 67950de69d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "--file-workers", | ||
| type=int, | ||
| default=os.cpu_count() or 4, | ||
| help="Parallel file conversions per version (default: CPU count)", |
Reduce default file-worker fanout in convert_all
--file-workers now defaults to os.cpu_count(), but convert_all already runs up to --max-workers processes (also defaulting to CPU count), so the default invocation can fan out to roughly cpu_count^2 concurrent file conversions and Pandoc subprocesses; on typical CI/dev machines this can cause heavy oversubscription, memory pressure, and build failures/timeouts instead of speeding up conversion. A safer default for per-version file workers (for example 1, or a value derived from max_workers) avoids breaking the default workflow.
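One way to address this review suggestion is to derive the inner worker count from the outer one instead of multiplying the two fanouts. The sketch below is hypothetical (the helper name and fallback of 4 are assumptions, not part of this PR); it keeps the total concurrency near the CPU count:

```python
import os

def default_file_workers(max_workers: int) -> int:
    """Hypothetical helper: derive a per-version file-worker count
    that keeps total concurrency near the machine's CPU count."""
    cpus = os.cpu_count() or 4
    # Total concurrent Pandoc subprocesses is roughly
    # max_workers * file_workers, so divide rather than multiply.
    return max(1, cpus // max(1, max_workers))
```

With `--max-workers` at its CPU-count default, this yields one file worker per version process; with `--max-workers 1`, the full CPU count is available for file-level parallelism.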
Parallelize Pandoc file conversions within a single version using ThreadPoolExecutor, significantly speeding up the build. Each convert_tex_file() call is independent (label_index is read-only, output paths are unique), so files can safely be converted in parallel.

- scripts/convert.py: Add --max-workers flag (default: CPU count) that controls thread pool size for file conversions within convert_doc_set()
- scripts/convert_all.py: Add --file-workers flag (separate from the existing --max-workers for version-level process parallelism)
- Makefile: Add MAX_WORKERS variable to the convert target

https://claude.ai/code/session_01XAaaYCg6rvfpWNsmWRxZen
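The commit message's thread-pool approach can be sketched as follows. This is a minimal illustration, not the PR's actual code: `convert_one` stands in for the real `convert_tex_file()`, and the dict-of-futures pattern is one common way to collect results as they complete:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def convert_one(tex_path: str) -> str:
    # Stand-in for convert_tex_file(): each call is independent
    # (read-only shared state, unique output paths), so calls can
    # run concurrently without coordination.
    return tex_path.replace(".tex", ".md")

def convert_files(tex_paths, max_workers=4):
    # Threads suit this workload because the real conversion is an
    # I/O-bound Pandoc subprocess call that releases the GIL.
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(convert_one, p): p for p in tex_paths}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```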
When convert_all runs with default settings, --max-workers (CPU count) processes each spawned --file-workers (CPU count) threads, resulting in cpu_count^2 concurrent Pandoc subprocesses. This causes memory pressure and build failures on typical CI/dev machines. Default --file-workers to 1 since the outer process pool already saturates the machine. Users can still opt into inner parallelism explicitly via --file-workers.

https://claude.ai/code/session_01XAaaYCg6rvfpWNsmWRxZen
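The fix described in this commit amounts to changing one argparse default. A sketch of the resulting CLI shape (the surrounding parser setup is assumed, not taken from the PR's actual code):

```python
import argparse
import os

def build_parser() -> argparse.ArgumentParser:
    # Sketch of the convert_all CLI after the fix: the inner fanout
    # defaults to 1, so the default invocation stays at roughly
    # cpu_count concurrent Pandoc subprocesses instead of cpu_count^2.
    parser = argparse.ArgumentParser()
    parser.add_argument("--max-workers", type=int,
                        default=os.cpu_count() or 4,
                        help="Version-level process parallelism")
    parser.add_argument("--file-workers", type=int, default=1,
                        help="Parallel file conversions per version "
                             "(default: 1; opt in explicitly)")
    return parser
```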
Force-pushed from bb6e1d2 to 81c9637
Summary

This PR adds support for parallel file conversion in the EnergyPlus documentation build process. File conversions are now executed concurrently using a `ThreadPoolExecutor` when multiple workers are specified, improving build performance for large documentation sets.

Key Changes

- Parallel conversion infrastructure: Introduced a `_convert_files()` function that uses `ThreadPoolExecutor` to run file conversions in parallel when `max_workers > 1`. The Pandoc subprocess calls are I/O-bound, making threads well-suited for this workload.
- Task collection refactoring: Extracted a `_collect_tasks()` function to separate task building from execution, enabling better separation of concerns and cleaner parallel execution logic.
- Refactored `convert_doc_set()`: Split into two phases, task collection and execution.
- CLI enhancements:
  - `--max-workers` argument to `scripts/convert.py` with a default of CPU count
  - `--file-workers` argument to `scripts/convert_all.py` to control per-version parallelism
  - `MAX_WORKERS` environment variable
- API updates: Added a `max_workers` keyword-only parameter to the `convert_doc_set()` and `convert_version()` functions with sensible defaults.

Implementation Details

- Parallel execution is used only when `max_workers > 1` AND there are multiple tasks; otherwise conversion falls back to sequential processing
- Uses `file_result.source` instead of `tex_path` for consistency across the parallel and sequential paths
- The `convert_all.py` script can now control both version-level parallelism (multiple versions) and file-level parallelism (files within a version)

https://claude.ai/code/session_01XAaaYCg6rvfpWNsmWRxZen