
Add parallel file conversion support with ThreadPoolExecutor #9

Merged
samuelduchesne merged 2 commits into main from
claude/improve-convert-speed-H0A1O
Feb 25, 2026

Conversation

@samuelduchesne
Contributor

Summary

This PR adds support for parallel file conversion in the EnergyPlus documentation build process. File conversions are now executed concurrently using a ThreadPoolExecutor when multiple workers are specified, improving build performance for large documentation sets.

Key Changes

  • Parallel conversion infrastructure: Introduced _convert_files() function that uses ThreadPoolExecutor to run file conversions in parallel when max_workers > 1. The Pandoc subprocess calls are I/O-bound, making threads well-suited for this workload.

  • Task collection refactoring: Extracted _collect_tasks() function to separate task building from execution, enabling better separation of concerns and cleaner parallel execution logic.

  • Refactored convert_doc_set(): Split into two phases:

    • Phase 1: Parallel file conversions (when applicable)
    • Phase 2: Sequential result logging and TOC generation (must happen after files are written)
  • CLI enhancements:

    • Added --max-workers argument to scripts/convert.py with default value of CPU count
    • Added --file-workers argument to scripts/convert_all.py to control per-version parallelism
    • Updated Makefile to support MAX_WORKERS environment variable
  • API updates: Added max_workers keyword-only parameter to convert_doc_set() and convert_version() functions with sensible defaults.
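The pieces above can be sketched as follows. The names `_collect_tasks()`, `_convert_files()`, `convert_doc_set()`, and the `max_workers` keyword come from this PR; the task and result shapes are illustrative stand-ins for the real Pandoc invocation, not the actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor


def _collect_tasks(sources):
    """Build the list of pending conversions without running them.

    A real task would carry the .tex path, output path, and Pandoc
    options; here each task is a zero-argument callable standing in
    for one convert_tex_file() call.
    """
    return [lambda src=src: {"source": src, "ok": True} for src in sources]


def _convert_files(tasks, max_workers):
    """Phase 1: run conversions, in parallel only when it can pay off."""
    if max_workers > 1 and len(tasks) > 1:
        # Pandoc calls are I/O-bound subprocesses, so threads are
        # sufficient; no process pool is needed at this level.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(lambda task: task(), tasks))
    # Sequential fallback keeps the single-worker path simple.
    return [task() for task in tasks]


def convert_doc_set(sources, *, max_workers=1):
    results = _convert_files(_collect_tasks(sources), max_workers)
    # Phase 2: result logging and TOC generation stay sequential so
    # entries come out in a deterministic order after files are written.
    toc = [result["source"] for result in results]
    return results, toc
```

Note that `ThreadPoolExecutor.map()` returns results in input order, which is what lets Phase 2 produce a stable TOC regardless of which conversion finishes first.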

Implementation Details

  • Parallel execution activates only when max_workers > 1 AND there is more than one task; otherwise it falls back to sequential processing
  • TOC generation and result logging remain sequential to ensure proper file ordering and consistency
  • Error handling and logging use file_result.source instead of tex_path for consistency across parallel and sequential paths
  • The convert_all.py script can now control both version-level parallelism (multiple versions) and file-level parallelism (files within a version)

https://claude.ai/code/session_01XAaaYCg6rvfpWNsmWRxZen
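Combining the two parallelism levels multiplies: each version-level process can spawn its own file-level threads. A minimal sketch of the worst-case arithmetic (the helper name is hypothetical, introduced only for illustration):

```python
import os


def worst_case_pandoc_fanout(version_workers, file_workers):
    """Upper bound on concurrent Pandoc subprocesses when convert_all
    runs version_workers processes, each spawning file_workers threads."""
    return version_workers * file_workers


cpus = os.cpu_count() or 4
# If both levels default to CPU count, the bound grows quadratically:
assert worst_case_pandoc_fanout(cpus, cpus) == cpus ** 2
# Pinning file workers to 1 keeps it at one Pandoc per process:
assert worst_case_pandoc_fanout(cpus, 1) == cpus
```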


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 67950de69d


Comment thread scripts/convert_all.py Outdated
Comment on lines +118 to +121
"--file-workers",
type=int,
default=os.cpu_count() or 4,
help="Parallel file conversions per version (default: CPU count)",


P1: Reduce default file-worker fanout in convert_all

--file-workers now defaults to os.cpu_count(), but convert_all already runs up to --max-workers processes (also defaulting to CPU count), so the default invocation can fan out to roughly cpu_count^2 concurrent file conversions and Pandoc subprocesses. On typical CI/dev machines this causes heavy oversubscription, memory pressure, and build failures or timeouts instead of speeding up conversion. A safer default for per-version file workers (for example 1, or a value derived from max_workers) avoids breaking the default workflow.


Parallelize Pandoc file conversions within a single version using
ThreadPoolExecutor, significantly speeding up the build. Each
convert_tex_file() call is independent (label_index is read-only,
output paths are unique), so files can safely be converted in parallel.

- scripts/convert.py: Add --max-workers flag (default: CPU count) that
  controls thread pool size for file conversions within convert_doc_set()
- scripts/convert_all.py: Add --file-workers flag (separate from the
  existing --max-workers for version-level process parallelism)
- Makefile: Add MAX_WORKERS variable to the convert target

https://claude.ai/code/session_01XAaaYCg6rvfpWNsmWRxZen

When convert_all runs with default settings, --max-workers (CPU count)
processes each spawned --file-workers (CPU count) threads, resulting in
cpu_count^2 concurrent Pandoc subprocesses. This causes memory pressure
and build failures on typical CI/dev machines.

Default --file-workers to 1 since the outer process pool already
saturates the machine. Users can still opt into inner parallelism
explicitly via --file-workers.

https://claude.ai/code/session_01XAaaYCg6rvfpWNsmWRxZen
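The fix described in the commit message above amounts to changing one argparse default. The flag names match the diff hunk quoted in the review thread; the surrounding parser setup is an assumption, not the script's actual code:

```python
import argparse
import os

parser = argparse.ArgumentParser(prog="convert_all.py")
# Version-level parallelism: how many versions convert at once (processes).
parser.add_argument(
    "--max-workers",
    type=int,
    default=os.cpu_count() or 4,
    help="Parallel version conversions (default: CPU count)",
)
# File-level parallelism inside each version (threads). Defaulting to 1
# avoids the cpu_count**2 fanout: the outer process pool already
# saturates the machine, and users can still opt in explicitly.
parser.add_argument(
    "--file-workers",
    type=int,
    default=1,
    help="Parallel file conversions per version (default: 1)",
)

defaults = parser.parse_args([])
opted_in = parser.parse_args(["--file-workers", "4"])
```

With this shape, the default invocation runs at most `cpu_count` Pandoc subprocesses total, while `--file-workers 4` restores inner parallelism on demand.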
@samuelduchesne samuelduchesne force-pushed the claude/improve-convert-speed-H0A1O branch from bb6e1d2 to 81c9637 on February 25, 2026 at 18:24
@samuelduchesne samuelduchesne merged commit b70244f into main Feb 25, 2026
3 of 4 checks passed
@samuelduchesne samuelduchesne deleted the claude/improve-convert-speed-H0A1O branch February 25, 2026 18:34
