
Improve compile times by removing unnecessary headers #12

Merged
matthewdcong merged 3 commits into main from compile_time_optimizations on Sep 16, 2025


@matthewdcong matthewdcong commented Sep 16, 2025

This saves around 5s per translation unit in my builds, which equates to about 40s of end-to-end parallel build time on my machine. On CI, it brings build times down from 9m 20s to 7m 50s with CUDA 12.8 and from 14m 45s to 8m 38s with CUDA 12.9.

  • torch/extension.h includes torch/python.h, which is unnecessary for the C++ portion of the code.
  • Most headers only need Tensor and dtypes, so including torch/types.h instead of the much larger torch/all.h is sufficient.

Signed-off-by: Matthew Cong <mcong@nvidia.com>
Signed-off-by: Matthew Cong <mcong@nvidia.com>
@matthewdcong matthewdcong force-pushed the compile_time_optimizations branch from 41d19e1 to 98959f3 Compare September 16, 2025 19:19
Signed-off-by: Matthew Cong <mcong@nvidia.com>
@matthewdcong matthewdcong merged commit 1cba22b into main Sep 16, 2025
32 checks passed
@matthewdcong matthewdcong deleted the compile_time_optimizations branch September 16, 2025 21:04
2 participants