Improve the compilation speed when compiling for multiple architectures. #12490

Merged
pengwa merged 5 commits into master from pengwa/cpp_extension
Aug 9, 2022

@pengwa
Contributor

@pengwa pengwa commented Aug 5, 2022

Description: Improve compiling speed for multiple architectures.

Since CUDA 11.2, NVCC has provided the build option '--threads':

4.2.5.4. --threads number (-t)
Specify the maximum number of threads to be used to execute the compilation steps in parallel.

This option can be used to improve the compilation speed when compiling for multiple architectures. The compiler creates number threads to execute the compilation steps in parallel. If number is 1, this option is ignored. If number is 0, the number of threads used is the number of CPUs on the machine.
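As a minimal sketch of how such a flag might be wired into an extension build (this is not the PR's actual diff, which is not shown here; the helper names `nvcc_threads_args` and `parse_cuda_version` are hypothetical), the option can be added only when the CUDA toolkit is new enough to understand it:

```python
def parse_cuda_version(version_string):
    """Turn a version string such as '11.6' into the tuple (11, 6)."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


def nvcc_threads_args(cuda_version, num_threads=0):
    """Return extra nvcc arguments that enable parallel compilation steps.

    cuda_version: (major, minor) tuple, e.g. (11, 3).
    num_threads:  value passed to --threads; 0 asks nvcc to use as many
                  threads as there are CPUs on the machine.
    """
    # --threads is only available from CUDA 11.2 onwards.
    if tuple(cuda_version) < (11, 2):
        return []
    # Per the nvcc documentation, a value of 1 is ignored, so omit the flag.
    if num_threads == 1:
        return []
    return ["--threads", str(num_threads)]
```

With PyTorch's `CUDAExtension`, the returned list could be appended to the `"nvcc"` entry of `extra_compile_args`, for example `nvcc_threads_args(parse_cuda_version(torch.version.cuda))`, assuming a CUDA-enabled PyTorch build where `torch.version.cuda` is a string such as "11.6".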

Previously, the CUDA extensions built in 65.59178829193115 seconds; after enabling the option, they built in 48.63271450996399 seconds.
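To put the two timings in proportion, a quick calculation (using only the numbers reported above, from what appears to be a single run each) gives a speedup of roughly 1.35x:

```python
before_s = 65.59178829193115  # reported build time without --threads
after_s = 48.63271450996399   # reported build time with --threads

speedup = before_s / after_s
reduction_pct = (before_s - after_s) / before_s * 100

print(f"speedup: {speedup:.2f}x, build time reduced by {reduction_pct:.1f}%")
```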

Motivation and Context

  • Why is this change required? What problem does it solve?
  • If it fixes an open issue, please link to the issue here.

@snnn
Contributor

snnn commented Aug 5, 2022

So this change is only for training?

@pengwa
Contributor Author

pengwa commented Aug 6, 2022

> So this change is only for training?

Yes, and it only affects the torch extension build, which is part of ORTModule training.

The flag for our CUDA EP code was already added in PR #8974.

baijumeswani previously approved these changes Aug 9, 2022
@pengwa pengwa merged commit a2dc3e9 into master Aug 9, 2022
@pengwa pengwa deleted the pengwa/cpp_extension branch August 9, 2022 03:52
3 participants