Improve the compilation speed when compiling for multiple architectures. by pengwa · Pull Request #12490 · microsoft/onnxruntime

pengwa · 2022-08-05T12:05:04Z

Description: Improve the compilation speed when building for multiple architectures.

Since CUDA 11.2, NVCC has provided a build option, '--threads':

4.2.5.4. --threads number (-t)
Specify the maximum number of threads to be used to execute the compilation steps in parallel.

This option can be used to improve the compilation speed when compiling for multiple architectures. The compiler creates number threads to execute the compilation steps in parallel. If number is 1, this option is ignored. If number is 0, the number of threads used is the number of CPUs on the machine.
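As a rough illustration of these documented semantics, the Python sketch below maps a '--threads' value to the number of threads NVCC would use. The function name `effective_nvcc_threads` is hypothetical. It only models the documentation text and is not taken from NVCC itself.

```python
import os


def effective_nvcc_threads(number: int) -> int:
    """Model the documented meaning of nvcc '--threads number' (CUDA >= 11.2).

    Illustration only: mirrors the NVCC documentation, not NVCC's source.
    """
    if number < 0:
        raise ValueError("thread count must be non-negative")
    if number == 0:
        # 0: use as many threads as there are CPUs on the machine.
        # os.cpu_count() can return None, so fall back to 1.
        return os.cpu_count() or 1
    # 1: the option is ignored, treated here as a single thread.
    # Any other value is used as the thread count directly.
    return number
```

With 0 the thread count scales with the build machine, which is why it is a convenient default for CI and developer machines alike.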

Previously, the CUDA extensions took 65.59 seconds to build; with the option enabled, they build in 48.63 seconds (roughly a 26% reduction).


snnn · 2022-08-05T17:15:53Z

So this change is only for training?

pengwa · 2022-08-06T08:38:38Z

So this change is only for training?

Yes. It only affects the torch extension build, which is part of ORTModule training.

The flag was already added for our CUDA EP code in PR #8974.
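A minimal sketch of how the flag could be wired into a PyTorch C++/CUDA extension build, assuming the extension is built with `torch.utils.cpp_extension.CUDAExtension`, whose `extra_compile_args` accepts a dict with `cxx` and `nvcc` keys. The helper `nvcc_extra_args` and its default of 0 (chosen to match the "use 0 by default" commit) are hypothetical and do not reproduce the actual contents of install.py. The version check keeps the flag away from toolkits older than 11.2, whose NVCC does not recognize it.

```python
from typing import List, Optional, Tuple


def parse_cuda_version(version: str) -> Tuple[int, int]:
    """Turn a version string such as '11.3' (or '12') into (major, minor)."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def nvcc_extra_args(cuda_version: Optional[str], threads: int = 0) -> List[str]:
    """Return extra nvcc flags for a torch CUDA extension (hypothetical helper).

    '--threads' is only understood by NVCC from CUDA 11.2 onward, so older
    toolkits, or a CPU-only torch build (torch.version.cuda is None),
    get no extra flag.
    """
    if cuda_version is None:
        return []
    if parse_cuda_version(cuda_version) < (11, 2):
        return []
    # 0 asks NVCC to use one thread per CPU on the build machine.
    return ["--threads", str(threads)]
```

For example, `extra_compile_args={"nvcc": nvcc_extra_args(torch.version.cuda)}` would pass `--threads 0` on CUDA 11.2 or newer and nothing on older toolkits.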

orttraining/orttraining/python/training/ortmodule/torch_cpp_extensions/install.py

improve the compilation speed when compiling for multiple architectures.

f9e9333

pengwa added the component:ortmodule label Aug 5, 2022

pengwa requested review from askhade and baijumeswani August 5, 2022 12:05

formatting

2df7518

pengwa added 2 commits August 8, 2022 09:17

fix

d92657c

use 0 by default

7e10381

baijumeswani previously approved these changes Aug 9, 2022


orttraining/orttraining/python/training/ortmodule/torch_cpp_extensions/install.py (outdated)

fix comments

9353605

pengwa dismissed baijumeswani’s stale review via 9353605 August 9, 2022 01:24

baijumeswani approved these changes Aug 9, 2022


pengwa merged commit a2dc3e9 into master Aug 9, 2022

pengwa deleted the pengwa/cpp_extension branch August 9, 2022 03:52

pengwa merged 5 commits into master from pengwa/cpp_extension