Prompt layer-wise recompute when applicable by pengwa · Pull Request #20126 · microsoft/onnxruntime

pengwa · 2024-03-28T16:01:31Z

Prompt layer-wise when applicable

When ONNX export fails and we find that the checkpoint function is used, give users an explicit prompt to enable layer-wise memory optimization.

- Using the checkpoint function is a strong indicator that the model is too large to fit in GPU memory.
- If we don't override the checkpoint function here, ONNX export fails in most cases:
  1. With older PyTorch versions, we simply throw an exception when handling the gradient checkpoint feature.
  2. With newer PyTorch versions, the export itself fails.
- Neither failure tells users explicitly "HOW" to mitigate the problem. This PR adds that guidance.

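![image](https://github.com/microsoft/onnxruntime/assets/10530022/c0476748-5818-4cc8-b2d6-88c7580fe4da)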
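To make the idea concrete, here is a minimal Python sketch of wrapping a checkpoint function so that an export-time failure carries an actionable hint instead of an opaque error. It is not the code from this PR. The wrapper `prompting_checkpoint`, the `CheckpointExportError` type and the `exporting` keyword are hypothetical. The `exporting` keyword stands in for a real export-mode check such as `torch.onnx.is_in_onnx_export()`. Treating `ORTMODULE_MEMORY_OPT_LEVEL=1` as the layer-wise recompute setting is an assumption based on the variable named in this PR's commits.

```python
import functools
import os

# Env var named in this PR's commits; the value "1" meaning "layer-wise
# recompute" is an assumption made for this sketch.
MEMORY_OPT_LEVEL_ENV = "ORTMODULE_MEMORY_OPT_LEVEL"


class CheckpointExportError(RuntimeError):
    """Hypothetical error type that carries a mitigation hint."""


def prompting_checkpoint(original_checkpoint):
    """Hypothetical wrapper: tell users HOW to mitigate when export meets checkpointing."""

    @functools.wraps(original_checkpoint)
    def wrapper(function, *args, exporting=False, **kwargs):
        if not exporting:
            # Outside export, keep the original checkpoint behavior.
            return original_checkpoint(function, *args, **kwargs)
        if os.environ.get(MEMORY_OPT_LEVEL_ENV) is None:
            # The user has not opted in: fail with an explicit, actionable prompt.
            raise CheckpointExportError(
                "Gradient checkpointing detected; the model is likely too large for "
                "GPU memory. Consider enabling layer-wise recompute by setting "
                f"{MEMORY_OPT_LEVEL_ENV}=1."
            )
        # The user opted in: call the function directly and leave recompute
        # decisions to the runtime's memory optimizer.
        return function(*args, **kwargs)

    return wrapper
```

Prompting only when the variable is unset follows the idea of the "allow auto enable only when user did not set ORTMODULE_MEMORY_OPT_LEVEL" commit in this PR: an explicit user setting always takes precedence.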

Motivation and Context


wschin

Good efforts to consolidate the flags we have.

wschin

Thanks.


pengwa · 2024-04-10T03:50:48Z

Thanks @wschin @mindest !


enable layerwise automatically

c1eb7d9

pengwa requested review from baijumeswani and thiagocrepaldi March 28, 2024 16:01

pengwa added the `training` label (issues related to ONNX Runtime training; typically submitted using template) Mar 28, 2024

pengwa marked this pull request as ready for review March 28, 2024 16:01

pengwa requested review from frank-dong-ms and guyang3532 March 28, 2024 16:02

pengwa changed the title ~~Enable layer-wise automatically when applicable~~ Enable layer-wise-recompute automatically when applicable Mar 28, 2024

pengwa added 5 commits March 28, 2024 16:11

minor

8b6e8de

lint

63e7eeb

fix

2fde670

fix

eba6e81

minors

3d3dc06

frank-dong-ms reviewed Mar 29, 2024


Comment thread orttraining/orttraining/python/training/ortmodule/__init__.py Outdated

pengwa added 3 commits March 29, 2024 06:25

allow auto enable only when user did not set ORTMODULE_MEMORY_OPT_LEVEL

6bacc56

fix

3cfd0c7

fix

38c97d1

pengwa changed the title ~~Enable layer-wise-recompute automatically when applicable~~ Enable layer-wise-recompute automatically Mar 29, 2024

pengwa added 3 commits March 29, 2024 10:22

minor

63e2794

fix ci for torch 2.0.0 cuda118

b4dc4d9

Merge branch 'main' of https://github.com/microsoft/onnxruntime into pengwa/enable_layerwise_automatically

66750a9

wschin reviewed Apr 9, 2024


Comment thread docs/Memory_Optimizer.md Outdated

wschin reviewed Apr 9, 2024


Comment thread orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py Outdated

refine according to comment

439281f

pengwa changed the title ~~Enable layer-wise-recompute automatically~~ Prompt layer-wise when applicable Apr 9, 2024

pengwa changed the title ~~Prompt layer-wise when applicable~~ Prompt layer-wise recompute when applicable Apr 9, 2024

pengwa added 4 commits April 9, 2024 12:57

refinement

25346d6

Merge branch 'main' of https://github.com/microsoft/onnxruntime into pengwa/enable_layerwise_automatically

d326c13

minor

d865264

fix

d8ae956

pengwa added 2 commits April 9, 2024 14:58

minor

6cfd124

minor

4f9e9ec

wschin reviewed Apr 9, 2024


Comment thread docs/Memory_Optimizer.md Outdated

wschin reviewed Apr 9, 2024


Comment thread orttraining/orttraining/python/training/ortmodule/__init__.py

fix amd ci

c129c4d

wschin reviewed Apr 9, 2024


Comment thread orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py

wschin reviewed Apr 9, 2024


Comment thread orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py

wschin previously approved these changes Apr 9, 2024


refinement

25f0427

pengwa dismissed wschin’s stale review via 25f0427 April 9, 2024 18:02

wschin previously approved these changes Apr 9, 2024


pengwa added 2 commits April 10, 2024 01:42

Merge branch 'main' of https://github.com/microsoft/onnxruntime into pengwa/enable_layerwise_automatically

2b4c67c

fix ci

53d2706

pengwa dismissed wschin’s stale review via 53d2706 April 10, 2024 01:44

mindest approved these changes Apr 10, 2024


pengwa merged commit 280b263 into main Apr 10, 2024

pengwa deleted the pengwa/enable_layerwise_automatically branch April 10, 2024 03:50

pengwa merged 23 commits into main from pengwa/enable_layerwise_automatically


4 participants
