
[create_supervised_trainer] add automatic mixed precision #1589

Merged: 54 commits into pytorch:master from ydcjeff:engine/create_supervised_trainer, Feb 21, 2021

Conversation

@ydcjeff (Contributor) commented Jan 29, 2021

Fixes #1235

Description: Add automatic mixed precision using torch.cuda.amp and apex.

Usage:

import torch
from torch import nn
from torch.cuda.amp import GradScaler
from ignite.engine import create_supervised_trainer

# the usual required arguments (toy placeholders to make the snippet self-contained)
model = nn.Linear(2, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# using autocast only
trainer = create_supervised_trainer(model, optimizer, loss_fn, amp_mode='amp')

# using autocast and the default scaler
trainer = create_supervised_trainer(model, optimizer, loss_fn, amp_mode='amp', scaler=True)
# trainer state will have the attribute scaler
print(trainer.state.scaler)
# <torch.cuda.amp.grad_scaler.GradScaler object at 0x7f8e0dac7b80>

# using autocast and a custom scaler
# trainer state will not have the attribute scaler if a scaler instance is passed
scaler = GradScaler(2**10)
trainer = create_supervised_trainer(model, optimizer, loss_fn, amp_mode='amp', scaler=scaler)

# using apex
trainer = create_supervised_trainer(model, optimizer, loss_fn, amp_mode='apex')

# the scaler will be ignored and a warning will show up
trainer = create_supervised_trainer(model, optimizer, loss_fn, amp_mode='apex', scaler=True)

Checklist:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

https://deploy-preview-1589--pytorch-ignite-preview.netlify.app/engine.html#

@vfdev-5 (Collaborator) left a comment

Thanks a lot for the PR @ydcjeff!

@vfdev-5 (Collaborator) commented Jan 29, 2021

Currently, I launched it manually: https://app.circleci.com/pipelines/github/pytorch/ignite/1195/workflows/27d5b840-72bb-41e1-8d1c-84640f1f623c, but I think either the next commit or a new PR will run automatically on GPUs.

@ydcjeff (Contributor, Author) commented Jan 29, 2021

> Currently, I launched it manually: https://app.circleci.com/pipelines/github/pytorch/ignite/1195/workflows/27d5b840-72bb-41e1-8d1c-84640f1f623c, but I think either the next commit or a new PR will run automatically on GPUs.

Thank you!

@ydcjeff (Contributor, Author) commented Jan 30, 2021

I need help with the tests, specifically with apex and GradScaler.

@vfdev-5 (Collaborator) commented Jan 30, 2021

> I need help with the tests, specifically with apex and GradScaler.

I'll try to implement something from my side and we'll see.

@vfdev-5 (Collaborator) commented Feb 10, 2021

We discussed this PR and the related issue with the team, and we think we should explore a slightly different approach. The helper method create_supervised_trainer is roughly made of two things: the update function definition and the Engine setup.

It would probably be more helpful to provide public methods like:

  • supervised_training_step
  • supervised_training_step_tpu
  • supervised_training_step_apex
  • supervised_training_step_amp

and inside create_supervised_trainer we could set up the trainer according to the provided options without lots of if/else. Maybe we can skip, for instance, grad norm.

Basically, the idea is something like this:

from ignite.engine import Engine

# each factory returns an update function with the desired behaviour baked in
def get_training_step_1(a):
    def training_step(e, b):
        print(a, e, b)
    return training_step

def get_training_step_2(a):
    def training_step(e, b):
        print(a, e, b, "with amp")
    return training_step

def create_supervised_trainer(a, opt):
    # pick the update function according to the provided option,
    # then do the Engine setup in one place
    training_step = None
    if opt == 1:
        training_step = get_training_step_1(a)
    elif opt == 2:
        training_step = get_training_step_2(a)

    e = Engine(training_step)
    return e
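
For illustration, here is a minimal sketch of what the amp variant from the list above could look like (assuming the proposed name supervised_training_step_amp; this is a sketch of the idea, not the final implementation):

from torch.cuda.amp import autocast

def supervised_training_step_amp(model, optimizer, loss_fn, scaler=None):
    # factory: returns an update function whose forward pass runs under
    # autocast, with optional loss scaling via a GradScaler instance
    def update(engine, batch):
        model.train()
        x, y = batch
        optimizer.zero_grad()
        with autocast():
            y_pred = model(x)
            loss = loss_fn(y_pred, y)
        if scaler is not None:
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
        else:
            loss.backward()
            optimizer.step()
        return loss.item()
    return update

create_supervised_trainer would then just select the right factory and pass the result to Engine, as in the snippet above.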

cc @sdesrozis any other ideas or thoughts?

@sdesrozis (Contributor) commented

It would be great for users to have these functions; they would be helpful for checking what happens under the hood.

My thoughts on this topic are about the update function. The dream would be to pass a generic function and have automatic (or nearly automatic) tools to adapt it to features like amp, tpu, etc.
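
As a rough sketch of that idea (all names here are hypothetical, nothing from this PR): a generic forward function could be adapted to amp by a small wrapper that puts autocast around the forward pass only, as the torch.cuda.amp docs recommend:

from torch.cuda.amp import autocast

def adapt_to_amp(forward_fn, optimizer, scaler):
    # forward_fn(engine, batch) computes and returns the loss;
    # the wrapper adds autocast, scaling, backward and the optimizer step
    def update(engine, batch):
        optimizer.zero_grad()
        with autocast():
            loss = forward_fn(engine, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        return loss.item()
    return update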

@ydcjeff (Contributor, Author) commented Feb 13, 2021

Shall we also accept a scaler argument, or create one only internally?

@vfdev-5 (Collaborator) left a comment

Thanks for the update!

@vfdev-5 (Collaborator) commented Feb 15, 2021

@sdesrozis can you review the PR, please?

@ydcjeff (Contributor, Author) commented Feb 20, 2021

Thank you @sdesrozis for your review.
If the GPU tests pass, we are ready to merge.

@vfdev-5 (Collaborator) left a comment

Looks good to me as well! Thanks a lot @ydcjeff!
I left a few nit comments about removing TPU mentions where they are inappropriate.
The comment about the warning and adding usage examples of these features could be done in a follow-up PR...

@sdesrozis (Contributor) commented

Could we add tests to decrease the codecov warnings?

@vfdev-5 (Collaborator) commented Feb 20, 2021

> Could we add tests to decrease the codecov warnings?

Let's do that all in a follow-up PR :)

@ydcjeff (Contributor, Author) commented Feb 21, 2021

> Could we add tests to decrease the codecov warnings?

I think those warnings are from one_gpu_tests failing to upload coverage.

> The comment about the warning and adding usage examples of these features could be done in a follow-up PR...

Will do that.

@ydcjeff changed the title from "[create_supervised_trainer] add amp and grad_norm" to "[create_supervised_trainer] add automatic mixed precision" on Feb 21, 2021
@ydcjeff (Contributor, Author) commented Feb 21, 2021

Found out that the amp module is available in torch 1.5, which doesn't have autocast yet.
Changed to catching ImportError to handle all torch versions.
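
The guard presumably boils down to something like this (a sketch; the exact message in the PR may differ):

try:
    from torch.cuda.amp import autocast
except ImportError:
    # torch 1.5 ships a torch.cuda.amp module without autocast, so catching
    # ImportError covers both a missing module and a missing name
    raise ImportError("Please install torch>=1.6.0 to use amp_mode='amp'.")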

@sdesrozis (Contributor) commented

> I think those warnings are from one_gpu_tests failing to upload coverage.

Do we know why it does not work?

@ydcjeff (Contributor, Author) commented Feb 21, 2021

>> I think those warnings are from one_gpu_tests failing to upload coverage.
>
> Do we know why it does not work?

I don't know exactly, but it may be that codecov failed to upload the report to its server.

@vfdev-5 (Collaborator) commented Feb 21, 2021

>> I think those warnings are from one_gpu_tests failing to upload coverage.
>
> Do we know why it does not work?

Asked here: codecov/codecov-bash#411

Anyway, if there is no way to fix the uploading, we can remove the -Z option and silently ignore the uploading issue.

@vfdev-5 (Collaborator) commented Feb 21, 2021

@sdesrozis can you please merge this PR once the CI is OK?

@sdesrozis merged commit f379b18 into pytorch:master on Feb 21, 2021
@ydcjeff deleted the engine/create_supervised_trainer branch on February 21, 2021 at 10:31
@ydcjeff (Contributor, Author) commented Feb 21, 2021

Thank you for your help and reviews.
