Throughput drops when training resnet50 #1491

@wchen61

Hi, @rwightman.
When I train resnet50 on my A100, I notice that throughput drops once in a while.
[image]

I use the following command to train resnet50 on the A100 with NGC 22.04:

python train.py ./imagenet -b 512 --sched cosine --epochs 1 --lr 0.05 --amp --channels-last --model resnet50 -j10 --pin-mem --log-interval 1

If I run benchmark.py instead, I get good results: throughput stays at 2600 img/s.
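
To pin down which steps dip, per-step throughput can be derived from the batch size and the logged step time. The following is only a minimal sketch, not part of train.py or benchmark.py: the helper names `throughput` and `find_throughput_drops`, the `ratio` threshold, and the sample step times are all hypothetical and chosen for illustration.

```python
from statistics import median


def throughput(batch_size, step_seconds):
    """Images per second for a single training step."""
    return batch_size / step_seconds


def find_throughput_drops(step_times, batch_size, ratio=0.5):
    """Return (step_index, img_per_s) for every step whose throughput is
    below `ratio` times the median throughput of the whole run.

    `ratio` is an arbitrary illustrative threshold, not a recommended value.
    """
    rates = [throughput(batch_size, t) for t in step_times]
    baseline = median(rates)
    return [(i, r) for i, r in enumerate(rates) if r < ratio * baseline]


# Made-up step times in seconds (not measurements from this issue).
step_times = [0.25, 0.25, 1.0, 0.25]
drops = find_throughput_drops(step_times, batch_size=512)
print(drops)  # [(2, 512.0)]
```

With `--log-interval 1` every step is logged, so a list like `step_times` could be collected from the batch time reported for each step.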

Labels: bug