accuracy regression for resnet50 in test_train_imagenet.py

Note that https://github.com/pytorch/xla/pull/1012, which adds the resnet50 learning rate scheduler, has not been submitted yet. I've tried two setups, both using the real ImageNet dataset:
1. Brand new compute VM, using the pytorch-nightly conda env
2. Brand new compute VM, pulling from master on GitHub and running build_torch_wheels.sh

Both setups reach ~12% accuracy around epoch 2 or 3 and never get any higher, no matter how many more epochs they train.

I have a different compute VM, created on August 28, that also uses the pytorch-nightly conda env. On that VM I am able to reach 60% accuracy with no changes, and 76% accuracy when I implement a learning rate schedule.
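
For readers unfamiliar with the term, here is a minimal sketch of what a learning rate schedule can look like. It is not the scheduler from the PR linked above and not necessarily the one I used. The function name `step_decay_lr` and all of its defaults (base rate 0.1, 10x decay every 30 epochs) are assumptions, picked because they match a commonly used ResNet ImageNet recipe.

```python
def step_decay_lr(epoch, base_lr=0.1, decay_factor=0.1, epochs_per_step=30):
    """Return the learning rate for a 0-indexed epoch under step decay.

    All defaults are illustrative assumptions, not values taken from
    test_train_imagenet.py or from the PR mentioned above.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    # Multiply the base rate by decay_factor once per completed step.
    completed_steps = epoch // epochs_per_step
    return base_lr * decay_factor ** completed_steps


if __name__ == "__main__":
    # Print the rate at a few epochs around the decay boundaries.
    for epoch in (0, 29, 30, 59, 60):
        print(epoch, step_decay_lr(epoch))
```

In a training loop, the returned value would be written into the optimizer's learning rate at the start of each epoch.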

accuracy regression for resnet50 in test_train_imagenet.py #1025