fp32 training failures in timm_models after enabling optimizer #93490

@desertfire

The problem appeared after #90956.

Repro:

python benchmarks/dynamo/timm_models.py --accuracy --device cuda --backend aot_eager --float32 --training --only volo_d1_224

The failure reproduces on every run for volo_d1_224.

for i in {1..10}; do python benchmarks/dynamo/timm_models.py --accuracy --device cuda --backend aot_eager --float32 --training --only fbnetv3_b; done

The failure is intermittent for fbnetv3_b: only some of the runs fail.
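To put a number on the intermittent fbnetv3_b failure, the loop above could be wrapped in a small script that counts failing runs. This is only a sketch: `count_failures`, `REPRO_CMD`, and the `is_failure` predicate are hypothetical names, and the default predicate assumes that a failing run exits with a non-zero status. If the benchmark script reports accuracy failures only in its output, pass a predicate that inspects the captured stdout instead.

```python
import subprocess
import sys

# The fbnetv3_b repro command from this issue, split into arguments.
REPRO_CMD = [
    sys.executable, "benchmarks/dynamo/timm_models.py",
    "--accuracy", "--device", "cuda", "--backend", "aot_eager",
    "--float32", "--training", "--only", "fbnetv3_b",
]


def _nonzero_exit(proc):
    # Assumption: a failing run exits with a non-zero status code.
    return proc.returncode != 0


def count_failures(cmd, runs, is_failure=_nonzero_exit):
    """Run `cmd` `runs` times and return how many runs `is_failure` flags."""
    failures = 0
    for _ in range(runs):
        # Capture output so a custom predicate can inspect stdout/stderr.
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if is_failure(proc):
            failures += 1
    return failures
```

Calling `count_failures(REPRO_CMD, 10)` from the repository root mirrors the ten-iteration shell loop and returns the number of failing runs, which makes it easier to compare failure rates before and after #90956.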

cc @ezyang @soumith @msaroufim @wconstab @ngimel @bdhirsh
