Track the accuracy regress for HF with max-autotune enabled #109736

@shunting314

🐛 Describe the bug

The HF accuracy check has started to regress (link).

Repro command:

TORCHINDUCTOR_MAX_AUTOTUNE=1 CUDA_VISIBLE_DEVICES=3 python benchmarks/dynamo/huggingface.py --backend inductor --amp --accuracy --only PLBartForCausalLM --training --cold-start-latency

Here are the things I have tried:

  • Built PyTorch and ran the repro on both the old commit (1b3dc05) and the new commit (d8da2a7). Both fail the accuracy check.
  • Rolled Triton back to the older pin and ran the repro on PyTorch commit 1b3dc05. The accuracy check fails.
  • Rolled HuggingFace back to the older pin and ran the repro on PyTorch commit 1b3dc05. The accuracy check passes.

So the cause is most likely the HuggingFace upgrade.
This log line may point to the reason: 'WARNING:common:fp64 golden ref were not generated for PLBartForCausalLM. Setting accuracy check to cosine'
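To illustrate why falling back to a cosine check could change the pass/fail outcome, here is a minimal pure-Python sketch comparing an elementwise tolerance check with a cosine-similarity check. It is not the benchmark harness code: the function names, the 0.99 threshold, the 1e-4 tolerance, and the sample values are all assumptions chosen for illustration.

```python
import math


def cosine_similarity(ref, res, eps=1e-6):
    """Cosine similarity of two equal-length flat lists of floats."""
    dot = sum(a * b for a, b in zip(ref, res))
    norm_ref = math.sqrt(sum(a * a for a in ref))
    norm_res = math.sqrt(sum(b * b for b in res))
    # eps guards against division by a (near-)zero norm product.
    return dot / max(norm_ref * norm_res, eps)


def passes_cosine_check(ref, res, threshold=0.99):
    # threshold=0.99 is an assumed value, not taken from the issue.
    return cosine_similarity(ref, res) >= threshold


def passes_elementwise_check(ref, res, tol=1e-4):
    # tol=1e-4 is an assumed value, not taken from the issue.
    return all(math.isclose(a, b, rel_tol=tol, abs_tol=tol)
               for a, b in zip(ref, res))


# Hypothetical outputs: small per-element drift, same overall direction.
eager = [1.0, 2.0, 3.0, 4.0]
compiled = [1.01, 1.98, 3.02, 3.97]
print(passes_elementwise_check(eager, compiled), passes_cosine_check(eager, compiled))

# Hypothetical near-zero outputs: tiny absolute error, different direction.
eager_small = [1e-5, -1e-5]
compiled_small = [1e-5, 1e-5]
print(passes_elementwise_check(eager_small, compiled_small),
      passes_cosine_check(eager_small, compiled_small))
```

The two criteria can disagree in either direction, so a change in which check is applied can flip the reported result even when the numerics are unchanged. Whether that is what happens for PLBartForCausalLM would still need to be confirmed.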

Error logs

No response

Minified repro

No response

Versions

x

cc @ezyang @msaroufim @wconstab @bdhirsh @anijain2305
