Failing Torchbench Models: tracking issue #5932

ysiraichi · 2023-11-28T20:18:53Z

Summary of Contributions (9th Feb)

Increased the number of TorchBench models that work with Dynamo as a tracer: the pass rates are now comparable to those of torch.compile with Inductor. Some of the fixes also improved the non-Dynamo tracer that PyTorch/XLA used previously.

	Inference	Training
Inductor	87	63
Dynamo	60 to 82	41 to 53
Non-Dynamo	79 to 82	54 to 56

Improved the benchmarking tools used by Google: Google's initial benchmark runs disagreed with the reported results on about 15 models. We identified and fixed more than 10 issues, which helped reconcile Google's benchmarks with the reported results and, in turn, with the PyTorch HUD.

Current State

This post has two lists:

Failing inference models
Failing training models

Each list shows the models that fail in two configurations:

Tracing without Dynamo (Eager-mode)
Tracing with Dynamo into openxla (Dynamo+openxla)

These lists were created using the benchmarking scripts that currently live upstream, by running the following command:

python xla/benchmarks/experiment_runner.py \
       --suite-name torchbench \
       --accelerator cuda \
       --xla PJRT --xla None \
       --dynamo openxla --dynamo inductor --dynamo None \
       --test eval --test train \
       --repeat 30 --iterations-per-run 5 \
       --print-subprocess \
       --no-resume
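
Each repeated flag in the command above contributes one value per occurrence, so a single run covers a grid of configurations. As a rough sketch, the grid can be enumerated as below. The mapping from flag values to the three names used in this thread (Inductor, Non-Dynamo, Dynamo) is an assumption based on how the results are reported here, and the runner itself may skip or reject the remaining combinations.

```python
from itertools import product

# Values taken from the repeated flags in the command above.
XLA = ["PJRT", None]
DYNAMO = ["openxla", "inductor", None]
TEST = ["eval", "train"]

# Assumed mapping from (xla, dynamo) to the names used in this thread.
NAMES = {
    (None, "inductor"): "Inductor",
    ("PJRT", None): "Non-Dynamo",
    ("PJRT", "openxla"): "Dynamo",
}


def reported_configs():
    """Return (name, test) pairs for the combinations reported in this thread."""
    configs = []
    for xla, dynamo, test in product(XLA, DYNAMO, TEST):
        name = NAMES.get((xla, dynamo))
        if name is not None:
            configs.append((name, test))
    return configs


for name, test in reported_configs():
    print(f"{name}: {test}")
```

Under this assumption, three configurations times two tests give the six columns tracked in the weekly pass-rate tables.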

Environment

GPU: A100 40GB

Inference

Non-Dynamo. Pass rate: 78/81 - 96% (against inductor)

~~[x] DALLE2_pytorch~~
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
  - PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
  - PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- Moved to canary models: Upgrade numpy to 2.0.0rc benchmark#2311
cm3leon_generate
- Issue: [torchbench] Benchmarks failing on eager-mode with: TIMEOUT (30 minutes) #6004
hf_Longformer
- Issue: No support for overlapped tensors. #5835
  - PyTorch/XLA PR: Make as_strided_copy materialize a new tensor with index. #6624
hf_T5_generate
- Issue: [torchbench] Benchmarks failing on eager-mode with: TIMEOUT (30 minutes) #6004
moco
- Issue: [torchbench] moco fails to run. #6083
- Issue: [torchbench] Models require initialization on CUDA device. #6011
  - PyTorch/XLA PR: Re-land: Fix model initialization. #6296
  - PyTorch/XLA PR: Fix model initialization. #6076
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
  - PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
  - PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- Issue: [torchbench] moco inference fails to run on dynamo. #7636
- Issue: [torchbench] moco fails to run with CUDA OpenXLA fallback. #7647
nvidia_deeprecommender
- Issue: [torchbench] nvidia_deeprecommender fails to run. #6006
  - PyTorch/XLA PR: Re-land: Fix model initialization. #6296
  - PyTorch/XLA PR: Fix model initialization. #6076
pytorch_CycleGAN_and_pix2pix
- Issue: [torchbench] pytorch_CycleGAN_and_pix2pix fails to run. #6007
  - PyTorch/XLA PR: Re-land: Fix model initialization. #6296
  - PyTorch/XLA PR: Fix model initialization. #6076
~~[ ] simple_gpt~~
- RTX 2060 doesn't support BF16
- Issue: [torchbench] Models require initialization on CUDA device. #6011
  - PyTorch/XLA PR: Re-land: Fix model initialization. #6296
  - PyTorch/XLA PR: Fix model initialization. #6076
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
  - PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
  - PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- SKIP (only works with multiprocess enabled -- torchbench.yaml)
~~[ ] simple_gpt_tp_manual~~
- RTX 2060 doesn't support BF16
- Issue: [torchbench] Models require initialization on CUDA device. #6011
  - PyTorch/XLA PR: Re-land: Fix model initialization. #6296
  - PyTorch/XLA PR: Fix model initialization. #6076
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
  - PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
  - PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- SKIP (inside skip list -- torchbench.yaml)
~~[ ] tacotron2~~
- Issue: [torchbench] tacotron2 fails to run in eager-mode. #6112
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
  - PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
  - PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- SKIP (inside skip list -- torchbench.yaml)
timm_efficientdet
- Issue: [torchbench] Models require initialization on CUDA device. #6011
  - PyTorch/XLA PR: Re-land: Fix model initialization. #6296
  - PyTorch/XLA PR: Fix model initialization. #6076
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
  - PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
  - PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
vision_maskrcnn
- PyTorch/XLA PR: Fix XLA tensor storage device by using XlaDeviceToAtenDevice. #5743
- PyTorch PR: Skip aliasing correction for lift_fresh. pytorch#112202
- Issue: [torchbench] vision_maskrcnn failing on inference with dynamo after bfloat16 conversion. #6557
  - PyTorch/XLA PR: index: fix index of 0-element tensor by 0-element tensor. #7113

Dynamo+`openxla`. Pass rate: 78/81 - 96% (against inductor)

Models also Failing on Inductor

Inference Failing on Inductor CUDA with the Same Error

Benchmarks that raise the same error on inductor:

hf_clip
- 'str' object has no attribute 'shape'
mobilenet_v2_quantized_qat
resnet50_quantized_qat

Inference Failing on Inductor CUDA with Different Errors

simple_gpt
- RTX 2060 doesn't support BF16
- Issue: [torchbench] Models require initialization on CUDA device. #6011
  - PyTorch/XLA PR: Re-land: Fix model initialization. #6296
  - PyTorch/XLA PR: Fix model initialization. #6076
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
  - PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
  - PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- SKIP (only works with multiprocess enabled -- torchbench.yaml)
simple_gpt_tp_manual
- RTX 2060 doesn't support BF16
- Issue: [torchbench] Models require initialization on CUDA device. #6011
  - PyTorch/XLA PR: Re-land: Fix model initialization. #6296
  - PyTorch/XLA PR: Fix model initialization. #6076
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
  - PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
  - PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- SKIP (inside skip list -- torchbench.yaml)
tacotron2
- Issue: [torchbench] Check failed: xtensor #6005
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
  - PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
  - PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- SKIP (inside skip list -- torchbench.yaml)

Training

Non-Dynamo. Pass rate: 64/66 - 96% (against inductor)

Dynamo+`openxla`. Pass rate: 55/66 - 83% (against inductor)
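
The pass rates quoted in this post can be reproduced with a small helper. This is a sketch that assumes "against inductor" means the denominator is the number of models passing on inductor, and that the percentage is truncated rather than rounded (64/66 is shown as 96%, which only truncation yields). The function name is hypothetical.

```python
def pass_rate(passing: int, inductor_passing: int) -> str:
    """Format a pass rate relative to the number of models passing on inductor."""
    # Integer division truncates, matching the figures quoted in this post
    # (64/66 is shown as 96%, not the rounded 97%).
    percent = passing * 100 // inductor_passing
    return f"{passing}/{inductor_passing} - {percent}%"


print(pass_rate(78, 81))  # inference, both configurations
print(pass_rate(64, 66))  # training, non-dynamo
print(pass_rate(55, 66))  # training, dynamo+openxla
```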

Models also Failing on Inductor

No Training Support on Inductor CUDA

Benchmarks that raise the error: Model's DEFAULT_TRAIN_BSIZE is not implemented.

Training Failing on Inductor CUDA with the Same Error

Benchmarks that raise the same error on inductor:

DALLE2_pytorch
- Issue: [torchbench] Training benchmarks failing with: tensor does not require grad #6084
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
  - PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
  - PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- Moved to canary models: Upgrade numpy to 2.0.0rc benchmark#2311
llama_v2_7b_16h
- Issue: [torchbench] Training benchmarks failing with: OOM #6003
- SKIP (training not supported -- torchbench.yaml)
maml
- Issue: [torchbench] Training benchmarks failing with: tensor does not require grad #6084
- SKIP (training not supported -- torchbench.yaml)
vision_maskrcnn
- targets should not be none when in training mode
- Fix Decomposition for upsample_linear{1d, 3d} pytorch#114774

Training Failing on Inductor CUDA with Different Errors

cc @JackCaoG @miladm


lezcano · 2023-12-01T11:52:55Z

State after 7 weeks of work:

Models fixed so far:

pyhpc_isoneutral_mixing
pyhpc_turbulent_kinetic_energy
dlrm
Super_SloMo
speech_transformer

PRs to fix the models. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2023-12-11T13:52:11Z

Weekly update (Dec 1~Dec 10):

Models fixed:

DALLE2_pytorch
- training is now failing with the same error as inductor
stable_diffusion_unet
- training is still failing with OOM
stable_diffusion_text_encoder
hf_GPT2
hf_GPT2_large
- training without dynamo is still failing
yolov3
- Possibly failing due to a cuDNN error, which is likely an OOM, on an RTX 2060. Not yet tested on an A100.

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2023-12-15T19:51:59Z

Weekly update (Dec 11~Dec 15):

Models fixed:

pytorch_CycleGAN_and_pix2pix
nvidia_deeprecommender
- dynamo+openxla training is still failing
simple_gpt and simple_gpt_tp_manual
- failing due to the same reasons as inductor
moco
- failing due to distributed backend
timm_efficientdet
- dynamo+openxla training is still failing

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

miladm · 2024-01-10T18:20:19Z

Can we please add a pass rate table in the weekly report that includes:

Inference

Inductor, Dynamo+PyTorch/XLA:GPU, Non-Dynamo+PyTorch/XLA:GPU

Training

Inductor, Dynamo+PyTorch/XLA:GPU, Non-Dynamo+PyTorch/XLA:GPU

ysiraichi · 2024-01-16T16:05:24Z

Weekly update (Jan 8 ~ Jan 12):

Pass rate (out of 99 benchmarks):

	Inference	Training
Inductor	91	64
Non-Dynamo	87	67
Dynamo	86	57

Models fixed:

detectron2 models (inference with dynamo)
hf_BigBird (inference and training with dynamo)
torch_multimodal_clip (training with dynamo)
timm_vision_transformer (training with dynamo)
Likely not due to the merged PRs below:
- detectron2 models: all but detectron2_fcos_r_50_fpn (training without dynamo)

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

Try creating a bf16 tensor as a last resort of is_bf16_supported(). pytorch#115924

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

test_trace_and_metrics fail if PyTorch has CUDA support. #6292

ysiraichi · 2024-01-23T15:30:36Z

Weekly update (Jan 15 ~ Jan 19):

Pass rate (out of 99 benchmarks):

	Inference	Training
Inductor	85	62
Non-Dynamo	70	57
Dynamo	71	55

Models that started failing:

After Re-land: Fix model initialization. #6296:
- detectron2_fasterrcnn_r_101_c4
- detectron2_fasterrcnn_r_101_dc5
- detectron2_fasterrcnn_r_101_fpn
- detectron2_fasterrcnn_r_50_c4
- detectron2_fasterrcnn_r_50_dc5
- detectron2_fasterrcnn_r_50_fpn
- detectron2_fcos_r_50_fpn
- detectron2_maskrcnn_r_101_c4
- detectron2_maskrcnn_r_101_fpn
- detectron2_maskrcnn_r_50_c4
- detectron2_maskrcnn_r_50_fpn
- mobilenet_v3_large
- timm_regnet
- hf_Bart
Started being skipped:
- pytorch_CycleGAN_and_pix2pix
- pytorch_unet
Unsupported precision:
- pytorch_unet
- yolov3
cuDNN error:
- Super_SloMo (inductor)

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

[torchbench] Detectron2 benchmarks failing to run. #6336

miladm · 2024-01-23T16:52:41Z

Can we track separate pass rate tables for L4 and A100 GPUs going forward, @ysiraichi?

cc @frgossen @golechwierowicz @cota

ysiraichi · 2024-01-29T13:59:26Z

Weekly update (Jan 22 ~ Jan 26):

Pass rate (out of 99 benchmarks):

	Inference	Training
Inductor	88	63
Non-Dynamo	69	57
Dynamo	72	55

Models fixed:

(inductor) moco
(inductor) Super_SloMo
- Failed when executed with all other benchmarks
- Passed when executed alone (by specifying --filter argument)
(inference) llama_v2_7b_16h

Models that started failing:

(inference + non-dynamo) timm_efficientnet (to be fixed by: #6389)
(inference + non-dynamo) timm_nfnet (to be fixed by: #6389)

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-02-05T14:58:19Z

Weekly update (Jan 29 ~ Feb 2):

Pass rate (out of 99 benchmarks):

A100

	Inference	Training
Inductor	87 (last: 88)	63
Non-Dynamo	82 (last: 69)	56 (last: 57)
Dynamo	82 (last: 72)	53 (last: 55)

L4

	Inference	Training
Inductor	86	60
Non-Dynamo	81	53
Dynamo	82	49

Models Summary (for A100)

Inductor: Inference (-4, +3)
- (fail) New skips by PyTorch's torchbench skip list:
  - detectron2_maskrcnn
  - hf_Bert
  - hf_Bert_large
  - maml
- (pass) Remove outdated skip:
  - vision_maskrcnn
- (pass) AMP supported:
  - pytorch_unet
  - yolov3
Inductor: Training (-3, +3)
- (fail) New skips by PyTorch's torchbench skip list:
  - hf_Bert
  - hf_Bert_large
- (fail) Failing due to sparse error:
  - dlrm
- (pass) AMP supported:
  - pytorch_unet
- (pass) No OOM:
  - demucs
  - opacus_cifar10
XLA:GPU (non-dynamo): Inference (-3, +16)
- (fail) New skips by PyTorch's torchbench skip list:
  - detectron2_maskrcnn
  - hf_Bert
  - hf_Bert_large
- (pass) Forcing fp32 precision (while setting XLA_USE_FP16):
  - detectron2 benchmarks (11)
  - mobilenet_v3_large
  - timm_efficientnet
  - timm_nfnet
  - timm_regnet
- (pass) AMP supported:
  - yolov3
XLA:GPU (non-dynamo): Training (-2, +1)
- (fail) New skips by PyTorch's torchbench skip list:
  - hf_Bert
  - hf_Bert_large
- (pass) No OOM:
  - hf_GPT2_large
XLA:GPU (dynamo): Inference (-4, +14)
- (fail) New skips by PyTorch's torchbench skip list:
  - detectron2_maskrcnn
  - hf_Bert
  - hf_Bert_large
  - maml
- (pass) Remove outdated skip:
  - vision_maskrcnn
- (pass) Forcing fp32 precision (while setting XLA_USE_FP16):
  - detectron2 benchmarks (11)
  - hf_Bart
- (pass) AMP supported:
  - yolov3
XLA:GPU (dynamo): Training (-2, +0)
- (fail) New skips by PyTorch's torchbench skip list:
  - hf_Bert
  - hf_Bert_large
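
The two numbers in each summary heading are the counts of newly failing and newly passing models, so their sum should equal the week-over-week change in the A100 table. Below is a quick sanity check with the numbers copied by hand from this update. The Inductor training baseline of 63 comes from the previous week's table, since no "last" value is shown for it, and the variable names are only illustrative.

```python
# (current, last) pass counts from the A100 table in this update.
TABLE = {
    ("Inductor", "Inference"): (87, 88),
    ("Inductor", "Training"): (63, 63),  # "last" taken from the previous week
    ("Non-Dynamo", "Inference"): (82, 69),
    ("Non-Dynamo", "Training"): (56, 57),
    ("Dynamo", "Inference"): (82, 72),
    ("Dynamo", "Training"): (53, 55),
}

# (newly failing, newly passing) counts from the summary headings.
SUMMARY = {
    ("Inductor", "Inference"): (-4, 3),
    ("Inductor", "Training"): (-3, 3),
    ("Non-Dynamo", "Inference"): (-3, 16),
    ("Non-Dynamo", "Training"): (-2, 1),
    ("Dynamo", "Inference"): (-4, 14),
    ("Dynamo", "Training"): (-2, 0),
}


def is_consistent(key):
    """Check that (fail + pass) from the summary equals the table delta."""
    current, last = TABLE[key]
    failed, passed = SUMMARY[key]
    return current - last == failed + passed


for key in TABLE:
    print(key, "ok" if is_consistent(key) else "mismatch")
```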

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Implement shallow copy functions for FunctionalTensorWrapper. pytorch#118783

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-02-12T13:14:59Z

Weekly update (Feb 5 ~ Feb 9):

Pass rate (out of 99 benchmarks):

A100

	Inference	Training
Inductor	87 (last: 87)	63
Non-Dynamo	82 (last: 82)	57 (last: 56)
Dynamo	84 (last: 82)	53 (last: 53)

L4

	Inference	Training
Inductor	86	60
Non-Dynamo	81	53
Dynamo	84	49

Models Summary

XLA:GPU (non-dynamo): Training (0, +1)
- (pass) No OOM:
  - densenet121
XLA:GPU (dynamo): Inference (0, +2)
- (pass) Increased compilation cache:
  - cm3leon_generate
  - hf_T5_generate

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

[benchmarks] Default to bfloat16 (inference) and AMP (training) precision. #6518

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-02-19T14:38:18Z

Weekly update (Feb 12 ~ Feb 16):

Pass rate (out of 99 benchmarks):

Could not run the benchmarks this time, due to a compilation issue: #6564

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-02-26T14:41:32Z

Weekly update (Feb 19 ~ Feb 23):

Pass rate (out of 99 benchmarks):

An error in the benchmarking scripts prevented us from running with XLA: #6612

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-02-27T14:56:38Z

Pass rate (out of 99 benchmarks):

A100

	Inference	Training
Inductor	81 (last: 87)	65 (last: 63)
Non-Dynamo	72 (last: 82)	61 (last: 57)
Dynamo	73 (last: 84)	54 (last: 53)

L4

	Inference	Training
Inductor	81 (last: 86)	62 (last: 60)
Non-Dynamo	71 (last: 81)	57 (last: 53)
Dynamo	73 (last: 84)	52 (last: 49)
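
For anyone who wants to track the week-over-week change programmatically, the table cells can be parsed with a short script. This is only a sketch: it assumes the cell format used in these updates (a count or "?", optionally followed by "(last: N)") and tab-separated columns, and the helper names are made up for illustration.

```python
import re

# A cell is a count (or "?" when the run is missing), optionally
# followed by the previous week's count as "(last: N)".
CELL = re.compile(r"^\s*(\d+|\?)(?:\s*\(last:\s*(\d+)\))?\s*$")


def parse_cell(cell):
    """Return (current, last); either value is None when it is not available."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    current = None if match.group(1) == "?" else int(match.group(1))
    last = int(match.group(2)) if match.group(2) else None
    return current, last


def delta(cell):
    """Return current - last, or None when either value is missing."""
    current, last = parse_cell(cell)
    if current is None or last is None:
        return None
    return current - last


# A100 rows from the table above (tab-separated).
ROWS = (
    "Inductor\t81 (last: 87)\t65 (last: 63)\n"
    "Non-Dynamo\t72 (last: 82)\t61 (last: 57)\n"
    "Dynamo\t73 (last: 84)\t54 (last: 53)"
)

for row in ROWS.splitlines():
    name, inference, training = row.split("\t")
    print(name, delta(inference), delta(training))
```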

Models Summary

Inductor: Inference (-10, +4)
- (fail) "roi_align_forward_kernel" not implemented for 'BFloat16' (after: #6518)
  - detectron2 benchmarks (10)
- (pass) Remove outdated skips
  - hf_Bert and hf_Bert_large
  - maml
  - pytorch_CycleGAN_and_pix2pix
Inductor: Training (-3, +5)
- (fail) Running on AMP (after: #6518)
  - mobilenet_v2_quantized_qat
  - resnet50_quantized_qat
- (pass) Remove outdated skips
  - hf_Bert and hf_Bert_large
  - pytorch_CycleGAN_and_pix2pix
XLA:GPU (non-dynamo): Inference (-15, +5)
- (fail) Error while lowering: aten::upsample_bilinear2d (after: #6518) (issue: #6520)
  - Background_Matting
- (fail) CPU fallback does not work with mixed dtypes (issue: #6336)
  - detectron2 benchmarks (11)
- (fail) Seen floating point types of different precisions in HLO (after: #6518) (issue: #6521)
  - hf_GPT2 and hf_GPT2_large
- (fail) Indices types are not Long (they are Int) (after: #6518) (issue: #6648)
  - llama
- (pass) Remove outdated skips
  - hf_Bert and hf_Bert_large
  - maml
  - pytorch_CycleGAN_and_pix2pix
  - pytorch_unet
XLA:GPU (non-dynamo): Training (0, +4)
- (pass) Remove outdated skips
  - hf_Bert and hf_Bert_large
  - pytorch_CycleGAN_and_pix2pix
  - pytorch_unet
XLA:GPU (dynamo): Inference (-16, +5)
- (fail) expected scalar type Float but found Half (after: #6518) (issue: #6556)
  - Super_SloMo
- (fail) CPU fallback does not work with mixed dtypes (issue: #6336)
  - detectron2 benchmarks (11)
- (fail) Seen floating point types of different precisions in HLO (after: #6518) (issue: #6521)
  - hf_GPT2 and hf_GPT2_large
- (fail) Indices types are not Long (they are Int) (after: #6518) (issue: #6648)
  - llama
- (fail) Slice size at index 0 in gather op is out of range, must be within [0, 1), got 1. (issue: #6557)
  - vision_maskrcnn
XLA:GPU (dynamo): Training (-4, +5)
- (fail) expected scalar type Float but found Half (after: #6518) (issue: #6556)
  - Super_SloMo
- (fail) Seen floating point types of different precisions in HLO (after: #6518)
  - hf_GPT2 and hf_GPT2_large (issue: #6521)
  - timm_nfnet (issue: #6649)
- (pass) Remove outdated skips
  - hf_Bert and hf_Bert_large
  - pytorch_CycleGAN_and_pix2pix
  - pytorch_unet
- (pass) No OOM
  - stable_diffusion_unet

ysiraichi · 2024-03-04T14:44:54Z

Weekly update (Feb 26 ~ Mar 01):

Pass rate (out of 99 benchmarks):

PyTorch commit: d9db9e62e3d2d58d4e76a43f30c15db389e51c17
PyTorch/XLA commit: 5a113af
PyTorch/benchmark commit: 62f4e9c6427b467ba77d06fc9952bf4a28204488

A100

	Inference	Training
Inductor	81 (last: 81)	65 (last: 65)
Non-Dynamo	72 (last: 72)	61 (last: 61)
Dynamo	73 (last: 73)	56 (last: 54)

L4

	Inference	Training
Inductor	81 (last: 81)	63 (last: 62)
Non-Dynamo	72 (last: 71)	58 (last: 57)
Dynamo	71 (last: 73)	54 (last: 52)

Models Summary

XLA:GPU (non-dynamo): Training (-1, +1)
- (fail) Timeout:
  - timm_efficientdet
- (pass) Smaller batch size
  - demucs
XLA:GPU (dynamo): Inference (-2, 0)
- (fail) Timeout:
  - cm3leon_generate
  - hf_T5_generate
XLA:GPU (dynamo): Training (0, +2)
- (pass) Smaller batch size
  - densenet121
  - timm_efficientdet

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-03-11T14:32:55Z

Weekly update (Mar 04 ~ Mar 08):

Pass rate (out of 99 benchmarks):

PyTorch commit: c253d1c1db06beb128f6bb4db861cd08a3c23c6b
PyTorch/XLA commit: 57f4780
PyTorch/benchmark commit: 62f4e9c6427b467ba77d06fc9952bf4a28204488

A100

	Inference	Training
Inductor	81 (last: 81)	66 (last: 65)
Non-Dynamo	72 (last: 72)	61 (last: 61)
Dynamo	71 (last: 71)	57 (last: 56)

L4

	Inference	Training
Inductor	81 (last: 81)	64 (last: 63)
Non-Dynamo	72 (last: 72)	58 (last: 58)
Dynamo	71 (last: 71)	55 (last: 54)

Models Summary (A100)

Inductor: Training (0, +1)
- (pass) Reason unknown
  - dlrm
XLA:GPU (dynamo): Training (0, +1)
- (pass) Tensor.new dynamo support
  - hf_Reformer

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-03-19T19:49:38Z

Weekly update (Mar 11 ~ Mar 15):

Pass rate (out of 99 benchmarks):

PyTorch commit: 5f601a41e0a8c91ecf7ca5e4b95d752166ed9093
PyTorch/XLA commit: dbe2bc2
PyTorch/benchmark commit: 62f4e9c6427b467ba77d06fc9952bf4a28204488

A100

	Inference	Training
Inductor	81 (last: 81)	66 (last: 66)
Non-Dynamo	37 (last: 72)	28 (last: 61)
Dynamo	31 (last: 71)	18 (last: 57)

L4

	Inference	Training
Inductor	81 (last: 81)	64 (last: 63)
Non-Dynamo	45 (last: 72)	38 (last: 58)
Dynamo	44 (last: 71)	22 (last: 55)

Models Summary (A100)

No summary this week because:

Diff is too big
It might be due to a pin update

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

vanbasten23 · 2024-03-21T16:46:49Z

@ysiraichi The regression you saw might be due to #6677 (OpenXLA pin update). Our team is looking into this issue.

ysiraichi · 2024-03-25T15:40:30Z

Weekly update (Mar 18 ~ Mar 21):

Pass rate (out of 99 benchmarks):

PyTorch commit: 5f601a41e0a8c91ecf7ca5e4b95d752166ed9093
PyTorch/XLA commit: dbe2bc2
PyTorch/benchmark commit: 62f4e9c6427b467ba77d06fc9952bf4a28204488

A100

	Inference	Training
Inductor	81 (last: 81)	66 (last: 66)
Non-Dynamo	76 (last: 72)	64 (last: 61)
Dynamo	73 (last: 71)	58 (last: 57)

L4

	Inference	Training
Inductor	80 (last: 81)	64 (last: 64)
Non-Dynamo	76 (last: 72)	61 (last: 58)
Dynamo	74 (last: 71)	56 (last: 55)

Models Summary (A100)

XLA:GPU (non-dynamo): Inference (0, +4)
- (pass) as_strided_copy new implementation
  - hf_Longformer
- (pass) pow data-type promotion fixed
  - hf_GPT2
  - hf_GPT2_large
- (pass) Loosen Embedding index type requirement
  - llama
XLA:GPU (non-dynamo): Training (0, +3)
- (pass) as_strided_copy new implementation
  - hf_Longformer
- (pass) Unknown reason:
  - hf_T5_base
  - timm_efficientdet
XLA:GPU (dynamo): Inference (-2, +4)
- (pass) as_strided_copy new implementation
  - hf_Longformer
- (pass) pow data-type promotion fixed
  - hf_GPT2
  - hf_GPT2_large
- (pass) Loosen Embedding index type requirement
  - llama
- (fail) Unknown reason:
  - doctr_reco_predictor [torchbench] doctr_reco_predictor fails to run inference on dynamo. #6832
  - speech_transformer [torchbench] speech_transformer fails to run on dynamo. #6831
XLA:GPU (dynamo): Training (-2, +3)
- (pass) as_strided_copy new implementation
  - hf_Longformer
- (pass) pow data-type promotion fixed
  - hf_GPT2
  - hf_GPT2_large
- (fail) Unknown reason:
  - hf_Reformer [torchbench] hf_Reformer fails to run training on dynamo. #6830
  - speech_transformer [torchbench] speech_transformer fails to run on dynamo. #6831

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

miladm · 2024-04-01T16:34:21Z

Last week, the results were unchanged.
We are preparing for performance optimizations.
cc @ysiraichi

ysiraichi · 2024-04-08T15:20:31Z

Weekly update (Apr 1 ~ Apr 5):

Pass rate (out of 99 benchmarks):

PyTorch commit: 72662bf05b3499ce96aae9183a489c78f0c44c84
PyTorch/XLA commit: 5c48be1
PyTorch/benchmark commit: d6015d42d9a1834bc7595c4bd6852562fb80b30b

A100

	Inference	Training
Inductor	81 (last: 81)	66 (last: 66)
Non-Dynamo	75 (last: 76)	63 (last: 64)
Dynamo	73 (last: 73)	53 (last: 58)

L4

	Inference	Training
Inductor	82 (last: 80)	65 (last: 64)
Non-Dynamo	75 (last: 76)	61 (last: 61)
Dynamo	74 (last: 74)	51 (last: 56)

Models Summary (A100)

Inductor: Inference (-1, +1)
- (pass) dlrm
- (fail) maml
XLA:GPU (non-dynamo): Inference (-1, 0)
- (fail) timm_efficientdet Adjust allclose atol for the flash attention TPU test #6889
XLA:GPU (non-dynamo): Training (-1, 0)
- (fail) timm_efficientdet: OOM
XLA:GPU (dynamo): Inference (-1, +1)
- (pass) speech_transformer
- (fail) timm_efficientdet [torchbench] timm_efficientdet inference fails to run. #6899
XLA:GPU (dynamo): Training (-7, +2)
- (pass) hf_Reformer and speech_transformer
- (fail) hf_GPT2 and hf_GPT2_large [torchbench] hf_GPT2 and hf_GPT2_large training fails to run on dynamo. #6900
- (fail) hf_T5, hf_T5_base, stable_diffusion_unet, and timm_vision_transformer_large: OOM
- (fail) hf_T5_large [torchbench] hf_T5_large training fails to run on dynamo. #6901

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-04-15T14:56:34Z

Weekly update (Apr 8 ~ Apr 12):

Pass rate (out of 99 benchmarks):

PyTorch commit: f5331aade57725b03c36d5cc6c683f6a6bc0692d
PyTorch/XLA commit: 58a412c
PyTorch/benchmark commit: d6015d42d9a1834bc7595c4bd6852562fb80b30b

A100

	Inference	Training
Inductor	81 (last: 81)	66 (last: 66)
Non-Dynamo	74 (last: 75)	64 (last: 63)
Dynamo	74 (last: 73)	53 (last: 53)

L4

	Inference	Training
Inductor	82 (last: 82)	65 (last: 65)
Non-Dynamo	75 (last: 75)	61 (last: 61)
Dynamo	75 (last: 74)	51 (last: 51)

Models Summary (A100)

XLA:GPU (non-dynamo): Inference (-1, 0)
- (fail) doctr_reco_predictor: TIMEOUT
XLA:GPU (non-dynamo): Training (0, +1)
- (pass) timm_efficientdet
XLA:GPU (dynamo): Inference (0, +1)
- (pass) hf_Reformer

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-04-22T15:10:12Z

Weekly update (Apr 15 ~ Apr 19):

Pass rate (out of 99 benchmarks):

PyTorch commit: f5331aade57725b03c36d5cc6c683f6a6bc0692d
PyTorch/XLA commit: b06c9c7
PyTorch/benchmark commit: d6015d42d9a1834bc7595c4bd6852562fb80b30b

A100

	Inference	Training
Inductor	? (last: 81)	? (last: 66)
Non-Dynamo	? (last: 74)	? (last: 64)
Dynamo	? (last: 74)	? (last: 53)

L4

	Inference	Training
Inductor	82 (last: 82)	65 (last: 65)
Non-Dynamo	76 (last: 75)	61 (last: 61)
Dynamo	76 (last: 75)	51 (last: 51)

Models Summary (A100)

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

Make nms fallback by default. #6933

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-04-28T14:01:25Z

Weekly update (Apr 22 ~ Apr 26):

Pass rate (out of 99 benchmarks):

PyTorch commit: f5331aade57725b03c36d5cc6c683f6a6bc0692d
PyTorch/XLA commit: 2a204e9
PyTorch/benchmark commit: d6015d42d9a1834bc7595c4bd6852562fb80b30b

A100

	Inference	Training
Inductor	81 (last: 81)	66 (last: 66)
Non-Dynamo	75 (last: 74)	64 (last: 64)
Dynamo	75 (last: 74)	53 (last: 53)

L4

	Inference	Training
Inductor	81 (last: 82)	65 (last: 65)
Non-Dynamo	76 (last: 76)	61 (last: 61)
Dynamo	76 (last: 76)	51 (last: 51)

Models Summary (A100)

XLA:GPU (non-dynamo): Inference (0, +1)
- (pass) timm_efficientdet
XLA:GPU (dynamo): Inference (0, +1)
- (pass) timm_efficientdet

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Re-land: dynamo expand test with view-replay. #6958

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

[torchbench] Inductor failing on training #6988

ysiraichi · 2024-05-04T18:44:56Z

Weekly update (Apr 29 ~ May 3):

Pass rate (out of 99 benchmarks):

PyTorch commit: 489b4586e95752dc65a1821a4383b9679ccd5b6b
PyTorch/XLA commit: d123585
PyTorch/benchmark commit: d6015d42d9a1834bc7595c4bd6852562fb80b30b

A100

	Inference	Training
Inductor	81 (last: 81)	66 (last: 66)
Non-Dynamo	76 (last: 75)	64 (last: 64)
Dynamo	75 (last: 75)	53 (last: 53)

L4

	Inference	Training
Inductor	82 (last: 81)	65 (last: 65)
Non-Dynamo	76 (last: 76)	61 (last: 61)
Dynamo	76 (last: 76)	51 (last: 51)

Models Summary (A100)

XLA:GPU (non-dynamo): Inference (0, +1)
- (pass) doctr_reco_predictor

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

Re-land: dynamo expand test with view-replay. #6958

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-05-13T16:07:50Z

Weekly update (May 6 ~ May 10):

Pass rate (out of 99 benchmarks):

CUDA version: 12.1 (before: 11.8)
Python version: 3.10 (before: 3.8)
- Reason: networkx dropped support for Python 3.9 (see issue update)
PyTorch commit: 946b96fd54fdaa05d2f5b1e49d837124fbace983
PyTorch/XLA commit: 40f7e1f
PyTorch/benchmark commit: d6015d42d9a1834bc7595c4bd6852562fb80b30b

A100

	Inference	Training
Inductor	82 (last: 81)	66 (last: 66)
Non-Dynamo	76 (last: 75)	64 (last: 64)
Dynamo	75 (last: 75)	53 (last: 53)

L4

	Inference	Training
Inductor	82 (last: 82)	65 (last: 65)
Non-Dynamo	76 (last: 76)	61 (last: 61)
Dynamo	76 (last: 76)	51 (last: 51)

Notes

Inductor on L4 started failing with: SyntaxError: unterminated string literal
- Oddly enough, A100 didn't have the same error
- Didn't update the results of L4

Models Summary (A100)

Inductor: Inference (0, +1)
- (pass) maml

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

Keep track of ViewMeta with symbolic inputs. pytorch#125876

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-05-20T15:47:43Z

Weekly update (May 13 ~ May 17):

Pass rate (out of 99 benchmarks):

CUDA version: 12.1
Python version: 3.10
PyTorch commit: 8619fe6214cd8f31345ae73c5b90024a0233dc40
PyTorch/XLA commit: 62c3ba6
PyTorch/benchmark commit: d6015d42d9a1834bc7595c4bd6852562fb80b30b

A100

	Inference	Training
Inductor	82 (last: 82)	66 (last: 66)
Non-Dynamo	77 (last: 76)	61 (last: 64)
Dynamo	78 (last: 75)	55 (last: 53)

L4

	Inference	Training
Inductor	82 (last: 82)	65 (last: 65)
Non-Dynamo	77 (last: 76)	59 (last: 61)
Dynamo	78 (last: 76)	52 (last: 51)

Models Summary (A100)

All the differences shown below are likely the result of #7067, which fixes AMP. Reason: (i) training benchmarks use AMP by default; and (ii) some inference benchmarks use AMP instead of bfloat16.

XLA:GPU (non-dynamo): Inference (0, +1)
- (pass) detectron2_fcos_r_50_fpn
XLA:GPU (non-dynamo): Training (-5, +2)
- (fail) Super_SloMo
- (fail) mobilenet_v2_quantized_qat
- (fail) resnet50_quantized_qat
- (fail) timm_efficientdet
- (fail) timm_nfnet
- (pass) stable_diffusion_unet
- (pass) timm_vision_transformer_large
XLA:GPU (dynamo): Inference (0, +3)
- (pass) Super_SloMo
- (pass) detectron2_fcos_r_50_fpn
- (pass) doctr_reco_predictor
XLA:GPU (dynamo): Training (0, +2)
- (pass) Super_SloMo
- (pass) timm_nfnet
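
The `(-N, +M)` pair in each summary heading counts the models that started failing and the models that started passing since the previous run, so `M - N` should equal the change in the pass-rate table. The sketch below reproduces the non-dynamo training entry of this update from two sets of passing models. The function names are hypothetical, and the sets contain only the models that changed state; this is not how the benchmarking scripts generate the summary.

```python
from __future__ import annotations


def diff_passing(previous: set[str], current: set[str]) -> tuple[list[str], list[str]]:
    """Return (newly_failing, newly_passing) between two runs."""
    newly_failing = sorted(previous - current)
    newly_passing = sorted(current - previous)
    return newly_failing, newly_passing


def format_summary(config: str, mode: str, previous: set[str], current: set[str]) -> str:
    """Render a summary in the same layout as the weekly updates."""
    failing, passing = diff_passing(previous, current)
    lines = [f"{config}: {mode} (-{len(failing)}, +{len(passing)})"]
    lines += [f"- (fail) {name}" for name in failing]
    lines += [f"- (pass) {name}" for name in passing]
    return "\n".join(lines)


# Only models whose state changed are listed. Models passing in both
# runs would be in both sets and would not affect the result.
last_week = {
    "Super_SloMo",
    "mobilenet_v2_quantized_qat",
    "resnet50_quantized_qat",
    "timm_efficientdet",
    "timm_nfnet",
}
this_week = {"stable_diffusion_unet", "timm_vision_transformer_large"}

summary = format_summary("XLA:GPU (non-dynamo)", "Training", last_week, this_week)
failing, passing = diff_passing(last_week, this_week)
net_change = len(passing) - len(failing)

print(summary)
print(f"net change: {net_change:+d}")
```

The net change of the example matches the A100 table, where non-dynamo training went from 64 to 61.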

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

[benchmarks] Fix AMP setup for torchbench models. #7067

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-05-25T00:32:43Z

Weekly update (May 20 ~ May 24):

Pass rate (out of 99 benchmarks):

CUDA version: 12.1
Python version: 3.10
PyTorch commit: 8619fe6214cd8f31345ae73c5b90024a0233dc40
PyTorch/XLA commit: cb8533b
PyTorch/benchmark commit: d6015d42d9a1834bc7595c4bd6852562fb80b30b

A100

	Inference	Training
Inductor	82 (last: 82)	66 (last: 66)
Non-Dynamo	77 (last: 77)	63 (last: 61)
Dynamo	78 (last: 78)	55 (last: 55)

L4

	Inference	Training
Inductor	82 (last: 82)	65 (last: 65)
Non-Dynamo	77 (last: 77)	61 (last: 59)
Dynamo	78 (last: 78)	52 (last: 52)

Models Summary (A100)

XLA:GPU (non-dynamo): Training (-0, +2)
- (pass) Super_SloMo #7067
- (pass) timm_efficientdet #7091

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

upsample_bilinear2d HLO returns unexpected data-type. #7095

ysiraichi · 2024-06-03T14:20:14Z

Weekly update (May 27 ~ May 29):

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Re-land: upsample_bilinear: fix output data-type. #7168

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-06-10T15:05:29Z

Weekly update (June 3 ~ June 6):

Pass rate (out of 99 benchmarks):

CUDA version: 12.1
Python version: 3.10
PyTorch commit: f5328542b5365741176e71dd8a2954e0f350b9bc
PyTorch/XLA commit: aec2730
PyTorch/benchmark commit: d6015d42d9a1834bc7595c4bd6852562fb80b30b

A100

	Inference	Training
Inductor	82 (last: 82)	65 (last: 66)
Non-Dynamo	79 (last: 77)	61 (last: 63)
Dynamo	79 (last: 78)	55 (last: 55)

L4

	Inference	Training
Inductor	82 (last: 82)	64 (last: 65)
Non-Dynamo	79 (last: 77)	60 (last: 61)
Dynamo	79 (last: 78)	52 (last: 52)

Models Summary (A100)

Inductor: Training (-1, +0)
- (fail) dlrm
XLA:GPU (non-dynamo): Inference (-0, +2)
- (pass) Background_Matting #7168
- (pass) vision_maskrcnn #7113 #7168
XLA:GPU (non-dynamo): Training (-3, +1)
- (pass) timm_nfnet #7130
- (fail) drq #7247
- (fail) stable_diffusion_unet: OOM
- (fail) timm_vision_transformer_large: OOM
XLA:GPU (dynamo): Inference (-0, +1)
- (pass) vision_maskrcnn #7113 #7116

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

Re-land: upsample_bilinear: fix output data-type. #7168

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-06-17T14:20:58Z

Weekly update (June 10 ~ June 14):

Pass rate (out of 99 benchmarks):

CUDA version: 12.1
Python version: 3.10
PyTorch commit: 0344f95c2ea944cc916290097133470f963a5532
PyTorch/XLA commit: 286b31f
PyTorch/benchmark commit: d6015d42d9a1834bc7595c4bd6852562fb80b30b

A100

	Inference	Training
Inductor	82 (last: 82)	65 (last: 65)
Non-Dynamo	79 (last: 79)	63 (last: 61)
Dynamo	79 (last: 79)	55 (last: 55)

L4

	Inference	Training
Inductor	82 (last: 82)	64 (last: 64)
Non-Dynamo	79 (last: 79)	61 (last: 60)
Dynamo	79 (last: 79)	52 (last: 52)

Models Summary (A100)

XLA:GPU (non-dynamo): Training (-1, +3)
- (pass) drq
- (pass) stable_diffusion_unet
- (pass) timm_vision_transformer_large
- (fail) timm_nfnet #7271

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-06-24T11:30:49Z

Weekly update (June 17 ~ June 21):

Pass rate (out of 99 benchmarks):

CUDA version: 12.1
Python version: 3.10
PyTorch commit: 54b0006cb232f798281397b2261101625444c79b
PyTorch/XLA commit: cb6549a
PyTorch/benchmark commit: 23512dbebd44a11eb84afbf53c3c071dd105297e

A100

	Inference	Training
Inductor	81 (last: 82)	65 (last: 65)
Non-Dynamo	78 (last: 79)	63 (last: 63)
Dynamo	78 (last: 79)	55 (last: 55)

L4

	Inference	Training
Inductor	81 (last: 82)	64 (last: 64)
Non-Dynamo	78 (last: 79)	61 (last: 61)
Dynamo	78 (last: 79)	52 (last: 52)

Models Summary (A100)

XLA:GPU (non-dynamo): Inference (-1, +0)
- (fail) DALLE2_pytorch pytorch/benchmark#2311
XLA:GPU (dynamo): Inference (-1, +0)
- (fail) DALLE2_pytorch pytorch/benchmark#2311

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Introduce CUDA OpenXLA fallback. #7318

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

Sharing tensor storage (with DLPack) results in unexpected behavior. #7304

ysiraichi · 2024-07-01T14:52:04Z

Weekly update (June 24 ~ June 28):

Pass rate (out of 99 benchmarks):

CUDA version: 12.1
Python version: 3.10
PyTorch commit: 5ee893a84acb979c71bb427f53a25b2aa1e7b7ca
PyTorch/XLA commit: 7d41035
PyTorch/benchmark commit: 9e0971c0f34478e6ac10ebd2fa056b5681d3e454

A100

	Inference	Training
Inductor	74 (last: 81)	60 (last: 65)
Non-Dynamo	73 (last: 78)	60 (last: 63)
Dynamo	72 (last: 78)	54 (last: 55)

L4

	Inference	Training
Inductor	74 (last: 81)	59 (last: 64)
Non-Dynamo	73 (last: 78)	58 (last: 61)
Dynamo	72 (last: 78)	51 (last: 52)

Models Summary (A100)

Inductor: Inference (-7, +0)
- (fail) doctr_det_predictor (likely due to newer PyTorch/benchmark commit)
- (fail) doctr_reco_predictor (likely due to newer PyTorch/benchmark commit)
- (fail) hf_T5 (likely due to newer PyTorch/benchmark commit)
- (fail) hf_T5_base (likely due to newer PyTorch/benchmark commit)
- (fail) hf_T5_large (likely due to newer PyTorch/benchmark commit)
- (fail) moco (caused by #7321)
- (fail) soft_actor_critic (likely NumPy 2.0 issue)
Inductor: Training (-5, +0)
- (fail) hf_T5
- (fail) hf_T5_base
- (fail) hf_T5_large
- (fail) moco
- (fail) soft_actor_critic
XLA:GPU (non-dynamo): Inference (-6, +1)
- (fail) doctr_det_predictor
- (fail) doctr_reco_predictor
- (fail) hf_T5
- (fail) hf_T5_base
- (fail) hf_T5_large
- (fail) soft_actor_critic
- (pass) moco
XLA:GPU (non-dynamo): Training (-4, +1)
- (fail) hf_T5
- (fail) hf_T5_base
- (fail) hf_T5_large
- (fail) soft_actor_critic
- (pass) moco
XLA:GPU (dynamo): Inference (-6, +0)
- (fail) doctr_det_predictor
- (fail) doctr_reco_predictor
- (fail) hf_T5
- (fail) hf_T5_base
- (fail) hf_T5_large
- (fail) soft_actor_critic
XLA:GPU (dynamo): Training (-1, +0)
- (fail) soft_actor_critic

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-07-06T21:10:01Z

Weekly update (July 1 ~ July 5):

Pass rate (out of 99 benchmarks):

CUDA version: 12.1
Python version: 3.10
PyTorch commit: 7128504424ca54311efdf22f2c8425291586860e
PyTorch/XLA commit: c782e0d
PyTorch/benchmark commit: 23512dbebd44a11eb84afbf53c3c071dd105297e

A100

	Inference	Training
Inductor	81 (last: 74)	66 (last: 60)
Non-Dynamo	78 (last: 73)	64 (last: 60)
Dynamo	78 (last: 72)	55 (last: 54)

L4

	Inference	Training
Inductor	81 (last: 74)	65 (last: 59)
Non-Dynamo	78 (last: 73)	62 (last: 58)
Dynamo	78 (last: 72)	52 (last: 51)

Models Summary (A100)

Inductor: Inference (-0, +7)
- (pass) doctr_det_predictor (likely due to old PyTorch/benchmark commit)
- (pass) doctr_reco_predictor (likely due to old PyTorch/benchmark commit)
- (pass) hf_T5 (likely due to old PyTorch/benchmark commit)
- (pass) hf_T5_base (likely due to old PyTorch/benchmark commit)
- (pass) hf_T5_large (likely due to old PyTorch/benchmark commit)
- (pass) moco (caused by #7598)
- (pass) soft_actor_critic (likely due to old PyTorch/benchmark commit)
Inductor: Training (-0, +6)
- (pass) dlrm
- (pass) hf_T5
- (pass) hf_T5_base
- (pass) hf_T5_large
- (pass) moco
- (pass) soft_actor_critic
XLA:GPU (non-dynamo): Inference (-1, +6)
- (pass) doctr_det_predictor
- (pass) doctr_reco_predictor
- (pass) hf_T5
- (pass) hf_T5_base
- (pass) hf_T5_large
- (pass) soft_actor_critic
- (fail) moco (needs newer torchbench)
XLA:GPU (non-dynamo): Training (-1, +5)
- (pass) hf_T5
- (pass) hf_T5_base
- (pass) hf_T5_large
- (pass) soft_actor_critic
- (pass) timm_nfnet (fixed in #7602)
- (fail) moco (needs newer torchbench)
XLA:GPU (dynamo): Inference (-0, +6)
- (pass) doctr_det_predictor
- (pass) doctr_reco_predictor
- (pass) hf_T5
- (pass) hf_T5_base
- (pass) hf_T5_large
- (pass) soft_actor_critic
XLA:GPU (dynamo): Training (-0, +1)
- (pass) soft_actor_critic

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Make CUDA OpenXLA fallback the default. #7630

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-07-15T14:57:08Z

Weekly update (July 8 ~ July 12):

Pass rate (out of 99 benchmarks):

CUDA version: 12.1
Python version: 3.10
PyTorch commit: da030e7addfe94f27fb9428245b854bc93f5917f
PyTorch/XLA commit: 1651e76
PyTorch/benchmark commit: 23512dbebd44a11eb84afbf53c3c071dd105297e

A100

	Inference	Training
Inductor	81 (last: 81)	66 (last: 66)
Non-Dynamo	75 (last: 78)	61 (last: 64)
Dynamo	75 (last: 78)	52 (last: 55)

L4

	Inference	Training
Inductor	81 (last: 81)	65 (last: 65)
Non-Dynamo	75 (last: 78)	59 (last: 62)
Dynamo	75 (last: 78)	49 (last: 52)

Models Summary (A100)

XLA:GPU (non-dynamo): Inference (-3, +0)
- (fail) hf_Bart
- (fail) nanogpt
- (fail) torch_multimodal_clip
XLA:GPU (non-dynamo): Training (-3, +0)
- (fail) hf_Bart
- (fail) nanogpt
- (fail) torch_multimodal_clip
XLA:GPU (dynamo): Inference (-3, +0)
- (fail) hf_Bart
- (fail) nanogpt
- (fail) torch_multimodal_clip
XLA:GPU (dynamo): Training (-3, +0)
- (fail) hf_Bart
- (fail) nanogpt
- (fail) torch_multimodal_clip

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-07-22T15:04:54Z

Weekly update (July 15 ~ July 19):

Pass rate (out of 99 benchmarks):

CUDA version: 12.1
Python version: 3.10
PyTorch commit: eee76c86a8462365b7423916607b7a40bfec6f73
PyTorch/XLA commit: 35c537a
PyTorch/benchmark commit: 23512dbebd44a11eb84afbf53c3c071dd105297e

A100

	Inference	Training
Inductor	81 (last: 81)	66 (last: 66)
Non-Dynamo	78 (last: 75)	64 (last: 61)
Dynamo	78 (last: 75)	55 (last: 52)

L4

	Inference	Training
Inductor	81 (last: 81)	65 (last: 65)
Non-Dynamo	78 (last: 75)	62 (last: 59)
Dynamo	78 (last: 75)	52 (last: 49)

Models Summary (A100)

XLA:GPU (non-dynamo): Inference (-0, +3)
- (pass) hf_Bart
- (pass) nanogpt
- (pass) torch_multimodal_clip
XLA:GPU (non-dynamo): Training (-0, +3)
- (pass) hf_Bart
- (pass) nanogpt
- (pass) torch_multimodal_clip
XLA:GPU (dynamo): Inference (-0, +3)
- (pass) hf_Bart
- (pass) nanogpt
- (pass) torch_multimodal_clip
XLA:GPU (dynamo): Training (-0, +3)
- (pass) hf_Bart
- (pass) nanogpt
- (pass) torch_multimodal_clip

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-07-26T20:07:54Z

Weekly update (July 22 ~ July 26):

Pass rate (out of 99 benchmarks):

CUDA version: 12.1
Python version: 3.10
PyTorch commit: c3679bed35dc282606741d6ef06d6d0a21c0cc8a
PyTorch/XLA commit: 2870e93
PyTorch/benchmark commit: 23512dbebd44a11eb84afbf53c3c071dd105297e

A100

	Inference	Training
Inductor	81 (last: 81)	66 (last: 66)
Non-Dynamo	77 (last: 78)	64 (last: 64)
Dynamo	78 (last: 78)	55 (last: 55)

L4

	Inference	Training
Inductor	81 (last: 81)	65 (last: 65)
Non-Dynamo	78 (last: 78)	62 (last: 62)
Dynamo	78 (last: 78)	52 (last: 52)

Models Summary (A100)

XLA:GPU (non-dynamo): Inference (-1, +0)
- (fail) doctr_reco_predictor: timeout

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-08-12T15:03:27Z

Weekly update (July 29 ~ Aug 9):

Pass rate (out of 99 benchmarks):

CUDA version: 12.1
Python version: 3.10
PyTorch commit: 50595ecef4a4f9882a02539019b11a5e50295244
PyTorch/XLA commit: 60b9dfe
PyTorch/benchmark commit: 23512dbebd44a11eb84afbf53c3c071dd105297e

A100

	Inference	Training
Inductor	77 (last: 81)	66 (last: 66)
Non-Dynamo	78 (last: 77)	63 (last: 64)
Dynamo	77 (last: 78)	52 (last: 55)

L4

	Inference	Training
Inductor	77 (last: 81)	65 (last: 65)
Non-Dynamo	78 (last: 78)	62 (last: 62)
Dynamo	77 (last: 78)	45 (last: 52)

Models Summary (A100)

Inductor: Inference (-4, +0)
- (fail) cm3leon_generate (likely due to CUDAGraphs introduction #7749)
- (fail) hf_T5_generate (likely due to CUDAGraphs introduction #7749)
- (fail) llama (likely due to CUDAGraphs introduction #7749)
- (fail) maml (likely due to CUDAGraphs introduction #7749)
XLA:GPU (dynamo): Inference (-1, +0)
- (fail) hf_BigBird
XLA:GPU (dynamo): Training (-3, +0)
- (fail) Background_Matting: OOM
- (fail) hf_BigBird
- (fail) timm_nfnet: OOM

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-08-19T14:37:00Z

Weekly update (Aug 12 ~ Aug 16):

Pass rate (out of 99 benchmarks):

CUDA version: 12.1
Python version: 3.10
PyTorch commit: 99b3b58f39507bb8ad5b4bb1b9bedf7f47b64fa3
PyTorch/XLA commit: 0e35022
PyTorch/benchmark commit: 23512dbebd44a11eb84afbf53c3c071dd105297e

A100

	Inference	Training
Inductor	77 (last: 77)	66 (last: 66)
Non-Dynamo	78 (last: 78)	63 (last: 63)
Dynamo	77 (last: 77)	52 (last: 52)

L4

	Inference	Training
Inductor	77 (last: 77)	65 (last: 65)
Non-Dynamo	78 (last: 78)	62 (last: 62)
Dynamo	77 (last: 77)	44 (last: 45)

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Fix norm data-type when using AMP. #7878

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-08-27T13:49:18Z

Weekly update (Aug 19 ~ Aug 23):

Pass rate (out of 99 benchmarks):

CUDA version: 12.1
Python version: 3.10
PyTorch commit: 2553278bae5993bd94bae4f04bf4586fb3f30d57
PyTorch/XLA commit: 5f82da9
PyTorch/benchmark commit: 23512dbebd44a11eb84afbf53c3c071dd105297e

A100

	Inference	Training
Inductor	77 (last: 77)	66 (last: 66)
Non-Dynamo	78 (last: 78)	63 (last: 63)
Dynamo	77 (last: 77)	49 (last: 52)

L4

	Inference	Training
Inductor	77 (last: 77)	65 (last: 65)
Non-Dynamo	78 (last: 78)	62 (last: 62)
Dynamo	77 (last: 77)	41 (last: 44)

Models Summary (A100)

XLA:GPU (dynamo): Training (-4, +1)
- (fail) basic_gnn_edgecnn
- (fail) basic_gnn_gin
- (fail) basic_gnn_sage
- (fail) stable_diffusion_text_encoder
- (pass) dlrm

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

Fix norm data-type when using AMP. #7878

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

AOTAutograd not tracing AMP correctly when using PyTorch/XLA. pytorch#133924

ysiraichi · 2024-09-03T14:06:03Z

Weekly update (Aug 26 ~ Aug 30):

Pass rate (out of 99 benchmarks):

CUDA version: 12.1
Python version: 3.10
PyTorch commit: 29b7852dc1a85b36688716e27ac3ce0fa06c4b84
PyTorch/XLA commit: 13affb9
PyTorch/benchmark commit: 23512dbebd44a11eb84afbf53c3c071dd105297e

A100

	Inference	Training
Inductor	77 (last: 77)	66 (last: 66)
Non-Dynamo	78 (last: 78)	64 (last: 63)
Dynamo	77 (last: 77)	51 (last: 49)

L4

	Inference	Training
Inductor	77 (last: 77)	65 (last: 65)
Non-Dynamo	78 (last: 78)	63 (last: 62)
Dynamo	77 (last: 77)	48 (last: 41)

Models Summary (A100)

XLA:GPU (non-dynamo): Training (-0, +1)
- (pass) dlrm
XLA:GPU (dynamo): Training (-0, +2)
- (pass) Background_Matting
- (pass) timm_nfnet

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

DynamicShapeDetector with trie implementation. #7918

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi · 2024-09-09T15:04:44Z

Weekly update (Sep 2 ~ Sep 6):

Pass rate (out of 99 benchmarks):

CUDA version: 12.1
Python version: 3.10
PyTorch commit: b143426db3910b8753255a034250ac0c9ea40aa3
PyTorch/XLA commit: 12e5958
PyTorch/benchmark commit: 23512dbebd44a11eb84afbf53c3c071dd105297e

A100

	Inference	Training
Inductor	77 (last: 77)	66 (last: 66)
Non-Dynamo	78 (last: 78)	64 (last: 64)
Dynamo	77 (last: 77)	52 (last: 51)

L4

	Inference	Training
Inductor	77 (last: 77)	65 (last: 65)
Non-Dynamo	78 (last: 78)	63 (last: 63)
Dynamo	77 (last: 77)	49 (last: 48)

Models Summary (A100)

XLA:GPU (dynamo): Training (-0, +1)
- (pass) nvidia_deeprecommender

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

DynamicShapeDetector with trie implementation. #7918

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Set size, stride, and offset of functional tensor. pytorch#135237

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

[benchmarks] dlrm training running twice on dynamo and non-dynamo configurations. #7976

ysiraichi · 2024-09-16T14:58:31Z

Weekly update (Sep 9 ~ Sep 13):

Pass rate (out of 99 benchmarks):

CUDA version: 12.1
Python version: 3.10
PyTorch commit: b4c84c31679286080842236a7b1de8e8339a6963
PyTorch/XLA commit: 9c7f083
PyTorch/benchmark commit: 23512dbebd44a11eb84afbf53c3c071dd105297e

A100

	Inference	Training
Inductor	79 (last: 77)	66 (last: 66)
Non-Dynamo	78 (last: 78)	64 (last: 64)
Dynamo	77 (last: 77)	52 (last: 52)

L4

	Inference	Training
Inductor	79 (last: 77)	65 (last: 65)
Non-Dynamo	78 (last: 78)	63 (last: 63)
Dynamo	77 (last: 77)	49 (last: 49)

Models Summary (A100)

Inductor: Inference (-0, +2)
- (pass) cm3leon_generate
- (pass) hf_T5_generate

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

[benchmarks] Disallow XRT for XLA:CUDA. #8006

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

XLA tensor metadata differs from eager. #7983

ysiraichi · 2024-09-23T13:07:57Z

Weekly update (Sep 16 ~ Sep 20):

Pass rate (out of 99 benchmarks):

CUDA version: 12.1
Python version: 3.10
PyTorch commit: cf31724db726ad210fc6638f9873e041c33c9034
PyTorch/XLA commit: d0ea5cc
PyTorch/benchmark commit: 23512dbebd44a11eb84afbf53c3c071dd105297e

A100

	Inference	Training
Inductor	79 (last: 79)	66 (last: 66)
Non-Dynamo	78 (last: 78)	64 (last: 64)
Dynamo	77 (last: 77)	52 (last: 52)

L4

	Inference	Training
Inductor	79 (last: 79)	65 (last: 65)
Non-Dynamo	78 (last: 78)	63 (last: 63)
Dynamo	77 (last: 77)	49 (last: 49)
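
The first post of this issue reports pass rates relative to Inductor (for example, `78/81 - 96%`). The same ratio can be derived from any weekly table. The sketch below does this for the A100 numbers of this update. The helper name is hypothetical, and rounding down to a whole percent is an assumption, not something the benchmarking scripts are known to do.

```python
def pass_rate_vs_inductor(passing: int, inductor_passing: int) -> int:
    """Percentage of Inductor-passing models that also pass, rounded down (assumption)."""
    if inductor_passing <= 0:
        raise ValueError("Inductor must pass at least one model")
    return passing * 100 // inductor_passing


# A100 counts from this update as (inference, training).
a100 = {
    "Inductor": (79, 66),
    "Non-Dynamo": (78, 64),
    "Dynamo": (77, 52),
}

inductor_inference, inductor_training = a100["Inductor"]

# Relative pass rates per configuration as (inference %, training %).
rates = {
    config: (
        pass_rate_vs_inductor(inference, inductor_inference),
        pass_rate_vs_inductor(training, inductor_training),
    )
    for config, (inference, training) in a100.items()
    if config != "Inductor"
}

for config, (inference_rate, training_rate) in rates.items():
    print(f"{config}: inference {inference_rate}%, training {training_rate}% (against inductor)")
```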

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

Remove contiguity assertion from XLATensorImpl. #7998

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

XLA tensor metadata differs from eager. #7983

lezcano changed the title ~~Torchbench benchmarks: tracking issue~~ Failing Torchbench benchmarks: tracking issue Nov 28, 2023

miladm assigned ysiraichi Dec 1, 2023

miladm added the xla:gpu label Dec 1, 2023

lezcano changed the title ~~Failing Torchbench benchmarks: tracking issue~~ Failing Torchbench Models: tracking issue Dec 1, 2023

ysiraichi mentioned this issue Jan 22, 2024

disable some models for torchbench #6352

Merged

pytorch deleted a comment from ysiraichi Feb 9, 2024

ysiraichi mentioned this issue May 20, 2024

[torchbench] hf_T5_large training fails to run on dynamo. #6901

Open

