ASGD perf regression #1886

Open
github-actions bot opened this issue Sep 9, 2023 · 1 comment

Comments

github-actions bot commented Sep 9, 2023

TorchBench CI has detected a performance signal or runtime regression.

Base PyTorch commit: 0200b1106c4fe80ea0884181dc8d649ef6078ea3

Affected PyTorch commit: 806d1a871ddfd2d38e1791489892009feaec8425

Affected Tests:

  • resnet50, ASGD, cuda, default: +124.02993%
  • resnet50, ASGD, cuda, maximize: +91.23220%
  • resnet50, ASGD, cuda, no_foreach: +123.98829%
  • resnet50, ASGD, cuda, differentiable: +119.85504%
  • resnet50, ASGD, cuda, foreach: +105.76765%
  • mnasnet1_0, ASGD, cuda, default: +115.15906%
  • mnasnet1_0, ASGD, cuda, maximize: +93.84647%
  • mnasnet1_0, ASGD, cuda, no_foreach: +138.73016%
  • mnasnet1_0, ASGD, cuda, differentiable: +121.80861%
  • mnasnet1_0, ASGD, cuda, foreach: +104.30969%
  • squeezenet1_1, ASGD, cuda, default: +94.39419%
  • squeezenet1_1, ASGD, cuda, maximize: +86.72449%
  • squeezenet1_1, ASGD, cuda, no_foreach: +119.51013%
  • squeezenet1_1, ASGD, cuda, differentiable: +113.43829%
  • squeezenet1_1, ASGD, cuda, foreach: +102.96500%
  • sam, ASGD, cuda, default: +195.67645%
  • sam, ASGD, cuda, maximize: +181.55680%
  • sam, ASGD, cuda, no_foreach: +140.95377%
  • sam, ASGD, cuda, differentiable: +147.18748%
  • sam, ASGD, cuda, foreach: +193.14937%
  • vgg16, ASGD, cuda, default: +128.22068%
  • vgg16, ASGD, cuda, maximize: +95.99951%
  • vgg16, ASGD, cuda, no_foreach: +129.91319%
  • vgg16, ASGD, cuda, differentiable: +129.00090%
  • vgg16, ASGD, cuda, foreach: +126.44894%
  • timm_vision_transformer, Adadelta, cuda, (pt2) foreach: -99.98788%
  • timm_vision_transformer, Adam, cuda, (pt2) fused: +82.68205%
  • timm_vision_transformer, AdamW, cuda, (pt2) no_foreach: -99.98358%
  • timm_vision_transformer, AdamW, cuda, (pt2) foreach: -67.46933%
  • timm_vision_transformer, AdamW, cuda, (pt2) fused: +71.99635%
  • timm_vision_transformer, ASGD, cuda, default: +106.71975%
  • timm_vision_transformer, ASGD, cuda, maximize: +96.59117%
  • timm_vision_transformer, ASGD, cuda, (pt2) no_foreach: +143.52541%
  • timm_vision_transformer, ASGD, cuda, no_foreach: +91.65669%
  • timm_vision_transformer, ASGD, cuda, differentiable: +116.06366%
  • timm_vision_transformer, ASGD, cuda, foreach: +115.47567%
  • compile_time, timm_vision_transformer, Adadelta, cuda, (pt2) foreach: +37807.53782%
  • compile_time, timm_vision_transformer, AdamW, cuda, (pt2) no_foreach: +7077.47102%
  • compile_time, timm_vision_transformer, AdamW, cuda, (pt2) fused: +361.28020%
  • compile_time, timm_vision_transformer, Adamax, cuda, (pt2) foreach: +217.33371%
  • compile_time, timm_vision_transformer, SGD, cuda, (pt2) foreach: +813.99024%
  • timm_vovnet, ASGD, cuda, default: +79.70207%
  • timm_vovnet, ASGD, cuda, maximize: +81.93381%
  • timm_vovnet, ASGD, cuda, no_foreach: +92.76432%
  • timm_vovnet, ASGD, cuda, differentiable: +100.97167%
  • timm_vovnet, ASGD, cuda, foreach: +79.48599%
  • speech_transformer, ASGD, cuda, default: +98.57801%
  • speech_transformer, ASGD, cuda, maximize: +92.68815%
  • speech_transformer, ASGD, cuda, no_foreach: +113.97402%
  • speech_transformer, ASGD, cuda, differentiable: +109.36097%
  • speech_transformer, ASGD, cuda, foreach: +99.81050%
  • basic_gnn_sage, ASGD, cuda, default: +86.73222%
  • basic_gnn_sage, ASGD, cuda, maximize: +75.81285%
  • basic_gnn_sage, ASGD, cuda, no_foreach: +116.76823%
  • basic_gnn_sage, ASGD, cuda, differentiable: +106.38985%
  • basic_gnn_sage, ASGD, cuda, foreach: +96.91066%
  • hf_T5, ASGD, cuda, default: +127.93302%
  • hf_T5, ASGD, cuda, maximize: +114.74867%
  • hf_T5, ASGD, cuda, no_foreach: +131.01003%
  • hf_T5, ASGD, cuda, differentiable: +127.07553%
  • hf_T5, ASGD, cuda, foreach: +155.85461%
  • mobilenet_v2_quantized_qat, ASGD, cuda, default: +95.82244%
  • mobilenet_v2_quantized_qat, ASGD, cuda, maximize: +86.96235%
  • mobilenet_v2_quantized_qat, ASGD, cuda, no_foreach: +107.97934%
  • mobilenet_v2_quantized_qat, ASGD, cuda, differentiable: +116.05228%
  • mobilenet_v2_quantized_qat, ASGD, cuda, foreach: +106.87671%
  • maml_omniglot, ASGD, cuda, default: +87.10180%
  • maml_omniglot, ASGD, cuda, maximize: +75.81007%
  • maml_omniglot, ASGD, cuda, no_foreach: +109.66096%
  • maml_omniglot, ASGD, cuda, differentiable: +116.56157%
  • maml_omniglot, ASGD, cuda, foreach: +90.88092%
  • alexnet, ASGD, cuda, default: +142.00344%
  • alexnet, ASGD, cuda, maximize: +117.11555%
  • alexnet, ASGD, cuda, no_foreach: +148.61658%
  • alexnet, ASGD, cuda, differentiable: +147.32024%
  • alexnet, ASGD, cuda, foreach: +140.08364%
  • opacus_cifar10, ASGD, cuda, default: +88.46431%
  • opacus_cifar10, ASGD, cuda, maximize: +77.82195%
  • opacus_cifar10, ASGD, cuda, no_foreach: +114.27061%
  • opacus_cifar10, ASGD, cuda, differentiable: +109.31896%
  • opacus_cifar10, ASGD, cuda, foreach: +84.54530%
  • hf_Bert, ASGD, cuda, default: +142.73106%
  • hf_Bert, ASGD, cuda, maximize: +117.33846%
  • hf_Bert, ASGD, cuda, no_foreach: +117.59002%
  • hf_Bert, ASGD, cuda, differentiable: +109.75356%
  • hf_Bert, ASGD, cuda, foreach: +137.88056%
  • timm_vision_transformer_large, Adadelta, cuda, (pt2) no_foreach: -99.98482%
  • timm_vision_transformer_large, Adagrad, cuda, (pt2) default: +31.17705%
  • timm_vision_transformer_large, Adam, cuda, (pt2) default: +33.26611%
  • timm_vision_transformer_large, Adam, cuda, (pt2) no_foreach: +41.87344%
  • timm_vision_transformer_large, Adam, cuda, (pt2) foreach: +64.34802%
  • timm_vision_transformer_large, AdamW, cuda, (pt2) default: +37.09363%
  • timm_vision_transformer_large, AdamW, cuda, (pt2) foreach: +41.07935%
  • timm_vision_transformer_large, ASGD, cuda, default: +123.28451%
  • timm_vision_transformer_large, ASGD, cuda, maximize: +140.95934%
  • timm_vision_transformer_large, ASGD, cuda, no_foreach: +149.50390%
  • timm_vision_transformer_large, ASGD, cuda, differentiable: +218.22416%
  • timm_vision_transformer_large, ASGD, cuda, foreach: +124.79011%
  • timm_vision_transformer_large, Rprop, cuda, (pt2) default: +31.40632%
  • compile_time, timm_vision_transformer_large, Adadelta, cuda, (pt2) no_foreach: -7716.53938%
  • hf_T5_large, ASGD, cuda, default: +211.69880%
  • hf_T5_large, ASGD, cuda, maximize: +169.96072%
  • hf_T5_large, ASGD, cuda, no_foreach: +141.02743%
  • hf_T5_large, ASGD, cuda, differentiable: +143.63304%
  • hf_T5_large, ASGD, cuda, foreach: +208.19100%
  • vision_maskrcnn, ASGD, cuda, default: +100.31787%
  • vision_maskrcnn, ASGD, cuda, maximize: +89.12360%
  • vision_maskrcnn, ASGD, cuda, no_foreach: +98.99280%
  • vision_maskrcnn, ASGD, cuda, differentiable: +99.44593%
  • vision_maskrcnn, ASGD, cuda, foreach: +101.92607%
  • hf_GPT2_large, ASGD, cuda, default: +193.84464%
  • hf_GPT2_large, ASGD, cuda, maximize: +153.89762%
  • hf_GPT2_large, ASGD, cuda, no_foreach: +158.69141%
  • hf_GPT2_large, ASGD, cuda, differentiable: +163.62273%
  • hf_GPT2_large, ASGD, cuda, foreach: +192.48542%
  • BERT_pytorch, ASGD, cuda, default: +135.65602%
  • BERT_pytorch, ASGD, cuda, maximize: +119.74721%
  • BERT_pytorch, ASGD, cuda, no_foreach: +125.09949%
  • BERT_pytorch, ASGD, cuda, differentiable: +109.46087%
  • BERT_pytorch, ASGD, cuda, foreach: +133.43466%
  • detectron2_fasterrcnn_r_50_fpn, ASGD, cuda, default: +122.49460%
  • detectron2_fasterrcnn_r_50_fpn, ASGD, cuda, maximize: +103.09817%
  • detectron2_fasterrcnn_r_50_fpn, ASGD, cuda, no_foreach: +118.01190%
  • detectron2_fasterrcnn_r_50_fpn, ASGD, cuda, differentiable: +120.14865%
  • detectron2_fasterrcnn_r_50_fpn, ASGD, cuda, foreach: +121.01315%
  • dlrm, ASGD, cuda, default: +78.09405%
  • dlrm, ASGD, cuda, maximize: +59.68144%
  • dlrm, ASGD, cuda, foreach: +78.35686%
  • fastNLP_Bert, ASGD, cuda, default: +124.20659%
  • fastNLP_Bert, ASGD, cuda, maximize: +129.74865%
  • fastNLP_Bert, ASGD, cuda, no_foreach: +131.86236%
  • fastNLP_Bert, ASGD, cuda, differentiable: +124.50400%
  • fastNLP_Bert, ASGD, cuda, foreach: +129.15546%
  • basic_gnn_gcn, ASGD, cuda, default: +95.32429%
  • basic_gnn_gcn, ASGD, cuda, maximize: +87.14749%
  • basic_gnn_gcn, ASGD, cuda, no_foreach: +110.78651%
  • basic_gnn_gcn, ASGD, cuda, differentiable: +111.55591%
  • basic_gnn_gcn, ASGD, cuda, foreach: +107.76394%
  • phlippe_resnet, ASGD, cuda, default: +93.23522%
  • phlippe_resnet, ASGD, cuda, maximize: +86.87177%
  • phlippe_resnet, ASGD, cuda, no_foreach: +130.84100%
  • phlippe_resnet, ASGD, cuda, differentiable: +115.97482%
  • phlippe_resnet, ASGD, cuda, foreach: +98.55166%
  • timm_resnest, ASGD, cuda, default: +103.57621%
  • timm_resnest, ASGD, cuda, maximize: +107.16194%
  • timm_resnest, ASGD, cuda, no_foreach: +119.73967%
  • timm_resnest, ASGD, cuda, differentiable: +129.29690%
  • timm_resnest, ASGD, cuda, foreach: +118.03992%
  • basic_gnn_gin, ASGD, cuda, default: +115.13085%
  • basic_gnn_gin, ASGD, cuda, maximize: +83.92329%
  • basic_gnn_gin, ASGD, cuda, no_foreach: +107.37685%
  • basic_gnn_gin, ASGD, cuda, differentiable: +111.10815%
  • basic_gnn_gin, ASGD, cuda, foreach: +101.07616%
  • resnet50_quantized_qat, ASGD, cuda, default: +81.96953%
  • resnet50_quantized_qat, ASGD, cuda, maximize: +73.56566%
  • resnet50_quantized_qat, ASGD, cuda, no_foreach: +126.73629%
  • resnet50_quantized_qat, ASGD, cuda, differentiable: +108.65208%
  • resnet50_quantized_qat, ASGD, cuda, foreach: +90.47366%
  • Background_Matting, ASGD, cuda, default: +85.40617%
  • Background_Matting, ASGD, cuda, maximize: +80.75659%
  • Background_Matting, ASGD, cuda, no_foreach: +104.39196%
  • Background_Matting, ASGD, cuda, differentiable: +131.27337%
  • Background_Matting, ASGD, cuda, foreach: +95.64608%
  • tacotron2, ASGD, cuda, default: +119.66504%
  • tacotron2, ASGD, cuda, maximize: +107.00849%
  • tacotron2, ASGD, cuda, no_foreach: +140.78569%
  • tacotron2, ASGD, cuda, differentiable: +129.29204%
  • tacotron2, ASGD, cuda, foreach: +121.62964%
  • llama, ASGD, cuda, default: -37.89294%
  • llama, ASGD, cuda, maximize: -37.69863%
  • llama, ASGD, cuda, foreach: -39.76405%
  • demucs, ASGD, cuda, default: +133.49940%
  • demucs, ASGD, cuda, maximize: +100.55682%
  • demucs, ASGD, cuda, no_foreach: +145.58503%
  • demucs, ASGD, cuda, differentiable: +136.17159%
  • demucs, ASGD, cuda, foreach: +129.51777%
  • pytorch_unet, ASGD, cuda, default: +89.85632%
  • pytorch_unet, ASGD, cuda, maximize: +88.41714%
  • pytorch_unet, ASGD, cuda, no_foreach: +133.38601%
  • pytorch_unet, ASGD, cuda, differentiable: +124.86886%
  • pytorch_unet, ASGD, cuda, foreach: +95.00974%
  • hf_Albert, ASGD, cuda, default: +105.88921%
  • hf_Albert, ASGD, cuda, maximize: +95.25225%
  • hf_Albert, ASGD, cuda, no_foreach: +113.10521%
  • hf_Albert, ASGD, cuda, differentiable: +112.50182%
  • hf_Albert, ASGD, cuda, foreach: +121.51764%
  • tts_angular, ASGD, cuda, default: +111.68752%
  • tts_angular, ASGD, cuda, maximize: +91.98089%
  • tts_angular, ASGD, cuda, no_foreach: +106.44684%
  • tts_angular, ASGD, cuda, differentiable: +109.35268%
  • tts_angular, ASGD, cuda, foreach: +124.90487%
  • timm_nfnet, ASGD, cuda, default: +109.12394%
  • timm_nfnet, ASGD, cuda, maximize: +99.66265%
  • timm_nfnet, ASGD, cuda, no_foreach: +112.02449%
  • timm_nfnet, ASGD, cuda, differentiable: +120.72541%
  • timm_nfnet, ASGD, cuda, foreach: +97.20528%
  • dcgan, ASGD, cuda, default: +80.42478%
  • dcgan, ASGD, cuda, maximize: +74.82730%
  • dcgan, ASGD, cuda, no_foreach: +93.25956%
  • dcgan, ASGD, cuda, differentiable: +106.39970%
  • dcgan, ASGD, cuda, foreach: +84.90491%
  • moco, ASGD, cuda, default: +85.82898%
  • moco, ASGD, cuda, maximize: +79.96736%
  • moco, ASGD, cuda, no_foreach: +122.59127%
  • moco, ASGD, cuda, differentiable: +116.06429%
  • moco, ASGD, cuda, foreach: +100.20179%
  • detectron2_maskrcnn_r_101_fpn, ASGD, cuda, default: +121.34472%
  • detectron2_maskrcnn_r_101_fpn, ASGD, cuda, maximize: +102.83182%
  • detectron2_maskrcnn_r_101_fpn, ASGD, cuda, no_foreach: +119.60952%
  • detectron2_maskrcnn_r_101_fpn, ASGD, cuda, differentiable: +106.19279%
  • detectron2_maskrcnn_r_101_fpn, ASGD, cuda, foreach: +107.71527%
  • detectron2_maskrcnn, ASGD, cuda, default: +117.57517%
  • detectron2_maskrcnn, ASGD, cuda, maximize: +103.30191%
  • detectron2_maskrcnn, ASGD, cuda, no_foreach: +116.90257%
  • detectron2_maskrcnn, ASGD, cuda, differentiable: +117.56160%
  • detectron2_maskrcnn, ASGD, cuda, foreach: +110.69690%
  • mobilenet_v2, ASGD, cuda, default: +96.95437%
  • mobilenet_v2, ASGD, cuda, maximize: +91.66412%
  • mobilenet_v2, ASGD, cuda, no_foreach: +105.93441%
  • mobilenet_v2, ASGD, cuda, differentiable: +106.24384%
  • mobilenet_v2, ASGD, cuda, foreach: +101.98825%
  • phlippe_densenet, ASGD, cuda, default: +90.63482%
  • phlippe_densenet, ASGD, cuda, maximize: +82.98210%
  • phlippe_densenet, ASGD, cuda, no_foreach: +109.94099%
  • phlippe_densenet, ASGD, cuda, differentiable: +99.62891%
  • phlippe_densenet, ASGD, cuda, foreach: +102.07940%
  • stable_diffusion, ASGD, cuda, default: +173.16434%
  • stable_diffusion, ASGD, cuda, maximize: +172.79215%
  • stable_diffusion, ASGD, cuda, no_foreach: +133.62548%
  • stable_diffusion, ASGD, cuda, differentiable: +150.74043%
  • stable_diffusion, ASGD, cuda, foreach: +193.07518%
  • detectron2_fasterrcnn_r_101_dc5, ASGD, cuda, default: +212.36047%
  • detectron2_fasterrcnn_r_101_dc5, ASGD, cuda, maximize: +183.13374%
  • detectron2_fasterrcnn_r_101_dc5, ASGD, cuda, no_foreach: +157.73100%
  • detectron2_fasterrcnn_r_101_dc5, ASGD, cuda, differentiable: +158.22935%
  • detectron2_fasterrcnn_r_101_dc5, ASGD, cuda, foreach: +214.21778%
  • Super_SloMo, ASGD, cuda, default: +106.84278%
  • Super_SloMo, ASGD, cuda, maximize: +91.77756%
  • Super_SloMo, ASGD, cuda, no_foreach: +120.37297%
  • Super_SloMo, ASGD, cuda, differentiable: +107.75837%
  • Super_SloMo, ASGD, cuda, foreach: +105.20420%
  • timm_efficientnet, ASGD, cuda, default: +100.11609%
  • timm_efficientnet, ASGD, cuda, maximize: +96.02220%
  • timm_efficientnet, ASGD, cuda, no_foreach: +117.61484%
  • timm_efficientnet, ASGD, cuda, differentiable: +126.42353%
  • timm_efficientnet, ASGD, cuda, foreach: +104.98403%
  • shufflenet_v2_x1_0, ASGD, cuda, default: +113.97946%
  • shufflenet_v2_x1_0, ASGD, cuda, maximize: +94.74763%
  • shufflenet_v2_x1_0, ASGD, cuda, no_foreach: +126.35174%
  • shufflenet_v2_x1_0, ASGD, cuda, differentiable: +110.48815%
  • shufflenet_v2_x1_0, ASGD, cuda, foreach: +104.07755%
  • yolov3, ASGD, cuda, default: +78.25093%
  • yolov3, ASGD, cuda, maximize: +77.13986%
  • yolov3, ASGD, cuda, no_foreach: +93.41386%
  • yolov3, ASGD, cuda, differentiable: +90.89548%
  • yolov3, ASGD, cuda, foreach: +80.57876%
  • basic_gnn_edgecnn, ASGD, cuda, default: +99.06302%
  • basic_gnn_edgecnn, ASGD, cuda, maximize: +83.71059%
  • basic_gnn_edgecnn, ASGD, cuda, no_foreach: +113.92414%
  • basic_gnn_edgecnn, ASGD, cuda, differentiable: +110.96076%
  • basic_gnn_edgecnn, ASGD, cuda, foreach: +98.60346%
  • hf_Reformer, ASGD, cuda, default: +91.67268%
  • hf_Reformer, ASGD, cuda, maximize: +82.64084%
  • hf_Reformer, ASGD, cuda, no_foreach: +125.85424%
  • hf_Reformer, ASGD, cuda, differentiable: +105.94022%
  • hf_Reformer, ASGD, cuda, foreach: +90.47308%
  • fambench_xlmr, ASGD, cuda, default: +201.16636%
  • fambench_xlmr, ASGD, cuda, maximize: +188.32381%
  • fambench_xlmr, ASGD, cuda, no_foreach: +140.31486%
  • fambench_xlmr, ASGD, cuda, differentiable: +148.74302%
  • fambench_xlmr, ASGD, cuda, foreach: +203.57251%
  • hf_Bert_large, ASGD, cuda, default: +165.08672%
  • hf_Bert_large, ASGD, cuda, maximize: +151.30732%
  • hf_Bert_large, ASGD, cuda, no_foreach: +128.21581%
  • hf_Bert_large, ASGD, cuda, differentiable: +126.48303%
  • hf_Bert_large, ASGD, cuda, foreach: +158.99745%
  • hf_GPT2, ASGD, cuda, default: +160.72487%
  • hf_GPT2, ASGD, cuda, maximize: +160.21525%
  • hf_GPT2, ASGD, cuda, no_foreach: +138.23310%
  • hf_GPT2, ASGD, cuda, differentiable: +133.31532%
  • hf_GPT2, ASGD, cuda, foreach: +173.83932%
  • pytorch_stargan, ASGD, cuda, default: +87.11727%
  • pytorch_stargan, ASGD, cuda, maximize: +79.06760%
  • pytorch_stargan, ASGD, cuda, no_foreach: +114.98173%
  • pytorch_stargan, ASGD, cuda, differentiable: +109.09947%
  • pytorch_stargan, ASGD, cuda, foreach: +95.94788%
  • nanogpt_generate, ASGD, cuda, default: +159.68400%
  • nanogpt_generate, ASGD, cuda, maximize: +150.49595%
  • nanogpt_generate, ASGD, cuda, no_foreach: +138.45369%
  • nanogpt_generate, ASGD, cuda, differentiable: +136.21444%
  • nanogpt_generate, ASGD, cuda, foreach: +172.49098%
  • resnet152, ASGD, cuda, default: +104.25649%
  • resnet152, ASGD, cuda, maximize: +96.57891%
  • resnet152, ASGD, cuda, no_foreach: +119.36834%
  • resnet152, ASGD, cuda, differentiable: +121.75067%
  • resnet152, ASGD, cuda, foreach: +107.04176%
  • hf_Whisper, ASGD, cuda, default: +103.45888%
  • hf_Whisper, ASGD, cuda, maximize: +89.34926%
  • hf_Whisper, ASGD, cuda, no_foreach: +125.70960%
  • hf_Whisper, ASGD, cuda, differentiable: +123.69277%
  • hf_Whisper, ASGD, cuda, foreach: +109.05051%
  • maml, ASGD, cuda, default: +92.78801%
  • maml, ASGD, cuda, maximize: +82.35410%
  • maml, ASGD, cuda, no_foreach: +100.27183%
  • maml, ASGD, cuda, differentiable: +109.62979%
  • maml, ASGD, cuda, foreach: +98.58460%
  • detectron2_fasterrcnn_r_50_dc5, ASGD, cuda, default: +171.64929%
  • detectron2_fasterrcnn_r_50_dc5, ASGD, cuda, maximize: +138.27427%
  • detectron2_fasterrcnn_r_50_dc5, ASGD, cuda, no_foreach: +179.65960%
  • detectron2_fasterrcnn_r_50_dc5, ASGD, cuda, differentiable: +187.33681%
  • detectron2_fasterrcnn_r_50_dc5, ASGD, cuda, foreach: +167.31098%
  • hf_Bart, ASGD, cuda, default: +138.82232%
  • hf_Bart, ASGD, cuda, maximize: +130.23887%
  • hf_Bart, ASGD, cuda, no_foreach: +123.55852%
  • hf_Bart, ASGD, cuda, differentiable: +130.70510%
  • hf_Bart, ASGD, cuda, foreach: +134.35986%
  • cm3leon_generate, ASGD, cuda, default: +198.06865%
  • cm3leon_generate, ASGD, cuda, maximize: +151.59751%
  • cm3leon_generate, ASGD, cuda, no_foreach: +154.95043%
  • cm3leon_generate, ASGD, cuda, differentiable: +165.66967%
  • cm3leon_generate, ASGD, cuda, foreach: +197.54174%
  • mobilenet_v3_large, ASGD, cuda, default: +104.23360%
  • mobilenet_v3_large, ASGD, cuda, maximize: +93.58383%
  • mobilenet_v3_large, ASGD, cuda, no_foreach: +119.33540%
  • mobilenet_v3_large, ASGD, cuda, differentiable: +123.04459%
  • mobilenet_v3_large, ASGD, cuda, foreach: +110.83179%
  • hf_T5_base, ASGD, cuda, default: +173.34395%
  • hf_T5_base, ASGD, cuda, maximize: +154.96389%
  • hf_T5_base, ASGD, cuda, no_foreach: +131.31289%
  • hf_T5_base, ASGD, cuda, differentiable: +121.65876%
  • hf_T5_base, ASGD, cuda, foreach: +184.47838%
  • hf_BigBird, ASGD, cuda, default: +150.36223%
  • hf_BigBird, ASGD, cuda, maximize: +140.99390%
  • hf_BigBird, ASGD, cuda, no_foreach: +137.36228%
  • hf_BigBird, ASGD, cuda, differentiable: +130.69910%
  • hf_BigBird, ASGD, cuda, foreach: +163.78429%
  • nvidia_deeprecommender, ASGD, cuda, default: +84.22858%
  • nvidia_deeprecommender, ASGD, cuda, maximize: +73.59427%
  • nvidia_deeprecommender, ASGD, cuda, no_foreach: +38.45586%
  • nvidia_deeprecommender, ASGD, cuda, differentiable: +38.19497%
  • nvidia_deeprecommender, ASGD, cuda, foreach: +84.71843%
  • DALLE2_pytorch, ASGD, cuda, default: +120.79583%
  • DALLE2_pytorch, ASGD, cuda, maximize: +101.18644%
  • DALLE2_pytorch, ASGD, cuda, no_foreach: +128.45109%
  • DALLE2_pytorch, ASGD, cuda, differentiable: +119.58039%
  • DALLE2_pytorch, ASGD, cuda, foreach: +104.22114%
  • resnet18, Adadelta, cuda, (pt2) no_foreach: -99.98704%
  • resnet18, Adagrad, cuda, (pt2) no_foreach: -99.98506%
  • resnet18, Adam, cuda, (pt2) no_foreach: +646679.56820%
  • resnet18, Adam, cuda, (pt2) foreach: -99.97264%
  • resnet18, Adam, cuda, (pt2) fused: +86.64405%
  • resnet18, AdamW, cuda, (pt2) fused: +99.35395%
  • resnet18, ASGD, cuda, default: +97.33380%
  • resnet18, ASGD, cuda, maximize: +96.01251%
  • resnet18, ASGD, cuda, (pt2) no_foreach: +135.46526%
  • resnet18, ASGD, cuda, no_foreach: +111.06213%
  • resnet18, ASGD, cuda, differentiable: +123.23425%
  • resnet18, ASGD, cuda, foreach: +97.16978%
  • resnet18, SGD, cuda, (pt2) foreach: +32.97657%
  • resnet18, Rprop, cuda, (pt2) foreach: -99.99361%
  • compile_time, resnet18, Adadelta, cuda, (pt2) no_foreach: +400.19988%
  • compile_time, resnet18, Adagrad, cuda, (pt2) no_foreach: +429.48024%
  • compile_time, resnet18, Adam, cuda, (pt2) foreach: +66977.29421%
  • compile_time, resnet18, Adam, cuda, (pt2) fused: +223.51315%
  • compile_time, resnet18, Adamax, cuda, (pt2) foreach: +302.31806%
  • compile_time, resnet18, Rprop, cuda, (pt2) foreach: -4850.60281%
  • compile_time, resnet18, NAdam, cuda, (pt2) foreach: +317.34996%
  • detectron2_maskrcnn_r_101_c4, ASGD, cuda, default: +117.24819%
  • detectron2_maskrcnn_r_101_c4, ASGD, cuda, maximize: +105.55289%
  • detectron2_maskrcnn_r_101_c4, ASGD, cuda, no_foreach: +124.03172%
  • detectron2_maskrcnn_r_101_c4, ASGD, cuda, differentiable: +116.10379%
  • detectron2_maskrcnn_r_101_c4, ASGD, cuda, foreach: +111.02508%
  • hf_T5_generate, ASGD, cuda, default: +125.90060%
  • hf_T5_generate, ASGD, cuda, maximize: +113.89290%
  • hf_T5_generate, ASGD, cuda, no_foreach: +127.24467%
  • hf_T5_generate, ASGD, cuda, differentiable: +124.64940%
  • hf_T5_generate, ASGD, cuda, foreach: +124.15655%
  • hf_Longformer, ASGD, cuda, default: +125.81518%
  • hf_Longformer, ASGD, cuda, maximize: +113.85564%
  • hf_Longformer, ASGD, cuda, no_foreach: +126.27509%
  • hf_Longformer, ASGD, cuda, differentiable: +115.95017%
  • hf_Longformer, ASGD, cuda, foreach: +131.63718%
  • timm_regnet, ASGD, cuda, default: +125.73607%
  • timm_regnet, ASGD, cuda, maximize: +100.80882%
  • timm_regnet, ASGD, cuda, no_foreach: +120.90401%
  • timm_regnet, ASGD, cuda, differentiable: +126.60772%
  • timm_regnet, ASGD, cuda, foreach: +109.82537%
  • hf_DistilBert, ASGD, cuda, default: +136.46659%
  • hf_DistilBert, ASGD, cuda, maximize: +154.20666%
  • hf_DistilBert, ASGD, cuda, no_foreach: +133.22829%
  • hf_DistilBert, ASGD, cuda, differentiable: +132.68567%
  • hf_DistilBert, ASGD, cuda, foreach: +141.10390%
  • pytorch_CycleGAN_and_pix2pix, ASGD, cuda, default: +85.50919%
  • pytorch_CycleGAN_and_pix2pix, ASGD, cuda, maximize: +79.91551%
  • pytorch_CycleGAN_and_pix2pix, ASGD, cuda, no_foreach: +114.22815%
  • pytorch_CycleGAN_and_pix2pix, ASGD, cuda, differentiable: +105.53023%
  • pytorch_CycleGAN_and_pix2pix, ASGD, cuda, foreach: +99.25474%
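
For triage, a list like the one above can be grouped by optimizer. The following is a minimal sketch, assuming every bullet has the form `model, optimizer, device, config: ±N%` and that the percentage is the relative change from the base commit to the affected commit (the report does not state the formula). The function names are illustrative and are not part of TorchBench.

```python
import re
from collections import defaultdict
from statistics import median

# Matches lines such as "  • resnet50, ASGD, cuda, default: +124.02993%".
LINE_RE = re.compile(r"^\s*•\s*(?P<name>.+):\s*(?P<delta>[+-]?\d+(?:\.\d+)?)%\s*$")


def parse_report_line(line):
    """Return (fields, delta_percent) for one bullet, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    fields = [f.strip() for f in m.group("name").split(",")]
    return fields, float(m.group("delta"))


def slowdown_factor(delta_percent):
    """Assumption: delta = (affected - base) / base * 100, so +100% means 2x."""
    return 1.0 + delta_percent / 100.0


def median_delta_by_optimizer(lines):
    """Median eager step-time delta per optimizer (skips pt2 and compile_time rows)."""
    buckets = defaultdict(list)
    for line in lines:
        parsed = parse_report_line(line)
        if parsed is None:
            continue
        fields, delta = parsed
        if fields[0] == "compile_time" or any("(pt2)" in f for f in fields):
            continue
        # Remaining rows are laid out as: model, optimizer, device, config...
        buckets[fields[1]].append(delta)
    return {opt: median(vals) for opt, vals in buckets.items()}


# A few rows copied from the report above.
sample = [
    "  • resnet50, ASGD, cuda, default: +124.02993%",
    "  • resnet50, ASGD, cuda, foreach: +105.76765%",
    "  • llama, ASGD, cuda, default: -37.89294%",
    "  • resnet18, Adam, cuda, (pt2) fused: +86.64405%",
    "  • compile_time, resnet18, Adam, cuda, (pt2) fused: +223.51315%",
]
print(median_delta_by_optimizer(sample))
```

Under the stated assumption, a delta near +100% corresponds to roughly a 2x step time, which is the range most of the eager ASGD rows fall in.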

Tests that were no longer run on the affected commit:

  • timm_vision_transformer_large, RAdam, cuda, (pt2) default: 0.07160614579916
  • timm_vision_transformer_large, RAdam, cuda, default: 0.08048462790126602
  • timm_vision_transformer_large, RAdam, cuda, (pt2) foreach: 0.07142815878614783
  • timm_vision_transformer_large, RAdam, cuda, foreach: 0.07780114312966664
  • timm_vision_transformer_large, NAdam, cuda, default: 0.07558898767456412
  • compile_time, timm_vision_transformer_large, RAdam, cuda, (pt2) default: 0.07164892243842283
  • compile_time, timm_vision_transformer_large, RAdam, cuda, (pt2) foreach: 0.06621869125713906

Tests that were newly added on the affected commit:

  • timm_vision_transformer, ASGD, cuda, (pt2) default: 0.003701934844932773
  • timm_vision_transformer, ASGD, cuda, (pt2) foreach: 0.0037623849197256343
  • compile_time, timm_vision_transformer, ASGD, cuda, (pt2) default: 54.06454637941594
  • compile_time, timm_vision_transformer, ASGD, cuda, (pt2) foreach: 51.20224057789892
  • timm_vision_transformer_large, ASGD, cuda, (pt2) default: 0.023793959265781775
  • timm_vision_transformer_large, ASGD, cuda, (pt2) no_foreach: 206.14174058521166
  • timm_vision_transformer_large, ASGD, cuda, (pt2) foreach: 0.0820496737336119
  • compile_time, timm_vision_transformer_large, ASGD, cuda, (pt2) default: 427.85859452995163
  • compile_time, timm_vision_transformer_large, ASGD, cuda, (pt2) no_foreach: 13.715244447502016
  • compile_time, timm_vision_transformer_large, ASGD, cuda, (pt2) foreach: 458.7825940608357
  • doctr_det_predictor, Adadelta, cuda, default: 0.0028802707511931657
  • doctr_det_predictor, Adadelta, cuda, maximize: 0.0031606352096423505
  • doctr_det_predictor, Adadelta, cuda, no_foreach: 0.017995787295512856
  • doctr_det_predictor, Adadelta, cuda, differentiable: 0.021498963190242647
  • doctr_det_predictor, Adadelta, cuda, foreach: 0.0025630442099645735
  • doctr_det_predictor, Adagrad, cuda, default: 0.003945592548698187
  • doctr_det_predictor, Adagrad, cuda, maximize: 0.0039761970192193985
  • doctr_det_predictor, Adagrad, cuda, no_foreach: 0.009838954374815028
  • doctr_det_predictor, Adagrad, cuda, differentiable: 0.010537376883439718
  • doctr_det_predictor, Adagrad, cuda, foreach: 0.0033657571813091635
  • doctr_det_predictor, Adam, cuda, default: 0.003894355911761522
  • doctr_det_predictor, Adam, cuda, amsgrad, maximize: 0.004776262207888067
  • doctr_det_predictor, Adam, cuda, no_foreach: 0.01696098819375038
  • doctr_det_predictor, Adam, cuda, differentiable: 0.0349869754165411
  • doctr_det_predictor, Adam, cuda, foreach: 0.0039052480412647126
  • doctr_det_predictor, Adam, cuda, foreach, maximize, capturable: 0.00634846021886915
  • doctr_det_predictor, Adam, cuda, foreach, maximize, capturable, amsgrad: 0.006354622212238609
  • doctr_det_predictor, Adam, cuda, fused: 0.0013104215636849403
  • doctr_det_predictor, Adam, cuda, fused, amsgrad, maximize: 0.0013829589146189393
  • doctr_det_predictor, Adam, cuda, fused, capturable: 0.0013211383065208793
  • doctr_det_predictor, Adam, cuda, fused, capturable, amsgrad: 0.0013856540946289896
  • doctr_det_predictor, AdamW, cuda, default: 0.003750761039555073
  • doctr_det_predictor, AdamW, cuda, amsgrad, maximize: 0.0046320953080430625
  • doctr_det_predictor, AdamW, cuda, no_foreach: 0.01873129182495177
  • doctr_det_predictor, AdamW, cuda, differentiable: 0.03530704751610756
  • doctr_det_predictor, AdamW, cuda, foreach: 0.003599941679276526
  • doctr_det_predictor, AdamW, cuda, foreach, maximize, capturable: 0.006230019740760326
  • doctr_det_predictor, AdamW, cuda, foreach, maximize, capturable, amsgrad: 0.006564915589988232
  • doctr_det_predictor, AdamW, cuda, fused: 0.0013546907948330045
  • doctr_det_predictor, AdamW, cuda, fused, amsgrad, maximize: 0.0014247736567631363
  • doctr_det_predictor, AdamW, cuda, fused, capturable: 0.001353818892966956
  • doctr_det_predictor, AdamW, cuda, fused, capturable, amsgrad: 0.0014276149054057896
  • doctr_det_predictor, Adamax, cuda, default: 0.018037584936246277
  • doctr_det_predictor, Adamax, cuda, maximize: 0.019045850331895053
  • doctr_det_predictor, Adamax, cuda, no_foreach: 0.02449008859694004
  • doctr_det_predictor, Adamax, cuda, differentiable: 0.029248994728550314
  • doctr_det_predictor, Adamax, cuda, foreach: 0.01793181947432458
  • doctr_det_predictor, ASGD, cuda, default: 0.014260424789972602
  • doctr_det_predictor, ASGD, cuda, maximize: 0.014247538428753615
  • doctr_det_predictor, ASGD, cuda, no_foreach: 0.028070646012201904
  • doctr_det_predictor, ASGD, cuda, differentiable: 0.027313575381413102
  • doctr_det_predictor, ASGD, cuda, foreach: 0.014710384048521518
  • doctr_det_predictor, SGD, cuda, default: 0.0007995309079997241
  • doctr_det_predictor, SGD, cuda, maximize: 0.0014174172608181833
  • doctr_det_predictor, SGD, cuda, no_foreach: 0.0017253931146115065
  • doctr_det_predictor, SGD, cuda, differentiable: 0.0008048358295733729
  • doctr_det_predictor, SGD, cuda, foreach: 0.000608943731058389
  • doctr_det_predictor, SGD, cuda, foreach, momentum=0.9, nesterov: 0.0009989061842982968
  • doctr_det_predictor, SGD, cuda, foreach, momentum=0.9: 0.0008235378485793868
  • doctr_det_predictor, RAdam, cuda, default: 0.004357524802908301
  • doctr_det_predictor, RAdam, cuda, no_foreach: 0.027767344983294605
  • doctr_det_predictor, RAdam, cuda, differentiable: 0.028750475216656923
  • doctr_det_predictor, RAdam, cuda, foreach: 0.004138795551843941
  • doctr_det_predictor, Rprop, cuda, default: 0.021949287690222263
  • doctr_det_predictor, Rprop, cuda, maximize: 0.022139092488214374
  • doctr_det_predictor, Rprop, cuda, no_foreach: 0.0354912742972374
  • doctr_det_predictor, Rprop, cuda, differentiable: 0.03846609420143068
  • doctr_det_predictor, Rprop, cuda, foreach: 0.022724414803087713
  • doctr_det_predictor, RMSprop, cuda, default: 0.0019387207855470477
  • doctr_det_predictor, RMSprop, cuda, maximize: 0.002601792109198868
  • doctr_det_predictor, RMSprop, cuda, no_foreach: 0.00912425147059063
  • doctr_det_predictor, RMSprop, cuda, differentiable: 0.009757765879233679
  • doctr_det_predictor, RMSprop, cuda, foreach: 0.0017368051270022988
  • doctr_det_predictor, NAdam, cuda, default: 0.005110027710907161
  • doctr_det_predictor, NAdam, cuda, no_foreach: 0.021710659796372055
  • doctr_det_predictor, NAdam, cuda, differentiable: 0.03594747381284833
  • doctr_det_predictor, NAdam, cuda, foreach: 0.004948554779402912
  • doctr_reco_predictor, Adadelta, cuda, default: 0.0015303045674227179
  • doctr_reco_predictor, Adadelta, cuda, maximize: 0.0017265696777030827
  • doctr_reco_predictor, Adadelta, cuda, no_foreach: 0.0067323700617998835
  • doctr_reco_predictor, Adadelta, cuda, differentiable: 0.00797147342003882
  • doctr_reco_predictor, Adadelta, cuda, foreach: 0.0014914630353450775
  • doctr_reco_predictor, Adagrad, cuda, default: 0.0014977923524565995
  • doctr_reco_predictor, Adagrad, cuda, maximize: 0.0014047086122445762
  • doctr_reco_predictor, Adagrad, cuda, no_foreach: 0.0037795773101970552
  • doctr_reco_predictor, Adagrad, cuda, differentiable: 0.0040812154300510885
  • doctr_reco_predictor, Adagrad, cuda, foreach: 0.001298849075101316
  • doctr_reco_predictor, Adam, cuda, default: 0.0014606264233589172
  • doctr_reco_predictor, Adam, cuda, amsgrad, maximize: 0.0017359229386784136
  • doctr_reco_predictor, Adam, cuda, no_foreach: 0.006150229680351913
  • doctr_reco_predictor, Adam, cuda, differentiable: 0.013960111676715313
  • doctr_reco_predictor, Adam, cuda, foreach: 0.0014205886446870862
  • doctr_reco_predictor, Adam, cuda, foreach, maximize, capturable: 0.002500549927353859
  • doctr_reco_predictor, Adam, cuda, foreach, maximize, capturable, amsgrad: 0.0024232029216364028
  • doctr_reco_predictor, Adam, cuda, fused: 0.0005264460835605859
  • doctr_reco_predictor, Adam, cuda, fused, amsgrad, maximize: 0.0007339034359902144
  • doctr_reco_predictor, Adam, cuda, fused, capturable: 0.000526783952023834
  • doctr_reco_predictor, Adam, cuda, fused, capturable, amsgrad: 0.000733910609036684
  • doctr_reco_predictor, AdamW, cuda, default: 0.0014952924964018166
  • doctr_reco_predictor, AdamW, cuda, amsgrad, maximize: 0.0017838270077481866
  • doctr_reco_predictor, AdamW, cuda, no_foreach: 0.006765100718475878
  • doctr_reco_predictor, AdamW, cuda, differentiable: 0.012846773536875845
  • doctr_reco_predictor, AdamW, cuda, foreach: 0.0015132253337651492
  • doctr_reco_predictor, AdamW, cuda, foreach, maximize, capturable: 0.0023649467574432493
  • doctr_reco_predictor, AdamW, cuda, foreach, maximize, capturable, amsgrad: 0.0025600428320467473
  • doctr_reco_predictor, AdamW, cuda, fused: 0.0005436234101653099
  • doctr_reco_predictor, AdamW, cuda, fused, amsgrad, maximize: 0.0007554181362502277
  • doctr_reco_predictor, AdamW, cuda, fused, capturable: 0.0005439776740968227
  • doctr_reco_predictor, AdamW, cuda, fused, capturable, amsgrad: 0.0007581162578426301
  • doctr_reco_predictor, Adamax, cuda, default: 0.0063193910196423534
  • doctr_reco_predictor, Adamax, cuda, maximize: 0.006546114273369312
  • doctr_reco_predictor, Adamax, cuda, no_foreach: 0.008442916348576546
  • doctr_reco_predictor, Adamax, cuda, differentiable: 0.010225607780739665
  • doctr_reco_predictor, Adamax, cuda, foreach: 0.0063998927315697075
  • doctr_reco_predictor, ASGD, cuda, default: 0.005111781670711935
  • doctr_reco_predictor, ASGD, cuda, maximize: 0.005300870798528194
  • doctr_reco_predictor, ASGD, cuda, no_foreach: 0.00970820013123254
  • doctr_reco_predictor, ASGD, cuda, differentiable: 0.009756598388776183
  • doctr_reco_predictor, ASGD, cuda, foreach: 0.005147161749191582
  • doctr_reco_predictor, SGD, cuda, default: 0.00033027638401836155
  • doctr_reco_predictor, SGD, cuda, maximize: 0.0005508650196716189
  • doctr_reco_predictor, SGD, cuda, no_foreach: 0.000654176879208535
  • doctr_reco_predictor, SGD, cuda, differentiable: 0.00033117017801851035
  • doctr_reco_predictor, SGD, cuda, foreach: 0.00024379345728084444
  • doctr_reco_predictor, SGD, cuda, foreach, momentum=0.9, nesterov: 0.0006105483700521291
  • doctr_reco_predictor, SGD, cuda, foreach, momentum=0.9: 0.0004410154549404979
  • doctr_reco_predictor, RAdam, cuda, default: 0.0016630525072105229
  • doctr_reco_predictor, RAdam, cuda, no_foreach: 0.009923019562847912
  • doctr_reco_predictor, RAdam, cuda, differentiable: 0.010303066140040755
  • doctr_reco_predictor, RAdam, cuda, foreach: 0.0016399111156351863
  • doctr_reco_predictor, Rprop, cuda, default: 0.008071534298360349
  • doctr_reco_predictor, Rprop, cuda, maximize: 0.008054946968331932
  • doctr_reco_predictor, Rprop, cuda, no_foreach: 0.0125361071433872
  • doctr_reco_predictor, Rprop, cuda, differentiable: 0.013898213882930577
  • doctr_reco_predictor, Rprop, cuda, foreach: 0.007811435428448021
  • doctr_reco_predictor, RMSprop, cuda, default: 0.0007729522828012705
  • doctr_reco_predictor, RMSprop, cuda, maximize: 0.0009816498712946972
  • doctr_reco_predictor, RMSprop, cuda, no_foreach: 0.0032276885583996775
  • doctr_reco_predictor, RMSprop, cuda, differentiable: 0.0036167620681226255
  • doctr_reco_predictor, RMSprop, cuda, foreach: 0.0006942034750245512
  • doctr_reco_predictor, NAdam, cuda, default: 0.0020321781514212487
  • doctr_reco_predictor, NAdam, cuda, no_foreach: 0.007569201388396323
  • doctr_reco_predictor, NAdam, cuda, differentiable: 0.013424197630956768
  • doctr_reco_predictor, NAdam, cuda, foreach: 0.0019183275150135158
  • resnet18, ASGD, cuda, (pt2) default: 0.0017022616861356516
  • resnet18, ASGD, cuda, (pt2) foreach: 0.0017011328324107295
  • compile_time, resnet18, ASGD, cuda, (pt2) default: 17.65067977675547
  • compile_time, resnet18, ASGD, cuda, (pt2) foreach: 16.845290329307318

Runtime regressions found?
An error log was found. Please investigate the runtime errors by checking the logs of the linked workflow.

GitHub workflow that triggered this issue: https://github.com/pytorch/benchmark/actions/runs/6128788174

cc @janeyx99

Contributor

janeyx99 commented Sep 9, 2023

@mlazos moving the etas, mus, and step to device doubles the eager runtime for both the foreach and single-tensor paths. I’ll probably open a PR to make it not the default.
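
One plausible reading of the comment above, sketched in code: when `step`, `eta`, and `mu` are Python numbers, they can be passed to the update as plain scalars, while keeping them as 0-dim device tensors turns the same arithmetic into extra small tensor ops on every step. This is a simplified illustration and not the `torch.optim.ASGD` source. The function names, the state layout, and the omission of weight decay are all assumptions made for the example.

```python
import torch


def asgd_step_host_scalars(param, grad, ax, state, lr=0.01, lambd=1e-4, alpha=0.75, t0=1e6):
    """One ASGD-style update with step/eta/mu kept as Python numbers (hypothetical helper)."""
    state["step"] += 1
    eta, mu = state["eta"], state["mu"]
    param.mul_(1 - lambd * eta)    # factor computed on the host
    param.add_(grad, alpha=-eta)   # eta passed as a plain scalar
    if mu != 1:
        ax.add_(param.sub(ax).mul_(mu))
    else:
        ax.copy_(param)
    state["eta"] = lr / (1 + lambd * lr * state["step"]) ** alpha
    state["mu"] = 1 / max(1, state["step"] - t0)


def asgd_step_device_tensors(param, grad, ax, state, lr=0.01, lambd=1e-4, alpha=0.75, t0=1e6):
    """Same math with step/eta/mu as 0-dim tensors on the parameter's device."""
    state["step"] += 1                    # now a tensor op
    eta, mu = state["eta"], state["mu"]
    param.mul_(1 - lambd * eta)           # factor built with tensor ops
    param.addcmul_(grad, eta, value=-1)   # `alpha=` takes a number, so use addcmul_
    ax.add_(param.sub(ax).mul_(mu))       # no host-side branch on mu
    state["eta"] = lr / (1 + lambd * lr * state["step"]) ** alpha
    one = torch.ones_like(state["step"])
    state["mu"] = 1 / torch.maximum(one, state["step"] - t0)


def make_state(as_tensor, device):
    """Initial optimizer state in either representation (hypothetical helper)."""
    if as_tensor:
        return {
            "step": torch.zeros((), device=device),
            "eta": torch.tensor(0.01, device=device),
            "mu": torch.ones((), device=device),
        }
    return {"step": 0, "eta": 0.01, "mu": 1.0}


device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)
p0 = torch.randn(1000, device=device)
g = torch.randn(1000, device=device)

p_host, ax_host = p0.clone(), torch.zeros_like(p0)
p_dev, ax_dev = p0.clone(), torch.zeros_like(p0)
s_host, s_dev = make_state(False, device), make_state(True, device)
for _ in range(3):
    asgd_step_host_scalars(p_host, g, ax_host, s_host)
    asgd_step_device_tensors(p_dev, g, ax_dev, s_dev)

# Both variants should agree up to float32 rounding.
print(torch.allclose(p_host, p_dev, atol=1e-6), torch.allclose(ax_host, ax_dev, atol=1e-6))
```

Wrapping each variant in a timing loop (with `torch.cuda.synchronize()` around the timed region on CUDA) would be one way to compare their per-step cost on a given machine.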

janeyx99 changed the title from "Optim Perf Signal Detected by TorchBench CI on '806d1a871ddfd2d38e1791489892009feaec8425'" to "ASGD perf regression" on Sep 12, 2023