
Merge from the main branch#796

Merged
desertfire merged 44 commits into wconstab/ltc from ltc_merge
Mar 16, 2022

Conversation

@desertfire
Contributor

No description provided.

xuzhao9 and others added 30 commits January 21, 2022 17:24
Summary:
Without CUDA Graph (with batch size 32):
```
$ python run.py resnet50 -t train -d cuda
Running train method from resnet50 on cuda in eager mode.
GPU Time:            1034.030 milliseconds
CPU Dispatch Time:   1026.865 milliseconds
CPU Total Wall Time: 1034.011 milliseconds
```

With CUDA Graph (with batch size 32):
```
$ python run.py resnet50 -t train -d cuda --train_cudagraph
Running train method from resnet50 on cuda in eager mode.
GPU Time:            1038.927 milliseconds
CPU Dispatch Time:   346.313 milliseconds
CPU Total Wall Time: 1038.941 milliseconds
```

# Latency by batch size (Train, fp32, on V100)


Batch Size | Eager (ms) | CUDA Graph (ms) | Speedup
-- | -- | -- | --
1 | 89.033 | 50.233 | 43.58%
2 | 92.854 | 56.13 | 39.55%
4 | 93.676 | 71.465 | 23.71%
8 | 105.099 | 105.381 | -0.27%
16 | 167.292 | 167.966 | -0.40%
32 | 297.315 | 297.989 | -0.23%
64 | 561.262 | 562.029 | -0.14%
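The Speedup column is the relative latency reduction of CUDA Graph versus eager mode; a small sketch (the helper name is illustrative, not part of the benchmark suite) reproduces the table's numbers:

```python
# Reproduces the Speedup column above: relative latency reduction of
# CUDA Graph vs. eager mode, as a percentage.
def speedup_pct(eager_ms: float, cudagraph_ms: float) -> float:
    return (eager_ms - cudagraph_ms) / eager_ms * 100
```

For batch size 1, (89.033 - 50.233) / 89.033 * 100 ≈ 43.58%; at batch size 8 and above the difference is within noise.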

Pull Request resolved: #706

Reviewed By: ngimel

Differential Revision: D33720966

Pulled By: xuzhao9

fbshipit-source-id: 8d422d597a879488d14361466172d32a1eeb1f19
Summary:
This PR fixes a few bugs in v2 bisection workflow. It also updates the new V2 config file to use the latest reference run results.

Pull Request resolved: #709

Reviewed By: erichan1

Differential Revision: D33747367

Pulled By: xuzhao9

fbshipit-source-id: 64ec3f967ea0efba8e44cff4802ed761c58b6113
Summary:
This fixes #710. The Dev Infra team is migrating off RDS, so we are removing the code related to RDS uploading.

Pull Request resolved: #712

Reviewed By: erichan1

Differential Revision: D33777915

Pulled By: xuzhao9

fbshipit-source-id: 35e1a97d2286d0ef236fdaa5e1693765093c69ea
Summary:
These files are not used, but they cause Dependabot to complain about deprecated numpy versions (such as #714 and #713).

Pull Request resolved: #716

Reviewed By: aaronenyeshi

Differential Revision: D33781655

Pulled By: xuzhao9

fbshipit-source-id: 85dd00d85a21ca0d1b1da68bd569b143f133487a
Summary: Pull Request resolved: #719

Reviewed By: xuzhao9

Differential Revision: D33806240

Pulled By: jansel

fbshipit-source-id: 389f0ffe6bfa18cb996720bdc89a9837b75fa5fc
Summary:
Use `os.path.realpath()` to get the absolute path of the current file.
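As a small illustration of why `realpath` is used here (the helper name is hypothetical): `os.path.realpath()` resolves symlinks as well as relative segments, while `os.path.abspath()` only normalizes the path string, so a symlinked checkout would otherwise report the wrong directory:

```python
import os

def current_file_dir(file_path: str) -> str:
    # realpath resolves any symlinks before taking the dirname, so the
    # returned directory is the true on-disk location of the file
    return os.path.dirname(os.path.realpath(file_path))
```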

Pull Request resolved: #721

Reviewed By: jansel

Differential Revision: D33829270

Pulled By: xuzhao9

fbshipit-source-id: b07ab6e5190b02f0ea0a685fa38965fad196c320
Summary:
Use `py38` and `cu113` as the default version to generate nightly configs.
This PR also fixes a bug in the abtest config generation script, which used `git_version` instead of the master commit hash.

Pull Request resolved: #718

Reviewed By: erichan1

Differential Revision: D33797706

Pulled By: xuzhao9

fbshipit-source-id: 4cfe45f8ddc13e84683ac0dc6491c8b229393d93
Summary:
When there is no anomaly detected, we should remove the "tests" and "details" keys from the dict.

Pull Request resolved: #728

Reviewed By: erichan1

Differential Revision: D33899737

Pulled By: xuzhao9

fbshipit-source-id: c3377c74bcc4e3afb7a8f36ad994915f2933d2e4
Summary:
This PR adds the testing linux.2xlarge runner to pytorch/benchmark.

Pull Request resolved: #729

Reviewed By: seemethere

Differential Revision: D33905554

Pulled By: xuzhao9

fbshipit-source-id: 1034b4a2c50e467d708884de03a324a4fcb1f1f7
Summary: Pull Request resolved: #722

Reviewed By: wconstab

Differential Revision: D33847275

Pulled By: jansel

fbshipit-source-id: 3bc63248fbfbab1d61e234e101bd6b5c5e8faf40
Summary:
Change to train_bs in training section.

Pull Request resolved: #731

Reviewed By: xuzhao9

Differential Revision: D33936661

Pulled By: erichan1

fbshipit-source-id: 6b8729234691e21a8dfd86d3deb58d4a147f00b2
Summary:
This PR enables torch_tensorrt (https://github.com/NVIDIA/Torch-TensorRT) on torchvision and timm models. It works on all timm models except timm_nfnet (pytorch/TensorRT#849), but it doesn't work on any of the torchvision models yet. Still looking into the root cause.

Run with command:
`python run.py timm_efficientnet -d cuda -t eval --torch_tensorrt` returns:
```
GPU Time:             19.196 milliseconds
CPU Dispatch Time:    10.182 milliseconds
CPU Total Wall Time:  19.181 milliseconds
```

`python run.py mnasnet1_0 -d cuda -t eval --torch_tensorrt` returns:

```
:Running eval method from mnasnet1_0 on cuda in eager mode.
Traceback (most recent call last):
  File "run.py", line 192, in <module>
    m = Model(device=args.device, jit=(args.mode == "jit"), extra_args=extra_args)
  File "/fsx/users/xzhao9/benchmark/torchbenchmark/models/mnasnet1_0/__init__.py", line 8, in __init__
    super().__init__(model_name="mnasnet1_0", device=device, jit=jit,
  File "/fsx/users/xzhao9/benchmark/torchbenchmark/util/framework/vision/model_factory.py", line 30, in __init__
    apply_args(self, self.args)
  File "/fsx/users/xzhao9/benchmark/torchbenchmark/util/framework/vision/args.py", line 43, in apply_args
    model.eval_model = enable_tensortrt(model.eval_example_inputs, args.eval_fp16, model.eval_model)
  File "/fsx/users/xzhao9/benchmark/torchbenchmark/util/framework/vision/args.py", line 55, in enable_tensortrt
    return torch_tensorrt.compile(eval_model, inputs=trt_input, enabled_precisions=enabled_precisions)
  File "/data/home/xzhao9/cluster/miniconda3/envs/py38/lib/python3.8/site-packages/torch_tensorrt/_compile.py", line 115, in compile
    return torch_tensorrt.ts.compile(ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs)
  File "/data/home/xzhao9/cluster/miniconda3/envs/py38/lib/python3.8/site-packages/torch_tensorrt/ts/_compiler.py", line 119, in compile
    compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
RuntimeError:
temporary: the only valid use of a module is looking up an attribute but found  = prim::SetAttr[name="num_batches_tracked"](%_15, %1440)
```

Pull Request resolved: #732

Reviewed By: yinghai

Differential Revision: D33986641

Pulled By: xuzhao9

fbshipit-source-id: 54d15a977b2b04dc5358f25f9cc2609554744944
Summary:
This PR adds a new argument, 'test', which can be 'train' or 'eval', to the model initialization function. It has the following benefits:
- At model initialization, we can choose to initialize only the "train" or "eval" dataset, to avoid wasting device memory.
- It separates the argument spaces. Train and eval tests commonly have different experimental features to apply; for example, optimize_for_inference and tensorrt only work for eval.
- It adds an "extra_args" argument to the model, to enable more optional features in the future.
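A minimal sketch of the constructor shape this describes; the class, attribute, and loader names are illustrative assumptions, not TorchBench's actual API:

```python
from typing import List, Optional

class BenchmarkModel:
    # Illustrative stand-in for the initialization interface described above.
    def __init__(self, test: str, device: str = "cpu",
                 extra_args: Optional[List[str]] = None):
        if test not in ("train", "eval"):
            raise ValueError(f"test must be 'train' or 'eval', got {test!r}")
        self.test = test
        self.device = device
        self.extra_args = list(extra_args or [])
        # Only build the dataset the requested test needs, so device memory
        # is not wasted on the unused one.
        self.example_inputs = (self._train_data() if test == "train"
                               else self._eval_data())

    def _train_data(self):
        return ["train-batch"]

    def _eval_data(self):
        return ["eval-batch"]
```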

Pull Request resolved: #735

Reviewed By: erichan1

Differential Revision: D34033140

Pulled By: xuzhao9

fbshipit-source-id: 449a963b1ee7ef450b0a5ccb24718d394dd8be40

Summary:
This PR removes two models, maml and maml_omiglot. We are working on another round of enhancing models with tensorrt and other features; these two models are low quality with poor GPU utilization, so they are not worth the effort, and I am removing them for now.

Pull Request resolved: #736

Reviewed By: erichan1

Differential Revision: D34018844

Pulled By: xuzhao9

fbshipit-source-id: 70da2086bf9782a351f9f8e79238d4d2029229ad
Summary:
Pull Request resolved: pytorch/pytorch#72499

Pull Request resolved: #740

Move fx2trt out of tree to reduce bloat in PyTorch core.

It's the first and major step. Next, we will move acc_tracer out of the tree and rearrange some fx passes.

Reviewed By: suo

Differential Revision: D34065866

fbshipit-source-id: c72b7ad752d0706abd9a63caeef48430e85ec56d
Summary:
This PR adds a new `run_sweep.py` script to TorchBench. It runs all specified model tests in a subprocess worker and applies optional arguments to them. Currently only batch size is supported; we will support `fx2trt` and `torch_tensorrt` in a follow-up PR.

Pull Request resolved: #742

Reviewed By: erichan1

Differential Revision: D34134739

Pulled By: xuzhao9

fbshipit-source-id: f798f415f976d7e3deee7ffed35f4cbfdb7487f0
Summary:
Still, fp16 is not supported. I will try to address the fp16 issue in a follow-up PR.

Sweep all models with command:
`python run_sweep.py -d cuda -t eval --fx2trt`

Result:
```
Running model BERT_pytorch ... [ TypeError ]
Running model Background_Matting ... [ NotImplemented ]
Running model LearningToPaint ... [ TypeError ]
Running model Super_SloMo ... [ NameError ]
Running model alexnet ... [ OK ]
Running model attention_is_all_you_need_pytorch ... [ TypeError ]
Running model dcgan ... [ OK ]
Running model demucs ... [ NameError ]
Running model densenet121 ... [ NotImplemented ]
Running model detectron2_maskrcnn ... [ UnserializableException ]
Running model dlrm ... [ RuntimeError ]
Running model drq ... [ UnserializableException ]
Running model fastNLP_Bert ... [ UnserializableException ]
Running model hf_Albert ... [ ValueError ]
Running model hf_Bart ... [ TypeError ]
Running model hf_Bert ... [ ValueError ]
Running model hf_BigBird ... [ ValueError ]
Running model hf_DistilBert ... [ ValueError ]
Running model hf_GPT2 ... [ ValueError ]
Running model hf_Longformer ... [ ValueError ]
Running model hf_Reformer ... [ ValueError ]
Running model hf_T5 ... [ TypeError ]
Running model mnasnet1_0 ... [ OK ]
Running model mobilenet_v2 ... [ OK ]
Running model mobilenet_v2_quantized_qat ... [ RuntimeError ]
Running model mobilenet_v3_large ... [ OK ]
Running model moco ... [ UnserializableException ]
Running model nvidia_deeprecommender ... [ OK ]
Running model opacus_cifar10 ... [ UnserializableException ]
Running model pyhpc_equation_of_state ... [ TypeError ]
Running model pyhpc_isoneutral_mixing ... [ TypeError ]
Running model pyhpc_turbulent_kinetic_energy ... [ TypeError ]
Running model pytorch_CycleGAN_and_pix2pix ... [ NotImplemented ]
Running model pytorch_stargan ... [ UnserializableException ]
Running model pytorch_struct ... [ NotImplemented ]
Running model pytorch_unet ... [ AttributeError ]
Running model resnet18 ... [ OK ]
Running model resnet50 ... [ OK ]
Running model resnet50_quantized_qat ... [ RuntimeError ]
Running model resnext50_32x4d ... [ OK ]
Running model shufflenet_v2_x1_0 ... [ OK ]
Running model soft_actor_critic ... [ UnserializableException ]
Running model speech_transformer ... [ TypeError ]
Running model squeezenet1_1 ... [ OK ]
Running model tacotron2 ... [ NotImplemented ]
Running model timm_efficientnet ... [ OK ]
Running model timm_nfnet ... [ UnserializableException ]
Running model timm_regnet ... [ UnserializableException ]
Running model timm_resnest ... [ OK ]
Running model timm_vision_transformer ... [ UnserializableException ]
Running model timm_vovnet ... [ UnserializableException ]
Running model tts_angular ... [ UnserializableException ]
Running model vgg16 ... [ OK ]
Running model vision_maskrcnn ... [ UnserializableException ]
Running model yolov3 ... [ UnserializableException ]
```

Pull Request resolved: #743

Reviewed By: yinghai

Differential Revision: D34151603

Pulled By: xuzhao9

fbshipit-source-id: 049d090584b2fc85499a214d96ae34c41d0a1c8e
Summary:
Run command:
```
python run.py mnasnet1_0 -d cuda -t eval
GPU Time:             12.160 milliseconds
CPU Dispatch Time:    12.070 milliseconds
CPU Total Wall Time:  12.184 milliseconds
```

```
python run.py mnasnet1_0 -d cuda -t eval --nvfuser fuser1
GPU Time:             11.600 milliseconds
CPU Dispatch Time:    11.305 milliseconds
CPU Total Wall Time:  11.604 milliseconds
```
```
python run.py mnasnet1_0 -d cuda -t eval --nvfuser fuser2
GPU Time:             11.609 milliseconds
CPU Dispatch Time:    11.377 milliseconds
CPU Total Wall Time:  11.610 milliseconds
```

Pull Request resolved: #744

Reviewed By: davidberard98

Differential Revision: D34107295

Pulled By: xuzhao9

fbshipit-source-id: 59e5d8f5d90484eb7ca744ebd5e486a21fbd0bdb
Summary:
This PR is reverting #736 because we prematurely removed `maml` and `maml_omiglot` models without adding proper replacements.

These models will later be replaced with a higher quality implementation from fewshot (https://github.com/oscarknagg/few-shot)

Pull Request resolved: #752

Reviewed By: jansel

Differential Revision: D34217403

Pulled By: xuzhao9

fbshipit-source-id: ee6ae9a553f218a8a378b491e0ddcce0cca5e965

Summary:
This PR enables [torch_trt](https://github.com/NVIDIA/Torch-TensorRT) module on all the models.
Currently the library will segfault on some models, and the subprocess_worker needs to correctly handle that (otherwise, it will just hang forever because it is blocked by `os.read()` on a pipe whose input process is dead).

We introduce the following mechanism to handle subprocess segfault:
1. The `Pipe` class stores the pid of the child process if the pipe is reading from the child process.
2. When the pipe reads, it always creates a thread that periodically checks the status of the process at the other end. If that process dies or becomes a zombie, the thread writes a special string, `_DEAD`, into the pipe, together with the exit code.
3. The main thread checks the return message in the pipe; if it finds the `_DEAD` message, it throws an exception, which is handled in `subprocess_worker`.

Pull Request resolved: #754

Reviewed By: robieta

Differential Revision: D34263322

Pulled By: xuzhao9

fbshipit-source-id: 59fc6858d3ea498c3137f406f7b3843f70316d83
Summary:
This PR fixes the following workflow failures:

https://github.com/pytorch/benchmark/actions/runs/1847380869 fails because `git lfs` updated and needs to overwrite the hook; adding the `--force` option works around the failing command.
https://github.com/pytorch/benchmark/actions/runs/1837003034 fails because there are (unexpectedly) two pytorch nightly releases on the same day. In this case, use the one with the higher version number.

Pull Request resolved: #756

Reviewed By: erichan1

Differential Revision: D34286177

Pulled By: xuzhao9

fbshipit-source-id: c66b437363e8215a54a550aa353d608a39a67e18

Summary: Pull Request resolved: #758

Reviewed By: frank-wei

Differential Revision: D34294547

Pulled By: xuzhao9

fbshipit-source-id: 5c2834ffc11ed64f82489c56a290c3b9c569a24a
Summary:
Currently the CI fails with "ModuleNotFoundError: No module named 'pygame'".
The reason seems to be a recent update to gym, see openai/gym#2634

Pull Request resolved: #766

Reviewed By: xuzhao9

Differential Revision: D34423163

Pulled By: kit1980

fbshipit-source-id: b4dc3ceb32c80d9e7d86c828441c6ddb28320e51
Summary:
We need to remove the current jit code from each model directory and use a unified entry point for all the transformations. This is because if we do the jit script first, then change the precision to fp16, the CI test will fail with this error: https://app.circleci.com/pipelines/github/pytorch/benchmark/3665/workflows/da928033-03fa-48d0-90a4-788d3ee794ed/jobs/3771

However, I noticed different models are using different `torch.jit` APIs: 1) `torch.jit.script(model)`, 2) `torch.jit.script(model, example_inputs)`, 3) `torch.jit.trace(model, example_inputs)` 4) an extra `torch.jit.optimize_for_inference()` for inference.  Which one should I use if we are sharing the jit scripting code for all the models? Krovatkin

The current design is to use `torch.jit.trace(model, example_inputs)` by default. Models that need a different `torch.jit` API (like nvidia_deeprecommender), or that need to script multiple `torch.nn.Module` instances, should add a callback function, `jit_callback(self)`, to handle the JIT enablement.
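The dispatch this describes can be sketched as below; `trace_fn` stands in for the default `torch.jit.trace` call, and the attribute names (`model`, `example_inputs`) are assumptions for illustration:

```python
def enable_jit(model_instance, trace_fn):
    """Apply JIT to a benchmark model: use the model's own jit_callback()
    when it defines one, otherwise trace with the default API."""
    if hasattr(model_instance, "jit_callback"):
        # Models needing a different torch.jit API, or scripting several
        # nn.Modules, opt out of the default path here.
        model_instance.jit_callback()
        return
    model_instance.model = trace_fn(model_instance.model,
                                    model_instance.example_inputs)
```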

Pull Request resolved: #761

Reviewed By: davidberard98

Differential Revision: D34396461

Pulled By: xuzhao9

fbshipit-source-id: b51ef60b8ee28c0bd910404d549cfb4a75c0ae28
Summary:
This PR prepares adding the correctness checking code to eval tests:

1. Each `eval()` function now returns `Tuple[torch.Tensor]`, i.e., the inference result
2. Add a test to check 1) is true for every model
3. change `run_sweep.py` to prepare for the correctness checking

A follow-up PR is #763, which adds the actual correctness calculation code.

Pull Request resolved: #762

Reviewed By: wushirong

Differential Revision: D34438166

Pulled By: xuzhao9

fbshipit-source-id: b876795485d5942727c3f3dad6ec44eef3250678
Summary:
This PR adds the correctness testing code for TensorRT using cosine similarities.

Example command and output on A100:
```
$ python run.py resnet18 -d cuda -t eval --fx2trt
GPU Time:              0.613 milliseconds
CPU Dispatch Time:     2.319 milliseconds
CPU Total Wall Time:   2.647 milliseconds
Correctness:            0.999990403652191
$ python run.py resnet18 -d cuda -t eval --fx2trt --no-fp16
GPU Time:              0.929 milliseconds
CPU Dispatch Time:     2.295 milliseconds
CPU Total Wall Time:   2.926 milliseconds
Correctness:           0.999999642372131
$ python run.py alexnet -d cuda -t eval --fx2trt
GPU Time:              0.582 milliseconds
CPU Dispatch Time:     2.338 milliseconds
CPU Total Wall Time:   2.646 milliseconds
Correctness:            1.000000000000000
$ python run.py alexnet -d cuda -t eval --fx2trt --no-fp16
GPU Time:              0.885 milliseconds
CPU Dispatch Time:     2.352 milliseconds
CPU Total Wall Time:   2.937 milliseconds
Correctness:            1.000000000000000
$ python run.py mobilenet_v3_large -d cuda -t eval --fx2trt
GPU Time:              1.695 milliseconds
CPU Dispatch Time:     4.424 milliseconds
CPU Total Wall Time:   5.561 milliseconds
Correctness:            0.999975979328156
$ python run.py mobilenet_v3_large -d cuda -t eval --fx2trt --no-fp16
GPU Time:              3.241 milliseconds
CPU Dispatch Time:     3.069 milliseconds
CPU Total Wall Time:   5.590 milliseconds
Correctness:            0.999904215335846
```
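The correctness number is a cosine similarity between the reference and accelerated outputs; a stand-in over flat Python lists (the real check operates on torch tensors) looks like:

```python
import math

def cosine_correctness(reference, candidate):
    # Cosine similarity: 1.0 for identical outputs, slightly below 1.0 when
    # fp16/TensorRT round-off perturbs the result, matching the ~0.9999
    # values printed above.
    dot = sum(a * b for a, b in zip(reference, candidate))
    norm_ref = math.sqrt(sum(a * a for a in reference))
    norm_cand = math.sqrt(sum(b * b for b in candidate))
    return dot / (norm_ref * norm_cand)
```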

Pull Request resolved: #763

Reviewed By: frank-wei

Differential Revision: D34438175

Pulled By: xuzhao9

fbshipit-source-id: c309009d9676628aa693e0037ee5068ee1a15c76
Summary:
This PR cleans up the timm model code to use the same code entry point (`model_factory.py`), making it easier to make changes or add experimental features.

It also cleans up the code related to setting up random seeds, so that all models share the same code path as part of initialization.

Pull Request resolved: #772

Reviewed By: erichan1

Differential Revision: D34524605

Pulled By: xuzhao9

fbshipit-source-id: d8445ef0c5c66e9404616aeb67dc033ac27974b1
Summary:
In OnDemand CI, the script may load `torchbenchmark` module without pytorch installed (at the first time of running the bisection script).
This PR fixes a bug when it fails because `torch` package is not found.

Pull Request resolved: #770

Reviewed By: erichan1

Differential Revision: D34481528

Pulled By: xuzhao9

fbshipit-source-id: 53068747565b33e3d7c4615aa8ad0b3562e0c46d
Summary:
Pull Request resolved: #777

The core lowering component takes an fx.GraphModule and turns it into a lowered `nn.Module` (generally speaking), or more specifically into a `TRTModule` in the case of fx2trt.

```
[nn.Module, PassContext] -> [nn.Module, PassContext]
```

As a matter of fact, the above signature is just a general module transformation pass function we should have consolidated and used across our stack.

Today this involves two steps:

1. Run TRTInterpreter
2. Turn the TRTInterpreterResult into a TRTModule

We wrap it into the above pass function.

Why? This is one step towards making it possible to swap in a different fx -> trt implementation, e.g., torch-tensorrt (see [discussion](https://fb.workplace.com/groups/890926038157430/posts/1058116424771723/)).

Reviewed By: xuzhao9

Differential Revision: D34540677

fbshipit-source-id: 3c332767dcde0496df3096a66c5be9ddffd1bd7f
Summary:
This PR adds the first end-to-end workload, hf_bert, to the suite that:
- Supports both train and inference
- By default, uses `amp.autocast()` to do fp16 train/inference
- Currently reports latency and qps as performance metrics
- Doesn't support multi-GPU workloads yet (will be supported in the future)

To run the benchmark, use: `python run_e2e.py hf_bert -t eval --fp16 [no|amp]`. For example, on A100:
```
$ python run_e2e.py hf_bert -t eval
{"device": "cuda", "device_num": 1, "test": "eval", "num_examples": 1043, "batch_size": 1, "result": {"latency": 14.56970322, "qps": 71.58690772563314}}
$ python run_e2e.py hf_bert -t train
{"device": "cuda", "device_num": 1, "test": "train", "num_examples": 8576, "batch_size": 32, "result": {"latency": 36.95959081, "qps": 232.03720095514768}}
```
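The qps figures above are consistent with dividing the example count by the wall-clock latency:

```python
def qps(num_examples: int, latency_s: float) -> float:
    # throughput metric as reported in the run_e2e.py output above
    return num_examples / latency_s
```

For example, 1043 examples / 14.5697 s ≈ 71.59 qps for the eval run.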

Pull Request resolved: #771

Reviewed By: erichan1

Differential Revision: D34529471

Pulled By: xuzhao9

fbshipit-source-id: a9f8b43c9e4e4ff30dfd76c1c88fe3948976fbd2
xuzhao9 and others added 11 commits March 7, 2022 09:59
Summary:
This PR adds fp16 amp to all models. Basically, it adds an autocast context to all eval tests:
```
with torch.cuda.amp.autocast():
    eval()
```

I have the following concerns regarding the current amp mode:
1. Some models don't support it. Example: BERT_pytorch, attention_is_all_you_need_pytorch
Reproduction:
```
$ python run.py BERT_pytorch -d cuda --fp16 amp
Running eval method from BERT_pytorch on cuda in eager mode.
  File "/fsx/users/xzhao9/benchmark/torchbenchmark/models/BERT_pytorch/bert_pytorch/model/attention/single.py", line 19, in forward
    scores = scores.masked_fill(mask == 0, -1e9)
RuntimeError: value cannot be converted to type at::Half without overflow
```

2. Some models don't return correct result. Example: dlrm, moco, pyhpc_turbulent_kinetic_energy
Reproduction:
```
$ python run.py pyhpc_turbulent_kinetic_energy -d cuda --fp16 amp
Running eval method from pyhpc_turbulent_kinetic_energy on cuda in eager mode.
GPU Time:              7.316 milliseconds
CPU Dispatch Time:     7.251 milliseconds
CPU Total Wall Time:   7.350 milliseconds
Correctness:            0.000000000000000
```
3. About 2/3 of the models slightly regress in performance in amp mode. Examples: squeezenet1_1, alexnet
Reproduction:
```
$ python run.py alexnet -d cuda --fp16 amp
Running eval method from alexnet on cuda in eager mode.
GPU Time:              1.475 milliseconds
CPU Dispatch Time:     1.305 milliseconds
CPU Total Wall Time:   1.509 milliseconds
Correctness:            0.999999880790710
```

```
$ python run.py alexnet -d cuda --fp16 no
Running eval method from alexnet on cuda in eager mode.
GPU Time:              1.095 milliseconds
CPU Dispatch Time:     0.994 milliseconds
CPU Total Wall Time:   1.126 milliseconds
```

The amp run is about 0.74x the speed of the fp32 run (1.475 ms vs. 1.095 ms GPU time).

Pull Request resolved: #776

Reviewed By: ejguan

Differential Revision: D34559508

Pulled By: xuzhao9

fbshipit-source-id: cf585aac5e5eaedbcdca9e8292420a8beae82481
Summary:
This PR adds two new types of runners to the repository: AWS V100 and A100.
The correctness testing CI file is tentative, and will be tested in a follow-up PR.

Pull Request resolved: #779

Reviewed By: ejguan

Differential Revision: D34691844

Pulled By: xuzhao9

fbshipit-source-id: 13ad882b4ed817546f3be6b43653c519b97aae7d
Summary:
This PR enables a subset of the HuggingFace models to run with fx2trt.
It also enables `fp16` (`half`) support for hf models, but it is not the default because the `hf_BigBird` model doesn't support half for now.

Supported: hf_Bert, hf_Albert, hf_GPT2, hf_DistilBert
Not supported: hf_Bart, hf_BigBird, hf_Longformer, hf_Reformer, hf_T5

An example error log of unsupported models:
```
Traceback (most recent call last):
  File "run.py", line 177, in <module>
    m = Model(device=args.device, test=args.test, jit=(args.mode == "jit"), batch_size=args.bs, extra_args=extra_args)
  File "/fsx/users/xzhao9/benchmark/torchbenchmark/util/model.py", line 13, in __call__
    obj.__post__init__()
  File "/fsx/users/xzhao9/benchmark/torchbenchmark/util/model.py", line 81, in __post__init__
    apply_args(self, self.extra_args)
  File "/fsx/users/xzhao9/benchmark/torchbenchmark/util/extra_args.py", line 108, in apply_args
    model.set_module(enable_fx2trt(args.batch_size, fp16=args.fp16, model=module, example_inputs=exmaple_inputs,
  File "/fsx/users/xzhao9/benchmark/torchbenchmark/util/backends/fx2trt.py", line 63, in enable_fx2trt
    traced_model = hf_symbolic_trace(
  File "/data/home/xzhao9/cluster/miniconda3/envs/py38/lib/python3.8/site-packages/transformers/utils/fx.py", line 565, in symbolic_trace
    raise NotImplementedError(
NotImplementedError: Model LongformerForMaskedLM is not supported yet, supported models: AlbertModel, AlbertForPreTraining, AlbertForMaskedLM, AlbertForMultipleChoice, AlbertForQuestionAnswering, AlbertForSequenceClassification, AlbertForTokenClassification, BertModel, BertForPreTraining, BertForNextSentencePrediction, BertForMaskedLM, BertLMHeadModel, BertForMultipleChoice, BertForQuestionAnswering, BertForSequenceClassification, BertForTokenClassification, DistilBertModel, DistilBertForMaskedLM, DistilBertForMaskedLM, DistilBertForMultipleChoice, DistilBertForQuestionAnswering, DistilBertForSequenceClassification, DistilBertForTokenClassification, MobileBertModel, MobileBertForPreTraining, MobileBertForNextSentencePrediction, MobileBertForMaskedLM, MobileBertForMultipleChoice, MobileBertForQuestionAnswering, MobileBertForSequenceClassification, MobileBertForTokenClassification, ElectraModel, ElectraForPreTraining, ElectraForMaskedLM, ElectraForMultipleChoice, ElectraForQuestionAnswering, ElectraForSequenceClassification, ElectraForTokenClassification, MegatronBertModel, MegatronBertForPreTraining, MegatronBertForNextSentencePrediction, MegatronBertForMaskedLM, MegatronBertForCausalLM, MegatronBertForMultipleChoice, MegatronBertForQuestionAnswering, MegatronBertForSequenceClassification, MegatronBertForTokenClassification, GPT2Model, GPT2LMHeadModel, GPT2LMHeadModel, GPT2ForSequenceClassification, GPT2ForTokenClassification, GPTJModel, GPTJForCausalLM, GPTJForSequenceClassification, GPTNeoModel, GPTNeoForCausalLM, GPTNeoForSequenceClassification, T5Model, T5ForConditionalGeneration, T5ForConditionalGeneration, GPT2DoubleHeadsModel
```

Pull Request resolved: #778

Reviewed By: frank-wei

Differential Revision: D34757194

Pulled By: xuzhao9

fbshipit-source-id: 017bb2f8050cb28c7e9de3ab77fd2107cbbe10e1

Summary:
When a test is flagged as "NotImplemented", there are actually two cases:
1. The test itself doesn't implement or handle the config, e.g., unsupervised-learning models like pytorch_struct don't have `eval()` tests, and the pyhpc models don't have `train()` tests.
2. The test doesn't support running on our T4 CI GPU machine, but runs fine on other GPUs, such as V100 or A100.

This PR is to eliminate the second case, so that the test can still run through the `run.py` or `run_sweep.py` interfaces. Instead, we flag the test as `not_implemented` in `metadata.yaml`, and the CI scripts `test.py` and `test_bench.py` will read the metadata and determine that it is not suitable to run on the CI machine.

This fixes #688, #626, and #598

Pull Request resolved: #781

Reviewed By: aaronenyeshi

Differential Revision: D34786277

Pulled By: xuzhao9

fbshipit-source-id: d5d3d884839345f4fcad21ccf541a02d8e705f5f
Summary:
This PR uses the `fuser` context manager to manage run contexts, replacing the old "hacky" implementation.
Because an instance generated by `contextlib.contextmanager` can only be used once, we pass in a lambda and instantiate a new instance every time we run the benchmark.

Pull Request resolved: #784

Reviewed By: davidberard98

Differential Revision: D34797602

Pulled By: xuzhao9

fbshipit-source-id: 95d46301c613b796e2b4c9aafc9e4b1a7fe6e59a
Summary:
This PR sets `fp16` to be the default precision of all huggingface models.
This PR also includes an extra patch to the transformers package, because `hf_BigBird` needs to be patched in order to support fp16.

I believe this patch should also be upstreamed: huggingface/transformers#16034

Pull Request resolved: #782

Reviewed By: frank-wei

Differential Revision: D34803502

Pulled By: xuzhao9

fbshipit-source-id: 3d46f7983aa32333b12af605f69e45f1fe3134d7
Summary:
Given the importance of the `get_module()` interface, we must implement it for every model.
This PR forces the implementation of the `get_module()` interface across all models, and properly implements it for the `Background_Matting` model.

Fixes #567.

Pull Request resolved: #785

Reviewed By: jansel

Differential Revision: D34804942

Pulled By: xuzhao9

fbshipit-source-id: 96708b9042a3fcf3e5f6c86c7cdfc5de0fbc3036
Summary:
To enable TorchBench on Python 3.9, we need to remove the version locks on the dependencies.
After removing the locks, installing TorchBench on Python 3.9 no longer requires LLVM.
Fixes #498

Pull Request resolved: #787

Reviewed By: jamesr66a

Differential Revision: D34826115

Pulled By: xuzhao9

fbshipit-source-id: 5d1d24328dba5bde2387f814f19dfed7f09df4a9
Summary:
Note that the CPU run is very slow, so it is disabled in the nightly run by metadata (https://github.com/pytorch/benchmark/blob/main/torchbenchmark/models/pytorch_CycleGAN_and_pix2pix/metadata.yaml#L10).
Still, users can run the CPU train or eval test with the `run.py` command:
```
$ python run.py pytorch_CycleGAN_and_pix2pix -d cpu -t [train|eval]
```
Fixes #788

Pull Request resolved: #790

Reviewed By: anijain2305

Differential Revision: D34832041

Pulled By: xuzhao9

fbshipit-source-id: 111622206fc82defa4641bcf03d82740e035bd01
Summary:
This PR enables CPU train/eval test on speech_transformer (for accuracy test).

Pull Request resolved: #791

Reviewed By: anijain2305

Differential Revision: D34836544

Pulled By: xuzhao9

fbshipit-source-id: 1e53fe02b118f9bfa81cff74fce7d5add94cc197
xuzhao9 and others added 3 commits March 15, 2022 17:41
Summary:
Recent torchtext API changes break the legacy code we use in the `attention_is_all_you_need_pytorch` and `pytorch_struct` models. Adding the removed functions back so that we can continue using them.

Pull Request resolved: #795

Reviewed By: erichan1

Differential Revision: D34874088

Pulled By: xuzhao9

fbshipit-source-id: bc31c26a187c88169a379f26a4dc4208382bb14e
@desertfire desertfire merged commit d7c681c into wconstab/ltc Mar 16, 2022
@desertfire desertfire deleted the ltc_merge branch March 16, 2022 18:47