[static runtime] binding for aten::sub_out #56656
Closed
facebook-github-bot added the cla signed and oncall: jit labels on Apr 22, 2021
💊 CI failures summary and remediations
As of commit 08cc3ed (more details on the Dr. CI page).
This pull request was exported from Phabricator. Differential Revision: D27929253
Test Plan:
```
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
  --scripted_model=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.local.local.pt \
  --pt_inputs=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.input_data.container.pt \
  --iters=500 --warmup_iters=500 --num_threads=1 \
  --pt_enable_static_runtime=1 --pt_cleanup_activations=true \
  --pt_enable_out_variant=1 --pt_optimize_memory=1 \
  --compare_results=1 --do_profile=1 --adsfinder_compatibility=1
```

```
Time per node type:
1.48563 ms. 35.9861%. fb::sigrid_transforms_torch_bind (1 nodes)
0.92385 ms. 22.3783%. aten::linear (6 nodes)
0.681066 ms. 16.4974%. aten::argmin (1 nodes)
0.239311 ms. 5.79679%. aten::matmul (1 nodes)
0.140157 ms. 3.39501%. fb::clip_ranges_gather_sigrid_hash_v3 (77 nodes)
0.0951568 ms. 2.30497%. fb::clip_ranges_gather (263 nodes)
0.0835801 ms. 2.02455%. aten::sub (1 nodes)
0.054081 ms. 1.31%. aten::repeat (1 nodes)
0.0424465 ms. 1.02818%. aten::norm (1 nodes)
0.0389049 ms. 0.942389%. fb::batch_box_cox (1 nodes)
0.0346992 ms. 0.840514%. aten::__getitem__ (506 nodes)
0.0341335 ms. 0.82681%. prim::TupleUnpack (254 nodes)
0.0306839 ms. 0.743252%. aten::sigmoid (2 nodes)
0.0280489 ms. 0.679426%. aten::mul (3 nodes)
0.0265321 ms. 0.642684%. fb::offsets_to_ranges (253 nodes)
0.0207622 ms. 0.50292%. aten::pow (1 nodes)
0.0202067 ms. 0.489465%. fb::simple_embedding_bag_sum (3 nodes)
0.0195497 ms. 0.47355%. fb::casted_batch_one_hot_lengths (1 nodes)
0.0184351 ms. 0.446551%. fb::concat_add_mul_replacenan_clip (1 nodes)
0.016382 ms. 0.39682%. aten::sum (3 nodes)
0.0158651 ms. 0.384299%. prim::TupleConstruct (1 nodes)
0.0150918 ms. 0.365567%. prim::DictConstruct (2 nodes)
0.00858005 ms. 0.207833%. aten::div (1 nodes)
0.00810684 ms. 0.196371%. fb::sigrid_hash_precompute (1 nodes)
0.00796325 ms. 0.192893%. static_runtime::to_copy (8 nodes)
0.00782038 ms. 0.189432%. prim::ListConstruct (4 nodes)
0.0057504 ms. 0.139291%. aten::contiguous (1 nodes)
0.0044688 ms. 0.108247%. aten::narrow (4 nodes)
0.00284054 ms. 0.068806%. aten::logit (1 nodes)
0.00265049 ms. 0.0642024%. aten::add (1 nodes)
0.00216242 ms. 0.05238%. aten::full (1 nodes)
0.00207732 ms. 0.0503187%. aten::relu (1 nodes)
0.00198412 ms. 0.048061%. fb::gather_ranges (4 nodes)
0.00176954 ms. 0.0428632%. aten::stack (1 nodes)
0.00175913 ms. 0.0426112%. static_runtime::reshape_copy (2 nodes)
0.0016996 ms. 0.0411692%. aten::clamp_min (1 nodes)
0.00128528 ms. 0.0311331%. aten::size (3 nodes)
0.000849156 ms. 0.020569%. aten::expand_as (1 nodes)
0.000757672 ms. 0.018353%. fb::clip_ranges (2 nodes)
0.000596224 ms. 0.0144423%. fb::lengths_to_offsets (3 nodes)
0.000442632 ms. 0.0107218%. static_runtime::flatten_copy (1 nodes)
0.000196158 ms. 0.00475151%. prim::device (1 nodes)
4.12833 ms. in Total
StaticRuntime setup time: 0.000451 ms
Memory allocation time: 0.0089336 ms
Memory deallocation time: 0.0578358 ms
Outputs deallocation time: 0.0431742 ms
Total memory managed: 947328 bytes
Total number of reused tensors: 31
W0421 16:56:34.220682 1522800 PyTorchPredictorContainer.cpp:200] Failed to load metadata file
W0421 16:56:34.220772 1522800 PyTorchPredictorContainer.cpp:457] Couldn't find model param config file xl_model_weights/model_param_config
I0421 16:56:34.220791 1522800 PyTorchPredictorBenchLib.cpp:137] PyTorch predictor: number of prediction threads 1
I0421 16:56:34.366667 1522800 PyTorchPredictorBenchLib.cpp:230] PyTorch run finished. Milliseconds per iter: 145.863. Iters per second: 6.85573
I0421 16:56:34.514202 1522800 PtVsBlackBoxPredictorBenchLib.cpp:132] Finished comparing PT static runtime and jit interpreter results
```

Differential Revision: D27927731
fbshipit-source-id: 75a471289e8ef495f4cce773d17a0c2a75a445a8
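For readers who want to compare profiler runs programmatically, here is a minimal sketch that parses rows of the form `<time> ms. <percent>%. <op> (<n> nodes)` and rechecks one percentage against the reported total. The helper name `parse_profile` and the regular expression are illustrative assumptions, not part of the benchmark tooling; the sample rows are copied from the output above.

```python
import re

# Matches one profiler row such as "0.0835801 ms. 2.02455%. aten::sub (1 nodes)".
# The pattern is an assumption based on the rows shown in this PR's test plan.
ROW = re.compile(r"^\s*([\d.]+) ms\.\s+([\d.]+)%\.\s+(\S+) \((\d+) nodes\)\s*$")


def parse_profile(text):
    """Return {op_name: (ms, percent, node_count)} for every matching row."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            ms, pct, op, nodes = m.groups()
            rows[op] = (float(ms), float(pct), int(nodes))
    return rows


# A few rows copied from the first profile; the "in Total" line is not a row.
sample = """\
1.48563 ms. 35.9861%. fb::sigrid_transforms_torch_bind (1 nodes)
0.92385 ms. 22.3783%. aten::linear (6 nodes)
0.0835801 ms. 2.02455%. aten::sub (1 nodes)
4.12833 ms. in Total
"""

rows = parse_profile(sample)
total_ms = 4.12833  # taken from the "in Total" line
sub_ms, sub_pct, sub_nodes = rows["aten::sub"]

# Recompute the share of aten::sub from the raw time and the total.
recomputed_pct = 100.0 * sub_ms / total_ms
print(f"aten::sub: {sub_ms} ms, reported {sub_pct}%, recomputed {recomputed_pct:.5f}%")
```

Keying the result by op name keeps lookups simple, since each op appears once per profile in the output shown here.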
Summary: Pull Request resolved: pytorch#56656

Test Plan:
```
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
  --scripted_model=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.local.local.pt \
  --pt_inputs=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.input_data.container.pt \
  --iters=500 --warmup_iters=500 --num_threads=1 \
  --pt_enable_static_runtime=1 --pt_cleanup_activations=true \
  --pt_enable_out_variant=1 --pt_optimize_memory=1 \
  --compare_results=1 --do_profile=1 --adsfinder_compatibility=1
```

```
Time per node type:
1.85766 ms. 35.7817%. fb::sigrid_transforms_torch_bind (1 nodes)
1.1238 ms. 21.6464%. aten::linear (6 nodes)
0.858116 ms. 16.5288%. aten::argmin (1 nodes)
0.334183 ms. 6.43694%. aten::matmul (1 nodes)
0.173697 ms. 3.3457%. fb::clip_ranges_gather_sigrid_hash_v3 (77 nodes)
0.118827 ms. 2.28881%. fb::clip_ranges_gather (263 nodes)
0.101348 ms. 1.95215%. aten::sub (1 nodes)
0.0748209 ms. 1.44118%. aten::repeat (1 nodes)
0.0582576 ms. 1.12214%. aten::norm (1 nodes)
0.0474353 ms. 0.913686%. fb::batch_box_cox (1 nodes)
0.0457588 ms. 0.881393%. aten::__getitem__ (506 nodes)
0.0435175 ms. 0.838222%. prim::TupleUnpack (254 nodes)
0.0425416 ms. 0.819425%. aten::sigmoid (2 nodes)
0.0383822 ms. 0.739308%. fb::offsets_to_ranges (253 nodes)
0.0330187 ms. 0.635996%. aten::mul (3 nodes)
0.027534 ms. 0.530352%. fb::simple_embedding_bag_sum (3 nodes)
0.0274914 ms. 0.529532%. aten::pow (1 nodes)
0.0236733 ms. 0.455989%. fb::casted_batch_one_hot_lengths (1 nodes)
0.023348 ms. 0.449723%. fb::concat_add_mul_replacenan_clip (1 nodes)
0.0193511 ms. 0.372735%. aten::sum (3 nodes)
0.0188839 ms. 0.363737%. prim::DictConstruct (2 nodes)
0.0183191 ms. 0.352858%. prim::TupleConstruct (1 nodes)
0.0119029 ms. 0.22927%. aten::div (1 nodes)
0.0103263 ms. 0.198902%. static_runtime::to_copy (8 nodes)
0.00977658 ms. 0.188314%. prim::ListConstruct (4 nodes)
0.00924042 ms. 0.177986%. fb::sigrid_hash_precompute (1 nodes)
0.00692162 ms. 0.133322%. aten::contiguous (1 nodes)
0.00567485 ms. 0.109307%. aten::narrow (4 nodes)
0.00362285 ms. 0.0697823%. aten::logit (1 nodes)
0.00329995 ms. 0.0635627%. aten::add (1 nodes)
0.00285633 ms. 0.0550178%. aten::full (1 nodes)
0.00268469 ms. 0.0517118%. fb::gather_ranges (4 nodes)
0.00248577 ms. 0.0478803%. aten::stack (1 nodes)
0.00241782 ms. 0.0465715%. aten::relu (1 nodes)
0.00233674 ms. 0.0450096%. aten::clamp_min (1 nodes)
0.00222238 ms. 0.0428068%. static_runtime::reshape_copy (2 nodes)
0.00171177 ms. 0.0329716%. aten::size (3 nodes)
0.00120008 ms. 0.0231155%. aten::expand_as (1 nodes)
0.00112628 ms. 0.0216942%. fb::clip_ranges (2 nodes)
0.00103193 ms. 0.0198768%. fb::lengths_to_offsets (3 nodes)
0.000598624 ms. 0.0115305%. static_runtime::flatten_copy (1 nodes)
0.000236196 ms. 0.00454954%. prim::device (1 nodes)
5.19164 ms. in Total
StaticRuntime setup time: 0.000868 ms
Memory allocation time: 0.0109619 ms
Memory deallocation time: 0.071791 ms
Outputs deallocation time: 0.0560187 ms
Total memory managed: 1232320 bytes
Total number of reused tensors: 32
W0421 17:40:52.053653 1746499 PyTorchPredictorContainer.cpp:200] Failed to load metadata file
W0421 17:40:52.053757 1746499 PyTorchPredictorContainer.cpp:457] Couldn't find model param config file xl_model_weights/model_param_config
I0421 17:40:52.053779 1746499 PyTorchPredictorBenchLib.cpp:137] PyTorch predictor: number of prediction threads 1
I0421 17:40:52.185776 1746499 PyTorchPredictorBenchLib.cpp:230] PyTorch run finished. Milliseconds per iter: 131.985. Iters per second: 7.57661
I0421 17:40:52.337853 1746499 PtVsBlackBoxPredictorBenchLib.cpp:132] Finished comparing PT static runtime and jit interpreter results
```

Reviewed By: hlu1
Differential Revision: D27929253
fbshipit-source-id: 12651b789e9caace66ba640cd58bbda8692bdd14
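The PR title refers to an out variant (`aten::sub_out`), and the benchmark is run with `--pt_enable_out_variant=1`. As a rough illustration of the general out-variant idea only, the sketch below writes the result of a subtraction into a caller-provided buffer so the same storage can be reused across calls. It uses plain Python lists and a hypothetical `sub_out` helper; it is not the static runtime binding added by this PR. The formula `a - alpha * b` follows the documented semantics of `torch.sub`.

```python
def sub_out(out, a, b, alpha=1):
    """Write a - alpha * b elementwise into the preallocated list `out`.

    Hypothetical helper for illustration: the caller owns `out`, so no new
    output storage is allocated per call.
    """
    if not (len(out) == len(a) == len(b)):
        raise ValueError("all buffers must have the same length")
    for i in range(len(a)):
        out[i] = a[i] - alpha * b[i]
    return out


# Allocate the output buffer once and reuse it for two calls.
buffer = [0.0] * 3

first = sub_out(buffer, [5.0, 7.0, 9.0], [1.0, 2.0, 3.0])
first_snapshot = list(first)  # copy, because the next call overwrites `buffer`

second = sub_out(buffer, [1.0, 1.0, 1.0], [0.5, 0.5, 0.5], alpha=2)

print(first_snapshot)
print(second, second is buffer)
```

Returning the same object that was passed in mirrors the usual out-variant convention, where the caller can tell that no fresh output was created.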
This pull request has been merged in 690c8b4.
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request on May 19, 2021
Summary: Pull Request resolved: pytorch#56656

Test Plan:
```
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
  --scripted_model=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.local.local.pt \
  --pt_inputs=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.input_data.container.pt \
  --iters=500 --warmup_iters=500 --num_threads=1 \
  --pt_enable_static_runtime=1 --pt_cleanup_activations=true \
  --pt_enable_out_variant=1 --pt_optimize_memory=1 \
  --compare_results=1 --do_profile=1 --adsfinder_compatibility=1
```

```
Time per node type:
1.85766 ms. 35.7817%. fb::sigrid_transforms_torch_bind (1 nodes)
1.1238 ms. 21.6464%. aten::linear (6 nodes)
0.858116 ms. 16.5288%. aten::argmin (1 nodes)
0.334183 ms. 6.43694%. aten::matmul (1 nodes)
0.173697 ms. 3.3457%. fb::clip_ranges_gather_sigrid_hash_v3 (77 nodes)
0.118827 ms. 2.28881%. fb::clip_ranges_gather (263 nodes)
0.101348 ms. 1.95215%. aten::sub (1 nodes)
0.0748209 ms. 1.44118%. aten::repeat (1 nodes)
0.0582576 ms. 1.12214%. aten::norm (1 nodes)
0.0474353 ms. 0.913686%. fb::batch_box_cox (1 nodes)
0.0457588 ms. 0.881393%. aten::__getitem__ (506 nodes)
0.0435175 ms. 0.838222%. prim::TupleUnpack (254 nodes)
0.0425416 ms. 0.819425%. aten::sigmoid (2 nodes)
0.0383822 ms. 0.739308%. fb::offsets_to_ranges (253 nodes)
0.0330187 ms. 0.635996%. aten::mul (3 nodes)
0.027534 ms. 0.530352%. fb::simple_embedding_bag_sum (3 nodes)
0.0274914 ms. 0.529532%. aten::pow (1 nodes)
0.0236733 ms. 0.455989%. fb::casted_batch_one_hot_lengths (1 nodes)
0.023348 ms. 0.449723%. fb::concat_add_mul_replacenan_clip (1 nodes)
0.0193511 ms. 0.372735%. aten::sum (3 nodes)
0.0188839 ms. 0.363737%. prim::DictConstruct (2 nodes)
0.0183191 ms. 0.352858%. prim::TupleConstruct (1 nodes)
0.0119029 ms. 0.22927%. aten::div (1 nodes)
0.0103263 ms. 0.198902%. static_runtime::to_copy (8 nodes)
0.00977658 ms. 0.188314%. prim::ListConstruct (4 nodes)
0.00924042 ms. 0.177986%. fb::sigrid_hash_precompute (1 nodes)
0.00692162 ms. 0.133322%. aten::contiguous (1 nodes)
0.00567485 ms. 0.109307%. aten::narrow (4 nodes)
0.00362285 ms. 0.0697823%. aten::logit (1 nodes)
0.00329995 ms. 0.0635627%. aten::add (1 nodes)
0.00285633 ms. 0.0550178%. aten::full (1 nodes)
0.00268469 ms. 0.0517118%. fb::gather_ranges (4 nodes)
0.00248577 ms. 0.0478803%. aten::stack (1 nodes)
0.00241782 ms. 0.0465715%. aten::relu (1 nodes)
0.00233674 ms. 0.0450096%. aten::clamp_min (1 nodes)
0.00222238 ms. 0.0428068%. static_runtime::reshape_copy (2 nodes)
0.00171177 ms. 0.0329716%. aten::size (3 nodes)
0.00120008 ms. 0.0231155%. aten::expand_as (1 nodes)
0.00112628 ms. 0.0216942%. fb::clip_ranges (2 nodes)
0.00103193 ms. 0.0198768%. fb::lengths_to_offsets (3 nodes)
0.000598624 ms. 0.0115305%. static_runtime::flatten_copy (1 nodes)
0.000236196 ms. 0.00454954%. prim::device (1 nodes)
5.19164 ms. in Total
StaticRuntime setup time: 0.000868 ms
Memory allocation time: 0.0109619 ms
Memory deallocation time: 0.071791 ms
Outputs deallocation time: 0.0560187 ms
Total memory managed: 1232320 bytes
Total number of reused tensors: 32
W0421 17:40:52.053653 1746499 PyTorchPredictorContainer.cpp:200] Failed to load metadata file
W0421 17:40:52.053757 1746499 PyTorchPredictorContainer.cpp:457] Couldn't find model param config file xl_model_weights/model_param_config
I0421 17:40:52.053779 1746499 PyTorchPredictorBenchLib.cpp:137] PyTorch predictor: number of prediction threads 1
I0421 17:40:52.185776 1746499 PyTorchPredictorBenchLib.cpp:230] PyTorch run finished. Milliseconds per iter: 131.985. Iters per second: 7.57661
I0421 17:40:52.337853 1746499 PtVsBlackBoxPredictorBenchLib.cpp:132] Finished comparing PT static runtime and jit interpreter results
```

Reviewed By: hlu1
Differential Revision: D27929253
fbshipit-source-id: 5a7984ba3ce2d6d4bce0a0ab6c5e09e8c037b44e
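The "PyTorch run finished" log line reports both milliseconds per iteration and iterations per second. A small sketch, assuming the log format shown above, extracts the two figures and checks that they are related by `1000 / ms_per_iter`; the regular expression is an assumption fitted to this one line, and a small tolerance is used because the logged values are rounded.

```python
import re

# Log line copied from the benchmark output above.
LOG_LINE = (
    "I0421 17:40:52.185776 1746499 PyTorchPredictorBenchLib.cpp:230] "
    "PyTorch run finished. Milliseconds per iter: 131.985. "
    "Iters per second: 7.57661"
)

match = re.search(
    r"Milliseconds per iter: ([\d.]+)\. Iters per second: ([\d.]+)", LOG_LINE
)
ms_per_iter = float(match.group(1))
logged_iters_per_sec = float(match.group(2))

# One second is 1000 ms, so throughput is the reciprocal of latency in seconds.
derived_iters_per_sec = 1000.0 / ms_per_iter

print(ms_per_iter, logged_iters_per_sec, round(derived_iters_per_sec, 5))
```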