[static runtime] binding for aten::sub_out #56656

Closed
wants to merge 2 commits into from

Commits on Apr 22, 2021

  1. [static runtime] binding for aten::div_out

    Test Plan:
    ```
    ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.local.local.pt --pt_inputs=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.input_data.container.pt --iters=500 --warmup_iters=500 --num_threads=1 --pt_enable_static_runtime=1 --pt_cleanup_activations=true --pt_enable_out_variant=1 --pt_optimize_memory=1 --compare_results=1 --do_profile=1 --adsfinder_compatibility=1
    ```
    
    ```
    Time per node type:
            1.48563 ms.    35.9861%. fb::sigrid_transforms_torch_bind (1 nodes)
            0.92385 ms.    22.3783%. aten::linear (6 nodes)
           0.681066 ms.    16.4974%. aten::argmin (1 nodes)
           0.239311 ms.    5.79679%. aten::matmul (1 nodes)
           0.140157 ms.    3.39501%. fb::clip_ranges_gather_sigrid_hash_v3 (77 nodes)
          0.0951568 ms.    2.30497%. fb::clip_ranges_gather (263 nodes)
          0.0835801 ms.    2.02455%. aten::sub (1 nodes)
           0.054081 ms.       1.31%. aten::repeat (1 nodes)
          0.0424465 ms.    1.02818%. aten::norm (1 nodes)
          0.0389049 ms.   0.942389%. fb::batch_box_cox (1 nodes)
          0.0346992 ms.   0.840514%. aten::__getitem__ (506 nodes)
          0.0341335 ms.    0.82681%. prim::TupleUnpack (254 nodes)
          0.0306839 ms.   0.743252%. aten::sigmoid (2 nodes)
          0.0280489 ms.   0.679426%. aten::mul (3 nodes)
          0.0265321 ms.   0.642684%. fb::offsets_to_ranges (253 nodes)
          0.0207622 ms.    0.50292%. aten::pow (1 nodes)
          0.0202067 ms.   0.489465%. fb::simple_embedding_bag_sum (3 nodes)
          0.0195497 ms.    0.47355%. fb::casted_batch_one_hot_lengths (1 nodes)
          0.0184351 ms.   0.446551%. fb::concat_add_mul_replacenan_clip (1 nodes)
           0.016382 ms.    0.39682%. aten::sum (3 nodes)
          0.0158651 ms.   0.384299%. prim::TupleConstruct (1 nodes)
          0.0150918 ms.   0.365567%. prim::DictConstruct (2 nodes)
         0.00858005 ms.   0.207833%. aten::div (1 nodes)
         0.00810684 ms.   0.196371%. fb::sigrid_hash_precompute (1 nodes)
         0.00796325 ms.   0.192893%. static_runtime::to_copy (8 nodes)
         0.00782038 ms.   0.189432%. prim::ListConstruct (4 nodes)
          0.0057504 ms.   0.139291%. aten::contiguous (1 nodes)
          0.0044688 ms.   0.108247%. aten::narrow (4 nodes)
         0.00284054 ms.   0.068806%. aten::logit (1 nodes)
         0.00265049 ms.  0.0642024%. aten::add (1 nodes)
         0.00216242 ms.    0.05238%. aten::full (1 nodes)
         0.00207732 ms.  0.0503187%. aten::relu (1 nodes)
         0.00198412 ms.   0.048061%. fb::gather_ranges (4 nodes)
         0.00176954 ms.  0.0428632%. aten::stack (1 nodes)
         0.00175913 ms.  0.0426112%. static_runtime::reshape_copy (2 nodes)
          0.0016996 ms.  0.0411692%. aten::clamp_min (1 nodes)
         0.00128528 ms.  0.0311331%. aten::size (3 nodes)
        0.000849156 ms.   0.020569%. aten::expand_as (1 nodes)
        0.000757672 ms.   0.018353%. fb::clip_ranges (2 nodes)
        0.000596224 ms.  0.0144423%. fb::lengths_to_offsets (3 nodes)
        0.000442632 ms.  0.0107218%. static_runtime::flatten_copy (1 nodes)
        0.000196158 ms. 0.00475151%. prim::device (1 nodes)
            4.12833 ms. in Total
    StaticRuntime setup time: 0.000451 ms
    Memory allocation time: 0.0089336 ms
    Memory deallocation time: 0.0578358 ms
    Outputs deallocation time: 0.0431742 ms
    Total memory managed: 947328 bytes
    Total number of reused tensors: 31
    W0421 16:56:34.220682 1522800 PyTorchPredictorContainer.cpp:200] Failed to load metadata file
    W0421 16:56:34.220772 1522800 PyTorchPredictorContainer.cpp:457] Couldn't find model param config file xl_model_weights/model_param_config
    I0421 16:56:34.220791 1522800 PyTorchPredictorBenchLib.cpp:137] PyTorch predictor: number of prediction threads 1
    I0421 16:56:34.366667 1522800 PyTorchPredictorBenchLib.cpp:230] PyTorch run finished. Milliseconds per iter: 145.863. Iters per second: 6.85573
    I0421 16:56:34.514202 1522800 PtVsBlackBoxPredictorBenchLib.cpp:132] Finished comparing PT static runtime and jit interpreter results
    ```
    
    Differential Revision: D27927731
    
    fbshipit-source-id: 75a471289e8ef495f4cce773d17a0c2a75a445a8
    ajyu authored and facebook-github-bot committed Apr 22, 2021
    39ca973
  2. [static runtime] binding for aten::sub_out (pytorch#56656)

    Summary: Pull Request resolved: pytorch#56656
    
    Test Plan:
    ```
    ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.local.local.pt --pt_inputs=/data/users/ansha/tmp/adfinder/aug_1x/210616848_0.predictor.disagg.input_data.container.pt --iters=500 --warmup_iters=500 --num_threads=1 --pt_enable_static_runtime=1 --pt_cleanup_activations=true --pt_enable_out_variant=1 --pt_optimize_memory=1 --compare_results=1 --do_profile=1 --adsfinder_compatibility=1
    ```
    ```
    Time per node type:
            1.85766 ms.    35.7817%. fb::sigrid_transforms_torch_bind (1 nodes)
             1.1238 ms.    21.6464%. aten::linear (6 nodes)
           0.858116 ms.    16.5288%. aten::argmin (1 nodes)
           0.334183 ms.    6.43694%. aten::matmul (1 nodes)
           0.173697 ms.     3.3457%. fb::clip_ranges_gather_sigrid_hash_v3 (77 nodes)
           0.118827 ms.    2.28881%. fb::clip_ranges_gather (263 nodes)
           0.101348 ms.    1.95215%. aten::sub (1 nodes)
          0.0748209 ms.    1.44118%. aten::repeat (1 nodes)
          0.0582576 ms.    1.12214%. aten::norm (1 nodes)
          0.0474353 ms.   0.913686%. fb::batch_box_cox (1 nodes)
          0.0457588 ms.   0.881393%. aten::__getitem__ (506 nodes)
          0.0435175 ms.   0.838222%. prim::TupleUnpack (254 nodes)
          0.0425416 ms.   0.819425%. aten::sigmoid (2 nodes)
          0.0383822 ms.   0.739308%. fb::offsets_to_ranges (253 nodes)
          0.0330187 ms.   0.635996%. aten::mul (3 nodes)
           0.027534 ms.   0.530352%. fb::simple_embedding_bag_sum (3 nodes)
          0.0274914 ms.   0.529532%. aten::pow (1 nodes)
          0.0236733 ms.   0.455989%. fb::casted_batch_one_hot_lengths (1 nodes)
           0.023348 ms.   0.449723%. fb::concat_add_mul_replacenan_clip (1 nodes)
          0.0193511 ms.   0.372735%. aten::sum (3 nodes)
          0.0188839 ms.   0.363737%. prim::DictConstruct (2 nodes)
          0.0183191 ms.   0.352858%. prim::TupleConstruct (1 nodes)
          0.0119029 ms.    0.22927%. aten::div (1 nodes)
          0.0103263 ms.   0.198902%. static_runtime::to_copy (8 nodes)
         0.00977658 ms.   0.188314%. prim::ListConstruct (4 nodes)
         0.00924042 ms.   0.177986%. fb::sigrid_hash_precompute (1 nodes)
         0.00692162 ms.   0.133322%. aten::contiguous (1 nodes)
         0.00567485 ms.   0.109307%. aten::narrow (4 nodes)
         0.00362285 ms.  0.0697823%. aten::logit (1 nodes)
         0.00329995 ms.  0.0635627%. aten::add (1 nodes)
         0.00285633 ms.  0.0550178%. aten::full (1 nodes)
         0.00268469 ms.  0.0517118%. fb::gather_ranges (4 nodes)
         0.00248577 ms.  0.0478803%. aten::stack (1 nodes)
         0.00241782 ms.  0.0465715%. aten::relu (1 nodes)
         0.00233674 ms.  0.0450096%. aten::clamp_min (1 nodes)
         0.00222238 ms.  0.0428068%. static_runtime::reshape_copy (2 nodes)
         0.00171177 ms.  0.0329716%. aten::size (3 nodes)
         0.00120008 ms.  0.0231155%. aten::expand_as (1 nodes)
         0.00112628 ms.  0.0216942%. fb::clip_ranges (2 nodes)
         0.00103193 ms.  0.0198768%. fb::lengths_to_offsets (3 nodes)
        0.000598624 ms.  0.0115305%. static_runtime::flatten_copy (1 nodes)
        0.000236196 ms. 0.00454954%. prim::device (1 nodes)
            5.19164 ms. in Total
    StaticRuntime setup time: 0.000868 ms
    Memory allocation time: 0.0109619 ms
    Memory deallocation time: 0.071791 ms
    Outputs deallocation time: 0.0560187 ms
    Total memory managed: 1232320 bytes
    Total number of reused tensors: 32
    W0421 17:40:52.053653 1746499 PyTorchPredictorContainer.cpp:200] Failed to load metadata file
    W0421 17:40:52.053757 1746499 PyTorchPredictorContainer.cpp:457] Couldn't find model param config file xl_model_weights/model_param_config
    I0421 17:40:52.053779 1746499 PyTorchPredictorBenchLib.cpp:137] PyTorch predictor: number of prediction threads 1
    I0421 17:40:52.185776 1746499 PyTorchPredictorBenchLib.cpp:230] PyTorch run finished. Milliseconds per iter: 131.985. Iters per second: 7.57661
    I0421 17:40:52.337853 1746499 PtVsBlackBoxPredictorBenchLib.cpp:132] Finished comparing PT static runtime and jit interpreter results
    ```
    
    Reviewed By: hlu1
    
    Differential Revision: D27929253
    
    fbshipit-source-id: 12651b789e9caace66ba640cd58bbda8692bdd14
    ajyu authored and facebook-github-bot committed Apr 22, 2021
    08cc3ed
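The `Time per node type` tables in the test plans above list, for each op, the time spent, its percentage of the total, and the node count. A minimal sketch of how such a table could be parsed and the percentage column cross-checked against the `in Total` line follows. The helper names (`parse_profile`, `share`) and the regular expressions are illustrative assumptions based only on the log lines shown in this PR; they are not part of the benchmark tool.

```python
import re

# Assumed line shape, taken from the logs in this PR, e.g.
# "0.0835801 ms.    2.02455%. aten::sub (1 nodes)"
NODE_LINE = re.compile(r"^\s*([0-9.]+) ms\.\s+([0-9.]+)%\. (\S+) \((\d+) nodes\)\s*$")
# Assumed shape of the summary line, e.g. "4.12833 ms. in Total"
TOTAL_LINE = re.compile(r"^\s*([0-9.]+) ms\. in Total\s*$")


def parse_profile(text):
    """Return ({op: (ms, reported_percent, node_count)}, total_ms)."""
    rows = {}
    total = None
    for line in text.splitlines():
        node = NODE_LINE.match(line)
        if node:
            ms, pct, op, count = node.groups()
            rows[op] = (float(ms), float(pct), int(count))
            continue
        summary = TOTAL_LINE.match(line)
        if summary:
            total = float(summary.group(1))
    return rows, total


def share(rows, total, op):
    """Recompute an op's share of the total time, in percent."""
    return 100.0 * rows[op][0] / total


# A few lines copied from the first commit's profile output.
SAMPLE = """\
Time per node type:
        1.48563 ms.    35.9861%. fb::sigrid_transforms_torch_bind (1 nodes)
      0.0835801 ms.    2.02455%. aten::sub (1 nodes)
     0.00858005 ms.   0.207833%. aten::div (1 nodes)
        4.12833 ms. in Total
"""

if __name__ == "__main__":
    rows, total = parse_profile(SAMPLE)
    for op, (ms, reported, count) in rows.items():
        print(f"{op}: {ms} ms, reported {reported}%, recomputed {share(rows, total, op):.4f}%")
```

Feeding the full output of each run through `parse_profile` gives per-op dictionaries that can be compared directly, for example to look at `aten::sub` or `aten::div` across runs. Single-run timings like these are noisy, so absolute differences between the two logs should be read with care.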