@vmoens (Collaborator) commented Jul 9, 2024

Closes #858

@facebook-github-bot added the CLA Signed label on Jul 9, 2024

github-actions bot commented Jul 9, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}9$. Worsened: $\large\color{#d91a1a}8$.
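The Improved and Worsened counts appear to match the rows rendered in bold in the detailed results, that is, benchmarks whose throughput (Ops) moved by more than 5% relative to repo HEAD. On this run 9 bold rows are positive and 8 are negative, while test_tc_first_layer_tensor, displayed as -5.00% but about -4.997% before rounding, is not counted. The 5% threshold is inferred from the table rather than taken from the bot's configuration, so the Python sketch below (function and constant names are hypothetical) is only a reconstruction under that assumption.

```python
# Hypothetical reconstruction of the summary counts, assuming a strict 5% threshold
# on the relative change in throughput (inferred from the bold rows, not from the bot).
THRESHOLD_PCT = 5.0


def change_pct(ops: float, ops_head: float) -> float:
    """Relative throughput change of this branch versus repo HEAD, in percent."""
    return (ops - ops_head) / ops_head * 100.0


def classify(ops: float, ops_head: float, threshold: float = THRESHOLD_PCT) -> str:
    """Label a benchmark as improved, worsened, or unchanged (within the threshold)."""
    change = change_pct(ops, ops_head)
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) copied from a few CPU rows; units cancel in the ratio
# as long as both columns use the same unit (KOps/s or Ops/s).
rows = {
    "test_membership_stacked_nested_last": (244.5030, 189.4538),  # shown as +29.06%
    "test_full_like": (89.3820, 121.5747),                        # shown as -26.48%
    "test_tc_first_layer_tensor": (118.4295, 124.6586),           # shown as -5.00%
    "test_plain_set_nested": (57.6682, 57.6218),                  # shown as +0.08%
}

for name, (ops, ops_head) in rows.items():
    print(f"{name}: {change_pct(ops, ops_head):+.2f}% -> {classify(ops, ops_head)}")
```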

Columns: Name | Max (slowest run) | Mean (mean time per run) | Ops (throughput on this PR) | Ops on Repo HEAD (baseline throughput) | Change (relative change in Ops)
test_plain_set_nested 37.2610μs 17.3406μs 57.6682 KOps/s 57.6218 KOps/s $\color{#35bf28}+0.08\%$
test_plain_set_stack_nested 44.0930μs 17.7043μs 56.4836 KOps/s 57.1564 KOps/s $\color{#d91a1a}-1.18\%$
test_plain_set_nested_inplace 64.3910μs 19.7415μs 50.6547 KOps/s 50.2882 KOps/s $\color{#35bf28}+0.73\%$
test_plain_set_stack_nested_inplace 69.3600μs 19.8185μs 50.4580 KOps/s 50.1717 KOps/s $\color{#35bf28}+0.57\%$
test_items 22.1120μs 2.6693μs 374.6352 KOps/s 389.0521 KOps/s $\color{#d91a1a}-3.71\%$
test_items_nested 0.3529ms 0.2706ms 3.6957 KOps/s 3.6674 KOps/s $\color{#35bf28}+0.77\%$
test_items_nested_locked 0.9833ms 0.2717ms 3.6799 KOps/s 3.6817 KOps/s $\color{#d91a1a}-0.05\%$
test_items_nested_leaf 0.1268ms 78.9201μs 12.6710 KOps/s 12.7059 KOps/s $\color{#d91a1a}-0.27\%$
test_items_stack_nested 0.4988ms 0.2673ms 3.7413 KOps/s 3.6538 KOps/s $\color{#35bf28}+2.39\%$
test_items_stack_nested_leaf 0.1524ms 79.2509μs 12.6182 KOps/s 12.3157 KOps/s $\color{#35bf28}+2.46\%$
test_items_stack_nested_locked 0.5912ms 0.2717ms 3.6810 KOps/s 3.6312 KOps/s $\color{#35bf28}+1.37\%$
test_keys 41.1480μs 3.8051μs 262.8033 KOps/s 253.8169 KOps/s $\color{#35bf28}+3.54\%$
test_keys_nested 0.1886ms 0.1383ms 7.2310 KOps/s 7.3372 KOps/s $\color{#d91a1a}-1.45\%$
test_keys_nested_locked 0.7092ms 0.1472ms 6.7927 KOps/s 7.0766 KOps/s $\color{#d91a1a}-4.01\%$
test_keys_nested_leaf 0.2061ms 0.1182ms 8.4591 KOps/s 8.5960 KOps/s $\color{#d91a1a}-1.59\%$
test_keys_stack_nested 0.2340ms 0.1391ms 7.1875 KOps/s 7.3562 KOps/s $\color{#d91a1a}-2.29\%$
test_keys_stack_nested_leaf 0.2051ms 0.1182ms 8.4616 KOps/s 8.6245 KOps/s $\color{#d91a1a}-1.89\%$
test_keys_stack_nested_locked 0.4885ms 0.1435ms 6.9680 KOps/s 7.1317 KOps/s $\color{#d91a1a}-2.30\%$
test_values 9.9335μs 1.2484μs 801.0331 KOps/s 843.4962 KOps/s $\textbf{\color{#d91a1a}-5.03\%}$
test_values_nested 0.1021ms 51.2599μs 19.5084 KOps/s 19.7950 KOps/s $\color{#d91a1a}-1.45\%$
test_values_nested_locked 0.1092ms 51.3654μs 19.4684 KOps/s 19.8947 KOps/s $\color{#d91a1a}-2.14\%$
test_values_nested_leaf 97.1010μs 46.4663μs 21.5210 KOps/s 21.9701 KOps/s $\color{#d91a1a}-2.04\%$
test_values_stack_nested 0.1017ms 51.3587μs 19.4709 KOps/s 19.4174 KOps/s $\color{#35bf28}+0.28\%$
test_values_stack_nested_leaf 94.8980μs 46.7592μs 21.3862 KOps/s 22.1467 KOps/s $\color{#d91a1a}-3.43\%$
test_values_stack_nested_locked 98.2050μs 51.6812μs 19.3494 KOps/s 19.5851 KOps/s $\color{#d91a1a}-1.20\%$
test_membership 19.0450μs 1.3446μs 743.7091 KOps/s 743.0298 KOps/s $\color{#35bf28}+0.09\%$
test_membership_nested 30.3570μs 3.4482μs 290.0084 KOps/s 292.2066 KOps/s $\color{#d91a1a}-0.75\%$
test_membership_nested_leaf 35.6270μs 3.4097μs 293.2785 KOps/s 288.0453 KOps/s $\color{#35bf28}+1.82\%$
test_membership_stacked_nested 27.2820μs 3.3997μs 294.1411 KOps/s 292.7262 KOps/s $\color{#35bf28}+0.48\%$
test_membership_stacked_nested_leaf 71.5750μs 3.4202μs 292.3787 KOps/s 286.6087 KOps/s $\color{#35bf28}+2.01\%$
test_membership_nested_last 87.9350μs 4.3055μs 232.2615 KOps/s 239.6852 KOps/s $\color{#d91a1a}-3.10\%$
test_membership_nested_leaf_last 22.6220μs 4.1224μs 242.5767 KOps/s 238.6024 KOps/s $\color{#35bf28}+1.67\%$
test_membership_stacked_nested_last 23.5540μs 4.0899μs 244.5030 KOps/s 189.4538 KOps/s $\textbf{\color{#35bf28}+29.06\%}$
test_membership_stacked_nested_leaf_last 23.9050μs 4.1182μs 242.8262 KOps/s 190.5272 KOps/s $\textbf{\color{#35bf28}+27.45\%}$
test_nested_getleaf 46.8980μs 10.9315μs 91.4789 KOps/s 94.4628 KOps/s $\color{#d91a1a}-3.16\%$
test_nested_get 28.4840μs 10.4012μs 96.1431 KOps/s 99.7020 KOps/s $\color{#d91a1a}-3.57\%$
test_stacked_getleaf 41.3880μs 11.0230μs 90.7191 KOps/s 91.7144 KOps/s $\color{#d91a1a}-1.09\%$
test_stacked_get 94.9490μs 10.5223μs 95.0366 KOps/s 100.2315 KOps/s $\textbf{\color{#d91a1a}-5.18\%}$
test_nested_getitemleaf 78.5970μs 11.5017μs 86.9435 KOps/s 87.1202 KOps/s $\color{#d91a1a}-0.20\%$
test_nested_getitem 49.9630μs 10.5836μs 94.4854 KOps/s 96.3085 KOps/s $\color{#d91a1a}-1.89\%$
test_stacked_getitemleaf 35.2160μs 11.2796μs 88.6559 KOps/s 90.9109 KOps/s $\color{#d91a1a}-2.48\%$
test_stacked_getitem 32.9920μs 10.4756μs 95.4598 KOps/s 96.7575 KOps/s $\color{#d91a1a}-1.34\%$
test_lock_nested 50.9437ms 0.3861ms 2.5901 KOps/s 3.0295 KOps/s $\textbf{\color{#d91a1a}-14.50\%}$
test_lock_stack_nested 0.5246ms 0.2999ms 3.3342 KOps/s 3.3629 KOps/s $\color{#d91a1a}-0.85\%$
test_unlock_nested 0.7100ms 0.3326ms 3.0070 KOps/s 2.9861 KOps/s $\color{#35bf28}+0.70\%$
test_unlock_stack_nested 0.5775ms 0.3081ms 3.2462 KOps/s 3.2794 KOps/s $\color{#d91a1a}-1.01\%$
test_flatten_speed 0.2060ms 99.4273μs 10.0576 KOps/s 10.0563 KOps/s $\color{#35bf28}+0.01\%$
test_unflatten_speed 0.7208ms 0.4091ms 2.4443 KOps/s 2.4977 KOps/s $\color{#d91a1a}-2.14\%$
test_common_ops 1.6006ms 0.7442ms 1.3438 KOps/s 1.3466 KOps/s $\color{#d91a1a}-0.21\%$
test_creation 16.7120μs 1.8464μs 541.5953 KOps/s 531.3205 KOps/s $\color{#35bf28}+1.93\%$
test_creation_empty 37.5510μs 11.9771μs 83.4930 KOps/s 81.1053 KOps/s $\color{#35bf28}+2.94\%$
test_creation_nested_1 38.9130μs 14.5290μs 68.8279 KOps/s 66.8602 KOps/s $\color{#35bf28}+2.94\%$
test_creation_nested_2 50.1440μs 18.0658μs 55.3533 KOps/s 54.0903 KOps/s $\color{#35bf28}+2.33\%$
test_clone 78.9280μs 12.7873μs 78.2025 KOps/s 73.4381 KOps/s $\textbf{\color{#35bf28}+6.49\%}$
test_getitem[int] 34.7050μs 10.9051μs 91.6999 KOps/s 87.8051 KOps/s $\color{#35bf28}+4.44\%$
test_getitem[slice_int] 62.8780μs 21.8375μs 45.7928 KOps/s 43.6733 KOps/s $\color{#35bf28}+4.85\%$
test_getitem[range] 76.9540μs 58.4574μs 17.1065 KOps/s 16.6693 KOps/s $\color{#35bf28}+2.62\%$
test_getitem[tuple] 88.8470μs 18.5446μs 53.9240 KOps/s 52.6295 KOps/s $\color{#35bf28}+2.46\%$
test_getitem[list] 0.1164ms 39.8016μs 25.1246 KOps/s 24.6200 KOps/s $\color{#35bf28}+2.05\%$
test_setitem_dim[int] 79.5200μs 36.3560μs 27.5058 KOps/s 28.5730 KOps/s $\color{#d91a1a}-3.74\%$
test_setitem_dim[slice_int] 0.1292ms 62.3981μs 16.0261 KOps/s 16.1966 KOps/s $\color{#d91a1a}-1.05\%$
test_setitem_dim[range] 0.1289ms 85.2767μs 11.7265 KOps/s 11.7135 KOps/s $\color{#35bf28}+0.11\%$
test_setitem_dim[tuple] 0.1091ms 49.3301μs 20.2716 KOps/s 19.6920 KOps/s $\color{#35bf28}+2.94\%$
test_setitem 62.3370μs 20.1473μs 49.6343 KOps/s 47.1672 KOps/s $\textbf{\color{#35bf28}+5.23\%}$
test_set 77.2340μs 19.8479μs 50.3831 KOps/s 49.1238 KOps/s $\color{#35bf28}+2.56\%$
test_set_shared 4.0737ms 0.1439ms 6.9488 KOps/s 6.8751 KOps/s $\color{#35bf28}+1.07\%$
test_update 92.6440μs 23.1763μs 43.1475 KOps/s 41.2509 KOps/s $\color{#35bf28}+4.60\%$
test_update_nested 88.9970μs 32.1140μs 31.1391 KOps/s 30.0073 KOps/s $\color{#35bf28}+3.77\%$
test_update__nested 77.0150μs 25.0587μs 39.9064 KOps/s 39.6621 KOps/s $\color{#35bf28}+0.62\%$
test_set_nested 68.5890μs 21.6565μs 46.1754 KOps/s 45.3014 KOps/s $\color{#35bf28}+1.93\%$
test_set_nested_new 0.1406ms 27.1711μs 36.8038 KOps/s 38.6598 KOps/s $\color{#d91a1a}-4.80\%$
test_select 87.3240μs 40.9447μs 24.4232 KOps/s 24.1903 KOps/s $\color{#35bf28}+0.96\%$
test_select_nested 0.1215ms 57.0524μs 17.5278 KOps/s 17.4569 KOps/s $\color{#35bf28}+0.41\%$
test_exclude_nested 0.1878ms 0.1205ms 8.2977 KOps/s 8.3893 KOps/s $\color{#d91a1a}-1.09\%$
test_empty[True] 0.5038ms 0.4001ms 2.4992 KOps/s 2.5124 KOps/s $\color{#d91a1a}-0.53\%$
test_empty[False] 4.9154μs 1.0190μs 981.3953 KOps/s 972.1939 KOps/s $\color{#35bf28}+0.95\%$
test_unbind_speed 1.5062ms 0.2452ms 4.0777 KOps/s 4.0218 KOps/s $\color{#35bf28}+1.39\%$
test_unbind_speed_stack0 0.4438ms 0.2416ms 4.1385 KOps/s 4.1219 KOps/s $\color{#35bf28}+0.40\%$
test_unbind_speed_stack1 66.9747ms 0.7027ms 1.4231 KOps/s 1.3371 KOps/s $\textbf{\color{#35bf28}+6.43\%}$
test_split 67.8031ms 1.6107ms 620.8325 Ops/s 664.2931 Ops/s $\textbf{\color{#d91a1a}-6.54\%}$
test_chunk 66.0635ms 1.6133ms 619.8358 Ops/s 619.3569 Ops/s $\color{#35bf28}+0.08\%$
test_creation[device0] 0.1737ms 86.1186μs 11.6119 KOps/s 11.3661 KOps/s $\color{#35bf28}+2.16\%$
test_creation_from_tensor 0.1878ms 85.3003μs 11.7233 KOps/s 11.3518 KOps/s $\color{#35bf28}+3.27\%$
test_add_one[memmap_tensor0] 63.5790μs 5.5619μs 179.7934 KOps/s 180.2199 KOps/s $\color{#d91a1a}-0.24\%$
test_contiguous[memmap_tensor0] 10.0190μs 0.6288μs 1.5902 MOps/s 1.5403 MOps/s $\color{#35bf28}+3.24\%$
test_stack[memmap_tensor0] 24.7070μs 3.5767μs 279.5843 KOps/s 274.3697 KOps/s $\color{#35bf28}+1.90\%$
test_memmaptd_index 1.0270ms 0.2601ms 3.8453 KOps/s 3.8197 KOps/s $\color{#35bf28}+0.67\%$
test_memmaptd_index_astensor 0.5839ms 0.3320ms 3.0120 KOps/s 2.9853 KOps/s $\color{#35bf28}+0.89\%$
test_memmaptd_index_op 0.8788ms 0.6393ms 1.5643 KOps/s 1.5359 KOps/s $\color{#35bf28}+1.85\%$
test_serialize_model 0.1606s 0.1045s 9.5680 Ops/s 9.8307 Ops/s $\color{#d91a1a}-2.67\%$
test_serialize_model_pickle 0.4505s 0.3810s 2.6250 Ops/s 2.6357 Ops/s $\color{#d91a1a}-0.41\%$
test_serialize_weights 0.1013s 95.4765ms 10.4738 Ops/s 9.1166 Ops/s $\textbf{\color{#35bf28}+14.89\%}$
test_serialize_weights_returnearly 0.1796s 0.1262s 7.9220 Ops/s 8.1227 Ops/s $\color{#d91a1a}-2.47\%$
test_serialize_weights_pickle 1.0552s 0.5672s 1.7629 Ops/s 2.3873 Ops/s $\textbf{\color{#d91a1a}-26.15\%}$
test_serialize_weights_filesystem 96.6730ms 92.0850ms 10.8595 Ops/s 9.6717 Ops/s $\textbf{\color{#35bf28}+12.28\%}$
test_serialize_model_filesystem 0.1629s 99.4942ms 10.0508 Ops/s 10.2718 Ops/s $\color{#d91a1a}-2.15\%$
test_reshape_pytree 58.1900μs 25.8091μs 38.7460 KOps/s 38.6241 KOps/s $\color{#35bf28}+0.32\%$
test_reshape_td 0.1068ms 33.0480μs 30.2591 KOps/s 29.6684 KOps/s $\color{#35bf28}+1.99\%$
test_view_pytree 56.5150μs 25.6544μs 38.9796 KOps/s 39.0301 KOps/s $\color{#d91a1a}-0.13\%$
test_view_td 74.4200μs 37.7989μs 26.4558 KOps/s 25.4828 KOps/s $\color{#35bf28}+3.82\%$
test_unbind_pytree 67.3660μs 29.5989μs 33.7850 KOps/s 32.8243 KOps/s $\color{#35bf28}+2.93\%$
test_unbind_td 0.3657ms 35.7520μs 27.9704 KOps/s 27.0203 KOps/s $\color{#35bf28}+3.52\%$
test_split_pytree 63.2090μs 29.2222μs 34.2206 KOps/s 33.9174 KOps/s $\color{#35bf28}+0.89\%$
test_split_td 0.1168ms 39.8374μs 25.1020 KOps/s 24.5917 KOps/s $\color{#35bf28}+2.08\%$
test_add_pytree 0.1406ms 38.0074μs 26.3107 KOps/s 28.4031 KOps/s $\textbf{\color{#d91a1a}-7.37\%}$
test_add_td 0.1369ms 56.7761μs 17.6131 KOps/s 17.2846 KOps/s $\color{#35bf28}+1.90\%$
test_distributed 0.1803ms 0.1014ms 9.8589 KOps/s 9.4617 KOps/s $\color{#35bf28}+4.20\%$
test_tdmodule 60.1620μs 18.6993μs 53.4778 KOps/s 52.5214 KOps/s $\color{#35bf28}+1.82\%$
test_tdmodule_dispatch 64.2810μs 37.0166μs 27.0149 KOps/s 26.9951 KOps/s $\color{#35bf28}+0.07\%$
test_tdseq 36.2880μs 21.5189μs 46.4708 KOps/s 45.7063 KOps/s $\color{#35bf28}+1.67\%$
test_tdseq_dispatch 81.5940μs 42.3153μs 23.6321 KOps/s 23.4424 KOps/s $\color{#35bf28}+0.81\%$
test_instantiation_functorch 2.2111ms 1.3506ms 740.3985 Ops/s 752.8967 Ops/s $\color{#d91a1a}-1.66\%$
test_instantiation_td 1.4946ms 1.0182ms 982.1422 Ops/s 966.2306 Ops/s $\color{#35bf28}+1.65\%$
test_exec_functorch 0.3721ms 0.1626ms 6.1507 KOps/s 6.1914 KOps/s $\color{#d91a1a}-0.66\%$
test_exec_functional_call 0.2951ms 0.1498ms 6.6772 KOps/s 6.7413 KOps/s $\color{#d91a1a}-0.95\%$
test_exec_td 0.2358ms 0.1463ms 6.8337 KOps/s 6.9799 KOps/s $\color{#d91a1a}-2.10\%$
test_exec_td_decorator 0.5080ms 0.2244ms 4.4561 KOps/s 4.2107 KOps/s $\textbf{\color{#35bf28}+5.83\%}$
test_vmap_mlp_speed[True-True] 0.9685ms 0.5008ms 1.9969 KOps/s 1.9908 KOps/s $\color{#35bf28}+0.30\%$
test_vmap_mlp_speed[True-False] 0.8896ms 0.4952ms 2.0196 KOps/s 1.9742 KOps/s $\color{#35bf28}+2.30\%$
test_vmap_mlp_speed[False-True] 0.7009ms 0.4036ms 2.4779 KOps/s 2.4736 KOps/s $\color{#35bf28}+0.17\%$
test_vmap_mlp_speed[False-False] 0.7998ms 0.4044ms 2.4725 KOps/s 2.4939 KOps/s $\color{#d91a1a}-0.86\%$
test_vmap_mlp_speed_decorator[True-True] 1.1052ms 0.5705ms 1.7528 KOps/s 1.7604 KOps/s $\color{#d91a1a}-0.43\%$
test_vmap_mlp_speed_decorator[True-False] 0.8689ms 0.5698ms 1.7551 KOps/s 1.7552 KOps/s $-0.00\%$
test_vmap_mlp_speed_decorator[False-True] 0.9481ms 0.4836ms 2.0678 KOps/s 2.1442 KOps/s $\color{#d91a1a}-3.57\%$
test_vmap_mlp_speed_decorator[False-False] 0.8851ms 0.4922ms 2.0318 KOps/s 2.0133 KOps/s $\color{#35bf28}+0.92\%$
test_to_module_speed[True] 1.9506ms 1.6875ms 592.5774 Ops/s 597.7552 Ops/s $\color{#d91a1a}-0.87\%$
test_to_module_speed[False] 2.7179ms 1.6671ms 599.8412 Ops/s 603.8865 Ops/s $\color{#d91a1a}-0.67\%$
test_tc_init 0.1345ms 62.3707μs 16.0332 KOps/s 15.4619 KOps/s $\color{#35bf28}+3.69\%$
test_tc_init_nested 0.3042ms 0.1249ms 8.0043 KOps/s 7.8763 KOps/s $\color{#35bf28}+1.62\%$
test_tc_first_layer_tensor 32.1610μs 8.4438μs 118.4295 KOps/s 124.6586 KOps/s $\color{#d91a1a}-5.00\%$
test_tc_first_layer_nontensor 37.2630μs 8.4855μs 117.8485 KOps/s 124.1393 KOps/s $\textbf{\color{#d91a1a}-5.07\%}$
test_tc_second_layer_tensor 16.6810μs 2.5547μs 391.4385 KOps/s 397.5499 KOps/s $\color{#d91a1a}-1.54\%$
test_tc_second_layer_nontensor 33.1820μs 9.4929μs 105.3420 KOps/s 109.6453 KOps/s $\color{#d91a1a}-3.92\%$
test_unbind 83.6516ms 15.0110ms 66.6180 Ops/s 68.0867 Ops/s $\color{#d91a1a}-2.16\%$
test_full_like 15.4250ms 11.1879ms 89.3820 Ops/s 121.5747 Ops/s $\textbf{\color{#d91a1a}-26.48\%}$
test_zeros_like 10.9653ms 6.1539ms 162.4983 Ops/s 167.3410 Ops/s $\color{#d91a1a}-2.89\%$
test_ones_like 13.3998ms 6.5006ms 153.8322 Ops/s 155.4227 Ops/s $\color{#d91a1a}-1.02\%$
test_clone 14.4705ms 8.2101ms 121.8018 Ops/s 121.0551 Ops/s $\color{#35bf28}+0.62\%$
test_squeeze 92.9250μs 12.4153μs 80.5457 KOps/s 79.2325 KOps/s $\color{#35bf28}+1.66\%$
test_unsqueeze 0.1761ms 93.3167μs 10.7162 KOps/s 10.0967 KOps/s $\textbf{\color{#35bf28}+6.14\%}$
test_split 0.4195ms 0.2764ms 3.6184 KOps/s 3.6588 KOps/s $\color{#d91a1a}-1.10\%$
test_permute 0.3305ms 0.2271ms 4.4025 KOps/s 4.4462 KOps/s $\color{#d91a1a}-0.98\%$
test_stack 29.8785ms 22.5037ms 44.4371 Ops/s 42.8536 Ops/s $\color{#35bf28}+3.70\%$
test_cat 22.8498ms 22.1394ms 45.1684 Ops/s 43.0868 Ops/s $\color{#35bf28}+4.83\%$
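The Ops column is the reciprocal of the Mean column, expressed per second and printed with a K or M prefix: for example, test_plain_set_nested has a mean of 17.3406μs, and 1 / 17.3406μs ≈ 57,668 operations per second, i.e. the 57.6682 KOps/s shown. Because Mean is itself rounded, recomputing Ops from it can differ in the last digit. The sketch below reproduces that conversion; the parsing and formatting helpers are hypothetical and written only for the units that appear in these tables.

```python
# Recompute the Ops column from the Mean column of the benchmark tables.
# Helper names are hypothetical; the unit suffixes are the ones used in the tables.
UNIT_SECONDS = {"μs": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_duration(text: str) -> float:
    """Convert a duration such as '17.3406μs', '0.1383ms' or '0.1045s' to seconds."""
    # Check the two-character suffixes first, since 'μs' and 'ms' also end in 's'.
    for suffix in ("μs", "ms", "s"):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * UNIT_SECONDS[suffix]
    raise ValueError(f"unrecognised duration: {text!r}")


def ops_per_second(mean: str) -> float:
    """Throughput implied by a mean time per operation."""
    return 1.0 / parse_duration(mean)


def format_ops(ops: float) -> str:
    """Format a rate as the tables appear to: Ops/s, KOps/s or MOps/s, 4 decimals."""
    if ops >= 1e6:
        return f"{ops / 1e6:.4f} MOps/s"
    if ops >= 1e3:
        return f"{ops / 1e3:.4f} KOps/s"
    return f"{ops:.4f} Ops/s"


for mean in ("17.3406μs", "0.1383ms", "11.1879ms"):
    print(mean, "->", format_ops(ops_per_second(mean)))
```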


github-actions bot commented Jul 9, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: $\large\color{#35bf28}16$. Worsened: $\large\color{#d91a1a}4$.

Columns: Name | Max (slowest run) | Mean (mean time per run) | Ops (throughput on this PR) | Ops on Repo HEAD (baseline throughput) | Change (relative change in Ops)
test_plain_set_nested 62.4400μs 12.2928μs 81.3486 KOps/s 82.7245 KOps/s $\color{#d91a1a}-1.66\%$
test_plain_set_stack_nested 0.1984ms 12.3548μs 80.9403 KOps/s 81.7435 KOps/s $\color{#d91a1a}-0.98\%$
test_plain_set_nested_inplace 32.6310μs 13.5838μs 73.6173 KOps/s 73.7973 KOps/s $\color{#d91a1a}-0.24\%$
test_plain_set_stack_nested_inplace 0.2001ms 13.6303μs 73.3658 KOps/s 74.6626 KOps/s $\color{#d91a1a}-1.74\%$
test_items 0.1949ms 4.6052μs 217.1477 KOps/s 215.2634 KOps/s $\color{#35bf28}+0.88\%$
test_items_nested 0.3923ms 0.3376ms 2.9623 KOps/s 3.0232 KOps/s $\color{#d91a1a}-2.01\%$
test_items_nested_locked 0.5262ms 0.3430ms 2.9156 KOps/s 2.9825 KOps/s $\color{#d91a1a}-2.24\%$
test_items_nested_leaf 0.2633ms 82.9769μs 12.0515 KOps/s 11.9406 KOps/s $\color{#35bf28}+0.93\%$
test_items_stack_nested 0.5246ms 0.3358ms 2.9778 KOps/s 2.9815 KOps/s $\color{#d91a1a}-0.12\%$
test_items_stack_nested_leaf 0.2661ms 83.2033μs 12.0188 KOps/s 11.8898 KOps/s $\color{#35bf28}+1.08\%$
test_items_stack_nested_locked 0.5157ms 0.3354ms 2.9819 KOps/s 2.9849 KOps/s $\color{#d91a1a}-0.10\%$
test_keys 0.1862ms 4.3311μs 230.8903 KOps/s 229.9095 KOps/s $\color{#35bf28}+0.43\%$
test_keys_nested 88.5810μs 68.3574μs 14.6290 KOps/s 14.5055 KOps/s $\color{#35bf28}+0.85\%$
test_keys_nested_locked 2.5182ms 76.2737μs 13.1107 KOps/s 13.3850 KOps/s $\color{#d91a1a}-2.05\%$
test_keys_nested_leaf 0.2415ms 59.7580μs 16.7342 KOps/s 16.8813 KOps/s $\color{#d91a1a}-0.87\%$
test_keys_stack_nested 0.2508ms 69.2677μs 14.4368 KOps/s 14.5154 KOps/s $\color{#d91a1a}-0.54\%$
test_keys_stack_nested_leaf 0.2466ms 57.8624μs 17.2824 KOps/s 16.9388 KOps/s $\color{#35bf28}+2.03\%$
test_keys_stack_nested_locked 0.2598ms 74.2849μs 13.4617 KOps/s 13.4171 KOps/s $\color{#35bf28}+0.33\%$
test_values 62.4173μs 1.7966μs 556.5935 KOps/s 550.1835 KOps/s $\color{#35bf28}+1.17\%$
test_values_nested 0.2252ms 35.6496μs 28.0508 KOps/s 28.2733 KOps/s $\color{#d91a1a}-0.79\%$
test_values_nested_locked 59.4900μs 37.3647μs 26.7633 KOps/s 26.8193 KOps/s $\color{#d91a1a}-0.21\%$
test_values_nested_leaf 0.2156ms 31.5459μs 31.6999 KOps/s 31.8959 KOps/s $\color{#d91a1a}-0.61\%$
test_values_stack_nested 0.2237ms 35.9251μs 27.8357 KOps/s 28.1846 KOps/s $\color{#d91a1a}-1.24\%$
test_values_stack_nested_leaf 0.2129ms 31.3418μs 31.9062 KOps/s 31.4651 KOps/s $\color{#35bf28}+1.40\%$
test_values_stack_nested_locked 0.2179ms 37.5603μs 26.6238 KOps/s 26.9208 KOps/s $\color{#d91a1a}-1.10\%$
test_membership 27.5789μs 0.7095μs 1.4094 MOps/s 1.4199 MOps/s $\color{#d91a1a}-0.73\%$
test_membership_nested 16.6300μs 2.4938μs 400.9982 KOps/s 397.6977 KOps/s $\color{#35bf28}+0.83\%$
test_membership_nested_leaf 0.2052ms 2.5100μs 398.4065 KOps/s 400.6956 KOps/s $\color{#d91a1a}-0.57\%$
test_membership_stacked_nested 17.1700μs 2.5077μs 398.7793 KOps/s 402.0338 KOps/s $\color{#d91a1a}-0.81\%$
test_membership_stacked_nested_leaf 23.8700μs 2.5858μs 386.7346 KOps/s 398.1888 KOps/s $\color{#d91a1a}-2.88\%$
test_membership_nested_last 0.1938ms 3.0107μs 332.1509 KOps/s 330.5440 KOps/s $\color{#35bf28}+0.49\%$
test_membership_nested_leaf_last 17.9400μs 3.0071μs 332.5473 KOps/s 332.0885 KOps/s $\color{#35bf28}+0.14\%$
test_membership_stacked_nested_last 0.1934ms 3.0318μs 329.8399 KOps/s 291.0790 KOps/s $\textbf{\color{#35bf28}+13.32\%}$
test_membership_stacked_nested_leaf_last 22.0600μs 3.0998μs 322.6037 KOps/s 289.5113 KOps/s $\textbf{\color{#35bf28}+11.43\%}$
test_nested_getleaf 0.1962ms 8.3115μs 120.3155 KOps/s 119.7364 KOps/s $\color{#35bf28}+0.48\%$
test_nested_get 0.1904ms 7.7998μs 128.2092 KOps/s 128.3950 KOps/s $\color{#d91a1a}-0.14\%$
test_stacked_getleaf 28.0300μs 8.3088μs 120.3543 KOps/s 120.4341 KOps/s $\color{#d91a1a}-0.07\%$
test_stacked_get 0.1931ms 7.7651μs 128.7815 KOps/s 128.8752 KOps/s $\color{#d91a1a}-0.07\%$
test_nested_getitemleaf 20.1000μs 8.5236μs 117.3210 KOps/s 118.1437 KOps/s $\color{#d91a1a}-0.70\%$
test_nested_getitem 0.1915ms 8.0088μs 124.8625 KOps/s 126.1154 KOps/s $\color{#d91a1a}-0.99\%$
test_stacked_getitemleaf 0.1941ms 8.4898μs 117.7882 KOps/s 118.2197 KOps/s $\color{#d91a1a}-0.37\%$
test_stacked_getitem 20.5300μs 8.0035μs 124.9454 KOps/s 125.0840 KOps/s $\color{#d91a1a}-0.11\%$
test_lock_nested 59.6506ms 0.3915ms 2.5543 KOps/s 2.4871 KOps/s $\color{#35bf28}+2.70\%$
test_lock_stack_nested 0.3162ms 0.2879ms 3.4735 KOps/s 3.3926 KOps/s $\color{#35bf28}+2.38\%$
test_unlock_nested 61.4017ms 0.3942ms 2.5366 KOps/s 2.4960 KOps/s $\color{#35bf28}+1.63\%$
test_unlock_stack_nested 0.3435ms 0.2982ms 3.3540 KOps/s 3.2913 KOps/s $\color{#35bf28}+1.91\%$
test_flatten_speed 0.3739ms 0.1015ms 9.8475 KOps/s 9.7444 KOps/s $\color{#35bf28}+1.06\%$
test_unflatten_speed 0.4679ms 0.2897ms 3.4522 KOps/s 3.4396 KOps/s $\color{#35bf28}+0.37\%$
test_common_ops 1.0269ms 0.5567ms 1.7964 KOps/s 1.6758 KOps/s $\textbf{\color{#35bf28}+7.20\%}$
test_creation 28.5000μs 1.5715μs 636.3485 KOps/s 628.5366 KOps/s $\color{#35bf28}+1.24\%$
test_creation_empty 23.6600μs 7.4818μs 133.6575 KOps/s 135.7480 KOps/s $\color{#d91a1a}-1.54\%$
test_creation_nested_1 25.6300μs 9.1226μs 109.6181 KOps/s 110.5406 KOps/s $\color{#d91a1a}-0.83\%$
test_creation_nested_2 0.2019ms 11.3673μs 87.9716 KOps/s 90.4807 KOps/s $\color{#d91a1a}-2.77\%$
test_clone 78.7500μs 10.9684μs 91.1708 KOps/s 87.3058 KOps/s $\color{#35bf28}+4.43\%$
test_getitem[int] 26.2700μs 10.3666μs 96.4633 KOps/s 97.0306 KOps/s $\color{#d91a1a}-0.58\%$
test_getitem[slice_int] 43.8500μs 20.0502μs 49.8748 KOps/s 50.0508 KOps/s $\color{#d91a1a}-0.35\%$
test_getitem[range] 64.1300μs 45.4464μs 22.0039 KOps/s 22.0709 KOps/s $\color{#d91a1a}-0.30\%$
test_getitem[tuple] 35.7110μs 18.1879μs 54.9816 KOps/s 55.7193 KOps/s $\color{#d91a1a}-1.32\%$
test_getitem[list] 0.1258ms 31.0747μs 32.1805 KOps/s 30.8708 KOps/s $\color{#35bf28}+4.24\%$
test_setitem_dim[int] 40.1500μs 24.4997μs 40.8168 KOps/s 41.8423 KOps/s $\color{#d91a1a}-2.45\%$
test_setitem_dim[slice_int] 0.2468ms 44.8635μs 22.2898 KOps/s 22.2898 KOps/s $+0.00\%$
test_setitem_dim[range] 83.6610μs 60.7051μs 16.4731 KOps/s 16.5595 KOps/s $\color{#d91a1a}-0.52\%$
test_setitem_dim[tuple] 59.9900μs 39.0640μs 25.5990 KOps/s 25.5680 KOps/s $\color{#35bf28}+0.12\%$
test_setitem 48.4300μs 15.1898μs 65.8336 KOps/s 65.2055 KOps/s $\color{#35bf28}+0.96\%$
test_set 0.2079ms 14.4733μs 69.0929 KOps/s 65.5737 KOps/s $\textbf{\color{#35bf28}+5.37\%}$
test_set_shared 1.7603ms 96.1034μs 10.4055 KOps/s 10.2110 KOps/s $\color{#35bf28}+1.90\%$
test_update 0.2155ms 17.2061μs 58.1189 KOps/s 57.9940 KOps/s $\color{#35bf28}+0.22\%$
test_update_nested 59.6700μs 22.2016μs 45.0418 KOps/s 44.4387 KOps/s $\color{#35bf28}+1.36\%$
test_update__nested 57.7610μs 20.9429μs 47.7489 KOps/s 45.3756 KOps/s $\textbf{\color{#35bf28}+5.23\%}$
test_set_nested 0.2017ms 15.8507μs 63.0888 KOps/s 62.0805 KOps/s $\color{#35bf28}+1.62\%$
test_set_nested_new 47.5710μs 18.4167μs 54.2985 KOps/s 54.5005 KOps/s $\color{#d91a1a}-0.37\%$
test_select 0.2230ms 29.8042μs 33.5523 KOps/s 31.9895 KOps/s $\color{#35bf28}+4.89\%$
test_select_nested 0.2502ms 51.8818μs 19.2746 KOps/s 19.3743 KOps/s $\color{#d91a1a}-0.51\%$
test_exclude_nested 0.1306ms 0.1053ms 9.4930 KOps/s 9.4822 KOps/s $\color{#35bf28}+0.11\%$
test_empty[True] 0.5288ms 0.3401ms 2.9401 KOps/s 2.9473 KOps/s $\color{#d91a1a}-0.24\%$
test_empty[False] 18.7432μs 0.7978μs 1.2534 MOps/s 1.2590 MOps/s $\color{#d91a1a}-0.45\%$
test_to 87.5510μs 57.4769μs 17.3983 KOps/s 17.5690 KOps/s $\color{#d91a1a}-0.97\%$
test_to_nonblocking 0.2378ms 34.5844μs 28.9147 KOps/s 28.0498 KOps/s $\color{#35bf28}+3.08\%$
test_unbind_speed 0.3078ms 0.2508ms 3.9876 KOps/s 3.9614 KOps/s $\color{#35bf28}+0.66\%$
test_unbind_speed_stack0 0.4472ms 0.2519ms 3.9706 KOps/s 3.9121 KOps/s $\color{#35bf28}+1.49\%$
test_unbind_speed_stack1 76.7619ms 0.8357ms 1.1965 KOps/s 1.2882 KOps/s $\textbf{\color{#d91a1a}-7.12\%}$
test_split 1.7174ms 1.5260ms 655.2975 Ops/s 606.7553 Ops/s $\textbf{\color{#35bf28}+8.00\%}$
test_chunk 77.0761ms 1.6333ms 612.2594 Ops/s 610.1040 Ops/s $\color{#35bf28}+0.35\%$
test_creation[device0] 0.1312ms 55.8632μs 17.9009 KOps/s 17.6955 KOps/s $\color{#35bf28}+1.16\%$
test_creation_from_tensor 0.2502ms 52.1199μs 19.1865 KOps/s 18.4568 KOps/s $\color{#35bf28}+3.95\%$
test_add_one[memmap_tensor0] 58.7800μs 6.5661μs 152.2963 KOps/s 147.1903 KOps/s $\color{#35bf28}+3.47\%$
test_contiguous[memmap_tensor0] 17.6700μs 0.6197μs 1.6137 MOps/s 1.5998 MOps/s $\color{#35bf28}+0.87\%$
test_stack[memmap_tensor0] 17.7100μs 4.7033μs 212.6159 KOps/s 210.6568 KOps/s $\color{#35bf28}+0.93\%$
test_memmaptd_index 1.0130ms 0.2596ms 3.8522 KOps/s 3.7329 KOps/s $\color{#35bf28}+3.20\%$
test_memmaptd_index_astensor 0.7012ms 0.3224ms 3.1013 KOps/s 3.0572 KOps/s $\color{#35bf28}+1.44\%$
test_memmaptd_index_op 0.8250ms 0.5858ms 1.7069 KOps/s 1.6732 KOps/s $\color{#35bf28}+2.01\%$
test_serialize_model 0.1727s 99.5730ms 10.0429 Ops/s 10.6233 Ops/s $\textbf{\color{#d91a1a}-5.46\%}$
test_serialize_model_pickle 1.3504s 1.2357s 0.8093 Ops/s 0.8086 Ops/s $\color{#35bf28}+0.08\%$
test_serialize_weights 0.1670s 96.9302ms 10.3167 Ops/s 10.6794 Ops/s $\color{#d91a1a}-3.40\%$
test_serialize_weights_returnearly 0.2619s 75.2811ms 13.2835 Ops/s 13.4669 Ops/s $\color{#d91a1a}-1.36\%$
test_serialize_weights_pickle 1.3467s 1.2554s 0.7966 Ops/s 0.8013 Ops/s $\color{#d91a1a}-0.59\%$
test_reshape_pytree 89.4810μs 25.6857μs 38.9321 KOps/s 38.7220 KOps/s $\color{#35bf28}+0.54\%$
test_reshape_td 0.1768ms 32.4977μs 30.7714 KOps/s 30.3738 KOps/s $\color{#35bf28}+1.31\%$
test_view_pytree 0.1582ms 25.8324μs 38.7110 KOps/s 37.9131 KOps/s $\color{#35bf28}+2.10\%$
test_view_td 89.0410μs 35.7397μs 27.9801 KOps/s 25.7844 KOps/s $\textbf{\color{#35bf28}+8.52\%}$
test_unbind_pytree 63.3000μs 31.7153μs 31.5306 KOps/s 31.1640 KOps/s $\color{#35bf28}+1.18\%$
test_unbind_td 0.4792ms 39.2537μs 25.4753 KOps/s 25.0073 KOps/s $\color{#35bf28}+1.87\%$
test_split_pytree 63.0700μs 34.0582μs 29.3615 KOps/s 27.5244 KOps/s $\textbf{\color{#35bf28}+6.67\%}$
test_split_td 0.1051ms 38.0539μs 26.2785 KOps/s 26.3564 KOps/s $\color{#d91a1a}-0.30\%$
test_add_pytree 70.6010μs 36.7124μs 27.2388 KOps/s 25.8080 KOps/s $\textbf{\color{#35bf28}+5.54\%}$
test_add_td 81.2910μs 47.2432μs 21.1671 KOps/s 20.5425 KOps/s $\color{#35bf28}+3.04\%$
test_distributed 3.6687ms 89.2196μs 11.2083 KOps/s 13.5520 KOps/s $\textbf{\color{#d91a1a}-17.29\%}$
test_tdmodule 42.9100μs 14.2225μs 70.3110 KOps/s 63.7764 KOps/s $\textbf{\color{#35bf28}+10.25\%}$
test_tdmodule_dispatch 47.8500μs 27.8947μs 35.8492 KOps/s 33.8051 KOps/s $\textbf{\color{#35bf28}+6.05\%}$
test_tdseq 30.9700μs 15.8339μs 63.1557 KOps/s 59.3606 KOps/s $\textbf{\color{#35bf28}+6.39\%}$
test_tdseq_dispatch 51.9800μs 30.8565μs 32.4081 KOps/s 30.5037 KOps/s $\textbf{\color{#35bf28}+6.24\%}$
test_instantiation_functorch 1.4557ms 1.3972ms 715.7033 Ops/s 715.6687 Ops/s $+0.00\%$
test_instantiation_td 1.4575ms 0.9871ms 1.0130 KOps/s 1.0250 KOps/s $\color{#d91a1a}-1.17\%$
test_exec_functorch 0.1650ms 0.1413ms 7.0779 KOps/s 6.8774 KOps/s $\color{#35bf28}+2.91\%$
test_exec_functional_call 0.1695ms 0.1285ms 7.7807 KOps/s 7.5097 KOps/s $\color{#35bf28}+3.61\%$
test_exec_td 0.1608ms 0.1267ms 7.8923 KOps/s 7.5554 KOps/s $\color{#35bf28}+4.46\%$
test_exec_td_decorator 0.7303ms 0.1985ms 5.0379 KOps/s 4.3998 KOps/s $\textbf{\color{#35bf28}+14.50\%}$
test_vmap_mlp_speed[True-True] 0.6172ms 0.5496ms 1.8194 KOps/s 1.7653 KOps/s $\color{#35bf28}+3.06\%$
test_vmap_mlp_speed[True-False] 0.6153ms 0.5495ms 1.8198 KOps/s 1.7360 KOps/s $\color{#35bf28}+4.83\%$
test_vmap_mlp_speed[False-True] 0.5235ms 0.4825ms 2.0725 KOps/s 1.9608 KOps/s $\textbf{\color{#35bf28}+5.70\%}$
test_vmap_mlp_speed[False-False] 0.6880ms 0.4924ms 2.0309 KOps/s 1.9681 KOps/s $\color{#35bf28}+3.19\%$
test_vmap_mlp_speed_decorator[True-True] 1.4693ms 0.6121ms 1.6338 KOps/s 1.6201 KOps/s $\color{#35bf28}+0.85\%$
test_vmap_mlp_speed_decorator[True-False] 0.7691ms 0.6089ms 1.6424 KOps/s 1.6106 KOps/s $\color{#35bf28}+1.98\%$
test_vmap_mlp_speed_decorator[False-True] 0.6735ms 0.5401ms 1.8514 KOps/s 1.8546 KOps/s $\color{#d91a1a}-0.17\%$
test_vmap_mlp_speed_decorator[False-False] 0.6612ms 0.5427ms 1.8427 KOps/s 1.8543 KOps/s $\color{#d91a1a}-0.63\%$
test_vmap_transformer_speed[True-True] 7.6245ms 7.3588ms 135.8912 Ops/s 136.2377 Ops/s $\color{#d91a1a}-0.25\%$
test_vmap_transformer_speed[True-False] 7.6397ms 7.3270ms 136.4809 Ops/s 136.5588 Ops/s $\color{#d91a1a}-0.06\%$
test_vmap_transformer_speed[False-True] 7.5968ms 7.2683ms 137.5829 Ops/s 137.9272 Ops/s $\color{#d91a1a}-0.25\%$
test_vmap_transformer_speed[False-False] 7.5970ms 7.2754ms 137.4488 Ops/s 136.0349 Ops/s $\color{#35bf28}+1.04\%$
test_vmap_transformer_speed_decorator[True-True] 17.9102ms 17.7346ms 56.3870 Ops/s 56.3197 Ops/s $\color{#35bf28}+0.12\%$
test_vmap_transformer_speed_decorator[True-False] 18.4362ms 17.7978ms 56.1866 Ops/s 56.3231 Ops/s $\color{#d91a1a}-0.24\%$
test_vmap_transformer_speed_decorator[False-True] 18.3036ms 17.7077ms 56.4725 Ops/s 56.7457 Ops/s $\color{#d91a1a}-0.48\%$
test_vmap_transformer_speed_decorator[False-False] 18.5415ms 17.7062ms 56.4774 Ops/s 56.7710 Ops/s $\color{#d91a1a}-0.52\%$
test_to_module_speed[True] 1.5759ms 1.4713ms 679.6760 Ops/s 669.5222 Ops/s $\color{#35bf28}+1.52\%$
test_to_module_speed[False] 1.5677ms 1.4608ms 684.5492 Ops/s 683.2436 Ops/s $\color{#35bf28}+0.19\%$
test_tc_init 79.3910μs 51.4000μs 19.4552 KOps/s 18.9133 KOps/s $\color{#35bf28}+2.87\%$
test_tc_init_nested 0.1418ms 0.1052ms 9.5024 KOps/s 9.5084 KOps/s $\color{#d91a1a}-0.06\%$
test_tc_first_layer_tensor 18.2500μs 3.6766μs 271.9904 KOps/s 270.5790 KOps/s $\color{#35bf28}+0.52\%$
test_tc_first_layer_nontensor 15.3300μs 3.7369μs 267.6037 KOps/s 270.4037 KOps/s $\color{#d91a1a}-1.04\%$
test_tc_second_layer_tensor 6.9725μs 1.1921μs 838.8454 KOps/s 836.4752 KOps/s $\color{#35bf28}+0.28\%$
test_tc_second_layer_nontensor 18.8200μs 4.2564μs 234.9386 KOps/s 236.3993 KOps/s $\color{#d91a1a}-0.62\%$
test_unbind 0.1126s 14.2480ms 70.1852 Ops/s 66.7715 Ops/s $\textbf{\color{#35bf28}+5.11\%}$
test_full_like 14.3754ms 13.6938ms 73.0258 Ops/s 107.1256 Ops/s $\textbf{\color{#d91a1a}-31.83\%}$
test_zeros_like 8.4938ms 8.0225ms 124.6492 Ops/s 125.2477 Ops/s $\color{#d91a1a}-0.48\%$
test_ones_like 8.3349ms 8.0001ms 124.9985 Ops/s 124.4500 Ops/s $\color{#35bf28}+0.44\%$
test_clone 9.7110ms 9.4485ms 105.8364 Ops/s 105.6196 Ops/s $\color{#35bf28}+0.21\%$
test_squeeze 83.8910μs 10.8203μs 92.4188 KOps/s 94.8314 KOps/s $\color{#d91a1a}-2.54\%$
test_unsqueeze 0.2344ms 87.5591μs 11.4209 KOps/s 11.5511 KOps/s $\color{#d91a1a}-1.13\%$
test_split 3.3879ms 3.0415ms 328.7834 Ops/s 327.3576 Ops/s $\color{#35bf28}+0.44\%$
test_permute 0.2562ms 0.2018ms 4.9563 KOps/s 4.9964 KOps/s $\color{#d91a1a}-0.80\%$
test_stack 27.7368ms 27.2715ms 36.6683 Ops/s 36.7453 Ops/s $\color{#d91a1a}-0.21\%$
test_cat 27.3349ms 27.0729ms 36.9372 Ops/s 37.2463 Ops/s $\color{#d91a1a}-0.83\%$
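Reading the CPU and GPU reports side by side can help separate consistent changes from run-to-run noise. For instance, test_full_like is flagged as worsened on both runners (-26.48% and -31.83%), whereas test_distributed and test_unbind_speed_stack1 regress beyond the threshold only on the GPU run (the CPU run shows +4.20% and +6.43%). The sketch below intersects the two reports for a handful of rows copied from the tables; it reuses the 5% threshold that the summary counts appear to use, which is an inference rather than a documented setting, and the helper names are hypothetical.

```python
# Compare CPU and GPU benchmark reports for a few rows copied from the tables above.
# Each value is (Ops on this branch, Ops on Repo HEAD); units cancel in the ratio.
# Note: names such as test_clone and test_split appear twice per table (different
# benchmark modules), so a full comparison would need a module-qualified key.
THRESHOLD_PCT = 5.0  # assumption inferred from the bold rows

cpu = {
    "test_full_like": (89.3820, 121.5747),
    "test_distributed": (9.8589, 9.4617),
    "test_unbind_speed_stack1": (1.4231, 1.3371),
    "test_membership_stacked_nested_last": (244.5030, 189.4538),
}
gpu = {
    "test_full_like": (73.0258, 107.1256),
    "test_distributed": (11.2083, 13.5520),
    "test_unbind_speed_stack1": (1.1965, 1.2882),
    "test_membership_stacked_nested_last": (329.8399, 291.0790),
}


def change_pct(ops: float, ops_head: float) -> float:
    """Relative throughput change versus repo HEAD, in percent."""
    return (ops - ops_head) / ops_head * 100.0


def flagged(report: dict, sign: int) -> set:
    """Names whose change exceeds the threshold upward (sign=+1) or downward (sign=-1)."""
    return {
        name
        for name, (ops, ops_head) in report.items()
        if sign * change_pct(ops, ops_head) > THRESHOLD_PCT
    }


worse_on_both = flagged(cpu, -1) & flagged(gpu, -1)
better_on_both = flagged(cpu, +1) & flagged(gpu, +1)
# Regressions seen on only one runner are candidates for a re-run before drawing conclusions.
worse_on_one = (flagged(cpu, -1) | flagged(gpu, -1)) - worse_on_both

print("worse on both:", sorted(worse_on_both))
print("better on both:", sorted(better_on_both))
print("worse on one runner only:", sorted(worse_on_one))
```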

@vmoens merged commit fb4b629 into main on Jul 10, 2024
@vmoens deleted the allow-for-nontensor-ragged-tensors branch on October 21, 2024 14:02

Labels

bug (Something isn't working), CLA Signed (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)

Development

Successfully merging this pull request may close these issues.

[Question / BUG] Working with ragged tensors / stacking NonTensorData which contain tensors of different shapes

3 participants