Conversation

@vmoens (Collaborator) commented May 28, 2024

No description provided.

@facebook-github-bot added the CLA Signed label May 28, 2024
github-actions bot commented May 28, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}7$. Worsened: $\large\color{#d91a1a}7$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 44.1320μs 16.7652μs 59.6475 KOps/s 61.4691 KOps/s $\color{#d91a1a}-2.96\%$
test_plain_set_stack_nested 43.0300μs 16.5447μs 60.4423 KOps/s 60.6696 KOps/s $\color{#d91a1a}-0.37\%$
test_plain_set_nested_inplace 49.6020μs 19.1879μs 52.1160 KOps/s 53.3673 KOps/s $\color{#d91a1a}-2.34\%$
test_plain_set_stack_nested_inplace 53.9320μs 18.9011μs 52.9069 KOps/s 53.8843 KOps/s $\color{#d91a1a}-1.81\%$
test_items 18.3750μs 2.5436μs 393.1374 KOps/s 387.3747 KOps/s $\color{#35bf28}+1.49\%$
test_items_nested 1.0885ms 0.2691ms 3.7159 KOps/s 3.8128 KOps/s $\color{#d91a1a}-2.54\%$
test_items_nested_locked 0.4015ms 0.2701ms 3.7019 KOps/s 3.7738 KOps/s $\color{#d91a1a}-1.90\%$
test_items_nested_leaf 0.1524ms 77.4573μs 12.9103 KOps/s 12.9852 KOps/s $\color{#d91a1a}-0.58\%$
test_items_stack_nested 0.5052ms 0.2673ms 3.7409 KOps/s 3.8105 KOps/s $\color{#d91a1a}-1.83\%$
test_items_stack_nested_leaf 0.1529ms 80.0698μs 12.4891 KOps/s 12.9384 KOps/s $\color{#d91a1a}-3.47\%$
test_items_stack_nested_locked 0.7963ms 0.2741ms 3.6482 KOps/s 3.7968 KOps/s $\color{#d91a1a}-3.91\%$
test_keys 19.3660μs 3.9001μs 256.4022 KOps/s 251.6913 KOps/s $\color{#35bf28}+1.87\%$
test_keys_nested 0.2628ms 0.1382ms 7.2356 KOps/s 7.2589 KOps/s $\color{#d91a1a}-0.32\%$
test_keys_nested_locked 0.6481ms 0.1425ms 7.0194 KOps/s 7.0246 KOps/s $\color{#d91a1a}-0.07\%$
test_keys_nested_leaf 0.2043ms 0.1174ms 8.5185 KOps/s 8.4624 KOps/s $\color{#35bf28}+0.66\%$
test_keys_stack_nested 0.2802ms 0.1393ms 7.1800 KOps/s 7.1966 KOps/s $\color{#d91a1a}-0.23\%$
test_keys_stack_nested_leaf 0.2336ms 0.1177ms 8.4943 KOps/s 8.4763 KOps/s $\color{#35bf28}+0.21\%$
test_keys_stack_nested_locked 0.2641ms 0.1435ms 6.9680 KOps/s 6.9508 KOps/s $\color{#35bf28}+0.25\%$
test_values 6.0138μs 1.1483μs 870.8308 KOps/s 818.4541 KOps/s $\textbf{\color{#35bf28}+6.40\%}$
test_values_nested 91.2110μs 50.2567μs 19.8978 KOps/s 19.8690 KOps/s $\color{#35bf28}+0.15\%$
test_values_nested_locked 0.1053ms 50.5447μs 19.7845 KOps/s 19.9460 KOps/s $\color{#d91a1a}-0.81\%$
test_values_nested_leaf 96.2300μs 45.5789μs 21.9400 KOps/s 21.9482 KOps/s $\color{#d91a1a}-0.04\%$
test_values_stack_nested 0.1007ms 51.8968μs 19.2690 KOps/s 20.0977 KOps/s $\color{#d91a1a}-4.12\%$
test_values_stack_nested_leaf 95.4980μs 45.9151μs 21.7793 KOps/s 21.8527 KOps/s $\color{#d91a1a}-0.34\%$
test_values_stack_nested_locked 95.0280μs 51.6078μs 19.3769 KOps/s 19.8601 KOps/s $\color{#d91a1a}-2.43\%$
test_membership 15.0080μs 1.3431μs 744.5626 KOps/s 744.0545 KOps/s $\color{#35bf28}+0.07\%$
test_membership_nested 17.9140μs 3.5168μs 284.3517 KOps/s 295.5401 KOps/s $\color{#d91a1a}-3.79\%$
test_membership_nested_leaf 20.8290μs 3.5832μs 279.0795 KOps/s 280.0145 KOps/s $\color{#d91a1a}-0.33\%$
test_membership_stacked_nested 22.2520μs 3.4892μs 286.5987 KOps/s 294.4006 KOps/s $\color{#d91a1a}-2.65\%$
test_membership_stacked_nested_leaf 22.6520μs 3.4658μs 288.5336 KOps/s 293.5681 KOps/s $\color{#d91a1a}-1.71\%$
test_membership_nested_last 30.9180μs 4.3162μs 231.6862 KOps/s 241.5682 KOps/s $\color{#d91a1a}-4.09\%$
test_membership_nested_leaf_last 23.8840μs 4.3018μs 232.4623 KOps/s 242.9366 KOps/s $\color{#d91a1a}-4.31\%$
test_membership_stacked_nested_last 33.9940μs 5.7825μs 172.9347 KOps/s 242.8784 KOps/s $\textbf{\color{#d91a1a}-28.80\%}$
test_membership_stacked_nested_leaf_last 23.5440μs 5.8812μs 170.0323 KOps/s 243.5353 KOps/s $\textbf{\color{#d91a1a}-30.18\%}$
test_nested_getleaf 32.5110μs 10.6537μs 93.8638 KOps/s 92.6227 KOps/s $\color{#35bf28}+1.34\%$
test_nested_get 35.8370μs 10.2200μs 97.8476 KOps/s 98.2020 KOps/s $\color{#d91a1a}-0.36\%$
test_stacked_getleaf 27.8720μs 10.6434μs 93.9547 KOps/s 92.9409 KOps/s $\color{#35bf28}+1.09\%$
test_stacked_get 20.2580μs 10.0385μs 99.6165 KOps/s 98.0361 KOps/s $\color{#35bf28}+1.61\%$
test_nested_getitemleaf 43.6680μs 11.1511μs 89.6776 KOps/s 88.4132 KOps/s $\color{#35bf28}+1.43\%$
test_nested_getitem 25.1870μs 10.3495μs 96.6227 KOps/s 96.0898 KOps/s $\color{#35bf28}+0.55\%$
test_stacked_getitemleaf 25.4570μs 11.1914μs 89.3547 KOps/s 90.5262 KOps/s $\color{#d91a1a}-1.29\%$
test_stacked_getitem 33.5730μs 10.2357μs 97.6971 KOps/s 98.0340 KOps/s $\color{#d91a1a}-0.34\%$
test_lock_nested 0.6816ms 0.3485ms 2.8692 KOps/s 2.9256 KOps/s $\color{#d91a1a}-1.93\%$
test_lock_stack_nested 0.5328ms 0.3138ms 3.1864 KOps/s 3.2275 KOps/s $\color{#d91a1a}-1.27\%$
test_unlock_nested 0.9293ms 0.3573ms 2.7987 KOps/s 2.5726 KOps/s $\textbf{\color{#35bf28}+8.79\%}$
test_unlock_stack_nested 0.6217ms 0.3213ms 3.1125 KOps/s 3.1510 KOps/s $\color{#d91a1a}-1.22\%$
test_flatten_speed 0.3448ms 96.7528μs 10.3356 KOps/s 10.3336 KOps/s $\color{#35bf28}+0.02\%$
test_unflatten_speed 0.7057ms 0.4078ms 2.4524 KOps/s 2.4795 KOps/s $\color{#d91a1a}-1.09\%$
test_common_ops 4.0358ms 0.6965ms 1.4357 KOps/s 1.4552 KOps/s $\color{#d91a1a}-1.34\%$
test_creation 26.1990μs 1.9127μs 522.8182 KOps/s 514.1379 KOps/s $\color{#35bf28}+1.69\%$
test_creation_empty 28.5740μs 9.8900μs 101.1123 KOps/s 100.7474 KOps/s $\color{#35bf28}+0.36\%$
test_creation_nested_1 49.6230μs 12.7092μs 78.6830 KOps/s 79.4151 KOps/s $\color{#d91a1a}-0.92\%$
test_creation_nested_2 35.7070μs 16.0795μs 62.1911 KOps/s 62.2784 KOps/s $\color{#d91a1a}-0.14\%$
test_clone 73.6280μs 13.4545μs 74.3247 KOps/s 74.9540 KOps/s $\color{#d91a1a}-0.84\%$
test_getitem[int] 36.2780μs 11.4835μs 87.0817 KOps/s 86.8931 KOps/s $\color{#35bf28}+0.22\%$
test_getitem[slice_int] 66.6050μs 22.9630μs 43.5483 KOps/s 42.0833 KOps/s $\color{#35bf28}+3.48\%$
test_getitem[range] 77.7660μs 57.2976μs 17.4527 KOps/s 16.8295 KOps/s $\color{#35bf28}+3.70\%$
test_getitem[tuple] 59.7020μs 19.7231μs 50.7020 KOps/s 51.9775 KOps/s $\color{#d91a1a}-2.45\%$
test_getitem[list] 0.1059ms 40.8388μs 24.4865 KOps/s 24.0319 KOps/s $\color{#35bf28}+1.89\%$
test_setitem_dim[int] 66.7550μs 35.6034μs 28.0872 KOps/s 29.7710 KOps/s $\textbf{\color{#d91a1a}-5.66\%}$
test_setitem_dim[slice_int] 0.1027ms 62.9878μs 15.8761 KOps/s 16.0771 KOps/s $\color{#d91a1a}-1.25\%$
test_setitem_dim[range] 0.1307ms 83.8469μs 11.9265 KOps/s 12.1799 KOps/s $\color{#d91a1a}-2.08\%$
test_setitem_dim[tuple] 0.1189ms 50.5703μs 19.7745 KOps/s 20.4713 KOps/s $\color{#d91a1a}-3.40\%$
test_setitem 56.8670μs 19.7756μs 50.5674 KOps/s 50.5479 KOps/s $\color{#35bf28}+0.04\%$
test_set 49.5530μs 19.0577μs 52.4721 KOps/s 50.9622 KOps/s $\color{#35bf28}+2.96\%$
test_set_shared 2.6661ms 0.1414ms 7.0712 KOps/s 7.1840 KOps/s $\color{#d91a1a}-1.57\%$
test_update 77.0340μs 20.9307μs 47.7768 KOps/s 47.7072 KOps/s $\color{#35bf28}+0.15\%$
test_update_nested 76.9850μs 28.8386μs 34.6757 KOps/s 34.5372 KOps/s $\color{#35bf28}+0.40\%$
test_update__nested 60.4130μs 25.2887μs 39.5433 KOps/s 39.9791 KOps/s $\color{#d91a1a}-1.09\%$
test_set_nested 62.9480μs 21.1355μs 47.3139 KOps/s 46.9328 KOps/s $\color{#35bf28}+0.81\%$
test_set_nested_new 58.9810μs 25.2525μs 39.6000 KOps/s 39.4866 KOps/s $\color{#35bf28}+0.29\%$
test_select 81.7430μs 40.3673μs 24.7725 KOps/s 25.0370 KOps/s $\color{#d91a1a}-1.06\%$
test_select_nested 0.1308ms 60.8814μs 16.4254 KOps/s 16.4094 KOps/s $\color{#35bf28}+0.10\%$
test_exclude_nested 0.2702ms 0.1213ms 8.2430 KOps/s 8.1986 KOps/s $\color{#35bf28}+0.54\%$
test_empty[True] 0.7525ms 0.3983ms 2.5104 KOps/s 2.5136 KOps/s $\color{#d91a1a}-0.13\%$
test_empty[False] 4.1418μs 1.1887μs 841.2773 KOps/s 873.5114 KOps/s $\color{#d91a1a}-3.69\%$
test_unbind_speed 3.4174ms 0.2726ms 3.6682 KOps/s 3.8493 KOps/s $\color{#d91a1a}-4.71\%$
test_unbind_speed_stack0 0.3678ms 0.2567ms 3.8957 KOps/s 3.8760 KOps/s $\color{#35bf28}+0.51\%$
test_unbind_speed_stack1 62.5680ms 0.7236ms 1.3819 KOps/s 1.3378 KOps/s $\color{#35bf28}+3.30\%$
test_split 59.6428ms 1.5878ms 629.7969 Ops/s 624.8229 Ops/s $\color{#35bf28}+0.80\%$
test_chunk 56.5745ms 1.5667ms 638.2924 Ops/s 659.6247 Ops/s $\color{#d91a1a}-3.23\%$
test_creation[device0] 3.6700ms 85.5930μs 11.6832 KOps/s 12.1644 KOps/s $\color{#d91a1a}-3.96\%$
test_creation_from_tensor 0.1648ms 85.2154μs 11.7350 KOps/s 11.9185 KOps/s $\color{#d91a1a}-1.54\%$
test_add_one[memmap_tensor0] 84.8690μs 5.4706μs 182.7959 KOps/s 178.2763 KOps/s $\color{#35bf28}+2.54\%$
test_contiguous[memmap_tensor0] 29.3250μs 0.6565μs 1.5233 MOps/s 1.5695 MOps/s $\color{#d91a1a}-2.95\%$
test_stack[memmap_tensor0] 23.5440μs 3.5663μs 280.4065 KOps/s 274.8538 KOps/s $\color{#35bf28}+2.02\%$
test_memmaptd_index 0.9866ms 0.2585ms 3.8688 KOps/s 3.7893 KOps/s $\color{#35bf28}+2.10\%$
test_memmaptd_index_astensor 0.6891ms 0.3320ms 3.0125 KOps/s 2.9676 KOps/s $\color{#35bf28}+1.51\%$
test_memmaptd_index_op 1.1962ms 0.6077ms 1.6455 KOps/s 1.5438 KOps/s $\textbf{\color{#35bf28}+6.59\%}$
test_serialize_model 0.1737s 0.1153s 8.6701 Ops/s 8.5117 Ops/s $\color{#35bf28}+1.86\%$
test_serialize_model_pickle 0.4481s 0.3766s 2.6556 Ops/s 2.6199 Ops/s $\color{#35bf28}+1.36\%$
test_serialize_weights 0.1737s 0.1102s 9.0720 Ops/s 8.9790 Ops/s $\color{#35bf28}+1.04\%$
test_serialize_weights_returnearly 0.3214s 0.1541s 6.4894 Ops/s 6.9782 Ops/s $\textbf{\color{#d91a1a}-7.00\%}$
test_serialize_weights_pickle 0.7082s 0.4920s 2.0323 Ops/s 2.4031 Ops/s $\textbf{\color{#d91a1a}-15.43\%}$
test_serialize_weights_filesystem 0.1044s 94.3392ms 10.6000 Ops/s 10.5014 Ops/s $\color{#35bf28}+0.94\%$
test_serialize_model_filesystem 0.1021s 93.5676ms 10.6875 Ops/s 9.9600 Ops/s $\textbf{\color{#35bf28}+7.30\%}$
test_reshape_pytree 67.7870μs 25.3941μs 39.3793 KOps/s 39.1986 KOps/s $\color{#35bf28}+0.46\%$
test_reshape_td 90.4090μs 34.4526μs 29.0254 KOps/s 29.6171 KOps/s $\color{#d91a1a}-2.00\%$
test_view_pytree 57.6180μs 25.2283μs 39.6381 KOps/s 39.4586 KOps/s $\color{#35bf28}+0.45\%$
test_view_td 69.1490μs 38.5725μs 25.9252 KOps/s 26.3484 KOps/s $\color{#d91a1a}-1.61\%$
test_unbind_pytree 70.2910μs 28.8350μs 34.6801 KOps/s 34.1596 KOps/s $\color{#35bf28}+1.52\%$
test_unbind_td 0.3797ms 38.3624μs 26.0672 KOps/s 26.3818 KOps/s $\color{#d91a1a}-1.19\%$
test_split_pytree 80.1000μs 29.3958μs 34.0184 KOps/s 33.8930 KOps/s $\color{#35bf28}+0.37\%$
test_split_td 0.1227ms 40.1686μs 24.8950 KOps/s 24.2327 KOps/s $\color{#35bf28}+2.73\%$
test_add_pytree 88.0650μs 34.7524μs 28.7750 KOps/s 28.5845 KOps/s $\color{#35bf28}+0.67\%$
test_add_td 0.1119ms 55.4080μs 18.0479 KOps/s 19.5164 KOps/s $\textbf{\color{#d91a1a}-7.52\%}$
test_distributed 0.1962ms 0.1007ms 9.9277 KOps/s 9.8344 KOps/s $\color{#35bf28}+0.95\%$
test_tdmodule 46.2460μs 16.9926μs 58.8492 KOps/s 56.8500 KOps/s $\color{#35bf28}+3.52\%$
test_tdmodule_dispatch 65.5030μs 33.7970μs 29.5884 KOps/s 28.9074 KOps/s $\color{#35bf28}+2.36\%$
test_tdseq 33.5930μs 19.7445μs 50.6471 KOps/s 48.6522 KOps/s $\color{#35bf28}+4.10\%$
test_tdseq_dispatch 60.4830μs 38.5603μs 25.9334 KOps/s 24.4876 KOps/s $\textbf{\color{#35bf28}+5.90\%}$
test_instantiation_functorch 2.5545ms 1.3308ms 751.4108 Ops/s 757.1752 Ops/s $\color{#d91a1a}-0.76\%$
test_instantiation_td 73.1935ms 1.0915ms 916.1823 Ops/s 995.8908 Ops/s $\textbf{\color{#d91a1a}-8.00\%}$
test_exec_functorch 0.2375ms 0.1604ms 6.2348 KOps/s 5.9474 KOps/s $\color{#35bf28}+4.83\%$
test_exec_functional_call 0.2983ms 0.1510ms 6.6215 KOps/s 6.5945 KOps/s $\color{#35bf28}+0.41\%$
test_exec_td 0.2321ms 0.1462ms 6.8394 KOps/s 6.7392 KOps/s $\color{#35bf28}+1.49\%$
test_exec_td_decorator 0.3809ms 0.2221ms 4.5033 KOps/s 4.4504 KOps/s $\color{#35bf28}+1.19\%$
test_vmap_mlp_speed[True-True] 0.7749ms 0.4880ms 2.0490 KOps/s 2.0670 KOps/s $\color{#d91a1a}-0.87\%$
test_vmap_mlp_speed[True-False] 0.7039ms 0.4838ms 2.0670 KOps/s 2.0871 KOps/s $\color{#d91a1a}-0.96\%$
test_vmap_mlp_speed[False-True] 0.5622ms 0.3940ms 2.5378 KOps/s 2.5352 KOps/s $\color{#35bf28}+0.10\%$
test_vmap_mlp_speed[False-False] 0.5975ms 0.3943ms 2.5363 KOps/s 2.5365 KOps/s $-0.01\%$
test_vmap_mlp_speed_decorator[True-True] 0.9593ms 0.5523ms 1.8106 KOps/s 1.8164 KOps/s $\color{#d91a1a}-0.32\%$
test_vmap_mlp_speed_decorator[True-False] 0.9354ms 0.5738ms 1.7428 KOps/s 1.8250 KOps/s $\color{#d91a1a}-4.51\%$
test_vmap_mlp_speed_decorator[False-True] 0.8873ms 0.4589ms 2.1792 KOps/s 2.1961 KOps/s $\color{#d91a1a}-0.77\%$
test_vmap_mlp_speed_decorator[False-False] 0.6901ms 0.4554ms 2.1960 KOps/s 2.2019 KOps/s $\color{#d91a1a}-0.27\%$
test_to_module_speed[True] 1.8689ms 1.7025ms 587.3741 Ops/s 584.5937 Ops/s $\color{#35bf28}+0.48\%$
test_to_module_speed[False] 2.5660ms 1.6859ms 593.1577 Ops/s 601.0722 Ops/s $\color{#d91a1a}-1.32\%$
test_tc_init 52.9490μs 26.5397μs 37.6793 KOps/s 37.2392 KOps/s $\color{#35bf28}+1.18\%$
test_tc_init_nested 0.1120ms 52.2427μs 19.1414 KOps/s 18.7000 KOps/s $\color{#35bf28}+2.36\%$
test_tc_first_layer_tensor 5.1754μs 0.7022μs 1.4241 MOps/s 1.4533 MOps/s $\color{#d91a1a}-2.01\%$
test_tc_first_layer_nontensor 1.9276μs 0.6688μs 1.4953 MOps/s 1.4674 MOps/s $\color{#35bf28}+1.90\%$
test_tc_second_layer_tensor 18.5950μs 1.8091μs 552.7517 KOps/s 539.3521 KOps/s $\color{#35bf28}+2.48\%$
test_tc_second_layer_nontensor 8.9033μs 1.5126μs 661.1289 KOps/s 653.0910 KOps/s $\color{#35bf28}+1.23\%$
test_unbind 86.0813ms 5.8837ms 169.9615 Ops/s 163.0798 Ops/s $\color{#35bf28}+4.22\%$
test_full_like 17.0749ms 11.8955ms 84.0651 Ops/s 82.6947 Ops/s $\color{#35bf28}+1.66\%$
test_zeros_like 6.4823ms 5.7961ms 172.5286 Ops/s 164.5671 Ops/s $\color{#35bf28}+4.84\%$
test_ones_like 13.4612ms 6.2732ms 159.4094 Ops/s 153.3158 Ops/s $\color{#35bf28}+3.97\%$
test_clone 12.8424ms 7.8200ms 127.8776 Ops/s 124.0067 Ops/s $\color{#35bf28}+3.12\%$
test_squeeze 59.1710μs 14.0262μs 71.2950 KOps/s 71.4190 KOps/s $\color{#d91a1a}-0.17\%$
test_unsqueeze 0.1139ms 59.3768μs 16.8416 KOps/s 14.5029 KOps/s $\textbf{\color{#35bf28}+16.13\%}$
test_split 0.2023ms 0.1119ms 8.9378 KOps/s 8.6619 KOps/s $\color{#35bf28}+3.18\%$
test_permute 0.2381ms 0.1262ms 7.9212 KOps/s 7.2841 KOps/s $\textbf{\color{#35bf28}+8.75\%}$
test_stack 30.2448ms 22.7772ms 43.9036 Ops/s 44.7675 Ops/s $\color{#d91a1a}-1.93\%$
test_cat 25.3157ms 22.7583ms 43.9400 Ops/s 44.6924 Ops/s $\color{#d91a1a}-1.68\%$
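
The Change column above appears to be the relative difference between the PR's throughput (Ops) and the throughput on repo HEAD, and Ops appears to be the reciprocal of Mean. The sketch below reproduces a few rows under those assumptions. The function names are hypothetical, and the 5% cutoff for counting a benchmark as "Improved" or "Worsened" is inferred from which rows are bolded in this report, not taken from the bot's source.

```python
# Hypothetical helpers for reading the benchmark table; not the bot's code.

def ops_from_mean(mean_seconds: float) -> float:
    """Throughput in operations per second, assuming Ops = 1 / Mean."""
    return 1.0 / mean_seconds


def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR's throughput relative to repo HEAD."""
    return (ops_pr / ops_head - 1.0) * 100.0


# Assumption: rows are flagged when |change| is at least 5%.
THRESHOLD_PCT = 5.0


def classify(change_pct: float, threshold: float = THRESHOLD_PCT) -> str:
    """Label a row as improved, worsened, or unchanged."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) in KOps/s, copied from the CPU table above.
rows = {
    "test_plain_set_nested": (59.6475, 61.4691),
    "test_values": (870.8308, 818.4541),
    "test_membership_stacked_nested_last": (172.9347, 242.8784),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under this reading, the two `test_membership_stacked_nested*_last` rows are the regressions most worth a second look, since they are far outside the few-percent spread of the other rows.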

github-actions bot commented May 28, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: $\large\color{#35bf28}6$. Worsened: $\large\color{#d91a1a}6$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.5825ms 13.0076μs 76.8780 KOps/s 76.9641 KOps/s $\color{#d91a1a}-0.11\%$
test_plain_set_stack_nested 32.3610μs 13.2734μs 75.3384 KOps/s 76.5092 KOps/s $\color{#d91a1a}-1.53\%$
test_plain_set_nested_inplace 29.4600μs 14.3970μs 69.4590 KOps/s 69.6289 KOps/s $\color{#d91a1a}-0.24\%$
test_plain_set_stack_nested_inplace 38.8910μs 14.5948μs 68.5176 KOps/s 69.2705 KOps/s $\color{#d91a1a}-1.09\%$
test_items 19.4700μs 4.7167μs 212.0122 KOps/s 210.8419 KOps/s $\color{#35bf28}+0.56\%$
test_items_nested 0.3939ms 0.3375ms 2.9629 KOps/s 2.9435 KOps/s $\color{#35bf28}+0.66\%$
test_items_nested_locked 0.4808ms 0.3515ms 2.8453 KOps/s 2.8840 KOps/s $\color{#d91a1a}-1.34\%$
test_items_nested_leaf 0.1110ms 83.6290μs 11.9576 KOps/s 12.0547 KOps/s $\color{#d91a1a}-0.81\%$
test_items_stack_nested 0.3699ms 0.3468ms 2.8833 KOps/s 2.8376 KOps/s $\color{#35bf28}+1.61\%$
test_items_stack_nested_leaf 98.0810μs 82.7369μs 12.0865 KOps/s 11.9675 KOps/s $\color{#35bf28}+0.99\%$
test_items_stack_nested_locked 0.3812ms 0.3392ms 2.9479 KOps/s 2.9274 KOps/s $\color{#35bf28}+0.70\%$
test_keys 20.6110μs 4.3294μs 230.9777 KOps/s 230.5274 KOps/s $\color{#35bf28}+0.20\%$
test_keys_nested 95.2320μs 67.4206μs 14.8323 KOps/s 14.9410 KOps/s $\color{#d91a1a}-0.73\%$
test_keys_nested_locked 2.0116ms 72.6748μs 13.7599 KOps/s 13.8701 KOps/s $\color{#d91a1a}-0.79\%$
test_keys_nested_leaf 78.9910μs 58.1537μs 17.1958 KOps/s 17.3146 KOps/s $\color{#d91a1a}-0.69\%$
test_keys_stack_nested 91.7610μs 66.6144μs 15.0118 KOps/s 14.8760 KOps/s $\color{#35bf28}+0.91\%$
test_keys_stack_nested_leaf 73.1110μs 57.4342μs 17.4112 KOps/s 17.2299 KOps/s $\color{#35bf28}+1.05\%$
test_keys_stack_nested_locked 88.3920μs 71.1999μs 14.0450 KOps/s 13.8859 KOps/s $\color{#35bf28}+1.15\%$
test_values 7.6070μs 1.8332μs 545.4895 KOps/s 553.2704 KOps/s $\color{#d91a1a}-1.41\%$
test_values_nested 53.7510μs 35.3119μs 28.3190 KOps/s 28.2085 KOps/s $\color{#35bf28}+0.39\%$
test_values_nested_locked 61.0010μs 37.5157μs 26.6555 KOps/s 26.8939 KOps/s $\color{#d91a1a}-0.89\%$
test_values_nested_leaf 47.8810μs 31.3986μs 31.8486 KOps/s 31.8063 KOps/s $\color{#35bf28}+0.13\%$
test_values_stack_nested 57.1610μs 36.0311μs 27.7538 KOps/s 27.6346 KOps/s $\color{#35bf28}+0.43\%$
test_values_stack_nested_leaf 48.2500μs 32.1154μs 31.1377 KOps/s 30.6582 KOps/s $\color{#35bf28}+1.56\%$
test_values_stack_nested_locked 61.2610μs 38.0893μs 26.2541 KOps/s 26.3778 KOps/s $\color{#d91a1a}-0.47\%$
test_membership 2.7571μs 0.7253μs 1.3788 MOps/s 1.3618 MOps/s $\color{#35bf28}+1.25\%$
test_membership_nested 20.8700μs 2.6002μs 384.5879 KOps/s 390.7616 KOps/s $\color{#d91a1a}-1.58\%$
test_membership_nested_leaf 21.7300μs 2.6263μs 380.7599 KOps/s 391.4864 KOps/s $\color{#d91a1a}-2.74\%$
test_membership_stacked_nested 15.3800μs 2.6207μs 381.5842 KOps/s 385.0058 KOps/s $\color{#d91a1a}-0.89\%$
test_membership_stacked_nested_leaf 15.0200μs 2.6198μs 381.7069 KOps/s 386.1655 KOps/s $\color{#d91a1a}-1.15\%$
test_membership_nested_last 16.8800μs 3.1338μs 319.0982 KOps/s 321.8878 KOps/s $\color{#d91a1a}-0.87\%$
test_membership_nested_leaf_last 26.5500μs 3.1104μs 321.5062 KOps/s 321.6208 KOps/s $\color{#d91a1a}-0.04\%$
test_membership_stacked_nested_last 30.0110μs 9.8385μs 101.6416 KOps/s 318.6476 KOps/s $\textbf{\color{#d91a1a}-68.10\%}$
test_membership_stacked_nested_leaf_last 22.8300μs 9.8704μs 101.3134 KOps/s 319.7340 KOps/s $\textbf{\color{#d91a1a}-68.31\%}$
test_nested_getleaf 33.9900μs 8.3711μs 119.4581 KOps/s 118.2376 KOps/s $\color{#35bf28}+1.03\%$
test_nested_get 23.0800μs 7.8631μs 127.1764 KOps/s 126.1773 KOps/s $\color{#35bf28}+0.79\%$
test_stacked_getleaf 36.6910μs 8.4137μs 118.8535 KOps/s 118.0965 KOps/s $\color{#35bf28}+0.64\%$
test_stacked_get 30.1510μs 7.9462μs 125.8469 KOps/s 125.3852 KOps/s $\color{#35bf28}+0.37\%$
test_nested_getitemleaf 25.6910μs 8.5305μs 117.2263 KOps/s 116.0903 KOps/s $\color{#35bf28}+0.98\%$
test_nested_getitem 22.6000μs 8.0177μs 124.7242 KOps/s 123.9708 KOps/s $\color{#35bf28}+0.61\%$
test_stacked_getitemleaf 26.8700μs 8.5550μs 116.8901 KOps/s 116.0661 KOps/s $\color{#35bf28}+0.71\%$
test_stacked_getitem 25.4510μs 8.0468μs 124.2731 KOps/s 122.1749 KOps/s $\color{#35bf28}+1.72\%$
test_lock_nested 59.0151ms 0.4160ms 2.4041 KOps/s 2.4094 KOps/s $\color{#d91a1a}-0.22\%$
test_lock_stack_nested 0.3294ms 0.3049ms 3.2801 KOps/s 3.2315 KOps/s $\color{#35bf28}+1.50\%$
test_unlock_nested 60.5371ms 0.4145ms 2.4128 KOps/s 2.7942 KOps/s $\textbf{\color{#d91a1a}-13.65\%}$
test_unlock_stack_nested 0.3389ms 0.3135ms 3.1899 KOps/s 3.1641 KOps/s $\color{#35bf28}+0.81\%$
test_flatten_speed 0.1860ms 0.1033ms 9.6788 KOps/s 9.8646 KOps/s $\color{#d91a1a}-1.88\%$
test_unflatten_speed 0.3334ms 0.2933ms 3.4098 KOps/s 3.4321 KOps/s $\color{#d91a1a}-0.65\%$
test_common_ops 1.0754ms 0.5854ms 1.7082 KOps/s 1.6860 KOps/s $\color{#35bf28}+1.32\%$
test_creation 33.5710μs 1.6420μs 608.9989 KOps/s 605.2293 KOps/s $\color{#35bf28}+0.62\%$
test_creation_empty 24.9110μs 9.0564μs 110.4187 KOps/s 111.1586 KOps/s $\color{#d91a1a}-0.67\%$
test_creation_nested_1 32.1610μs 10.8571μs 92.1053 KOps/s 92.3290 KOps/s $\color{#d91a1a}-0.24\%$
test_creation_nested_2 28.6900μs 12.9346μs 77.3121 KOps/s 76.9277 KOps/s $\color{#35bf28}+0.50\%$
test_clone 85.1710μs 11.5331μs 86.7067 KOps/s 85.2932 KOps/s $\color{#35bf28}+1.66\%$
test_getitem[int] 25.2900μs 10.9613μs 91.2303 KOps/s 89.9237 KOps/s $\color{#35bf28}+1.45\%$
test_getitem[slice_int] 50.1910μs 20.7892μs 48.1020 KOps/s 47.1856 KOps/s $\color{#35bf28}+1.94\%$
test_getitem[range] 76.3110μs 50.1128μs 19.9550 KOps/s 20.9948 KOps/s $\color{#d91a1a}-4.95\%$
test_getitem[tuple] 59.9510μs 18.8582μs 53.0274 KOps/s 52.3171 KOps/s $\color{#35bf28}+1.36\%$
test_getitem[list] 0.1350ms 33.8991μs 29.4993 KOps/s 29.3417 KOps/s $\color{#35bf28}+0.54\%$
test_setitem_dim[int] 50.3510μs 30.1259μs 33.1941 KOps/s 33.1275 KOps/s $\color{#35bf28}+0.20\%$
test_setitem_dim[slice_int] 72.9510μs 51.1044μs 19.5678 KOps/s 19.7113 KOps/s $\color{#d91a1a}-0.73\%$
test_setitem_dim[range] 89.6210μs 68.2906μs 14.6433 KOps/s 14.8306 KOps/s $\color{#d91a1a}-1.26\%$
test_setitem_dim[tuple] 61.8110μs 44.5115μs 22.4661 KOps/s 22.7664 KOps/s $\color{#d91a1a}-1.32\%$
test_setitem 51.9010μs 16.5498μs 60.4235 KOps/s 60.1745 KOps/s $\color{#35bf28}+0.41\%$
test_set 42.3510μs 16.1427μs 61.9477 KOps/s 61.6930 KOps/s $\color{#35bf28}+0.41\%$
test_set_shared 1.5046ms 96.9360μs 10.3161 KOps/s 10.2055 KOps/s $\color{#35bf28}+1.08\%$
test_update 0.1006ms 18.5000μs 54.0540 KOps/s 54.4449 KOps/s $\color{#d91a1a}-0.72\%$
test_update_nested 66.1610μs 24.3275μs 41.1057 KOps/s 42.4524 KOps/s $\color{#d91a1a}-3.17\%$
test_update__nested 48.9610μs 22.8245μs 43.8126 KOps/s 44.2078 KOps/s $\color{#d91a1a}-0.89\%$
test_set_nested 71.5710μs 17.1034μs 58.4680 KOps/s 59.3113 KOps/s $\color{#d91a1a}-1.42\%$
test_set_nested_new 67.4710μs 20.2817μs 49.3056 KOps/s 49.8916 KOps/s $\color{#d91a1a}-1.17\%$
test_select 67.3510μs 31.9560μs 31.2930 KOps/s 30.6364 KOps/s $\color{#35bf28}+2.14\%$
test_select_nested 0.8348ms 54.5683μs 18.3256 KOps/s 18.2831 KOps/s $\color{#35bf28}+0.23\%$
test_exclude_nested 0.1295ms 0.1115ms 8.9690 KOps/s 9.1130 KOps/s $\color{#d91a1a}-1.58\%$
test_empty[True] 0.4216ms 0.3553ms 2.8148 KOps/s 2.8679 KOps/s $\color{#d91a1a}-1.85\%$
test_empty[False] 3.0150μs 0.9183μs 1.0890 MOps/s 1.0859 MOps/s $\color{#35bf28}+0.29\%$
test_to 0.1026ms 78.0336μs 12.8150 KOps/s 13.1616 KOps/s $\color{#d91a1a}-2.63\%$
test_to_nonblocking 91.9510μs 61.6463μs 16.2216 KOps/s 16.8854 KOps/s $\color{#d91a1a}-3.93\%$
test_unbind_speed 1.5608ms 0.2710ms 3.6902 KOps/s 3.7042 KOps/s $\color{#d91a1a}-0.38\%$
test_unbind_speed_stack0 0.2924ms 0.2715ms 3.6829 KOps/s 3.6705 KOps/s $\color{#35bf28}+0.34\%$
test_unbind_speed_stack1 76.0381ms 0.8079ms 1.2377 KOps/s 1.2156 KOps/s $\color{#35bf28}+1.82\%$
test_split 76.7739ms 1.7076ms 585.6132 Ops/s 578.2267 Ops/s $\color{#35bf28}+1.28\%$
test_chunk 76.6379ms 1.7028ms 587.2637 Ops/s 578.1654 Ops/s $\color{#35bf28}+1.57\%$
test_creation[device0] 0.1299ms 57.9454μs 17.2576 KOps/s 17.2107 KOps/s $\color{#35bf28}+0.27\%$
test_creation_from_tensor 0.1308ms 55.0041μs 18.1805 KOps/s 18.3874 KOps/s $\color{#d91a1a}-1.13\%$
test_add_one[memmap_tensor0] 96.1310μs 7.0755μs 141.3334 KOps/s 135.5242 KOps/s $\color{#35bf28}+4.29\%$
test_contiguous[memmap_tensor0] 23.1200μs 0.6621μs 1.5104 MOps/s 1.5174 MOps/s $\color{#d91a1a}-0.46\%$
test_stack[memmap_tensor0] 28.7000μs 4.6646μs 214.3803 KOps/s 209.3450 KOps/s $\color{#35bf28}+2.41\%$
test_memmaptd_index 0.5039ms 0.2882ms 3.4701 KOps/s 3.4758 KOps/s $\color{#d91a1a}-0.16\%$
test_memmaptd_index_astensor 0.6449ms 0.3575ms 2.7970 KOps/s 2.7638 KOps/s $\color{#35bf28}+1.20\%$
test_memmaptd_index_op 1.0959ms 0.6611ms 1.5127 KOps/s 1.4946 KOps/s $\color{#35bf28}+1.21\%$
test_serialize_model 0.1871s 0.1122s 8.9122 Ops/s 8.6300 Ops/s $\color{#35bf28}+3.27\%$
test_serialize_model_pickle 1.3659s 1.2376s 0.8080 Ops/s 0.8062 Ops/s $\color{#35bf28}+0.22\%$
test_serialize_weights 0.1833s 0.1094s 9.1388 Ops/s 8.7925 Ops/s $\color{#35bf28}+3.94\%$
test_serialize_weights_returnearly 0.2397s 97.3281ms 10.2745 Ops/s 10.1937 Ops/s $\color{#35bf28}+0.79\%$
test_serialize_weights_pickle 1.3554s 1.2485s 0.8010 Ops/s 0.8010 Ops/s $-0.00\%$
test_reshape_pytree 51.4620μs 26.0867μs 38.3337 KOps/s 38.0298 KOps/s $\color{#35bf28}+0.80\%$
test_reshape_td 77.3710μs 30.8264μs 32.4398 KOps/s 32.0154 KOps/s $\color{#35bf28}+1.33\%$
test_view_pytree 57.3410μs 25.8712μs 38.6530 KOps/s 38.5554 KOps/s $\color{#35bf28}+0.25\%$
test_view_td 62.6810μs 35.9471μs 27.8186 KOps/s 27.8752 KOps/s $\color{#d91a1a}-0.20\%$
test_unbind_pytree 54.7510μs 32.9106μs 30.3853 KOps/s 29.9151 KOps/s $\color{#35bf28}+1.57\%$
test_unbind_td 0.4235ms 41.8938μs 23.8699 KOps/s 23.7673 KOps/s $\color{#35bf28}+0.43\%$
test_split_pytree 63.8810μs 34.6467μs 28.8628 KOps/s 28.2941 KOps/s $\color{#35bf28}+2.01\%$
test_split_td 0.1060ms 39.3370μs 25.4213 KOps/s 25.2176 KOps/s $\color{#35bf28}+0.81\%$
test_add_pytree 65.2110μs 38.2303μs 26.1572 KOps/s 25.7390 KOps/s $\color{#35bf28}+1.62\%$
test_add_td 72.9310μs 49.8878μs 20.0450 KOps/s 19.5118 KOps/s $\color{#35bf28}+2.73\%$
test_distributed 0.1966ms 65.7800μs 15.2022 KOps/s 11.6034 KOps/s $\textbf{\color{#35bf28}+31.02\%}$
test_tdmodule 84.0220μs 15.0381μs 66.4976 KOps/s 66.6849 KOps/s $\color{#d91a1a}-0.28\%$
test_tdmodule_dispatch 46.1410μs 29.5498μs 33.8411 KOps/s 34.1477 KOps/s $\color{#d91a1a}-0.90\%$
test_tdseq 30.9700μs 16.9521μs 58.9896 KOps/s 59.2318 KOps/s $\color{#d91a1a}-0.41\%$
test_tdseq_dispatch 47.0400μs 32.0111μs 31.2391 KOps/s 30.5462 KOps/s $\color{#35bf28}+2.27\%$
test_instantiation_functorch 1.9520ms 1.5370ms 650.6052 Ops/s 640.9854 Ops/s $\color{#35bf28}+1.50\%$
test_instantiation_td 81.2437ms 1.1437ms 874.3616 Ops/s 938.9383 Ops/s $\textbf{\color{#d91a1a}-6.88\%}$
test_exec_functorch 0.1842ms 0.1501ms 6.6621 KOps/s 6.4728 KOps/s $\color{#35bf28}+2.92\%$
test_exec_functional_call 0.2509ms 0.1417ms 7.0585 KOps/s 6.9059 KOps/s $\color{#35bf28}+2.21\%$
test_exec_td 0.1747ms 0.1399ms 7.1497 KOps/s 6.9850 KOps/s $\color{#35bf28}+2.36\%$
test_exec_td_decorator 0.6479ms 0.2123ms 4.7099 KOps/s 4.6614 KOps/s $\color{#35bf28}+1.04\%$
test_vmap_mlp_speed[True-True] 0.7497ms 0.6035ms 1.6570 KOps/s 1.6582 KOps/s $\color{#d91a1a}-0.07\%$
test_vmap_mlp_speed[True-False] 0.6519ms 0.5997ms 1.6675 KOps/s 1.6665 KOps/s $\color{#35bf28}+0.06\%$
test_vmap_mlp_speed[False-True] 0.5956ms 0.5307ms 1.8844 KOps/s 1.8867 KOps/s $\color{#d91a1a}-0.12\%$
test_vmap_mlp_speed[False-False] 0.6025ms 0.5303ms 1.8859 KOps/s 1.8781 KOps/s $\color{#35bf28}+0.41\%$
test_vmap_mlp_speed_decorator[True-True] 1.3533ms 0.6707ms 1.4909 KOps/s 1.5144 KOps/s $\color{#d91a1a}-1.55\%$
test_vmap_mlp_speed_decorator[True-False] 0.7825ms 0.6646ms 1.5046 KOps/s 1.5136 KOps/s $\color{#d91a1a}-0.59\%$
test_vmap_mlp_speed_decorator[False-True] 0.9060ms 0.6189ms 1.6158 KOps/s 1.7079 KOps/s $\textbf{\color{#d91a1a}-5.39\%}$
test_vmap_mlp_speed_decorator[False-False] 0.8155ms 0.6053ms 1.6522 KOps/s 1.7138 KOps/s $\color{#d91a1a}-3.59\%$
test_vmap_transformer_speed[True-True] 8.6987ms 8.2470ms 121.2557 Ops/s 123.7995 Ops/s $\color{#d91a1a}-2.05\%$
test_vmap_transformer_speed[True-False] 8.8854ms 8.2474ms 121.2507 Ops/s 123.4044 Ops/s $\color{#d91a1a}-1.75\%$
test_vmap_transformer_speed[False-True] 8.4698ms 8.1590ms 122.5640 Ops/s 125.0272 Ops/s $\color{#d91a1a}-1.97\%$
test_vmap_transformer_speed[False-False] 8.9095ms 8.1614ms 122.5276 Ops/s 125.0610 Ops/s $\color{#d91a1a}-2.03\%$
test_vmap_transformer_speed_decorator[True-True] 20.6840ms 19.9415ms 50.1467 Ops/s 50.9500 Ops/s $\color{#d91a1a}-1.58\%$
test_vmap_transformer_speed_decorator[True-False] 20.4687ms 19.9899ms 50.0254 Ops/s 50.8977 Ops/s $\color{#d91a1a}-1.71\%$
test_vmap_transformer_speed_decorator[False-True] 20.4224ms 19.8740ms 50.3171 Ops/s 51.1700 Ops/s $\color{#d91a1a}-1.67\%$
test_vmap_transformer_speed_decorator[False-False] 20.6796ms 19.7421ms 50.6531 Ops/s 51.2869 Ops/s $\color{#d91a1a}-1.24\%$
test_to_module_speed[True] 1.6451ms 1.5344ms 651.7013 Ops/s 653.9910 Ops/s $\color{#d91a1a}-0.35\%$
test_to_module_speed[False] 1.6093ms 1.5136ms 660.6983 Ops/s 661.2643 Ops/s $\color{#d91a1a}-0.09\%$
test_tc_init 0.1315ms 25.0505μs 39.9194 KOps/s 39.7503 KOps/s $\color{#35bf28}+0.43\%$
test_tc_init_nested 73.2810μs 49.3279μs 20.2725 KOps/s 18.8358 KOps/s $\textbf{\color{#35bf28}+7.63\%}$
test_tc_first_layer_tensor 10.6600μs 0.4879μs 2.0495 MOps/s 2.7766 MOps/s $\textbf{\color{#d91a1a}-26.19\%}$
test_tc_first_layer_nontensor 2.0746μs 0.3959μs 2.5258 MOps/s 2.5653 MOps/s $\color{#d91a1a}-1.54\%$
test_tc_second_layer_tensor 7.2542μs 0.9765μs 1.0241 MOps/s 1.0263 MOps/s $\color{#d91a1a}-0.22\%$
test_tc_second_layer_nontensor 5.4400μs 0.8316μs 1.2025 MOps/s 1.2244 MOps/s $\color{#d91a1a}-1.78\%$
test_unbind 95.5886ms 6.6991ms 149.2733 Ops/s 128.6951 Ops/s $\textbf{\color{#35bf28}+15.99\%}$
test_full_like 11.4492ms 11.1531ms 89.6608 Ops/s 73.6863 Ops/s $\textbf{\color{#35bf28}+21.68\%}$
test_zeros_like 8.0081ms 7.8298ms 127.7173 Ops/s 126.1463 Ops/s $\color{#35bf28}+1.25\%$
test_ones_like 8.8425ms 7.8393ms 127.5624 Ops/s 126.4298 Ops/s $\color{#35bf28}+0.90\%$
test_clone 9.7730ms 9.4861ms 105.4171 Ops/s 106.2923 Ops/s $\color{#d91a1a}-0.82\%$
test_squeeze 60.4710μs 11.0128μs 90.8035 KOps/s 88.7153 KOps/s $\color{#35bf28}+2.35\%$
test_unsqueeze 0.1078ms 52.5374μs 19.0341 KOps/s 15.7438 KOps/s $\textbf{\color{#35bf28}+20.90\%}$
test_split 0.1446ms 0.1017ms 9.8303 KOps/s 9.7228 KOps/s $\color{#35bf28}+1.11\%$
test_permute 0.1453ms 0.1092ms 9.1570 KOps/s 7.9762 KOps/s $\textbf{\color{#35bf28}+14.80\%}$
test_stack 27.7603ms 27.4134ms 36.4785 Ops/s 36.5782 Ops/s $\color{#d91a1a}-0.27\%$
test_cat 27.9966ms 27.3929ms 36.5058 Ops/s 36.6549 Ops/s $\color{#d91a1a}-0.41\%$

@vmoens added the enhancement label May 28, 2024
@vmoens merged commit f5a3f95 into main May 30, 2024
@vmoens deleted the cleanup-tensorclass-methods branch May 30, 2024 11:19