@vmoens (Collaborator) commented Jul 19, 2024

No description provided.

@facebook-github-bot added the CLA Signed label Jul 19, 2024
@vmoens merged commit 40be565 into main Jul 19, 2024
@github-actions

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}26$. Worsened: $\large\color{#d91a1a}13$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 45.5250μs 21.9617μs 45.5338 KOps/s 44.0560 KOps/s $\color{#35bf28}+3.35\%$
test_plain_set_stack_nested 63.4780μs 22.1794μs 45.0869 KOps/s 43.9914 KOps/s $\color{#35bf28}+2.49\%$
test_plain_set_nested_inplace 89.7970μs 23.9012μs 41.8389 KOps/s 40.3770 KOps/s $\color{#35bf28}+3.62\%$
test_plain_set_stack_nested_inplace 76.5330μs 24.1039μs 41.4871 KOps/s 41.0286 KOps/s $\color{#35bf28}+1.12\%$
test_items 24.0540μs 2.5795μs 387.6664 KOps/s 357.7499 KOps/s $\textbf{\color{#35bf28}+8.36\%}$
test_items_nested 0.7392ms 0.3611ms 2.7695 KOps/s 2.7502 KOps/s $\color{#35bf28}+0.70\%$
test_items_nested_locked 2.1839ms 0.3632ms 2.7534 KOps/s 2.7660 KOps/s $\color{#d91a1a}-0.45\%$
test_items_nested_leaf 0.3255ms 89.1020μs 11.2231 KOps/s 11.6143 KOps/s $\color{#d91a1a}-3.37\%$
test_items_stack_nested 0.5639ms 0.3645ms 2.7434 KOps/s 2.7288 KOps/s $\color{#35bf28}+0.54\%$
test_items_stack_nested_leaf 0.1630ms 85.4703μs 11.7000 KOps/s 11.5941 KOps/s $\color{#35bf28}+0.91\%$
test_items_stack_nested_locked 0.7707ms 0.3696ms 2.7054 KOps/s 2.7677 KOps/s $\color{#d91a1a}-2.25\%$
test_keys 47.6290μs 3.8536μs 259.4956 KOps/s 244.1452 KOps/s $\textbf{\color{#35bf28}+6.29\%}$
test_keys_nested 0.3137ms 0.1430ms 6.9935 KOps/s 6.8945 KOps/s $\color{#35bf28}+1.44\%$
test_keys_nested_locked 1.9063ms 0.1537ms 6.5081 KOps/s 6.6518 KOps/s $\color{#d91a1a}-2.16\%$
test_keys_nested_leaf 0.2186ms 0.1237ms 8.0815 KOps/s 8.0074 KOps/s $\color{#35bf28}+0.93\%$
test_keys_stack_nested 0.2408ms 0.1447ms 6.9100 KOps/s 6.8815 KOps/s $\color{#35bf28}+0.42\%$
test_keys_stack_nested_leaf 0.2161ms 0.1238ms 8.0749 KOps/s 8.0556 KOps/s $\color{#35bf28}+0.24\%$
test_keys_stack_nested_locked 0.2599ms 0.1493ms 6.6962 KOps/s 6.6383 KOps/s $\color{#35bf28}+0.87\%$
test_values 10.3143μs 1.1639μs 859.1836 KOps/s 861.2822 KOps/s $\color{#d91a1a}-0.24\%$
test_values_nested 91.4700μs 48.9993μs 20.4085 KOps/s 20.0089 KOps/s $\color{#35bf28}+2.00\%$
test_values_nested_locked 94.2850μs 48.7434μs 20.5156 KOps/s 19.9248 KOps/s $\color{#35bf28}+2.97\%$
test_values_nested_leaf 96.3670μs 44.7104μs 22.3662 KOps/s 22.3119 KOps/s $\color{#35bf28}+0.24\%$
test_values_stack_nested 0.1002ms 50.6312μs 19.7507 KOps/s 19.9275 KOps/s $\color{#d91a1a}-0.89\%$
test_values_stack_nested_leaf 96.4800μs 44.2706μs 22.5884 KOps/s 22.3222 KOps/s $\color{#35bf28}+1.19\%$
test_values_stack_nested_locked 0.1024ms 51.1807μs 19.5386 KOps/s 20.0695 KOps/s $\color{#d91a1a}-2.65\%$
test_membership 2.8383μs 0.7063μs 1.4159 MOps/s 1.1023 MOps/s $\textbf{\color{#35bf28}+28.45\%}$
test_membership_nested 27.9420μs 2.6925μs 371.3982 KOps/s 372.0381 KOps/s $\color{#d91a1a}-0.17\%$
test_membership_nested_leaf 29.2140μs 2.7090μs 369.1374 KOps/s 326.7643 KOps/s $\textbf{\color{#35bf28}+12.97\%}$
test_membership_stacked_nested 28.4840μs 2.6800μs 373.1388 KOps/s 369.4733 KOps/s $\color{#35bf28}+0.99\%$
test_membership_stacked_nested_leaf 30.5770μs 2.7093μs 369.0963 KOps/s 369.4705 KOps/s $\color{#d91a1a}-0.10\%$
test_membership_nested_last 41.9380μs 3.9699μs 251.8933 KOps/s 250.1299 KOps/s $\color{#35bf28}+0.71\%$
test_membership_nested_leaf_last 24.2750μs 4.0002μs 249.9870 KOps/s 250.9570 KOps/s $\color{#d91a1a}-0.39\%$
test_membership_stacked_nested_last 58.9900μs 12.9737μs 77.0792 KOps/s 252.3239 KOps/s $\textbf{\color{#d91a1a}-69.45\%}$
test_membership_stacked_nested_leaf_last 43.1510μs 13.1713μs 75.9226 KOps/s 248.8684 KOps/s $\textbf{\color{#d91a1a}-69.49\%}$
test_nested_getleaf 57.0260μs 11.0861μs 90.2031 KOps/s 95.1283 KOps/s $\textbf{\color{#d91a1a}-5.18\%}$
test_nested_get 51.8170μs 10.6157μs 94.1997 KOps/s 99.3060 KOps/s $\textbf{\color{#d91a1a}-5.14\%}$
test_stacked_getleaf 50.0640μs 10.9301μs 91.4908 KOps/s 94.0257 KOps/s $\color{#d91a1a}-2.70\%$
test_stacked_get 33.6420μs 10.4294μs 95.8824 KOps/s 99.0112 KOps/s $\color{#d91a1a}-3.16\%$
test_nested_getitemleaf 53.8700μs 11.3772μs 87.8949 KOps/s 90.0638 KOps/s $\color{#d91a1a}-2.41\%$
test_nested_getitem 60.3820μs 10.5287μs 94.9789 KOps/s 97.7026 KOps/s $\color{#d91a1a}-2.79\%$
test_stacked_getitemleaf 40.4650μs 11.2858μs 88.6073 KOps/s 89.9988 KOps/s $\color{#d91a1a}-1.55\%$
test_stacked_getitem 56.8350μs 10.5348μs 94.9237 KOps/s 97.4208 KOps/s $\color{#d91a1a}-2.56\%$
test_lock_nested 3.0446ms 0.5262ms 1.9006 KOps/s 1.6648 KOps/s $\textbf{\color{#35bf28}+14.16\%}$
test_lock_stack_nested 0.6106ms 0.4677ms 2.1381 KOps/s 2.0306 KOps/s $\textbf{\color{#35bf28}+5.29\%}$
test_unlock_nested 0.9455ms 0.4449ms 2.2478 KOps/s 1.9542 KOps/s $\textbf{\color{#35bf28}+15.03\%}$
test_unlock_stack_nested 0.5492ms 0.3819ms 2.6186 KOps/s 2.4484 KOps/s $\textbf{\color{#35bf28}+6.95\%}$
test_flatten_speed 0.2396ms 0.1078ms 9.2736 KOps/s 9.3572 KOps/s $\color{#d91a1a}-0.89\%$
test_unflatten_speed 0.6077ms 0.4529ms 2.2078 KOps/s 2.2503 KOps/s $\color{#d91a1a}-1.89\%$
test_common_ops 5.3983ms 1.1482ms 870.9199 Ops/s 850.5103 Ops/s $\color{#35bf28}+2.40\%$
test_creation 26.3890μs 2.5970μs 385.0580 KOps/s 399.5637 KOps/s $\color{#d91a1a}-3.63\%$
test_creation_empty 65.9130μs 18.9621μs 52.7368 KOps/s 48.3839 KOps/s $\textbf{\color{#35bf28}+9.00\%}$
test_creation_nested_1 50.9250μs 22.1614μs 45.1235 KOps/s 42.3411 KOps/s $\textbf{\color{#35bf28}+6.57\%}$
test_creation_nested_2 90.2990μs 26.4394μs 37.8224 KOps/s 35.5785 KOps/s $\textbf{\color{#35bf28}+6.31\%}$
test_clone 87.3230μs 17.8868μs 55.9072 KOps/s 55.9399 KOps/s $\color{#d91a1a}-0.06\%$
test_getitem[int] 0.9763ms 13.3332μs 75.0008 KOps/s 76.8919 KOps/s $\color{#d91a1a}-2.46\%$
test_getitem[slice_int] 0.1541ms 34.0167μs 29.3973 KOps/s 30.5000 KOps/s $\color{#d91a1a}-3.62\%$
test_getitem[range] 0.4018ms 57.1003μs 17.5131 KOps/s 17.5687 KOps/s $\color{#d91a1a}-0.32\%$
test_getitem[tuple] 0.1322ms 27.2979μs 36.6328 KOps/s 36.5675 KOps/s $\color{#35bf28}+0.18\%$
test_getitem[list] 0.2565ms 52.0509μs 19.2120 KOps/s 19.3084 KOps/s $\color{#d91a1a}-0.50\%$
test_setitem_dim[int] 57.7170μs 32.7733μs 30.5126 KOps/s 26.6784 KOps/s $\textbf{\color{#35bf28}+14.37\%}$
test_setitem_dim[slice_int] 0.1685ms 70.7185μs 14.1406 KOps/s 13.1765 KOps/s $\textbf{\color{#35bf28}+7.32\%}$
test_setitem_dim[range] 0.1389ms 92.2327μs 10.8421 KOps/s 10.1312 KOps/s $\textbf{\color{#35bf28}+7.02\%}$
test_setitem_dim[tuple] 0.1077ms 58.9015μs 16.9775 KOps/s 15.6625 KOps/s $\textbf{\color{#35bf28}+8.40\%}$
test_setitem 0.1111ms 30.8819μs 32.3814 KOps/s 31.3893 KOps/s $\color{#35bf28}+3.16\%$
test_set 0.1247ms 30.0619μs 33.2647 KOps/s 31.5649 KOps/s $\textbf{\color{#35bf28}+5.39\%}$
test_set_shared 3.8803ms 0.2219ms 4.5075 KOps/s 4.4708 KOps/s $\color{#35bf28}+0.82\%$
test_update 0.1658ms 36.6309μs 27.2994 KOps/s 25.5093 KOps/s $\textbf{\color{#35bf28}+7.02\%}$
test_update_nested 0.2261ms 47.4724μs 21.0649 KOps/s 19.9409 KOps/s $\textbf{\color{#35bf28}+5.64\%}$
test_update__nested 0.1950ms 34.8115μs 28.7262 KOps/s 28.3219 KOps/s $\color{#35bf28}+1.43\%$
test_set_nested 0.1649ms 32.2656μs 30.9928 KOps/s 30.0308 KOps/s $\color{#35bf28}+3.20\%$
test_set_nested_new 0.1400ms 37.0331μs 27.0029 KOps/s 25.7790 KOps/s $\color{#35bf28}+4.75\%$
test_select 0.1462ms 54.8544μs 18.2301 KOps/s 18.1048 KOps/s $\color{#35bf28}+0.69\%$
test_select_nested 0.1283ms 60.5448μs 16.5167 KOps/s 16.6768 KOps/s $\color{#d91a1a}-0.96\%$
test_exclude_nested 0.1628ms 81.6546μs 12.2467 KOps/s 12.4947 KOps/s $\color{#d91a1a}-1.98\%$
test_empty[True] 0.4698ms 0.3475ms 2.8775 KOps/s 2.9709 KOps/s $\color{#d91a1a}-3.14\%$
test_empty[False] 11.7895μs 1.2930μs 773.3822 KOps/s 819.6408 KOps/s $\textbf{\color{#d91a1a}-5.64\%}$
test_unbind_speed 0.7840ms 0.3350ms 2.9849 KOps/s 2.9951 KOps/s $\color{#d91a1a}-0.34\%$
test_unbind_speed_stack0 0.5580ms 0.3086ms 3.2409 KOps/s 3.0356 KOps/s $\textbf{\color{#35bf28}+6.76\%}$
test_unbind_speed_stack1 84.0227ms 0.8102ms 1.2342 KOps/s 1.3879 KOps/s $\textbf{\color{#d91a1a}-11.07\%}$
test_split 84.2526ms 2.2327ms 447.8861 Ops/s 425.0238 Ops/s $\textbf{\color{#35bf28}+5.38\%}$
test_chunk 83.9164ms 2.2321ms 448.0180 Ops/s 420.9829 Ops/s $\textbf{\color{#35bf28}+6.42\%}$
test_creation[device0] 0.2793ms 0.1227ms 8.1528 KOps/s 8.0588 KOps/s $\color{#35bf28}+1.17\%$
test_creation_from_tensor 3.3868ms 0.1229ms 8.1370 KOps/s 8.1320 KOps/s $\color{#35bf28}+0.06\%$
test_add_one[memmap_tensor0] 0.1872ms 7.6793μs 130.2203 KOps/s 120.5144 KOps/s $\textbf{\color{#35bf28}+8.05\%}$
test_contiguous[memmap_tensor0] 25.9180μs 2.2112μs 452.2454 KOps/s 450.3832 KOps/s $\color{#35bf28}+0.41\%$
test_stack[memmap_tensor0] 72.2750μs 6.0014μs 166.6291 KOps/s 164.9375 KOps/s $\color{#35bf28}+1.03\%$
test_memmaptd_index 1.1382ms 0.4386ms 2.2802 KOps/s 2.2862 KOps/s $\color{#d91a1a}-0.26\%$
test_memmaptd_index_astensor 0.7801ms 0.5166ms 1.9357 KOps/s 1.9206 KOps/s $\color{#35bf28}+0.79\%$
test_memmaptd_index_op 1.4163ms 1.0637ms 940.1244 Ops/s 900.7616 Ops/s $\color{#35bf28}+4.37\%$
test_serialize_model 0.1352s 0.1283s 7.7947 Ops/s 7.0631 Ops/s $\textbf{\color{#35bf28}+10.36\%}$
test_serialize_model_pickle 0.4465s 0.3941s 2.5377 Ops/s 2.4965 Ops/s $\color{#35bf28}+1.65\%$
test_serialize_weights 0.1457s 0.1292s 7.7397 Ops/s 7.8621 Ops/s $\color{#d91a1a}-1.56\%$
test_serialize_weights_returnearly 0.1858s 0.1664s 6.0096 Ops/s 6.0906 Ops/s $\color{#d91a1a}-1.33\%$
test_serialize_weights_pickle 1.0404s 0.7489s 1.3352 Ops/s 2.4615 Ops/s $\textbf{\color{#d91a1a}-45.76\%}$
test_serialize_weights_filesystem 0.1477s 0.1448s 6.9084 Ops/s 6.9730 Ops/s $\color{#d91a1a}-0.93\%$
test_serialize_model_filesystem 0.1552s 0.1490s 6.7123 Ops/s 6.7188 Ops/s $\color{#d91a1a}-0.10\%$
test_reshape_pytree 89.7170μs 39.4581μs 25.3434 KOps/s 25.4025 KOps/s $\color{#d91a1a}-0.23\%$
test_reshape_td 0.1352ms 49.5370μs 20.1869 KOps/s 19.4473 KOps/s $\color{#35bf28}+3.80\%$
test_view_pytree 95.6790μs 39.2206μs 25.4968 KOps/s 25.6273 KOps/s $\color{#d91a1a}-0.51\%$
test_view_td 0.1187ms 55.1220μs 18.1416 KOps/s 17.2698 KOps/s $\textbf{\color{#35bf28}+5.05\%}$
test_unbind_pytree 84.5980μs 35.7377μs 27.9816 KOps/s 28.1659 KOps/s $\color{#d91a1a}-0.65\%$
test_unbind_td 0.3589ms 48.6826μs 20.5412 KOps/s 17.8744 KOps/s $\textbf{\color{#35bf28}+14.92\%}$
test_split_pytree 0.1077ms 39.1108μs 25.5684 KOps/s 26.4003 KOps/s $\color{#d91a1a}-3.15\%$
test_split_td 0.5476ms 60.5027μs 16.5282 KOps/s 16.1651 KOps/s $\color{#35bf28}+2.25\%$
test_add_pytree 0.1217ms 44.0748μs 22.6887 KOps/s 22.4682 KOps/s $\color{#35bf28}+0.98\%$
test_add_td 0.1734ms 82.1569μs 12.1718 KOps/s 11.2424 KOps/s $\textbf{\color{#35bf28}+8.27\%}$
test_distributed 0.2937ms 0.1318ms 7.5861 KOps/s 7.5281 KOps/s $\color{#35bf28}+0.77\%$
test_tdmodule 40.7560μs 17.5237μs 57.0654 KOps/s 57.2777 KOps/s $\color{#d91a1a}-0.37\%$
test_tdmodule_dispatch 78.8770μs 37.2370μs 26.8550 KOps/s 26.9176 KOps/s $\color{#d91a1a}-0.23\%$
test_tdseq 41.9380μs 19.6569μs 50.8726 KOps/s 52.0305 KOps/s $\color{#d91a1a}-2.23\%$
test_tdseq_dispatch 85.8200μs 43.0589μs 23.2240 KOps/s 24.0971 KOps/s $\color{#d91a1a}-3.62\%$
test_instantiation_functorch 2.9127ms 1.5996ms 625.1481 Ops/s 637.1188 Ops/s $\color{#d91a1a}-1.88\%$
test_instantiation_td 1.8868ms 1.1770ms 849.5863 Ops/s 882.3800 Ops/s $\color{#d91a1a}-3.72\%$
test_exec_functorch 0.3198ms 0.1801ms 5.5526 KOps/s 5.4188 KOps/s $\color{#35bf28}+2.47\%$
test_exec_functional_call 0.2930ms 0.1693ms 5.9068 KOps/s 5.7416 KOps/s $\color{#35bf28}+2.88\%$
test_exec_td 0.3253ms 0.1722ms 5.8081 KOps/s 5.8716 KOps/s $\color{#d91a1a}-1.08\%$
test_exec_td_decorator 83.8653ms 0.3218ms 3.1079 KOps/s 3.9247 KOps/s $\textbf{\color{#d91a1a}-20.81\%}$
test_vmap_mlp_speed[True-True] 0.8064ms 0.6111ms 1.6365 KOps/s 1.6191 KOps/s $\color{#35bf28}+1.07\%$
test_vmap_mlp_speed[True-False] 1.3898ms 0.6191ms 1.6154 KOps/s 1.6322 KOps/s $\color{#d91a1a}-1.03\%$
test_vmap_mlp_speed[False-True] 0.7495ms 0.4962ms 2.0152 KOps/s 1.9881 KOps/s $\color{#35bf28}+1.36\%$
test_vmap_mlp_speed[False-False] 0.6995ms 0.4954ms 2.0185 KOps/s 1.9735 KOps/s $\color{#35bf28}+2.28\%$
test_vmap_mlp_speed_decorator[True-True] 1.2148ms 0.6980ms 1.4327 KOps/s 1.4159 KOps/s $\color{#35bf28}+1.18\%$
test_vmap_mlp_speed_decorator[True-False] 1.2134ms 0.7026ms 1.4233 KOps/s 1.4189 KOps/s $\color{#35bf28}+0.31\%$
test_vmap_mlp_speed_decorator[False-True] 0.8101ms 0.5771ms 1.7328 KOps/s 1.7194 KOps/s $\color{#35bf28}+0.78\%$
test_vmap_mlp_speed_decorator[False-False] 0.7292ms 0.5776ms 1.7314 KOps/s 1.7054 KOps/s $\color{#35bf28}+1.52\%$
test_to_module_speed[True] 2.8508ms 1.8597ms 537.7103 Ops/s 558.4528 Ops/s $\color{#d91a1a}-3.71\%$
test_to_module_speed[False] 4.1727ms 1.8502ms 540.4939 Ops/s 565.2006 Ops/s $\color{#d91a1a}-4.37\%$
test_tc_init 0.1001ms 43.9544μs 22.7508 KOps/s 22.0774 KOps/s $\color{#35bf28}+3.05\%$
test_tc_init_nested 0.1874ms 87.2828μs 11.4570 KOps/s 11.1972 KOps/s $\color{#35bf28}+2.32\%$
test_tc_first_layer_tensor 51.5160μs 9.4453μs 105.8731 KOps/s 109.4267 KOps/s $\color{#d91a1a}-3.25\%$
test_tc_first_layer_nontensor 40.6260μs 9.2622μs 107.9654 KOps/s 108.8915 KOps/s $\color{#d91a1a}-0.85\%$
test_tc_second_layer_tensor 78.9380μs 2.9365μs 340.5458 KOps/s 356.0622 KOps/s $\color{#d91a1a}-4.36\%$
test_tc_second_layer_nontensor 35.0760μs 10.5991μs 94.3479 KOps/s 98.3132 KOps/s $\color{#d91a1a}-4.03\%$
test_unbind 0.1090s 13.9113ms 71.8840 Ops/s 69.7159 Ops/s $\color{#35bf28}+3.11\%$
test_full_like 10.4053ms 8.4895ms 117.7920 Ops/s 141.7561 Ops/s $\textbf{\color{#d91a1a}-16.91\%}$
test_zeros_like 11.7537ms 7.0446ms 141.9529 Ops/s 142.5975 Ops/s $\color{#d91a1a}-0.45\%$
test_ones_like 16.4042ms 8.1388ms 122.8679 Ops/s 133.5150 Ops/s $\textbf{\color{#d91a1a}-7.97\%}$
test_clone 18.6706ms 10.0841ms 99.1663 Ops/s 109.6168 Ops/s $\textbf{\color{#d91a1a}-9.53\%}$
test_squeeze 71.2320μs 15.1322μs 66.0841 KOps/s 69.0604 KOps/s $\color{#d91a1a}-4.31\%$
test_unsqueeze 0.1650ms 97.9576μs 10.2085 KOps/s 10.1546 KOps/s $\color{#35bf28}+0.53\%$
test_split 0.4583ms 0.2108ms 4.7432 KOps/s 4.7645 KOps/s $\color{#d91a1a}-0.45\%$
test_permute 0.3733ms 0.2241ms 4.4618 KOps/s 4.3284 KOps/s $\color{#35bf28}+3.08\%$
test_stack 33.9953ms 26.9606ms 37.0912 Ops/s 40.4623 Ops/s $\textbf{\color{#d91a1a}-8.33\%}$
test_cat 38.1198ms 28.0177ms 35.6917 Ops/s 40.1871 Ops/s $\textbf{\color{#d91a1a}-11.19\%}$
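
The Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column. In the CPU table above, the bold entries (26 positive, 13 negative) match the Improved and Worsened counts in the summary, and every bold entry has a magnitude of at least 5%. A minimal sketch of that arithmetic, assuming this reading is right; the function names and the 5% cut-off are illustrative guesses, not taken from the benchmark workflow:

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of this PR relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% cut-off is an assumption inferred from the table."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "below threshold"


# (Ops, Ops on Repo HEAD) in ops/s, copied from three CPU rows above.
samples = {
    "test_membership": (1.4159e6, 1.1023e6),
    "test_membership_stacked_nested_last": (77.0792e3, 252.3239e3),
    "test_set_nested_new": (27.0029e3, 25.7790e3),
}

for name, (ops_pr, ops_head) in samples.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Run on these three rows, the sketch reproduces the +28.45%, -69.45% and +4.75% reported in the table; only the first two cross the assumed threshold.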

@github-actions

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 219. Improved: $\large\color{#35bf28}25$. Worsened: $\large\color{#d91a1a}14$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 27.4200μs 16.6275μs 60.1413 KOps/s 55.7803 KOps/s $\textbf{\color{#35bf28}+7.82\%}$
test_plain_set_stack_nested 34.6610μs 16.2696μs 61.4642 KOps/s 56.1647 KOps/s $\textbf{\color{#35bf28}+9.44\%}$
test_plain_set_nested_inplace 37.9720μs 17.4744μs 57.2267 KOps/s 52.2553 KOps/s $\textbf{\color{#35bf28}+9.51\%}$
test_plain_set_stack_nested_inplace 45.0410μs 17.2813μs 57.8659 KOps/s 52.3854 KOps/s $\textbf{\color{#35bf28}+10.46\%}$
test_items 23.4900μs 4.7139μs 212.1371 KOps/s 211.2163 KOps/s $\color{#35bf28}+0.44\%$
test_items_nested 0.4542ms 0.3949ms 2.5322 KOps/s 2.5693 KOps/s $\color{#d91a1a}-1.44\%$
test_items_nested_locked 0.4581ms 0.4004ms 2.4976 KOps/s 2.5179 KOps/s $\color{#d91a1a}-0.81\%$
test_items_nested_leaf 0.1276ms 87.2761μs 11.4579 KOps/s 11.4943 KOps/s $\color{#d91a1a}-0.32\%$
test_items_stack_nested 0.4529ms 0.3944ms 2.5355 KOps/s 2.5452 KOps/s $\color{#d91a1a}-0.38\%$
test_items_stack_nested_leaf 0.1203ms 88.4886μs 11.3009 KOps/s 11.2733 KOps/s $\color{#35bf28}+0.24\%$
test_items_stack_nested_locked 0.4573ms 0.4020ms 2.4874 KOps/s 2.5462 KOps/s $\color{#d91a1a}-2.31\%$
test_keys 21.9100μs 4.4491μs 224.7628 KOps/s 226.7229 KOps/s $\color{#d91a1a}-0.86\%$
test_keys_nested 83.7120μs 67.0394μs 14.9166 KOps/s 14.7062 KOps/s $\color{#35bf28}+1.43\%$
test_keys_nested_locked 1.8916ms 75.0286μs 13.3283 KOps/s 13.4313 KOps/s $\color{#d91a1a}-0.77\%$
test_keys_nested_leaf 77.3810μs 57.5560μs 17.3744 KOps/s 17.5021 KOps/s $\color{#d91a1a}-0.73\%$
test_keys_stack_nested 84.0520μs 67.4978μs 14.8153 KOps/s 15.0656 KOps/s $\color{#d91a1a}-1.66\%$
test_keys_stack_nested_leaf 75.3110μs 57.9153μs 17.2666 KOps/s 17.0262 KOps/s $\color{#35bf28}+1.41\%$
test_keys_stack_nested_locked 93.4210μs 74.6499μs 13.3959 KOps/s 13.4594 KOps/s $\color{#d91a1a}-0.47\%$
test_values 9.5907μs 1.8011μs 555.2276 KOps/s 555.0669 KOps/s $\color{#35bf28}+0.03\%$
test_values_nested 97.7720μs 34.4806μs 29.0018 KOps/s 29.5414 KOps/s $\color{#d91a1a}-1.83\%$
test_values_nested_locked 51.0410μs 35.8828μs 27.8685 KOps/s 27.8277 KOps/s $\color{#35bf28}+0.15\%$
test_values_nested_leaf 50.7110μs 30.3097μs 32.9927 KOps/s 33.0660 KOps/s $\color{#d91a1a}-0.22\%$
test_values_stack_nested 58.1210μs 35.0536μs 28.5277 KOps/s 28.7017 KOps/s $\color{#d91a1a}-0.61\%$
test_values_stack_nested_leaf 46.9910μs 31.5750μs 31.6706 KOps/s 32.8461 KOps/s $\color{#d91a1a}-3.58\%$
test_values_stack_nested_locked 66.4110μs 37.3864μs 26.7477 KOps/s 27.6401 KOps/s $\color{#d91a1a}-3.23\%$
test_membership 2.1212μs 0.5777μs 1.7309 MOps/s 1.7825 MOps/s $\color{#d91a1a}-2.90\%$
test_membership_nested 17.9100μs 2.0491μs 488.0199 KOps/s 486.1990 KOps/s $\color{#35bf28}+0.37\%$
test_membership_nested_leaf 13.1505μs 2.0007μs 499.8188 KOps/s 513.0497 KOps/s $\color{#d91a1a}-2.58\%$
test_membership_stacked_nested 22.7900μs 2.0996μs 476.2768 KOps/s 473.4756 KOps/s $\color{#35bf28}+0.59\%$
test_membership_stacked_nested_leaf 15.2700μs 2.0881μs 478.9039 KOps/s 479.9968 KOps/s $\color{#d91a1a}-0.23\%$
test_membership_nested_last 21.8500μs 3.0348μs 329.5138 KOps/s 335.7276 KOps/s $\color{#d91a1a}-1.85\%$
test_membership_nested_leaf_last 15.0500μs 2.9993μs 333.4155 KOps/s 333.7747 KOps/s $\color{#d91a1a}-0.11\%$
test_membership_stacked_nested_last 25.3300μs 3.8235μs 261.5373 KOps/s 326.2389 KOps/s $\textbf{\color{#d91a1a}-19.83\%}$
test_membership_stacked_nested_leaf_last 15.1700μs 3.7916μs 263.7442 KOps/s 331.4646 KOps/s $\textbf{\color{#d91a1a}-20.43\%}$
test_nested_getleaf 25.8500μs 8.1136μs 123.2495 KOps/s 124.1011 KOps/s $\color{#d91a1a}-0.69\%$
test_nested_get 28.8100μs 7.5317μs 132.7729 KOps/s 132.6260 KOps/s $\color{#35bf28}+0.11\%$
test_stacked_getleaf 25.6900μs 8.0805μs 123.7546 KOps/s 123.7242 KOps/s $\color{#35bf28}+0.02\%$
test_stacked_get 23.5000μs 7.5628μs 132.2268 KOps/s 131.9494 KOps/s $\color{#35bf28}+0.21\%$
test_nested_getitemleaf 22.2410μs 8.3263μs 120.1012 KOps/s 121.8553 KOps/s $\color{#d91a1a}-1.44\%$
test_nested_getitem 24.0600μs 7.6826μs 130.1643 KOps/s 129.3548 KOps/s $\color{#35bf28}+0.63\%$
test_stacked_getitemleaf 24.1310μs 8.1735μs 122.3466 KOps/s 121.2356 KOps/s $\color{#35bf28}+0.92\%$
test_stacked_getitem 24.9000μs 7.6947μs 129.9590 KOps/s 128.4970 KOps/s $\color{#35bf28}+1.14\%$
test_lock_nested 4.3851ms 0.4808ms 2.0800 KOps/s 2.1162 KOps/s $\color{#d91a1a}-1.71\%$
test_lock_stack_nested 0.5020ms 0.4363ms 2.2917 KOps/s 2.3101 KOps/s $\color{#d91a1a}-0.80\%$
test_unlock_nested 0.8696ms 0.4014ms 2.4913 KOps/s 2.5255 KOps/s $\color{#d91a1a}-1.35\%$
test_unlock_stack_nested 0.4254ms 0.3539ms 2.8253 KOps/s 2.8522 KOps/s $\color{#d91a1a}-0.95\%$
test_flatten_speed 0.2139ms 0.1085ms 9.2139 KOps/s 9.3483 KOps/s $\color{#d91a1a}-1.44\%$
test_unflatten_speed 0.3890ms 0.3027ms 3.3033 KOps/s 3.3528 KOps/s $\color{#d91a1a}-1.48\%$
test_common_ops 1.7608ms 1.3041ms 766.8076 Ops/s 755.4838 Ops/s $\color{#35bf28}+1.50\%$
test_creation 16.2310μs 2.0390μs 490.4280 KOps/s 500.3419 KOps/s $\color{#d91a1a}-1.98\%$
test_creation_empty 34.2510μs 15.9307μs 62.7720 KOps/s 52.4190 KOps/s $\textbf{\color{#35bf28}+19.75\%}$
test_creation_nested_1 38.2400μs 17.7328μs 56.3927 KOps/s 48.2570 KOps/s $\textbf{\color{#35bf28}+16.86\%}$
test_creation_nested_2 38.1710μs 20.8432μs 47.9773 KOps/s 42.5916 KOps/s $\textbf{\color{#35bf28}+12.64\%}$
test_clone 53.7310μs 32.3456μs 30.9161 KOps/s 34.1344 KOps/s $\textbf{\color{#d91a1a}-9.43\%}$
test_getitem[int] 1.1253ms 17.1136μs 58.4330 KOps/s 60.4198 KOps/s $\color{#d91a1a}-3.29\%$
test_getitem[slice_int] 0.1488ms 28.5179μs 35.0656 KOps/s 34.8834 KOps/s $\color{#35bf28}+0.52\%$
test_getitem[range] 0.2962ms 0.1139ms 8.7823 KOps/s 8.7827 KOps/s $-0.00\%$
test_getitem[tuple] 0.1531ms 25.5525μs 39.1352 KOps/s 39.1228 KOps/s $\color{#35bf28}+0.03\%$
test_getitem[list] 0.2259ms 0.1026ms 9.7501 KOps/s 9.5003 KOps/s $\color{#35bf28}+2.63\%$
test_setitem_dim[int] 74.2120μs 51.6020μs 19.3791 KOps/s 17.8856 KOps/s $\textbf{\color{#35bf28}+8.35\%}$
test_setitem_dim[slice_int] 0.1273ms 81.9282μs 12.2058 KOps/s 12.8242 KOps/s $\color{#d91a1a}-4.82\%$
test_setitem_dim[range] 0.1727ms 0.1451ms 6.8935 KOps/s 7.0253 KOps/s $\color{#d91a1a}-1.88\%$
test_setitem_dim[tuple] 90.5820μs 73.8534μs 13.5403 KOps/s 13.5004 KOps/s $\color{#35bf28}+0.30\%$
test_setitem 72.2510μs 46.8522μs 21.3437 KOps/s 22.8240 KOps/s $\textbf{\color{#d91a1a}-6.49\%}$
test_set 68.3710μs 45.5662μs 21.9461 KOps/s 23.1938 KOps/s $\textbf{\color{#d91a1a}-5.38\%}$
test_set_shared 0.3999ms 56.0141μs 17.8527 KOps/s 18.6479 KOps/s $\color{#d91a1a}-4.26\%$
test_update 83.2010μs 50.4002μs 19.8412 KOps/s 18.9368 KOps/s $\color{#35bf28}+4.78\%$
test_update_nested 95.6010μs 61.0634μs 16.3764 KOps/s 15.6692 KOps/s $\color{#35bf28}+4.51\%$
test_update__nested 0.1022ms 65.7648μs 15.2057 KOps/s 16.6525 KOps/s $\textbf{\color{#d91a1a}-8.69\%}$
test_set_nested 82.2510μs 48.5883μs 20.5811 KOps/s 21.7755 KOps/s $\textbf{\color{#d91a1a}-5.49\%}$
test_set_nested_new 0.5492ms 52.0438μs 19.2146 KOps/s 19.3044 KOps/s $\color{#d91a1a}-0.47\%$
test_select 0.1079ms 68.2853μs 14.6444 KOps/s 15.3077 KOps/s $\color{#d91a1a}-4.33\%$
test_select_nested 79.6520μs 54.0718μs 18.4939 KOps/s 18.2731 KOps/s $\color{#35bf28}+1.21\%$
test_exclude_nested 0.1060ms 72.3312μs 13.8253 KOps/s 14.0302 KOps/s $\color{#d91a1a}-1.46\%$
test_empty[True] 0.3722ms 0.3030ms 3.3000 KOps/s 3.3181 KOps/s $\color{#d91a1a}-0.54\%$
test_empty[False] 2.5312μs 0.9924μs 1.0077 MOps/s 1.0478 MOps/s $\color{#d91a1a}-3.82\%$
test_to 64.8310μs 38.7991μs 25.7738 KOps/s 26.5540 KOps/s $\color{#d91a1a}-2.94\%$
test_to_nonblocking 48.5810μs 23.7870μs 42.0397 KOps/s 42.3945 KOps/s $\color{#d91a1a}-0.84\%$
test_unbind_speed 0.3826ms 0.3264ms 3.0640 KOps/s 3.3060 KOps/s $\textbf{\color{#d91a1a}-7.32\%}$
test_unbind_speed_stack0 0.3973ms 0.3164ms 3.1607 KOps/s 3.3078 KOps/s $\color{#d91a1a}-4.45\%$
test_unbind_speed_stack1 95.2872ms 0.8771ms 1.1402 KOps/s 1.2525 KOps/s $\textbf{\color{#d91a1a}-8.97\%}$
test_split 2.4179ms 2.0861ms 479.3662 Ops/s 430.8863 Ops/s $\textbf{\color{#35bf28}+11.25\%}$
test_chunk 95.1120ms 2.5063ms 398.9897 Ops/s 429.7260 Ops/s $\textbf{\color{#d91a1a}-7.15\%}$
test_creation[device0] 0.1423ms 0.1030ms 9.7088 KOps/s 9.8276 KOps/s $\color{#d91a1a}-1.21\%$
test_creation_from_tensor 0.1555ms 0.1006ms 9.9413 KOps/s 10.0686 KOps/s $\color{#d91a1a}-1.26\%$
test_add_one[memmap_tensor0] 20.9110μs 8.8569μs 112.9064 KOps/s 115.6682 KOps/s $\color{#d91a1a}-2.39\%$
test_contiguous[memmap_tensor0] 23.6810μs 2.1782μs 459.0896 KOps/s 473.0169 KOps/s $\color{#d91a1a}-2.94\%$
test_stack[memmap_tensor0] 32.5510μs 6.7225μs 148.7537 KOps/s 153.9780 KOps/s $\color{#d91a1a}-3.39\%$
test_memmaptd_index 1.0577ms 0.4180ms 2.3923 KOps/s 2.4417 KOps/s $\color{#d91a1a}-2.02\%$
test_memmaptd_index_astensor 0.9072ms 0.4929ms 2.0290 KOps/s 2.0789 KOps/s $\color{#d91a1a}-2.40\%$
test_memmaptd_index_op 1.4730ms 1.0034ms 996.5980 Ops/s 960.0169 Ops/s $\color{#35bf28}+3.81\%$
test_serialize_model 99.9471ms 95.2779ms 10.4956 Ops/s 10.2492 Ops/s $\color{#35bf28}+2.40\%$
test_serialize_model_pickle 1.3529s 1.2384s 0.8075 Ops/s 0.8059 Ops/s $\color{#35bf28}+0.21\%$
test_serialize_weights 0.1884s 0.1033s 9.6783 Ops/s 10.2960 Ops/s $\textbf{\color{#d91a1a}-6.00\%}$
test_serialize_weights_returnearly 75.3639ms 70.1493ms 14.2553 Ops/s 11.5553 Ops/s $\textbf{\color{#35bf28}+23.37\%}$
test_serialize_weights_pickle 1.4034s 1.2454s 0.8030 Ops/s 0.8085 Ops/s $\color{#d91a1a}-0.68\%$
test_reshape_pytree 64.8910μs 38.2897μs 26.1167 KOps/s 26.2489 KOps/s $\color{#d91a1a}-0.50\%$
test_reshape_td 0.2570ms 43.7454μs 22.8596 KOps/s 22.5761 KOps/s $\color{#35bf28}+1.26\%$
test_view_pytree 60.6910μs 37.7317μs 26.5029 KOps/s 26.4404 KOps/s $\color{#35bf28}+0.24\%$
test_view_td 0.2525ms 53.7250μs 18.6133 KOps/s 19.7699 KOps/s $\textbf{\color{#d91a1a}-5.85\%}$
test_unbind_pytree 68.3010μs 37.0955μs 26.9574 KOps/s 27.6617 KOps/s $\color{#d91a1a}-2.55\%$
test_unbind_td 91.6892ms 52.8417μs 18.9245 KOps/s 21.7569 KOps/s $\textbf{\color{#d91a1a}-13.02\%}$
test_split_pytree 78.9810μs 50.6513μs 19.7428 KOps/s 19.6698 KOps/s $\color{#35bf28}+0.37\%$
test_split_td 0.2648ms 58.9889μs 16.9523 KOps/s 16.6808 KOps/s $\color{#35bf28}+1.63\%$
test_add_pytree 95.3930μs 58.0741μs 17.2194 KOps/s 17.4374 KOps/s $\color{#d91a1a}-1.25\%$
test_add_td 0.3776ms 90.2655μs 11.0784 KOps/s 10.5932 KOps/s $\color{#35bf28}+4.58\%$
test_compile_add_one_nested[tensordict-compile] 0.4127ms 0.2110ms 4.7383 KOps/s 4.9021 KOps/s $\color{#d91a1a}-3.34\%$
test_compile_add_one_nested[tensordict-eager] 0.2740ms 0.1738ms 5.7552 KOps/s 5.7834 KOps/s $\color{#d91a1a}-0.49\%$
test_compile_add_one_nested[pytree-compile] 0.2130ms 0.1430ms 6.9919 KOps/s 7.0930 KOps/s $\color{#d91a1a}-1.43\%$
test_compile_add_one_nested[pytree-eager] 0.2486ms 0.1899ms 5.2671 KOps/s 5.2768 KOps/s $\color{#d91a1a}-0.18\%$
test_compile_copy_nested[tensordict-compile] 42.2100μs 21.6576μs 46.1732 KOps/s 44.6901 KOps/s $\color{#35bf28}+3.32\%$
test_compile_copy_nested[tensordict-eager] 82.3410μs 49.9581μs 20.0168 KOps/s 20.7224 KOps/s $\color{#d91a1a}-3.41\%$
test_compile_copy_nested[pytree-compile] 0.1207ms 74.3966μs 13.4415 KOps/s 13.6327 KOps/s $\color{#d91a1a}-1.40\%$
test_compile_copy_nested[pytree-eager] 92.1710μs 61.1767μs 16.3461 KOps/s 16.5005 KOps/s $\color{#d91a1a}-0.94\%$
test_compile_add_one_flat[tensordict-compile] 0.4201ms 0.3308ms 3.0226 KOps/s 3.1329 KOps/s $\color{#d91a1a}-3.52\%$
test_compile_add_one_flat[tensordict-eager] 0.3150ms 0.2255ms 4.4337 KOps/s 4.4753 KOps/s $\color{#d91a1a}-0.93\%$
test_compile_add_one_flat[tensorclass-compile] 0.1811ms 0.1313ms 7.6137 KOps/s 7.9805 KOps/s $\color{#d91a1a}-4.60\%$
test_compile_add_one_flat[tensorclass-eager] 0.1301ms 63.2368μs 15.8136 KOps/s 15.9363 KOps/s $\color{#d91a1a}-0.77\%$
test_compile_add_one_flat[pytree-compile] 0.3934ms 0.3271ms 3.0567 KOps/s 3.1549 KOps/s $\color{#d91a1a}-3.11\%$
test_compile_add_one_flat[pytree-eager] 0.6748ms 0.6060ms 1.6500 KOps/s 1.6164 KOps/s $\color{#35bf28}+2.08\%$
test_compile_add_self_flat[tensordict-eager] 0.3360ms 0.2753ms 3.6323 KOps/s 3.6553 KOps/s $\color{#d91a1a}-0.63\%$
test_compile_add_self_flat[tensordict-compile] 0.3949ms 0.3301ms 3.0292 KOps/s 3.1406 KOps/s $\color{#d91a1a}-3.55\%$
test_compile_add_self_flat[tensorclass-eager] 0.1755ms 76.4164μs 13.0862 KOps/s 12.9380 KOps/s $\color{#35bf28}+1.15\%$
test_compile_add_self_flat[tensorclass-compile] 0.1883ms 0.1315ms 7.6020 KOps/s 7.9416 KOps/s $\color{#d91a1a}-4.28\%$
test_compile_add_self_flat[pytree-eager] 0.5906ms 0.5272ms 1.8969 KOps/s 1.7707 KOps/s $\textbf{\color{#35bf28}+7.13\%}$
test_compile_add_self_flat[pytree-compile] 0.3908ms 0.3218ms 3.1072 KOps/s 3.1230 KOps/s $\color{#d91a1a}-0.51\%$
test_compile_copy_flat[tensordict-compile] 44.4810μs 18.8745μs 52.9815 KOps/s 55.4555 KOps/s $\color{#d91a1a}-4.46\%$
test_compile_copy_flat[tensordict-eager] 59.8610μs 32.7118μs 30.5700 KOps/s 30.9447 KOps/s $\color{#d91a1a}-1.21\%$
test_compile_copy_flat[pytree-compile] 0.1088ms 76.5091μs 13.0703 KOps/s 13.3911 KOps/s $\color{#d91a1a}-2.40\%$
test_compile_copy_flat[pytree-eager] 84.7210μs 61.2903μs 16.3158 KOps/s 16.4438 KOps/s $\color{#d91a1a}-0.78\%$
test_compile_assign_and_add[tensordict-compile] 2.7824ms 0.9872ms 1.0130 KOps/s 1.0374 KOps/s $\color{#d91a1a}-2.35\%$
test_compile_assign_and_add[tensordict-eager] 3.6383ms 3.2648ms 306.3016 Ops/s 307.2478 Ops/s $\color{#d91a1a}-0.31\%$
test_compile_assign_and_add[pytree-compile] 2.5631ms 0.9253ms 1.0807 KOps/s 1.0787 KOps/s $\color{#35bf28}+0.19\%$
test_compile_assign_and_add[pytree-eager] 3.3847ms 3.1636ms 316.0940 Ops/s 319.4569 Ops/s $\color{#d91a1a}-1.05\%$
test_compile_indexing[tensor-tensordict-compile] 0.1879ms 0.1096ms 9.1261 KOps/s 9.4073 KOps/s $\color{#d91a1a}-2.99\%$
test_compile_indexing[tensor-tensordict-eager] 0.2232ms 63.9160μs 15.6455 KOps/s 15.3719 KOps/s $\color{#35bf28}+1.78\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1717ms 0.1058ms 9.4494 KOps/s 10.1503 KOps/s $\textbf{\color{#d91a1a}-6.91\%}$
test_compile_indexing[tensor-tensorclass-eager] 98.9010μs 43.9990μs 22.7278 KOps/s 21.8880 KOps/s $\color{#35bf28}+3.84\%$
test_compile_indexing[tensor-pytree-compile] 0.1720ms 0.1036ms 9.6538 KOps/s 9.7274 KOps/s $\color{#d91a1a}-0.76\%$
test_compile_indexing[tensor-pytree-eager] 79.8110μs 43.9812μs 22.7370 KOps/s 21.0350 KOps/s $\textbf{\color{#35bf28}+8.09\%}$
test_compile_indexing[slice-tensordict-compile] 0.1908ms 0.1387ms 7.2107 KOps/s 7.3777 KOps/s $\color{#d91a1a}-2.26\%$
test_compile_indexing[slice-tensordict-eager] 0.1862ms 26.3717μs 37.9194 KOps/s 39.1969 KOps/s $\color{#d91a1a}-3.26\%$
test_compile_indexing[slice-tensorclass-compile] 0.1794ms 0.1288ms 7.7666 KOps/s 7.9562 KOps/s $\color{#d91a1a}-2.38\%$
test_compile_indexing[slice-tensorclass-eager] 64.5710μs 22.6478μs 44.1544 KOps/s 44.7776 KOps/s $\color{#d91a1a}-1.39\%$
test_compile_indexing[slice-pytree-compile] 0.2104ms 0.1291ms 7.7462 KOps/s 7.9163 KOps/s $\color{#d91a1a}-2.15\%$
test_compile_indexing[slice-pytree-eager] 50.2810μs 22.5431μs 44.3595 KOps/s 41.9962 KOps/s $\textbf{\color{#35bf28}+5.63\%}$
test_compile_indexing[int-tensordict-compile] 0.1972ms 0.1371ms 7.2923 KOps/s 7.4710 KOps/s $\color{#d91a1a}-2.39\%$
test_compile_indexing[int-tensordict-eager] 0.5220ms 25.4578μs 39.2806 KOps/s 39.6523 KOps/s $\color{#d91a1a}-0.94\%$
test_compile_indexing[int-tensorclass-compile] 0.1809ms 0.1290ms 7.7526 KOps/s 7.9787 KOps/s $\color{#d91a1a}-2.83\%$
test_compile_indexing[int-tensorclass-eager] 55.5710μs 22.5234μs 44.3983 KOps/s 45.5605 KOps/s $\color{#d91a1a}-2.55\%$
test_compile_indexing[int-pytree-compile] 0.1965ms 0.1281ms 7.8060 KOps/s 7.9830 KOps/s $\color{#d91a1a}-2.22\%$
test_compile_indexing[int-pytree-eager] 53.7800μs 22.1152μs 45.2178 KOps/s 45.6170 KOps/s $\color{#d91a1a}-0.88\%$
test_mod_add[eager] 98.4710μs 36.4800μs 27.4123 KOps/s 24.6742 KOps/s $\textbf{\color{#35bf28}+11.10\%}$
test_mod_add[compile] 0.1238ms 68.5807μs 14.5814 KOps/s 14.9464 KOps/s $\color{#d91a1a}-2.44\%$
test_mod_add[compile-overhead] 0.2582ms 0.1460ms 6.8512 KOps/s 6.9089 KOps/s $\color{#d91a1a}-0.84\%$
test_mod_wrap[eager] 0.3487ms 0.2444ms 4.0918 KOps/s 3.8263 KOps/s $\textbf{\color{#35bf28}+6.94\%}$
test_mod_wrap[compile] 0.3540ms 0.2923ms 3.4207 KOps/s 3.2672 KOps/s $\color{#35bf28}+4.70\%$
test_mod_wrap[compile-overhead] 8.4125ms 4.4096ms 226.7754 Ops/s 229.3415 Ops/s $\color{#d91a1a}-1.12\%$
test_mod_wrap_and_backward[eager] 1.5140ms 1.4022ms 713.1805 Ops/s 743.1192 Ops/s $\color{#d91a1a}-4.03\%$
test_mod_wrap_and_backward[compile] 1.6520ms 1.4435ms 692.7837 Ops/s 690.1405 Ops/s $\color{#35bf28}+0.38\%$
test_mod_wrap_and_backward[compile-overhead] 1.4780ms 0.9926ms 1.0075 KOps/s 1.0104 KOps/s $\color{#d91a1a}-0.29\%$
test_seq_add[eager] 0.2204ms 0.1069ms 9.3524 KOps/s 8.9882 KOps/s $\color{#35bf28}+4.05\%$
test_seq_add[compile] 0.1353ms 84.5005μs 11.8342 KOps/s 11.9647 KOps/s $\color{#d91a1a}-1.09\%$
test_seq_add[compile-overhead] 0.1853ms 0.1208ms 8.2752 KOps/s 8.1858 KOps/s $\color{#35bf28}+1.09\%$
test_seq_wrap[eager] 0.5097ms 0.4146ms 2.4123 KOps/s 2.3130 KOps/s $\color{#35bf28}+4.29\%$
test_seq_wrap[compile] 1.5376ms 0.3265ms 3.0628 KOps/s 3.0228 KOps/s $\color{#35bf28}+1.32\%$
test_seq_wrap[compile-overhead] 0.3100s 0.1483s 6.7447 Ops/s 6.7216 Ops/s $\color{#35bf28}+0.34\%$
test_func_call_runtime[False-eager] 0.8236ms 0.7351ms 1.3603 KOps/s 1.4055 KOps/s $\color{#d91a1a}-3.22\%$
test_func_call_runtime[False-compile] 0.8681ms 0.8063ms 1.2403 KOps/s 1.2316 KOps/s $\color{#35bf28}+0.70\%$
test_func_call_runtime[False-compile-overhead] 0.4156ms 0.3674ms 2.7221 KOps/s 2.7123 KOps/s $\color{#35bf28}+0.36\%$
test_func_call_runtime[True-eager] 1.0569ms 0.9681ms 1.0330 KOps/s 1.0298 KOps/s $\color{#35bf28}+0.31\%$
test_func_call_runtime[True-compile] 0.9457ms 0.8422ms 1.1873 KOps/s 1.1777 KOps/s $\color{#35bf28}+0.82\%$
test_func_call_runtime[True-compile-overhead] 0.4689ms 0.4079ms 2.4516 KOps/s 2.4160 KOps/s $\color{#35bf28}+1.48\%$
test_distributed 0.2630ms 70.4561μs 14.1932 KOps/s 13.7128 KOps/s $\color{#35bf28}+3.50\%$
test_tdmodule 83.0410μs 15.0949μs 66.2473 KOps/s 54.6456 KOps/s $\textbf{\color{#35bf28}+21.23\%}$
test_tdmodule_dispatch 50.7410μs 30.8341μs 32.4316 KOps/s 27.1476 KOps/s $\textbf{\color{#35bf28}+19.46\%}$
test_tdseq 31.8400μs 15.7014μs 63.6886 KOps/s 56.0165 KOps/s $\textbf{\color{#35bf28}+13.70\%}$
test_tdseq_dispatch 50.3010μs 33.0111μs 30.2928 KOps/s 25.6177 KOps/s $\textbf{\color{#35bf28}+18.25\%}$
test_instantiation_functorch 2.0684ms 1.9752ms 506.2675 Ops/s 506.5423 Ops/s $\color{#d91a1a}-0.05\%$
test_instantiation_td 2.0188ms 1.3179ms 758.7708 Ops/s 769.3446 Ops/s $\color{#d91a1a}-1.37\%$
test_exec_functorch 0.2772ms 0.2202ms 4.5422 KOps/s 4.5484 KOps/s $\color{#d91a1a}-0.14\%$
test_exec_functional_call 0.2605ms 0.2087ms 4.7908 KOps/s 4.4861 KOps/s $\textbf{\color{#35bf28}+6.79\%}$
test_exec_td 0.2913ms 0.2089ms 4.7862 KOps/s 4.6087 KOps/s $\color{#35bf28}+3.85\%$
test_exec_td_decorator 0.4960ms 0.2842ms 3.5182 KOps/s 3.3840 KOps/s $\color{#35bf28}+3.96\%$
test_vmap_mlp_speed[True-True] 0.7758ms 0.6420ms 1.5577 KOps/s 1.4706 KOps/s $\textbf{\color{#35bf28}+5.93\%}$
test_vmap_mlp_speed[True-False] 0.7259ms 0.6395ms 1.5636 KOps/s 1.4864 KOps/s $\textbf{\color{#35bf28}+5.20\%}$
test_vmap_mlp_speed[False-True] 0.6426ms 0.5613ms 1.7817 KOps/s 1.7324 KOps/s $\color{#35bf28}+2.85\%$
test_vmap_mlp_speed[False-False] 0.6448ms 0.5860ms 1.7065 KOps/s 1.6827 KOps/s $\color{#35bf28}+1.42\%$
test_vmap_mlp_speed_decorator[True-True] 1.5539ms 0.7235ms 1.3821 KOps/s 1.3546 KOps/s $\color{#35bf28}+2.03\%$
test_vmap_mlp_speed_decorator[True-False] 0.9000ms 0.7194ms 1.3900 KOps/s 1.3641 KOps/s $\color{#35bf28}+1.89\%$
test_vmap_mlp_speed_decorator[False-True] 0.7820ms 0.6278ms 1.5929 KOps/s 1.5480 KOps/s $\color{#35bf28}+2.90\%$
test_vmap_mlp_speed_decorator[False-False] 0.7484ms 0.6281ms 1.5921 KOps/s 1.5765 KOps/s $\color{#35bf28}+0.99\%$
test_vmap_transformer_speed[True-True] 8.6877ms 8.4784ms 117.9464 Ops/s 116.7469 Ops/s $\color{#35bf28}+1.03\%$
test_vmap_transformer_speed[True-False] 8.7365ms 8.4429ms 118.4426 Ops/s 116.6183 Ops/s $\color{#35bf28}+1.56\%$
test_vmap_transformer_speed[False-True] 8.5760ms 8.3169ms 120.2375 Ops/s 117.5465 Ops/s $\color{#35bf28}+2.29\%$
test_vmap_transformer_speed[False-False] 8.7319ms 8.3458ms 119.8213 Ops/s 118.4971 Ops/s $\color{#35bf28}+1.12\%$
test_vmap_transformer_speed_decorator[True-True] 20.7067ms 20.3157ms 49.2230 Ops/s 48.5215 Ops/s $\color{#35bf28}+1.45\%$
test_vmap_transformer_speed_decorator[True-False] 21.1312ms 20.1782ms 49.5585 Ops/s 48.7671 Ops/s $\color{#35bf28}+1.62\%$
test_vmap_transformer_speed_decorator[False-True] 20.1909ms 19.9431ms 50.1427 Ops/s 49.8228 Ops/s $\color{#35bf28}+0.64\%$
test_vmap_transformer_speed_decorator[False-False] 20.8096ms 19.9837ms 50.0407 Ops/s 49.0351 Ops/s $\color{#35bf28}+2.05\%$
test_to_module_speed[True] 2.7672ms 1.5216ms 657.2084 Ops/s 658.7015 Ops/s $\color{#d91a1a}-0.23\%$
test_to_module_speed[False] 2.0350ms 1.5052ms 664.3767 Ops/s 663.5525 Ops/s $\color{#35bf28}+0.12\%$
test_tc_init 61.4110μs 35.9707μs 27.8004 KOps/s 25.9630 KOps/s $\textbf{\color{#35bf28}+7.08\%}$
test_tc_init_nested 94.8810μs 71.8255μs 13.9226 KOps/s 12.3680 KOps/s $\textbf{\color{#35bf28}+12.57\%}$
test_tc_first_layer_tensor 19.1510μs 3.9884μs 250.7253 KOps/s 247.6769 KOps/s $\color{#35bf28}+1.23\%$
test_tc_first_layer_nontensor 27.7500μs 4.0210μs 248.6914 KOps/s 246.2047 KOps/s $\color{#35bf28}+1.01\%$
test_tc_second_layer_tensor 33.7380μs 1.3011μs 768.5742 KOps/s 760.1912 KOps/s $\color{#35bf28}+1.10\%$
test_tc_second_layer_nontensor 21.5910μs 4.6742μs 213.9399 KOps/s 215.8284 KOps/s $\color{#d91a1a}-0.87\%$
test_unbind 0.3253s 12.9979ms 76.9356 Ops/s 71.7861 Ops/s $\textbf{\color{#35bf28}+7.17\%}$
test_full_like 0.6535ms 0.5784ms 1.7288 KOps/s 1.7311 KOps/s $\color{#d91a1a}-0.14\%$
test_zeros_like 0.2596ms 0.1979ms 5.0543 KOps/s 5.0614 KOps/s $\color{#d91a1a}-0.14\%$
test_ones_like 0.2257ms 0.1976ms 5.0597 KOps/s 5.0651 KOps/s $\color{#d91a1a}-0.11\%$
test_clone 0.4467ms 0.4144ms 2.4132 KOps/s 2.4204 KOps/s $\color{#d91a1a}-0.30\%$
test_squeeze 27.5810μs 11.7168μs 85.3477 KOps/s 83.4446 KOps/s $\color{#35bf28}+2.28\%$
test_unsqueeze 0.2495ms 86.0552μs 11.6205 KOps/s 11.7908 KOps/s $\color{#d91a1a}-1.44\%$
test_split 0.4329ms 0.1791ms 5.5835 KOps/s 5.5562 KOps/s $\color{#35bf28}+0.49\%$
test_permute 0.2561ms 0.1911ms 5.2339 KOps/s 5.1307 KOps/s $\color{#35bf28}+2.01\%$
test_stack 1.2633ms 0.9275ms 1.0782 KOps/s 1.0961 KOps/s $\color{#d91a1a}-1.63\%$
test_cat 1.2568ms 1.2314ms 812.0556 Ops/s 812.3090 Ops/s $\color{#d91a1a}-0.03\%$
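
For readers checking the table by hand, the Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column. The sketch below recomputes it for three rows copied from the table. The helper name `relative_change` is hypothetical and is not part of the benchmark tooling. The formula is an assumption inferred from the reported numbers, and both values in a row must be in the same unit (KOps/s or Ops/s).

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of `ops` relative to `ops_head` (assumed formula)."""
    return (ops - ops_head) / ops_head * 100.0


# (Ops, Ops on Repo HEAD) pairs copied from the table above; units match per row.
rows = {
    "test_mod_add[eager]": (27.4123, 24.6742),                # KOps/s
    "test_tdmodule": (66.2473, 54.6456),                      # KOps/s
    "test_mod_wrap_and_backward[eager]": (713.1805, 743.1192),  # Ops/s
}

for name, (ops, ops_head) in rows.items():
    print(f"{name}: {relative_change(ops, ops_head):+.2f}%")
```

With these inputs the recomputed values agree with the reported +11.10%, +21.23%, and -4.03%.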

@vmoens vmoens deleted the fix_expand_to_match_shape branch October 21, 2024 14:01
3 participants