@vmoens
Collaborator

@vmoens vmoens commented Jul 31, 2024

No description provided.

Vincent Moens added 2 commits July 31, 2024 18:28
@facebook-github-bot added the CLA Signed label Jul 31, 2024
@vmoens added the enhancement label Jul 31, 2024
@github-actions

github-actions bot commented Jul 31, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 219. Improved: $\large\color{#35bf28}8$. Worsened: $\large\color{#d91a1a}17$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 58.0590μs 21.0843μs 47.4287 KOps/s 47.9913 KOps/s $\color{#d91a1a}-1.17\%$
test_plain_set_stack_nested 88.6730μs 21.3331μs 46.8755 KOps/s 48.0756 KOps/s $\color{#d91a1a}-2.50\%$
test_plain_set_nested_inplace 79.4080μs 23.2443μs 43.0214 KOps/s 44.6614 KOps/s $\color{#d91a1a}-3.67\%$
test_plain_set_stack_nested_inplace 62.0360μs 23.1760μs 43.1481 KOps/s 44.4705 KOps/s $\color{#d91a1a}-2.97\%$
test_items 43.4110μs 2.6461μs 377.9214 KOps/s 364.4548 KOps/s $\color{#35bf28}+3.70\%$
test_items_nested 1.5717ms 0.3385ms 2.9546 KOps/s 2.9464 KOps/s $\color{#35bf28}+0.28\%$
test_items_nested_locked 0.6186ms 0.3372ms 2.9654 KOps/s 2.9635 KOps/s $\color{#35bf28}+0.06\%$
test_items_nested_leaf 0.1490ms 87.2109μs 11.4665 KOps/s 11.4826 KOps/s $\color{#d91a1a}-0.14\%$
test_items_stack_nested 0.6042ms 0.3417ms 2.9269 KOps/s 2.9403 KOps/s $\color{#d91a1a}-0.46\%$
test_items_stack_nested_leaf 0.1478ms 85.8760μs 11.6447 KOps/s 11.0017 KOps/s $\textbf{\color{#35bf28}+5.84\%}$
test_items_stack_nested_locked 1.2064ms 0.3412ms 2.9307 KOps/s 2.9273 KOps/s $\color{#35bf28}+0.12\%$
test_keys 50.1540μs 3.9089μs 255.8260 KOps/s 256.4702 KOps/s $\color{#d91a1a}-0.25\%$
test_keys_nested 0.2800ms 0.1450ms 6.8955 KOps/s 6.9074 KOps/s $\color{#d91a1a}-0.17\%$
test_keys_nested_locked 0.8359ms 0.1512ms 6.6147 KOps/s 6.7341 KOps/s $\color{#d91a1a}-1.77\%$
test_keys_nested_leaf 0.1993ms 0.1231ms 8.1218 KOps/s 8.0299 KOps/s $\color{#35bf28}+1.14\%$
test_keys_stack_nested 0.2768ms 0.1448ms 6.9060 KOps/s 6.8482 KOps/s $\color{#35bf28}+0.84\%$
test_keys_stack_nested_leaf 0.2225ms 0.1236ms 8.0926 KOps/s 7.9619 KOps/s $\color{#35bf28}+1.64\%$
test_keys_stack_nested_locked 0.2921ms 0.1495ms 6.6911 KOps/s 6.6396 KOps/s $\color{#35bf28}+0.78\%$
test_values 11.3712μs 1.2257μs 815.8537 KOps/s 859.4261 KOps/s $\textbf{\color{#d91a1a}-5.07\%}$
test_values_nested 0.1162ms 50.6507μs 19.7431 KOps/s 19.6861 KOps/s $\color{#35bf28}+0.29\%$
test_values_nested_locked 0.1112ms 50.8308μs 19.6731 KOps/s 19.7803 KOps/s $\color{#d91a1a}-0.54\%$
test_values_nested_leaf 93.3850μs 45.4982μs 21.9789 KOps/s 22.0461 KOps/s $\color{#d91a1a}-0.31\%$
test_values_stack_nested 0.1141ms 51.9221μs 19.2596 KOps/s 19.3331 KOps/s $\color{#d91a1a}-0.38\%$
test_values_stack_nested_leaf 97.2120μs 44.8650μs 22.2891 KOps/s 21.9629 KOps/s $\color{#35bf28}+1.48\%$
test_values_stack_nested_locked 94.4170μs 51.8166μs 19.2988 KOps/s 18.6377 KOps/s $\color{#35bf28}+3.55\%$
test_membership 16.3076μs 0.7930μs 1.2611 MOps/s 1.3497 MOps/s $\textbf{\color{#d91a1a}-6.57\%}$
test_membership_nested 29.3440μs 2.6022μs 384.2911 KOps/s 380.1740 KOps/s $\color{#35bf28}+1.08\%$
test_membership_nested_leaf 45.3340μs 2.6151μs 382.3888 KOps/s 380.5617 KOps/s $\color{#35bf28}+0.48\%$
test_membership_stacked_nested 22.4620μs 2.6077μs 383.4751 KOps/s 378.7848 KOps/s $\color{#35bf28}+1.24\%$
test_membership_stacked_nested_leaf 26.4600μs 2.6065μs 383.6518 KOps/s 376.4040 KOps/s $\color{#35bf28}+1.93\%$
test_membership_nested_last 30.5970μs 3.9489μs 253.2367 KOps/s 246.0784 KOps/s $\color{#35bf28}+2.91\%$
test_membership_nested_leaf_last 45.4650μs 3.9646μs 252.2341 KOps/s 251.7386 KOps/s $\color{#35bf28}+0.20\%$
test_membership_stacked_nested_last 0.1396ms 4.6195μs 216.4731 KOps/s 218.5233 KOps/s $\color{#d91a1a}-0.94\%$
test_membership_stacked_nested_leaf_last 79.0680μs 4.5985μs 217.4628 KOps/s 217.1232 KOps/s $\color{#35bf28}+0.16\%$
test_nested_getleaf 46.5770μs 10.4522μs 95.6733 KOps/s 95.8775 KOps/s $\color{#d91a1a}-0.21\%$
test_nested_get 54.6220μs 9.9535μs 100.4675 KOps/s 101.1553 KOps/s $\color{#d91a1a}-0.68\%$
test_stacked_getleaf 50.7950μs 10.3726μs 96.4075 KOps/s 95.5053 KOps/s $\color{#35bf28}+0.94\%$
test_stacked_get 89.9080μs 9.7543μs 102.5194 KOps/s 102.1813 KOps/s $\color{#35bf28}+0.33\%$
test_nested_getitemleaf 61.2050μs 11.2447μs 88.9306 KOps/s 90.7812 KOps/s $\color{#d91a1a}-2.04\%$
test_nested_getitem 53.3400μs 10.1471μs 98.5501 KOps/s 98.8590 KOps/s $\color{#d91a1a}-0.31\%$
test_stacked_getitemleaf 63.2380μs 10.9318μs 91.4759 KOps/s 90.9935 KOps/s $\color{#35bf28}+0.53\%$
test_stacked_getitem 37.3900μs 9.9744μs 100.2568 KOps/s 98.4906 KOps/s $\color{#35bf28}+1.79\%$
test_lock_nested 78.4442ms 0.5692ms 1.7570 KOps/s 2.0112 KOps/s $\textbf{\color{#d91a1a}-12.64\%}$
test_lock_stack_nested 0.6584ms 0.4509ms 2.2180 KOps/s 2.1723 KOps/s $\color{#35bf28}+2.10\%$
test_unlock_nested 80.7407ms 0.4892ms 2.0442 KOps/s 2.4303 KOps/s $\textbf{\color{#d91a1a}-15.88\%}$
test_unlock_stack_nested 0.7445ms 0.3684ms 2.7143 KOps/s 2.6586 KOps/s $\color{#35bf28}+2.10\%$
test_flatten_speed 0.5161ms 0.1064ms 9.3985 KOps/s 9.4122 KOps/s $\color{#d91a1a}-0.15\%$
test_unflatten_speed 1.1426ms 0.4298ms 2.3264 KOps/s 2.3331 KOps/s $\color{#d91a1a}-0.29\%$
test_common_ops 5.2771ms 1.0607ms 942.8160 Ops/s 974.3137 Ops/s $\color{#d91a1a}-3.23\%$
test_creation 39.3440μs 2.0479μs 488.3168 KOps/s 467.3478 KOps/s $\color{#35bf28}+4.49\%$
test_creation_empty 40.6060μs 17.4044μs 57.4569 KOps/s 63.4365 KOps/s $\textbf{\color{#d91a1a}-9.43\%}$
test_creation_nested_1 1.2316ms 20.7163μs 48.2712 KOps/s 52.6818 KOps/s $\textbf{\color{#d91a1a}-8.37\%}$
test_creation_nested_2 0.1124ms 24.0623μs 41.5587 KOps/s 44.7013 KOps/s $\textbf{\color{#d91a1a}-7.03\%}$
test_clone 92.5930μs 16.2355μs 61.5933 KOps/s 59.5058 KOps/s $\color{#35bf28}+3.51\%$
test_getitem[int] 0.7239ms 16.5318μs 60.4894 KOps/s 60.2933 KOps/s $\color{#35bf28}+0.33\%$
test_getitem[slice_int] 0.1361ms 30.8339μs 32.4318 KOps/s 31.2515 KOps/s $\color{#35bf28}+3.78\%$
test_getitem[range] 0.2576ms 54.9842μs 18.1870 KOps/s 17.4226 KOps/s $\color{#35bf28}+4.39\%$
test_getitem[tuple] 0.1219ms 25.4272μs 39.3279 KOps/s 39.6086 KOps/s $\color{#d91a1a}-0.71\%$
test_getitem[list] 0.2635ms 51.2709μs 19.5042 KOps/s 19.2295 KOps/s $\color{#35bf28}+1.43\%$
test_setitem_dim[int] 81.3320μs 40.7454μs 24.5427 KOps/s 27.3285 KOps/s $\textbf{\color{#d91a1a}-10.19\%}$
test_setitem_dim[slice_int] 0.1147ms 70.7038μs 14.1435 KOps/s 14.5814 KOps/s $\color{#d91a1a}-3.00\%$
test_setitem_dim[range] 0.1597ms 91.5451μs 10.9236 KOps/s 11.1308 KOps/s $\color{#d91a1a}-1.86\%$
test_setitem_dim[tuple] 0.1030ms 57.0345μs 17.5333 KOps/s 18.4505 KOps/s $\color{#d91a1a}-4.97\%$
test_setitem 90.8100μs 28.0036μs 35.7097 KOps/s 36.3850 KOps/s $\color{#d91a1a}-1.86\%$
test_set 79.8390μs 27.3466μs 36.5676 KOps/s 37.3115 KOps/s $\color{#d91a1a}-1.99\%$
test_set_shared 3.8084ms 0.2146ms 4.6596 KOps/s 4.6819 KOps/s $\color{#d91a1a}-0.48\%$
test_update 0.1341ms 33.8357μs 29.5546 KOps/s 31.2083 KOps/s $\textbf{\color{#d91a1a}-5.30\%}$
test_update_nested 0.1283ms 42.5601μs 23.4962 KOps/s 23.8045 KOps/s $\color{#d91a1a}-1.30\%$
test_update__nested 0.1053ms 33.3554μs 29.9802 KOps/s 29.3688 KOps/s $\color{#35bf28}+2.08\%$
test_set_nested 0.1042ms 29.2875μs 34.1443 KOps/s 33.9613 KOps/s $\color{#35bf28}+0.54\%$
test_set_nested_new 91.7420μs 33.9034μs 29.4955 KOps/s 29.4473 KOps/s $\color{#35bf28}+0.16\%$
test_select 0.1497ms 50.2101μs 19.9163 KOps/s 19.6627 KOps/s $\color{#35bf28}+1.29\%$
test_select_nested 0.1315ms 59.3640μs 16.8452 KOps/s 16.8353 KOps/s $\color{#35bf28}+0.06\%$
test_exclude_nested 0.1460ms 77.2187μs 12.9502 KOps/s 12.9872 KOps/s $\color{#d91a1a}-0.28\%$
test_empty[True] 0.4160ms 0.3200ms 3.1247 KOps/s 3.0775 KOps/s $\color{#35bf28}+1.53\%$
test_empty[False] 9.2172μs 1.1585μs 863.1650 KOps/s 843.2558 KOps/s $\color{#35bf28}+2.36\%$
test_unbind_speed 0.4035ms 0.3049ms 3.2803 KOps/s 3.2101 KOps/s $\color{#35bf28}+2.19\%$
test_unbind_speed_stack0 0.6491ms 0.2921ms 3.4229 KOps/s 3.3172 KOps/s $\color{#35bf28}+3.19\%$
test_unbind_speed_stack1 83.7383ms 0.7662ms 1.3052 KOps/s 1.3827 KOps/s $\textbf{\color{#d91a1a}-5.60\%}$
test_split 82.7924ms 2.1137ms 473.1105 Ops/s 465.6519 Ops/s $\color{#35bf28}+1.60\%$
test_chunk 80.5618ms 2.1093ms 474.0806 Ops/s 465.5971 Ops/s $\color{#35bf28}+1.82\%$
test_creation[device0] 0.2257ms 0.1182ms 8.4607 KOps/s 8.1539 KOps/s $\color{#35bf28}+3.76\%$
test_creation_from_tensor 4.5284ms 0.1204ms 8.3079 KOps/s 8.0706 KOps/s $\color{#35bf28}+2.94\%$
test_add_one[memmap_tensor0] 0.2147ms 7.9575μs 125.6684 KOps/s 134.9105 KOps/s $\textbf{\color{#d91a1a}-6.85\%}$
test_contiguous[memmap_tensor0] 20.1980μs 2.0546μs 486.7117 KOps/s 498.7535 KOps/s $\color{#d91a1a}-2.41\%$
test_stack[memmap_tensor0] 42.3190μs 5.9916μs 166.8995 KOps/s 169.8733 KOps/s $\color{#d91a1a}-1.75\%$
test_memmaptd_index 1.1236ms 0.4063ms 2.4612 KOps/s 2.4840 KOps/s $\color{#d91a1a}-0.92\%$
test_memmaptd_index_astensor 1.1100ms 0.4850ms 2.0619 KOps/s 2.0216 KOps/s $\color{#35bf28}+2.00\%$
test_memmaptd_index_op 1.7580ms 1.0211ms 979.3247 Ops/s 1.0262 KOps/s $\color{#d91a1a}-4.57\%$
test_serialize_model 0.1289s 0.1180s 8.4741 Ops/s 7.3074 Ops/s $\textbf{\color{#35bf28}+15.96\%}$
test_serialize_model_pickle 0.4930s 0.3924s 2.5482 Ops/s 2.5311 Ops/s $\color{#35bf28}+0.68\%$
test_serialize_weights 0.1331s 0.1205s 8.2967 Ops/s 8.3310 Ops/s $\color{#d91a1a}-0.41\%$
test_serialize_weights_returnearly 0.1689s 0.1588s 6.2974 Ops/s 6.1401 Ops/s $\color{#35bf28}+2.56\%$
test_serialize_weights_pickle 0.5503s 0.4302s 2.3247 Ops/s 2.4473 Ops/s $\textbf{\color{#d91a1a}-5.01\%}$
test_serialize_weights_filesystem 0.2162s 0.1548s 6.4613 Ops/s 6.5190 Ops/s $\color{#d91a1a}-0.89\%$
test_serialize_model_filesystem 0.1621s 0.1487s 6.7265 Ops/s 6.5967 Ops/s $\color{#35bf28}+1.97\%$
test_reshape_pytree 83.7570μs 39.7638μs 25.1485 KOps/s 24.8584 KOps/s $\color{#35bf28}+1.17\%$
test_reshape_td 86.6120μs 46.0675μs 21.7073 KOps/s 21.0487 KOps/s $\color{#35bf28}+3.13\%$
test_view_pytree 0.1017ms 39.5449μs 25.2877 KOps/s 25.0882 KOps/s $\color{#35bf28}+0.80\%$
test_view_td 0.1114ms 53.5391μs 18.6779 KOps/s 18.4501 KOps/s $\color{#35bf28}+1.24\%$
test_unbind_pytree 99.0250μs 36.7195μs 27.2334 KOps/s 26.9363 KOps/s $\color{#35bf28}+1.10\%$
test_unbind_td 0.4120ms 45.1210μs 22.1626 KOps/s 21.3897 KOps/s $\color{#35bf28}+3.61\%$
test_split_pytree 97.8330μs 40.1225μs 24.9236 KOps/s 25.4263 KOps/s $\color{#d91a1a}-1.98\%$
test_split_td 0.4767ms 57.6525μs 17.3453 KOps/s 16.9231 KOps/s $\color{#35bf28}+2.50\%$
test_add_pytree 0.1061ms 47.1159μs 21.2243 KOps/s 22.2611 KOps/s $\color{#d91a1a}-4.66\%$
test_add_td 0.1504ms 82.9514μs 12.0553 KOps/s 12.7570 KOps/s $\textbf{\color{#d91a1a}-5.50\%}$
test_compile_add_one_nested[tensordict-compile] 0.1952ms 54.2728μs 18.4254 KOps/s 18.3447 KOps/s $\color{#35bf28}+0.44\%$
test_compile_add_one_nested[tensordict-eager] 0.3767ms 0.1870ms 5.3485 KOps/s 5.2652 KOps/s $\color{#35bf28}+1.58\%$
test_compile_add_one_nested[pytree-compile] 0.1142ms 54.2258μs 18.4414 KOps/s 18.4678 KOps/s $\color{#d91a1a}-0.14\%$
test_compile_add_one_nested[pytree-eager] 0.2511ms 0.1452ms 6.8886 KOps/s 7.0446 KOps/s $\color{#d91a1a}-2.21\%$
test_compile_copy_nested[tensordict-compile] 56.9470μs 20.6165μs 48.5048 KOps/s 50.4102 KOps/s $\color{#d91a1a}-3.78\%$
test_compile_copy_nested[tensordict-eager] 0.1386ms 64.2887μs 15.5548 KOps/s 15.7356 KOps/s $\color{#d91a1a}-1.15\%$
test_compile_copy_nested[pytree-compile] 0.1522ms 79.2373μs 12.6203 KOps/s 12.5394 KOps/s $\color{#35bf28}+0.65\%$
test_compile_copy_nested[pytree-eager] 0.1481ms 71.5006μs 13.9859 KOps/s 13.9326 KOps/s $\color{#35bf28}+0.38\%$
test_compile_add_one_flat[tensordict-compile] 0.3639ms 0.1739ms 5.7519 KOps/s 5.6983 KOps/s $\color{#35bf28}+0.94\%$
test_compile_add_one_flat[tensordict-eager] 0.4516ms 0.1955ms 5.1154 KOps/s 5.1912 KOps/s $\color{#d91a1a}-1.46\%$
test_compile_add_one_flat[tensorclass-compile] 98.9050μs 39.6864μs 25.1976 KOps/s 24.8426 KOps/s $\color{#35bf28}+1.43\%$
test_compile_add_one_flat[tensorclass-eager] 0.9085ms 71.0644μs 14.0718 KOps/s 14.5361 KOps/s $\color{#d91a1a}-3.19\%$
test_compile_add_one_flat[pytree-compile] 0.3877ms 0.1737ms 5.7584 KOps/s 5.5909 KOps/s $\color{#35bf28}+2.99\%$
test_compile_add_one_flat[pytree-eager] 0.5814ms 0.2950ms 3.3902 KOps/s 3.4462 KOps/s $\color{#d91a1a}-1.62\%$
test_compile_add_self_flat[tensordict-eager] 0.3626ms 0.2050ms 4.8790 KOps/s 4.8523 KOps/s $\color{#35bf28}+0.55\%$
test_compile_add_self_flat[tensordict-compile] 0.3324ms 0.1776ms 5.6320 KOps/s 5.6460 KOps/s $\color{#d91a1a}-0.25\%$
test_compile_add_self_flat[tensorclass-eager] 0.9320ms 63.5884μs 15.7261 KOps/s 15.6851 KOps/s $\color{#35bf28}+0.26\%$
test_compile_add_self_flat[tensorclass-compile] 0.1007ms 40.9606μs 24.4137 KOps/s 25.6175 KOps/s $\color{#d91a1a}-4.70\%$
test_compile_add_self_flat[pytree-eager] 0.4906ms 0.2376ms 4.2091 KOps/s 4.1944 KOps/s $\color{#35bf28}+0.35\%$
test_compile_add_self_flat[pytree-compile] 0.3881ms 0.1741ms 5.7422 KOps/s 5.7506 KOps/s $\color{#d91a1a}-0.15\%$
test_compile_copy_flat[tensordict-compile] 0.2533ms 0.1081ms 9.2531 KOps/s 9.1389 KOps/s $\color{#35bf28}+1.25\%$
test_compile_copy_flat[tensordict-eager] 0.1258ms 56.2012μs 17.7932 KOps/s 17.8415 KOps/s $\color{#d91a1a}-0.27\%$
test_compile_copy_flat[pytree-compile] 0.1583ms 79.9165μs 12.5131 KOps/s 12.7028 KOps/s $\color{#d91a1a}-1.49\%$
test_compile_copy_flat[pytree-eager] 0.1359ms 70.5240μs 14.1796 KOps/s 14.0889 KOps/s $\color{#35bf28}+0.64\%$
test_compile_assign_and_add[tensordict-compile] 0.4367ms 0.1922ms 5.2021 KOps/s 5.3079 KOps/s $\color{#d91a1a}-1.99\%$
test_compile_assign_and_add[tensordict-eager] 1.8827ms 1.6239ms 615.8014 Ops/s 598.2492 Ops/s $\color{#35bf28}+2.93\%$
test_compile_assign_and_add[pytree-compile] 0.3160ms 0.1890ms 5.2918 KOps/s 5.3379 KOps/s $\color{#d91a1a}-0.87\%$
test_compile_assign_and_add[pytree-eager] 1.2997ms 1.0918ms 915.9417 Ops/s 916.7420 Ops/s $\color{#d91a1a}-0.09\%$
test_compile_assign_and_add_stack[compile] 0.5157ms 0.4132ms 2.4199 KOps/s 2.4009 KOps/s $\color{#35bf28}+0.79\%$
test_compile_assign_and_add_stack[eager] 5.6545ms 3.7021ms 270.1189 Ops/s 275.9683 Ops/s $\color{#d91a1a}-2.12\%$
test_compile_indexing[tensor-tensordict-compile] 83.9870μs 34.1652μs 29.2695 KOps/s 30.3905 KOps/s $\color{#d91a1a}-3.69\%$
test_compile_indexing[tensor-tensordict-eager] 0.9757ms 46.3444μs 21.5776 KOps/s 20.5429 KOps/s $\textbf{\color{#35bf28}+5.04\%}$
test_compile_indexing[tensor-tensorclass-compile] 92.4530μs 29.1355μs 34.3224 KOps/s 35.0358 KOps/s $\color{#d91a1a}-2.04\%$
test_compile_indexing[tensor-tensorclass-eager] 91.4210μs 30.3498μs 32.9491 KOps/s 32.0296 KOps/s $\color{#35bf28}+2.87\%$
test_compile_indexing[tensor-pytree-compile] 0.2208ms 29.0109μs 34.4698 KOps/s 35.2557 KOps/s $\color{#d91a1a}-2.23\%$
test_compile_indexing[tensor-pytree-eager] 0.1077ms 29.8673μs 33.4815 KOps/s 32.7992 KOps/s $\color{#35bf28}+2.08\%$
test_compile_indexing[slice-tensordict-compile] 0.1759ms 71.7631μs 13.9347 KOps/s 13.7627 KOps/s $\color{#35bf28}+1.25\%$
test_compile_indexing[slice-tensordict-eager] 0.5267ms 27.3369μs 36.5805 KOps/s 35.6258 KOps/s $\color{#35bf28}+2.68\%$
test_compile_indexing[slice-tensorclass-compile] 0.1710ms 66.9813μs 14.9295 KOps/s 14.5944 KOps/s $\color{#35bf28}+2.30\%$
test_compile_indexing[slice-tensorclass-eager] 87.8250μs 24.8227μs 40.2857 KOps/s 41.3602 KOps/s $\color{#d91a1a}-2.60\%$
test_compile_indexing[slice-pytree-compile] 0.1398ms 66.6488μs 15.0040 KOps/s 14.8386 KOps/s $\color{#35bf28}+1.11\%$
test_compile_indexing[slice-pytree-eager] 71.4930μs 24.5142μs 40.7926 KOps/s 41.0568 KOps/s $\color{#d91a1a}-0.64\%$
test_compile_indexing[int-tensordict-compile] 0.1318ms 71.5452μs 13.9772 KOps/s 14.0973 KOps/s $\color{#d91a1a}-0.85\%$
test_compile_indexing[int-tensordict-eager] 0.9299ms 27.2686μs 36.6722 KOps/s 35.2875 KOps/s $\color{#35bf28}+3.92\%$
test_compile_indexing[int-tensorclass-compile] 0.1609ms 67.2788μs 14.8635 KOps/s 14.9730 KOps/s $\color{#d91a1a}-0.73\%$
test_compile_indexing[int-tensorclass-eager] 85.0490μs 23.8908μs 41.8571 KOps/s 41.0659 KOps/s $\color{#35bf28}+1.93\%$
test_compile_indexing[int-pytree-compile] 0.1799ms 67.0196μs 14.9210 KOps/s 14.9478 KOps/s $\color{#d91a1a}-0.18\%$
test_compile_indexing[int-pytree-eager] 0.5365ms 24.5801μs 40.6833 KOps/s 41.5167 KOps/s $\color{#d91a1a}-2.01\%$
test_mod_add[eager] 0.1142ms 24.0563μs 41.5692 KOps/s 43.8954 KOps/s $\textbf{\color{#d91a1a}-5.30\%}$
test_mod_add[compile] 99.7570μs 36.9860μs 27.0373 KOps/s 27.3964 KOps/s $\color{#d91a1a}-1.31\%$
test_mod_add[compile-overhead] 0.1062ms 38.3633μs 26.0666 KOps/s 27.5646 KOps/s $\textbf{\color{#d91a1a}-5.43\%}$
test_mod_wrap[eager] 0.4371ms 0.2065ms 4.8421 KOps/s 4.8579 KOps/s $\color{#d91a1a}-0.32\%$
test_mod_wrap[compile] 1.3405ms 0.2283ms 4.3806 KOps/s 4.3551 KOps/s $\color{#35bf28}+0.59\%$
test_mod_wrap[compile-overhead] 0.3629ms 0.2243ms 4.4590 KOps/s 4.4347 KOps/s $\color{#35bf28}+0.55\%$
test_mod_wrap_and_backward[eager] 12.3245ms 11.0818ms 90.2378 Ops/s 88.2360 Ops/s $\color{#35bf28}+2.27\%$
test_mod_wrap_and_backward[compile] 14.6761ms 11.8700ms 84.2459 Ops/s 86.7197 Ops/s $\color{#d91a1a}-2.85\%$
test_mod_wrap_and_backward[compile-overhead] 15.3573ms 11.8807ms 84.1698 Ops/s 83.4849 Ops/s $\color{#35bf28}+0.82\%$
test_seq_add[eager] 0.2147ms 87.6080μs 11.4145 KOps/s 12.0385 KOps/s $\textbf{\color{#d91a1a}-5.18\%}$
test_seq_add[compile] 0.1501ms 59.5548μs 16.7913 KOps/s 15.6040 KOps/s $\textbf{\color{#35bf28}+7.61\%}$
test_seq_add[compile-overhead] 0.1967ms 58.5299μs 17.0853 KOps/s 16.7018 KOps/s $\color{#35bf28}+2.30\%$
test_seq_wrap[eager] 0.5928ms 0.3727ms 2.6833 KOps/s 2.7474 KOps/s $\color{#d91a1a}-2.33\%$
test_seq_wrap[compile] 0.4670ms 0.2613ms 3.8271 KOps/s 3.7471 KOps/s $\color{#35bf28}+2.13\%$
test_seq_wrap[compile-overhead] 0.4258ms 0.2603ms 3.8413 KOps/s 3.8252 KOps/s $\color{#35bf28}+0.42\%$
test_func_call_runtime[False-eager] 0.6687ms 0.5189ms 1.9270 KOps/s 1.9003 KOps/s $\color{#35bf28}+1.40\%$
test_func_call_runtime[False-compile] 0.7120ms 0.4943ms 2.0230 KOps/s 1.9959 KOps/s $\color{#35bf28}+1.36\%$
test_func_call_runtime[False-compile-overhead] 0.8418ms 0.4968ms 2.0129 KOps/s 2.0141 KOps/s $\color{#d91a1a}-0.06\%$
test_func_call_runtime[True-eager] 1.0427ms 0.7441ms 1.3439 KOps/s 1.3311 KOps/s $\color{#35bf28}+0.96\%$
test_func_call_runtime[True-compile] 0.8362ms 0.5186ms 1.9284 KOps/s 1.9463 KOps/s $\color{#d91a1a}-0.92\%$
test_func_call_runtime[True-compile-overhead] 0.6167ms 0.5162ms 1.9371 KOps/s 1.9388 KOps/s $\color{#d91a1a}-0.09\%$
test_func_call_cm_runtime[False-eager] 1.1259ms 0.5330ms 1.8761 KOps/s 1.9218 KOps/s $\color{#d91a1a}-2.38\%$
test_func_call_cm_runtime[False-compile] 0.6127ms 0.4989ms 2.0046 KOps/s 2.0104 KOps/s $\color{#d91a1a}-0.29\%$
test_func_call_cm_runtime[False-compile-overhead] 0.8912ms 0.5035ms 1.9861 KOps/s 2.0090 KOps/s $\color{#d91a1a}-1.14\%$
test_func_call_cm_runtime[True-eager] 1.1028ms 0.8769ms 1.1403 KOps/s 1.1288 KOps/s $\color{#35bf28}+1.03\%$
test_func_call_cm_runtime[True-compile] 1.1565ms 0.8292ms 1.2060 KOps/s 1.2044 KOps/s $\color{#35bf28}+0.14\%$
test_func_call_cm_runtime[True-compile-overhead] 1.0212ms 0.8374ms 1.1942 KOps/s 1.1965 KOps/s $\color{#d91a1a}-0.19\%$
test_distributed 0.2827ms 0.1323ms 7.5559 KOps/s 7.4349 KOps/s $\color{#35bf28}+1.63\%$
test_tdmodule 75.6010μs 16.6749μs 59.9702 KOps/s 62.4856 KOps/s $\color{#d91a1a}-4.03\%$
test_tdmodule_dispatch 60.9140μs 35.1435μs 28.4548 KOps/s 29.9345 KOps/s $\color{#d91a1a}-4.94\%$
test_tdseq 32.4310μs 18.7669μs 53.2852 KOps/s 52.5481 KOps/s $\color{#35bf28}+1.40\%$
test_tdseq_dispatch 65.6430μs 38.8179μs 25.7613 KOps/s 26.9278 KOps/s $\color{#d91a1a}-4.33\%$
test_instantiation_functorch 2.1956ms 1.6318ms 612.8043 Ops/s 616.9295 Ops/s $\color{#d91a1a}-0.67\%$
test_instantiation_td 1.9677ms 1.1681ms 856.0921 Ops/s 849.6484 Ops/s $\color{#35bf28}+0.76\%$
test_exec_functorch 0.4295ms 0.1817ms 5.5046 KOps/s 5.5756 KOps/s $\color{#d91a1a}-1.27\%$
test_exec_functional_call 0.3626ms 0.1762ms 5.6764 KOps/s 6.0583 KOps/s $\textbf{\color{#d91a1a}-6.30\%}$
test_exec_td 0.2761ms 0.1709ms 5.8512 KOps/s 5.9026 KOps/s $\color{#d91a1a}-0.87\%$
test_exec_td_decorator 0.9011ms 0.2259ms 4.4265 KOps/s 4.5220 KOps/s $\color{#d91a1a}-2.11\%$
test_vmap_mlp_speed[True-True] 0.8196ms 0.5708ms 1.7521 KOps/s 1.7534 KOps/s $\color{#d91a1a}-0.07\%$
test_vmap_mlp_speed[True-False] 1.0101ms 0.5682ms 1.7600 KOps/s 1.7702 KOps/s $\color{#d91a1a}-0.58\%$
test_vmap_mlp_speed[False-True] 0.9641ms 0.4717ms 2.1198 KOps/s 2.1246 KOps/s $\color{#d91a1a}-0.22\%$
test_vmap_mlp_speed[False-False] 0.7085ms 0.4696ms 2.1295 KOps/s 2.1141 KOps/s $\color{#35bf28}+0.73\%$
test_vmap_mlp_speed_decorator[True-True] 1.0500ms 0.6329ms 1.5800 KOps/s 1.6072 KOps/s $\color{#d91a1a}-1.69\%$
test_vmap_mlp_speed_decorator[True-False] 0.9524ms 0.6288ms 1.5903 KOps/s 1.6118 KOps/s $\color{#d91a1a}-1.33\%$
test_vmap_mlp_speed_decorator[False-True] 0.8848ms 0.5217ms 1.9169 KOps/s 1.9321 KOps/s $\color{#d91a1a}-0.78\%$
test_vmap_mlp_speed_decorator[False-False] 0.7551ms 0.5185ms 1.9287 KOps/s 1.9349 KOps/s $\color{#d91a1a}-0.32\%$
test_to_module_speed[True] 2.1593ms 1.3529ms 739.1391 Ops/s 744.8487 Ops/s $\color{#d91a1a}-0.77\%$
test_to_module_speed[False] 1.8615ms 1.3233ms 755.7109 Ops/s 771.6190 Ops/s $\color{#d91a1a}-2.06\%$
test_tc_init 0.1458ms 42.1830μs 23.7062 KOps/s 23.3492 KOps/s $\color{#35bf28}+1.53\%$
test_tc_init_nested 0.1678ms 82.3630μs 12.1414 KOps/s 11.5690 KOps/s $\color{#35bf28}+4.95\%$
test_tc_first_layer_tensor 14.6380μs 1.4782μs 676.5178 KOps/s 657.6365 KOps/s $\color{#35bf28}+2.87\%$
test_tc_first_layer_nontensor 33.4330μs 4.2906μs 233.0658 KOps/s 238.9087 KOps/s $\color{#d91a1a}-2.45\%$
test_tc_second_layer_tensor 44.1030μs 2.7490μs 363.7664 KOps/s 357.1179 KOps/s $\color{#35bf28}+1.86\%$
test_tc_second_layer_nontensor 29.3750μs 5.4999μs 181.8221 KOps/s 181.5727 KOps/s $\color{#35bf28}+0.14\%$
test_unbind 0.4582s 14.2083ms 70.3812 Ops/s 72.7535 Ops/s $\color{#d91a1a}-3.26\%$
test_full_like 9.0036ms 6.9450ms 143.9889 Ops/s 131.2187 Ops/s $\textbf{\color{#35bf28}+9.73\%}$
test_zeros_like 14.0885ms 7.2406ms 138.1106 Ops/s 129.4694 Ops/s $\textbf{\color{#35bf28}+6.67\%}$
test_ones_like 13.3598ms 7.4942ms 133.4366 Ops/s 131.1200 Ops/s $\color{#35bf28}+1.77\%$
test_clone 14.0323ms 8.9320ms 111.9569 Ops/s 105.1750 Ops/s $\textbf{\color{#35bf28}+6.45\%}$
test_squeeze 64.8920μs 13.0912μs 76.3871 KOps/s 75.6388 KOps/s $\color{#35bf28}+0.99\%$
test_unsqueeze 0.1736ms 93.8724μs 10.6528 KOps/s 10.6003 KOps/s $\color{#35bf28}+0.50\%$
test_split 0.5317ms 0.2009ms 4.9786 KOps/s 4.9086 KOps/s $\color{#35bf28}+1.43\%$
test_permute 0.3724ms 0.2259ms 4.4258 KOps/s 4.3941 KOps/s $\color{#35bf28}+0.72\%$
test_stack 32.4888ms 24.5352ms 40.7578 Ops/s 39.4099 Ops/s $\color{#35bf28}+3.42\%$
test_cat 26.8826ms 24.3255ms 41.1091 Ops/s 38.0457 Ops/s $\textbf{\color{#35bf28}+8.05\%}$
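
For readers checking the table by hand, the Change column appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. The sketch below reproduces that arithmetic, including the unit conversion needed when the two columns use different prefixes (Ops/s vs KOps/s). The helper names are hypothetical and this is not the benchmark bot's code. The 5% cutoff is an assumption inferred from the rows shown in bold; it happens to match the Improved and Worsened counts in the summary line.

```python
# Hypothetical helpers for sanity-checking the table; not the bot's implementation.

# Throughput prefixes as they appear in the Ops columns.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def to_ops_per_second(value: float, unit: str) -> float:
    """Convert a table entry such as (1.0262, "KOps/s") to plain ops per second."""
    return value * UNIT_SCALE[unit]


def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption inferred from the bold rows."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# test_memmaptd_index_op: the two columns use different units.
pr = to_ops_per_second(979.3247, "Ops/s")
head = to_ops_per_second(1.0262, "KOps/s")
change = percent_change(pr, head)
print(f"{change:+.2f}%", classify(change))
```

Because the table prints rounded throughputs, a recomputed value can differ from the reported one in the last digit.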

@github-actions

github-actions bot commented Jul 31, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 225. Improved: $\large\color{#35bf28}12$. Worsened: $\large\color{#d91a1a}8$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.1553ms 16.9992μs 58.8264 KOps/s 56.2959 KOps/s $\color{#35bf28}+4.50\%$
test_plain_set_stack_nested 0.1259ms 17.0777μs 58.5560 KOps/s 56.8387 KOps/s $\color{#35bf28}+3.02\%$
test_plain_set_nested_inplace 40.4220μs 18.2114μs 54.9105 KOps/s 53.5776 KOps/s $\color{#35bf28}+2.49\%$
test_plain_set_stack_nested_inplace 60.6530μs 18.2197μs 54.8855 KOps/s 53.7002 KOps/s $\color{#35bf28}+2.21\%$
test_items 26.0510μs 4.7685μs 209.7088 KOps/s 213.2068 KOps/s $\color{#d91a1a}-1.64\%$
test_items_nested 0.3961ms 0.3645ms 2.7432 KOps/s 2.7655 KOps/s $\color{#d91a1a}-0.81\%$
test_items_nested_locked 0.4071ms 0.3696ms 2.7059 KOps/s 2.7530 KOps/s $\color{#d91a1a}-1.71\%$
test_items_nested_leaf 0.1017ms 84.8167μs 11.7901 KOps/s 11.8726 KOps/s $\color{#d91a1a}-0.69\%$
test_items_stack_nested 0.5184ms 0.3665ms 2.7282 KOps/s 2.7607 KOps/s $\color{#d91a1a}-1.18\%$
test_items_stack_nested_leaf 0.2728ms 86.1077μs 11.6134 KOps/s 11.7792 KOps/s $\color{#d91a1a}-1.41\%$
test_items_stack_nested_locked 0.5852ms 0.3687ms 2.7124 KOps/s 2.7507 KOps/s $\color{#d91a1a}-1.39\%$
test_keys 20.6010μs 4.4006μs 227.2427 KOps/s 227.3918 KOps/s $\color{#d91a1a}-0.07\%$
test_keys_nested 92.4750μs 68.6210μs 14.5728 KOps/s 15.0866 KOps/s $\color{#d91a1a}-3.41\%$
test_keys_nested_locked 0.7220ms 73.8511μs 13.5408 KOps/s 13.8373 KOps/s $\color{#d91a1a}-2.14\%$
test_keys_nested_leaf 0.2232ms 58.6030μs 17.0640 KOps/s 17.3628 KOps/s $\color{#d91a1a}-1.72\%$
test_keys_stack_nested 0.2596ms 68.8411μs 14.5262 KOps/s 15.2651 KOps/s $\color{#d91a1a}-4.84\%$
test_keys_stack_nested_leaf 77.0740μs 58.4768μs 17.1008 KOps/s 17.1757 KOps/s $\color{#d91a1a}-0.44\%$
test_keys_stack_nested_locked 0.1043ms 74.3968μs 13.4414 KOps/s 13.8053 KOps/s $\color{#d91a1a}-2.64\%$
test_values 9.0507μs 1.7771μs 562.7133 KOps/s 562.7703 KOps/s $\color{#d91a1a}-0.01\%$
test_values_nested 50.1530μs 34.4658μs 29.0143 KOps/s 29.4510 KOps/s $\color{#d91a1a}-1.48\%$
test_values_nested_locked 67.1340μs 36.8343μs 27.1486 KOps/s 27.7877 KOps/s $\color{#d91a1a}-2.30\%$
test_values_nested_leaf 53.2530μs 30.6024μs 32.6772 KOps/s 33.3530 KOps/s $\color{#d91a1a}-2.03\%$
test_values_stack_nested 61.9730μs 34.5726μs 28.9246 KOps/s 29.1561 KOps/s $\color{#d91a1a}-0.79\%$
test_values_stack_nested_leaf 55.5730μs 30.8128μs 32.4540 KOps/s 33.1074 KOps/s $\color{#d91a1a}-1.97\%$
test_values_stack_nested_locked 0.1816ms 36.6022μs 27.3208 KOps/s 27.6531 KOps/s $\color{#d91a1a}-1.20\%$
test_membership 1.4956μs 0.5573μs 1.7945 MOps/s 1.8499 MOps/s $\color{#d91a1a}-3.00\%$
test_membership_nested 9.4505μs 1.9657μs 508.7220 KOps/s 525.9234 KOps/s $\color{#d91a1a}-3.27\%$
test_membership_nested_leaf 91.5650μs 1.9936μs 501.6082 KOps/s 530.5866 KOps/s $\textbf{\color{#d91a1a}-5.46\%}$
test_membership_stacked_nested 21.0510μs 2.0525μs 487.2059 KOps/s 504.7404 KOps/s $\color{#d91a1a}-3.47\%$
test_membership_stacked_nested_leaf 17.6010μs 2.0265μs 493.4587 KOps/s 509.4435 KOps/s $\color{#d91a1a}-3.14\%$
test_membership_nested_last 22.4310μs 3.0319μs 329.8306 KOps/s 335.2996 KOps/s $\color{#d91a1a}-1.63\%$
test_membership_nested_leaf_last 32.5310μs 3.0173μs 331.4200 KOps/s 344.0814 KOps/s $\color{#d91a1a}-3.68\%$
test_membership_stacked_nested_last 23.0610μs 3.0687μs 325.8680 KOps/s 342.3748 KOps/s $\color{#d91a1a}-4.82\%$
test_membership_stacked_nested_leaf_last 15.4110μs 3.0209μs 331.0223 KOps/s 345.3491 KOps/s $\color{#d91a1a}-4.15\%$
test_nested_getleaf 36.8120μs 8.0390μs 124.3929 KOps/s 125.7260 KOps/s $\color{#d91a1a}-1.06\%$
test_nested_get 0.1881ms 7.5535μs 132.3882 KOps/s 134.7631 KOps/s $\color{#d91a1a}-1.76\%$
test_stacked_getleaf 34.6220μs 8.0396μs 124.3847 KOps/s 125.6954 KOps/s $\color{#d91a1a}-1.04\%$
test_stacked_get 28.9110μs 7.5292μs 132.8161 KOps/s 134.2749 KOps/s $\color{#d91a1a}-1.09\%$
test_nested_getitemleaf 31.9420μs 8.1527μs 122.6592 KOps/s 124.4678 KOps/s $\color{#d91a1a}-1.45\%$
test_nested_getitem 23.1410μs 7.6957μs 129.9432 KOps/s 131.4874 KOps/s $\color{#d91a1a}-1.17\%$
test_stacked_getitemleaf 34.5320μs 8.1731μs 122.3518 KOps/s 123.8245 KOps/s $\color{#d91a1a}-1.19\%$
test_stacked_getitem 30.2210μs 7.7145μs 129.6264 KOps/s 131.5943 KOps/s $\color{#d91a1a}-1.50\%$
test_lock_nested 0.9419ms 0.4638ms 2.1563 KOps/s 2.1537 KOps/s $\color{#35bf28}+0.12\%$
test_lock_stack_nested 0.4619ms 0.4256ms 2.3495 KOps/s 2.3114 KOps/s $\color{#35bf28}+1.65\%$
test_unlock_nested 0.8038ms 0.3846ms 2.5999 KOps/s 2.5971 KOps/s $\color{#35bf28}+0.11\%$
test_unlock_stack_nested 0.5342ms 0.3474ms 2.8788 KOps/s 2.8369 KOps/s $\color{#35bf28}+1.48\%$
test_flatten_speed 92.5102ms 0.1177ms 8.4942 KOps/s 9.5497 KOps/s $\textbf{\color{#d91a1a}-11.05\%}$
test_unflatten_speed 0.3163ms 0.2860ms 3.4964 KOps/s 3.5166 KOps/s $\color{#d91a1a}-0.57\%$
test_common_ops 1.4618ms 1.2690ms 788.0190 Ops/s 784.6715 Ops/s $\color{#35bf28}+0.43\%$
test_creation 16.5110μs 1.6524μs 605.1831 KOps/s 611.2457 KOps/s $\color{#d91a1a}-0.99\%$
test_creation_empty 39.3220μs 17.4091μs 57.4412 KOps/s 54.1720 KOps/s $\textbf{\color{#35bf28}+6.03\%}$
test_creation_nested_1 45.5720μs 19.2427μs 51.9677 KOps/s 49.3091 KOps/s $\textbf{\color{#35bf28}+5.39\%}$
test_creation_nested_2 48.4130μs 21.7967μs 45.8785 KOps/s 44.0048 KOps/s $\color{#35bf28}+4.26\%$
test_clone 0.1840ms 28.7283μs 34.8088 KOps/s 35.5018 KOps/s $\color{#d91a1a}-1.95\%$
test_getitem[int] 1.2037ms 16.6640μs 60.0096 KOps/s 60.1361 KOps/s $\color{#d91a1a}-0.21\%$
test_getitem[slice_int] 0.1558ms 28.6253μs 34.9341 KOps/s 35.6571 KOps/s $\color{#d91a1a}-2.03\%$
test_getitem[range] 0.2479ms 0.1116ms 8.9646 KOps/s 8.9273 KOps/s $\color{#35bf28}+0.42\%$
test_getitem[tuple] 0.1380ms 24.5221μs 40.7795 KOps/s 40.5328 KOps/s $\color{#35bf28}+0.61\%$
test_getitem[list] 0.2489ms 0.1015ms 9.8513 KOps/s 9.7166 KOps/s $\color{#35bf28}+1.39\%$
test_setitem_dim[int] 79.0240μs 54.1968μs 18.4513 KOps/s 18.2542 KOps/s $\color{#35bf28}+1.08\%$
test_setitem_dim[slice_int] 0.1925ms 79.4884μs 12.5804 KOps/s 12.5982 KOps/s $\color{#d91a1a}-0.14\%$
test_setitem_dim[range] 0.2762ms 0.1401ms 7.1366 KOps/s 7.1464 KOps/s $\color{#d91a1a}-0.14\%$
test_setitem_dim[tuple] 90.5650μs 72.0508μs 13.8791 KOps/s 13.9803 KOps/s $\color{#d91a1a}-0.72\%$
test_setitem 0.1930ms 42.1066μs 23.7493 KOps/s 23.8145 KOps/s $\color{#d91a1a}-0.27\%$
test_set 0.1896ms 40.8880μs 24.4571 KOps/s 24.7384 KOps/s $\color{#d91a1a}-1.14\%$
test_set_shared 0.3942ms 52.2502μs 19.1387 KOps/s 19.3455 KOps/s $\color{#d91a1a}-1.07\%$
test_update 0.2702ms 50.8845μs 19.6523 KOps/s 19.6743 KOps/s $\color{#d91a1a}-0.11\%$
test_update_nested 0.2103ms 58.4433μs 17.1106 KOps/s 17.0554 KOps/s $\color{#35bf28}+0.32\%$
test_update__nested 0.2444ms 59.8808μs 16.6998 KOps/s 17.1158 KOps/s $\color{#d91a1a}-2.43\%$
test_set_nested 0.1915ms 43.8823μs 22.7882 KOps/s 23.0259 KOps/s $\color{#d91a1a}-1.03\%$
test_set_nested_new 0.1962ms 47.9693μs 20.8467 KOps/s 21.0812 KOps/s $\color{#d91a1a}-1.11\%$
test_select 0.2395ms 63.1734μs 15.8295 KOps/s 15.7643 KOps/s $\color{#35bf28}+0.41\%$
test_select_nested 73.0540μs 50.6922μs 19.7269 KOps/s 19.6834 KOps/s $\color{#35bf28}+0.22\%$
test_exclude_nested 87.4350μs 69.5241μs 14.3835 KOps/s 14.6610 KOps/s $\color{#d91a1a}-1.89\%$
test_empty[True] 0.4547ms 0.2799ms 3.5731 KOps/s 3.5941 KOps/s $\color{#d91a1a}-0.58\%$
test_empty[False] 2.9241μs 0.9020μs 1.1087 MOps/s 1.1341 MOps/s $\color{#d91a1a}-2.25\%$
test_to 0.1481ms 39.9993μs 25.0005 KOps/s 25.4039 KOps/s $\color{#d91a1a}-1.59\%$
test_to_nonblocking 0.1606ms 26.2473μs 38.0991 KOps/s 40.7819 KOps/s $\textbf{\color{#d91a1a}-6.58\%}$
test_unbind_speed 1.3519ms 0.2994ms 3.3400 KOps/s 3.3039 KOps/s $\color{#35bf28}+1.09\%$
test_unbind_speed_stack0 0.3639ms 0.2930ms 3.4129 KOps/s 3.3341 KOps/s $\color{#35bf28}+2.36\%$
test_unbind_speed_stack1 92.1568ms 0.7619ms 1.3125 KOps/s 1.2826 KOps/s $\color{#35bf28}+2.33\%$
test_split 93.8237ms 2.3132ms 432.3010 Ops/s 431.5393 Ops/s $\color{#35bf28}+0.18\%$
test_chunk 94.9574ms 2.2955ms 435.6422 Ops/s 428.9239 Ops/s $\color{#35bf28}+1.57\%$
test_creation[device0] 0.2395ms 0.1026ms 9.7432 KOps/s 9.7852 KOps/s $\color{#d91a1a}-0.43\%$
test_creation_from_tensor 0.2501ms 0.1003ms 9.9673 KOps/s 10.0039 KOps/s $\color{#d91a1a}-0.37\%$
test_add_one[memmap_tensor0] 0.1118ms 8.6635μs 115.4266 KOps/s 114.9495 KOps/s $\color{#35bf28}+0.42\%$
test_contiguous[memmap_tensor0] 25.0110μs 2.1308μs 469.3126 KOps/s 462.4316 KOps/s $\color{#35bf28}+1.49\%$
test_stack[memmap_tensor0] 34.4910μs 6.6254μs 150.9344 KOps/s 154.1921 KOps/s $\color{#d91a1a}-2.11\%$
test_memmaptd_index 1.3729ms 0.4220ms 2.3695 KOps/s 2.3454 KOps/s $\color{#35bf28}+1.03\%$
test_memmaptd_index_astensor 0.7580ms 0.4830ms 2.0705 KOps/s 2.0865 KOps/s $\color{#d91a1a}-0.76\%$
test_memmaptd_index_op 1.4704ms 1.0327ms 968.3296 Ops/s 952.0848 Ops/s $\color{#35bf28}+1.71\%$
test_serialize_model 92.7067ms 89.2788ms 11.2009 Ops/s 10.9640 Ops/s $\color{#35bf28}+2.16\%$
test_serialize_model_pickle 1.3903s 1.2415s 0.8055 Ops/s 0.8061 Ops/s $\color{#d91a1a}-0.08\%$
test_serialize_weights 0.1817s 96.7022ms 10.3410 Ops/s 9.9268 Ops/s $\color{#35bf28}+4.17\%$
test_serialize_weights_returnearly 0.2700s 66.6818ms 14.9966 Ops/s 15.4423 Ops/s $\color{#d91a1a}-2.89\%$
test_serialize_weights_pickle 1.3516s 1.2367s 0.8086 Ops/s 0.8085 Ops/s $+0.01\%$
test_reshape_pytree 0.1739ms 38.9688μs 25.6615 KOps/s 26.5795 KOps/s $\color{#d91a1a}-3.45\%$
test_reshape_td 0.2015ms 45.4164μs 22.0185 KOps/s 23.5129 KOps/s $\textbf{\color{#d91a1a}-6.36\%}$
test_view_pytree 0.1964ms 37.8871μs 26.3942 KOps/s 27.6302 KOps/s $\color{#d91a1a}-4.47\%$
test_view_td 0.1070ms 49.3662μs 20.2568 KOps/s 19.9437 KOps/s $\color{#35bf28}+1.57\%$
test_unbind_pytree 0.1667ms 35.9360μs 27.8273 KOps/s 26.6066 KOps/s $\color{#35bf28}+4.59\%$
test_unbind_td 0.4431ms 44.1683μs 22.6407 KOps/s 22.8655 KOps/s $\color{#d91a1a}-0.98\%$
test_split_pytree 0.1612ms 49.8130μs 20.0751 KOps/s 20.5672 KOps/s $\color{#d91a1a}-2.39\%$
test_split_td 0.5328ms 59.0927μs 16.9226 KOps/s 14.7040 KOps/s $\textbf{\color{#35bf28}+15.09\%}$
test_add_pytree 0.2549ms 56.9944μs 17.5456 KOps/s 17.6238 KOps/s $\color{#d91a1a}-0.44\%$
test_add_td 0.2406ms 94.3178μs 10.6025 KOps/s 10.8484 KOps/s $\color{#d91a1a}-2.27\%$
test_compile_add_one_nested[tensordict-compile] 0.4139ms 0.2121ms 4.7148 KOps/s 4.7582 KOps/s $\color{#d91a1a}-0.91\%$
test_compile_add_one_nested[tensordict-eager] 0.3156ms 0.1735ms 5.7632 KOps/s 5.8103 KOps/s $\color{#d91a1a}-0.81\%$
test_compile_add_one_nested[pytree-compile] 0.2856ms 0.1459ms 6.8551 KOps/s 6.8560 KOps/s $\color{#d91a1a}-0.01\%$
test_compile_add_one_nested[pytree-eager] 0.3414ms 0.1889ms 5.2937 KOps/s 5.3298 KOps/s $\color{#d91a1a}-0.68\%$
test_compile_copy_nested[tensordict-compile] 0.1411ms 21.7463μs 45.9848 KOps/s 46.2840 KOps/s $\color{#d91a1a}-0.65\%$
test_compile_copy_nested[tensordict-eager] 0.1719ms 47.3040μs 21.1399 KOps/s 21.3965 KOps/s $\color{#d91a1a}-1.20\%$
test_compile_copy_nested[pytree-compile] 0.1840ms 72.4042μs 13.8113 KOps/s 13.8608 KOps/s $\color{#d91a1a}-0.36\%$
test_compile_copy_nested[pytree-eager] 85.8140μs 58.9136μs 16.9740 KOps/s 16.6678 KOps/s $\color{#35bf28}+1.84\%$
test_compile_add_one_flat[tensordict-compile] 0.4676ms 0.3282ms 3.0470 KOps/s 3.0385 KOps/s $\color{#35bf28}+0.28\%$
test_compile_add_one_flat[tensordict-eager] 0.3685ms 0.2247ms 4.4513 KOps/s 4.4772 KOps/s $\color{#d91a1a}-0.58\%$
test_compile_add_one_flat[tensorclass-compile] 0.2804ms 0.1312ms 7.6206 KOps/s 7.6828 KOps/s $\color{#d91a1a}-0.81\%$
test_compile_add_one_flat[tensorclass-eager] 0.2538ms 64.1725μs 15.5830 KOps/s 16.0457 KOps/s $\color{#d91a1a}-2.88\%$
test_compile_add_one_flat[pytree-compile] 0.4117ms 0.3252ms 3.0753 KOps/s 3.0454 KOps/s $\color{#35bf28}+0.98\%$
test_compile_add_one_flat[pytree-eager] 0.7851ms 0.6085ms 1.6435 KOps/s 1.6084 KOps/s $\color{#35bf28}+2.18\%$
test_compile_add_self_flat[tensordict-eager] 0.4375ms 0.2711ms 3.6891 KOps/s 3.7221 KOps/s $\color{#d91a1a}-0.89\%$
test_compile_add_self_flat[tensordict-compile] 0.4679ms 0.3278ms 3.0511 KOps/s 3.0205 KOps/s $\color{#35bf28}+1.01\%$
test_compile_add_self_flat[tensorclass-eager] 0.2382ms 76.5667μs 13.0605 KOps/s 13.6450 KOps/s $\color{#d91a1a}-4.28\%$
test_compile_add_self_flat[tensorclass-compile] 0.2740ms 0.1351ms 7.4009 KOps/s 7.6647 KOps/s $\color{#d91a1a}-3.44\%$
test_compile_add_self_flat[pytree-eager] 0.6815ms 0.5221ms 1.9154 KOps/s 1.9305 KOps/s $\color{#d91a1a}-0.79\%$
test_compile_add_self_flat[pytree-compile] 0.4235ms 0.3274ms 3.0548 KOps/s 3.0494 KOps/s $\color{#35bf28}+0.18\%$
test_compile_copy_flat[tensordict-compile] 0.1658ms 20.2585μs 49.3620 KOps/s 54.3594 KOps/s $\textbf{\color{#d91a1a}-9.19\%}$
test_compile_copy_flat[tensordict-eager] 57.2130μs 32.3836μs 30.8798 KOps/s 31.3040 KOps/s $\color{#d91a1a}-1.35\%$
test_compile_copy_flat[pytree-compile] 0.1690ms 77.1020μs 12.9698 KOps/s 13.0507 KOps/s $\color{#d91a1a}-0.62\%$
test_compile_copy_flat[pytree-eager] 89.8950μs 61.0392μs 16.3829 KOps/s 16.4679 KOps/s $\color{#d91a1a}-0.52\%$
test_compile_assign_and_add[tensordict-compile] 2.4323ms 0.8343ms 1.1986 KOps/s 1.1027 KOps/s $\textbf{\color{#35bf28}+8.69\%}$
test_compile_assign_and_add[tensordict-eager] 3.5006ms 3.2289ms 309.7026 Ops/s 304.2823 Ops/s $\color{#35bf28}+1.78\%$
test_compile_assign_and_add[pytree-compile] 2.4055ms 0.8396ms 1.1910 KOps/s 1.1086 KOps/s $\textbf{\color{#35bf28}+7.43\%}$
test_compile_assign_and_add[pytree-eager] 3.6920ms 3.4309ms 291.4700 Ops/s 312.5240 Ops/s $\textbf{\color{#d91a1a}-6.74\%}$
test_compile_indexing[tensor-tensordict-compile] 0.2235ms 0.1153ms 8.6710 KOps/s 9.0243 KOps/s $\color{#d91a1a}-3.92\%$
test_compile_indexing[tensor-tensordict-eager] 0.2141ms 65.2967μs 15.3147 KOps/s 15.3530 KOps/s $\color{#d91a1a}-0.25\%$
test_compile_indexing[tensor-tensorclass-compile] 0.2536ms 0.1030ms 9.7126 KOps/s 9.4269 KOps/s $\color{#35bf28}+3.03\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1993ms 44.2010μs 22.6239 KOps/s 20.3044 KOps/s $\textbf{\color{#35bf28}+11.42\%}$
test_compile_indexing[tensor-pytree-compile] 0.2690ms 0.1056ms 9.4673 KOps/s 9.3086 KOps/s $\color{#35bf28}+1.70\%$
test_compile_indexing[tensor-pytree-eager] 0.2016ms 47.8265μs 20.9089 KOps/s 20.7144 KOps/s $\color{#35bf28}+0.94\%$
test_compile_indexing[slice-tensordict-compile] 0.2858ms 0.1394ms 7.1755 KOps/s 7.1699 KOps/s $\color{#35bf28}+0.08\%$
test_compile_indexing[slice-tensordict-eager] 0.1806ms 25.2894μs 39.5423 KOps/s 38.8394 KOps/s $\color{#35bf28}+1.81\%$
test_compile_indexing[slice-tensorclass-compile] 0.3356ms 0.1310ms 7.6332 KOps/s 7.5973 KOps/s $\color{#35bf28}+0.47\%$
test_compile_indexing[slice-tensorclass-eager] 52.5930μs 21.7392μs 45.9999 KOps/s 45.4438 KOps/s $\color{#35bf28}+1.22\%$
test_compile_indexing[slice-pytree-compile] 0.2989ms 0.1308ms 7.6430 KOps/s 7.6014 KOps/s $\color{#35bf28}+0.55\%$
test_compile_indexing[slice-pytree-eager] 0.1456ms 21.9089μs 45.6436 KOps/s 45.4982 KOps/s $\color{#35bf28}+0.32\%$
test_compile_indexing[int-tensordict-compile] 0.2923ms 0.1393ms 7.1805 KOps/s 7.0667 KOps/s $\color{#35bf28}+1.61\%$
test_compile_indexing[int-tensordict-eager] 0.4695ms 25.3384μs 39.4657 KOps/s 39.5231 KOps/s $\color{#d91a1a}-0.15\%$
test_compile_indexing[int-tensorclass-compile] 0.3385ms 0.1321ms 7.5723 KOps/s 7.6036 KOps/s $\color{#d91a1a}-0.41\%$
test_compile_indexing[int-tensorclass-eager] 41.1120μs 21.9966μs 45.4615 KOps/s 45.6354 KOps/s $\color{#d91a1a}-0.38\%$
test_compile_indexing[int-pytree-compile] 0.2805ms 0.1312ms 7.6213 KOps/s 7.6130 KOps/s $\color{#35bf28}+0.11\%$
test_compile_indexing[int-pytree-eager] 92.9850μs 21.9573μs 45.5430 KOps/s 45.0898 KOps/s $\color{#35bf28}+1.01\%$
test_mod_add[eager] 0.1863ms 39.5681μs 25.2729 KOps/s 25.8939 KOps/s $\color{#d91a1a}-2.40\%$
test_mod_add[compile] 0.1963ms 68.1463μs 14.6743 KOps/s 14.8195 KOps/s $\color{#d91a1a}-0.98\%$
test_mod_add[compile-overhead] 0.2610ms 0.1349ms 7.4117 KOps/s 6.8109 KOps/s $\textbf{\color{#35bf28}+8.82\%}$
test_mod_wrap[eager] 0.4224ms 0.2633ms 3.7980 KOps/s 3.9165 KOps/s $\color{#d91a1a}-3.02\%$
test_mod_wrap[compile] 1.0630ms 0.2879ms 3.4729 KOps/s 3.4060 KOps/s $\color{#35bf28}+1.96\%$
test_mod_wrap[compile-overhead] 8.3497ms 4.4267ms 225.9005 Ops/s 225.6996 Ops/s $\color{#35bf28}+0.09\%$
test_mod_wrap_and_backward[eager] 1.9545ms 1.4647ms 682.7373 Ops/s 683.3663 Ops/s $\color{#d91a1a}-0.09\%$
test_mod_wrap_and_backward[compile] 1.6016ms 1.4232ms 702.6632 Ops/s 755.1333 Ops/s $\textbf{\color{#d91a1a}-6.95\%}$
test_mod_wrap_and_backward[compile-overhead] 1.4759ms 1.0078ms 992.2795 Ops/s 1.1057 KOps/s $\textbf{\color{#d91a1a}-10.26\%}$
test_seq_add[eager] 0.2576ms 0.1077ms 9.2819 KOps/s 8.8645 KOps/s $\color{#35bf28}+4.71\%$
test_seq_add[compile] 0.2292ms 83.8741μs 11.9226 KOps/s 11.9731 KOps/s $\color{#d91a1a}-0.42\%$
test_seq_add[compile-overhead] 0.2676ms 0.1207ms 8.2859 KOps/s 8.0913 KOps/s $\color{#35bf28}+2.41\%$
test_seq_wrap[eager] 0.5768ms 0.4188ms 2.3877 KOps/s 2.2508 KOps/s $\textbf{\color{#35bf28}+6.08\%}$
test_seq_wrap[compile] 0.5153ms 0.3201ms 3.1241 KOps/s 3.0770 KOps/s $\color{#35bf28}+1.53\%$
test_seq_wrap[compile-overhead] 0.1967s 91.0988ms 10.9771 Ops/s 7.7843 Ops/s $\textbf{\color{#35bf28}+41.02\%}$
test_func_call_runtime[False-eager] 0.9031ms 0.7381ms 1.3549 KOps/s 1.3061 KOps/s $\color{#35bf28}+3.74\%$
test_func_call_runtime[False-compile] 0.9690ms 0.7874ms 1.2700 KOps/s 1.2476 KOps/s $\color{#35bf28}+1.80\%$
test_func_call_runtime[False-compile-overhead] 0.5239ms 0.3654ms 2.7364 KOps/s 2.7199 KOps/s $\color{#35bf28}+0.61\%$
test_func_call_runtime[True-eager] 1.1085ms 0.9338ms 1.0708 KOps/s 1.0465 KOps/s $\color{#35bf28}+2.33\%$
test_func_call_runtime[True-compile] 0.9873ms 0.8363ms 1.1957 KOps/s 1.1953 KOps/s $\color{#35bf28}+0.03\%$
test_func_call_runtime[True-compile-overhead] 0.5744ms 0.4139ms 2.4163 KOps/s 2.4212 KOps/s $\color{#d91a1a}-0.20\%$
test_func_call_cm_runtime[False-eager] 0.9634ms 0.7769ms 1.2872 KOps/s 1.3215 KOps/s $\color{#d91a1a}-2.59\%$
test_func_call_cm_runtime[False-compile] 0.9985ms 0.7886ms 1.2681 KOps/s 1.2538 KOps/s $\color{#35bf28}+1.14\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5187ms 0.3672ms 2.7233 KOps/s 2.7202 KOps/s $\color{#35bf28}+0.11\%$
test_func_call_cm_runtime[True-eager] 1.2491ms 1.0443ms 957.5923 Ops/s 933.2524 Ops/s $\color{#35bf28}+2.61\%$
test_func_call_cm_runtime[True-compile] 1.1533ms 1.0166ms 983.6451 Ops/s 961.1367 Ops/s $\color{#35bf28}+2.34\%$
test_func_call_cm_runtime[True-compile-overhead] 1.2135ms 1.0199ms 980.5182 Ops/s 963.3932 Ops/s $\color{#35bf28}+1.78\%$
test_distributed 5.1683ms 72.7157μs 13.7522 KOps/s 13.3315 KOps/s $\color{#35bf28}+3.16\%$
test_tdmodule 0.1720ms 16.4622μs 60.7451 KOps/s 60.9420 KOps/s $\color{#d91a1a}-0.32\%$
test_tdmodule_dispatch 49.3420μs 32.7799μs 30.5065 KOps/s 29.7049 KOps/s $\color{#35bf28}+2.70\%$
test_tdseq 32.5220μs 17.0121μs 58.7818 KOps/s 53.9310 KOps/s $\textbf{\color{#35bf28}+8.99\%}$
test_tdseq_dispatch 52.6930μs 34.9984μs 28.5727 KOps/s 27.2638 KOps/s $\color{#35bf28}+4.80\%$
test_instantiation_functorch 2.1811ms 1.9822ms 504.4913 Ops/s 505.4560 Ops/s $\color{#d91a1a}-0.19\%$
test_instantiation_td 1.9857ms 1.2952ms 772.0649 Ops/s 774.1469 Ops/s $\color{#d91a1a}-0.27\%$
test_exec_functorch 0.3736ms 0.2257ms 4.4301 KOps/s 4.5468 KOps/s $\color{#d91a1a}-2.57\%$
test_exec_functional_call 0.4474ms 0.2252ms 4.4413 KOps/s 4.6617 KOps/s $\color{#d91a1a}-4.73\%$
test_exec_td 0.4236ms 0.2339ms 4.2748 KOps/s 4.4966 KOps/s $\color{#d91a1a}-4.93\%$
test_exec_td_decorator 1.0144ms 0.2871ms 3.4831 KOps/s 3.6596 KOps/s $\color{#d91a1a}-4.82\%$
test_vmap_mlp_speed[True-True] 0.8325ms 0.6818ms 1.4668 KOps/s 1.4724 KOps/s $\color{#d91a1a}-0.38\%$
test_vmap_mlp_speed[True-False] 0.9533ms 0.6758ms 1.4798 KOps/s 1.5386 KOps/s $\color{#d91a1a}-3.82\%$
test_vmap_mlp_speed[False-True] 0.7986ms 0.5935ms 1.6850 KOps/s 1.7564 KOps/s $\color{#d91a1a}-4.06\%$
test_vmap_mlp_speed[False-False] 0.8220ms 0.5930ms 1.6863 KOps/s 1.7605 KOps/s $\color{#d91a1a}-4.22\%$
test_vmap_mlp_speed_decorator[True-True] 1.3768ms 0.7056ms 1.4173 KOps/s 1.4234 KOps/s $\color{#d91a1a}-0.43\%$
test_vmap_mlp_speed_decorator[True-False] 0.8433ms 0.6944ms 1.4401 KOps/s 1.4222 KOps/s $\color{#35bf28}+1.26\%$
test_vmap_mlp_speed_decorator[False-True] 0.8365ms 0.6196ms 1.6140 KOps/s 1.6391 KOps/s $\color{#d91a1a}-1.53\%$
test_vmap_mlp_speed_decorator[False-False] 0.7777ms 0.6174ms 1.6198 KOps/s 1.6333 KOps/s $\color{#d91a1a}-0.83\%$
test_vmap_transformer_speed[True-True] 9.4901ms 8.9409ms 111.8452 Ops/s 114.1035 Ops/s $\color{#d91a1a}-1.98\%$
test_vmap_transformer_speed[True-False] 9.5215ms 8.9211ms 112.0940 Ops/s 115.4913 Ops/s $\color{#d91a1a}-2.94\%$
test_vmap_transformer_speed[False-True] 9.1354ms 8.7229ms 114.6414 Ops/s 115.4754 Ops/s $\color{#d91a1a}-0.72\%$
test_vmap_transformer_speed[False-False] 9.4335ms 8.7745ms 113.9660 Ops/s 116.0665 Ops/s $\color{#d91a1a}-1.81\%$
test_vmap_transformer_speed_decorator[True-True] 21.8675ms 21.1992ms 47.1717 Ops/s 48.7607 Ops/s $\color{#d91a1a}-3.26\%$
test_vmap_transformer_speed_decorator[True-False] 21.7233ms 20.7622ms 48.1646 Ops/s 48.8496 Ops/s $\color{#d91a1a}-1.40\%$
test_vmap_transformer_speed_decorator[False-True] 21.6200ms 20.6141ms 48.5105 Ops/s 49.2059 Ops/s $\color{#d91a1a}-1.41\%$
test_vmap_transformer_speed_decorator[False-False] 21.5737ms 20.6206ms 48.4951 Ops/s 49.1848 Ops/s $\color{#d91a1a}-1.40\%$
test_to_module_speed[True] 1.7796ms 1.1455ms 873.0127 Ops/s 883.4707 Ops/s $\color{#d91a1a}-1.18\%$
test_to_module_speed[False] 1.6228ms 1.1207ms 892.3380 Ops/s 906.0004 Ops/s $\color{#d91a1a}-1.51\%$
test_tc_init 56.6230μs 39.1065μs 25.5712 KOps/s 24.7523 KOps/s $\color{#35bf28}+3.31\%$
test_tc_init_nested 0.1852ms 76.4264μs 13.0845 KOps/s 12.2530 KOps/s $\textbf{\color{#35bf28}+6.79\%}$
test_tc_first_layer_tensor 14.7708μs 0.8058μs 1.2410 MOps/s 1.2306 MOps/s $\color{#35bf28}+0.85\%$
test_tc_first_layer_nontensor 16.4300μs 2.5899μs 386.1174 KOps/s 392.9700 KOps/s $\color{#d91a1a}-1.74\%$
test_tc_second_layer_tensor 41.9890μs 1.6432μs 608.5707 KOps/s 621.8882 KOps/s $\color{#d91a1a}-2.14\%$
test_tc_second_layer_nontensor 25.2710μs 3.4415μs 290.5722 KOps/s 297.7467 KOps/s $\color{#d91a1a}-2.41\%$
test_unbind 0.1889s 10.5381ms 94.8934 Ops/s 62.7585 Ops/s $\textbf{\color{#35bf28}+51.20\%}$
test_full_like 0.7569ms 0.5820ms 1.7183 KOps/s 1.7197 KOps/s $\color{#d91a1a}-0.08\%$
test_zeros_like 0.3363ms 0.1978ms 5.0550 KOps/s 5.0514 KOps/s $\color{#35bf28}+0.07\%$
test_ones_like 0.3504ms 0.1979ms 5.0543 KOps/s 5.0579 KOps/s $\color{#d91a1a}-0.07\%$
test_clone 0.5698ms 0.4154ms 2.4074 KOps/s 2.4118 KOps/s $\color{#d91a1a}-0.19\%$
test_squeeze 27.7920μs 11.0573μs 90.4377 KOps/s 91.5550 KOps/s $\color{#d91a1a}-1.22\%$
test_unsqueeze 0.2464ms 78.1786μs 12.7912 KOps/s 12.9036 KOps/s $\color{#d91a1a}-0.87\%$
test_split 0.4364ms 0.1703ms 5.8730 KOps/s 5.8556 KOps/s $\color{#35bf28}+0.30\%$
test_permute 0.2811ms 0.1843ms 5.4247 KOps/s 5.5108 KOps/s $\color{#d91a1a}-1.56\%$
test_stack 1.3024ms 0.9155ms 1.0923 KOps/s 1.0816 KOps/s $\color{#35bf28}+0.98\%$
test_cat 1.3671ms 1.2322ms 811.5540 Ops/s 811.7386 Ops/s $\color{#d91a1a}-0.02\%$
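
For readers checking the table by hand: the Change column appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. The sketch below is an assumption inferred from the rows above, not the benchmark bot's actual code; the helper names `to_ops` and `relative_change` are hypothetical. Values are first converted to a common unit, since some rows mix Ops/s and KOps/s.

```python
# Multipliers for the throughput units that appear in the table.
UNITS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def to_ops(value: float, unit: str) -> float:
    """Convert a reported throughput to plain operations per second."""
    return value * UNITS[unit]


def relative_change(pr_ops: float, head_ops: float) -> float:
    """Percentage change of the PR throughput relative to repo HEAD (assumed formula)."""
    return (pr_ops - head_ops) / head_ops * 100.0


# (PR value, unit), (HEAD value, unit), copied from rows in the table above.
rows = {
    "test_update": ((19.6523, "KOps/s"), (19.6743, "KOps/s")),
    "test_mod_wrap_and_backward[compile-overhead]": ((992.2795, "Ops/s"), (1.1057, "KOps/s")),
    "test_unbind": ((94.8934, "Ops/s"), (62.7585, "Ops/s")),
}

for name, ((pr_value, pr_unit), (head_value, head_unit)) in rows.items():
    change = relative_change(to_ops(pr_value, pr_unit), to_ops(head_value, head_unit))
    print(f"{name}: {change:+.2f}%")
```

A positive value means higher throughput than HEAD (shown in green in the table), a negative value means lower throughput (shown in red).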

@vmoens vmoens merged commit 2e32dda into main Aug 1, 2024
@vmoens vmoens deleted the from-struct-array branch August 1, 2024 15:25