@vmoens (Collaborator) commented Oct 21, 2024

No description provided.

@facebook-github-bot added the CLA Signed label Oct 21, 2024
@github-actions

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 216. Improved: $\large\color{#35bf28}3$. Worsened: $\large\color{#d91a1a}24$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 42.5900μs 24.4561μs 40.8897 KOps/s 42.8171 KOps/s $\color{#d91a1a}-4.50\%$
test_plain_set_stack_nested 65.4830μs 24.6421μs 40.5809 KOps/s 40.5337 KOps/s $\color{#35bf28}+0.12\%$
test_plain_set_nested_inplace 72.4370μs 26.7761μs 37.3468 KOps/s 39.0084 KOps/s $\color{#d91a1a}-4.26\%$
test_plain_set_stack_nested_inplace 81.4440μs 26.8371μs 37.2618 KOps/s 39.4321 KOps/s $\textbf{\color{#d91a1a}-5.50\%}$
test_items 35.0240μs 4.1009μs 243.8460 KOps/s 239.3520 KOps/s $\color{#35bf28}+1.88\%$
test_items_nested 0.4893ms 0.3781ms 2.6447 KOps/s 2.6298 KOps/s $\color{#35bf28}+0.57\%$
test_items_nested_locked 0.8040ms 0.3821ms 2.6169 KOps/s 2.6041 KOps/s $\color{#35bf28}+0.49\%$
test_items_nested_leaf 0.1436ms 82.9542μs 12.0549 KOps/s 12.3944 KOps/s $\color{#d91a1a}-2.74\%$
test_items_stack_nested 0.6279ms 0.3956ms 2.5275 KOps/s 2.6319 KOps/s $\color{#d91a1a}-3.97\%$
test_items_stack_nested_leaf 0.1631ms 85.3209μs 11.7205 KOps/s 12.2121 KOps/s $\color{#d91a1a}-4.03\%$
test_items_stack_nested_locked 0.7605ms 0.3814ms 2.6219 KOps/s 2.6177 KOps/s $\color{#35bf28}+0.16\%$
test_keys 52.8000μs 3.5993μs 277.8296 KOps/s 275.7495 KOps/s $\color{#35bf28}+0.75\%$
test_keys_nested 0.2193ms 0.1340ms 7.4629 KOps/s 7.4414 KOps/s $\color{#35bf28}+0.29\%$
test_keys_nested_locked 0.7485ms 0.1390ms 7.1958 KOps/s 7.1826 KOps/s $\color{#35bf28}+0.18\%$
test_keys_nested_leaf 0.2212ms 0.1175ms 8.5079 KOps/s 8.4944 KOps/s $\color{#35bf28}+0.16\%$
test_keys_stack_nested 0.2100ms 0.1349ms 7.4124 KOps/s 7.4596 KOps/s $\color{#d91a1a}-0.63\%$
test_keys_stack_nested_leaf 0.1957ms 0.1174ms 8.5167 KOps/s 8.4816 KOps/s $\color{#35bf28}+0.41\%$
test_keys_stack_nested_locked 0.2473ms 0.1399ms 7.1492 KOps/s 7.1209 KOps/s $\color{#35bf28}+0.40\%$
test_values 8.9192μs 1.0733μs 931.6764 KOps/s 962.2688 KOps/s $\color{#d91a1a}-3.18\%$
test_values_nested 0.3854ms 96.0620μs 10.4099 KOps/s 10.6789 KOps/s $\color{#d91a1a}-2.52\%$
test_values_nested_locked 0.1606ms 94.0210μs 10.6359 KOps/s 10.7176 KOps/s $\color{#d91a1a}-0.76\%$
test_values_nested_leaf 0.1300ms 79.5932μs 12.5639 KOps/s 12.5556 KOps/s $\color{#35bf28}+0.07\%$
test_values_stack_nested 0.1757ms 93.5198μs 10.6929 KOps/s 10.6405 KOps/s $\color{#35bf28}+0.49\%$
test_values_stack_nested_leaf 0.1526ms 80.7109μs 12.3899 KOps/s 12.6173 KOps/s $\color{#d91a1a}-1.80\%$
test_values_stack_nested_locked 0.1640ms 93.6558μs 10.6774 KOps/s 11.0359 KOps/s $\color{#d91a1a}-3.25\%$
test_membership 25.7180μs 0.9298μs 1.0754 MOps/s 1.1186 MOps/s $\color{#d91a1a}-3.86\%$
test_membership_nested 0.1058ms 2.7866μs 358.8639 KOps/s 368.8002 KOps/s $\color{#d91a1a}-2.69\%$
test_membership_nested_leaf 40.9670μs 2.7786μs 359.8952 KOps/s 364.2193 KOps/s $\color{#d91a1a}-1.19\%$
test_membership_stacked_nested 21.7310μs 2.7460μs 364.1725 KOps/s 374.4334 KOps/s $\color{#d91a1a}-2.74\%$
test_membership_stacked_nested_leaf 49.0620μs 2.7325μs 365.9632 KOps/s 370.8719 KOps/s $\color{#d91a1a}-1.32\%$
test_membership_nested_last 28.4530μs 4.2268μs 236.5837 KOps/s 241.0045 KOps/s $\color{#d91a1a}-1.83\%$
test_membership_nested_leaf_last 27.2920μs 4.2419μs 235.7442 KOps/s 245.5189 KOps/s $\color{#d91a1a}-3.98\%$
test_membership_stacked_nested_last 33.1120μs 4.1700μs 239.8064 KOps/s 241.8710 KOps/s $\color{#d91a1a}-0.85\%$
test_membership_stacked_nested_leaf_last 36.5890μs 4.1582μs 240.4890 KOps/s 240.9103 KOps/s $\color{#d91a1a}-0.17\%$
test_nested_getleaf 37.3300μs 10.4940μs 95.2922 KOps/s 96.8405 KOps/s $\color{#d91a1a}-1.60\%$
test_nested_get 49.6420μs 9.8950μs 101.0609 KOps/s 101.9298 KOps/s $\color{#d91a1a}-0.85\%$
test_stacked_getleaf 51.8280μs 10.4457μs 95.7334 KOps/s 96.1285 KOps/s $\color{#d91a1a}-0.41\%$
test_stacked_get 60.8750μs 9.9880μs 100.1201 KOps/s 100.9122 KOps/s $\color{#d91a1a}-0.78\%$
test_nested_getitemleaf 57.3180μs 10.9651μs 91.1982 KOps/s 91.2852 KOps/s $\color{#d91a1a}-0.10\%$
test_nested_getitem 61.2970μs 10.2684μs 97.3861 KOps/s 96.8943 KOps/s $\color{#35bf28}+0.51\%$
test_stacked_getitemleaf 57.1460μs 10.8268μs 92.3637 KOps/s 91.6506 KOps/s $\color{#35bf28}+0.78\%$
test_stacked_getitem 44.1530μs 10.4235μs 95.9368 KOps/s 98.5099 KOps/s $\color{#d91a1a}-2.61\%$
test_lock_nested 0.9593ms 0.5098ms 1.9614 KOps/s 1.9632 KOps/s $\color{#d91a1a}-0.09\%$
test_lock_stack_nested 0.8651ms 0.4778ms 2.0931 KOps/s 2.1103 KOps/s $\color{#d91a1a}-0.82\%$
test_unlock_nested 0.1057s 0.5259ms 1.9014 KOps/s 2.3359 KOps/s $\textbf{\color{#d91a1a}-18.60\%}$
test_unlock_stack_nested 0.6030ms 0.3897ms 2.5659 KOps/s 2.5825 KOps/s $\color{#d91a1a}-0.64\%$
test_flatten_speed 0.2058ms 0.1032ms 9.6890 KOps/s 10.0188 KOps/s $\color{#d91a1a}-3.29\%$
test_unflatten_speed 0.7980ms 0.5233ms 1.9111 KOps/s 2.0021 KOps/s $\color{#d91a1a}-4.54\%$
test_common_ops 8.5131ms 1.1684ms 855.8707 Ops/s 908.1772 Ops/s $\textbf{\color{#d91a1a}-5.76\%}$
test_creation 39.5240μs 2.2675μs 441.0199 KOps/s 484.2100 KOps/s $\textbf{\color{#d91a1a}-8.92\%}$
test_creation_empty 54.3430μs 18.3657μs 54.4492 KOps/s 58.7797 KOps/s $\textbf{\color{#d91a1a}-7.37\%}$
test_creation_nested_1 77.6460μs 22.0017μs 45.4510 KOps/s 49.3594 KOps/s $\textbf{\color{#d91a1a}-7.92\%}$
test_creation_nested_2 0.1044ms 26.2226μs 38.1350 KOps/s 40.7389 KOps/s $\textbf{\color{#d91a1a}-6.39\%}$
test_clone 1.3674ms 17.7369μs 56.3797 KOps/s 59.0921 KOps/s $\color{#d91a1a}-4.59\%$
test_getitem[int] 0.8059ms 16.9945μs 58.8427 KOps/s 60.0477 KOps/s $\color{#d91a1a}-2.01\%$
test_getitem[slice_int] 0.1619ms 31.9713μs 31.2781 KOps/s 32.5226 KOps/s $\color{#d91a1a}-3.83\%$
test_getitem[range] 0.1870ms 61.1400μs 16.3559 KOps/s 17.4591 KOps/s $\textbf{\color{#d91a1a}-6.32\%}$
test_getitem[tuple] 0.1484ms 25.5789μs 39.0947 KOps/s 38.8039 KOps/s $\color{#35bf28}+0.75\%$
test_getitem[list] 0.3632ms 56.7090μs 17.6339 KOps/s 18.8241 KOps/s $\textbf{\color{#d91a1a}-6.32\%}$
test_setitem_dim[int] 66.3350μs 33.9011μs 29.4976 KOps/s 30.5253 KOps/s $\color{#d91a1a}-3.37\%$
test_setitem_dim[slice_int] 0.1317ms 63.6132μs 15.7200 KOps/s 16.4546 KOps/s $\color{#d91a1a}-4.46\%$
test_setitem_dim[range] 0.1346ms 86.3587μs 11.5796 KOps/s 11.8069 KOps/s $\color{#d91a1a}-1.92\%$
test_setitem_dim[tuple] 99.5070μs 51.1247μs 19.5600 KOps/s 20.3444 KOps/s $\color{#d91a1a}-3.86\%$
test_setitem 0.2043ms 30.8234μs 32.4429 KOps/s 35.0365 KOps/s $\textbf{\color{#d91a1a}-7.40\%}$
test_set 0.1456ms 29.7383μs 33.6267 KOps/s 36.3843 KOps/s $\textbf{\color{#d91a1a}-7.58\%}$
test_set_shared 3.3017ms 0.2248ms 4.4477 KOps/s 4.5659 KOps/s $\color{#d91a1a}-2.59\%$
test_update 0.1758ms 38.7526μs 25.8047 KOps/s 28.2895 KOps/s $\textbf{\color{#d91a1a}-8.78\%}$
test_update_nested 0.1483ms 49.5313μs 20.1892 KOps/s 21.1725 KOps/s $\color{#d91a1a}-4.64\%$
test_update__nested 0.7236ms 45.9098μs 21.7818 KOps/s 22.1999 KOps/s $\color{#d91a1a}-1.88\%$
test_set_nested 0.1580ms 32.7491μs 30.5352 KOps/s 32.1348 KOps/s $\color{#d91a1a}-4.98\%$
test_set_nested_new 0.1625ms 37.2431μs 26.8506 KOps/s 27.8475 KOps/s $\color{#d91a1a}-3.58\%$
test_select 0.1684ms 56.0645μs 17.8366 KOps/s 18.4131 KOps/s $\color{#d91a1a}-3.13\%$
test_select_nested 0.1223ms 60.7158μs 16.4702 KOps/s 16.7046 KOps/s $\color{#d91a1a}-1.40\%$
test_exclude_nested 0.1479ms 76.7908μs 13.0224 KOps/s 13.3654 KOps/s $\color{#d91a1a}-2.57\%$
test_empty[True] 0.6459ms 0.3534ms 2.8298 KOps/s 2.7324 KOps/s $\color{#35bf28}+3.56\%$
test_empty[False] 11.8422μs 1.3171μs 759.2466 KOps/s 821.7520 KOps/s $\textbf{\color{#d91a1a}-7.61\%}$
test_unbind_speed 0.4562ms 0.2983ms 3.3527 KOps/s 3.3518 KOps/s $\color{#35bf28}+0.03\%$
test_unbind_speed_stack0 0.6716ms 0.3022ms 3.3096 KOps/s 3.3689 KOps/s $\color{#d91a1a}-1.76\%$
test_unbind_speed_stack1 0.1074s 0.8363ms 1.1958 KOps/s 1.3401 KOps/s $\textbf{\color{#d91a1a}-10.77\%}$
test_split 2.2620ms 2.0274ms 493.2429 Ops/s 455.4804 Ops/s $\textbf{\color{#35bf28}+8.29\%}$
test_chunk 0.1042s 2.2559ms 443.2751 Ops/s 459.6286 Ops/s $\color{#d91a1a}-3.56\%$
test_creation[device0] 0.2386ms 0.1180ms 8.4730 KOps/s 8.7263 KOps/s $\color{#d91a1a}-2.90\%$
test_creation_from_tensor 3.4908ms 0.1189ms 8.4137 KOps/s 8.4733 KOps/s $\color{#d91a1a}-0.70\%$
test_add_one[memmap_tensor0] 0.2019ms 7.3321μs 136.3865 KOps/s 145.4262 KOps/s $\textbf{\color{#d91a1a}-6.22\%}$
test_contiguous[memmap_tensor0] 18.6150μs 1.8980μs 526.8761 KOps/s 541.5076 KOps/s $\color{#d91a1a}-2.70\%$
test_stack[memmap_tensor0] 56.4760μs 5.7385μs 174.2608 KOps/s 181.1203 KOps/s $\color{#d91a1a}-3.79\%$
test_memmaptd_index 1.2163ms 0.4137ms 2.4170 KOps/s 2.4323 KOps/s $\color{#d91a1a}-0.63\%$
test_memmaptd_index_astensor 0.1093s 0.5877ms 1.7017 KOps/s 1.9634 KOps/s $\textbf{\color{#d91a1a}-13.33\%}$
test_memmaptd_index_op 1.8487ms 1.0816ms 924.5567 Ops/s 962.1202 Ops/s $\color{#d91a1a}-3.90\%$
test_serialize_model 0.1292s 0.1219s 8.2007 Ops/s 8.3433 Ops/s $\color{#d91a1a}-1.71\%$
test_serialize_model_pickle 0.4728s 0.4003s 2.4983 Ops/s 2.5614 Ops/s $\color{#d91a1a}-2.46\%$
test_serialize_weights 0.1247s 0.1173s 8.5273 Ops/s 8.3866 Ops/s $\color{#35bf28}+1.68\%$
test_serialize_weights_returnearly 0.2682s 0.1729s 5.7824 Ops/s 6.3665 Ops/s $\textbf{\color{#d91a1a}-9.18\%}$
test_serialize_weights_pickle 1.1069s 0.7088s 1.4108 Ops/s 2.5528 Ops/s $\textbf{\color{#d91a1a}-44.74\%}$
test_serialize_weights_filesystem 0.1500s 0.1456s 6.8690 Ops/s 6.9113 Ops/s $\color{#d91a1a}-0.61\%$
test_serialize_model_filesystem 0.1508s 0.1439s 6.9505 Ops/s 5.9533 Ops/s $\textbf{\color{#35bf28}+16.75\%}$
test_reshape_pytree 96.2210μs 39.2404μs 25.4839 KOps/s 25.3481 KOps/s $\color{#35bf28}+0.54\%$
test_reshape_td 0.1162ms 48.2861μs 20.7099 KOps/s 21.1114 KOps/s $\color{#d91a1a}-1.90\%$
test_view_pytree 0.1211ms 39.3603μs 25.4063 KOps/s 25.4864 KOps/s $\color{#d91a1a}-0.31\%$
test_view_td 0.1274ms 54.0816μs 18.4906 KOps/s 18.6981 KOps/s $\color{#d91a1a}-1.11\%$
test_unbind_pytree 86.3330μs 35.8097μs 27.9254 KOps/s 27.5599 KOps/s $\color{#35bf28}+1.33\%$
test_unbind_td 0.3101ms 44.9135μs 22.2650 KOps/s 22.6166 KOps/s $\color{#d91a1a}-1.55\%$
test_split_pytree 0.1096ms 39.2366μs 25.4864 KOps/s 25.9662 KOps/s $\color{#d91a1a}-1.85\%$
test_split_td 0.2145ms 59.4653μs 16.8165 KOps/s 17.7035 KOps/s $\textbf{\color{#d91a1a}-5.01\%}$
test_add_pytree 0.1328ms 46.3078μs 21.5946 KOps/s 21.7196 KOps/s $\color{#d91a1a}-0.58\%$
test_add_td 0.3396ms 89.6104μs 11.1594 KOps/s 11.6327 KOps/s $\color{#d91a1a}-4.07\%$
test_compile_add_one_nested[tensordict-compile] 0.1322ms 74.3442μs 13.4509 KOps/s 13.8354 KOps/s $\color{#d91a1a}-2.78\%$
test_compile_add_one_nested[tensordict-eager] 0.4346ms 0.2088ms 4.7901 KOps/s 4.9500 KOps/s $\color{#d91a1a}-3.23\%$
test_compile_add_one_nested[pytree-compile] 0.1308ms 54.8707μs 18.2247 KOps/s 18.4225 KOps/s $\color{#d91a1a}-1.07\%$
test_compile_add_one_nested[pytree-eager] 0.3476ms 0.1491ms 6.7074 KOps/s 6.7780 KOps/s $\color{#d91a1a}-1.04\%$
test_compile_copy_nested[tensordict-compile] 69.3810μs 28.3027μs 35.3323 KOps/s 35.8718 KOps/s $\color{#d91a1a}-1.50\%$
test_compile_copy_nested[tensordict-eager] 0.1840ms 78.8610μs 12.6805 KOps/s 12.9174 KOps/s $\color{#d91a1a}-1.83\%$
test_compile_copy_nested[pytree-compile] 0.1550ms 80.2513μs 12.4609 KOps/s 12.6476 KOps/s $\color{#d91a1a}-1.48\%$
test_compile_copy_nested[pytree-eager] 0.1225ms 68.5871μs 14.5800 KOps/s 14.6626 KOps/s $\color{#d91a1a}-0.56\%$
test_compile_add_one_flat[tensordict-compile] 0.2693ms 0.1231ms 8.1241 KOps/s 8.1199 KOps/s $\color{#35bf28}+0.05\%$
test_compile_add_one_flat[tensordict-eager] 0.6758ms 0.2509ms 3.9849 KOps/s 4.0764 KOps/s $\color{#d91a1a}-2.25\%$
test_compile_add_one_flat[tensorclass-compile] 0.1473ms 55.0535μs 18.1642 KOps/s 18.8301 KOps/s $\color{#d91a1a}-3.54\%$
test_compile_add_one_flat[tensorclass-eager] 0.1621ms 81.4675μs 12.2748 KOps/s 12.3105 KOps/s $\color{#d91a1a}-0.29\%$
test_compile_add_one_flat[pytree-compile] 0.2374ms 0.1133ms 8.8245 KOps/s 8.9682 KOps/s $\color{#d91a1a}-1.60\%$
test_compile_add_one_flat[pytree-eager] 0.4037ms 0.3057ms 3.2710 KOps/s 3.3659 KOps/s $\color{#d91a1a}-2.82\%$
test_compile_add_self_flat[tensordict-eager] 0.7024ms 0.2861ms 3.4955 KOps/s 3.6352 KOps/s $\color{#d91a1a}-3.84\%$
test_compile_add_self_flat[tensordict-compile] 0.2550ms 0.1231ms 8.1221 KOps/s 8.2317 KOps/s $\color{#d91a1a}-1.33\%$
test_compile_add_self_flat[tensorclass-eager] 0.1412ms 76.8652μs 13.0098 KOps/s 13.2711 KOps/s $\color{#d91a1a}-1.97\%$
test_compile_add_self_flat[tensorclass-compile] 0.1131ms 55.6137μs 17.9812 KOps/s 18.3855 KOps/s $\color{#d91a1a}-2.20\%$
test_compile_add_self_flat[pytree-eager] 0.5605ms 0.2500ms 4.0003 KOps/s 4.1054 KOps/s $\color{#d91a1a}-2.56\%$
test_compile_add_self_flat[pytree-compile] 0.2420ms 0.1128ms 8.8678 KOps/s 9.0545 KOps/s $\color{#d91a1a}-2.06\%$
test_compile_copy_flat[tensordict-compile] 70.0720μs 29.6662μs 33.7083 KOps/s 32.4208 KOps/s $\color{#35bf28}+3.97\%$
test_compile_copy_flat[tensordict-eager] 0.1932ms 79.9196μs 12.5126 KOps/s 12.6594 KOps/s $\color{#d91a1a}-1.16\%$
test_compile_copy_flat[pytree-compile] 0.1735ms 80.3818μs 12.4406 KOps/s 12.3167 KOps/s $\color{#35bf28}+1.01\%$
test_compile_copy_flat[pytree-eager] 0.1239ms 69.1311μs 14.4653 KOps/s 14.4822 KOps/s $\color{#d91a1a}-0.12\%$
test_compile_assign_and_add[tensordict-compile] 0.3232ms 0.2177ms 4.5944 KOps/s 4.6201 KOps/s $\color{#d91a1a}-0.56\%$
test_compile_assign_and_add[tensordict-eager] 2.0926ms 1.8169ms 550.3826 Ops/s 563.8631 Ops/s $\color{#d91a1a}-2.39\%$
test_compile_assign_and_add[pytree-compile] 0.3352ms 0.2176ms 4.5954 KOps/s 4.7126 KOps/s $\color{#d91a1a}-2.49\%$
test_compile_assign_and_add[pytree-eager] 1.4165ms 1.1679ms 856.2711 Ops/s 857.8656 Ops/s $\color{#d91a1a}-0.19\%$
test_compile_assign_and_add_stack[compile] 1.1277ms 0.4749ms 2.1059 KOps/s 2.1431 KOps/s $\color{#d91a1a}-1.73\%$
test_compile_assign_and_add_stack[eager] 6.2043ms 4.2422ms 235.7281 Ops/s 238.8927 Ops/s $\color{#d91a1a}-1.32\%$
test_compile_indexing[tensor-tensordict-compile] 0.1068ms 44.5368μs 22.4534 KOps/s 22.9493 KOps/s $\color{#d91a1a}-2.16\%$
test_compile_indexing[tensor-tensordict-eager] 0.5551ms 49.9216μs 20.0314 KOps/s 19.7692 KOps/s $\color{#35bf28}+1.33\%$
test_compile_indexing[tensor-tensorclass-compile] 89.7690μs 38.2053μs 26.1744 KOps/s 27.5906 KOps/s $\textbf{\color{#d91a1a}-5.13\%}$
test_compile_indexing[tensor-tensorclass-eager] 71.0640μs 29.2922μs 34.1388 KOps/s 32.7156 KOps/s $\color{#35bf28}+4.35\%$
test_compile_indexing[tensor-pytree-compile] 0.1064ms 39.1713μs 25.5289 KOps/s 27.0936 KOps/s $\textbf{\color{#d91a1a}-5.78\%}$
test_compile_indexing[tensor-pytree-eager] 0.1911ms 29.0184μs 34.4609 KOps/s 31.8595 KOps/s $\textbf{\color{#35bf28}+8.17\%}$
test_compile_indexing[slice-tensordict-compile] 0.1926ms 79.0277μs 12.6538 KOps/s 13.1143 KOps/s $\color{#d91a1a}-3.51\%$
test_compile_indexing[slice-tensordict-eager] 0.5641ms 29.4318μs 33.9769 KOps/s 33.7710 KOps/s $\color{#35bf28}+0.61\%$
test_compile_indexing[slice-tensorclass-compile] 0.1443ms 70.9704μs 14.0904 KOps/s 14.5149 KOps/s $\color{#d91a1a}-2.92\%$
test_compile_indexing[slice-tensorclass-eager] 65.9440μs 24.0063μs 41.6557 KOps/s 41.8122 KOps/s $\color{#d91a1a}-0.37\%$
test_compile_indexing[slice-pytree-compile] 0.1627ms 70.9495μs 14.0945 KOps/s 14.2008 KOps/s $\color{#d91a1a}-0.75\%$
test_compile_indexing[slice-pytree-eager] 90.4800μs 23.9395μs 41.7719 KOps/s 42.0785 KOps/s $\color{#d91a1a}-0.73\%$
test_compile_indexing[int-tensordict-compile] 0.1550ms 79.4772μs 12.5822 KOps/s 12.9520 KOps/s $\color{#d91a1a}-2.85\%$
test_compile_indexing[int-tensordict-eager] 0.9646ms 28.9920μs 34.4922 KOps/s 34.2917 KOps/s $\color{#35bf28}+0.58\%$
test_compile_indexing[int-tensorclass-compile] 0.1480ms 71.0612μs 14.0724 KOps/s 14.3732 KOps/s $\color{#d91a1a}-2.09\%$
test_compile_indexing[int-tensorclass-eager] 66.8160μs 23.8420μs 41.9427 KOps/s 42.4999 KOps/s $\color{#d91a1a}-1.31\%$
test_compile_indexing[int-pytree-compile] 0.2486ms 72.8860μs 13.7201 KOps/s 14.2627 KOps/s $\color{#d91a1a}-3.80\%$
test_compile_indexing[int-pytree-eager] 96.9740μs 23.6809μs 42.2280 KOps/s 42.8691 KOps/s $\color{#d91a1a}-1.50\%$
test_mod_add[eager] 96.3520μs 27.2484μs 36.6994 KOps/s 38.4854 KOps/s $\color{#d91a1a}-4.64\%$
test_mod_add[compile] 0.1623ms 45.1617μs 22.1427 KOps/s 22.5039 KOps/s $\color{#d91a1a}-1.61\%$
test_mod_add[compile-overhead] 95.8500μs 45.5887μs 21.9353 KOps/s 22.4036 KOps/s $\color{#d91a1a}-2.09\%$
test_mod_wrap[eager] 0.4436ms 0.2142ms 4.6691 KOps/s 4.5872 KOps/s $\color{#35bf28}+1.79\%$
test_mod_wrap[compile] 1.9134ms 0.2064ms 4.8451 KOps/s 4.9105 KOps/s $\color{#d91a1a}-1.33\%$
test_mod_wrap[compile-overhead] 1.8239ms 0.2061ms 4.8518 KOps/s 4.9163 KOps/s $\color{#d91a1a}-1.31\%$
test_mod_wrap_and_backward[eager] 13.3625ms 11.7285ms 85.2623 Ops/s 81.7317 Ops/s $\color{#35bf28}+4.32\%$
test_mod_wrap_and_backward[compile] 15.8933ms 12.9916ms 76.9725 Ops/s 79.1497 Ops/s $\color{#d91a1a}-2.75\%$
test_mod_wrap_and_backward[compile-overhead] 14.2310ms 12.6260ms 79.2018 Ops/s 88.9980 Ops/s $\textbf{\color{#d91a1a}-11.01\%}$
test_seq_add[eager] 0.1760ms 94.2674μs 10.6081 KOps/s 10.9030 KOps/s $\color{#d91a1a}-2.70\%$
test_seq_add[compile] 0.1420ms 61.1153μs 16.3625 KOps/s 16.7737 KOps/s $\color{#d91a1a}-2.45\%$
test_seq_add[compile-overhead] 0.1269ms 58.2570μs 17.1653 KOps/s 16.8140 KOps/s $\color{#35bf28}+2.09\%$
test_seq_wrap[eager] 0.7067ms 0.3934ms 2.5417 KOps/s 2.5837 KOps/s $\color{#d91a1a}-1.62\%$
test_seq_wrap[compile] 0.4364ms 0.2285ms 4.3772 KOps/s 4.4340 KOps/s $\color{#d91a1a}-1.28\%$
test_seq_wrap[compile-overhead] 0.3529ms 0.2250ms 4.4452 KOps/s 4.5139 KOps/s $\color{#d91a1a}-1.52\%$
test_func_call_runtime[False-eager] 0.9727ms 0.5431ms 1.8413 KOps/s 1.8356 KOps/s $\color{#35bf28}+0.31\%$
test_func_call_runtime[False-compile] 0.6102ms 0.4270ms 2.3419 KOps/s 2.3897 KOps/s $\color{#d91a1a}-2.00\%$
test_func_call_runtime[False-compile-overhead] 0.8010ms 0.4280ms 2.3362 KOps/s 2.3797 KOps/s $\color{#d91a1a}-1.82\%$
test_func_call_runtime[True-eager] 1.3745ms 0.7628ms 1.3109 KOps/s 1.3328 KOps/s $\color{#d91a1a}-1.64\%$
test_func_call_runtime[True-compile] 0.7556ms 0.4665ms 2.1436 KOps/s 2.1613 KOps/s $\color{#d91a1a}-0.82\%$
test_func_call_runtime[True-compile-overhead] 0.8533ms 0.4695ms 2.1301 KOps/s 2.1699 KOps/s $\color{#d91a1a}-1.83\%$
test_func_call_cm_runtime[False-eager] 0.6772ms 0.5366ms 1.8636 KOps/s 1.8496 KOps/s $\color{#35bf28}+0.75\%$
test_func_call_cm_runtime[False-compile] 0.6874ms 0.4296ms 2.3275 KOps/s 2.3635 KOps/s $\color{#d91a1a}-1.52\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5269ms 0.4248ms 2.3540 KOps/s 2.3804 KOps/s $\color{#d91a1a}-1.11\%$
test_func_call_cm_runtime[True-eager] 1.1460ms 0.9104ms 1.0984 KOps/s 1.0936 KOps/s $\color{#35bf28}+0.44\%$
test_func_call_cm_runtime[True-compile] 1.0432ms 0.4993ms 2.0030 KOps/s 2.0304 KOps/s $\color{#d91a1a}-1.35\%$
test_func_call_cm_runtime[True-compile-overhead] 0.7339ms 0.4965ms 2.0139 KOps/s 2.0344 KOps/s $\color{#d91a1a}-1.01\%$
test_vmap_func_call_cm_runtime[eager] 3.1580ms 1.9690ms 507.8817 Ops/s 509.8162 Ops/s $\color{#d91a1a}-0.38\%$
test_vmap_func_call_cm_runtime[compile] 0.8534ms 0.5235ms 1.9102 KOps/s 1.9344 KOps/s $\color{#d91a1a}-1.25\%$
test_vmap_func_call_cm_runtime[compile-overhead] 1.0546ms 0.5250ms 1.9048 KOps/s 1.9413 KOps/s $\color{#d91a1a}-1.88\%$
test_distributed 0.2819ms 0.1262ms 7.9263 KOps/s 7.8569 KOps/s $\color{#35bf28}+0.88\%$
test_tdmodule 34.8360μs 17.8975μs 55.8738 KOps/s 55.4454 KOps/s $\color{#35bf28}+0.77\%$
test_tdmodule_dispatch 83.1770μs 36.4218μs 27.4561 KOps/s 27.5308 KOps/s $\color{#d91a1a}-0.27\%$
test_tdseq 39.8250μs 20.4692μs 48.8538 KOps/s 49.0413 KOps/s $\color{#d91a1a}-0.38\%$
test_tdseq_dispatch 65.9740μs 41.4461μs 24.1277 KOps/s 24.6546 KOps/s $\color{#d91a1a}-2.14\%$
test_instantiation_functorch 2.3816ms 1.5370ms 650.6107 Ops/s 644.9317 Ops/s $\color{#35bf28}+0.88\%$
test_exec_functorch 0.2970ms 0.1780ms 5.6187 KOps/s 5.4513 KOps/s $\color{#35bf28}+3.07\%$
test_exec_functional_call 0.3284ms 0.1702ms 5.8764 KOps/s 5.6525 KOps/s $\color{#35bf28}+3.96\%$
test_exec_td_decorator 0.5420ms 0.2382ms 4.1976 KOps/s 4.1649 KOps/s $\color{#35bf28}+0.78\%$
test_vmap_mlp_speed_decorator[True-True] 1.1664ms 0.6694ms 1.4939 KOps/s 1.5085 KOps/s $\color{#d91a1a}-0.97\%$
test_vmap_mlp_speed_decorator[True-False] 0.9910ms 0.6654ms 1.5028 KOps/s 1.5435 KOps/s $\color{#d91a1a}-2.64\%$
test_vmap_mlp_speed_decorator[False-True] 0.8413ms 0.5538ms 1.8058 KOps/s 1.8488 KOps/s $\color{#d91a1a}-2.32\%$
test_vmap_mlp_speed_decorator[False-False] 0.8756ms 0.5499ms 1.8184 KOps/s 1.8290 KOps/s $\color{#d91a1a}-0.58\%$
test_to_module_speed[True] 2.0762ms 1.3790ms 725.1634 Ops/s 720.8149 Ops/s $\color{#35bf28}+0.60\%$
test_to_module_speed[False] 1.9683ms 1.3446ms 743.7064 Ops/s 733.4834 Ops/s $\color{#35bf28}+1.39\%$
test_tc_init 0.1313ms 51.5306μs 19.4059 KOps/s 22.1879 KOps/s $\textbf{\color{#d91a1a}-12.54\%}$
test_tc_init_nested 0.2088ms 0.1002ms 9.9805 KOps/s 11.2042 KOps/s $\textbf{\color{#d91a1a}-10.92\%}$
test_tc_first_layer_tensor 21.0100μs 1.5445μs 647.4441 KOps/s 659.1734 KOps/s $\color{#d91a1a}-1.78\%$
test_tc_first_layer_nontensor 27.7320μs 4.7810μs 209.1615 KOps/s 211.8770 KOps/s $\color{#d91a1a}-1.28\%$
test_tc_second_layer_tensor 39.0530μs 2.8405μs 352.0518 KOps/s 356.9931 KOps/s $\color{#d91a1a}-1.38\%$
test_tc_second_layer_nontensor 28.9550μs 6.1827μs 161.7404 KOps/s 166.6475 KOps/s $\color{#d91a1a}-2.94\%$
test_unbind 0.2276s 13.4306ms 74.4566 Ops/s 78.2742 Ops/s $\color{#d91a1a}-4.88\%$
test_full_like 9.0097ms 7.8949ms 126.6641 Ops/s 125.2523 Ops/s $\color{#35bf28}+1.13\%$
test_zeros_like 3.6043ms 3.0574ms 327.0779 Ops/s 327.1559 Ops/s $\color{#d91a1a}-0.02\%$
test_ones_like 3.9205ms 3.4438ms 290.3730 Ops/s 293.5596 Ops/s $\color{#d91a1a}-1.09\%$
test_clone 6.5136ms 5.7095ms 175.1465 Ops/s 182.7783 Ops/s $\color{#d91a1a}-4.18\%$
test_squeeze 60.4740μs 12.9892μs 76.9873 KOps/s 77.1219 KOps/s $\color{#d91a1a}-0.17\%$
test_unsqueeze 0.3274ms 95.5841μs 10.4620 KOps/s 10.6106 KOps/s $\color{#d91a1a}-1.40\%$
test_split 0.3373ms 0.1961ms 5.0996 KOps/s 5.1628 KOps/s $\color{#d91a1a}-1.22\%$
test_permute 0.3785ms 0.2239ms 4.4670 KOps/s 4.4829 KOps/s $\color{#d91a1a}-0.36\%$
test_stack 31.0536ms 26.9016ms 37.1725 Ops/s 37.7155 Ops/s $\color{#d91a1a}-1.44\%$
test_cat 31.1385ms 26.8754ms 37.2088 Ops/s 38.0383 Ops/s $\color{#d91a1a}-2.18\%$
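
The Change column in the table above appears to be the relative difference between the PR's throughput (Ops) and the throughput on repo HEAD. The bold rows are also consistent with a cutoff of roughly ±5%, and counting them gives the 3 improved and 24 worsened benchmarks reported in the summary. The sketch below reproduces that arithmetic under those assumptions. The function names and the 5% threshold are inferred from the table for illustration and are not taken from the benchmark tooling itself.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change using an assumed +/-5% threshold (inferred, not documented).

    How the real tooling treats a change of exactly +/-5% is unknown;
    this sketch treats the boundary as unchanged.
    """
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) copied from the CPU table; units cancel in the ratio.
rows = {
    "test_plain_set_nested": (40.8897, 42.8171),
    "test_unlock_nested": (1.9014, 2.3359),
    "test_serialize_weights_pickle": (1.4108, 2.5528),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

For these three rows the computed changes match the table (-4.50%, -18.60%, -44.74%), and only the last two fall past the assumed threshold.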

@github-actions

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 222. Improved: $\large\color{#35bf28}29$. Worsened: $\large\color{#d91a1a}8$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 54.7730μs 16.2291μs 61.6176 KOps/s 59.4427 KOps/s $\color{#35bf28}+3.66\%$
test_plain_set_stack_nested 43.3530μs 16.5490μs 60.4265 KOps/s 60.1528 KOps/s $\color{#35bf28}+0.45\%$
test_plain_set_nested_inplace 45.3020μs 17.6879μs 56.5360 KOps/s 55.1141 KOps/s $\color{#35bf28}+2.58\%$
test_plain_set_stack_nested_inplace 58.1730μs 17.3731μs 57.5604 KOps/s 55.9052 KOps/s $\color{#35bf28}+2.96\%$
test_items 24.9210μs 2.8358μs 352.6392 KOps/s 347.5341 KOps/s $\color{#35bf28}+1.47\%$
test_items_nested 0.4558ms 0.3426ms 2.9192 KOps/s 2.9201 KOps/s $\color{#d91a1a}-0.03\%$
test_items_nested_locked 0.3912ms 0.3467ms 2.8843 KOps/s 2.9008 KOps/s $\color{#d91a1a}-0.57\%$
test_items_nested_leaf 95.9860μs 62.6394μs 15.9644 KOps/s 15.9840 KOps/s $\color{#d91a1a}-0.12\%$
test_items_stack_nested 0.4082ms 0.3479ms 2.8747 KOps/s 2.9045 KOps/s $\color{#d91a1a}-1.03\%$
test_items_stack_nested_leaf 94.4750μs 63.5080μs 15.7460 KOps/s 15.4796 KOps/s $\color{#35bf28}+1.72\%$
test_items_stack_nested_locked 0.4110ms 0.3481ms 2.8728 KOps/s 2.8812 KOps/s $\color{#d91a1a}-0.29\%$
test_keys 29.2010μs 3.4680μs 288.3497 KOps/s 291.2167 KOps/s $\color{#d91a1a}-0.98\%$
test_keys_nested 0.1012ms 71.6774μs 13.9514 KOps/s 13.9049 KOps/s $\color{#35bf28}+0.33\%$
test_keys_nested_locked 2.3893ms 77.7195μs 12.8668 KOps/s 13.0031 KOps/s $\color{#d91a1a}-1.05\%$
test_keys_nested_leaf 92.2750μs 61.9344μs 16.1461 KOps/s 15.9612 KOps/s $\color{#35bf28}+1.16\%$
test_keys_stack_nested 0.1051ms 73.1909μs 13.6629 KOps/s 14.1556 KOps/s $\color{#d91a1a}-3.48\%$
test_keys_stack_nested_leaf 90.0640μs 63.4055μs 15.7715 KOps/s 15.9697 KOps/s $\color{#d91a1a}-1.24\%$
test_keys_stack_nested_locked 0.1110ms 77.8726μs 12.8415 KOps/s 13.1483 KOps/s $\color{#d91a1a}-2.33\%$
test_values 5.3220μs 0.8406μs 1.1896 MOps/s 1.1917 MOps/s $\color{#d91a1a}-0.18\%$
test_values_nested 75.0240μs 48.4269μs 20.6497 KOps/s 20.3986 KOps/s $\color{#35bf28}+1.23\%$
test_values_nested_locked 74.4940μs 50.5433μs 19.7850 KOps/s 19.8378 KOps/s $\color{#d91a1a}-0.27\%$
test_values_nested_leaf 70.1930μs 42.1673μs 23.7150 KOps/s 23.2654 KOps/s $\color{#35bf28}+1.93\%$
test_values_stack_nested 77.9640μs 49.0262μs 20.3973 KOps/s 20.1005 KOps/s $\color{#35bf28}+1.48\%$
test_values_stack_nested_leaf 73.5540μs 43.9796μs 22.7378 KOps/s 22.8808 KOps/s $\color{#d91a1a}-0.62\%$
test_values_stack_nested_locked 81.1450μs 51.5820μs 19.3866 KOps/s 19.3814 KOps/s $\color{#35bf28}+0.03\%$
test_membership 2.0161μs 0.5016μs 1.9936 MOps/s 1.9601 MOps/s $\color{#35bf28}+1.71\%$
test_membership_nested 17.2210μs 1.9288μs 518.4647 KOps/s 528.0428 KOps/s $\color{#d91a1a}-1.81\%$
test_membership_nested_leaf 16.2605μs 1.9181μs 521.3421 KOps/s 523.7818 KOps/s $\color{#d91a1a}-0.47\%$
test_membership_stacked_nested 35.6220μs 2.0043μs 498.9261 KOps/s 506.4640 KOps/s $\color{#d91a1a}-1.49\%$
test_membership_stacked_nested_leaf 26.2520μs 1.9891μs 502.7339 KOps/s 508.6292 KOps/s $\color{#d91a1a}-1.16\%$
test_membership_nested_last 33.6820μs 3.0000μs 333.3301 KOps/s 334.7615 KOps/s $\color{#d91a1a}-0.43\%$
test_membership_nested_leaf_last 31.4120μs 3.0041μs 332.8752 KOps/s 336.4019 KOps/s $\color{#d91a1a}-1.05\%$
test_membership_stacked_nested_last 29.1110μs 3.0096μs 332.2728 KOps/s 121.8482 KOps/s $\textbf{\color{#35bf28}+172.69\%}$
test_membership_stacked_nested_leaf_last 35.4520μs 3.0636μs 326.4100 KOps/s 123.3412 KOps/s $\textbf{\color{#35bf28}+164.64\%}$
test_nested_getleaf 31.6020μs 6.0502μs 165.2850 KOps/s 165.7101 KOps/s $\color{#d91a1a}-0.26\%$
test_nested_get 37.4020μs 5.7571μs 173.6995 KOps/s 174.0434 KOps/s $\color{#d91a1a}-0.20\%$
test_stacked_getleaf 38.0720μs 6.0385μs 165.6035 KOps/s 164.2573 KOps/s $\color{#35bf28}+0.82\%$
test_stacked_get 37.3120μs 5.6686μs 176.4114 KOps/s 176.4200 KOps/s $-0.00\%$
test_nested_getitemleaf 30.8520μs 6.1512μs 162.5711 KOps/s 162.9285 KOps/s $\color{#d91a1a}-0.22\%$
test_nested_getitem 31.6820μs 5.7306μs 174.5013 KOps/s 172.7189 KOps/s $\color{#35bf28}+1.03\%$
test_stacked_getitemleaf 37.7720μs 6.0673μs 164.8171 KOps/s 163.9711 KOps/s $\color{#35bf28}+0.52\%$
test_stacked_getitem 37.8220μs 5.6792μs 176.0798 KOps/s 173.7197 KOps/s $\color{#35bf28}+1.36\%$
test_lock_nested 7.0842ms 0.4310ms 2.3202 KOps/s 2.2817 KOps/s $\color{#35bf28}+1.69\%$
test_lock_stack_nested 0.4521ms 0.3939ms 2.5384 KOps/s 2.6220 KOps/s $\color{#d91a1a}-3.19\%$
test_unlock_nested 0.7746ms 0.3647ms 2.7423 KOps/s 2.7259 KOps/s $\color{#35bf28}+0.60\%$
test_unlock_stack_nested 0.3747ms 0.3276ms 3.0528 KOps/s 3.1430 KOps/s $\color{#d91a1a}-2.87\%$
test_flatten_speed 0.1563ms 76.0786μs 13.1443 KOps/s 13.1861 KOps/s $\color{#d91a1a}-0.32\%$
test_unflatten_speed 0.3669ms 0.3194ms 3.1309 KOps/s 3.0882 KOps/s $\color{#35bf28}+1.38\%$
test_common_ops 1.5547ms 1.2617ms 792.5840 Ops/s 782.2240 Ops/s $\color{#35bf28}+1.32\%$
test_creation 23.5310μs 1.5010μs 666.2148 KOps/s 680.2069 KOps/s $\color{#d91a1a}-2.06\%$
test_creation_empty 46.8430μs 14.8897μs 67.1604 KOps/s 64.7024 KOps/s $\color{#35bf28}+3.80\%$
test_creation_nested_1 49.5430μs 16.5714μs 60.3448 KOps/s 58.5751 KOps/s $\color{#35bf28}+3.02\%$
test_creation_nested_2 49.2220μs 19.2147μs 52.0436 KOps/s 48.9758 KOps/s $\textbf{\color{#35bf28}+6.26\%}$
test_clone 64.4930μs 30.6339μs 32.6436 KOps/s 33.1518 KOps/s $\color{#d91a1a}-1.53\%$
test_getitem[int] 1.2088ms 16.3859μs 61.0279 KOps/s 62.0953 KOps/s $\color{#d91a1a}-1.72\%$
test_getitem[slice_int] 0.1184ms 27.8098μs 35.9586 KOps/s 35.3886 KOps/s $\color{#35bf28}+1.61\%$
test_getitem[range] 0.1605ms 0.1106ms 9.0418 KOps/s 8.8169 KOps/s $\color{#35bf28}+2.55\%$
test_getitem[tuple] 0.1300ms 23.7533μs 42.0995 KOps/s 37.9591 KOps/s $\textbf{\color{#35bf28}+10.91\%}$
test_getitem[list] 0.1948ms 0.1004ms 9.9554 KOps/s 9.4178 KOps/s $\textbf{\color{#35bf28}+5.71\%}$
test_setitem_dim[int] 81.0540μs 45.2171μs 22.1155 KOps/s 20.6912 KOps/s $\textbf{\color{#35bf28}+6.88\%}$
test_setitem_dim[slice_int] 93.5350μs 67.6894μs 14.7734 KOps/s 14.9130 KOps/s $\color{#d91a1a}-0.94\%$
test_setitem_dim[range] 0.1682ms 0.1288ms 7.7639 KOps/s 7.6956 KOps/s $\color{#35bf28}+0.89\%$
test_setitem_dim[tuple] 85.2650μs 61.8798μs 16.1604 KOps/s 15.3425 KOps/s $\textbf{\color{#35bf28}+5.33\%}$
test_setitem 79.4040μs 42.9493μs 23.2833 KOps/s 21.7632 KOps/s $\textbf{\color{#35bf28}+6.98\%}$
test_set 78.5940μs 42.1892μs 23.7027 KOps/s 22.3227 KOps/s $\textbf{\color{#35bf28}+6.18\%}$
test_set_shared 0.3585ms 55.7410μs 17.9401 KOps/s 18.1641 KOps/s $\color{#d91a1a}-1.23\%$
test_update 92.5250μs 52.2226μs 19.1488 KOps/s 19.5690 KOps/s $\color{#d91a1a}-2.15\%$
test_update_nested 0.1067ms 59.6663μs 16.7599 KOps/s 17.1955 KOps/s $\color{#d91a1a}-2.53\%$
test_update__nested 0.1907ms 64.6936μs 15.4575 KOps/s 15.6259 KOps/s $\color{#d91a1a}-1.08\%$
test_set_nested 86.5140μs 44.5284μs 22.4576 KOps/s 20.7508 KOps/s $\textbf{\color{#35bf28}+8.23\%}$
test_set_nested_new 91.7540μs 47.9097μs 20.8726 KOps/s 19.5606 KOps/s $\textbf{\color{#35bf28}+6.71\%}$
test_select 0.1112ms 61.6534μs 16.2197 KOps/s 15.4155 KOps/s $\textbf{\color{#35bf28}+5.22\%}$
test_select_nested 75.3440μs 42.5318μs 23.5118 KOps/s 24.2089 KOps/s $\color{#d91a1a}-2.88\%$
test_exclude_nested 97.5350μs 59.8992μs 16.6947 KOps/s 17.0584 KOps/s $\color{#d91a1a}-2.13\%$
test_empty[True] 0.3190ms 0.2602ms 3.8428 KOps/s 3.8629 KOps/s $\color{#d91a1a}-0.52\%$
test_empty[False] 3.6942μs 0.7425μs 1.3468 MOps/s 1.3288 MOps/s $\color{#35bf28}+1.35\%$
test_to 56.6730μs 28.0328μs 35.6725 KOps/s 36.8197 KOps/s $\color{#d91a1a}-3.12\%$
test_to_nonblocking 70.3130μs 26.2092μs 38.1545 KOps/s 36.0309 KOps/s $\textbf{\color{#35bf28}+5.89\%}$
test_unbind_speed 0.3192ms 0.2796ms 3.5759 KOps/s 3.6165 KOps/s $\color{#d91a1a}-1.12\%$
test_unbind_speed_stack0 0.3376ms 0.2787ms 3.5878 KOps/s 3.7635 KOps/s $\color{#d91a1a}-4.67\%$
test_unbind_speed_stack1 92.0571ms 0.7136ms 1.4013 KOps/s 1.4393 KOps/s $\color{#d91a1a}-2.64\%$
test_split 92.6965ms 2.2535ms 443.7620 Ops/s 441.3008 Ops/s $\color{#35bf28}+0.56\%$
test_chunk 95.1655ms 2.2483ms 444.7875 Ops/s 439.1065 Ops/s $\color{#35bf28}+1.29\%$
test_to[False] 3.6349ms 3.5414ms 282.3774 Ops/s 281.1129 Ops/s $\color{#35bf28}+0.45\%$
test_to[True] 4.9442ms 4.6022ms 217.2867 Ops/s 214.6891 Ops/s $\color{#35bf28}+1.21\%$
test_to_njt[False] 0.3319s 0.2538s 3.9408 Ops/s 3.9460 Ops/s $\color{#d91a1a}-0.13\%$
test_to_njt[True] 0.2651s 0.2646s 3.7791 Ops/s 3.7762 Ops/s $\color{#35bf28}+0.08\%$
test_creation[device0] 0.3026ms 0.1296ms 7.7176 KOps/s 7.5950 KOps/s $\color{#35bf28}+1.61\%$
test_creation_from_tensor 0.3733ms 0.1318ms 7.5878 KOps/s 7.1408 KOps/s $\textbf{\color{#35bf28}+6.26\%}$
test_add_one[memmap_tensor0] 0.2335ms 9.4946μs 105.3234 KOps/s 106.7276 KOps/s $\color{#d91a1a}-1.32\%$
test_contiguous[memmap_tensor0] 31.1520μs 2.1939μs 455.8145 KOps/s 443.6008 KOps/s $\color{#35bf28}+2.75\%$
test_stack[memmap_tensor0] 37.9320μs 7.0926μs 140.9922 KOps/s 139.5523 KOps/s $\color{#35bf28}+1.03\%$
test_memmaptd_index 1.0929ms 0.4460ms 2.2423 KOps/s 2.2712 KOps/s $\color{#d91a1a}-1.27\%$
test_memmaptd_index_astensor 0.7774ms 0.5182ms 1.9297 KOps/s 1.9574 KOps/s $\color{#d91a1a}-1.41\%$
test_memmaptd_index_op 1.4506ms 1.0539ms 948.8784 Ops/s 951.3274 Ops/s $\color{#d91a1a}-0.26\%$
test_serialize_model 0.1309s 0.1299s 7.6987 Ops/s 6.9700 Ops/s $\textbf{\color{#35bf28}+10.46\%}$
test_serialize_model_pickle 1.3505s 1.1904s 0.8401 Ops/s 0.8233 Ops/s $\color{#35bf28}+2.03\%$
test_serialize_weights 0.1299s 0.1291s 7.7438 Ops/s 7.7391 Ops/s $\color{#35bf28}+0.06\%$
test_serialize_weights_returnearly 0.2314s 57.9523ms 17.2556 Ops/s 17.6251 Ops/s $\color{#d91a1a}-2.10\%$
test_serialize_weights_pickle 1.3765s 1.1908s 0.8397 Ops/s 0.8386 Ops/s $\color{#35bf28}+0.13\%$
test_reshape_pytree 87.8050μs 35.6263μs 28.0692 KOps/s 27.5381 KOps/s $\color{#35bf28}+1.93\%$
test_reshape_td 73.1840μs 41.2755μs 24.2274 KOps/s 23.7779 KOps/s $\color{#35bf28}+1.89\%$
test_view_pytree 65.8940μs 35.9710μs 27.8001 KOps/s 28.3606 KOps/s $\color{#d91a1a}-1.98\%$
test_view_td 81.9340μs 47.1120μs 21.2260 KOps/s 21.9064 KOps/s $\color{#d91a1a}-3.11\%$
test_unbind_pytree 65.2840μs 34.8968μs 28.6560 KOps/s 26.9395 KOps/s $\textbf{\color{#35bf28}+6.37\%}$
test_unbind_td 0.7232ms 42.6097μs 23.4689 KOps/s 21.6712 KOps/s $\textbf{\color{#35bf28}+8.30\%}$
test_split_pytree 93.4897ms 57.4579μs 17.4040 KOps/s 19.6506 KOps/s $\textbf{\color{#d91a1a}-11.43\%}$
test_split_td 0.1475ms 57.5084μs 17.3888 KOps/s 14.8350 KOps/s $\textbf{\color{#35bf28}+17.21\%}$
test_add_pytree 0.1035ms 63.7252μs 15.6924 KOps/s 17.0539 KOps/s $\textbf{\color{#d91a1a}-7.98\%}$
test_add_td 0.1566ms 99.6753μs 10.0326 KOps/s 10.5569 KOps/s $\color{#d91a1a}-4.97\%$
test_compile_add_one_nested[tensordict-compile] 0.2788ms 0.1625ms 6.1536 KOps/s 6.0620 KOps/s $\color{#35bf28}+1.51\%$
test_compile_add_one_nested[tensordict-eager] 0.2857ms 0.1636ms 6.1123 KOps/s 6.0664 KOps/s $\color{#35bf28}+0.76\%$
test_compile_add_one_nested[pytree-compile] 0.2136ms 0.1603ms 6.2368 KOps/s 6.4206 KOps/s $\color{#d91a1a}-2.86\%$
test_compile_add_one_nested[pytree-eager] 0.2456ms 0.1903ms 5.2554 KOps/s 5.2904 KOps/s $\color{#d91a1a}-0.66\%$
test_compile_copy_nested[tensordict-compile] 53.6730μs 21.8411μs 45.7852 KOps/s 46.0020 KOps/s $\color{#d91a1a}-0.47\%$
test_compile_copy_nested[tensordict-eager] 82.0240μs 49.6230μs 20.1520 KOps/s 20.3302 KOps/s $\color{#d91a1a}-0.88\%$
test_compile_copy_nested[pytree-compile] 0.3731ms 65.0576μs 15.3710 KOps/s 15.4427 KOps/s $\color{#d91a1a}-0.46\%$
test_compile_copy_nested[pytree-eager] 87.8350μs 49.6394μs 20.1453 KOps/s 20.1516 KOps/s $\color{#d91a1a}-0.03\%$
test_compile_add_one_flat[tensordict-compile] 0.3590ms 0.3228ms 3.0979 KOps/s 3.0844 KOps/s $\color{#35bf28}+0.44\%$
test_compile_add_one_flat[tensordict-eager] 0.3270ms 0.2327ms 4.2983 KOps/s 4.1781 KOps/s $\color{#35bf28}+2.88\%$
test_compile_add_one_flat[tensorclass-compile] 0.1700ms 0.1297ms 7.7111 KOps/s 7.6795 KOps/s $\color{#35bf28}+0.41\%$
test_compile_add_one_flat[tensorclass-eager] 0.1237ms 65.1707μs 15.3443 KOps/s 14.4991 KOps/s $\textbf{\color{#35bf28}+5.83\%}$
test_compile_add_one_flat[pytree-compile] 0.3779ms 0.3329ms 3.0038 KOps/s 2.9992 KOps/s $\color{#35bf28}+0.15\%$
test_compile_add_one_flat[pytree-eager] 0.8488ms 0.6439ms 1.5531 KOps/s 1.5615 KOps/s $\color{#d91a1a}-0.54\%$
test_compile_add_self_flat[tensordict-eager] 0.4011ms 0.2852ms 3.5069 KOps/s 3.3890 KOps/s $\color{#35bf28}+3.48\%$
test_compile_add_self_flat[tensordict-compile] 0.3811ms 0.3249ms 3.0775 KOps/s 3.0690 KOps/s $\color{#35bf28}+0.28\%$
test_compile_add_self_flat[tensorclass-eager] 0.1554ms 77.7227μs 12.8663 KOps/s 12.4375 KOps/s $\color{#35bf28}+3.45\%$
test_compile_add_self_flat[tensorclass-compile] 0.1693ms 0.1307ms 7.6507 KOps/s 7.3326 KOps/s $\color{#35bf28}+4.34\%$
test_compile_add_self_flat[pytree-eager] 0.6193ms 0.5455ms 1.8332 KOps/s 1.7741 KOps/s $\color{#35bf28}+3.33\%$
test_compile_add_self_flat[pytree-compile] 0.3698ms 0.3284ms 3.0448 KOps/s 2.9454 KOps/s $\color{#35bf28}+3.37\%$
test_compile_copy_flat[tensordict-compile] 56.7240μs 19.2153μs 52.0419 KOps/s 48.7440 KOps/s $\textbf{\color{#35bf28}+6.77\%}$
test_compile_copy_flat[tensordict-eager] 66.2640μs 39.4247μs 25.3648 KOps/s 25.2959 KOps/s $\color{#35bf28}+0.27\%$
test_compile_copy_flat[pytree-compile] 0.1053ms 69.9645μs 14.2930 KOps/s 14.2857 KOps/s $\color{#35bf28}+0.05\%$
test_compile_copy_flat[pytree-eager] 85.3150μs 51.5439μs 19.4009 KOps/s 19.5825 KOps/s $\color{#d91a1a}-0.93\%$
test_compile_assign_and_add[tensordict-compile] 2.3831ms 0.8316ms 1.2025 KOps/s 1.1136 KOps/s $\textbf{\color{#35bf28}+7.98\%}$
test_compile_assign_and_add[tensordict-eager] 3.5549ms 3.2502ms 307.6727 Ops/s 302.6761 Ops/s $\color{#35bf28}+1.65\%$
test_compile_assign_and_add[pytree-compile] 2.4257ms 0.8483ms 1.1788 KOps/s 1.0877 KOps/s $\textbf{\color{#35bf28}+8.38\%}$
test_compile_assign_and_add[pytree-eager] 3.4298ms 3.3101ms 302.1081 Ops/s 302.9134 Ops/s $\color{#d91a1a}-0.27\%$
test_compile_indexing[tensor-tensordict-compile] 0.2247ms 0.1264ms 7.9083 KOps/s 8.3252 KOps/s $\textbf{\color{#d91a1a}-5.01\%}$
test_compile_indexing[tensor-tensordict-eager] 0.1938ms 60.3032μs 16.5829 KOps/s 16.2383 KOps/s $\color{#35bf28}+2.12\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1626ms 0.1145ms 8.7355 KOps/s 8.5726 KOps/s $\color{#35bf28}+1.90\%$
test_compile_indexing[tensor-tensorclass-eager] 86.6840μs 45.8382μs 21.8159 KOps/s 20.8401 KOps/s $\color{#35bf28}+4.68\%$
test_compile_indexing[tensor-pytree-compile] 0.1810ms 0.1209ms 8.2735 KOps/s 8.1327 KOps/s $\color{#35bf28}+1.73\%$
test_compile_indexing[tensor-pytree-eager] 95.3250μs 46.6383μs 21.4416 KOps/s 21.1377 KOps/s $\color{#35bf28}+1.44\%$
test_compile_indexing[slice-tensordict-compile] 0.2101ms 0.1550ms 6.4510 KOps/s 6.6931 KOps/s $\color{#d91a1a}-3.62\%$
test_compile_indexing[slice-tensordict-eager] 0.1542ms 26.4456μs 37.8135 KOps/s 34.7564 KOps/s $\textbf{\color{#35bf28}+8.80\%}$
test_compile_indexing[slice-tensorclass-compile] 0.2090ms 0.1458ms 6.8590 KOps/s 7.0039 KOps/s $\color{#d91a1a}-2.07\%$
test_compile_indexing[slice-tensorclass-eager] 0.1028ms 20.7714μs 48.1430 KOps/s 45.6546 KOps/s $\textbf{\color{#35bf28}+5.45\%}$
test_compile_indexing[slice-pytree-compile] 0.1909ms 0.1458ms 6.8572 KOps/s 6.9728 KOps/s $\color{#d91a1a}-1.66\%$
test_compile_indexing[slice-pytree-eager] 51.9530μs 21.7580μs 45.9600 KOps/s 46.5506 KOps/s $\color{#d91a1a}-1.27\%$
test_compile_indexing[int-tensordict-compile] 0.2663ms 0.1496ms 6.6857 KOps/s 6.6716 KOps/s $\color{#35bf28}+0.21\%$
test_compile_indexing[int-tensordict-eager] 0.4977ms 25.1431μs 39.7723 KOps/s 38.4855 KOps/s $\color{#35bf28}+3.34\%$
test_compile_indexing[int-tensorclass-compile] 0.2237ms 0.1448ms 6.9066 KOps/s 6.9787 KOps/s $\color{#d91a1a}-1.03\%$
test_compile_indexing[int-tensorclass-eager] 55.7830μs 22.1820μs 45.0816 KOps/s 46.2155 KOps/s $\color{#d91a1a}-2.45\%$
test_compile_indexing[int-pytree-compile] 0.1948ms 0.1489ms 6.7178 KOps/s 6.9736 KOps/s $\color{#d91a1a}-3.67\%$
test_compile_indexing[int-pytree-eager] 48.3330μs 21.5508μs 46.4020 KOps/s 46.5455 KOps/s $\color{#d91a1a}-0.31\%$
test_mod_add[eager] 74.5240μs 34.4852μs 28.9980 KOps/s 30.4115 KOps/s $\color{#d91a1a}-4.65\%$
test_mod_add[compile] 0.1388ms 86.9968μs 11.4947 KOps/s 11.8634 KOps/s $\color{#d91a1a}-3.11\%$
test_mod_add[compile-overhead] 0.3045ms 0.1540ms 6.4945 KOps/s 6.2208 KOps/s $\color{#35bf28}+4.40\%$
test_mod_wrap[eager] 0.3792ms 0.2423ms 4.1269 KOps/s 4.0360 KOps/s $\color{#35bf28}+2.25\%$
test_mod_wrap[compile] 1.4252ms 0.2993ms 3.3413 KOps/s 3.2871 KOps/s $\color{#35bf28}+1.65\%$
test_mod_wrap[compile-overhead] 7.5491ms 4.0164ms 248.9821 Ops/s 249.6185 Ops/s $\color{#d91a1a}-0.25\%$
test_mod_wrap_and_backward[eager] 1.5261ms 1.3552ms 737.9154 Ops/s 694.3648 Ops/s $\textbf{\color{#35bf28}+6.27\%}$
test_mod_wrap_and_backward[compile] 1.6635ms 1.3360ms 748.4891 Ops/s 685.8134 Ops/s $\textbf{\color{#35bf28}+9.14\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3609ms 0.9180ms 1.0893 KOps/s 1.0791 KOps/s $\color{#35bf28}+0.94\%$
test_seq_add[eager] 0.1703ms 99.9303μs 10.0070 KOps/s 9.8581 KOps/s $\color{#35bf28}+1.51\%$
test_seq_add[compile] 0.1843ms 92.8746μs 10.7672 KOps/s 10.3369 KOps/s $\color{#35bf28}+4.16\%$
test_seq_add[compile-overhead] 0.1644ms 0.1262ms 7.9268 KOps/s 7.6434 KOps/s $\color{#35bf28}+3.71\%$
test_seq_wrap[eager] 0.4650ms 0.3772ms 2.6512 KOps/s 2.4446 KOps/s $\textbf{\color{#35bf28}+8.45\%}$
test_seq_wrap[compile] 0.3849ms 0.3163ms 3.1619 KOps/s 3.1241 KOps/s $\color{#35bf28}+1.21\%$
test_seq_wrap[compile-overhead] 0.2774ms 0.2243ms 4.4582 KOps/s 4.4420 KOps/s $\color{#35bf28}+0.36\%$
test_func_call_runtime[False-eager] 0.8048ms 0.7416ms 1.3484 KOps/s 1.3213 KOps/s $\color{#35bf28}+2.05\%$
test_func_call_runtime[False-compile] 0.8805ms 0.8037ms 1.2442 KOps/s 1.1904 KOps/s $\color{#35bf28}+4.52\%$
test_func_call_runtime[False-compile-overhead] 0.4135ms 0.3631ms 2.7542 KOps/s 2.7381 KOps/s $\color{#35bf28}+0.59\%$
test_func_call_runtime[True-eager] 0.9696ms 0.9002ms 1.1108 KOps/s 1.0900 KOps/s $\color{#35bf28}+1.91\%$
test_func_call_runtime[True-compile] 0.9795ms 0.8276ms 1.2084 KOps/s 1.2010 KOps/s $\color{#35bf28}+0.61\%$
test_func_call_runtime[True-compile-overhead] 0.4666ms 0.3898ms 2.5652 KOps/s 2.5599 KOps/s $\color{#35bf28}+0.21\%$
test_func_call_cm_runtime[False-eager] 0.8060ms 0.7387ms 1.3537 KOps/s 1.2782 KOps/s $\textbf{\color{#35bf28}+5.91\%}$
test_func_call_cm_runtime[False-compile] 0.8739ms 0.8080ms 1.2376 KOps/s 1.2219 KOps/s $\color{#35bf28}+1.28\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4340ms 0.3672ms 2.7233 KOps/s 2.7100 KOps/s $\color{#35bf28}+0.49\%$
test_func_call_cm_runtime[True-eager] 1.1007ms 1.0136ms 986.6243 Ops/s 962.9523 Ops/s $\color{#35bf28}+2.46\%$
test_func_call_cm_runtime[True-compile] 0.9371ms 0.8537ms 1.1714 KOps/s 1.1540 KOps/s $\color{#35bf28}+1.51\%$
test_func_call_cm_runtime[True-compile-overhead] 0.4582ms 0.4126ms 2.4237 KOps/s 2.4345 KOps/s $\color{#d91a1a}-0.44\%$
test_vmap_func_call_cm_runtime[eager] 2.5967ms 2.1029ms 475.5427 Ops/s 469.2110 Ops/s $\color{#35bf28}+1.35\%$
test_vmap_func_call_cm_runtime[compile] 0.9479ms 0.8712ms 1.1479 KOps/s 1.1380 KOps/s $\color{#35bf28}+0.87\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4961ms 0.4179ms 2.3927 KOps/s 2.4112 KOps/s $\color{#d91a1a}-0.77\%$
test_distributed 1.5720ms 0.2084ms 4.7982 KOps/s 8.5036 KOps/s $\textbf{\color{#d91a1a}-43.57\%}$
test_tdmodule 91.1250μs 14.1186μs 70.8285 KOps/s 68.5142 KOps/s $\color{#35bf28}+3.38\%$
test_tdmodule_dispatch 64.4030μs 27.8267μs 35.9367 KOps/s 34.8737 KOps/s $\color{#35bf28}+3.05\%$
test_tdseq 44.0120μs 15.5394μs 64.3527 KOps/s 63.6507 KOps/s $\color{#35bf28}+1.10\%$
test_tdseq_dispatch 66.3430μs 30.9521μs 32.3080 KOps/s 31.8359 KOps/s $\color{#35bf28}+1.48\%$
test_instantiation_functorch 2.1426ms 1.8855ms 530.3741 Ops/s 534.2831 Ops/s $\color{#d91a1a}-0.73\%$
test_exec_functorch 0.2761ms 0.2044ms 4.8913 KOps/s 4.7771 KOps/s $\color{#35bf28}+2.39\%$
test_exec_functional_call 0.3046ms 0.2175ms 4.5967 KOps/s 4.7451 KOps/s $\color{#d91a1a}-3.13\%$
test_exec_td_decorator 0.4606ms 0.2716ms 3.6817 KOps/s 3.7940 KOps/s $\color{#d91a1a}-2.96\%$
test_vmap_mlp_speed_decorator[True-True] 0.8382ms 0.6882ms 1.4531 KOps/s 1.4509 KOps/s $\color{#35bf28}+0.15\%$
test_vmap_mlp_speed_decorator[True-False] 0.8091ms 0.6874ms 1.4547 KOps/s 1.4511 KOps/s $\color{#35bf28}+0.25\%$
test_vmap_mlp_speed_decorator[False-True] 0.7105ms 0.6092ms 1.6415 KOps/s 1.6505 KOps/s $\color{#d91a1a}-0.55\%$
test_vmap_mlp_speed_decorator[False-False] 0.7558ms 0.6065ms 1.6488 KOps/s 1.6430 KOps/s $\color{#35bf28}+0.35\%$
test_vmap_transformer_speed_decorator[True-True] 19.9049ms 19.8310ms 50.4262 Ops/s 50.8257 Ops/s $\color{#d91a1a}-0.79\%$
test_vmap_transformer_speed_decorator[True-False] 20.6690ms 19.9404ms 50.1494 Ops/s 50.7088 Ops/s $\color{#d91a1a}-1.10\%$
test_vmap_transformer_speed_decorator[False-True] 19.7263ms 19.6531ms 50.8825 Ops/s 51.3268 Ops/s $\color{#d91a1a}-0.87\%$
test_vmap_transformer_speed_decorator[False-False] 20.8987ms 19.7163ms 50.7195 Ops/s 50.8898 Ops/s $\color{#d91a1a}-0.33\%$
test_to_module_speed[True] 1.4546ms 1.0054ms 994.6369 Ops/s 1.0024 KOps/s $\color{#d91a1a}-0.78\%$
test_to_module_speed[False] 1.3963ms 0.9808ms 1.0196 KOps/s 1.0227 KOps/s $\color{#d91a1a}-0.30\%$
test_tc_init 63.0230μs 36.2301μs 27.6013 KOps/s 29.6403 KOps/s $\textbf{\color{#d91a1a}-6.88\%}$
test_tc_init_nested 0.1110ms 74.1168μs 13.4922 KOps/s 14.5041 KOps/s $\textbf{\color{#d91a1a}-6.98\%}$
test_tc_first_layer_tensor 5.0646μs 0.6706μs 1.4912 MOps/s 1.4663 MOps/s $\color{#35bf28}+1.70\%$
test_tc_first_layer_nontensor 24.2510μs 2.2268μs 449.0683 KOps/s 444.4720 KOps/s $\color{#35bf28}+1.03\%$
test_tc_second_layer_tensor 8.8905μs 1.3978μs 715.4336 KOps/s 721.0973 KOps/s $\color{#d91a1a}-0.79\%$
test_tc_second_layer_nontensor 30.1020μs 2.9822μs 335.3285 KOps/s 340.0061 KOps/s $\color{#d91a1a}-1.38\%$
test_unbind 0.1938s 9.6125ms 104.0312 Ops/s 91.6711 Ops/s $\textbf{\color{#35bf28}+13.48\%}$
test_full_like 0.6585ms 0.5741ms 1.7418 KOps/s 1.7455 KOps/s $\color{#d91a1a}-0.21\%$
test_zeros_like 0.2708ms 0.1980ms 5.0516 KOps/s 5.0478 KOps/s $\color{#35bf28}+0.08\%$
test_ones_like 0.2348ms 0.1978ms 5.0548 KOps/s 5.0528 KOps/s $\color{#35bf28}+0.04\%$
test_clone 0.4549ms 0.4148ms 2.4110 KOps/s 2.4099 KOps/s $\color{#35bf28}+0.04\%$
test_squeeze 42.5730μs 10.6847μs 93.5917 KOps/s 101.5575 KOps/s $\textbf{\color{#d91a1a}-7.84\%}$
test_unsqueeze 0.2179ms 74.6374μs 13.3981 KOps/s 13.1197 KOps/s $\color{#35bf28}+2.12\%$
test_split 0.4050ms 0.1626ms 6.1490 KOps/s 6.1805 KOps/s $\color{#d91a1a}-0.51\%$
test_permute 0.2377ms 0.1919ms 5.2121 KOps/s 5.5167 KOps/s $\textbf{\color{#d91a1a}-5.52\%}$
test_stack 1.2532ms 0.8511ms 1.1749 KOps/s 1.1672 KOps/s $\color{#35bf28}+0.66\%$
test_cat 1.2638ms 1.2312ms 812.1862 Ops/s 812.0202 Ops/s $\color{#35bf28}+0.02\%$
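
For anyone reading the Change column: the rows above look consistent with a relative difference between Ops (this PR) and Ops on Repo HEAD, i.e. `(ops - head_ops) / head_ops`. This is inferred from the numbers, not confirmed from the benchmark bot's source. A minimal sketch of that arithmetic follows; the helper names `parse_ops` and `relative_change` are hypothetical and not part of any benchmark tooling.

```python
# Sketch only: reproduces the Change column under the ASSUMPTION that it is
# (ops - head_ops) / head_ops. Helper names are hypothetical.

# Multipliers for the throughput units that appear in the table.
UNITS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Convert a cell such as '4.7982 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def relative_change(pr_ops: str, head_ops: str) -> float:
    """Percent change of the PR throughput relative to repo HEAD."""
    pr, head = parse_ops(pr_ops), parse_ops(head_ops)
    return (pr - head) / head * 100.0


# Values copied from the test_distributed and test_split_td rows.
print(f"test_distributed: {relative_change('4.7982 KOps/s', '8.5036 KOps/s'):+.2f}%")
print(f"test_split_td:    {relative_change('17.3888 KOps/s', '14.8350 KOps/s'):+.2f}%")
```

Because the table shows rounded throughputs, a value recomputed this way can differ from the printed Change by about 0.01 percentage points on some rows. Rows that mix units (for example `test_to_module_speed[True]`, Ops/s vs KOps/s) need the unit conversion shown above.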

@vmoens vmoens merged commit 8c65dcb into main Oct 21, 2024
52 of 57 checks passed
Labels

CLA Signed, versioning
