@vmoens (Collaborator) commented Oct 29, 2024

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Oct 29, 2024
ghstack-source-id: 8ff9fb4
Pull Request resolved: #1064
@facebook-github-bot added the CLA Signed label Oct 29, 2024 (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed)
@github-actions

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 216. Improved: $\large\color{#35bf28}9$. Worsened: $\large\color{#d91a1a}15$.
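
Improved and Worsened only count a subset of the 216 benchmarks. In the CPU results below they match exactly the rows whose Change is printed in bold. Every bold value exceeds 5% in magnitude, and every non-bold value stays below 5%. Assuming that 5% cut-off (inferred from the table, not documented by the bot), the tally can be sketched as follows. `tally` and `THRESHOLD_PCT` are hypothetical names used only for illustration.

```python
# Hypothetical re-implementation of the Improved/Worsened tally.
# ASSUMPTION: the 5% cut-off is inferred from which rows the bot prints in
# bold; it is not documented anywhere in this thread.
THRESHOLD_PCT = 5.0


def tally(changes_pct, threshold=THRESHOLD_PCT):
    """Return (improved, worsened) counts for Change values given in percent."""
    improved = sum(1 for c in changes_pct if c > threshold)
    worsened = sum(1 for c in changes_pct if c < -threshold)
    return improved, worsened


# A handful of Change values taken from the CPU results below.
sample = [2.92, 5.07, 6.52, -5.23, -8.61, -4.90, 201.72, -16.80]
print(tally(sample))  # (3, 3)
```

Under this reading, a small regression such as `test_serialize_model_filesystem` (-4.90%) is listed in the table but is not counted as Worsened.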

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 52.7700μs | 21.3498μs | 46.8389 KOps/s | 45.5113 KOps/s | $\color{#35bf28}+2.92\%$ |
| test_plain_set_stack_nested | 60.6140μs | 21.4370μs | 46.6484 KOps/s | 45.2533 KOps/s | $\color{#35bf28}+3.08\%$ |
| test_plain_set_nested_inplace | 90.9430μs | 23.4068μs | 42.7225 KOps/s | 41.3580 KOps/s | $\color{#35bf28}+3.30\%$ |
| test_plain_set_stack_nested_inplace | 61.6760μs | 23.2854μs | 42.9454 KOps/s | 41.2815 KOps/s | $\color{#35bf28}+4.03\%$ |
| test_items | 47.7970μs | 4.1714μs | 239.7261 KOps/s | 239.1481 KOps/s | $\color{#35bf28}+0.24\%$ |
| test_items_nested | 0.5076ms | 0.3373ms | 2.9652 KOps/s | 2.9092 KOps/s | $\color{#35bf28}+1.92\%$ |
| test_items_nested_locked | 0.5369ms | 0.3377ms | 2.9613 KOps/s | 2.8985 KOps/s | $\color{#35bf28}+2.17\%$ |
| test_items_nested_leaf | 0.1289ms | 72.8083μs | 13.7347 KOps/s | 14.1118 KOps/s | $\color{#d91a1a}-2.67\%$ |
| test_items_stack_nested | 0.4945ms | 0.3415ms | 2.9286 KOps/s | 2.9475 KOps/s | $\color{#d91a1a}-0.64\%$ |
| test_items_stack_nested_leaf | 0.1401ms | 74.2277μs | 13.4721 KOps/s | 13.4869 KOps/s | $\color{#d91a1a}-0.11\%$ |
| test_items_stack_nested_locked | 0.4859ms | 0.3400ms | 2.9412 KOps/s | 2.9376 KOps/s | $\color{#35bf28}+0.12\%$ |
| test_keys | 29.0740μs | 3.6197μs | 276.2641 KOps/s | 287.9807 KOps/s | $\color{#d91a1a}-4.07\%$ |
| test_keys_nested | 0.2461ms | 0.1343ms | 7.4485 KOps/s | 7.3411 KOps/s | $\color{#35bf28}+1.46\%$ |
| test_keys_nested_locked | 1.8930ms | 0.1404ms | 7.1210 KOps/s | 7.1592 KOps/s | $\color{#d91a1a}-0.53\%$ |
| test_keys_nested_leaf | 0.2328ms | 0.1167ms | 8.5696 KOps/s | 8.5605 KOps/s | $\color{#35bf28}+0.11\%$ |
| test_keys_stack_nested | 0.2306ms | 0.1356ms | 7.3772 KOps/s | 7.0212 KOps/s | $\textbf{\color{#35bf28}+5.07\%}$ |
| test_keys_stack_nested_leaf | 0.2045ms | 0.1165ms | 8.5823 KOps/s | 8.3750 KOps/s | $\color{#35bf28}+2.47\%$ |
| test_keys_stack_nested_locked | 0.2705ms | 0.1410ms | 7.0927 KOps/s | 7.0677 KOps/s | $\color{#35bf28}+0.35\%$ |
| test_values | 6.2656μs | 1.0388μs | 962.6516 KOps/s | 957.3998 KOps/s | $\color{#35bf28}+0.55\%$ |
| test_values_nested | 0.1044ms | 54.6809μs | 18.2879 KOps/s | 17.9475 KOps/s | $\color{#35bf28}+1.90\%$ |
| test_values_nested_locked | 0.1296ms | 55.2987μs | 18.0836 KOps/s | 17.7705 KOps/s | $\color{#35bf28}+1.76\%$ |
| test_values_nested_leaf | 0.1137ms | 59.0823μs | 16.9255 KOps/s | 16.4537 KOps/s | $\color{#35bf28}+2.87\%$ |
| test_values_stack_nested | 0.1123ms | 55.3114μs | 18.0795 KOps/s | 17.6943 KOps/s | $\color{#35bf28}+2.18\%$ |
| test_values_stack_nested_leaf | 0.1109ms | 60.6269μs | 16.4943 KOps/s | 16.3202 KOps/s | $\color{#35bf28}+1.07\%$ |
| test_values_stack_nested_locked | 0.1185ms | 55.7894μs | 17.9246 KOps/s | 17.6548 KOps/s | $\color{#35bf28}+1.53\%$ |
| test_membership | 38.7830μs | 0.8800μs | 1.1364 MOps/s | 1.1259 MOps/s | $\color{#35bf28}+0.93\%$ |
| test_membership_nested | 30.8470μs | 2.7469μs | 364.0520 KOps/s | 370.7699 KOps/s | $\color{#d91a1a}-1.81\%$ |
| test_membership_nested_leaf | 42.1960μs | 2.6973μs | 370.7358 KOps/s | 368.8955 KOps/s | $\color{#35bf28}+0.50\%$ |
| test_membership_stacked_nested | 23.2630μs | 2.7071μs | 369.3955 KOps/s | 368.4133 KOps/s | $\color{#35bf28}+0.27\%$ |
| test_membership_stacked_nested_leaf | 29.8060μs | 2.7513μs | 363.4640 KOps/s | 365.2951 KOps/s | $\color{#d91a1a}-0.50\%$ |
| test_membership_nested_last | 26.6700μs | 4.0347μs | 247.8497 KOps/s | 245.8219 KOps/s | $\color{#35bf28}+0.82\%$ |
| test_membership_nested_leaf_last | 40.5570μs | 4.0348μs | 247.8453 KOps/s | 246.4758 KOps/s | $\color{#35bf28}+0.56\%$ |
| test_membership_stacked_nested_last | 30.2570μs | 3.9473μs | 253.3406 KOps/s | 244.3442 KOps/s | $\color{#35bf28}+3.68\%$ |
| test_membership_stacked_nested_leaf_last | 36.3480μs | 4.0429μs | 247.3446 KOps/s | 246.7964 KOps/s | $\color{#35bf28}+0.22\%$ |
| test_nested_getleaf | 37.6200μs | 10.4903μs | 95.3259 KOps/s | 94.8186 KOps/s | $\color{#35bf28}+0.54\%$ |
| test_nested_get | 40.8470μs | 10.1102μs | 98.9104 KOps/s | 99.2612 KOps/s | $\color{#d91a1a}-0.35\%$ |
| test_stacked_getleaf | 37.4500μs | 10.4647μs | 95.5592 KOps/s | 94.0848 KOps/s | $\color{#35bf28}+1.57\%$ |
| test_stacked_get | 30.5070μs | 9.8839μs | 101.1746 KOps/s | 99.7601 KOps/s | $\color{#35bf28}+1.42\%$ |
| test_nested_getitemleaf | 35.7070μs | 10.9360μs | 91.4411 KOps/s | 90.7887 KOps/s | $\color{#35bf28}+0.72\%$ |
| test_nested_getitem | 37.4700μs | 10.2185μs | 97.8617 KOps/s | 97.3178 KOps/s | $\color{#35bf28}+0.56\%$ |
| test_stacked_getitemleaf | 54.7220μs | 10.9220μs | 91.5586 KOps/s | 92.1509 KOps/s | $\color{#d91a1a}-0.64\%$ |
| test_stacked_getitem | 41.3580μs | 10.1436μs | 98.5846 KOps/s | 97.1540 KOps/s | $\color{#35bf28}+1.47\%$ |
| test_lock_nested | 2.5647ms | 0.4979ms | 2.0083 KOps/s | 1.9956 KOps/s | $\color{#35bf28}+0.63\%$ |
| test_lock_stack_nested | 0.6986ms | 0.4622ms | 2.1636 KOps/s | 2.1488 KOps/s | $\color{#35bf28}+0.69\%$ |
| test_unlock_nested | 1.0329ms | 0.4078ms | 2.4522 KOps/s | 2.3986 KOps/s | $\color{#35bf28}+2.23\%$ |
| test_unlock_stack_nested | 0.8078ms | 0.3816ms | 2.6207 KOps/s | 2.6046 KOps/s | $\color{#35bf28}+0.62\%$ |
| test_flatten_speed | 0.1691ms | 91.7035μs | 10.9047 KOps/s | 11.0087 KOps/s | $\color{#d91a1a}-0.94\%$ |
| test_unflatten_speed | 0.8520ms | 0.4826ms | 2.0721 KOps/s | 2.1161 KOps/s | $\color{#d91a1a}-2.08\%$ |
| test_common_ops | 4.1372ms | 1.1507ms | 869.0160 Ops/s | 859.5028 Ops/s | $\color{#35bf28}+1.11\%$ |
| test_creation | 62.9480μs | 2.0626μs | 484.8191 KOps/s | 482.6117 KOps/s | $\color{#35bf28}+0.46\%$ |
| test_creation_empty | 57.8790μs | 17.5146μs | 57.0953 KOps/s | 53.6013 KOps/s | $\textbf{\color{#35bf28}+6.52\%}$ |
| test_creation_nested_1 | 77.7260μs | 21.0648μs | 47.4727 KOps/s | 45.3447 KOps/s | $\color{#35bf28}+4.69\%$ |
| test_creation_nested_2 | 91.6420μs | 25.3182μs | 39.4974 KOps/s | 37.8743 KOps/s | $\color{#35bf28}+4.29\%$ |
| test_clone | 0.1067ms | 17.6867μs | 56.5397 KOps/s | 57.3665 KOps/s | $\color{#d91a1a}-1.44\%$ |
| test_getitem[int] | 0.9569ms | 17.2118μs | 58.0998 KOps/s | 59.3272 KOps/s | $\color{#d91a1a}-2.07\%$ |
| test_getitem[slice_int] | 0.1366ms | 32.6996μs | 30.5814 KOps/s | 32.2690 KOps/s | $\textbf{\color{#d91a1a}-5.23\%}$ |
| test_getitem[range] | 0.1924ms | 58.0626μs | 17.2228 KOps/s | 17.1845 KOps/s | $\color{#35bf28}+0.22\%$ |
| test_getitem[tuple] | 0.1562ms | 26.1283μs | 38.2727 KOps/s | 39.5407 KOps/s | $\color{#d91a1a}-3.21\%$ |
| test_getitem[list] | 0.1836ms | 53.9982μs | 18.5191 KOps/s | 18.8065 KOps/s | $\color{#d91a1a}-1.53\%$ |
| test_setitem_dim[int] | 70.7230μs | 36.1728μs | 27.6451 KOps/s | 30.2487 KOps/s | $\textbf{\color{#d91a1a}-8.61\%}$ |
| test_setitem_dim[slice_int] | 0.1464ms | 66.9378μs | 14.9392 KOps/s | 16.1273 KOps/s | $\textbf{\color{#d91a1a}-7.37\%}$ |
| test_setitem_dim[range] | 0.1397ms | 85.3207μs | 11.7205 KOps/s | 11.4994 KOps/s | $\color{#35bf28}+1.92\%$ |
| test_setitem_dim[tuple] | 0.1102ms | 53.8334μs | 18.5758 KOps/s | 20.3316 KOps/s | $\textbf{\color{#d91a1a}-8.64\%}$ |
| test_setitem | 0.1012ms | 32.2468μs | 31.0108 KOps/s | 33.1900 KOps/s | $\textbf{\color{#d91a1a}-6.57\%}$ |
| test_set | 0.2612ms | 31.4433μs | 31.8033 KOps/s | 33.2910 KOps/s | $\color{#d91a1a}-4.47\%$ |
| test_set_shared | 1.0993ms | 0.2171ms | 4.6069 KOps/s | 4.5598 KOps/s | $\color{#35bf28}+1.03\%$ |
| test_update | 0.1440ms | 38.4606μs | 26.0006 KOps/s | 26.0936 KOps/s | $\color{#d91a1a}-0.36\%$ |
| test_update_nested | 0.2163ms | 49.8467μs | 20.0615 KOps/s | 20.7593 KOps/s | $\color{#d91a1a}-3.36\%$ |
| test_update__nested | 0.7100ms | 43.8440μs | 22.8082 KOps/s | 24.2517 KOps/s | $\textbf{\color{#d91a1a}-5.95\%}$ |
| test_set_nested | 0.2035ms | 34.2588μs | 29.1896 KOps/s | 30.1645 KOps/s | $\color{#d91a1a}-3.23\%$ |
| test_set_nested_new | 0.2742ms | 39.2791μs | 25.4589 KOps/s | 26.0471 KOps/s | $\color{#d91a1a}-2.26\%$ |
| test_select | 0.2571ms | 56.5874μs | 17.6718 KOps/s | 18.0365 KOps/s | $\color{#d91a1a}-2.02\%$ |
| test_select_nested | 0.1403ms | 59.4687μs | 16.8156 KOps/s | 16.8673 KOps/s | $\color{#d91a1a}-0.31\%$ |
| test_exclude_nested | 0.1620ms | 75.3009μs | 13.2800 KOps/s | 13.3609 KOps/s | $\color{#d91a1a}-0.61\%$ |
| test_empty[True] | 0.4506ms | 0.3517ms | 2.8431 KOps/s | 2.8970 KOps/s | $\color{#d91a1a}-1.86\%$ |
| test_empty[False] | 11.4030μs | 1.2052μs | 829.7348 KOps/s | 805.1564 KOps/s | $\color{#35bf28}+3.05\%$ |
| test_unbind_speed | 0.4102ms | 0.3038ms | 3.2921 KOps/s | 3.2691 KOps/s | $\color{#35bf28}+0.70\%$ |
| test_unbind_speed_stack0 | 0.3842ms | 0.3030ms | 3.3006 KOps/s | 3.3373 KOps/s | $\color{#d91a1a}-1.10\%$ |
| test_unbind_speed_stack1 | 0.1026s | 0.7708ms | 1.2974 KOps/s | 1.2920 KOps/s | $\color{#35bf28}+0.41\%$ |
| test_split | 0.1068s | 2.4504ms | 408.0896 Ops/s | 454.5300 Ops/s | $\textbf{\color{#d91a1a}-10.22\%}$ |
| test_chunk | 3.2080ms | 2.0294ms | 492.7498 Ops/s | 498.9028 Ops/s | $\color{#d91a1a}-1.23\%$ |
| test_creation[device0] | 0.2293ms | 0.1175ms | 8.5100 KOps/s | 8.1301 KOps/s | $\color{#35bf28}+4.67\%$ |
| test_creation_from_tensor | 3.9928ms | 0.1210ms | 8.2646 KOps/s | 8.3394 KOps/s | $\color{#d91a1a}-0.90\%$ |
| test_add_one[memmap_tensor0] | 0.3872ms | 7.6428μs | 130.8426 KOps/s | 131.4314 KOps/s | $\color{#d91a1a}-0.45\%$ |
| test_contiguous[memmap_tensor0] | 17.3330μs | 1.9065μs | 524.5139 KOps/s | 511.2395 KOps/s | $\color{#35bf28}+2.60\%$ |
| test_stack[memmap_tensor0] | 85.3610μs | 5.9678μs | 167.5651 KOps/s | 169.8358 KOps/s | $\color{#d91a1a}-1.34\%$ |
| test_memmaptd_index | 0.7262ms | 0.4225ms | 2.3668 KOps/s | 2.4027 KOps/s | $\color{#d91a1a}-1.49\%$ |
| test_memmaptd_index_astensor | 0.8498ms | 0.5069ms | 1.9728 KOps/s | 2.0067 KOps/s | $\color{#d91a1a}-1.69\%$ |
| test_memmaptd_index_op | 2.0093ms | 1.0620ms | 941.6448 Ops/s | 944.0042 Ops/s | $\color{#d91a1a}-0.25\%$ |
| test_serialize_model | 0.1248s | 0.1199s | 8.3382 Ops/s | 8.1814 Ops/s | $\color{#35bf28}+1.92\%$ |
| test_serialize_model_pickle | 0.4311s | 0.3880s | 2.5773 Ops/s | 2.5602 Ops/s | $\color{#35bf28}+0.67\%$ |
| test_serialize_weights | 0.2279s | 0.1305s | 7.6612 Ops/s | 8.2693 Ops/s | $\textbf{\color{#d91a1a}-7.35\%}$ |
| test_serialize_weights_returnearly | 0.1915s | 0.1626s | 6.1505 Ops/s | 6.2774 Ops/s | $\color{#d91a1a}-2.02\%$ |
| test_serialize_weights_pickle | 0.5599s | 0.4308s | 2.3212 Ops/s | 2.4753 Ops/s | $\textbf{\color{#d91a1a}-6.23\%}$ |
| test_serialize_weights_filesystem | 0.1429s | 0.1381s | 7.2397 Ops/s | 7.0682 Ops/s | $\color{#35bf28}+2.43\%$ |
| test_serialize_model_filesystem | 0.2506s | 0.1616s | 6.1880 Ops/s | 6.5069 Ops/s | $\color{#d91a1a}-4.90\%$ |
| test_reshape_pytree | 0.1147ms | 39.9096μs | 25.0566 KOps/s | 24.4813 KOps/s | $\color{#35bf28}+2.35\%$ |
| test_reshape_td | 0.1131ms | 46.7703μs | 21.3811 KOps/s | 21.2679 KOps/s | $\color{#35bf28}+0.53\%$ |
| test_view_pytree | 0.1281ms | 39.9525μs | 25.0297 KOps/s | 25.5625 KOps/s | $\color{#d91a1a}-2.08\%$ |
| test_view_td | 0.1690ms | 53.6185μs | 18.6503 KOps/s | 19.1244 KOps/s | $\color{#d91a1a}-2.48\%$ |
| test_unbind_pytree | 81.3720μs | 37.3211μs | 26.7945 KOps/s | 26.9805 KOps/s | $\color{#d91a1a}-0.69\%$ |
| test_unbind_td | 0.3026ms | 46.3132μs | 21.5921 KOps/s | 21.4776 KOps/s | $\color{#35bf28}+0.53\%$ |
| test_split_pytree | 0.1498ms | 39.3675μs | 25.4016 KOps/s | 25.9525 KOps/s | $\color{#d91a1a}-2.12\%$ |
| test_split_td | 0.4848ms | 58.8896μs | 16.9809 KOps/s | 17.2569 KOps/s | $\color{#d91a1a}-1.60\%$ |
| test_add_pytree | 0.1384ms | 47.5741μs | 21.0199 KOps/s | 21.5140 KOps/s | $\color{#d91a1a}-2.30\%$ |
| test_add_td | 0.1520ms | 85.2678μs | 11.7277 KOps/s | 11.7497 KOps/s | $\color{#d91a1a}-0.19\%$ |
| test_compile_add_one_nested[tensordict-compile] | 0.1741ms | 73.4813μs | 13.6089 KOps/s | 13.6591 KOps/s | $\color{#d91a1a}-0.37\%$ |
| test_compile_add_one_nested[tensordict-eager] | 0.4358ms | 0.1890ms | 5.2905 KOps/s | 5.2354 KOps/s | $\color{#35bf28}+1.05\%$ |
| test_compile_add_one_nested[pytree-compile] | 0.1402ms | 56.4895μs | 17.7024 KOps/s | 17.4598 KOps/s | $\color{#35bf28}+1.39\%$ |
| test_compile_add_one_nested[pytree-eager] | 0.3515ms | 0.1489ms | 6.7140 KOps/s | 6.7149 KOps/s | $\color{#d91a1a}-0.01\%$ |
| test_compile_copy_nested[tensordict-compile] | 94.4470μs | 26.8642μs | 37.2243 KOps/s | 37.8087 KOps/s | $\color{#d91a1a}-1.55\%$ |
| test_compile_copy_nested[tensordict-eager] | 0.1601ms | 70.7396μs | 14.1364 KOps/s | 13.8118 KOps/s | $\color{#35bf28}+2.35\%$ |
| test_compile_copy_nested[pytree-compile] | 0.1665ms | 78.6117μs | 12.7208 KOps/s | 12.9291 KOps/s | $\color{#d91a1a}-1.61\%$ |
| test_compile_copy_nested[pytree-eager] | 0.1311ms | 66.8699μs | 14.9544 KOps/s | 15.0411 KOps/s | $\color{#d91a1a}-0.58\%$ |
| test_compile_add_one_flat[tensordict-compile] | 0.2448ms | 0.1167ms | 8.5683 KOps/s | 8.4650 KOps/s | $\color{#35bf28}+1.22\%$ |
| test_compile_add_one_flat[tensordict-eager] | 0.3834ms | 0.2083ms | 4.8004 KOps/s | 4.7490 KOps/s | $\color{#35bf28}+1.08\%$ |
| test_compile_add_one_flat[tensorclass-compile] | 0.1352ms | 55.1859μs | 18.1206 KOps/s | 18.1117 KOps/s | $\color{#35bf28}+0.05\%$ |
| test_compile_add_one_flat[tensorclass-eager] | 0.4577ms | 71.1315μs | 14.0585 KOps/s | 14.0641 KOps/s | $\color{#d91a1a}-0.04\%$ |
| test_compile_add_one_flat[pytree-compile] | 0.2297ms | 0.1140ms | 8.7725 KOps/s | 8.6984 KOps/s | $\color{#35bf28}+0.85\%$ |
| test_compile_add_one_flat[pytree-eager] | 0.5237ms | 0.3039ms | 3.2904 KOps/s | 3.2345 KOps/s | $\color{#35bf28}+1.73\%$ |
| test_compile_add_self_flat[tensordict-eager] | 0.3207ms | 0.2221ms | 4.5019 KOps/s | 4.4826 KOps/s | $\color{#35bf28}+0.43\%$ |
| test_compile_add_self_flat[tensordict-compile] | 0.4530ms | 0.1227ms | 8.1516 KOps/s | 8.5635 KOps/s | $\color{#d91a1a}-4.81\%$ |
| test_compile_add_self_flat[tensorclass-eager] | 0.2636ms | 63.7550μs | 15.6851 KOps/s | 15.6218 KOps/s | $\color{#35bf28}+0.40\%$ |
| test_compile_add_self_flat[tensorclass-compile] | 0.3012ms | 57.9639μs | 17.2521 KOps/s | 17.9690 KOps/s | $\color{#d91a1a}-3.99\%$ |
| test_compile_add_self_flat[pytree-eager] | 0.4798ms | 0.2450ms | 4.0815 KOps/s | 4.0004 KOps/s | $\color{#35bf28}+2.03\%$ |
| test_compile_add_self_flat[pytree-compile] | 0.2082ms | 0.1165ms | 8.5848 KOps/s | 8.6976 KOps/s | $\color{#d91a1a}-1.30\%$ |
| test_compile_copy_flat[tensordict-compile] | 79.2990μs | 20.9754μs | 47.6750 KOps/s | 45.1720 KOps/s | $\textbf{\color{#35bf28}+5.54\%}$ |
| test_compile_copy_flat[tensordict-eager] | 0.2714ms | 62.1707μs | 16.0848 KOps/s | 16.4721 KOps/s | $\color{#d91a1a}-2.35\%$ |
| test_compile_copy_flat[pytree-compile] | 0.2043ms | 79.3086μs | 12.6090 KOps/s | 12.7463 KOps/s | $\color{#d91a1a}-1.08\%$ |
| test_compile_copy_flat[pytree-eager] | 0.1463ms | 67.8056μs | 14.7480 KOps/s | 14.8415 KOps/s | $\color{#d91a1a}-0.63\%$ |
| test_compile_assign_and_add[tensordict-compile] | 0.4236ms | 0.2223ms | 4.4985 KOps/s | 4.3617 KOps/s | $\color{#35bf28}+3.14\%$ |
| test_compile_assign_and_add[tensordict-eager] | 1.9089ms | 1.7516ms | 570.9108 Ops/s | 567.4521 Ops/s | $\color{#35bf28}+0.61\%$ |
| test_compile_assign_and_add[pytree-compile] | 0.3225ms | 0.2130ms | 4.6954 KOps/s | 4.6883 KOps/s | $\color{#35bf28}+0.15\%$ |
| test_compile_assign_and_add[pytree-eager] | 2.0302ms | 1.1962ms | 835.9500 Ops/s | 838.9149 Ops/s | $\color{#d91a1a}-0.35\%$ |
| test_compile_assign_and_add_stack[compile] | 0.5932ms | 0.4805ms | 2.0812 KOps/s | 2.0251 KOps/s | $\color{#35bf28}+2.77\%$ |
| test_compile_assign_and_add_stack[eager] | 4.4324ms | 4.0507ms | 246.8730 Ops/s | 238.3549 Ops/s | $\color{#35bf28}+3.57\%$ |
| test_compile_indexing[tensor-tensordict-compile] | 0.2284ms | 47.2427μs | 21.1673 KOps/s | 21.9398 KOps/s | $\color{#d91a1a}-3.52\%$ |
| test_compile_indexing[tensor-tensordict-eager] | 0.5838ms | 50.1144μs | 19.9543 KOps/s | 19.3062 KOps/s | $\color{#35bf28}+3.36\%$ |
| test_compile_indexing[tensor-tensorclass-compile] | 97.1720μs | 37.7054μs | 26.5214 KOps/s | 26.1572 KOps/s | $\color{#35bf28}+1.39\%$ |
| test_compile_indexing[tensor-tensorclass-eager] | 0.1139ms | 30.1385μs | 33.1802 KOps/s | 33.5233 KOps/s | $\color{#d91a1a}-1.02\%$ |
| test_compile_indexing[tensor-pytree-compile] | 97.8130μs | 38.8773μs | 25.7220 KOps/s | 25.9587 KOps/s | $\color{#d91a1a}-0.91\%$ |
| test_compile_indexing[tensor-pytree-eager] | 0.1024ms | 30.2084μs | 33.1034 KOps/s | 33.8820 KOps/s | $\color{#d91a1a}-2.30\%$ |
| test_compile_indexing[slice-tensordict-compile] | 0.1548ms | 80.5469μs | 12.4151 KOps/s | 12.6049 KOps/s | $\color{#d91a1a}-1.51\%$ |
| test_compile_indexing[slice-tensordict-eager] | 0.5399ms | 29.6832μs | 33.6891 KOps/s | 33.7508 KOps/s | $\color{#d91a1a}-0.18\%$ |
| test_compile_indexing[slice-tensorclass-compile] | 0.3993ms | 72.7039μs | 13.7544 KOps/s | 13.9528 KOps/s | $\color{#d91a1a}-1.42\%$ |
| test_compile_indexing[slice-tensorclass-eager] | 78.8180μs | 24.3008μs | 41.1509 KOps/s | 41.9514 KOps/s | $\color{#d91a1a}-1.91\%$ |
| test_compile_indexing[slice-pytree-compile] | 0.1685ms | 72.1942μs | 13.8515 KOps/s | 13.8266 KOps/s | $\color{#35bf28}+0.18\%$ |
| test_compile_indexing[slice-pytree-eager] | 76.8540μs | 24.4373μs | 40.9211 KOps/s | 41.7667 KOps/s | $\color{#d91a1a}-2.02\%$ |
| test_compile_indexing[int-tensordict-compile] | 0.2059ms | 81.7619μs | 12.2306 KOps/s | 12.5123 KOps/s | $\color{#d91a1a}-2.25\%$ |
| test_compile_indexing[int-tensordict-eager] | 0.8121ms | 29.4231μs | 33.9869 KOps/s | 34.3529 KOps/s | $\color{#d91a1a}-1.07\%$ |
| test_compile_indexing[int-tensorclass-compile] | 0.1462ms | 72.5472μs | 13.7841 KOps/s | 13.9254 KOps/s | $\color{#d91a1a}-1.01\%$ |
| test_compile_indexing[int-tensorclass-eager] | 90.3300μs | 24.1322μs | 41.4384 KOps/s | 42.6434 KOps/s | $\color{#d91a1a}-2.83\%$ |
| test_compile_indexing[int-pytree-compile] | 0.1728ms | 73.1764μs | 13.6656 KOps/s | 14.0023 KOps/s | $\color{#d91a1a}-2.40\%$ |
| test_compile_indexing[int-pytree-eager] | 86.3910μs | 24.2779μs | 41.1897 KOps/s | 42.2670 KOps/s | $\color{#d91a1a}-2.55\%$ |
| test_mod_add[eager] | 95.2580μs | 28.7402μs | 34.7945 KOps/s | 36.6642 KOps/s | $\textbf{\color{#d91a1a}-5.10\%}$ |
| test_mod_add[compile] | 0.1169ms | 45.6836μs | 21.8897 KOps/s | 21.7483 KOps/s | $\color{#35bf28}+0.65\%$ |
| test_mod_add[compile-overhead] | 0.1153ms | 46.3196μs | 21.5891 KOps/s | 22.9007 KOps/s | $\textbf{\color{#d91a1a}-5.73\%}$ |
| test_mod_wrap[eager] | 0.3374ms | 0.2159ms | 4.6322 KOps/s | 4.6563 KOps/s | $\color{#d91a1a}-0.52\%$ |
| test_mod_wrap[compile] | 2.0059ms | 0.2128ms | 4.6982 KOps/s | 4.8667 KOps/s | $\color{#d91a1a}-3.46\%$ |
| test_mod_wrap[compile-overhead] | 1.7880ms | 0.2071ms | 4.8277 KOps/s | 4.8913 KOps/s | $\color{#d91a1a}-1.30\%$ |
| test_mod_wrap_and_backward[eager] | 12.6678ms | 11.3156ms | 88.3734 Ops/s | 92.2471 Ops/s | $\color{#d91a1a}-4.20\%$ |
| test_mod_wrap_and_backward[compile] | 17.3592ms | 13.1120ms | 76.2657 Ops/s | 91.6617 Ops/s | $\textbf{\color{#d91a1a}-16.80\%}$ |
| test_mod_wrap_and_backward[compile-overhead] | 17.0550ms | 13.0727ms | 76.4954 Ops/s | 89.2558 Ops/s | $\textbf{\color{#d91a1a}-14.30\%}$ |
| test_seq_add[eager] | 0.2039ms | 96.0863μs | 10.4073 KOps/s | 10.8356 KOps/s | $\color{#d91a1a}-3.95\%$ |
| test_seq_add[compile] | 0.2613ms | 60.8919μs | 16.4225 KOps/s | 16.0426 KOps/s | $\color{#35bf28}+2.37\%$ |
| test_seq_add[compile-overhead] | 0.1631ms | 58.8380μs | 16.9958 KOps/s | 16.6527 KOps/s | $\color{#35bf28}+2.06\%$ |
| test_seq_wrap[eager] | 0.6759ms | 0.4011ms | 2.4931 KOps/s | 2.5206 KOps/s | $\color{#d91a1a}-1.09\%$ |
| test_seq_wrap[compile] | 0.3534ms | 0.2277ms | 4.3916 KOps/s | 4.3910 KOps/s | $\color{#35bf28}+0.01\%$ |
| test_seq_wrap[compile-overhead] | 0.4089ms | 0.2265ms | 4.4144 KOps/s | 4.4294 KOps/s | $\color{#d91a1a}-0.34\%$ |
| test_func_call_runtime[False-eager] | 0.9423ms | 0.5611ms | 1.7823 KOps/s | 1.7782 KOps/s | $\color{#35bf28}+0.23\%$ |
| test_func_call_runtime[False-compile] | 0.8769ms | 0.4346ms | 2.3011 KOps/s | 2.2903 KOps/s | $\color{#35bf28}+0.47\%$ |
| test_func_call_runtime[False-compile-overhead] | 0.5773ms | 0.4322ms | 2.3137 KOps/s | 2.2796 KOps/s | $\color{#35bf28}+1.50\%$ |
| test_func_call_runtime[True-eager] | 0.8842ms | 0.7612ms | 1.3138 KOps/s | 1.3295 KOps/s | $\color{#d91a1a}-1.18\%$ |
| test_func_call_runtime[True-compile] | 0.9368ms | 0.4745ms | 2.1076 KOps/s | 2.1168 KOps/s | $\color{#d91a1a}-0.43\%$ |
| test_func_call_runtime[True-compile-overhead] | 0.8330ms | 0.4745ms | 2.1076 KOps/s | 2.0724 KOps/s | $\color{#35bf28}+1.70\%$ |
| test_func_call_cm_runtime[False-eager] | 0.8226ms | 0.5505ms | 1.8166 KOps/s | 1.8065 KOps/s | $\color{#35bf28}+0.56\%$ |
| test_func_call_cm_runtime[False-compile] | 0.9282ms | 0.4297ms | 2.3274 KOps/s | 2.2955 KOps/s | $\color{#35bf28}+1.39\%$ |
| test_func_call_cm_runtime[False-compile-overhead] | 1.0103ms | 0.4283ms | 2.3348 KOps/s | 2.2557 KOps/s | $\color{#35bf28}+3.50\%$ |
| test_func_call_cm_runtime[True-eager] | 1.4042ms | 0.8974ms | 1.1143 KOps/s | 1.1021 KOps/s | $\color{#35bf28}+1.11\%$ |
| test_func_call_cm_runtime[True-compile] | 0.6126ms | 0.5002ms | 1.9993 KOps/s | 1.9661 KOps/s | $\color{#35bf28}+1.69\%$ |
| test_func_call_cm_runtime[True-compile-overhead] | 0.6650ms | 0.5017ms | 1.9932 KOps/s | 1.9697 KOps/s | $\color{#35bf28}+1.19\%$ |
| test_vmap_func_call_cm_runtime[eager] | 2.9366ms | 1.9373ms | 516.1750 Ops/s | 523.5649 Ops/s | $\color{#d91a1a}-1.41\%$ |
| test_vmap_func_call_cm_runtime[compile] | 0.8850ms | 0.5292ms | 1.8897 KOps/s | 1.8477 KOps/s | $\color{#35bf28}+2.27\%$ |
| test_vmap_func_call_cm_runtime[compile-overhead] | 1.0044ms | 0.5333ms | 1.8750 KOps/s | 1.8638 KOps/s | $\color{#35bf28}+0.60\%$ |
| test_distributed | 0.3148ms | 0.1273ms | 7.8575 KOps/s | 7.7415 KOps/s | $\color{#35bf28}+1.50\%$ |
| test_tdmodule | 33.3420μs | 18.9494μs | 52.7722 KOps/s | 51.2773 KOps/s | $\color{#35bf28}+2.92\%$ |
| test_tdmodule_dispatch | 69.7010μs | 37.3817μs | 26.7511 KOps/s | 26.3767 KOps/s | $\color{#35bf28}+1.42\%$ |
| test_tdseq | 36.5190μs | 21.0315μs | 47.5477 KOps/s | 45.8538 KOps/s | $\color{#35bf28}+3.69\%$ |
| test_tdseq_dispatch | 80.9320μs | 43.5050μs | 22.9859 KOps/s | 23.2778 KOps/s | $\color{#d91a1a}-1.25\%$ |
| test_instantiation_functorch | 3.0770ms | 1.5668ms | 638.2273 Ops/s | 640.8898 Ops/s | $\color{#d91a1a}-0.42\%$ |
| test_exec_functorch | 0.3148ms | 0.1790ms | 5.5881 KOps/s | 5.4091 KOps/s | $\color{#35bf28}+3.31\%$ |
| test_exec_functional_call | 0.3922ms | 0.1745ms | 5.7320 KOps/s | 5.7956 KOps/s | $\color{#d91a1a}-1.10\%$ |
| test_exec_td_decorator | 0.5475ms | 0.2343ms | 4.2686 KOps/s | 4.3350 KOps/s | $\color{#d91a1a}-1.53\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 0.8436ms | 0.6525ms | 1.5325 KOps/s | 1.5291 KOps/s | $\color{#35bf28}+0.22\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.9565ms | 0.6506ms | 1.5371 KOps/s | 1.5115 KOps/s | $\color{#35bf28}+1.69\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.7427ms | 0.5393ms | 1.8542 KOps/s | 1.8692 KOps/s | $\color{#d91a1a}-0.81\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.8272ms | 0.5342ms | 1.8720 KOps/s | 1.8767 KOps/s | $\color{#d91a1a}-0.25\%$ |
| test_to_module_speed[True] | 2.0831ms | 1.3144ms | 760.7835 Ops/s | 764.6332 Ops/s | $\color{#d91a1a}-0.50\%$ |
| test_to_module_speed[False] | 1.4266ms | 1.2596ms | 793.9206 Ops/s | 793.4624 Ops/s | $\color{#35bf28}+0.06\%$ |
| test_tc_init | 0.1084ms | 42.8521μs | 23.3361 KOps/s | 21.7879 KOps/s | $\textbf{\color{#35bf28}+7.11\%}$ |
| test_tc_init_nested | 0.1561ms | 86.6237μs | 11.5442 KOps/s | 10.9373 KOps/s | $\textbf{\color{#35bf28}+5.55\%}$ |
| test_tc_first_layer_tensor | 18.3350μs | 1.5116μs | 661.5564 KOps/s | 668.6878 KOps/s | $\color{#d91a1a}-1.07\%$ |
| test_tc_first_layer_nontensor | 44.1630μs | 4.6568μs | 214.7399 KOps/s | 216.8558 KOps/s | $\color{#d91a1a}-0.98\%$ |
| test_tc_second_layer_tensor | 39.5540μs | 2.7857μs | 358.9722 KOps/s | 358.9093 KOps/s | $\color{#35bf28}+0.02\%$ |
| test_tc_second_layer_nontensor | 51.2670μs | 5.9786μs | 167.2641 KOps/s | 170.0280 KOps/s | $\color{#d91a1a}-1.63\%$ |
| test_unbind | 0.2323s | 13.7159ms | 72.9079 Ops/s | 82.5548 Ops/s | $\textbf{\color{#d91a1a}-11.69\%}$ |
| test_full_like | 8.1612ms | 7.4123ms | 134.9113 Ops/s | 86.5184 Ops/s | $\textbf{\color{#35bf28}+55.93\%}$ |
| test_zeros_like | 3.1404ms | 2.7521ms | 363.3650 Ops/s | 120.4319 Ops/s | $\textbf{\color{#35bf28}+201.72\%}$ |
| test_ones_like | 3.9523ms | 3.3166ms | 301.5146 Ops/s | 126.9755 Ops/s | $\textbf{\color{#35bf28}+137.46\%}$ |
| test_clone | 6.0884ms | 5.3209ms | 187.9378 Ops/s | 105.1245 Ops/s | $\textbf{\color{#35bf28}+78.78\%}$ |
| test_squeeze | 61.6760μs | 12.0885μs | 82.7229 KOps/s | 85.1225 KOps/s | $\color{#d91a1a}-2.82\%$ |
| test_unsqueeze | 0.2074ms | 87.5235μs | 11.4255 KOps/s | 11.1479 KOps/s | $\color{#35bf28}+2.49\%$ |
| test_split | 0.5111ms | 0.1940ms | 5.1540 KOps/s | 5.3073 KOps/s | $\color{#d91a1a}-2.89\%$ |
| test_permute | 0.4765ms | 0.2240ms | 4.4644 KOps/s | 4.4761 KOps/s | $\color{#d91a1a}-0.26\%$ |
| test_stack | 33.5015ms | 25.6728ms | 38.9517 Ops/s | 39.6663 Ops/s | $\color{#d91a1a}-1.80\%$ |
| test_cat | 30.0700ms | 26.3376ms | 37.9686 Ops/s | 40.4212 Ops/s | $\textbf{\color{#d91a1a}-6.07\%}$ |
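
The columns in each row appear to be related as follows. This is inferred from the numbers, not taken from the benchmark bot's source. Ops is the reciprocal of Mean (calls per second). Change is the relative difference between Ops and Ops on Repo HEAD, so a positive value means this PR is faster. A minimal check against the first CPU row, with `ops_per_second` and `change_pct` as hypothetical helper names:

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean time per call (assumed: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def change_pct(ops: float, ops_head: float) -> float:
    """Assumed definition of Change: PR throughput relative to repo HEAD, in percent."""
    return (ops - ops_head) / ops_head * 100.0


# First CPU row, test_plain_set_nested:
#   Mean 21.3498 μs, Ops 46.8389 KOps/s, Ops on Repo HEAD 45.5113 KOps/s, Change +2.92%
print(f"{ops_per_second(21.3498e-6) / 1e3:.3f} KOps/s")  # 46.839 KOps/s
print(f"{change_pct(46.8389, 45.5113):+.2f}%")            # +2.92%
```

Mean is itself rounded to four significant decimals, so recomputing Ops from it matches the table only to about three decimals. Change, computed from the two Ops columns, reproduces the printed percentages.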

@github-actions

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 222. Improved: $\large\color{#35bf28}23$. Worsened: $\large\color{#d91a1a}5$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 36.8810μs | 13.8031μs | 72.4472 KOps/s | 67.3691 KOps/s | $\textbf{\color{#35bf28}+7.54\%}$ |
| test_plain_set_stack_nested | 44.8510μs | 13.9165μs | 71.8570 KOps/s | 66.4153 KOps/s | $\textbf{\color{#35bf28}+8.19\%}$ |
| test_plain_set_nested_inplace | 40.5410μs | 14.9360μs | 66.9525 KOps/s | 62.4736 KOps/s | $\textbf{\color{#35bf28}+7.17\%}$ |
| test_plain_set_stack_nested_inplace | 41.9610μs | 14.8778μs | 67.2140 KOps/s | 63.3097 KOps/s | $\textbf{\color{#35bf28}+6.17\%}$ |
| test_items | 29.0610μs | 2.8819μs | 346.9918 KOps/s | 341.3010 KOps/s | $\color{#35bf28}+1.67\%$ |
| test_items_nested | 0.3505ms | 0.3206ms | 3.1196 KOps/s | 3.1104 KOps/s | $\color{#35bf28}+0.29\%$ |
| test_items_nested_locked | 0.3707ms | 0.3238ms | 3.0887 KOps/s | 3.0932 KOps/s | $\color{#d91a1a}-0.14\%$ |
| test_items_nested_leaf | 80.0510μs | 58.1727μs | 17.1902 KOps/s | 17.3384 KOps/s | $\color{#d91a1a}-0.85\%$ |
| test_items_stack_nested | 0.3693ms | 0.3249ms | 3.0776 KOps/s | 3.0860 KOps/s | $\color{#d91a1a}-0.27\%$ |
| test_items_stack_nested_leaf | 0.1073ms | 59.2250μs | 16.8848 KOps/s | 16.8724 KOps/s | $\color{#35bf28}+0.07\%$ |
| test_items_stack_nested_locked | 0.3553ms | 0.3228ms | 3.0975 KOps/s | 3.0815 KOps/s | $\color{#35bf28}+0.52\%$ |
| test_keys | 32.3600μs | 3.4414μs | 290.5767 KOps/s | 289.3403 KOps/s | $\color{#35bf28}+0.43\%$ |
| test_keys_nested | 97.8720μs | 69.5191μs | 14.3845 KOps/s | 14.2783 KOps/s | $\color{#35bf28}+0.74\%$ |
| test_keys_nested_locked | 0.7368ms | 75.4419μs | 13.2552 KOps/s | 13.2350 KOps/s | $\color{#35bf28}+0.15\%$ |
| test_keys_nested_leaf | 92.2010μs | 60.9606μs | 16.4040 KOps/s | 16.2574 KOps/s | $\color{#35bf28}+0.90\%$ |
| test_keys_stack_nested | 0.1115ms | 70.8056μs | 14.1232 KOps/s | 13.8362 KOps/s | $\color{#35bf28}+2.07\%$ |
| test_keys_stack_nested_leaf | 0.2211ms | 61.8353μs | 16.1720 KOps/s | 15.8208 KOps/s | $\color{#35bf28}+2.22\%$ |
| test_keys_stack_nested_locked | 99.8410μs | 76.5695μs | 13.0600 KOps/s | 12.9792 KOps/s | $\color{#35bf28}+0.62\%$ |
| test_values | 4.6483μs | 0.8468μs | 1.1810 MOps/s | 1.1829 MOps/s | $\color{#d91a1a}-0.16\%$ |
| test_values_nested | 59.3310μs | 30.9751μs | 32.2840 KOps/s | 32.0357 KOps/s | $\color{#35bf28}+0.77\%$ |
| test_values_nested_locked | 59.8200μs | 32.8862μs | 30.4079 KOps/s | 30.4505 KOps/s | $\color{#d91a1a}-0.14\%$ |
| test_values_nested_leaf | 59.0410μs | 33.4609μs | 29.8856 KOps/s | 29.5447 KOps/s | $\color{#35bf28}+1.15\%$ |
| test_values_stack_nested | 68.0310μs | 31.2492μs | 32.0009 KOps/s | 31.5279 KOps/s | $\color{#35bf28}+1.50\%$ |
| test_values_stack_nested_leaf | 61.4010μs | 33.8402μs | 29.5507 KOps/s | 29.1162 KOps/s | $\color{#35bf28}+1.49\%$ |
| test_values_stack_nested_locked | 67.8110μs | 33.2606μs | 30.0656 KOps/s | 29.7756 KOps/s | $\color{#35bf28}+0.97\%$ |
| test_membership | 1.7375μs | 0.5114μs | 1.9556 MOps/s | 1.9669 MOps/s | $\color{#d91a1a}-0.58\%$ |
| test_membership_nested | 25.8600μs | 1.9857μs | 503.6055 KOps/s | 519.8519 KOps/s | $\color{#d91a1a}-3.13\%$ |
| test_membership_nested_leaf | 24.8437μs | 1.9208μs | 520.6189 KOps/s | 525.5414 KOps/s | $\color{#d91a1a}-0.94\%$ |
| test_membership_stacked_nested | 25.0700μs | 1.9625μs | 509.5578 KOps/s | 506.7359 KOps/s | $\color{#35bf28}+0.56\%$ |
| test_membership_stacked_nested_leaf | 23.7900μs | 1.9622μs | 509.6239 KOps/s | 507.8819 KOps/s | $\color{#35bf28}+0.34\%$ |
| test_membership_nested_last | 29.2500μs | 2.8677μs | 348.7167 KOps/s | 356.4481 KOps/s | $\color{#d91a1a}-2.17\%$ |
| test_membership_nested_leaf_last | 22.5610μs | 2.8571μs | 350.0068 KOps/s | 356.0285 KOps/s | $\color{#d91a1a}-1.69\%$ |
| test_membership_stacked_nested_last | 39.1710μs | 2.8293μs | 353.4459 KOps/s | 265.0199 KOps/s | $\textbf{\color{#35bf28}+33.37\%}$ |
| test_membership_stacked_nested_leaf_last | 20.6000μs | 2.8324μs | 353.0626 KOps/s | 267.6375 KOps/s | $\textbf{\color{#35bf28}+31.92\%}$ |
| test_nested_getleaf | 27.2900μs | 6.0703μs | 164.7366 KOps/s | 165.8364 KOps/s | $\color{#d91a1a}-0.66\%$ |
| test_nested_get | 31.1210μs | 5.7214μs | 174.7833 KOps/s | 175.4393 KOps/s | $\color{#d91a1a}-0.37\%$ |
| test_stacked_getleaf | 32.3200μs | 6.0476μs | 165.3555 KOps/s | 165.8071 KOps/s | $\color{#d91a1a}-0.27\%$ |
| test_stacked_get | 26.1010μs | 5.7470μs | 174.0033 KOps/s | 176.1237 KOps/s | $\color{#d91a1a}-1.20\%$ |
| test_nested_getitemleaf | 31.8800μs | 6.0958μs | 164.0466 KOps/s | 163.6049 KOps/s | $\color{#35bf28}+0.27\%$ |
| test_nested_getitem | 25.8510μs | 5.7897μs | 172.7204 KOps/s | 172.8979 KOps/s | $\color{#d91a1a}-0.10\%$ |
| test_stacked_getitemleaf | 33.3510μs | 6.1013μs | 163.8993 KOps/s | 163.3704 KOps/s | $\color{#35bf28}+0.32\%$ |
| test_stacked_getitem | 33.8610μs | 5.7670μs | 173.4006 KOps/s | 174.0746 KOps/s | $\color{#d91a1a}-0.39\%$ |
| test_lock_nested | 1.1741ms | 0.4156ms | 2.4064 KOps/s | 2.4020 KOps/s | $\color{#35bf28}+0.18\%$ |
| test_lock_stack_nested | 0.6005ms | 0.3852ms | 2.5958 KOps/s | 2.6222 KOps/s | $\color{#d91a1a}-1.00\%$ |
| test_unlock_nested | 0.8094ms | 0.3584ms | 2.7903 KOps/s | 2.7723 KOps/s | $\color{#35bf28}+0.65\%$ |
| test_unlock_stack_nested | 0.4369ms | 0.3251ms | 3.0763 KOps/s | 3.1103 KOps/s | $\color{#d91a1a}-1.09\%$ |
| test_flatten_speed | 0.1168ms | 73.4128μs | 13.6216 KOps/s | 13.7365 KOps/s | $\color{#d91a1a}-0.84\%$ |
| test_unflatten_speed | 0.3660ms | 0.2951ms | 3.3890 KOps/s | 3.4448 KOps/s | $\color{#d91a1a}-1.62\%$ |
| test_common_ops | 1.8136ms | 1.1977ms | 834.9310 Ops/s | 780.9114 Ops/s | $\textbf{\color{#35bf28}+6.92\%}$ |
| test_creation | 26.7710μs | 1.5110μs | 661.8067 KOps/s | 683.9732 KOps/s | $\color{#d91a1a}-3.24\%$ |
| test_creation_empty | 44.4210μs | 14.2008μs | 70.4186 KOps/s | 61.7882 KOps/s | $\textbf{\color{#35bf28}+13.97\%}$ |
| test_creation_nested_1 | 39.3010μs | 15.9671μs | 62.6290 KOps/s | 55.8166 KOps/s | $\textbf{\color{#35bf28}+12.20\%}$ |
| test_creation_nested_2 | 43.7710μs | 18.6110μs | 53.7318 KOps/s | 49.5097 KOps/s | $\textbf{\color{#35bf28}+8.53\%}$ |
| test_clone | 70.9510μs | 27.7955μs | 35.9771 KOps/s | 36.6878 KOps/s | $\color{#d91a1a}-1.94\%$ |
| test_getitem[int] | 1.1866ms | 16.2525μs | 61.5288 KOps/s | 62.4696 KOps/s | $\color{#d91a1a}-1.51\%$ |
| test_getitem[slice_int] | 0.1242ms | 28.4980μs | 35.0902 KOps/s | 34.1981 KOps/s | $\color{#35bf28}+2.61\%$ |
| test_getitem[range] | 0.2737ms | 0.1136ms | 8.8041 KOps/s | 9.0414 KOps/s | $\color{#d91a1a}-2.62\%$ |
| test_getitem[tuple] | 0.1411ms | 25.1735μs | 39.7243 KOps/s | 40.2842 KOps/s | $\color{#d91a1a}-1.39\%$ |
| test_getitem[list] | 0.3240ms | 0.1010ms | 9.9000 KOps/s | 9.7237 KOps/s | $\color{#35bf28}+1.81\%$ |
| test_setitem_dim[int] | 83.8120μs | 43.2453μs | 23.1239 KOps/s | 22.2917 KOps/s | $\color{#35bf28}+3.73\%$ |
| test_setitem_dim[slice_int] | 88.8010μs | 65.5662μs | 15.2518 KOps/s | 15.3336 KOps/s | $\color{#d91a1a}-0.53\%$ |
| test_setitem_dim[range] | 0.2722ms | 0.1269ms | 7.8796 KOps/s | 7.7741 KOps/s | $\color{#35bf28}+1.36\%$ |
| test_setitem_dim[tuple] | 82.6310μs | 58.8262μs | 16.9992 KOps/s | 16.2510 KOps/s | $\color{#35bf28}+4.60\%$ |
| test_setitem | 0.1886ms | 39.8326μs | 25.1051 KOps/s | 24.2090 KOps/s | $\color{#35bf28}+3.70\%$ |
| test_set | 0.1857ms | 38.5771μs | 25.9221 KOps/s | 24.3979 KOps/s | $\textbf{\color{#35bf28}+6.25\%}$ |
| test_set_shared | 93.0966ms | 57.8324μs | 17.2913 KOps/s | 20.3456 KOps/s | $\textbf{\color{#d91a1a}-15.01\%}$ |
| test_update | 0.1887ms | 47.0453μs | 21.2561 KOps/s | 20.1849 KOps/s | $\textbf{\color{#35bf28}+5.31\%}$ |
| test_update_nested | 0.1856ms | 56.1757μs | 17.8013 KOps/s | 17.3891 KOps/s | $\color{#35bf28}+2.37\%$ |
| test_update__nested | 0.1778ms | 58.7688μs | 17.0158 KOps/s | 15.8031 KOps/s | $\textbf{\color{#35bf28}+7.67\%}$ |
| test_set_nested | 0.1901ms | 41.3617μs | 24.1770 KOps/s | 23.5047 KOps/s | $\color{#35bf28}+2.86\%$ |
| test_set_nested_new | 0.1779ms | 44.9707μs | 22.2367 KOps/s | 20.9819 KOps/s | $\textbf{\color{#35bf28}+5.98\%}$ |
| test_select | 0.2125ms | 58.8996μs | 16.9780 KOps/s | 16.4938 KOps/s | $\color{#35bf28}+2.94\%$ |
| test_select_nested | 71.9510μs | 41.8751μs | 23.8805 KOps/s | 24.1223 KOps/s | $\color{#d91a1a}-1.00\%$ |
| test_exclude_nested | 91.2710μs | 60.1139μs | 16.6351 KOps/s | 16.8619 KOps/s | $\color{#d91a1a}-1.35\%$ |
| test_empty[True] | 0.3056ms | 0.2559ms | 3.9084 KOps/s | 3.9145 KOps/s | $\color{#d91a1a}-0.16\%$ |
| test_empty[False] | 3.1240μs | 0.7576μs | 1.3199 MOps/s | 1.3424 MOps/s | $\color{#d91a1a}-1.67\%$ |
| test_to | 45.8900μs | 25.8721μs | 38.6517 KOps/s | 38.9296 KOps/s | $\color{#d91a1a}-0.71\%$ |
| test_to_nonblocking | 0.1666ms | 24.2603μs | 41.2196 KOps/s | 40.5191 KOps/s | $\color{#35bf28}+1.73\%$ |
| test_unbind_speed | 0.3067ms | 0.2733ms | 3.6584 KOps/s | 3.6232 KOps/s | $\color{#35bf28}+0.97\%$ |
| test_unbind_speed_stack0 | 0.3161ms | 0.2701ms | 3.7018 KOps/s | 3.6993 KOps/s | $\color{#35bf28}+0.07\%$ |
| test_unbind_speed_stack1 | 91.5339ms | 0.7051ms | 1.4182 KOps/s | 1.4449 KOps/s | $\color{#d91a1a}-1.85\%$ |
| test_split | 93.9770ms | 2.1886ms | 456.9124 Ops/s | 446.5850 Ops/s | $\color{#35bf28}+2.31\%$ |
| test_chunk | 95.3908ms | 2.1851ms | 457.6454 Ops/s | 446.5011 Ops/s | $\color{#35bf28}+2.50\%$ |
| test_to[False] | 3.3475ms | 3.1713ms | 315.3242 Ops/s | 313.9340 Ops/s | $\color{#35bf28}+0.44\%$ |
| test_to[True] | 4.7442ms | 4.3988ms | 227.3346 Ops/s | 228.2632 Ops/s | $\color{#d91a1a}-0.41\%$ |
| test_to_njt[False] | 0.3324s | 0.2526s | 3.9595 Ops/s | 3.9930 Ops/s | $\color{#d91a1a}-0.84\%$ |
| test_to_njt[True] | 0.2629s | 0.2626s | 3.8075 Ops/s | 3.8075 Ops/s | $+0.00\%$ |
| test_creation[device0] | 0.2602ms | 0.1278ms | 7.8218 KOps/s | 7.8015 KOps/s | $\color{#35bf28}+0.26\%$ |
| test_creation_from_tensor | 0.5533ms | 0.1289ms | 7.7551 KOps/s | 7.7517 KOps/s | $\color{#35bf28}+0.04\%$ |
| test_add_one[memmap_tensor0] | 0.2593ms | 8.4682μs | 118.0891 KOps/s | 118.1509 KOps/s | $\color{#d91a1a}-0.05\%$ |
| test_contiguous[memmap_tensor0] | 29.6600μs | 2.1680μs | 461.2565 KOps/s | 453.6723 KOps/s | $\color{#35bf28}+1.67\%$ |
| test_stack[memmap_tensor0] | 36.6900μs | 6.7629μs | 147.8650 KOps/s | 147.3662 KOps/s | $\color{#35bf28}+0.34\%$ |
| test_memmaptd_index | 1.4590ms | 0.4174ms | 2.3959 KOps/s | 2.3913 KOps/s | $\color{#35bf28}+0.19\%$ |
| test_memmaptd_index_astensor | 1.0334ms | 0.4773ms | 2.0952 KOps/s | 2.0855 KOps/s | $\color{#35bf28}+0.46\%$ |
| test_memmaptd_index_op | 1.3946ms | 0.9724ms | 1.0284 KOps/s | 1.0016 KOps/s | $\color{#35bf28}+2.68\%$ |
| test_serialize_model | 0.1310s | 0.1303s | 7.6768 Ops/s | 7.6507 Ops/s | $\color{#35bf28}+0.34\%$ |
| test_serialize_model_pickle | 1.4412s | 1.2352s | 0.8096 Ops/s | 0.8214 Ops/s | $\color{#d91a1a}-1.43\%$ |
| test_serialize_weights | 0.1319s | 0.1304s | 7.6695 Ops/s | 7.7411 Ops/s | $\color{#d91a1a}-0.93\%$ |
| test_serialize_weights_returnearly | 0.2158s | 57.0873ms | 17.5170 Ops/s | 17.8084 Ops/s | $\color{#d91a1a}-1.64\%$ |
| test_serialize_weights_pickle | 1.3728s | 1.2165s | 0.8221 Ops/s | 0.8407 Ops/s | $\color{#d91a1a}-2.22\%$ |
| test_reshape_pytree | 77.8710μs | 35.1980μs | 28.4107 KOps/s | 27.5472 KOps/s | $\color{#35bf28}+3.13\%$ |
| test_reshape_td | 0.1126ms | 40.9470μs | 24.4218 KOps/s | 24.3513 KOps/s | $\color{#35bf28}+0.29\%$ |
| test_view_pytree | 0.1443ms | 35.2350μs | 28.3809 KOps/s | 28.6398 KOps/s | $\color{#d91a1a}-0.90\%$ |
| test_view_td | 0.1523ms | 44.9533μs | 22.2453 KOps/s | 22.1078 KOps/s | $\color{#35bf28}+0.62\%$ |
| test_unbind_pytree | 0.1493ms | 35.1059μs | 28.4853 KOps/s | 29.2245 KOps/s | $\color{#d91a1a}-2.53\%$ |
| test_unbind_td | 0.5221ms | 41.7838μs | 23.9327 KOps/s | 24.2825 KOps/s | $\color{#d91a1a}-1.44\%$ |
| test_split_pytree | 0.1244ms | 46.9172μs | 21.3142 KOps/s | 21.4839 KOps/s | $\color{#d91a1a}-0.79\%$ |
| test_split_td | 93.5180ms | 65.4100μs | 15.2882 KOps/s | 17.4863 KOps/s | $\textbf{\color{#d91a1a}-12.57\%}$ |
| test_add_pytree | 0.1037ms | 55.7630μs | 17.9330 KOps/s | 17.6071 KOps/s | $\color{#35bf28}+1.85\%$ |
| test_add_td | 0.1215ms | 89.6962μs | 11.1487 KOps/s | 10.9772 KOps/s | $\color{#35bf28}+1.56\%$ |
| test_compile_add_one_nested[tensordict-compile] | 0.2641ms | 0.1622ms | 6.1638 KOps/s | 6.1407 KOps/s | $\color{#35bf28}+0.38\%$ |
| test_compile_add_one_nested[tensordict-eager] | 0.3313ms | 0.1530ms | 6.5361 KOps/s | 6.6013 KOps/s | $\color{#d91a1a}-0.99\%$ |
test_compile_add_one_nested[pytree-compile] 0.5347ms 0.1541ms 6.4881 KOps/s 6.3478 KOps/s $\color{#35bf28}+2.21\%$
test_compile_add_one_nested[pytree-eager] 0.5963ms 0.1809ms 5.5272 KOps/s 5.5292 KOps/s $\color{#d91a1a}-0.04\%$
test_compile_copy_nested[tensordict-compile] 0.4142ms 23.0313μs 43.4191 KOps/s 45.7700 KOps/s $\textbf{\color{#d91a1a}-5.14\%}$
test_compile_copy_nested[tensordict-eager] 0.4194ms 44.7292μs 22.3567 KOps/s 22.5534 KOps/s $\color{#d91a1a}-0.87\%$
test_compile_copy_nested[pytree-compile] 0.4511ms 65.8925μs 15.1762 KOps/s 15.4273 KOps/s $\color{#d91a1a}-1.63\%$
test_compile_copy_nested[pytree-eager] 0.4375ms 50.0542μs 19.9784 KOps/s 20.0487 KOps/s $\color{#d91a1a}-0.35\%$
test_compile_add_one_flat[tensordict-compile] 0.3893ms 0.3142ms 3.1828 KOps/s 3.1647 KOps/s $\color{#35bf28}+0.57\%$
test_compile_add_one_flat[tensordict-eager] 0.3722ms 0.2135ms 4.6846 KOps/s 4.7347 KOps/s $\color{#d91a1a}-1.06\%$
test_compile_add_one_flat[tensorclass-compile] 0.1873ms 0.1296ms 7.7172 KOps/s 7.6663 KOps/s $\color{#35bf28}+0.66\%$
test_compile_add_one_flat[tensorclass-eager] 0.1933ms 59.1336μs 16.9108 KOps/s 16.3756 KOps/s $\color{#35bf28}+3.27\%$
test_compile_add_one_flat[pytree-compile] 0.4755ms 0.3224ms 3.1017 KOps/s 3.0754 KOps/s $\color{#35bf28}+0.85\%$
test_compile_add_one_flat[pytree-eager] 0.9689ms 0.6103ms 1.6386 KOps/s 1.6003 KOps/s $\color{#35bf28}+2.39\%$
test_compile_add_self_flat[tensordict-eager] 0.6342ms 0.2542ms 3.9337 KOps/s 3.8687 KOps/s $\color{#35bf28}+1.68\%$
test_compile_add_self_flat[tensordict-compile] 0.3737ms 0.3154ms 3.1703 KOps/s 3.1435 KOps/s $\color{#35bf28}+0.85\%$
test_compile_add_self_flat[tensorclass-eager] 0.2403ms 70.4747μs 14.1895 KOps/s 14.1590 KOps/s $\color{#35bf28}+0.22\%$
test_compile_add_self_flat[tensorclass-compile] 0.2439ms 0.1322ms 7.5670 KOps/s 7.6291 KOps/s $\color{#d91a1a}-0.81\%$
test_compile_add_self_flat[pytree-eager] 0.6650ms 0.5080ms 1.9685 KOps/s 1.9336 KOps/s $\color{#35bf28}+1.81\%$
test_compile_add_self_flat[pytree-compile] 0.4788ms 0.3224ms 3.1013 KOps/s 3.0778 KOps/s $\color{#35bf28}+0.76\%$
test_compile_copy_flat[tensordict-compile] 0.1588ms 18.6430μs 53.6395 KOps/s 53.8416 KOps/s $\color{#d91a1a}-0.38\%$
test_compile_copy_flat[tensordict-eager] 0.3977ms 29.3462μs 34.0759 KOps/s 34.7632 KOps/s $\color{#d91a1a}-1.98\%$
test_compile_copy_flat[pytree-compile] 0.4359ms 69.9424μs 14.2975 KOps/s 14.3821 KOps/s $\color{#d91a1a}-0.59\%$
test_compile_copy_flat[pytree-eager] 0.4216ms 51.3848μs 19.4610 KOps/s 19.3602 KOps/s $\color{#35bf28}+0.52\%$
test_compile_assign_and_add[tensordict-compile] 2.3571ms 0.8116ms 1.2321 KOps/s 1.1359 KOps/s $\textbf{\color{#35bf28}+8.46\%}$
test_compile_assign_and_add[tensordict-eager] 3.3055ms 3.0966ms 322.9393 Ops/s 325.7362 Ops/s $\color{#d91a1a}-0.86\%$
test_compile_assign_and_add[pytree-compile] 2.3958ms 0.8330ms 1.2005 KOps/s 1.1180 KOps/s $\textbf{\color{#35bf28}+7.38\%}$
test_compile_assign_and_add[pytree-eager] 3.3100ms 3.0965ms 322.9425 Ops/s 319.5043 Ops/s $\color{#35bf28}+1.08\%$
test_compile_indexing[tensor-tensordict-compile] 0.2688ms 0.1214ms 8.2357 KOps/s 8.2803 KOps/s $\color{#d91a1a}-0.54\%$
test_compile_indexing[tensor-tensordict-eager] 0.2123ms 61.4132μs 16.2831 KOps/s 16.1091 KOps/s $\color{#35bf28}+1.08\%$
test_compile_indexing[tensor-tensorclass-compile] 0.2528ms 0.1149ms 8.7066 KOps/s 8.7679 KOps/s $\color{#d91a1a}-0.70\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1928ms 42.1465μs 23.7267 KOps/s 23.3460 KOps/s $\color{#35bf28}+1.63\%$
test_compile_indexing[tensor-pytree-compile] 0.2710ms 0.1152ms 8.6806 KOps/s 8.6385 KOps/s $\color{#35bf28}+0.49\%$
test_compile_indexing[tensor-pytree-eager] 0.1905ms 42.0870μs 23.7603 KOps/s 22.4060 KOps/s $\textbf{\color{#35bf28}+6.04\%}$
test_compile_indexing[slice-tensordict-compile] 0.1998ms 0.1510ms 6.6245 KOps/s 6.6028 KOps/s $\color{#35bf28}+0.33\%$
test_compile_indexing[slice-tensordict-eager] 0.1502ms 25.5602μs 39.1233 KOps/s 37.6836 KOps/s $\color{#35bf28}+3.82\%$
test_compile_indexing[slice-tensorclass-compile] 0.2420ms 0.1397ms 7.1590 KOps/s 7.1326 KOps/s $\color{#35bf28}+0.37\%$
test_compile_indexing[slice-tensorclass-eager] 82.1810μs 21.0027μs 47.6130 KOps/s 47.6012 KOps/s $\color{#35bf28}+0.02\%$
test_compile_indexing[slice-pytree-compile] 0.2941ms 0.1404ms 7.1211 KOps/s 7.0572 KOps/s $\color{#35bf28}+0.91\%$
test_compile_indexing[slice-pytree-eager] 0.1959ms 28.3822μs 35.2334 KOps/s 47.6719 KOps/s $\textbf{\color{#d91a1a}-26.09\%}$
test_compile_indexing[int-tensordict-compile] 0.2956ms 0.1477ms 6.7706 KOps/s 6.7526 KOps/s $\color{#35bf28}+0.27\%$
test_compile_indexing[int-tensordict-eager] 0.4443ms 26.1142μs 38.2933 KOps/s 37.8807 KOps/s $\color{#35bf28}+1.09\%$
test_compile_indexing[int-tensorclass-compile] 0.2982ms 0.1407ms 7.1066 KOps/s 7.0748 KOps/s $\color{#35bf28}+0.45\%$
test_compile_indexing[int-tensorclass-eager] 54.3700μs 20.8465μs 47.9697 KOps/s 47.7361 KOps/s $\color{#35bf28}+0.49\%$
test_compile_indexing[int-pytree-compile] 0.2517ms 0.1406ms 7.1104 KOps/s 7.0754 KOps/s $\color{#35bf28}+0.49\%$
test_compile_indexing[int-pytree-eager] 48.0410μs 21.0122μs 47.5915 KOps/s 47.1081 KOps/s $\color{#35bf28}+1.03\%$
test_mod_add[eager] 0.1799ms 30.9267μs 32.3345 KOps/s 30.7507 KOps/s $\textbf{\color{#35bf28}+5.15\%}$
test_mod_add[compile] 0.2126ms 77.4055μs 12.9190 KOps/s 13.0218 KOps/s $\color{#d91a1a}-0.79\%$
test_mod_add[compile-overhead] 0.3077ms 0.1539ms 6.4987 KOps/s 6.0904 KOps/s $\textbf{\color{#35bf28}+6.70\%}$
test_mod_wrap[eager] 0.3884ms 0.2403ms 4.1618 KOps/s 4.1271 KOps/s $\color{#35bf28}+0.84\%$
test_mod_wrap[compile] 0.6690ms 0.2938ms 3.4040 KOps/s 3.4797 KOps/s $\color{#d91a1a}-2.18\%$
test_mod_wrap[compile-overhead] 7.9293ms 4.2370ms 236.0187 Ops/s 232.6783 Ops/s $\color{#35bf28}+1.44\%$
test_mod_wrap_and_backward[eager] 1.5626ms 1.4021ms 713.1949 Ops/s 710.9329 Ops/s $\color{#35bf28}+0.32\%$
test_mod_wrap_and_backward[compile] 1.6904ms 1.3685ms 730.7492 Ops/s 726.6258 Ops/s $\color{#35bf28}+0.57\%$
test_mod_wrap_and_backward[compile-overhead] 1.5157ms 1.0133ms 986.9157 Ops/s 981.8335 Ops/s $\color{#35bf28}+0.52\%$
test_seq_add[eager] 0.2487ms 97.3771μs 10.2693 KOps/s 10.3156 KOps/s $\color{#d91a1a}-0.45\%$
test_seq_add[compile] 0.2366ms 92.6297μs 10.7957 KOps/s 11.1221 KOps/s $\color{#d91a1a}-2.94\%$
test_seq_add[compile-overhead] 0.2719ms 0.1253ms 7.9826 KOps/s 7.8723 KOps/s $\color{#35bf28}+1.40\%$
test_seq_wrap[eager] 0.4744ms 0.3762ms 2.6580 KOps/s 2.6053 KOps/s $\color{#35bf28}+2.02\%$
test_seq_wrap[compile] 0.4551ms 0.3043ms 3.2864 KOps/s 3.1397 KOps/s $\color{#35bf28}+4.67\%$
test_seq_wrap[compile-overhead] 0.3795ms 0.2225ms 4.4949 KOps/s 4.4458 KOps/s $\color{#35bf28}+1.10\%$
test_func_call_runtime[False-eager] 0.8970ms 0.7285ms 1.3727 KOps/s 1.3748 KOps/s $\color{#d91a1a}-0.15\%$
test_func_call_runtime[False-compile] 0.9480ms 0.7599ms 1.3160 KOps/s 1.3134 KOps/s $\color{#35bf28}+0.20\%$
test_func_call_runtime[False-compile-overhead] 0.4957ms 0.3595ms 2.7813 KOps/s 2.7624 KOps/s $\color{#35bf28}+0.69\%$
test_func_call_runtime[True-eager] 1.0784ms 0.8951ms 1.1172 KOps/s 1.1176 KOps/s $\color{#d91a1a}-0.04\%$
test_func_call_runtime[True-compile] 0.9939ms 0.7769ms 1.2872 KOps/s 1.2848 KOps/s $\color{#35bf28}+0.19\%$
test_func_call_runtime[True-compile-overhead] 0.5349ms 0.3807ms 2.6269 KOps/s 2.6129 KOps/s $\color{#35bf28}+0.54\%$
test_func_call_cm_runtime[False-eager] 0.8685ms 0.7241ms 1.3811 KOps/s 1.3682 KOps/s $\color{#35bf28}+0.94\%$
test_func_call_cm_runtime[False-compile] 0.9104ms 0.7587ms 1.3181 KOps/s 1.3113 KOps/s $\color{#35bf28}+0.52\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4879ms 0.3606ms 2.7728 KOps/s 2.7629 KOps/s $\color{#35bf28}+0.36\%$
test_func_call_cm_runtime[True-eager] 1.1817ms 0.9905ms 1.0096 KOps/s 1.0052 KOps/s $\color{#35bf28}+0.43\%$
test_func_call_cm_runtime[True-compile] 0.9508ms 0.8069ms 1.2393 KOps/s 1.2371 KOps/s $\color{#35bf28}+0.18\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5725ms 0.4066ms 2.4597 KOps/s 2.4418 KOps/s $\color{#35bf28}+0.73\%$
test_vmap_func_call_cm_runtime[eager] 2.4614ms 2.0195ms 495.1620 Ops/s 496.1823 Ops/s $\color{#d91a1a}-0.21\%$
test_vmap_func_call_cm_runtime[compile] 0.9673ms 0.8141ms 1.2283 KOps/s 1.2207 KOps/s $\color{#35bf28}+0.62\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5458ms 0.4100ms 2.4388 KOps/s 2.4445 KOps/s $\color{#d91a1a}-0.23\%$
test_distributed 2.1123ms 0.1748ms 5.7195 KOps/s 8.7997 KOps/s $\textbf{\color{#d91a1a}-35.00\%}$
test_tdmodule 24.8400μs 13.7428μs 72.7655 KOps/s 65.9854 KOps/s $\textbf{\color{#35bf28}+10.28\%}$
test_tdmodule_dispatch 48.6110μs 27.1789μs 36.7932 KOps/s 34.2456 KOps/s $\textbf{\color{#35bf28}+7.44\%}$
test_tdseq 33.7210μs 14.9705μs 66.7978 KOps/s 61.4996 KOps/s $\textbf{\color{#35bf28}+8.62\%}$
test_tdseq_dispatch 51.0910μs 30.0943μs 33.2289 KOps/s 30.5918 KOps/s $\textbf{\color{#35bf28}+8.62\%}$
test_instantiation_functorch 2.0047ms 1.8533ms 539.5888 Ops/s 543.5356 Ops/s $\color{#d91a1a}-0.73\%$
test_exec_functorch 0.3383ms 0.2055ms 4.8664 KOps/s 4.8610 KOps/s $\color{#35bf28}+0.11\%$
test_exec_functional_call 0.3664ms 0.2077ms 4.8140 KOps/s 4.7935 KOps/s $\color{#35bf28}+0.43\%$
test_exec_td_decorator 0.4225ms 0.2548ms 3.9241 KOps/s 3.9210 KOps/s $\color{#35bf28}+0.08\%$
test_vmap_mlp_speed_decorator[True-True] 0.8030ms 0.6551ms 1.5266 KOps/s 1.5082 KOps/s $\color{#35bf28}+1.22\%$
test_vmap_mlp_speed_decorator[True-False] 0.8228ms 0.6567ms 1.5228 KOps/s 1.5079 KOps/s $\color{#35bf28}+0.99\%$
test_vmap_mlp_speed_decorator[False-True] 0.7439ms 0.5736ms 1.7435 KOps/s 1.7381 KOps/s $\color{#35bf28}+0.31\%$
test_vmap_mlp_speed_decorator[False-False] 0.7394ms 0.5754ms 1.7379 KOps/s 1.7351 KOps/s $\color{#35bf28}+0.16\%$
test_vmap_transformer_speed_decorator[True-True] 19.0059ms 18.8501ms 53.0500 Ops/s 53.2244 Ops/s $\color{#d91a1a}-0.33\%$
test_vmap_transformer_speed_decorator[True-False] 19.0681ms 18.8387ms 53.0822 Ops/s 53.1802 Ops/s $\color{#d91a1a}-0.18\%$
test_vmap_transformer_speed_decorator[False-True] 18.9138ms 18.7365ms 53.3716 Ops/s 53.6781 Ops/s $\color{#d91a1a}-0.57\%$
test_vmap_transformer_speed_decorator[False-False] 18.8497ms 18.7093ms 53.4493 Ops/s 53.3153 Ops/s $\color{#35bf28}+0.25\%$
test_to_module_speed[True] 1.4473ms 0.9349ms 1.0696 KOps/s 1.0681 KOps/s $\color{#35bf28}+0.14\%$
test_to_module_speed[False] 1.3268ms 0.9160ms 1.0918 KOps/s 1.0924 KOps/s $\color{#d91a1a}-0.06\%$
test_tc_init 65.4010μs 35.1072μs 28.4842 KOps/s 28.1383 KOps/s $\color{#35bf28}+1.23\%$
test_tc_init_nested 0.1333ms 71.9426μs 13.9000 KOps/s 14.2199 KOps/s $\color{#d91a1a}-2.25\%$
test_tc_first_layer_tensor 28.3676μs 0.7077μs 1.4130 MOps/s 1.4352 MOps/s $\color{#d91a1a}-1.55\%$
test_tc_first_layer_nontensor 27.0500μs 2.3272μs 429.7084 KOps/s 434.4776 KOps/s $\color{#d91a1a}-1.10\%$
test_tc_second_layer_tensor 68.4010μs 1.4447μs 692.2088 KOps/s 707.5779 KOps/s $\color{#d91a1a}-2.17\%$
test_tc_second_layer_nontensor 30.3500μs 3.0231μs 330.7873 KOps/s 331.9858 KOps/s $\color{#d91a1a}-0.36\%$
test_unbind 0.1931s 9.3482ms 106.9729 Ops/s 102.1751 Ops/s $\color{#35bf28}+4.70\%$
test_full_like 0.6607ms 0.5737ms 1.7430 KOps/s 1.7365 KOps/s $\color{#35bf28}+0.38\%$
test_zeros_like 0.3306ms 0.1981ms 5.0476 KOps/s 5.0553 KOps/s $\color{#d91a1a}-0.15\%$
test_ones_like 0.3426ms 0.1980ms 5.0505 KOps/s 5.0571 KOps/s $\color{#d91a1a}-0.13\%$
test_clone 0.6192ms 0.4150ms 2.4095 KOps/s 2.4099 KOps/s $\color{#d91a1a}-0.02\%$
test_squeeze 35.8900μs 9.2159μs 108.5080 KOps/s 103.5079 KOps/s $\color{#35bf28}+4.83\%$
test_unsqueeze 0.2146ms 73.1771μs 13.6655 KOps/s 13.8715 KOps/s $\color{#d91a1a}-1.49\%$
test_split 0.4153ms 0.1661ms 6.0190 KOps/s 5.9910 KOps/s $\color{#35bf28}+0.47\%$
test_permute 0.3629ms 0.1791ms 5.5836 KOps/s 5.4835 KOps/s $\color{#35bf28}+1.83\%$
test_stack 1.3254ms 0.8479ms 1.1794 KOps/s 1.1683 KOps/s $\color{#35bf28}+0.95\%$
test_cat 1.3240ms 1.2315ms 812.0200 Ops/s 812.0769 Ops/s $-0.01\%$
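The Ops column looks like the reciprocal of the Mean time. For example, 41.3617 μs gives about 24.18 KOps/s for `test_set_nested`. The Change column looks like the relative throughput difference between this PR and repo HEAD. The sketch below rests on those two inferred definitions; it is not the benchmark bot's actual code, and the helper names are hypothetical. It converts the mixed Ops/s, KOps/s and MOps/s units to a common scale before comparing rows.

```python
# Hypothetical helpers (not the benchmark bot's code). They assume:
#   Ops    = 1 / Mean
#   Change = (Ops - Ops on HEAD) / (Ops on HEAD) * 100
UNITS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def to_ops_per_s(value: str, unit: str) -> float:
    """Convert a displayed throughput such as ('24.1770', 'KOps/s') to ops/s."""
    return float(value) * UNITS[unit]


def ops_from_mean_us(mean_us: float) -> float:
    """Throughput in ops/s implied by a mean time given in microseconds."""
    return 1.0 / (mean_us * 1e-6)


def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of this PR's throughput relative to repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


if __name__ == "__main__":
    # test_distributed row: 5.7195 KOps/s on this PR vs 8.7997 KOps/s on HEAD
    pr = to_ops_per_s("5.7195", "KOps/s")
    head = to_ops_per_s("8.7997", "KOps/s")
    print(f"{relative_change(pr, head):+.2f}%")
```

The table only shows 4-5 significant digits, so a recomputed Change can differ from the reported one in the last digit. `test_empty[False]` gives -1.68% from the displayed values, while the table reports -1.67%.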

@vmoens vmoens merged commit 1eaac7e into gh/vmoens/32/base Oct 30, 2024
50 of 55 checks passed
@vmoens vmoens deleted the gh/vmoens/32/head branch October 30, 2024 14:37
vmoens pushed a commit that referenced this pull request Nov 4, 2024
ghstack-source-id: 8ff9fb4
Pull Request resolved: #1064

(cherry picked from commit b06de95)