vmoens (Collaborator) commented Apr 1, 2025

[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Apr 1, 2025
ghstack-source-id: bf2a72f
Pull Request resolved: #1282
facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Apr 1, 2025

github-actions bot commented Apr 1, 2025

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 233. Improved: $\large\color{#35bf28}19$. Worsened: $\large\color{#d91a1a}8$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 33.7700μs 11.3707μs 87.9454 KOps/s 87.8031 KOps/s $\color{#35bf28}+0.16\%$
test_plain_set_stack_nested 37.5100μs 11.4285μs 87.5004 KOps/s 86.8058 KOps/s $\color{#35bf28}+0.80\%$
test_plain_set_nested_inplace 0.1753ms 12.4976μs 80.0152 KOps/s 79.1738 KOps/s $\color{#35bf28}+1.06\%$
test_plain_set_stack_nested_inplace 0.1036ms 12.4869μs 80.0837 KOps/s 79.6328 KOps/s $\color{#35bf28}+0.57\%$
test_items 33.7310μs 2.9008μs 344.7286 KOps/s 342.1992 KOps/s $\color{#35bf28}+0.74\%$
test_items_nested 0.5310ms 0.3648ms 2.7411 KOps/s 2.7071 KOps/s $\color{#35bf28}+1.26\%$
test_items_nested_locked 0.4106ms 0.3653ms 2.7375 KOps/s 2.7138 KOps/s $\color{#35bf28}+0.88\%$
test_items_nested_leaf 86.5610μs 60.2531μs 16.5967 KOps/s 16.4937 KOps/s $\color{#35bf28}+0.62\%$
test_items_stack_nested 0.4065ms 0.3691ms 2.7095 KOps/s 2.7283 KOps/s $\color{#d91a1a}-0.69\%$
test_items_stack_nested_leaf 0.1633ms 60.6092μs 16.4991 KOps/s 16.4771 KOps/s $\color{#35bf28}+0.13\%$
test_items_stack_nested_locked 0.3996ms 0.3701ms 2.7020 KOps/s 2.7156 KOps/s $\color{#d91a1a}-0.50\%$
test_keys 33.6910μs 3.9660μs 252.1425 KOps/s 289.7896 KOps/s $\textbf{\color{#d91a1a}-12.99\%}$
test_keys_nested 0.1557ms 88.4337μs 11.3079 KOps/s 11.2450 KOps/s $\color{#35bf28}+0.56\%$
test_keys_nested_locked 0.8186ms 94.0168μs 10.6364 KOps/s 10.4105 KOps/s $\color{#35bf28}+2.17\%$
test_keys_nested_leaf 0.1075ms 78.9217μs 12.6708 KOps/s 12.5095 KOps/s $\color{#35bf28}+1.29\%$
test_keys_stack_nested 0.1210ms 87.9699μs 11.3675 KOps/s 11.2102 KOps/s $\color{#35bf28}+1.40\%$
test_keys_stack_nested_leaf 0.1045ms 79.2960μs 12.6110 KOps/s 12.4981 KOps/s $\color{#35bf28}+0.90\%$
test_keys_stack_nested_locked 0.1220ms 94.6589μs 10.5642 KOps/s 10.5333 KOps/s $\color{#35bf28}+0.29\%$
test_values 9.6818μs 0.8580μs 1.1655 MOps/s 1.1740 MOps/s $\color{#d91a1a}-0.72\%$
test_values_nested 63.7710μs 37.3762μs 26.7550 KOps/s 26.6093 KOps/s $\color{#35bf28}+0.55\%$
test_values_nested_locked 99.7310μs 39.4618μs 25.3410 KOps/s 25.3431 KOps/s $-0.01\%$
test_values_nested_leaf 73.0020μs 42.4596μs 23.5518 KOps/s 23.3185 KOps/s $\color{#35bf28}+1.00\%$
test_values_stack_nested 0.1004ms 37.4612μs 26.6943 KOps/s 26.5030 KOps/s $\color{#35bf28}+0.72\%$
test_values_stack_nested_leaf 70.5920μs 42.4872μs 23.5365 KOps/s 23.2168 KOps/s $\color{#35bf28}+1.38\%$
test_values_stack_nested_locked 67.3810μs 39.3894μs 25.3876 KOps/s 25.1596 KOps/s $\color{#35bf28}+0.91\%$
test_membership 2.0011μs 0.5025μs 1.9902 MOps/s 1.9957 MOps/s $\color{#d91a1a}-0.27\%$
test_membership_nested 15.1355μs 2.0264μs 493.4981 KOps/s 478.7859 KOps/s $\color{#35bf28}+3.07\%$
test_membership_nested_leaf 28.4205μs 2.0255μs 493.7022 KOps/s 499.3144 KOps/s $\color{#d91a1a}-1.12\%$
test_membership_stacked_nested 34.1810μs 2.1344μs 468.5098 KOps/s 477.4108 KOps/s $\color{#d91a1a}-1.86\%$
test_membership_stacked_nested_leaf 53.8510μs 2.0730μs 482.3976 KOps/s 479.0913 KOps/s $\color{#35bf28}+0.69\%$
test_membership_nested_last 0.1188ms 3.0252μs 330.5573 KOps/s 328.0909 KOps/s $\color{#35bf28}+0.75\%$
test_membership_nested_leaf_last 29.5710μs 3.0712μs 325.6080 KOps/s 327.9475 KOps/s $\color{#d91a1a}-0.71\%$
test_membership_stacked_nested_last 22.5900μs 3.0610μs 326.6894 KOps/s 326.9208 KOps/s $\color{#d91a1a}-0.07\%$
test_membership_stacked_nested_leaf_last 28.9500μs 3.0354μs 329.4420 KOps/s 327.4547 KOps/s $\color{#35bf28}+0.61\%$
test_nested_getleaf 0.1020ms 13.0983μs 76.3456 KOps/s 75.8218 KOps/s $\color{#35bf28}+0.69\%$
test_nested_get 0.1370ms 12.5052μs 79.9665 KOps/s 80.3425 KOps/s $\color{#d91a1a}-0.47\%$
test_stacked_getleaf 1.7470ms 12.9501μs 77.2193 KOps/s 76.6819 KOps/s $\color{#35bf28}+0.70\%$
test_stacked_get 0.1720ms 12.4389μs 80.3928 KOps/s 80.8270 KOps/s $\color{#d91a1a}-0.54\%$
test_nested_getitemleaf 68.1310μs 13.5282μs 73.9196 KOps/s 73.7640 KOps/s $\color{#35bf28}+0.21\%$
test_nested_getitem 0.1509ms 12.7999μs 78.1255 KOps/s 77.5185 KOps/s $\color{#35bf28}+0.78\%$
test_stacked_getitemleaf 0.1453ms 13.4840μs 74.1620 KOps/s 74.0652 KOps/s $\color{#35bf28}+0.13\%$
test_stacked_getitem 73.9020μs 12.7111μs 78.6717 KOps/s 78.1827 KOps/s $\color{#35bf28}+0.63\%$
test_lock_nested 5.9908ms 0.3548ms 2.8183 KOps/s 2.7869 KOps/s $\color{#35bf28}+1.13\%$
test_lock_stack_nested 0.4038ms 0.3459ms 2.8911 KOps/s 2.8178 KOps/s $\color{#35bf28}+2.60\%$
test_unlock_nested 0.5232ms 0.2958ms 3.3808 KOps/s 3.3185 KOps/s $\color{#35bf28}+1.88\%$
test_unlock_stack_nested 0.4090ms 0.2891ms 3.4596 KOps/s 3.4232 KOps/s $\color{#35bf28}+1.06\%$
test_flatten_speed 0.1064ms 77.5570μs 12.8937 KOps/s 12.9537 KOps/s $\color{#d91a1a}-0.46\%$
test_unflatten_speed 0.5473ms 0.4040ms 2.4755 KOps/s 2.4813 KOps/s $\color{#d91a1a}-0.23\%$
test_common_ops 0.8662ms 0.6360ms 1.5724 KOps/s 1.5505 KOps/s $\color{#35bf28}+1.41\%$
test_creation 78.5610μs 1.7566μs 569.2865 KOps/s 568.0408 KOps/s $\color{#35bf28}+0.22\%$
test_creation_empty 0.6628ms 7.1913μs 139.0561 KOps/s 139.3343 KOps/s $\color{#d91a1a}-0.20\%$
test_creation_nested_1 0.1009ms 10.1443μs 98.5771 KOps/s 97.5579 KOps/s $\color{#35bf28}+1.04\%$
test_creation_nested_2 0.1047ms 12.9863μs 77.0041 KOps/s 76.4103 KOps/s $\color{#35bf28}+0.78\%$
test_clone 0.1652ms 10.6075μs 94.2726 KOps/s 89.7863 KOps/s $\color{#35bf28}+5.00\%$
test_getitem[int] 0.1637ms 10.5201μs 95.0559 KOps/s 89.4178 KOps/s $\textbf{\color{#35bf28}+6.31\%}$
test_getitem[slice_int] 0.1177ms 20.6139μs 48.5110 KOps/s 45.8363 KOps/s $\textbf{\color{#35bf28}+5.84\%}$
test_getitem[range] 0.1725ms 38.8091μs 25.7672 KOps/s 25.3413 KOps/s $\color{#35bf28}+1.68\%$
test_getitem[tuple] 0.1078ms 17.8776μs 55.9360 KOps/s 52.3467 KOps/s $\textbf{\color{#35bf28}+6.86\%}$
test_getitem[list] 0.2193ms 34.4601μs 29.0190 KOps/s 28.1920 KOps/s $\color{#35bf28}+2.93\%$
test_setitem_dim[int] 42.0410μs 19.4555μs 51.3993 KOps/s 47.5858 KOps/s $\textbf{\color{#35bf28}+8.01\%}$
test_setitem_dim[slice_int] 70.1120μs 40.3380μs 24.7905 KOps/s 24.7519 KOps/s $\color{#35bf28}+0.16\%$
test_setitem_dim[range] 0.1641ms 56.7777μs 17.6126 KOps/s 18.1926 KOps/s $\color{#d91a1a}-3.19\%$
test_setitem_dim[tuple] 56.3710μs 33.0624μs 30.2459 KOps/s 29.1064 KOps/s $\color{#35bf28}+3.91\%$
test_setitem 0.2940ms 16.4157μs 60.9175 KOps/s 61.8683 KOps/s $\color{#d91a1a}-1.54\%$
test_set 0.3219ms 15.0020μs 66.6578 KOps/s 64.3648 KOps/s $\color{#35bf28}+3.56\%$
test_set_shared 0.5146ms 0.1601ms 6.2458 KOps/s 6.2417 KOps/s $\color{#35bf28}+0.07\%$
test_update 0.4191ms 18.4838μs 54.1013 KOps/s 52.6527 KOps/s $\color{#35bf28}+2.75\%$
test_update_nested 0.1228ms 29.2831μs 34.1494 KOps/s 33.5547 KOps/s $\color{#35bf28}+1.77\%$
test_update__nested 0.1553ms 25.0253μs 39.9596 KOps/s 38.0117 KOps/s $\textbf{\color{#35bf28}+5.12\%}$
test_set_nested 0.1225ms 16.5007μs 60.6035 KOps/s 59.5392 KOps/s $\color{#35bf28}+1.79\%$
test_set_nested_new 0.1085ms 19.6950μs 50.7743 KOps/s 51.1747 KOps/s $\color{#d91a1a}-0.78\%$
test_select 0.1721ms 30.8675μs 32.3965 KOps/s 31.9171 KOps/s $\color{#35bf28}+1.50\%$
test_select_nested 0.1161ms 43.7546μs 22.8547 KOps/s 22.9740 KOps/s $\color{#d91a1a}-0.52\%$
test_exclude_nested 0.1021ms 63.2707μs 15.8051 KOps/s 15.7428 KOps/s $\color{#35bf28}+0.40\%$
test_empty[True] 0.3499ms 0.2966ms 3.3721 KOps/s 3.3659 KOps/s $\color{#35bf28}+0.19\%$
test_empty[False] 3.7021μs 0.8207μs 1.2184 MOps/s 1.2190 MOps/s $\color{#d91a1a}-0.05\%$
test_to 89.7610μs 58.8737μs 16.9855 KOps/s 16.8533 KOps/s $\color{#35bf28}+0.78\%$
test_to_nonblocking 0.1911ms 50.2466μs 19.9018 KOps/s 19.8964 KOps/s $\color{#35bf28}+0.03\%$
test_unbind_speed 0.3628ms 0.2398ms 4.1696 KOps/s 4.0272 KOps/s $\color{#35bf28}+3.54\%$
test_unbind_speed_stack0 0.2786ms 0.2399ms 4.1679 KOps/s 4.0245 KOps/s $\color{#35bf28}+3.56\%$
test_unbind_speed_stack1 95.6521ms 0.7423ms 1.3471 KOps/s 1.3302 KOps/s $\color{#35bf28}+1.27\%$
test_split 97.2022ms 1.6029ms 623.8648 Ops/s 608.7569 Ops/s $\color{#35bf28}+2.48\%$
test_chunk 97.3812ms 1.6066ms 622.4233 Ops/s 611.7755 Ops/s $\color{#35bf28}+1.74\%$
test_consolidate[False-None] 98.6091ms 3.1421ms 318.2634 Ops/s 318.0490 Ops/s $\color{#35bf28}+0.07\%$
test_consolidate[default-None] 1.8966ms 1.7225ms 580.5434 Ops/s 568.1189 Ops/s $\color{#35bf28}+2.19\%$
test_consolidate[reduce-overhead-None] 1.9119ms 1.7550ms 569.7917 Ops/s 555.6322 Ops/s $\color{#35bf28}+2.55\%$
test_consolidate_njt[False-None] 7.0015ms 6.6480ms 150.4211 Ops/s 149.7547 Ops/s $\color{#35bf28}+0.44\%$
test_to[False-False-None] 1.9845ms 1.7934ms 557.5876 Ops/s 557.8525 Ops/s $\color{#d91a1a}-0.05\%$
test_to[True-False-None] 1.9696ms 1.4654ms 682.4207 Ops/s 688.9660 Ops/s $\color{#d91a1a}-0.95\%$
test_to[within-False-None] 4.6821ms 4.4534ms 224.5479 Ops/s 227.1842 Ops/s $\color{#d91a1a}-1.16\%$
test_to[True-default-None] 5.6121ms 5.3470ms 187.0221 Ops/s 182.8512 Ops/s $\color{#35bf28}+2.28\%$
test_to_njt[False-False-None] 7.3510ms 7.0611ms 141.6211 Ops/s 142.2486 Ops/s $\color{#d91a1a}-0.44\%$
test_to_njt[True-False-None] 5.7859ms 5.4895ms 182.1662 Ops/s 178.4021 Ops/s $\color{#35bf28}+2.11\%$
test_to_njt[within-False-None] 12.5420ms 12.2637ms 81.5412 Ops/s 80.7581 Ops/s $\color{#35bf28}+0.97\%$
test_creation[device0] 0.3826ms 79.6998μs 12.5471 KOps/s 12.3702 KOps/s $\color{#35bf28}+1.43\%$
test_creation_from_tensor 0.5232ms 83.1340μs 12.0288 KOps/s 11.5234 KOps/s $\color{#35bf28}+4.39\%$
test_add_one[memmap_tensor0] 0.4387ms 6.6245μs 150.9558 KOps/s 136.7026 KOps/s $\textbf{\color{#35bf28}+10.43\%}$
test_contiguous[memmap_tensor0] 1.8960μs 0.4258μs 2.3485 MOps/s 2.3706 MOps/s $\color{#d91a1a}-0.93\%$
test_stack[memmap_tensor0] 39.9410μs 4.2631μs 234.5728 KOps/s 208.6718 KOps/s $\textbf{\color{#35bf28}+12.41\%}$
test_memmaptd_index 1.4579ms 0.2417ms 4.1366 KOps/s 3.8894 KOps/s $\textbf{\color{#35bf28}+6.36\%}$
test_memmaptd_index_astensor 0.4501ms 0.3051ms 3.2781 KOps/s 3.0448 KOps/s $\textbf{\color{#35bf28}+7.66\%}$
test_memmaptd_index_op 1.2632ms 0.5579ms 1.7924 KOps/s 1.6984 KOps/s $\textbf{\color{#35bf28}+5.53\%}$
test_serialize_model 0.1339s 0.1327s 7.5335 Ops/s 7.5310 Ops/s $\color{#35bf28}+0.03\%$
test_serialize_model_pickle 1.3465s 1.2528s 0.7982 Ops/s 0.8226 Ops/s $\color{#d91a1a}-2.97\%$
test_serialize_weights 0.1337s 0.1317s 7.5918 Ops/s 7.6006 Ops/s $\color{#d91a1a}-0.12\%$
test_serialize_weights_returnearly 0.2501s 55.6336ms 17.9748 Ops/s 23.0937 Ops/s $\textbf{\color{#d91a1a}-22.17\%}$
test_serialize_weights_pickle 1.3650s 1.2230s 0.8177 Ops/s 0.8237 Ops/s $\color{#d91a1a}-0.72\%$
test_reshape_pytree 0.1690ms 23.3133μs 42.8940 KOps/s 44.0342 KOps/s $\color{#d91a1a}-2.59\%$
test_reshape_td 0.1359ms 27.1953μs 36.7710 KOps/s 36.8975 KOps/s $\color{#d91a1a}-0.34\%$
test_view_pytree 0.1726ms 23.1717μs 43.1561 KOps/s 44.6499 KOps/s $\color{#d91a1a}-3.35\%$
test_view_td 0.2217ms 33.4816μs 29.8671 KOps/s 30.1903 KOps/s $\color{#d91a1a}-1.07\%$
test_unbind_pytree 0.2308ms 29.4657μs 33.9378 KOps/s 34.4801 KOps/s $\color{#d91a1a}-1.57\%$
test_unbind_td 0.8388ms 39.2986μs 25.4462 KOps/s 25.2769 KOps/s $\color{#35bf28}+0.67\%$
test_split_pytree 0.1697ms 30.7548μs 32.5152 KOps/s 32.5205 KOps/s $\color{#d91a1a}-0.02\%$
test_split_td 1.0043ms 38.2087μs 26.1721 KOps/s 23.7012 KOps/s $\textbf{\color{#35bf28}+10.43\%}$
test_add_pytree 0.1537ms 34.5723μs 28.9249 KOps/s 27.6769 KOps/s $\color{#35bf28}+4.51\%$
test_add_td 0.2838ms 48.6812μs 20.5418 KOps/s 19.3942 KOps/s $\textbf{\color{#35bf28}+5.92\%}$
test_compile_add_one_nested[tensordict-compile] 0.3011ms 0.1282ms 7.8024 KOps/s 7.6314 KOps/s $\color{#35bf28}+2.24\%$
test_compile_add_one_nested[tensordict-eager] 0.2836ms 0.1420ms 7.0429 KOps/s 6.8779 KOps/s $\color{#35bf28}+2.40\%$
test_compile_add_one_nested[pytree-compile] 0.2446ms 97.3783μs 10.2692 KOps/s 10.2732 KOps/s $\color{#d91a1a}-0.04\%$
test_compile_add_one_nested[pytree-eager] 0.3338ms 0.1548ms 6.4594 KOps/s 6.5323 KOps/s $\color{#d91a1a}-1.12\%$
test_compile_copy_nested[tensordict-compile] 0.1712ms 24.1746μs 41.3657 KOps/s 30.8955 KOps/s $\textbf{\color{#35bf28}+33.89\%}$
test_compile_copy_nested[tensordict-eager] 83.5510μs 35.0954μs 28.4938 KOps/s 28.1539 KOps/s $\color{#35bf28}+1.21\%$
test_compile_copy_nested[pytree-compile] 0.1863ms 64.0280μs 15.6182 KOps/s 15.3249 KOps/s $\color{#35bf28}+1.91\%$
test_compile_copy_nested[pytree-eager] 96.8710μs 48.7391μs 20.5174 KOps/s 20.1719 KOps/s $\color{#35bf28}+1.71\%$
test_compile_add_one_flat[tensordict-compile] 0.2931ms 0.1474ms 6.7858 KOps/s 6.8342 KOps/s $\color{#d91a1a}-0.71\%$
test_compile_add_one_flat[tensordict-eager] 0.4006ms 0.2201ms 4.5428 KOps/s 4.4911 KOps/s $\color{#35bf28}+1.15\%$
test_compile_add_one_flat[tensorclass-compile] 0.2872ms 0.1033ms 9.6765 KOps/s 10.0472 KOps/s $\color{#d91a1a}-3.69\%$
test_compile_add_one_flat[tensorclass-eager] 0.2367ms 61.2477μs 16.3271 KOps/s 16.8156 KOps/s $\color{#d91a1a}-2.90\%$
test_compile_add_one_flat[pytree-compile] 0.2591ms 0.1368ms 7.3124 KOps/s 7.1971 KOps/s $\color{#35bf28}+1.60\%$
test_compile_add_one_flat[pytree-eager] 0.6709ms 0.4937ms 2.0257 KOps/s 2.0256 KOps/s $+0.00\%$
test_compile_add_self_flat[tensordict-eager] 0.4441ms 0.2651ms 3.7722 KOps/s 3.7152 KOps/s $\color{#35bf28}+1.53\%$
test_compile_add_self_flat[tensordict-compile] 0.3171ms 0.1514ms 6.6041 KOps/s 6.8503 KOps/s $\color{#d91a1a}-3.59\%$
test_compile_add_self_flat[tensorclass-eager] 0.2463ms 72.5489μs 13.7838 KOps/s 13.9057 KOps/s $\color{#d91a1a}-0.88\%$
test_compile_add_self_flat[tensorclass-compile] 0.2624ms 99.4634μs 10.0539 KOps/s 10.0572 KOps/s $\color{#d91a1a}-0.03\%$
test_compile_add_self_flat[pytree-eager] 0.5730ms 0.4190ms 2.3864 KOps/s 2.3808 KOps/s $\color{#35bf28}+0.23\%$
test_compile_add_self_flat[pytree-compile] 0.2807ms 0.1374ms 7.2783 KOps/s 7.2770 KOps/s $\color{#35bf28}+0.02\%$
test_compile_copy_flat[tensordict-compile] 0.2104ms 20.4329μs 48.9407 KOps/s 50.8196 KOps/s $\color{#d91a1a}-3.70\%$
test_compile_copy_flat[tensordict-eager] 0.1667ms 32.3793μs 30.8839 KOps/s 31.1142 KOps/s $\color{#d91a1a}-0.74\%$
test_compile_copy_flat[pytree-compile] 0.2121ms 69.1417μs 14.4631 KOps/s 14.2997 KOps/s $\color{#35bf28}+1.14\%$
test_compile_copy_flat[pytree-eager] 0.2324ms 52.3543μs 19.1006 KOps/s 18.8999 KOps/s $\color{#35bf28}+1.06\%$
test_compile_assign_and_add[tensordict-compile] 1.6278ms 0.3917ms 2.5532 KOps/s 2.1998 KOps/s $\textbf{\color{#35bf28}+16.07\%}$
test_compile_assign_and_add[tensordict-eager] 2.9708ms 2.7704ms 360.9605 Ops/s 355.2278 Ops/s $\color{#35bf28}+1.61\%$
test_compile_assign_and_add[pytree-compile] 1.6043ms 0.4378ms 2.2840 KOps/s 2.2566 KOps/s $\color{#35bf28}+1.21\%$
test_compile_assign_and_add[pytree-eager] 3.0477ms 2.6910ms 371.6122 Ops/s 368.9118 Ops/s $\color{#35bf28}+0.73\%$
test_compile_indexing[tensor-tensordict-compile] 0.3074ms 0.1194ms 8.3743 KOps/s 8.8273 KOps/s $\textbf{\color{#d91a1a}-5.13\%}$
test_compile_indexing[tensor-tensordict-eager] 0.5504ms 86.4636μs 11.5656 KOps/s 11.3161 KOps/s $\color{#35bf28}+2.20\%$
test_compile_indexing[tensor-tensorclass-compile] 0.2921ms 0.1156ms 8.6470 KOps/s 8.9085 KOps/s $\color{#d91a1a}-2.94\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2475ms 70.9544μs 14.0936 KOps/s 14.3334 KOps/s $\color{#d91a1a}-1.67\%$
test_compile_indexing[tensor-pytree-compile] 0.2994ms 0.1165ms 8.5843 KOps/s 9.2198 KOps/s $\textbf{\color{#d91a1a}-6.89\%}$
test_compile_indexing[tensor-pytree-eager] 0.2567ms 73.9695μs 13.5191 KOps/s 14.4408 KOps/s $\textbf{\color{#d91a1a}-6.38\%}$
test_compile_indexing[slice-tensordict-compile] 0.2721ms 0.1023ms 9.7736 KOps/s 9.9830 KOps/s $\color{#d91a1a}-2.10\%$
test_compile_indexing[slice-tensordict-eager] 0.1603ms 18.5199μs 53.9959 KOps/s 49.3599 KOps/s $\textbf{\color{#35bf28}+9.39\%}$
test_compile_indexing[slice-tensorclass-compile] 0.2443ms 97.1989μs 10.2882 KOps/s 10.2692 KOps/s $\color{#35bf28}+0.19\%$
test_compile_indexing[slice-tensorclass-eager] 0.1359ms 15.6602μs 63.8562 KOps/s 61.0632 KOps/s $\color{#35bf28}+4.57\%$
test_compile_indexing[slice-pytree-compile] 0.2471ms 98.3930μs 10.1633 KOps/s 10.1606 KOps/s $\color{#35bf28}+0.03\%$
test_compile_indexing[slice-pytree-eager] 0.1529ms 15.7171μs 63.6249 KOps/s 61.3299 KOps/s $\color{#35bf28}+3.74\%$
test_compile_indexing[int-tensordict-compile] 0.2694ms 0.1011ms 9.8909 KOps/s 9.6945 KOps/s $\color{#35bf28}+2.03\%$
test_compile_indexing[int-tensordict-eager] 0.6152ms 18.7083μs 53.4521 KOps/s 51.2642 KOps/s $\color{#35bf28}+4.27\%$
test_compile_indexing[int-tensorclass-compile] 0.2452ms 97.4199μs 10.2648 KOps/s 10.2044 KOps/s $\color{#35bf28}+0.59\%$
test_compile_indexing[int-tensorclass-eager] 0.2179ms 16.6246μs 60.1518 KOps/s 60.8937 KOps/s $\color{#d91a1a}-1.22\%$
test_compile_indexing[int-pytree-compile] 0.2988ms 97.3178μs 10.2756 KOps/s 10.1667 KOps/s $\color{#35bf28}+1.07\%$
test_compile_indexing[int-pytree-eager] 43.6600μs 15.8179μs 63.2196 KOps/s 61.8089 KOps/s $\color{#35bf28}+2.28\%$
test_mod_add[eager] 0.1937ms 38.5853μs 25.9166 KOps/s 25.4083 KOps/s $\color{#35bf28}+2.00\%$
test_mod_add[compile] 0.2221ms 82.3238μs 12.1472 KOps/s 11.7016 KOps/s $\color{#35bf28}+3.81\%$
test_mod_add[compile-overhead] 0.3302ms 0.1751ms 5.7123 KOps/s 5.4053 KOps/s $\textbf{\color{#35bf28}+5.68\%}$
test_mod_wrap[eager] 0.4759ms 0.2735ms 3.6566 KOps/s 3.8626 KOps/s $\textbf{\color{#d91a1a}-5.33\%}$
test_mod_wrap[compile] 0.4869ms 0.2996ms 3.3376 KOps/s 3.2494 KOps/s $\color{#35bf28}+2.72\%$
test_mod_wrap[compile-overhead] 7.3057ms 3.8893ms 257.1190 Ops/s 267.8580 Ops/s $\color{#d91a1a}-4.01\%$
test_mod_wrap_and_backward[eager] 1.7796ms 1.3912ms 718.7908 Ops/s 704.5658 Ops/s $\color{#35bf28}+2.02\%$
test_mod_wrap_and_backward[compile] 1.6468ms 1.2924ms 773.7786 Ops/s 765.3096 Ops/s $\color{#35bf28}+1.11\%$
test_mod_wrap_and_backward[compile-overhead] 1.3834ms 0.9389ms 1.0651 KOps/s 1.0641 KOps/s $\color{#35bf28}+0.09\%$
test_seq_add[eager] 0.3146ms 0.1345ms 7.4338 KOps/s 7.4002 KOps/s $\color{#35bf28}+0.45\%$
test_seq_add[compile] 0.2597ms 92.7141μs 10.7858 KOps/s 10.8694 KOps/s $\color{#d91a1a}-0.77\%$
test_seq_add[compile-overhead] 0.2786ms 0.1306ms 7.6577 KOps/s 7.5464 KOps/s $\color{#35bf28}+1.47\%$
test_seq_wrap[eager] 1.0487ms 0.4432ms 2.2564 KOps/s 2.2328 KOps/s $\color{#35bf28}+1.06\%$
test_seq_wrap[compile] 1.0993ms 0.3075ms 3.2516 KOps/s 3.1619 KOps/s $\color{#35bf28}+2.84\%$
test_seq_wrap[compile-overhead] 0.3793ms 0.2282ms 4.3820 KOps/s 4.3182 KOps/s $\color{#35bf28}+1.48\%$
test_func_call_runtime[False-eager] 0.9519ms 0.7653ms 1.3067 KOps/s 1.3129 KOps/s $\color{#d91a1a}-0.47\%$
test_func_call_runtime[False-compile] 0.9040ms 0.7560ms 1.3227 KOps/s 1.2950 KOps/s $\color{#35bf28}+2.14\%$
test_func_call_runtime[False-compile-overhead] 0.5162ms 0.3672ms 2.7230 KOps/s 2.6808 KOps/s $\color{#35bf28}+1.57\%$
test_func_call_runtime[True-eager] 1.1187ms 0.9433ms 1.0601 KOps/s 1.0695 KOps/s $\color{#d91a1a}-0.88\%$
test_func_call_runtime[True-compile] 0.9431ms 0.7804ms 1.2815 KOps/s 1.2664 KOps/s $\color{#35bf28}+1.19\%$
test_func_call_runtime[True-compile-overhead] 0.5367ms 0.3889ms 2.5715 KOps/s 2.5264 KOps/s $\color{#35bf28}+1.79\%$
test_func_call_cm_runtime[False-eager] 0.9806ms 0.7991ms 1.2514 KOps/s 1.3134 KOps/s $\color{#d91a1a}-4.72\%$
test_func_call_cm_runtime[False-compile] 0.9901ms 0.7664ms 1.3049 KOps/s 1.2890 KOps/s $\color{#35bf28}+1.23\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5324ms 0.3711ms 2.6946 KOps/s 2.6896 KOps/s $\color{#35bf28}+0.19\%$
test_func_call_cm_runtime[True-eager] 1.2384ms 1.0431ms 958.7058 Ops/s 963.7903 Ops/s $\color{#d91a1a}-0.53\%$
test_func_call_cm_runtime[True-compile] 1.1837ms 1.0265ms 974.1593 Ops/s 982.7049 Ops/s $\color{#d91a1a}-0.87\%$
test_func_call_cm_runtime[True-compile-overhead] 1.2150ms 1.0290ms 971.8365 Ops/s 972.9078 Ops/s $\color{#d91a1a}-0.11\%$
test_vmap_func_call_cm_runtime[eager] 2.6462ms 2.1424ms 466.7687 Ops/s 464.4308 Ops/s $\color{#35bf28}+0.50\%$
test_vmap_func_call_cm_runtime[compile] 1.0034ms 0.8253ms 1.2117 KOps/s 1.1891 KOps/s $\color{#35bf28}+1.90\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5907ms 0.4197ms 2.3824 KOps/s 2.3443 KOps/s $\color{#35bf28}+1.62\%$
test_distributed 0.6428ms 0.1218ms 8.2107 KOps/s 8.4221 KOps/s $\color{#d91a1a}-2.51\%$
test_tdmodule 57.1910μs 20.1640μs 49.5932 KOps/s 46.9107 KOps/s $\textbf{\color{#35bf28}+5.72\%}$
test_tdmodule_dispatch 0.1208ms 37.8606μs 26.4127 KOps/s 25.9616 KOps/s $\color{#35bf28}+1.74\%$
test_tdseq 41.1710μs 19.7819μs 50.5514 KOps/s 48.9579 KOps/s $\color{#35bf28}+3.25\%$
test_tdseq_dispatch 66.3210μs 39.9463μs 25.0336 KOps/s 24.5704 KOps/s $\color{#35bf28}+1.89\%$
test_instantiation_functorch 1.7339ms 1.5570ms 642.2540 Ops/s 631.2318 Ops/s $\color{#35bf28}+1.75\%$
test_exec_functorch 0.2308ms 0.1441ms 6.9420 KOps/s 6.7294 KOps/s $\color{#35bf28}+3.16\%$
test_exec_functional_call 0.2759ms 0.1375ms 7.2715 KOps/s 6.9789 KOps/s $\color{#35bf28}+4.19\%$
test_exec_td_decorator 0.3957ms 0.1920ms 5.2073 KOps/s 5.1696 KOps/s $\color{#35bf28}+0.73\%$
test_vmap_mlp_speed_decorator[True-True] 0.8939ms 0.7008ms 1.4270 KOps/s 1.4203 KOps/s $\color{#35bf28}+0.47\%$
test_vmap_mlp_speed_decorator[True-False] 0.9427ms 0.7050ms 1.4183 KOps/s 1.4096 KOps/s $\color{#35bf28}+0.62\%$
test_vmap_mlp_speed_decorator[False-True] 0.7648ms 0.6044ms 1.6546 KOps/s 1.6365 KOps/s $\color{#35bf28}+1.10\%$
test_vmap_mlp_speed_decorator[False-False] 0.7600ms 0.6075ms 1.6461 KOps/s 1.6332 KOps/s $\color{#35bf28}+0.79\%$
test_vmap_transformer_speed_decorator[True-True] 20.2867ms 19.5307ms 51.2015 Ops/s 50.8556 Ops/s $\color{#35bf28}+0.68\%$
test_vmap_transformer_speed_decorator[True-False] 20.1043ms 19.5492ms 51.1529 Ops/s 51.1746 Ops/s $\color{#d91a1a}-0.04\%$
test_vmap_transformer_speed_decorator[False-True] 19.6350ms 19.3863ms 51.5827 Ops/s 51.4417 Ops/s $\color{#35bf28}+0.27\%$
test_vmap_transformer_speed_decorator[False-False] 20.1532ms 19.4802ms 51.3343 Ops/s 51.0864 Ops/s $\color{#35bf28}+0.49\%$
test_to_module_speed[True] 1.2680ms 0.9685ms 1.0325 KOps/s 1.0272 KOps/s $\color{#35bf28}+0.52\%$
test_to_module_speed[False] 1.4095ms 0.9600ms 1.0417 KOps/s 1.0525 KOps/s $\color{#d91a1a}-1.03\%$
test_tc_init 0.1651ms 34.6051μs 28.8975 KOps/s 28.8897 KOps/s $\color{#35bf28}+0.03\%$
test_tc_init_tensor_only 0.1113ms 10.8767μs 91.9396 KOps/s 91.6175 KOps/s $\color{#35bf28}+0.35\%$
test_tc_init_nested 0.1803ms 69.2094μs 14.4489 KOps/s 14.6088 KOps/s $\color{#d91a1a}-1.09\%$
test_tc_first_layer_tensor 5.9835μs 0.8183μs 1.2221 MOps/s 1.1100 MOps/s $\textbf{\color{#35bf28}+10.10\%}$
test_tc_first_layer_tensor_only 2.7150μs 0.4269μs 2.3424 MOps/s 2.3651 MOps/s $\color{#d91a1a}-0.96\%$
test_tc_first_layer_tensor_set 27.8710μs 2.9470μs 339.3257 KOps/s 337.6088 KOps/s $\color{#35bf28}+0.51\%$
test_tc_first_layer_tensor_only_set 13.6370μs 1.8179μs 550.0906 KOps/s 559.0292 KOps/s $\color{#d91a1a}-1.60\%$
test_tc_first_layer_nontensor 65.3610μs 2.4404μs 409.7720 KOps/s 420.8747 KOps/s $\color{#d91a1a}-2.64\%$
test_tc_second_layer_tensor 31.8900μs 1.7693μs 565.1908 KOps/s 571.0502 KOps/s $\color{#d91a1a}-1.03\%$
test_tc_second_layer_nontensor 33.7700μs 3.2435μs 308.3051 KOps/s 313.2560 KOps/s $\color{#d91a1a}-1.58\%$
test_unbind 0.2446s 12.6352ms 79.1443 Ops/s 143.9786 Ops/s $\textbf{\color{#d91a1a}-45.03\%}$
test_full_like 6.6320ms 4.4968ms 222.3799 Ops/s 135.5966 Ops/s $\textbf{\color{#35bf28}+64.00\%}$
test_zeros_like 11.9501ms 8.7960ms 113.6884 Ops/s 113.4145 Ops/s $\color{#35bf28}+0.24\%$
test_ones_like 5.3648ms 4.4565ms 224.3913 Ops/s 225.7578 Ops/s $\color{#d91a1a}-0.61\%$
test_clone 16.4337ms 6.9512ms 143.8593 Ops/s 144.2240 Ops/s $\color{#d91a1a}-0.25\%$
test_squeeze 0.1410ms 10.0656μs 99.3485 KOps/s 99.1343 KOps/s $\color{#35bf28}+0.22\%$
test_unsqueeze 0.2455ms 74.5265μs 13.4180 KOps/s 13.3371 KOps/s $\color{#35bf28}+0.61\%$
test_split 0.4161ms 0.1618ms 6.1820 KOps/s 5.9366 KOps/s $\color{#35bf28}+4.13\%$
test_permute 0.3220ms 0.1797ms 5.5644 KOps/s 5.5560 KOps/s $\color{#35bf28}+0.15\%$
test_stack 53.3929ms 52.0867ms 19.1987 Ops/s 28.1773 Ops/s $\textbf{\color{#d91a1a}-31.86\%}$
test_cat 52.7259ms 51.7814ms 19.3119 Ops/s 19.5639 Ops/s $\color{#d91a1a}-1.29\%$
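
The Change column in the table above appears to be the relative difference between the PR's throughput (Ops) and the throughput on the repository HEAD. Below is a minimal sketch of that arithmetic, checked against a few rows. The function names and the 5% cutoff are assumptions inferred from the table (the bold entries all exceed 5% in magnitude, and their counts match the 19 improved / 8 worsened tallies); this is not the benchmark bot's actual code.

```python
def relative_change(ops, ops_head):
    """Percent change of the PR throughput relative to repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct, threshold=5.0):
    """Label a row; the 5% threshold is an assumption inferred from the table."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# (name, Ops, Ops on Repo HEAD), copied from the table; both columns share a unit.
rows = [
    ("test_plain_set_nested", 87.9454, 87.8031),
    ("test_keys", 252.1425, 289.7896),
    ("test_unbind", 79.1443, 143.9786),
    ("test_full_like", 222.3799, 135.5966),
]

for name, ops, ops_head in rows:
    change = relative_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under this reading, most rows fall within a couple of percent of HEAD and are probably noise, while entries such as test_unbind and test_stack are the ones worth re-running before drawing conclusions.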

[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Apr 1, 2025
ghstack-source-id: 6dc62b5
Pull Request resolved: #1282
vmoens added the Refactor label (Refactoring code - not a new feature) Apr 1, 2025
vmoens (Collaborator, Author) commented Apr 1, 2025

To use this feature:

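import torch
from tensordict import *

# Opt in globally: a list passed as a value is treated as a batch to unbind.
set_list_to_stack(True).set()
td = LazyStackedTensorDict(TensorDict(), TensorDict())
td["a"] = ["0", "1"]  # one string per stacked element
td["b"] = [torch.rand((1,)), torch.rand((2,))]  # ragged tensors, one per element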

I think this will not be a footgun. Very few tests had to be adapted.

Only lists (not arbitrary iterables) are treated as batches, so a string passed as a value will not be unbound.
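
To make the rule above concrete, here is a pure-Python sketch that does not use tensordict. `split_batch` is a hypothetical helper, and both the `isinstance` check and the choice to share a non-list value across all elements are assumptions of this illustration, not the library's implementation.

```python
def split_batch(value, n):
    """Hypothetical helper: return one entry per stacked element.

    Only an actual list is unbound. Any other value, including a str
    (which is iterable), is kept whole and shared by all n elements.
    """
    if isinstance(value, list):
        if len(value) != n:
            raise ValueError(f"expected a list of length {n}, got {len(value)}")
        return list(value)
    return [value] * n


per_element = split_batch(["0", "1"], 2)  # the list is split across elements
shared = split_batch("01", 2)             # the string is not split into characters
print(per_element, shared)
```

Restricting the check to lists avoids the surprise of a string or other iterable being silently split, which is presumably why the author does not expect this to be a footgun.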

If you have a lazy stack (or a tensorclass containing a lazy stack) and you want to retrieve ragged tensors, you can use any of the following:

data.get(key, as_nested_tensor=True, layout=torch.strided) # gives a nested tensor
data.get(key, as_padded_tensor=True, padding_side="left", padding_value=0) # gives a padded tensor
data.get(key, as_list=True) # gives a list with one entry per stacked element
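
As an illustration of what `padding_side` and `padding_value` mean for ragged entries, here is a plain-Python sketch on lists of numbers. `pad_ragged` is a hypothetical helper written for this example; it only mirrors the parameter names used above and makes no claim about how tensordict pads internally.

```python
def pad_ragged(rows, padding_value=0, padding_side="right"):
    """Pad variable-length rows to a common width (illustrative only)."""
    if padding_side not in ("left", "right"):
        raise ValueError("padding_side must be 'left' or 'right'")
    width = max((len(row) for row in rows), default=0)
    padded = []
    for row in rows:
        fill = [padding_value] * (width - len(row))
        # Left padding puts the filler before the data, right padding after it.
        padded.append(fill + list(row) if padding_side == "left" else list(row) + fill)
    return padded


ragged = [[1], [2, 3]]
print(pad_ragged(ragged, padding_value=0, padding_side="left"))
print(pad_ragged(ragged, padding_value=0, padding_side="right"))
```

Left padding keeps the last element of every row aligned in the final column, which can be convenient when the end of each sequence is what matters.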

cc @Darktex @mikaylagawarecki

[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Apr 1, 2025
ghstack-source-id: ff19bb3
Pull Request resolved: #1282
vmoens merged commit 6108441 into gh/vmoens/50/base Apr 2, 2025
45 of 48 checks passed
vmoens pushed a commit that referenced this pull request Apr 2, 2025
ghstack-source-id: ff19bb3
Pull Request resolved: #1282
vmoens deleted the gh/vmoens/50/head branch April 2, 2025 07:29
