@vmoens (Collaborator) commented Apr 30, 2025

[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Apr 30, 2025
@facebook-github-bot added the CLA Signed label Apr 30, 2025
@vmoens added the ciflow/binaries/all label Apr 30, 2025
github-actions bot commented Apr 30, 2025

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 233. Improved: $\large\color{#35bf28}14$. Worsened: $\large\color{#d91a1a}9$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 25.3110μs 11.3288μs 88.2708 KOps/s 88.4560 KOps/s $\color{#d91a1a}-0.21\%$
test_plain_set_stack_nested 28.7500μs 11.3138μs 88.3874 KOps/s 87.4769 KOps/s $\color{#35bf28}+1.04\%$
test_plain_set_nested_inplace 47.4010μs 12.4817μs 80.1172 KOps/s 80.5356 KOps/s $\color{#d91a1a}-0.52\%$
test_plain_set_stack_nested_inplace 36.8800μs 12.4780μs 80.1408 KOps/s 80.4598 KOps/s $\color{#d91a1a}-0.40\%$
test_items 26.7000μs 2.9480μs 339.2074 KOps/s 341.5815 KOps/s $\color{#d91a1a}-0.70\%$
test_items_nested 0.5129ms 0.3644ms 2.7445 KOps/s 2.6998 KOps/s $\color{#35bf28}+1.65\%$
test_items_nested_locked 0.4306ms 0.3674ms 2.7218 KOps/s 2.6965 KOps/s $\color{#35bf28}+0.94\%$
test_items_nested_leaf 0.1313ms 60.1515μs 16.6247 KOps/s 16.6283 KOps/s $\color{#d91a1a}-0.02\%$
test_items_stack_nested 0.5479ms 0.3678ms 2.7185 KOps/s 2.7308 KOps/s $\color{#d91a1a}-0.45\%$
test_items_stack_nested_leaf 0.1292ms 60.6206μs 16.4960 KOps/s 16.5894 KOps/s $\color{#d91a1a}-0.56\%$
test_items_stack_nested_locked 0.5378ms 0.3695ms 2.7064 KOps/s 2.7226 KOps/s $\color{#d91a1a}-0.60\%$
test_keys 25.5300μs 3.4806μs 287.3104 KOps/s 288.7609 KOps/s $\color{#d91a1a}-0.50\%$
test_keys_nested 0.1236ms 89.1920μs 11.2118 KOps/s 11.2991 KOps/s $\color{#d91a1a}-0.77\%$
test_keys_nested_locked 2.2119ms 95.0690μs 10.5187 KOps/s 10.5821 KOps/s $\color{#d91a1a}-0.60\%$
test_keys_nested_leaf 0.1275ms 79.9566μs 12.5068 KOps/s 12.6511 KOps/s $\color{#d91a1a}-1.14\%$
test_keys_stack_nested 0.1315ms 88.2362μs 11.3332 KOps/s 11.3363 KOps/s $\color{#d91a1a}-0.03\%$
test_keys_stack_nested_leaf 0.1328ms 79.3183μs 12.6074 KOps/s 12.6487 KOps/s $\color{#d91a1a}-0.33\%$
test_keys_stack_nested_locked 0.1425ms 94.6691μs 10.5631 KOps/s 10.5957 KOps/s $\color{#d91a1a}-0.31\%$
test_values 4.8350μs 0.8551μs 1.1694 MOps/s 1.1731 MOps/s $\color{#d91a1a}-0.32\%$
test_values_nested 65.5010μs 37.5119μs 26.6582 KOps/s 26.3513 KOps/s $\color{#35bf28}+1.16\%$
test_values_nested_locked 76.3810μs 40.0435μs 24.9729 KOps/s 24.9643 KOps/s $\color{#35bf28}+0.03\%$
test_values_nested_leaf 78.8820μs 42.7954μs 23.3670 KOps/s 23.2615 KOps/s $\color{#35bf28}+0.45\%$
test_values_stack_nested 65.9220μs 37.6943μs 26.5292 KOps/s 26.3123 KOps/s $\color{#35bf28}+0.82\%$
test_values_stack_nested_leaf 83.4520μs 42.9784μs 23.2675 KOps/s 23.1053 KOps/s $\color{#35bf28}+0.70\%$
test_values_stack_nested_locked 0.1097ms 39.6326μs 25.2318 KOps/s 24.7269 KOps/s $\color{#35bf28}+2.04\%$
test_membership 1.6990μs 0.4995μs 2.0019 MOps/s 1.9823 MOps/s $\color{#35bf28}+0.98\%$
test_membership_nested 13.9600μs 2.0332μs 491.8238 KOps/s 476.5615 KOps/s $\color{#35bf28}+3.20\%$
test_membership_nested_leaf 22.8755μs 2.0306μs 492.4707 KOps/s 491.7013 KOps/s $\color{#35bf28}+0.16\%$
test_membership_stacked_nested 26.0800μs 2.0651μs 484.2347 KOps/s 482.2687 KOps/s $\color{#35bf28}+0.41\%$
test_membership_stacked_nested_leaf 41.4510μs 2.0836μs 479.9392 KOps/s 478.3243 KOps/s $\color{#35bf28}+0.34\%$
test_membership_nested_last 0.3854ms 3.0849μs 324.1596 KOps/s 325.2894 KOps/s $\color{#d91a1a}-0.35\%$
test_membership_nested_leaf_last 26.8510μs 3.0637μs 326.3995 KOps/s 322.1446 KOps/s $\color{#35bf28}+1.32\%$
test_membership_stacked_nested_last 0.3805ms 3.0812μs 324.5512 KOps/s 328.5781 KOps/s $\color{#d91a1a}-1.23\%$
test_membership_stacked_nested_leaf_last 25.2910μs 3.0381μs 329.1547 KOps/s 323.7834 KOps/s $\color{#35bf28}+1.66\%$
test_nested_getleaf 38.9900μs 13.0941μs 76.3700 KOps/s 76.9938 KOps/s $\color{#d91a1a}-0.81\%$
test_nested_get 0.3948ms 12.3838μs 80.7509 KOps/s 81.3345 KOps/s $\color{#d91a1a}-0.72\%$
test_stacked_getleaf 48.6510μs 13.0185μs 76.8135 KOps/s 76.8998 KOps/s $\color{#d91a1a}-0.11\%$
test_stacked_get 0.4008ms 12.3535μs 80.9487 KOps/s 81.0317 KOps/s $\color{#d91a1a}-0.10\%$
test_nested_getitemleaf 0.3991ms 13.4094μs 74.5743 KOps/s 74.9806 KOps/s $\color{#d91a1a}-0.54\%$
test_nested_getitem 41.2010μs 12.7216μs 78.6067 KOps/s 78.8227 KOps/s $\color{#d91a1a}-0.27\%$
test_stacked_getitemleaf 0.4037ms 13.5123μs 74.0064 KOps/s 74.8202 KOps/s $\color{#d91a1a}-1.09\%$
test_stacked_getitem 0.4009ms 12.8427μs 77.8650 KOps/s 79.1398 KOps/s $\color{#d91a1a}-1.61\%$
test_lock_nested 1.8378ms 0.3636ms 2.7501 KOps/s 2.7613 KOps/s $\color{#d91a1a}-0.41\%$
test_lock_stack_nested 0.3915ms 0.3509ms 2.8494 KOps/s 2.8390 KOps/s $\color{#35bf28}+0.37\%$
test_unlock_nested 0.5055ms 0.3045ms 3.2845 KOps/s 3.3155 KOps/s $\color{#d91a1a}-0.93\%$
test_unlock_stack_nested 0.3361ms 0.2913ms 3.4332 KOps/s 3.4191 KOps/s $\color{#35bf28}+0.41\%$
test_flatten_speed 0.4507ms 76.4259μs 13.0846 KOps/s 13.0906 KOps/s $\color{#d91a1a}-0.05\%$
test_unflatten_speed 0.7894ms 0.3990ms 2.5063 KOps/s 2.4946 KOps/s $\color{#35bf28}+0.47\%$
test_common_ops 1.0138ms 0.6477ms 1.5439 KOps/s 1.5500 KOps/s $\color{#d91a1a}-0.40\%$
test_creation 0.1004ms 1.7479μs 572.1058 KOps/s 568.3829 KOps/s $\color{#35bf28}+0.65\%$
test_creation_empty 0.7566ms 7.1776μs 139.3228 KOps/s 140.4681 KOps/s $\color{#d91a1a}-0.82\%$
test_creation_nested_1 0.4638ms 10.0355μs 99.6461 KOps/s 99.7559 KOps/s $\color{#d91a1a}-0.11\%$
test_creation_nested_2 0.1051ms 12.8188μs 78.0107 KOps/s 77.8035 KOps/s $\color{#35bf28}+0.27\%$
test_clone 54.0410μs 11.2921μs 88.5576 KOps/s 90.6980 KOps/s $\color{#d91a1a}-2.36\%$
test_getitem[int] 0.4115ms 10.7432μs 93.0821 KOps/s 93.1142 KOps/s $\color{#d91a1a}-0.03\%$
test_getitem[slice_int] 0.1208ms 21.2358μs 47.0902 KOps/s 47.1033 KOps/s $\color{#d91a1a}-0.03\%$
test_getitem[range] 0.1358ms 39.2223μs 25.4957 KOps/s 25.3091 KOps/s $\color{#35bf28}+0.74\%$
test_getitem[tuple] 0.4160ms 18.5465μs 53.9185 KOps/s 54.1461 KOps/s $\color{#d91a1a}-0.42\%$
test_getitem[list] 0.1316ms 35.0697μs 28.5147 KOps/s 28.6370 KOps/s $\color{#d91a1a}-0.43\%$
test_setitem_dim[int] 62.9510μs 20.1894μs 49.5310 KOps/s 49.3751 KOps/s $\color{#35bf28}+0.32\%$
test_setitem_dim[slice_int] 67.9920μs 39.2802μs 25.4581 KOps/s 25.7683 KOps/s $\color{#d91a1a}-1.20\%$
test_setitem_dim[range] 88.3420μs 54.6592μs 18.2952 KOps/s 18.1292 KOps/s $\color{#35bf28}+0.92\%$
test_setitem_dim[tuple] 74.2510μs 33.2572μs 30.0687 KOps/s 30.1500 KOps/s $\color{#d91a1a}-0.27\%$
test_setitem 0.4043ms 16.3441μs 61.1840 KOps/s 62.4501 KOps/s $\color{#d91a1a}-2.03\%$
test_set 0.2224ms 15.7697μs 63.4127 KOps/s 65.3325 KOps/s $\color{#d91a1a}-2.94\%$
test_set_shared 0.8326ms 0.1604ms 6.2327 KOps/s 6.2126 KOps/s $\color{#35bf28}+0.32\%$
test_update 0.4224ms 19.0196μs 52.5774 KOps/s 54.4165 KOps/s $\color{#d91a1a}-3.38\%$
test_update_nested 0.1455ms 29.7554μs 33.6074 KOps/s 34.5514 KOps/s $\color{#d91a1a}-2.73\%$
test_update__nested 0.4603ms 26.3879μs 37.8962 KOps/s 38.7477 KOps/s $\color{#d91a1a}-2.20\%$
test_set_nested 0.1063ms 16.7474μs 59.7107 KOps/s 49.0025 KOps/s $\textbf{\color{#35bf28}+21.85\%}$
test_set_nested_new 0.1058ms 20.1346μs 49.6656 KOps/s 50.5345 KOps/s $\color{#d91a1a}-1.72\%$
test_select 0.4285ms 32.3007μs 30.9591 KOps/s 31.2410 KOps/s $\color{#d91a1a}-0.90\%$
test_select_nested 0.4384ms 44.0224μs 22.7157 KOps/s 22.9895 KOps/s $\color{#d91a1a}-1.19\%$
test_exclude_nested 0.1264ms 63.0047μs 15.8718 KOps/s 15.9189 KOps/s $\color{#d91a1a}-0.30\%$
test_empty[True] 0.6956ms 0.2926ms 3.4179 KOps/s 3.4627 KOps/s $\color{#d91a1a}-1.29\%$
test_empty[False] 67.0930μs 0.8269μs 1.2094 MOps/s 1.2408 MOps/s $\color{#d91a1a}-2.53\%$
test_to 89.1120μs 58.2499μs 17.1674 KOps/s 16.9560 KOps/s $\color{#35bf28}+1.25\%$
test_to_nonblocking 0.4705ms 52.5337μs 19.0354 KOps/s 18.6278 KOps/s $\color{#35bf28}+2.19\%$
test_unbind_speed 0.2932ms 0.2509ms 3.9857 KOps/s 4.0456 KOps/s $\color{#d91a1a}-1.48\%$
test_unbind_speed_stack0 0.6410ms 0.2458ms 4.0676 KOps/s 4.0681 KOps/s $\color{#d91a1a}-0.01\%$
test_unbind_speed_stack1 92.7123ms 0.7457ms 1.3410 KOps/s 1.4634 KOps/s $\textbf{\color{#d91a1a}-8.36\%}$
test_split 93.4841ms 1.6381ms 610.4680 Ops/s 620.3917 Ops/s $\color{#d91a1a}-1.60\%$
test_chunk 95.6755ms 1.6307ms 613.2468 Ops/s 621.5865 Ops/s $\color{#d91a1a}-1.34\%$
test_consolidate[False-None] 96.4944ms 3.1155ms 320.9733 Ops/s 317.1836 Ops/s $\color{#35bf28}+1.19\%$
test_consolidate[default-None] 1.9363ms 1.7391ms 575.0020 Ops/s 572.9666 Ops/s $\color{#35bf28}+0.36\%$
test_consolidate[reduce-overhead-None] 2.1788ms 1.7691ms 565.2683 Ops/s 564.3088 Ops/s $\color{#35bf28}+0.17\%$
test_consolidate_njt[False-None] 7.0466ms 6.5634ms 152.3612 Ops/s 150.2438 Ops/s $\color{#35bf28}+1.41\%$
test_to[False-False-None] 2.2464ms 1.8278ms 547.0950 Ops/s 550.0751 Ops/s $\color{#d91a1a}-0.54\%$
test_to[True-False-None] 1.9945ms 1.4840ms 673.8406 Ops/s 682.5514 Ops/s $\color{#d91a1a}-1.28\%$
test_to[within-False-None] 4.9047ms 4.4636ms 224.0347 Ops/s 224.3821 Ops/s $\color{#d91a1a}-0.15\%$
test_to[True-default-None] 5.6299ms 5.4157ms 184.6498 Ops/s 184.5917 Ops/s $\color{#35bf28}+0.03\%$
test_to_njt[False-False-None] 7.1371ms 7.0138ms 142.5759 Ops/s 142.3722 Ops/s $\color{#35bf28}+0.14\%$
test_to_njt[True-False-None] 5.7055ms 5.5407ms 180.4822 Ops/s 178.2166 Ops/s $\color{#35bf28}+1.27\%$
test_to_njt[within-False-None] 12.4487ms 12.3600ms 80.9059 Ops/s 80.8543 Ops/s $\color{#35bf28}+0.06\%$
test_creation[device0] 0.5473ms 80.0389μs 12.4939 KOps/s 12.4865 KOps/s $\color{#35bf28}+0.06\%$
test_creation_from_tensor 0.4667ms 82.4909μs 12.1225 KOps/s 11.9797 KOps/s $\color{#35bf28}+1.19\%$
test_add_one[memmap_tensor0] 0.3473ms 7.1876μs 139.1284 KOps/s 142.6667 KOps/s $\color{#d91a1a}-2.48\%$
test_contiguous[memmap_tensor0] 1.8676μs 0.4312μs 2.3190 MOps/s 2.3517 MOps/s $\color{#d91a1a}-1.39\%$
test_stack[memmap_tensor0] 32.8110μs 4.8237μs 207.3099 KOps/s 214.3472 KOps/s $\color{#d91a1a}-3.28\%$
test_memmaptd_index 1.7238ms 0.2464ms 4.0586 KOps/s 3.9354 KOps/s $\color{#35bf28}+3.13\%$
test_memmaptd_index_astensor 0.4405ms 0.3096ms 3.2302 KOps/s 3.1667 KOps/s $\color{#35bf28}+2.00\%$
test_memmaptd_index_op 1.0337ms 0.5790ms 1.7271 KOps/s 1.7388 KOps/s $\color{#d91a1a}-0.67\%$
test_serialize_model 0.1334s 0.1321s 7.5678 Ops/s 7.5819 Ops/s $\color{#d91a1a}-0.19\%$
test_serialize_model_pickle 1.3508s 1.1922s 0.8388 Ops/s 0.8245 Ops/s $\color{#35bf28}+1.73\%$
test_serialize_weights 0.1323s 0.1309s 7.6376 Ops/s 5.4395 Ops/s $\textbf{\color{#35bf28}+40.41\%}$
test_serialize_weights_returnearly 0.3125s 53.4470ms 18.7101 Ops/s 23.7901 Ops/s $\textbf{\color{#d91a1a}-21.35\%}$
test_serialize_weights_pickle 1.3844s 1.2227s 0.8179 Ops/s 0.8195 Ops/s $\color{#d91a1a}-0.20\%$
test_reshape_pytree 58.6520μs 22.4849μs 44.4743 KOps/s 44.7004 KOps/s $\color{#d91a1a}-0.51\%$
test_reshape_td 92.8220μs 26.3356μs 37.9714 KOps/s 36.5112 KOps/s $\color{#35bf28}+4.00\%$
test_view_pytree 60.5110μs 22.2071μs 45.0307 KOps/s 45.0850 KOps/s $\color{#d91a1a}-0.12\%$
test_view_td 73.7310μs 32.8125μs 30.4762 KOps/s 30.0033 KOps/s $\color{#35bf28}+1.58\%$
test_unbind_pytree 64.7720μs 28.8604μs 34.6496 KOps/s 34.7209 KOps/s $\color{#d91a1a}-0.21\%$
test_unbind_td 0.6221ms 38.0024μs 26.3141 KOps/s 25.1204 KOps/s $\color{#35bf28}+4.75\%$
test_split_pytree 64.5210μs 30.3830μs 32.9132 KOps/s 30.8515 KOps/s $\textbf{\color{#35bf28}+6.68\%}$
test_split_td 0.7914ms 39.1543μs 25.5400 KOps/s 25.3035 KOps/s $\color{#35bf28}+0.93\%$
test_add_pytree 71.7020μs 35.7671μs 27.9587 KOps/s 27.9361 KOps/s $\color{#35bf28}+0.08\%$
test_add_td 0.2858ms 50.2250μs 19.9104 KOps/s 19.0798 KOps/s $\color{#35bf28}+4.35\%$
test_compile_add_one_nested[tensordict-compile] 0.1884ms 0.1234ms 8.1026 KOps/s 7.6971 KOps/s $\textbf{\color{#35bf28}+5.27\%}$
test_compile_add_one_nested[tensordict-eager] 0.2474ms 0.1488ms 6.7213 KOps/s 6.9095 KOps/s $\color{#d91a1a}-2.72\%$
test_compile_add_one_nested[pytree-compile] 0.1361ms 98.2476μs 10.1784 KOps/s 10.1506 KOps/s $\color{#35bf28}+0.27\%$
test_compile_add_one_nested[pytree-eager] 1.4101ms 0.1554ms 6.4349 KOps/s 6.3696 KOps/s $\color{#35bf28}+1.02\%$
test_compile_copy_nested[tensordict-compile] 56.2010μs 24.9056μs 40.1516 KOps/s 39.0704 KOps/s $\color{#35bf28}+2.77\%$
test_compile_copy_nested[tensordict-eager] 81.2520μs 35.1077μs 28.4837 KOps/s 27.2092 KOps/s $\color{#35bf28}+4.68\%$
test_compile_copy_nested[pytree-compile] 0.3983ms 64.1053μs 15.5993 KOps/s 15.4214 KOps/s $\color{#35bf28}+1.15\%$
test_compile_copy_nested[pytree-eager] 84.4310μs 48.7033μs 20.5325 KOps/s 19.8741 KOps/s $\color{#35bf28}+3.31\%$
test_compile_add_one_flat[tensordict-compile] 0.1990ms 0.1487ms 6.7270 KOps/s 6.8065 KOps/s $\color{#d91a1a}-1.17\%$
test_compile_add_one_flat[tensordict-eager] 0.3187ms 0.2221ms 4.5025 KOps/s 4.4913 KOps/s $\color{#35bf28}+0.25\%$
test_compile_add_one_flat[tensorclass-compile] 0.1601ms 0.1047ms 9.5529 KOps/s 10.0337 KOps/s $\color{#d91a1a}-4.79\%$
test_compile_add_one_flat[tensorclass-eager] 0.1549ms 61.8952μs 16.1563 KOps/s 16.9636 KOps/s $\color{#d91a1a}-4.76\%$
test_compile_add_one_flat[pytree-compile] 0.2048ms 0.1400ms 7.1406 KOps/s 7.1084 KOps/s $\color{#35bf28}+0.45\%$
test_compile_add_one_flat[pytree-eager] 0.5685ms 0.5094ms 1.9631 KOps/s 1.8157 KOps/s $\textbf{\color{#35bf28}+8.12\%}$
test_compile_add_self_flat[tensordict-eager] 0.4351ms 0.2677ms 3.7352 KOps/s 3.7272 KOps/s $\color{#35bf28}+0.22\%$
test_compile_add_self_flat[tensordict-compile] 0.1858ms 0.1453ms 6.8820 KOps/s 6.7176 KOps/s $\color{#35bf28}+2.45\%$
test_compile_add_self_flat[tensorclass-eager] 0.1544ms 71.2109μs 14.0428 KOps/s 13.9732 KOps/s $\color{#35bf28}+0.50\%$
test_compile_add_self_flat[tensorclass-compile] 0.1448ms 99.6655μs 10.0336 KOps/s 10.0470 KOps/s $\color{#d91a1a}-0.13\%$
test_compile_add_self_flat[pytree-eager] 0.4693ms 0.4284ms 2.3341 KOps/s 2.3254 KOps/s $\color{#35bf28}+0.37\%$
test_compile_add_self_flat[pytree-compile] 0.1802ms 0.1414ms 7.0732 KOps/s 7.3217 KOps/s $\color{#d91a1a}-3.39\%$
test_compile_copy_flat[tensordict-compile] 0.1289ms 20.2572μs 49.3652 KOps/s 52.1598 KOps/s $\textbf{\color{#d91a1a}-5.36\%}$
test_compile_copy_flat[tensordict-eager] 64.3910μs 31.9075μs 31.3406 KOps/s 31.4972 KOps/s $\color{#d91a1a}-0.50\%$
test_compile_copy_flat[pytree-compile] 0.1110ms 69.7591μs 14.3350 KOps/s 14.3565 KOps/s $\color{#d91a1a}-0.15\%$
test_compile_copy_flat[pytree-eager] 87.5620μs 52.2878μs 19.1249 KOps/s 19.0357 KOps/s $\color{#35bf28}+0.47\%$
test_compile_assign_and_add[tensordict-compile] 1.6791ms 0.4014ms 2.4912 KOps/s 2.2224 KOps/s $\textbf{\color{#35bf28}+12.09\%}$
test_compile_assign_and_add[tensordict-eager] 3.0200ms 2.8298ms 353.3868 Ops/s 355.9633 Ops/s $\color{#d91a1a}-0.72\%$
test_compile_assign_and_add[pytree-compile] 1.5984ms 0.4325ms 2.3121 KOps/s 2.2542 KOps/s $\color{#35bf28}+2.57\%$
test_compile_assign_and_add[pytree-eager] 2.8786ms 2.7678ms 361.2967 Ops/s 354.4591 Ops/s $\color{#35bf28}+1.93\%$
test_compile_indexing[tensor-tensordict-compile] 0.4015ms 0.1181ms 8.4673 KOps/s 8.2309 KOps/s $\color{#35bf28}+2.87\%$
test_compile_indexing[tensor-tensordict-eager] 0.5390ms 87.1756μs 11.4711 KOps/s 11.6445 KOps/s $\color{#d91a1a}-1.49\%$
test_compile_indexing[tensor-tensorclass-compile] 0.5481ms 0.1132ms 8.8354 KOps/s 9.1745 KOps/s $\color{#d91a1a}-3.70\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1326ms 74.6445μs 13.3968 KOps/s 13.8134 KOps/s $\color{#d91a1a}-3.02\%$
test_compile_indexing[tensor-pytree-compile] 0.1663ms 0.1152ms 8.6797 KOps/s 8.6904 KOps/s $\color{#d91a1a}-0.12\%$
test_compile_indexing[tensor-pytree-eager] 0.1217ms 74.4815μs 13.4261 KOps/s 13.3298 KOps/s $\color{#35bf28}+0.72\%$
test_compile_indexing[slice-tensordict-compile] 0.1541ms 0.1052ms 9.5031 KOps/s 9.9670 KOps/s $\color{#d91a1a}-4.66\%$
test_compile_indexing[slice-tensordict-eager] 0.1476ms 19.1157μs 52.3130 KOps/s 51.0715 KOps/s $\color{#35bf28}+2.43\%$
test_compile_indexing[slice-tensorclass-compile] 0.1531ms 0.1007ms 9.9323 KOps/s 10.2625 KOps/s $\color{#d91a1a}-3.22\%$
test_compile_indexing[slice-tensorclass-eager] 52.6710μs 16.1646μs 61.8635 KOps/s 61.7939 KOps/s $\color{#35bf28}+0.11\%$
test_compile_indexing[slice-pytree-compile] 0.1453ms 97.6171μs 10.2441 KOps/s 10.2110 KOps/s $\color{#35bf28}+0.32\%$
test_compile_indexing[slice-pytree-eager] 49.9000μs 16.2067μs 61.7030 KOps/s 61.8597 KOps/s $\color{#d91a1a}-0.25\%$
test_compile_indexing[int-tensordict-compile] 0.1536ms 0.1069ms 9.3525 KOps/s 9.8939 KOps/s $\textbf{\color{#d91a1a}-5.47\%}$
test_compile_indexing[int-tensordict-eager] 0.6110ms 21.0192μs 47.5755 KOps/s 51.0325 KOps/s $\textbf{\color{#d91a1a}-6.77\%}$
test_compile_indexing[int-tensorclass-compile] 0.1559ms 97.2526μs 10.2825 KOps/s 10.2474 KOps/s $\color{#35bf28}+0.34\%$
test_compile_indexing[int-tensorclass-eager] 46.4710μs 16.1415μs 61.9522 KOps/s 61.7526 KOps/s $\color{#35bf28}+0.32\%$
test_compile_indexing[int-pytree-compile] 0.1392ms 97.1130μs 10.2973 KOps/s 10.2149 KOps/s $\color{#35bf28}+0.81\%$
test_compile_indexing[int-pytree-eager] 45.9510μs 15.9789μs 62.5826 KOps/s 56.3499 KOps/s $\textbf{\color{#35bf28}+11.06\%}$
test_mod_add[eager] 84.5410μs 38.7078μs 25.8346 KOps/s 25.6837 KOps/s $\color{#35bf28}+0.59\%$
test_mod_add[compile] 0.4276ms 81.9022μs 12.2097 KOps/s 11.7522 KOps/s $\color{#35bf28}+3.89\%$
test_mod_add[compile-overhead] 0.3429ms 0.1720ms 5.8128 KOps/s 5.5851 KOps/s $\color{#35bf28}+4.08\%$
test_mod_wrap[eager] 0.3424ms 0.2729ms 3.6650 KOps/s 3.6505 KOps/s $\color{#35bf28}+0.40\%$
test_mod_wrap[compile] 0.3805ms 0.2909ms 3.4380 KOps/s 3.3718 KOps/s $\color{#35bf28}+1.96\%$
test_mod_wrap[compile-overhead] 7.9027ms 3.9190ms 255.1676 Ops/s 259.8017 Ops/s $\color{#d91a1a}-1.78\%$
test_mod_wrap_and_backward[eager] 1.7262ms 1.3886ms 720.1475 Ops/s 675.9755 Ops/s $\textbf{\color{#35bf28}+6.53\%}$
test_mod_wrap_and_backward[compile] 1.3890ms 1.3031ms 767.4040 Ops/s 703.8058 Ops/s $\textbf{\color{#35bf28}+9.04\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3971ms 0.9347ms 1.0698 KOps/s 946.7203 Ops/s $\textbf{\color{#35bf28}+13.01\%}$
test_seq_add[eager] 0.3125ms 0.1295ms 7.7224 KOps/s 7.6508 KOps/s $\color{#35bf28}+0.93\%$
test_seq_add[compile] 0.3372ms 91.2478μs 10.9592 KOps/s 10.8139 KOps/s $\color{#35bf28}+1.34\%$
test_seq_add[compile-overhead] 0.1880ms 0.1316ms 7.6014 KOps/s 7.5339 KOps/s $\color{#35bf28}+0.90\%$
test_seq_wrap[eager] 1.0408ms 0.4398ms 2.2736 KOps/s 2.2641 KOps/s $\color{#35bf28}+0.42\%$
test_seq_wrap[compile] 1.1376ms 0.3113ms 3.2119 KOps/s 3.1788 KOps/s $\color{#35bf28}+1.04\%$
test_seq_wrap[compile-overhead] 0.3274ms 0.2295ms 4.3577 KOps/s 4.3106 KOps/s $\color{#35bf28}+1.09\%$
test_func_call_runtime[False-eager] 0.8419ms 0.7537ms 1.3267 KOps/s 1.2922 KOps/s $\color{#35bf28}+2.67\%$
test_func_call_runtime[False-compile] 1.1307ms 0.7628ms 1.3109 KOps/s 1.3009 KOps/s $\color{#35bf28}+0.77\%$
test_func_call_runtime[False-compile-overhead] 0.4115ms 0.3690ms 2.7099 KOps/s 2.7083 KOps/s $\color{#35bf28}+0.06\%$
test_func_call_runtime[True-eager] 0.9768ms 0.9175ms 1.0900 KOps/s 1.0724 KOps/s $\color{#35bf28}+1.64\%$
test_func_call_runtime[True-compile] 0.9005ms 0.7901ms 1.2656 KOps/s 1.2699 KOps/s $\color{#d91a1a}-0.34\%$
test_func_call_runtime[True-compile-overhead] 0.4808ms 0.3916ms 2.5536 KOps/s 2.5629 KOps/s $\color{#d91a1a}-0.37\%$
test_func_call_cm_runtime[False-eager] 0.8029ms 0.7496ms 1.3340 KOps/s 1.2989 KOps/s $\color{#35bf28}+2.70\%$
test_func_call_cm_runtime[False-compile] 0.8089ms 0.7638ms 1.3092 KOps/s 1.3005 KOps/s $\color{#35bf28}+0.67\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4133ms 0.3711ms 2.6950 KOps/s 2.7096 KOps/s $\color{#d91a1a}-0.54\%$
test_func_call_cm_runtime[True-eager] 1.1423ms 1.0227ms 977.8377 Ops/s 968.0326 Ops/s $\color{#35bf28}+1.01\%$
test_func_call_cm_runtime[True-compile] 1.0587ms 1.0118ms 988.3452 Ops/s 981.6851 Ops/s $\color{#35bf28}+0.68\%$
test_func_call_cm_runtime[True-compile-overhead] 1.0902ms 1.0137ms 986.4393 Ops/s 975.1465 Ops/s $\color{#35bf28}+1.16\%$
test_vmap_func_call_cm_runtime[eager] 2.5411ms 2.1293ms 469.6338 Ops/s 466.1098 Ops/s $\color{#35bf28}+0.76\%$
test_vmap_func_call_cm_runtime[compile] 0.8834ms 0.8349ms 1.1977 KOps/s 1.1917 KOps/s $\color{#35bf28}+0.51\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4963ms 0.4191ms 2.3861 KOps/s 2.3576 KOps/s $\color{#35bf28}+1.21\%$
test_distributed 4.3398ms 0.3420ms 2.9236 KOps/s 8.7178 KOps/s $\textbf{\color{#d91a1a}-66.46\%}$
test_tdmodule 28.4210μs 19.8923μs 50.2707 KOps/s 48.3010 KOps/s $\color{#35bf28}+4.08\%$
test_tdmodule_dispatch 0.1639ms 37.6121μs 26.5872 KOps/s 26.5995 KOps/s $\color{#d91a1a}-0.05\%$
test_tdseq 40.1500μs 20.3103μs 49.2361 KOps/s 49.4469 KOps/s $\color{#d91a1a}-0.43\%$
test_tdseq_dispatch 68.3610μs 40.1123μs 24.9300 KOps/s 24.9344 KOps/s $\color{#d91a1a}-0.02\%$
test_instantiation_functorch 1.6474ms 1.5804ms 632.7614 Ops/s 646.1560 Ops/s $\color{#d91a1a}-2.07\%$
test_exec_functorch 0.3001ms 0.1474ms 6.7841 KOps/s 6.7844 KOps/s $-0.00\%$
test_exec_functional_call 0.1938ms 0.1428ms 7.0048 KOps/s 6.9721 KOps/s $\color{#35bf28}+0.47\%$
test_exec_td_decorator 0.4051ms 0.1940ms 5.1534 KOps/s 5.1659 KOps/s $\color{#d91a1a}-0.24\%$
test_vmap_mlp_speed_decorator[True-True] 0.8936ms 0.6973ms 1.4340 KOps/s 1.4246 KOps/s $\color{#35bf28}+0.66\%$
test_vmap_mlp_speed_decorator[True-False] 0.9026ms 0.6975ms 1.4336 KOps/s 1.4285 KOps/s $\color{#35bf28}+0.36\%$
test_vmap_mlp_speed_decorator[False-True] 0.8059ms 0.6033ms 1.6575 KOps/s 1.6384 KOps/s $\color{#35bf28}+1.17\%$
test_vmap_mlp_speed_decorator[False-False] 0.7188ms 0.6049ms 1.6531 KOps/s 1.6396 KOps/s $\color{#35bf28}+0.82\%$
test_vmap_transformer_speed_decorator[True-True] 19.5319ms 19.4544ms 51.4024 Ops/s 51.1233 Ops/s $\color{#35bf28}+0.55\%$
test_vmap_transformer_speed_decorator[True-False] 19.5580ms 19.4615ms 51.3835 Ops/s 51.2740 Ops/s $\color{#35bf28}+0.21\%$
test_vmap_transformer_speed_decorator[False-True] 19.3955ms 19.3064ms 51.7964 Ops/s 51.6309 Ops/s $\color{#35bf28}+0.32\%$
test_vmap_transformer_speed_decorator[False-False] 19.4060ms 19.3520ms 51.6744 Ops/s 51.5439 Ops/s $\color{#35bf28}+0.25\%$
test_to_module_speed[True] 1.0456ms 0.9700ms 1.0309 KOps/s 1.0153 KOps/s $\color{#35bf28}+1.53\%$
test_to_module_speed[False] 1.0180ms 0.9439ms 1.0594 KOps/s 1.0263 KOps/s $\color{#35bf28}+3.23\%$
test_tc_init 18.3386ms 38.0012μs 26.3150 KOps/s 28.8817 KOps/s $\textbf{\color{#d91a1a}-8.89\%}$
test_tc_init_tensor_only 0.1067ms 10.8372μs 92.2750 KOps/s 92.8583 KOps/s $\color{#d91a1a}-0.63\%$
test_tc_init_nested 0.1750ms 69.5793μs 14.3721 KOps/s 14.3110 KOps/s $\color{#35bf28}+0.43\%$
test_tc_first_layer_tensor 22.6010μs 0.9090μs 1.1001 MOps/s 1.1061 MOps/s $\color{#d91a1a}-0.54\%$
test_tc_first_layer_tensor_only 1.9621μs 0.4186μs 2.3888 MOps/s 2.3051 MOps/s $\color{#35bf28}+3.63\%$
test_tc_first_layer_tensor_set 1.2874ms 2.9318μs 341.0901 KOps/s 336.1226 KOps/s $\color{#35bf28}+1.48\%$
test_tc_first_layer_tensor_only_set 11.0937μs 1.7773μs 562.6630 KOps/s 558.5681 KOps/s $\color{#35bf28}+0.73\%$
test_tc_first_layer_nontensor 22.8200μs 2.3057μs 433.7016 KOps/s 424.8757 KOps/s $\color{#35bf28}+2.08\%$
test_tc_second_layer_tensor 18.1610μs 1.7327μs 577.1222 KOps/s 577.7778 KOps/s $\color{#d91a1a}-0.11\%$
test_tc_second_layer_nontensor 35.6400μs 3.1994μs 312.5570 KOps/s 317.7929 KOps/s $\color{#d91a1a}-1.65\%$
test_unbind 0.2239s 10.2053ms 97.9878 Ops/s 144.5057 Ops/s $\textbf{\color{#d91a1a}-32.19\%}$
test_full_like 6.1515ms 4.4005ms 227.2448 Ops/s 60.7483 Ops/s $\textbf{\color{#35bf28}+274.08\%}$
test_zeros_like 9.2969ms 7.2846ms 137.2761 Ops/s 60.9158 Ops/s $\textbf{\color{#35bf28}+125.35\%}$
test_ones_like 5.3988ms 4.3579ms 229.4666 Ops/s 60.9035 Ops/s $\textbf{\color{#35bf28}+276.77\%}$
test_clone 14.0235ms 6.5682ms 152.2491 Ops/s 57.3364 Ops/s $\textbf{\color{#35bf28}+165.54\%}$
test_squeeze 58.5810μs 9.9706μs 100.2952 KOps/s 100.2334 KOps/s $\color{#35bf28}+0.06\%$
test_unsqueeze 0.2389ms 77.2366μs 12.9472 KOps/s 13.2679 KOps/s $\color{#d91a1a}-2.42\%$
test_split 0.2587ms 0.1608ms 6.2205 KOps/s 6.0014 KOps/s $\color{#35bf28}+3.65\%$
test_permute 0.3184ms 0.1924ms 5.1988 KOps/s 5.5412 KOps/s $\textbf{\color{#d91a1a}-6.18\%}$
test_stack 51.2096ms 50.9092ms 19.6428 Ops/s 19.7474 Ops/s $\color{#d91a1a}-0.53\%$
test_cat 51.2385ms 50.9155ms 19.6404 Ops/s 19.6192 Ops/s $\color{#35bf28}+0.11\%$
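
For anyone reading the table above, here is a rough sketch of how the Change column appears to relate to the two Ops columns: it is the relative difference between "Ops" and "Ops on Repo HEAD", and the bold entries are the ones whose magnitude exceeds roughly 5%. Both the formula and the 5% threshold are inferred from the reported rows, not taken from the benchmark workflow, and every name below (`UNIT_FACTORS`, `to_ops_per_second`, `relative_change`, `classify`) is hypothetical.

```python
# Illustrative sketch only: these helpers are assumptions, not the workflow's code.

# Some rows mix units (e.g. KOps/s vs Ops/s), so normalise to plain Ops/s first.
UNIT_FACTORS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def to_ops_per_second(value: float, unit: str) -> float:
    """Convert a reported throughput to operations per second."""
    return value * UNIT_FACTORS[unit]


def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percentage change of the PR throughput relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Bucket a change; the 5% threshold is inferred from which rows are bold."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "within noise"


# (Ops, Ops on Repo HEAD) copied from three rows of the table above.
rows = {
    "test_set_nested": ((59.7107, "KOps/s"), (49.0025, "KOps/s")),
    "test_distributed": ((2.9236, "KOps/s"), (8.7178, "KOps/s")),
    "test_plain_set_nested": ((88.2708, "KOps/s"), (88.4560, "KOps/s")),
}

for name, (pr, head) in rows.items():
    change = relative_change(to_ops_per_second(*pr), to_ops_per_second(*head))
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under this reading, the 14 "Improved" and 9 "Worsened" counts in the summary match the number of bold green and bold red rows in the table; everything else falls inside the assumed noise band.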

@vmoens changed the title from "[Setup] statically link your extension against the Python library" to "[Setup] statically link _C extension against the Python library" Apr 30, 2025
@vmoens merged commit 9a63fe2 into gh/vmoens/52/base Apr 30, 2025
193 of 198 checks passed
@vmoens deleted the gh/vmoens/52/head branch April 30, 2025 10:53
vmoens pushed a commit that referenced this pull request Apr 30, 2025
ghstack-source-id: 26f374e
Pull-Request-resolved: #1304
(cherry picked from commit af17524)
Labels

ciflow/binaries/all: Build all wheels
CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
3 participants