@vmoens
Collaborator

@vmoens vmoens commented Jan 27, 2025

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Jan 27, 2025
ghstack-source-id: 37aecd4
Pull Request resolved: #1193
@facebook-github-bot facebook-github-bot added the "CLA Signed" label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Jan 27, 2025
@vmoens vmoens merged commit 4c08b0b into gh/vmoens/46/base Jan 27, 2025
19 of 25 checks passed
@vmoens vmoens deleted the gh/vmoens/46/head branch January 27, 2025 00:29
@github-actions

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 229. Improved: $\large\color{#35bf28}43$. Worsened: $\large\color{#d91a1a}8$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 31.5800μs 11.2227μs 89.1051 KOps/s 76.0996 KOps/s $\textbf{\color{#35bf28}+17.09\%}$
test_plain_set_stack_nested 38.7510μs 11.3568μs 88.0534 KOps/s 75.6218 KOps/s $\textbf{\color{#35bf28}+16.44\%}$
test_plain_set_nested_inplace 39.2010μs 12.3004μs 81.2979 KOps/s 69.8058 KOps/s $\textbf{\color{#35bf28}+16.46\%}$
test_plain_set_stack_nested_inplace 37.3110μs 12.4365μs 80.4086 KOps/s 70.3083 KOps/s $\textbf{\color{#35bf28}+14.37\%}$
test_items 22.8100μs 2.8826μs 346.9034 KOps/s 338.1740 KOps/s $\color{#35bf28}+2.58\%$
test_items_nested 0.4107ms 0.3625ms 2.7585 KOps/s 2.7210 KOps/s $\color{#35bf28}+1.38\%$
test_items_nested_locked 0.4911ms 0.3660ms 2.7323 KOps/s 2.7247 KOps/s $\color{#35bf28}+0.28\%$
test_items_nested_leaf 88.4200μs 58.9903μs 16.9519 KOps/s 16.8323 KOps/s $\color{#35bf28}+0.71\%$
test_items_stack_nested 0.4076ms 0.3664ms 2.7292 KOps/s 2.7259 KOps/s $\color{#35bf28}+0.12\%$
test_items_stack_nested_leaf 0.1093ms 60.0373μs 16.6563 KOps/s 16.6976 KOps/s $\color{#d91a1a}-0.25\%$
test_items_stack_nested_locked 0.5371ms 0.3641ms 2.7466 KOps/s 2.7284 KOps/s $\color{#35bf28}+0.66\%$
test_keys 26.3000μs 3.5071μs 285.1350 KOps/s 285.0917 KOps/s $\color{#35bf28}+0.02\%$
test_keys_nested 0.1297ms 89.4943μs 11.1739 KOps/s 11.0868 KOps/s $\color{#35bf28}+0.79\%$
test_keys_nested_locked 0.6351ms 94.7836μs 10.5503 KOps/s 10.4885 KOps/s $\color{#35bf28}+0.59\%$
test_keys_nested_leaf 0.1408ms 80.4822μs 12.4251 KOps/s 12.3376 KOps/s $\color{#35bf28}+0.71\%$
test_keys_stack_nested 0.1274ms 90.6699μs 11.0290 KOps/s 10.9788 KOps/s $\color{#35bf28}+0.46\%$
test_keys_stack_nested_leaf 0.1176ms 81.6282μs 12.2507 KOps/s 12.2828 KOps/s $\color{#d91a1a}-0.26\%$
test_keys_stack_nested_locked 0.1364ms 96.4540μs 10.3676 KOps/s 10.3949 KOps/s $\color{#d91a1a}-0.26\%$
test_values 6.8135μs 0.8723μs 1.1464 MOps/s 1.1487 MOps/s $\color{#d91a1a}-0.20\%$
test_values_nested 67.8400μs 37.3031μs 26.8074 KOps/s 26.5036 KOps/s $\color{#35bf28}+1.15\%$
test_values_nested_locked 65.5910μs 38.9987μs 25.6419 KOps/s 25.6454 KOps/s $\color{#d91a1a}-0.01\%$
test_values_nested_leaf 69.3600μs 41.8278μs 23.9075 KOps/s 23.8193 KOps/s $\color{#35bf28}+0.37\%$
test_values_stack_nested 68.3710μs 37.8887μs 26.3931 KOps/s 26.1537 KOps/s $\color{#35bf28}+0.92\%$
test_values_stack_nested_leaf 66.9010μs 42.3182μs 23.6305 KOps/s 23.3683 KOps/s $\color{#35bf28}+1.12\%$
test_values_stack_nested_locked 69.0200μs 39.7557μs 25.1536 KOps/s 25.2644 KOps/s $\color{#d91a1a}-0.44\%$
test_membership 1.7135μs 0.5061μs 1.9759 MOps/s 1.9551 MOps/s $\color{#35bf28}+1.07\%$
test_membership_nested 19.4100μs 2.0266μs 493.4333 KOps/s 486.5749 KOps/s $\color{#35bf28}+1.41\%$
test_membership_nested_leaf 19.0650μs 2.0379μs 490.7122 KOps/s 488.9835 KOps/s $\color{#35bf28}+0.35\%$
test_membership_stacked_nested 28.5100μs 2.1089μs 474.1831 KOps/s 480.1358 KOps/s $\color{#d91a1a}-1.24\%$
test_membership_stacked_nested_leaf 28.5500μs 2.0902μs 478.4222 KOps/s 479.4081 KOps/s $\color{#d91a1a}-0.21\%$
test_membership_nested_last 35.6600μs 3.0922μs 323.3991 KOps/s 320.6956 KOps/s $\color{#35bf28}+0.84\%$
test_membership_nested_leaf_last 39.5200μs 3.1284μs 319.6477 KOps/s 322.2644 KOps/s $\color{#d91a1a}-0.81\%$
test_membership_stacked_nested_last 0.9717ms 3.0502μs 327.8474 KOps/s 322.0950 KOps/s $\color{#35bf28}+1.79\%$
test_membership_stacked_nested_leaf_last 37.2600μs 3.0883μs 323.8066 KOps/s 323.2093 KOps/s $\color{#35bf28}+0.18\%$
test_nested_getleaf 28.5500μs 6.1499μs 162.6039 KOps/s 161.4464 KOps/s $\color{#35bf28}+0.72\%$
test_nested_get 26.5410μs 5.8423μs 171.1649 KOps/s 170.9464 KOps/s $\color{#35bf28}+0.13\%$
test_stacked_getleaf 31.3300μs 6.2023μs 161.2303 KOps/s 162.2969 KOps/s $\color{#d91a1a}-0.66\%$
test_stacked_get 33.4910μs 5.8628μs 170.5677 KOps/s 170.4133 KOps/s $\color{#35bf28}+0.09\%$
test_nested_getitemleaf 31.2410μs 6.4300μs 155.5207 KOps/s 154.5669 KOps/s $\color{#35bf28}+0.62\%$
test_nested_getitem 24.9410μs 6.1659μs 162.1827 KOps/s 161.0554 KOps/s $\color{#35bf28}+0.70\%$
test_stacked_getitemleaf 32.8310μs 6.4505μs 155.0263 KOps/s 154.0386 KOps/s $\color{#35bf28}+0.64\%$
test_stacked_getitem 32.3300μs 6.1438μs 162.7669 KOps/s 163.5470 KOps/s $\color{#d91a1a}-0.48\%$
test_lock_nested 0.4293ms 0.3367ms 2.9702 KOps/s 2.8541 KOps/s $\color{#35bf28}+4.07\%$
test_lock_stack_nested 0.5045ms 0.3437ms 2.9094 KOps/s 2.9069 KOps/s $\color{#35bf28}+0.09\%$
test_unlock_nested 0.3514ms 0.2795ms 3.5776 KOps/s 3.5622 KOps/s $\color{#35bf28}+0.43\%$
test_unlock_stack_nested 0.3161ms 0.2784ms 3.5922 KOps/s 3.5436 KOps/s $\color{#35bf28}+1.37\%$
test_flatten_speed 0.1245ms 76.2664μs 13.1119 KOps/s 12.8706 KOps/s $\color{#35bf28}+1.88\%$
test_unflatten_speed 0.3754ms 0.3237ms 3.0894 KOps/s 3.0648 KOps/s $\color{#35bf28}+0.80\%$
test_common_ops 0.7337ms 0.5669ms 1.7640 KOps/s 1.5509 KOps/s $\textbf{\color{#35bf28}+13.74\%}$
test_creation 0.1244ms 1.7773μs 562.6574 KOps/s 563.4787 KOps/s $\color{#d91a1a}-0.15\%$
test_creation_empty 29.4600μs 6.5890μs 151.7673 KOps/s 98.7118 KOps/s $\textbf{\color{#35bf28}+53.75\%}$
test_creation_nested_1 35.3610μs 8.1897μs 122.1039 KOps/s 84.5267 KOps/s $\textbf{\color{#35bf28}+44.46\%}$
test_creation_nested_2 42.2800μs 10.9560μs 91.2739 KOps/s 68.3253 KOps/s $\textbf{\color{#35bf28}+33.59\%}$
test_clone 57.4400μs 9.9403μs 100.6004 KOps/s 98.8259 KOps/s $\color{#35bf28}+1.80\%$
test_getitem[int] 1.3087ms 10.7694μs 92.8559 KOps/s 92.7630 KOps/s $\color{#35bf28}+0.10\%$
test_getitem[slice_int] 0.1110ms 21.0142μs 47.5870 KOps/s 47.2514 KOps/s $\color{#35bf28}+0.71\%$
test_getitem[range] 0.1270ms 37.7573μs 26.4849 KOps/s 26.9210 KOps/s $\color{#d91a1a}-1.62\%$
test_getitem[tuple] 0.1045ms 18.0994μs 55.2503 KOps/s 53.3228 KOps/s $\color{#35bf28}+3.61\%$
test_getitem[list] 0.1729ms 32.2538μs 31.0041 KOps/s 30.5433 KOps/s $\color{#35bf28}+1.51\%$
test_setitem_dim[int] 44.9010μs 18.3053μs 54.6289 KOps/s 52.8129 KOps/s $\color{#35bf28}+3.44\%$
test_setitem_dim[slice_int] 62.2810μs 38.3449μs 26.0791 KOps/s 26.6257 KOps/s $\color{#d91a1a}-2.05\%$
test_setitem_dim[range] 74.2010μs 52.5509μs 19.0292 KOps/s 18.8310 KOps/s $\color{#35bf28}+1.05\%$
test_setitem_dim[tuple] 52.5010μs 31.0871μs 32.1677 KOps/s 31.4688 KOps/s $\color{#35bf28}+2.22\%$
test_setitem 71.8600μs 13.2651μs 75.3859 KOps/s 64.1868 KOps/s $\textbf{\color{#35bf28}+17.45\%}$
test_set 62.3410μs 13.1477μs 76.0589 KOps/s 65.3940 KOps/s $\textbf{\color{#35bf28}+16.31\%}$
test_set_shared 0.5078ms 0.1593ms 6.2777 KOps/s 6.2352 KOps/s $\color{#35bf28}+0.68\%$
test_update 0.3976ms 14.8236μs 67.4600 KOps/s 52.8273 KOps/s $\textbf{\color{#35bf28}+27.70\%}$
test_update_nested 87.4410μs 20.1101μs 49.7264 KOps/s 40.6527 KOps/s $\textbf{\color{#35bf28}+22.32\%}$
test_update__nested 0.5331ms 24.7064μs 40.4754 KOps/s 40.0788 KOps/s $\color{#35bf28}+0.99\%$
test_set_nested 75.7400μs 14.2084μs 70.3810 KOps/s 60.6238 KOps/s $\textbf{\color{#35bf28}+16.09\%}$
test_set_nested_new 0.1114ms 16.3515μs 61.1565 KOps/s 53.2189 KOps/s $\textbf{\color{#35bf28}+14.92\%}$
test_select 93.4410μs 27.3962μs 36.5015 KOps/s 32.2179 KOps/s $\textbf{\color{#35bf28}+13.30\%}$
test_select_nested 76.5700μs 44.0685μs 22.6919 KOps/s 22.5566 KOps/s $\color{#35bf28}+0.60\%$
test_exclude_nested 97.7410μs 62.9466μs 15.8865 KOps/s 15.6145 KOps/s $\color{#35bf28}+1.74\%$
test_empty[True] 0.3461ms 0.2956ms 3.3825 KOps/s 3.3394 KOps/s $\color{#35bf28}+1.29\%$
test_empty[False] 3.5850μs 0.8296μs 1.2053 MOps/s 1.1948 MOps/s $\color{#35bf28}+0.89\%$
test_to 87.8710μs 56.7949μs 17.6072 KOps/s 17.3287 KOps/s $\color{#35bf28}+1.61\%$
test_to_nonblocking 88.8000μs 49.0856μs 20.3726 KOps/s 21.3146 KOps/s $\color{#d91a1a}-4.42\%$
test_unbind_speed 0.2959ms 0.2429ms 4.1175 KOps/s 4.1542 KOps/s $\color{#d91a1a}-0.88\%$
test_unbind_speed_stack0 0.2957ms 0.2398ms 4.1696 KOps/s 4.1333 KOps/s $\color{#35bf28}+0.88\%$
test_unbind_speed_stack1 93.5189ms 0.7354ms 1.3598 KOps/s 1.2282 KOps/s $\textbf{\color{#35bf28}+10.71\%}$
test_split 95.4502ms 1.5993ms 625.2800 Ops/s 627.6837 Ops/s $\color{#d91a1a}-0.38\%$
test_chunk 95.2710ms 1.5969ms 626.2009 Ops/s 626.6563 Ops/s $\color{#d91a1a}-0.07\%$
test_consolidate[False-None] 3.5966ms 2.7151ms 368.3078 Ops/s 364.5578 Ops/s $\color{#35bf28}+1.03\%$
test_consolidate[default-None] 2.1643ms 1.7163ms 582.6445 Ops/s 585.4959 Ops/s $\color{#d91a1a}-0.49\%$
test_consolidate[reduce-overhead-None] 1.8352ms 1.7360ms 576.0458 Ops/s 577.7619 Ops/s $\color{#d91a1a}-0.30\%$
test_consolidate_njt[False-None] 6.7483ms 6.4509ms 155.0173 Ops/s 155.5221 Ops/s $\color{#d91a1a}-0.32\%$
test_to[False-False-None] 1.7968ms 1.6892ms 591.9905 Ops/s 606.3551 Ops/s $\color{#d91a1a}-2.37\%$
test_to[True-False-None] 1.5396ms 1.3508ms 740.2953 Ops/s 762.9142 Ops/s $\color{#d91a1a}-2.96\%$
test_to[within-False-None] 4.4454ms 4.1174ms 242.8702 Ops/s 239.7858 Ops/s $\color{#35bf28}+1.29\%$
test_to[True-default-None] 5.4427ms 5.1987ms 192.3543 Ops/s 193.2269 Ops/s $\color{#d91a1a}-0.45\%$
test_to_njt[False-False-None] 7.0005ms 6.8783ms 145.3855 Ops/s 146.1084 Ops/s $\color{#d91a1a}-0.49\%$
test_to_njt[True-False-None] 5.6766ms 5.4454ms 183.6425 Ops/s 184.2486 Ops/s $\color{#d91a1a}-0.33\%$
test_to_njt[within-False-None] 12.4822ms 12.0902ms 82.7118 Ops/s 83.5055 Ops/s $\color{#d91a1a}-0.95\%$
test_creation[device0] 0.3607ms 81.9136μs 12.2080 KOps/s 12.3579 KOps/s $\color{#d91a1a}-1.21\%$
test_creation_from_tensor 0.6071ms 84.9544μs 11.7710 KOps/s 11.8649 KOps/s $\color{#d91a1a}-0.79\%$
test_add_one[memmap_tensor0] 0.4147ms 6.2198μs 160.7764 KOps/s 161.4665 KOps/s $\color{#d91a1a}-0.43\%$
test_contiguous[memmap_tensor0] 5.2901μs 0.4205μs 2.3781 MOps/s 2.3310 MOps/s $\color{#35bf28}+2.02\%$
test_stack[memmap_tensor0] 98.7810μs 4.4609μs 224.1683 KOps/s 233.0354 KOps/s $\color{#d91a1a}-3.81\%$
test_memmaptd_index 0.4494ms 0.2372ms 4.2162 KOps/s 4.1958 KOps/s $\color{#35bf28}+0.49\%$
test_memmaptd_index_astensor 0.4304ms 0.2978ms 3.3582 KOps/s 3.3206 KOps/s $\color{#35bf28}+1.13\%$
test_memmaptd_index_op 0.7793ms 0.5292ms 1.8895 KOps/s 1.7217 KOps/s $\textbf{\color{#35bf28}+9.75\%}$
test_serialize_model 0.1322s 0.1303s 7.6734 Ops/s 7.6257 Ops/s $\color{#35bf28}+0.63\%$
test_serialize_model_pickle 1.3515s 1.2136s 0.8240 Ops/s 0.8232 Ops/s $\color{#35bf28}+0.10\%$
test_serialize_weights 0.4069s 0.1695s 5.8998 Ops/s 5.5325 Ops/s $\textbf{\color{#35bf28}+6.64\%}$
test_serialize_weights_returnearly 0.3319s 54.7898ms 18.2516 Ops/s 23.1837 Ops/s $\textbf{\color{#d91a1a}-21.27\%}$
test_serialize_weights_pickle 1.3774s 1.2184s 0.8208 Ops/s 0.8191 Ops/s $\color{#35bf28}+0.20\%$
test_reshape_pytree 0.1041ms 21.8203μs 45.8290 KOps/s 44.7269 KOps/s $\color{#35bf28}+2.46\%$
test_reshape_td 0.4240ms 26.3344μs 37.9731 KOps/s 36.6777 KOps/s $\color{#35bf28}+3.53\%$
test_view_pytree 58.9200μs 21.7455μs 45.9866 KOps/s 45.4532 KOps/s $\color{#35bf28}+1.17\%$
test_view_td 0.1133ms 29.1749μs 34.2760 KOps/s 30.3140 KOps/s $\textbf{\color{#35bf28}+13.07\%}$
test_unbind_pytree 65.7410μs 27.4083μs 36.4854 KOps/s 36.3689 KOps/s $\color{#35bf28}+0.32\%$
test_unbind_td 0.7493ms 35.8784μs 27.8720 KOps/s 27.6181 KOps/s $\color{#35bf28}+0.92\%$
test_split_pytree 90.7810μs 29.9023μs 33.4422 KOps/s 32.9532 KOps/s $\color{#35bf28}+1.48\%$
test_split_td 0.9218ms 38.3120μs 26.1015 KOps/s 25.3836 KOps/s $\color{#35bf28}+2.83\%$
test_add_pytree 78.3610μs 32.2374μs 31.0198 KOps/s 30.9932 KOps/s $\color{#35bf28}+0.09\%$
test_add_td 0.1354ms 43.2456μs 23.1237 KOps/s 20.5867 KOps/s $\textbf{\color{#35bf28}+12.32\%}$
test_compile_add_one_nested[tensordict-compile] 0.1798ms 0.1253ms 7.9820 KOps/s 7.8026 KOps/s $\color{#35bf28}+2.30\%$
test_compile_add_one_nested[tensordict-eager] 0.2317ms 0.1336ms 7.4857 KOps/s 7.5877 KOps/s $\color{#d91a1a}-1.34\%$
test_compile_add_one_nested[pytree-compile] 0.2098ms 96.8371μs 10.3266 KOps/s 10.3178 KOps/s $\color{#35bf28}+0.09\%$
test_compile_add_one_nested[pytree-eager] 0.2519ms 0.1469ms 6.8089 KOps/s 6.9024 KOps/s $\color{#d91a1a}-1.35\%$
test_compile_copy_nested[tensordict-compile] 0.1266ms 22.6140μs 44.2203 KOps/s 44.4719 KOps/s $\color{#d91a1a}-0.57\%$
test_compile_copy_nested[tensordict-eager] 0.1224ms 29.7789μs 33.5808 KOps/s 33.1889 KOps/s $\color{#35bf28}+1.18\%$
test_compile_copy_nested[pytree-compile] 0.3256ms 63.4717μs 15.7550 KOps/s 15.3592 KOps/s $\color{#35bf28}+2.58\%$
test_compile_copy_nested[pytree-eager] 0.1268ms 48.8193μs 20.4837 KOps/s 19.9684 KOps/s $\color{#35bf28}+2.58\%$
test_compile_add_one_flat[tensordict-compile] 0.1836ms 0.1433ms 6.9764 KOps/s 7.0085 KOps/s $\color{#d91a1a}-0.46\%$
test_compile_add_one_flat[tensordict-eager] 0.3243ms 0.2184ms 4.5779 KOps/s 4.6800 KOps/s $\color{#d91a1a}-2.18\%$
test_compile_add_one_flat[tensorclass-compile] 0.2138ms 99.0959μs 10.0912 KOps/s 10.1351 KOps/s $\color{#d91a1a}-0.43\%$
test_compile_add_one_flat[tensorclass-eager] 0.1530ms 56.1303μs 17.8157 KOps/s 18.3797 KOps/s $\color{#d91a1a}-3.07\%$
test_compile_add_one_flat[pytree-compile] 0.2458ms 0.1360ms 7.3528 KOps/s 7.4282 KOps/s $\color{#d91a1a}-1.01\%$
test_compile_add_one_flat[pytree-eager] 0.6212ms 0.4661ms 2.1455 KOps/s 2.1413 KOps/s $\color{#35bf28}+0.20\%$
test_compile_add_self_flat[tensordict-eager] 0.6643ms 0.2617ms 3.8211 KOps/s 3.8680 KOps/s $\color{#d91a1a}-1.21\%$
test_compile_add_self_flat[tensordict-compile] 0.5739ms 0.1458ms 6.8568 KOps/s 6.9974 KOps/s $\color{#d91a1a}-2.01\%$
test_compile_add_self_flat[tensorclass-eager] 0.4877ms 69.7741μs 14.3320 KOps/s 14.8473 KOps/s $\color{#d91a1a}-3.47\%$
test_compile_add_self_flat[tensorclass-compile] 0.1512ms 0.1012ms 9.8825 KOps/s 10.1156 KOps/s $\color{#d91a1a}-2.30\%$
test_compile_add_self_flat[pytree-eager] 0.5357ms 0.4038ms 2.4762 KOps/s 2.4920 KOps/s $\color{#d91a1a}-0.63\%$
test_compile_add_self_flat[pytree-compile] 0.1999ms 0.1363ms 7.3375 KOps/s 7.5064 KOps/s $\color{#d91a1a}-2.25\%$
test_compile_copy_flat[tensordict-compile] 92.2410μs 17.4488μs 57.3107 KOps/s 57.8757 KOps/s $\color{#d91a1a}-0.98\%$
test_compile_copy_flat[tensordict-eager] 0.1301ms 33.0320μs 30.2737 KOps/s 30.4628 KOps/s $\color{#d91a1a}-0.62\%$
test_compile_copy_flat[pytree-compile] 0.2044ms 70.4585μs 14.1928 KOps/s 14.2220 KOps/s $\color{#d91a1a}-0.21\%$
test_compile_copy_flat[pytree-eager] 0.1336ms 52.4463μs 19.0671 KOps/s 19.4143 KOps/s $\color{#d91a1a}-1.79\%$
test_compile_assign_and_add[tensordict-compile] 1.6376ms 0.3934ms 2.5417 KOps/s 2.1398 KOps/s $\textbf{\color{#35bf28}+18.78\%}$
test_compile_assign_and_add[tensordict-eager] 2.9236ms 2.6497ms 377.3974 Ops/s 388.0128 Ops/s $\color{#d91a1a}-2.74\%$
test_compile_assign_and_add[pytree-compile] 1.6026ms 0.4342ms 2.3030 KOps/s 2.1970 KOps/s $\color{#35bf28}+4.83\%$
test_compile_assign_and_add[pytree-eager] 2.9309ms 2.8062ms 356.3568 Ops/s 357.6045 Ops/s $\color{#d91a1a}-0.35\%$
test_compile_indexing[tensor-tensordict-compile] 0.7532ms 0.1173ms 8.5276 KOps/s 8.3660 KOps/s $\color{#35bf28}+1.93\%$
test_compile_indexing[tensor-tensordict-eager] 0.5982ms 80.2971μs 12.4537 KOps/s 11.4366 KOps/s $\textbf{\color{#35bf28}+8.89\%}$
test_compile_indexing[tensor-tensorclass-compile] 0.7154ms 0.1092ms 9.1585 KOps/s 9.0614 KOps/s $\color{#35bf28}+1.07\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1134ms 67.3689μs 14.8436 KOps/s 14.3588 KOps/s $\color{#35bf28}+3.38\%$
test_compile_indexing[tensor-pytree-compile] 0.1763ms 0.1126ms 8.8772 KOps/s 9.4793 KOps/s $\textbf{\color{#d91a1a}-6.35\%}$
test_compile_indexing[tensor-pytree-eager] 0.2109ms 71.7854μs 13.9304 KOps/s 13.5326 KOps/s $\color{#35bf28}+2.94\%$
test_compile_indexing[slice-tensordict-compile] 0.1488ms 0.1019ms 9.8089 KOps/s 9.1398 KOps/s $\textbf{\color{#35bf28}+7.32\%}$
test_compile_indexing[slice-tensordict-eager] 0.1439ms 17.5984μs 56.8235 KOps/s 49.8601 KOps/s $\textbf{\color{#35bf28}+13.97\%}$
test_compile_indexing[slice-tensorclass-compile] 0.1732ms 97.4555μs 10.2611 KOps/s 9.5728 KOps/s $\textbf{\color{#35bf28}+7.19\%}$
test_compile_indexing[slice-tensorclass-eager] 47.8900μs 16.0068μs 62.4736 KOps/s 63.3868 KOps/s $\color{#d91a1a}-1.44\%$
test_compile_indexing[slice-pytree-compile] 0.1518ms 0.1006ms 9.9433 KOps/s 10.1310 KOps/s $\color{#d91a1a}-1.85\%$
test_compile_indexing[slice-pytree-eager] 51.5910μs 15.5241μs 64.4161 KOps/s 62.5365 KOps/s $\color{#35bf28}+3.01\%$
test_compile_indexing[int-tensordict-compile] 0.2103ms 0.1072ms 9.3244 KOps/s 8.9962 KOps/s $\color{#35bf28}+3.65\%$
test_compile_indexing[int-tensordict-eager] 0.5718ms 17.0666μs 58.5939 KOps/s 49.2407 KOps/s $\textbf{\color{#35bf28}+18.99\%}$
test_compile_indexing[int-tensorclass-compile] 0.1622ms 0.1014ms 9.8583 KOps/s 9.3432 KOps/s $\textbf{\color{#35bf28}+5.51\%}$
test_compile_indexing[int-tensorclass-eager] 54.8900μs 15.5868μs 64.1569 KOps/s 57.9995 KOps/s $\textbf{\color{#35bf28}+10.62\%}$
test_compile_indexing[int-pytree-compile] 0.1510ms 97.6531μs 10.2403 KOps/s 9.3809 KOps/s $\textbf{\color{#35bf28}+9.16\%}$
test_compile_indexing[int-pytree-eager] 66.6010μs 20.7239μs 48.2534 KOps/s 54.1669 KOps/s $\textbf{\color{#d91a1a}-10.92\%}$
test_mod_add[eager] 80.2610μs 36.4818μs 27.4110 KOps/s 24.3688 KOps/s $\textbf{\color{#35bf28}+12.48\%}$
test_mod_add[compile] 0.5366ms 81.0890μs 12.3321 KOps/s 12.2260 KOps/s $\color{#35bf28}+0.87\%$
test_mod_add[compile-overhead] 0.3347ms 0.1733ms 5.7706 KOps/s 5.5342 KOps/s $\color{#35bf28}+4.27\%$
test_mod_wrap[eager] 0.3268ms 0.2435ms 4.1069 KOps/s 3.9021 KOps/s $\textbf{\color{#35bf28}+5.25\%}$
test_mod_wrap[compile] 0.3437ms 0.2857ms 3.5002 KOps/s 3.4788 KOps/s $\color{#35bf28}+0.61\%$
test_mod_wrap[compile-overhead] 7.1900ms 3.7257ms 268.4083 Ops/s 274.3644 Ops/s $\color{#d91a1a}-2.17\%$
test_mod_wrap_and_backward[eager] 1.5758ms 1.4302ms 699.1787 Ops/s 692.8986 Ops/s $\color{#35bf28}+0.91\%$
test_mod_wrap_and_backward[compile] 1.4411ms 1.3364ms 748.2597 Ops/s 730.7943 Ops/s $\color{#35bf28}+2.39\%$
test_mod_wrap_and_backward[compile-overhead] 1.4185ms 0.9329ms 1.0720 KOps/s 968.5536 Ops/s $\textbf{\color{#35bf28}+10.68\%}$
test_seq_add[eager] 0.6074ms 0.1187ms 8.4226 KOps/s 8.2317 KOps/s $\color{#35bf28}+2.32\%$
test_seq_add[compile] 0.1941ms 92.2211μs 10.8435 KOps/s 10.6258 KOps/s $\color{#35bf28}+2.05\%$
test_seq_add[compile-overhead] 0.1826ms 0.1384ms 7.2240 KOps/s 7.6031 KOps/s $\color{#d91a1a}-4.99\%$
test_seq_wrap[eager] 0.4954ms 0.4243ms 2.3566 KOps/s 2.2030 KOps/s $\textbf{\color{#35bf28}+6.97\%}$
test_seq_wrap[compile] 0.3693ms 0.3009ms 3.3231 KOps/s 3.3013 KOps/s $\color{#35bf28}+0.66\%$
test_seq_wrap[compile-overhead] 0.2899ms 0.2277ms 4.3918 KOps/s 4.3570 KOps/s $\color{#35bf28}+0.80\%$
test_func_call_runtime[False-eager] 0.7732ms 0.7059ms 1.4166 KOps/s 1.2612 KOps/s $\textbf{\color{#35bf28}+12.32\%}$
test_func_call_runtime[False-compile] 0.8040ms 0.7313ms 1.3675 KOps/s 1.3186 KOps/s $\color{#35bf28}+3.71\%$
test_func_call_runtime[False-compile-overhead] 0.4187ms 0.3686ms 2.7128 KOps/s 2.6888 KOps/s $\color{#35bf28}+0.89\%$
test_func_call_runtime[True-eager] 0.9776ms 0.8788ms 1.1379 KOps/s 1.1149 KOps/s $\color{#35bf28}+2.06\%$
test_func_call_runtime[True-compile] 0.8148ms 0.7550ms 1.3245 KOps/s 1.3034 KOps/s $\color{#35bf28}+1.61\%$
test_func_call_runtime[True-compile-overhead] 0.4562ms 0.3907ms 2.5594 KOps/s 2.5576 KOps/s $\color{#35bf28}+0.07\%$
test_func_call_cm_runtime[False-eager] 0.7970ms 0.7016ms 1.4253 KOps/s 1.2571 KOps/s $\textbf{\color{#35bf28}+13.38\%}$
test_func_call_cm_runtime[False-compile] 0.8477ms 0.7336ms 1.3631 KOps/s 1.3246 KOps/s $\color{#35bf28}+2.91\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4224ms 0.3703ms 2.7006 KOps/s 2.6809 KOps/s $\color{#35bf28}+0.74\%$
test_func_call_cm_runtime[True-eager] 1.0741ms 0.9739ms 1.0268 KOps/s 936.3896 Ops/s $\textbf{\color{#35bf28}+9.66\%}$
test_func_call_cm_runtime[True-compile] 1.0817ms 0.9617ms 1.0398 KOps/s 1.0211 KOps/s $\color{#35bf28}+1.83\%$
test_func_call_cm_runtime[True-compile-overhead] 1.0411ms 0.9621ms 1.0394 KOps/s 1.0161 KOps/s $\color{#35bf28}+2.29\%$
test_vmap_func_call_cm_runtime[eager] 2.4322ms 2.0226ms 494.4221 Ops/s 484.9704 Ops/s $\color{#35bf28}+1.95\%$
test_vmap_func_call_cm_runtime[compile] 0.8591ms 0.7976ms 1.2537 KOps/s 1.2078 KOps/s $\color{#35bf28}+3.80\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4706ms 0.4230ms 2.3639 KOps/s 2.3456 KOps/s $\color{#35bf28}+0.78\%$
test_distributed 3.9814ms 0.1940ms 5.1545 KOps/s 8.5213 KOps/s $\textbf{\color{#d91a1a}-39.51\%}$
test_tdmodule 0.4462ms 19.3383μs 51.7107 KOps/s 49.0122 KOps/s $\textbf{\color{#35bf28}+5.51\%}$
test_tdmodule_dispatch 51.8400μs 32.5937μs 30.6808 KOps/s 24.6353 KOps/s $\textbf{\color{#35bf28}+24.54\%}$
test_tdseq 37.9900μs 19.6552μs 50.8772 KOps/s 42.5406 KOps/s $\textbf{\color{#35bf28}+19.60\%}$
test_tdseq_dispatch 70.5000μs 36.6718μs 27.2689 KOps/s 22.7332 KOps/s $\textbf{\color{#35bf28}+19.95\%}$
test_instantiation_functorch 1.7651ms 1.5805ms 632.7133 Ops/s 635.6256 Ops/s $\color{#d91a1a}-0.46\%$
test_exec_functorch 0.1933ms 0.1435ms 6.9679 KOps/s 6.6986 KOps/s $\color{#35bf28}+4.02\%$
test_exec_functional_call 0.1992ms 0.1362ms 7.3448 KOps/s 7.0965 KOps/s $\color{#35bf28}+3.50\%$
test_exec_td_decorator 0.3689ms 0.1836ms 5.4467 KOps/s 5.1444 KOps/s $\textbf{\color{#35bf28}+5.88\%}$
test_vmap_mlp_speed_decorator[True-True] 0.8284ms 0.6815ms 1.4673 KOps/s 1.4577 KOps/s $\color{#35bf28}+0.66\%$
test_vmap_mlp_speed_decorator[True-False] 0.7956ms 0.6662ms 1.5009 KOps/s 1.4817 KOps/s $\color{#35bf28}+1.30\%$
test_vmap_mlp_speed_decorator[False-True] 0.7246ms 0.6092ms 1.6415 KOps/s 1.7316 KOps/s $\textbf{\color{#d91a1a}-5.21\%}$
test_vmap_mlp_speed_decorator[False-False] 0.7184ms 0.6088ms 1.6426 KOps/s 1.7303 KOps/s $\textbf{\color{#d91a1a}-5.07\%}$
test_vmap_transformer_speed_decorator[True-True] 19.7670ms 18.7875ms 53.2269 Ops/s 53.9603 Ops/s $\color{#d91a1a}-1.36\%$
test_vmap_transformer_speed_decorator[True-False] 19.3546ms 18.7027ms 53.4683 Ops/s 54.0645 Ops/s $\color{#d91a1a}-1.10\%$
test_vmap_transformer_speed_decorator[False-True] 18.9089ms 18.5708ms 53.8480 Ops/s 54.5691 Ops/s $\color{#d91a1a}-1.32\%$
test_vmap_transformer_speed_decorator[False-False] 18.7033ms 18.5763ms 53.8320 Ops/s 54.5187 Ops/s $\color{#d91a1a}-1.26\%$
test_to_module_speed[True] 1.0435ms 0.9712ms 1.0296 KOps/s 1.0276 KOps/s $\color{#35bf28}+0.20\%$
test_to_module_speed[False] 1.0377ms 0.9617ms 1.0398 KOps/s 1.0397 KOps/s $+0.01\%$
test_tc_init 76.1800μs 34.4252μs 29.0485 KOps/s 26.4293 KOps/s $\textbf{\color{#35bf28}+9.91\%}$
test_tc_init_nested 0.1122ms 70.4584μs 14.1928 KOps/s 13.3059 KOps/s $\textbf{\color{#35bf28}+6.67\%}$
test_tc_first_layer_tensor 5.3971μs 0.6971μs 1.4345 MOps/s 1.4110 MOps/s $\color{#35bf28}+1.67\%$
test_tc_first_layer_nontensor 27.7800μs 2.2738μs 439.7862 KOps/s 442.0247 KOps/s $\color{#d91a1a}-0.51\%$
test_tc_second_layer_tensor 9.0400μs 1.4155μs 706.4657 KOps/s 705.7274 KOps/s $\color{#35bf28}+0.10\%$
test_tc_second_layer_nontensor 33.0500μs 3.0033μs 332.9698 KOps/s 331.4314 KOps/s $\color{#35bf28}+0.46\%$
test_unbind 7.3057ms 7.0006ms 142.8441 Ops/s 142.5800 Ops/s $\color{#35bf28}+0.19\%$
test_full_like 10.7435ms 9.2219ms 108.4376 Ops/s 106.6513 Ops/s $\color{#35bf28}+1.67\%$
test_zeros_like 4.9619ms 4.3343ms 230.7191 Ops/s 230.7546 Ops/s $\color{#d91a1a}-0.02\%$
test_ones_like 4.4559ms 4.3390ms 230.4679 Ops/s 230.5699 Ops/s $\color{#d91a1a}-0.04\%$
test_clone 11.9361ms 9.2295ms 108.3482 Ops/s 156.6431 Ops/s $\textbf{\color{#d91a1a}-30.83\%}$
test_squeeze 85.1810μs 10.0462μs 99.5399 KOps/s 104.0000 KOps/s $\color{#d91a1a}-4.29\%$
test_unsqueeze 0.1273ms 75.7221μs 13.2062 KOps/s 13.9445 KOps/s $\textbf{\color{#d91a1a}-5.29\%}$
test_split 0.2962ms 0.1602ms 6.2424 KOps/s 6.3516 KOps/s $\color{#d91a1a}-1.72\%$
test_permute 0.7284ms 0.1812ms 5.5199 KOps/s 5.7894 KOps/s $\color{#d91a1a}-4.65\%$
test_stack 50.9250ms 50.6804ms 19.7315 Ops/s 19.8739 Ops/s $\color{#d91a1a}-0.72\%$
test_cat 50.9058ms 50.6363ms 19.7487 Ops/s 19.8963 Ops/s $\color{#d91a1a}-0.74\%$
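
The Change column appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. The sketch below assumes that reading and checks it against a few rows from the table. The helper names are hypothetical and not part of the benchmark bot. The 5% cutoff for bold entries is an inference from the rows (for example, +5.25% is bold while +4.83% is not), not something the bot states.

```python
# Hypothetical helpers for reading the table; not the benchmark bot's code.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Convert a cell such as '1.0720 KOps/s' to plain operations per second."""
    value, unit = text.split()
    return float(value) * UNIT_SCALE[unit]


def percent_change(ops_pr: str, ops_head: str) -> float:
    """Relative change of the PR throughput versus repo HEAD, in percent."""
    return (parse_ops(ops_pr) / parse_ops(ops_head) - 1.0) * 100.0


def is_flagged(change: float, threshold: float = 5.0) -> bool:
    """Assumed rule: a row is shown in bold when |change| >= threshold."""
    return abs(change) >= threshold


# (name, Ops, Ops on Repo HEAD) copied from rows of the table above.
rows = [
    ("test_plain_set_nested", "89.1051 KOps/s", "76.0996 KOps/s"),
    ("test_distributed", "5.1545 KOps/s", "8.5213 KOps/s"),
    ("test_mod_wrap_and_backward[compile-overhead]", "1.0720 KOps/s", "968.5536 Ops/s"),
    ("test_keys", "285.1350 KOps/s", "285.0917 KOps/s"),
]

for name, ops_pr, ops_head in rows:
    change = percent_change(ops_pr, ops_head)
    marker = "flagged" if is_flagged(change) else "within noise threshold"
    print(f"{name}: {change:+.2f}% ({marker})")
```

The third row mixes KOps/s and Ops/s, so both values are normalized to the same unit before taking the ratio.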
