Collaborator

@vmoens vmoens commented Oct 8, 2024

[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Oct 8, 2024
ghstack-source-id: 37652e7
Pull Request resolved: #1034
@facebook-github-bot facebook-github-bot added the CLA Signed label Oct 8, 2024
[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Oct 8, 2024
ghstack-source-id: 05cd544
Pull Request resolved: #1034
[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Oct 8, 2024
ghstack-source-id: af94039
Pull Request resolved: #1034
[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Oct 8, 2024
ghstack-source-id: ebc9c94
Pull Request resolved: #1034
[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Oct 8, 2024
ghstack-source-id: 81c741e
Pull Request resolved: #1034
[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Oct 8, 2024
ghstack-source-id: fc3439b
Pull Request resolved: #1034
[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Oct 8, 2024
ghstack-source-id: 817cf31
Pull Request resolved: #1034

github-actions bot commented Oct 8, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 218. Improved: $\large\color{#35bf28}17$. Worsened: $\large\color{#d91a1a}9$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_plain_set_nested 39.6910μs 16.7877μs 59.5675 KOps/s 60.8527 KOps/s $\color{#d91a1a}-2.11\%$
test_plain_set_stack_nested 54.8200μs 16.8615μs 59.3068 KOps/s 60.0456 KOps/s $\color{#d91a1a}-1.23\%$
test_plain_set_nested_inplace 0.1577ms 18.1021μs 55.2423 KOps/s 56.8466 KOps/s $\color{#d91a1a}-2.82\%$
test_plain_set_stack_nested_inplace 55.3010μs 18.0884μs 55.2840 KOps/s 56.5874 KOps/s $\color{#d91a1a}-2.30\%$
test_items 27.3500μs 2.8866μs 346.4255 KOps/s 341.6969 KOps/s $\color{#35bf28}+1.38\%$
test_items_nested 0.3971ms 0.3405ms 2.9372 KOps/s 2.8719 KOps/s $\color{#35bf28}+2.27\%$
test_items_nested_locked 0.4057ms 0.3447ms 2.9011 KOps/s 2.8434 KOps/s $\color{#35bf28}+2.03\%$
test_items_nested_leaf 97.9610μs 62.7772μs 15.9293 KOps/s 15.7661 KOps/s $\color{#35bf28}+1.04\%$
test_items_stack_nested 0.4035ms 0.3455ms 2.8942 KOps/s 2.8715 KOps/s $\color{#35bf28}+0.79\%$
test_items_stack_nested_leaf 97.2710μs 64.6526μs 15.4673 KOps/s 15.2943 KOps/s $\color{#35bf28}+1.13\%$
test_items_stack_nested_locked 0.4037ms 0.3451ms 2.8979 KOps/s 2.8374 KOps/s $\color{#35bf28}+2.13\%$
test_keys 31.3000μs 3.4559μs 289.3592 KOps/s 286.4553 KOps/s $\color{#35bf28}+1.01\%$
test_keys_nested 0.1024ms 71.6294μs 13.9607 KOps/s 13.7959 KOps/s $\color{#35bf28}+1.19\%$
test_keys_nested_locked 2.8082ms 77.3051μs 12.9358 KOps/s 12.7332 KOps/s $\color{#35bf28}+1.59\%$
test_keys_nested_leaf 0.2094ms 62.0729μs 16.1101 KOps/s 15.8101 KOps/s $\color{#35bf28}+1.90\%$
test_keys_stack_nested 0.1006ms 70.4771μs 14.1890 KOps/s 13.7162 KOps/s $\color{#35bf28}+3.45\%$
test_keys_stack_nested_leaf 95.1510μs 62.7398μs 15.9388 KOps/s 15.4902 KOps/s $\color{#35bf28}+2.90\%$
test_keys_stack_nested_locked 0.3485ms 76.9485μs 12.9957 KOps/s 12.6792 KOps/s $\color{#35bf28}+2.50\%$
test_values 5.4733μs 0.8387μs 1.1923 MOps/s 1.1783 MOps/s $\color{#35bf28}+1.18\%$
test_values_nested 98.4910μs 49.7683μs 20.0931 KOps/s 20.1560 KOps/s $\color{#d91a1a}-0.31\%$
test_values_nested_locked 87.4510μs 51.2689μs 19.5050 KOps/s 19.4419 KOps/s $\color{#35bf28}+0.32\%$
test_values_nested_leaf 77.8410μs 43.8854μs 22.7866 KOps/s 22.7689 KOps/s $\color{#35bf28}+0.08\%$
test_values_stack_nested 81.1800μs 50.9202μs 19.6386 KOps/s 19.4595 KOps/s $\color{#35bf28}+0.92\%$
test_values_stack_nested_leaf 69.4600μs 44.1915μs 22.6288 KOps/s 22.3633 KOps/s $\color{#35bf28}+1.19\%$
test_values_stack_nested_locked 84.1710μs 52.2211μs 19.1493 KOps/s 19.0475 KOps/s $\color{#35bf28}+0.53\%$
test_membership 2.6060μs 0.5327μs 1.8773 MOps/s 1.8664 MOps/s $\color{#35bf28}+0.58\%$
test_membership_nested 96.2360μs 1.9218μs 520.3382 KOps/s 505.9539 KOps/s $\color{#35bf28}+2.84\%$
test_membership_nested_leaf 98.4810μs 1.9113μs 523.2077 KOps/s 508.0035 KOps/s $\color{#35bf28}+2.99\%$
test_membership_stacked_nested 33.0000μs 1.9661μs 508.6129 KOps/s 480.3579 KOps/s $\textbf{\color{#35bf28}+5.88\%}$
test_membership_stacked_nested_leaf 42.3100μs 1.9812μs 504.7380 KOps/s 484.8572 KOps/s $\color{#35bf28}+4.10\%$
test_membership_nested_last 33.8400μs 3.0670μs 326.0515 KOps/s 315.3593 KOps/s $\color{#35bf28}+3.39\%$
test_membership_nested_leaf_last 39.9710μs 3.1188μs 320.6368 KOps/s 315.9096 KOps/s $\color{#35bf28}+1.50\%$
test_membership_stacked_nested_last 0.1638ms 8.3978μs 119.0791 KOps/s 234.8492 KOps/s $\textbf{\color{#d91a1a}-49.30\%}$
test_membership_stacked_nested_leaf_last 34.6900μs 8.4088μs 118.9233 KOps/s 235.1299 KOps/s $\textbf{\color{#d91a1a}-49.42\%}$
test_nested_getleaf 28.9200μs 6.1665μs 162.1660 KOps/s 159.9920 KOps/s $\color{#35bf28}+1.36\%$
test_nested_get 0.1991ms 5.8970μs 169.5781 KOps/s 169.0383 KOps/s $\color{#35bf28}+0.32\%$
test_stacked_getleaf 0.1824ms 6.2603μs 159.7380 KOps/s 161.3983 KOps/s $\color{#d91a1a}-1.03\%$
test_stacked_get 34.8210μs 5.8928μs 169.6995 KOps/s 172.1933 KOps/s $\color{#d91a1a}-1.45\%$
test_nested_getitemleaf 25.8000μs 6.3075μs 158.5423 KOps/s 159.1990 KOps/s $\color{#d91a1a}-0.41\%$
test_nested_getitem 29.3310μs 5.9749μs 167.3661 KOps/s 169.0944 KOps/s $\color{#d91a1a}-1.02\%$
test_stacked_getitemleaf 27.0800μs 6.3199μs 158.2300 KOps/s 158.2829 KOps/s $\color{#d91a1a}-0.03\%$
test_stacked_getitem 30.4410μs 5.9407μs 168.3294 KOps/s 171.7870 KOps/s $\color{#d91a1a}-2.01\%$
test_lock_nested 6.6288ms 0.4371ms 2.2879 KOps/s 2.2880 KOps/s $-0.01\%$
test_lock_stack_nested 0.6017ms 0.3851ms 2.5966 KOps/s 2.5157 KOps/s $\color{#35bf28}+3.21\%$
test_unlock_nested 0.8499ms 0.3708ms 2.6970 KOps/s 2.6777 KOps/s $\color{#35bf28}+0.72\%$
test_unlock_stack_nested 0.4338ms 0.3217ms 3.1087 KOps/s 2.9873 KOps/s $\color{#35bf28}+4.06\%$
test_flatten_speed 0.1581ms 77.6010μs 12.8864 KOps/s 12.9797 KOps/s $\color{#d91a1a}-0.72\%$
test_unflatten_speed 0.4030ms 0.3275ms 3.0532 KOps/s 3.0227 KOps/s $\color{#35bf28}+1.01\%$
test_common_ops 1.5619ms 1.2571ms 795.5038 Ops/s 801.7889 Ops/s $\color{#d91a1a}-0.78\%$
test_creation 25.1300μs 1.5255μs 655.5250 KOps/s 654.3697 KOps/s $\color{#35bf28}+0.18\%$
test_creation_empty 36.9900μs 14.8511μs 67.3353 KOps/s 71.2384 KOps/s $\textbf{\color{#d91a1a}-5.48\%}$
test_creation_nested_1 0.1078ms 16.5075μs 60.5785 KOps/s 63.0093 KOps/s $\color{#d91a1a}-3.86\%$
test_creation_nested_2 61.0210μs 19.3627μs 51.6458 KOps/s 53.9435 KOps/s $\color{#d91a1a}-4.26\%$
test_clone 0.2207ms 28.8207μs 34.6973 KOps/s 34.7162 KOps/s $\color{#d91a1a}-0.05\%$
test_getitem[int] 1.2840ms 16.5252μs 60.5138 KOps/s 60.1196 KOps/s $\color{#35bf28}+0.66\%$
test_getitem[slice_int] 0.1211ms 28.1439μs 35.5317 KOps/s 35.0829 KOps/s $\color{#35bf28}+1.28\%$
test_getitem[range] 0.2235ms 0.1134ms 8.8147 KOps/s 8.8344 KOps/s $\color{#d91a1a}-0.22\%$
test_getitem[tuple] 0.1226ms 24.3754μs 41.0249 KOps/s 39.6797 KOps/s $\color{#35bf28}+3.39\%$
test_getitem[list] 0.2562ms 0.1017ms 9.8353 KOps/s 9.2523 KOps/s $\textbf{\color{#35bf28}+6.30\%}$
test_setitem_dim[int] 78.3910μs 45.3539μs 22.0488 KOps/s 20.3209 KOps/s $\textbf{\color{#35bf28}+8.50\%}$
test_setitem_dim[slice_int] 0.1040ms 67.9526μs 14.7161 KOps/s 14.6675 KOps/s $\color{#35bf28}+0.33\%$
test_setitem_dim[range] 0.2865ms 0.1311ms 7.6288 KOps/s 7.6294 KOps/s $-0.01\%$
test_setitem_dim[tuple] 0.2035ms 61.5182μs 16.2553 KOps/s 16.2922 KOps/s $\color{#d91a1a}-0.23\%$
test_setitem 0.1942ms 41.5651μs 24.0586 KOps/s 24.1845 KOps/s $\color{#d91a1a}-0.52\%$
test_set 0.1908ms 40.6739μs 24.5858 KOps/s 22.3543 KOps/s $\textbf{\color{#35bf28}+9.98\%}$
test_set_shared 0.3517ms 54.5592μs 18.3287 KOps/s 16.4226 KOps/s $\textbf{\color{#35bf28}+11.61\%}$
test_update 0.1957ms 49.2290μs 20.3132 KOps/s 18.4238 KOps/s $\textbf{\color{#35bf28}+10.26\%}$
test_update_nested 0.1961ms 56.4193μs 17.7244 KOps/s 16.3088 KOps/s $\textbf{\color{#35bf28}+8.68\%}$
test_update__nested 0.1565ms 61.1145μs 16.3627 KOps/s 16.2241 KOps/s $\color{#35bf28}+0.85\%$
test_set_nested 0.1902ms 42.6455μs 23.4491 KOps/s 23.4632 KOps/s $\color{#d91a1a}-0.06\%$
test_set_nested_new 0.1902ms 46.9815μs 21.2850 KOps/s 21.7026 KOps/s $\color{#d91a1a}-1.92\%$
test_select 0.2174ms 62.4476μs 16.0134 KOps/s 16.6699 KOps/s $\color{#d91a1a}-3.94\%$
test_select_nested 79.2310μs 42.7740μs 23.3787 KOps/s 23.2528 KOps/s $\color{#35bf28}+0.54\%$
test_exclude_nested 93.3510μs 59.1409μs 16.9088 KOps/s 16.5373 KOps/s $\color{#35bf28}+2.25\%$
test_empty[True] 0.3501ms 0.2623ms 3.8120 KOps/s 3.7766 KOps/s $\color{#35bf28}+0.94\%$
test_empty[False] 4.0641μs 0.7739μs 1.2922 MOps/s 1.3141 MOps/s $\color{#d91a1a}-1.66\%$
test_to 74.0410μs 27.1530μs 36.8283 KOps/s 36.3978 KOps/s $\color{#35bf28}+1.18\%$
test_to_nonblocking 63.8810μs 26.4330μs 37.8315 KOps/s 38.1910 KOps/s $\color{#d91a1a}-0.94\%$
test_unbind_speed 1.0548ms 0.2828ms 3.5364 KOps/s 3.2540 KOps/s $\textbf{\color{#35bf28}+8.68\%}$
test_unbind_speed_stack0 0.4056ms 0.2736ms 3.6544 KOps/s 3.3065 KOps/s $\textbf{\color{#35bf28}+10.52\%}$
test_unbind_speed_stack1 92.7001ms 0.7038ms 1.4208 KOps/s 1.4052 KOps/s $\color{#35bf28}+1.11\%$
test_split 93.6851ms 2.2179ms 450.8789 Ops/s 449.1140 Ops/s $\color{#35bf28}+0.39\%$
test_chunk 95.5789ms 2.2240ms 449.6333 Ops/s 445.1170 Ops/s $\color{#35bf28}+1.01\%$
test_creation[device0] 0.2964ms 0.1291ms 7.7464 KOps/s 7.7992 KOps/s $\color{#d91a1a}-0.68\%$
test_creation_from_tensor 0.4761ms 0.1329ms 7.5244 KOps/s 7.6684 KOps/s $\color{#d91a1a}-1.88\%$
test_add_one[memmap_tensor0] 0.2336ms 8.8309μs 113.2392 KOps/s 111.5256 KOps/s $\color{#35bf28}+1.54\%$
test_contiguous[memmap_tensor0] 30.6500μs 2.1866μs 457.3228 KOps/s 458.2266 KOps/s $\color{#d91a1a}-0.20\%$
test_stack[memmap_tensor0] 33.3300μs 6.7759μs 147.5819 KOps/s 142.2508 KOps/s $\color{#35bf28}+3.75\%$
test_memmaptd_index 1.1633ms 0.4398ms 2.2739 KOps/s 2.2839 KOps/s $\color{#d91a1a}-0.44\%$
test_memmaptd_index_astensor 0.9116ms 0.5107ms 1.9582 KOps/s 1.9472 KOps/s $\color{#35bf28}+0.56\%$
test_memmaptd_index_op 1.4340ms 1.0529ms 949.7564 Ops/s 958.4298 Ops/s $\color{#d91a1a}-0.90\%$
test_serialize_model 0.1309s 0.1298s 7.7050 Ops/s 7.6783 Ops/s $\color{#35bf28}+0.35\%$
test_serialize_model_pickle 1.3656s 1.2146s 0.8233 Ops/s 0.8233 Ops/s $+0.00\%$
test_serialize_weights 0.1310s 0.1295s 7.7237 Ops/s 7.7118 Ops/s $\color{#35bf28}+0.15\%$
test_serialize_weights_returnearly 58.1460ms 46.2711ms 21.6117 Ops/s 15.4305 Ops/s $\textbf{\color{#35bf28}+40.06\%}$
test_serialize_weights_pickle 1.3569s 1.2186s 0.8206 Ops/s 0.8212 Ops/s $\color{#d91a1a}-0.07\%$
test_reshape_pytree 0.4165ms 35.8551μs 27.8900 KOps/s 27.9343 KOps/s $\color{#d91a1a}-0.16\%$
test_reshape_td 0.1824ms 43.4530μs 23.0134 KOps/s 23.7852 KOps/s $\color{#d91a1a}-3.25\%$
test_view_pytree 0.1798ms 35.9456μs 27.8198 KOps/s 28.1057 KOps/s $\color{#d91a1a}-1.02\%$
test_view_td 0.4425ms 50.5685μs 19.7751 KOps/s 20.7355 KOps/s $\color{#d91a1a}-4.63\%$
test_unbind_pytree 0.1656ms 34.6438μs 28.8652 KOps/s 28.8000 KOps/s $\color{#35bf28}+0.23\%$
test_unbind_td 0.5139ms 43.9299μs 22.7635 KOps/s 22.5831 KOps/s $\color{#35bf28}+0.80\%$
test_split_pytree 0.5784ms 47.3325μs 21.1271 KOps/s 21.2336 KOps/s $\color{#d91a1a}-0.50\%$
test_split_td 0.1889ms 56.5422μs 17.6859 KOps/s 17.3373 KOps/s $\color{#35bf28}+2.01\%$
test_add_pytree 0.2069ms 57.3104μs 17.4488 KOps/s 17.4844 KOps/s $\color{#d91a1a}-0.20\%$
test_add_td 0.1498ms 94.4972μs 10.5823 KOps/s 10.9292 KOps/s $\color{#d91a1a}-3.17\%$
test_compile_add_one_nested[tensordict-compile] 0.2894ms 0.1601ms 6.2458 KOps/s 6.1361 KOps/s $\color{#35bf28}+1.79\%$
test_compile_add_one_nested[tensordict-eager] 0.3352ms 0.1644ms 6.0818 KOps/s 6.0311 KOps/s $\color{#35bf28}+0.84\%$
test_compile_add_one_nested[pytree-compile] 0.2785ms 0.1450ms 6.8968 KOps/s 6.8234 KOps/s $\color{#35bf28}+1.08\%$
test_compile_add_one_nested[pytree-eager] 0.3368ms 0.1844ms 5.4243 KOps/s 5.4640 KOps/s $\color{#d91a1a}-0.73\%$
test_compile_copy_nested[tensordict-compile] 0.1707ms 22.2510μs 44.9418 KOps/s 45.3471 KOps/s $\color{#d91a1a}-0.89\%$
test_compile_copy_nested[tensordict-eager] 92.8410μs 50.7323μs 19.7113 KOps/s 20.3924 KOps/s $\color{#d91a1a}-3.34\%$
test_compile_copy_nested[pytree-compile] 0.2035ms 65.7354μs 15.2125 KOps/s 15.1362 KOps/s $\color{#35bf28}+0.50\%$
test_compile_copy_nested[pytree-eager] 91.7610μs 50.6040μs 19.7613 KOps/s 20.1322 KOps/s $\color{#d91a1a}-1.84\%$
test_compile_add_one_flat[tensordict-compile] 0.4711ms 0.3206ms 3.1187 KOps/s 3.0906 KOps/s $\color{#35bf28}+0.91\%$
test_compile_add_one_flat[tensordict-eager] 0.3664ms 0.2346ms 4.2631 KOps/s 4.0818 KOps/s $\color{#35bf28}+4.44\%$
test_compile_add_one_flat[tensorclass-compile] 0.2836ms 0.1273ms 7.8559 KOps/s 7.7389 KOps/s $\color{#35bf28}+1.51\%$
test_compile_add_one_flat[tensorclass-eager] 0.2214ms 66.5680μs 15.0222 KOps/s 14.5843 KOps/s $\color{#35bf28}+3.00\%$
test_compile_add_one_flat[pytree-compile] 0.4195ms 0.3175ms 3.1499 KOps/s 3.1278 KOps/s $\color{#35bf28}+0.71\%$
test_compile_add_one_flat[pytree-eager] 0.7954ms 0.6224ms 1.6067 KOps/s 1.6163 KOps/s $\color{#d91a1a}-0.60\%$
test_compile_add_self_flat[tensordict-eager] 0.4290ms 0.2859ms 3.4983 KOps/s 3.3488 KOps/s $\color{#35bf28}+4.47\%$
test_compile_add_self_flat[tensordict-compile] 0.3842ms 0.3227ms 3.0991 KOps/s 3.0731 KOps/s $\color{#35bf28}+0.84\%$
test_compile_add_self_flat[tensorclass-eager] 0.2272ms 80.4748μs 12.4262 KOps/s 12.4211 KOps/s $\color{#35bf28}+0.04\%$
test_compile_add_self_flat[tensorclass-compile] 0.2425ms 0.1284ms 7.7894 KOps/s 7.6218 KOps/s $\color{#35bf28}+2.20\%$
test_compile_add_self_flat[pytree-eager] 0.6801ms 0.5279ms 1.8943 KOps/s 1.8944 KOps/s $-0.01\%$
test_compile_add_self_flat[pytree-compile] 0.3780ms 0.3194ms 3.1305 KOps/s 3.1340 KOps/s $\color{#d91a1a}-0.11\%$
test_compile_copy_flat[tensordict-compile] 0.1587ms 19.3463μs 51.6893 KOps/s 49.5386 KOps/s $\color{#35bf28}+4.34\%$
test_compile_copy_flat[tensordict-eager] 87.7510μs 38.9450μs 25.6772 KOps/s 25.7810 KOps/s $\color{#d91a1a}-0.40\%$
test_compile_copy_flat[pytree-compile] 0.1113ms 70.6284μs 14.1586 KOps/s 14.0823 KOps/s $\color{#35bf28}+0.54\%$
test_compile_copy_flat[pytree-eager] 84.6800μs 51.6507μs 19.3608 KOps/s 19.3505 KOps/s $\color{#35bf28}+0.05\%$
test_compile_assign_and_add[tensordict-compile] 2.3818ms 0.7866ms 1.2713 KOps/s 1.0883 KOps/s $\textbf{\color{#35bf28}+16.81\%}$
test_compile_assign_and_add[tensordict-eager] 3.5076ms 3.3240ms 300.8428 Ops/s 299.9918 Ops/s $\color{#35bf28}+0.28\%$
test_compile_assign_and_add[pytree-compile] 2.3485ms 0.8274ms 1.2086 KOps/s 1.1249 KOps/s $\textbf{\color{#35bf28}+7.44\%}$
test_compile_assign_and_add[pytree-eager] 3.4031ms 3.2285ms 309.7373 Ops/s 304.9373 Ops/s $\color{#35bf28}+1.57\%$
test_compile_indexing[tensor-tensordict-compile] 0.2563ms 0.1090ms 9.1722 KOps/s 9.1756 KOps/s $\color{#d91a1a}-0.04\%$
test_compile_indexing[tensor-tensordict-eager] 0.2031ms 62.5606μs 15.9845 KOps/s 15.7751 KOps/s $\color{#35bf28}+1.33\%$
test_compile_indexing[tensor-tensorclass-compile] 0.2288ms 0.1032ms 9.6903 KOps/s 9.6590 KOps/s $\color{#35bf28}+0.32\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2069ms 44.2172μs 22.6156 KOps/s 22.2062 KOps/s $\color{#35bf28}+1.84\%$
test_compile_indexing[tensor-pytree-compile] 0.2544ms 0.1042ms 9.5934 KOps/s 9.5702 KOps/s $\color{#35bf28}+0.24\%$
test_compile_indexing[tensor-pytree-eager] 0.2082ms 43.7400μs 22.8624 KOps/s 22.3201 KOps/s $\color{#35bf28}+2.43\%$
test_compile_indexing[slice-tensordict-compile] 0.2891ms 0.1393ms 7.1770 KOps/s 7.1863 KOps/s $\color{#d91a1a}-0.13\%$
test_compile_indexing[slice-tensordict-eager] 0.1641ms 26.2850μs 38.0446 KOps/s 38.1219 KOps/s $\color{#d91a1a}-0.20\%$
test_compile_indexing[slice-tensorclass-compile] 0.2815ms 0.1328ms 7.5283 KOps/s 7.2015 KOps/s $\color{#35bf28}+4.54\%$
test_compile_indexing[slice-tensorclass-eager] 0.2020ms 21.0827μs 47.4322 KOps/s 45.2839 KOps/s $\color{#35bf28}+4.74\%$
test_compile_indexing[slice-pytree-compile] 0.3895ms 0.1338ms 7.4750 KOps/s 7.4679 KOps/s $\color{#35bf28}+0.09\%$
test_compile_indexing[slice-pytree-eager] 54.1100μs 21.2125μs 47.1420 KOps/s 45.6568 KOps/s $\color{#35bf28}+3.25\%$
test_compile_indexing[int-tensordict-compile] 0.4199ms 0.1399ms 7.1466 KOps/s 7.1385 KOps/s $\color{#35bf28}+0.11\%$
test_compile_indexing[int-tensordict-eager] 0.4579ms 25.8797μs 38.6403 KOps/s 38.6834 KOps/s $\color{#d91a1a}-0.11\%$
test_compile_indexing[int-tensorclass-compile] 0.2713ms 0.1339ms 7.4660 KOps/s 7.4736 KOps/s $\color{#d91a1a}-0.10\%$
test_compile_indexing[int-tensorclass-eager] 61.3610μs 21.4719μs 46.5725 KOps/s 46.4699 KOps/s $\color{#35bf28}+0.22\%$
test_compile_indexing[int-pytree-compile] 0.2669ms 0.1336ms 7.4839 KOps/s 7.5084 KOps/s $\color{#d91a1a}-0.33\%$
test_compile_indexing[int-pytree-eager] 62.5310μs 21.4524μs 46.6149 KOps/s 47.0957 KOps/s $\color{#d91a1a}-1.02\%$
test_mod_add[eager] 0.1750ms 32.2129μs 31.0435 KOps/s 31.8397 KOps/s $\color{#d91a1a}-2.50\%$
test_mod_add[compile] 0.1990ms 71.3446μs 14.0165 KOps/s 12.8607 KOps/s $\textbf{\color{#35bf28}+8.99\%}$
test_mod_add[compile-overhead] 0.2597ms 0.1352ms 7.3980 KOps/s 6.9943 KOps/s $\textbf{\color{#35bf28}+5.77\%}$
test_mod_wrap[eager] 0.3975ms 0.2441ms 4.0963 KOps/s 4.0759 KOps/s $\color{#35bf28}+0.50\%$
test_mod_wrap[compile] 1.3879ms 0.3094ms 3.2324 KOps/s 3.1469 KOps/s $\color{#35bf28}+2.72\%$
test_mod_wrap[compile-overhead] 7.3391ms 3.9008ms 256.3586 Ops/s 246.2656 Ops/s $\color{#35bf28}+4.10\%$
test_mod_wrap_and_backward[eager] 1.6261ms 1.3669ms 731.5702 Ops/s 683.2319 Ops/s $\textbf{\color{#35bf28}+7.07\%}$
test_mod_wrap_and_backward[compile] 1.7885ms 1.3454ms 743.2925 Ops/s 679.1767 Ops/s $\textbf{\color{#35bf28}+9.44\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3961ms 0.9403ms 1.0635 KOps/s 1.0699 KOps/s $\color{#d91a1a}-0.60\%$
test_seq_add[eager] 0.2468ms 98.9535μs 10.1058 KOps/s 10.1764 KOps/s $\color{#d91a1a}-0.69\%$
test_seq_add[compile] 0.2329ms 82.4870μs 12.1231 KOps/s 11.9628 KOps/s $\color{#35bf28}+1.34\%$
test_seq_add[compile-overhead] 0.2207ms 0.1147ms 8.7177 KOps/s 8.5881 KOps/s $\color{#35bf28}+1.51\%$
test_seq_wrap[eager] 0.5201ms 0.3859ms 2.5911 KOps/s 2.5542 KOps/s $\color{#35bf28}+1.44\%$
test_seq_wrap[compile] 0.4640ms 0.3193ms 3.1322 KOps/s 2.9302 KOps/s $\textbf{\color{#35bf28}+6.89\%}$
test_seq_wrap[compile-overhead] 0.3698ms 0.2221ms 4.5015 KOps/s 4.4493 KOps/s $\color{#35bf28}+1.17\%$
test_func_call_runtime[False-eager] 0.9055ms 0.7571ms 1.3209 KOps/s 1.3113 KOps/s $\color{#35bf28}+0.73\%$
test_func_call_runtime[False-compile] 0.9898ms 0.7987ms 1.2520 KOps/s 1.2345 KOps/s $\color{#35bf28}+1.41\%$
test_func_call_runtime[False-compile-overhead] 0.5337ms 0.3644ms 2.7443 KOps/s 2.7533 KOps/s $\color{#d91a1a}-0.33\%$
test_func_call_runtime[True-eager] 1.0352ms 0.9185ms 1.0887 KOps/s 1.0856 KOps/s $\color{#35bf28}+0.29\%$
test_func_call_runtime[True-compile] 0.9735ms 0.8210ms 1.2180 KOps/s 1.2048 KOps/s $\color{#35bf28}+1.09\%$
test_func_call_runtime[True-compile-overhead] 0.5317ms 0.3863ms 2.5885 KOps/s 2.6101 KOps/s $\color{#d91a1a}-0.83\%$
test_func_call_cm_runtime[False-eager] 0.8926ms 0.7507ms 1.3321 KOps/s 1.3274 KOps/s $\color{#35bf28}+0.35\%$
test_func_call_cm_runtime[False-compile] 0.9658ms 0.8018ms 1.2472 KOps/s 1.2257 KOps/s $\color{#35bf28}+1.75\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5207ms 0.3674ms 2.7221 KOps/s 2.7272 KOps/s $\color{#d91a1a}-0.19\%$
test_func_call_cm_runtime[True-eager] 1.2129ms 1.0281ms 972.6653 Ops/s 964.2272 Ops/s $\color{#35bf28}+0.88\%$
test_func_call_cm_runtime[True-compile] 1.0164ms 0.8465ms 1.1814 KOps/s 1.1587 KOps/s $\color{#35bf28}+1.96\%$
test_func_call_cm_runtime[True-compile-overhead] 0.4535ms 0.4073ms 2.4554 KOps/s 2.4389 KOps/s $\color{#35bf28}+0.68\%$
test_vmap_func_call_cm_runtime[eager] 2.5789ms 2.1039ms 475.3119 Ops/s 471.2238 Ops/s $\color{#35bf28}+0.87\%$
test_vmap_func_call_cm_runtime[compile] 1.0205ms 0.8697ms 1.1499 KOps/s 1.1394 KOps/s $\color{#35bf28}+0.92\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5538ms 0.4116ms 2.4297 KOps/s 2.4181 KOps/s $\color{#35bf28}+0.48\%$
test_distributed 2.6740ms 0.1712ms 5.8400 KOps/s 8.5264 KOps/s $\textbf{\color{#d91a1a}-31.51\%}$
test_tdmodule 0.2791ms 15.3569μs 65.1174 KOps/s 69.9910 KOps/s $\textbf{\color{#d91a1a}-6.96\%}$
test_tdmodule_dispatch 73.3710μs 29.0086μs 34.4725 KOps/s 36.5188 KOps/s $\textbf{\color{#d91a1a}-5.60\%}$
test_tdseq 35.5900μs 16.1238μs 62.0203 KOps/s 65.8870 KOps/s $\textbf{\color{#d91a1a}-5.87\%}$
test_tdseq_dispatch 58.6410μs 31.2118μs 32.0392 KOps/s 33.2305 KOps/s $\color{#d91a1a}-3.59\%$
test_instantiation_functorch 2.0789ms 1.8936ms 528.1039 Ops/s 522.5158 Ops/s $\color{#35bf28}+1.07\%$
test_exec_functorch 0.3604ms 0.2118ms 4.7223 KOps/s 4.6733 KOps/s $\color{#35bf28}+1.05\%$
test_exec_functional_call 0.3541ms 0.2119ms 4.7195 KOps/s 4.6261 KOps/s $\color{#35bf28}+2.02\%$
test_exec_td_decorator 0.4373ms 0.2655ms 3.7659 KOps/s 3.7075 KOps/s $\color{#35bf28}+1.58\%$
test_vmap_mlp_speed_decorator[True-True] 0.7997ms 0.6829ms 1.4643 KOps/s 1.4605 KOps/s $\color{#35bf28}+0.26\%$
test_vmap_mlp_speed_decorator[True-False] 0.8599ms 0.6891ms 1.4513 KOps/s 1.4549 KOps/s $\color{#d91a1a}-0.25\%$
test_vmap_mlp_speed_decorator[False-True] 0.7793ms 0.6128ms 1.6319 KOps/s 1.6579 KOps/s $\color{#d91a1a}-1.57\%$
test_vmap_mlp_speed_decorator[False-False] 0.7551ms 0.6031ms 1.6581 KOps/s 1.6459 KOps/s $\color{#35bf28}+0.74\%$
test_vmap_transformer_speed_decorator[True-True] 19.8245ms 19.6110ms 50.9919 Ops/s 50.7148 Ops/s $\color{#35bf28}+0.55\%$
test_vmap_transformer_speed_decorator[True-False] 20.2583ms 19.6083ms 50.9987 Ops/s 50.6304 Ops/s $\color{#35bf28}+0.73\%$
test_vmap_transformer_speed_decorator[False-True] 19.5280ms 19.4396ms 51.4414 Ops/s 51.1640 Ops/s $\color{#35bf28}+0.54\%$
test_vmap_transformer_speed_decorator[False-False] 20.0597ms 19.4768ms 51.3431 Ops/s 51.0259 Ops/s $\color{#35bf28}+0.62\%$
test_to_module_speed[True] 1.4132ms 1.0230ms 977.5458 Ops/s 974.1738 Ops/s $\color{#35bf28}+0.35\%$
test_to_module_speed[False] 1.4311ms 0.9865ms 1.0137 KOps/s 1.0070 KOps/s $\color{#35bf28}+0.67\%$
test_tc_init 0.1070ms 34.6354μs 28.8722 KOps/s 30.4458 KOps/s $\textbf{\color{#d91a1a}-5.17\%}$
test_tc_init_nested 0.1055ms 69.0786μs 14.4763 KOps/s 14.7698 KOps/s $\color{#d91a1a}-1.99\%$
test_tc_first_layer_tensor 5.1243μs 0.6948μs 1.4393 MOps/s 1.4426 MOps/s $\color{#d91a1a}-0.23\%$
test_tc_first_layer_nontensor 17.3700μs 2.2786μs 438.8614 KOps/s 440.4582 KOps/s $\color{#d91a1a}-0.36\%$
test_tc_second_layer_tensor 20.0477μs 1.4192μs 704.6303 KOps/s 706.6038 KOps/s $\color{#d91a1a}-0.28\%$
test_tc_second_layer_nontensor 25.5710μs 2.9712μs 336.5638 KOps/s 332.2267 KOps/s $\color{#35bf28}+1.31\%$
test_unbind 0.1836s 12.1849ms 82.0688 Ops/s 100.0760 Ops/s $\textbf{\color{#d91a1a}-17.99\%}$
test_full_like 0.7812ms 0.5741ms 1.7420 KOps/s 1.7398 KOps/s $\color{#35bf28}+0.13\%$
test_zeros_like 0.3818ms 0.1982ms 5.0455 KOps/s 5.0528 KOps/s $\color{#d91a1a}-0.15\%$
test_ones_like 0.3526ms 0.1978ms 5.0555 KOps/s 5.0543 KOps/s $\color{#35bf28}+0.02\%$
test_clone 0.5629ms 0.4149ms 2.4102 KOps/s 2.4117 KOps/s $\color{#d91a1a}-0.06\%$
test_squeeze 37.4900μs 9.8356μs 101.6719 KOps/s 97.9592 KOps/s $\color{#35bf28}+3.79\%$
test_unsqueeze 0.2638ms 76.8474μs 13.0128 KOps/s 12.8296 KOps/s $\color{#35bf28}+1.43\%$
test_split 0.3813ms 0.1591ms 6.2856 KOps/s 6.1382 KOps/s $\color{#35bf28}+2.40\%$
test_permute 0.2742ms 0.1863ms 5.3689 KOps/s 5.3112 KOps/s $\color{#35bf28}+1.09\%$
test_stack 1.2529ms 0.8740ms 1.1442 KOps/s 1.1528 KOps/s $\color{#d91a1a}-0.75\%$
test_cat 1.3847ms 1.2316ms 811.9492 Ops/s 812.0695 Ops/s $\color{#d91a1a}-0.01\%$
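
The Change column is consistent with the relative difference between Ops and Ops on Repo HEAD, and the rows shown in bold appear to be those whose change exceeds 5% in either direction, which matches the 17 improved and 9 worsened totals. The sketch below reproduces that arithmetic for three rows of the table. The names `parse_ops`, `relative_change` and `classify`, and the 5% threshold, are assumptions inferred from the numbers above; they are not taken from the benchmark bot itself.

```python
# Multipliers for the throughput units that appear in the table.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '59.5675 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops_pr: str, ops_head: str) -> float:
    """Percentage change of the PR throughput relative to repo HEAD."""
    pr, head = parse_ops(ops_pr), parse_ops(ops_head)
    return (pr - head) / head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row. The 5% threshold is inferred from the table, not documented."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "neutral"


# (Ops, Ops on Repo HEAD) pairs copied from the table above.
rows = {
    "test_plain_set_nested": ("59.5675 KOps/s", "60.8527 KOps/s"),
    "test_membership_stacked_nested_last": ("119.0791 KOps/s", "234.8492 KOps/s"),
    "test_serialize_weights_returnearly": ("21.6117 Ops/s", "15.4305 Ops/s"),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under these assumptions, the three rows come out as neutral, worsened and improved respectively, in line with how the table formats them. Converting units before dividing keeps the comparison valid if a benchmark ever crosses a unit boundary (for example from Ops/s to KOps/s) between the two runs.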

@vmoens vmoens merged commit 42c378e into gh/vmoens/28/base Oct 8, 2024
vmoens pushed a commit that referenced this pull request Oct 8, 2024
ghstack-source-id: 817cf31
Pull Request resolved: #1034
@vmoens vmoens deleted the gh/vmoens/28/head branch October 8, 2024 11:54
