[BugFix] Use separate streams for cudagraph warmup #1010
Merged
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 44.3130μs | 21.4363μs | 46.6499 KOps/s | 50.5395 KOps/s | |
| test_plain_set_stack_nested | 52.0980μs | 22.1748μs | 45.0963 KOps/s | 50.3384 KOps/s | |
| test_plain_set_nested_inplace | 73.7380μs | 23.7829μs | 42.0470 KOps/s | 47.1037 KOps/s | |
| test_plain_set_stack_nested_inplace | 57.2770μs | 23.7994μs | 42.0179 KOps/s | 46.7057 KOps/s | |
| test_items | 22.8320μs | 4.2248μs | 236.7003 KOps/s | 247.9907 KOps/s | |
| test_items_nested | 0.4747ms | 0.3636ms | 2.7499 KOps/s | 2.7517 KOps/s | |
| test_items_nested_locked | 0.9009ms | 0.3675ms | 2.7212 KOps/s | 2.7362 KOps/s | |
| test_items_nested_leaf | 0.1256ms | 69.7525μs | 14.3364 KOps/s | 14.7878 KOps/s | |
| test_items_stack_nested | 0.4640ms | 0.3707ms | 2.6979 KOps/s | 2.7421 KOps/s | |
| test_items_stack_nested_leaf | 0.1568ms | 71.6523μs | 13.9563 KOps/s | 13.5695 KOps/s | |
| test_items_stack_nested_locked | 0.9797ms | 0.4781ms | 2.0918 KOps/s | 2.6973 KOps/s | |
| test_keys | 42.3600μs | 4.0253μs | 248.4290 KOps/s | 278.8630 KOps/s | |
| test_keys_nested | 0.2991ms | 0.1120ms | 8.9306 KOps/s | 9.9469 KOps/s | |
| test_keys_nested_locked | 1.6327ms | 0.1177ms | 8.4995 KOps/s | 9.4559 KOps/s | |
| test_keys_nested_leaf | 0.2551ms | 94.7757μs | 10.5512 KOps/s | 12.0032 KOps/s | |
| test_keys_stack_nested | 0.1822ms | 0.1010ms | 9.9017 KOps/s | 9.8418 KOps/s | |
| test_keys_stack_nested_leaf | 0.1688ms | 83.4255μs | 11.9867 KOps/s | 11.8356 KOps/s | |
| test_keys_stack_nested_locked | 0.1962ms | 0.1052ms | 9.5042 KOps/s | 9.4387 KOps/s | |
| test_values | 11.3572μs | 1.0482μs | 954.0309 KOps/s | 943.4178 KOps/s | |
| test_values_nested | 0.1432ms | 73.9116μs | 13.5297 KOps/s | 13.0624 KOps/s | |
| test_values_nested_locked | 0.1443ms | 72.9312μs | 13.7116 KOps/s | 13.2868 KOps/s | |
| test_values_nested_leaf | 0.1306ms | 61.5322μs | 16.2517 KOps/s | 15.9078 KOps/s | |
| test_values_stack_nested | 0.1541ms | 74.4470μs | 13.4324 KOps/s | 13.5041 KOps/s | |
| test_values_stack_nested_leaf | 0.1181ms | 61.4980μs | 16.2607 KOps/s | 16.3509 KOps/s | |
| test_values_stack_nested_locked | 0.1423ms | 74.6268μs | 13.4000 KOps/s | 13.6149 KOps/s | |
| test_membership | 3.9831μs | 0.7206μs | 1.3877 MOps/s | 1.3752 MOps/s | |
| test_membership_nested | 21.9310μs | 2.7890μs | 358.5549 KOps/s | 358.4124 KOps/s | |
| test_membership_nested_leaf | 22.3730μs | 2.7979μs | 357.4127 KOps/s | 358.1231 KOps/s | |
| test_membership_stacked_nested | 24.1050μs | 2.7959μs | 357.6612 KOps/s | 360.7745 KOps/s | |
| test_membership_stacked_nested_leaf | 30.9780μs | 2.7687μs | 361.1738 KOps/s | 356.6135 KOps/s | |
| test_membership_nested_last | 22.3520μs | 4.0250μs | 248.4462 KOps/s | 249.8532 KOps/s | |
| test_membership_nested_leaf_last | 49.8740μs | 4.0094μs | 249.4144 KOps/s | 247.7781 KOps/s | |
| test_membership_stacked_nested_last | 23.5850μs | 4.5467μs | 219.9405 KOps/s | 169.6517 KOps/s | |
| test_membership_stacked_nested_leaf_last | 48.8740μs | 4.5779μs | 218.4397 KOps/s | 167.9384 KOps/s | |
| test_nested_getleaf | 35.1060μs | 10.8432μs | 92.2234 KOps/s | 94.5291 KOps/s | |
| test_nested_get | 38.9930μs | 10.2902μs | 97.1799 KOps/s | 83.9280 KOps/s | |
| test_stacked_getleaf | 68.6590μs | 10.6957μs | 93.4959 KOps/s | 73.3471 KOps/s | |
| test_stacked_get | 43.0600μs | 10.1818μs | 98.2145 KOps/s | 82.4284 KOps/s | |
| test_nested_getitemleaf | 63.2590μs | 11.0311μs | 90.6529 KOps/s | 82.5964 KOps/s | |
| test_nested_getitem | 30.9990μs | 10.4068μs | 96.0908 KOps/s | 88.6168 KOps/s | |
| test_stacked_getitemleaf | 63.2960μs | 10.9339μs | 91.4585 KOps/s | 80.3606 KOps/s | |
| test_stacked_getitem | 44.8140μs | 10.4258μs | 95.9155 KOps/s | 87.5394 KOps/s | |
| test_lock_nested | 83.9395ms | 0.5747ms | 1.7399 KOps/s | 1.9433 KOps/s | |
| test_lock_stack_nested | 0.7025ms | 0.4539ms | 2.2032 KOps/s | 2.1884 KOps/s | |
| test_unlock_nested | 82.3425ms | 0.4890ms | 2.0449 KOps/s | 2.4544 KOps/s | |
| test_unlock_stack_nested | 0.7333ms | 0.3719ms | 2.6892 KOps/s | 2.6481 KOps/s | |
| test_flatten_speed | 0.1808ms | 87.8620μs | 11.3815 KOps/s | 9.3829 KOps/s | |
| test_unflatten_speed | 0.6489ms | 0.4631ms | 2.1594 KOps/s | 1.8769 KOps/s | |
| test_common_ops | 4.8491ms | 1.1491ms | 870.2625 Ops/s | 772.5172 Ops/s | |
| test_creation | 30.8880μs | 2.0984μs | 476.5433 KOps/s | 472.8116 KOps/s | |
| test_creation_empty | 51.7070μs | 20.1048μs | 49.7393 KOps/s | 45.6777 KOps/s | |
| test_creation_nested_1 | 64.9810μs | 23.2287μs | 43.0501 KOps/s | 49.7397 KOps/s | |
| test_creation_nested_2 | 76.9050μs | 27.9362μs | 35.7959 KOps/s | 34.3380 KOps/s | |
| test_clone | 75.9230μs | 16.7699μs | 59.6308 KOps/s | 56.5050 KOps/s | |
| test_getitem[int] | 1.0798ms | 16.5717μs | 60.3437 KOps/s | 58.2997 KOps/s | |
| test_getitem[slice_int] | 0.1364ms | 29.5335μs | 33.8599 KOps/s | 31.6424 KOps/s | |
| test_getitem[range] | 0.1853ms | 55.9905μs | 17.8602 KOps/s | 16.9710 KOps/s | |
| test_getitem[tuple] | 0.1328ms | 24.3723μs | 41.0302 KOps/s | 38.7474 KOps/s | |
| test_getitem[list] | 0.1779ms | 51.2354μs | 19.5178 KOps/s | 18.2591 KOps/s | |
| test_setitem_dim[int] | 99.4160μs | 33.9431μs | 29.4610 KOps/s | 29.8793 KOps/s | |
| test_setitem_dim[slice_int] | 0.1034ms | 61.4205μs | 16.2812 KOps/s | 15.8627 KOps/s | |
| test_setitem_dim[range] | 0.1233ms | 82.0889μs | 12.1819 KOps/s | 11.6666 KOps/s | |
| test_setitem_dim[tuple] | 0.1194ms | 50.2572μs | 19.8977 KOps/s | 19.6980 KOps/s | |
| test_setitem | 91.4420μs | 31.4623μs | 31.7841 KOps/s | 32.9777 KOps/s | |
| test_set | 85.2800μs | 30.4454μs | 32.8457 KOps/s | 33.4991 KOps/s | |
| test_set_shared | 1.9860ms | 0.2138ms | 4.6766 KOps/s | 4.5945 KOps/s | |
| test_update | 0.1402ms | 38.5018μs | 25.9728 KOps/s | 27.4220 KOps/s | |
| test_update_nested | 0.1598ms | 47.8007μs | 20.9202 KOps/s | 20.9647 KOps/s | |
| test_update__nested | 88.2550μs | 34.8766μs | 28.6725 KOps/s | 27.9213 KOps/s | |
| test_set_nested | 98.4450μs | 32.8774μs | 30.4160 KOps/s | 31.5034 KOps/s | |
| test_set_nested_new | 0.1082ms | 37.7506μs | 26.4897 KOps/s | 27.0555 KOps/s | |
| test_select | 0.1169ms | 54.3558μs | 18.3973 KOps/s | 18.3752 KOps/s | |
| test_select_nested | 0.1372ms | 59.1150μs | 16.9162 KOps/s | 17.1730 KOps/s | |
| test_exclude_nested | 0.1657ms | 75.7260μs | 13.2055 KOps/s | 13.7017 KOps/s | |
| test_empty[True] | 0.5025ms | 0.3117ms | 3.2077 KOps/s | 3.1991 KOps/s | |
| test_empty[False] | 10.2443μs | 1.2561μs | 796.1114 KOps/s | 842.3419 KOps/s | |
| test_unbind_speed | 0.6114ms | 0.4057ms | 2.4651 KOps/s | 3.2963 KOps/s | |
| test_unbind_speed_stack0 | 0.6845ms | 0.3565ms | 2.8053 KOps/s | 3.3780 KOps/s | |
| test_unbind_speed_stack1 | 0.1141s | 0.9485ms | 1.0543 KOps/s | 1.3738 KOps/s | |
| test_split | 86.8470ms | 2.3396ms | 427.4262 Ops/s | 453.9179 Ops/s | |
| test_chunk | 3.4056ms | 2.1369ms | 467.9657 Ops/s | 447.6919 Ops/s | |
| test_creation[device0] | 0.2242ms | 0.1149ms | 8.7042 KOps/s | 8.4284 KOps/s | |
| test_creation_from_tensor | 5.1334ms | 0.1223ms | 8.1761 KOps/s | 8.3662 KOps/s | |
| test_add_one[memmap_tensor0] | 0.2144ms | 7.2817μs | 137.3302 KOps/s | 128.4618 KOps/s | |
| test_contiguous[memmap_tensor0] | 13.3450μs | 1.9203μs | 520.7414 KOps/s | 512.6073 KOps/s | |
| test_stack[memmap_tensor0] | 32.1910μs | 5.7487μs | 173.9515 KOps/s | 172.1151 KOps/s | |
| test_memmaptd_index | 0.7411ms | 0.4037ms | 2.4768 KOps/s | 2.4617 KOps/s | |
| test_memmaptd_index_astensor | 0.7642ms | 0.4830ms | 2.0704 KOps/s | 2.0484 KOps/s | |
| test_memmaptd_index_op | 1.6729ms | 1.0546ms | 948.2705 Ops/s | 981.7813 Ops/s | |
| test_serialize_model | 0.2062s | 0.1305s | 7.6648 Ops/s | 8.6328 Ops/s | |
| test_serialize_model_pickle | 0.4677s | 0.3993s | 2.5041 Ops/s | 2.5661 Ops/s | |
| test_serialize_weights | 0.1227s | 0.1176s | 8.5027 Ops/s | 8.6556 Ops/s | |
| test_serialize_weights_returnearly | 0.2105s | 0.1898s | 5.2695 Ops/s | 6.2607 Ops/s | |
| test_serialize_weights_pickle | 0.5567s | 0.4405s | 2.2703 Ops/s | 2.5170 Ops/s | |
| test_serialize_weights_filesystem | 0.2201s | 0.1516s | 6.5955 Ops/s | 6.9735 Ops/s | |
| test_serialize_model_filesystem | 0.1615s | 0.1474s | 6.7826 Ops/s | 6.9781 Ops/s | |
| test_reshape_pytree | 79.5690μs | 38.4837μs | 25.9850 KOps/s | 24.9799 KOps/s | |
| test_reshape_td | 96.0900μs | 46.4689μs | 21.5198 KOps/s | 20.7855 KOps/s | |
| test_view_pytree | 0.1179ms | 38.0882μs | 26.2548 KOps/s | 25.3091 KOps/s | |
| test_view_td | 0.1293ms | 51.2868μs | 19.4982 KOps/s | 18.2789 KOps/s | |
| test_unbind_pytree | 79.6400μs | 35.4489μs | 28.2096 KOps/s | 27.3592 KOps/s | |
| test_unbind_td | 0.3070ms | 45.0914μs | 22.1772 KOps/s | 21.5545 KOps/s | |
| test_split_pytree | 75.0710μs | 37.5558μs | 26.6271 KOps/s | 25.8551 KOps/s | |
| test_split_td | 0.5059ms | 55.8917μs | 17.8917 KOps/s | 17.3705 KOps/s | |
| test_add_pytree | 0.1308ms | 44.6468μs | 22.3980 KOps/s | 21.6775 KOps/s | |
| test_add_td | 0.2988ms | 85.8300μs | 11.6509 KOps/s | 12.5795 KOps/s | |
| test_compile_add_one_nested[tensordict-compile] | 0.1185ms | 57.8613μs | 17.2827 KOps/s | 15.9241 KOps/s | |
| test_compile_add_one_nested[tensordict-eager] | 0.2645ms | 0.1781ms | 5.6158 KOps/s | 5.0154 KOps/s | |
| test_compile_add_one_nested[pytree-compile] | 0.1441ms | 57.8683μs | 17.2806 KOps/s | 15.9920 KOps/s | |
| test_compile_add_one_nested[pytree-eager] | 0.2530ms | 0.1384ms | 7.2229 KOps/s | 6.6722 KOps/s | |
| test_compile_copy_nested[tensordict-compile] | 61.5550μs | 21.3230μs | 46.8978 KOps/s | 48.0556 KOps/s | |
| test_compile_copy_nested[tensordict-eager] | 0.1569ms | 68.4555μs | 14.6080 KOps/s | 14.6095 KOps/s | |
| test_compile_copy_nested[pytree-compile] | 0.1619ms | 74.6586μs | 13.3943 KOps/s | 13.2687 KOps/s | |
| test_compile_copy_nested[pytree-eager] | 0.1567ms | 68.8357μs | 14.5273 KOps/s | 14.4625 KOps/s | |
| test_compile_add_one_flat[tensordict-compile] | 0.2925ms | 0.1735ms | 5.7637 KOps/s | 5.6563 KOps/s | |
| test_compile_add_one_flat[tensordict-eager] | 0.3992ms | 0.1970ms | 5.0772 KOps/s | 5.1969 KOps/s | |
| test_compile_add_one_flat[tensorclass-compile] | 96.4310μs | 46.0723μs | 21.7050 KOps/s | 21.2242 KOps/s | |
| test_compile_add_one_flat[tensorclass-eager] | 0.1723ms | 68.4291μs | 14.6137 KOps/s | 13.3597 KOps/s | |
| test_compile_add_one_flat[pytree-compile] | 0.3691ms | 0.1795ms | 5.5697 KOps/s | 5.1553 KOps/s | |
| test_compile_add_one_flat[pytree-eager] | 0.6184ms | 0.2860ms | 3.4968 KOps/s | 3.1149 KOps/s | |
| test_compile_add_self_flat[tensordict-eager] | 0.3763ms | 0.2072ms | 4.8252 KOps/s | 4.9234 KOps/s | |
| test_compile_add_self_flat[tensordict-compile] | 0.4294ms | 0.1778ms | 5.6234 KOps/s | 5.7167 KOps/s | |
| test_compile_add_self_flat[tensorclass-eager] | 0.1206ms | 61.2634μs | 16.3230 KOps/s | 15.8772 KOps/s | |
| test_compile_add_self_flat[tensorclass-compile] | 0.1104ms | 46.6363μs | 21.4425 KOps/s | 21.2740 KOps/s | |
| test_compile_add_self_flat[pytree-eager] | 0.4082ms | 0.2321ms | 4.3086 KOps/s | 4.2064 KOps/s | |
| test_compile_add_self_flat[pytree-compile] | 0.2752ms | 0.1762ms | 5.6753 KOps/s | 5.6036 KOps/s | |
| test_compile_copy_flat[tensordict-compile] | 0.1903ms | 0.1039ms | 9.6208 KOps/s | 9.4875 KOps/s | |
| test_compile_copy_flat[tensordict-eager] | 0.1362ms | 58.2443μs | 17.1691 KOps/s | 17.1406 KOps/s | |
| test_compile_copy_flat[pytree-compile] | 0.1753ms | 77.8832μs | 12.8397 KOps/s | 12.7644 KOps/s | |
| test_compile_copy_flat[pytree-eager] | 0.1353ms | 69.2336μs | 14.4438 KOps/s | 14.4815 KOps/s | |
| test_compile_assign_and_add[tensordict-compile] | 0.4984ms | 0.2319ms | 4.3115 KOps/s | 4.9605 KOps/s | |
| test_compile_assign_and_add[tensordict-eager] | 3.2491ms | 1.9750ms | 506.3215 Ops/s | 607.8096 Ops/s | |
| test_compile_assign_and_add[pytree-compile] | 0.2935ms | 0.1950ms | 5.1288 KOps/s | 4.9048 KOps/s | |
| test_compile_assign_and_add[pytree-eager] | 1.8538ms | 1.1009ms | 908.3140 Ops/s | 898.1401 Ops/s | |
| test_compile_assign_and_add_stack[compile] | 0.5367ms | 0.4267ms | 2.3438 KOps/s | 2.1862 KOps/s | |
| test_compile_assign_and_add_stack[eager] | 6.1418ms | 3.9538ms | 252.9212 Ops/s | 263.3389 Ops/s | |
| test_compile_indexing[tensor-tensordict-compile] | 95.5600μs | 34.3930μs | 29.0756 KOps/s | 26.6013 KOps/s | |
| test_compile_indexing[tensor-tensordict-eager] | 1.0826ms | 47.5016μs | 21.0519 KOps/s | 20.0337 KOps/s | |
| test_compile_indexing[tensor-tensorclass-compile] | 0.1071ms | 29.7825μs | 33.5767 KOps/s | 32.7048 KOps/s | |
| test_compile_indexing[tensor-tensorclass-eager] | 69.8400μs | 27.9961μs | 35.7193 KOps/s | 33.5615 KOps/s | |
| test_compile_indexing[tensor-pytree-compile] | 0.1037ms | 30.3641μs | 32.9336 KOps/s | 32.8119 KOps/s | |
| test_compile_indexing[tensor-pytree-eager] | 0.1080ms | 27.8343μs | 35.9269 KOps/s | 34.4605 KOps/s | |
| test_compile_indexing[slice-tensordict-compile] | 0.1525ms | 74.8995μs | 13.3512 KOps/s | 12.8487 KOps/s | |
| test_compile_indexing[slice-tensordict-eager] | 0.5362ms | 26.5341μs | 37.6874 KOps/s | 35.1544 KOps/s | |
| test_compile_indexing[slice-tensorclass-compile] | 0.1643ms | 68.7645μs | 14.5424 KOps/s | 13.9402 KOps/s | |
| test_compile_indexing[slice-tensorclass-eager] | 97.9830μs | 22.7241μs | 44.0062 KOps/s | 36.9017 KOps/s | |
| test_compile_indexing[slice-pytree-compile] | 0.1844ms | 67.6443μs | 14.7832 KOps/s | 11.6915 KOps/s | |
| test_compile_indexing[slice-pytree-eager] | 63.6490μs | 22.8397μs | 43.7835 KOps/s | 35.7917 KOps/s | |
| test_compile_indexing[int-tensordict-compile] | 0.1578ms | 73.3223μs | 13.6384 KOps/s | 11.4425 KOps/s | |
| test_compile_indexing[int-tensordict-eager] | 0.9490ms | 26.4064μs | 37.8696 KOps/s | 31.5860 KOps/s | |
| test_compile_indexing[int-tensorclass-compile] | 0.1537ms | 68.1420μs | 14.6752 KOps/s | 13.2602 KOps/s | |
| test_compile_indexing[int-tensorclass-eager] | 63.8700μs | 22.8524μs | 43.7591 KOps/s | 41.5358 KOps/s | |
| test_compile_indexing[int-pytree-compile] | 0.1487ms | 67.7100μs | 14.7689 KOps/s | 14.1500 KOps/s | |
| test_compile_indexing[int-pytree-eager] | 92.7040μs | 22.9731μs | 43.5293 KOps/s | 42.0989 KOps/s | |
| test_mod_add[eager] | 89.5790μs | 27.0626μs | 36.9514 KOps/s | 40.3183 KOps/s | |
| test_mod_add[compile] | 0.1406ms | 39.6661μs | 25.2104 KOps/s | 25.8498 KOps/s | |
| test_mod_add[compile-overhead] | 86.4520μs | 38.8741μs | 25.7241 KOps/s | 25.1170 KOps/s | |
| test_mod_wrap[eager] | 0.3202ms | 0.2092ms | 4.7804 KOps/s | 4.7674 KOps/s | |
| test_mod_wrap[compile] | 0.3873ms | 0.2320ms | 4.3111 KOps/s | 4.2520 KOps/s | |
| test_mod_wrap[compile-overhead] | 0.3627ms | 0.2310ms | 4.3299 KOps/s | 4.2879 KOps/s | |
| test_mod_wrap_and_backward[eager] | 14.2373ms | 11.4446ms | 87.3774 Ops/s | 86.5955 Ops/s | |
| test_mod_wrap_and_backward[compile] | 13.9040ms | 11.7950ms | 84.7819 Ops/s | 77.5904 Ops/s | |
| test_mod_wrap_and_backward[compile-overhead] | 14.1129ms | 12.3865ms | 80.7331 Ops/s | 84.3387 Ops/s | |
| test_seq_add[eager] | 0.2411ms | 93.8521μs | 10.6551 KOps/s | 10.8211 KOps/s | |
| test_seq_add[compile] | 0.1334ms | 64.9264μs | 15.4021 KOps/s | 15.4690 KOps/s | |
| test_seq_add[compile-overhead] | 0.1659ms | 63.4129μs | 15.7697 KOps/s | 15.4418 KOps/s | |
| test_seq_wrap[eager] | 0.5044ms | 0.3928ms | 2.5458 KOps/s | 2.6479 KOps/s | |
| test_seq_wrap[compile] | 1.2061ms | 0.2758ms | 3.6253 KOps/s | 3.6631 KOps/s | |
| test_seq_wrap[compile-overhead] | 1.9764ms | 0.3087ms | 3.2390 KOps/s | 3.6957 KOps/s | |
| test_func_call_runtime[False-eager] | 1.6400ms | 0.5866ms | 1.7048 KOps/s | 1.8431 KOps/s | |
| test_func_call_runtime[False-compile] | 1.0058ms | 0.5365ms | 1.8640 KOps/s | 1.9669 KOps/s | |
| test_func_call_runtime[False-compile-overhead] | 0.6746ms | 0.5007ms | 1.9970 KOps/s | 1.9628 KOps/s | |
| test_func_call_runtime[True-eager] | 0.8620ms | 0.7449ms | 1.3425 KOps/s | 1.3195 KOps/s | |
| test_func_call_runtime[True-compile] | 0.6764ms | 0.5151ms | 1.9412 KOps/s | 1.7085 KOps/s | |
| test_func_call_runtime[True-compile-overhead] | 0.6309ms | 0.5114ms | 1.9554 KOps/s | 1.9159 KOps/s | |
| test_func_call_cm_runtime[False-eager] | 1.4874ms | 0.6272ms | 1.5945 KOps/s | 1.8681 KOps/s | |
| test_func_call_cm_runtime[False-compile] | 1.1813ms | 0.5233ms | 1.9109 KOps/s | 1.9500 KOps/s | |
| test_func_call_cm_runtime[False-compile-overhead] | 0.9501ms | 0.5072ms | 1.9715 KOps/s | 1.9592 KOps/s | |
| test_func_call_cm_runtime[True-eager] | 1.0119ms | 0.8664ms | 1.1542 KOps/s | 1.1217 KOps/s | |
| test_func_call_cm_runtime[True-compile] | 1.0509ms | 0.7300ms | 1.3699 KOps/s | 1.1478 KOps/s | |
| test_func_call_cm_runtime[True-compile-overhead] | 1.1779ms | 0.7360ms | 1.3587 KOps/s | 1.3190 KOps/s | |
| test_vmap_func_call_cm_runtime[eager] | 2.6206ms | 1.8649ms | 536.2259 Ops/s | 530.0275 Ops/s | |
| test_vmap_func_call_cm_runtime[compile] | 2.4900ms | 1.8917ms | 528.6301 Ops/s | 514.2960 Ops/s | |
| test_vmap_func_call_cm_runtime[compile-overhead] | 2.7097ms | 1.9254ms | 519.3708 Ops/s | 508.7570 Ops/s | |
| test_distributed | 0.2521ms | 0.1259ms | 7.9403 KOps/s | 7.6638 KOps/s | |
| test_tdmodule | 46.8680μs | 20.2493μs | 49.3845 KOps/s | 57.8097 KOps/s | |
| test_tdmodule_dispatch | 72.4250μs | 39.3421μs | 25.4181 KOps/s | 29.2806 KOps/s | |
| test_tdseq | 42.4200μs | 22.8410μs | 43.7809 KOps/s | 51.2625 KOps/s | |
| test_tdseq_dispatch | 68.6090μs | 44.7773μs | 22.3327 KOps/s | 25.3506 KOps/s | |
| test_instantiation_functorch | 2.5220ms | 1.5832ms | 631.6355 Ops/s | 618.5979 Ops/s | |
| test_instantiation_td | 1.9181ms | 1.1624ms | 860.2716 Ops/s | 847.2131 Ops/s | |
| test_exec_functorch | 0.3502ms | 0.1862ms | 5.3705 KOps/s | 5.4202 KOps/s | |
| test_exec_functional_call | 0.3433ms | 0.1703ms | 5.8732 KOps/s | 5.5886 KOps/s | |
| test_exec_td | 0.3052ms | 0.1689ms | 5.9208 KOps/s | 5.8982 KOps/s | |
| test_exec_td_decorator | 0.3462ms | 0.2223ms | 4.4986 KOps/s | 4.4003 KOps/s | |
| test_vmap_mlp_speed[True-True] | 0.8025ms | 0.6509ms | 1.5364 KOps/s | 1.4939 KOps/s | |
| test_vmap_mlp_speed[True-False] | 0.8657ms | 0.6497ms | 1.5391 KOps/s | 1.5251 KOps/s | |
| test_vmap_mlp_speed[False-True] | 0.8067ms | 0.4952ms | 2.0194 KOps/s | 1.9964 KOps/s | |
| test_vmap_mlp_speed[False-False] | 0.6713ms | 0.4944ms | 2.0225 KOps/s | 1.9879 KOps/s | |
| test_vmap_mlp_speed_decorator[True-True] | 1.3925ms | 0.6225ms | 1.6065 KOps/s | 1.5919 KOps/s | |
| test_vmap_mlp_speed_decorator[True-False] | 0.9574ms | 0.6232ms | 1.6047 KOps/s | 1.5784 KOps/s | |
| test_vmap_mlp_speed_decorator[False-True] | 0.7795ms | 0.5092ms | 1.9637 KOps/s | 1.9100 KOps/s | |
| test_vmap_mlp_speed_decorator[False-False] | 0.7310ms | 0.5082ms | 1.9679 KOps/s | 1.9270 KOps/s | |
| test_to_module_speed[True] | 2.0503ms | 1.3158ms | 759.9883 Ops/s | 765.3669 Ops/s | |
| test_to_module_speed[False] | 1.9691ms | 1.2763ms | 783.4966 Ops/s | 790.3830 Ops/s | |
| test_tc_init | 0.1076ms | 47.5272μs | 21.0406 KOps/s | 23.5330 KOps/s | |
| test_tc_init_nested | 0.1678ms | 94.2758μs | 10.6072 KOps/s | 11.6543 KOps/s | |
| test_tc_first_layer_tensor | 16.1900μs | 1.5548μs | 643.1893 KOps/s | 651.5511 KOps/s | |
| test_tc_first_layer_nontensor | 28.9640μs | 4.7885μs | 208.8345 KOps/s | 216.6973 KOps/s | |
| test_tc_second_layer_tensor | 38.6330μs | 2.8415μs | 351.9266 KOps/s | 359.1640 KOps/s | |
| test_tc_second_layer_nontensor | 43.7920μs | 6.1898μs | 161.5550 KOps/s | 167.1201 KOps/s | |
| test_unbind | 0.4589s | 14.9766ms | 66.7708 Ops/s | 77.5994 Ops/s | |
| test_full_like | 7.7109ms | 6.8349ms | 146.3078 Ops/s | 142.0332 Ops/s | |
| test_zeros_like | 2.9933ms | 2.6335ms | 379.7248 Ops/s | 372.2304 Ops/s | |
| test_ones_like | 10.0016ms | 5.8865ms | 169.8805 Ops/s | 326.0933 Ops/s | |
| test_clone | 13.9636ms | 7.7088ms | 129.7223 Ops/s | 207.5325 Ops/s | |
| test_squeeze | 75.4920μs | 13.0917μs | 76.3840 KOps/s | 79.2947 KOps/s | |
| test_unsqueeze | 0.1694ms | 90.0702μs | 11.1025 KOps/s | 10.4247 KOps/s | |
| test_split | 0.5553ms | 0.1893ms | 5.2828 KOps/s | 4.9494 KOps/s | |
| test_permute | 0.3666ms | 0.2171ms | 4.6071 KOps/s | 4.3818 KOps/s | |
| test_stack | 26.1090ms | 24.1343ms | 41.4349 Ops/s | 41.8202 Ops/s | |
| test_cat | 26.2814ms | 23.8884ms | 41.8613 Ops/s | 41.8339 Ops/s |

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 0.1423ms | 13.4781μs | 74.1944 KOps/s | 73.2826 KOps/s | |
| test_plain_set_stack_nested | 36.0900μs | 13.6054μs | 73.5004 KOps/s | 73.0380 KOps/s | |
| test_plain_set_nested_inplace | 64.2610μs | 14.5842μs | 68.5676 KOps/s | 67.6228 KOps/s | |
| test_plain_set_stack_nested_inplace | 49.4900μs | 14.6887μs | 68.0794 KOps/s | 67.9551 KOps/s | |
| test_items | 36.9110μs | 2.9916μs | 334.2701 KOps/s | 348.2568 KOps/s | |
| test_items_nested | 0.3791ms | 0.3251ms | 3.0762 KOps/s | 3.0654 KOps/s | |
| test_items_nested_locked | 0.3684ms | 0.3235ms | 3.0911 KOps/s | 3.0427 KOps/s | |
| test_items_nested_leaf | 79.2720μs | 55.4711μs | 18.0274 KOps/s | 17.8418 KOps/s | |
| test_items_stack_nested | 0.3734ms | 0.3241ms | 3.0859 KOps/s | 3.0214 KOps/s | |
| test_items_stack_nested_leaf | 83.2910μs | 55.9829μs | 17.8626 KOps/s | 17.4104 KOps/s | |
| test_items_stack_nested_locked | 0.3757ms | 0.3246ms | 3.0804 KOps/s | 2.9910 KOps/s | |
| test_keys | 39.0510μs | 3.4001μs | 294.1129 KOps/s | 294.5316 KOps/s | |
| test_keys_nested | 87.0010μs | 56.6905μs | 17.6396 KOps/s | 17.9659 KOps/s | |
| test_keys_nested_locked | 2.5389ms | 62.6206μs | 15.9692 KOps/s | 15.9527 KOps/s | |
| test_keys_nested_leaf | 74.1610μs | 46.0415μs | 21.7195 KOps/s | 21.0003 KOps/s | |
| test_keys_stack_nested | 89.3110μs | 56.4150μs | 17.7258 KOps/s | 17.5387 KOps/s | |
| test_keys_stack_nested_leaf | 79.6110μs | 47.5032μs | 21.0512 KOps/s | 21.1425 KOps/s | |
| test_keys_stack_nested_locked | 90.1420μs | 62.1923μs | 16.0792 KOps/s | 15.9650 KOps/s | |
| test_values | 3.5830μs | 0.8362μs | 1.1958 MOps/s | 1.1847 MOps/s | |
| test_values_nested | 70.4610μs | 41.2954μs | 24.2158 KOps/s | 24.1881 KOps/s | |
| test_values_nested_locked | 77.5310μs | 43.2008μs | 23.1477 KOps/s | 23.2035 KOps/s | |
| test_values_nested_leaf | 64.2810μs | 35.8045μs | 27.9295 KOps/s | 27.8958 KOps/s | |
| test_values_stack_nested | 76.2720μs | 41.4895μs | 24.1025 KOps/s | 23.6376 KOps/s | |
| test_values_stack_nested_leaf | 69.0610μs | 36.3362μs | 27.5207 KOps/s | 27.4096 KOps/s | |
| test_values_stack_nested_locked | 76.3910μs | 43.3493μs | 23.0684 KOps/s | 22.7081 KOps/s | |
| test_membership | 1.9535μs | 0.5007μs | 1.9973 MOps/s | 1.9903 MOps/s | |
| test_membership_nested | 15.2805μs | 1.8690μs | 535.0445 KOps/s | 514.6768 KOps/s | |
| test_membership_nested_leaf | 11.7800μs | 1.8409μs | 543.2007 KOps/s | 515.2955 KOps/s | |
| test_membership_stacked_nested | 48.2700μs | 1.9137μs | 522.5501 KOps/s | 513.5835 KOps/s | |
| test_membership_stacked_nested_leaf | 27.0300μs | 1.9453μs | 514.0492 KOps/s | 514.4970 KOps/s | |
| test_membership_nested_last | 34.0000μs | 2.7752μs | 360.3346 KOps/s | 358.3612 KOps/s | |
| test_membership_nested_leaf_last | 36.4400μs | 2.7970μs | 357.5280 KOps/s | 359.3850 KOps/s | |
| test_membership_stacked_nested_last | 33.6210μs | 2.7900μs | 358.4272 KOps/s | 305.3557 KOps/s | |
| test_membership_stacked_nested_leaf_last | 44.4010μs | 2.8015μs | 356.9518 KOps/s | 316.6232 KOps/s | |
| test_nested_getleaf | 53.5610μs | 6.0785μs | 164.5154 KOps/s | 164.8446 KOps/s | |
| test_nested_get | 30.1400μs | 5.6843μs | 175.9231 KOps/s | 175.7009 KOps/s | |
| test_stacked_getleaf | 49.0810μs | 6.0303μs | 165.8291 KOps/s | 165.0051 KOps/s | |
| test_stacked_get | 39.1100μs | 5.6837μs | 175.9431 KOps/s | 177.7379 KOps/s | |
| test_nested_getitemleaf | 30.5700μs | 6.1464μs | 162.6968 KOps/s | 163.3697 KOps/s | |
| test_nested_getitem | 36.0600μs | 5.7376μs | 174.2878 KOps/s | 175.3553 KOps/s | |
| test_stacked_getitemleaf | 47.1010μs | 6.1323μs | 163.0696 KOps/s | 164.5605 KOps/s | |
| test_stacked_getitem | 37.8700μs | 5.7496μs | 173.9244 KOps/s | 176.9446 KOps/s | |
| test_lock_nested | 4.6849ms | 0.4198ms | 2.3822 KOps/s | 2.3853 KOps/s | |
| test_lock_stack_nested | 0.4349ms | 0.3783ms | 2.6434 KOps/s | 2.6540 KOps/s | |
| test_unlock_nested | 0.7371ms | 0.3542ms | 2.8236 KOps/s | 2.8306 KOps/s | |
| test_unlock_stack_nested | 0.3784ms | 0.3186ms | 3.1383 KOps/s | 3.1490 KOps/s | |
| test_flatten_speed | 0.1185ms | 68.8184μs | 14.5310 KOps/s | 14.5185 KOps/s | |
| test_unflatten_speed | 0.3390ms | 0.2869ms | 3.4851 KOps/s | 3.4913 KOps/s | |
| test_common_ops | 1.4942ms | 1.2040ms | 830.5443 Ops/s | 815.1284 Ops/s | |
| test_creation | 29.7910μs | 1.4717μs | 679.5032 KOps/s | 694.4175 KOps/s | |
| test_creation_empty | 51.3410μs | 14.7177μs | 67.9454 KOps/s | 68.6038 KOps/s | |
| test_creation_nested_1 | 49.1710μs | 16.5256μs | 60.5121 KOps/s | 60.2004 KOps/s | |
| test_creation_nested_2 | 51.5700μs | 18.9357μs | 52.8104 KOps/s | 52.1240 KOps/s | |
| test_clone | 61.1110μs | 27.9360μs | 35.7961 KOps/s | 33.1360 KOps/s | |
| test_getitem[int] | 91.6347ms | 22.8879μs | 43.6913 KOps/s | 64.0822 KOps/s | |
| test_getitem[slice_int] | 0.1158ms | 27.2425μs | 36.7073 KOps/s | 36.7136 KOps/s | |
| test_getitem[range] | 0.2442ms | 0.1077ms | 9.2888 KOps/s | 9.2332 KOps/s | |
| test_getitem[tuple] | 0.1230ms | 23.3292μs | 42.8647 KOps/s | 42.6426 KOps/s | |
| test_getitem[list] | 0.1894ms | 96.7958μs | 10.3310 KOps/s | 10.2136 KOps/s | |
| test_setitem_dim[int] | 67.9210μs | 43.5903μs | 22.9409 KOps/s | 22.5965 KOps/s | |
| test_setitem_dim[slice_int] | 92.6110μs | 66.5557μs | 15.0250 KOps/s | 15.1524 KOps/s | |
| test_setitem_dim[range] | 0.1698ms | 0.1246ms | 8.0238 KOps/s | 7.9956 KOps/s | |
| test_setitem_dim[tuple] | 95.7410μs | 59.9529μs | 16.6798 KOps/s | 16.5007 KOps/s | |
| test_setitem | 78.6520μs | 40.1610μs | 24.8998 KOps/s | 24.3346 KOps/s | |
| test_set | 70.6310μs | 39.2072μs | 25.5055 KOps/s | 24.0163 KOps/s | |
| test_set_shared | 0.3532ms | 49.0214μs | 20.3992 KOps/s | 19.6759 KOps/s | |
| test_update | 86.9520μs | 46.8548μs | 21.3425 KOps/s | 20.7506 KOps/s | |
| test_update_nested | 81.2620μs | 54.0440μs | 18.5035 KOps/s | 18.1390 KOps/s | |
| test_update__nested | 95.8710μs | 56.7743μs | 17.6136 KOps/s | 17.3217 KOps/s | |
| test_set_nested | 72.6110μs | 41.4062μs | 24.1510 KOps/s | 23.0828 KOps/s | |
| test_set_nested_new | 83.5210μs | 44.8928μs | 22.2753 KOps/s | 21.5610 KOps/s | |
| test_select | 0.5408ms | 58.4989μs | 17.0943 KOps/s | 16.8132 KOps/s | |
| test_select_nested | 79.8220μs | 41.7820μs | 23.9337 KOps/s | 23.7951 KOps/s | |
| test_exclude_nested | 84.0010μs | 58.5599μs | 17.0765 KOps/s | 17.3399 KOps/s | |
| test_empty[True] | 0.2837ms | 0.2428ms | 4.1182 KOps/s | 4.0786 KOps/s | |
| test_empty[False] | 3.4431μs | 0.7447μs | 1.3428 MOps/s | 1.3458 MOps/s | |
| test_to | 56.1210μs | 25.6997μs | 38.9109 KOps/s | 38.4271 KOps/s | |
| test_to_nonblocking | 60.7810μs | 24.3804μs | 41.0166 KOps/s | 39.9256 KOps/s | |
| test_unbind_speed | 0.3062ms | 0.2779ms | 3.5981 KOps/s | 3.6549 KOps/s | |
| test_unbind_speed_stack0 | 0.3674ms | 0.2788ms | 3.5865 KOps/s | 3.6690 KOps/s | |
| test_unbind_speed_stack1 | 90.6798ms | 0.7130ms | 1.4026 KOps/s | 1.4024 KOps/s | |
| test_split | 93.3989ms | 2.1838ms | 457.9092 Ops/s | 469.7142 Ops/s | |
| test_chunk | 93.5502ms | 2.1884ms | 456.9505 Ops/s | 469.1022 Ops/s | |
| test_creation[device0] | 0.3892ms | 0.1255ms | 7.9687 KOps/s | 8.0338 KOps/s | |
| test_creation_from_tensor | 0.4277ms | 0.1282ms | 7.7985 KOps/s | 7.5976 KOps/s | |
| test_add_one[memmap_tensor0] | 0.2908ms | 8.5004μs | 117.6422 KOps/s | 116.0175 KOps/s | |
| test_contiguous[memmap_tensor0] | 31.4810μs | 2.1005μs | 476.0829 KOps/s | 474.1897 KOps/s | |
| test_stack[memmap_tensor0] | 37.3400μs | 6.6454μs | 150.4806 KOps/s | 156.7164 KOps/s | |
| test_memmaptd_index | 1.1140ms | 0.4074ms | 2.4549 KOps/s | 2.4488 KOps/s | |
| test_memmaptd_index_astensor | 0.9738ms | 0.4675ms | 2.1389 KOps/s | 2.1507 KOps/s | |
| test_memmaptd_index_op | 1.3989ms | 0.9843ms | 1.0159 KOps/s | 1.0195 KOps/s | |
| test_serialize_model | 0.1300s | 0.1291s | 7.7456 Ops/s | 7.6917 Ops/s | |
| test_serialize_model_pickle | 1.3765s | 1.2175s | 0.8213 Ops/s | 0.8241 Ops/s | |
| test_serialize_weights | 0.2173s | 0.1416s | 7.0630 Ops/s | 7.0187 Ops/s | |
| test_serialize_weights_returnearly | 0.2168s | 54.3828ms | 18.3882 Ops/s | 17.9311 Ops/s | |
| test_serialize_weights_pickle | 1.3767s | 1.2172s | 0.8216 Ops/s | 0.8254 Ops/s | |
| test_reshape_pytree | 67.9810μs | 35.2877μs | 28.3384 KOps/s | 28.9935 KOps/s | |
| test_reshape_td | 69.7510μs | 41.9600μs | 23.8322 KOps/s | 23.6955 KOps/s | |
| test_view_pytree | 0.4254ms | 34.9150μs | 28.6410 KOps/s | 28.9056 KOps/s | |
| test_view_td | 81.4410μs | 47.6700μs | 20.9776 KOps/s | 21.3273 KOps/s | |
| test_unbind_pytree | 0.4226ms | 33.3162μs | 30.0154 KOps/s | 29.5940 KOps/s | |
| test_unbind_td | 0.5281ms | 42.0003μs | 23.8093 KOps/s | 23.5115 KOps/s | |
| test_split_pytree | 0.4154ms | 46.3190μs | 21.5894 KOps/s | 22.3112 KOps/s | |
| test_split_td | 96.7141ms | 68.4361μs | 14.6122 KOps/s | 18.2464 KOps/s | |
| test_add_pytree | 92.4020μs | 53.0035μs | 18.8667 KOps/s | 18.1045 KOps/s | |
| test_add_td | 0.4709ms | 86.0537μs | 11.6207 KOps/s | 11.0315 KOps/s | |
| test_compile_add_one_nested[tensordict-compile] | 0.4039ms | 0.2057ms | 4.8609 KOps/s | 4.7091 KOps/s | |
| test_compile_add_one_nested[tensordict-eager] | 0.2255ms | 0.1505ms | 6.6427 KOps/s | 6.6105 KOps/s | |
| test_compile_add_one_nested[pytree-compile] | 0.2029ms | 0.1414ms | 7.0717 KOps/s | 7.0729 KOps/s | |
| test_compile_add_one_nested[pytree-eager] | 0.2379ms | 0.1776ms | 5.6309 KOps/s | 5.5503 KOps/s | |
| test_compile_copy_nested[tensordict-compile] | 57.3210μs | 21.4823μs | 46.5500 KOps/s | 47.0058 KOps/s | |
| test_compile_copy_nested[tensordict-eager] | 80.6010μs | 43.6085μs | 22.9313 KOps/s | 22.8701 KOps/s | |
| test_compile_copy_nested[pytree-compile] | 0.2058ms | 63.5024μs | 15.7474 KOps/s | 15.6644 KOps/s | |
| test_compile_copy_nested[pytree-eager] | 98.7120μs | 49.1645μs | 20.3399 KOps/s | 20.5237 KOps/s | |
| test_compile_add_one_flat[tensordict-compile] | 0.3993ms | 0.3088ms | 3.2382 KOps/s | 3.2327 KOps/s | |
| test_compile_add_one_flat[tensordict-eager] | 0.2695ms | 0.2082ms | 4.8027 KOps/s | 4.7757 KOps/s | |
| test_compile_add_one_flat[tensorclass-compile] | 0.1654ms | 0.1246ms | 8.0266 KOps/s | 7.6707 KOps/s | |
| test_compile_add_one_flat[tensorclass-eager] | 0.1099ms | 59.7617μs | 16.7331 KOps/s | 16.1501 KOps/s | |
| test_compile_add_one_flat[pytree-compile] | 0.4120ms | 0.3069ms | 3.2585 KOps/s | 3.2076 KOps/s | |
| test_compile_add_one_flat[pytree-eager] | 0.6422ms | 0.5982ms | 1.6716 KOps/s | 1.6327 KOps/s | |
| test_compile_add_self_flat[tensordict-eager] | 0.3228ms | 0.2478ms | 4.0355 KOps/s | 4.0178 KOps/s | |
| test_compile_add_self_flat[tensordict-compile] | 0.3732ms | 0.3094ms | 3.2317 KOps/s | 3.2326 KOps/s | |
| test_compile_add_self_flat[tensorclass-eager] | 0.1279ms | 69.6745μs | 14.3525 KOps/s | 14.2795 KOps/s | |
| test_compile_add_self_flat[tensorclass-compile] | 0.1802ms | 0.1251ms | 7.9936 KOps/s | 7.7907 KOps/s | |
| test_compile_add_self_flat[pytree-eager] | 0.6173ms | 0.5167ms | 1.9353 KOps/s | 1.8992 KOps/s | |
| test_compile_add_self_flat[pytree-compile] | 0.3728ms | 0.3071ms | 3.2560 KOps/s | 3.2115 KOps/s | |
| test_compile_copy_flat[tensordict-compile] | 75.6610μs | 17.6717μs | 56.5875 KOps/s | 50.9541 KOps/s | |
| test_compile_copy_flat[tensordict-eager] | 65.5700μs | 27.1981μs | 36.7672 KOps/s | 36.6189 KOps/s | |
| test_compile_copy_flat[pytree-compile] | 98.6310μs | 68.2908μs | 14.6433 KOps/s | 14.6635 KOps/s | |
| test_compile_copy_flat[pytree-eager] | 84.6010μs | 51.7496μs | 19.3238 KOps/s | 19.4489 KOps/s | |
| test_compile_assign_and_add[tensordict-compile] | 2.3322ms | 0.8269ms | 1.2094 KOps/s | 1.1420 KOps/s | |
| test_compile_assign_and_add[tensordict-eager] | 3.3595ms | 3.1376ms | 318.7135 Ops/s | 317.2517 Ops/s | |
| test_compile_assign_and_add[pytree-compile] | 2.3156ms | 0.8071ms | 1.2389 KOps/s | 1.1363 KOps/s | |
| test_compile_assign_and_add[pytree-eager] | 3.3668ms | 3.1833ms | 314.1382 Ops/s | 308.4233 Ops/s | |
| test_compile_indexing[tensor-tensordict-compile] | 0.1627ms | 0.1124ms | 8.8947 KOps/s | 9.4667 KOps/s | |
| test_compile_indexing[tensor-tensordict-eager] | 0.2040ms | 66.0920μs | 15.1304 KOps/s | 16.6217 KOps/s | |
| test_compile_indexing[tensor-tensorclass-compile] | 0.1535ms | 0.1005ms | 9.9527 KOps/s | 9.9968 KOps/s | |
| test_compile_indexing[tensor-tensorclass-eager] | 0.1212ms | 44.7222μs | 22.3602 KOps/s | 23.6388 KOps/s | |
| test_compile_indexing[tensor-pytree-compile] | 0.1465ms | 0.1072ms | 9.3321 KOps/s | 9.4625 KOps/s | |
| test_compile_indexing[tensor-pytree-eager] | 93.1620μs | 45.3779μs | 22.0372 KOps/s | 22.1496 KOps/s | |
| test_compile_indexing[slice-tensordict-compile] | 0.2134ms | 0.1374ms | 7.2798 KOps/s | 7.4917 KOps/s | |
| test_compile_indexing[slice-tensordict-eager] | 0.2048ms | 25.8398μs | 38.7000 KOps/s | 40.2366 KOps/s | |
| test_compile_indexing[slice-tensorclass-compile] | 0.1936ms | 0.1355ms | 7.3815 KOps/s | 7.8344 KOps/s | |
| test_compile_indexing[slice-tensorclass-eager] | 65.0010μs | 21.9263μs | 45.6073 KOps/s | 49.5386 KOps/s | |
| test_compile_indexing[slice-pytree-compile] | 0.2656ms | 0.1307ms | 7.6498 KOps/s | 7.8626 KOps/s | |
| test_compile_indexing[slice-pytree-eager] | 61.6500μs | 20.7714μs | 48.1431 KOps/s | 49.3627 KOps/s | |
| test_compile_indexing[int-tensordict-compile] | 0.1968ms | 0.1357ms | 7.3690 KOps/s | 7.4501 KOps/s | |
| test_compile_indexing[int-tensordict-eager] | 0.4767ms | 25.2125μs | 39.6628 KOps/s | 40.0491 KOps/s | |
| test_compile_indexing[int-tensorclass-compile] | 0.2586ms | 0.1360ms | 7.3519 KOps/s | 7.8455 KOps/s | |
| test_compile_indexing[int-tensorclass-eager] | 0.2149ms | 27.5826μs | 36.2547 KOps/s | 50.1291 KOps/s | |
| test_compile_indexing[int-pytree-compile] | 0.1839ms | 0.1368ms | 7.3087 KOps/s | 7.7988 KOps/s | |
| test_compile_indexing[int-pytree-eager] | 55.0310μs | 22.0881μs | 45.2733 KOps/s | 50.7824 KOps/s | |
| test_mod_add[eager] | 67.2910μs | 33.0604μs | 30.2477 KOps/s | 32.3842 KOps/s | |
| test_mod_add[compile] | 0.3469ms | 69.3055μs | 14.4289 KOps/s | 14.6400 KOps/s | |
| test_mod_add[compile-overhead] | 0.2637ms | 0.1347ms | 7.4265 KOps/s | 6.8346 KOps/s | |
| test_mod_wrap[eager] | 0.3451ms | 0.2356ms | 4.2448 KOps/s | 4.1297 KOps/s | |
| test_mod_wrap[compile] | 0.3576ms | 0.2866ms | 3.4886 KOps/s | 3.4189 KOps/s | |
| test_mod_wrap[compile-overhead] | 7.7496ms | 4.0984ms | 244.0001 Ops/s | 246.0708 Ops/s | |
| test_mod_wrap_and_backward[eager] | 1.4370ms | 1.3399ms | 746.3274 Ops/s | 700.4672 Ops/s | |
| test_mod_wrap_and_backward[compile] | 1.5196ms | 1.2918ms | 774.1379 Ops/s | 708.7194 Ops/s | |
| test_mod_wrap_and_backward[compile-overhead] | 1.3396ms | 0.8930ms | 1.1198 KOps/s | 998.3503 Ops/s | |
| test_seq_add[eager] | 0.2208ms | 98.0737μs | 10.1964 KOps/s | 10.2953 KOps/s | |
| test_seq_add[compile] | 0.1812ms | 81.4467μs | 12.2780 KOps/s | 12.5827 KOps/s | |
| test_seq_add[compile-overhead] | 0.1708ms | 0.1145ms | 8.7346 KOps/s | 8.8979 KOps/s | |
| test_seq_wrap[eager] | 0.5127ms | 0.3908ms | 2.5592 KOps/s | 2.5929 KOps/s | |
| test_seq_wrap[compile] | 0.7132ms | 0.3049ms | 3.2797 KOps/s | 3.2296 KOps/s | |
| test_seq_wrap[compile-overhead] | 0.2642ms | 0.2157ms | 4.6364 KOps/s | 4.6080 KOps/s | |
| test_func_call_runtime[False-eager] | 1.1382ms | 0.7265ms | 1.3764 KOps/s | 1.3753 KOps/s | |
| test_func_call_runtime[False-compile] | 1.1739ms | 0.7582ms | 1.3190 KOps/s | 1.3025 KOps/s | |
| test_func_call_runtime[False-compile-overhead] | 0.7536ms | 0.3552ms | 2.8153 KOps/s | 2.8393 KOps/s | |
| test_func_call_runtime[True-eager] | 1.3272ms | 0.8939ms | 1.1187 KOps/s | 1.1260 KOps/s | |
| test_func_call_runtime[True-compile] | 1.1878ms | 0.7839ms | 1.2757 KOps/s | 1.2685 KOps/s | |
| test_func_call_runtime[True-compile-overhead] | 0.4801ms | 0.3760ms | 2.6597 KOps/s | 2.6663 KOps/s | |
| test_func_call_cm_runtime[False-eager] | 0.8648ms | 0.7202ms | 1.3885 KOps/s | 1.3694 KOps/s | |
| test_func_call_cm_runtime[False-compile] | 0.9046ms | 0.7613ms | 1.3135 KOps/s | 1.2941 KOps/s | |
| test_func_call_cm_runtime[False-compile-overhead] | 0.4481ms | 0.3541ms | 2.8243 KOps/s | 2.8142 KOps/s | |
| test_func_call_cm_runtime[True-eager] | 1.1103ms | 0.9794ms | 1.0211 KOps/s | 1.0053 KOps/s | |
| test_func_call_cm_runtime[True-compile] | 1.2193ms | 0.8125ms | 1.2307 KOps/s | 1.2202 KOps/s | |
| test_func_call_cm_runtime[True-compile-overhead] | 0.5348ms | 0.3978ms | 2.5141 KOps/s | 2.4910 KOps/s | |
| test_vmap_func_call_cm_runtime[eager] | 2.5006ms | 2.0566ms | 486.2302 Ops/s | 484.5449 Ops/s | |
| test_vmap_func_call_cm_runtime[compile] | 0.9187ms | 0.8241ms | 1.2135 KOps/s | 1.1996 KOps/s | |
| test_vmap_func_call_cm_runtime[compile-overhead] | 0.5394ms | 0.4023ms | 2.4857 KOps/s | 2.4795 KOps/s | |
| test_distributed | 3.2547ms | 0.1730ms | 5.7802 KOps/s | 8.9335 KOps/s | |
| test_tdmodule | 24.5300μs | 14.2490μs | 70.1805 KOps/s | 66.9259 KOps/s | |
| test_tdmodule_dispatch | 49.3610μs | 27.8747μs | 35.8749 KOps/s | 35.6520 KOps/s | |
| test_tdseq | 43.8900μs | 15.2888μs | 65.4076 KOps/s | 65.0597 KOps/s | |
| test_tdseq_dispatch | 53.0710μs | 30.4529μs | 32.8376 KOps/s | 32.8196 KOps/s | |
| test_instantiation_functorch | 1.9859ms | 1.8259ms | 547.6637 Ops/s | 543.9815 Ops/s | |
| test_instantiation_td | 1.8240ms | 1.1813ms | 846.5333 Ops/s | 843.9704 Ops/s | |
| test_exec_functorch | 0.2716ms | 0.2131ms | 4.6929 KOps/s | 4.8045 KOps/s | |
| test_exec_functional_call | 0.2947ms | 0.2191ms | 4.5639 KOps/s | 4.8396 KOps/s | |
| test_exec_td | 0.2745ms | 0.2282ms | 4.3820 KOps/s | 4.4992 KOps/s | |
| test_exec_td_decorator | 0.7420ms | 0.2690ms | 3.7171 KOps/s | 3.7538 KOps/s | |
| test_vmap_mlp_speed[True-True] | 0.7893ms | 0.6726ms | 1.4867 KOps/s | 1.4454 KOps/s | |
| test_vmap_mlp_speed[True-False] | 0.8065ms | 0.6702ms | 1.4920 KOps/s | 1.4535 KOps/s | |
| test_vmap_mlp_speed[False-True] | 0.6852ms | 0.5638ms | 1.7736 KOps/s | 1.6896 KOps/s | |
| test_vmap_mlp_speed[False-False] | 0.6811ms | 0.5622ms | 1.7788 KOps/s | 1.6746 KOps/s | |
| test_vmap_mlp_speed_decorator[True-True] | 1.1304ms | 0.6855ms | 1.4588 KOps/s | 1.4663 KOps/s | |
| test_vmap_mlp_speed_decorator[True-False] | 0.8242ms | 0.6941ms | 1.4407 KOps/s | 1.4759 KOps/s | |
| test_vmap_mlp_speed_decorator[False-True] | 0.7708ms | 0.6081ms | 1.6444 KOps/s | 1.6413 KOps/s | |
| test_vmap_mlp_speed_decorator[False-False] | 0.7134ms | 0.6089ms | 1.6423 KOps/s | 1.6762 KOps/s | |
| test_vmap_transformer_speed[True-True] | 8.3545ms | 8.2208ms | 121.6427 Ops/s | 120.6177 Ops/s | |
| test_vmap_transformer_speed[True-False] | 8.5564ms | 8.2000ms | 121.9518 Ops/s | 120.4110 Ops/s | |
| test_vmap_transformer_speed[False-True] | 8.1616ms | 8.0031ms | 124.9516 Ops/s | 123.7449 Ops/s | |
| test_vmap_transformer_speed[False-False] | 8.1863ms | 8.0438ms | 124.3200 Ops/s | 123.4753 Ops/s | |
| test_vmap_transformer_speed_decorator[True-True] | 19.4271ms | 19.2599ms | 51.9213 Ops/s | 51.7660 Ops/s | |
| test_vmap_transformer_speed_decorator[True-False] | 20.2922ms | 19.2877ms | 51.8465 Ops/s | 51.7339 Ops/s | |
| test_vmap_transformer_speed_decorator[False-True] | 20.0554ms | 19.7287ms | 50.6874 Ops/s | 52.1247 Ops/s | |
| test_vmap_transformer_speed_decorator[False-False] | 20.1971ms | 19.4122ms | 51.5141 Ops/s | 52.1079 Ops/s | |
| test_to_module_speed[True] | 2.0420ms | 0.9491ms | 1.0536 KOps/s | 1.0726 KOps/s | |
| test_to_module_speed[False] | 1.1293ms | 0.9207ms | 1.0861 KOps/s | 1.0841 KOps/s | |
| test_tc_init | 61.9210μs | 33.3983μs | 29.9417 KOps/s | 31.5204 KOps/s | |
| test_tc_init_nested | 0.1102ms | 66.9339μs | 14.9401 KOps/s | 15.1638 KOps/s | |
| test_tc_first_layer_tensor | 14.6246μs | 0.6621μs | 1.5104 MOps/s | 1.5164 MOps/s | |
| test_tc_first_layer_nontensor | 25.9300μs | 2.2092μs | 452.6463 KOps/s | 454.0020 KOps/s | |
| test_tc_second_layer_tensor | 10.2275μs | 1.3684μs | 730.7956 KOps/s | 742.5225 KOps/s | |
| test_tc_second_layer_nontensor | 0.1051ms | 2.8973μs | 345.1443 KOps/s | 349.0280 KOps/s | |
| test_unbind | 0.1944s | 12.2077ms | 81.9153 Ops/s | 92.7359 Ops/s | |
| test_full_like | 0.6736ms | 0.5761ms | 1.7357 KOps/s | 1.7400 KOps/s | |
| test_zeros_like | 0.2610ms | 0.1979ms | 5.0532 KOps/s | 5.0519 KOps/s | |
| test_ones_like | 0.2403ms | 0.1980ms | 5.0517 KOps/s | 5.0559 KOps/s | |
| test_clone | 0.4445ms | 0.4146ms | 2.4121 KOps/s | 2.4117 KOps/s | |
| test_squeeze | 33.3410μs | 9.9399μs | 100.6051 KOps/s | 98.7886 KOps/s | |
| test_unsqueeze | 0.2212ms | 76.3118μs | 13.1041 KOps/s | 13.4892 KOps/s | |
| test_split | 0.4340ms | 0.1582ms | 6.3223 KOps/s | 6.3652 KOps/s | |
| test_permute | 0.2225ms | 0.1767ms | 5.6588 KOps/s | 5.6490 KOps/s | |
| test_stack | 1.2550ms | 0.8700ms | 1.1495 KOps/s | 1.1713 KOps/s | |
| test_cat | 1.2521ms | 1.2320ms | 811.6905 Ops/s | 811.9882 Ops/s | |
Labels: bug, CLA Signed
Stack from ghstack (oldest at bottom):