
@vmoens (Collaborator) commented Jul 15, 2024

Description

Describe your changes in detail.

Motivation and Context

Why is this change required? What problem does it solve?
If it fixes an open issue, please link to the issue here.
You can use the syntax close #15213 if this solves the issue #15213

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)
  • Example (update in the folder of examples)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.

@facebook-github-bot added the CLA Signed label Jul 15, 2024
github-actions bot commented Jul 15, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 133. Improved: 19. Worsened: 9.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 37.0690μs 17.3290μs 57.7069 KOps/s 56.8121 KOps/s $\color{#35bf28}+1.57\%$
test_plain_set_stack_nested 34.5550μs 17.6516μs 56.6521 KOps/s 55.8475 KOps/s $\color{#35bf28}+1.44\%$
test_plain_set_nested_inplace 76.0220μs 19.1295μs 52.2753 KOps/s 51.1032 KOps/s $\color{#35bf28}+2.29\%$
test_plain_set_stack_nested_inplace 64.0590μs 19.1134μs 52.3192 KOps/s 50.9084 KOps/s $\color{#35bf28}+2.77\%$
test_items 31.7200μs 2.6141μs 382.5422 KOps/s 315.6673 KOps/s $\textbf{\color{#35bf28}+21.19\%}$
test_items_nested 0.5159ms 0.3654ms 2.7369 KOps/s 2.7019 KOps/s $\color{#35bf28}+1.29\%$
test_items_nested_locked 0.9281ms 0.3678ms 2.7192 KOps/s 2.7399 KOps/s $\color{#d91a1a}-0.75\%$
test_items_nested_leaf 0.1581ms 88.0219μs 11.3608 KOps/s 11.6996 KOps/s $\color{#d91a1a}-2.90\%$
test_items_stack_nested 0.4556ms 0.3679ms 2.7184 KOps/s 2.7179 KOps/s $\color{#35bf28}+0.02\%$
test_items_stack_nested_leaf 0.1551ms 87.4050μs 11.4410 KOps/s 11.9028 KOps/s $\color{#d91a1a}-3.88\%$
test_items_stack_nested_locked 0.7448ms 0.3660ms 2.7324 KOps/s 2.6989 KOps/s $\color{#35bf28}+1.24\%$
test_keys 44.4830μs 3.8905μs 257.0389 KOps/s 259.5611 KOps/s $\color{#d91a1a}-0.97\%$
test_keys_nested 0.2456ms 0.1437ms 6.9578 KOps/s 6.9272 KOps/s $\color{#35bf28}+0.44\%$
test_keys_nested_locked 0.7143ms 0.1486ms 6.7303 KOps/s 6.6631 KOps/s $\color{#35bf28}+1.01\%$
test_keys_nested_leaf 0.2206ms 0.1221ms 8.1899 KOps/s 8.1363 KOps/s $\color{#35bf28}+0.66\%$
test_keys_stack_nested 0.2690ms 0.1417ms 7.0566 KOps/s 6.9935 KOps/s $\color{#35bf28}+0.90\%$
test_keys_stack_nested_leaf 0.2104ms 0.1220ms 8.1939 KOps/s 8.1170 KOps/s $\color{#35bf28}+0.95\%$
test_keys_stack_nested_locked 0.2523ms 0.1473ms 6.7869 KOps/s 6.7118 KOps/s $\color{#35bf28}+1.12\%$
test_values 6.0392μs 1.1446μs 873.6376 KOps/s 862.3896 KOps/s $\color{#35bf28}+1.30\%$
test_values_nested 93.4750μs 48.5162μs 20.6117 KOps/s 20.0532 KOps/s $\color{#35bf28}+2.78\%$
test_values_nested_locked 0.2126ms 49.4295μs 20.2308 KOps/s 20.2430 KOps/s $\color{#d91a1a}-0.06\%$
test_values_nested_leaf 88.9860μs 43.6787μs 22.8944 KOps/s 22.3397 KOps/s $\color{#35bf28}+2.48\%$
test_values_stack_nested 0.1136ms 50.1616μs 19.9356 KOps/s 19.6490 KOps/s $\color{#35bf28}+1.46\%$
test_values_stack_nested_leaf 94.0160μs 44.0131μs 22.7205 KOps/s 22.9235 KOps/s $\color{#d91a1a}-0.89\%$
test_values_stack_nested_locked 91.1310μs 49.8000μs 20.0803 KOps/s 19.7143 KOps/s $\color{#35bf28}+1.86\%$
test_membership 20.7090μs 0.9040μs 1.1062 MOps/s 1.3595 MOps/s $\textbf{\color{#d91a1a}-18.63\%}$
test_membership_nested 95.8690μs 2.6740μs 373.9663 KOps/s 363.8171 KOps/s $\color{#35bf28}+2.79\%$
test_membership_nested_leaf 0.1075ms 2.7201μs 367.6388 KOps/s 367.2169 KOps/s $\color{#35bf28}+0.11\%$
test_membership_stacked_nested 17.8330μs 2.6760μs 373.6963 KOps/s 371.1984 KOps/s $\color{#35bf28}+0.67\%$
test_membership_stacked_nested_leaf 27.4810μs 2.6756μs 373.7453 KOps/s 373.3663 KOps/s $\color{#35bf28}+0.10\%$
test_membership_nested_last 28.8740μs 3.9466μs 253.3812 KOps/s 250.8092 KOps/s $\color{#35bf28}+1.03\%$
test_membership_nested_leaf_last 48.8310μs 3.9912μs 250.5539 KOps/s 249.7861 KOps/s $\color{#35bf28}+0.31\%$
test_membership_stacked_nested_last 45.5150μs 7.4855μs 133.5913 KOps/s 78.0405 KOps/s $\textbf{\color{#35bf28}+71.18\%}$
test_membership_stacked_nested_leaf_last 51.1250μs 7.4638μs 133.9798 KOps/s 77.8508 KOps/s $\textbf{\color{#35bf28}+72.10\%}$
test_nested_getleaf 90.1780μs 10.9785μs 91.0869 KOps/s 93.5408 KOps/s $\color{#d91a1a}-2.62\%$
test_nested_get 92.5030μs 10.3791μs 96.3473 KOps/s 98.0410 KOps/s $\color{#d91a1a}-1.73\%$
test_stacked_getleaf 57.3270μs 10.7885μs 92.6912 KOps/s 94.1451 KOps/s $\color{#d91a1a}-1.54\%$
test_stacked_get 0.1165ms 10.2268μs 97.7818 KOps/s 99.1930 KOps/s $\color{#d91a1a}-1.42\%$
test_nested_getitemleaf 45.9760μs 11.2394μs 88.9728 KOps/s 88.8566 KOps/s $\color{#35bf28}+0.13\%$
test_nested_getitem 58.4310μs 10.3497μs 96.6210 KOps/s 96.7850 KOps/s $\color{#d91a1a}-0.17\%$
test_stacked_getitemleaf 36.6290μs 11.1799μs 89.4463 KOps/s 89.6279 KOps/s $\color{#d91a1a}-0.20\%$
test_stacked_getitem 0.1130ms 10.3801μs 96.3380 KOps/s 96.8040 KOps/s $\color{#d91a1a}-0.48\%$
test_lock_nested 7.7477ms 0.4589ms 2.1790 KOps/s 2.3014 KOps/s $\textbf{\color{#d91a1a}-5.32\%}$
test_lock_stack_nested 0.6402ms 0.4253ms 2.3511 KOps/s 2.5029 KOps/s $\textbf{\color{#d91a1a}-6.07\%}$
test_unlock_nested 0.9485ms 0.3753ms 2.6645 KOps/s 2.3764 KOps/s $\textbf{\color{#35bf28}+12.13\%}$
test_unlock_stack_nested 0.5840ms 0.3415ms 2.9280 KOps/s 3.2011 KOps/s $\textbf{\color{#d91a1a}-8.53\%}$
test_flatten_speed 0.5113ms 0.1046ms 9.5633 KOps/s 9.5300 KOps/s $\color{#35bf28}+0.35\%$
test_unflatten_speed 0.7922ms 0.4360ms 2.2936 KOps/s 2.2859 KOps/s $\color{#35bf28}+0.34\%$
test_common_ops 3.9043ms 0.7677ms 1.3026 KOps/s 1.2495 KOps/s $\color{#35bf28}+4.25\%$
test_creation 17.0720μs 2.2874μs 437.1686 KOps/s 419.6640 KOps/s $\color{#35bf28}+4.17\%$
test_creation_empty 56.6560μs 10.8741μs 91.9618 KOps/s 81.4158 KOps/s $\textbf{\color{#35bf28}+12.95\%}$
test_creation_nested_1 37.5900μs 13.6705μs 73.1503 KOps/s 66.2487 KOps/s $\textbf{\color{#35bf28}+10.42\%}$
test_creation_nested_2 56.4450μs 17.4735μs 57.2294 KOps/s 51.8890 KOps/s $\textbf{\color{#35bf28}+10.29\%}$
test_clone 54.0610μs 12.9801μs 77.0412 KOps/s 75.4281 KOps/s $\color{#35bf28}+2.14\%$
test_getitem[int] 1.4342ms 11.6170μs 86.0810 KOps/s 87.0297 KOps/s $\color{#d91a1a}-1.09\%$
test_getitem[slice_int] 61.7860μs 23.0358μs 43.4106 KOps/s 41.9444 KOps/s $\color{#35bf28}+3.50\%$
test_getitem[range] 0.1707ms 44.6882μs 22.3772 KOps/s 21.8887 KOps/s $\color{#35bf28}+2.23\%$
test_getitem[tuple] 56.7660μs 19.3660μs 51.6368 KOps/s 50.6477 KOps/s $\color{#35bf28}+1.95\%$
test_getitem[list] 0.1385ms 39.8028μs 25.1239 KOps/s 25.0657 KOps/s $\color{#35bf28}+0.23\%$
test_setitem_dim[int] 67.0660μs 31.9243μs 31.3241 KOps/s 29.1948 KOps/s $\textbf{\color{#35bf28}+7.29\%}$
test_setitem_dim[slice_int] 0.1234ms 59.3989μs 16.8353 KOps/s 15.8791 KOps/s $\textbf{\color{#35bf28}+6.02\%}$
test_setitem_dim[range] 0.2539ms 80.5777μs 12.4104 KOps/s 12.1582 KOps/s $\color{#35bf28}+2.07\%$
test_setitem_dim[tuple] 0.1258ms 49.4420μs 20.2257 KOps/s 19.6839 KOps/s $\color{#35bf28}+2.75\%$
test_setitem 83.0760μs 19.8281μs 50.4335 KOps/s 47.9498 KOps/s $\textbf{\color{#35bf28}+5.18\%}$
test_set 62.2060μs 19.2778μs 51.8730 KOps/s 49.7980 KOps/s $\color{#35bf28}+4.17\%$
test_set_shared 2.1341ms 0.1690ms 5.9174 KOps/s 5.8637 KOps/s $\color{#35bf28}+0.92\%$
test_update 0.1254ms 21.6728μs 46.1407 KOps/s 42.2694 KOps/s $\textbf{\color{#35bf28}+9.16\%}$
test_update_nested 0.1137ms 31.4665μs 31.7798 KOps/s 30.5107 KOps/s $\color{#35bf28}+4.16\%$
test_update__nested 89.9680μs 25.1956μs 39.6894 KOps/s 38.9448 KOps/s $\color{#35bf28}+1.91\%$
test_set_nested 74.9610μs 20.9875μs 47.6475 KOps/s 45.3430 KOps/s $\textbf{\color{#35bf28}+5.08\%}$
test_set_nested_new 0.1017ms 25.4677μs 39.2655 KOps/s 37.5898 KOps/s $\color{#35bf28}+4.46\%$
test_select 1.1055ms 41.4450μs 24.1283 KOps/s 23.6854 KOps/s $\color{#35bf28}+1.87\%$
test_select_nested 0.1437ms 60.4240μs 16.5497 KOps/s 16.6680 KOps/s $\color{#d91a1a}-0.71\%$
test_exclude_nested 0.1493ms 80.8066μs 12.3752 KOps/s 12.4339 KOps/s $\color{#d91a1a}-0.47\%$
test_empty[True] 0.4730ms 0.3394ms 2.9461 KOps/s 2.9286 KOps/s $\color{#35bf28}+0.60\%$
test_empty[False] 11.4365μs 1.2579μs 794.9763 KOps/s 790.9410 KOps/s $\color{#35bf28}+0.51\%$
test_unbind_speed 0.4407ms 0.2761ms 3.6221 KOps/s 3.8192 KOps/s $\textbf{\color{#d91a1a}-5.16\%}$
test_unbind_speed_stack0 0.5426ms 0.2706ms 3.6950 KOps/s 3.9977 KOps/s $\textbf{\color{#d91a1a}-7.57\%}$
test_unbind_speed_stack1 77.9960ms 0.7482ms 1.3365 KOps/s 1.5017 KOps/s $\textbf{\color{#d91a1a}-11.00\%}$
test_split 74.3686ms 1.6216ms 616.6583 Ops/s 620.3471 Ops/s $\color{#d91a1a}-0.59\%$
test_chunk 76.5228ms 1.6338ms 612.0874 Ops/s 615.2821 Ops/s $\color{#d91a1a}-0.52\%$
test_creation[device0] 0.2470ms 94.3248μs 10.6017 KOps/s 10.7157 KOps/s $\color{#d91a1a}-1.06\%$
test_creation_from_tensor 6.2757ms 96.9229μs 10.3175 KOps/s 10.4235 KOps/s $\color{#d91a1a}-1.02\%$
test_add_one[memmap_tensor0] 0.1289ms 5.5247μs 181.0064 KOps/s 173.4741 KOps/s $\color{#35bf28}+4.34\%$
test_contiguous[memmap_tensor0] 9.9990μs 0.6442μs 1.5522 MOps/s 1.5025 MOps/s $\color{#35bf28}+3.31\%$
test_stack[memmap_tensor0] 41.7280μs 3.5895μs 278.5931 KOps/s 244.5993 KOps/s $\textbf{\color{#35bf28}+13.90\%}$
test_memmaptd_index 0.9536ms 0.2565ms 3.8990 KOps/s 3.8980 KOps/s $\color{#35bf28}+0.03\%$
test_memmaptd_index_astensor 0.7612ms 0.3355ms 2.9804 KOps/s 3.0270 KOps/s $\color{#d91a1a}-1.54\%$
test_memmaptd_index_op 0.9234ms 0.6232ms 1.6046 KOps/s 1.5832 KOps/s $\color{#35bf28}+1.36\%$
test_serialize_model 0.1309s 0.1267s 7.8943 Ops/s 7.2718 Ops/s $\textbf{\color{#35bf28}+8.56\%}$
test_serialize_model_pickle 0.4318s 0.3915s 2.5544 Ops/s 2.4981 Ops/s $\color{#35bf28}+2.26\%$
test_serialize_weights 0.2030s 0.1329s 7.5264 Ops/s 8.1777 Ops/s $\textbf{\color{#d91a1a}-7.96\%}$
test_serialize_weights_returnearly 0.1714s 0.1634s 6.1210 Ops/s 5.6416 Ops/s $\textbf{\color{#35bf28}+8.50\%}$
test_serialize_weights_pickle 0.4466s 0.4054s 2.4667 Ops/s 2.5558 Ops/s $\color{#d91a1a}-3.49\%$
test_serialize_weights_filesystem 0.2183s 0.1510s 6.6203 Ops/s 7.1232 Ops/s $\textbf{\color{#d91a1a}-7.06\%}$
test_serialize_model_filesystem 0.1588s 0.1541s 6.4912 Ops/s 6.5804 Ops/s $\color{#d91a1a}-1.36\%$
test_reshape_pytree 60.0720μs 25.9863μs 38.4818 KOps/s 37.9728 KOps/s $\color{#35bf28}+1.34\%$
test_reshape_td 70.9130μs 33.6123μs 29.7510 KOps/s 29.2814 KOps/s $\color{#35bf28}+1.60\%$
test_view_pytree 77.6040μs 25.8110μs 38.7432 KOps/s 39.7079 KOps/s $\color{#d91a1a}-2.43\%$
test_view_td 85.8410μs 38.8507μs 25.7396 KOps/s 25.6394 KOps/s $\color{#35bf28}+0.39\%$
test_unbind_pytree 87.2230μs 29.4482μs 33.9579 KOps/s 34.0649 KOps/s $\color{#d91a1a}-0.31\%$
test_unbind_td 0.3676ms 40.1298μs 24.9191 KOps/s 26.2163 KOps/s $\color{#d91a1a}-4.95\%$
test_split_pytree 62.0160μs 29.3035μs 34.1257 KOps/s 33.4812 KOps/s $\color{#35bf28}+1.92\%$
test_split_td 0.4763ms 40.7686μs 24.5287 KOps/s 25.0091 KOps/s $\color{#d91a1a}-1.92\%$
test_add_pytree 0.1053ms 34.7074μs 28.8123 KOps/s 28.4227 KOps/s $\color{#35bf28}+1.37\%$
test_add_td 0.1673ms 54.3890μs 18.3861 KOps/s 17.1877 KOps/s $\textbf{\color{#35bf28}+6.97\%}$
test_distributed 0.2571ms 0.1280ms 7.8099 KOps/s 7.6415 KOps/s $\color{#35bf28}+2.20\%$
test_tdmodule 35.0650μs 16.5195μs 60.5344 KOps/s 57.4175 KOps/s $\textbf{\color{#35bf28}+5.43\%}$
test_tdmodule_dispatch 61.3950μs 34.4181μs 29.0544 KOps/s 27.3034 KOps/s $\textbf{\color{#35bf28}+6.41\%}$
test_tdseq 46.4470μs 18.5916μs 53.7877 KOps/s 52.5285 KOps/s $\color{#35bf28}+2.40\%$
test_tdseq_dispatch 62.5060μs 38.6941μs 25.8437 KOps/s 24.9152 KOps/s $\color{#35bf28}+3.73\%$
test_instantiation_functorch 1.5614ms 1.3143ms 760.8634 Ops/s 750.5978 Ops/s $\color{#35bf28}+1.37\%$
test_instantiation_td 1.9247ms 1.0179ms 982.3796 Ops/s 968.5522 Ops/s $\color{#35bf28}+1.43\%$
test_exec_functorch 0.2890ms 0.1727ms 5.7909 KOps/s 6.0288 KOps/s $\color{#d91a1a}-3.95\%$
test_exec_functional_call 0.2310ms 0.1499ms 6.6714 KOps/s 6.5148 KOps/s $\color{#35bf28}+2.40\%$
test_exec_td 0.2919ms 0.1467ms 6.8188 KOps/s 6.4777 KOps/s $\textbf{\color{#35bf28}+5.27\%}$
test_exec_td_decorator 0.8096ms 0.2338ms 4.2776 KOps/s 4.2180 KOps/s $\color{#35bf28}+1.41\%$
test_vmap_mlp_speed[True-True] 0.7757ms 0.4918ms 2.0334 KOps/s 2.0146 KOps/s $\color{#35bf28}+0.93\%$
test_vmap_mlp_speed[True-False] 0.6492ms 0.4855ms 2.0598 KOps/s 1.9938 KOps/s $\color{#35bf28}+3.31\%$
test_vmap_mlp_speed[False-True] 0.7801ms 0.4012ms 2.4925 KOps/s 2.4950 KOps/s $\color{#d91a1a}-0.10\%$
test_vmap_mlp_speed[False-False] 0.6826ms 0.3992ms 2.5053 KOps/s 2.4824 KOps/s $\color{#35bf28}+0.92\%$
test_vmap_mlp_speed_decorator[True-True] 1.1723ms 0.5816ms 1.7195 KOps/s 1.7167 KOps/s $\color{#35bf28}+0.16\%$
test_vmap_mlp_speed_decorator[True-False] 0.9049ms 0.5810ms 1.7211 KOps/s 1.7187 KOps/s $\color{#35bf28}+0.14\%$
test_vmap_mlp_speed_decorator[False-True] 0.7470ms 0.4762ms 2.0999 KOps/s 2.1077 KOps/s $\color{#d91a1a}-0.37\%$
test_vmap_mlp_speed_decorator[False-False] 0.8129ms 0.4764ms 2.0993 KOps/s 2.1096 KOps/s $\color{#d91a1a}-0.49\%$
test_to_module_speed[True] 80.6200ms 1.9675ms 508.2655 Ops/s 504.9593 Ops/s $\color{#35bf28}+0.65\%$
test_to_module_speed[False] 2.3666ms 1.7604ms 568.0534 Ops/s 556.0623 Ops/s $\color{#35bf28}+2.16\%$
test_tc_init 71.3540μs 37.9127μs 26.3764 KOps/s 26.4086 KOps/s $\color{#d91a1a}-0.12\%$
test_tc_init_nested 0.1468ms 75.9635μs 13.1642 KOps/s 12.7990 KOps/s $\color{#35bf28}+2.85\%$
test_tc_first_layer_tensor 32.2600μs 8.1342μs 122.9370 KOps/s 121.4800 KOps/s $\color{#35bf28}+1.20\%$
test_tc_first_layer_nontensor 54.0310μs 8.0771μs 123.8069 KOps/s 121.7196 KOps/s $\color{#35bf28}+1.71\%$
test_tc_second_layer_tensor 26.3990μs 2.5010μs 399.8368 KOps/s 399.4364 KOps/s $\color{#35bf28}+0.10\%$
test_tc_second_layer_nontensor 34.3150μs 9.1460μs 109.3378 KOps/s 108.0159 KOps/s $\color{#35bf28}+1.22\%$
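The table columns appear to be related by simple arithmetic: Ops looks like the reciprocal of Mean, and Change looks like the relative difference between Ops and Ops on Repo HEAD. The sketch below checks both against the `test_items`, `test_membership` and `test_membership_stacked_nested_last` rows. The formulas are inferred from the numbers in this table, not taken from the benchmark bot's source, and the function names are hypothetical.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean time per call (inferred: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of Ops on this branch versus Ops on repo HEAD (inferred formula)."""
    return (ops - ops_head) / ops_head * 100.0


# test_items (CPU): Mean = 2.6141 us, Ops = 382.5422 KOps/s, HEAD = 315.6673 KOps/s
print(f"{ops_per_second(2.6141e-6) / 1e3:.2f} KOps/s")   # 382.54 KOps/s
print(f"{relative_change(382.5422, 315.6673):+.2f}%")    # +21.19%

# test_membership (CPU): 1.1062 MOps/s vs 1.3595 MOps/s on HEAD
print(f"{relative_change(1.1062, 1.3595):+.2f}%")        # -18.63%

# test_membership_stacked_nested_last (CPU): 133.5913 vs 78.0405 KOps/s
print(f"{relative_change(133.5913, 78.0405):+.2f}%")     # +71.18%
```

Because Change is computed from throughput, a positive value means the branch runs more operations per second than HEAD, i.e. it is faster.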

github-actions bot commented

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 141. Improved: 21. Worsened: 5.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 61.3010μs 12.7271μs 78.5727 KOps/s 78.1830 KOps/s $\color{#35bf28}+0.50\%$
test_plain_set_stack_nested 30.3710μs 12.7426μs 78.4769 KOps/s 78.1624 KOps/s $\color{#35bf28}+0.40\%$
test_plain_set_nested_inplace 36.4400μs 13.8219μs 72.3489 KOps/s 72.6391 KOps/s $\color{#d91a1a}-0.40\%$
test_plain_set_stack_nested_inplace 49.3410μs 13.8116μs 72.4031 KOps/s 72.6925 KOps/s $\color{#d91a1a}-0.40\%$
test_items 15.5500μs 4.7439μs 210.7991 KOps/s 213.3090 KOps/s $\color{#d91a1a}-1.18\%$
test_items_nested 0.4199ms 0.3958ms 2.5263 KOps/s 2.5261 KOps/s $+0.01\%$
test_items_nested_locked 0.4184ms 0.3995ms 2.5029 KOps/s 2.5132 KOps/s $\color{#d91a1a}-0.41\%$
test_items_nested_leaf 0.1050ms 86.7907μs 11.5220 KOps/s 11.5487 KOps/s $\color{#d91a1a}-0.23\%$
test_items_stack_nested 0.4461ms 0.4003ms 2.4982 KOps/s 2.5186 KOps/s $\color{#d91a1a}-0.81\%$
test_items_stack_nested_leaf 0.1050ms 86.8122μs 11.5191 KOps/s 11.4498 KOps/s $\color{#35bf28}+0.61\%$
test_items_stack_nested_locked 0.4187ms 0.4000ms 2.4997 KOps/s 2.5356 KOps/s $\color{#d91a1a}-1.41\%$
test_keys 20.2800μs 4.3528μs 229.7371 KOps/s 228.0155 KOps/s $\color{#35bf28}+0.76\%$
test_keys_nested 0.1104ms 68.6897μs 14.5582 KOps/s 14.4448 KOps/s $\color{#35bf28}+0.79\%$
test_keys_nested_locked 2.5945ms 75.3306μs 13.2748 KOps/s 13.2610 KOps/s $\color{#35bf28}+0.10\%$
test_keys_nested_leaf 80.4220μs 59.3896μs 16.8380 KOps/s 17.1480 KOps/s $\color{#d91a1a}-1.81\%$
test_keys_stack_nested 87.5220μs 68.5327μs 14.5916 KOps/s 14.4974 KOps/s $\color{#35bf28}+0.65\%$
test_keys_stack_nested_leaf 0.1295ms 57.6814μs 17.3366 KOps/s 16.7417 KOps/s $\color{#35bf28}+3.55\%$
test_keys_stack_nested_locked 98.0520μs 74.2407μs 13.4697 KOps/s 13.3522 KOps/s $\color{#35bf28}+0.88\%$
test_values 8.2733μs 1.7784μs 562.3011 KOps/s 570.1860 KOps/s $\color{#d91a1a}-1.38\%$
test_values_nested 57.3410μs 34.6581μs 28.8533 KOps/s 28.6196 KOps/s $\color{#35bf28}+0.82\%$
test_values_nested_locked 58.0310μs 36.3510μs 27.5096 KOps/s 27.0421 KOps/s $\color{#35bf28}+1.73\%$
test_values_nested_leaf 50.6110μs 30.4774μs 32.8112 KOps/s 32.2663 KOps/s $\color{#35bf28}+1.69\%$
test_values_stack_nested 59.7110μs 35.6381μs 28.0598 KOps/s 27.7746 KOps/s $\color{#35bf28}+1.03\%$
test_values_stack_nested_leaf 69.9620μs 31.5410μs 31.7048 KOps/s 31.2768 KOps/s $\color{#35bf28}+1.37\%$
test_values_stack_nested_locked 63.6710μs 37.1850μs 26.8925 KOps/s 26.3561 KOps/s $\color{#35bf28}+2.04\%$
test_membership 3.5741μs 0.5406μs 1.8499 MOps/s 1.8608 MOps/s $\color{#d91a1a}-0.59\%$
test_membership_nested 29.1910μs 2.0935μs 477.6685 KOps/s 477.3930 KOps/s $\color{#35bf28}+0.06\%$
test_membership_nested_leaf 13.3850μs 2.0642μs 484.4403 KOps/s 492.2864 KOps/s $\color{#d91a1a}-1.59\%$
test_membership_stacked_nested 33.6200μs 2.0680μs 483.5498 KOps/s 469.5312 KOps/s $\color{#35bf28}+2.99\%$
test_membership_stacked_nested_leaf 21.2100μs 2.1020μs 475.7464 KOps/s 480.6952 KOps/s $\color{#d91a1a}-1.03\%$
test_membership_nested_last 22.9500μs 3.0141μs 331.7699 KOps/s 330.5309 KOps/s $\color{#35bf28}+0.37\%$
test_membership_nested_leaf_last 32.9810μs 2.9697μs 336.7311 KOps/s 330.3411 KOps/s $\color{#35bf28}+1.93\%$
test_membership_stacked_nested_last 20.0100μs 3.4084μs 293.3931 KOps/s 289.6682 KOps/s $\color{#35bf28}+1.29\%$
test_membership_stacked_nested_leaf_last 35.0600μs 3.4198μs 292.4158 KOps/s 285.4982 KOps/s $\color{#35bf28}+2.42\%$
test_nested_getleaf 24.0100μs 7.9655μs 125.5414 KOps/s 123.7768 KOps/s $\color{#35bf28}+1.43\%$
test_nested_get 31.6010μs 7.5209μs 132.9630 KOps/s 131.6561 KOps/s $\color{#35bf28}+0.99\%$
test_stacked_getleaf 35.0200μs 8.0052μs 124.9183 KOps/s 123.7503 KOps/s $\color{#35bf28}+0.94\%$
test_stacked_get 19.6510μs 7.5090μs 133.1734 KOps/s 131.6385 KOps/s $\color{#35bf28}+1.17\%$
test_nested_getitemleaf 19.5710μs 8.1808μs 122.2371 KOps/s 121.3716 KOps/s $\color{#35bf28}+0.71\%$
test_nested_getitem 71.2510μs 7.6814μs 130.1850 KOps/s 129.5107 KOps/s $\color{#35bf28}+0.52\%$
test_stacked_getitemleaf 28.2700μs 8.1738μs 122.3421 KOps/s 121.8509 KOps/s $\color{#35bf28}+0.40\%$
test_stacked_getitem 35.4210μs 7.7067μs 129.7579 KOps/s 129.6215 KOps/s $\color{#35bf28}+0.11\%$
test_lock_nested 7.1708ms 0.4300ms 2.3255 KOps/s 2.3382 KOps/s $\color{#d91a1a}-0.54\%$
test_lock_stack_nested 0.4511ms 0.3882ms 2.5759 KOps/s 2.5563 KOps/s $\color{#35bf28}+0.76\%$
test_unlock_nested 89.0052ms 0.4305ms 2.3229 KOps/s 2.8917 KOps/s $\textbf{\color{#d91a1a}-19.67\%}$
test_unlock_stack_nested 0.3342ms 0.3083ms 3.2439 KOps/s 3.2365 KOps/s $\color{#35bf28}+0.23\%$
test_flatten_speed 0.4132ms 0.1072ms 9.3315 KOps/s 9.2724 KOps/s $\color{#35bf28}+0.64\%$
test_unflatten_speed 0.3483ms 0.2980ms 3.3556 KOps/s 3.3946 KOps/s $\color{#d91a1a}-1.15\%$
test_common_ops 1.0069ms 0.5944ms 1.6824 KOps/s 1.4615 KOps/s $\textbf{\color{#35bf28}+15.11\%}$
test_creation 33.9010μs 1.8636μs 536.6002 KOps/s 538.6355 KOps/s $\color{#d91a1a}-0.38\%$
test_creation_empty 24.1200μs 9.2822μs 107.7334 KOps/s 114.0508 KOps/s $\textbf{\color{#d91a1a}-5.54\%}$
test_creation_nested_1 28.3600μs 11.3933μs 87.7707 KOps/s 93.8001 KOps/s $\textbf{\color{#d91a1a}-6.43\%}$
test_creation_nested_2 39.5110μs 13.7165μs 72.9051 KOps/s 75.9646 KOps/s $\color{#d91a1a}-4.03\%$
test_clone 81.2210μs 10.9800μs 91.0744 KOps/s 84.3014 KOps/s $\textbf{\color{#35bf28}+8.03\%}$
test_getitem[int] 24.5000μs 10.0140μs 99.8603 KOps/s 93.4186 KOps/s $\textbf{\color{#35bf28}+6.90\%}$
test_getitem[slice_int] 37.4700μs 19.4824μs 51.3283 KOps/s 47.2723 KOps/s $\textbf{\color{#35bf28}+8.58\%}$
test_getitem[range] 0.1838ms 36.7314μs 27.2247 KOps/s 26.1211 KOps/s $\color{#35bf28}+4.22\%$
test_getitem[tuple] 39.1710μs 17.2961μs 57.8164 KOps/s 54.4397 KOps/s $\textbf{\color{#35bf28}+6.20\%}$
test_getitem[list] 0.1626ms 32.0948μs 31.1577 KOps/s 29.5281 KOps/s $\textbf{\color{#35bf28}+5.52\%}$
test_setitem_dim[int] 41.3310μs 25.6785μs 38.9430 KOps/s 38.3443 KOps/s $\color{#35bf28}+1.56\%$
test_setitem_dim[slice_int] 64.5010μs 46.6600μs 21.4317 KOps/s 21.3729 KOps/s $\color{#35bf28}+0.28\%$
test_setitem_dim[range] 85.1220μs 63.1304μs 15.8402 KOps/s 15.7883 KOps/s $\color{#35bf28}+0.33\%$
test_setitem_dim[tuple] 57.4710μs 40.3032μs 24.8120 KOps/s 24.8793 KOps/s $\color{#d91a1a}-0.27\%$
test_setitem 83.8420μs 16.0054μs 62.4789 KOps/s 58.3748 KOps/s $\textbf{\color{#35bf28}+7.03\%}$
test_set 62.6910μs 15.4501μs 64.7247 KOps/s 60.4534 KOps/s $\textbf{\color{#35bf28}+7.07\%}$
test_set_shared 2.7345ms 96.2525μs 10.3893 KOps/s 9.9768 KOps/s $\color{#35bf28}+4.13\%$
test_update 0.1004ms 18.7464μs 53.3436 KOps/s 52.1691 KOps/s $\color{#35bf28}+2.25\%$
test_update_nested 84.5920μs 24.2282μs 41.2743 KOps/s 40.0837 KOps/s $\color{#35bf28}+2.97\%$
test_update__nested 83.5710μs 21.7241μs 46.0318 KOps/s 42.8795 KOps/s $\textbf{\color{#35bf28}+7.35\%}$
test_set_nested 79.9520μs 16.4909μs 60.6395 KOps/s 55.4827 KOps/s $\textbf{\color{#35bf28}+9.29\%}$
test_set_nested_new 65.7210μs 19.4917μs 51.3038 KOps/s 48.3337 KOps/s $\textbf{\color{#35bf28}+6.15\%}$
test_select 78.7620μs 31.5020μs 31.7440 KOps/s 29.8713 KOps/s $\textbf{\color{#35bf28}+6.27\%}$
test_select_nested 1.0682ms 52.3718μs 19.0942 KOps/s 18.6780 KOps/s $\color{#35bf28}+2.23\%$
test_exclude_nested 0.1052ms 72.5911μs 13.7758 KOps/s 13.6159 KOps/s $\color{#35bf28}+1.17\%$
test_empty[True] 0.3375ms 0.3031ms 3.2989 KOps/s 3.3907 KOps/s $\color{#d91a1a}-2.71\%$
test_empty[False] 3.0761μs 0.9105μs 1.0983 MOps/s 1.0865 MOps/s $\color{#35bf28}+1.08\%$
test_to 87.3710μs 58.0870μs 17.2156 KOps/s 16.1679 KOps/s $\textbf{\color{#35bf28}+6.48\%}$
test_to_nonblocking 63.2910μs 34.6336μs 28.8737 KOps/s 27.6241 KOps/s $\color{#35bf28}+4.52\%$
test_unbind_speed 0.2997ms 0.2647ms 3.7779 KOps/s 3.7823 KOps/s $\color{#d91a1a}-0.12\%$
test_unbind_speed_stack0 0.2991ms 0.2645ms 3.7809 KOps/s 3.8372 KOps/s $\color{#d91a1a}-1.46\%$
test_unbind_speed_stack1 92.4756ms 0.7868ms 1.2709 KOps/s 1.3986 KOps/s $\textbf{\color{#d91a1a}-9.13\%}$
test_split 89.8904ms 1.5876ms 629.9008 Ops/s 609.7816 Ops/s $\color{#35bf28}+3.30\%$
test_chunk 1.4773ms 1.4386ms 695.1167 Ops/s 668.4066 Ops/s $\color{#35bf28}+4.00\%$
test_creation[device0] 0.1284ms 55.0862μs 18.1534 KOps/s 17.5925 KOps/s $\color{#35bf28}+3.19\%$
test_creation_from_tensor 0.1417ms 52.3676μs 19.0958 KOps/s 18.5618 KOps/s $\color{#35bf28}+2.88\%$
test_add_one[memmap_tensor0] 91.7710μs 6.9745μs 143.3793 KOps/s 132.2033 KOps/s $\textbf{\color{#35bf28}+8.45\%}$
test_contiguous[memmap_tensor0] 26.5300μs 0.5960μs 1.6780 MOps/s 1.7077 MOps/s $\color{#d91a1a}-1.74\%$
test_stack[memmap_tensor0] 36.6710μs 4.4206μs 226.2138 KOps/s 202.0893 KOps/s $\textbf{\color{#35bf28}+11.94\%}$
test_memmaptd_index 1.1161ms 0.2586ms 3.8676 KOps/s 3.6425 KOps/s $\textbf{\color{#35bf28}+6.18\%}$
test_memmaptd_index_astensor 0.5856ms 0.3211ms 3.1139 KOps/s 2.9399 KOps/s $\textbf{\color{#35bf28}+5.92\%}$
test_memmaptd_index_op 0.9139ms 0.6147ms 1.6268 KOps/s 1.5188 KOps/s $\textbf{\color{#35bf28}+7.11\%}$
test_serialize_model 95.3120ms 91.4266ms 10.9377 Ops/s 10.4348 Ops/s $\color{#35bf28}+4.82\%$
test_serialize_model_pickle 1.3623s 1.2387s 0.8073 Ops/s 0.8085 Ops/s $\color{#d91a1a}-0.16\%$
test_serialize_weights 92.5481ms 88.6782ms 11.2767 Ops/s 10.7017 Ops/s $\textbf{\color{#35bf28}+5.37\%}$
test_serialize_weights_returnearly 0.1918s 73.8125ms 13.5478 Ops/s 13.5108 Ops/s $\color{#35bf28}+0.27\%$
test_serialize_weights_pickle 1.3514s 1.2480s 0.8013 Ops/s 0.8009 Ops/s $\color{#35bf28}+0.05\%$
test_reshape_pytree 52.7710μs 25.3705μs 39.4159 KOps/s 38.4223 KOps/s $\color{#35bf28}+2.59\%$
test_reshape_td 87.6810μs 29.7972μs 33.5603 KOps/s 32.0842 KOps/s $\color{#35bf28}+4.60\%$
test_view_pytree 0.1243ms 24.9976μs 40.0039 KOps/s 38.9885 KOps/s $\color{#35bf28}+2.60\%$
test_view_td 0.1813ms 36.5021μs 27.3957 KOps/s 27.1818 KOps/s $\color{#35bf28}+0.79\%$
test_unbind_pytree 46.9910μs 30.4263μs 32.8663 KOps/s 31.9954 KOps/s $\color{#35bf28}+2.72\%$
test_unbind_td 0.5033ms 39.8765μs 25.0774 KOps/s 24.8415 KOps/s $\color{#35bf28}+0.95\%$
test_split_pytree 49.7410μs 33.0925μs 30.2183 KOps/s 28.9775 KOps/s $\color{#35bf28}+4.28\%$
test_split_td 0.1029ms 36.6722μs 27.2686 KOps/s 25.9127 KOps/s $\textbf{\color{#35bf28}+5.23\%}$
test_add_pytree 60.7210μs 36.7590μs 27.2042 KOps/s 25.1518 KOps/s $\textbf{\color{#35bf28}+8.16\%}$
test_add_td 78.1820μs 52.1021μs 19.1931 KOps/s 18.9588 KOps/s $\color{#35bf28}+1.24\%$
test_distributed 0.2236ms 69.2519μs 14.4400 KOps/s 14.1018 KOps/s $\color{#35bf28}+2.40\%$
test_tdmodule 39.2610μs 13.9599μs 71.6339 KOps/s 71.2649 KOps/s $\color{#35bf28}+0.52\%$
test_tdmodule_dispatch 45.6210μs 28.8091μs 34.7113 KOps/s 35.2358 KOps/s $\color{#d91a1a}-1.49\%$
test_tdseq 30.5500μs 15.2767μs 65.4593 KOps/s 66.6815 KOps/s $\color{#d91a1a}-1.83\%$
test_tdseq_dispatch 51.2210μs 31.4194μs 31.8275 KOps/s 32.4391 KOps/s $\color{#d91a1a}-1.89\%$
test_instantiation_functorch 1.4492ms 1.3675ms 731.2542 Ops/s 716.8012 Ops/s $\color{#35bf28}+2.02\%$
test_instantiation_td 92.4157ms 1.0822ms 924.0252 Ops/s 903.4073 Ops/s $\color{#35bf28}+2.28\%$
test_exec_functorch 0.3331ms 0.1484ms 6.7391 KOps/s 6.6578 KOps/s $\color{#35bf28}+1.22\%$
test_exec_functional_call 0.1736ms 0.1351ms 7.4041 KOps/s 7.2421 KOps/s $\color{#35bf28}+2.24\%$
test_exec_td 0.1706ms 0.1328ms 7.5288 KOps/s 7.3047 KOps/s $\color{#35bf28}+3.07\%$
test_exec_td_decorator 0.7928ms 0.2059ms 4.8573 KOps/s 4.8266 KOps/s $\color{#35bf28}+0.64\%$
test_vmap_mlp_speed[True-True] 0.7819ms 0.5758ms 1.7367 KOps/s 1.7273 KOps/s $\color{#35bf28}+0.55\%$
test_vmap_mlp_speed[True-False] 0.6284ms 0.5754ms 1.7379 KOps/s 1.7344 KOps/s $\color{#35bf28}+0.20\%$
test_vmap_mlp_speed[False-True] 0.5540ms 0.5074ms 1.9707 KOps/s 1.9519 KOps/s $\color{#35bf28}+0.96\%$
test_vmap_mlp_speed[False-False] 0.5701ms 0.5090ms 1.9645 KOps/s 1.9532 KOps/s $\color{#35bf28}+0.58\%$
test_vmap_mlp_speed_decorator[True-True] 1.1245ms 0.6513ms 1.5354 KOps/s 1.5377 KOps/s $\color{#d91a1a}-0.15\%$
test_vmap_mlp_speed_decorator[True-False] 0.7982ms 0.6492ms 1.5402 KOps/s 1.5363 KOps/s $\color{#35bf28}+0.26\%$
test_vmap_mlp_speed_decorator[False-True] 0.7302ms 0.5773ms 1.7321 KOps/s 1.7170 KOps/s $\color{#35bf28}+0.88\%$
test_vmap_mlp_speed_decorator[False-False] 0.7017ms 0.5671ms 1.7633 KOps/s 1.7363 KOps/s $\color{#35bf28}+1.55\%$
test_vmap_transformer_speed[True-True] 8.3295ms 7.9524ms 125.7480 Ops/s 128.4583 Ops/s $\color{#d91a1a}-2.11\%$
test_vmap_transformer_speed[True-False] 8.1217ms 7.8071ms 128.0886 Ops/s 128.4898 Ops/s $\color{#d91a1a}-0.31\%$
test_vmap_transformer_speed[False-True] 8.3509ms 7.7586ms 128.8897 Ops/s 129.8980 Ops/s $\color{#d91a1a}-0.78\%$
test_vmap_transformer_speed[False-False] 8.1513ms 7.7487ms 129.0541 Ops/s 130.0662 Ops/s $\color{#d91a1a}-0.78\%$
test_vmap_transformer_speed_decorator[True-True] 19.6389ms 19.2826ms 51.8602 Ops/s 52.2955 Ops/s $\color{#d91a1a}-0.83\%$
test_vmap_transformer_speed_decorator[True-False] 19.6471ms 19.3008ms 51.8114 Ops/s 52.0380 Ops/s $\color{#d91a1a}-0.44\%$
test_vmap_transformer_speed_decorator[False-True] 20.2323ms 19.1647ms 52.1792 Ops/s 52.3089 Ops/s $\color{#d91a1a}-0.25\%$
test_vmap_transformer_speed_decorator[False-False] 19.4441ms 19.1563ms 52.2021 Ops/s 52.4109 Ops/s $\color{#d91a1a}-0.40\%$
test_to_module_speed[True] 2.0821ms 1.5277ms 654.5918 Ops/s 667.5826 Ops/s $\color{#d91a1a}-1.95\%$
test_to_module_speed[False] 1.7309ms 1.5008ms 666.2897 Ops/s 672.4107 Ops/s $\color{#d91a1a}-0.91\%$
test_tc_init 0.1599ms 34.5621μs 28.9334 KOps/s 30.5728 KOps/s $\textbf{\color{#d91a1a}-5.36\%}$
test_tc_init_nested 0.2009ms 70.1830μs 14.2485 KOps/s 14.7881 KOps/s $\color{#d91a1a}-3.65\%$
test_tc_first_layer_tensor 0.1242ms 3.5739μs 279.8089 KOps/s 284.1301 KOps/s $\color{#d91a1a}-1.52\%$
test_tc_first_layer_nontensor 0.1172ms 3.6005μs 277.7412 KOps/s 282.5440 KOps/s $\color{#d91a1a}-1.70\%$
test_tc_second_layer_tensor 25.8184μs 1.1332μs 882.4828 KOps/s 904.7765 KOps/s $\color{#d91a1a}-2.46\%$
test_tc_second_layer_nontensor 0.1267ms 4.0975μs 244.0525 KOps/s 247.2206 KOps/s $\color{#d91a1a}-1.28\%$
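The "Improved" and "Worsened" counts in each summary line appear to correspond to the bolded rows, and every bolded row has a change larger than 5% in magnitude. The sketch below classifies a few GPU rows under that rule. The ±5% threshold is an assumption inferred from the bolding, not a documented setting of the benchmark bot, and `Row` and `classify` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    name: str
    ops: float       # Ops on this branch
    ops_head: float  # Ops on repo HEAD (same unit as ops)


# Assumption: inferred from which rows are bolded in the tables above.
THRESHOLD_PCT = 5.0


def classify(rows: List[Row], threshold: float = THRESHOLD_PCT) -> Tuple[List[str], List[str]]:
    """Split rows into improved / worsened by percent change in Ops vs HEAD."""
    improved, worsened = [], []
    for r in rows:
        change = (r.ops - r.ops_head) / r.ops_head * 100.0
        if change > threshold:
            improved.append(r.name)
        elif change < -threshold:
            worsened.append(r.name)
        # Rows within +/- threshold are counted in neither bucket.
    return improved, worsened


gpu_rows = [
    Row("test_unlock_nested", 2.3229, 2.8917),       # -19.67%
    Row("test_common_ops", 1.6824, 1.4615),          # +15.11%
    Row("test_creation_nested_2", 72.9051, 75.9646), # -4.03%, not flagged
    Row("test_clone", 91.0744, 84.3014),             # +8.03%
]
improved, worsened = classify(gpu_rows)
print("improved:", improved)  # ['test_common_ops', 'test_clone']
print("worsened:", worsened)  # ['test_unlock_nested']
```

Under this rule, small fluctuations such as the -4.03% on `test_creation_nested_2` are treated as noise rather than regressions.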

@vmoens vmoens merged commit dd3b4e5 into main Jul 15, 2024
@vmoens vmoens deleted the zip-strict branch July 15, 2024 11:05


3 participants