@vmoens
Collaborator

@vmoens vmoens commented May 30, 2024

No description provided.

@facebook-github-bot facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) May 30, 2024
@github-actions

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}19$. Worsened: $\large\color{#d91a1a}7$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 46.1170μs 16.8163μs 59.4660 KOps/s 55.7567 KOps/s $\textbf{\color{#35bf28}+6.65\%}$
test_plain_set_stack_nested 43.3010μs 17.0365μs 58.6975 KOps/s 54.5938 KOps/s $\textbf{\color{#35bf28}+7.52\%}$
test_plain_set_nested_inplace 64.9220μs 18.9374μs 52.8054 KOps/s 48.7063 KOps/s $\textbf{\color{#35bf28}+8.42\%}$
test_plain_set_stack_nested_inplace 0.1009ms 18.9801μs 52.6866 KOps/s 49.3086 KOps/s $\textbf{\color{#35bf28}+6.85\%}$
test_items 40.8950μs 2.5457μs 392.8233 KOps/s 396.3067 KOps/s $\color{#d91a1a}-0.88\%$
test_items_nested 0.4451ms 0.2664ms 3.7539 KOps/s 3.7796 KOps/s $\color{#d91a1a}-0.68\%$
test_items_nested_locked 0.3474ms 0.2654ms 3.7676 KOps/s 3.7431 KOps/s $\color{#35bf28}+0.65\%$
test_items_nested_leaf 0.1436ms 77.8825μs 12.8398 KOps/s 12.6562 KOps/s $\color{#35bf28}+1.45\%$
test_items_stack_nested 0.4878ms 0.2725ms 3.6695 KOps/s 3.7258 KOps/s $\color{#d91a1a}-1.51\%$
test_items_stack_nested_leaf 0.3353ms 80.6020μs 12.4066 KOps/s 12.8087 KOps/s $\color{#d91a1a}-3.14\%$
test_items_stack_nested_locked 1.3053ms 0.2810ms 3.5585 KOps/s 3.7424 KOps/s $\color{#d91a1a}-4.91\%$
test_keys 27.1000μs 4.1050μs 243.6067 KOps/s 243.8726 KOps/s $\color{#d91a1a}-0.11\%$
test_keys_nested 0.5169ms 0.1404ms 7.1219 KOps/s 6.8705 KOps/s $\color{#35bf28}+3.66\%$
test_keys_nested_locked 0.8082ms 0.1412ms 7.0828 KOps/s 6.6695 KOps/s $\textbf{\color{#35bf28}+6.20\%}$
test_keys_nested_leaf 0.1787ms 0.1165ms 8.5823 KOps/s 8.0652 KOps/s $\textbf{\color{#35bf28}+6.41\%}$
test_keys_stack_nested 0.4372ms 0.1388ms 7.2035 KOps/s 7.0858 KOps/s $\color{#35bf28}+1.66\%$
test_keys_stack_nested_leaf 0.1763ms 0.1144ms 8.7394 KOps/s 8.2760 KOps/s $\textbf{\color{#35bf28}+5.60\%}$
test_keys_stack_nested_locked 0.2018ms 0.1413ms 7.0762 KOps/s 6.8357 KOps/s $\color{#35bf28}+3.52\%$
test_values 16.4770μs 1.1699μs 854.7792 KOps/s 863.2834 KOps/s $\color{#d91a1a}-0.99\%$
test_values_nested 0.1044ms 51.3091μs 19.4897 KOps/s 18.9217 KOps/s $\color{#35bf28}+3.00\%$
test_values_nested_locked 84.1980μs 51.6691μs 19.3539 KOps/s 19.1272 KOps/s $\color{#35bf28}+1.19\%$
test_values_nested_leaf 85.3500μs 46.1897μs 21.6499 KOps/s 20.9453 KOps/s $\color{#35bf28}+3.36\%$
test_values_stack_nested 0.1041ms 52.3694μs 19.0951 KOps/s 18.4661 KOps/s $\color{#35bf28}+3.41\%$
test_values_stack_nested_leaf 80.4210μs 46.8022μs 21.3665 KOps/s 21.5702 KOps/s $\color{#d91a1a}-0.94\%$
test_values_stack_nested_locked 0.1135ms 52.4091μs 19.0807 KOps/s 18.7083 KOps/s $\color{#35bf28}+1.99\%$
test_membership 28.3840μs 1.3340μs 749.5989 KOps/s 754.6484 KOps/s $\color{#d91a1a}-0.67\%$
test_membership_nested 27.7620μs 3.4440μs 290.3581 KOps/s 296.0118 KOps/s $\color{#d91a1a}-1.91\%$
test_membership_nested_leaf 30.3570μs 3.4722μs 288.0016 KOps/s 290.7865 KOps/s $\color{#d91a1a}-0.96\%$
test_membership_stacked_nested 21.5000μs 3.4800μs 287.3586 KOps/s 295.8880 KOps/s $\color{#d91a1a}-2.88\%$
test_membership_stacked_nested_leaf 33.3830μs 3.4503μs 289.8262 KOps/s 289.0630 KOps/s $\color{#35bf28}+0.26\%$
test_membership_nested_last 27.6810μs 4.2421μs 235.7325 KOps/s 236.6408 KOps/s $\color{#d91a1a}-0.38\%$
test_membership_nested_leaf_last 70.3130μs 4.2605μs 234.7158 KOps/s 236.7526 KOps/s $\color{#d91a1a}-0.86\%$
test_membership_stacked_nested_last 37.8810μs 5.3318μs 187.5546 KOps/s 75.4616 KOps/s $\textbf{\color{#35bf28}+148.54\%}$
test_membership_stacked_nested_leaf_last 36.4580μs 5.3560μs 186.7051 KOps/s 74.5421 KOps/s $\textbf{\color{#35bf28}+150.47\%}$
test_nested_getleaf 40.1360μs 10.4571μs 95.6284 KOps/s 95.2274 KOps/s $\color{#35bf28}+0.42\%$
test_nested_get 33.9030μs 9.9641μs 100.3603 KOps/s 101.1975 KOps/s $\color{#d91a1a}-0.83\%$
test_stacked_getleaf 40.4760μs 10.3492μs 96.6254 KOps/s 95.4343 KOps/s $\color{#35bf28}+1.25\%$
test_stacked_get 32.5110μs 9.8850μs 101.1637 KOps/s 101.3714 KOps/s $\color{#d91a1a}-0.20\%$
test_nested_getitemleaf 41.8180μs 10.9850μs 91.0330 KOps/s 89.3086 KOps/s $\color{#35bf28}+1.93\%$
test_nested_getitem 43.4120μs 10.2074μs 97.9678 KOps/s 97.6038 KOps/s $\color{#35bf28}+0.37\%$
test_stacked_getitemleaf 38.4820μs 10.8708μs 91.9894 KOps/s 90.5099 KOps/s $\color{#35bf28}+1.63\%$
test_stacked_getitem 53.3510μs 10.0870μs 99.1371 KOps/s 97.7032 KOps/s $\color{#35bf28}+1.47\%$
test_lock_nested 0.7791ms 0.3563ms 2.8063 KOps/s 2.7993 KOps/s $\color{#35bf28}+0.25\%$
test_lock_stack_nested 0.7011ms 0.3183ms 3.1413 KOps/s 3.2590 KOps/s $\color{#d91a1a}-3.61\%$
test_unlock_nested 0.7912ms 0.3572ms 2.7998 KOps/s 2.4004 KOps/s $\textbf{\color{#35bf28}+16.64\%}$
test_unlock_stack_nested 0.4902ms 0.3226ms 3.1001 KOps/s 3.1653 KOps/s $\color{#d91a1a}-2.06\%$
test_flatten_speed 0.5252ms 97.0129μs 10.3079 KOps/s 10.2332 KOps/s $\color{#35bf28}+0.73\%$
test_unflatten_speed 0.7200ms 0.4189ms 2.3875 KOps/s 2.3984 KOps/s $\color{#d91a1a}-0.46\%$
test_common_ops 4.3908ms 0.7091ms 1.4103 KOps/s 1.3473 KOps/s $\color{#35bf28}+4.68\%$
test_creation 21.5800μs 1.9413μs 515.1163 KOps/s 528.3651 KOps/s $\color{#d91a1a}-2.51\%$
test_creation_empty 48.8320μs 9.6776μs 103.3313 KOps/s 92.9611 KOps/s $\textbf{\color{#35bf28}+11.16\%}$
test_creation_nested_1 44.6740μs 12.5000μs 79.9998 KOps/s 73.3596 KOps/s $\textbf{\color{#35bf28}+9.05\%}$
test_creation_nested_2 55.2140μs 15.7241μs 63.5967 KOps/s 58.6803 KOps/s $\textbf{\color{#35bf28}+8.38\%}$
test_clone 0.1186ms 13.7456μs 72.7508 KOps/s 71.6661 KOps/s $\color{#35bf28}+1.51\%$
test_getitem[int] 50.8560μs 11.1909μs 89.3585 KOps/s 85.9520 KOps/s $\color{#35bf28}+3.96\%$
test_getitem[slice_int] 65.0220μs 23.2996μs 42.9191 KOps/s 44.6408 KOps/s $\color{#d91a1a}-3.86\%$
test_getitem[range] 86.2820μs 60.3292μs 16.5757 KOps/s 16.8837 KOps/s $\color{#d91a1a}-1.82\%$
test_getitem[tuple] 57.8180μs 19.0836μs 52.4009 KOps/s 52.8256 KOps/s $\color{#d91a1a}-0.80\%$
test_getitem[list] 96.4010μs 41.4479μs 24.1267 KOps/s 24.9085 KOps/s $\color{#d91a1a}-3.14\%$
test_setitem_dim[int] 74.0580μs 34.6986μs 28.8196 KOps/s 28.6390 KOps/s $\color{#35bf28}+0.63\%$
test_setitem_dim[slice_int] 0.1350ms 62.5727μs 15.9814 KOps/s 16.2399 KOps/s $\color{#d91a1a}-1.59\%$
test_setitem_dim[range] 0.1364ms 82.1763μs 12.1690 KOps/s 12.2253 KOps/s $\color{#d91a1a}-0.46\%$
test_setitem_dim[tuple] 0.1044ms 49.6317μs 20.1484 KOps/s 19.9095 KOps/s $\color{#35bf28}+1.20\%$
test_setitem 60.9440μs 20.4884μs 48.8080 KOps/s 46.4717 KOps/s $\textbf{\color{#35bf28}+5.03\%}$
test_set 0.1143ms 20.1644μs 49.5923 KOps/s 47.7946 KOps/s $\color{#35bf28}+3.76\%$
test_set_shared 1.2586ms 0.1451ms 6.8900 KOps/s 7.0517 KOps/s $\color{#d91a1a}-2.29\%$
test_update 82.4350μs 21.8902μs 45.6824 KOps/s 43.2767 KOps/s $\textbf{\color{#35bf28}+5.56\%}$
test_update_nested 0.1713ms 30.2742μs 33.0314 KOps/s 30.8368 KOps/s $\textbf{\color{#35bf28}+7.12\%}$
test_update__nested 89.5880μs 25.9958μs 38.4677 KOps/s 37.2922 KOps/s $\color{#35bf28}+3.15\%$
test_set_nested 79.1280μs 21.9489μs 45.5604 KOps/s 43.5068 KOps/s $\color{#35bf28}+4.72\%$
test_set_nested_new 89.2280μs 26.7841μs 37.3356 KOps/s 36.3317 KOps/s $\color{#35bf28}+2.76\%$
test_select 0.1339ms 42.3567μs 23.6090 KOps/s 23.5236 KOps/s $\color{#35bf28}+0.36\%$
test_select_nested 0.1534ms 61.5175μs 16.2555 KOps/s 16.2938 KOps/s $\color{#d91a1a}-0.23\%$
test_exclude_nested 0.2367ms 0.1218ms 8.2130 KOps/s 8.1461 KOps/s $\color{#35bf28}+0.82\%$
test_empty[True] 0.6570ms 0.3960ms 2.5252 KOps/s 2.4869 KOps/s $\color{#35bf28}+1.54\%$
test_empty[False] 8.4107μs 1.2359μs 809.1560 KOps/s 849.5069 KOps/s $\color{#d91a1a}-4.75\%$
test_unbind_speed 1.8794ms 0.2599ms 3.8472 KOps/s 3.6692 KOps/s $\color{#35bf28}+4.85\%$
test_unbind_speed_stack0 0.5103ms 0.2572ms 3.8877 KOps/s 3.9980 KOps/s $\color{#d91a1a}-2.76\%$
test_unbind_speed_stack1 72.2786ms 0.7520ms 1.3298 KOps/s 1.3152 KOps/s $\color{#35bf28}+1.11\%$
test_split 71.1650ms 1.6387ms 610.2300 Ops/s 614.3075 Ops/s $\color{#d91a1a}-0.66\%$
test_chunk 70.8358ms 1.6305ms 613.3101 Ops/s 612.7907 Ops/s $\color{#35bf28}+0.08\%$
test_creation[device0] 0.2610ms 84.7416μs 11.8006 KOps/s 11.8096 KOps/s $\color{#d91a1a}-0.08\%$
test_creation_from_tensor 4.7526ms 87.2562μs 11.4605 KOps/s 11.6259 KOps/s $\color{#d91a1a}-1.42\%$
test_add_one[memmap_tensor0] 0.1116ms 5.2736μs 189.6254 KOps/s 182.8778 KOps/s $\color{#35bf28}+3.69\%$
test_contiguous[memmap_tensor0] 12.4130μs 0.6406μs 1.5611 MOps/s 1.5617 MOps/s $\color{#d91a1a}-0.04\%$
test_stack[memmap_tensor0] 25.8280μs 3.6658μs 272.7910 KOps/s 273.5962 KOps/s $\color{#d91a1a}-0.29\%$
test_memmaptd_index 1.0060ms 0.2585ms 3.8687 KOps/s 3.8579 KOps/s $\color{#35bf28}+0.28\%$
test_memmaptd_index_astensor 0.7137ms 0.3321ms 3.0114 KOps/s 2.9590 KOps/s $\color{#35bf28}+1.77\%$
test_memmaptd_index_op 0.9313ms 0.6159ms 1.6237 KOps/s 1.5770 KOps/s $\color{#35bf28}+2.96\%$
test_serialize_model 0.1791s 0.1163s 8.6015 Ops/s 8.5117 Ops/s $\color{#35bf28}+1.05\%$
test_serialize_model_pickle 0.4494s 0.3771s 2.6518 Ops/s 2.6100 Ops/s $\color{#35bf28}+1.60\%$
test_serialize_weights 0.1829s 0.1149s 8.7043 Ops/s 8.6067 Ops/s $\color{#35bf28}+1.13\%$
test_serialize_weights_returnearly 0.1919s 0.1341s 7.4551 Ops/s 7.2127 Ops/s $\color{#35bf28}+3.36\%$
test_serialize_weights_pickle 0.7993s 0.4933s 2.0273 Ops/s 1.6035 Ops/s $\textbf{\color{#35bf28}+26.43\%}$
test_serialize_weights_filesystem 0.1004s 93.4372ms 10.7024 Ops/s 10.7379 Ops/s $\color{#d91a1a}-0.33\%$
test_serialize_model_filesystem 0.1775s 0.1012s 9.8774 Ops/s 9.8039 Ops/s $\color{#35bf28}+0.75\%$
test_reshape_pytree 0.1076ms 25.9767μs 38.4960 KOps/s 36.9579 KOps/s $\color{#35bf28}+4.16\%$
test_reshape_td 88.4360μs 35.1649μs 28.4374 KOps/s 28.0082 KOps/s $\color{#35bf28}+1.53\%$
test_view_pytree 70.1820μs 25.7408μs 38.8488 KOps/s 38.1775 KOps/s $\color{#35bf28}+1.76\%$
test_view_td 0.1108ms 39.5082μs 25.3112 KOps/s 25.0862 KOps/s $\color{#35bf28}+0.90\%$
test_unbind_pytree 0.1021ms 30.0606μs 33.2661 KOps/s 33.3719 KOps/s $\color{#d91a1a}-0.32\%$
test_unbind_td 0.4942ms 38.6458μs 25.8761 KOps/s 25.8685 KOps/s $\color{#35bf28}+0.03\%$
test_split_pytree 88.2360μs 29.5015μs 33.8966 KOps/s 33.2970 KOps/s $\color{#35bf28}+1.80\%$
test_split_td 0.1411ms 41.4842μs 24.1056 KOps/s 24.3013 KOps/s $\color{#d91a1a}-0.81\%$
test_add_pytree 86.9030μs 34.9936μs 28.5767 KOps/s 27.7488 KOps/s $\color{#35bf28}+2.98\%$
test_add_td 0.1298ms 56.0357μs 17.8458 KOps/s 17.5268 KOps/s $\color{#35bf28}+1.82\%$
test_distributed 0.3213ms 0.1059ms 9.4466 KOps/s 9.5721 KOps/s $\color{#d91a1a}-1.31\%$
test_tdmodule 38.1220μs 17.6434μs 56.6783 KOps/s 55.2915 KOps/s $\color{#35bf28}+2.51\%$
test_tdmodule_dispatch 60.2330μs 35.2326μs 28.3828 KOps/s 27.8911 KOps/s $\color{#35bf28}+1.76\%$
test_tdseq 41.6180μs 20.5472μs 48.6684 KOps/s 47.7595 KOps/s $\color{#35bf28}+1.90\%$
test_tdseq_dispatch 71.5840μs 40.5709μs 24.6482 KOps/s 24.5139 KOps/s $\color{#35bf28}+0.55\%$
test_instantiation_functorch 1.6889ms 1.3154ms 760.2042 Ops/s 721.5881 Ops/s $\textbf{\color{#35bf28}+5.35\%}$
test_instantiation_td 1.9216ms 1.0366ms 964.6797 Ops/s 946.5226 Ops/s $\color{#35bf28}+1.92\%$
test_exec_functorch 0.3720ms 0.1629ms 6.1375 KOps/s 6.0429 KOps/s $\color{#35bf28}+1.56\%$
test_exec_functional_call 0.2868ms 0.1568ms 6.3795 KOps/s 6.5469 KOps/s $\color{#d91a1a}-2.56\%$
test_exec_td 0.2909ms 0.1541ms 6.4897 KOps/s 6.6731 KOps/s $\color{#d91a1a}-2.75\%$
test_exec_td_decorator 0.9273ms 0.2358ms 4.2411 KOps/s 4.3689 KOps/s $\color{#d91a1a}-2.92\%$
test_vmap_mlp_speed[True-True] 0.7614ms 0.4949ms 2.0206 KOps/s 2.0568 KOps/s $\color{#d91a1a}-1.76\%$
test_vmap_mlp_speed[True-False] 2.3477ms 0.4984ms 2.0064 KOps/s 2.0678 KOps/s $\color{#d91a1a}-2.97\%$
test_vmap_mlp_speed[False-True] 0.6659ms 0.4038ms 2.4765 KOps/s 2.5503 KOps/s $\color{#d91a1a}-2.89\%$
test_vmap_mlp_speed[False-False] 0.6619ms 0.4052ms 2.4680 KOps/s 2.5516 KOps/s $\color{#d91a1a}-3.28\%$
test_vmap_mlp_speed_decorator[True-True] 1.1165ms 0.5642ms 1.7725 KOps/s 1.7827 KOps/s $\color{#d91a1a}-0.57\%$
test_vmap_mlp_speed_decorator[True-False] 0.8045ms 0.5618ms 1.7801 KOps/s 1.7878 KOps/s $\color{#d91a1a}-0.43\%$
test_vmap_mlp_speed_decorator[False-True] 0.9030ms 0.4690ms 2.1322 KOps/s 2.1698 KOps/s $\color{#d91a1a}-1.73\%$
test_vmap_mlp_speed_decorator[False-False] 0.6421ms 0.4631ms 2.1595 KOps/s 2.1711 KOps/s $\color{#d91a1a}-0.54\%$
test_to_module_speed[True] 76.8081ms 1.8424ms 542.7768 Ops/s 579.0783 Ops/s $\textbf{\color{#d91a1a}-6.27\%}$
test_to_module_speed[False] 2.2961ms 1.6745ms 597.1942 Ops/s 539.0063 Ops/s $\textbf{\color{#35bf28}+10.80\%}$
test_tc_init 56.7170μs 27.7575μs 36.0263 KOps/s 34.4323 KOps/s $\color{#35bf28}+4.63\%$
test_tc_init_nested 0.1045ms 58.2877μs 17.1563 KOps/s 16.5961 KOps/s $\color{#35bf28}+3.38\%$
test_tc_first_layer_tensor 7.0533μs 0.7268μs 1.3760 MOps/s 1.4467 MOps/s $\color{#d91a1a}-4.89\%$
test_tc_first_layer_nontensor 2.5072μs 0.6960μs 1.4368 MOps/s 1.4507 MOps/s $\color{#d91a1a}-0.96\%$
test_tc_second_layer_tensor 26.2290μs 1.8754μs 533.2215 KOps/s 543.9492 KOps/s $\color{#d91a1a}-1.97\%$
test_tc_second_layer_nontensor 35.8570μs 1.6986μs 588.7240 KOps/s 614.7445 KOps/s $\color{#d91a1a}-4.23\%$
test_unbind 86.0083ms 6.8149ms 146.7366 Ops/s 146.2237 Ops/s $\color{#35bf28}+0.35\%$
test_full_like 17.4117ms 12.7083ms 78.6886 Ops/s 89.3764 Ops/s $\textbf{\color{#d91a1a}-11.96\%}$
test_zeros_like 14.7891ms 6.9609ms 143.6598 Ops/s 162.5692 Ops/s $\textbf{\color{#d91a1a}-11.63\%}$
test_ones_like 12.1949ms 7.1489ms 139.8826 Ops/s 152.5419 Ops/s $\textbf{\color{#d91a1a}-8.30\%}$
test_clone 19.2053ms 9.5536ms 104.6729 Ops/s 122.5617 Ops/s $\textbf{\color{#d91a1a}-14.60\%}$
test_squeeze 73.0060μs 14.3005μs 69.9277 KOps/s 70.2557 KOps/s $\color{#d91a1a}-0.47\%$
test_unsqueeze 0.1217ms 61.0210μs 16.3878 KOps/s 15.8810 KOps/s $\color{#35bf28}+3.19\%$
test_split 0.1918ms 0.1135ms 8.8091 KOps/s 8.7818 KOps/s $\color{#35bf28}+0.31\%$
test_permute 0.2450ms 0.1276ms 7.8381 KOps/s 7.7870 KOps/s $\color{#35bf28}+0.66\%$
test_stack 31.1262ms 26.8018ms 37.3109 Ops/s 41.5372 Ops/s $\textbf{\color{#d91a1a}-10.17\%}$
test_cat 34.4613ms 28.4707ms 35.1238 Ops/s 41.2332 Ops/s $\textbf{\color{#d91a1a}-14.82\%}$
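The Change column is not defined in the report itself. A minimal sketch, assuming it is the relative difference between the Ops column and the Ops on Repo HEAD column (the function name `relative_change` is hypothetical), gives values consistent with the rows above:

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to the repo HEAD throughput.

    Assumption: this is how the Change column is derived; the report does not say so.
    """
    return (ops - ops_head) / ops_head * 100.0


# (Ops, Ops on Repo HEAD) pairs copied from three rows of the CPU table, in KOps/s or Ops/s.
rows = {
    "test_plain_set_nested": (59.4660, 55.7567),
    "test_membership_stacked_nested_last": (187.5546, 75.4616),
    "test_full_like": (78.6886, 89.3764),
}

for name, (ops, ops_head) in rows.items():
    print(f"{name}: {relative_change(ops, ops_head):+.2f}%")
```

Both columns must be in the same unit for the ratio to be meaningful, which holds row by row in the tables.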

@github-actions

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: $\large\color{#35bf28}9$. Worsened: $\large\color{#d91a1a}33$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 41.1900μs 21.8176μs 45.8345 KOps/s 50.3520 KOps/s $\textbf{\color{#d91a1a}-8.97\%}$
test_plain_set_stack_nested 37.0410μs 21.8990μs 45.6643 KOps/s 49.8137 KOps/s $\textbf{\color{#d91a1a}-8.33\%}$
test_plain_set_nested_inplace 47.6800μs 24.4728μs 40.8616 KOps/s 44.2023 KOps/s $\textbf{\color{#d91a1a}-7.56\%}$
test_plain_set_stack_nested_inplace 97.0220μs 24.2646μs 41.2122 KOps/s 44.1863 KOps/s $\textbf{\color{#d91a1a}-6.73\%}$
test_items 27.5500μs 4.3595μs 229.3848 KOps/s 232.8373 KOps/s $\color{#d91a1a}-1.48\%$
test_items_nested 0.4275ms 0.3460ms 2.8901 KOps/s 2.8698 KOps/s $\color{#35bf28}+0.71\%$
test_items_nested_locked 0.4450ms 0.3463ms 2.8876 KOps/s 2.8709 KOps/s $\color{#35bf28}+0.58\%$
test_items_nested_leaf 0.1373ms 0.1009ms 9.9138 KOps/s 9.9314 KOps/s $\color{#d91a1a}-0.18\%$
test_items_stack_nested 0.4779ms 0.3500ms 2.8572 KOps/s 2.8557 KOps/s $\color{#35bf28}+0.05\%$
test_items_stack_nested_leaf 0.1346ms 0.1026ms 9.7477 KOps/s 9.7942 KOps/s $\color{#d91a1a}-0.47\%$
test_items_stack_nested_locked 0.4151ms 0.3459ms 2.8911 KOps/s 2.8958 KOps/s $\color{#d91a1a}-0.16\%$
test_keys 28.2510μs 4.8232μs 207.3327 KOps/s 209.1893 KOps/s $\color{#d91a1a}-0.89\%$
test_keys_nested 0.2560ms 0.1670ms 5.9879 KOps/s 5.9598 KOps/s $\color{#35bf28}+0.47\%$
test_keys_nested_locked 2.0778ms 0.1760ms 5.6814 KOps/s 5.7643 KOps/s $\color{#d91a1a}-1.44\%$
test_keys_nested_leaf 0.1959ms 0.1469ms 6.8093 KOps/s 6.9196 KOps/s $\color{#d91a1a}-1.59\%$
test_keys_stack_nested 0.2441ms 0.1690ms 5.9165 KOps/s 5.8985 KOps/s $\color{#35bf28}+0.30\%$
test_keys_stack_nested_leaf 0.2053ms 0.1435ms 6.9685 KOps/s 6.9010 KOps/s $\color{#35bf28}+0.98\%$
test_keys_stack_nested_locked 0.2437ms 0.1703ms 5.8713 KOps/s 5.8366 KOps/s $\color{#35bf28}+0.59\%$
test_values 8.9603μs 2.0499μs 487.8293 KOps/s 483.8586 KOps/s $\color{#35bf28}+0.82\%$
test_values_nested 97.5110μs 62.8569μs 15.9092 KOps/s 16.0889 KOps/s $\color{#d91a1a}-1.12\%$
test_values_nested_locked 0.1039ms 62.2420μs 16.0663 KOps/s 16.0955 KOps/s $\color{#d91a1a}-0.18\%$
test_values_nested_leaf 95.2020μs 55.9837μs 17.8623 KOps/s 17.4674 KOps/s $\color{#35bf28}+2.26\%$
test_values_stack_nested 91.8110μs 61.9322μs 16.1467 KOps/s 15.8375 KOps/s $\color{#35bf28}+1.95\%$
test_values_stack_nested_leaf 88.3920μs 57.0017μs 17.5433 KOps/s 17.4601 KOps/s $\color{#35bf28}+0.48\%$
test_values_stack_nested_locked 0.2449ms 61.5592μs 16.2445 KOps/s 16.2601 KOps/s $\color{#d91a1a}-0.10\%$
test_membership 18.1610μs 1.5510μs 644.7471 KOps/s 658.2183 KOps/s $\color{#d91a1a}-2.05\%$
test_membership_nested 29.9210μs 3.9172μs 255.2825 KOps/s 259.7039 KOps/s $\color{#d91a1a}-1.70\%$
test_membership_nested_leaf 44.0010μs 3.9178μs 255.2424 KOps/s 258.6235 KOps/s $\color{#d91a1a}-1.31\%$
test_membership_stacked_nested 24.4510μs 3.8753μs 258.0419 KOps/s 259.4204 KOps/s $\color{#d91a1a}-0.53\%$
test_membership_stacked_nested_leaf 33.8100μs 3.8776μs 257.8903 KOps/s 258.6729 KOps/s $\color{#d91a1a}-0.30\%$
test_membership_nested_last 18.6600μs 4.7669μs 209.7808 KOps/s 212.9240 KOps/s $\color{#d91a1a}-1.48\%$
test_membership_nested_leaf_last 35.9510μs 4.7616μs 210.0139 KOps/s 211.1444 KOps/s $\color{#d91a1a}-0.54\%$
test_membership_stacked_nested_last 29.3310μs 4.7598μs 210.0926 KOps/s 184.0890 KOps/s $\textbf{\color{#35bf28}+14.13\%}$
test_membership_stacked_nested_leaf_last 75.8700μs 4.7546μs 210.3231 KOps/s 183.0968 KOps/s $\textbf{\color{#35bf28}+14.87\%}$
test_nested_getleaf 0.1445ms 13.3156μs 75.1000 KOps/s 76.6255 KOps/s $\color{#d91a1a}-1.99\%$
test_nested_get 38.6200μs 12.7123μs 78.6641 KOps/s 80.6027 KOps/s $\color{#d91a1a}-2.41\%$
test_stacked_getleaf 31.2110μs 13.2400μs 75.5285 KOps/s 76.2080 KOps/s $\color{#d91a1a}-0.89\%$
test_stacked_get 45.0310μs 12.6363μs 79.1368 KOps/s 80.2303 KOps/s $\color{#d91a1a}-1.36\%$
test_nested_getitemleaf 37.9500μs 13.6822μs 73.0876 KOps/s 73.5835 KOps/s $\color{#d91a1a}-0.67\%$
test_nested_getitem 38.6300μs 12.7226μs 78.6002 KOps/s 78.4878 KOps/s $\color{#35bf28}+0.14\%$
test_stacked_getitemleaf 31.9310μs 13.7094μs 72.9429 KOps/s 73.7712 KOps/s $\color{#d91a1a}-1.12\%$
test_stacked_getitem 42.1100μs 12.8142μs 78.0382 KOps/s 78.5619 KOps/s $\color{#d91a1a}-0.67\%$
test_lock_nested 3.8722ms 0.4046ms 2.4714 KOps/s 2.1778 KOps/s $\textbf{\color{#35bf28}+13.48\%}$
test_lock_stack_nested 0.4098ms 0.3546ms 2.8200 KOps/s 2.7699 KOps/s $\color{#35bf28}+1.81\%$
test_unlock_nested 0.9023ms 0.4097ms 2.4410 KOps/s 2.1328 KOps/s $\textbf{\color{#35bf28}+14.45\%}$
test_unlock_stack_nested 0.5184ms 0.3731ms 2.6800 KOps/s 2.6612 KOps/s $\color{#35bf28}+0.71\%$
test_flatten_speed 0.2101ms 0.1238ms 8.0779 KOps/s 8.0979 KOps/s $\color{#d91a1a}-0.25\%$
test_unflatten_speed 0.6050ms 0.4952ms 2.0194 KOps/s 2.0595 KOps/s $\color{#d91a1a}-1.95\%$
test_common_ops 1.3634ms 0.7795ms 1.2828 KOps/s 1.4378 KOps/s $\textbf{\color{#d91a1a}-10.78\%}$
test_creation 30.1810μs 2.1885μs 456.9311 KOps/s 464.5598 KOps/s $\color{#d91a1a}-1.64\%$
test_creation_empty 77.9120μs 14.0377μs 71.2368 KOps/s 97.3097 KOps/s $\textbf{\color{#d91a1a}-26.79\%}$
test_creation_nested_1 45.8600μs 17.1827μs 58.1980 KOps/s 74.7529 KOps/s $\textbf{\color{#d91a1a}-22.15\%}$
test_creation_nested_2 47.2600μs 20.5715μs 48.6108 KOps/s 59.6054 KOps/s $\textbf{\color{#d91a1a}-18.45\%}$
test_clone 86.1420μs 15.3147μs 65.2966 KOps/s 64.9085 KOps/s $\color{#35bf28}+0.60\%$
test_getitem[int] 30.6700μs 14.0353μs 71.2490 KOps/s 70.2778 KOps/s $\color{#35bf28}+1.38\%$
test_getitem[slice_int] 84.6320μs 25.2754μs 39.5641 KOps/s 38.8565 KOps/s $\color{#35bf28}+1.82\%$
test_getitem[range] 70.6610μs 51.7068μs 19.3398 KOps/s 19.1928 KOps/s $\color{#35bf28}+0.77\%$
test_getitem[tuple] 51.4010μs 23.0138μs 43.4523 KOps/s 42.3532 KOps/s $\color{#35bf28}+2.59\%$
test_getitem[list] 0.1194ms 40.8930μs 24.4541 KOps/s 24.0740 KOps/s $\color{#35bf28}+1.58\%$
test_setitem_dim[int] 67.9810μs 38.9671μs 25.6627 KOps/s 27.7067 KOps/s $\textbf{\color{#d91a1a}-7.38\%}$
test_setitem_dim[slice_int] 93.9620μs 63.6066μs 15.7216 KOps/s 16.5898 KOps/s $\textbf{\color{#d91a1a}-5.23\%}$
test_setitem_dim[range] 0.1152ms 81.2163μs 12.3128 KOps/s 12.7128 KOps/s $\color{#d91a1a}-3.15\%$
test_setitem_dim[tuple] 84.7920μs 55.9679μs 17.8674 KOps/s 18.8062 KOps/s $\color{#d91a1a}-4.99\%$
test_setitem 86.8610μs 22.9597μs 43.5545 KOps/s 46.5028 KOps/s $\textbf{\color{#d91a1a}-6.34\%}$
test_set 67.9010μs 23.0993μs 43.2913 KOps/s 47.6043 KOps/s $\textbf{\color{#d91a1a}-9.06\%}$
test_set_shared 1.1593ms 0.1131ms 8.8398 KOps/s 8.7399 KOps/s $\color{#35bf28}+1.14\%$
test_update 92.5920μs 26.4798μs 37.7646 KOps/s 43.8951 KOps/s $\textbf{\color{#d91a1a}-13.97\%}$
test_update_nested 0.1545ms 35.6521μs 28.0489 KOps/s 31.9958 KOps/s $\textbf{\color{#d91a1a}-12.34\%}$
test_update__nested 78.5820μs 29.9156μs 33.4274 KOps/s 33.1501 KOps/s $\color{#35bf28}+0.84\%$
test_set_nested 90.8610μs 24.2227μs 41.2836 KOps/s 45.2668 KOps/s $\textbf{\color{#d91a1a}-8.80\%}$
test_set_nested_new 0.1026ms 30.4145μs 32.8790 KOps/s 34.8542 KOps/s $\textbf{\color{#d91a1a}-5.67\%}$
test_select 95.0210μs 48.3464μs 20.6841 KOps/s 21.6284 KOps/s $\color{#d91a1a}-4.37\%$
test_select_nested 0.1090ms 68.7335μs 14.5489 KOps/s 14.9776 KOps/s $\color{#d91a1a}-2.86\%$
test_exclude_nested 0.2789ms 0.1366ms 7.3221 KOps/s 7.3889 KOps/s $\color{#d91a1a}-0.90\%$
test_empty[True] 0.5224ms 0.4440ms 2.2523 KOps/s 2.2461 KOps/s $\color{#35bf28}+0.27\%$
test_empty[False] 51.0335μs 1.4438μs 692.5954 KOps/s 702.9133 KOps/s $\color{#d91a1a}-1.47\%$
test_to 0.1296ms 0.1007ms 9.9338 KOps/s 11.0542 KOps/s $\textbf{\color{#d91a1a}-10.14\%}$
test_to_nonblocking 0.1507ms 74.7867μs 13.3714 KOps/s 13.1913 KOps/s $\color{#35bf28}+1.36\%$
test_unbind_speed 0.3557ms 0.3139ms 3.1861 KOps/s 3.1391 KOps/s $\color{#35bf28}+1.50\%$
test_unbind_speed_stack0 0.4967ms 0.3138ms 3.1864 KOps/s 3.1294 KOps/s $\color{#35bf28}+1.82\%$
test_unbind_speed_stack1 72.3187ms 0.9270ms 1.0787 KOps/s 1.0735 KOps/s $\color{#35bf28}+0.49\%$
test_split 71.7442ms 1.9497ms 512.8994 Ops/s 544.6862 Ops/s $\textbf{\color{#d91a1a}-5.84\%}$
test_chunk 73.2457ms 1.9509ms 512.5935 Ops/s 508.9672 Ops/s $\color{#35bf28}+0.71\%$
test_creation[device0] 0.2194ms 72.2794μs 13.8352 KOps/s 13.7480 KOps/s $\color{#35bf28}+0.63\%$
test_creation_from_tensor 0.2286ms 70.4401μs 14.1965 KOps/s 14.6389 KOps/s $\color{#d91a1a}-3.02\%$
test_add_one[memmap_tensor0] 0.1331ms 7.1219μs 140.4118 KOps/s 139.2282 KOps/s $\color{#35bf28}+0.85\%$
test_contiguous[memmap_tensor0] 14.0410μs 0.7068μs 1.4148 MOps/s 1.4584 MOps/s $\color{#d91a1a}-2.99\%$
test_stack[memmap_tensor0] 29.5710μs 4.7746μs 209.4421 KOps/s 213.2920 KOps/s $\color{#d91a1a}-1.81\%$
test_memmaptd_index 1.0572ms 0.3260ms 3.0671 KOps/s 3.0654 KOps/s $\color{#35bf28}+0.05\%$
test_memmaptd_index_astensor 0.8233ms 0.4186ms 2.3891 KOps/s 2.4072 KOps/s $\color{#d91a1a}-0.75\%$
test_memmaptd_index_op 1.2173ms 0.7869ms 1.2708 KOps/s 1.3839 KOps/s $\textbf{\color{#d91a1a}-8.18\%}$
test_serialize_model 0.1869s 0.1170s 8.5448 Ops/s 8.0826 Ops/s $\textbf{\color{#35bf28}+5.72\%}$
test_serialize_model_pickle 1.3515s 1.2364s 0.8088 Ops/s 0.8074 Ops/s $\color{#35bf28}+0.17\%$
test_serialize_weights 0.1826s 0.1142s 8.7536 Ops/s 8.3164 Ops/s $\textbf{\color{#35bf28}+5.26\%}$
test_serialize_weights_returnearly 0.1202s 96.7880ms 10.3319 Ops/s 10.0294 Ops/s $\color{#35bf28}+3.02\%$
test_serialize_weights_pickle 1.3505s 1.2366s 0.8087 Ops/s 0.8086 Ops/s $\color{#35bf28}+0.01\%$
test_reshape_pytree 0.1510ms 35.5245μs 28.1496 KOps/s 30.0774 KOps/s $\textbf{\color{#d91a1a}-6.41\%}$
test_reshape_td 0.2323ms 45.1905μs 22.1285 KOps/s 24.5809 KOps/s $\textbf{\color{#d91a1a}-9.98\%}$
test_view_pytree 0.1774ms 35.5446μs 28.1337 KOps/s 29.7658 KOps/s $\textbf{\color{#d91a1a}-5.48\%}$
test_view_td 84.1620μs 50.1152μs 19.9540 KOps/s 21.9943 KOps/s $\textbf{\color{#d91a1a}-9.28\%}$
test_unbind_pytree 0.1904ms 42.1016μs 23.7521 KOps/s 25.2892 KOps/s $\textbf{\color{#d91a1a}-6.08\%}$
test_unbind_td 0.5468ms 49.0710μs 20.3786 KOps/s 20.9748 KOps/s $\color{#d91a1a}-2.84\%$
test_split_pytree 0.1656ms 39.1643μs 25.5335 KOps/s 25.7557 KOps/s $\color{#d91a1a}-0.86\%$
test_split_td 0.1175ms 47.5683μs 21.0224 KOps/s 20.7911 KOps/s $\color{#35bf28}+1.11\%$
test_add_pytree 64.4010μs 44.5064μs 22.4687 KOps/s 21.8678 KOps/s $\color{#35bf28}+2.75\%$
test_add_td 0.1056ms 65.6495μs 15.2324 KOps/s 15.9109 KOps/s $\color{#d91a1a}-4.26\%$
test_distributed 3.8285ms 0.1049ms 9.5290 KOps/s 12.4312 KOps/s $\textbf{\color{#d91a1a}-23.35\%}$
test_tdmodule 0.1299ms 20.7661μs 48.1553 KOps/s 54.0091 KOps/s $\textbf{\color{#d91a1a}-10.84\%}$
test_tdmodule_dispatch 0.1666ms 40.1197μs 24.9254 KOps/s 28.1393 KOps/s $\textbf{\color{#d91a1a}-11.42\%}$
test_tdseq 40.9900μs 23.5545μs 42.4547 KOps/s 49.1050 KOps/s $\textbf{\color{#d91a1a}-13.54\%}$
test_tdseq_dispatch 67.2810μs 45.9014μs 21.7858 KOps/s 25.5952 KOps/s $\textbf{\color{#d91a1a}-14.88\%}$
test_instantiation_functorch 1.5707ms 1.4833ms 674.1883 Ops/s 677.8161 Ops/s $\color{#d91a1a}-0.54\%$
test_instantiation_td 1.5542ms 1.0857ms 921.0657 Ops/s 842.4797 Ops/s $\textbf{\color{#35bf28}+9.33\%}$
test_exec_functorch 0.2292ms 0.1815ms 5.5095 KOps/s 5.5338 KOps/s $\color{#d91a1a}-0.44\%$
test_exec_functional_call 0.2334ms 0.1780ms 5.6175 KOps/s 5.7188 KOps/s $\color{#d91a1a}-1.77\%$
test_exec_td 0.2768ms 0.1726ms 5.7950 KOps/s 5.8661 KOps/s $\color{#d91a1a}-1.21\%$
test_exec_td_decorator 0.5878ms 0.2539ms 3.9386 KOps/s 3.9072 KOps/s $\color{#35bf28}+0.80\%$
test_vmap_mlp_speed[True-True] 1.2985ms 0.6342ms 1.5768 KOps/s 1.5802 KOps/s $\color{#d91a1a}-0.21\%$
test_vmap_mlp_speed[True-False] 0.8260ms 0.6326ms 1.5807 KOps/s 1.5985 KOps/s $\color{#d91a1a}-1.12\%$
test_vmap_mlp_speed[False-True] 0.6716ms 0.5412ms 1.8476 KOps/s 1.7754 KOps/s $\color{#35bf28}+4.06\%$
test_vmap_mlp_speed[False-False] 0.7052ms 0.5394ms 1.8540 KOps/s 1.8407 KOps/s $\color{#35bf28}+0.72\%$
test_vmap_mlp_speed_decorator[True-True] 1.1321ms 0.7120ms 1.4044 KOps/s 1.4142 KOps/s $\color{#d91a1a}-0.69\%$
test_vmap_mlp_speed_decorator[True-False] 0.8507ms 0.7067ms 1.4151 KOps/s 1.4185 KOps/s $\color{#d91a1a}-0.24\%$
test_vmap_mlp_speed_decorator[False-True] 0.7889ms 0.6150ms 1.6260 KOps/s 1.6250 KOps/s $\color{#35bf28}+0.06\%$
test_vmap_mlp_speed_decorator[False-False] 0.7548ms 0.6124ms 1.6330 KOps/s 1.6277 KOps/s $\color{#35bf28}+0.33\%$
test_vmap_transformer_speed[True-True] 8.2677ms 7.9758ms 125.3790 Ops/s 125.4232 Ops/s $\color{#d91a1a}-0.04\%$
test_vmap_transformer_speed[True-False] 8.8926ms 7.9390ms 125.9606 Ops/s 125.9226 Ops/s $\color{#35bf28}+0.03\%$
test_vmap_transformer_speed[False-True] 8.0546ms 7.8329ms 127.6674 Ops/s 126.9717 Ops/s $\color{#35bf28}+0.55\%$
test_vmap_transformer_speed[False-False] 8.0697ms 7.8353ms 127.6272 Ops/s 127.3395 Ops/s $\color{#35bf28}+0.23\%$
test_vmap_transformer_speed_decorator[True-True] 19.5518ms 19.3369ms 51.7145 Ops/s 51.5687 Ops/s $\color{#35bf28}+0.28\%$
test_vmap_transformer_speed_decorator[True-False] 19.5635ms 19.3212ms 51.7567 Ops/s 51.5950 Ops/s $\color{#35bf28}+0.31\%$
test_vmap_transformer_speed_decorator[False-True] 19.4465ms 19.2600ms 51.9210 Ops/s 51.8789 Ops/s $\color{#35bf28}+0.08\%$
test_vmap_transformer_speed_decorator[False-False] 19.7561ms 19.2588ms 51.9242 Ops/s 51.9866 Ops/s $\color{#d91a1a}-0.12\%$
test_to_module_speed[True] 2.6276ms 1.9777ms 505.6359 Ops/s 514.0356 Ops/s $\color{#d91a1a}-1.63\%$
test_to_module_speed[False] 2.4179ms 1.9513ms 512.4885 Ops/s 521.4940 Ops/s $\color{#d91a1a}-1.73\%$
test_tc_init 0.1782ms 37.2982μs 26.8109 KOps/s 36.9495 KOps/s $\textbf{\color{#d91a1a}-27.44\%}$
test_tc_init_nested 0.2803ms 77.2491μs 12.9451 KOps/s 16.4984 KOps/s $\textbf{\color{#d91a1a}-21.54\%}$
test_tc_first_layer_tensor 9.9192μs 0.6823μs 1.4656 MOps/s 1.4346 MOps/s $\color{#35bf28}+2.16\%$
test_tc_first_layer_nontensor 6.7866μs 0.6819μs 1.4666 MOps/s 1.4264 MOps/s $\color{#35bf28}+2.82\%$
test_tc_second_layer_tensor 78.1113μs 1.9362μs 516.4733 KOps/s 485.1041 KOps/s $\textbf{\color{#35bf28}+6.47\%}$
test_tc_second_layer_nontensor 66.6377μs 1.6107μs 620.8612 KOps/s 608.3090 KOps/s $\color{#35bf28}+2.06\%$
test_unbind 0.1024s 8.4896ms 117.7914 Ops/s 116.4021 Ops/s $\color{#35bf28}+1.19\%$
test_full_like 11.8103ms 11.3719ms 87.9358 Ops/s 87.6854 Ops/s $\color{#35bf28}+0.29\%$
test_zeros_like 6.3482ms 6.1349ms 163.0009 Ops/s 143.4621 Ops/s $\textbf{\color{#35bf28}+13.62\%}$
test_ones_like 7.3245ms 6.9653ms 143.5689 Ops/s 162.1187 Ops/s $\textbf{\color{#d91a1a}-11.44\%}$
test_clone 9.5822ms 8.9191ms 112.1195 Ops/s 122.4963 Ops/s $\textbf{\color{#d91a1a}-8.47\%}$
test_squeeze 63.8710μs 15.4035μs 64.9202 KOps/s 64.4885 KOps/s $\color{#35bf28}+0.67\%$
test_unsqueeze 0.2831ms 65.8498μs 15.1861 KOps/s 15.2478 KOps/s $\color{#d91a1a}-0.40\%$
test_split 0.2378ms 0.1221ms 8.1881 KOps/s 7.9895 KOps/s $\color{#35bf28}+2.49\%$
test_permute 0.1645ms 0.1225ms 8.1607 KOps/s 8.0924 KOps/s $\color{#35bf28}+0.84\%$
test_stack 28.7220ms 27.9254ms 35.8097 Ops/s 36.2615 Ops/s $\color{#d91a1a}-1.25\%$
test_cat 28.6281ms 27.6536ms 36.1617 Ops/s 36.2767 Ops/s $\color{#d91a1a}-0.32\%$

@vmoens vmoens added the Refactor label (refactoring code, not a new feature) May 30, 2024
@vmoens vmoens merged commit 3fcb5f6 into main May 30, 2024
@vmoens vmoens deleted the fix-numpy-nestedtensor branch May 30, 2024 12:52
return tuple(_x.numpy() for _x in x)
return x.numpy()
if hasattr(x, "numpy"):
if getattr(x, "is_nested", False):
Contributor

Curious, but isn't this a better check for nesting?

Collaborator Author

IDK
I usually try to avoid hasattr because it can run code under the hood (imagine a very expensive property: hasattr will run all of it before it possibly reaches an AttributeError).
Happy to revert if you think it was better the other way around
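To make the point about hasattr concrete, here is a minimal sketch using a hypothetical class (`Expensive` and its `calls` counter are illustrative and not part of this PR). In Python 3, hasattr works by attempting the attribute lookup and catching AttributeError, so a property body runs in full. Note that getattr with a default evaluates the property as well; what it saves is the second lookup that a hasattr check followed by an attribute access would need.

```python
class Expensive:
    """Hypothetical class with a costly property, for illustration only."""

    def __init__(self):
        self.calls = 0

    @property
    def is_nested(self):
        self.calls += 1  # stands in for expensive work done before failing
        raise AttributeError("is_nested is not available")


obj = Expensive()

# hasattr runs the property body and swallows the AttributeError.
print(hasattr(obj, "is_nested"))         # False

# getattr with a default also runs the property body, then falls back to the default.
print(getattr(obj, "is_nested", False))  # False

# The property was evaluated once per call above.
print(obj.calls)                         # 2
```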
