@vmoens vmoens commented Jun 26, 2024

No description provided.

@facebook-github-bot facebook-github-bot added the CLA Signed label Jun 26, 2024
@github-actions

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}5$. Worsened: $\large\color{#d91a1a}15$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 67.0050μs 16.8648μs 59.2950 KOps/s 59.8942 KOps/s $\color{#d91a1a}-1.00\%$
test_plain_set_stack_nested 46.6570μs 17.1354μs 58.3586 KOps/s 59.0690 KOps/s $\color{#d91a1a}-1.20\%$
test_plain_set_nested_inplace 50.7650μs 19.1894μs 52.1121 KOps/s 52.2937 KOps/s $\color{#d91a1a}-0.35\%$
test_plain_set_stack_nested_inplace 51.8560μs 19.2597μs 51.9218 KOps/s 52.9405 KOps/s $\color{#d91a1a}-1.92\%$
test_items 28.2230μs 2.5348μs 394.5059 KOps/s 389.8393 KOps/s $\color{#35bf28}+1.20\%$
test_items_nested 1.3238ms 0.2784ms 3.5922 KOps/s 3.6841 KOps/s $\color{#d91a1a}-2.50\%$
test_items_nested_locked 0.5158ms 0.2794ms 3.5790 KOps/s 3.6366 KOps/s $\color{#d91a1a}-1.58\%$
test_items_nested_leaf 0.1309ms 82.3603μs 12.1418 KOps/s 12.9537 KOps/s $\textbf{\color{#d91a1a}-6.27\%}$
test_items_stack_nested 1.3927ms 0.2833ms 3.5302 KOps/s 3.5897 KOps/s $\color{#d91a1a}-1.66\%$
test_items_stack_nested_leaf 0.1338ms 83.7034μs 11.9469 KOps/s 12.3368 KOps/s $\color{#d91a1a}-3.16\%$
test_items_stack_nested_locked 0.5135ms 0.2819ms 3.5468 KOps/s 3.6345 KOps/s $\color{#d91a1a}-2.41\%$
test_keys 25.7480μs 4.0403μs 247.5075 KOps/s 263.6170 KOps/s $\textbf{\color{#d91a1a}-6.11\%}$
test_keys_nested 0.2778ms 0.1395ms 7.1701 KOps/s 7.2703 KOps/s $\color{#d91a1a}-1.38\%$
test_keys_nested_locked 0.7752ms 0.1437ms 6.9602 KOps/s 7.0205 KOps/s $\color{#d91a1a}-0.86\%$
test_keys_nested_leaf 0.2131ms 0.1185ms 8.4375 KOps/s 8.5917 KOps/s $\color{#d91a1a}-1.79\%$
test_keys_stack_nested 0.2661ms 0.1406ms 7.1117 KOps/s 7.3238 KOps/s $\color{#d91a1a}-2.90\%$
test_keys_stack_nested_leaf 0.2117ms 0.1180ms 8.4720 KOps/s 8.5164 KOps/s $\color{#d91a1a}-0.52\%$
test_keys_stack_nested_locked 0.2756ms 0.1443ms 6.9316 KOps/s 7.0330 KOps/s $\color{#d91a1a}-1.44\%$
test_values 6.7525μs 1.1828μs 845.4539 KOps/s 749.8874 KOps/s $\textbf{\color{#35bf28}+12.74\%}$
test_values_nested 0.1146ms 51.4992μs 19.4178 KOps/s 19.7387 KOps/s $\color{#d91a1a}-1.63\%$
test_values_nested_locked 0.1043ms 51.4016μs 19.4546 KOps/s 19.7155 KOps/s $\color{#d91a1a}-1.32\%$
test_values_nested_leaf 0.1093ms 46.6645μs 21.4296 KOps/s 21.6827 KOps/s $\color{#d91a1a}-1.17\%$
test_values_stack_nested 0.1006ms 52.9444μs 18.8877 KOps/s 19.2500 KOps/s $\color{#d91a1a}-1.88\%$
test_values_stack_nested_leaf 96.0590μs 46.3596μs 21.5705 KOps/s 21.7884 KOps/s $\color{#d91a1a}-1.00\%$
test_values_stack_nested_locked 96.6310μs 52.3531μs 19.1011 KOps/s 19.4548 KOps/s $\color{#d91a1a}-1.82\%$
test_membership 14.0460μs 1.3597μs 735.4710 KOps/s 745.5273 KOps/s $\color{#d91a1a}-1.35\%$
test_membership_nested 32.1500μs 3.5974μs 277.9812 KOps/s 285.7855 KOps/s $\color{#d91a1a}-2.73\%$
test_membership_nested_leaf 44.5230μs 3.6010μs 277.6984 KOps/s 287.9084 KOps/s $\color{#d91a1a}-3.55\%$
test_membership_stacked_nested 28.8540μs 3.5280μs 283.4460 KOps/s 285.2103 KOps/s $\color{#d91a1a}-0.62\%$
test_membership_stacked_nested_leaf 28.8040μs 3.5492μs 281.7555 KOps/s 273.5672 KOps/s $\color{#35bf28}+2.99\%$
test_membership_nested_last 43.1300μs 4.3112μs 231.9541 KOps/s 235.5704 KOps/s $\color{#d91a1a}-1.54\%$
test_membership_nested_leaf_last 38.7520μs 4.3937μs 227.6000 KOps/s 233.0469 KOps/s $\color{#d91a1a}-2.34\%$
test_membership_stacked_nested_last 31.6700μs 5.4955μs 181.9675 KOps/s 208.1668 KOps/s $\textbf{\color{#d91a1a}-12.59\%}$
test_membership_stacked_nested_leaf_last 28.7040μs 5.5397μs 180.5137 KOps/s 207.7108 KOps/s $\textbf{\color{#d91a1a}-13.09\%}$
test_nested_getleaf 36.0180μs 10.6605μs 93.8043 KOps/s 93.9865 KOps/s $\color{#d91a1a}-0.19\%$
test_nested_get 37.1100μs 10.0546μs 99.4573 KOps/s 98.4257 KOps/s $\color{#35bf28}+1.05\%$
test_stacked_getleaf 41.7080μs 10.5907μs 94.4222 KOps/s 96.0176 KOps/s $\color{#d91a1a}-1.66\%$
test_stacked_get 33.1320μs 9.9345μs 100.6589 KOps/s 100.7476 KOps/s $\color{#d91a1a}-0.09\%$
test_nested_getitemleaf 48.5910μs 11.3334μs 88.2349 KOps/s 90.0631 KOps/s $\color{#d91a1a}-2.03\%$
test_nested_getitem 33.7540μs 10.4182μs 95.9856 KOps/s 96.9118 KOps/s $\color{#d91a1a}-0.96\%$
test_stacked_getitemleaf 34.9150μs 11.2157μs 89.1605 KOps/s 90.0522 KOps/s $\color{#d91a1a}-0.99\%$
test_stacked_getitem 36.9890μs 10.2549μs 97.5144 KOps/s 96.3338 KOps/s $\color{#35bf28}+1.23\%$
test_lock_nested 51.6710ms 0.4011ms 2.4933 KOps/s 2.9030 KOps/s $\textbf{\color{#d91a1a}-14.11\%}$
test_lock_stack_nested 0.5233ms 0.3138ms 3.1865 KOps/s 3.1945 KOps/s $\color{#d91a1a}-0.25\%$
test_unlock_nested 0.7877ms 0.3547ms 2.8194 KOps/s 2.8376 KOps/s $\color{#d91a1a}-0.64\%$
test_unlock_stack_nested 0.5141ms 0.3218ms 3.1077 KOps/s 3.1211 KOps/s $\color{#d91a1a}-0.43\%$
test_flatten_speed 0.2361ms 0.1011ms 9.8928 KOps/s 10.3738 KOps/s $\color{#d91a1a}-4.64\%$
test_unflatten_speed 0.7340ms 0.4178ms 2.3937 KOps/s 2.3710 KOps/s $\color{#35bf28}+0.96\%$
test_common_ops 5.1147ms 0.7345ms 1.3615 KOps/s 1.4039 KOps/s $\color{#d91a1a}-3.02\%$
test_creation 70.4220μs 1.8813μs 531.5594 KOps/s 520.4760 KOps/s $\color{#35bf28}+2.13\%$
test_creation_empty 34.6440μs 9.9490μs 100.5122 KOps/s 101.5181 KOps/s $\color{#d91a1a}-0.99\%$
test_creation_nested_1 43.1510μs 13.3146μs 75.1056 KOps/s 77.1244 KOps/s $\color{#d91a1a}-2.62\%$
test_creation_nested_2 46.2860μs 16.1408μs 61.9550 KOps/s 62.4120 KOps/s $\color{#d91a1a}-0.73\%$
test_clone 0.1378ms 13.9857μs 71.5017 KOps/s 73.0856 KOps/s $\color{#d91a1a}-2.17\%$
test_getitem[int] 61.7360μs 11.5879μs 86.2968 KOps/s 86.8573 KOps/s $\color{#d91a1a}-0.65\%$
test_getitem[slice_int] 84.2180μs 23.3518μs 42.8232 KOps/s 42.6087 KOps/s $\color{#35bf28}+0.50\%$
test_getitem[range] 87.8640μs 60.7276μs 16.4670 KOps/s 15.1783 KOps/s $\textbf{\color{#35bf28}+8.49\%}$
test_getitem[tuple] 60.4330μs 19.6418μs 50.9117 KOps/s 51.6798 KOps/s $\color{#d91a1a}-1.49\%$
test_getitem[list] 0.1547ms 42.6347μs 23.4551 KOps/s 24.4805 KOps/s $\color{#d91a1a}-4.19\%$
test_setitem_dim[int] 74.8400μs 35.4342μs 28.2213 KOps/s 28.4875 KOps/s $\color{#d91a1a}-0.93\%$
test_setitem_dim[slice_int] 0.1077ms 62.3024μs 16.0508 KOps/s 16.0860 KOps/s $\color{#d91a1a}-0.22\%$
test_setitem_dim[range] 0.1444ms 84.9323μs 11.7741 KOps/s 11.8165 KOps/s $\color{#d91a1a}-0.36\%$
test_setitem_dim[tuple] 0.1041ms 51.4750μs 19.4269 KOps/s 19.2810 KOps/s $\color{#35bf28}+0.76\%$
test_setitem 67.1450μs 20.4165μs 48.9799 KOps/s 49.9832 KOps/s $\color{#d91a1a}-2.01\%$
test_set 77.9250μs 21.0762μs 47.4470 KOps/s 51.0931 KOps/s $\textbf{\color{#d91a1a}-7.14\%}$
test_set_shared 1.7843ms 0.1439ms 6.9493 KOps/s 6.9197 KOps/s $\color{#35bf28}+0.43\%$
test_update 0.2303ms 21.7794μs 45.9149 KOps/s 46.7350 KOps/s $\color{#d91a1a}-1.75\%$
test_update_nested 0.1294ms 30.3877μs 32.9080 KOps/s 32.6705 KOps/s $\color{#35bf28}+0.73\%$
test_update__nested 81.7930μs 26.6272μs 37.5555 KOps/s 39.5408 KOps/s $\textbf{\color{#d91a1a}-5.02\%}$
test_set_nested 59.1710μs 21.7828μs 45.9079 KOps/s 47.6891 KOps/s $\color{#d91a1a}-3.74\%$
test_set_nested_new 77.8550μs 26.1986μs 38.1700 KOps/s 39.5521 KOps/s $\color{#d91a1a}-3.49\%$
test_select 1.3463ms 41.6038μs 24.0362 KOps/s 24.7735 KOps/s $\color{#d91a1a}-2.98\%$
test_select_nested 0.1458ms 61.7957μs 16.1824 KOps/s 16.5480 KOps/s $\color{#d91a1a}-2.21\%$
test_exclude_nested 0.2627ms 0.1231ms 8.1252 KOps/s 8.0146 KOps/s $\color{#35bf28}+1.38\%$
test_empty[True] 0.6115ms 0.4095ms 2.4421 KOps/s 2.4994 KOps/s $\color{#d91a1a}-2.29\%$
test_empty[False] 9.0595μs 1.1829μs 845.3536 KOps/s 877.8495 KOps/s $\color{#d91a1a}-3.70\%$
test_unbind_speed 0.3364ms 0.2641ms 3.7862 KOps/s 3.9466 KOps/s $\color{#d91a1a}-4.06\%$
test_unbind_speed_stack0 0.5373ms 0.2599ms 3.8469 KOps/s 3.9703 KOps/s $\color{#d91a1a}-3.11\%$
test_unbind_speed_stack1 76.9480ms 0.7492ms 1.3347 KOps/s 1.3818 KOps/s $\color{#d91a1a}-3.41\%$
test_split 76.9533ms 1.6516ms 605.4872 Ops/s 615.6309 Ops/s $\color{#d91a1a}-1.65\%$
test_chunk 77.8419ms 1.6416ms 609.1779 Ops/s 622.2475 Ops/s $\color{#d91a1a}-2.10\%$
test_creation[device0] 0.2763ms 86.9637μs 11.4991 KOps/s 11.8075 KOps/s $\color{#d91a1a}-2.61\%$
test_creation_from_tensor 4.1597ms 86.6974μs 11.5344 KOps/s 11.5706 KOps/s $\color{#d91a1a}-0.31\%$
test_add_one[memmap_tensor0] 0.1108ms 5.2683μs 189.8133 KOps/s 181.5287 KOps/s $\color{#35bf28}+4.56\%$
test_contiguous[memmap_tensor0] 11.9220μs 0.6323μs 1.5815 MOps/s 1.5785 MOps/s $\color{#35bf28}+0.19\%$
test_stack[memmap_tensor0] 29.1350μs 3.7695μs 265.2873 KOps/s 273.0128 KOps/s $\color{#d91a1a}-2.83\%$
test_memmaptd_index 0.9241ms 0.2568ms 3.8943 KOps/s 3.8183 KOps/s $\color{#35bf28}+1.99\%$
test_memmaptd_index_astensor 0.7322ms 0.3340ms 2.9944 KOps/s 2.9876 KOps/s $\color{#35bf28}+0.23\%$
test_memmaptd_index_op 1.5497ms 0.6383ms 1.5667 KOps/s 1.6117 KOps/s $\color{#d91a1a}-2.79\%$
test_serialize_model 0.1797s 0.1165s 8.5865 Ops/s 8.4624 Ops/s $\color{#35bf28}+1.47\%$
test_serialize_model_pickle 0.4511s 0.3809s 2.6251 Ops/s 2.6577 Ops/s $\color{#d91a1a}-1.23\%$
test_serialize_weights 0.1841s 0.1145s 8.7354 Ops/s 9.0917 Ops/s $\color{#d91a1a}-3.92\%$
test_serialize_weights_returnearly 0.2005s 0.1391s 7.1916 Ops/s 7.8219 Ops/s $\textbf{\color{#d91a1a}-8.06\%}$
test_serialize_weights_pickle 0.6027s 0.4544s 2.2005 Ops/s 2.5286 Ops/s $\textbf{\color{#d91a1a}-12.98\%}$
test_serialize_weights_filesystem 0.1847s 0.1048s 9.5432 Ops/s 9.9219 Ops/s $\color{#d91a1a}-3.82\%$
test_serialize_model_filesystem 0.1029s 96.9521ms 10.3144 Ops/s 10.0895 Ops/s $\color{#35bf28}+2.23\%$
test_reshape_pytree 0.2170ms 29.4652μs 33.9384 KOps/s 38.0725 KOps/s $\textbf{\color{#d91a1a}-10.86\%}$
test_reshape_td 92.9340μs 35.6631μs 28.0402 KOps/s 28.9218 KOps/s $\color{#d91a1a}-3.05\%$
test_view_pytree 66.2030μs 25.6213μs 39.0300 KOps/s 38.0934 KOps/s $\color{#35bf28}+2.46\%$
test_view_td 94.0760μs 41.0493μs 24.3610 KOps/s 25.2112 KOps/s $\color{#d91a1a}-3.37\%$
test_unbind_pytree 81.6030μs 30.2108μs 33.1008 KOps/s 33.1734 KOps/s $\color{#d91a1a}-0.22\%$
test_unbind_td 0.3952ms 38.9826μs 25.6525 KOps/s 26.0359 KOps/s $\color{#d91a1a}-1.47\%$
test_split_pytree 71.4240μs 29.9588μs 33.3792 KOps/s 33.8251 KOps/s $\color{#d91a1a}-1.32\%$
test_split_td 0.5079ms 41.7375μs 23.9593 KOps/s 24.2670 KOps/s $\color{#d91a1a}-1.27\%$
test_add_pytree 82.9050μs 35.5218μs 28.1517 KOps/s 28.1083 KOps/s $\color{#35bf28}+0.15\%$
test_add_td 0.1173ms 55.7661μs 17.9320 KOps/s 17.5140 KOps/s $\color{#35bf28}+2.39\%$
test_distributed 0.2153ms 0.1035ms 9.6599 KOps/s 9.5538 KOps/s $\color{#35bf28}+1.11\%$
test_tdmodule 0.1163ms 17.7609μs 56.3034 KOps/s 56.8530 KOps/s $\color{#d91a1a}-0.97\%$
test_tdmodule_dispatch 51.1660μs 35.0985μs 28.4912 KOps/s 28.4482 KOps/s $\color{#35bf28}+0.15\%$
test_tdseq 39.4340μs 20.3532μs 49.1323 KOps/s 48.6791 KOps/s $\color{#35bf28}+0.93\%$
test_tdseq_dispatch 64.4800μs 39.5035μs 25.3142 KOps/s 25.1130 KOps/s $\color{#35bf28}+0.80\%$
test_instantiation_functorch 1.6474ms 1.3489ms 741.3332 Ops/s 731.9937 Ops/s $\color{#35bf28}+1.28\%$
test_instantiation_td 71.2573ms 1.1171ms 895.1544 Ops/s 957.6510 Ops/s $\textbf{\color{#d91a1a}-6.53\%}$
test_exec_functorch 0.2963ms 0.1615ms 6.1918 KOps/s 5.7540 KOps/s $\textbf{\color{#35bf28}+7.61\%}$
test_exec_functional_call 0.3401ms 0.1528ms 6.5465 KOps/s 6.5671 KOps/s $\color{#d91a1a}-0.31\%$
test_exec_td 0.2890ms 0.1498ms 6.6769 KOps/s 6.7181 KOps/s $\color{#d91a1a}-0.61\%$
test_exec_td_decorator 1.0283ms 0.2273ms 4.3994 KOps/s 4.3798 KOps/s $\color{#35bf28}+0.45\%$
test_vmap_mlp_speed[True-True] 0.7713ms 0.4871ms 2.0530 KOps/s 2.0485 KOps/s $\color{#35bf28}+0.22\%$
test_vmap_mlp_speed[True-False] 0.7723ms 0.4829ms 2.0707 KOps/s 2.0757 KOps/s $\color{#d91a1a}-0.24\%$
test_vmap_mlp_speed[False-True] 0.8456ms 0.3986ms 2.5089 KOps/s 2.5416 KOps/s $\color{#d91a1a}-1.29\%$
test_vmap_mlp_speed[False-False] 0.6718ms 0.3948ms 2.5330 KOps/s 2.5432 KOps/s $\color{#d91a1a}-0.40\%$
test_vmap_mlp_speed_decorator[True-True] 1.1316ms 0.5656ms 1.7681 KOps/s 1.7926 KOps/s $\color{#d91a1a}-1.37\%$
test_vmap_mlp_speed_decorator[True-False] 0.8625ms 0.5564ms 1.7973 KOps/s 1.6514 KOps/s $\textbf{\color{#35bf28}+8.83\%}$
test_vmap_mlp_speed_decorator[False-True] 0.7741ms 0.4631ms 2.1593 KOps/s 2.1653 KOps/s $\color{#d91a1a}-0.27\%$
test_vmap_mlp_speed_decorator[False-False] 0.6769ms 0.4588ms 2.1798 KOps/s 2.1742 KOps/s $\color{#35bf28}+0.26\%$
test_to_module_speed[True] 2.4293ms 1.7233ms 580.2845 Ops/s 578.2675 Ops/s $\color{#35bf28}+0.35\%$
test_to_module_speed[False] 1.8423ms 1.7045ms 586.6718 Ops/s 584.3855 Ops/s $\color{#35bf28}+0.39\%$
test_tc_init 59.0000μs 27.0659μs 36.9469 KOps/s 36.9003 KOps/s $\color{#35bf28}+0.13\%$
test_tc_init_nested 0.1175ms 58.3503μs 17.1379 KOps/s 17.7078 KOps/s $\color{#d91a1a}-3.22\%$
test_tc_first_layer_tensor 5.2256μs 0.7076μs 1.4132 MOps/s 1.4708 MOps/s $\color{#d91a1a}-3.92\%$
test_tc_first_layer_nontensor 6.9830μs 0.7031μs 1.4224 MOps/s 1.5025 MOps/s $\textbf{\color{#d91a1a}-5.33\%}$
test_tc_second_layer_tensor 19.9470μs 1.8572μs 538.4584 KOps/s 543.0257 KOps/s $\color{#d91a1a}-0.84\%$
test_tc_second_layer_nontensor 18.6350μs 1.6783μs 595.8322 KOps/s 611.2751 KOps/s $\color{#d91a1a}-2.53\%$
test_unbind 85.0842ms 7.8232ms 127.8249 Ops/s 158.3942 Ops/s $\textbf{\color{#d91a1a}-19.30\%}$
test_full_like 15.5765ms 10.9436ms 91.3775 Ops/s 89.4241 Ops/s $\color{#35bf28}+2.18\%$
test_zeros_like 11.7763ms 5.9091ms 169.2294 Ops/s 180.2557 Ops/s $\textbf{\color{#d91a1a}-6.12\%}$
test_ones_like 11.9560ms 6.2815ms 159.1964 Ops/s 149.6545 Ops/s $\textbf{\color{#35bf28}+6.38\%}$
test_clone 12.3878ms 8.0635ms 124.0159 Ops/s 120.8141 Ops/s $\color{#35bf28}+2.65\%$
test_squeeze 68.0170μs 15.4435μs 64.7523 KOps/s 70.0548 KOps/s $\textbf{\color{#d91a1a}-7.57\%}$
test_unsqueeze 0.1326ms 61.9170μs 16.1506 KOps/s 16.4681 KOps/s $\color{#d91a1a}-1.93\%$
test_split 0.2614ms 0.1164ms 8.5891 KOps/s 9.0168 KOps/s $\color{#d91a1a}-4.74\%$
test_permute 0.2169ms 0.1269ms 7.8831 KOps/s 7.9618 KOps/s $\color{#d91a1a}-0.99\%$
test_stack 26.4938ms 22.6102ms 44.2278 Ops/s 42.8289 Ops/s $\color{#35bf28}+3.27\%$
test_cat 28.3176ms 23.5450ms 42.4718 Ops/s 41.2069 Ops/s $\color{#35bf28}+3.07\%$
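
The Change column above appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. The sketch below reproduces a few CPU rows under that assumption. The 5% cutoff for counting a row as improved or worsened is not stated by the bot; it is inferred from which rows are bold in the table, and the function names are hypothetical.

```python
def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption inferred from the bold rows."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) copied from the CPU table, both in the same unit per row.
rows = {
    "test_values": (845.4539, 749.8874),
    "test_unbind": (127.8249, 158.3942),
    "test_plain_set_nested": (59.2950, 59.8942),
}

for name, (ops_pr, ops_head) in rows.items():
    change = percent_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under this reading, a sub-5% delta such as the one for test_plain_set_nested is shown in the table but does not count toward the Improved or Worsened totals.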

@vmoens vmoens merged commit 43a1b11 into main Jun 26, 2024
@vmoens vmoens deleted the fix-uint16-dep branch June 26, 2024 13:29
@github-actions

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: $\large\color{#35bf28}13$. Worsened: $\large\color{#d91a1a}21$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.4891ms 13.0107μs 76.8595 KOps/s 85.5200 KOps/s $\textbf{\color{#d91a1a}-10.13\%}$
test_plain_set_stack_nested 26.6600μs 13.2475μs 75.4860 KOps/s 83.2685 KOps/s $\textbf{\color{#d91a1a}-9.35\%}$
test_plain_set_nested_inplace 36.1410μs 14.5911μs 68.5348 KOps/s 76.1418 KOps/s $\textbf{\color{#d91a1a}-9.99\%}$
test_plain_set_stack_nested_inplace 36.7010μs 14.7151μs 67.9573 KOps/s 75.0822 KOps/s $\textbf{\color{#d91a1a}-9.49\%}$
test_items 28.5610μs 4.6686μs 214.1975 KOps/s 210.0242 KOps/s $\color{#35bf28}+1.99\%$
test_items_nested 0.3697ms 0.3474ms 2.8787 KOps/s 2.9398 KOps/s $\color{#d91a1a}-2.08\%$
test_items_nested_locked 0.3788ms 0.3496ms 2.8603 KOps/s 2.8551 KOps/s $\color{#35bf28}+0.18\%$
test_items_nested_leaf 98.5920μs 83.0731μs 12.0376 KOps/s 12.1168 KOps/s $\color{#d91a1a}-0.65\%$
test_items_stack_nested 0.4279ms 0.3500ms 2.8575 KOps/s 2.8692 KOps/s $\color{#d91a1a}-0.41\%$
test_items_stack_nested_leaf 0.1170ms 84.9727μs 11.7685 KOps/s 12.0043 KOps/s $\color{#d91a1a}-1.96\%$
test_items_stack_nested_locked 0.4084ms 0.3531ms 2.8320 KOps/s 2.8456 KOps/s $\color{#d91a1a}-0.48\%$
test_keys 18.4910μs 4.3319μs 230.8465 KOps/s 228.5917 KOps/s $\color{#35bf28}+0.99\%$
test_keys_nested 89.5110μs 67.3139μs 14.8558 KOps/s 14.9079 KOps/s $\color{#d91a1a}-0.35\%$
test_keys_nested_locked 2.0515ms 72.1334μs 13.8632 KOps/s 13.7998 KOps/s $\color{#35bf28}+0.46\%$
test_keys_nested_leaf 77.2310μs 57.7595μs 17.3132 KOps/s 17.3554 KOps/s $\color{#d91a1a}-0.24\%$
test_keys_stack_nested 85.3220μs 66.8613μs 14.9563 KOps/s 14.8990 KOps/s $\color{#35bf28}+0.38\%$
test_keys_stack_nested_leaf 71.9320μs 57.5488μs 17.3766 KOps/s 17.4245 KOps/s $\color{#d91a1a}-0.28\%$
test_keys_stack_nested_locked 90.9020μs 71.5040μs 13.9852 KOps/s 14.0757 KOps/s $\color{#d91a1a}-0.64\%$
test_values 8.9103μs 1.8332μs 545.5052 KOps/s 547.2125 KOps/s $\color{#d91a1a}-0.31\%$
test_values_nested 58.1810μs 35.1982μs 28.4106 KOps/s 28.3767 KOps/s $\color{#35bf28}+0.12\%$
test_values_nested_locked 54.9710μs 37.3491μs 26.7744 KOps/s 26.9181 KOps/s $\color{#d91a1a}-0.53\%$
test_values_nested_leaf 49.5310μs 31.1744μs 32.0776 KOps/s 31.6190 KOps/s $\color{#35bf28}+1.45\%$
test_values_stack_nested 63.5410μs 36.1570μs 27.6572 KOps/s 28.0559 KOps/s $\color{#d91a1a}-1.42\%$
test_values_stack_nested_leaf 54.9510μs 32.1215μs 31.1318 KOps/s 31.4489 KOps/s $\color{#d91a1a}-1.01\%$
test_values_stack_nested_locked 58.8610μs 38.2336μs 26.1550 KOps/s 26.8997 KOps/s $\color{#d91a1a}-2.77\%$
test_membership 3.3257μs 0.7666μs 1.3045 MOps/s 1.2805 MOps/s $\color{#35bf28}+1.87\%$
test_membership_nested 26.9610μs 2.6165μs 382.1954 KOps/s 376.4177 KOps/s $\color{#35bf28}+1.53\%$
test_membership_nested_leaf 19.1810μs 2.6026μs 384.2257 KOps/s 377.6353 KOps/s $\color{#35bf28}+1.75\%$
test_membership_stacked_nested 27.2300μs 2.6397μs 378.8358 KOps/s 374.7874 KOps/s $\color{#35bf28}+1.08\%$
test_membership_stacked_nested_leaf 20.1500μs 2.6206μs 381.5923 KOps/s 380.1463 KOps/s $\color{#35bf28}+0.38\%$
test_membership_nested_last 34.5210μs 3.2040μs 312.1121 KOps/s 316.1625 KOps/s $\color{#d91a1a}-1.28\%$
test_membership_nested_leaf_last 18.8910μs 3.1431μs 318.1587 KOps/s 317.6761 KOps/s $\color{#35bf28}+0.15\%$
test_membership_stacked_nested_last 24.4000μs 3.6204μs 276.2147 KOps/s 316.8087 KOps/s $\textbf{\color{#d91a1a}-12.81\%}$
test_membership_stacked_nested_leaf_last 35.7610μs 3.5815μs 279.2120 KOps/s 315.8200 KOps/s $\textbf{\color{#d91a1a}-11.59\%}$
test_nested_getleaf 25.3210μs 8.3656μs 119.5370 KOps/s 118.8599 KOps/s $\color{#35bf28}+0.57\%$
test_nested_get 66.7210μs 7.8663μs 127.1238 KOps/s 126.2368 KOps/s $\color{#35bf28}+0.70\%$
test_stacked_getleaf 38.7100μs 8.3950μs 119.1180 KOps/s 118.7018 KOps/s $\color{#35bf28}+0.35\%$
test_stacked_get 24.8300μs 7.8653μs 127.1400 KOps/s 126.5972 KOps/s $\color{#35bf28}+0.43\%$
test_nested_getitemleaf 33.6710μs 8.5226μs 117.3357 KOps/s 116.2076 KOps/s $\color{#35bf28}+0.97\%$
test_nested_getitem 27.6410μs 8.0374μs 124.4189 KOps/s 123.1656 KOps/s $\color{#35bf28}+1.02\%$
test_stacked_getitemleaf 25.8800μs 8.6042μs 116.2228 KOps/s 116.3390 KOps/s $\color{#d91a1a}-0.10\%$
test_stacked_getitem 69.3010μs 8.0502μs 124.2211 KOps/s 123.3771 KOps/s $\color{#35bf28}+0.68\%$
test_lock_nested 58.8089ms 0.4045ms 2.4721 KOps/s 2.4583 KOps/s $\color{#35bf28}+0.56\%$
test_lock_stack_nested 0.3357ms 0.3002ms 3.3314 KOps/s 3.2686 KOps/s $\color{#35bf28}+1.92\%$
test_unlock_nested 60.5803ms 0.4090ms 2.4452 KOps/s 2.4345 KOps/s $\color{#35bf28}+0.44\%$
test_unlock_stack_nested 0.3323ms 0.3096ms 3.2300 KOps/s 3.1897 KOps/s $\color{#35bf28}+1.26\%$
test_flatten_speed 0.3275ms 0.1020ms 9.8008 KOps/s 9.8776 KOps/s $\color{#d91a1a}-0.78\%$
test_unflatten_speed 0.3812ms 0.2924ms 3.4201 KOps/s 3.3580 KOps/s $\color{#35bf28}+1.85\%$
test_common_ops 1.0498ms 0.5947ms 1.6816 KOps/s 1.8181 KOps/s $\textbf{\color{#d91a1a}-7.51\%}$
test_creation 32.7410μs 1.6435μs 608.4693 KOps/s 598.5097 KOps/s $\color{#35bf28}+1.66\%$
test_creation_empty 47.8000μs 9.3411μs 107.0535 KOps/s 159.5561 KOps/s $\textbf{\color{#d91a1a}-32.91\%}$
test_creation_nested_1 26.4200μs 11.0006μs 90.9041 KOps/s 123.1820 KOps/s $\textbf{\color{#d91a1a}-26.20\%}$
test_creation_nested_2 37.2300μs 13.2102μs 75.6991 KOps/s 98.2999 KOps/s $\textbf{\color{#d91a1a}-22.99\%}$
test_clone 67.0810μs 11.8231μs 84.5801 KOps/s 82.5357 KOps/s $\color{#35bf28}+2.48\%$
test_getitem[int] 27.6510μs 10.7732μs 92.8232 KOps/s 88.4376 KOps/s $\color{#35bf28}+4.96\%$
test_getitem[slice_int] 43.9210μs 20.2375μs 49.4133 KOps/s 45.4156 KOps/s $\textbf{\color{#35bf28}+8.80\%}$
test_getitem[range] 66.3310μs 47.6148μs 21.0019 KOps/s 20.1027 KOps/s $\color{#35bf28}+4.47\%$
test_getitem[tuple] 52.7010μs 18.6755μs 53.5460 KOps/s 51.4430 KOps/s $\color{#35bf28}+4.09\%$
test_getitem[list] 0.1225ms 34.6743μs 28.8398 KOps/s 29.1697 KOps/s $\color{#d91a1a}-1.13\%$
test_setitem_dim[int] 46.7410μs 30.2173μs 33.0937 KOps/s 36.2960 KOps/s $\textbf{\color{#d91a1a}-8.82\%}$
test_setitem_dim[slice_int] 68.4910μs 50.9178μs 19.6395 KOps/s 20.8365 KOps/s $\textbf{\color{#d91a1a}-5.74\%}$
test_setitem_dim[range] 95.4210μs 67.0238μs 14.9201 KOps/s 14.5953 KOps/s $\color{#35bf28}+2.22\%$
test_setitem_dim[tuple] 61.7210μs 44.6601μs 22.3913 KOps/s 22.8518 KOps/s $\color{#d91a1a}-2.01\%$
test_setitem 52.4310μs 16.8832μs 59.2306 KOps/s 60.9400 KOps/s $\color{#d91a1a}-2.80\%$
test_set 44.2800μs 16.2710μs 61.4589 KOps/s 65.0818 KOps/s $\textbf{\color{#d91a1a}-5.57\%}$
test_set_shared 1.5914ms 0.1003ms 9.9691 KOps/s 9.7335 KOps/s $\color{#35bf28}+2.42\%$
test_update 63.1810μs 19.3420μs 51.7010 KOps/s 59.3498 KOps/s $\textbf{\color{#d91a1a}-12.89\%}$
test_update_nested 74.9620μs 24.9168μs 40.1336 KOps/s 44.1944 KOps/s $\textbf{\color{#d91a1a}-9.19\%}$
test_update__nested 56.7310μs 22.7054μs 44.0423 KOps/s 42.5907 KOps/s $\color{#35bf28}+3.41\%$
test_set_nested 51.3810μs 17.2335μs 58.0267 KOps/s 59.1974 KOps/s $\color{#d91a1a}-1.98\%$
test_set_nested_new 61.8410μs 20.5082μs 48.7609 KOps/s 50.1372 KOps/s $\color{#d91a1a}-2.74\%$
test_select 76.5710μs 35.1322μs 28.4639 KOps/s 29.8689 KOps/s $\color{#d91a1a}-4.70\%$
test_select_nested 0.1358ms 54.3243μs 18.4080 KOps/s 17.9277 KOps/s $\color{#35bf28}+2.68\%$
test_exclude_nested 0.1425ms 0.1140ms 8.7688 KOps/s 8.9164 KOps/s $\color{#d91a1a}-1.66\%$
test_empty[True] 0.3839ms 0.3541ms 2.8239 KOps/s 2.8096 KOps/s $\color{#35bf28}+0.51\%$
test_empty[False] 2.7290μs 0.9292μs 1.0762 MOps/s 1.0676 MOps/s $\color{#35bf28}+0.80\%$
test_to 0.1056ms 77.5163μs 12.9005 KOps/s 12.7773 KOps/s $\color{#35bf28}+0.96\%$
test_to_nonblocking 98.2720μs 62.2692μs 16.0593 KOps/s 15.7991 KOps/s $\color{#35bf28}+1.65\%$
test_unbind_speed 1.5093ms 0.2673ms 3.7418 KOps/s 3.7382 KOps/s $\color{#35bf28}+0.10\%$
test_unbind_speed_stack0 0.2954ms 0.2641ms 3.7868 KOps/s 3.7611 KOps/s $\color{#35bf28}+0.69\%$
test_unbind_speed_stack1 76.1570ms 0.8012ms 1.2481 KOps/s 1.2235 KOps/s $\color{#35bf28}+2.02\%$
test_split 76.5115ms 1.6619ms 601.7078 Ops/s 576.7270 Ops/s $\color{#35bf28}+4.33\%$
test_chunk 76.3607ms 1.6553ms 604.1194 Ops/s 624.4696 Ops/s $\color{#d91a1a}-3.26\%$
test_creation[device0] 0.1262ms 56.6970μs 17.6376 KOps/s 17.1585 KOps/s $\color{#35bf28}+2.79\%$
test_creation_from_tensor 0.1880ms 53.5673μs 18.6681 KOps/s 17.8588 KOps/s $\color{#35bf28}+4.53\%$
test_add_one[memmap_tensor0] 0.1022ms 6.9029μs 144.8668 KOps/s 133.0137 KOps/s $\textbf{\color{#35bf28}+8.91\%}$
test_contiguous[memmap_tensor0] 11.1100μs 0.6689μs 1.4949 MOps/s 1.4404 MOps/s $\color{#35bf28}+3.78\%$
test_stack[memmap_tensor0] 29.9110μs 4.7147μs 212.1034 KOps/s 185.7259 KOps/s $\textbf{\color{#35bf28}+14.20\%}$
test_memmaptd_index 1.0705ms 0.2860ms 3.4967 KOps/s 2.5693 KOps/s $\textbf{\color{#35bf28}+36.10\%}$
test_memmaptd_index_astensor 0.6239ms 0.3602ms 2.7760 KOps/s 2.7081 KOps/s $\color{#35bf28}+2.51\%$
test_memmaptd_index_op 1.1310ms 0.6663ms 1.5009 KOps/s 1.5518 KOps/s $\color{#d91a1a}-3.28\%$
test_serialize_model 0.1817s 0.1103s 9.0685 Ops/s 9.4588 Ops/s $\color{#d91a1a}-4.13\%$
test_serialize_model_pickle 1.3650s 1.2377s 0.8080 Ops/s 0.8069 Ops/s $\color{#35bf28}+0.13\%$
test_serialize_weights 0.1792s 0.1086s 9.2069 Ops/s 8.9095 Ops/s $\color{#35bf28}+3.34\%$
test_serialize_weights_returnearly 0.2613s 0.1046s 9.5573 Ops/s 9.5788 Ops/s $\color{#d91a1a}-0.22\%$
test_serialize_weights_pickle 1.3557s 1.2480s 0.8013 Ops/s 0.8091 Ops/s $\color{#d91a1a}-0.97\%$
test_reshape_pytree 50.9710μs 25.9799μs 38.4913 KOps/s 37.6632 KOps/s $\color{#35bf28}+2.20\%$
test_reshape_td 51.8210μs 31.5654μs 31.6802 KOps/s 31.4193 KOps/s $\color{#35bf28}+0.83\%$
test_view_pytree 0.2586ms 25.7939μs 38.7689 KOps/s 37.7408 KOps/s $\color{#35bf28}+2.72\%$
test_view_td 60.5210μs 36.6015μs 27.3213 KOps/s 26.9948 KOps/s $\color{#35bf28}+1.21\%$
test_unbind_pytree 0.2329ms 31.4852μs 31.7610 KOps/s 30.6563 KOps/s $\color{#35bf28}+3.60\%$
test_unbind_td 0.4453ms 40.0078μs 24.9951 KOps/s 24.0011 KOps/s $\color{#35bf28}+4.14\%$
test_split_pytree 53.7010μs 34.8674μs 28.6801 KOps/s 26.6072 KOps/s $\textbf{\color{#35bf28}+7.79\%}$
test_split_td 0.1061ms 38.8299μs 25.7534 KOps/s 25.2110 KOps/s $\color{#35bf28}+2.15\%$
test_add_pytree 0.2486ms 37.0713μs 26.9750 KOps/s 25.6884 KOps/s $\textbf{\color{#35bf28}+5.01\%}$
test_add_td 83.4010μs 50.5449μs 19.7844 KOps/s 20.1252 KOps/s $\color{#d91a1a}-1.69\%$
test_distributed 0.2899ms 66.0367μs 15.1431 KOps/s 14.6122 KOps/s $\color{#35bf28}+3.63\%$
test_tdmodule 32.8300μs 15.4412μs 64.7618 KOps/s 73.8558 KOps/s $\textbf{\color{#d91a1a}-12.31\%}$
test_tdmodule_dispatch 47.9010μs 30.5355μs 32.7487 KOps/s 36.5899 KOps/s $\textbf{\color{#d91a1a}-10.50\%}$
test_tdseq 34.4600μs 17.7622μs 56.2993 KOps/s 64.0086 KOps/s $\textbf{\color{#d91a1a}-12.04\%}$
test_tdseq_dispatch 64.6710μs 33.7406μs 29.6379 KOps/s 32.2822 KOps/s $\textbf{\color{#d91a1a}-8.19\%}$
test_instantiation_functorch 1.7781ms 1.5470ms 646.4192 Ops/s 610.0646 Ops/s $\textbf{\color{#35bf28}+5.96\%}$
test_instantiation_td 1.5615ms 1.0509ms 951.5867 Ops/s 925.0336 Ops/s $\color{#35bf28}+2.87\%$
test_exec_functorch 0.2057ms 0.1496ms 6.6826 KOps/s 6.4936 KOps/s $\color{#35bf28}+2.91\%$
test_exec_functional_call 0.3398ms 0.1395ms 7.1665 KOps/s 6.9329 KOps/s $\color{#35bf28}+3.37\%$
test_exec_td 0.1709ms 0.1384ms 7.2250 KOps/s 6.9758 KOps/s $\color{#35bf28}+3.57\%$
test_exec_td_decorator 0.7794ms 0.2129ms 4.6964 KOps/s 4.6183 KOps/s $\color{#35bf28}+1.69\%$
test_vmap_mlp_speed[True-True] 0.8617ms 0.5890ms 1.6977 KOps/s 1.7055 KOps/s $\color{#d91a1a}-0.46\%$
test_vmap_mlp_speed[True-False] 0.8185ms 0.5847ms 1.7103 KOps/s 1.6648 KOps/s $\color{#35bf28}+2.73\%$
test_vmap_mlp_speed[False-True] 0.7309ms 0.5141ms 1.9451 KOps/s 1.8398 KOps/s $\textbf{\color{#35bf28}+5.72\%}$
test_vmap_mlp_speed[False-False] 0.7210ms 0.5135ms 1.9473 KOps/s 1.8400 KOps/s $\textbf{\color{#35bf28}+5.83\%}$
test_vmap_mlp_speed_decorator[True-True] 0.8445ms 0.6491ms 1.5407 KOps/s 1.3562 KOps/s $\textbf{\color{#35bf28}+13.60\%}$
test_vmap_mlp_speed_decorator[True-False] 0.9289ms 0.6471ms 1.5454 KOps/s 1.4893 KOps/s $\color{#35bf28}+3.77\%$
test_vmap_mlp_speed_decorator[False-True] 0.8554ms 0.5847ms 1.7102 KOps/s 1.6490 KOps/s $\color{#35bf28}+3.71\%$
test_vmap_mlp_speed_decorator[False-False] 0.7950ms 0.5706ms 1.7525 KOps/s 1.6521 KOps/s $\textbf{\color{#35bf28}+6.08\%}$
test_vmap_transformer_speed[True-True] 7.8635ms 7.6356ms 130.9659 Ops/s 128.0857 Ops/s $\color{#35bf28}+2.25\%$
test_vmap_transformer_speed[True-False] 8.0721ms 7.6168ms 131.2895 Ops/s 125.2724 Ops/s $\color{#35bf28}+4.80\%$
test_vmap_transformer_speed[False-True] 7.7256ms 7.5313ms 132.7784 Ops/s 126.7172 Ops/s $\color{#35bf28}+4.78\%$
test_vmap_transformer_speed[False-False] 7.7883ms 7.5282ms 132.8339 Ops/s 127.7647 Ops/s $\color{#35bf28}+3.97\%$
test_vmap_transformer_speed_decorator[True-True] 19.4684ms 18.8277ms 53.1131 Ops/s 52.6337 Ops/s $\color{#35bf28}+0.91\%$
test_vmap_transformer_speed_decorator[True-False] 19.2824ms 18.6161ms 53.7169 Ops/s 52.3633 Ops/s $\color{#35bf28}+2.59\%$
test_vmap_transformer_speed_decorator[False-True] 18.7638ms 18.4193ms 54.2909 Ops/s 53.0988 Ops/s $\color{#35bf28}+2.25\%$
test_vmap_transformer_speed_decorator[False-False] 18.9552ms 18.3476ms 54.5030 Ops/s 53.0869 Ops/s $\color{#35bf28}+2.67\%$
test_to_module_speed[True] 2.9451ms 1.5524ms 644.1564 Ops/s 643.5436 Ops/s $\color{#35bf28}+0.10\%$
test_to_module_speed[False] 1.9805ms 1.5332ms 652.2299 Ops/s 651.5712 Ops/s $\color{#35bf28}+0.10\%$
test_tc_init 44.4110μs 27.2293μs 36.7252 KOps/s 48.5333 KOps/s $\textbf{\color{#d91a1a}-24.33\%}$
test_tc_init_nested 0.2594ms 55.4505μs 18.0341 KOps/s 22.9071 KOps/s $\textbf{\color{#d91a1a}-21.27\%}$
test_tc_first_layer_tensor 5.1906μs 0.3556μs 2.8123 MOps/s 2.7490 MOps/s $\color{#35bf28}+2.30\%$
test_tc_first_layer_nontensor 3.2062μs 0.3879μs 2.5778 MOps/s 2.5570 MOps/s $\color{#35bf28}+0.81\%$
test_tc_second_layer_tensor 44.6008μs 0.9649μs 1.0363 MOps/s 936.2695 KOps/s $\textbf{\color{#35bf28}+10.69\%}$
test_tc_second_layer_nontensor 10.9822μs 0.7963μs 1.2558 MOps/s 1.2241 MOps/s $\color{#35bf28}+2.59\%$
test_unbind 0.1124s 6.7782ms 147.5325 Ops/s 141.3793 Ops/s $\color{#35bf28}+4.35\%$
test_full_like 11.6937ms 11.0402ms 90.5782 Ops/s 75.6435 Ops/s $\textbf{\color{#35bf28}+19.74\%}$
test_zeros_like 8.2667ms 7.7743ms 128.6297 Ops/s 127.0539 Ops/s $\color{#35bf28}+1.24\%$
test_ones_like 8.5316ms 7.8983ms 126.6099 Ops/s 128.3566 Ops/s $\color{#d91a1a}-1.36\%$
test_clone 9.4258ms 9.1493ms 109.2978 Ops/s 109.0509 Ops/s $\color{#35bf28}+0.23\%$
test_squeeze 60.9410μs 11.3117μs 88.4038 KOps/s 88.7251 KOps/s $\color{#d91a1a}-0.36\%$
test_unsqueeze 96.3110μs 51.8558μs 19.2843 KOps/s 18.7260 KOps/s $\color{#35bf28}+2.98\%$
test_split 0.8669ms 97.4739μs 10.2592 KOps/s 9.9372 KOps/s $\color{#35bf28}+3.24\%$
test_permute 0.1524ms 0.1091ms 9.1669 KOps/s 8.7706 KOps/s $\color{#35bf28}+4.52\%$
test_stack 26.9726ms 26.4589ms 37.7945 Ops/s 37.7957 Ops/s $-0.00\%$
test_cat 26.4930ms 26.3789ms 37.9091 Ops/s 37.8367 Ops/s $\color{#35bf28}+0.19\%$
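
Some rows mix units between columns (for example test_tc_second_layer_tensor reports MOps/s for the PR and KOps/s for HEAD), so values need to be normalized before comparing them. The sketch below is one way to do that, and it also checks that Ops is roughly the reciprocal of Mean. The scale tables and helper names are hypothetical, and small mismatches are expected because the table shows rounded values.

```python
# Multipliers to convert the table's units to base units (assumed, not from the bot).
OPS_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}
TIME_SCALE = {"s": 1.0, "ms": 1e-3, "μs": 1e-6}


def to_ops_per_second(value: float, unit: str) -> float:
    """Convert a throughput cell such as (1.0363, 'MOps/s') to operations per second."""
    return value * OPS_SCALE[unit]


def to_seconds(value: float, unit: str) -> float:
    """Convert a timing cell such as (0.9649, 'μs') to seconds."""
    return value * TIME_SCALE[unit]


# GPU row test_tc_second_layer_tensor: Ops 1.0363 MOps/s, HEAD 936.2695 KOps/s.
ops_pr = to_ops_per_second(1.0363, "MOps/s")
ops_head = to_ops_per_second(936.2695, "KOps/s")
change = (ops_pr - ops_head) / ops_head * 100.0

# Same row, Mean 0.9649μs: throughput should be close to 1 / mean.
ops_from_mean = 1.0 / to_seconds(0.9649, "μs")

print(f"change: {change:+.2f}%")
print(f"ops from mean: {ops_from_mean:.0f} ops/s, reported: {ops_pr:.0f} ops/s")
```

The computed change lands within rounding error of the reported value, which is why comparisons against the table are best made with a tolerance rather than exact equality.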
