{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":522059239,"defaultBranch":"main","name":"streamvbyte64","ownerLogin":"mccullocht","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2022-08-06T21:39:27.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/87714736?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1690340381.0","currentOid":""},"activityList":{"items":[{"before":"595d13f0fb8e8b5a487b482cdf11137e4c86f142","after":null,"ref":"refs/heads/version0.2","pushedAt":"2023-07-26T02:59:41.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"}},{"before":"fc0b87c9d1b34ecf83702aaf26cec4901c997525","after":"ac86816bc9db4ff5deec213e11da74701bba61b0","ref":"refs/heads/main","pushedAt":"2023-07-26T02:59:40.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"Bump to version 0.2.0 (#9)\n\nREADME touchups since there are now `x86_64` implementations.\r\n\r\nUpgrade criterion.","shortMessageHtmlLink":"Bump to version 0.2.0 (#9)"}},{"before":null,"after":"595d13f0fb8e8b5a487b482cdf11137e4c86f142","ref":"refs/heads/version0.2","pushedAt":"2023-07-26T02:50:34.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"bump to version 0.2 with minor readme updates","shortMessageHtmlLink":"bump to version 0.2 with minor readme 
updates"}},{"before":"4ebddf7c021a35c2aa22b7e77d7beb4ab92d2594","after":null,"ref":"refs/heads/sse41-0124","pushedAt":"2023-07-22T20:19:50.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"}},{"before":"6e88a578cd7f6e3f0c100235f8d7ec9e7ef066d4","after":"fc0b87c9d1b34ecf83702aaf26cec4901c997525","ref":"refs/heads/main","pushedAt":"2023-07-22T20:19:49.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"SSE4.1 implementation for `Coder0124` (#8)\n\nThis is very similar to `Coder1234` with a slightly different encoding\r\nfunction. There is almost certainly an opportunity\r\nhere for refactoring to reduce code duplication.\r\n\r\n```\r\nCoder0124/encode/1 time: [495.63 ns 505.26 ns 517.97 ns] \r\n thrpt: [1.9769 Gelem/s 2.0267 Gelem/s 2.0661 Gelem/s]\r\n change:\r\n time: [-63.340% -61.981% -60.637%] (p = 0.00 < 0.05)\r\n thrpt: [+154.04% +163.03% +172.78%]\r\n Performance has improved.\r\nCoder0124/encode_deltas/1 \r\n time: [489.74 ns 504.03 ns 520.51 ns]\r\n thrpt: [1.9673 Gelem/s 2.0316 Gelem/s 2.0909 Gelem/s]\r\n change:\r\n time: [-67.880% -66.213% -64.493%] (p = 0.00 < 0.05)\r\n thrpt: [+181.63% +195.97% +211.33%]\r\n Performance has improved.\r\nCoder0124/decode/1 time: [302.90 ns 312.53 ns 324.79 ns] \r\n thrpt: [3.1528 Gelem/s 3.2765 Gelem/s 3.3806 Gelem/s]\r\n change:\r\n time: [-65.290% -63.181% -60.860%] (p = 0.00 < 0.05)\r\n thrpt: [+155.49% +171.60% +188.11%]\r\n Performance has improved.\r\nCoder0124/decode_deltas/1 \r\n time: [449.78 ns 461.84 ns 477.82 ns]\r\n thrpt: [2.1431 Gelem/s 2.2172 Gelem/s 2.2766 Gelem/s]\r\n change:\r\n time: [-53.030% -50.717% -48.166%] (p = 0.00 < 0.05)\r\n thrpt: [+92.923% +102.91% +112.90%]\r\n 
Performance has improved.\r\nCoder0124/skip_deltas/1 time: [354.54 ns 365.14 ns 378.66 ns] \r\n thrpt: [2.7043 Gelem/s 2.8044 Gelem/s 2.8882 Gelem/s]\r\n change:\r\n time: [-65.425% -63.972% -62.201%] (p = 0.00 < 0.05)\r\n thrpt: [+164.56% +177.56% +189.23%]\r\n Performance has improved.\r\nCoder0124/data_len/1 time: [87.136 ns 90.619 ns 94.750 ns] \r\n thrpt: [10.807 Gelem/s 11.300 Gelem/s 11.752 Gelem/s]\r\n change:\r\n time: [-6.5989% -1.7206% +3.0022%] (p = 0.51 > 0.05)\r\n thrpt: [-2.9147% +1.7507% +7.0652%]\r\n No change in performance detected.\r\nCoder0124/encode/2 time: [425.15 ns 438.14 ns 453.91 ns] \r\n thrpt: [2.2560 Gelem/s 2.3372 Gelem/s 2.4085 Gelem/s]\r\n change:\r\n time: [-67.636% -66.394% -65.128%] (p = 0.00 < 0.05)\r\n thrpt: [+186.76% +197.57% +208.99%]\r\n Performance has improved.\r\nCoder0124/encode_deltas/2 \r\n time: [494.15 ns 505.12 ns 516.98 ns]\r\n thrpt: [1.9807 Gelem/s 2.0272 Gelem/s 2.0722 Gelem/s]\r\n change:\r\n time: [-69.582% -67.844% -65.909%] (p = 0.00 < 0.05)\r\n thrpt: [+193.34% +210.98% +228.76%]\r\n Performance has improved.\r\nCoder0124/decode/2 time: [298.11 ns 306.27 ns 316.48 ns] \r\n thrpt: [3.2356 Gelem/s 3.3435 Gelem/s 3.4350 Gelem/s]\r\n change:\r\n time: [-65.815% -64.659% -63.622%] (p = 0.00 < 0.05)\r\n thrpt: [+174.89% +182.96% +192.53%]\r\n Performance has improved.\r\nCoder0124/decode_deltas/2 \r\n time: [519.41 ns 570.41 ns 624.84 ns]\r\n thrpt: [1.6388 Gelem/s 1.7952 Gelem/s 1.9715 Gelem/s]\r\n change:\r\n time: [-49.404% -46.497% -43.587%] (p = 0.00 < 0.05)\r\n thrpt: [+77.265% +86.906% +97.645%]\r\n Performance has improved.\r\nCoder0124/skip_deltas/2 time: [349.51 ns 358.46 ns 369.40 ns] \r\n thrpt: [2.7721 Gelem/s 2.8566 Gelem/s 2.9298 Gelem/s]\r\n change:\r\n time: [-66.624% -65.375% -64.128%] (p = 0.00 < 0.05)\r\n thrpt: [+178.77% +188.81% +199.61%]\r\n Performance has improved.\r\nCoder0124/data_len/2 time: [87.608 ns 91.542 ns 96.391 ns] \r\n thrpt: [10.623 Gelem/s 11.186 Gelem/s 11.688 Gelem/s]\r\n 
change:\r\n time: [-1.1902% +4.2728% +12.216%] (p = 0.26 > 0.05)\r\n thrpt: [-10.886% -4.0978% +1.2045%]\r\n No change in performance detected.\r\nCoder0124/encode/4 time: [428.08 ns 447.48 ns 471.31 ns] \r\n thrpt: [2.1727 Gelem/s 2.2884 Gelem/s 2.3921 Gelem/s]\r\n change:\r\n time: [-66.531% -65.270% -63.831%] (p = 0.00 < 0.05)\r\n thrpt: [+176.48% +187.94% +198.78%]\r\n Performance has improved.\r\nCoder0124/encode_deltas/4 \r\n time: [502.98 ns 519.48 ns 541.03 ns]\r\n thrpt: [1.8927 Gelem/s 1.9712 Gelem/s 2.0359 Gelem/s]\r\n change:\r\n time: [-65.173% -64.111% -62.987%] (p = 0.00 < 0.05)\r\n thrpt: [+170.18% +178.64% +187.14%]\r\n Performance has improved.\r\nCoder0124/decode/4 time: [298.85 ns 308.68 ns 320.81 ns] \r\n thrpt: [3.1919 Gelem/s 3.3173 Gelem/s 3.4264 Gelem/s]\r\n change:\r\n time: [-66.938% -65.830% -64.676%] (p = 0.00 < 0.05)\r\n thrpt: [+183.09% +192.65% +202.46%]\r\n Performance has improved.\r\nCoder0124/decode_deltas/4 \r\n time: [454.63 ns 469.77 ns 490.02 ns]\r\n thrpt: [2.0897 Gelem/s 2.1798 Gelem/s 2.2524 Gelem/s]\r\n change:\r\n time: [-52.498% -50.700% -48.578%] (p = 0.00 < 0.05)\r\n thrpt: [+94.468% +102.84% +110.52%]\r\n Performance has improved.\r\nCoder0124/skip_deltas/4 time: [344.65 ns 350.56 ns 358.09 ns] \r\n thrpt: [2.8596 Gelem/s 2.9211 Gelem/s 2.9712 Gelem/s]\r\n change:\r\n time: [-66.208% -65.153% -63.689%] (p = 0.00 < 0.05)\r\n thrpt: [+175.40% +186.97% +195.93%]\r\n Performance has improved.\r\nCoder0124/data_len/4 time: [86.205 ns 88.624 ns 91.513 ns] \r\n thrpt: [11.190 Gelem/s 11.554 Gelem/s 11.879 Gelem/s]\r\n change:\r\n time: [-6.6957% -2.3136% +2.0511%] (p = 0.32 > 0.05)\r\n thrpt: [-2.0099% +2.3684% +7.1762%]\r\n No change in performance detected.\r\n```","shortMessageHtmlLink":"SSE4.1 implementation for Coder0124 
(#8)"}},{"before":null,"after":"4ebddf7c021a35c2aa22b7e77d7beb4ab92d2594","ref":"refs/heads/sse41-0124","pushedAt":"2023-07-22T20:11:53.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"completed implementation","shortMessageHtmlLink":"completed implementation"}},{"before":null,"after":"53528dc5966bdfdbc8e9b6cb847d0a12a138845f","ref":"refs/heads/avx2-1248","pushedAt":"2023-07-22T17:53:58.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"delta coding. this all performs too poorly to merge though","shortMessageHtmlLink":"delta coding. this all performs too poorly to merge though"}},{"before":"ec01a59df758a9b8054f5cd51b5af80a38d4a4f0","after":null,"ref":"refs/heads/sse4.1-1248","pushedAt":"2023-07-18T04:21:11.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"}},{"before":"51321c5eb886b98f55aae3508adc272efdcb8e2a","after":"6e88a578cd7f6e3f0c100235f8d7ec9e7ef066d4","ref":"refs/heads/main","pushedAt":"2023-07-18T04:21:10.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"Add an SSE4.1 accelerated implementation of `Coder1248` (#7)\n\nThis requires SSE4.1 instead of SSSE3 like `Coder1234` because of a few\r\ninstructions used in `encode()`. 
Otherwise this\r\nis pretty similar but requires dividing each group in half (2 instead of\r\n4) in order to perform all operations ~doubling the\r\nnumber of instructions run.\r\n\r\n```\r\nCoder1248/encode/1 time: [688.43 ns 712.29 ns 742.73 ns] \r\n thrpt: [1.3787 Gelem/s 1.4376 Gelem/s 1.4874 Gelem/s]\r\n change:\r\n time: [-59.800% -57.094% -54.556%] (p = 0.00 < 0.05)\r\n thrpt: [+120.05% +133.07% +148.76%]\r\n Performance has improved.\r\nCoder1248/encode_deltas/1 \r\n time: [846.81 ns 879.11 ns 918.99 ns]\r\n thrpt: [1.1143 Gelem/s 1.1648 Gelem/s 1.2092 Gelem/s]\r\n change:\r\n time: [-60.406% -59.050% -57.418%] (p = 0.00 < 0.05)\r\n thrpt: [+134.84% +144.20% +152.57%]\r\n Performance has improved.\r\nCoder1248/decode/1 time: [553.25 ns 569.95 ns 589.73 ns] \r\n thrpt: [1.7364 Gelem/s 1.7966 Gelem/s 1.8509 Gelem/s]\r\n change:\r\n time: [-32.638% -29.495% -26.343%] (p = 0.00 < 0.05)\r\n thrpt: [+35.765% +41.835% +48.452%]\r\n Performance has improved.\r\nCoder1248/decode_deltas/1 \r\n time: [651.41 ns 673.40 ns 700.64 ns]\r\n thrpt: [1.4615 Gelem/s 1.5206 Gelem/s 1.5720 Gelem/s]\r\n change:\r\n time: [-37.844% -33.409% -29.202%] (p = 0.00 < 0.05)\r\n thrpt: [+41.248% +50.171% +60.884%]\r\n Performance has improved.\r\nCoder1248/skip_deltas/1 time: [542.16 ns 561.37 ns 587.57 ns] \r\n thrpt: [1.7428 Gelem/s 1.8241 Gelem/s 1.8887 Gelem/s]\r\n change:\r\n time: [-42.197% -39.683% -37.328%] (p = 0.00 < 0.05)\r\n thrpt: [+59.560% +65.790% +73.002%]\r\n Performance has improved.\r\nCoder1248/data_len/1 time: [86.656 ns 88.934 ns 91.883 ns] \r\n thrpt: [11.145 Gelem/s 11.514 Gelem/s 11.817 Gelem/s]\r\n change:\r\n time: [-19.713% -15.441% -11.380%] (p = 0.00 < 0.05)\r\n thrpt: [+12.841% +18.261% +24.553%]\r\n Performance has improved.\r\nCoder1248/encode/4 time: [708.16 ns 732.35 ns 763.91 ns] \r\n thrpt: [1.3405 Gelem/s 1.3982 Gelem/s 1.4460 Gelem/s]\r\n change:\r\n time: [-76.996% -76.313% -75.636%] (p = 0.00 < 0.05)\r\n thrpt: [+310.45% +322.18% +334.71%]\r\n 
Performance has improved.\r\nCoder1248/encode_deltas/4 \r\n time: [885.67 ns 920.30 ns 960.93 ns]\r\n thrpt: [1.0656 Gelem/s 1.1127 Gelem/s 1.1562 Gelem/s]\r\n change:\r\n time: [-65.619% -63.226% -59.395%] (p = 0.00 < 0.05)\r\n thrpt: [+146.28% +171.93% +190.86%]\r\n Performance has improved.\r\nCoder1248/decode/4 time: [564.93 ns 592.38 ns 628.11 ns] \r\n thrpt: [1.6303 Gelem/s 1.7286 Gelem/s 1.8126 Gelem/s]\r\n change:\r\n time: [-36.025% -30.883% -25.997%] (p = 0.00 < 0.05)\r\n thrpt: [+35.129% +44.681% +56.311%]\r\n Performance has improved.\r\nCoder1248/decode_deltas/4 \r\n time: [664.58 ns 689.44 ns 721.52 ns]\r\n thrpt: [1.4192 Gelem/s 1.4853 Gelem/s 1.5408 Gelem/s]\r\n change:\r\n time: [-30.291% -27.265% -24.007%] (p = 0.00 < 0.05)\r\n thrpt: [+31.590% +37.485% +43.454%]\r\n Performance has improved.\r\nCoder1248/skip_deltas/4 time: [565.87 ns 592.91 ns 637.67 ns] \r\n thrpt: [1.6059 Gelem/s 1.7271 Gelem/s 1.8096 Gelem/s]\r\n change:\r\n time: [-40.827% -36.088% -30.983%] (p = 0.00 < 0.05)\r\n thrpt: [+44.891% +56.464% +68.997%]\r\n Performance has improved.\r\nCoder1248/data_len/4 time: [87.497 ns 91.482 ns 96.599 ns] \r\n thrpt: [10.601 Gelem/s 11.194 Gelem/s 11.703 Gelem/s]\r\n change:\r\n time: [-15.891% -12.125% -8.3843%] (p = 0.00 < 0.05)\r\n thrpt: [+9.1516% +13.798% +18.893%]\r\n Performance has improved.\r\nCoder1248/encode/8 time: [691.03 ns 708.77 ns 730.76 ns] \r\n thrpt: [1.4013 Gelem/s 1.4448 Gelem/s 1.4818 Gelem/s]\r\n change:\r\n time: [-74.611% -73.773% -72.896%] (p = 0.00 < 0.05)\r\n thrpt: [+268.95% +281.28% +293.88%]\r\n Performance has improved.\r\nCoder1248/encode_deltas/8 \r\n time: [898.42 ns 933.04 ns 973.62 ns]\r\n thrpt: [1.0517 Gelem/s 1.0975 Gelem/s 1.1398 Gelem/s]\r\n change:\r\n time: [-66.055% -63.955% -61.969%] (p = 0.00 < 0.05)\r\n thrpt: [+162.94% +177.43% +194.60%]\r\n Performance has improved.\r\nCoder1248/decode/8 time: [554.54 ns 571.99 ns 596.05 ns] \r\n thrpt: [1.7180 Gelem/s 1.7902 Gelem/s 1.8466 Gelem/s]\r\n 
change:\r\n time: [-35.713% -32.679% -29.763%] (p = 0.00 < 0.05)\r\n thrpt: [+42.375% +48.542% +55.552%]\r\n Performance has improved.\r\nCoder1248/decode_deltas/8 \r\n time: [660.32 ns 688.83 ns 729.63 ns]\r\n thrpt: [1.4035 Gelem/s 1.4866 Gelem/s 1.5508 Gelem/s]\r\n change:\r\n time: [-30.349% -27.541% -24.244%] (p = 0.00 < 0.05)\r\n thrpt: [+32.003% +38.010% +43.574%]\r\n Performance has improved.\r\nCoder1248/skip_deltas/8 time: [560.27 ns 591.64 ns 636.58 ns] \r\n thrpt: [1.6086 Gelem/s 1.7308 Gelem/s 1.8277 Gelem/s]\r\n change:\r\n time: [-44.146% -40.720% -37.144%] (p = 0.00 < 0.05)\r\n thrpt: [+59.093% +68.691% +79.037%]\r\n Performance has improved.\r\nCoder1248/data_len/8 time: [86.919 ns 91.829 ns 99.446 ns] \r\n thrpt: [10.297 Gelem/s 11.151 Gelem/s 11.781 Gelem/s]\r\n change:\r\n time: [-14.295% -9.5187% -3.7258%] (p = 0.00 < 0.05)\r\n thrpt: [+3.8699% +10.520% +16.679%]\r\n Performance has improved.\r\n```","shortMessageHtmlLink":"Add an SSE4.1 accelerated implementation of Coder1248 (#7)"}},{"before":null,"after":"ec01a59df758a9b8054f5cd51b5af80a38d4a4f0","ref":"refs/heads/sse4.1-1248","pushedAt":"2023-07-18T04:12:33.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"fix up TODO","shortMessageHtmlLink":"fix up TODO"}},{"before":"f7884153f9670175bbc9dee5b6ee79ae88e8e2c5","after":null,"ref":"refs/heads/shuffle-table-arch","pushedAt":"2023-07-16T22:55:29.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"mccullocht","name":"Trevor 
McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"}},{"before":"b75c28f34dc54d65d8eb00719b6e6c817760e72b","after":"51321c5eb886b98f55aae3508adc272efdcb8e2a","ref":"refs/heads/main","pushedAt":"2023-07-16T22:55:29.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"Refactor shuffle table generation to allow better cross-arch reuse (#6)\n\nThe current scheme works great for aarch64 and x86_64 ssse3 but would\r\nleave something to be desired for wider\r\ntypes on x86_64 (ssse3 for coder1248 or avx2/avx512). This also removes\r\nthe macros that define table generation\r\nfunctions in preference to a slightly clunkier const generics\r\nimplementation.","shortMessageHtmlLink":"Refactor shuffle table generation to allow better cross-arch reuse (#6)"}},{"before":"fb1318eb7a7a946f5b39ef4fcf04e3bfe2df23a8","after":"f7884153f9670175bbc9dee5b6ee79ae88e8e2c5","ref":"refs/heads/shuffle-table-arch","pushedAt":"2023-07-16T22:54:11.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"clippy fix","shortMessageHtmlLink":"clippy fix"}},{"before":"b240518108a7070fadb07361ad00fe912ae3ea64","after":"fb1318eb7a7a946f5b39ef4fcf04e3bfe2df23a8","ref":"refs/heads/shuffle-table-arch","pushedAt":"2023-07-16T22:49:25.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"fixup x86 code","shortMessageHtmlLink":"fixup x86 
code"}},{"before":null,"after":"b240518108a7070fadb07361ad00fe912ae3ea64","ref":"refs/heads/shuffle-table-arch","pushedAt":"2023-07-16T22:44:02.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"refactor shuffle table generation to factor better for arch differences","shortMessageHtmlLink":"refactor shuffle table generation to factor better for arch differences"}},{"before":"3d4cccb4c5086395b96b7483ed2a613c7afb6855","after":null,"ref":"refs/heads/default-ssse3","pushedAt":"2023-07-16T16:53:03.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"}},{"before":"d7e9fbd35ec79cdedec15bbb163f727df67148e4","after":"b75c28f34dc54d65d8eb00719b6e6c817760e72b","ref":"refs/heads/main","pushedAt":"2023-07-16T16:53:02.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"Set configuration to target ssse3 by default on x86_64 targets (#5)\n\nTarget this feature as we'd like to generate native instructions\r\n(instead of emulated calls) in all cases. There is a runtime\r\nswitch to prevent use of ssse3 code paths if those instructions are not\r\navailable. 
This also ensures that we will run\r\nappropriate tests on hosts that have the feature.","shortMessageHtmlLink":"Set configuration to target ssse3 by default on x86_64 targets (#5)"}},{"before":null,"after":"3d4cccb4c5086395b96b7483ed2a613c7afb6855","ref":"refs/heads/default-ssse3","pushedAt":"2023-07-16T16:48:47.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"fix .cargo config","shortMessageHtmlLink":"fix .cargo config"}},{"before":"64efed1b422ba92feea89db72aa4e9e56e0cfb60","after":"87bf40c3a20afda084116b0e769c55e77229e524","ref":"refs/heads/encode-1248-narrow","pushedAt":"2023-07-14T23:13:07.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"slightly faster version where we use cmpgt instead of min","shortMessageHtmlLink":"slightly faster version where we use cmpgt instead of min"}},{"before":"6be7bfba7ead2bc0a87203d4d069d1e42127a149","after":null,"ref":"refs/heads/avx512","pushedAt":"2023-07-14T18:33:28.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"}},{"before":null,"after":"64efed1b422ba92feea89db72aa4e9e56e0cfb60","ref":"refs/heads/encode-1248-narrow","pushedAt":"2023-07-14T18:02:01.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"compute_tags() by narrowing approach instead of clz\n\nthis is roughly 15% slower than the existing approach _but_ could easily\nbe ported to 
ssse3.","shortMessageHtmlLink":"compute_tags() by narrowing approach instead of clz"}},{"before":"d7e9fbd35ec79cdedec15bbb163f727df67148e4","after":"6be7bfba7ead2bc0a87203d4d069d1e42127a149","ref":"refs/heads/avx512","pushedAt":"2023-07-09T03:08:50.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"avx512 instrinsics are not stabilized","shortMessageHtmlLink":"avx512 instrinsics are not stabilized"}},{"before":null,"after":"d7e9fbd35ec79cdedec15bbb163f727df67148e4","ref":"refs/heads/avx512","pushedAt":"2023-07-08T21:54:26.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"Add an SSSE3 implementation for `Coder1234` (#4)\n\nThis uses a similar approach to the neon implementation using pshufb\r\ninstead of vtbl instructions. Encoding looks very\r\ndifferent as SSSE3 and earlier vector extensions do not have clz/lzcnt\r\ninstructions. SSSE3 is also not default enabled on\r\nx86_64 builds so this is only enabled with the SSSE3 target feature --\r\nthere were problems in testing where ssse3 was\r\nnot enabled in code generation but triggered at runtime and was replaced\r\nwith some sort of emulation for shuffle and\r\nalignr intrinsics at a significant penalty.\r\n\r\nThis was developed on a github codespace instance so YMMV on benchmark\r\nnumbers. 
Improvements are substantial\r\nfor every benchmarked code path but `data_len` (which does not feature\r\nany optimization).\r\n```\r\nRUSTFLAGS=\"-Ctarget-feature=+ssse3\" cargo bench --bench=streamvbyte -- --baseline=main 'Coder1234/'\r\nCoder1234/encode/1 time: [414.00 ns 425.79 ns 441.38 ns] \r\n thrpt: [2.3200 Gelem/s 2.4049 Gelem/s 2.4734 Gelem/s]\r\n change:\r\n time: [-72.014% -71.049% -70.189%] (p = 0.00 < 0.05)\r\n thrpt: [+235.45% +245.41% +257.32%]\r\n Performance has improved.\r\nCoder1234/encode_deltas/1 \r\n time: [511.72 ns 521.31 ns 532.06 ns]\r\n thrpt: [1.9246 Gelem/s 1.9643 Gelem/s 2.0011 Gelem/s]\r\n change:\r\n time: [-67.935% -66.592% -65.348%] (p = 0.00 < 0.05)\r\n thrpt: [+188.58% +199.33% +211.87%]\r\n Performance has improved.\r\nCoder1234/decode/1 time: [338.60 ns 344.23 ns 351.39 ns] \r\n thrpt: [2.9141 Gelem/s 2.9748 Gelem/s 3.0242 Gelem/s]\r\n change:\r\n time: [-62.644% -60.490% -58.517%] (p = 0.00 < 0.05)\r\n thrpt: [+141.06% +153.10% +167.70%]\r\n Performance has improved.\r\nCoder1234/decode_deltas/1 \r\n time: [495.19 ns 507.90 ns 523.47 ns]\r\n thrpt: [1.9562 Gelem/s 2.0162 Gelem/s 2.0679 Gelem/s]\r\n change:\r\n time: [-47.585% -45.826% -44.073%] (p = 0.00 < 0.05)\r\n thrpt: [+78.804% +84.592% +90.784%]\r\n Performance has improved.\r\nCoder1234/skip_deltas/1 time: [392.73 ns 404.68 ns 420.36 ns] \r\n thrpt: [2.4360 Gelem/s 2.5304 Gelem/s 2.6074 Gelem/s]\r\n change:\r\n time: [-61.945% -60.753% -59.573%] (p = 0.00 < 0.05)\r\n thrpt: [+147.36% +154.80% +162.78%]\r\n Performance has improved.\r\nCoder1234/encode/2 time: [413.81 ns 426.02 ns 440.83 ns] \r\n thrpt: [2.3229 Gelem/s 2.4037 Gelem/s 2.4746 Gelem/s]\r\n change:\r\n time: [-71.303% -70.341% -69.309%] (p = 0.00 < 0.05)\r\n thrpt: [+225.83% +237.17% +248.47%]\r\n Performance has improved.\r\nCoder1234/encode_deltas/2 \r\n time: [496.66 ns 511.79 ns 531.95 ns]\r\n thrpt: [1.9250 Gelem/s 2.0008 Gelem/s 2.0618 Gelem/s]\r\n change:\r\n time: [-70.600% -69.049% -67.483%] (p = 
0.00 < 0.05)\r\n thrpt: [+207.53% +223.09% +240.14%]\r\n Performance has improved.\r\nCoder1234/decode/2 time: [327.11 ns 336.62 ns 348.75 ns] \r\n thrpt: [2.9362 Gelem/s 3.0420 Gelem/s 3.1304 Gelem/s]\r\n change:\r\n time: [-62.626% -61.076% -59.326%] (p = 0.00 < 0.05)\r\n thrpt: [+145.86% +156.91% +167.56%]\r\n Performance has improved.\r\nCoder1234/decode_deltas/2 \r\n time: [479.86 ns 493.10 ns 509.24 ns]\r\n thrpt: [2.0108 Gelem/s 2.0767 Gelem/s 2.1340 Gelem/s]\r\n change:\r\n time: [-54.108% -51.289% -47.936%] (p = 0.00 < 0.05)\r\n thrpt: [+92.071% +105.29% +117.90%]\r\n Performance has improved.\r\nCoder1234/skip_deltas/2 time: [379.07 ns 385.22 ns 392.71 ns] \r\n thrpt: [2.6076 Gelem/s 2.6582 Gelem/s 2.7014 Gelem/s]\r\n change:\r\n time: [-64.025% -62.386% -60.994%] (p = 0.00 < 0.05)\r\n thrpt: [+156.37% +165.86% +177.97%]\r\n Performance has improved.\r\nCoder1234/encode/4 time: [418.62 ns 432.87 ns 451.15 ns] \r\n thrpt: [2.2698 Gelem/s 2.3656 Gelem/s 2.4461 Gelem/s]\r\n change:\r\n time: [-69.173% -66.468% -62.986%] (p = 0.00 < 0.05)\r\n thrpt: [+170.17% +198.22% +224.39%]\r\n Performance has improved.\r\nCoder1234/encode_deltas/4 \r\n time: [503.00 ns 517.36 ns 534.48 ns]\r\n thrpt: [1.9159 Gelem/s 1.9793 Gelem/s 2.0358 Gelem/s]\r\n change:\r\n time: [-67.702% -66.635% -65.383%] (p = 0.00 < 0.05)\r\n thrpt: [+188.87% +199.72% +209.62%]\r\n Performance has improved.\r\nCoder1234/decode/4 time: [314.98 ns 322.51 ns 331.85 ns] \r\n thrpt: [3.0857 Gelem/s 3.1751 Gelem/s 3.2510 Gelem/s]\r\n change:\r\n time: [-66.142% -64.236% -62.387%] (p = 0.00 < 0.05)\r\n thrpt: [+165.86% +179.61% +195.35%]\r\n Performance has improved.\r\nCoder1234/decode_deltas/4 \r\n time: [473.51 ns 484.47 ns 498.28 ns]\r\n thrpt: [2.0551 Gelem/s 2.1136 Gelem/s 2.1626 Gelem/s]\r\n change:\r\n time: [-50.860% -48.787% -46.571%] (p = 0.00 < 0.05)\r\n thrpt: [+87.164% +95.263% +103.50%]\r\n Performance has improved.\r\nCoder1234/skip_deltas/4 time: [363.59 ns 370.52 ns 378.94 ns] \r\n 
thrpt: [2.7023 Gelem/s 2.7637 Gelem/s 2.8164 Gelem/s]\r\n change:\r\n time: [-64.791% -63.940% -63.090%] (p = 0.00 < 0.05)\r\n thrpt: [+170.93% +177.32% +184.01%]\r\n Performance has improved.\r\n```","shortMessageHtmlLink":"Add an SSSE3 implementation for Coder1234 (#4)"}},{"before":"307da545e3a5c885e04913bc7eb252bee4191005","after":null,"ref":"refs/heads/fix-non-aarch64-build","pushedAt":"2023-07-08T21:54:16.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"}},{"before":"a0ce90b2126a6aaa3250bfcdcb73a2ad5c943876","after":null,"ref":"refs/heads/ssse3","pushedAt":"2023-07-07T05:08:34.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"}},{"before":"bd55c59881419982672fb9ef86176b4412e5b6a3","after":"d7e9fbd35ec79cdedec15bbb163f727df67148e4","ref":"refs/heads/main","pushedAt":"2023-07-07T05:08:33.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"Add an SSSE3 implementation for `Coder1234` (#4)\n\nThis uses a similar approach to the neon implementation using pshufb\r\ninstead of vtbl instructions. Encoding looks very\r\ndifferent as SSSE3 and earlier vector extensions do not have clz/lzcnt\r\ninstructions. 
SSSE3 is also not default enabled on\r\nx86_64 builds so this is only enabled with the SSSE3 target feature --\r\nthere were problems in testing where ssse3 was\r\nnot enabled in code generation but triggered at runtime and was replaced\r\nwith some sort of emulation for shuffle and\r\nalignr intrinsics at a significant penalty.\r\n\r\nThis was developed on a github codespace instance so YMMV on benchmark\r\nnumbers. Improvements are substantial\r\nfor every benchmarked code path but `data_len` (which does not feature\r\nany optimization).\r\n```\r\nRUSTFLAGS=\"-Ctarget-feature=+ssse3\" cargo bench --bench=streamvbyte -- --baseline=main 'Coder1234/'\r\nCoder1234/encode/1 time: [414.00 ns 425.79 ns 441.38 ns] \r\n thrpt: [2.3200 Gelem/s 2.4049 Gelem/s 2.4734 Gelem/s]\r\n change:\r\n time: [-72.014% -71.049% -70.189%] (p = 0.00 < 0.05)\r\n thrpt: [+235.45% +245.41% +257.32%]\r\n Performance has improved.\r\nCoder1234/encode_deltas/1 \r\n time: [511.72 ns 521.31 ns 532.06 ns]\r\n thrpt: [1.9246 Gelem/s 1.9643 Gelem/s 2.0011 Gelem/s]\r\n change:\r\n time: [-67.935% -66.592% -65.348%] (p = 0.00 < 0.05)\r\n thrpt: [+188.58% +199.33% +211.87%]\r\n Performance has improved.\r\nCoder1234/decode/1 time: [338.60 ns 344.23 ns 351.39 ns] \r\n thrpt: [2.9141 Gelem/s 2.9748 Gelem/s 3.0242 Gelem/s]\r\n change:\r\n time: [-62.644% -60.490% -58.517%] (p = 0.00 < 0.05)\r\n thrpt: [+141.06% +153.10% +167.70%]\r\n Performance has improved.\r\nCoder1234/decode_deltas/1 \r\n time: [495.19 ns 507.90 ns 523.47 ns]\r\n thrpt: [1.9562 Gelem/s 2.0162 Gelem/s 2.0679 Gelem/s]\r\n change:\r\n time: [-47.585% -45.826% -44.073%] (p = 0.00 < 0.05)\r\n thrpt: [+78.804% +84.592% +90.784%]\r\n Performance has improved.\r\nCoder1234/skip_deltas/1 time: [392.73 ns 404.68 ns 420.36 ns] \r\n thrpt: [2.4360 Gelem/s 2.5304 Gelem/s 2.6074 Gelem/s]\r\n change:\r\n time: [-61.945% -60.753% -59.573%] (p = 0.00 < 0.05)\r\n thrpt: [+147.36% +154.80% +162.78%]\r\n Performance has improved.\r\nCoder1234/encode/2 
time: [413.81 ns 426.02 ns 440.83 ns] \r\n thrpt: [2.3229 Gelem/s 2.4037 Gelem/s 2.4746 Gelem/s]\r\n change:\r\n time: [-71.303% -70.341% -69.309%] (p = 0.00 < 0.05)\r\n thrpt: [+225.83% +237.17% +248.47%]\r\n Performance has improved.\r\nCoder1234/encode_deltas/2 \r\n time: [496.66 ns 511.79 ns 531.95 ns]\r\n thrpt: [1.9250 Gelem/s 2.0008 Gelem/s 2.0618 Gelem/s]\r\n change:\r\n time: [-70.600% -69.049% -67.483%] (p = 0.00 < 0.05)\r\n thrpt: [+207.53% +223.09% +240.14%]\r\n Performance has improved.\r\nCoder1234/decode/2 time: [327.11 ns 336.62 ns 348.75 ns] \r\n thrpt: [2.9362 Gelem/s 3.0420 Gelem/s 3.1304 Gelem/s]\r\n change:\r\n time: [-62.626% -61.076% -59.326%] (p = 0.00 < 0.05)\r\n thrpt: [+145.86% +156.91% +167.56%]\r\n Performance has improved.\r\nCoder1234/decode_deltas/2 \r\n time: [479.86 ns 493.10 ns 509.24 ns]\r\n thrpt: [2.0108 Gelem/s 2.0767 Gelem/s 2.1340 Gelem/s]\r\n change:\r\n time: [-54.108% -51.289% -47.936%] (p = 0.00 < 0.05)\r\n thrpt: [+92.071% +105.29% +117.90%]\r\n Performance has improved.\r\nCoder1234/skip_deltas/2 time: [379.07 ns 385.22 ns 392.71 ns] \r\n thrpt: [2.6076 Gelem/s 2.6582 Gelem/s 2.7014 Gelem/s]\r\n change:\r\n time: [-64.025% -62.386% -60.994%] (p = 0.00 < 0.05)\r\n thrpt: [+156.37% +165.86% +177.97%]\r\n Performance has improved.\r\nCoder1234/encode/4 time: [418.62 ns 432.87 ns 451.15 ns] \r\n thrpt: [2.2698 Gelem/s 2.3656 Gelem/s 2.4461 Gelem/s]\r\n change:\r\n time: [-69.173% -66.468% -62.986%] (p = 0.00 < 0.05)\r\n thrpt: [+170.17% +198.22% +224.39%]\r\n Performance has improved.\r\nCoder1234/encode_deltas/4 \r\n time: [503.00 ns 517.36 ns 534.48 ns]\r\n thrpt: [1.9159 Gelem/s 1.9793 Gelem/s 2.0358 Gelem/s]\r\n change:\r\n time: [-67.702% -66.635% -65.383%] (p = 0.00 < 0.05)\r\n thrpt: [+188.87% +199.72% +209.62%]\r\n Performance has improved.\r\nCoder1234/decode/4 time: [314.98 ns 322.51 ns 331.85 ns] \r\n thrpt: [3.0857 Gelem/s 3.1751 Gelem/s 3.2510 Gelem/s]\r\n change:\r\n time: [-66.142% -64.236% -62.387%] (p = 
0.00 < 0.05)\r\n thrpt: [+165.86% +179.61% +195.35%]\r\n Performance has improved.\r\nCoder1234/decode_deltas/4 \r\n time: [473.51 ns 484.47 ns 498.28 ns]\r\n thrpt: [2.0551 Gelem/s 2.1136 Gelem/s 2.1626 Gelem/s]\r\n change:\r\n time: [-50.860% -48.787% -46.571%] (p = 0.00 < 0.05)\r\n thrpt: [+87.164% +95.263% +103.50%]\r\n Performance has improved.\r\nCoder1234/skip_deltas/4 time: [363.59 ns 370.52 ns 378.94 ns] \r\n thrpt: [2.7023 Gelem/s 2.7637 Gelem/s 2.8164 Gelem/s]\r\n change:\r\n time: [-64.791% -63.940% -63.090%] (p = 0.00 < 0.05)\r\n thrpt: [+170.93% +177.32% +184.01%]\r\n Performance has improved.\r\n```","shortMessageHtmlLink":"Add an SSSE3 implementation for Coder1234 (#4)"}},{"before":"aeb1c46b7a0653fb7ebf762af16db36debf84c2d","after":"a0ce90b2126a6aaa3250bfcdcb73a2ad5c943876","ref":"refs/heads/ssse3","pushedAt":"2023-07-07T05:06:18.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"make data_len 80% faster with a simd-like approach","shortMessageHtmlLink":"make data_len 80% faster with a simd-like approach"}},{"before":"f0de7efba74e4e07a25d1e03313c631cdda92235","after":"aeb1c46b7a0653fb7ebf762af16db36debf84c2d","ref":"refs/heads/ssse3","pushedAt":"2023-07-07T04:21:38.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"add direct reference to source of encode tag generation","shortMessageHtmlLink":"add direct reference to source of encode tag generation"}},{"before":"e987827cc20ae71f161c7e76602f8acb2be8c89e","after":"f0de7efba74e4e07a25d1e03313c631cdda92235","ref":"refs/heads/ssse3","pushedAt":"2023-07-06T16:49:16.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mccullocht","name":"Trevor 
McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"Set flags so that we don't compile the ssse3 implementation without that\ntarget feature being enabled. Without this we would generate code that\ncalls emulation functions(?) for ssse3 features whenever run on a cpu\nthat supports ssse3.\n\nGetting this right yields 100-200% improvements on all encode and decode\nbenchmarks.\n\nUse RUSTFLAGS=\"-Ctarget-cpu=native\" to use the sss3 impl.","shortMessageHtmlLink":"Set flags so that we don't compile the ssse3 implementation without that"}},{"before":"cd1a93e946b72a0494435a0e00e4bde3f5bec1a1","after":"e987827cc20ae71f161c7e76602f8acb2be8c89e","ref":"refs/heads/ssse3","pushedAt":"2023-07-05T18:05:55.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mccullocht","name":"Trevor McCulloch","path":"/mccullocht","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/87714736?s=80&v=4"},"commit":{"message":"fix shuffle table generator location for neon","shortMessageHtmlLink":"fix shuffle table generator location for neon"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAADXTDnzAA","startCursor":null,"endCursor":null}},"title":"Activity · mccullocht/streamvbyte64"}