buffer: speed up concat via TypedArray#set #60399
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #60399      +/-   ##
==========================================
+ Coverage   88.56%   88.69%   +0.12%
==========================================
  Files         704      704
  Lines      207774   208663     +889
  Branches    40025    40676     +651
==========================================
+ Hits       184022   185065    +1043
+ Misses      15807    15677     -130
+ Partials     7945     7921      -24
|
lgtm
|
Benchmark GHA: https://github.com/aduh95/node/actions/runs/18809427089 Results (improvements across the board) |
|
@aduh95 the CI didn't pick up the benchmarks, I think the category is |
This reverts commit 34adb7c.
Some more context
Here are the bench results on 22, 24, 25, and this PR
This PR almost fixes the perf regression in Buffer.concat from 22 to 24
It looks good, but what exactly caused the regression in the first place?
chalker@macbook-air node % nvm use 22
Now using node v22.21.0 (npm v10.9.4)
chalker@macbook-air node % node benchmark/buffers/buffer-concat.js
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=1 pieces=4: 15,549,190.105855972
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=1 pieces=4: 14,168,246.61802625
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=16 pieces=4: 15,250,695.217239005
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=16 pieces=4: 14,881,656.04854307
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=256 pieces=4: 10,125,850.525561389
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=256 pieces=4: 10,183,947.552670104
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=1 pieces=16: 6,768,836.156496439
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=1 pieces=16: 7,270,903.994401986
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=16 pieces=16: 6,108,494.482353465
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=16 pieces=16: 6,542,388.544277659
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=256 pieces=16: 2,367,896.0628124923
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=256 pieces=16: 2,437,196.804720441
chalker@macbook-air node % nvm use 24
Now using node v24.10.0 (npm v11.6.1)
chalker@macbook-air node % node benchmark/buffers/buffer-concat.js
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=1 pieces=4: 8,266,971.511241144
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=1 pieces=4: 8,450,827.030151948
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=16 pieces=4: 7,980,955.445069204
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=16 pieces=4: 7,957,282.680666895
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=256 pieces=4: 6,043,342.063646001
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=256 pieces=4: 6,059,500.523097745
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=1 pieces=16: 2,634,699.341399924
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=1 pieces=16: 2,642,643.5743133803
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=16 pieces=16: 2,489,485.812264763
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=16 pieces=16: 2,499,791.361163519
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=256 pieces=16: 1,448,505.2980656528
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=256 pieces=16: 1,455,386.8242531868
chalker@macbook-air node % nvm use 25
Now using node v25.0.0 (npm v11.6.2)
chalker@macbook-air node % node benchmark/buffers/buffer-concat.js
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=1 pieces=4: 8,443,602.204506326
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=1 pieces=4: 8,741,819.91031439
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=16 pieces=4: 8,261,916.466778204
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=16 pieces=4: 8,200,330.6332309665
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=256 pieces=4: 6,053,314.568058172
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=256 pieces=4: 5,747,503.203784013
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=1 pieces=16: 2,849,496.7008776525
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=1 pieces=16: 2,931,439.135078883
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=16 pieces=16: 2,680,797.822186869
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=16 pieces=16: 2,744,654.5057383967
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=256 pieces=16: 1,494,425.0957491235
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=256 pieces=16: 1,482,919.7763571613
chalker@macbook-air node % ./out/Release/node.1 benchmark/buffers/buffer-concat.js
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=1 pieces=4: 12,080,759.879796438
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=1 pieces=4: 14,726,388.307247685
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=16 pieces=4: 13,503,990.007047394
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=16 pieces=4: 13,297,236.875731088
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=256 pieces=4: 8,745,435.374665165
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=256 pieces=4: 8,914,795.383326115
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=1 pieces=16: 5,195,659.914599197
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=1 pieces=16: 5,432,054.883146755
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=16 pieces=16: 4,640,642.64269006
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=16 pieces=16: 4,887,798.298893448
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=256 pieces=16: 2,009,977.447651042
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=256 pieces=16: 2,030,030.9224206505|
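To put the "almost fixes the regression" remark in numbers, here is a small sketch that expresses one benchmark case as a fraction of the v22 throughput. The figures are copied from the runs above (withTotalLength=0, pieceSize=1, pieces=4); the helper name `relativeTo` is made up for illustration, and a single run on one laptop is noisy, so treat the percentages as rough.

```javascript
// Throughput (ops/sec) copied from the runs above for the case
// withTotalLength=0 pieceSize=1 pieces=4.
const opsPerSec = {
  'v22.21.0': 15549190.105855972,
  'v24.10.0': 8266971.511241144,
  'v25.0.0': 8443602.204506326,
  'this PR': 12080759.879796438,
};

// Hypothetical helper: express every entry as a fraction of the baseline.
function relativeTo(baselineKey, results) {
  const baseline = results[baselineKey];
  const out = {};
  for (const [name, value] of Object.entries(results)) {
    out[name] = value / baseline;
  }
  return out;
}

const ratios = relativeTo('v22.21.0', opsPerSec);
for (const [name, ratio] of Object.entries(ratios)) {
  console.log(`${name}: ${(ratio * 100).toFixed(1)}% of v22 throughput`);
}
```

For this one case, v24 and v25 land at a bit over half of the v22 throughput, and the PR build recovers to roughly three quarters of it.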
This is strange, the situation makes little sense.
This PR claims
Reverting #54087 changes in
What is happening here? |
|
No, they claimed an improvement for However,
Funnily enough, we weren't using primordials here before, but anyway, that change is already made |
|
@gurgunday Ah, I missed this being a different benchmark, thanks!
But it still doesn't hold
First is this branch, second - reverting #54087 changes in
I see a ~2x improvement in non-partial case from revert
chalker@macbook-air node % ./out/Release/node.1 benchmark/buffers/buffer-copy.js
buffers/buffer-copy.js n=6000000 partial="true" bytes=8: 45,410,725.82422833
buffers/buffer-copy.js n=6000000 partial="false" bytes=8: 51,199,380.91073919
buffers/buffer-copy.js n=6000000 partial="true" bytes=128: 52,721,200.82657353
buffers/buffer-copy.js n=6000000 partial="false" bytes=128: 51,513,089.690728284
buffers/buffer-copy.js n=6000000 partial="true" bytes=1024: 42,652,878.82975401
buffers/buffer-copy.js n=6000000 partial="false" bytes=1024: 35,971,914.85553271
chalker@macbook-air node % ./out/Release/node benchmark/buffers/buffer-copy.js
buffers/buffer-copy.js n=6000000 partial="true" bytes=8: 30,556,627.159494136
buffers/buffer-copy.js n=6000000 partial="false" bytes=8: 104,728,111.1614892
buffers/buffer-copy.js n=6000000 partial="true" bytes=128: 33,079,395.312182166
buffers/buffer-copy.js n=6000000 partial="false" bytes=128: 96,758,521.7890999
buffers/buffer-copy.js n=6000000 partial="true" bytes=1024: 28,040,817.260461506
buffers/buffer-copy.js n=6000000 partial="false" bytes=1024: 55,677,198.893486194 |
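For readers unfamiliar with the two modes being compared, here is a minimal sketch of a whole-source copy versus a partial copy using the public `Buffer#copy` API. It assumes that `partial="false"` corresponds to copying the entire source and `partial="true"` to passing an explicit source range; the benchmark file itself is not reproduced here, so that mapping is an assumption.

```javascript
const source = Buffer.from('0123456789abcdef');

// Whole-source copy: sourceStart and sourceEnd are left at their defaults,
// so every byte of `source` is written into `target`.
const whole = Buffer.alloc(source.length);
const wholeCopied = source.copy(whole);

// Partial copy: an explicit sourceEnd limits the copy to a slice of `source`.
const partial = Buffer.alloc(source.length);
const partialCopied = source.copy(partial, 0, 0, 4);

console.log(wholeCopied, whole.toString());
console.log(partialCopied, partial.subarray(0, partialCopied).toString());
```

The whole-source case is the one that can, in principle, be served by a single `TypedArray#set` call, which is why the two modes can diverge so much in the numbers above.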
|
How about this? Instead of this PR, apply this diff to the base branch:
diff --git a/lib/buffer.js b/lib/buffer.js
index c9f45d33388..f82b6825712 100644
--- a/lib/buffer.js
+++ b/lib/buffer.js
@@ -252,7 +252,11 @@ function _copyActual(source, target, targetStart, sourceStart, sourceEnd) {
   if (nb <= 0)
     return 0;
-  _copy(source, target, targetStart, sourceStart, nb);
+  if (sourceStart === 0 && sourceEnd === sourceLen) {
+    TypedArrayPrototypeSet(target, source, targetStart);
+  } else {
+    _copy(source, target, targetStart, sourceStart, nb);
+  }
   return nb;
 }
Might need an extra check to ensure that's an uint8arr
Results (
chalker@macbook-air node % ./out/Release/node.1 benchmark/buffers/buffer-copy.js
buffers/buffer-copy.js n=6000000 partial="true" bytes=8: 46,845,904.301625155
buffers/buffer-copy.js n=6000000 partial="false" bytes=8: 50,741,656.54307192
buffers/buffer-copy.js n=6000000 partial="true" bytes=128: 49,825,179.22407614
buffers/buffer-copy.js n=6000000 partial="false" bytes=128: 51,545,708.9047841
buffers/buffer-copy.js n=6000000 partial="true" bytes=1024: 42,703,170.17161358
buffers/buffer-copy.js n=6000000 partial="false" bytes=1024: 35,837,566.30240946
chalker@macbook-air node % ./out/Release/node benchmark/buffers/buffer-copy.js
buffers/buffer-copy.js n=6000000 partial="true" bytes=8: 47,471,671.813901365
buffers/buffer-copy.js n=6000000 partial="false" bytes=8: 101,261,478.3683036
buffers/buffer-copy.js n=6000000 partial="true" bytes=128: 51,872,937.24023005
buffers/buffer-copy.js n=6000000 partial="false" bytes=128: 97,101,062.34873779
buffers/buffer-copy.js n=6000000 partial="true" bytes=1024: 42,115,718.20021988
buffers/buffer-copy.js n=6000000 partial="false" bytes=1024: 55,631,533.807746686
chalker@macbook-air node % ./out/Release/node.1 benchmark/buffers/buffer-concat.js
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=1 pieces=4: 13,946,525.326070635
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=1 pieces=4: 12,546,789.133156924
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=16 pieces=4: 13,251,734.014242765
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=16 pieces=4: 12,997,571.826125372
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=256 pieces=4: 8,592,787.817549294
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=256 pieces=4: 8,691,184.52718848
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=1 pieces=16: 5,103,637.037397647
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=1 pieces=16: 5,344,501.845730393
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=16 pieces=16: 4,606,995.683786366
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=16 pieces=16: 4,892,011.386376643
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=256 pieces=16: 1,956,437.6702594468
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=256 pieces=16: 1,954,990.617572372
chalker@macbook-air node % ./out/Release/node benchmark/buffers/buffer-concat.js
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=1 pieces=4: 14,095,126.032274174
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=1 pieces=4: 14,524,987.517588852
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=16 pieces=4: 13,516,814.630167618
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=16 pieces=4: 13,178,458.98513511
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=256 pieces=4: 8,433,984.771154312
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=256 pieces=4: 8,881,528.766000504
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=1 pieces=16: 5,155,880.805703049
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=1 pieces=16: 5,525,322.983445663
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=16 pieces=16: 4,684,892.216968649
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=16 pieces=16: 4,980,325.13218421
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=256 pieces=16: 1,955,496.7607208379
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=256 pieces=16: 2,009,711.9329158156
(the slight difference in |
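The fast path suggested in the diff above, together with the "extra check" it mentions, can be sketched in userland roughly as follows. `copyBytes` is a hypothetical stand-in, not the internal `_copyActual`; it uses the public `TypedArray#set` and `subarray` instead of primordials and the native `_copy`, and it clamps the byte count itself so that `set` never writes past the end of the target.

```javascript
// Hypothetical userland stand-in for the internal helper; `copyBytes` is not
// a Node.js API.
function copyBytes(source, target, targetStart, sourceStart, sourceEnd) {
  // The "extra check": only byte-sized views are handled here.
  if (!(source instanceof Uint8Array) || !(target instanceof Uint8Array)) {
    throw new TypeError('both arguments must be Uint8Array instances');
  }
  const end = Math.min(sourceEnd, source.length);
  const nb = Math.min(end - sourceStart, target.length - targetStart);
  if (nb <= 0) return 0;

  if (sourceStart === 0 && nb === source.length) {
    // The whole source fits: a single TypedArray#set call.
    target.set(source, targetStart);
  } else {
    // Partial copy: go through a view limited to the requested range.
    target.set(source.subarray(sourceStart, sourceStart + nb), targetStart);
  }
  return nb;
}

const src = Uint8Array.of(1, 2, 3, 4);
const dst = new Uint8Array(6);
const first = copyBytes(src, dst, 1, 0, src.length); // whole-source path
const second = copyBytes(src, dst, 5, 2, 4);         // partial path, clamped to 1 byte
console.log(first, second, Array.from(dst));
```

Since `Buffer` is a `Uint8Array` subclass, the same function accepts buffers; whether a check like this belongs in the real helper is exactly the open question raised in the comment.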
Co-authored-by: Nikita Skovoroda <chalkerx@gmail.com>
|
We might need a CI run to verify |
|
Also, unrelated to this PR but related to the perf regression from 22 to 24
Whatever happened to
Since 24.0.0, there is no observable perf difference between
In that case places relying on
upd: #60423 |
|
I can remove
This is pretty fast, removing most of the regressions of #54087 for both
main:
buffers/buffer-copy.js
buffers/buffer-copy.js n=6000000 partial="true" bytes=8: 40,621,680.975935884
buffers/buffer-copy.js n=6000000 partial="false" bytes=8: 41,741,453.437408686
buffers/buffer-copy.js n=6000000 partial="true" bytes=128: 40,867,909.33854818
buffers/buffer-copy.js n=6000000 partial="false" bytes=128: 38,587,659.50570816
buffers/buffer-copy.js n=6000000 partial="true" bytes=1024: 28,267,535.736348692
buffers/buffer-copy.js n=6000000 partial="false" bytes=1024: 24,461,058.268385213
buffers/buffer-concat-fill.js
buffers/buffer-concat-fill.js n=800000 extraSize=1: 3,257,088.077109085
buffers/buffer-concat-fill.js n=800000 extraSize=256: 3,147,272.683253827
buffers/buffer-concat-fill.js n=800000 extraSize=1024: 2,670,117.793915302
buffers/buffer-concat.js
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=1 pieces=4: 6,980,252.046780044
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=1 pieces=4: 6,830,178.28899765
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=16 pieces=4: 6,819,840.744442903
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=16 pieces=4: 6,719,693.899127752
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=256 pieces=4: 4,519,928.301733328
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=256 pieces=4: 4,529,137.222061498
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=1 pieces=16: 2,276,024.9224729007
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=1 pieces=16: 2,332,623.341103885
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=16 pieces=16: 2,183,238.7775802654
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=16 pieces=16: 2,215,731.7223715517
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=256 pieces=16: 995,647.6197721304
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=256 pieces=16: 1,001,899.5916066903
pr:
buffers/buffer-copy.js
buffers/buffer-copy.js n=6000000 partial="true" bytes=8: 39,361,200.516615756
buffers/buffer-copy.js n=6000000 partial="false" bytes=8: 79,086,897.4799316
buffers/buffer-copy.js n=6000000 partial="true" bytes=128: 40,317,918.07302284
buffers/buffer-copy.js n=6000000 partial="false" bytes=128: 74,925,504.38070068
buffers/buffer-copy.js n=6000000 partial="true" bytes=1024: 27,618,121.127528023
buffers/buffer-copy.js n=6000000 partial="false" bytes=1024: 40,185,949.49713848
buffers/buffer-concat-fill.js
buffers/buffer-concat-fill.js n=800000 extraSize=1: 4,014,117.1482457365
buffers/buffer-concat-fill.js n=800000 extraSize=256: 3,947,359.4940471966
buffers/buffer-concat-fill.js n=800000 extraSize=1024: 3,286,207.070069123
buffers/buffer-concat.js
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=1 pieces=4: 11,725,434.299093843
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=1 pieces=4: 11,560,888.60631508
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=16 pieces=4: 10,916,068.559351033
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=16 pieces=4: 10,753,290.298486339
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=256 pieces=4: 6,269,989.706871398
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=256 pieces=4: 5,705,964.543164859
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=1 pieces=16: 4,078,167.4175609415
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=1 pieces=16: 4,331,797.635265173
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=16 pieces=16: 3,761,914.7009729715
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=16 pieces=16: 3,921,810.5301073547
buffers/buffer-concat.js n=800000 withTotalLength=0 pieceSize=256 pieces=16: 1,255,559.3637891742
buffers/buffer-concat.js n=800000 withTotalLength=1 pieceSize=256 pieces=16: 1,251,530.6806879821 |
|
No objection to |
|
Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1748/ |
Ah, I see numbers from CI. Simple version (current): Most of the improvement is there, but the simplified version is indeed consistently slower on concat by a bit |
|
That works, I agree.
Let's first fix non-partial copy degradation too.
We can then see if we can go further. |
|
The AIX and LinuxOne CI failures look like real issues with this PR and are most likely due to not handling endianness properly. |
|
@richardlau ah, you are right |
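As a general illustration of why byte order can matter here (not a diagnosis of the actual CI failures, whose cause is not shown in this thread): `TypedArray#set` converts element by element when the source and target element types differ, so it only behaves as a raw byte copy when both sides are byte-sized views. The variable names below are illustrative.

```javascript
const os = require('node:os');

// A 16-bit source whose in-memory byte order depends on the platform.
const words = Uint16Array.of(0x0102, 0x0304);

// TypedArray#set converts element by element: each 16-bit value is
// truncated to 8 bits, so this is not a byte copy.
const byValue = new Uint8Array(2);
byValue.set(words);

// Viewing the same memory as bytes first makes set() a plain byte copy,
// and the result then reflects the platform's byte order.
const asBytes = new Uint8Array(words.buffer, words.byteOffset, words.byteLength);
const byBytes = new Uint8Array(asBytes.length);
byBytes.set(asBytes);

console.log(os.endianness(), Array.from(byValue), Array.from(byBytes));
```

On a little-endian machine the byte copy yields the low byte of each word first; on big-endian platforms such as those in the failing CI jobs the order is reversed, while the value-converting copy is the same everywhere.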
Huge win by avoiding complex _copyActual logic when we can copy the whole source into target. The native copy implementation is faster only for partial copies.
Before:
After:
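The idea in the description (skip the general copy logic whenever a whole piece can be written into the target) might look like this in userland. This is a sketch under stated assumptions, not the code in lib/buffer.js: `concatBytes` is a hypothetical name, every piece is assumed to be a `Uint8Array` (or `Buffer`), and the zero-filled `Buffer.alloc` stands in for whatever allocation strategy the real implementation uses.

```javascript
// Hypothetical helper; not the code in lib/buffer.js.
function concatBytes(list, totalLength) {
  if (totalLength === undefined) {
    totalLength = 0;
    for (const piece of list) totalLength += piece.length;
  }
  // Zero-filled, so any space left after the last piece stays zeroed.
  const result = Buffer.alloc(totalLength);
  let offset = 0;
  for (const piece of list) {
    const remaining = totalLength - offset;
    if (remaining <= 0) break;
    if (piece.length <= remaining) {
      // Whole piece fits: a single TypedArray#set call.
      result.set(piece, offset);
      offset += piece.length;
    } else {
      // Last piece overflows totalLength: copy only the part that fits.
      result.set(piece.subarray(0, remaining), offset);
      offset = totalLength;
    }
  }
  return result;
}

const pieces = [Buffer.from('ab'), Buffer.from('cd'), Buffer.from('ef')];
console.log(concatBytes(pieces).toString());
console.log(concatBytes(pieces, 5).toString());
```

The common case never touches the partial branch, which matches the observation that the native copy only pays off for partial copies.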