Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: allow reading data into a static buffer #25436

Merged
merged 1 commit into from Aug 23, 2019

Conversation

@mscdex
Copy link
Contributor

commented Jan 10, 2019

This is more or less the changes I described in nodejs/node-eps#27 with some updates since the C++ side had changed a fair amount. I thought I would put this out there to see if there was still interest in merging a feature like this (even if it's not this PR as-is).

This implementation is opt-in, only for net (although it could obviously be adapted for other places where dynamic buffers are used), and it goes the simple and easy route and completely bypasses the data streaming feature of sockets (although 'end' and other such non-data-related events are still emitted as usual).

Having this kind of power is obviously not for all use cases, but for network protocol implementations it can make things a lot faster because the connection data is typically parsed synchronously and any raw data that needs to be passed on to end users could simply be copied.

Here are some example benchmark results (recvbuflen=0 indicates old/current behavior of allocating a new buffer for each chunk read):

net/net-s2c.js dur=5 recvbuflen=0 type="buf" sendchunklen=256: 1.071279628253138
net/net-s2c.js dur=5 recvbuflen=65536 type="buf" sendchunklen=256: 1.1356590323233846
net/net-s2c.js dur=5 recvbuflen=1048576 type="buf" sendchunklen=256: 1.129328387002154
net/net-s2c.js dur=5 recvbuflen=0 type="buf" sendchunklen=32768: 19.848600557880395
net/net-s2c.js dur=5 recvbuflen=65536 type="buf" sendchunklen=32768: 28.77196840127128
net/net-s2c.js dur=5 recvbuflen=1048576 type="buf" sendchunklen=32768: 28.674568832052028
net/net-s2c.js dur=5 recvbuflen=0 type="buf" sendchunklen=131072: 23.3627596199997
net/net-s2c.js dur=5 recvbuflen=65536 type="buf" sendchunklen=131072: 36.92736185280725
net/net-s2c.js dur=5 recvbuflen=1048576 type="buf" sendchunklen=131072: 39.21447761809467
net/net-s2c.js dur=5 recvbuflen=0 type="buf" sendchunklen=16777216: 21.913559134252168
net/net-s2c.js dur=5 recvbuflen=65536 type="buf" sendchunklen=16777216: 35.96126656747849
net/net-s2c.js dur=5 recvbuflen=1048576 type="buf" sendchunklen=16777216: 36.709655230358784

I did not add documentation or tests yet because API is not set in stone.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines

@mscdex mscdex force-pushed the mscdex:net-client-static-buffer branch from 659c22d to f84b416 Jan 10, 2019

@mscdex mscdex changed the title lib,src: allow reading data into a static buffer net: allow reading data into a static buffer Jan 10, 2019

@mscdex

This comment has been minimized.

Copy link
Contributor Author

commented Jan 10, 2019

@addaleax

This comment has been minimized.

Copy link
Member

commented Jan 12, 2019

I like the feature idea. Regarding the implementation:

  • I don’t think passing the buffer along the constructors is a good idea, given the complexity that it introduces to our code. I know it’s an extra call to C++, but adding a single method would probably be much cleaner.
  • It’s also kind of odd to stream_buf as a member of StreamBase, and add if/else cases everywhere that handle the cases of having/not having a custom buffer each time – it would, imo, be cleaner to create a class similar to EmitToJSStreamListener (but with custom a OnStreamAlloc method), and attach an instance of it to the stream when calling the C++ method.
@mscdex

This comment has been minimized.

Copy link
Contributor Author

commented Jan 12, 2019

I'm open to suggestions (especially specific code changes), this is very much just a proof of concept to show the potential, significant performance gains.

@addaleax

This comment has been minimized.

Copy link
Member

commented Jan 13, 2019

@mscdex Since you’re asking for specific code changes: addaleax@d8b783f is about what I’d have in mind (applied on top of this PR). It also brings the combined diff nicely down from +225/−66 to +139/−45.

@mscdex

This comment has been minimized.

Copy link
Contributor Author

commented Jan 13, 2019

@addaleax Merged, thanks.

@mscdex

This comment has been minimized.

Copy link
Contributor Author

commented Jan 21, 2019

@addaleax

This comment has been minimized.

Copy link
Member

commented Jan 21, 2019

@mscdex The code LGTM here, but I’d also like other people to weigh in more. I assume you don’t want to do unnecessary work in case this doesn’t get merged, but I also guess for a lot of people docs and tests help; so, tl;dr on the API here:


This PR adds an option to net.Socket instances (easy to extend to e.g. HTTP/2) where instead of using the traditional stream.Readable interface, a user can provide a fixed buffer and a callback which is called when data has been read from the socket~~, with no backpressure handling or similar features~~.

Usage looks something like this:

const socket = net.connect({
  host: 'example.com',
  port: 80,
  onread: {
    buffer: Buffer.allocUnsafe(65536),
    callback(bytesRead, buffer) {
      // Do something with the first `bytesRead` bytes of `buffer`.
      // The `buffer` will always be the same object, so all data
      // needs to be handled synchronously here.
    }
  }
@sam-github

This comment has been minimized.

Copy link
Member

commented Jan 21, 2019

There has been some talk of creating a node API layer that is closer to uv, so more efficient code can be written on top of it. Is that where this is going?

cf http://docs.libuv.org/en/v1.x/stream.html#c.uv_read_start

The onread use of a buffer seems to make unnecessary the uv alloc/read callback pair, that seems reasonable to me, it reduces roundtrip time in that only one C++ to js callback is needed. I guess the prices is the need for sync data handling. Maybe that's OK, as long as the data is immediately passed to a protocol parser, or unzipper, or something of the sort, though it wouldn't work for a simple pipe of data to fs.write(), for example (AFAICT).

re: lack of backpressure support, at the uv layer the caller has to explicitly start/stop data flow to exert backpressure. Isn't there a need for a readStart/Stop API as uv has? Or does the handle have that already? There is the Readable.pause() method, but I think the idea here is to bypass the streams API.

@mscdex mscdex force-pushed the mscdex:net-client-static-buffer branch from 18f7aca to 86e015e Jan 21, 2019

@mscdex

This comment has been minimized.

Copy link
Contributor Author

commented Jan 21, 2019

re: lack of backpressure support, at the uv layer the caller has to explicitly start/stop data flow to exert backpressure. Isn't there a need for a readStart/Stop API as uv has? Or does the handle have that already? There is the Readable.pause() method, but I think the idea here is to bypass the streams API.

The onread callback can explicitly return false to stop reading from the socket, so there is still backpressure support, it's just at the OS/C++ layer and not at the JS streams layer. So if you return false from the callback, you can call socket.resume() or socket.read() to resume reading from the socket.

@mscdex mscdex force-pushed the mscdex:net-client-static-buffer branch from 86e015e to 8cde1aa Jan 21, 2019

@benjamingr

This comment has been minimized.

Copy link
Member

commented Jan 25, 2019

I like the idea, we had quite a long discussion about it in denoland/deno#387

@mscdex mscdex force-pushed the mscdex:net-client-static-buffer branch from 8cde1aa to 475614d Jan 28, 2019

@mscdex

This comment has been minimized.

Copy link
Contributor Author

commented Jan 28, 2019

@sam-github

This comment has been minimized.

Copy link
Member

commented Jan 28, 2019

@mscdex Unanswered questions in #25436 (comment)

@mscdex

This comment has been minimized.

Copy link
Contributor Author

commented Jan 28, 2019

@sam-github I answered how starting and stopping is handled in this PR in #25436 (comment). Were you looking for some other information?

@sam-github

This comment has been minimized.

Copy link
Member

commented Jan 29, 2019

My apologies @mscdex, that comment exactly answers my question, I don't know how I missed it.

@jasnell

This comment has been minimized.

Copy link
Member

commented Jan 29, 2019

Ping @nodejs/streams to review ... in particular, @mcollina, I'd love your thoughts on this.

@mcollina
Copy link
Member

left a comment

Are you planning to add this to other areas of core? Specifically, are you planning to add it to TLS?

What will happen if pause is called within callback?

I think this API could be a bit limiting to what it could be achieved. What I would like to see is more a situation where the same buffer moves between the producer and the consumer, like this:

let oldBuffer = null
const socket = net.connect({
  host: 'example.com',
  port: 80,
  onread: {
    buffer() { 
     return oldBuffer || Buffer.allocUnsafe(65536)
    },
    callback(bytesRead, buffer) {
      destination.write(buffer, (err) => {
        oldBuffer = buffer
      })
    }
  }
lib/net.js Show resolved Hide resolved
@mscdex

This comment has been minimized.

Copy link
Contributor Author

commented Feb 1, 2019

Are you planning to add this to other areas of core? Specifically, are you planning to add it to TLS?

Probably.

What will happen if pause is called within callback?

With the PR as it currently is, node will not read from the socket.

I think this API could be a bit limiting to what it could be achieved. What I would like to see is more a situation where the same buffer moves between the producer and the consumer, like this:

I don't understand what exactly is expected with the suggested implementation. Is buffer() being called from C++ land for every socket read? Is it only being called once inside the net.Socket() constructor? If it's the former, then what would you suggest for avoiding any performance hit in doing so?

Also, due to the asynchronous nature of destination.write() and the synchronous nature of callback(), when oldBuffer is set the same Buffer will be written to destination and oldBuffer's contents could change multiple times while that write is occurring, which goes against what people expect of streams, unless you meant something else by the example?

@mcollina

This comment has been minimized.

Copy link
Member

commented Feb 2, 2019

I don't understand what exactly is expected with the suggested implementation. Is buffer() being called from C++ land for every socket read? Is it only being called once inside the net.Socket() constructor? If it's the former, then what would you suggest for avoiding any performance hit in doing so?

The key idea is to provide a mechanism to reuse multiple buffers, not just one. Whenever there is a need for a read, C++ can call buffer to get one. We can maintain a list of allocated buffers, that will flow back into the pool after being read. This mechanism is driven by the user. This should answer your second question as well.
I'm suggesting an alternative implementation that could broaden the scope, I'm not referring specifically to this specific implementation.

@mscdex

This comment has been minimized.

Copy link
Contributor Author

commented Feb 4, 2019

Whenever there is a need for a read, C++ can call buffer to get one.

That would probably cause a noticeable performance hit, unless you were willing to compromise a bit and instead have the function that the C++ side calls when the socket is read from return the next Buffer to use (by calling buffer()). That would avoid any extra round trips between JS and C++ but the semantics would be different (asking for Buffer for next read vs asking for Buffer for current read) if that matters.

@mcollina

This comment has been minimized.

Copy link
Member

commented Feb 4, 2019

That would probably cause a noticeable performance hit, unless you were willing to compromise a bit and instead have the function that the C++ side calls when the socket is read from return the next Buffer to use (by calling buffer()). That would avoid any extra round trips between JS and C++ but the semantics would be different (asking for Buffer for next read vs asking for Buffer for current read) if that matters.

+1, that's what I meant, sorry for making it complicated.

@reklatsmasters

This comment has been minimized.

Copy link
Contributor

commented Jul 29, 2019

I think API should allow us to use custom memory allocators and strategies like buffer-reuse-pool. And this api shoud be compatible with dgram which is not stream-based.

@mscdex mscdex force-pushed the mscdex:net-client-static-buffer branch from 3a9ed04 to acebacc Aug 4, 2019

@mscdex

This comment has been minimized.

Copy link
Contributor Author

commented Aug 4, 2019

@reklatsmasters I think a request like that is probably better implemented at a more global scale (e.g. via Buffer in JS/C++ land) instead of per-module.

@bnoordhuis
Copy link
Member

left a comment

LGTM and nice!

doc/api/net.md Outdated Show resolved Hide resolved

@mscdex mscdex force-pushed the mscdex:net-client-static-buffer branch from acebacc to 4029b05 Aug 4, 2019

@nodejs-github-bot

This comment has been minimized.

@mscdex mscdex force-pushed the mscdex:net-client-static-buffer branch from 4029b05 to 31344e4 Aug 18, 2019

@mscdex

This comment has been minimized.

Copy link
Contributor Author

commented Aug 18, 2019

@mcollina I've added support for a function that can return a Buffer now, let me know if that looks good to you.

@mcollina
Copy link
Member

left a comment

LGTM

@mscdex

This comment has been minimized.

Copy link
Contributor Author

commented Aug 18, 2019

@bnoordhuis let me know if these recent changes look good to you as well

@@ -629,6 +632,22 @@ For [IPC][] connections, available `options` are:
See [Identifying paths for IPC connections][]. If provided, the TCP-specific
options above are ignored.
For both types, available `options` include:
* `onread` {Object} - If specified, incoming data is stored in a single `buffer`

This comment has been minimized.

Copy link
@jasnell

jasnell Aug 19, 2019

Member

An example showing this in use would be ideal.

This comment has been minimized.

Copy link
@mscdex

mscdex Aug 21, 2019

Author Contributor

Added. Let me know if it suffices.

This comment has been minimized.

Copy link
@jasnell

jasnell Aug 21, 2019

Member

Thank you! looks good

@jasnell
Copy link
Member

left a comment

LGTM with an example added to the documentation

@mscdex mscdex force-pushed the mscdex:net-client-static-buffer branch from 31344e4 to e269428 Aug 21, 2019

@nodejs-github-bot

This comment has been minimized.

net: allow reading data into a static buffer
Co-Authored-By: Anna Henningsen <anna@addaleax.net>

PR-URL: #25436
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

@mscdex mscdex force-pushed the mscdex:net-client-static-buffer branch from e269428 to 8292b28 Aug 23, 2019

@mscdex mscdex merged commit 8292b28 into nodejs:master Aug 23, 2019

1 of 2 checks passed

Travis CI - Pull Request Build Errored
Details
Travis CI - Branch Build Passed
Details

@mscdex mscdex deleted the mscdex:net-client-static-buffer branch Aug 23, 2019

BridgeAR added a commit that referenced this pull request Sep 3, 2019
net: allow reading data into a static buffer
Co-Authored-By: Anna Henningsen <anna@addaleax.net>

PR-URL: #25436
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
@BridgeAR BridgeAR referenced this pull request Sep 3, 2019
BridgeAR added a commit that referenced this pull request Sep 3, 2019
2019-09-04, Version 12.10.0 (Current)
Notable changes:

* deps:
  * Update npm to 6.10.3 (isaacs)
    #29023
* fs:
  * Add recursive option to rmdir() (cjihrig)
    #29168
  * Allow passing true to emitClose option (Giorgos Ntemiris)
    #29212
  * Add \*timeNs properties to BigInt Stats objects (Joyee Cheung)
    #21387
* net:
  * Allow reading data into a static buffer (Brian White)
    #25436

PR-URL: #29429
BridgeAR added a commit that referenced this pull request Sep 4, 2019
2019-09-04, Version 12.10.0 (Current)
Notable changes:

* deps:
  * Update npm to 6.10.3 (isaacs)
    #29023
* fs:
  * Add recursive option to rmdir() (cjihrig)
    #29168
  * Allow passing true to emitClose option (Giorgos Ntemiris)
    #29212
  * Add \*timeNs properties to BigInt Stats objects (Joyee Cheung)
    #21387
* net:
  * Allow reading data into a static buffer (Brian White)
    #25436

PR-URL: #29429
BridgeAR added a commit that referenced this pull request Sep 4, 2019
2019-09-04, Version 12.10.0 (Current)
Notable changes:

* deps:
  * Update npm to 6.10.3 (isaacs)
    #29023
* fs:
  * Add recursive option to rmdir() (cjihrig)
    #29168
  * Allow passing true to emitClose option (Giorgos Ntemiris)
    #29212
  * Add \*timeNs properties to BigInt Stats objects (Joyee Cheung)
    #21387
* net:
  * Allow reading data into a static buffer (Brian White)
    #25436

PR-URL: #29429
@jorangreef

This comment has been minimized.

Copy link
Contributor

commented Sep 10, 2019

Thanks @mscdex this is excellent. Will this also work for constructing net servers or is it limited to net clients?

@mscdex

This comment has been minimized.

Copy link
Contributor Author

commented Sep 10, 2019

@jorangreef Only clients at the moment.

JeniaBR added a commit to JeniaBR/node that referenced this pull request Sep 11, 2019
net: allow reading data into a static buffer
Co-Authored-By: Anna Henningsen <anna@addaleax.net>

PR-URL: nodejs#25436
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
JeniaBR added a commit to JeniaBR/node that referenced this pull request Sep 11, 2019
2019-09-04, Version 12.10.0 (Current)
Notable changes:

* deps:
  * Update npm to 6.10.3 (isaacs)
    nodejs#29023
* fs:
  * Add recursive option to rmdir() (cjihrig)
    nodejs#29168
  * Allow passing true to emitClose option (Giorgos Ntemiris)
    nodejs#29212
  * Add \*timeNs properties to BigInt Stats objects (Joyee Cheung)
    nodejs#21387
* net:
  * Allow reading data into a static buffer (Brian White)
    nodejs#25436

PR-URL: nodejs#29429
JeniaBR added a commit to JeniaBR/node that referenced this pull request Sep 11, 2019
net: allow reading data into a static buffer
Co-Authored-By: Anna Henningsen <anna@addaleax.net>

PR-URL: nodejs#25436
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
JeniaBR added a commit to JeniaBR/node that referenced this pull request Sep 11, 2019
2019-09-04, Version 12.10.0 (Current)
Notable changes:

* deps:
  * Update npm to 6.10.3 (isaacs)
    nodejs#29023
* fs:
  * Add recursive option to rmdir() (cjihrig)
    nodejs#29168
  * Allow passing true to emitClose option (Giorgos Ntemiris)
    nodejs#29212
  * Add \*timeNs properties to BigInt Stats objects (Joyee Cheung)
    nodejs#21387
* net:
  * Allow reading data into a static buffer (Brian White)
    nodejs#25436

PR-URL: nodejs#29429
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.