readline: add support for async iteration #23916

Closed
TimothyGu wants to merge 2 commits into master from TimothyGu:readline-async-iteration

TimothyGu commented Oct 27, 2018

Rewritten version of #18904, using more of the existing streams mechanisms.

Depends on #23901 for some of the edge case tests (relevant commits included within this PR).

Co-authored-by: Ivan Filenko <ivan.filenko@protonmail.com>
Fixes: #18603
Refs: #18904

/cc @mcollina @devsnek @prog1dev

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@TimothyGu TimothyGu requested review from mcollina and devsnek Oct 27, 2018

@TimothyGu TimothyGu force-pushed the TimothyGu:readline-async-iteration branch from 550bb04 to b631b4c Oct 27, 2018

vsemozhetbyt commented Oct 27, 2018

Should we benchmark this against the 'line' event approach and perhaps add a note if it currently proves to be significantly slower? (Refs: #23032 (comment))

mcollina left a comment

Good work!

};
this.on('line', lineListener);
this.on('close', closeListener);
this[kLineObjectStream] = readable;

mcollina Oct 27, 2018

Would it make sense to expose this as a separate method? Converting to a stream might be an issue for multiple people.

TimothyGu Nov 20, 2018

I consider this an implementation detail of the @@asyncIterator method. A major reason why the performance of this method isn't on par with the 'line' event, as you noted in #23916 (comment), is the double buffering necessitated by the intermediate stream, so I'd rather not expose the stream at the moment.
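
To make the double buffering concrete, here is a minimal sketch of the general idea, not the code from this PR: 'line' events are pushed into an intermediate object-mode Readable, and the consumer loops over that stream's own async iterator. `linesToReadable` and `collectLines` are hypothetical names, and the sketch deliberately ignores backpressure and error forwarding.

```javascript
'use strict';

const readline = require('readline');
const { Readable } = require('stream');

// Hypothetical helper: funnels 'line' events into an object-mode Readable.
// Lines are buffered once by readline's input stream and a second time here.
function linesToReadable(rl) {
  const readable = new Readable({
    objectMode: true,
    read() {}, // Data is pushed from the 'line' listener, so read() is a no-op.
  });
  rl.on('line', (line) => readable.push(line));
  rl.on('close', () => readable.push(null)); // Signal end of stream.
  return readable;
}

// Collects every line of `input` by iterating the intermediate stream.
async function collectLines(input) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  const lines = [];
  for await (const line of linesToReadable(rl)) {
    lines.push(line);
  }
  return lines;
}
```

Each line here passes through two queues (the input stream's and the intermediate Readable's) before reaching the loop body, which is the overhead discussed in this thread.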

vsemozhetbyt commented Oct 28, 2018

I've compiled the branch and tested it with a 1 GB file (22 514 395 lines):

Scripts and results:
'use strict';

const fs = require('fs');
const readline = require('readline');

let counter = 0;
let dummy;

console.time('event');

const rl = readline.createInterface({
  input: fs.createReadStream('big-file.txt', 'utf8'),
  crlfDelay: Infinity,
});

rl.on('line', (line) => {
  counter++;
  dummy = line;
}).on('close', () => {
  console.timeEnd('event');
  console.log(`Lines: ${counter}, last line length: ${dummy.length}`);
});

Output:
event: 15763.332ms
Lines: 22514395, last line length: 1

'use strict';

const fs = require('fs');
const readline = require('readline');

(async function main() {
  let counter = 0;
  let dummy;

  console.time('asyncIterator');

  const rl = readline.createInterface({
    input: fs.createReadStream('big-file.txt', 'utf8'),
    crlfDelay: Infinity,
  });

  for await (const line of rl) {
    counter++;
    dummy = line;
  }

  console.timeEnd('asyncIterator');
  console.log(`Lines: ${counter}, last line length: ${dummy.length}`);
})();

Output:
asyncIterator: 44865.388ms
Lines: 22514395, last line length: 1

So the event implementation is currently about 3 times as fast as the async iterator implementation. Maybe we should warn about this.
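
For anyone who wants to repeat this comparison without a 1 GB fixture, the following is a small sketch that generates a temporary file and times both approaches in one process. The helper names (`makeInterface`, `countWithEvents`, `countWithAsyncIterator`, `compare`) and the temp-file name are assumptions made for illustration. Timings from a small file are noisy and are not comparable to the numbers above.

```javascript
'use strict';

const fs = require('fs');
const os = require('os');
const path = require('path');
const readline = require('readline');

function makeInterface(file) {
  return readline.createInterface({
    input: fs.createReadStream(file, 'utf8'),
    crlfDelay: Infinity,
  });
}

// Counts lines with the 'line' event.
function countWithEvents(file) {
  return new Promise((resolve) => {
    let counter = 0;
    makeInterface(file)
      .on('line', () => { counter++; })
      .on('close', () => resolve(counter));
  });
}

// Counts lines with the async iterator added by this PR.
async function countWithAsyncIterator(file) {
  let counter = 0;
  for await (const _line of makeInterface(file)) {
    counter++;
  }
  return counter;
}

// Writes `lineCount` short lines to a temp file, then times both approaches.
async function compare(lineCount) {
  const file = path.join(os.tmpdir(), `readline-bench-${process.pid}.txt`);
  fs.writeFileSync(file, 'x\n'.repeat(lineCount));
  try {
    const t0 = process.hrtime.bigint();
    const viaEvents = await countWithEvents(file);
    const t1 = process.hrtime.bigint();
    const viaIterator = await countWithAsyncIterator(file);
    const t2 = process.hrtime.bigint();
    return {
      viaEvents,
      viaIterator,
      eventMs: Number(t1 - t0) / 1e6,
      iteratorMs: Number(t2 - t1) / 1e6,
    };
  } finally {
    fs.unlinkSync(file);
  }
}
```

Calling `compare(1e6).then(console.log)` prints both line counts and elapsed times. Running the two measurements back to back in one process means the second benefits from a warm file cache, so separate runs, as in the scripts above, give a fairer comparison.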

mcollina commented Oct 28, 2018

Can you check reading the same file with fs.createReadStream() and iterating? I think our last check was about 2 times slower. Considering that we are doing double-buffering, a 3x factor seems OK to me. I don't know if we can improve this at all; a good part of that is due to async iteration overhead.

vsemozhetbyt commented Oct 28, 2018

@mcollina Do you mean comparing async iteration over unsplit chunks with async iteration over split lines? If so, I get a ~4x factor:

Scripts and results:
const fs = require('fs');

(async function main() {
  let counter = 0;
  let dummy;

  console.time('asyncIteratorChunks');

  for await (const chunk of fs.createReadStream('big-file.txt', 'utf8')) {
    counter++;
    dummy = chunk;
  }

  console.timeEnd('asyncIteratorChunks');
  console.log(`Chunks: ${counter}, last chunk length: ${dummy.length}`);
})();

Output:
asyncIteratorChunks: 10048.069ms
Chunks: 15811, last chunk length: 22280

'use strict';

const fs = require('fs');
const readline = require('readline');

(async function main() {
  let counter = 0;
  let dummy;

  console.time('asyncIteratorLines');

  const rl = readline.createInterface({
    input: fs.createReadStream('big-file.txt', 'utf8'),
    crlfDelay: Infinity,
  });

  for await (const line of rl) {
    counter++;
    dummy = line;
  }

  console.timeEnd('asyncIteratorLines');
  console.log(`Lines: ${counter}, last line length: ${dummy.length}`);
})();

Output:
asyncIteratorLines: 43501.427ms
Lines: 22514395, last line length: 1

vsemozhetbyt commented Oct 28, 2018

@mcollina Or do you mean comparing async iteration over unsplit chunks with the event implementation for unsplit chunks? If so, I get a 1:1 factor, i.e. the same speed:

Scripts and results:
'use strict';

const fs = require('fs');

let counter = 0;
let dummy;

console.time('eventChunks');

const readable = fs.createReadStream('big-file.txt', 'utf8');

readable.on('data', (chunk) => {
  counter++;
  dummy = chunk;
}).on('close', () => {
  console.timeEnd('eventChunks');
  console.log(`Chunks: ${counter}, last chunk length: ${dummy.length}`);
});

Output:
eventChunks: 9978.033ms
Chunks: 15811, last chunk length: 22280

'use strict';

const fs = require('fs');

(async function main() {
  let counter = 0;
  let dummy;

  console.time('asyncIteratorChunks');

  for await (const chunk of fs.createReadStream('big-file.txt', 'utf8')) {
    counter++;
    dummy = chunk;
  }

  console.timeEnd('asyncIteratorChunks');
  console.log(`Chunks: ${counter}, last chunk length: ${dummy.length}`);
})();

Output:
asyncIteratorChunks: 9922.896ms
Chunks: 15811, last chunk length: 22280

vsemozhetbyt commented Oct 28, 2018

#23901 has landed, so it seems those commits can be dropped from this PR to simplify review.

vsemozhetbyt commented Oct 28, 2018

And beware #23929, we may have conflicts.

mcollina commented Oct 30, 2018

@vsemozhetbyt thanks for those benchmarks, those are quite interesting. Specifically the fact that using stream iteration is now essentially on par with on('data') is fantastic!

Regarding 'readline', I think the culprit is double-buffering. To improve it, we would essentially need a custom implementation that does not rely on the streams one.
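
As a rough illustration of what an implementation that bypasses the intermediate stream could look like, here is a sketch of an async iterator fed directly by 'line' events. This is not what landed in this PR. `iterateLines` is a hypothetical name, and the sketch assumes a single consumer and applies no backpressure, so a fast input can grow the queue without bound.

```javascript
'use strict';

const readline = require('readline');

// Hypothetical helper: exposes 'line' events as an async iterator without
// going through an intermediate Readable.
function iterateLines(rl) {
  const queue = [];   // Lines received but not yet consumed.
  let waiting = null; // Resolver of a pending next() call, if any.
  let closed = false;

  rl.on('line', (line) => {
    if (waiting) {
      const resolve = waiting;
      waiting = null;
      resolve({ value: line, done: false });
    } else {
      queue.push(line);
    }
  });

  rl.on('close', () => {
    closed = true;
    if (waiting) {
      const resolve = waiting;
      waiting = null;
      resolve({ value: undefined, done: true });
    }
  });

  return {
    [Symbol.asyncIterator]() { return this; },
    next() {
      if (queue.length > 0) {
        return Promise.resolve({ value: queue.shift(), done: false });
      }
      if (closed) {
        return Promise.resolve({ value: undefined, done: true });
      }
      // Nothing buffered yet: park until the next 'line' or 'close' event.
      return new Promise((resolve) => { waiting = resolve; });
    },
    return() {
      // Called when the consumer leaves the loop early.
      rl.close();
      return Promise.resolve({ value: undefined, done: true });
    },
  };
}
```

It would be used as `for await (const line of iterateLines(rl)) { ... }`. A production version would also need to pause and resume the input when the queue grows and propagate input errors.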

TimothyGu and others added some commits Oct 27, 2018

readline: add support for async iteration
Co-authored-by: Ivan Filenko <ivan.filenko@protonmail.com>
Fixes: #18603
Refs: #18904

@TimothyGu TimothyGu force-pushed the TimothyGu:readline-async-iteration branch from b631b4c to f8ff7c7 Nov 20, 2018

TimothyGu commented Nov 20, 2018

@vsemozhetbyt @mcollina I've updated this PR to address the documentation comments. Please take a look.

mcollina left a comment

LGTM

vsemozhetbyt commented Nov 20, 2018

Docs LGTM. Thank you!

vsemozhetbyt commented Nov 20, 2018

Maybe cc @nodejs/streams ?

mcollina commented Nov 20, 2018

@devsnek maybe? Anyway, this could land because it's older than a week. I'd recommend waiting 2 days and then landing if no one objects.

devsnek commented Nov 20, 2018

I'm not a huge fan of the implementation but if it works it works 🤷‍♂️

Trott commented Nov 20, 2018

@Trott Trott added the semver-minor label Nov 20, 2018

Trott commented Nov 20, 2018

Landed in 2a7432d

@Trott Trott closed this Nov 20, 2018

Trott added a commit to Trott/io.js that referenced this pull request Nov 20, 2018

readline: add support for async iteration
Co-authored-by: Ivan Filenko <ivan.filenko@protonmail.com>
Fixes: nodejs#18603
Refs: nodejs#18904
PR-URL: nodejs#23916
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Gus Caplan <me@gus.host>

@TimothyGu TimothyGu deleted the TimothyGu:readline-async-iteration branch Nov 21, 2018

targos added a commit that referenced this pull request Nov 21, 2018

readline: add support for async iteration
Co-authored-by: Ivan Filenko <ivan.filenko@protonmail.com>
Fixes: #18603
Refs: #18904
PR-URL: #23916
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Gus Caplan <me@gus.host>

@vsemozhetbyt vsemozhetbyt referenced this pull request Nov 24, 2018

Closed

stream: added experimental support for for-await #17755

4 of 4 tasks complete

rauschma commented Nov 25, 2018

Playing devil’s advocate: Is this really the right way of providing this functionality?

Consider the following example (adapted from the docs):

const readline = require('readline');

async function processLines() {
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });

  for await (const line of rl) {
    console.log(line);
  }
}

rl is about input and output. I’d prefer a simpler tool function for processing lines. For example:

async function processLines() {
  for await (const line of readline.splitLines(process.stdin)) {
    console.log(line);
  }
}

You can find a relatively simple implementation of splitLines() here (chunksToLinesAsync()): https://github.com/rauschma/stringio/blob/master/ts/src/index.ts
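
To show the shape of such a helper, here is a minimal sketch of a line splitter written as an async generator. It is not the implementation from the linked repository, and `readline.splitLines()` does not exist in Node.js; the name `splitLines` is borrowed from the proposal above. The sketch assumes chunks are strings (for example, a stream with an encoding set), strips `\n` and `\r\n` terminators, and does not treat a lone `\r` as a line break.

```javascript
'use strict';

// Hypothetical helper: turns an async iterable of string chunks into an
// async iterable of lines.
async function* splitLines(chunks) {
  let buffered = ''; // Trailing partial line carried over between chunks.
  for await (const chunk of chunks) {
    buffered += chunk;
    const parts = buffered.split(/\r?\n/);
    // The last element is incomplete until the next chunk (or the end) arrives.
    buffered = parts.pop();
    yield* parts;
  }
  if (buffered.length > 0) {
    yield buffered; // Final line without a trailing newline.
  }
}
```

With this, `for await (const line of splitLines(fs.createReadStream(file, 'utf8')))` iterates lines without creating a readline interface. Keeping the partial line in `buffered` is what handles a `\r\n` pair that is split across two chunks.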

However, a promisified rl.question() would be useful, IMO:

// Callback
rl.question('What is your favorite food? ', (answer) => {
  console.log(`Oh, so your favorite food is ${answer}`);
});

// Promise
async function main() {
  const answer = await questionAsync('What is your favorite food? ');
  console.log(`Oh, so your favorite food is ${answer}`);
}
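
A promise wrapper along these lines takes only a few lines to write. The sketch below is one possible shape, not an existing API at the time of this discussion: `questionAsync` is the hypothetical name used above, here taking the interface as an explicit first argument, and `askFavoriteFood` is an illustrative caller.

```javascript
'use strict';

const readline = require('readline');

// Hypothetical helper: wraps the callback-based rl.question() in a promise.
function questionAsync(rl, query) {
  return new Promise((resolve) => {
    rl.question(query, (answer) => resolve(answer));
  });
}

// Asks one question on the given streams and returns the formatted reply.
async function askFavoriteFood(input, output) {
  const rl = readline.createInterface({ input, output });
  try {
    const answer = await questionAsync(rl, 'What is your favorite food? ');
    return `Oh, so your favorite food is ${answer}`;
  } finally {
    rl.close(); // Release the input stream even if something throws.
  }
}
```

Calling `askFavoriteFood(process.stdin, process.stdout).then(console.log)` runs it interactively. Later Node.js releases added a `readline/promises` module whose `question()` returns a promise, which covers this use case without a wrapper.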

rvagg added a commit that referenced this pull request Nov 28, 2018

readline: add support for async iteration
Co-authored-by: Ivan Filenko <ivan.filenko@protonmail.com>
Fixes: #18603
Refs: #18904
PR-URL: #23916
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Gus Caplan <me@gus.host>

@BridgeAR BridgeAR referenced this pull request Dec 5, 2018

Merged

v11.4.0 proposal #24854

4 of 4 tasks complete

BridgeAR added a commit that referenced this pull request Dec 6, 2018

2018-12-06, Version 11.4.0 (Current)
Notable Changes:

* console,util:
  * `console` functions now handle symbols as defined in the spec.
    #23708
  * The inspection `depth` default is now back at 2.
    #24326
* dgram,net:
  * Added ipv6Only option for `net` and `dgram`.
    #23798
* http:
  * Choosing between the http parsers is now possible per runtime flag.
    #24739
* readline:
  * The `readline` module now supports async iterators.
    #23916
* repl:
  * The multiline history feature is removed.
    #24804
* tls:
  * Added min/max protocol version options.
    #24405
  * The X.509 public key info now includes the RSA bit size and the
    elliptic curve. #24358
* url:
  * `pathToFileURL()` now supports LF, CR and TAB.
    #23720
* Windows:
  * Tools are not installed using Boxstarter anymore.
    #24677
  * The install-tools scripts are now included in the dist.
    #24233
* Added new collaborator:
  * [antsmartian](https://github.com/antsmartian) - Anto Aravinth.
    #24655

PR-URL: #24854

BridgeAR added a commit that referenced this pull request Dec 7, 2018

2018-12-07, Version 11.4.0 (Current)
Notable Changes:

* console,util:
  * `console` functions now handle symbols as defined in the spec.
    #23708
  * The inspection `depth` default is now back at 2.
    #24326
* dgram,net:
  * Added ipv6Only option for `net` and `dgram`.
    #23798
* http:
  * Choosing between the http parsers is now possible per runtime flag.
    #24739
* readline:
  * The `readline` module now supports async iterators.
    #23916
* repl:
  * The multiline history feature is removed.
    #24804
* tls:
  * Added min/max protocol version options.
    #24405
  * The X.509 public key info now includes the RSA bit size and the
    elliptic curve. #24358
* url:
  * `pathToFileURL()` now supports LF, CR and TAB.
    #23720
* Windows:
  * Tools are not installed using Boxstarter anymore.
    #24677
  * The install-tools scripts are now included in the dist.
    #24233
* Added new collaborator:
  * [antsmartian](https://github.com/antsmartian) - Anto Aravinth.
    #24655

PR-URL: #24854

refack added a commit to refack/node that referenced this pull request Jan 14, 2019

readline: add support for async iteration
Co-authored-by: Ivan Filenko <ivan.filenko@protonmail.com>
Fixes: nodejs#18603
Refs: nodejs#18904
PR-URL: nodejs#23916
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Gus Caplan <me@gus.host>

refack added a commit to refack/node that referenced this pull request Jan 14, 2019

2018-12-07, Version 11.4.0 (Current)
Notable Changes:

* console,util:
  * `console` functions now handle symbols as defined in the spec.
    nodejs#23708
  * The inspection `depth` default is now back at 2.
    nodejs#24326
* dgram,net:
  * Added ipv6Only option for `net` and `dgram`.
    nodejs#23798
* http:
  * Choosing between the http parsers is now possible per runtime flag.
    nodejs#24739
* readline:
  * The `readline` module now supports async iterators.
    nodejs#23916
* repl:
  * The multiline history feature is removed.
    nodejs#24804
* tls:
  * Added min/max protocol version options.
    nodejs#24405
  * The X.509 public key info now includes the RSA bit size and the
    elliptic curve. nodejs#24358
* url:
  * `pathToFileURL()` now supports LF, CR and TAB.
    nodejs#23720
* Windows:
  * Tools are not installed using Boxstarter anymore.
    nodejs#24677
  * The install-tools scripts are now included in the dist.
    nodejs#24233
* Added new collaborator:
  * [antsmartian](https://github.com/antsmartian) - Anto Aravinth.
    nodejs#24655

PR-URL: nodejs#24854